Article 3 – Mari Skjerdal Lysne

Aesthetic Engagement with Mouse Bird Snake Wolf – Literary Multimodal Literacy in English Language Education

Mari Skjerdal Lysne



Multimodal literature plays an increasingly large role in literature and language education, including in English language teaching (ELT) in many contexts. In order to engage with all aspects of multimodal literature, it is necessary to consider both the multimodal and aesthetic nature of literary texts and what type of literacy is required to engage with these texts. In this article I will refer to this as ‘literary multimodal literacy’. I explore how literary multimodal literacy can be understood and how a group of lower secondary school learners in Norway demonstrated some of these competences in a classroom literacy event concerning the multimodal novel Mouse Bird Snake Wolf by David Almond and Dave McKean (2013). Based on this investigation, I will point to possible implications for teaching and acquiring literary multimodal literacy in the English classroom.

Keywords: multimodal literature, multimodal novel, ELT, literary multimodal literacy, aesthetic engagement

Mari Skjerdal Lysne is a PhD candidate at Western Norway University of Applied Sciences, Norway. She has a background as a teacher in upper secondary school and teacher educator within the fields of English language literature and English subject pedagogy. Her research concerns multimodal literature and learners’ acquisition of literary multimodal literacy in English language education.



Recent decades have seen a ‘multimodal turn’ (Jewitt, 2014, p. 4) in education as well as society, signalling a shift from the privileging of language to the recognition of the role multiple modes play in communication. The multimodal turn has led to a wider understanding of what literacy entails (Gee, 2014; UNESCO, 2023), as well as to an increased multimodal understanding and more diversity in communication and literature. It has also influenced English language education, for instance by recognizing that multimodal meaning-making is central to language learning (Eisenmann & Meyer, 2018; Lim, 2021; Maagerø & Tønnessen, 2022) and moving from mainly verbal and print-based texts to include a variety of multimodal texts (Beavis, 2013; Eisenmann & Summer, 2020; Hoff, 2022). The multimodal turn is also represented in the present English subject curriculum in Norway, which states that texts

can be spoken and written, printed and digital, graphic and artistic, formal and informal, fictional and factual, contemporary and historical. The texts can contain writing, pictures, audio, drawings, graphs, numbers and other forms of expression that are combined to enhance and present a message. (Utdanningsdirektoratet, 2019, p. 3)

The multimodal turn in education, including in English language teaching (ELT), has increasingly been the subject of research (for example, Kulju et al., 2018; Lim, Toh & Nguyen, 2022; Si, Hodges & Coleman, 2022; Yi, 2014). In the Norwegian context, Jakobsen and colleagues (Jakobsen, 2019; Jakobsen & Tønnessen, 2018) have researched multimodality in literacy practices in the English classroom. They found that although there was considerable multimodal input in English classes, learners were seldom required to produce multimodal texts or respond to multimodal texts as ensembles. Instead, multimodality has been seen as supportive in ELT, and something learners learn through instead of about (Jakobsen, 2022). In many instances in ELT, verbal text has been considered principal, while other modes, especially the visual, have been seen as illustrative, motivational, or supportive (Heinz, 2018; Jakobsen & Tønnessen, 2018; Skulstad, 2020), despite researchers pointing to multimodal literature as central for developing for instance intercultural competence (Heggernes, 2019; Mourão, 2023) and a wide range of literacies (Arizpe, Farrar & McAdam, 2017; Ellis, 2018).

Notwithstanding the large and increasing focus on the benefits of employing multimodal literature, there is less focus on this literature as literature in educational research (Arizpe, 2021; van der Pol, 2012). Reading literature, including multimodal literature, involves cognitive, motivational, and affective processes which differ from reading other types of texts (Meier et al., 2017). However, Hoff (2022) claims that when literature is clustered together with other text types, as in the English subject curriculum in Norway, there is a danger that the specific properties of literature as aesthetic cultural expressions may be disregarded. Reading multimodal literary texts competently thus includes both literary-aesthetic literacy as well as multimodal literacy (Leibrandt, 2019). While Leibrandt (2019) employs the two phrases ‘reading literacy’ related to multimodal literature and ‘a multimodal literacy, which includes literary-aesthetic reading’ (p. 259), I will use the term ‘literary multimodal literacy’ in this article. This term does not necessarily represent a new type of literacy. However, the double-barrelled term equates the two descriptors of literacy and may thus help emphasize the importance of both, closely integrated, concepts.

The present study is part of a larger project which seeks to explore how we can understand literary multimodal literacy and encourage the development of it in English language education in Norway. About changes in literacy education, Lim and colleagues (2022) write that ‘real change arguably happens only in the classroom – through how the teachers teach and what the students learn’ (p. 11). Through a classroom intervention, the present study explores a pedagogical approach to literary multimodal literacy. I will present an adapted model of literary literacy which explicitly includes the multimodal perspective. Further I will investigate how and to what extent learners demonstrated the literacy competences required to engage with a multimodal literary text as both multimodal and literary-aesthetic in a classroom literacy event, and include possible implications for teaching and acquiring literary multimodal literacy in ELT.


Literacy and Multimodal Literature

Definitions of literary literacy pinpoint the importance of aesthetic engagement and interpretative and critical practices (Bland, 2023; Cai & Traw, 1997). Locke and Andrews (2004) write about ‘literature-related literacies’ in the plural form, referring to ‘a range of competences enabling students to read, interpret and critique literary texts (however defined) and to engage in the production of such texts’ (p. 7). This plurality may be useful when exploring how literary multimodal literacy can be approached in education. Alter and Ratheiser (2019) present a model for literary literacy which consists of four competences: empathic competence, aesthetic and stylistic competence, cultural and discursive competence, and interpretative competence. In addition, general linguistic and reading competences are required (Alter & Ratheiser, 2019). However, when the literature in question is multimodal, competences need to involve not just the linguistic mode, but multiple modes and their intermodal relationships.

Alter and Ratheiser’s (2019) model builds on previous models for literary competences (for example, Burwitz-Melzer, 2007; Diehr & Surkamp, 2015; Spiro, 1991) and the Common European Framework of Reference for Languages: Learning, teaching, assessment – Companion volume (CEFR Companion volume; Council of Europe, 2018, 2020). It thus relates to the type of ‘can do’ statements or competence aims often found in language curricula and has a practical approach to language learning. Alter and Ratheiser (2019) employ a multimodal text, a picturebook, as their example text. In this respect, they go further in a multimodal direction than the CEFR Companion volume, which focuses on the linguistic mode while visual semiotic resources seem to be deemed supportive (Sindoni et al., 2022). However, Alter and Ratheiser (2019) demonstrate some of the same terminology when they write about ‘illustrations’ (p. 383) and employing visual texts to involve younger learners. While Alter and Ratheiser’s model usefully and effectively combines an aesthetic approach to literature with a consideration for practical application in the ELT classroom, there is a need for approaches which support readers’ engagement with the whole multimodal ensemble. In the following, I will outline how the competences in Alter and Ratheiser’s model may look if aesthetic aspects and literary conventions are explicitly combined with a focus on multimodality.

Empathic competence is the ability to relate to text and characters personally and emotionally. This is tied to the ability to relate a literary text to one’s own experiences (Sipe, 2008). However, empathic competence also includes recognizing a character’s emotions and how they are expressed (Alter & Ratheiser, 2019). Such emotions can be expressed through a range of semiotic resources; in printed multimodal literature, these include words, typography, colours, the character’s facial expressions or gestures, placement, symbols and so on.

Aesthetic and stylistic competence is tied to subjective experiences of literature and knowledge of stylistic devices which influence these experiences. Reading multimodal literature requires the reader to have knowledge about the semiotic resources they encounter, as well as the ability to explore and synthesize intermodal relationships, the interplay of different modes (Hallet, 2018; Jewitt, 2014). To do so explicitly, the different semiotic resources need to be identified and the reader must consider their meaning potential in the context of the text. One approach to this may be found in Serafini’s (2010) framework for understanding multimodal texts, where he outlines three analytical perspectives: the perceptual, the structural, and the ideological. These perspectives further influenced a pedagogical approach to multimodal literacy (Serafini, 2015), and I find that these perspectives may be useful when expanding Alter and Ratheiser’s model to include literary multimodal literacy. The perceptual perspective calls attention to the elements of a multimodal ensemble to create an inventory of the literal or denotative content. The structural perspective takes this further, as the denotative content is explored to construct meaning potentials. A component of these perspectives is to develop a metalanguage to describe and discuss modes, semiotic resources, and meaning potential in multimodal texts (Lim, 2021; Pantaleo, 2020; Unsworth & Macken-Horarik, 2017). This can help structure learners’ noticing and subsequent interpretations, ‘offering a fresh view of choices that have been taken for granted in diegetic reading’ (Macken-Horarik et al., 2018, p. 255). Considering stylistic devices through these perspectives can encourage a deeper reading of all modes and help readers go beyond an ‘intuition-based rather than fully literate comprehension’ (Hallet, 2018, p. 28) of the multimodal ensemble.

Cultural and discursive competence is related to Serafini’s ideological perspective. This competence refers to the ability to identify and work with the sociocultural context and values of a text and how they are expressed. The ideological analytical perspective invites readers to consider the sociocultural, historical, and political contexts in which a multimodal text is produced, distributed, and received, as well as how sociocultural practices and values shape and are shaped by semiotic resources and influence the meaning potential of a multimodal ensemble (Kress, 2010; Serafini, 2010). It can also be an opportunity to ‘disrupt commonplace assumptions’ (Serafini, 2015, p. 419) and investigate the learners’ culture(s) as well as that of the text they are engaging with. Hallet (2018) points out that visual semiotic resources may have a ‘seemingly simple affordance […] which often rely on the recognisability of the depicted object’ (p. 28), which initially may make such a disruption challenging without guidance.

Finally, interpretative competence is the ability to infer meaning from a literary text. This brings together the other competences to ‘ascribe meaning and significance’ to the text as a whole (Alter & Ratheiser, 2019, p. 382). Although all competences to some extent require interpretation, the interpretative competence attends to the meaning potential of the ensemble as a whole, whether that is a page, an opening, or the complete work.

Literary multimodal literacy thus comprises the intersection between knowledge of both literary and multimodal conventions, awareness of contexts, and the individual readers’ emotional, cognitive, creative, and social responses. In order to engage aesthetically with a multimodal literary text, readers also need to be able to engage with the full range of semiotic resources at play in the text. Through five English lessons, I attempted to help learners both employ and expand their knowledge of literary and multimodal conventions and offer them opportunities to personally engage with and respond to the multimodal literary text they read. Alter and Ratheiser’s model of literary competences, with the multimodal turn I have suggested here, both informed the classroom intervention and was used to examine learners’ expressions of competences related to engaging with multimodal literature.


Lesson Design

The study was conducted in a Norwegian lower secondary ELT classroom, where the class took part in a literacy event centred around the multimodal novel Mouse Bird Snake Wolf (Almond & McKean, 2013). I took the role of teacher in the class for five 45-minute English lessons over a two-week period. There were 24 learners in class, and their teacher was also present during the lessons. The learners were around 14 years old. The previous year they had participated in a project exploring comics in their Norwegian class, and the teacher reported that they were used to reading picturebooks in English, especially to learn about history and culture. The learners gave individual, informed consent to inclusion in the assembled data material. The lessons took Serafini’s (2014, p. 92) curricular framework for organizing learners’ encounters with multimodal texts as a starting point, which consists of three phases:

  • Exposure – exposing students to a wide variety of visual images and/or a particular multimodal ensemble.
  • Exploration – exploring the designs, features and structures of various visual images and particular multimodal ensembles.
  • Engagement – engaging in the production and/or interpretation of a particular visual image and/or multimodal ensembles.

These phases build on each other and to some extent overlap; see Table 1 for an overview of the lesson design.


| Lesson | Phase(s) | Learning activities |
|---|---|---|
| 1 | Exposure | Joint reading; literary conversation in class |
| 2 | Exposure | Joint reading |
| 3 | Exploration | Close reading and literary conversation in groups |
| 4 | Engagement | Multimodal text production |
| 5 | Exposure | Multimodal text production; learner presentations; reflection notes |

Table 1. Lesson phases and corresponding learning activities

The learners read Mouse Bird Snake Wolf by David Almond and Dave McKean (2013). As a brief multimodal novel, the text employs conventions from both comics or graphic novels and picturebooks in a type of ‘fusion’ text (Evans, 2011, 2013; Reid & Serafini, 2018). This can be seen, for instance, in the use of framed and unframed panels mixed with single and double splash pages, and narration and dialogue in text blocks. It is therefore a text where learners can employ possible previous knowledge of multimodal formats, but which also challenges their expectations. Mouse Bird Snake Wolf takes the form of a fairy tale or creation myth. However, it places the gods in the roles of distant past creators, and three children in the roles of active creators of new things in a world full of empty spaces. The book begins with wonder both for the characters and the reader as the possibilities of the storyworld are explored, but becomes disconcerting as the children’s creations become increasingly complex and dangerous. The verbal language is repetitive and suggestive of an oral storytelling style, while McKean’s drawings are vibrant in colour, with the gods and the wolf as striking contrasts. The book thus offers opportunities for various interpretations of semiotic resources and meanings of the text as a whole.

In the exposure and exploration phases, the class participated in joint reading to help all learners follow along as well as emphasise the social nature of the literacy event (Sipe, 2008). The reading was broken up by literary conversations in full class, with the aim of expressing reading experiences and through this exploring the multimodal novel (Aase, 2005; Serafini, 2014; van der Pol, 2012). These conversations initially involved the peritextual resources of the text and subsequently concerned creating a perceptual inventory and exploring intra- and intermodal meaning potential (Serafini, 2015). My role as instructor was to help the learners express what they noticed as well as to bring various elements to their attention, thus attempting to extend what they initially perceived. The teacher is, among other roles, a clarifier, prober, and extender of the learners’ responses (Bland, 2023; Sipe, 2008), and I brought with me supplementary knowledge. Through literary discussions, we came up with a common vocabulary, a type of metalanguage, for describing and discussing semiotic resources and literary devices. This was a pedagogical choice, as I wanted to use what the learners noticed as a starting point to avoid limiting what they perceived to a prescribed list and hopefully offer more ownership of their own reading and learning (Jewitt, 2008; Kachorsky & Serafini, 2019). The literary conversations resulted in seven terms, which were written on the board for later reference: colours, frames, perspective, body and clothes, face, font, and words and images. The final word pair referred to considerations of the intermodal relationships.

In the literary class conversations, I also attempted to model a structural perspective (Serafini, 2015). In my conversation with the learners, I focused on how the multimodal ensemble was constructed and on the meaning potentials of the semiotic resources, as well as connecting this to aspects of the literary competences outlined by Alter and Ratheiser (2019). Although mainly effected through a series of questions and answers, as teacher (T) I aimed for the conversation to be explorative for both the learners and myself, where I could follow their leads and exploit opportunities which arose. One example is this exchange, where a student (S1) considered colours (from my field notes):

‘The gods in the sky are black and white, and there is colour on the earth’ (S1).
‘Why do you think that is?’ [no reply] ‘Does it make you feel anything?’ (T)
‘Black and white is sort of sad or cold, colours make me feel happier.’ (S2)

Here S2 expressed aesthetic competence, pointing out an association between the colours in the spread and their own reaction. Following this exchange, the class and I also decided to use ‘colours’ as one of the common terms.

The literary class conversation functioned as modelling for what the learners were doing in the next lesson. They worked in groups exploring openings they chose in the multimodal novel, and wrote responses on a common digital board, Padlet. Table 2 offers an overview of the student responses which constitute the data material, including the written prompts the learners were given in each activity. Padlet gives opportunities to share text, images, links etc. in real-time, and learners can thus gain insight into others’ work as well as share their own. The social and dialogic aspect was important here, both to encourage the expansion of what individual learners noticed and how this was interpreted, as well as experiencing others’ perspectives. To inspire close reading, the learners were instructed to study just one opening in the novel closely. This is an approach often related to more traditionally viewed literary texts, but which also can be beneficial to reading in multimodal environments typically associated with quick impressions and passing attention (Beavis, 2010; Hayles, 2010).

In the engagement phase, the students were asked to interpret the final page of the novel and share this with the rest of the class in the digital response system Menti, which allows for brief and immediate student responses. The learners were also invited to respond creatively to the text through creating their own multimodal texts which continued the story. Finally, I asked them to write reflection notes about their understanding of the final spread in the multimodal novel and their own multimodal creation.


| Response | No. of responses | Organisation and medium | Prompt |
|---|---|---|---|
| Padlet | 7 | Groups; digital shared board | Consider these questions: What do you see? What is in the images? What do the words say? Think about the meanings of the words and the images. What do they mean separately? What do they mean together? Why do you think so? How do the different elements you notice make you feel or think, and how do you think they influence the story? |
| Menti | 22 | Individual; digital student response system | How do you understand the final page? |
| Reflection notes | 17 | Individual; pen and paper (handwritten) | Option to answer in English or Norwegian: 1. Think about the Menti where you wrote about the ending of the book, and what you saw the others wrote. How do you interpret the final page of the book? Why do you interpret it like this? 2. Tell me about the page you created as a continuation of the story. What did you want to express to the reader? Why did you do it like this? |

Table 2. Written student responses constituting the data material


Data Material and Analysis

The data material collected in the study includes my fieldnotes and a range of student responses. As I took on the role of teacher in the class, I was a partially participating observer (Bryman, 2016). In this role I was able to examine the pedagogical approach, as well as observe learners’ responses to the teaching intervention. Due to my participatory role, fieldnotes were written directly after each lesson. These were employed as aids for recollection, while the student responses were the data material analysed in this study.

Attendance in each lesson, the organisation of activities and consideration of learners’ personal consent to my collecting their texts resulted in varying numbers of responses for each activity. The responses varied in length from one word to several paragraphs, both due to differences in the type of texts and affordances of the medium and individual learner differences. All responses were transferred from the platforms where they were first submitted to secure data storage locations, and the handwritten texts were transcribed for easier access. In the following, quotations from the students’ texts will be given in the form they wrote them. In the instances where the learners answered in Norwegian, I will offer an English translation. I have not maintained the learners’ idiosyncrasies (such as spelling errors or dialect influence) in the translations.

The data material was subject to a qualitative content analysis employing deductive category assignment (Mayring, 2022). Clauses were chosen as the basic unit of analysis but nevertheless examined in context. The initial step of coding was undertaken in co-operation with two colleagues. We familiarized ourselves with the material and individually labelled parts of the data material according to code labels adapted from Alter and Ratheiser (2019), considering how competences were expressed in relation to a multimodal text. The individual coding was then subject to discussion in the coding team, and the coding guidelines, outlined in Table 3, were reviewed (Mayring, 2022). After the second round of coding, the team again discussed some of the units of analysis as a form of control coding and to further modify the guidelines (Bakken & Andersson-Bakken, 2021; Mayring, 2022). The dialogue in the team demonstrated a common understanding of the guidelines and a high degree of agreement about the analysis. The final round of coding again employed the revised coding guidelines.


| Category label | Explanation | Examples |
|---|---|---|
| Empathic competence | Refers to the ability to relate personally and emotionally to a text, and recognize characters’ emotions. | (1) I see Ben standing in between them, to shadows, he looks kinda confused and scared. |
| Aesthetic and stylistic competence | Refers to the ability to personally engage with and experience a literary text and consider the stylistic devices (both literary and multimodal conventions) it employs. | (2) The colors are making me feel scared and sick. (3) The words are also getting bigger and bigger so that means he is screaming. |
| Cultural and discursive competence | Refers to the ability to identify and work with the sociocultural context and values of a text and how they are expressed, including cultural expectations and assumptions. | (4) we see a wolf that is very angry and its seem like he want to eat the children for breakfast, he is hungry because he has red eyes and you can see on his eyes that he wants to eat them |
| Interpretative competence | Refers to the ability to infer meaning from a literary text. | (5) ‘Eg tolkar den siden er slutten på boka og ulven dreper alle sammen og der med kommer det svart bild fordi det er ingen som lever igjen’ [I interpret this page as the end of the book and the wolf kills everyone and therefore there is a black image because there is no one left alive]. |
| Not coded | Does not express competences as defined in the model, or it is not possible to conclude which, if any, competence is expressed. | (6) wolfeyes and a garden (7) ikkje bra [not good] |

Table 3. Coding guidelines with category labels, explanations, and examples from data material

Some of the examples require further explanation. In (4), there are four code units which needed to be considered together. In the first unit, the learners claimed the wolf ‘want to eat the children for breakfast’. The learners further explained their reading by pointing to a visual element: ‘he is hungry because he has red eyes’. That red eyes relate to hunger and eating was not necessarily suggested in the multimodal ensemble, but may rather be a result of the readers’ expectations of wolfish behaviour based on knowledge about representations of wolves in European fairy tales. This may thus be seen as an example of how cultural expectations influence the reading, although the learners did not express explicit awareness of this, and the response was labelled ‘Cultural and discursive competence’.

‘Interpretative competence’ is sometimes expressed in tandem with other competences. Some units were therefore assigned two labels. In (5) the learner focused on the dominating colour of the final page and decoded this as a representation of emptiness or grief. This supported the learner’s interpretation that the wolf ‘dreper alle sammen’ [kills everyone], and the unit was labelled both ‘Interpretative competence’ and ‘Aesthetic and stylistic competence’.

Some units were not coded. I recognized that some parts of the data either did not reflect any of the above-mentioned competences, or I was unable to extrapolate whether it did so. In (6), the statement is a description of what the learner saw on the page, but it is not possible to relate this to any of the literary competences outlined by Alter and Ratheiser. Example (7) may refer to both how the spread is making the learners feel (‘Aesthetic and stylistic competence’) and the spread’s role in the whole narrative (‘Interpretative competence’), but as these are the only two words in this response, it is impossible to decide if it is any of these, or if it instead expresses a general resistance to the task (Øgreid, 2021).

Applying previously defined category labels may cause ‘blindness’ (Mayring, 2022, p. 196) to aspects of the material, for instance non-coded units which demonstrate a certain extent of noticing, a prerequisite for the perceptual analytical perspective and the reading of a multimodal literary text in general. Nevertheless, as I seek to explore whether the extended understanding of Alter and Ratheiser’s (2019) model may help teachers and students develop the literacies needed to engage with multimodal literature, looking for whether and how these competences were expressed was considered an applicable approach. There may also be methodological challenges with using student responses in research, for instance that the texts may be incomplete, that the learners are reluctant to write, or that the context is unclear for the learners (Øgreid, 2021). The student responses also limit the material to what the learners expressed in writing. More data sources, for instance through recording group discussions or conducting learner interviews, could have offered further insight into the learners’ reasoning about their literary engagement. However, the student responses represent a realistic outcome of a sequence of English lessons and kept the data collection in the context of learning activities.


Literary Multimodal Literacy in Learners’ Responses

In the following I will present some of the student responses and discuss how and to what extent these demonstrate competences related to literary multimodal literacy. The first example is from the group work with written responses in Padlet, referring to the fifth opening in the multimodal novel:

(8): The gods are covering about half of the two sides. There is five gods and a foot to a sixth one. Arrond them there is just clouds, but the clouds get darker when theire close to the gods. The gods are grey in collor. We can see that the gods look lazy and together with the tekst they are ‘endless’ lazy. What you see on the page the tekst confirm. Because the gods are grey and theire big. We see the gods from a middle distance. We see teire body language and facial expression. Therefore we are sure that theire lazy. Theire bored of sitting there talking, eating and drinking. Theire wearing the same as greek gods, but theire are having grey clothes instead of white. This page does not have any frames. The page shows us that the gods are very very very lazy.

In this response, the learners inferred meaning from several aspects of the spread, as they drew together impulses from the image and the verbal text to understand the gods’ emotional state, ‘bored’ and ‘very very very lazy’. However, they did not explain which elements in the verbal text they based their interpretation on. The word in quotation marks, ‘endless’, is not a direct quotation from the verbal text, but may be a transfer from the Norwegian ‘uendeleg’ which the learners set apart to demonstrate that they were not satisfied with the word choice. The learners mentioned semiotic resources such as colour, size, interpersonal distance and borders, in part using the common terminology agreed upon in class (for example, ‘collor’ [colour], ‘frames’, while ‘middle distance’ was used in the class conversation when discussing perspective). Some of these resources are explicitly linked to the learners’ reading of emotions (‘Empathic competence’) and description of characters (‘Stylistic competence’), while for others (for example, ‘This page does not have any frames’) the meaning potential was not explored.

In the description of the gods’ clothes, the learners showed recognition of a cultural trope, a typical representation of Greek gods in European art, suggesting that they noticed a variance from what they expected. This could have initiated a deeper enquiry into the use of this cultural trope, but the learners did not pursue it beyond this sentence, nor did they explore what the representation of the gods may signify in the multimodal novel. In the opening described in (4), the wolf is presented in stark contrast to the children and the surroundings both in size and colour while the children’s gestures suggest fear, establishing the wolf as a sinister character. In the cultural context of most of the students, sinister wolves eat children in fairy tales. In making this interpretation, the learners drew on culturally shaped expectations of the genre and thus showed an ability to identify some cultural elements of the multimodal text. However, this was not done explicitly, nor did they further investigate the background for their understanding of the characters.

(9): I see Ben standing between them, to shadows, he looks kinda confused and scared. It is 2 frames here on this page, i think it shows different emotions, and what he is imagining/thinking about. He is close but not too close, and he is looking straightforward.

While the learners in (8) gave reasons for their interpretation of characters’ emotions, this was not the case in (9). However, this response also brought in semiotic resources such as borders, interpersonal distance, and gaze. In the second sentence, there was an attempt to suggest meaning potential for the borders/panels on this page, while the last sentence did not attempt an analysis.

(10): We see a boy who is screaming. The words are also getting bigger so that meens he is screaming. He seems to be a little angry. A boy and a wolf. The words are saying a lot, but mostly ‘turn back wolf’. The meaning are to show the wolf that the boy is angry. Together they are meaning anger and a messsage that he is angry. The reason the boy looks angry is maybe because he is screaming. The colors are making me feel scared and sick. So they influence the story to be a bit more scary.

Example (10) shows consideration of the meaning potential of the typography (‘the words’), as well as an attempt to connect the meaning potential of words and images. The penultimate sentence was also labelled ‘Aesthetic and stylistic competence’. This was one of the few instances where students explicitly connected semiotic resources (‘colors’) to their personal experience of the text. The following examples are from the reflection notes (11) and Menti (12), both illustrating learners’ understanding of the last page of the multimodal novel:

(11): I interpret the last side like this: I see the background, but I don’t care about it.
(12): very good and nice book but i wouldt read it if i had my own choice because it was kinda boring.

Both responses were labelled ‘Aesthetic and stylistic competence’, as they include personal reactions to literature. Example (11) shows a subjective response to elements of the visual mode, albeit with little precision about what the learner responded to, and (12) can be seen as an instance of the CEFR A2 level descriptor ‘Can express his/her reactions to a work, reporting his/her feelings and ideas in simple language’ (Council of Europe, 2020, p. 107). Responses to these prompts differed greatly, with some learners focusing explicitly on the last page, while others read it in light of the rest of the multimodal novel:

(13): Eg tolkar det som om at dei trur dei er kvitt ulven, men eigentleg er han der fortsatt og gøymer seg i mørket. Eg tolkar det slik fordi eg ser to raude auge i mørket. [I interpret it as if they think they are rid of the wolf, but he is really still there and hides in the dark. I interpret it like this because I see two red eyes in the dark.]
(14): I think it means that the wolf is getting alive again.
(15): I think that no matter what, there will always be evil in this world

Examples (13) to (15) were all labelled ‘Interpretative competence’, which requires drawing together other competences to assign meaning to the literary text. In the material from Padlet, the label ‘Interpretative competence’ was often used in combination with other labels, which demonstrates this co-occurrence, as in (8). In the shorter responses, there were a few instances of the same, for instance (13), where the learner explained their interpretation based on visual resources. However, the shorter responses contain more units labelled only ‘Interpretative competence’. Several of these concern the plot, as in (14), rather than ‘implicit meanings and ideas’ (Alter & Ratheiser, 2019, p. 383; Council of Europe, 2020, p. 59). Example (15) is an exception, where the learner considered possible themes in the multimodal novel.

Overall, the students’ responses indicate that they were able to interpret the literary text, but supported their interpretations only to a slight extent. The units labelled ‘Aesthetic and stylistic competence’ show that learners could identify especially visual elements in the multimodal novel. Some instances of uncoded units display observations which may be a prerequisite for further exploration of the text, see (6), but these instances were not extended further to develop a perceptual inventory or consider meaning potential. In addition, none of the responses about the learners’ own multimodal texts were coded. This may indicate that the learners did not connect the learning from and about the multimodal novel to their own multimodal creations (Serafini, 2014), at least not explicitly.


Teaching and Learning Literary Multimodal Literacy

In the examined material, there is an emphasis on visual elements, which may suggest that the learners found these interesting and perhaps more accessible than the verbal text. The class teacher also indicated that the classroom conversation and group work engaged learners who often remain less involved when the topic is a primarily verbal text. However, the learners seem to have lost sight of the full multimodal ensemble. In the few instances where the student responses refer to the intermodal relationship between images and verbal text, for example, (8) and (10), they indicate correspondence. It seems as if the learners were used to or expected this kind of relationship, which is also suggested by, for instance, the CEFR Companion volume, while other word-image relationships were less familiar.

Many of the units labelled ‘Aesthetic and stylistic competence’ only indicate semiotic resources the learners noticed, that is, they take a perceptual perspective on the text (Serafini, 2010, 2015). One of the teacher’s roles is to encourage learners to expand their perception and consider meaning potential. In their study of children reading picturebooks, Arizpe and Styles (2003) found that good questions were central, both to help readers express what they did when encountering a multimodal text and to teach children to read such texts. Similarly to the findings in the present study, they saw that the children seemed comfortable answering ‘what’ questions, whereas ‘why’ questions were more challenging at first. In my material, there are some instances where the meaning potential of the semiotic resources was decoded or interpreted to some extent, including connecting them to literary principles such as characterisation, for instance in (8). In these instances, the learners employed a structural perspective (Serafini, 2010, 2015). In order to help learners take this step, Alter and Ratheiser (2019) suggest that

one of our [teachers’] central tasks in the discussion of literature will be to bridge the gap between analysis and interpretation by asking, again and again: What is achieved by feature x that you have just observed? How does this feature (e.g. narrative perspective, rhyme scheme, cultural discourse) make us read the text differently? What effect does this have on me as a reader? (p. 382)

Such questions and the learners’ responses, especially pertaining to visual, verbal and design elements, were actively demonstrated in the classroom literary conversations, but they are not apparent in all student responses. The benefits of explicitly teaching aspects of multimodality are highlighted in research on using multimodal literature in the classroom (for example, Heinz, 2018; Pantaleo, 2020; Reyes-Torres & Portalés Raga, 2020). Macken-Horarik and colleagues (2018) point out that having a metalanguage for multimodality in the classroom changed learners’ responses ‘as if they had discovered (or been shown) a portal into symbolic meaning – the “why” beneath the “what”’ (p. 253). I propose that this pertains not only to stylistic devices in all modes, but also to aspects such as cultural characteristics, emotional responses, and aesthetic experiences. Making this visible through a common vocabulary allows for wondering and co-creation in a learning community, as well as the chance to investigate how textual engagement may look for different learners.

However, it is not entirely clear what ‘explicit teaching’ or ‘explicit instruction’ entails. Despite encouraging the development of a metalanguage, Jewitt (2008) warns that teaching a technical metalanguage may restrain the reader’s opportunity to explore possible meaning potentials through ‘the risk of a static grammar of modes that cannot account for the power of context and the transformative character of systems of meaning making’ (p. 252). In encouraging aesthetic engagement with the text, I sought to be a co-reader, a ‘fellow wonderer’ (Sipe, 2008, p. 201), with the students instead of purely an instructor. This was an important premise for my approach, as I wanted to use what the learners noticed as a starting point and find a vocabulary close to their everyday language. The explicit instruction thus took the form of literary conversations, question-and-answer sequences, and my expansion and utilization of the learners’ input, as well as guidance on their group work. In hindsight, I noticed that I could have introduced the learners to different terms, leading them towards a more conventional metalanguage (for example, inspired by Kress & van Leeuwen, 2006; Serafini, 2014). This could have offered easier transferability to other contexts as well as an extension of vocabulary and conceptual knowledge. The findings suggest that it may be useful to treat the following aspects explicitly in one form or another: intermodal relationships, consideration of the whole ensemble, the move from the perceptual to the structural perspective, and the basis for interpretations.

Another aspect which may benefit from explicit treatment is the ideological perspective, which may foster cultural and discursive competence. In the material, there is little to suggest an ideological perspective and very few units were labelled ‘Cultural and discursive competence’. This competence may be seen in relation to intercultural competence, with literature an especially fertile ground for developing such competence (Alter & Ratheiser, 2019; Hoff, 2022). Awareness of sociocultural context was not an explicit focus of the learning activities, nor in the prompts offered to the learners, and the lack of evidence for cultural and discursive competence may demonstrate the need for such a focus.

The learning activities in the classroom intervention differed in classroom organisation, time, prompts, and media, which resulted in different types of responses. Padlet had more detailed instructions for the learners, and in addition the terminology was available on the board. Despite this, there was little direct use of these terms in the students’ written responses, and it seems that the learners instead used them as indications of what they could consider (see [10] on typography). When writing in Menti and their reflection notes, the learners did not have the terms in front of them, and they focused less on these concepts than in Padlet. Even though the learners did not use the common vocabulary directly, having it available may thus have helped some of them notice aspects of the multimodal ensemble they otherwise would not have considered.

The only instances labelled ‘Empathic competence’ and ‘Cultural and discursive competence’ are from the material from Padlet, and this material also shows clearer connections between the perceptual and structural perspectives. This may indicate that the demonstration of these competences requires more time and can be encouraged by social interaction and literary, explorative conversations in groups. More time was set aside for this work than for the individual responses. In addition, many of the students utilized the whole time allotted to the group work, while for Menti and the reflection notes several learners reported they were ‘done’ well before the allotted time was over. Setting aside enough time, as well as encouraging the use of this time through purposeful questioning, scaffolding, and cooperation, seems to be valuable.

The classroom intervention consisted of only five lessons. However, this was realistic in a Norwegian ELT context at this level, where learners typically have only 90 minutes of English lessons per week. There is little time for learners to practice and internalize this type of engagement with a multimodal literary text. Thus, when the learners are relatively new to the subject matter, as in this case, there is a need both to return to the topic several times and to consider more scaffolding in the learning activities than I managed. Scaffolding should include the opportunity to express personal experiences with the literature, tying together personal reactions, analyses, and interpretations into a richer and deeper engagement with the literature.



As Arizpe and colleagues observe, ‘there continues to be some opposition between the view of picturebooks as either useful resources for developing literacy, or art objects that should offer aesthetic pleasure rather than any sort of “lesson”’ (2017, p. 276). In this article, I have argued that to engage meaningfully with multimodal literature we need to consider it as both. In addition, we need to think about what types of literacies are necessary to experience this literature. Eisner (1985) claims that reading graphic novels is ‘an act of both aesthetic perception and intellectual pursuit’ (p. 8), and this is true also for other multimodal literature (Evans, 2013). The examined student responses show that students express competences related to literary multimodal literacy, but the material does not unequivocally show literary-aesthetic and multimodal awareness in all cases. Exploring the material through this lens has allowed for a careful consideration of the nature of the learners’ engagement with the multimodal novel in question, as well as reflection on the benefits and limitations of the teaching and learning approach employed in this English classroom. With the explicit inclusion of multimodal texts in English teaching, there is a continuing need to encourage the development of literacies which support learners’ engagement with these texts.



Almond, David, illus. Dave McKean (2013). Mouse Bird Snake Wolf. Walker Books.



Aase, L. (2005). Litterære samtalar. In B. K. Nicolaysen, & L. Aase (Eds.), Kulturmøte i tekstar. Litteraturdidaktiske perspektiv (pp. 106–124). Det Norske Samlaget.

Alter, G., & Ratheiser, U. (2019). A new model of literary competences and the revised CEFR descriptors. ELT Journal, 73(4), 377–386.

Arizpe, E. (2021). The state of the art in picturebook research from 2010 to 2020. Language Arts, 98(5), 260–272.

Arizpe, E., Farrar, J., & McAdam, J. (2017). Picturebooks and literacy studies. In B. Kümmerling-Meibauer (Ed.), The Routledge companion to picturebooks (pp. 371–381). Routledge.

Arizpe, E., & Styles, M. (2003). Children reading pictures. Routledge Falmer.

Bakken, J., & Andersson-Bakken, E. (2021). Innholdsanalyse. In E. Andersson-Bakken, & C. P. Dalland (Eds.), Metoder i klasseromsforskning (pp. 305–326). Universitetsforlaget.

Beavis, C. (2010). Twenty first century literature: Opportunities, changes and challenges. In D. Wyse, R. Andrews, & J. Hoffman (Eds.), The Routledge international handbook of English, language and literacy teaching (pp. 33–44). Routledge.

Beavis, C. (2013). Literary English and the challenge of multimodality. Changing English: Studies in Culture and Education, 20(3), 241–252.

Bland, J. (2023). Compelling stories for English language learners. Bloomsbury Academic.

Bryman, A. (2016). Social research methods (5th ed.). Oxford University Press.

Burwitz-Melzer, E. (2007). Ein Lesekompetenzmodell für den fremdsprachlichen Literaturunterricht. In L. Bredella, & W. Hallet (Eds.), Literaturunterricht, Kompetenzen und Bildung (pp. 127–157). Wissenschaftlicher Verlag Trier.

Cai, M., & Traw, R. (1997). Literary literacy. Journal of Children’s Literature, 23(2), 20–33.

Council of Europe. (2018). Common European framework of reference for languages: Learning, teaching, assessment: Companion volume with new descriptors.

Council of Europe. (2020). Common European framework of reference for languages: Learning, teaching, assessment – Companion volume. Council of Europe Publishing.

Diehr, B., & Surkamp, C. (2015). Die Entwicklung literaturbezogener Kompetenzen in der Sekundarstufe I: Modellierung, Abschlussprofil und Evaluation. In W. Hallet, C. Surkamp, & U. Krämer (Eds.), Literaturkompetenzen Englisch: Modellierung – Curriculum – Unterrichtsbeispiele (pp. 21–40). Klett-Kallmeyer.

Eisenmann, M., & Meyer, M. (2018). Introduction: Multimodality and multiliteracies. Anglistik: International Journal of English Studies, 29(1), 5–23.

Eisenmann, M., & Summer, T. (2020). Multimodal literature in ELT: Theory and practice. Children’s Literature in English Language Education, 8(1), 52–73.

Eisner, W. (1985). Comics & sequential art. Poorhouse Press.

Ellis, G. (2018). The picturebook in elementary ELT: Multiple literacies with Bob Staake’s Bluebird. In J. Bland (Ed.), Using literature in English language education: Challenging reading for 8-18 year olds. Bloomsbury.

Evans, J. (2011). Raymond Briggs: Controversially blurring boundaries. Bookbird: A Journal of International Children’s Literature, 49(4), 49–61.

Evans, J. (2013). From comics, graphic novels and picturebooks to fusion texts: A new kid on the block! Education 3-13, 41(2), 233–248.

Gee, J. P. (2014). Foreword. In F. Serafini (Ed.), Reading the visual. An introduction to teaching multimodal literacy (pp. xi–xii). Teachers College Press.

Hallet, W. (2018). Reading multimodal fiction: A methodological approach. Anglistik: International Journal of English Studies, 29(1), 25–40.

Hayles, N. K. (2010). How we read: Close, hyper, machine. ADE Bulletin, 150, 62–79.

Heggernes, S. L. (2019). Opening a dialogic space: Intercultural learning through picturebooks. Children’s Literature in English Language Education, 7(2), 37–60.

Heinz, S. (2018). Researching multimodal reader response(s) in the EFL classroom. In A.-J. Zwierlein, J. Petzold, K. Boehm, & M. Decker (Eds.), Anglistentag 2017 Regensburg. Proceedings (pp. 311–324). Wissenschaftlicher Verlag Trier.

Hoff, H. E. (2022). Promoting 21st century skills through classroom encounters with English language literature in Norway: Theoretical and practical considerations. In M. Dypedahl (Ed.), Moving English language teaching forward (pp. 165–194). Cappelen Damm Akademisk.

Jakobsen, I. K. (2019). Inspired by image: A multimodal analysis of 10th grade English school-leaving written examinations set in Norway. Acta Didactica Norge, 13(1), 1–27.

Jakobsen, I. K. (2022). Multimodality and literacy practices in English. Exploring the role of multimodal texts in English language teaching in Norway [PhD Dissertation]. UiT The Arctic University of Norway.

Jakobsen, I. K., & Tønnessen, E. S. (2018). A design-oriented analysis of multimodality in English as a foreign language. Designs for Learning, 10(1), 40–52.

Jewitt, C. (2008). Multimodality and literacy in school classrooms. Review of Research in Education, 32, 241–267.

Jewitt, C. (2014). Introduction: Handbook rationale, scope and structure. In C. Jewitt (Ed.), The Routledge handbook of multimodal analysis (2nd ed., pp. 1–7). Routledge.

Kachorsky, D., & Serafini, F. (2019). From picturebooks to propaganda: Developing visual and multimodal literacies. In E. Domínguez Romero, J. Bobkina, & S. Stefanova (Eds.), Teaching literature and language through multimodal texts (pp. 70–92). IGI Global.

Kress, G. (2010). Multimodality. A social semiotic approach to contemporary communication. Routledge.

Kress, G., & van Leeuwen, T. (2006). Reading images: The grammar of visual design (2nd ed.). Routledge.

Kulju, P., Kupiainen, R., Wiseman, A. M., Jyrkiäinen, A., Koskinen-Sinisalo, K.-L., & Mäkinen, M. (2018). A review of multiliteracies pedagogy in primary classrooms. Language and Literacy, 20(2), 80–101.

Leibrandt, I. M. (2019). Postmodern literacy: Multimodal, hypertextual, intertextual reading. In E. Domínguez Romero, J. Bobkina, & S. Stefanova (Eds.), Teaching literature and language through multimodal texts (pp. 258–276). IGI Global.

Lim, F. V. (2021). Towards education 4.0: An agenda for multiliteracies in the English language classroom. In F. A. Hamied (Ed.), Literacies, culture, and society towards industrial revolution 4.0: Reviewing policies, expanding research, enriching practices in Asia (pp. 11–30). Nova Science.

Lim, F. V., Toh, W., & Nguyen, T. T. H. (2022). Multimodality in the English language classroom: A systematic review of literature. Linguistics and Education, 69.

Locke, T., & Andrews, R. (2004). A systematic review of the impact of ICT on literature-related literacies in English, 5-16. Research Evidence in Education Library. EPPI-Centre, Social Science Research Unit, Institute of Education.

Maagerø, E., & Tønnessen, E. S. (2022). Multimodal literacy in English as an additional language. In S. Diamantopoulou, & S. Ørevik (Eds.), Multimodality in English language learning (pp. 27–38). Routledge.

Macken-Horarik, M., Love, K., Sandiford, C., & Unsworth, L. (2018). Functional grammatics: Re-conceptualizing knowledge about language and image for school English. Routledge.

Mayring, P. (2022). Qualitative content analysis. SAGE.

Meier, C., Roick, T., Henschel, S., Brüggemann, J., Frederking, V., Rieder, A., Gerner, V., & Stanat, P. (2017). An extended model of literary literacy. In D. Leutner, J. Fleischer, J. Grünkorn, & E. Klieme (Eds.), Competence assessment in education. Research, models and instruments (pp. 55–74). Springer.

Mourão, S. (2023). Picturebooks for intercultural learning in foreign language education. A scoping review. Zeitschrift für interkulturellen Fremdsprachenunterricht, 28(1), 173–209.

Øgreid, A. K. (2021). Elevtekster som empirisk materiale i kvalitative studier. In E. Andersson-Bakken, & C. P. Dalland (Eds.), Metoder i klasseromsforskning (pp. 327–354). Universitetsforlaget.

Pantaleo, S. (2020). Slow looking: “reading picturebooks takes time”. Literacy, 54(1), 40–48.

Reid, S. F., & Serafini, F. (2018). More than words: An investigation of the middle-grade multimodal novel. Journal of Children’s Literature, 44(2), 32–44.

Reyes-Torres, A., & Portalés Raga, M. (2020). Multimodal approach to foster the multiliteracies pedagogy in the teaching of EFL through picturebooks. Atlantis, 42(1), 94–119.

Serafini, F. (2010). Reading multimodal texts: Perceptual, structural and ideological perspectives. Children’s Literature in Education, 41(2), 85–104.

Serafini, F. (2014). Reading the visual. An introduction to teaching multimodal literacy. Teachers College Press.

Serafini, F. (2015). Multimodal literacy: From theories to practices. Language Arts, 92(6), 412–422.

Si, Q., Hodges, T. S., & Coleman, J. M. (2022). Multimodal literacies classroom instruction for K-12 students: A review of research. Literacy Research and Instruction, 61(3), 276–297.

Sindoni, M. G., Moschini, I., Adami, E., & Karatza, S. (2022). The common framework of reference for intercultural digital literacies (CFRIDiL): Learning as meaning-making and assessment as recognition in English as an additional language contexts. In S. Diamantopoulou, & S. Ørevik (Eds.), Multimodality in English language learning (pp. 221–237). Routledge.

Sipe, L. (2008). Storytime: Young children’s literary understanding in the classroom. Teachers College Press.

Skulstad, A. S. (2020). Multimodality. In A.-B. Fenner, & A. S. Skulstad (Eds.), Teaching English in the 21st century (2nd ed., pp. 261–283). Fagbokforlaget.

Spiro, J. (1991). Assessing literature: Four papers. In C. Brumfit (Ed.), Assessment in literature teaching (pp. 16–83). Macmillan.

United Nations Educational, Scientific and Cultural Organization [UNESCO]. (2023, February 2). What you need to know about literacy. UNESCO.

Unsworth, L., & Macken-Horarik, M. (2017). Interpretive responses to images in picture books by primary and secondary school students: Exploring curriculum expectations of a ‘visual grammatics’. English in Education, 49(1), 56–79.

Utdanningsdirektoratet. (2019). Curriculum in English (ENG01-04). Utdanningsdirektoratet [The Norwegian Directorate for Education and Training].

van der Pol, C. (2012). Reading picturebooks as literature: Four-to-six-year-old children and the development of literary competence. Children’s Literature in Education, 43, 93–106.

Yi, Y. (2014). Possibilities and challenges of multimodal literacy practices in teaching and learning English as an additional language. Language and Linguistics Compass, 8(4), 158–169.