Benefits and Pitfalls of Multimedia and Interactive Features in Technology-Enhanced Storybooks
- Authors
- Zsófia K. Takács, Elise K. Swart, Adriana G. Bus
- Journal
- Review of Educational Research
- Year
- 2015
- Citations
- 373
TL;DR
Technology-enhanced storybooks (e.g., e-books with animations, sound effects, and interactive games) produce a small but reliable improvement in young children's story comprehension and expressive vocabulary compared to traditional reading, but only when the enhancements are multimedia features (animations, music, sound effects). Interactive features such as hotspots, games, and built-in dictionaries actually harm learning, especially for children from less stimulating home environments.
What they tested
This meta-analysis compared technology-enhanced storybooks (e-books, CD-ROM stories, tablet story apps) against traditional storybook reading (print books read aloud by an adult or audio-only recordings). The researchers tested two broad categories of digital enhancements:
**Multimedia features:** Animated pictures, background music, sound effects, and narrated text that highlights words as they are spoken.
**Interactive features:** Clickable hotspots that trigger animations or sounds, embedded games, pop-up dictionaries, and activities that require the child to tap, drag, or respond during the story.
The primary outcomes measured were:
**Story comprehension:** Understanding of plot, characters, sequence of events, and inferential meaning (e.g., "Why did the character feel sad?")
**Expressive vocabulary:** Ability to produce and use new words encountered in the story (e.g., naming objects or actions shown)
**Receptive vocabulary:** Ability to understand new words when heard (e.g., pointing to the correct picture when a word is spoken)
The meta-analysis also examined whether effects differed based on child characteristics (socioeconomic status, age, initial language ability) and study design features (randomisation, sample size, publication status).
Who was studied
The meta-analysis aggregated data from **2,147 children** across **43 independent studies** published between 1990 and 2013. The children were:
**Age range:** 3 to 8 years old (preschool through early elementary school)
**Setting:** Primarily school-based or preschool-based interventions, with some home-based studies
**Languages:** Mostly English-speaking, but included studies in Dutch, Hebrew, and other languages
**Socioeconomic status:** Ranged from low-income (e.g., Head Start programs) to middle-class populations; several studies specifically targeted children from "disadvantaged" or "less stimulating family environments"
**Special populations:** Some studies included children with language delays or learning disabilities, but the majority were typically developing children
The authors note that the sample was predominantly from Western, educated, industrialised countries, which limits generalisability to other cultural contexts.
How they measured it
Each individual study used its own instruments, but the meta-analysis standardised outcomes into a common metric (Hedges' g, a measure of effect size). Typical measurement tools included:
**Story comprehension:** Researcher-designed comprehension questions (e.g., 5–10 multiple-choice or open-ended questions about the story's plot, characters, and cause-effect relationships), often administered immediately after reading. Some studies used standardised tests like the Test of Early Reading Ability (TERA) or the Peabody Picture Vocabulary Test (PPVT) adapted for story content.
**Expressive vocabulary:** Researcher-designed picture-naming tasks (e.g., "What is this called?" showing images of objects from the story), or standardised subtests like the Expressive Vocabulary Test (EVT). Children were asked to produce the word, not just recognise it.
**Receptive vocabulary:** Standardised tests like the Peabody Picture Vocabulary Test (PPVT) or researcher-designed pointing tasks (e.g., "Show me the [target word]" from a set of four pictures).
**Duration of exposure:** Studies ranged from a single 10-minute session to multiple sessions over 4–6 weeks. The meta-analysis coded for total exposure time and number of sessions.
The authors also coded study quality features: whether random assignment was used, whether the control condition was active (e.g., adult-read print book) versus passive (no intervention), and whether outcome assessors were blind to condition.
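To make the common metric concrete, the following is a minimal sketch of how Hedges' g is usually computed from two groups' summary statistics. The function name `hedges_g`, the variance approximation, and the input numbers are illustrative assumptions; they are not taken from the paper or from any study it includes.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance of g) for a treatment vs. control comparison."""
    df = n_t + n_c - 2
    # Pooled standard deviation across the two groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled  # Cohen's d
    # Small-sample correction factor (the usual approximation)
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Common large-sample approximation to the variance of d
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return g, j**2 * var_d

# Hypothetical scores: 15 children per group, 1-point difference, SD of 2
g, var_g = hedges_g(mean_t=7.0, sd_t=2.0, n_t=15, mean_c=6.0, sd_c=2.0, n_c=15)
print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

The correction factor shrinks the estimate slightly, which matters most for the small samples (20–30 children) that are common in this literature.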
Methodology
**Study design:** This is a meta-analysis — a statistical synthesis of 43 independent experimental and quasi-experimental studies. The authors used a random-effects model, which assumes that the true effect size varies across studies due to differences in populations, interventions, and settings. This is appropriate for educational research where heterogeneity is expected.
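As an illustration of what a random-effects synthesis does, here is a minimal sketch using the DerSimonian-Laird estimator of between-study variance. The summary only says a random-effects model was used, so the choice of estimator is an assumption, and `random_effects_pool` and the input values are hypothetical.

```python
import math

def random_effects_pool(effects, variances):
    """Return (pooled effect, SE, tau^2, I^2 in percent)."""
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    # Random-effects weights add tau^2 to each study's own variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * gi for wi, gi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2

# Hypothetical effect sizes and variances for three studies
pooled, se, tau2, i2 = random_effects_pool([0.5, -0.2, 0.3], [0.02, 0.02, 0.02])
print(f"pooled g = {pooled:.2f}, SE = {se:.2f}, tau^2 = {tau2:.2f}, I^2 = {i2:.0f}%")
```

When the studies disagree more than sampling error alone would predict, tau² grows, the weights become more equal, and the confidence interval around the pooled effect widens.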
**Inclusion criteria:** Studies had to (a) compare technology-enhanced story reading to a non-technology control condition (print book reading, audio-only, or no intervention), (b) measure at least one literacy outcome (comprehension, vocabulary, or decoding), (c) include children aged 3–8, and (d) report sufficient data to calculate effect sizes. The authors excluded studies that only compared different types of technology (e.g., e-book vs. e-book with added features) without a non-technology control.
**Coding and moderation:** The authors coded each study for:
Type of technology features (multimedia vs. interactive vs. both)
Child characteristics (age, SES, initial language ability)
Study design (randomised vs. non-randomised, sample size, publication year)
Control condition type (adult-read print book vs. audio-only vs. no intervention)
They then used meta-regression and subgroup analyses to test whether these moderators explained variability in effect sizes.
**What this design can prove:** By pooling multiple studies, a meta-analysis estimates the average effect more precisely, and with more statistical power, than any single study can. It can identify consistent patterns (e.g., multimedia helps, interactive hurts) and test whether effects differ by population or study design.
**What this design cannot prove:** Moderator comparisons in a meta-analysis are correlational at the study level, even when the individual studies are randomised. The observed moderation effects (e.g., interactive features being harmful) could therefore be confounded with other study characteristics (e.g., studies with interactive features might have used lower-quality stories or shorter exposure times). The authors attempted to control for this via meta-regression, but residual confounding is possible. The meta-analysis also cannot tell us about individual differences within studies; it only compares average effects across studies.
**Major methodological strengths:**
Large total sample (2,147 children)
Systematic search across multiple databases
Explicit coding of study features and moderator analyses
Publication bias assessment (funnel plot, Egger's test)
**Major methodological weaknesses:**
High heterogeneity across studies (the abstract does not report I² values, though substantial heterogeneity is typical of educational meta-analyses)
Many studies had small sample sizes (some as low as 20–30 children)
Few studies included long-term follow-up (most measured outcomes immediately after reading)
The "interactive features" category is broad — a hotspot that triggers a sound effect is different from a full vocabulary game, but they were grouped together
Most studies were conducted in school settings with teacher supervision, which may not generalise to home use
Key findings
**Primary outcome — Story comprehension:**
Technology-enhanced stories produced a **small but significant benefit** over traditional reading: **g+ = 0.17** (95% CI: 0.05 to 0.29, p < 0.01)
This means the average child in the technology condition scored about 0.17 standard deviations higher on comprehension tests than the average child in the control condition
**Primary outcome — Expressive vocabulary:**
Technology-enhanced stories produced a **small but significant benefit**: **g+ = 0.20** (95% CI: 0.07 to 0.33, p < 0.01)
Effect was slightly larger than for comprehension
**Primary outcome — Receptive vocabulary:**
No significant benefit: **g+ = 0.05** (95% CI: -0.08 to 0.18, p = 0.45)
Technology did not improve children's ability to recognise new words compared to traditional reading
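As a quick consistency check on the three pooled estimates above, one can back out the standard error from each 95% confidence interval and recompute a two-sided p-value. This sketch assumes symmetric intervals based on a normal approximation; the helper name `p_from_ci` is hypothetical.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Return (SE, z, two-sided p) implied by an estimate and its 95% CI."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Estimates and intervals as reported in this summary
reported = {
    "story comprehension": (0.17, 0.05, 0.29),
    "expressive vocabulary": (0.20, 0.07, 0.33),
    "receptive vocabulary": (0.05, -0.08, 0.18),
}
for outcome, (g, lower, upper) in reported.items():
    se, z, p = p_from_ci(g, lower, upper)
    print(f"{outcome}: SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the implied p-values fall below 0.01 for comprehension and expressive vocabulary and come out near 0.45 for receptive vocabulary, in line with the reported values.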
**Moderator analyses — Multimedia vs. interactive features:**
**Multimedia features alone (animations, music, sound effects):** Significant positive effect on comprehension (g+ = 0.34, p < 0.01) and expressive vocabulary (g+ = 0.27, p < 0.01)
**Interactive features alone (hotspots, games, dictionaries):** Significant **negative** effect on comprehension (g+ = -0.19, p < 0.05) and expressive vocabulary (g+ = -0.14, p < 0.05)
**Combined multimedia + interactive features:** No significant effect (g+ = 0.02 for comprehension, g+ = 0.06 for vocabulary), consistent with the positive and negative effects cancelling out
**Moderator analyses — Child characteristics:**
**Children from disadvantaged backgrounds (low SES, less stimulating home environments):** Multimedia features were especially beneficial (g+ = 0.45 for comprehension), while interactive features were especially detrimental (g+ = -0.32 for comprehension)
**Children from advantaged backgrounds:** Smaller positive effect of multimedia (g+ = 0.15) and smaller negative effect of interactive features (g+ = -0.08)
**Age:** No significant moderation — effects were similar for 3–5 year-olds and 6–8 year-olds
**Initial language ability:** Children with lower initial vocabulary benefited more from multimedia features (g+ = 0.38) than children with higher initial vocabulary (g+ = 0.12)
**Publication bias:**
Funnel plot and Egger's test suggested no significant publication bias for comprehension outcomes, but some asymmetry for vocabulary outcomes (small studies with negative effects may be missing)
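For readers unfamiliar with Egger's test, the sketch below shows the regression behind it: each study's standardised effect (g divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept far from zero signals funnel-plot asymmetry. The function name `egger_regression` and the data are hypothetical, and a full test would compare the t statistic against a t distribution with n − 2 degrees of freedom.

```python
import math

def egger_regression(effects, std_errors):
    """Regress g/SE on 1/SE; return (intercept, intercept SE, slope)."""
    x = [1 / se for se in std_errors]                   # precision
    y = [g / se for g, se in zip(effects, std_errors)]  # standardised effect
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and the ordinary least-squares SE of the intercept
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (n - 2)
    intercept_se = math.sqrt(resid_var * (1 / n + x_bar**2 / sxx))
    return intercept, intercept_se, slope

# Hypothetical data in which smaller (less precise) studies show larger effects
effects = [0.45, 0.30, 0.25, 0.15, 0.20, 0.10]
std_errors = [0.30, 0.25, 0.20, 0.12, 0.10, 0.08]
intercept, intercept_se, slope = egger_regression(effects, std_errors)
print(f"intercept = {intercept:.2f}, SE = {intercept_se:.2f}, "
      f"t = {intercept / intercept_se:.2f}")
```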
Effect magnitude
The overall effects are **small** by conventional standards (Cohen's guidelines: 0.2 = small, 0.5 = medium, 0.8 = large). To put them in context:
**g+ = 0.17 for comprehension** means that if you randomly picked a child from the technology group and a child from the control group, the technology-group child would score higher about 55% of the time (vs. 50% by chance). Equivalently, the average child in the technology group would sit at roughly the 57th percentile of the control group, assuming normally distributed scores.
**g+ = 0.20 for expressive vocabulary** is similarly small: about a 1-word advantage on a 20-word test if the test's standard deviation is around 4–5 words.
**g+ = 0.34 for multimedia-only** is a small-to-medium effect, equivalent to moving from the 50th to the 63rd percentile. This is more practically meaningful.
**g+ = -0.19 for interactive-only** means interactive features reduced comprehension by about 7–8 percentile points compared to traditional reading.
For a parent or teacher: A well-designed e-book with animations and sound effects (but no clickable games or pop-up dictionaries) might help a child learn slightly more new words and understand the story somewhat better than a print book read aloud. But an e-book with lots of interactive bells and whistles could actually make learning worse than just reading the print book.
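The percentile and "how often" interpretations in this section can be reproduced with two standard conversions, sketched below under the assumption of normally distributed scores with equal variance in both groups. Cohen's U3 gives the control-group percentile of the average treated child, and the probability of superiority gives the chance that a randomly chosen treated child outscores a randomly chosen control child. The function names are illustrative.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def u3_percentile(g):
    """Control-group percentile at which the average treated child falls."""
    return 100 * normal_cdf(g)

def prob_superiority(g):
    """Chance a random treated child outscores a random control child."""
    return normal_cdf(g / math.sqrt(2))

# Effect sizes as reported in this summary
for label, g in [("overall comprehension", 0.17),
                 ("multimedia-only comprehension", 0.34),
                 ("interactive-only comprehension", -0.19)]:
    print(f"{label}: percentile = {u3_percentile(g):.1f}, "
          f"P(superiority) = {prob_superiority(g):.3f}")
```

The two measures answer different questions, so they give different numbers for the same g: the percentile shift is always somewhat larger than the shift in the probability of superiority.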
Limitations
**Acknowledged by authors:**
High heterogeneity across studies — the "average" effect masks wide variation; some studies showed large benefits, others showed no effect or harm
Most studies measured outcomes immediately after reading; long-term retention was rarely assessed
The definition of "interactive features" was broad and may conflate different types of interactivity (e.g., a simple hotspot that plays a sound vs. a complex vocabulary game)
Few studies examined child-level moderators like attention span or prior technology experience
Publication bias may have inflated the vocabulary effects (small negative studies may be missing)
**Critical reader observations:**
**Duration of exposure:** Most studies involved only 1–3 reading sessions. We don't know if effects accumulate or fade with repeated use over weeks or months
**Control condition variability:** Some control groups had adult-read print books (active control), others had no intervention (passive control). The effect sizes differed by control type, but the meta-analysis pooled them
**Commercial vs. researcher-designed e-books:** Many studies used researcher-designed e-books optimised for learning, not commercial apps. Commercial children's apps often have more distracting features and lower educational quality
**Age range is wide (3–8):** A 3-year-old's cognitive abilities and reading readiness are very different from an 8-year-old's. The lack of age moderation may reflect insufficient statistical power rather than true equivalence
**No measure of engagement or enjoyment:** The meta-analysis cannot tell us whether children found the technology-enhanced stories more engaging — only whether they learned more
**Technology changes rapidly:** The studies span 1990–2013, covering CD-ROMs, early e-books, and early tablet apps. Modern touchscreen interfaces and adaptive features may produce different effects
**Socioeconomic status was often measured crudely** (e.g., free/reduced lunch status), which may not capture the full complexity of home literacy environment
Practical takeaways
For someone running their own n=1 experiment with a child (or a small group):
### What to test
**Intervention:** Use a technology-enhanced storybook that includes animated pictures, background music, and sound effects, but **no** clickable hotspots, embedded games, pop-up dictionaries, or other interactive features that require the child to stop the story flow. Ideal: a "read-to-me" mode with word highlighting and page animations, but no tap-to-play elements.
**Dose:** One story per day, read aloud via the device (with headphones or speakers), for 10–15 minutes per session.
**Comparator:** The same story in print format, read aloud by an adult (you), for the same duration.
### Minimum meaningful duration
**At least 2 weeks (10–14 sessions)** to see measurable vocabulary gains. Single-session effects are unreliable.
**4–6 weeks** is better for comprehension and retention. The meta-analysis found effects after short exposures, but longer-term retention is unknown.
### What to measure
**Primary metric:** Number of new words the child can produce (expressive vocabulary) from a list of 10–15 target words in the story. Test before and after the intervention period.
**Secondary metric:** Story comprehension — ask 5–10 questions about plot, characters, and cause-effect relationships (e.g., "Why did the bear hide?"). Score as percentage correct.
**Optional:** Receptive vocabulary — show the child four pictures and ask them to point to the target word. This is less sensitive to change but easier to measure.
**Process measure:** Track how often the child interacts with any interactive features (if present). Note whether they click on hotspots or skip them.
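The two main metrics above are simple enough to score by hand, but a small script keeps the scoring rule consistent across sessions. This is a minimal sketch; the function names, the word list, and the answers are hypothetical, and it counts only words the child did not already produce at pre-test.

```python
def vocabulary_gain(target_words, known_before, known_after):
    """Return (words learned, novel words) for one intervention period."""
    # Words already known at pre-test cannot count as learned
    novel = [w for w in target_words if w not in known_before]
    learned = [w for w in novel if w in known_after]
    return len(learned), len(novel)

def comprehension_score(answers_correct):
    """Percentage of comprehension questions answered correctly."""
    return 100 * sum(answers_correct) / len(answers_correct)

# Hypothetical data for one story
targets = ["burrow", "gnaw", "timid", "scurry", "meadow"]
learned, novel = vocabulary_gain(
    targets,
    known_before={"meadow"},
    known_after={"meadow", "burrow", "timid"},
)
print(f"{learned}/{novel} novel words learned")  # prints "2/4 novel words learned"
print(f"{comprehension_score([True, True, False, True, False]):.0f}% comprehension")
```

Recording the same two numbers for each condition makes the later comparison between the technology and print conditions straightforward.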
### Key confounds to control for
**Adult involvement:** In the print condition, you are reading aloud. In the technology condition, the device reads aloud. This confounds medium with social interaction. To isolate the technology effect, either (a) have the device read in both conditions (print book + audio recording vs. e-book), or (b) have an adult read in both conditions (print book vs. e-book with adult reading). The pooled studies mixed adult-read and other control conditions, so for your n=1, keep adult involvement constant.
**Story content:** Use the exact same story in both conditions (same text, same illustrations, just different medium). Otherwise, differences in story difficulty will confound results.
**Time of day:** Read at the same time each day to control for fatigue and attention.
**Child's prior familiarity:** Pre-test the child on the target vocabulary to ensure words are novel. Exclude words the child already knows.
**Device type:** Use the same device (tablet, computer) for all technology sessions. Screen size, brightness, and audio quality matter.
**Distractions:** Conduct sessions in the same quiet room, with no other screens or toys present.
### What a positive result would look like
**Vocabulary:** The child learns 2–4 more new words from the technology-enhanced story than from the print story over 2 weeks (e.g., 6/10 words learned in technology condition vs. 3/10 in print condition).
**Comprehension:** The child scores 10–20% higher on comprehension questions after the technology condition (e.g., 70% correct vs. 55% correct).
**Caution:** If the child is highly distracted by interactive features (tapping, clicking, playing games), you may see **worse