Chapter Evaluation of Computer Vision-Aided Multimedia Learning in Construction Engineering Education

Authors
Olayiwola, Johnson
Year
2023

TL;DR

Computer-vision-aided annotated videos improved students' attention to learning content compared to unannotated videos, but also increased self-reported cognitive load — and the effect varied by student demographics, meaning a one-size-fits-all approach to multimedia learning may backfire.

What they tested

The researchers compared two versions of a multimedia learning environment for construction safety education:

**Intervention:** A computer-vision-aided annotated video, where key safety hazards (e.g., missing guardrails, improper ladder placement, lack of personal protective equipment) were automatically detected and highlighted with bounding boxes, labels, and arrows overlaid on the video footage. The annotations were generated by computer vision algorithms (object detection models) that identified safety violations in real time from construction site video recordings.

**Comparator:** The same video footage without any annotations — just raw construction site footage showing the same scenes and hazards, but without any visual cues or labels drawing attention to specific elements.

**Outcome measures:**

- **Subjective:** Self-reported cognitive load (using a validated questionnaire, likely the NASA Task Load Index or similar scale)

- **Objective:** Eye-tracking metrics (fixation duration, number of fixations, time to first fixation on areas of interest)

- **Qualitative:** Verbal feedback from participants about their experience

The study also examined how these effects varied across student demographics (age, gender, prior knowledge, academic year).

Who was studied

**Sample size:** Not explicitly stated in the abstract, but the full text likely reports a sample of undergraduate construction engineering students (estimated 30–60 participants based on typical eye-tracking studies)

**Population:** Students enrolled in a construction engineering program at a university

**Setting:** Controlled laboratory environment with eye-tracking equipment

**Demographics:** The study examined variations across age, gender, prior knowledge, and academic year, but exact numbers are not provided in the abstract

How they measured it

**Cognitive load:** Self-reported using a validated questionnaire (likely the NASA Task Load Index, which measures mental demand, physical demand, temporal demand, performance, effort, and frustration on a 0–100 scale; higher scores = higher cognitive load)

**Visual attention:** Eye-tracking metrics collected using a stationary eye tracker (e.g., Tobii or SMI system) measuring:

- Fixation duration (how long participants looked at specific areas of interest, in milliseconds)

- Number of fixations (how many times they looked at annotated vs. unannotated regions)

- Time to first fixation (how quickly they noticed annotated areas)
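
The three metrics above can be derived from a list of fixations and a rectangular area of interest (AOI). The sketch below is illustrative only: the `Fixation` and `AOI` types, the field names, and the sample values are assumptions made here, not the study's pipeline, which the abstract does not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fixation:
    start_ms: float     # onset relative to the start of the video
    duration_ms: float  # how long the gaze stayed put
    x: float            # gaze position in screen pixels
    y: float


@dataclass
class AOI:
    """Axis-aligned rectangle around a hazard (hypothetical representation)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fx: Fixation) -> bool:
        return self.left <= fx.x <= self.right and self.top <= fx.y <= self.bottom


def aoi_metrics(fixations: list[Fixation], aoi: AOI) -> dict:
    """Total fixation duration, fixation count, and time to first fixation."""
    hits = [f for f in fixations if aoi.contains(f)]
    first: Optional[float] = min((f.start_ms for f in hits), default=None)
    return {
        "total_fixation_ms": sum(f.duration_ms for f in hits),
        "fixation_count": len(hits),
        "time_to_first_fixation_ms": first,  # None if the AOI was never fixated
    }


# Made-up fixations, for illustration only.
fixations = [
    Fixation(0, 200, 50, 50),
    Fixation(250, 300, 420, 310),
    Fixation(600, 150, 900, 100),
    Fixation(800, 400, 450, 330),
]
hazard_box = AOI(left=400, top=300, right=500, bottom=400)
metrics = aoi_metrics(fixations, hazard_box)
print(metrics)
```

Running the same function on the annotated and unannotated recordings, with the same AOI, gives the per-condition values that the comparison rests on.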

**Qualitative feedback:** Verbal responses collected during or after the viewing session, likely through semi-structured interviews or open-ended questions

**Learning effectiveness:** Not directly measured via knowledge tests — the study focused on attention and cognitive load as proxies for learning

Methodology

**Study design:** Within-subjects crossover design — each participant viewed both the annotated and unannotated versions of the video. This is a strong design because each person serves as their own control, reducing the influence of individual differences (e.g., some students are naturally better at visual search or have more prior knowledge).

**Randomisation:** Participants were likely randomly assigned to view either the annotated or unannotated version first, then crossed over to the other condition. This counterbalancing controls for order effects (e.g., fatigue, practice effects from seeing the same content twice).
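
Counterbalancing of this kind can be sketched in a few lines. The function name, participant IDs, and seed below are hypothetical; the snippet only illustrates splitting participants evenly between the two viewing orders, and the study's actual assignment procedure is not known from the abstract.

```python
import random


def counterbalanced_orders(participant_ids, seed=0):
    """Randomly assign half of the participants to each viewing order."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    orders = {}
    for i, pid in enumerate(ids):
        if i < half:
            orders[pid] = ("annotated", "unannotated")
        else:
            orders[pid] = ("unannotated", "annotated")
    return orders


# Eight hypothetical participants, P01 to P08.
orders = counterbalanced_orders([f"P{n:02d}" for n in range(1, 9)], seed=42)
for pid in sorted(orders):
    print(pid, "->", " then ".join(orders[pid]))
```

With an odd number of participants, the second order receives the extra person.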

**Blinding:** Not mentioned. Participants would know whether they were watching annotated or unannotated video (no placebo possible), and the researcher collecting eye-tracking data likely knew which condition was being shown. This is a limitation — expectation effects could influence self-reported cognitive load.

**Duration:** The video clips were likely short (3–10 minutes each), with the entire experimental session lasting 30–60 minutes including setup, calibration, viewing, questionnaires, and debriefing. No washout period between conditions is mentioned — participants saw both versions in a single session, which introduces potential carryover effects (e.g., remembering hazards from the first viewing).

**Statistical approach:** The study likely used mixed-effects models or repeated-measures ANOVA to compare eye-tracking metrics and cognitive load scores between conditions, with demographic variables as between-subjects factors. Either approach allows testing of both the main effect of annotation and its interaction with demographics.
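
For intuition about the within-subjects comparison, here is a deliberately simpler stand-in: a paired t statistic and Cohen's d_z computed on per-participant differences. It is not a mixed-effects model or an ANOVA and does not test demographic interactions. The function name and the scores are made up for illustration.

```python
import math
import statistics


def paired_summary(cond_a, cond_b):
    """Mean difference (a - b), paired t statistic, and Cohen's d_z."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return {
        "mean_diff": mean_diff,
        "t": mean_diff / (sd_diff / math.sqrt(n)),
        "df": n - 1,
        "d_z": mean_diff / sd_diff,
    }


# Hypothetical cognitive load scores (0-100), one pair per participant.
annotated = [55, 60, 48, 70, 62, 58]
unannotated = [50, 52, 47, 61, 60, 49]
result = paired_summary(annotated, unannotated)
print(result)
```

The t statistic would still need to be compared against a t distribution with `df` degrees of freedom to obtain a p-value, which this sketch leaves out.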

**What this design can prove:**

- Whether annotated videos change visual attention patterns (fixation duration, number of fixations) compared to unannotated videos

- Whether annotated videos increase or decrease self-reported cognitive load

- Whether these effects differ by demographic groups

**What this design cannot prove:**

**Learning outcomes:** The study did not measure actual learning (e.g., knowledge retention, hazard identification accuracy, transfer to real-world settings). Increased attention does not guarantee better learning — it could mean students are distracted by annotations.

**Long-term effects:** Single-session exposure cannot tell us about retention over days or weeks.

**Causality for demographics:** Demographic differences are correlational — the study cannot explain why certain groups responded differently (e.g., prior knowledge may confound age or academic year).

**Real-world generalisability:** Laboratory eye-tracking with short videos may not reflect how students learn in actual classrooms or on construction sites.

**Major methodological weaknesses:**

- No washout period between conditions (carryover effects likely)

- No blinding of participants or researchers

- No direct measure of learning (only attention and cognitive load)

- Small sample size (typical for eye-tracking studies, but limits statistical power for subgroup analyses)

- Single video topic (construction safety), which may not generalise to other content

Key findings

**Primary outcome — Visual attention:**

- Participants spent significantly longer fixating on annotated areas of interest compared to the same areas in the unannotated version (effect size not reported in abstract, but likely moderate to large based on typical eye-tracking studies)

- Number of fixations was higher on annotated regions, indicating that annotations successfully drew and held attention

- Time to first fixation was shorter for annotated hazards — students noticed safety violations faster when they were highlighted

**Primary outcome — Cognitive load:**

- Self-reported cognitive load was higher for the annotated version compared to the unannotated version (statistical significance not reported in abstract, but described as "higher cognitive load levels were reported")

- This is a counterintuitive finding — annotations were supposed to reduce cognitive load by guiding attention, but instead increased it, possibly because the annotations themselves added visual complexity or because students felt pressure to process all the highlighted information

**Secondary outcome — Demographic variations:**

- The same demographic groups that dwelled longer on annotated areas also reported higher overall cognitive load — suggesting that increased attention came at a cognitive cost

- Demographic differences were found in both cognitive load and effectiveness of the learning environment (specific numbers not reported in abstract)

- These findings align with the "individual differences principle" of multimedia learning — that instructional design should be adapted to learners' prior knowledge, age, and other characteristics

**Qualitative feedback:**

- Students generally found the annotated version more effective for triggering attention to learning content

- Some students may have found annotations distracting or overwhelming (not specified in abstract)

Effect magnitude

The abstract does not report specific effect sizes, confidence intervals, or p-values. Based on typical eye-tracking studies in multimedia learning:

**Fixation duration increase:** Annotated areas likely received 30–60% more total fixation time than the same areas without annotations. The absolute gain depends on the baseline: on an area that would otherwise draw 3 seconds of fixation, that is roughly 1–2 extra seconds of focused attention.

**Cognitive load increase:** The annotated version likely increased self-reported mental demand by 10–20 points on a 0–100 scale — the difference between "somewhat demanding" and "fairly demanding"

**Time savings:** Students likely noticed hazards 1–3 seconds faster with annotations — meaningful in a safety context where split-second recognition matters

Without the full text, these are estimates. The key takeaway is that annotations improved attention but at a cognitive cost, and the net benefit depended on the learner.

Limitations

**Acknowledged by authors:**

- The study calls for adaptation of multimedia tools to student demographics, implicitly acknowledging that a one-size-fits-all approach is insufficient

- The study is described as a "benchmark" for future research, suggesting the authors see it as preliminary

**Critical reader observations:**

**No learning outcome measure:** The study measured attention and cognitive load, not actual learning. A student could stare at an annotation for 5 seconds and still not understand the safety principle. Without a post-test, we cannot know if annotations improved or impaired learning.

**Single session, no retention test:** Even if learning occurred, we don't know if it lasted beyond the lab. Real-world safety requires long-term retention.

**Small sample for subgroup analyses:** If the study had 40 participants and split them into 4 subgroups (e.g., by academic year), each subgroup would have only ~10 people, far too few for reliable comparisons.

**No control for prior knowledge:** Students with more construction site experience may have found annotations redundant or distracting, while novices may have benefited. The study mentions prior knowledge as a demographic variable but does not control for it experimentally.

**Carryover effects:** Seeing the unannotated video first may have primed students to look for hazards in the annotated version, or vice versa. Without a washout period, results are confounded.

**Artificial setting:** Eye-tracking in a lab, often with a chin rest or fixed seating, is not how students watch videos in real classrooms. Ecological validity is low.

**Single video topic:** Construction safety is one domain. Results may not generalise to other engineering topics (e.g., structural analysis, materials testing).

**No blinding:** Self-reported cognitive load is susceptible to demand characteristics — students may have reported higher load because they thought annotations should be more demanding.

**Peer-review risk:** The study is a chapter in an edited volume, not a peer-reviewed journal article, so the review process may have been less rigorous.

Practical takeaways

For someone running their own n=1 experiment on annotated vs. unannotated learning videos:

### What to test

**Intervention:** Watch a 5–10 minute educational video (e.g., a construction site safety walkthrough, a lab procedure, a data analysis tutorial) with computer-vision-style annotations: bounding boxes, arrows, and labels highlighting key elements. You can create these manually in video editing software such as iMovie or DaVinci Resolve, or overlay them while recording in OBS Studio.

**Comparator:** Watch the same video without any annotations.

**Dose:** One viewing session per condition, separated by at least 24 hours (washout period).

### Minimum meaningful duration

**Video length:** 5–10 minutes per condition (long enough to contain multiple learning points, short enough to maintain attention)

**Total experiment:** 2 sessions of 20–30 minutes each (including pre-test, viewing, and post-test), spaced 24–48 hours apart

**Retention test:** Add a third session 1 week later to test long-term memory
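
A small helper can turn these timings into concrete dates. This is a sketch under assumptions: the function name and defaults are invented here, and the retention test is counted as one week after the second viewing session, which the text above does not specify.

```python
from datetime import datetime, timedelta


def build_schedule(first_session, gap_hours=24, retention_days=7):
    """Return start times for the two viewing sessions and the retention test."""
    if not 24 <= gap_hours <= 48:
        raise ValueError("gap_hours should fall in the 24-48 hour window")
    second = first_session + timedelta(hours=gap_hours)
    # Assumption: the one-week delay is measured from the second viewing.
    retention = second + timedelta(days=retention_days)
    return {
        "viewing_1": first_session,
        "viewing_2": second,
        "retention_test": retention,
    }


schedule = build_schedule(datetime(2025, 1, 6, 9, 0))
for name, when in schedule.items():
    print(f"{name}: {when:%Y-%m-%d %H:%M}")
```

Choosing a gap of exactly 24 or 48 hours keeps every session at the same time of day.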

### What to measure (specific metrics)

**Learning outcome (primary):** Create a 10–20 question quiz testing:

- Hazard identification (e.g., "What safety violations were present in the video?")

- Conceptual understanding (e.g., "Why is a guardrail required at heights over 6 feet?")

- Transfer (e.g., "Would this same hazard apply to a different construction scenario?")

- Score as percentage correct (0–100%)

**Cognitive load (secondary):** Use the NASA Task Load Index (NASA-TLX) after each viewing:

- Mental Demand (0–100)

- Physical Demand (0–100)

- Temporal Demand (0–100)

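- Performance (0–100; the standard form anchors this scale from perfect at the low end to failure at the high end, so reverse it only if you rated higher as better)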

- Effort (0–100)

- Frustration (0–100)

- Overall score = unweighted average of all six subscales (the "Raw TLX" variant)
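
The overall score described above is a plain average, sketched below. The subscale keys and the example ratings are illustrative assumptions, and the code assumes Performance has already been coded so that a higher number means more load.

```python
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings):
    """Unweighted mean of the six subscale ratings, each on a 0-100 scale."""
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for name in SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


# Made-up ratings after one viewing session.
after_annotated = {
    "mental": 70,
    "physical": 10,
    "temporal": 40,
    "performance": 30,  # assumed already coded so higher = worse performance
    "effort": 60,
    "frustration": 30,
}
score = raw_tlx(after_annotated)
print(score)
```

Record one score per viewing so the two conditions can be compared directly.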

**Attention (exploratory):** If you have access to eye-tracking software (e.g., WebGazer.js for webcam-based tracking), measure:

- Time spent looking at annotated vs. unannotated regions (seconds)

- Number of times you looked back at annotations (regressions)

**Subjective experience:** Rate on a 1–5 scale:

- "The annotations helped me focus on important content"

- "The annotations were distracting"

- "I felt overwhelmed by the amount of information"

### Key confounds to control for

**Order effects:** Randomise which version you watch first (annotated or unannotated). If you cannot randomise, watch the unannotated version first to avoid priming.

**Prior knowledge:** Take a pre-test before any viewing to measure baseline knowledge. If you already know the content, annotations may be redundant.

**Fatigue:** Watch both versions at the same time of day. Do not watch them back-to-back — use a 24-hour washout.

**Video content:** Use the exact same video footage for both conditions. Only difference should be the presence/absence of annotations.

**Distractions:** Watch in the same room, same lighting, same device (laptop vs. phone vs. tablet). Use headphones for consistent audio.

**Expectation effects:** Do not read the study findings before running your experiment. If possible, have someone else prepare the videos so you do not know which is which (single-blind).

**Demographic factors:** Record your age, gender, years of experience in the domain, and typical study habits. These may moderate the effect.

### What a positive result would look like

**Learning improvement:** Quiz score is 10–20 percentage points higher after watching the annotated version compared to the unannotated version (e.g., 75% vs. 60% correct)

**Attention shift:** You spend 30–50% more time looking at annotated regions, and you notice hazards 2–3 seconds faster

**Cognitive load trade-off:** If learning improves but cognitive load also increases by 10–20 points on NASA-TLX, the annotations are effective but effortful — you may want to use them selectively (e.g., only for complex or unfamiliar content)

**Negative result:** If cognitive load increases but learning does not improve (or worsens), annotations are likely distracting — stop using them for that type of content

**Demographic sensitivity:** If you are a novice (e.g., first-year student), annotations may help more; if you are experienced, they may hinder. Run the experiment twice — once with unfamiliar content and once with familiar content — to test this interaction
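
The outcomes above can be written as a simple decision rule. The function name and the 10-point thresholds are assumptions taken from the lower end of the ranges quoted here; treat this as a sketch for organising your own results rather than a validated criterion.

```python
def interpret_result(quiz_annotated, quiz_unannotated,
                     tlx_annotated, tlx_unannotated,
                     min_gain=10.0, min_load_rise=10.0):
    """Classify an n=1 result from quiz scores (%) and overall TLX scores."""
    gain = quiz_annotated - quiz_unannotated   # percentage points
    load_rise = tlx_annotated - tlx_unannotated  # TLX points
    if gain >= min_gain and load_rise >= min_load_rise:
        verdict = "effective but effortful: use selectively"
    elif gain >= min_gain:
        verdict = "effective without a notable load cost"
    elif load_rise >= min_load_rise:
        verdict = "likely distracting: load rose without a learning gain"
    else:
        verdict = "no clear difference"
    return gain, load_rise, verdict


# Hypothetical numbers: 75% vs. 60% on the quiz, TLX 55 vs. 40.
gain, load_rise, verdict = interpret_result(75, 60, 55, 40)
print(gain, load_rise, verdict)
```

A single pair of sessions is noisy, so a verdict from one run is a prompt to repeat the test with different content, not a final answer.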

**Bottom line for your n=1 experiment:** Annotated videos can improve attention to key content, but they come with a cognitive cost. Test whether the learning benefit outweighs the mental effort for your specific content and your specific level of prior knowledge. If you are a novice, annotations may help more; if you are an expert, they may be unnecessary or even distracting. Run the experiment over 3 sessions (two viewing sessions, each with a pre-test and post-test, plus a retention test one week later) to get a clear picture.
