Chapter Prediction of Cognitive Load during Industry-Academia Collaboration via a Web Platform

Authors: Murzi, Homero
Year: 2023

TL;DR

Researchers used a 5-channel EEG headset and a Long Short-Term Memory (LSTM) recurrent neural network to predict cognitive load in 19 people using a web platform. The model forecast the EEG signal with high reported accuracy (R² = 0.89), but the study is too small and preliminary to support any personal experiment recommendations.

What they tested

The researchers tested whether a specific type of artificial neural network (Long Short-Term Memory, or LSTM) could predict a person's cognitive load — measured via EEG brain signals — while they used a web platform designed to connect construction industry professionals with academics. The intervention was the web platform itself, with no comparator condition (no control group, no alternative platform, no baseline task). The outcome measures were:

**Primary outcome:** Accuracy of the LSTM model in predicting EEG signals, measured by root mean square error (RMSE) and coefficient of determination (R²).

**Secondary outcome:** The relationship between predicted EEG signals and self-reported mental workload (measured via the NASA-TLX scale, though the paper does not report specific NASA-TLX results).

Who was studied

**Sample size:** 19 participants

**Population:** Potential end-users of a web platform designed for industry-academia collaboration in the construction sector. No further demographic details (age, gender, education, profession) are reported.

**Setting:** A "real case scenario" — participants interacted with the web platform in an unspecified environment. It is unclear whether this was a lab, office, or home setting.

**Key limitation:** The sample is extremely small (n=19) for training a neural network, and no information is given about participant characteristics, making it impossible to know who this applies to.

How they measured it

**Cognitive load (physiological):** A 5-channel EEG device (likely a consumer-grade or research-grade portable headset, though the specific model is not named) recorded brain signals from five electrode locations on the scalp. EEG measures electrical activity from neurons firing, and specific frequency bands (theta, alpha, beta) are associated with mental workload. The researchers did not report which frequency bands they analysed or how they defined "cognitive load" from the raw EEG signal.

**Mental workload (self-report):** The NASA Task Load Index (NASA-TLX), a validated 6-item questionnaire (mental demand, physical demand, temporal demand, performance, effort, frustration) scored 0–100 per item, with higher scores indicating greater workload. However, the paper does not report any NASA-TLX results — it only mentions that the scale was used.

**Model performance metrics:** Root mean square error (RMSE) measures the typical size of the prediction error in the same units as the EEG signal; lower is better. The coefficient of determination (R²) is at most 1, with 1 meaning perfect prediction and 0 meaning no better than always predicting the mean; on held-out data it can be negative. The paper reports RMSE = 0.021 and R² = 0.89 for the best model, but these numbers must be interpreted with caution (see Limitations).
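As a minimal sketch of how these two metrics are computed, the snippet below uses their textbook definitions on made-up numbers. The arrays `actual`, `good`, and `bad` are hypothetical illustrations, not data from the paper, and the paper's own implementation is not described in this summary.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, in the same units as the inputs."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
good = [1.1, 1.9, 3.2, 3.8]   # close to the true values
bad = [4.0, 3.0, 2.0, 1.0]    # reversed, worse than guessing the mean

print(rmse(actual, good), r_squared(actual, good))
print(r_squared(actual, bad))  # negative: worse than predicting the mean
```

Because R² is defined relative to a predict-the-mean reference, a value near 1 only says the model beats that reference by a wide margin. It says nothing about whether a stronger simple baseline would do just as well.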

Methodology

**Study design:** This is not a controlled experiment. It is a proof-of-concept machine learning study. The design is:

**Single-group observational:** All 19 participants performed the same task (using the web platform) while their EEG was recorded.

**No randomisation:** There was no random assignment to conditions because there was only one condition.

**No blinding:** Both participants and researchers knew the purpose of the study. Blinding is irrelevant here because there is no treatment to blind.

**No control group:** There was no baseline task (e.g., resting state, low-load task, or a different platform) to compare cognitive load levels.

**Duration:** Not reported. Participants interacted with the web platform for an unspecified amount of time. The EEG recording duration is also not stated.

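**Data split:** The researchers split the EEG data into training (70%), validation (15%), and test (15%) sets. This is a standard split for machine learning, but what it tests depends on the unit being split. If whole participants were held out, the test set covers only about 3 of the 19 people (15% of 19 ≈ 2.9). If recording segments were split instead, as the rest of this summary assumes, every participant contributes to both training and test data.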

**Statistical approach:** The researchers trained an LSTM (a type of recurrent neural network designed for time-series data) to predict future EEG signals based on past EEG signals. They compared the model's predictions to the actual recorded EEG signals using RMSE and R². They also compared predicted mental workload (derived from the predicted EEG) to actual self-reported NASA-TLX scores, but no statistical tests (p-values, confidence intervals) are reported for this comparison.
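Predicting future EEG samples from past ones is usually framed as a sliding-window problem: each training example pairs a stretch of past samples with the samples that follow it. The sketch below shows that framing only. The function name `make_windows`, the window lengths, and the sample values are assumptions for illustration; the paper's actual window size and forecast horizon are not reported in this summary.

```python
def make_windows(signal, n_past, n_future=1):
    """Split a 1-D series into (past, future) pairs for sequence prediction."""
    pairs = []
    last_start = len(signal) - n_past - n_future
    for start in range(last_start + 1):
        past = signal[start:start + n_past]
        future = signal[start + n_past:start + n_past + n_future]
        pairs.append((past, future))
    return pairs

# Hypothetical readings from one channel.
samples = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
pairs = make_windows(samples, n_past=3)

for past, future in pairs:
    print(past, "->", future)
```

Consecutive windows overlap heavily. If such windows are shuffled into training and test sets at random, the test windows share most of their samples with training windows, which can inflate test accuracy.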

**What this design can prove:**

That an LSTM model can learn patterns in EEG data from a small group of people using a specific web platform.

That the model can generate EEG-like signals that resemble real recordings (low RMSE, high R² on the test set).

**What this design cannot prove:**

That the model predicts cognitive load in new users (the test set came from the same 19 people as the training set — this is not external validation).

That the web platform causes high or low cognitive load (no comparator condition).

That the model works for different web platforms, different tasks, or different populations.

That EEG-based cognitive load prediction is better than simpler methods (e.g., self-report, task performance metrics).

**Major methodological weaknesses:**

**Extremely small sample (n=19)** for training a neural network, which typically requires hundreds or thousands of participants to generalise.

**No external validation:** The model was tested on a subset of the same 19 people. It is unknown whether it would work for a new person.

**No baseline or control condition:** Without a low-load or high-load reference task, we cannot interpret what the predicted cognitive load values mean.

**Missing details:** EEG electrode locations, frequency bands, recording duration, participant demographics, and NASA-TLX results are not reported.

**Potential overfitting:** An R² of 0.89 on a test set from only 19 participants is suspiciously high and may indicate that the model memorised patterns specific to these individuals rather than learning generalisable features.
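One common way to address the external-validation weakness is to hold out whole participants rather than segments of everyone's recording. The sketch below is an assumption about how such a split could be done, not a description of the paper's procedure; `split_by_participant` and the integer participant IDs are hypothetical.

```python
import random

def split_by_participant(participant_ids, train_frac=0.70, val_frac=0.15, seed=0):
    """Assign each participant to exactly one of train, val, or test."""
    ids = sorted(set(participant_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(train_frac * len(ids))
    n_val = round(val_frac * len(ids))
    return {
        "train": set(ids[:n_train]),
        "val": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),
    }

# 19 hypothetical participants numbered 1 to 19.
groups = split_by_participant(range(1, 20))
print({name: len(members) for name, members in groups.items()})
```

With 19 participants, a 70/15/15 split by person leaves 13 for training and 3 each for validation and testing, so any test metric rests on very few individuals. Leave-one-participant-out cross-validation is a frequent alternative at this sample size.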

Key findings

The LSTM model achieved a root mean square error (RMSE) of 0.021 on the test set. The units are not specified (likely microvolts, the standard unit for EEG amplitude).

The model achieved a coefficient of determination (R²) of 0.89 on the test set, meaning the model explained 89% of the variance in the actual EEG signals.

The paper claims the model could predict mental workload (via NASA-TLX) from the predicted EEG signals, but no specific numbers (correlation, p-value, or error) are reported for this comparison.

The paper itself does not distinguish primary from secondary outcomes, because this is not a hypothesis-testing study; the labels under "What they tested" are this summary's own.

**Important caveat:** These performance metrics are reported without confidence intervals, without comparison to a baseline model (e.g., a simple average predictor), and without external validation. On its own, an R² of 0.89 on a test set drawn from 19 people does not show that the model is ready for real-world deployment.
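To see why a baseline comparison matters, consider a smooth, strongly autocorrelated signal. If the model forecasts one sample ahead (the horizon is not stated here), a "persistence" baseline that simply repeats the last observed value can already score a high R². The sketch below demonstrates this on a synthetic AR(1) series. The series, the coefficient `phi=0.97`, and the function names are assumptions for illustration and are not EEG data.

```python
import random

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def simulate_ar1(n, phi, seed=0):
    """Synthetic autocorrelated series: x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series

signal = simulate_ar1(n=5000, phi=0.97)
actual = signal[1:]

# Baseline 1: predict that the next sample equals the current one.
persistence = signal[:-1]
# Baseline 2: always predict the overall mean.
mean_only = [sum(actual) / len(actual)] * len(actual)

r2_persistence = r_squared(actual, persistence)
r2_mean = r_squared(actual, mean_only)
print(r2_persistence, r2_mean)
```

The mean-only baseline scores an R² of zero by construction, while persistence scores far higher without learning anything. A reported R² is only informative once it is set against baselines like these on the same test data.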

Effect magnitude

The paper does not report effect sizes in the traditional sense (Cohen's d, mean differences, etc.) because there is no comparison between conditions. The "effect" is the model's prediction accuracy:

An RMSE of 0.021 means the model's predictions were, on average, off by about 0.021 units (likely microvolts). To put this in context, a typical EEG signal ranges from about -100 to +100 microvolts, so an error of 0.021 microvolts is very small — almost too small to be believable for a 5-channel consumer-grade EEG device, which typically has noise levels of 1–5 microvolts.

An R² of 0.89 means the model captured 89% of the variation in the EEG signal. For comparison, state-of-the-art EEG prediction models on large datasets (hundreds of participants) typically achieve R² values of 0.4–0.7. An R² of 0.89 on 19 participants is a red flag for overfitting.

**In plain English:** The model appears to predict brain signals with very high accuracy, but the numbers are suspiciously good given the tiny sample size and consumer-grade hardware. A more realistic interpretation is that the model learned the specific noise patterns of these 19 individuals rather than general features of cognitive load.
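The comparison above is simple arithmetic, sketched here under two readings of the unspecified units. The first takes the RMSE as microvolts, as assumed above. The second is an assumption not stated in the source: that the signals were scaled to a 0 to 1 range before training, a common preprocessing step, so the error is a fraction of the full amplitude range.

```python
RMSE = 0.021                   # value reported in the paper, units unspecified
AMPLITUDE_RANGE_UV = 200.0     # -100 to +100 microvolts, as quoted above
NOISE_FLOOR_UV = (1.0, 5.0)    # typical noise range quoted above

# Reading 1: the RMSE is in microvolts.
share_of_range = RMSE / AMPLITUDE_RANGE_UV
ratio_to_noise = [RMSE / noise for noise in NOISE_FLOOR_UV]

# Reading 2 (assumption): signals were min-max scaled to [0, 1],
# so the RMSE is a fraction of the full amplitude range.
implied_error_uv = RMSE * AMPLITUDE_RANGE_UV

print(f"Reading 1: {share_of_range:.4%} of the range")
print(f"Reading 1: {ratio_to_noise[0]:.3f}x to {ratio_to_noise[1]:.4f}x the noise floor")
print(f"Reading 2: about {implied_error_uv:.1f} microvolts")
```

Under the first reading the error sits far below the device's own noise, which is implausible. Under the second it is comparable to the noise floor, which is plausible but cannot be confirmed without the paper's preprocessing details.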

Limitations

**Acknowledged by authors (inferred from abstract):**

The study is presented as a proof-of-concept, not a definitive validation.

The authors note that the model could be used to "understand users' cognitive demand" but do not claim it is ready for deployment.

**Critical reader observations:**

**Sample size:** 19 participants is very few for training a neural network that is meant to work on new people. There is no universal minimum, but models intended to generalise across individuals are usually trained on far more participants. With 19 people, the model likely memorised individual-specific EEG patterns.

**No external validation:** The test set came from the same 19 people. The model has never been tested on a new person. This is the most critical limitation — the reported accuracy is meaningless for real-world use.

**No baseline task:** Without a low-load condition (e.g., resting with eyes closed) or a high-load condition (e.g., solving complex math problems), we cannot interpret what "cognitive load" means in this context. The model might be predicting general brain activity, not cognitive load specifically.

**Low-channel EEG hardware:** A 5-channel EEG device, whatever its grade (the model is not named), has limited spatial resolution and is prone to noise from muscle movement, eye blinks, and environmental interference. The paper does not report how the EEG data were cleaned (e.g., artefact removal, filtering).

**Missing demographic data:** Without knowing the age, profession, or tech-savviness of participants, we cannot assess generalisability. Construction industry professionals and academics may have very different cognitive responses to a web platform.

**No statistical tests:** The paper reports RMSE and R² but no confidence intervals, p-values, or cross-validation results. This makes it impossible to assess the reliability of the findings.

**Publication venue:** The paper appears to be a book chapter or conference proceeding, not a peer-reviewed journal article. The quality of peer review is unknown.

**Industry funding:** Not explicitly stated, but the topic (industry-academia collaboration platform) suggests potential conflict of interest. The platform itself may have been developed by the researchers or their collaborators.

Practical takeaways

**For someone running their own n=1 experiment:**

**What to test:**

**Do not test this specific model.** The study is too preliminary and the model is not validated for individual use. The LSTM approach requires training on your own EEG data, which is impractical without a lab-grade EEG setup and machine learning expertise.

**Instead, test a simpler cognitive load tracking method:** Use a self-report scale that takes about 2 minutes (e.g., the NASA-TLX or the simpler Single Ease Question) immediately after using a web platform or software tool. If you use a single item, rate your mental effort on a 1–7 scale.
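If you choose the NASA-TLX, one widely used simplification is the "Raw TLX", which takes the unweighted mean of the six subscale ratings instead of applying the original pairwise weighting. The sketch below assumes that variant; the function name `raw_tlx`, the dictionary keys, and the example ratings are illustrative.

```python
TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)

def raw_tlx(ratings):
    """Unweighted mean of the six NASA-TLX subscales, each rated 0-100."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for name in TLX_SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(ratings[name] for name in TLX_SUBSCALES) / len(TLX_SUBSCALES)

# Hypothetical ratings taken right after one session.
session_ratings = {
    "mental_demand": 70,
    "physical_demand": 10,
    "temporal_demand": 40,
    "performance": 30,
    "effort": 60,
    "frustration": 30,
}
print(raw_tlx(session_ratings))
```

On the standard form the performance item runs from good to poor, so a higher rating means worse perceived performance and higher scores point the same way on all six items.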

**Minimum meaningful duration:**

For self-report tracking: 1–2 weeks of daily ratings, using the platform for at least 15–30 minutes each session. This gives enough data points to see patterns.

For EEG-based tracking (if you have access to a consumer EEG headset like the Muse or Emotiv): At least 5–10 sessions of 10–20 minutes each, under consistent conditions (same time of day, same room, same lighting).

**What to measure:**

**Primary metric:** Self-reported mental workload (NASA-TLX or a single 1–7 rating of "mental effort required").

**Secondary metric:** Task completion time and error rate (if the platform involves specific tasks like finding information or completing a form).

**Optional metric:** Heart rate variability (HRV) from a chest strap or smartwatch — lower HRV is associated with higher cognitive load. Measure for 5 minutes before and after using the platform.

**Confound tracking:** Log your sleep quality (1–5 scale), caffeine intake (mg), time of day, and how many hours since you last ate. All of these affect cognitive load.
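The metrics and confounds above fit naturally into one row per session. The sketch below shows one possible log structure written as CSV. The class `SessionLog`, its field names, and the example values are all hypothetical choices, not a prescribed format.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class SessionLog:
    """One row per platform session. All field names are illustrative."""
    session_number: int
    start_hour: float          # 24-hour clock, e.g. 9.5 for 09:30
    mental_effort: int         # single 1-7 rating taken right after the session
    completion_minutes: float
    errors: int
    sleep_quality: int         # 1-5 scale
    caffeine_mg: float
    hours_since_meal: float

def write_sessions(file_obj, entries):
    """Write entries as CSV with a header row to any text file object."""
    column_names = [field.name for field in fields(SessionLog)]
    writer = csv.DictWriter(file_obj, fieldnames=column_names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))

# Hypothetical first session, written to an in-memory buffer for the demo.
example = SessionLog(1, 9.5, 5, 10.0, 2, 4, 80.0, 1.5)
buffer = io.StringIO()
write_sessions(buffer, [example])
print(buffer.getvalue())
```

To keep a file on disk instead, pass `open("sessions.csv", "w", newline="")` as `file_obj`; the file name is only an example.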

**Key confounds to control for:**

**Time of day:** Cognitive performance varies by circadian rhythm. Always test at the same time of day (±1 hour).

**Caffeine and stimulants:** Record all caffeine, nicotine, and other stimulant use. A difference of one cup of coffee can change cognitive load ratings by 1–2 points on a 7-point scale.

**Sleep debt:** Even one night of poor sleep increases cognitive load by 15–30% on typical tasks. Log sleep quality and duration.

**Familiarity with the platform:** The first 3–5 uses will have higher cognitive load due to learning effects. Discard the first 3 sessions from analysis, or track "session number" as a variable.

**Task difficulty:** If the platform has different features or tasks, standardise what you do each session (e.g., "create a new project, add 5 team members, upload 3 documents").

**Environmental noise:** Background noise (conversations, traffic, music) increases cognitive load. Use noise-cancelling headphones or test in a quiet room.
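Two of the controls above translate directly into a filter on the session log: dropping the first three learning sessions and keeping only sessions that started within an hour of the usual time. The sketch below assumes sessions are stored as dictionaries with hypothetical keys `session_number` and `start_hour`.

```python
def usable_sessions(sessions, usual_hour, warmup_sessions=3, hour_tolerance=1.0):
    """Keep sessions after the learning phase that started near the usual time."""
    kept = []
    for session in sessions:
        past_warmup = session["session_number"] > warmup_sessions
        on_schedule = abs(session["start_hour"] - usual_hour) <= hour_tolerance
        if past_warmup and on_schedule:
            kept.append(session)
    return kept

# Hypothetical log entries.
sessions = [
    {"session_number": 1, "start_hour": 9.0, "mental_effort": 6},
    {"session_number": 3, "start_hour": 9.5, "mental_effort": 5},
    {"session_number": 4, "start_hour": 9.0, "mental_effort": 4},
    {"session_number": 5, "start_hour": 14.0, "mental_effort": 5},
    {"session_number": 6, "start_hour": 10.0, "mental_effort": 3},
]
kept = usable_sessions(sessions, usual_hour=9.0)
print([session["session_number"] for session in kept])
```

This simple version does not handle start times that wrap around midnight, and it discards off-schedule sessions rather than modelling time of day as a variable.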

**What a positive result would look like:**

**For self-report:** Your mental workload ratings decrease by at least 1 point on a 7-point scale (or 10 points on NASA-TLX) after a platform redesign or after you become more familiar with the platform. A decrease of 0.5 points per week over 4 weeks is a meaningful improvement.

**For task performance:** Task completion time decreases by at least 20% while error rate stays the same or decreases. For example, if it takes you 10 minutes to complete a task on day 1, and 8 minutes on day 14, that is a positive result.

**For EEG (if using consumer hardware):** A decrease in frontal theta power (4–8 Hz) and an increase in parietal alpha power (8–12 Hz) during platform use, compared to baseline. These are standard EEG markers of lower cognitive load. However, consumer EEG devices are noisy — you would need at least 10 sessions to see a reliable trend.

**Combined metric:** The strongest positive result is a consistent pattern across multiple measures: lower self-reported workload, faster task completion, fewer errors, and higher HRV during platform use.
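The self-report and task-performance thresholds above can be checked mechanically once you have a "before" and an "after" set of sessions. The sketch below applies the 1-point and 20% criteria to session means. The function `evaluate_change`, the dictionary keys, and the example values are hypothetical, and a plain before/after comparison like this does not by itself rule out the confounds listed earlier.

```python
from statistics import mean

def evaluate_change(before, after, min_effort_drop=1.0, min_time_drop=0.20):
    """Compare mean effort, completion time, and errors across two periods."""
    effort_drop = (mean([s["mental_effort"] for s in before])
                   - mean([s["mental_effort"] for s in after]))
    time_before = mean([s["completion_minutes"] for s in before])
    time_after = mean([s["completion_minutes"] for s in after])
    time_drop = (time_before - time_after) / time_before
    errors_not_worse = (mean([s["errors"] for s in after])
                        <= mean([s["errors"] for s in before]))
    return {
        "effort_drop": effort_drop,
        "time_drop": time_drop,
        "effort_improved": effort_drop >= min_effort_drop,
        "speed_improved": time_drop >= min_time_drop and errors_not_worse,
    }

# Hypothetical sessions on a 1-7 effort scale.
before = [
    {"mental_effort": 5, "completion_minutes": 10.0, "errors": 2},
    {"mental_effort": 5, "completion_minutes": 10.0, "errors": 1},
]
after = [
    {"mental_effort": 4, "completion_minutes": 8.0, "errors": 1},
    {"mental_effort": 3, "completion_minutes": 8.0, "errors": 1},
]
result = evaluate_change(before, after)
print(result)
```

With only a handful of sessions per period, means are noisy, so it is worth checking that the direction of change holds across most individual sessions and not just on average.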

**Bottom line for self-experimenters:** Skip the EEG and neural network approach entirely. Use a simple daily self-report scale (1–7 mental effort) plus task completion time. Track for 2–4 weeks, controlling for time of day, caffeine, and sleep. This will give you actionable data about which features of a web platform or software tool are cognitively demanding — without needing a PhD in machine learning or a lab-grade EEG setup.

