Machine Learning for Brain Disorders

Year: 2023

TL;DR

This is not a single experiment but a comprehensive methods textbook (Neuromethods series) that teaches researchers how to apply machine learning to brain disorders. It reports no experimental results, effect sizes, or causal findings, so it cannot directly inform a personal experiment. It does offer frameworks for analysing your own data if you are collecting brain-related measurements.

What they tested

This is a methodological reference work, not an interventional study. The book covers:

**Fundamentals of machine learning** (supervised vs unsupervised learning, feature selection, model evaluation)

**Data types used in brain disorder research**: clinical assessments (questionnaires, cognitive tests), neuroimaging (MRI, fMRI, PET), electroencephalography (EEG) and magnetoencephalography (MEG), genetics and omics data, electronic health records, mobile device data, connected objects and sensors

**Core ML methodologies**: classification, regression, clustering, deep learning, transfer learning, explainable AI

**Validation and datasets**: cross-validation strategies, data leakage prevention, sample size calculations, public datasets

**Applications to specific disorders**: Alzheimer's disease, Parkinson's disease, schizophrenia, depression, autism, epilepsy, traumatic brain injury, stroke

There are no comparators, no interventions, and no outcome measures in the traditional sense. The "outcome" is methodological guidance — how to design, run, and validate an ML study for brain disorders.

Who was studied

No human participants were studied in this book. The volume is a compilation of methodological chapters written by experts. The intended audience is:

Researchers and graduate students new to ML for brain disorders

Experienced researchers wanting to expand their knowledge

Engineers, computer scientists, neurologists, psychiatrists, radiologists, and neuroscientists

The chapters reference hundreds of individual studies that collectively involve thousands of patients across various brain disorders, but no single sample is described in this book itself.

How they measured it

No measurements were taken in this book. However, the book describes measurement instruments used in the studies it references:

**Neuroimaging**: MRI (structural, functional, diffusion tensor), PET, SPECT — measuring brain volume, cortical thickness, white matter integrity, glucose metabolism, amyloid burden

**Electrophysiology**: EEG (event-related potentials, spectral power, connectivity), MEG — measuring electrical and magnetic brain activity at millisecond resolution

**Clinical assessments**: Mini-Mental State Examination (MMSE, 0–30, lower = worse cognition), Hamilton Depression Rating Scale (HAM-D, 0–52, higher = more depressed), Unified Parkinson's Disease Rating Scale (UPDRS, 0–199, higher = worse)

**Genetics/omics**: SNP arrays, whole-genome sequencing, transcriptomics, proteomics, metabolomics

**Digital phenotyping**: smartphone accelerometer, GPS, call logs, screen time, keystroke dynamics, wearable heart rate and step count

Methodology

**Study design**: This is a methods textbook (Neuromethods series) — a collection of 20+ chapters written by different authors, each reviewing the state of the art in a specific sub-area of ML for brain disorders. It is not a systematic review, meta-analysis, or original research study.

**Structure**: The book is organised into five parts:

1. Fundamentals of ML (chapters on basic concepts, model types, evaluation metrics)

2. Data types (chapters on clinical data, neuroimaging, EEG/MEG, genetics, EHRs, mobile devices)

3. Core methodologies (chapters on classification, regression, deep learning, transfer learning, explainability)

4. Validation and datasets (chapters on cross-validation, data leakage, sample size, public datasets)

5. Applications (chapters on Alzheimer's, Parkinson's, schizophrenia, depression, autism, epilepsy, stroke, TBI)

**What this design can prove**: A methods textbook can provide comprehensive, expert-reviewed guidance on how to conduct ML research in brain disorders. It can synthesise best practices, highlight common pitfalls, and direct readers to appropriate techniques for specific data types and research questions.

**What this design cannot prove**: It cannot prove that any specific ML method works for a given disorder, because it does not present original experimental results. It cannot establish causal relationships between brain features and clinical outcomes. It cannot provide effect sizes, confidence intervals, or p-values for any intervention. It cannot tell a self-experimenter what to do.

**Major methodological weaknesses**:

No systematic search strategy (not a systematic review)

No quantitative synthesis (not a meta-analysis)

No original data analysis

No registration or protocol

Potential selection bias in which topics and studies are included

No formal quality assessment of referenced studies

Authors may have conflicts of interest (not disclosed in the abstract)

Publication date 2023 means some methods may already be outdated given rapid ML progress

Key findings

Since this is a methods book, there are no numerical findings. However, the book makes several methodological claims that are relevant for anyone wanting to use ML in brain research:

**Data quality trumps algorithm complexity**: Simple models (logistic regression, random forests) often outperform deep learning on small or noisy clinical datasets. A typical rule of thumb is 10–20 samples per feature to avoid overfitting.

**Cross-validation is essential**: K-fold cross-validation (typically k=5 or k=10) is the standard for estimating model performance. Leave-one-out cross-validation is used for very small samples (n<50).

**Data leakage is the most common fatal error**: Leakage occurs when information from the test set influences training — e.g., normalising data before splitting, using future data to predict past events, or including features that are proxies for the outcome.

**Sample size requirements vary dramatically**: For classification of Alzheimer's disease from MRI, ~100–200 subjects per group may achieve 80–90% accuracy. For rare disorders or subtle early-stage detection, 500–1000+ subjects may be needed.

**Explainability is increasingly required**: Methods like SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) are now standard for understanding which brain features drive predictions.

**Multimodal fusion often outperforms single-modality models**: Combining MRI + clinical tests + genetics can improve classification accuracy by 5–15% over any single modality.

**External validation is rare but critical**: Most published models perform 10–20% worse on external datasets from different scanners or populations.
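The cross-validation and leakage points above can be sketched together in a few lines. This is a minimal illustration, not code from the book: the function names, the single-feature input, and the toy midpoint-threshold classifier are all assumptions made for the example. What matters is the order of operations: the normalisation statistics are estimated on the training fold only and then applied to the held-out fold.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_scaler(values):
    """Estimate mean and standard deviation from the given values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero spread
    return mean, std


def cross_validate(features, labels, k=5, seed=0):
    """Return per-fold accuracy of a toy threshold classifier.

    Hypothetical setup: one numeric feature per sample, labels 0 or 1,
    and the positive class is assumed to have the higher feature values.
    Every training fold must contain both classes.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(features), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(features)) if i not in held_out]

        # Leakage-free step: scaling statistics come from the training
        # fold only, never from the held-out samples.
        mean, std = fit_scaler([features[i] for i in train_idx])
        z = [(x - mean) / std for x in features]

        # Toy classifier: threshold halfway between the class means,
        # again computed on training samples only.
        pos = [z[i] for i in train_idx if labels[i] == 1]
        neg = [z[i] for i in train_idx if labels[i] == 0]
        threshold = (statistics.fmean(pos) + statistics.fmean(neg)) / 2

        correct = sum((z[i] > threshold) == (labels[i] == 1) for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies
```

Normalising the full dataset before calling `k_fold_indices` would be the leaky variant: the held-out samples would then contribute to the mean and standard deviation used during training.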

Effect magnitude

Not applicable — no experimental effects are reported. However, the book does reference typical performance metrics from the literature:

Alzheimer's disease classification from MRI: AUC (area under the receiver operating characteristic curve) of 0.85–0.95, sensitivity 80–90%, specificity 80–90%

Depression classification from resting-state fMRI: AUC 0.60–0.75 (modest, often not clinically useful)

Schizophrenia classification from structural MRI: accuracy 70–85%

Epileptic seizure detection from EEG: sensitivity 90–98%, false positive rate 0.1–1 per hour

Parkinson's disease diagnosis from voice recordings: accuracy 85–95% in controlled settings, dropping to 60–75% in real-world conditions

These numbers come from individual studies referenced in the book, not from the book itself.
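For readers unfamiliar with these metrics, the sketch below shows how sensitivity, specificity, and AUC are computed from labels and model outputs. It is an illustration under simple assumptions (binary labels coded 0 and 1, both classes present); the function names are hypothetical and not taken from the book. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Each positive/negative pair counts 1 if the positive case scores
    higher, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the 0.60–0.75 range quoted for depression is described as modest.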

Limitations

**Acknowledged by the authors (implicitly through the book's structure)**:

The field is rapidly evolving, so methods described may become outdated

Most ML models for brain disorders have not been validated in clinical practice

Sample sizes in brain disorder ML studies are often too small for reliable results

Many published models are not reproducible due to incomplete reporting

There is a gap between technical performance and clinical utility

**Critical reader observations**:

No quantitative synthesis of evidence — the book is a collection of expert opinions, not a systematic review

No assessment of publication bias (positive results are more likely to be published)

No discussion of the replication crisis in ML for healthcare

Limited coverage of fairness and bias issues (ML models may perform differently across demographic groups)

No practical guidance for individual self-experimenters — the book is written for researchers with access to large datasets and clinical populations

The cost and complexity of most methods (MRI, EEG, genetic sequencing) make them inaccessible to individuals running n=1 experiments

No mention of effect sizes, confidence intervals, or Bayesian approaches for small-sample inference

The book does not address how to handle the multiple comparisons problem when testing many ML models on the same data

Practical takeaways

For someone running their own n=1 experiment, this book is **not directly applicable** because it focuses on group-level ML analyses with large datasets. However, the methodological principles can be adapted for personal experiments if you are collecting quantitative brain data (e.g., from consumer EEG devices, smartphone cognitive tests, or wearable sensors).

### What to test

**If you have a consumer EEG device (e.g., Muse, Emotiv)**: Test whether a specific intervention (meditation, neurofeedback, caffeine, sleep extension) changes your EEG power spectrum in a predictable way. For example, test whether 20 minutes of daily mindfulness meditation for 4 weeks increases frontal alpha power (8–12 Hz) during rest.

**If you use smartphone cognitive tests**: Test whether a nootropic supplement (e.g., 200 mg caffeine + 100 mg L-theanine) improves your reaction time or working memory on a standardised test (e.g., the n-back task or psychomotor vigilance task).

**If you track sleep with a wearable**: Test whether blue-light blocking glasses 2 hours before bed for 2 weeks changes your sleep onset latency or slow-wave sleep percentage.

### Minimum meaningful duration

For EEG changes: 2–4 weeks of daily practice to see stable baseline shifts

For cognitive tests: 1–2 weeks per condition (intervention vs control), with at least 5–10 test sessions per condition

For sleep tracking: 2 weeks per condition, with at least 7 nights of usable data per condition

### What to measure (specific metrics)

**EEG**: Frontal alpha power (µV²), theta/beta ratio, frontal asymmetry index. Measure during a standardised 5-minute eyes-closed resting state.

**Cognitive tests**: Reaction time (ms), accuracy (%), throughput (correct responses per minute). Use a validated test like the Psychomotor Vigilance Task (PVT) or the n-back task.

**Sleep**: Sleep onset latency (minutes), total sleep time (hours), slow-wave sleep percentage (%), sleep efficiency (%). Use a validated wearable (e.g., Oura Ring, Whoop, or actigraphy).

**Mood/energy**: Daily ratings on a 0–10 scale for mood, energy, focus, and stress (collected at the same time each day).
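Several of the metrics above are simple ratios of quantities your device or test app already reports. The sketch below assumes you have band powers and sleep durations as plain numbers. The function names are hypothetical, and the formulas follow common conventions rather than anything specified in the book. In particular, the asymmetry index is taken here as ln(right alpha) minus ln(left alpha), and sleep efficiency as time asleep divided by time in bed. Check which convention your device uses before comparing values.

```python
import math


def theta_beta_ratio(theta_power, beta_power):
    """Ratio of theta-band power to beta-band power (same units)."""
    return theta_power / beta_power


def frontal_asymmetry_index(left_alpha_power, right_alpha_power):
    """Assumed convention: ln(right alpha power) - ln(left alpha power)."""
    return math.log(right_alpha_power) - math.log(left_alpha_power)


def sleep_efficiency(total_sleep_minutes, time_in_bed_minutes):
    """Percentage of time in bed that was spent asleep."""
    return 100.0 * total_sleep_minutes / time_in_bed_minutes


def throughput(correct_responses, duration_seconds):
    """Correct responses per minute of testing."""
    return correct_responses / (duration_seconds / 60.0)
```

Whatever convention you choose, keep it fixed for the whole experiment so that values from different conditions remain comparable.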

### Key confounds to control for

**Time of day**: Test at the same time each day (±1 hour) to control for circadian effects

**Prior sleep**: Record sleep duration and quality the night before each test session

**Caffeine and alcohol**: Standardise intake (e.g., no caffeine 4 hours before testing, no alcohol 12 hours before)

**Exercise**: Record daily exercise type and duration

**Menstrual cycle**: If applicable, track cycle phase as it affects EEG and cognition

**Learning effects**: Use parallel versions of cognitive tests or randomise the order of conditions

**Expectation**: If possible, use a blinded design (e.g., have someone else prepare your supplement doses so you don't know which is active)

**Placebo effect**: Include a placebo condition (e.g., identical-looking capsules with rice flour)
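The blinding and randomisation points above can be supported with a short script that a helper runs on your behalf. This is a sketch under stated assumptions: the function name and the coding scheme (capsules labelled "A" and "B", one period of each per block) are hypothetical. The helper keeps the key; you see only the coded schedule until the experiment ends.

```python
import random


def blinded_schedule(n_blocks, seed=None):
    """Return (schedule, key) for a blinded active/placebo experiment.

    schedule: coded labels, one per period, with each block holding one
              "A" period and one "B" period in random order.
    key:      mapping from code to "active" or "placebo", to be kept
              by the helper until all data are collected.
    """
    rng = random.Random(seed)
    codes = ["A", "B"]
    active_code = rng.choice(codes)
    key = {
        code: ("active" if code == active_code else "placebo")
        for code in codes
    }
    schedule = []
    for _ in range(n_blocks):
        block = codes[:]
        rng.shuffle(block)  # randomise order within each block
        schedule.extend(block)
    return schedule, key
```

Randomising within blocks keeps the two conditions balanced over time, so a slow drift in sleep, season, or test familiarity affects both roughly equally.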

### What a positive result would look like

**EEG**: A consistent increase in frontal alpha power of ≥15% during the intervention period compared to baseline, with the effect reversing when you stop the intervention (reversal design). Use a moving average (7-day window) to visualise the trend.

**Cognitive tests**: A reduction in reaction time of ≥20 ms (for PVT) or an increase in n-back accuracy of ≥5% that is consistent across at least 5 test sessions per condition. Calculate the mean and standard deviation for each condition; a positive result is a difference between the condition means greater than 1.5 times the pooled standard deviation.

**Sleep**: A reduction in sleep onset latency of ≥10 minutes or an increase in slow-wave sleep of ≥5% that persists for at least 5 nights during the intervention period.

**Mood/energy**: An increase of ≥2 points on a 0–10 scale that is consistent across at least 10 days of the intervention period.
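The two calculations mentioned above, a 7-day moving average and a mean difference expressed in pooled standard deviations, can be sketched as follows. The function names are hypothetical, and the pooled standard deviation uses the usual formula that weights each condition's sample variance by its degrees of freedom. Each condition needs at least two measurements.

```python
import math
import statistics


def moving_average(values, window=7):
    """Trailing moving average; the first window - 1 days yield no value."""
    return [
        statistics.fmean(values[i - window + 1:i + 1])
        for i in range(window - 1, len(values))
    ]


def standardised_difference(control, intervention):
    """(mean of intervention - mean of control) / pooled standard deviation."""
    n1, n2 = len(control), len(intervention)
    v1 = statistics.variance(control)       # sample variance, n - 1 divisor
    v2 = statistics.variance(intervention)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.fmean(intervention) - statistics.fmean(control)) / pooled_sd
```

For reaction time, an improvement shows up as a negative value, so the 1.5 threshold applies to the absolute value of the result.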

**Important caveat**: With n=1 experiments, you cannot generalise your results to anyone else. You also cannot rule out placebo effects, regression to the mean, or random fluctuations without a rigorous reversal design (A-B-A-B) or multiple crossovers. For a credible self-experiment, run at least 3 cycles of intervention and control (e.g., 2 weeks on, 2 weeks off, repeated three times) and look for consistent patterns.
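One way to check for a consistent pattern across cycles is to compute the on-minus-off difference separately for each cycle and see whether the sign repeats. The sketch below assumes one measurement per day, tagged with its phase ("on" or "off") and its cycle number. The function names and the data layout are hypothetical.

```python
import statistics


def cycle_differences(measurements, phases, cycles):
    """Mean of "on" days minus mean of "off" days, for each cycle.

    measurements: one numeric value per day
    phases:       "on" or "off" for each day
    cycles:       cycle number for each day (1, 2, 3, ...)
    """
    rows = list(zip(measurements, phases, cycles))
    diffs = {}
    for cycle in sorted(set(cycles)):
        on = [m for m, p, c in rows if c == cycle and p == "on"]
        off = [m for m, p, c in rows if c == cycle and p == "off"]
        diffs[cycle] = statistics.fmean(on) - statistics.fmean(off)
    return diffs


def consistent_direction(diffs):
    """True if every cycle's difference has the same non-zero sign."""
    values = list(diffs.values())
    return all(v > 0 for v in values) or all(v < 0 for v in values)
```

A difference that appears in one cycle but reverses in the next is more plausibly noise or a time trend than an effect of the intervention.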

**If you want to use ML on your own data**: You would need at least 50–100 data points per feature to train a reliable model. For most self-experimenters, simple visual inspection and basic statistics (mean, standard deviation, effect size) are more appropriate than ML. The book's advice on data leakage and cross-validation is still relevant — never test a model on data that was used to train it, even in an n=1 context.
