The Psychosomatic Assessment
Year: 2012
TL;DR
This is not a single experimental study but a methodological textbook/volume that compiles and explains clinimetric tools (validated questionnaires and interview methods) for assessing psychosocial factors in medical patients — it provides a toolkit for measuring subjective experiences like stress, personality, illness behavior, and well-being, but offers no original data on intervention effects.
What they tested
This volume does not test a specific intervention. Instead, it presents and reviews multiple clinimetric instruments for assessing:
**Childhood adversities** (e.g., adverse childhood experiences, trauma history)
**Life events and chronic stress** (e.g., recent stressful events, ongoing life difficulties)
**Lifestyle factors** (e.g., diet, exercise, sleep, substance use)
**Sexual function** (e.g., sexual dysfunction, satisfaction)
**Subclinical and affective disturbances** (e.g., mild depression, anxiety, somatic symptoms)
**Personality** (e.g., Type A behavior, neuroticism, alexithymia)
**Illness behavior** (e.g., health anxiety, symptom reporting, doctor visits)
**Well-being** (e.g., quality of life, positive affect, life satisfaction)
**Family dynamics** (e.g., family functioning, caregiver burden)
Each chapter describes a specific measurement method, its psychometric properties (reliability, validity), and how to administer and interpret it in clinical practice. There are no comparators or control groups — this is a reference work, not an experimental study.
Who was studied
No original participants were studied in this volume. The book synthesizes existing research on clinimetric tools, drawing from hundreds of prior studies involving diverse populations:
**General medical patients** (e.g., primary care, hospital inpatients)
**Psychiatric patients** (e.g., depression, anxiety, somatoform disorders)
**Healthy controls** (for normative comparisons)
**Specific disease groups** (e.g., cardiovascular disease, cancer, chronic pain, diabetes)
**Age ranges**: predominantly adults (18–65+), with some tools validated in adolescents and older adults
**Settings**: outpatient clinics, hospital wards, research laboratories, community samples
Sample sizes for individual validation studies range from ~50 to several thousand participants per instrument. The volume itself does not report a single pooled sample.
How they measured it
Each chapter describes specific instruments. Examples include:
**Childhood adversities**: Childhood Trauma Questionnaire (CTQ, 28 items, 5-point Likert scale, scores range 25–125, higher = more trauma); Adverse Childhood Experiences (ACE) questionnaire (10 yes/no items, score 0–10)
**Life events**: Life Events and Difficulties Schedule (LEDS, semi-structured interview, rated by trained interviewers); Social Readjustment Rating Scale (SRRS, 43 items, weighted scores)
**Chronic stress**: Perceived Stress Scale (PSS, 10 or 14 items, 0–4 scale, total 0–40, higher = more stress); Trier Inventory for Chronic Stress (TICS, 57 items, 0–4 scale)
**Lifestyle**: International Physical Activity Questionnaire (IPAQ, short form 7 items, MET-minutes/week); Food Frequency Questionnaire (FFQ, variable items); Pittsburgh Sleep Quality Index (PSQI, 19 items, 0–21, lower = better sleep)
**Sexual function**: Female Sexual Function Index (FSFI, 19 items, 6 domains, total 2–36, lower = dysfunction); International Index of Erectile Function (IIEF, 15 items, 5 domains, total 5–75)
**Affective disturbances**: Hospital Anxiety and Depression Scale (HADS, 14 items, 0–3 scale, subscales 0–21, ≥8 = probable caseness); Patient Health Questionnaire-9 (PHQ-9, 9 items, 0–27, ≥10 = moderate depression)
**Personality**: NEO Five-Factor Inventory (NEO-FFI, 60 items, 5 domains); Toronto Alexithymia Scale (TAS-20, 20 items each rated 1–5, total 20–100, ≥61 = alexithymia)
**Illness behavior**: Illness Behavior Questionnaire (IBQ, 62 items, 7 scales); Health Anxiety Questionnaire (HAQ, 21 items, 0–3 scale)
**Well-being**: WHO Quality of Life-BREF (WHOQOL-BREF, 26 items, 4 domains); Satisfaction with Life Scale (SWLS, 5 items, 7-point scale, 5–35)
**Family dynamics**: Family Assessment Device (FAD, 60 items, 7 scales); Caregiver Burden Inventory (CBI, 24 items, 5-point scale)
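To illustrate how the item counts and score ranges listed above turn into a total score, here is a minimal Python sketch for the PHQ-9 (9 items, each 0–3, total 0–27). The function name `score_phq9` and the example responses are hypothetical; the two thresholds are the ≥10 and ≥15 cutoffs quoted in this summary, and the sketch is not an official scoring implementation.

```python
def score_phq9(responses):
    """Sum nine PHQ-9 item responses (each 0-3) into a 0-27 total.

    Hypothetical helper for illustration only.
    """
    if len(responses) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    total = sum(responses)
    return {
        "total": total,
        # Screening thresholds quoted in this summary; a flag is not a diagnosis.
        "at_or_above_10": total >= 10,
        "at_or_above_15": total >= 15,
    }


# Made-up responses for one administration.
print(score_phq9([1, 2, 1, 2, 1, 0, 2, 1, 1]))
```

Rejecting incomplete or out-of-range responses, rather than silently summing them, follows the same logic as using the full validated instrument: a total built from missing or rescaled items is not comparable to published cutoffs.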
Methodology
**Study design**: This is a multi-author edited volume — a textbook/reference work, not a primary research study. It is a collection of review chapters, each summarizing the clinimetric properties (reliability, validity, sensitivity to change) of specific assessment tools. There is no original data collection, no randomization, no blinding, no control group, and no intervention.
**What the design can and cannot prove**:
**Can prove**: That specific instruments have been validated for measuring psychosocial constructs in medical populations. The volume provides evidence for internal consistency (Cronbach's alpha typically 0.70–0.95), test-retest reliability (intraclass correlations 0.70–0.90), convergent validity (correlations with related measures r = 0.40–0.80), and discriminant validity (ability to distinguish clinical from non-clinical groups).
**Cannot prove**: That any intervention works, that any assessment tool causes clinical improvement, or that measuring these factors changes health outcomes. This is purely a measurement guide — it tells you *how* to assess, not *what to do* with the results.
**Major methodological weaknesses**:
No systematic search strategy is described (not a formal systematic review)
No meta-analysis pooling effect sizes across studies
Potential publication bias (validation studies with poor psychometrics may be underreported)
No head-to-head comparisons of instruments within the same domain
No discussion of minimal clinically important differences (MCIDs) for most tools
No patient or public involvement in tool selection or interpretation
Some instruments validated only in specific populations (e.g., Western, educated, middle-class) — generalizability may be limited
**Duration**: Not applicable. This is a reference work with no follow-up period, not a longitudinal study.
**Statistical approach**: Descriptive statistics (means, SDs, Cronbach's alpha, test-retest correlations, sensitivity/specificity for diagnostic thresholds). No inferential statistics comparing groups or testing hypotheses.
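For readers unfamiliar with the internal-consistency statistic cited throughout, the following is a minimal sketch of Cronbach's alpha using only the Python standard library. It uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the response matrix are hypothetical and are not data from the volume.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up responses: 5 respondents x 4 items on a 0-4 scale.
responses = [
    [0, 1, 1, 0],
    [2, 2, 3, 2],
    [4, 3, 4, 4],
    [1, 1, 2, 1],
    [3, 3, 3, 4],
]
print(round(cronbach_alpha(responses), 2))
```

Alpha approaches 1 when items rise and fall together across respondents and drops toward 0 when they vary independently, which is why it is read as a measure of internal consistency.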
Key findings
Since this is not an experimental study, there are no "results" in the traditional sense. Instead, the volume reports psychometric properties of each instrument. Key examples:
**Childhood Trauma Questionnaire (CTQ)**: Internal consistency α = 0.79–0.94 across subscales; test-retest reliability ICC = 0.80–0.86 over 1–6 months; sensitivity for detecting abuse history 69–91%, specificity 72–98% depending on cutoff
**Perceived Stress Scale (PSS-10)**: Internal consistency α = 0.78–0.91; test-retest r = 0.85 over 2 weeks; norms: mean ~13–15 in general population, ~20–25 in high-stress clinical groups
**Hospital Anxiety and Depression Scale (HADS)**: Internal consistency α = 0.68–0.93 (anxiety), 0.67–0.90 (depression); optimal cutoff ≥8 yields sensitivity 0.70–0.90, specificity 0.68–0.93 for anxiety disorders; cutoff ≥8 for depression yields sensitivity 0.65–0.83, specificity 0.70–0.90
**Patient Health Questionnaire-9 (PHQ-9)**: Internal consistency α = 0.86–0.89; test-retest ICC = 0.84 over 48 hours; cutoff ≥10 yields sensitivity 88%, specificity 88% for major depression; cutoff ≥15 yields sensitivity 68%, specificity 95%
**Toronto Alexithymia Scale (TAS-20)**: Internal consistency α = 0.81–0.86; test-retest r = 0.77 over 3 weeks; cutoff ≥61 yields sensitivity 82%, specificity 74% for alexithymia diagnosis
**Satisfaction with Life Scale (SWLS)**: Internal consistency α = 0.79–0.89; test-retest r = 0.54–0.83 over 2 months to 4 years; norms: mean ~23–25 in general population, ~14–18 in clinical populations
**WHOQOL-BREF**: Internal consistency α = 0.68–0.82 across domains; test-retest ICC = 0.66–0.87 over 2–8 weeks; domain scores range 0–100, with population means ~60–75
**Primary vs. secondary outcomes**: Not applicable — all instruments are presented as equally valid assessment tools for their respective constructs.
Effect magnitude
Since there are no intervention effects to report, "effect magnitude" here refers to the discriminative ability of the instruments:
**PHQ-9**: A score of ≥10 correctly identifies ~88% of people with major depression (sensitivity) and correctly rules out ~88% of people without it (specificity). This means about 12% of depressed people will be missed (false negatives), and 12% of non-depressed people will be flagged (false positives).
**CTQ**: Depending on cutoff, 69–91% of people with documented abuse histories are correctly identified, but 2–28% of people without abuse are falsely labeled.
**HADS**: The anxiety subscale (cutoff ≥8) correctly identifies 70–90% of anxiety disorder cases but misclassifies 7–32% of non-anxious individuals.
**TAS-20**: A cutoff of ≥61 correctly identifies 82% of alexithymic individuals but mislabels 26% of non-alexithymic individuals as alexithymic.
In plain English: these tools are useful screening instruments but not diagnostic — they give you a probability, not a certainty. A positive score means "this person is more likely to have the condition than someone with a negative score," not "this person definitely has the condition."
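The "probability, not a certainty" point can be made concrete with Bayes' rule: how much a positive score means depends on how common the condition is among the people being screened. The sketch below takes the PHQ-9 figures quoted above (sensitivity 0.88, specificity 0.88) and computes positive and negative predictive values. The prevalence values are assumptions chosen for illustration and do not come from the volume.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(condition | positive score)
    npv = true_neg / (true_neg + false_neg)  # P(no condition | negative score)
    return ppv, npv


# Prevalence values are illustrative assumptions, not figures from the volume.
for prevalence in (0.05, 0.10, 0.30):
    ppv, npv = predictive_values(0.88, 0.88, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

Under an assumed prevalence of 10%, the positive predictive value works out to roughly 45%: fewer than half of the people who score at or above the cutoff would actually have major depression, even with 88% sensitivity and specificity.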
Limitations
**Author-acknowledged limitations** (as typical in clinimetric literature):
All instruments rely on self-report, which is subject to recall bias, social desirability bias, and current mood state
Cutoff scores are population-dependent and may not generalize across cultures, ages, or clinical settings
Many instruments were validated in Western, educated, middle-class populations — cross-cultural validity is often untested
Test-retest reliability varies with time interval — shorter intervals inflate reliability, longer intervals capture true change
No instrument captures the full complexity of any psychosocial construct — all are simplifications
**Critical reader observations**:
**No systematic review methodology**: The volume does not describe how chapters were selected, how literature was searched, or how quality of studies was assessed. This is an expert-opinion-based reference, not a systematic review.
**No meta-analysis**: Effect sizes for discriminative ability (sensitivity, specificity) are reported as ranges across studies, but no pooled estimates are provided. This limits the ability to compare instruments quantitatively.
**No head-to-head comparisons**: For domains with multiple instruments (e.g., depression: HADS vs. PHQ-9 vs. BDI), no direct comparison of performance in the same sample is provided.
**No discussion of minimal clinically important difference (MCID)**: For most instruments, the volume does not specify what change in score constitutes a meaningful improvement or deterioration — critical for tracking change over time.
**No patient perspective**: Instruments were developed by researchers and clinicians, not patients. Content validity (whether items matter to patients) is rarely assessed.
**Publication bias**: Validation studies with poor psychometric properties are less likely to be published, so reported reliability/validity may be inflated.
**No longitudinal data**: Most psychometric data come from cross-sectional studies. How well these instruments predict future outcomes (prognostic validity) is largely unknown.
**No cost-effectiveness data**: The volume does not discuss the time burden (e.g., 5–20 minutes per questionnaire) or cost (some instruments require licensing fees) relative to clinical benefit.
**No integration with treatment**: The volume tells you how to measure but not what to do with the results — no guidance on how scores should change clinical management.
Practical takeaways
For someone running their own n=1 experiment:
### What to test
**Specific constructs**: Choose one or two psychosocial domains relevant to your experiment. For example:
- If testing a stress-reduction intervention (e.g., meditation, exercise, journaling), measure **perceived stress** (PSS-10) and **well-being** (SWLS or WHOQOL-BREF)
- If testing a sleep intervention, measure **sleep quality** (PSQI) and **mood** (PHQ-9 or HADS)
- If testing a dietary change, measure **lifestyle** (IPAQ for activity, FFQ for diet) and **affective disturbances** (HADS)
**Dose**: Use the full validated instrument (not a subset of items) to maintain psychometric properties. For example, the PSS-10 has 10 items, takes ~3 minutes, and is free to use.
**Frequency**: Administer the instrument at baseline, at regular intervals during the experiment (e.g., weekly for mood, monthly for stress), and at endpoint. Avoid daily administration for most instruments (risk of practice effects and response fatigue).
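One way to fix the administration schedule in advance, so that measurement days are not chosen according to how you happen to feel, is to generate the dates up front. This is a minimal sketch: the function name `administration_dates`, the start date, and the 8-week duration are hypothetical choices for illustration.

```python
from datetime import date, timedelta


def administration_dates(start, weeks, interval_weeks):
    """Baseline, one date every `interval_weeks` weeks, and the endpoint."""
    if weeks <= 0 or interval_weeks <= 0:
        raise ValueError("weeks and interval_weeks must be positive")
    end = start + timedelta(weeks=weeks)
    dates = []
    current = start
    while current < end:
        dates.append(current)
        current += timedelta(weeks=interval_weeks)
    dates.append(end)  # the endpoint is always included exactly once
    return dates


# Hypothetical 8-week experiment with a measure administered every 4 weeks.
for d in administration_dates(date(2024, 1, 1), weeks=8, interval_weeks=4):
    print(d.isoformat())
```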
### Minimum meaningful duration
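**For mood/stress measures**: At least 4 weeks for the PSS, which asks about the past month; re-testing sooner means the recall windows overlap, so consecutive scores partly describe the same days. Measures with shorter recall windows (the PHQ-9 asks about the past 2 weeks, the HADS about the past week) can show meaningful change in 2–4 weeks.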
**For sleep measures**: At least 1–2 weeks. The PSQI asks about the past month, but weekly administration of a sleep diary (not the full PSQI) can track daily variation.
**For well-being measures**: At least 4–8 weeks. The SWLS asks about global life satisfaction, which changes slowly.
**For personality measures**: Do not re-test — personality is stable over months to years. The NEO-FFI is designed for single assessment, not tracking change.
### What to measure (specific metrics)
**Primary outcome**: The total score of your chosen instrument (e.g., PSS-10 total score, range 0–40)
**Secondary outcomes**: Subscale scores if the instrument has them (e.g., HADS-anxiety vs. HADS-depression)
**Reliable change index (RCI)**: To determine if your change is statistically reliable (not just random fluctuation), calculate:
- SEM = baseline SD × √(1 – test-retest reliability)
- S_diff = √2 × SEM (the standard error of the difference between two scores, since both the pre-score and the post-score carry measurement error)
- RCI = (post-score – pre-score) / S_diff
- If |RCI| > 1.96, the change is unlikely to be measurement error alone (p < 0.05)
- Example for PSS-10: baseline SD ≈ 7, test-retest r = 0.85, so SEM = 7 × √(0.15) ≈ 2.7 and S_diff ≈ 3.8. A change of more than about 7.5 points (1.96 × 3.8) is reliable.
**Minimal clinically important difference (MCID)**: When available, use published MCIDs. For PHQ-9, MCID is ~5 points. For PSS-10, MCID is ~4–6 points. For SWLS, MCID is ~2–3 points.
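The reliable-change and MCID checks can be combined in a few lines of Python. This minimal sketch follows the Jacobson-Truax formulation, in which the score difference is divided by the standard error of the difference (√2 × SEM) rather than by the SEM alone. The function name `reliable_change`, the example pre and post scores, and the choice of 5 points as a PSS-10 MCID (the midpoint of the 4–6 range above) are assumptions for illustration.

```python
import math


def reliable_change(pre, post, baseline_sd, retest_reliability, mcid=None):
    """Reliable change index for one person's pre/post scores."""
    # Standard error of measurement for a single score.
    sem = baseline_sd * math.sqrt(1 - retest_reliability)
    # Standard error of the difference between two scores.
    s_diff = math.sqrt(2) * sem
    rci = (post - pre) / s_diff
    result = {
        "sem": sem,
        "s_diff": s_diff,
        "rci": rci,
        "reliable": abs(rci) > 1.96,
    }
    if mcid is not None:
        # Whether the raw change is at least as large as the assumed MCID.
        result["meets_mcid"] = abs(post - pre) >= mcid
    return result


# Hypothetical PSS-10 scores; SD and reliability are the values quoted above.
print(reliable_change(pre=24, post=15, baseline_sd=7,
                      retest_reliability=0.85, mcid=5))
```

The two checks answer different questions: the RCI asks whether a change exceeds what measurement error alone could produce, while the MCID asks whether it is large enough to matter. A change can pass one and fail the other.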
### Key confounds to control for
**Current mood state**: Self-report measures are influenced by how you feel right now. Administer at the same time of day, in the same setting, and avoid days with acute stressors (e.g., after a fight, before a deadline).
**Social desirability**: People underreport negative experiences. Use anonymous or confidential administration (e.g