Human resource management in the age of generative artificial intelligence: Perspectives and research directions on ChatGPT
- Authors: Pawan Budhwar, Soumyadeb Chowdhury, Geoffrey Wood, Herman Aguinis, Greg J. Bamber, Jose R. Beltran, Paul Boselie, Fang Lee Cooke, Stephanie Decker, Angelo S. DeNisi, Prasanta Kumar Dey, David Guest, Andrew J. Knoblich, Ashish Malik, Jaap Paauwe, Savvas Papagiannidis, Charmi Patel, Vijay Pereira, Shuang Ren, Steven G. Rogelberg, Mark N. K. Saunders, Rosalie L. Tung, Arup Varma
- Journal: Human Resource Management Journal
- Year: 2023
- Citations: 722
## TL;DR
This perspectives editorial synthesises existing literature to argue that generative AI (such as ChatGPT) will fundamentally reshape human resource management (HRM) across recruitment, performance management, employee well-being, and ethical compliance. The evidence base is almost entirely theoretical: the paper reports no controlled experiments, so anyone running a self-experiment on generative AI for HR tasks must treat its claims as hypotheses, not proven effects.
## What they tested
This is not an empirical study. It is a "perspectives editorial": a scholarly opinion piece that reviews existing literature on AI and generative AI, then proposes research directions for HRM. The authors tested no interventions, used no comparators, and collected no outcome measures. Instead, they:
- Reviewed the existing (pre-2023) literature on AI in HRM, covering recruitment, selection, performance appraisal, training, employee monitoring, and well-being.
- Identified six key themes where generative AI (specifically ChatGPT and its variants) could impact HRM: job displacement vs. creation, stakeholder relationships, business models, academic research, ethical risks (bias, privacy, misinformation), and employee well-being.
- Proposed a research agenda with testable hypotheses for future empirical work.
The "outcome" is a conceptual framework — not data. For someone running a self-experiment, this paper provides a list of *potential* interventions to test (e.g., "use ChatGPT to draft job descriptions" or "use ChatGPT to provide performance feedback") but zero effect sizes or durations to guide you.
## Who was studied
No human participants were studied. The paper draws on:
- A narrative review of approximately 80–100 cited works (the authors do not specify an exact number or systematic search strategy).
- The authors' own expertise as senior HRM scholars (all are professors at major universities: Aston University, University of Bradford, University of Cambridge, George Washington University, Monash University, and University of South Australia).
- Public statements from OpenAI's CEO (Sam Altman) and media coverage of ChatGPT's launch in November 2022.
The "sample" is the existing academic literature on AI in HRM, which itself is dominated by conceptual papers, case studies, and small-scale surveys — not randomised controlled trials. The authors acknowledge this explicitly: "the full consequences are largely undiscovered and uncertain."
## How they measured it
No measurement instruments were used. The paper employs:
- **Narrative synthesis**: The authors grouped existing studies into thematic clusters (e.g., "recruitment and selection," "performance management," "employee well-being") and summarised common arguments.
- **Gap analysis**: They identified what has *not* been studied (e.g., longitudinal effects of generative AI on career trajectories, or controlled comparisons of AI-generated vs. human-generated performance feedback).
- **Speculative forecasting**: They extrapolated from trends in earlier AI technologies (e.g., algorithmic hiring tools from 2015–2022) to predict what generative AI might do differently.
There are no scales, no confidence intervals, no p-values. The closest thing to a measurement is the claim, repeated in the paper, that ChatGPT's impact "could be as big as 'the printing press'", which is a qualitative analogy, not a quantified effect.
## Methodology
**Study design**: Perspectives editorial (also called a "viewpoint" or "essay" in management journals). This is the lowest level of evidence in the scientific hierarchy — below systematic reviews, below cohort studies, below case-control studies, and far below randomised trials.
**How they conducted the review**: The authors do not report a systematic search strategy (no databases searched, no keywords, no inclusion/exclusion criteria, no PRISMA flowchart). They state they "synthesize the literature on AI and generative AI" but do not explain how they selected which papers to include. This means the review is vulnerable to **confirmation bias** — the authors may have selectively cited papers that support their preferred narrative (that generative AI will be transformative for HRM) while ignoring contradictory evidence.
**What this design can prove**: Nothing. A perspectives editorial can generate hypotheses, identify research gaps, and stimulate debate. It cannot establish cause-and-effect, estimate effect sizes, or provide reliable guidance for practice.
**What this design cannot prove**:
- Whether using ChatGPT for HR tasks actually improves outcomes (e.g., faster hiring, fairer performance reviews, better employee well-being).
- Whether the risks (bias, privacy violations, misinformation) are large enough to outweigh benefits.
- Whether any effects persist over time (the paper covers no longitudinal data).
- Whether the authors' predictions are accurate (they are opinions, not findings).
**Major methodological weaknesses**:
1. **No systematic search**: Without a reproducible search strategy, readers cannot replicate the review or check that the cited literature is representative.
2. **No quality assessment**: The authors do not evaluate the rigour of the papers they cite. Some cited works may themselves be weak (e.g., opinion pieces, small-N studies).
3. **No quantitative synthesis**: There are no meta-analytic effect sizes, no forest plots, no heterogeneity statistics.
4. **Publication date**: The paper was published in 2023, just months after ChatGPT's public launch (November 2022). The authors themselves note that "the full consequences are largely undiscovered" — meaning any claims are necessarily speculative.
5. **Conflict of interest**: The authors do not declare any funding from AI companies, but they are all HRM scholars whose careers benefit from positioning their field as "urgently needing new research" — a subtle incentive to overstate the novelty and importance of generative AI.
## Key findings
Because this is a perspectives piece, there are no empirical findings. Instead, the authors present six thematic arguments:
**Theme 1: Job displacement vs. creation**
- The authors argue generative AI will automate some HR tasks (e.g., drafting job descriptions, screening CVs, answering employee FAQs) but may create new roles (e.g., "prompt engineers," "AI ethics officers").
- They repeat estimates from other sources (not their own data) that AI could automate 60–70% of current work tasks in some sectors, but give no specific reference or confidence interval for the figure.
- **No effect size**: The paper does not quantify how many HR jobs will be displaced or created, or over what timeframe.
**Theme 2: Stakeholder relationships**
- Generative AI could change how HR interacts with employees, managers, unions, and regulators. For example, ChatGPT could generate personalised employee communications, but might also produce insensitive or legally risky content.
- **No data**: The authors provide no studies comparing AI-generated vs. human-generated HR communications on outcomes like trust, satisfaction, or legal compliance.
**Theme 3: Business models**
- The authors suggest generative AI could enable "HR-as-a-service" platforms that replace in-house HR departments, but acknowledge this is speculative.
- **No evidence**: No case studies, no market data, no cost-benefit analyses.
**Theme 4: Academic research**
- The authors argue generative AI will change how HR research is conducted (e.g., AI-assisted literature reviews, hypothesis generation) but also raise concerns about plagiarism and data fabrication.
- **No empirical support**: The paper cites no studies testing whether AI-assisted research is faster, more accurate, or more biased than traditional methods.
**Theme 5: Ethical risks**
- The authors list six categories of risk: bias (AI replicating historical discrimination), misinformation (AI generating plausible but false HR policies), context insensitivity (AI failing to account for local laws or culture), privacy (AI processing sensitive employee data), security (AI being hacked or misused), and well-being (AI causing stress or job insecurity).
- **No prevalence data**: The paper does not report how often these risks occur in practice, or under what conditions they are most severe.
**Theme 6: Employee well-being**
- The authors warn that generative AI could increase surveillance (e.g., AI monitoring employee emails and chat messages) and create "algorithmic anxiety": stress from being evaluated by an opaque AI system.
- **No measurement**: No validated scales, no survey data, no physiological measures of stress.
**Summary of "findings"**: The paper contains zero testable results. It is a call for more research, not a report of completed research.
## Effect magnitude
There is no effect magnitude to report. The closest the paper comes to a quantitative claim is a second-hand quote from OpenAI's CEO comparing ChatGPT's potential impact to "the printing press", which is an analogy, not a measured effect. For context:
- By commonly cited estimates, the printing press (c. 1450) reduced the cost of producing a book by roughly 95%, and book production in Europe rose from ~30,000 volumes in 1450 to ~20 million by 1500. Whatever the precision of those figures, the effect has been put into numbers.
- The paper cites no peer-reviewed study that quantifies ChatGPT's effect on HRM.
If you were to run a self-experiment based on this paper, you would be starting from zero — no baseline effect sizes to power your study, no known minimum meaningful duration, no validated outcome measures.
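To make "no baseline effect sizes to power your study" concrete, the sketch below shows how the number of paired cycles you would need depends on the effect size you assume. It uses the standard normal approximation for a two-sided paired comparison, which is slightly optimistic for small samples. The function name `cycles_needed` and the three effect sizes are illustrative placeholders, not values from the paper, which supplies none.

```python
from math import ceil
from statistics import NormalDist


def cycles_needed(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of paired cycles for a two-sided paired comparison.

    Normal approximation: n = ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardised mean difference between the two conditions.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) / effect_size_d) ** 2)


# Assumed effect sizes only: the paper gives no estimate to plug in here.
for d in (0.5, 1.0, 2.0):
    print(f"assumed d = {d}: about {cycles_needed(d)} cycles")
```

Under these assumptions a medium effect (d = 0.5) would need far more cycles than a short self-experiment offers, while only a very large effect would be detectable in a handful of cycles. Without a prior estimate you cannot know which situation you are in.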
## Limitations
The authors acknowledge several limitations themselves, and a critical reader would note additional ones:
**Acknowledged by authors**:
- "The full consequences are largely undiscovered and uncertain": the paper is explicitly speculative.
- "This is a perspectives editorial, not a systematic review": they do not claim empirical rigour.
- The technology is evolving rapidly, so claims may be outdated within months (the paper was published in 2023; GPT-4 was released in March 2023 and GPT-4o in May 2024).
**Additional limitations a critical reader would note**:
1. **No systematic methodology**: The review is not reproducible. Another team of scholars could review the same literature and reach different conclusions.
2. **Selection bias**: The authors are all HRM academics. They may overemphasise the importance of HRM-specific effects and underemphasise broader economic or technological factors.
3. **Recency bias**: The paper was written in the first months of ChatGPT's public availability. Early adopters' enthusiasm (or fear) may have coloured the authors' perspective.
4. **No negative evidence**: The paper does not cite studies showing that AI in HRM has *failed* to deliver benefits (e.g., Amazon's experimental CV-screening tool, which was reported in 2018 to have been abandoned after it showed bias against women).
5. **No cost-benefit analysis**: The paper discusses risks and benefits qualitatively but never asks: "How much bias is acceptable in exchange for how much efficiency gain?" or "At what adoption rate do risks outweigh benefits?"
6. **Generalisability**: The paper focuses on ChatGPT specifically, but generative AI is a fast-moving field. Claims about ChatGPT may not apply to other models (e.g., Claude, Gemini, Llama) or to future versions.
## Practical takeaways
For someone running their own n=1 experiment using generative AI for HR-related tasks:
### What to test
**Specific intervention**: Use ChatGPT (or another generative AI) to perform one HR task that you currently do manually. Good candidates from this paper:
- Drafting job descriptions for a specific role
- Generating performance review feedback for a direct report
- Writing an employee newsletter or policy update
- Answering common employee questions (e.g., about benefits, leave policies)
- Summarising a long HR document (e.g., a 50-page employee handbook)
**Dose**: For each task, define the exact prompt you will use. For example: "Write a 300-word job description for a mid-level data analyst role at a tech startup, including required skills, responsibilities, and company culture." Run the same prompt 3–5 times to see variability in output.
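One rough way to look at the variability across those 3–5 runs is to paste the outputs into a script and compare them pairwise. The sketch below uses Python's standard-library `difflib.SequenceMatcher`, which measures character-level overlap only and says nothing about factual differences. The function names and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def pairwise_similarity(outputs: list[str]) -> list[float]:
    """SequenceMatcher ratio for every pair (1.0 means identical text)."""
    return [SequenceMatcher(None, a, b).ratio() for a, b in combinations(outputs, 2)]


def variability_summary(outputs: list[str]) -> dict:
    """Summarise how much repeated runs of one prompt differ from each other."""
    if len(outputs) < 2:
        raise ValueError("need at least two outputs to compare")
    ratios = pairwise_similarity(outputs)
    word_counts = [len(text.split()) for text in outputs]
    return {
        "mean_similarity": mean(ratios),
        "min_similarity": min(ratios),
        "shortest_words": min(word_counts),
        "longest_words": max(word_counts),
    }


# Placeholder texts standing in for three runs of the same prompt.
runs = [
    "We are hiring a data analyst to build dashboards and reports.",
    "We are hiring a data analyst to own reporting and dashboards.",
    "Join our startup as a mid-level analyst working on product data.",
]
print(variability_summary(runs))
```

A low minimum similarity suggests that a single run is not representative of what the prompt produces, which is a reason to rate several outputs per cycle, not just one.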
**Comparator**: The "control" condition is doing the task yourself without AI. For a rigorous n=1 test, you would:
1. Complete the task manually (your baseline).
2. Wait at least 48 hours to avoid carryover effects.
3. Complete the same task using ChatGPT.
4. Repeat this cycle 5–10 times to account for day-to-day variation in your own performance and ChatGPT's outputs.
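The cycle above always runs the manual condition first. If you instead vary the order between cycles, as the confounds list recommends for practice effects, a counterbalanced plan can be drawn up in advance so you do not choose the order on the day. This is a minimal sketch: `counterbalanced_orders` and the fixed seed are illustrative choices, not part of the paper.

```python
import random


def counterbalanced_orders(n_cycles: int, seed: int = 0) -> list[tuple[str, str]]:
    """Return one (first, second) condition order per cycle.

    Half of the cycles (rounded up) start with the manual condition and the
    rest start with the AI condition. The list is shuffled with a fixed seed
    so the plan can be regenerated later.
    """
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    manual_first = (n_cycles + 1) // 2
    orders = [("manual", "ai")] * manual_first
    orders += [("ai", "manual")] * (n_cycles - manual_first)
    random.Random(seed).shuffle(orders)
    return orders


for cycle, (first, second) in enumerate(counterbalanced_orders(10), start=1):
    print(f"cycle {cycle}: {first} first, then {second} after the 48-hour gap")
```

Balancing the counts, as opposed to flipping a coin each cycle, avoids ending up with nearly all cycles in one order when the total is as small as 5–10.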
### Minimum meaningful duration
**Per cycle**: 48 hours minimum between manual and AI conditions (to avoid remembering your previous output).
**Total experiment**: 2–4 weeks (5–10 cycles). This is long enough to see consistent patterns but short enough to maintain motivation.
**Why this duration**: A single comparison is meaningless — you might have a good day or a bad day. Multiple cycles let you calculate an average effect and see if ChatGPT consistently saves time, improves quality, or introduces errors.
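The arithmetic behind "2–4 weeks for 5–10 cycles" can be checked with a small calendar sketch. It assumes a 48-hour gap inside each cycle (from the text) plus one rest day before the next cycle starts. The rest day, the function names, and the start date are assumptions made for illustration.

```python
from datetime import date, timedelta


def build_schedule(start: date, n_cycles: int, washout_days: int = 2, rest_days: int = 1):
    """Return (cycle, first_session_day, second_session_day) tuples.

    washout_days is the 48-hour gap between the two conditions of a cycle.
    rest_days is an assumed pause before the next cycle begins.
    """
    schedule = []
    day = start
    for cycle in range(1, n_cycles + 1):
        second = day + timedelta(days=washout_days)
        schedule.append((cycle, day, second))
        day = second + timedelta(days=rest_days)
    return schedule


def total_days(n_cycles: int, washout_days: int = 2, rest_days: int = 1) -> int:
    """Calendar days from the first session to the last, counting both ends."""
    return n_cycles * washout_days + (n_cycles - 1) * rest_days + 1


for n in (5, 10):
    last_day = build_schedule(date(2024, 1, 1), n)[-1][2]
    print(f"{n} cycles: {total_days(n)} days, last session on {last_day}")
```

With these assumptions, 5 cycles span 15 days and 10 cycles span 30 days, which is roughly the 2–4 week window. A 48-hour gap between cycles as well would stretch 10 cycles beyond four weeks.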
### What to measure (specific metrics)
For each task, measure at least three outcomes:
1. **Time**: Use a stopwatch or time-tracking app. Record minutes from start to finish for both manual and AI conditions. For AI, include the time spent writing the prompt, editing the output, and fact-checking.
2. **Quality**: Use a 1–10 self-rating scale for:
   - Accuracy (does it contain factual errors?)
   - Completeness (does it cover all necessary points?)
   - Tone (is it appropriate for the audience?)
   - Originality (does it sound generic or tailored?)
   - Legal/ethical risk (does it contain biased language, privacy violations, or non-compliant statements?)

   Have a colleague (blinded to condition) rate the outputs on the same scales. This reduces self-report bias.
3. **Satisfaction**: Rate your own satisfaction with the process on a 1–10 scale (how stressful, how enjoyable, how confident you are in the output).
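The three outcomes above can be logged in one record per session so that nothing is reconstructed from memory later. The sketch below is one possible layout, not a validated instrument. The field names are hypothetical, and it assumes every quality scale is scored so that higher is better, which means the legal/ethical item is entered as freedom from risk.

```python
from dataclasses import dataclass, field
from statistics import mean

# Assumption: all five scales run 1-10 with higher = better, so the
# legal/ethical item is recorded as "low_risk" (10 = no concerns found).
QUALITY_SCALES = ("accuracy", "completeness", "tone", "originality", "low_risk")


@dataclass
class Trial:
    cycle: int
    condition: str                  # "manual" or "ai"
    drafting_minutes: float         # writing by hand, or waiting for and reading AI output
    prompt_minutes: float = 0.0     # AI condition only
    editing_minutes: float = 0.0
    factcheck_minutes: float = 0.0
    satisfaction: int = 0           # 1-10 self-rating of the process
    self_ratings: dict = field(default_factory=dict)
    blinded_ratings: dict = field(default_factory=dict)

    def total_minutes(self) -> float:
        """Start-to-finish time, including prompt writing, editing and fact-checking."""
        return (self.drafting_minutes + self.prompt_minutes
                + self.editing_minutes + self.factcheck_minutes)

    def mean_quality(self, blinded: bool = True) -> float:
        """Average the five scales from the blinded rater (default) or from yourself."""
        ratings = self.blinded_ratings if blinded else self.self_ratings
        missing = [scale for scale in QUALITY_SCALES if scale not in ratings]
        if missing:
            raise ValueError(f"missing ratings for: {missing}")
        return mean(ratings[scale] for scale in QUALITY_SCALES)


# Hypothetical values for one AI-condition session.
trial = Trial(
    cycle=1, condition="ai", drafting_minutes=4, prompt_minutes=5,
    editing_minutes=10, factcheck_minutes=5, satisfaction=7,
    blinded_ratings={"accuracy": 8, "completeness": 7, "tone": 8,
                     "originality": 6, "low_risk": 9},
)
print(trial.total_minutes(), trial.mean_quality())
```

Keeping prompt, editing, and fact-checking time as separate fields makes it visible when the AI condition only looks faster because checking time was left out.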
### Key confounds to control for
1. **Prompt quality**: A poorly written prompt produces poor output. Standardise your prompt across trials. If you change the prompt, that is a new experiment.
2. **Model version**: ChatGPT changes over time (e.g., GPT-3.5 vs. GPT-4). Note which version you use and do not switch mid-experiment.
3. **Task difficulty**: Some tasks are easier for AI than others. Start with a simple, well-defined task (e.g., drafting a job description) before moving to complex ones (e.g., resolving an employee grievance).
4. **Your own learning**: You will get faster at using ChatGPT over time. This is a confound: your improvement might be due to practice, not the AI itself. To limit it, randomise the order of conditions across cycles (manual first in some cycles, AI first in others) so that practice effects are spread over both conditions.
5. **Fatigue**: If you do both conditions on the same day, your performance on the second task may suffer. Always separate conditions by at least 48 hours.
6. **Expectation bias**: If you believe ChatGPT will save time, you might unconsciously rush through the manual condition. Blind yourself to the hypothesis if possible (e.g., tell yourself you are just "trying two methods" without predicting which will be better).
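The first two confounds (prompt and model version) are easy to violate without noticing. One lightweight safeguard is to store a short fingerprint of the exact prompt and the model label with every session, then check that all sessions match. This is a sketch using standard-library hashing. The function names are hypothetical, and the model label is whatever version string you note down by hand from the interface you use.

```python
import hashlib
import json


def protocol_fingerprint(prompt: str, model_label: str) -> str:
    """Short hash of the exact prompt text and model label.

    Any change, even one character of the prompt, yields a different value.
    """
    payload = json.dumps({"prompt": prompt, "model": model_label}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def protocol_unchanged(logged_fingerprints: list[str]) -> bool:
    """True when every logged session used the same prompt and model label."""
    return len(set(logged_fingerprints)) <= 1


# Hypothetical prompt and hand-recorded model labels.
prompt = "Write a 300-word job description for a mid-level data analyst role."
log = [
    protocol_fingerprint(prompt, "model-version-A"),
    protocol_fingerprint(prompt, "model-version-A"),
    protocol_fingerprint(prompt, "model-version-B"),  # switched mid-experiment
]
print(protocol_unchanged(log[:2]), protocol_unchanged(log))
```

A mismatch does not invalidate earlier sessions, but by the rule above it marks the start of a new experiment.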
### What a positive result would look like
A positive result for your n=1 experiment would be:
- **Time**: ChatGPT consistently saves you at least 20% of the time per task (e.g., 30 minutes manual vs. 24 minutes AI) across 7+ of your 10 cycles.
- **Quality**: Your blinded rater gives ChatGPT outputs an average score of 7/10 or higher on accuracy and completeness, and the AI outputs are not consistently worse than your manual outputs.
- **Satisfaction**: You rate the AI process at least 2 points higher than the manual process on a 10-point scale (e.g., 7/10 vs. 5/10).
- **Error rate**: ChatGPT introduces factual errors in fewer than 1 in 10 outputs. If errors appear more often, the time savings may not be worth the risk.
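The four thresholds above can be written down as a check before the experiment starts, which also helps against moving the goalposts afterwards. The sketch below is one reading of them. The dictionary keys are hypothetical, and "not consistently worse" is interpreted, as an assumption, to mean the AI output scored below the manual output in fewer than 7 of every 10 cycles.

```python
from statistics import mean


def evaluate(cycles: list[dict]) -> dict[str, bool]:
    """Apply the four success criteria to a list of per-cycle records.

    Each record needs: manual_min, ai_min, manual_quality, ai_quality,
    manual_satisfaction, ai_satisfaction, ai_has_error.
    """
    n = len(cycles)
    if n == 0:
        raise ValueError("no cycles recorded")

    # Time: at least 20% saved in at least 7 of every 10 cycles.
    time_wins = sum(
        (c["manual_min"] - c["ai_min"]) / c["manual_min"] >= 0.20 for c in cycles
    )
    time_ok = 10 * time_wins >= 7 * n

    # Quality: blinded mean of 7 or more, and (assumed reading) AI rated
    # below manual in fewer than 7 of every 10 cycles.
    ai_worse = sum(c["ai_quality"] < c["manual_quality"] for c in cycles)
    quality_ok = mean(c["ai_quality"] for c in cycles) >= 7 and 10 * ai_worse < 7 * n

    # Satisfaction: AI process rated at least 2 points higher on average.
    gap = (mean(c["ai_satisfaction"] for c in cycles)
           - mean(c["manual_satisfaction"] for c in cycles))
    satisfaction_ok = gap >= 2

    # Error rate: factual errors in fewer than 1 in 10 AI outputs.
    errors = sum(bool(c["ai_has_error"]) for c in cycles)
    errors_ok = 10 * errors < n

    return {"time": time_ok, "quality": quality_ok,
            "satisfaction": satisfaction_ok, "error_rate": errors_ok}


# Hypothetical log: 10 identical cycles except that one AI output had an error.
example = [dict(manual_min=30, ai_min=20, manual_quality=8, ai_quality=8,
                manual_satisfaction=5, ai_satisfaction=7, ai_has_error=False)
           for _ in range(10)]
example[0]["ai_has_error"] = True
print(evaluate(example))
```

In this made-up log the time, quality, and satisfaction criteria pass, but one error in ten outputs is not "fewer than 1 in 10", so the error-rate criterion fails.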
**Important caveat**: Even a positive n=1 result does not mean ChatGPT is "better" for HRM in general. It only means it works for *you*, for *that specific task*, under *those specific conditions*. The paper provides no population-level evidence to guide you — you are building that evidence yourself.