Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy
- Authors
- Yogesh K. Dwivedi, Nir Kshetri, Laurie Hughes, Emma Slade, Anand Jeyaraj, Arpan Kumar Kar, Abdullah M. Baabdullah, Alex Koohang, Vishnupriya Raghavan, Manju Ahuja, Hanaa Albanna, Mousa Ahmad Albashrawi, Adil S. Al-Busaidi, Janarthanan Balakrishnan, Yves Barlette, Sriparna Basu, Indranil Bose, Laurence Brooks, Dimitrios Buhalis, Lemuria Carter, Soumyadeb Chowdhury, Tom Crick, Scott W. Cunningham, Gareth H. Davies, Robert M. Davison, Rahul Dé, Denis Dennehy, Yanqing Duan, Rameshwar Dubey, Rohita Dwivedi, John S. Edwards, Carlos Flavián, Robin Gauld, Varun Grover, Mei‐Chih Hu, Marijn Janssen, Paul Jones, Iris Junglas, Sangeeta Khorana, Sascha Kraus, Kai R. Larsen, Paul Latreille, Sven Laumer, Tegwen Malik, Abbas Mardani, Marcello Mariani, Sunil Mithas, Emmanuel Mogaji, Jeretta Horn Nord, Siobhán O’Connor, Fevzi Okumus, Margherita Pagani, Neeraj Pandey, Savvas Papagiannidis, Ilias O. Pappas, Nishith Pathak, Jan Pries‐Heje, Ramakrishnan Raman, Nripendra P. Rana, Sven‐Volker Rehm, Samuel Ribeiro‐Navarrete, Alexander Richter, Frantz Rowe, Suprateek Sarker, Bernd Carsten Stahl, Manoj Tiwari, Wil van der Aalst, Viswanath Venkatesh, Giampaolo Viglia, Michael Wade, Paul Walton, Jochen Wirtz, Ryan Wright
- Journal
- International Journal of Information Management
- Year
- 2023
- Citations
- 3,487
TL;DR
This opinion paper synthesises perspectives from 43 experts across 13 fields to map the opportunities, risks, and research gaps of generative AI like ChatGPT — concluding that while the technology can boost productivity in banking, hospitality, and marketing, it also introduces unresolved threats around bias, misinformation, privacy, and the erosion of human judgment, with no consensus on whether regulation is needed.
What they tested
This is not an experiment. It is a **multi-author opinion piece** — a structured collection of expert commentaries. The “intervention” is the emergence of generative conversational AI (specifically ChatGPT and similar large language models). The “comparator” is the status quo (pre-generative-AI workflows, policies, and ethical frameworks). The “outcome measures” are expert-identified opportunities, challenges, and research priorities across three thematic areas: (1) knowledge, transparency, and ethics; (2) digital transformation of organisations and societies; and (3) teaching, learning, and scholarly research.
The paper does not test a hypothesis. Instead, it aggregates and organises expert opinion to produce a research agenda. The 43 contributors were asked to write short commentaries on ChatGPT’s implications for their field. The authors then synthesised these into a structured framework.
Who was studied
No human subjects were studied. The “sample” is **43 expert contributors** from the following fields:
- Computer science
- Marketing
- Information systems
- Education
- Policy/government
- Hospitality and tourism
- Management
- Publishing
- Nursing
- Law
- Psychology
- Economics
- Ethics/philosophy
All contributors are academics or industry practitioners with domain expertise. They appear to have been invited on the basis of publication record and recognised expertise, although the paper does not state explicit selection criteria. The paper does not report demographics (age, gender, career stage, geographic location) of the contributors. This is a **non-empirical, qualitative synthesis**, not a systematic review or meta-analysis.
How they measured it
No quantitative measurement instruments were used. The “measurement” was a **structured qualitative process**:
1. **Invitation and commentary solicitation:** Each expert was asked to write a 500–1000 word commentary addressing: (a) key opportunities of generative AI in their field, (b) key challenges/risks, and (c) priority research questions.
2. **Thematic analysis:** The lead authors (Dwivedi et al.) read all 43 commentaries and used an inductive coding approach to identify recurring themes. They grouped these into three overarching thematic areas.
3. **Synthesis and reporting:** The authors wrote a narrative summary of each theme, including direct quotes from contributors, and then generated a list of research questions.
There is no inter-rater reliability statistic, no coding framework validation, and no quantitative measure of agreement or disagreement among contributors. The paper reports that “opinion is split” on whether ChatGPT’s use should be restricted or legislated, but does not quantify the split. A tally such as “14 of 43 experts favoured restriction, 29 opposed” (a hypothetical illustration, not a figure from the paper) is never given.
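To make the missing inter-rater reliability statistic concrete, here is a minimal sketch of Cohen's kappa, one common agreement measure for two coders. It assumes two people independently assigned one theme label to each commentary. The function name `cohens_kappa` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each labelled the same items once."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical theme labels for six commentaries (invented for illustration).
coder_a = ["ethics", "ethics", "education", "transformation", "education", "ethics"]
coder_b = ["ethics", "education", "education", "transformation", "education", "ethics"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

A kappa near 1 indicates agreement well beyond chance, and a value near 0 indicates agreement no better than chance. Reporting a figure like this would let readers judge how reproducible the three-theme grouping is.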
Methodology
**Study design:** This is a **multi-author opinion paper** — a type of scholarly commentary that aggregates expert perspectives without systematic review methodology. It is not a systematic review, meta-analysis, randomised trial, or observational study. It is best classified as a **narrative synthesis of expert opinion**.
**How the design works:**
1. The lead authors selected 43 experts based on reputation and field coverage.
2. Each expert wrote an independent commentary.
3. The lead authors then read all commentaries, identified common themes, and wrote a synthesised narrative.
4. The final paper includes a list of research questions derived from the commentaries.
**What this design can prove:**
- It can identify the range of expert opinions on a novel topic.
- It can generate hypotheses and research agendas.
- It can highlight areas of consensus and disagreement among a specific group of experts.
**What this design cannot prove:**
- It cannot establish causal relationships (e.g., “ChatGPT causes productivity gains of X%”).
- It cannot quantify effect sizes, prevalence, or statistical significance.
- It cannot generalise beyond the 43 experts selected; their views may not represent the broader scientific community, industry practitioners, or policymakers.
- It cannot rule out selection bias: experts were chosen by the lead authors, who may have favoured contributors with certain viewpoints.
- It cannot provide objective evidence: the “findings” are opinions, not data.
**Major methodological weaknesses:**
- **No systematic search strategy:** Unlike a systematic review, the authors did not search databases, apply inclusion/exclusion criteria, or assess study quality. They simply invited known experts.
- **No transparency on selection criteria:** The paper does not state how experts were identified, how many were invited vs. agreed, or why specific fields were included/excluded.
- **No quantitative synthesis:** There is no tally of how many experts raised each point, no measure of agreement strength, and no formal qualitative analysis method (e.g., grounded theory, framework analysis).
- **No conflict of interest disclosure for contributors:** The paper does not report whether any contributors had financial ties to OpenAI, Microsoft, Google, or other AI companies.
- **No peer review of individual commentaries:** The 43 commentaries were not independently peer-reviewed before inclusion.
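The kind of tally that the paper omits can be very small. The sketch below assumes each commentary has been hand-coded with a single stance on regulation. The stance labels and the `tally` helper are hypothetical, and the data is invented purely to show the shape of the output; the paper reports no such counts.

```python
from collections import Counter

# Hypothetical stance codes, one per commentary (invented, not from the paper).
stances = ["restrict", "oppose", "oppose", "unclear", "restrict", "oppose"]

def tally(labels):
    """Map each label to (count, percentage of all labels), most common first."""
    counts = Counter(labels)
    total = len(labels)
    return {
        label: (count, round(100 * count / total, 1))
        for label, count in counts.most_common()
    }

summary = tally(stances)
for label, (count, pct) in summary.items():
    print(f"{label}: {count} of {len(stances)} ({pct}%)")
```

Even a table this simple would tell a reader whether “opinion is split” describes a near-even division or a small dissenting minority.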
**Duration:** The paper was submitted in February 2023 and published in March 2023 — a very rapid turnaround. This reflects the fast-moving nature of the topic but also means the paper captures only the very early months of ChatGPT’s public availability (launched November 2022). Many of the claims and concerns may already be outdated.
Key findings
Because this is an opinion paper, there are no statistical results. Below are the **thematic findings** as reported by the authors, organised by the three thematic areas.
### Theme 1: Knowledge, transparency, and ethics
**Opportunities identified:**
- ChatGPT can democratise access to information and reduce barriers to knowledge creation.
- It can assist with drafting, summarising, and translating content across languages.
- It may reduce repetitive cognitive labour, freeing humans for higher-level analysis.
**Challenges identified:**
- **Bias:** Training data contains societal biases (racial, gender, cultural). ChatGPT can amplify these biases. The paper notes that “biases attributable to training datasets and processes” require urgent research.
- **Misinformation:** ChatGPT can generate convincing but factually incorrect text. The paper warns that “the technology has the potential to generate and spread misinformation at scale.”
- **Privacy:** User inputs may be stored and used for model training, raising concerns about data confidentiality.
- **Accountability:** It is unclear who is legally responsible when ChatGPT produces harmful or defamatory content.
- **Transparency:** Users cannot easily determine whether text was generated by AI, and the model does not cite sources for its claims.
**Split opinion on regulation:** The paper states that “opinion is split on whether ChatGPT’s use should be restricted or legislated.” Some contributors argued for immediate regulation (e.g., mandatory watermarking, disclosure requirements), while others argued that regulation would stifle innovation and that existing laws (e.g., defamation, copyright) are sufficient.
### Theme 2: Digital transformation of organisations and societies
**Opportunities identified:**
- **Banking:** ChatGPT can automate customer service, fraud detection queries, and financial advice (with caveats).
- **Hospitality and tourism:** It can power chatbots for booking, recommendations, and multilingual support.
- **Information technology:** It can assist with code generation, debugging, and documentation.
- **Marketing:** It can generate ad copy, social media content, and personalised email campaigns.
- **Management:** It can draft reports, summarise meetings, and assist with strategic planning.
**Challenges identified:**
- **Job displacement:** The paper notes that “generative AI is likely to disrupt white-collar professions” but does not quantify the risk.
- **Over-reliance:** There is a risk that organisations become dependent on AI outputs without critical human oversight.
- **Quality control:** AI-generated content may contain errors that are difficult to detect, especially in specialised domains.
- **Intellectual property:** It is unclear whether AI-generated content can be copyrighted, and whether training on copyrighted data constitutes infringement.
### Theme 3: Teaching, learning, and scholarly research
**Opportunities identified:**
- **Education:** ChatGPT can serve as a personalised tutor, provide feedback on writing, and generate practice questions.
- **Research:** It can assist with literature searches, summarising papers, drafting grant proposals, and generating code for data analysis.
- **Accessibility:** It can help students with disabilities (e.g., dyslexia, visual impairments) by generating alternative text or simplifying complex language.
**Challenges identified:**
- **Academic integrity:** Students may use ChatGPT to write essays or complete assignments, undermining assessment validity.
- **Plagiarism detection:** Current tools (e.g., Turnitin) are not reliable at detecting AI-generated text.
- **Loss of critical thinking:** Over-reliance on AI may reduce students’ ability to formulate arguments, evaluate sources, and think independently.
- **Research integrity:** The paper warns that “generative AI could be used to fabricate data, generate fake citations, or write fraudulent papers.”
- **Peer review:** AI-generated reviews may be indistinguishable from human reviews, potentially corrupting the peer review process.
### Research questions identified (selected examples)
The paper lists 12 specific research questions. Key ones include:
1. What skills, resources, and capabilities are needed to handle generative AI?
2. How can biases in generative AI be identified and mitigated?
3. What business and societal contexts are best suited for generative AI implementation?
4. What is the optimal combination of human and generative AI for various tasks?
5. How can the accuracy of text produced by generative AI be assessed?
6. What are the ethical and legal issues in using generative AI across different contexts?
Effect magnitude
There are no effect sizes, confidence intervals, or p-values in this paper. The “effect” is the identification of a research agenda. In plain English:
- The paper does not claim that ChatGPT causes any specific outcome. It reports that **43 experts believe** ChatGPT offers productivity gains in banking, hospitality, IT, and marketing, but provides no data on the magnitude of those gains.
- The paper reports that **opinion is split** on regulation, but does not say how many experts favoured vs. opposed regulation.
- The paper identifies **bias** as a concern but provides no measurement of bias magnitude. A hypothetical example of such a measurement would be “ChatGPT is 23% more likely to associate women with domestic roles than men”.
- The paper identifies **misinformation risk** but provides no prevalence data. A hypothetical example would be “ChatGPT produces factually incorrect statements in 18% of responses on medical topics”.
If you want to quantify the “effect” of this paper itself: it has been cited thousands of times (3,487 by the count listed above), making it one of the most cited early papers on ChatGPT’s societal implications. Its primary effect has been to shape the research agenda and public discourse around generative AI.
Limitations
### Acknowledged by the authors:
- The paper is an “opinion piece” and does not claim to be a systematic review.
- The authors note that “the views expressed are those of the individual contributors and do not necessarily represent the views of their institutions.”
- The authors call for “further research” across all three thematic areas, implicitly acknowledging the preliminary nature of their findings.
### Not acknowledged (critical reader observations):
1. **Selection bias:** The 43 experts were chosen by the lead authors. There is no evidence that the sample is representative of the broader academic or professional community. Experts with strong pro-AI or anti-AI views may have been over- or under-represented.
2. **No systematic search:** Unlike a systematic review, the authors did not search databases (e.g., Scopus, Web of Science) for relevant literature. They relied entirely on invited commentaries. This means the paper may miss important perspectives published in journals or by authors outside the authors’ networks.
3. **No quantitative analysis:** The paper does not report how many experts raised each point, making it impossible to assess the strength of consensus. For example, “opinion is split” could mean 22 vs. 21 or 40 vs. 3 — the reader cannot tell.
4. **Rapid publication timeline:** The paper was submitted in February 2023, only three months after ChatGPT’s launch. Many of the claims (e.g., about capabilities, limitations, and risks) may be outdated given rapid model improvements (GPT-4 was released in March 2023, after this paper was submitted).
5. **No conflict of interest disclosures:** The paper does not report whether any contributors had financial ties to AI companies. Given that several contributors work in computer science and information systems, undisclosed conflicts are possible.
6. **No empirical validation:** The paper’s claims about productivity gains, bias, and misinformation are based on expert opinion, not empirical data. For example, the claim that ChatGPT “offers significant gains in banking” is not supported by any controlled experiment or field study.
7. **Field coverage gaps:** The paper includes 13 fields but gives little or no coverage to others that are directly affected by generative AI, such as journalism, medicine (beyond nursing), the creative arts, and military/defence. Law is listed among the 13 fields but receives only passing mention.
8. **No discussion of environmental costs:** The paper does not mention the energy consumption and carbon footprint of training and running large language models — a significant ethical and practical concern.
Practical takeaways
For someone running their own n=1 experiment on using generative AI (e.g., ChatGPT) to improve personal productivity, learning, or decision-making:
### What to test
**Intervention:** Using ChatGPT (or a similar model like Claude or Gemini) as a writing assistant, research summariser, or brainstorming tool.
**Dose:** Specify the exact task and frequency. For example:
- “Use ChatGPT to draft all work emails for 2 weeks, then compare time spent and quality vs. writing from scratch.”
- “Use ChatGPT to summarise 5 research papers per day for 1 week, then test recall vs. reading full papers.”
- “Use ChatGPT to generate 10 marketing taglines per day for 1 week, then A/B test click-through rates vs. human-generated taglines.”
### Minimum meaningful duration
- **For productivity tasks:** 1–2 weeks per condition (AI-assisted vs. manual). This is long enough to overcome the novelty effect and learning curve.
- **For learning/retention:** 4–6 weeks. Learning effects take time to stabilise, and you need to measure delayed recall (e.g., test knowledge 1 week after using ChatGPT vs. after traditional study).
- **For creative tasks:** 2–3 weeks per condition. Creative output quality can vary day-to-day, so longer periods reduce noise.
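One way to turn these durations into a concrete calendar is sketched below. It assumes a simple two-condition design (AI-assisted vs. manual) run in consecutive blocks, with the order of the blocks chosen at random and an optional wash-in period first. The function `plan_blocks`, its parameters, and the start date are all hypothetical choices for illustration.

```python
import random
from datetime import date, timedelta

def plan_blocks(start, days_per_condition, wash_in_days=0, seed=None):
    """Return (label, first_day, last_day) tuples for a two-condition n=1 trial."""
    order = ["ai_assisted", "manual"]
    random.Random(seed).shuffle(order)  # randomise which condition runs first
    blocks = []
    cursor = start
    if wash_in_days:
        # Optional wash-in: use the tool but do not record data yet.
        blocks.append(("wash_in", cursor, cursor + timedelta(days=wash_in_days - 1)))
        cursor += timedelta(days=wash_in_days)
    for label in order:
        blocks.append((label, cursor, cursor + timedelta(days=days_per_condition - 1)))
        cursor += timedelta(days=days_per_condition)
    return blocks

# Example: 1-week wash-in, then 2 weeks per condition (dates are arbitrary).
plan = plan_blocks(date(2025, 1, 6), days_per_condition=14, wash_in_days=7, seed=1)
for label, first, last in plan:
    print(f"{label:12s} {first} to {last}")
```

Randomising the block order is a design choice added here, not something the paper or this summary prescribes. It guards against always running the manual condition last, when fatigue or seasonal workload could be mistaken for a tool effect.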
### What to measure (specific metrics)
- **Time:** Track minutes spent per task (e.g., “time to draft a 500-word email”).
- **Quality:** Use a 1–10 self-rating scale for clarity, accuracy, and usefulness. For objective tasks (e.g., code generation), measure error rate or pass/fail on a test suite.
- **Satisfaction:** Rate on a 1–7 Likert scale: “I felt confident in the output,” “I would use this approach again.”
- **Cognitive load:** After each session, rate on a 1–5 scale: “How mentally effortful was this task?”
- **For learning:** Score on a standardised quiz (e.g., 10 multiple-choice questions) taken immediately after study and again 1 week later.
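Once the time metric has been logged for both conditions, a permutation test is one assumption-light way to check whether the difference in means is larger than chance would produce. The sketch below is illustrative only: `permutation_test` is a hypothetical helper, and the minutes shown are invented numbers, not results from any study.

```python
import random
from statistics import mean

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in means of two samples."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign observations to conditions at random and recompute the gap.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_resamples + 1)

# Hypothetical minutes per email draft under each condition (invented data).
ai_minutes = [6, 7, 5, 8, 6, 7, 5]
manual_minutes = [11, 9, 12, 10, 13, 9, 11]
diff, p_value = permutation_test(ai_minutes, manual_minutes)
print(f"mean difference = {diff:.2f} minutes, p = {p_value:.4f}")
```

The test treats each logged task as exchangeable between conditions, which is only approximately true for one person's consecutive workdays. Treat the p-value as a rough guide rather than a formal result, and apply the same approach to the quality, satisfaction, and cognitive-load ratings if you want comparable summaries.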
### Key confounds to control for
1. **Novelty effect:** ChatGPT is new and exciting. The first week of use may show inflated productivity due to motivation, not the tool itself. Run a 1-week “wash-in” period before collecting data.
2. **Task difficulty:** Some tasks are easier to automate than others. Hold task difficulty constant across conditions (e.g., always summarise papers of similar length and complexity).
3. **Time of day:** Cognitive performance varies diurnally. Do AI-assisted and manual tasks at the same time each day.
4. **Model version:** ChatGPT changes frequently. Note the exact model version (e.g., GPT-4, GPT-3.5) and date of use.
5. **Prompt quality:** The quality of ChatGPT’