June 5, 2026

The Real AI Bottleneck in Research Isn’t Writing: You aimed AI at the wrong half of the job. Here is the half that pays.

by Paras Karmacharya

AI helped you write three drafts in record time.

But none of them are papers yet.

That is the trap many clinical researchers are walking into. We aim AI at the most visible part of academic writing: the introduction, the discussion, the abstract, the paragraph that refuses to sound right. AI makes those parts faster, so it feels like we solved the problem.

But the real bottleneck was never the blank page.

A draft is not a manuscript. A manuscript is not a submission. A submission is not an acceptance. And an accepted paper is not the same as a paper that changes how someone thinks, studies, teaches, or treats a patient.

That gap is where most of the value leaks.

A recent NBER working paper by Mert Demirer, Leon Musolff, and Liyuan Yang, Writing Code vs. Shipping Code: Productivity Effects Across Generations of AI Coding Tools, studied more than 100,000 GitHub developers using AI coding tools. The key finding was not just that AI increased coding activity. It was that the gains shrank sharply as work moved downstream from code generation to commits, projects, and releases. In other words, AI made production faster, but human bottlenecks still determined what actually shipped.

Like software development, clinical research moves through a production chain: research question, study design, data quality, analysis, interpretation, core message, figures, draft, co-author review, citation verification, journal fit, peer review, revision, acceptance, publication, and influence.

AI makes one part of that chain much faster.

It does not automatically fix the rest.

Used carelessly, it can make the rest heavier. You do not get more papers. You get more polished text that needs a human expert to rescue it.

That is why so many researchers feel the same contradiction right now. They are writing faster. They are producing more. They have more outlines, abstracts, and half-built manuscripts. And yet the number of papers that actually cross the finish line barely moves.

The bottleneck was never the writing.

The bottleneck was the thinking.

The manuscript factory got faster. The publication pipeline did not.

AI is very good at producing plausible academic prose. Give it a dataset summary, a few results, and a target journal, and it can generate an introduction, methods, results, discussion, abstract, title, cover letter, and response-to-reviewers template.

That is not nothing.

For early-career researchers, especially those writing in English as an additional language or learning academic writing conventions for the first time, this can reduce real friction. In a preregistered experiment on professional writing tasks, Noy and Zhang found that ChatGPT reduced average task time by 40% and improved output quality by 18%.

So yes, AI can help people write faster.

But clinical research writing is not just writing.

The hard problems come before and after the draft. Before the draft, you have to know what the paper is really saying. After the draft, you have to prove it.

Most researchers use AI as a content generator. That is where the trouble begins.

At the bottom of the manuscript chain are tasks like wording, formatting, summarizing, outlining, and turning bullet points into paragraphs. AI is excellent here. These tasks are visible, repeatable, and easy to delegate.

Higher up the chain are tasks like deciding whether the research question matters, whether the analysis supports the claim, whether the clinical implication is overstated, whether the introduction frames the real gap, whether the discussion says something new, and whether the paper deserves to exist.

That is where academic writing becomes academic thinking.

That is also where AI can help the most, if you use it correctly.

The problem is that lower-chain tasks give faster dopamine. A rough paragraph becomes polished in seconds. A messy abstract becomes clean in one prompt. A blank discussion becomes a five-paragraph discussion before your coffee cools.

But speed at the bottom can create drag at the top.

Now your co-authors are reviewing more text. Your mentor is correcting more generic claims. Your statistician is clarifying more overstatements. Your future self is untangling paragraphs that sound good but do not quite say the right thing.

This is the academic version of technical debt.

Call it “manuscript debt”.

Manuscript debt is what happens when you generate polished text before you have done the hard thinking.

It looks like progress early.

It becomes cleanup later.

AI makes some work better and some work worse.

One of the best studies for understanding this is the Harvard Business School and Boston Consulting Group experiment on knowledge workers.

The researchers studied 758 consultants using GPT-4 on realistic knowledge-work tasks. For tasks inside the model’s capability range, AI users completed 12.2% more tasks, worked 25.1% faster, and produced higher-quality solutions. But for a task outside the model’s capability range, AI users were 19 percentage points less likely to produce correct solutions than people who did not use AI. The authors call this uneven boundary of AI capability the “jagged technological frontier.”

That is exactly the nuance clinical researchers need.

AI does not simply help or hurt. It depends on the task.

Rewriting a paragraph for clarity is often a good AI task. Generating ten possible titles is often a good AI task. Turning rough bullet points into a cleaner outline is often a good AI task.

But checking whether a discussion claim outruns the study design is higher risk. Interpreting a subgroup finding is higher risk. Summarizing what a body of clinical literature truly supports is higher risk. Creating citations from memory is not something you should ask AI to do.

That is the central distinction.

The sentence-level task is often safe to delegate.

The judgment-level task is not.

AI increases output. It does not automatically increase originality.

AI often improves the average version of a thing: the average title, the average abstract, the average introduction, the average limitation paragraph, the average “future research is needed” ending.

That can be helpful when you are learning the structure of a paper. It becomes dangerous when the output starts sounding like every other paper.

A Science Advances study found a similar pattern in creative writing. Access to generative AI improved individual story ratings, especially for less creative writers, but AI-assisted stories became more similar to one another. The authors described this as an increase in individual creativity at the risk of losing collective novelty.

Peer reviewers are not asking whether your manuscript sounds like a manuscript. They are asking: What is new here? Why does this matter? Did the design answer the question? Are the claims supported? Is the interpretation fair? Does the paper add something beyond what we already know?

That is why generic AI prose can be so damaging.

It does not look bad. It looks acceptable. It uses the right academic phrases. It checks the surface boxes.

But the gap is recycled. The novelty is vague. The discussion could fit twenty other papers. The clinical implication sounds safe enough to be meaningless.

Average prose will not save a weak message.

Aim AI before the draft, not just during the draft.

The quality of anything AI writes is capped by what you feed it.

Garbage question in, generic manuscript out.

So move AI upstream.

Before you write a single section, use AI to define the five story elements of your paper:

Your two to three key findings
The single core message those findings support
The broader clinical or scientific problem
The precise research gap
The implication for clinicians, patients, researchers, or policy

Most weak manuscripts fail because these elements are not aligned. The introduction frames one gap. The results answer another. The discussion claims a third. The conclusion reaches for an implication the data cannot support.

The paper feels unfocused because the authors never decided what the paper was really about.
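One way to force that decision is to write the five elements down as a structured record before any drafting. The sketch below is only an illustration: the `MessageMap` class and its field names are hypothetical, not part of any tool mentioned here. It checks that the map is complete. It cannot judge whether the elements are aligned. That part stays with you.

```python
from dataclasses import dataclass, field


@dataclass
class MessageMap:
    """The five story elements, written down before any drafting starts.

    Hypothetical structure for illustration only.
    """
    key_findings: list = field(default_factory=list)
    core_message: str = ""
    broader_problem: str = ""
    research_gap: str = ""
    implication: str = ""

    def problems(self):
        """Return plain-language reasons the map is not ready for drafting."""
        issues = []
        # The article asks for two to three key findings, no more.
        if not 2 <= len(self.key_findings) <= 3:
            issues.append("List two to three key findings.")
        for name in ("core_message", "broader_problem", "research_gap", "implication"):
            if not getattr(self, name).strip():
                issues.append(f"Fill in: {name.replace('_', ' ')}.")
        return issues


# Usage: an incomplete map reports what is still undecided.
draft_map = MessageMap(
    key_findings=["[finding 1]", "[finding 2]"],
    core_message="[single core message]",
)
for issue in draft_map.problems():
    print(issue)
```

An empty list from `problems()` means only that every element has been filled in. Whether the gap, the findings, and the implication point at the same paper is still a human call.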

Lingard and Watling’s classic advice to health researchers is to write the “story,” not just the “study.” Their point is not that researchers should exaggerate. It is that a persuasive scientific manuscript needs a coherent through-line that helps the reader understand what the work adds and why it matters.

AI can help you find that through-line.

But only if you ask it to think before you ask it to write.

Use this prompt:

I am writing a clinical research manuscript. Here are my study objective, population, methods, and key findings. Help me identify the strongest single core message. Then list which findings support that message, which findings distract from it, and which claims would be too strong based on the data. Be skeptical. Do not write the manuscript yet.

That last sentence matters.

Most researchers ask AI to draft too early. Ask it to think with you first.

Your goal is not to get a paragraph. Your goal is to get clarity.

Once the core message is clear, every section becomes easier. The introduction sets up the gap. The methods establish credibility. The results deliver the evidence. The discussion interprets the meaning. The conclusion lands the contribution.

That is how a manuscript starts to feel inevitable.

Not because the prose is pretty.

Because the logic is aligned.

Write the introduction last.

Many early-career researchers start with the introduction because it comes first in the paper.

That is understandable.

It is also often inefficient.

The introduction should not be a broad literature review with a study objective attached at the end. The introduction is a runway. It should move the reader from the broader problem to the specific gap to your study question.

But you cannot frame that runway well until you know where the paper lands.

That means you need to understand your findings and message before you finalize the introduction.

AI can help you reverse engineer the introduction from the message.

Try this:

Here is the core message of my study: [insert message]. Here are the two to three findings that support it: [insert findings]. Help me design the introduction backward. What broader problem should I open with? What specific gap should I emphasize? What prior literature should I position against? What should I avoid claiming because my data do not address it?

This is where AI becomes useful.

Not because it gives you finished prose.

Because it gives you options.

It can show you several possible framings: a clinical burden framing, a methods gap framing, a treatment decision framing, a health equity framing, a pathophysiology framing, or a real-world evidence framing.

Then you choose.

That choice is the job.

Do not outsource it.
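If you reuse the backward-design prompt across projects, a small template can stop you from running it before the message exists. This is a minimal sketch, assuming you paste the result into whatever AI tool you use. The function name `build_intro_prompt` is hypothetical, and the prompt text is the one given above.

```python
INTRO_PROMPT = (
    "Here is the core message of my study: {message}. "
    "Here are the two to three findings that support it: {findings}. "
    "Help me design the introduction backward. "
    "What broader problem should I open with? "
    "What specific gap should I emphasize? "
    "What prior literature should I position against? "
    "What should I avoid claiming because my data do not address it?"
)


def build_intro_prompt(message, findings):
    """Fill the prompt, refusing to run without a message and 2-3 findings."""
    if not message.strip():
        raise ValueError("Decide the core message before asking for framings.")
    if not 2 <= len(findings) <= 3:
        raise ValueError("Provide two to three supporting findings.")
    # Strip trailing periods so the template's own punctuation is not doubled.
    clean_findings = "; ".join(f.strip().rstrip(".") for f in findings)
    return INTRO_PROMPT.format(
        message=message.strip().rstrip("."),
        findings=clean_findings,
    )


# Usage with placeholder text; replace with your own message and findings.
print(build_intro_prompt("[core message]", ["[finding 1]", "[finding 2]"]))
```

The guard clauses are the point. The template will not produce a prompt until you have done the deciding.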

Use AI as a critic, not a co-author.

The most useful AI output is often not the draft it writes.

It is the critique it gives.

There is now direct evidence that LLMs can provide useful research-paper feedback when used carefully. In an NEJM AI study, Liang and colleagues compared GPT-4 feedback with human reviewer comments across Nature-family journals and ICLR submissions. The overlap between GPT-4 feedback and human reviewer feedback was comparable to the overlap between two human reviewers. In a prospective user study, 57.4% of researchers found the feedback helpful or very helpful (paper feedback).

That does not mean AI replaces peer review.

It means AI can help you find problems before peer review.

Once you have a real draft, give AI a harder role. Ask it to behave like a skeptical reviewer for your target journal. Ask where the logic breaks, what the reader will not believe, which claim is overstated, what is missing from the limitations, and whether the conclusion follows from the results.

Try this:

Act as a skeptical peer reviewer for [target journal]. Read this discussion section. Identify the three claims most likely to be challenged. For each, explain why a reviewer might object, what evidence would be needed to support it, and how I could revise the language to make the claim more precise without weakening the paper.

This is a much better use of AI than asking:

Write my discussion section.

A generated discussion gives you text.

A skeptical critique gives you leverage.

It shows you where the manuscript may fail before a reviewer does.

That matters because peer review rarely rejects papers for one awkward sentence. Papers get rejected because the story is unclear, the contribution is weak, the methods do not support the claim, the novelty is not obvious, or the discussion reads like a generic summary instead of disciplined interpretation.

AI can help you find those problems early.

But only if you ask for critique instead of comfort.

Your judgment is the quality-control system.

AI cannot be your author.

That is the position of major publication bodies.

The International Committee of Medical Journal Editors states that authors should not list or cite AI as an author, and that humans are responsible for reviewing AI-generated content because it can be “incorrect, incomplete, or biased” (ICMJE guidance). COPE makes the same point: AI tools cannot meet authorship requirements because they cannot take responsibility for submitted work (COPE position).

That should not be viewed only as a rule.

It is a reminder of what authorship means.

Authorship is not typing. Authorship is responsibility.

You are responsible for the question, the claim, the interpretation, and the final paper.

AI can help you get there.

It cannot carry that responsibility for you.

So create a rule for your team: every AI-touched claim must have a human owner.

If AI helped summarize background literature, someone must verify the source papers. If AI suggested an interpretation, someone must check it against the data. If AI drafted the limitations, someone must make sure the real limitations are included. If AI helped with the response to reviewers, someone must confirm that the response is accurate, respectful, and fully addresses the critique.

The standard is simple:

No claim enters the manuscript unless a human author can defend it.

Not the model.

Not the prompt.

Not the chatbot.

An author.
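A team can track the human-owner rule with nothing more than a shared list. The sketch below is one possible shape for that list, not a prescribed system. The `Claim` fields and the `blocked_claims` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    ai_touched: bool
    owner: Optional[str] = None  # human author who can defend the claim
    verified: bool = False       # owner has checked it against sources or data


def blocked_claims(claims):
    """Return AI-touched claims that lack a human owner or verification."""
    return [c for c in claims if c.ai_touched and not (c.owner and c.verified)]


# Usage with placeholder claims.
claims = [
    Claim("[background statement summarized with AI]", ai_touched=True,
          owner="First author", verified=True),
    Claim("[interpretation suggested by AI]", ai_touched=True),
    Claim("[primary result written by hand]", ai_touched=False),
]
for claim in blocked_claims(claims):
    print("Not ready for the manuscript:", claim.text)
```

In this example only the second claim is blocked: AI touched it, and no author has taken ownership or verified it.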

Build a citation wall between AI and the final manuscript.

Do not let AI invent your citation trail. Do not ask it for references and paste them into your manuscript. Do not cite a paper because AI said it exists. Do not cite a paper because the title sounds right. Do not cite a paper because the abstract appears to support your sentence.

Open the paper.

Read the relevant section.

Check whether the claim in your manuscript is actually supported.

The risk is not theoretical. Studies have documented fabricated or inaccurate references in AI-generated scholarly and medical content. Walters and colleagues studied fabricated bibliographic citations in GPT-generated outputs. A 2026 Nature analysis warned that hallucinated references are now appearing in the scientific literature at scale.

In my experience, about 10 to 20% of the citations that AI provides are still hallucinated, so always verify them.

A safer workflow is:

Use AI to identify what type of evidence you need.
Search the literature yourself in PubMed, Google Scholar, Embase, Scopus, or your institutional tools.
Open the actual papers.
Extract the relevant claim, population, method, and limitation.
Use AI to organize the evidence matrix, not to replace the reading.

Use this prompt:

I need to support the following claim in my introduction: [insert claim]. Do not provide citations. Instead, tell me what type of source would be appropriate, what search terms I should use, what study designs would be strongest, and what evidence would be insufficient.

That prompt keeps AI where it belongs.

It helps you think about the evidence need.

It does not fabricate the evidence.
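The evidence matrix from the safer workflow above can live in a plain spreadsheet. Here is a minimal sketch of one, assuming a CSV file is enough for your team. The column names and the `read_by` rule are my illustration of the workflow, not a standard format.

```python
import csv
import io

# Hypothetical columns: one row per paper a human has actually opened and read.
FIELDS = ["manuscript_claim", "source", "population", "method", "limitation", "read_by"]


def evidence_matrix_csv(rows):
    """Write evidence rows to CSV text, refusing rows nobody has read."""
    for row in rows:
        if not row.get("read_by"):
            raise ValueError(f"No human reader recorded for: {row.get('source')!r}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Usage with placeholder text; fill each cell from the paper itself.
rows = [
    {
        "manuscript_claim": "[sentence in your introduction]",
        "source": "[paper you opened]",
        "population": "[who was studied]",
        "method": "[study design]",
        "limitation": "[main caveat]",
        "read_by": "[author initials]",
    }
]
print(evidence_matrix_csv(rows))
```

The `read_by` column is the citation wall in miniature. A source with no named reader does not make it into the matrix.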

The better workflow: from draft factory to publication engine

Most researchers use AI like this:

Data summary → draft manuscript → edit → submit.

That looks efficient.

It is often fragile.

A stronger workflow looks like this.

1. Build the message map.

Before drafting, define the core message, key findings, gap, contribution, and implication. Ask AI to identify misalignment.

2. Pressure-test journal fit.

Before polishing, test whether the paper fits the target journal. Ask AI what that journal’s readers would care about. Then verify by reading recent papers from the journal yourself.

3. Check figure and table logic.

Before writing paragraphs, make sure the tables and figures tell the story. Ask AI whether a reader could understand the paper’s main message from the display items alone.

4. Outline before prose.

For the discussion, map the main finding, comparison with literature, interpretation, implication, limitation, and future direction before drafting paragraphs.

5. Draft after the thinking is done.

Now draft. By this point, AI has been aimed at the thinking layer, so the draft has a better chance of being specific.

6. Run a skeptical review.

Ask AI to review the manuscript like a hostile reviewer. Then ask a human expert. Do not confuse the two.

7. Audit every claim.

List every claim that requires evidence. Assign each claim to a citation, a result, or a revision. If no one can support it, cut it.
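The audit can be written down as a simple triage. This is a sketch under my own assumptions: the `Support` labels and the `audit` function are hypothetical, and a human still decides which label each claim gets.

```python
from enum import Enum


class Support(Enum):
    CITATION = "citation"  # backed by a paper a human has read
    RESULT = "result"      # backed by a table, figure, or analysis in this study
    REVISE = "revise"      # wording must be softened or narrowed
    NONE = "none"          # nobody can support it


def audit(claims):
    """Split (text, Support) pairs into keep / revise / cut lists."""
    plan = {"keep": [], "revise": [], "cut": []}
    for text, support in claims:
        if support in (Support.CITATION, Support.RESULT):
            plan["keep"].append(text)
        elif support is Support.REVISE:
            plan["revise"].append(text)
        else:
            plan["cut"].append(text)
    return plan


# Usage with placeholder claims.
plan = audit([
    ("[claim backed by a paper you read]", Support.CITATION),
    ("[claim backed by Table 2]", Support.RESULT),
    ("[claim that overstates the finding]", Support.REVISE),
    ("[claim no one can defend]", Support.NONE),
])
print(plan["cut"])
```

Anything that lands in `cut` leaves the manuscript. Anything in `revise` goes back to a human author, not back to the model.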

8. Give co-authors focused questions.

Do not ask, “Thoughts?”

Ask:

Does the core message hold?
Are any claims too strong?
Is anything missing from your domain?
Do you disagree with the interpretation?

This is how AI becomes useful.

Not by replacing the manuscript process.

By strengthening the weak points in the process.

The bottom line

Producing more was never the goal.

Producing faster was never the real problem.

The goal is producing the one paper that lands, survives review, and changes what a clinician, researcher, educator, or policymaker does next.

Aim AI at the content factory and it will bury you in tidy, forgettable drafts.

Aim it at the question, the message, the critique, the citation audit, the reviewer response, and the judgment layer, and the same tool can make you a sharper researcher.

That is the choice.

Not whether to use AI.

But where to aim it.

Top Papers on AI in research this week:

Clinical LLM Evidence Map – A new npj Digital Medicine paper maps how LLMs are actually being evaluated in clinical practice. Most studies still measure workflow or accuracy, while robust clinical outcomes remain thin. The takeaway is clear: clinical AI needs better trials, better reporting, and standardized outcome sets.
Clinical Literature Summarization – Ten headache specialists were compared with RAG-based summaries from Sonnet, GPT-4o, and Llama 3.1. Human experts still produced the most preferred summaries overall. The paper is useful because it shows what clinicians value: synthesis, dosage detail, reference quality, and real clinical nuance.
Clinical Note Extraction Reproducibility – This paper asks a practical question: how stable are LLM extractions from discharge summaries? The answer is less reassuring than many workflow demos suggest. Prompt wording, model choice, and schema design can shift outputs, especially when distinguishing “not present” from “not documented.”
Dynamic Clinical Decision-Making – MedSP1000 tests LLMs in standardized-patient style clinical encounters, not static multiple-choice questions. Even the best model completed only 60.4% of expert rubric items. This is a strong cautionary paper for anyone evaluating clinical AI using overly simple benchmarks.
Grounded EHR Clinical Reasoning – ChatHealthAI combines structured longitudinal EHR representations with an LLM for interpretable clinical prediction. The goal is not just better prediction. It is prediction with readable, clinically grounded reasoning that researchers can inspect.
High-Impact Research Prediction – MIRAI predicts future paper impact using title, abstract, and publication date, then uses that signal to guide research ideation. The idea is provocative. For academic researchers, it points toward AI tools that help prioritize ideas, not just draft manuscripts.

Top Papers on AI in education this week:

Guided LLM Scaffolding in Statistics – Students with guided LLM use performed better than students with unrestricted access during no-help assessments. Access alone was not enough. The useful lesson is that AI works better as a reasoning scaffold than as an answer machine.
LLMs in Student Research Writing – Software engineering students disclosed how they used LLMs while writing short research papers. They used AI for brainstorming, method clarification, organizing findings, and polishing prose. The big concern was familiar: generated content still required verification.
Undergraduate Research Application Review – Purdue tested GPT models to support review of about 1,200 undergraduate research statements. GPT-5.2 processed the full batch in about 4.6 hours. A coordinator then reviewed scored, rationale-annotated outputs in roughly four hours, replacing a multi-week process.
Bias in AI Tutoring Agents – This AIED 2026 paper shows that tutoring agents can miss stereotypical bias while remaining confident. That is the dangerous part. In education, confidence can make flawed feedback feel more trustworthy than it deserves.
K-12 Curriculum Alignment – This paper tests whether LLMs align with state-specific U.S. history standards and student personas. Models adjusted their answers, but not always according to the actual curriculum. That matters for educators using general chatbots as informal tutors.
RAG Academic Advising – This paper proposes a locally deployed RAG advising system grounded in syllabus data. It helps with course sequencing, prerequisites, and study planning. The strongest angle is privacy: institutions can build useful advising tools without sending everything to external systems.

P.S. I’m doing 2 live workshops with Research Boost this month:

📌 June 13, 10:00 AM CDT: Bring Your Research: Let’s Write Your Manuscript with AI

📌 June 14, 10:00 AM CDT: Bring Your Research: Let’s Write Your Grant Proposal with AI

