The most expensive mistake in AI-assisted research has nothing to do with prompting. It is pasting the wrong data into the wrong tool.
Healthcare breaches cost more than any other industry. The current US average sits at $7.42 million per incident.
And the threat is rarely an outside attacker. Around 95% of breaches involve human error. The single most common cause of data loss is a careless or negligent insider, not a hacker.
A misdirected email. An unpublished cohort dropped into a free tool “just to explore.”
One slip can cost funding, reputation, and years of work.
The root error underneath almost all of it: treating every piece of research data the same.
Your data is not uniform. Neither is its security. A published abstract and an identified patient record sit at opposite ends of a risk spectrum, yet most researchers route both through the same chat box.
The fix is a two-step habit.
- Classify the data first.
- Match the tool to that classification.
Most institutions already use a four-tier system: Public, Institutional, Restricted, Critical. Each level dictates which AI tools you can and cannot touch. Here is what each one means in practice.

1. Level 1 is public data. Use any tool you want.
This is data meant to be shared.
Examples you can treat as open:
- Published papers, preprints, and conference abstracts
- NHANES public-use files, CDC WONDER, and BRFSS survey data
- ClinicalTrials.gov records and FDA drug label data
- Public sequence repositories like GenBank and Gene Expression Omnibus
- Open-access datasets and public GitHub repositories
There is no restriction and no realistic downside.
The AI options are wide open. Any chat model works (ChatGPT, Claude, Gemini). Any training or fine-tuning setup. Any embedding model through an API or interface.
If the information is already public, the tool’s data policy stops mattering. Work freely and move fast.
2. Level 2 is the institutional middle. Narrow your tools.
This is where most active research lives. Unpublished results. Grant proposals and budgets. Aggregated or de-identified datasets.
None of it identifies a person. All of it would still hurt you if it leaked early.
Examples that live here:
- Your unpublished results, manuscript drafts, and analysis code
- Grant proposals, specific aims, and budgets
- IRB protocols and study design documents
- Aggregated or summary-level counts you have generated
- De-identified national datasets that require a data use agreement, like the HCUP National Inpatient Sample (NIS), SEER, and ACS NSQIP
The acceptable tools tighten here:
- “Paid” plans with training turned off
- AI licenses provided through your institution’s portal
- Open-source models on local or institution-approved hosting
- Storage on approved institutional cloud only
NOTE: All free AI plans are not really FREE – they are training on your data. On consumer Claude (Free, Pro, Max), data is not used to train future models by default. In contrast, in ChatGPT Free, Plus, and Pro, by default, the training stays on until you opt out.
The rule for Level 2: verify your plan qualifies, and confirm training is off before you upload anything.
3. Level 3 triggers legal protection. A paid plan will not cover you.
Now a single leak triggers an investigation. This level covers de-identified patient data and anything governed by a CDA, an NDA, PII, or HIPAA.
Examples that belong here:
- HIPAA Limited Data Sets, which still carry dates and ZIP codes
- Vanderbilt’s Synthetic Derivative, the de-identified mirror of the EHR, and BioVU, its de-identified biobank
- Credentialed-access databases like MIMIC, eICU, the All of Us Controlled Tier, and UK Biobank
- De-identified claims data under a data use agreement, such as MarketScan, Optum, and CMS limited data sets
- Any dataset shared under a CDA or NDA with an industry partner
“De-identified” does not mean safe. Researchers found that 15 demographic attributes are enough to correctly re-identify 99.98% of Americans in any dataset, which is exactly why this data still demands real protection.
This is the level where a comfortable habit becomes a reportable event.
Your personal subscription tier does not buy your way in. ChatGPT Free, Plus, Pro, and Team are not HIPAA-eligible and carry no Business Associate Agreement. Only ChatGPT Enterprise and the API platform can sign a BAA, and only through a sales-managed contract. A BAA also covers the vendor’s obligations, not yours. How your team configures access and what people type into prompts still sits on you.
What actually works at Level 3:
- Enterprise contracts with a signed BAA, through your institutional AI portal
- Open-source models with fully local hosting
- Data kept on private institutional servers only
- Hosting cleared by your Governance, Risk, and Compliance (GRC) team
The contract and the infrastructure protect you. The price of your monthly plan does not.
4. Level 4 needs custom infrastructure. No platform qualifies.
Identified patient data. Federal embargo data. Proprietary drug-trial data. Anything touching national security.
Examples that demand custom infrastructure:
- The live EHR itself (Epic or others), plus any identified chart extract with names, MRNs, or full dates
- Vanderbilt’s Research Derivative, the identified clinical data warehouse that requires IRB approval and institutional credentials
- REDCap projects that hold PHI or direct identifiers
- CMS Research Identifiable Files and other identifiable claims data
- Proprietary, pre-market drug-trial data under a strict CDA
No commercial AI platform clears this bar. Not one.
The only acceptable setup is open-source or custom-built models, running on private compute you control (on-premise or a private cloud), with explicit institutional sign-off. If you operate here, you are building a private solution, not subscribing to one.
How to work within the levels
The 4 levels tell you where your data sits. They do not tell you how to behave once you know. Classification is the easy half. The harder half is holding the line when a deadline is close and a free tool is one tab away. These 3 rules cover almost every decision you will face:
1. Match the tool to the data, not the task.
The task does not set the risk. The data does.
ChatGPT for brainstorming with published findings is fine. The same ChatGPT window for unpublished cohort data is a policy violation at most institutions, even on a paid plan.
Same tool. Same prompt. Different data. Different verdict.
Most researchers anchor their decision to what they are doing. Anchor it instead to what they are handling.
2. “Opt out of training” is necessary, not sufficient.
Turning off training stops the model from learning from your data. It does not stop your data from reaching the company’s servers.
Samsung learned this the hard way. Within about 20 days of allowing ChatGPT, engineers pasted source code and a meeting transcript into it, and the company banned the tool because the exposed data could not be pulled back.
On consumer Claude, opting out drops your data retention from 5 years to 30 days. Better. Not zero.
For anything Level 3 or above, a privacy toggle is not the control you need. You need an enterprise contract with a BAA, or fully local hosting. A setting in a menu cannot substitute for a signed agreement.
3. Open-source is not automatically private.
“Open-source” describes the model. It says nothing about the security.
Running Llama on your own machine is genuinely private. Running the exact same model on a random third-party cloud depends entirely on who controls that server.
Above Level 2, open-source still requires local hosting or a platform your GRC team has approved. The license is not a security guarantee.
The trade-offs of going local are:
- Open-source models still trail the frontier commercial models on raw capability
- Full-parameter models need multiple GPUs to run well
- Workarounds exist: smaller fine-tuned or distilled models, quantization to cut hardware demands, and parameter-efficient fine-tuning (PEFT) for local training
You trade some capability for control. At Level 3 and 4, that trade is not optional.
Where the plans actually land
A quick map of which subscription reaches which data level:
- Free or unpaid plan: Level 1 only
- Paid consumer plan with opt-out on: Levels 1 to 2
- Enterprise contract with a BAA: Levels 1 to 3
- Open-source, hosted locally: Levels 1 to 4
Notice the pattern. Spending more money moves you one level. Controlling your own infrastructure moves you all the way.
The one question to ask every time
Before you paste anything into an AI tool, ask yourself a single question: what level is this data?
If you cannot answer, assume it is higher than you think.
That one pause prevents most of the damage. The data supports the instinct. Shadow AI, meaning staff using personal accounts for sensitive work, added roughly $670,000 to the average breach last year. And 97% of AI-related breaches hit organizations with no access controls in place.
Using AI well in research is not only a matter of writing sharp prompts. It is knowing which data is allowed to go where.
Does your institution give you clear AI data-security guidance? Or are you piecing it together on your own? Tell me how your program handles it. Would love to know what is working.
Top Papers on AI in research this week
- Co-Scientist Launches in Nature – Google DeepMind’s multi-agent research assistant officially arrived in Nature this week, built on Gemini. The system runs a coalition of specialized agents that debate, generate, and refine hypotheses against the scientific literature. Imperial College London’s Fleming Initiative put it to the test on antimicrobial resistance. It replicated a decade of lab-derived conclusions in a fraction of the time.
- The AI Scientist Goes End-to-End – Sakana AI’s system for fully automating the scientific process was accepted by ICLR and published in Nature this spring. Singapore startup Analemma then ran a live demonstration, generating 166 complete machine-learning papers in roughly 417 hours. That works out to one paper every two and a half hours, for about $1,100 total.
- LLMs Inflate Output, Deflate Quality – A major study in Science examined 2.1 million preprints and found that LLM adoption increases a researcher’s output by anywhere from 24% to 89%. The gains are largest for non-native English speakers. But quality has slipped. Writing became more polished while the underlying science grew weaker.
- AI Predicts Research Trends Years in Advance – Researchers at the Karlsruhe Institute of Technology combined LLMs with machine learning to map concept relationships across scientific literature. The system can surface emerging research directions two to three years before they peak. Results were published in Nature Machine Intelligence.
- Protein Pairs Get Their Own Language Model – A new model from Singapore’s Cancer Science Institute learns from two interacting proteins at once, rather than one in isolation. More accurate interaction predictions could help identify drug targets faster. The work appeared in Nature Communications this April.
Top Papers on AI in education this week
- Classroom AI: LLMs as Grade-Specific Teachers – Researchers fine-tuned LLMs to generate age-appropriate content across six grade levels, from lower elementary to adult education. A study with 208 participants showed a 35.64 percentage point improvement in grade-level alignment over standard prompting. Accuracy was not sacrificed. The paper appeared in npj Artificial Intelligence.
- DeepTutor: An Agent-Native Tutoring System – Hong Kong University’s Data Intelligence Lab released DeepTutor, a fully open-source agentic tutoring framework. It pairs static knowledge grounding with dynamic learner memory, adapting in real time to what a student knows. Across five backbone models, it improved personalized metrics by 10.8% and agentic reasoning by 29.4%.
- How Students Actually Use LLMs for Critical Thinking – A new arXiv paper tracked LLM use across two runs of a research methods course, where students decided for themselves whether and how to use AI. Researchers built a refined taxonomy of usage types, organized by how much initiative the student took. The findings carry real implications for how instructors design AI-aware assignments.
- The Illusion of Understanding in Middle Schoolers – A study of 63 students (ages 14-15) using ChatGPT for science tasks found a correct solution rate of only 0.51, even on problems that were entirely solvable with effective prompting. Domain knowledge offered no protection. The researchers argue that AI’s fluent outputs may be cultivating cognitive and metacognitive laziness.
- Two Hours of AI Literacy Changes How Students Use LLMs – A randomized study gave 116 middle schoolers a two-hour workshop explaining how LLMs work and fail. Trained students asked better follow-up questions, reformulated queries more often, and judged AI responses more accurately. Short, well-designed interventions can make a real difference.
📌 P.S. Join my next live masterclass FREE: Academic Writing with AI on May 30, 07:00 am CDT.
Register here: https://risingresearcheracademy.easywebinar.live/event-registration-11
