Trusted AI in Life Sciences: Domain-Trained, Closed-Loop and Governed

Written by Victor Hamel | Jun 25, 2026 3:58:46 PM

Consumer AI has gone mainstream, yet life science organizations still struggle to turn it into trusted, compliant value. We explain why generic models fall short in pharmacovigilance, clinical, and regulatory work, and how Quartica's approach, fine-tuned models running inside a closed-loop system with governance built in, meets the bar that regulators and the latest CIOMS and EU AI Act guidance now expect.

Over the past few years, we have witnessed a remarkable pace of progress in artificial intelligence. Not long ago, AI was still the next big thing: promising but largely confined to research labs and academic papers.

Today, AI is woven into everyday life. It has become rare to meet someone who has never tried a chatbot from one of the large AI companies, and most people now use AI weekly to ask, rewrite, summarize, and analyze. Yet here lies a paradox. Despite this massive consumer adoption, companies, particularly in regulated industries like life sciences, still struggle to get real value from artificial intelligence.

The reality is that in 2026, implementing AI in a life science organization is not about finding the most powerful model. It is about designing a system that understands the complexities of each organization, integrates with its data ecosystem, and is aware of the specifics of its portfolio, all while remaining trusted and compliant.

On top of this sits one of the most critical challenges for any organization: data privacy. There is a wide gap between a consumer uploading documents to a chatbot, where privacy matters but the risk exposure is limited, and a biopharma organization, for which strict data governance and controls are a non-negotiable foundation.

Training models that understand the reality of life sciences

In 2023, the Transformer, the architecture behind most large language models, was rapidly becoming the gold standard of AI, replacing earlier approaches like classical NLP and recurrent neural networks. These models could achieve strong performance on language tasks without the large, task-specific labeled datasets that earlier architectures required, while producing fluent, natural language and tackling tasks such as summarization, question answering, and classification.

We saw early on the potential for biopharma R&D work across pharmacovigilance, clinical, and regulatory, but we were also conscious that these models knew very little about the realities of the field. So, at Quartica, we started fine-tuning them and developed our first range of proprietary models. It was exciting to see what the right datasets could do: base models became AI systems capable of detecting drugs and adverse reactions in the literature, classifying relevant articles, and even extracting relationships to support causality assessment.

Today, with the latest generations of LLMs and the success of scaling, AI models can tackle far more complex use cases. With their ability to reason and work across large contexts, they can solve cutting-edge problems in mathematics and biology and produce high-quality rationales.

In domains like pharmacovigilance, clinical medical writing, or regulatory operations, however, our case studies tell a more nuanced story. Open-source and commercially available models, while capable of sophisticated analysis, repeatedly fail to adhere to critical concepts and requirements. We have seen some of the most powerful models on the market hallucinate, generate medical and scientific content that does not comply with standards such as ICH, invent patient data when navigating a case report with limited information, or produce causality assessments that ignore confounders and underlying conditions. There is a misconception that anything can be thrown at commercial or open-source LLMs to magically produce a compliant, regulated output. But the old saying, “garbage in, garbage out”, has never been more true.

The lesson is clear: AI may now be available to everyone, but there is a strong case for adapting these models to the realities of life sciences R&D. Over the past few years, Quartica has invested heavily in R&D to develop a best-in-class family of models, focused on the tasks that matter most for pharmacovigilance, clinical, and regulatory work. This spans protocol development including SOA, clinical and non-clinical summaries, subject narratives, literature analysis, CTD authoring, CMC impact assessments, labeling, benefit-risk reasoning, causality assessment, and R&D clinical data analysis. It also includes ensuring that these tasks comply with current regulations and guidelines, and can support upcoming ones, such as GxP, ICH E6, ICH M11 (CeSHarP), EU 2025/1466, CIOMS AI in PV, and the EU AI Act, among others.

Drug safety, clinical efficacy and overall benefit-risk are not experienced equally across patient populations. Adverse reactions present differently across age, sex, ethnicity, and comorbidities, and the data available to assess them is often thinnest for exactly the populations most at risk of being overlooked. An AI model that performs brilliantly on average but poorly on underrepresented patients is not a good model. It is a biased one. This is why fairness has been specifically embedded in our AI development from the start: we curate training datasets for population diversity, evaluate model behavior across patient subgroups rather than on aggregate scores alone, and train and test our models explicitly on bias, equity, and fairness in safety interpretation.
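To illustrate why aggregate scores alone can hide a fairness problem, here is a minimal sketch of a subgroup evaluation. It is not Quartica's evaluation pipeline: the function name `recall_by_subgroup`, the subgroup labels, and the toy data are all hypothetical, and recall is used only as one example metric.

```python
from collections import defaultdict

def recall_by_subgroup(records):
    """Return (overall recall, recall per subgroup).

    Each record is a (subgroup, truth, predicted) tuple of a subgroup name
    and two booleans: True means an adverse reaction is present / detected.
    Assumes at least one true case overall (no zero-division guard).
    """
    hits = defaultdict(int)       # detected true cases per subgroup
    positives = defaultdict(int)  # all true cases per subgroup
    for subgroup, truth, predicted in records:
        if truth:
            positives[subgroup] += 1
            if predicted:
                hits[subgroup] += 1
    per_group = {g: hits[g] / positives[g] for g in positives}
    overall = sum(hits.values()) / sum(positives.values())
    return overall, per_group

# Hypothetical toy data: many adult cases, very few elderly cases.
records = (
    [("adult", True, True)] * 9 + [("adult", True, False)] * 1
    + [("elderly", True, True)] * 1 + [("elderly", True, False)] * 1
)
overall, per_group = recall_by_subgroup(records)
print(f"overall recall: {overall:.2f}")
for group, value in per_group.items():
    print(f"{group}: {value:.2f}")
```

In this toy data the overall recall looks acceptable because the well-represented group dominates the average, while the smaller group is detected only half of the time. That gap is exactly what a per-subgroup report is meant to surface.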

Our latest training results show significant improvements over commercially available LLMs on the tasks that truly matter for drug safety.

The takeaway is not that consumer AI is incapable of complex work. It is that life science organizations need AI systems adapted to their most critical challenges, systems that understand the context in which they operate.

No two life science organizations are alike: each has its own portfolio, its own data landscape, its own conventions, processes, and regulatory history. So, for every client, our family of R&D pre-trained models is the starting point, not the final solution. We adapt these models further to each organization's data and ways of working, resulting in an AI system that is genuinely unique to that company. Each deployment is isolated, with no data transfer between environments. Your data shapes your models, and only yours. The result is not a shared platform with a thin layer of customization, but a dedicated AI system that knows your products, speaks your processes, and never mixes your information with anyone else's.

In a regulated environment, consistency is key

Adopting AI in a regulated environment requires a shift in how authors think about their role. In a manual process, authors are used to writing section by section, paragraph by paragraph. The trade-off is fragmented content: a single data source update or a piece of late-breaking information can trigger a complex impact assessment and, in the worst case, compromise a submission.

When organizations move to an AI system, we often see writers expecting it to work the same way, authoring content sentence by sentence with a full range of manual customization. Several tools on the market support this linear model, typically LLM plug-ins for authoring tools that generate content section by section, on demand, personalized for the author. In this setup, the authoring plug-in lets a user query a commercial LLM to generate content for a section or paragraph piecemeal. This mode of AI application results in modest or incremental gains that rarely justify the investment.

But AI can operate differently. Quartica MARS AI operates in a multi-document, multi-workflow model. AI works across the whole document at once: authoring multiple sections simultaneously, cascading critical information from one to another, analyzing contributions across multiple datasets, reconciling them with historical information to anticipate regulatory feedback, and updating content automatically as sources change, all while maintaining consistency and a full memory of the process. It operates across global, regional, and local workflows in parallel, not sequentially.

This is the difference between a boilerplate benefit-risk narrative and the thorough, coherent, cross-referenced analysis that regulators expect and inspectors scrutinize.

We see this as the central tension between specialization and consistency. Across our MARS implementations and our analysis of large volumes of historical data, we have observed consistent patterns of SOP drift: writers customizing individual analyses beyond the template boundaries, often with little traceability or rationale for the deviation. The writers' expertise is real, but the process is fragile.

AI systems can offer real degrees of specialization, but inside a controlled environment where consistency and compliance set the rules. We believe in an approach to AI that adapts through controlled inputs while guaranteeing consistency across every report, every product and every market.

There is no full substitute for human medical knowledge. But the argument that only certain authors can write about a given product, because the expertise lives with them, no longer holds. Context-aware AI systems can deliver medical data analysis with a level of consistency, traceability, and explainability that a fragmented manual process struggles to match.

Over time, AI systems can also accumulate and preserve institutional knowledge across medical writing teams, maintaining product-specific analytical context even as authors change roles or leave the organization. The cost of losing a senior medical writer is rarely captured in a software benchmark. Neither is the cost of rebuilding the interpretive context they carried. This is precisely the kind of value that governed AI delivers and that standard ROI models fail to measure.

The case for closed-loop AI systems in regulated life sciences

Plugging a business application into an external AI API is tempting. But beyond the critical need for adaptation discussed above, relying on external AI providers means giving up a fundamental pillar of data governance: data privacy.

Every prompt sent to an external API is data leaving your environment. In pharma, that data is rarely trivial. Case narratives, medical histories, and safety reports contain some of the most sensitive information biopharma organizations manage. In Europe, health data qualifies as special category data under Article 9 of the GDPR [3], and in the United States it falls under the protections of HIPAA [6]. Once it crosses into a third-party infrastructure, you inherit their retention policies, their sub-processors, their geography, and their roadmap. You can negotiate contracts, but you can no longer enforce control by design.

The legal ground under cross-border data flows has also proven anything but stable. The Schrems II ruling invalidated the EU-US Privacy Shield overnight, and the Standard Contractual Clauses that organizations rely on today, in both their EU and UK forms, come with transfer impact assessments and the possibility of further legal challenge [4]. More recently, the US Department of Justice data security rule under 28 CFR Part 202 placed new restrictions on transfers of bulk sensitive personal data, explicitly including personal health data [5]. For a life science organization, an architecture in which sensitive data never needs to cross a border in the first place is not just simpler. It is structurally safer.

There is also a quieter problem that external dependence creates: unpredictability. Commercial models are updated, deprecated, and replaced on timelines you do not choose. For a consumer application, a silent model update is an inconvenience. For a validated system producing regulatory documents, it is a compliance event. A model that behaves differently today than it did during validation is, for all practical purposes, a new system.

This is why we built MARS as a closed-loop AI system. Our models are developed, fine-tuned, and hosted by Quartica, trained on proprietary datasets, and deployed without any reliance on external AI providers. Customer data never leaves the controlled environment, never trains a third-party model, and never travels through infrastructure we do not govern. Each environment is operated under an information security management system aligned with ISO/IEC 27001 [7], and model versions are frozen, validated, and changed only through a controlled process.

This is not just our preference; it is where the regulatory landscape is heading. The CIOMS Working Group XIV report on artificial intelligence in pharmacovigilance, published in December 2025 [1], and the EU AI Act [2] converge on the same principles: organizations deploying AI in safety-critical contexts must be able to demonstrate control over their systems, their data, and their outputs. A closed-loop architecture does not just make compliance easier. It makes it structural.

Governance and monitoring: trust is built continuously

A well-trained model in a controlled environment is the foundation. It is not the finish line.

The CIOMS report makes a point that resonates deeply with how we think about AI at Quartica: automation pursued purely for efficiency risks stripping out the human judgment that the system was designed around. The entire discipline of pharmacovigilance exists to support careful, accountable assessment of individual cases. An AI system that obscures that accountability, however accurate, is not fit for purpose.

In practice, this means governance and monitoring cannot be an afterthought bolted onto an AI deployment. They have to be designed into the system from day one. For us, that translates into a few non-negotiable principles.

Humans stay in the loop where it matters. AI should absorb the repetitive and the mechanical, freeing safety experts to focus on the assessments that require their full expertise: differential diagnosis, causality, benefit-risk judgment. Every AI-generated output in MARS is reviewable, editable, and ultimately owned by a human expert. This is the human oversight that Article 14 of the EU AI Act requires of high-risk AI systems [2], and that CIOMS develops in depth through its human-in-the-loop and human-on-the-loop models [1]. The system is designed to augment accountability, not dilute it.

Every output is traceable. In a regulated environment, "the model said so" is not an acceptable answer. Each document, classification, and assessment produced by MARS carries a complete audit trail: which model version produced it, from which inputs, reviewed and approved by whom. This is record-keeping in the sense of Article 12 of the EU AI Act [2]. When an inspector asks how a conclusion was reached, the answer exists.
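As an illustration of what such an audit trail entry could contain, here is a minimal sketch. It is not the MARS data model: the `AuditRecord` fields, the helper names, and the sample values are hypothetical, and SHA-256 digests are an assumed way of tying a record to the exact inputs and output it describes.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

def sha256_text(text: str) -> str:
    """Hex SHA-256 digest of a text, used as a tamper-evident fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

@dataclass(frozen=True)  # frozen: a record cannot be modified once created
class AuditRecord:
    output_id: str
    model_version: str
    input_digests: tuple  # one digest per source input
    output_digest: str
    reviewed_by: str
    approved_by: str
    recorded_at: str      # UTC timestamp, ISO 8601

def make_record(output_id, model_version, inputs, output, reviewed_by, approved_by):
    return AuditRecord(
        output_id=output_id,
        model_version=model_version,
        input_digests=tuple(sha256_text(i) for i in inputs),
        output_digest=sha256_text(output),
        reviewed_by=reviewed_by,
        approved_by=approved_by,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )

def matches(record, inputs, output):
    """True if the given inputs and output are the ones the record describes."""
    return (
        record.input_digests == tuple(sha256_text(i) for i in inputs)
        and record.output_digest == sha256_text(output)
    )

# Hypothetical example values.
inputs = ["case narrative text", "laboratory listing"]
output = "Causality assessment: possibly related."
record = make_record("doc-001", "causality-model 1.2.0", inputs, output,
                     reviewed_by="reviewer.a", approved_by="approver.b")
print(matches(record, inputs, output))              # unchanged content
print(matches(record, inputs, output + " edited"))  # content altered afterwards
```

Storing digests rather than the content itself keeps sensitive text out of the log while still allowing anyone with access to the originals to check that nothing was changed after approval.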

Every model is documented. Each MARS model comes with documentation in the spirit of an AI model card: its intended use, expected inputs and outputs, the nature of the human interaction around it, and its evaluated performance and limitations. This mirrors what the CIOMS transparency principle asks organizations to disclose about the AI systems they deploy [1], and what Article 13 of the EU AI Act expects in terms of transparency toward deployers [2].

Performance is monitored, not assumed. Models are validated before deployment, but validation is a snapshot. Real-world data shifts, terminologies evolve, and edge cases accumulate. We continuously monitor model performance in production, track drift against validated baselines, and treat any degradation as a quality event with defined escalation paths, in line with the continuous appraisal that the CIOMS validity and robustness principle calls for [1] and the post-market monitoring obligations of the EU AI Act [2].
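Tracking drift against a validated baseline can be sketched as a simple comparison. The example below is an assumption-laden illustration, not the monitoring logic used in MARS: the function name `check_drift`, the metric names, the values, and the 0.02 tolerance are all hypothetical.

```python
def check_drift(baseline, current, tolerance=0.02):
    """Compare production metrics with their validated baselines.

    Returns a dict of the metrics that degraded by more than `tolerance`
    (absolute drop), mapped to the size of the drop. A metric that is
    missing from `current` is flagged with None so it cannot go unnoticed.
    """
    degraded = {}
    for metric, validated in baseline.items():
        observed = current.get(metric)
        if observed is None:
            degraded[metric] = None
            continue
        drop = validated - observed
        if drop > tolerance:
            degraded[metric] = round(drop, 4)
    return degraded

# Hypothetical values: recall has slipped, precision is within tolerance.
baseline = {"recall": 0.95, "precision": 0.90}
current = {"recall": 0.91, "precision": 0.895}
events = check_drift(baseline, current)
for metric, drop in events.items():
    print(f"quality event: {metric} dropped by {drop}")
```

In a real deployment, each flagged metric would open a quality event and follow the defined escalation path. The tolerance itself would be set during validation rather than hard-coded.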

Change is controlled. Model updates follow the same discipline as any change to a validated system: documented, tested, approved, and traceable. Our customers always know which version of which model is producing their output, and nothing changes without their visibility.
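One way to picture controlled change is a gate that refuses to serve any model artifact that does not match an approved change record. The sketch below is a hypothetical illustration of that idea, not a description of how MARS is implemented: `verify_model`, `UnapprovedModelError`, the registry layout, and the model name are invented for the example.

```python
import hashlib

class UnapprovedModelError(RuntimeError):
    """Raised when a model is not covered by an approved change record."""

def artifact_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_model(name, version, artifact, approved):
    """Return True only if (name, version) is approved and the bytes match.

    `approved` maps (name, version) to the digest recorded at validation.
    """
    expected = approved.get((name, version))
    if expected is None:
        raise UnapprovedModelError(f"{name} {version} has no approved change record")
    if artifact_digest(artifact) != expected:
        raise UnapprovedModelError(f"{name} {version} differs from the validated artifact")
    return True

# Hypothetical registry with one validated model version.
validated_weights = b"validated model weights"
approved = {("causality-model", "1.2.0"): artifact_digest(validated_weights)}

print(verify_model("causality-model", "1.2.0", validated_weights, approved))
try:
    verify_model("causality-model", "1.3.0", validated_weights, approved)
except UnapprovedModelError as err:
    print(f"blocked: {err}")
```

The check fails closed: an unknown version or an altered artifact raises an error instead of being served silently.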

This is what the EU AI Act asks of an AI system, and what the CIOMS principles describe: risk management, human oversight, transparency, logging, and continuous monitoring across the full lifecycle. We see these not as regulatory burdens but as the actual definition of production-grade AI in life sciences. Anyone can demo a capable model. Operating one accountably, year after year, under inspection, is a different discipline entirely.

That discipline is what we mean by trusted AI. Not a model that is just impressive on a benchmark, but a system your organization can defend in front of a regulator, explain to a patient, and rely on every single day.

References

[1] CIOMS Working Group XIV. Artificial Intelligence in Pharmacovigilance. Council for International Organizations of Medical Sciences, Geneva, December 2025.

[2] Regulation (EU) 2024/1689 (EU Artificial Intelligence Act), in particular Article 9 (risk management), Article 10 (data and data governance), Article 12 (record-keeping), Article 13 (transparency), Article 14 (human oversight), and Article 15 (accuracy, robustness and cybersecurity).

[3] Regulation (EU) 2016/679 (General Data Protection Regulation), in particular Article 9 (special categories of personal data), Article 25 (data protection by design and by default), Article 28 (processors), Article 32 (security of processing), and Chapter V (transfers of personal data to third countries).

[4] Court of Justice of the European Union, Case C-311/18, Data Protection Commissioner v Facebook Ireland and Maximillian Schrems (Schrems II), 16 July 2020; European Commission Standard Contractual Clauses (2021); UK International Data Transfer Agreement and Addendum (2022).

[5] US Department of Justice, Data Security Program, 28 CFR Part 202, implementing Executive Order 14117 on preventing access to bulk sensitive personal data by countries of concern.

[6] Health Insurance Portability and Accountability Act (HIPAA), Security and Privacy Rules, 45 CFR Parts 160 and 164.

[7] ISO/IEC 27001:2022, Information security, cybersecurity and privacy protection. Information security management systems. Requirements.
