OIG’s 2026 Compliance Guidance Draws a Hard Line on AI in Risk Adjustment Coding

For the past three years, AI has been the most overused word in Medicare Advantage vendor conversations. Every coding platform has it. Every sales deck leads with it. Every conference panel ends up on it. Health plan executives have sat through enough demos to know the talking points by heart: AI finds more codes, AI improves RAF scores, AI scales what your team cannot do manually. What most of those conversations have not included is any serious discussion of what happens when the government starts asking questions about what the AI actually found.

In February 2026, the Office of Inspector General published its first Medicare Advantage Industry Segment-Specific Compliance Program Guidance (ICPG) since 1999. Most of the industry focused on the prior authorization section. The risk adjustment section deserves equal attention, because OIG put AI directly in its list of potentially fraudulent and abusive conduct. Not as a hypothetical. Not as a future concern. As a named risk, in plain language, in the government’s official compliance framework for Medicare Advantage plans.

That changes the conversation health plans should be having with their coding vendors.

What OIG Actually Said About AI in Risk Adjustment

Under its list of potentially fraudulent and abusive conduct in risk adjustment, OIG specifically named the following:

“Querying physicians via electronic medical record platforms, including prompts generated by artificial intelligence algorithms, or otherwise prompting physicians to add risk-adjusting diagnoses that patients did not have or that did not affect the care, treatment, or management of the patient.”

OIG is not warning against AI broadly. It is drawing a precise line: an AI prompt that surfaces a diagnosis that did not affect the care, treatment, or management of the patient is conduct the government views as potentially fraudulent. The mechanism that generated the prompt, whether an algorithm, a software platform, or a vendor tool, does not change the liability analysis. The question is whether the diagnosis was clinically real and reflected in how the patient was actually managed.

OIG is not saying the technology is the problem. It is saying that when the technology runs ahead of what actually happened clinically, the plan has a serious problem.

The Accountability Chain OIG Established

The AI passage does not stand alone. It sits inside a broader set of requirements OIG placed on health plans regarding their risk adjustment programs, and several of these requirements speak directly to how plans are expected to govern their coding vendors and the tools those vendors use.

OIG states that MAOs should review any software or systems used, including those created by vendors for plan use and also those used by providers. This is not a suggestion to review vendor contracts. It is a requirement to actually look at what the technology is doing, what it is finding, and whether what it finds can be supported clinically.

OIG also states that plans should analyze provider reporting of diagnoses using algorithms to identify outliers and anomalies, and conduct follow-up audits, education, and appropriate corrective action. Plans cannot treat algorithm output as validated output. The algorithm flags. A human evaluates. That evaluation has to be documented and defensible.

And critically, OIG reminds plans that MAOs maintain the ultimate responsibility for fulfilling their CMS contract obligations even when functions are delegated to third parties. A coding vendor’s AI found a diagnosis. The plan submitted it. The plan certified it. When CMS audits, the question will come to the plan.
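
As one illustration of the outlier analysis described above, the sketch below flags providers whose reporting rate for a diagnosis category sits far from the peer median. The input layout, the provider IDs, the rates, and the 3.5 cutoff on the modified z-score are all assumptions made for illustration; OIG does not prescribe a method. A flag here is only a trigger for a follow-up audit, not a finding.

```python
from statistics import median

def flag_outlier_providers(rates, cutoff=3.5):
    """Return provider IDs whose rate is far from the peer median.

    rates:  dict of provider_id -> share of that provider's patients
            reported with a given diagnosis category (hypothetical input).
    cutoff: modified z-score threshold (an assumed, tunable value).
    """
    values = list(rates.values())
    med = median(values)
    # Median absolute deviation: robust to the very outliers being hunted.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread among peers, nothing to compare against
    flagged = []
    for provider, rate in rates.items():
        modified_z = 0.6745 * (rate - med) / mad
        if abs(modified_z) > cutoff:
            flagged.append(provider)
    return flagged

# Made-up rates, for illustration only.
example_rates = {"P01": 0.11, "P02": 0.12, "P03": 0.10, "P04": 0.13, "P05": 0.41}
print(flag_outlier_providers(example_rates))
```

The median-based score is used here because a handful of extreme providers would distort a plain mean and standard deviation. Whatever statistic a plan chooses, the output is a work queue for human auditors, and the audit result is what gets documented.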

The Gap Most Plans Have Not Closed, and What Vendors Are Not Telling You

Most health plans today cannot answer a simple question: for the diagnoses their coding vendor’s AI found last year, what percentage were checked against clinical documentation that showed the condition actually affected how the patient was cared for? That is not a hypothetical. It is exactly what OIG’s guidance implies a plan should be able to demonstrate. Most plans cannot, not because they are negligent, but because the way AI coding tools are sold to them makes the question invisible.
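
Once each AI-suggested diagnosis carries a record of what a reviewer concluded, the question above reduces to a simple ratio. The sketch below assumes a hypothetical record layout (the field names and sample rows are invented, and real vendor exports will differ) and computes the share of AI-suggested diagnoses that were both reviewed and supported by documentation of an effect on care.

```python
from dataclasses import dataclass

@dataclass
class AiSuggestedDiagnosis:
    # Hypothetical record layout, not any vendor's actual export format.
    member_id: str
    condition: str
    reviewed: bool           # a qualified human checked the chart
    care_impact_shown: bool  # chart shows the condition affected care

def validated_share(suggestions):
    """Share of AI-suggested diagnoses reviewed AND supported by the chart."""
    if not suggestions:
        return 0.0
    supported = sum(1 for s in suggestions if s.reviewed and s.care_impact_shown)
    return supported / len(suggestions)

# Invented sample rows, for illustration only.
sample = [
    AiSuggestedDiagnosis("M1", "condition A", reviewed=True, care_impact_shown=True),
    AiSuggestedDiagnosis("M2", "condition B", reviewed=True, care_impact_shown=False),
    AiSuggestedDiagnosis("M3", "condition A", reviewed=False, care_impact_shown=False),
    AiSuggestedDiagnosis("M4", "condition C", reviewed=False, care_impact_shown=True),
]
print(f"Validated share: {validated_share(sample):.0%}")
```

A plan that cannot populate the two boolean fields for last year’s AI-found diagnoses has its answer: the number is unknown, which is the gap this section describes.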

What AI coding tools actually do

Most AI coding tools scan medical records for words, phrases, and documentation patterns linked to specific HCC codes. When the tool finds a match, it flags the code. A mention of chronic kidney disease in a progress note. A reference to a prior diagnosis in a problem list. The tool generates a suggestion.

What it cannot do is think like a clinician. It cannot tell whether a condition was actively managed or mentioned in passing. It cannot determine whether the diagnosis was assessed at the visit or whether the documentation would hold up under ICD-10-CM guidelines. A note that references diabetes in the patient history looks identical to a note where diabetes is the primary condition being treated. A trained coder reading the record can tell the difference immediately. The tool cannot.
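
The limitation is easy to demonstrate with a deliberately simplified keyword matcher. This is not a description of any particular vendor’s product; the keyword list and both notes are invented for illustration. It shows only that surface matching returns the same flag for a passing mention and for an actively managed condition.

```python
import re

# Hypothetical keyword list; real tools use far richer patterns.
KEYWORDS = ["diabetes", "chronic kidney disease"]

def naive_flags(note):
    """Return the set of keywords that appear anywhere in the note."""
    text = note.lower()
    return {
        kw for kw in KEYWORDS
        if re.search(r"\b" + re.escape(kw) + r"\b", text)
    }

# Invented notes: one mentions diabetes in history, one manages it.
history_only = (
    "Past medical history: diabetes. "
    "Visit today for ankle sprain; ice and rest advised."
)
actively_managed = (
    "Diabetes: A1c reviewed, metformin dose increased, follow up in 3 months."
)

# Both notes produce the same flag even though only one shows management.
print(naive_flags(history_only))
print(naive_flags(actively_managed))
```

Nothing in the matcher’s output separates the two notes. That separation is the judgment a trained coder supplies when reading the full record.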

How vendors frame the conversation, and what they leave out

Vendors selling AI coding tools lead with RAF score improvement. The pitch is familiar: “Our platform found X additional codes your program missed, resulting in Y improvement in risk scores and Z additional revenue per member per month.” The number is compelling. The conversation stops there.

What the RAF number does not tell you is how many of those codes would survive a RADV audit. It does not tell you what percentage were backed by documentation showing the condition affected how the patient was actually cared for, which is the exact standard OIG just put in its compliance guidance. RAF improvement and audit defensibility are not the same measurement. Vendors have every commercial reason to lead with one and say nothing about the other.

Questions health plans should ask before they sign anything

Before committing to any AI coding program, health plans should get clear answers to the following:

1. What is your RADV audit pass rate on codes your AI found specifically, not on the overall program?
2. What human review step exists between what your tool finds and the code being submitted to CMS?
3. Who does that review, what are their qualifications, and how is the review documented?
4. How does your tool tell the difference between a diagnosis that was assessed and managed versus one that was mentioned in passing?
5. If CMS selects one of the codes your tool found during a RADV audit, what is your obligation to the plan at that point?

If a vendor cannot answer questions two and three clearly and specifically, that is the answer. OIG confirmed that the plan owns what gets submitted to CMS. A vendor who cannot describe the human review step in their process is leaving that entire accountability burden on the plan.

What a Program That Can Actually Be Defended Looks Like

The ICPG calls for audits of diagnosis data both before and after submission to CMS. Before is where most programs fall short. There has to be a step where someone checks what the AI found against the actual clinical record, and that someone needs the coding and clinical knowledge to make a real call, not just approve a flag. Determining whether a diagnosis affected the care, treatment, or management of a patient requires a person who has read enough records to know the difference between documentation that holds and documentation that does not.
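
The pre-submission step described above can be pictured as a gate in the data flow. The sketch below is a minimal illustration under stated assumptions: the `Review` and `Candidate` records, their field names, and the placeholder codes are hypothetical, and a real program would capture far more (reviewer credentials, dates, chart references). The point is only that a candidate with no documented, supporting review never reaches the submission list.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Review:
    # Hypothetical review record kept for audit purposes.
    reviewer_id: str
    supported: bool  # reviewer found the diagnosis supported by the chart
    rationale: str   # documented reason for the decision

@dataclass
class Candidate:
    # Hypothetical AI-flagged diagnosis awaiting a decision.
    member_id: str
    diagnosis_code: str
    review: Optional[Review] = None

def split_for_submission(candidates):
    """Separate candidates cleared by documented review from those held back."""
    cleared, held = [], []
    for c in candidates:
        ok = (
            c.review is not None
            and c.review.supported
            and bool(c.review.rationale.strip())
        )
        (cleared if ok else held).append(c)
    return cleared, held

# Invented candidates with placeholder codes, for illustration only.
queue = [
    Candidate("M1", "DX-1", Review("R7", True, "Assessed and treated at visit.")),
    Candidate("M2", "DX-2", Review("R7", False, "History mention only.")),
    Candidate("M3", "DX-3"),                    # never reviewed
    Candidate("M4", "DX-4", Review("R9", True, "")),  # approved, no rationale
]
cleared, held = split_for_submission(queue)
print([c.member_id for c in cleared], [c.member_id for c in held])
```

Holding back an approval that lacks a written rationale is a design choice in this sketch. It reflects the idea that the review has to be documented, not just performed.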

The plan also needs real visibility into what its vendor’s tools are actually doing. OIG’s requirement to review vendor software is not satisfied by an attestation. It requires the plan to understand the logic, watch for unusual patterns, and be able to show that oversight to CMS if asked. And the CMS certification on risk adjustment data accuracy has to reflect a program that has genuinely done this work. Under the False Claims Act, that certification is not a formality.

How Annova Solutions Approaches This

Annova Solutions calls its approach Scaled by AI, Managed by Experts. AI scans records and identifies coding opportunities across large member populations. Then senior coders with deep Medicare Advantage risk adjustment experience evaluate every flagged diagnosis against the clinical record before anything moves forward. Not junior reviewers. Not offshore checkers. People who have spent more than 15 years working inside this program, through every model change and audit methodology shift, and who know what documentation holds and what does not.

The programs Annova runs for health plans are built around one question: would this code hold if CMS came knocking tomorrow? RAF improvement follows naturally from accurate and complete coding. It is not what the program is optimized around.

For health plans rethinking their vendor relationships after reading the OIG ICPG, Annova is worth a conversation. Fifteen years of working inside this compliance environment produces a very different discussion than a vendor leading with an AI demo.

The Question Worth Sitting With

The question every VP of Risk Adjustment and Chief Compliance Officer should be sitting with is this: between the tool that finds the code and the certification the plan submits to CMS, what actually happens? Who looks at what the tool found? What are they evaluating? And if CMS or OIG asked tomorrow, could the plan show them?

AI in risk adjustment coding is not going away. The plans that get through the next round of OIG and CMS scrutiny without a problem will be the ones that treated the tool as the beginning of a clinical review, not the end of it.