For more than a decade, a patient sought medical treatment in vain. But after he entered his entire medical history into ChatGPT, the cause of his illness was pinpointed at a glance: a gene mutation! Medical AI from giants such as Microsoft and OpenAI has quietly arrived, with diagnostic accuracy exceeding that of professional doctors! The future of medicine may be completely rewritten!
AI once again shakes up the medical community!
A patient was tormented by an unexplained illness for more than ten years, and dozens of doctors were unable to pinpoint the cause.
Then he entered his reports into ChatGPT, and the AI hit the nail on the head: an MTHFR A1298C gene mutation!
On Reddit, this news went viral!
⚠️Note: Before adopting ChatGPT's suggestions, the patient repeatedly discussed and confirmed them with his attending physician. Please be sure to weigh AI suggestions against the opinions of professional doctors, and do not rely on AI alone as a basis for medical decisions.
For more than a decade, the patient had been troubled by a range of unexplained symptoms. He had tried everything, including spinal MRI, CT scans, and blood tests, but none of it produced answers.
Later, he also underwent functional-medicine testing and unexpectedly discovered a homozygous mutation: MTHFR A1298C. MTHFR encodes methylenetetrahydrofolate reductase, and A1298C is one of the gene's common variants; the best-known variant is MTHFR C677T.
Everyone carries two copies of the MTHFR gene, one from the mother and one from the father. The figure above shows the possible genotypes of MTHFR C677T. The A1298C variant occurs at position 1298 of the MTHFR gene and affects an estimated 7-12% of the American population.
He also saw a neurologist and was tested for multiple sclerosis (MS).
Finally, he entered all of the examination reports and medical history from over the years into ChatGPT.
Something remarkable happened: ChatGPT noticed that although his serum vitamin B12 level was normal, it was inconsistent with his persistent nerve pain and chronic fatigue.
That discrepancy pointed to a long-overlooked possibility: a methylation block.
Within a few months, the patient's tingling had eased and his brain fog had lifted.
When the attending physician reviewed the treatment report, he was stunned to find that the gene mutation had been the cause of all the symptoms!
Rohan Paul, an AI engineer who tracks the progress of AGI, was encouraged by the news. He believes that "the time has come for second opinions from medical AI models to become the norm in medical practice."
The relevant tweets were also reposted by the president of OpenAI.
Former Forbes contributor and author Derick David called this medicine's "AlphaGo moment": AI now outperforms humans at disease diagnosis.
AI medical miracles are happening one after another
There are so many similar examples!
Reddit user crasstyfartman's sister was diagnosed with a rare genetic blood disease with the help of ChatGPT:
Before this, she had spent over a decade seeing doctors and naturopaths, all of whom told her it was just psychological. They even rolled their eyes when she asked for a test. She insisted. ChatGPT was right.
After 22 years of complaining to doctors, netizen buyableblah finally got a diagnosis with the help of ChatGPT.
I did the same thing, but for endometriosis. I finally had an ultrasound, which found a 6 cm endometrioma; it has since grown to 7.3 cm, and I plan to have it removed later this year.
One netizen even used ChatGPT to save a pet dog that was "wrongly sentenced to death" by a veterinarian.
Reddit user sometimeslater0212 expressed strong dissatisfaction with the medical system:
I showed the findings from ChatGPT to my doctors, but they all scoffed at it. Some said, "I've never heard of a similar diagnostic suggestion," while others said, "Don't believe ChatGPT."
This kind of arrogance is really annoying.
And it's not just OpenAI: Microsoft, Google, IBM, and others have already staked out positions in medical AI.
Microsoft's consumer AI products are used in more than 50 million health-related scenarios every day.
From a user's first query about knee pain to a late-night emergency search for a nearby clinic, search engines and AI assistants are increasingly becoming the first point of contact for healthcare.
Just last week, Microsoft released MAI-DxO, an AI diagnostic system that performs far better than doctors on this task.
The researchers used real-life case records published weekly in the New England Journal of Medicine as a benchmark.
The results showed that Microsoft's AI Diagnostic Orchestrator (MAI-DxO) achieved 85% accuracy when diagnosing NEJM cases, more than four times that of the experienced human doctors in the experiment.
Moreover, MAI-DxO costs less than human doctors.
Microsoft: The road to medical ASI
Every week, NEJM publishes a "Case Record of the Massachusetts General Hospital," which documents a patient's entire diagnostic and treatment process in detail.
Such cases are usually extremely difficult to diagnose, and often require multiple experts and a series of tests to make a final judgment.
NEJM: New England Journal of Medicine, one of the most authoritative medical journals in the world
So, how does AI perform in these complex cases?
To explore this question, Microsoft's research team used these NEJM cases to design an interactive diagnostic challenge called the Sequential Diagnosis Benchmark (SD Bench).
It transforms 304 NEJM cases into step-by-step diagnostic simulations: just as in a real clinical setting, an AI model or human doctor can ask questions, order tests, receive results, and update the working diagnosis in real time before committing to a conclusion. Each final conclusion is then compared against the reference answer given by NEJM.
Each test ordered incurs a virtual fee, simulating real-world consumption of medical resources. On this basis, the researchers evaluated models along two key axes: diagnostic accuracy and resource efficiency.
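Concretely, the two metrics can be sketched as follows. This is a minimal illustration of the scoring idea only; the fee schedule, case data, and function names below are invented for this example, not taken from the paper.

```python
# Minimal sketch of SD-Bench-style scoring: each ordered test incurs a
# virtual fee, and a run is judged on (a) whether the final diagnosis
# matches the NEJM reference and (b) how much the work-up cost.
# All fees and case data are invented for illustration.

FEE_SCHEDULE = {  # virtual cost per ordered test (illustrative numbers)
    "cbc": 20,
    "coagulation_panel": 35,
    "neck_mri": 900,
}

def score_run(ordered_tests, final_diagnosis, reference_diagnosis):
    """Return (correct, total_cost) for one simulated diagnostic episode."""
    cost = sum(FEE_SCHEDULE[t] for t in ordered_tests)
    correct = final_diagnosis.strip().lower() == reference_diagnosis.strip().lower()
    return correct, cost

def aggregate(runs):
    """Accuracy and mean cost across a benchmark of episodes."""
    results = [score_run(*r) for r in runs]
    accuracy = sum(c for c, _ in results) / len(results)
    mean_cost = sum(cost for _, cost in results) / len(results)
    return accuracy, mean_cost

runs = [
    (["cbc", "neck_mri"], "Lymphoma", "Lymphoma"),
    (["cbc"], "Viral pharyngitis", "Lymphoma"),
]
acc, cost = aggregate(runs)
print(acc, cost)  # → 0.5 470.0
```

Scoring every episode on both axes at once is what makes the accuracy-versus-cost scatter plot discussed below possible.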
Figure 1: Schematic diagram of an AI agent reasoning and solving a sequential diagnosis problem
Enter initial case information, such as:
A 29-year-old woman was admitted to the hospital with sore throat, parapharyngeal swelling, and bleeding. Her symptoms did not improve after antimicrobial treatment.
According to the "sequential diagnosis" process, AI begins to reason:
(1) Review the patient's presenting condition
(2) Conduct the medical interview, covering past medical history, medication history, signs of malignancy, history of viral infection, dental history, bleeding tendency, routine tests (such as complete blood count and coagulation), and imaging (such as neck MRI)
(3) Hold an internal discussion among the virtual panel of doctor experts
(4) Run each test and update the diagnosis
(5) Commit to a final diagnosis
(6) Compare the result against NEJM's authoritative diagnosis and expert review
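The steps above can be sketched as a simple loop between a diagnostic agent and a case "gatekeeper" that holds the full record but only reveals results when a test is ordered. All class names, methods, and case content here are invented for illustration; the real MAI-DxO internals are not published in this form.

```python
# Sketch of the sequential-diagnosis loop: the gatekeeper holds the case
# record and answers only what is asked for; the agent iteratively orders
# tests, updates its findings, and commits to a diagnosis, which is then
# compared against the reference answer. Names/data invented for illustration.

class CaseGatekeeper:
    def __init__(self, initial_info, test_results, reference_diagnosis):
        self.initial_info = initial_info
        self._results = test_results          # hidden until ordered
        self.reference_diagnosis = reference_diagnosis

    def order_test(self, name):
        return self._results.get(name, "test unavailable")

class SequentialDiagnosisAgent:
    def __init__(self, plan):
        self.plan = plan                      # tests to order, in sequence
        self.findings = {}

    def run(self, gatekeeper):
        # steps (1)-(4): review the case, order tests one by one, update state
        for test in self.plan:
            self.findings[test] = gatekeeper.order_test(test)
        # step (5): commit to a final diagnosis (trivial rule, for illustration)
        if "mass" in self.findings.get("neck_mri", ""):
            return "Neoplasm"
        return "Undiagnosed"

gk = CaseGatekeeper(
    initial_info="29-year-old woman: sore throat, parapharyngeal swelling, bleeding",
    test_results={"cbc": "mild anemia",
                  "neck_mri": "soft-tissue mass in the parapharyngeal space"},
    reference_diagnosis="Lymphoma",
)
agent = SequentialDiagnosisAgent(plan=["cbc", "neck_mri"])
final = agent.run(gk)
# step (6): compare with the NEJM reference diagnosis
print(final)  # → Neoplasm
```

The point of the gatekeeper design is that neither a model nor a human doctor can peek at the answer: every finding has to be actively requested, just as in a real work-up.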
In the video below, the project leader introduces the basic process.
Towards accurate diagnosis
The researchers conducted a comprehensive evaluation of the most representative generative AI models, covering 304 real cases from the New England Journal of Medicine (NEJM). The basic models involved in the evaluation include GPT, Llama, Claude, Gemini, Grok, and DeepSeek.
Paper link: https://arxiv.org/abs/2506.22405v2
In addition to benchmarking these models, the researchers also designed the Microsoft AI Diagnostic Orchestrator (MAI-DxO): a system that simulates a collaborative team of virtual doctors who tackle complex cases by pursuing diverse diagnostic hypotheses together.
Figure 5: Overview of the MAI-DxO orchestration system
Compared with a single model, an orchestrator is not only better at integrating data from different sources, but also offers greater safety, transparency, and adaptability as the healthcare environment changes.
This model-agnostic architecture also improves the auditability and resilience of the system, both of which are critical for high-risk, rapidly evolving clinical scenarios.
The evaluation results show that MAI-DxO significantly improves the diagnostic performance of all models. The best performance is the combination of MAI-DxO and OpenAI's o3 model, with a diagnostic accuracy of 85.5% in NEJM cases.
For comparison, the experiment also evaluated 21 practicing physicians from the United States and the United Kingdom with 5 to 20 years of clinical experience. On the same task, their average accuracy was only 20%.
MAI-DxO is configurable: a cost cap can be set, allowing researchers to explore the cost-versus-value trade-off in the diagnostic process.
Without such a restriction, an AI may be tempted to order every possible test, regardless of cost, patient experience, or delays to treatment. The study found that MAI-DxO was not only more accurate than both doctors and single models, but also incurred lower overall testing costs.
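A cost cap of this kind can be sketched as a simple budget check in the test-ordering loop. This is an illustrative sketch only; the function, fee schedule, and parameter names are invented, and the real system's budget logic may differ.

```python
# Sketch of a configurable cost cap: tests are considered in the agent's
# priority order, and any test that would push total spend past the cap
# is skipped, forcing a trade-off between diagnostic value and cost.
# Fees and test names are invented for illustration.

def order_within_budget(candidate_tests, fees, budget):
    """Greedily order tests in the given priority order, skipping any
    that would exceed the budget; return (ordered tests, total spend)."""
    ordered, spent = [], 0
    for test in candidate_tests:
        fee = fees[test]
        if spent + fee <= budget:
            ordered.append(test)
            spent += fee
    return ordered, spent

fees = {"cbc": 20, "coagulation_panel": 35, "neck_mri": 900, "biopsy": 600}
tests, spent = order_within_budget(
    ["cbc", "coagulation_panel", "neck_mri", "biopsy"], fees, budget=700)
print(tests, spent)  # → ['cbc', 'coagulation_panel', 'biopsy'] 655
```

Note how the cap forces the expensive MRI to be dropped while the cheaper tests still fit: sweeping the budget parameter is one way to trace out an accuracy-versus-cost curve like the one in the scatter plot below.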
The scatter plot below compares AI models on diagnostic accuracy versus average testing cost. The MAI-DxO curve sits in the best-performing upper-left region, and the red cross marks the average level of human doctors.
AI+doctor: The first step to say goodbye to expensive medical treatment
Doctors often have to choose between breadth and depth of expertise: general practitioners handle a wide range of problems across age groups and organ systems, while specialists focus on a single disease or system.
However, the complexity of NEJM cases is far beyond the scope of a single doctor. AI is not limited by this and can take into account both breadth and depth. In addition, in many aspects, AI's clinical reasoning ability has surpassed that of human doctors.
This capability has the potential to revolutionize healthcare—not only empowering patients to manage routine health issues themselves, but also providing decision support to doctors.
Currently, medical spending in the United States accounts for nearly 20% of GDP, of which up to a quarter is ineffective spending.
AI is expected to be a key force in curbing this waste.
This is not about replacing doctors, but about opening up a new collaborative model of care: AI + doctor, diagnosing together.
References:
https://www.reddit.com/r/ChatGPT/comments/1lrmom4/chatgpt_solved_a_10_year_problem_no_doctors_could/
https://x.com/rohanpaul_ai/status/1939800536121057652
https://x.com/rohanpaul_ai/status/1941321376838951320
https://microsoft.ai/new/the-path-to-medical-superintelligence/
https://www.cdc.gov/folic-acid/data-research/mthfr/index.html
This article comes from the WeChat public account "Xinzhiyuan" , author: Xinzhiyuan, published by 36Kr with authorization.