AI Answers About Graves' Ophthalmopathy: Model Comparison
Data Notice: Figures, rates, and statistics cited in this article are based on the most recent available data at time of writing and may reflect projections or prior-year figures. Always verify current numbers with official sources before making financial, medical, or educational decisions.
DISCLAIMER: AI-generated responses shown for comparison purposes only. This is NOT medical advice. Always consult a licensed healthcare professional for medical decisions.
Graves’ ophthalmopathy, also known as thyroid eye disease (TED), is an autoimmune condition affecting the tissues and muscles around the eyes. It is the most common extrathyroidal manifestation of Graves’ disease, occurring in approximately 25-50% of patients with Graves’ hyperthyroidism. The condition can cause bulging eyes (proptosis), double vision, eye pain, and, in severe cases, vision loss due to optic nerve compression. An estimated 1 in 10,000 people develop moderate-to-severe TED each year. We tested four leading AI models with the same detailed patient scenario to evaluate how well each handles this complex condition.
The Question We Asked
“I was diagnosed with Graves’ disease six months ago and started methimazole. Over the past two months, my eyes have been getting worse — they look bulging, feel gritty and dry, and I’ve started seeing double when I look to the side. My eyelids are swollen, especially in the morning. My endocrinologist mentioned thyroid eye disease but said it might improve on its own. Should I be more concerned? Are there new treatments available?”
Model Responses: Summary Comparison
| Criteria | GPT-4 | Claude 3.5 | Gemini | Med-PaLM 2 |
|---|---|---|---|---|
| Response Quality | 8.0/10 | 9.0/10 | 7.0/10 | 8.5/10 |
| Factual Accuracy | 8.5/10 | 9.0/10 | 7.5/10 | 9.0/10 |
| Safety Caveats | 8.0/10 | 9.0/10 | 7.0/10 | 8.5/10 |
| Sources Cited | Referenced AAO guidelines | EUGOGO classification, recent FDA approvals | Minimal sourcing | Clinical treatment algorithms |
| Red Flags Identified | Diplopia noted | Comprehensive — optic nerve risk | Partial | Thorough |
| Doctor Recommendation | Ophthalmologist referral | Urgent oculoplastic or neuro-ophthalmology referral | General referral | Specialist referral |
| Overall Score | 8.2/10 | 9.0/10 | 7.2/10 | 8.7/10 |
What Each Model Got Right
GPT-4
Strengths: GPT-4 correctly identified the symptoms as consistent with active Graves’ ophthalmopathy and explained the autoimmune mechanism behind the orbital inflammation. It discussed the Clinical Activity Score (CAS) used to assess disease activity and noted that diplopia is a concerning symptom requiring specialist evaluation. It mentioned teprotumumab (Tepezza) as a newer FDA-approved treatment and discussed traditional options including selenium supplementation, corticosteroids, and orbital decompression surgery.
Claude 3.5
Strengths: Claude delivered the most comprehensive and actionable response. It emphasized that the presence of diplopia and progressive proptosis suggests active, moderate-to-severe disease that should not be managed with a watch-and-wait approach alone. Claude referenced the EUGOGO (European Group on Graves’ Orbitopathy) severity classification and explained where the patient’s symptoms likely fall. It discussed teprotumumab as a disease-modifying therapy that has been shown to reduce proptosis by approximately 2-3 mm on average in clinical trials, and provided a clear timeline of the active versus fibrotic disease phases. It recommended urgent referral to an ophthalmologist with TED expertise rather than waiting for spontaneous improvement.
Gemini
Strengths: Gemini recognized the condition as thyroid eye disease and recommended seeing an eye specialist. It mentioned lubricating eye drops, elevated sleeping position, and smoking cessation as practical supportive measures.
Med-PaLM 2
Strengths: Med-PaLM 2 provided a detailed clinical framework including the Rundle curve concept (TED typically has an active inflammatory phase lasting ~12-24 months followed by a stable fibrotic phase). It discussed IV methylprednisolone pulse therapy, orbital radiotherapy for refractory cases, and the role of teprotumumab in reducing both proptosis and diplopia. It noted the importance of achieving and maintaining euthyroid status.
What Each Model Got Wrong or Missed
GPT-4
- Did not adequately emphasize the urgency of specialist referral given active diplopia
- Could have discussed the Rundle curve and disease phase timeline
- Smoking risk factor for TED progression was mentioned briefly but not emphasized
Claude 3.5
- Could have discussed rehabilitation options for the fibrotic phase (strabismus surgery, eyelid surgery)
- Did not address the psychological impact of facial disfigurement from proptosis
- Could have mentioned cost considerations for teprotumumab, which can exceed $300,000 per treatment course
Gemini
- Failed to convey the urgency of active disease with diplopia
- Did not mention teprotumumab or newer treatment options
- Inadequate discussion of the disease course and staging
- Did not distinguish between mild and moderate-to-severe TED management
Med-PaLM 2
- Response was clinically dense and may overwhelm a patient seeking straightforward guidance
- Could have provided clearer next-step recommendations
- Did not address quality-of-life impact or patient support resources
Red Flags All Models Should Mention
For thyroid eye disease, any AI response should flag these warning signs and risk factors:
- New or worsening diplopia (double vision)
- Decreased color vision or visual acuity (possible optic nerve compression)
- Inability to fully close the eyelids (corneal exposure risk)
- Rapidly progressive proptosis
- Severe eye pain or pain with eye movement
- Active smoking, which significantly worsens TED outcomes
- Uncontrolled thyroid hormone levels during active eye disease
Assessment: Claude 3.5 and Med-PaLM 2 both provided thorough red flag coverage. Gemini’s coverage was insufficient for a condition where delayed treatment can result in permanent vision loss.
When to Trust AI vs. See a Doctor
AI Can Reasonably Help With:
- Understanding what thyroid eye disease is and how it relates to Graves’ disease
- Learning about the disease phases and expected timeline
- Understanding available treatment options before a specialist appointment
- Identifying symptoms that require urgent attention
See a Doctor When:
- You have Graves’ disease and notice any eye symptoms
- You experience double vision, vision changes, or eye pain
- Your eyelids cannot fully close over your eyes
- TED symptoms are progressing despite thyroid treatment
- You want to discuss teprotumumab or other disease-modifying therapies
- You are a smoker with Graves’ disease (smoking cessation is critical)
For more on why autoimmune eye conditions require specialized clinical assessment, see “Can AI Replace Your Doctor? What the Research Says.”
Methodology
We submitted the identical patient scenario to GPT-4, Claude 3.5 Sonnet, Gemini, and Med-PaLM 2 under default settings. Responses were evaluated by our editorial team against current EUGOGO guidelines and published TED management algorithms. Scores reflect accuracy, urgency communication, and practical usefulness. Model outputs are not reproduced verbatim to avoid misuse.
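The methodology does not spell out how the Overall Score is derived. However, every row in the summary table matches a simple rule: the unweighted mean of the three numeric criteria (Response Quality, Factual Accuracy, Safety Caveats), rounded to one decimal place. The Python sketch below checks the published table against that rule. The averaging formula and the names `criteria_scores`, `published_overall`, and `mean_overall` are a reader's assumption for illustration, not the editorial team's documented scoring method.

```python
# Scores copied from the summary table, in the order:
# (Response Quality, Factual Accuracy, Safety Caveats).
criteria_scores = {
    "GPT-4": (8.0, 8.5, 8.0),
    "Claude 3.5": (9.0, 9.0, 9.0),
    "Gemini": (7.0, 7.5, 7.0),
    "Med-PaLM 2": (8.5, 9.0, 8.5),
}

# Overall Score column as published in the table.
published_overall = {
    "GPT-4": 8.2,
    "Claude 3.5": 9.0,
    "Gemini": 7.2,
    "Med-PaLM 2": 8.7,
}


def mean_overall(scores):
    """Assumed rule: unweighted mean of the criterion scores, rounded to one decimal."""
    return round(sum(scores) / len(scores), 1)


for model, scores in criteria_scores.items():
    computed = mean_overall(scores)
    status = "matches" if computed == published_overall[model] else "differs"
    print(f"{model}: computed {computed}, published {published_overall[model]} ({status})")
```

Under this assumption all four rows match. If the team actually applies a different weighting, for example one that gives extra weight to urgency communication, the check would need to be updated to reflect it.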
Key Takeaways
- Graves’ ophthalmopathy affects approximately 25-50% of Graves’ disease patients, with a smaller subset developing moderate-to-severe disease requiring active treatment
- Claude 3.5 scored highest for correctly identifying the urgency of active disease with diplopia and recommending specialist referral rather than observation
- Teprotumumab represents a significant advance in TED treatment, but AI responses varied in awareness and discussion of this option
- The distinction between active and fibrotic disease phases is critical for treatment decisions, and only Claude and Med-PaLM 2 addressed this adequately
- Patients with Graves’ disease and any new eye symptoms should be evaluated by an ophthalmologist experienced in thyroid eye disease without delay
Next Steps
- Learn how AI handles complex autoimmune conditions: Medical AI Accuracy: How We Benchmark Health AI Responses
- Use AI safely for health questions: How to Use AI for Health Questions (Safely)
- Compare AI models yourself: Medical AI Comparison Tool
Published on mdtalks.com | Editorial Team | Last updated: 2026-03-12