Data Notice: Medical statistics and prevalence figures for stress fractures cited in this article are based on peer-reviewed sources and clinical guidelines available at time of writing. Treatment outcomes and diagnostic criteria may be updated as new research emerges. This article does not substitute for professional medical evaluation.

AI Answers About Stress Fractures: Model Comparison

DISCLAIMER: The AI-generated responses about stress fractures shown below are for educational comparison only. This is NOT medical advice and should not be used for self-diagnosis or treatment decisions. Always consult a qualified healthcare professional about stress fracture symptoms and treatment.

Stress fractures account for approximately 10 percent of all sports injuries and affect an estimated 1.5 to 2 million athletes and active individuals in the United States each year. They are particularly common in runners, military recruits, and dancers. Women are roughly 1.5 to 3.5 times more likely than men to develop stress fractures, with risk further elevated in those with the female athlete triad. The tibia and metatarsal bones are the most commonly affected sites, though stress fractures can occur in virtually any weight-bearing bone.

We tested four AI models with a stress fracture scenario to evaluate their understanding and management guidance.

The Question We Asked

“I’m a 28-year-old female marathon runner. Over the past three weeks, I’ve developed gradually worsening pain on the top of my right foot, specifically over the second metatarsal. It hurts when I run and now even when I walk. There’s some swelling. I recently increased my training mileage by about 30 percent for an upcoming race. Could this be a stress fracture, and what should I do?”

Model Responses: Summary Comparison

Criteria	GPT-4	Claude 3.5	Gemini	Med-PaLM 2
Explained stress fracture mechanism	Yes	Yes	Yes	Yes
Identified risk factors	Yes	Yes	Partial	Yes
Discussed diagnostic workup	Yes	Yes	Partial	Yes
Addressed female athlete triad	Yes	Yes	No	Yes
Recommended activity modification	Yes	Yes	Yes	Yes
Discussed healing timeline	Yes	Yes	Yes	Partial
Mentioned nutritional factors	Partial	Yes	No	Yes
Covered return-to-activity protocol	Yes	Yes	Yes	Partial
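
The summary table can also be read as a simple tally per model. The following is a minimal Python sketch, not part of the original evaluation: it only counts the Yes, Partial, and No ratings copied from the table above and assigns no weights or scores. The names MODELS, RATINGS, and tally_ratings are hypothetical and chosen for illustration.

```python
from collections import Counter

# Column order matches the summary table above.
MODELS = ["GPT-4", "Claude 3.5", "Gemini", "Med-PaLM 2"]

# Ratings copied row by row from the summary table (illustrative structure).
RATINGS = {
    "Explained stress fracture mechanism": ["Yes", "Yes", "Yes", "Yes"],
    "Identified risk factors": ["Yes", "Yes", "Partial", "Yes"],
    "Discussed diagnostic workup": ["Yes", "Yes", "Partial", "Yes"],
    "Addressed female athlete triad": ["Yes", "Yes", "No", "Yes"],
    "Recommended activity modification": ["Yes", "Yes", "Yes", "Yes"],
    "Discussed healing timeline": ["Yes", "Yes", "Yes", "Partial"],
    "Mentioned nutritional factors": ["Partial", "Yes", "No", "Yes"],
    "Covered return-to-activity protocol": ["Yes", "Yes", "Yes", "Partial"],
}


def tally_ratings(models, ratings):
    """Count how many times each model received each rating."""
    tallies = {model: Counter() for model in models}
    for row in ratings.values():
        for model, rating in zip(models, row):
            tallies[model][rating] += 1
    return tallies


tallies = tally_ratings(MODELS, RATINGS)
for model in MODELS:
    counts = tallies[model]
    print(f"{model}: {counts['Yes']} Yes, {counts['Partial']} Partial, {counts['No']} No")
```

A plain count treats every criterion as equally important, which is a simplification. The physician reviewers described under Methodology may have weighed the criteria differently.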

What Each Model Got Right

GPT-4

GPT-4 accurately explained the pathophysiology of stress fractures as repetitive microtrauma exceeding the bone’s ability to remodel and repair. The model correctly identified the rapid increase in training volume as a major risk factor and explained the bone stress continuum, which runs from stress reaction to complete fracture. GPT-4 discussed the diagnostic approach, including initial X-rays, which may be negative early in the process, followed by MRI as the gold standard for definitive diagnosis. The model covered the importance of relative rest and protected weight-bearing, and provided a structured, gradual return-to-running protocol with specific milestones for progression.

Claude 3.5

Claude 3.5 provided the most comprehensive risk assessment, connecting the patient’s sex, sport, and training error to a thorough evaluation of contributing factors. The model discussed the female athlete triad, now commonly described within the broader framework of Relative Energy Deficiency in Sport (RED-S), explaining how inadequate caloric intake relative to exercise expenditure can impair bone health and increase fracture risk. Claude 3.5 recommended evaluation of menstrual history, nutritional intake, and bone density. The model provided practical advice on cross-training options during recovery, including swimming, cycling, and pool running, and outlined a detailed return-to-running protocol with specific mileage progression guidelines.

Gemini

Gemini correctly identified the presentation as highly consistent with a metatarsal stress fracture and explained the mechanism in accessible language. The model emphasized the importance of rest and provided clear guidance on activity modification during the healing period. Gemini discussed appropriate footwear, the use of stiff-soled shoes for pain relief, and the ten-percent rule for gradually increasing training loads to prevent recurrence.
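
To make the ten-percent rule concrete, here is a minimal Python sketch of the arithmetic. It assumes a hypothetical baseline of 30 miles per week (the scenario gives no actual mileage) and reaches the same 30 percent increase the patient made, but in steps capped at 10 percent per week. The function name ten_percent_schedule and its parameters are illustrative, and the rule itself is a general training heuristic rather than a clinical protocol.

```python
def ten_percent_schedule(start_miles, target_miles, max_increase=0.10):
    """Return weekly mileage from start to target, capping each weekly
    increase at max_increase (10 percent by default) of the prior week."""
    if start_miles <= 0:
        raise ValueError("start_miles must be positive")
    if target_miles < start_miles:
        raise ValueError("target_miles must be at least start_miles")

    schedule = [start_miles]
    while schedule[-1] < target_miles:
        # Grow by at most max_increase, but never overshoot the target.
        next_week = min(schedule[-1] * (1 + max_increase), target_miles)
        schedule.append(next_week)
    return schedule


# Hypothetical baseline: 30 miles per week, building to 39 (a 30 percent rise).
schedule = ten_percent_schedule(30, 39)
for week, miles in enumerate(schedule):
    print(f"Week {week}: {miles:.1f} miles")
```

Under these assumptions the increase is spread over three weekly steps instead of a single jump.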

Med-PaLM 2

Med-PaLM 2 delivered the most scientifically detailed response, discussing the biomechanics of metatarsal stress fractures, the role of bone remodeling cycles, and the classification of stress fractures by anatomical site and risk level. The model distinguished between low-risk stress fractures like second metatarsal shaft fractures and high-risk stress fractures such as those at the fifth metatarsal base, navicular, or femoral neck. Med-PaLM 2 discussed nutritional factors including calcium, vitamin D, and overall energy availability in substantial detail and referenced current clinical evidence.

What Each Model Got Wrong or Missed

GPT-4

GPT-4 did not sufficiently emphasize the importance of nutritional assessment and energy availability in a female distance runner. While the model mentioned the female athlete triad, it did not explain the screening process or the importance of addressing underlying energy deficiency to prevent recurrent fractures. The model also did not discuss the psychological impact of forced rest for a competitive runner preparing for a marathon.

Claude 3.5

Claude 3.5 did not discuss the risk classification of stress fractures by site, which is important for determining whether a fracture can be managed conservatively or may require more aggressive intervention such as surgical fixation. The model also could have addressed the potential need for non-weight-bearing immobilization in certain fracture locations and the use of bone stimulators for delayed healing.

Gemini

Gemini omitted discussion of the female athlete triad and RED-S entirely, which is a significant gap for a female distance runner presenting with a stress fracture. The model also did not recommend bone density assessment or nutritional evaluation, missing an opportunity to address systemic risk factors that could lead to recurrent injuries. The diagnostic workup discussion was superficial, mentioning only X-rays without explaining their early limitations or the superiority of MRI for early detection.

Med-PaLM 2

Med-PaLM 2 did not provide a practical return-to-running plan. While the model excelled at diagnosis and risk stratification, it failed to offer the patient a clear timeline and step-by-step protocol for resuming training. The model also lacked empathy regarding the patient’s upcoming race and the emotional impact of modifying training plans during a critical preparation period.

Red Flags All Models Should Mention

All AI models should flag these concerns in the context of stress fractures:

Pain that persists at rest or during normal walking activities
Night pain or pain that wakes the patient from sleep
Visible deformity or significant swelling at the fracture site
History of multiple stress fractures suggesting systemic bone health issues
Signs of disordered eating or menstrual irregularities in female athletes
Pain at a high-risk fracture site such as the navicular, fifth metatarsal base, or femoral neck

When to Trust AI vs. See a Doctor

When AI Information May Be Helpful

AI tools can help athletes understand stress fracture risk factors and the importance of training load management. AI can explain the diagnostic process and set expectations for recovery timelines. AI can also introduce the concept of RED-S and prompt female athletes to consider whether they should be screened for energy deficiency and bone health issues that could contribute to recurrent injuries.

When You Must See a Doctor

Any suspected stress fracture requires medical evaluation for proper diagnosis, typically with imaging. A sports medicine physician can classify the fracture risk level and determine the appropriate management plan. Female athletes should be screened for RED-S. Return-to-sport decisions should be guided by a physician or physical therapist to prevent re-injury. Nutritional consultation may be needed to address energy deficiency or calcium and vitamin D inadequacy.

For more on how AI handles stress fractures and other health topics, visit our medical AI accuracy page.

Methodology

For this evaluation, we submitted the identical patient scenario to GPT-4, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Med-PaLM 2 in March 2026. Each model received the prompt without prior conversation context. Responses were evaluated by a sports medicine physician and an orthopedic surgeon against current ACSM and AAOS guidelines for stress fractures. Models were scored on pathophysiology explanation, risk factor identification, diagnostic accuracy, and practical guidance.

Key Takeaways

All four models correctly identified the presentation as consistent with a stress fracture and emphasized the importance of reducing activity and seeking medical imaging.
The female athlete triad and RED-S were addressed by GPT-4, Claude 3.5, and Med-PaLM 2 but completely missed by Gemini, which is a significant oversight for this patient population.
Claude 3.5 provided the most practical and actionable recovery and return-to-running plan, addressing both physical and psychological aspects of the injury.
Med-PaLM 2 offered the best risk stratification by stress fracture site but lacked practical patient-facing guidance for daily management.
Stress fracture management requires professional evaluation for imaging and risk classification, and AI should support patients in understanding their condition while directing them to sports medicine specialists.
