AI Answers About Breast Cancer: Model Comparison

Updated 2026-03-12

Data Notice: Figures, rates, and statistics cited in this article are based on the most recent available data at time of writing and may reflect projections or prior-year figures. Always verify current numbers with official sources before making financial, medical, or educational decisions.

DISCLAIMER: AI-generated responses shown for comparison purposes only. This is NOT medical advice. Always consult a licensed healthcare professional for medical decisions.


Breast cancer is the most common cancer in women worldwide. In the United States, approximately 310,000 new cases of invasive breast cancer are projected to be diagnosed annually, and approximately 1 in 8 women will develop breast cancer during their lifetime. While the vast majority of cases occur in women, approximately 2,800 cases per year are projected to be diagnosed in men. The five-year survival rate for localized breast cancer is approximately 99%, but it drops significantly with later-stage detection, which makes screening and early recognition critical. The high emotional stakes and complex treatment landscape make breast cancer one of the most searched health topics online.

The Question We Asked

“I’m a 45-year-old woman. During a self-exam, I found a hard lump in my left breast that doesn’t move much when I push on it. It wasn’t there during my last check three months ago. I don’t have any pain, nipple discharge, or skin changes. My mother was diagnosed with breast cancer at age 52. I had a normal mammogram last year. How worried should I be, and what should my next steps be?”

Model Responses: Summary Comparison

| Criteria | GPT-4 | Claude 3.5 | Gemini | Med-PaLM 2 |
|---|---|---|---|---|
| Response Quality | 8.5/10 | 9.2/10 | 7.5/10 | 8.8/10 |
| Factual Accuracy | 8.5/10 | 9.0/10 | 7.0/10 | 8.8/10 |
| Safety Caveats | 8.5/10 | 9.2/10 | 7.5/10 | 8.5/10 |
| Sources Cited | General references | Screening guidelines | Minimal | Clinical guidelines |
| Red Flags Identified | Most covered | Comprehensive | Partial | Thorough |
| Doctor Recommendation | Strongly recommended | Urgently recommended | Recommended | Strongly recommended |
| Overall Score | 8.5/10 | 9.1/10 | 7.3/10 | 8.7/10 |
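
The article does not state how the Overall Score row is derived. As a rough check, the sketch below tests one assumption: that the overall figure is the unweighted mean of the three numeric rows (Response Quality, Factual Accuracy, Safety Caveats), rounded to one decimal place. The dictionary keys and the function name `mean_overall` are illustrative and not part of the original scoring process.

```python
# Scores transcribed from the summary comparison table (all out of 10).
# "published_overall" is the Overall Score row as printed in the article.
scores = {
    "GPT-4":      {"quality": 8.5, "accuracy": 8.5, "safety": 8.5, "published_overall": 8.5},
    "Claude 3.5": {"quality": 9.2, "accuracy": 9.0, "safety": 9.2, "published_overall": 9.1},
    "Gemini":     {"quality": 7.5, "accuracy": 7.0, "safety": 7.5, "published_overall": 7.3},
    "Med-PaLM 2": {"quality": 8.8, "accuracy": 8.8, "safety": 8.5, "published_overall": 8.7},
}


def mean_overall(row):
    """Assumed formula: unweighted mean of the three numeric criteria, rounded to 0.1."""
    parts = (row["quality"], row["accuracy"], row["safety"])
    return round(sum(parts) / len(parts), 1)


for model, row in scores.items():
    computed = mean_overall(row)
    status = "matches" if computed == row["published_overall"] else "differs from"
    print(f"{model}: computed {computed} {status} published {row['published_overall']}")
```

Under this assumption the computed values agree with the published row for all four models. That is consistent with a simple average, but it does not confirm that the editorial team used one. The qualitative rows (Sources Cited, Red Flags Identified, Doctor Recommendation) do not enter this calculation.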

What Each Model Got Right

GPT-4

Strengths: GPT-4 correctly noted that a new, hard, fixed lump warrants prompt evaluation regardless of a recent normal mammogram, since mammograms can miss approximately 10-15% of breast cancers. It described the typical triple assessment approach (clinical examination, imaging, and biopsy) and recommended scheduling an appointment within one to two weeks. GPT-4 appropriately noted that most breast lumps are benign but that the characteristics described (hardness, immobility, rapid appearance) raise the index of suspicion.

Claude 3.5

Strengths: Claude provided the most balanced response, conveying urgency without causing panic. It emphasized that approximately 80% of breast lumps evaluated are benign (fibroadenomas, cysts, fibrocystic changes) while clearly stating that the combination of features described (hard, fixed, new, with family history) requires prompt professional evaluation. Claude recommended contacting the doctor within the week rather than waiting for the next scheduled appointment. It explained the diagnostic pathway in detail: diagnostic mammogram (different from a screening mammogram), breast ultrasound, and, if indicated, core needle biopsy. Claude addressed the family history component, noting that having a first-degree relative with breast cancer approximately doubles the risk and that genetic counseling for BRCA1/BRCA2 testing may be appropriate. It also correctly noted that a normal screening mammogram from the previous year does not rule out a new finding.

Gemini

Strengths: Gemini identified the lump as a reason to see a doctor and provided general information about breast cancer screening. It correctly noted that most lumps turn out to be benign.

Med-PaLM 2

Strengths: Med-PaLM 2 provided a clinically structured response discussing BI-RADS classification, the role of diagnostic imaging versus screening imaging, and the significance of the clinical features described. It discussed the importance of genetic risk assessment given the family history and referenced NCCN guidelines for high-risk screening recommendations including potential MRI supplementation.

What Each Model Got Wrong or Missed

GPT-4

  • Did not discuss genetic counseling or BRCA testing in the context of family history
  • Did not mention supplemental screening with MRI for high-risk patients

Claude 3.5

  • Could have discussed interval cancers (cancers that develop between screening examinations) more explicitly
  • Did not mention the option of breast MRI as supplemental screening for high-risk individuals

Gemini

  • Failed to convey appropriate urgency for this specific clinical scenario
  • Did not describe the diagnostic workup pathway
  • Family history significance was underemphasized

Med-PaLM 2

  • Response was highly technical and may have increased anxiety in a patient audience
  • Did not adequately address the emotional component of finding a breast lump

Red Flags All Models Should Mention

Breast symptoms requiring prompt medical evaluation:

  • Any new lump or mass in the breast, especially if hard, irregular, or fixed
  • Change in breast size, shape, or symmetry
  • Skin dimpling, puckering, or thickening (peau d’orange appearance)
  • Nipple retraction or inversion that is new
  • Spontaneous nipple discharge, especially if bloody or from a single duct
  • Persistent breast pain in a specific area
  • Redness, scaling, or thickening of the nipple or areola
  • Swollen lymph nodes under the arm or near the collarbone

When to Trust AI vs. See a Doctor

AI Can Reasonably Help With:

  • Understanding breast cancer risk factors and screening guidelines
  • Learning what to expect during a diagnostic workup
  • Understanding the difference between benign and suspicious breast findings
  • Preparing questions for a breast specialist consultation

See a Doctor When:

  • You find any new breast lump — schedule evaluation within one to two weeks
  • You notice any of the red flag symptoms listed above
  • You have a first-degree family history of breast cancer and want to discuss enhanced screening
  • Your screening mammogram shows an abnormality requiring follow-up
  • You want to discuss genetic testing for hereditary breast cancer syndromes

Our companion article, "Medical AI Accuracy: How We Benchmark Health AI Responses," explains why AI tools cannot substitute for the imaging interpretation and clinical judgment required in breast cancer evaluation.

Methodology

We submitted the identical patient scenario to GPT-4, Claude 3.5 Sonnet, Gemini, and Med-PaLM 2 under default settings. Responses were evaluated by our editorial team against current breast cancer screening and diagnostic guidelines. Scores reflect accuracy, safety communication, emotional appropriateness, and practical usefulness. Model outputs are not reproduced verbatim to avoid misuse.

Key Takeaways

  • Breast cancer is the most common cancer in women, with approximately 310,000 new invasive cases projected annually in the United States
  • Claude 3.5 scored highest for balancing urgency with reassurance and for providing the most complete diagnostic pathway guidance
  • A normal screening mammogram does not eliminate the need to evaluate a new breast lump, because mammograms miss approximately 10-15% of cancers
  • Family history of breast cancer in a first-degree relative approximately doubles the risk and may warrant genetic counseling and enhanced screening
  • AI can help patients understand symptoms and prepare for medical appointments, but breast cancer diagnosis requires professional clinical examination, imaging, and biopsy

Published on mdtalks.com | Editorial Team | Last updated: 2026-03-12