Evaluating the Response of Large Language Models to Tuberculosis Medical Queries: A Study of ChatGPT, Gemini, and Copilot

Comparative Analysis of Chatbot Performance in Medical Queries Related to Tuberculosis

Table of Contents

  • Table 1: Mean Scores of Responses to Medical Questions based on NLAT-AI Criteria
  • Figure 1: Heatmap of Mean Scores across Chatbots
  • Table 2: Chatbot Scores on Diagnostic Domain Indices
  • Table 3: Chatbot Scores on Treatment Domain Indices
  • Table 4: Chatbot Scores in Prevention and Control
  • Table 5: Chatbot Scores in Disease Management
  • Table 6: Evaluation of Chatbots Based on DISCERN-AI Criteria

Summary of Findings

This section summarizes the performance of ChatGPT, Copilot, and Gemini across various medical inquiry categories, offering insights into their strengths and areas for improvement.

Evaluating Chatbot Performance in Medical Inquiries: A Deep Dive

In digital health, artificial intelligence (AI) chatbots are increasingly used to answer medical queries. A recent evaluation compared three prominent models (ChatGPT, Copilot, and Gemini) on questions about tuberculosis management. The findings show how each model performs across several medical categories when scored against the NLAT-AI criteria.

Overview of the Findings

Performance Summary

Table 1 presents the mean scores for each chatbot across four main categories: Diagnostic, Treatment, Prevention & Control, and Disease Management. Fig. 1 displays a heatmap that visually represents each model’s average performance.

  • Diagnostics: All three chatbots achieved an impressive score of 4.0, indicating a strong consensus in diagnostic competency.

  • Treatment: ChatGPT and Copilot scored 4.0, while Gemini lagged slightly behind at 3.8. The gap points to possible weaknesses in Gemini’s treatment-related responses.

  • Prevention & Control: Here, Gemini outperformed the others, earning a score of 4.4, while Copilot scored the lowest at 3.6.

  • Disease Management: Both ChatGPT and Gemini earned a solid score of 4.0, contrasting with Copilot’s lower score of 3.6, suggesting it may benefit from enhancements in this area.
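The comparison in the bullets above can be sketched in a few lines of Python. This is only an illustration, not the study's method: it hard-codes the mean scores quoted in this section, and the names REPORTED_MEANS, known_scores, best_in_category, and score_spread are hypothetical helpers introduced here. ChatGPT's Prevention & Control score is not quoted in the text, so it is left as None rather than guessed, and that category is compared over the two reported values only.

```python
# Mean scores as quoted in the bullets above. None marks a value the text
# does not state (ChatGPT, Prevention & Control); it is deliberately not guessed.
REPORTED_MEANS = {
    "Diagnostic": {"ChatGPT": 4.0, "Copilot": 4.0, "Gemini": 4.0},
    "Treatment": {"ChatGPT": 4.0, "Copilot": 4.0, "Gemini": 3.8},
    "Prevention & Control": {"ChatGPT": None, "Copilot": 3.6, "Gemini": 4.4},
    "Disease Management": {"ChatGPT": 4.0, "Copilot": 3.6, "Gemini": 4.0},
}


def known_scores(scores):
    """Keep only the chatbots whose score is actually reported."""
    return {bot: s for bot, s in scores.items() if s is not None}


def best_in_category(scores):
    """Return the chatbots sharing the highest reported score, sorted by name."""
    known = known_scores(scores)
    top = max(known.values())
    return sorted(bot for bot, s in known.items() if s == top)


def score_spread(scores):
    """Gap between the highest and lowest reported score, to one decimal."""
    values = known_scores(scores).values()
    return round(max(values) - min(values), 1)


if __name__ == "__main__":
    for category, scores in REPORTED_MEANS.items():
        leaders = ", ".join(best_in_category(scores))
        print(f"{category}: best = {leaders}, spread = {score_spread(scores)}")
```

The spread makes it easy to see where the models diverge most: the three are tied on diagnostics, while the largest reported gap is in Prevention & Control.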

In-Depth Analysis of Chatbot Categories

Diagnostics Performance (Table 2)

Table 2 highlights diagnostic capabilities concerning tuberculosis. The chatbots generally scored well, with most indices at 4. However, Gemini’s scores of 3 in Appropriateness and Effectiveness indicate room for improvement, especially when addressing specific medical queries.

Treatment Evaluation (Table 3)

On treatment-related inquiries, ChatGPT scored 5 in Accuracy, outperforming both Copilot and Gemini. Copilot, however, surpassed Gemini in Appropriateness, suggesting that while ChatGPT excels in precision, Copilot’s guidance may be more clinically appropriate than Gemini’s.

Prevention & Control Metrics (Table 4)

Gemini’s superior performance in the Prevention & Control domain is underscored by perfect scores (5 out of 5) in the Safety and Actionability metrics. These findings suggest that Gemini is strong on preventive health measures and actionable advice, both vital in controlling outbreaks.

Disease Management Insights (Table 5)

Table 5 shows broadly comparable performance among the three chatbots in Disease Management. Scores were generally high, but Copilot’s lower scores in Accuracy and Effectiveness indicate that it may need further fine-tuning to provide comprehensive patient care strategies.

Evaluation Based on DISCERN-AI Criteria (Table 6)

Table 6 provides a broader look at how well each chatbot handles inquiries regarding tuberculosis, based on the DISCERN-AI criteria:

  • Information Relevance: ChatGPT took the lead, providing more relevant responses than its counterparts.

  • Citing Sources: Copilot and Gemini included only partial citations, but both still outperformed ChatGPT, which did not reference any sources. This gap highlights differences in transparency among the models.

  • Date of Information Production: None of the chatbots provided production dates, a critical oversight that can hinder the assessment of information reliability and timeliness.

  • Balance and Impartiality: All three chatbots performed similarly, maintaining neutrality in their responses.

  • Additional Sources and Uncertainty Indication: All models struggled to point to additional resources or to acknowledge uncertainty in their responses, a shortcoming shared by all three.

Conclusion

The comparative evaluation of ChatGPT, Copilot, and Gemini underscores the potential and limitations of AI chatbots in the medical field. While all three demonstrated strong diagnostic capabilities, substantial variations across treatment, prevention, and disease management highlight the need for ongoing improvements.

Future developments should focus on:

  1. Enhancing Domain-Specific Knowledge: Particularly for Gemini in treatment and outcomes.
  2. Improving Source Transparency: Including comprehensive citations and references.
  3. Incorporating Timeliness: Assigning production dates to enhance reliability.

As AI chatbots continue to evolve in the medical landscape, these findings can guide improvements in chatbot performance and help the models meet their ultimate goal: providing accurate, relevant, and timely medical information to users.
