KNOWLEDGE AND IDENTIFICATION OF AI VERSUS HUMAN HEALTHCARE RESPONSES: PERCEPTIONS OF PAKISTANI HEALTHCARE PROFESSIONALS
Abstract
Purpose and Objectives: The objective of the study was to compare the perceived quality of responses generated by large language models (LLMs), namely ChatGPT-4 and a general-purpose AI model, with the perceived quality of responses provided by human experts, as rated by Pakistani healthcare professionals. Specific aims were to evaluate differences in knowledge, helpfulness, empathy, question relevance, clarity, and distractor quality, and to examine correlations among these response attributes and rating patterns.
Methods: A quantitative, cross-sectional experimental design was used. A total of 197 healthcare professionals in Pakistan were recruited and surveyed in May 2025. Responses to health-related questions were generated by ChatGPT-4, a general-purpose AI model, and human experts. Participants rated the anonymized responses on Likert-type scales for knowledge, helpfulness, and empathy. Statistical analyses included Chi-square tests, Spearman rank correlations, and mean comparisons with 95% confidence intervals.
Results: ChatGPT-4 responses were comparable to human-generated responses across all dimensions, with non-significant mean differences in question relevance (MD = -0.13), clarity (MD = -0.03), and distractor quality (MD = -0.10). The general AI model performed significantly worse in all areas, particularly clarity (MD = 1.21, p = 0.001) and distractor quality (MD = 1.5). Chi-square tests showed significant associations between response type and knowledge (Cramér's V = 0.099), helpfulness (V = 0.116), and empathy (V = 0.115). Spearman correlations revealed strong relationships between knowledge and helpfulness (r = 0.83–0.85), knowledge and empathy (r = 0.67–0.69), and helpfulness and empathy (r = 0.76–0.79).
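The study's individual-level ratings are not reproduced here, but the reported analyses (Chi-square with Cramér's V as an effect size, and Spearman rank correlation between rating dimensions) can be sketched with SciPy on synthetic Likert-type data. All values below are made up for illustration and do not reflect the study's dataset:

```python
# Illustrative sketch of the abstract's analyses on SYNTHETIC ratings.
# The actual study data are not public; numbers here are random.
import numpy as np
from scipy.stats import chi2_contingency, spearmanr

rng = np.random.default_rng(0)
n_raters = 197  # sample size reported in the abstract

# Synthetic 1-5 Likert ratings for two correlated dimensions
knowledge = rng.integers(1, 6, size=n_raters)
helpfulness = np.clip(knowledge + rng.integers(-1, 2, size=n_raters), 1, 5)

# Chi-square test of association: response source (3 levels) x knowledge rating
response_type = rng.integers(0, 3, size=n_raters)  # 0=ChatGPT-4, 1=general AI, 2=human
table = np.zeros((3, 5), dtype=int)
for t, k in zip(response_type, knowledge):
    table[t, k - 1] += 1
chi2, p, dof, _ = chi2_contingency(table)

# Cramér's V effect size derived from the chi-square statistic
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

# Spearman rank correlation between two rating dimensions
rho, p_rho = spearmanr(knowledge, helpfulness)
print(f"chi2={chi2:.2f}, p={p:.3f}, Cramér's V={cramers_v:.3f}, rho={rho:.2f}")
```

Because the synthetic `helpfulness` ratings are built from `knowledge` plus small noise, the Spearman coefficient comes out strongly positive, mirroring the high knowledge–helpfulness correlations the abstract reports.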
Conclusion: ChatGPT-4 proved as effective as human medical experts in producing relevant, clear, and empathetic responses, suggesting it may be a useful tool in medical education and assessment. The poor performance of the general AI model indicates that substantial human oversight remains necessary for such systems. Although these findings are encouraging, the certainty of the evidence was low, underscoring the need for further studies with larger, more diverse samples.
License

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.