KNOWLEDGE AND IDENTIFICATION OF AI VERSUS HUMAN HEALTHCARE RESPONSES: PERCEPTIONS OF PAKISTANI HEALTHCARE PROFESSIONALS
Abstract
Purpose and Objectives: The objective of the study was to compare the perceived quality of responses generated by large language models (LLMs), namely ChatGPT-4 and a general-purpose AI model, with the perceived quality of responses provided by human experts, as rated by Pakistani healthcare professionals. Specific aims were to evaluate differences in knowledge, helpfulness, empathy, question relevance, clarity, and distractor quality, and to examine correlations among these response attributes and rating patterns.
Methods: A quantitative, cross-sectional experimental design was used. A total of 197 healthcare professionals in Pakistan were recruited and surveyed in May 2025. Responses to health-related questions were generated by ChatGPT-4, a general-purpose AI model, and human experts. Participants rated the anonymized responses on Likert-type scales for knowledge, helpfulness, and empathy. Statistical analyses included Chi-square tests, Spearman rank correlations, and mean comparisons with 95% confidence intervals.
Results: ChatGPT-4 responses were comparable to human-generated responses across all dimensions, with non-significant mean differences in question relevance (MD = -0.13), clarity (MD = -0.03), and distractor quality (MD = -0.10). The general AI model performed significantly worse in all areas, particularly clarity (MD = 1.21, p = 0.001) and distractor quality (MD = 1.5). Chi-square tests showed significant associations between response type and knowledge (Cramér's V = 0.099), helpfulness (V = 0.116), and empathy (V = 0.115). Spearman correlations revealed strong relationships between knowledge and helpfulness (r = 0.83–0.85), knowledge and empathy (r = 0.67–0.69), and helpfulness and empathy (r = 0.76–0.79).
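The study's individual-level ratings are not reproduced here, but the reported analyses (Chi-square with Cramér's V as an effect size, and Spearman rank correlation between rating dimensions) can be sketched with SciPy on synthetic Likert-type data. All values below are made up for illustration and do not reflect the study's dataset:

```python
# Illustrative sketch of the abstract's analyses on SYNTHETIC ratings.
# The actual study data are not public; numbers here are random.
import numpy as np
from scipy.stats import chi2_contingency, spearmanr

rng = np.random.default_rng(0)
n_raters = 197  # sample size reported in the abstract

# Synthetic 1-5 Likert ratings for two correlated dimensions
knowledge = rng.integers(1, 6, size=n_raters)
helpfulness = np.clip(knowledge + rng.integers(-1, 2, size=n_raters), 1, 5)

# Chi-square test of association: response source (3 levels) x knowledge rating
response_type = rng.integers(0, 3, size=n_raters)  # 0=ChatGPT-4, 1=general AI, 2=human
table = np.zeros((3, 5), dtype=int)
for t, k in zip(response_type, knowledge):
    table[t, k - 1] += 1
chi2, p, dof, _ = chi2_contingency(table)

# Cramér's V effect size derived from the chi-square statistic
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

# Spearman rank correlation between two rating dimensions
rho, p_rho = spearmanr(knowledge, helpfulness)
print(f"chi2={chi2:.2f}, p={p:.3f}, Cramér's V={cramers_v:.3f}, rho={rho:.2f}")
```

Because the synthetic `helpfulness` ratings are built from `knowledge` plus small noise, the Spearman coefficient comes out strongly positive, mirroring the high knowledge–helpfulness correlations the abstract reports.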
Conclusion: ChatGPT-4 proved as effective as human medical experts in producing relevant, clear, and empathetic responses, suggesting it may be a useful tool in medical education and assessment. The poor performance of the general AI model indicates that substantial human oversight remains necessary for such systems. Although these findings are encouraging, the certainty of the evidence was low, underscoring the need for further studies with larger, more diverse samples.
License

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.