Publication:
Artificial Intelligence Chatbots in Peritoneal Dialysis Education: A Cross-Sectional Comparative Study of Quality, Readability, and Reliability

Authors

Onan E.
Bozaci İ.
Deligöz Bildacı Y.
Karakaya S. P. Y.
Kozanoglu R.
Kazancıoğlu R.

Abstract

Background: Peritoneal dialysis (PD) remains underutilized worldwide, partly due to limited patient education, misconceptions, and barriers to accessing reliable health information. Artificial intelligence (AI)-based chatbots have emerged as promising tools for improving health literacy, supporting shared decision-making, and enhancing patient engagement. However, concerns regarding content quality, reliability, and readability persist, and no study to date has systematically evaluated AI-generated content in the context of PD. This study therefore aimed to systematically evaluate the quality, reliability, and readability of AI-generated educational content on PD using multiple large language model-based chatbots.

Methods: A total of 45 frequently asked questions about PD were developed by nephrology experts and categorized into three domains: general information (n = 15), technical and clinical issues (n = 21), and myths/misconceptions (n = 9). Three AI-based chatbots (Gemini Pro 2.5, ChatGPT-5, and LLaMA Maverick 4) were prompted to generate responses to all questions. Each response was independently evaluated by two blinded reviewers for textual characteristics, readability using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL), and content quality/reliability using the Ensuring Quality Information for Patients (EQIP) tool and the Modified DISCERN instrument.

Results: Significant differences were observed among the chatbots across all domains. Gemini Pro 2.5 achieved higher FRES scores (32.6 ± 10.5) than ChatGPT-5 (24.2 ± 11.7) and LLaMA Maverick 4 (16.2 ± 7.5; p < 0.001), as well as higher EQIP scores (75.4% vs. 59.4% and 61.5%, respectively; p < 0.001) and Modified DISCERN scores (4.0 [4.0–4.5] vs. 3.0 [3.0–3.5] and 3.0 [2.5–3.5]; p < 0.001). ChatGPT-5 demonstrated intermediate performance, while LLaMA Maverick 4 scored lowest on all evaluated metrics.
Conclusions: These findings demonstrate differences among AI-based chatbots in readability, content quality, and reliability when responding to identical peritoneal dialysis–related questions. While AI chatbots may support health literacy and complement clinical decision-making, their outputs should be interpreted with caution and under appropriate clinical oversight. Future research should focus on multilingual, multicenter, and outcome-based studies to ensure the safe integration of AI into PD patient education.
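For reference, the readability metrics used in the study follow fixed formulas: FRES = 206.835 - 1.015 (words/sentence) - 84.6 (syllables/word), and FKGL = 0.39 (words/sentence) + 11.8 (syllables/word) - 15.59. A minimal Python sketch of both computations, using a naive vowel-group syllable heuristic (the study does not describe its counting tool; published readability software uses more careful syllable counting):

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # words per sentence
    spw = syllables / len(words)        # syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

Higher FRES means easier text (the chatbot means of 16–33 reported above fall in the "difficult/college-level" band), while FKGL approximates the US school grade needed to understand the passage.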


Citation

Onan E., Bozaci İ., Deligöz Bildacı Y., Karakaya S. P. Y., Kozanoglu R., Kazancıoğlu R., "Artificial Intelligence Chatbots in Peritoneal Dialysis Education: A Cross-Sectional Comparative Study of Quality, Readability, and Reliability", Journal of Clinical Medicine, vol.15, no.2, 2026
