AI Diagnostic Accuracy Compromised by Typographical Errors and Informal Language
Artificial Intelligence (AI) systems, designed to aid healthcare workers in analyzing patient records, may be influenced by common human typing errors and slang, a recent study has found. The study, led by Abinitha Gourabathina, a graduate student with the MIT Department of Electrical Engineering and Computer Science, was presented at a medical meeting and is considered preliminary until published in a peer-reviewed journal.
The research revealed that stylistic alterations to the text, including colorful language and slang, increased the likelihood of the AI recommending self-care by 7 to 9 percent. Colorful language includes exclamations like "wow" or intensifying adverbs like "really" or "very." Uncertain language, such as "kind of," "sort of," "possibly," or "suppose," also had an impact, albeit a smaller one.
Interestingly, the study found that these alterations were more likely to change treatment recommendations for women, resulting in a higher percentage of women being erroneously advised not to seek medical care. AI models made about 7 percent more errors for female patients and were more likely to recommend self-care for women, even when gender cues were removed from the notes.
However, the study did not find a significant impact from typos or extra white space on the AI's treatment recommendations. The researchers also noted that the medical datasets used to train AI models are not a realistic reflection of the patient population.
To mitigate these issues, several strategies can be employed. Implementing robust preprocessing techniques can help clean and normalize data, reducing the impact of typing errors and informal language. Ensuring that training datasets include a variety of linguistic styles and errors can improve AI's ability to handle real-world data. Incorporating human oversight and review processes can also help catch errors and ensure that AI-driven recommendations are accurate and appropriate.
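As a rough illustration of that preprocessing idea, the Python sketch below collapses irregular whitespace and strips a small list of colorful and uncertain phrases before a note reaches a model. The phrase lists and the `normalize_note` function are illustrative assumptions, not part of the study or of any particular clinical system.

```python
import re

# Hypothetical lists of stylistic markers to strip; a real system would need
# clinically validated rules so meaningful wording is never removed.
COLORFUL_MARKERS = ["wow", "really", "very"]
UNCERTAIN_MARKERS = ["kind of", "sort of", "possibly", "suppose"]

def normalize_note(text: str) -> str:
    """Lightweight normalization of a patient note before model input."""
    cleaned = text.lower()
    # Collapse runs of whitespace introduced by copy-paste or typing habits.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Remove informal markers as whole words/phrases only.
    for marker in UNCERTAIN_MARKERS + COLORFUL_MARKERS:
        cleaned = re.sub(rf"\b{re.escape(marker)}\b", "", cleaned)
    # Tidy up doubled spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(normalize_note("My chest  really hurts, kind of sharp when I breathe."))
# -> "my chest hurts, sharp when i breathe."
```

In practice such rules would need clinical review, since stripping hedging words like "possibly" can itself change the meaning of what a patient reports.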
The study used patient notes that preserved all clinical data, like prescription medications and previous diagnoses, while adding language that more accurately reflects how people type and speak. In future research, the team plans to test records that better mimic real messages from patients and study how AI infers gender from clinical tests.
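Purely as a sketch of the kind of perturbation described above, the snippet below injects informal phrasing and irregular spacing into an otherwise unaltered note. The `perturb_note` function, its probabilities, and the phrase lists are hypothetical stand-ins for the study's actual templates.

```python
import random

# Hypothetical stylistic insertions; the study's real perturbations preserved
# all clinical content (medications, prior diagnoses) unchanged.
COLORFUL = ["wow,", "really", "very"]
UNCERTAIN = ["kind of", "sort of", "possibly"]

def perturb_note(note: str, seed: int = 0) -> str:
    """Add informal phrasing and irregular spacing without altering clinical facts."""
    rng = random.Random(seed)
    out = []
    for word in note.split():
        # Occasionally prepend a colorful or uncertain marker.
        if rng.random() < 0.15:
            out.append(rng.choice(COLORFUL + UNCERTAIN))
        out.append(word)
        # Occasionally double the whitespace after a word.
        if rng.random() < 0.1:
            out.append("")
    return " ".join(out)

original = "Patient reports chest pain radiating to the left arm since yesterday."
print(perturb_note(original))
```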
In conclusion, common human typing errors, slang, and non-standard language in patient records can significantly affect the accuracy of the AI systems that analyze them. By addressing these challenges proactively, AI systems can be made more robust and reliable for medical record analysis and treatment recommendations.
- Colorful language and slang in patient records increased the likelihood that AI systems recommend self-care by up to 9 percent.
- AI models made about 7 percent more treatment-recommendation errors for female patients when typing errors and informal language were present; these errors were less prevalent for male patients.
- To improve the reliability of AI systems in medical record analysis, it is essential to implement preprocessing techniques, diversify training datasets with varied linguistic styles and errors, and incorporate human oversight to ensure accurate recommendations.