One of the last projects tied to DoD and under Biden’s administration has come to a successful close. The Chief Digital and Artificial Intelligence Office (CDAO) of the US Department of Defense(DoD) has completed the Crowdsourced AI Red-Teaming (CAIRT) Assurance program pilot. This initiative is dedicated to integrating AI chatbots into military medical services.
The large language model (LLM) chatbots were implemented in the CAIRT program. Additionally, CAIRT assisted the DoD in the development of decentralized, crowdsourced strategies for AI Risk Mitigation and AI Assurance.
Over 200 clinical providers and healthcare analysts contributed to the identification of potential vulnerabilities in the use of AI chatbots in military medical applications. Notably, the pilot identified several hundred potential issues, as reported by the DoD.
To that end, the DoD said, “This exercise will result in repeatable and scalable output via the development of benchmark data sets, which can be used to evaluate future vendors and tools for alignment with performance expectations.”
According to DoD, a non-profit humane intelligence conducted the CAIRT LLM pilot. It achieved this in partnership with the Defense Health Agency (DHA) and the Program Executive Office of Defense Healthcare Management Systems (PEO DHMS).
In addition, the humane intelligence used the red-teaming methodology to identify specific system issues. This involved the internal testing of system resiliency using adversarial techniques.
Furthermore, red-teaming drew participants who are interested in interacting with emerging technologies and as potential future beneficiaries. They were allowed the opportunity to contribute to the enhancement of the systems.
In this program, humane intelligence used crowdsourced red-teaming to address two potential applications in military medicine: clinical note summarization and a medical advisory chatbot.
Although vulnerabilities were identified, DoD highlighted, “these findings will play a crucial role in shaping DoD policies and best practices for responsible use of Generative AI (GenAI), ultimately improving military medical care. If, when fielded, these prospective use cases comprise covered AI defined in OMB M-24-10, they will adhere to all required risk management practices.”
Still, the DoD stated that the CAIRT Assurance Program will keep testing LLMs and AI systems. This will speed up the CDAO’s AI Rapid Capabilities Cell, make the GenAI goal more effective, and help build trust across all DoD use cases.
CDAO’s lead for this initiative, Dr. Matthew Johnson, also said, “Since applying GenAI for such purposes within the DoD is in earlier stages of piloting and experimentation, this program acts as an essential pathfinder for generating a mass of testing data, surfacing areas for consideration, and validating mitigation options that will shape future research, development, and assurance of GenAI systems that may be deployed in the future,”
The upcoming administration is expected to continue these projects. Trump’s team is open-minded about AI even as it looks to compete with China.
Although AI has a hugely beneficial impact on medical science, it is followed by several significant risks and dangers.
For status, AI systems use specific algorithms that require massive datasets to increase accuracy. This method puts sensitive patient data at risk for security, privacy, and confidentiality. Presently, since pharmaceutical and insurance corporations are interested in such datasets, hacking has expanded greatly. Medical file hacking may also be part of a government cyberattack.
In addition, data poisoning, the intentional modification of medical data to induce errors or biases in healthcare, is another major risk of medical data misuse. This hurts medical advice accuracy and reliability. AI using different epidemiological data models, like in the COVID-19 pandemic, may yield diverse results.
Another issue would be a flawed medical algorithm. This may be due to inadequate algorithm testing since there are no established standards to test its validity. For instance, double-blind trials are the most effective way to prove a treatment works.
Still, who would be liable for such a mistake—the personal doctor, the hospital, the equipment provider, or the algorithm developer? Thus, medical errors caused by machine malfunction raise serious legal issues.
AI chatbots playing doctor? 🤖 Not quite yet.
A study reveals their diagnostic skills are shaky at best.
Healthcare AI still needs a brain boost before it can be trusted with your health. 🩺 pic.twitter.com/W2ast8S7iO
— CAD Authority (@CAD_Authority) January 2, 2025
Also, AI may also impair doctor-patient relationships. Therefore, doctors are required to understand AI evaluation and performance to explain its role to patients and reduce patient anxiety.
Finally, there is a phenomenon known as the “lazy doctor” effect. If the physician exclusively employs AI algorithms for diagnosis and treatment, this may result in a progressive, irreversible loss of practical skills, intellectual creativity, and the ability to solve medical problems.
However, people have gotten used to Chatbots in their daily lives. With proper research, AI Chatbots can eliminate the small mistakes that doctors make, hence making the medical space safer.
From Zero to Web3 Pro: Your 90-Day Career Launch Plan