The Trouble with Large Language Models and How to Address AI “Lying”
Even as AI systems become more advanced and enmeshed in daily operations, concerns regarding whether large language models (LLMs) are generating accurate and true information remain paramount throughout the business landscape. Unfortunately, the potential for AI to generate false or misleading information—often referred to as AI “hallucinations”—is very real. Though that possibility poses significant cybersecurity challenges, there are ways organizations deploying this technology can mitigate the risks.
AI governance has been a hot topic of late, one that’s left many scrambling to strike a balance between positive utilization of AI’s capabilities and ensuring these applications do no harm. As cybersecurity experts, it’s part of our job to stay apprised of these issues so that we can best assist our clients, and we do have some insight here to provide.
In this blog post, we’ll explore the “lying” perpetrated by AI, including why it happens and how it can be reduced, before getting into specific measures cybersecurity teams can take to further mitigate the risk.
Why Do LLMs Lie? It’s All About the Data.
Though AI continues to evolve quickly, these models remain only as good as the data on which they’re trained. Large language models like those from OpenAI or Anthropic are incredibly powerful tools, but they’re still just that—tools that can only respond based on the data they’ve been trained on.
So, if that data is flawed or biased, or if the model lacks proper guardrails to handle the information, AI outputs can easily become inaccurate, deceptive, or even harmful. When these applications “lie,” it’s often because they’re overfitting to bad data or because there are gaps in how they’ve been trained to evaluate information.
All that creates huge risk. Imagine a scenario where you’ve entrusted an AI-powered system to make critical decisions, like identifying threats or managing responses to attacks. If it instead generates false information, that could delay responses, escalate a situation unnecessarily, or—the worst-case scenario—allow adversaries to manipulate the AI to overlook real threats or generate false alerts, essentially using the AI’s vulnerabilities against your organization and users.
To avoid any of this fallout, a primary focus must be on your data pipeline—both the information being fed to these models and how you handle it.
How to Mitigate Risks Posed by AI That May “Lie” or Intentionally Mislead
Aside from secure and ethical data management, there are other ways to reduce the risks of AI “lying.”
The Presidio AI Framework—just one of three papers released by the World Economic Forum’s (WEF) AI Governance Alliance, an initiative created “to champion responsible global design and release of transparent and inclusive AI systems”—strongly emphasizes the importance of setting guardrails throughout the AI development lifecycle. Setting yours should start at the beginning, during the foundation model-building phase.
To be proactive in avoiding harmful AI outputs, developers can use APIs to adjust parameters like the model’s “temperature”—lowering the temperature makes outputs more predictable and deterministic, while raising it allows more creative variation. Developers should also document detailed summaries of the training data while implementing mechanisms within the AI to detect and mitigate misleading outputs.
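For illustration, here’s a minimal sketch of what that temperature adjustment can look like in practice, using the OpenAI Python SDK as one example; the model name and prompts are placeholders, and the same idea applies to other providers’ APIs.

```python
# Illustrative only: lowering temperature to make outputs more deterministic.
# Assumes the OpenAI Python SDK (openai>=1.0) and an API key in OPENAI_API_KEY;
# the model name, system prompt, and user prompt below are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; substitute your deployed model
    messages=[
        {"role": "system", "content": "Answer only from the provided context. If you are unsure, say so."},
        {"role": "user", "content": "Summarize the security alerts in this log excerpt: <excerpt>"},
    ],
    temperature=0.2,  # lower values reduce randomness; higher values allow more variation
)

print(response.choices[0].message.content)
```

A lower temperature won’t eliminate hallucinations on its own, but it reduces the randomness that can compound them, and pairing it with a constrained system prompt narrows the range of answers the model will attempt.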
Such guardrails might include safeguards like model watermarking and drift monitoring, which will help you detect when your model’s behavior starts to deviate from its intended purpose, but you might also consider AI red teaming. Otherwise known as AI-specific penetration testing, these assessments involve simulating attacks on your AI models to see how they respond—how they interact with and impact the applications they’re powering—while identifying any lingering vulnerabilities before deployment.
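To make drift monitoring a little more concrete, here’s a rough sketch (not a production tool) that compares a simple signal derived from model outputs, such as response length or a confidence score, between a baseline window and a recent window using the population stability index (PSI), a common drift metric. The data below is synthetic, and the 0.2 alert threshold is just a commonly cited rule of thumb.

```python
# A minimal drift-monitoring sketch: compare a numeric signal derived from model
# outputs (e.g., response length) between a baseline window and a recent window.
import numpy as np

def population_stability_index(baseline, recent, bins=10):
    """Higher PSI means the recent distribution has drifted further from baseline."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    recent_pct = np.histogram(recent, bins=edges)[0] / len(recent)
    # Floor the percentages to avoid division by zero and log(0).
    base_pct = np.clip(base_pct, 1e-6, None)
    recent_pct = np.clip(recent_pct, 1e-6, None)
    return float(np.sum((recent_pct - base_pct) * np.log(recent_pct / base_pct)))

# Synthetic example: response lengths recorded at deployment vs. this week.
baseline_lengths = np.random.normal(200, 40, 5000)
recent_lengths = np.random.normal(260, 60, 5000)

psi = population_stability_index(baseline_lengths, recent_lengths)
if psi > 0.2:  # commonly cited threshold for a significant shift
    print(f"Possible drift detected (PSI={psi:.2f}); trigger review and recalibration.")
else:
    print(f"No significant drift (PSI={psi:.2f}).")
```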
This type of multi-faceted, proactive approach will help you catch early signs of your model “lying” or yielding misleading outputs, but your organization should combine these proactive measures with reactive strategies too.
On that front, continuous monitoring and logging are essential—you can’t just deploy an AI model and assume it will work perfectly forever. You must maintain ongoing validation and recalibration, especially if you’re operating in high-risk environments like healthcare or law enforcement, where the consequences of misleading information can be severe.
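To make that concrete, here’s a simplified sketch of the kind of audit logging and periodic re-validation we’re describing; the `generate_fn` callable, the golden question-and-answer pairs, and the keyword check are all stand-ins for whatever model interface and validation criteria your organization actually uses.

```python
# Illustrative audit logging and periodic re-validation for an LLM-backed workflow.
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(filename="llm_audit.log", level=logging.INFO)

def logged_completion(generate_fn, prompt: str) -> str:
    """Call the model through `generate_fn` and append a structured audit record."""
    answer = generate_fn(prompt)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "response": answer,
    }
    logging.info(json.dumps(record))
    return answer

def validate_against_golden_set(generate_fn, golden_pairs):
    """Replay human-reviewed prompts and flag answers missing expected content."""
    failures = []
    for question, expected_keyword in golden_pairs:
        answer = logged_completion(generate_fn, question)
        if expected_keyword.lower() not in answer.lower():
            failures.append(question)
    return failures  # a non-empty list signals a need for review or recalibration
```

Running the validation on a schedule, and alerting when the failure list is non-empty, gives you the kind of ongoing recalibration signal described above.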
What Frameworks Can Help with Securing AI and Ensuring Accurate Outputs?
All these measures, both proactive and reactive, can help detect and mitigate AI hallucinations and other issues before they develop into real problems, but for extra validation—and assurance for your customers—you can also implement more comprehensive frameworks and/or best practices.
The AI governance world is a bustling one with new developments all the time, and you already have several options to choose from to better secure and strengthen your AI models.
Most prominent among them is ISO 42001, which—since its release in 2023—has become the de facto flagship standard for AI risk management. To achieve the framework’s goal of safe, ethical, and secure deployment of AI systems, organizations are asked to build and implement an artificial intelligence management system (AIMS) that, if it meets all ISO 42001 requirements, empowers them to comprehensively address and control the risks AI presents.
Though ISO 42001 is solely focused on AI, ISO 9001—with its separate, specific concentration on promoting customer confidence in the quality and reliability of products and services—actually makes a great complement to ISO 42001. Why? We spoke earlier about the importance of data management in preventing AI “lying” and bias, and ISO 9001 certification, with its specific guidelines in data-related areas, can help data- and process-driven organizations achieve the necessary transparency and confidence in their AI systems.
Of course, ISO certifications can be resource-intensive, but they’re not the only path to effective management of AI—you do have other valuable options for guidance and best practices regarding safeguarding your AI systems:
- NIST's AI Risk Management Framework: Meant to provide a voluntary, adaptable guide for organizations seeking to better manage AI risks, NIST’s AI RMF focuses on governance, risk identification, continuous measurement, and mitigation strategies so that AI systems are trustworthy, transparent, and unbiased.
- HITRUST AI Risk Management Assessment: Though primarily intended to assist healthcare organizations, the HITRUST AI RM assessment could serve as an important initial benchmark to help evaluate risks within your AI processes, policies, and systems.
- WEF AI Governance Alliance Briefing Papers: We mentioned them earlier, but this series of three papers regarding responsible AI development, implementation, and governance was released with the intent to create a transparent future where AI only enhances societal progress.
- Guidelines for Secure AI System Development: Developed by the UK’s National Cyber Security Centre (NCSC) and the U.S. Cybersecurity and Infrastructure Security Agency (CISA) together with other agencies from around the world, these guidelines are intended to aid in the development of secure AI systems so that risks are reduced before security issues arise.
Moving Forward into a Future with Safeguarded AI
No matter which framework or guidance you opt to use for assistance, remember that data quality and provenance, combined with both proactive and reactive security strategies, will be essential to avoiding AI “lying” and ensuring that outputs are accurate and trustworthy instead.
Trust in AI systems isn’t a given—it has to be built and maintained through rigorous processes and continuous oversight. As you ponder how your organization can best do that with your resources and specifics, check out our other content that can shed light on possible avenues and considerations.
About AVANI DESAI
Avani Desai is the CEO at Schellman. Avani has more than 15 years of experience in IT attestation, risk management, compliance, and privacy. Avani’s primary focus is on emerging healthcare issues and privacy concerns for organizations. Named one of the 2017 Global Leaders in Consulting by Consulting Magazine, she has also been featured and published in the ISSA Journal, ITSP Magazine, ISACA Journal, Information Security Buzz, Healthcare Tech Outlook, and many more. Avani also sits on the board of Catalist, a not-for-profit that empowers women by supporting the creation, development, and expansion of collective giving through informed grantmaking. In addition, she is co-chair of 100 Women Strong, a female-only venture philanthropic fund to solve problems related to women and children in the community.