A Plain-Language Guide to the AI Fabrication Crisis
From Government Policy Scandals to Your Everyday Use
Published 5 May 2026 · 20 min read
In the last week of April 2026, two South African government departments had major policy documents publicly pulled or amended because they contained references that were fabricated by AI tools. The Draft National AI Policy was withdrawn on 26 April after at least six of its 67 academic citations were found to be completely made up. Four days later, the Department of Home Affairs suspended two senior officials and ordered a full audit of every policy document produced since November 2022 after an investigation found that 102 of 148 references in the Revised White Paper on Citizenship, Immigration and Refugee Protection were fake.
These incidents are embarrassing. But they are not unique, and they are not a reason to fear government or AI. They are a reason to understand how these tools work, why they sometimes make things up, and how anyone - from a Cabinet minister to a student writing an essay - can use AI safely by building one habit into their workflow: checking.
This report explains what happened, why it happened, and how you can avoid making the same mistake.
On 25 March 2026, Cabinet approved the Department of Communications and Digital Technologies' Draft National Artificial Intelligence Policy. On 10 April, it was published in the Government Gazette for a 60-day public comment period. The 86-page document proposed a framework for governing AI in South Africa, including new oversight bodies and an insurance fund for citizens harmed by AI systems.
It did not survive a bibliography check.
The civic technology group Article One flagged several citations that did not appear to exist. Investigative journalists then checked all 67 academic references. At least six were fabricated. The journals were real - publications like the South African Journal of Philosophy and AI & Society - but their editors independently confirmed that the specific articles cited in the policy had never been published. The authors named in the citations were real academics. The papers attributed to them were not.
On 26 April, Communications Minister Solly Malatsi withdrew the policy. In a public statement, he said the most plausible explanation was that AI-generated citations had been included without proper verification, and called it an "unacceptable lapse" that proved why human oversight of AI is critical.
On 30 April, Director-General Nonkqubela Jordan-Dyani placed two officials on precautionary suspension. According to reporting by Rapport, the officials are the department's AI policy lead and a deputy director-general - both pre-dating Malatsi's July 2024 ministerial appointment.
The irony is difficult to overstate: the country's AI policy - the document that was supposed to set the rules for responsible AI use - was itself undermined by irresponsible AI use. No new timeline for a redraft has been announced.
Want the full picture?
We've written a dedicated analysis of what South Africa's Draft AI Policy means for legal practitioners - from compliance obligations to the tools you're already using. Read: South Africa's Draft AI Policy Just Made Your Choice of Legal Tech a Compliance Decision →
Four days after the AI policy was withdrawn, the Department of Home Affairs confirmed a second, arguably larger, AI hallucination scandal.
News24 had investigated the reference list of the Revised White Paper on Citizenship, Immigration and Refugee Protection - a document that Minister Leon Schreiber had championed as the most fundamental reform of the country's citizenship and immigration framework in a generation, and which Cabinet had approved on 3 April 2026. Of 148 references checked, 102 were fabricated or could not be verified.
The department's official statement was unusually direct. It announced the immediate suspension of a Chief Director and a Director involved in the drafting process. It also announced three significant steps: the appointment of two independent law firms to manage both the disciplinary process and a full review of all policy documents produced by the department dating back to 30 November 2022 (the date ChatGPT was released to the public); the design and implementation of AI checks and declarations as part of internal approval processes; and the withdrawal of the reference list, which the department said had been "generated and attached to the document after the fact" and was not cited in the body of the text.
The department insisted that the body of the white paper continued to accurately reflect the government's position, because the policy reforms came from an extensive process of cross-departmental collaboration and public consultation and were not materially affected by the fabricated reference list.
The same day, Schreiber - in his role as DA Coordinator in the National Executive - issued a separate statement directing Democratic Alliance ministers to urgently implement AI verification in their departments, and said he would raise the issue at the next Cabinet meeting for government-wide adoption.
The retroactive audit going back to November 2022 is significant. It acknowledges the obvious: if officials used a chatbot for one Cabinet-level document, they almost certainly used it for others.
An AI hallucination is what happens when an artificial intelligence tool produces output that sounds confident, reads fluently, and appears authoritative - but is factually wrong or completely made up. IBM defines it as a phenomenon where the AI perceives patterns or generates information that is nonexistent or imperceptible to human observers.
Think of it this way: imagine you ask a very well-read friend to recommend a book. Instead of admitting they can't think of one, they make up a title, attribute it to a real author, and describe it so convincingly that you go looking for it at the bookshop - only to find it doesn't exist. Your friend wasn't lying to hurt you. They were trying to be helpful. But the result is the same: you were given false information delivered with total confidence.
That is what AI hallucination looks like in practice.
In the government cases, the AI tools didn't just invent random text. They fabricated academic citations that followed the exact format of real citations - complete with real journal names, real author names, and plausible-sounding article titles. As Wits University's analysis put it, the policy "did not just invent sources. It manufactured seemingly credible African scholarly authority." The journals were real. The authors were real. The papers were fiction.
Tools like ChatGPT, Claude, and Gemini are Large Language Models. They were trained by reading billions of pages of text from the internet, books, and academic papers. But they don't store that information as a filing cabinet of facts. Instead, they learn patterns - which words tend to follow other words, how sentences are structured, what a legal citation looks like, how academic papers are typically referenced.
When you ask one of these tools a question, it doesn't look up the answer in a database. It predicts the most statistically likely next word, and then the next, and the next, until it has composed a response. It is, fundamentally, a very sophisticated pattern-completion engine.
Here is the critical distinction: these tools are optimised for fluency, not for accuracy. When you ask for an academic citation, the model has seen millions of citations during training. It knows what one looks like - author, year, journal name, volume, page number. So it generates something that fits that template perfectly. The problem is that the actual paper might not exist.
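To make that concrete, here is a toy sketch in Python - emphatically not any vendor's actual code - of a "model" that learns nothing except which word tends to follow which. Trained on a few citation-shaped strings (the author names below are invented for illustration; the journal names are the real ones mentioned earlier), it can splice them into a perfectly formatted reference for a paper that does not exist:

```python
import random
from collections import defaultdict

# Three citation-shaped training strings. Author names are invented.
training = [
    'Smith, J. (2019). "Machine Learning and Public Policy." AI & Society, 34(2), 101-119.',
    'Dlamini, P. (2020). "Algorithmic Governance in Africa." AI & Society, 35(4), 877-894.',
    'Naidoo, R. (2021). "Data Ethics and the State." South African Journal of Philosophy, 40(1), 55-72.',
]

# Learn which word tends to follow which - patterns, not facts.
follows = defaultdict(list)
for citation in training:
    words = citation.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)

def generate(start='Smith,', limit=30):
    out = [start]
    while len(out) < limit and out[-1] in follows:
        out.append(random.choice(follows[out[-1]]))  # a plausible next word, not a true one
    return ' '.join(out)

print(generate())
# Possible output: Smith, J. (2019). "Machine Learning and Public Policy." AI & Society, 35(4), 877-894.
# Citation-shaped and fluent - but the paper it points to was never published.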
OpenAI's own research explains this directly: the way models are evaluated tends to reward confident answers over honest uncertainty. The models are, in effect, trained to guess rather than to say "I don't know."
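That point can be seen with toy arithmetic. Suppose a benchmark awards one point for a correct answer and zero for anything else, including "I don't know". Then a model that guesses - even badly - outscores one that honestly abstains. The 20% figure below is an arbitrary illustration, not a measured number:

```python
# Grading that scores only right-vs-wrong makes guessing the winning strategy.
p_right_when_guessing = 0.20                        # arbitrary illustrative value
expected_score_guess = p_right_when_guessing * 1.0  # wrong guesses still score 0
expected_score_abstain = 0.0                        # "I don't know" earns no credit

print(expected_score_guess > expected_score_abstain)  # True: 0.2 beats 0.0
# Trained against such grading, a model learns to produce confident guesses.
```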
Modern AI tools are fine-tuned using a process called Reinforcement Learning from Human Feedback, where human reviewers rate the AI's responses and the model learns to produce answers that get higher ratings. The result is that these tools have a built-in tendency to tell you what you want to hear. If you ask for evidence supporting a particular argument, the AI is inclined to produce supporting evidence - even if it has to invent it - rather than telling you no such evidence exists.
This is likely what happened in the government cases. Drafters almost certainly prompted an AI tool to find academic support for their policy positions. The tool, unable to find real citations that perfectly matched the request, generated plausible-sounding ones instead.
These are not edge cases or bugs that will be fixed in the next update. They are features of how the technology works.
Not all AI tools are created equal, and understanding the difference matters for how you assess risk.
The first category is the generic chatbots: ChatGPT, Claude, Gemini, Grok, and Microsoft Copilot. They are trained on the open internet and designed to be conversational generalists - capable of discussing anything from cooking recipes to constitutional law. Their strength is versatility. Their weakness is that they have no built-in mechanism for verifying whether what they tell you is true. When they generate a citation, they are pattern-matching, not searching a database.
For factual claims, references, legal authorities, medical information, or any content where accuracy matters, these tools are unreliable unless every claim is independently verified.
The second category is specialised, retrieval-based tools - products built for specific professional fields, such as legal research tools like Lexis+ AI or Westlaw AI-Assisted Research. They typically combine a language model with a verified, curated database using a technique called Retrieval-Augmented Generation (RAG). In simple terms: the tool searches real sources first, then asks the AI to summarise what it found.
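Here is a minimal sketch of that shape in Python. It is a simplification, not any product's actual pipeline: real tools use vector search over curated legal or medical databases, while this stand-in ranks a tiny hard-coded "database" by keyword overlap. The document ids and snippets are illustrative:

```python
# RAG in miniature: search real sources FIRST, then ask the model to
# summarise only what was found (here we just assemble the prompt).
KNOWLEDGE_BASE = [
    {"id": "parker-2023", "text": "Parker v Forsyth: eight cited authorities were fictitious."},
    {"id": "mavundla-2025", "text": "Mavundla v MEC: seven of nine cited authorities were fabricated."},
]

def retrieve(query, k=1):
    """Rank stored documents by crude word overlap with the query."""
    q = set(query.lower().split())
    return sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q & set(doc["text"].lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query):
    """The retrieval step happens BEFORE generation - that is the whole trick."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in retrieve(query))
    return (
        "Answer using ONLY the sources below and cite their ids. "
        "If the sources do not contain the answer, say so.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("Which case involved fabricated cited authorities?"))
# The assembled prompt is what gets sent to the language model.
```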
These tools are significantly more reliable than generic chatbots. But they are not infallible. Research has shown that even the best of these tools still hallucinate a meaningful proportion of the time. And as the Northbound Processing case in South Africa demonstrated, a locally marketed "sovereign" legal AI tool called Legal Genius - which claimed to be exclusively trained on South African law - still generated fabricated case citations that were submitted to the Gauteng High Court.
If you are not sure whether a tool is generic or specialised, here is a simple test: does the tool show you the original source it drew from? Can you click through to the actual document, judgment, or article? If the answer is yes - and the source opens to a real, verifiable page - you are likely using a retrieval-based tool. If the answer is no, and the tool simply presents text without linking to primary sources, you are using a generic model and need to verify everything yourself.
As a rule of thumb: ChatGPT, Claude, Gemini, Grok, Microsoft Copilot, and Google's AI Overviews are generic tools. Products marketed specifically for law, medicine, or finance - especially those that require a professional subscription and show their sources - tend to be specialised. But "specialised" does not mean "safe." It means "safer." Verification is still required.
The takeaway
Specialised tools are better than generic ones for professional work. But no AI tool currently available is reliable enough to be used without human verification of every factual claim. The government officials who got into trouble were using generic tools for work that demanded verified accuracy. That was the root of the problem.
Before we go further into what went wrong, it is worth being clear about what AI does well. These tools are not inherently dangerous. They are dangerous when they are used for the wrong task, or when their output is treated as final without being checked.
The key insight is this: AI is safest when you are using it for tasks where you would notice if it got something wrong. If you ask it to draft an email, you will read the email before sending it and catch any errors. If you ask it to generate a list of academic sources, you might not recognise whether those sources are real - and that is where the danger lies.
It is easy to look at these incidents and think this is a government competence issue. It is not. It is a human-and-technology issue, and it has already affected lawyers, consultants, academics, and journalists around the world.
South African courts have been dealing with AI-hallucinated legal citations since 2023. In Parker v Forsyth (June 2023), a Johannesburg magistrate found that eight case authorities submitted by a law firm were entirely fictitious - generated by ChatGPT and never verified. In Mavundla v MEC (January 2025), a KwaZulu-Natal High Court judge found that seven of nine cited authorities were fabricated, referred the legal team to the Legal Practice Council for investigation, and described the reliance on unverified AI output as "irresponsible and downright unprofessional." In Northbound Processing (June 2025), the Gauteng High Court dealt with fabricated citations from a specialised South African legal AI tool and again referred the matter to the Legal Practice Council.
Read our full analysis
We've documented all three South African cases in detail, including court findings, referrals to the Legal Practice Council, and what they mean for the profession. Read: The Erosion of Stare Decisis - AI Hallucinations in South African Jurisprudence →
The case that put AI hallucinations on the global map was Mata v. Avianca (New York, 2023), where two attorneys and their law firm were jointly fined US$5,000 for filing a brief containing ChatGPT-fabricated case law. Since then, the problem has grown enormously. According to the database maintained by French researcher Damien Charlotin, over 1,300 court filings worldwide had been identified as containing AI-fabricated content as of late 2025.
In May 2025, the Trump administration's flagship children's health document - the MAHA Report, produced under Health Secretary Robert F. Kennedy Jr. - was found by multiple news outlets to contain AI-hallucinated citations, including fabricated study titles and mischaracterised research findings. The White House quietly republished a corrected version.
In October 2025, Deloitte Australia was forced to refund the final instalment of its AU$440,000 (approximately R5.4 million) contract with the Department of Employment and Workplace Relations after a University of Sydney researcher identified fabricated academic references in a report on welfare automation. Deloitte subsequently disclosed it had used Azure OpenAI's GPT-4o to assist with the report.
In every case, the pattern is the same: a professional used a generic AI tool to produce work that required verified accuracy, did not check the output, and was caught. The tool was not the problem. The missing step - verification - was the problem.
What is encouraging about this pattern is that the failure was never one of competence. The government officials, lawyers, and consultants involved were perfectly capable of checking citations. They simply did not do it - either because they trusted the tool too much, or because they did not understand that checking was necessary. Once you understand that it is necessary, the actual process of checking is straightforward.
AI tools are genuinely useful. They can help you brainstorm ideas, structure documents, summarise long texts, draft emails, and explain complex topics. The point of this report is not to tell you to stop using them. It is to help you use them well - and to show you that checking AI output is not a specialised skill. It is something anyone with a phone and five minutes can do.
The techniques below are ordered from simplest to most thorough. You do not need to do all of them every time. The right level of checking depends on how much the output matters.
This sounds obvious, but it is the step that was skipped in every single case in this report. Read the AI's output with a simple question in mind: "Does this sound like something I can verify, or does it sound like something I'm being asked to take on faith?" If a paragraph contains a specific claim - a statistic, a name, a date, a source - that claim needs a quick check before you pass it on.
You do not need a university library card or an expensive database subscription. Here is exactly what to do:

1. Copy the exact title of the cited article, put it in quotation marks, and search it on Google Scholar (scholar.google.com). A real paper will almost always appear.
2. If nothing comes up, go to the journal's own website and look up the volume and issue number given in the citation.
3. Still nothing? Search the named authors to see whether the paper appears anywhere in their publication records.

That is it. These three steps take less than two minutes per citation and would have caught every single fabricated reference in both the AI policy and the Home Affairs white paper.
Ask the AI the same question twice, phrasing it differently each time. If it gives you substantially different facts or different sources, neither answer is reliable.
This is one of the most effective techniques available to everyday users, and it costs nothing. Copy the output from ChatGPT (or whichever tool you used) and paste it into a different AI tool - Claude, Gemini, or another chatbot. Then type: "Please review the following text. Identify every specific factual claim (names, dates, statistics, citations) and tell me which ones you can verify and which ones you cannot."
AI tools are often better at critiquing text than producing it. Where the two tools disagree, you have found something worth investigating further.
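If you want to make this a repeatable habit, the workflow is small enough to script. In the minimal sketch below, ask_model is a stand-in for whichever second chatbot you use, via its app or API - the reviewing prompt is the substance; the plumbing is yours to choose:

```python
REVIEW_PROMPT = (
    "Please review the following text. Identify every specific factual claim "
    "(names, dates, statistics, citations) and tell me which ones you can "
    "verify and which ones you cannot.\n\n---\n{draft}"
)

def cross_check(draft, ask_model):
    """Send tool A's output to a *different* tool B for critique."""
    return ask_model(REVIEW_PROMPT.format(draft=draft))

# Demo with a stand-in reviewer; in practice, wire ask_model to a second chatbot.
print(cross_check(
    "Mata v. Avianca was decided in New York in 2023.",
    lambda prompt: "(stand-in reviewer) I would check the case name and the year.",
))
```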
AI tools frequently invent statistics, financial figures, percentages, and dates. If a number will appear in anything important - a presentation, a report, a submission - search for it independently. Check whether the figure appears on the website of the institution the AI attributes it to. If you cannot find it there, do not use it.
To show how straightforward this is, let's walk through what the Home Affairs checking process should have looked like for a single citation.
Suppose the AI produced this reference:
Moyo, T. and Ndlovu, S. (2021). "Digital Identity Systems and Citizenship in Post-Colonial Africa." Journal of African Law, 65(2), pp. 234–251.
This looks legitimate. The journal (Journal of African Law) is real. The authors' names are plausible. The topic fits perfectly. Here is how you check it:

1. Search the title in quotation marks - "Digital Identity Systems and Citizenship in Post-Colonial Africa" - on Google Scholar. Nothing.
2. Open the Journal of African Law's online archive and look for volume 65, issue 2, pages 234-251. No such article.
3. Search the author names together with the title. Nothing.

Total time to verify that this reference is an AI hallucination: under two minutes. This is not specialist work. It requires no training and no subscription. It is a Google search with quotation marks.
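For readers comfortable with a little scripting, the title search can even be automated. The sketch below queries Crossref's public REST API (api.crossref.org), a free index of published journal articles, using the requests library. Treat a miss as "verify by hand", not as proof of fabrication - indexing has gaps:

```python
import requests

def title_on_crossref(title):
    """Return True only if Crossref lists a work with this exact title."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.title": title, "rows": 5},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return any(
        t.strip().lower() == title.strip().lower()
        for item in items
        for t in item.get("title", [])
    )

print(title_on_crossref("Digital Identity Systems and Citizenship in Post-Colonial Africa"))
```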
If the officials drafting the Home Affairs white paper had done this for even five of their 148 references, they would have discovered the problem immediately - and avoided a national scandal.
Not everything the AI produces needs the same level of scrutiny. The right amount of checking depends on what you are using the output for.
If you are using AI to draft a casual email, brainstorm ideas for a birthday party, summarise a podcast for your own notes, or structure your thoughts before a meeting - a quick read-through for anything that sounds wrong is sufficient. You are the audience, or the stakes are low enough that an error is easily corrected.
If you are using AI output in a work presentation, a blog post, a client email, or a university assignment - check every specific claim. That means: verify any statistic, date, name, or quoted source. A practical rule of thumb: if a sentence contains a number, a name, or a source, check it.
If you are producing a legal submission, a government policy document, a medical report, a financial filing, a tender document, or anything else where errors have legal, financial, or reputational consequences - verify every factual claim, open every citation, and have a second person review the document independently.
This is what was missing in both the AI policy and the Home Affairs cases. It is also what was missing in the Parker, Mavundla, and Northbound court cases. In all of these, the high-stakes nature of the output demanded full verification, and full verification was not done.
The one-sentence rule
If you are not sure how much checking a particular use case requires, ask yourself: "If this turns out to be wrong, what happens?" If the answer is "nothing much" - check lightly. If the answer is "I would be embarrassed" - check the key facts. If the answer is "there could be professional, legal, or financial consequences" - check everything, and have someone else check it too.
The government's response to these scandals - particularly Home Affairs' decision to appoint independent auditors and retrospectively review documents dating back to November 2022 - provides a template that any organisation using AI should consider adapting.
Every document produced with AI assistance should say so. Name the tool, identify the sections where it was used, and record who verified the output. This is not about shaming AI use - it is about creating accountability. The Department of Home Affairs is now implementing this. Every other organisation should too.
Every factual claim, citation, and reference in a high-stakes document should be checked by a named person whose identity appears on an audit trail. The lesson from both the Mavundla judgment and the Home Affairs scandal is the same: you cannot delegate verification to a machine.
AI is excellent for some tasks and dangerous for others. Drafting, summarising, brainstorming, and structuring are low-risk. Generating citations, producing legal research, creating financial projections, and writing medical guidance without verification are high-risk. A written policy should make the distinction clear.
Before any important document is finalised, a designated person should open five random citations or factual claims and verify them against primary sources. If any are wrong, the entire document goes back for a full review. This is cheap, fast, and effective.
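As a sketch of how little machinery this needs - the reference strings below are placeholders, and the random sampling is the whole idea, since drafters cannot pre-polish the citations they know will be checked:

```python
import random

def spot_check(citations, sample_size=5):
    """Pick citations at random; a named human verifies each against primary sources."""
    return random.sample(citations, min(sample_size, len(citations)))

references = ["Reference 1 ...", "Reference 2 ...", "Reference 3 ..."]  # your document's list
for cite in spot_check(references):
    print("Verify by hand:", cite)
```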
Show people what hallucinated citations actually look like. Walk them through the Parker, Mavundla, and Home Affairs cases. The more familiar people are with how these fabrications present themselves, the better they become at spotting them instinctively.
If your organisation has been using AI tools since late 2022, consider conducting a retrospective review of important documents produced during that period - exactly as Home Affairs is now doing. It is better to find problems proactively than to have them found for you.
AI hallucinations are not a glitch. They are not a bug that will be patched in the next software update. They are a structural feature of how large language models work - a consequence of tools that are designed to predict plausible language rather than retrieve verified facts.
But that does not make them unmanageable. It makes them predictable.
The South African government's experience is a cautionary tale, but it is also a useful one. The government's response - particularly Home Affairs' swift suspensions, independent audits, and AI declaration requirements - shows exactly the kind of institutional discipline that these tools demand. The fact that the problem was caught, disclosed, and acted upon is a sign of functioning accountability, not of systemic failure.
The lesson extends far beyond government. Anyone who uses a generic AI tool to produce work that requires factual accuracy - a lawyer drafting heads of argument, a consultant writing a report, a student completing an assignment, a journalist fact-checking a story, or a business owner preparing a tender document - faces exactly the same risk.
But the fix is simple, and it is the same fix in every case: check. Not everything, not all the time - but the facts that matter, using the free tools described in this report. A two-minute Google Scholar search would have prevented every scandal described in these pages.
AI is a powerful tool for thinking, drafting, and exploring ideas. It is not a reliable tool for producing verified facts. The moment you understand that distinction - and build a quick verification step into your workflow - you are no longer at risk of becoming the next headline.
The tools are powerful. The checking is easy. The habit is what makes the difference.
Researched with the assistance of AI and reviewed by Squire's legal and editorial team.