A Plain-Language Guide to the AI Fabrication Crisis
From Government Policy Scandals to Your Everyday Use
Published 5 May 2026 · 20 min read
In the last week of April 2026, two South African government departments had major policy documents publicly pulled or amended because they contained references that were fabricated by AI tools. The Draft National AI Policy was withdrawn on 26 April after at least six of its 67 academic citations were found to be completely made up. Four days later, the Department of Home Affairs suspended two senior officials and ordered a full audit of every policy document produced since November 2022 after an investigation found that 102 of 148 references in the Revised White Paper on Citizenship, Immigration and Refugee Protection were fake.
These incidents are embarrassing. But they are not unique, and they are not a reason to fear government or AI. They are a reason to understand how these tools work, why they sometimes make things up, and how anyone - from a Cabinet minister to a student writing an essay - can use AI safely by building one habit into their workflow: checking.
This report explains what happened, why it happened, and how you can avoid making the same mistake.
On 25 March 2026, Cabinet approved the Department of Communications and Digital Technologies' Draft National Artificial Intelligence Policy. On 10 April, it was published in the Government Gazette for a 60-day public comment period. The 86-page document proposed a framework for governing AI in South Africa, including new oversight bodies and an insurance fund for citizens harmed by AI systems.
It did not survive a bibliography check.
The civic technology group Article One flagged several citations that did not appear to exist. Investigative journalists then checked all 67 academic references. At least six were fabricated. The journals were real - publications like the South African Journal of Philosophy and AI & Society - but their editors independently confirmed that the specific articles cited in the policy had never been published. The authors named in the citations were real academics. The papers attributed to them were not.
On 26 April, Communications Minister Solly Malatsi withdrew the policy. In a public statement, he said the most plausible explanation was that AI-generated citations had been included without proper verification, and called it an "unacceptable lapse" that proved why human oversight of AI is critical.
On 30 April, Director-General Nonkqubela Jordan-Dyani placed two officials on precautionary suspension. According to reporting by Rapport, the officials are the department's AI policy lead and a deputy director-general - both pre-dating Malatsi's July 2024 ministerial appointment.
The irony is difficult to overstate: the country's AI policy - the document that was supposed to set the rules for responsible AI use - was itself undermined by irresponsible AI use. No new timeline for a redraft has been announced.
Want the full picture?
We've written a dedicated analysis of what South Africa's Draft AI Policy means for legal practitioners - from compliance obligations to the tools you're already using. Read: South Africa's Draft AI Policy Just Made Your Choice of Legal Tech a Compliance Decision →
Four days after the AI policy was withdrawn, the Department of Home Affairs confirmed a second, arguably larger, AI hallucination scandal.
News24 had investigated the reference list of the Revised White Paper on Citizenship, Immigration and Refugee Protection - a document that Minister Leon Schreiber had championed as the most fundamental reform of the country's citizenship and immigration framework in a generation, and which Cabinet had approved on 3 April 2026. Of 148 references checked, 102 were fabricated or could not be verified.
The department's official statement was unusually direct. It announced the immediate suspension of a Chief Director and a Director involved in the drafting process. It also announced three significant steps: the appointment of two independent law firms to manage both the disciplinary process and a full review of all policy documents produced by the department dating back to 30 November 2022 (the date ChatGPT was released to the public); the design and implementation of AI checks and declarations as part of internal approval processes; and the withdrawal of the reference list, which the department said had been "generated and attached to the document after the fact" and was not cited in the body of the text.
The department insisted that the body of the white paper continued to accurately reflect the government's position, because the policy reforms came from an extensive process of cross-departmental collaboration and public consultation and were not materially affected by the fabricated reference list.
The same day, Schreiber - in his role as DA Coordinator in the National Executive - issued a separate statement directing Democratic Alliance ministers to urgently implement AI verification in their departments, and said he would raise the issue at the next Cabinet meeting for government-wide adoption.
The retroactive audit going back to November 2022 is significant. It acknowledges the obvious: if officials used a chatbot for one Cabinet-level document, they almost certainly used it for others.
An AI hallucination is what happens when an artificial intelligence tool produces output that sounds confident, reads fluently, and appears authoritative - but is factually wrong or completely made up. IBM defines it as a phenomenon where the AI perceives patterns or generates information that is nonexistent or imperceptible to human observers.
Think of it this way: imagine you ask a very well-read friend to recommend a book. Instead of admitting they can't think of one, they make up a title, attribute it to a real author, and describe it so convincingly that you go looking for it at the bookshop - only to find it doesn't exist. Your friend wasn't lying to hurt you. They were trying to be helpful. But the result is the same: you were given false information delivered with total confidence.
That is what AI hallucination looks like in practice.
In the government cases, the AI tools didn't just invent random text. They fabricated academic citations that followed the exact format of real citations - complete with real journal names, real author names, and plausible-sounding article titles. As Wits University's analysis put it, the policy "did not just invent sources. It manufactured seemingly credible African scholarly authority." The journals were real. The authors were real. The papers were fiction.
Tools like ChatGPT, Claude, and Gemini are Large Language Models. They were trained by reading billions of pages of text from the internet, books, and academic papers. But they don't store that information as a filing cabinet of facts. Instead, they learn patterns - which words tend to follow other words, how sentences are structured, what a legal citation looks like, how academic papers are typically referenced.
When you ask one of these tools a question, it doesn't look up the answer in a database. It predicts the most statistically likely next word, and then the next, and the next, until it has composed a response. It is, fundamentally, a very sophisticated pattern-completion engine.
Here is the critical distinction: these tools are optimised for fluency, not for accuracy. When you ask for an academic citation, the model has seen millions of citations during training. It knows what one looks like - author, year, journal name, volume, page number. So it generates something that fits that template perfectly. The problem is that the actual paper might not exist.
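To make that concrete, here is a toy sketch in Python - emphatically not any vendor's actual code - of a "model" that learns nothing except which word tends to follow which. Trained on a few citation-shaped strings (the author names below are invented for illustration; the journal names are the real ones mentioned earlier), it can splice them into a perfectly formatted reference for a paper that does not exist:

```python
import random
from collections import defaultdict

# Three citation-shaped training strings. Author names are invented.
training = [
    'Smith, J. (2019). "Machine Learning and Public Policy." AI & Society, 34(2), 101-119.',
    'Dlamini, P. (2020). "Algorithmic Governance in Africa." AI & Society, 35(4), 877-894.',
    'Naidoo, R. (2021). "Data Ethics and the State." South African Journal of Philosophy, 40(1), 55-72.',
]

# Learn which word tends to follow which - patterns, not facts.
follows = defaultdict(list)
for citation in training:
    words = citation.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)

def generate(start='Smith,', limit=30):
    out = [start]
    while len(out) < limit and out[-1] in follows:
        out.append(random.choice(follows[out[-1]]))  # a plausible next word, not a true one
    return ' '.join(out)

print(generate())
# Possible output: Smith, J. (2019). "Machine Learning and Public Policy." AI & Society, 35(4), 877-894.
# Citation-shaped and fluent - but the paper it points to was never published.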
OpenAI's own research explains this directly: the way models are evaluated tends to reward confident answers over honest uncertainty. The models are, in effect, trained to guess rather than to say "I don't know."
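That point can be seen with toy arithmetic. Suppose a benchmark awards one point for a correct answer and zero for anything else, including "I don't know". Then a model that guesses - even badly - outscores one that honestly abstains. The 20% figure below is an arbitrary illustration, not a measured number:

```python
# Grading that scores only right-vs-wrong makes guessing the winning strategy.
p_right_when_guessing = 0.20                        # arbitrary illustrative value
expected_score_guess = p_right_when_guessing * 1.0  # wrong guesses still score 0
expected_score_abstain = 0.0                        # "I don't know" earns no credit

print(expected_score_guess > expected_score_abstain)  # True: 0.2 beats 0.0
# Trained against such grading, a model learns to produce confident guesses.
```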
Modern AI tools are fine-tuned using a process called Reinforcement Learning from Human Feedback, where human reviewers rate the AI's responses and the model learns to produce answers that get higher ratings. The result is that these tools have a built-in tendency to tell you what you want to hear. If you ask for evidence supporting a particular argument, the AI is inclined to produce supporting evidence - even if it has to invent it - rather than telling you no such evidence exists.
This is likely what happened in the government cases. Drafters almost certainly prompted an AI tool to find academic support for their policy positions. The tool, unable to find real citations that perfectly matched the request, generated plausible-sounding ones instead.
These are not edge cases or bugs that will be fixed in the next update. They are features of how the technology works.
Not all AI tools are created equal, and understanding the difference matters for how you assess risk.
The first category is the generic chatbots: ChatGPT, Claude, Gemini, Grok, and Microsoft Copilot. They are trained on the open internet and designed to be conversational generalists - capable of discussing anything from cooking recipes to constitutional law. Their strength is versatility. Their weakness is that they have no built-in mechanism for verifying whether what they tell you is true. When they generate a citation, they are pattern-matching, not searching a database.
For factual claims, references, legal authorities, medical information, or any content where accuracy matters, these tools are unreliable unless every claim is independently verified.
The second category is specialised, retrieval-based tools - products built for specific professional fields, such as legal research tools like Lexis+ AI or Westlaw AI-Assisted Research. They typically combine a language model with a verified, curated database using a technique called Retrieval-Augmented Generation (RAG). In simple terms: the tool searches real sources first, then asks the AI to summarise what it found.
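Here is a minimal sketch of that shape in Python. It is a simplification, not any product's actual pipeline: real tools use vector search over curated legal or medical databases, while this stand-in ranks a tiny hard-coded "database" by keyword overlap. The document ids and snippets are illustrative:

```python
# RAG in miniature: search real sources FIRST, then ask the model to
# summarise only what was found (here we just assemble the prompt).
KNOWLEDGE_BASE = [
    {"id": "parker-2023", "text": "Parker v Forsyth: eight cited authorities were fictitious."},
    {"id": "mavundla-2025", "text": "Mavundla v MEC: seven of nine cited authorities were fabricated."},
]

def retrieve(query, k=1):
    """Rank stored documents by crude word overlap with the query."""
    q = set(query.lower().split())
    return sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q & set(doc["text"].lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query):
    """The retrieval step happens BEFORE generation - that is the whole trick."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in retrieve(query))
    return (
        "Answer using ONLY the sources below and cite their ids. "
        "If the sources do not contain the answer, say so.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("Which case involved fabricated cited authorities?"))
# The assembled prompt is what gets sent to the language model.
```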
These tools are significantly more reliable than generic chatbots. But they are not infallible. Research has shown that even the best of these tools still hallucinate a meaningful proportion of the time. And as the Northbound Processing case in South Africa demonstrated, a locally marketed "sovereign" legal AI tool called Legal Genius - which claimed to be exclusively trained on South African law - still generated fabricated case citations that were submitted to the Gauteng High Court.
If you are not sure whether a tool is generic or specialised, here is a simple test: does the tool show you the original source it drew from? Can you click through to the actual document, judgment, or article? If the answer is yes - and the source opens to a real, verifiable page - you are likely using a retrieval-based tool. If the answer is no, and the tool simply presents text without linking to primary sources, you are using a generic model and need to verify everything yourself.
As a rule of thumb: ChatGPT, Claude, Gemini, Grok, Microsoft Copilot, and Google's AI Overviews are generic tools. Products marketed specifically for law, medicine, or finance - especially those that require a professional subscription and show their sources - tend to be specialised. But "specialised" does not mean "safe." It means "safer." Verification is still required.
The takeaway
Specialised tools are better than generic ones for professional work. But no AI tool currently available is reliable enough to be used without human verification of every factual claim. The government officials who got into trouble were using generic tools for work that demanded verified accuracy. That was the root of the problem.
Before we go further into what went wrong, it is worth being clear about what AI does well. These tools are not inherently dangerous. They are dangerous when they are used for the wrong task, or when their output is treated as final without being checked.
The key insight is this: AI is safest when you are using it for tasks where you would notice if it got something wrong. If you ask it to draft an email, you will read the email before sending it and catch any errors. If you ask it to generate a list of academic sources, you might not recognise whether those sources are real - and that is where the danger lies.
It is easy to look at these incidents and think this is a government competence issue. It is not. It is a human-and-technology issue, and it has already affected lawyers, consultants, academics, and journalists around the world.
South African courts have been dealing with AI-hallucinated legal citations since 2023. In Parker v Forsyth (June 2023), a Johannesburg magistrate found that eight case authorities submitted by a law firm were entirely fictitious - generated by ChatGPT and never verified. In Mavundla v MEC (January 2025), a KwaZulu-Natal High Court judge found that seven of nine cited authorities were fabricated, referred the legal team to the Legal Practice Council for investigation, and described the reliance on unverified AI output as "irresponsible and downright unprofessional." In Northbound Processing (June 2025), the Gauteng High Court dealt with fabricated citations from a specialised South African legal AI tool and again referred the matter to the Legal Practice Council.
Read our full analysis
We've documented all three South African cases in detail, including court findings, referrals to the Legal Practice Council, and what they mean for the profession. Read: The Erosion of Stare Decisis - AI Hallucinations in South African Jurisprudence →
The case that put AI hallucinations on the global map was Mata v. Avianca (New York, 2023), where two attorneys and their law firm were jointly fined US$5,000 for filing a brief containing ChatGPT-fabricated case law. Since then, the problem has grown enormously. According to the database maintained by French researcher Damien Charlotin, over 1,300 court filings worldwide had been identified as containing AI-fabricated content as of late 2025.
In May 2025, the Trump administration's flagship children's health document - the MAHA Report, produced under Health Secretary Robert F. Kennedy Jr. - was found by multiple news outlets to contain AI-hallucinated citations, including fabricated study titles and mischaracterised research findings. The White House quietly republished a corrected version.
In October 2025, Deloitte Australia was forced to refund the final instalment of its AU$440,000 (approximately R5.4 million) contract with the Department of Employment and Workplace Relations after a University of Sydney researcher identified fabricated academic references in a report on welfare automation. Deloitte subsequently disclosed it had used Azure OpenAI's GPT-4o to assist with the report.
In every case, the pattern is the same: a professional used a generic AI tool to produce work that required verified accuracy, did not check the output, and was caught. The tool was not the problem. The missing step - verification - was the problem.
What is encouraging about this pattern is that the failure was never one of competence. The government officials, lawyers, and consultants involved were perfectly capable of checking citations. They simply did not do it - either because they trusted the tool too much, or because they did not understand that checking was necessary. Once you understand that it is necessary, the actual process of checking is straightforward.
AI tools are genuinely useful. They can help you brainstorm ideas, structure documents, summarise long texts, draft emails, and explain complex topics. The point of this report is not to tell you to stop using them. It is to help you use them well - and to show you that checking AI output is not a specialised skill. It is something anyone with a phone and five minutes can do.
The techniques below are ordered from simplest to most thorough. You do not need to do all of them every time. The right level of checking depends on how much the output matters.
This sounds obvious, but it is the step that was skipped in every single case in this report. Read the AI's output with a simple question in mind: "Does this sound like something I can verify, or does it sound like something I'm being asked to take on faith?" If a paragraph contains a specific claim - a statistic, a name, a date, a source - that claim needs a quick check before you pass it on.
You do not need a university library card or an expensive database subscription. Here is exactly what to do:

1. Copy the exact title of the cited article, put it in quotation marks, and search it on Google Scholar (scholar.google.com). A real paper will almost always appear.
2. If nothing comes up, go to the journal's own website and look up the volume and issue number given in the citation.
3. Still nothing? Search the named authors to see whether the paper appears anywhere in their publication records.

That is it. These three steps take less than two minutes per citation and would have caught every single fabricated reference in both the AI policy and the Home Affairs white paper.
Ask the AI the same question twice, phrasing it differently each time. If it gives you substantially different facts or different sources, neither answer is reliable.
This is one of the most effective techniques available to everyday users, and it costs nothing. Copy the output from ChatGPT (or whichever tool you used) and paste it into a different AI tool - Claude, Gemini, or another chatbot. Then type: "Please review the following text. Identify every specific factual claim (names, dates, statistics, citations) and tell me which ones you can verify and which ones you cannot."
AI tools are often better at critiquing text than producing it. Where the two tools disagree, you have found something worth investigating further.
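If you want to make this a repeatable habit, the workflow is small enough to script. In the minimal sketch below, ask_model is a stand-in for whichever second chatbot you use, via its app or API - the reviewing prompt is the substance; the plumbing is yours to choose:

```python
REVIEW_PROMPT = (
    "Please review the following text. Identify every specific factual claim "
    "(names, dates, statistics, citations) and tell me which ones you can "
    "verify and which ones you cannot.\n\n---\n{draft}"
)

def cross_check(draft, ask_model):
    """Send tool A's output to a *different* tool B for critique."""
    return ask_model(REVIEW_PROMPT.format(draft=draft))

# Demo with a stand-in reviewer; in practice, wire ask_model to a second chatbot.
print(cross_check(
    "Mata v. Avianca was decided in New York in 2023.",
    lambda prompt: "(stand-in reviewer) I would check the case name and the year.",
))
```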
AI tools frequently invent statistics, financial figures, percentages, and dates. If a number will appear in anything important - a presentation, a report, a submission - search for it independently. Check whether the figure appears on the website of the institution the AI attributes it to. If you cannot find it there, do not use it.
To show how straightforward this is, let's walk through what the Home Affairs checking process should have looked like for a single citation.
Suppose the AI produced this reference:
Moyo, T. and Ndlovu, S. (2021). "Digital Identity Systems and Citizenship in Post-Colonial Africa." Journal of African Law, 65(2), pp. 234–251.
This looks legitimate. The journal (Journal of African Law) is real. The authors' names are plausible. The topic fits perfectly. Here is how you check it:

1. Search the title in quotation marks - "Digital Identity Systems and Citizenship in Post-Colonial Africa" - on Google Scholar. Nothing.
2. Open the Journal of African Law's online archive and look for volume 65, issue 2, pages 234-251. No such article.
3. Search the author names together with the title. Nothing.

Total time to verify that this reference is an AI hallucination: under two minutes. This is not specialist work. It requires no training and no subscription. It is a Google search with quotation marks.
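For readers comfortable with a little scripting, the title search can even be automated. The sketch below queries Crossref's public REST API (api.crossref.org), a free index of published journal articles, using the requests library. Treat a miss as "verify by hand", not as proof of fabrication - indexing has gaps:

```python
import requests

def title_on_crossref(title):
    """Return True only if Crossref lists a work with this exact title."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.title": title, "rows": 5},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return any(
        t.strip().lower() == title.strip().lower()
        for item in items
        for t in item.get("title", [])
    )

print(title_on_crossref("Digital Identity Systems and Citizenship in Post-Colonial Africa"))
```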
If the officials drafting the Home Affairs white paper had done this for even five of their 148 references, they would have discovered the problem immediately - and avoided a national scandal.
Not everything the AI produces needs the same level of scrutiny. The right amount of checking depends on what you are using the output for.
If you are using AI to draft a casual email, brainstorm ideas for a birthday party, summarise a podcast for your own notes, or structure your thoughts before a meeting - a quick read-through for anything that sounds wrong is sufficient. You are the audience, or the stakes are low enough that an error is easily corrected.
If you are using AI output in a work presentation, a blog post, a client email, or a university assignment - check every specific claim. That means: verify any statistic, date, name, or quoted source. A practical rule of thumb: if a sentence contains a number, a name, or a source, check it.
If you are producing a legal submission, a government policy document, a medical report, a financial filing, a tender document, or anything else where errors have legal, financial, or reputational consequences - verify every factual claim, open every citation, and have a second person review the document independently.
This is what was missing in both the AI policy and the Home Affairs cases. It is also what was missing in the Parker, Mavundla, and Northbound court cases. In all of these, the high-stakes nature of the output demanded full verification, and full verification was not done.
The one-sentence rule
If you are not sure how much checking a particular use case requires, ask yourself: "If this turns out to be wrong, what happens?" If the answer is "nothing much" - check lightly. If the answer is "I would be embarrassed" - check the key facts. If the answer is "there could be professional, legal, or financial consequences" - check everything, and have someone else check it too.
The government's response to these scandals - particularly Home Affairs' decision to appoint independent auditors and retrospectively review documents dating back to November 2022 - provides a template that any organisation using AI should consider adapting.
Every document produced with AI assistance should say so. Name the tool, identify the sections where it was used, and record who verified the output. This is not about shaming AI use - it is about creating accountability. The Department of Home Affairs is now implementing this. Every other organisation should too.
Every factual claim, citation, and reference in a high-stakes document should be checked by a named person whose identity appears on an audit trail. The lesson from both the Mavundla judgment and the Home Affairs scandal is the same: you cannot delegate verification to a machine.
AI is excellent for some tasks and dangerous for others. Drafting, summarising, brainstorming, and structuring are low-risk. Generating citations, producing legal research, creating financial projections, and writing medical guidance without verification are high-risk. A written policy should make the distinction clear.
Before any important document is finalised, a designated person should open five random citations or factual claims and verify them against primary sources. If any are wrong, the entire document goes back for a full review. This is cheap, fast, and effective.
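As a sketch of how little machinery this needs - the reference strings below are placeholders, and the random sampling is the whole idea, since drafters cannot pre-polish the citations they know will be checked:

```python
import random

def spot_check(citations, sample_size=5):
    """Pick citations at random; a named human verifies each against primary sources."""
    return random.sample(citations, min(sample_size, len(citations)))

references = ["Reference 1 ...", "Reference 2 ...", "Reference 3 ..."]  # your document's list
for cite in spot_check(references):
    print("Verify by hand:", cite)
```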
Show people what hallucinated citations actually look like. Walk them through the Parker, Mavundla, and Home Affairs cases. The more familiar people are with how these fabrications present themselves, the better they become at spotting them instinctively.
If your organisation has been using AI tools since late 2022, consider conducting a retrospective review of important documents produced during that period - exactly as Home Affairs is now doing. It is better to find problems proactively than to have them found for you.
AI hallucinations are not a glitch. They are not a bug that will be patched in the next software update. They are a structural feature of how large language models work - a consequence of tools that are designed to predict plausible language rather than retrieve verified facts.
But that does not make them unmanageable. It makes them predictable.
The South African government's experience is a cautionary tale, but it is also a useful one. The government's response - particularly Home Affairs' swift suspensions, independent audits, and AI declaration requirements - shows exactly the kind of institutional discipline that these tools demand. The fact that the problem was caught, disclosed, and acted upon is a sign of functioning accountability, not of systemic failure.
The lesson extends far beyond government. Anyone who uses a generic AI tool to produce work that requires factual accuracy - a lawyer drafting heads of argument, a consultant writing a report, a student completing an assignment, a journalist fact-checking a story, or a business owner preparing a tender document - faces exactly the same risk.
But the fix is simple, and it is the same fix in every case: check. Not everything, not all the time - but the facts that matter, using the free tools described in this report. A two-minute Google Scholar search would have prevented every scandal described in these pages.
AI is a powerful tool for thinking, drafting, and exploring ideas. It is not a reliable tool for producing verified facts. The moment you understand that distinction - and build a quick verification step into your workflow - you are no longer at risk of becoming the next headline.
The tools are powerful. The checking is easy. The habit is what makes the difference.
Researched with the assistance of AI and reviewed by Squire's legal and editorial team.