How can engineers reduce AI model hallucinations?

The first of a two-part series discusses best practices to help engineers significantly reduce model hallucinations

Many engineers have adopted generative AI as part of their organization’s digital transformation. They like its tangible business benefits, the breadth of its applications, and its ease of implementation.

Offsetting all this considerable value, generative AI sometimes produces inaccurate, biased or nonsensical output that appears authentic. Such outputs are called hallucinations. The following types of hallucinations occur:

Output doesn’t match what is known to be accurate or true.
Output is not related to the end-user prompt.
Output is internally inconsistent or contains contradictions.

Hallucinations can significantly undermine end-user trust. They arise from various factors, including:

Patchy, insufficient or false training data. It results in the Large Language Model (LLM or model) fabricating information when it’s unsure of the correct answer.
Model lacks proper grounding and context to determine factual inaccuracies.
Excessive model complexity for the application.
Inadequate software testing.
Poorly crafted, imprecise or vague end-user prompts.

AI hallucinations can have significant consequences for real-world applications that erode engineers’ confidence. For example, an AI hallucination can:

Provide inaccurate values leading to an erroneous engineering load calculation and product failure.
Suggest stock trades leading to financial losses.
Incorrectly identify a benign skin lesion as malignant, leading to unnecessary medical interventions.
Contribute to the spread of misinformation.
Inappropriately deny credit or employment.

Organizations can mitigate the risk and frequency of these hallucinations occurring. That avoids embarrassing the company and misleading its customers by adopting multiple strategies, including:

Clear model goal.
Balanced training data.
Accurate training data.
Adversarial fortification.
Sufficient model tuning.
Limit responses.
Comprehensive model testing.
Precision prompts.
Fact-check outputs.
Human oversight.

Let’s explore the first five of these mitigations in more detail.

Clear model goal

Hallucinations occur if the model goal is too general, vague, or confusing. An unclear model goal will make the selection of appropriate training data ambiguous. That leads to an increase in the frequency of hallucinations.

A small amount of team collaboration can often clarify the model goal and reduce hallucinations.

For example, a model goal to verify machine performance is too general. A better goal might be to verify steel lathe or stamp performance.

Balanced training data

Hallucinations occur if the training data used to develop the model is insufficient, unbalanced or includes significant gaps. That leads to edge cases where the model attempts to respond to prompts with inadequate data. Overfitting is the term used to describe a model trained on a limited dataset that can’t make accurate predictions.

Train your model on diverse, representative data that covers a wide range of real-world examples for your application domain. Ensuring your training data is representative may require creating synthetic data. Use a data template to ensure all training data instances conform to a standard data structure. That improves the quality of training data and reduces hallucinations.

For example, in responding to a prompt about copper’s tensile stress, a model that contains primarily data about steel and aluminum performance characteristics will not have sufficient exposure to other metals.

Accurate training data

Many models are trained on data read from many public web pages. Many problems cause inaccurate web information that can lead to hallucinations, including:

Simple spelling and grammatical errors or misunderstandings.
Deliberately vague or erroneous information designed to mislead people.
Information that was correct in the past but has been superseded by updates or new research.
Humour, irony or parody that is easily misunderstood.
Contradictory information due to conflicting opinions or scientific theories.
Errors introduced by translating information from another language.

You can’t fact-check every web page you’ve used to build training data. However, you can fact-check a sample of web pages to estimate the risk of errors and related hallucinations in your training data.

For example, an engineer may prompt a model to verify a calculation. If the second output is different, the engineer may be able to identify inaccurate training data.

Adversarial fortification

Adversarial attacks consist of prompts intentionally or unintentionally designed to:

Launch a cyber attack to create financial loss, brand reputation damage, or intellectual property theft.
Mislead the model to produce hallucinations to compromise the reliability and trustworthiness of the model.

A model can become more resistant to adversarial attacks by:

Integrating adversarial examples into the training process to improve the classifier’s resistance to attack.
Introducing algorithms designed to identify and filter out adversarial examples.
Including adversarial examples in the scope of model testing.

For example, an engineer might unintentionally write a prompt that outputs a hallucination. The engineer should report the output to the team managing the model. The team should implement an enhancement that will reduce the likelihood of future hallucinations.

Sufficient model tuning

Hallucinations increase if a model is inadequately tuned.

Model tuning is a manual and semi-automated experimental process of finding the optimal values for hyperparameters to maximize model performance and reduce hallucinations. Hyperparameters are variables whose values the model cannot estimate from the training data.

For example, in responding to an engineer’s prompt about wind tunnel performance, a model that has not been inadequately tuned may return values that violate the laws of fluid behaviour.

By implementing these best practices, engineers can significantly reduce model hallucinations and become more confident in the reliability of model outputs.

Source link