No, You Probably Don't Need a Custom LLM
August 1, 2024
Written by:
Stephen Cefali

One of the most frequent comments I hear from clients and prospective clients is their desire for a custom LLM because GPT or other API-based LLMs aren’t delivering satisfactory results. They often mention hallucinations, forgetfulness, and inaccuracies as primary reasons for believing they need a custom-trained model. Their initial instinct is to think they need to build a model from scratch or fine-tune an existing LLM to achieve high-quality results. However, in most cases, there are simple steps you can take to get better results without investing the time, effort, and money to create a new model. Here are common mistakes people make and solutions to address them.

Mistake #1: Relying on Ingrained LLM Knowledge

One of the most significant errors people make is relying too heavily on the pre-trained knowledge within the LLM. Although these models have been trained on extensive datasets, there’s a substantial difference between the knowledge embedded in the model and the knowledge you pass in as context. Knowledge absorbed during training loses specificity and precision because the model must generalize across its training examples, so fine-grained details often come out blurred or simply wrong.

Solution: Use Retrieval-Augmented Generation (RAG)

Instead of relying solely on the LLM’s internal knowledge, consider using Retrieval-Augmented Generation (RAG). RAG integrates external data sources with the LLM to provide contextually relevant information during the generation process. The essence of this approach is to pass relevant information to the LLM at generation time so it can use that knowledge directly when producing responses. How to retrieve the relevant information for RAG is a challenge of its own and outside the scope of this blog post.
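
As a rough sketch of the pattern (using the OpenAI Python SDK; the model name and the retrieve_relevant_chunks helper are illustrative placeholders, not any particular library’s API), retrieval happens first and the results are passed in as context:

```python
from openai import OpenAI

client = OpenAI()

def retrieve_relevant_chunks(question: str) -> list[str]:
    # Placeholder: in practice this would query your search index or
    # vector store and return the passages most relevant to the question.
    return ["<relevant passage 1>", "<relevant passage 2>"]

def answer_with_rag(question: str) -> str:
    # Join the retrieved passages into a single context block.
    context = "\n\n".join(retrieve_relevant_chunks(question))
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name; use whatever you have access to
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "If the context does not contain the answer, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```

Because the answer is grounded in the passages you supply rather than in what the model half-remembers from training, hallucinations and stale facts drop off sharply.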

Mistake #2: Asking the LLM to Do Too Much

Another common mistake is packing too much into a single prompt. When one query asks the model to read a long document, summarize it, and translate the summary all at once, the model has to juggle competing objectives, and the quality of each part tends to suffer. Overloaded prompts are a frequent source of the forgetfulness and inaccuracy that people attribute to the model itself.

Solution: Break Down the Tasks

To achieve better results, break down complex queries into smaller, manageable tasks. Divide the query into distinct parts so the model can handle each task individually. For example, first ask the model to summarize a document, then request a translation of the summary. This sequential processing enables the model to focus on one task at a time, ensuring higher accuracy and better performance.

Providing clear, specific instructions for each query also helps guide the model toward the desired output. By segmenting tasks this way, the LLM can process each request more effectively, resulting in more accurate and reliable outcomes. This approach enhances the model’s performance and makes the outputs more actionable and easier to integrate into your workflow.
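
A minimal sketch of this sequential approach, again assuming the OpenAI Python SDK and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    # One focused request per call keeps each task small and specific.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

document = "..."  # the source text you want processed

# Step 1: summarize the document.
summary = ask(f"Summarize the following document in five bullet points:\n\n{document}")

# Step 2: translate only the summary, instead of asking for both in one prompt.
translation = ask(f"Translate this summary into French:\n\n{summary}")
```

Each call has one job, and the output of one step becomes the input of the next, which also makes it easier to inspect and debug intermediate results.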

Mistake #3: Neglecting Proper Prompt Engineering

Another common mistake people make is neglecting the importance of prompt engineering. Many users assume that simply inputting a question or command will yield the best possible response from the LLM. However, the quality of the output is highly dependent on the way the input is phrased. Poorly constructed prompts can lead to vague, irrelevant, or confusing answers.

Solution: Optimize Your Prompts

To get the best results from an LLM, optimize your prompts by being specific and clear. Instead of broad questions, provide detailed instructions. For example, instead of asking “What are the benefits of exercise?”, ask “Can you list the top five benefits of regular aerobic exercise?”. Specify constraints such as word limits or required formats. If the initial output isn’t satisfactory, refine your prompts based on the response. Testing different phrasings helps identify what works best. Optimizing prompts enhances response quality, making interactions with the AI more efficient and productive.
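
For instance, the difference between a vague prompt and an optimized one can be as small as adding scope and format constraints; the wording below is purely illustrative:

```python
# Vague prompt: invites a broad, unfocused answer.
vague_prompt = "What are the benefits of exercise?"

# Specific prompt: states the task, the scope, and the output format.
specific_prompt = (
    "List the top five benefits of regular aerobic exercise. "
    "Return them as a numbered list, one sentence each, "
    "written for a beginner with no fitness background."
)
```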

Mistake #4: Setting the Temperature Too High

Another common mistake is setting the temperature too high when generating responses with an LLM. The temperature setting controls the randomness of the model’s output. A high temperature can result in more creative but often less coherent and accurate responses, leading to variability and inconsistency. For many tasks, you don’t want creativity but rather accurate responses.

Solution: Turn Down the Temperature

For tasks requiring highly deterministic and factual responses, setting the temperature to 0 is ideal. This ensures the model generates its most likely response without added randomness. For tasks that benefit from a bit of flexibility, such as summarization, a low but nonzero temperature (e.g., 0.2 to 0.5) works well.
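
In the OpenAI Python SDK, for example, temperature is just a parameter on the request; the model name and prompts below are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Deterministic, fact-oriented task: temperature 0 keeps the output
# as close to the model's most likely response as possible.
extraction = client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    temperature=0,
    messages=[{"role": "user",
               "content": "Extract the invoice number from this text: ..."}],
)

# Task with some flexibility, such as summarization: keep the
# temperature low but nonzero.
summary = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.3,
    messages=[{"role": "user",
               "content": "Summarize the report below in three sentences: ..."}],
)
```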

Conclusion

Unless you have tried all of these techniques, you probably don’t need to build a custom LLM to achieve high-quality results. Many issues with LLM performance can be mitigated by optimizing your approach. Incorporating context-specific data, breaking down complex tasks, refining your prompts, and adjusting the temperature settings can significantly enhance the output.

By applying these strategies, you can effectively leverage existing LLM capabilities to meet your needs, saving time, resources, and effort involved in custom model development. Focus on mastering these techniques to maximize your LLM’s potential before considering more drastic measures like custom training.
