There will be a Cambrian explosion of smaller LLMs, given the AI executive order that regulates very large models.
While these models won’t be as performant as GPT-4 out of the box, there are different techniques that can be applied to fine-tune them and make them match big-model performance for specific use cases.
This is also a worthwhile endeavor at scale: running smaller, more efficient models that match performance yields substantial savings (almost 60x).
Here are the different ways you can fine-tune an LLM for your use case:
Supervised Fine-Tuning – This is the most common approach; it requires a task-specific labeled dataset.
A popular technique, LoRA involves adding low-rank matrices to pre-existing layers within a large pre-trained model. These low-rank matrices are relatively small in terms of the number of parameters, but they are powerful in adapting the model for specific tasks.
The idea is to fine-tune only these added low-rank matrices while keeping the original large-scale parameters frozen.
This technique makes a lot of sense when you have a specific task, like summarization or extracting terms from your procurement contracts, and you have some examples or labeled data.
LoRA fine-tuning of Llama-2 shows performance almost on par with full-parameter fine-tuning, and even outshines GPT-4 in specialized tasks like generating SQL queries or text-based functional representations.
Over at Abacus AI we have used the LoRA method successfully and matched the performance of GPT-4 and of fine-tuned versions of GPT-3.5.
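To make the idea concrete, here is a minimal NumPy sketch of the LoRA mechanism described above: a frozen weight matrix W is augmented with a trainable low-rank product B·A, scaled by alpha/r. The dimensions, rank, and scaling factor are illustrative assumptions, not values from any specific model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: rank r is much smaller than the layer dimensions
d_in, d_out, r, alpha = 16, 8, 4, 8

W = rng.standard_normal((d_out, d_in))     # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero init, so the delta starts at 0

def lora_forward(x):
    # y = W x + (alpha / r) * B A x — only A and B are updated during training
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted layer initially matches the frozen layer
assert np.allclose(lora_forward(x), W @ x)

print(A.size + B.size, "LoRA params vs", W.size, "full params")
```

The zero initialization of B is the standard trick that makes fine-tuning start from the pre-trained model's behavior; the parameter count comparison at the end shows where the cost savings come from.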
Domain-Specific Fine-Tuning – You can also use a corpus of specialized data to fine-tune the model. For example, PMC-LLaMA is an effort to build open-source language models for medicine. This model was fine-tuned on a staggering 4.8 million biomedical academic papers and 30K medical textbooks.
Clinical LLaMA-LoRA fine-tunes the pre-trained Llama to the clinical domain for downstream clinical tasks, illustrating the model’s adaptability to healthcare-specific challenges.
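A common preprocessing step for this kind of corpus-level fine-tuning is packing documents into fixed-length training chunks. The sketch below illustrates the idea with a whitespace "tokenizer" and a made-up chunk length; a real pipeline would use the model's own tokenizer and context length.

```python
# Sketch: packing a domain corpus (e.g., biomedical abstracts) into
# fixed-length sequences for continued fine-tuning. The whitespace split
# and chunk_len are stand-ins for a real tokenizer and context window.
def pack_corpus(documents, chunk_len=128):
    tokens = []
    for doc in documents:
        tokens.extend(doc.split())  # real pipelines use the model's tokenizer
        tokens.append("</s>")       # document separator token
    # Drop the trailing partial chunk so every training example is full length
    n_full = len(tokens) // chunk_len
    return [tokens[i * chunk_len:(i + 1) * chunk_len] for i in range(n_full)]

chunks = pack_corpus(["a b c", "d e"], chunk_len=3)
print(chunks)
```

Concatenating documents with a separator, rather than padding each one, is what lets millions of short papers be turned into uniformly sized training examples.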
Reinforcement Learning-Based Fine-Tuning:
Reinforcement learning algorithms like Proximal Policy Optimization (PPO) are used to optimize a policy, which in this context is the model’s parameterization for generating sequences of tokens. The optimization is guided by a reward function, which quantitatively evaluates the quality of generated sequences. This technique allows the model to learn more complex, multi-step reasoning and adapt its responses based on the reward signals. The feedback or the rewards used for fine-tuning could come from various sources like predefined metrics, domain-specific criteria, or even automated systems.
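The heart of PPO is its clipped surrogate objective, which keeps each policy update close to the previous policy. Here is a toy NumPy sketch of that loss; the per-token log-probabilities and advantages are made-up values, and a real implementation would compute them from the model and a reward/value estimate.

```python
import numpy as np

# Sketch of PPO's clipped surrogate objective (toy values, not a trainer).
def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    ratio = np.exp(logp_new - logp_old)       # pi_new / pi_old per token
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    # Take the pessimistic minimum of the two objectives, then negate for a loss
    return -np.mean(np.minimum(ratio * advantage, clipped * advantage))

logp_old = np.log(np.array([0.20, 0.50, 0.30]))
logp_new = np.log(np.array([0.25, 0.45, 0.30]))
advantage = np.array([1.0, -0.5, 0.2])  # derived from the reward signal
print(ppo_clip_loss(logp_new, logp_old, advantage))
```

The clipping is what makes PPO stable enough to fine-tune a language model against a noisy reward function without the policy collapsing.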
Human Preference-Based Fine-Tuning – In this approach, human evaluators provide comparative rankings of different outputs generated by the model for the same prompt. These rankings are used to construct a reward model, essentially a function that maps model outputs to scalar rewards. The model is then fine-tuned using reinforcement learning techniques, often PPO, guided by this reward model. This iterative process enables the model to align closely with human preferences and values. RLHF, where you align the model with human values, falls into this category, but you can also use this method when collecting human feedback on a chatbot’s responses or content recommendations.
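The reward model in this setup is typically trained with a pairwise ranking loss: the scalar reward assigned to the human-preferred ("chosen") response should exceed the reward of the rejected one. The sketch below shows that loss (the Bradley-Terry formulation) on toy scalar scores.

```python
import numpy as np

# Pairwise ranking loss for reward-model training: -log sigmoid(r_chosen - r_rejected).
# The scores below are toy scalars, not outputs of a real reward model.
def pairwise_ranking_loss(r_chosen, r_rejected):
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# A wider margin between chosen and rejected scores gives a lower loss
print(pairwise_ranking_loss(2.0, 0.0), pairwise_ranking_loss(0.5, 0.0))
```

Minimizing this loss over many human comparisons yields the scalar reward function that PPO then optimizes against.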
Few-Shot Learning – Few-shot learning in the context of large language models involves providing a few example tasks directly within the prompt to guide the model’s behavior. Technically, this does not involve retraining the model but exploits the model’s inherent meta-learning capabilities. The model generalizes from the examples to perform the specific task at hand, effectively leveraging its pre-trained parameters to adapt to new tasks without explicit fine-tuning. This can be super effective with GPT-4 but also works with simpler tasks on Llama-2 or Abacus Giraffe.
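Since few-shot learning requires no weight updates, the entire "adaptation" is prompt construction. The sketch below assembles a few labeled examples into a prompt; the Input/Output template is an assumption for illustration, not a format required by any particular model.

```python
# Sketch: build a few-shot prompt from labeled examples plus a new query.
# The template is a generic assumption, not model-specific.
def few_shot_prompt(examples, query):
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")  # model completes after "Output:"
    return "\n\n".join(blocks)

prompt = few_shot_prompt(
    [("great movie!", "positive"), ("waste of time", "negative")],
    "loved every minute",
)
print(prompt)
```

The model infers the task (here, sentiment labeling) purely from the pattern in the examples, which is why this works without any fine-tuning.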