Introducing Falcon, a family of state-of-the-art language models built by the Technology Innovation Institute in Abu Dhabi, now available to the world under the Apache 2.0 license! 🚀
What are Falcon Models?
Falcon models are state-of-the-art language models created by the Technology Innovation Institute (TII). They are causal decoder-only models trained with a next-word prediction loss. They are available to the community under the permissive Apache 2.0 license, which allows commercial use.
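To make the training objective concrete, here is a minimal sketch of the next-word prediction loss using a small stand-in model (GPT-2) from the transformers library; the objective is the same one Falcon is trained with, but the model and sentence here are just placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in model so the example runs anywhere; Falcon uses the same loss.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tok("Falcon models are trained on web data.", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # labels are shifted internally
print(out.loss)  # mean cross-entropy of predicting each next token
```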
What are Causal Models?
Causal language modeling is the flavor of language modeling used for text generation. There are two main types of language modeling: causal and masked. A causal language model predicts the next token in a sequence, attending only to the tokens on its left; it cannot access future tokens.
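That left-only visibility is implemented with a causal attention mask. A toy illustration in plain PyTorch (the sequence length and scores are made up for the example):

```python
import torch

seq_len = 4
scores = torch.randn(seq_len, seq_len)           # raw attention scores
mask = torch.tril(torch.ones(seq_len, seq_len))  # lower-triangular causal mask
scores = scores.masked_fill(mask == 0, float("-inf"))
weights = torch.softmax(scores, dim=-1)
print(weights)  # row i puts zero weight on columns j > i (the "future")
```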
Two base checkpoints are presently available: one with 7B parameters and one with 40B. Both use a vocabulary of 65,024 tokens and a sequence length of 2,048. The 7B checkpoint has 32 layers, while the 40B checkpoint goes further with 60 layers.
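These hyperparameters can be read straight off the published checkpoints; a quick sketch (attribute names may vary across transformers versions):

```python
from transformers import AutoConfig

# Downloading just the config is lightweight; no weights are fetched.
for name in ("tiiuae/falcon-7b", "tiiuae/falcon-40b"):
    cfg = AutoConfig.from_pretrained(name, trust_remote_code=True)
    print(name, "vocab:", cfg.vocab_size, "layers:", cfg.num_hidden_layers)
```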
The Falcon models underwent extensive training on 384 A100 40GB GPUs, with the smaller model requiring two weeks and the larger one two months to complete training. Notably, training was conducted in bfloat16, a 16-bit "brain floating point" format that keeps the 8-bit exponent of IEEE 754 float32 but truncates the mantissa to 7 bits, preserving float32's dynamic range at reduced precision.
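You can inspect that range/precision trade-off directly in PyTorch:

```python
import torch

# bfloat16 matches float32's range (same exponent width) but has a much
# coarser eps than float16, which in turn has a far smaller max value.
for dtype in (torch.bfloat16, torch.float16, torch.float32):
    info = torch.finfo(dtype)
    print(dtype, "max:", info.max, "eps:", info.eps)
```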
Remarkably, TII reports that training Falcon required only about 75% of the compute used to train GPT-3, a testament to the efficiency of the training setup.
The training run employed a 3D parallelism strategy: tensor parallelism with a factor of 8, pipeline parallelism with a factor of 4, and data parallelism with a factor of 12; 8 × 4 × 12 = 384, matching the GPU count above. On top of this, ZeRO (Zero Redundancy Optimizer) was used to further reduce per-GPU memory overhead.
Beyond plain causal language modeling, the Falcon architecture incorporates two other mechanisms: FlashAttention and multi-query attention. Together they improve inference speed and reduce memory requirements, since multi-query attention shares a single key/value head across all query heads and thus shrinks the K/V cache.
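Below is a minimal multi-query attention sketch, not Falcon's actual implementation (which also fuses rotary embeddings and FlashAttention kernels); the names and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    """n_heads query heads, but one shared key/value head (illustrative)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)           # per-head queries
        self.kv_proj = nn.Linear(d_model, 2 * self.d_head)  # one shared K and V
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k, v = self.kv_proj(x).split(self.d_head, dim=-1)
        k, v = k.unsqueeze(1), v.unsqueeze(1)  # broadcast one K/V head to all queries
        att = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        att = torch.softmax(att.masked_fill(causal, float("-inf")), dim=-1)
        return self.out_proj((att @ v).transpose(1, 2).reshape(b, t, -1))

# Sanity check: output shape matches input shape.
print(MultiQueryAttention(512, 8)(torch.randn(2, 16, 512)).shape)  # (2, 16, 512)
```

Compared with standard multi-head attention, the inference-time K/V cache is n_heads times smaller, which is where much of the memory saving comes from.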
For inference, the 7B model needs about 16 GB of GPU memory, while the 40B model needs a minimum of 85–100 GB of VRAM. There are alternatives for those without such high-end GPUs: the 40B model can be loaded in 8-bit mode on a 48 GB A6000, or in 4-bit mode with bitsandbytes in roughly 27 GB, as sketched below.
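A sketch of 4-bit loading through transformers and bitsandbytes (actual memory use depends on your hardware and library versions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b"
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config=bnb,
                                             device_map="auto")
```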
Counterintuitively, the smaller Falcon model saw more data: the 7B model was pre-trained on 1.5T tokens, while the 40B model used 1T tokens. This emphasis on large volumes of high-quality data is central to the models' performance.
At the time of its release, Falcon was widely considered the best open-source model available, outperforming the likes of LLaMA, StableLM, RedPajama, and MPT, as evidenced by its position atop the OpenLLM Leaderboard.
While the Falcon models primarily support English, the 7B model was also trained on French, and the 40B model extends coverage to German, Spanish, and French, with limited support for Italian, Portuguese, Dutch, Romanian, Czech, and Swedish.
Falcon Instruct Models
- The Falcon models also come in instruct fine-tuned versions, called Falcon-7B-Instruct and Falcon-40B-Instruct, which are fine-tuned on instruction and conversational data (a usage sketch follows after this list).
- Falcon-7B-Instruct was fine-tuned on 32 A100 40GB GPUs, and Falcon-40B-Instruct on 64.
- Languages: English & French.
- The instruct models share the base architecture: rotary position embeddings, multi-query attention (separate query heads, but key and value matrices shared across all heads, as in the sketch above), and FlashAttention for efficient training and inference. Like the base models, they have 32 layers (7B) and 60 layers (40B).
- The OpenLLM Leaderboard assesses the performance of large language models (LLMs) on four distinct tasks:
a. AI2 Reasoning Challenge (25-shot): Involves grade-school science questions.
b. HellaSwag (10-shot): A benchmark for commonsense inference.
c. MMLU (5-shot): Consists of 57 tasks spanning domains such as mathematics, computer science, and law.
d. TruthfulQA (0-shot): A benchmark that evaluates the model’s truthfulness when answering questions.
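Trying out an instruct checkpoint is straightforward with the transformers pipeline; a minimal sketch (the prompt and sampling settings here are illustrative):

```python
import torch
from transformers import pipeline

generator = pipeline("text-generation",
                     model="tiiuae/falcon-7b-instruct",
                     torch_dtype=torch.bfloat16,
                     trust_remote_code=True,
                     device_map="auto")

out = generator("Write a haiku about the desert.",
                max_new_tokens=50, do_sample=True, top_k=10)
print(out[0]["generated_text"])
```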
In conclusion, the Falcon models are undoubtedly among the finest open-source language models, released under the Apache 2.0 license. As decoder-only models trained efficiently on the RefinedWeb dataset, they have demonstrated exceptional performance.
These models, available in both 7B and 40B variants, are seamlessly integrated with Hugging Face, a renowned platform for natural language processing.
The development of the Falcon models showcases TII's dedication and use of cutting-edge techniques, paving the way for powerful language processing capabilities and unlocking a wide array of applications across the AI landscape.
For more in-depth insights, you can refer to the following sources: