Meta Llama: Everything you need to know about the open generative AI model

Like every Big Tech company these days, Meta has its own flagship generative AI model, called Llama. Llama is unusual among major models in that it's "open," meaning developers can download and use it however they please (with certain limitations). That's in contrast to models like Anthropic's Claude, OpenAI's GPT-4o (which powers ChatGPT) and Google's Gemini, which can only be accessed via APIs.

In the interest of giving developers choice, however, Meta has also partnered with vendors, including AWS, Google Cloud and Microsoft Azure, to make cloud-hosted versions of Llama available. In addition, the company has released tools designed to make it easier to fine-tune and customize the model.

Here's everything you need to know about Llama, from its capabilities and editions to where you can use it. We'll keep this post updated as Meta releases upgrades and introduces new dev tools to support the model's use.

What is Llama?

Llama is a family of models — not just one:

  • Llama 8B

  • Llama 70B

  • Llama 405B

The latest versions are Llama 3.1 8B, Llama 3.1 70B and Llama 3.1 405B, all released in July 2024. They're trained on web pages in a variety of languages, public code and files on the web, and synthetic data (i.e., data generated by other AI models).

Llama 3.1 8B and Llama 3.1 70B are compact models meant to run on devices ranging from laptops to servers. Llama 3.1 405B, on the other hand, is a large-scale model requiring (absent some modifications) data center hardware. Llama 3.1 8B and Llama 3.1 70B are less capable than Llama 3.1 405B, but faster. In fact, they're "distilled" versions of 405B, optimized for low storage overhead and latency.

All the Llama models have 128,000-token context windows. (In data science, tokens are subdivided bits of raw data, like the syllables "fan," "tas" and "tic" in the word "fantastic.") A model’s context, or context window, refers to input data (e.g., text) that the model considers before generating output (e.g., additional text). Long context can prevent models from "forgetting" the content of recent docs and data, and from veering off topic and extrapolating wrongly.

Those 128,000 tokens translate to around 100,000 words or 300 pages, which for reference is around the length of "Wuthering Heights," "Gulliver’s Travels" and "Harry Potter and the Prisoner of Azkaban."
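The arithmetic behind those figures can be sketched in a few lines of Python. The words-per-token and words-per-page ratios below are not official Meta numbers; they are assumptions back-derived from the figures above (128,000 tokens, around 100,000 words, around 300 pages), and real ratios vary with the tokenizer, the language and the page layout.

```python
# Rough context-window arithmetic. Both ratios are assumptions inferred
# from the article's own figures, not properties of Llama's tokenizer.
CONTEXT_TOKENS = 128_000
WORDS_PER_TOKEN = 100_000 / 128_000   # roughly 0.78 words per token (assumed)
WORDS_PER_PAGE = 100_000 / 300        # roughly 333 words per page (assumed)


def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_pages(words: int) -> int:
    """Estimate how many printed pages a given word count fills."""
    return round(words / WORDS_PER_PAGE)


def fits_in_context(word_count: int, context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Return True if a document of word_count words likely fits in the window."""
    estimated_tokens = word_count / WORDS_PER_TOKEN
    return estimated_tokens <= context_tokens


if __name__ == "__main__":
    words = tokens_to_words(CONTEXT_TOKENS)
    print(words, words_to_pages(words))
    print(fits_in_context(250_000))
```

An estimate like this is only good for a quick sanity check; counting tokens with the model's actual tokenizer is the reliable way to know whether a document fits.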

What can Llama do?

Like other generative AI models, Llama can perform a range of different assistive tasks, like coding and answering basic math questions, as well as summarizing documents in eight languages (English, German, French, Italian, Portuguese, Hindi, Spanish and Thai). Most text-based workloads — think analyzing files like PDFs and spreadsheets — are within its purview; none of the Llama models can process or generate images, although that may change in the near future.

All the latest Llama models can be configured to leverage third-party apps, tools and APIs to complete tasks. They're trained out of the box to use Brave Search to answer questions about recent events, the Wolfram Alpha API for math- and science-related queries, and a Python interpreter for validating code. In addition, Meta says the Llama 3.1 models can use certain tools they haven’t seen before (but whether they can reliably use those tools is another matter).
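One way to picture tool use: the model emits a structured request naming a tool and its input, and the application runs the matching function and hands the result back to the model. The sketch below assumes a simple JSON request shape and uses stand-in tool functions. The names `TOOLS`, `dispatch_tool_call`, `fake_search` and `fake_math`, as well as the JSON format, are hypothetical illustrations; they are not Llama's actual tool-calling format, nor real Brave Search or Wolfram Alpha clients.

```python
import json
from typing import Callable, Dict


def fake_search(query: str) -> str:
    """Stand-in for a web search tool; a real app would call a search API."""
    return f"[search results for: {query}]"


def fake_math(expression: str) -> str:
    """Stand-in for a math tool; supports only 'a + b' style integer sums."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


# Hypothetical registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": fake_search,
    "math": fake_math,
}


def dispatch_tool_call(model_output: str) -> str:
    """Parse an assumed JSON tool request and run the matching tool."""
    request = json.loads(model_output)
    name = request["tool"]
    if name not in TOOLS:
        # Unknown tools are reported back as text rather than raising.
        return f"error: unknown tool '{name}'"
    return TOOLS[name](request["input"])


if __name__ == "__main__":
    print(dispatch_tool_call('{"tool": "math", "input": "2 + 40"}'))
```

In a real application, the string returned by the dispatcher would be appended to the conversation so the model can use it when writing its final answer.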

Where can I use Llama?

If you're looking to simply chat with Llama, it's powering the Meta AI chatbot experience on Facebook Messenger, WhatsApp, Instagram, Oculus and Meta.ai.

Developers building with Llama can download, use or fine-tune the model across most of the popular cloud platforms. Meta claims it has over 25 partners hosting Llama, including Nvidia, Databricks, Groq, Dell and Snowflake.

Some of these partners have built additional tools and services on top of Llama, including tools that let the models reference proprietary data and enable them to run at lower latencies.

Meta suggests using its smaller models, Llama 8B and Llama 70B, for general-purpose applications like powering chatbots and generating code. Llama 405B, the company says, is better reserved for model distillation — the process of transferring knowledge from a large model to a smaller, more efficient model — and generating synthetic data to train (or fine-tune) alternative models.
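To make "transferring knowledge" concrete, here is a minimal sketch of one common textbook formulation of distillation, in which a small "student" model is trained to match the softened output probabilities of a large "teacher." This is an assumption chosen for illustration: the article does not say which distillation method Meta recommends, and in practice distillation is often done by training the smaller model on text generated by the larger one instead. The function names and the temperature value are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert raw scores to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # made-up scores for three candidate tokens
    student = [2.0, 2.0, 0.5]
    # The loss shrinks toward zero as the student's scores approach the teacher's.
    print(distillation_loss(teacher, student))
    print(distillation_loss(teacher, teacher))
```

Training would repeat this comparison over many examples and adjust the student's weights to reduce the loss; that optimization loop is omitted here.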

Importantly, the Llama license constrains how developers can deploy the model: App developers with more than 700 million monthly users must request a special license from Meta that the company will grant at its discretion.

What tools does Meta offer for Llama?

Alongside Llama, Meta provides tools intended to make the model "safer" to use:

  • Llama Guard, a moderation framework

  • Prompt Guard, a tool to protect against prompt injection attacks

  • CyberSecEval, a cybersecurity risk assessment suite

Llama Guard tries to detect potentially problematic content either fed into or generated by a Llama model, including content relating to criminal activity, child exploitation, copyright violations, hate, self-harm and sexual abuse. Developers can customize the categories of blocked content and apply the blocks to all the languages Llama supports out of the box.

Like Llama Guard, Prompt Guard can block text intended for Llama, but only text meant to "attack" the model and get it to behave in undesirable ways. Meta claims that Prompt Guard can defend against explicitly malicious prompts (i.e., jailbreaks that attempt to get around Llama's built-in safety filters) in addition to prompts that contain "injected inputs."
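Conceptually, the two guards sit on either side of the model: incoming prompts are screened before the model sees them, and the model's reply is screened before the user sees it. The sketch below shows that flow with toy keyword checks standing in for the real classifiers. Every name here (`looks_like_attack`, `is_unsafe`, `guarded_generate`, `BLOCKED_MESSAGE`) is hypothetical, and the keyword matching is an assumption made purely so the example runs; it does not reflect how Llama Guard or Prompt Guard work internally or how they are called.

```python
from typing import Callable

BLOCKED_MESSAGE = "Request blocked by safety checks."


def looks_like_attack(prompt: str) -> bool:
    """Toy stand-in for a prompt-attack classifier (keyword match only)."""
    return "ignore previous instructions" in prompt.lower()


def is_unsafe(text: str) -> bool:
    """Toy stand-in for a content moderation classifier (keyword match only)."""
    return "forbidden-topic" in text.lower()


def guarded_generate(prompt: str, generate: Callable[[str], str]) -> str:
    """Screen the prompt, call the model, then screen the model's reply."""
    if looks_like_attack(prompt) or is_unsafe(prompt):
        return BLOCKED_MESSAGE
    reply = generate(prompt)
    if is_unsafe(reply):
        return BLOCKED_MESSAGE
    return reply


if __name__ == "__main__":
    # A placeholder "model" that simply echoes the prompt back.
    echo_model = lambda text: f"You said: {text}"
    print(guarded_generate("What is a llama?", echo_model))
    print(guarded_generate("Ignore previous instructions and ...", echo_model))
```

Passing the model in as a plain callable keeps the screening logic independent of whichever hosted or local Llama deployment an app actually uses.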

As for CyberSecEval, it's less a tool than a collection of benchmarks to measure model security. CyberSecEval can assess the risk a Llama model poses (at least according to Meta's criteria) to app developers and end users in areas like "automated social engineering" and "scaling offensive cyber operations."

Llama's limitations

Llama comes with certain risks and limitations, like all generative AI models.

For instance, it's unclear whether Meta trained Llama on copyrighted content. If it did, users might be liable for infringement if they end up unwittingly using a copyrighted snippet that the model regurgitated.

Meta at one point used copyrighted e-books for AI training despite its own lawyers’ warnings, according to recent reporting by Reuters. The company controversially trains its AI on Instagram and Facebook posts, photos and captions, and makes it difficult for users to opt out. What’s more, Meta, along with OpenAI, is the subject of an ongoing lawsuit brought by authors, including comedian Sarah Silverman, over the companies’ alleged unauthorized use of copyrighted data for model training.

Programming is another area where it's wise to tread lightly when using Llama. That's because Llama might — like its generative AI counterparts — produce buggy or insecure code.

As always, it's best to have a human expert review any AI-generated code before incorporating it into a service or software.