Meta releases Llama 3, claims it’s among the best open models available


Meta has released the newest entry in its Llama series of open generative AI models: Llama 3. Or, more accurately, the company has debuted two models in its new Llama 3 family, with the rest to come at an unspecified future date.

Meta describes the new models — Llama 3 8B, which contains 8 billion parameters, and Llama 3 70B, which contains 70 billion parameters — as a “major leap,” performance-wise, compared to the previous-generation Llama models, Llama 2 7B and Llama 2 70B. (Parameters essentially define the skill of an AI model on a problem, like analyzing and generating text; higher-parameter-count models are, generally speaking, more capable than lower-parameter-count models.) In fact, Meta says that, for their respective parameter counts, Llama 3 8B and Llama 3 70B — trained on two custom-built 24,000-GPU clusters — are among the best-performing generative AI models available today.

That’s quite a claim to make. So how is Meta supporting it? Well, the company points to the Llama 3 models’ scores on popular AI benchmarks like MMLU (which attempts to measure knowledge), ARC (which attempts to measure skill acquisition) and DROP (which tests a model’s reasoning over chunks of text). As we’ve written before, the usefulness — and validity — of these benchmarks is up for debate. But for better or worse, they remain one of the few standardized ways by which AI players like Meta evaluate their models.

Llama 3 8B bests other open models such as Mistral’s Mistral 7B and Google’s Gemma 7B, both of which contain 7 billion parameters, on at least nine benchmarks: MMLU, ARC, DROP, GPQA (a set of biology-, physics- and chemistry-related questions), HumanEval (a code generation test), GSM-8K (math word problems), MATH (another mathematics benchmark), AGIEval (a problem-solving test set) and BIG-Bench Hard (a commonsense reasoning evaluation).
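For readers who want to sanity-check numbers like these, here is a minimal, hypothetical sketch using EleutherAI’s open lm-evaluation-harness, a common third-party tool for running benchmarks such as MMLU, ARC and GSM-8K. It is not the evaluation setup Meta used; task names, few-shot settings and resulting scores will differ from the figures Meta reports.

```python
# Hypothetical sketch: scoring Llama 3 8B on a few public benchmarks with
# EleutherAI's lm-evaluation-harness (pip install lm-eval). This is not
# Meta's evaluation pipeline; the settings here are illustrative defaults.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # load the model through Hugging Face transformers
    model_args="pretrained=meta-llama/Meta-Llama-3-8B",
    tasks=["mmlu", "arc_challenge", "gsm8k"],
)
print(results["results"])  # per-task accuracy / exact-match metrics
```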

Now, Mistral 7B and Gemma 7B aren’t exactly on the bleeding edge (Mistral 7B was released last September), and in a few of the benchmarks Meta cites, Llama 3 8B scores only a few percentage points higher than either. But Meta also makes the claim that the larger-parameter-count Llama 3 model, Llama 3 70B, is competitive with flagship generative AI models, including Gemini 1.5 Pro, the latest in Google’s Gemini series.

Image Credits: Meta

Llama 3 70B beats Gemini 1.5 Pro on MMLU, HumanEval and GSM-8K, and — while it doesn’t rival Anthropic’s most performant model, Claude 3 Opus — Llama 3 70B scores better than the second-weakest model in the Claude 3 series, Claude 3 Sonnet, on five benchmarks (MMLU, GPQA, HumanEval, GSM-8K and MATH).

Image Credits: Meta

For what it’s worth, Meta also developed its own test set covering use cases ranging from coding and creative writing to reasoning to summarization, and — surprise! — Llama 3 70B came out on top against Mistral’s Mistral Medium model, OpenAI’s GPT-3.5 and Claude Sonnet. Meta says that it gated its modeling teams from accessing the set to maintain objectivity, but obviously — given that Meta itself devised the test — the results have to be taken with a grain of salt.

Image Credits: Meta

More qualitatively, Meta says that users of the new Llama models should expect more “steerability,” a lower likelihood of refusing to answer questions, and higher accuracy on trivia questions, questions pertaining to history and STEM fields such as engineering and science, and general coding recommendations. That’s in part thanks to a much larger dataset: a collection of 15 trillion tokens, or a mind-boggling ~750,000,000,000 words — seven times the size of the Llama 2 training set. (In the AI field, “tokens” refers to subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic.”)
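To make the token concept concrete, here is a minimal sketch using the open tiktoken library, purely as an illustration; Llama 3 ships its own tokenizer with a different vocabulary, and real tokenizers split text by byte-pair statistics rather than by syllables, so the exact pieces will vary.

```python
# Illustrative only: tiktoken (pip install tiktoken) is OpenAI's open source
# BPE tokenizer, not the one Llama 3 uses, but the idea is the same: text
# is split into sub-word pieces drawn from a fixed vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["fantastic", "fantastical", "unbelievability"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    # Common words often map to a single token; rarer words split into several.
    print(f"{word!r} -> {len(ids)} token(s): {pieces}")
```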

Where did this data come from? Good question. Meta wouldn’t say, revealing only that it drew from “publicly available sources,” included four times more code than the Llama 2 training dataset, and that 5% of the set contains non-English data (in ~30 languages) to improve performance in languages other than English. Meta also said it used synthetic data — i.e., AI-generated data — to create longer documents for the Llama 3 models to train on, a somewhat controversial approach due to the potential performance drawbacks.

“While the models we’re releasing today are only fine-tuned for English outputs, the increased data diversity helps the models better recognize nuances and patterns, and perform strongly across a variety of tasks,” Meta writes in a blog post shared with TechCrunch.

Many generative AI vendors see training data as a competitive advantage and thus keep it, and information pertaining to it, close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much. Recent reporting revealed that Meta, in its quest to keep pace with AI rivals, at one point used copyrighted e-books for AI training despite its own lawyers’ warnings; Meta and OpenAI are the subject of an ongoing lawsuit brought by authors, including comedian Sarah Silverman, over the vendors’ alleged unauthorized use of copyrighted data for training.

So what about toxicity and bias, two other common problems with generative AI models (including Llama 2)? Does Llama 3 improve in those areas? Yes, claims Meta.

Meta says that it developed new data-filtering pipelines to boost the quality of its model training data, and that it has updated its pair of generative AI safety suites, Llama Guard and CyberSecEval, to try to prevent misuse of, and unwanted text generations from, Llama 3 models and others. The company is also releasing a new tool, Code Shield, designed to detect code from generative AI models that might introduce security vulnerabilities.

Filtering isn’t foolproof, though — and tools like Llama Guard, CyberSecEval and Code Shield only go so far. (See: Llama 2’s tendency to make up answers to questions and leak private health and financial information.) We’ll have to wait and see how the Llama 3 models perform in the wild, inclusive of testing from academics on alternative benchmarks.

Meta says that the Llama 3 models — which are available for download now, and powering Meta’s Meta AI assistant on Facebook, Instagram, WhatsApp, Messenger and the web — will soon be hosted in managed form across a wide range of cloud platforms, including AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM’s WatsonX, Microsoft Azure, Nvidia’s NIM and Snowflake. In the future, versions of the models optimized for hardware from AMD, AWS, Dell, Intel, Nvidia and Qualcomm will also be made available.
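For the download route, a minimal sketch of running the instruction-tuned 8B model locally through Hugging Face transformers might look like the following. It assumes you have accepted Meta’s license for the gated meta-llama/Meta-Llama-3-8B-Instruct repository, authenticated with huggingface-cli login, and have a GPU with roughly 16 GB of memory for bfloat16 weights.

```python
# A minimal sketch, assuming access to the gated Llama 3 8B Instruct weights
# on Hugging Face has already been granted under Meta's license.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "In one sentence, what is Llama 3?"}]
# Apply the model's chat template so the prompt matches its fine-tuning format.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Llama 3 Instruct ends turns with <|eot_id|>, so treat it as a stop token too.
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
outputs = model.generate(inputs, max_new_tokens=64, eos_token_id=terminators)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```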

The Llama 3 models may be widely available. But you’ll notice that we’re using “open” to describe them as opposed to “open source.” That’s because, despite Meta’s claims, its Llama family of models isn’t as no-strings-attached as it’d have people believe. Yes, the models are available for both research and commercial applications. However, Meta forbids developers from using Llama models to train other generative models, and app developers with more than 700 million monthly users must request a special license from Meta that the company will — or won’t — grant at its discretion.

More capable Llama models are on the horizon.

Meta says that it’s currently training Llama 3 models over 400 billion parameters in size — models with the ability to “converse in multiple languages,” take in more data and understand images and other modalities as well as text, which would bring the Llama 3 series in line with open releases like Hugging Face’s Idefics2.

Image Credits: Meta

“Our goal in the near future is to make Llama 3 multilingual and multimodal, have longer context and continue to improve overall performance across core [large language model] capabilities such as reasoning and coding,” Meta writes in a blog post. “There’s a lot more to come.”

Indeed.
