Meta Releases LLaMa, New ChatGPT API Open To Everyone, OpenAI Lays Out Plans For The Future, Amazing Research Towards Multi-Modal Models

Published 7 months ago · 6 min read


This week, machine learning progress stays as fast as an Olympic sprinter slipping on a banana peel placed on a lump of Vaseline.

In this week's issue, we will cover Meta open-sourcing state-of-the-art LLMs, OpenAI’s new APIs, and their plans for the future.

Last but not least, we will do a quick review of some really fascinating research that shines a light on the future of LLMs.

Let’s jump in!

Meta Open-Sources State-Of-The-Art Language Models But There Is Bad News For Builders

Meta released LLaMA, a family of foundation language models, in sizes of 7B, 13B, 33B, and 65B parameters.

They offer state-of-the-art performance, and the smaller ones even fit on a single GPU. The high performance at such small sizes was achieved by training the models on considerably larger datasets of 1T to 1.4T tokens. This makes the models not just cheaper for inference but also cheaper to fine-tune.

The models were released under a non-commercial license.

This is bad news for builders but great for academics. The license is "research only". Meta states that this was done in order to maintain integrity and prevent misuse.

However, if you intend to use the models for research you can apply for access here.

In my opinion, they are being extra careful this time after getting burned when they released Galactica last year. Galactica was trained on scientific papers but had to be pulled within three days of its release. The public outcry was massive because the model hallucinated scientific-sounding but incorrect facts.

Why Is This Important?

Today, the high cost of developing LLMs effectively shuts out small companies and academic institutions. The availability of open-source foundation models would change this. Meta made an important step in this direction.

Next, in the same vein, is some good news for the tinkerers among us.

OpenAI Releases APIs To ChatGPT and Whisper

If you were as disappointed as I was when you saw that access to Meta’s LLaMA models is limited to researchers, you are going to like this.

APIs for ChatGPT and OpenAI’s speech-to-text model Whisper are available as of yesterday. Through system-wide optimizations, OpenAI claims to have reduced ChatGPT inference costs by 90% compared to GPT-3, and now prices ChatGPT at $0.002 per 1,000 tokens. Dedicated instances are available for higher throughput; according to OpenAI, an instance makes economic sense if you process roughly 450M tokens a day.

Why Is This Important?

Obviously, API access lets developers all over the world build applications on top of these LLMs. However, the decreased cost is particularly striking. 1,000 tokens are roughly 1.5 pages of text, so $1 buys about 750 pages of generated text.
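As a quick back-of-envelope check of that claim (using the article's own figures of $0.002 per 1,000 tokens and ~1.5 pages per 1,000 tokens):

```python
# Back-of-envelope: how much text $1 buys at ChatGPT API pricing.
PRICE_PER_1K_TOKENS = 0.002   # USD, announced launch price
PAGES_PER_1K_TOKENS = 1.5     # rough estimate from the article

tokens_per_dollar = 1_000 / PRICE_PER_1K_TOKENS
pages_per_dollar = tokens_per_dollar / 1_000 * PAGES_PER_1K_TOKENS

print(f"{tokens_per_dollar:,.0f} tokens (~{pages_per_dollar:.0f} pages) per $1")
# 500,000 tokens (~750 pages) per $1
```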

This is a massive cost reduction, down to one-tenth the cost of prior GPT-3 inference. It most likely came about from combining synergies with Microsoft, increased amounts of sparsity, and retrieval enhancement on the backend.

Our next point contains a summary of a post by Sam Altman on how he sees OpenAI’s way into the future. He offers a look at their strategy for how they plan to navigate a world of stronger and stronger models.

Below that, you can find a quick summary of a strikingly beautiful paper that shines a light on the future of multi-modal language models.

OpenAI Releases Its Roadmap For The Future

In their post, they provide strategies for the long and short term. Further, they outline what they aim to do once they get close to something like AGI.

They see their mission as “Building AGI that benefits all of humanity” by:

  1. Minimizing risks and maximizing benefits so that AI becomes an amplifier for humanity
  2. Making access and governance fairly and widely shared
  3. Continuously learning and adapting by deploying less powerful models to minimize “one shot to get it right” scenarios

In the short term they:

  • Aim to create successively more powerful systems
  • Want to give people and institutions time to adapt to the progress while the stakes are low
  • Want to enable a tight feedback loop to tackle questions of bias, job displacement, etc.
  • Plan to continue open-sourcing models to decentralize access and broaden the set of people who contribute
  • Will become increasingly cautious with the creation and deployment of models when they get closer to AGI

Once they get close to AGI they:

  • Expect the balance between the upsides and downsides of new deployments to shift
  • Are actively working on better alignment techniques
  • Expect that developments in AI safety and capabilities go hand in hand but want to increase the ratio of safety progress to capability progress
  • Have already set up their organization in a way that limits the returns to shareholders so they are not incentivized to capture profits without regard for risk. They can sponsor comprehensive UBI experiments or even cancel obligations to shareholders if safety considerations require it.

In the long term they:

  • Believe the transition to AGI is perhaps the most important, hopeful, and scary project in human history
  • Expect changes to remain relatively slow for a while and accelerate in the late stages of development
  • Consider AI that can advance science as a special case that could be more impactful than everything else
  • Expect global coordination to slow down AI efforts to become increasingly important

Now let’s jump to the paper that I promised you!

Language Is Not All You Need: Aligning Perception with Language Models

The authors train a large language model on text, images, and interleaved text-and-image data.

Their model (KOSMOS-1) can perform a pretty impressive array of tasks such as:

  • Language understanding/generation
  • OCR-free NLP (bottom right image in the examples below)
  • Visual question answering
  • Multi-modal dialogue
  • Classification via text instructions

How did they do this?

They converted all data to a sequence. This allowed them to train the model in a self-supervised manner just as other language models are trained.

To transform the multi-modal data into sequences the images are encoded via an image encoding network. In a second step, the data are placed in sequences and special tokens are used to signal the start and end of each modality (see table below).
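To make this concrete, here is a minimal sketch of how such interleaved sequences might be assembled. The encoder and the exact token names (`<image>`, `</image>`, `<s>`) are illustrative placeholders in the spirit of the paper, not the authors' implementation:

```python
# Illustrative sketch: flattening interleaved text/image data into a
# single token sequence, with special tokens delimiting each image span.

def encode_image(image) -> list:
    # Placeholder for an image encoder: in practice a vision network
    # maps the image to a fixed number of embedding tokens.
    return [f"<img_emb_{i}>" for i in range(4)]  # pretend 4 embeddings

def to_sequence(segments: list) -> list:
    """Each segment is ('text', str) or ('image', obj)."""
    seq = ["<s>"]  # start-of-sequence token
    for kind, payload in segments:
        if kind == "text":
            seq.extend(payload.split())  # stand-in for a real tokenizer
        elif kind == "image":
            seq.append("<image>")        # signal start of image modality
            seq.extend(encode_image(payload))
            seq.append("</image>")       # signal end of image modality
    seq.append("</s>")
    return seq

example = [("text", "A photo of"), ("image", None), ("text", "a cute cat")]
print(to_sequence(example))
```

Once everything is a single sequence, the usual next-token training objective applies unchanged, which is what makes this recipe so appealing.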

Why Is This Important?

Research into multi-modal models is highly meaningful in at least three ways.

First, it will be very useful if a model can answer complex queries about images and other media. Think of something mundane, such as improved invoice processing software. If generative pre-training improves this to the point that we get ChatGPT-like performance on unseen invoices, the value of that would be otherworldly.

Second, today’s language models such as ChatGPT are trained only on text data. As a result, they have a limited understanding of our world.

Third, it is not entirely clear how far LLMs can be scaled before we run out of text data. This is a fascinating topic and one of the next essays will be about this so stay tuned.

In a nutshell, the problem is the following: the latest research on scaling LLMs showed that we need much more data to train models optimally than previously thought. As a result, there might not be enough text data in the world to train some of the bigger models we have today (500B parameters and up) to their full potential.
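To get a feel for the numbers, here is a rough estimate using the widely cited rule of thumb from recent scaling work of roughly 20 training tokens per parameter. The exact ratio is an assumption on my part, not a figure from this article:

```python
# Rough data requirement for a 500B-parameter model under the
# ~20-tokens-per-parameter heuristic from recent scaling research.
TOKENS_PER_PARAM = 20   # rule of thumb, not an exact figure
params = 500e9          # 500B parameters

tokens_needed = params * TOKENS_PER_PARAM
print(f"~{tokens_needed / 1e12:.0f}T tokens needed")
# ~10T tokens needed
```

For comparison, the largest LLaMA model was trained on 1.4T tokens, which already pushes against what curated text corpora offer today.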

Converting images and other data into sequence form would allow tapping into a near-infinite trove of data to train models.

Stuff like this makes me excited for the future!

Thank you for reading! As always, I really enjoyed making this for you and sincerely hope you found it useful!

If you are not subscribed yet, click here to subscribe!

At The Decoding ⭕, I send out a thoughtful 5-minute email every week that keeps you in the loop about machine learning research and the data economy.

