Meta Releases LLaMA Models, New ChatGPT API Open To Everyone, OpenAI Lays Out Plans For The Future, Amazing Research Towards Multi-Modal Models
This week, machine learning progress stays as fast as an Olympic sprinter slipping on a banana peel placed on a lump of Vaseline.
In this week's issue, we will cover Meta open-sourcing state-of-the-art LLMs, OpenAI’s new APIs, and their plans for the future.
Last but not least, we will do a quick review of some really fascinating research that shines a light on the future of LLMs.
Let’s jump in!
Meta Open-Sources State-Of-The-Art Language Models But There Is Bad News For Builders
The models are available in sizes of 7B, 13B, 33B, and 65B parameters.
They offer state-of-the-art performance, and the smaller ones even fit on a single GPU. The high performance at such small sizes was achieved by training on considerably larger datasets of 1T to 1.4T tokens. This makes the models not just cheaper for inference but also cheaper to fine-tune.
The models were released under a non-commercial license.
This is bad news for builders but great for academics. The license focuses on “research only”. Meta states that this was done in order to maintain integrity and prevent misuse.
However, if you intend to use the models for research you can apply for access here.
In my opinion, they are extra careful this time after getting burned when they released Galactica last year. Galactica was trained on scientific papers but had to be pulled within three days of the release. The public outcry was massive because the model hallucinated scientific-sounding but incorrect facts.
Why Is This Important?
Today, the high cost of developing LLMs effectively shuts out small companies and academic institutions. The availability of open-source foundation models would change this. Meta made an important step in this direction.
Next, in the same vein, is some good news for the tinkerers among us.
OpenAI Releases APIs To ChatGPT and Whisper
If you were as disappointed as I was when you saw that access to Meta’s LLaMA models is limited to researchers, you are going to like this.
APIs to ChatGPT and OpenAI’s speech-to-text model Whisper are available as of yesterday. Through system-wide optimizations, OpenAI claims to have reduced ChatGPT inference costs by 90% compared to GPT-3. ChatGPT is now priced at $0.002 per 1,000 tokens. Dedicated instances are available for a speedup. According to OpenAI, a dedicated instance makes economic sense if you process ~450M tokens a day.
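To make the announcement concrete, here is a minimal sketch of what a call to the new ChatGPT API looks like. The model name `gpt-3.5-turbo` and the messages format match OpenAI's launch announcement; the system prompt and helper function are my own illustration.

```python
def build_chat_request(user_message, system_prompt="You are a helpful assistant."):
    """Build the request payload for the ChatGPT (Chat Completions) endpoint."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }

request = build_chat_request("Summarize the LLaMA release in one sentence.")

# The actual call requires the `openai` package and an API key, e.g.:
# import openai
# openai.api_key = "..."  # from your OpenAI account
# response = openai.ChatCompletion.create(**request)
# print(response["choices"][0]["message"]["content"])
```

The chat format (a list of role-tagged messages instead of a single prompt string) is the main API-level difference from the older GPT-3 completion endpoints.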
Why Is This Important?
Obviously, the API access lets developers all over the world build applications on top of these LLMs. However, the decreased cost is even more striking. 1,000 tokens are roughly 1.5 pages of text. Hence, $1 buys you about 500,000 tokens, or roughly 750 pages.
This is a massive cost reduction, to one-tenth the cost of prior GPT-3 inference, which most likely came about from a combination of synergies with Microsoft, increased amounts of sparsity, and retrieval enhancement on the backend.
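A quick sanity check of the pricing arithmetic, using only the figures from the announcement ($0.002 per 1,000 tokens, ~1.5 pages per 1,000 tokens; the ~450M tokens/day break-even volume is OpenAI's own number):

```python
PRICE_PER_1K_TOKENS = 0.002   # USD, ChatGPT API list price
PAGES_PER_1K_TOKENS = 1.5     # rough rule of thumb

def cost_usd(tokens):
    """Cost of processing a given number of tokens at list price."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS

def pages_per_dollar():
    """How many pages of text $1 buys at list price."""
    tokens_per_dollar = 1000 / PRICE_PER_1K_TOKENS  # 500,000 tokens
    return tokens_per_dollar / 1000 * PAGES_PER_1K_TOKENS

print(cost_usd(1_000_000))      # -> 2.0 (one million tokens costs $2)
print(pages_per_dollar())       # -> 750.0 pages per dollar
print(cost_usd(450_000_000))    # -> 900.0 ($/day at the dedicated-instance break-even volume)
```

So the dedicated-instance offer only starts to pay off at roughly $900/day of list-price usage.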
Our next point contains a summary of a post by Sam Altman on how he sees OpenAI’s path into the future. He offers a look at their strategy for navigating a world of ever stronger models.
Below that, you can find a quick summary of a strikingly beautiful paper that shines a light on the future of multi-modal language models.
OpenAI Releases Its Roadmap For The Future
In their post, they provide strategies for the long and short term. Further, they outline what they aim to do once they get close to something like AGI.
They see their mission as “Building AGI that benefits all of humanity” by:
In the short term they:
Once they get close to AGI they:
In the long term they:
Now let’s jump to the paper that I promised you!
Language Is Not All You Need: Aligning Perception with Language Models
The authors train a large language model on text, images, and interleaved text-and-image data.
Their model (KOSMOS-1) can perform a pretty impressive array of tasks such as:
How did they do this?
They converted all data to a sequence. This allowed them to train the model in a self-supervised manner just as other language models are trained.
To transform the multi-modal data into sequences the images are encoded via an image encoding network. In a second step, the data are placed in sequences and special tokens are used to signal the start and end of each modality (see table below).
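The idea can be sketched in a few lines of Python. Note that the tag names `<image>` and `</image>` and the stand-in image encoder below are invented for illustration; KOSMOS-1 uses its own special tokens and a trained vision encoder.

```python
def encode_image(image):
    """Stand-in for an image encoder that maps an image to embedding tokens."""
    return [f"img_emb_{i}" for i in range(len(image))]

def to_sequence(segments):
    """Flatten interleaved text and image segments into one token sequence,
    marking modality boundaries with special tokens."""
    seq = ["<s>"]
    for kind, payload in segments:
        if kind == "text":
            seq.extend(payload.split())
        elif kind == "image":
            seq.append("<image>")
            seq.extend(encode_image(payload))
            seq.append("</image>")
    seq.append("</s>")
    return seq

example = [("text", "A photo of"), ("image", [0.1, 0.2, 0.3]), ("text", "a cat")]
print(to_sequence(example))
# -> ['<s>', 'A', 'photo', 'of', '<image>', 'img_emb_0', 'img_emb_1',
#     'img_emb_2', '</image>', 'a', 'cat', '</s>']
```

Once everything is one sequence, the standard next-token training objective applies unchanged, which is what makes this recipe so appealing.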
Why Is This Important?
Research into multi-modal models is highly meaningful in at least three ways.
First, it will be very useful if a model can answer complex queries about images and other media. Think of something mundane such as improved invoice-processing software. If generative pre-training improves this to the point where we get ChatGPT-like performance on unseen invoices, the value of that would be otherworldly.
Second, today’s language models such as ChatGPT are trained only on text data. As a result, they have a limited understanding of our world.
Third, it is not entirely clear how far LLMs can be scaled before we run out of text data. This is a fascinating topic, and one of the next essays will be about it, so stay tuned.
In a nutshell, the problem is the following: the latest research on scaling LLMs showed that we need much more training data than previously thought. As a result, there might not be enough text data in the world to optimally train some of the bigger models (500B+ parameters) that we have today.
Converting images and other data into sequence form would allow tapping into a near-infinite trove of data to train models.
Stuff like this makes me excited for the future!
Thank you for reading! As always, I really enjoyed making this for you and sincerely hope you found it useful!
If you are not subscribed yet, click here to subscribe!
At The Decoding ⭕, I send out a thoughtful 5-minute email every week that keeps you in the loop about machine learning research and the data economy.