ChatGPT-Level Performance By Fine-Tuning LLaMa With Only 1000 Samples, Musk Chasing OpenAI, And An Overview Of Multi-Modal Models
Machine learning progress stays as rapid as a thieving magpie swooping towards a gleaming silver spoon.
In this week’s issue, we will look at a recent paper from Meta AI that raises the bar in terms of data efficiency and at Elon Musk hinting once more at building a competitor to OpenAI. Last but not least, we have a quick review of the state of multi-modal models in store for you.
Let’s jump in!
Usually, each company has its own long interview process, and applicants need to start all over again for every company. This is not only annoying but also wastes a lot of time and money for everyone involved.
The DeepTalent platform validates your skills only once.
Companies then start reaching out to you and fast-track your application. They can do that because they already know you have great skills as a member of the exclusive network.
For companies, the service can significantly shorten the time-to-hire and reduce the costs of technical interviews by 80%.
Full disclosure here: I am a co-founder of DeepTalent.
So, if you are looking for a new job in machine learning or are hiring engineers in the field, check out: DeepTalent.io!
In a recent paper, researchers found that fine-tuning a LLaMa model on only 1000 samples was enough to create a state-of-the-art conversational AI. Their instruction-tuned model rivals popular models such as GPT-4, ChatGPT, and Google’s Bard. What is particularly interesting about their finding is that they did not use any reinforcement learning from human feedback (RLHF), as was used in the creation of ChatGPT. In the paper, they conclude that instruction fine-tuning probably works so well with comparatively few samples because most of the model’s capabilities are learned during pre-training.
Why Is This Important?
Over the last few months, we have seen the cost of creating a state-of-the-art LLM drop considerably. The model used in this study has only 65B parameters. Though this is still a massive model, that is roughly a third of GPT-3’s 175B parameters and, if the rumors about GPT-4’s size are to be believed, more than an order of magnitude fewer than GPT-4. Reaching state-of-the-art performance by fine-tuning on no more than 1000 samples of dialogue lowers the barrier to entry once more. Curating 1000 examples of specific dialogue is well within the realm of what is possible for, e.g., a startup without a boatload of funding.
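To make the data-curation step concrete, here is a minimal sketch of how such a small instruction dataset might be turned into plain training strings for supervised fine-tuning. The prompt template and field names below are my own illustrative assumptions, not the format used in the paper:

```python
# Two hypothetical curated samples (the paper used ~1000 of these).
samples = [
    {"instruction": "Explain overfitting in one sentence.",
     "response": "Overfitting is when a model memorizes its training data "
                 "instead of learning patterns that generalize."},
    {"instruction": "Translate 'good morning' to German.",
     "response": "Guten Morgen."},
]

# An assumed prompt template; real projects vary in the exact markers used.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"

def format_sample(sample: dict) -> str:
    """Render one instruction/response pair into a single training string."""
    return TEMPLATE.format(**sample)

training_texts = [format_sample(s) for s in samples]
print(training_texts[0])
```

The resulting strings would then be tokenized and used for ordinary next-token-prediction fine-tuning — no reward model or RLHF loop involved.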
During The Wall Street Journal’s CEO Council Summit in London, Elon Musk expressed his desire to establish an AI business that can compete with industry giants Google and Microsoft. Musk hinted that this endeavor could involve various parts of his corporate empire, including Twitter, which he believes could become cash-flow positive by next month. He suggested the possibility of Twitter and Tesla partnering with an AI company, similar to the Microsoft and OpenAI collaboration. Musk’s existing AI company, X.AI, could play a role in this ambitious plan.
Why Is This Important?
On the one hand, a company such as X.AI could function as a platform that unifies the AI efforts of Tesla as well as those of Twitter. This would likely help Musk’s endeavors to foster more innovation internally. On the other hand, it would likely create spillover effects for Twitter and help fend off the stiff competition that other car manufacturers pose to Tesla’s self-driving systems.
Multimodal models, capable of processing various types of input data, have made significant progress in recent years. Meta AI’s ImageBind is one such model that embeds six modalities into a joint embedding space. It uses a CLIP-like contrastive approach to train encoders for each modality. ImageBind demonstrates strong performance in tasks such as few-shot classification, object detection, and embedding space arithmetic. Crucially, these abilities work across modalities: because everything lives in the same embedding space, the model can, for example, retrieve audio clips that match a given image.
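To give a feel for the CLIP-style contrastive objective that ImageBind builds on, here is a toy sketch: embeddings of paired inputs (say, an image and its matching audio clip) are pulled together, while mismatched pairs in the batch are pushed apart. The 2-D vectors and the temperature value are made-up illustrations, not real model outputs or the paper’s hyperparameters:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def contrastive_loss(img_embs, aud_embs, temperature=0.07):
    """InfoNCE-style loss: image i should match audio clip i in the batch."""
    img_embs = [normalize(v) for v in img_embs]
    aud_embs = [normalize(v) for v in aud_embs]
    loss = 0.0
    for i, img in enumerate(img_embs):
        # Similarity of this image to every audio clip in the batch.
        logits = [dot(img, aud) / temperature for aud in aud_embs]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        # Negative log-probability of picking the correct (i-th) audio clip.
        loss += -math.log(exps[i] / sum(exps))
    return loss / len(img_embs)

# Correctly aligned pairs (image i matches audio i) give a low loss...
aligned = contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
# ...while swapped pairs give a much higher one.
swapped = contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned, swapped)
```

Training encoders for each modality against this kind of objective is what places images, audio, text, and so on into one shared space, which is also what makes the embedding arithmetic mentioned above possible.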
Why Is This Important?
Multimodal models have the potential to revolutionize AI systems, enabling new tasks and impacting our understanding of the world. Development in this area has led to impressive results already. Further, the ability to operate across different modalities has the potential to blow some of the current use cases out of the water. A simple example of this is OCR-free document processing. This would enable us to extract information from scanned documents without the need for additional complex OCR systems.
Thank you for reading!
As always, I really enjoyed making this for you and sincerely hope you found it useful!
See you next week!
If you are not subscribed yet: At The Decoding ⭕, I send out a thoughtful 5-minute email every week that keeps you in the loop about machine learning research and the data economy. Click here to subscribe!