AI model "big is better" point of view no longer works
Author | The Economist Translator |
Editor in charge | Xia Meng
Listing | CSDN (ID: CSDNnews)
If AI is to get better, it will have to do more with fewer resources.
Speaking of "Large Language Models" (LLMs), such as OpenAI's GPT (Generative Pre-trained Transformer) - the core force driving the popular chatbots in the United States - the name says it all. Such modern AI systems are powered by vast artificial neural networks that mimic the workings of biological brains in a broad way. GPT-3, released in 2020, is a big language model behemoth with 175 billion "parameters," which is the name for the simulated connections between neurons. GPT-3 is trained by processing trillions of words of text in a few weeks using thousands of AI-savvy GPUs, at an estimated cost of more than $4.6 million.
Until now, the consensus in modern AI research has been "bigger is better," and model sizes have grown rapidly as a result. GPT-4, released in March, is estimated to have around 1 trillion parameters, an almost sixfold increase over its predecessor, and OpenAI's CEO Sam Altman estimates that it cost more than $100 million to develop. The industry as a whole shows the same trend: research firm Epoch AI estimated in 2022 that the computing power required to train top models doubles every six to ten months.
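To get a sense of what such a doubling rate implies, here is a small back-of-the-envelope calculation; the three-year horizon is an illustrative choice of ours, not a figure from the report.

```python
# Illustration: how training compute compounds if it doubles every
# six to ten months, per Epoch AI's 2022 estimate.
def compute_multiplier(months_elapsed: float, doubling_months: float) -> float:
    """Factor by which training compute grows after `months_elapsed` months."""
    return 2 ** (months_elapsed / doubling_months)

for doubling in (6, 10):
    print(f"Doubling every {doubling} months -> "
          f"{compute_multiplier(36, doubling):.0f}x more compute after 3 years")
# Doubling every 6 months  -> 64x more compute after 3 years
# Doubling every 10 months -> 12x more compute after 3 years
```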
Earlier this year, Morgan Stanley estimated that if half of Google's searches were handled by current GPT-style programs, it could cost the company an extra $6 billion a year, a figure that will likely keep rising as models grow.
As a result, the view that "bigger is better" for AI models no longer holds. If developers are going to keep improving AI models (let alone realize grander AI ambitions), they will need to figure out how to get better performance from limited resources. As Mr Altman put it in April, looking back at the history of large-scale AI: "I think we've reached the end of an era."
Quantitative Crunching
Instead, researchers have begun to focus on making models more efficient, not just bigger. One approach is to trade off parameters against data: use fewer parameters but train on more of it. In 2022, Google's DeepMind division trained a 70-billion-parameter LLM called Chinchilla on a corpus of 1.4 trillion words. Despite having far fewer parameters than GPT-3's 175 billion, and even though GPT-3 was trained on only about 300 billion words, Chinchilla outperformed it. Feeding a smaller LLM more data means it takes longer to train, but the result is a smaller, faster, and cheaper model.
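The trade-off can be made concrete with two widely cited rules of thumb that the article does not spell out: training compute is roughly 6 × parameters × tokens, and the Chinchilla paper's heuristic of roughly 20 training tokens per parameter. The sketch below is an illustration of those approximations, not DeepMind's actual methodology.

```python
# Rough sketch of the Chinchilla-style trade-off: for a similar training
# budget, a smaller model can be fed proportionally more data.
def training_flops(params: float, tokens: float) -> float:
    # Common approximation: compute ~ 6 * parameters * training tokens.
    return 6 * params * tokens

# GPT-3-like: 175B parameters trained on ~300B tokens
gpt3_budget = training_flops(175e9, 300e9)

# Chinchilla-like: 70B parameters trained on ~1.4T tokens
chinchilla_budget = training_flops(70e9, 1.4e12)

print(f"GPT-3-like budget:      {gpt3_budget:.2e} FLOPs")
print(f"Chinchilla-like budget: {chinchilla_budget:.2e} FLOPs")

def optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    # Chinchilla heuristic: roughly 20 training tokens per parameter.
    return tokens_per_param * params

print(f"Suggested data for a 70B-parameter model: {optimal_tokens(70e9):.2e} tokens")
```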
Another option is to reduce the precision of the floating-point numbers themselves. Cutting the number of digits of precision in each number in the model, i.e. rounding, can drastically reduce hardware requirements. In March, researchers at the Institute of Science and Technology Austria demonstrated that rounding can drastically cut the memory consumption of a GPT-3-like model, allowing it to run on one high-end GPU instead of five with "negligible loss of accuracy."
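To illustrate why rounding saves memory, here is a minimal sketch of symmetric post-training weight quantization to 8-bit integers; it is a generic example, not the specific technique used by the Austrian researchers.

```python
import numpy as np

# Minimal sketch: round float32 weights to int8 and keep a per-tensor scale.
# Storage drops from 4 bytes to 1 byte per weight; dequantized values are
# close to, but not exactly, the originals (achieving "negligible" loss in
# practice requires more careful schemes, e.g. per-channel scaling).

def quantize_int8(weights: np.ndarray):
    scale = np.abs(weights).max() / 127.0          # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)   # a dummy weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"memory: {w.nbytes / 1e6:.0f} MB -> {q.nbytes / 1e6:.0f} MB")
print(f"mean absolute rounding error: {np.abs(w - w_hat).mean():.5f}")
```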
Some users fine-tune a general-purpose LLM to focus on a specific task such as generating legal documents or detecting fake news. While this is not as demanding as training an LLM from scratch, it can still be expensive and slow. Fine-tuning the open-source 65-billion-parameter LLaMA model from Meta (Facebook's parent company) requires multiple GPUs and takes anywhere from hours to days.
Researchers at the University of Washington have invented a more efficient way to create a new model, Guanaco, from LLaMA on a single GPU in a day with negligible loss of performance. Part of the trick is a rounding technique similar to the one used by the Austrian researchers. But they also used a technique called Low-Rank Adaptation (LoRA), which involves freezing the model's existing parameters and adding a new, smaller set of parameters on top. Fine-tuning is done by changing only these new variables. This simplifies things enough that even a relatively weak computer, such as a smartphone, is up to the task. If an LLM could run on the user's own device rather than in a giant data center, that might allow greater personalization and better privacy protection.
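A minimal sketch of the Low-Rank Adaptation idea in PyTorch is shown below: the original weight matrix is frozen, and only two small low-rank matrices are trained. This is a simplified illustration, not the University of Washington team's Guanaco code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a small trainable low-rank update (W + B @ A)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Original output plus the low-rank correction; only A and B get gradients.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")   # roughly 0.4% of the layer
```

Because only the small A and B matrices receive gradients, the optimizer state and the memory needed for fine-tuning shrink accordingly, which is what makes single-GPU (or, eventually, on-device) fine-tuning plausible.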
Meanwhile, a team at Google has offered a new option for those who can live with smaller models: distilling specific knowledge from a large general model into a smaller, specialized one. The big model acts as a teacher and the small model as a student. The researchers ask the teacher to answer questions and to show its reasoning, and both the answers and the reasoning are used to train the student. The team was able to train a student model of just 7.7 billion parameters to outperform its 540-billion-parameter teacher on specific reasoning tasks.
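The teacher-student setup can be sketched as a training loss that penalizes the student for diverging from the teacher's output distribution. The snippet below is a generic knowledge-distillation loss, not the Google team's exact method (which additionally trains the student on the teacher's written-out reasoning).

```python
import torch
import torch.nn.functional as F

# Generic distillation step: the small student model is trained to match the
# large teacher model's softened output distribution, in addition to the
# ordinary loss on the ground-truth labels.

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Dummy batch: 4 examples, 10 classes (stand-ins for real model outputs).
teacher_logits = torch.randn(4, 10)
student_logits = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 10, (4,))

loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()     # gradients flow only into the student's outputs/parameters
```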
Another approach is to change how models are built rather than what they do. Most AI models are developed in Python, a language designed to be easy to use that frees programmers from thinking about how their programs drive the chips they run on. The price of hiding those details is slower code, and paying more attention to them can pay huge dividends. As Thomas Wolf, chief science officer of the open-source AI company Hugging Face, puts it, this is "an important aspect of current research in artificial intelligence."
Optimized Code
In 2022, for example, researchers at Stanford University released an improved version of the "attention algorithm" that lets large language models learn connections between words and concepts. The idea is to modify the code to take account of what is happening on the chip it runs on, in particular to track when specific pieces of information need to be fetched or stored. Their algorithm roughly tripled the training speed of GPT-2, an early large language model, and also improved its ability to handle longer queries.
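The article does not name the algorithm, but the description matches hardware-aware fused attention kernels. PyTorch 2 exposes such a kernel through `scaled_dot_product_attention`; the snippet below shows basic usage and is our illustration, not the Stanford researchers' code.

```python
import torch
import torch.nn.functional as F

# Naive attention materializes a full (sequence x sequence) score matrix in
# GPU memory; fused, hardware-aware kernels compute the same result while
# keeping intermediate blocks in fast on-chip memory, which speeds up
# training and makes longer inputs feasible.

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

batch, heads, seq_len, head_dim = 2, 16, 2048, 64
q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# PyTorch dispatches to a fused (FlashAttention-style) kernel when one is
# available for the given device, dtype, and shapes; otherwise it falls back
# to the ordinary math implementation.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)   # torch.Size([2, 16, 2048, 64])
```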
Cleaner code can also be achieved with better tools. Earlier this year, Meta released a new version of its AI programming framework, PyTorch. By getting programmers to think about how computations are organized on the actual chip, it can double the speed at which a model trains with the addition of a single line of code. Modular, a startup founded by former Apple and Google engineers, last month released a new AI-focused programming language called Mojo, based on Python. Mojo gives programmers control over all the details that used to be hidden, and in some cases code written in Mojo can run thousands of times faster than an equivalent block of Python.
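The "single line of code" presumably refers to PyTorch 2's compilation entry point, `torch.compile`; here is a minimal sketch of how it is used, with a toy stand-in model rather than a real LLM.

```python
import torch
import torch.nn as nn

# A small stand-in network; in practice this would be a full language model.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# The single extra line: ask PyTorch 2 to trace the model and generate fused,
# chip-aware kernels for it. The first call pays a compilation cost; later
# calls run the optimized version.
model = torch.compile(model)

x = torch.randn(8, 1024)
out = model(x)   # numerically the same as the uncompiled model, often faster
```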
The last option is to improve the chips that run the code. Although originally designed to handle the complex graphics of modern video games, GPUs turn out to be surprisingly good at running AI models. A hardware researcher at Meta says that for "inference" (actually running a model after it has been trained), GPUs are not a perfect design. Some companies are therefore building their own, more specialized hardware. Google already runs most of its AI projects on its in-house "TPU" chips; Meta, with its MTIA chip, and Amazon, with its Inferentia chip, are attempting something similar.
It may seem surprising that simple changes like rounding numbers or switching programming languages can yield huge performance gains, but it reflects how quickly large language models have developed. For many years they were primarily research projects, and the focus was on getting them to work and produce valid results rather than on the elegance of their design. Only recently have they become commercial, mass-market products, and most experts agree there is plenty of room for improvement. As Chris Manning, a computer scientist at Stanford University, says: "There is no reason to believe that the neural architecture currently in use is optimal, and it is not ruled out that better architectures will appear in the future."