- 26 April 2024
- Free Speech
Wessel van Rensburg, Technology consultant
LARGE Language Models (LLMs) like GPT-4 have faced criticism from many, including those within the AI community. As discussed in the previous column, the critique cuts to the core: not only do they not resemble human intelligence, they lack understanding, functioning merely as a sophisticated form of auto-complete.
Yet there are others in the community with a very different opinion. Even when GPT-3 came onto the scene, they thought something significant was going on within deep learning, and in particular with the transformer architecture on which ChatGPT is based: the larger these models became and the more data they were fed, the more they began showing emergent properties they were not trained for.
When we say larger models we are generally referring to the number of parameters they have. The parameters of an AI model are the weights and biases that set the strength of the connections between the model's neurons, and their values are learned from training data. Think of them as the model's internal variables that shape its actions and effectiveness.
Sam Altman of OpenAI, Dario Amodei of Anthropic and Demis Hassabis of Google DeepMind all believe that LLMs possess a type of understanding and suggest that with sufficient data and size they could achieve superhuman intelligence. We can call this way of thinking the “scaling hypothesis of intelligence”, founded on two straightforward ideas: a larger “brain” correlates with greater intelligence, and there's a functional similarity between artificial and natural neurons.
The scaling hypothesis has been substantiated in scholarly articles, with findings that remain impressively consistent. By expanding the models' parameters and concurrently amplifying the training data, their performance improves. Some argue that this is just the beginning for LLMs and that we're entering an era of exponential enhancement. If that is true, then reaching superhuman AI will require few new scientific breakthroughs. It will be mainly an engineering and financing feat.
Has AI hit a ceiling?
Gary Marcus, a well-known sceptic, has a critical observation: it's 18 months since ChatGPT's release, and while Google, Anthropic and recently Meta have unveiled models that may surpass GPT-4 in some ways, they're all within the same range of performance. If we were going to witness exponential growth, he argues, at least one should have demonstrated a significant leap forward, more profound than the progression from GPT-3 to GPT-4. Thus, he poses the question: have LLMs hit a performance ceiling?
So far, the scaling hypothesis shows unambiguously that training more advanced LLMs with the transformer architecture requires exponentially more resources. For instance, GPT-2's training was relatively modest, costing about $45,000 in compute power. GPT-3, being roughly 10 times bigger in data and parameter count, incurred a cost 100 times greater, soaring above $4 million.
The specifics of GPT-4 remain under wraps but it's speculated that its parameter count and data increased by a factor of nearly 10. The training expense alone is estimated to have been between $50 million and $100 million. In terms of data volume, it's believed GPT-4 was trained on an amount comparable to that held in the US Library of Congress. Even so, GPT-4's performance improvement over GPT-3 is linear, not exponential.
You can see where this is heading. To get another improvement of the same magnitude, GPT-5 will probably have to be trained on something like 10 times the data GPT-4 was trained on and be a 10 times larger model. That will cost in the order of $2 billion or more.
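The cost arithmetic above can be sketched in a few lines of Python. This is only an illustration: the dollar figures are the ones quoted in this column, while the rule that cost grows with parameters times data (so 10 times more of each means 100 times the cost) and the function name `naive_cost` are simplifying assumptions, not a published formula.

```python
# Dollar figures are those quoted in the column; the scaling rule is an assumption.
SCALE = 10  # assumed growth in both parameter count and training data per generation


def naive_cost(previous_cost: float, scale: float = SCALE) -> float:
    """Hypothetical rule: cost grows with parameters x data, i.e. by scale squared."""
    return previous_cost * scale * scale


gpt2_cost = 45_000                       # quoted cost of training GPT-2
gpt3_cost = naive_cost(gpt2_cost)        # 100 times the GPT-2 figure
gpt4_naive = naive_cost(gpt3_cost)       # what the same rule would imply for GPT-4
gpt4_quoted = (50_000_000, 100_000_000)  # estimated range quoted in the column

print(f"GPT-3 (naive rule): ${gpt3_cost:,.0f}")
print(f"GPT-4 (naive rule): ${gpt4_naive:,.0f}")
print(f"GPT-4 (quoted):     ${gpt4_quoted[0]:,.0f} to ${gpt4_quoted[1]:,.0f}")
```

The rule reproduces the jump from GPT-2 to GPT-3, but it lands well above the quoted GPT-4 estimate, so it is best read as a rough guide to how quickly costs escalate rather than as a forecast.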
That is an eye-watering sum, but within the budget ranges of the big tech companies that typically fund these training runs. However, compiling the required amount of quality training data is not a trivial task. Epoch AI projects that the pool of high-quality text data will be exhausted between now and 2026. It estimates that about 10 trillion words of such data are available, an amount akin to what was used to train GPT-4. This data includes sources like books, news articles, scientific papers, Wikipedia and curated web content.
Intellectual inbreeding
Some argue that this view is overly pessimistic. For instance, Anthropic's Claude 3 model partially relies on internally generated data, or synthetic data, for training. Some fear this is a kind of intellectual inbreeding, but Claude 3 works well. Similarly, ChatGPT is creating a substantial amount of data through user interactions, already equivalent to the volume it was trained on. YouTube videos represent another vast, largely untapped data reservoir. However, with heightened awareness of data's value and the flurry of copyright litigation, procuring sufficient data might become a complex and costly endeavour.
No wonder GPT-5 is taking a while to arrive. When it does, I predict it will be substantially better than GPT-4 but not enough to silence the detractors. If the scaling hypothesis remains the only pathway to machine intelligence, by the time we get to GPT-6 we can expect a new set of players to come to the fore: states, with their massive financing abilities. And when GPT-7 comes around, only two will be able to afford it: China and the US.
♦ VWB ♦