Is the LLM Path Flawed? The Scaling Law's End Is Nigh.

Recently, I watched the Silicon Valley 101 interview with Tian Yuandong, former Research Director of FAIR at Meta. The interview gave me a frontline researcher's perspective on the current state of AI and Large Language Models (LLMs). It was professional and candid, with no agenda or bias, and I benefited greatly from it.

After watching, I summarized several insights:

  • Fewer people will engage in fundamental research, while more will focus on applications.
  • The LLM path may not be universally correct, and the scaling law will eventually reach its limits.
  • Technology and business need to be balanced, and the gap between academia and engineering needs to be bridged.
  • Choosing work that you enjoy and that will have market value in the future is crucial.

As model engineering technologies mature, fewer people will be engaged in fundamental model research. This is a common pattern in the technology field: a small number of people work on foundational technologies (such as Linux kernel development, cloud-native infrastructure, or basic framework development), while the vast majority focus on integrating these technologies with business needs for practical implementation.

Current LLMs have parameter counts in the tens to hundreds of billions. A human can take in at most about 10 billion tokens of knowledge in a lifetime, yet models are now trained on the order of ten trillion, or even thirty trillion, tokens. Where does that much high-quality data come from? And whether models trained on such a massive number of tokens can truly reach human-level capability remains an open question.
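To make the gap concrete, here is a quick back-of-the-envelope calculation. The 10-billion-token lifetime figure is the estimate cited above, and the ten-to-thirty-trillion training range is the rough figure mentioned in the text, not an exact count for any particular model:

```python
# Back-of-the-envelope comparison of human vs. LLM data exposure.
# The ~10 billion lifetime-token figure is the estimate cited above;
# the training-corpus sizes are the rough range mentioned in the text,
# not exact numbers for any particular model.

human_lifetime_tokens = 10e9    # ~10 billion tokens over a lifetime
training_tokens_low = 10e12     # ~10 trillion training tokens
training_tokens_high = 30e12    # ~30 trillion training tokens

print(f"low end:  {training_tokens_low / human_lifetime_tokens:,.0f}x a human lifetime")
print(f"high end: {training_tokens_high / human_lifetime_tokens:,.0f}x a human lifetime")
# low end:  1,000x a human lifetime
# high end: 3,000x a human lifetime
```

In other words, a model sees a thousand or more human lifetimes' worth of text, which is exactly why the supply of high-quality data becomes the bottleneck.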

From a business perspective, large-scale data and parameters imply enormous computational demand, and computing power and energy ultimately have hard upper limits. The scaling law indicates that exponential growth in input yields only linear growth in capability, so the input-output ratio is deeply unfavorable. Whether the value generated by models produced at such high cost can exceed the investment is also questionable.
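As a minimal sketch of why that input-output ratio is so unfavorable, the snippet below assumes a Kaplan/Chinchilla-style power law for pretraining loss; the constants `a` and `alpha` are purely illustrative, not fitted to any real model:

```python
# A rough sketch of "exponential input, roughly linear returns", assuming a
# power law for pretraining loss: loss(C) = a * C**(-alpha).
# The constants a and alpha are illustrative, not fitted to any real model.

a, alpha = 10.0, 0.05

def loss(compute):
    """Hypothetical pretraining loss as a power law in total compute C."""
    return a * compute ** (-alpha)

previous = None
for exponent in range(20, 25):       # compute grows 10x at every step
    c = 10.0 ** exponent
    current = loss(c)
    note = "" if previous is None else f"  ({current / previous:.3f}x the previous loss)"
    print(f"compute 1e{exponent}: loss {current:.3f}{note}")
    previous = current
# Every 10x increase in compute multiplies the loss by the same constant
# factor (10**-alpha, ~0.89 here): progress looks linear only on a log
# scale, while the compute and energy bill grows exponentially.
```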

Crucially, current models lack human “insight.” They struggle to offer unique perspectives or innovative ideas when facing problems. As Tian Yuandong put it:

“This kind of high-level human insights, human knowledge, and unique perspectives on a problem, these are things that current models are lacking.”

For companies like Meta, every technological investment aims to generate business value. Meta's relative lag in AI can be partly attributed to pushing for commercialization while the technology was still immature, and to cases of non-experts directing the experts. The cycle from academic research to engineering implementation is often long; patience is key.

It's important to think clearly about what you truly enjoy, what you are able to do, and what creates value. Blindly chasing hot industries is not advisable; what's scorching hot today might be yesterday's news tomorrow, just as iOS development was during the mobile internet boom.

This is my biggest realization recently: as a tech professional, if you want to make a living through technology, that technology must generate commercial value. Of course, one can also pursue technology driven by interest and passion, regardless of its market value. Ultimately, the goal of work is to live a better life.
