Essential insights from Hacker News discussions

Small language models are the future of agentic AI

The Hacker News discussion on Small Language Models (SLMs) vs. Large Language Models (LLMs) in agentic systems reveals several key themes:

The Promise and Pitfalls of SLMs for Specialized Tasks

A central theme is the potential for SLMs to be more efficient and cost-effective for narrowly scoped tasks, particularly in areas like customer service. However, this idea is met with skepticism, with some arguing that existing, simpler automation (like CRUD flows) is already sufficient and that introducing SLMs can be a step backward or an unnecessary layer of complexity.

  • "I don't know whether Amazon relies on LLMs or SLMs for this and for similar interactions, but it makes tons of financial sense to use SLMs for narrowly scoped agents. In use cases like customer service, the intelligence behind LLMs is all wasted on the task the agents are trained for." (bryant)
  • "A CRUD flow is the actual automation, which was already digested into the economy by 2005 or so. PHP is not a guy in the back who types HTML really fast when you click a button :)" (thatjoeoverthr)
  • "Wouldn't surprise me if down the road we start suggesting role-specific SLMs rather than general LLMs as both an ethics- and security-risk mitigation too." (bryant)
  • "A company's perspective? Net positive, they don't need to hire people." (oblio)

The "Enshittification" and Overcomplication of User Experiences

Several users expressed frustration with LLM-driven customer service, highlighting instances of LLMs hallucinating information, performing poorly, or introducing unnecessary complexity compared to traditional, more direct methods. A recurring sentiment is that LLM interfaces can feel like a "smoke bomb" or a way to avoid genuine human interaction and accountability.

  • "The LLM told me what sort of information they need, and what is the process, after which I followed through the whole thing. After I went through the whole thing it reassured me everything is in order, and my request is being processed. For two weeks, nothing happened, I emailed the (human) support staff, and they responded to me, that they can see no such request in their system, turns out the LLM hallucinated the entire customer flow and was just spewing BS at me." (torginus)
  • "There really should be some comeback for this type of enshAItification. We're supposed to think 'oh it's an LLM, well, that's ok then'?" (ttctciyf)
  • "The LLM is a smoke bomb they shot in your face :)" (thatjoeoverthr)
  • "Air Canada famously lost a court case recently (though the actual interaction happened in 2022) where their chat bot promised a discount that they didn't actually offer. They tried to argue that the chatbot was a 'separate legal entity that is responsible for its own actions'!!" (quietbritishjim)
  • "This is reason number two why I always request the service ticket number." (dotancohen)

Economic and Societal Impacts of Automation

The discussion touched upon the broader economic consequences of increased automation, particularly the potential for widespread job displacement and the suffering caused in regions that have already experienced industrial decline. The shift from human-powered services to AI-driven ones is viewed by some as a negative development for employment and societal well-being.

  • "We're going to be so messed up in a decade or so when only 10-20-30% of the population is employable in decent jobs." (oblio)
  • "People keep harping on about people moving on with their lives, but people don't. Many industrial heartlands in the developed world are wastelands compared to what they were: Walloonia in Belgium, Scotland in the UK, the Rust Belt in the US. People don't really move on, they suffer, sometimes for generations." (oblio)
  • "The LLM, here, is the opposite; additional human labor to build the integrations, additional capital for chips, heavy cost of inference, an additional skeuomorphic UI (it self identifies as a chat/texting situation) and your wasted time. I would almost call it 'make work'." (thatjoeoverthr)

The Role of Model Distillation and "Unix-like" Approaches

The concept of distilling larger models into smaller, specialized ones (SLMs) was discussed as a potential way to manage complexity and resources, in line with a "Unix" philosophy of breaking complex tasks into smaller, more manageable components. However, questions arose about whether even specialized models need LLM-level language comprehension, and about the nuance lost in distillation. (A minimal sketch of the distillation step follows the quotes below.)

  • "One could start with a large model for exploration during development, and then distill it down to a small model that covers the variety of the task and fits on a USB drive. E.g. when I use a model for gardening purposes, I could prune knowledge about other topics." (janpmz)
  • "No mention of mixture-of-exports. Seems related. They do list a DeepSeek R1 distillate as an SLM. The introduction starts with sales pitch. And there's a call-to-action at the end. This seems like marketing with source references sprinkled in. That said, I also think the 'Unix' approach to ML is right. We should see more splits, however currently all these tools rely on great language comprehension." (flowerthoughts)
  • "So if all of these agents will need comprehensive language understanding anyway, to be able to communicate with each other, is SLM really better than MoE?" (flowerthoughts)

Technical Challenges and Efficiency Arguments for SLMs

A significant portion of the discussion focused on the technical feasibility and actual efficiency of SLMs, particularly in the context of agentic systems. Critiques were raised regarding:

  • Context Window Limitations: The paper mentions context window limitations in SLMs only in passing, which one commenter called the most "prohibitive technical barrier" to the vision, especially for complex agent tasks that carry substantial meta-information in the prompt. The VRAM needed for larger context windows on consumer hardware was also questioned; a rough KV-cache sizing sketch follows this list.
    • "IMO, the paper commits an omission that undermines the thesis quite a bit: context window limitations are mentioned only once in passing and then completely ignored throughout the analysis of SLM suitability for agentic systems. This is not a minor oversight - it's arguably, in my experience, the most prohibitive technical barrier to this vision." (sReinwald)
  • System-Level Inefficiencies: The paper's simple FLOP comparisons were criticized for ignoring real-world overheads such as retries, task decomposition, and the infrastructure gap between data centers and consumer hardware; once accounted for, these factors may erase the claimed economic advantages of SLMs. A toy cost model illustrating the retry argument also follows the list.
    • "When you account for failed attempts, orchestration overhead and infrastructure efficiency, many 'economical' SLM deployments likely consume more total energy than centralized LLM inference." (sReinwald)
  • Energy Consumption and Hardware Utilization: Debates arose about the energy efficiency of local vs. remote models, with one user emphasizing that only the increase in energy usage on a local machine should be considered, while another argued that the overall efficiency of centralized data centers might still be superior, even factoring in internet packet energy.
    • "My initial gut feeling is that the server will have way better energy efficiency when it comes to the amount of calculations it can do over its lifetime and how much energy it needs over its lifetime. But I would love to see the actual math." (mg)
    • "As the local machine is there anyway, only the increase in energy usage should be considered, while the server only exists for this use case (distributed across all users)." (danhor)

The "Versatility" of LLMs vs. Specialized Models

The inherent versatility of LLMs was contrasted with the development effort and risk of fine-tuning SLMs. Unless a task is both highly specific and performed at massive scale, some users suggested, an off-the-shelf LLM may remain the more practical and less risky choice despite latency and cost, since the rest of the pipeline only needs a prompt and a swappable model behind it (a minimal sketch of that pattern follows the quotes below).

  • "I think that part of the beauty of LLMs is their versatility in so many different scenarios. When I build my agentic pipeline, I can plug in any of the major LLMs, add a prompt to it, and have it go off to do its job. Specialized, fine-tuned models sit somewhere in between LLMs and traditional procedural code. The fine-tuning process takes time and is a risk if it goes wrong." (iagooar)
  • "Sure enough, latency and cost are a thing. But unless you have a very specific task performed at a huge scale, you might be better off using an off-the-shelf LLM." (iagooar)
  • "You can't distill something that's not built yet. The ecosystem is still young. A fine-tuned Qwen3 0.6B model can produce more effective and faster results than a raw Gemma 3 12B model." (umtksa)