LLM agents have applications ranging from business agents to optimization. Their rise is a watershed moment - even in the context of Deep Learning and transformers. One of the best things about LLM agents is that people with little background in ML can also understand them.
What is an LLM agent?
An LLM agent is a system built on top of a Large Language Model which is designed to perform some specific task(s) and interact with its environment. Here are two things worthy of attention:
- They can perform a specific task: As its name indicates, an agent must perform some specific tasks. For example, it can suggest travel destinations based on parameters like weather and budget or give mental health advice* based on the user’s condition.
- They can interact with the environment: One of the limitations surrounding LLMs was their inaccessibility to the outer world. Not long ago, ChatGPT used to reply with something like, “Sorry, I am updated as of [date] and cannot provide up-to-date information about that.” This is no longer the case - LLM agents are able to interact with external sources like the internet, databases, APIs, or even other LLMs.
Chain of Thoughts reasoning
LLMs are pretty good at language understanding but often struggle with basic questions. They made researchers wonder if LLMs can reason. This curiosity led to an interesting observation: LLMs can reason by breaking down tasks into sub-tasks without requiring explicit fine-tuning if prompted with a chain of thoughts. Surprisingly, just a few prompts are sufficient enough for an LLM to begin CoT reasoning.
![Standard Prompting vs CoT Prompting – CoT prompting allows the LLM to reason. (Credits: CoT Paper)](https://blog.apify.com/content/images/2025/02/Standard-prompting-vs-CoT-prompting.png)
This prompting is quite similar to guiding students to solve a mathematical problem rather than providing only the answer keys. It's remarkable how LLMs (thanks to the vast amount of data they’re trained on) require far fewer prompts than human learners before they can reason about unseen tasks.
![An example of CoT: taking steps to solve the given quadratic equation (Credits: IBM)](https://blog.apify.com/content/images/2025/02/Example-of-CoT-reasoning.png)
ReAct
The fewer prompts required for CoT reasoning are cost-effective, but they may cause hallucinations. To make CoT work as an agent, we need the environment interaction. Combining these reasoning and acting capabilities has led to an advanced type of agent called ReAct (Reasoning and Acting).
![ReAct combines both the reasoning and acting capabilities of LLMs (Source: ReAct paper)](https://blog.apify.com/content/images/2025/02/How-ReAct-works.png)
ReAct works similarly to how we have those “inner thoughts” while performing some task. “Now I’ve covered ReAct a bit; I must move to their benefits part now. Oh wait, I’m yet to conclude this section [starts writing the conclusion]” is an example of the way I just reasoned and acted here.
What can LLM agents do?
LLM agents are quite versatile. They’re able to perform a number of tasks, like:
- User interaction: Chatbots were among the first LLM agents. Beyond them, LLM agents are also capable of interactions, like voice assistants, customer support, or digital tutors.
- Integration: Not long ago, I was wondering if the LLM agent I'm designing would be able to integrate with the existing code. But they surpassed my expectations: LLM agents are capable of integrating with APIs, existing code (I've tested it on Python), databases, and even other LLMs.
- Task automation: This is where LLM agents overlap with general AI agents, as both are capable of automating the desired tasks (with LLM agents having more capabilities).
- Optimization: LLM agents’ capabilities seem to include their use as optimizers in complex problems like the Travelling Salesman Problem (TSP).
Types of LLM agents
LLM agents are a broad and diverse species that can be further categorized into a few types. Here, we're categorizing them based on their functionality.
Agentic AI
While LLM agents are supposed to be truly agentic (fully autonomous), many of them require further prompting or user feedback (like asking the user to pick the most suitable recipe from the suggestions). Agentic AIs, on the other hand, are meant to execute tasks with minimal (ideally no) human input. These agents involve iterative reasoning and external validation. There's also an increase in techniques not requiring any external validation like Universal Self-Consistency.
Multi-modal agents
Agents like O1, Gemini, DeepSeek, and Claude are capable of handling diverse data types. By combining different AI models (trained on images, audio, text, etc), they provide a unified interface for working with different types of data.
Personal assistants
Gemini, Siri, and Alexa are off-the-top-of-the-head examples of personal assistants. But there are many more dedicated PAs too, like Notion AI, WHOOP AI Coach, Cleo AI, etc. In fact, with a little customization, you can build your own personal assistant.
Embodied agents
One of the most exciting applications is embodied agents. LLMs are empowering robots, autonomous-driving cars, smart home systems, drones, and some virtual assistants. Examples include Ameca, Tesla Optimus, and Wayve.
Capabilities of LLM agents
The increasing interest in these agents isn’t overhyped. Here are a few definite reasons for this.
- Automation: Every agent is meant to automate tasks. Be it the classical Royal Mail automated delivery system or Apify Actors, they all are aimed at minimizing human intervention and automating tasks, which is both time and cost-effective.
- Integration: LLM agents are anything but static. They can integrate with APIs, user-defined functions, databases, or even other LLM agents, resulting in a dynamic response.
- Personalization: By integrating with the database, LLM agents can maintain conversation history and provide contextual responses. This is in contrast to the normal or even rule-based AI agents that either don’t maintain memory or have limited memory.
- Natural language understanding: Often, our applications don’t require a hardcore AI task but just an interpretation of the user’s query. This interpreted query can be used in some functions for the ultimate application. A good example is Singapore Air’s Flight Recommender system.
LLM agent frameworks
The rise of LLM agents also means a rise in agent frameworks - both for development and deployment.
Development
LangChain/LangGraph: LangChain and LangGraph are the go-to frameworks for LLM application development. Their vast coverage of functions and the ability to integrate with several other libraries make them the first choice for LLM agent development.
LlamaIndex: LlamaIndex can be the preferred option for RAG-based agents. It also provides the option to host these LLM agents locally, which can be interesting from the data privacy aspect.
Haystack: Originally launched as a normal NLP tasks library, Haystack has quickly adapted to LLM agents. It provides several integrations for different vector stores and models.
AutoGen: AutoGen, developed by Microsoft, uses a layered approach that lets developers use both high-level and low-level APIs. In addition to other benefits like message-passing and external API integration framework, it also supports .Net (yet to launch as of Feb 2025).
SmolAgent: SmolAgent is a simple framework by Hugging Face. The huge range of transformers on Hugging Face means one can use any of those models for developing LLM agents. The real power of SmolAgents lies in the concept of CodeAct, where we provide code snippets instead of Text (or even JSON) for the agent’s reasoning.
![Traditional agents vs CodeAct (Taken from the CodeAct paper [ICML 2025])](https://blog.apify.com/content/images/2025/02/Traditional-agents-vs.-CodeAct.png)
Deployment
Anthropic Claude API: Despite the presence of tech giants like Google, Microsoft, and OpenAI, Anthropic Claude API is still gaining popularity as an LLM agent deployment framework. It (at least theoretically) centers around ethical AI assistants.
Relevance AI: Relevance AI is focused towards “no-coding” agents. These agents can be developed with little (or no) coding and can integrate with some APIs.
Microsoft Copilot Studio: Copilot Studio is especially useful for office productivity. It provides a range of OpenAI models and integrates seamlessly with the Microsoft 365 apps.
LLM agent challenges
Being a rapidly growing area means LLM agents have room for further exploration and face some challenges as well.
Hallucinations
Hallucinations are always a problem with LLM agents. Despite the facility of interaction with the internet, LLM agents often make up their own facts. Fine-tuning them is quite costly but Retrieval Augmented Generation (RAG) can be an effective (both in cost and efficiency) solution. RAG is a huge application where we can use web scraping efficiently if it's combined with some expert feedback/guidance.
Privacy
ChatGPT maintains a history of our conversations (which is true for most agents). While it's helpful in maintaining context, it comes at a cost, as it can expose our sensitive information. Sensitive information goes beyond passwords and credit card details. It may include metadata (like location and search trends) and semi-sensitive information related to business or family.
Jailbreaks
Jailbreaking LLMs has proven to be easier than it seemed at first. With a little tweak in the prompt, one can exploit LLMs for negative purposes. Becuase these agents are available to anyone from school kids to outlaws (mainly via mobile apps), these jailbreaks can share some unwanted information.
![An example of a Jailbreak prompt: with a little tweak we can exploit LLMs (Credits: “Do Anything Now” paper)](https://blog.apify.com/content/images/2025/02/Example-of-a-Jailbreak-prompt.png)
These challenges, especially jailbreaks, remind me of the Radar problem we studied in childhood. As radars kept advancing, so did stealth techniques, leading to advancements in both industries. So, these jailbreaks may sound daunting, but they (their identifications) will help enhance LLM agents’ capabilities even more in the long term.
Conclusion
LLM agents remind me of a friend from my university days who used to play down the need for extensive programming, saying, “Don’t ye worry! Agent systems will soon take over.” The joke was on us, as time has proven how right he was!
The way LLM agents have changed the way we approach not only software applications but also the whole AI/Deep Learning canvas is unbelievable (and the future is even more exciting).
LLM agents are making it possible to automate a range of tasks smartly with minimal setup. I say "smartly" rather than "intelligently," as hallucinations and some other challenges still need to be solved, and they remain one of the researchers’ focus points, too.