Artificial intelligence is changing rapidly, and it is no longer just about chatbots that answer questions. Since ChatGPT arrived at the end of 2022, attention has gradually shifted from chatbots towards action-driven AI agents. Unlike chatbots, which process information and respond in natural language, these agents can execute complex, multi-step tasks with a degree of autonomy. This article explains what AI agents are, how they work, and which types exist. The table below summarizes the main points before we go through them in detail:
Aspect | Description |
---|---|
Difference with chatbots | Chatbots (e.g., ChatGPT, Gemini) are limited to processing information and responding in natural language. AI agents can also invoke tools, keep short- and long-term memory, and carry out multiple steps to complete a task. |
Key technologies | Based on large language models (LLMs) fine-tuned for action, reinforcement learning, and visual language models; they integrate external tools (APIs, functions, GUIs). |
Types of agents | 1. Simple reflex agents: conditional logic without memory. 2. Model-based reflex agents: internal memory and fixed rules. 3. Goal-based agents: plan to achieve specific goals. 4. Utility-based agents: maximize a utility function. 5. Learning agents: improve with experience. |
Notable examples | – Operator (OpenAI): navigates and operates GUIs on the web to shop, book, and fill out forms (requires human supervision). – Deep Research (OpenAI/Gemini): generates detailed reports with citations. – Computer Use (Anthropic): controls a computer via screen vision. – Manus (China): browses the web, executes code, and operates a computer in the cloud. |
What are AI Agents?
The term ‘AI agent’ refers to a software system that uses artificial intelligence to plan, reason, make decisions, and carry out multiple actions in order to achieve goals autonomously. Unlike chatbots, which process information in a closed environment, AI agents interact with external systems to complete their tasks.
Like chatbots, AI agents are built on large language models (LLMs), but the models are tuned to take actions rather than only produce text. Many companies now combine reinforcement learning and advanced reasoning with vision-language models to develop these agents. The agents also integrate external tools such as APIs, functions, and databases to carry out a variety of tasks.
Essentially, an AI agent is more than a model: it is a system that combines a model with tool use, short- and long-term memory, and connections to third-party systems in order to perform specific tasks. A notable example is OpenAI’s Operator agent, designed to interact with graphical interfaces on the web.
This agent can browse the internet, order groceries, fill out forms, and book flights, among many other actions. It uses GPT-4o’s vision capability to analyze screens and determine where to click, but it is not completely autonomous and sometimes requires human supervision to complete tasks.
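The combination of tools, memory, and human supervision described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how Operator or any real product is implemented: the `Agent` class, the `search_flights` and `book_flight` tools, and the `approve` callback are all hypothetical names, and the plan is hard-coded where a real agent would ask an LLM to produce it.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical tools: stand-ins for real APIs an agent could call.
def search_flights(destination: str) -> str:
    return f"Cheapest flight to {destination}: $420"


def book_flight(destination: str) -> str:
    return f"Booked flight to {destination}"


@dataclass
class Agent:
    tools: dict[str, Callable[[str], str]]
    sensitive: set[str]                  # tools that need human approval
    approve: Callable[[str, str], bool]  # human-in-the-loop callback
    memory: list[str] = field(default_factory=list)  # short-term memory

    def plan(self, goal: str) -> list[tuple[str, str]]:
        # Assumption: a real agent would get this plan from an LLM.
        return [("search_flights", goal), ("book_flight", goal)]

    def run(self, goal: str) -> list[str]:
        for tool_name, arg in self.plan(goal):
            # Pause for a human before any sensitive action.
            if tool_name in self.sensitive and not self.approve(tool_name, arg):
                self.memory.append(f"{tool_name}({arg!r}) skipped: not approved")
                continue
            # Call the tool and remember what it returned.
            self.memory.append(self.tools[tool_name](arg))
        return self.memory


agent = Agent(
    tools={"search_flights": search_flights, "book_flight": book_flight},
    sensitive={"book_flight"},
    approve=lambda tool, arg: False,  # simulate a human declining the purchase
)
for line in agent.run("Lisbon"):
    print(line)
```

Here the search runs on its own, while the booking step is skipped because the simulated human declined it. Replacing `approve` with a function that returns `True` lets the whole plan run.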
Types of AI Agents
According to Stuart Russell and Peter Norvig in their book ‘Artificial Intelligence: A Modern Approach’, AI agents are classified into five types: simple reflex agents, model-based reflex agents, goal-based agents, utility-based agents, and learning agents.
Simple reflex agents are the most basic form: they apply conditional (if-then) logic to what they currently perceive. They do not learn or retain any memory of past patterns. By contrast, model-based reflex agents keep an internal memory and build a basic model of the world from what they have perceived so far. For example, a vacuum robot remembers where it has detected obstacles and adjusts its behavior to avoid them.
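The difference between the two reflex types is easiest to see side by side. The sketch below is a toy illustration with hypothetical names (`simple_reflex_vacuum`, `ModelBasedVacuum`) and a made-up percept format, not code from any real robot: the first agent reacts only to the current percept, while the second also remembers obstacles it has already seen.

```python
def simple_reflex_vacuum(percept: dict) -> str:
    """Condition-action rules on the current percept only; no memory."""
    if percept["dirty"]:
        return "suck"
    if percept["obstacle_ahead"]:
        return "turn"
    return "forward"


class ModelBasedVacuum:
    """Same rules, plus an internal map of the obstacles seen so far."""

    def __init__(self) -> None:
        self.known_obstacles: set[tuple[int, int]] = set()

    def act(self, next_cell: tuple[int, int], percept: dict) -> str:
        if percept["obstacle_ahead"]:
            self.known_obstacles.add(next_cell)  # update the internal model
        if percept["dirty"]:
            return "suck"
        # Decide from remembered state, not just the current percept.
        if next_cell in self.known_obstacles:
            return "turn"
        return "forward"


robot = ModelBasedVacuum()
print(robot.act((1, 0), {"dirty": False, "obstacle_ahead": True}))      # turn
print(robot.act((1, 0), {"dirty": False, "obstacle_ahead": False}))     # turn (remembered)
print(simple_reflex_vacuum({"dirty": False, "obstacle_ahead": False}))  # forward
```

On the second call the sensor reports nothing, yet the model-based robot still turns because cell `(1, 0)` is in its internal map. The simple reflex agent, given the same percept, drives forward.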
Goal-based agents are not limited to fixed rules. They are given specific goals, and they plan and reason to find the best way to reach them. A good example is an AI playing chess, which searches through candidate moves and their consequences to find a path to victory.
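Planning toward a goal can be illustrated with a much smaller problem than chess. The sketch below assumes a toy grid world (`.` is free, `#` is a wall) and uses breadth-first search as the planner. Both the grid encoding and the `plan_path` function are illustrative choices, and real agents typically use far more sophisticated search.

```python
from collections import deque
from typing import Optional

Cell = tuple[int, int]


def plan_path(grid: list[str], start: Cell, goal: Cell) -> Optional[list[Cell]]:
    """Breadth-first search: returns a shortest path of cells, or None."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from: dict[Cell, Optional[Cell]] = {start: None}
    while frontier:
        current = frontier.popleft()
        if current == goal:
            # Walk the parent links back to the start to rebuild the plan.
            path = []
            step: Optional[Cell] = current
            while step is not None:
                path.append(step)
                step = came_from[step]
            return path[::-1]
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and nxt not in came_from:
                came_from[nxt] = current
                frontier.append(nxt)
    return None  # the goal cannot be reached


grid = [
    "..#",
    "..#",
    "...",
]
print(plan_path(grid, (0, 0), (2, 2)))
```

The agent does not follow a rule such as "turn at walls". It is told where to end up and works out the sequence of steps itself, returning `None` when no plan exists.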
Utility-based agents go beyond reaching a goal: they use a utility function to score how desirable each outcome is (its “satisfaction” or “happiness”) and choose the action that maximizes it. Finally, learning agents have capabilities similar to the other types but can also acquire new knowledge in an unknown environment and improve over time.
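Both ideas fit in a short sketch. The first function picks the action with the highest expected value of a utility (or reward) function. The class below it is an epsilon-greedy learner, used here as one simple stand-in for a learning agent rather than the only way to build one. All names and numbers (`choose_by_utility`, `LearningAgent`, the routes, the hidden rewards) are made up for illustration.

```python
import random


def choose_by_utility(outcomes: dict[str, list[tuple[float, float]]]) -> str:
    """Pick the action with the highest expected utility.

    `outcomes` maps each action to (probability, utility) pairs.
    """
    def expected(action: str) -> float:
        return sum(p * u for p, u in outcomes[action])

    return max(outcomes, key=expected)


class LearningAgent:
    """Epsilon-greedy learner: estimates each action's value from experience."""

    def __init__(self, actions: list[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.values = {a: 0.0 for a in actions}
        self.counts = {a: 0 for a in actions}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))  # explore
        return max(self.values, key=self.values.get)   # exploit the best estimate

    def learn(self, action: str, reward: float) -> None:
        self.counts[action] += 1
        # Incremental mean of the rewards observed for this action.
        self.values[action] += (reward - self.values[action]) / self.counts[action]


routes = {
    "highway": [(0.8, 10.0), (0.2, -5.0)],  # expected utility 7.0
    "back_road": [(1.0, 6.0)],              # expected utility 6.0
}
print(choose_by_utility(routes))  # highway

learner = LearningAgent(["a", "b"])
hidden_reward = {"a": 1.0, "b": 3.0}  # unknown to the agent
for _ in range(200):
    action = learner.choose()
    learner.learn(action, hidden_reward[action])
print(learner.values)
```

The utility-based agent needs the probabilities and utilities up front. The learning agent starts with no knowledge and refines its estimates only from the rewards it observes.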
Examples of AI Agents
A pioneer in this field is OpenAI’s Operator agent, which performs tasks on the web through a cloud browser. It can order food, find hotels, and buy concert tickets. At the time of writing, it is in research preview and is only available to ChatGPT Pro subscribers, a plan that costs $200 per month.
In addition to Operator, OpenAI has launched the Deep Research agent, which researches a topic in depth and generates comprehensive reports with citations so that the information can be verified. Gemini has its own Deep Research agent, which offers similar services for free.
Anthropic, for its part, has introduced the Computer Use agent, which can operate a computer by visually analyzing the screen. Although it is a bit slow, it fulfills its function. Anthropic’s Model Context Protocol (MCP) standard is also being adopted by companies like Google and Microsoft to connect AI models and agents with external tools and data sources.
Recently, China’s Manus agent went viral. It can browse the web, execute code, and interact with a computer in the cloud. Despite the impressive demonstration, it reportedly runs on Anthropic’s Claude 3.5 model rather than on a model of its own.
Finally, Google is developing Project Mariner, an agent that performs tasks in the Chrome browser, much like Operator. It is currently in testing with selected users.
Although we are in the early days of the era of AI agents, it is clear that the future is heading towards action-driven applications. We have not yet reached a level where AI models can perform critical tasks completely autonomously, and AI companies are incorporating human supervision as part of the process.