As inflation rises, some businesses consider replacing demanding remote freelancers with AI agents. However, new research indicates that this approach may lead to unexpected challenges.
Research Highlights AI’s Limitations
A study by the nonprofit Center for AI Safety (CAIS) and data annotation firm Scale AI reveals how AI models aimed at automating tasks often underperform compared to their human counterparts. The research assessed the productivity of six top AI agents in simulated freelance tasks.
Results Show Low Efficiency
The findings were alarming: none of the AI agents completed more than 3 percent of the work, earning only $1,810 out of a potential $143,991. According to CAIS director Dan Hendrycks, this hard data sheds light on the actual capabilities of AI in the workforce.
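To put the two dollar figures in proportion, the short Python sketch below computes the share of available project value that was actually earned. It takes both figures at face value as quoted above; the function and variable names are illustrative and not part of the study.

```python
# Dollar figures as quoted in the article; treated here as given.
EARNED_USD = 1_810
AVAILABLE_USD = 143_991


def share_completed(earned: float, available: float) -> float:
    """Return earned value as a percentage of the available value."""
    if available <= 0:
        raise ValueError("available must be positive")
    return 100.0 * earned / available


share = share_completed(EARNED_USD, AVAILABLE_USD)
print(f"{share:.2f}% of the available project value was earned")
```

The result, roughly 1.26 percent, is consistent with the statement that no agent exceeded 3 percent.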
Benchmarking AI Performance
To frame their study, the researchers created the Remote Labor Index (RLI), an evaluation tool designed to measure how well AI can handle real-world remote projects across various sectors, including game development and data analysis.
Surprising Performance Rankings
The results ranked the AI agents by performance. The top performer, an agent from the Chinese startup Manus, achieved an automation rate of just 2.5 percent. Close behind were Grok 4, from Elon Musk’s xAI, and Anthropic’s Claude Sonnet 4.5, each at 2.1 percent. OpenAI’s GPT-5, despite the company’s claims of advanced intelligence, reached only 1.7 percent, while ChatGPT Agent performed even worse at 1.3 percent. The least effective was Google’s Gemini 2.5 Pro, with a dismal 0.8 percent.
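For readers who want the ranking at a glance, here is a minimal Python sketch that sorts the six reported automation rates from highest to lowest. The percentages are the ones quoted above; the data structure and variable names are illustrative assumptions, not anything published with the study.

```python
# Automation rates (percent) as reported in the article.
rates = {
    "Manus": 2.5,
    "Grok 4": 2.1,
    "Claude Sonnet 4.5": 2.1,
    "GPT-5": 1.7,
    "ChatGPT Agent": 1.3,
    "Gemini 2.5 Pro": 0.8,
}

# Sort from highest to lowest rate; agents with equal rates keep their listed order.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)

for position, (agent, rate) in enumerate(ranked, start=1):
    print(f"{position}. {agent:<18} {rate:.1f}%")
```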
The Reality of Automation
The tech industry has been heavily investing in AI, desperately seeking to leverage AI agents for increased productivity. However, many companies that replaced workers with AI found themselves needing to rehire as they discovered that AI tools often fail to deliver quality work.
The Cost of Low-Quality Work
Numerous studies underscore the pitfalls of implementing AI in the workplace. An MIT study revealed that 95 percent of companies that experimented with AI reported no substantial revenue growth. Furthermore, AI-driven processes often produced low-quality work, creating frustration among employees tasked with correcting errors.
Ongoing Challenges in AI Development
Despite swift advancements in AI technology, significant shortcomings remain. Hendrycks emphasizes that AI agents struggle with long-term memory and continuous learning, unlike human workers who acquire skills through experience. Even so, the trend of workforce automation shows no signs of halting.
The Challenges of Replacing Freelancers with AI Agents
As inflation rises, many employers are facing increasing demands from their remote freelancers for higher pay. This has led some organizations to consider replacing human workers with AI agents. However, recent research suggests that this may not yield the desired productivity gains.
AI Performance Compared to Human Freelancers
A recent study conducted by the Center for AI Safety (CAIS) and Scale AI reveals a concerning disparity in productivity between AI agents and human freelancers. The findings demonstrate that AI models, designed to automate tasks or even entire job functions, often fall short of expectations.
Insights from the Research
The researchers evaluated six leading AI models using a benchmark they developed called the Remote Labor Index. This index comprised a diverse array of real-world remote projects specifically designed to assess each AI’s ability to perform economically valuable work across different sectors, including game development and data analysis.
Staggering Results
The results of the study were startling. Not a single AI agent managed to complete more than 3 percent of the tasks assigned to it. Of the $143,991 worth of potential work available, the AI agents completed only $1,810 worth. This stark underperformance raises important questions about the capabilities of current AI technology.
Top Performers and Their Shortcomings
The research identified the AI agent from the Chinese startup Manus as the top performer, with an automation rate of just 2.5 percent. Following closely were Grok 4, from Elon Musk’s xAI, and Anthropic’s Claude Sonnet 4.5, both clocking in at 2.1 percent. OpenAI’s GPT-5, despite claims of its advanced capabilities, managed only 1.7 percent, and ChatGPT Agent reached 1.3 percent. Lowest of all was Google’s Gemini 2.5 Pro, which posted a meager 0.8 percent success rate, raising red flags about its reliability.
The Economic Implications of AI Automation
Employers have been eager to integrate AI into their operations, hoping it will streamline workflows and reduce reliance on human labor. Yet, the reality is that many companies that have transitioned to AI report little to no economic growth. A significant number of firms that experimented with AI initiatives found that their revenue remained stagnant, highlighting the challenges of relying solely on AI for productivity improvements.
Understanding the Limitations of AI Technology
The limitations of AI agents are clear. They lack the long-term memory and continual learning abilities that humans possess, which are essential for adapting to new skills and situations in the workplace. This raises critical questions about their effectiveness in replacing human workers, especially in creative or complex tasks that require nuanced understanding and expertise.
The Future of Work
As discussions about AI and its impact on jobs continue, the reality remains that many businesses are grappling with the repercussions of prematurely adopting AI technology. Anecdotal evidence suggests that organizations rushing to replace human workers with AI have often had to rehire them after realizing that AI couldn’t meet their operational needs. The ongoing evolution of AI may redefine the workplace, but for now, it appears that human talent remains irreplaceable.

