In the midst of all the DeepSeek hype, the launch of GPT Operator went almost unnoticed. It was supposed to be OpenAI’s long-awaited answer to the biggest AI trend of the year: AI agents. But how does it actually work, and does it live up to expectations? I experimented with Operator and share my experiences in this article.
Back in 1950, Norbert Wiener, the father of cybernetics, wrote in his book The Human Use of Human Beings that automatic machines would one day be able to take over human work. He warned that technology would automate not just physical labor, but also mental tasks.
At the time, this sounded like science fiction—but Wiener already foresaw that machines would become smarter than people expected. As he wrote:
“The automatic machine, when used for production, competes with human labor not on the basis of man’s muscle power, but on the basis of his intelligence.” – Norbert Wiener
AI agents have long been a dream scenario, but GPT Operator feels like a serious step forward. The technology is powered by a new model—the Computer-Using Agent (CUA)—which combines vision and reasoning. And yes, it’s available only to the happy few with a $200/month Pro subscription. But just how smart is this AI? Last weekend, I got to play with Operator via a client in the US. I put it through its paces—and let’s just say the results were… surprising.
Operator is not your typical chatbot. Unlike ChatGPT or Gemini, this tool can actually view web pages, click buttons, type in forms, and complete tasks. In theory, it means you can say: “Hey Operator, book a table for two in Eindhoven,” and boom—it’s done.
But how autonomous is it really? Operator doesn’t use traditional APIs—it uses a built-in browser to visually interpret and interact with websites like a human would. It can collect data, complete tasks, and even work with platforms like OpenTable. Still, I ran into a few limitations along the way.
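To make the idea of "looking at a page and acting on it" concrete, here is a minimal sketch of an observe, decide, act loop. It is not how Operator is implemented: FakeBrowser, Action, decide, and run_agent are all hypothetical names I made up, the "screenshot" is just a text label, and the toy decide function stands in for the step where a real agent would consult a vision model.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    target: str = ""   # label of the on-screen element (hypothetical)
    text: str = ""     # text to type, if any


@dataclass
class FakeBrowser:
    """Stand-in for a real browser; it only records what the agent does."""
    screen: str = "search page"
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would get pixels here; this toy returns a text label.
        return self.screen

    def apply(self, action: Action) -> None:
        self.log.append((action.kind, action.target, action.text))
        if action.kind == "type":
            self.screen = "search box filled"
        elif action.kind == "click":
            self.screen = "results page"


def decide(observation: str, goal: str) -> Action:
    """Toy policy. A real agent would ask a vision model at this step."""
    if observation == "search page":
        return Action("type", "search box", goal)
    if observation == "search box filled":
        return Action("click", "search button")
    return Action("done")


def run_agent(browser: FakeBrowser, goal: str, max_steps: int = 10) -> int:
    """Observe, decide, act, and repeat until done or out of steps."""
    for step in range(max_steps):
        action = decide(browser.screenshot(), goal)
        if action.kind == "done":
            return step
        browser.apply(action)
    return max_steps


browser = FakeBrowser()
steps = run_agent(browser, "restaurants in Eindhoven")
print(steps, browser.log)
```

Even in this toy version, every action costs a full round trip of observing and deciding, which hints at why an agent that works through the screen is slower than one that calls an API.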
I gave Operator several tasks to test its capabilities and see whether it could really make a difference in daily life.
Teaching GPT Operator to Make a Reservation
Lately, I keep forgetting to book restaurants when meeting clients or friends. That’s becoming a bigger problem now that restaurants are often fully booked. So I gave Operator a task:
“Book a table for two in Eindhoven at restaurant X (name not relevant), Friday night at 7 PM.”
Operator enthusiastically opened the restaurant’s site via OpenTable (a platform most places I visit use). But it quickly ran into problems with the dynamic interface.
- No login prompt: Operator didn’t ask for my login details, and therefore got stuck at the reservation page. Without logging in, it couldn’t complete the booking.
- Wrong selection: Instead of checking availability, it stayed on the homepage and selected random options without showing real-time slots.
- No flexibility: When my first choice (7 PM) wasn’t available, Operator didn’t suggest alternatives. A human would immediately try a different time or restaurant—but Operator just gave up.
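The missing flexibility is the easiest of these to picture in code. Below is a small sketch of the fallback I would expect: when the requested time is taken, propose the closest open slots instead of giving up. The function name nearest_slots, the 90-minute window, and the availability data are my own assumptions for illustration; nothing here comes from Operator or OpenTable.

```python
from datetime import datetime, timedelta


def nearest_slots(requested, available, window_minutes=90, limit=3):
    """Return available slots within the window, closest to the request first."""
    window = timedelta(minutes=window_minutes)
    nearby = [slot for slot in available if abs(slot - requested) <= window]
    # Sort by distance from the requested time; ties go to the earlier slot.
    return sorted(nearby, key=lambda slot: (abs(slot - requested), slot))[:limit]


# Made-up availability for a Friday evening (illustrative data only).
requested = datetime(2025, 1, 31, 19, 0)
available = [
    datetime(2025, 1, 31, 17, 30),
    datetime(2025, 1, 31, 18, 30),
    datetime(2025, 1, 31, 20, 15),
    datetime(2025, 1, 31, 21, 45),
]

for slot in nearest_slots(requested, available):
    print(slot.strftime("%H:%M"))
```

With this sample data the suggestions are 18:30, 20:15, and 17:30, in that order. The 21:45 slot is dropped because it falls outside the window.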
After 10 minutes, I had to take over and book it myself. If I hadn’t, I would’ve been stuck without a table again.
Automating Simple Intern Work
After the restaurant test, I tried something more work-related:
“Find 20 popular crypto influencers on YouTube, collect their LinkedIn profiles and email addresses, and put it all into an Excel sheet.”
The first few minutes were genuinely impressive. Operator opened a browser, searched for finance influencers, and started collecting info. But soon, the issues began:
- Poor search strategy: Instead of searching YouTube directly, it used Bing as the primary source—leading to irrelevant or outdated results. A human would obviously start on YouTube itself, where bios and contact links are listed. Operator didn’t.
- Hallucinations: Operator started inventing LinkedIn profiles and email addresses. Some contact details were completely fictional and didn’t exist anywhere online. If I had blindly used this data, I would’ve ended up with a long list of useless—or even damaging—leads.
- Speed issues: Scrolling, clicking, and typing took several seconds per action. After 20 minutes, it had only found 10 influencers—and much of the data was incorrect. A manual search would’ve been faster and far more accurate.
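The hallucinated contact details are the most dangerous of these problems, and also the one a simple check can catch. As a rough sketch, and assuming you keep the text of the page each lead supposedly came from, you can refuse any email address or LinkedIn URL that does not literally appear in that text. The names is_grounded and review_lead, the email pattern, and the sample data are all invented for this example.

```python
import re

# Deliberately simple pattern; it checks shape only, not whether the address exists.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_grounded(value: str, source_text: str) -> bool:
    """True if the value appears verbatim (case-insensitive) in the source text."""
    return bool(value) and value.lower() in source_text.lower()


def review_lead(lead: dict, source_text: str) -> list:
    """Return a list of problems found in one collected lead."""
    problems = []
    email = lead.get("email", "")
    if not EMAIL_PATTERN.fullmatch(email):
        problems.append("email is malformed")
    elif not is_grounded(email, source_text):
        problems.append("email not found on source page")
    linkedin = lead.get("linkedin", "")
    if not is_grounded(linkedin, source_text):
        problems.append("LinkedIn URL not found on source page")
    return problems


# Hypothetical page text and leads, invented for illustration.
page = "Business inquiries: hello@example.com | linkedin.com/in/example-creator"
good = {"email": "hello@example.com", "linkedin": "linkedin.com/in/example-creator"}
bad = {"email": "ceo@example.org", "linkedin": "linkedin.com/in/made-up"}

print(review_lead(good, page))
print(review_lead(bad, page))
```

A check like this does not prove that a contact is real, but it would have flagged the entries that existed nowhere online before they ended up in my sheet.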
In short: if Operator were an intern, I’d thank them politely… and never hire them again.
Operator as a Personal Shopper
Next, I tested something that often takes up unnecessary time: online shopping for basic things. So I gave Operator this task:
“Order a pack of coffee and a USB-C to USB cable from a major Dutch webshop.”
At first, things went well. Operator searched for the products, added them to the cart, and went to the checkout page. Then came the issues:
- No payment handling: Operator couldn’t process payment or ask me to step in. So the order remained incomplete.
- Wrong product match: It selected a USB-C cable, even though I had specifically asked for a USB-C to USB cable.
- Ignored error messages: When a product was out of stock, Operator didn’t try alternatives. A human would intuitively pick another brand or size—but Operator just stopped.
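The wrong match and the out-of-stock dead end are two sides of the same selection step. A minimal sketch of what I had hoped for: only accept a product whose title contains every required term, skip anything out of stock, and hand the decision back to the user when nothing fits. The function pick_product and the catalogue entries are hypothetical, and reading "USB" as USB-A is my own assumption for the example.

```python
def pick_product(candidates, required_terms):
    """Return the first in-stock candidate whose title contains every required term."""
    for product in candidates:
        title = product["title"].lower()
        if product["in_stock"] and all(term.lower() in title for term in required_terms):
            return product
    return None  # nothing suitable: the caller should ask the user


# Invented catalogue entries, for illustration only.
cables = [
    {"title": "USB-C cable 1 m", "in_stock": True},
    {"title": "USB-C to USB-A cable 1 m", "in_stock": False},
    {"title": "USB-C to USB-A cable 2 m", "in_stock": True},
]

# Assumption: the "USB" end of the requested cable means USB-A.
choice = pick_product(cables, ["usb-c", "usb-a"])
print(choice["title"] if choice else "No match, ask the user")
```

Here the plain USB-C cable is rejected, the out-of-stock cable is skipped, and the 2 m version is chosen as the alternative.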
The result: a half-filled cart and a purchase I still had to complete manually.
Booking Flights at Lightning Speed?
Lastly, I tried the example OpenAI itself often gives: booking a flight. I travel frequently, so I was hopeful. But again, it fell short.
It did, however, show me what Operator is good at: handling simple, repetitive tasks—like placing the same weekly order from the same supplier.
But anyone who has booked a flight knows how many steps and choices are involved, and how useful it is to check whether flights are cheaper a few hours earlier. Then there’s seat selection (which varies across planes), meal preferences, luggage options, you name it.
Despite its shortcomings, I still believe Operator has real potential. This is only the first version, and OpenAI will undoubtedly improve its speed and accuracy. Just compare the first version of GPT to what we have today.
Affordable models like DeepSeek could also make this technology more accessible. Other players like Google (with Project Mariner) and Anthropic (with the computer use capability in Claude) are working on similar systems. That competition means we’ll likely see even more powerful AI agents soon.
For now? It’s an impressive demo, but not a game changer. My job is safe… for now. But ask me again in a year.