Varick Agents: Is AI Overhyped? Last week's reality check.


The Week AI Hype Met Reality


We're actively deploying AI agents for clients who need results, not demos. One project involves automating invoice processing for a 9-figure company, cutting 300+ weekly hours of manual work. Another enhances a sales platform's lead scoring by 40% through better data classification.

The difference is that we focus on what actually works in production, not what sounds impressive in press releases. If your business wants AI that performs rather than just impresses, book a free call at varickagents.com.

Last Week's Reality Check


Everyone expected fireworks. GPT-5 was supposed to change everything. Claude Opus 4.1 would revolutionize coding. The internet was ready for the next AI breakthrough. People on X were genuinely convinced AGI was here.

What we got instead was... mediocre.

GPT-5 launched to collective shoulder shrugs: slow responses, inconsistent outputs, and enough reliability issues to send most developers back to their previous tools within hours. I spent an hour trying to solve a coding problem with GPT-5 in Cursor that Claude handled in five minutes. In the ChatGPT app, with max thinking enabled, it's pretty good; anywhere else, it's very underwhelming.

Claude Opus 4.1 is an incremental improvement at best. Some users are even saying it's worse.

Here's what most people missed last week while complaining about broken workflows: the real innovation happened quietly in Google's labs.

The Actual Breakthrough Hardly Anyone Talked About

Google DeepMind dropped Genie 3 with minimal fanfare. Type "medieval castle with a dragon" and get a fully interactive 3D world you can explore in real-time. Not a static image. Not a video. A playable environment with consistent physics.

This is genuinely a fundamental shift from generating content to generating systems. Think training simulations, rapid prototyping, synthetic datasets for specialized industries.

The military applications alone are endless. Imagine dropping your soldiers into a training environment that mirrors 1:1 what they can expect when deployed. And it's not just military: within 2-3 years, this will forever change how virtual reality and gaming content is generated. While everyone debated GPT-5's personality changes, Google shipped technology that was science fiction just six months ago.

What This Means for Your Workflow

If you're coding: Stick with what works. The best developers I know are still using Claude Sonnet 4.0. New doesn't always mean better, especially when "better" breaks your existing setup.

If you're running a business: Stop chasing every model release. The companies seeing real ROI from AI are the ones using proven tools consistently, not jumping to every shiny new option. If it works, it works.

If you're experimenting: Now's a great time to test open-weight models. OpenAI's gpt-oss models (20B and 120B parameters) run entirely on your infrastructure. Perfect for regulated industries or anyone tired of vendor lock-in. If you've been waiting to use GPT APIs because you don't want to give OpenAI confidential company data, now is your time to shine. If you're interested in doing this for your business, visit varickagents.com.
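For the curious, here's a minimal sketch of what "runs entirely on your infrastructure" looks like in practice. It assumes you've got gpt-oss-20b serving behind an OpenAI-compatible endpoint (e.g. via vLLM or Ollama); the URL and model name are placeholders for whatever your own server exposes:

```python
import json
import urllib.request

# Assumption: gpt-oss-20b is served locally behind an OpenAI-compatible
# chat endpoint. Both values below are placeholders for your setup.
LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "gpt-oss-20b"

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

def ask_local_model(prompt: str) -> str:
    """POST the prompt to the local server; nothing leaves your network."""
    req = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The point is that the request shape is identical to OpenAI's hosted API, so swapping between hosted and self-hosted is mostly a base-URL change.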

How We're Actually Building

My development setup this week:

  • Cursor: Still running Claude 4.0 for consistency

  • Complex problems: GPT-5 Thinking mode for when I'm stuck (it's genuinely smart, just poorly integrated with Cursor). My greatest hack, give it a shot: when Claude in Cursor is stuck on an issue and can't solve it, I ask it to create a markdown file with all the context it would need to solve the problem, everything it has tried, and all the error messages. Then I paste that doc into ChatGPT, and it usually solves it with some back and forth.

  • Agent development: I'm testing Claude Opus 4.1 vs GPT-5 on longer workflows where context matters. GPT-5's API pricing is insanely cheap. So far, I'm impressed.

  • Experimentation: Running gpt-oss-20b for mobile proofs-of-concept.
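That markdown-handoff hack is easy to script if you'd rather assemble the doc yourself. A rough sketch; the section headings and function name are my own convention, not anything Cursor or ChatGPT requires:

```python
def build_handoff_doc(problem: str, attempts: list[str], errors: list[str]) -> str:
    """Assemble the handoff markdown described above: everything a second
    model needs to pick up a problem the first one is stuck on.
    Section names are an arbitrary convention, not a Cursor format."""
    lines = ["# Stuck: handoff context", "", "## Problem", problem, ""]
    lines.append("## What has been tried")
    lines += [f"- {a}" for a in attempts] or ["- (nothing yet)"]
    lines += ["", "## Error messages"]
    lines += [f"- {e}" for e in errors] or ["- (none captured)"]
    return "\n".join(lines)
```

Have the first model fill in the three sections, dump the result to one file, and paste it wherever you want the second opinion.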


The key is testing everything while keeping production stable. New models go into parallel environments first. Migration happens only after measurable improvements in real workflows.
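In practice, "parallel environments first" can be as simple as shadow testing: run the candidate model on the same inputs as production, score both, and only migrate on a measurable win. A toy sketch; the 5% threshold is an arbitrary example, and `metric` stands in for whatever you actually measure in your workflow:

```python
def shadow_test(prompts, prod_model, candidate_model, metric):
    """Run the candidate in parallel with production on the same inputs.
    Production answers are what users see; candidate outputs are only
    scored. `metric(prompt, output)` returns a number, higher is better."""
    prod_scores, cand_scores = [], []
    for p in prompts:
        prod_out = prod_model(p)        # served to users
        cand_out = candidate_model(p)   # logged and scored, never served
        prod_scores.append(metric(p, prod_out))
        cand_scores.append(metric(p, cand_out))
    prod_avg = sum(prod_scores) / len(prod_scores)
    cand_avg = sum(cand_scores) / len(cand_scores)
    # Migrate only on a measurable improvement (5% here, pick your own bar).
    return {"prod": prod_avg, "candidate": cand_avg,
            "migrate": cand_avg > prod_avg * 1.05}
```

The models here are just callables, so you can plug in any API client, and the decision to migrate comes out of data rather than launch-day hype.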


Bottom Line


The best AI implementations I'm seeing aren't using the latest models. They're using the right models consistently.


That 9-figure company saving 300+ hours weekly? They're running on GPT-4o. We haven't shifted them to GPT-5, and honestly don't plan to yet; so far there hasn't been a need that justifies the jump. The sales platform with the 40% lead-scoring improvement? Same story.


Innovation isn't about chasing releases. It's about identifying problems worth solving and using whatever tools actually solve them. Focus on results, not reviews. Test new models, but don't bet production systems on marketing promises.


The harsh truth is that the technology you need to 10x your business already exists. The gap isn't in the models, it's in the implementation. The companies succeeding with AI right now have clear problems, disciplined approaches, and teams that prioritize shipping over experimenting. If that sounds like your business and you're ready to move beyond the hype cycle, let's build something that actually works. Click Hire Varick Agents below.


As always, thanks for reading. If you have a topic you want covered next week, leave a reply either here or on Twitter. We'll see you in the next one.