Blog Image
Introduction

I spent 3 years at Meta as a software engineer. The systems I worked on processed billions of transactions, served millions of users, and generated hundreds of millions in revenue. Yet even the best engineers could have 10x’d their work if they had built reliable agents as opposed to babysitting automations.

My goal for this article is to help you understand agents better, no matter if you’re a beginner, an expert, or anything in between. (as someone who’s running a $3M company building them)

What are agents?

An agent is a system that takes actions on your behalf based on the goals you give it, not instructions. The difference is important, as it can be the difference between a simple trigger, or an understanding of an entire workflow. E.g: you run a script that says “send this email” and it does only that. Whereas if you were to tell an agent: "make sure this customer gets a response within 4 hours," it figures out what needs to happen, which is to check if someone already responded, draft a reply if not, escalate if the issue is complex, and verify the customer actually received it.

  1. The agent loop works as follows:

  2. Observes the current state (e.g: read emails, check databases)

  3. Decides what action to take based on the goal

  4. Takes the action (sends message, triggers workflow)

  5. Observes the result

  6. Repeats until the goal is met or it hits a stopping condition (outlined by you)

This is different from traditional automation in one critical way: agents handle exceptions. When something unexpected happens, agents route to a human, log what went wrong, and wait for guidance. The bonus is that they also learn from that guidance for next time.

The Three Components Every Agent Needs

Every production agent system has the same three pieces. Skip any of them and your agent either can't do useful work or creates more problems than it solves.

Perception: This is how the agent sees the world, which is usually a combination of APIs, databases, and document stores.

Decision Logic: This is how the agent chooses what to do next. Production agents use structured decision trees for routine cases and only invoke the model for ambiguous situations

Action Interface: This is how the agent affects the world. Every action needs to be logged, reversible where possible, and gated by permissions. Ideally include reasoning where possible.

These three components form a loop - the agent observes through its perception layer, decides through its logic layer, acts through its interface, then observes the result of that action. Repeat until done.

Tools Are How Agents Take Actions

Tools are functions that the agent can call. Each tool does one thing: send an email, query a database, create a calendar event, etc..

Important to note that the model doesn't execute tools directly. It returns a structured request: "I want to call the send_email tool with these parameters." Your orchestration layer then validates the request (does the agent have permission? are the parameters valid?), executes the function, captures the result, and feeds it back to the agent.

Agents aren’t aware of how they send emails, but they do know what tools to use, with what input and what output is expected. Good tool design is about boundaries - one tool should do one thing, have clear success and failure states and return structured data for the agent to reason through. If a tool sometimes works and sometimes fails silently, the agent can't learn from it.

Context Windows & Memory

Agents need memory because they operate over time. Often a conversation with an agent might span several hours, and the workflow would involve over a dozen steps. This means that the agent needs to remember what it's already done and what it’s trying to accomplish.

As you probably know, context windows are limited, even with 200K tokens, you can't fit an entire project history into every request. This is where external memory comes in.

External memory is just files or databases the agent can read and write. When an agent completes a task, it writes a summary to a log file, which keeps the active context window small while giving the agent access to everything it might need.

The pattern that I’ve seen work most often: use the context window for active work (current task) and external memory for history (completed tasks). The agent decides what to load from memory based on the current goal.

For production systems, memory also serves as an audit trail. Every decision the agent has ever made gets logged with context. This means that when something goes wrong, you can trace back through the memory to see where the agent's reasoning broke down, and fix your agent.

Planning vs Execution

Most agent failures happen because people skip the planning step, which is arguably the most important step. For anything non-trivial planning should be the first thing you do, because this is where the agent breaks down a goal into steps, and identifies dependencies. A human would then review the plan and correct any misunderstandings (and approve execution).

Here’s an example.

Goal: "Migrate the customer data from the old system to the new one." Outcome:

Without planning:

  • Agent starts writing migration scripts without understanding the schema differences

  • Misses edge cases in the data that require manual review

  • Executes the migration during business hours when customers are active

With planning:

  • Agent performs schema comparison (old vs new)

  • Agent performs volume analysis (how long will this take, when should we run it)

  • Agent identifies edge cases successfully (records that don't fit the new schema)

  • Agent verifies approach (how do we confirm nothing was lost)

  • Agent has a rollback plan

A human would then review this plan, catch the parts the agent got wrong, and approve execution. Just remember that planning is key

Handling Failures and Exceptions

As good as agents are, they often fail. But you can reduce the rate of failure for an agent by having the right failure modes in place.

The 3 best ones are:

Retry with backoff: For transient failures (network timeout, rate limit), the agent should retry with exponential backoff.

Human-in-the-loop: For decisions the agent isn't confident about, it should stop and ask a human.

Safe failures: The agent should always fail safely, which means never deleting old data.

The key in understanding failures is making them observable, which means asking yourself what the agent was trying to do and what went wrong.

Guardrails and Permissions

Guardrails are hard limits the agent can't bypass. This could be prohibited actions (never delete data) or enforcing rate limits

Permissions are role-based access controls. This could be an agent that can read customer data, but not modify it. Or an agent that can create calendar events, but not delete them. The agent doesn’t know about the latter constraint though. It just tries to take actions, and the orchestration layer enforces the rules. So if the agent tries to do something it’s not allowed to do, the action gets blocked.

Building Your First Agent

Start with a problem that's well-defined, with clear success criteria. This is similar to prompt engineering, where you don’t want to be vague.

The steps to build your first agent:

Define the goal in specific terms → List every piece of information the agent needs → Write actual functions + build the tools the agent will use → test independently → write decision logic → wrap in orchestration layer → test with real data → add guardrails based on what broke → add rate limits → add validation

Congrats, you just built your first agent! Side note: your agent will be narrow, which is good, as you want to get one thing working reliably before expanding scope.

When to Use Agents vs When Not To

Just like AI isn’t the answer to every problem, neither are agents. They're good for a specific class of work, like repetitive decisions that require some judgment but follow learnable patterns.

The best agent deployments we’ve done at Varick all share one pattern: they handle the 80% of cases that are straightforward, and they route the 20% that are actually complex to humans who can apply judgment. The goal is to free the expertise of humans to focus on the problems that actually require it.

What Comes Next

Since you now understand what agents are, and how they work, you can go build one for yourself.

The next level of applying all of this information to real life is understanding composition: how do you orchestrate multiple agents together / how do you build agent hierarchies and so on. If there’s enough demand, I’ll work on agents 102.

For now, take the concepts here and build something. Pick one repetitive task in your work or business, break it down into perception/decision/action, build the tools it needs, and see if you can get an agent to handle it reliably. You'll learn more from one real implementation than from reading ten more articles.

TLDR:

Agents take actions based on goals.

Every agent needs three components: perception (how it sees the world), decision logic (how it chooses actions), and action interfaces (how it affects the world).

Utilize tools, which are functions the agents call to take actions.

Always plan, since complex tasks help the agent break down the goal and identify dependencies.

Establish guardrails and permissions - these should be impossible to bypass

Build one agent that does one thing reliably before expanding scope.

Use agents for specific tasks.

If you’re a business owner thinking about implementing AI for your business, just book a free AI Audit with my company Varick Agents, and we’ll explore whether or not AI is a good fit for you + what a potential AI transformation would look like: book today at varickagents.com.

And if you want more tips on how to get the most out of AI for yourself or your business, subscribe to our free weekly newsletter: varickagents.com/newsletter