Human-in-the-Loop: What It Actually Means

Most people say 'human-in-the-loop' like a reassurance sticker. Here's what it looks like when you actually mean it -- and why it means you don't need the bleeding-edge models everyone is chasing.

"Human-in-the-loop" is a fun phrase. It sounds a lot like staying informed -- "keep me in the loop." And to a degree, it means that in a generative AI context as well, but it goes beyond just knowing what is going on.

As appealing as it may sound to just let the bots make decisions and run without any human intervention, I think it is a trap. And poorly done human-in-the-loop is even more of a trap!

When I use the term, I mean keeping human decision making and influence inside the AI system. At a minimum, that means additional guidance early in the process to make sure it is pointed in the right direction. But being in the loop also means staying on top of the decisions as they are being made.

I had too many early attempts run off the rails to fully trust the AI running by itself. I give it what seems like a clear task, and almost immediately it makes choices I disagree with, which then get compounded by later decisions.

What Can Go Wrong

A big one for me comes up when using the tools to write software. I have strong opinions on what the right tools and approaches are. And most of my projects tend to be early-stage "discovery" projects or internal tools that will forever be small. I don't want the full enterprise software development lifecycle. In fact, the "best practices" that are great when we are scaling a project are antagonistic while we are still finding the problem-solution-market fit.

I want to be flexible to pivot as we get feedback, so I intentionally "break the rules" until we've learned what we need to learn early on. I also structure things in ways that are more LLM-friendly rather than maintenance-friendly. Watch for more on my "extreme prototyping" approach if you're curious about that aspect of the AI tools.

The Evolution of Delegation

The early version of "delegation" to the LLMs was at best "pair programming" -- I let the LLM generate code (or content, when applying the same approach outside of software development) then review it all carefully. Once I accepted the changes, I would give the LLM the next discrete task. I was tracking all the work and making all the decisions at that level of abstraction, while reviewing and approving the more concrete details of what the LLM was producing.
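That review-gated loop can be sketched in a few lines. This is an illustrative sketch, not my actual tooling: `generate_draft` and `ask_human` are hypothetical stand-ins for a real LLM call and a real human review step.

```python
def run_with_review(tasks, generate_draft, ask_human):
    """Run discrete tasks one at a time, gating each result on human approval.

    generate_draft(task)     -> a candidate result (stand-in for an LLM call)
    ask_human(task, draft)   -> True/False (stand-in for the human review step)
    """
    accepted = []
    for task in tasks:
        draft = generate_draft(task)        # the LLM does the concrete work
        while not ask_human(task, draft):   # the human keeps the decision
            draft = generate_draft(task)    # redo until approved
        accepted.append(draft)              # only approved work moves forward
    return accepted
```

The key property is that nothing moves forward without the human's yes/no; the LLM only ever works on one discrete, reviewable task at a time.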

If it weren't for the positive connotations attached to the pair programming analogy, I'd almost describe it as micromanaging the LLM. Technically, it is doing its job, but if you're still there deciding whether the output is any good, you're still half doing the job. And some people think that the review portion of the job is the worst part. It doesn't feel great to hand off the "best" parts of the task only to be left with more of the worst parts. Few people out there would take that offer.

The Real Advantage

The big advantage to this sort of monitored delegation is that you can shift your thinking up a level of abstraction. That means you can stop worrying about the details of how something is happening and make sure the right things are happening.

It also means that if you don't have the brain power for the detailed work, you can still move things forward by giving the yes/no. Sometimes I want to work on a project and don't even have the available mental-emotional-physical bandwidth to make those levels of quality control decisions. I can still answer questions and do the thought partner sorts of interactions. It means that I can engage with the work when I want, almost entirely independent of what my energy level is -- there is always some aspect I can engage with when the LLM is providing some scaffolding.

The Agentic Trap

This is a very different way of interacting with the tools than the more recent possibilities of "agentic" workflows. The advances in state-of-the-art (SOTA) models have unlocked more and more decision making that the LLM can nominally do. I say nominally because it is sleight of hand and anthropomorphizing that make the AI tools look like they "make decisions" and, more to the point, because the average quality of those decisions is still lagging behind the expectations folks seem to have for just handing off large tasks and getting back spectacular results.

One way that the agentic tools improve quality is by looping as they work towards a well-defined outcome. As long as the model can assess how well it is completing a task and is given an opportunity to iterate, it can try different approaches. Generally, this lets you get closer and closer to the desired results -- given enough "creativity" from the model or enough variety when using multiple models.
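Sketched as code, that self-assessment loop looks something like the following. The names (`attempt`, `score`) are hypothetical stand-ins for LLM calls, and the quality threshold and retry cap are assumptions for illustration:

```python
def agentic_loop(task, attempt, score, threshold=0.9, max_tries=5):
    """Iterate on a task until the model's own assessment clears a bar.

    attempt(task, feedback) -> candidate result (stand-in for an LLM call)
    score(candidate)        -> (quality, feedback) (stand-in for self-assessment)
    Returns (best_candidate, succeeded).
    """
    feedback = None
    best, best_quality = None, -1.0
    for _ in range(max_tries):
        candidate = attempt(task, feedback)
        quality, feedback = score(candidate)
        if quality > best_quality:
            best, best_quality = candidate, quality
        if quality >= threshold:   # good enough: stop looping
            return best, True
    return best, False             # metaphorical hands thrown up
```

The loop only converges if the model's self-assessment is actually correlated with quality -- which is exactly the assumption that breaks down in the failure cases described below.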

These SOTA models can now do larger and larger chunks of work before they succeed or throw up their metaphorical hands and say they can't do any more. That sounds great. Then you get this pile of output for that human-in-the-loop to review and approve. We're back to trading one kind of work for another kind that we may appreciate less.

One obvious answer is to have LLMs check the work of the LLMs. So you add another layer of tools to check the output of the first tools. As long as one model can catch the types of issues another model generates, we can make progress this way. If the model isn't up to the task, we are just kicking the can down the road and making the eventual review that much more taxing.

The Compounding Mess

There are hidden decisions from earlier that show up downstream. There are those subtle issues that look okay with a surface review -- which is all it is likely to get here; we were using agents to get rid of work, not create more! There is the work of understanding the problem space before we can commit to a particular solution (see the Double Diamond approach from the design world). All a poor agentic run did was make the eventual task more complicated, possibly more complex, and concentrate the thinking needed to work through the issues while adding lots of potential distractions at the same time.

This is the spiral that relying on the AI tools as human replacements sparks. We need better models that make more and better decisions that let more work happen before the human is involved. And when it fails, it makes a bigger mess than if people were involved all along -- but it can do it faster! ("Sure, we're losing on each sale, but we'll make up for it in volume!")

The Alternative

Another direction is to lean more heavily into the human-in-the-loop approach all along. We certainly can't hand off as many tasks fully, but we avoid the "bankruptcy" that comes from compounding issues stewing in the background until a human gets involved.

From the early days, I have acted as if the LLMs are talented high school students. They can do some amount of work and come back sounding very confident in their results -- whether that confidence is well founded or not! They often don't know what they don't know and leap to premature conclusions. As the models have gotten more capable over the years, I might upgrade them to a talented undergrad student, but I still treat my agentic systems as if they are interns. Keep an eye on their work output, trust but verify, and only give them access to resources you're willing to chalk up to "learning experiences" when things inevitably go wrong. And be pleasantly surprised when things go smoothly, rather than assuming that you can hand your whole operation over now that one thing went well.

The Horse Riding Analogy

To use a horse riding analogy, I prefer to give the LLM its head -- to make choices without me needing to direct every detail -- and redirect quickly when we get off course. I don't just give the horse the reins, expect it to guide itself, and hope for the best.

The fun part is finding what aspects of our tasks can be handed off to what degree and which class of delegation a given task might fall into.

Is this a task that I honestly don't care how it is done, just that it is complete, like delivering messages? Or is this something where I'm still going to go through the details if I handed it to a professional because the consequences are so severe, like preparing taxes? What is the worst that can happen? If the worst is just wasted time, that's a great task to test with the AI. If we're putting someone's safety and security at risk, I'm not ready to just hand that to an AI.
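One way to make that triage concrete is a rough mapping from worst-case outcome to delegation level. The categories and levels here are my own illustrative labels, not a formal framework:

```python
def delegation_level(worst_case):
    """Map a task's worst-case outcome to how much of it to hand off.

    The keys and levels are illustrative labels, not a formal taxonomy.
    Unknown risks default to the most conservative level.
    """
    levels = {
        "wasted_time": "full hand-off, spot-check results",
        "rework_needed": "hand off, but review the output",
        "money_or_reputation": "human approves each decision",
        "safety_or_security": "human does it; AI assists at most",
    }
    return levels.get(worst_case, "human does it; AI assists at most")
```

Message delivery lands in the first bucket; tax preparation lands in the last two.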

Matching the Tool to the Task

A lot of how I use the LLMs now is to support the parts of a task that I dislike, don't naturally do, or where my natural tendencies make the task harder rather than easier. For example, I know I need to get more of my thinking out into the world to help folks like you, but my natural quality control (verging on perfectionism) means that I end up in an editing loop instead of pushing out more content. Right now, I'm having an interactive conversation with an agentic tool that I'm letting make the "good enough" decision for me. I'm intentionally putting as much of myself into this as I can and only letting it do light editing. It is a task that has more benefits from being done at "good enough" levels than potential downsides from it not being "excellent." And if I'm really not happy with it, I can still edit it -- before or after making it public!

Why You Don't Need the Bleeding Edge

When I am using the tools as a thinking partner, I am still there making decisions, so I really don't need the full state-of-the-art model -- or the premium price for that SOTA performance! For many tasks, like this interviewing approach, the cheaper, open models are more than sufficient and can be run for pennies to dollars instead of tens to thousands of dollars. Some of them can even run here on a local machine and only cost me the additional electricity.

So I have found many benefits to staying firmly in the human-in-the-loop camp and being conscious about what tasks I delegate and what parts I retain for myself. You're getting me (and more of me!) with the support of the LLM, not some AI-only slop -- state of the art or not.