My Perspective on GenAI: Part 2

Seven months later, the tools caught up to the vision.


A Lot Has Changed

When I wrote My Perspective on GenAI back in August 2025, I was using AI for about 20% of my output. Ideation, boilerplate, research. Useful but bounded.

Seven months later that number has exploded. I’m building ideas I’ve had for years and never had time for, and I’m making progress on them daily. The tools have matured to the point where experienced users can run multiple workstreams in parallel, asynchronously, from anywhere. My wife recently said, “You’re programming on your phone, you’re programming on your computer. What are you doing?!”

My AI interns and I are trying to take over the world.


My Current Toolbox

My main apps haven’t changed much, but the mix has evolved:

  • Kiro IDE – Spec-driven development. I use the advanced spec workflow to define requirements and let Kiro generate structured plans. In many cases I hand those specs off to other AI agents to execute, but the framework Kiro produces is a huge boon. It forces clarity before code.
  • Kiro CLI – Quick iterations directly from the terminal.
  • Claude Code Web – The biggest addition. This is the game changer.
  • Claude Code – Local agent work when I need it on my machine.
  • Microsoft Copilot – Still my go-to for chat history and mobile accessibility.

The biggest change from seven months ago is Claude Code Web. The productivity gains convinced me to spend $100/month on the Max plan. Why? Because with Claude Code Web I can have agents working completely asynchronously from me. I’m kicking off more than 50% of my tasks from my phone and checking in on them there too.

Claude Code Web session

My joke to my friends is that my phone is my new coffee shop development environment.


My phone showing the apps I use daily: Claude Code, Copilot, AWS Console, and GitHub. More than half my tasks start here.


Async Development Is Harder Than It Sounds

Everywhere I look people make async AI development sound easy. It’s not. Getting to the point where you can kick off a task and walk away takes real upfront investment. It can actually be slower at first until you build a framework.

OpenAI recently published Harness Engineering, a blog post about how their team built a product with zero manually-written code using Codex. I resonated with it a lot. Their core argument is that the engineer’s job shifts from writing code to designing environments, specifying intent, and building feedback loops. That matches my experience exactly.

But here’s the thing – their post is largely theoretical. They describe the philosophy. Let me show you what I actually built.


Building the Harness: A Real Example

I’ve been building a Rust DPDK application. For the unfamiliar, DPDK is a packet processing framework that requires special kernel drivers. This is not a simple web app. You can’t just run cargo test locally and call it a day. The build environment needs specific hardware configurations, kernel modules, and network interfaces.
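To give a sense of what “not a simple web app” means, here is a rough sketch of the host preparation a DPDK test environment needs before the application can even start. The PCI address and hugepage count are placeholders, and the commands assume a standard DPDK install with the vfio-pci driver available; adapt everything to your hardware.

```shell
# Sketch of DPDK host prep (run as root). The PCI address and page count
# below are placeholders -- substitute your own.

# Reserve 2 MB hugepages for DPDK's memory pools.
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

# Load the userspace I/O driver that DPDK binds NICs to.
modprobe vfio-pci

# Detach the target NIC from the kernel and hand it to DPDK.
dpdk-devbind.py --bind=vfio-pci 0000:00:08.0

# Only after all of this can the Rust application (or its
# integration tests) actually run against real packets.
```

None of this exists on a developer laptop, which is exactly why the feedback loop has to live in CI instead.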

So I had to build a feedback loop for the coding agents.

Here’s how it works:

  1. Claude Code Web creates a feature branch, writes code, and runs unit tests.
  2. When it’s done, it pushes to origin.
  3. That triggers GitHub Actions. The CI environments have access to AWS resources and can deploy what’s needed.
  4. Builds, unit tests, integration tests, and performance tests all run.
  5. Here’s the key – the results post back to the Pull Request. Not just pass/fail. Environmental details. Network configuration. dmesg output. Kernel crash logs. Application logs. CloudFormation init logs. Everything.

That rich detail is what makes this work. When an LLM has real diagnostic data to reason about, it can evaluate the code it wrote and fix it. No hallucinations. No guessing.
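As a concrete illustration, step 5 can be as simple as a shell function in the CI job that bundles every captured log into one Markdown comment. The `build_comment` helper and the `artifacts/` layout are my own assumptions for this sketch, not the author’s actual pipeline; `gh pr comment` is the real GitHub CLI command for posting to a PR.

```shell
# Hypothetical CI step: gather diagnostics into a single Markdown
# comment body that a coding agent can read from the PR.

build_comment() {
  # $1 = directory holding captured logs (dmesg, network, app, etc.)
  dir="$1"
  printf '## CI diagnostics\n\n'
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue
    # Collapse each log into an expandable section; ~~~ fences avoid
    # clashing with any backticks inside the logs themselves.
    printf '<details><summary>%s</summary>\n\n~~~\n' "$(basename "$f")"
    cat "$f"
    printf '~~~\n</details>\n\n'
  done
}

# In CI you would capture environment state first, for example:
#   dmesg | tail -n 200 > artifacts/dmesg.log
#   ip addr > artifacts/network.log
# then post the assembled body back to the pull request:
#   build_comment artifacts | gh pr comment "$PR_NUMBER" --body-file -
```

The point is not the plumbing; it is that every diagnostic ends up somewhere the agent will actually look.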


Context Beats Steering Prompts

This is the single biggest lesson I’ve learned since Part 1.

My observation is that when LLMs don’t have concrete context driving their actions, they are much more likely to hallucinate or assume. The number of steering prompts I’ve written around “don’t hallucinate” or “don’t make assumptions” is embarrassing. They didn’t work.

What did work was giving the agents context. Real build output. Real error logs. Real environment state. The harness engineering blog from OpenAI says the same thing, and I agree completely – but the proof is in the implementation.


The Proof: Look at These PRs

If you want to take a step into insanity, look at these two pull requests on my Rust DPDK project:

  • PR #16 – 21 commits, 68 comments
  • PR #18 – 54 commits, 591 comments

591 comments on a single PR. And to be clear – those comments and that detail were not written for humans. They were written for LLMs. The CI feedback, the logs, the environment state – all of it exists so the agents can read it, reason about it, and act on it.

Example PR comment with rich CI feedback

The AI agents just iterated until they succeeded. No random hallucinations. No writing code just to make tests pass. Not a lot of over-steering from me. The rich data from CI gave the agents direction on what to do next, along with the insight they needed to make changes that actually worked.

Was I still in the loop? Yes, but in a very different way. I would open my phone, check what Claude was saying, open the GitHub PR and see what was there. Give it a few nudges. Check in later. It felt like I gave a task to a junior engineer who made consistent progress and got minor guidance from me.

My time was freed up significantly.


The Mindset Shift

Building the harness to get to this point takes time. There’s no shortcut. And as the OpenAI blog states, it requires a mindset shift. You need to focus on architecture and interfaces. If you get bogged down in implementation details, you will slow everything down drastically.

With the current tools, if your interfaces are solid then the cost of a refactor is pretty low. The agents can rewrite implementations all day. What they can’t do is fix a bad architecture – that’s still on you.

This maps directly to what I said in Part 1 about my role in the loop: Supervisor, Architect, Code Reviewer. That hasn’t changed. What’s changed is the leverage those roles now have.


Where I Am Now

All in all, I’m really excited. My productivity is exploding. I’m building all the time – as my wife pointed out. It feels like an arbitrage moment right now: the cost of these tools is low relative to the productive output, if you can build a framework that lets you harness that output.

Pun intended.


What’s Changed Since Part 1

|                     | August 2025               | March 2026                                   |
|---------------------|---------------------------|----------------------------------------------|
| AI share of output  | ~20%                      | Majority of implementation                   |
| Working style       | Synchronous, side-by-side | Async, phone check-ins                       |
| Key bottleneck      | Tool maturity             | Building harnesses and feedback loops        |
| Biggest lesson      | Supervise the AI closely  | Give it context, not steering prompts        |
| Monthly tool spend  | Minimal                   | $100/month (Claude Max)                      |
| Vibe                | Cautiously optimistic     | Building everything I’ve wanted to for years |

Thanks

If you read Part 1 and tried the prompt workflow, I hope it served you well. The game has moved forward. The workflow still matters but the real unlock is building feedback loops that let agents self-correct with real data.

Try it. Build a harness for your project. Start small – even just getting CI to post detailed logs back to a PR changes the dynamic completely.

And if you want more of these updates, subscribe to my newsletter. Up to one thoughtful email per month. No spam.


Part 1: My Perspective on GenAI | dpdk-stdlib-rust repo