Episode 72: A Check-In with Agentic Coding
October 6, 2025
Continuing to explore the changing balance between coder and machine.
Recent tools have changed what it feels like to build software alongside an AI assistant. The progress is hard to ignore, yet the boundaries of what still needs a human touch remain important. This reflection considers how much has shifted since the early experiments, what continues to hold back full automation, and where the most useful middle ground may now be.
Transcript
I've talked on the show before about some of my experiences using AI coding tools, mostly LLM-driven, some agentic, that kind of thing.
So I wanted to check back in on this.
My conclusion last time I talked about this was that the technology really wasn't there. It still left a lot of issues, particularly when things started going awry in whatever project you were working on. It was very bad at recovering from that, and once it started down a bad path, it would lose the thread pretty quickly.
The way I've been approaching all this for the last year or so is that LLMs in particular are a great tool to have at your disposal when you're coding or working on some sort of development project, but with the reservation that you can't just open up a blank page, so to speak, and go to town.
So recently, I had a concise, relatively small from-scratch project, and I thought, let's check in on this again. For some background, I was using Cursor, and what I wanted to do was see if I could prompt the Cursor agent and have it actually build me the full project.
I'd heard from a friend and mentor who has similar points of view on this kind of thing. He was telling me that recently he's been doing more and more with agentic coding, that it is getting better, and that he's actually published a couple of projects built 80 to 90% of the way with agentic AI.
So I wanted to give it another whirl. And not to bury the lede: this stuff is getting better. It is way better than it was even six or eight months ago. I was surprised.
Fast forward: I was able to create this entire project using the Cursor agent, just prompting it along the way. And not only that, it also helped me deploy it to a cloud service, so I was able to go pretty much end to end on this thing.
Now, a few things to bring up that I noticed along the way. Again, my conclusion here is that these tools really have gotten good, but with some caveats.
This was a from-the-ground-up project that was, I don't want to say simple, but definitely limited in scope. It wasn't a huge thing with a million moving pieces and integrations into existing systems and all sorts of other things. It was fairly concise, with some integration work, but not a ton.
When I have tried to do similar things with pre-existing code bases, I still think that the AI struggles.
I often find that in that scenario, I have more luck using the workflow that most people say you shouldn't be using: copying some of your code over into something like ChatGPT, talking it through, and then copying the code back into the editor. For existing projects with fairly large code bases, I still find that to be a better workflow.
But this was not that. This was from scratch, and I just let the thing run with it.
It structured the project. There was a backend, a frontend, and then another frontend, so there were a few pieces to it.
I'm not sure I'm wild about how the AI automatically structured things in terms of directory layout. But at the end of the day, if we end up in a world where no one really needs to touch the code very often, I suppose it doesn't matter much, as long as security is dealt with.
But anyway, there were some things I noticed along the way.
So, here are my caveats to what was overall a very optimistic, really positive experience. I think it went really well, and I'm very happy with it.
The first thing I would say is that we are still at a place where you need some background in coding, as a developer or programmer or something in that realm, to make the highest and best use of these tools.
Sometimes that's simply because there's one line in the code somewhere, and you know from reading it that all you have to do is change one thing and the whole thing will sing, or you want to make some small customization or edit, maybe to a line in a CSS stylesheet. The ability to go in and make those small adjustments yourself, without having to describe them to the AI, is frankly more efficient and will get you where you're trying to go quicker in some cases.
It also really helps to have some coding background when things do start going awry, because you can go look at one portion of the code that was written for you, follow the logic along, and ask: where is this thing going wrong? So those skills are useful to have under your belt.
But I'll tell you, I did a lot less of that this time around than I have at any point over
the last year while I've been messing around with this stuff.
A couple of things I found to watch out for.
First of all, the longer your chat and context for the project become, the more the AI still loses the thread. That's been a problem with LLMs since their inception: the longer your chat history goes, the harder a time the model has following everything and putting it all into practice.
This goes hand in hand, I think, with why this worked so well for a small, concise, limited-scope project. But if I had to build something large, I don't think you'd get all the way through it.
Now, what I want to research or look into a little more is whether there are tricks to help the model digest a longer context better. I think there are some out there, but I just haven't looked into them all that much yet. That's still a missing piece for me.
But again, for a small, concise, limited-scope project, it handled things very well, at least for a long time.
As I got further into the project, there were a couple of weird things that would happen after a while, and you have to watch for them. This is why it's helpful to know a little bit.
This project had a Python component, which meant I had a virtual environment set up for the Python code base. If you're not a Python programmer, that's just a thing you do with Python. At some point along the way, the agent simply lost track of where my virtual environment was, and that I even had one running, and started blanket-installing packages into my global Python environment.
So you still have to watch what it's doing. It didn't happen right away; it kept that context for a long time until it eventually lost it. Then I had to remind it: hey, you're installing stuff into my global Python environment. I'm using a virtual environment. Make sure you use this. And then it went back to it.
But if you had no programming experience whatsoever, you might not necessarily be looking out for
that kind of thing or realize that it's happening.
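As an aside, if you want a quick way to check this yourself, here's a minimal sketch of a venv check in Python. It uses only the standard library: inside a virtual environment, sys.prefix points at the environment directory while sys.base_prefix still points at the base interpreter.

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment; installs would hit the global interpreter.")
```

Running something like that, or just checking which pip is on your path, is a cheap sanity check before letting the agent install anything.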
The other thing I would say is that I find I've got to keep at least a high-level view on security when I'm doing this stuff. Sometimes these things implement some really wacky stuff that would be easy to take advantage of if you deployed it.
So I do a combination of things. First, I watch what it's doing to make sure that if I'm going to pass around API keys or environment variables, they are secured in some way.
But I also ask it every so often throughout the project: hey, take a look at everything and give me a security audit. Then it can look through the code and find its own mistakes. I find I have to prompt it for that step, whereas some other steps it takes on by itself.
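To make that concrete, here's a minimal sketch of the pattern I watch for, with a hypothetical name: MY_SERVICE_API_KEY is just a placeholder. The point is that the secret gets read from the environment at runtime rather than hardcoded into the source, which is exactly the kind of shortcut an agent will sometimes take.

```python
import os

# Hypothetical example: "MY_SERVICE_API_KEY" is a placeholder name.
# The secret is read from the environment at runtime, never committed
# to the code base.
api_key = os.environ.get("MY_SERVICE_API_KEY")
if not api_key:
    raise RuntimeError("MY_SERVICE_API_KEY is not set; refusing to start.")
```

Locally that usually pairs with a .env file that stays out of version control; in the cloud, it's whatever secrets mechanism your service provides.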
So, just something to keep an eye on. But it really is impressively better than it was eight or nine months ago.
If you haven't done anything like this and you have a small project to do from scratch: I haven't used Claude Code much myself, I've been more in the Cursor world, but I have it on good authority that either tool is pretty good at this point. Give it a whirl. It really was rather impressive.
I still don't think we're at a place where this should just be the go-to for deploying production code, especially not for critical systems. But for a little side project? Yeah, I think we're there, or pretty close.
So give it a whirl.
Let me know what you find along the way.