How Our Dev Team Actually Ships with AI in Early 2026

The Headway dev team talks AI tools, workflows, and real tradeoffs behind shipping production software, from Claude Code and TideWave to inheriting vibe-coded apps.

Host
Jon Kinney
Partner & CTO
Host
Dan Diemer
Mobile Lead
Host
Tim Gremore
Development Lead
Host
Jacob Miller
Marketing & Brand Manager
Transcript

Jon Kinney: I've seen some discourse around AI-generated slop and how all these SaaS platforms are just degrading and whatnot. I think yes, that is true, but more than one thing can be true at once, right? I think some people, us included, are writing some of the best, most secure, most performant code they've ever written, because it is easier to generate that code.

And when you do know what you're doing and the bottleneck isn't typing each keystroke, you really can orchestrate at a much higher level. Hello and welcome to Even Keel, the podcast about the craft of creating software and building effective development teams. I'm Jon Kinney, founding partner and CTO at Headway, a product design and development agency.

For more than a decade, we've been helping startups and enterprise teams build better products faster, from launching new products, adding new features, and building powerful design systems, to transforming the way their teams operate. Every discussion we share is a window into what we've learned, how our teams are leveraging new tools, thoughtful perspectives on the latest trends in AI development, and more.

In today's episode, we talk about how we're helping clients leverage AI, from early ChatGPT integrations to today's frontier models. We dig into the work Tim and Dan have been doing with design systems and why giving AI agents better context makes a huge difference. We also share what's working and what's not with tools like Claude Code, Cursor, and Windsurf, and I walk through an internal tool I've been building to streamline our estimation and sales process.

Enjoy.

Jacob Miller: How do we help an organization leverage AI? What does that look like right now? How do those conversations begin? Is it usually, hey, we have this idea, or we don't even know what to do? What are you guys doing right now with other clients? However you want to take it, I guess.

Jon Kinney: It's progressed so rapidly over even the last year.

I think one of our first uses in the wild was just exposing ChatGPT, so to speak, to the end users with a prompt that would guide them toward sport-specific questions. This was a baseball app that would help advanced teams learn their playbook more effectively. And you could just ask it questions if you were maybe on the edge of being an advanced team, but your coaching experience wasn't quite there. You could say, hey, I've got a first-and-third situation, what do I do for defense here that's most effective? And it would know, based on the system prompt that we gave it, sort of embedded into the deployed application, that it should help you understand the options.

And if you gave it more context, like, it's a traveling 12-year-old team or whatever, it could respond with some level of discernment based on the training data that was inherently available to it. This wasn't super advanced. We weren't doing ML or human-in-the-loop or anything, just kind of an off-the-shelf Claude Sonnet type integration.

And it would be generally good at suggesting what you should do in those particular situations as an up-and-coming baseball coach. Or if you were a college coach or a minor leaguer, and we've had some folks use the app in that regard, the responses you get would be more tailored and more advanced based on the more advanced prompts.

So I think that was one of our first kind of in-the-wild deployments, maybe 18 months ago.
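For readers who want a concrete picture of what that kind of integration looks like, here is a minimal sketch using the Anthropic TypeScript SDK. The system prompt, model name, and question are illustrative stand-ins, not the actual app's prompt.

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment.
const client = new Anthropic();

// Illustrative system prompt: steer an off-the-shelf model toward
// sport-specific coaching questions, the way the baseball app did.
const SYSTEM_PROMPT = `You are a baseball coaching assistant inside a team app.
Answer questions about defensive alignments, situational play, and practice
planning. Ask for context (age level, travel vs. rec, score, inning) when it
would change the recommendation.`;

async function askCoach(question: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-5", // illustrative model id
    max_tokens: 1024,
    system: SYSTEM_PROMPT,
    messages: [{ role: "user", content: question }],
  });
  const block = response.content[0];
  return block.type === "text" ? block.text : "";
}

askCoach("First and third, one out, 12U travel team. What defense do you call?")
  .then(console.log);
```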

Jacob Miller: And I was gonna say, yeah, it was almost two years ago probably.

Jon Kinney: Yeah. And since then, I think we've started using it a lot more just behind the scenes for all of the administration of projects. The actual coding has changed so much, even in just the last six months.

Especially with the latest frontier models that have been put out, like Opus 4.5. And I don't know if it dropped yet, but it's February 3rd today, and it's rumored that Sonnet 5 is coming out today, so I guess we'll see if that changes the game.

But really the biggest thing I think that's still unanswered is: with the new capabilities of these frontier models, how can you safely use them, in the context of giving them access to enough data, to make meaningful decisions, or help you make meaningful decisions as an organization, around a key metric or an idea or a go-to-market plan, or anything that would typically take multiple people looking at it for days or weeks? A lot of analysis, maybe report building, that used to take a long time because you had to collate all the data.

Tim Gremore: Mm-hmm.

Jon Kinney: It can just do it so much faster, but how do we do it in a secure way? Sort of just answering business questions, I think, is where a lot of people will enter the stack.

But so far we've really tried to lean on our expertise as builders of systems and helping people create more structured, more reliable, faster systems. And so I know Tim and Dan have been doing a lot of work in design systems and being able to help structure a code base in a way where, when you tell agents, hey, I want to be able to create this new component, or I want to create this screen, and it knows the list of components that are available and they're well defined, it can really go a lot faster and help understand what it is that you need to compose that resulting UI. So I don't know, Tim or Dan, if you want to speak to a little more of the nuts and bolts of some of the things we've been doing there for a couple of our clients.

Tim Gremore: I think what Dan and I have been seeing is that the MCP offerings, the agent aids that companies are offering, so if you just take the Figma MCP offering, for example, what Figma is providing is not well informed of the design system that we've helped build out.

Jacob Miller: Hmm.

Tim Gremore: And so their MCP offering is helping the agent that we're using, often Claude Code.

It's helping Claude Code produce a better, more accurate artifact. But it's still missing key ingredients. It's still missing design tokens or components that we've created that are a crucial part of what we're building. So being able to add that context to the workflow, the agentic workflow that we use, or that our teammates use with the client, has made a pretty substantial difference in how accurate the product is that we get from an agent.

Jacob Miller: How are you and Dan dividing and conquering on that front? I know, Dan, you're more of a mobile expert, so I'm curious, as far as design systems and the connectivity there, how are you guys splitting it up? Is it, Tim's focused on this aspect of it and Dan's focused on that aspect?

Or is it more that you're leapfrogging each other, like, yep, I have time this week, I'll make some more progress, and then you hand off to each other and carry the torch, so to speak?

Dan Diemer: It's more that leapfrog approach, because what tends to happen is that the split between web and mobile becomes a lot less important, especially with the way that we build design systems, where we can build them in a really cross-platform, platform-agnostic way.

When we come at it from that approach, the way the AI tooling comes into play matters less about the platforms and more about the split between the deep backend work that Tim has done a lot of on this project and how we map that to the frontend work. So what we're finding is that there's a lot of tooling that still needs to be built, and it's getting introduced in more of a platform way with things like the Figma MCP, but it's still not great. It's still not turnkey enough, I think, for us or for our clients to really pick it up and just use it.

And so we're having to fill in the gaps in a lot of ways, either through tooling to get some of that information out of Figma in a way that the AI systems can use it in a really thorough and thoughtful way, or by building more of a bridge from our AI tools into the frontend, sometimes with things like Figma Code Connect. That's been really useful, but it's not useful across a broad spectrum yet. So taking some of the best of all of that and building our own recipes for how we use them seems to be how things are shaking out, if that makes any sense.

Jacob Miller: Yeah, no, it does. I was just curious, because in the past it's like, hey, this person's going to focus on this aspect. But like you said, it just seems more agnostic now; these tools can kind of understand no matter what we're dealing with. But a lot of it, like you were saying, is filling in those connectivity gaps: it knows this, but it doesn't understand that yet.

How do we give it that information for now, until that gap is closed by some other tool or whatever that might be?

Jon Kinney: Well, and the trick is that these large language models are not idempotent. And what I mean by that is the same input is not going to get the same output every time.

Right? Which is one of the things we strive for in software development, especially around things like testing and deployments: you want the exact same thing to happen every time, and that's just not guaranteed when you ask a question of AI. And to further get into the weeds, they're stateless models right now, which means every API call to an LLM has a certain defined amount of context, and it doesn't necessarily have long-term memory unless you build systems and tools around it. It knows what you asked it that time, and you append another thing to it, and now it knows a little bit more because there's a new message in the thread.

But that's where these context windows come into play, and the agents get less predictable and less smart, and even start to cut corners when the context windows fill up. Those context windows have been getting larger, but that isn't necessarily the only thing that's going to solve the problem.
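A sketch of what "stateless" means at the API level, again assuming the Anthropic TypeScript SDK: the server keeps no memory between calls, so the client re-sends the whole thread every time, and even then identical input does not guarantee identical output.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// The conversation lives entirely on the client side.
type Turn = { role: "user" | "assistant"; content: string };
const history: Turn[] = [];

async function send(userText: string): Promise<string> {
  // Every call carries the full thread so far; nothing persists server-side.
  history.push({ role: "user", content: userText });

  const response = await client.messages.create({
    model: "claude-sonnet-4-5", // illustrative model id
    max_tokens: 1024,
    messages: history,
  });

  const block = response.content[0];
  const reply = block.type === "text" ? block.text : "";

  // Append the answer so the *next* call has the longer context.
  // Re-running the exact same history can still produce a different reply.
  history.push({ role: "assistant", content: reply });
  return reply;
}
```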

And so I think ultimately it's about being able to have more predictable guardrails around what aspects of automation, because really that's where all of this is going, right? It's automation. What aspects of automation truly need generative AI and large language models, and what aspects of this automation have really already been solved?

And now we're just making it harder by trying to put them into these, you know, non-deterministic systems.

Jacob Miller: Yeah. There was an interesting study done by SparkToro around search, like the AI Overviews from Google, or searching in ChatGPT or Claude or Perplexity or whatever. They had a bunch of people ask the exact same question multiple times over a week or a month.

So everyone was asking the exact same thing, and the results would be different. Like you said, even from the same person 24 hours later, the results could be completely different. I know normal search can be like that as things start to rank or there's news or whatever, but usually the top 10 on a page, the first 10 links or whatever, are almost the same.

But it does seem like, even if an organization has all this data in a dataset and they're using an LLM to ask questions, today they could be asking for a go-to-market strategy and get one answer, and next week it'd be a different answer.

Yeah. And that's really a big deal. So how do you, and like you said, it's, what did you call it, stateless?

Jon Kinney: Stateless,

Jacob Miller: Yeah. So is there a path to fixing that, and what would it be? Are there people trying to fix it?

And what have the results been?

Dan Diemer: Well, I think what we're finding success with right now, and I don't know if this will persist, is treating it the same way that in the past we would treat a software spike. We're just going to write the code that gets the job done. Maybe it doesn't follow all of the architectural patterns we'd want.

It's a little more spaghetti, it's not as clean, and then we clean those things up once we understand the problem space. I think what we're finding success in is doing that with AI: throwing everything at the AI kitchen sink, seeing where we want things to go and what the outputs are, and then pulling it back, trying to really systematize it, trying to get AI out of the mix as much as possible except for the spots where it's needed. That seems to be a really reliable way of getting back to something that's quasi-deterministic, you know?

We're not getting exactly the same input and output, but inside of these specific bounds where we're okay with content being more malleable, everything around that we can make sure we get back what we would expect, and that seems to be a good way to move forward. I was actually just working on a thing earlier today that is very much that, where we're breaking out some designs, having AI do some analysis, and then returning that analysis to us along with some structured formatting of that design.

AI could do all of it if we wanted, but what we know is that what it's going to give us back from run one is going to be different from run two, and we don't trust it enough to be correct in both runs. And so we're going to hedge and make sure that we know we can get reliable information, but we also do want to pepper in the detail, the analysis that AI can do about a large structure of code and some other various inputs, to get back something that is really useful and would take somebody some amount of time to do manually, where we can do this entire flow in 30 to 45 seconds.
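One way to read what Dan is describing: keep the deterministic parts of the pipeline in ordinary code, and only let the model fill a narrowly defined slot whose shape you validate before accepting it. A hedged sketch follows; the `ScreenAnalysis` shape and prompt are hypothetical, not the actual internal tool.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Hypothetical shape for the analysis we want back from the model.
interface ScreenAnalysis {
  componentsUsed: string[];
  notes: string;
}

function isScreenAnalysis(value: unknown): value is ScreenAnalysis {
  const v = value as ScreenAnalysis;
  return (
    typeof v === "object" && v !== null &&
    Array.isArray(v.componentsUsed) &&
    v.componentsUsed.every((c) => typeof c === "string") &&
    typeof v.notes === "string"
  );
}

// The model's output is only accepted if it parses and matches the shape;
// otherwise we retry. Everything around this call stays deterministic.
async function analyzeDesign(description: string, maxAttempts = 3): Promise<ScreenAnalysis> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5", // illustrative model id
      max_tokens: 1024,
      system: 'Reply with JSON only: {"componentsUsed": string[], "notes": string}',
      messages: [{ role: "user", content: description }],
    });
    const block = response.content[0];
    const text = block.type === "text" ? block.text : "";
    try {
      const parsed: unknown = JSON.parse(text);
      if (isScreenAnalysis(parsed)) return parsed;
    } catch {
      // malformed JSON: fall through and retry
    }
  }
  throw new Error("No structurally valid analysis after retries");
}
```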

Jon Kinney: I was just going to say, Dan, what I think you're talking about is really trying to control the slop factor, right?

Which has become somewhat of a buzzword along with AI, vibe coding, context engineering, all these things. With slop, I initially was like, well, yeah, AI slop, sure, I wonder where that came from. But it occurred to me that it's kind of already a technical term, because in iOS development there's a thing called hit slop, right?

Which is how far off from hitting a button you can be and it still works; that button still gets triggered. So I don't know if any of that came into play around slop, or if people just think, hey, the code is sloppy and I don't care. But in any case, kind of a funny little connection there.
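For reference, the term Jon is reaching for shows up in React Native (and similar mobile frameworks) as the `hitSlop` prop, which extends a control's tappable area beyond its visible bounds; a tiny example:

```tsx
import React from "react";
import { Pressable, Text } from "react-native";

// Taps that land up to 12 points outside the visible button still register.
export function ForgivingButton({ onPress }: { onPress: () => void }) {
  return (
    <Pressable
      onPress={onPress}
      hitSlop={{ top: 12, bottom: 12, left: 12, right: 12 }}
    >
      <Text>Save</Text>
    </Pressable>
  );
}
```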

But the thing that's interesting is, I've seen some discourse around AI-generated slop and how all these SaaS platforms are just degrading and whatnot. I think yes, that is true, but more than one thing can be true at once, right? I think some people, us included, are writing some of the best, most secure, most performant code they've ever written, because it is easier to generate that code.

And when you do know what you're doing and the bottleneck isn't typing each keystroke, you really can orchestrate at a much higher level, and it's way easier to get rid of the human slop that used to be in the code base. We've all had code bases where, oh yeah, I guess that function is just 500 lines and it is what it is.

We don't really have a way to clean it up, because there's never enough time allocated in the sprint to really go back and fix that. Well, you can point AI at that function now, tell it what you actually wanted it to look like, and it'll generate it in under a minute. So the ability to remove our own cruft and fix our own bugs, the ones we know are there but didn't have enough time or energy, or weren't given enough budget, to fix or change, it's so much easier now to tell it how to do the right things.

And I want to piggyback on that with some other ways that we at Headway are able to stand on the shoulders of what seem like extremely insightful decisions now, but were really just the product of making code better for humans, which also makes it better for

AI agents: convention over configuration in the tools that we use. That's the case for Ruby on Rails, which is one of the main things we use to develop products, and that's the case for Elixir and Phoenix. Both of those languages have very succinct, English-like, expressive code.

So it's easy for the AIs to generate that code, and it's token efficient. The technology that we've been building on for the last 10 years at Headway, and much longer than that before we all started, is really well positioned to take us into this next agentic development phase. I think maybe a little less so on the JavaScript front, just because there are so many ways to do things, and that's where you ask it the same question and you get a different result.

You type the same search and you get a different result. That is very much the case when you're trying to solve a JavaScript problem, because so many folks have done things so many different ways. It isn't that you can't solve the problem; it's just probably going to be somewhat of a different answer depending on how you couch the question and what context your particular project is in.

That's not to say it can't do it, of course. You could say, well, something like Rails has really strong convention over configuration and a long history of open source, so there's a lot for it to train on, but the amount of training data pales in comparison to JavaScript. So it's a bit of a trade-off, right?

There's still tons of Ruby and Rails stuff, don't get me wrong, but there's less of it than JavaScript. What is there, though, is maybe a little more idiomatic, and so, a smaller pool of

Dan Diemer: higher quality training data, right?

Jon Kinney: Yeah. And that's not to say that some JavaScript code can't be of high quality.

I mean, web apps suck without any JavaScript; it just is what it is. But you want to make sure you're doing it intentionally, because, hey, my vibe-coded app works great, except the problem is it has to download 15 megs of JavaScript before it can run. Right? That's, oh

Jacob Miller: man,

Jon Kinney: that's not good.

That's not universally the case, but for somebody who doesn't know what they're looking out for, that can very easily be the case. So yeah, we're seeing more and more clients bring a vibe-coded app to us for us to take to the next level. And it's interesting, because on the one hand you can almost throw it away, since it's so easy to recreate.

Jacob Miller: Oh, sure.

Jon Kinney: But on the other hand, trying to understand all of the nuance that they prompted into it is trickier, because they don't have good specs or requirements. It's just, hey, I built this thing.

Tim Gremore: To add on to that, the app that you recently helped build a native version of, was it initially web, or did they just have React Native and you helped migrate to Expo?

Jon Kinney: Yeah, it was React Native from the CLI, sort of the old-style React Native build, and we converted it to Expo with the help of Claude Code. And yeah, I know one-shot gets way overused, but it wasn't far off.

Tim Gremore: Sorry to interrupt, but it was pretty close, and I think part of that is because, I mean, give Dan credit, his knowledge of both web and native platforms helps make the AI one-shot promise kind of work, right? Knowing how to divide up your component definition and your module definition so that the same module can be shared across platforms, versus, well, I know that this is different on native than it is on web, and therefore I need a different definition.

I need a different approach to how I build out this feature or function. Just having that knowledge leads me to a more efficient prompt, and I can help the AI help me. That knowledge that we all share, and that Dan has been so instrumental in helping establish, is kind of a crucial ingredient to how you saw that migration from React Native CLI to Expo happen so smoothly.

Jon Kinney: Well, and to be fair, I had worked on that conversion a few months prior. We've engaged with this client a couple of different times with various contracts, and we worked on the Expo conversion and kind of set it aside because the team wasn't ready to bring it into production yet. We had a couple other features and things that we needed to finish and build, and they didn't want to disrupt the whole app deployment and workflow just yet.

And some of the changes I had to make for Expo meant fundamentally rewriting a few of the ways they worked with file uploads and storage. They explicitly wanted to allow files to be accessible in your My Files area, depending on iOS or Android, not just within the app, but ideally both.

It was only external before Expo. Then I had it only internal in my first go-around. And then we kind of scrapped that, because it honestly ended up getting about three or four months out of date from the main branch. So when we started the conversion again, we just started the Expo conversion from the latest branch of the code.

And it went smoother because Expo had more tooling to help with agentic workflows. I think they even have an MCP now too. Additionally, we knew more of the edges of the system, and we were able to make things like that file access work both inside and outside of the app, which is the ideal state.

So yeah, there was some experimentation, some learning, some failures, some restarts. And once you know all that, the prompt can be so much more focused. Again, the tooling and ecosystem helped quite a bit there too, but it did go quite well in the second go-round for the upgrade.

Dan Diemer: Yeah. Four months is a lifetime in these tools, and so in the end it kind of becomes a blessing, being able to set it down for that period of time, because when you come back to it, it's like moving from a computer to a supercomputer. And I've seen that in a number of things I tried,

even just toy projects in the early days of AI, which, we're talking 18 months ago, right, early days, that required a lot of manual intervention and thinking through and planning, that now the tools really can one-shot. It's fascinating how quickly that's changed.

Touching back on a few of the things you were talking about earlier, Jon, it's crazy how the regular, valuable things in our jobs that we would do as software developers, writing clean code, writing good tests, all of that, just feed back into the system.

And so the convention-over-configuration stuff, the tools benefit greatly from all of that, just like developers benefit greatly from all of that. What we've found is, the more we follow these good practices, the more efficiently the AI can work. And when those things don't exist,

like when we get these projects that Jon's talking about, where we're being asked to vet them and figure out how we take them to the next level, and none of that extra work, extra infrastructure exists, the AI has gotten really good at building that too. So if you know that what you have in hand is the thing that should work,

it's pretty great at extracting out all of that testing. It's really good at looking at that 500-line function and figuring out how to break it out, in ways that would just not be feasible for a human in the past, because, like Jon said, it would tie up all of your sprint time on stuff that we as developers know is very valuable, but to the end product is

less valuable than other tasks. So now, with that all off the table, it seems like the quality of code and products we can ship is just fundamentally going to continue to increase.

Jacob Miller: Today's episode is brought to you by Headway, a digital product agency based in Wisconsin. Do you need to design and build a world-class user experience for your software, but feel like you just can't get there on your own?

That's where Headway comes in. We're the folks who help ambitious startups and enterprise teams bring their product ideas to life through design, development, and product strategy. We don't just give you user interface designs and leave you to figure out the rest. We work with you and your target customers to create beautiful user experiences, build scalable design systems, and provide Silicon Valley-level talent to get it done right the first time.

Whether you're launching new software or just looking to add features, we've got your back. We've worked with startups in industries like logistics, healthcare, fintech, and edtech, helping them solve their biggest software challenges. And here's the thing: no matter how complex your product is, we can help you gain clarity and get the job done together.

We're so confident that we'll be the right fit that we offer a 50% money-back guarantee if you don't think we're working out within the first 45 days. Ready to see how we can help you and your team? Just head over to headway.io and book a free consultation today.

Dan Diemer: What, what else has, uh,

Jacob Miller: or what have you guys been using within your tooling or your day-to-day tasks to be more effective in your work?

With AI, obviously, there are things like Claude Code, but is there anything else about how you're working every day, whether it's communication stuff with clients or actual code? I'm just curious what you guys have been using.

Jon Kinney: I just wanted to share this quick, Jacob: a tweet here from Garry Tan, who is the president and CEO of Y Combinator. If you're not familiar with them, they fund all kinds of startups every year in batches that people apply to. I forget even how many they're up to now, 20, something like that.

And he said people are sleeping a bit on how much Ruby on Rails plus Claude Code is a crazy unlock: Rails was designed for people, people love syntactic sugar, and LLMs are sugar fiends. So, just a bit of calling out some of the ways that convention over configuration can really help empower those large language models to take the next step without needing as much prompting.

You can get a feature built out pretty much end to end without having to do a lot of intermediary coaching, and then just review the totality of it, which still sometimes needs changes and tweaks. Especially the frontend scenario with Rails is still a little fraught with some of that JavaScript ecosystem bifurcation, where somebody does it

all in CSS, somebody does it in Tailwind; these people use Hotwire and Stimulus and Turbo, these other people use React with a Rails app, and other people use something different. What's the JavaScript library, Tim, that you've had some good luck with lately?

Tim Gremore: Uh, well, I've been using Inertia to tie

Jon Kinney: Inertia, yep.

Tim Gremore: Phoenix to React. So.

Jon Kinney: So there are just different approaches, yeah. That's still, I think, a big piece of the puzzle we're trying to solve here at Headway with some of our Figma-to-code initiatives and design system capabilities. And it's a little trickier to solve that kind of thing in the context of some of our enterprise work, because we don't always control the full stack that the whole organization uses end to end.

I think for any of the products that we're building, or new zero-to-one MVPs, we're definitely two feet in on trying to make that really effective.

Tim Gremore: As for other tooling I've tried, just this past weekend for the first time I gave Pencil.dev a go. Pencil.dev is taking inspiration from how Figma has attempted to integrate AI into designing and prototyping.

So Pencil.dev connects to Claude Code. My typical workflow, if I'm working in a Phoenix app or a Rails app, most recently that would be Phoenix, is that I've got TideWave connecting to Claude Code. And when I'm asking for some designs, I'm exploring; I'm trying to get ideas down on paper, so to speak, that I can then work on actually implementing.

Pencil.dev got me pretty far with very little effort: download it, make sure it's connected to Claude Code, which it did immediately, it identified and connected right away, prompt, and I had some very reasonable designs that I could work with. And then it was a simple workflow:

select this layer, go back to TideWave, prompt, ultimately prompting Claude Code through TideWave, implement the design that's in Pencil, and it did. It gave me a pretty high-fidelity first go at that design. Not perfect, right? It's sort of a sliding scale.

What I'm describing right now is, I just want to prototype, I want to explore this idea. I'm not concerned with the need for a design system and building out a production app yet; I'm not there yet. So this world is ideal for, just like Dan said, quickly throwing things at the wall so I can vet the concept.

That flow doesn't necessarily work in enterprise, the day-to-day that we help clients with. But it does help there too; it's just a different workflow, and it really calls for a design system to help maintain brand standards and consistency throughout product development.

Jacob Miller: Yeah.

Jon Kinney: What's interesting with TideWave, Tim, is I've done some digging on this, because I've been trying to figure out, so I've got a Claude Max plan, right? It's 200 bucks a month and it has a certain amount of tokens, 20x more than Pro. How could I ever run out of that? Well, you can, if you're running a bunch of stuff in loops all the time.

People have probably heard of Ralph loops and whatnot. Maybe I'm using them, so to speak, sometimes, but I'm not setting out to write a Ralph loop specifically. But what's interesting is there's this concept of a harness, a model harness. Claude Code is a harness

that you can use to talk to Anthropic models, and Opus 4.5 is, as of February 3rd, the current kind of state-of-the-art model. But you can also configure it to talk to Sonnet, right? Sonnet 4.5, I think, is the latest, and 5 is coming out soon, as we alluded to. But if you're using a Max plan,

they really want you to use Anthropic, that is, that plan, inside of Claude Code. There's another company called OpenCode that is a different harness that can still use any of these frontier models. You could choose something from OpenAI, Codex, if you wanted to. And they have their own app that just launched today too, a desktop Mac application, which is cool.

But there's just this weird world we're in right now where tools like TideWave are somehow able to authenticate with your Max plan, but they aren't directly in the tool-call loop, that's not the right way to put it, they aren't controlling the agent loop, if that makes sense.

They're making API calls. So it's different than OpenCode, where you can give OpenCode some browser capabilities, Playwright MCP, different things, and kind of approximate what TideWave does. They're just different approaches, and it's going to be interesting to see where the sweet spot is and, from a pricing perspective, how these things shake out. Because TideWave works really well even with that theoretical kneecap of not having full access to the loop.

It's making stateless API calls, but it tracks some of that stuff, and it knows what framework you're using. It knows how to read that framework's logging, it knows how to look up the docs for that framework, and it's all very specific, so it's token efficient.

And like you alluded to, Tim, you can choose, kind of like in Chrome DevTools, to click on a particular DOM node, and it will then have even more context about what it's trying to fix. But it's still just interesting that TideWave specifically, and I validated this with José, one of the creators, is using the SDK, the API calls, not sort of man-in-the-middle masquerading as Claude Code like OpenCode was doing for a while.

Jacob Miller: Hmm.

Tim Gremore: Yeah, that's a really interesting distinction. I'm curious whether Pencil.dev, for that matter, is playing in a similar way with Claude as TideWave is.

Jon Kinney: If you don't, they come after you and try to shut you down, is what we've learned. So yeah, I would imagine that's what most people who want to be able to grow and scale do: try to play by the rules right now. I'm just curious how long those rules are going to stay the way they are.

Jacob Miller: Yeah. Something came up, I didn't mean to share it right away, it auto-shared, but I shared this with the design team last week. There's a guy who does really great interviews with folks, a lot of AI designers over the past year, folks at places like Perplexity and Anthropic.

He was talking about how he has this matrix of, here are the tools I'm using right now and when I use them. Because like you guys are saying, right now it's, oh, this makes sense for this, but not for that. But at the same time, if this guy has subscriptions to all these tools all the time, all of a sudden your budget and your margin start to go away.

If you're a freelancer or an agency, there's just so much happening. Or if you're an internal team, a product team, all of a sudden your costs are inflating because you're using all these things you weren't using before. It's interesting to think through.

And you have to start being more selective: oh, we don't really need that as much as we think we do. And obviously this matrix will change every three to six months; hey, there's a new thing here and a new thing there. So yeah, it's just really interesting. I'm curious if there's anything else you guys have been using on actual project work, or

Dan Diemer: yeah.

Jacob Miller: that seems promising.

Dan Diemer: Yeah, I'm kind of in a different world than Jon as far as AI usage goes, and I think the spectrum we have just internally at Headway of how we use these tools is interesting. Same with Tim; Tim mentioned he uses TideWave quite a bit.

I don't, and part of that is a technology stack thing, right? If I'm not working inside of Rails or Elixir, it's less beneficial to use TideWave, and the type of work that I tend to be doing doesn't sit in either of those realms right now. What I'm finding, though, is really good success with a more hand-rolled, curated set of tools that is very small.

So I think we're all using this tool called Beads, which is a task management tool, but it's very AI-focused. The whole point of it is you can instruct your AI to use it instead of its built-in to-do system, and then the Beads tasks can depend on each other; they can block each other.

And where it really shines is, when we talk about the context window, at a certain point, at least with Claude Code, when it fills up, it has to kind of wipe the slate clean. It can do that in two ways. It can compact your conversation, where it takes all of the tokens it has and compresses them down into just what it needs to know to move on to the next thing.

Or it can clear it, where it just wipes it totally away. And Beads lets you kind of leapfrog around that with this concept of priming. You track all of that work inside of Beads, and after the context is cleared, Beads can come back and basically re-inject what it needs to know to continue to do work.

And in a more detailed way, you can provide enough context in each individual task for it to be worked on in isolation. So pulling that into my suite of tools, and bumping up from Claude Pro, which was the $20-a-month offering, to Claude Max, but not quite as high as Jon's using, I'm on the hundred-dollar plan,

that has been the sweet spot for me. Back on the Pro plan, I would run into the token usage limits fairly often, where they kind of block you until, I forget, three hours later; there's a reset period. Once I bumped up to that hundred-dollar level, I've never run into that again.

Interesting. And adding Beads into the mix has been enough that I can continue to work and sometimes have two or three things going on across different projects. Like, I was working on this design system stuff, but I also had a couple of our other clients' native apps that I needed to upgrade to meet an Apple deadline.

And I had all three of those running at the same time, switching back between them as the AI prompted me for input, and it's just really efficient. I'm able to do three times the work, essentially, and it's still really high-quality work, without it losing the thread, without it being frustrating for me. I have a pretty high tolerance for what I call the jank factor, where I would happily work off of a Chromebook just for the fun of it,

with all the limitations that come with that. But I don't hit that. I don't hit any of this jank; things just continue to work, and I don't have to babysit the model and try to remind it of a decision it made five minutes ago.

Tim Gremore: Hmm.

Dan Diemer: And so, I don't know, I've been thinking about this a little bit. I'm a big fan of science fiction books, Neal Stephenson and William Gibson and stuff.

And I've been thinking about how the concept of cyberdecks in those books is kind of where we're at with these AI models. Everybody's got their own deck they're building up of tools, and it can look vastly different from the next person's. The spend on it could be crazy, but you can also be really productive with a fairly cheap set of tools, which I think is pretty cool.

You still can't really do that with the free tier, I think. It's enough to wet your whistle and then realize, okay, I do actually have to pony up. But even at a $20 spend, I think you're getting a ton of bang for your buck.

Jacob Miller: Yeah. How have we been changing timelines, knowing that, hey, we can get this thing off the ground pretty quickly,

and then the last few weeks are refinement and review and all that kind of stuff? How have we been thinking about that from a development side?

Jon Kinney: Hot topic, Jacob. Hot topic. It's interesting, right? Because "code is cheap" is kind of the saying you're seeing around nowadays, and when anything can be built, don't build everything,

because now taste matters way more than execution. Which isn't a hundred percent true; otherwise we're just back into the slop world, right? An insecure world. I mean, if you were on vacation last week, you don't know that the AI agents now have a church they can attend without humans.

Stuff has just gotten kind of crazy with Clawdbot, which got renamed to Moltbot, which got renamed to OpenClaw. So in any case, you can do a lot quickly, but knowing the right way to do it, and what to do and what not to do, is super important. Specifically to your question, what I've faced in my technical leadership role at Headway is that I have to do a lot of technical sales with our CEO, Andrew.

So we'll hop on with prospects and we'll talk about what pain points they have, what needs they have. A lot of times nowadays they're coming with some AI-generated something, whether it's documentation or ideas, or different tools that they're prescribing without even really knowing what they do, but saying, we should use this because X, Y, or Z said we should.

Sometimes it's right, and sometimes it's something I hadn't heard of, and that's really cool, when a customer can teach you something new and you can learn about it and, in turn, bring your experience to it. You know, I'm an expert in five minutes plus 10 years, or 20 years, whatever it is, right?

Yeah.

Jacob Miller: TLDR, all this stuff.

Jon Kinney: Yeah. It's the collective experience with a new set of data that really helps you be the expert, and that's what we get to do. But back to the estimation piece: I had to take things that clients were telling us, research them, distill them, and understand, with their budget constraints and timelines and our team availability and expertise, what was going to be the right thing for any one of our customers.

And it took me not forever, but long enough that it was a pain point, right? Where it's, alright, I've got to do this estimate, shoot, I'm going to need a four-hour focus session to dive into this, take notes on that, aggregate this information, and then research X, Y, Z. Sometimes I'll do a little bit of prototyping just to understand what's possible.

I wanted to distill that down into, at least for me, a workflow that would help me be more effective and efficient without losing much, if any, of that human-in-the-loop expertise around what we're trying to help solve. So I started building a set of tools in a Rails app that would allow me to ingest that data, have a conversation around it, and build up the context in a way where I could then use large language models to extract features based on the context of the conversation and the nuance I was able to impart that wasn't necessarily there in the documents.

Right? So I take what they give us, I lend my expertise and experience, talk about similar things we've built in the past and timelines and technologies, and then I'm able to create a pretty nice set of estimates around how many people, for how long, with what features, and then turn that into a timeline and an estimate ultimately.
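As a rough illustration of the extraction step, and only that step, here is a sketch that forces structured output by making the model "call" a tool with a JSON schema. The tool name, fields, and model are hypothetical; the real estimation app is a Rails tool and certainly differs in the details.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Hypothetical schema: each feature gets a name, summary, and an effort range.
const recordFeatures = {
  name: "record_features",
  description: "Record the features discussed, with rough effort ranges in weeks.",
  input_schema: {
    type: "object" as const,
    properties: {
      features: {
        type: "array",
        items: {
          type: "object",
          properties: {
            name: { type: "string" },
            summary: { type: "string" },
            weeksLow: { type: "number" },
            weeksHigh: { type: "number" },
          },
          required: ["name", "summary", "weeksLow", "weeksHigh"],
        },
      },
    },
    required: ["features"],
  },
};

// Forcing tool_choice means the reply comes back as schema-shaped input,
// not free-form prose, which keeps the downstream timeline math in ordinary code.
async function extractFeatures(conversationNotes: string) {
  const response = await client.messages.create({
    model: "claude-opus-4-5", // illustrative model id
    max_tokens: 2048,
    tools: [recordFeatures],
    tool_choice: { type: "tool", name: "record_features" },
    messages: [{ role: "user", content: conversationNotes }],
  });
  const toolUse = response.content.find((b) => b.type === "tool_use");
  return toolUse && toolUse.type === "tool_use" ? toolUse.input : null;
}
```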

And so it isn't fire-and-forget, and it isn't yet something that we're giving clients access to so they can do this themselves. I think the first step would probably be continuing to use it internally, just with me and Andrew, and then perhaps some folks on our sales team could start to get ideas around some of the conversations they're having before I get involved.

Jacob Miller: Hmm.

Jon Kinney: You know, timelines, scope, and those kinds of things. And then we'll always have that human aspect of presenting what we're going to help you do and bring to market, or change inside of your organization if we're coming into a bigger team. But it really does standardize the way we can break down the needs of a project and understand how we approach it with our tech and our tooling, in a way that's hopefully going to be much more concise and accurate over time.

And we get to reinforce the learning we're doing on every project, rather than just retrospectively looking back and going, yeah, we did go a little bit over on that one, we should probably add an extra week, or we should do something different, like another development lead coming over the top of this project for a day a week in the second half or something.

It'll be much more quantifiable going forward to have that in place. And then, like any good product born out of need, we are potentially looking to make that a SaaS product at some point, so we're excited to

Jacob Miller: Mm-hmm. That's great.

Jon Kinney: continue to build on that and iterate on it a little bit internally, and then open it up to other folks at some point.

Jacob Miller: Again, looking forward to doing more conversations like this around things the development team's working on, how we're working on things, what we're seeing in the industry, what we're seeing on projects, and honestly how we feel about a lot of this stuff that's going on, because I think that's half the battle sometimes:

is this really where we should go as a business? Is this where our clients should be looking to advance with technology, things like that? There are always a lot more variables involved when it comes to all this agentic stuff and data and security and predictability and all that kind of stuff.

So,

Tim Gremore: And remember, human interaction is better than a bot.

Jacob Miller: That's right. Human interaction's better than a bot.

Dan Diemer: My bot responds faster than you though, Tim.

Tim Gremore: Oh man.

Jacob Miller: Is that what life's about though? Response rate?

Tim Gremore: Speed, to me, is the

Dan Diemer: god. Response speed, and also it telling me how correct I am.

Jon Kinney: Well, that wraps it up for today.

If this content is helpful for you, it'd mean a lot to me if you could rate and review the podcast on iTunes or wherever you get your podcasts, so we can help share this info with even more people. Until next time, I'm your host, Jon Kinney, and this is the Even Keel podcast.

Show Notes

00:00
Helping Clients Leverage AI and Early Deployments in the Wild

03:30
Frontier Models, Secure Data Access, and Answering Business Questions With AI

05:09
Design Systems, Figma MCP, and Filling the Agentic Tooling Gaps

10:02
Why LLMs Aren't Idempotent and Context Window Limitations

11:27
What Actually Needs Generative AI vs. Traditional Automation

13:01
Software Spikes, the Slop Factor, and Controlling AI Output

16:01
Why Experienced Developers Are Writing the Best Code of Their Careers

17:23
Convention Over Configuration as an AI Advantage in Rails and Elixir

20:00
Inheriting Vibe-Coded Apps and Making Them Production-Ready

21:00
Migrating React Native CLI to Expo With Claude Code

25:23
How Clean Code and Good Testing Feed the AI Feedback Loop

27:10
Developer Tooling: Claude Code, TideWave, Pencil.dev, and Beads

32:40
Claude Max Plans, Token Limits, and Model Harness Pricing

38:00
Building Your Own AI Cyberdeck and the Cost of Tooling Subscriptions

42:50
"Code Is Cheap" and Why Taste Matters More Than Execution

45:30
How AI Is Changing Project Estimation and Technical Sales