The Ruby AI Podcast
The Ruby AI Podcast explores the intersection of Ruby programming and artificial intelligence, featuring expert discussions, innovative projects, and practical insights. Join us as we interview industry leaders and developers to uncover how Ruby is shaping the future of AI.
You Can’t Vibe-Code Trust: Scaling AI Safely with Bekki Freeman
Valentino Stoll and co-host Joe Leo open the Ruby AI Podcast noting OpenAI is winding down its Sora video app and discuss the broader difficulty of building AI businesses. Guest Bekki Freeman, staff software engineer at Caribou Financial and organizer of Rocky Mountain Ruby, shares conference details (Boulder, Colorado at eTown, September 28–29; CFP opening soon; tickets after the schedule). The conversation focuses on safely scaling AI use in an 8-year Rails monolith: preparing messy codebases with dead code and metaprogramming, strengthening test harnesses and coverage, improving documentation, and being explicit about desired patterns rather than copying existing bad ones. They discuss PR review bottlenecks from increased AI-generated PRs, ideas like specialized AI review agents, stronger RuboCop rules, pairing/mobbing, and remote knowledge-sharing practices, plus security cautions and what AI may and may not replace (tech-debt work vs “taste”).
00:00 Sora Shutdown News
00:57 AI Hype Reality Check
01:43 Meet Bekki Freeman
02:00 Rocky Mountain Ruby Update
04:32 AI Meets Legacy Rails
07:22 Prep Codebase for AI
10:06 Patterns Versus Best Practices
12:37 Testing Strategy and TDD
16:45 PR Review Bottlenecks
19:27 Specialized Review Agents
21:31 Defining Quality Context
24:29 Humans and Team Adoption
25:20 Remote Change Adoption
27:00 Creating Sharing Rituals
29:19 Release Calls As Watercooler
30:12 Mob Sessions With Agents
33:55 Security And YOLO Risks
35:45 Too Much Code Problem
37:16 Vibe Coding Vs SaaS
42:10 AI Engineering In Two Years
45:33 Codex Versus Claude
47:39 Wrap Up And Farewell
Valentino Stoll 00:00
Hey everybody, welcome to another episode of the Ruby AI Podcast. I am Valentino Stoll, joined by my lovely co-host, Joe.
Joe Leo 00:08
Hi, I'm the lovely co-host. I've got something to hit you with right off the bat, Valentino. Sora is no more. OpenAI announced today that—let me get this right, from CNN—it is winding down Sora, the video generation app. It launched to so much fanfare last year.
Joe Leo 00:27
And I have a question for you: what are you going to do? The deal with Disney that was announced in December is now dead. You can no longer automate your favorite Disney characters. What are you going to do with your time?
Valentino Stoll 00:42
Find another model.
Joe Leo 00:45
Find another model. Yeah.
Valentino Stoll 00:49
I do think it's funny that they couldn't afford video.
Joe Leo 00:54
Yeah, it's true. It's true. Well, you know, I read it and I thought, kind of like, okay, good. And my second thought was, actually, it's really hard to do some of this stuff. And if you're trying to focus on every—I mean, it's a little bit of a response to people who are just like, oh, well, in two years, you know,
Joe Leo 01:13
AI will just do X, where X is anything: movie production, full accounting system, like whatever it is. And it's not because AI can't do it or it's incapable. It's because a business has to be built around this, and it's hard.
Joe Leo 01:26
And we didn't get here with three years' worth of effort, and you're not going to just kind of take it over with one company or three companies doing every industry that we can comprehend right now. So there's my rant, and thank you for indulging me. And now let's introduce our guest today,
Joe Leo 01:46
the wildly competent engineer and community organizer, Bekki Freeman. We are so happy to have you on the show.
Bekki Freeman 01:54
Thank you for having me. I'm so excited.
Joe Leo 01:57
We're excited too. I'm excited. First, I'm going off script here, not that we really have a script, but I'm kind of curious about your work with Rocky Mountain Ruby and how it's going and what it's going to look like this year.
Bekki Freeman 02:10
We have dates for this fall.
Joe Leo 02:12
All right.
Bekki Freeman 02:13
So excited. And now you're catching me off guard because I don't have a calendar up in order to tell you the exact date. So it's going to be the end of September, and it's going to be in the beautiful Boulder, Colorado, as always. Same venue, E-Town, which we all love. And our dates are September 28th and 29th.
Bekki Freeman 02:33
Would love to see you all. We're going to be opening our request for proposals in the next couple of months, and we'll start getting our lineup set, and then tickets will go on sale as soon as we have our schedule.
Joe Leo 02:47
I'm going to fire off a proposal. I'll tell you what, because I missed the window for RubyConf because of laziness, and I would like to remedy that with Rocky Mountain Ruby.
Bekki Freeman 03:02
We would love to have your proposal. That sounds great. I will make sure you get the message when we go live.
Joe Leo 03:08
That's great.
Bekki Freeman 03:09
Yeah. So Spike and I have been doing Rocky Mountain Ruby for a few years now, and we just love it. It is such a welcoming and kind atmosphere, and we build in a lot of social time, which some people go "oh" at, but the truth is, people love it. We have icebreakers.
Bekki Freeman 03:26
We make sure that everybody is very friendly about inviting a new stranger into their group to get to know. We have open lunch so that everyone can go and talk, and it's just a fun conference. And Boulder in the fall is gorgeous as well. So a lot of people will come early for the weekend and go hiking and then come and see us for the conference.
Joe Leo 03:46
Yeah, I love that. I do love the getting-to-know-you stuff. I do groan at it when it's announced, and then I do it, and I always enjoy it. You know, Ancient City Ruby—I don't think they have it anymore—but that used to be along similar lines in Jacksonville Beach, where, yeah, you spent about 60% of the time in conference sessions, and then, you know, you really went out and got to know everybody the rest of it.
Joe Leo 04:06
So I'm very excited about that. And I love the—I don't know if flagship is the right word—but the premier regional conferences, of which this is definitely one. And I really love that you're still doing that work.
Bekki Freeman 04:19
We love it. Yep. I'm wrapping my Rocky Mountain Ruby Stella dinosaur right now.
Joe Leo 04:24
Yeah.
Bekki Freeman 04:24
Yes. So if you want to see this year's Stella, you have to come to the conference.
Joe Leo 04:29
All right. So I reached out to you so that we could talk a little bit about AI and Ruby, and I thought, you're at a, I guess, relatively new—you're about a year into your new role—and I thought, well, I bet there's some interesting stuff going on. And when we had just a brief chat, I thought, yeah, this is good. We haven't done a show quite in this theme before,
Joe Leo 04:48
which is, hey, you know, here is a high-risk legacy system. Legacy, I mean, eight years is not that long, right? But it's been around for a little while, and you're coming into it, and of course, you've got opinions about AI, and so does everybody else on your team, and you've got potential use cases for AI. And then that kind of hits reality, and that's what I want to talk about today.
Joe Leo 05:08
So I guess just give us an overview. You know, what are you working on? Where are you working? And what's the team and the application, the system like?
Bekki Freeman 05:15
Yeah. So I'm a staff software engineer at Caribou Financial, and we focus on refinancing auto loans. Our mission is to give people financial freedom. And believe it or not, there are people with car loans that are at 30% interest. And so the goal of our company is to get people down to a more reasonable interest rate,
Bekki Freeman 05:35
save them money every month, get them the gap protections they might need so that if they have a car accident, heaven forbid, they don't go into, like, financial distress because of it. And because we have real customers, we have to be very thoughtful about changes we make in the code. And that means we can't YOLO vibe code,
Bekki Freeman 05:57
just sit there in front of our computer being like, Claude, just make this thing for me, because we have real people who depend on our software. And so we are more cautious about it. And the way I like to think about anything in software is as a system. That's very much where my brain goes.
Bekki Freeman 06:14
And an engineering organization, along with the software it owns and gardens, is a system: many inputs, many outputs, a lot of dependencies.
Bekki Freeman 06:23
And so where we're at right now is we're trying to scale our use of AI as a software development team from every engineer doing random things and experimenting to an actual system-wide use that has the ability to scale safely. And that is my fun project right now,
Bekki Freeman 06:44
is trying to figure out how do we scale AI development without destroying our code or reducing our quality.
Joe Leo 06:52
Valentino, that's kind of the job description for you, isn't it, at Gusto?
Valentino Stoll 06:57
Yeah, pretty much.
Joe Leo 06:59
Yeah.
Valentino Stoll 07:00
We have a very similar problem, though, that we're trying to solve, and it's ongoing.
Bekki Freeman 07:04
Yeah.
Valentino Stoll 07:05
So yeah. So I'm curious, you know, where do you even start with something like that?
Bekki Freeman 07:09
We start at the beginning.
Joe Leo 07:12
Help Valentino, because he needs some direction.
Bekki Freeman 07:14
All right. This is going to be fun then, because I need help too. So we're going to brainstorm, and we're going to come up with some really good ideas. So I think when I started looking at this and reading all the blog articles, as we all do, doing all the research, what I first saw was that the codebase has to be ready for AI to come in and build the context it needs.
Bekki Freeman 07:34
Everything in engineering is about context. If you don't understand all of the moving parts and all of the dependencies, you can have a problem. And if we have an eight-year-old monolith that at some point thought it should be microservices and then realized that was not the right idea, you also have some, like...
Joe Leo 07:51
Just want to cut in here and say every Ruby codebase that is eight years old or older has this problem. All of them have had an identity crisis at one time. Go on.
Bekki Freeman 07:58
I'm glad we're not unique. And no shade to microservices; it's a tool for the right problem. And our problem was not a microservice problem. But we do have a lot of, like, spaghetti in our code and some things that are half done and some projects that got killed halfway through. And so for the AI bots, for our friend Claude specifically,
Bekki Freeman 08:18
to understand our code, it's going to go down the wrong path quite frequently, because they'll go find this class and they'll be like, oh, this is what I can use. Well, no, actually, that's dead. Doesn't work anymore. It's just never been removed. And so that's why I'm saying we have to kind of start at the beginning and prep the codebase. So we found a couple very specific issues where we've had to really shore up our practices.
Bekki Freeman 08:39
So one is metaprogramming. I think in Ruby we love our metaprogramming. That makes the code a little bit tough to search. And I think Claude's favorite command is grep. And when you can't grep for a specific string, you really do have to be very careful. So that leads to the next safety mechanism, which is test harnesses.
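To make the grep problem concrete, here is a minimal, hypothetical Ruby sketch of the kind of metaprogramming Bekki means: the method exists at runtime, but its name never appears literally in the source.

```ruby
# Hypothetical example: loan attributes defined via metaprogramming.
class Loan
  %w[principal rate term].each do |attr|
    define_method("refinanced_#{attr}") do
      instance_variable_get("@#{attr}") # placeholder body
    end
  end
end

Loan.new.respond_to?(:refinanced_rate) # => true at runtime, yet
# `grep -rn "refinanced_rate" app/` finds only call sites, never this
# definition, so a grep-driven agent hits a dead end.
```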
Bekki Freeman 08:59
We need to make sure we have, like, very robust testing so that if we do break something, it will be caught in CI or caught before.
Joe Leo 09:09
Sorry, that's near.
Bekki Freeman 09:10
It goes into prod. I was going to say.
Joe Leo 09:12
I'll mute that.
Bekki Freeman 09:12
I think that's my train. My train has, like, a much more aggressive honking sound. Yeah. So then we had to shore up our test harnesses quite a bit. And there were some areas where we had 20% code coverage on unit tests. There were some areas where we had integration-level coverage but not unit-level, and then some vice versa. And as I started digging through that,
Bekki Freeman 09:33
I started finding dead code. So now we removed that. So you can see how, like, building the system starts way at the beginning before we can even use Claude to really improve our code or add new features. And I could go on. There's more and more and more.
Bekki Freeman 09:46
But the big one that comes next after this is what kind of documentation we have in the codebase, so that we can actually give our agents more information upfront, so that they have a better starting point for any questions we ask or any features we want them to put in.
Joe Leo 10:03
I like that as a starting point. I have this example from this morning that I wanted you to react to. So this is Codex, but, you know, the results are pretty much the same with Claude Code. I love Codex now. Valentino, I'm ready to fight you over Claude Code versus Codex, but not at the moment. So, okay, so I am working on a legacy codebase, and I'm just in this exploratory phase.
Joe Leo 10:22
And I'm going to start making some changes. So it was just a very simple prompt. I just wanted to see what it would do. And I said, look, I always want Codex to be TDD, and I want it to use best practices for design. It's a language that I don't work with all the time. It's not Ruby. I don't know how to really drill in. Always use Git worktrees, you know, PR workflow, et cetera. Anyway, so it gives me this boilerplate that had this bullet point in it,
Joe Leo 10:43
and it said, "Prefer existing design patterns and best practices in the codebase. Keep changes surgical and coherent with current architecture." And I immediately thought of you, because I thought, I don't want it to use the patterns that are here. I don't know it that well, but I know it's not doing the right things. You know, I want it to use best practices and refactor incrementally to those patterns.
Joe Leo 11:04
But I think by default, Codex at least is going to say, whatever's here is the way this person wants it. And so I'm going to kind of keep that structure.
Bekki Freeman 11:14
Yes. And that is such a double-edged sword, because on the one hand, yes, we don't want new patterns all over the place, because it makes the code impossible to understand. But on the flip side, if you have bad patterns in place, having those replicated is not good. Our example of that situation was we were not using, like, the subject pattern and the described_class pattern in our RSpec files.
Bekki Freeman 11:36
And Claude was like, cool, I'll keep doing the same thing. And so we actually had to tell it, no, no, please use those patterns, even though they don't exist yet, and be really explicit about what patterns we wanted to change. And that's part of building that system around scaling AI: some of the patterns you want, some of the patterns you don't.
Bekki Freeman 11:55
And now you have to tell Claude which ones are the good ones.
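For readers who don't live in RSpec, here is a minimal sketch of the two conventions Bekki names; the LoanCalculator class is hypothetical.

```ruby
# described_class avoids hard-coding the class name; a named subject
# gives every example one clearly identified object under test.
RSpec.describe LoanCalculator do
  subject(:calculator) { described_class.new(rate: 0.30) }

  it "refinances down to the target rate" do
    expect(calculator.refinance(target_rate: 0.08)).to eq(0.08)
  end
end
```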
Joe Leo 11:58
Yeah, it's funny. Going against the Rails conventions really is just biting everybody at this point more than ever.
Valentino Stoll 12:07
Yeah, I tend to agree with that. Setting up the right conventions, even on your own, that you do want is a hard challenge. And even Claude only sometimes listens to your Claude files.
Joe Leo 12:20
Yeah, that's it. It's another thing I'm finding. But there are ways I've found it to work better because it identifies the Claude files in subdirectories. So you can have, like, very specific directories have their own rules and memories, which is helpful. I'm curious, like, you mentioned testing is, like, kind of critical.
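To picture those directory-scoped memories: Claude Code reads CLAUDE.md files in subdirectories as well as at the repo root, so rules can live next to the code they govern. The contents below are hypothetical.

```markdown
<!-- app/views/CLAUDE.md (hypothetical) -->
# View rules
- Never reference instance variables in partials; pass locals explicitly.
- Prefer Turbo over bespoke JavaScript for page updates.

<!-- spec/CLAUDE.md (hypothetical) -->
# Spec rules
- Use `described_class` and a named `subject` in every spec file.
```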
Joe Leo 12:39
I would say don't even start using Claude unless you have testing in place and working well. But how do you, like, deal with the test growth? Do you have any rules in place for, like, okay, no AI can make tests? You mentioned, like, best practices for, like, trying to get it to use the right subject pattern or things like that. How involved are you?
Joe Leo 13:00
Are you going full test-driven development, or?
Bekki Freeman 13:03
It's such a big question. So I'm trying to think how to distill down the answer. We are still developing our rules and best practices around tests. Our test suite is slow already, and we have to make a decision. Do we want to let it get slower by adding more tests and then work on getting it faster?
Bekki Freeman 13:22
Like, what's our priority? Is it speed or is it coverage? And both are right. So we have to just choose which is most important right now. And so right now we are focused on increasing the test coverage. So a specific example I have, from my experiments in developing our system for AI-driven development,
Bekki Freeman 13:43
I just started YOLOing some PRs. I didn't necessarily merge them YOLO, but I started YOLOing them. And I said, Claude, go refactor this code to use modern Rails practices and good OOP, things like that, and use test-driven development and blah, blah, blah.
Bekki Freeman 14:02
Okay. And I put up this PR, and it actually looked really good. I was like, actually, this is fine. I think this is okay. And then one of my reviewers, my human reviewers, pointed out that there were some meta methods that Claude didn't notice, and that it would now just throw a NoMethodError because it hadn't fixed those. And I was like, that's interesting.
Bekki Freeman 14:22
So a human found this, but Claude didn't. But also our test harness did not find this. And so then I took a step back and I said, okay, Claude, we only have 60% test coverage over here. Please increase this, get all the branch coverage, et cetera, et cetera. And then I merged those two PRs and said, okay, now would it fail on this change?
Bekki Freeman 14:44
Nope, still didn't fail. I had to keep digging into it. And so I think right now my policy, or what I'm doing myself in my experiments, is do all the test coverage first in a separate PR before I even try to tackle the rest. So yes, more TDD, but a confession,
Bekki Freeman 15:03
I would much rather have the AI write tests for me than sit there and do it myself.
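A minimal sketch of how that coverage-first policy can be enforced in CI. SimpleCov is an assumption here (the episode doesn't name a tool), but it supports both the branch coverage and the ratcheting minimum Bekki describes.

```ruby
# spec/spec_helper.rb
require "simplecov"

SimpleCov.start "rails" do
  enable_coverage :branch                # line coverage alone can miss branches
  minimum_coverage line: 60, branch: 50  # fail the run below these floors;
                                         # ratchet upward as backfill PRs land
end
```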
Joe Leo 15:08
Yeah, I got it. We want the honest truth there. I think most people are in that camp, all right? Most developers are in that camp. But then it does become this challenge, both at an individual level and at a team-wide level, which is where is it that you insert the human judgment,
Joe Leo 15:29
the human behavior to get the highest leverage or return on your investment?
Bekki Freeman 15:34
It's really hard. Yeah, you're asking me all the hard questions. So if I may take this one quite broad: we've thought, okay, Claude and Gemini can do pretty decent PR reviews in the aspect of, like, conventions or NoMethodErrors, things like that, unsafe code. They're pretty good at that,
Bekki Freeman 15:54
but they don't really understand yet, especially these review bots, the full context and all the side effects and all of the unintended consequences of a change. And so we thought, okay, the human should come in on that aspect, the more complex of, like, the human can understand all these pieces that Claude can't see and apply that knowledge and that,
Bekki Freeman 16:14
you know, historical context to this PR and make sure that the design is really good. The implementation of the intended feature design is really good. Where I got stuck is I said to myself, that's too late.
Bekki Freeman 16:30
Have you ever put up a PR that you, like, lovingly grew and you sent it through everything and then you put it up and then someone's like, you did this all wrong. This is in the wrong place. The design's wrong. You shouldn't have used a service object here.
Bekki Freeman 16:43
So to circle back to your actual question: pair programming. I'm seriously going back to everything we write should be paired, because then you get all of that design collaboration upfront, at the best, most efficient time to have it, rather than at the end when the PR is ready to ship.
Joe Leo 17:03
I love this. And actually, we talked a little bit about this with Kinsey, didn't we, Valentino, where we were talking about letting a third, letting an agent into your pair and how that might kind of positively impact the application and positively impact the efficiency. So have you done any of that or has your team?
Bekki Freeman 17:24
We're starting to do more of it. And we are quite early on in this system, because it is, like, 30 things that we're trying to adjust at once. So we are working on that. And right now our biggest bottleneck is PR review time. And I would assume at Gusto, you're kind of in the same boat.
Bekki Freeman 17:40
And we have gone from an average of 20 open PRs to 70, because we're putting up so many more PRs, but we still only have the same team size to review them. And that's where, you know, I'm building the system further forward: how do we fix the PR review problem?
Bekki Freeman 18:00
And, you know, we haven't solved it, but everything I'm reading says that if you have good test harnesses and you have good canary environments, you don't necessarily have to read the code as much and as manually. So then you do want to have that pair programming because you don't want just one person kind of doing this in a silo. You want to have more eyes on it.
Bekki Freeman 18:20
But that's where we're stuck right now: what do we do about PR review bottlenecks?
Joe Leo 18:27
Let me tell you something. Every company that I know of has this problem. I was having a conversation about this just yesterday with one of my engineers about their client. And so it's rampant. And of course, this goes all the way up to Amazon, right? AWS, who had these outages a few weeks ago, where they've applied something called controlled friction,
Joe Leo 18:48
which is just, you know, slowing down and having more people look at the code. And so that's really not an answer, right? Like, go slower is not an answer. So I don't think that this is solved, but I do know that it's a rampant problem. There are just PRs open everywhere. And of course, it's almost like you don't even want to tell the business that all this code is ready.
Joe Leo 19:08
It's like, well, just deploy it. What's the worst that can happen? Right? Like, we know the worst that can happen because our jobs are on the line. But I think today it's a problem without an answer. I'm very interested to know what you think an answer might be or what it might look like. What are you thinking?
Bekki Freeman 19:24
Yeah, it's a great place to start. So what I'm thinking right now, and this is not a novel idea, this is just from reading what other people are doing, what other people are finding success with, is that where we're headed is toward very specific agents that review PRs.
Bekki Freeman 19:42
So one agent would be an expert on Active Record: query efficiency, query performance, correct database access. Another PR reviewer agent might be the observability specialist.
Bekki Freeman 19:58
So it will look and say, hey, I don't see any way that we can actually see when your code is breaking or see if it's operating successfully. So I want you to add more observability here, or tweak it for best practices, whatever. And then one joke was, we need a Bekki bot, a Bekki agent PR reviewer,
Bekki Freeman 20:18
because Bekki will always call you out on instance variables in a partial. So you might as well just have a PR review agent that just looks for instance variables in partials. And the joke with that one is that it's not going to be mad at you. It's going to be disappointed.
Joe Leo 20:34
Oh, no. It's so much worse.
Bekki Freeman 20:36
That you put an instance variable in your partial. So things like that that just do more than static analysis, but kind of look at the conventions. And you can tell Claude that it's on the Rails core team and it's an expert on Rails. And is this following all of the Rails conventions? Is this how you, core team member, would have written this PR?
Bekki Freeman 20:57
And try to get a little bit closer to at least not having to read, like, word by word for any little tiny issue in conventions or logical errors, things like that. So that's one thing we're looking at. And we've started building some of these.
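As one concrete shape this can take: Claude Code lets you define project-level subagents as Markdown files with YAML frontmatter under .claude/agents/. A sketch of the Active Record reviewer idea, with hypothetical name and wording:

```markdown
---
name: activerecord-reviewer
description: Reviews pull requests for query efficiency and safe data access
tools: Read, Grep, Glob
---
You are an Active Record specialist on a large Rails monolith.
Review the diff for N+1 queries, unbounded loads on large tables,
missing indexes on new query paths, and raw SQL that bypasses scopes.
Report findings with file and line references and a suggested fix.
```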
Bekki Freeman 21:14
We already have RuboCop and we're trying to bolster our RuboCop and get our to-do file from, what, 3,000 lines to, like, maybe 100. I think once we get down to that point, we won't have to do as much style review, things like that. Yeah, what are you thinking?
Joe Leo 21:30
I'm curious, because most of this sounds like the hardest problem that anybody has, which is really: how do you define meaning, right, under different contexts? Like, you could just say, okay, be the best Ruby engineer and follow the best practices. That doesn't mean the same thing for every instance. You might follow different conventions in a test file,
Joe Leo 21:50
than you might in a service object, than you might in a front-end component. The meaning of what it means to be a good Ruby programmer is different in each of those contexts, depending on what team you're on, what project you're on. And so I think about this a lot. How are you kind of, like, defining this meaning in a way that can improve future use cases?
Joe Leo 22:11
If you have, like, this specification bot, are you storing that information in a specific way, in a specific place so that it can resurface when it visits the test file? What are your, like, approaches to, like, capturing that and saving it?
Bekki Freeman 22:26
The more specific it is, I feel like the better it can be, right? Like, so if you have the Hotwire Turbo Streams expert bot, it has the meaning of what good use of all of the Hotwire functionality looks like. And so I wouldn't want that one looking at the services for Rails conventions.
Bekki Freeman 22:47
I'd want them to be very specific to different pieces of our code. And then what did we do? We pointed it at the documentation, I guess, and then pointed it at a bunch of, like, recent blog articles.
Bekki Freeman 22:58
I've definitely had Claude look at some Rails 6 conventions for our Rails 7 app and, like, apply the wrong fixes and the wrong review comments because we're on the wrong version of Rails compared to what it was looking at. So I think it has to be explicit and specific, but then also, like, concise.
Bekki Freeman 23:18
I think brevity is important because as soon as the context falls over, it starts doing wacky things. But every app has a different purpose in life. So I don't think all of us can use the same review agents. We're all solving slightly different problems, so we have different concerns.
Bekki Freeman 23:39
You know, if you were looking at a query on a million-row database table, you might make a trade-off to have an N+1 query instead of pulling the entire table into memory, for instance. But you need to, like, know which issue you're dealing with or which is the most critical to your system. Yeah, what is the meaning of meaning?
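To make that trade-off concrete, with hypothetical Customer and Loan models, each access pattern fails differently at scale:

```ruby
# N+1: one query per customer. Fine for 50 rows, brutal for a million.
Customer.limit(50).each { |c| puts c.loans.count }

# Eager load: no N+1, but can drag an enormous table into memory.
Customer.includes(:loans).each { |c| puts c.loans.size }

# Batched: bounded memory at the cost of more round trips; often the
# pragmatic middle ground on big tables.
Customer.find_each(batch_size: 1_000) { |c| puts c.loans.size }
```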
Joe Leo 24:01
That's all we need to figure out here. And then we can go home. I actually, I really like your answer, Bekki. The thing that caught my attention with creating review agents, for example, is that yes, you're right. Your review agents might not be my review agents on my application. And of course, I may work on five applications and I need different agents for all of them.
Joe Leo 24:21
But within your company, you might create a review agent, and that might be the best thing for Caribou, no matter who the engineer is, right, depending on the circumstance. Now you're in this position where you're trying to implement changes across a team, which means that you're going to have to deal with humans, unfortunately.
Joe Leo 24:39
And so I'm curious to know what other things have worked. And simultaneously with that, like, what other misgivings or challenges have you had to overcome with the humans?
Bekki Freeman 24:50
Yes, the humans are the center of everything we do. And they're a huge part of our system. So there's a couple things. So people have opinions. That's one lovely thing about humans. We all have opinions. And so we can do this big collaborative exercise of trying to gather all the information and everybody's perspectives.
Bekki Freeman 25:11
But then who makes the final decision? Because at some point we need to make a decision, and some people aren't going to agree with that decision. And so we're making sure that we're following change management practices, so that people, even if they don't necessarily agree with the decision, know why it was made and understand it and can be bought into it. It's not always a fast process,
Bekki Freeman 25:32
but I think it's really important in terms of building our engineering culture. And then there's also: we have to be respectful of people who are like, am I just writing Claude files that are going to take my job at some point? Like, I'm teaching Claude how to do my job, and that has a fear component to it as well. The training, I'm finding, is also hard because we are remote,
Bekki Freeman 25:54
so we're not at the same water cooler. And so we can't find out by accident what people are doing or what processes and policies have been put in place. We have to be very intentional about it. And if you've heard the adage, people have to hear something seven times before they actually internalize it. Now picture you're the leader in the organization saying,
Bekki Freeman 26:14
use this Claude plugin before pushing your PR. You start to feel like a broken record because you have to say it seven times to make sure everybody's actually seen it at least once, has seen the directive or seen the policy change. And so we are definitely struggling with how do we promote the changes that we're making,
Bekki Freeman 26:33
the improvements we're making, and then how do we get people to use all of the tools that we're putting in place?
Joe Leo 26:40
Yeah. Well, first, let me say that that adage that you have to say something seven times before people hear it once, it's as annoying as it is true. In 12 years of running a company, I have found it to be true over and over and over again. And so you just kind of get used to it. I'm just kind of like, well, I know, I know, I'm saying this for the fifth time, but it hasn't stuck yet. So I'm just going to keep saying it. More importantly,
Joe Leo 27:01
you've talked about this water cooler effect. I think that's really interesting, or the lack thereof, because you're remote. And I could have four different conversations with four different engineers at Def Method, and they're going to tell me four different ways that they use AI, even if they're on the same team, even if they're working on a similar application.
Joe Leo 27:19
And so I get that that can be a real challenge. Are there ways that you can help your team to kind of keep up with the latest or even be intrigued or excited by developments? Because the thing is, even if you implement something, it's going to have to change in a couple of weeks, right? Things change so quickly.
Bekki Freeman 27:39
We've been trying a few things here and it's tough because we are also across the country. So we have four time zones. And so, you know, lunch for one time zone is dinner for the other. So it is always hard to get us all in one room. But we have Friday Technopalooza is what we call it,
Bekki Freeman 27:59
where we just share random things we've been working on that are involved with the engineering organization. So sometimes it's product changes. Sometimes it's a new tech-related thing that we've discovered or worked on. And lately it's been turning a lot into sharing AI practices and knowledge and AI-based experiments that we've been doing.
Bekki Freeman 28:18
We also have a Slack channel called Knowledge Share, where it's like, oh, I found this random trick that's really making my life better. I'm going to go share it in this channel. And while that is kind of just a scrolling feed of stuff, at least if you see it and a month later it becomes relevant to you, you can actually go search for it.
Bekki Freeman 28:38
So at least it's there. The hardest part when people say, oh, we just need to document everything, I don't buy that because that's only half of it. The other half is people have to read it. And I think 90% of documentation that exists on a system is not read by anybody. Nobody even knows to go look for it.
Bekki Freeman 28:58
And that, to me, when people say just go document it or go write up a policy or a procedure, I don't buy it because writing it doesn't actually help if no one knows to go read it. So you still have to keep doing the Knowledge Share piece of it. So just creating spaces where people can talk and can share ideas has been really important.
Bekki Freeman 29:18
Another incidental one that I've found is our deployment call. We don't do continuous releases; we do scheduled releases. And when those happen, there's a call, and only one or two people have to be on that call. But on that call, we're just doing a lot of watching pods roll and watching builds go through.
Bekki Freeman 29:38
And so it turns into a really great water cooler. So even if I'm not assigned to be on that release, I go to that meeting, because it ends up being a really nice spot to share information with teammates and also get to know our remote coworkers. And usually we do end up talking about what our latest experiment is, or I've been trying to add tests to this class and holy cow, it is crazy.
Bekki Freeman 30:00
And then someone else will be like, oh, I worked on that. So I know that the reason it's crazy is this other thing over here. And you really do get a lot more organic knowledge sharing in just these shared spaces.
Joe Leo 30:11
You mentioned a really great point about how to bring the human back into these new workflows that we're creating for ourselves. And that's a great one, having these release water coolers. That's fun. I'm curious, where else can we bring the human back?
Joe Leo 30:28
And I know, like, one thing that has worked well recently is before working on a feature, kind of doing like a mob pairing session where we have, like, 5 to 10 people, you know, working on the design or, like, whatever new thing that is going to be worked on as, like, a collective experience where one person's driving, like, the agent session.
Joe Leo 30:49
And be like, oh, no, that's not exactly what we want, right? And like, well, why don't we want that? Teasing apart, like, kind of like doing the old school, like, on a whiteboard almost, but in an agent session on a Zoom or something, right?
Bekki Freeman 31:02
Totally.
Joe Leo 31:03
Of breaking apart the problem you're really trying to solve. What other ways are you, like, thinking about this kind of problem, like forcing the what problem are we solving and how are we going to solve it together kind of thing?
Bekki Freeman 31:17
So this is going to sound, I guess, counterintuitive, but we had some issues this week with some of our pods falling over, just running out of CPU, and we got seven of us on the call troubleshooting it. Even though it was high pressure, high stress, and, like, nobody wants to be in the situation where we have an outage,
Bekki Freeman 31:38
it was one of my most fun days at work in the last month, because, yeah, we were all just together hammering on this problem. And the funniest thing is we were all asking Claude independently what's wrong, what's causing this. And Claude gave each of us a different answer.
Bekki Freeman 31:59
So we had five different smoking guns from Claude and not one of them was the root cause.
Joe Leo 32:07
That is kind of shocking. Maybe it shouldn't be shocking. That is shocking to me.
Bekki Freeman 32:10
And Claude was pointing at this particular Sidekiq job that we had. We had refactored a Sidekiq job to pull more stuff into memory rather than doing more database queries. And Claude was like, this is the smoking gun. This is definitely what caused your issue. High risk, high impact. It had siren emojis in its report. It was very, very adamant that this was the issue.
Bekki Freeman 32:31
And then after we researched it for like 20 minutes, we're like, that job wasn't even scheduled to run. What are you talking about? It was like, oh, well, I guess I didn't really know if it would run or not. So, you know, we had this lovely mob troubleshooting session. We all got to laugh about it. We all got to be very human,
Bekki Freeman 32:50
even while we were using our assistants to help us troubleshoot the problem.
Joe Leo 32:56
That makes me think there's, like, almost a missing product: a way to get everybody's bots to join in on a single session collectively, where there's, like, a mediator trying to solve a problem. Incident response, but, like, for your human and their bot to join.
Bekki Freeman 33:13
So now, on the next incident call, only everybody's AI agents will join and no actual humans will join you, and they'll just...
Joe Leo 33:19
I'll have ruined the whole thing for you.
Bekki Freeman 33:23
So much for bringing humanity back.
Joe Leo 33:27
No, what I do think about is this, like, Moltbook experiment, maybe as a way to, like, sidetrack people into humanity by getting their AIs distracted, you know, in, like, a Moltbook fashion where their bots are socializing and, like, giving the human space. There was a dating app that came out on the back of that.
Joe Leo 33:47
The agents get together and they kind of work out whether or not the humans would make a good match. I like that.
Bekki Freeman 33:53
Lovely. We're fairly cautious in how we deploy our agent tooling, because we deal with a lot of sensitive data, and so we have a very high security bar. And so we aren't doing things like Moltbook and OpenClaw, where we can just YOLO out anywhere and do stuff. We're much more cautious with what permissions we give it.
Bekki Freeman 34:13
And when I hear some of the stories of people using OpenClaw, I am shocked that they're doing this with their company's code and their company's infrastructure. And I wonder how long it is before things start going very badly.
Joe Leo 34:32
Well, bad is in the eye of the beholder. If you work at a company where your primary purpose is to fix these kinds of things, yeah, probably going just fine.
Bekki Freeman 34:40
So you're happy.
Joe Leo 34:41
I'm doing fine. Yeah.
Bekki Freeman 34:42
Great. Keep doing that.
Joe Leo 34:44
Now, I also use OpenClaw, and I also vibe code and YOLO, but I only do it on things where I'm the only user. And when I'm using OpenClaw, it's in a sandbox on a DigitalOcean droplet where, you know, the blast radius is very small. Does it surprise me that people are not doing that? Not really. I mean,
Joe Leo 35:02
engineers, I think, included, there is just a subset of engineers that have a very high risk tolerance or a very low amount of caring about things that go wrong at their company. Either way.
Bekki Freeman 35:17
Yeah. Your software has no bugs if you have no users.
Joe Leo 35:21
That's absolutely true.
Bekki Freeman 35:22
Therefore, users cause bugs.
Joe Leo 35:24
Users cause bugs. And if I'm the only idiot that's annoyed by the stupid thing I did with Codex in YOLO mode, well, it's like a tree falling in the woods. In all fairness, as developers, we cause bugs. Yeah. Well, yeah. That's fair. The code wouldn't exist without us, right? Yeah. Yeah.
Joe Leo 35:44
You bring up a valid point there because what everybody is reporting, including here today, is that because of AI CodeGen tools, we have more PRs than we can possibly handle, which means more code almost exclusively. Right? PRs almost always add more code than they delete.
Joe Leo 36:03
And that means that we have this throughput issue, where the answer seems to be, for a lot of people, well, hey, we've got to open this up. We've got to get this stuff moving, because the bottleneck is now the human, or non-human, review stage. But what that's going to get you if you solve that problem is a lot more code.
Joe Leo 36:23
And code is a liability. So what do we do about that?
Bekki Freeman 36:28
If I had the answer to that, I could probably retire. This is the question.
Valentino Stoll 36:33
I've got a solution.
Bekki Freeman 36:35
All right.
Valentino Stoll 36:37
I call it the AvdiBot, and it's following the no-code presentation he gave a long time ago, and it just starts deleting code.
Valentino Stoll 36:47
Do you really need this, and can it be a Google Sheet? Right. Going through code one by one and being like, what is this really doing? Right? And be like, yeah, we don't need this. Let me just delete this. The tests pass, right?
Valentino Stoll 37:01
Like, almost like the Chaos Monkey from Netflix, but like for your code and just like pruning it and be like, yeah, there's a reason you don't have a test to cover this. It's probably because you don't need it. And then just remove it.
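A toy sketch of that pruning idea, assuming SimpleCov's coverage/.resultset.json output: files the whole test suite never executes become deletion candidates. This is a heuristic for humans to review, not proof the code is dead.

```ruby
require "json"

resultset = JSON.parse(File.read("coverage/.resultset.json"))
coverage  = resultset.values.first.fetch("coverage")

never_run = coverage.select do |_file, data|
  lines = data.is_a?(Hash) ? data["lines"] : data # shape varies by version
  lines.compact.all?(&:zero?)                     # no line ever executed
end

puts "Deletion candidates:", never_run.keys
```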
Bekki Freeman 37:16
Yeah. Like a lot of the tech podcasts I listen to, they talk about how people aren't going to buy SaaS software anymore because they're just going to vibe code up their own software to do whatever they need to do. And I am with you, Valentino, that I think more software is not good, less software is good.
Bekki Freeman 37:36
And I actually see that kind of falling over a cliff at some point where people are like, I don't want to maintain this Trello board. I'll go pay the 20 bucks a month, whatever, because maintaining this Trello board YOLO app is more expensive on my time than just paying the $20 a month.
Bekki Freeman 37:56
I mean, we only have so much energy and focus and I don't think most business owners want to maintain 20 pieces of software themselves.
Joe Leo 38:05
I think you're absolutely right. And that's the thing is people keep considering this stuff in isolation. Okay, I'm going to rip out my instance of Salesforce, right? That was a famous example and I'm going to build it myself. Okay, great. Well, now you've got a CRM to maintain.
Joe Leo 38:20
If you actually do that, which I think is insane, are you then also going to rip out another app and another one? Do I really want to maintain everything that I depend on today with a bunch of applications that are all being vibe coded and vibe managed or maintained? Absolutely not.
Joe Leo 38:37
So what I think is more likely is the threat: maybe if I'm an enterprise or I'm a large company, hey, I can just go and build this myself, so you better give me a discount or you better give me this. That's not an original thought. That was something that was brought up in the Citrini research paper that sent Wall Street scrambling for a couple of days.
Joe Leo 38:57
And I don't agree with all of it, but I think that's a smart thing. Like if it's giving you negotiating power, then maybe the business model needs to shift a little bit and maybe you need to be a little bit more thoughtful about what differentiates you as a company other than just the software. But I don't think that's strikingly different. I think that's just enhancing what has already been true.
Joe Leo 39:17
Yeah. You know, it reminds me of Basecamp's whole phenomenon in the software industry in general, in that they had one great product that they focused on exclusively and just iteratively worked on, just delivering the quality product of that very specific thing. And they started getting rid of stuff, right?
Joe Leo 39:36
Like, they had Campfire as a chat service, and they're like, you know what? We don't want to deal with that. We're not going to even deal with that. Yeah, right. Nobody else does. Now they've, like, revisited it for some reason. They're not, like, offering it as a service, right? Like, you're not paying them money for it, right? There's, like, a proliferation of that idea. Okay, like, we have a very specific thing.
Joe Leo 39:56
It's giving us customers and we're happy with the customers we have, right? We don't need this exponential growth month over month in order to sustain that state of being, right? I thought that's the future, right? Like, people are going to really dig in, and they did for a bit, and then they're like, well, I want more money.
Joe Leo 40:16
And everybody thought, you know, if I just add on all these other features, like, we'll attract more customers, right? And then you get those customers and they're like, why doesn't this thing work in the corner of the app that I want to use it for? And you're like, well, we'll get there. And I feel like this is the same effect that we're seeing with more growth, right? And more, like, adoption.
Joe Leo 40:36
So like, yeah, sure, like, you can vibe code your thing and it'll serve a very specific purpose to start. And then people will be like, oh, what I really want to do is be able to call this thing from my phone and just have it talk to me, right? And then they're like, oh yeah, how hard could that be? Right? And somebody builds that and then it, like, has bugs and issues and, like,
Joe Leo 40:55
well, you know, I can't really deal with that right now, but just stop using that for a little bit, right? Like, that's what ends up happening. And, like, you get these people that adopt it and they start to get excited, and then it, like, falls apart on them, and they get less excited and less excited and more frustrated. And then they're just like, all right, I'm just going to pay somebody else.
Bekki Freeman 41:12
I'm just going to pay somebody. Yeah. Yeah. And, like, so, like, WordPress has been around for, what, 20 years at this point? And I see so many small business owners who got WordPress because it's like you don't need a developer, you don't need to code anything.
Bekki Freeman 41:25
And 20 years later, the people who are using WordPress still don't know how to manage plugin updates without their whole site going down.
Bekki Freeman 41:35
And so when I think about people who are still struggling to do the minor IT work of their low-code systems, I just don't envision those same business owners being excited about maintaining their AI agent software over time.
Bekki Freeman 41:54
Or like, oh, I got this notice that my key was compromised. What do I do? And that happens a few times and you're like, this isn't worth it anymore. Even though it saved me 20 bucks a month, it's not worth it.
Joe Leo 42:08
Yeah, I hear you there. So I want to step back a bit, and I want you to think about the AI engineering of the future. And when I say the future, I guess I mean, like, two years. That's enough time for things to be wildly different than they are today.
Joe Leo 42:24
So I mean, what feels real to you and your team and what you can reasonably envision doing and what feels like it's kind of just hype and will fall by the wayside?
Bekki Freeman 42:35
The first thing that comes to mind as being more achievable with AI is staying on top of tech debt. The biggest issue when you have a business is that you're trying to write software for the business that improves the business, brings the business forward.
Bekki Freeman 42:52
And so the things that get deferred are the maintenance aspects, like upgrading Rails or switching from single quotes to double quotes in the whole codebase, those kinds of things. And so I definitely see AI being an unlock in being able to maintain modern standards in software in a more ongoing,
Bekki Freeman 43:14
organic way. I think that's going to be huge. And we're already using it to really accelerate our tech debt management today. So I could see in like two years that just being a thing that happens every night automatically without us even having to be involved.
Bekki Freeman 43:33
The thing that I am still not sure is going to be ready is, I guess some people call it taste.
Bekki Freeman 43:43
I could tell the software, I mean the agent, that I want a vibe-coded Trello board, but I can't necessarily explain correctly, or in the right way, how I want it to flow or feel when I use it. Again,
Bekki Freeman 44:01
those very human aspects of our software. Like, in the end, most of our software is written for human use. Some of it is talking to other software, and maybe we'll get further there. But at the end of the day, there are people using it, and it has to feel good to them. And I just don't know how we would explain that to an AI agent.
Joe Leo 44:24
I think that's insightful. Somewhere between there's no accounting for taste and just building off of what was already done is kind of easy. It was always kind of easy. Now it's easy and fast, but that doesn't mean that you're going to be able to craft what the next thing is that people want to use and see and feel and touch. Do I have that roughly correct?
Bekki Freeman 44:45
Yeah. I think that's a good way of looking at it. So almost like we're all going to become product managers where we have to be able to describe the purpose of the system in a way that the AI agents can understand.
Joe Leo 45:00
So we'll all become product managers. And what else? What else happens after that?
Bekki Freeman 45:04
Maybe the apocalypse. I don't know.
Bekki Freeman 45:11
It's an exciting time, because I'm not sure anybody really knows where we're headed right now.
Joe Leo 45:16
Yeah, you're right about that. But we love getting people's sort of read on the future, because a lot of people see things differently and we just don't know. But it's certainly an exciting time we're living in today. Valentino, do you have any parting questions? So, have you tried Codex, and which do you prefer? I keep coming back to it.
Bekki Freeman 45:38
Oh man, you're putting me on the spot. Okay. So I love RubyMine. So there aren't a lot of other IDEs that I've picked up, because RubyMine is my favorite, but I love Claude Code's CLI interface. I'm very fond of it. So no, I have not picked up Codex. I haven't even picked up, what is it called? Claude Coworker or?
Joe Leo 46:01
Claude Cowork. Yeah. It's like a vanilla Claude bot.
Bekki Freeman 46:05
Yeah. And then Google's Antigravity is actually quite good. Yeah, Antigravity is pretty good. But I'm going to totally cop out on this, and I'm going to say they're both great. I love them equally. Sorry, guys.
Joe Leo 46:22
No way. That's true. All right. I'll take that as a win for me. I'm just going to tell you right now that you're both incorrect on this. I don't know that that's true. What I know is that on the small apps that I'm using, Codex is flying past Claude Code. And Claude Code now, like, I have Superpowers, you know, enabled and all that. And now I'm just like,
Joe Leo 46:41
and maybe you don't use this anymore, but now when Superpowers starts planning, I'm like, oh God, oh my God, just kill me. It's going to take forever. I had it working on some, like, very simple kind of just adding, you know, linting and, you know, some kind of boilerplate kind of stuff to an app yesterday. And God,
Joe Leo 47:00
it's like, hey, it's asking me these clarifying questions and then it's creating this plan and writing the plan. And meanwhile, Codex is on another branch, just like flying through issues, flying through.
Bekki Freeman 47:09
Really?
Joe Leo 47:09
Now it's a smaller code base, not huge like Gusto and not whatever the gap is between us and Caribou, but still, I mean, worth a shot. Worth taking a look at.
Bekki Freeman 47:20
Okay. I will go assign someone to do that.
Joe Leo 47:23
You'll assign somebody to do that. Assign an agent to test out that agent and have it report back to you.
Bekki Freeman 47:30
But then I need another agent to summarize the report.
Joe Leo 47:33
I know. I know. And then another one to decide.
Joe Leo 47:39
Well, Bekki, it's been great having you on the show. Would love to have you on again. Would love to talk about the conference, for example, maybe right before or right after you have it. I wish you all the best of luck in planning and organizing.
Bekki Freeman 47:51
Thank you.
Joe Leo 47:52
Yeah. And hope to see you around soon.
Bekki Freeman 47:55
Absolutely. And I look forward to seeing you both at Rocky Mountain Ruby because I know you're going to be the first ones to buy tickets.
Joe Leo 48:00
Well, I'm not even going to need to buy a ticket, because I'm going to put in a proposal instead. I'm going to submit a proposal like I did not do a couple of weeks ago. And Valentino, will I see you this evening at our official Ruby meetup?
Valentino Stoll 48:14
Unfortunately not. I've got some contractors at home I've got to get back to.
Joe Leo 48:20
Contract killers. All right. I'll be there on my own telling people to stop using Claude Code. Just kidding. All right. Thanks everybody for listening, and we'll see you again soon.