The Ruby AI Podcast
The Ruby AI Podcast explores the intersection of Ruby programming and artificial intelligence, featuring expert discussions, innovative projects, and practical insights. Join us as we interview industry leaders and developers to uncover how Ruby is shaping the future of AI.
The Latent Spark: Carmine Paolino on Ruby’s AI Reboot
In this episode of the Ruby AI Podcast, hosts Valentino Stoll and Joe Leo interview Carmine Paolino, the developer behind Ruby LLM. The discussion covers the significant strides and rapid adoption of Ruby LLM since its release, rooted in Paolino's philosophy of building simple, effective, and adaptable tools. The podcast delves into the nuances of upgrading Ruby LLM, its ever-expanding functionality, and the core principles driving its design. Paolino reflects on the personal motivations and community-driven contributions that have propelled the project to over 3.6 million downloads. Key topics include the philosophy of progressive disclosure, the challenges of multi-agent systems in AI, and innovative ways to manage context in LLMs. The episode also touches on improving Ruby's concurrency handling using Async and Ractors, the future of AI app development in Ruby, and practical advice for developers leveraging AI in their applications.
00:00 Introduction and Guest Welcome
00:39 Dependabot Upgrade Concerns
01:22 Ruby LLM's Success and Philosophy
05:03 Progressive Disclosure and Model Registry
08:32 Challenges with Provider Mechanisms
16:55 Multi-Agent AI Assisted Development
27:09 Understanding Context Limitations in LLMs
28:20 Exploring Context Engineering in Ruby LLM
29:27 Benchmarking and Evaluation in Ruby LLM
30:34 The Role of Agents in Ruby LLM
39:09 The Future of AI Apps with Ruby
39:58 Async and Ruby: Enhancing Performance
45:12 Practical Applications and Challenges
49:01 Conclusion and Final Thoughts
Valentino Stoll 00:00
Hey, everybody. Welcome to another episode of the Ruby AI Podcast. Joe and I are here today interviewing a very special guest. You may have heard of him from such popular repositories as the Ruby LLM project. Joe, you want to say hello?
Joe Leo 00:18
Yeah. I'm Joe Leo, co-host of the Ruby AI Podcast. I'm very happy to be here with Carmine Paolino. Hey, Carmine, welcome.
Carmine Paolino 00:27
Hey, guys. Thank you so much for having me. This is a pleasure.
Joe Leo 00:30
It really is. And this is going to be a kind of a special episode because for Phoenix, we use Ruby LLM every day. And so I have questions prepared from my engineers who are excited that I'm interviewing you today.
Joe Leo 00:41
But I have a question first, and that is: why is the Dependabot upgrade from Ruby LLM 1.8.2 to 1.9.0 sitting in the pull request queue on GitHub? Why are my developers afraid to upgrade right now?
Joe Leo 01:01
Is there any reason, or can they upgrade with abandon?
Carmine Paolino 01:04
They can. Sure. There's actually a new release today, which is 1.9.1.
Joe Leo 01:10
Ah, see, this is already obsolete.
Carmine Paolino 01:14
Yeah.
Joe Leo 01:14
That's good. I'm going to tell them that.
Carmine Paolino 01:15
The Dependabot config is weekly, I guess. It should be daily.
Joe Leo 01:21
Daily, yeah. That's good. That's probably a good place to start. I mean, over the last six months. 1.0, you'll have to correct me if my timing's off here. I think 1.0 was released about six months ago. Is that right?
Carmine Paolino 01:33
It was March 11, actually. A little bit more than six months.
Joe Leo 01:37
A little more. And since then, we've had minor releases and patch releases, and we're now up to 1.9.1. And along that same timeline, the library has grown massively in popularity. And so I'm curious to get your take.
Joe Leo 01:53
What do you think accounts for its success, and what does the commensurate amount of updating and kind of love that you've poured into the library mean over the past, let's say, six-plus months?
Carmine Paolino 02:04
Right. I think the success came from the very beginning. So I didn't expect any of this, by the way. I thought this was mostly a project for myself. I built it because I wanted that kind of interface in order to build my company, which is Chat with Work. And four days later, someone posted it on Hacker News, and it became number one on Hacker News.
Carmine Paolino 02:24
And so from a few hundred stars that I got from me posting it on Reddit, it became 1,700 stars in not even a week. So it definitely skyrocketed at the very beginning. And then it became a normal, steady increase in terms of popularity.
Carmine Paolino 02:40
I think when you do things for yourself and you're really passionate about it and you're really catering to your own needs, there may be somebody else out there that may feel the same. So I'm a true believer of building things for yourself and dogfooding. And I think that has resonated with a lot of people.
Carmine Paolino 03:01
So I built it for myself, and a lot of people agreed that that's a good way to do it, I guess.
Joe Leo 03:05
Yeah. It's a lot of somebodies that have felt like, yeah, I need this thing too, a whole lot. Just checked right before we aired: Ruby LLM has over 3.6 million downloads. So you really are leading something that is momentous in the Ruby community and really in the development community writ large.
Carmine Paolino 03:23
Well, I think that's probably the philosophy that I poured into it that actually became the reason why people really connected with it.
Valentino Stoll 03:32
Before I let you get off onto a tangent,
Valentino Stoll 03:37
I'm really curious, what was missing from all the other gems? I know you said you liked a very particular interface. What continues to drive that interface based on all of the things that are changing with all these models that you see continuing to grow the Ruby LLM project?
Carmine Paolino 03:56
The philosophy is that I think simple things should be simple, and complex things should be possible. Also, models and providers are commodities. So they should be able to be changed at any time. We have new models coming out every three days. And so we need to be able to change them really quickly.
Carmine Paolino 04:16
And we don't want to have yet another way of doing things because these providers, they have their own APIs. They want to kind of lock you in their own APIs. So I'm kind of against that. Then also convention over configuration, which we all know and love. Progressive disclosure was another tenet of my philosophy. And also, look, I'm a solopreneur.
Carmine Paolino 04:36
So I need to be able to build the companies that I want to build just by myself. And I don't want to spend a ton of money on the cloud. So I want to be able to run this on one machine and have thousands of connections at the same time.
Carmine Paolino 04:52
So for me, the last, maybe most important tenet of this philosophy is one API for one person in one machine. So these are the core tenets.
Joe Leo 05:02
That's interesting. Can you talk a little bit about progressive disclosure? Because of the things that you mentioned, I think that might be the least familiar, certainly to me, maybe to others.
Carmine Paolino 05:10
Right. So when you use Ruby LLM and you do RubyLLM.chat.ask, then you have all these with_ methods. That's what I mean by progressive disclosure. So first, you can start with RubyLLM.chat.ask. And without even specifying a model, it will choose the default model. Then you can add the model name.
Carmine Paolino 05:31
Then maybe you want to use another provider. Maybe you want to add some parameters that are specific for that provider. Maybe you want to add a system prompt. So all these things you can add on top of the original call.
Carmine Paolino 05:45
And that makes it so much easier to understand what's going on, instead of having to configure everything from scratch with a massive amount of code or hashes that are really ugly.
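The progressive-disclosure idea can be sketched in plain Ruby. This is an illustrative toy, not the gem's actual internals; the class and method names (`Chat`, `with_model`, `with_instructions`, the default-model constant) are hypothetical stand-ins for the pattern Carmine describes:

```ruby
# Toy sketch of progressive disclosure: every call works with defaults,
# and each with_* step discloses exactly one more knob.
class Chat
  DEFAULT_MODEL = "default-model" # hypothetical default

  def initialize
    @model = DEFAULT_MODEL
    @provider = nil
    @instructions = nil
  end

  def with_model(name)
    @model = name
    self # returning self keeps the calls chainable
  end

  def with_provider(name)
    @provider = name
    self
  end

  def with_instructions(text)
    @instructions = text
    self
  end

  # A real client would call the provider here; we only describe the call.
  def ask(question)
    parts = ["model=#{@model}"]
    parts << "provider=#{@provider}" if @provider
    parts << "system=#{@instructions}" if @instructions
    "asked #{question.inspect} (#{parts.join(', ')})"
  end
end

# Simple things stay simple:
Chat.new.ask("Hi")
# More control appears only when you reach for it:
Chat.new.with_model("claude-sonnet").with_instructions("Be terse").ask("Hi")
```

Each `with_*` call returns `self`, so the chain reads top-down and nothing is mandatory up front.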
Joe Leo 05:56
Yeah. And that's definitely a huge advantage of Ruby LLM, the way that it abstracts that messiness away from the developer. But I am also curious: as you mentioned, a new model is released every three days. Some portion of the people behind those 3.6 million downloads of Ruby LLM want to use that model.
Joe Leo 06:16
So is your job just continuously configuring the interface for a new model behind the scenes?
Carmine Paolino 06:23
Fortunately not, because right now we're ready. Yes, I'm fine. Thanks for asking.
Carmine Paolino 06:31
So we have already 11 providers that we support. And some of these providers support hundreds and hundreds of models, like OpenRouter, for example. Or even the local providers: GPU Stack has Hugging Face and another model repository. So in theory, we support thousands of models.
Carmine Paolino 06:52
So I don't need to do work for each single model, fortunately. The thing that changes all the time is the model registry. But there is an easy way to actually refresh it. And now we support actually saving the new refreshed model registry in a different file. This is a new thing for 1.8. So hello, dev methods.
Joe Leo 07:13
We're on 1.8, but we need the new 1.9 features and goodies. Yeah.
Carmine Paolino 07:19
Sorry, 1.9. Yeah.
Joe Leo 07:20
Oh, it is. Oh, it's new for 1.9. Then yeah. All right. We'll get on it today.
Carmine Paolino 07:24
Then also we support the model-backed model registry, which was from, I believe, 1.8. And so you can actually save all of these new models that come in. I have to say the model registry was one of the first things that I wanted to have in the library, and kind of the fundamental reason why I made all this.
Carmine Paolino 07:45
So you can see it by, for example, calling RubyLLM.chat without even specifying a model. Or, just by specifying a model, it will actually know which provider it is, and so it gives you access to that provider instead of you having to remember which model is from which provider.
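A model registry of the kind Carmine describes can be reduced to a lookup from model name to provider. The entries and helper below are hypothetical, just to show why callers no longer need to remember which model belongs to which provider:

```ruby
# Minimal sketch of a model registry: resolve the provider from the
# model name alone. Entries are illustrative, not the gem's real data.
MODEL_REGISTRY = {
  "gpt-4o"           => :openai,
  "claude-sonnet-4"  => :anthropic,
  "gemini-2.0-flash" => :gemini
}.freeze

def provider_for(model)
  MODEL_REGISTRY.fetch(model) do
    raise ArgumentError, "unknown model: #{model}"
  end
end

provider_for("claude-sonnet-4") # => :anthropic
```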
Valentino Stoll 08:04
That was one thing I really liked about OpenRouter. It was one of the first, I think, to have this provider concept. It's funny because I tried to work with Andrei on the Langchain.rb project to get a provider kind of domain set up. It's hard when you already have an established gem to do some kind of core reworking like this.
Valentino Stoll 08:25
It had some advantage there in that you already knew what was missing and knew what to build first. I'm curious, how do the provider mechanisms work? Is it easy or straightforward to set up? What challenges have you experienced trying to manage all these different APIs?
Carmine Paolino 08:46
It's not super challenging. I think right now it may become a little bit more challenging with the Responses API because it's a completely different concept. So I believe, correct me if I'm wrong, I haven't looked into that deeply. But they store the entire conversations themselves. This is fundamentally different from how we do things in Ruby LLM.
Carmine Paolino 09:04
So I had the concept of we store the conversation ourselves, and we send it all at the same time. It's not that hard to just send the last message, though. So also not that big of a problem. I think the most challenging thing is actually to support every little thing that the providers add that is slightly different.
Carmine Paolino 09:25
And maybe the most challenging thing of all was to actually implement Anthropic prompt caching. Every other provider kind of does prompt caching automatically. You don't even have to set it up. It's great. Anthropic decided that you should have full control over which messages get cached.
Carmine Paolino 09:44
And they have a whole page that you can read about when it happens and why. It's crazy. And I didn't want to have a whole set of hooks and methods and things just for that specific quirk. So it was quite the challenge. There have been a few PRs that wanted to address that.
Carmine Paolino 10:05
And they fundamentally changed a lot of things in the library. So I was like, I'm not so sure about this. What if they decide that this is a wrong idea at some point and they change it? And then we're stuck with all this mess of code to rewrite again.
Carmine Paolino 10:21
So I decided to go a little bit deeper and basically give users the possibility to attach raw content. Because, correct me if I'm jumping too deep into details here, but basically, for Anthropic prompt caching, you need to specify certain things,
Carmine Paolino 10:41
like the cache control, ephemeral, and for how long you want it, inside the block of the content. So you have type, text, then the content, and you have to specify it in that same block. So I had to give users access to that block. And so I invented the concept of raw content blocks,
Carmine Paolino 11:02
which is, I believe it was 1.9 also.
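The raw-content-block escape hatch can be pictured like this. The `cache_control: { type: "ephemeral" }` shape follows Anthropic's documented prompt-caching format; the `raw_text_block` helper is a hypothetical illustration of letting callers attach the provider's literal block rather than adding caching hooks everywhere:

```ruby
require "json"

# Sketch: build a provider-specific content block by hand instead of
# teaching the library about every caching quirk. The cache_control
# shape is Anthropic's documented format; raw_text_block is invented.
def raw_text_block(text, cache: false)
  block = { type: "text", text: text }
  block[:cache_control] = { type: "ephemeral" } if cache
  block
end

message = {
  role: "user",
  content: [raw_text_block("Long system context worth caching...", cache: true)]
}

puts JSON.generate(message)
```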
Joe Leo 11:06
All right. So still, that's not been in production for very long, so probably the jury's still out on how effective it'll be. And what I'm hearing you say is that you're trying to make the design decision that has the smallest amount of impact on the overall library, because you're really just supporting one provider. One very important provider, but one provider nonetheless.
Carmine Paolino 11:27
Yeah. So in general, I think that's the challenge: to not completely mess up the whole library just for little quirks of little providers. And so sometimes you need to sit for a little while with the ideas and think deeply. And then something comes up.
Valentino Stoll 11:43
You make kind of a great point in that Anthropic even has their whole approach to thinking, where you can add thinking blocks. So I guess that's where I wonder about the complexity within Ruby LLM, managing all the APIs. The Responses API comes with a whole bunch of new things that you can't do with other model providers.
Valentino Stoll 12:04
How are you looking at this landscape now that it's beyond just everybody using OpenAI's APIs? Now that the use cases are getting more advanced, what are your plans longer term for supporting these diverging APIs?
Carmine Paolino 12:19
I look at the common denominator. So for example, thinking is going to be something that we're going to look at pretty soon, because a lot of providers now have thinking, and we want to support it in a very straightforward, clean manner, in a way that I would actually enjoy using. And we essentially have two different layers in Ruby LLM.
Carmine Paolino 12:37
So we have the plain old Ruby objects of Ruby LLM in which you have all your representation of the messages, the contents of the messages, the attachments, all of that. And it's in one single format. It's one API. And then you have the provider translation layer.
Carmine Paolino 12:57
So the provider translation layer parses the provider responses and also renders the provider requests. So in order to support all these APIs, we basically need to think: what is the best way to interact with that specific concept? And then implement it inside the
Carmine Paolino 13:18
pure Ruby LLM part of it. And then in the provider translation layer, we implement how to do that. It's as simple as that.
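The two-layer split Carmine describes, one plain-Ruby message format plus per-provider translators, can be sketched as below. The wire shapes are simplified approximations of the OpenAI and Anthropic chat formats, and all names are illustrative:

```ruby
# Layer 1: one plain-Ruby representation of a conversation.
Message = Struct.new(:role, :text)

# Layer 2: provider translation. Each renderer turns the same internal
# messages into that provider's (simplified) wire format.
module OpenAIRenderer
  def self.render(messages)
    { messages: messages.map { |m| { role: m.role, content: m.text } } }
  end
end

module AnthropicRenderer
  # Anthropic takes the system prompt outside the messages array.
  def self.render(messages)
    system, rest = messages.partition { |m| m.role == "system" }
    { system: system.map(&:text).join("\n"),
      messages: rest.map { |m| { role: m.role, content: m.text } } }
  end
end

convo = [Message.new("system", "Be brief"), Message.new("user", "Hi")]
OpenAIRenderer.render(convo)    # one internal format...
AnthropicRenderer.render(convo) # ...rendered per provider
```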
Joe Leo 13:27
Look at the wheels turning for Valentino.
Joe Leo 13:32
There's a lot of unknown usage, I think. I really admire how you and Alex Rudall from the ruby-openai gem have a very similar philosophy of just waiting things out. And it's very impressive, especially given the popularity that your gem has gotten here.
Joe Leo 13:53
Being able to wait probably gets harder and harder. More issues, more pull requests, more people yelling at you on various channels. Why are you not solving this already in this particular way? I know. Yeah. Right? It's fine.
Carmine Paolino 14:11
It's fine.
Joe Leo 14:11
It's not easy, though, I imagine.
Carmine Paolino 14:14
No, it's not easy. That's true.
Joe Leo 14:16
Where do you see that turning point? Is it pretty natural at this point where you see, OK, this now makes sense to implement? At what point do you see, OK, let the floodgates open and just let people implement this stuff? Or do you try and take hold of that design and decide?
Carmine Paolino 14:37
I'm not sure if you guys are going to be satisfied with this answer, but it's gut feeling.
Joe Leo 14:43
I'm satisfied.
Carmine Paolino 14:44
It clicks and it feels right.
Joe Leo 14:45
It's honest.
Carmine Paolino 14:47
There's no process or analysis or Excel spreadsheet or decision tree that I make in order to implement something. It's just: does it feel right? And maybe we can rationalize this by saying it feels right when the API feels right, when it feels like I actually want to use this.
Carmine Paolino 15:09
And also, the complexity is not overblown for something that should really be simple. So as I said, the first tenet of the philosophy is simple things should be simple and complex things should be possible. So that also means that sometimes you just give escape hatches to people and be like, hey, this part you can control.
Carmine Paolino 15:30
But warning, you have to know what you're doing.
Valentino Stoll 15:35
So how much of it at this point is you leading that design or getting influenced by others? Do you see more of it becoming external, with things starting to look right based on what other people are contributing? Or is it coming from just your experience watching stuff grow?
Valentino Stoll 15:55
Do you see one increasing over the other at this point?
Carmine Paolino 15:59
I'm not sure about the future. I'm kind of against predicting the future. So I don't know where I'm going to be in the future.
Joe Leo 16:06
You lean on the LLMs for the next step?
Carmine Paolino 16:08
Yeah.
Joe Leo 16:11
That would be an even worse prediction.
Carmine Paolino 16:15
At the moment, I'm kind of leading the design of things. I certainly get influenced by what people do and what people think, as we all do. We're surrounded by our own environment. And it's kind of like reinforcement learning. So we always get influenced by whatever people are saying, whatever people are doing.
Carmine Paolino 16:31
But I still feel like I want to have the direction of at least the design of things for the foreseeable future. Ultimately, this is kind of like my taste implemented inside code. And it worked out so far. So let's see.
Carmine Paolino 16:50
It may change in the future. We will see how that goes.
Joe Leo 16:54
That's fair. I wanted to bring in a couple of questions from the development teams. Is that all right with you?
Carmine Paolino 17:00
Yeah.
Joe Leo 17:01
Allright. So this one, I think, is relevant based on what we were just talking about. So this is from Steve. So there's been a lot of discussion about multi-agent AI-assisted development lately. And he notes that Swarm recently released version 2 of their SDK. And it's built on top of Ruby LLM.
Joe Leo 17:17
And so the question for you is, what challenges and opportunities do you see with this new development paradigm?
Carmine Paolino 17:24
I'm not a huge believer in multi-agent systems. You can clip that if you want.
Joe Leo 17:30
This is going to be clipped and put on my LinkedIn tomorrow. So please continue.
Valentino Stoll 17:35
I'm with you, too.
Joe Leo 17:36
All right. Yeah, even better.
Carmine Paolino 17:38
There you go. Perfect. So I've been in AI for now a long time. It's been 14 years, if you count also my studies. And the more you add models on top of other models, the more you add errors. So I think that multi-agent systems,
Carmine Paolino 17:58
at least maybe I also haven't used them enough. That's also true. But I cannot believe that they're going to be extremely good at their output. So already, I'm using Codex every day. And I need to really baby it in order to get the kind of output I want.
Carmine Paolino 18:17
Even just a normal chat with ChatGPT or Claude or whatever needs to be babied a lot in order to get the output you want. So I think when you add a lot of agents working in parallel, you have a little bit more accuracy because each agent is doing a specific task.
Carmine Paolino 18:37
But overall, you're using a lot of them. So your surface area increases. And I'm kind of skeptical that that is a viable thing right now. It may be a great thing in the future. And it's great that people are already exploring that and making the best of it. I know that Kieran Klassen from Every,
Carmine Paolino 18:58
for example, is a big fan of that. But at the moment, I don't see this being a really good way of working. What are your thoughts on this, guys?
Joe Leo 19:09
It's difficult to me because I both use it and fight against it at the same time. I guess I both see the benefits and the cons simultaneously, which I think is good. And I think I'm not alone in this.
Joe Leo 19:24
I think breaking things down into distinct agentic units that do very specific things is now manageable. And I commonly liken this to the whole flat organization mentality that DHH originally started championing, where you get a manager of one,
Joe Leo 19:45
and you'll be much more effective, especially if you can multiply that across your smaller organizations. And the problem then becomes just a scalability. How does that scale? As far as development goes, you are just one. If you work within a team,
Joe Leo 20:04
even at a larger organization, you're probably like 10 developers at the most if it's structured right. And so even 10 is manageable. But as it grows, it becomes less manageable. And I think that's even more so the truth with LLMs as contributors.
Joe Leo 20:24
And so the more that you break things apart and distribute it, the harder that becomes to manage. And then you're just managing the distribution. And then how do you worry about quality as things grow, too? And then at what point are you just spending all your time managing all this stuff? Maybe people want to do that versus actual coding.
Joe Leo 20:45
But personally, I don't. But who knows how that'll evolve? Maybe people coming out of college now, they really don't want to program. That could be the case. I'm too far out of the educational domain to kind of know that. So listening on that.
Carmine Paolino 21:02
So maybe I'll just say that because we program in Ruby. And I love Ruby. I think it's so good to program in Ruby. Why would you take it from us?
Joe Leo 21:10
You know, I'm with you. And so unfortunately, they don't teach Ruby in most colleges yet. And so if they did, maybe people would have a different mentality. How many people want to learn TypeScript or Rust or things like that coming out of college? Maybe they do. I don't know.
Joe Leo 21:28
But it seems like they would rather talk to an LLM based on just talking to people new to the industry.
Carmine Paolino 21:36
Yeah. I'm not sure if it's so much about the language design as it is more like we're kind of lazy as people. So I think that's fine that we're lazy.
Carmine Paolino 21:47
But there is still a lot to be done in the LLM space in order to have very good code generation that doesn't skyrocket into some pile of crap.
Joe Leo 22:00
Yeah. Speaking of that specifically then, that's one of my biggest cons of all of this. It's like you ask it to do something very complex. Even you give it a very refined specification of all the details that you want from it. And it's going to go and dump a ton of stuff that you then have to spend time to look through and weed through.
Joe Leo 22:19
And how much of that, if you had just designed upfront and done your own rubber ducking and done the Gary Bernhardt style of development, testing the things that you want to do upfront so you can see where the interface leads, all of that is lost at just pure implementation. Boom. Here you go. Boom. And you have to keep level setting as you go,
Joe Leo 22:41
too, and be like, that's not what I want. That's not what I want. And then it's just like it becomes this whole can of worms of trying to course correct, which if you break it down smaller, maybe isn't a bad thing.
Valentino Stoll 22:53
Well, that's interesting because it sounds to me like both of you guys are describing agentic coding in general, whether or not it's multi-agent. And I'm curious to know, is that because Carmine, you mentioned at the top, OK, well, I'm using Codex every day. And I really have to baby it. And I get that. And we've had people on the show already multiple times that are like,
Valentino Stoll 23:12
yeah, you know maybe excited, maybe frustrated, but are like, yeah, I'm doing this thing every day. And I'm telling Codex or Claude Code to do this or do that. I'm trying to get better at it. But then where does the multi-agent thing come in?
Valentino Stoll 23:24
And I guess maybe for definitely for me, maybe for some people listening, how does that differ from, for example, just the normal tool calling that happens whenever I ask Codex to do something for me?
Carmine Paolino 23:36
Right. So in multi-agent, you would have multiple agents doing different things, and each single thing contributes to the final output. Usually, you would have maybe a master agent. For example, in deep research, you have a lead researcher. It gives out tasks to the smaller researchers, and they do specific things.
Carmine Paolino 23:56
That works really well, to be honest. Deep research is great. But at the same time, you have to think about also the accuracy of the total output. So this is, I think, what we were talking about. And yeah, it is similar to just agentic development. But it kind of skyrockets from there,
Carmine Paolino 24:15
it feels to me, where if there is one piece of wrong information, then it just gets passed along. It could be that you could use multi-agents to verify the output of things. And I think that's the right way of doing it. If you have to do multi-agent, that's probably the student-teacher or researcher-verifier kind of implementation. I think those are the best.
Carmine Paolino 24:36
But how many times do you see that versus I just have a swarm of 100 agents, and they're doing all these things for me. And suddenly, OK, how do you even verify all that code?
Carmine Paolino 24:49
For me, it comes down to: how many times in your life did you enjoy editing code, or reading other people's code, more than you enjoyed writing your own? This is what it comes down to for me. I think writing code is so much more enjoyable,
Carmine Paolino 25:08
so much more fun, than trying to figure out this crazy amount of complicated things that are interlocking between themselves that somebody else wrote. Unless it's written in Ruby and is really beautiful; then I really enjoy it.
Valentino Stoll 25:25
Yeah. I'm curious. So you mentioned deep research being good for multi-agents. And so it makes me think maybe there are specific tasks that are well suited to multi-agent approaches. And so I'm curious, what else do you see, besides research, where a multi-agent approach is beneficial?
Valentino Stoll 25:43
And what tasks have you seen it fall apart where a single-agent approach is more beneficial?
Carmine Paolino 25:50
I think it's all about context management, which is perhaps the most important thing that we should take care of when developing an agent. So by looking at how Claude Code and Codex work, I was actually inspired to change how Chat with Work works, because they do context management really beautifully. So you have first the grep tool,
Carmine Paolino 26:10
the search tool, that would search things in files. And then you will only get that specific line. So you can search in hundreds of files and get hundreds of specific previews of these files. And then the LLM would decide which file to go and look at. So I think with multi-agent systems,
Carmine Paolino 26:31
what you can do is to spread across a bigger task of researching in the entire web. And then each single agent would look at one specific aspect of it, summarize it, and then give that summary back to the lead researcher or to another step in between, who knows?
Carmine Paolino 26:50
And at that point, you would have much more context-efficient usage. And are you guys familiar with context rot?
Valentino Stoll 26:59
Oh, yeah.
Joe Leo 27:00
Yeah.
Valentino Stoll 27:01
I was just listening to Lance Martin on Latent Space Podcast. He was talking about that. Do you want to just refresh the audience on what that is?
Carmine Paolino 27:10
So basically, even though you have models that have 1 million tokens of context, it doesn't mean that they will perform just as well as if they were only using 100K tokens of it. Actually, you can see a pretty steep falloff after, I think it was even 1,000 tokens,
Carmine Paolino 27:32
even for the bigger models, because they start to catastrophically forget things. They start to have poorer performance in just the quality of the answers that they have. I don't know if you guys experience it.
Carmine Paolino 27:44
Also, if you, for example, code for a long time in the same session in Claude Code or Codex, at some point what used to be a fantastic coding sidekick becomes this really messy, unwieldy agent that is not doing what you ask for.
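One crude defense against the context rot described above is refusing to send more than a budget of recent history. The sketch below is a toy, not anything from the gem: a real system would use a proper tokenizer and smarter selection (such as summarizing dropped turns), and the 4-characters-per-token estimate is a rough heuristic:

```ruby
# Toy context-budget trimmer: keep the newest messages that fit under
# a token budget, dropping the oldest first.
def estimate_tokens(text)
  (text.length / 4.0).ceil # rough heuristic, not a real tokenizer
end

def trim_to_budget(messages, budget:)
  kept = []
  used = 0
  messages.reverse_each do |msg|
    cost = estimate_tokens(msg)
    break if used + cost > budget # stop once the next-oldest won't fit
    kept.unshift(msg)
    used += cost
  end
  kept
end

history = ["old context " * 50, "recent question?", "short reply"]
trim_to_budget(history, budget: 20)
# keeps only the newest messages; the oversized old turn is dropped
```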
Joe Leo 28:03
Yeah. But I used to have the same experience with junior programmers when I was pairing with them all day.
Carmine Paolino 28:10
Fair enough. We just overwhelm.
Joe Leo 28:13
Exactly.
Joe Leo 28:15
We should have a vacation for the LLMs.
Valentino Stoll 28:17
That's really funny. Yeah. So I'm curious, on the topic of context engineering, which has become kind of the new thing: where do you see that growing within the Ruby LLM gem? Are you looking for opportunities to try and introduce some design concepts around context management?
Valentino Stoll 28:36
Or are you kind of not even considering that and just let people figure it out like everybody else?
Carmine Paolino 28:42
Not at the moment, except for one thing: when we introduce thinking, I'm already thinking, pun intended, of removing all of the thinking blocks, which I believe is what Codex does. So you see that it's quite context-efficient; it doesn't go up as much as Claude Code does. But apart from that,
Carmine Paolino 29:01
I don't see us implementing a lot of other things, like compacting of the context. Because at that point, we're essentially maintaining a library of prompts. And I'm not sure if your LLM library should prescribe what these prompts should be.
Carmine Paolino 29:20
I think it should be your own kind of creativity and ingenuity in order to make these prompts as best as you can.
Valentino Stoll 29:27
That leads me to my next question, around how you are testing. How are you benchmarking the library? Is there any room there for something new? Speaking of that, we just had Vincent on from DSPy.rb. And one thing I really like about that is it's very evaluation-first.
Valentino Stoll 29:48
And I feel like a lot of that is missed. We're at this kind of inflection point of creativity hitting its limits. We're kind of finding the tasks that the things are doing well at. And so how are you testing that it's doing the tasks that you ask it to specifically do?
Carmine Paolino 30:06
Yeah. Evals are a big thing. They're going to become a big thing when I release the eval framework that I'm thinking about.
Carmine Paolino 30:15
So Chat with Work is now at a point in which the UI works really well. The tools work really well. And I want to actually have a private alpha out with a few selected people. But then after that, we really need to make sure that the responses that they get from their questions are actually really good.
Carmine Paolino 30:34
I know that Kieran has released Leva. But what I want to have is a framework that, first of all, turns the concept of an agent into a class. So you already have progressive disclosure in RubyLLM, which you can then compact into an agent with one method.
Joe Leo 30:53
So you have RubyLLM.chat.ask with a system prompt and with tools and all of that. And you can call that an agent and put it in a method. I think it's nicer to have it in a class. It also makes people think, hey, this is agentic. You can do agents with RubyLLM, which I think some people don't realize. And then you can use these agents,
Joe Leo 31:14
which are essentially a collection of tools and instructions, in the eval framework. So this is kind of how I'm thinking about it. It's a two-step thing. I think the agent class is going to be very, very easy to implement, and the agent framework probably not, but we'll see. This is how I'm going to benchmark it.
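As a rough illustration of the agent-as-a-class idea, here is a minimal plain-Ruby sketch. The `Agent` class, the `EchoClient`, and the `ask` interface are invented for the example; they are not RubyLLM's actual API.

```ruby
# A minimal sketch of "an agent is a class": instructions plus tools,
# compacted behind one method. The Agent class and the client interface
# (anything responding to `ask`) are hypothetical, not RubyLLM's real API.
class Agent
  def initialize(client:, instructions:, tools: [])
    @client = client
    @instructions = instructions
    @tools = tools
  end

  # One entry point: callers never see the prompt or tool wiring.
  def run(message)
    @client.ask(message, instructions: @instructions, tools: @tools)
  end
end

# A stand-in client so the sketch runs offline; a real one would call
# an LLM, for example through RubyLLM's chat interface.
class EchoClient
  def ask(message, instructions:, tools:)
    "[#{instructions}] #{message} (tools: #{tools.join(', ')})"
  end
end

reviewer = Agent.new(
  client: EchoClient.new,
  instructions: "You are a code reviewer",
  tools: %w[read_file grep]
)
puts reviewer.run("Review app/models/user.rb")
```

The point of the class form is exactly what is described above: the instructions and tools travel together, so an eval framework can exercise the agent as a single object.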
Joe Leo 31:31
But if you're asking about benchmarking in terms of the quality of the library, or where it goes next, because I think that was also part of the question, it's based on what I like.
Joe Leo 31:44
No, but to give you a better answer: I don't want to make RubyLLM a framework for a specific type of LLM development.
Joe Leo 31:59
I want it to be an easy-to-use layer to communicate with LLMs at the right complexity level, at the right abstraction level. So it's not too shallow, like the OpenAI SDK, for example, or Alex's ruby-openai, which is a thin layer on top of the APIs.
Joe Leo 32:21
Nor too complicated, as a multi-agent framework would be. I want to be the in-between, where you can simply send a message to an LLM, but you can also do something a lot more complicated in a very easy way. Again, it's the "simple should be simple, complex should be possible" kind of way.
Joe Leo 32:42
So in my talk at EuRuKo, and in the upcoming keynote at SF Ruby, I have a slide with a four-agent multi-agent system in RubyLLM, in 20 lines, so it fits on a slide. I don't remember exactly what it was for. I think it was some summarization of something.
Joe Leo 33:02
I think it was: look at this code and give me a security review and a code quality review and all of that. It was 20 lines of code. So if you can do that, I think I'm happy.
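The fan-out described above might be sketched roughly like this in plain Ruby. The reviewer prompts and the `ask` callable are stand-ins for real LLM calls, not the actual slide code:

```ruby
# Illustrative sketch of a small multi-agent review pipeline: several
# specialist "agents" look at the same code, and a coordinator step merges
# their reports. The `ask` lambda stands in for a real LLM call
# (for example, something built on RubyLLM's chat interface).
REVIEWERS = {
  security: "Review this code for security issues",
  quality:  "Review this code for code quality",
  style:    "Review this code for style problems"
}.freeze

def review(code, ask:)
  reports = REVIEWERS.map do |name, prompt|
    [name, ask.call("#{prompt}:\n#{code}")]
  end.to_h
  # Coordinator step: summarize the specialist reports into one answer.
  summary = ask.call("Summarize these reviews:\n#{reports.values.join("\n")}")
  { reports: reports, summary: summary }
end

# Stub "LLM" so the sketch runs offline; swap in a real client in practice.
fake_llm = ->(prompt) { "(response to: #{prompt.lines.first.strip})" }
result = review("def pay(user) ... end", ask: fake_llm)
puts result[:summary]
```

With a real client behind `ask`, this is the kind of thing that fits in about 20 lines, which is the point being made above.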
Carmine Paolino 33:15
First, that's very exciting for the upcoming SF Ruby. And also, I think that's the right philosophy for RubyLLM, to stay generic and be a library that can be used in many different ways. I still think there probably is an opportunity there for exactly what you were talking about at the top, which was a library for context.
Carmine Paolino 33:36
But I don't actually think that should be RubyLLM's job. I think that's probably another library's job. And I think DSPy kind of gets at it, as a framework for building that. But I also think all of us developers are, and probably should not be, creative writers right now. And I think the sooner somebody comes along and says,
Carmine Paolino 33:55
no, no, this is actually the best way, these are the words you use when you're doing some generic task, the better it will be for everybody. Then we can get out of the creative writing game and back to something much closer to programming.
Joe Leo 34:07
Yeah. I think this is the struggle of a developer approaching machine learning. I don't know if you guys ever used classical machine learning. In essentially 2009, I learned Rails, because I wanted to, in university, during my bachelor's. And then a couple of years after,
Joe Leo 34:27
I was doing my master's in AI. And I was using Python, learning all these other things about machine learning. And it was such a shock going from the deterministic way of thinking of a developer, like, if I do this, then this happens, to, oh, I train a model, and who knows what happens?
Carmine Paolino 34:47
Right.
Joe Leo 34:47
Yeah. And you have to do all this vague testing. And you have these metrics that don't really match the business goals. And oh, my god, how do you combine them together? So I think this is a different version of that kind of feeling that we're having right now.
Joe Leo 35:03
But now, instead of interacting with a model where essentially you give it some numbers and it spits some numbers out, it feels like it's a whole person that you're talking to. And it has a personality and a style and a way of thinking. So it feels a little bit different. I think we're never going to have a deterministic way of doing that.
Joe Leo 35:24
And each single model has its own style and personality. I don't know if you guys can see the quotes.
Carmine Paolino 35:30
Yeah, yeah.
Joe Leo 35:33
And we should kind of approach it that way. And I think, of course, we all want to have like, oh, what is the best way of doing it? But I don't know if we're going to find it. You have to have it per model and do a lot of testing. And that sucks, Vincent.
Carmine Paolino 35:48
Yeah, it's funny.
Joe Leo 35:49
Yeah, fair point.
Carmine Paolino 35:51
The training data is so important for task selection, right?
Joe Leo 35:56
Yeah.
Carmine Paolino 35:57
So the single-agent thing.
Joe Leo 36:00
I feel like.
Carmine Paolino 36:00
Yeah, I know. And you mentioned traditional machine learning, where data quality is like the most important thing you could possibly imagine. And so if you're looking at an open-source model, all you're really looking at is what data was used to train this thing.
Carmine Paolino 36:13
I feel like that was an unlock for me, realizing that there's so much quality data on Hugging Face that you can just download, that's just available. And I feel like once people realize there's quality data out there, and that you can use it even with these less deterministic models to help course-correct,
Carmine Paolino 36:32
I feel like there's a lot of open holes to fill in that kind of crossover.
Carmine Paolino 36:39
Now that machine learning people are, I won't say losing jobs, but transitioning to the LLM world, I feel like there are a lot of gaps around, well, how do we use those traditional methods to improve the determinism of these specific tasks? But I wanted to.
Joe Leo 36:56
Yeah. I've been in machine learning enough to say that there's no way to make it completely deterministic.
Carmine Paolino 37:01
Oh, no way, right?
Joe Leo 37:03
I know.
Carmine Paolino 37:03
But you can greatly improve the prediction accuracy, by percentage points, by orders of magnitude. You can course-correct for some tasks, especially if you have the data.
Joe Leo 37:17
Yeah, and then you can do re-evaluation. So this is also why evaluation is the clear next step for RubyLLM. Even in traditional machine learning, you would do so many tests before putting a model in production. There's an entire industry around this: Weights & Biases and MLflow and all of these frameworks just to do experiment tracking.
Joe Leo 37:39
And the reason is that you do a lot of experiments. The problem now is that, well, we can't train these models anymore, because they cost millions of dollars to train. So what can you do? Well, you can do a little bit of fine-tuning, if you have a lot of data and you really want a specific style of responding.
Joe Leo 37:59
But is it going to be game-changing for you? Not sure. And also, it should not necessarily be used for learning new concepts. Instead, it's all about context engineering. So these are the two tunables that you have. And unfortunately or fortunately, probably unfortunately, all these models are different.
Joe Leo 38:21
So for each model, you need a slightly different prompt that works just as well as the other ones. So if you have a certain benchmark that you need to hit in terms of precision, then you need different prompts and different tools and different descriptions of those tools. And all the things that you pass to the model need to be a little bit different in order to maximize their usage.
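In practice, the per-model tuning described here can be as simple as keying prompts and tool descriptions by model family. A minimal sketch, with invented model names and wording:

```ruby
# Minimal sketch of per-model prompt and tool-description tuning: the same
# logical task, phrased differently per model family. The model-family
# keys and the wording are illustrative, not recommendations.
PROMPTS = {
  "claude" => {
    system: "You are a precise assistant. Think step by step.",
    search_tool: "Search the document corpus. Use exact keywords."
  },
  "gpt" => {
    system: "You are a helpful assistant. Be concise.",
    search_tool: "Full-text search over documents. Prefer short queries."
  }
}.freeze

# Look up the tuned text for a given model name, e.g. "claude-3-sonnet".
def prompt_for(model, key)
  family = PROMPTS.keys.find { |k| model.start_with?(k) }
  raise ArgumentError, "no prompts tuned for #{model}" unless family
  PROMPTS.fetch(family).fetch(key)
end
```

A benchmark harness can then sweep each family's variants against the same precision target, which is the workflow being described.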
Carmine Paolino 38:42
Yeah, totally. That's another thing you can see on Hugging Face too: you explore a model and it tells you how to use it, at least in its model card, which I feel like people just don't even look at anymore, because they're either using Anthropic or using OpenAI for the most part. But yeah, I mean, there are other options out there.
Carmine Paolino 39:02
I'm excited to see the smaller models kind of take over as an aggregate. I don't know if we'll see that anytime soon. I am super interested in this topic, but I want to leave time for your whole, I guess, bit on why async in Ruby is just like a fantastic leveraging point for your library,
Carmine Paolino 39:22
but more importantly for Ruby's position in AI, because I feel like you were so spot-on in your article on the future of AI apps with Ruby. So what's the high-level, 10,000-foot view of where you see this going? You mentioned the async framework as being a great positional point.
Carmine Paolino 39:43
With Tenderlove's, Aaron Patterson's, talk recently, he was talking about Ractors providing that full parallelism. Where do you see this kind of growth opportunity? And what should people be focusing on when they're building their AI systems?
Joe Leo 39:58
Right. So the 10,000-foot view is: if you're doing I/O-bound operations, you should use async in Ruby. If you're doing CPU-bound operations, you should use Ractors. You can use threads too. That's fine. But they come with overhead. So why would you do that?
Carmine Paolino 40:20
Right.
Joe Leo 40:20
That's it. That's pretty much the bit. No, so I can give you a little bit more context. Threads are used a lot in Ruby, and the background job processors use the concept of a maximum number of workers. And why do they do that?
Joe Leo 40:39
Because threads also consume a little bit more memory, especially virtual memory, actually a lot more virtual memory. I think it's 8 megabytes of virtual memory per thread, versus 32 kilobytes or 4 kilobytes for a fiber.
Joe Leo 40:55
It's in my article somewhere. So substantially different. In actual RAM it's not that different; you don't consume that much RAM, so it's kind of equal. That's fine. But the context switching is, I think, 20 times more expensive. So, long story short, you need to cap your threads at a maximum.
Joe Leo 41:16
Otherwise, you're going to blow up the virtual memory budget, or you're going to have way too many context switches, and then your machine slows down to a crawl. With async, you run everything in a single thread. And essentially, it's cooperative concurrency, which means that each fiber has to yield to the other fibers.
Joe Leo 41:38
But they yield automatically at I/O operations. So you don't really have to do anything, unless you're doing something really specific. What that means is that it's possible now to create, and in fact, Samuel Williams, who is the creator of the whole async ecosystem, has also created an Active Job adapter for async jobs.
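The cooperative model being described can be illustrated with plain core Fibers. This hand-rolled round-robin is only a teaching sketch: the async gem's fiber scheduler does the yielding automatically whenever a fiber blocks on I/O, whereas here we yield by hand to make the interleaving visible.

```ruby
# Cooperative concurrency in miniature: each fiber runs until it yields,
# then the next one gets the CPU. In async, yields happen automatically
# at I/O points; here we call Fiber.yield explicitly.
log = []

workers = %w[a b].map do |name|
  Fiber.new do
    3.times do |i|
      log << "#{name}#{i}"   # pretend this is one step of an I/O-bound task
      Fiber.yield            # hand control back, as async does at I/O
    end
    :done
  end
end

# A tiny round-robin scheduler: keep resuming fibers until all report :done.
pending = workers.dup
until pending.empty?
  fiber = pending.shift
  status = fiber.resume
  pending << fiber unless status == :done
end

p log  # steps from both workers interleave: a0, b0, a1, b1, a2, b2
```

Because fibers are so cheap compared to threads, a scheduler like this can juggle thousands of them, which is what makes the no-fixed-worker-pool model below possible.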
Joe Leo 41:59
And now you don't need to have a maximum number of fibers in there, which means that you can run a lot more concurrent LLM or any I/O-bound operations at the same time. So that's where the one-machine part of the one-API, one-person, one-machine comes in.
Joe Leo 42:17
You can run thousands and thousands of concurrent operations and LLM conversations at the same time. Because otherwise, you will block one single worker for each single LLM conversation that you're having. And these LLM conversations can take 60 seconds, five minutes, God knows.
Joe Leo 42:37
If you put the thinking budget too high, probably, I don't know, 10 minutes, 20 minutes, probably will time out. So take care of your timeouts. But then you're having one slot occupied for that specific conversation. And then you have one less slot. And you shouldn't have an enormous amount of threads,
Joe Leo 42:57
even in Solid Queue or in all of these other wonderful, but also not LLM-optimized, background job processors.
Carmine Paolino 43:06
Yeah, that makes a ton of sense. I guess, how do you think about the scale of things? Do you find yourself needing to group certain tasks that you know take longer than others in certain queues? Do you do any of that kind of planning? Or do you just throw things in the active job and let it figure itself out?
Joe Leo 43:24
No, there's no prioritization or anything. At the moment, there's one single queue in Chat with Work. And the reason is that I moved away from another way of doing this.
Joe Leo 43:36
If you don't know Chat with Work, maybe your audience doesn't, it's basically an agent that communicates with your Google Drive and Notion and Slack and GitHub and stuff. For the moment, it's only Google Drive. And in the past, that used to be like a normal RAG system.
Joe Leo 43:53
So I would synchronize all your files and index them and have embeddings and vector search and all of that. And that required two queues, because you have the queue for fetching stuff and the queue for the conversations. But right now, it's actually much more similar to Claude Code.
Joe Leo 44:08
That's why I call it Claude Code for your documents, because it essentially has a grep tool and a search tool and a read tool and an update tool. So it doesn't need to fetch your whole data set in the background. So it doesn't really clog my queues too much, or at all.
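A rough sketch of what one such grep-style tool could look like as a plain Ruby object. In RubyLLM, a tool like this would typically be wrapped in a `RubyLLM::Tool` subclass with a description and parameters; the class and method names here are invented for the sketch.

```ruby
# Illustrative grep-style tool over local documents, in the spirit of the
# "Claude Code for your documents" approach described above: instead of
# pre-indexing everything into a vector store, the agent searches on demand,
# and only the matching lines go back into the context.
class GrepDocuments
  def initialize(root)
    @root = root
  end

  # Returns matching lines as "path:line_number: text", like `grep -n`.
  def call(pattern:)
    regex = Regexp.new(pattern, Regexp::IGNORECASE)
    Dir.glob(File.join(@root, "**", "*.txt")).flat_map do |path|
      File.readlines(path).each_with_index.filter_map do |line, i|
        "#{path}:#{i + 1}: #{line.strip}" if line.match?(regex)
      end
    end
  end
end

# Usage: the model decides to call the tool with a pattern, and only the
# hits enter the conversation, keeping the context small.
# tool = GrepDocuments.new("/path/to/docs")
# tool.call(pattern: "quarterly revenue")
```

Since each call is a quick on-demand read rather than a bulk sync, nothing long-running sits in a background queue, which is the point being made about not clogging the queues.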
Joe Leo 44:28
This is maybe specific to my use case, but you should probably use multiple queues if you have different use cases, yeah.
Carmine Paolino 44:36
You know, it's funny you mentioned that read, update, kind of the CRUD actions of the Claude Code style. I feel like I've seen that growing in the Ruby Discord channel, at least, that style of, I won't even call it agentic coding, but.
Joe Leo 44:51
But it's the style. It's doing CRUD updates, doing CRUD with AI.
Carmine Paolino 44:57
Yeah, pretty much. Yeah, pretty much. Just giving it a Rails app to use.
Joe Leo 45:06
Yeah, it turns out we're all CRUD monkeys, even LLMs.
Carmine Paolino 45:08
Right, LLMs, yeah.
Joe Leo 45:11
Yeah.
Carmine Paolino 45:12
Here's another question, from one of my developers, on the topic of Chat with Work. Where have you found RubyLLM to be specifically very good at solving technical tasks? And where have you seen it actually maybe come up short?
Joe Leo 45:26
Well, it's been developed within Chat with Work. So it's pretty good at what it does.
Carmine Paolino 45:31
So it's always perfect for the task. Yeah. That's good. That's an advantage.
Joe Leo 45:37
What's it very good at? I don't know. The thing that I enjoy the most is the Rails integration, just because it has the exact same API as the plain-old-Ruby-objects API. I don't need to think so much when I'm coding a new prompt, or I have another agent or something like that.
Joe Leo 45:59
It's very easy. It's just Chat.create, and then you have the same exact parameters that you can pass to RubyLLM.chat. And then .ask is the same. So that part I enjoy a lot. Not cluttering my brain with bullshit, that's the thing that I like a lot.
Carmine Paolino 46:16
Yeah.
Joe Leo 46:16
Because I think we need to focus more on the context engineering and prompt engineering and all of these tasks which are really important to make our products better, instead of thinking about what the response format of a specific provider is for a specific thing that I do, like streaming, for example.
Carmine Paolino 46:39
That's good to hear.
Joe Leo 46:40
Falling short? Yeah, I don't really have a good answer. Is there anything that you guys think is falling short?
Carmine Paolino 46:48
I don't have anything here. If I did, I would tell you. But I don't. I think we're generally very happy to be using it.
Joe Leo 46:54
Just the responses API.
Carmine Paolino 46:57
The responses API.
Joe Leo 46:59
It's not there yet. It's not there yet.
Carmine Paolino 47:00
I know. I know.
Joe Leo 47:02
I also don't find too much of a reason to have a Responses API too soon, because it seems to me that we're going to lose audio support by switching to that, which is really strange. And the only thing that we will gain is o4, I believe, and a couple of other models that are Responses-API-only. But so far,
Joe Leo 47:22
OpenAI has actually provided backwards compatibility through the Chat Completions API for pretty much everything. It's going to come at some point. Don't worry. I can see you're disappointed.
Carmine Paolino 47:37
The biggest thing to me is the reasoning boosts that you can get out of it, especially in a world where there's compliance risk and you need to take advantage of their encrypted reasoning tokens to feed back into the model. There are just some optimizations that they have built in now.
Joe Leo 47:56
I'm not aware of that.
Carmine Paolino 47:57
Yeah.
Joe Leo 47:57
Actually, they're not.
Carmine Paolino 47:58
Yeah. There's a lot of great stuff kind of just built into the new APIs that give you advantages for specific things.
Joe Leo 48:04
Yeah, that's what I was looking forward to.
Carmine Paolino 48:06
Yeah, put it in the issue folder.
Joe Leo 48:08
I'll open an issue. Yeah.
Carmine Paolino 48:10
Yeah, they know. I'll be like, hey, look what you're missing, Carmine.
Carmine Paolino 48:17
I'm just messing around. You can still use the Chat Completions API for most of it.
Joe Leo 48:22
Yeah, yeah, that's kind of the point. But we will get there. We will get there. Maybe soon.
Carmine Paolino 48:26
I'm definitely in the advanced category. So I'm already taking advantage of a lot of the reasoning models and the thinking and doing things maybe that it's not even ready to do in a lot of ways, hoping that the bitter lesson catches up with me and I could just delete half my code base. But it's slow and steady.
Joe Leo 48:45
Yeah, it's fantastic. It's not so bitter when you think of it that way.
Carmine Paolino 48:49
Yeah, right? Except it will just keep generating. So by the time I've deleted half of it, it's generated another half.
Joe Leo 48:57
That's true.
Carmine Paolino 48:59
Net new. Well, you know, we've been talking about a lot here, Carmine. Thank you so much for coming on and for creating this fantastically designed RubyGem. I know a lot of Rubyists out there are just kind of over the moon with it, for good reason. It follows all the things that we want out of a library.
Joe Leo 49:18
Yeah, and tack on to that, it also follows all the things that we know and love about the Ruby community, which is somebody like yourself building this. And yeah, I get that it solved the problem for you. But you kept building it, kept releasing it out in the open. And it's grown to be this thing that so many people love. And I have a tremendous amount of respect for that and for you.
Carmine Paolino 49:36
It's really nice to hear that.
Joe Leo 49:37
Keep it up.
Carmine Paolino 49:38
I'm going to keep following along and using it more and more. I've only just started. I have this habit or bad habit, good habit, I don't know, a habit of repurposing. We have this little podcast admin Rails app that does our intake and prepares for episode launches.
Carmine Paolino 49:55
And I swap out the thing that runs it every time we have a new guest on that has some gem.
Carmine Paolino 50:03
It was on DSPy. Now it's on RubyLLM.
Joe Leo 50:06
Awesome.
Carmine Paolino 50:07
Just kind of like a personal experiment. But it's pretty painless, straightforward. All the components just swap right out. So yeah, I think it speaks volumes about your library; it's just very well designed.
Joe Leo 50:17
Thank you, guys.
Carmine Paolino 50:18
Anyway, if people want to reach out to you, find out what you're up to, what channels should they be looking at?
Joe Leo 50:24
I post on X quite frequently. So I guess that's one of the channels, probably the main one. It's @paolino, P-A-O-L-I-N-O. Also, my GitHub is @crmne, which is Carmine without the vowels, except the last one, for some reason. And then my website is paolino.me.
Joe Leo 50:46
So that's where you can find perhaps longer blog posts and stuff.
Carmine Paolino 50:49
Yeah, definitely check that out. There's some really good stuff on there.
Joe Leo 50:53
We'll link that in the show notes and make sure that people can follow along with all this async stuff, because to me, that's the most fascinating part of this. It just drops right in.
Carmine Paolino 51:04
Yeah, awesome.
Joe Leo 51:07
Maybe tomorrow. We'll see.
Carmine Paolino 51:09
Hey, all right.
Joe Leo 51:10
Excellent.
Carmine Paolino 51:13
Well, it was really a pleasure to talk with you guys.
Joe Leo 51:16
Yep, fantastic.
Carmine Paolino 51:18
If there's anything you want to share, last, we have this little short segment at the end, things you want to point people to if you want to.
Joe Leo 51:25
No, just go make Ruby AI a big thing. I think we have the tools. I don't think we should be shy anymore about actually using AI, because AI has changed, right? AI is not building models anymore. AI is building apps. And who are the best people to build apps? I think it's Ruby and Rails developers.
Carmine Paolino 51:45
I agree.
Joe Leo 51:48
We're not biased at all.
Carmine Paolino 51:49
At all.
Carmine Paolino 51:55
I mean, there are two people here who have successful businesses that they're operating in this space. And I work for a very large one myself, at Gusto. So who cares if it scales or not, right? I think we've all proven that it does. And at this point, it makes money, so.
Joe Leo 52:13
It does. And there's room for everybody. That's absolutely true.
Carmine Paolino 52:16
Exactly. Yeah. Awesome. Hope to see you guys at Test of Ruby. If not, it's been a pleasure.
Joe Leo 52:22
Yeah, it was great, Carmine. Thanks. Take care.
Carmine Paolino 52:25
Bye.
Joe Leo 52:26
Bye-bye.