WEBVTT

00:00:00.000 --> 00:00:21.595
You just built a voice AI agent. It works. Then the bill shows up, and you're paying for the LLM, the voice, the phone call, and then another platform fee on top of that. That's not even the worst part. The worst part: you still don't really own the system. Today, I'll show you Dograh, an open source Vapi alternative you can self-host, inspect, and control.

00:00:26.555 --> 00:00:31.730
Voice AI nowadays can look somewhat simple from the outside. Take a phone call,

00:00:32.050 --> 00:00:34.850
turn speech into text, send it to the LLM,

00:00:35.010 --> 00:00:48.095
turn the answer back into speech, and it's done. That's easy, right? Well, as any of us who've tried this know, not really, because real calls are messy. People interrupt, people go silent, they change topics,

00:00:48.415 --> 00:00:51.055
they can ask really weird questions.

00:00:51.215 --> 00:00:55.775
Your agent needs to call APIs and when it breaks, you need to know why.

00:00:56.095 --> 00:01:37.000
That is where most voice AI projects become a pain. A voice agent is not just ChatGPT with a phone number. It is a live system with a bunch of moving parts, right? That's speech to text, the LLM, text to speech, state, tool calls, a boatload of other things. You get it. There are a lot of moving parts that we don't actually see happening. And when the call fails, "the bot gave a bad answer" is not enough. Was it the prompt? Was it the model? What was it? Why did it fail? And this is where Dograh comes in. If you enjoy coding tools that speed up your workflow, be sure to subscribe. We have videos coming out all the time. Alright. Now, let's look at this in practice. I'm gonna start locally, because if a tool says it's built for devs,

00:01:37.160 --> 00:01:53.135
I wanna see Docker before anything else. This was super easy to spin up. I'm gonna clone it from GitHub, cd into the folder, and then I just have to run docker compose up. That's simple enough for us. Once the containers are running, we can jump into the Dograh UI.

00:01:53.295 --> 00:02:05.390
Now I'll build a simple lead qualification agent. So what do I mean by that? Someone's gonna call in, the agent asks what they wanna build, then it asks about the company, the size, the budget,

00:02:05.630 --> 00:02:16.925
small things like this. It'll then call an API tool to create or update a CRM lead, if we wire that in. And maybe I could even say that if the lead is qualified, the call transfers to a human.

00:02:17.405 --> 00:02:19.645
So I add a prompt node,

00:02:20.365 --> 00:02:21.885
then a qualification step,

00:02:22.830 --> 00:02:24.670
then an API tool call

00:02:24.910 --> 00:02:27.630
and then I can add a branch and a transfer.

00:02:28.110 --> 00:02:41.545
There's no custom orchestration code yet, and that's kind of the point here. This looks like a no-code canvas, but for devs, and the value is not the no-code part. The value is not wasting code trying to tie everything together.

00:02:41.865 --> 00:02:44.425
Now, let's try to run a test call here.

00:02:44.825 --> 00:02:46.905
Hi. This is Sarah from inbound calls.

00:02:48.090 --> 00:02:49.210
Are you still there?

00:02:49.930 --> 00:02:55.370
We're looking for an AI phone agent for inbound demo requests.

00:02:55.610 --> 00:03:00.250
That's great. I can definitely help you with that. To make sure I connect you with the right solution,

00:03:00.705 --> 00:03:06.385
could you tell me a little more about what you're looking to achieve with an AI phone agent for your inbound demo requests?

00:03:06.705 --> 00:03:12.305
Let's say around twenty thousand minutes. Thanks for sharing that. And what is your company size and industry?

00:03:12.705 --> 00:03:16.465
Now we can see the transcript here. We can see the trace.

00:03:17.080 --> 00:03:22.600
We can see the tool call that actually happened and we can see the state changes.

00:03:22.840 --> 00:03:35.165
Plus, here's the recording, which I wanted in the first place. That is what I want as a dev: not just "the bot worked." I wanna know why it worked, and when it fails, I want evidence of what actually happened.

00:03:35.245 --> 00:03:47.050
So what is Dograh? Dograh appears to give us three different things out of all this. We get a voice engine, a visual workflow builder, and the platform layer you usually have to build yourself.

00:03:47.290 --> 00:03:54.250
The voice engine is the part that connects the caller, the phone provider, speech to text, the LLM, and text to speech.

00:03:54.490 --> 00:03:56.730
That is what makes the call actually happen.

00:03:57.245 --> 00:04:05.725
The workflow builder is where you design the logic of this whole system, so instead of hard coding every prompt, branch, API call and transfer,

00:04:05.725 --> 00:04:10.925
you can map out the flow visually. So, huge win here. I like these kinds of maps. Ask this question,

00:04:11.600 --> 00:04:20.080
wait for the answer. That's kind of what we're mapping out here. I can call this API, branch here, transfer there. That kind of logic should be easy to change.

00:04:20.560 --> 00:04:24.800
Then, on top of all this, there's the platform layer: testing, tracing, recordings, analytics,

00:04:25.125 --> 00:04:29.045
that is the boring stuff every serious voice project eventually

00:04:29.045 --> 00:04:29.765
needs.

00:04:30.005 --> 00:04:34.725
With all this, you can bring your own providers, your own LLM and your own TTS.

00:04:34.965 --> 00:04:56.365
Because Dograh is open source, you can inspect the code, change how it works, and self-host it. As of this recording, GitHub stars are low, so this is a super new find, but it's honestly a rather cool one. Now let's compare Dograh to other things we already have out there. You have three main ways to build voice agents. First is hosted platforms: Vapi, Bland, Retell.

00:04:56.445 --> 00:05:00.205
These are good when you wanna move fast and you don't wanna run infrastructure.

00:05:00.445 --> 00:05:10.140
You get clean dashboards, APIs, transcripts, testing tools. All of that is really useful, but you start to lose control right there. If the platform changes pricing,

00:05:10.140 --> 00:05:12.780
you deal with it. If the platform changes limits,

00:05:13.260 --> 00:05:16.540
you deal with it, right? If you need a custom deployment,

00:05:16.620 --> 00:05:27.135
anything like that, again, you might hit a wall. Hosted tools are fast though, so I guess that's a win. Then you have some of these raw frameworks, like the ones I came across: Pipecat,

00:05:27.215 --> 00:05:28.175
Vocode,

00:05:28.735 --> 00:05:32.335
LiveKit I think is one of them. These give you a lot more control,

00:05:32.575 --> 00:05:34.250
you can build almost anything.

00:05:34.490 --> 00:05:38.410
But now, you're building everything around this framework,

00:05:38.650 --> 00:05:56.965
auth, UI, workflow editor, so that's a big trade-off with using things like that. Now, Dograh is still very new, but it's here, and I think their bet is kinda simple. What if you could use a visual voice agent builder without giving up self-hosting, provider choice, tracing, and control?

00:05:57.365 --> 00:06:08.560
That's what this appears to be. Write code where code matters, use the builder where your flow matters, inspect the runtime when things break, and swap providers when costs change.

00:06:08.880 --> 00:06:12.000
Self-hosting gives us a lot of control, which is huge.

00:06:12.320 --> 00:06:24.520
Vapi, Bland, and Retell are best for fast hosted deployment, but the trade-off is cost, lock-in, and less control. If you enjoy coding tools like this, be sure to subscribe to the Better Stack channel. We'll see you in another video.
