The Complete Guide to Building AI Agents for Beginners

In a few years, businesses will hire AI agencies composed entirely of AI agents.
It will be absolutely normal for us to go to a software development shop that contains only one or two people.
For example, this AI lab called Cognition recently released the first AI software engineer, named Devin, that outperforms everything else we have ever seen on the SWE-bench benchmark. It can literally train its own AI models, learn unfamiliar technologies, contribute to production repos, and even complete some side hustles on Upwork.
But what many people don't realize is that the comparison on their chart is made between their agent and standard large language models like Claude or GPT-4, while Devin has access to additional tools like a terminal, a code editor, and even its own browser.
So all of this is literally just a cleverly prompted LLM with a bunch of tools, and this lab has already raised more than 20 million dollars in funding.
Personally, I don't believe that they're heading in the right direction and I'll explain exactly why later.
But what this really shows is that we're merely scratching the surface of what's possible here.
So in this video,
I will share with you my entire experience developing custom AI agent systems for companies of all sizes ranging from
small firms of 5 to 10 employees to corporations with 30,000 plus people.
In fact,
by the end of this video,
you'll be able to build your own fully functional social media marketing agency that will generate ad copy, create ad images with DALL·E 3, and reliably post them on Facebook.
Here's the game plan.
We'll start with an overview of this new AI agent developer role and what it entails.
Next we'll unravel what AI agents truly are.
After that we'll take a tour of the most popular AI agent frameworks at your disposal.
Then I'll be pulling back the curtain on my own framework giving you an insider's
perspective on how it works and how you can leverage it in your own projects.
And we'll get hands-on as we build a fully functional social media marketing agency ready to take on new clients and generate profits.
This will be a comprehensive guide, highlighting my entire process from start to finish.
So make yourself comfortable and let's dive in.
First, let me define this new AI agent developer role and why I believe it will be one of
the most in-demand skills in 2024.
Well, numerous studies and industry experts predict that we're headed towards full labor automation in the next decade.
While I totally agree with this projection, I don't think it will be a self-driven process.
As AI models become increasingly intelligent, they're certainly going to gain a broader understanding of the world.
However, they will never know how a specific company operates internally, simply because such data is rarely made public.
As we saw in 2023, businesses don't just want to incorporate standard large language models into their processes.
They want to customize them and, at the very least, enrich them with their own data.
The reason I believe labs like Cognition will soon fail is that they lack customization. To fully automate a company like Google, you need more than just a super-intelligent AI developer.
We need to make sure that this developer has access to all the necessary tools, infrastructure, and internal knowledge before it can actually perform any tasks.
This is where AI agent developers come in.
So, an AI agent developer is someone who fine-tunes AI agents based on internal business processes. Their primary responsibility is to equip the AI with all the necessary resources and then ensure it knows how and when to use them in production.
The primary skills required for an AI agent developer role can significantly vary from project to project.
This topic deserves its own separate video by itself, so if you're interested, please let me know in the comments.
Soon I'll walk you through exactly how to accomplish all of this but for now we need to understand what AI agents truly are.
A lot of people say that AI agents are just instructions, knowledge, and actions, and that's sort of true, but that's not exactly what AI agents are. That's how we make AI agents. In fact, AI agents are much more than that.
Let me explain.
To understand what AI agents truly are, we need to unpack the difference between standard 1.0 AI automations and more sophisticated 2.0 AI agent-based applications.
Picture a straightforward customer support automation where an LLM must label each incoming email and must respond to it, pulling some additional context from a vector database.
Does this sound like an agent or a mere AI automation?
You probably noticed that it doesn't quite feel like an agent, right?
But why?
It has knowledge from your vector database, it has some instructions on how to respond, and it performs an action of attaching a label.
And the distinction lies in the fact that I said that it must generate a label and must answer each email. You see, the fundamental difference between automations and agents is that agents possess decision-making capabilities.
So, in 1.0 AI automations, every single procedure, like context retrieval, response generation, and labeling is hardcoded into the backend logic.
This means that it literally cannot deviate from this logic no matter what.
If the automation is tasked with responding to emails, it cannot neglect to respond.
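To make this contrast concrete, here is a minimal sketch of such a 1.0 pipeline. All the helper functions are hypothetical stand-ins for the real retrieval, generation, and labeling calls; the point is simply that every step is hardcoded and always runs in a fixed order:

```python
# A hypothetical 1.0 email automation: every step is hardcoded and always runs.
def retrieve_context(email: str) -> str:
    return f"context for: {email}"          # stand-in for a vector-database lookup

def generate_reply(email: str, context: str) -> str:
    return f"reply to '{email}' using {context}"  # stand-in for an LLM call

def classify(email: str) -> str:
    return "support"                        # stand-in for an LLM labeling call

def handle_email(email: str) -> tuple[str, str]:
    context = retrieve_context(email)       # always runs
    reply = generate_reply(email, context)  # always runs
    label = classify(email)                 # always runs, even for a partnership inquiry
    return reply, label                     # the pipeline can never deviate
```

A 2.0 agent-based version would instead hand the model the retrieval and labeling functions as tools and let it decide which, if any, to call.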
And while this rigidity works well for certain use cases, it completely fails as soon as unexpected circumstances arise. Imagine, for example, that your customer support mailbox receives an inquiry about a potential partnership with your platform.
If this scenario wasn't accounted for in a 1.0 AI automation,
it would handle it like any other support inquiry, potentially causing a missed opportunity.
On the contrary, 2.0 AI agent based applications have a different approach.
While they still equip the AI agent with the necessary tools, context, and instructions, they grant the agent autonomy over how to utilize these tools by itself.
Instead of feeding your context into a prompt on every request, you empower the agent to retrieve it only when it's needed.
This flexibility means that the agent can adapt accordingly.
So in our previous example,
the agent would recognize that it's dealing with an inquiry outside of its expertise and then it could use other available tools if possible.
For example, it could reach out to your human support agent, or it could send a notification in Slack.
Overall, what AI agents truly are is a new way of thinking about how to apply AI in various applications.
It's a paradigm shift rather than a simple technique.
Even in my agency, we all began with simple 1.0 AI automations.
But as my clients saw the tangible benefits they offered, they yearned for more, more advanced capabilities, and automation of increasingly complex tasks.
Over time we reached a stage where I wouldn't even call it automation anymore.
It's more akin to outsourcing, as some of the processes we automated literally required multiple people to carry them out manually, and their performance was never the same.
Now, having said all this, where do agent swarms come in? To truly grasp the concept of agent swarms, it's crucial to understand that all intelligence is environment-dependent.
For instance, I might excel when it comes to programming but I'm utterly lost when it comes to cooking.
I would not last a day as a cook, even at McDonald's.
I basically eat meat and nothing else.
This applies to both AI agents and your own employees.
You can't assign 10 different roles even to the smartest person in the world. Likewise, even when we reach GPT-100, I would still not recommend assigning so many different responsibilities to a single agent.
Firstly, by removing all of this unnecessary context for a given process, you simply save on tokens.
And secondly, even if GPT-100 wouldn't get confused handling 10 different roles, the users of such a system certainly would.
So what agent swarms really allow you to do is separate responsibilities across different environments, just like in real-world organizations.
This results in three main benefits.
First, it dramatically reduces hallucinations. I found that after you add 7 to 10 tools to a single GPT-4 agent, it starts to get confused.
But when you split those tools into multiple agents, you almost completely eradicate this problem.
Secondly, you can outsource much more complex tasks, because the longer the sequence of your agents is, the more tasks
they can handle without direct supervision.
And lastly, it makes the whole system much easier to scale. You see, most of my clients don't stop at a single AI agent integration and often try to automate increasingly complex processes over time. So when the need arises, instead of adjusting your existing system and then debugging it all over again, you can simply add another agent and leave all the previous agents as they are.
In fact, this last problem of scaling is so common among my clients that this week we are releasing a first-of-its-kind AI-agents-as-a-service subscription. Basically, if you're a business owner, you now pay us a fixed fee per month and we will develop as many AI agents as you need, but we will work on them one at a time.
Our goal is to provide a flexible and scalable solution that grows with your needs.
So if you're interested, you can apply right now using the link below at a temporarily discounted price.
However, if you're inclined to take on this journey by yourself, that's perfectly fine too.
Because next, I'm going to walk you through my entire process from start to end.
But before we get into the nitty-gritty, let's start with a brief overview of all the multi-agent frameworks at your disposal. The first project is the one you've probably heard of before, called AutoGen by Microsoft.
The main feature of AutoGen is multi-agent chats.
It was developed as a research experiment and was quite groundbreaking at the time. However, the problem with AutoGen is that it has extremely limited conversational patterns that are super hard to customize.
If you look at AutoGen's code, the next speaker is determined with an extra call to the model that emulates roleplay between the agents.
Let me just read it to you.
"Read the above conversation, then select the next role from the agent names; only return the role."
I mean, not only is this extremely inefficient, it also makes the whole system absolutely uncontrollable.
A lot of people report that agents constantly hallucinate because there is no clear separation of concerns when it comes to tool execution.
One agent might write the code, but because it needs to be executed by a user proxy or some other agent, this often results in errors, which is a huge problem in production.
The next framework, which has recently been getting a ton of attention, is called CrewAI.
CrewAI was developed as a side project, and it introduces the concept of processes into agent communication. This provides some semblance of control over the communication flow. However, just like in AutoGen, the conversation flows are extremely limited, offering only sequential or hierarchical options.
In the sequential process, basically all your agents communicate to each other one by one.
And in the hierarchical, there is one manager agent that communicates to everyone else.
Obviously, this is not how real organizations are structured.
Additionally, in CrewAI the manager agent is hardcoded for you, which for some reason people find cool.
However, imagine if you want this agent to first search the web or get additional context before deciding who it should speak next to.
Try doing that in CrewAI.
The bigger problem with CrewAI, however, is that it was built on top of LangChain, which was released before any function-calling models.
This means that there is no automatic type checking or error correction when it comes to tool execution.
The descriptions for these tools are also extremely limited.
Recently, CrewAI introduced a way to overcome this by extending a base tool class. However, this process is definitely not as straightforward as it could have been.
The backstory, the role, and the tasks that you need to define when creating your crew are simply prompt templates that also take away control from you as a developer. Without these prompt templates, CrewAI simply would not be able to function. The only advantage of CrewAI is that you can use it with open-source models.
Now, I personally would never utilize any of these frameworks in production for my clients, which is why I developed my own framework called Agency Swarm.
In this framework, there is not a single hard-coded prompt; it is easily customizable with uniform communication flows, and it is extremely reliable in production because it provides automatic type checking and validation for all tools with the Instructor library.
It is the thinnest possible wrapper around the OpenAI Assistants API, which means that you have full control over all your agents.
So whether or not you add a manager agent and define goals and processes, and whether you create a sequential or hierarchical flow or even combine both in a communication tree that is 50 levels deep, I don't care. It is still going to work.
Your agents will determine who to communicate with next based on their own descriptions and nothing else.
The disadvantage of my framework is that it is fully based on the OpenAI Assistants API, and to answer your question right now: no, we're not going to support any open-source models.
So now let me answer this question: why the Assistants API for AI agent development? Well, that's a good question, because if you look at all the previous OpenAI endpoints, you'll find that the Assistants API isn't different in almost any way.
It does provide a couple tools like a code interpreter and retrieval and you can upload files, but it's not that big of a deal.
However, it was a game changer for me as an AI agent developer.
And the reason for this is state management. You see, with the Assistants API, you can attach instructions, knowledge, and actions directly to each new agent.
This not only allows you to separate various responsibilities, but also to scale your system seamlessly without having to worry about any underlying data management or about your agents confusing tools, as happens in all other frameworks. Agent state management is the primary reason why my framework is fully based on the OpenAI Assistants API.
If costs are a concern for you, simply use GPT-3.5 Turbo, which is much better than any open-source model you can run locally unless you spend $10,000 on your computer. And if data privacy is a concern, you can easily use it with Azure OpenAI, which doesn't even share data with OpenAI itself.
To get started creating your agent swarms using my framework, you need to understand three essential entities: agents, tools, and agencies. Agents are essentially wrappers around assistants in the Assistants API.
They include numerous methods that simplify the agent creation.
For instance, instead of manually uploading all your files and adding their IDs when creating an assistant, you can just specify the folder path.
The system will then automatically attach all files from that folder to your assistant.
It also stores all your agent settings in a special settings.json file. Therefore, if your agent's configuration changes, the system will automatically update your existing assistant on OpenAI rather than creating a new one.
The most commonly used parameters when creating an agent are name, description, instructions, model and tools.
These are all pretty self-explanatory; there are no preset templates for goals, processes, backstories, etc. You simply include all of them in the instructions. Additional parameters include the files folder, schemas folder, and tools folder.
As mentioned, all files from your files folder will be automatically indexed and uploaded to OpenAI.
All tools from your tools folder will be attached to an assistant as well,
and all OpenAI schemas from your schemas folder will be automatically converted into tools allowing your agents to easily call third party APIs.
Additional properties for API parameters and headers are also available if your API requires authentication.
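As a rough sketch, declaring an agent might look like the following. The parameter names mirror the Agent class as I've described it here, but they may differ between versions, so double-check them against the current Agency Swarm documentation; the agent name and model are just examples:

```python
def make_ad_copy_agent():
    # Assumes the agency-swarm package is installed; parameter names
    # follow its Agent class and may vary across versions.
    from agency_swarm import Agent

    return Agent(
        name="AdCopyAgent",
        description="Writes compelling ad copy for Facebook campaigns.",
        instructions="./instructions.md",  # goals, processes, backstory all live here
        files_folder="./files",            # files are auto-uploaded and indexed
        tools_folder="./tools",            # tools are auto-attached to the assistant
        schemas_folder="./schemas",        # OpenAPI schemas auto-converted into tools
        model="gpt-4-1106-preview",
    )
```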
However, I do recommend creating all tools from scratch using Instructor, as it gives you a lot more control.
I previously posted an in-depth tutorial on Instructor, which includes a brief conversation with its creator, Jason Liu.
Check it out if you're interested.
In essence, Instructor allows you to integrate a data validation library called Pydantic with function calls. This ensures that all agent inputs actually make sense before any actions are executed, minimizing production errors.
For instance, if you have a number division tool, you can verify that the division is not by zero.
If it is, the agent will see the error and automatically correct itself before executing any logic.
To begin creating tools in Agency Swarm with Instructor, create a class that extends BaseTool, add your class properties, and implement the run method. Remember, the agent uses the docstring and all field descriptions to understand when and how to use your tool. So for our number division tool, the docstring should clearly state that this tool divides two numbers, and we also have to describe all parameters accordingly.
The next step is to define your execution logic within the run method.
You access all defined fields through the self-object.
To make some fields optional, use the Optional type from Pydantic. To define the available values for a field, use a Literal or Enum type.
There are also many cool tricks that you can do.
For instance,
you can add a chain of thought parameter inside the tool,
which will save you on talking costs and
latency because the agent will only plan its actions when using this tool instead of on every request if you use this prompt globally.
To add your validation logic use field or model validators from pandemic.
In this division tool example it makes sense to add a field validator that checks if the division is not by zero returning an arrow if it is.
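Since BaseTool builds on Pydantic, the validation pattern can be sketched with plain Pydantic models; the DivideNumbers class, its chain-of-thought field, and the error message below are illustrative, not the framework's own code:

```python
from pydantic import BaseModel, Field, field_validator

class DivideNumbers(BaseModel):
    """Divides two numbers and returns the result."""  # the agent reads this docstring

    chain_of_thought: str = Field(
        "", description="Think step by step to plan the division."
    )  # planning happens only when this tool is used, saving tokens elsewhere
    numerator: float = Field(..., description="The number to be divided.")
    denominator: float = Field(..., description="The number to divide by.")

    @field_validator("denominator")
    @classmethod
    def not_zero(cls, v: float) -> float:
        if v == 0:
            # The agent sees this message and corrects itself before any logic runs.
            raise ValueError("Division by zero is not allowed; pick a non-zero denominator.")
        return v

    def run(self) -> float:
        return self.numerator / self.denominator
```

In the real framework you would extend BaseTool instead of BaseModel, but the fields, docstring, validator, and run method work the same way.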
Because tools are arguably the most important part of any agent-based system, I created this custom GPT to help you get started much faster.
Say for example, I need a tool that searches the web with SERP API.
As you can see,
it instantly generates a BaseTool subclass with parameters like query as a string and num_results as an integer, including all relevant descriptions. You can find the link to this tool on our Discord.
The last component of the Agency Swarm framework is the agency itself, which is essentially a collection of agents that can communicate with one another.
When initializing your agency, you add an agency chart that establishes communication flows between your agents.
In contrast to all other frameworks, communication flows in Agency Swarm are uniform, meaning that you can define them in any way you want.
If you place your agents in a top-level list inside the agency chart,
these agents can communicate with the user,
and if you add your agents together inside the second-level list, these agents can communicate with one another.
So, to create a basic sequential flow, add a CEO agent to the top-level list, then create second-level lists connecting the CEO with the developer and the developer with the virtual assistant. In this flow, the user communicates with the CEO, who communicates with the developer, who in turn communicates with the virtual assistant.
If you prefer a hierarchical flow, simply place those agents into two separate second-level lists with the CEO.
Remember, the communication flows in Agency Swarm are directional. So in our previous example, the CEO can initiate communication with the developer, who can then respond in this chat, but the developer cannot initiate communication with the CEO, much like in real organizations. If you still want the developer to assign tasks to the CEO, simply add another list with the developer first and the CEO second.
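Conceptually, the agency chart is just nested lists. This sketch stands in for the real thing by representing agents as strings and computing which directional communication pairs a chart allows; in the actual framework you would pass Agent instances to the Agency class instead:

```python
def allowed_channels(agency_chart):
    """Return the (caller, callee) pairs permitted by an agency chart.

    Top-level entries can talk to the user; inside a second-level list,
    the first agent can initiate communication with each later one.
    """
    pairs = set()
    for entry in agency_chart:
        if isinstance(entry, list):
            caller = entry[0]
            for callee in entry[1:]:
                pairs.add((caller, callee))  # directional, like real organizations
        else:
            pairs.add(("user", entry))
    return pairs

# Sequential flow: user -> CEO -> Developer -> VA
sequential = ["CEO", ["CEO", "Developer"], ["Developer", "VA"]]

# Hierarchical flow: the CEO talks to both subordinates directly
hierarchical = ["CEO", ["CEO", "Developer"], ["CEO", "VA"]]
```

To also let the developer assign tasks to the CEO, you would add another list with the developer first: `["Developer", "CEO"]`.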
I always recommend starting with as few agents as possible and then adding more agents only after the previous ones are working as expected.
Advanced parameters inside the agency class include async mode, threads callbacks, and settings callbacks; these are useful when deploying your agent swarms on various backends.
Be sure to check out our documentation for more information.
To run your agency, you have three options: use the Gradio interface with the demo_gradio method, the terminal version with the run_demo method, or get a completion with the get_completion method, which is similar to all previous chat completions APIs.
Now let's create our own social media marketing agency together so you can see my entire process from start to finish.
Alright, for those who are new here, please install or upgrade Agency Swarm using the command pip install --upgrade agency-swarm. To get started quickly, I usually run the agency-swarm genesis command. This will activate the Genesis agency, which will create all your agents for you. Of course, it doesn't get everything right just yet, but it does speed up the process significantly.
In my prompt, I'm just going to say that I need a Facebook marketing agency that generates ad copy, creates images with DALL·E 3, and then posts them on Facebook.
As you can see, we now have our initial agency structure with three agents.
The ad copy agent, the image generator agent, and the Facebook manager agent.
I like how the Genesis Agency has divided these responsibilities among three different agent roles.
However, I'd like to adjust the communication flows a bit and adopt a sequential flow.
So I will instruct the Genesis CEO accordingly.
Now as you can see, we have a sequential agency structure with three communication levels.
So now I'm going to proceed with the creation of those agents.
This process does take some time so I'm going to skip this point and come back when we are finally ready to fine-tune our agents.
After all our agents have been created,
you can see that the CEO tells me that I can run this agency with the Python agency.py command.
All the folders from my tools and agents are also displayed on the left.
So the next step is to test and fine tune all these tools.
We'll start with the image generator agent.
The Genesis Agency has created one tool for this agent called image generator.
It's impressive how close this tool is to what I was actually planning to implement myself.
As you can see, it uses OpenAI to generate an image from a simple image prompt. It takes the ad copy, theme, and specific requirements and inserts them into this prompt template.
Yes, my friends, AI has just learned to prompt itself.
However, there is an issue.
It uses an outdated OpenAI package version with the Davinci Codex model, which is actually designed for code generation. So let's fix this now together.
First, I'll load a new OpenAI client with a convenience method from agency_swarm.util. I'll also increase the timeout, because image generation can take some time. After that, I'll adjust the API call to use the new DALL·E 3 model, and then I'll set the timeout back to the default.
There is one more thing that we have to do.
We have to ensure that our agents can actually use this image when posting the ad.
So what I'm going to do next is create a new save-image method that will save this image locally. But here's the kicker: I do not want my agents to pass this image path to one another, because if any hallucinations occur, it could cause some issues.
Instead I'll save this path to a shared state.
Essentially, shared state allows you to share certain variables in all tools across all agents.
So instead of having the agent manually pass the image path to another agent, you can simply save it in one tool and access it in another.
You can also perform some complex validation logic across various agents, which I'll show you soon.
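Conceptually, shared state is just a key-value store visible to every tool. This little stand-in class illustrates the set/get pattern; it is a teaching sketch, not the framework's actual implementation:

```python
class SharedState:
    """Minimal stand-in for Agency Swarm's shared state between tools."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


shared_state = SharedState()

# Inside the image generator tool's run method:
shared_state.set("image_path", "./ad_image.png")

# Inside the ad creator tool, no agent ever passes the path around:
image_path = shared_state.get("image_path")
```

Because no agent message ever carries the path itself, a hallucinated path can never reach the posting tool.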
Now we are finally ready to test this tool.
You do this by adding a simple if name equals main statement at the end and then initializing the tool with some example parameters.
After that you can print the result of the run method.
Don't forget to load the environment with your OpenAI key by calling the load_dotenv method at the top.
As you can see, we now have this image generated and saved locally as expected.
We can now proceed with adjusting the next tool, which is the ad copy generator tool within the ad copy agent.
This tool is also very similar to how I would personally design it myself,
so I'll just adjust the prompt a bit and save the results into the shared state.
Moving on to the next Facebook manager agent, the Genesis agency decided to create two tools for us.
The first one is the ad performance monitor tool, and the second one is the ad scheduler and poster tool. While these tools are also quite close, creating an ad on Facebook requires a few more steps. Specifically, we need to first create a campaign and an ad set before we can post the ad. I will use the tool creator custom GPT to request two additional tools: the ad campaign starter tool and the ad set creator tool.
To run these tools, we first need to install the Facebook Business SDK, which you can do with this pip command.
Next we need to create our Facebook app.
Go to the Facebook developer website and click Create App. Select Other for the use case, then Business for the app type, add your app name, and click Create App. Then click Add Product and add the Marketing API.
Go back to app settings, copy your app ID, app secret, and insert them into the environment file.
Now we have to get our access token by visiting the Facebook Graph API Explorer and adding the following permissions. After that, simply copy the token and also put it into the .env file.
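As a hedged sketch of the campaign-starter logic: the function below takes an already-initialized ad account object (with the real facebook_business SDK, that would be an AdAccount obtained after calling FacebookAdsApi.init with your app ID, secret, and token). The objective value is an assumption on my part, so check Meta's current docs for the one matching your goal:

```python
def start_campaign(account, name: str):
    """Create a paused campaign on a facebook_business AdAccount-like object."""
    return account.create_campaign(params={
        "name": name,
        "objective": "OUTCOME_TRAFFIC",  # assumed objective; pick one for your goal
        "status": "PAUSED",              # paused so nothing spends while testing
        "special_ad_categories": [],     # required field; empty for ordinary ads
    })
```

Creating campaigns as PAUSED is a deliberate safety choice: you can inspect everything in Ads Manager before any budget is spent.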
Working with Facebook API can be a challenge as it is known to be one of the most complex APIs out there.
So I won't delve into the details of how I fine tune these tools.
The process is extremely similar.
You simply adjust the tool then you test it and you repeat until it works as expected.
One tool we do have to check out however is the ad creator tool.
As you can see, in this tool we're actually utilizing the ad copy, headline, and image path from the shared state that we saved earlier.
I have also included a model validator that checks the presence of all these necessary parameters.
If one of the parameters is not defined,
the system throws a value error and instructs the agent on which tool it needs to use first.
This approach significantly enhances the reliability of the entire system as it ensures that the Facebook manager agent cannot post any ads until all the required
steps like image generation have been completed.
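The validator pattern can be sketched with plain Pydantic. For simplicity, the required values are modeled here as fields, whereas in my actual tool they are read from the shared state; the AdCreator name and error messages are illustrative:

```python
from typing import Optional
from pydantic import BaseModel, model_validator

class AdCreator(BaseModel):
    """Posts the assembled ad; refuses to run until all prerequisites exist."""

    ad_copy: Optional[str] = None
    headline: Optional[str] = None
    image_path: Optional[str] = None

    @model_validator(mode="after")
    def check_prerequisites(self):
        # The error text tells the agent exactly which tool it must run first.
        if not self.image_path:
            raise ValueError("image_path is missing: ask the image generator "
                             "agent to create the image first.")
        if not self.ad_copy or not self.headline:
            raise ValueError("ad copy or headline is missing: run the ad copy "
                             "generator tool first.")
        return self
```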
So as you can see,
for example, if the image path is missing, I'm instructing the Facebook manager agent to first instruct the image generator agent to generate this image.
After successfully testing all our tools, the final step is to refine the instructions. It's good practice to include how specifically your agents should communicate with each other and also to specify an exact step-by-step process for them to follow.
Lastly, I decided to make a few adjustments to the communication flows.
First, I'd like to establish a direct line of communication with our Facebook manager agent, so I'll include it in the top-level list.
Also, I'll allow our CEO agent to communicate directly with both the Facebook manager agent and the image generator agents.
Now that we have made these adjustments, we're ready to run our agency.
It is as simple as running the Python agency.py command and then opening the provided Gradio interface link in your browser.
So let's see how it works.
I'll kindly ask for an advertisement to be created for my AI development agency, called Arsen AI.
You can see the CEO immediately instructs the ad copy agent, which promptly provides a clear headline and ad copy for my agency, stating "Revolutionize your business with AI."
Next, the CEO commands the image generator agent to create an image for the ad copy, resulting in this futuristic visual for our campaign. The CEO then directs the Facebook manager agent to commence the campaign using the campaign starter tool.
It then creates an ad set and executes the ad creation function, posting this ad on Facebook.
You can see this newly generated Facebook ad, complete with ad copy, headline, and image, live on my Facebook account.
Impressive, right?
But what if you want to analyze your campaign's performance?
Well, you can do this by directly messaging the Facebook manager agent as it was included in the top-level list.
As you can see, it now uses the ad performance monitor tool and then informs me that, unfortunately, there is currently no data, because it takes some time for an ad to reach its audience.
In conclusion, I'd like to briefly share my roadmap for this framework.
First, I plan to establish multi-agency communication.
This feature will allow the integration of multiple agencies for super complex use cases.
Next, I'll focus on enhancing the Genesis agency, because with multi-agency communication, the Genesis agency will be able to test other agencies during their creation. The goal is to reach a point where there is almost no need to modify the tools or instructions for simple agencies like the one we've just created.
And we will continue to regularly update this framework to include the latest releases of the OpenAI Assistants API. With upcoming features like memory and website browsing, the possibilities are exciting, to say the least.
So stay connected on our Discord, where we provide some additional resources and where you can always find help from other members developing custom agent systems with my framework.
Additionally, we're always on the lookout for new talent. So if you're interested and you have previous experience building agent swarms using my framework, reach out through our Job Postings channel.
Lastly, if you would like to see more examples of non-standard AI agent integrations, make sure to check out this video next, where I deploy a production-ready GitHub code analysis agency that runs entirely on the backend using GitHub Actions.
Thank you for watching and don't forget to subscribe.