LLVM, a toolkit used to build and optimize compilers.
Building a programming language from scratch is hard.
You have humans who want to write code in a nice simple syntax, then machines that need to run it on all sorts of architectures.
LLVM standardizes the extremely complex process of turning source code into machine code.
It was created by grad student Chris Lattner at the University of Illinois,
and today it's the magic behind clang for C and C++, as well as languages like Rust, Swift, Julia, and many others.
Most importantly, it represents high-level source code in a language-agnostic code called Intermediate Representation, or IR.
This means vastly different languages, like CUDA and Ruby, produce the same IR,
allowing them to share tools for analysis and optimization before they're converted to machine code for a specific chip architecture.
A compiler can be broken down into three parts.
The front end parses the source code text and converts it into IR.
The middle end analyzes and optimizes this generated code.
And finally, the back end converts the IR into native machine code.
To build your own programming language from scratch right now, install LLVM, then create a C++ file.
Now envision the programming language syntax of your dreams.
To make that high-level code work,
you'll first need to write a lexer to scan the raw source code and break it into a collection of tokens,
like literals, identifiers, keywords, operators, and so on.
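Here's a rough sketch of what such a lexer might look like in C++. It's a minimal illustration, not the code from any real compiler: the `TokenKind` categories, the `tokenize` function, and the two keywords `def` and `return` are all assumptions made up for a toy language.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Token categories for a hypothetical toy language.
enum class TokenKind { Number, Identifier, Keyword, Operator };

struct Token {
    TokenKind kind;
    std::string text;
};

// Hypothetical keyword set; a real language would define its own.
inline bool isKeyword(const std::string& word) {
    return word == "def" || word == "return";
}

inline bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
inline bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_'; }

// Scan the raw source left to right and break it into tokens.
std::vector<Token> tokenize(const std::string& source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        char c = source[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;  // whitespace only separates tokens
        } else if (isDigit(c)) {
            std::size_t start = i;
            while (i < source.size() && (isDigit(source[i]) || source[i] == '.')) ++i;
            tokens.push_back({TokenKind::Number, source.substr(start, i - start)});
        } else if (isAlpha(c)) {
            std::size_t start = i;
            while (i < source.size() && (isAlpha(source[i]) || isDigit(source[i]))) ++i;
            std::string word = source.substr(start, i - start);
            tokens.push_back({isKeyword(word) ? TokenKind::Keyword : TokenKind::Identifier, word});
        } else {
            // Anything else is treated as a single-character operator.
            tokens.push_back({TokenKind::Operator, std::string(1, c)});
            ++i;
        }
    }
    return tokens;
}
```

Calling `tokenize("def x = 42 + y")` yields six tokens: the keyword `def`, the identifier `x`, the operator `=`, the number `42`, the operator `+`, and the identifier `y`.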
Next, we'll need to define an abstract syntax tree to represent the actual structure of the code and how different tokens relate to each other,
which is accomplished by giving each node its own class.
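As a sketch of the "one class per node" idea, the tree could look something like this. The class names loosely follow the naming used in LLVM's Kaleidoscope tutorial, but the exact fields and the `nodeCount` helper are assumptions added here just to show how a tree gets walked.

```cpp
#include <memory>
#include <string>
#include <utility>

// Base class that every node in the tree derives from.
struct ExprAST {
    virtual ~ExprAST() = default;
    // Illustrative helper: number of nodes in this subtree.
    virtual int nodeCount() const { return 1; }
};

// A numeric literal such as 2.0.
struct NumberExprAST : ExprAST {
    double value;
    explicit NumberExprAST(double v) : value(v) {}
};

// A reference to a named variable such as x.
struct VariableExprAST : ExprAST {
    std::string name;
    explicit VariableExprAST(std::string n) : name(std::move(n)) {}
};

// A binary operation; it owns its two operand subtrees.
struct BinaryExprAST : ExprAST {
    char op;
    std::unique_ptr<ExprAST> lhs, rhs;
    BinaryExprAST(char o, std::unique_ptr<ExprAST> l, std::unique_ptr<ExprAST> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    int nodeCount() const override { return 1 + lhs->nodeCount() + rhs->nodeCount(); }
};
```

The expression `x + 2` then becomes a `BinaryExprAST` with `op == '+'`, a `VariableExprAST` on the left, and a `NumberExprAST` on the right, for three nodes in total.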
Third, we need a parser to loop over each token and build out the abstract syntax tree.
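One common way to write that loop is a recursive descent parser. The sketch below is a simplified stand-in, not the parser from the video: it assumes the tokens arrive as plain strings, supports only `+` and `*`, and uses a single hypothetical `Node` struct instead of one class per node to keep things short.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Minimal tree: a node is a leaf (number or identifier) or a binary operation.
struct Node {
    std::string text;                // leaf text, or the operator symbol
    std::unique_ptr<Node> lhs, rhs;  // both null for leaves
};

class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    // expression := term ('+' term)*
    std::unique_ptr<Node> parseExpression() {
        auto left = parseTerm();
        while (peek() == "+") {
            std::string op = next();
            auto right = parseTerm();
            left = makeBinary(op, std::move(left), std::move(right));
        }
        return left;
    }

private:
    // term := primary ('*' primary)*, so '*' binds tighter than '+'
    std::unique_ptr<Node> parseTerm() {
        auto left = parsePrimary();
        while (peek() == "*") {
            std::string op = next();
            auto right = parsePrimary();
            left = makeBinary(op, std::move(left), std::move(right));
        }
        return left;
    }

    // primary := number | identifier
    std::unique_ptr<Node> parsePrimary() {
        if (pos_ >= tokens_.size()) throw std::runtime_error("unexpected end of input");
        std::string tok = next();
        if (tok.empty() || !std::isalnum(static_cast<unsigned char>(tok[0])))
            throw std::runtime_error("expected a number or identifier, got: " + tok);
        auto leaf = std::make_unique<Node>();
        leaf->text = tok;
        return leaf;
    }

    static std::unique_ptr<Node> makeBinary(const std::string& op,
                                            std::unique_ptr<Node> l,
                                            std::unique_ptr<Node> r) {
        auto node = std::make_unique<Node>();
        node->text = op;
        node->lhs = std::move(l);
        node->rhs = std::move(r);
        return node;
    }

    // Look at the current token without consuming it ("" at end of input).
    std::string peek() const { return pos_ < tokens_.size() ? tokens_[pos_] : ""; }
    // Consume and return the current token; callers check bounds via peek().
    std::string next() { return tokens_[pos_++]; }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};
```

Feeding it the tokens `a`, `+`, `b`, `*`, `2` produces a tree whose root is `+`, with `a` on the left and a `*` node on the right holding `b` and `2`. Splitting the grammar into `parseExpression` and `parseTerm` is what gives multiplication its higher precedence.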
If you made it this far, congratulations, because the hard part is over.
Now we can import a bunch of LLVM primitives to generate the intermediate representation.
Each type in the abstract syntax tree is given a method called codegen, which always returns an LLVM Value object used to represent a static single assignment (SSA) register, which is a
variable for the compiler that can only be assigned once.
What's interesting about these IR primitives is that, unlike assembly, they're independent of any particular machine architecture, and that dramatically
simplifies things for language developers, who no longer need to match the output to a processor's instruction set.
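To make the codegen idea concrete without pulling in the LLVM headers, here's a toy version that emits LLVM-flavored text by hand. This is only an approximation under loud assumptions: in real LLVM code, `codegen` would return an `llvm::Value*` and use an `IRBuilder` (for example `CreateFAdd`) rather than building strings, and the `Emitter` struct, the `%t0`-style names, and the integer-only arithmetic are all invented for this sketch.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for LLVM's builder: collects emitted lines
// and hands out fresh register names so nothing is assigned twice.
struct Emitter {
    std::vector<std::string> lines;
    int nextId = 0;
    std::string fresh() { return "%t" + std::to_string(nextId++); }
};

struct ExprAST {
    virtual ~ExprAST() = default;
    // Returns the name of the value holding this expression's result.
    virtual std::string codegen(Emitter& out) const = 0;
};

struct NumberExprAST : ExprAST {
    int value;
    explicit NumberExprAST(int v) : value(v) {}
    // Constants need no instruction; they are used inline as operands.
    std::string codegen(Emitter&) const override { return std::to_string(value); }
};

struct VariableExprAST : ExprAST {
    std::string name;
    explicit VariableExprAST(std::string n) : name(std::move(n)) {}
    std::string codegen(Emitter&) const override { return "%" + name; }
};

struct BinaryExprAST : ExprAST {
    char op;
    std::unique_ptr<ExprAST> lhs, rhs;
    BinaryExprAST(char o, std::unique_ptr<ExprAST> l, std::unique_ptr<ExprAST> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}

    std::string codegen(Emitter& out) const override {
        // Generate the operands first, then one instruction for this node.
        std::string l = lhs->codegen(out);
        std::string r = rhs->codegen(out);
        std::string opcode;
        if (op == '+') opcode = "add";
        else if (op == '*') opcode = "mul";
        else throw std::runtime_error("unsupported operator");
        std::string dest = out.fresh();  // a brand-new register, assigned exactly once
        out.lines.push_back(dest + " = " + opcode + " i32 " + l + ", " + r);
        return dest;
    }
};
```

Running `codegen` on the tree for `(x + 2) * 3` leaves two lines in the emitter, `%t0 = add i32 %x, 2` followed by `%t1 = mul i32 %t0, 3`, and returns `%t1`. Notice that nothing in that output mentions a specific processor.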
Now that the front end can generate IR, the opt tool is used to analyze and optimize the generated code.
It makes multiple passes over the IR and does things like dead code elimination and scalar replacement of aggregates.
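To give a feel for what a pass like dead code elimination does, here's a toy version over a made-up instruction list. It is not LLVM's actual pass: the `Instr` struct, the `eliminateDeadCode` function, and the short list of side-effecting opcodes are assumptions, and it only handles straight-line code with no branches.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: dest = opcode operands...
struct Instr {
    std::string dest;                   // register defined, empty if none
    std::string opcode;
    std::vector<std::string> operands;
};

// Instructions that must be kept even if nothing reads their result.
inline bool hasSideEffects(const Instr& instr) {
    return instr.opcode == "ret" || instr.opcode == "store" || instr.opcode == "call";
}

// Drop every instruction whose result is never used.
std::vector<Instr> eliminateDeadCode(const std::vector<Instr>& code) {
    std::set<std::string> live;  // registers known to be needed later
    std::vector<Instr> kept;
    // Walk backwards: in straight-line SSA code every use comes after its definition,
    // so by the time we reach a definition we already know whether it is needed.
    for (auto it = code.rbegin(); it != code.rend(); ++it) {
        if (hasSideEffects(*it) || live.count(it->dest) > 0) {
            live.insert(it->operands.begin(), it->operands.end());
            kept.push_back(*it);
        }
    }
    std::reverse(kept.begin(), kept.end());  // restore original order
    return kept;
}
```

Given `%t0 = add %x, 2`, `%t1 = mul %x, 3`, `%t2 = mul %t0, 4`, and `ret %t2`, the pass removes the `%t1` line because nothing ever reads it, and keeps the other three in order. The single-assignment property is what makes this check so simple.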
That brings us to the back end, where we write a module that takes IR as input and emits object code for any architecture LLVM supports.
Congratulations, you just built your own custom programming language and compiler in 100 seconds.
Hit the like button and subscribe if you want to see more short videos like this.
Thanks for watching, and I will see you in the next