MBodi: Demo at AI Camp Demo Day

I'm really, really excited for you guys to see what they've been able to achieve in the last three months.
So welcome to the stage Xavier from MBodi.
All right.
Hi, everyone.
I'm Xavier, one of the co-founders of MBodi.
So, I believe most of you have watched these movies; this is what we believe robots should be like.
At the very minimum, we can teach them something.
They learn stuff, right?
But in reality, these are the common robots, the common commercially available robots.
I would say they are great at what they do.
But the question I keep asking myself is: in this AI world, why have robots been left behind?
Why?
Why?
Because robots can't just learn.
They are either trained or scripted to do exactly one thing in one environment, and even a small change in detail makes them useless.
And robot data is scarce and expensive to collect.
Right now, robot datasets rely on very heavy teleoperation, remote control, to collect data.
A great example is the Open X-Embodiment dataset, which is quite famous; it was collected over 18 months by 21 institutions.
It's very expensive, and transfer learning doesn't work: if we train a model to make a robot do some task, then change to a different robot, different hardware, it doesn't work.
And even a camera angle change, like your eyes looking from a different angle, breaks things.
Plus, current solutions are slow.
A robot needs a minimum of 10 Hz to operate well, while current large language models and vision-language models are just not that fast.
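To make that 10 Hz figure concrete, here is a minimal sketch, not from the talk, of a fixed-rate control loop and the latency budget it implies; `policy`, `read_sensors`, and `apply_action` are hypothetical stand-ins for the model and robot interfaces.

```python
import time

CONTROL_HZ = 10            # the minimum control rate mentioned above
PERIOD = 1.0 / CONTROL_HZ  # a 100 ms budget per control cycle

def control_loop(policy, read_sensors, apply_action):
    """Fixed-rate loop: if `policy` (say, a large vision-language model)
    takes longer than the 100 ms budget, cycles are missed and motion
    quality degrades. That is the latency problem described in the talk."""
    while True:
        start = time.monotonic()
        action = policy(read_sensors())  # inference must fit the budget
        apply_action(action)
        elapsed = time.monotonic() - start
        if elapsed > PERIOD:
            print(f"overrun: cycle took {elapsed * 1000:.0f} ms")
        time.sleep(max(0.0, PERIOD - elapsed))
```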
So generative AI has the potential to imbue robots with human-like capabilities.
And people are trying really hard to bring generative AI to robots.
Oh, an interesting thing: most of these humanoid robots are still remote-controlled, teleoperated.
But it's really hard to generalize with great accuracy.
Here's an example: RT-1 and RT-2.
These are the foundation models from Google DeepMind.
RT-2, the best of the best, was released several months ago.
Even on the same tasks it was trained on, if you change the background, there's about a 20-25% failure rate.
If you change to a different environment, the failure rate goes up to 50%.
And with RT-1, the earlier model, when you change to a different background, like the table color, the failure rate is around 50%.
So the question is: how do we increase task-relevant data at scale, with the best end-user experience, to make robots smarter and better?
Introducing MBodi.
We are dedicated to enabling Internet-scale learning for robotics.
How are we doing this?
We're creating a unique end-to-end learning platform.
We have a very easy interface to teach the robot: just command it, give a visual demonstration, or integrate with your existing stack, plus the world model.
What is the world model?
You teach the robot to do something, and it goes through the world model, which generates and augments the data 100x by diffusion, feeding the foundation model for any robot embodiment.
And none of this is possible without the end-to-end data pipeline, which chains all of these components together.
No machine learning expertise or scientists needed.
Just teach the robot and collect the data, and the data is automatically passed to the world model for augmentation in the cloud.
We get 100x the amount of data for fine-tuning and training the foundation model.
What do you get?
A capable robot.
And what's amazing is that as the robot gets more capable, it becomes even easier to teach it and collect data.
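As a hedged illustration of the flow just described (teach once, stream to the cloud, augment 100x with the world model, fine-tune), here is a toy Python sketch; every name in it is a hypothetical stand-in, since MBodi's actual API isn't shown in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One taught demonstration: the instruction, the dialogue, the actions."""
    instruction: str
    turns: list = field(default_factory=list)    # (speaker, utterance) pairs
    actions: list = field(default_factory=list)  # gripper/arm commands

def augment_with_world_model(episode: Episode, factor: int = 100) -> list:
    """Stand-in for the diffusion world model: the real system re-renders
    the same demonstration across many scenes; here we simply replicate."""
    return [episode] * factor

def pipeline(episode: Episode) -> list:
    dataset = augment_with_world_model(episode)  # one demo -> 100x data
    # In the system described on stage, the foundation model is then
    # fine-tuned on `dataset` in the cloud and the new weights flow back
    # to the robot, which makes the next round of teaching easier.
    return dataset

demo = Episode("pick up the marker")
print(len(pipeline(demo)))  # 100
```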
This is an example.
This is a pharmaceutical robot in a hospital.
What it does is pick up certain items, and that's the only thing it does.
And what if we want it to pick up something else, like a syringe, a mask, or scissors?
Just integrate MBodi into the stack and teach it, commanding it to pick up the syringe or the mask.
After a couple of times, all this data is passed automatically to the cloud and trained on.
And the robot becomes more capable.
And this is not limited to a single piece of hardware; any embodiment will do.
All right, here's our team.
Sebastian, our co-founder and CEO, is right there.
I always call him a genius, but he's too humble to accept that.
Yeah.
He's a physicist and machine learning researcher.
And that's me.
We have been close friends and teammates since our Google days, where we managed Google Public DNS, the largest DNS resolver; we minimized latency and protected DNS traffic across the whole Internet.
And now, robot time.
I believe most of you haven't really seen a robot being taught to do something, so let's see if this works.
I'm a little bit nervous here, because nothing is scripted or programmed; everything is live, 100%.
All right.
Let's teach a robot to pick up the marker, right?
Okay.
Let me...
fold this up.
Let me check the microphone.
Good.
Okay, I'm going to teach you how to pick up the marker on the desk.
And as you can see, it's to your left.
What do you think your first action should be?
My first action should be to move my gripper to the left to align with the marker.
You're close.
You need to move just a tiny bit more to the left.
I will adjust slightly more to the left to perfectly align with the marker.
You are actually right above it now. Try to go down and pick it up from the table.
I will move down to the table and open my gripper to prepare for picking up the marker.
Okay, I think you are in position; just close your hand and lift your arm.
Closing my gripper to grasp the marker.
All right, good job. Now release it.
Releasing the marker now.
All right, I just moved the marker to a slightly different location to your right.
Can you try to pick it up again?
Let's see.
Moving my gripper to the right, to the location of the marker.
Good position.
Now try to go down and pick it up.
Moving down to the table to pick up the marker.
Alright, good job.
Now close your hand and lift your arm.
Closing my gripper to grasp the marker.
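The exchange above follows a simple pattern: instruction, spoken plan, parsed action, execution, correction, with every turn logged. Here is a toy sketch of that loop; the parser and the lambda stand-ins for the language model and the robot are hypothetical, not MBodi's implementation.

```python
def parse_action(reply: str) -> dict:
    """Toy mapping from the model's spoken plan to a motion command;
    a real system would emit joint or end-effector targets."""
    text = reply.lower()
    for direction in ("left", "right", "down", "up"):
        if direction in text:
            return {"move": direction, "meters": 0.02}
    if "clos" in text:
        return {"gripper": "close"}
    return {"noop": True}

def teaching_session(ask_model, execute, log):
    """Instruction -> model reply -> parsed action -> execution, with
    every turn appended to `log` (streamed to the cloud in the demo)."""
    for instruction in ("move a bit to the left", "close your gripper"):
        reply = ask_model(instruction)
        action = parse_action(reply)
        execute(action)
        log.append((instruction, reply, action))

log = []
teaching_session(
    ask_model=lambda q: f"I will {q} to align with the marker.",  # LLM stand-in
    execute=lambda a: print("executing", a),                      # robot stand-in
    log=log,
)
```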
As we teach the robot, every conversation and every action gets passed to the cloud instantly for continuous training.
So I would love to pull up an AWS terminal here, just to show that part, the end-to-end part, which is very fascinating.
And this is the MBodi world model.
I would like to show you a quick peek of what the world model is like.
The robot collects data like that, for example, picking up a fork.
And remember, when you change to a different background, things break.
So how do we solve that? With the MBodi world model, by diffusion.
What about an office tabletop, or a wood floor, or a gold arm on a blue floor, or a carpet?
With the MBodi world model, we can generate an unimaginable amount of data from just a single example.
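The world model itself isn't shown as code on stage, but the general technique it points at, re-rendering one demonstration across many scenes with diffusion, can be sketched with open-source tools. The following is a hedged illustration, not MBodi's implementation: it uses a public inpainting model, and the input file names are hypothetical.

```python
# Keep the arm and the fork fixed (masked out) while the background is
# regenerated per prompt, turning one recorded frame into many scenes.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("demo_frame.png")      # hypothetical frame from one demo
mask = Image.open("background_mask.png")  # hypothetical mask: white = replace

scenes = ["an office tabletop", "a wood floor", "a blue floor", "a carpet"]
for i, scene in enumerate(scenes):
    out = pipe(prompt=f"a robot arm picking up a fork on {scene}",
               image=frame, mask_image=mask)
    out.images[0].save(f"augmented_{i}.png")  # same action, new scene
```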
And one more thing: we are open-sourcing our new cross-embodiment robot foundation model in the coming weeks.
It will be available on Hugging Face.
An interesting thing: we talked to Clem, the CEO of Hugging Face, last week, and they also just launched their robotics division.
We think the advancement of technology in a space like robotics isn't done by just one or two companies.
It's done across all the people and all the companies at every layer of the stack, and we hope we can push this boundary forward.
And with MBodi, we believe we can increase the efficiency of robotics training and data collection by 10x, with unlimited potential for robot generalization.
Yeah, robots are a $15 billion market today, and each of them has only one job.
What if every robot could learn to do anything?
Then robots should in fact be a multi-trillion dollar market, because they should be everywhere in our lives, and actually, they can be.
And we are raising to accelerate our research, engineering, and product efforts.
We hope we can bring artificial general embodied intelligence a little bit closer.
Alright, let's get in touch.
