Patent Explains How FSD Works

November 6, 2024

By Karan Singh

Thanks to a Tesla patent published last year, we have a great look into how FSD operates and the various systems it uses. SETI Park, who examines and writes about patents, also highlighted this one on X.

This patent breaks down the core technology used in Tesla’s FSD and gives us a great understanding of how FSD processes and analyzes data.

To make this easily understandable, we’ll divide it up into sections and break down how each section impacts FSD.

Vision-Based

First, this patent describes a vision-only system, in line with Tesla’s stated goal, that enables vehicles to see, understand, and interact with the world around them. The system uses multiple cameras, some with overlapping coverage, to capture a 360-degree view around the vehicle, mimicking human vision but with wider coverage.

What’s most interesting is that the system rapidly adapts to the various focal lengths and perspectives of the different cameras around the vehicle. It then combines all this to build a cohesive picture, but we’ll get to that part shortly.

Branching

The system is divided into two parts – one for Vulnerable Road Users, or VRUs, and the other for everything else that doesn’t fall into that category. That’s a pretty simple divide – VRUs are defined as pedestrians, cyclists, baby carriages, skateboarders, animals, essentially anything that can get hurt. The non-VRU branch focuses on everything else, so cars, emergency vehicles, traffic cones, debris, etc.

Splitting it into two branches enables FSD to look for, analyze, and then prioritize certain things. Essentially, VRUs are prioritized over other objects throughout the Virtual Camera system.
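To make the split concrete, here is a minimal sketch of the categorization. It is only an illustration: the class labels come from the examples listed above, while the lookup table and the function names `route_to_branch` and `prioritize` are hypothetical. The patent describes neural-network branches, not a hand-written table.

```python
# Hypothetical class labels, taken from the examples in the article.
VRU_CLASSES = {"pedestrian", "cyclist", "baby_carriage", "skateboarder", "animal"}


def route_to_branch(label: str) -> str:
    """Return which branch would handle a detected object class."""
    return "vru" if label in VRU_CLASSES else "non_vru"


def prioritize(labels: list[str]) -> list[str]:
    """Order detections so VRUs come first.

    sorted() is stable, so objects within the same branch keep their input order.
    """
    return sorted(labels, key=lambda label: route_to_branch(label) != "vru")


detections = ["car", "pedestrian", "traffic_cone", "cyclist"]
ordered = prioritize(detections)
print(ordered)  # ['pedestrian', 'cyclist', 'car', 'traffic_cone']
```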

The many data streams and how they're processed.

Virtual Camera

Tesla processes all of that raw imagery, feeds it into the VRU and non-VRU branches, and picks out only the key and essential information, which is used for object detection and classification.

The system then draws these objects on a 3D plane and creates “virtual cameras” at varying heights. Think of a virtual camera as a real camera you’d use to shoot a movie. It allows you to see the scene from a certain perspective.

The VRU branch places its virtual camera at human height, which enables a better understanding of VRU behavior. This is probably because there’s a lot more data at human height than from above or any other angle. Meanwhile, the non-VRU branch raises its virtual camera above that height, enabling it to see over and around obstacles and giving it a wider view of traffic.

This effectively provides two forms of input for FSD to analyze—one at the pedestrian level and one from a wider view of the road around it.
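Why a higher viewpoint sees more can be shown with simple line-of-sight geometry. The sketch below is not from the patent. The function name and all heights and distances are illustrative assumptions. It checks whether a straight ray from a camera to the top of a distant target clears a nearer obstacle.

```python
def line_of_sight_clear(cam_height_m, obstacle_dist_m, obstacle_height_m,
                        target_dist_m, target_height_m):
    """True if the ray from the camera to the target's top passes above the obstacle.

    Assumes 0 < obstacle_dist_m < target_dist_m and flat ground.
    """
    # Height of the ray at the obstacle's distance, by linear interpolation.
    fraction = obstacle_dist_m / target_dist_m
    ray_height_m = cam_height_m + fraction * (target_height_m - cam_height_m)
    return ray_height_m > obstacle_height_m


# Assumed scene: a 2.0 m tall van 10 m ahead, a 1.5 m tall car 30 m ahead.
human_height_view = line_of_sight_clear(1.5, 10.0, 2.0, 30.0, 1.5)
elevated_view = line_of_sight_clear(4.0, 10.0, 2.0, 30.0, 1.5)
print(human_height_view)  # False: the van blocks the car
print(elevated_view)      # True: the raised viewpoint sees over the van
```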

3D Mapping

Now, all this data has to be combined. These two virtual cameras are synced – and all their information and understanding are fed back into the system to keep an accurate 3D map of what’s happening around the vehicle.

And it’s not just the cameras. The Virtual Camera system and 3D mapping work together with the car’s other sensors to incorporate movement data—speed and acceleration—into the analysis and production of the 3D map.

This system is best understood by the FSD visualization displayed on the screen. It picks up and tracks many moving cars and pedestrians at once, but what we see is only a fraction of all the information it’s tracking. Think of each object as having a list of properties that isn’t displayed on the screen. For example, a pedestrian may have properties that can be accessed by the system that state how far away it is, which direction it’s moving, and how fast it’s going.

Other moving objects, such as vehicles, may have additional properties, such as their width, height, speed, direction, planned path, and more. Static elements of the scene have properties too: the road, for example, would have its width, speed limit, and more determined based on AI and map data.

The vehicle itself has its own set of properties, such as speed, width, length, planned path, etc. When you combine everything, you end up with a great understanding of the surrounding environment and how best to navigate it.
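One way to picture this "list of properties" is as a small record per tracked object. The sketch below is purely illustrative. The class name `TrackedObject`, its fields, and their units are assumptions, not the patent’s data layout.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackedObject:
    """Hypothetical record of what the system might know about one object."""
    kind: str           # e.g. "pedestrian" or "car"
    x_m: float          # position relative to the vehicle, in metres
    y_m: float
    speed_mps: float    # speed in metres per second
    heading_deg: float  # direction of travel

    def distance_m(self) -> float:
        """Straight-line distance from the vehicle."""
        return math.hypot(self.x_m, self.y_m)


pedestrian = TrackedObject("pedestrian", x_m=3.0, y_m=4.0, speed_mps=1.4, heading_deg=90.0)
print(pedestrian.distance_m())  # 5.0
```

None of these fields appear on the visualization, but each one could feed the path planner.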

Temporal Indexing

Tesla calls this feature Temporal Indexing. In layman’s terms, this is how the vision system analyzes images over time and keeps track of them. Rather than relying on a single snapshot, FSD works from a series of them, which lets it understand how objects are moving. This enables object path prediction and also allows FSD to estimate where vehicles or objects might be, even when it has no direct view of them.

This temporal indexing is done through “Video Modules”, which are the actual “brains” that analyze the sequences of images, tracking objects over time and estimating their velocities and future paths.

Once again, the FSD visualization in heavy traffic is an excellent example: it keeps track of many vehicles in the lanes around you, even those not in your direct line of sight.
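The idea of turning timestamped observations into a velocity and a predicted position can be sketched with a simple constant-velocity model. This is a stand-in for illustration only. The patent’s video modules are learned networks, and the function names and sample data here are assumptions.

```python
def estimate_velocity(earlier, later):
    """Finite-difference velocity between two (t_seconds, x_m, y_m) observations."""
    (t0, x0, y0), (t1, x1, y1) = earlier, later
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("observations must be in increasing time order")
    return ((x1 - x0) / dt, (y1 - y0) / dt)


def predict_position(last, velocity, horizon_s):
    """Extrapolate the last observation forward, assuming constant velocity."""
    _, x, y = last
    vx, vy = velocity
    return (x + vx * horizon_s, y + vy * horizon_s)


# Two observations of the same object, half a second apart.
history = [(0.0, 0.0, 0.0), (0.5, 5.0, 0.0)]
velocity = estimate_velocity(history[-2], history[-1])
predicted = predict_position(history[-1], velocity, horizon_s=2.0)
print(velocity)   # (10.0, 0.0)
print(predicted)  # (25.0, 0.0)
```

The same extrapolation is what lets a tracker keep a plausible position for an object that is briefly hidden from view.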

End-to-End

Finally, the patent also mentions that the entire system, from front to back, can be – and is – trained together. This training approach, which now includes end-to-end AI, optimizes overall system performance by letting each individual component learn how to interact with other components in the system.

Summary

Essentially, Tesla sees FSD as a brain, and the cameras are its eyes. It has a memory, and that memory enables it to categorize and analyze what it sees. It can keep track of a wide array of objects and properties to predict their movements and determine a path around them. This is a lot like how humans operate, except FSD can track far more objects at once than a person can and determine properties like speed and size much more accurately. On top of that, it can do it faster than a human and in all directions at once.

FSD and its vision-based camera system essentially create a 3D live map of the road that is constantly and consistently updated and used to make decisions.

November 5, 2024

By Karan Singh

As part of an update to its AI roadmap, Tesla has also announced the features that will be in FSD v13. Tesla provided many details about what we can expect, and there’s a lot of info to break down.

Tesla’s VP of AI, Ashok Elluswamy, also revealed that FSD v13 is expected to make FSD Unsupervised feature complete. That doesn’t mean that autonomy will be ready, as each feature will still need to work at safety levels higher than a human, but it means every key feature of autonomous vehicles will be present in FSD v13.

Let’s examine the v13 feature list Tesla and Tesla employees have recently provided to see exactly what’s coming.

Higher Resolution Video & Native AI4

FSD v12 was trained on footage from Tesla’s HW3 cameras, with AI4 camera footage downsampled to match. For the first time, Tesla will use AI4’s native camera resolution to get the clearest image possible. Not only will Tesla increase the resolution, but it is also increasing the capture rate to 36 FPS (frames per second). This should result in smoother driving and let the vehicle detect objects earlier and more precisely. It’ll be a big boon for FSD, but it’ll come at the price of processing all of this additional information.

The HW3 cameras have a resolution of about 1.2 megapixels, while the AI4 cameras have a resolution of 5.44 megapixels. That’s a 4.5x improvement in raw resolution – which is a lot of new data for the inference computer and AI models to deal with.

Yun-Ti Tsai, Senior Staff Engineer at Tesla AI, mentioned on X that the total data bandwidth is 1.3 gigapixels per second, running at 36 hertz, with nearly 0 latency between capture and inference. This is one of the baseline features for getting v13 off the ground, and through this feature update, we can expect better vehicle performance, sign reading, and lots of little upgrades.
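The figures above can be sanity-checked with a few lines of arithmetic. The inputs are the numbers quoted in this article. The last value, how many full-resolution streams the total bandwidth would correspond to, is our own back-of-the-envelope inference and not something Tesla has stated.

```python
HW3_MEGAPIXELS = 1.2          # approximate, as quoted above
AI4_MEGAPIXELS = 5.44
FRAME_RATE_HZ = 36
TOTAL_GIGAPIXELS_PER_S = 1.3  # figure quoted from Yun-Ti Tsai

# Raw resolution gain of AI4 over HW3.
resolution_ratio = AI4_MEGAPIXELS / HW3_MEGAPIXELS

# Pixel throughput of one AI4 camera at full resolution and frame rate.
per_camera_gigapixels_per_s = AI4_MEGAPIXELS * FRAME_RATE_HZ / 1000

# Assumption: if every stream ran at full resolution, this many streams
# would add up to the quoted total.
implied_streams = TOTAL_GIGAPIXELS_PER_S / per_camera_gigapixels_per_s

print(round(resolution_ratio, 2))             # 4.53
print(round(per_camera_gigapixels_per_s, 3))  # 0.196
print(round(implied_streams, 1))              # 6.6
```

The ratio matches the roughly 4.5x figure, and the quoted total bandwidth is consistent with about six to seven full-resolution camera streams at 36 Hz.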

Bigger Models, Bigger Context, Better Data

The next big item is that Tesla will increase the size of the FSD model by three times and the overall context length by the same amount. What that means, in simple terms, is that FSD will have a lot more information to draw upon—both at the moment (the context length) and from background knowledge and training (model size).

In layman’s terms, Tesla has made the FSD brain bigger and increased the amount of information it can remember. This means that FSD will have a lot more data to work with when making decisions, both from what’s happening right now and from what it has learned in the past.

Beyond that, Tesla has also massively expanded its training data and training compute to match. Tesla is increasing the amount of training data by 4.2 times and its training compute power by 5x.

Video of the inside of Cortex today, the giant new AI training supercluster being built at Tesla HQ in Austin to solve real-world AI pic.twitter.com/DwJVUWUrb5

— Elon Musk (@elonmusk) August 26, 2024

Audio Intake

Tesla’s FSD has famously only relied upon visual data—equivalent to what humans can access. LiDAR hasn’t been on Tesla’s books except for model validation, and radar, while used in the past, was mostly phased out.

Now, Tesla AI will integrate audio intake into FSD’s models, with a focus on better handling of emergency vehicles. FSD will soon be able to react to emergency vehicles, even before it sees them. This is big news and is in line with how Tesla has been approaching FSD—through a very human-like lens.

We’re excited to see how these updates pan out – but there was one more thing. Ashok Elluswamy, VP of AI at Tesla, confirmed on X that they’ll add the ability for FSD to honk the horn.

Other Improvements

The other improvements, while major, can be summarized pretty simply. Tesla is focusing on improving smoothness and safety in various ways. The v13 AI will be trained to predict and adapt for collision avoidance, navigation, and better following traffic controls. This will make it more predictable for users and other drivers and improve general safety.

Beyond that, Tesla is also working on a better representation of the map and navigation inputs versus what FSD actually does. In complex situations, FSD may choose to take a different turn or exit, even if navigation is telling it to go the other way. This future update will likely close this gap and ensure that your route and FSD’s path planner match closely.

Of course, Tesla will also be working on adding Unpark, Reverse, and Park capabilities, as well as support for destination options, including parking in a spot, driveway, or garage or just pulling over at a specific point, like at an entrance.

Finally, they’re also working on adding improved camera self-cleaning and better handling of camera occlusion. Currently, FSD can and will clean the front cameras if they are obscured with debris, but only if they are fully blocked. Partial blockages do not trigger the wipers. Additionally, when the B-Pillar cameras are blinded by sunlight, FSD tends to have difficulties staying centered in the lane. This specific update is expected to address both of these issues.

FSD V13 Release Date

Tesla announced that FSD v13 will be released to employees this week. However, it’ll take several iterations before it’s released to the public. Tesla expects FSD v13 to reach customers around v13.3 and, surprisingly, says this will happen around the Thanksgiving timeframe, just a few weeks away.

Tesla is known for delays with its FSD releases, so we’re cautious about the late November timeline. However, the real takeaway is that FSD v13 is expected to offer a substantial leap in capability over the next few months—even if it’s exclusive to AI4.

November 5, 2024

By Karan Singh

As part of a recent update to the Tesla App, Tesla has added some slick new user interfaces to the Service and Roadside sections of the app. These new interfaces allow you to easily select the areas that are damaged or need attention.

These new screens in the service and roadside menus have been available in North America since at least late August, but appear to be rolling out to more users.

For now, it also appears to be restricted to the Model 3 (both the original and the 2024 Highland refresh) and the Model Y. The Model S and X, whether redesigned or legacy, do not appear to have this new service menu just yet. The same goes for the Cybertruck, which has no unique service menu yet either.

Exterior Select Screen

To see this new screen, open your Model 3 or Model Y’s service menu in the Tesla app. From there, select “Exterior” as the area of concern. You’ll be greeted with a short tutorial that walks you through the new interface.

Instead of having to type out your areas of concern, which could be ambiguous, this new interface makes it easy to select the affected areas. Just swipe left/right to view the car from different angles and tap the area that needs service.

Areas you select become highlighted in blue, and once you’re done selecting all of them, you can provide notes for each particular area. You can even select areas under the vehicle, like the front aero shield or center skid plate.

While most glass is selectable here, the windshield and top glass areas are a separate option in the service menu under Collision and Glass > Glass & Windows. However, the windshield wipers can be selected in the second-to-last visualization, Top.

Tire Select Screen

Similar to the Exterior and Glass screens, you can open the service menu and then select Tires & Wheels as your area of concern. You’ll have two sets of options here.

You can either go to Wheels > Wheel Damage; or Tires > Replacement Tires. Here, you’ll be able to identify specific wheels or tires for service as required.

The app’s roadside section offers a similar tire view, which lets you select which tire is flat.

Thanks to its app and mobile service, Tesla has been providing one of the best service experiences in the industry. While they still have work to do regarding part availability and reasonable wait times, their technology and simplicity continue to impress. We’re excited to see what improvements they’ll make next.
