Data Annotation: The Next Frontier for Autonomous Vehicles


May 28, 2020

The number of autonomous vehicles (AVs) is set to increase rapidly in the coming years. Gartner has forecast that by 2023, over 740,000 cars with autonomous driving capabilities will be added to the market, and that this growth will largely be driven by regions such as North America, Greater China, and Western Europe. Despite this growth, these vehicles have limited autonomous capabilities and still operate under human supervision. AVs also face several other challenges, including safety, regulation, and technology standardization.

To develop these autonomous capabilities, the algorithms that drive an AV must be trained to detect, track, and classify objects and to make informed decisions for path planning and safe navigation. For an AV to reach an acceptable level of safety, it must clock billions of incident-free miles. The transition of a vehicle from level 2 to level 3 of automation, as defined by the standards body SAE International, is not incremental but exponential. This is due to the increasing complexity of AVs, the use of multiple sensors, and, most importantly, the petabytes of raw data captured continuously by a fleet of autonomous vehicles. The big question, however, is: what can we do with this raw data?

The algorithm developer is tasked with creating high-performance, robust algorithms that can accurately detect other vehicles, lane markings, static and moving objects, pedestrians, traffic lights, and traffic signage at crossings and intersections in any scenario: inside a tunnel, on a pitch-dark highway, or in glaring sunlight. For a perception algorithm to work effectively, it needs to be trained with a steady influx of high-quality annotated, or labeled, data.

How Data Curation Works 

Data curation, or annotation, is the process of tagging or classifying objects in each frame captured by an AV. The raw data must be curated so that a deep learning model can understand it, which means the relevant objects need to be identified and tagged, or labeled. Tagging can be done manually, with AI-assisted annotation, or with a combination of both. Once the model has been trained on this contextual, labeled data, it can be deployed to infer or detect patterns in data it has never processed and to classify objects accurately (as illustrated in Figure 1).
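To make the combination of manual and AI-assisted tagging more concrete, here is a minimal Python sketch of one common way to split the work: a model proposes labels, and anything it is unsure about goes to a human annotator. The `Label` structure, the `route_labels` function, the sample boxes, and the 0.8 confidence threshold are all illustrative assumptions, not part of any specific annotation tool.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """One tagged object in one frame (hypothetical structure)."""
    frame_id: int
    category: str
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels
    confidence: float   # score from an assumed pre-labeling model


def route_labels(labels, review_threshold=0.8):
    """Split model pre-labels into auto-accepted and human-review lists.

    The threshold is an assumed tuning parameter; a real program would
    calibrate it against audited samples.
    """
    accepted, needs_review = [], []
    for label in labels:
        if label.confidence >= review_threshold:
            accepted.append(label)
        else:
            needs_review.append(label)
    return accepted, needs_review


# Made-up pre-labels for two frames.
pre_labels = [
    Label(0, "car", (120, 200, 340, 380), 0.95),
    Label(0, "pedestrian", (400, 180, 440, 300), 0.55),
    Label(1, "traffic_light", (610, 40, 630, 90), 0.80),
]

accepted, needs_review = route_labels(pre_labels)
print(len(accepted), "auto-accepted;", len(needs_review), "sent to annotators")
```

In this sketch, the low-confidence pedestrian label is the one routed to a human annotator, which reflects the idea that manual effort is best spent where automated tagging is least reliable.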

The Annotation Dilemma

To deploy AVs on the roads quickly, huge training datasets that are accurately annotated and labeled are needed. The task of annotating sensor data from an AV starts with setting up an annotation pipeline that can process a multitude of data formats and annotation types. Typically, the raw data needs to be pre-processed and prepared for annotation. Annotation tasks range from simple ones, such as 2D bounding boxes, to more complex ones, such as pixel-level semantic segmentation or 3D cuboids that combine camera and lidar data. Setting up, configuring, and executing the right annotation program is a daunting task given the number of permutations and combinations of data formats, autonomous vehicle use cases, annotation toolsets, annotation methodologies, and export formats.
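As a small illustration of why export formats add to the effort, the Python sketch below converts 2D bounding boxes stored as corner coordinates into COCO-style annotation records, where a box is written as [x, y, width, height]. The input layout, the function names, and the category IDs are assumptions made for this example, and only a subset of the fields a complete COCO file carries is shown.

```python
import json


def to_coco_bbox(x_min, y_min, x_max, y_max):
    """Convert corner coordinates to COCO-style [x, y, width, height]."""
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def export_annotations(boxes, category_ids):
    """Build COCO-style annotation records from a hypothetical internal format.

    `boxes` is a list of dicts with `frame_id`, `category`, and `box` keys,
    an assumed in-house layout used only for this sketch.
    """
    annotations = []
    for ann_id, item in enumerate(boxes, start=1):
        bbox = to_coco_bbox(*item["box"])
        annotations.append({
            "id": ann_id,
            "image_id": item["frame_id"],
            "category_id": category_ids[item["category"]],
            "bbox": bbox,
            "area": bbox[2] * bbox[3],
            "iscrowd": 0,
        })
    return annotations


# Made-up labels and category mapping.
boxes = [
    {"frame_id": 0, "category": "car", "box": (120, 200, 340, 380)},
    {"frame_id": 0, "category": "pedestrian", "box": (400, 180, 440, 300)},
]
category_ids = {"car": 1, "pedestrian": 2}

exported = export_annotations(boxes, category_ids)
print(json.dumps(exported, indent=2))
```

Each additional annotation type (segmentation masks, 3D cuboids) and each additional target format needs a converter of its own, which is one reason the number of pipeline configurations grows so quickly.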

Accelerating AV Development with the Right Partner

To accelerate the development of AVs, car manufacturers need to meet the growing demand for high-quality training datasets. They must stay up to date with the evolving landscape of annotation processes, tools, and technologies. This landscape shifts almost every month as better and more efficient models are published. This is more of a challenge than an opportunity, since algorithm engineers now have hundreds of options to choose from. More importantly, there is no one-size-fits-all solution that meets all data annotation needs.

Ultimately, data annotation is at the heart of the AV development process, and the robustness and effectiveness of the algorithms depend on the availability of high-quality training datasets. Working with industry partners who can support the annotation needs of AV manufacturers and provide a robust quality management framework and seamless delivery execution can accelerate the AV development process.

Arun Prasad is a Business Consultant in the Autonomous and Connected Strategic Initiative of the Manufacturing Business Group at Tata Consultancy Services. His focus areas at TCS are data annotation and data management services, which he drives through consulting, go-to-market offerings, product development, and thought leadership initiatives. He has close to 12 years of experience across manufacturing, product development, and consulting in multiple roles and geographies. His expertise spans aerospace manufacturing, internet of things, connected products, digital manufacturing, and Industry 4.0. He holds an MBA from IIM Calcutta and an undergraduate degree in mechanical engineering.

Asutosh Mishra is a Business Analyst in the Autonomous and Connected Strategic Initiative of the Manufacturing Business Group at Tata Consultancy Services. His focus areas include data annotation, labeling, verification, and validation for autonomous vehicles. He has more than five years of experience in various technical and management roles across multiple industries. He holds an MBA in operations and marketing from Xavier Institute of Management, Bhubaneswar and an undergraduate degree in electronics and instrumentation from NIT, Rourkela.