Massachusetts Institute of Technology researchers are using deep learning to process point clouds, making it easier for autonomous vehicles to understand the 3D world.

Posted 2025-10-13

A car's lidar sensor emits pulses of infrared light and measures how long they take to bounce back off surrounding objects. From these measurements the sensor builds a point cloud, a 3D snapshot of the car's surroundings that helps the vehicle drive. (A minimal sketch of this time-of-flight geometry appears below.)

Raw point cloud data is difficult to interpret, and before the era of machine learning, trained engineers had to identify by hand the features they wanted to capture. Researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) recently published a series of papers showing that point clouds for 3D imaging applications can be processed automatically using deep learning. "At present, 90 percent of computer vision and machine learning involves only 2D images," said Justin Solomon, a professor at MIT and a senior author of the papers. "Our work is designed to help better represent the 3D world, not only in autonomous driving applications, but in any area where 3D shapes need to be understood."

Most earlier methods were not particularly successful at learning the patterns in point cloud data that are needed to extract useful information from 3D points in space. In one of the team's papers, the researchers presented their method for analyzing point clouds, EdgeConv, which classifies and segments individual objects using a dynamic graph convolutional neural network. "By building graphs of neighboring points, the algorithm can capture hierarchical patterns and thereby infer various kinds of generic information that can be used by a variety of downstream tasks," said Wadim Kehl, a machine learning scientist at the Toyota Research Institute. (A sketch of the neighbor-graph construction behind this idea also appears below.)

The team also studied other aspects of point cloud processing. Most sensors change their viewpoint as they move through the 3D world, for example, and each time they rescan the same object, its position may differ from the previous scan. Merging multiple point clouds into a detailed view of the world requires aligning multiple sets of 3D points, a process called registration. "Registration allows us to integrate 3D data from different sources into a common coordinate system," said Yue Wang, one of the papers' authors. "Otherwise, we cannot get meaningful information from these methods."

A second paper by Solomon and Wang introduced a new registration algorithm called Deep Closest Point (DCP), which is better at finding a point cloud's distinctive patterns, points, and edges so that it can be aligned with other point clouds. This is especially important for an autonomous vehicle determining where it is in its environment. One limitation of DCP is that it assumes the whole shape is visible, not just one side, which means it cannot align partial views of an object's shape (so-called partial-to-partial registration). In a third paper, the researchers therefore proposed an improved algorithm called the Partial Registration Network (PRNet).

Compared with 2D images and photos, existing 3D data tends to be messy and unstructured, Solomon said, and his team works to extract meaningful information from this chaotic 3D data without relying on controlled environments or large amounts of machine learning machinery. DCP and PRNet point to a key aspect of point cloud processing: context. The geometry needed to align point cloud A with point cloud B may differ from the geometry needed to align it with point cloud C. In partial registration, for example, part of a shape visible in one point cloud may not be visible in another, so those points cannot be used for registration. (The classical closed-form alignment step that learned registration methods of this kind build on is sketched below.)
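To make the time-of-flight description above concrete, here is a minimal sketch of how raw lidar returns become a point cloud. It is not tied to any particular sensor: the round-trip times and beam angles are hypothetical inputs, and a real device would apply calibration and filtering on top of this basic geometry.

import numpy as np

C = 299_792_458.0  # speed of light, m/s

def lidar_returns_to_points(round_trip_times, azimuths, elevations):
    # Hypothetical inputs: round-trip pulse times (s) and each beam's
    # azimuth/elevation (rad). The one-way range is half the round-trip
    # distance; spherical coordinates then become Cartesian x, y, z.
    r = C * np.asarray(round_trip_times) / 2.0
    az = np.asarray(azimuths)
    el = np.asarray(elevations)
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    return np.stack([x, y, z], axis=1)  # (N, 3) point cloud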
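The core idea Kehl describes, building a graph of neighboring points and aggregating features over its edges, can be sketched in a few lines. The code below is a simplified, dependency-free version of one EdgeConv-style step: the published method additionally passes each edge feature through a learned MLP and recomputes the graph in feature space at every layer, which is omitted here.

import numpy as np

def knn_graph(points, k):
    # Indices of the k nearest neighbors of every point, (N, 3) -> (N, k).
    # Brute-force distances are fine for a sketch; use a KD-tree at scale.
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum('ijc,ijc->ij', diff, diff)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbor
    return np.argsort(dist2, axis=1)[:, :k]

def edgeconv_step(points, k=16):
    # For each point x_i and neighbor x_j, form the edge feature
    # (x_i, x_j - x_i), then max-pool over the k neighbors. EdgeConv
    # would apply a learned MLP to each edge feature before pooling.
    idx = knn_graph(points, k)                    # (N, k)
    neighbors = points[idx]                       # (N, k, 3)
    centers = np.repeat(points[:, None, :], k, axis=1)
    edges = np.concatenate([centers, neighbors - centers], axis=-1)
    return edges.max(axis=1)                      # (N, 6) per-point features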
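Registration itself has a classical closed-form core: given corresponding points in two clouds, the best rigid rotation and translation follow from an SVD of the cross-covariance matrix (the Kabsch solution). Learned methods such as DCP replace the hand-crafted correspondence search with a network; the sketch below shows only the generic closed-form step, not the paper's model.

import numpy as np

def best_rigid_transform(src, dst):
    # Least-squares rigid motion: find R, t minimizing the distance
    # between src @ R.T + t and dst, assuming the i-th row of src
    # corresponds to the i-th row of dst.
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Example: recover a known rotation about the z-axis and a translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
dst = src @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = best_rigid_transform(src, dst)
assert np.allclose(R, R_true) and np.allclose(t, [1.0, -2.0, 0.5])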
Wang said the team's tools have already been adopted by many researchers in computer vision and other fields. Next, the researchers hope to apply these algorithms to real-world data, including data collected from autonomous vehicles. Wang added that they also plan to explore training their systems with self-supervised learning, to minimize the amount of human annotation required.
