Google Optical Sensor Algorithm Helps Robots See Clear Objects
- By John K. Waters
Optical sensors such as cameras and LIDAR (short for "light detection and ranging") have become critical components of modern robotics design. Everything from a piece-picking mobile robot in a warehouse to a fleet of self-driving cars needs to be able to "see" obstacles to avoid and objects to grasp -- even when they're transparent.
Recognizing a window or a plastic bottle is simple for the human eye but trickier for robot vision systems, which are traditionally trained to recognize objects that reflect light evenly in all directions. The surfaces of transparent objects both reflect and refract light, which tends to confound existing systems.
To solve this problem, Google's AI group collaborated with researchers at Columbia University and computer vision company Synthesis AI to create ClearGrasp, a machine learning algorithm capable of estimating accurate 3-D data of transparent objects from RGB-D images. RGB (red, green, blue) refers to the color model used on computer displays, in which red, green and blue light are combined in varying proportions to reproduce a broad range of colors. An RGB-D image pairs an RGB image with its corresponding depth image, which records the distance from the camera to each pixel.
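To make the RGB-D format concrete, here is a minimal sketch of how such an image pair is commonly represented in code. The NumPy arrays and the simulated "hole" are illustrative assumptions, not part of Google's implementation; the point is that a depth sensor often returns missing readings exactly where a transparent object sits.

```python
import numpy as np

height, width = 480, 640

# RGB image: three 8-bit channels, one each for red, green and blue.
rgb = np.zeros((height, width, 3), dtype=np.uint8)

# Depth image: one channel of per-pixel distances from the camera, in meters.
depth = np.full((height, width), 1.5, dtype=np.float32)

# Transparent surfaces often come back as missing (zero) depth readings;
# here we fake such a hole where a glass object might be.
depth[200:280, 300:380] = 0.0

missing = np.count_nonzero(depth == 0.0)
print(f"RGB shape: {rgb.shape}, depth shape: {depth.shape}, missing pixels: {missing}")
```

Those missing depth pixels are precisely what ClearGrasp is designed to fill in.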
"Enabling machines to better sense transparent surfaces would not only improve safety," the researchers said in a blog post, "but could also open up a range of new interactions in unstructured applications -- from robots handling kitchenware or sorting plastics for recycling, to navigating indoor environments or generating AR visualizations on glass tabletops."
ClearGrasp provides a deep learning approach for estimating accurate 3-D geometry of transparent objects from a single RGB-D image for robotic manipulation. It uses three deep convolutional networks: one to estimate surface normals, one for occlusion boundaries (depth discontinuities) and one that masks transparent objects. The mask is used to remove all pixels belonging to transparent objects, the researchers explained, so that the correct depths can be filled in.
"We then use a global optimization module that starts extending the depth from known surfaces, using the predicted surface normals to guide the shape of the reconstruction, and the predicted occlusion boundaries to maintain the separation between distinct objects," they said.
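The staged pipeline described above can be sketched schematically as follows. The three network calls are hypothetical stand-ins (the real system uses trained deep convolutional networks), and the global optimization step is reduced to a trivial mean-fill placeholder purely for illustration -- ClearGrasp's actual reconstruction is guided by the predicted normals and boundaries.

```python
import numpy as np

def predict_normals(rgb):
    # Stand-in for the surface-normal estimation network.
    return np.zeros(rgb.shape[:2] + (3,), dtype=np.float32)

def predict_boundaries(rgb):
    # Stand-in for the occlusion-boundary (depth discontinuity) network.
    return np.zeros(rgb.shape[:2], dtype=bool)

def predict_transparent_mask(rgb):
    # Stand-in for the transparent-object segmentation network.
    mask = np.zeros(rgb.shape[:2], dtype=bool)
    mask[2:4, 2:4] = True  # pretend a transparent object sits here
    return mask

def clear_grasp_sketch(rgb, raw_depth):
    normals = predict_normals(rgb)          # would guide reconstruction shape
    boundaries = predict_boundaries(rgb)    # would keep objects separated

    # Step 1: remove unreliable depth on pixels belonging to transparent objects.
    mask = predict_transparent_mask(rgb)
    depth = raw_depth.copy()
    depth[mask] = np.nan

    # Step 2: in the real system, a global optimization extends depth outward
    # from known surfaces using `normals` and `boundaries`. As a placeholder,
    # we simply fill each masked pixel with the mean of the known depths.
    depth[np.isnan(depth)] = np.nanmean(depth)
    return depth

rgb = np.zeros((6, 6, 3), dtype=np.uint8)
raw_depth = np.ones((6, 6), dtype=np.float32)
completed = clear_grasp_sketch(rgb, raw_depth)
print(completed[3, 3])  # masked pixel re-filled with an estimated depth
```

The key design choice is that the networks never predict depth directly; they predict geometric cues (normals, boundaries, masks) that constrain a separate depth-completion step.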
To train and test ClearGrasp, the researchers constructed a large-scale synthetic dataset of more than 50,000 RGB-D images, along with a real-world test benchmark of 286 RGB-D images of transparent objects and their "ground truth geometries." Both the synthetic training set and the real-world benchmark have been released to the public.
The experiments demonstrated that ClearGrasp is substantially better than monocular depth estimation baselines and is capable of generalizing to real-world images and novel objects, the researchers explained. They also demonstrated that ClearGrasp can be applied out-of-the-box to improve grasping algorithms' performance on transparent objects.
"We hope that our dataset will drive further research on data-driven perception algorithms for transparent objects," they wrote.
Download links and more example images are available on the project Web site and a GitHub repository.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at firstname.lastname@example.org.