The dataset is composed of 150 synthetic scenes, captured with a (perspective) virtual camera, and each scene contains 3 to 5 objects. This paper introduces SceneNN, an RGB-D scene dataset consisting of 100 scenes that is used as a benchmark to evaluate state-of-the-art methods on relevant research problems such as intrinsic decomposition and shape completion. This dataset contains 1800 stereo pairs with ground truth disparity maps, occlusion maps, and discontinuity maps that will help to further develop the state of the art. We present a new large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities.
The Farman Institute 3D Point Sets dataset contains 11 objects acquired with a 3D laser scanner. The DynTex dataset consists of a comprehensive set of dynamic textures. Surface mesh segmentation file (*.segs.json) and aggregated semantic annotation file (*.aggregation.json): BenchmarkScripts/util_3d.py gives examples of parsing the semantic instance information from the *.segs.json, *.aggregation.json, and *_vh_clean_2.ply mesh file, with an example semantic segmentation visualization in BenchmarkScripts/3d_helpers/visualize_labels_on_mesh.py. It is composed of ADL (activities of daily living) and fall actions simulated by 11 volunteers. It is used for coupled symmetry and structure-from-motion detection. For the first few decades of the field's existence, computer vision was focused on algorithmic, logical approaches to perception. AWS hosts a variety of public datasets that anyone can access for free. The TUG (Timed Up and Go test) dataset consists of actions performed three times by 20 volunteers. This article presents a dataset of household objects and box scenes commonly found in warehouse environments, obtained using a robotic setup with four different cameras; it provides object labels as pixel-wise masks and as 3D and 2D object bounding boxes, useful for both object recognition and instance segmentation. See SensReader/python for a very basic Python data exporter. If you have not received a response within a week, it is likely that your email is bouncing - please check this before sending repeat requests. The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class.
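The two annotation files described above can be combined to recover per-vertex instance labels. A minimal sketch, assuming the field names used by the benchmark scripts ("segIndices" in *.segs.json; "segGroups" with "segments", "label", and "objectId" in *.aggregation.json); the helper name is illustrative, see BenchmarkScripts/util_3d.py for the reference implementation:

```python
import json

def load_instance_labels(segs_path, aggregation_path):
    """Map each mesh vertex to an (objectId, label) pair.

    Assumes the ScanNet file layout: *.segs.json stores a per-vertex
    over-segmentation in "segIndices", and *.aggregation.json groups
    those segments into labeled object instances in "segGroups".
    """
    with open(segs_path) as f:
        seg_indices = json.load(f)["segIndices"]  # one segment id per vertex
    with open(aggregation_path) as f:
        seg_groups = json.load(f)["segGroups"]

    # Invert the grouping: segment id -> (instance id, semantic label)
    seg_to_instance = {}
    for group in seg_groups:
        for seg_id in group["segments"]:
            seg_to_instance[seg_id] = (group["objectId"], group["label"])

    # Per-vertex (instance, label); (-1, "unannotated") for uncovered vertices
    return [seg_to_instance.get(s, (-1, "unannotated")) for s in seg_indices]
```

The returned list is aligned with the vertex order of the *_vh_clean_2.ply mesh, so it can be used directly to color the mesh for visualization.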
This dataset provides a collection of web images and 3D models for research on landmark recognition (especially for methods based on 3D models). The Paris-rue-Madame dataset contains 3D Mobile Laser Scanning (MLS) data from rue Madame, a street in the 6th Parisian district (France). The Notre Dame de Paris dataset is used for 3D SfM reconstruction and contains 715 images provided by Noah Snavely. The Yahoo Flickr Creative Commons 100M (YFCC100M) dataset contains a list of photos and videos. The 3DVis dataset includes a set of 12 heterogeneous scenes for testing 3D scene registration and analysis methods. The HCI 4D Lightfields dataset contains 11 objects with corresponding lightfields for depth estimation. How Good are Local Features for Classes of Geometric Objects. See the ScanNet C++ Toolkit for more information and parsing code. The Swedish Traffic Sign Recognition dataset provides Matlab code for parsing the annotation files and displaying the results. A dataset of 200 gray-level images along with ground truth segmentations. This paper investigates how to use only labeled 2D image collections to supervise the training of 3D semantic segmentation models using multi-view fusion, and addresses several novel issues with this approach, including how to select trusted pseudo-labels, how to sample 3D scenes with rare object categories, and how to decouple input features from 2D images from pseudo-labels during training. Please refer to the BundleFusion repository at https://github.com/niessner/BundleFusion . Many different labeled video datasets have been collected over the past few years, but it is hard to compare them at a glance.
We also thank Occipital for donating structure sensors and Nvidia for hardware donations, as well as support by the Max-Planck Center for Visual Computing and the Stanford CURIS program. The Large-scale PEdesTrian Attribute (PETA) dataset covers more than 60 attributes. The object is a plaster reproduction of the Temple of the Dioskouroi in Agrigento, Sicily. 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). The evaluation results show that the RGB-D to CAD retrieval problem, while challenging to solve due to partial and noisy 3D reconstructions, can be addressed to a good extent using deep learning techniques, particularly convolutional neural networks trained on multi-view and 3D geometry. If you use the ScanNet data or code, please cite ScanNet. The ScanNet data is released under the ScanNet Terms of Use, and the code is released under the MIT license. The dataset contains hand-labelled pixel annotations for 38 groups of images, each group containing a common foreground. We provide code for several scene understanding benchmarks on ScanNet; train/test splits are given at Tasks/Benchmark. The GaTech VideoSeg dataset consists of two video sequences (waterski and yunakim) for object segmentation. See CameraParameterEstimation for details. The dataset contains 1000 images of 100 persons, with 10 images per person, and is freely available. The data in ScanNet is organized by RGB-D sequence. This work focuses on depth-based semantic per-pixel labelling as a scene understanding problem and shows the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes by carefully synthesizing training data with appropriate noise models. The test sequences provide interested researchers a real-world multi-view test data set captured in the blue-c portals. Please visit our main project repository for more information and access to code, data, and trained models: https://github.com/ScanNet/ScanNet.
The contour patches dataset is a large dataset of image patch matches used for contour detection. ScanNet is an RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations. WebUI contains the web-based data management UI used for providing an overview of available scan data and controlling the processing and annotation pipeline. A novel semantic projection block (SP-Block) is proposed to extract deep feature volumes from 2D segments of different views, which are fused with deep volumes from a point cloud encoder to make the final semantic segmentation. The Synthetic CAD Models dataset consists of X synthetic CAD models for detecting (planar) primitives. The Stanford Background Dataset is a new dataset introduced in Gould et al. (ICCV 2009) for evaluating methods for geometric and semantic scene understanding. Dai, Angela and Chang, Angel X. and Savva, Manolis and Halber, Maciej and Funkhouser, Thomas and Niessner, Matthias. Scene parsing data and part segmentation data derived from the ADE20K dataset can be downloaded from the MIT Scene Parsing Benchmark. The Airport MotionSeg dataset contains 12 sequences of videos of an airport scenario with small and large moving objects and various speeds. Due to size limitations, the Stable Structure from Motion datasets cannot put the images online. ScanNet uses the BundleFusion code for reconstruction. A permanently growing database on lung tuberculosis patients. There exist two variants of this dataset - a CVPR 2007 paper [1] by Leibe et al. The JPL First-Person Interaction dataset (JPL-Interaction dataset) is composed of human activity videos taken from a first-person viewpoint. The dataset consists of about 50 hours of kindergarten surveillance videos. If you have any questions, please contact us at scannet@googlegroups.com.
The SUNCG dataset is a large 3D model repository for indoor scenes; SUNCG is an ongoing effort to establish a richly-annotated, large-scale dataset. SceneNet RGB-D is a dataset comprised of 5 million photorealistic images of synthetic indoor trajectories with ground truth. The New College Data Set contains 30GB of data intended for use by the mobile robotics and vision research communities. These datasets were generated for the M2CAI challenges, a satellite event of MICCAI 2016 in Athens. Each sequence is stored under a directory named scene<spaceId>_<scanId> (i.e., scene%04d_%02d), where each space corresponds to a unique location (0-indexed). The UCF Person and Car VideoSeg dataset consists of six videos with ground truth for video object segmentation. This work proposes Scan2Cap, an end-to-end trained method to detect objects in the input scene and describe them in natural language; it can effectively localize and describe 3D objects in scenes from the ScanRefer dataset, outperforming 2D baseline methods by a significant margin. More information can be found in our paper. * 30 industry-relevant objects. An indoor action recognition dataset which consists of 18 classes performed by 20 individuals. The PubFig database is a large, real-world face dataset consisting of 58,797 images of 200 people collected from the internet. SUN3D, a large-scale RGB-D video database with camera pose and object labels capturing the full 3D extent of many places, is introduced, along with a generalization of bundle adjustment that incorporates object-to-object correspondences. The UBO 2014 dataset consists of 7 semantic categories. Tools for working with ScanNet data. The Outex dataset is part of a framework for empirical evaluation of texture classification and segmentation algorithms.
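The scene%04d_%02d naming convention can be parsed mechanically when iterating over scan directories. A minimal sketch (the helper name is illustrative, not part of the toolkit):

```python
import re

# Matches the scene%04d_%02d naming convention described above,
# e.g. "scene0000_00": space id 0, scan (rescan) id 0.
SCENE_RE = re.compile(r"^scene(\d{4})_(\d{2})$")

def parse_scene_name(name):
    """Return (space_id, scan_id) for a ScanNet sequence directory name."""
    m = SCENE_RE.match(name)
    if m is None:
        raise ValueError(f"not a ScanNet scene directory: {name!r}")
    return int(m.group(1)), int(m.group(2))
```

For example, parse_scene_name("scene0706_01") returns (706, 1), i.e. the second scan (0-indexed) of space 706.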
This paper aims to model the local and global geometric structures of 3D scenes by designing an end-to-end 3D semantic segmentation framework that captures local geometry through point-level feature learning and voxel-level aggregation, models global structure via a 3D CNN, and enforces label consistency with a high-order CRF. In HouseCraft, we utilize rental ads to create realistic textured 3D models of building exteriors. The Buffy dataset contains images selected from the TV series Buffy the Vampire Slayer. The FAce Semantic SEGmentation (FASSEG) repository contains datasets for multi-class semantic face segmentation. 2017 International Conference on 3D Vision (3DV). Code for estimating camera parameters and depth undistortion. The Aspect Layout dataset is designed to allow evaluation of object detection for aspect ratios in perspective images. Scene understanding has (again) become a focus of computer vision research, leveraging advances in detection, context modeling, and tracking. Ground truth database of 50 images with: data, segmentation, labelling (lasso), labelling (rectangle). We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval. A dataset of color and depth image pairs, gathered in real domestic and office environments, establishes baseline performance in a PASCAL VOC-style detection task, and suggests two ways that the inferred real-world size of an object may be used to improve detection.
Matterport3D: Learning from RGB-D Data in Indoor Environments, Learning to Reconstruct and Understand Indoor Scenes From Sparse Views, End2End Semantic Segmentation for 3D Indoor Scenes, Semantic Dense Reconstruction with Consistent Scene Segments, Learning 3D Semantic Segmentation with only 2D Image Supervision, Multi-sensor large-scale dataset for multi-view 3D reconstruction, ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans, Scan2Cap: Context-aware Dense Captioning in RGB-D Scans, RGB-D to CAD Retrieval with ObjectNN Dataset, CLUBS: An RGB-D dataset with cluttered box scenes containing household objects, SceneNN: A Scene Meshes Dataset with aNNotations, Joint 2D-3D-Semantic Data for Indoor Scene Understanding, SUN RGB-D: A RGB-D scene understanding benchmark suite, SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels, A category-level 3-D object dataset: Putting the Kinect to work, RGB-(D) scene labeling: Features and algorithms, Dense 3D semantic mapping of indoor scenes from RGB-D images, SceneNet: Understanding Real World Indoor Scenes With Synthetic Data, Holistic Scene Understanding for 3D Object Detection with RGBD Cameras, Semantic Scene Completion from a Single Depth Image, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). The Daimler Mono Pedestrian Detection Benchmark dataset contains a large training and test set. The ScanNet project is funded by Google Tango, Intel, NSF (IIS-1251217 and VEC 1539014/1539099), and a Stanford Graduate fellowship. Some datasets and evaluation tools are provided on this page for four different computer vision and computer graphics problems. Image segmentation and boundary detection. We thank Alex Sabia for scanning and verifying annotations, and Halle Pollack, Julian Massarani and Michael Fang for checking annotations. A dataset of 66 stereo pairs with subpixel ground truths.
The MSR Action datasets are a collection of various 3D datasets for action recognition. ChairGest is an open challenge / benchmark. The GaTech VideoContext dataset consists of over 100 groundtruth-annotated outdoor videos with over 20000 frames for the task of geometric context evaluation. We introduce the Shelf dataset for multiple human pose estimation from multiple views. 2d annotation projections (*_2d-label.zip, *_2d-instance.zip, *_2d-label-filt.zip, *_2d-instance-filt.zip). Each of the 23 folders contains the video of one registration session. The 1DSfM Landmarks is a collection of community-based image reconstructions by Kyle Wilson and is comprised of 14 datasets with comparison to Bundler ground truth. A synthetic light field dataset with 24 scenes. The videos are collected mainly from the BBC Motion Gallery and Getty Images. Classification/Detection Competitions, Segmentation Competition, and Person Layout Taster Competition datasets. The Comprehensive Cars (CompCars) dataset contains data from two scenarios, including images from web-nature and surveillance-nature. The SegTrack dataset consists of six videos (five are used) with ground truth pixelwise segmentation (the sixth, penguin, is not usable). Instead, here are the tracked image points and the final reconstructions. Large population gait datasets are composed of 4,016 subjects. The data set contains 3,425 videos of 1,595 different people. SensReader loads the ScanNet .sens data of compressed RGB-D frames, camera intrinsics and extrinsics, and IMU data. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation.
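Once frames have been exported from a .sens file, the depth images can be read back for use in projection or reconstruction code. A minimal sketch, assuming the exporter writes 16-bit single-channel PNGs with depth in millimeters (the helper name and the depth_scale parameter are illustrative, not part of the toolkit; verify the scale against your export):

```python
import numpy as np
from PIL import Image

def load_depth_meters(path, depth_scale=1000.0):
    """Load an exported depth frame and convert it to meters.

    Assumes a 16-bit single-channel PNG with depth stored in millimeters
    (hence depth_scale=1000.0); adjust the scale if your export differs.
    A value of 0 conventionally marks missing depth.
    """
    depth_mm = np.asarray(Image.open(path), dtype=np.float32)
    return depth_mm / depth_scale
```

Keeping the zero-valued pixels distinguishable (they stay exactly 0.0 after scaling) lets downstream code mask out missing measurements before unprojecting with the camera intrinsics.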