Dataset posted on 27.06.2019 by Evgeny Toropov, Jose Moura
The CADillac dataset is a collection of over 1,000 high-quality 3D models of vehicles. The primary goal is to render these 3D models into photo-realistic images that, in turn, can be used to train machine learning models for car detection.
Each CAD model comes in .blend format (the native format of the Blender renderer). CAD models are centered in the 3D scene and scaled to match the real vehicle's dimensions where these are known. They are complemented with meta-information about the car: the dimensions, color, type, domain, and more. Meta-information is stored in the file "collection_v2.json". Please see README.txt for a detailed description of the dataset.
The dataset is complemented with the code available at https://github.com/kukuruza/CADillac. This code makes it possible to (1) view and change CAD models in the dataset and (2) render CAD models to create virtual images. Please post further questions in the GitHub repository.
CAD models were originally collected from the 3DWarehouse repository. 3DWarehouse gave us permission to change and distribute the models as part of this dataset.
One similar dataset we are aware of is Carla. It contains a set of 3D models with a texture bank. The total number of distinct combinations of cars and textures that Carla is able to generate is on the order of one to several thousand. CADillac adds the power of crowdsourcing: our dataset includes rare cases, such as military and fictional cars, fancy car tuning, antique cars, a variety of trucks and emergency vehicles, and more.
We collected CAD vehicle models from 3DWarehouse, a file-sharing platform specializing in 3D CAD models. 3DWarehouse generously granted us permission to use the data for the project. Models are contributed by individual artists and are accompanied by a name and a description. Fortunately, there are many detailed and high-quality models of vehicles, often organized in collections. We downloaded individual collections automatically using the Selenium web-browser automation tool.
After the models were downloaded, we manually classified them by type (passenger, bus, truck, van, and bike), by domain (general, emergency, military, and fiction), and by color (white, black, gray, red, yellow, blue, green, and mixed). Additionally, we manually extracted the car make, model, and year from the original model name and description, when such information was available. Using this information, we then looked up the physical dimensions of the vehicles in the CarQueryApi database. In particular, we retrieved vehicle length, width, height, and wheel base. We found the dimensions for roughly 70% of the models in the dataset; the remaining 30% either lack a proper description or are missing from the CarQueryApi database.
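The label vocabulary described above can be sketched as a small Python record type. This is only an illustration: the class name, field names, and the helper method are hypothetical and do not reflect the actual schema of "collection_v2.json", which is documented in README.txt. Only the allowed values for type, domain, and color are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values, as listed in the description above.
VEHICLE_TYPES = {"passenger", "bus", "truck", "van", "bike"}
DOMAINS = {"general", "emergency", "military", "fiction"}
COLORS = {"white", "black", "gray", "red", "yellow", "blue", "green", "mixed"}


@dataclass
class VehicleLabel:
    """Hypothetical record for one manually labeled CAD model."""
    vehicle_type: str
    domain: str
    color: str
    # Make, model, and year are optional: not every model name or
    # description provides them.
    make: Optional[str] = None
    model: Optional[str] = None
    year: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject labels outside the fixed vocabularies.
        if self.vehicle_type not in VEHICLE_TYPES:
            raise ValueError(f"unknown type: {self.vehicle_type!r}")
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")
        if self.color not in COLORS:
            raise ValueError(f"unknown color: {self.color!r}")

    def can_look_up_dimensions(self) -> bool:
        """True if make, model, and year are all known, which is the
        information needed to query a vehicle-dimension database."""
        return None not in (self.make, self.model, self.year)
```

A label with a complete make, model, and year is a candidate for a dimension lookup; a label without them falls into the group for which no dimensions could be retrieved.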
After that, the CAD models were post-processed. Each model was oriented along the X-axis, centered, and scaled to match the information from CarQueryApi. We found that the wheel base is the most robust property to match, since the length, width, and height of a CAD model may be affected by bumper bars, side mirrors, taxi checkers, emergency light bars, and so on.
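The wheel-base matching described above amounts to a single uniform scale factor. The following is a minimal sketch in plain Python, not the code used to build the dataset. The function names are hypothetical, vertices are assumed to be (x, y, z) tuples, and "centered" is assumed here to mean that the center of the bounding box is moved to the origin, which the description does not specify.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def wheel_base_scale(real_wheel_base: float, model_wheel_base: float) -> float:
    """Uniform scale factor that makes the CAD wheel base match the real one."""
    if real_wheel_base <= 0 or model_wheel_base <= 0:
        raise ValueError("wheel base values must be positive")
    return real_wheel_base / model_wheel_base


def center_and_scale(vertices: Sequence[Point], scale: float) -> List[Point]:
    """Move the bounding-box center to the origin, then scale uniformly.

    Assumption: centering uses the bounding box on all three axes.
    """
    if not vertices:
        return []
    # Bounding-box center along each axis.
    center = [
        (min(v[i] for v in vertices) + max(v[i] for v in vertices)) / 2.0
        for i in range(3)
    ]
    return [
        (
            (x - center[0]) * scale,
            (y - center[1]) * scale,
            (z - center[2]) * scale,
        )
        for x, y, z in vertices
    ]


if __name__ == "__main__":
    # Illustrative numbers only: a model whose wheel base measures 1.5 units
    # is matched to a real wheel base of 3.0 units.
    factor = wheel_base_scale(real_wheel_base=3.0, model_wheel_base=1.5)
    print(center_and_scale([(0.0, 0.0, 0.0), (2.0, 1.0, 1.0)], factor))
```

Because the scale is derived from the wheel base alone, protruding parts such as mirrors or light bars change the bounding box of the model but not the scale factor.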
CAD models originally come in the skp format, the native format of the SketchUp 3D modelling software. They were converted to blend, the native format of Blender. Due to partial incompatibility of the two formats, some models acquired triangular artifacts on some surfaces, while others acquired matte gray, artificial-looking glass surfaces. These appearance problems are recorded in "collection_v2.json".
The authors thank Spandan Gandhi and Ekaterina Toropova for their help in preparing the dataset.