rbonatti_phd_robotics_2021.pdf (70.78 MB)

Active Vision: Autonomous Aerial Cinematography with Learned Artistic Decision-Making

Posted on 04.11.2021, 21:10 by Rogerio Bonatti
Aerial cinematography is revolutionizing industries that require live and dynamic camera viewpoints, such as entertainment, sports, and security. Fundamentally, it is a tool with immense potential to enhance human creativity, expressiveness, and the sharing of experiences. However, safely piloting a drone while filming a moving target in the presence of obstacles is extremely taxing, and often requires multiple highly trained human operators to control a single vehicle. Our research focuses on building autonomous systems that empower any individual with the full artistic capabilities of aerial cameras. We develop a system for active vision: one that does not merely process the incoming sensor feed passively, but actively reasons about the cinematographic quality of viewpoints and safely generates sequences of shots. The theory and systems developed in this work can impact video generation in both real-world and simulated environments, such as professional and amateur movie-making, video games, and virtual reality.

First, we formalize the theory behind the aerial filming problem by incorporating cinematography guidelines into robot motion planning. We describe the problem in terms of its principal cost functions and develop an efficient trajectory optimization framework for executing arbitrary types of shots while avoiding collisions with, and occlusions by, obstacles.

Second, we propose and develop a system for aerial cinematography in the wild. We combine several components into a real-time framework: vision-based target estimation, 3D signed-distance mapping for collision and occlusion avoidance, and trajectory optimization for camera motion. We extensively evaluate our system both in simulation and in field experiments by filming dynamic targets moving through unstructured environments.

Third, we take a step towards learning the intangible art of cinematography. We all know a good clip when we see it, but we cannot yet specify an objective formula for one. We propose the use of deep reinforcement learning with a human evaluator in the loop to guide the selection of artistic shots, and show that the learned policies can incorporate intuitive concepts of human aesthetics.

Next, we develop a novel data-driven framework that enables direct user control of camera positioning parameters in an intuitive learned semantic space (e.g., calm, enjoyable, establishing), and show its effectiveness in a series of user studies.

Lastly, we take the first steps towards multi-camera collaboration for filming. Multiple simultaneous viewpoints are necessary when capturing real-world scenes such as sports or social events. In these situations it is difficult to capture the optimal viewpoint at all times with a single aerial camera, especially because the events cannot be reenacted for additional takes. Here, we design motion planning algorithms for multi-camera cinematography that can maximize the quality of multiple video streams simultaneously using limited onboard resources.

Degree Type

Robotics Institute

Degree Name

  • Doctor of Philosophy (PhD)

Sebastian Scherer
