Humans In Their Natural Habitat: Training AI to Understand People
Thesis posted on 21.10.2020, 20:15 by Gunnar Sigurdsson
Computer vision has great potential to help in our daily lives by searching for lost keys, watering flowers, or reminding us to take a pill. To succeed at such tasks, computer vision
methods need to be trained on real and diverse examples of our dynamic daily scenes. First, we need to give computers insight into our world and our daily lives: not just through the charade that we present to the world on social media, but through a genuine look at the most boring, mundane, routine aspects of our lives. But how do we model this data? How do we model information over time? How do we harness the richness and complexity of this data to enable understanding? To provide a lens through which to look at humans in their mundane lives, we explored
techniques for crowdsourcing the creation of this data from hundreds of people in their own homes, and analyzed how humans think about activities along with the best strategies
for annotating complex data of this nature. Given this insight into human behaviour, we can start to understand where other vision techniques have trouble, how to
improve them, and which avenues are most promising moving forward. Once we have this kind of data, we can start building algorithms that harness its unique aspects by learning how human activities change over time, and which activities occur with a recognizable temporal structure. We can harness the data to learn how complete human events, such as a snowboarding trip, generally unfold, and apply these models to practical problems such as summarizing photo albums. Finally, we combine ideas from our work to demonstrate how these techniques can be used to collect data and model human activities from first- and third-person perspectives at the same time, and to learn concepts from web videos without supervision. We hope this kind of realistic bias may provide new insights that aid robots equipped with our computer vision models operating in the real world.
- Doctor of Philosophy (PhD)