Carnegie Mellon University

Human Object Ownership Tracking in Autonomous Retail

posted on 2022-11-16, 22:51 authored by João Diogo Lisboa De Menezes Falcão

In retail, 85% of sales occur in physical stores. In the U.S. alone, people spend roughly 37 billion hours each year waiting in line in physical stores. This results in 4.4 billion potentially productive work hours lost or, equivalently, 4.4 billion hours of leisure, rest, or time with loved ones lost. Autonomous stores can eliminate customer waiting time by issuing a receipt without requiring the items to be scanned. In the grocery sector, autonomous stores can also serve so-called ‘food deserts’, locations that would otherwise lack access to grocery stores. They do so by reducing the physical size of the store while preserving its commercial viability through automation.

Understanding physical object ownership transfer is a key element of physical commerce and is central to understanding when and how people take products off store shelves. For a machine to understand this, it must not only sense and identify individual objects in a constrained physical space but also track how their ownership changes over time. Humans are often at the center of such transfers, and detecting and characterizing human object ownership over time makes it possible to improve multiple applications through automation. Applications such as inventory counting, surveillance, supply chain management, inventory management, and checkout-free retail all benefit from understanding human object ownership over time, because it allows decisions to be made automatically. In this thesis I use autonomous retail as a guiding application to demonstrate the applicability of this approach.

Applications that require an understanding of object ownership transfer can rely on manual intervention (e.g., cashiers at a supermarket), on-object sensing (e.g., RFID tags), contextual modeling through vision alone, or computer vision combined with other sensors. However, these approaches directly or indirectly require a large amount of human labor, making them impractical to scale in real-world applications. Their generally low accuracy and throughput also limit their applicability in a broader real-world context.

This work explores the automatic tracking of human object ownership over time. In this thesis I propose a framework for detecting and characterizing human object ownership and introduce a method that combines the physical context of the application with the available sensing modalities. By modeling the static physical context (e.g., location, appearance, weight, volume), the dynamic physical context (e.g., human motion, temporal-spatial proximity), and the physical relations between objects and people (e.g., historical ownership, relative motion), sensing data can be combined with this context to improve the detection and tracking of object ownership changes.
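To make the idea of fusing sensing data with physical context more concrete, here is a minimal, hypothetical sketch; it is not the method developed in the thesis. It assumes a single shelf with weight sensors, a known per-unit weight and shelf position for each product (static context), and tracked shopper positions at the moment of a weight change (dynamic context). The names `Product`, `Person`, `attribute_pickup`, the noise parameters, and the example catalogue are all illustrative assumptions. A weight-change event is explained by the product and unit count that jointly best match the removed weight and the event location, and the item is attributed to the nearest shopper (proximity).

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    unit_weight_g: float  # static context: weight of one unit, in grams
    shelf_x: float        # static context: position along the shelf, in metres


@dataclass
class Person:
    person_id: str
    x: float              # dynamic context: tracked position at event time, in metres


def gaussian_log_score(observed: float, expected: float, sigma: float) -> float:
    """Log of an unnormalized Gaussian likelihood (higher means a better match)."""
    return -0.5 * ((observed - expected) / sigma) ** 2


def attribute_pickup(weight_delta_g, event_x, products, people,
                     weight_sigma_g=5.0, location_sigma_m=0.3, max_count=3):
    """Return (person_id, product_name, count) for one shelf weight change.

    Hypothetical fusion rule: the weight cue scores how well the removed weight
    matches `count` units of a product, and the location cue scores how close
    that product sits to where the weight changed. The item is then assigned
    to the person closest to the event (proximity cue).
    Assumes at least one product and one person.
    """
    removed_g = -weight_delta_g  # a pickup appears as a negative weight change
    best = None
    for product in products:
        for count in range(1, max_count + 1):
            score = (gaussian_log_score(removed_g, count * product.unit_weight_g, weight_sigma_g)
                     + gaussian_log_score(event_x, product.shelf_x, location_sigma_m))
            if best is None or score > best[0]:
                best = (score, product.name, count)
    nearest = min(people, key=lambda p: abs(p.x - event_x))
    return nearest.person_id, best[1], best[2]


if __name__ == "__main__":
    # Hypothetical shelf: chips and candy bars weigh almost the same,
    # so the weight cue alone cannot reliably tell them apart.
    products = [
        Product("soda_can", 355.0, 0.2),
        Product("chips", 60.0, 0.8),
        Product("candy_bar", 59.0, 1.4),
    ]
    people = [Person("shopper_A", 0.3), Person("shopper_B", 1.5)]
    # 118 g removed near x = 0.85 m: the location cue favours two bags of chips.
    print(attribute_pickup(-118.0, 0.85, products, people))
```

Running it prints `('shopper_A', 'chips', 2)`: by weight alone, two candy bars (118 g) match slightly better than two bags of chips (120 g), but the event location tips the decision. Moving the same event to x = 1.35 m instead yields two candy bars attributed to `shopper_B`. A fuller treatment would also use the historical-ownership and relative-motion relations listed above, which this sketch omits.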

By combining physical context modeling with multiple sensing modalities, this approach significantly reduces computational requirements while maintaining high accuracy, which in turn makes it possible to meet the scale requirements of a real-world deployment. The method is validated across several retail applications which, due to their transactional nature, include a rich set of annotated ownership transactions. For inventory monitoring, our tracking approach achieved 92.6% item identification accuracy, a 2x reduction in error compared to the 86% accuracy reported for self-checkout stations. For autonomous retail stores, we maintained an average receipt accuracy of up to 96.4% over 1 year of operation, across more than 65,000 transactions involving 1,653 different products.
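As a quick check of the error-reduction figure, assuming "error" here means one minus the reported accuracy, the two error rates compare as follows. The ratio is about 1.9, which is consistent with the stated roughly 2x reduction.

```latex
\frac{\text{error}_{\text{self-checkout}}}{\text{error}_{\text{tracking}}}
  = \frac{1 - 0.86}{1 - 0.926}
  = \frac{0.140}{0.074}
  \approx 1.89 \approx 2\times
```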




Degree Type

  • Dissertation

Department

  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

  • Pei Zhang
