Carnegie Mellon University
CMU-CS-18-119.pdf (29.97 MB)

Hand Pose Estimation and Prediction for Virtual Reality Applications

Posted on 2021-10-22, 19:51, authored by Se-joon Chung
Using our hands is the primary way we interact with the world around us, and recent trends in virtual reality (VR) reflect the importance of hand interaction. Mainstream VR headsets such as the Oculus Rift, HTC Vive, and PlayStation VR all support and encourage the use of their hand-tracking controllers. However, tracking hands directly is very challenging because hands are small and frequently occluded. For example, significant portions of the hand can be occluded when people hold their hands together or hold an object. For this reason, VR headset makers have users hold controllers, which can be tracked more reliably than bare hands. Another problem with hand tracking is that it often adds latency to the system. Networked multiplayer interactions are even harder to deliver without users noticing delays, because network latency is added on top. In this thesis, we propose ways to overcome these current limitations of hand use in VR by addressing these challenges.

To address the difficulty of hand tracking, we present a way to estimate the entire hand pose from a few reliably tracked points on the hand. We used a commonly available multi-touch tablet to track fingertip positions and estimated the entire hand pose from the tracked fingertip positions using a quadratic encoding method. We show that the quadratic encoding method yields smooth motions for smooth changes in fingertip positions, and we demonstrate several manipulation tasks using the interface. We also present a way to handle a changing number of fingertip contacts and show that we can identify unknown fingertips.

To address the latency of hand tracking and multiplayer interactions, we propose a method to augment hand pose prediction with eye tracking, which will be commonly available in the next generation of VR headsets. We first motivate our approach by presenting our observations of hand-eye coordination. We show that gaze leads grasping actions by an average of 0.726 s, leaves the manipulation area as soon as the action is complete, fixates on the tool tips when a tool is held, and sometimes inspects the object without the hand directly interacting with the inspected area. We use all of these observations to determine how gaze should be used to predict hand pose during different actions. Then, we present a study on predicting grasp types to show that gaze is effective in predicting hand interactions. We found interesting patterns of gaze during bottle grasps before the hand reaches the bottle, and we use neural networks to show that these gaze patterns can be learned to improve prediction accuracy. Finally, we conclude with an application of grasp type prediction in VR and a user study that evaluates the usefulness and quality of hand poses generated from grasp type prediction. We found gaze patterns in VR similar to those in the real-life experiment. The user study shows the potential usefulness of grasp type prediction in VR applications, with room for improvement in the future.
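The abstract does not define the quadratic encoding, so the following is only one plausible reading and not the thesis's method. It is a minimal sketch, assuming the hand pose is a vector of joint angles and that it is fit by ordinary least squares to the constant, linear, and pairwise-product terms of the tablet fingertip coordinates. Because the fitted map is a polynomial, smooth changes in fingertip positions produce smooth changes in the estimated pose, which is the property the abstract highlights. The function names (`quadratic_features`, `fit_quadratic_map`, `predict_pose`) and the synthetic data are hypothetical.

```python
import numpy as np


def quadratic_features(tips: np.ndarray) -> np.ndarray:
    """Lift fingertip coordinates to [1, x_i, x_i * x_j for i <= j].

    tips: array of shape (n_samples, d), e.g. d = 2 * number_of_fingertips
    for 2D touch positions (hypothetical input layout).
    """
    tips = np.atleast_2d(np.asarray(tips, dtype=float))
    n, d = tips.shape
    i, j = np.triu_indices(d)            # every index pair with i <= j
    products = tips[:, i] * tips[:, j]   # each quadratic monomial appears once
    return np.hstack([np.ones((n, 1)), tips, products])


def fit_quadratic_map(tips: np.ndarray, poses: np.ndarray) -> np.ndarray:
    """Least-squares weights W such that poses ~= quadratic_features(tips) @ W.

    poses: array of shape (n_samples, n_joint_angles).
    """
    phi = quadratic_features(tips)
    weights, *_ = np.linalg.lstsq(phi, poses, rcond=None)
    return weights


def predict_pose(weights: np.ndarray, tips: np.ndarray) -> np.ndarray:
    """Evaluate the fitted map; it is a polynomial, hence smooth in the input."""
    return quadratic_features(tips) @ weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical synthetic data: 3 fingertips in 2D (d = 6), 20 joint angles.
    tips = rng.uniform(-1.0, 1.0, size=(500, 6))
    true_w = rng.normal(size=(quadratic_features(tips).shape[1], 20))
    poses = quadratic_features(tips) @ true_w
    w = fit_quadratic_map(tips, poses)
    print(np.allclose(predict_pose(w, tips), poses))
```

In practice the training pairs would come from recorded hand poses with known fingertip positions rather than from a synthetic quadratic map, and a ridge term could be added if the training set is small or the features are nearly collinear.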




Degree Type

  • Dissertation


Department

  • Computer Science

Degree Name

  • Doctor of Philosophy (PhD)


Advisor(s)

  • Nancy S. Pollard
