%0 Journal Article
%A Lee, Tai Sing
%A Stepleton, Tom
%A Potetz, Brian
%A Samonds, Jason M.
%D 2009
%T Neural encoding of scene statistics for surface and object inference
%U https://kilthub.cmu.edu/articles/journal_contribution/Neural_encoding_of_scene_statistics_for_surface_and_object_inference/6607679
%R 10.1184/R1/6607679.v1
%2 https://kilthub.cmu.edu/ndownloader/files/12098231
%K computer sciences
%K Information and Computing Sciences not elsewhere classified
%X Features associated with an object or its surfaces in natural scenes tend to vary coherently
in space and time. In the psychological literature, these coherent covariations have been described
as important for neural systems to acquire models of objects and object categories.
From a statistical inference perspective, such coherent covariation can provide a mechanism
to learn statistical priors in natural scenes that are useful for probabilistic inference.
In this article, we present some neurophysiological experimental observations in the early
visual cortex that provide insights into how correlation structures in visual scenes are
encoded by neuronal tuning and connections among neurons. The key insight is that
correlated structures in visual scenes result in correlated neuronal activities, which shape
the tuning properties of individual neurons and the connections between them, embedding
Gestalt-related computational constraints or priors for surface inference. Extending these
concepts to the inferotemporal cortex suggests a representational framework that is distinct
from the traditional feed-forward hierarchy of invariant object representation and recognition.
In this framework, lateral connections among view-based neurons, learned from the
temporal association of the object views observed over time, can form a linked graph structure
with local dependency, akin to a dense aspect graph in computer vision. This web-like
graph allows view-invariant object representation to be created using sparse feed-forward
connections, while maintaining the explicit representation of the different views. Thus, it
can serve as an effective prior model for generating predictions of future incoming views to
facilitate object inference.
%I Carnegie Mellon University