Tech Report: "What" and "Where" Networks for Visual Recognition

Rajesh Rao (rao@cs.rochester.edu)
Mon, 26 May 1997 17:54:14 -0400

The following technical report on a model of simultaneous recognition
("what") and pose estimation ("where") is available on the WWW page:
http://www.cs.rochester.edu/u/rao/
or via anonymous ftp (see instructions below).

Comments and suggestions welcome (This message has been cross-posted -
my apologies to those who receive it more than once).

-- 
Rajesh Rao                       Internet: rao@cs.rochester.edu
Dept. of Computer Science        VOX:  (716) 275-2527
University of Rochester          FAX:  (716) 461-2018
Rochester  NY  14627-0226        WWW:  http://www.cs.rochester.edu/u/rao/

===========================================================================

Localized Receptive Fields May Mediate Transformation-Invariant Recognition in the Visual Cortex

Rajesh P.N. Rao and Dana H. Ballard

Technical Report 97.2
National Resource Laboratory for the Study of Brain and Behavior
Department of Computer Science, University of Rochester
May 1997

Neurons in the visual cortex are known to possess localized, oriented receptive fields. It has previously been suggested that these distinctive properties may reflect an efficient image encoding strategy based on maximizing the sparseness of the distribution of output neuronal activities or, alternatively, on extracting the independent components of natural image ensembles. Here, we show that a relatively simple neural solution to the problem of transformation-invariant visual recognition also causes localized, oriented receptive fields to be learned from natural images. These receptive fields, which code for various transformations in the image plane, allow a pair of cooperating neural networks, one estimating object identity ("what") and the other estimating object transformations ("where"), to simultaneously recognize an object and estimate its pose by jointly maximizing the a posteriori probability of generating the observed visual data. We provide experimental results demonstrating the ability of these networks to factor retinal stimuli into object-centered features and object-invariant transformations. The resulting neuronal architecture suggests concrete computational roles for the neuroanatomical connections known to exist between the dorsal and ventral visual pathways.

Retrieval information:

FTP-host: ftp.cs.rochester.edu
FTP-pathname: /pub/u/rao/papers/local.ps.Z
WWW URL: http://www.cs.rochester.edu/u/rao/

9 pages; 229K compressed.

==========================================================================

Anonymous ftp instructions:

>ftp ftp.cs.rochester.edu
Connected to anon.cs.rochester.edu.
220 anon.cs.rochester.edu FTP server (Version wu-2.4(3)) ready.

Name: [type 'anonymous' here]
331 Guest login ok, send your complete e-mail address as password.

Password: [type your e-mail address here]

ftp> cd /pub/u/rao/papers/
ftp> get local.ps
ftp> bye
--------------------------------------------------------------------------
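
For readers who would rather script the retrieval than type the session by hand, the same steps can be sketched with Python's standard ftplib module. This is only a rough sketch, not part of the original instructions: the function name fetch_report and its parameters are made up for illustration, and it assumes the server still accepts anonymous logins at the host and path given above. It requests the compressed file named under "Retrieval information" (local.ps.Z), so the result still has to be uncompressed before printing or viewing.

```python
from ftplib import FTP

# Values copied from the "Retrieval information" section of the announcement.
HOST = "ftp.cs.rochester.edu"
DIRECTORY = "/pub/u/rao/papers/"
FILENAME = "local.ps.Z"  # compressed PostScript file


def fetch_report(ftp, email, dest_path, directory=DIRECTORY, filename=FILENAME):
    """Mirror the manual session: anonymous login, cd, get, bye.

    `ftp` is an already-connected ftplib.FTP object (hypothetical helper;
    passing the connection in keeps the function easy to try out offline).
    """
    # Name: anonymous / Password: your e-mail address
    ftp.login(user="anonymous", passwd=email)
    # ftp> cd /pub/u/rao/papers/
    ftp.cwd(directory)
    # ftp> get ...  (binary transfer, since the file is compressed)
    with open(dest_path, "wb") as out:
        ftp.retrbinary("RETR " + filename, out.write)
    # ftp> bye
    ftp.quit()
    return dest_path


# Example use (left commented out because it needs network access):
# fetch_report(FTP(HOST), "you@example.org", FILENAME)
```

Passing the connection in as an argument, rather than opening it inside the function, is just one possible design; it makes it straightforward to substitute a stand-in object when the real server is unreachable.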