Alva Noë and Evan Thompson (eds.): Vision and Mind
- selected readings in the philosophy of perception



Introduction pp 1-6

Alva Noë and Evan Thompson

1. The Orthodox View


The defining problem of traditional visual theory is that of understanding how we come to enjoy such rich, apparently world-representing visual impressions. You open your eyes and you take in an environment of meaningful objects and events and of colors, forms, and movements. What makes such perceptual experience so difficult to explain is the fact — if it is a fact — that when we open our eyes and contemplate a scene, we make no direct contact with that which we seem to see. What is given to us, one might suppose, is not the world itself, but the pattern of light on the retina, and that pattern does not supply enough information to determine how things are in the environment. For example, from the retinal image of a table alone, it may not be possible to tell whether it is large and far away, or small and nearby.
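The table example can be made concrete with a small sketch. It assumes a simplified geometry that is not in the original text: a flat object viewed frontally, whose only contribution to the retinal image is the visual angle it subtends. The function name and the sizes and distances are hypothetical, chosen only for illustration.

```python
import math

def visual_angle_deg(size_m: float, distance_m: float) -> float:
    """Angle (in degrees) subtended at the eye by a frontally viewed object.

    Simplifying assumption: the retinal image is characterized by this angle alone.
    """
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

# A small, nearby table and a large, distant one (hypothetical values).
near_small = visual_angle_deg(size_m=1.0, distance_m=2.0)
far_large = visual_angle_deg(size_m=3.0, distance_m=6.0)

# Both subtend the same angle, so the angle alone cannot tell them apart.
print(math.isclose(near_small, far_large))
```

Under these assumptions, any two objects with the same ratio of size to distance project the same angle, which is one way of stating why the retinal pattern by itself underdetermines the layout of the environment.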

Visual scientists are quick to add that the problem is really even more baffling than we have indicated. The eye is in nearly constant motion; the resolving power (spatial and chromatic) of the retina is limited and non-uniform; passage to the retina is blocked by blood vessels and nerve fibers; there is a large “blind spot” on the retina where there are no photoreceptors; there are two retinal images, each of which is upside down. Given this impoverished basis, how do we manage to enjoy such richly detailed visual experiences of the environment? The central puzzle for traditional visual science has been to explain how the brain bridges the gap between what is given to the visual system and what is actually experienced by the perceiver.

In the face of this puzzle, an orthodox or “Establishment View” of perception has taken shape over the last fifty years. According to this orthodoxy, perception is a process whereby the brain, or a functionally dedicated subsystem of the brain, builds up representations of relevant features of the environment on the basis of information encoded by the sensory receptors. As David Marr surmises: “Vision is the process of discovering from images what is present in the world, and where it is.” Because the patterns on the retina are not sufficient by themselves to determine the layout of the surrounding environment, perception must be thought of as a process of inductive inference. Perceptions are hypotheses concerning the distal causes of proximal stimulation. In the famous phrase of Helmholtz, perception is unconscious inference.

The orthodox view, in its modern computational form, treats perception as a “subpersonal” process carried out by functional subsystems or modules instantiated in the person's or animal's brain. For this reason, among others, it is often held that much of perception—specifically “early vision,” in which a model of the surface layout is supposed to be produced — is “cognitively impenetrable,” that is, impervious to the direct influence of cognition or thought. In other words, the beliefs and expectations of the perceiver are thought to have no influence on the character of the subpersonal computations that constitute perception. Thus, on the orthodox approach, perception is thought-independent.

Most adherents of the orthodox view also believe that for every conscious perceptual state of the subject, a particular set of neurons exists whose activities are sufficient, as a matter of scientific law, for the occurrence of that state. Davida Teller calls such neurons “the bridge locus” of visual perception; others call them the “neural correlate of consciousness” for visual perception. According to this viewpoint, to suppose that there is no bridge locus or neural correlate of consciousness would be to give up all hope of securing a scientific explanation of perceptual experience.

2. Heterodox Views


Although the orthodox view has dominated perceptual psychology, visual neuroscience, and artificial vision and robotics, important alternative research programs have existed for many decades. Collectively these alternatives constitute a significant heterodoxy in visual science (and cognitive science more generally), one whose influence seems to be felt increasingly in mainstream cognitive science and philosophy. Important differences exist among these alternative research programs, but what unites them is their convergence on certain fundamental criticisms of the orthodox view and their insistence on the inseparability of perception and action.

The Ecological Approach
The theoretical and empirical research on vision undertaken by the perceptual psychologist James J. Gibson marks an important break with the orthodox view. Perception, Gibson argues, is not an occurrence that takes place in the brain of the perceiver, but rather is an act of the whole animal, the act of perceptually guided exploration of the environment. One misdescribes vision if one thinks of it as a subpersonal process whereby the brain builds up an internal model of the environment on the basis of impoverished sensory images. Such a conception of vision is pitched at the wrong level, namely, that of the internal enabling conditions for vision rather than that of vision itself as an achievement of the whole animal. Put another way, the function of vision is to keep the perceiver in touch with the environment and to guide action, not to produce inner experiences and representations.

According to this animal-level account, the information directly available to the perceiver in vision is not to be found in the pattern of irradiation on the retinal surface, but rather in the world or environment that the animal itself explores. In other words, Gibson denies the assumption of the orthodox view — and of representational theories in general — that one makes no direct contact with that which one sees. For Gibson, perception is direct: It is not mediated by sensations or images that serve as the basis for reconstructing a representation of the things that we see. Perception, one might say, is direct inspection, not re-presentation. If perception does not operate according to mechanisms of inferential reconstruction on the basis of internal representations, then how does it operate, according to Gibson? The central working hypothesis of this ecological approach is that the perceiver makes direct contact with the environment thanks to the animal's sensitivity to invariant structures in the ambient light.

Two points are important here. First, perception is active: the animal moves its eyes, head, and body to scan the layout visually, while simultaneously moving through the environment. Thus visual perception occurs not as a series of snapshots corresponding to stationary retinal images, but as a dynamic visual flow. Second, there are lawful correlations between the structure of this flow and visible properties of the environment. Because perceivers are implicitly familiar with these lawful correlations, they are able to “pick up” content from the environment as specified in the light without having to reconstruct the environment from impoverished images through information processing.

The ecological approach remains highly controversial. Perhaps the most well-known criticism is that of Jerry A. Fodor and Zenon W. Pylyshyn. They defend the Establishment View, and they insist that Gibson failed to make a serious break with this view. At the end of the day, they suggest, the only significant contact one makes with the world in perception is through the stimulation of one's sensory receptors by patterns of energy. Perception, therefore, must be indirect: It must be a process of representation on the basis of that peripheral sensory contact. According to this way of thinking, perception remains, from a scientific viewpoint, a subpersonal process of computational representation, and accordingly is not usefully thought of as an animal-level achievement. On the other hand, John McDowell scrutinizes the conceptual and epistemological coherence of this Establishment position in the context of philosophical issues about the content of perceptual experience and knowledge; he argues that the nature of perception will continue to be misunderstood as long as perception is cast as an internal, subpersonal process.

The Enactive Approach
Another alternative approach to perception has emerged from the work of the neuroscientists Humberto R. Maturana and Francisco J. Varela. They argue that it is a mistake to think of the nervous system as an input-output system that encodes an internal representation of the outside world. Rather than representing an independent, external world, the nervous system generates or brings forth, on the basis of its own self-organized activity, the perceptuo-motor domain of the animal. On the basis of this reappraisal, Varela has presented an enactive approach to perception, as one component of a comprehensive enactive or embodied view in cognitive science. According to this view, meaningful perceptual items, rather than being internally represented in the form of a world-model inside the head, are enacted or brought forth as a result of the structural coupling of the organism and its environment. A good example of the enactive approach is the account of color vision provided by Evan Thompson, Adrian Palacios, and Francisco J. Varela. They reject the orthodox view, as exemplified in computational color vision research and functionalist philosophy of mind, according to which the function of color vision is to recover from the retinal image reliable estimates of the invariant distal property of surface spectral reflectance (the percentage of light at each wavelength that a surface reflects). On the basis of cross-species comparisons of color vision, Thompson, Palacios, and Varela argue that different animals have different phenomenal color spaces, and that color vision does not have the function of detecting any single type of environmental property. They then use these arguments to motivate an enactive account of color, according to which color properties are enacted by the perceptuo-motor coupling of animals with their environments.

Animate Vision
The research program of animate vision has emerged at the interface of computational vision, artificial intelligence, and robotics. Instead of abstracting perceptual processes from their bodily context, animate vision proposes what Ballard calls a distinct embodiment level of explanation, which specifies how the facts of sensorimotor embodiment shape perception. For example, the orthodox view starts from the abstraction of a stationary retinal image and asks how the visual system manages to derive a model of the objective world; in so doing, it decomposes visual processes into modules that are passive in the sense of not being interconnected with motor processes. Animate vision, however, starts from the sensorimotor cycles of saccadic eye movement and gaze fixation, and asks how the perceiver is able to fixate points in the environment; in so doing, it decomposes visual processes into visuomotor modules that guide action and exploration. Such an embodied, action-based analysis reduces the need for certain kinds of representations in vision, in particular for an online, moment-to-moment, detailed world-model.

The Sensorimotor Contingency Theory
According to this theory, put forward by J. Kevin O'Regan and Alva Noë, it is a mistake to think of vision as a process taking place in the brain. Although the brain is necessary for vision, neural processes are not, in themselves, sufficient to produce seeing. Instead, seeing is an exploratory activity mediated by the animal's mastery of sensorimotor contingencies. That is, seeing is a skill-based activity of environmental exploration. Visual experience is not something that happens in an individual. It is something he or she does. This sensorimotor conception forms the basis of Noë and O'Regan's challenge to the widespread view, articulated by Teller and Chalmers, that the content of visual experience is represented at some specific stage of neural processing (the “bridge locus” or the “neural correlate of consciousness”).





Dana H. Ballard
On the Function of Visual Representation p.477


A number of different avenues of investigation have suggested that much of the basic phenomenology of vision can be traced to “a picture in the head,” retinotopic structures that represent phenomena of the world in a more or less veridical sense. While it is so far impossible to disprove this literal interpretation of experience, there are a variety of new results that challenge this literalism. These new results can be integrated under the heading of functionalism. Focussing on the tasks that humans have to do rather than subjective experiences, functionalism provides a new way of evaluating experimental data. In addition, the behaviour needs to be interpreted in terms of a human model that is both sufficiently abstract to suppress confounding neural-level detail, yet sufficiently concrete to include the crucial body mechanisms that mediate perception and action. In regard to any dichotomy of putative brain function it is always possible that the brain can do both. In this case it is possible that the brain can simultaneously represent the visual invariants that correspond to the “perception” of seeing and the invariants that correspond to functional programs. However, the intent of this paper is to suggest that many of the phenomena that might have previously garnered a literal interpretation can be more succinctly explained by functional models, even though these functional models may challenge many of our intuitions about perception.




Daniel C. Dennett
Seeing Is Believing—Or Is It? p.485


The distinction is not just semantic. Perceptual and conceptual representations are probably generated in separate regions of the brain and may be processed in very different ways. Just what contrast is there between perceptual and conceptual? Is it a difference in degree or kind, and is there a sharp discontinuity in the normal progression of perceptual processes? If there is, then one—and only one—of the following is the right thing to say. Which is it to be?

1. Seeing is believing.
My belief that I see such-and-such details in the photograph in my hand is a perceptual state, not an inferential state. I do, after all, see those details to be there. A visually induced belief to the effect that all those details are there just is the perception!

2. Seeing causes (or grounds) believing.
My belief that I see such-and-such details in the photograph in my hand is an inferential, non-perceptual state. It is, after all, merely a belief—a state that must be inferred from a perceptual state of actually seeing those details.

Neither, I will argue, is the right thing to say. To see why, we should consider a slightly different question, which Ramachandran goes on to ask: “How rich is the perceptual representation corresponding to the blind spot?” Answers to that eminently investigatable question are simply neutral with regard to the presumed controversy between (1) and (2). One of the reasons people tend to see a contrast between (1) and (2) is that they tend to think of perceptual states as somehow much richer in content than mere belief states. (After all, perceptions are like pictures, beliefs are like sentences, and a picture's worth a thousand words.) But these are spurious connotations. There is no upper bound on the richness of content of a proposition. So it would be a confusion—a simple but ubiquitous confusion—to suppose that since a perceptual state has such-and-such richness, it cannot be a propositional state, but must be a perceptual state (whatever that might be) instead. No sane participant in the debates would claim that the product of perception was either literally a picture in the head or literally a sentence in the head. Both ways of talking are reckoned as metaphors, with strengths and shortcomings. Speaking, as Kinsbourne and I have done, of the Multiple Drafts Model of consciousness leans in the direction of the sentence metaphor, in the direction of a language of thought. (After all, those drafts must all be written, mustn't they?)