Sensory Substitution: Coupling the Required Information to the Receiving Modality

Coupling the source information to the receiving modality actually involves two different issues: sensory bandwidth and the specificity of higher-level representation. After research has determined the information needed to perform a task, it must be determined whether the sensory bandwidth of the receiving modality is adequate to receive this information. Consider the idea of using the tactile sense to substitute for vision in the control of locomotion, such as driving. Physiological and psychophysical research reveals that the sensory bandwidth of vision is much greater than the bandwidth of the tactile sense for any circumscribed region of the skin (Loomis and Lederman 1986). Thus, regardless of how optical information is transformed for display onto the skin, it seems unlikely that the bandwidth of tactile processing is adequate to allow touch to substitute for this particular function. In contrast, other simpler functions, such as detecting the presence of a bright flashing alarm signal, can be feasibly accomplished using tactile substitution of vision.

Even if the receiving modality has adequate sensory bandwidth to accommodate the source information, this is no guarantee that sensory substitution will be successful, because the higher-level processes of vision, hearing, and touch are highly specialized for the information that typically comes through those modalities. A nice example of this is the difficulty of using vision to substitute for hearing in deaf people. Even though vision has greater sensory bandwidth than hearing, there is as yet no successful way of using vision to substitute for hearing in the reception of the raw acoustic signal (in contrast to sign language, which involves the production of visual symbols by the speaker). Evidence of this is the enormous challenge of deciphering an utterance represented by a speech spectrogram. There is the celebrated case of Victor Zue, an engineering professor who is able to translate visual speech spectrograms into their linguistic descriptions. Although his skill is an impressive accomplishment, the important point here is that enormous effort is required to learn this skill, and decoding a spectrogram of even a short utterance is very time-consuming. Thus, the difficulty of visually interpreting the acoustic speech signal suggests that presenting an isomorphic representation of the acoustic speech signal does not engage the visual system in a way that facilitates speech processing.

Presumably there are specialized mechanisms in the brain for extracting the invariant aspects of the acoustic signal; these invariant aspects are probably articulatory features, which bear a closer correspondence with the intended message. Evidence for this view is the relative success of the Tadoma method of speech reception (Reed et al. 1992). Some deaf-blind individuals are able to receive spoken utterances at nearly normal speech rates by placing a hand on the speaker's face. This direct contact with articulatory features is presumably what allows the sense of touch to substitute more effectively than visual reception of an isomorphic representation of the speech signal, despite the fact that touch has less sensory bandwidth than vision (Reed et al. 1992).

Although we now understand a great deal about the sensory processing of visual, auditory, and haptic perception, we still have much to learn about the perceptual/cognitive representations of the external world created by each of these senses and the cortical mechanisms that underlie these representations. Research in cognitive science and neuroscience will produce major advances in the understanding of these topics in the near future. Even now, we can identify some important research themes that are relevant to the issue of coupling information normally sensed by the impaired modality with the processing characteristics of the receiving modality.

Achieving Sensory Substitution Through Abstract Meaning

Prior to the widespread availability of digital computers, the primary approach to sensory substitution using electronic devices was to use analog hardware to map optical or acoustic information into one or more isomorphic dimensions of the receiving modality (e.g., using video to sense print or other high-contrast 2-D images and then displaying isomorphic tactile images onto the skin surface). The advent of the digital computer has changed all this, for it allows a great deal of signal processing of the source information prior to its display to the receiving modality. There is no longer the requirement that the displayed information be isomorphic to the information being sensed. Taken to the extreme, the computer can use artificial intelligence algorithms to extract the "meaning" of the optical, acoustic, or other information needed for performance of the desired function and then display this meaning by way of speech or abstract symbols.

One of the great success stories in sensory substitution is the development of text-to-speech devices for the visually impaired (Kurzweil 1989). Here, printed text is converted by optical character recognition into electronic text, which is then displayed to the user as synthesized speech. In a similar vein, automatic speech recognition and the visual display of text may someday provide deaf people with immediate access to the speech of any desired interactant. One can also imagine that artificial intelligence may someday provide visually impaired people with detailed verbal descriptions of objects and their layout in the surrounding environment. However, because inculcating such intelligence into machines has proven far more challenging than was imagined several decades ago, exploiting the intelligence of human users in the interpretation of sensory information will continue to be an important approach to sensory substitution. The remaining research themes deal with this more common approach.

Amodal Representations

For 3-D space perception (e.g., perception of distance) and spatial cognition (e.g., large-scale navigation), it is quite likely that vision, hearing, and touch all feed into a common area of the brain, like the parietal cortex, with the result that the perceptual representations created by these three modalities give rise to amodal representations. Thus, seeing an object, hearing it, or feeling it with a stick, may all result in the same abstract spatial representation of its location, provided that its perceived location is the same for the three senses. Once an amodal representation has been created, it then might be used to guide action or cognition in a manner that is independent of the sensory modality that gave rise to it (Loomis et al. 2002). To the extent that two sensory modalities do result in shared amodal representations, there is immediate potential for one modality substituting for the other with respect to functions that rely on the amodal representations. Indeed, as mentioned at the outset of this chapter, natural sensory substitution (using touch to find objects when vision is impaired) exploits this very fact. Clearly, however, an amodal representation of spatial layout derived from hearing may lack the detail and precision of one derived from vision because the initial perceptual representations differ in the same way as they do in natural sensory substitution.

Intermodal Equivalence: Isomorphic Perceptual Representations

Another natural basis for sensory substitution is isomorphism of the perceptual representations created by two senses. Under a range of conditions, visual and haptic perception result in nearly isomorphic perceptual representations of 2-D and 3-D shape (Klatzky et al. 1993; Lakatos and Marks 1999; Loomis 1990; Loomis et al. 1991). The similar perceptual representations are probably the basis both for cross-modal integration, where two senses cooperate in sensing spatial features of an object (Ernst et al. 2001; Ernst and Banks 2002; Heller et al. 1999), and for the ease with which subjects can perform cross-modal matching, that is, feeling an object and then recognizing it visually (Abravanel 1971; Davidson et al. 1974). However, there are interesting differences between the visual and haptic representations of objects (e.g., Newell et al. 2001), differences that probably limit the degree of cross-modal transfer and integration. Although the literature on cross-modal integration and transfer involving vision, hearing, and touch goes back years, this is a topic that is receiving renewed attention (some key references: Ernst and Banks 2002; Driver and Spence 1999; Heller et al. 1999; Martino and Marks 2000; Massaro and Cohen 2000; Welch and Warren 1980).


For a few rare individuals, synesthesia is a strong correlation between perceptual dimensions or features in one sensory modality and perceptual dimensions or features in another (Harrison and Baron-Cohen 1997; Martino and Marks 2001). For example, such an individual may imagine certain colors when hearing certain pitches, may see different letters as different colors, or may associate tactual textures with voices. Strong synesthesia in a few rare individuals cannot be the basis for sensory substitution; however, much milder forms in the larger population, which indicate reliable associations between intermodal dimensions that may be the basis for cross-modal transfer (Martino and Marks 2000), might be exploited to produce more compatible mappings between the impaired and substituting modalities. For example, Meijer (1992) has developed a device that uses hearing to substitute for vision. Because of the natural correspondence between pitch and elevation in space (e.g., high-pitched tones are associated with higher elevation), the device uses the pitch of a pure tone to represent the vertical dimension of a graph or picture. The horizontal dimension of a graph or picture is represented by time. Thus, a graph portraying a 45° diagonal straight line is experienced as a tone of increasing pitch as a function of time. Apparently, this device is successful for conveying simple 2-D patterns and graphs. However, it would seem that images of complex natural scenes would result in a cacophony of sound that would be difficult to interpret.
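The pitch-for-height, time-for-width mapping described above can be illustrated with a minimal sketch. It is not Meijer's implementation: the function names, the 500 to 5000 Hz frequency range, the logarithmic spacing of pitches, and the one-second scan duration are all assumptions chosen for illustration. The image is a small binary grid that is scanned left to right, and each lit pixel contributes a pure tone whose pitch rises with the pixel's height.

```python
import math

def row_frequencies(n_rows, f_low=500.0, f_high=5000.0):
    """Assign one pitch per image row (assumed log spacing).
    Row 0 is the top of the image and gets the highest pitch."""
    span = max(n_rows - 1, 1)
    return [f_low * (f_high / f_low) ** ((n_rows - 1 - r) / span)
            for r in range(n_rows)]

def column_pitches(image):
    """For each column (one time slice), list the pitches of its lit pixels."""
    freqs = row_frequencies(len(image))
    n_cols = len(image[0])
    return [[freqs[r] for r in range(len(image)) if image[r][c]]
            for c in range(n_cols)]

def image_to_audio(image, duration_s=1.0, sample_rate=8000):
    """Scan the image left to right and sum one sine tone per lit pixel."""
    n_rows, n_cols = len(image), len(image[0])
    samples_per_col = int(duration_s * sample_rate / n_cols)
    audio = []
    for c, active in enumerate(column_pitches(image)):
        for i in range(samples_per_col):
            t = (c * samples_per_col + i) / sample_rate
            s = sum(math.sin(2 * math.pi * f * t) for f in active)
            audio.append(s / n_rows)  # scale so the sum stays within [-1, 1]
    return audio

def diagonal_image(n):
    """A 45-degree line rising from bottom-left to top-right."""
    return [[1 if r == n - 1 - c else 0 for c in range(n)] for r in range(n)]

if __name__ == "__main__":
    img = diagonal_image(8)
    for c, pitches in enumerate(column_pitches(img)):
        print(c, [round(f) for f in pitches])
```

For the diagonal line, each time slice contains a single tone and the tones rise in pitch from the first slice to the last. An image with many lit pixels per column would sum many simultaneous tones, which gives a sense of why complex scenes are expected to be hard to interpret.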

Multimodal Sensory Substitution

The discussion of sensory substitution so far has assumed that the source information needed to perform a function or functions is displayed to a single receiving modality, but clearly there may be value in using multiple receiving modalities. A nice example is the idea of using speech and audible signals together with force feedback and vibrotactile stimulation from a haptic mouse to allow visually impaired people to access information about 2-D graphs, maps, and pictures (Golledge 2002, this volume). Another aid for visually impaired people is the "Talking Signs" system of electronic signage (Crandall et al. 1993), which includes transmitters located at points of interest in the environment that transmit infrared signals carrying speech information about the points of interest. The user holds a small receiver in the hand that receives the infrared signal when pointed in the direction of the transmitter; the receiver then displays the speech utterance by means of a speaker or earphone. In order to localize the transmitter, the user rotates the receiver in the hand until receiving the maximum signal strength; thus, haptic information is used to orient toward the transmitter, and speech information conveys the identity of the point of interest.
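The localization step just described, rotating the receiver until the signal is strongest, amounts to a search for the heading of maximum strength. The sketch below illustrates only that idea. The Gaussian fall-off of signal strength with angular offset, the 30° beam width, and the 5° scanning step are hypothetical and are not taken from the Talking Signs system.

```python
import math

def signal_strength(heading_deg, transmitter_deg, beam_width_deg=30.0):
    """Hypothetical receiver model: strength falls off smoothly as the
    receiver points away from the transmitter."""
    # Signed angular offset wrapped into [-180, 180)
    offset = (heading_deg - transmitter_deg + 180.0) % 360.0 - 180.0
    return math.exp(-(offset / beam_width_deg) ** 2)

def find_transmitter(transmitter_deg, step_deg=5):
    """Sweep the receiver through a full circle and return the heading
    at which the modeled signal is strongest."""
    headings = range(0, 360, step_deg)
    return max(headings, key=lambda h: signal_strength(h, transmitter_deg))

if __name__ == "__main__":
    print(find_transmitter(92))  # nearest scanned heading to the transmitter
```

In the actual aid the user's hand performs the sweep, so the felt orientation of the hand at the moment of peak signal is what carries the directional information.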

Rote Learning Through Extensive Exposure

Even when there is neither the possibility of extracting meaning using artificial intelligence algorithms nor the possibility of mapping the source information in a natural way onto the receiving modality, effective sensory substitution is not completely ruled out. Because human beings, especially when they are young, have a large capacity for learning complex skills, there is always the possibility that they can learn mappings between two sensory modalities that differ greatly in their higher-level interpretative mechanisms (e.g., use of vision to apprehend complex auditory signals or of hearing to apprehend complex 2-D spatial images). As mentioned earlier, Meijer (1992) has developed a device (The vOICe) that converts 2-D spatial images into time-varying auditory signals. Although the device is based on the natural correspondence between pitch and height in a 2-D figure, it seems unlikely that the higher-level interpretive mechanisms of hearing are suited to handling the complex 2-D spatial images usually associated with vision. Still, it is possible that if such a device were used by a blind person from very early in life, the person might develop the equivalent of rudimentary vision. On the other hand, the previously discussed difficulty of visually interpreting speech spectrograms is a good reason not to place too much hope in this capacity for learning.

Brain Mechanisms Underlying Sensory Substitution and Cross-Modal Transfer

In connection with his seminal work with the Tactile Vision Substitution System, which used a video camera to drive an electrotactile display, Bach-y-Rita (1967, 1972) speculated that the functional substitution of vision by touch actually involved a reorganization of the brain, whereby the incoming somatosensory input came to be linked to and analyzed by visual cortical areas. Though a radical idea at the time, it has recently received confirmation from a variety of studies involving brain imaging and transcranial magnetic stimulation (TMS). For example, research has shown that (1) the visual cortex of skilled blind readers of braille is activated when they are reading braille (Sadato et al. 1996), (2) TMS delivered to the visual cortex can interfere with the perception of braille in similar subjects (Cohen et al. 1997), and (3) the visual signals of American Sign Language activate the speech areas of deaf subjects (Neville et al. 1998).
