Motion of glossy objects does not promote separation of lighting and surface colour
The surface properties of an object, such as texture, glossiness or colour, provide important cues to its identity. However, the actual visual stimulus received by the eye is determined by both the properties of the object and the illumination. We tested whether operational colour constancy for glossy objects (the ability to distinguish changes in spectral reflectance of the object from changes in the spectrum of the illumination) was affected by rotational motion of either the object or the light source. The different chromatic and geometric properties of the specular and diffuse reflections provide the basis for this discrimination, and we systematically varied specularity to control the available information. Observers viewed animations of isolated objects undergoing either lighting or surface-based spectral transformations accompanied by motion. By varying the axis of rotation, and surface patterning or geometry, we manipulated: (i) motion-related information about the scene, (ii) relative motion between the surface patterning and the specular reflection of the lighting, and (iii) image disruption caused by this motion. Despite large individual differences in performance with static stimuli, motion manipulations neither improved nor degraded performance. As motion significantly disrupts frame-by-frame low-level image statistics, we infer that operational constancy depends on a high-level scene interpretation, which is maintained in all conditions.
1. Introduction
For our sense of vision to support our daily interaction with objects of the physical world, we must distinguish between different causes of variation in the retinal image—those that are due to the properties of objects and those that are due to the conditions of observing. As the spectral content of light reflected from an object depends not only on the spectral reflectance of the object but also on the spectral content of the illuminant [1], correctly identifying a particular object—the ripest fruit for example—depends on compensating for any differences in illumination. The image of a glossy surface can exhibit spatial variation in chromaticity owing either to surface patterning or to variation in the light field reflected from the surface. In principle, surfaces could be painted in such a way as to make them appear glossy, or highlights could be misidentified as surface patterning. How might observers disambiguate surface and lighting contributions to the image? One possible cue is that motion of the light sources or the surface will cause relative motion between patterning that belongs to the surface and spatial variation due to lighting. We know that observers’ judgements of glossiness are sensitive to such relative motion cues [2]. In this paper, we test whether operational colour constancy—the ability to distinguish colour changes caused by changes in the spectral reflectance of a surface from those caused by changes in the spectral content of the illuminant [3]—is influenced by relative motion between surfaces and specular reflections.
1.1. Background
Surface colour and surface patterning, dictated by pigmentation, can provide useful information about the identity or state of objects. This information is carried in the diffuse reflectance component of light reflected from the object. However, the vast majority of objects also offer a non-zero specular reflectance from their surface, reflecting features of the environment and light sources without spectral transformation, which can contribute to a glossy appearance [4]. As specular and diffuse reflection components are produced in different ways (see for example [5] and the Ward model [6]), they differ in their spectral and geometric properties, which might allow perceptual separation of these two components of an image. For non-white objects (those with a non-uniform spectral reflectance function), diffuse and specular components differ in their spectral composition. They also exhibit different imaging geometries. Variation owing to surface patterning is usually rigidly attached to the surface, whereas the positions of features in the specular component depend on the spatial arrangement of the surface, the light sources and the viewer. In natural viewing, these geometric relationships are rarely fixed. When judging real objects, the viewer may purposefully manipulate the object [7] and changes in the spectrum of the illumination, owing for example to the distribution of shadows, are typically accompanied by changes in illumination geometry [8]. In the following sections, we summarize the perceptual signals that are available from images of glossy objects, when illuminated by lights of different spectral composition and when stationary or moving, and consider how these signals might enable human observers to identify object properties from image properties.
1.2. Colour signals
Previously we showed that non-zero specularity can support operational colour constancy with single surfaces, and that observers perform better as specularity increases [9]. The signals underpinning this performance derive from the chromatic variations across the image of a surface that are introduced by specular reflections (figure 1). Matte surfaces, with zero specularity, reflect only the diffuse component, whose spectral content is a wavelength-by-wavelength multiplication of the surface reflectance function and the illuminant spectrum (I(λ)×R(λ)). Conversely, with high specularity, some regions are almost completely dominated by the specular reflection, which carries the spectral content of the illuminant (I(λ)), and with low specularity, the reflected light, at least for most non-metallic surfaces, is constrained to be a linear mixture of the specular and diffuse components (i.e. aI(λ)+bI(λ)×R(λ), where a and b are scale factors that depend on the imaging geometry). A change in the illuminant will cause the colour of both diffuse and specular components to change similarly, while a change in the surface spectral reflectance will cause the specular highlights to change colour less than the parts of the image that are dominated by diffuse reflection (figure 1). The chromatic signature from a reflectance change is the temporal analogue of chromaticity convergence [11], which has long been identified [12,13] as a cue to the illuminant, because chromaticities in the image of a glossy surface with two or more spectral reflectances lie on lines in colour space that intersect at the illuminant chromaticity. Importantly, chromaticity convergence provides information about the illuminant even at low specularities when the illuminant chromaticity is not directly available in the image. However, chromatic statistics alone are not sufficient to explain operational colour constancy. Lee & Smithson [9] showed that phase-scrambled stimulus images—in which the chromatic statistics were preserved, but the spatial structure of the scene could not be inferred—led to a marked deterioration in performance.
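As an illustration of these chromatic signals, the short sketch below (our illustration, not code from the study; the spectra and sensor sensitivities are arbitrary assumptions) computes chromaticities of dichromatic mixtures aI(λ)+bI(λ)×R(λ) for a range of geometric weights, showing that each mixture lies between the diffuse chromaticity and the illuminant chromaticity, which is the basis of chromaticity convergence.

```python
# Minimal numerical sketch of the dichromatic model and chromaticity convergence.
# Spectra and sensor sensitivities are illustrative assumptions, not study data.
import numpy as np

wl = np.linspace(400, 700, 31)                       # wavelengths (nm)
I = 1.0 + 0.5 * np.sin(wl / 60.0)                    # illustrative illuminant spectrum I(lambda)
R = 0.2 + 0.6 * np.exp(-((wl - 600.0) / 50.0) ** 2)  # illustrative reflectance function R(lambda)
S = np.stack([np.exp(-((wl - c) / 40.0) ** 2)        # crude long-, medium-, short-wave sensors
              for c in (570.0, 540.0, 450.0)])

def chromaticity(spectrum):
    """Sensor responses to a spectrum, normalised to unit sum."""
    t = S @ spectrum
    return t / t.sum()

illuminant_chroma = chromaticity(I)
diffuse_chroma = chromaticity(I * R)
rng = np.random.default_rng(0)
for a, b in rng.uniform(0.05, 1.0, size=(5, 2)):     # weights set by local imaging geometry
    mixture_chroma = chromaticity(a * I + b * I * R)
    # Each mixture lies on the line between the diffuse and illuminant chromaticities,
    # so the locus of image chromaticities points back towards the illuminant.
    print(np.round(mixture_chroma, 3), np.round(diffuse_chroma, 3), np.round(illuminant_chroma, 3))
```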
1.3. Interactions between motion, texture, shape and glossiness
Whereas, for diffuse reflections, light is scattered in every direction from the surface, for specular reflections, light is reflected at the same angle from the normal at which it is incident, usually with a smaller amount of scattering that depends on the roughness of the surface. These differences in imaging geometry have important consequences for the relationship between material properties, such as texture (variation or patterning in surface reflectance), colour and glossiness, and the proximal image, and for the transformations imposed by shape and motion. For example, with static stimuli, texture information (conveyed in the diffuse component) gives rise to compressions in the proximal image that depend on surface orientation (the first derivative of surface depth), whereas specularities generate compressions that depend on surface curvature (the second derivative of surface depth), leading to distinctive orientation fields in the image that can be used to derive shape information [14]. The perception of texture can be reduced if the orientation of lightness changes created by the texture is consistently aligned with shading created by surface relief [15], and the perception of gloss can be reduced if highlights are rotated or translated so that they are inconsistent with the lightness variation in the diffuse reflectance [16].
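The geometric distinction between the two components can be summarized in a few lines of code. The sketch below (our illustration, not code from the study, using a Phong-like lobe rather than the Ward model used for rendering) shows that the diffuse term depends only on the angle between the surface normal and the light, whereas the specular term also depends on the viewing direction, peaking where the mirrored light direction aligns with the view.

```python
# Minimal sketch of diffuse versus specular imaging geometry (illustrative only).
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

n = unit(np.array([0.0, 0.0, 1.0]))               # surface normal
l = unit(np.array([1.0, 0.0, 1.0]))               # direction towards the light source
v = unit(np.array([-1.0, 0.0, 1.0]))              # direction towards the viewer (mirror side)

diffuse = max(float(n @ l), 0.0)                  # Lambertian term: scattered to all views alike
mirror = 2.0 * float(n @ l) * n - l               # light direction reflected about the normal
specular = max(float(mirror @ v), 0.0) ** 50.0    # narrow lobe; the exponent stands in for roughness
print(round(diffuse, 3), round(specular, 3))      # 0.707 1.0 for this geometry
```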
In general, the virtual image of a light source (a specular highlight) is located in depth behind a glossy convex surface and in front of a concave surface. Human observers are sensitive to the stereoscopic appearance of highlights, which affects both the perception of surface curvature and the appearance of glossiness [17,18,19]. Similarly, relative motion of the surface, lights or viewer is likely to cause highlights to move relative to the surface, and such motion cues influence judgements of shape [20,21,22,23] and glossiness [24,25,26]. Proximal image features—such as the size, contrast and sharpness of highlight regions—can enhance glossiness, regardless of whether it is the surface geometry, light field, or specularity that actually generates the structure [27]. Importantly, three-dimensional shape constraints (conveyed by stereopsis or by shape-from-motion or shape-from-texture) do influence perception of glossiness in stimuli that contain identical luminance gradients [28,29,30]. In a particularly compelling demonstration, Doerschner et al. [2] show that movies of a specular object rotating back and forth, in which the reflections move as expected for a specular surface, promote a glossy appearance, whereas movies in which the reflections are rigidly attached to the surface promote a matte appearance. These studies confirm that information which is not available in a monocular static image can have a strong effect on separating surface and lighting contributions to the proximal image.
Multiple views of light reflected from a surface can provide enhanced information about the geometry of the light sources [31,32]. We might then expect these factors to affect colour constancy, because there is evidence that lightness and colour perception are affected by the observer’s interpretation of the three-dimensional structure of the objects and lighting in a scene [33,32,34,35,36,37,38,9]. Although we are primarily interested in operational colour constancy, the effects of motion on shape estimation are particularly strong, and cues to surface shape may have indirect effects on surface colour perception. Motion of a patterned, matte object can provide rich information about surface shape. In shape-from-motion experiments [39,40,41,42], dots whose motion is consistent with them being attached to a rigid, moving surface are projected onto a two-dimensional display. Observers integrate the consistent motion of the dots and perceive the three-dimensional surface to which the dots are attached, whereas without the motion they see only randomly positioned dots. When a patterned, glossy surface moves, analogous shape-from-motion signals are carried in the optic flow of the diffuse component of the image. However, specular flow carries information about second-order shape properties [43], and its exact contribution to shape estimation has been debated. While it may disrupt shape estimation in some cases [22], it has also been shown to enhance estimates of three-dimensional curvature, dominating cues from optic flow of surface pattern [44]. Information carried in optic flow of the diffuse component can be dissociated from specular flow by comparing rotations of an object around a vertical axis with rotations around the viewing axis, a manipulation we use in the present experiments. While both rotations preserve the curvature information, only rotation around a vertical axis preserves information conveyed by optic flow about surface slant [45].
There is now strong evidence to suggest that motion influences perceived glossiness. There have, however, been fewer studies testing the effects of motion on perception of other material properties, such as lightness and colour, with glossy objects. Wendt et al. [26] asked participants to match the perceived lightness and glossiness of two surfaces with different shapes, and manipulated cues that may help separate reflectance and shape properties. Information from motion, binocular disparity and colour all improved the constancy of glossiness matches, both in isolation and in combination. Lightness constancy, however, was affected only by motion and colour information, and for motion the effects were counterintuitive, with motion impairing performance. When a surface has a complex shape, highlights appear and disappear or change size and shape as the local orientation of the surface to the observer and light source changes. Such disruption of the image might impair access to surface features signalling lightness, and we might expect it to also impair colour constancy tasks.
1.4. Rationale
Correctly parsing an image to identify the objects and illumination that produced it is a mathematically under-constrained problem. As summarized above, motion of glossy objects has been shown to have a strong impact on their perceived glossiness and their shape, suggesting that observers are sensitive to the geometric differences between diffuse and specular components of an image. For operational colour constancy, relative motion between patterning that belongs to the surface and spatial variation owing to specular reflections might provide a cue to separate the diffuse and specular reflectance components of an image. However, the fine chromatic discriminations that are required in operational colour constancy may be more difficult in the presence of motion that causes the most informative regions of the image (those associated with highlights) to move, at best, or to be scrambled, at worst, making them harder to track.
To assess human observers’ sensitivities to these competing factors, we presented them with animations of isolated glossy objects lit by discrete sources and measured their operational colour constancy, as a function of surface specularity, under different conditions of motion. Stimuli were designed to isolate the specularity cue to operational colour constancy. In a scene comprising many surfaces under one illuminant, there are several image properties—such as the mean chromaticity—that correlate with the illuminant chromaticity. But, for the stimuli used here, illumination and reflectance could be separated only by accessing information conveyed in the specular component of the image. In a factorial design, we compared performance for two types of object, two sources of motion and two axes of motion.
The two types of object were chosen to manipulate the relative patterns of motion of the diffuse and specular components in the image. One object was a smooth sphere, with a surface pattern (marbled); the other object was a bumpy sphere, with no surface pattern (bumpy). As surface patterning and bumpiness were never present on the same object, we do not test interactions between pattern and shading [15]. For the smooth marbled sphere, some motion conditions caused the highlight regions to move, but their spatial layout was never scrambled. Conversely, for the bumpy sphere, all motion conditions caused highlight regions to be scrambled.
We compared constancy performance when stimuli included motion of the object or of the lights. These sources of motion allowed us again to compare different patterns of highlight motion. For the marbled sphere, the two motion conditions had different effects on the highlights: motion of the (radially symmetric) object had no effect on highlight locations, though the surface pattern was displaced; motion of the lights caused the highlight locations to move, though the surface pattern remained stationary. For the bumpy sphere, both motion conditions disrupted the highlight locations and introduced changes in intensity (shading) of the diffuse component.
Finally, we chose to use two different axes of rotation that allowed us to manipulate the shape-from-motion information that was available from the diffuse component of the animation. A vertical axis, perpendicular to the viewing axis, provided typical shape-from-motion information. An axis aligned with the viewing axis was chosen to provide significantly less shape-from-motion information than given by rotation about the vertical axis, because points on the surface of the object move laterally across the image, rather than in depth (e.g. [44]).
We compared performance in these motion conditions with performance in two control conditions. In one control condition, there was no motion (equivalent to experiment 2 in [9]). In the other control condition, both the object and the light sources rotated about the visual axis (VA), equivalent to rotating the image on the screen. Neither control condition included relative motion of the diffuse and specular components of the image, but the second did include motion, and therefore provided a comparison for performance in conditions in which the locations of the most informative regions of the image moved.
2. Material and methods
2.1. Overview
Stimuli were computer-generated animations of a single, spherical object in a void, lit by three small light sources. At any instant, the surface had a single spectral reflectance (patterning for marbled stimuli changed reflectance by a scale factor only) and all three light sources shared the same spectral composition. Over the course of the animation, either the surface spectral reflectance (R(λ)) changed, or the illumination spectral power distribution (I(λ)) changed, but not both. It was the observer’s task to decide which had changed. Objects were rendered with one of five levels of specularity (specifying the proportion of light reflected in the specular component), from perfectly matte (specularity=0) to glossy (specularity=0.1). For matte stimuli, the operational colour constancy task is impossible, but with increasing specularity the signals available to solve the task increase. In different conditions of the experiment, we compare the rates of performance increase with increasing specularity.
2.2. Stimuli
Images were rendered using hyperspectral raytracing with RADIANCE [46] and custom routines. The object was either a sphere whose surface had been modified by displacing the surface depth according to procedural noise (Blender’s marble texture, Blender Foundation, Amsterdam, The Netherlands), or a sphere with reflectance intensity modified by a volumetric turbulence function (RADIANCE’s marble function). We refer to these as ‘bumpy’ and ‘marbled’ spheres, respectively, and they are the same as the objects used in experiment 2 of our previous study [9]. The rendered stimuli were converted to a 14-bit per pixel per channel RGB image for display on a CRT monitor driven by a Cambridge Research Systems (Rochester, UK) ViSaGe. Examples are shown in figure 2.
The light sources were assigned spectral power distributions (I(λ)) of either sunlight or skylight [47] and, in any one frame, all three sources had the same spectral distribution. The reflectance of the surface was specified with a spectral reflectance function (R(λ)) and a specularity value. Spectral reflectances were selected from a database of measurements of natural and man-made surfaces [48,49,50,51] (also see spectra.csv in the electronic supplementary material). As we used only two illuminants, an arbitrary selection of spectral reflectances would permit above chance performance in operational colour constancy based solely on the direction and magnitude of chromatic change of the diffuse component of the image: changes that were not aligned to the blue-yellow direction would be more likely to be surface changes. Therefore, to silence this cue, pairs of reflectances were selected such that the distributions of the direction and magnitude of chromatic change of the diffuse component were matched for the surface-change and illuminant-change trials.
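The selection can be sketched as follows (this is our illustration of the kind of screening described, not the exact procedure used; the sensors, spectra and tolerance are assumptions): for each candidate pair of reflectances, compute the chromatic change of the diffuse component that a surface change would produce, and keep only pairs whose change is comparable in direction and magnitude to the changes produced by the sun-to-sky illuminant swap.

```python
# Rough sketch of screening reflectance pairs so that the diffuse chromatic change
# does not betray the trial type (illustrative only; not the study's exact procedure).
import itertools
import numpy as np

def diffuse_chromaticity(I, R, S):
    """Chromaticity of the diffuse component I(lambda) * R(lambda) under sensors S."""
    t = S @ (I * R)
    return t[:2] / t.sum()

def change_vector(I0, R0, I1, R1, S):
    return diffuse_chromaticity(I1, R1, S) - diffuse_chromaticity(I0, R0, S)

def screen_pairs(reflectances, I_sun, I_sky, S, tol=0.2):
    """Keep surface-change pairs whose chromatic change resembles (within a relative
    tolerance, an assumed criterion) some change produced by the illuminant swap."""
    illuminant_changes = [change_vector(I_sun, R, I_sky, R, S) for R in reflectances]
    kept = []
    for i, j in itertools.combinations(range(len(reflectances)), 2):
        surface_change = change_vector(I_sun, reflectances[i], I_sun, reflectances[j], S)
        if any(np.linalg.norm(surface_change - ic) < tol * np.linalg.norm(ic)
               for ic in illuminant_changes):
            kept.append((i, j))
    return kept
```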
In any given trial, specularity (as defined by RADIANCE’s plastic definition) was fixed at one of five values: zero and four logarithmically spaced values from 10^−2.5 to 10^−1. The maximum value of specularity we used (10^−1) is a realistic value for a material of this kind [6] and appears glossy. Lower values produced materials with a slight sheen, and materials with very low specularities appeared matte. The maximum specularity is high enough that the brightest pixels were close to the illuminant chromaticity, while, for lower specularities, the light from the brightest points in the image contained a mixture of specular and diffuse components. The roughness parameter was fixed at 0.15 for all our stimuli.
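For reference, the five specularity levels can be generated as in the following one-line sketch (our illustration).

```python
import numpy as np
# Zero plus four log-spaced values: [0, 0.00316, 0.01, 0.0316, 0.1] to 3 s.f.
specularities = np.concatenate(([0.0], np.logspace(-2.5, -1.0, 4)))
```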
The stimulus for each trial comprised a short animation, showing either a surface reflectance change, or an illuminant spectral change. The animation was either accompanied by motion or not, as specified in the following section. The dynamic part of each animation lasted 0.33 s, with additional static periods at the beginning and end, each lasting 0.50 s.
2.3. Motion conditions
For bumpy and for marbled stimuli, motion of the lights or of the object could be around the VA or a vertical axis, perpendicular to the VA (PA). We refer to these conditions as LightsVA, ObjectVA, LightsPA and ObjectPA. In control conditions, there was either no motion, or motion of the image about the VA. We refer to these conditions as No-Motion and ImageVA. We note that the ‘visual axis’ was that of the viewpoint from which the scene was rendered, not strictly that of the observer, who was free to move his/her gaze. We illustrate the scene geometry and rotation axes in figure 3. With six motion conditions (the four motion conditions plus the two controls) and two types of object (bumpy and marbled), we have 12 experimental conditions in total (figure 4).
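The factorial structure is straightforward to enumerate, as in the sketch below (our illustration).

```python
# Enumerate the 2 (object) x 6 (motion) factorial design: 12 conditions in total.
from itertools import product

objects = ["bumpy", "marbled"]
motions = ["No-Motion", "ImageVA", "LightsVA", "ObjectVA", "LightsPA", "ObjectPA"]
conditions = list(product(objects, motions))
assert len(conditions) == 12
```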
In all cases, the rotation about the axis was through 30°, with the direction selected randomly to be clockwise or anticlockwise. The rotational motion occurred only during the 0.33 s period of animation that included the spectral change, had constant rotational speed (90° s⁻¹) and abrupt onset and offset. Linear motion of points in the image corresponded to a maximum speed of 5.6° s⁻¹ (degrees of visual angle per second) at the outer edge of the sphere for the VA conditions and 21.3° s⁻¹ across the centre of the sphere for the vertical-axis conditions. Examples of initial and final frames from our stimulus animations are shown in figure 2.
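The time course of the rotation can be sketched as follows (our illustration; the frame rate is an assumption, not stated above).

```python
# Sketch of the rotation time course: constant 90 deg/s with abrupt onset and offset
# over the 0.33 s dynamic period, giving a total excursion of roughly 30 deg in a
# randomly chosen direction.
import numpy as np

frame_rate = 60.0                                            # Hz (assumed for illustration)
duration = 0.33                                              # s of animated rotation
speed = 90.0 * np.random.default_rng().choice([-1.0, 1.0])   # deg/s, random direction
t = np.arange(0.0, duration, 1.0 / frame_rate)               # frame times in the dynamic period
angles = speed * t                                           # rotation angle applied on each frame
print(angles[-1])                                            # close to +/- 30 deg total rotation
```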
2.4. Procedure
The observer’s task was to classify each animation as a surface change or an illuminant change. Responses were collected via a button box. Observers viewed the computer-generated stimuli monocularly (to remove conflicting binocular disparity cues, which would otherwise signal that the image lies in a single plane), in a dark room, with a black cardboard viewing tunnel to restrict stray light. At the viewing distance of 1.0 m, the stimuli subtended a visual angle of 7.1°×7.1°.
We ran the 12 experimental conditions in separate experimental sessions, each comprising 700 unique trials (140 at each of five levels of specularity), which were divided into four blocks of 175 trials, each block lasting approximately 15 min. The order in which the conditions were run was counterbalanced.
2.5. Observers
Six observers (1–6) participated in this experiment after giving their informed consent in accordance with the University of Oxford ethical procedures. All observers completed all conditions (except observer 3 who was not available to complete the conditions with rotation of the object or light sources about a vertical axis, ObjectVA and LightsVA). All observers had normal colour vision (no errors on the HRR plates and a Rayleigh match in the normal range measured on an Oculus HMC-Anomaloskop), and normal or corrected-to-normal visual acuity. Observers 4 and 6 are male; all others are female. Observer 4 is one of the authors; observer 6 was aware of the purpose of the experiment; the other observers had received an undergraduate psychology course on perception, but were naive to the purposes of the experiment. Observers 1, 2 and 4 participated in a previous study with similar stimuli [9] (as observers 3, 4 and 1, respectively). All participants had an opportunity to practise the task, except observer 6, whose early sessions were discarded.
3. Results
3.1. Performance per observer
The raw responses from observers are provided in the attached spreadsheet (see responses.csv in the electronic supplementary material). From these data, we calculate performance in discriminating between reflectance and illumination changes. We use d′ and ln(β) to indicate sensitivity and response bias independently. A d′ value of 0 indicates that performance was at chance level, while the maximum measurable d′ from 140 trials is approximately 4.9 (because a d′ of 4.9 corresponds to a proportion correct of Φ(4.9/2) = 0.9928, i.e. 139 of 140 trials). A ln(β) of 0 indicates no response bias, while a positive value indicates a tendency to give a ‘reflectance change’ response and a negative value indicates a tendency to give an ‘illuminant change’ response.
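As a sketch of how these measures can be computed from the classification counts, the code below applies the standard equal-variance signal-detection formulae (our illustration, not the study's code). The half-trial correction for perfect scores, the 70/70 split of trial types, and the assignment of ‘illuminant change’ trials to the signal class (chosen so that positive ln(β) corresponds to a bias towards ‘reflectance change’ responses, as stated above) are assumptions consistent with the ceiling just described.

```python
# Sketch of the signal-detection measures used above (illustrative only).
# 'Illuminant change' trials are treated as the signal class (an assumed convention),
# and perfect rates are nudged by half a trial so that z remains finite.
import numpy as np
from scipy.stats import norm

def dprime_lnbeta(n_hit, n_signal, n_fa, n_noise):
    """Return (d', ln(beta)) from hit and false-alarm counts."""
    H = np.clip(n_hit / n_signal, 0.5 / n_signal, 1.0 - 0.5 / n_signal)
    F = np.clip(n_fa / n_noise, 0.5 / n_noise, 1.0 - 0.5 / n_noise)
    zH, zF = norm.ppf(H), norm.ppf(F)
    return zH - zF, 0.5 * (zF ** 2 - zH ** 2)

# A perfect score over 70 trials of each type is clipped to 69.5/70 per class,
# reproducing the measurement ceiling of about 4.9 noted above.
print(dprime_lnbeta(70, 70, 0, 70))   # approximately (4.90, 0.0)
```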
Performance for each observer in each condition—bumpy and marbled for each of six motion conditions—is shown in figure 5 as a function of specularity. As expected, performance (d′) was close to chance level with zero specularity and increased with specularity for all conditions and for all observers.
Performance with marbled stimuli was consistently higher than for bumpy stimuli, both in the no-motion conditions and in all moving conditions. These differences were particularly apparent at the highest specularities, but there was a performance advantage of between 30% and 60% for marbled stimuli throughout the range.