2
"If you want to make an apple pie from scratch, you must first create the universe."
Carl Sagan
This chapter contains a summary of the state of the art in augmented reality research and related technologies that are relevant to this dissertation. Some of this information has been developed during the period of my research and is included so comparisons can be made. First, an extended (from chapter 1) description and definition of augmented reality is presented, followed by a discussion of how it fits into the spectrum of virtual environments. The chapter then goes on to discuss various indoor and outdoor AR applications that have been developed, demonstrating the current state of the art. Following this is a discussion of the two techniques for performing real-time AR overlay, as well as a summary of the numerous types of tracking technologies employed. After these technologies have been explained, a history of human computer interaction techniques for desktop and virtual reality systems is then covered. Current techniques used for the capture of models in the physical world are then discussed, followed by a section summarising commercially available CAD software and solid modelling techniques. Finally, the problems of working outdoors with wearable computers are described, including how they can be used for mobile augmented reality.
When Sutherland proposed the concept of the Ultimate Display [SUTH65], his goal was to generate artificial stimuli that would give the user the impression that the experience is real. Instead of immersing the user in an artificial reality, a second approach is to augment the user’s senses with extra information, letting them experience both artificial and real stimuli simultaneously. In his excellent survey paper of the field, Azuma defines augmented reality systems as those that contain the following three characteristics [AZUM97a]:
· Combines real and virtual
· Interactive in real-time
· Registered in 3D
This definition does not limit augmented reality to the use of head mounted displays (allowing for monitors, projectors, and shutter glasses), but excludes non-interactive media such as movies and television shows. This dissertation focuses on mobile outdoor augmented reality, and therefore this chapter will focus only on research related to head mounted displays.
With the availability of real-time computer-generated 3D graphics, computers can render synthetic environments on a display device that can give the user the impression they are immersed within a virtual world. This technology is referred to as virtual reality (VR) and is designed to simulate with a computer the physical world humans normally can see. The opposite of VR is the real physical world typically experienced by a human, although it may be slightly attenuated because it is being viewed via a head mounted display or video camera. Augmented reality is therefore made up of a combination of virtual and real environments, although the exact make-up of this may vary significantly. Milgram and Kishino used these properties to define a reality-virtuality continuum [MILG94], and this can be used to perform comparisons between various forms of mixed reality by placement onto a spectrum. At one end of the continuum is the physical world, at the other end are fully synthetic virtual environments, and AR is located somewhere in between since it is a combination of the two. Figure 2‑1 is adapted from Milgram and Kishino’s continuum, with example pictures at different locations on the reality-virtuality spectrum but showing the view from the same location. The first image in Figure 2‑1 shows a view of the physical world seen through a head mounted display, with no virtual information at all. The next image is augmented reality, where artificial objects (such as the table) are added to the physical world. The third image is augmented virtuality, where physical world objects (such as a live display of the user’s view of the world) are added into a fully immersive virtual environment. The final image depicts a completely synthetic environment, with no information from the physical world being presented. Every type of 3D environment can be placed somewhere along this spectrum, which makes it easy to compare and contrast their properties.
To overlay 3D models onto the user’s view, a mobile AR system requires a HMD to be combined with a device that can measure the position and orientation of the user’s head. As the user moves through the physical world the display is updated by the computer in real-time. The accuracy with which virtual objects are registered to the physical world influences the realism of the fusion that the user experiences. A major focus of current AR research has been achieving good registration, as discussed extensively in survey papers by Azuma [AZUM97a] and Azuma et al. [AZUM01]. There are a number of known problems that cause poor registration, such as tracker inaccuracies, HMD misalignment, and delays in the various stages of rendering from the trackers to the display.
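As a rough illustration of why delay matters for registration, the sketch below estimates how far a virtual object appears displaced while the head is turning. It is a back-of-the-envelope model rather than anything measured in this dissertation: it assumes a constant head rotation rate and a simple pinhole display model, and the function names and example figures (50 degrees per second, 100 ms of end-to-end delay, a 640 pixel wide display with a 30 degree field of view) are hypothetical.

```python
import math


def angular_error_deg(head_rate_deg_s, latency_s):
    """Angle the head turns through between tracker reading and display."""
    return head_rate_deg_s * latency_s


def pixel_error(error_deg, fov_deg, width_px):
    """Horizontal offset on the display for an angular error measured from
    the display centre, using a pinhole projection."""
    focal_px = (width_px / 2) / math.tan(math.radians(fov_deg / 2))
    return focal_px * math.tan(math.radians(error_deg))


def world_error_m(error_deg, distance_m):
    """Sideways displacement of the overlay at a given viewing distance."""
    return distance_m * math.tan(math.radians(error_deg))


# Assumed example values, not measurements.
error = angular_error_deg(head_rate_deg_s=50.0, latency_s=0.1)
print(f"angular error: {error:.1f} deg")
print(f"on display:    {pixel_error(error, 30.0, 640):.0f} px")
print(f"at 20 m:       {world_error_m(error, 20.0):.2f} m")
```

Under these assumptions a 100 ms delay produces a 5 degree error, a sixth of the assumed field of view, so even modest latency can dominate the other error sources while the head is moving.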
While registration is important for producing AR applications that are realistic (giving the user a sense of presence and hence being more immersive and easier to use) it is not the only important issue in AR research. Other questions, such as how users interface with these systems and what kinds of tasks the systems can perform, are also important and make the registration research useable for building real world applications.
During the evolution of technologies such as virtual reality and augmented reality, there have been a number of applications developed that demonstrate the use of this technology. In the field of augmented reality, this research work initially began indoors where hardware is able to be large and consume considerable electrical power without imposing too many restrictions on its use. As hardware has become smaller in size and more powerful, researchers are demonstrating more complex systems and are starting to move outdoors. This section discusses various applications that have been developed for both indoor and outdoor environments, approximately arranged in chronological order where possible.
For indoor augmented reality, there are a number of applications that have been developed in areas as diverse as information display, maintenance, construction, and medicine. These applications are used to provide extra situational awareness information to users to assist with their tasks. By projecting data onto the vision of a user, information is shown in situ in the environment and the user can better understand the relationship the data has with the physical world. The first working AR demonstration was performed using a HMD designed by Sutherland [SUTH68] and is shown in Figure 2‑2. This HMD is transparent, in that the user can see the physical world as well as computer-generated imagery from small CRT displays overlaid using a half silvered mirror. So while the goal of the Ultimate Display concept was to completely immerse the user’s senses into a virtual environment, Sutherland actually invented the addition of information (augmented reality) with the development of this display. Sutherland’s demonstration projected a simple wire frame cube with line drawn characters representing compass directions on each wall. Other see through HMDs were developed for use in military applications, with examples such as the Super Cockpit project by Furness [FURN86]. The use of HMDs was designed to improve on existing heads up displays (HUD) in military aircraft, providing information wherever the user is looking instead of just projected onto the front of the glass windshield. Similar technology is used to implement displays for virtual reality, except these are opaque and do not use the physical world to provide extra detail.
Figure 2‑2 The first head mounted display, developed by Ivan Sutherland in 1968 (Reprinted and reproduced with permission by Sun Microsystems, Inc)
Figure 2‑3 External and AR immersive views of a laser printer maintenance application (Images courtesy of Steven Feiner – Columbia University)
The KARMA system was developed by Feiner et al. as a test bed for the development of applications that can assist with 3D maintenance tasks [FEIN93a]. Instead of simply generating registered 3D graphics from a database to display information, KARMA uses automatic knowledge-based generation of output depending on a series of rules and constraints that are defined for the task. Since the output is not generated in advance, the system can customise the output to the current conditions and requirements of the user. One example demonstrated by Feiner et al. was a laser printer maintenance application (shown in Figure 2‑3) where the user is presented with detailed 3D instructions showing how to replace toner and paper cartridges.
Figure 2‑4 Virtual information windows overlaid onto the physical world (Image courtesy of Steven Feiner – Columbia University)
The Windows on the World work by Feiner et al. demonstrated the overlay of windows with 2D information onto an AR display [FEIN93b]. While traditional AR systems render 3D information, this system is based on 2D information in an X Windows server. Windows of information can be created in the X server and then attached to the display, the user’s surround, or the physical world. As the user moves about the 3D environment, the system recalculates the position of the windows on the HMD. Since the system is based on X Windows, any standard X application can be used and information always appears facing the user with no perspective warping. Figure 2‑4 shows an example of 2D information windows attached to different parts of the environment.
One of the first commercially tested applications for augmented reality was developed by the Boeing company to assist with the construction of aircraft [CURT98]. One task performed by workers is the layout of wiring bundles on looms for embedding into the aircraft under construction. These wiring looms are complicated and so workers must constantly refer to paper diagrams to ensure the wires are placed correctly. Curtis et al. describe the testing of a prototype AR system that overlays the diagrams over the wiring board so that workers do not have to take their eyes away from the task. Although it was never fully deployed in the factory, this research is a good demonstration of how AR technology can be used to assist workers with complicated real world tasks.
Figure 2‑5 Worker using an AR system to assist with wire looming in aircraft assembly (Image courtesy of David Mizell – Boeing Company)
Figure 2‑6 AR with overlaid ultrasound data guiding doctors during needle biopsies (Image courtesy of Andrei State – University of North Carolina, Chapel Hill)
Using AR to assist doctors with medical imaging is an area that shows much promise in the near future. A current problem with X-ray and ultrasound images is that they are two dimensional, and it is difficult to relate this information spatially to the physical world. By overlaying this information onto the patient using AR, the doctor can immediately see how the imaging data relates to the physical world and use it more effectively. State et al. have been performing research into the overlay of ultrasound images onto the body to assist with breast biopsies [STAT96]. During the biopsy, a needle is inserted into areas of the body that the doctor needs to sample and analyse. Normally, the doctor will take many samples in the hope of reaching the correct location, damaging areas of tissue in the process. Using AR, the ultrasound overlay can be used to see where the biopsy needle is relative to the area of interest, and accurately guide it to the correct location. This results in less damage to the surrounding tissue and a greater chance of sampling the desired area. Figure 2‑6 shows an example of a needle being inserted into a simulated patient with overlaid ultrasound imagery.
Figure 2‑7 Studierstube AR environment, with hand-held tablets and widgets (Images courtesy of Gerhard Reitmayr – Vienna University of Technology)
Schmalstieg et al. [SCHM00] and Reitmayr and Schmalstieg [REIT01a] describe a collaborative augmented reality system named Studierstube, which can perform shared design tasks. In this environment, users can work together to perform tasks such as painting objects and direct manipulation of 3D objects, as shown in Figure 2‑7. To provide users with a wide range of possible operations, the user carries a Personal Interaction Panel (PIP) [SZAL97]. The PIP can be constructed using either a pressure sensitive tablet or a tracked tablet and pen combination, and the AR system then overlays interactive widgets on top of the tablet. Using the pen on the tablet, the user can control the widgets that are linked up to various controls affecting the environment.
Figure 2‑8 Marker held in the hand provides a tangible interface for viewing 3D objects (Images courtesy of Mark Billinghurst – University of Washington)
The ARToolKit was developed by Kato and Billinghurst to perform the overlay of 3D objects on top of paper fiducial markers, using only tracking data derived from captured video images [KATO99]. Using this toolkit, a number of applications have been developed that use tangible interfaces to directly interact with 3D objects using the hands. Billinghurst et al. [BILL99] use this toolkit to perform video conferencing, with the user able to easily adjust the display of the remote user, as shown in Figure 2‑8. Another application that uses this technology is Magic Book by Billinghurst et al. [BILL01]. Each page of the magic book contains markers that are used to overlay 3D objects with AR. By pressing a switch on the display the user can be teleported into the book and experience immersive VR. Magic Book integrates an AR interface (for viewing the book from a top down view with a tangible interface) with a VR interface (for immersively flying around the book’s 3D world).
Figure 2‑9 Actors captured as 3D models from multiple cameras overlaid onto a marker (Image courtesy of Adrian Cheok – National University of Singapore)
The 3D Live system by Prince et al. [PRIN02] captures 3D models of actors in real-time that can then be viewed using augmented reality. By arranging a series of cameras around the actor, Virtual Viewpoint software from Zaxel [ZAX03] captures the 3D geometry using a shape from silhouette algorithm, and then is able to render it from any specified angle. 3D Live renders this output onto ARToolKit markers, and live models of actors can be held in the hands and viewed using easy to use tangible interfaces, as shown in Figure 2‑9. Prince et al. explored a number of displays for the system, such as holding actors in the hands on a card, or placing down life sized actors on the ground with large markers.
While indoor examples are useful, the ultimate goal of AR research is to produce systems that can be used in any environment with no restrictions on the user. Working outdoors expands the range of operation and has a number of unique problems, discussed further in Section 2.9. Mobile outdoor AR pushes the limits of current technology to work towards achieving the goal of unrestricted AR environments.
The first demonstration of AR operating in an outdoor environment is the Touring Machine (see Figure 2‑10) by Feiner et al. from Columbia University [FEIN97]. The Touring Machine is based on a large backpack computer system with all the equipment necessary to support AR attached. The Touring Machine provides users with labels that float over buildings, indicating the location of various buildings and features at the Columbia campus. Interaction with the system is through the use of a GPS and head compass to control the view of the world, and by gazing at objects of interest longer than a set dwell time the system presents further information. Further interaction with the system is provided by a tablet computer with a web-based browser interface to provide extra information. The Touring Machine was then extended by Hollerer et al. for the placement of what they termed Situated Documentaries [HOLL99]. This system is able to show 3D building models overlaying the physical world, giving users the ability to see buildings that no longer exist on the Columbia University campus. Another feature is the ability to view video clips, 360 degree scene representations, and information situated in space at the site of various events that occurred in the past.
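The dwell-time selection described above can be sketched in a few lines. This is a generic illustration of the idea and not the Touring Machine implementation, whose details are not given here; the function name, the sample format, and the building names are hypothetical.

```python
def dwell_select(samples, dwell_s):
    """Return the first target gazed at continuously for dwell_s seconds.

    samples is a time-ordered sequence of (timestamp_s, target) pairs,
    where target is None when the user is not looking at any object.
    Returns None if no target is held for long enough.
    """
    current, start = None, None
    for timestamp, target in samples:
        if target != current:
            # Gaze moved to a different object (or to nothing): restart timer.
            current, start = target, timestamp
        if current is not None and timestamp - start >= dwell_s:
            return current
    return None


# Hypothetical gaze log: a glance at one building, then a longer look at another.
gaze = [(0.0, "library"), (0.5, "library"), (0.8, "chapel"),
        (1.0, "chapel"), (2.0, "chapel")]
selected = dwell_select(gaze, dwell_s=1.0)
```

With a one second dwell time the brief glance at the first building is ignored and the second building is selected, which is the behaviour that lets a user look around freely without triggering unwanted information.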
Figure 2‑10 Touring Machine system overlays AR information in outdoor environments (Images courtesy of Steven Feiner – Columbia University)
Figure 2‑11 BARS system used to reduce the detail of AR overlays presented to the user (Images courtesy of Simon Julier – Naval Research Laboratory)
The Naval Research Laboratory is investigating outdoor AR with a system referred to as the Battlefield Augmented Reality System (BARS), a descendant of the previously described Touring Machine. Julier et al. describe the BARS system [JULI00] and how it is planned for use by soldiers in combat environments. In these environments, there are large quantities of information available (such as goals, waypoints, and enemy locations) but presenting all of this to the soldier could become overwhelming and confusing. Through the use of information filters, Julier et al. demonstrate examples (see Figure 2‑11) where only information of relevance to the user at the time is shown. This filtering is performed based on what the user’s current goals are, and their current position and orientation in the physical world. The BARS system has also been extended to perform some simple outdoor modelling work [BAIL01]. For the user interface, a gyroscopic mouse is used to manipulate a 2D cursor and interact with standard 2D desktop widgets.
Figure 2‑12 Context Compass provides navigational instructions via AR overlays (Images courtesy of Riku Suomela – Nokia Research Lab)
Nokia has been performing research into building outdoor wearable AR systems, but with 2D overlaid information instead of 3D registered graphics. The Context Compass by Suomela and Lehikoinen [SUOM00] is designed to give users information about their current context and how to navigate in the environment. 2D cues are rendered onto the display (as depicted in Figure 2‑12). Other applications such as a top down perspective map view have also been implemented by Lehikoinen and Suomela [LEHI02]. To interact with the system, a glove-based input technique named N-fingers was developed by Lehikoinen and Roykkee [LEHI01]. The N-fingers technique provides up to four buttons in a diamond layout that can be used to scroll through lists with selection, act like a set of arrow keys, or directly map to a maximum of four commands.
Apart from the previously mentioned systems, there are a small number of other mobile AR systems that have also been developed. Billinghurst et al. performed studies on the use of wearable computers for mobile collaboration tasks [BILL98] [BILL99]. Yang et al. developed an AR tourist assistant with a multimodal interface using speech and gesture inputs [YANG99]. Pouwelse et al. developed a miniaturised prototype low power terminal for AR [POUW99]. Behringer et al. developed a mobile AR system using COTS components for navigation and control experiments [BEHR00]. The TOWNWEAR system by Satoh et al. demonstrated high precision AR registration through the use of a fibre optic gyroscope [SATO01]. The DWARF software architecture was designed by Bauer et al. for use in writing mobile outdoor AR applications [BAUE01]. Cheok has developed some outdoor games using AR and the 3D Live system discussed previously [CHEO02a] [CHEO02c]. Cheok has also developed accelerometer-based input devices such as a tilt pad, a wand, and a gesture pad for use with wearable computers [CHEO02b]. Fisher presents an authoring toolkit for mixed reality experiences and developed a prototype outdoor AR system [FISH02]. Ribo et al. developed a hybrid inertial and vision-based tracker for use in real-time 3D visualisation with outdoor AR [RIBO02]. Roberts et al. are developing a prototype for visualisation of subsurface data using hand held, tripod, and backpack mounted outdoor AR systems [ROBE02]. The use of AR for visualisation of archaeological sites was performed by Vlahakis et al. [VLAH02].
As previously mentioned, this dissertation focuses on the use of HMDs to merge computer-generated images with the physical world to perform augmented reality. This section describes the HMDs and other supporting technology necessary to display AR information, implemented using either optical or video combination techniques. These techniques are described and then compared so the applications of each can be better understood.
Rolland et al. [ROLL94], Drascic and Milgram [DRAS96], and Rolland and Fuchs [ROLL00] describe in detail the technological and perceptual issues involved with both optical and video see through displays. These authors identified a number of important factors that need to be considered when selecting which technology to use for an application, and these are as follows:
· System latency – the amount of time taken from when physical motion occurs to when the final image reflecting this is displayed.
· Real-scene resolution and distortion – the resolution that the physical world is presented to the user, and what changes are introduced by the optics.
· Field of view – the angular portion of the user’s view that is taken up by the virtual display, and whether peripheral vision is available to the user.
· Viewpoint matching – the view of the physical world may not match the projection of the 3D overlay, and it is desirable to minimise these differences for the user.
· Engineering and cost factors – certain designs require complex optics and so tradeoffs must be made between features and the resources required to construct the design.
· Perceived depth of overlapping objects – when virtual objects are drawn in front of a physical world object, it is desirable that the virtual objects perform correct occlusion.
· Perceived depth of non-overlapping objects – by using depth cues such as familiar sizes, stereopsis, perspective, texture, and motion parallax, users can gauge the depth to distant objects.
· Qualitative aspects – the virtual and physical worlds must be both rendered and these images must preserve their shape, colour, brightness, contrast, and level of detail to be useful to the user.
· Depth of field – when physical and virtual images are passed through optics they will be focused at a particular distance. Keeping the image sharp at the required working distance is important for the user.
· User acceptance and safety – if the display attenuates the physical world it could be unsafe to use in some environments since the user’s vision system is not being supplied with adequate information to navigate.
· Adaptation – some displays have limitations that can be adjusted to by humans over time, and can be used as an alternative to improving the technology if there are no harmful side effects.
· Peripheral field of view – the area outside the field of view of the virtual display is not overlaid with information, but is still useful to the user when navigating in the physical world.
The design of an optically combined see through HMD system may be represented by the schematic diagram in Figure 2‑13, although in practice the design is much more complex due to the internal optics required to merge and focus the images. A small internal LCD screen or CRT display in the HMD generates an image, and an optical combiner (such as a half silvered mirror or a prism) reflects part of the light into the user’s eyes, while allowing light from the physical world to pass through to the eyes as well.
In general, most current AR systems based on optically combined displays share the following properties:
· Optical combiners are used to merge physical and virtual world images.
· The computer generates an overlay image that uses black whenever it wants the pixels to be see-through, and so the images are simple and can be rendered quickly.
· The physical world light is seen by the user directly and has high resolution with an infinite refresh rate and no delay, while the generated image is pixelated and delayed.
· The physical world remains at its dynamic focal length, while the overlay image is fixed at a specific focal length.
· Accurate registration of the image with the physical world is difficult because the computer cannot monitor the final AR image to correct any misalignments.
· Ghosting effects are caused by the optical combiner since both virtual and physical images are visible simultaneously (with reduced luminance), and obscuring the physical world with a generated image cannot typically be performed.
· The field of view of the display is limited by the internal optics, and distortions increase at larger values.
· The front of the display must be unoccluded so that the physical world can be seen through the HMD.
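The see-through and ghosting properties listed above follow from the fact that an optical combiner can only add light. A minimal sketch of this, assuming an idealised combiner and greyscale luminance values between 0 and 1, is given below; the function name and the 50 percent transmit and reflect fractions are illustrative assumptions, not properties of any particular HMD.

```python
def optical_combine(real, virtual, transmit=0.5, reflect=0.5):
    """Luminance reaching the eye through an idealised optical combiner.

    real and virtual are 2D lists of luminance values in [0, 1]. The
    transmit and reflect fractions are assumed values for illustration.
    """
    return [
        [min(1.0, transmit * r + reflect * v)
         for r, v in zip(real_row, virtual_row)]
        for real_row, virtual_row in zip(real, virtual)
    ]


world = [[0.8, 0.8], [0.2, 0.2]]      # bright scene above, dark scene below
overlay = [[0.0, 1.0], [0.0, 1.0]]    # black = see-through, white = drawn
seen = optical_combine(world, overlay)
```

Black overlay pixels leave only the attenuated world visible, and no overlay value can make a pixel darker than the attenuated world. Drawn pixels therefore always show the physical world behind them, which is the ghosting effect, and true occlusion is not possible with this kind of combiner.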
An example image from an optically combined AR system is shown in Figure 2‑14, with a 3D virtual table overlaying the physical world. Some of the problems with the technology are shown by the ghosted image and reflections, caused by sunlight entering the interface between the HMD and the lens of the camera capturing the photo.
Recent technology has improved on some of the problems discussed in this section. Pryor et al. developed the virtual retinal display, using lasers to project images through an optical combiner onto the user’s retina [PRYO98]. These displays produce images with fewer ghosting effects and lower transmission losses than an LCD or CRT-based design. Kiyokawa et al. produced a research display that can block out the physical world selectively using an LCD mask inside the HMD to perform proper occlusion [KIYO00].
Video combined see through HMD systems use video cameras to capture the physical world, with virtual objects overlaid in hardware. This technique was pioneered by Bajura et al. in 1992 [BAJU92]. An example implementation is depicted in the schematic in Figure 2‑15, with a video camera capturing images of the physical world that are combined with graphics generated by a computer. The display for this technique is opaque and therefore the user can only see the physical world through the video camera input. The combination process can be performed using two different techniques: using chroma-keying as a stencil to draw the video where AR pixels have not been drawn, or using the computer to draw the AR pixels on top of the video. The final image is then displayed to the user directly from an LCD or CRT display through appropriate optics.
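The two combination techniques can be sketched on small frames of RGB tuples as follows. This is a simplified software illustration, not the hardware pipeline of any system described here; the function names, the choice of pure blue as the key colour, and the explicit mask of drawn pixels are all assumptions.

```python
KEY = (0, 0, 255)  # assumed key colour reserved to mean "no AR pixel here"


def chroma_key(video, graphics, key=KEY):
    """Show the video wherever the graphics frame holds the key colour."""
    return [
        [v if g == key else g for v, g in zip(video_row, graphics_row)]
        for video_row, graphics_row in zip(video, graphics)
    ]


def draw_over(video, graphics, drawn):
    """Draw AR pixels on top of the video wherever the renderer wrote one."""
    return [
        [g if d else v for v, g, d in zip(video_row, graphics_row, drawn_row)]
        for video_row, graphics_row, drawn_row in zip(video, graphics, drawn)
    ]


# One-row example frames: the second pixel carries a red AR object.
video = [[(10, 10, 10), (20, 20, 20)]]
graphics = [[KEY, (255, 0, 0)]]
drawn = [[False, True]]

keyed = chroma_key(video, graphics)
painted = draw_over(video, graphics, drawn)
```

Both functions produce the same frame here. Unlike an optical combiner, each output pixel comes entirely from either the video or the graphics, so virtual objects fully occlude the physical world behind them.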
In general, most current AR systems based on video combined displays share the following properties:
· The display is opaque and prevents light entering from the physical world, making it also possible to use for virtual reality tasks with no modifications required.
· Some form of image processing is used to merge physical and virtual world images. Real-time image transformations may be necessary to adjust for resolution differences, spherical lens distortions, and differences in camera and display position.
· The capture of the physical world is limited to the resolution of the camera, and the presentation of both physical and virtual information is limited to the resolution of the display. The final image viewed by the user is pixelated and delayed, with consistency between physical and virtual depending on whether the camera and display have similar resolutions.
· The entire image projected to the user is at a constant focal length, which while reducing some depth cues also makes the image easier to view because the focus does not vary between physical and virtual objects.
· More accurate registration may be achieved since the computer has access to both incoming and outgoing images. The computer may adjust the overlay to improve registration by using a closed feedback loop with image recognition.
· The image overlay has no ghosting effects since the incoming video signal can be modified to completely occlude the physical world if desired.
· By using video cameras sensitive to other parts of the spectrum (such as infra-red or ultraviolet) the user can perceive parts of the physical world that are not normally visible to the human eye.
· Demonstrations to external viewers on separate monitors, or recording to tape, are simple since the video signal sent to the HMD may be passed through to capture exactly what the user sees.
An example image from a video combined AR system is shown in Figure 2‑16, with a 3D virtual table overlaying the physical world. Slight blurring of the video stream is caused by the camera resolution differing from that used by the display.
Table 2‑1 lists a summary of the information presented concerning optical and video combination techniques, comparing their features and limitations. Neither technology is the perfect solution for AR tasks, so the appropriate technique should be selected based on the requirements of the application.

Table 2‑1 Comparison between optical and video combined AR systems
To render graphics that are aligned with the physical world, devices that track in three dimensions the position and orientation of the HMD (as well as other parts of the body) are required. A tracker is a device that can measure the position and/or orientation of a sensor relative to a source. The tracking data is then passed to 3D rendering systems with the goal being to produce results that are realistic and match the physical world as accurately as possible. There have been a number of survey papers in the area: Welch and Foxlin discuss the state of the art in tracking [WELC02], Holloway and Lastra summarise the technology [HOLL93], and Azuma covers it as part of a general AR survey [AZUM97a]. This section covers the most popular technologies for tracking, with a particular focus on the types that are useful when working outdoors. This section is by no means a complete discussion of tracking and does not present new tracking results. I simply use currently available devices in this dissertation to provide tracking for my applications.
There are a number of different tracking technologies used, varying by the number of dimensions measured and the physical properties used. Holloway and Lastra discuss the different characteristics of various tracking systems [HOLL93], and these are summarised as follows:
· Accuracy – the ability of a tracker to measure its physical state compared to the actual values. Static errors are visible when the object is not moving, while dynamic errors vary depending on the motion of the object at the time.
· Resolution – a measure of the smallest units that the tracker can measure.
· Delay – the time period between reading inputs, processing the sensor data, and then passing this information to the computer. Large delays cause virtual objects to lag behind the correct location.
· Update rate – the number of data values per second the tracker can produce. Faster update rates allow smoother animation in virtual environments.
· Infrastructure – trackers operate relative to a reference source. This reference may need to be measured relative to other objects to provide world coordinates useful to applications.
· Operating range – trackers can only operate within a limited volume defined by the infrastructure. Signals emitted by sources attenuate rapidly over distance, which limits the range of operation.
· Interference – various tracking technologies use emissions of signals that can be interfered with by other sources. External interference can be difficult to cancel out and affects the accuracy of results.
· Cost – trackers range in price depending on complexity and the accuracy provided.
In this section, various aspects of the above factors will be discussed, along with the following extra factors:
· Degrees of freedom – trackers measure a number of degrees of freedom, being able to produce orientation, position, or some combination of these as results.
· Coordinate type – some trackers measure velocity or acceleration that requires integration to produce relative-position values. When integrating values that are not exact, errors accumulate over time and cause drift. Absolute values do not require integration and are stable over time.
Working outdoors introduces a number of problems that do not arise with indoor tracking systems. The use of tracking equipment in an indoor environment is simplified by the known limits of the working environment. In contrast, when working outdoors the environment is virtually unlimited in size and setting up infrastructure may be difficult. The requirement that the technology be mobile further restricts the choice of tracking devices available. Azuma discusses in detail many of the problems of performing tracking outdoors [AZUM97b], and some extra factors to consider for comparison are:
· Portability – the device must be able to be worn by a person for use in a mobile environment, so weight and size are important.
· Electrical power consumption – the tracking system must be able to run using batteries and not have excessive power requirements.
One of the main points stressed by Welch and Foxlin [WELC02] and Azuma et al. [AZUM98] is that to obtain the best quality tracking and to minimise any problems, hybrid tracking should be used. Since no tracking technology is perfect, hybrid trackers combine two or more different types of technologies with varying limitations to produce a better overall tracker. The last part of this section discusses some hybrid systems in detail.
Mechanical trackers rely on a physical connection between source and object, producing absolute position and orientation values directly.
The first tracker developed for interactive 3D computer graphics was the mechanical “Sword of Damocles” by Sutherland along with his new HMD [SUTH68]. This tracker is a mechanical arm with angle sensors at each joint. By knowing the length of each arm segment and the measured angle at each joint, the position and orientation of the tip of the arm can be calculated relative to the base. Measuring angles at a mechanical joint is very accurate with only very slight delays. Due to the mechanical nature of the device, the motion of the user is restricted to the length of the arm and the various joints that connect it together. The arm is quite heavy for a human and so while counterweights help to make it lighter, the inertia of the arm requires the user to perform movements slowly and carefully to avoid being dragged about and injured.
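The calculation of the tip position from segment lengths and joint angles can be illustrated with a minimal sketch. It assumes a simplified planar arm with revolute joints, where each angle is measured relative to the previous segment; the function name and the example dimensions are hypothetical and are not taken from Sutherland's device, which operates in three dimensions.

```python
import math

def arm_tip_pose(segment_lengths, joint_angles):
    """Return (x, y, heading) of the arm tip relative to the base.

    Assumes a planar arm: each joint angle (radians) is measured
    relative to the direction of the previous segment.
    """
    x = 0.0
    y = 0.0
    heading = 0.0
    for length, angle in zip(segment_lengths, joint_angles):
        heading += angle                 # accumulate joint rotations
        x += length * math.cos(heading)  # advance along the segment
        y += length * math.sin(heading)
    return x, y, heading

# Hypothetical two-segment arm: 1.0 m and 0.5 m segments.
x, y, heading = arm_tip_pose([1.0, 0.5],
                             [math.radians(90), math.radians(-90)])
```

Because the result depends only on fixed lengths and directly measured angles, no integration is involved and the output is absolute, which is consistent with the accuracy and low delay described above.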
Sutherland also demonstrated a wand-like device to use for 3D input when using the HMD. This device uses a number of wires connected to pulleys and sensors that measure location information. While much more lightweight, this device requires that the wires not be touched by the user or by other objects in the room, and so the user must take this into account when moving about, which restricts their motion.
Accelerometers measure linear forces applied to the sensor and are source-less, producing relative-position values through double integration. Accelerometers can measure absolute pitch and roll when measuring acceleration caused by gravity.
Accelerometers are small and simple devices that measure acceleration forces applied to an object along a single axis, discussed in detail by Foxlin et al. [FOXL98a]. Modern accelerometers are implemented using micro-electro-mechanical systems (MEMS) technology that has no moving parts and can be embedded into small IC sized components. Accelerometers vibrate small elements internally and measure applied forces by sensing changes in these vibrations. To acquire velocity the measured acceleration must be integrated, and then integrated again if relative position is required. The advantages of accelerometers are that they require no source or infrastructure, support very fast update rates, are cheap to buy, have low power requirements, and are simple to add to a wearable computer. The main disadvantage of this technology is that the process of integrating the measurements suffers from error accumulation, and so within a short time period the values drift and become inaccurate. Due to this rapid accumulation of errors, accelerometers are not normally used standalone for position tracking. Accelerometers are commercially available from companies such as Crossbow [XBOW02].
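The effect of error accumulation under double integration can be illustrated with a minimal sketch. It assumes a stationary sensor whose only error is a small constant bias, integrated with a simple Euler scheme; the bias value, sample rate, and function name are hypothetical and do not describe any particular device.

```python
def integrate_position(accel_samples, dt):
    """Double-integrate acceleration samples (m/s^2) into position (m)."""
    velocity = 0.0
    position = 0.0
    for a in accel_samples:
        velocity += a * dt         # first integration: acceleration to velocity
        position += velocity * dt  # second integration: velocity to position
    return position

# Assumed scenario: the sensor is not moving, but reports a constant
# bias of 0.01 m/s^2. Sampled at 100 Hz for 60 seconds.
bias = 0.01
dt = 0.01
drift = integrate_position([bias] * 6000, dt)
```

Under these assumptions the computed position error after one minute is roughly 18 m, in line with the analytical value of one half of the bias multiplied by the square of the elapsed time. The error grows quadratically, so doubling the duration roughly quadruples the drift.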
When three accelerometers are mounted orthogonally to each other, a tilt sensor is formed that can measure the pitch and roll angles relative to the gravity vector. Since gravity is a constant downward acceleration of approximately 9.8 m/s² on Earth, orientation can be calculated by measuring the components of the gravity force that is being applied to each accelerometer. The tilt sensor output is vulnerable to errors caused by velocity and direction changes since these applied forces are indistinguishable from gravity.
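The calculation of pitch and roll from the gravity components can be sketched as follows. Axis and sign conventions differ between devices, so the convention here is an assumption: the sensor is taken to report (0, 0, 1) in units of g when lying level, and the function name is hypothetical.

```python
import math

def tilt_from_gravity(ax, ay, az):
    """Return (pitch, roll) in radians from a three-axis accelerometer.

    Assumes the sensor is at rest, so the only measured acceleration
    is gravity, and that a level sensor reads (0, 0, 1) in units of g.
    """
    pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    roll = math.atan2(ay, az)
    return pitch, roll

# Level sensor: both angles are zero.
level_pitch, level_roll = tilt_from_gravity(0.0, 0.0, 1.0)

# Sensor rolled onto its side: gravity appears entirely on the y axis.
side_pitch, side_roll = tilt_from_gravity(0.0, 1.0, 0.0)
```

Only the direction of the measured vector matters, so the readings do not need to be normalised first. Heading about the gravity vector cannot be recovered this way, and any additional acceleration of the sensor corrupts both angles, as noted above.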
Gyroscopes measure rotational forces applied to the sensor and are source-less, producing relative-orientation values through integration.
The first gyroscopes were mechanical devices constructed of a wheel spinning on an axis. Once set in motion, a gyroscope tends to keep spinning about the same axis, according to the law of conservation of angular momentum. When an external force is applied to a gyroscope, the reaction is a motion perpendicular to the axis of rotation, which can be measured. Gyroscopes are commonly used for direction measurements in submarines and ships, being accurate over long periods of time but typically very large and not portable.
Gyroscopes may also be constructed using MEMS technology and contain an internal vibrating resonator shaped like a tuning fork, discussed in detail by Foxlin et al. [FOXL98a]. When the vibrating resonator is rotated about the appropriate axis, Coriolis forces cause the tines of the fork to vibrate in a perpendicular direction. These perpendicular forces are proportional to the angular velocity and are measured to produce output. Since each gyroscope measures only one axis of rotation, three sensors are mounted orthogonally to measure all three rotational degrees of freedom. To obtain orientation the angular velocity from the sensor must be integrated once, but the result drifts over time and so gyroscopes are not normally used for standalone orientation tracking. These devices are similar to accelerometers in tha