Characterization of a Hybrid Tracking System
Holly L. Adams

Introduction

Virtual Reality (VR) is the process by which a computer and associated peripherals replace the sensory stimuli of sight, sound, touch, or even smell that would be provided by a natural environment. These stimuli give the user the illusion of existing within a synthetic, or virtual, world that they may interact with or change (1,2). VR offers the user a full range of interaction by providing a complete field of view: the user can look up, down, side to side, and even behind them. As the user moves, their position and motions are tracked by a sensor or by tracking peripheral(s), and information from the tracking peripheral(s) is used to update the virtual scene. This information tells the computer the perspective from which the user is viewing the environment. The computer then generates a virtual environment to suit the user's perspective and create a realistic experience.

Head-mounted displays (HMDs) are currently one of the most common methods of displaying a virtual environment, and an HMD is the method of display explored in this research. HMDs provide a high-quality stereo scene to the user, but can easily cause motion sickness. The problem with an HMD is that the user receives contradictory sensory input from their eyes and body: the eyes send information to the brain indicating motion, while the body reports that no motion is occurring. This discrepancy leads to V-sickness, a sensation similar to motion sickness that can make a task difficult, if not impossible, to complete. Other display options include haptic displays, spatially immersive displays, and virtual model displays.

The key to producing a successful virtual environment is a fast update rate for the scene. This means that the virtual environment should be updated at a rate that creates a perceptually realistic experience. The problem is not necessarily the tracking device associated with the system, or even the display. Delays generally come from the time it takes a computer to graphically render a scene associated with the environment. Although computer-processing speeds have drastically increased in recent years, they are still not fast enough to generate a virtual scene that will provide the user with a truly realistic experience.

The performance of any VR system depends not only on processing speeds, but also on the sensor(s) that generate the information used to update the virtual environment. Magnetic sensors are the industry standard, but inertial, acoustical, and hybrid systems are also available. When using a head-mounted display, ideally the head position would be read instantaneously. The problem remains that even if communication between the tracker and graphics generator were instantaneous, graphics generation of the scene would still require additional processing time. If, however, graphics generation could begin before the user actually reached the next step in the virtual environment, the problems associated with processing delays would be reduced. Accurate prediction of the user's motion or position would allow the graphics of the scene to be generated beforehand and displayed for the user at the appropriate time.
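The idea of generating graphics ahead of the user can be sketched with a toy dead-reckoning predictor. The function name and the constant-velocity model here are our illustrative assumptions, not a description of any commercial tracker's actual (proprietary) algorithm:

```python
# Toy sketch: extrapolate head orientation ahead by the system lag so the
# scene can be rendered before the user arrives at that orientation.
# Constant-velocity extrapolation is an illustrative assumption only.

def predict_azimuth(samples, lead_time):
    """samples: (timestamp_s, azimuth_deg) pairs, oldest first.
    Returns the azimuth extrapolated lead_time seconds ahead."""
    (t0, a0), (t1, a1) = samples[-2], samples[-1]
    velocity = (a1 - a0) / (t1 - t0)     # deg/s from the last two samples
    return a1 + velocity * lead_time

# Head turning at 10 deg/s; predict 50 ms ahead to cover rendering lag.
history = [(0.000, 0.0), (0.010, 0.1)]
print(predict_azimuth(history, 0.050))   # 0.1 + 10 * 0.05 = 0.6
```

A real tracker would of course filter noisy samples before extrapolating; this sketch only shows why prediction can hide processing delay during steady motion.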

A new hybrid tracking system that can predict a user's motion up to 50 ms in advance has recently entered the market. With each sample, this system collects three orientation readings from an inertial motion sensor and three position readings from three individual transponder beacons. Compared with the standard magnetic tracking system, the hybrid system has a higher sampling frequency, which decreases both the average sampling lag and the system processing time. The combination of reduced processing time and prediction may be the next step in VR tracking devices.

Background

The success of any VR system is dependent on the quality and functionality of the environment provided to the user. In order to satisfy current demands, a VR system must provide users with a high-quality graphics display. The system must also allow appropriate user interaction and be able to accurately update the environment at a rate that satisfies the demands of the specific task at hand. VR technology has been around for about 30 years, but collectively, VR systems have been slow to catch on due to their high cost and their inability to update a virtual environment fast enough to provide a realistic experience for the user (1,3).

A head-mounted display (HMD) is one method of displaying a virtual environment (4). HMDs are essentially a set of goggles or a visor that provide a stereoscopic view of a computer-generated environment. Each eye is presented with a different image; the images are then fused by the mind into one three-dimensional scene. The HMD provides a sense of realistic motion by updating the scene with information collected from the user's head motions. Head motion information is collected by a tracking system connected to the HMD. The tracking system feeds information back to the computer, which guides the graphics generation. Figure 1.0 depicts the information path followed to generate a scene and display it in an HMD.

Figure 1.0 Information path for HMD scene generation.

 

The three main types of sensors incorporated into VR systems are magnetic, inertial, and acoustical (2). An example of a magnetic tracking system currently on the market is the Polhemus Fastrak. Magnetic trackers can be used to provide an accurate trace of subjects' head movements, but do not provide a fast scene update rate. Magnetic tracking systems require extended processing time because two processing engines are used to determine the position and orientation of the subject's head relative to a magnetic receiver: the internal central processing unit (CPU) converts the data from analogue to digital, then processes and filters the data for noise before sending it on to the main CPU driving the system. Although environmental magnetic fields or ferromagnetic materials within the immediate area can affect magnetic sensors, these trackers generally are not distorted by the presence of small metal objects (2).

Inertial-based tracking systems provide a faster scene update rate than magnetic-based systems, but are not as accurate. Data generated by the inertial tracker is initially filtered by the system's internal CPU and then sent directly to the main CPU, which provides a quick analogue-to-digital conversion. This type of tracking system enables the collection of an essentially unlimited amount of data pertaining to the motion of the subject, but is not capable of maintaining registration of the subject's position within three-dimensional space (2).

In many situations where accuracy is not critical, acoustical trackers provide a cheap alternative to other trackers. With the help of three microphones, acoustical trackers are able to determine the position and orientation of a moving body. Although they are not affected by the magnetic fields of surrounding objects as magnetic trackers are, acoustical trackers have difficulty achieving the accuracy, speed, and performance range required by most applications (2).

The majority of problems associated with VR can be attributed to the system's tracking peripherals. One of the main problems plaguing all applications of VR is lag, the time gap between a user action and the presence of that action's consequences within the virtual environment (i.e., the time it takes the system to catch up with the user's action). Drift is another common problem: over an extended period of use, VR systems are generally unable to maintain registration of the subject's location within three-dimensional space. The effects of lag and drift vary from system to system. For applications such as video games, which do not require a great deal of precision and are designed to run only a short time, these problems are not a critical issue. In cases that require more precision and longer run times, the user may experience symptoms similar to those of motion sickness, and extremely precise applications such as flight simulators or technical training simulators may be impossible for the user to effectively complete (5,6).

The most accepted tracking device on the market is the Polhemus Fastrak, a magnetic tracking system. Product specifications list a frequency of 120 Hz and accuracy within 0.03 inches and 0.15 degrees. Like other magnetic trackers, the Fastrak has proved itself to be very accurate after the fact. The problem with the Fastrak and other systems on the market is their inability to provide a fast update rate for the virtual environment. Like other magnetic trackers, the Fastrak's lag time is increased by the need for internal processing of the data before sending it on to the main processor. Operation must take place in the presence of a magnetic base emitting three magnetic fields in the x, y, and z directions. Three orthogonal magnetic field detectors aligned along the x, y, and z axes within the sensor provide data regarding the position and orientation of the sensor. The Fastrak reports 6 degrees of freedom: its output consists of six data points per sample, three corresponding to position in three-dimensional space and three to orientation.

InterSense, a new company that grew from the laboratories at the Massachusetts Institute of Technology, has attempted to reduce the effects of lag and drift by developing a hybrid tracking system. InterSense's hybrid tracker, the IS600, combines information from ultrasonic transponder beacons with information from an inertial sensor. The advantage of inertial tracking is a higher sampling frequency, but its main drawback is that position readings cannot be obtained. By combining the ultrasonically detected position information with the orientation information from the inertial tracker, InterSense may be able to solve some of the problems associated with VR. Figure 2.0 displays the IS600 components.

Figure 2.0: Diagram of the IS600 attached to a head mounted display.

The main sensor of the system, the inertial sensor, sends information to the control box regarding the user's head motions. This information is separated into three parts: azimuth, elevation, and roll. Azimuth is angular rotation in a flat plane oriented parallel to the ground, elevation is rotation of the head up or down, and roll is sideways rotation to the right or left. The ultrasonic coordinate base is connected to the control box and should be kept stationary. The three transponder beacons are attached to the HMD and send signals that are detected by the coordinate base. When the transponder beacons are positioned properly, their signals should allow the system to pinpoint the location of the user's head through a process known as triangulation. Figure 3.0 depicts the proper positioning of the sensors for triangulation, with the transponder beacons in a triangle and the inertial sensor at the base of the triangle. Once the triangulation routines have been properly integrated into the IS600, the CPU within the control box will be able to pinpoint the location of the inertial sensor. By placing the transponders in a triangle at a given distance from the inertial sensor, calculations will reveal the proper position of the sensor within a Euclidean coordinate system.
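The position calculation described above can be illustrated with a small two-dimensional example. This is a simplification in two ways: the real system works in three dimensions, and the range-based version of the calculation shown here is strictly trilateration rather than triangulation. InterSense's actual routines are not public; this is only a sketch of the underlying geometry:

```python
import math

def trilaterate_2d(beacons, distances):
    """Recover (x, y) from measured ranges to three beacons at known
    positions. A 2-D sketch of the geometry only; the IS600's own
    3-D routines are proprietary and may differ."""
    (x0, y0), (x1, y1), (x2, y2) = beacons
    d0, d1, d2 = distances
    # Subtracting the first circle equation from the other two cancels
    # the quadratic terms, leaving two linear equations A [x, y]^T = b.
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 - x0**2 + y2**2 - y0**2
    det = a11 * a22 - a12 * a21      # nonzero when beacons are not collinear
    return ((b1 * a22 - b2 * a12) / det,
            (a11 * b2 - a21 * b1) / det)

# Beacons at three corners of a unit square; true head position (0.3, 0.4).
beacons = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dists = [math.hypot(0.3 - bx, 0.4 - by) for bx, by in beacons]
x, y = trilaterate_2d(beacons, dists)
print(round(x, 6), round(y, 6))  # 0.3 0.4
```

The non-collinearity requirement is why the beacons must be arranged in a triangle rather than a line, as the text notes.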

 

Figure 3.0: The IS600 sensor positioning on a head mounted display.
 

Position information from the coordinate base is sent to the control box as x, y, z coordinates of the inertial sensor. The control box also receives inertial data directly from the inertial sensor. All of the position and orientation data is then sent on to the main CPU. Ideally, the information should be combined and used to update the virtual scene. Currently, this is not the case. Until InterSense releases a firmware upgrade, now in alpha testing, the scene will be updated by data from the inertial sensor only. Similarly, triangulation has not yet been properly implemented. At this time, only one transponder beacon can be used to determine the position of the user's head. With only one beacon, position information is not very accurate and is reported as the position of the beacon, not the inertial sensor. The strong point of the IS600 is the system's ability to predict inertial motion up to 50 ms in advance. No other product has successfully incorporated this type of prediction into a tracking system.

In an ideal situation, information from any given tracking system would be instantaneously available to the computer generating the virtual environment. Even if instantaneous data transfer were an option, the system would still have a lag of approximately 16 ms, the average time required to graphically generate a scene (1). For this reason, current processing constraints demand accurate prediction to appropriately update a virtual environment. Although prediction may be effective, it also introduces the possibility of increased noise and overshoots. Increases in noise result from adding any new component (such as prediction capabilities) to an existing system; weighed against the possible benefits of prediction, noise is most likely not an issue. If, however, prediction is set too far in advance, overshoots may occur. Overshoots are predictions that cause inappropriate scenes to be generated, and usually occur when a user quickly changes the direction or speed of their head motion. The positive points associated with prediction should ultimately outweigh noise and the possibility of overshoot errors. Theoretically, the ideal situation for prediction is a smooth, continuous motion. In cases of smooth motion, a system with prediction equal to the overall system lag should compensate for the lag: graphics generation would take place prior to the need for the updated scene, and the result would be an appropriately rendered scene at the appropriate time. Table 1.0 lists the average lag of a VR system and the components that contribute to the lag for both the Fastrak and the IS600.
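The overshoot failure mode can be made concrete with a toy constant-velocity extrapolator. The extrapolation model and the numbers are our illustrative assumptions, not the IS600's actual predictor:

```python
def predict(a_prev, a_now, dt, lead):
    """Constant-velocity extrapolation of azimuth (illustrative only)."""
    return a_now + (a_now - a_prev) / dt * lead

dt, lead = 0.01, 0.050        # 100 Hz sampling, 50 ms prediction

# Head rising at 100 deg/s, then abruptly reversing just after 'a_now'.
a_prev, a_now = 9.0, 10.0     # +1 degree per 10 ms sample
predicted = predict(a_prev, a_now, dt, lead)  # extrapolates onward: 15.0
actual = a_now - 100.0 * lead                 # the head reversed: 5.0
print(predicted - actual)     # 10-degree overshoot at the reversal
```

During smooth motion the same extrapolator tracks the true trajectory closely, which is why the text singles out sudden direction or speed changes as the overshoot risk.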
 
 

Table 1.0: Lag of both the Polhemus Fastrak and the InterSense IS600.
 
 
                           Polhemus Fastrak    InterSense IS600
Frequency                  100 Hz              150 Hz
Average Sampling Lag       5 ms                1 ms
Tracker Processing Time    4 ms                2 ms
Serial Communication       5 ms                5 ms
Graphics Generation        16 ms               16 ms
Total Lag                  30 ms               24 ms
 
 The Fastrak displays an average system lag of approximately 30 ms and the IS600 a lag of approximately 24 ms.  Table 1.0 displays a difference of only 6 ms between the two systems.  Although the 6 ms difference between the Fastrak and the InterSense total system lag does not seem significant, it is important to keep in mind the power the IS600 stands to gain from the system's added prediction capabilities. The effects of prediction will not truly be seen until the IS600 has been successfully integrated into a VR system.  A driving simulator is currently in use at the University of Rochester (U of R) and is the main environment for VR testing at that facility.
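As a sanity check, the totals in Table 1.0 are simply the sums of the component lags (values copied directly from the table):

```python
# Component lags from Table 1.0, in milliseconds.
fastrak = {"avg sampling": 5, "tracker processing": 4,
           "serial": 5, "graphics generation": 16}
is600 = {"avg sampling": 1, "tracker processing": 2,
         "serial": 5, "graphics generation": 16}

print(sum(fastrak.values()), sum(is600.values()))   # 30 24
print(sum(fastrak.values()) - sum(is600.values()))  # the 6 ms difference
```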

 

Figure 4.0: A video clip of the driving simulator at the U of R.

900 KB QuickTime movie

 

The video clip of the U of R driving simulator displays the need for prediction in a virtual environment.  When observing the subject's head motions, it is obvious from the video clip that the majority of head motions in the driving simulator are smooth rotations. These types of motions are ideal for prediction.  Theoretically a system with prediction would improve the registration of the environment with the user's actions during smooth motions.

The driving simulator environment was generated by a Silicon Graphics (SGI) Onyx system. The system is equipped with four R10000 CPUs and an InfiniteReality graphics generator. The environment was pieced together and enhanced at the U of R from framework provided by SGI.
 

Methods

Part 1: Characterization of the IS600

Initial testing to determine the drift and velocity thresholds of the IS600's inertial sensor was conducted at the Visual Perception Laboratory in the Center for Imaging Science at Rochester Institute of Technology. The serial port of the IS600 (Figure 2.0) was connected to the modem port of a Macintosh computer, and the inertial sensor was attached to an angular vernier scale. A C program collecting data from the tracker was run on the Macintosh using the ThinkC application. The program allowed user input of the desired number of samples as well as the sampling interval. The program output consisted of a time stamp and the orientation (azimuth, elevation, and roll) of the sensor. Figure 5.0 displays part of the apparatus described above.
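The record layout produced by the collection program can be sketched as follows. The exact field format is an assumption based on the text, which says each sample carries a time stamp plus azimuth, elevation, and roll:

```python
def parse_sample(line):
    """Parse one output record, assumed here to be
    'time_ms azimuth elevation roll' (whitespace-separated)."""
    t, az, el, roll = (float(f) for f in line.split())
    return {"t_ms": t, "azimuth": az, "elevation": el, "roll": roll}

rec = parse_sample("1250.0 12.5 -3.0 0.4")
print(rec["azimuth"])  # 12.5
```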
 

Figure 5.0: The IS600 inertial sensor attached to a rotational vernier scale.

 
Slow rotation of the vernier scale with the attached inertial sensor (a few degrees per second) was done by hand to determine the minimum detectable velocity of the sensor. This method was repeated at high rotational speeds and recorded with a HI8 video camera. Video recording of the position of the sensor in 5-degree increments served as a calibration trial. The videotape was time stamped, and then observed frame by frame to pinpoint the frames where the onset of motion occurred. The frames displaying the initial motion of the sensor were compared to the calibration frames in order to determine the actual azimuth rotation of the sensor (degrees) within the selected frames.
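Converting the frame-by-frame video observations into an angular velocity is straightforward. The frame rate and readings below are invented examples, not the study's data:

```python
def angular_velocity(first, last, fps):
    """first, last: (frame_index, azimuth_deg) calibration observations.
    Returns the average angular velocity between them in deg/s."""
    (f0, a0), (f1, a1) = first, last
    return (a1 - a0) / ((f1 - f0) / fps)

# Video at ~30 frames/s; sensor swept 5 degrees over 15 frames.
print(angular_velocity((100, 0.0), (115, 5.0), 30.0))  # 10.0 deg/s
```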

The same C program was used to determine drift by recording the initial and final reading of the sensor's azimuth rotation. Initially this data was collected without motion and finally, the experiment was repeated for simulated natural head motions. Head motions were simulated by attaching the inertial sensor to a subject's head.  The subject was then instructed to look around the room for a specified time interval. The sensor was initially affixed to a solid surface, then placed on the subject's head, and finally returned to the solid surface before data collection ended.
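Because the sensor starts and ends on the same solid surface, the drift measurement reduces to comparing the first and last static readings. A minimal sketch with invented readings:

```python
def drift(azimuth_readings):
    """Drift = final static azimuth minus initial static azimuth,
    assuming the sensor starts and ends at the same orientation."""
    return azimuth_readings[-1] - azimuth_readings[0]

# Sensor reads 0.0 deg at rest, wanders during simulated head motion,
# and reads 3.2 deg after being returned to the same spot.
print(drift([0.0, 14.7, -22.1, 5.0, 3.2]))  # 3.2 degrees of drift
```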

Similar tests were performed when the tracker was moved to the National Institutes of Health (NIH) laboratory within the Department of Brain and Cognitive Science at the University of Rochester in Rochester, New York. At the U of R, the inertial sensor was attached to a wire-wound potentiometer, and measurements from both devices were collected by a Silicon Graphics (SGI) Onyx system. The potentiometer and the IS600 inertial sensor were simultaneously rotated in the azimuth plane and readings from both devices were collected by the SGI. Velocity thresholds were again examined by affixing the inertial sensor to the potentiometer and rotating the sensor in the azimuth plane at both high and low velocities. Analysis of the collected data was performed in Excel. Plots representing the potentiometer rotation and the azimuth rotation of the sensor were normalized and scaled to the appropriate rotation range.
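The normalization step amounts to linearly rescaling each trace onto a common rotation range. The counts and range below are illustrative; the original analysis was done in Excel:

```python
def normalize_to_range(values, lo, hi):
    """Linearly rescale a trace so its minimum maps to lo and its
    maximum maps to hi, allowing two traces to be overlaid."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

# Raw potentiometer counts mapped onto a 0-90 degree sweep for overlay
# with the inertial sensor's azimuth readings.
print(normalize_to_range([512, 768, 1024], 0.0, 90.0))  # [0.0, 45.0, 90.0]
```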

The next step was to test the system's prediction capabilities. This required the use of InterSense's software package. The demonstration software allowed adjustment of the system parameters and could save any settings to the control box; the control box retained the desired settings even after the power switch had been toggled. Prediction was set to the maximum, 50 ms, so that the data offset between the potentiometer rotation and the azimuth rotation of the sensor was maximized. The inertial sensor was again affixed to the potentiometer and smooth rotations were used as the system input. Several tests were repeated using sinusoidal motion as the stimulus. The SGI was again responsible for data collection.
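One simple way to confirm a 50 ms offset between the two recorded traces is to align corresponding peaks. This is a toy check with invented sampling values, not the actual Excel analysis:

```python
import math

dt = 0.005                               # toy 200 Hz sampling interval
t = [i * dt for i in range(400)]         # one 0.5 Hz sinusoidal cycle
pot = [math.sin(math.pi * ti) for ti in t]               # potentiometer
sensor = [math.sin(math.pi * (ti + 0.050)) for ti in t]  # 50 ms ahead

# Index of each trace's peak; the predicted trace peaks earlier.
i_pot = max(range(len(pot)), key=lambda i: pot[i])
i_sensor = max(range(len(sensor)), key=lambda i: sensor[i])
offset_samples = i_pot - i_sensor
print(offset_samples)   # 10 samples x 5 ms = the 50 ms prediction interval
```

Peak alignment only works cleanly for smooth periodic motion like the sinusoidal stimulus used here; noisier traces would call for cross-correlation instead.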
 
 

Part 2: Integration of the IS600 into a Virtual Environment

The final step of the process was to integrate the IS600 into an already existing virtual environment. The environment used was the driving simulator at the U of R displayed in Figure 6.0.

 

Figure 6.0: The virtual car at the U of R.

 

The driving simulator environment was compiled with a baud rate of 19,200 bits per second, and prediction commands were set within the environment to allow a given prediction rate to be toggled on and off. The 50 ms prediction setting was repeatedly tested to ensure that the system could handle a toggle command. Data from sinusoidal rotations of the inertial sensor attached to the potentiometer were again collected.

Thirteen subjects were then brought into the U of R lab for perceptual testing. All subjects were over the age of 18 and none were prone to motion sickness. Initially, the subjects were acclimated to the driving simulator by driving freely for approximately 3 to 5 minutes. In each case, the subjects were instructed to maintain a constant distance from a lead car that was traveling at a constant velocity within the environment. The subjects were then asked to make 30 choices as to which environment they preferred (with prediction or without). The prediction setting of 30 ms was toggled on and off in a random order. For each setting, subjects were given approximately 8 seconds to drive before they were asked to make a decision.
 
 

Results/Discussion

Part 1: Characterization of the IS600 Inertial Sensor
All data collected during the characterization of the IS600 was collected at a baud rate of 38,400 bits per second. Orientation readings were collected from the IS600 inertial sensor, via a Macintosh computer, over various time intervals ranging from 1 to 15 minutes. When the sensor was static, drift was not observed. However, after simulated head motions lasting 1 to 15 minutes, the sensor did display drift of approximately 2 to 5 degrees. In these cases, at least 2 degrees of the drift can be attributed to experimental error, which occurred as the sensor was being returned to its initial, static position.

When the IS600 was connected to the Silicon Graphics Onyx system, the sampling rate was random. Depending on the baud rate, sampling intervals generally ranged from about 7 ms to 40 ms, but jumped as high as 80 ms. This problem is currently under investigation by InterSense. In order to complete testing within a reasonable amount of time, the sampling problem was temporarily overlooked.

The sensor was found to be effective in tracking normal head motions; however, failures were observed at both high and low velocities. These problems at extreme velocities are displayed in Figures 7.0 and 8.0. In both figures there are discrepancies between the azimuth rotation readings and the potentiometer readings; these discrepancies are failures of the inertial sensor to accurately register motion. When the angular velocity returned to a rate within the sensor's thresholds, the sensor was able to compensate for these failures. Extreme-velocity failures of the detector are not an issue for situations in which head tracking is the main concern: when the substantial mass of one's head is combined with the weight of an HMD, the velocity and range of head motions are reduced, making it easier for a VR system to function properly.

Figure 7.0: Graph of the yaw rotation of the inertial sensor compared with the readings from an attached potentiometer at low angular velocity.

 

Figure 8.0: Graph of the yaw rotation of the inertial sensor compared with readings from an attached potentiometer at high angular velocity.

 

The prediction capabilities of the inertial sensor were tested both without prediction and with 50 ms prediction in order to confirm that the system's prediction was in fact functioning properly. Rotation of the inertial sensor was performed at a constant velocity to generate the prediction graphs displayed in Figures 9.0 and 10.0. In Figure 9.0, the sensor and potentiometer display approximately the same results. When prediction was set to 50 ms for the plot in Figure 10.0, the yaw rotation of the sensor was shifted approximately 50 ms ahead of the potentiometer readings.

 

Figure 9.0: Graph of the yaw rotation of the inertial sensor and the potentiometer readings acquired during angular rotation at a constant velocity.
 
 

 
 
 
Figure 10.0: Graph of the yaw rotation of the inertial sensor and the potentiometer readings acquired during angular rotation at a constant velocity. Prediction was set to 50ms.
 
 
 
 

Part 2: Integration of the IS600 into a Virtual Environment

Once the system had been successfully integrated into the virtual driving simulator at the U of R, the software was recompiled to include a toggle command that would turn prediction on and off while a subject was interacting with the driving environment. Thirteen subjects were tested to determine whether they preferred an environment generated from a 30 ms prediction of their head motions over an environment without prediction. Of the subjects tested, none chose prediction at a rate greater than 75% or less than 25%. Given a more generous range, 4 of the 13 preferred the environment without prediction, and 3 preferred the environment with prediction. The rest of the subjects did not pick either environment at a rate greater than 60%.

Table 2.0: Chart of subject environment preferences. Subjects were asked to choose 30 times between prediction of 30 ms and no prediction. Subjects 4 and 7 were female; the rest were male.
 
Subject      With Prediction   Without Prediction   % Prediction Preferred
Subject 1    10                20                   33%
Subject 2    16                14                   53%
Subject 3    16                14                   53%
Subject 4    16                14                   53%
Subject 5    16                14                   53%
Subject 6    10                20                   33%
Subject 7    17                13                   57%
Subject 8    11                19                   37%
Subject 9    19                11                   63%
Subject 10   16                14                   53%
Subject 11   13                17                   43%
Subject 12   11                19                   37%
Subject 13   18                12                   60%
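Whether any of these preference rates differs reliably from chance can be checked with an exact binomial test. This test is our addition; the study reports only the raw percentages:

```python
from math import comb

def binom_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n fair-coin (p = 0.5)
    trials: the sum of all outcome probabilities no larger than P(X = k)."""
    pmf = [comb(n, i) * 0.5**n for i in range(n + 1)]
    return sum(p for p in pmf if p <= pmf[k])

# Subject 9's 19-of-30 preference for prediction, the most extreme
# result in Table 2.0, is still consistent with guessing:
p = binom_two_sided_p(19, 30)
print(p > 0.05)   # True - not significant at the 5% level
```

With 30 trials, a subject would need roughly 21 or more choices on one side (70%+) to reach significance, which matches the text's observation that no subject showed a convincing preference.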
 


 

Conclusions

Characterization and testing of the IS600 was complicated by software and firmware problems, in particular the random sampling rate of the system. In spite of these problems, the IS600 shows promise. A maximum velocity threshold was not determined because results could not be reproduced with any accuracy; a minimum threshold was determined to be approximately 3 degrees per second. Tests of the prediction capabilities at 50 ms revealed that a toggle command could be successfully integrated into the system.

Perceptual testing of the IS600's capabilities did not reveal conclusive results. When asked to choose between environments created with 30 ms prediction and environments based on real head positions, subjects were not able to repeatedly pick either environment with confidence. Table 1.0 displays the breakdown of the system lag and is the reason subjects were tested in the driving environment with a 30 ms prediction time. Table 2.0 displays the subject breakdown and the percentage of trials in which each subject preferred prediction. Although a strong conclusion cannot be made as to the success of the system's prediction capabilities, it cannot be concluded that prediction is unimportant.

It is clear that this system has the potential to surpass other tracking systems currently on the market. Until the hardware and software issues have been worked out by InterSense, however, purchasing the IS600 will be a difficult choice: it costs a few thousand dollars more than the alternatives, and its expected capabilities have not yet been properly integrated into the system. At this point, the IS600 behaves similarly to other trackers on the market; the main difference is that InterSense has also included inertial prediction capabilities. Accurate data pertaining to the effectiveness of InterSense's prediction process cannot be obtained until all software and hardware problems are resolved. Once they are, InterSense will be marketing a very interesting product worth the extra money.
 

 
