From: Jeff Pierce [jpierce@cs.cmu.edu]
Sent: Wednesday, October 03, 2001 2:25 PM
To: bowman@vt.edu; 3dui List
Subject: RE: Comparison of HMD and Dome

At 11:55 AM 10/3/01, Doug Bowman wrote:
>> Generalizing the tasks performed in VEs is exactly what I am doing. Your
>> "testbed evaluation" paper is really useful.
>>
>> There is one point I am confused about. I noticed that the model for the
>> testbed (as in other previous research) was created based on tasks. For
>> instance, for selection and manipulation, a boxes scene was created; for
>> travel, a town-type scene was created. But in our study, we ran all tests
>> using only one model. The VE is a room with a set of 15 colored balls,
>> cubes, toruses, cylinders, and pyramids (5 of each type, 3 of each color)
>> along one wall and 15 matching platforms along the opposite wall. The task
>> requires subjects to move the 15 objects on the left side of the room over
>> to the matching 15 platforms on the right side of the room. The subject
>> has to go through a maze of walls while avoiding the walls. We measure
>> 6-DOF manipulation performance, task completion time, sickness, and so on.
>> Notice that subjects have to navigate to an object, pick it up, navigate
>> again, and then drop it. Two constant speeds are allowed.
>>
>> Does it matter that we measure them together? My program did record all
>> the points (in each loop) that the subject traveled to. Should I separate
>> the task performance measures and discuss them separately?
>
>Jian,
>
>If you're looking for generalizable results, my personal feeling is
>that it's best to separate the tasks and control all the outside factors
>as much as possible. That way, when you do your analysis you can find
>out statistically which factor is responsible for any changes in
>performance.
>
>There are some other people on the list who disagree with me (would Jeff
>like to chime in here?).

How can I resist the opportunity to engage in thesis procrastination? =)

For the rest of the list who are wondering what Doug is talking about, he and I had a short discussion about how well the results from particular tasks correspond to results from real work. For example, many testbed tasks involve the manipulation of generic shapes (e.g. cubes, spheres) to allow experimenters to isolate the contributions of individual factors (e.g. size, distance). If we instead had users manipulate familiar objects (e.g. chairs), their task performance could be affected because they recognize the objects and make assumptions about their properties (e.g. size, distance).

The advantage of using these types of tasks is that we can learn a great deal about the contributions of individual factors (e.g. how does doubling the size of a cube affect task performance?). The disadvantage is that the results do not necessarily "generalize" as well to real work. When engaged in real work, users have that extra information available: they're working with familiar objects, so they can take advantage of the known properties of those objects. When you create these types of tasks you're choosing a particular point on a spectrum: more confidence about the contributions of individual factors in exchange for less confidence about how well the results transfer to real work. Fred Brooks wrote a paper in 1988 that discusses this spectrum; it's worth a read if you haven't looked at it:

Frederick P. Brooks. Grasping Reality Through Illusion: Interactive Graphics Serving Science. CHI 1988 Proceedings, pages 1-11.

We can also choose a point closer to the other end of the spectrum.
On this end you get more confidence about how well your results reflect real work, but you pay with less confidence about the effects of individual factors. Consider a case where I want to learn whether technique A or technique B is better for arranging objects in a scene. If I create tasks of this type (e.g. moving furniture around a room, moving rides around an amusement park), I will arguably have more confidence in the result than if I make people move cylinders around a featureless environment. However, I won't be able to state the effects of size with as much confidence. If users are much less accurate positioning furniture at 500 feet than amusement park rides, is the difference because of the relative sizes of the objects or because of the types of objects (furniture vs. rides)?

The trick, of course, is determining where on the spectrum you should be. I tend to lean toward the latter end of the spectrum because I'm an engineer at heart. If I need to choose a technique for a VE where drama students will be prototyping stage layouts, I'm probably better served by the latter type of study. On the other hand, to learn all about a particular technique and what makes it tick, you're probably better served by the former type of study. If you're more of a scientist interested in learning Truth, you probably lean toward that end.

The question you need to answer is what exactly you want to learn. Doug's recommendation (separate the tasks and control all the outside factors as much as possible) will help you draw conclusions about how particular factors affect performance in a particular display. For example, you might learn that when you double the size of the spheres, performance gets faster in the HMD but not in the dome. On the other hand, all you might care about is whether training is more effective in display A than in display B. In this case you need to focus more on making your tasks resemble real work than on controlling the individual factors. If you're training people to pull a piece of the International Space Station from the shuttle, navigate through space, and snap the piece into place, make your tasks similar to that and don't worry about the individual factors.

So what do you want to learn? How individual factors affect performance in a particular display? Whether users will be more effective working in display A than in display B? My impression is that you're trying to learn the former, so Doug's suggestion is the way to go.

Jeff
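(A minimal sketch of the kind of factorial analysis that Doug's "separate the tasks and control the outside factors" approach enables: testing whether an object-size effect differs between the HMD and the dome. The file name, column names, and the pandas/statsmodels calls are illustrative assumptions, not anything from the study or tools discussed in this thread.)

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Hypothetical per-trial log (assumed format): one row per trial with the
    # display used, the object-size condition, and the completion time.
    trials = pd.read_csv("trials.csv")  # columns: display, size, completion_time

    # Two-way ANOVA with an interaction term. A significant display:size
    # interaction is the statistical form of "doubling the sphere size speeds
    # things up in the HMD but not in the dome."
    model = smf.ols("completion_time ~ C(display) * C(size)", data=trials).fit()
    print(anova_lm(model, typ=2))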