This is not a full report, only pre-typeset sections.
Objective: Evaluate whether an interface that combines a traditional robot tele-operation interface with mapping and informational overlays would allow a single operator to control multiple robots efficiently. Background: There is a drive within the field of robotic control to design interfaces that allow a single operator to operate multiple vehicles. Method: Seventeen people took part in a two-task repeated-measures design. The dependent variables assessed were the time to complete the task, the number of tasks completed, and the mental workload required. Results: Testing showed that, as the number of robots increased, there was a statistically significant increase in user satisfaction and a decrease in the time required to complete the trial, but also a decrease in the completion of some tasks. Conclusion: This paper demonstrates an innovative approach to controlling the activities of a group of robots while providing tele-operation access to individual robots through the simultaneous display of traditional tele-operation data overlaid with map data.
In the wake of a major chemical, biological, radiological or nuclear event, a quick response is critical for both damage mitigation and further threat assessment. First response teams require time to establish their base of operations and set up the protective equipment necessary for ensuring responder safety. The field of robotics holds some promise for addressing these issues. Well before the logistical support necessary for supporting human responders is in place, robots could be on the scene assessing the damage, helping victims, and searching for additional threats.
For reasons of both operator availability and streamlining the logistical process, there is a strong interest in a single operator coordinating the activities of a group of robots. Achieving this requires advances on a variety of fronts, and an important one is the design of the control interface. An operator needs to be able to effectively focus on a single robot while simultaneously maintaining an overarching awareness of the situation and organizing the activities of a group of robots.
Existing work in this area (Humphrey et al., 2007; Pitman et al., 2007) focuses on placing information about the positions of different vehicles at the periphery of the view from the camera mounted on the robot being teleoperated. The HCI application makes two significant changes to this paradigm: a two-dimensional map is added on which the user may direct robots' activities, and a set of overlays is added that permits the user to optionally display additional information. The application also divides the view into two areas, one of which changes depending on whether the user is teleoperating a specific robot (the manipulation interface) or assigning tasks to robots (the orchestration interface). One of the available overlays is a semi-transparent view of the two-dimensional map superimposed on the three-dimensional view from the robot's camera.
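The semi-transparent map overlay described above amounts to per-pixel alpha compositing of the map layer over the camera frame. As a minimal sketch (the function name and the fixed opacity value are illustrative assumptions, not details of the actual interface):

```python
import numpy as np

def blend_overlay(camera: np.ndarray, map_view: np.ndarray,
                  alpha: float = 0.4) -> np.ndarray:
    """Composite a semi-transparent 2-D map layer over a camera frame.

    Standard alpha blending: out = alpha * overlay + (1 - alpha) * base.
    Both inputs are H x W x 3 float arrays in [0, 1]; `alpha` is the
    overlay opacity (a hypothetical tunable, not a value from the paper).
    """
    if camera.shape != map_view.shape:
        raise ValueError("camera and map layers must share a shape")
    return alpha * map_view + (1.0 - alpha) * camera
```

Notably, one of the reported criticisms was the inability to alter overlay opacity; in this formulation that would just mean exposing `alpha` to the user.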
To test the effectiveness of the interface, three simulated disaster scenarios were generated and seventeen users played the part of a first responder to a disaster event. In designing information to present in the overlays for these scenarios the goal was to incorporate information that would both be useful to the user and that might reasonably be generated during the course of a disaster response. The two overlays available during the trial scenarios represented theoretical data from on-site experts. One was a set of areas where the user was told chemical analysts wanted additional samples. The other was an area where the user was told that an expert on the site had identified a high likelihood for an additional explosive device.
The first stage of the testing was a tutorial on the operation of the interface. After a brief introduction to the goals of the experiment and a background questionnaire, the user was stepped through a tutorial by the tester. The user was given a series of incremental tasks: maneuvering a single robot and using it to collect two chemical samples and find a bomb. The ordering of the tasks and their accompanying instructions were organized to cover the skills necessary for completing the trial scenarios: using the robot's automatic behaviors, accessing the various overlays, and teleoperating the robot. During the training the user was encouraged to ask questions, and was told that questions during the trial scenarios were discouraged, to emphasize the importance of a thorough understanding before the trials began.
After the training, which took 12 minutes on average, the user was given the first trial scenario. During the trials the user received no guidance from the tester; all knowledge about the tasks to be performed came from the available overlays. There were two trial scenarios, both of which required the user to find two bombs and take three chemical samples. The number of robots was either two or four, and which quantity came first was randomized across participants. The tester's only interventions were taking notes about situations where the user was confused and enforcing a ten-minute time limit.
After the first trial, the user completed three surveys: a situational awareness survey about the user's perceived awareness of the scenario, a usage survey about the user's satisfaction with different elements of the interface, and the NASA-TLX mental workload survey.
Next, the user completed the second trial scenario with whichever number of robots (two or four) they had not used in the previous trial. After this trial they completed the same three surveys as after the first trial, as well as an additional open-ended survey listing aspects of the interface they enjoyed and disliked.
There were a variety of common criticisms of the interface from users. Criticisms were collected both from comments made by users during testing and from an open-ended survey completed at the end of all the trials. There was a large overlap between the two sources, and they are combined in the following table with separate markers for the tester's notes (⌘) and exit surveys (⌖). If an element was present in both, it was counted once, as a tester's note.
Criticism | Number of Reporters | Percentage Reporting |
---|---|---|
difficult to tell when automatic behaviors are stopped | ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ | 76% |
can't add tasks from orchestration overlay | ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌖ | 59% |
impossible to know if a task has been completed | ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌖ ⌖ | 47% |
stall indicator not terribly accurate | ⌘ ⌖ ⌖ ⌖ ⌖ ⌖ ⌖ | 41% |
difficult to identify active robot | ⌘ ⌘ ⌘ ⌘ ⌘ ⌘ ⌖ | 41% |
can't turn camera without turning robot | ⌘ ⌘ ⌘ ⌘ ⌖ | 29% |
robot is too large in simulation | ⌘ ⌘ ⌘ ⌘ | 24% |
bug: robot stuck going forward | ⌘ ⌘ ⌘ ⌘ | 24% |
can't alter overlay opacity | ⌘ ⌘ ⌘ ⌖ | 24% |
want manual control from orchestration interface | ⌘ ⌘ ⌘ | 18% |
not possible to sample at robot's current location | ⌘ ⌘ | 12% |
bomb detection is difficult | ⌘ ⌘ | 12% |
bug: possible to drive off map | ⌘ ⌘ | 12% |
robot turns too slowly | ⌘ ⌖ | 12% |
lack of avoidance behaviors confusing | ⌘ ⌘ | 12% |
wanted Cancel/Add Task buttons reversed | ⌘ ⌘ | 12%
needed breakdown of completion of subtasks | ⌖ | 6% |
need keyboard map toggle | ⌘ | 6% |
want auditory feedback | ⌖ | 6% |
difficult to read map in orchestration interface | ⌖ | 6% |
orchestration toggle button moves | ⌘ | 6% |
can't reposition waypoints | ⌘ | 6% |
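The combining rule used for the table above (each user counted once per criticism, with the tester's notes taking precedence over exit surveys) can be sketched as follows; the dict-of-sets structure is a hypothetical representation, not the study's actual records:

```python
def combine_reports(tester_notes, exit_surveys):
    """Merge criticism reports from two sources, one tally per user.

    Both arguments map a criticism string to the set of user ids who
    reported it.  A user who appears in both sources is counted once
    and attributed to the tester's notes, per the rule in the text.
    """
    combined = {}
    for criticism in set(tester_notes) | set(exit_surveys):
        notes = tester_notes.get(criticism, set())
        surveys = exit_surveys.get(criticism, set()) - notes  # dedupe
        combined[criticism] = {
            "tester": len(notes),
            "survey": len(surveys),
            "total": len(notes) + len(surveys),
        }
    return combined
```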
The distribution of percentages supports Nielsen's controversial claim that usability tests need only be conducted with about five subjects. From a software engineering perspective, far too many users were tested. For a scientific study that needs to demonstrate significance, however, calculations of statistical power would be needed to justify the sample size.
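As a sketch of the kind of power calculation meant here, the following uses the standard normal approximation to the two-sided paired t-test; the effect size d = 0.8 is a placeholder assumption, not a value estimated from our data:

```python
import math
from statistics import NormalDist

def approx_power(n: int, d: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided paired t-test (normal approximation).

    n: number of paired observations; d: assumed effect size (Cohen's d).
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    nc = d * math.sqrt(n)  # approximate noncentrality of the test statistic
    return NormalDist().cdf(nc - z_crit) + NormalDist().cdf(-nc - z_crit)

def min_sample_size(d: float, target_power: float = 0.8,
                    alpha: float = 0.05) -> int:
    """Smallest n whose approximate power reaches the target."""
    n = 2
    while approx_power(n, d, alpha) < target_power:
        n += 1
    return n
```

Under this approximation, a large assumed effect (d = 0.8) needs roughly a dozen paired subjects for 80% power, so seventeen is not unreasonable for large effects; the point stands that such a calculation should precede the choice of sample size.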
A theoretical point that Sandy and I disagree on is the extent to which generally applicable scientific hypotheses can be addressed while significant, addressable usability issues are present. I believe that our test is valid for our interface, i.e. the basic test design was valid. I do not believe, however, that the results, regardless of what they are, are of general interest to the research community.
As a metaphor, consider a test designed to answer the question, "Can people paint equally well with their right and left hands?" I define a set of criteria for judging the relative quality of paintings, then give my subjects a brick and a bucket of paint. My criticism is that, regardless of my results, the test says nothing about the question in general. The fact that the tools used in the test consistently and significantly deviated from the tools the subjects ideally wanted has a significant effect on how generalizable the result is. I certainly have valid data on painting with bricks, but the goal in science is generally to produce results that are more universally applicable.
So long as there are interface issues that more than, say, 50% of users had a problem with, there are issues with the general applicability of the research. With our interface specifically, the most common issue was users not understanding that the robot's automatic behaviors had been suspended. Several users ended up spending large amounts of time focused on a single robot, either teleoperating it or repeatedly adding and removing waypoints. That focus on a single robot, caused by an interface design issue, certainly affected the hypothesis concerning the efficiency of controlling multiple robots.
As it happens, we found significance regardless. But whether or not we had found significance, the extent to which a majority of users agree that our interface differs from the ideal tool for completing the task significantly limits how meaningful our results are.
One potential situation where the consensus approach to tool design falls apart is a paradigm shift. I, for example, can't type: typing slows me down, and if I had never heard of typing and were forced to interact with an interface by typing, I would likely dislike it and say it slows me down. So there may be interface elements that a majority of users have an issue with but that, unknown to the users, actually represent a closer approximation of the ideal tool. In that situation, however, testing needs to be done to verify that the posited improvement from the paradigm shift actually exists. In any case, our issues with over a 50% reporting rate do not fall into that category.
The statistical analysis of the information collected during testing reveals both positive and negative aspects of increasing the number of robots. When using four robots as opposed to two, there is a statistically significant increase in overall user satisfaction (4.706 to 5.118) as well as a decrease in the time required to complete the scenario (661.71s to 563.313s). However, these results are somewhat misleading. Completion of the scenario was user reported, and the time to complete the scenario may have dropped in part because users were also more likely to forget to complete a task, as evidenced by a drop in the number of chemical tasks completed (2.588 to 2.176). The number of bombs found increased slightly (1.294 to 1.353), but not to the point of significance.
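Since each participant used both robot counts, comparisons like these are presumably paired (repeated-measures) tests. A minimal stdlib sketch, using hypothetical per-participant completion times (the numbers below are illustrative placeholders, not the study's raw data):

```python
import math

# Hypothetical completion times in seconds for eight participants,
# one value per condition; illustrative only, not the study's data.
two_robots  = [702, 655, 690, 610, 640, 675, 660, 648]
four_robots = [601, 560, 598, 540, 555, 590, 570, 552]

def paired_t(before, after):
    """Paired-samples t statistic for a repeated-measures comparison."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

t = paired_t(two_robots, four_robots)
# Two-sided critical value t(0.975, df=7) is about 2.365; |t| beyond it
# means the condition difference is significant at alpha = 0.05.
significant = abs(t) > 2.365
```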
Several of the reported statistics support the expected usage pattern of the interface. Users created both more waypoints (7.059 to 9.765) and tasks (both waypoints and samples) (10.647 to 13.294) when working with four robots, and changed the active robot more frequently (10.882 times to 16.941 times). This supports the concept that the interface makes it possible to do several tasks in parallel by assigning tasks and then switching between robots to monitor and guide them.
In this same vein, users also reported finding the automatic behaviors more convenient when operating with more robots, though the overall impression was lukewarm (4.118 to 4.882). The indifferent response was based in part on the reported frustrations in detecting that behaviors had been suspended, and on issues with the stall indicator.
In general the workload on the users was moderate. The NASA-TLX total mental workload was, on average, 53/100 on a linear scale. Similarly the situational demands, as reported on the SART survey, were 4.74/7. Surprisingly, there was not a significant increase in either the reported mental workload or the demands on attentional resources when the number of robots was increased.
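For reference, the weighted NASA-TLX score averages six subscale ratings using weights from fifteen pairwise comparisons. The sketch below uses hypothetical ratings and weights (illustrative values only, not data from our participants):

```python
# Hypothetical NASA-TLX subscale ratings (0-100) and pairwise-comparison
# weights; the six weights must sum to 15.  Values are illustrative.
ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "performance": 45, "effort": 60, "frustration": 40}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}

def tlx_score(ratings, weights):
    """Weighted NASA-TLX workload: sum(rating * weight) / 15, on 0-100."""
    assert sum(weights.values()) == 15, "weights come from 15 comparisons"
    return sum(ratings[k] * weights[k] for k in ratings) / 15.0
```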
Table x shows the design issues collected via tester observations and users' comments. Many of these issues were reported by only one or two users and are arguably attributable to personal preference. Certain issues, however, were frequently repeated. Five design issues were reported by more than 40% of users: difficulty telling when automatic behaviors had stopped, the inability to add tasks from the orchestration overlay, the inability to know whether a task had been completed, the inaccuracy of the stall indicator, and difficulty identifying the active robot.
Overall, the user testing suggests that users were comfortable with the increased number of robots, as evidenced by the reported increase in overall satisfaction, but that they were not aware of the detriment the increase had on their situational awareness. The increase in the number of missed samples, coupled with the lower trial times, indicates an increased number of forgotten samples. This is a reasonable consequence of the increased complexity of dealing with more robots, but the absence of any corresponding change in perceived situational awareness is troubling. Ideally the operator should be aware of their decreased awareness so that they can take measures to compensate.
Incorporating measures in future work to address several of the issues identified by the users has the potential to help mitigate the decreased awareness that comes with larger numbers of robots.