One of our active endeavors involves the development of an autonomous socially
assistive robot system for free-play interaction with children with autism
spectrum disorders (ASD). Socially assistive robots
(SAR) show promise as assessment and therapeutic
tools because children with ASD express an interest in interacting socially
with such machines. Our work
is motivated by the fact that SAR may hold significant promise for ASD
intervention. The long-term goal of this and related endeavors is to develop
autonomous robot systems that can aid in the diagnosis or treatment of ASD.
Since play features so prominently in interventions (both diagnostic and
therapeutic) for children with ASD, we are developing an interactive robot for a free-play task.
The unconstrained nature of the free-play task used as part of ASD therapy is intended to engage children on a wide range of the autism spectrum, especially lower-functioning children with less mature communication abilities. In our human-robot implementation of the free-play task, the child and the robot are free to interact however the child chooses, with no specific task or game. However, autonomous operation of the robot in such a free-form social setting presents a wide range of challenges, including understanding the social behavior that occurs during the experiment session in time to formulate appropriate real-time robot responses. In addition, the unconstrained nature of the interaction means that any a priori categorization of the child's behavior can be quickly and frequently confounded, especially considering the heterogeneous nature of the participant population.
We use a largely unsupervised approach for clustering proxemic features using Gaussian Mixture Models (GMMs). The resulting model is then used to classify recorded data as indicating whether or not a child is interacting with the robot. We show how this method can successfully categorize actions in an unsupervised manner, and discuss the challenges that remain in recognizing ASD-relevant activities from overhead video.
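The clustering step can be sketched as follows. This is a minimal illustration rather than our deployed pipeline: it fits a two-component 1-D Gaussian mixture by plain EM to a single hypothetical proxemic feature (child-robot distance, in metres) on synthetic data, then labels new observations by their most responsible component. The feature choice and all values are assumptions for the sketch; an off-the-shelf implementation such as scikit-learn's `GaussianMixture` would serve the same purpose.

```python
import numpy as np

def fit_gmm_1d(x, n_components=2, n_iter=100):
    """Fit a 1-D Gaussian mixture with plain EM."""
    # Spread the initial means across the data's quantiles.
    mu = np.quantile(x, np.linspace(0.1, 0.9, n_components))
    var = np.full(n_components, x.var())
    pi = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = (pi / np.sqrt(2 * np.pi * var)
                * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, variances, and mixing weights.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

def classify(d, mu, var, pi):
    """Assign a distance to its most responsible mixture component."""
    dens = pi / np.sqrt(2 * np.pi * var) * np.exp(-0.5 * (d - mu) ** 2 / var)
    return int(np.argmax(dens))

# Synthetic stand-in for a proxemic feature: child-robot distance (m)
# while interacting (near) vs. not interacting (far).
rng = np.random.default_rng(1)
dist = np.concatenate([rng.normal(0.6, 0.1, 200),
                       rng.normal(2.5, 0.4, 200)])
mu, var, pi = fit_gmm_1d(dist)
```

Because the approach is unsupervised, the components carry no labels; the mapping from "near" component to "interacting" is an interpretation imposed afterwards.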
Trajectory recovery for a number of people in a room is accomplished by mounting overhead cameras in a grid pattern around the space such that the entire room is in view. Some overlap is left between adjacent cameras to minimize boundary errors. Frame capture occurs at a low level, after which we perform Bayer-pattern demosaicing and a calibration step to minimize tracking errors due to lens distortion. A logging mechanism ensures that frames from each camera are correctly time-stamped.
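To illustrate how per-camera detections in such a grid can be combined into a single room frame, the sketch below maps pixel detections into room coordinates and averages detections of the same tag seen by overlapping cameras. The grid layout, pixel scale, and merge radius are all made-up values for the sketch, not our deployed configuration.

```python
import numpy as np

# Hypothetical grid layout: each camera's image origin in room
# coordinates (metres) and a uniform pixel scale after calibration.
# These numbers are illustrative only.
CAMERA_ORIGINS = {0: (0.0, 0.0), 1: (2.5, 0.0), 2: (0.0, 2.5), 3: (2.5, 2.5)}
METERS_PER_PIXEL = 3.0 / 640.0   # 3 m field of view across 640 px

def to_room_frame(cam_id, px, py):
    """Map a (px, py) detection in camera pixels into room coordinates."""
    ox, oy = CAMERA_ORIGINS[cam_id]
    return (ox + px * METERS_PER_PIXEL, oy + py * METERS_PER_PIXEL)

def merge_detections(dets, radius=0.25):
    """Average detections of one tag seen by overlapping cameras.

    `dets` is a list of (camera_id, px, py); detections that land
    within `radius` metres of each other in the room frame are
    treated as the same tag and averaged.
    """
    merged = []
    for cam_id, px, py in dets:
        p = to_room_frame(cam_id, px, py)
        for group in merged:
            if np.hypot(p[0] - group[0][0], p[1] - group[0][1]) < radius:
                group.append(p)
                break
        else:
            merged.append([p])
    return [tuple(np.mean(g, axis=0)) for g in merged]
```

With 3 m fields of view and 2.5 m camera spacing, a tag in the 0.5 m overlap strip is seen by two cameras and reported once, near the mean of the two room-frame estimates.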
To facilitate person identification, people in the space are equipped with uniquely identifiable tags that are worn on a hat. This allows us to capture head orientation and position over time, using the Augmented Reality Toolkit to perform tag finding and pose estimation. The library gives us a transformation from which we calculate position and yaw, pitch, and roll angles for each tag present in each camera image. These measurements are relatively stable but can become quite noisy under certain lighting conditions and at the frame boundaries. To minimize the effects of noise, we use a particle filter to refine the estimate and to integrate the measurements from all the cameras, given a motion model that approximates possible human motion over each time step.
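A minimal sketch of such a filter for one tag's planar position is shown below, assuming an isotropic Gaussian measurement model per camera and a random-walk motion model bounded by a plausible human speed; all parameter values are illustrative, not the deployed ones.

```python
import numpy as np

class TagFilter:
    """Minimal particle filter for one tag's (x, y) position.

    A sketch, not the deployed filter: the motion model is a random
    walk bounded by a plausible human speed, and the measurement
    model is isotropic Gaussian noise per camera.
    """

    def __init__(self, n=500, room=(3.0, 3.0), seed=0):
        self.rng = np.random.default_rng(seed)
        self.p = self.rng.uniform((0.0, 0.0), room, size=(n, 2))
        self.w = np.full(n, 1.0 / n)

    def predict(self, dt=1.0 / 15.0, max_speed=2.0):
        # Diffuse particles by one frame's worth of plausible motion.
        self.p += self.rng.normal(0.0, max_speed * dt, self.p.shape)

    def update(self, measurements, sigma=0.15):
        # Fuse (x, y) measurements from every camera that saw the tag.
        for z in measurements:
            d2 = ((self.p - np.asarray(z)) ** 2).sum(axis=1)
            self.w *= np.exp(-0.5 * d2 / sigma ** 2)
        self.w /= self.w.sum()
        # Multinomial resampling when the effective sample size drops.
        if 1.0 / (self.w ** 2).sum() < len(self.w) / 2:
            idx = self.rng.choice(len(self.w), size=len(self.w), p=self.w)
            self.p = self.p[idx]
            self.w = np.full(len(self.w), 1.0 / len(self.w))

    def estimate(self):
        # Weighted mean of the particles.
        return (self.w[:, None] * self.p).sum(axis=0)
```

Each frame, `predict` is called once, then `update` is called with the room-frame measurements from every camera that detected the tag; cameras that miss the tag that frame simply contribute nothing. A full version would also carry yaw, pitch, and roll in the state.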
This work is supported in part by the NSF grants IIS-0803565, CNS-0709296, and in part by the Nancy Laurie Marks Family Foundation.
Open-source development of this project builds on Player/Stage and the Robot Operating System (ROS).