Research
I'm generally interested in work that combines computer vision, robotics, language, and learning, usually integrating them deeply in well-defined, constrained scenarios. I'm also involved in projects with broader applications, such as the DARPA Mind's Eye program, which aims to perform action recognition 'in the wild'.
Hardware and robotic control
We develop our own custom robotic hardware from a mix of LynxMotion and Servocity components and many custom-milled parts. Our robots are robust but pose difficult challenges for traditional robotic control systems: off-the-shelf servos are inaccurate, off-the-shelf parts are skewed, imprecise, and prone to vibration, the spring constant changes, and manipulating small pieces over a large number of repetitions demands very fine accuracy. To address these issues, we have developed a novel method that controls our robots using automatic differentiation through a non-linear formulation of inverse kinematics.
Game learning
A general approach to learning the rules of board games (learning to play valid games, though not necessarily to play well) through visual observation of robotic play. The aim of this research is to develop a general approach to building automated systems that learn complex sets of rules from very few exemplars. It combines a novel linguistic model for learning, complex novel constraints that let the computer vision component recognize unknown boards and pieces, and an active closed-loop system that can interact with a teacher both by example and through natural language.
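Returning to the hardware work: purely as an illustration of the two ingredients named there (automatic differentiation and a non-linear inverse-kinematics formulation), and not a description of our actual controller, here is a minimal sketch for a hypothetical two-link planar arm. Forward-mode automatic differentiation with dual numbers supplies the Jacobian, and a damped least-squares update with a step limit (a standard choice, swapped in here as an assumption) iterates toward the target. The link lengths, starting angles, damping, step limit, and all function names are hypothetical, and servo error, part skew, and vibration are not modelled.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """Forward-mode AD value: val + der * eps, where eps**2 == 0."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule carried along with the value.
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def dsin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def dcos(x):
    return Dual(math.cos(x.val), -math.sin(x.val) * x.der)


LINKS = (10.0, 8.0)  # hypothetical link lengths


def forward(theta):
    """End-effector (x, y) of a planar two-link arm; theta holds Dual angles."""
    t1, t2 = theta
    l1, l2 = LINKS
    x = l1 * dcos(t1) + l2 * dcos(t1 + t2)
    y = l1 * dsin(t1) + l2 * dsin(t1 + t2)
    return x, y


def jacobian(angles):
    """Position and Jacobian columns d(x, y)/d(theta_j), one forward pass per joint."""
    cols = []
    for j in range(len(angles)):
        seeded = [Dual(a, 1.0 if i == j else 0.0) for i, a in enumerate(angles)]
        x, y = forward(seeded)
        cols.append((x.der, y.der))
    return (x.val, y.val), cols


def solve_ik(target, angles=(0.3, 0.3), damping=0.1, max_step=0.2, iters=200, tol=1e-9):
    """Damped least squares for two joints: dtheta = J^T (J J^T + damping^2 I)^-1 e."""
    a = list(angles)
    for _ in range(iters):
        (px, py), ((j11, j21), (j12, j22)) = jacobian(a)
        ex, ey = target[0] - px, target[1] - py
        if ex * ex + ey * ey < tol:
            break
        m11 = j11 * j11 + j12 * j12 + damping ** 2
        m12 = j11 * j21 + j12 * j22
        m22 = j21 * j21 + j22 * j22 + damping ** 2
        det = m11 * m22 - m12 * m12
        vx = (m22 * ex - m12 * ey) / det
        vy = (-m12 * ex + m11 * ey) / det
        d1 = j11 * vx + j21 * vy
        d2 = j12 * vx + j22 * vy
        # Limit the joint step so a near-singular Jacobian cannot fling the arm.
        scale = min(1.0, max_step / max(abs(d1), abs(d2), 1e-12))
        a[0] += scale * d1
        a[1] += scale * d2
    return a
```

For example, `solve_ik((12.0, 6.0))` should return joint angles whose forward kinematics land within a small tolerance of (12, 6). Forward mode is used because it needs only one pass per joint, which is cheap for an arm with few joints.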
Grammars for Vision and Language
A general approach to recognizing the structure of complex assemblies built from a known part inventory, which typically suffer from significant occlusion. We build a joint probability distribution over the outputs of feature detectors in the image, conditioned on a grammar of valid, physically stable structures as well as on a stochastic Montague grammar for a subset of English dealing with building parts. We develop this approach by analogy to speech processing: as in our domain, the individual detectors (phoneme detectors in speech processing, line and ellipse detectors in our domain) are necessarily very inaccurate, and a reasonable false negative rate leads to a very unreasonable false positive rate. A grammar of valid Lincoln Log structures is used both to constrain the regions of the image where the line and ellipse detectors must be applied and to condition the joint distribution of the detector outputs when performing structure recognition.
Large-Scale Labelling of Video Events with Verbs
As part of the DARPA Mind's Eye project, we are developing novel techniques for labelling thousands of videos with a large collection of verbs. We have developed a novel approach to reliably tracking objects, extracting feature vectors from these object tracks, and performing a forced-choice classification task. We have tested our approach on over 2,000 videos and over 500,000 frames, achieving roughly 70% classification accuracy on a one-out-of-22 classification task. We have also developed new methods to extract the contours of participant objects, to perform extremely low bit-rate lossy compression of video while still allowing humans to recognize the actions, and to annotate videos with full grammatical sentences describing the event.
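Returning to the Grammars for Vision and Language work, the core idea of letting a grammar of valid structures rescue inaccurate detectors can be shown with a deliberately tiny, hypothetical example. It assumes a two-layer structure with three log slots per layer, a made-up stability rule standing in for the real grammar, and independent detector scores treated as probabilities. The search is brute-force enumeration over a toy space, not the joint distribution or the Montague grammar described above.

```python
import itertools
import math

N_POS = 3  # hypothetical: 2 layers x 3 slots per layer


def is_valid(structure):
    """Hypothetical stand-in for the grammar: a log in layer 1
    must rest on a log directly below it in layer 0."""
    layer0, layer1 = structure
    return all(layer0[p] for p in range(N_POS) if layer1[p])


def log_likelihood(structure, scores):
    """Treat each detector score (strictly between 0 and 1) as an
    independent P(log present | image evidence)."""
    total = 0.0
    for layer, layer_scores in zip(structure, scores):
        for present, s in zip(layer, layer_scores):
            total += math.log(s if present else 1.0 - s)
    return total


def best_structure(scores):
    """Most likely structure among those the grammar accepts."""
    candidates = (
        (l0, l1)
        for l0 in itertools.product((False, True), repeat=N_POS)
        for l1 in itertools.product((False, True), repeat=N_POS)
    )
    valid = [c for c in candidates if is_valid(c)]
    return max(valid, key=lambda c: log_likelihood(c, scores))
```

With scores `([0.9, 0.4, 0.8], [0.3, 0.7, 0.2])`, thresholding each detector at 0.5 yields a floating log in the middle of layer 1, which `is_valid` rejects. The grammar-constrained search instead returns `((True, True, True), (False, True, False))`: the confident upper-layer detection overrides the weak, likely false-negative detection beneath it.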
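A forced-choice classifier must commit to exactly one of the 22 labels for every video, so chance accuracy is 1/22, about 4.5%, against which the roughly 70% above should be read. The sketch below shows the shape of such a task using a nearest-centroid classifier over per-video feature vectors. The classifier, the feature vectors, and the verb labels are hypothetical stand-ins, not the track features or classifier used in this work.

```python
import math
from collections import defaultdict


def centroids(train):
    """train: list of (feature_vector, verb). Returns the mean vector per verb."""
    sums, counts = {}, defaultdict(int)
    for x, verb in train:
        if verb not in sums:
            sums[verb] = [0.0] * len(x)
        sums[verb] = [a + b for a, b in zip(sums[verb], x)]
        counts[verb] += 1
    return {v: [a / counts[v] for a in s] for v, s in sums.items()}


def classify(x, cents):
    """Forced choice: always return exactly one verb, the nearest centroid."""
    return min(cents, key=lambda v: math.dist(x, cents[v]))


def accuracy(test, cents):
    """Fraction of (feature_vector, verb) pairs classified correctly."""
    return sum(classify(x, cents) == y for x, y in test) / len(test)
```

For example, training on two toy vectors each for hypothetical labels `"approach"` and `"leave"` and then calling `accuracy` on held-out vectors gives the fraction of forced choices that were correct.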