The present method incorporates audio and visual cues from human gesticulation for automatic recognition. The methodology articulates a framework for co-analyzing gestures and prosodic elements of a person's speech. The methodology can be applied to a wide range of algorithms involving analysis of gesticulating individuals. The examples of interactive technology applications can range from information kiosks to personal computers. The video analysis of human activity provides a basis for the development of automated surveillance technologies in public places such as airports, shopping malls, and sporting events.