CDS Lecture Series

Monday, March 27, 2000, 2:00 p.m.

Michael Brandstein
Division of Engineering and Applied Sciences
Harvard University

Audio and Video Signal Acquisition in Challenging Environments: Current Research at the Harvard Intelligent Multi-Media Environments Laboratory (HIMMEL)

The overall goal of the research at the Harvard Intelligent Multi-Media Environments Laboratory (HIMMEL) is to produce automated, relevant, high-quality signal capture in noisy enclosures equipped with remote microphones and video cameras. The capture should require no active participation from its human users and should not distract them. This talk will focus on two projects currently underway at HIMMEL. The first deals with the multi-channel enhancement of speech degraded by reverberation and additive noise. We will discuss our work to incorporate speech modeling into the multi-channel context, as opposed to addressing the problem strictly from a beamforming or inverse-filtering perspective. This approach is shown to be capable of attenuating multipath effects without requiring explicit estimation of the room channel responses. We will then present results from ongoing work on real-time face tracking using a combination of acoustic and visual cues. Initial talker locations are estimated acoustically from microphone array data, while precise localization and tracking are derived from image information. The system is capable of tracking multiple individuals simultaneously and is robust to nonlinear source motions, complex backgrounds, varying lighting conditions, and a variety of source-camera depths.

