Machine Perception Research | ECE | Virginia Tech
The BRADLEY DEPARTMENT of ELECTRICAL and COMPUTER ENGINEERING


Teaching machines to 'see' in a more human way

Machine Perception gives a machine the ability to explain, in a human manner, why it is making its decisions, to warn when it is about to fail, and to provide an understandable characterization of its failures. Computer Vision builds machines that can see the world like humans do, and involves designing algorithms that can answer questions about a photograph or a video.

Current Research

Teaching Computers about Facial Expressions

ECE researchers are working with colleagues in Psychology to create a database of human facial expressions that will be of interest to a broad research community. The database was collected with a Kinect sensor, which provides standard 2-D images as well as 3-D representations. The fully annotated dataset includes seven expressions (happiness, sadness, surprise, disgust, fear, anger, and neutral) for 32 male and female subjects, ages 10 to 30, with a variety of skin tones. The dataset has been instrumental in the creation of a preliminary system that automatically recognizes human facial expressions using both 2-D and 3-D data.
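
As an illustration of how such annotations might be organized (the field names and array shapes below are assumptions for the sketch, not the dataset's actual schema), each capture can be represented as a labeled pair of a 2-D image and a depth map:

```python
from dataclasses import dataclass
import numpy as np

# The seven annotated expression categories listed above.
EXPRESSIONS = ("happiness", "sadness", "surprise", "disgust", "fear", "anger", "neutral")

@dataclass
class ExpressionSample:
    """One annotated Kinect capture (hypothetical schema, for illustration only)."""
    subject_id: int        # one of the 32 subjects
    expression: str        # one of EXPRESSIONS
    rgb: np.ndarray        # 2-D color image, e.g. shape (480, 640, 3)
    depth: np.ndarray      # 3-D information as a depth map, e.g. shape (480, 640)

def validate(sample: ExpressionSample) -> None:
    """Basic sanity checks on a sample's label and array shapes."""
    assert sample.expression in EXPRESSIONS, f"unknown label: {sample.expression}"
    assert sample.rgb.ndim == 3 and sample.depth.ndim == 2
```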

Visual Question Answering

Given an image and a free-form, natural-language question about the image (e.g., "What kind of store is this?" or "Is it safe to cross the street?"), the machine's task is to automatically produce a concise, accurate, free-form, natural-language answer ("bakery", "Yes"). An ECE team is investigating Visual Question Answering (VQA), which has applications with high societal impact in which humans work in collaboration with machines to elicit situationally relevant information from visual data. This research could improve the daily lives of visually impaired users and revolutionize how society at large interacts with visual data.
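
Framed abstractly, and purely as a sketch (the interface below is an assumption, not the team's system), the task maps an image and a question string to a short answer string:

```python
from typing import Protocol
import numpy as np

class VQAModel(Protocol):
    """Minimal task interface: an image plus a free-form question yields a short answer."""
    def answer(self, image: np.ndarray, question: str) -> str: ...

def demo(model: VQAModel, image: np.ndarray) -> None:
    # Example questions taken from the task description above.
    for question in ("What kind of store is this?", "Is it safe to cross the street?"):
        print(question, "->", model.answer(image, question))
```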

Our main thesis is that VQA represents not a single, narrowly defined problem (e.g., image classification) but a rich spectrum of semantic scene-understanding problems and associated research directions. Each question in VQA may lie at a different point on this spectrum, from questions that directly map to existing, well-studied computer-vision problems ("What is this room called?" = indoor scene recognition) all the way to questions that require an integrated approach combining language (semantics), vision (scene), and reasoning (understanding) over a knowledge base ("Does the pizza in the back row next to the Coke seem vegetarian?").

We explore approaches that map to a sequence of waypoints along this spectrum, including (i) pure computer vision; (ii) integrating vision + language; and (iii) integrating vision + language + knowledge bases. We are also exploring approaches to (a) make these models interpretable; (b) train the machine to be curious and to actively ask questions in order to learn; (c) use VQA as a new modality to learn more about the visual world than existing annotation modalities allow; and (d) train the machine to know what it knows and what it does not.
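
For the vision + language waypoint, a common baseline fuses CNN image features with an LSTM encoding of the question and classifies over a fixed set of frequent answers. The sketch below illustrates that general idea only; the backbone, layer sizes, and answer-vocabulary setup are assumptions, not the team's actual model.

```python
import torch
import torch.nn as nn
from torchvision import models

class VQABaseline(nn.Module):
    """Generic vision + language fusion baseline (illustrative sketch only)."""

    def __init__(self, vocab_size: int, num_answers: int,
                 embed_dim: int = 300, hidden: int = 1024):
        super().__init__()
        # Image branch: a small CNN backbone (randomly initialized here; a pretrained
        # backbone would normally be used) with its classification layer removed.
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # global-pooled features
        self.img_fc = nn.Linear(512, hidden)
        # Question branch: word embeddings followed by an LSTM encoder.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        # Fusion + classification over a fixed vocabulary of frequent answers.
        self.classifier = nn.Linear(hidden, num_answers)

    def forward(self, image: torch.Tensor, question_tokens: torch.Tensor) -> torch.Tensor:
        img = self.img_fc(self.cnn(image).flatten(1))        # (batch, hidden)
        _, (q, _) = self.lstm(self.embed(question_tokens))   # final hidden state
        fused = torch.tanh(img) * torch.tanh(q.squeeze(0))   # element-wise fusion
        return self.classifier(fused)                        # scores over candidate answers
```

The element-wise product is one simple way to combine the two modalities; richer fusion, external knowledge bases, and the interpretability and active-questioning directions listed above would build on or replace pieces of such a baseline.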