
DOI: 10.1109/ACIIW52867.2021.9666238

Towards Understanding Confusion and Affective States Under Communication Failures in Voice-Based Human-Machine Interaction

Authors: Sujeong Kim, Abhinav Garlapati, Jonah Lubin, Amir Tamrakar, Ajay Divakaran

Abstract: We present two studies conducted to understand users' affective states during voice-based human-machine interactions, with emphasis on communication errors or failures. In particular, we are interested in understanding "confusion" in relation to other affective states. The studies consist of two types of tasks: (1) communication-related tasks with a voice-based virtual agent, i.e., speaking to the machine and understanding what the machine says, and (2) non-communication-related problem-solving tasks, in which participants solve puzzles and riddles but are asked to explain their answers verbally to the machine. We collected audio-visual data and participants' self-reports of their affective states. We report the results of both studies and an analysis of the collected data. The first study was analyzed based on annotators' observations, and the second based on participants' self-reports.

Submitted 15 July, 2022; originally announced July 2022.

Journal ref: 2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)

arXiv:2107.06886

"How to best say it?" : Translating Directives in Machine Language into Natural Language in the Blocks World

Authors: Sujeong Kim, Amir Tamrakar

Abstract: We propose a method to generate optimal natural language for block-placement directives produced by a machine's planner during human-agent interactions in the blocks world. A non-user-friendly machine directive, e.g., move(ObjId, toPos), is transformed into visually and contextually grounded referring expressions that are much easier for the user to comprehend. We describe an algorithm that progressively and generatively transforms the machine's directive in ECI (Elementary Composable Ideas)-space, generating many alternative versions of the directive. We then define a cost function to evaluate the ease of comprehension of these alternatives and select the best option. The parameters of this cost function were derived empirically from a user study that measured utterance-to-action timings.

Submitted 14 July, 2021; originally announced July 2021.
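
The "How to best say it?" abstract describes scoring many alternative phrasings with an empirically fitted cost and keeping the cheapest. The paper's actual features, cost form, and ECI-space generation are not given here, so the sketch below is only an illustration under stated assumptions: it assumes a linear cost over three hypothetical features, fits the weights to utterance-to-action times by ordinary least squares, and selects the candidate with the lowest predicted time. All feature choices, timings, and candidate sentences are invented for illustration and are not data from the study.

```python
import numpy as np

def with_bias(features):
    """Append a constant column so the fitted cost has an intercept."""
    features = np.asarray(features, dtype=float)
    return np.hstack([features, np.ones((features.shape[0], 1))])

def fit_cost_weights(features, timings):
    """Least-squares fit of per-feature costs to observed utterance-to-action times (assumed linear model)."""
    w, *_ = np.linalg.lstsq(with_bias(features), np.asarray(timings, dtype=float), rcond=None)
    return w

def select_best(candidates, features, w):
    """Score every candidate phrasing with the linear cost and return the cheapest one plus all costs."""
    costs = with_bias(features) @ w
    return candidates[int(np.argmin(costs))], costs

# Hypothetical features: [word count, landmark blocks referenced, uses raw IDs/coordinates (0/1)]
train_features = [[5, 0, 1], [8, 1, 0], [4, 1, 0], [10, 2, 0], [6, 0, 1]]
train_timings = [5.0, 3.1, 2.3, 4.0, 5.2]  # illustrative seconds, not study data
w = fit_cost_weights(train_features, train_timings)

candidates = [
    "move(b7, (3, 2))",
    "Put the green block left of the red block.",
    "Put the green block on the stack next to the red block.",
]
cand_features = [[3, 0, 1], [9, 1, 0], [12, 2, 0]]
best, costs = select_best(candidates, cand_features, w)
# best is the second phrasing: it has the lowest predicted time under these toy weights
```

A linear model keeps each fitted weight interpretable as "seconds added per unit of this feature"; the real cost function may well be nonlinear or use different features.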

arXiv:2106.09623

Towards Explainable Student Group Collaboration Assessment Models Using Temporal Representations of Individual Student Roles

Authors: Anirudh Som, Sujeong Kim, Bladimir Lopez-Prado, Svati Dhamija, Nonye Alozie, Amir Tamrakar

Abstract: Collaboration is identified as a necessary skill for students to succeed in the fields of Science, Technology, Engineering and Mathematics (STEM). However, due to growing student populations and limited teaching staff, it is difficult for teachers to provide constructive feedback and instill collaborative skills using instructional methods. Simple, easily explainable machine-learning-based automated systems can help address this problem. Improving upon our previous work, in this paper we propose simple temporal-CNN deep-learning models for assessing student group collaboration that take temporal representations of individual student roles as input. We examine whether dynamically changing feature representations are applicable to student group collaboration assessment and how they affect overall performance. We also use Grad-CAM visualizations to better understand and interpret the important temporal indices behind the deep-learning model's decision.

Submitted 17 June, 2021; originally announced June 2021.

Comments: Accepted in the poster session at the 14th International Conference on Educational Data Mining
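
The collaboration-assessment abstract above mentions Grad-CAM over temporal indices. As a hedged illustration of how Grad-CAM adapts to a 1-D temporal feature map (not the authors' code; their layer choice and architecture are not stated here), the sketch weights each channel of a conv layer's activations by its time-averaged gradient, sums over channels, and applies a ReLU. To keep it framework-free, it assumes a toy head (global average pooling over time followed by a linear class score), for which the gradient is analytic: d(score)/dA_k(t) = w_k / T.

```python
import numpy as np

def temporal_grad_cam(activations, gradients):
    """Grad-CAM over the time axis of a 1-D (temporal) convolutional feature map.

    activations: A_k(t), shape (channels, time), from the chosen conv layer.
    gradients:   d(class score)/dA_k(t), same shape.
    Returns a (time,) importance curve scaled to [0, 1].
    """
    # Channel weights: gradients global-average-pooled over time.
    weights = gradients.mean(axis=1)
    # Weighted sum over channels; ReLU keeps only evidence for the class.
    cam = np.maximum((weights[:, None] * activations).sum(axis=0), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy head (assumption): score = w . mean_t A(t), so d(score)/dA_k(t) = w_k / T.
A = np.array([[1.0, 0.0, 0.0, 2.0],
              [0.0, 1.0, 0.0, 0.0]])
w = np.array([1.0, -1.0])
T = A.shape[1]
grads = np.tile(w[:, None] / T, (1, T))
cam = temporal_grad_cam(A, grads)
# cam is [0.5, 0.0, 0.0, 1.0]: time step 3 contributes most to the class score
```

In a real temporal CNN the gradients would come from automatic differentiation rather than the closed form used here.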

arXiv:2012.01461

ACE-Net: Fine-Level Face Alignment through Anchors and Contours Estimation

Authors: Jihua Huang, Amir Tamrakar

Abstract: We propose ACE-Net, a novel facial Anchors and Contours Estimation framework for fine-level face alignment. ACE-Net predicts facial anchors and contours that are richer than traditional facial landmarks while overcoming ambiguities and inconsistencies in their definitions. We introduce a weakly supervised loss that enables ACE-Net to learn from existing facial landmark datasets without reannotation. Instead, synthetic data, from which ground-truth (GT) contours can be easily obtained, is used during training to bridge the density gap between landmarks and true facial contours. We evaluate the face alignment accuracy of ACE-Net on the HELEN dataset, which has 194 annotated facial landmarks, while training it with only 68 or 36 landmarks from the 300-W dataset. We show that ACE-Net-generated contours are better than contours interpolated directly from the 68 GT landmarks, and that ACE-Net also outperforms models trained only with full supervision from GT landmark-based contours.

Submitted 9 January, 2022; v1 submitted 2 December, 2020; originally announced December 2020.
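
The ACE-Net abstract compares against "contours interpolated straight from the 68 GT landmarks". The abstract does not say which interpolation that baseline used; the sketch below assumes the simplest option, piecewise-linear resampling of an ordered landmark polyline at points evenly spaced in arc length (splines would be a common alternative). The function name and example points are hypothetical.

```python
import numpy as np

def densify_contour(landmarks, n_points):
    """Resample an open polyline through ordered 2-D landmarks at n_points
    positions evenly spaced in arc length (piecewise-linear interpolation)."""
    pts = np.asarray(landmarks, dtype=float)
    # Length of each segment between consecutive landmarks.
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    # Cumulative arc length at each landmark (must be increasing for np.interp).
    s = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, s[-1], n_points)
    x = np.interp(targets, s, pts[:, 0])
    y = np.interp(targets, s, pts[:, 1])
    return np.stack([x, y], axis=1)

# Three hypothetical landmarks forming an L-shaped contour, densified to 5 points.
dense = densify_contour([(0, 0), (1, 0), (1, 1)], n_points=5)
# dense rows: (0, 0), (0.5, 0), (1, 0), (1, 0.5), (1, 1)
```

Linear resampling only connects the given landmarks with straight segments, which illustrates the density gap the abstract refers to: any curvature between sparse landmarks is lost.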

arXiv:2007.06667

A Machine Learning Approach to Assess Student Group Collaboration Using Individual Level Behavioral Cues

Authors: Anirudh Som, Sujeong Kim, Bladimir Lopez-Prado, Svati Dhamija, Nonye Alozie, Amir Tamrakar

Abstract: K-12 classrooms consistently integrate collaboration into their learning experiences. However, owing to large class sizes, teachers do not have the time to properly assess each student and give them feedback. In this paper we propose simple deep-learning-based machine learning models that automatically determine the overall collaboration quality of a group from annotations of the individual roles and individual-level behavior of all students in the group. Building these models poses two challenges: (1) limited training data and (2) severe class-label imbalance. We address these challenges with a controlled variant of Mixup data augmentation, a method that generates additional data samples by linearly combining pairs of data samples and their corresponding class labels. Additionally, the label space for our problem has an ordered structure. We exploit this by also exploring an ordinal-cross-entropy loss function and studying its effects with and without Mixup.

Submitted 2 September, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: Accepted in the ECCV 2020 workshop on Imbalance Problems in Computer Vision
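
The abstract above defines Mixup as linearly combining pairs of samples and their labels. The paper's "controlled variant" is not specified here, so the sketch below shows only standard Mixup, assuming one-hot labels and a Beta(alpha, alpha) mixing coefficient per synthetic sample. The feature size, number of classes, and alpha value are hypothetical choices for illustration.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.4, rng=None):
    """Standard Mixup: convex combinations of randomly paired samples and their one-hot labels."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    # One mixing coefficient per synthetic sample, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=n)
    # Pair each sample with a randomly chosen partner from the same batch.
    perm = rng.permutation(n)
    # Reshape lam so it broadcasts over all feature dimensions of x.
    lam_x = lam.reshape((n,) + (1,) * (x.ndim - 1))
    x_mix = lam_x * x + (1.0 - lam_x) * x[perm]
    y_mix = lam[:, None] * y_onehot + (1.0 - lam[:, None]) * y_onehot[perm]
    return x_mix, y_mix

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))           # 8 hypothetical feature vectors
labels = rng.integers(0, 5, size=8)    # 5 ordered quality classes (assumed)
y = np.eye(5)[labels]                  # one-hot encoding
x_aug, y_aug = mixup_batch(x, y, rng=rng)
```

Because the mixed labels are soft (each row still sums to 1), they pair naturally with a cross-entropy loss; how the paper's ordinal-cross-entropy loss interacts with them is not described in the abstract.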

arXiv:1603.06554

Action-Affect Classification and Morphing using Multi-Task Representation Learning

Authors: Timothy J. Shields, Mohamed R. Amer, Max Ehrlich, Amir Tamrakar

Abstract: Most recent work has focused on affect from facial expressions, and much less on the body. This work focuses on body affect analysis. Affect does not occur in isolation: in natural interactions, humans usually couple affect with an action; for example, a person could be talking and smiling. Recognizing body affect in sequences requires efficient algorithms that capture both the micro movements that differentiate between happy and sad and the macro variations between different actions. We depart from traditional approaches to time-series data analytics by proposing a multi-task learning model that learns a shared representation well suited to both action-affect classification and generation. We choose Conditional Restricted Boltzmann Machines (CRBMs) as our building block and propose a new model that enhances the CRBM with a factored multi-task component, yielding Multi-Task Conditional Restricted Boltzmann Machines (MTCRBMs). We evaluate our approach on two publicly available datasets, the Body Affect dataset and the Tower Game dataset, and show improved classification performance over the state of the art, as well as the generative abilities of our model.

Submitted 21 March, 2016; originally announced March 2016.

arXiv:1505.02137

Human Social Interaction Modeling Using Temporal Deep Networks

Authors: Mohamed R. Amer, Behjat Siddiquie, Amir Tamrakar, David A. Salter, Brian Lande, Darius Mehri, Ajay Divakaran

Abstract: We present a novel approach to computational modeling of social interactions based on modeling essential social interaction predicates (ESIPs) such as joint attention and entrainment. Grounded in sound social psychological theory and methodology, we collect a new "Tower Game" dataset consisting of audio-visual recordings of dyadic interactions labeled with the ESIPs. We expect this dataset to open a new avenue for research in computational social interaction modeling. We propose a novel joint Discriminative Conditional Restricted Boltzmann Machine (DCRBM) model that combines a discriminative component with the generative power of CRBMs. This combination enables us to uncover actionable constituents of the ESIPs in two steps. First, we train the DCRBM model on the labeled data and obtain accurate detection of the predicates (49%–76% across the various ESIPs). Second, we exploit the generative capability of DCRBMs to activate the trained model so as to generate lower-level data corresponding to a specific ESIP that closely matches the actual training data (mean squared error 0.01–0.1 when generating 100 frames). We are thus able to decompose the ESIPs into their constituent actionable behaviors. Such a purely computational determination of how to establish an ESIP such as engagement is unprecedented.

Submitted 28 May, 2015; v1 submitted 6 May, 2015; originally announced May 2015.

Showing 1–7 of 7 results for author: Tamrakar, A