DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition

Authors: Xin Jing, Luyang Zhang, Jiangjian Xie, Alexander Gebhard, Alice Baird, Bjoern Schuller

Abstract: In ornithology, it's widely acknowledged that bird species display diverse dialects in their calls across different regions. Consequently, computational methods to identify bird species solely through their calls face significant challenges. There is growing interest in understanding the impact of species-specific dialects on the effectiveness of bird species recognition methods. Despite potential mitigation through the expansion of dialect datasets, the absence of publicly available testing data currently impedes robust benchmarking efforts. This paper presents the Dialect Dominated Dataset of Bird Vocalisation (DB3V), the first cross-corpus dataset that focuses on dialects in bird vocalisations. The DB3V comprises more than 25 hours of audio recordings from 10 bird species distributed across three distinct regions in the contiguous United States (CONUS). In addition to presenting the dataset, we conduct analyses and establish baseline models for cross-corpus bird recognition. The data and code are publicly available online: https://zenodo.org/records/11544734

Submitted 11 June, 2024; originally announced June 2024.

Comments: accepted by Interspeech 2024

arXiv:2403.14048

The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data

Authors: Alice Baird, Rachel Manzelli, Panagiotis Tzirakis, Chris Gagne, Haoqi Li, Sadie Allen, Sander Dieleman, Brian Kulis, Shrikanth S. Narayanan, Alan Cowen

Abstract: The NeurIPS 2023 Machine Learning for Audio Workshop brings together machine learning (ML) experts from various audio domains. There are several valuable audio-driven ML tasks, from speech emotion recognition to audio event detection, but the community is sparse compared to other ML areas, e.g., computer vision or natural language processing. A major limitation with audio is the available data; with audio being a time-dependent modality, high-quality data collection is time-consuming and costly, making it challenging for academic groups to apply their often state-of-the-art strategies to a larger, more generalizable dataset. In this short white paper, to encourage researchers with limited access to large datasets, the organizers first outline several open-source datasets that are available to the community, and for the duration of the workshop are making several proprietary datasets available. Namely, three vocal datasets, Hume-Prosody, Hume-VocalBurst, an acted emotional speech dataset Modulate-Sonata, and an in-game streamer dataset Modulate-Stream. We outline the current baselines on these datasets but encourage researchers from across audio to utilize them outside of the initial baseline tasks.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2402.19344

The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition

Authors: Dimitrios Kollias, Panagiotis Tzirakis, Alan Cowen, Stefanos Zafeiriou, Irene Kotsia, Alice Baird, Chris Gagne, Chunchang Shao, Guanyu Hu

Abstract: This paper describes the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with IEEE CVPR 2024. The 6th ABAW Competition addresses contemporary challenges in understanding human emotions and behaviors, crucial for the development of human-centered technologies. In more detail, the Competition focuses on affect-related benchmarking tasks and comprises five sub-challenges: i) Valence-Arousal Estimation (the target is to estimate two continuous affect dimensions, valence and arousal), ii) Expression Recognition (the target is to recognise between the mutually exclusive classes of the 7 basic expressions and 'other'), iii) Action Unit Detection (the target is to detect 12 action units), iv) Compound Expression Recognition (the target is to recognise between the 7 mutually exclusive compound expression classes), and v) Emotional Mimicry Intensity Estimation (the target is to estimate six continuous emotion dimensions). In the paper, we present these Challenges, describe their respective datasets and challenge protocols (we outline the evaluation metrics) and present the baseline systems as well as their obtained performance. More information on the Competition can be found at: https://affective-behavior-analysis-in-the-wild.github.io/6th.

Submitted 12 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2312.12332

Connection between Sub-surface Layers and Surface Magnetic Activity over Multiple Solar Cycles

Authors: Mackenzie A. Baird, Sushanta C. Tripathy, Kiran Jain

Abstract: We investigate spatio-temporal evolution of high-degree acoustic mode frequencies of the Sun and the surface magnetic activity, over the course of multiple solar cycles, to improve our understanding of the connection between the solar interior and atmosphere. We focus on high-degree modes due to their ability to characterize conditions in the shear layer just below the solar surface. Using the full-disk Doppler observations made by the Global Oscillation Network Group (GONG), mode frequencies covering the period from July 2001 to December 2021 are computed through the local helioseismic technique of ring diagrams. Considering 10.7 cm radio flux measurements, the sunspot number, and the local magnetic activity index as solar activity proxies, we note strong correlation between the frequency shifts and each activity index. We further investigate the hemispheric asymmetry in frequency shifts and magnetic activity and find that both the activity and frequencies in the descending phase of cycle 23 were dominant in the southern hemisphere, while in cycle 24 these quantities fluctuated between northern and southern hemispheres. Analyzing the frequency shifts at different latitudes with the progression of solar cycles, we observe that the shifts at mid-latitudes are dominant in the southern hemisphere during the maximum period of solar activity in cycle 24 but the values overlap as the cycle advances towards the minimum activity period. The frequency shifts at the beginning of cycle 25 are found to be dominant in the southern hemisphere following magnetic activity. The analysis presents additional evidence that the variability in oscillation frequencies is caused by both strong and weak magnetic fields.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted for publication in ApJ

arXiv:2305.03369

The MuSe 2023 Multimodal Sentiment Analysis Challenge: Mimicked Emotions, Cross-Cultural Humour, and Personalisation

Authors: Lukas Christ, Shahin Amiriparian, Alice Baird, Alexander Kathan, Niklas Müller, Steffen Klug, Chris Gagne, Panagiotis Tzirakis, Eva-Maria Meßner, Andreas König, Alan Cowen, Erik Cambria, Björn W. Schuller

Abstract: The MuSe 2023 is a set of shared tasks addressing three different contemporary multimodal affect and sentiment analysis problems: In the Mimicked Emotions Sub-Challenge (MuSe-Mimic), participants predict three continuous emotion targets. This sub-challenge utilises the Hume-Vidmimic dataset comprising user-generated videos. For the Cross-Cultural Humour Detection Sub-Challenge (MuSe-Humour), an extension of the Passau Spontaneous Football Coach Humour (Passau-SFCH) dataset is provided. Participants predict the presence of spontaneous humour in a cross-cultural setting. The Personalisation Sub-Challenge (MuSe-Personalisation) is based on the Ulm-Trier Social Stress Test (Ulm-TSST) dataset, featuring recordings of subjects in a stressed situation. Here, arousal and valence signals are to be predicted, whereas parts of the test labels are made available in order to facilitate personalisation. MuSe 2023 seeks to bring together a broad audience from different research communities such as audio-visual emotion recognition, natural language processing, signal processing, and health informatics. In this baseline paper, we introduce the datasets, sub-challenges, and provided feature sets. As a competitive baseline system, a Gated Recurrent Unit (GRU)-Recurrent Neural Network (RNN) is employed. On the respective sub-challenges' test datasets, it achieves a mean (across three continuous intensity targets) Pearson's Correlation Coefficient of .4727 for MuSe-Mimic, an Area Under the Curve (AUC) value of .8310 for MuSe-Humour and Concordance Correlation Coefficient (CCC) values of .7482 for arousal and .7827 for valence in the MuSe-Personalisation sub-challenge.

Submitted 5 May, 2023; originally announced May 2023.

Comments: Baseline paper for the 4th Multimodal Sentiment Analysis Challenge (MuSe) 2023, a workshop at ACM Multimedia 2023

arXiv:2304.14882

The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

Authors: Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Alexander Barnhill, Maurice Gerczuk, Andreas Triantafyllopoulos, Alice Baird, Panagiotis Tzirakis, Chris Gagne, Alan S. Cowen, Nikola Lackovic, Marie-José Caraty, Claude Montacié

Abstract: The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time in a research competition under well-defined conditions: In the Emotion Share Sub-Challenge, a regression on speech has to be made; and in the Requests Sub-Challenge, requests and complaints need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the usual ComParE features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectRum toolkit; in addition, wav2vec2 models are used.

Submitted 1 May, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

Comments: 5 pages, part of the ACM Multimedia 2023 Grand Challenge "The ACM Multimedia 2023 Computational Paralinguistics Challenge (ComParE 2023)". arXiv admin note: text overlap with arXiv:2205.06799

MSC Class: 68

ACM Class: I.2.7; I.5.0; J.3

arXiv:2303.01498

ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges

Authors: Dimitrios Kollias, Panagiotis Tzirakis, Alice Baird, Alan Cowen, Stefanos Zafeiriou

Abstract: The fifth Affective Behavior Analysis in-the-wild (ABAW) Competition is part of the respective ABAW Workshop which will be held in conjunction with IEEE Computer Vision and Pattern Recognition Conference (CVPR), 2023. The 5th ABAW Competition is a continuation of the Competitions held at ECCV 2022, IEEE CVPR 2022, ICCV 2021, IEEE FG 2020 and CVPR 2017 Conferences, and is dedicated to automatically analyzing affect. For this year's Competition, we feature two corpora: i) an extended version of the Aff-Wild2 database and ii) the Hume-Reaction dataset. The former database is an audiovisual one of around 600 videos of around 3M frames and is annotated with respect to: a) two continuous affect dimensions, valence (how positive/negative a person is) and arousal (how active/passive a person is); b) basic expressions (e.g. happiness, sadness, neutral state); and c) atomic facial muscle actions (i.e., action units). The latter dataset is an audiovisual one in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities. Thus the 5th ABAW Competition encompasses four Challenges: i) uni-task Valence-Arousal Estimation, ii) uni-task Expression Classification, iii) uni-task Action Unit Detection, and iv) Emotional Reaction Intensity Estimation. In this paper, we present these Challenges along with their corpora, outline the evaluation metrics, and present the baseline systems and their obtained performance.

Submitted 20 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2202.10659

arXiv:2301.10477

HEAR4Health: A blueprint for making computer audition a staple of modern healthcare

Authors: Andreas Triantafyllopoulos, Alexander Kathan, Alice Baird, Lukas Christ, Alexander Gebhard, Maurice Gerczuk, Vincent Karas, Tobias Hübner, Xin Jing, Shuo Liu, Adria Mallol-Ragolta, Manuel Milling, Sandra Ottl, Anastasia Semertzidou, Srividya Tirunellai Rajamani, Tianhao Yan, Zijiang Yang, Judith Dineley, Shahin Amiriparian, Katrin D. Bartl-Pokorny, Anton Batliner, Florian B. Pokorny, Björn W. Schuller

Abstract: Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems to their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies; first and foremost in the fields of medical imaging, but also in the use of wearables and other intelligent sensors. In comparison, computer audition can be seen to be lagging behind, at least in terms of commercial interest. Yet, audition has long been a staple assistant for medical practitioners, with the stethoscope being the quintessential sign of doctors around the world. Transforming this traditional technology with the use of AI entails a set of unique challenges. We categorise the advances needed in four key pillars: Hear, corresponding to the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, for the advances needed in computational and data efficiency; Attentively, for accounting for individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, for ensuring compliance with the ethical standards accorded to the field of medicine.

Submitted 25 January, 2023; originally announced January 2023.

arXiv:2210.15754

Proceedings of the ACII Affective Vocal Bursts Workshop and Competition 2022 (A-VB): Understanding a critically understudied modality of emotional expression

Authors: Alice Baird, Panagiotis Tzirakis, Jeffrey A. Brooks, Christopher B. Gregory, Björn Schuller, Anton Batliner, Dacher Keltner, Alan Cowen

Abstract: This is the Proceedings of the ACII Affective Vocal Bursts Workshop and Competition (A-VB). A-VB was a workshop-based challenge that introduces the problem of understanding emotional expression in vocal bursts -- a wide range of non-verbal vocalizations that includes laughs, grunts, gasps, and much more. With affective states informing both mental and physical wellbeing, the core focus of the A-VB workshop was the broader discussion of current strategies in affective computing for modeling vocal emotional expression. Within this first iteration of the A-VB challenge, the participants were presented with four emotion-focused sub-challenges that utilize the large-scale and 'in-the-wild' Hume-VB dataset. The dataset and the four sub-challenges draw attention to new innovations in emotion science as it pertains to vocal expression, addressing low- and high-dimensional theories of emotional expression, cultural variation, and 'call types' (laugh, cry, sigh, etc.).

Submitted 27 October, 2022; originally announced October 2022.

arXiv:2207.06958

Proceedings of the ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts

Authors: Alice Baird, Panagiotis Tzirakis, Gauthier Gidel, Marco Jiralerspong, Eilif B. Muller, Kory Mathewson, Björn Schuller, Erik Cambria, Dacher Keltner, Alan Cowen

Abstract: This is the Proceedings of the ICML Expressive Vocalization (ExVo) Competition. The ExVo competition focuses on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022 included three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first, ExVo-MultiTask, requires participants to train a multi-task model to recognize expressed emotions and demographic traits from vocal bursts. The second, ExVo-Generate, requires participants to train a generative model that produces vocal bursts conveying ten different emotions. The third, ExVo-FewShot, requires participants to leverage few-shot learning incorporating speaker identity to train a model for the recognition of 10 emotions conveyed by vocal bursts.

Submitted 16 August, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

arXiv:2207.05691

doi: 10.1145/3551876.3554817

The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress

Authors: Lukas Christ, Shahin Amiriparian, Alice Baird, Panagiotis Tzirakis, Alexander Kathan, Niklas Müller, Lukas Stappen, Eva-Maria Meßner, Andreas König, Alan Cowen, Erik Cambria, Björn W. Schuller

Abstract: The Multimodal Sentiment Analysis Challenge (MuSe) 2022 is dedicated to multimodal sentiment and emotion recognition. For this year's challenge, we feature three datasets: (i) the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset that contains audio-visual recordings of German football coaches, labelled for the presence of humour; (ii) the Hume-Reaction dataset in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities, and (iii) the Ulm-Trier Social Stress Test (Ulm-TSST) dataset comprising audio-visual data labelled with continuous emotion values (arousal and valence) of people in stressful dispositions. Using the introduced datasets, MuSe 2022 addresses three contemporary affective computing problems: in the Humor Detection Sub-Challenge (MuSe-Humor), spontaneous humour has to be recognised; in the Emotional Reactions Sub-Challenge (MuSe-Reaction), seven fine-grained 'in-the-wild' emotions have to be predicted; and in the Emotional Stress Sub-Challenge (MuSe-Stress), a continuous prediction of stressed emotion values is featured. The challenge is designed to attract different research communities, encouraging a fusion of their disciplines. Mainly, MuSe 2022 targets the communities of audio-visual emotion recognition, health informatics, and symbolic sentiment analysis. This baseline paper describes the datasets as well as the feature sets extracted from them. A recurrent neural network with LSTM cells is used to set competitive baseline results on the test partitions for each sub-challenge. We report an Area Under the Curve (AUC) of .8480 for MuSe-Humor; .2801 mean (from 7-classes) Pearson's Correlation Coefficient for MuSe-Reaction, as well as .4931 Concordance Correlation Coefficient (CCC) and .4761 for valence and arousal in MuSe-Stress, respectively.

Submitted 21 October, 2022; v1 submitted 23 June, 2022; originally announced July 2022.

Comments: Baseline paper for the 3rd Multimodal Sentiment Analysis Challenge (MuSe) 2022, a full-day workshop at ACM Multimedia 2022

arXiv:2207.03572

The ACII 2022 Affective Vocal Bursts Workshop & Competition: Understanding a critically understudied modality of emotional expression

Authors: Alice Baird, Panagiotis Tzirakis, Jeffrey A. Brooks, Christopher B. Gregory, Björn Schuller, Anton Batliner, Dacher Keltner, Alan Cowen

Abstract: The ACII Affective Vocal Bursts Workshop & Competition is focused on understanding multiple affective dimensions of vocal bursts: laughs, gasps, cries, screams, and many other non-linguistic vocalizations central to the expression of emotion and to human communication more generally. This year's competition comprises four tracks using a large-scale and in-the-wild dataset of 59,299 vocalizations from 1,702 speakers. The first, the A-VB-High task, requires competition participants to perform a multi-label regression on a novel model for emotion, utilizing ten classes of richly annotated emotional expression intensities, including Awe, Fear, and Surprise. The second, the A-VB-Two task, utilizes the more conventional 2-dimensional model for emotion, arousal, and valence. The third, the A-VB-Culture task, requires participants to explore the cultural aspects of the dataset, training native-country dependent models. Finally, for the fourth task, A-VB-Type, participants should recognize the type of vocal burst (e.g., laughter, cry, grunt) as an 8-class classification. This paper describes the four tracks and baseline systems, which use state-of-the-art machine learning methods. The baseline performance for each track is obtained by utilizing an end-to-end deep learning model and is as follows: for A-VB-High, a mean (over the 10-dimensions) Concordance Correlation Coefficient (CCC) of 0.5687 is obtained; for A-VB-Two, a mean (over the 2-dimensions) CCC of 0.5084 is obtained; for A-VB-Culture, a mean CCC from the four cultures of 0.4401 is obtained; and for A-VB-Type, the baseline Unweighted Average Recall (UAR) from the 8-classes is 0.4172.

Submitted 27 October, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

arXiv:2205.04328

Insights on Modelling Physiological, Appraisal, and Affective Indicators of Stress using Audio Features

Authors: Andreas Triantafyllopoulos, Sandra Zänkert, Alice Baird, Julian Konzok, Brigitte M. Kudielka, Björn W. Schuller

Abstract: Stress is a major threat to well-being that manifests in a variety of physiological and mental symptoms. Utilising speech samples collected while the subject is undergoing an induced stress episode has recently shown promising results for the automatic characterisation of individual stress responses. In this work, we introduce new findings that shed light onto whether speech signals are suited to model physiological biomarkers, as obtained via cortisol measurements, or self-assessed appraisal and affect measurements. Our results show that different indicators impact acoustic features in a diverse way, but that their complementary information can nevertheless be effectively harnessed by a multi-tasking architecture to improve prediction performance for all of them.

Submitted 9 May, 2022; originally announced May 2022.

Comments: Paper accepted for publication at IEEE EMBC 2022. Rights remain with IEEE

arXiv:2205.01780

The ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts

Authors: Alice Baird, Panagiotis Tzirakis, Gauthier Gidel, Marco Jiralerspong, Eilif B. Muller, Kory Mathewson, Björn Schuller, Erik Cambria, Dacher Keltner, Alan Cowen

Abstract: The ICML Expressive Vocalization (ExVo) Competition is focused on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022 includes three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first, ExVo-MultiTask, requires participants to train a multi-task model to recognize expressed emotions and demographic traits from vocal bursts. The second, ExVo-Generate, requires participants to train a generative model that produces vocal bursts conveying ten different emotions. The third, ExVo-FewShot, requires participants to leverage few-shot learning incorporating speaker identity to train a model for the recognition of 10 emotions conveyed by vocal bursts. This paper describes the three tracks and provides performance measures for baseline models using state-of-the-art machine learning strategies. The baseline for each track is as follows: for ExVo-MultiTask, a combined score, computing the harmonic mean of Concordance Correlation Coefficient (CCC), Unweighted Average Recall (UAR), and inverted Mean Absolute Error (MAE) ($S_{MTL}$), is at best 0.335 $S_{MTL}$; for ExVo-Generate, we report Fréchet inception distance (FID) scores ranging from 4.81 to 8.27 (depending on the emotion) between the training set and generated samples. We then combine the inverted FID with perceptual ratings of the generated samples ($S_{Gen}$) and obtain 0.174 $S_{Gen}$; and for ExVo-FewShot, a mean CCC of 0.444 is obtained.

Submitted 12 July, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2202.08981

A Summary of the ComParE COVID-19 Challenges

Authors: Harry Coppock, Alican Akman, Christian Bergler, Maurice Gerczuk, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Jing Han, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Panagiotis Tzirakis, Anton Batliner, Cecilia Mascolo, Björn W. Schuller

Abstract: The COVID-19 pandemic has caused massive humanitarian and economic damage. Teams of scientists from a broad range of disciplines have searched for methods to help governments and communities combat the disease. One avenue from the machine learning field which has been explored is the prospect of a digital mass test which can detect COVID-19 from infected individuals' respiratory sounds. We present a summary of the results from the INTERSPEECH 2021 Computational Paralinguistics Challenges: COVID-19 Cough (CCS) and COVID-19 Speech (CSS).

Submitted 17 February, 2022; originally announced February 2022.

Comments: 18 pages, 13 figures

arXiv:2201.00052

Evaluating Deep Music Generation Methods Using Data Augmentation

Authors: Toby Godwin, Georgios Rizos, Alice Baird, Najla D. Al Futaisi, Vincent Brisse, Bjoern W. Schuller

Abstract: Despite advances in deep algorithmic music generation, evaluation of generated samples often relies on human evaluation, which is subjective and costly. We focus on designing a homogeneous, objective framework for evaluating samples of algorithmically generated music. Any engineered measures to evaluate generated music typically attempt to define the samples' musicality, but do not capture qualities of music such as theme or mood. We do not seek to assess the musical merit of generated music, but instead explore whether generated samples contain meaningful information pertaining to emotion or mood/theme. We achieve this by measuring the change in predictive performance of a music mood/theme classifier after augmenting its training data with generated samples. We analyse music samples generated by three models -- SampleRNN, Jukebox, and DDSP -- and employ a homogeneous framework across all methods to allow for objective comparison. This is the first attempt at augmenting a music genre classification dataset with conditionally generated music. We investigate the classification performance improvement using deep music generation and the ability of the generators to make emotional music by using an additional emotion annotation of the dataset. Finally, we use a classifier trained on real data to evaluate the label validity of class-conditionally generated samples.

Submitted 31 December, 2021; originally announced January 2022.

Journal ref: 2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP)

arXiv:2107.12964 [pdf, other]

A Physiologically-Adapted Gold Standard for Arousal during Stress

Authors: Alice Baird, Lukas Stappen, Lukas Christ, Lea Schumann, Eva-Maria Meßner, Björn W. Schuller

Abstract: Emotion is an inherently subjective psychophysiological human state, and producing an agreed-upon representation (gold standard) for continuous emotion requires a time-consuming and costly training procedure for multiple human annotators. There is strong evidence in the literature that physiological signals are sufficient objective markers for states of emotion, particularly arousal. In this contribution, we utilise a dataset which includes continuous emotion and physiological signals - Heartbeats per Minute (BPM), Electrodermal Activity (EDA), and Respiration-rate - captured during a stress-inducing scenario (Trier Social Stress Test). We utilise a Long Short-Term Memory Recurrent Neural Network to explore the benefit of fusing these physiological signals with arousal as the target, learning from various audio, video, and textual features. We utilise the state-of-the-art MuSe-Toolbox to consider both annotation delay and inter-rater agreement weighting when fusing the target signals. An improvement in Concordance Correlation Coefficient (CCC) is seen across feature sets when fusing EDA with arousal, compared to the arousal-only gold standard results. Additionally, results for BERT-based textual features improved for arousal plus all physiological signals, obtaining up to .3344 CCC compared to .2118 CCC for arousal only. Multimodal fusion also improves overall CCC, with audio plus video features obtaining up to .6157 CCC when recognising arousal plus EDA and BPM.

Submitted 28 July, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

arXiv:2107.11757 [pdf, other]

MuSe-Toolbox: The Multimodal Sentiment Analysis Continuous Annotation Fusion and Discrete Class Transformation Toolbox

Authors: Lukas Stappen, Lea Schumann, Benjamin Sertolli, Alice Baird, Benjamin Weigel, Erik Cambria, Björn W. Schuller

Abstract: We introduce the MuSe-Toolbox - a Python-based open-source toolkit for creating a variety of continuous and discrete emotion gold standards. In a single framework, we unify a wide range of fusion methods and propose the novel Rater Aligned Annotation Weighting (RAAW), which aligns the annotations in a translation-invariant way before weighting and fusing them based on the inter-rater agreements between the annotations. Furthermore, discrete categories tend to be easier for humans to interpret than continuous signals. With this in mind, the MuSe-Toolbox provides the functionality to run exhaustive searches for meaningful class clusters in the continuous gold standards. To our knowledge, this is the first toolkit that provides a wide selection of state-of-the-art emotional gold standard methods and their transformation to discrete classes. Experimental results indicate that, with minimal human intervention, MuSe-Toolbox can provide promising and novel class formations which can be better predicted than hard-coded class boundaries. The implementation (1) is available out of the box with all dependencies using a Docker container (2).

Submitted 20 October, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

Comments: (1) https://github.com/lstappen/MuSe-Toolbox (2) docker pull musetoolbox/musetoolbox

arXiv:2105.01633 [pdf, other]

An Estimation of Online Video User Engagement from Features of Continuous Emotions

Authors: Lukas Stappen, Alice Baird, Michelle Lienhart, Annalena Bätz, Björn Schuller

Abstract: Portraying emotion and trustworthiness is known to increase the appeal of video content. However, the causal relationship between these signals and online user engagement is not well understood. This limited understanding is partly due to a scarcity in emotionally annotated data and the varied modalities which express user engagement online. In this contribution, we utilise a large dataset of YouTube review videos which includes ca. 600 hours of dimensional arousal, valence and trustworthiness annotations. We investigate features extracted from these signals against various user engagement indicators including views, like/dislike ratio, as well as the sentiment of comments. In doing so, we identify the positive and negative influences which single features have, as well as interpretable patterns in each dimension which relate to user engagement. Our results demonstrate that smaller boundary ranges and fluctuations for arousal lead to an increase in user engagement. Furthermore, the extracted time-series features reveal significant (p<0.05) correlations for each dimension, such as, count below signal mean (arousal), number of peaks (valence), and absolute energy (trustworthiness). From this, an effective combination of features is outlined for approaches aiming to automatically predict several user engagement indicators. In a user engagement prediction paradigm we compare all features against semi-automatic (cross-task), and automatic (task-specific) feature selection methods. These selected feature sets appear to outperform the usage of all features, e.g., using all features achieves 1.55 likes per day (Lp/d) mean absolute error from valence; this improves through semi-automatic and automatic selection to 1.33 and 1.23 Lp/d, respectively (data mean 9.72 Lp/d with a std. of 28.75 Lp/d).

Submitted 4 May, 2021; originally announced May 2021.

arXiv:2104.07123 [pdf, other]

The MuSe 2021 Multimodal Sentiment Analysis Challenge: Sentiment, Emotion, Physiological-Emotion, and Stress

Authors: Lukas Stappen, Alice Baird, Lukas Christ, Lea Schumann, Benjamin Sertolli, Eva-Maria Messner, Erik Cambria, Guoying Zhao, Björn W. Schuller

Abstract: Multimodal Sentiment Analysis (MuSe) 2021 is a challenge focusing on the tasks of sentiment and emotion, as well as physiological-emotion and emotion-based stress recognition through more comprehensively integrating the audio-visual, language, and biological signal modalities. The purpose of MuSe 2021 is to bring together communities from different disciplines; mainly, the audio-visual emotion recognition community (signal-based), the sentiment analysis community (symbol-based), and the health informatics community. We present four distinct sub-challenges: MuSe-Wilder and MuSe-Stress which focus on continuous emotion (valence and arousal) prediction; MuSe-Sent, in which participants recognise five classes each for valence and arousal; and MuSe-Physio, in which the novel aspect of 'physiological-emotion' is to be predicted. For this year's challenge, we utilise the MuSe-CaR dataset focusing on user-generated reviews and introduce the Ulm-TSST dataset, which displays people in stressful depositions. This paper also provides detail on the state-of-the-art feature sets extracted from these datasets for utilisation by our baseline model, a Long Short-Term Memory-Recurrent Neural Network. For each sub-challenge, a competitive baseline for participants is set; namely, on test, we report a Concordance Correlation Coefficient (CCC) of .4616 for MuSe-Wilder, .4717 for MuSe-Stress, and .4606 for MuSe-Physio. For MuSe-Sent an F1 score of 32.82 % is obtained.

Submitted 22 October, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

arXiv:2102.13468 [pdf, other]

The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates

Authors: Björn W. Schuller, Anton Batliner, Christian Bergler, Cecilia Mascolo, Jing Han, Iulia Lefter, Heysem Kaya, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Maurice Gerczuk, Panagiotis Tzirakis, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Leon J. M. Rothkrantz, Joeri Zwerts, Jelle Treep, Casper Kaandorp

Abstract: The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation Sub-Challenge, a three-way assessment of the level of escalation in a dialogue is featured; and in the Primates Sub-Challenge, four species vs background need to be classified. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the 'usual' COMPARE and BoAW features as well as deep unsupervised representation learning using the AuDeep toolkit, and deep feature extraction from pre-trained CNNs using the Deep Spectrum toolkit; in addition, we add deep end-to-end sequential modelling, and partially linguistic analysis.

Submitted 24 February, 2021; originally announced February 2021.

Comments: 5 pages

MSC Class: 68 ACM Class: I.2.7; I.5.0; J.3

arXiv:2102.08359 [pdf, other]

End-2-End COVID-19 Detection from Breath & Cough Audio

Authors: Harry Coppock, Alexander Gaskell, Panagiotis Tzirakis, Alice Baird, Lyn Jones, Björn W. Schuller

Abstract: Our main contributions are as follows: (I) We demonstrate the first attempt to diagnose COVID-19 using end-to-end deep learning from a crowd-sourced dataset of audio samples, achieving ROC-AUC of 0.846; (II) Our model, the COVID-19 Identification ResNet (CIdeR), has potential for rapid scalability, minimal cost and improving performance as more data becomes available. This could enable regular COVID-19 testing at a population scale; (III) We introduce a novel modelling strategy using a custom deep neural network to diagnose COVID-19 from a joint breath and cough representation; (IV) We release our four stratified folds for cross parameter optimisation and validation on a standard public corpus and details on the models for reproducibility and future reference.

Submitted 6 January, 2021; originally announced February 2021.

Comments: 5 pages

MSC Class: 68T11 ACM Class: I.2; I.5; J.3

arXiv:2101.06053 [pdf, other]

The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements

Authors: Lukas Stappen, Alice Baird, Lea Schumann, Björn Schuller

Abstract: Truly real-life data presents a strong, but exciting challenge for sentiment and emotion research. The high variety of possible 'in-the-wild' properties makes large datasets such as these indispensable with respect to building robust machine learning models. A sufficient quantity of data covering a deep variety in the challenges of each modality to force the exploratory analysis of the interplay of all modalities has not yet been made available in this context. In this contribution, we present MuSe-CaR, a first of its kind multimodal dataset. The data is publicly available as it recently served as the testing bed for the 1st Multimodal Sentiment Analysis Challenge, and focused on the tasks of emotion, emotion-target engagement, and trustworthiness recognition by means of comprehensively integrating the audio-visual and language modalities. Furthermore, we give a thorough overview of the dataset in terms of collection and annotation, including annotation tiers not used in this year's MuSe 2020. In addition, for one of the sub-challenges - predicting the level of trustworthiness - no participant outperformed the baseline model, and so we propose a simple, but highly efficient Multi-Head-Attention network that, using multimodal fusion, exceeds the baseline by around 0.2 CCC (almost 50 % improvement).

Submitted 20 October, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

Comments: accepted version

arXiv:2101.00339 [pdf, other]

An Artificial Intelligence System for Combined Fruit Detection and Georeferencing, Using RTK-Based Perspective Projection in Drone Imagery

Authors: Angus Baird, Stefano Giani

Abstract: This work presents an Artificial Intelligence (AI) system, based on the Faster Region-Based Convolution Neural Network (Faster R-CNN) framework, which detects and counts apples from oblique, aerial drone imagery of giant commercial orchards. To reduce computational cost, a novel precursory stage to the network is designed to preprocess raw imagery into cropped images of individual trees. Unique geospatial identifiers are allocated to these using the perspective projection model. This employs Real-Time Kinematic (RTK) data, Digital Terrain and Surface Models (DTM and DSM), as well as internal and external camera parameters. The bulk of experiments however focus on tuning hyperparameters in the detection network itself. Apples which are on trees and apples which are on the ground are treated as separate classes. A mean Average Precision (mAP) metric, calibrated by the size of the two classes, is devised to mitigate spurious results. Anchor box design is of key interest due to the scale of the apples. As such, a k-means clustering approach, never before seen in literature for Faster R-CNN, resulted in the most significant improvements to calibrated mAP. Other experiments showed that the maximum number of box proposals should be 225; the initial learning rate of 0.001 is best applied to the adaptive RMS Prop optimiser; and ResNet 101 is the ideal base feature extractor when considering mAP and, to a lesser extent, inference time. The amalgamation of the optimal hyperparameters leads to a model with a calibrated mAP of 0.7627.

Submitted 1 January, 2021; originally announced January 2021.

Comments: 12 pages, 12 figures

arXiv:2004.14858 [pdf, other]

MuSe 2020 -- The First International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop

Authors: Lukas Stappen, Alice Baird, Georgios Rizos, Panagiotis Tzirakis, Xinchen Du, Felix Hafner, Lea Schumann, Adria Mallol-Ragolta, Björn W. Schuller, Iulia Lefter, Erik Cambria, Ioannis Kompatsiaris

Abstract: Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020 is a Challenge-based Workshop focusing on the tasks of sentiment recognition, as well as emotion-target engagement and trustworthiness detection by means of more comprehensively integrating the audio-visual and language modalities. The purpose of MuSe 2020 is to bring together communities from different disciplines; mainly, the audio-visual emotion recognition community (signal-based), and the sentiment analysis community (symbol-based). We present three distinct sub-challenges: MuSe-Wild, which focuses on continuous emotion (arousal and valence) prediction; MuSe-Topic, in which participants recognise domain-specific topics as the target of 3-class (low, medium, high) emotions; and MuSe-Trust, in which the novel aspect of trustworthiness is to be predicted. In this paper, we provide detailed information on MuSe-CaR, the first of its kind in-the-wild database, which is utilised for the challenge, as well as the state-of-the-art features and modelling approaches applied. For each sub-challenge, a competitive baseline for participants is set; namely, on test we report for MuSe-Wild a combined (valence and arousal) CCC of .2568, for MuSe-Topic a score (computed as $0.34 \cdot \mathrm{UAR} + 0.66 \cdot \mathrm{F1}$) of 76.78 % on the 10-class topic and 40.64 % on the 3-class emotion prediction, and for MuSe-Trust a CCC of .4359.

Submitted 9 July, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

Comments: Baseline Paper MuSe 2020, MuSe Workshop Challenge, ACM Multimedia

arXiv:1912.11920 [pdf]

Diversity-Oriented Synthesis of Polymers of Intrinsic Microporosity with Explicit Solid Solvation Cages for Lithium Ions

Authors: Miranda J. Baran, Mark E. Carrington, Swagat Sahu, Artem Baskin, Junhua Song, Michael A. Baird, Simon J. Teat, Stephen M. Meckler, Chengyin Fu, David Prendergast, Brett A. Helms

Abstract: Here, we describe a diversity-oriented synthetic strategy for microporous polymer membranes from which we identified those whose FVEs serve as solid solvation cages for lithium ions. Lead candidate membranes featuring such ion solvation cages exhibited both higher ionic conductivity and higher cation transference number than control membranes where FVEs were aspecific, which indicates conventional bounds for membrane permeability and selectivity for ion transport can be overcome. Such membranes show promise as dendrite-suppressing anode-electrolyte interlayers in high-voltage lithium-metal batteries for electric mobility.

Submitted 26 December, 2019; originally announced December 2019.

arXiv:1908.01671 [pdf, other]

Acoustic Sounds for Wellbeing: A Novel Dataset and Baseline Results

Authors: Alice Baird, Bjoern Schuller

Abstract: The field of sound healing includes ancient practices coming from a broad range of cultures. Across such practices there is a variety of acoustic instrumentation utilised. Practitioners suggest that sound has the ability to target both mental and even physical health issues, e.g., chronic-stress, or joint-pain. Instruments including the Tibetan singing bowl and vocal chanting are still widely used today. With the noise-floor of modern urban soundscapes continually increasing and known to impact wellbeing, methods to improve this are needed. With that in mind, this study presents the Acoustic Sounds for Wellbeing (ASW) dataset. The ASW dataset is gathered from YouTube and includes 88+ hrs of audio from 5 classes of acoustic instrumentation (Gongs, Drumming, Singing Bowls, and Chanting). We additionally present initial baseline classification results on the dataset, finding that conventional Mel-Frequency Cepstral Coefficient features achieve at best an unweighted average recall of 57.4 % for a 5-class support vector machine classification paradigm.

Submitted 21 October, 2019; v1 submitted 5 August, 2019; originally announced August 2019.

arXiv:1903.07395 [pdf, other]

Voice command generation using Progressive Wavegans

Authors: Thomas Wiest, Nicholas Cummins, Alice Baird, Simone Hantke, Judith Dineley, Björn Schuller

Abstract: Generative Adversarial Networks (GANs) have become exceedingly popular in a wide range of data-driven research fields, due in part to their success in image generation. Their ability to generate new samples, often from only a small amount of input data, makes them an exciting research tool in areas with limited data resources. One less-explored application of GANs is the synthesis of speech and audio samples. Herein, we propose a set of extensions to the WaveGAN paradigm, a recently proposed approach for sound generation using GANs. The aim of these extensions - preprocessing, Audio-to-Audio generation, skip connections and progressive structures - is to improve the human likeness of synthetic speech samples. Scores from listening tests with 30 volunteers demonstrated a moderate improvement (Cohen's d coefficient of 0.65) in human likeness using the proposed extensions compared to the original WaveGAN approach.

Submitted 13 March, 2019; originally announced March 2019.

Comments: 7 pages, 2 figures

arXiv:1903.07171 [pdf, other]

Responsible and Representative Multimodal Data Acquisition and Analysis: On Auditability, Benchmarking, Confidence, Data-Reliance & Explainability

Authors: Alice Baird, Simone Hantke, Björn Schuller

Abstract: The ethical decisions behind the acquisition and analysis of audio, video or physiological human data, harnessed for (deep) machine learning algorithms, are an increasing concern for the Artificial Intelligence (AI) community. In this regard, herein we highlight the growing need for responsible and representative data collection and analysis, through a discussion of modality diversification. Factors such as Auditability, Benchmarking, Confidence, Data-reliance, and Explainability (ABCDE) have been touched upon within the machine learning community, and here we lay out these ABCDE sub-categories in relation to the acquisition and analysis of multimodal data, to weave through the high priority ethical concerns currently under discussion for AI. To this end, we propose how these five sub-categories can be included in early planning of such acquisition paradigms.

Submitted 17 March, 2019; originally announced March 2019.

Comments: 4 pages

arXiv:1804.07532 [pdf]

Miniaturized atmospheric ionization detector

Authors: Karen Aplin, Aaron Briggs, Adam Baird, Peter Hastings, R. Giles Harrison, Graeme Marlton

Abstract: A small scintillator-based detector for atmospheric ionization measurements has been developed, partly in response to a need for better ionization data in the weather-forming regions of the atmosphere and partly with the intention of producing a commercially available device. The device can measure both the count rate and energy of atmospheric ionizing radiation. Here we report results of a test flight over the UK in December 2017 where the detector was flown with two Geiger counters on a meteorological radiosonde. The count rate profile with height was consistent both with the Geigers and with previous work. The energy of incoming ionizing radiation increased substantially with altitude.

Submitted 20 April, 2018; originally announced April 2018.

Comments: Proc 18th Conference on Atmospheric Electricity, Nara, Japan, June 2018

arXiv:1003.2662 [pdf, other]

doi 10.1088/1748-0221/5/05/P05004

Construction and Commissioning of the CALICE Analog Hadron Calorimeter Prototype

Authors: C. Adloff, Y. Karyotakis, J. Repond, A. Brandt, H. Brown, K. De, C. Medina, J. Smith, J. Li, M. Sosebee, A. White, J. Yu, T. Buanes, G. Eigen, Y. Mikami, O. Miller, N. K. Watson, J. A. Wilson, T. Goto, G. Mavromanolakis, M. A. Thomson, D. R. Ward, W. Yan, D. Benchekroun, A. Hoummada, et al. (205 additional authors not shown)

Abstract: An analog hadron calorimeter (AHCAL) prototype of 5.3 nuclear interaction lengths thickness has been constructed by members of the CALICE Collaboration. The AHCAL prototype consists of a 38-layer sandwich structure of steel plates and highly-segmented scintillator tiles that are read out by wavelength-shifting fibers coupled to SiPMs. The signal is amplified and shaped with a custom-designed ASIC. A calibration/monitoring system based on LED light was developed to monitor the SiPM gain and to measure the full SiPM response curve in order to correct for non-linearity. Ultimately, the physics goals are the study of hadron shower shapes and testing the concept of particle flow. The technical goal consists of measuring the performance and reliability of 7608 SiPMs. The AHCAL was commissioned in test beams at DESY and CERN. The entire prototype was completed in 2007 and recorded hadron showers, electron showers and muons at different energies and incident angles in test beams at CERN and Fermilab.

Submitted 12 March, 2010; originally announced March 2010.

Comments: 36 pages, 32 figures

Report number: DESY 10-032

Journal ref: JINST 5 (2010) P05004

arXiv:0805.4833 [pdf, ps, other]

doi 10.1088/1748-0221/3/08/P08001

Design and Electronics Commissioning of the Physics Prototype of a Si-W Electromagnetic Calorimeter for the International Linear Collider

Authors: CALICE Collaboration, J. Repond, J. Yu, C. M. Hawkes, Y. Mikami, O. Miller, N. K. Watson, J. A. Wilson, G. Mavromanolakis, M. A. Thomson, D. R. Ward, W. Yan, F. Badaud, D. Boumediene, C. Carloganu, R. Cornat, P. Gay, Ph. Gris, S. Manen, F. Morisseau, L. Royer, G. C. Blazey, D. Chakraborty, A. Dyshkant, K. Francis, et al. (92 additional authors not shown)

Abstract: The CALICE collaboration is studying the design of high performance electromagnetic and hadronic calorimeters for future International Linear Collider detectors. For the electromagnetic calorimeter, the current baseline choice is a high granularity sampling calorimeter with tungsten as absorber and silicon detectors as sensitive material. A "physics prototype" has been constructed, consisting of thirty sensitive layers. Each layer has an active area of 18x18 cm² and a pad size of 1x1 cm². The absorber thickness totals 24 radiation lengths. It was exposed in 2006 and 2007 to electron and hadron beams at the DESY and CERN beam test facilities, using a wide range of beam energies and incidence angles. In this paper, the prototype and the data acquisition chain are described and a summary of the data taken in the 2006 beam tests is presented. The methods used to subtract the pedestals and calibrate the detector are detailed. The signal-over-noise ratio has been measured at 7.63 +/- 0.01. Some electronics features have been observed; these lead to coherent noise and crosstalk between pads, and also crosstalk between sensitive and passive areas. The performance achieved in terms of uniformity and stability is presented.

Submitted 5 August, 2008; v1 submitted 29 May, 2008; originally announced May 2008.

Comments: Content modified: minor review corrections implemented

Journal ref: JINST 3:P08001,2008

arXiv:hep-ex/0104010 [pdf, ps, other]

doi 10.1109/23.958765

A Fast High Resolution Track Trigger for the H1 Experiment

Authors: A. Baird, E. Elsen, Y. H. Fleming, M. Kolander, S. Kolya, D. Meer, D. Mercer, J. Naumann, P. R. Newman, D. Sankey, A. Schoening, H. -C. Schultz-Coulon, Ch. Wissing

Abstract: After 2001 the upgraded ep collider HERA will provide about five times higher luminosity for the two experiments H1 and ZEUS. In order to cope with the expected higher event rates the H1 collaboration is building a track based trigger system, the Fast Track Trigger (FTT). It will be integrated in the first three levels (L1-L3) of the H1 trigger scheme to provide higher selectivity for events with charged particles. The FTT will allow the reconstruction of 3-dimensional tracks in the central drift chamber down to 100 MeV/c within the L2 latency of ~23 μs. To reach the necessary momentum resolution of ~5% (at 1 GeV/c) sophisticated reconstruction algorithms have to be implemented using high density Field Programmable Gate Arrays (FPGA) and their embedded Content Addressable Memories (CAM). The final track parameter optimization will be done using non-iterative fits implemented in DSPs. While at the first trigger level rough track information will be provided, at L2 tracks with high resolution are available to form trigger decisions on topological and other track based criteria like multiplicities and momenta. At the third trigger level a farm of commercial processor boards will be used to compute physics quantities such as invariant masses.

Submitted 6 April, 2001; originally announced April 2001.

Comments: 6 pages, 7 figures, submitted to TNS

Journal ref: IEEE Trans.Nucl.Sci.48:1276-1285,2001

Showing 1–33 of 33 results for author: Baird, A