-
Weakly-Supervised Semantic Segmentation of Circular-Scan, Synthetic-Aperture-Sonar Imagery
Authors:
Isaac J. Sledge,
Dominic M. Byrne,
Jonathan L. King,
Steven H. Ostertag,
Denton L. Woods,
James L. Prater,
Jermaine L. Kennedy,
Timothy M. Marston,
Jose C. Principe
Abstract:
We propose a weakly-supervised framework for the semantic segmentation of circular-scan synthetic-aperture-sonar (CSAS) imagery. The first part of our framework is trained in a supervised manner, on image-level labels, to uncover a set of semi-sparse, spatially-discriminative regions in each image. The classification uncertainty of each region is then evaluated. Those areas with the lowest uncertainties are then chosen to be weakly labeled segmentation seeds, at the pixel level, for the second part of the framework. Each of the seed extents is progressively resized according to an unsupervised, information-theoretic loss with structured-prediction regularizers. This reshaping process uses multi-scale, adaptively-weighted features to delineate class-specific transitions in local image content. Content-addressable memories are inserted at various parts of our framework so that it can leverage features from previously seen images to improve segmentation performance for related images.
We evaluate our weakly-supervised framework using real-world CSAS imagery that contains over ten seafloor classes and ten target classes. We show that our framework performs comparably to nine fully-supervised deep networks. Our framework also outperforms eleven of the best weakly-supervised deep networks. We achieve state-of-the-art performance when pre-training on natural imagery. The average absolute performance gap to the next-best weakly-supervised network is well over ten percent for both natural imagery and sonar imagery. This gap is found to be statistically significant.
Submitted 20 January, 2024;
originally announced January 2024.
-
Improving Speech Recognition for African American English With Audio Classification
Authors:
Shefali Garg,
Zhouyuan Huo,
Khe Chai Sim,
Suzan Schwartz,
Mason Chua,
Alëna Aksënova,
Tsendsuren Munkhdalai,
Levi King,
Darryl Wright,
Zion Mengesha,
Dongseong Hwang,
Tara Sainath,
Françoise Beaufays,
Pedro Moreno Mengibar
Abstract:
Automatic speech recognition (ASR) systems have been shown to have large quality disparities between the language varieties they are intended or expected to recognize. One way to mitigate this is to train or fine-tune models with more representative datasets. But this approach can be hindered by limited in-domain data for training and evaluation. We propose a new way to improve the robustness of a US English short-form speech recognizer using a small amount of out-of-domain (long-form) African American English (AAE) data. We use CORAAL, YouTube and Mozilla Common Voice to train an audio classifier to approximately output whether an utterance is AAE or some other variety including Mainstream American English (MAE). By combining the classifier output with coarse geographic information, we can select a subset of utterances from a large corpus of untranscribed short-form queries for semi-supervised learning at scale. Fine-tuning on this data results in a 38.5% relative word error rate disparity reduction between AAE and MAE without reducing MAE quality.
Submitted 16 September, 2023;
originally announced September 2023.
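The "relative word error rate disparity reduction" quoted in the abstract above can be illustrated with a small calculation. This is only a sketch: it assumes the disparity is the absolute WER gap between AAE and MAE, which the abstract does not spell out, and the function name and all input numbers are hypothetical rather than taken from the paper.

```python
def relative_disparity_reduction(aae_before, mae_before, aae_after, mae_after):
    """Fraction by which the AAE-MAE WER gap shrinks after fine-tuning.

    Assumption: disparity is defined as WER(AAE) - WER(MAE).
    """
    gap_before = aae_before - mae_before
    gap_after = aae_after - mae_after
    if gap_before == 0:
        raise ValueError("no disparity before fine-tuning; reduction is undefined")
    return (gap_before - gap_after) / gap_before


# Hypothetical WERs (not from the paper): the gap shrinks from 0.10 to 0.06.
reduction = relative_disparity_reduction(0.20, 0.10, 0.16, 0.10)
print(f"relative disparity reduction: {reduction:.1%}")
```

Under this definition, a reduction obtained "without reducing MAE quality" means the gap narrows because the AAE error rate falls, not because the MAE error rate rises.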
-
Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data
Authors:
Alëna Aksënova,
Zhehuai Chen,
Chung-Cheng Chiu,
Daan van Esch,
Pavel Golik,
Wei Han,
Levi King,
Bhuvana Ramabhadran,
Andrew Rosenberg,
Suzan Schwartz,
Gary Wang
Abstract:
Building inclusive speech recognition systems is a crucial step towards developing technologies that speakers of all language varieties can use. Therefore, ASR systems must work for everybody independently of the way they speak. To accomplish this goal, there should be available data sets representing language varieties, and also an understanding of model configuration that is the most helpful in achieving robust understanding of all types of speech. However, there are not enough data sets for accented speech, and for the ones that are already available, more training approaches need to be explored to improve the quality of accented speech recognition. In this paper, we discuss recent progress towards developing more inclusive ASR systems, namely, the importance of building new data sets representing linguistic diversity, and exploring novel training approaches to improve performance for all users. We address recent directions within benchmarking ASR systems for accented speech, measure the effects of wav2vec 2.0 pre-training on accented speech recognition, and highlight corpora relevant for diverse ASR evaluations.
Submitted 16 May, 2022;
originally announced May 2022.
-
Target Detection and Segmentation in Circular-Scan Synthetic-Aperture-Sonar Images using Semi-Supervised Convolutional Encoder-Decoders
Authors:
Isaac J. Sledge,
Matthew S. Emigh,
Jonathan L. King,
Denton L. Woods,
J. Tory Cobb,
Jose C. Principe
Abstract:
We propose a framework for saliency-based, multi-target detection and segmentation of circular-scan, synthetic-aperture-sonar (CSAS) imagery. Our framework relies on a multi-branch, convolutional encoder-decoder network (MB-CEDN). The encoder portion of the MB-CEDN extracts visual contrast features from CSAS images. These features are fed into dual decoders that perform pixel-level segmentation to mask targets. Each decoder provides a different perspective as to what constitutes a salient target. These opinions are aggregated and cascaded into a deep-parsing network to refine the segmentation.
We evaluate our framework using real-world CSAS imagery consisting of five broad target classes. We compare against existing approaches from the computer-vision literature. We show that our framework outperforms supervised, deep-saliency networks designed for natural imagery. It greatly outperforms unsupervised saliency approaches developed for natural imagery. This illustrates that natural-image-based models may need to be altered to be effective for this imaging-sonar modality.
Submitted 17 February, 2022; v1 submitted 10 January, 2021;
originally announced January 2021.
-
Public Intervention Strategies for Distressed Communities
Authors:
Lester O. King
Abstract:
This research presents a methodology to comprehensively define Distressed Communities. We further test whether there is a significant difference in public investment between Distressed Communities and Wealthy Communities. One of the key tools in sustainability planning is the use of sustainability indicators (SIs). A considerable amount of scholarship has contributed to defining and developing SI programs for local-level application (Elgert and Krueger, 2012). Much of the focus of SI research is on developing the ideal indicator based on defined criteria for each indicator (Hart, 1999; Innes and Booher, 2000; Holman, 2009). Here we suggest a methodology that goes beyond defining the ideal indicators to demonstrating how indicators can be used for more in-depth analysis of complex urban problems. In this analysis we reduce 34 development metrics to a smaller number of factors that represent how the data can be classified into groups based on similarities among 88 communities. Using the factor (group) that contained measures identifying Distressed Communities, the communities were allotted an index score and ranked. The top 10 communities were then compared to the bottom 10 communities according to 14 place-based variables related to opportunities for local-government-led improvement.
Submitted 28 September, 2016;
originally announced September 2016.
-
ACO Implementation for Sequence Alignment with Genetic Algorithms
Authors:
Aaron Lee,
Livia King
Abstract:
In this paper, we implement Ant Colony Optimization (ACO) for sequence alignment. ACO is a meta-heuristic recently developed for nearest neighbor approximations in large, NP-hard search spaces. Here we use a genetic algorithm approach to evolve the best parameters for an ACO designed to align two sequences. We then use the best parameters found to interpolate approximate optimal parameters for a given string length within a range. The basis of our comparison is the alignment given by the Needleman-Wunsch algorithm. We found that ACO can indeed be applied to sequence alignment. While it is computationally expensive compared to other equivalent algorithms, it is a promising algorithm that can be readily applied to a variety of other biological problems.
Submitted 3 June, 2014;
originally announced June 2014.
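The Needleman-Wunsch baseline named in the abstract above is a standard dynamic-programming global alignment. The following is a minimal sketch of it, not the authors' code: the scoring values (match +1, mismatch -1, gap -1) are assumed defaults chosen for illustration, since the abstract does not state which scores were used.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1] (1-based matrix indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("GATTACA", "GCATGCU")
print(total)
print(top)
print(bottom)
```

Because this procedure returns an optimal alignment under the chosen scoring, it can serve as the reference against which a heuristic alignment, such as one produced by ACO, is scored.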