Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models

Authors: Lucas Rafael Stefanel Gris, Arnaldo Candido Junior, Vinícius G. dos Santos, Bruno A. Papa Dias, Marli Quadros Leite, Flaviane Romani Fernandes Svartman, Sandra Aluísio

Abstract: The NURC Project, which started in 1969 to study the cultured urban linguistic norm spoken in five Brazilian capitals, was responsible for compiling a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries in 334 hours of recordings made in the São Paulo capital. Although 47 inquiries have transcripts, there was no alignment between the audio and the transcriptions, and 328 inquiries were not transcribed. This article presents an evaluation and error analysis of three automatic speech recognition models trained with spontaneous speech in Portuguese and one model trained with prepared speech. The evaluation, which used WER and CER metrics on a manually aligned sample of NURC/SP, allowed us to choose the best model to automatically transcribe 284 hours.

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2208.11594 [pdf, other]

Active Gaze Control for Foveal Scene Exploration

Authors: Alexandre M. F. Dias, Luís Simões, Plinio Moreno, Alexandre Bernardino

Abstract: Active perception and foveal vision are the foundations of the human visual system. While foveal vision reduces the amount of information to process during a gaze fixation, active perception will change the gaze direction to the most promising parts of the visual field. We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene, identifying the objects present in their surroundings with the least number of gaze shifts. Our approach is based on three key methods. First, we take an off-the-shelf deep object detector, pre-trained on a large dataset of regular images, and calibrate the classification outputs to the case of foveated images. Second, a body-centered semantic map, encoding the object classifications and corresponding uncertainties, is sequentially updated with the calibrated detections, considering several data fusion techniques. Third, the next best gaze fixation point is determined based on information-theoretic metrics that aim at minimizing the overall expected uncertainty of the semantic map. When compared to the random selection of next gaze shifts, the proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts and reduces to one third the number of required gaze shifts to attain similar performance.

Submitted 24 August, 2022; originally announced August 2022.

Comments: 6 pages, 8 figures, ICDL 2022 (International Conference on Development and Learning, formerly ICDL-EpiRob)

arXiv:2008.02155 [pdf, other]

Multiscale Stochastic Simulation of the US Pacific Northwest Using Distributed Computing and Databases with Integrated Inflow and Variable Renewable Energy

Authors: Joaquim Dias Garcia, Guilherme Machado, André Dias, Gerson Couto, John Ollis, John Fazio, Daniel Hua

Abstract: Modelling challenges of the United States Pacific Northwest system have grown in the last decade. Besides classical modelling difficulties such as a complex hydro cascade with many operational constraints, we have seen higher penetration of variable renewable energy inside and outside the system, leading to internal issues and completely different power exchanges with the West Coast System. The analysis of adequacy and reliability of this system motivated the design and implementation of a five-step simulator including four planning phases and an operation step. The five levels were modeled as mathematical programs that are linked from top to bottom by fixed decisions and improvements in forecasts; they are also linked from bottom to top by updated system states. The solution of the millions of resulting mathematical programs was made possible by applying state-of-the-art optimization techniques and high-performance databases in a massively parallel environment.

Submitted 5 August, 2020; originally announced August 2020.

arXiv:2005.05856 [pdf, other]

Probabilistic Semantic Segmentation Refinement by Monte Carlo Region Growing

Authors: Philipe A. Dias, Henry Medeiros

Abstract: Semantic segmentation with fine-grained pixel-level accuracy is a fundamental component of a variety of computer vision applications. However, despite the large improvements provided by recent advances in the architectures of convolutional neural networks, segmentations provided by modern state-of-the-art methods still show limited boundary adherence. We introduce a fully unsupervised post-processing algorithm that exploits Monte Carlo sampling and pixel similarities to propagate high-confidence pixel labels into regions of low-confidence classification. Our algorithm, which we call probabilistic Region Growing Refinement (pRGR), is based on a rigorous mathematical foundation in which clusters are modelled as multivariate normally distributed sets of pixels. Exploiting concepts of Bayesian estimation and variance reduction techniques, pRGR performs multiple refinement iterations at varied receptive field sizes, while updating cluster statistics to adapt to local image features. Experiments using multiple modern semantic segmentation networks and benchmark datasets demonstrate the effectiveness of our approach for the refinement of segmentation predictions at different levels of coarseness, as well as the suitability of the variance estimates obtained in the Monte Carlo iterations as uncertainty measures that are highly correlated with segmentation accuracy.

Submitted 12 May, 2020; originally announced May 2020.

Comments: Submitted to IEEE Transactions on Image Processing (April 2020)

arXiv:1909.09225 [pdf, other]

Gaze Estimation for Assisted Living Environments

Authors: Philipe A. Dias, Damiano Malafronte, Henry Medeiros, Francesca Odone

Abstract: Effective assisted living environments must be able to perform inferences on how their occupants interact with one another as well as with surrounding objects. To accomplish this goal using a vision-based automated approach, multiple tasks such as pose estimation, object segmentation and gaze estimation must be addressed. Gaze direction in particular provides some of the strongest indications of how a person interacts with the environment. In this paper, we propose a simple neural network regressor that estimates the gaze direction of individuals in a multi-camera assisted living scenario, relying only on the relative positions of facial keypoints collected from a single pose estimation model. To handle cases of keypoint occlusion, our model exploits a novel confidence gated unit in its input layer. In addition to the gaze direction, our model also outputs an estimation of its own prediction uncertainty. Experimental results on a public benchmark demonstrate that our approach performs on par with a complex, dataset-specific baseline, while its uncertainty predictions are highly correlated to the actual angular error of corresponding estimations. Finally, experiments on images from a real assisted living environment demonstrate the higher suitability of our model for its final application.

Submitted 19 September, 2019; originally announced September 2019.

Comments: Work to be published in its final version at WACV '20

arXiv:1908.11166 [pdf, other]

Fast Inter-Prediction based on Decision Trees for AV1 encoding

Authors: Jieon Kim, Saverio Blasi, Andre Seixas Dias, Marta Mrak, Ebroul Izquierdo

Abstract: The AOMedia Video 1 (AV1) standard can achieve considerable compression efficiency thanks to the usage of many advanced tools and improvements, such as advanced inter-prediction modes. However, these come at the cost of high computational complexity of the encoder, which may limit the benefits of the standard in practical applications. This paper shows that not all sequences benefit from using all such modes, which indicates that a number of encoder optimisations can be introduced to speed up AV1 encoding. A method based on decision trees is proposed to selectively decide whether to test all inter modes. Appropriate features are extracted and used to perform the decision for each block. Experimental results show that the proposed method can reduce the encoding time on average by 43.4% with limited impact on the coding efficiency.

Submitted 29 August, 2019; originally announced August 2019.

arXiv:1908.04168 [pdf]

Decision Trees for Complexity Reduction in Video Compression

Authors: Natasha Westland, André Seixas Dias, Marta Mrak

Abstract: This paper proposes a method for complexity reduction in practical video encoders using multiple decision tree classifiers. The method is demonstrated for the fast implementation of the 'High Efficiency Video Coding' (HEVC) standard, chosen because of its high bit rate reduction capability but large complexity overhead. Optimal partitioning of each video frame into coding units (CUs) is the main source of complexity as a vast number of combinations are tested. The decision tree models were trained to identify when the CU testing process, a time-consuming Lagrangian optimisation, can be skipped, i.e., when there is a high probability that the CU can remain whole. A novel approach to finding the simplest and most effective decision tree model, called 'manual pruning', is described. Implementing the skip criteria reduced the average encoding time by 42.1% for a Bjøntegaard Delta rate detriment of 0.7%, for 17 standard test sequences in a range of resolutions and quantisation parameters.

Submitted 12 August, 2019; originally announced August 2019.

Showing 1–7 of 7 results for author: Dias, A