Search | arXiv e-print repository

Showing 1–7 of 7 results for author: Zimmermann, R

Searching in archive eess.
  1. arXiv:2303.13859  [pdf, other]

    cs.MM eess.IV

    XGC-VQA: A unified video quality assessment model for User, Professionally, and Occupationally-Generated Content

    Authors: Xinhui Huang, Chunyi Li, Abdelhak Bentaleb, Roger Zimmermann, Guangtao Zhai

    Abstract: With the rapid growth of Internet video data amounts and types, a unified Video Quality Assessment (VQA) is needed to inspire video communication with perceptual quality. To meet the real-time and universal requirements in providing such inspiration, this study proposes a VQA model from a classification of User Generated Content (UGC), Professionally Generated Content (PGC), and Occupationally Gen…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: 6 pages, 4 figures

  2. arXiv:2303.09818  [pdf, other]

    eess.IV cs.MM

    A real-time blind quality-of-experience assessment metric for HTTP adaptive streaming

    Authors: Chunyi Li, May Lim, Abdelhak Bentaleb, Roger Zimmermann

    Abstract: In today's Internet, HTTP Adaptive Streaming (HAS) is the mainstream standard for video streaming, which switches the bitrate of the video content based on an Adaptive BitRate (ABR) algorithm. An effective Quality of Experience (QoE) assessment metric can provide crucial feedback to an ABR algorithm. However, predicting such real-time QoE on the client side is challenging. The QoE prediction requi…

    Submitted 17 March, 2023; originally announced March 2023.

    Comments: 6 pages, 4 figures

  3. arXiv:2211.15979  [pdf, other]

    eess.SP cs.LG

    AirFormer: Predicting Nationwide Air Quality in China with Transformers

    Authors: Yuxuan Liang, Yutong Xia, Songyu Ke, Yiwei Wang, Qingsong Wen, Junbo Zhang, Yu Zheng, Roger Zimmermann

    Abstract: Air pollution is a crucial issue affecting human health and livelihoods, as well as one of the barriers to economic and social growth. Forecasting air quality has become an increasingly important endeavor with significant social impacts, especially in emerging countries like China. In this paper, we present a novel Transformer architecture termed AirFormer to collectively predict nationwide air qu…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: Published at AAAI-23

  4. arXiv:2010.16078  [pdf, other]

    cs.CV eess.IV

    LIFI: Towards Linguistically Informed Frame Interpolation

    Authors: Aradhya Neeraj Mathur, Devansh Batra, Yaman Kumar, Rajiv Ratn Shah, Roger Zimmermann

    Abstract: In this work, we explore a new problem of frame interpolation for speech videos. Such content today forms the major form of online communication. We try to solve this problem by using several deep learning video generation algorithms to generate the missing frames. We also provide examples where computer vision models despite showing high performance on conventional non-linguistic metrics fail to…

    Submitted 2 December, 2020; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: 9 pages, 7 tables, 4 figures

  5. arXiv:2008.02492  [pdf, other]

    cs.CV cs.LG eess.IV

    Zero-Shot Multi-View Indoor Localization via Graph Location Networks

    Authors: Meng-Jiun Chiou, Zhenguang Liu, Yifang Yin, Anan Liu, Roger Zimmermann

    Abstract: Indoor localization is a fundamental problem in location-based applications. Current approaches to this problem typically rely on Radio Frequency technology, which requires not only supporting infrastructures but human efforts to measure and calibrate the signal. Moreover, data collection for all locations is indispensable in existing methods, which in turn hinders their large-scale deployment. In…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: Accepted at ACM MM 2020. 10 pages, 7 figures. Code and datasets available at https://github.com/coldmanck/zero-shot-indoor-localization-release

    ACM Class: I.2.10

    Journal ref: Proceedings of the 28th ACM International Conference on Multimedia, 2020

  6. arXiv:2006.08599  [pdf, other]

    cs.CL cs.SD eess.AS

    "Notic My Speech" -- Blending Speech Patterns With Multimedia

    Authors: Dhruva Sahrawat, Yaman Kumar, Shashwat Aggarwal, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann

    Abstract: Speech as a natural signal is composed of three parts - visemes (visual part of speech), phonemes (spoken part of speech), and language (the imposed structure). However, video as a medium for the delivery of speech and a multimedia construct has mostly ignored the cognitive aspects of speech delivery. For example, video applications like transcoding and compression have till now ignored the fact h…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: Under Review

  7. arXiv:1907.01367  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Lipper: Synthesizing Thy Speech using Multi-View Lipreading

    Authors: Yaman Kumar, Rohit Jain, Khwaja Mohd. Salik, Rajiv Ratn Shah, Yifang Yin, Roger Zimmermann

    Abstract: Lipreading has a lot of potential applications such as in the domain of surveillance and video conferencing. Despite this, most of the work in building lipreading systems has been limited to classifying silent videos into classes representing text phrases. However, there are multiple problems associated with making lipreading a text-based classification task like its dependence on a particular lan…

    Submitted 28 June, 2019; originally announced July 2019.

    Comments: Accepted at AAAI 2019