Search | arXiv e-print repository

Showing 1–50 of 122 results for author: Tanaka, K

Searching in archive cs.
  1. arXiv:2409.04952  [pdf, other]

    cs.CV

    Deep Bayesian Active Learning-to-Rank with Relative Annotation for Estimation of Ulcerative Colitis Severity

    Authors: Takeaki Kadota, Hideaki Hayashi, Ryoma Bise, Kiyohito Tanaka, Seiichi Uchida

    Abstract: Automatic image-based severity estimation is an important task in computer-aided diagnosis. Severity estimation by deep learning requires a large amount of training data to achieve a high performance. In general, severity estimation uses training data annotated with discrete (i.e., quantized) severity labels. Annotating discrete labels is often difficult in images with ambiguous severity, and the…

    Submitted 9 September, 2024; v1 submitted 7 September, 2024; originally announced September 2024.

    Comments: 14 pages, 8 figures, accepted in Medical Image Analysis 2024

    Journal ref: Medical Image Analysis 2024

  2. arXiv:2409.02245  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS stat.ML

    FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo

    Abstract: Diffusion-based voice conversion (VC) techniques such as VoiceGrad have attracted interest because of their high VC performance in terms of speech quality and speaker similarity. However, a notable limitation is the slow inference caused by the multi-step reverse diffusion. Therefore, we propose FastVoiceGrad, a novel one-step diffusion-based VC that reduces the number of iterations from dozens to…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted to Interspeech 2024. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/fastvoicegrad/

  3. arXiv:2408.11202  [pdf, other]

    stat.ML cs.AI cs.LG

    Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits

    Authors: Tatsuhiro Shimizu, Koichi Tanaka, Ren Kishimoto, Haruka Kiyohara, Masahiro Nomura, Yuta Saito

    Abstract: We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the action space. For example, it might choose a set of furniture pieces (a bed and a drawer) from available items (bed, drawer, chair, etc.) for interior design sales. This setting is widespread in fields such as recommender systems and healthcare, yet OPE/L of CCB r…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted at RecSys2024

  4. arXiv:2408.06874  [pdf, other]

    cs.CL

    Leveraging Language Models for Emotion and Behavior Analysis in Education

    Authors: Kaito Tanaka, Benjamin Tan, Brian Wong

    Abstract: The analysis of students' emotions and behaviors is crucial for enhancing learning outcomes and personalizing educational experiences. Traditional methods often rely on intrusive visual and physiological data collection, posing privacy concerns and scalability issues. This paper proposes a novel method leveraging large language models (LLMs) and prompt engineering to analyze textual data from stud…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 8 pages

  5. arXiv:2408.04293  [pdf, other]

    cs.CL cs.CY

    Are Social Sentiments Inherent in LLMs? An Empirical Study on Extraction of Inter-demographic Sentiments

    Authors: Kunitomo Tanaka, Ryohei Sasano, Koichi Takeda

    Abstract: Large language models (LLMs) are supposed to acquire unconscious human knowledge and feelings, such as social common sense and biases, by training models from large amounts of text. However, it is not clear how much the sentiments of specific social groups can be captured in various LLMs. In this study, we focus on social groups defined in terms of nationality, religion, and race/ethnicity, and va…

    Submitted 8 August, 2024; originally announced August 2024.

  6. arXiv:2406.16535  [pdf, other]

    cs.CL cs.AI cs.LG

    Token-based Decision Criteria Are Suboptimal in In-context Learning

    Authors: Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue

    Abstract: In-Context Learning (ICL) typically utilizes classification criteria from probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, despite delicate calibrations through translation and constrained rotation. To address this problem, we propose Hidden Calibration, which renounces token probabilities and u…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 21 pages, 14 figures, 8 tables

  7. arXiv:2406.11266  [pdf, ps, other]

    cs.CV

    DRIP: Discriminative Rotation-Invariant Pole Landmark Descriptor for 3D LiDAR Localization

    Authors: Dingrui Li, Dedi Guo, Kanji Tanaka

    Abstract: In 3D LiDAR-based robot self-localization, pole-like landmarks are gaining popularity as lightweight and discriminative landmarks. This work introduces a novel approach called "discriminative rotation-invariant poles," which enhances the discriminability of pole-like landmarks while maintaining their lightweight nature. Unlike conventional methods that model a pole landmark as a 3D line segment pe…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 4 pages, 1 table

  8. arXiv:2406.01468  [pdf, other]

    cs.CL cs.AI cs.LG

    Understanding Token Probability Encoding in Output Embeddings

    Authors: Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue

    Abstract: In this paper, we investigate the output token probability information in the output embedding of language models. We provide an approximate common log-linear encoding of output token probabilities within the output embedding vectors and demonstrate that it is accurate and sparse when the output space is large and output logits are concentrated. Based on such findings, we edit the encoding in outp…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 15 pages, 17 figures, 3 tables

  9. arXiv:2405.06185  [pdf, other]

    cs.CV

    Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection

    Authors: Koji Takeda, Kanji Tanaka, Yoshimasa Nakamura, Asako Kanezaki

    Abstract: In everyday indoor navigation, robots often need to detect non-distinctive small-change objects (e.g., stationery, lost items, and junk) to maintain domain knowledge. This is most relevant to ground-view change detection (GVCD), a recently emerging research area in the field of computer vision. However, these existing techniques rely on high-quality class-specific object priors to regularize a c…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 7 pages, 7 figures

  10. arXiv:2404.11727  [pdf]

    cs.CV

    Deep Learning for Video-Based Assessment of Endotracheal Intubation Skills

    Authors: Jean-Paul Ainam, Erim Yanik, Rahul Rahul, Taylor Kunkes, Lora Cavuoto, Brian Clemency, Kaori Tanaka, Matthew Hackett, Jack Norfleet, Suvranu De

    Abstract: Endotracheal intubation (ETI) is an emergency procedure performed in civilian and combat casualty care settings to establish an airway. Objective and automated assessment of ETI skills is essential for the training and certification of healthcare providers. However, the current approach is based on manual feedback by an expert, which is subjective, time- and resource-intensive, and is prone to poo…

    Submitted 17 April, 2024; originally announced April 2024.

  11. 1-out-of-n Oblivious Signatures: Security Revisited and a Generic Construction with an Efficient Communication Cost

    Authors: Masayuki Tezuka, Keisuke Tanaka

    Abstract: 1-out-of-n oblivious signature by Chen (ESORICS 1994) is a protocol between the user and the signer. In this scheme, the user makes a list of n messages and chooses the message that the user wants to obtain a signature from the list. The user interacts with the signer by providing this message list and obtains the signature for only the chosen message without letting the signer identify which messa…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: ICISC 2023

  12. arXiv:2403.16464  [pdf, other]

    cs.SD cs.LG eess.AS

    Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka

    Abstract: A generative adversarial network (GAN)-based vocoder trained with an adversarial discriminator is commonly used for speech synthesis because of its fast, lightweight, and high-quality characteristics. However, this data-driven model requires a large amount of training data incurring high data-collection costs. This fact motivates us to train a GAN-based vocoder on limited data. A promising solutio…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to ICASSP 2024. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/augcondd/

  13. ProgrammableGrass: A Shape-Changing Artificial Grass Display Adapted for Dynamic and Interactive Display Features

    Authors: Kojiro Tanaka, Akito Mizuno, Toranosuke Kato, Masahiko Mikawa, Makoto Fujisawa

    Abstract: There are various proposals for employing grass materials as a green landscape-friendly display. However, it is difficult for current techniques to display smooth animations using 8-bit images and to adjust display resolution, similar to conventional displays. We present ProgrammableGrass, an artificial grass display with scalable resolution, capable of swiftly controlling grass color at 8-bit lev…

    Submitted 18 March, 2024; originally announced March 2024.

  14. arXiv:2403.10552  [pdf, ps, other]

    cs.LG cs.AI cs.CV cs.RO

    Training Self-localization Models for Unseen Unfamiliar Places via Teacher-to-Student Data-Free Knowledge Transfer

    Authors: Kenta Tsukahara, Kanji Tanaka, Daiki Iwata

    Abstract: A typical assumption in state-of-the-art self-localization models is that an annotated training dataset is available in the target workspace. However, this does not always hold when a robot travels in a general open-world. This study introduces a novel training scheme for open-world distributed robot systems. In our scheme, a robot ("student") can ask the other robots it meets at unfamiliar places…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 7 pages, 3 figures, technical report

  15. arXiv:2402.15830  [pdf, other]

    cs.HC cs.ET cs.RO

    Swarm Body: Embodied Swarm Robots

    Authors: Sosuke Ichihashi, So Kuroki, Mai Nishimura, Kazumi Kasaura, Takefumi Hiraki, Kazutoshi Tanaka, Shigeo Yoshida

    Abstract: The human brain's plasticity allows for the integration of artificial body parts into the human body. Leveraging this, embodied systems realize intuitive interactions with the environment. We introduce a novel concept: embodied swarm robots. Swarm robots constitute a collective of robots working in harmony to achieve a common objective, in our case, serving as functional body parts. Embodied swarm…

    Submitted 29 February, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  16. arXiv:2402.06092  [pdf, other]

    cs.CV cs.RO

    CLIP-Loc: Multi-modal Landmark Association for Global Localization in Object-based Maps

    Authors: Shigemichi Matsuzaki, Takuma Sugino, Kazuhito Tanaka, Zijun Sha, Shintaro Nakaoka, Shintaro Yoshizawa, Kazuhiro Shintani

    Abstract: This paper describes a multi-modal data association method for global localization using object-based maps and camera images. In global localization, or relocalization, using object-based maps, existing methods typically resort to matching all possible combinations of detected objects and landmarks with the same object category, followed by inlier extraction using RANSAC or brute-force search. Thi…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 7 pages, 7 figures. Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2024

  17. arXiv:2401.10005  [pdf, other]

    cs.CV cs.CL

    Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation

    Authors: Kohei Uehara, Nabarun Goswami, Hanqin Wang, Toshiaki Baba, Kohtaro Tanaka, Tomohiro Hashimoto, Kai Wang, Rei Ito, Takagi Naoya, Ryo Umagami, Yingyi Wen, Tanachai Anakewat, Tatsuya Harada

    Abstract: The increasing demand for intelligent systems capable of interpreting and reasoning about visual content requires the development of large Vision-and-Language Models (VLMs) that are not only accurate but also have explicit reasoning capabilities. This paper presents a novel approach to develop a VLM with the ability to conduct explicit reasoning based on visual content and textual instructions. We…

    Submitted 17 July, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  18. arXiv:2401.09014  [pdf, other]

    cs.HC cs.MA

    Data assimilation approach for addressing imperfections in people flow measurement techniques using particle filter

    Authors: Ryo Murata, Kenji Tanaka

    Abstract: Understanding and predicting people flow in urban areas is useful for decision-making in urban planning and marketing strategies. Traditional methods for understanding people flow can be divided into measurement-based approaches and simulation-based approaches. Measurement-based approaches have the advantage of directly capturing actual people flow, but they face the challenge of data imperfection…

    Submitted 17 January, 2024; originally announced January 2024.

  19. arXiv:2401.08242  [pdf, other]

    cs.CG math.NA

    Polygonal Sequence-driven Triangulation Validator: An Incremental Approach to 2D Triangulation Verification

    Authors: Sora Sawai, Kazuaki Tanaka, Katsuhisa Ozaki, Shin'ichi Oishi

    Abstract: Two-dimensional Delaunay triangulation is a fundamental aspect of computational geometry. This paper presents a novel algorithm that is specifically designed to ensure the correctness of 2D Delaunay triangulation, namely the Polygonal Sequence-driven Triangulation Validator (PSTV). Our research highlights the paramount importance of proper triangulation and the often overlooked, yet profound, impa…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 27 pages, 18 figures

    MSC Class: 65D18; 68U05; 65N30; 65G50

  20. arXiv:2312.16852  [pdf, other]

    cs.LG cs.HC eess.SP

    Sensor Data Simulation for Anomaly Detection of the Elderly Living Alone

    Authors: Kai Tanaka, Mineichi Kudo, Keigo Kimura

    Abstract: With the increase of the number of elderly people living alone around the world, there is a growing demand for sensor-based detection of anomalous behaviors. Although smart homes with ambient sensors could be useful for detecting such anomalies, there is a problem of lack of sufficient real data for developing detection algorithms. For coping with this problem, several sensor data simulators have…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 26 pages, 10 figures

  21. arXiv:2312.15897  [pdf, ps, other]

    cs.RO cs.CV cs.LG

    Recursive Distillation for Open-Set Distributed Robot Localization

    Authors: Kenta Tsukahara, Kanji Tanaka

    Abstract: A typical assumption in state-of-the-art self-localization models is that an annotated training dataset is available for the target workspace. However, this is not necessarily true when a robot travels around the general open world. This work introduces a novel training scheme for open-world distributed robot systems. In our scheme, a robot ("student") can ask the other robots it meets at unfamil…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 5 pages, 4 figures, technical report

  22. arXiv:2311.00967  [pdf, other]

    cs.RO cs.AI cs.CL

    Vision-Language Interpreter for Robot Task Planning

    Authors: Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, Shinsuke Mori

    Abstract: Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file used by the planners to find a plan. By gener…

    Submitted 19 February, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: ICRA 2024

  23. arXiv:2310.15504  [pdf, ps, other]

    cs.CV cs.RO

    Cross-view Self-localization from Synthesized Scene-graphs

    Authors: Ryogo Yamamoto, Kanji Tanaka

    Abstract: Cross-view self-localization is a challenging scenario of visual place recognition in which database images are provided from sparse viewpoints. Recently, an approach for synthesizing database images from unseen viewpoints using NeRF (Neural Radiance Fields) technology has emerged with impressive performance. However, synthesized images provided by these techniques are often of lower quality than…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 5 pages, 5 figures, technical report

  24. arXiv:2310.08116  [pdf, other]

    cs.RO cs.CV

    Multimodal Active Measurement for Human Mesh Recovery in Close Proximity

    Authors: Takahiro Maeda, Keisuke Takeshita, Norimichi Ukita, Kazuhito Tanaka

    Abstract: For physical human-robot interactions (pHRI), a robot needs to estimate the accurate body pose of a target person. However, in these pHRI scenarios, the robot cannot fully observe the target person's body with equipped cameras because the target person must be close to the robot for physical interaction. This close distance leads to severe truncation and occlusions and thus results in poor accurac…

    Submitted 10 September, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted at Robotics and Automation Letters (RA-L)

  25. Gaze-Driven Sentence Simplification for Language Learners: Enhancing Comprehension and Readability

    Authors: Taichi Higasa, Keitaro Tanaka, Qi Feng, Shigeo Morishima

    Abstract: Language learners should regularly engage in reading challenging materials as part of their study routine. Nevertheless, constantly referring to dictionaries is time-consuming and distracting. This paper presents a novel gaze-driven sentence simplification system designed to enhance reading comprehension while maintaining their focus on the content. Our system incorporates machine learning models…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: Accepted by ACM ICMI 2023 workshops (Multimodal, Interactive Interfaces for Education)

  26. arXiv:2310.00242  [pdf, ps, other]

    cs.RO cs.CV

    Walking = Traversable? : Traversability Prediction via Multiple Human Object Tracking under Occlusion

    Authors: Jonathan Tay Yu Liang, Kanji Tanaka

    Abstract: The emerging "Floor plan from human trails (PfH)" technique has great potential for improving indoor robot navigation by predicting the traversability of occluded floors. This study presents an innovative approach that replaces first-person-view sensors with a third-person-view monocular camera mounted on the observer robot. This approach can gather measurements from multiple humans, expanding it…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: 6 figures, technical report

  27. arXiv:2308.07117  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki

    Abstract: The inverse short-time Fourier transform network (iSTFTNet) has garnered attention owing to its fast, lightweight, and high-fidelity speech synthesis. It obtains these characteristics using a fast and lightweight 1D CNN as the backbone and replacing some neural processes with iSTFT. Owing to the difficulty of a 1D CNN to model high-dimensional spectrograms, the frequency dimension is reduced via t…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted to Interspeech 2023. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet2/

  28. arXiv:2307.14105  [pdf, ps, other]

    cs.RO

    Active Robot Vision for Distant Object Change Detection: A Lightweight Training Simulator Inspired by Multi-Armed Bandits

    Authors: Kouki Terashima, Kanji Tanaka, Ryogo Yamamoto, Jonathan Tay Yu Liang

    Abstract: In ground-view object change detection, the recently emerging mapless navigation has great potential to navigate a robot to objects distantly detected (e.g., books, cups, clothes) and acquire high-resolution object images, to identify their change states (no-change/appear/disappear). However, naively performing full journeys for every distant object requires huge sense/plan/action costs, proportio…

    Submitted 24 October, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: 7 pages, 7 figures, technical report

  29. arXiv:2306.16086  [pdf, other]

    cs.CV

    Lifelong Change Detection: Continuous Domain Adaptation for Small Object Change Detection in Every Robot Navigation

    Authors: Koji Takeda, Kanji Tanaka, Yoshimasa Nakamura

    Abstract: The recently emerging research area in robotics, ground view change detection, suffers from its ill-posed-ness because of visual uncertainty combined with complex nonlinear perspective projection. To regularize the ill-posed-ness, the commonly applied supervised learning methods (e.g., CSCD-Net) rely on manually annotated high-quality object-class-specific priors. In this work, we consider general…

    Submitted 28 June, 2023; originally announced June 2023.

  30. arXiv:2306.10782  [pdf, ps, other]

    cs.CV cs.RO

    PartSLAM: Unsupervised Part-based Scene Modeling for Fast Succinct Map Matching

    Authors: Shogo Hanada, Kanji Tanaka

    Abstract: In this paper, we explore the challenging 1-to-N map matching problem, which exploits a compact description of map data, to improve the scalability of map matching techniques used by various robot vision tasks. We propose a first method explicitly aimed at fast succinct map matching, which consists only of map-matching subtasks. These tasks include offline map matching attempts to find a compact p…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: Offprint of IROS2013 paper

  31. arXiv:2306.06495  [pdf, other]

    eess.AS cs.SD

    Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction

    Authors: Tomoya Yoshinaga, Keitaro Tanaka, Shigeo Morishima

    Abstract: This paper describes an audio-visual speech enhancement (AV-SE) method that estimates from noisy input audio a mixture of the speech of the speaker appearing in an input video (on-screen target speech) and of a selected speaker not appearing in the video (off-screen target speech). Although conventional AV-SE methods have suppressed all off-screen sounds, it is necessary to listen to a specific pr…

    Submitted 10 June, 2023; originally announced June 2023.

    Comments: Accepted by EUSIPCO 2023

  32. Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning

    Authors: Sara Kashiwagi, Keitaro Tanaka, Qi Feng, Shigeo Morishima

    Abstract: This paper presents a novel metric learning approach to address the performance gap between normal and silent speech in visual speech recognition (VSR). The difference in lip movements between the two poses a challenge for existing VSR models, which exhibit degraded accuracy when applied to silent speech. To solve this issue and tackle the scarcity of training data for silent speech, we propose to…

    Submitted 16 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023

  33. arXiv:2305.06179  [pdf, ps, other]

    cs.CV cs.RO

    A Multi-modal Approach to Single-modal Visual Place Classification

    Authors: Tomoya Iwasaki, Kanji Tanaka, Kenta Tsukahara

    Abstract: Visual place classification from a first-person-view monocular RGB image is a fundamental problem in long-term robot navigation. A difficulty arises from the fact that RGB image classifiers are often vulnerable to spatial and appearance changes and degrade due to domain shifts, such as seasonal, weather, and lighting differences. To address this issue, multi-sensor fusion approaches combining RGB…

    Submitted 10 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 7 pages, 6 figures, 1 table

  34. arXiv:2305.06141  [pdf, ps, other]

    cs.CV cs.RO

    Active Semantic Localization with Graph Neural Embedding

    Authors: Mitsuki Yoshida, Kanji Tanaka, Ryogo Yamamoto, Daiki Iwata

    Abstract: Semantic localization, i.e., robot self-localization with semantic image modality, is critical in recently emerging embodied AI applications (e.g., point-goal navigation, object-goal navigation, vision language navigation) and topological mapping applications (e.g., graph neural SLAM, ego-centric topological map). However, most existing works on semantic localization focus on passive vision tasks…

    Submitted 26 December, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: ACPR2023 (extended version)

    Journal ref: Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14406. Springer, Cham

  35. arXiv:2304.07087  [pdf, other]

    cs.CV cs.LG

    Memory Efficient Diffusion Probabilistic Models via Patch-based Generation

    Authors: Shinei Arakawa, Hideki Tsunashima, Daichi Horita, Keitaro Tanaka, Shigeo Morishima

    Abstract: Diffusion probabilistic models have been successful in generating high-quality and diverse images. However, traditional models, whose input and output are high-resolution images, suffer from excessive memory requirements, making them less practical for edge devices. Previous approaches for generative adversarial networks proposed a patch-based method that uses positional encoding and global conten…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: Accepted to the Generative Models for Computer Vision workshop at CVPR 2023

  36. Pointcheval-Sanders Signature-Based Synchronized Aggregate Signature

    Authors: Masayuki Tezuka, Keisuke Tanaka

    Abstract: Synchronized aggregate signature is a special type of signature in which all signers share a synchronized time period and which allows aggregating signatures that are generated in the same period. This signature has a wide range of applications for systems that have a natural reporting period such as log and sensor data, or blockchain protocol. In CT-RSA 2016, Pointcheval and Sanders proposed the new random…

    Submitted 1 April, 2023; originally announced April 2023.

    Journal ref: ICISC 2022

  37. arXiv:2303.13909  [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis

    Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki

    Abstract: In speech synthesis, a generative adversarial network (GAN), training a generator (speech synthesizer) and a discriminator in a min-max game, is widely used to improve speech quality. An ensemble of discriminators is commonly used in recent neural vocoders (e.g., HiFi-GAN) and end-to-end text-to-speech (TTS) systems (e.g., VITS) to scrutinize waveforms from multiple perspectives. Such discriminato…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/waveunetd/

  38. arXiv:2302.12482  [pdf, other]

    eess.IV cs.CV

    Disease Severity Regression with Continuous Data Augmentation

    Authors: Shumpei Takezaki, Kiyohito Tanaka, Seiichi Uchida, Takeaki Kadota

    Abstract: Disease severity regression by a convolutional neural network (CNN) for medical images requires a sufficient number of image samples labeled with severity levels. Conditional generative adversarial network (cGAN)-based data augmentation (DA) is a possible solution, but it encounters two issues. The first issue is that existing cGANs cannot deal with real-valued severity levels as their conditions,…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: Accepted at ISBI2023

  39. arXiv:2301.08107  [pdf, other]

    cs.HC

    SHITARA: Sending Haptic Induced Touchable Alarm by Ring-shaped Air vortex

    Authors: Ryosei Kojima, Akihisa Shitara, Tatsuki Fushimi, Ryogo Niwa, Atushi Shinoda, Ryo Iijima, Kengo Tanaka, Sayan Sarcar, Yoichi Ochiai

    Abstract: Social interaction begins with the other person's attention, but it is difficult for a d/Deaf or hard-of-hearing (DHH) person to notice the initial conversation cues. Wearable or visual devices have been proposed previously. However, these devices are cumbersome to wear or must stay within the DHH person's vision. In this study, we have proposed SHITARA, a novel accessibility method with air vorte…

    Submitted 7 November, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: 30 pages, 22 figures

  40. arXiv:2211.06430  [pdf, other]

    q-bio.GN cs.LG

    Efficient HLA imputation from sequential SNPs data by Transformer

    Authors: Kaho Tanaka, Kosuke Kato, Naoki Nonaka, Jun Seita

    Abstract: Human leukocyte antigen (HLA) genes are associated with a variety of diseases; however, direct typing of HLA is time- and cost-consuming. Thus, various imputation methods using sequential SNPs data have been proposed based on statistical or deep learning models, e.g., a CNN-based model named DEEP*HLA. However, imputation efficiency is not sufficient for infrequent alleles and a large size of reference…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 6 pages

  41. arXiv:2210.13908  [pdf, other]

    cs.RO math.OC

    Quasistatic contact-rich manipulation via linear complementarity quadratic programming

    Authors: Sotaro Katayama, Tatsunori Taniai, Kazutoshi Tanaka

    Abstract: Contact-rich manipulation is challenging due to dynamically-changing physical constraints by the contact mode changes undergone during manipulation. This paper proposes a versatile local planning and control framework for contact-rich manipulation that determines the continuous control action under variable contact modes online. We model the physical characteristics of contact-rich manipulation by…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: 8 pages, 7 figures. This work has been accepted to be presented at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

  42. arXiv:2210.00282  [pdf, other]

    cs.CL cs.NE

    Construction and Evaluation of a Self-Attention Model for Semantic Understanding of Sentence-Final Particles

    Authors: Shuhei Mandokoro, Natsuki Oka, Akane Matsushima, Chie Fukada, Yuko Yoshimura, Koji Kawahara, Kazuaki Tanaka

    Abstract: Sentence-final particles serve an essential role in spoken Japanese because they express the speaker's mental attitudes toward a proposition and/or an interlocutor. They are acquired at early ages and occur very frequently in everyday conversation. However, there has been little proposal for a computational model of acquiring sentence-final particles. This paper proposes Subjective BERT, a self-at…

    Submitted 1 October, 2022; originally announced October 2022.

    Comments: 4 pages, 1 figure. Published in the Program and Abstract Booklet of Workshop on Constructive Approaches to Co-Creative Communication in Joint Conference on Language Evolution, 2022

    ACM Class: I.2.7

  43. arXiv:2209.02943  [pdf, ps, other]

    math-ph cs.ET math.PR quant-ph

    Skeleton structure inherent in discrete-time quantum walks

    Authors: Tomoki Yamagami, Etsuo Segawa, Ken'ichiro Tanaka, Takatomo Mihana, André Röhm, Ryoichi Horisaki, Makoto Naruse

    Abstract: In this paper, we claim that a common underlying structure--a skeleton structure--is present behind discrete-time quantum walks (QWs) on a one-dimensional lattice with a homogeneous coin matrix. This skeleton structure is independent of the initial state, and partially, even of the coin matrix. This structure is best interpreted in the context of quantum-walk-replicating random walks (QWRWs), i.e.…

    Submitted 2 February, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: 26 pages, 9 figures

    Journal ref: Physical Review A, Vol. 107, Issue 1, 012222 (2023)

  44. arXiv:2208.08863  [pdf, ps, other]

    cs.CV cs.RO

    Compressive Self-localization Using Relative Attribute Embedding

    Authors: Ryogo Yamamoto, Kanji Tanaka

    Abstract: The use of relative attribute (e.g., beautiful, safe, convenient) -based image embeddings in visual place recognition, as a domain-adaptive compact image descriptor that is orthogonal to the typical approach of absolute attribute (e.g., color, shape, texture) -based image embeddings, is explored in this paper.

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: 3 pages, 4 figures, An extended abstract version of a manuscript submitted to an international conference

  45. arXiv:2208.03020  [pdf, other]

    cs.CV

    Deep Bayesian Active-Learning-to-Rank for Endoscopic Image Data

    Authors: Takeaki Kadota, Hideaki Hayashi, Ryoma Bise, Kiyohito Tanaka, Seiichi Uchida

    Abstract: Automatic image-based disease severity estimation generally uses discrete (i.e., quantized) severity labels. Annotating discrete labels is often difficult due to the images with ambiguous severity. An easier alternative is to use relative annotation, which compares the severity level between image pairs. By using a learning-to-rank framework with relative annotation, we can train a neural network…

    Submitted 5 August, 2022; originally announced August 2022.

    Comments: 14 pages, 8 figures, accepted at MIUA 2022

  46. arXiv:2206.06533  [pdf, other]

    cs.CV cs.RO eess.IV

    3D scene reconstruction from monocular spherical video with motion parallax

    Authors: Kenji Tanaka

    Abstract: In this paper, we describe a method to capture nearly entirely spherical (360 degree) depth information using two adjacent frames from a single spherical video with motion parallax. After illustrating a spherical depth information retrieval using two spherical cameras, we demonstrate monocular spherical stereo by using stabilized first-person video footage. Experiments demonstrated that the depth…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: 13 pages, 18 figures

    ACM Class: I.4.1; I.4.5

  47. arXiv:2204.13991  [pdf]

    cs.NE cs.LG nlin.AO physics.optics

    Physical Deep Learning with Biologically Plausible Training Method

    Authors: Mitsumasa Nakajima, Katsuma Inoue, Kenji Tanaka, Yasuo Kuniyoshi, Toshikazu Hashimoto, Kohei Nakajima

    Abstract: The ever-growing demand for further advances in artificial intelligence motivated research on unconventional computation based on analog physical devices. While such computation devices mimic brain-inspired analog information processing, learning procedures still rely on methods optimized for digital processing such as backpropagation. Here, we present physical deep learning by extending a biolo…

    Submitted 1 April, 2022; originally announced April 2022.

  48. arXiv:2204.11789  [pdf, ps, other]

    quant-ph cs.MA cs.RO eess.SY stat.CO

    Travel time optimization on multi-AGV routing by reverse annealing

    Authors: Renichiro Haba, Masayuki Ohzeki, Kazuyuki Tanaka

    Abstract: Quantum annealing has been actively researched since D-Wave Systems produced the first commercial machine in 2011. Controlling a large fleet of automated guided vehicles is one of the real-world applications utilizing quantum annealing. In this study, we propose a formulation to control the traveling routes to minimize the travel time. We validate our formulation through simulation in a virtual pl…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: 11 pages, 5 figures, 1 table

  49. arXiv:2204.10497  [pdf, ps, other]

    cs.RO cs.AI

    Active Domain-Invariant Self-Localization Using Ego-Centric and World-Centric Maps

    Authors: Kanya Kurauchi, Kanji Tanaka, Ryogo Yamamoto, Mitsuki Yoshida

    Abstract: The training of a next-best-view (NBV) planner for visual place recognition (VPR) is a fundamentally important task in autonomous robot navigation, for which a typical approach is the use of visual experiences that are collected in the target domain as training data. However, the collection of a wide variety of visual experiences in everyday navigation is costly and prohibitive for real-time robot…

    Submitted 28 July, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: 13 pages, 4 figures, draft version of a manuscript submitted to CVMI2022

  50. arXiv:2203.16908  [pdf, other]

    cs.DS

    Suffix tree-based linear algorithms for multiple prefixes, single suffix counting and listing problems

    Authors: Laurentius Leonard, Ken Tanaka

    Abstract: Given two strings $T$ and $S$ and a set of strings $P$, for each string $p \in P$, consider the unique substrings of $T$ that have $p$ as their prefix and $S$ as their suffix. Two problems then come to mind; the first problem being the counting of such substrings, and the second problem being the problem of listing all such substrings. In this paper, we describe linear-time, linear-space suffix tr…

    Submitted 18 April, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: 16 pages, 5 figures