Search | arXiv e-print repository

Showing 1–50 of 644 results for author: Khan, M

Searching in archive cs.
  1. arXiv:2407.08813  [pdf, other]

    eess.IV cs.AI cs.CV

    FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

    Authors: Yu Tian, Congcong Wen, Min Shi, Muhammad Muneeb Afzal, Hao Huang, Muhammad Osama Khan, Yan Luo, Yi Fang, Mengyu Wang

    Abstract: Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., di…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Codes available at https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain

  2. arXiv:2407.07054  [pdf, other]

    cs.CR cs.ET cs.LG

    A Differentially Private Blockchain-Based Approach for Vertical Federated Learning

    Authors: Linh Tran, Sanjay Chari, Md. Saikat Islam Khan, Aaron Zachariah, Stacy Patterson, Oshani Seneviratne

    Abstract: We present the Differentially Private Blockchain-Based Vertical Federated Learning (DP-BBVFL) algorithm that provides verifiability and privacy guarantees for decentralized applications. DP-BBVFL uses a smart contract to aggregate the feature representations, i.e., the embeddings, from clients transparently. We apply local differential privacy to provide privacy for embeddings stored on a blockchain…

    Submitted 9 July, 2024; originally announced July 2024.

  3. arXiv:2407.06889  [pdf, other]

    cs.RO cs.CV cs.SC

    A Neurosymbolic Approach to Adaptive Feature Extraction in SLAM

    Authors: Yasra Chandio, Momin A. Khan, Khotso Selialia, Luis Garcia, Joseph DeGol, Fatima M. Anwar

    Abstract: Autonomous robots, autonomous vehicles, and humans wearing mixed-reality headsets require accurate and reliable tracking services for safety-critical applications in dynamically changing real-world environments. However, the existing tracking approaches, such as Simultaneous Localization and Mapping (SLAM), do not adapt well to environmental changes and boundary conditions despite extensive manual…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  4. arXiv:2407.06345  [pdf, other]

    cs.HC cs.CE cs.CY cs.ET

    Multi-person eye tracking for real-world scene perception in social settings

    Authors: Shreshth Saxena, Areez Visram, Neil Lobo, Zahid Mirza, Mehak Rafi Khan, Biranugan Pirabaharan, Alexander Nguyen, Lauren K. Fink

    Abstract: Eye movements provide a window into human behaviour, attention, and interaction dynamics. Previous research suggests that eye movements are highly influenced by task, setting, and social others; however, most eye tracking research is conducted in single-person, in-lab settings and is yet to be validated in multi-person, naturalistic contexts. One such prevalent real-world context is the collective…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Please refer to the supplementary video illustrating the proposed approach in this paper here: https://tinyurl.com/multipersonET

    ACM Class: I.4.8; J.4; J.5; C.4; D.2.10

  5. arXiv:2407.04519  [pdf, other]

    cs.CV

    Success or Failure? Analyzing Segmentation Refinement with Few-Shot Segmentation

    Authors: Seonghyeon Moon, Haein Kong, Muhammad Haris Khan

    Abstract: The purpose of segmentation refinement is to enhance the initial coarse masks generated by segmentation algorithms. The refined masks are expected to capture the details and contours of the target objects. Research on segmentation refinement has developed as a response to the need for high-quality initial masks. However, to our knowledge, no method has been developed that can determine the success…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 4 pages

  6. arXiv:2407.04069  [pdf, other]

    cs.CL cs.AI cs.LG

    A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

    Authors: Md Tahmid Rahman Laskar, Sawsan Alqahtani, M Saiful Bari, Mizanur Rahman, Mohammad Abdullah Matin Khan, Haidar Khan, Israt Jahan, Amran Bhuiyan, Chee Wei Tan, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty, Jimmy Huang

    Abstract: Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the comple…

    Submitted 4 July, 2024; originally announced July 2024.

  7. arXiv:2407.02871  [pdf, other]

    eess.IV cs.CV

    LMBF-Net: A Lightweight Multipath Bidirectional Focal Attention Network for Multifeatures Segmentation

    Authors: Tariq M Khan, Shahzaib Iqbal, Syed S. Naqvi, Imran Razzak, Erik Meijering

    Abstract: Retinal diseases can cause irreversible vision loss in both eyes if not diagnosed and treated early. Since retinal diseases are so complicated, retinal imaging is likely to show two or more abnormalities. Current deep learning techniques for segmenting retinal images with many labels and attributes have poor detection accuracy and generalisability. This paper presents a multipath convolutional neu…

    Submitted 3 July, 2024; originally announced July 2024.

  8. arXiv:2407.02598  [pdf, other]

    cs.CV cs.AI

    AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

    Authors: Mustafa Khan, Hamidreza Fazlali, Dhruv Sharma, Tongtong Cao, Dongfeng Bai, Yuan Ren, Bingbing Liu

    Abstract: Realistic scene reconstruction and view synthesis are essential for advancing autonomous driving systems by simulating safety-critical scenarios. 3D Gaussian Splatting excels in real-time rendering and static scene reconstructions but struggles with modeling driving scenarios due to complex backgrounds, dynamic objects, and sparse views. We propose AutoSplat, a framework employing Gaussian splatti…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  9. arXiv:2407.01440  [pdf, other]

    cs.LG

    GAT-Steiner: Rectilinear Steiner Minimal Tree Prediction Using GNNs

    Authors: Bugra Onal, Eren Dogan, Muhammad Hadir Khan, Matthew R. Guthaus

    Abstract: The Rectilinear Steiner Minimum Tree (RSMT) problem is a fundamental problem in VLSI placement and routing and is known to be NP-hard. Traditional RSMT algorithms spend a significant amount of time on finding Steiner points to reduce the total wire length or use heuristics to approximate producing sub-optimal results. We show that Graph Neural Networks (GNNs) can be used to predict optimal Steiner…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Preprint for The 2024 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2024)

  10. arXiv:2406.18325  [pdf, ps, other]

    cs.IT

    Linear codes with few weights over $\mathbb{F}_{p}+u\mathbb{F}_{p}$

    Authors: Pavan Kumar, Noor Mohammad Khan

    Abstract: For any positive integer $m$ and an odd prime $p$, let $\mathbb{F}_{q}+u\mathbb{F}_{q}$, where $q=p^{m}$, be a ring extension of the ring $\mathbb{F}_{p}+u\mathbb{F}_{p}.$ In this paper, we construct linear codes over $\mathbb{F}_{p}+u\mathbb{F}_{p}$ by using trace function defined on $\mathbb{F}_{q}+u\mathbb{F}_{q}$ and determine their Hamming weight distributions by employing symplectic-weight…

    Submitted 26 June, 2024; originally announced June 2024.

    MSC Class: 94B05; 11T71

  11. arXiv:2406.18307  [pdf, ps, other]

    cs.IT

    Five-Lee-weight linear codes over $\mathbb{F}_{q}+u\mathbb{F}_{q}$

    Authors: Pavan Kumar, Noor Mohammad Khan

    Abstract: In this study, linear codes having their Lee-weight distributions over the semi-local ring $\mathbb{F}_{q}+u\mathbb{F}_{q}$ with $u^{2}=1$ are constructed using the defining set and Gauss sums for an odd prime $q$. Moreover, we derive complete Hamming-weight enumerators for the images of the constructed linear codes under the Gray map. We finally show an application to secret sharing schemes.

    Submitted 5 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    MSC Class: 94B05; 11T71

  12. arXiv:2406.17746  [pdf, other]

    cs.CL cs.AI

    Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

    Authors: USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, Jyothir S V, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke, Katherine Lee, Naomi Saphra

    Abstract: Memorization in language models is typically treated as a homogenous phenomenon, neglecting the specifics of the memorized data. We instead model memorization as the effect of a set of complex factors that describe each sample and relate it to the model and corpus. To build intuition around these factors, we break memorization down into a taxonomy: recitation of highly duplicated sequences, recons…

    Submitted 25 June, 2024; originally announced June 2024.

  13. arXiv:2406.17190  [pdf, other]

    cs.SD cs.LG eess.AS

    Sound Tagging in Infant-centric Home Soundscapes

    Authors: Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam

    Abstract: Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted in IEEE/ACM CHASE 2024

  14. arXiv:2406.15831  [pdf, other]

    cs.CV

    Shape2.5D: A Dataset of Texture-less Surfaces for Depth and Normals Estimation

    Authors: Muhammad Saif Ullah Khan, Muhammad Zeshan Afzal, Didier Stricker

    Abstract: Reconstructing texture-less surfaces poses unique challenges in computer vision, primarily due to the lack of specialized datasets that cater to the nuanced needs of depth and normals estimation in the absence of textural information. We introduce "Shape2.5D," a novel, large-scale dataset designed to address this gap. Comprising 364k frames spanning 2635 3D models and 48 unique objects, our datase…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: This dataset paper was originally written in 2022

  15. Modeling & Evaluating the Performance of Convolutional Neural Networks for Classifying Steel Surface Defects

    Authors: Nadeem Jabbar Chaudhry, M. Bilal Khan, M. Javaid Iqbal, Siddiqui Muhammad Yasir

    Abstract: Recently, outstanding identification rates in image classification tasks were achieved by convolutional neural networks (CNNs). To use such skills, selective CNNs were trained on a dataset of well-known images of metal surface defects captured with an RGB camera. Defects must be detected early to take timely corrective action due to production concerns. For image classification up till now, a model-bas…

    Submitted 19 June, 2024; originally announced June 2024.

  16. arXiv:2406.14498  [pdf, other]

    cs.CL

    LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

    Authors: Sheikh Asif Imran, Mohammad Nur Hossain Khan, Subrata Biswas, Bashima Islam

    Abstract: Integrating inertial measurement units (IMUs) with large language models (LLMs) advances multimodal AI by enhancing human activity understanding. We introduce SensorCaps, a dataset of 26,288 IMU-derived activity narrations, and OpenSQA, an instruction-following dataset with 257,562 question-answer pairs. Combining LIMU-BERT and Llama, we develop LLaSA, a Large Multimodal Agent capable of interpret…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review at ARR (for EMNLP 2024)

  17. arXiv:2406.14370  [pdf, other]

    cs.CV

    Enhanced Bank Check Security: Introducing a Novel Dataset and Transformer-Based Approach for Detection and Verification

    Authors: Muhammad Saif Ullah Khan, Tahira Shehzadi, Rabeya Noor, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: Automated signature verification on bank checks is critical for fraud prevention and ensuring transaction authenticity. This task is challenging due to the coexistence of signatures with other textual and graphical elements on real-world documents. Verification systems must first detect the signature and then validate its authenticity, a dual challenge often overlooked by current datasets and meth…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted for publication in 16th IAPR International Workshop on Document Analysis Systems 2024

  18. arXiv:2406.13439  [pdf, other]

    cs.CL

    Finding Blind Spots in Evaluator LLMs with Interpretable Checklists

    Authors: Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Sshubam Verma, Mitesh M. Khapra

    Abstract: Large Language Models (LLMs) are increasingly relied upon to evaluate text outputs of other LLMs, thereby influencing leaderboards and development decisions. However, concerns persist over the accuracy of these assessments and the potential for misleading conclusions. In this work, we investigate the effectiveness of LLMs as evaluators for text generation tasks. We propose FBI, a novel framework d…

    Submitted 19 June, 2024; originally announced June 2024.

  19. arXiv:2406.13302  [pdf, other]

    cs.CV

    Situational Instructions Database: Task Guidance in Dynamic Environments

    Authors: Muhammad Saif Ullah Khan, Sankalp Sinha, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: The Situational Instructions Database (SID) addresses the need for enhanced situational awareness in artificial intelligence (AI) systems operating in dynamic environments. By integrating detailed scene graphs with dynamically generated, task-specific instructions, SID provides a novel dataset that allows AI systems to perform complex, real-world tasks with improved context sensitivity and operati…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures

  20. arXiv:2406.08775  [pdf, other]

    cs.CV

    ALINA: Advanced Line Identification and Notation Algorithm

    Authors: Mohammed Abdul Hafeez Khan, Parth Ganeriwala, Siddhartha Bhattacharyya, Natasha Neogi, Raja Muthalagu

    Abstract: Labels are the cornerstone of supervised machine learning algorithms. Most visual recognition methods are fully supervised, using bounding boxes or pixel-wise segmentations for object localization. Traditional labeling methods, such as crowd-sourcing, are prohibitive due to cost, data privacy, amount of time, and potential errors on large datasets. To address these issues, we propose a novel annot…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Paper has been accepted to The 3rd CVPR Workshop on Vision Datasets Understanding, 2024

  21. arXiv:2406.08344  [pdf, other]

    cs.CV

    Blind Image Deblurring using FFT-ReLU with Deep Learning Pipeline Integration

    Authors: Abdul Mohaimen Al Radi, Prothito Shovon Majumder, Syed Mumtahin Mahmud, Mahdi Mohd Hossain Noki, Md. Haider Ali, Md. Mosaddek Khan

    Abstract: Blind image deblurring is the process of deriving a sharp image and a blur kernel from a blurred image. Blurry images are typically modeled as the convolution of a sharp image with a blur kernel, necessitating the estimation of the unknown blur kernel to perform blind image deblurring effectively. Existing approaches primarily focus on domain-specific features of images, such as salient edges, dar…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 20 pages, 13 figures

  22. arXiv:2406.06533  [pdf, other]

    cs.AR cs.AI

    Pragmatic Formal Verification Methodology for Clock Domain Crossing (CDC)

    Authors: Aman Kumar, Muhammad Ul Haque Khan, Bijitendra Mittra

    Abstract: Modern System-on-Chip (SoC) designs are becoming more and more complex due to the technology upscaling. SoC designs often operate on multiple asynchronous clock domains, further adding to the complexity of the overall design. To make the devices power efficient, designers take a Globally-Asynchronous Locally-Synchronous (GALS) approach that creates multiple asynchronous domains. These Clock Domain…

    Submitted 20 April, 2024; originally announced June 2024.

    Comments: Published in DVCon Europe 2023

  23. arXiv:2406.06500  [pdf, ps, other]

    cs.AI cs.LG

    Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation

    Authors: Mohidul Haque Mridul, Mohammad Foysal Khan, Redwan Ahmed Rizvee, Md Mosaddek Khan

    Abstract: In Multi-agent Reinforcement Learning (MARL), accurately perceiving opponents' strategies is essential for both cooperative and adversarial contexts, particularly within dynamic environments. While Proximal Policy Optimization (PPO) and related algorithms such as Actor-Critic with Experience Replay (ACER), Trust Region Policy Optimization (TRPO), and Deep Deterministic Policy Gradient (DDPG) perfo…

    Submitted 10 June, 2024; originally announced June 2024.

  24. arXiv:2405.20363  [pdf, other]

    cs.CV

    LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild

    Authors: Zhiqiang Wang, Dejia Xu, Rana Muhammad Shahroz Khan, Yanbin Lin, Zhiwen Fan, Xingquan Zhu

    Abstract: Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal language models, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images f…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 7 pages, 3 figures, 5 tables, CVPR 2024 Workshop on Computer Vision in the Wild

  25. arXiv:2405.20084  [pdf, other]

    cs.CV

    Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach

    Authors: Muhammad Saif Ullah Khan, Dhavalkumar Limbachiya, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: Human pose estimation is a key task in computer vision with various applications such as activity recognition and interactive systems. However, the lack of consistency in the annotated skeletons across different datasets poses challenges in developing universally applicable models. To address this challenge, we propose a novel approach integrating multi-teacher knowledge distillation with a unifie…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 15 pages (with references)

  26. arXiv:2405.17520  [pdf, other]

    eess.IV cs.CV

    Advancing Medical Image Segmentation with Mini-Net: A Lightweight Solution Tailored for Efficient Segmentation of Medical Images

    Authors: Syed Javed, Tariq M. Khan, Abdul Qayyum, Arcot Sowmya, Imran Razzak

    Abstract: Accurate segmentation of anatomical structures and abnormalities in medical images is crucial for computer-aided diagnosis and analysis. While deep learning techniques excel at this task, their computational demands pose challenges. Additionally, some cutting-edge segmentation methods, though effective for general object segmentation, may not be optimised for medical images. To address these issue…

    Submitted 12 July, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  27. arXiv:2405.15563  [pdf]

    cs.CV

    Heterogeneous virus classification using a functional deep learning model based on transmission electron microscopy images (Preprint)

    Authors: Niloy Sikder, Md. Al-Masrur Khan, Anupam Kumar Bairagi, Mehedi Masud, Jun Jiat Tiang, Abdullah-Al Nahid

    Abstract: Viruses are submicroscopic agents that can infect all kinds of lifeforms and use their hosts' living cells to replicate themselves. Despite having some of the simplest genetic structures among all living beings, viruses are highly adaptable, resilient, and given the right conditions, are capable of causing unforeseen complications in their hosts' bodies. Due to their multiple transmission pathways…

    Submitted 24 May, 2024; originally announced May 2024.

  28. arXiv:2405.14497  [pdf, other]

    cs.CV

    Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

    Authors: Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir, M. Saquib Sarfraz, Mohsen Ali

    Abstract: In this work, we tackle the problem of domain generalization for object detection, specifically focusing on the scenario where only a single source domain is available. We propose an effective approach that involves two key steps: diversifying the source domain and aligning detections based on class prediction confidence and localization. Firstly, we demonstrate that by carefully selecting a set o…

    Submitted 23 May, 2024; originally announced May 2024.

  29. arXiv:2405.13518  [pdf, other]

    cs.CV

    PerSense: Personalized Instance Segmentation in Dense Images

    Authors: Muhammad Ibraheem Siddiqui, Muhammad Umer Sheikh, Hassan Abid, Muhammad Haris Khan

    Abstract: Leveraging large-scale pre-training, vision foundational models showcase notable performance benefits. While recent years have witnessed significant advancements in segmentation algorithms, existing models still face challenges to automatically segment personalized instances in dense and crowded scenarios. The primary factor behind this limitation stems from bounding box-based detections, which ar…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Technical report of PerSense

  30. arXiv:2405.12427  [pdf, other]

    cs.LG

    Deep learning approaches to indoor wireless channel estimation for low-power communication

    Authors: Samrah Arif, Muhammad Arif Khan, Sabih Ur Rehman

    Abstract: In the rapidly growing development of the Internet of Things (IoT) infrastructure, achieving reliable wireless communication is a challenge. IoT devices operate in diverse environments with common signal interference and fluctuating channel conditions. Accurate channel estimation helps adapt the transmission strategies to current conditions, ensuring reliable communication. Traditional methods, su…

    Submitted 20 May, 2024; originally announced May 2024.

  31. arXiv:2405.08252  [pdf, other]

    cs.LG cs.AI

    Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning

    Authors: Muhammad Junaid Khan, Syed Hammad Ahmed, Gita Sukthankar

    Abstract: We present a novel method aimed at enhancing the sample efficiency of ensemble Q learning. Our proposed approach integrates multi-head self-attention into the ensembled Q networks while bootstrapping the state-action pairs ingested by the ensemble. This not only results in performance improvements over the original REDQ (Chen et al. 2021) and its variant DroQ (Hiraoka et al. 2022), thereby enhanc…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: FLAIRS-37 (2024)

  32. arXiv:2405.06835  [pdf, other]

    cs.LG cs.AI cs.SE

    Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs

    Authors: Harsh Patel, Buvaneswari A. Ramanan, Manzoor A. Khan, Thomas Williams, Brian Friedman, Lawrence Drabeck

    Abstract: This paper explores the possibilities of the current generation of Large Language Models for incorporating Machine Learning Operations (MLOps) functionalities into ML training code bases. We evaluate the performance of OpenAI (gpt-3.5-turbo) and WizardCoder (open-source, 15B parameters) models on the automated accomplishment of various MLOps functionalities in different settings. We perform a benc…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: The work was completed during 2Q, 3Q of Year 2023, when WizardCoder was the top performing Open source LLM for coding. Newer and better models have emerged since then. The processes and methodologies utilized for this benchmarking can still be utilized for evaluating the current SoTA models

  33. arXiv:2405.06128  [pdf, other]

    cs.CV

    Enhanced Multimodal Content Moderation of Children's Videos using Audiovisual Fusion

    Authors: Syed Hammad Ahmed, Muhammad Junaid Khan, Gita Sukthankar

    Abstract: Due to the rise in video content creation targeted towards children, there is a need for robust content moderation schemes for video hosting platforms. A video that is visually benign may include audio content that is inappropriate for young children while being impossible to detect with a unimodal content moderation system. Popular video hosting platforms for children such as YouTube Kids still p…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 8 pages, 3 figures, Accepted at The 37th International FLAIRS Conference

  34. arXiv:2405.04735  [pdf, other]

    cs.CR cs.DS cs.IR

    Cryptanalysis of the SIMON Cypher Using Neo4j

    Authors: Jonathan Cook, Sabih ur Rehman, M. Arif Khan

    Abstract: The exponential growth in the number of Internet of Things (IoT) devices has seen the introduction of several Lightweight Encryption Algorithms (LEA). While LEAs are designed to enhance the integrity, privacy and security of data collected and transmitted by IoT devices, it is hazardous to assume that all LEAs are secure and exhibit similar levels of protection. To improve encryption strength, cry…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 10 pages, 10 figures, 2 algorithms, accepted by the 4th International Conference on Electrical, Computer and Energy Technologies (ICECET) to be presented in July 2024

  35. arXiv:2405.03660  [pdf, other]

    cs.CV

    CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification

    Authors: Sankalp Sinha, Muhammad Saif Ullah Khan, Talha Uddin Sheikh, Didier Stricker, Muhammad Zeshan Afzal

    Abstract: Zero-shot learning has been extensively investigated in the broader field of visual recognition, attracting significant interest recently. However, the current work on zero-shot learning in document image classification remains scarce. The existing studies either focus exclusively on zero-shot inference, or their evaluation does not align with the established criteria of zero-shot evaluation in th…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 18 Pages, 4 Figures and Accepted in ICDAR 2024

  36. arXiv:2405.01310  [pdf, other]

    cs.IR cs.CL

    Overcoming LLM Challenges using RAG-Driven Precision in Coffee Leaf Disease Remediation

    Authors: Dr. Selva Kumar S, Afifah Khan Mohammed Ajmal Khan, Imadh Ajaz Banday, Manikantha Gada, Vibha Venkatesh Shanbhag

    Abstract: This research introduces an innovative AI-driven precision agriculture system, leveraging YOLOv8 for disease identification and Retrieval Augmented Generation (RAG) for context-aware diagnosis. Focused on addressing the challenges of diseases affecting the coffee production sector in Karnataka, the system integrates sophisticated object detection techniques with language models to address the inhe…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 6 pages, 3 figures

  37. arXiv:2405.00025  [pdf, other]

    cs.CV cs.LG

    Leveraging Pre-trained CNNs for Efficient Feature Extraction in Rice Leaf Disease Classification

    Authors: Md. Shohanur Islam Sobuj, Md. Imran Hossen, Md. Foysal Mahmud, Mahbub Ul Islam Khan

    Abstract: Rice disease classification is a critical task in agricultural research, and in this study, we rigorously evaluate the impact of integrating feature extraction methodologies within pre-trained convolutional neural networks (CNNs). Initial investigations into baseline models, devoid of feature extraction, revealed commendable performance with ResNet-50 and ResNet-101 achieving accuracies of 91% and…

    Submitted 26 February, 2024; originally announced May 2024.

  38. arXiv:2404.15337  [pdf, other]

    eess.SP cs.LG cs.NI

    RSSI Estimation for Constrained Indoor Wireless Networks using ANN

    Authors: Samrah Arif, M. Arif Khan, Sabih Ur Rehman

    Abstract: In the expanding field of the Internet of Things (IoT), wireless channel estimation is a significant challenge. This is specifically true for low-power IoT (LP-IoT) communication, where efficiency and accuracy are extremely important. This research establishes two distinct LP-IoT wireless channel estimation models using Artificial Neural Networks (ANN): a Feature-based ANN model and a Sequence-bas…

    Submitted 9 April, 2024; originally announced April 2024.

  39. arXiv:2404.14955  [pdf, other]

    cs.CV

    A Comprehensive Survey for Hyperspectral Image Classification: The Evolution from Conventional to Transformers

    Authors: Muhammad Ahmad, Salvatore Distifano, Adil Mehmood Khan, Manuel Mazzara, Chenyu Li, Jing Yao, Hao Li, Jagannath Aryal, Gemine Vivone, Danfeng Hong

    Abstract: Hyperspectral Image Classification (HSC) is a challenging task due to the high dimensionality and complex nature of Hyperspectral (HS) data. Traditional Machine Learning approaches while effective, face challenges in real-world data due to varying optimal feature sets, subjectivity in human-driven design, biases, and limitations. Traditional approaches encounter the curse of dimensionality, strugg…

    Submitted 12 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  40. arXiv:2404.12957  [pdf, other]

    cs.CL cs.LG

    Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction

    Authors: Qinyuan Wu, Mohammad Aflah Khan, Soumi Das, Vedant Nanda, Bishwamittra Ghosh, Camila Kolling, Till Speicher, Laurent Bindschaedler, Krishna P. Gummadi, Evimaria Terzi

    Abstract: We propose an approach for estimating the latent knowledge embedded inside large language models (LLMs). We leverage the in-context learning (ICL) abilities of LLMs to estimate the extent to which an LLM knows the facts stored in a knowledge base. Our knowledge estimator avoids reliability concerns with previous prompting-based methods, is both conceptually simpler and easier to apply, and we demo…

    Submitted 19 April, 2024; originally announced April 2024.

  41. arXiv:2404.09342  [pdf, other]

    cs.CV cs.SD eess.AS

    Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

    Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

    Abstract: The advancements of technology have led to the use of multimodal systems in various real-world applications. Among them, the audio-visual systems are one of the widely used multimodal systems. In the recent years, associating face and voice of a person has gained attention due to presence of unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2…

    Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: ACM Multimedia Conference - Grand Challenge

  42. arXiv:2404.08168  [pdf, other]

    cs.LG stat.ML

    Conformal Prediction via Regression-as-Classification

    Authors: Etash Guha, Shlok Natarajan, Thomas Möllenhoff, Mohammad Emtiyaz Khan, Eugene Ndiaye

    Abstract: Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of the issues can be addressed by estimating a distribution over the output, but in reality, such approaches can be sensitive to estimation error and yield unstable intervals. Here, we circumvent the challenges by converting regression to a classifica…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: International Conference on Learning Representations 2024

    Journal ref: International Conference on Learning Representations 2024

  43. arXiv:2404.08165  [pdf, other]

    cs.CR

    Lightweight Cryptanalysis of IoT Encryption Algorithms : Is Quota Sampling the Answer?

    Authors: Jonathan Cook, Sabih ur Rehman, M. Arif Khan

    Abstract: Rapid growth in the number of small sensor devices known as the Internet of Things (IoT) has seen the development of lightweight encryption algorithms. Two well-known lightweight algorithms are SIMON and SIMECK which have been specifically designed for use on resource-constrained IoT devices. These lightweight encryption algorithms are based on the efficient Feistel block structure which is known…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 24 pages, 21 figures, 7 tables

  44. arXiv:2404.08024  [pdf, other]

    cs.LG

    The OxMat dataset: a multimodal resource for the development of AI-driven technologies in maternal and newborn child health

    Authors: M. Jaleed Khan, Ioana Duta, Beth Albert, William Cooke, Manu Vatish, Gabriel Davis Jones

    Abstract: The rapid advancement of Artificial Intelligence (AI) in healthcare presents a unique opportunity for advancements in obstetric care, particularly through the analysis of cardiotocography (CTG) for fetal monitoring. However, the effectiveness of such technologies depends upon the availability of large, high-quality datasets that are suitable for machine learning. This paper introduces the Oxford M…

    Submitted 11 April, 2024; originally announced April 2024.

  45. arXiv:2404.05508  [pdf, other]

    cs.SE cs.AI cs.CL

    Synergy of Large Language Model and Model Driven Engineering for Automated Development of Centralized Vehicular Systems

    Authors: Nenad Petrovic, Fengjunjie Pan, Krzysztof Lebioda, Vahid Zolfaghari, Sven Kirchner, Nils Purschke, Muhammad Aqib Khan, Viktor Vorobev, Alois Knoll

    Abstract: We present a prototype of a tool leveraging the synergy of model driven engineering (MDE) and Large Language Models (LLM) for the purpose of software development process automation in the automotive industry. In this approach, the user-provided input is free form textual requirements, which are first translated to Ecore model instance representation using an LLM, which is afterwards checked for co…

    Submitted 8 April, 2024; originally announced April 2024.

    Report number: TUM-I24109

    ACM Class: D.2.1; D.2.2; D.2.4; I.2.7; I.2.2; I.7.0

  46. arXiv:2404.01878  [pdf, other]

    cs.CV cs.AI

    Real, fake and synthetic faces -- does the coin have three sides?

    Authors: Shahzeb Naeem, Ramzi Al-Sharawi, Muhammad Riyyan Khan, Usman Tariq, Abhinav Dhall, Hasan Al-Nashash

    Abstract: With the ever-growing power of generative artificial intelligence, deepfake and artificially generated (synthetic) media have continued to spread online, which creates various ethical and moral concerns regarding their usage. To tackle this, we thus present a novel exploration of the trends and patterns observed in real, deepfake and synthetic facial images. The proposed analysis is done in two pa…

    Submitted 2 April, 2024; originally announced April 2024.

  47. arXiv:2404.01438  [pdf]

    cs.CV cs.AI

    Generation and Detection of Sign Language Deepfakes -- A Linguistic and Visual Analysis

    Authors: Shahzeb Naeem, Muhammad Riyyan Khan, Usman Tariq, Abhinav Dhall, Carlos Ivan Colon, Hasan Al-Nashash

    Abstract: A question in the realm of deepfakes is slowly emerging pertaining to whether we can go beyond facial deepfakes and whether it would be beneficial to society. Therefore, this research presents a positive application of deepfake technology in upper body generation, while performing sign-language for the Deaf and Hard of Hearing (DHoH) community. The resulting videos are later vetted with a sign lan…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 13 pages, 13 figures, Computer Vision and Image Understanding Journal

  48. arXiv:2404.00068  [pdf, other]

    cs.CR cs.LG

    A Data-Driven Predictive Analysis on Cyber Security Threats with Key Risk Factors

    Authors: Fatama Tuz Johora, Md Shahedul Islam Khan, Esrath Kanon, Mohammad Abu Tareq Rony, Md Zubair, Iqbal H. Sarker

    Abstract: Cyber risk refers to the risk of defacing reputation, monetary losses, or disruption of an organization or individuals, and this situation usually occurs by the unconscious use of cyber systems. The cyber risk is unhurriedly increasing day by day and it is right now a global threat. Developing countries like Bangladesh face major cyber risk challenges. The growing cyber threat worldwide focuses on…

    Submitted 28 March, 2024; originally announced April 2024.

    Comments: The paper contains 15 pages, 7 tables and 6 figures

  49. arXiv:2403.19949  [pdf, other]

    cs.CV

    FairCLIP: Harnessing Fairness in Vision-Language Learning

    Authors: Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang

    Abstract: Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair…

    Submitted 5 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  50. arXiv:2403.16194  [pdf, other]

    cs.CV

    Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery

    Authors: Siddharth Tourani, Ahmed Alwheibi, Arif Mahmood, Muhammad Haris Khan

    Abstract: Unsupervised landmarks discovery (ULD) for an object category is a challenging computer vision problem. In pursuit of developing a robust ULD framework, we explore the potential of a recent paradigm of self-supervised learning algorithms, known as diffusion models. Some recent works have shown that these models implicitly contain important correspondence cues. Towards harnessing the potential of d…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted in CVPR 2024