-
Generalists vs. Specialists: Evaluating Large Language Models for Urdu
Authors:
Samee Arif,
Abdul Hameed Azeemi,
Agha Ali Raza,
Awais Athar
Abstract:
In this paper, we compare general-purpose pretrained models, GPT-4-Turbo and Llama-3-8b-Instruct, with special-purpose models fine-tuned on specific tasks: XLM-Roberta-large, mT5-large, and Llama-3-8b-Instruct. We focus on seven classification and six generation tasks to evaluate the performance of these models on the Urdu language. Urdu has 70 million native speakers, yet it remains underrepresented in Natural Language Processing (NLP). Despite the frequent advancements in Large Language Models (LLMs), their performance in low-resource languages, including Urdu, still needs to be explored. We also conduct a human evaluation for the generation tasks and compare the results with the evaluations performed by GPT-4-Turbo and Llama-3-8b-Instruct. We find that special-purpose models consistently outperform general-purpose models across various tasks. We also find that the evaluation done by GPT-4-Turbo for generation tasks aligns more closely with human evaluation than the evaluation by Llama-3-8b-Instruct does. This paper contributes to the NLP community by providing insights into the effectiveness of general-purpose and special-purpose LLMs for low-resource languages.
Submitted 5 July, 2024;
originally announced July 2024.
-
Breast Cancer Diagnosis: A Comprehensive Exploration of Explainable Artificial Intelligence (XAI) Techniques
Authors:
Samita Bai,
Sidra Nasir,
Rizwan Ahmed Khan,
Sheeraz Arif,
Alexandre Meyer,
Hubert Konik
Abstract:
Breast cancer (BC) stands as one of the most common malignancies affecting women worldwide, necessitating advancements in diagnostic methodologies for better clinical outcomes. This article provides a comprehensive exploration of the application of Explainable Artificial Intelligence (XAI) techniques in the detection and diagnosis of breast cancer. As Artificial Intelligence (AI) technologies continue to permeate the healthcare sector, particularly in oncology, the need for transparent and interpretable models becomes imperative to enhance clinical decision-making and patient care. This review discusses the integration of various XAI approaches, such as SHAP, LIME, Grad-CAM, and others, with machine learning and deep learning models utilized in breast cancer detection and classification. By investigating the modalities of breast cancer datasets, including mammograms, ultrasounds and their processing with AI, the paper highlights how XAI can lead to more accurate diagnoses and personalized treatment plans. It also examines the challenges in implementing these techniques and the importance of developing standardized metrics for evaluating XAI's effectiveness in clinical settings. Through detailed analysis and discussion, this article aims to highlight the potential of XAI in bridging the gap between complex AI models and practical healthcare applications, thereby fostering trust and understanding among medical professionals and improving patient outcomes.
Submitted 1 June, 2024;
originally announced June 2024.
-
Deep learning approaches to indoor wireless channel estimation for low-power communication
Authors:
Samrah Arif,
Muhammad Arif Khan,
Sabih Ur Rehman
Abstract:
In the rapidly growing development of the Internet of Things (IoT) infrastructure, achieving reliable wireless communication is a challenge. IoT devices operate in diverse environments with common signal interference and fluctuating channel conditions. Accurate channel estimation helps adapt the transmission strategies to current conditions, ensuring reliable communication. Traditional methods, such as Least Squares (LS) and Minimum Mean Squared Error (MMSE) estimation techniques, often struggle to adapt to the diverse and complex environments typical of IoT networks. This research article delves into the potential of Deep Learning (DL) to enhance channel estimation, focusing on the Received Signal Strength Indicator (RSSI) metric - a critical yet challenging aspect due to its susceptibility to noise and environmental factors. This paper presents two Fully Connected Neural Network (FCNN)-based Low Power IoT (LP-IoT) channel estimation models, leveraging RSSI for accurate channel estimation in LP-IoT communication. Our Model A exhibits a remarkable 99.02% reduction in Mean Squared Error (MSE), and Model B demonstrates a notable 90.03% MSE reduction compared to the benchmarks set by current studies. Additionally, comparative studies of our Model A against other DL-based techniques demonstrate the significant efficiency of our estimation models.
Submitted 20 May, 2024;
originally announced May 2024.
-
UQA: Corpus for Urdu Question Answering
Authors:
Samee Arif,
Sualeha Farid,
Awais Athar,
Agha Ali Raza
Abstract:
This paper introduces UQA, a novel dataset for question answering and text comprehension in Urdu, a low-resource language with over 70 million native speakers. UQA is generated by translating the Stanford Question Answering Dataset (SQuAD2.0), a large-scale English QA dataset, using a technique called EATS (Enclose to Anchor, Translate, Seek), which preserves the answer spans in the translated context paragraphs. The paper describes the process of selecting and evaluating the better translation model between two candidates: Google Translator and Seamless M4T. The paper also benchmarks several state-of-the-art multilingual QA models on UQA, including mBERT, XLM-RoBERTa, and mT5, and reports promising results. XLM-RoBERTa-XL achieves an F1 score of 85.99 and an exact match (EM) score of 74.56. UQA is a valuable resource for developing and testing multilingual NLP systems for Urdu and for enhancing the cross-lingual transferability of existing models. Further, the paper demonstrates the effectiveness of EATS for creating high-quality datasets for other languages and domains. The UQA dataset and the code are publicly available at www.github.com/sameearif/UQA.
Submitted 22 July, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
RSSI Estimation for Constrained Indoor Wireless Networks using ANN
Authors:
Samrah Arif,
M. Arif Khan,
Sabih Ur Rehman
Abstract:
In the expanding field of the Internet of Things (IoT), wireless channel estimation is a significant challenge. This is especially true for low-power IoT (LP-IoT) communication, where efficiency and accuracy are extremely important. This research establishes two distinct LP-IoT wireless channel estimation models using Artificial Neural Networks (ANN): a Feature-based ANN model and a Sequence-based ANN model. Both models have been constructed to enhance LP-IoT communication by lowering the estimation error in the LP-IoT wireless channel. The Feature-based model aims to capture complex patterns of measured Received Signal Strength Indicator (RSSI) data using environmental characteristics. The Sequence-based approach utilises predetermined categorisation techniques to estimate the RSSI sequence of specifically selected environment characteristics. The findings demonstrate that our suggested approaches attain remarkable precision in channel estimation, with an improvement in MSE of $88.29\%$ for the Feature-based model and $97.46\%$ for the Sequence-based model over existing research. Additionally, the comparative analysis of these techniques with traditional and other Deep Learning (DL)-based techniques also highlights the superior performance of our developed models and their potential in real-world IoT applications.
Submitted 9 April, 2024;
originally announced April 2024.
-
A multi-institutional pediatric dataset of clinical radiology MRIs by the Children's Brain Tumor Network
Authors:
Ariana M. Familiar,
Anahita Fathi Kazerooni,
Hannah Anderson,
Aliaksandr Lubneuski,
Karthik Viswanathan,
Rocky Breslow,
Nastaran Khalili,
Sina Bagheri,
Debanjan Haldar,
Meen Chul Kim,
Sherjeel Arif,
Rachel Madhogarhia,
Thinh Q. Nguyen,
Elizabeth A. Frenkel,
Zeinab Helili,
Jessica Harrison,
Keyvan Farahani,
Marius George Linguraru,
Ulas Bagci,
Yury Velichko,
Jeffrey Stevens,
Sarah Leary,
Robert M. Lober,
Stephani Campion,
Amy A. Smith
, et al. (15 additional authors not shown)
Abstract:
Pediatric brain and spinal cancers remain the leading cause of cancer-related death in children. Advancements in clinical decision-support in pediatric neuro-oncology utilizing the wealth of radiology imaging data collected through standard care, however, have significantly lagged behind other domains. Such data is ripe for use with predictive analytics such as artificial intelligence (AI) methods, which require large datasets. To address this unmet need, we provide a multi-institutional, large-scale pediatric dataset of 23,101 multi-parametric MRI exams acquired through routine care for 1,526 brain tumor patients, as part of the Children's Brain Tumor Network. This includes longitudinal MRIs across various cancer diagnoses, with associated patient-level clinical information, digital pathology slides, as well as tissue genotype and omics data. To facilitate downstream analysis, treatment-naïve images for 370 subjects were processed and released through the NCI Childhood Cancer Data Initiative via the Cancer Data Service. Through ongoing efforts to continuously build these imaging repositories, our aim is to accelerate discovery and translational AI models with real-world data, to ultimately empower precision medicine for children.
Submitted 2 October, 2023;
originally announced October 2023.
-
Breast Cancer Classification using Deep Learned Features Boosted with Handcrafted Features
Authors:
Unaiza Sajid,
Rizwan Ahmed Khan,
Shahid Munir Shah,
Sheeraz Arif
Abstract:
Breast cancer is one of the leading causes of death among women across the globe. It is difficult to treat if detected at advanced stages; however, early detection can significantly increase the chances of survival and improve the lives of millions of women. Given the widespread prevalence of breast cancer, it is of utmost importance for the research community to come up with a framework for early detection, classification and diagnosis. The artificial intelligence research community, in coordination with medical practitioners, is developing such frameworks to automate the task of detection. With the surge in research activities coupled with the availability of large datasets and enhanced computational power, it is expected that the results of AI frameworks will help even more clinicians in making correct predictions. In this article, a novel framework for classification of breast cancer using mammograms is proposed. The proposed framework combines robust features extracted by a novel Convolutional Neural Network (CNN) with handcrafted features including HOG (Histogram of Oriented Gradients) and LBP (Local Binary Pattern). The obtained results on the CBIS-DDSM dataset exceed the state of the art.
Submitted 16 January, 2023; v1 submitted 26 June, 2022;
originally announced June 2022.
-
Artificial Intelligence For Breast Cancer Detection: Trends & Directions
Authors:
Shahid Munir Shah,
Rizwan Ahmed Khan,
Sheeraz Arif,
Unaiza Sajid
Abstract:
In the last decade, researchers working in computer vision and Artificial Intelligence (AI) have stepped up their efforts to develop automated frameworks that not only detect breast cancer but also identify its stage. This surge in research activity is mainly due to the advent of robust AI algorithms (deep learning), the availability of hardware that can train those robust and complex AI algorithms, and the accessibility of datasets large enough for training them. The imaging modalities that researchers have exploited to automate breast cancer detection are mammograms, ultrasound, magnetic resonance imaging, histopathological images, or any combination of them. This article analyzes these imaging modalities, presents their strengths and limitations, and lists resources from which their datasets can be accessed for research purposes. It then summarizes the state-of-the-art AI and computer vision based methods proposed in the last decade to detect breast cancer using various imaging modalities. In general, we focus on reviewing frameworks that report results on mammograms, since mammography is the most widely used breast imaging modality and serves as the first test that medical practitioners usually prescribe for the detection of breast cancer. The second reason for focusing on mammograms is the availability of labeled datasets. Dataset availability is one of the most important aspects of developing AI-based frameworks, as such algorithms are data hungry and the quality of the dataset generally affects their performance. In a nutshell, this research article will act as a primary resource for the research community working in the field of automated breast imaging analysis.
Submitted 3 October, 2021;
originally announced October 2021.
-
Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI
Authors:
Laxmi Pandey,
Ahmed Sabbir Arif
Abstract:
Speech sounds of spoken language are obtained by varying the configuration of the articulators surrounding the vocal tract. They contain abundant information that can be utilized to better understand the underlying mechanism of human speech production. We propose a novel deep neural network-based learning framework that understands acoustic information in the variable-length sequence of vocal tract shaping during speech production, captured by real-time magnetic resonance imaging (rtMRI), and translates it into text. The proposed framework comprises spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. On the USC-TIMIT corpus, the model achieved a 40.6% PER at sentence-level, much better than existing models. To the best of our knowledge, this is the first study that demonstrates the recognition of an entire spoken sentence based on an individual's articulatory motions captured by rtMRI video. We also performed an analysis of variations in the geometry of articulation in each sub-region of the vocal tract (i.e., pharyngeal, velar and dorsal, hard palate, labial constriction region) with respect to different emotions and genders. Results suggest that each sub-region's distortion is affected by both emotion and gender.
Submitted 16 June, 2021;
originally announced June 2021.
-
How Do Users Interact with an Error-Prone In-Air Gesture Recognizer?
Authors:
Ahmed Sabbir Arif,
Wolfgang Stuerzlinger,
Euclides Jose de Mendonca Filho,
Alec Gordynski
Abstract:
We present results of two pilot studies that investigated human error behaviours with an error-prone in-air gesture recognizer. During the studies, users performed a small set of simple in-air gestures. In the first study, these gestures were abstract. The second study associated concrete tasks with each gesture. Interestingly, the error patterns observed in the two studies were substantially different.
Submitted 26 May, 2021;
originally announced May 2021.
-
Heterogeneous Hand Guise Classification Based on Surface Electromyographic Signals Using Multichannel Convolutional Neural Network
Authors:
Niloy Sikder,
Abu Shamim Mohammad Arif,
Abdullah-Al Nahid
Abstract:
Electromyography (EMG) is a way of measuring the bioelectric activities that take place inside the muscles. EMG is usually performed to detect abnormalities within the nerves or muscles of a target area. The recent developments in the field of Machine Learning allow us to use EMG signals to teach machines the complex properties of human movements. Modern machines are capable of detecting numerous human activities and distinguishing among them solely based on the EMG signals produced by those activities. However, success in accomplishing this task mostly depends on the learning technique used by the machine to analyze EMG signals, and even the latest algorithms do not result in flawless classification. In this study, a novel classification method is described that employs a multichannel Convolutional Neural Network (CNN) to interpret surface EMG signals by the properties they exhibit in the power domain. The proposed method was tested on a well-established EMG dataset and yielded very high classification accuracy. This learning model will help researchers to develop prosthetic arms capable of detecting various hand gestures and mimicking them afterwards.
Submitted 17 January, 2021;
originally announced January 2021.
-
Human Activity Recognition Using Multichannel Convolutional Neural Network
Authors:
Niloy Sikder,
Md. Sanaullah Chowdhury,
Abu Shamim Mohammad Arif,
Abdullah-Al Nahid
Abstract:
Human Activity Recognition (HAR) simply refers to the capacity of a machine to perceive human actions. HAR is a prominent application of advanced Machine Learning and Artificial Intelligence techniques that utilize computer vision to understand the semantic meanings of heterogeneous human actions. This paper describes a supervised learning method that can distinguish human actions based on data collected from practical human movements. The primary challenge while working with HAR is to overcome the difficulties that come with the cyclostationary nature of the activity signals. This study proposes a HAR classification model based on a two-channel Convolutional Neural Network (CNN) that makes use of the frequency and power features of the collected human action signals. The model was tested on the UCI HAR dataset, which resulted in a 95.25% classification accuracy. This approach will help to conduct further research on the recognition of human activities based on their biomedical signals.
Submitted 17 January, 2021;
originally announced January 2021.
-
Enabling Input on Tiny/Headless Systems Using Morse Code
Authors:
Anna-Maria Gueorguieva,
Gulnar Rakhmetulla,
Ahmed Sabbir Arif
Abstract:
This paper presents results of a pilot study that explored the potential of Morse code as a method for text entry on mobile devices. In the study, participants without prior experience with Morse code reached 6.7 wpm with a Morse code keyboard in three short sessions. Learning was observed both in terms of text entry speed and accuracy, which suggests that the overall performance of the keyboard is likely to improve with practice.
Submitted 11 December, 2020;
originally announced December 2020.
-
Evaluating Feedback Strategies for Virtual Human Trainers
Authors:
Xiumin Shang,
Ahmed Sabbir Arif,
Marcelo Kallmann
Abstract:
In this paper we address feedback strategies for an autonomous virtual trainer. First, a pilot study was conducted to identify and specify feedback strategies for assisting participants in performing a given task. The task involved sorting virtual cubes according to areas of countries displayed on them. Two feedback strategies were specified. The first provides correctness feedback by fully correcting user responses at each stage of the task, and the second provides suggestive feedback by only notifying if and how a response can be corrected. Both strategies were implemented in a virtual training system and empirically evaluated. The correctness feedback strategy was preferred by the participants, was more effective time-wise, and was more effective in improving task performance skills. The overall system was also rated comparable to hypothetically performing the same task with real interactions.
Submitted 23 November, 2020;
originally announced November 2020.
-
Metrics for Multi-Touch Input Technologies
Authors:
Ahmed Sabbir Arif
Abstract:
Multi-touch input technologies are becoming popular with the increased interest in touchscreen- and touchpad-based devices. A great deal of work has been done on different multi-touch technologies, and researchers and practitioners are frequently coming up with new ones. However, it is almost impossible to compare such technologies due to the absence of multi-touch performance metrics. Designers usually use their own methods to report their techniques' performance. Moreover, multi-touch interaction has never been modeled. That makes it impossible for designers to predict the performance of a new technology before developing it, costing them valuable time, effort, and money. This article discusses the necessity of having dedicated performance metrics and a prediction model for multi-touch technologies, and ways of approaching that.
Submitted 28 September, 2020;
originally announced September 2020.
-
Exploratory Study of Young Children's Social Media Needs and Requirements
Authors:
Di "Chelsea" Sun,
Vaishnavi Melkote,
Ahmed Sabbir Arif
Abstract:
As social media are becoming increasingly popular among young children, it is important to explore this population's needs and requirements from these platforms. As a first step to this, we conducted an exploratory design workshop with children aged between ten and eleven years to find out about their social media needs and requirements. Through an analysis of the paper prototypes solicited from the workshop, here we discuss the social media features that are the most desired by this population.
Submitted 25 June, 2020;
originally announced June 2020.
-
Early Blindness Detection Based on Retinal Images Using Ensemble Learning
Authors:
Niloy Sikder,
Md. Sanaullah Chowdhury,
Abu Shamim Mohammad Arif,
Abdullah-Al Nahid
Abstract:
Diabetic retinopathy (DR) is the primary cause of vision loss among adults around the world. In four out of five cases, having diabetes for a prolonged period leads to DR. If detected early, more than 90 percent of new DR occurrences can be prevented from turning into blindness through proper treatment. Although multiple treatment procedures are available that are well capable of dealing with DR, negligence and the failure of early detection cost most DR patients their precious eyesight. The recent developments in the field of Digital Image Processing (DIP) and Machine Learning (ML) have paved the way to use machines in this regard. Contemporary technologies allow us to develop devices capable of automatically detecting the condition of a person's eyes based on their retinal images. However, in practice, several factors hinder the quality of the captured images and impede the detection outcome. In this study, a novel early blindness detection method is proposed based on the color information extracted from retinal images using an ensemble learning algorithm. The method has been tested on a set of retinal images collected from people living in the rural areas of South Asia, which resulted in a 91 percent classification accuracy.
Submitted 12 June, 2020;
originally announced June 2020.
-
Text-to-Image Generation with Attention Based Recurrent Neural Networks
Authors:
Tehseen Zia,
Shahan Arif,
Shakeeb Murtaza,
Mirza Ahsan Ullah
Abstract:
Conditional image modeling based on textual descriptions is a relatively new domain in unsupervised learning. Previous approaches use a latent variable model and generative adversarial networks. While the former is approximated by using variational auto-encoders and relies on intractable inference that can hamper its performance, the latter is unstable to train due to its Nash equilibrium based objective function. We develop a tractable and stable caption-based image generation model. The model uses an attention-based encoder to learn word-to-pixel dependencies. A conditional autoregressive decoder is used for learning pixel-to-pixel dependencies and generating images. Experiments are performed on the Microsoft COCO and MNIST-with-captions datasets, and performance is evaluated by using the Structural Similarity Index. Results show that the proposed model performs better than contemporary approaches and generates better quality images. Keywords: Generative image modeling, autoregressive image modeling, caption-based image generation, neural attention, recurrent neural networks.
Submitted 18 January, 2020;
originally announced January 2020.
-
Put a Ring on It: Text Entry Performance on a Grip Ring Attached Smartphone
Authors:
Monwen Shen,
Gulnar Rakhmetulla,
Ahmed Sabbir Arif
Abstract:
This paper presents the results of a study investigating the effects of grip rings on text entry. The results revealed that grip rings do not affect text entry performance in terms of speed, accuracy, or keystrokes per character. The paper then reflects on future research directions based on the results and observations from the study. The purpose of this work is to stress the necessity of classifying and evaluating low-cost mobile phone accessories.
Submitted 14 September, 2018;
originally announced September 2018.
-
PROPEL: Probabilistic Parametric Regression Loss for Convolutional Neural Networks
Authors:
Muhammad Asad,
Rilwan Basaru,
S M Masudur Rahman Al Arif,
Greg Slabaugh
Abstract:
In recent years, Convolutional Neural Networks (CNNs) have enabled significant advancements to the state-of-the-art in computer vision. For classification tasks, CNNs have widely employed probabilistic output and have shown the significance of providing additional confidence for predictions. However, such probabilistic methodologies are not widely applicable to regression problems using CNNs, as regression involves learning unconstrained continuous and, in many cases, multi-variate target variables. We propose a PRObabilistic Parametric rEgression Loss (PROPEL) that enables CNNs to learn parameters of probability distributions for addressing probabilistic regression problems. PROPEL is fully differentiable and, hence, can be easily incorporated for end-to-end training of existing CNN regression architectures using existing optimization algorithms. The proposed method is flexible as it enables learning complex unconstrained probabilities while being generalizable to higher dimensional multi-variate regression problems. We utilize a PROPEL-based CNN to address the problem of learning hand and head orientation from uncalibrated color images. Our experimental validation and comparison with existing CNN regression loss functions show that PROPEL improves the accuracy of a CNN by enabling probabilistic regression, while significantly reducing the required model parameters by $10 \times$, resulting in improved generalization as compared to the existing state-of-the-art.
Submitted 8 August, 2020; v1 submitted 28 July, 2018;
originally announced July 2018.
-
The Perception of Humanoid Robots for Domestic Use in Saudi Arabia
Authors:
Ohoud Alharbi,
Ahmed Sabbir Arif
Abstract:
We propose research to investigate Saudi people's perception of humanoid domestic robots and their attitude towards the possibility of having one in their house. Through a series of questionnaires, semi-structured interviews, focus groups, and participatory design sessions, this research will explore Saudi people's level of acceptance of domestic robots, the tasks and responsibilities they would feel comfortable assigning to these robots, their preferred appearance of domestic robots, and the cultural stereotypes they feel a domestic robot must mimic.
Submitted 24 June, 2018;
originally announced June 2018.
-
Metrics for Bengali Text Entry Research
Authors:
Sayan Sarcar,
Ahmed Sabbir Arif,
Ali Mazalek
Abstract:
With the intention of bringing uniformity to Bengali text entry research, here we present a new approach for calculating the most popular English text entry evaluation metrics for Bengali. To demonstrate our approach, we conducted a user study where we evaluated four popular Bengali text entry techniques.
Submitted 25 June, 2017;
originally announced June 2017.
-
An Enhanced Static Data Compression Scheme Of Bengali Short Message
Authors:
Abu Shamim Mohammad Arif,
Asif Mahamud,
Rashedul Islam
Abstract:
This paper concerns a modified approach to compressing short Bengali text messages for small devices. The prime objective of this research is to establish a low-complexity compression scheme suitable for small devices with limited memory and relatively low processing speed. The aim is not to compress text of any size to its maximum level without any constraint on space and time; rather, the main target is to compress short messages to an optimal level that needs minimum space, consumes less time, and has a lower processor requirement. We have implemented Character Masking, Dictionary Matching, Associative rule of data mining, and a Hyphenation algorithm for syllable-based compression in hierarchical steps to achieve low-complexity lossless compression of text messages for any mobile device. The scheme for choosing the diagrams is based on an extensive statistical model, and the static Huffman coding is done through the same context.
Submitted 7 September, 2009; v1 submitted 1 September, 2009;
originally announced September 2009.