-
Present and Future of AI in Renewable Energy Domain: A Comprehensive Survey
Authors:
Abdur Rashid,
Parag Biswas,
Angona Biswas,
MD Abdullah Al Nasim,
Kishor Datta Gupta,
Roy George
Abstract:
Artificial intelligence (AI) has become a crucial instrument for streamlining processes in various industries, including electrical power systems, as a result of recent digitalization. AI algorithms are data-driven models based on statistical learning theory and are used as a tool to make use of the data that the power system and its users generate. Initially, we perform a thorough literature analysis of AI applications related to renewable energy (RE). Next, we present a thorough analysis of renewable energy factories and assess their suitability, along with a list of the most widely used and appropriate AI algorithms. Nine AI-based strategies are identified here to assist RE in contemporary power systems. This survey paper comprises an extensive review of the several AI techniques used for renewable energy as well as a methodical analysis of the literature for the study of various intelligent system application domains across different disciplines of renewable energy. This literature review identifies the performance and outcomes of nine different research methods by assessing them, and it aims to distill valuable insights into their strengths and limitations. This study also addressed three main topics: using AI technology for renewable power generation, utilizing AI for renewable energy forecasting, and optimizing energy systems. Additionally, it explored AI's superiority over conventional models in controllability, data handling, cyberattack prevention, smart grid implementation, and robotics, as well as AI's significance in shaping the future of the energy industry. Furthermore, this article outlines future directions in the integration of AI for renewable energy.
Submitted 22 June, 2024;
originally announced June 2024.
-
AI-Driven Approaches for Optimizing Power Consumption: A Comprehensive Survey
Authors:
Parag Biswas,
Abdur Rashid,
Angona Biswas,
Md Abdullah Al Nasim,
Kishor Datta Gupta,
Roy George
Abstract:
Reduced environmental effect, lower operating costs, and a stable and sustainable energy supply for current and future generations are the main reasons why power optimization is important. Power optimization ensures that energy is used more effectively, cutting down on waste and optimizing the utilization of resources. In today's world, power optimization and artificial intelligence (AI) integration are essential to changing the way energy is produced, used, and distributed. Real-time monitoring and analysis of power usage trends is made possible by AI-driven algorithms and predictive analytics, which enable dynamic modifications to effectively satisfy demand. Efficiency and sustainability are increased when power consumption is optimized in different sectors thanks to the use of intelligent systems. This survey paper comprises an extensive review of the several AI techniques used for power optimization as well as a methodical analysis of the literature for the study of various intelligent system application domains across different disciplines of power consumption. This literature review identifies the performance and outcomes of 17 different research methods by assessing them, and it aims to distill valuable insights into their strengths and limitations. Furthermore, this article outlines future directions in the integration of AI for power consumption optimization.
Submitted 22 June, 2024;
originally announced June 2024.
-
Comparison of On-Orbit Manual Attitude Control Methods for Non-Docking Spacecraft Through Virtual Reality Simulation
Authors:
Ajit Krishnan,
Himanshu Vishwakarma,
Maharudra Kharsade,
Pradipta Biswas
Abstract:
On-orbit manual attitude control of manned spacecraft is accomplished using external visual references and some method of three-axis attitude control. All past, present, and developmental spacecraft feature the capability to manually control attitude for deorbit. National Aeronautics and Space Administration (NASA) spacecraft permit an aircraft windshield type front view, wherein an arc of the Earth's horizon is visible to the crew in deorbit attitude. Russian and Chinese spacecraft permit the crew a bottom view wherein the entire circular Earth horizon disk is visible to the crew in deorbit attitude. Our study compared these two types of external views for efficiency in achievement of deorbit attitude. We used a Unity Virtual Reality (VR) spacecraft simulator that we built in house. The task was to accurately achieve deorbit attitude while in a 400 km circular orbit. Six military test pilots and six civilians with gaming experience flew the task using two methods of visual reference. Comparison was based on time taken, fuel consumed, cognitive workload assessment and user preference. We used ocular parameters, EEG, NASA TLX and IBM SUS to quantify our results. Our study found that the bottom view was easier to operate for the manual deorbit task. Additionally, we realized that a VR-based system can work as a training simulator for manual on-orbit flight path control tasks by pilots and non-pilots. Results from our study can be used for the design of manual on-orbit attitude control of present and future spacecraft.
Submitted 22 April, 2024;
originally announced April 2024.
-
The weak relationship between ankle proprioception and gait speed after stroke: a robotic assessment study
Authors:
Christopher A. Johnson,
Piyashi Biswas,
Rubi Tapia,
Jill See,
Lucy Dodakian,
Vicky Chan,
Po T. Wang,
Zoran Nenadic,
An H. Do,
David J. Reinkensmeyer
Abstract:
Ankle proprioceptive deficits are common after stroke and occur independently of ankle motor impairments. Despite this independence, some studies have found that ankle proprioceptive deficits predict gait function, consistent with the concept that somatosensory input plays a key role in gait control. Other studies, however, have not found a relationship, possibly because of variability in proprioception assessments. Robotic assessments of proprioception offer improved consistency and sensitivity. Here we examined relationships between ankle proprioception, ankle motor impairment, and gait function after stroke using robotic assessments of ankle proprioception. We quantified ankle proprioception using two different robotic tests (Joint Position Reproduction, JPR, and Crisscross) in 39 persons in the chronic phase of stroke. We analyzed the extent to which these robotic proprioception measures predicted gait speed, measured over a long distance (6-minute walk test, 6MWT) and a short distance (10-meter walk test, 10mWT). We also studied the relationship between robotic proprioception measures and lower extremity motor impairment, quantified with measures of ankle strength, active range of motion, and the lower extremity Fugl-Meyer exam. Impairment in ankle proprioception was present in 87% of the participants. Ankle proprioceptive acuity measured with JPR was weakly correlated with 6MWT gait speed (ρ = -0.34, p = 0.039) but not 10mWT (ρ = -0.29, p = 0.08). Ankle proprioceptive acuity was not correlated with lower extremity motor impairment (p > 0.2). These results confirm the presence of a weak relationship between ankle proprioception and gait after stroke that is independent of motor impairment.
Submitted 16 February, 2024;
originally announced February 2024.
-
A Fuzzy Approach to Record Linkages
Authors:
Pratik K. Biswas
Abstract:
Record Linkage is the process of identifying and unifying records from various independent data sources. Existing strategies, which can be either deterministic or probabilistic, often fail to link records satisfactorily under uncertainty. This paper describes an indigenously (locally) developed fuzzy linkage method, based on fuzzy set techniques, which can effectively account for this uncertainty prevalent in the disparate data sources and address the shortcomings of the existing approaches. Extensive testing, evaluation and comparisons have demonstrated the efficacy of this fuzzy approach for record linkages.
Submitted 5 February, 2024;
originally announced February 2024.
-
A Hybrid Strategy for Chat Transcript Summarization
Authors:
Pratik K. Biswas
Abstract:
Text summarization is the process of condensing a piece of text to fewer sentences, while still preserving its content. Chat transcript, in this context, is a textual copy of a digital or online conversation between a customer (caller) and agent(s). This paper presents an indigenously (locally) developed hybrid method that first combines extractive and abstractive summarization techniques in compressing ill-punctuated or un-punctuated chat transcripts to produce more readable punctuated summaries and then optimizes the overall quality of summarization through reinforcement learning. Extensive testing, evaluations, comparisons, and validation have demonstrated the efficacy of this approach for large-scale deployment of chat transcript summarization, in the absence of manually generated reference (annotated) summaries.
Submitted 2 February, 2024;
originally announced February 2024.
-
Histopathological Image Analysis with Style-Augmented Feature Domain Mixing for Improved Generalization
Authors:
Vaibhav Khamankar,
Sutanu Bera,
Saumik Bhattacharya,
Debashis Sen,
Prabir Kumar Biswas
Abstract:
Histopathological images are essential for medical diagnosis and treatment planning, but interpreting them accurately using machine learning can be challenging due to variations in tissue preparation, staining and imaging protocols. Domain generalization aims to address such limitations by enabling the learning models to generalize to new datasets or populations. Style transfer-based data augmentation is an emerging technique that can be used to improve the generalizability of machine learning models for histopathological images. However, existing style transfer-based methods can be computationally expensive, and they rely on artistic styles, which can negatively impact model accuracy. In this study, we propose a feature domain style mixing technique that uses adaptive instance normalization to generate style-augmented versions of images. We compare our proposed method with existing style transfer-based data augmentation methods and find that it performs similarly or better, despite requiring less computation and time. Our results demonstrate the potential of feature domain statistics mixing in the generalization of learning models for histopathological image analysis.
Submitted 31 October, 2023;
originally announced October 2023.
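The abstract above names adaptive instance normalization (AdaIN) as the mechanism for feature-domain style mixing but does not spell out the operation. As a rough illustration only, the sketch below applies the standard AdaIN formula per channel: the content features are normalized with their own mean and standard deviation, then rescaled with the statistics of a style feature map. The function names (`channel_stats`, `adain`), the flat-list feature layout, and the `eps` value are assumptions made for this example and are not taken from the paper, whose exact mixing scheme may differ.

```python
import math


def channel_stats(values, eps=1e-5):
    """Return (mean, std) of one feature channel; eps keeps std above zero."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + eps)


def adain(content, style, eps=1e-5):
    """Standard AdaIN applied channel by channel.

    content, style: lists of channels, each channel a flat list of floats
    (a hypothetical layout chosen to keep the example dependency-free).
    """
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = channel_stats(c_chan, eps)
        s_mean, s_std = channel_stats(s_chan, eps)
        # Normalize content statistics away, then impose the style statistics.
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_chan])
    return out


if __name__ == "__main__":
    content = [[1.0, 2.0, 3.0, 4.0]]
    style = [[10.0, 20.0, 30.0, 40.0]]
    mixed = adain(content, style)
    # The mixed channel keeps the content's spatial pattern but
    # takes on the style channel's mean and spread.
    print(mixed)
```

Because only per-channel means and standard deviations are exchanged, the operation is cheap compared with running a full image-to-image style transfer network, which is consistent with the lower computational cost the abstract reports.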
-
Machine Learning Assisted Bad Data Detection for High-throughput Substation Communication
Authors:
Suman Sourav,
Partha P. Biswas,
Vyshnavi Mohanraj,
Binbin Chen,
Daisuke Mashima
Abstract:
Electrical substations are becoming more prone to cyber-attacks due to increasing digitalization. Prevailing defense measures based on cyber rules are often inadequate to detect attacks that use legitimate-looking measurements. In this work, we design and implement a bad data detection solution for electrical substations called ResiGate, which effectively combines a physics-based approach and a machine-learning-based approach to provide substantial speed-up in high-throughput substation communication scenarios, while still maintaining high detection accuracy and confidence. While many existing physics-based schemes are designed for deployment in control centers (due to their high computational requirement), ResiGate is designed as a security appliance that can be deployed on low-cost industrial computers at the edge of the smart grid so that it can detect local substation-level attacks in a timely manner. A key challenge for this is to continuously run the computationally demanding physics-based analysis to monitor the measurement data frequently transmitted in a typical substation. To provide high throughput without sacrificing accuracy, ResiGate uses machine learning to effectively filter out most of the non-suspicious (normal) data, thereby reducing the overall computational load and allowing efficient performance even with a high volume of network traffic. We implement ResiGate on a low-cost industrial computer and our experiments confirm that ResiGate can detect attacks with zero error while sustaining a high throughput.
Submitted 12 February, 2023;
originally announced February 2023.
-
Towards Precision in Appearance-based Gaze Estimation in the Wild
Authors:
Murthy L. R. D.,
Abhishek Mukhopadhyay,
Shambhavi Aggarwal,
Ketan Anand,
Pradipta Biswas
Abstract:
Appearance-based gaze estimation systems have shown great progress recently, yet the performance of these techniques depends on the datasets used for training. Most of the existing gaze estimation datasets set up in interactive settings were recorded in laboratory conditions, and those recorded in in-the-wild conditions display limited head pose and illumination variations. Further, we observed little attention so far towards precision evaluations of existing gaze estimation approaches. In this work, we present a large gaze estimation dataset, PARKS-Gaze, with wider head pose and illumination variation and with multiple samples for a single Point of Gaze (PoG). The dataset contains 974 minutes of data from 28 participants with a head pose range of 60 degrees in both yaw and pitch directions. Our within-dataset and cross-dataset evaluations and precision evaluations indicate that the proposed dataset is more challenging and enables models to generalize on unseen participants better than the existing in-the-wild datasets. The project page can be accessed here: https://github.com/lrdmurthy/PARKS-Gaze
Submitted 13 February, 2023; v1 submitted 5 February, 2023;
originally announced February 2023.
-
Self Supervised Low Dose Computed Tomography Image Denoising Using Invertible Network Exploiting Inter Slice Congruence
Authors:
Sutanu Bera,
Prabir Kumar Biswas
Abstract:
The resurgence of deep neural networks has created an alternative pathway for low-dose computed tomography denoising by learning a nonlinear transformation function between low-dose CT (LDCT) and normal-dose CT (NDCT) image pairs. However, those paired LDCT and NDCT images are rarely available in the clinical environment, making deep neural network deployment infeasible. This study proposes a novel method for self-supervised low-dose CT denoising to alleviate the requirement of paired LDCT and NDCT images. Specifically, we have trained an invertible neural network to minimize the pixel-based mean square distance between a noisy slice and the average of its two immediate adjacent noisy slices. We have shown the aforementioned is similar to training a neural network to minimize the distance between clean NDCT and noisy LDCT image pairs. Again, during the reverse mapping of the invertible network, the output image is mapped to the original input image, similar to cycle consistency loss. Finally, the trained invertible network's forward mapping is used for denoising LDCT images. Extensive experiments on two publicly available datasets showed that our method performs favourably against other existing unsupervised methods.
Submitted 3 November, 2022;
originally announced November 2022.
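The training objective described in the abstract above, a pixel-wise mean square distance between a noisy slice and the average of its two immediately adjacent noisy slices, can be sketched in a few lines. This is only a plausible reading of the abstract: it assumes the network output for slice i is compared against the neighbor average, and the names `neighbor_average`, `mse`, `inter_slice_loss`, and the `denoiser` callable are hypothetical placeholders. The invertible architecture and the reverse-mapping term are not modeled here.

```python
def neighbor_average(prev_slice, next_slice):
    """Pixel-wise average of the two slices adjacent to the current one."""
    return [[(a + b) / 2.0 for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(prev_slice, next_slice)]


def mse(pred, target):
    """Mean squared error over all pixels of two equally sized 2D slices."""
    total = 0.0
    count = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count


def inter_slice_loss(volume, i, denoiser):
    """Loss for interior slice i of a noisy volume (list of 2D slices).

    denoiser: any callable mapping a 2D slice to a 2D slice; here it
    stands in for the forward pass of the network (an assumption).
    """
    target = neighbor_average(volume[i - 1], volume[i + 1])
    return mse(denoiser(volume[i]), target)


if __name__ == "__main__":
    # Three tiny 1x2 "slices"; the identity function plays the denoiser.
    volume = [[[0.0, 2.0]], [[5.0, 5.0]], [[2.0, 4.0]]]
    print(inter_slice_loss(volume, 1, lambda s: s))
```

The target is built from noisy data only, which is what removes the need for paired LDCT and NDCT images.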
-
Detecting Hidden Attackers in Photovoltaic Systems Using Machine Learning
Authors:
Suman Sourav,
Partha P. Biswas,
Binbin Chen,
Daisuke Mashima
Abstract:
In modern smart grids, the proliferation of communication-enabled distributed energy resource (DER) systems has increased the surface of possible cyber-physical attacks. Attacks originating from the distributed edge devices of a DER system, such as a photovoltaic (PV) system, are often difficult to detect. An attacker may change the control configurations or various setpoints of the PV inverters to destabilize the power grid, damage devices, or for the purpose of economic gain. A more powerful attacker may even manipulate the PV system metering data transmitted for remote monitoring, so that (s)he can remain hidden. In this paper, we consider a case where PV systems operating in different control modes can be simultaneously attacked and the attacker has the ability to manipulate individual PV bus measurements to avoid detection. We show that even in such a scenario, with just the aggregated measurements (that the attacker cannot manipulate), machine learning (ML) techniques are able to detect the attack in a fast and accurate manner. We use a standard radial distribution network, together with real smart home electricity consumption data and solar power data in our experimental setup. We test the performance of several ML algorithms to detect attacks on the PV system. Our detailed evaluations show that the proposed intrusion detection system (IDS) is highly effective and efficient in detecting attacks on PV inverter control modes.
Submitted 11 October, 2022;
originally announced October 2022.
-
To show or not to show: Redacting sensitive text from videos of electronic displays
Authors:
Abhishek Mukhopadhyay,
Shubham Agarwal,
Patrick Dylan Zwick,
Pradipta Biswas
Abstract:
With the increasing prevalence of video recordings there is a growing need for tools that can maintain the privacy of those recorded. In this paper, we define an approach for redacting personally identifiable text from videos using a combination of optical character recognition (OCR) and natural language processing (NLP) techniques. We examine the relative performance of this approach when used with different OCR models, specifically Tesseract and the OCR system from Google Cloud Vision (GCV). For the proposed approach the performance of GCV, in both accuracy and speed, is significantly higher than Tesseract. Finally, we explore the advantages and disadvantages of both models in real-world applications.
Submitted 19 August, 2022;
originally announced August 2022.
-
Sub-Aperture Feature Adaptation in Single Image Super-resolution Model for Light Field Imaging
Authors:
Aupendu Kar,
Suresh Nehra,
Jayanta Mukhopadhyay,
Prabir Kumar Biswas
Abstract:
With the availability of commercial Light Field (LF) cameras, LF imaging has emerged as an up-and-coming technology in computational photography. However, the spatial resolution is significantly constrained in commercial microlens-based LF cameras because of the inherent multiplexing of spatial and angular information. Therefore, it becomes the main bottleneck for other applications of light field cameras. This paper proposes an adaptation module in a pretrained Single Image Super Resolution (SISR) network to leverage the powerful SISR model instead of using highly engineered, light field imaging domain-specific super resolution models. The adaptation module consists of a Sub-aperture Shift block and a fusion block. It is an adaptation in the SISR network to further exploit the spatial and angular information in LF images to improve the super resolution performance. Experimental validation shows that the proposed method outperforms existing light field super resolution algorithms. It also achieves PSNR gains of more than 1 dB across all the datasets as compared to the same pretrained SISR models for scale factor 2, and PSNR gains of 0.6 to 1 dB for scale factor 4.
Submitted 26 July, 2022; v1 submitted 24 July, 2022;
originally announced July 2022.
-
Texture Aware Autoencoder Pre-training And Pairwise Learning Refinement For Improved Iris Recognition
Authors:
Manashi Chakraborty,
Aritri Chakraborty,
Prabir Kumar Biswas,
Pabitra Mitra
Abstract:
This paper presents a texture-aware, end-to-end trainable iris recognition system, specifically designed for datasets like iris that have limited training data. We build upon our previous stagewise learning framework with certain key optimization and architectural innovations. First, we pretrain a Stage-1 encoder network with unsupervised autoencoder learning optimized with an additional data relation loss on top of the usual reconstruction loss. The data relation loss enables learning a better texture representation, which is pivotal for a texture-rich dataset such as iris. Robustness of the Stage-1 feature representation is further enhanced with an auxiliary denoising task. Such pre-training proves beneficial for effectively training deep networks on data-constrained iris datasets. Next, in Stage-2 supervised refinement, we design a pairwise learning architecture for an end-to-end trainable iris recognition system. The pairwise learning includes the task of iris matching inside the training pipeline itself and results in significant improvement in recognition performance compared to usual offline matching. We validate our model across three publicly available iris datasets, and the proposed model consistently outperforms both traditional and deep learning baselines for both Within-Dataset and Cross-Dataset configurations.
Submitted 15 February, 2022;
originally announced February 2022.
-
Q-Learning Based Energy-Efficient Network Planning in IP-over-EON
Authors:
Pramit Biswas,
Md Shahbaz Akhtar,
Aneek Adhya,
Sriparna Saha,
Sudhan Majhi
Abstract:
During the network planning phase, optimal network planning implemented through efficient resource allocation and static traffic demand provisioning in an IP-over-elastic optical network (IP-over-EON) is significantly more challenging than in a fixed-grid wavelength division multiplexing (WDM) network, due to the increased flexibility in IP-over-EON. Mathematical optimization models used for this purpose may not provide a solution for large networks due to their large computational complexity. In this regard, a greedy heuristic may be used that intuitively selects traffic elements in sequence from the static traffic demand matrix and attempts to find the best solution. However, in general, such greedy heuristics offer suboptimal solutions, since the appropriate traffic sequence offering the optimal performance is rarely selected. We therefore propose a reinforcement learning technique (in particular a Q-learning method), combined with an auxiliary graph (AG)-based energy-efficient greedy method, to be used for large network planning. The Q-learning method is used to decide the suitable sequence of traffic allocation such that the overall power consumption in the network is reduced. In the proposed heuristic, each traffic element from the given static traffic demand matrix is successively selected using the Q-learning technique and provisioned using the AG-based greedy method.
Submitted 20 November, 2021;
originally announced November 2021.
-
Safest Nearby Neighbor Queries in Road Networks (Full Version)
Authors:
Punam Biswas,
Tanzima Hashem,
Muhammad Aamir Cheema
Abstract:
Traditional route planning and k nearest neighbors queries only consider distance or travel time and ignore road safety altogether. However, many travellers prefer to avoid risky or unpleasant road conditions such as roads with high crime rates (e.g., robberies, kidnapping, riots etc.) and bumpy roads. To facilitate safe travel, we introduce a novel query for road networks called the k safest nearby neighbors (kSNN) query. Given a query location $v_l$, a distance constraint $d_c$ and a point of interest $p_i$, we define the safest path from $v_l$ to $p_i$ as the path with the highest path safety score among all the paths from $v_l$ to $p_i$ with length less than $d_c$. The path safety score is computed considering the road safety of each road segment on the path. Given a query location $v_l$, a distance constraint $d_c$ and a set of POIs P, a kSNN query returns k POIs with the k highest path safety scores in P along with their respective safest paths from the query location. We develop two novel indexing structures called Ct-tree and a safety score based Voronoi diagram (SNVD). We propose two efficient query processing algorithms each exploiting one of the proposed indexes to effectively refine the search space using the properties of the index. Our extensive experimental study on real datasets demonstrates that our solution is on average an order of magnitude faster than the baselines.
Submitted 26 April, 2022; v1 submitted 29 July, 2021;
originally announced July 2021.
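To make the kSNN query definition in the abstract above concrete, here is a naive brute-force sketch, not the Ct-tree or SNVD algorithms the paper proposes. It enumerates simple paths from the query location whose length stays below the distance constraint and keeps the safest one per node. The abstract does not define the path safety score, so this example assumes, purely for illustration, that a path is as safe as its least safe segment; the graph encoding and the names `safest_paths` and `ksnn` are likewise hypothetical.

```python
def safest_paths(graph, source, d_c):
    """Best (score, path) per node over simple paths of length < d_c.

    graph: {node: [(neighbor, segment_length, segment_safety), ...]}
    Assumed score: minimum segment safety along the path.
    """
    best = {}

    def dfs(node, length, score, path):
        if node not in best or score > best[node][0]:
            best[node] = (score, list(path))
        for nxt, seg_len, seg_safety in graph.get(node, []):
            if nxt in path:
                continue  # keep paths simple (no repeated nodes)
            new_len = length + seg_len
            if new_len >= d_c:
                continue  # violates the distance constraint
            path.append(nxt)
            dfs(nxt, new_len, min(score, seg_safety), path)
            path.pop()

    dfs(source, 0.0, float("inf"), [source])
    return best


def ksnn(graph, source, d_c, pois, k):
    """Return up to k POIs with the highest safest-path scores."""
    best = safest_paths(graph, source, d_c)
    reachable = [(best[p][0], p, best[p][1]) for p in pois if p in best]
    reachable.sort(key=lambda t: t[0], reverse=True)
    return [(p, score, path) for score, p, path in reachable[:k]]


if __name__ == "__main__":
    graph = {
        "A": [("B", 1, 0.9), ("C", 1, 0.2), ("D", 5, 1.0)],
        "B": [("A", 1, 0.9), ("C", 1, 0.8)],
        "C": [("A", 1, 0.2), ("B", 1, 0.8)],
        "D": [("A", 5, 1.0)],
    }
    # The direct road A-C is short but unsafe; the detour via B is safer.
    print(ksnn(graph, "A", 3, ["C", "D"], 1))
```

Exhaustive path enumeration grows exponentially with network size, which is the cost that index structures such as those proposed in the paper are meant to avoid.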
-
User Perception of Privacy with Ubiquitous Devices
Authors:
Priyam Rajkhowa,
Pradipta Biswas
Abstract:
Privacy is important for all individuals in everyday life. Emerging technologies, such as smartphones with AR, various social networking applications and artificial intelligence driven modes of surveillance, tend to intrude on privacy. This study aimed to explore and discover various concerns related to perception of privacy in this era of ubiquitous technologies. It employed an online survey questionnaire to study user perspectives of privacy. Purposive sampling was used to collect data from 60 participants. Inductive thematic analysis was used to analyze the data. Our study discovered key themes like attitude towards privacy in public and private spaces, privacy awareness, consent seeking, dilemmas/confusions related to various technologies, and the impact of attitude and beliefs on individuals' actions regarding how to protect oneself from invasion of privacy in both public and private spaces. These themes interacted amongst themselves and influenced the formation of various actions. They acted like core principles that molded actions that prevented invasion of privacy for both participant and bystander. The findings of this study would be helpful to improve privacy and personalization of various emerging technologies. This study contributes to privacy by design and positive design by considering the psychological needs of users. This suggests that the findings can be applied in the areas of experience design, positive technologies, social computing and behavioral interventions.
Submitted 23 July, 2021;
originally announced July 2021.
-
Using Eye Tracker To Evaluate Cockpit Design -- A Flight Simulation Study
Authors:
Archana Hebbar,
Abhay Pashilkar,
Pradipta Biswas
Abstract:
This paper investigates applications of eye tracking in transport aircraft design evaluations. Piloted simulations were conducted for a complete flight profile including take off, cruise and landing flight scenarios using the transport aircraft flight simulator at CSIR National Aerospace Laboratories. Thirty-one simulation experiments were carried out with three pilots and engineers while recording the ocular parameters and the flight data. Simulations were repeated for high workload conditions like flying with degraded visibility and during stall. Pilots' visual scan behaviour and workload levels were analysed using ocular parameters and compared with the statistical deviations from the desired flight path. Conditions for fatigue were also recreated through long duration simulations, and signatures for the same from the ocular parameters were assessed. Results from the study found a correlation between the statistical inferences obtained from the ocular parameters and those obtained from the flight path deviations. The paper also demonstrates an evaluator's console that helps designers or evaluators better understand pilots' attentional resource allocation.
Submitted 9 June, 2021;
originally announced June 2021.
-
A Wearable Virtual Touch System for Cars
Authors:
Gowdham Prabhakar,
Priyam Rajkhowa,
Pradipta Biswas
Abstract:
In the automotive domain, operation of secondary tasks like accessing the infotainment system, adjusting air conditioning vents, and side mirrors distracts drivers from driving. Though existing modalities like gesture and speech recognition systems facilitate undertaking secondary tasks by reducing duration of eyes off the road, those often require remembering a set of gestures or screen sequences. In this paper, we have proposed two different modalities for drivers to virtually touch the dashboard display using a laser tracker with a mechanical switch and an eye gaze switch. We compared performances of our proposed modalities against the conventional touch modality in an automotive environment by comparing pointing and selection times of a representative secondary task and also analysed the effect on driving performance in terms of deviation from lane, average speed, variation in perceived workload and system usability. We did not find a significant difference in driving and pointing performance between the laser tracking system and the existing touchscreen system. Our results also showed that the driving and pointing performance of the virtual touch system with eye gaze switch was significantly better than the same with mechanical switch. We evaluated the efficacy of the proposed virtual touch system with eye gaze switch inside a real car and investigated acceptance of the system by professional drivers using qualitative research. The quantitative and qualitative studies indicated the importance of using a multimodal system inside a car and highlighted several criteria for acceptance of a new automotive user interface.
Submitted 10 June, 2021;
originally announced June 2021.
-
A Brief Survey on Interactive Automotive UI
Authors:
Gowdham Prabhakar,
Pradipta Biswas
Abstract:
Automotive User Interface (AutoUI) is a relatively new discipline in the context of both Transportation Engineering and Human Machine Interaction (HMI). It covers various HMI aspects both inside and outside the vehicle, ranging from operating the vehicle itself to undertaking various secondary tasks, driver behaviour analysis, cognitive load estimation and so on. This review paper discusses various interactive HMIs inside a vehicle used for undertaking secondary tasks. We divided recent HMIs into four sections: virtual touch interfaces, wearable devices, speech recognition and non-visual interfaces, and eye gaze controlled systems. Finally, we summarized advantages and disadvantages of various technologies.
Submitted 30 May, 2021;
originally announced May 2021.
-
A Hybrid Recommender System for Recommending Smartphones to Prospective Customers
Authors:
Pratik K. Biswas,
Songlin Liu
Abstract:
Recommender Systems are a subclass of machine learning systems that employ sophisticated information filtering strategies to reduce the search time and suggest the most relevant items to any particular user. Hybrid recommender systems combine multiple recommendation strategies in different ways to benefit from their complementary advantages. Some hybrid recommender systems have combined collaborative filtering and content-based approaches to build systems that are more robust. In this paper, we propose a hybrid recommender system, which combines Alternating Least Squares (ALS) based collaborative filtering with deep learning to enhance recommendation performance as well as overcome the limitations associated with the collaborative filtering approach, especially concerning its cold start problem. In essence, we use the outputs from ALS (collaborative filtering) to influence the recommendations from a Deep Neural Network (DNN), which combines characteristic, contextual, structural and sequential information, in a big data processing framework. We have conducted several experiments in testing the efficacy of the proposed hybrid architecture in recommending smartphones to prospective customers and compared its performance with other open-source recommenders. The results have shown that the proposed system has outperformed several existing hybrid recommender systems.
Submitted 19 July, 2022; v1 submitted 26 May, 2021;
originally announced May 2021.
-
Iterative Gradient Encoding Network with Feature Co-Occurrence Loss for Single Image Reflection Removal
Authors:
Sutanu Bera,
Prabir Kumar Biswas
Abstract:
Removing undesired reflections from a photo taken in front of glass is of great importance for enhancing visual computing systems' efficiency. Previous learning-based approaches have produced visually plausible results for some reflection types but failed to generalize to other reflection types. There is a dearth of literature on efficient methods for single image reflection removal that can generalize well across large-scale reflection types. In this study, we proposed an iterative gradient encoding network for single image reflection removal. Next, to further supervise the network in learning the correlation between the transmission layer features, we proposed a feature co-occurrence loss. Extensive experiments on the public benchmark dataset of SIR$^2$ demonstrated that our method can remove reflection favorably against the existing state-of-the-art method on all imaging settings, including diverse backgrounds. Moreover, as the reflection strength increases, our method can still remove reflection even where other state-of-the-art methods failed.
Submitted 29 March, 2021;
originally announced March 2021.
-
Extractive Summarization of Call Transcripts
Authors:
Pratik K. Biswas,
Aleksandr Iakubovich
Abstract:
Text summarization is the process of extracting the most important information from the text and presenting it concisely in fewer sentences. A call transcript is a text that contains a textual description of a phone conversation between a customer (caller) and agent(s) (customer representatives). This paper presents an indigenously developed method that combines topic modeling and sentence selection with punctuation restoration in condensing ill-punctuated or un-punctuated call transcripts to produce summaries that are more readable. Extensive testing, evaluation and comparisons have demonstrated the efficacy of this summarizer for call transcript summarization.
Submitted 15 April, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
-
A Novel Approach for Earthquake Early Warning System Design using Deep Learning Techniques
Authors:
Tonumoy Mukherjee,
Chandrani Singh,
Prabir Kumar Biswas
Abstract:
Earthquake signals are non-stationary in nature, and thus it is difficult to identify and classify events in real time based on classical approaches like peak ground displacement and peak ground velocity. Even the popular STA/LTA algorithm requires extensive research to determine basic thresholding parameters so as to trigger an alarm. Also, many times due to human error or other unavoidable natural factors such as thunder strikes or landslides, the algorithm may end up raising a false alarm. This work focuses on detecting earthquakes by converting seismograph recorded data into corresponding audio signals for better perception and then uses popular Speech Recognition techniques of Filter bank coefficients and Mel Frequency Cepstral Coefficients (MFCC) to extract the features. These features were then used to train a Convolutional Neural Network (CNN) and a Long Short Term Memory (LSTM) network. The proposed method can overcome the above-mentioned problems and help in detecting earthquakes automatically from the waveforms without much human intervention. For the 1000 Hz audio data set, the CNN model showed a testing accuracy of 91.1% for a 0.2-second sample window length while the LSTM model showed 93.99% for the same. A total of 610 sounds consisting of 310 earthquake sounds and 300 non-earthquake sounds were used to train the models. While testing, the total time required for generating the alarm was approximately 2 seconds, which included individual times for data collection, processing, and prediction, taking into consideration the processing and prediction delays. This shows the effectiveness of the proposed method for Earthquake Early Warning (EEW) applications. Since the input of the method is only the waveform, it is suitable for real-time processing; thus the models can also be used as an onsite EEW system requiring a minimum amount of preparation time and workload.
Submitted 16 January, 2021;
originally announced January 2021.
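The abstract above refers to the classical STA/LTA trigger and its thresholding parameters. As a rough illustration only, here is a minimal pure-Python sketch of a short-term-average over long-term-average energy ratio. The function names, window lengths and the threshold are hypothetical choices made for this example; they are not taken from the paper.

```python
def sta_lta_ratio(samples, n_sta, n_lta):
    """Return STA/LTA energy ratios for every index where both windows fit.

    Both windows end at the same sample; energy is the squared amplitude.
    Window lengths are illustrative assumptions, not values from the paper.
    """
    if not (0 < n_sta < n_lta <= len(samples)):
        raise ValueError("need 0 < n_sta < n_lta <= len(samples)")
    # Prefix sums of energy make each window average an O(1) lookup.
    prefix = [0.0]
    for x in samples:
        prefix.append(prefix[-1] + x * x)
    ratios = []
    for end in range(n_lta, len(samples) + 1):
        sta = (prefix[end] - prefix[end - n_sta]) / n_sta
        lta = (prefix[end] - prefix[end - n_lta]) / n_lta
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios


def first_trigger(ratios, threshold):
    """Index into `ratios` of the first value above `threshold`, else None."""
    for i, r in enumerate(ratios):
        if r > threshold:
            return i
    return None


# Quiet background, a short burst, then quiet again (synthetic data).
signal = [0.1] * 50 + [5.0] * 5 + [0.1] * 10
ratios = sta_lta_ratio(signal, n_sta=5, n_lta=50)
trigger = first_trigger(ratios, threshold=3.0)
```

The trigger position depends strongly on the two window lengths and the threshold, which is the tuning burden the abstract describes.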
-
Adaptive Accessible AR/VR Systems
Authors:
Pradipta Biswas,
Pilar Orero,
Manohar Swaminathan,
Kavita Krishnaswamy,
Peter Robinson
Abstract:
Augmented, virtual and mixed reality technologies offer new ways of interacting with digital media. However, such technologies are not well explored for people with different ranges of abilities beyond a few specific navigation and gaming applications. While new standardization activities are investigating accessibility issues with existing AR/VR systems, commercial systems are still confined to specialized hardware and software limiting their widespread adoption among people with disabilities as well as seniors. This proposal takes a novel approach by exploring the application of user model-based personalization for AR/VR systems to improve accessibility. The workshop will be organized by experienced researchers in the field of human computer interaction, robotics control, assistive technology, and AR/VR systems, and will consist of peer reviewed papers and hands-on demonstrations. Keynote speeches and demonstrations will cover latest accessibility research at Microsoft, Google, Verizon and leading universities.
Submitted 8 January, 2021;
originally announced January 2021.
-
Analysing ocular parameters for web browsing and graph visualization
Authors:
Somnath Arjun,
KamalPreet Singh Saluja,
Pradipta Biswas
Abstract:
This paper proposes a set of techniques to investigate eye gaze and fixation patterns while users interact with electronic user interfaces. In particular, two case studies are presented: one on analysing eye gaze while interacting with deceptive materials in web pages and another on analysing graphs in standard computer monitor and virtual reality displays. We analysed spatial and temporal distributions of eye gaze fixations and sequences of eye gaze movements. We used this information to propose new design guidelines to avoid deceptive materials on the web and for user-friendly representation of data in 2D graphs. In the 2D graph study we identified that the area graph has the lowest number of clusters for users' gaze fixations and the lowest average response time. The results of the 2D graph study were implemented in a virtual and mixed reality environment. Along with this, it was observed that the duration while interacting with deceptive materials in web pages is independent of the number of fixations. Furthermore, a web-based data visualization tool for analysing eye tracking data from single and multiple users was developed.
Submitted 4 January, 2021;
originally announced January 2021.
-
Eye Tracking to Understand Impact of Aging on Mobile Phone Applications
Authors:
Antony William Joseph,
Jeevitha Shree DV,
Kamal Preet Singh Saluja,
Abhishek Mukhopadhyay,
Ramaswami Murugesh,
Pradipta Biswas
Abstract:
Usage of smartphones and tablets has been increasing rapidly with multi-touch interaction and powerful configurations. Performing tasks on mobile phones becomes more complex as people age, thereby increasing their cognitive workload. In this context, we conducted an eye tracking study with 50 participants aged from 20 to 60 years and above, living in Bangalore, India. This paper focuses on the visual nature of interaction with mobile user interfaces. The study aims to investigate how aging affects user experience on mobile phones while performing complex tasks, and to estimate cognitive workload using eye tracking metrics. The study consisted of five tasks that were performed on an Android mobile phone under naturalistic scenarios using eye tracking glasses. We recorded ocular parameters like fixation rate, saccadic rate, average fixation duration, maximum fixation duration and standard deviation of pupil dilation for left and right eyes respectively for each participant. Results from our study show that aging has a bigger effect on performance of using mobile phones irrespective of any complex task given to participants. We noted that participants aged between 50 and 60+ years had difficulties in completing tasks and showed increased cognitive workload. They took longer fixation duration to complete tasks which involved copy-paste operations. Further, we identified design implications and provided design recommendations for designers and manufacturers.
Submitted 4 January, 2021;
originally announced January 2021.
-
Noise Conscious Training of Non Local Neural Network powered by Self Attentive Spectral Normalized Markovian Patch GAN for Low Dose CT Denoising
Authors:
Sutanu Bera,
Prabir Kumar Biswas
Abstract:
The explosive rise of the use of computed tomography (CT) imaging in medical practice has heightened public concern over the patient's associated radiation dose. However, reducing the radiation dose leads to increased noise and artifacts, which degrade the scan's interpretability. Consequently, an advanced image reconstruction algorithm to improve the diagnostic performance of low dose CT arose as the primary concern among researchers, which is challenging due to the ill-posedness of the problem. In recent times, the deep learning-based technique has emerged as a dominant method for low dose CT (LDCT) denoising. However, some common bottlenecks still exist, which hinder deep learning-based techniques from furnishing the best performance. In this study, we attempted to mitigate these problems with three novel accretions. First, we propose a novel convolutional module as the first attempt to utilize neighborhood similarity of CT images for denoising tasks. Our proposed module assisted in boosting the denoising by a significant margin. Next, we moved towards the problem of non-stationarity of CT noise and introduced a new noise aware mean square error loss for LDCT denoising. Moreover, the loss mentioned above also helped to alleviate the laborious effort required while training a CT denoising network using image patches. Lastly, we propose a novel discriminator function for CT denoising tasks. The conventional vanilla discriminator tends to overlook the fine structural details and focus on the global agreement. Our proposed discriminator leverages self-attention and pixel-wise GANs for restoring the diagnostic quality of LDCT images. Our method, validated on a publicly available dataset of the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge, performed remarkably better than the existing state-of-the-art method.
Submitted 11 November, 2020;
originally announced November 2020.
-
Accessibility evaluation of websites using WCAG tools and Cambridge Simulator
Authors:
Shashank Kumar,
JeevithaShree DV,
Pradipta Biswas
Abstract:
There is a plethora of tools available for automatic evaluation of web accessibility with respect to WCAG. This paper compares a set of WCAG tools and their results in terms of ease of comprehension and implementation by web developers. The paper highlights accessibility issues that cannot be captured only through conformance to WCAG tools and proposes additional methods to evaluate accessibility through an Inclusive User Model. We initially selected ten WCAG tools from the W3 website and used a set of these tools on the landing pages of the BBC and WHO websites. We compared their outcomes in terms of commonality, differences, amount of detail and usability. Finally, we briefly introduced the Inclusive User Model and demonstrated how simulation of user interaction can capture usability and accessibility issues that are not detected through WCAG analysis. The paper concludes with a proposal on a Common User Profile format that can be used to compare and contrast accessibility systems and services, and to simulate and personalize interaction for users with different ranges of abilities.
Submitted 14 September, 2020;
originally announced September 2020.
-
Progressive Update Guided Interdependent Networks for Single Image Dehazing
Authors:
Aupendu Kar,
Sobhan Kanti Dhara,
Debashis Sen,
Prabir Kumar Biswas
Abstract:
Images with haze of different varieties often pose a significant challenge to dehazing. Therefore, guidance by estimates of haze parameters related to the variety would be beneficial, and their progressive update jointly with haze reduction will allow effective dehazing. To this end, we propose a multi-network dehazing framework containing novel interdependent dehazing and haze parameter updater networks that operate in a progressive manner. The haze parameters, transmission map and atmospheric light, are first estimated using dedicated convolutional networks that allow color-cast handling. The estimated parameters are then used to guide our dehazing module, where the estimates are progressively updated by novel convolutional networks. The updating takes place jointly with progressive dehazing using a network that invokes inter-step dependencies. The joint progressive updating and dehazing gradually modify the haze parameter values toward achieving effective dehazing. Through different studies, our dehazing framework is shown to be more effective than image-to-image mapping and predefined haze formation model based dehazing. The framework is also found capable of handling a wide variety of hazy conditions with different types and amounts of haze and color casts. Our dehazing framework is qualitatively and quantitatively found to outperform the state-of-the-art on synthetic and real-world hazy images of multiple datasets with varied haze conditions.
Submitted 7 June, 2023; v1 submitted 4 August, 2020;
originally announced August 2020.
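The abstract above names two haze parameters, the transmission map and the atmospheric light. For orientation, the sketch below applies the widely used atmospheric scattering model, I = J*t + A*(1 - t), to a single pixel intensity and then inverts it. Treating this as the "predefined haze formation model" the abstract mentions is an assumption, and the function names and the `t_min` clamp are hypothetical; the paper's progressive networks are not reproduced here.

```python
def add_haze(clear, transmission, airlight):
    """Atmospheric scattering model for one pixel: I = J*t + A*(1 - t)."""
    return clear * transmission + airlight * (1.0 - transmission)


def remove_haze(hazy, transmission, airlight, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A.

    The lower bound t_min (an illustrative value) avoids dividing by a
    near-zero transmission in very dense haze.
    """
    t = max(transmission, t_min)
    return (hazy - airlight) / t + airlight


# Round trip on one synthetic pixel with known parameters.
hazy_pixel = add_haze(clear=0.2, transmission=0.5, airlight=0.9)
recovered = remove_haze(hazy_pixel, transmission=0.5, airlight=0.9)
```

The inversion is exact only when the transmission and atmospheric light are known, which is why errors in their estimates carry straight through to the dehazed result.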
-
Decoding CNN based Object Classifier Using Visualization
Authors:
Abhishek Mukhopadhyay,
Imon Mukherjee,
Pradipta Biswas
Abstract:
This paper investigates how the working of a Convolutional Neural Network (CNN) can be explained through visualization in the context of machine perception of autonomous vehicles. We visualize what types of features are extracted in different convolution layers of a CNN, which helps to understand how the CNN gradually increases spatial information in every layer. Thus, it concentrates on regions of interest in every transformation. Visualizing a heat map of activations helps us to understand how the CNN classifies and localizes different objects in an image. This study also helps us to reason about the low accuracy of a model and helps to increase trust in the object detection module.
Submitted 15 July, 2020;
originally announced July 2020.
-
Lightweight Modules for Efficient Deep Learning based Image Restoration
Authors:
Avisek Lahiri,
Sourav Bairagya,
Sutanu Bera,
Siddhant Haldar,
Prabir Kumar Biswas
Abstract:
Low level image restoration is an integral component of modern artificial intelligence (AI) driven camera pipelines. Most of these frameworks are based on deep neural networks which present a massive computational overhead on resource constrained platforms like mobile phones. In this paper, we propose several lightweight low-level modules which can be used to create a computationally low cost variant of a given baseline model. Recent works on efficient neural network design have mainly focused on classification. However, low-level image processing falls under the 'image-to-image' translation genre, which requires some additional computational modules not present in classification. This paper seeks to bridge this gap by designing generic efficient modules which can replace essential components used in contemporary deep learning based image restoration networks. We also present and analyse our results highlighting the drawbacks of applying depthwise separable convolutional kernels (a popular method for efficient classification networks) for sub-pixel convolution based upsampling (a popular upsampling strategy for low-level vision applications). This shows that concepts from the domain of classification cannot always be seamlessly integrated into image-to-image translation tasks. We extensively validate our findings on three popular tasks of image inpainting, denoising and super-resolution. Our results show that the proposed networks consistently output visually similar reconstructions compared to full capacity baselines with significant reduction of parameters, memory footprint and execution speeds on contemporary mobile devices.
Submitted 11 July, 2020;
originally announced July 2020.
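The abstract above contrasts depthwise separable convolution with sub-pixel convolution based upsampling. As a small arithmetic illustration of why the former is attractive for efficiency, the sketch below counts bias-free weights for a standard convolution and for its depthwise separable counterpart, and gives the channel count a sub-pixel upsampling layer must produce. The layer sizes are hypothetical and the paper's own modules are not reproduced.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def subpixel_out_channels(channels, scale):
    """Channels a conv must emit before a pixel shuffle by `scale`."""
    return channels * scale * scale


# Illustrative layer: 3 x 3 kernel, 64 input channels, 2x upsampling.
c_out = subpixel_out_channels(channels=64, scale=2)
standard = conv_params(3, 64, c_out)
separable = depthwise_separable_params(3, 64, c_out)
```

The count only shows the saving in weights; it says nothing about reconstruction quality, which is where the abstract reports drawbacks for this combination.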
-
Eye Gaze Controlled Interfaces for Head Mounted and Multi-Functional Displays in Military Aviation Environment
Authors:
LRD Murthy,
Abhishek Mukhopadhyay,
Varshit Yellheti,
Somnath Arjun,
Peter Thomas,
M Dilli Babu,
Kamal Preet Singh Saluja,
JeevithaShree DV,
Pradipta Biswas
Abstract:
Eye gaze controlled interfaces allow us to directly manipulate a graphical user interface just by looking at it. This technology has great potential in military aviation, in particular for operating different displays in situations where pilots' hands are occupied with flying the aircraft. This paper reports studies analyzing the accuracy of an eye gaze controlled interface inside aircraft undertaking representative flying missions. We reported that pilots can undertake representative pointing and selection tasks in less than 2 secs on average. Further, we evaluated the accuracy of eye gaze tracking glasses under various G-conditions and analyzed their failure modes. We observed that the accuracy of an eye tracker is less than 5 degrees of visual angle up to +3G, although it is less accurate at minus 1G and plus 5G. We observed that the eye tracker may fail to track under higher external illumination. We also infer that an eye tracker to be used in military aviation needs to have a larger vertical field of view than the presently available systems. We used this analysis to develop eye gaze trackers for Multi-Functional displays and Head Mounted Display System. We obtained a significant reduction in pointing and selection times using our proposed HMDS system compared to traditional TDS.
Submitted 27 May, 2020;
originally announced May 2020.
-
Eye Gaze Controlled Robotic Arm for Persons with SSMI
Authors:
Vinay Krishna Sharma,
L. R. D. Murthy,
KamalPreet Singh Saluja,
Vimal Mollyn,
Gourav Sharma,
Pradipta Biswas
Abstract:
Background: People with severe speech and motor impairment (SSMI) often use a technique called eye pointing to communicate with the outside world. One of their parents, caretakers or teachers holds a printed board in front of them, and their intentions are interpreted by manually analyzing their eye gaze. This technique is often error prone and time consuming and depends on a single caretaker.
Objective: We aimed to automate the eye tracking process electronically by using a commercially available tablet, computer or laptop and without requiring any dedicated hardware for eye gaze tracking. The eye gaze tracker is used to develop a video see through based AR (augmented reality) display that controls a robotic device with eye gaze and is deployed for a fabric printing task.
Methodology: We undertook a user centred design process and separately evaluated the webcam based gaze tracker and the video see through based human robot interaction involving users with SSMI. We also reported a user study on manipulating a robotic arm with the webcam based eye gaze tracker.
Results: Using our bespoke eye gaze controlled interface, able bodied users can select one of nine regions of the screen at a median of less than 2 secs and users with SSMI can do so at a median of 4 secs. Using the eye gaze controlled human-robot AR display, users with SSMI could undertake a representative pick and drop task at an average duration of less than 15 secs and reach a randomly designated target within 60 secs using a COTS eye tracker and at an average time of 2 mins using the webcam based eye gaze tracker.
Submitted 25 May, 2020;
originally announced May 2020.
-
Interactive Sensor Dashboard for Smart Manufacturing
Authors:
LRD Murthy,
Somnath Arjun,
Kamalpreet Singh Saluja,
Pradipta Biswas
Abstract:
This paper presents the development of a smart sensor dashboard for Industry 4.0 encompassing both 2D and 3D visualization modules. For the 2D module, we describe the physical connections among sensors and visualization modules and the rendering of data on a 2D screen. A user study is presented in which participants answered a few questions using four types of graphs. We analyzed eye gaze patterns on screen, the number of correct answers and the response time for all four graphs. For the 3D module, we developed a VR digital twin for sensor data visualization. A user study is presented evaluating the effect of different feedback scenarios on quantitative and qualitative metrics of interaction in the virtual environment. We compared visual feedback, haptic feedback and a multimodal combination of both for the VR environment. We found that haptic feedback significantly improved quantitative metrics of interaction compared to a no-feedback case, whereas multimodal feedback significantly improved qualitative metrics of the interaction.
Submitted 8 May, 2020;
originally announced May 2020.
-
Unsupervised Pre-trained, Texture Aware And Lightweight Model for Deep Learning-Based Iris Recognition Under Limited Annotated Data
Authors:
Manashi Chakraborty,
Mayukh Roy,
Prabir Kumar Biswas,
Pabitra Mitra
Abstract:
In this paper, we present a texture aware lightweight deep learning framework for iris recognition. Our contributions are primarily threefold. Firstly, to address the dearth of labelled iris data, we propose a reconstruction loss guided unsupervised pre-training stage followed by supervised refinement. This drives the network weights to focus on discriminative iris texture patterns. Next, we propose several texture aware improvisations inside a Convolutional Neural Net to better leverage iris textures. Finally, we show that our systematic training and architectural choices enable us to design an efficient framework with up to 100X fewer parameters than contemporary deep learning baselines that nevertheless achieves better recognition performance in within-dataset and cross-dataset evaluations.
Submitted 20 February, 2020;
originally announced February 2020.
-
The Angel is in the Priors: Improving GAN based Image and Sequence Inpainting with Better Noise and Structural Priors
Authors:
Avisek Lahiri,
Arnav Kumar Jain,
Prabir Kumar Biswas
Abstract:
Contemporary deep learning based inpainting algorithms are mainly based on a hybrid dual stage training policy of supervised reconstruction loss followed by an unsupervised adversarial critic loss. However, there is a dearth of literature on fully unsupervised GAN based inpainting frameworks. The primary aversion towards the latter genre is due to its prohibitively slow iterative optimization requirement during inference to find a matching noise prior for a masked image. In this paper, we show that priors matter in GANs: we learn a data driven parametric network to predict a matching prior for a given image. This converts an iterative paradigm to a single feed forward inference pipeline with a massive 1500X speedup and simultaneous improvement in reconstruction quality. We show that an additional structural prior imposed on the GAN model results in higher fidelity outputs. To extend our model for sequence inpainting, we propose a recurrent net based grouped noise prior learning. To our knowledge, this is the first demonstration of unsupervised GAN based sequence inpainting. A further improvement in sequence inpainting is achieved with an additional subsequence consistency loss. These contributions improve the spatio-temporal characteristics of reconstructed sequences. Extensive experiments conducted on SVHN, Stanford Cars, CelebA and CelebA-HQ image datasets, synthetic sequences and VidTIMIT video datasets reveal that we consistently improve upon the previous unsupervised baseline and also achieve performance comparable (and sometimes superior) to hybrid benchmarks.
Submitted 16 August, 2019;
originally announced August 2019.
-
Faster Unsupervised Semantic Inpainting: A GAN Based Approach
Authors:
Avisek Lahiri,
Arnav Kumar Jain,
Divyasri Nadendla,
Prabir Kumar Biswas
Abstract:
In this paper, we propose to improve the inference speed and visual quality of the contemporary baseline for Generative Adversarial Network (GAN) based unsupervised semantic inpainting. This is made possible with better initialization of the core iterative optimization involved in the framework. To the best of our knowledge, this is also the first attempt at GAN based video inpainting with consideration of temporal cues. Compared to the baseline, we achieve a speedup of about 4.5-5$\times$ on single image inpainting and 80$\times$ on videos. Simultaneously, our method has better spatial and temporal reconstruction quality, as found on three image datasets and one video dataset.
Submitted 14 August, 2019;
originally announced August 2019.
-
Fast Bayesian Uncertainty Estimation and Reduction of Batch Normalized Single Image Super-Resolution Network
Authors:
Aupendu Kar,
Prabir Kumar Biswas
Abstract:
Convolutional neural networks (CNNs) have achieved unprecedented success in image super-resolution tasks in recent years. However, a network's performance depends on the distribution of the training sets and degrades on out-of-distribution samples. This paper adopts a Bayesian approach for estimating the uncertainty associated with the output and applies it in a deep image super-resolution model to address the concern mentioned above. We use an uncertainty estimation technique based on the batch-normalization layer, where the stochasticity of the batch mean and variance generates Monte-Carlo (MC) samples. The MC samples, which are different super-resolved images obtained using different stochastic parameters, reconstruct the image and provide a confidence or uncertainty map of the reconstruction. We propose a faster approach for MC sample generation that allows variable image sizes during testing. Therefore, it will be useful for the image reconstruction domain. Our experimental findings show that this uncertainty map strongly relates to the quality of the reconstruction generated by the deep CNN model and explains its limitations. Furthermore, this paper proposes an approach to reduce the model's uncertainty for an input image, which helps to defend against adversarial attacks on the image super-resolution model. The proposed uncertainty reduction technique also improves the performance of the model on out-of-distribution test images. To the best of our knowledge, we are the first to propose an adversarial defense mechanism in any image reconstruction domain.
Submitted 19 May, 2021; v1 submitted 22 March, 2019;
originally announced March 2019.
-
Improved Techniques for GAN based Facial Inpainting
Authors:
Avisek Lahiri,
Arnav Jain,
Divyasri Nadendla,
Prabir Kumar Biswas
Abstract:
In this paper we present several architectural and optimization recipes for generative adversarial network (GAN) based facial semantic inpainting. Current benchmark models are susceptible to the initial solutions of the non-convex optimization criterion of GAN based inpainting. We present an end-to-end trainable parametric network to deterministically start from good initial solutions, leading to more photo realistic reconstructions with a significant optimization speedup. For the first time, we show how to efficiently extend GAN based single image inpainter models to sequences by a) learning to initialize a temporal window of solutions with a recurrent neural network and b) imposing a temporal smoothness loss (during iterative optimization) to respect the redundancy in the temporal dimension of a sequence. We conduct comprehensive empirical evaluations on CelebA images and pseudo sequences, followed by real life videos from the VidTIMIT dataset. The proposed method significantly outperforms the current GAN based state-of-the-art in terms of reconstruction quality with a simultaneous speedup of over 15$\times$. We also show that our proposed model is better at preserving facial identity in a sequence even without explicitly using any face recognition module during training.
Submitted 20 October, 2018;
originally announced October 2018.
-
Unsupervised Domain Adaptation for Learning Eye Gaze from a Million Synthetic Images: An Adversarial Approach
Authors:
Avisek Lahiri,
Abhinav Agarwalla,
Prabir Kumar Biswas
Abstract:
With contemporary advancements in graphics engines, a recent trend in the deep learning community is to train models on automatically annotated simulated examples and apply them to real data at test time. This alleviates the burden of manual annotation. However, there is an inherent difference between the distributions of images coming from a graphics engine and from the real world. Such domain difference deteriorates the test time performance of models trained on synthetic examples. In this paper we address this issue with unsupervised adversarial feature adaptation across the synthetic and real domains for the special use case of eye gaze estimation, which is an essential component of various downstream HCI tasks. We initially learn a gaze estimator on annotated synthetic samples rendered from a 3D game engine and then adapt the features of unannotated real samples via a zero-sum minmax adversarial game against a domain discriminator, following the recent paradigm of generative adversarial networks. Such adversarial adaptation forces the features of both domains to be indistinguishable, which enables regression models trained on the synthetic domain to be used on real samples. On the challenging MPIIGaze real life dataset, we outperform recent fully supervised methods trained on manually annotated real samples by appreciable margins and also achieve 13% more relative gain after adaptation compared to the current benchmark method of SimGAN.
Submitted 18 October, 2018;
originally announced October 2018.
-
Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images
Authors:
Avisek Lahiri,
Charan Reddy,
Prabir Kumar Biswas
Abstract:
Deep learning based object detectors require thousands of diversified bounding box and class annotated examples. Though image object detectors have shown rapid progress in recent years with the release of multiple large-scale static image datasets, object detection on videos still remains an open problem due to scarcity of annotated video frames. Having a robust video object detector is an essential component for video understanding and curating large-scale automated annotations in videos. Domain difference between images and videos makes the transferability of image object detectors to videos sub-optimal. The most common solution is to use weakly supervised annotations where a video frame has to be tagged for presence/absence of object categories. This still takes up manual effort. In this paper we take a step forward by adapting the concept of unsupervised adversarial image-to-image translation to perturb static high quality images to be visually indistinguishable from a set of video frames. We assume the presence of a fully annotated static image dataset and an unannotated video dataset. Object detector is trained on adversarially transformed image dataset using the annotations of the original dataset. Experiments on Youtube-Objects and Youtube-Objects-Subset datasets with two contemporary baseline object detectors reveal that such unsupervised pixel level domain adaptation boosts the generalization performance on video frames compared to direct application of original image object detector. Also, we achieve competitive performance compared to recent baselines of weakly supervised methods. This paper can be seen as an application of image translation for cross domain object detection.
Submitted 4 October, 2018;
originally announced October 2018.
-
State-Space Identification of Unmanned Helicopter Dynamics using Invasive Weed Optimization Algorithm on Flight Data
Authors:
Navaneethkrishnan B,
Pranjal Biswas,
Saumya Kumaar Saksena,
Gautham Anand,
S N Omkar
Abstract:
In order to achieve a good level of autonomy in unmanned helicopters, an accurate replication of vehicle dynamics is required, which is achievable through precise mathematical modeling. This paper aims to identify a parametric state-space system for an unmanned helicopter to a good level of accuracy using the Invasive Weed Optimization (IWO) algorithm. The flight data of an Align TREX 550 flybarless helicopter is used in the identification process. The rigid-body dynamics of the helicopter are modeled in a state-space form that has 40 parameters, which serve as control variables for the IWO algorithm. The results after 1000 iterations were compared with the traditionally used Prediction Error Minimization (PEM) method and also with a Genetic Algorithm (GA), which serve as references. Results show a better level of correlation between the actual and estimated responses for the system identified using IWO than for those identified using PEM and GA.
Submitted 2 September, 2018;
originally announced September 2018.
-
Retinal Vessel Segmentation under Extreme Low Annotation: A Generative Adversarial Network Approach
Authors:
Avisek Lahiri,
Vineet Jain,
Arnab Mondal,
Prabir Kumar Biswas
Abstract:
Contemporary deep learning based medical image segmentation algorithms require hours of annotation labor by domain experts. These data hungry deep models perform sub-optimally in the presence of limited amount of labeled data. In this paper, we present a data efficient learning framework using the recent concept of Generative Adversarial Networks; this allows a deep neural network to perform significantly better than its fully supervised counterpart in low annotation regime. The proposed method is an extension of our previous work with the addition of a new unsupervised adversarial loss and a structured prediction based architecture. To the best of our knowledge, this work is the first demonstration of an adversarial framework based structured prediction model for medical image segmentation. Though generic, we apply our method for segmentation of blood vessels in retinal fundus images. We experiment with extreme low annotation budget (0.8 - 1.6% of contemporary annotation size). On DRIVE and STARE datasets, the proposed method outperforms our previous method and other fully supervised benchmark models by significant margins especially with very low number of annotated examples. In addition, our systematic ablation studies suggest some key recipes for successfully training GAN based semi-supervised algorithms with an encoder-decoder style network architecture.
Submitted 5 September, 2018;
originally announced September 2018.
-
Improving Consistency and Correctness of Sequence Inpainting using Semantically Guided Generative Adversarial Network
Authors:
Avisek Lahiri,
Arnav Jain,
Prabir Kumar Biswas,
Pabitra Mitra
Abstract:
Contemporary benchmark methods for image inpainting are based on deep generative models and specifically leverage adversarial loss for yielding realistic reconstructions. However, these models cannot be directly applied to image/video sequences because of an intrinsic drawback: the reconstructions might be independently realistic but, when visualized as a sequence, often lack fidelity to the original uncorrupted sequence. The fundamental reason is that these methods try to find the best matching latent space representation near the natural image manifold without any explicit distance based loss. In this paper, we present a semantically conditioned Generative Adversarial Network (GAN) for sequence inpainting. The conditional information constrains the GAN to map a latent representation to a point in the image manifold respecting the underlying pose and semantics of the scene. To the best of our knowledge, this is the first work which simultaneously addresses consistency and correctness of generative model based inpainting. We show that our generative model learns to disentangle pose and appearance information; this independence is exploited by our model to generate highly consistent reconstructions. The conditional information also aids the generator network in the GAN to produce sharper images compared to the original GAN formulation. This helps in achieving more appealing inpainting performance. Though generic, our algorithm was targeted at inpainting on faces. When applied to the CelebA and Youtube Faces datasets, the proposed method results in a significant improvement over the current benchmark, both in terms of quantitative evaluation (Peak Signal to Noise Ratio) and human visual scoring over diversified combinations of resolutions and deformations.
Submitted 17 November, 2017; v1 submitted 16 November, 2017;
originally announced November 2017.
-
Modeling and Performance Comparison of Privacy Approaches for Location Based Services
Authors:
Pratima Biswas,
Ashok Singh Sairam
Abstract:
In pervasive computing environments, Location Based Services (LBSs) are gaining popularity among users because of their usefulness in day-to-day life. LBSs are information services that use geospatial data of mobile device and smart phone users to provide information, entertainment and security in real time. A key concern in such pervasive computing environments is the need to reveal the user's exact location, which may allow an adversary to infer private information about the user. To address the privacy concerns of LBS users, a large number of security approaches have been proposed based on the concept of k-anonymity. The central idea in location k-anonymity is to find a set of k-1 users confined in a given geographical area of the actual user, such that the locations of these k users are indistinguishable from one another, thus protecting the identity of the user. Although a number of performance parameters, such as success rate and amount of privacy achieved, are used to measure the performance of the k-anonymity approaches, they make the implicit, unrealistic assumption that the k-1 users are readily available. As such, these approaches ignore the turnaround time to process a user request, which is crucial for a real-time application like LBS. In this work, we model the k-anonymity approaches using queuing theory to compute the average sojourn time of users and the queue length of the system. To demonstrate that queuing theory can be used to model all k-anonymity approaches, we consider graph-based k-anonymity approaches. The proposed analytical model is further validated with experimental results.
Submitted 14 November, 2017;
originally announced November 2017.
-
JU_KS_Group@FIRE 2016: Consumer Health Information Search
Authors:
Kamal Sarkar,
Debanjan Das,
Indra Banerjee,
Mamta Kumari,
Prasenjit Biswas
Abstract:
In this paper, we describe the methodology we used and the results we obtained for the shared task on Consumer Health Information Search (CHIS) co-located with the Forum for Information Retrieval Evaluation (FIRE) 2016, ISI Kolkata. The shared task consists of two sub-tasks: (1) task 1: given a query and a document/set of documents associated with that query, the task is to classify the sentences in the document as relevant to the query or not, and (2) task 2: the relevant sentences need to be further classified as supporting the claim made in the query or opposing the claim made in the query. We participated in both sub-tasks. The percentage accuracy obtained by our system for task 1 was 73.39, which is the third highest among the 9 teams that participated in the shared task.
Submitted 24 December, 2016;
originally announced December 2016.
-
Deep Neural Ensemble for Retinal Vessel Segmentation in Fundus Images towards Achieving Label-free Angiography
Authors:
Avisek Lahiri,
Abhijit Guha Roy,
Debdoot Sheet,
Prabir Kumar Biswas
Abstract:
Automated segmentation of retinal blood vessels in label-free fundus images plays a pivotal role in computer-aided diagnosis of ophthalmic pathologies, viz., diabetic retinopathy, hypertensive disorders and cardiovascular diseases. The challenge remains active in medical image analysis research due to the varied distribution of blood vessels, which manifest variations in their dimensions of physical appearance against a noisy background.
In this paper we formulate the segmentation challenge as a classification task. Specifically, we employ unsupervised hierarchical feature learning using an ensemble of two levels of sparsely trained denoised stacked autoencoders. First level training with bootstrap samples ensures decoupling, and the second level ensemble formed by different network architectures ensures architectural revision. We show that ensemble training of auto-encoders fosters diversity in learning a dictionary of visual kernels for vessel segmentation. A SoftMax classifier is used for fine tuning each member auto-encoder, and multiple strategies are explored for 2-level fusion of ensemble members. On the DRIVE dataset, we achieve a maximum average accuracy of 95.33% with an impressively low standard deviation of 0.003 and a Kappa agreement coefficient of 0.708. Comparison with other major algorithms substantiates the high efficacy of our model.
Submitted 19 September, 2016;
originally announced September 2016.
-
Forward Stagewise Additive Model for Collaborative Multiview Boosting
Authors:
Avisek Lahiri,
Biswajit Paria,
Prabir Kumar Biswas
Abstract:
Multiview assisted learning has gained significant attention in recent years in the supervised learning genre. The availability of high performance computing devices enables learning algorithms to search simultaneously over multiple views or feature spaces to obtain an optimum classification performance. The paper is a pioneering attempt at formulating a mathematical foundation for realizing a multiview aided collaborative boosting architecture for multiclass classification. Most of the present algorithms apply multiview learning heuristically without exploring the fundamental mathematical changes imposed on traditional boosting. Also, most of the algorithms are restricted to a two class or two view setting. Our proposed mathematical framework enables collaborative boosting across any finite dimensional view spaces for multiclass learning. The boosting framework is based on a forward stagewise additive model which minimizes a novel exponential loss function. We show that the exponential loss function essentially captures the difficulty of a training sample space instead of the traditional `1/0' loss. The new algorithm restricts a weak view from over learning and thereby prevents overfitting. The model is inspired by our earlier attempt at collaborative boosting, which was devoid of mathematical justification. The proposed algorithm is shown to converge much nearer to the global minimum in the exponential loss space and thus supersedes our previous algorithm. The paper also presents analytical and numerical analysis of convergence and margin bounds for multiview boosting algorithms, and we show that our proposed ensemble learning manifests a lower error bound and a higher margin compared to our previous model. Also, the proposed model is compared with traditional boosting and recent multiview boosting algorithms.
Submitted 5 August, 2016;
originally announced August 2016.
-
WEPSAM: Weakly Pre-Learnt Saliency Model
Authors:
Avisek Lahiri,
Sourya Roy,
Anirban Santara,
Pabitra Mitra,
Prabir Kumar Biswas
Abstract:
Visual saliency detection tries to mimic human vision psychology, which concentrates on sparse, important areas in a natural image. Saliency prediction research has traditionally been based on low level features such as contrast, edges, etc. The recent thrust in saliency prediction research is to learn high level semantics using ground truth eye fixation datasets. In this paper we present WEPSAM: Weakly Pre-Learnt Saliency Model, a pioneering effort at using domain specific pre-learning on ImageNet for saliency prediction with a lightweight CNN architecture. The paper proposes a two step hierarchical learning, in which the first step is to develop a framework for weakly pre-training on a large scale dataset such as ImageNet, which is devoid of human eye fixation maps. The second step refines the pre-trained model on a limited set of ground truth fixations. Analysis of the loss on the iSUN and SALICON datasets reveals that the pre-trained network converges much faster than a randomly initialized network. WEPSAM also outperforms some recent state-of-the-art saliency prediction models on the challenging MIT300 dataset.
Submitted 3 May, 2016;
originally announced May 2016.