Search | arXiv e-print repository

A large dataset curation and benchmark for drug target interaction

Authors: Alex Golts, Vadim Ratner, Yoel Shoshan, Moshe Raboh, Sagi Polaczek, Michal Ozery-Flato, Daniel Shats, Liam Hazan, Sivan Ravid, Efrat Hexter

Abstract: Bioactivity data plays a key role in drug discovery and repurposing. The resource-demanding nature of \textit{in vitro} and \textit{in vivo} experiments, as well as the recent advances in data-driven computational biochemistry research, highlight the importance of \textit{in silico} drug target interaction (DTI) prediction approaches. While numerous large public bioactivity data sources exist, res… ▽ More Bioactivity data plays a key role in drug discovery and repurposing. The resource-demanding nature of \textit{in vitro} and \textit{in vivo} experiments, as well as the recent advances in data-driven computational biochemistry research, highlight the importance of \textit{in silico} drug target interaction (DTI) prediction approaches. While numerous large public bioactivity data sources exist, research in the field could benefit from better standardization of existing data resources. At present, different research works that share similar goals are often difficult to compare properly because of different choices of data sources and train/validation/test split strategies. Additionally, many works are based on small data subsets, leading to results and insights of possible limited validity. In this paper we propose a way to standardize and represent efficiently a very large dataset curated from multiple public sources, split the data into train, validation and test sets based on different meaningful strategies, and provide a concrete evaluation protocol to accomplish a benchmark. We analyze the proposed data curation, prove its usefulness and validate the proposed benchmark through experimental studies based on an existing neural network model. △ Less

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2307.01984 [pdf, other]

The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CT

Authors: Nicholas Heller, Fabian Isensee, Dasha Trofimova, Resha Tejpaul, Zhongchen Zhao, Huai Chen, Lisheng Wang, Alex Golts, Daniel Khapun, Daniel Shats, Yoel Shoshan, Flora Gilboa-Solomon, Yasmeen George, Xi Yang, Jianpeng Zhang, Jing Zhang, Yong Xia, Mengran Wu, Zhiyang Liu, Ed Walczak, Sean McSweeney, Ranveer Vasdev, Chris Hornung, Rafat Solaiman, Jamee Schoephoerster , et al. (20 additional authors not shown)

Abstract: This paper presents the challenge report for the 2021 Kidney and Kidney Tumor Segmentation Challenge (KiTS21) held in conjunction with the 2021 international conference on Medical Image Computing and Computer Assisted Interventions (MICCAI). KiTS21 is a sequel to its first edition in 2019, and it features a variety of innovations in how the challenge was designed, in addition to a larger dataset.… ▽ More This paper presents the challenge report for the 2021 Kidney and Kidney Tumor Segmentation Challenge (KiTS21) held in conjunction with the 2021 international conference on Medical Image Computing and Computer Assisted Interventions (MICCAI). KiTS21 is a sequel to its first edition in 2019, and it features a variety of innovations in how the challenge was designed, in addition to a larger dataset. A novel annotation method was used to collect three separate annotations for each region of interest, and these annotations were performed in a fully transparent setting using a web-based annotation tool. Further, the KiTS21 test set was collected from an outside institution, challenging participants to develop methods that generalize well to new populations. Nonetheless, the top-performing teams achieved a significant improvement over the state of the art set in 2019, and this performance is shown to inch ever closer to human-level performance. An in-depth meta-analysis is presented describing which methods were used and how they faired on the leaderboard, as well as the characteristics of which cases generally saw good performance, and which did not. Overall KiTS21 facilitated a significant advancement in the state of the art in kidney tumor segmentation, and provides useful insights that are applicable to the field of semantic segmentation as a whole. △ Less

Submitted 4 July, 2023; originally announced July 2023.

Comments: 34 pages, 12 figures

arXiv:2303.11971 [pdf, other]

Defect Detection Approaches Based on Simulated Reference Image

Authors: Nati Ofir, Yotam Ben Shoshan, Ran Badanes, Boris Sherman

Abstract: This work is addressing the problem of defect anomaly detection based on a clean reference image. Specifically, we focus on SEM semiconductor defects in addition to several natural image anomalies. There are well-known methods to create a simulation of an artificial reference image by its defect specimen. In this work, we introduce several applications for this capability, that the simulated refer… ▽ More This work is addressing the problem of defect anomaly detection based on a clean reference image. Specifically, we focus on SEM semiconductor defects in addition to several natural image anomalies. There are well-known methods to create a simulation of an artificial reference image by its defect specimen. In this work, we introduce several applications for this capability, that the simulated reference is beneficial for improving their results. Among these defect detection methods are classic computer vision applied on difference-image, supervised deep-learning (DL) based on human labels, and unsupervised DL which is trained on feature-level patterns of normal reference images. We show in this study how to incorporate correctly the simulated reference image for these defect and anomaly detection applications. As our experiment demonstrates, simulated reference achieves higher performance than the real reference of an image of a defect and anomaly. This advantage of simulated reference occurs mainly due to the less noise and geometric variations together with better alignment and registration to the original defect background. △ Less

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2302.08291 [pdf]

On-chip fully reconfigurable Artificial Neural Network in 16 nm FinFET for Positron Emission Tomography

Authors: Andrada Muntean, Yonatan Shoshan, Slava Yuzhaninov, Emanuele Ripiccini, Claudio Bruschini, Alexander Fish, Edoardo Charbon

Abstract: Smarty is a fully-reconfigurable on-chip feed-forward artificial neural network (ANN) with ten integrated time-to-digital converters (TDCs) designed in a 16 nm FinFET CMOS technology node. The integration of TDCs together with an ANN aims to reduce system complexity and minimize data throughput requirements in positron emission tomography (PET) applications. The TDCs have an average LSB of 53.5 ps… ▽ More Smarty is a fully-reconfigurable on-chip feed-forward artificial neural network (ANN) with ten integrated time-to-digital converters (TDCs) designed in a 16 nm FinFET CMOS technology node. The integration of TDCs together with an ANN aims to reduce system complexity and minimize data throughput requirements in positron emission tomography (PET) applications. The TDCs have an average LSB of 53.5 ps. The ANN is fully reconfigurable, the user being able to change its topology as desired within a set of constraints. The chip can execute 363 MOPS with a maximum power consumption of 1.9 mW, for an efficiency of 190 GOPS/W. The system performance was tested in a coincidence measurement setup interfacing Smarty with two groups of five 4 mm x 4 mm analog silicon photomultipliers (A-SiPMs) used as inputs for the TDCs. The ANN successfully distinguished between six different positions of a radioactive source placed between the two photodetector arrays by solely using the TDC timestamps. △ Less

Submitted 16 February, 2023; originally announced February 2023.

Comments: 13 pages, 24 Figures

arXiv:1811.07311 [pdf, other]

Regularized adversarial examples for model interpretability

Authors: Yoel Shoshan, Vadim Ratner

Abstract: As machine learning algorithms continue to improve, there is an increasing need for explaining why a model produces a certain prediction for a certain input. In recent years, several methods for model interpretability have been developed, aiming to provide explanation of which subset regions of the model input is the main reason for the model prediction. In parallel, a significant research communi… ▽ More As machine learning algorithms continue to improve, there is an increasing need for explaining why a model produces a certain prediction for a certain input. In recent years, several methods for model interpretability have been developed, aiming to provide explanation of which subset regions of the model input is the main reason for the model prediction. In parallel, a significant research community effort is occurring in recent years for developing adversarial example generation methods for fooling models, while not altering the true label of the input,as it would have been classified by a human annotator. In this paper, we bridge the gap between adversarial example generation and model interpretability, and introduce a modification to the adversarial example generation process which encourages better interpretability. We analyze the proposed method on a public medical imaging dataset, both quantitatively and qualitatively, and show that it significantly outperforms the leading known alternative method. Our suggested method is simple to implement, and can be easily plugged into most common adversarial example generation frameworks. Additionally, we propose an explanation quality metric - $APE$ - "Adversarial Perturbative Explanation", which measures how well an explanation describes model decisions. △ Less

Submitted 21 November, 2018; v1 submitted 18 November, 2018; originally announced November 2018.

arXiv:1805.11837 [pdf, other]

Learning multiple non-mutually-exclusive tasks for improved classification of inherently ordered labels

Authors: Vadim Ratner, Yoel Shoshan, Tal Kachman

Abstract: Medical image classification involves thresholding of labels that represent malignancy risk levels. Usually, a task defines a single threshold, and when developing computer-aided diagnosis tools, a single network is trained per such threshold, e.g. as screening out healthy (very low risk) patients to leave possibly sick ones for further analysis (low threshold), or trying to find malignant cases a… ▽ More Medical image classification involves thresholding of labels that represent malignancy risk levels. Usually, a task defines a single threshold, and when developing computer-aided diagnosis tools, a single network is trained per such threshold, e.g. as screening out healthy (very low risk) patients to leave possibly sick ones for further analysis (low threshold), or trying to find malignant cases among those marked as non-risk by the radiologist ("second reading", high threshold). We propose a way to rephrase the classification problem in a manner that yields several problems (corresponding to different thresholds) to be solved simultaneously. This allows the use of Multiple Task Learning (MTL) methods, significantly improving the performance of the original classifier, by facilitating effective extraction of information from existing data. △ Less

Submitted 21 November, 2018; v1 submitted 30 May, 2018; originally announced May 2018.

arXiv:1805.11601 [pdf, other]

AdapterNet - learning input transformation for domain adaptation

Authors: Alon Hazan, Yoel Shoshan, Daniel Khapun, Roy Aladjem, Vadim Ratner

Abstract: Deep neural networks have demonstrated impressive performance in various machine learning tasks. However, they are notoriously sensitive to changes in data distribution. Often, even a slight change in the distribution can lead to drastic performance reduction. Artificially augmenting the data may help to some extent, but in most cases, fails to achieve model invariance to the data distribution. So… ▽ More Deep neural networks have demonstrated impressive performance in various machine learning tasks. However, they are notoriously sensitive to changes in data distribution. Often, even a slight change in the distribution can lead to drastic performance reduction. Artificially augmenting the data may help to some extent, but in most cases, fails to achieve model invariance to the data distribution. Some examples where this sub-class of domain adaptation can be valuable are various imaging modalities such as thermal imaging, X-ray, ultrasound, and MRI, where changes in acquisition parameters or acquisition device manufacturer will result in a different representation of the same input. Our work shows that standard fine-tuning fails to adapt the model in certain important cases. We propose a novel method of adapting to a new data source, and demonstrate near perfect adaptation on a customized ImageNet benchmark. Moreover, our method does not require any samples from the original data set, it is completely explainable and can be tailored to the task. △ Less

Submitted 15 November, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

Showing 1–7 of 7 results for author: Shoshan, Y