-
Understanding random forests and overfitting: a visualization and simulation study
Authors:
Lasai Barreñada,
Paula Dhiman,
Dirk Timmerman,
Anne-Laure Boulesteix,
Ben Van Calster
Abstract:
Random forests have become popular for clinical risk prediction modelling. In a case study on predicting ovarian malignancy, we observed training c-statistics close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behaviour of random forests by (1) visualizing data space in three real-world case studies and (2) a simulation study. For the case studies, risk estimates were visualized using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data generating mechanisms (DGM), varying the predictor distribution, the number of predictors, the correlation between predictors, the true c-statistic and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 were simulated and RF models trained with minimum node size 2 or 20 using the ranger package, resulting in 192 scenarios in total. The visualizations suggested that the model learned spikes of probability around events in the training set. A cluster of events created a bigger peak, whereas isolated events created local peaks. In the simulation study, median training c-statistics were between 0.97 and 1 unless there were 4 or 16 binary predictors with minimum node size 20. Median test c-statistics were higher with higher events per variable, higher minimum node size, and binary predictors. Median training slopes were always above 1, and were not correlated with median test slopes across scenarios (correlation -0.11). Median test slopes were higher with higher true c-statistic, higher minimum node size, and higher sample size. Random forests learn local probability peaks that often yield near-perfect training c-statistics without strongly affecting c-statistics on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.
Submitted 28 February, 2024;
originally announced February 2024.
-
The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression
Authors:
Ruben van den Goorbergh,
Maarten van Smeden,
Dirk Timmerman,
Ben Van Calster
Abstract:
Methods to correct class imbalance, i.e. imbalance between the frequency of outcome events and non-events, are receiving increasing interest for developing prediction models. We examined the effect of imbalance correction on the performance of standard and penalized (ridge) logistic regression models in terms of discrimination, calibration, and classification. We examined random undersampling, random oversampling and SMOTE using Monte Carlo simulations and a case study on ovarian cancer diagnosis. The results indicated that all imbalance correction methods led to poor calibration (strong overestimation of the probability to belong to the minority class), but not to better discrimination in terms of the area under the receiver operating characteristic curve. Imbalance correction improved classification in terms of sensitivity and specificity, but similar results were obtained by shifting the probability threshold instead. Our study shows that outcome imbalance is not a problem in itself, and that imbalance correction may even worsen model performance.
Submitted 18 February, 2022;
originally announced February 2022.
-
Video Camera Identification from Sensor Pattern Noise with a Constrained ConvNet
Authors:
Derrick Timmerman,
Swaroop Bennabhaktula,
Enrique Alegre,
George Azzopardi
Abstract:
The identification of source cameras from videos, though it is a highly relevant forensic analysis topic, has been studied much less than its counterpart that uses images. In this work we propose a method to identify the source camera of a video based on camera-specific noise patterns that we extract from video frames. For the extraction of noise pattern features, we propose an extended version of a constrained convolutional layer capable of processing color inputs. Our system is designed to classify individual video frames, which are in turn combined by a majority vote to identify the source camera. We evaluated this approach on the benchmark VISION data set consisting of 1539 videos from 28 different cameras. To the best of our knowledge, this is the first work that addresses the challenge of video camera identification on a device level. The experiments show that our approach is very promising, achieving up to 93.1% accuracy while being robust to the WhatsApp and YouTube compression techniques. This work is part of the EU-funded project 4NSEEK focused on forensics against child sexual abuse.
Submitted 11 December, 2020;
originally announced December 2020.
-
Excitation efficiency and limitations of the luminescence of Eu3+ ions in GaN
Authors:
Dolf Timmerman,
Brandon Mitchell,
Shuhei Ichikawa,
Jun Tatebayashi,
Masaaki Ashida,
Yasufumi Fujiwara
Abstract:
The excitation efficiency and external luminescence quantum efficiency of trivalent Eu3+ ions doped into gallium nitride (GaN) were studied under optical and electrical excitation. For small pump fluences it was found that the excitation of Eu3+ ions is limited by an efficient carrier trap that competes in the energy transfer from the host material. For large pump fluences the limited number of high-efficiency Eu3+ sites, and the small excitation cross-section of the majority Eu3+ site, limit the quantum efficiency. At low temperatures under optimal excitation conditions, the external luminescence quantum efficiency reached a value of 46%. These results show the high potential of this material as an efficient light emitter, and demonstrate the importance of the excitation conditions for the light output efficiency.
Submitted 3 September, 2019;
originally announced September 2019.
-
Dosimetric and Deformation Effects of Image-Guided Interventions during Stereotactic Body Radiation Therapy of the Prostate using an Endorectal Balloon
Authors:
Bernard L. Jones,
Gregory Gan,
Quentin Diot,
Brian Kavanagh,
Robert D. Timmerman,
Moyed Miften
Abstract:
During Stereotactic Body Radiotherapy (SBRT) for the treatment of prostate cancer, an inflatable endorectal balloon (ERB) may be used to reduce motion of the target and reduce the dose to the posterior rectal wall. This work assessed the dosimetric impact of manual interventions on ERB position in patients receiving prostate SBRT and investigated the impact of ERB interventions on prostate shape. Daily kilovoltage (kV) cone-beam computed tomography (CBCT) imaging was performed to localize the PTV, and an automated fusion with the planning images yielded displacements required for PTV re-localization. When the ERB volume and/or position were judged to yield inaccurate repositioning, manual adjustment (ERB re-inflation and/or repositioning) was performed. Based on all 59 CBCT image sets acquired, a deformable registration algorithm was used to determine the dose received by, displacement of, and deformation of the prostate, bladder, and anterior rectal wall. This dose tracking methodology was applied to images taken before and after manual adjustment of the ERB (intervention), and the delivered dose was compared to that which would have been delivered in the absence of intervention. Interventions occurred in 24 out of 35 (69%) of the treated fractions. The direct effect of these interventions was an increase in the prostate radiation dose that included 95% of the PTV (D95) and an increase in prostate coverage. Additionally, ERB interventions reduced prostate deformation in the anterior-posterior (AP) direction, reduced errors in the sagittal rotation of the prostate, and increased the similarity in shape of the prostate to the radiotherapy plan. Image-guided interventions in ERB volume and/or position during prostate SBRT were necessary to ensure the delivery of the dose distribution as planned.
Submitted 10 July, 2013;
originally announced July 2013.
-
Direct bandgap optical transitions in Si nanocrystals
Authors:
A. A. Prokofiev,
A. S. Moskalenko,
I. N. Yassievich,
W. D. A. M. de Boer,
D. Timmerman,
H. Zhang,
W. J. Buma,
T. Gregorkiewicz
Abstract:
The effect of quantum confinement on the direct bandgap of spherical Si nanocrystals has been modelled theoretically. We conclude that the energy of the direct bandgap at the $Γ$-point decreases with size reduction: quantum confinement enhances radiative recombination across the direct bandgap and introduces its "red" shift for smaller grains. We propose identifying the frequently reported efficient blue emission (F-band) from Si nanocrystals with this zero-phonon recombination. In a dedicated experiment, we confirm the "red" shift of the F-band, supporting the proposed identification.
Submitted 5 February, 2010; v1 submitted 27 January, 2009;
originally announced January 2009.
-
Energy transfer processes in Er-doped SiO2 sensitized with Si nanocrystals
Authors:
I. Izeddin,
D. Timmerman,
T. Gregorkiewicz,
A. S. Moskalenko,
A. A. Prokofiev,
I. N. Yassievich,
M. Fujii
Abstract:
We present a high-resolution photoluminescence study of Er-doped SiO2 sensitized with Si nanocrystals (Si NCs). Emission bands originating from recombination of excitons confined in Si NCs and of internal transitions within the 4f-electron core of Er3+ ions, and a band centered at $\lambda = 1200$ nm have been identified. Their kinetics have been investigated in detail. Based on these measurements, we present a comprehensive model for energy transfer mechanisms responsible for light generation in this system. A unique picture of energy flow between subsystems of Er3+ and Si NCs is developed, yielding truly microscopic information on the sensitization effect and its limitations. In particular, we show that most of the Er3+ ions available in the system are participating in the energy exchange. The long-standing problem of the apparent loss of optical activity of the majority of Er dopants upon sensitization with Si NCs is clarified and attributed to the appearance of a very efficient energy exchange mechanism between Si NCs and Er3+ ions. The application potential of SiO2:Er sensitized by Si NCs is discussed in view of the newly acquired microscopic insight.
Submitted 5 June, 2008;
originally announced June 2008.