This repository contains the code for the paper "Incorporating graph-based prior knowledge into Bayesian Neural Networks"
In order to run the code, you need to install the project dependencies first. Recommend way to do this is to first
create a conda environment and install the packages in the requirements.txt
>> conda create --name bnn_bg
>> conda activate bnn_bg
>> pip install -r requirements.txt
In addition to the package dependencies listed in requirements.txt
, the project depends on two other projects:
- netZooPy: For running the PANDA algorithm [1] and generating the prior knowledge graph
- HorseshoeBNN: For running the Horseshoe BNN model [2], which is one of the baseline models.
We had to make changes to parts of the code in both of these projects. We made the following changes:
- netZooPy: We added a functionality to save the updated Protein-Protein Interaction (PPI) and correlation matrices. The original implementation can be found here.
- HorseshoeBNN: We added a functionality to support more than one layer. The original implementation (can be found here) only supported one layer.
In order to adhere to the double-blind review policy, we anonymized the changes we made to both projects and included a compressed version of the modified projects. To install the modified versions of these projects, run the following commands:
unzip netZooPy.zip
cd netZooPy
pip install -r requirements.txt
pip install .
unzip horseshoe-bnn.zip
cd horseshoe-bnn
pip install -r requirements.txt
pip install .
The GDSC drug experiments are run using the run_drug_exps.py
scripts. To reproduce the results in the paper, you
should use the random seeds in the seeds.txt
file. The data for the experiments can be downloaded from here (in zip format). The
data should be placed in the data/gdsc
(the default path) folder.
The following command runs the experiments for the GDSC drugs for the BNN + BG, BNN w/o BG and Random
Forest models. The results are saved in the data/gdsc/exps
folder.
./run_drug_exps.py --seed seeds.txt
To run the experiments for the Horseshoe BNN model, you should set the horseshoe_bnn
flag to True
:
./run_drug_exps.py --seed seeds.txt --horseshoe_bnn 1
The GAT model is run using the run_gnn.py
script.
./run_gnn.py --seed seeds.txt
./run_ft_rank.py --seed seeds.txt
Note: The feature ranking experiment needs to be run after the GDSC drug experiments as it uses the results from these experiments.
The public datasets experiments are run using the run_public_exps.py
script. The datasets can be downloaded from
here (in zip format). The data
should be placed in the data/pub_data
(the default path) folder. The results in the
paper are obtained using 10-fold cross-validation. To reproduce the results, you should use the first 10 seeds in
the seeds.txt
file.
head -n 10 seeds.txt > seeds_10.txt
./run_public_exps.py --seed seeds_10.txt
The above command runs the experiments for the BNN + BG, BNN w/o BG and Random Forest models. The results are saved
in the data/pub_data/exps
folder. To run the experiments for the Horseshoe BNN model, you should set the horseshoe_bnn
flag to True
:
head -n 10 seeds.txt > seeds_10.txt
./run_public_exps.py --seed seeds_10.txt --horseshoe_bnn 1
./run_drug_ft_rank.py --seed seeds.txt --exp_dir path/to/experiment/results
The summary tables in the paper are generated using the gen_summary_table.py
script. To generate the tables,
run the following command:
./gen_summary_table.py --seed seeds.txt --exp_dir path/to/experiment/results --save_dir path/to/save/tables
--data_type gdsc
Use the --data_type
flag to specify whether to generate the tables for the GDSC or public datasets experiments.
Note: You have to run the experiments first before generating the summary tables.
The feature ranking plots in the paper are generated using the gen_ft_rank_plots.py
script. To generate the plots,
run the following command:
./gen_ft_rank_plots.py --seed seeds.txt --exp_dir path/to/experiment/results --save_dir path/to/save/plots
Note: You have to run the feature ranking experiments first before generating the plots.