Distillation Contrastive Decoding (DCD) Evaluation

Overview

This package evaluates the performance of Large Language Models (LLMs) on standard benchmarks. For details on the evaluation process, please refer to our DCD paper.

Installation

# If you have already done this, you can skip these steps
git clone https://github.com/pphuc25/distillation-contrastive-decoding.git
cd distillation-contrastive-decoding
pip install -e .

# Setting up the evaluation environment
cd dcd_eval
bash install_packages.sh

Basic Usage

To evaluate the generative performance of a language model on a specific dataset (GSM8K or StrategyQA), use the following command:

python3 src/run_generation.py \
    --model_name_or_path $model_name_or_path \
    --task $task \
    --ntrain $ntrain \
    --seed $seed

# Alternatively, use one of the provided bash scripts
bash configs/combined/deepseak/quantize-strategy-deepseek-7b-base-beta08.sh

Experiments

Main Arguments

| Argument | Example | Description |
| --- | --- | --- |
| `--model_name_or_path` | `meta-llama/Llama-2-7b-hf` | The (expert) model to be used. |
| `--student_name_or_path` | `TheBloke/Llama-2-7B-AWQ` | The student model to be used; in our context, the quantized model. |
| `--prompt_file` | `gsm8k` | The dataset whose test set is evaluated. |
| `--constractive_prompt_student` | `4` | The type of contrastive CoT prompting for the amateur model. The number corresponds to the prompt described in the paper (see the appendix for details). |
| `--outfile` | `output_path.json` | The location where the output results are stored. |
| `--alpha_coef` | `1` | The plausibility threshold. |
| `--beta_coef` | `27` | The strength of the amateur model relative to the expert model, i.e. the adjustment factor for the amateur penalty. |
| `--dropout_num` | `0.1` | The dropout rate applied to the amateur model. |
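To make the roles of `--alpha_coef` and `--beta_coef` concrete, here is a minimal, self-contained sketch of one step of standard contrastive decoding: alpha masks tokens the expert finds implausible, and beta scales the penalty from the amateur model. This is an illustration under those assumptions only, not the repository's implementation; DCD's exact combination rule is defined in the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]

def contrastive_next_token(expert_logits, amateur_logits, alpha=0.1, beta=0.5):
    """Pick the next token by contrasting expert and amateur distributions.

    alpha: plausibility threshold -- tokens whose expert probability is below
           alpha * (best expert probability) are masked out.
    beta:  strength of the amateur penalty.
    """
    expert_lp = log_softmax(expert_logits)
    amateur_lp = log_softmax(amateur_logits)
    # Plausibility constraint, expressed in log space.
    cutoff = math.log(alpha) + max(expert_lp)
    scores = [
        e - beta * a if e >= cutoff else float("-inf")
        for e, a in zip(expert_lp, amateur_lp)
    ]
    return scores.index(max(scores))
```

With `beta = 0` the rule reduces to greedy decoding from the expert; increasing beta promotes plausible tokens that the amateur ranks poorly, which is the contrastive effect these coefficients control.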

Other Arguments

| Argument | Example | Description |
| --- | --- | --- |
| `--cot_flag` | *enable* | Use the flag text to extract the final answer. By default, the flag is "The answer is ". |
| `--fp16` | *enable* | Run the model in float16 (with a quantized amateur model, this applies only to the expert model). |
| `--bf16` | *enable* | Run the model in bfloat16 (with a quantized amateur model, this applies only to the expert model). |
| `--max_new_tokens` | `256` | The maximum number of new tokens the model generates. |
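Putting the arguments together, a full invocation might look like the following. The values are the examples from the tables above; the output path and coefficients are illustrative and should be adjusted to your setup.

```shell
python3 src/run_generation.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --student_name_or_path TheBloke/Llama-2-7B-AWQ \
    --prompt_file gsm8k \
    --constractive_prompt_student 4 \
    --outfile output_path.json \
    --alpha_coef 1 \
    --beta_coef 27 \
    --dropout_num 0.1 \
    --fp16 \
    --max_new_tokens 256
```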

Understanding --constractive_prompt_student

The --constractive_prompt_student argument accepts an integer from 1 to 4, each corresponding to a type of contrastive prompting. By specifying different types, we can adjust the decoding behavior of the amateur model.

Arithmetic Task (GSM8K)

| Type | Description of Contrastive CoT Prompting |
| --- | --- |
| 1 | Rule-based Number Shuffle |
| 2 | Rule-based Number Shuffle with Wrong Calculation |
| 3 | Synthetic Demonstration |

Commonsense Task (StrategyQA)

| Type | Description of Contrastive CoT Prompting |
| --- | --- |
| 1 | Synthetic Demonstration |

Citation

If you find this useful in your research, please consider citing:

@misc{phan2024distillation,
      title={Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation},
      author={Phuc Phan and Hieu Tran and Long Phan},
      year={2024},
      eprint={2402.14874},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}