Distillation Contrastive Decoding (DCD) Evaluation

Overview

This package evaluates the performance of Large Language Models (LLMs) on standard benchmarks. For details on the evaluation process, please refer to our DCD paper.

Installation

# If you have already done this, you can skip these steps
git clone https://github.com/pphuc25/distillation-contrastive-decoding.git
cd distillation-contrastive-decoding
pip install -e .

# Setting up the evaluation environment
cd dcd_eval
bash install_packages.sh

Basic Usage

To evaluate the generative performance of a language model on a specific dataset (GSM8K or StrategyQA), use the following command:

python3 src/run_generation.py \
    --model_name_or_path $model_name_or_path \
    --task $task \
    --ntrain $ntrain \
    --seed $seed

# Alternatively, use one of the provided bash scripts
bash configs/combined/deepseak/quantize-strategy-deepseek-7b-base-beta08.sh

Experiments

Main Arguments

| Argument | Example | Description |
| --- | --- | --- |
| `--model_name_or_path` | `meta-llama/Llama-2-7b-hf` | The (expert) model to be used. |
| `--student_name_or_path` | `TheBloke/Llama-2-7B-AWQ` | The student model to be used; in our context, the quantized model. |
| `--prompt_file` | `gsm8k` | The dataset whose test set is evaluated. |
| `--constractive_prompt_student` | `4` | The type of contrastive CoT prompting for the amateur model. The number corresponds to the prompt described in the paper (see the appendix for details). |
| `--outfile` | `output_path.json` | The location where the output results are stored. |
| `--alpha_coef` | `1` | The plausibility threshold. |
| `--beta_coef` | `27` | The strength of the amateur model relative to the expert model, i.e. the adjustment factor for the amateur penalty. |
| `--dropout_num` | `0.1` | The dropout rate applied to the amateur model. |
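To make the roles of `--alpha_coef` and `--beta_coef` concrete, here is a minimal, self-contained sketch of one step of standard contrastive decoding: alpha masks tokens the expert finds implausible, and beta scales the penalty from the amateur model. This is an illustration under those assumptions only, not the repository's implementation; DCD's exact combination rule is defined in the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]

def contrastive_next_token(expert_logits, amateur_logits, alpha=0.1, beta=0.5):
    """Pick the next token by contrasting expert and amateur distributions.

    alpha: plausibility threshold -- tokens whose expert probability is below
           alpha * (best expert probability) are masked out.
    beta:  strength of the amateur penalty.
    """
    expert_lp = log_softmax(expert_logits)
    amateur_lp = log_softmax(amateur_logits)
    # Plausibility constraint, expressed in log space.
    cutoff = math.log(alpha) + max(expert_lp)
    scores = [
        e - beta * a if e >= cutoff else float("-inf")
        for e, a in zip(expert_lp, amateur_lp)
    ]
    return scores.index(max(scores))
```

With `beta = 0` the rule reduces to greedy decoding from the expert; increasing beta promotes plausible tokens that the amateur ranks poorly, which is the contrastive effect these coefficients control.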

Other Arguments

| Argument | Example | Description |
| --- | --- | --- |
| `--cot_flag` | *enable* | Use the flag text to extract the final answer. By default, the flag is "The answer is ". |
| `--fp16` | *enable* | Run the model in float16 (with a quantized amateur model, this applies only to the expert model). |
| `--bf16` | *enable* | Run the model in bfloat16 (with a quantized amateur model, this applies only to the expert model). |
| `--max_new_tokens` | `256` | The maximum number of new tokens the model generates. |
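Putting the arguments together, a full invocation might look like the following. The values are the examples from the tables above; the output path and coefficients are illustrative and should be adjusted to your setup.

```shell
python3 src/run_generation.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --student_name_or_path TheBloke/Llama-2-7B-AWQ \
    --prompt_file gsm8k \
    --constractive_prompt_student 4 \
    --outfile output_path.json \
    --alpha_coef 1 \
    --beta_coef 27 \
    --dropout_num 0.1 \
    --fp16 \
    --max_new_tokens 256
```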

Understanding --constractive_prompt_student

The --constractive_prompt_student argument accepts an integer from 1 to 4, each corresponding to a type of contrastive prompting. By specifying different types, we can adjust the decoding behavior of the amateur model.

Arithmetic Task (GSM8K)

| Type | Description of Contrastive CoT Prompting |
| --- | --- |
| 1 | Rule-based Number Shuffle |
| 2 | Rule-based Number Shuffle with Wrong Calculation |
| 3 | Synthetic Demonstration |

Commonsense Task (StrategyQA)

| Type | Description of Contrastive CoT Prompting |
| --- | --- |
| 1 | Synthetic Demonstration |

Citation

If you find this useful in your research, please consider citing:

@misc{phan2024distillation,
      title={Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation},
      author={Phuc Phan and Hieu Tran and Long Phan},
      year={2024},
      eprint={2402.14874},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}