Improve the chatbot

This post describes the process of further improving the current Alpaca model with larger datasets.

Dataset

First of all, a big thanks to Alpaca-CoT for collecting and formatting such a detailed dataset, and for providing a great training framework for an efficient training process.

I decided to further improve the current Alpaca 7B/13B models in 2 steps:

The final dataset results in:

And we combine the JSON files with:

import json

files=['file1.json','file2.json','file3.json']
output_file = 'combined.json'

def merge_JsonFiles(filenames):
    # Collect the records from every input file into one list
    result = []
    for name in filenames:
        with open(name, 'r') as infile:
            result.extend(json.load(infile))

    # Write the merged list to the combined output file
    # (use a separate name for the handle so the global output path is not shadowed)
    with open(output_file, 'w') as outfile:
        json.dump(result, outfile)

merge_JsonFiles(files)
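
Each input file is assumed to be a JSON array of Alpaca-style records with instruction, input and output fields, which is the unified format Alpaca-CoT uses for its datasets. A hypothetical record looks like this (the values are made up for illustration):

# A hypothetical Alpaca-style record; real entries come from the Alpaca-CoT collection.
example_record = {
    "instruction": "Translate the sentence into English.",
    "input": "你好，世界",
    "output": "Hello, world",
}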

Training steps

Step 1: small dataset

After setting up the dependencies with:

git clone git@github.com:PhoebusSi/Alpaca-CoT.git
cd Alpaca-CoT
pip install -r requirements.txt

First, we train a small model on a small subset of the dataset to validate the training process.

Combine the JSON files:

files=['alpaca_data_cleaned.json','CoT_data.json','belle_data05cn.json']

Train with:

export HF_DATASETS_CACHE="/home/.cache"
torchrun --nproc_per_node 8  \
    --nnodes=1 --node_rank=0 uniform_finetune.py \
    --model_type llama --model_name_or_path ../llama_weights_converted/7B \
    --data alpaca-cot-belle --lora_target_modules q_proj v_proj \
    --per_gpu_train_batch_size 128 --gradient_accumulation_steps 32 \
    --learning_rate 3e-4 --epochs 1

We found that during training, the convergence speed is slow and the memory usage is not well optimized with the LoRA method. As a result, we increased the learning rate as well as the batch size and per_gpu_train_batch_size.

Second, when deploying the test model, it still does not know where to stop each response, and the dialogue performance is poor as well. We decided to change add_eos_token=False to True, and to include more dialogue-related datasets.
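
The point of add_eos_token is that every training sample then ends with the end-of-sequence token, so the model learns to emit it and stop on its own. A minimal sketch, assuming the standard transformers LlamaTokenizer and the converted weights path used above:

from transformers import LlamaTokenizer

# With add_eos_token=True the tokenizer appends </s> to every encoded sample,
# which is what teaches the model where a response should end.
tokenizer = LlamaTokenizer.from_pretrained(
    "../llama_weights_converted/7B", add_eos_token=True
)
ids = tokenizer("Hello, how are you?")["input_ids"]
print(ids[-1] == tokenizer.eos_token_id)  # True: the sample now ends with </s>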

Step 2: bigger dataset

At this stage, we only fine-tune the 7B model. Applying the measures above, we combine the JSON files:

files=['CoT_data.json','gpt4all_without_p3_formatted.json', 'Vicuna.json', 'belle_data1M_cn.json', 'dialog_w_context/train.json']

Train with:

export HF_DATASETS_CACHE="/home/.cache"
torchrun --nproc_per_node 8 uniform_finetune.py --model_type llama  \
    --model_name_or_path ../llama_weights_converted/7B  \
    --data gpt4-cot-belle1M-vicuna-dialog --lora_target_modules q_proj v_proj  \
    --per_gpu_train_batch_size 32 --gradient_accumulation_steps 1  \
    --learning_rate 4.5e-4 --epochs 1 &

It takes about 1 hour for data splitting and mapping and 10h 37m 26s for training on 8 NVIDIA A800-SXM4-80GB GPUs. The training summary is presented in the Appendix.

Note that the training process is quite inefficient: as shown in the Appendix, memory access takes up as much as 75% of the training time. The high memory-access overhead might be because training the 7B model across 8 devices is overkill, and the low memory usage is because I want to keep the effective batch size no higher than 256 to preserve generalization ability. However, I had no time to run an ablation on this hyperparameter.
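
For reference, under the usual convention that the effective batch size is the per-GPU batch size times the gradient accumulation steps times the number of GPUs, the step-2 command above works out to exactly 256:

# Effective batch size implied by the step-2 torchrun arguments above
per_gpu_train_batch_size = 32
gradient_accumulation_steps = 1
num_gpus = 8
print(per_gpu_train_batch_size * gradient_accumulation_steps * num_gpus)  # 256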

Deployment and comparison

We deploy the model with context in the same way as in the last section of the previous blog.
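
For readers who have not seen that post, here is a generic sketch of what "deployment with context" means here; this is not the exact code from the previous blog, and the prompt template and function name are assumptions:

# A generic sketch: earlier turns are concatenated into the prompt so the model
# sees the dialogue history. The instruction/response template is an assumption.
def build_prompt(history, user_input):
    prompt = ""
    for user_turn, bot_turn in history:  # history: list of (user, bot) pairs
        prompt += f"### Instruction:\n{user_turn}\n### Response:\n{bot_turn}\n"
    prompt += f"### Instruction:\n{user_input}\n### Response:\n"
    return prompt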

It is actually quite difficult to evaluate a large language model. Using GPT-4 as a judge is a possible approach, but I don't have OpenAI Plus, so at this stage I evaluate the outputs myself.

Here are some example results comparing the further fine-tuned model with the Alpaca model from the previous blog:

python alpaca_backend.py --size 7 --data gpt4-cot-belle1M-vicuna-dialog --bit 1
python alpaca_backend.py --size 7 --data alpaca --bit 1

Dialogue & discourse ability

[Screenshots comparing Alpaca (left) with the fine-tuned Alpaca (right)]

Chinese ability

[Screenshots comparing Alpaca (left) with the fine-tuned Alpaca (right) on Chinese prompts]

We can see that the Chinese ability of the fine-tuned version shows a big improvement. Impressively, even though Alpaca hasn't been trained on Chinese, it can understand Chinese prompts. As shown in the first example, when I asked "What are the emergent abilities of LLMs[1]?" in Chinese, the Alpaca model gives the answer perfectly.

In the second example, when I ask it to "Introduce Shanghai" in Chinese, the model also clearly understands my instruction and gives me what I want, though in English.

However, the translation ability is still poor; it's too hard for it.

Appendix

Overview
State: finished
Start time: April 6th, 2023 at 4:55:13 pm
Duration: 10h 37m 26s
Hostname: localhost.localdomain
OS: Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.17
Python version: 3.9.12
Python executable: /home/conda/llama/bin/python
Command: /home/singleGPU/chatbot/fintune/Alpaca-CoT-main/uniform_finetune.py --model_type llama --model_name_or_path ../llama_weights_converted/7B --data gpt4-cot-belle1M-vicuna-dialog --lora_target_modules q_proj v_proj --per_gpu_train_batch_size 32 --gradient_accumulation_steps 1 --learning_rate 4.5e-4 --epochs 1
System Hardware
CPU count: 56
GPU count: 8
GPU type: NVIDIA A800-SXM4-80GB
Train logs:
epoch: 1
global_step: 6573
learning_rate: 9.732735980225552e-7
loss: 0.8531
total_flos: 34166331403636048000
train_loss: 0.8965309444272535
train_runtime: 38249.6774
train_samples_per_second: 43.989
train_steps_per_second: 0.172
Training process visualization
[Eight W&B charts recorded during the training run]

References

Belle: LianjiaTech/BELLE

Alpaca-CoT: PhoebusSi/Alpaca-CoT

Alpaca-CoT dataset: Alpaca-CoT

