Guide for using Paddle with FlagCX on XPU machines
Environment setup
Prepare a Docker container on the XPU machine.
Start the container (if it is not already running) and open a shell in it:
sudo docker exec -it [container_name] bash
Clone Paddle:
git clone https://github.com/PaddlePaddle/Paddle.git
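Before compiling, it can save time to confirm that the tools used in the following sections are available inside the container. The following is a minimal sketch built on the POSIX `command -v` builtin; the tool list in the usage note is taken from the commands in this guide, and `missing_tools` is a hypothetical helper name, not part of Paddle or FlagCX.

```shell
# Hypothetical helper (not part of Paddle/FlagCX):
# print each named tool that cannot be found on PATH.
# Returns 1 if any tool is missing, 0 if all are present.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a command
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_tools git pip cmake ninja python wget` prints nothing when everything needed later in this guide is installed.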
Compile Paddle with FlagCX
Run the following commands:
# Check out the develop branch
cd Paddle && git checkout develop
# Create the build directory
mkdir build && cd build
# Install Paddle's Python dependencies
pip install -r ../python/requirements.txt
# Run cmake
cmake .. -GNinja -DPY_VERSION=3.10 -DCMAKE_BUILD_TYPE=Release \
-DWITH_GPU=OFF \
-DWITH_XPU=ON \
-DON_INFER=OFF \
-DWITH_PYTHON=ON \
-DWITH_XPU_XRE5=ON \
-DWITH_MKL=ON \
-DWITH_XPU_BKCL=ON \
-DWITH_FLAGCX=ON \
-DWITH_TESTING=OFF \
-DWITH_DISTRIBUTE=ON \
-DWITH_XPTI=OFF \
-DBUILD_WHL_PACKAGE=ON \
-DWITH_XPU_XFT=OFF
# Compile
ninja -j$(nproc)
# Locate the Paddle whl package
cd ./python/dist
# Install the whl package
pip install -U [whl_package_name]
# Verify the installation
python -c "import paddle; paddle.utils.run_check()"
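Two of the steps above are manual: confirming that CMake actually recorded the XPU/FlagCX options, and filling in `[whl_package_name]`. The following is a minimal sketch of helpers for both. It assumes the options are stored in the build directory's `CMakeCache.txt` as standard `NAME:TYPE=VALUE` entries (for example `WITH_FLAGCX:BOOL=ON`); the names `check_cmake_flags` and `latest_wheel` are hypothetical and not part of Paddle.

```shell
# Hypothetical helpers (not part of Paddle) for the post-build steps.

# check_cmake_flags CACHE FLAG...
# Prints every FLAG that is not cached as ON in CACHE (e.g. CMakeCache.txt)
# to stderr, and returns 1 if any are missing, 0 otherwise.
check_cmake_flags() {
  local cache=$1 flag status=0
  shift
  for flag in "$@"; do
    # CMake cache entries look like NAME:TYPE=VALUE, e.g. WITH_FLAGCX:BOOL=ON
    if ! grep -Eq "^${flag}:[A-Z]+=ON$" "$cache"; then
      echo "not ON: ${flag}" >&2
      status=1
    fi
  done
  return "$status"
}

# latest_wheel DIR
# Prints the most recently modified *.whl file in DIR; returns 1 if none exist.
latest_wheel() {
  local dir=$1 whl
  whl=$(ls -t "$dir"/*.whl 2>/dev/null | head -n 1)
  [ -n "$whl" ] || return 1
  printf '%s\n' "$whl"
}
```

From the `build` directory, after `ninja` finishes, `check_cmake_flags CMakeCache.txt WITH_XPU WITH_XPU_BKCL WITH_FLAGCX WITH_DISTRIBUTE` should print nothing, and `pip install -U "$(latest_wheel ./python/dist)"` installs the newest wheel without typing its file name.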
Train a model using Paddle + FlagCX
Training GPT-3 on XPU machines with Paddle + FlagCX is now supported. Follow the steps below to get started:
Clone PaddleNLP:
git clone https://github.com/PaddlePaddle/PaddleNLP.git
Install the dependencies from the root of the PaddleNLP repository:
cd PaddleNLP
pip install -r requirements.txt
pip install -r requirements-dev.txt
Download the data:
# Create the data directory
mkdir -p ./llm/data
cd ./llm/data
# Download the data files
wget https://bj.bcebos.com/paddlenlp/models/transformers/gpt/data/gpt2_openwebtext_100k.bin
wget https://bj.bcebos.com/paddlenlp/models/transformers/gpt/data/gpt2_openwebtext_100k.idx
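An interrupted `wget` can leave a missing or zero-byte file that would otherwise only be noticed once training starts. The following is a minimal pre-flight check, assuming the two file names used above; `check_gpt_data` is a hypothetical helper name, not part of PaddleNLP.

```shell
# Hypothetical helper (not part of PaddleNLP): check that the GPT-2
# OpenWebText sample files are present and non-empty before training.
# check_gpt_data DIR -> returns 0 if both files are usable, 1 otherwise.
check_gpt_data() {
  local dir=$1 ext status=0
  for ext in bin idx; do
    # -s is true only if the file exists and its size is greater than zero
    if [ ! -s "$dir/gpt2_openwebtext_100k.$ext" ]; then
      echo "missing or empty: $dir/gpt2_openwebtext_100k.$ext" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run `check_gpt_data .` from `./llm/data` right after the downloads finish.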
Prepare the training script. The following script trains GPT-3 on XPU machines with FlagCX as the communication backend:
# Train GPT-3 on XPU machines using FlagCX as the communication backend
# Define the root path
export root_path=/workspace
export PYTHONPATH=$root_path/PaddleNLP:$PYTHONPATH
export PADDLE_DISTRI_BACKEND=flagcx
# Logging
export GLOG_v=0
export FLAGCX_DEBUG=INFO
export FLAGCX_DEBUG_SUBSYS=INIT
export XPU_FORCE_SHARED_DEVICE_CONTEXT=1
current_date=$(date +"%m%d")
task_name="gpt13b_dynamic_hand_nosp_ly4_debug_$current_date"
log_dir="log_$current_date/${task_name}_1"
output_dir="output_$current_date/${task_name}_1"
rm -rf "${log_dir}" "${output_dir}"
# run_pretrain.py is passed as a relative path, so run this script from the directory that contains it
python -u -m paddle.distributed.launch \
    --xpus "0,1,2,3,4,5,6,7" \
    --log_dir "${log_dir}" \
    run_pretrain.py \
    "${root_path}/PaddleNLP/tests/test_tipc/dygraph/hybrid_parallelism/gpt3/auto_config_gpt3_13b/pretrain-gpt3_13b-config.json"
echo "---- $task_name performance:"
echo "throughput(tokens/s/card):"
# Skip the first 10 intervals as warm-up, drop the 2 lowest and 2 highest values, and average the rest
grep "interval_tokens_per_second_per_device:" "${log_dir}/workerlog.0" | awk -F ',' '{print $11}' | awk '{print $2}' | awk 'NR > 10 {print $1}' | sort -n | awk '{values[NR] = $1} END {for (i = 3; i <= NR-2; i++) sum += values[i]; print sum / (NR-4)}'
echo "max_memory_allocated(GB):"
grep "interval_tokens_per_second_per_device:" "${log_dir}/workerlog.0" | awk -F ',' '{print $7}' | tail -n 1
echo "max_memory_reserved(GB):"
grep "interval_tokens_per_second_per_device:" "${log_dir}/workerlog.0" | awk -F ',' '{print $8}' | tail -n 1
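The throughput line in the script above discards the first 10 logged intervals as warm-up, sorts the remaining per-card token rates, drops the two smallest and two largest, and averages the rest (a trimmed mean). The sketch below wraps that computation in a reusable function. Unlike the original, it locates the value by its key instead of by comma-separated field position, which assumes each matching log line contains `interval_tokens_per_second_per_device: <number>`; `trimmed_mean_throughput` and its optional warm-up argument are hypothetical, and the function also refuses to divide by zero when too few samples remain.

```shell
# Hypothetical helper: trimmed-mean throughput (tokens/s/card) from a worker log.
# ASSUMPTION: matching lines contain "interval_tokens_per_second_per_device: <number>".
# Usage: trimmed_mean_throughput LOG_FILE [WARMUP_LINES]
trimmed_mean_throughput() {
  local log_file=$1 warmup=${2:-10}
  grep -o 'interval_tokens_per_second_per_device: [0-9.]*' "$log_file" \
    | awk -v w="$warmup" 'NR > w {print $2}' \
    | sort -n \
    | awk '
        { v[NR] = $1 }
        END {
          # Need at least 5 samples so that one remains after trimming 2 + 2.
          if (NR < 5) { print "too few samples after warm-up" > "/dev/stderr"; exit 1 }
          for (i = 3; i <= NR - 2; i++) sum += v[i]
          printf "%.2f\n", sum / (NR - 4)
        }'
}
```

For example, `trimmed_mean_throughput "${log_dir}/workerlog.0"` should give the same trimmed mean as the script (printed with two decimals) when the log matches the assumed format.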
Note:
To train the model on XPU, set the device type in the model config JSON file: "device": "xpu".
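If the config file does not already contain this entry, it can be added without hand-editing. The following is a minimal sketch, assuming `python3` is on the PATH and the config file holds a single top-level JSON object; `set_device_xpu` is a hypothetical helper name. Note that it rewrites the file with 4-space indentation.

```shell
# Hypothetical helper: set "device": "xpu" at the top level of a JSON config file.
# ASSUMPTION: python3 is on PATH and the file holds a single JSON object.
# Usage: set_device_xpu CONFIG_JSON
set_device_xpu() {
  python3 - "$1" <<'EOF'
import json
import sys

path = sys.argv[1]
with open(path) as f:
    cfg = json.load(f)
cfg["device"] = "xpu"  # the entry the note above requires
with open(path, "w") as f:
    json.dump(cfg, f, indent=4)
    f.write("\n")
EOF
}
```

For the script above, this would be `set_device_xpu "${root_path}/PaddleNLP/tests/test_tipc/dygraph/hybrid_parallelism/gpt3/auto_config_gpt3_13b/pretrain-gpt3_13b-config.json"`.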