User Guide

User Guide#

This guide covers how to configure FlagScale and run training, inference, serving, and reinforcement learning tasks.

Step 1: Configure YAML files#

FlagScale uses Hydra for configuration management. Every task is driven by two YAML files that work together: an experiment-level file and a task-level file, both in the examples/ directory. Before running the task, you need to configure these files first.

Experiment-level YAML#

Use the examples/qwen3/conf/serve.yaml as an example to explain this configuration file.

The experiment-level file is the entry point for flagscale commands. It defines a global context for the run:

where outputs are stored: exp_dir: outputs/${experiment.exp_name}
which backend engine to use: backend: vllm
which task-level file to load: defaults: - serve: 8b

# Example: examples/qwen3/conf/serve.yaml
defaults:
- _self_
- serve: 8b

experiment:
  exp_name: qwen3_8b
  exp_dir: outputs/${experiment.exp_name}
  task:
    type: serve
    backend: vllm
  runner:
    hostfile: null
    deploy:
      use_fs_serve: false
  envs:
    CUDA_VISIBLE_DEVICES: 0
    CUDA_DEVICE_MAX_CONNECTIONS: 1

action: run

hydra:
  run:
    dir: ${experiment.exp_dir}/hydra

Task-level YAML#

Use the examples/qwen3/conf/serve/8b.yaml as an example to explain this configuration file.

The task-level YAML file specifies the model, dataset, and parameters for specific tasks such as training or inference. Every parameter in this file maps directly to an argument accepted by the backend engine, with hyphens (-) replaced by underscores (_).

# Example: examples/qwen3/conf/serve/8b.yaml
- serve_id: vllm_model
  engine_args:
    model: ${oc.env:QWEN3_PATH}
    host: 0.0.0.0
    uvicorn_log_level: warning
    port: ${oc.env:QWEN3_PORT}
    gpu_memory_utilization: 0.9
    trust_remote_code: true
    no_enable_prefix_caching: true
    compilation_config: '{"full_cuda_graph": true}'

Step 2： Run tasks#

FlagScale provides a unified runner for various tasks, including training, inference, reinforcement learning, and serving. Simply specify the configuration file to run the task with a single flagscale command. The runner will automatically load the configurations and execute the task. The following sections demonstrate how to run a distributed training task.

Train#

Require Megatron-LM-FL enviroment

Prepare dataset demo and tokenizer:

Download dataset: We provide a small processed data (bin and idx) from the Pile dataset.

mkdir -p ./data && cd ./data
wget https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/datasets/enron_emails_demo_text_document_qwen/enron_emails_demo_text_document_qwen.idx
wget https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/datasets/enron_emails_demo_text_document_qwen/enron_emails_demo_text_document_qwen.bin

Download tokenizer

mkdir -p ./qwentokenizer && cd ./qwentokenizer
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/tokenizer_config.json" -O tokenizer_config.json
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/qwen.tiktoken" -O qwen.tiktoken
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/qwen_generation_utils.py" -O qwen_generation_utils.py
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/tokenization_qwen.py" -O tokenization_qwen.py

Modify the paths of the dataset and tokenizer in the task-level YAML file and the model name in the experiment-level YAML file {style=lower-alpha}

Task-level YAML file: Modify the data_path and tokenizer_path in ./examples/qwen3/conf/train/0_6b.yaml.

data:
 data_path: ./data/enron_emails_demo_text_document_qwen    # modify data_path here
 split: 1
 no_mmap_bin_files: true
 tokenizer:
     legacy_tokenizer: true
     tokenizer_type: QwenTokenizerFS
     tokenizer_path: ./qwentokenizer   # modify tokenizer_path here
     vocab_size: 151936
     make_vocab_size_divisible_by: 64

Experiment-level YAML: Modify trainmodel name in ./examples/qwen3/conf/train.yaml. The value must match the file name 0_6b.yaml above.

defaults:
- _self_
- train: 0_6b  # modify: train value must match its corresponding config file name

Start the distributed training job:

flagscale train qwen3 --config ./examples/qwen3/conf/train.yaml
# or
flagscale train qwen3 -c ./examples/qwen3/conf/train.yaml

Stop the distributed training job:
```
flagscale train qwen3 --stop
```

Inference#

Require vLLM-FL environment

Download inference model

modelscope download --model Qwen/Qwen3-4B --local_dir ./Qwen3-4B

Modify the model path in the task-level YAML file and the model name in the experiment-level YAML file {style=lower-alpha}
1. Task-level YAML file: Modify the model path in ./examples/qwen3/conf/inference/4b.yaml.
```
llm:
 model: ./Qwen3-4B         # modify: Set model directory
 trust_remote_code: true
 tensor_parallel_size: 1
 pipeline_parallel_size: 1
 gpu_memory_utilization: 0.9
 seed: 1234
```
1. Experiment-level YAML: Modify inferencemodel name in ./examples/qwen3/conf/inference_fl.yaml. The value must match the file name 4b.yaml above.
```
defaults:
- _self_
- inference: 4b    # modify: Inference value must match its corresponding config file name
```
Start inference:

flagscale inference qwen3 --config ./examples/qwen3/conf/inference_fl.yaml
# or
flagscale inference qwen3 -c ./examples/qwen3/conf/inference_fl.yaml

Serve#

Download serving model

Modify the model path in the task-level YAML file and the model name in the experiment-level YAML file {style=lower-alpha}

Task-level YAML file: Modify the model path in ./examples/qwen3/conf/serve/0_6b.yaml.

- serve_id: vllm_model
engine_args:
   model: ./Qwen3-0.6B          # modify: Set model directory
   host: 0.0.0.0
   max_model_len: 4096
   max_num_seqs: 4
   uvicorn_log_level: warning
   port: 30000                  # A port available in your env, for example: 30000

Experiment-level YAML: Modify serve model name in ./examples/qwen3/conf/serve.yaml.

defaults:
- _self_
- serve: 0_6b         # modify: Serve value must match its corresponding config file name
experiment:
exp_name: qwen3-0.6b  # modify as needed for test clarity
exp_dir: outputs/${experiment.exp_name}
task:
   type: serve
   backend: vllm
runner:
   hostfile: null
   deploy:
   use_fs_serve: false
envs:
   CUDA_VISIBLE_DEVICES: 0
   CUDA_DEVICE_MAX_CONNECTIONS: 1

Start the server:

flagscale serve qwen3 --config ./examples/qwen3/conf/serve.yaml
# or
flagscale serve qwen3 -c ./examples/qwen3/conf/serve.yaml

Stop the server:
```
flagscale serve qwen3 --stop
```

Reinforcement Learning#

Require verl-FL environment

Download model

modelscope download --model Qwen/Qwen3-0.6B --local_dir ./Qwen3-0.6B

Download dataset

mkdir gsm8k && cd gsm8k
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/rl/datasets/gsm8k/train.parquet"
wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/rl/datasets/gsm8k/test.parquet"