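Building a simple semantic segmentation training image

Following the YMIR image-building introduction, this image loads the dataset, hyperparameters, task information, and pretrained weights from the /in directory, and writes model weights, a progress file, and training logs to the /out directory.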
Example image input and output
```text
.
├── in
│   ├── annotations
│   │   └── coco-annotations.json
│   ├── assets -> /home/ymir/ymir/ymir-workplace/sandbox/0001/asset_cache
│   ├── config.yaml
│   ├── env.yaml
│   ├── models
│   │   ├── best_mIoU_iter_180.pth
│   │   └── fast_scnn_lr0.12_8x4_160k_cityscapes.py
│   ├── train-index.tsv
│   └── val-index.tsv
├── out
│   ├── models
│   │   ├── 20221103_082913.log
│   │   ├── 20221103_082913.log.json
│   │   ├── fast_scnn_lr0.12_8x4_160k_cityscapes.py
│   │   ├── iter_10000.pth
│   │   ├── iter_12000.pth
│   │   ├── iter_14000.pth
│   │   ├── iter_16000.pth
│   │   ├── iter_18000.pth
│   │   ├── iter_20000.pth
│   │   ├── latest.pth -> iter_20000.pth
│   │   └── result.yaml
│   ├── monitor.txt
│   ├── tensorboard -> /home/ymir/ymir/ymir-workplace/ymir-tensorboard-logs/0001/t00000010000043b47591667304420
│   └── ymir-executor-out.log
└── task_config.yaml
```
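Before training starts, it can help to fail fast when the /in layout is incomplete. The sketch below checks for the input entries shown in the example tree. Treating all of them as required is an assumption, and `missing_inputs` and `REQUIRED_INPUTS` are hypothetical names, not part of any YMIR API.

```python
from pathlib import Path
from typing import List

# Entries taken from the example /in tree above. Treating them all as required is an
# assumption; /in/models is left out because pretrained weights may not be provided.
REQUIRED_INPUTS = ["config.yaml", "env.yaml", "train-index.tsv", "val-index.tsv",
                   "annotations", "assets"]


def missing_inputs(in_dir: Path = Path("/in")) -> List[str]:
    """Return the full paths of expected input entries that do not exist."""
    # Path.exists() follows symlinks, so a dangling 'assets' link is reported as missing
    return [str(in_dir / name) for name in REQUIRED_INPUTS if not (in_dir / name).exists()]
```

A typical use is `problems = missing_inputs()` at the top of the training script, exiting with an error message if the list is not empty.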
Working directory

```shell
cd seg-semantic-demo-tmi
```
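Provide the hyperparameter template file

If the image contains /img-man/training-template.yaml, the image supports training.

The template sets the data format export_format to seg-coco:raw, i.e. the semantic/instance segmentation annotation format. For details, see the YMIR image dataset format documentation.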
img-man/training-template.yaml:

```yaml
# training template for your executor app
# after the image is built, it should be at /img-man/training-template.yaml
# keys gpu_id, task_id, pretrained_model_params, class_names, gpu_count should be preserved
# gpu_id: '0'
# gpu_count: 1
# task_id: 'default-training-task'
# pretrained_model_params: []
# class_names: []

# format of annotations and images that ymir should provide to this docker container
# annotation format: must be seg-coco
# image format: must be raw
export_format: 'seg-coco:raw'

# for testing only; remove these keys in your own docker image
expected_miou: 0.983 # expected mIoU for training task
idle_seconds: 3 # idle seconds for each task
```
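Dockerfile:

```dockerfile
# create the /img-man directory in the image
RUN mkdir -p /img-man
# copy every yaml file from the host's img-man directory to /img-man in the image
COPY img-man/*.yaml /img-man/
```

Keep Dockerfile comments on their own lines: `#` only starts a comment at the beginning of a line, so a trailing comment after `COPY` would be parsed as extra source or destination paths.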
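A minimal sketch of reading these hyperparameters back inside the container, assuming the task's /in/config.yaml has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`). The defaults are the example values from the template above; `TrainingParams`, `parse_params` and `split_export_format` are hypothetical names, not part of the YMIR SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class TrainingParams:
    # keys the template says must be preserved; defaults are the template's example values
    gpu_id: str = "0"
    gpu_count: int = 1
    task_id: str = "default-training-task"
    class_names: List[str] = field(default_factory=list)
    pretrained_model_params: List[str] = field(default_factory=list)
    # demo-only keys (remove them in your own image)
    expected_miou: float = 0.983
    idle_seconds: float = 3.0


def parse_params(cfg: Dict[str, Any]) -> TrainingParams:
    """Build TrainingParams from an already-loaded config dict, falling back to defaults."""
    d = TrainingParams()
    return TrainingParams(
        gpu_id=str(cfg.get("gpu_id", d.gpu_id)),
        gpu_count=int(cfg.get("gpu_count", d.gpu_count)),
        task_id=str(cfg.get("task_id", d.task_id)),
        class_names=list(cfg.get("class_names", d.class_names)),
        pretrained_model_params=list(cfg.get("pretrained_model_params", d.pretrained_model_params)),
        expected_miou=float(cfg.get("expected_miou", d.expected_miou)),
        idle_seconds=float(cfg.get("idle_seconds", d.idle_seconds)),
    )


def split_export_format(export_format: str) -> Tuple[str, str]:
    """Split e.g. 'seg-coco:raw' into (annotation_format, image_format)."""
    ann_format, _, img_format = export_format.partition(":")
    return ann_format, img_format
```

Centralizing the defaults in one dataclass keeps the training code free of scattered `cfg.get(...)` calls.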
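Provide the image manifest file

In img-man/manifest.yaml, object_type 3 declares that the image supports semantic segmentation:

```yaml
# 3 for semantic segmentation
"object_type": 3
```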
Dockerfile:

```dockerfile
COPY img-man/*.yaml /img-man/
```

Besides training-template.yaml, this instruction also copies manifest.yaml into /img-man in the image.
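The two files in /img-man together describe what the image can do. The sketch below is a hypothetical helper (not a YMIR API) that reports those two declarations. It assumes manifest.yaml has already been parsed into a dict, and it only encodes the two rules stated in this document: a training template means training support, and `object_type` 3 means semantic segmentation.

```python
from pathlib import Path
from typing import Any, Dict

SEMANTIC_SEGMENTATION = 3  # object_type value for semantic segmentation


def image_capabilities(img_man_dir: Path, manifest: Dict[str, Any]) -> Dict[str, bool]:
    """Report what an image declares through its /img-man directory."""
    return {
        # training-template.yaml present => the image supports training
        "training": (img_man_dir / "training-template.yaml").is_file(),
        # object_type 3 in manifest.yaml => semantic segmentation
        "semantic_segmentation": manifest.get("object_type") == SEMANTIC_SEGMENTATION,
    }
```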
Provide the default startup script

Dockerfile:

```dockerfile
# generate the startup script /usr/bin/start.sh
RUN echo "python /app/start.py" > /usr/bin/start.sh
# make /usr/bin/start.sh the image's default command
CMD bash /usr/bin/start.sh
```
Implement the basic functionality

The sample training function in seg-semantic-demo-tmi/app/start.py shows how to:

- get the config file
- read the training and validation datasets
- write logs
- write the training result
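Reading a dataset usually starts from the index files in /in. The sketch below counts the entries of an index file and how many of their images exist, reporting progress the same way as the progress snippet in this document. Two things are assumptions: each non-empty line is tab-separated with the image path in the first column, and `monitor_gap = max(1, N // 100)`. `scan_index` is a hypothetical name. In the demo, `report` would be something like `lambda p: monitor.write_monitor_logger(percent=p)`.

```python
from pathlib import Path
from typing import Callable, Tuple


def scan_index(index_file: Path, report: Callable[[float], None]) -> Tuple[int, int]:
    """Return (number of entries, number of entries whose image file exists).

    Assumption: each non-empty line is tab-separated and its first column is the
    image path; any further columns are ignored here.
    """
    lines = [ln for ln in index_file.read_text().splitlines() if ln.strip()]
    n = len(lines)
    monitor_gap = max(1, n // 100)  # hypothetical choice: report roughly 100 times
    valid_images = 0
    for idx, line in enumerate(lines):
        image_path = line.split("\t")[0]
        if Path(image_path).is_file():
            valid_images += 1
        if idx % monitor_gap == 0:
            report(0.2 * idx / n)  # this stage covers the first 20% of progress
    report(0.2)  # stage finished
    return n, valid_images
```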
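Writing progress

Progress is reported as a fraction between 0 and 1:

```python
from ymir_exc import monitor

# inside a loop over N items: report up to 20% every monitor_gap items
if idx % monitor_gap == 0:
    monitor.write_monitor_logger(percent=0.2 * idx / N)

# once that stage is finished
monitor.write_monitor_logger(percent=0.2)

# when the task is done
monitor.write_monitor_logger(percent=1.0)
```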
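The snippet above only reports 20% and then 100%, so progress appears to stall during training. One option, sketched below under the assumption that training is measured in iterations, is to map the training phase onto the remaining 0.2 to 1.0 range. Both helpers are hypothetical, and their return values would be passed to `monitor.write_monitor_logger(percent=...)`.

```python
def data_percent(idx: int, n: int) -> float:
    """Progress while iterating over n items: 0.0 -> 0.2, i.e. 0.2 * idx / N."""
    return 0.2 * idx / n if n > 0 else 0.2


def train_percent(iteration: int, max_iters: int) -> float:
    """Hypothetical: spread the training phase over the remaining 0.2 -> 1.0 range."""
    if max_iters <= 0:
        return 1.0
    frac = min(max(iteration / max_iters, 0.0), 1.0)  # clamp so progress never exceeds 1.0
    return 0.2 + 0.8 * frac
```

Clamping keeps the reported value monotone and bounded by 1.0 even if the loop runs past `max_iters`.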
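Writing the result file

```python
from ymir_exc import result_writer as rw

# record training stage 'epoch20' with its files and its evaluation result (mIoU)
rw.write_model_stage(stage_name='epoch20',
                     files=['epoch20.pt', 'config.py'],
                     evaluation_result=dict(mIoU=expected_miou))
```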
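The files listed in `write_model_stage` have to exist before the stage is recorded. The sketch below writes placeholder files for a stage and checks them. It assumes that a stage's files live in `<models_dir>/<stage_name>` (with models_dir being /out/models in the example tree) and that they are passed by bare file name, as in the snippet above. `prepare_stage` and `missing_stage_files` are hypothetical helpers, not SDK functions.

```python
from pathlib import Path
from typing import Dict, List


def prepare_stage(models_dir: Path, stage_name: str, contents: Dict[str, str]) -> List[str]:
    """Write (placeholder) files for one model stage and return their bare file names.

    Assumption: stage files are stored in models_dir / stage_name.
    """
    stage_dir = models_dir / stage_name
    stage_dir.mkdir(parents=True, exist_ok=True)
    for name, text in contents.items():
        (stage_dir / name).write_text(text)
    return list(contents)


def missing_stage_files(models_dir: Path, stage_name: str, files: List[str]) -> List[str]:
    """Names in `files` that are not present in the stage directory."""
    stage_dir = models_dir / stage_name
    return [name for name in files if not (stage_dir / name).is_file()]
```

Calling `missing_stage_files` just before `rw.write_model_stage` turns a silently incomplete result into an explicit error.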
Writing TensorBoard logs

```python
# write_tensorboard_log is a helper from the demo's start.py
write_tensorboard_log(cfg.ymir.output.tensorboard_dir)
```
Building the image demo/semantic_seg:training

Dockerfile:
```dockerfile
# a Dockerfile for a sample training / mining / infer executor
# FROM ubuntu:20.04
FROM python:3.8.16
ENV LANG=C.UTF-8

# Change mirror: python:3.8.16 is Debian-based, so rewrite the Debian hosts
# (for an ubuntu:20.04 base, rewrite archive.ubuntu.com and security.ubuntu.com instead)
RUN sed -i -e 's#http://deb.debian.org#http://mirrors.ustc.edu.cn#g' \
        -e 's#http://security.debian.org#http://mirrors.ustc.edu.cn#g' \
        /etc/apt/sources.list

# Set timezone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && echo 'Asia/Shanghai' > /etc/timezone

# Install linux packages
RUN apt-get update && apt-get install -y gnupg2 git libglib2.0-0 \
    libgl1-mesa-glx libsm6 libxext6 libxrender-dev \
    build-essential ninja-build \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# set WORKDIR first so that requirements.txt resolves to /app/requirements.txt
WORKDIR /app
COPY requirements.txt /app/
RUN pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

# copy user code to WORKDIR
COPY ./app/*.py /app/

# copy user config template and manifest.yaml to /img-man
RUN mkdir -p /img-man
COPY img-man/*.yaml /img-man/

# see https://github.com/protocolbuffers/protobuf/issues/10051 for details
ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

# entry point for your app
# the whole docker image will be started with `nvidia-docker run <other options> <docker-image-name>`
# and this command will run automatically
RUN echo "python /app/start.py" > /usr/bin/start.sh
CMD bash /usr/bin/start.sh
```
Build the image:

```shell
docker build -t demo/semantic_seg:training -f Dockerfile .
```