
Working at NCSA, I Dockerize code written by scientists. That code includes large AI model files and tons of loosely defined dependencies. I don't want to understand their code; I just want to run it. But the Docker images end up at 25 GB, which is a huge pain and breaks our Kubernetes cluster.
Let's discuss Docker Slim: https://github.com/slimtoolkit/slim. Its claim to fame is "black box optimization": it simply "traces the files touched when executing your image and removes everything that isn't on the 'hot path' of your test script." In theory, that's perfect: keep everything you need and nothing you don't, with near-zero user effort. Just write a run script that exercises all the functions you require in your image.
Here's my minimal workflow:
1. Build your normal image.
```bash
docker build -t kastanday/aceretro:prod .
```
2. Slim it down. You should see healthy space savings reported, like: cmd=build info=results status='MINIFIED' by='8.36X' size.original='25 GB' size.optimized='3.0 GB'
```bash
# inject env vars with -e
slim build \
  --target kastanday/aceretro:prod \
  --tag kastanday/aceretro:slim \
  --http-probe=false \
  -e "MINIO_URL=<PROJECT>.ncsa.illinois.edu" \
  -e "MINIO_ACCESS_KEY=<KEY>" \
  -e "MINIO_SECRET_ACCESS_KEY=<SECRET>" \
  --exec "python3 entrypoint.py --job_id 21"
```
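Since you'll likely re-run this command many times with slightly different flags (I certainly did), it can help to assemble the argument list in a small script instead of editing a shell one-liner. A hypothetical sketch — the image tags and env var names mirror my setup; swap in your own:

```python
# Sketch: assemble the `slim build` argv programmatically, so env vars and
# include-paths can be tweaked between iterations without editing a one-liner.
def build_slim_cmd(target, tag, env=None, include_paths=None, exec_cmd=None):
    cmd = ["slim", "build", "--target", target, "--tag", tag, "--http-probe=false"]
    for key, val in (env or {}).items():
        cmd += ["-e", f"{key}={val}"]          # env vars injected into the probe run
    for path in include_paths or []:
        cmd.append(f"--include-path={path}")   # paths to force-keep in the slim image
    if exec_cmd:
        cmd += ["--exec", exec_cmd]            # the test script that drives tracing
    return cmd

cmd = build_slim_cmd(
    "kastanday/aceretro:prod",
    "kastanday/aceretro:slim",
    env={"MINIO_URL": "<PROJECT>.ncsa.illinois.edu"},
    exec_cmd="python3 entrypoint.py --job_id 21",
)
# Execute with: subprocess.run(cmd, check=True)
```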
3. Test the slim image. Now you might hit the big frustration of Docker Slim: sometimes it removes too much. It becomes a game of adding back paths to the files you want to keep. Run your test and check for missing-file errors.
```bash
docker run \
  -e "MINIO_URL=<PROJECT>.ncsa.illinois.edu" \
  -e "MINIO_ACCESS_KEY=<KEY>" \
  -e "MINIO_SECRET_ACCESS_KEY=<SECRET>" \
  kastanday/aceretro:slim \
  python entrypoint.py --job_id 21
```
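In hindsight, the error triage here could have been scripted. A hypothetical helper that scans the test run's logs for the two failure shapes I kept hitting — missing files and missing Python modules. The regexes are my guesses at the common error formats, not anything Docker Slim provides:

```python
import re

def find_missing(log_text):
    """Pull missing file paths and module names out of a test-run log.

    Matches two common failure shapes:
      - FileNotFoundError / shell: "No such file or directory: '/some/path'"
      - Python imports:            "No module named 'foo'"
    """
    files = re.findall(r"No such file or directory: '([^']+)'", log_text)
    modules = re.findall(r"No module named '([^']+)'", log_text)
    return {"files": files, "modules": modules}

log = """
FileNotFoundError: [Errno 2] No such file or directory: '/app/askcos-core/models/weights.pt'
ModuleNotFoundError: No module named 'rdkit'
"""
print(find_missing(log))
# → {'files': ['/app/askcos-core/models/weights.pt'], 'modules': ['rdkit']}
```

Pipe the `docker run` output into a file and feed it to this, and you get a shortlist of `--include-path` candidates per round.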
4. If files are missing, add them with --include-path=... In this case, I'm adding all the Python dependencies inside the conda env. Remember, --include-path points to files inside the container, so check your Dockerfile to see where your files are placed, or docker exec into the container and look around the file tree.
```bash
slim build \
  --target kastanday/aceretro:prod \
  --tag kastanday/aceretro:slim \
  --http-probe=false \
  --include-path=/opt/conda/envs/aceretro-env/lib/python3.6 \
  --exec "python3 entrypoint.py --job_id 21"
```
After this, I was still missing a few important files, so I added more includes:
```bash
slim build \
  --target kastanday/aceretro:prod \
  --tag kastanday/aceretro:slim \
  --http-probe=false \
  --include-path=/opt/conda/envs/aceretro-env/lib/python3.6 \
  --include-path=/app/pathway_search_standalone/rxn_cluster_token_prompt \
  --include-path=/app/pathway_search_standalone/askcos-core \
  --exec "python3 entrypoint.py --job_id 21"
```
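Another way to find include-path candidates in bulk, rather than one error at a time: list every file in the original and slim images (e.g. `docker export $(docker create IMAGE) | tar -t` gives a flat file listing), then diff the two lists. A sketch with the listing step stubbed out — the example paths below are invented for illustration:

```python
def suggest_include_paths(original_files, slim_files, depth=4):
    """Given file listings of the fat and slim images, return directories
    (truncated to `depth` path components) that exist only in the fat image.
    Get real listings with: docker export $(docker create IMAGE) | tar -t
    """
    removed = set(original_files) - set(slim_files)
    dirs = set()
    for path in removed:
        parts = path.strip("/").split("/")
        dirs.add("/" + "/".join(parts[:depth]))
    return sorted(dirs)

# Invented example listings:
original = [
    "/opt/conda/envs/aceretro-env/lib/python3.6/site-packages/rdkit/__init__.py",
    "/app/pathway_search_standalone/askcos-core/models/weights.pt",
    "/usr/bin/python3",
]
slim = ["/usr/bin/python3"]
print(suggest_include_paths(original, slim))
# → ['/app/pathway_search_standalone/askcos-core/models', '/opt/conda/envs/aceretro-env']
```

Lower `depth` gives coarser (fewer, fatter) include paths; raise it to be more targeted at the cost of more entries.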
Finally, I retained enough for my image tests to succeed. My size savings fell from 22x to 5x, but at least I didn't have to debug the scientists' code. That's the beauty of black box optimization.
Caveats
This wasn't as awesome as I hoped. I did ~10 rounds of adding --include-path= flags one at a time as missing files turned up. Each iteration took several minutes because my test script re-ran all the AI models every time. That could be optimized by skipping steps whose output already exists… but that's probably not worth the effort.
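The skip-if-output-exists idea would look something like this inside the test script. Purely a sketch — the output path and the step body are hypothetical stand-ins for whatever your entrypoint produces:

```python
from pathlib import Path

def run_step(output_path, step_fn):
    """Run an expensive step only if its output doesn't already exist.

    NOTE: this undermines Slim's tracing for skipped steps — files a skipped
    step would have touched won't be kept — so only use it while iterating on
    --include-path flags, and do one full uncached run at the end.
    """
    out = Path(output_path)
    if out.exists():
        print(f"skipping: {out} already exists")
        return out
    result = step_fn()              # the slow part, e.g. running a model
    out.write_text(str(result))     # cache the result for the next iteration
    return out

# Hypothetical usage inside entrypoint.py:
run_step("/tmp/embeddings.txt", lambda: "expensive model output")
run_step("/tmp/embeddings.txt", lambda: "expensive model output")  # skipped
```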
Moreover, it's tough to judge how granular to be when including new paths. You want to be as targeted as possible, but that means many more rounds of iteration, which somewhat defeats the purpose of "automatic tracing."
My tracing also seems off: it didn't even include basic Python files that were obviously used during execution. I've read the docs and I'm pretty sure I'm doing everything right, but there must be a better way to recover these missing files. Leave a comment below if you know a better solution.
Docker in Docker
To avoid installing Docker Slim, you can run it as a Docker container. That's what I do. It's more verbose, but it keeps your system clean.
```bash
# running slim as a container
sudo docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock \
  dslim/slim build \
  --target kastanday/aceretro:prod \
  --tag kastanday/aceretro:slim \
  --http-probe=false \
  --include-path=/opt/conda/envs/aceretro-env/lib/python3.6 \
  --include-path=/app/pathway_search_standalone/rxn_cluster_token_prompt \
  --include-path=/app/pathway_search_standalone/askcos-core \
  -e "MINIO_URL=<PROJECT>.ncsa.illinois.edu" \
  -e "MINIO_ACCESS_KEY=<KEY>" \
  -e "MINIO_SECRET_ACCESS_KEY=<SECRET>" \
  --exec "python entrypoint.py --job_id 21"
```
About the http probe
If your Docker container is a web server (Flask/Express) that exposes HTTP endpoints, Slim can automatically detect those endpoints and invoke all of them, exercising everything in your app and deleting everything that wasn't used. I haven't tried this feature, but *in theory* it sounds great. I mention it because it's why I had to specify --http-probe=false in my usage above.
If you have any advice for me, leave a comment here or shoot me an email kastanvday [at] the gmails.