Difference between revisions of "Docker"

From Interaction Station Wiki
(21 intermediate revisions by the same user not shown)
Line 1: Line 1:
ML Docker Image installed on the Interaction Station ML computers:<br/>
+
ML Docker Image installed on the Interaction Station ML computers (Ubuntu 16.04):<br/>
'''Deepo'''. It includes:<br/>
 
*cudnn
 
*theano
 
*tensorflow
 
*sonnet
 
*pytorch 
 
*keras
 
*lasagne
 
*mxnet
 
*cntk
 
*chainer
 
*caffe
 
*caffe2
 
*torch
 
  
'''Run Deepo image with Docker:'''
+
=Installing Docker CE:=
*sudo nvidia-docker run -it ufoym/deepo:gpu bash
+
*sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
 +
*curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
 +
*sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable"
 +
*sudo apt-get update
  
'''Run Deepo image with Docker (with python 2.7):'''
+
*More info: https://unix.stackexchange.com/questions/363048/unable-to-locate-package-docker-ce-on-a-64bit-ubuntu
*sudo nvidia-docker run -it ufoym/deepo:py27 bash
 
  
'''Setting up ML computers:'''
+
==Change Docker root dir using systemd (Don't do this, set volume instead)==
*Linux distribution installed: Ubuntu 16.04
+
*systemctl status docker.service
 +
*sudo nano /etc/default/docker
 +
*Edit the ExecStart line to read: ExecStart=/usr/bin/dockerd -g /media/MachineLearning/docker -H fd://
 +
*systemctl daemon-reload
 +
*systemctl restart docker
 +
*sudo docker info - verify the root dir has updated
 +
*https://github.com/IronicBadger/til/blob/master/docker/change-docker-root.md
  
'''Partition made for machine learning:MachineLearning'''
+
==Docker - clean up all the volumes==
*In Windows: Disk Management -> Resize DataStorage
+
*sudo docker system prune -a -f --volumes
*Create new ext4 partition
 
  
'''Mounting the partition automatically:'''
 
  
*'''Get the UUID of the MachineLearning partition'''
+
=Installing nvidia-docker v1 (deprecated!):=
*sudo blkid
+
*docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
 
+
*sudo apt-get purge -y nvidia-docker
*'''Add partition to fstab:'''
+
*curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
*sudo nano /etc/fstab
+
*sudo apt-key add -
*Add these two lines at the bottom:
+
*distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
*UUID=(id of the MachineLearning partition) /media/MachineLearning ext4 rw,suid,dev,auto,user,async,exec      0      2
+
*curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
*UUID=(id of the DataStorage partition) /media/DataStorage ntfs-3g defaults,locale=en_US.UTF-8 0 0
+
*sudo tee /etc/apt/sources.list.d/nvidia-docker.list
 
+
*sudo apt-get update
*These mount options for the MachineLearning partition fixed the following problem when running caffe from that partition:
+
*sudo apt-get install -y nvidia-docker
*https://github.com/rbgirshick/py-faster-rcnn/issues/162
+
*sudo pkill -SIGHUP dockerd
*https://askubuntu.com/questions/678857/fstab-doesnt-mount-with-exec
+
* #Test nvidia-smi with the latest official CUDA image
 
+
*docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
'''Give write permissions to the new MachineLearning partition'''
+
*Link:
*sudo chmod -R a+rwx /media/MachineLearning/
+
*https://github.com/NVIDIA/nvidia-docker
 
 
*Need extra space? Extending the partition
 
https://askubuntu.com/questions/492054/how-to-extend-my-root-partition
 
  
'''Installing NVIDIA Driver:'''
+
=Installing docker-compose:=
*Set Ubuntu to boot into console mode. Type:
 
*sudo apt-get install systemd
 
*sudo systemctl set-default multi-user.target
 
*sudo reboot now
 
*Login and in console mode, type:
 
*sudo add-apt-repository ppa:graphics-drivers/ppa
 
*sudo apt update
 
*sudo apt upgrade
 
*For GeForce 1070Ti (07/2018), type:
 
*sudo apt-get install nvidia-390
 
*Set Ubuntu back to booting into graphical mode. Type:
 
*sudo systemctl set-default graphical.target
 
*sudo reboot now
 
  
 +
=Installing nvidia-docker-compose:=
 +
*pip install nvidia-docker-compose
 +
*link: https://hackernoon.com/docker-compose-gpu-tensorflow-%EF%B8%8F-a0e2011d36
 +
* Permission Denied on curl and save for docker compose: https://github.com/docker/machine/issues/652
  
'''Checking if Nvidia Driver is properly installed. Type:'''
+
=Using Docker with nvidia-docker-compose=
*nvidia-smi
 
*nvidia-settings
 
  
'''Installing CUDA 9.0 for Ubuntu 16.04 (the latest version is not supported by TensorFlow):'''
+
*Public docker repository (When doing FROM in Dockerfile, we need to select one of those)
*wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
+
*https://hub.docker.com/
*wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb
 
*wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn7-dev_7.0.5.15-1+cuda9.0_amd64.deb
 
*wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libnccl2_2.1.4-1+cuda9.0_amd64.deb
 
*wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libnccl-dev_2.1.4-1+cuda9.0_amd64.deb
 
*sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
 
*sudo dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb
 
*sudo dpkg -i libcudnn7-dev_7.0.5.15-1+cuda9.0_amd64.deb
 
*sudo dpkg -i libnccl2_2.1.4-1+cuda9.0_amd64.deb
 
*sudo dpkg -i libnccl-dev_2.1.4-1+cuda9.0_amd64.deb
 
*sudo apt-get update
 
*sudo apt-get install cuda=9.0.176-1
 
*sudo apt-get install libcudnn7-dev
 
*sudo apt-get install libnccl-dev
 
*sudo reboot now
 
*export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
 
*export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
 
*nano ~/.bashrc
 
*Add the last two export lines at the end of the file. Save and reboot.
 
  
'''Checking if CUDA is properly installed. Type:'''
+
*Dir structure:
*nvcc --version
+
*docker-compose.yml
 +
*deepo
 +
*deepo/do_not_finish.sh
 +
*deepo/Dockerfile
 +
*deepo_data (folder visible to the deepo container, mounted at /media/deepo_data)
  
'''Resources used:'''
+
*docker-compose.yml:
*https://askubuntu.com/questions/61396/how-do-i-install-the-nvidia-drivers
+
version: '3'
*https://medium.com/@bbloks/a-machine-learning-environment-with-ubuntu-and-gpu-acceleration-in-5-steps-765608325356
+
services:
*https://yangcha.github.io/CUDA90/
+
  #machine name
 +
  deepo:
 +
    #container name
 +
    container_name: deepo
 +
    #path to Dockerfile
 +
    build: deepo
 +
    command: sh do_not_finish.sh
 +
    volumes:
 +
      - ./deepo_data:/media/deepo_data
 +
    tty: true
  
'''Installing Docker CE on Ubuntu 16.04:'''
+
*Dockerfile:
*sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
+
FROM ufoym/deepo
*curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
+
ADD do_not_finish.sh /
*sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable"
+
*Dockerfiles guide:
*sudo apt-get update
+
*https://rock-it.pl/how-to-write-excellent-dockerfiles/
  
*More info: https://unix.stackexchange.com/questions/363048/unable-to-locate-package-docker-ce-on-a-64bit-ubuntu
+
*do_not_finish.sh:
 +
#!/bin/bash
 +
sh -c 'while :; do sleep 100; done'
  
'''Installing Deepo:'''
+
*We need the endless loop because otherwise the container exits as soon as docker-compose has deployed it
*[http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/340.107/NVIDIA-Linux-x86_64-340.107.run&lang=us&type=TITAN Prerequisite 1: Nvidia driver ]
+
*With the loop keeping the container alive, we can attach to it with docker exec
*[https://github.com/NVIDIA/nvidia-docker Prerequisite 2: nvidia-docker]
 
*[https://github.com/ufoym/deepo Deepo (for GPU)]
 
  
 +
==Run it==
 +
*Steps 1 and 2: From the folder that contains the docker-compose.yml file:
 +
*sudo nvidia-docker-compose build
 +
*sudo nvidia-docker-compose up
  
 +
*Step 3: From another terminal:
 +
*sudo nvidia-docker exec -it deepo bash
  
 +
==Troubleshooting problems==
 +
*Check nvidia-docker version (needs to be version 1)
 +
*nvidia-docker version
 +
*More info:
 +
*https://github.com/eywalker/nvidia-docker-compose/issues/26
  
  
 +
*Permission denied: u'./docker-compose.yml
 +
*https://github.com/docker/docker-snap/issues/26
  
'''Change Docker root dir using systemd'''
+
=Deepo=
*systemctl status docker.service
+
It includes:<br/>
*sudo nano /etc/default/docker
+
*cudnn
*Edit the ExecStart line to read: ExecStart=/usr/bin/dockerd -g /media/MachineLearning/docker -H fd://
+
*theano
*systemctl daemon-reload
+
*tensorflow
*systemctl restart docker
+
*sonnet
*sudo docker info - verify the root dir has updated
+
*pytorch 
*https://github.com/IronicBadger/til/blob/master/docker/change-docker-root.md
+
*keras
 +
*lasagne
 +
*mxnet
 +
*cntk
 +
*chainer
 +
*caffe
 +
*caffe2
 +
*torch
  
'''Docker - clean up all the volumes'''
 
*sudo docker system prune -a -f --volumes
 
  
'''Other options:'''
+
==Installing Deepo:==
 +
*[http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/340.107/NVIDIA-Linux-x86_64-340.107.run&lang=us&type=TITAN Prerequisite 1: Nvidia driver ]
 +
*[https://github.com/NVIDIA/nvidia-docker Prerequisite 2: nvidia-docker]
 +
*[https://github.com/ufoym/deepo Deepo (for GPU)]
  
'''NTFS fstab wizard:'''
+
==Run Deepo image with Docker:==
*sudo apt-get install ntfs-config
+
*sudo nvidia-docker run -it ufoym/deepo:gpu bash
*sudo ntfs-config
 
  
'''Format a large-capacity HD as exFAT so it can be accessed from Ubuntu:'''
+
==Run Deepo image with Docker (with python 2.7):==
*On Windows 10
+
*sudo nvidia-docker run -it ufoym/deepo:py27 bash
*cmd
 
*diskpart
 
*select disk '#' (where # is the number of the target drive)
 
*list part
 
*select part # (where # is the number of the partition)
 
*format fs=exfat QUICK
 

Revision as of 23:47, 27 November 2019

ML Docker Image installed on the Interaction Station ML computers (Ubuntu 16.04):

Installing Docker CE:

Change Docker root dir using systemd (Don't do this, set volume instead)

Docker - clean up all the volumes

  • sudo docker system prune -a -f --volumes
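
Because the prune command above deletes all unused images, containers and volumes without asking, it can help to see how much space Docker is using before and after. The following is a minimal sketch, not part of the original setup: the function name `prune_with_report` and the `DOCKER` override variable are hypothetical, and it assumes a Docker version that provides `docker system df`.

```shell
#!/bin/bash
# DOCKER is a hypothetical override hook for this sketch; it defaults to
# "sudo docker", the way docker is invoked elsewhere on this page.
DOCKER="${DOCKER:-sudo docker}"

prune_with_report() {
  # Show how much space images, containers and volumes occupy right now.
  $DOCKER system df
  # Same command as on this page: removes ALL unused data, including volumes.
  $DOCKER system prune -a -f --volumes
  # Show the usage again so the reclaimed space is visible.
  $DOCKER system df
}
```

Call `prune_with_report` from a terminal after sourcing the file. Setting `DOCKER=echo` first prints the commands instead of running them.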


Installing nvidia-docker v1 (deprecated!):

Installing docker-compose:

Installing nvidia-docker-compose:

Using Docker with nvidia-docker-compose

  • Dir structure:
  • docker-compose.yml
  • deepo
  • deepo/do_not_finish.sh
  • deepo/Dockerfile
  • deepo_data (folder visible to the deepo container, mounted at /media/deepo_data)
  • docker-compose.yml:
version: '3'
services:
  #machine name
  deepo:
    #container name
    container_name: deepo
    #path to Dockerfile
    build: deepo
    command: sh do_not_finish.sh
    volumes:
      - ./deepo_data:/media/deepo_data
    tty: true
  • Dockerfile:

FROM ufoym/deepo
ADD do_not_finish.sh /

  • do_not_finish.sh:
#!/bin/bash

sh -c 'while :; do sleep 100; done'

  • We need the endless loop because otherwise the container exits as soon as docker-compose has deployed it
  • With the loop keeping the container alive, we can attach to it with docker exec
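
One possible refinement of the keep-alive loop, offered as a sketch rather than as the script used on these machines: the main process of a container generally ignores SIGTERM unless it installs a handler, so stopping the container may wait for Docker's stop timeout before it is killed. The hypothetical `keep_alive` function below traps the signal and exits at once. The name and the optional sleep-interval argument are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical variant of do_not_finish.sh; keep_alive is an illustrative name.
keep_alive() {
  # Exit cleanly when the container is asked to stop (SIGTERM) or on Ctrl-C.
  trap 'exit 0' TERM INT
  while :; do
    # Sleep in the background and wait on it, so a trapped signal
    # interrupts the wait immediately instead of after the full sleep.
    sleep "${1:-100}" &
    wait $!
  done
}
```

To use it as the container command, add a final line `keep_alive` to the script. The function uses only POSIX shell syntax, so it also works when the script is started with `sh do_not_finish.sh` as in the compose file.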

Run it

  • Steps 1 and 2: From the folder that contains the docker-compose.yml file:
  • sudo nvidia-docker-compose build
  • sudo nvidia-docker-compose up
  • Step 3: From another terminal:
  • sudo nvidia-docker exec -it deepo bash
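
The three steps can be wrapped in two small helper functions, sketched below. The function names `deepo_up` and `deepo_shell` and the `COMPOSE` and `NVDOCKER` override variables are hypothetical. The commands are the ones listed above, plus a check that the current folder really contains docker-compose.yml.

```shell
#!/bin/bash
# COMPOSE and NVDOCKER are hypothetical override hooks for this sketch;
# the defaults are the commands used on this page.
COMPOSE="${COMPOSE:-sudo nvidia-docker-compose}"
NVDOCKER="${NVDOCKER:-sudo nvidia-docker}"

deepo_up() {
  # Steps 1 and 2 must run from the folder holding docker-compose.yml.
  if [ ! -f docker-compose.yml ]; then
    echo "docker-compose.yml not found in $(pwd)" >&2
    return 1
  fi
  # Build the image, then start the container (stays in the foreground).
  $COMPOSE build && $COMPOSE up
}

deepo_shell() {
  # Step 3, from another terminal: open a shell in the container named "deepo".
  $NVDOCKER exec -it deepo bash
}
```

Run `deepo_up` in one terminal and `deepo_shell` in a second one. The container name `deepo` comes from `container_name` in the compose file.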

Troubleshooting problems


Deepo

It includes:

  • cudnn
  • theano
  • tensorflow
  • sonnet
  • pytorch
  • keras
  • lasagne
  • mxnet
  • cntk
  • chainer
  • caffe
  • caffe2
  • torch


Installing Deepo:

Run Deepo image with Docker:

  • sudo nvidia-docker run -it ufoym/deepo:gpu bash

Run Deepo image with Docker (with python 2.7):

  • sudo nvidia-docker run -it ufoym/deepo:py27 bash
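
The two run commands differ only in the image tag, so they can be folded into one helper. This is a sketch under stated assumptions: `deepo_run` and the `NVDOCKER` override are hypothetical names, only the two tags mentioned on this page are accepted, and the optional second argument mounts a host folder at /media/deepo_data (the path used in the compose file) through Docker's standard `-v` option.

```shell
#!/bin/bash
# NVDOCKER is a hypothetical override hook; the default matches this page.
NVDOCKER="${NVDOCKER:-sudo nvidia-docker}"

deepo_run() {
  tag="${1:-gpu}"   # image tag: gpu (default) or py27
  data_dir="$2"     # optional host folder to share with the container
  case "$tag" in
    gpu|py27) ;;
    *) echo "unknown tag: $tag (expected gpu or py27)" >&2; return 1 ;;
  esac
  if [ -n "$data_dir" ]; then
    # Mount the host folder so data survives after the container exits.
    $NVDOCKER run -it -v "$data_dir:/media/deepo_data" "ufoym/deepo:$tag" bash
  else
    $NVDOCKER run -it "ufoym/deepo:$tag" bash
  fi
}
```

For example, `deepo_run py27 /media/MachineLearning/data` would start the Python 2.7 image with that folder mounted. The host path here is only an illustration.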