Revision as of 23:47, 27 November 2019

ML Docker Image installed on the Interaction Station ML computers (Ubuntu 16.04):

Installing Docker CE:

sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable"
sudo apt-get update
sudo apt-get install docker-ce

More info: https://unix.stackexchange.com/questions/363048/unable-to-locate-package-docker-ce-on-a-64bit-ubuntu

Change Docker root dir using systemd (Don't do this, set volume instead)

systemctl status docker.service
sudo nano /etc/default/docker
Edit the ExecStart line to look like this: ExecStart=/usr/bin/dockerd -g /media/MachineLearning/docker -H fd://
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo docker info (verify that the root dir has been updated)
https://github.com/IronicBadger/til/blob/master/docker/change-docker-root.md
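
A minimal sketch for the verification step, assuming that docker info prints a line of the form "Docker Root Dir: <path>". The helper name docker_root_dir is hypothetical; it only filters text on standard input, so it can be tried on sample text without touching the daemon.

```shell
#!/bin/sh
# Hypothetical helper: extract the root directory from `docker info` output on stdin.
# Assumes a line of the form "Docker Root Dir: /some/path", possibly indented.
docker_root_dir() {
    sed -n 's/^[[:space:]]*Docker Root Dir:[[:space:]]*//p'
}

# Demo on sample text instead of a live daemon.
printf 'Storage Driver: overlay2\n Docker Root Dir: /media/MachineLearning/docker\n' | docker_root_dir
```

On the ML computers the same filter would be used as: sudo docker info | docker_root_dir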

Docker - clean up all the volumes

sudo docker system prune -a -f --volumes

Installing nvidia-docker v1 (deprecated!):

docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker
sudo pkill -SIGHUP dockerd
# Test nvidia-smi with the official CUDA 9.0 base image (nvidia-docker v1 uses the nvidia-docker wrapper, not --runtime=nvidia)
sudo nvidia-docker run --rm nvidia/cuda:9.0-base nvidia-smi
Link:
https://github.com/NVIDIA/nvidia-docker
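
The distribution variable in the steps above is built by sourcing /etc/os-release and joining two of its fields. A small sketch of that step follows, run against a sample file so that the result does not depend on the machine. The function name distribution_from and the sample file are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print "$ID$VERSION_ID" from an os-release style file.
distribution_from() {
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Sample file with the two fields that Ubuntu 16.04 sets.
sample=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="16.04"\n' > "$sample"
distribution_from "$sample"   # prints ubuntu16.04
rm -f "$sample"
```

With that value, the repository list URL used above expands to https://nvidia.github.io/nvidia-docker/ubuntu16.04/nvidia-docker.list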

Installing docker-compose:

Installing nvidia-docker-compose:

pip install nvidia-docker-compose
link: https://hackernoon.com/docker-compose-gpu-tensorflow-%EF%B8%8F-a0e2011d36
Permission Denied on curl and save for docker compose: https://github.com/docker/machine/issues/652

Using Docker with nvidia-docker-compose

Public Docker repository (the image named in the FROM line of a Dockerfile is selected from here)
https://hub.docker.com/

Dir structure:
docker-compose.yml
deepo
deepo/do_not_finish.sh
deepo/Dockerfile
deepo_data (folder that is visible by deepo image)
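
Before building, it can help to confirm that the folder matches the structure listed above. The following is only a sketch; the function name check_layout is hypothetical, and the file names are the ones from the list.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the directory structure listed above.
# Prints every missing entry and returns 1 if anything is missing, 0 otherwise.
check_layout() {
    layout_root=${1:-.}
    layout_missing=0
    for layout_file in docker-compose.yml deepo/Dockerfile deepo/do_not_finish.sh; do
        if [ ! -f "$layout_root/$layout_file" ]; then
            echo "missing file: $layout_file"
            layout_missing=1
        fi
    done
    if [ ! -d "$layout_root/deepo_data" ]; then
        echo "missing directory: deepo_data"
        layout_missing=1
    fi
    return "$layout_missing"
}

# Usage, from the folder that holds docker-compose.yml:
#   check_layout . && echo "layout ok"
```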

docker-compose.yml:

version: '3'
services:
  #machine name
  deepo:
    #container name
    container_name: deepo
    #path to Dockerfile
    build: deepo
    command: sh do_not_finish.sh
    volumes:
      - ./deepo_data:/media/deepo_data
    tty: true

Dockerfile:

FROM ufoym/deepo
ADD do_not_finish.sh /

Dockerfiles guide:
https://rock-it.pl/how-to-write-excellent-dockerfiles/

do_not_finish.sh:

#!/bin/bash

sh -c 'while :; do sleep 100; done'

We need the endless loop because the container stops as soon as its command finishes.
The loop keeps the container running, so we can open a shell in it with docker exec.
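
A possible variant of the loop, sketched under the assumption that the container should also stop promptly when Docker sends it SIGTERM (for example on docker stop). The function name idle_until_signalled is hypothetical and is not part of the original script. Sleeping in the background and waiting on it lets the shell run the trap as soon as the signal arrives, instead of only after the current sleep ends.

```shell
#!/bin/sh
# Hypothetical variant of the keep-alive loop: idle until SIGTERM or SIGINT arrives.
idle_until_signalled() {
    # On a signal, stop the pending sleep and exit cleanly.
    trap 'kill "$idle_child" 2>/dev/null; exit 0' TERM INT
    while :; do
        sleep 100 &
        idle_child=$!
        # wait returns non-zero when interrupted by a signal; the trap then decides how to exit.
        wait "$idle_child" || true
    done
}

# In do_not_finish.sh the last line would simply be:
#   idle_until_signalled
```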

Run it

Steps 1 and 2: From the folder that contains the docker-compose.yml file:
sudo nvidia-docker-compose build
sudo nvidia-docker-compose up

Step 3: From another terminal:
sudo nvidia-docker exec -it deepo bash

Troubleshooting problems

Check nvidia-docker version (needs to be version 1)
nvidia-docker version
More info:
https://github.com/eywalker/nvidia-docker-compose/issues/26

Permission denied: u'./docker-compose.yml
https://github.com/docker/docker-snap/issues/26

Deepo

It includes:

cudnn
theano
tensorflow
sonnet
pytorch
keras
lasagne
mxnet
cntk
chainer
caffe
caffe2
torch

Run Deepo image with Docker (with python 2.7):

sudo nvidia-docker run -it ufoym/deepo:py27 bash

Difference between revisions of "Docker"
