From Interaction Station Wiki

What is YOLO?

YOLO (You Only Look Once) is a technique for detecting objects within images that was state of the art as of 2019. One of its advantages is that it is extremely fast compared to other techniques, which makes it suitable for video feeds at high frame rates (given a fast Nvidia GPU).

YOLO applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.
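
To make the idea of regions more concrete, here is a minimal Python sketch of how a point in the image maps to one cell of a grid. The 416x416 input size and the 13x13 grid are assumptions chosen for illustration (YOLOv3 predicts at more than one grid scale), and grid_cell is a hypothetical helper, not part of Darknet.

```python
def grid_cell(x_center, y_center, image_size=416, grid_size=13):
    """Return (row, col) of the grid cell that contains a point given in pixels."""
    cell = image_size / grid_size  # side length of one cell in pixels
    # Clamp so that points on the right or bottom edge fall into the last cell.
    col = min(int(x_center // cell), grid_size - 1)
    row = min(int(y_center // cell), grid_size - 1)
    return row, col


# A box centred at x=208, y=100 belongs to the cell in row 3, column 6.
print(grid_cell(208, 100))
```

Each cell is responsible for the objects whose centre falls inside it, which is how a single pass over the image can produce all the boxes at once.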

Initial notes

  • This guide was tested with YOLOv3 as part of Darknet.
  • The configuration file has been changed to make use of an Nvidia GPU and of OpenCV (needed for running YOLOv3 on a video file).
  • Tested on Ubuntu 16.04


  • Go to the directory and download the YOLO weights:
cd darknet
wget https://pjreddie.com/media/files/yolov3.weights

Using YOLO

First, go to the darknet directory in the terminal:

  • On some computers:
cd /media/interactionstation/MachineLearning/darknet
  • On others:
cd /home/interactionstation/MachineLearning/darknet

Using YOLO with an image

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

Using YOLO with a video file

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights data/ny720p2.mp4

Using YOLO with the live feed from a webcam

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights

Changing The Detection Threshold

  • By default, YOLO only displays objects detected with a confidence of .25 or higher. You can change this by passing the -thresh <val> flag:
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg -thresh .1
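
The effect of the threshold can be sketched in a few lines of Python. This is only an illustration of the filtering idea: the detections list below is made up, and filter_detections is a hypothetical helper rather than a Darknet function.

```python
def filter_detections(detections, thresh=0.25):
    """Keep only (label, confidence) pairs whose confidence is at least thresh."""
    return [(label, conf) for label, conf in detections if conf >= thresh]


# Made-up detections for illustration only; this is not real Darknet output.
detections = [("dog", 0.99), ("bicycle", 0.62), ("truck", 0.18)]

print(filter_detections(detections))              # default threshold of .25
print(filter_detections(detections, thresh=0.1))  # a lower threshold keeps more
```

Lowering the threshold shows more objects but also more false positives, so it is worth trying a few values on your own images.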

Tiny YOLOv3

  • Tiny YOLOv3 is a smaller model intended for constrained environments. To use this model, first download the weights:
cd darknet
wget https://pjreddie.com/media/files/yolov3-tiny.weights
  • Then run the detector with the tiny config file and weights:
./darknet detect cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg
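
If you want to start the detector from a Python script, the commands above can be assembled as in the following sketch. It only uses the file names and the -thresh flag shown on this page; darknet_detect_command is a hypothetical helper, and the sketch assumes it is run from inside the darknet directory.

```python
def darknet_detect_command(image, tiny=False, thresh=None):
    """Build the argument list for './darknet detect' as used on this page."""
    name = "yolov3-tiny" if tiny else "yolov3"
    cmd = ["./darknet", "detect", f"cfg/{name}.cfg", f"{name}.weights", image]
    if thresh is not None:
        cmd += ["-thresh", str(thresh)]
    return cmd


print(" ".join(darknet_detect_command("data/dog.jpg", tiny=True)))

# To actually run it (from inside the darknet directory):
#   import subprocess
#   subprocess.run(darknet_detect_command("data/dog.jpg", tiny=True), check=True)
```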

Training your own Dataset

  • Download the convolutional weights from the darknet53 model that are pre-trained on Imagenet and place them in the Darknet folder:
wget https://pjreddie.com/media/files/darknet53.conv.74

Step 1: Dataset

Step 2: Data Annotation

  • Go to the directory where you have darknet installed. In my case:
cd /media/interactionstation/MachineLearning/darknet
  • Go to the directory where you can find the YOLO annotation tool. In my case:
cd Yolo-Annotation-Tool-New--master
  • If the folder is not there, you can download it from this link and copy it into the darknet directory:
  • Modify the classes.txt file. Add the names of your categories, one per line. Example:
  • Enter the following command in the terminal:
python main.py
  • Note: If _tkinter is not installed, type in the terminal
sudo apt-get install python-tk
  • Note: If PIL is not installed, type in the terminal
pip install Pillow
  • Write the name of the folder where you have placed the images of the first category of your dataset and click the "Load" button.
  • Select the category of those images in the combo box next to "Choose Class".
  • Create the bounding boxes in each one of the images in this category by first clicking on the image and then clicking the Next button.
  • When you are done with all the images of that category, write the name of the folder of the next category and press the Load button.
  • Select the category of those images in the combo box next to "Choose Class".
  • Create the bounding boxes for all the images of this category.
  • Do the same for the rest of the categories.
  • When you are done, close the tool.
  • The tool should have created, in a folder called Labels, one text file per image of your dataset containing the object coordinates.
  • Move all the text files with the coordinates into the folders that contain the images of your dataset.
  • Now we need to split the images of the dataset into two sets: train and test.
  • By default this is done in a 90%/10% ratio, but the ratio can be changed in the script.
  • We first need to edit process.py (right click and open with gedit).
  • We modify the line that starts with current_dir = so that it contains the path of our dataset. In my case:
current_dir = '/media/interactionstation/MachineLearning/darknet/Yolo-Annotation-Tool-New--master/Images/cardog'
  • Be careful not to introduce extra white space in this line, otherwise Python will throw an error.
  • Save the file with File->Save and then close it with File->Close.
  • Now we can type in the terminal:
python process.py
  • This script will generate the train.txt and test.txt files.
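
For reference, the kind of split described above can be sketched as follows. This is not the actual content of process.py, only a minimal illustration under some assumptions: the images are .jpg files, train.txt and test.txt are written into the image folder itself, and split_dataset with its parameters is a hypothetical name.

```python
import os
import random


def split_dataset(image_dir, test_fraction=0.1, seed=0):
    """Write train.txt and test.txt listing the .jpg images found in image_dir."""
    images = sorted(
        os.path.join(image_dir, name)
        for name in os.listdir(image_dir)
        if name.lower().endswith(".jpg")
    )
    # Shuffle with a fixed seed so that the split is reproducible.
    random.Random(seed).shuffle(images)
    n_test = max(1, round(len(images) * test_fraction)) if images else 0
    test, train = images[:n_test], images[n_test:]
    for filename, paths in (("train.txt", train), ("test.txt", test)):
        # newline="\n" forces Unix line endings in the list files.
        with open(os.path.join(image_dir, filename), "w", newline="\n") as f:
            f.writelines(path + "\n" for path in paths)
    return train, test


# Example call (the path is the one used on this page):
#   split_dataset("/media/interactionstation/MachineLearning/darknet/"
#                 "Yolo-Annotation-Tool-New--master/Images/cardog")
```

Each line of the resulting files is the path of one image, which is the list format that the train and valid entries of the .data file point to in the next step.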

Step 3: We need to create 3 files and save them into our dataset directory:

- File 1 : myDataset-obj.names
- This text file should contain only the names of the categories, one per line. For example, if we have the categories cat and dog, the content should be:
    cat
    dog
- Copy this file into your dataset directory in Darknet.
- File 2: myDataset.data
- This file contains the number of categories, the name of the .names file, the name of the train and validation set files, and the folder where you want to store the yolo weights file.
    classes = 2
    train = Yolo-Annotation-Tool-New--master/Images/myDataset/train.txt
    valid = Yolo-Annotation-Tool-New--master/Images/myDataset/test.txt
    names = Yolo-Annotation-Tool-New--master/Images/myDataset/myDataset-obj.names
    backup = backup/
- Copy this file into your dataset directory in Darknet.
- File 3: myDataset-yolov3-tiny.cfg
- This file contains some parameters for the training. We can choose to start from the default yolov3.cfg config file or from yolov3-tiny.cfg (faster but less precise). For now we will work with the tiny version of yolov3.
- Duplicate the file yolov3-tiny.cfg located in Darknet/cfg, rename the copy to myDataset-yolov3-tiny.cfg and make the following changes:
    Line 3: set batch=24, which means 24 images are used for every training step
    Line 4: set subdivisions=8, so the batch is divided into 8 parts to decrease GPU VRAM requirements
    Line 127: set filters=(classes + 5)*3 in our case filters=21
    Line 135: set classes=2, the number of categories we want to detect
    Line 171: set filters=(classes + 5)*3 in our case filters=21
    Line 177: set classes=2, the number of categories we want to detect
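
The .names and .data files and the filters value can also be generated with a short script. The following is a minimal sketch, not part of Darknet: write_dataset_files and yolo_filters are hypothetical helpers, and it assumes that train.txt and test.txt live in the same dataset directory as the generated files.

```python
import os


def yolo_filters(num_classes):
    """Value for the cfg file: filters=(classes + 5)*3."""
    return (num_classes + 5) * 3


def write_dataset_files(dataset_dir, name, class_names, backup_dir="backup/"):
    """Write <name>-obj.names and <name>.data into dataset_dir."""
    names_path = os.path.join(dataset_dir, f"{name}-obj.names")
    data_path = os.path.join(dataset_dir, f"{name}.data")
    # One category name per line.
    with open(names_path, "w", newline="\n") as f:
        f.writelines(c + "\n" for c in class_names)
    lines = [
        f"classes = {len(class_names)}",
        f"train = {os.path.join(dataset_dir, 'train.txt')}",
        f"valid = {os.path.join(dataset_dir, 'test.txt')}",
        f"names = {names_path}",
        f"backup = {backup_dir}",
    ]
    with open(data_path, "w", newline="\n") as f:
        f.write("\n".join(lines) + "\n")
    return names_path, data_path


print(yolo_filters(2))  # two categories, as in the example above
# Example call, run from the darknet directory:
#   write_dataset_files("Yolo-Annotation-Tool-New--master/Images/myDataset",
#                       "myDataset", ["cat", "dog"])
```

Deriving classes and filters from the same list of names avoids the common mistake of changing one value in the cfg file and forgetting the other.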

Step 4: Train your Dataset!

./darknet detector train Yolo-Annotation-Tool-New--master/Images/myDataset/myDataset.data Yolo-Annotation-Tool-New--master/Images/myDataset/myDataset-yolov3-tiny.cfg darknet53.conv.74
- This step will take several hours...
- Note: After every 100 iterations the weights file is stored automatically. Kill the process once the average loss is less than 0.06 to get good accuracy.
- Note: If you get "Cannot load image" errors, the cause might be the encoding or line-ending settings of the train.txt and test.txt text files. Open them in the Ubuntu text editor, click File->Save As and select:
Character encoding: Current Locale (UTF-8)
Line Ending: Unix/Linux

Click Save
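
The same clean-up can be done without the text editor. The sketch below assumes the file is already UTF-8 (possibly with a byte order mark) and only has the wrong line endings; normalize_list_file is a hypothetical helper name.

```python
def normalize_list_file(path):
    """Rewrite a text file as UTF-8 without BOM and with Unix line endings."""
    with open(path, "rb") as f:
        raw = f.read()
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    text = raw.decode("utf-8-sig")
    # Convert Windows (\r\n) and old Mac (\r) line endings to Unix (\n).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))


# Example calls, run from the dataset directory:
#   normalize_list_file("train.txt")
#   normalize_list_file("test.txt")
```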

- Note: If you would like to use YOLOv3 instead of YOLOv3-tiny, duplicate yolov3.cfg in the previous step instead of yolov3-tiny.cfg.
- More info:

Step 5: Test it!

- To test your new model, go to your backup folder and rename the file generated by the training process to myDataset.weights. Copy the file into the Darknet folder and run the following line, which uses the same .data and .cfg files as the training step:
./darknet detector test Yolo-Annotation-Tool-New--master/Images/myDataset/myDataset.data Yolo-Annotation-Tool-New--master/Images/myDataset/myDataset-yolov3-tiny.cfg myDataset.weights data/myImage.jpg
- Here myImage.jpg is the image in which you would like to find the categories from your dataset.


More information

Alternative method