Wednesday, July 12, 2017

Drone following instructions

Reading instructions from QR codes and executing them with an Android application


Recently I got an opportunity to build a drone prototype controlled by an Android device. First I had to choose the best candidate. The requirements were a small size and an SDK with video streaming. After some research I decided that the Bebop 2 from Parrot would be the best choice. Parrot is one of the few companies with an open SDK for developers, and they have recently released its third version.

The first step was to try the example Android application. This example covers almost every basic feature: connecting to the drone, moving around, taking high-quality pictures and accessing the drone's media.

One of the steps for the prototype would be autonomous landing on a pattern. I did some research on existing solutions and found this paper that describes the theory behind the landing. So I decided to create an Android application that navigates the drone to land on a detected pattern (in this case a QR code). Later I made an update so the application can read instructions from the detected patterns and execute them sequentially.

Drone details

Bebop 2 has many cool features I'm not going to write about, but I will draw your attention to the flight time of about 22 minutes, which is quite useful for development. After going through the SDK documentation I found a small disadvantage: the ultrasonic sensor for altitude detection is not yet accessible via the API. On the other hand, I was pleasantly surprised by the camera. The drone has one camera placed in the front with a fish-eye lens. The camera uses the gyroscope, so the streamed video stays in one fixed position even when the drone leans to the sides. You can also set this angle via the API to get the output video from the requested angle. For the purpose of this prototype I needed the frontal and bottom views. Streaming video quality can be set up to 640 x 368 px. Recording quality has a higher resolution but is not accessible as a stream. The video resolution can also be set via the API.

Bebop 2 on a cardboard landing pad

Detection and output

I had a small issue getting a raw image from the video stream. After solving it, I used asynchronous QR code detection with the Google Vision library. A small disadvantage of this library is that the result object does not contain the rotation of the QR code, so I had to add this missing method myself. I also needed some output drawing, so I added a transparent layer above the stream.
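For reference, the rotation can be estimated from the corner points the detector does return. A small sketch (my own, assuming the corners are ordered clockwise from the top-left, as in Google's barcode results):

```python
import math

def qr_rotation_degrees(corner_points):
    """Estimate QR code rotation from its four corner points.

    corner_points: [(x, y), ...] ordered clockwise starting at the
    top-left corner of the code. Returns the angle of the top edge
    in degrees; 0 means the code is upright.
    """
    (x0, y0), (x1, y1) = corner_points[0], corner_points[1]
    # Angle of the vector from the top-left to the top-right corner.
    return math.degrees(math.atan2(y1 - y0, x1 - x0))
```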

Sequence of moves

Searching through existing solutions, I found a few libraries written in Python or JavaScript that can execute movements sequentially. These moves work as a sorted list of commands executed in a predefined order. I implemented my own move sequences, which consist of three different command types.

  • time move - executes a move for a given time (e.g. move forward for 3000 milliseconds)
  • single action - execution of a single action (e.g. take off, land, take picture, ...)
  • condition - executes an action after a condition is satisfied (e.g. after locking onto the pattern, read the instruction and add it to the command stack)
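To illustrate the three command types, here is a minimal Python sketch (my own illustration, not the actual application code; the drone calls are placeholders, not the Parrot SDK API):

```python
import time
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TimeMove:
    start: Callable[[], None]   # e.g. start pitching forward
    stop: Callable[[], None]    # e.g. reset pitch to zero
    millis: int
    def run(self):
        self.start()
        time.sleep(self.millis / 1000.0)
        self.stop()

@dataclass
class SingleAction:
    action: Callable[[], None]  # e.g. take off, land, take picture
    def run(self):
        self.action()

@dataclass
class Condition:
    satisfied: Callable[[], bool]   # e.g. "locked onto the pattern"
    action: Callable[[], None]      # e.g. read instruction, push commands
    def run(self):
        while not self.satisfied():
            time.sleep(0.05)
        self.action()

def execute(commands: List) -> None:
    """Run the sorted list of commands in their predefined order."""
    for command in commands:
        command.run()
```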


Landing is a condition command type. The condition part is to center the drone above the pattern, and the action part is the landing itself. So let's describe the centering condition. I used seven simple independent controllers to center the drone at the exact position above the pattern: five for movement and two for corrections. Each controller gets the pattern position, with a timestamp, from the asynchronous QR code detector running on the video stream.

The movement controllers are quite straightforward: each takes care of movement along one axis in both directions. The rotation controller rotates the drone to the pattern's orientation so the next instruction is executed with the same heading; it becomes active only when the other four movement controllers are properly centered. The small correction controller uses the knowledge of the last detected QR code position (e.g. if the last position was near the bottom edge, try moving backward). The large correction controller is launched when the pattern has not been detected for a longer time (1-3 s) and starts a searching procedure consisting of a few steps (move up, rotate, ...). Both correction controllers are time limited.

movement controllers
  • forward / backward
  • left / right
  • up / down
  • rotate clockwise / rotate anticlockwise (pattern rotation)

correction controllers
  • small correction (uses the last known pattern position)
  • large correction (everything is lost, just try to find the pattern)
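A minimal sketch of one axis of such a movement controller (my own illustration with a made-up dead zone; the real controllers also take the detection timestamp into account):

```python
# Fraction of the frame around the center treated as "centered".
DEADZONE = 0.1  # illustrative value

def axis_command(position, frame_size):
    """Decide the move along one axis from the detected pattern
    position in the frame: -1 (one direction), +1 (the other),
    or 0 (centered)."""
    offset = position / frame_size - 0.5   # -0.5 .. 0.5
    if offset > DEADZONE:
        return 1
    if offset < -DEADZONE:
        return -1
    return 0

def is_centered(x, y, width, height):
    """The landing action fires only when both axes report 0."""
    return axis_command(x, width) == 0 and axis_command(y, height) == 0
```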

Executing instructions

Executing an instruction is the same condition command type as the landing action. The condition part is again to center the drone above the pattern, but the action afterwards is to parse the QR code message and add a new command to the command stack. Each message has a simple structure (e.g. "id:5;fw:2000" means go forward for 2000 milliseconds) and a unique identifier which has to be larger than the previous one.
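The message format can be parsed in a few lines. A sketch (the field names follow the "id:5;fw:2000" example above; any other movement keys are whatever the application defines):

```python
def parse_instruction(message):
    """Parse a QR instruction like "id:5;fw:2000" into a dict of
    integer fields."""
    fields = dict(part.split(":", 1) for part in message.split(";"))
    return {key: int(value) for key, value in fields.items()}

class InstructionReader:
    """Accept only messages whose id is larger than the last seen one,
    so the same code is not executed twice while hovering above it."""
    def __init__(self):
        self.last_id = -1

    def accept(self, message):
        fields = parse_instruction(message)
        if fields["id"] <= self.last_id:
            return None          # already executed, ignore
        self.last_id = fields["id"]
        return fields
```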


For testing purposes I made a few cardboard landing pads. The pads cannot be too light, otherwise the downwash from the drone would blow them away.

The instruction-following process was tested indoors and outdoors. The outdoor results were insufficient: even a light side wind pushed the drone away from the QR code, and to successfully read an instruction from the pattern the drone had to run the searching procedure several times.

On the other hand, the indoor results were quite satisfying. The QR code was detected almost every time and the searching procedure was not launched even once. The indoor testing was captured on video, which you can see attached.


From my point of view, the simple logic controllers could be replaced by functions that describe speed over time for each movement. The lock onto the pattern would then be faster, and it could be used outdoors too. For better orientation in space some positioning system could be used, but that was not the point of this exercise.


Friday, March 24, 2017

Robotic arm with computer vision

Robotic arm with computer vision - picking up the object


The main idea was to build an environment with a robotic arm that can execute various commands based on an image analysis of the scene. In this article I'm going to describe all parts of the idea. For the first task I chose detecting and moving a single object.


The whole environment consists of a few parts mounted together. For the base I chose an old table and repainted it white to get a better contrast with the objects. In the middle of the longer side I mounted a robotic arm that I got from eBay. The arm has 6 servo motors, with a rotating base and claws on the other end. The parts are made of aluminium and are quite solid. Then I got some perforated metal ledges, shortened them, mounted them to the corners of the table and screwed it all together. Next I put an RGB LED strip on the bottom side of the top part of the construction. Finally I placed a USB camera on top of the construction so it can see the whole scene.

Communication with arm

The robotic arm has 6 servo motors. The quickest way to drive them is a servo controller that lets us control a single servo or a group of servos. I chose a controller with a serial interface and a custom protocol, so the communication can be done over USB with a few lines of code in any language.

Example of a group operation:
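As a hypothetical illustration (my controller's exact protocol differs), this sketch builds an SSC-32-style text command, where a single line moves a whole group of servos together:

```python
# Hypothetical SSC-32-style protocol: "#<ch>P<pulse_us>...T<time_ms>".
# My actual controller used its own custom protocol, but the idea of
# one message driving a group of servos is the same.

def group_move(positions, time_ms):
    """positions: {channel: pulse width in microseconds}.
    Returns one command line moving all listed servos in time_ms."""
    parts = "".join(f"#{ch}P{us}" for ch, us in sorted(positions.items()))
    return f"{parts}T{time_ms}\r"

# Sent over the USB serial port, e.g. with pyserial:
#   import serial
#   port = serial.Serial("/dev/ttyUSB0", 9600)
#   port.write(group_move({0: 1500, 1: 1200}, 1000).encode())
```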

Logic flow

The application has a few independent parts that communicate with each other.

The camera input runs in a separate thread and performs preprocessing on each interesting frame. (An interesting frame is one captured after movement was detected.) The result of the preprocessing is a list of detected objects with their coordinates.
The interesting frame is then sent to the main logic, where all the modules are registered. If no module is active, the main logic tries to initialize the first one whose initial conditions are satisfied. If a module is active, the interesting frame is sent to it. The module takes care of the logic and decides what to do next, then sends movement commands into the queue for the USB communicator.
The USB communicator repeatedly reads messages from its queue and sends the commands to the controller via USB. The controller then moves the servo motors.

schema of logic flow

Calculation of the next move

One of the most frequently used features will be picking up an object. After we get the preprocessed input from the camera, we have to calculate the move that picks up the object. At this point we have a frame with the detected object and the center of the arm. We also know the real size of the table, the lengths of the arm segments and the height of the base. The task is to calculate the angle for each servo in the arm so it can reach and pick up the object. We can split this problem into two smaller ones, each with a bit of a geometry character.

The first part is the base rotation (imagine a view from the top). This is a trigonometry exercise: we know the points of the triangle and two of its sides, and we want to calculate the angle between them. Then we convert the angle into milliseconds for the servo controller.

The second part is the rotation of the three servos that lean the arm (imagine a view from the side). In the first version we do not know the height of the object, so we use a constant instead. The problem is very similar to the previous trigonometry one: this time we have the length and angle of each segment, so if we substitute the right three values we know whether we have reached the object. I used brute force to calculate the three angles (it took less than one second).
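Both calculations can be sketched as follows. The segment lengths, base height and kinematic convention here are made up for illustration, not the real arm's dimensions:

```python
import math
from itertools import product

L1, L2, L3 = 10.0, 10.0, 6.0   # assumed segment lengths
BASE_HEIGHT = 5.0              # assumed base height

def base_rotation(arm_xy, object_xy):
    """View from the top: the angle the base must rotate toward
    the detected object."""
    dx = object_xy[0] - arm_xy[0]
    dy = object_xy[1] - arm_xy[1]
    return math.degrees(math.atan2(dy, dx))

def tip_position(angles):
    """View from the side: forward kinematics. Each servo bends the
    arm relative to the previous segment (90 deg = straight)."""
    theta = math.radians(90)            # first segment points up
    x, y = 0.0, BASE_HEIGHT
    for a, length in zip(angles, (L1, L2, L3)):
        theta += math.radians(a - 90)
        x += length * math.cos(theta)
        y += length * math.sin(theta)
    return x, y

def lean_angles(distance, height, step=5):
    """Brute-force the three servo angles, as described above,
    so the tip lands closest to (distance, height)."""
    best, best_err = None, float("inf")
    for angles in product(range(0, 181, step), repeat=3):
        x, y = tip_position(angles)
        err = math.hypot(x - distance, y - height)
        if err < best_err:
            best, best_err = angles, err
    return best, best_err
```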


The idea is to create an application with easily pluggable modules. Each module can be imagined as a series of moves with custom logic. The modules extend a template class, and each module is defined by a list of states.
Each state can be changed by one of the following triggers:

  • time trigger - wait some time before the next move
  • interesting frame trigger - movement is detected by the camera
  • command execution trigger - a broadcast from the USB controller that the move was executed

So the application has all the logic for each task separated into custom modules. For example, the module for picking up an object can have the following states:

  • start (interesting frame trigger)
  • pick up the object (time trigger)
  • move the object (time trigger)
  • release the object and return to the default position (time trigger)
  • verify the object was moved (wait until the last move is executed - command execution trigger)
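A minimal sketch of such a module template (the names and the trigger mechanics are my own illustration, not the actual application code):

```python
# The three trigger kinds described above.
TIME, INTERESTING_FRAME, COMMAND_EXECUTED = "time", "frame", "executed"

class Module:
    """A module is a list of (state name, trigger) pairs; a trigger
    event of the matching kind advances it to the next state."""
    def __init__(self, states):
        self.states = states
        self.index = 0

    @property
    def state(self):
        return self.states[self.index][0]

    @property
    def finished(self):
        return self.index >= len(self.states) - 1

    def on_trigger(self, trigger):
        if not self.finished and self.states[self.index][1] == trigger:
            self.index += 1

# The pick-up module from the state list above.
pick_up = Module([
    ("start", INTERESTING_FRAME),
    ("pick up object", TIME),
    ("move object", TIME),
    ("release and return", TIME),
    ("verify object moved", COMMAND_EXECUTED),
])
```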

First testing

A small issue I came across writing the C++ OpenCV code was that you cannot show an image from a background thread; only the main thread can call the imshow() method. So I used a singleton instance that keeps the images, and the main thread shows them afterwards.
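In Python terms the workaround could look like this (a sketch of the same idea; the original code is C++): worker threads only publish the latest frame, and the main loop fetches and displays it.

```python
import threading

class FrameHolder:
    """Singleton that hands frames from worker threads to the main
    thread, which is the only one allowed to call cv2.imshow."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance.frame = None
        return cls._instance

    def publish(self, frame):      # called from worker threads
        with self._lock:
            self.frame = frame

    def latest(self):              # called from the main thread
        with self._lock:
            return self.frame

# Main loop (assuming OpenCV):
#   frame = FrameHolder().latest()
#   if frame is not None:
#       cv2.imshow("debug", frame)
```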

One of the open problems is detecting the object's height, which is not possible with a single camera. A sensor at the end of the arm, or some other approach, could be used.

Even though the first tests of picking up the object were successful, more calibration is required. After that, learning could be used to find the best spot on each object for a successful grip. The claws should also hold the object firmly enough that it doesn't slip.


Tuesday, January 10, 2017

Counting dice and train wagons using computer vision

Computer vision exercises with preprocessing

Before the next project I decided to do some computer vision exercises. Each example is based on simple image preprocessing logic; no data structures or learning are required.


Dice counter

I got this idea while browsing the net. I was curious how hard it would be to write such a script. I'll describe the algorithm in steps.

  1. movement detection: Comparing a few frames with thresholds tells us whether something is moving in the frame. Adding a small time window after the movement stops makes the result more reliable.
  2. remove background: Thresholding the gray frame removes the background and gives us a binary image with the objects.
  3. cropping the objects: Using contours to detect the objects and then separate them by cropping.
  4. detecting dots: Inverting the image gives us objects that can again be simply detected using contours.
  5. filtering dots: If a die is also visible from the side, dots on that side can be recognized as well. We can filter these out by comparing the aspect ratios of their bounding boxes.
  6. result: Count the recognized dots and draw some visualization onto the output frame.
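The core of steps 2-5 can be sketched in pure Python on a tiny grayscale "image" (a list of rows); a real implementation would use OpenCV's threshold and findContours, but the logic is the same:

```python
def threshold(image, level):
    """Step 2: binary image, 1 where the pixel is darker than level
    (i.e. a dot), 0 elsewhere."""
    return [[1 if px < level else 0 for px in row] for row in image]

def components(binary):
    """Steps 3-4: connected components via flood fill; returns each
    component's bounding box as (width, height)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] and not seen[sy][sx]:
                stack, xs, ys = [(sy, sx)], [sx], [sy]
                seen[sy][sx] = True
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            xs.append(nx)
                            ys.append(ny)
                            stack.append((ny, nx))
                boxes.append((max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes

def count_pips(image, level=128, max_ratio=1.5):
    """Steps 5-6: keep only roughly square dots and count them;
    skewed side-view dots have elongated bounding boxes."""
    boxes = components(threshold(image, level))
    return sum(1 for bw, bh in boxes if max(bw, bh) / min(bw, bh) <= max_ratio)
```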

The results are quite good. More testing with different dice and different backgrounds would be the next step.

Train Wagon counter

Everyone builds car counters for highways, but you can't find one for counting train wagons. For the next exercise I chose a static video of a train from YouTube. Again, I'll briefly describe the algorithm in steps.

  1. compare frame with background: Compare every new frame to a background frame (the frame without the train). We get the first binary frame.
  2. compare last two frames: Comparing two consecutive frames gives us the actual movement. We get another binary frame.
  3. combine binary frames: Combine these frames with an OR operation.
  4. morphological operation: Use morphological opening to remove the noise.
  5. fill in the holes: Detect areas using contours and check that they are all filled.
  6. select the area: Choose an area of the frame where the background is clearly visible between wagons. For this video I chose the right edge of the image.
  7. signal processing: Now we are facing a new problem: finding local minima in a signal function. Adding some thresholds and limiting how often a minimum can repeat, we can detect each local minimum.
  8. result: Count the wagons from the filtered signal and add some visualization to see when and where each local minimum is detected.
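Step 7 can be sketched as a simple hysteresis counter over the signal (the threshold values here are illustrative): the fraction of "train" pixels in the selected strip over time forms a signal, and the gaps between wagons show up as local minima.

```python
def count_gaps(signal, low=0.2, high=0.5):
    """Count local minima with hysteresis: the signal must drop below
    `low` (a gap between wagons) and rise back above `high` before
    the next gap is counted, so one gap is never counted twice."""
    gaps = 0
    in_gap = False
    for value in signal:
        if not in_gap and value < low:
            gaps += 1
            in_gap = True
        elif in_gap and value > high:
            in_gap = False
    return gaps

def count_wagons(signal):
    # N gaps between wagons means N + 1 wagons (assuming the signal
    # starts and ends on a wagon).
    return count_gaps(signal) + 1
```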

This approach is very limited: it works only in good light conditions and the wagons should be of the same type and color. The next step would be using colors, shapes or more wagon details to make it more accurate.