Merge branch 'develop' into main
kinivi committed Mar 16, 2021
2 parents d2e0be7 + 449c075 commit 727b894
Showing 7 changed files with 291 additions and 20 deletions.
244 changes: 243 additions & 1 deletion README.md
@@ -1,4 +1,246 @@
# tello-gesture-control
# DJI Tello Hand Gesture control

The main goal of this project is to control the drone using hand gestures, without any gloves or additional equipment.
Just the camera on the drone or your smartphone (soon), a laptop, and a human hand.<br>

<img alt="demo_gif" src="https://user-images.githubusercontent.com/13486777/111168690-fb2e9280-85aa-11eb-894f-fe70633072fd.gif">


## Index
1. [Introduction](#introduction)
2. [Setup](#setup)
    1. [Install pip packages](#1-installing-pip-packages)
    2. [Connect and test Tello](#2-connect-tello)
3. [Usage](#usage)
    * [Keyboard control](#keyboard-control)
    * [Gesture control](#gesture-control)
4. [Adding new gestures](#adding-new-gestures)
    * [Technical description](#technical-details-of-gesture-detector)
    * [Creating dataset](#creating-dataset-with-new-gestures)
    * [Retrain model](#notebook-for-retraining-model)
5. [Repository structure](#repository-structure)

## Introduction
This project relies on two main parts: the DJI Tello drone and MediaPipe's fast hand keypoint recognition.

The DJI Tello is a perfect drone for any kind of programming experiment. It has a rich Python API (a Swift API is also available) that allows almost full control of the drone, creating drone swarms, and using its camera for computer vision.

MediaPipe is an amazing ML platform with many robust solutions such as Face Mesh, hand keypoint detection and Objectron. Moreover, its models can be used on mobile platforms with on-device acceleration.

Here is the starter pack you need:

<img alt="starter_pack" width="80%" src="https://user-images.githubusercontent.com/13486777/111294166-b65e3680-8652-11eb-8225-c1fb1e5b867d.JPG">

## Setup
### 1. Installing pip packages
First, we need to install the Python dependencies. Make sure that you are using `python3.7`.

List of packages
```sh
ConfigArgParse == 1.2.3
djitellopy == 1.5
numpy == 1.19.3
opencv_python == 4.5.1.48
tensorflow == 2.4.1
mediapipe == 0.8.2
```

Install
```sh
pip3 install -r requirements.txt
```
### 2. Connect Tello
Turn on the drone and connect your computer to its Wi-Fi.

<img width="346" alt="wifi_connection" src="https://user-images.githubusercontent.com/13486777/110932822-a7b30f00-8334-11eb-9759-864c3dce652d.png">


Next, run the following command to verify connectivity:

```sh
python3 tests/connection_test.py
```

On a successful connection you will see output like this:

```
1. Connection test:
Send command: command
Response: b'ok'


2. Video stream test:
Send command: streamon
Response: b'ok'
```

If you get the following output instead, check your connection to the drone:

```
1. Connection test:
Send command: command
Timeout exceed on command command
Command command was unsuccessful. Message: False


2. Video stream test:
Send command: streamon
Timeout exceed on command streamon
Command streamon was unsuccessful. Message: False
```
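
If the test keeps failing, it can also help to query the drone directly from a Python shell. Below is a minimal sketch using `djitellopy` (only `connect()`, `get_battery()` and `end()` are used; note that the return type of `get_battery()` varies between djitellopy versions):

```python
from djitellopy import Tello

# Minimal connectivity check: connect and ask for the battery level.
# If this times out, re-check that the computer is on the drone's Wi-Fi.
tello = Tello()
tello.connect()
print("Battery:", tello.get_battery())
tello.end()
```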

## Usage
The most interesting part is the demo. There are two types of control: keyboard and gesture. You can switch between control types during flight. Below is a complete description of both.

Run the following command to start Tello control:

```sh
python3 main.py
```

This script opens a Python window with a visualization like this:

<img width="60%" alt="window" src="https://user-images.githubusercontent.com/13486777/111294470-09d08480-8653-11eb-895d-a8ca9f6a288d.png">


### Keyboard control
(To control the drone with your keyboard, first press the `Left Shift` key.)

The following is a list of keys and their actions (a minimal key-handling sketch follows the list):

* `k` -> Toggle Keyboard controls
* `g` -> Toggle Gesture controls
* `Left Shift` -> Take off drone #TODO
* `Space` -> Land drone
* `w` -> Move forward
* `s` -> Move back
* `a` -> Move left
* `d` -> Move right
* `e` -> Rotate clockwise
* `q` -> Rotate counter-clockwise
* `r` -> Move up
* `f` -> Move down
* `Esc` -> End program and land the drone
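
For illustration, here is a hedged sketch of how such key handling can be wired to `djitellopy` velocities via `send_rc_control`; the key codes come from OpenCV's `cv.waitKey`, and the velocity value (30) is illustrative, not necessarily the one used in `main.py`:

```python
import cv2 as cv
from djitellopy import Tello

SPEED = 30  # illustrative velocity, not necessarily the value used in main.py

def keyboard_step(tello: Tello, key: int):
    """Translate one cv.waitKey code into an RC command (sketch)."""
    lr = fb = ud = yaw = 0
    if key == ord('w'):
        fb = SPEED           # forward
    elif key == ord('s'):
        fb = -SPEED          # back
    elif key == ord('a'):
        lr = -SPEED          # left
    elif key == ord('d'):
        lr = SPEED           # right
    elif key == ord('r'):
        ud = SPEED           # up
    elif key == ord('f'):
        ud = -SPEED          # down
    elif key == ord('e'):
        yaw = SPEED          # rotate clockwise
    elif key == ord('q'):
        yaw = -SPEED         # rotate counter-clockwise
    elif key == 32:          # Space -> land
        tello.land()
        return
    tello.send_rc_control(lr, fb, ud, yaw)
```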


### Gesture control

Pressing `g` activates gesture control mode. Here is the full list of gestures that are currently available.

<img alt="gestures_list" width="80%" src="https://user-images.githubusercontent.com/13486777/110933057-f1035e80-8334-11eb-8458-988af973804e.JPG">

## Adding new gestures
The gesture detector can be retrained on your own gestures by adding or changing the training data. Before doing that,
read the technical details of the detector below to understand how it works and how it can be improved.
### Technical details of gesture detector
MediaPipe hand keypoint recognition returns the 3D coordinates of 21 hand landmarks. For our
model we use only the 2D coordinates.

<img alt="gestures_list" width="80%" src="https://user-images.githubusercontent.com/13486777/110933339-49d2f700-8335-11eb-9588-5f68a2677ff0.png">


Then, these points are preprocessed for training the model in the following way.

<img alt="preprocessing" width="80%" src="https://user-images.githubusercontent.com/13486777/111294503-11902900-8653-11eb-9856-a50fe96e750e.png">


After that, we can use the data to train our model. The keypoint classifier is a simple neural network with the following
structure:

<img alt="model_structure" width="80%" src="https://user-images.githubusercontent.com/13486777/111294522-16ed7380-8653-11eb-9fed-e472c8a9a039.png">



_check [here](#grid-search) to understand how the architecture was selected_
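
For reference, here is a Keras sketch of a classifier in this spirit; the layer sizes and dropout rates are illustrative assumptions, not necessarily the exact architecture stored in `keypoint_classifier.hdf5`:

```python
import tensorflow as tf

NUM_CLASSES = 7        # assumption: matches the default gesture set
NUM_FEATURES = 21 * 2  # 21 landmarks x (x, y)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(20, activation='relu'),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```
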
### Creating dataset with new gestures
First, pull the datasets from Git LFS. [Here](https://github.com/git-lfs/git-lfs/wiki/Installation) are the instructions for
installing LFS. Then run the following commands to pull the default CSV files:
```sh
git lfs install
git lfs pull
```

After that, run `main.py` and press "n" to enter the mode for saving key points
(displayed as **MODE: Logging Key Point**).

<img width="60%" alt="writing_mode" src="https://user-images.githubusercontent.com/13486777/111301228-a185a100-865a-11eb-8a3c-fa4d9ee96d6a.png">


If you press "0" to "9", the key points will be added to [model/keypoint_classifier/keypoint.csv](model/keypoint_classifier/keypoint.csv) as shown below.<br>
1st column: Pressed number (class ID), 2nd and subsequent columns: Keypoint coordinates

<img width="90%" alt="keypoints_table" src="https://user-images.githubusercontent.com/13486777/111295338-ec4fea80-8653-11eb-9bb3-4d27b519a14f.png">

In the initial state, seven classes of training data are included, as shown [here](#gesture-control). If necessary, add new class IDs or delete existing rows of the CSV to prepare your own training data.
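
A quick way to sanity-check the collected data is to count how many samples each class ID has; here is a minimal sketch, assuming the CSV layout described above (class ID in the first column, coordinates afterwards):

```python
import numpy as np

data = np.loadtxt('model/keypoint_classifier/keypoint.csv', delimiter=',')
class_ids = data[:, 0].astype(int)

ids, counts = np.unique(class_ids, return_counts=True)
for class_id, count in zip(ids, counts):
    print(f'class {class_id}: {count} samples')
```
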
### Notebook for retraining model
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kinivi/tello-gesture-control/blob/main/Keypoint_model_training.ipynb)

Open [Keypoint_model_training.ipynb](Keypoint_model_training.ipynb) in Jupyter Notebook or Google Colab.
Change the number of training data classes (the value of **NUM_CLASSES = 3**) and the path to the dataset. Then execute all cells
and download the `.tflite` model.

<img width="60%" alt="notebook_gif" src="https://user-images.githubusercontent.com/13486777/111295516-1ef9e300-8654-11eb-9f59-6f7a85b99076.gif">


Do not forget to modify or add labels in `model/keypoint_classifier/keypoint_classifier_label.csv`.
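
Once the `.tflite` model is exported, it can be checked outside the notebook with TensorFlow's TFLite interpreter; a minimal sketch is shown below (the input here is a dummy feature vector; in practice you pass the preprocessed keypoints):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path='model/keypoint_classifier/keypoint_classifier.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with the shape the model expects; replace with real
# preprocessed keypoints (see the preprocessing sketch above).
dummy = np.zeros(input_details[0]['shape'], dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()

probs = interpreter.get_tensor(output_details[0]['index'])
print('predicted class:', int(np.argmax(probs)))
```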

#### Grid Search
❗️ Important ❗️ The last part of the notebook is experimental; its main purpose is to test hyperparameters of the model structure. In a nutshell: a grid search visualized with TensorBoard. Feel free to use it for your experiments.


<img width="70%" alt="grid_search" src="https://user-images.githubusercontent.com/13486777/111295521-228d6a00-8654-11eb-937f-a15796a3024c.png">


## Repository structure
<pre>
│ main.py
│ Keypoint_model_training.ipynb
│ config.txt
│ requirements.txt
├─model
│ └─keypoint_classifier
│ │ keypoint.csv
│ │ keypoint_classifier.hdf5
│ │ keypoint_classifier.py
│ │ keypoint_classifier.tflite
│ └─ keypoint_classifier_label.csv
├─gestures
│ │ gesture_recognition.py
│ │ tello_gesture_controller.py
│ └─ tello_keyboard_controller.py
├─tests
│ └─connection_test.py
└─utils
└─cvfpscalc.py
</pre>
### main.py
The main app, which implements drone control and gesture recognition.<br>
It also includes a mode for collecting training data when adding new gestures.<br>

### Keypoint_model_training.ipynb
This is the model training notebook for hand gesture recognition.

### model/keypoint_classifier
This directory stores files related to gesture recognition.<br>

* Training data(keypoint.csv)
* Trained model(keypoint_classifier.tflite)
* Label data(keypoint_classifier_label.csv)
* Inference module(keypoint_classifier.py)

### gestures/
This directory stores files related to drone controllers and gesture modules.<br>

* Keyboard controller (tello_keyboard_controller.py)
* Gesture controller (tello_gesture_controller.py)
* Gesture recognition module (gesture_recognition.py)

### utils/cvfpscalc.py
Module for FPS measurement.
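
A minimal sketch of what such an FPS helper can look like, based on OpenCV tick counts and a rolling buffer (a simplified approximation of the module, not the exact implementation):

```python
from collections import deque

import cv2 as cv

class SimpleFpsCalc:
    """Rolling-average FPS counter (simplified sketch of utils/cvfpscalc.py)."""

    def __init__(self, buffer_len=10):
        self._prev_tick = cv.getTickCount()
        self._frame_times = deque(maxlen=buffer_len)  # milliseconds per frame

    def get(self):
        tick = cv.getTickCount()
        elapsed_ms = (tick - self._prev_tick) * 1000.0 / cv.getTickFrequency()
        self._prev_tick = tick
        self._frame_times.append(elapsed_ms)
        avg_ms = sum(self._frame_times) / len(self._frame_times)
        return round(1000.0 / avg_ms, 2) if avg_ms > 0 else 0.0
```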

# TODO
- [ ] Motion gesture support (LSTM)
3 changes: 2 additions & 1 deletion config.txt
@@ -1,6 +1,7 @@
device = 0
width = 960
height = 540
min_detection_confidence = 0.5
min_detection_confidence = 0.7
min_tracking_confidence = 0.5
buffer_len = 5
is_keyboard = True
12 changes: 6 additions & 6 deletions gestures/gesture_recognition.py
@@ -458,12 +458,12 @@ def _draw_info_text(self, image, brect, handedness, hand_sign_text,
cv.putText(image, info_text, (brect[0] + 5, brect[1] - 4),
cv.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1, cv.LINE_AA)

if finger_gesture_text != "":
cv.putText(image, "Finger Gesture:" + finger_gesture_text, (10, 60),
cv.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 0), 4, cv.LINE_AA)
cv.putText(image, "Finger Gesture:" + finger_gesture_text, (10, 60),
cv.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2,
cv.LINE_AA)
# if finger_gesture_text != "":
# cv.putText(image, "Finger Gesture:" + finger_gesture_text, (10, 60),
# cv.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 0), 4, cv.LINE_AA)
# cv.putText(image, "Finger Gesture:" + finger_gesture_text, (10, 60),
# cv.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2,
# cv.LINE_AA)

return image

27 changes: 19 additions & 8 deletions gestures/tello_gesture_controller.py
@@ -1,5 +1,6 @@
from djitellopy import Tello


class TelloGestureController:
def __init__(self, tello: Tello):
self.tello = tello
@@ -16,23 +17,33 @@ def gesture_control(self, gesture_buffer):
print("GESTURE", gesture_id)

if not self._is_landing:
if gesture_id == 0:
if gesture_id == 0: # Forward
self.forw_back_velocity = 30
elif gesture_id == 1:
elif gesture_id == 1: # STOP
self.forw_back_velocity = self.up_down_velocity = \
self.left_right_velocity = self.yaw_velocity = 0
if gesture_id == 5: # Back
self.forw_back_velocity = -30
elif gesture_id == 2:
self.forw_back_velocity = 0
elif gesture_id == 3:

elif gesture_id == 2: # UP
self.up_down_velocity = 25
elif gesture_id == 4: # DOWN
self.up_down_velocity = -25

elif gesture_id == 3: # LAND
self._is_landing = True
self.forw_back_velocity = self.up_down_velocity = \
self.left_right_velocity = self.yaw_velocity = 0
self.tello.land()

elif gesture_id == 6: # LEFT
self.left_right_velocity = 20
elif gesture_id == 7: # RIGHT
self.left_right_velocity = -20

elif gesture_id == -1:
self.forw_back_velocity = self.up_down_velocity = \
self.left_right_velocity = self.yaw_velocity = 0

self.tello.send_rc_control(self.left_right_velocity, self.forw_back_velocity,
self.up_down_velocity, self.yaw_velocity)



9 changes: 6 additions & 3 deletions main.py
Expand Up @@ -29,6 +29,9 @@ def get_args():
parser.add("--min_tracking_confidence",
help='min_tracking_confidence',
type=float)
parser.add("--buffer_len",
help='Length of gesture buffer',
type=int)

args = parser.parse_args()

@@ -64,7 +67,7 @@ def main():


# Take-off drone
# tello.takeoff()
tello.takeoff()

cap = tello.get_frame_read()

@@ -74,7 +77,7 @@

gesture_detector = GestureRecognition(args.use_static_image_mode, args.min_detection_confidence,
args.min_tracking_confidence)
gesture_buffer = GestureBuffer(buffer_len=5)
gesture_buffer = GestureBuffer(buffer_len=args.buffer_len)

def tello_control(key, keyboard_controller, gesture_controller):
global gesture_buffer
@@ -137,7 +140,7 @@ def tello_battery(tello):
# Battery status and image rendering
cv.putText(debug_image, "Battery: {}".format(battery_status), (5, 720 - 5),
cv.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
cv.imshow('Hand Gesture Recognition', debug_image)
cv.imshow('Tello Gesture Recognition', debug_image)

tello.land()
tello.end()
2 changes: 1 addition & 1 deletion model/keypoint_classifier/keypoint_classifier_label.csv
@@ -1,7 +1,7 @@
Forward
Stop
Up
OK
Land
Down
Back
Left
14 changes: 14 additions & 0 deletions tests/connection_test.py
@@ -0,0 +1,14 @@
from djitellopy import Tello

if __name__ == '__main__':

print('1. Connection test:')
tello = Tello()
tello.connect()
print('\n')

print('2. Video stream test:')
tello.streamon()
print('\n')

tello.end()
