Daniel Martinez Capilla personal webpage - Thesis

Sign Language Translator using Microsoft Kinect XBOX 360
A master's thesis for the Erasmus Mundus in Computer Vision and Robotics (VIBOT 5)
by Daniel Martínez Capilla

Master Thesis: Sign Language Translator using Microsoft Kinect XBOX 360

Posted by Admin on April 13, 2012


Figure 2 shows the flow diagram applied to each frame captured by the camera. For each frame, the joints of interest are obtained and normalized, and the frame descriptor is then built. The current working mode (TRAINING or TESTING) determines which dataset the sign containing the current frame belongs to. In TESTING mode, once the last frame of the sign has been added to the test gesture, the classifier shows the corresponding word on the output display.

Figure 2: Block diagram


The system uses the six joints of interest shown in Figure 3: both hands (LH, RH), both elbows (LE, RE), the torso (T), and the head (H). The last two are used only for normalization. A weight is applied to give more importance to the more meaningful joints.

Figure 3: Used joints


Invariant to the user’s position:
All joints are expressed relative to the torso (T) joint, which makes the system robust to the user’s position.
Invariant to the user’s size:
The joints are expressed in spherical coordinates (Figure 4), and each distance d is divided by the head-to-torso distance dHT (Figure 5), which makes the system robust to the user’s size.
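
The two normalization steps can be sketched in a few lines of Python. This is an illustrative sketch only, not the thesis code: the function name `normalize_joint`, the (x, y, z) tuple layout, and the angle convention (θ measured from the z axis, ϕ as the azimuth in the x-y plane) are assumptions, since Figure 4 defines the actual convention.

```python
import math

def normalize_joint(joint, torso, head):
    """Return (d, theta, phi) for one joint.

    Assumed conventions (not taken from the thesis): points are (x, y, z)
    tuples, theta is the polar angle from the z axis, and phi is the
    azimuth in the x-y plane.
    """
    # Position invariance: express the joint relative to the torso.
    x, y, z = (joint[i] - torso[i] for i in range(3))

    # Size invariance: scale the distance by the head-to-torso distance dHT.
    d_ht = math.dist(head, torso)
    d = math.sqrt(x * x + y * y + z * z)

    theta = math.acos(z / d) if d > 0 else 0.0
    phi = math.atan2(y, x)
    return d / d_ht, theta, phi
```

Under these assumptions, translating all three points by the same offset, or scaling them by the same factor, leaves the returned tuple unchanged.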

Figure 4: Spherical coordinates
Figure 5: Modulus of the joints


After evaluating the importance of each of the features d, θ, and ϕ, only d and ϕ turn out to be meaningful. Hence, the 8-dimensional descriptor in Figure 6 (two features for each of the four joints LH, RH, LE, and RE) is built by storing the values of these features for all the joints along the frames.
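
As a rough illustration of how such a descriptor could be assembled, the sketch below keeps d and ϕ for each of the four joints and stacks one 8-value vector per frame. The names `frame_descriptor` and `sign_descriptor`, the dictionary input format, and the ordering of the values inside the vector are assumptions made for this example; Figure 6 defines the actual layout.

```python
# Joints that contribute features; torso and head are used only for normalization.
JOINTS = ("LH", "RH", "LE", "RE")

def frame_descriptor(normalized):
    """Build the 8 values for one frame.

    `normalized` maps a joint name to its (d, theta, phi) tuple.
    The ordering (d, phi per joint, joints in JOINTS order) is an assumption.
    """
    descriptor = []
    for name in JOINTS:
        d, _theta, phi = normalized[name]  # theta is discarded
        descriptor.extend((d, phi))
    return descriptor

def sign_descriptor(frames):
    """Stack the per-frame descriptors of a whole sign, one row per frame."""
    return [frame_descriptor(frame) for frame in frames]
```

A sign recorded over N frames then becomes an N x 8 sequence, which is the kind of variable-length input a DTW-based classifier can compare.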



The first proposed classifier is called Nearest Group Dynamic Time Warping (NG-DTW). DTW is an algorithm that measures the similarity between two data sequences that may differ in length. After computing the similarity between a given test sign and each sign in the dictionary, the test sign is labeled with the group of signs whose mean DTW value is the lowest. The second classifier (Nearest Neighbor DTW) instead labels the test sign with the single closest sign in the dictionary, rather than with the group that has the smallest average value.
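
The difference between the two classifiers can be shown with a minimal sketch. It uses the textbook DTW recurrence with a Euclidean per-frame cost, which is an assumption: the thesis may use a different local cost, joint weights, or path constraints. The function names and the `(label, sequence)` dictionary format are also hypothetical.

```python
import math
from collections import defaultdict

def dtw_distance(a, b):
    """Classic DTW between two sequences of equal-dimension frame vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # assumed Euclidean frame cost
            cost[i][j] = step + min(cost[i - 1][j],      # repeat a frame of b
                                    cost[i][j - 1],      # repeat a frame of a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def nearest_neighbor_dtw(test, dictionary):
    """Label of the single dictionary sign closest to the test sign."""
    return min(dictionary, key=lambda entry: dtw_distance(test, entry[1]))[0]

def nearest_group_dtw(test, dictionary):
    """Label of the group whose mean DTW value to the test sign is lowest."""
    groups = defaultdict(list)
    for label, sequence in dictionary:
        groups[label].append(dtw_distance(test, sequence))
    return min(groups, key=lambda label: sum(groups[label]) / len(groups[label]))
```

The two rules can disagree: a group containing one very close sample and one distant outlier wins under the nearest-neighbor rule but may lose under the group-mean rule.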


Daniel Martínez Capilla

Posted by Admin on April 4, 2012

Personal picture
Use the option below to access my CV.


Watch online the presentation of my MSc Thesis

Posted by Admin on June 19, 2012


Watch online the presentation of my MSc thesis, Sign Language Translator using Microsoft Kinect XBOX 360, which I gave during the last VIBOT Day 2012. Also have a look at the publications, and feel free to download the open source code.


How to get the Microsoft Kinect XBOX 360 working from scratch

Posted by Admin on April 4, 2012

This tutorial (also included in my final report) tries to make your life easier when setting up the Kinect on your laptop. It took me a while to get it working, so I think it is worth sharing my experience with you.