Playing with advanced capabilities

NLP

Background

Sometimes, the control of a mission must be carried out while the pilot, or more generally the commander, is executing other tasks at the same time. Having to think in a complex coded language, or even type it on a graphical interface, during a critical situation that demands quickness and lucidity means a severe loss of efficiency. With this situation in mind, the goal of this part of the project is to provide a layer of abstraction when commanding.

This work is part of the Master Thesis of the Master on Mechanical Engineering. A natural-language commanding tool has been developed that can be used via the keyboard on a terminal or simply with the voice. This way, the commander can stay in charge of the mission, visualizing its current
state with the visualization tools described in the MAGNA part, while commanding by voice.

GitHub source code

ControlVoiceAssistant

Frontend

The Frontend is implemented as a Python class with no ROS-related elements in it. Its responsibility is to act as the interface with the user, leaving to the Backend the ROS side of composing the message and sending it to the Ground Station node.

  • Initially, the Backend is initialised, and the tool stays waiting for incoming voice commands captured with the PyAudio library.
  • The audio information is sent to the Google Speech Recognition service so that the raw text is obtained.
  • Finally, the text is passed to the WIT API, which splits it into meaningful pieces of information using its online pre-trained model (a sketch of this pipeline follows the list).
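
The following is a minimal sketch of that pipeline, built on the SpeechRecognition library (which uses PyAudio for microphone capture) and the official wit Python client. The class layout, the token placeholder, and the `on_command` callback are assumptions for illustration; the actual Frontend class may be organised differently.

```python
import speech_recognition as sr   # wraps PyAudio for microphone capture
from wit import Wit

WIT_TOKEN = "YOUR_WIT_SERVER_ACCESS_TOKEN"  # hypothetical placeholder


class Frontend:
    def __init__(self, on_command):
        self.recognizer = sr.Recognizer()
        self.wit_client = Wit(WIT_TOKEN)
        self.on_command = on_command   # Backend callback (assumption)

    def listen_forever(self):
        with sr.Microphone() as source:
            self.recognizer.adjust_for_ambient_noise(source)
            while True:
                audio = self.recognizer.listen(source)
                try:
                    # Google Speech Recognition turns the audio into raw text
                    text = self.recognizer.recognize_google(audio)
                except sr.UnknownValueError:
                    continue   # unintelligible audio, keep listening
                # WIT splits the raw text into intent and entities
                parsed = self.wit_client.message(text)
                self.on_command(parsed)
```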

Natural Language Processing application

The information contained in the voice or text command needs to be extracted into a structure understandable by a machine. Training a neural network for Natural Language Processing would have been a good option, since one focus of this project is to dig into Machine Learning knowledge. However, a quicker approach was chosen initially to act as a bridge for this step.

WIT, like some other platforms, offers a pre-trained model for free. This model already extracts general information, but deeper knowledge of the particular application must be provided by the user. Hence, before using its Python API, the online model must be taught with a set of example sentences whose main information has been annotated, so that it learns to extrapolate the process from those examples. The figure below presents some of those examples, taken from the WIT page of the application created specifically for this project.

[Figure: example training sentences from the project's WIT application]
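
For illustration, the parsed response returned by the Wit client for one of those trained commands is a Python dictionary shaped roughly like the one below. The intent name and the values are invented, and the exact schema depends on the Wit API version; only the "main" entity is taken from this document (see the Backend section).

```python
# Illustrative shape of a parsed WIT response; values are invented.
parsed = {
    "text": "take off drone one",
    "intents": [
        {"name": "takeoff", "confidence": 0.97},
    ],
    "entities": {
        # the "main" entity carries the target UAV (see Backend section)
        "main:main": [
            {"body": "drone one", "value": "uav_1", "confidence": 0.92},
        ],
    },
}
```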

Backend

The Backend is a ROS node created for communication with MAGNA. The metainformation received from WIT must be understood, divided into meaningful pieces of data, reconstructed and sent. This is a straightforward process performed every time the Frontend detects a new voice command.

Given the message entities returned by WIT, the first task is to retrieve the main intent of the command. A check on the existence of that information is implemented: if no intent was extracted, the message is treated as unintelligible noise carrying no practical information. The second important piece of data is the agent, i.e. the UAV that must accomplish the action, contained in the "main" entity. It is imperative that these two fields are known; otherwise, the message cannot be interpreted. With them, a dictionary defining the message is built and later updated, as sketched below.
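
A hedged sketch of that validation and dictionary-building step follows. The entity key format ("main:main") and the field names mirror the Wit response shape shown earlier, which is itself an assumption about the schema in use.

```python
def parse_wit_response(parsed):
    """Build the command dictionary, or return None for noise."""
    intents = parsed.get("intents", [])
    if not intents:
        return None  # no intent extracted: treat the message as noise

    entities = parsed.get("entities", {})
    main = entities.get("main:main", [])  # entity key format is an assumption
    if not main:
        return None  # no target agent: the message cannot be interpreted

    # The two mandatory fields define the command dictionary...
    command = {
        "intent": intents[0]["name"],
        "agent": main[0]["value"],
    }
    # ...which is later updated with the remaining entities.
    for key, values in entities.items():
        if key != "main:main":
            command[key.split(":")[0]] = values[0]["value"]
    return command
```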

All entities are then mapped to an action interpretable by MAGNA. For that purpose, a message is populated and sent to the Ground Station.
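
The final dispatch step could look like the sketch below. The actual MAGNA message type and topic name are not documented here, so a JSON-over-String stand-in and a hypothetical topic are used purely for illustration.

```python
import json

import rospy
from std_msgs.msg import String


class Backend:
    def __init__(self):
        rospy.init_node("voice_assistant_backend")
        # "/magna/ground_station/command" is a hypothetical topic name
        self.pub = rospy.Publisher("/magna/ground_station/command",
                                   String, queue_size=10)

    def send_command(self, command):
        # command is the dictionary built from the WIT entities
        self.pub.publish(String(data=json.dumps(command)))
```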