Cognitive Perception

Thanks to our technology, intelligent machines are able to process visual and auditory information, assess situations and interact with their environment. After all, driverless road vehicles and robots working side by side with production workers must be able to perceive their surroundings. Without these perception skills, no collaboration between humans and intelligent systems is possible.

As humans, we combine different sensory impressions and process them simultaneously to obtain an immediate and precise picture of our environment. Similarly, if artificially intelligent systems are to assess and predict situations in real time, they must be able to draw on and process a variety of information from different channels.

We have, therefore, adapted Machine Learning processes to specifically process auditory and visual information. We use a combination of facial recognition and voice recognition, based on speech characteristics, to identify individual speakers, for example in TV programs. This type of multimodal indexing, i.e. the combination and analysis of information from different channels and in a variety of formats, makes it possible to find what you are looking for more reliably and quickly.
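One common way to combine such channels is late fusion of the recognizers' confidence scores. The sketch below is a minimal illustration under assumed weights and invented scores, not our production pipeline:

```python
# Late fusion: combine per-speaker confidence scores from two
# independent recognizers (face and voice) by weighted sum.
# Weights and score values are illustrative assumptions.

def fuse_scores(face_scores, voice_scores, w_face=0.5, w_voice=0.5):
    """Weighted combination of per-speaker confidences from two modalities."""
    speakers = set(face_scores) | set(voice_scores)
    return {
        s: w_face * face_scores.get(s, 0.0) + w_voice * voice_scores.get(s, 0.0)
        for s in speakers
    }

# Each modality alone is ambiguous; fused, the anchor wins clearly.
face = {"anchor": 0.9, "guest": 0.4}
voice = {"anchor": 0.7, "guest": 0.8}
fused = fuse_scores(face, voice)
best = max(fused, key=fused.get)
```

A fixed weighted sum is the simplest fusion rule; in practice the weights themselves can be learned, or a classifier can be trained on the concatenated scores.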

The systems we have developed for recognizing spoken German are the most accurate in the world. We can draw on large quantities of training data in the form of our own speech database, which contains more than 1,000 hours of transcribed voice recordings. Our solutions have been used successfully for many years across many sectors, including the media industry, and are continually being refined.

We use Machine Learning, particularly deep neural networks, to recognize objects such as road traffic signs. These Deep Learning methods are particularly successful if the data is characterized by a hierarchical structure and a large amount of training data is available.
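As a minimal illustration of the building block such networks stack to learn hierarchical features, the sketch below applies one hand-picked convolution filter (a vertical-edge detector), a ReLU and max pooling; in a real deep network the filter weights are learned from training data:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation of a single-channel image with one filter."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; trims edges that don't fit a full window."""
    h, w = x.shape
    return x[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size).max(axis=(1, 3))

# Toy image with a vertical edge between columns 2 and 3.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 1.0]])  # responds to left-to-right increase

feature_map = max_pool(relu(conv2d(image, edge_kernel)))
```

Stacking many such layers, with learned rather than hand-picked filters, is what lets deep networks build up from edges to sign shapes to whole traffic signs.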

If sufficient training data is not available for certain scenarios, we favor hybrid Machine Learning processes that allow us to add expert knowledge to the data we do have. We also research other data-efficient learning solutions, for example those designed to generate artificial training data.
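Generating artificial training data by augmentation can be sketched as follows; the particular transforms (flips, rotations, additive noise) are common illustrative choices, not the specific pipeline used in our projects:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, n=4, noise_std=0.05):
    """Return n randomly perturbed copies of an image in [0, 1]."""
    out = []
    for _ in range(n):
        img = image.copy()
        if rng.random() < 0.5:          # random horizontal flip
            img = np.fliplr(img)
        img = np.rot90(img, k=rng.integers(0, 4))   # random 90-degree rotation
        img = img + rng.normal(0.0, noise_std, img.shape)  # additive noise
        out.append(np.clip(img, 0.0, 1.0))
    return out

# One labeled example becomes 16 artificial training variants.
samples = augment(np.eye(8), n=16)
```

Which transforms are label-preserving depends on the task: horizontal flips are harmless for many objects but would mirror the digits on a speed-limit sign, so augmentation pipelines are chosen per domain.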


Research priorities

Multimodal recognition and indexing

Combining several input channels, such as voice recognition and facial recognition, with multimodal indexing when searching for media content and sequences, for example in videos

Data-efficient learning

Training with limited data through data augmentation, the generation of artificial data, transfer learning, and semi-supervised and active learning

Representation learning

Unsupervised learning of semantically meaningful representations of raw data based on Deep Learning methods

Embedded and real-time perception

Efficient real-time processing on devices with integrated sensors, high-performance embedded hardware and parallel processing units in mobile systems
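Among the priorities above, representation learning lends itself to a minimal sketch. The example below learns a compact representation of unlabeled raw data with PCA via SVD; the Deep Learning methods referred to above learn nonlinear representations, but the goal – a low-dimensional code that preserves the structure of the raw data – is the same:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "raw data": 200 ten-dimensional vectors that actually live
# on a one-dimensional subspace, plus a little noise.
raw = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 10))
raw += 0.01 * rng.normal(size=raw.shape)

# Unsupervised: no labels are used. Center the data and take the top
# singular directions as the learned representation.
centered = raw - raw.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
codes = centered @ vt[:2].T            # 2-dimensional codes for 10-d data

explained = (s[:2] ** 2).sum() / (s ** 2).sum()  # variance captured
```

Deep autoencoders follow the same recipe with nonlinear encoders and decoders, which is what makes the learned codes semantically richer than a linear projection.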


Highlights

Fraunhofer audio mining for the German television network ARD

As part of our long-term cooperation with the German television network ARD we have been using automated speech recognition and other audio technologies such as voice recognition for archival and editorial work. This tool allows journalists to search for spoken keywords across the entire ARD archive to retrieve relevant articles and contributions. Voice recognition means statements made by individuals can be accurately searched for and retrieved. Generating subtitles or transcribing raw material provides valuable editorial support.
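The keyword search described above can be pictured as an inverted index from recognized words to the recordings and time stamps where they were spoken; the transcripts below are invented examples, not ARD data:

```python
# Toy inverted index over time-stamped speech transcripts:
# each recognized word maps to (recording id, time) pairs,
# so a keyword query jumps straight to the spoken occurrences.

transcripts = {
    "news_0412": [(0.0, "election"), (1.2, "results"), (3.4, "economy")],
    "talk_0518": [(0.5, "economy"), (2.1, "inflation")],
}

index = {}
for rec_id, words in transcripts.items():
    for t, word in words:
        index.setdefault(word, []).append((rec_id, t))

hits = index.get("economy", [])   # every recording and time the word occurs
```

At archive scale the same structure is built from automatic transcripts, so a journalist's query returns jump-in points across thousands of hours of material.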

Live subtitling in the parliament of the Free State of Saxony

The parliament (Landtag) of the Free State of Saxony uses our speech recognition system for live subtitling of plenary session broadcasts. We were tasked with training the system to recognize a range of specialist legal and political terms as well as the names of politicians. An additional module automatically inserts punctuation and ensures the text flows naturally and in a structured fashion. The speech recognizer can be accessed via the cloud. Applications requiring high levels of data protection can be installed locally using standard server architecture.

Speech dialog system

We have worked with Volkswagen AG to develop a prototype speech dialog system. The in-built dialog system acts as an interactive city tour guide, answering the driver’s questions about selected points of interest along the route. The prototype is a good example of how our speech technologies – speech recognition, content analysis using knowledge graphs and speech synthesis – interact in dialog systems based on domain-specific knowledge.
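The content-analysis stage can be illustrated as matching a recognized question against a knowledge graph of points of interest; the graph, entities and question patterns below are invented for illustration:

```python
# Miniature knowledge graph: each point of interest carries typed
# attributes that the dialog system can answer questions about.

poi_graph = {
    "city_hall": {"built": "1905", "fact": "seat of the city council"},
    "old_bridge": {"built": "1873", "fact": "oldest river crossing in town"},
}

def answer(question, graph):
    """Match a recognized question against graph entities and attributes."""
    q = question.lower()
    for name, attrs in graph.items():
        if name.replace("_", " ") in q:        # entity linking by name
            if "when" in q or "built" in q:    # attribute selection by cue word
                return f"It was built in {attrs['built']}."
            return f"It is the {attrs['fact']}."
    return "I don't have information on that."

reply = answer("When was the old bridge built?", poi_graph)
```

A real system replaces the string matching with proper entity linking and intent classification, but the pipeline shape – recognized text in, graph lookup, synthesized answer out – is the same.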

Image recognition patent

We have been granted a patent for our image recognition technology which efficiently and reliably recognizes circular objects such as road traffic signs.
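The patented method itself is not described here; as an illustrative stand-in, the classical Hough-style vote below finds a circle of known radius in a set of edge pixels by letting each pixel vote for candidate centers:

```python
import numpy as np

def hough_circle_center(edge_points, radius, shape, n_angles=64):
    """Accumulate center votes for a known radius; return the best (row, col)."""
    acc = np.zeros(shape, dtype=int)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    for r, c in edge_points:
        # Every point on a circle lies at distance `radius` from its center,
        # so each edge pixel votes along a circle of candidate centers.
        rows = np.round(r - radius * np.sin(angles)).astype(int)
        cols = np.round(c - radius * np.cos(angles)).astype(int)
        ok = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
        np.add.at(acc, (rows[ok], cols[ok]), 1)
    return np.unravel_index(acc.argmax(), acc.shape)

# Synthetic edge pixels on a circle of radius 10 centered at (25, 30).
theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
pts = list(zip(np.round(25 + 10 * np.sin(theta)).astype(int),
               np.round(30 + 10 * np.cos(theta)).astype(int)))
center = hough_circle_center(pts, radius=10, shape=(60, 60))
```

The votes from all edge pixels coincide only at the true center, which is what makes the approach robust to partial occlusion of the sign's outline.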

Recognizing road traffic signs in roadworks

As part of the BMWi project AutoConstruct, we developed methods for recognizing road traffic signs in roadworks. Roadworks continue to be a real challenge for driver assistance systems. Deep Learning methods allow roadwork markings and lane guidance signs to be recognized in camera images in real time. This work lays the foundation for future assistance functions and for highly automated driving systems that can operate in roadworks.

Condition monitoring in sewer networks

Together with research and application partners, we have been developing a system for the automated detection and analysis of damage in municipal sewer networks. This work is part of the BMBF project "Automatic condition analysis of sewer pipes" (AUZUKA). High-definition cameras and 3D sensors capture images of the sewer pipe surfaces and create models of them. Neural networks were trained to recognize different types of damage such as tears, cracks or ingrowing roots. Human experts can then quickly and reliably establish exactly what type of repair is required.