Speech technology “made in Germany” is the ideal tool for making audiovisual content accessible and enabling intuitive interaction in business-to-business applications.

Speech Technologies

Spoken language is one of the most important forms of communication. Leading the market in Germany, our speech recognition technology recognizes spoken language, converts it into text and makes it searchable. Increasingly, the spoken word is replacing the keyboard and the graphical user interface as the way to interact with technical systems. This is where speech assistance systems tailor-made for our customers come into play.

This intuitive interaction with technology is not just useful in everyday life; it also gives companies a major opportunity to optimize existing services and offer new ones. Our solutions also make it easier for our customers to retrieve the information they are after. With audio mining, for example, it is possible to recognize individual speakers based on speech and voice and to systematically identify audio content in media archives.

We use our speech technology to develop voice-enabled dialog systems that can answer questions and control devices. All of the components required for this technology have been developed by us and can be tailored to our customers’ individual requirements: from speech recognition, through the incorporation of domain-specific information from a range of disciplines, to natural-sounding speech output.

This technology combines state-of-the-art components, including the incorporation of knowledge via knowledge graphs, to address the specific challenges facing business-to-business applications. With the help of machine learning, speech assistance and dialog systems can be trained to recognize subject- and industry-specific knowledge and terminology.

Any system developed by us also guarantees technical sovereignty: every single component is developed in Germany, in some cases building on open-source software. Sensitive data can be stored and processed in secure data rooms. In the medical field, for instance, we often prefer local installations to cloud-based solutions. Our technology is tailored specifically to the German language and its use in industry and commerce.

Portfolio of services

Our speech recognition solutions have been used successfully for many years by our customers, including many media professionals. We also offer individually configurable speech assistance and dialog systems which cater for companies’ requirements and can be used across a range of business sectors.

 

Mining platform

Using our AI-based Mining Platform, companies can analyze their text, audio and video information fully automatically, generating valuable metadata. This metadata can then be used to search archives more quickly for relevant content such as files or documents.

Automated speech recognition

Our real-time speech recognition applications are highly accurate, even when working with dialects. Transcribed recordings can be searched together with the original sound recordings. We tailor our solutions to our customers’ individual requirements and integrate the terminology used in their specific field.

Speech-enabled dialog systems

We develop dialog systems for industry and commerce, with a particular focus on incorporating semantically structured expertise and factual knowledge drawn from a variety of data sources.

Voice-based diagnostics

We can analyze voice characteristics to help diagnose medical conditions, for example to detect illnesses such as Parkinson’s at an early stage or to recognize blockages in the lungs.

Highlights

Accessing the ARD archives using audio mining

As part of our long-term cooperation with the German public broadcaster ARD, we have been using automated speech recognition and other audio technologies such as voice recognition for archival and editorial work.

This tool allows journalists to search for spoken keywords across the entire ARD archive to retrieve relevant articles and contributions. Thanks to voice recognition, statements made by specific individuals can be searched for and retrieved accurately. Automatically generated subtitles and transcripts of raw material provide further editorial support.

Live subtitling in the parliament of the Free State of Saxony

The parliament (Landtag) of the Free State of Saxony uses our speech recognition system for live subtitling of plenary session broadcasts. We were tasked with training the system to recognize a range of specialist legal and political terms as well as the names of politicians.

An additional module automatically inserts punctuation so that the text reads naturally and is clearly structured. Applications requiring a high level of data protection can be installed locally using standard server architecture. Cloud-based speech recognition systems are equally feasible.

Interactive city tour guide using a voice-activated assistance platform

We have worked with Volkswagen AG to develop a prototype speech dialog system. The built-in dialog system acts as an interactive city tour guide, answering the driver’s questions about selected points of interest along the route. The prototype is a good example of how the components of our speech technology (speech recognition, content analysis using knowledge graphs and speech synthesis) interact in dialog systems based on domain-specific knowledge.

SPEAKER project

As part of the BMWi-financed SPEAKER project, the Fraunhofer institutes IAIS and IIS are jointly developing a speech assistance platform “Made in Germany”. The individual components of the speech assistance and dialog systems can be easily adapted to individual industry sectors and customers.

Particular attention is paid to maintaining data sovereignty by observing data protection standards. Assistance systems tailored to individual clients’ needs are assembled from modules on the platform and can be put into practice easily.

Voice diagnostics

We use speech and voice analysis tools to help diagnose diseases and to support the therapies used to treat them. As part of an EU-wide project, we developed i-PROGNOSIS, an app for the early detection of Parkinson’s disease. In addition to speech data, the app logs timing and pressure data while the patient is using the i-PROGNOSIS keyboard. It also logs location data, facial expression data taken from front-facing camera shots and affective content data from stored text messages. The app conforms to the current EU data protection regulations.