Speech Technologies

Spoken language is one of the most important means of communication. Leading the market in Germany, our speech recognition technology recognizes spoken language in 99 languages, converts it into text and thus makes it searchable. Increasingly, the spoken word is replacing the keyboard or graphical user interface as the way to interact with technical systems. This is where speech assistance systems tailor-made for our customers come into play.

This intuitive interaction with technology is not just useful in everyday life, but also gives companies a major opportunity to optimize existing services and offer new services. Our solutions also make it easier for our customers to retrieve the information they are looking for. With Audio Mining, for example, it is possible to recognize individual speakers based on speech and voice and to systematically identify audio content in media archives.

We are using our speech technology to develop voice-enabled dialog systems that are able to answer questions and control devices. All of the components required for this technology have been developed by us and can be tailored to our customers’ individual requirements: from speech recognition, through the incorporation of domain-specific information from a range of disciplines, right up to the output of synthesized speech.

This technology combines state-of-the-art components, including the incorporation of knowledge via knowledge graphs, to address the specific challenges facing business-to-business applications. With the help of machine learning, speech assistance and dialog systems can be trained to recognize domain- and industry-specific knowledge and terminology.

Any system developed by us also guarantees technical sovereignty: Every single component is developed in Germany, partly using open-source components. Sensitive data can be stored and processed in secure data rooms. In the medical field, for instance, we often prefer local installations instead of cloud-based solutions. Our technology is tailored specifically to the German language and its use in industry and commerce.

Portfolio of services

Our speech recognition solutions have been used successfully for many years by our customers, including many media professionals. We also offer individually configurable speech assistance and dialog systems which cater for companies’ requirements and can be used across a range of business sectors.


Audio Mining

The Fraunhofer IAIS Audio Mining system makes audiovisual media searchable for specific terms, quotations, and speakers, and makes media libraries easier to manage.


Live Automatic Speech Recognition (ASR)

With the Live ASR technology from Fraunhofer IAIS you can experience our speech recognition technology in real time. Maximize efficiency and inclusive communication with the highest accuracy for your transcription needs.


Generative AI

Seize the opportunities offered by Generative Artificial Intelligence. We support you with more than 20 years of AI experience and a broad portfolio for enterprises. Start with us where you are, whether you want to get an overview, work on concrete solutions, or plan the implementation in your infrastructures.

99 languages

With our innovative Audio Mining, media can easily be transcribed in 99 different languages, expanding your international reach.


Voice-based diagnostics

We can analyze voice characteristics to help diagnose medical conditions, for example to detect illnesses such as Parkinson’s at an early stage or to recognize blockages in the lung.

Mining platform

Using our AI-based Mining Platform, companies can process text, audio and video information fully automatically and generate valuable metadata. This metadata can then be used to search archives for content such as files or documents more quickly.

Speech Technologies made by Fraunhofer IAIS

Customized solution

We can tailor our speech technologies specifically to your individual requirements and can thus support you in achieving your internal company goals. In doing so, we can draw on the expertise of the Fraunhofer network.

We are also happy to assist you with implementation and use. Please do not hesitate to contact us if you have any questions.

Fraunhofer network

A strong partner for your future development: Fraunhofer IAIS stands at the center of a strong research network. Among other things, it is the managing institute of the Fraunhofer Big Data and Artificial Intelligence Alliance, which bundles the cross-industry expertise of more than 30 Fraunhofer institutes in Big Data and Artificial Intelligence.

Quality guarantee

At Fraunhofer IAIS, we are constantly developing our technologies to ensure that you benefit from the latest state of the art.

We pursue high quality standards: speech recognition "made by Fraunhofer" has the highest recognition quality in the industry for the German language.


Allinga: Voice Assistants for professional environments

In a team of over 60 experts, the Fraunhofer Institutes IIS and IAIS have developed the speech assistance solution Allinga. Allinga enables greater efficiency, barrier-free communication, a reduced workload for employees and much more.

Two modules, the speech recognition and speech synthesis "Allinga Voice", are already available and successfully proving themselves on the market. Further components are currently under development.

Emotion recognition through speech, image and text analysis

In the research project "Multimodal mining of eyewitness interviews for the indexing of audiovisual cultural heritage", which is being carried out with the foundation "Haus der Geschichte", our scientists are developing a technology that recognizes and categorizes emotions in eyewitness interviews. For this purpose, the transcript, voice pitch, speech rate and facial expressions of the eyewitnesses are analyzed. The intelligent video analysis combines speech recognition, image recognition and text recognition technologies. In the future, this will enable a targeted search for emotions, for example in interviews about the fall of the Berlin Wall, at www.zeitzeugen-portal.de.

Accessing the ARD archives using audio mining

As part of our long-term cooperation with the German television network ARD, we have been using automated speech recognition and other audio technologies such as voice recognition for archival and editorial work.

This tool allows journalists to search for spoken keywords across the entire ARD archive to retrieve relevant articles and contributions. Because voice recognition identifies who is speaking, statements made by specific individuals can be searched for and retrieved accurately. Automatically generated subtitles and transcripts of raw material provide valuable editorial support.
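The keyword and speaker search described above can be pictured as a lookup over timestamped transcript segments. The following Python sketch is purely illustrative and is not the ARD or Fraunhofer IAIS implementation: the `Segment` fields, the function names and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One recognized stretch of speech (all fields are hypothetical)."""
    media_id: str   # made-up archive identifier
    start_s: float  # start time in seconds
    speaker: str    # label assigned by voice recognition
    text: str       # transcript produced by speech recognition


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(".,;:!?\"'") for w in text.lower().split())
    return {w for w in words if w}


def build_index(segments):
    """Map every spoken word to the segments in which it occurs."""
    index = defaultdict(list)
    for seg in segments:
        for word in tokenize(seg.text):
            index[word].append(seg)
    return index


def search(index, keyword, speaker=None):
    """Return segments containing the keyword, optionally for one speaker."""
    hits = index.get(keyword.lower(), [])
    if speaker is not None:
        hits = [s for s in hits if s.speaker == speaker]
    return sorted(hits, key=lambda s: (s.media_id, s.start_s))


# Invented sample data, for illustration only.
segments = [
    Segment("clip-001", 12.5, "Speaker A", "The budget debate starts today."),
    Segment("clip-001", 47.0, "Speaker B", "We reject the budget proposal."),
    Segment("clip-002", 3.2, "Speaker A", "Weather first, then sports."),
]
index = build_index(segments)

for hit in search(index, "budget", speaker="Speaker B"):
    print(hit.media_id, hit.start_s, hit.text)
```

Storing the start time with each segment is what lets a search result point to the exact position in a recording rather than only to the file as a whole.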

Live subtitling in the parliament of the Free State of Saxony

The parliament (Landtag) of the Free State of Saxony uses our speech recognition system for live subtitling of plenary session broadcasts. We were tasked with training the system to recognize a range of specialist legal and political terms as well as the names of politicians.

An additional module automatically inserts punctuation so that the text flows naturally and is clearly structured. Applications requiring a high level of data protection can be installed locally using standard server architecture. Cloud-based speech recognition systems are equally feasible.

SELMA project

The SELMA initiative, part of the Horizon 2020 program, is revolutionizing media monitoring and journalism by analyzing large amounts of data. The goal is to develop an advanced, multilingual open-source platform for efficient content processing. The platform integrates self-learning AI to create a cross-lingual information space that continuously collects and analyzes original language data. SELMA improves the accessibility of audiovisual media through transcription, translation, subtitling and voice-over. The project is led by a consortium that includes Fraunhofer IAIS, covers knowledge transfer and content processing in several languages, and runs from January 2021 to December 2023.

SPEAKER project

As part of the BMWi-financed SPEAKER project, the Fraunhofer IAIS and IIS institutes are jointly developing a speech assistance platform “Made in Germany”. The individual components of the speech assistance and dialog systems can be easily adapted to individual industry sectors and customers.

Particular attention is paid to maintaining data sovereignty by observing data protection standards. Assistance systems tailor-made to individual clients’ needs are organized in modular fashion on the platform and can easily be put into practice.

Conversational AI

At Fraunhofer IAIS in Dresden we develop intelligent dialog systems based on a question answering technique which gives users quick and efficient access to information via voice- and text-based input. The result is complete solutions that comply with data protection requirements, can be used for many applications in the automotive, medical, manufacturing, finance or tourism sectors, and can be tailored to the users’ specific requirements.

Question answering techniques

Question answering systems process spoken questions and answer them in natural language. In order to find the right answers, the systems trawl through information from a wide range of sources. Our experts make use of innovative concepts and technologies based on "Linked Data", "Deep Learning" and "Natural Language Processing". This means that users do not need to formulate complex search requests and so remain free to concentrate fully on their actual task. In practice, these systems can support customer advisers in telecommunication companies or serve as in-car voice-activated assistants.
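To make the idea of searching several sources for an answer more concrete, here is a deliberately simple Python sketch. It is not the Fraunhofer IAIS method: it replaces the Linked Data and Deep Learning components mentioned above with plain word overlap, and the source names, passages, stopword list and function names are all invented for illustration.

```python
import re

# Small illustrative stopword list; a real system would not rely on this.
STOPWORDS = {
    "the", "a", "an", "is", "are", "what", "which", "of", "in",
    "to", "do", "does", "how", "my", "i", "for",
}


def tokens(text):
    """Split text into a set of lowercase word tokens."""
    return set(re.findall(r"[a-zäöüß0-9]+", text.lower()))


def answer(question, sources):
    """Return (source name, passage) with the largest word overlap, or None.

    `sources` maps a hypothetical source name to a list of text passages.
    """
    query = tokens(question) - STOPWORDS
    best, best_score = None, 0
    for name, passages in sources.items():
        for passage in passages:
            score = len(query & tokens(passage))
            if score > best_score:
                best, best_score = (name, passage), score
    return best


# Invented knowledge sources for a telecommunications support scenario.
sources = {
    "tariff_faq": [
        "The Basic tariff includes 5 GB of data per month.",
        "Roaming is free within the EU.",
    ],
    "device_manual": [
        "Hold the power button for ten seconds to restart the router.",
    ],
}

print(answer("How do I restart my router?", sources))
```

The user asks a plain question and the system decides which source holds the answer. In a spoken dialog system, speech recognition would supply the question text and speech synthesis would read out the selected passage.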