Generative AI

How does generative AI work?

At the heart of generative AI are large language models (LLMs) built on powerful transformer architectures. These models are trained on high-performance computers with vast amounts of data and form the engine of modern AI systems, from retrieval-augmented generation (RAG) approaches to agentic systems.
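The RAG pattern mentioned above can be illustrated with a minimal sketch: documents relevant to a query are retrieved and prepended to the prompt so the LLM can ground its answer in them. The document store, the bag-of-words scorer, and the prompt template below are illustrative assumptions, not part of any specific Fraunhofer IAIS system.

```python
# Minimal RAG sketch: retrieve the best-matching document, then build a
# grounded prompt. A real system would use learned embeddings and an LLM call.
from collections import Counter
import math

DOCUMENTS = [
    "Transformers use self-attention to model long-range dependencies in text.",
    "Retrieval-augmented generation grounds model answers in retrieved documents.",
    "Knowledge distillation transfers capabilities from large to compact models.",
]

def bow(text):
    """Bag-of-words term frequencies (toy stand-in for a learned embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(bow(query), bow(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    """Assemble the grounded prompt that would be sent to an LLM."""
    context = "\n".join(retrieve(query))
    return f"Answer using the context below.\nContext: {context}\nQuestion: {query}"

print(build_prompt("How does retrieval-augmented generation work?"))
```

In production systems the bag-of-words scorer is replaced by dense vector search, but the control flow (retrieve, then generate) stays the same.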

 

What are the main areas of research in generative AI?

Research at Fraunhofer IAIS focuses on the further development of fundamental AI technologies and the responsible design of large language models.

  • AI-based curation and synthesis of high-quality training data: We are researching new methods for selecting and generating (domain-specific) training data with the help of AI. Such data can make training more efficient and yield more powerful models.
  • Improved reasoning skills: We are researching reasoning models that first break down complex tasks into understandable sub-steps and then solve them coherently, step by step.
  • Knowledge distillation: We are exploring new approaches to transfer selected capabilities of large models into compact, energy-efficient models that can be operated locally.
  • Domain-specific adaptation: We are researching methods for optimizing lean, specialized models in order to solve domain-specific tasks more precisely and resource-efficiently.
  • Multi-agent systems: We are researching modular agent systems that dynamically combine different models depending on the task at hand and solve complex problems cooperatively.
  • Multimodality: We conduct research on multimodal foundation models that can process text, images, audio, video, and structured data in various combinations. Our focus is on new methods that improve reasoning across these modalities.
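The knowledge-distillation item above can be sketched with the classic soft-target recipe: a compact student is trained to match the temperature-softened output distribution of a large teacher. The logits and temperature below are illustrative assumptions only, not values from any Fraunhofer IAIS model.

```python
# Toy knowledge-distillation sketch: compute the KL divergence between the
# teacher's and student's temperature-softened class distributions, which
# serves as the distillation loss to minimize during student training.
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): mismatch between teacher (p) and student (q) distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_logits = [4.0, 1.5, 0.5]  # large model's raw scores over 3 classes
student_logits = [2.0, 1.0, 0.8]  # compact model's scores before training

T = 2.0  # a higher temperature exposes the teacher's relative preferences
loss = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
print(f"distillation loss: {loss:.4f}")
```

Driving this loss toward zero makes the small model mimic the large one's behavior, which is what allows distilled models to run locally and energy-efficiently.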

Research into generative AI is closely linked to the Lamarr Institute's Foundation Model Group. An important milestone is the development of the Teuken models as part of the OpenGPT-X research project; these models have set new standards through innovative tokenization and multilingual training data.

 

What are the benefits of research into generative AI for corporate applications?

Our research aims to develop the next generation of AI systems that are more powerful, efficient, and adaptable. We are making great strides in the areas of data curation and synthesis, reasoning, knowledge distillation, domain-specific model adaptation, and dynamically composable multi-agent systems. This enables companies to deploy AI solutions that are optimally tailored to their individual requirements and can be seamlessly integrated into existing processes.

Research projects and collaborations

 

Development of the multilingual language model Teuken 7B

OpenGPT-X

The OpenGPT-X project researched and tested the entire value chain of generative AI. The project developed and released Teuken 7B, a European, trustworthy, multilingual language model. One of the key research questions was how multilingual AI language models can be trained and operated in the most energy- and cost-efficient way possible.

AI technology platform for businesses and the public sector

DeployAI

DeployAI is dedicated to making AI solutions more accessible and usable for small and medium-sized enterprises (SMEs) and the public sector across Europe. The initiative is funded by the European Commission with €28 million and is a joint project to shape the future of AI in Europe.

The next generation of multilingual open-source language models

OpenEuroLLM

Europe's leading AI research institutions and organizations, including Fraunhofer IAIS, are pooling their resources and expertise in the OpenEuroLLM project. This is where the next generation of multilingual open-source language models is being developed – open, trustworthy, and multilingual.

 

AI turbo for Europe: Companies gain access to supercomputers

Jupiter AI Factory

The AI Factory around the Jülich supercomputer JUPITER aims to advance the training of next-generation AI models and, in particular, to support German and European start-ups as well as small and medium-sized enterprises in the development of powerful, secure, and data-protection-compliant AI applications.

Development of an open, trustworthy, and fact-based LLM

TrustLLM

The main goal of TrustLLM is to develop an open, trustworthy, and fact-based LLM that initially focuses on Germanic languages. This will lay the foundation for an advanced open ecosystem for modular and expandable next-generation European LLMs.

Further collaborations

At the heart of a growing, closely networked innovation ecosystem, Fraunhofer IAIS conducts research on various topics related to Artificial Intelligence and Machine Learning.

Further information

Paper (2025)

"Judging Quality Across Languages (JQL)"

 

High-quality multilingual training data is essential for effectively pretraining large language models (LLMs). Yet, the availability of suitable open-source multilingual datasets remains limited. Existing state-of-the-art datasets mostly rely on heuristic filtering methods, restricting both their cross-lingual transferability and scalability.

In this paper, we introduce JQL, a systematic approach that efficiently curates diverse and high-quality multilingual data at scale while significantly reducing computational demands. 
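The idea of score-based curation described in the abstract can be sketched as follows: a quality scorer rates candidate documents and only the top fraction is kept for pretraining. The scorer below is a crude heuristic stub standing in for JQL's learned annotators; the corpus, threshold, and scoring rules are assumptions for illustration only.

```python
# Hedged sketch of score-based multilingual data curation: rank candidate
# documents by a quality score and keep the top fraction for pretraining.
def quality_score(doc):
    """Stub scorer rewarding longer, punctuation-terminated text.
    In JQL this role is played by learned quality annotators."""
    words = doc.split()
    score = min(len(words) / 20.0, 1.0)
    if doc.rstrip().endswith((".", "!", "?")):
        score += 0.5
    return score

def curate(corpus, keep_fraction=0.5):
    """Keep the highest-scoring fraction of the corpus."""
    ranked = sorted(corpus, key=quality_score, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

corpus = [
    "asdf qwer zxcv",                                                # noise
    "Large language models benefit from curated multilingual data.",
    "click here click here click here",                              # boilerplate
    "Hochwertige Trainingsdaten verbessern mehrsprachige Modelle.",  # German
]
print(curate(corpus, keep_fraction=0.5))
```

Replacing the stub with a learned, language-agnostic scorer is what makes such curation transfer across languages, which is the gap the paper addresses.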

Book (2023)

"Foundation Models for Natural Language Processing"

 

The book “Foundation Models for Natural Language Processing – Pre-trained Language Models Integrating Media” by Sven Giesselbach and Gerhard Paaß (Fraunhofer IAIS) provides a concise overview of the current state of research and the diverse applications of foundation models in the field of natural language processing (NLP). The book was published by Springer-Verlag in 2023 and is available for download free of charge.

Podcasts (in German)

Our experts in conversation

 

Europäische LLMs (European LLMs)

Knowledge Science Episode 168

23.11.2024 | Dr. Mehdi Ali, Michael Fromm


 

 

ChatGPT und Co – Potenzial von KI-Sprachmodellen (ChatGPT and Co.: The Potential of AI Language Models)

10.2.2023 | Prof. Dr. Christian Bauckhage


Contact

 

Dr. Mehdi Ali

Head of the Innovation Group for Research on Foundation Models