OpenGPT-X: Teuken-7B - Fraunhofer IAIS

The European, open, multilingual large language model

Companies from all industries can now implement AI applications with »Teuken 7B« - the large language model from the OpenGPT-X research project is now available to download free of charge as open source on Hugging Face: »Teuken 7B-instruct-v0.4« is trained from scratch with the 24 official languages of the EU and has seven billion parameters. Developers from research and companies can download »Teuken-7B« and adapt, supplement and further fine-tune it as a basis for their applications. After this step, a model is created that is optimized for specific use cases in the company.

»Teuken 7B« is available in two versions: »Teuken 7B-instruct-research-v0.4« can be used for research purposes, »Teuken 7B-instruct-commercial-v0.4« is available to companies for commercial purposes under the »Apache 2.0« license. The model has already been optimized for chat by »Instruction Tuning«.

Press release
OpenGPT-X project website

In addition to the two Fraunhofer Institutes IAIS and IIS and the Jülich Research Center, the German AI Association, TU Dresden, the German Research Center for Artificial Intelligence (DFKI), IONOS, Aleph Alpha, ControlExpert and Westdeutscher Rundfunk (WDR) have collaborated on OpenGPT-X as partners.

Multilingual

Our model is fully multilingual, with training in all 24 EU languages.
It contains approximately 50 percent non-English pretraining data.
A performance comparison demonstrates that the model produces similar results across the linguistic spectrum.
Consequently, it reflects European characteristics, norms, and values, facilitating effective multilingual communication.

Open Source

The model is available for download at no cost from Hugging Face.
The »Apache 2.0« license permits adaptation, further development, and utilization of »Teuken 7B-instruct-commercial-v0.4« for commercial artificial intelligence applications.
Sensitive data may remain within the organization.
The research license allows you to freely use and further develop »Teuken-7B instruct-research-v0.4« for research and testing.

Science-driven

The product was developed by scientists for commercial use.
The multilingual tokenizer allows for particularly energy-efficient training and operation of multilingual applications.
The European Leaderboard, developed by our team, is designed to evaluate and assess a range of models in response to multilingual tasks.
Podcast Knowledge Science: Mehdi Ali and Michael Fromm from Fraunhofer IAIS explain the development of multilingual European AI systems.

Application of Teuken-7B in the company

Download »Teuken 7B«

Developers can download »Teuken 7B« free of charge under the »Apache 2.0« license (or under a research license) on Hugging Face.

free Download

Webinar / in German

Find out from the experts what is possible with »Teuken 7B«.

06.06.25, 11:00 - 11:45 a.m.
for companies / free of charge

Webinar / in English

Find out from the experts what is possible with »Teuken 7B«.

09.05.25, 11:00 - 11:45 a.m.
for companies / free of charge

Get started with us

We adapt »Teuken 7B« to align with your organizational procedures. To learn more about our offerings or to schedule a consultation, please do not hesitate to contact us.

continue

Technical Info & Research

Model cards and benchmarks

Technical data regarding the model and its utilization. Comparative graphical illustrations and technical explanations of the model in relation to other models.

view

Use Cases

The following represents a representative sample of specific application examples drawn from a variety of sectors, including industry, healthcare, legal, finance, and media.

view

Publications and Code Repositories

Research results on multilingual language models

view

LLM Community

We respond to technical and scientific inquiries from the community and provide a forum for feedback and discourse on the OpenGPT-X Discord server.

join the discussion

FAQ about Teuken-7B

Expand all Close all

What versions of »Teuken 7B« are available and what is the difference?

»Teuken 7B« is available free of charge in two license versions: »Teuken 7B-instruct-research-v0.4« can be used by the scientific community and companies for research purposes, »Teuken 7B-instruct-commercial-v0.4« is available to companies for commercial purposes under the »Apache 2.0« license.

»Teuken 7B-instruct-commercial-v.04« is comparable to the research version in terms of performance, although the research version achieves better results in the benchmarks by one to two percent. The reason for this is that some data sets used in the research version exclude commercial use and were therefore not used in the version for companies.
Where can I download the large language model »Teuken 7B« and is there a cost involved?

»Teuken 7B« is available for download free of charge and open source on Hugging Face.
How can I try »Teuken 7B« without downloading it myself?
Companies in particular have the opportunity to take part in a free webinar in which Fraunhofer scientists explain which applications can be realized with appropriate further processing on the basis of »Teuken 7B«.

register: webinar in German

register: webinar in English
For what purposes can I use »Teuken 7B« in my company?
»»Teuken 7B« is multilingual and has been optimized for chat through »instruction tuning«, so it can be used as a multilingual chatbot, e.g. in international customer service or to make company knowledge accessible to employees.

The following other applications can be implemented with »Teuken 7B«:

Areas of application:

Summarize documents

Generate texts

Extracting information from texts

»Teuken 7B« can be further processed through Continued Pretraining, Finetuning, Instruction Tuning, Model Merging etc. in order to adapt the model to the company's own purposes. The result is a model that is optimized for the individual use cases in the company.
What do I have to consider as a company if I want to use the model in my company?

In order to adapt the model to your own business purposes, »Teuken 7B-instruct-commercial« can also be further processed with your own data through Continued Pretraining, Finetuning, Instruction Tuning, Model Merging, etc.

The model performs well in a performance comparison with other open source models, but still has potential for development in the areas of logical thinking, coding and mathematics. In addition, »Teuken 7B«, like other large language models, can generate content that is inappropriate, offensive or harmful.
Is »Teuken 7B-instruct« like ChatGPT?

»Teuken 7B-instruct« is a chatbot that is primarily intended for corporate applications and research projects. Developers from companies and the scientific community can use it to develop their individual chat applications. »Teuken 7B-instruct-commercial« can also be further processed with your own data through continued pretraining, fine-tuning, instruction tuning, model merging, etc. to adapt the model to your own business purposes.
Is »Teuken 7B-instruct-commercial-v0.4« commercially usable?

Yes, companies can use »Teuken 7B-instruct-commercial-v0.4« commercially for their AI applications under the »Apache 2.0 license«.
How can I access the base model?

Basic models are particularly susceptible to the generation of inappropriate, offensive or harmful content. At the same time, base models offer the advantage that they can be developed into powerful special models through fine-tuning and instruction tuning if used correctly and responsibly.
For this reason, the »Teuken 7B-base-v0.4 base« model is not published, but companies and other stakeholders interested in the base model can contact Fraunhofer IAIS so that the use of the base model can be coordinated and supported.

Inquiries by mail to: contact@opengpt-x.de
Do obligations of the European AI Regulation (AI Act) have to be taken into account when using the model?

Currently no. The EU AI Act will not apply until August 2025. AI models that were placed on the market before this date do not have to comply with the requirements of the EU AI Act until August 2027 (grandfathering).
What's next for OpenGPT-X? Will more models be released?

The OpenGPT-X research project is nearing completion with the release of »Teuken 7B-instruct-v0.4« and will run until March 31, 2025. Until then, we will continue to optimize and evaluate the model. There is development potential for the model for relevant tasks such as logical thinking, coding and mathematics as well as bias and toxicity. Furthermore, by continuing the model training, we can increase the number of tokens (context window) processed simultaneously by the model.

As this is an open source project, we also assume that adapted or specialized versions of the model will be developed for different applications by the scientific community or companies.
Where can I find out how the model performs compared to other language models?
All evaluation results are available on our European Leaderboard:

to the Leaderboard

For technical information on the model and its application, please refer to our model card:

to the model card

We have summarized the key findings of the evaluation:

to the benchmark graphics

A detailed review of the model can be found here:

to the OpenGPT-X project website
Who can I contact if I, as a developer or researcher, have questions about the model or want to give feedback?
Our scientists are in contact with the LLM community via the OpenGPT-X Discord server. This is also the place for questions and feedback about the model.

join the discussion

Digital sovereignty for Europe

About OpenGPTX

The OpenGPT-X project, comprising ten partners, commenced on January 1st. The project commenced in January 2022 with funding from the Federal Ministry for Economic Affairs and Climate Action (BMWK) to the value of approximately €14 million and is scheduled to conclude on 31st December. March 2025. The project, led by Fraunhofer IAIS and Fraunhofer IIS, is investigating the entire value chain of generative AI. This includes high-scale GPU-based infrastructure and data for training large language models, model development, and productive application in the form of prototypes and proof of concepts (PoCs). The project's overarching goal was to develop a large open-source AI speech model for research and industry, tailored to Europe's multilingual needs.

The release of Teuken 7B-base-v0.4 marks the achievement of this goal, offering a publicly funded alternative for future scientific investigations and economic applications of generative AI.

Teuken 7B webinar

We recommend all interested parties to participate in our free webinar.

Please register via the following link:

The webinar serves as an introduction to Teuken and LLMs. If you have already taken part in a webinar or have a specific request, you can also start directly with a consultation appointment. Please use the following form:

Teuken-7B consultation date

* Required

Please enter the data requested in your inquiry.

Salutation

First name

Last name

Phone

Location

Institution / Company

Industry sector

I am interested in a complimentary consultation. The purpose of my inquiry is as follows:

Role within the institution

Upon submission of this form, I affirm that I have read and understood the data protection policy. I consent to the electronic collection and storage of the data I have provided. Upon submission of the contact form, I consent to the processing of my data.

The email address previously provided will be used to inform you of similar offers (e.g., events) via email. You may opt out of this use at any time by contacting Fraunhofer, particularly via email at widerspruch@iais.fraunhofer.de.

Additional information regarding data protection at Fraunhofer, including details concerning the legally mandated information obligations, is available for consultation at any time via our data protection declaration.