OpenGPT-X: Teuken 7B

The European, open, multilingual large language model

Companies from all industries can now implement AI applications with »Teuken 7B« - the large language model from the OpenGPT-X research project is now available to download free of charge as open source on Hugging Face: »Teuken 7B-instruct-v0.4« is trained from scratch with the 24 official languages of the EU and has seven billion parameters. Developers from research and companies can download »Teuken-7B« and adapt, supplement and further fine-tune it as a basis for their applications. After this step, a model is created that is optimized for specific use cases in the company.

»Teuken 7B« is available in the following versions:

for research purposes:
»Teuken 7B-instruct-research-v0.4«
for non-commercial purposes:
»Teuken 7B-instruct-v0.6«
»Teuken 7B-base-v0.6«
for companies for commercial purposes under the »Apache 2.0« license:
»Teuken 7B-instruct-commercial-v0.4«
Press release
OpenGPT-X project website

In addition to the two Fraunhofer Institutes IAIS and IIS and the Jülich Research Center, the German AI Association, TU Dresden, the German Research Center for Artificial Intelligence (DFKI), IONOS, Aleph Alpha, ControlExpert and Westdeutscher Rundfunk (WDR) have collaborated on OpenGPT-X as partners.

Multilingual

Our model is fully multilingual, with training in all 24 EU languages.
It contains approximately 50 percent non-English pretraining data.
A performance comparison demonstrates that the model produces similar results across the linguistic spectrum.
Consequently, it reflects European characteristics, norms, and values, facilitating effective multilingual communication.

Open

The model can be downloaded free of charge in various versions and licenses from Hugging Face.
»Teuken 7B« can be used and further developed in versions 0.4 and 0.6 in research.
The »Apache 2.0« license permits adaptation, further development, and utilization of »Teuken 7B-instruct-commercial-v0.4« for commercial artificial intelligence applications.
Sensitive data may remain within the organization.

Science-driven

The product was developed by scientists for commercial use.
The multilingual tokenizer allows for particularly energy-efficient training and operation of multilingual applications.
The European Leaderboard, developed by our team, is designed to evaluate and assess a range of models in response to multilingual tasks.
Podcast Knowledge Science: Mehdi Ali and Michael Fromm from Fraunhofer IAIS explain the development of multilingual European AI systems (in German).

Application of Teuken-7B in the company

Download »Teuken 7B«

Developers can download »Teuken 7B« free of charge on Hugging Face.

free Download

Webinar / in English

As soon as a webinar in English becomes available, we will announce it here. Follow us on social media to stay up to date!

Get started with us

We adapt »Teuken 7B« to align with your organizational procedures. To learn more about our offerings or to schedule a consultation, please do not hesitate to contact us.

continue

Technical Info & Research

Model cards and benchmarks

Technical data regarding the model and its utilization. Comparative graphical illustrations and technical explanations of the model in relation to other models.

view

Use Cases

The following represents a representative sample of specific application examples drawn from a variety of sectors, including industry, healthcare, legal, finance, and media.

view

Publications and Code Repositories

Research results on multilingual language models

view

LLM Community

We respond to technical and scientific inquiries from the community and provide a forum for feedback and discourse on the OpenGPT-X Discord server.

join the discussion

FAQ about Teuken-7B

Alle ausklappen Alle einklappen

What versions of »Teuken 7B« are available and what is the difference?
»Teuken 7B« is available in the following versions:

»Teuken 7B-instruct-research-v0.4« for research purposes.

»Teuken 7B-instruct-commercial-v0.4« for companies for commercial purposes under the »Apache 2.0« license. The model has already been optimized for chat through Instruction Tuning. »Teuken 7B-instruct-commercial-v.04« is comparable to the research version in terms of performance, although the research version achieves better results in the benchmarks by one to two percent.

Teuken 7B-instruct-v0.6« and »Teuken 7B-base-v0.6« for non-commercial purposes under the license »CC BY-NC 4.0«. The update features significant improvements compared to »Teuken 7B-instruct-v.04«, including increased performance, improved robustness and reliability, and expanded application flexibility.
Where can I download the large language model »Teuken 7B« and is there a cost involved?

»Teuken 7B« is available for download free of charge and open source on Hugging Face.
How can I try »Teuken 7B« without downloading it myself?

Companies in particular have the opportunity to take part in a free webinar in which Fraunhofer scientists explain which applications can be realized with appropriate further processing on the basis of »Teuken 7B«.
For what purposes can I use »Teuken 7B« in my company?
»Teuken 7B-instruct-commercial-v0.4« is multilingual and has been optimized for chat through Instruction Tuning, so it can be used as a multilingual chatbot, e.g. in international customer service or to make company knowledge accessible to employees.

The following other applications can be implemented with »Teuken 7B«:

Areas of application:

Summarize documents

Generate texts

Extracting information from texts

»Teuken 7B« can be further processed through Continued Pretraining, Finetuning, Instruction Tuning, Model Merging etc. in order to adapt the model to the company's own purposes. The result is a model that is optimized for the individual use cases in the company.
What do I have to consider as a company if I want to use the model in my company?

Select the version »Teuken 7B-instruct-commercial-v0.4«.

In order to adapt the model to your own business purposes, »Teuken 7B-instruct-commercial« can also be further processed with your own data through Continued Pretraining, Finetuning, Instruction Tuning, Model Merging, etc.

The model performs well in a performance comparison with other open source models, but still has potential for development in the areas of logical thinking, coding and mathematics. In addition, »Teuken 7B«, like other large language models, can generate content that is inappropriate, offensive or harmful.
Is »Teuken 7B-instruct« like ChatGPT?

»Teuken 7B-instruct« is a chatbot that is primarily intended for corporate applications and research projects. Developers from companies and the scientific community can use it to develop their individual chat applications. »Teuken 7B-instruct-commercial-v0.4« can also be further processed with your own data through continued pretraining, fine-tuning, instruction tuning, model merging, etc. to adapt the model to your own business purposes.
Is »Teuken 7B-instruct-commercial-v0.4« commercially usable?

Yes, companies can use »Teuken 7B-instruct-commercial-v0.4« commercially for their AI applications under the »Apache 2.0 license«.
How can I access the base model?

The basic model in version »Teuken 7B-base-v0.6« can be downloaded here for research purposes and for private, educational, and non-commercial use.

Basic models are particularly susceptible to the generation of inappropriate, offensive or harmful content. At the same time, base models offer the advantage that they can be developed into powerful special models through fine-tuning and instruction tuning if used correctly and responsibly.
Do obligations of the European AI Regulation (AI Act) have to be taken into account when using the model?

Currently no. The EU AI Act will not apply until August 2025. AI models that were placed on the market before this date do not have to comply with the requirements of the EU AI Act until August 2027 (grandfathering).
What's next for OpenGPT-X? Will more models be released?

The OpenGPT-X research project has been completed.

As this is an open source project, we also assume that adapted or specialized versions of the model will be developed for different applications by the scientific community or companies.
Where can I find out how the model performs compared to other language models?
All evaluation results are available on our European Leaderboard:

to the Leaderboard

For technical information on the model and its application, please refer to our model card:

to the model card

We have summarized the key findings of the evaluation:

to the benchmark graphics

A detailed review of the model can be found here:

to the OpenGPT-X project website
Who can I contact if I, as a developer or researcher, have questions about the model or want to give feedback?
Our scientists are in contact with the LLM community via the OpenGPT-X Discord server. This is also the place for questions and feedback about the model.

join the discussion

OpenGPT-X: Digital sovereignty for Europe

The OpenGPT-X project, comprising ten partners, commenced on January 1st. The project commenced in January 2022 with funding from the Federal Ministry for Economic Affairs and Climate Action (BMWK) to the value of approximately €14 million and ended on March 31, 2025. The project, led by Fraunhofer IAIS and Fraunhofer IIS, is investigating the entire value chain of generative AI. This includes high-scale GPU-based infrastructure and data for training large language models, model development, and productive application in the form of prototypes and proof of concepts (PoCs). The project's overarching goal was to develop a large open-source AI speech model for research and industry, tailored to Europe's multilingual needs.

The release of »Teuken 7B« marks the achievement of this goal, offering a publicly funded alternative for future scientific investigations and economic applications of generative AI.

Teuken 7B webinar

We recommend all interested parties to participate in our free webinar.

The webinar is an introduction to Teuken and LLMs.

Further dates are being planned and will be announced here.

If you have already taken part in a webinar or have a specific request, you can also start directly with a consultation appointment. Please use the following form:

Teuken 7B consultation date

* Pflichtfelder

Please enter the data requested in your inquiry.

Salutation

First name

Last name

Phone

Location

Institution / Company

Role within the institution

Industry sector

I am interested in a complimentary consultation. The purpose of my inquiry is as follows:

Message Verfügbare Zeichen:

Upon submission of this form, I affirm that I have read and understood the data protection policy. I consent to the electronic collection and storage of the data I have provided. Upon submission of the contact form, I consent to the processing of my data.

The email address previously provided will be used to inform you of similar offers (e.g., events) via email. You may opt out of this use at any time by contacting Fraunhofer, particularly via email at widerspruch@iais.fraunhofer.de.

Additional information regarding data protection at Fraunhofer, including details concerning the legally mandated information obligations, is available for consultation at any time via our data protection declaration.

Verification

Please enter the characters shown

Teuken 7B

The European, open, multilingual large language model

Selected application areas

Multilingual

Open

Science-driven

Application of Teuken-7B in the company

Download »Teuken 7B«

Webinar / in English

Get started with us

Technical Info & Research

Model cards and benchmarks

Use Cases

Publications and Code Repositories

LLM Community

FAQ about Teuken-7B

What versions of »Teuken 7B« are available and what is the difference?

Where can I download the large language model »Teuken 7B« and is there a cost involved?

How can I try »Teuken 7B« without downloading it myself?

For what purposes can I use »Teuken 7B« in my company?

What do I have to consider as a company if I want to use the model in my company?

Is »Teuken 7B-instruct« like ChatGPT?

Is »Teuken 7B-instruct-commercial-v0.4« commercially usable?

How can I access the base model?

Do obligations of the European AI Regulation (AI Act) have to be taken into account when using the model?

What's next for OpenGPT-X? Will more models be released?

Where can I find out how the model performs compared to other language models?

Who can I contact if I, as a developer or researcher, have questions about the model or want to give feedback?

OpenGPT-X: Digital sovereignty for Europe

Research Generative AI

Teuken 7B webinar

We recommend all interested parties to participate in our free webinar.

Teuken 7B consultation date