OpenGPT-X: Teuken-7B

The European, open, multilingual large language model

Companies from all industries can now implement AI applications with »Teuken 7B«: the large language model from the OpenGPT-X research project is available to download free of charge as open source on Hugging Face. »Teuken 7B-instruct-v0.4« was trained from scratch on the 24 official languages of the EU and has seven billion parameters. Developers in research and industry can download »Teuken 7B« and adapt, extend and fine-tune it as a basis for their applications. The result is a model that is optimized for specific use cases in the company.

»Teuken 7B« is available in two versions: »Teuken 7B-instruct-research-v0.4« can be used for research purposes, while »Teuken 7B-instruct-commercial-v0.4« is available to companies for commercial purposes under the »Apache 2.0« license. The model has already been optimized for chat through instruction tuning.
 

In addition to the two Fraunhofer Institutes IAIS and IIS and the Jülich Research Center, the German AI Association, TU Dresden, the German Research Center for Artificial Intelligence (DFKI), IONOS, Aleph Alpha, ControlExpert and Westdeutscher Rundfunk (WDR) have collaborated on OpenGPT-X as partners.

Multilingual

  • Our model is fully multilingual, with training in all 24 EU languages.
  • It contains approximately 50 percent non-English pretraining data.
  • A performance comparison demonstrates that the model produces similar results across the linguistic spectrum.
  • Consequently, it reflects European characteristics, norms, and values, facilitating effective multilingual communication.

Open Source

  • The model is available for download at no cost from Hugging Face.
  • The »Apache 2.0« license permits adaptation, further development, and utilization of »Teuken 7B-instruct-commercial-v0.4« for commercial artificial intelligence applications.
  • Sensitive data can remain within the organization, because the downloaded model can be run on your own infrastructure.
  • The research license allows you to freely use and further develop »Teuken 7B-instruct-research-v0.4« for research and testing.

Science-driven

  • The product was developed by scientists for commercial use.
  • The multilingual tokenizer allows for particularly energy-efficient training and operation of multilingual applications.
  • The European Leaderboard, developed by our team, evaluates a range of models on multilingual tasks.
  • Podcast Knowledge Science: Mehdi Ali and Michael Fromm from Fraunhofer IAIS explain the development of multilingual European AI systems.

Application of Teuken-7B in the company

Download »Teuken 7B«

Developers can download »Teuken 7B« free of charge under the »Apache 2.0« license (or under a research license) on Hugging Face.
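The download step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official quick-start: it assumes the Hugging Face organization is named `openGPT-X`, that the repositories are named after the two versions described on this page, and that the model needs `trust_remote_code=True` to load. The helper names `repo_id` and `load_teuken` are hypothetical. Please check the model card on Hugging Face for the authoritative instructions.

```python
# Assumption: organization name and repository naming scheme on Hugging Face.
HF_ORG = "openGPT-X"
VARIANTS = ("research", "commercial")


def repo_id(variant: str, version: str = "v0.4") -> str:
    """Build the (assumed) Hugging Face repository id for a Teuken variant."""
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}, got {variant!r}")
    return f"{HF_ORG}/Teuken-7B-instruct-{variant}-{version}"


def load_teuken(variant: str = "commercial"):
    """Download the tokenizer and model weights (several gigabytes)."""
    # Imported lazily so that repo_id() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(variant)
    # Assumption: the repository ships custom code, hence trust_remote_code.
    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)
    return tokenizer, model


# Only prints the repository id; call load_teuken() to start the download.
print(repo_id("commercial"))
```

Calling `load_teuken("research")` instead selects the version under the research license. Once the weights are stored locally, the model can be run without sending data to an external service.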

Webinar / in German

Find out from the experts what is possible with »Teuken 7B«. 

  • June 6, 2025, 11:00 to 11:45 a.m.
  • for companies / free of charge

Webinar / in English

Find out from the experts what is possible with »Teuken 7B«. 

  • May 9, 2025, 11:00 to 11:45 a.m.
  • for companies / free of charge

Get started with us

We adapt »Teuken 7B« to your organization's processes. To learn more about our offerings or to schedule a consultation, please contact us.

 

Technical Info & Research

 

Model cards and benchmarks

Technical data on the model and its use, along with comparison charts and technical explanations of how the model performs relative to other models.

 

Use Cases

A representative sample of application examples from a variety of sectors, including industry, healthcare, legal, finance, and media.

 

Publications and Code Repositories

Research results on multilingual language models

 

LLM Community

We respond to technical and scientific inquiries from the community and provide a forum for feedback and discourse on the OpenGPT-X Discord server.

FAQ about Teuken-7B

  • »Teuken 7B« is available free of charge in two license versions: »Teuken 7B-instruct-research-v0.4« can be used by the scientific community and companies for research purposes, while »Teuken 7B-instruct-commercial-v0.4« is available to companies for commercial purposes under the »Apache 2.0« license.

    »Teuken 7B-instruct-commercial-v0.4« is comparable to the research version in terms of performance, although the research version scores one to two percent better in the benchmarks. The reason is that some data sets used for the research version exclude commercial use and were therefore not used for the commercial version.

  • »Teuken 7B« is available for download free of charge and open source on Hugging Face.

  • Companies in particular have the opportunity to take part in a free webinar in which Fraunhofer scientists explain which applications can be realized with appropriate further processing on the basis of »Teuken 7B«.

  • »Teuken 7B« is multilingual and has been optimized for chat through instruction tuning, so it can be used as a multilingual chatbot, e.g. in international customer service or to make company knowledge accessible to employees.

    Other areas of application that can be implemented with »Teuken 7B«:

    • Summarizing documents
    • Generating texts
    • Extracting information from texts

    »Teuken 7B« can be further processed through continued pretraining, fine-tuning, instruction tuning, model merging, etc. in order to adapt the model to the company's own purposes. The result is a model that is optimized for the individual use cases in the company.

  • In order to adapt the model to your own business purposes, »Teuken 7B-instruct-commercial« can also be further processed with your own data through continued pretraining, fine-tuning, instruction tuning, model merging, etc.


    The model performs well in a performance comparison with other open source models, but still has potential for development in the areas of logical thinking, coding and mathematics. In addition, »Teuken 7B«, like other large language models, can generate content that is inappropriate, offensive or harmful.

  • »Teuken 7B-instruct« is a chatbot that is primarily intended for corporate applications and research projects. Developers from companies and the scientific community can use it to develop their individual chat applications. »Teuken 7B-instruct-commercial« can also be further processed with your own data through continued pretraining, fine-tuning, instruction tuning, model merging, etc. to adapt the model to your own business purposes.

  • Yes, companies can use »Teuken 7B-instruct-commercial-v0.4« commercially for their AI applications under the »Apache 2.0« license.

  • Base models are particularly susceptible to generating inappropriate, offensive or harmful content. At the same time, base models offer the advantage that they can be developed into powerful specialized models through fine-tuning and instruction tuning if used correctly and responsibly.
    For this reason, the »Teuken 7B-base-v0.4« base model is not published, but companies and other stakeholders interested in the base model can contact Fraunhofer IAIS so that the use of the base model can be coordinated and supported.

    Inquiries by email to: contact@opengpt-x.de

  • Currently no. The obligations of the EU AI Act for general-purpose AI models will not apply until August 2025. AI models that were placed on the market before this date do not have to comply with the requirements of the EU AI Act until August 2027 (grandfathering).

  • The OpenGPT-X research project is nearing completion with the release of »Teuken 7B-instruct-v0.4« and will run until March 31, 2025. Until then, we will continue to optimize and evaluate the model. The model still has room for improvement on tasks such as logical thinking, coding and mathematics, and with regard to bias and toxicity. Furthermore, by continuing the model training, we can increase the number of tokens the model can process at once (its context window).

    As this is an open source project, we also assume that adapted or specialized versions of the model will be developed for different applications by the scientific community or companies.

  • Our scientists are in contact with the LLM community via the OpenGPT-X Discord server. This is also the place for questions and feedback about the model.

Digital sovereignty for Europe

About OpenGPT-X

The OpenGPT-X project, comprising ten partners, commenced on January 1, 2022, with funding of approximately €14 million from the Federal Ministry for Economic Affairs and Climate Action (BMWK), and is scheduled to conclude on March 31, 2025. The project, led by Fraunhofer IAIS and Fraunhofer IIS, is investigating the entire value chain of generative AI. This includes large-scale GPU-based infrastructure and data for training large language models, model development, and productive application in the form of prototypes and proofs of concept (PoCs). The project's overarching goal was to develop a large open-source AI language model for research and industry, tailored to Europe's multilingual needs.

The release of »Teuken 7B-instruct-v0.4« marks the achievement of this goal, offering a publicly funded alternative for future scientific investigations and economic applications of generative AI.

 

 


Teuken 7B webinar

We recommend that all interested parties take part in our free webinar.

Please register via one of the following links:

register: webinar in German

register: webinar in English

The webinar serves as an introduction to Teuken and LLMs. If you have already taken part in a webinar or have a specific request, you can also start directly with a consultation appointment. Please use the following form:

Teuken-7B consultation date
