OpenGPT-X

Teuken-7B – the European, open, multilingual large language model

Companies from all industries can now implement AI applications with Teuken-7B: the large language model from the OpenGPT-X research project is available to download free of charge as open source on Hugging Face. Teuken-7B-instruct-v0.4 was trained from scratch on the 24 official languages of the EU and has seven billion parameters. Developers in research and industry can download Teuken-7B and adapt, extend, and fine-tune it as a basis for their own applications. The result is a model that is optimized for the company's specific use cases.

Teuken-7B is available in two versions: »Teuken-7B-instruct-research-v0.4« can be used for research purposes, while »Teuken-7B-instruct-commercial-v0.4« is available to companies for commercial purposes under the »Apache 2.0« license. The model has already been optimized for chat through instruction tuning.
 

Multilingual

  • Our model is fully multilingual, with training in all 24 EU languages.
  • It contains approximately 50 percent non-English pretraining data.
  • A performance comparison demonstrates that the model produces similar results across the linguistic spectrum.
  • Consequently, it reflects European characteristics, norms, and values, facilitating effective multilingual communication.

Open Source

  • The model is available for download at no cost from Hugging Face.
  • The »Apache 2.0« license permits adaptation, further development, and use of »Teuken-7B-instruct-commercial-v0.4« in commercial AI applications.
  • Sensitive data can remain within the organization.
  • The research license allows you to freely use and further develop »Teuken-7B-instruct-research-v0.4« for research and testing.

Science-driven

  • The product was developed by scientists for commercial use.
  • The multilingual tokenizer allows for particularly energy-efficient training and operation of multilingual applications.
  • The European Leaderboard, developed by our team, evaluates a range of models on multilingual tasks.
  • Podcast Knowledge Science: Mehdi Ali and Michael Fromm from Fraunhofer IAIS explain the development of multilingual European AI systems.
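
This page does not say how the efficiency of the tokenizer was measured. One common proxy is fertility, the average number of tokens a tokenizer produces per word: the fewer tokens a text needs, the fewer processing steps the model has to perform. The following is a minimal sketch of that metric. The function names, the sample sentences, and the toy tokenizer are assumptions for illustration only; the toy tokenizer is a placeholder and not the Teuken-7B tokenizer.

```python
from typing import Callable, Dict, List


def fertility(text: str, tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)


def char_pair_tokenizer(text: str) -> List[str]:
    """Toy stand-in tokenizer (hypothetical): cuts each word into 2-character chunks."""
    return [word[i:i + 2] for word in text.split() for i in range(0, len(word), 2)]


# Illustrative sample sentences, one per language.
samples: Dict[str, str] = {
    "en": "the model reads text",
    "de": "das Modell liest Texte",
}

# Lower fertility means fewer tokens for the same amount of text.
scores = {lang: fertility(text, char_pair_tokenizer) for lang, text in samples.items()}
```

To compare real tokenizers, the toy function could be replaced by any callable that maps a string to a list of tokens, evaluated on the same texts in each language.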

APPLICATION IN THE COMPANY

Download

Developers can download Teuken-7B free of charge under the Apache 2.0 license (or under a research license) on Hugging Face.
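
As a minimal sketch of the download step, the snippet below maps the two license versions named on this page to Hugging Face repository IDs and fetches the model files with `snapshot_download` from the `huggingface_hub` package. The organization prefix `openGPT-X` and the helper names `repo_id` and `download_teuken` are assumptions for illustration; please check the model card for the exact repository name and any access conditions.

```python
# Assumption: the organization prefix is not stated on this page.
HF_ORG = "openGPT-X"
VARIANTS = ("research", "commercial")  # the two license versions named on this page


def repo_id(variant: str, version: str = "v0.4") -> str:
    """Build the (assumed) Hugging Face repository ID for a license version."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"{HF_ORG}/Teuken-7B-instruct-{variant}-{version}"


def download_teuken(variant: str = "commercial", local_dir: str = "teuken-7b") -> str:
    """Download all files of the chosen version and return the local folder path."""
    # Imported lazily so that repo_id() works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id(variant), local_dir=local_dir)
```

Calling `download_teuken("commercial")` requires network access, the `huggingface_hub` package, and enough disk space for a model with seven billion parameters.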

Demo appointments

Discover the potential of Teuken-7B. Reserve a demonstration slot with the experts today.

  • For companies: Free of charge.
  • 60 minutes

Get started with us

We adapt Teuken-7B to align with your organizational procedures. To learn more about our offerings or to schedule a consultation, please do not hesitate to contact us.

 

TECHNICAL INFO & RESEARCH

 

Model cards and benchmarks

Technical data on the model and its use, with comparative charts and technical explanations of the model in relation to other models.

 

USE CASES

The following is a representative sample of application examples from a variety of sectors, including industry, healthcare, legal, finance, and media.

Publications and Code Repositories

Research results on multilingual language models

LLM Community

We respond to technical and scientific inquiries from the community and provide a forum for feedback and discourse on the OpenGPT-X Discord server.

FAQ

  • Teuken-7B is available free of charge in two license versions: »Teuken-7B-instruct-research-v0.4« can be used by the scientific community and companies for research purposes, »Teuken-7B-instruct-commercial-v0.4« is available to companies for commercial purposes under the »Apache 2.0« license.

    Teuken-7B-instruct-commercial-v0.4 is comparable to the research version in terms of performance, although the research version scores one to two percent better in the benchmarks. The reason is that some data sets used for the research version exclude commercial use and were therefore not used for the company version.

  • Teuken-7B is available for download free of charge and open source on Hugging Face.

  • Companies in particular can take part in free demo sessions, in which Fraunhofer scientists explain which applications can be built on Teuken-7B with appropriate further processing.

  • Teuken-7B is multilingual and has been optimized for chat through “instruction tuning”, so it can be used as a multilingual chatbot, e.g. in international customer service or to make company knowledge accessible to employees.

    The following other applications can also be implemented with Teuken-7B:

    • Summarizing documents
    • Generating texts
    • Extracting information from texts

    Teuken-7B can be further processed through continued pretraining, fine-tuning, instruction tuning, model merging, etc. to adapt the model to the company's own purposes. The result is a model that is optimized for the company's individual use cases.

  • To adapt the model to your own business purposes, Teuken-7B-instruct-commercial can also be further processed with your own data through continued pretraining, fine-tuning, instruction tuning, model merging, etc.

    The model performs well in a comparison with other open source models, but still has room for improvement in logical reasoning, coding, and mathematics. In addition, Teuken-7B, like other large language models, can generate content that is inappropriate, offensive, or harmful.

  • Teuken-7B-instruct is a chatbot that is primarily intended for corporate applications and research projects. Developers from companies and the scientific community can use it to develop their individual chat applications. Teuken-7B-instruct-commercial can also be further processed with your own data through continued pretraining, fine-tuning, instruction tuning, model merging, etc. to adapt the model to your own business purposes.

  • Yes, companies can use Teuken-7B-instruct-commercial-v0.4 commercially for their AI applications under the Apache 2.0 license.

  • Base models are particularly susceptible to generating inappropriate, offensive, or harmful content. At the same time, base models offer the advantage that they can be developed into powerful specialized models through fine-tuning and instruction tuning if used correctly and responsibly.
    For this reason, the Teuken-7B-base-v0.4 base model is not published, but companies and other stakeholders interested in the base model can contact Fraunhofer IAIS so that its use can be coordinated and supported.

    Inquiries by email to: contact@opengpt-x.de

  • Currently no. The EU AI Act's obligations for general-purpose AI models will not apply until August 2025. AI models that were placed on the market before this date do not have to comply with these requirements until August 2027 (grandfathering).

  • The OpenGPT-X research project is nearing completion with the release of Teuken-7B-instruct-v0.4 and will run until March 31, 2025. Until then, we will continue to optimize and evaluate the model. The model still has room for improvement on tasks such as logical reasoning, coding, and mathematics, as well as with regard to bias and toxicity. Furthermore, by continuing the model training, we can enlarge the context window, that is, the number of tokens the model can process at once.

    As this is an open source project, we also assume that adapted or specialized versions of the model will be developed for different applications by the scientific community or companies.

  • Our scientists are in contact with the LLM community via the OpenGPT-X Discord server. This is also the place for questions and feedback about the model.
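
The areas of application named in the FAQ (summarizing documents, generating texts, extracting information) can each be phrased as a chat request. The following minimal sketch builds such requests in the role/content message format commonly used by chat templates in Hugging Face `transformers`. The instruction wording, the helper name `build_messages`, and the default role name are assumptions for illustration; role names and template options are model-specific and should be checked against the model card.

```python
from typing import Dict, List

# Illustrative instruction wording (hypothetical, not taken from the model card).
TASK_INSTRUCTIONS: Dict[str, str] = {
    "summarize": "Summarize the following document in three sentences.",
    "generate": "Write a short text on the following topic.",
    "extract": "List all company names mentioned in the following text.",
}


def build_messages(task: str, content: str, role: str = "user") -> List[Dict[str, str]]:
    """Return a single-turn chat request for one of the supported tasks."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_INSTRUCTIONS)}")
    # Instruction first, then a blank line, then the user's own material.
    return [{"role": role, "content": f"{TASK_INSTRUCTIONS[task]}\n\n{content}"}]


messages = build_messages("summarize", "Quarterly report text")
```

A list built this way could then be passed to the tokenizer's `apply_chat_template` method before generation; because the model is multilingual, the instruction text can be written in any of the supported languages.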

Digital sovereignty for Europe

About OpenGPT-X

The OpenGPT-X project, comprising ten partners, commenced on January 1, 2022 with approximately €14 million in funding from the Federal Ministry for Economic Affairs and Climate Action (BMWK) and is scheduled to conclude on March 31, 2025. The project, led by Fraunhofer IAIS and Fraunhofer IIS, is investigating the entire value chain of generative AI. This includes large-scale GPU-based infrastructure and data for training large language models, model development, and productive application in the form of prototypes and proofs of concept (PoCs). The project's overarching goal was to develop a large open-source AI language model for research and industry, tailored to Europe's multilingual needs.

The release of Teuken-7B-instruct-v0.4 marks the achievement of this goal, offering a publicly funded alternative for future scientific investigations and economic applications of generative AI.

 

 
