From pixels to content: We make knowledge accessible.

Document Analytics

In society as a whole, as well as in many companies, a significant share of our knowledge exists in the form of texts. However, these texts are often available only as paper documents or scanned images, which means they can be neither edited nor searched for specific content.

Our document content extraction solutions make document-based knowledge accessible, i.e. searchable and editable. Beneficiaries of this technology include publishers, libraries, public authorities and companies. We develop systems tailored to your specific requirements and also offer document content extraction and related services.

Document content extraction is a three-step process. The first step is to digitize the documents, if this has not already been done. Image quality is then optimized fully automatically, which includes removing tears and fold marks, smoothing the typeface and enhancing image sharpness.

The second step is content extraction. During this phase, the images are converted to text using optical character recognition (OCR). The OCR engine we have developed is industry-leading in many respects, particularly in the recognition of difficult-to-read documents. Not only does it convert images into editable text, it also allows the user to remove unwanted data, such as irrelevant articles on newspaper pages.

In the third step, semantic exploration, text mining methods identify entities such as well-known politicians or celebrities. The results then serve as a basis for enriching the documents and linking them with content from other sources.
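As a rough illustration of the third step, the sketch below uses a simple gazetteer (a dictionary of known names) to find entity mentions in OCR output and attach identifiers that could be used to link them to other sources. The gazetteer entries, the identifier scheme and the function name `link_entities` are hypothetical; real text mining pipelines usually rely on statistical or neural named-entity recognition rather than exact string matching.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer: surface form -> external identifier (illustrative values only).
GAZETTEER = {
    "Konrad Adenauer": "person:adenauer",
    "Willy Brandt": "person:brandt",
    "Marlene Dietrich": "person:dietrich",
}


@dataclass
class EntityMention:
    text: str       # matched surface form
    start: int      # character offset where the mention starts
    end: int        # character offset just past the mention
    entity_id: str  # identifier that could link the mention to other sources


def link_entities(text: str, gazetteer: dict[str, str]) -> list[EntityMention]:
    """Return all non-overlapping gazetteer matches in text, ordered by position."""
    mentions: list[EntityMention] = []
    # Try longer names first so that a longer name wins over a shorter overlapping one.
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text):
            # Skip matches that overlap a mention that was already accepted.
            if any(m.start() < e.end and e.start < m.end() for e in mentions):
                continue
            mentions.append(EntityMention(m.group(), m.start(), m.end(), gazetteer[name]))
    return sorted(mentions, key=lambda e: e.start)


if __name__ == "__main__":
    page = "Willy Brandt met Konrad Adenauer in Bonn."
    for mention in link_entities(page, GAZETTEER):
        print(mention.entity_id, mention.start, mention.end)
```

Preferring the longest match is one simple way to resolve overlaps, for example so that a full name is not also reported as a shorter partial name.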

Our technology can be applied in a number of areas, including:

  • AI-based selection of accounting data
  • Digital retrofit of archives
  • Document workflows

Portfolio of services

A wide variety of organizations, including local authorities, publishers, libraries, software manufacturers, banks and hospitals, use our technology to make their documents accessible. We develop custom solutions that can be integrated into your company's systems. We also provide our customers with information services.

Licensing of AI-based document analysis software

Most conventional text recognition programs read approximately 99% of text correctly, provided the documents are perfectly legible. If the quality of the original is less than perfect, however, recognition accuracy drops. This is often the case with historical documents or with documents that are particularly difficult to read because of the background behind the text or other distortions. For cases like these, we have developed a self-learning OCR system based on deep learning methods. The solution can be tested and licensed through our marketing partner DE-Patentverwertung GmbH.
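Accuracy figures like the 99% quoted above are commonly measured at character level: the edit distance between the OCR output and a verified ground-truth transcription, divided by the length of the ground truth, gives the character error rate (CER), and accuracy is 1 minus the CER. The sketch below computes this generic metric; it is not necessarily the evaluation procedure behind the figure above, and the function names are illustrative. Assuming a page of about 2,000 characters, 99% accuracy still leaves roughly 20 wrong characters per page, which is why a few percentage points matter for degraded originals.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Return 1 - CER, where CER = edit distance / ground-truth length.

    The result can drop below 0 for very noisy output that is longer than the ground truth.
    """
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    cer = levenshtein(ocr_text, ground_truth) / len(ground_truth)
    return 1.0 - cer


if __name__ == "__main__":
    # A typical OCR confusion: "k" misread as "lc" costs two edits.
    print(character_accuracy("Franlcfurt", "Frankfurt"))
```

In the example, the misreading costs one substitution and one insertion, so the accuracy is 1 - 2/9, about 0.78, for a nine-character word.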

Development of prototypes

We develop prototypes for new challenges and for your specific document analysis applications, optimizing both the quality and the speed of the analysis.

Integration into production systems

Our services and workflows are fully automated, and we integrate them into your production systems.

Order processing

We process documents directly on our servers on your behalf, for example when digitizing archives or newspaper back issues.

Highlights

Making documents accessible for the Deutsche Digitale Bibliothek

The Deutsche Digitale Bibliothek, the German Digital Library, aims to bring together the wealth of Germany’s cultural and scientific heritage and make it available to a wider public via an Internet portal. To achieve this, information from German libraries, archives, museums and scientific institutions must be made available in a linked digital format. Together with our partners, we drew on our experience in analyzing, indexing and presenting Germany’s digital cultural heritage to develop the technical concept for the German Digital Library, and we coordinated the work required to make it a reality.