In society as a whole, as well as in many companies, a significant percentage of our knowledge exists in the form of text. However, these texts are often available only as documents or scanned images, which means they can be neither edited nor searched for specific content.
Our document content extrapolation solutions make document-based knowledge accessible, i.e. searchable and editable. Beneficiaries of this technology include publishers, libraries, public authorities and companies. We develop systems tailored to your specific requirements, and we offer document content extrapolation as a service alongside other services.
Document content extrapolation is a three-step process. The first step is to digitize the documents if this has not already been done. The image quality is then optimized for fully automatic processing, which includes removing any tears or fold marks, smoothing the typeface and enhancing image definition.
The second step is content extrapolation. During this phase, images are converted to text using optical character recognition (OCR). The OCR engine we have developed is industry-leading in many respects, particularly in the recognition of difficult-to-read documents. Not only does it convert images into editable text, it also allows the user to remove unwanted data, such as irrelevant articles from newspaper pages.
During the third step, semantic exploration, text mining processes make it possible to identify, for example, well-known politicians or celebrities mentioned in the documents. The results are then used as a basis for enriching and linking the documents with content from other sources.
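To give a rough idea of what the third step involves, the sketch below finds known names in OCR output and attaches a link to an external record. It is an illustrative simplification based on a plain dictionary lookup, not a description of our text mining processes. The names, the record IDs and the helpers `GAZETTEER`, `Mention` and `find_mentions` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical lookup table: placeholder names mapped to made-up record IDs.
GAZETTEER = {
    "Erika Mustermann": "person/0001",
    "Max Mustermann": "person/0002",
}


@dataclass
class Mention:
    name: str   # the name as listed in the gazetteer
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    link: str   # identifier of the linked external record


def find_mentions(text, gazetteer=GAZETTEER):
    """Return all gazetteer names found in text, ordered by position."""
    mentions = []
    for name, link in gazetteer.items():
        # Word boundaries prevent matches inside longer words.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            mentions.append(Mention(name, match.start(), match.end(), link))
    return sorted(mentions, key=lambda m: m.start)


ocr_text = "Max Mustermann met Erika Mustermann at the town hall."
for mention in find_mentions(ocr_text):
    print(mention.name, mention.start, mention.link)
```

A production system would typically rely on statistical or learned models rather than a fixed list, since a plain lookup cannot handle spelling variants, OCR errors or ambiguous names.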
Our technology can be applied across a number of disciplines, including:
- AI-based selection of accounting data
- Digital retrofit of archives
- Document workflows