Project
AI4KNOWLEDGE
Semantic analysis of texts
Objective
The goal of this solution is to build a tool, based on artificial intelligence techniques, that can:
- Extract text, tables, images and other elements from newspaper pages, scientific publications, manuals, data sheets, etc.;
- Convert images containing text into actual text through OCR;
- Run semantic analysis on the extracted text, both to index the content and to reconstruct the text in a web-friendly form;
- Provide a question answering system that automatically answers questions asked in natural language, drawing on the knowledge base built in the previous step.
Pipeline

Image Processing
The image undergoes a series of transformations that serve to identify the regions of interest.
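As an illustration of how regions of interest might be located, here is a minimal sketch. It assumes the page has already been binarized (1 for ink, 0 for background) and uses 4-connected component labeling to produce bounding boxes. Both the binarization step and the labeling technique are assumptions made for this example, and `find_regions` is a hypothetical helper, not part of the product.

```python
from collections import deque


def find_regions(binary):
    """Return bounding boxes (top, left, bottom, right) of connected ink regions.

    `binary` is a list of rows; 1 marks an ink pixel, 0 marks background.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over the 4-connected neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# A tiny 4x5 "page" with two separate blocks of ink.
page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
regions = find_regions(page)
```

Each bounding box can then be cropped and passed on to the extraction and OCR step.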

Text extraction & OCR

Text validation


I] Magazzino cooperativo é un albero magnifico, i cul rami s’allargano e si rinnovano ogni di pil; 6 uno splendido fuoco che riscalda e riverbera la sua luce dappertutto. Ben a ragione gli operai di Rochdale assunsero il nome di Probi Pionieri; il pioniere é intrepido americano che apre i primi solchi nelle vergini foreste, e questi Pionieri di Rochdale hanno schiuso alle elassi lavoratrici la via dell’avvenire.
Luzzatti
The extracted text is validated in the following steps:
- the text is recomposed on a single line, without carriage returns;
- non-alphanumeric characters and punctuation are removed;
- strings shorter than two characters are removed;
- stop words are removed, reducing the text to a keyword list;
- each keyword is validated against a dictionary of approximately 1 million words. Invalid words are replaced by the closest dictionary word, chosen according to a measurable similarity criterion.
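The validation steps can be sketched as follows. This is only an illustration: the tiny `DICTIONARY` and `STOP_WORDS` are placeholders for the real word lists, the name `validate` is hypothetical, and `difflib` similarity with a cutoff of 0.6 is an assumed stand-in for the "measurable criteria" mentioned above.

```python
import difflib
import re

# Hypothetical, tiny stand-ins for the real stop-word list and dictionary.
STOP_WORDS = {"the", "is", "of", "and"}
DICTIONARY = ["cooperative", "warehouse", "magnificent", "tree", "branches"]


def validate(text, dictionary=DICTIONARY, stop_words=STOP_WORDS, cutoff=0.6):
    """Reduce raw OCR text to a list of dictionary-validated keywords."""
    # 1. Recompose the text on a single line.
    flat = " ".join(text.split())
    # 2. Replace punctuation and other non-alphanumeric characters with spaces
    #    (spaces rather than nothing, so adjacent words are not glued together).
    cleaned = re.sub(r"[^\w\s]|_", " ", flat.lower())
    # 3. Drop strings shorter than two characters.
    tokens = [t for t in cleaned.split() if len(t) >= 2]
    # 4. Drop stop words, leaving candidate keywords.
    tokens = [t for t in tokens if t not in stop_words]
    # 5. Validate each keyword; replace invalid ones with the closest entry.
    known = set(dictionary)
    keywords = []
    for token in tokens:
        if token in known:
            keywords.append(token)
            continue
        match = difflib.get_close_matches(token, dictionary, n=1, cutoff=cutoff)
        # Assumption: if nothing is similar enough, keep the token unchanged.
        keywords.append(match[0] if match else token)
    return keywords


keywords = validate("The cooperatlve warehouse\nis a magnificent tr3e!")
```

Here the OCR errors `cooperatlve` and `tr3e` are mapped to their nearest dictionary entries. With a dictionary of around a million words, a linear scan like this would be slow, so an indexed lookup would probably be needed in practice.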

Final list of keywords = semantic domain of the text fragment
Question answering
Answering a question is no longer a matter of searching for a string in a text, but of finding a concept in a body of knowledge (an ontology), according to the context.

The set of all semantic contexts collected from the various text fragments is stored in a database at page-level granularity, and constitutes the ontology on which the answers provided to the user are based.
Questions are forwarded to the database, which runs a full-text search and returns the answers sorted by rank.
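To make the ranking idea concrete, here is a small in-memory sketch. It is not the database's full-text search: it simply scores each page by how often the question's keywords occur among that page's stored keywords. The `KnowledgeBase` class, its methods and the sample page numbers are all hypothetical.

```python
from collections import Counter


class KnowledgeBase:
    """Toy page-level keyword store with ranked lookup (illustrative only)."""

    def __init__(self):
        self._pages = {}  # page number -> Counter of keywords seen on that page

    def add_fragment(self, page, keywords):
        """Record the keyword list of one text fragment under its page."""
        self._pages.setdefault(page, Counter()).update(keywords)

    def search(self, question_keywords, limit=5):
        """Return (page, score) pairs, best match first."""
        wanted = set(question_keywords)
        scored = []
        for page, counts in self._pages.items():
            # Score = total occurrences of the question's keywords on the page.
            score = sum(counts[k] for k in wanted)
            if score > 0:
                scored.append((page, score))
        # Highest score first; ties are broken by page number.
        scored.sort(key=lambda item: (-item[1], item[0]))
        return scored[:limit]


kb = KnowledgeBase()
kb.add_fragment(12, ["cooperative", "warehouse", "pioneers", "rochdale"])
kb.add_fragment(12, ["rochdale", "workers"])
kb.add_fragment(30, ["warehouse", "inventory"])
results = kb.search(["rochdale", "warehouse"])
```

A production system would more likely delegate scoring to the database's own full-text index. The sketch only shows why storing keywords per page is enough to return ranked, page-level answers.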
