Named Entity Locator
The Named Entity Locator uses the Natural Language Processing engine to assign extracted entities to fields. This engine takes several seconds longer per page than regular recognition.
Named entities are text groupings such as people, places, organizations, roles, times, and amounts. They are found in unstructured, natural language text, such as the sentences in emails, documents, or reports. Each entity occurrence is called a mention of the selected entity type. The Natural Language Processing engine returns the mention with the highest confidence, and that mention is highlighted on the document. However, the highlighted mention does not always match the extracted entity text.
In some cases, the Natural Language Processing engine recognizes and highlights text on a test document but returns a different extraction result. This is because the engine is trained to recognize certain entities in a specific way. The extraction result can differ from the highlighted text when the engine was trained with a more commonly used or official term for that entity.
For example, if the word U.S. is found on a document for a country entity, then the Natural Language Processing engine returns "United States of America." This is because as a country entity, the recognized U.S. value is replaced with the full country name. Similarly, if the word Amazon is located on a document for a company entity, the Natural Language Processing engine returns "Amazon.com" because this is the official company name.
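The relationship between a highlighted mention and the returned value can be sketched in Python. This is an illustration only, not the engine's implementation: the Mention class, the best_mention function, and the confidence values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    # Hypothetical structure; the real engine's data model is not documented here.
    text: str          # text as highlighted on the document
    normalized: str    # official or commonly used term returned as the result
    confidence: float  # illustrative score between 0.0 and 1.0


def best_mention(mentions):
    """Return the mention with the highest confidence, or None if there are none."""
    return max(mentions, key=lambda m: m.confidence, default=None)


# Made-up mentions for a country entity.
mentions = [
    Mention("U.S.", "United States of America", 0.92),
    Mention("Canada", "Canada", 0.71),
]

best = best_mention(mentions)
highlighted = best.text        # what is highlighted on the document
extracted = best.normalized    # what is returned as the extraction result
```

In this sketch the highlighted text is "U.S." while the extraction result is "United States of America", which mirrors the difference described above.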
This type of data is difficult to extract with other locators unless you have a known set of criteria to search for, such as a database of people's names. The Named Entity Locator instead extracts data based on its grammatical placement in a document.
If you need to extract more than one type of entity, a separate Named Entity Locator is required for each entity type.
Once the entities are extracted, a customized script is needed to interpret the entities. For more information on scripting, refer to the Tungsten TotalAgility Scripting Help.
When configuring a Named Entity Locator, the biggest decision is how much information you want. If you are interested in the entity name only, use a simple field, as this is the only information returned. If you want to know the entity name, entity type, confidence, and sentiment of a document, use a table field.
The extraction type of a Named Entity Locator is determined by its configuration. When the extraction mode is set to Simple Field, the extraction type is "single". When the extraction mode is set to Table Field, the extraction type is "table".
By default, the extraction mode is set to Simple Field, which means that the extraction type is "single" and this locator returns a single piece of information.
If the Named Entity Locator is configured for a table field, then the extraction type changes to "table" and this locator returns tabular information. Before you can map the field you must select a table model.
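The difference between the two extraction modes can be sketched in Python. This is a minimal illustration under stated assumptions, not the product's API: the function names, the dictionary keys, the sample values, and the choice of the highest-confidence entity for a simple field are all hypothetical.

```python
# Column names follow the information listed for a table field;
# the keys themselves are assumptions made for this sketch.
COLUMNS = ("name", "type", "confidence", "sentiment")

# Mapping of extraction mode to extraction type, as described above.
EXTRACTION_TYPES = {"Simple Field": "single", "Table Field": "table"}


def extraction_type(extraction_mode):
    """Return the extraction type for a given extraction mode."""
    return EXTRACTION_TYPES[extraction_mode]


def build_result(extraction_mode, entities):
    """Shape the result as a single name or as a list of table rows."""
    if extraction_type(extraction_mode) == "single":
        if not entities:
            return ""
        # Assumption: a simple field receives the highest-confidence entity name.
        best = max(entities, key=lambda e: e["confidence"])
        return best["name"]
    # A table field keeps one row per entity with all four columns.
    return [{column: e[column] for column in COLUMNS} for e in entities]


# Made-up sample data for a company entity.
entities = [
    {"name": "Amazon.com", "type": "company", "confidence": 0.88, "sentiment": "neutral"},
]

simple_value = build_result("Simple Field", entities)
table_rows = build_result("Table Field", entities)
```

A simple field ends up with the entity name only, while a table field carries the name, type, confidence, and sentiment for each row.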
Click the Create and Assign Fields button to create either a simple field or a table field that is mapped to this locator.
The Properties of Named Entity Locator window has tabs that enable you to configure the extracted entities.