Evaluation Settings tab - Properties of Format Locator window

The format locator works with format definitions such as pattern matching (regular expressions and simple expressions) and advanced algorithms (Levenshtein and trigrams). The format definitions, together with dictionaries and keywords, are used to extract data from documents without the need to define zones. The locator runs on a full or partial page read of the document and extracts the data using searches that are specific to the data, not the document layout. The locator then evaluates the alternatives it finds and the resulting data output.
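For readers unfamiliar with the two advanced algorithms named above, the following is a minimal, generic sketch of Levenshtein distance and trigram similarity in Python. It shows the textbook techniques only. How the format locator implements or combines them internally is not described here, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def trigrams(text: str) -> set:
    """Set of 3-character windows over the lowercased, space-padded text.
    The padding scheme is an assumption made for this illustration."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)
```

A typical recognition error such as "lnvoice" for "Invoice" differs by a single substitution (ignoring case), so it has a small edit distance and still shares most of its trigrams with the intended word.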

Use this tab to configure format evaluations.

Keywords

Use the following buttons to manage the keywords:

Button

Description

Delete

Removes the selected keyword from the keywords table.

Add

Creates a new keyword using the specified keyword settings and adds it to the table below.

Modify

Changes the selected keyword and updates the keywords table with the new settings.

Clear

Removes each of the settings from the selected keyword so you can start again.

This group has the following settings:

Ignore these characters

The characters typed here are ignored when searching for keywords. By default, the following characters are ignored: ! / ( ) . : , ;

This setting does not work when a dictionary is inserted in the same format locator.
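The effect of ignoring characters can be pictured with the short Python sketch below. It assumes that ignored characters are simply removed from both the keyword and the document text before comparison, and that comparison is case-insensitive. Both points are assumptions for illustration, and the helper names are hypothetical.

```python
# Default characters listed for the "Ignore these characters" setting.
DEFAULT_IGNORED = "!/().:,;"


def strip_ignored(text: str, ignored: str = DEFAULT_IGNORED) -> str:
    """Return text with every ignored character removed."""
    return text.translate(str.maketrans("", "", ignored))


def keyword_matches(keyword: str, candidate: str,
                    ignored: str = DEFAULT_IGNORED) -> bool:
    """Compare keyword and candidate after removing ignored characters.
    Case-insensitive comparison is an assumption of this sketch."""
    return (strip_ignored(keyword, ignored).casefold()
            == strip_ignored(candidate, ignored).casefold())
```

With the default characters, the keyword "Invoice No" matches document text printed as "Invoice No.:" because the trailing period and colon are dropped before the comparison.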

Keyword

Type a word or phrase that is likely to be found on a document. You can also select a dictionary if the list of keywords is long and in database format. The presence or absence of this keyword is used to rank the results that match the formats.

Match each word exactly (not fuzzy)

Enable this setting if you want to match a keyword phrase exactly. Each word in the phrase must exactly match a word on the document. Any recognition errors can lower the confidence of the desired alternative. (Default: Cleared)

If you are processing PDF documents, no recognition is performed, and the embedded text is used instead. This means that you can use this setting without fear of your alternatives receiving lower confidences because there is no chance of recognition errors.

Match all words as a phrase

Enable this setting if you want to search a document for each of the words included in a phrase. If the phrase contains recognition errors, it can be included in the result, but with a lower confidence. (Default: Cleared)

Weight

Type a value or use the slider to specify a weight. A positive number indicates that the keyword is expected near the desired value. A negative number indicates that any matches near that keyword should be excluded from the results. (Default: 100)
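The described effect of positive and negative weights can be illustrated with the Python sketch below. The locator's actual scoring formula is not documented here, so this is only an assumption-laden illustration: candidates with any negatively weighted keyword nearby are dropped, and the rest are ranked by the sum of their nearby keyword weights. The function name and data layout are hypothetical.

```python
def rank_candidates(candidates):
    """candidates: list of (value, weights) pairs, where weights holds the
    weight of each keyword found near that value.

    Illustrative rule only: a negative weight excludes the candidate,
    and the remaining candidates are sorted by total weight, highest first.
    """
    kept = [(value, sum(weights))
            for value, weights in candidates
            if all(w >= 0 for w in weights)]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, a value found next to a keyword weighted 100 ranks above a value with no keyword nearby, while a value next to a keyword weighted -50 does not appear in the ranked list at all.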

Distance

Select one of the three distance settings (Near, Medium, or Far) to indicate how far the keyword is located from the desired result. (Default: Near)

Keyword in relation to match

Select the direction of the keyword in relation to the desired result. The eight directions are W, NW, N, NE, E, SE, S, and SW. The W, NW, and N directions are selected by default.
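To make the compass directions concrete, the Python sketch below classifies where a keyword lies relative to a match. It assumes page coordinates in which x grows to the right and y grows downward, compares the center points of the two items, and splits the plane into eight equal 45-degree sectors. These are assumptions for illustration, not the locator's documented geometry, and the function name is hypothetical.

```python
import math

# Sector order follows counterclockwise angles starting at east (0 degrees).
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def keyword_direction(match_xy, keyword_xy):
    """Compass direction of the keyword as seen from the match.

    Both arguments are (x, y) center points in page coordinates,
    with y increasing down the page.
    """
    dx = keyword_xy[0] - match_xy[0]
    dy = match_xy[1] - keyword_xy[1]  # flip y so that "up the page" is north
    angle = math.degrees(math.atan2(dy, dx)) % 360
    # Shift by half a sector so each direction covers 22.5 degrees either side.
    return DIRECTIONS[int((angle + 22.5) // 45) % 8]
```

A label such as "Total:" printed to the left of an amount on the same line falls in the W sector, and a column header printed above the amount falls in the N sector, which is consistent with W, NW, and N being the default selections.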

The following buttons are available at the bottom of this window:

Button

Description

Close

Closes the window and saves your changes.

Test

Tests the locator settings. The results appear on the Test Results tab, which opens automatically when you click this button.

Depending on the locator method, this button may have additional modes if the locator uses other locators as input.

Help

Displays the help for the open window.
