Tungsten Clarity Page Recognition Profile Settings window

The Tungsten Clarity recognition engine differs slightly from the other engines. It is specialized to recognize text in any image, including a picture of an ID badge worn by a person, or a photo of a sign. Also, instead of performing OCR on the Tungsten TotalAgility server, OCR is performed on a remote server via an internet connection.

You require a separate license to use this recognition engine.

Communication between the Tungsten Clarity recognition profile and its server uses port 443. This port must be open in your firewall settings.
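A quick TCP check is one way to confirm that the TotalAgility server allows outbound connections on port 443. The following Python sketch is a hypothetical helper and is not part of Tungsten TotalAgility. The names `endpoint_host_port` and `can_reach` are illustrative, and the URL you pass is assumed to be the value of the Endpoint for OCR processing setting. A successful connection only shows that the port is reachable. It does not validate TLS, proxy configuration, or your license.

```python
import socket
from urllib.parse import urlsplit


def endpoint_host_port(endpoint_url: str) -> tuple[str, int]:
    """Return (host, port) for an endpoint URL; HTTPS defaults to port 443."""
    parts = urlsplit(endpoint_url)
    if parts.hostname is None:
        raise ValueError(f"No host name in endpoint URL: {endpoint_url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    # parts.port is None when the URL does not state a port explicitly.
    return parts.hostname, parts.port or default_port


def can_reach(endpoint_url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    host, port = endpoint_host_port(endpoint_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

If `can_reach` returns `False` on the TotalAgility server but `True` on another machine, a firewall or proxy rule on that server is a likely cause.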

If you perform recognition using Tungsten Clarity during testing in the Transformation Designer or at runtime, one volume license per page is consumed each time an internet connection is made. If you want to avoid consuming too many runtime licenses during project configuration and testing, Tungsten Automation recommends that you test Tungsten Clarity on selected images only, and that you perform OCR on other test images or training images using another recognition engine.
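The following Python sketch shows how quickly testing can add up. It assumes that each run sends every page of every document to Tungsten Clarity once, so each page consumes one volume license per run. The function name and the test plan figures are hypothetical.

```python
def estimate_clarity_licenses(pages_per_document: list[int], runs: int = 1) -> int:
    """Estimate volume licenses consumed by Tungsten Clarity recognition.

    Assumption: every page of every document is sent to Tungsten Clarity
    once per run, and each page sent consumes one volume license.
    """
    if runs < 0 or any(pages < 0 for pages in pages_per_document):
        raise ValueError("Page counts and runs must not be negative.")
    return sum(pages_per_document) * runs


# Hypothetical test plan: two-page documents, each recognized 10 times.
full_test_set = estimate_clarity_licenses([2] * 50, runs=10)  # 1000
selected_only = estimate_clarity_licenses([2] * 5, runs=10)   # 100
print(full_test_set, selected_only)
```

Under these assumptions, testing Tungsten Clarity on 5 selected documents instead of all 50 reduces consumption from 1,000 to 100 licenses for the same number of test runs.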

During configuration, the best practice is to run this recognition engine a few times without a fallback recognition engine configured. Because no other engine takes over when Tungsten Clarity fails, this lets you confirm that everything works as expected and that the required internet access is available.

You can use this window to configure the Tungsten Clarity full-text OCR profile. The settings are separated into the following groups: Languages, General Settings, and Fallback Profile.

You cannot call page recognition from a script using the Tungsten Clarity recognition engine.

The following buttons are available at the bottom of this window:

Button    Description
OK        Closes the window and saves your changes.
Cancel    Closes the window without saving your changes.
Help      Displays the help for the open window.

Languages

This group enables you to select one or more specific languages, or allow the recognition engine to determine the language itself.

This group has the following settings:

Automatic language detection

Select this to allow the Tungsten Clarity recognition engine to determine the language of a document. (Default: Selected)

Selected languages

Select this setting if you want to explicitly specify which languages are used in your documents.

When this setting is selected, the list of languages is enabled. Select one or more languages.

The list of available languages depends on which Recognition mode is selected.

Document Mode does not support Chinese, Greek, Hebrew, Japanese, Korean, or Thai. Text mode supports these languages, but your documents may not be suitable for that mode. For the best results with documents in these languages, thoroughly test both modes, using both the selected language and the Automatic language detection setting, to see which combination performs best. Alternatively, select a different recognition engine.

If you are not sure which languages are used in your documents, use the Automatic language detection setting, because it provides better OCR results than selecting the wrong languages.
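The guidance for Chinese, Greek, Hebrew, Japanese, Korean, and Thai amounts to a small test matrix. The following Python sketch lists the recognition mode and language setting combinations to compare when your documents include one of those languages. It is illustrative only: the function name is hypothetical, and it assumes that Text mode offers every language in the set you pass, which this page documents only for the six listed languages.

```python
# Languages this page lists as unsupported in Document Mode but supported in Text mode.
DOCUMENT_MODE_UNSUPPORTED = frozenset(
    {"Chinese", "Greek", "Hebrew", "Japanese", "Korean", "Thai"}
)
AUTO = "Automatic language detection"


def combinations_to_test(languages: set[str]) -> list[tuple[str, str]]:
    """Return (recognition mode, language setting) pairs worth comparing.

    Returns an empty list when none of the languages is affected.
    """
    if not languages & DOCUMENT_MODE_UNSUPPORTED:
        return []
    return [
        # Document Mode cannot be given these languages explicitly,
        # so only automatic detection is tried there.
        ("Document Mode", AUTO),
        ("Text mode", AUTO),
        # Assumption: Text mode offers every language in the set.
        ("Text mode", ", ".join(sorted(languages))),
    ]
```

For example, `combinations_to_test({"Japanese"})` returns three combinations: Document Mode with automatic detection, Text mode with automatic detection, and Text mode with Japanese selected.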

General Settings

This group enables you to specify how a document is recognized with the Tungsten Clarity engine.

This group has the following settings:

Recognition mode

Select one of the following modes of recognition.

  • Document Mode. (Default: Selected)

    Select this mode if your documents are classic paper documents, forms, or images densely packed with text, such as an invoice or a bank letter.

  • Text mode.

    Select this mode to detect and extract text from images that contain only a small amount of text, such as a photo ID card.

Word separation characters

Use this field to define which characters can separate words.

(Default: /:()-#, that is, forward slash, colon, opening and closing parentheses, hyphen, and pound sign)
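To illustrate the effect of this setting, the following Python sketch splits a string on whitespace and on the default separator characters. It is not the engine's implementation. Treating whitespace as a separator and discarding empty tokens are assumptions made for the example, and `split_words` is a hypothetical name.

```python
import re

DEFAULT_SEPARATORS = "/:()-#"


def split_words(text: str, separators: str = DEFAULT_SEPARATORS) -> list[str]:
    """Split text into words on whitespace and on each separator character.

    Assumption for illustration: whitespace always separates words, and
    runs of separators produce no empty words.
    """
    # re.escape makes characters such as "(" and "-" literal inside the class.
    pattern = "[\\s" + re.escape(separators) + "]+"
    return [word for word in re.split(pattern, text) if word]
```

With the default characters, `split_words("PO#1234/A-7 (rev:2)")` returns `['PO', '1234', 'A', '7', 'rev', '2']`. If values that contain hyphens, such as part numbers, must stay intact as single words, consider removing the hyphen from this field.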

Endpoint for OCR processing

Enter the URL for the Tungsten Clarity recognition engine. By default, the global endpoint is provided.

Because Tungsten Clarity may run in multiple locations, specify the endpoint for the location that best suits your needs.

Fallback Profile

If the Tungsten Clarity recognition profile is temporarily unavailable, you can configure which recognition engine is used instead. This ensures that a broken network connection or a problematic image does not hold up processing with failed OCR results.

Recognition profile to be used as fallback

Select a page recognition profile that performs OCR if the Tungsten Clarity recognition profile is not available. (Default: None)

The fallback profile is used in the following circumstances:

  • There is a network issue or a broken internet connection and Tungsten Clarity is not available.

  • The Tungsten Clarity license server is not available.

  • The image quality is too poor for the Tungsten Clarity recognition engine to recognize.

  • The image size exceeds the image size supported by Tungsten Clarity:

    • The maximum image size supported by Tungsten Clarity is 10 MB. This limit applies after the image is encoded and converted to .PNG format, before it is sent to the Tungsten Clarity server.

    • For a .TIF image, the size on disk should not exceed 2.5 MB per page.

    • For a .PNG image, the size on disk should not exceed 5 MB.

    • For a .JPG image, the limit depends on the compression rate:

      • For a picture compressed with a high compression rate, the size on disk should not exceed 1 MB.

      • For a picture compressed with a low or normal compression rate, the size on disk should not exceed 3 MB.
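You can check the on-disk limits before images are sent, to see in advance which images are likely to be handled by the fallback profile. The following Python sketch is a hypothetical helper. It assumes that 1 MB equals 1,048,576 bytes, takes the JPG compression rate as an input because it does not inspect file contents, and cannot check the 10 MB limit, which applies only after conversion to .PNG.

```python
MB = 1024 * 1024  # Assumption: 1 MB = 1,048,576 bytes.

# On-disk limits from this page. For .TIF images the limit applies per page.
ON_DISK_LIMITS = {
    "tif": 2.5 * MB,
    "png": 5 * MB,
}
# .JPG limits depend on the compression rate used.
JPG_LIMITS = {
    "high": 1 * MB,
    "normal": 3 * MB,
    "low": 3 * MB,
}
# Common alternative spellings of the extensions.
ALIASES = {"tiff": "tif", "jpeg": "jpg"}


def exceeds_clarity_limit(size_on_disk: int, file_type: str,
                          jpg_compression: str = "normal") -> bool:
    """Return True if the file is larger than the documented on-disk limit.

    For .TIF images, pass the size of a single page. The 10 MB limit on the
    PNG-encoded image is not checked here.
    """
    kind = file_type.lower().lstrip(".")
    kind = ALIASES.get(kind, kind)
    if kind == "jpg":
        if jpg_compression not in JPG_LIMITS:
            raise ValueError(f"Unknown compression rate: {jpg_compression!r}")
        limit = JPG_LIMITS[jpg_compression]
    elif kind in ON_DISK_LIMITS:
        limit = ON_DISK_LIMITS[kind]
    else:
        raise ValueError(f"No documented limit for file type: {file_type!r}")
    return size_on_disk > limit
```

For a single-page file, you could pass `os.path.getsize(path)` and the extension returned by `os.path.splitext(path)[1]`.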
