Auto Extract Locator
The Auto Extract Locator method uses a large language model together with natural language instructions to extract data from a document. You describe the data you want to extract in plain words. No additional configuration, scripts, or training is needed.
The connection to the large language model is configured in Tungsten TotalAgility. An API Key for the LLM provider is required.
You can add multiple subfields to the locator by providing a description of the type of data you are extracting. This means that one locator method can extract multiple pieces of information from a single document.
It may be possible to extract a field without a description if the field name itself is descriptive enough, for example "Invoice Number" or "Social Security Number". If you do not see the desired extraction result, add a more detailed description.
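The relationship between subfield names and optional descriptions can be illustrated with a short Python sketch. This is not how Tungsten TotalAgility builds its request to the large language model; the Subfield class, the build_instructions function, and the wording of the instruction text are all hypothetical and serve only to show how a field with no description falls back to its name alone.

```python
from dataclasses import dataclass


@dataclass
class Subfield:
    """Hypothetical stand-in for one subfield of the locator."""
    name: str
    description: str = ""  # optional; may be left empty


def build_instructions(subfields):
    """Combine subfield names and descriptions into one instruction text.

    Assumption: the instruction format below is invented for illustration.
    """
    lines = ["Extract the following fields from the document:"]
    for subfield in subfields:
        description = subfield.description.strip()
        if description:
            lines.append(f"- {subfield.name}: {description}")
        else:
            # A descriptive field name may be enough on its own.
            lines.append(f"- {subfield.name}")
    return "\n".join(lines)


example_subfields = [
    Subfield("Invoice Number"),
    Subfield("Ship Date", "The date the goods left the warehouse"),
]
print(build_instructions(example_subfields))
```

In this sketch, "Invoice Number" is passed through by name only, while "Ship Date" carries a description because the name alone could be ambiguous.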
It is also possible to get information that is inferred rather than directly printed on the document. For example, Tungsten TotalAgility can determine the language of a document even if the name of the language is not printed anywhere on the document. This is because the first few words are examined and, if they are all in the same recognized language, that language is returned.
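The idea of inferring a language from the first few words can be sketched as follows. This is a simplified illustration only and does not reflect the internals of the product or of any large language model; the KNOWN_WORDS lists, the infer_language function, and the sample_size parameter are assumptions made for this example.

```python
# Hypothetical, very small word lists used only for illustration.
KNOWN_WORDS = {
    "English": {"the", "and", "of", "to", "is", "invoice", "this"},
    "German": {"der", "die", "und", "das", "ist", "rechnung", "nicht"},
    "French": {"le", "la", "et", "les", "est", "facture", "des"},
}


def infer_language(text, sample_size=5):
    """Return a language only if all sampled words belong to it.

    Examines the first `sample_size` words; returns None when the words
    are not all found in the same word list or when the text is empty.
    """
    words = [word.strip(".,;:!?\"'()").lower() for word in text.split()]
    words = [word for word in words if word][:sample_size]
    if not words:
        return None
    for language, vocabulary in KNOWN_WORDS.items():
        if all(word in vocabulary for word in words):
            return language
    return None


print(infer_language("Die Rechnung ist nicht das"))
```

With these word lists, a text whose first words mix several languages yields no result, which mirrors the condition that the examined words must all be in the same recognized language.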
The extraction type of an Auto Extract Locator is "group", which means that this locator has multiple subfields. When the locator is first created, no subfields are available, so you must configure the locator before creating and assigning fields. Once subfields are available, you can click individual field links or click the Create and Assign Fields button to create and assign all simple fields at once.
Currently, this functionality works best for Western languages. Results for other languages, such as Chinese or right-to-left languages, cannot be guaranteed.
The Properties of Auto Extract Locator window has the following tabs: