Entity Extraction Scripts

What is EES?

Entity Extraction Scripts (EES) is a scripting protocol that can be called during Entity Extraction. It allows you to create your own Text References and Connections (links) between Text References.

Why use EES?

Sintelix has in-built Entity Extraction which will mark up, for example, Persons, Organisations, Locations and Date times.

Dictionaries contain lists of words so you can mark up specialised terms, such as Drugs, Weapons, Aircraft, etc.

However, sometimes you want to capture:

  • text patterns, such as Licence Plates

  • combinations of words, for example, a Drug near a Weapon in a sentence.
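
As a rough illustration of what capturing a text pattern means, the Python sketch below finds strings shaped like a licence plate. It is not EES syntax. The plate format (three capital letters, an optional space or hyphen, then three digits) and the name find_plates are assumptions made for this example only.

```python
import re

# Hypothetical plate format, assumed for illustration only:
# three capital letters, an optional space or hyphen, then three digits.
PLATE_PATTERN = re.compile(r"\b[A-Z]{3}[ -]?\d{3}\b")

def find_plates(text):
    """Return (start, end, matched_text) for each plate-like string in text."""
    return [(m.start(), m.end(), m.group()) for m in PLATE_PATTERN.finditer(text)]

sample = "The van with plate ABC 123 was seen near the depot."
print(find_plates(sample))
```

Each result carries the character offsets of the match, which is the kind of information a Text Reference needs in order to point back at the document text.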

In addition, with EES you can:

  • add features to Text References

  • create connections (links) between Text References

  • add images to entities

  • set Document Tags or Properties based on Text References found in a document.

Dictionaries and EES

With Dictionaries, you can use lists of words and phrases to create Text References, including adding features to the Text References created.

You can refer to Dictionary wordlists in your scripts or use the Text References created by Dictionaries.

When is EES applied?

You can assign EES configurations in the Document Processing configuration to be applied during Ingestion.

You have the choice of two stages in the Sintelix workflow for your EES - Early and Late.

  • Early applies EES before the built-in learned entities (with link types such as tag:Person, tag:Organisation and tag:Location) are extracted.

  • Late applies EES after the built-in entity extraction. If you want your EES to match Text References created during the built-in Entity Extraction, you need to set the EES to run in Late Stage processing.

Acronym detection is enabled only for Text References created by an EES added at the Early stage. Acronyms are not identified for Text References created by a Late script, so a Late script must handle acronym detection itself.
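
The effect of the stage ordering can be pictured with a small Python sketch. It models only the order of the three steps (Early scripts, built-in extraction, Late scripts) and is not how Sintelix is implemented. The names run_pipeline, builtin_extractor and vip_script are hypothetical, and acronym detection is not modelled.

```python
def run_pipeline(text, early_scripts, builtin_extractor, late_scripts):
    """Apply Early scripts, then built-in extraction, then Late scripts."""
    annotations = []
    for script in early_scripts:
        annotations += script(text, annotations)
    annotations += builtin_extractor(text, annotations)
    for script in late_scripts:
        annotations += script(text, annotations)
    return annotations

def builtin_extractor(text, annotations):
    # Stand-in for built-in extraction: marks the word "Alice" as a Person.
    start = text.find("Alice")
    return [("Person", start, start + 5)] if start >= 0 else []

def vip_script(text, annotations):
    # Hypothetical script that depends on existing Person annotations.
    return [("VIP", s, e) for (label, s, e) in annotations if label == "Person"]

text = "Alice arrived."
print(run_pipeline(text, [vip_script], builtin_extractor, []))  # script run Early
print(run_pipeline(text, [], builtin_extractor, [vip_script]))  # script run Late
```

Run Early, the script finds no Person annotations to work with. Run Late, it can see the Person produced by the built-in step.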

How does EES work?

An EES contains a list of Rules.

Each Rule contains:

  • a Pattern, which Sintelix checks for in the text of ingested Documents, and

  • Output Phrases, which tell Sintelix what to do once the Pattern has been matched.

Rules operate in sequence. A Rule can refer to the output of a Rule that has been run previously.

An EES can be conditional; for example, it can be set to run only if a Document has a particular tag.
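
The structure described above (a list of Rules, each pairing a Pattern with an output, run in sequence and optionally gated on a Document tag) can be sketched in Python. This is a conceptual model under stated assumptions, not EES syntax. The classes Rule and Script, the field required_tag and the two example patterns are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Annotation = Tuple[str, int, int]  # (type, start offset, end offset)

@dataclass
class Rule:
    # Pattern: returns the (start, end) spans it matches, given the text
    # and the annotations produced by earlier rules.
    pattern: Callable[[str, List[Annotation]], List[Tuple[int, int]]]
    # Output: here simply the type of annotation to create for each match.
    output_type: str

@dataclass
class Script:
    rules: List[Rule]
    required_tag: Optional[str] = None  # hypothetical run condition

    def run(self, text, document_tags):
        # Conditional script: skip the document if the tag is missing.
        if self.required_tag is not None and self.required_tag not in document_tags:
            return []
        annotations = []
        for rule in self.rules:  # rules operate in sequence
            for start, end in rule.pattern(text, annotations):
                annotations.append((rule.output_type, start, end))
        return annotations

def numbers(text, annotations):
    return [m.span() for m in re.finditer(r"\d+", text)]

def number_then_kg(text, annotations):
    # Refers to the output of the earlier rule (the Number annotations).
    return [(s, e + 3) for (label, s, e) in annotations
            if label == "Number" and text[e:e + 3] == " kg"]

script = Script(
    rules=[Rule(numbers, "Number"), Rule(number_then_kg, "Weight")],
    required_tag="shipping",
)
print(script.run("Seized 12 kg in 3 bags.", {"shipping"}))
print(script.run("Seized 12 kg in 3 bags.", set()))
```

The second rule only produces a Weight because the first rule has already produced Number annotations, which mirrors a Rule referring to the output of a Rule that ran previously.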

EES uses Annotations

An EES looks for Patterns. Patterns are expressed in terms of the Annotations identified in the Document and, if required, the features of those Annotations. A Text Reference is a type of Annotation.

A simplified example:

  • a Dictionary has been applied to mark up all Weapons.

  • another Dictionary has been applied to identify Targets.

  • an EES will check if a Weapon and a Target occur in the same sentence and create a connection (link) between the Weapon and the Target.
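
The simplified example can be mimicked in Python to show the logic involved. This is a sketch only and not EES syntax. The sentence splitter is deliberately naive, and the helper names annotate, sentence_spans and link_in_same_sentence are assumptions made for illustration.

```python
import re

def annotate(text, label, word):
    """Stand-in for a Dictionary match: annotate the first occurrence of word."""
    start = text.index(word)
    return (label, start, start + len(word))

def sentence_spans(text):
    """Naive sentence splitter: a sentence ends at '.', '!' or '?'."""
    return [m.span() for m in re.finditer(r"[^.!?]+[.!?]?", text)]

def link_in_same_sentence(text, annotations, source_type, target_type):
    """Pair every source annotation with every target in the same sentence."""
    links = []
    for s_start, s_end in sentence_spans(text):
        inside = [a for a in annotations if s_start <= a[1] and a[2] <= s_end]
        sources = [a for a in inside if a[0] == source_type]
        targets = [a for a in inside if a[0] == target_type]
        links += [(src, tgt) for src in sources for tgt in targets]
    return links

text = "The rifle was aimed at the convoy. The bridge was quiet."
annotations = [
    annotate(text, "Weapon", "rifle"),
    annotate(text, "Target", "convoy"),
    annotate(text, "Target", "bridge"),
]
links = link_in_same_sentence(text, annotations, "Weapon", "Target")
print(links)
```

Only the rifle and the convoy are linked, because the bridge appears in a different sentence from the Weapon.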

Editing and Testing Scripts

You can edit and test EES using the Text Graph Analyser.