Entities and Text References
Key Concepts Summary
Several key concepts and terms are useful to understand when using Sintelix.
This topic defines Entities and Text References.
Entities
Entities are things, including:
- real world objects such as people, locations and organisations;
- temporal and numeric expressions such as dates, money and other numeric measures; and
- abstract concepts such as religion, ideology and criminal charges.
Almost anything can be an Entity, for example, emotions, items of clothing, drugs, weapons, and events.
The Entities you want to identify depend on the type of analysis you are conducting. The Entities identified in Documents are defined by the Ingestion and Document Processing configurations.
Entity Lifecycle
The diagram below illustrates the differences between Text References, Entities and Nodes, which are collectively called Entities.
- Entity Extraction identifies and marks up Text References in each Document.
- Entity Resolution combines related Text References into a single unique Entity in that Document.
- Clustering combines the related Entities across all the Documents in the Collection into a single unique Node in the Network.
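The three stages above can be pictured with a small data model. The sketch below is illustrative only and is not Sintelix code or its API: the class names, fields and the `resolve` and `cluster` functions are hypothetical, and grouping by exact, case-insensitive text match is a simple stand-in for whatever matching rules the product actually applies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TextReference:
    """A marked-up span in one Document (hypothetical model)."""
    doc_id: str
    start: int
    end: int
    text: str
    entity_class: str


@dataclass
class Entity:
    """One unique Entity within a single Document."""
    doc_id: str
    name: str
    entity_class: str
    references: list = field(default_factory=list)


@dataclass
class Node:
    """One unique Node across all Documents in the Collection."""
    name: str
    entity_class: str
    entities: list = field(default_factory=list)


def resolve(references):
    """Entity Resolution stand-in: group references per Document, class and text."""
    groups = {}
    for ref in references:
        key = (ref.doc_id, ref.entity_class, ref.text.casefold())
        if key not in groups:
            groups[key] = Entity(ref.doc_id, ref.text, ref.entity_class)
        groups[key].references.append(ref)
    return list(groups.values())


def cluster(entities):
    """Clustering stand-in: group Entities across Documents by class and name."""
    groups = {}
    for ent in entities:
        key = (ent.entity_class, ent.name.casefold())
        if key not in groups:
            groups[key] = Node(ent.name, ent.entity_class)
        groups[key].entities.append(ent)
    return list(groups.values())


# Made-up example data: four Text References in two Documents.
refs = [
    TextReference("doc1", 0, 8, "Jane Doe", "Person"),
    TextReference("doc1", 40, 48, "Jane Doe", "Person"),
    TextReference("doc2", 5, 13, "Jane Doe", "Person"),
    TextReference("doc2", 20, 24, "Acme", "Organisation"),
]
entities = resolve(refs)
nodes = cluster(entities)
print(len(refs), len(entities), len(nodes))  # 4 3 2
```

In this example the two mentions of "Jane Doe" in doc1 become one Entity, and the "Jane Doe" Entities from doc1 and doc2 become one Node.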
Text References
Text References are the mark-ups added to Documents by Entity Extraction during Ingestion.
A Text Reference can become an Entity or remain a Text Reference Only, depending on the Ontology.
Entities and Text References are marked up in Documents in different ways:
- Text Reference Only: Underlined. A Text Reference that is not resolved into an Entity is underlined.
- Entity: Box. A Text Reference that has been resolved into an Entity has a box around it. Selecting an Entity opens the Entity pane and highlights the Text References for the same Entity in the Document.
- Anaphor: Double underlined. An anaphor is a word that refers back to a word used earlier in the text, in this case an Entity. For example, "he" or "she" for a person, or "it" for an object.
Ontology
The Ontology controls the parameters for each type of Entity and Link. The Ontology configuration is assigned to the Ingestion configuration.
Text References are marked up with an Entity Class, for example, Person.
The Entity Class:
- has a name, which is used to tag the Text References (see Tag Labels below);
- defines whether the Text Reference is resolved into an Entity, remains a Text Reference Only, or is not marked up in Documents;
- assigns a colour, which is applied to Text References; and
- assigns an icon, which is used in Network Graphs.
Connections are marked up with a Link Type, for example Person-title. The Link Type assigns a colour and identifies the Entity Classes that the link connects.
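As a way of picturing what the Ontology holds, the sketch below models an Entity Class and a Link Type as plain records. It is an assumption-laden illustration, not the Sintelix configuration format: the field names, the `Resolution` values, the colour codes, the icon file names, the "Title" class and the `markup_style` helper are all hypothetical. Only the mapping from resolution setting to mark-up style (box, underline, or no mark-up) follows the descriptions in this topic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Resolution(Enum):
    """How Text References of a class are treated (hypothetical names)."""
    ENTITY = "entity"
    TEXT_REFERENCE_ONLY = "text_reference_only"
    NOT_MARKED_UP = "not_marked_up"


@dataclass(frozen=True)
class EntityClass:
    name: str               # also used as the Tag Label
    resolution: Resolution
    colour: str             # applied to Text References
    icon: str               # used in Network Graphs


@dataclass(frozen=True)
class LinkType:
    name: str
    colour: str
    connects: tuple         # names of the Entity Classes the link connects


def markup_style(entity_class: EntityClass) -> Optional[str]:
    """Return the mark-up style described in this topic, or None if not marked up."""
    if entity_class.resolution is Resolution.ENTITY:
        return "box"
    if entity_class.resolution is Resolution.TEXT_REFERENCE_ONLY:
        return "underline"
    return None


# Made-up example values; real Ontologies define their own classes and colours.
person = EntityClass("Person", Resolution.ENTITY, "#1f77b4", "person.svg")
title = EntityClass("Title", Resolution.TEXT_REFERENCE_ONLY, "#2ca02c", "title.svg")
person_title = LinkType("Person-title", "#888888", ("Person", "Title"))

print(markup_style(person), markup_style(title))
```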
Tag Labels
Each Text Reference is tagged with the name of its Entity Class. When editing Documents, you can choose whether to show the Tag Labels.
- Tag Labels not showing.
- Tag Labels showing.
- Tag Label for an Anaphor.
Features
Features can be added to Text References and Connections. Each Feature has a name and a value, and provides additional information about the Text Reference or Connection.
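Conceptually, Features behave like a set of name and value pairs attached to a mark-up. The short sketch below shows that idea only. The `Annotation` class and the feature names and values are hypothetical and do not reflect how Sintelix stores Features.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A Text Reference or Connection carrying Features (hypothetical model)."""
    kind: str                      # "text_reference" or "connection"
    label: str                     # Entity Class or Link Type name
    features: dict = field(default_factory=dict)


# Made-up feature names and values, for illustration only.
ref = Annotation("text_reference", "Person")
ref.features["role"] = "witness"

link = Annotation("connection", "Person-title", {"source": "manual edit"})

for item in (ref, link):
    for name, value in item.features.items():
        print(f"{item.label}: {name} = {value}")
```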
Viewing Documents
When viewing or editing documents, you can select Text References to view the details about the Entity.
The diagram below describes the ways Entities and Text References are displayed when viewing documents.
The diagram below describes the ways Entities and Text References are displayed when editing documents.


