Add (Ingest) Documents

Introduction

Source files are added to Sintelix to be ingested. Files are converted into standardised Documents to be stored in Collections. The content of Documents is then analysed, identifies Text References, Entities and Links.

Many different types of source files can be added: text, images, audio/video, emails, databases, etc.

You can:

Key Concepts

It is useful to understand how the Ingestion configurations selected can impact how documents are loaded and ingested.

See Ingestion, Ontology and Network Creation.

Add Documents tab

When you select the Add Documents tab, Unstructured ingestion with the Server Library source location are selected by default. You can navigate to different source locations by clicking on their name.

Add Documents process
On Demand or Scheduled

You can add files any time. You can also use the Ingestion Scheduler to automatically check for new or updated files in a specific library, or new emails from a server, then add them into a Collection. For more information, see Scheduling Ingestion.

File Formats

Sintelix can ingest files in many different formats. Some formats need to be installed and/or configured, including:

Other file formats have special features, including:

  • geospatial files which capture geographical data that can be displayed on a map. See Geospatial Files.

Source Location

You can navigate to different source locations by clicking on their name displayed across the top of the Add Documents pane.

You can add source files from the following locations:

Locations

Description

Unstructured

Structured

Server Library

A shared folder location. The Server Library is the fastest and therefore recommended for large volumes of data.

See Add files from a File Library

Local Files

You can upload files from your device. It can be the slowest method to load files.

Each file must not exceed 100 MB.

See Upload Local Files

From URLs

You can provide a list of URLs.

Files in the URL locations will be loaded.

See Add Files from URLs

AWS S3 Bucket

The AWS S3 Bucket needs to be configured and may have a different name (see Configure and Secure Libraries).

You load files from an AWS S3 Bucket library in the same way as a Server Library.

See Add from Server Library

Mail Server

Connects to an Email server, if configured (see Configure and Secure Libraries).

Requires the user to login to the Mail server to access the emails.

See Add from Email Server.

As an alternative, emails can be exported as email files and added using one of the above options.

 

From Data Source

Connects to a range of data sources, for example a database.

See Add from Data Source Connection.

A Data Source Connector needs to be configured. See Data Source Connectors (Legacy).

Currently only available for unstructured ingestion.

As an alternative, data can be exported as a text file (Delimited or Fixed Width) from the Data Source and ingested using Structured Ingestion.

 

External API

Using an External API to connect to an external API service such as X (formerly Twitter).

See Ingest tweets from X (Twitter).

 

The Administrator can add additional libraries, including server library, AWS S3 Bucket library and Mail Server libraries. see Configure and Secure Libraries.

Unstructured or Structured Ingestion

You can choose to ingest files using either:

  • Unstructured file ingestion (default), or

  • Structured file ingestion.

The key differences between the two methods of ingesting data is summarised below:

Function Unstructured Structured
Ingestion

The default Ingestion configuration for a Collection is pre-selected.

You choose a different Ingestion, if required.

You choose a Structured Import configuration when adding files.

The Structured Import configuration has the Ingestion configuration integrated into the settings.

Files

You can ingest a wide range of files. The files are converted into standardised Documents. The text content of the files is stored in the Content section of the Document.

 

You can ingest Excel, delimited, fixed width, JSON files and Database tables.

Each Structured Import configuration is designed for files with a specific schema.

That is the files have the same type and number of fields.

The most common application would be to import data extracted from software or a database.

Entity Extraction

Entity extraction examines text in the Document Content section to identify text references, entities and links.

The Structured Import configuration extracts entities and links from the data in the file according to rules defined within the configuration, which are loaded directly into the Network. You do not need to create Documents.

If a field contains unstructured data, you can create a Document and save to the field content to the Content section in the Document. Entity extraction is carried out on the Content section of the Document, the same as for Unstructured files.

Unstructured Ingestion

When you add documents using unstructured ingestion, the default Ingestion configuration for that Collection is pre-selected.

The currently selected Ingestion Configuration is displayed to the left of the Add Documents pane.

The Ingestion Configuration defines what information is extracted from the documents.

You can change the Ingestion Configuration, if required (but is not recommended).

To add files using unstructured ingestion you:

  1. Change the Ingestion Configuration, if required.

  2. Select the checkbox if you want to be emailed when the ingestion is completed. This can be useful if adding a large batch of files.

  3. Select the files to add from the Source Location. Each source location has a different way of selecting files:

Document Creation and Document Persistence

A Document is always created as part of the Sintelix ingestion process.

However, the Ingestion configuration selected can have the Document persistence setting unchecked, which means Documents are deleted once they have been processed and are not saved in the Collection. The information extracted from the Documents is stored in the Network only.

Document Persistence Disabled

When an Ingestion configuration has Document Persistence disabled, the document persistence disabled icon is displayed.

Structured Ingestion

You can try importing sample Structured Import configurations. See Structured Import Example.

To add documents using Structured Ingestion, select the Structured option.

To add documents using structured ingestion you:

  1. Select the Structured ingestion option.

  2. Change the Structured Load configuration, if required. The Structured Load configuration is designed for files with a specific schema (structure and file type). You need to match the Structured Load configuration to the type of file you are adding.

  3. Result: Messages are displayed showing the status of the configuration.

    See Structured Ingestion Messages.

  4. Enter a value for the filter, if prompted.

    See Prompt for Filter Parameter

  5. Select the checkbox if you want to be emailed when the ingestion is completed. This can be useful if processing a large batch of files.

  6. Select the files to add from the Source Location.

Structured Ingestion Messages

Once you have selected a Structured Load configuration, messages are displayed showing the status of the configuration.

The possible messages include:

No document or network will be created.

This indicates the configuration settings are incomplete.

Expected output document and network.

This configuration will create documents and a network.

Filters with constant values are applied in the structured configuration. Click 'Manage' to view the filters.

This configuration has filters applied based on a constant value. You can click on the manage button to open the Structured Import configuration to view the filters applied to the data.

Input the parameter value for filters.

This configuration has a filter applied that requires a value to be entered into the table provided. See Prompt for Filter Parameter below.

Prompt for Filter Parameter

A Structured Load configuration can include filters that require a parameter value to be entered.

Filters can be set to include or exclude records based on the value entered, checking if the value is Equal to or Not Equal to the value.

In the example below:

  • the container called Sheet_Employee Management Data is being filtered on the Department field.

  • this filter includes records

  • where the value entered in the Parameter Value field equals the value in the Department field in the file(s) being added.