Languages
Every language has its own vocabulary and grammatical rules. Therefore, each language requires its own Entity Extraction method, called a pipeline.
Language pipelines need to be licensed.
Select the Status tab to view which language pipelines are supported, installed and licensed. (Enhanced) displayed after a language name indicates that the Entity Extraction enhanced mode is installed.
Language Detection
When a document is ingested, the language is detected.
If the detected language pipeline is:
- available, that pipeline is used.
- not available, the English pipeline is used.
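The fallback rule above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not Sintelix code: the function name `select_pipeline`, the constant `FALLBACK_LANGUAGE`, and the use of two-letter language codes as pipeline identifiers are all assumptions made for the example.

```python
from typing import Iterable

# Assumption: pipelines are identified by two-letter language codes.
# The documentation states that English is the fallback pipeline.
FALLBACK_LANGUAGE = "en"


def select_pipeline(detected_language: str, available: Iterable[str]) -> str:
    """Return the code of the pipeline to use for a document (hypothetical helper)."""
    available_set = set(available)
    if detected_language in available_set:
        # The detected language has an available pipeline, so use it.
        return detected_language
    # Otherwise fall back to the English pipeline.
    return FALLBACK_LANGUAGE


# Example: French is available, Latvian is not.
print(select_pipeline("fr", {"en", "fr", "ar"}))  # fr
print(select_pipeline("lv", {"en", "fr", "ar"}))  # en
```

In this sketch a document in a language with no available pipeline is still processed, but with English rules, which is why the pipeline shown for a document can differ from its detected language.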
The detected language and pipeline are displayed in the Document Properties section.
If Enhanced Entity Extraction has been applied, pipeline_type shows the value enhanced.
Language Pipelines
The following are the languages Sintelix supports, depending on the Entity Extraction Mode: Baseline or Enhanced.
| Language | Baseline | Enhanced |
|---|---|---|
| Arabic (ar) | | |
| Chinese (zh) | | |
| Dutch (nl) | | |
| Farsi (fa) | | |
| French (fr) | | |
| Bahasa Indonesia (id) | | |
| Italian (it) | | |
| Latvian (lv) | | |
| Lithuanian (lt) | | |
| Portuguese (pt) | | |
| Russian (ru) | | |
| Spanish (es) | | |
| Thai (th) | | |
| Turkish (tr) | | |

