System Configuration

Several aspects of Label Sleuth can be configured through the system’s configuration file.

Configuration file

The default configuration file is located at label_sleuth/config.json.

To use a custom configuration file, pass the --config_path parameter to the “start_label_sleuth” command:

 python -m label_sleuth.start_label_sleuth --config_path <path_to_my_configuration_json>
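One convenient way to produce a custom configuration file is to copy the default one and change only the values of interest. The following is a minimal sketch, assuming the default file is plain JSON whose top-level keys are the parameter names listed in the Parameters section. The helper name `write_custom_config`, the file paths, and the override values are illustrative and are not part of Label Sleuth.

```python
import json
from pathlib import Path


def write_custom_config(default_path, output_path, overrides):
    """Copy a JSON configuration, apply overrides, and save the result.

    Hypothetical helper for illustration; Label Sleuth does not ship it.
    """
    config = json.loads(Path(default_path).read_text(encoding="utf-8"))
    # Shallow merge: only top-level keys are replaced.
    config.update(overrides)
    Path(output_path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Example call (paths and values are assumptions):
# write_custom_config("label_sleuth/config.json", "my_config.json",
#                     {"precision_evaluation_size": 40})
```

The resulting file can then be supplied through --config_path as shown above.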

Alternatively, individual configuration parameters can be overridden at startup by appending them to the “start_label_sleuth” command. For example, to work with text data in Arabic, set the system language as follows:

 python -m label_sleuth.start_label_sleuth --language Arabic
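The precedence implied here, with command-line values taking priority over values from the configuration file, can be sketched as follows. This is not Label Sleuth's actual startup code: only the two flags shown in this section are declared, and the function names `build_parser` and `effective_config` are hypothetical.

```python
import argparse


def build_parser():
    # Only the flags that appear in this section are declared here.
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_path", default=None)
    parser.add_argument("--language", default=None)
    return parser


def effective_config(file_config, cli_args):
    """Return file_config with explicitly passed command-line values on top."""
    merged = dict(file_config)
    for key, value in vars(cli_args).items():
        # config_path selects the file itself, so it is not a config value;
        # None means the flag was not passed on the command line.
        if key != "config_path" and value is not None:
            merged[key] = value
    return merged
```

With this layering, a flag that is omitted leaves the value from the configuration file untouched.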

Parameters

The following parameters can be set in the configuration file:

Parameter

Description

first_model_positive_threshold

Number of elements that must be assigned a positive label for the category in order to trigger the training of a classification model.

See also: The training invocation documentation.

first_model_negative_threshold

Number of elements that must be assigned a negative label for the category in order to trigger the training of a classification model.

See also: The training invocation documentation.

changed_element_threshold

Number of changes in user labels for the category, relative to the last trained model, that are required to trigger the training of a new model. A change can be assigning a label (positive or negative) to an element, or changing an existing label. Note that both first_model_positive_threshold and first_model_negative_threshold must also be met for the training to be triggered.

See also: The training invocation documentation.
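The way the three thresholds above combine can be sketched as a single check. This is an illustrative reading of the descriptions, not the library's implementation: the `LabelCounts` class, the `should_train` function, and the threshold values in the example are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LabelCounts:
    positive: int                  # elements labeled positive for the category
    negative: int                  # elements labeled negative for the category
    changes_since_last_model: int  # label changes since the last trained model


def should_train(counts, config):
    """Return True only when all three thresholds are met."""
    return (
        counts.positive >= config["first_model_positive_threshold"]
        and counts.negative >= config["first_model_negative_threshold"]
        and counts.changes_since_last_model >= config["changed_element_threshold"]
    )


# Illustrative values only; check the configuration file for the real ones.
example_config = {
    "first_model_positive_threshold": 20,
    "first_model_negative_threshold": 0,
    "changed_element_threshold": 20,
}
```

Under these example values, 25 positive labels with only 5 changes since the last model would not trigger training, because changed_element_threshold is not met.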

training_set_selection_strategy

Strategy to be used from TrainingSetSelectionStrategy. A TrainingSetSelectionStrategy determines which examples will be sent to the classification models at training time; these are not necessarily identical to the set of elements labeled by the user. For currently supported implementations, see the get_training_set_selector() function.

See also: The training set selection documentation.

model_policy

Policy to be used from ModelPolicies. A ModelPolicy determines which type of classification model(s) will be used, and when (e.g., always, or only after a specific number of iterations).

See also: The model selection documentation.

active_learning_strategy

Strategy to be used from ActiveLearningCatalog. An ActiveLearner module implements the strategy for recommending the next elements to be labeled by the user, aiming to increase the efficiency of the annotation process.

See also: The active learning documentation.

precision_evaluation_size

Sample size to be used for estimating the precision of the current model when the precision evaluation function is invoked.

Defaults to 20.

apply_labels_to_duplicate_texts

Specifies how to treat elements with identical texts. If true, assigning a label to an element will also assign the same label to other elements which share the exact same text; if false, the label will only be assigned to the specific element labeled by the user.

Defaults to true.
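The effect of this flag can be illustrated with a small sketch. The data layout (a dict from element id to text) and the function name `propagate_label` are assumptions made for the example and do not reflect how Label Sleuth stores elements internally.

```python
def propagate_label(elements, labeled_id, label, apply_labels_to_duplicate_texts=True):
    """Return {element_id: label} for every element that should receive the label.

    `elements` maps a hypothetical element id to its text.
    """
    if not apply_labels_to_duplicate_texts:
        # Only the element the user actually labeled is affected.
        return {labeled_id: label}
    target_text = elements[labeled_id]
    # Exact string equality: texts must match character for character.
    return {eid: label for eid, text in elements.items() if text == target_text}
```

For instance, if two elements in different documents both read "Great product", labeling one of them would label both when the flag is true, and only the chosen one when it is false.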

language

Specifies the chosen system-wide language. This determines some language-specific resources that will be used by models and helper functions (e.g., stop words). The list of supported languages can be found here. We welcome contributions of additional languages.

Defaults to ENGLISH.

login_required

Specifies whether using the system requires user authentication. If true, the configuration file must also include a users parameter.

Defaults to false.

users

Only relevant if login_required is true. Specifies the pre-defined login information in the following format:

"users": [
 {
   "username": "<predefined_username1>",
   "token": "<randomly_generated_token1>",
   "password": "<predefined_user1_password>"
 }
]
* The list of usernames is static and currently all users have access to all the workspaces in the system.
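An entry in this format can be generated programmatically. The sketch below uses Python's standard `secrets` module to create the random token. The token format that Label Sleuth expects is not specified here, so the 64-character hex string is an assumption, as are the helper names `make_user_entry` and `enable_login` and the requirement that the list be non-empty.

```python
import secrets


def make_user_entry(username, password):
    """Build one entry for the users list with a freshly generated token."""
    return {
        "username": username,
        "token": secrets.token_hex(32),  # 64 hex characters; length is an arbitrary choice
        "password": password,
    }


def enable_login(config, users):
    """Return a copy of config with login_required set and users attached."""
    if not users:
        # Assumption: an empty list is treated as a configuration mistake.
        raise ValueError("login_required is true, so at least one user is needed")
    updated = dict(config)
    updated["login_required"] = True
    updated["users"] = list(users)
    return updated
```

Because the password is stored in the configuration file as given, the file should be readable only by whoever administers the system.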

main_panel_elements_per_page

Number of elements per page in the main panel, i.e., document view.

Defaults to 500.

sidebar_panel_elements_per_page

Number of elements per page in the sidebar panels that use pagination.

Defaults to 50.
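To see how the two page-size parameters translate into pages, consider the arithmetic below. It is a generic pagination sketch, not Label Sleuth's code; the function names and the 1-based page numbering are assumptions.

```python
import math


def page_count(total_elements, elements_per_page):
    """Number of pages needed to show all elements."""
    return math.ceil(total_elements / elements_per_page)


def page_slice(elements, page, elements_per_page):
    """Return the elements on a 1-based page number (numbering is an assumption)."""
    start = (page - 1) * elements_per_page
    return elements[start:start + elements_per_page]
```

With the default of 500 elements per page in the main panel, a document of 1,200 elements would span 3 pages, the last one holding 200 elements.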

snippet_max_token_length

Maximum number of tokens shown for a text snippet in the right panels; longer snippets are cut off.

Defaults to 100.

right_to_left

If true, text in the UI is laid out from right to left.

Defaults to false.

max_dataset_length

Maximum number of rows allowed in a CSV file.

Defaults to 10000000.

max_document_name_length

Maximum number of characters allowed in a document name.

Defaults to 60.

cpu_workers

Maximum number of CPUs to use for running jobs.

Defaults to 100.