System Configuration

Several aspects of Label Sleuth can be configured through the system’s configuration file.

Configuration file

The default configuration file is located at label_sleuth/config.json.

To use a custom configuration file, pass the --config_path parameter to the start_label_sleuth command when invoking Label Sleuth:

 python -m label_sleuth.start_label_sleuth --config_path <path_to_my_configuration_json>
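One possible way to prepare a custom configuration file is to copy an existing one and override only the parameters of interest. The sketch below is illustrative: the helper write_custom_config is hypothetical (it is not part of Label Sleuth), and the base file it reads is a stand-in with placeholder values rather than the defaults shipped in label_sleuth/config.json.

```python
import json
import tempfile
from pathlib import Path


def write_custom_config(base_path, overrides, out_path):
    """Load a JSON config, apply the given overrides, and save it to out_path."""
    config = json.loads(Path(base_path).read_text(encoding="utf-8"))
    config.update(overrides)
    Path(out_path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Stand-in for label_sleuth/config.json; the values here are placeholders,
# not the defaults shipped with Label Sleuth.
workdir = Path(tempfile.mkdtemp())
base_path = workdir / "config.json"
base_path.write_text(
    json.dumps({"changed_element_threshold": 20, "login_required": False}),
    encoding="utf-8",
)

# Override a single parameter and write the result to a new file.
custom_path = workdir / "my_config.json"
custom = write_custom_config(base_path, {"changed_element_threshold": 5}, custom_path)

# Print the command that would start Label Sleuth with the new file.
print(f"python -m label_sleuth.start_label_sleuth --config_path {custom_path}")
```

In practice, base_path would point at a copy of the real default configuration file, so that every parameter not overridden keeps its existing value.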

Parameters

The following parameters can be set in the configuration file:

Parameter

Description

first_model_positive_threshold

Number of elements that must be assigned a positive label for the category in order to trigger the training of a classification model.

See also: The training invocation documentation.

changed_element_threshold

Number of changes in user labels for the category, relative to the last trained model, that are required to trigger the training of a new model. A change can be assigning a label (positive or negative) to an element, or changing an existing label. Note that first_model_positive_threshold must also be met for training to be triggered.

See also: The training invocation documentation.

training_set_selection_strategy

Strategy to be used from TrainingSetSelectionStrategy. A TrainingSetSelectionStrategy determines which examples will be sent in practice to the classification models at training time - these will not necessarily be identical to the set of elements labeled by the user. For currently supported implementations see get_training_set_selector().

See also: The training set selection documentation.

model_policy

Policy to be used from ModelPolicies. A ModelPolicy determines which type of classification model(s) will be used, and when (e.g. always / only after a specific number of iterations / etc.).

See also: The model selection documentation.

active_learning_strategy

Strategy to be used from ActiveLearningStrategies. An ActiveLearner module implements the strategy for recommending the next elements to be labeled by the user, aiming to increase the efficiency of the annotation process. For currently supported implementations see get_active_learner().

See also: The active learning documentation.

precision_evaluation_size

Sample size to be used for estimating the precision of the current model. This parameter is intended for future versions of the system, which will provide built-in evaluation capabilities.

apply_labels_to_duplicate_texts

Specifies how to treat elements with identical texts. If true, assigning a label to an element will also assign the same label to other elements which share the exact same text; if false, the label will only be assigned to the specific element labeled by the user.

language

Specifies the chosen system-wide language. This determines some language-specific resources that will be used by models and helper functions (e.g., stop words). The list of supported languages can be found in Languages. We welcome contributions of additional languages.

login_required

Specifies whether using the system requires user authentication. If true, the configuration file must also include a users parameter.

users

Only relevant if login_required is true. Specifies the pre-defined login information in the following format:

"users": [
 {
   "username": "<predefined_username1>",
   "token": "<randomly_generated_token1>",
   "password": "<predefined_user1_password>"
 }
]
* The list of usernames is static and currently all users have access to all the workspaces in the system.
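
As a rough illustration of the users format above, the following sketch builds one user entry and checks that login_required and users are consistent. Both helpers (make_user_entry and check_login_settings) are hypothetical and not part of Label Sleuth. Generating the token with secrets.token_hex is an assumption, since the documentation only says the token is randomly generated, and treating an empty users list as invalid is likewise an assumption.

```python
import json
import secrets


def make_user_entry(username, password):
    """Build one entry for the "users" list with a freshly generated token."""
    return {
        "username": username,
        # Assumption: any random string is acceptable as a token.
        "token": secrets.token_hex(16),
        "password": password,
    }


def check_login_settings(config):
    """Raise ValueError if login_required is true but no users are defined."""
    if config.get("login_required") and not config.get("users"):
        raise ValueError('"login_required" is true, so a "users" list is needed')


# Example fragment of a configuration with authentication enabled.
config = {
    "login_required": True,
    "users": [make_user_entry("<predefined_username1>", "<predefined_user1_password>")],
}
check_login_settings(config)
print(json.dumps(config, indent=2))
```

The printed fragment can be merged into a custom configuration file alongside the other parameters.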