System Configuration

Several aspects of Label Sleuth can be configured through the system’s configuration file.

Configuration file

The default configuration file is located at label_sleuth/config.json.

To use a custom configuration file, pass its path with the `--config_path` parameter of the `start_label_sleuth` command:

 python -m label_sleuth.start_label_sleuth --config_path <path_to_my_configuration_json>
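A custom configuration file is a JSON object that uses the same parameter names as the default file. The sketch below assumes you start from a copy of the default `label_sleuth/config.json`. It writes a new file with a few values overridden. The helper `write_custom_config` and the output filename are hypothetical and are not part of Label Sleuth.

```python
import json
from pathlib import Path


def write_custom_config(default_path, overrides, out_path):
    """Copy a base config, apply overrides, and write the result as JSON.

    Hypothetical helper for illustration only: Label Sleuth just needs the
    final JSON file, which is then passed via --config_path.
    """
    config = json.loads(Path(default_path).read_text(encoding="utf-8"))
    config.update(overrides)  # values in `overrides` replace the base values
    Path(out_path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, `write_custom_config("label_sleuth/config.json", {"precision_evaluation_size": 50}, "my_config.json")` writes `my_config.json`. You can then pass that file with `--config_path my_config.json`.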

Alternatively, specific configuration parameters can be overridden at startup by appending them as arguments to the `start_label_sleuth` command. For example, to work with Arabic text data, set the system language as follows:

 python -m label_sleuth.start_label_sleuth --language Arabic
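The precedence rule behind such overrides is that a value given on the command line replaces the value read from the configuration file. The sketch below illustrates only that rule. It is not Label Sleuth's actual argument parser. The function name `apply_cli_overrides` is hypothetical, and so is the choice to decode values as JSON when possible, so that `50` becomes an integer and `true` a boolean. `argv` is assumed to hold only `--key value` parameter overrides.

```python
import json


def apply_cli_overrides(config, argv):
    """Return a copy of `config` with `--key value` pairs from argv applied.

    Hypothetical illustration: command-line values take precedence over
    values loaded from the configuration file. The input dict is not mutated.
    """
    if len(argv) % 2 != 0:
        raise ValueError("expected --key value pairs")
    merged = dict(config)
    for flag, raw in zip(argv[::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a --key flag, got {flag!r}")
        try:
            value = json.loads(raw)  # "50" -> 50, "true" -> True
        except json.JSONDecodeError:
            value = raw  # plain strings such as "Arabic" stay strings
        merged[flag[2:]] = value
    return merged
```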

Parameters

The following parameters can be set in the configuration file:

Parameter	Description
`first_model_positive_threshold`	Number of elements that must be assigned a positive label for the category in order to trigger the training of a classification model. See also: The training invocation documentation.
`first_model_negative_threshold`	Number of elements that must be assigned a negative label for the category in order to trigger the training of a classification model. See also: The training invocation documentation.
`changed_element_threshold`	Number of changes in user labels for the category, relative to the last trained model, that are required to trigger the training of a new model. A change can be assigning a label (positive or negative) to an element, or changing an existing label. Note that the `first_model_positive_threshold` and `first_model_negative_threshold` conditions must also be met for training to be triggered. See also: The training invocation documentation.
`training_set_selection_strategy`	Strategy to be used, chosen from `TrainingSetSelectionStrategy`. A training set selection strategy determines which examples are sent to the classification models at training time. These are not necessarily identical to the set of elements labeled by the user. For currently supported implementations, see the `get_training_set_selector()` function. See also: The training set selection documentation.
`model_policy`	Policy to be used, chosen from `ModelPolicies`. A model policy determines which type(s) of classification model will be used, and when (e.g., always, only after a specific number of iterations, etc.). See also: The model selection documentation.
`active_learning_strategy`	Strategy to be used, chosen from `ActiveLearningCatalog`. An `ActiveLearner` module implements the strategy for recommending which elements the user should label next, with the aim of making the annotation process more efficient. See also: The active learning documentation.
`precision_evaluation_size`	Sample size to be used for estimating the precision of the current model when the precision evaluation function is invoked. Defaults to `20`.
`apply_labels_to_duplicate_texts`	Specifies how to treat elements with identical texts. If `true`, assigning a label to an element will also assign the same label to other elements which share the exact same text; if `false`, the label will only be assigned to the specific element labeled by the user. Defaults to `true`.
`language`	Specifies the chosen system-wide language. This determines some language-specific resources that will be used by models and helper functions (e.g., stop words). The list of supported languages can be found here. We welcome contributions of additional languages. Defaults to `ENGLISH`.
`login_required`	Specifies whether using the system requires user authentication. If `true`, the configuration file must also include a `users` parameter. Defaults to `false`.
`users`	Only relevant if `login_required` is `true`. Specifies the predefined login information in the following format: `"users": [{"username": "<predefined_username1>", "token": "<randomly_generated_token1>", "password": "<predefined_user1_password>"}]`. Note: the list of usernames is static, and currently all users have access to all the workspaces in the system.
`main_panel_elements_per_page`	Number of elements per page in the main panel, i.e., document view. Defaults to `500`.
`sidebar_panel_elements_per_page`	Number of elements per page in the sidebar panels that use pagination. Defaults to `50`.
`snippet_max_token_length`	Maximum number of tokens after which a text snippet shown in the right-hand panels is cut off. Defaults to `100`.
`right_to_left`	If `true`, text in the UI is laid out from right to left, starting at the right side of the page, as needed for languages such as Arabic. Defaults to `false`.
`max_dataset_length`	Maximum number of rows allowed in a dataset CSV file. Defaults to `10000000`.
`max_document_name_length`	Maximum number of characters in a document name. Defaults to `60`.
`cpu_workers`	Maximum number of CPUs to use for running jobs. Defaults to `100`.
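The three training thresholds (`first_model_positive_threshold`, `first_model_negative_threshold` and `changed_element_threshold`) combine into a single condition: all of them must be met before a new model is trained. The sketch below expresses that condition. It is not Label Sleuth's implementation. The `>=` comparisons, the `TrainingThresholds` and `should_train` names, and the example values in the usage are assumptions for illustration, not the shipped defaults.

```python
from dataclasses import dataclass


@dataclass
class TrainingThresholds:
    """Hypothetical container mirroring the three configuration parameters."""
    first_model_positive_threshold: int
    first_model_negative_threshold: int
    changed_element_threshold: int


def should_train(t, num_positive, num_negative, changes_since_last_model):
    """Return True only if all three threshold conditions hold.

    Assumes "at least N" semantics (>=) for every threshold.
    """
    return (
        num_positive >= t.first_model_positive_threshold
        and num_negative >= t.first_model_negative_threshold
        and changes_since_last_model >= t.changed_element_threshold
    )
```

With illustrative thresholds `TrainingThresholds(20, 0, 20)`, for instance, 25 positive labels alone are not enough. Training also waits until at least 20 label changes have accumulated since the last model.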
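`precision_evaluation_size` controls how many elements are reviewed when estimating a model's precision. The sketch below shows the general idea, assuming precision is estimated by drawing a random sample from the elements the model predicts as positive and measuring the fraction the user confirms. The function name and the `get_user_label` callback, which stands in for the user's review in the UI, are hypothetical. Label Sleuth's actual sampling procedure may differ.

```python
import random


def estimate_precision(predicted_positive_ids, get_user_label, sample_size=20, seed=None):
    """Estimate precision from a random sample of predicted-positive elements.

    `get_user_label(element_id)` is a hypothetical callback that returns True
    when the user confirms the element belongs to the category.
    """
    if not predicted_positive_ids:
        raise ValueError("the model predicted no positive elements")
    rng = random.Random(seed)
    k = min(sample_size, len(predicted_positive_ids))  # never sample more than exist
    sample = rng.sample(list(predicted_positive_ids), k)
    confirmed = sum(1 for element_id in sample if get_user_label(element_id))
    return confirmed / k
```

A larger sample size gives a less noisy estimate. The cost is that the user has to review more elements.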
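The effect of `apply_labels_to_duplicate_texts` can be pictured as follows. When the setting is enabled, labeling one element also labels every element whose text is exactly identical, so the comparison is case-sensitive. When it is disabled, only the chosen element is labeled. This is a minimal sketch using plain dictionaries, and the `assign_label` function is hypothetical. A real system would likely index elements by text rather than scan them all.

```python
def assign_label(texts, labels, element_id, label, apply_to_duplicates=True):
    """Record `label` for `element_id` and, optionally, its exact duplicates.

    texts:  element_id -> text
    labels: element_id -> label (updated in place)
    Returns the ids that received the label, in `texts` iteration order.
    """
    if apply_to_duplicates:
        target_text = texts[element_id]
        # Linear scan for clarity; exact string equality defines a duplicate.
        targets = [eid for eid, text in texts.items() if text == target_text]
    else:
        targets = [element_id]
    for eid in targets:
        labels[eid] = label
    return targets
```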
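When `login_required` is `true`, each entry in `users` needs a username, a randomly generated token and a password, in the format given in the table. The sketch below builds those configuration keys. It uses Python's `secrets.token_urlsafe` as one reasonable way to produce the random token. That choice, the token length and the helper names are assumptions. Because the passwords are stored in the configuration file as plain text, the file should be kept private.

```python
import secrets


def make_user(username, password):
    """Build one `users` entry with a random URL-safe token (length is arbitrary)."""
    return {
        "username": username,
        "token": secrets.token_urlsafe(32),
        "password": password,
    }


def users_config(credentials):
    """Return the config keys that enable login for (username, password) pairs."""
    return {
        "login_required": True,
        "users": [make_user(name, pw) for name, pw in credentials],
    }
```

The returned dictionary can be merged into a custom configuration file with `dict.update` before the file is written out as JSON.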