Test Sets

You can upload test sets that were labeled manually outside Duet to get an independent assessment of the quality of your models. Duet already provides a quality metric that is recalculated with every model update over the whole sampling set (the unlabeled dataset) associated with the model. A manually labeled test set is an additional, optional checkpoint on model quality. Test sets are accepted in two formats: CSV and JSON.

  • The CSV format follows the classic BIO (Beginning, Inside, Outside) format for entity extraction labels. The CSV file has three columns labeled 'Sentence', 'Word' and 'Tag'. Each token (i.e., word) has a B, I, or O label, which is stored in the 'Tag' column. A period or other punctuation mark also counts as a token. The label O means the word is outside of, and not part of, any entity. The label B-entity type means the word is the beginning of an entity of that type. The label I-entity type means the word continues an entity of that type that spans multiple words. Use BIO format when your schema is flat with one entity node like "Organization" or "Location". You can find an example of a BIO file format here.
  • The JSON format follows a key-value pair structure. The text of the document is stored under the "content" tag, followed by a set of tags that carry the labels. For example, in the test set linked, there are labels for Graduation Year, College-Name and Location, along with points (each of which holds the start character index, the end character index and the entity value). Use JSON format when your model has a hierarchical schema, such as the set of skills a candidate lists in their resume. You can find an example of a JSON file format here.
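
As a rough illustration of how BIO tags encode entities, the sketch below reads a small CSV with the 'Sentence', 'Word' and 'Tag' columns and groups the tagged tokens back into entities. The sample rows are invented, and the tag spelling (a hyphen between the prefix and the entity type, as in "B-per") and the function name decode_bio are assumptions made for this example, not something Duet documents.

```python
import csv
import io

# Invented sample rows; only the column names come from the text above.
SAMPLE_CSV = """Sentence,Word,Tag
1,Ada,B-per
1,Lovelace,I-per
1,visited,O
1,London,B-gpe
1,.,O
2,Turing,B-per
2,stayed,O
2,home,O
2,.,O
"""


def decode_bio(rows):
    """Group B-/I- tagged tokens into (sentence, entity_type, text) tuples."""
    entities = []
    current = None  # [sentence, entity_type, list_of_words] while an entity is open

    def flush():
        nonlocal current
        if current is not None:
            entities.append((current[0], current[1], " ".join(current[2])))
            current = None

    for row in rows:
        sentence, word, tag = row["Sentence"], row["Word"], row["Tag"]
        # Assumption: tags look like "B-per" / "I-per"; "O" has no suffix.
        prefix, _, etype = tag.partition("-")
        continues = (
            prefix == "I"
            and current is not None
            and current[0] == sentence
            and current[1] == etype
        )
        if continues:
            current[2].append(word)
            continue
        flush()  # any open entity ends here
        if prefix == "B":
            current = [sentence, etype, [word]]
        # An "I-" tag with no matching open entity is skipped in this sketch.
    flush()
    return entities


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
entities = decode_bio(rows)
for entity in entities:
    print(entity)
```

Note that the period rows carry the O tag like any other token outside an entity, which is why punctuation needs its own row in the file.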

Once you upload a manually labeled test set whose entity labels exactly match your model schema (more information below in Step #4), Duet will calculate a quality metric (F score) for each category in your schema. The maximum size of an uploaded test set is 5 MB.
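
Before uploading, it can help to check a BIO test set locally against these two constraints. The sketch below is only an illustration: the helper names are hypothetical, the hyphenated tag spelling ("B-per") is assumed, and because the text does not say whether 5 MB means 5,000,000 bytes or 5 × 1024 × 1024 bytes, the smaller value is used as a conservative guess.

```python
# Assumption: "5 MB" is read as 5,000,000 bytes, the more conservative reading.
MAX_TEST_SET_BYTES = 5_000_000


def within_size_limit(num_bytes):
    """Return True if a file of num_bytes fits under the assumed upload limit."""
    return num_bytes <= MAX_TEST_SET_BYTES


def unmatched_entity_types(tags, schema_entity):
    """List entity types in the Tag column that differ from the schema's entity name.

    The comparison is exact, so a difference in letter case counts as a mismatch.
    """
    found = {tag.partition("-")[2] for tag in tags if tag != "O"}
    return sorted(found - {schema_entity})


tags = ["B-per", "I-per", "O", "B-gpe", "O", "B-Per"]
unmatched = unmatched_entity_types(tags, "per")
print(unmatched)
print(within_size_limit(len("Sentence,Word,Tag\n".encode("utf-8"))))
```

For a file on disk, os.path.getsize(path) from the standard library returns the byte count to pass to within_size_limit.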

  1. Press the Test tab at the top nav bar to test your model with a test set.

  2. Select the Upload testset tab at the top of the Test your Model widget.

  3. Either upload a new file or select one of the previously uploaded test sets.

  4. If you upload a new test set, you need to ensure that it's in the proper format. Test sets for entity extraction in BIO (Beginning, Inside, Outside) format have three columns: 'Sentence', 'Word' and 'Tag'. Each token (i.e., word) has a B, I, or O label, which is stored in the 'Tag' column. A period also counts as a token. The label O means the word is outside of, and not part of, any entity. The label B-entity type means the word is the beginning of an entity of that type. The label I-entity type means the word continues an entity of that type that spans multiple words.
    Test sets in BIO format can only test model schemas that are flat with no children. In the sample linked above, the different tags following the B and I prefixes, such as "per" and "gpe", are meant for different models, where the name of the single category in the schema is "per" or "gpe". The name of the entity in the schema must exactly match the label name in the test set.

    Test sets for entity extraction in JSON format follow a key-value pair format. The text of the document is stored under the "content" tag, followed by a set of tags that carry the labels. For example, in the test set linked, there are labels for Graduation Year, College-Name and Location, along with points (each of which holds the start character index, the end character index and the entity value).
    Unlike BIO test sets, test sets in JSON format can test entity extractors with hierarchical schemas. Note that the full path of each entity in the nested JSON structure has to exactly match the schema.

    Here is a sample in BIO format. Here is a sample in JSON format.

  5. Select the test set that you'd like to use. Click "Test" and the results for each category will display on the right.

    The results show the F score for each entity, a widely used quality metric that combines precision and recall. When the F score for an entity is above 50%, it is shown in green. Otherwise, it is shown in red. Some tags may be ignored because their labels don't match any entity in the schema.
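
To make the reported numbers easier to interpret, here is a minimal sketch of an F score for one entity category. The text does not say which F variant Duet computes or how it matches predictions to labels, so this example assumes the balanced F1 score (the harmonic mean of precision and recall) with exact matching; the function names and the sample data are hypothetical.

```python
def f1_score(gold, predicted):
    """Balanced F score (F1) over two collections of labeled entities.

    Assumption: an entity counts as correct only when it matches exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def display_color(score):
    """Mirror the rule described above: green above 50%, red otherwise."""
    return "green" if score > 0.5 else "red"


# Invented (sentence, entity text) pairs for a single category.
gold = {("1", "Ada Lovelace"), ("2", "Turing"), ("3", "Grace Hopper")}
predicted = {("1", "Ada Lovelace"), ("2", "Turing"), ("3", "Hopper"), ("4", "Babbage")}

score = f1_score(gold, predicted)
print(f"F1 = {score:.1%}, shown in {display_color(score)}")
```

In this example two of the four predictions are correct (precision 50%) and two of the three labeled entities are found (recall about 67%), giving an F1 of about 57%.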
