Sampling Set

A sampling set is the unlabeled data, drawn from your text streams, that you want an AI model to analyze automatically. In Duet, building such a model starts with uploading a sampling set, either as a CSV file or as a zipped folder. You can also pull your sampling set from Google Sheets, Zendesk, or Reddit. A sampling set can be at most 1GB. The same sampling set can be used to build one or more models, and you don’t need to clean the data before uploading it to Duet.
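Before uploading, it can help to check a file locally against the two constraints stated above: the format (CSV or a zipped folder) and the 1GB size limit. The sketch below is a hypothetical helper, not part of Duet. It assumes the limit applies to the file as uploaded and reads 1GB as the stricter 10**9 bytes. It does not check whether a zipped folder follows the internal structure Duet expects.

```python
import os
import zipfile
from typing import List

# Assumption: the 1GB limit applies to the file on disk, read as the stricter
# decimal value (10**9 bytes) rather than 2**30 bytes.
MAX_BYTES = 10 ** 9
# Hypothetical: extensions matching the upload formats named in the docs.
ALLOWED_EXTENSIONS = {".csv", ".zip"}


def check_sampling_set(path: str) -> List[str]:
    """Return problems that would likely block the upload (empty list if none)."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {ext!r}: expected .csv or .zip")
    elif ext == ".zip" and not zipfile.is_zipfile(path):
        # A .zip extension alone is not enough; confirm it is a real archive.
        problems.append("file ends in .zip but is not a valid zip archive")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_BYTES}-byte limit")
    return problems
```

An empty list means the file passes these local checks; it does not guarantee that Duet will accept it.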

Duet helps you build a custom model by automatically choosing documents from the sampling set and asking you to label them. We call this process "automatic sampling". Duet's selection technology picks the most informative documents while respecting the original data distribution, which avoids selection bias. For each step, Duet uses the most recently updated model to choose the next document for you to label: the example the model is most uncertain about. This makes the most of every label you provide.
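Duet does not describe how its selection works. To illustrate the general idea only, the sketch below uses a well-known active-learning heuristic, information-density sampling. Each unlabeled document gets a score: the model's uncertainty (entropy of its predicted class probabilities) multiplied by how representative the document is of the pool (its average cosine similarity to the other documents). Confusing but typical documents therefore win over confusing outliers. This is a swapped-in textbook technique, not Duet's method. The function names, the `beta` weight, and the use of non-negative feature vectors such as term counts are all hypothetical.

```python
import math
from typing import Collection, Optional, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def density(i: int, vectors: Sequence[Sequence[float]]) -> float:
    """Average similarity of document i to every other document in the pool.

    Clamped at 0.0 so the combined score below stays non-negative.
    """
    sims = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
    return max(0.0, sum(sims) / len(sims)) if sims else 0.0


def select_next(
    probs: Sequence[Sequence[float]],
    vectors: Sequence[Sequence[float]],
    labeled: Collection[int],
    beta: float = 1.0,  # hypothetical weight on representativeness
) -> Optional[int]:
    """Index of the unlabeled document with the highest uncertainty x density
    score, or None if every document is already labeled."""
    best: Optional[int] = None
    best_score = -1.0
    for i, p in enumerate(probs):
        if i in labeled:
            continue
        score = entropy(p) * density(i, vectors) ** beta
        if score > best_score:
            best, best_score = i, score
    return best


# Toy pool: class probabilities from the current model, plus
# term-count vectors for the same three documents.
probs = [[0.5, 0.5], [0.6, 0.4], [0.9, 0.1]]
vectors = [[0, 0, 1], [1, 1, 0], [1, 1, 0]]
print(select_next(probs, vectors, labeled=set()))  # → 1
```

Plain uncertainty sampling would pick document 0, whose prediction is a 50/50 split. That document shares no terms with the rest of the pool, so the density weight moves the choice to document 1. In a real loop, the model would be retrained after each new label and the probabilities recomputed before the next call. This mirrors the "most recently updated model" behavior described above. The pairwise density computation is quadratic in pool size, which is fine for a sketch but would be precomputed or approximated at scale.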

