This article covers importing new data in the Import Data and Save Model step of the model creation flow. You can import your own data whether you are creating a model from one of the Duet templates or from scratch. If you'd like to watch an instructional video on importing unlabeled data, you can access it here.
When you reach the import data step in the model creation flow, you will see the following three tabs.
- The first tab is Data sources, which enables you to import datasets from your local drive or from external sources. Data can be imported from the local drive in .zip or .csv formats. You can also import data from platforms such as Zendesk, Google Sheets and Reddit.
- The second tab is Your data, which lists your datasets that you've previously imported to the portal.
- The third tab is Samples library, which allows you to access publicly available datasets.
In Duet, unlabeled datasets can be used to create one or more custom models, either document classifiers or entity extractors. An unlabeled dataset is usually a collection of text documents related to your business, or drawn from your archived business data, such as customer support tickets and social media feeds for your products.
Zipped file or .CSV from your local drive
You can upload unlabeled datasets in one of the following formats from your local drive:
- Zipped file format: A nested folder structure containing text files, where each document is in a separate .txt file. You can upload this nested folder structure to Duet only as a zip file. You can find a formatted example of a file structure here and a zip with 1 CSV file here.
- CSV format: A .csv file with at least one column whose header is "text", where each document is one record in the .csv file. If the .csv file has multiple columns, Duet will pick the one with the "text" header. You can upload the .csv file as-is or zip it and upload it as a zip file. If the .csv file contains an empty row, Duet will ignore it. You can find a formatted example here.
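As a rough illustration of the CSV format described above, the following Python sketch writes one document per record under a single "text" header and checks that an existing file has such a column. The function names `write_text_csv` and `has_text_column` are hypothetical helpers, not part of Duet, and the sketch assumes the header must match "text" exactly; the article does not say whether matching is case-sensitive.

```python
import csv
import io


def write_text_csv(documents, out_file):
    """Write one document per record under a single "text" header.

    Returns the number of documents written. Blank documents are skipped,
    so the output contains no empty rows.
    """
    writer = csv.writer(out_file)
    writer.writerow(["text"])
    count = 0
    for doc in documents:
        doc = doc.strip()
        if not doc:
            continue  # nothing to write for an empty document
        writer.writerow([doc])
        count += 1
    return count


def has_text_column(in_file):
    """Return True if the header row has a column named exactly "text".

    Assumption: exact, case-sensitive match on the header name.
    """
    header = next(csv.reader(in_file), [])
    return "text" in header
```

When writing to disk, open the file with `newline=""` as the `csv` module documentation recommends, for example `open("unlabeled.csv", "w", newline="", encoding="utf-8")`; the UTF-8 encoding here is also an assumption.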
If you have data that you want to split into different columns and rows to fit our formatting requirements, we suggest the Power Query tool in Microsoft Excel.
Note: The upper limit for uploading unlabeled datasets to Duet is 1GB.
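The zipped file format and the upload limit described above could be prepared along these lines. This is a minimal sketch: `zip_text_folder` and `within_upload_limit` are hypothetical helper names, and `MAX_UPLOAD_BYTES` reads "1GB" as 10**9 bytes, which is an assumption, since the article does not say whether the limit is decimal or binary or whether it applies to the compressed file.

```python
import zipfile
from pathlib import Path

# Assumption: "1GB" is taken as 10**9 bytes and is compared against the
# size of the file that would be uploaded.
MAX_UPLOAD_BYTES = 10**9


def zip_text_folder(src_dir, zip_path):
    """Zip every .txt file under src_dir, keeping the nested folder layout.

    Returns the number of text files added to the archive.
    """
    src = Path(src_dir)
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*.txt")):
            if path.is_file():
                # Store paths relative to src_dir so the archive mirrors
                # the folder structure.
                zf.write(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count


def within_upload_limit(file_path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file is no larger than the assumed limit."""
    return Path(file_path).stat().st_size <= limit
```

Calling `zip_text_folder("my_documents", "my_documents.zip")` and then `within_upload_limit("my_documents.zip")` gives a quick local check before you open the upload container.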
- If you click either Zipped file or CSV, a container will open up that will allow you to drag and drop accepted files or to browse your file explorer on your local machine.
- Once you have initiated the data upload, the container will show the progress of the upload. You can cancel the upload by pressing the x button at the top right of the progress bar.
- Once data is uploaded, click the "Continue" button.
- If you click the Zendesk option, a popup will appear that will ask you to enter a subdomain. Press "Add" to add the subdomain to the list. If you already have existing subdomains, the popup will list them as available for selection. You can choose one of them.
- After a subdomain is selected, you can filter data by date. Clicking the From and To inputs opens calendar views for choosing the date range.
- Once you press "Next", a progress bar appears showing the progress of the data import from Zendesk in counts of 1000 items. You can pause the import at any time and continue with the data pulled so far. The number of documents pulled will be listed under the Zendesk icon.
- If you click the Google Sheets option, you will be redirected to the Google Authentication Flow.
- Authenticate your Google account. Please note that you must be signed in to only one Google account, or be using an incognito window.
A form will appear showing your list of sheets. You can select one sheet to upload.
Once the upload starts, the container will show its progress. The progress bar fills with blue, and you can cancel the upload by pressing the "Cancel" button below the progress bar. The "Done" button in the bottom right of the form will remain grayed out until the progress bar is filled. The number of documents pulled will be listed under the Google Sheets icon.
- If you click the Reddit option, you will be redirected to the Reddit Authentication Flow.
- Authenticate your Reddit account.
A form will appear with two options: importing top subreddit posts or importing by search. To import top subreddit posts, select the subreddit via the search function and choose one of the timeframes pictured below.
To import by search, you also select a subreddit via the search function, and you can additionally enter a query. The results of this query are sorted by the options pictured below.
Once the upload starts, the container will show its progress. The progress bar fills with blue, and you can cancel the upload by pressing the "Cancel" button below the progress bar. The "Done" button in the bottom right of the form will remain grayed out until the progress bar is filled. The number of documents pulled will be listed under the Reddit icon.