Schema
Your schema is the set of categories that you define and use to label your data. The schema you define is what you see in the JSON output of the model once you deploy it. A schema is planned up front and then iterated on over time. You can start a schema entirely from scratch or use a template provided for Duet if it fits your data. You can check the validity of a template schema from the Template Library tab. Learn more about building with templates.
In machine learning, it is challenging to come up with a labeling schema (e.g. your customer support categories) without having seen enough examples of the documents you want to analyze. The classical way of discovering the schema is to look at some samples of the data, decide on an initial schema, write annotation guidelines based on what you have seen in the samples, and send the guidelines to human annotators who label the data. The annotators usually discover new patterns in the data that they cannot label in accordance with the guidelines. They also get confused and introduce noisy labels. In this situation, the annotators go back to the product manager for guidance, and the product manager in turn changes the schema based on the newly discovered patterns. The annotators must then start labeling from scratch against the new schema. This process is lengthy, error-prone, and costly.
Duet presents a unique solution to this problem by enabling users to build their schema incrementally as they explore the data. Users can define custom hierarchical schemas in which they can add, delete, rename, or move nodes without losing their previous labeling work. This capability applies throughout the lifecycle of the model, even after the model is deployed. Duet users can leverage the feedback loop in the system to update their models after deployment based on the real traffic the model receives. This enables models to adapt to changes in data distribution.
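To make the idea of editing a hierarchical schema without losing labels more concrete, here is a minimal sketch in Python. It is not Duet's API or implementation: the `Schema` class, its methods, and the rule that deleting a node hands its labels to the parent are all hypothetical choices made for illustration. The one idea it demonstrates is that labels can point at stable node IDs rather than at names or paths, so renaming or moving a node leaves existing labels intact.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass
class Node:
    node_id: int
    name: str
    parent_id: Optional[int] = None


class Schema:
    """Hypothetical hierarchical schema for illustration; not Duet's API."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.nodes: Dict[int, Node] = {}
        # Labels map a document ID to a node ID, never to a name or path.
        self.labels: Dict[str, int] = {}

    def add(self, name: str, parent_id: Optional[int] = None) -> int:
        if parent_id is not None and parent_id not in self.nodes:
            raise KeyError(f"unknown parent {parent_id}")
        node_id = next(self._ids)
        self.nodes[node_id] = Node(node_id, name, parent_id)
        return node_id

    def rename(self, node_id: int, new_name: str) -> None:
        # Only the display name changes; the ID that labels refer to does not.
        self.nodes[node_id].name = new_name

    def move(self, node_id: int, new_parent_id: Optional[int]) -> None:
        # Reject moves that would place a node under itself or a descendant.
        ancestor = new_parent_id
        while ancestor is not None:
            if ancestor == node_id:
                raise ValueError("cannot move a node under itself or a descendant")
            ancestor = self.nodes[ancestor].parent_id
        self.nodes[node_id].parent_id = new_parent_id

    def delete(self, node_id: int) -> None:
        # Assumption: children and labels fall back to the deleted node's parent.
        parent_id = self.nodes.pop(node_id).parent_id
        for node in self.nodes.values():
            if node.parent_id == node_id:
                node.parent_id = parent_id
        for doc_id, labelled in list(self.labels.items()):
            if labelled == node_id:
                if parent_id is None:
                    del self.labels[doc_id]
                else:
                    self.labels[doc_id] = parent_id

    def path(self, node_id: int) -> str:
        # Build the human-readable path from the node up to its root.
        parts = []
        current: Optional[int] = node_id
        while current is not None:
            node = self.nodes[current]
            parts.append(node.name)
            current = node.parent_id
        return " / ".join(reversed(parts))


schema = Schema()
billing = schema.add("Billing")
support = schema.add("Support")
refund = schema.add("Refund", parent_id=support)
schema.labels["ticket-17"] = refund

schema.rename(refund, "Refund request")
schema.move(refund, billing)
print(schema.path(schema.labels["ticket-17"]))  # Billing / Refund request
```

In this sketch the label on `ticket-17` survives both the rename and the move because it stores only the node ID. A real system would also need to decide what retraining or re-evaluation a schema edit triggers, which is outside the scope of this example.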