Do and Don't¶
| Do | Don't |
|---|---|
| Do Plan Your Schema | Don't Deploy to an Endpoint too Quickly |
| Do Define Distinct Schema Categories | Don't Add too Few Features |
| Do Add Proper Features to Schema Categories | Don't Add Indistinct Features |
| Do Build Your Model Iteratively | Don't Label Too Many Documents of the Same Pattern |
| Do Monitor Performance Regularly | Don't Train and Publish With Every Document Analyzed |
| Do Leverage Suggested Features | |
| Do Resolve Conflicts As Soon As They Appear | |
Do Plan Your Schema¶
Before you start building your model's schema, figure out how you plan to use this model. The more you know about its function and endpoint, the better you can plan.
- Research end users
- Outline the end-to-end user experience
- Collect information about what type of data the model will be used on
Don't Deploy to an Endpoint too Quickly¶
Publishing too quickly, before the model has had enough teaching, results in a poor-quality model.
Do Define Distinct Schema Categories¶
Make sure that the semantics of your schema categories do not overlap. If the features you provide to two categories are similar, consider moving the two categories under a common parent. For instance, if you have two categories, "Payment Methods" and "Payment Issues", do not make "payment" a feature of either. Instead, make "issue" a feature of "Payment Issues" and "options" a feature of "Payment Methods".
Don't Add too Few Features¶
Features are essential for helping the system distinguish between categories. Each schema category needs features that uniquely identify it.
Do Add Proper Features to Schema Categories¶
Duet suggests features to help the system discriminate between categories and resolve conflicts, but it needs the user to review each suggestion. Learn how features work in Duet and what kinds of things the system suggests as features. Once you understand how features work, define your own to improve the model's predictive ability.
Don't Add Indistinct Features¶
Make sure that each feature is distinct to its schema category. For instance, if you have categories like "Payment Methods" and "Payment Issues", a feature in both that triggers on the word "payment" alone would be neither distinct nor useful.
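The check described above can be sketched outside of Duet. The snippet below is a minimal illustration, assuming a schema is represented as a plain dictionary that maps each category name to a list of feature terms. This representation and the `shared_feature_terms` helper are hypothetical and are not part of Duet.

```python
from collections import defaultdict


def shared_feature_terms(schema):
    """Return feature terms that are used by more than one category.

    `schema` is a hypothetical stand-in for a model schema:
    a dict mapping category name -> list of feature terms.
    """
    owners = defaultdict(set)
    for category, terms in schema.items():
        for term in terms:
            # Compare case-insensitively so "Payment" and "payment" match.
            owners[term.lower()].add(category)
    return {
        term: sorted(categories)
        for term, categories in owners.items()
        if len(categories) > 1
    }


# Hypothetical example schema based on the categories named in the text.
schema = {
    "Payment Methods": ["payment", "options", "credit card"],
    "Payment Issues": ["payment", "issue", "declined"],
}

print(shared_feature_terms(schema))
# {'payment': ['Payment Issues', 'Payment Methods']}
```

Any term this reports is a candidate to remove from both categories or to replace with a more specific feature.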
Do Build Your Model Iteratively¶
Each teaching session that makes major changes ought to be saved as a new deployment, rather than merely replacing the previous deployment.
Don't Label Too Many Documents of the Same Pattern¶
Make sure to discover new patterns to label, either by searching the dataset or by using the get example intelligence, which helps you diversify your labeled documents.
Do Monitor Performance Regularly¶
Keep an eye on the quality metric after you add labels or features, or edit your schema. If the quality indicator is improving, you are making good progress on your model.
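One way to make this habit concrete is to record the quality metric after each change and note which changes helped. The sketch below assumes you write the values down yourself as `(change description, quality)` pairs. The data, the 0 to 1 scale, and the `quality_trend` helper are all hypothetical and do not come from Duet.

```python
def quality_trend(history):
    """Label each recorded change by how the quality metric moved.

    `history` is a list of (change_description, quality) tuples in the
    order the changes were made; the first entry is the starting point.
    """
    report = []
    for (_, before), (change, after) in zip(history, history[1:]):
        if after > before:
            direction = "improved"
        elif after < before:
            direction = "regressed"
        else:
            direction = "unchanged"
        report.append((change, direction))
    return report


# Hypothetical log of quality readings taken after each edit.
history = [
    ("baseline", 0.70),
    ("added 20 labels", 0.74),
    ("edited schema", 0.71),
    ("added feature", 0.71),
]

for change, direction in quality_trend(history):
    print(f"{change}: {direction}")
```

A change marked as regressed is a prompt to revisit that edit before continuing.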
Don't Train and Publish With Every Document Analyzed¶
Adding a single label or editing a single feature won't have a large impact on model accuracy. Publishing after every model update wastes your time.
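A simple rule of thumb can be sketched as follows: publish only once enough edits have accumulated and the quality metric has moved meaningfully. The thresholds, the parameter names, and the extra condition on quality gain are illustrative assumptions, not Duet settings or recommendations from the product.

```python
def should_publish(changes_since_publish, quality_now, quality_at_publish,
                   min_changes=25, min_gain=0.02):
    """Decide whether a new publish is worthwhile.

    All thresholds are hypothetical defaults; tune them for your project.
    """
    enough_changes = changes_since_publish >= min_changes
    improved = (quality_now - quality_at_publish) >= min_gain
    return enough_changes and improved


# One new label with no change in quality: not worth publishing.
print(should_publish(1, 0.80, 0.80))   # False
# Thirty edits and a clear quality gain: worth publishing.
print(should_publish(30, 0.85, 0.80))  # True
```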
Do Leverage Suggested Features¶
When conflicts arise, resolve them by using machine learning features.
Do Resolve Conflicts As Soon As They Appear¶
Conflicts indicate that the model is confused and needs your input. The faster you resolve a conflict, the faster you reach good model quality. Conflicts mostly appear because one or more features are missing, so review the feature suggestions and select the appropriate features to add. Sometimes conflicts are caused by mislabels, in which case you need to correct the label on the document that has the conflict. Conflicts can also reveal overlapping semantics in the schema, in which case you need to edit your schema so that the categories no longer overlap.