
Do and Don't

DO                                          | DON'T
Do Plan Your Schema                         | Don't Deploy to an Endpoint too Quickly
Do Define Distinct Schema Categories        | Don't Add too Few Features
Do Add Proper Features to Schema Categories | Don't Add Indistinct Features
Do Build Your Model Iteratively             | Don't Label Too Many Documents of the Same Pattern
Do Monitor Performance Regularly            | Don't Train and Publish With Every Document Analyzed
Do Leverage Suggested Features              |
Do Resolve Conflicts As Soon As They Appear |

Do Plan Your Schema

Before you start building your model's schema, figure out how you plan to use this model. The more you know about its function and endpoint, the better you can plan.

  • Research end users
  • Outline the end-to-end user experience
  • Collect information about what type of data the model will be used on

Don't Deploy to an Endpoint too Quickly

Publishing too quickly, before the model has been taught enough, results in a poor-quality model.

Do Define Distinct Schema Categories

Make sure that the semantics of your schema categories do not overlap. If the features you provide to two categories are similar, consider moving the two categories under a common parent. For instance, if you have two categories, "Payment Methods" and "Payment Issues", do not make "payment" a feature of either. Instead, make "issue" a feature of "Payment Issues" and "options" a feature of "Payment Methods".

Don't Add too Few Features

Features are essential to help the system distinguish the categories. It's important for each schema category to have features that uniquely identify it.

Do Add Proper Features to Schema Categories

Duet suggests features to help the system discriminate between categories and resolve conflicts, but it needs the user to review the suggestions. Learn how features work in Duet and what kinds of things the system suggests as features. Once you understand how features work, define your own to improve the model's predictive ability.

Don't Add Indistinct Features

Make sure that features are distinct to their schema category. For instance, if you have categories like "Payment Methods" and "Payment Issues", a feature in both that triggers on a word like "payment" alone is neither distinct nor useful.
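
The check described above can be illustrated with a minimal sketch. It is not part of Duet: the `schema_features` mapping and the `find_indistinct_features` helper are hypothetical names, and features are assumed to be plain sets of trigger words. The sketch lists every trigger word that is attached to more than one category.

```python
from collections import defaultdict

# Hypothetical schema (an assumption for illustration, not Duet's data model):
# category name -> set of trigger words used as features.
schema_features = {
    "Payment Methods": {"payment", "options", "card"},
    "Payment Issues": {"payment", "issue", "declined"},
}


def find_indistinct_features(features_by_category):
    """Return trigger words that are shared by two or more categories."""
    categories_by_word = defaultdict(set)
    for category, words in features_by_category.items():
        for word in words:
            categories_by_word[word.lower()].add(category)
    # Keep only the words that appear under more than one category.
    return {
        word: sorted(categories)
        for word, categories in categories_by_word.items()
        if len(categories) > 1
    }


shared = find_indistinct_features(schema_features)
print(shared)  # {'payment': ['Payment Issues', 'Payment Methods']}
```

Here "payment" is flagged because it fires for both categories, while "issue" and "options" each identify a single category.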

Do Build Your Model Iteratively

Each teaching session that makes major changes ought to be saved as a new deployment, rather than merely replacing the previous deployment.

Don't Label Too Many Documents of the Same Pattern

Make sure to discover new patterns to label, either by searching the dataset or by using the get example intelligence, which helps you diversify your labeled documents.

Do Monitor Performance Regularly

Always keep an eye on the quality metric after you add labels, add features, or edit your schema. If the quality indicator is improving, you are making good progress on your model.

Don't Train and Publish With Every Document Analyzed

Adding a single label or editing a single feature won't have a large impact on the model's accuracy. Publishing after every model update wastes your time.

Do Leverage Suggested Features

When conflicts arise, resolve them by using machine learning features.

Do Resolve Conflicts As Soon As They Appear

Conflicts are an indicator that the model is confused and needs your input. The faster you resolve a conflict, the faster you will reach good model quality. Conflicts appear mostly because the model is missing one or more features, so review the feature suggestions and select the appropriate features to add. Sometimes conflicts are caused by mislabels, in which case you need to correct the label on the document that has the conflict. Conflicts can also reveal overlapping semantics in the schema, in which case you need to edit your schema so that the semantics of the categories do not overlap.
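
To make the "missing feature" case concrete, here is a minimal sketch under stated assumptions. It does not reproduce how Duet detects conflicts. It assumes a conflict can be modeled as two documents with different labels that fire exactly the same set of features, and the names `featurize`, `find_conflicts`, and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical features (an assumption for illustration): name -> trigger words.
features = {
    "issue": {"issue", "declined", "failed"},
    "options": {"options", "accept"},
}


def featurize(text, feature_words):
    """Return the names of the features that fire on the text."""
    tokens = set(text.lower().split())
    return frozenset(
        name for name, words in feature_words.items() if tokens & words
    )


def find_conflicts(labeled_docs, feature_words):
    """Return groups of documents that look identical to the model
    (same fired features) but carry different labels."""
    groups = defaultdict(list)
    for text, label in labeled_docs:
        groups[featurize(text, feature_words)].append((text, label))
    return [
        docs for docs in groups.values()
        if len({label for _, label in docs}) > 1
    ]


labeled_docs = [
    ("my card was declined", "Payment Issues"),
    ("which cards do you accept", "Payment Methods"),
    ("refund my payment", "Payment Issues"),
    ("pay by invoice", "Payment Methods"),
]

# The last two documents fire no feature at all, so they conflict.
conflicts = find_conflicts(labeled_docs, features)

# Adding a feature that separates them resolves the conflict.
features_fixed = {**features, "refund": {"refund"}}
remaining = find_conflicts(labeled_docs, features_fixed)
```

In this toy model the conflict disappears once a distinguishing feature is added. A mislabeled document or overlapping categories would instead call for correcting the label or editing the schema, as described above.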
