
Feature Types

In machine learning, a feature is a data attribute that the system observes and uses to distinguish schema categories. In Duet, features are clues that help the model tell the categories apart: words or phrases from the domain vocabulary, or patterns in which words or phrases appear in a particular order. Duet uses these linguistic signals together with the labels to train models. Alongside a well-developed schema, adding features to your Duet model improves its accuracy.

In Duet, the system suggests features when there are conflicts. Duet uses a proprietary algorithm to suggest features that correlate highly with the conflicts the model has. To avoid overfitting to the training data, Duet uses deep learning to suggest features from external sources, given the context of these features in the training data. Duet also offers repositories of manually curated, domain-specific features through the Duet template library. Users express their domain knowledge when they review and edit the suggested features. This gives the model a path to encode user expertise as features, which helps it generalize to unseen data in the domain once it is deployed.

Each feature is associated with one or more categories when it is suggested or added to the model. Add features only when there are conflicts. This is how Duet controls model capacity to optimize for human effort: the more features you add, the more labels you will need to balance them.

Duet supports two types of features, dictionary features and context features:

  • A dictionary feature is a set of phrases that relate to the same concept. For example, if you're trying to distinguish methods of payment for a schema category "Payment Method", concepts like "credit card" and "PayPal" are highly correlated with the "Payment Method" category.
  • Context features enable users to form linguistic patterns by composing features. For example, in tickets that relate to payment issues, there is a linguistic pattern in which the concept of "issue" precedes the concept of "pay" with a few tokens (i.e., words) in between. To capture this pattern, you can define a context feature that composes the dictionary for the concept "issue" and the dictionary for the concept "pay", separated by a few tokens. Such a context feature will capture something like "I have an issue when trying to pay for my online order". A context feature is composed of exactly two features, which can be dictionary features, context features, or one of each.
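The two feature types can be illustrated with a minimal Python sketch. This is not Duet's implementation, which is proprietary. The class names, the `max_gap` parameter, the tokenizer, and the extra dictionary phrases ("problem", "payment", and so on) are all assumptions made for illustration. The sketch only shows how a dictionary feature could match phrases and how a context feature could compose two features that occur in order with a few tokens in between.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

# A match is a (start, end) pair of token indices, end exclusive.
Span = Tuple[int, int]
Feature = Union["DictionaryFeature", "ContextFeature"]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


@dataclass(frozen=True)
class DictionaryFeature:
    """Hypothetical dictionary feature: a set of phrases for one concept."""
    name: str
    phrases: Tuple[str, ...]

    def matches(self, tokens: List[str]) -> List[Span]:
        spans = set()
        for phrase in self.phrases:
            words = tokenize(phrase)
            n = len(words)
            if n == 0:
                continue
            # Slide a window over the tokens and compare it to the phrase.
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == words:
                    spans.add((i, i + n))
        return sorted(spans)


@dataclass(frozen=True)
class ContextFeature:
    """Hypothetical context feature: `first` followed by `second`
    with at most `max_gap` tokens in between (max_gap is an assumed knob)."""
    name: str
    first: "Feature"
    second: "Feature"
    max_gap: int = 5

    def matches(self, tokens: List[str]) -> List[Span]:
        spans = set()
        for start1, end1 in self.first.matches(tokens):
            for start2, end2 in self.second.matches(tokens):
                # `second` must begin after `first` ends, within the allowed gap.
                if 0 <= start2 - end1 <= self.max_gap:
                    spans.add((start1, end2))
        return sorted(spans)


# Dictionary features for the concepts used in the examples above.
payment_method = DictionaryFeature("payment_method", ("credit card", "PayPal"))
issue = DictionaryFeature("issue", ("issue", "problem", "trouble"))
pay = DictionaryFeature("pay", ("pay", "payment", "checkout"))

# Context feature: the "issue" concept precedes the "pay" concept.
issue_then_pay = ContextFeature("issue_then_pay", first=issue, second=pay)

ticket = tokenize("I have an issue when trying to pay for my online order")
print(issue_then_pay.matches(ticket))  # [(3, 8)]: "issue when trying to pay"
```

Because `ContextFeature` accepts either kind of feature for `first` and `second`, a context feature can itself be nested inside another context feature, mirroring the composition rule described above.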

There are three ways to add new features:

  • Review suggested features to resolve conflicts.
  • Click "+ Add Context" or "+ Add Dictionary" at the bottom of the screen.
  • When making a document classifier, highlight or double-click tokens that you see in a document (note: this function is not supported for entity extractors).

Click here to continue reading about features in document classifiers. Click here to continue reading about features in entity extraction.
