In machine learning, a feature is an attribute of the data that the system observes and uses to distinguish between categories. In Duet, users teach the model the semantics of their schema through labels and features. Features act as clues that help the model tell the schema categories apart. These clues are words or phrases from the domain vocabulary, or patterns that involve words or phrases in a particular order. Duet uses these linguistic signals together with the labels to train the model. Once the schema is well developed, adding features to a Duet model helps improve its accuracy.
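To make the idea of word, phrase, and ordered-pattern features concrete, here is a minimal Python sketch that turns a piece of text into binary indicators. It is an illustration only, not Duet's implementation: the feature lists, the `extract_features` function, and the `phrase:`/`pattern:` naming are all hypothetical, and each pattern element is assumed to be a single word.

```python
import re

# Hypothetical feature definitions; Duet's actual feature format is not
# described in this document.
PHRASE_FEATURES = ["refund", "credit card"]
# An ordered pattern: "cancel" must appear before "subscription".
PATTERN_FEATURES = [("cancel", "subscription")]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def has_phrase(tokens, phrase):
    """Return True if the words of `phrase` appear contiguously in `tokens`."""
    words = phrase.lower().split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def has_ordered_pattern(tokens, pattern):
    """Return True if every word in `pattern` appears in the given order,
    with any number of other tokens in between."""
    pos = 0
    for word in pattern:
        try:
            pos = tokens.index(word, pos) + 1
        except ValueError:
            return False
    return True


def extract_features(text):
    """Map a text to a dict of binary (0/1) feature indicators."""
    tokens = tokenize(text)
    feats = {f"phrase:{p}": int(has_phrase(tokens, p)) for p in PHRASE_FEATURES}
    for pattern in PATTERN_FEATURES:
        name = "pattern:" + " ... ".join(pattern)
        feats[name] = int(has_ordered_pattern(tokens, pattern))
    return feats
```

For example, `extract_features("Please cancel my subscription and refund my credit card.")` sets all three indicators to 1, while `extract_features("My subscription? I did not cancel it.")` sets the pattern indicator to 0 because the words appear in the opposite order. A classifier could consume such indicators alongside the labeled examples.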
Duet suggests features when the model has conflicts. The suggestions come from a proprietary algorithm and correlate highly with those conflicts. To avoid overfitting to the training data, Duet uses deep learning to suggest features from external sources, based on the context in which those features appear in the training data. Duet also offers repositories of manually curated features for specific domains through the Duet template library. Users express their domain knowledge by reviewing and editing the suggested features. This gives the model a way to encode the user's expertise as features, which helps it generalize to unseen data in the domain once it is deployed.
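Duet's suggestion algorithm is proprietary and is not described here, so the following Python sketch only illustrates the general idea of ranking candidate words by how strongly they are associated with conflicts. Everything in it is an assumption made for illustration: the `rank_candidates` function, the input format, the scoring rule (a simple difference in document frequency), and the reading of a "conflict" as an example flagged by some upstream check. It does not cover suggestions drawn from external sources or from the template library.

```python
from collections import Counter


def rank_candidates(examples):
    """Rank words by how much more often they occur in conflicting examples.

    `examples` is a list of (tokens, is_conflict) pairs. This is a
    hypothetical input format, not one defined by Duet.
    Score = share of conflicting examples containing the word
            minus share of non-conflicting examples containing it.
    """
    conflict_docs = [set(tokens) for tokens, flag in examples if flag]
    other_docs = [set(tokens) for tokens, flag in examples if not flag]

    # Count each word at most once per example (document frequency).
    conflict_counts = Counter(w for doc in conflict_docs for w in doc)
    other_counts = Counter(w for doc in other_docs for w in doc)

    scores = {}
    for word, count in conflict_counts.items():
        in_conflict = count / len(conflict_docs)
        in_other = other_counts[word] / len(other_docs) if other_docs else 0.0
        scores[word] = in_conflict - in_other

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


examples = [
    (["refund", "my", "order"], True),
    (["refund", "the", "charge"], True),
    (["track", "my", "order"], False),
    (["cancel", "my", "order"], False),
]
ranked = rank_candidates(examples)
```

With this toy data, "refund" ranks first because it appears in every conflicting example and in none of the others. In the workflow described above, a ranked list like this would be a starting point that the user reviews and edits, not a final feature set.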