Configure a Machine Learning Data Pattern

Aperture uses supervised machine learning algorithms to classify sensitive documents into three top-level categories: Financial, Legal, and Healthcare. A top-level category may contain documents that also fall into a sub-category. For example, a financial accounting document belongs to a sub-category of the Financial top-level category.
The Palo Alto Networks Data Science team collects a large number of documents for each category to serve as the foundation for classification. The labeled data is then split into training, testing, and verification data sets. The training data set is used to learn the classification model, the testing data set is used to tune the model, and the verification data set is used to evaluate the model.
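The three-way split described above can be sketched as follows. This is a minimal illustration only, not the procedure the Data Science team uses: the function name, the 70/15/15 proportions, the fixed seed, and the sample documents are all assumptions made for the example.

```python
import random


def split_labeled_documents(documents, train_pct=70, test_pct=15, seed=0):
    """Shuffle labeled documents and split them into three data sets.

    The percentages are hypothetical; whatever remains after the training
    and testing sets are taken becomes the verification set.
    """
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable

    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100

    train = shuffled[:n_train]                  # used to learn the model
    test = shuffled[n_train:n_train + n_test]   # used to tune the model
    verify = shuffled[n_train + n_test:]        # used to evaluate the model
    return train, test, verify


# Hypothetical labeled data: (document id, top-level category) pairs.
labeled = [
    (f"doc-{i}", label)
    for i, label in enumerate(["Financial", "Legal", "Healthcare"] * 10)
]
train, test, verify = split_labeled_documents(labeled)
print(len(train), len(test), len(verify))
```

Keeping the verification set separate from the data used for tuning means the final evaluation is made on documents the model has not been adjusted against.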
Preprocessing the labeled training data generates features. The feature text is tokenized into word n-grams, and stop words, special characters, punctuation, and similar noise are removed. The classifier represents the features using a vector space model, which yields a high-dimensional document-feature matrix, and then identifies the significant features to reduce the matrix dimension. For each significant feature, Aperture computes a term frequency-inverse document frequency (TF-IDF) weight and normalizes the weight to remove the effect of differing document lengths. At the end of preprocessing, the labeled documents have been transformed into labeled feature vectors that are fed into the supervised machine learning algorithms.
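The preprocessing steps above can be illustrated with a small, self-contained sketch. It is not Aperture's implementation: the stop-word list, the n-gram range, the unsmoothed IDF formula log(N / df), the L2 length normalization, and all function names are assumptions chosen for the example, and the feature-selection step that reduces the matrix dimension is omitted.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list only; a real list would be far larger.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "for", "in"}


def tokenize(text, max_n=2):
    """Lowercase the text, drop punctuation and stop words, emit word n-grams."""
    words = [
        w for w in re.findall(r"[a-z0-9]+", text.lower())
        if w not in STOP_WORDS
    ]
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return grams


def tfidf_vectors(documents, max_n=2):
    """Return one {feature: weight} dictionary per document."""
    token_lists = [tokenize(doc, max_n) for doc in documents]
    n_docs = len(token_lists)

    # Document frequency: the number of documents containing each feature.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        weights = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Scale to unit length so long and short documents are comparable.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {term: w / norm for term, w in weights.items()}
        vectors.append(weights)
    return vectors


docs = [
    "The quarterly balance sheet",
    "The patient medical record",
    "The contract of sale",
]
vectors = tfidf_vectors(docs)
```

Each resulting dictionary is one row of the document-feature matrix. Pairing a row with its category label gives a labeled feature vector of the kind a supervised classifier is trained on.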
To improve detection rates for sensitive data in your organization, you can define machine learning data pattern match criteria that identify these sensitive assets in your cloud apps and protect them from exposure. By default, the machine learning category is enabled and applied to all of your cloud apps. To change this setting, you must be an administrator with the Super Admin role or an Admin with access to All Apps.
  1. Define the machine learning data pattern settings.
    1. Select Policy, and then select the data pattern you want to view in the Rule Name column.
    2. Add the data pattern Match Criteria by Rule Type.
    3. Save your setting.
  2. Enable or disable the machine learning data pattern.
    By default, the machine learning data pattern is enabled. If you have a Super Admin account or an Admin account with access to All Apps, you can disable a machine learning data pattern in Settings.
    1. Select Settings > Machine-learning Categories.
    2. Enable or disable the data pattern by clicking the on/off toggle.
    3. Save your setting.
