Configure a Machine Learning Data Pattern

Learn how to configure a machine learning data pattern on SaaS Security API.
SaaS Security API uses supervised machine learning algorithms to classify sensitive documents into three top-level categories: Financial, Legal, and Healthcare. Documents in a top-level category may also be classified into sub-categories; for example, a financial accounting document falls into a sub-category of the Financial top-level category.
The Palo Alto Networks Data Science team collects a large number of documents for each category to serve as the foundation for classification. The labeled data is then split into training, testing, and verification data sets. The training data set is used to learn the classification model, the testing data set is used to tune the model, and the verification data set is used to evaluate the model.
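The three-way split described above can be sketched as follows. This is only an illustration: the 70/15/15 proportions, the function name `split_labeled_docs`, and the sample documents are assumptions, not the values or code that SaaS Security API uses.

```python
import random

def split_labeled_docs(docs, train_frac=0.7, test_frac=0.15, seed=0):
    """Shuffle labeled documents and split them into train/test/verify sets.

    The fractions are illustrative assumptions; whatever remains after the
    training and testing slices becomes the verification set.
    """
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    verify = shuffled[n_train + n_test:]
    return train, test, verify

# Hypothetical labeled corpus: (document id, top-level category) pairs.
labeled = [(f"doc-{i}", label)
           for i, label in enumerate(["Financial", "Legal", "Healthcare"] * 10)]
train, test, verify = split_labeled_docs(labeled)
```

Keeping the three sets disjoint matters: a document used to learn or tune the model should not also be used to evaluate it.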
Preprocessing the labeled training data generates features: the text is tokenized into n-gram words, and stop words, special characters, punctuation, and similar noise are removed. The classifier converts the features using a vector space model into a high-dimensional document-feature matrix, then identifies the significant features to reduce the matrix dimension. For each significant feature, SaaS Security API computes a term frequency-inverse document frequency (TF-IDF) weight and normalizes it to remove the effect of differing document lengths. At the end of preprocessing, the labeled documents have been transformed into labeled feature vectors that are fed into the supervised machine learning algorithms.
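To make the preprocessing steps concrete, here is a minimal sketch of n-gram tokenization, stop-word removal, and length-normalized TF-IDF weighting. It assumes the textbook weight tf × log(N / df) with L2 normalization and a tiny hand-picked stop-word list, and it skips the feature-selection step; the function names and sample documents are hypothetical, and the actual SaaS Security API pipeline may differ.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the product's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "for", "is", "in"}

def tokenize(text, n=2):
    """Lowercase, drop punctuation and stop words, then emit 1-grams through n-grams."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOP_WORDS]
    grams = []
    for size in range(1, n + 1):
        grams += [" ".join(words[i:i + size])
                  for i in range(len(words) - size + 1)]
    return grams

def tfidf_vectors(docs, n=2):
    """Return one L2-normalized TF-IDF weight dict per document."""
    counts_per_doc = [Counter(tokenize(d, n)) for d in docs]
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for counts in counts_per_doc for term in counts)
    vectors = []
    for counts in counts_per_doc:
        weights = {t: tf * math.log(len(docs) / doc_freq[t])
                   for t, tf in counts.items()}
        # Dividing by the L2 norm removes the effect of document length.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

sample_docs = [
    "The quarterly balance sheet and income statement.",
    "The patient diagnosis and treatment plan.",
    "The balance of the contract terms.",
]
vectors = tfidf_vectors(sample_docs)
```

Each resulting dict maps an n-gram to its normalized weight, so a term that appears in every document gets a weight of zero, while a term concentrated in one document gets a relatively large weight.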
To improve detection rates for sensitive data in your organization, you can define the machine learning data pattern match criteria to identify these sensitive assets in your cloud apps and protect them from exposure. By default, the machine learning category is always enabled and is applied to all your cloud apps. To change this setting, you must be an administrator with a Super Admin role or an Admin with access to All Apps.
  1. Enable or disable the machine learning data pattern.
    The machine learning data pattern is enabled by default. If you have a Super Admin account or an Admin account with access to All Apps, you can disable it in Settings.
    1. Select Machine-learning Categories.
    2. Enable or disable the data pattern by clicking the on/off toggle.
    3. Save your setting.
