Learn how to configure a machine learning data pattern
on SaaS Security API.
SaaS Security API uses supervised machine
learning algorithms to sort sensitive documents into the Financial, Legal,
and Healthcare top-level categories. A document in a top-level category
can also classify into a sub-category; for example, a financial accounting
document classifies into a sub-category of the Financial top-level
category.
The Palo Alto Networks Data Science team collects
large numbers of documents for each category that serve as the foundation for
classification. The labeled data is then split into training, testing,
and verification data sets. The training data set is used to learn the
classification model, the testing data set is used to tune the
model, and the verification data set is used to evaluate the model.
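As an illustration of the three-way split described above, here is a minimal Python sketch. It is not the Data Science team's actual procedure: the function name `split_labeled_data`, the 70/15/15 proportions, the fixed seed, and the sample corpus are all assumptions chosen for the example.

```python
import random

def split_labeled_data(documents, train_pct=70, test_pct=15, seed=0):
    """Shuffle labeled documents and split them into three disjoint sets.

    The percentages are illustrative assumptions; whatever remains after
    the training and testing sets becomes the verification set.
    """
    if train_pct <= 0 or test_pct <= 0 or train_pct + test_pct >= 100:
        raise ValueError("percentages must be positive and sum to less than 100")
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100
    train = shuffled[:n_train]                   # used to learn the model
    test = shuffled[n_train:n_train + n_test]    # used to tune the model
    verify = shuffled[n_train + n_test:]         # used to evaluate the model
    return train, test, verify

# Hypothetical labeled corpus of (text, category) pairs.
CATEGORIES = ("Financial", "Legal", "Healthcare")
labeled = [(f"document {i}", CATEGORIES[i % 3]) for i in range(20)]

train, test, verify = split_labeled_data(labeled)
print(len(train), len(test), len(verify))
```

Keeping the verification set separate from the set used for tuning means the final evaluation is made on documents that influenced neither training nor tuning.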
The labeled training data generates features, and the feature text
is tokenized into word n-grams and processed to remove stop words,
special characters, punctuation, and so on. The classifier converts the
features using a vector space model and generates a high-dimensional
document-feature matrix, then identifies significant features to
reduce the matrix dimension. For each significant feature, SaaS
Security API computes a term frequency-inverse document frequency
(TF-IDF) weight, and the weight is normalized to remove the effects
of different document lengths. At the end of data preprocessing,
the labeled documents are transformed into labeled feature vectors
that feed into the supervised machine learning algorithms.
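The preprocessing steps above can be sketched in a few lines of Python. This is a simplified illustration, not the SaaS Security API implementation: the stop-word list, the smoothed IDF formula, the use of L2 normalization, and the function names `tokenize` and `tfidf_vectors` are assumptions, and the feature-selection step is omitted for brevity.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much larger).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "for", "in"}

def tokenize(text, n=2):
    """Lowercase, drop punctuation and stop words, then emit 1..n word n-grams."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOP_WORDS]
    grams = []
    for size in range(1, n + 1):
        grams.extend(" ".join(words[i:i + size])
                     for i in range(len(words) - size + 1))
    return grams

def tfidf_vectors(docs, n=2):
    """Return one L2-normalized {feature: weight} dict per document."""
    tokenized = [tokenize(doc, n) for doc in docs]
    # Document frequency: in how many documents each feature appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    total = len(docs)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Smoothed IDF (one common variant, assumed here) keeps weights positive.
        weights = {
            term: count * (math.log((1 + total) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Dividing by the vector length removes the effect of document length.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

# Hypothetical sample documents.
docs = [
    "The invoice total is due to the accounting department.",
    "The patient record is stored for the clinic.",
]
vectors = tfidf_vectors(docs)
print(sorted(vectors[0])[:3])
```

Because each vector is scaled to unit length, a document that repeats the same text twice gets the same unigram weights as the shorter original, which is the point of the length normalization step.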
To improve detection rates for sensitive data in your organization, you can
define the machine learning data pattern match criteria to identify
these sensitive assets in your cloud apps and protect them from
exposure. By default, the machine learning category is always enabled
and is applied to all your cloud apps. To change this setting, you
must be an administrator with a Super Admin role or an Admin with
access to All Apps.
Enable or disable the machine learning data pattern.
By default, the machine learning data pattern is always
enabled. If you have a Super Admin account or an Admin account with access
to All Apps, you can disable a machine learning data pattern in
Enable the data pattern by clicking the on/off toggle.