Shadow Data Discovery

Shadow Data Discovery enables Enterprise Data Loss Prevention (E-DLP) to detect shadow data within your organization.

Where Can I Use This?	What Do I Need?
Strata Cloud Manager	Data Security license; Enterprise DLP license; or any of the following licenses that include the Enterprise DLP and Data Security licenses: Prisma Access CASB license, Next-Generation CASB for Prisma Access and NGFW (CASB-X) license, Data Security license

Contact Palo Alto Networks to enable Shadow Data Discovery on your tenant.

Shadow data refers to the sensitive information that exists within your organization but remains unidentified and unprotected by your current data loss prevention systems. This data includes dynamically and rapidly generated unstructured content such as:

Research and development data with prototypes, designs, and patents.
Email communications between investment bankers using jargon or coded language to exchange insider trading information.
IoT device logs or custom telemetry data.
Financial documents from mergers and acquisitions.
Confidential documents concerning private partnerships, earnings reports, or code repositories.
Feedback forms containing customer complaints, dissatisfaction, or potential issues that could compromise your organization.

Data security administrators face a significant challenge in identifying and protecting sensitive information in dynamic, irregular, and contextual data repositories. In many cases, they can't configure pattern-based matching for every type of sensitive data because a pattern definition lacks the complete contextual understanding of a document or payload. Newer machine learning techniques such as trainable classifiers can grasp contextual nuances, but they depend on manual uploads of known sensitive documents that might not always be available, known, or sufficient for training. These limitations create uncertainty in your data security posture, allowing sensitive information to go undetected and leaving it exposed to significant risk.
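The limitation of pattern-based matching described above can be illustrated with a minimal Python sketch. The regular expression and both sample strings are hypothetical and are not Enterprise DLP data patterns; they only show that a fixed pattern catches structured values but has no view of a document's context.

```python
import re

# Hypothetical pattern (an assumption for illustration only): a 16-digit
# card-like number, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def pattern_match(text: str) -> bool:
    """Return True if the text contains a card-like number."""
    return CARD_PATTERN.search(text) is not None


# Structured sensitive data: the pattern finds it.
structured = "Customer card on file: 4111 1111 1111 1111"

# Contextual sensitive data: nothing in it fits a fixed pattern.
contextual = (
    "Draft term sheet: we intend to acquire the target at a premium "
    "before the earnings call. Do not forward."
)

print(pattern_match(structured))  # True
print(pattern_match(contextual))  # False
```

The second string is arguably the more sensitive of the two, yet the pattern does not flag it because its sensitivity comes from meaning rather than format.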

Shadow Data Discovery enables your data security administrators to analyze your organization's data to identify patterns and categories without requiring you to predefine what constitutes sensitive information. When you enable this feature, Enterprise Data Loss Prevention (E-DLP) runs summarization and categorization on your nonsensitive file-based assets, collecting summaries for up to 100,000 documents before training categorization models. Once trained, these models create categories with descriptions that help you understand what types of data exist in your environment. This approach helps you discover and protect sensitive data that would otherwise remain hidden in your environment.
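As a rough illustration of the collect-then-train flow described above, the following Python sketch gathers document summaries until a cap is reached. Only the 100,000 figure comes from the text; the `SummaryCollector` class, the `summarize` placeholder, and the sample documents are hypothetical and do not reflect how Enterprise DLP is implemented.

```python
from dataclasses import dataclass, field
from typing import List

MAX_SUMMARIES = 100_000  # cap stated in the documentation text


def summarize(text: str, max_words: int = 12) -> str:
    """Placeholder summarizer (assumption): keep the first few words."""
    return " ".join(text.split()[:max_words])


@dataclass
class SummaryCollector:
    """Hypothetical holder for summaries gathered before model training."""

    cap: int = MAX_SUMMARIES
    summaries: List[str] = field(default_factory=list)

    def add(self, summary: str) -> bool:
        """Store a summary; return False once the cap has been reached."""
        if len(self.summaries) >= self.cap:
            return False
        self.summaries.append(summary)
        return True


# Demonstration with a tiny cap so the cutoff is visible.
documents = [
    "Quarterly earnings draft for internal review only",
    "Prototype design notes for the next hardware revision",
    "Telemetry export from the building sensors",
]
collector = SummaryCollector(cap=2)
accepted = [collector.add(summarize(doc)) for doc in documents]
print(accepted)  # [True, True, False]
```

In this sketch, whatever has been collected when the cap is hit would be the input to the training step; the training itself is out of scope here.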

Shadow Data Discovery supports file-based scanning for the following SaaS apps onboarded to Data Security (SaaS API):

Amazon Simple Storage Service (S3)
Atlassian Confluence
Azure Disk Storage
Bitbucket
Box
Citrix ShareFile
Confluence Data Center
Dropbox
GitHub
Google Base
Google Cloud Storage
Google Drive
Office 365
Quip
Workday HCM

Start a Shadow Data Discovery Scan	The Shadow Data Discovery process enables Enterprise DLP to analyze documents at rest in the apps you onboarded to Data Security. Enterprise DLP uses machine learning to analyze the documents and automatically group them into natural categories based on each document's contents. Through hierarchical clustering, Enterprise DLP creates meaningful categories and subcategories that reflect how your organization actually structures its documents and information, rather than relying on predefined templates.
Analyze Discovered Shadow Data	After Enterprise DLP successfully scans your organization's shadow data, your data security administrators can analyze the Shadow Data Discovery results, including the AI-generated categories and document clusters, to understand what types of data exist in your organization.
Remediate Discovered Shadow Data	After you analyze the discovered shadow data, take remediating action by creating a category-specific custom document type based on the discovered files. You can use these custom document types in data profiles to prevent exfiltration of sensitive data the next time a Shadow Data Discovery scan occurs.
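To make the idea of hierarchical clustering into categories and subcategories more concrete, here is a minimal, standard-library Python sketch. It is an assumption-laden illustration, not the Enterprise DLP algorithm: it uses word-overlap (Jaccard) distance, average-linkage agglomerative merging, and two hypothetical thresholds, with a loose one for categories and a tighter one for subcategories. All function names and sample documents are invented for the example.

```python
from itertools import combinations


def tokenize(text):
    """Represent a document as its set of lowercase words."""
    return set(text.lower().split())


def jaccard_distance(a, b):
    """0.0 for identical word sets, 1.0 for sets with no shared words."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_distance(docs, left, right):
    """Average-linkage distance between two clusters of document indices."""
    total = sum(jaccard_distance(docs[i], docs[j]) for i in left for j in right)
    return total / (len(left) * len(right))


def agglomerate(docs, indices, threshold):
    """Merge the closest clusters until none are within the threshold."""
    clusters = [[i] for i in indices]
    while len(clusters) > 1:
        distance, a, b = min(
            (average_distance(docs, clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if distance > threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


def build_hierarchy(texts, category_threshold=0.8, subcategory_threshold=0.5):
    """Map each category (tuple of indices) to its list of subcategories."""
    docs = [tokenize(t) for t in texts]
    categories = agglomerate(docs, range(len(docs)), category_threshold)
    return {
        tuple(cat): agglomerate(docs, cat, subcategory_threshold)
        for cat in categories
    }


SAMPLE_DOCS = [
    "merger term sheet acquisition price",
    "merger term sheet acquisition timeline",
    "merger due diligence acquisition report",
    "sensor telemetry log temperature reading",
    "sensor telemetry log pressure reading",
]

hierarchy = build_hierarchy(SAMPLE_DOCS)
print(hierarchy)
# {(0, 1, 2): [[0, 1], [2]], (3, 4): [[3, 4]]}
```

In this toy run, the three merger documents form one category that splits into a term-sheet subcategory and a due-diligence subcategory, while the two telemetry documents form a second category with a single subcategory. No category names were defined in advance; the grouping follows from the document contents alone.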
