Shadow Data Discovery
Shadow Data Discovery enables Enterprise Data Loss Prevention (E-DLP) to detect shadow data within
your organization.
Where Can I Use This?
- Strata Cloud Manager
What Do I Need?
- Data Security license
- Enterprise DLP license
Or any of the following licenses that include the Enterprise DLP and Data Security licenses:
- Prisma Access CASB license
- Next-Generation CASB for Prisma Access and NGFW (CASB-X) license
- Data Security license
Shadow data refers to the sensitive information that exists within your organization but
remains unidentified and unprotected by your current data loss prevention systems. This
data includes dynamically and rapidly generated unstructured content such as:
- Research and development data with prototypes, designs, and patents.
- Email communications between investment bankers using jargon or coded language to
  exchange insider trading information.
- IoT device logs or custom telemetry data.
- Financial documents from mergers and acquisitions.
- Confidential documents concerning private partnerships, earnings reports, or code
  repositories.
- Feedback forms containing customer complaints, dissatisfaction, or potential
  issues that could compromise your organization.
Data security administrators face a significant challenge in identifying and protecting
sensitive information within your dynamic, irregular, and contextual data repositories.
In many cases, data security administrators can't configure pattern-based matching for
all types of sensitive data because a pattern definition lacks complete contextual
understanding of a document or payload. Newer machine learning techniques such as
trainable classifiers can grasp contextual nuances, but they depend on manual uploads
of known sensitive documents, which might not always be available, known, or
sufficient for training. These limitations create uncertainties in your data security
posture, allowing sensitive information to go undetected and leaving it vulnerable to
significant risks.
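To illustrate the pattern-matching limitation described above, here is a minimal Python sketch. The regular expression and both sample texts are hypothetical and are not Enterprise DLP data patterns; the point is only that a fixed pattern catches a well-formed identifier but has no way to recognize sensitive content that is expressed purely through context.

```python
import re

# Hypothetical pattern shaped like a US Social Security number.
# This is an illustration only, not an Enterprise DLP data pattern.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def pattern_match(text: str) -> bool:
    """Return True when the text contains a string shaped like an SSN."""
    return SSN_PATTERN.search(text) is not None


# Structured sensitive data: the identifier has a predictable shape.
structured = "Employee record: SSN 123-45-6789, start date 2021-03-01."

# Contextual sensitive data: nothing in the text has a predictable shape.
contextual = (
    "Keep this between us until the announcement: the board approved "
    "the acquisition, and the offer is well above the current share price."
)

print(pattern_match(structured))  # True
print(pattern_match(contextual))  # False
```

The second text is arguably the more damaging leak, yet no regular expression written in advance would reliably flag it.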
Shadow Data Discovery enables your data security administrators to analyze your
organization's data to identify patterns and categories without requiring you to
predefine what constitutes sensitive information. When you enable this feature, Enterprise Data Loss Prevention (E-DLP) runs summarization and categorization on your nonsensitive
file-based assets, collecting summaries for up to 100,000 documents before training
categorization models. Once trained, these models create categories with descriptions
that help you understand what types of data exist in your environment. This approach
helps you discover and protect sensitive data that would otherwise remain hidden in your
environment.
Shadow Data Discovery supports file-based scanning for the following SaaS apps
onboarded to Data Security (SaaS API):
- Amazon Simple Storage Service (S3)
- Atlassian Confluence
- Azure Disk Storage
- Bitbucket
- Box
- Citrix ShareFile
- Confluence Data Center
- Dropbox
- GitHub
- Google Base
- Google Cloud Storage
- Google Drive
- Office 365
- Quip
- Workday HCM
The Shadow Data Discovery process enables Enterprise DLP to analyze documents at rest in
the apps you onboarded to Data Security. Enterprise DLP uses machine learning to
automatically discover these documents and categorize them into natural groupings based
on the contents of each document. Through hierarchical clustering, Enterprise DLP creates
meaningful categories and subcategories that reflect how your organization actually
structures its documents and information rather than relying on predefined templates.

After Enterprise DLP successfully scans your organization's shadow data, your data
security administrators can start analyzing the Shadow Data Discovery results to learn
more about the AI-generated categories and document clusters and to understand what
types of data exist in your organization.

After you analyze the discovered shadow data, take remediating action by creating a
category-specific custom document type based on the discovered files. You can use these
custom document types in data profiles to prevent exfiltration of sensitive data the next
time a Shadow Data Discovery scan occurs.
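The hierarchical clustering mentioned in the discovery step can be pictured with a toy example. The sketch below is not how Enterprise DLP implements categorization; it assumes a simple word-overlap (Jaccard) distance and average-linkage agglomerative clustering, and the document names, texts, and thresholds are all hypothetical. It shows how cutting the same hierarchy at a loose and a tight threshold yields categories and subcategories.

```python
from itertools import combinations

# Hypothetical document summaries; real summaries would come from the scan.
DOCS = {
    "merger_term_sheet": "finance merger acquisition term sheet offer",
    "merger_due_diligence": "finance merger acquisition due diligence offer",
    "earnings_forecast": "finance earnings report revenue forecast acquisition",
    "sensor_log_temp": "sensor telemetry temperature reading device log",
    "sensor_log_pressure": "sensor telemetry pressure reading device log",
}


def tokens(text):
    """Represent a document as its set of lowercase words."""
    return set(text.lower().split())


def jaccard_distance(a, b):
    """0.0 for identical word sets, 1.0 for word sets with nothing in common."""
    return 1.0 - len(a & b) / len(a | b)


def cluster(docs, max_distance):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than max_distance.
    """
    token_sets = {name: tokens(text) for name, text in docs.items()}
    clusters = [[name] for name in docs]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(jaccard_distance(token_sets[x], token_sets[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# A loose threshold gives broad categories; a tight one gives subcategories.
categories = cluster(DOCS, max_distance=0.9)
subcategories = cluster(DOCS, max_distance=0.6)
print(categories)
print(subcategories)
```

With these toy inputs, the loose cut groups the three finance documents apart from the two sensor logs, and the tight cut splits the earnings forecast from the two merger documents. No category was defined in advance; the groupings emerge from the documents themselves.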