Shadow Data Discovery

Shadow Data Discovery enables Enterprise Data Loss Prevention (E-DLP) to detect shadow data within your organization.
Where Can I Use This?
  • Strata Cloud Manager
What Do I Need?
  • Data Security license
  • Enterprise DLP license
Or any of the following licenses that include the Enterprise DLP and Data Security licenses:
  • Prisma Access CASB license
  • Next-Generation CASB for Prisma Access and NGFW (CASB-X) license
  • Data Security license
Shadow data refers to the sensitive information that exists within your organization but remains unidentified and unprotected by your current data loss prevention systems. This data includes dynamically and rapidly generated unstructured content such as:
  • Research and development data with prototypes, designs, and patents.
  • Email communications between investment bankers using jargon or coded language to exchange insider trading information.
  • IoT device logs or custom telemetry data.
  • Financial documents from mergers and acquisitions.
  • Confidential documents concerning private partnerships, earnings reports, or code repositories.
  • Feedback forms containing customer complaints, dissatisfaction, or potential issues that could compromise your organization.
Identifying and protecting sensitive information in dynamic, irregular, and context-dependent data repositories is a significant challenge for data security administrators. In many cases, administrators can't configure pattern-based matching for every type of sensitive data, because a pattern definition can't capture the full context of a document or payload. Newer machine learning techniques, such as trainable classifiers, can grasp contextual nuances, but they depend on manual uploads of known sensitive documents, which might not always be available, known, or sufficient for training. These limitations create gaps in your data security posture, allowing sensitive information to go undetected and leaving it exposed to significant risk.
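To make the limitation of pattern-based matching concrete, here is a minimal sketch using Python's standard `re` module. The SSN-style pattern and both sample strings are illustrative assumptions, not Enterprise DLP data patterns: a regular expression reliably flags a structured identifier, but it has nothing to match in coded language whose sensitivity depends entirely on context.

```python
import re

# A typical pattern-based rule: U.S. Social Security numbers in NNN-NN-NNNN form.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical sample content for illustration only.
samples = {
    # Structured identifier: the pattern alone is enough to flag it.
    "hr_record": "Employee SSN: 123-45-6789",
    # Coded language: sensitive in context, but there is nothing for a pattern to match.
    "coded_email": "The eagle lands Tuesday. Load up before the announcement.",
}

for name, text in samples.items():
    flagged = SSN_PATTERN.search(text) is not None
    print(f"{name}: {'flagged' if flagged else 'missed'}")
```

Running it prints `hr_record: flagged` followed by `coded_email: missed`. Writing a pattern for the second case would require knowing the code words in advance, which is exactly the gap described above.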
Shadow Data Discovery lets your data security administrators analyze your organization's data to identify patterns and categories without first defining what constitutes sensitive information. When you enable this feature, Enterprise DLP runs summarization and categorization on your nonsensitive file-based assets, collecting summaries for up to 100,000 documents before it trains categorization models. The trained models then generate categories, each with a description, that show what types of data exist in your environment. This approach helps you discover and protect sensitive data that would otherwise remain hidden.
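The "collect up to 100,000 summaries, then train" flow can be pictured with the following minimal sketch. Only the 100,000 cap comes from the description above; the `SummaryCollector` class, its methods, and the `summarize` and `train` callables are hypothetical placeholders, not Enterprise DLP internals. The sketch assumes that training runs either when the cap is reached or, if the scan covers fewer documents, on whatever was collected.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

# The cap comes from the description above; everything else is hypothetical.
MAX_SUMMARIES = 100_000


@dataclass
class SummaryCollector:
    summarize: Callable[[str], str]    # placeholder for a summarization model
    train: Callable[[List[str]], Any]  # placeholder for categorization training
    limit: int = MAX_SUMMARIES
    summaries: List[str] = field(default_factory=list)
    model: Optional[Any] = None

    def add(self, document: str) -> bool:
        """Summarize one document; return True once a model has been trained."""
        if self.model is not None:
            return True  # cap already reached; later documents are not collected
        self.summaries.append(self.summarize(document))
        if len(self.summaries) >= self.limit:
            self.model = self.train(self.summaries)
        return self.model is not None

    def finish(self) -> Any:
        """Train on whatever was collected if the scan ended below the cap."""
        if self.model is None and self.summaries:
            self.model = self.train(self.summaries)
        return self.model
```

For example, `SummaryCollector(summarize=str.upper, train=len, limit=3)` returns `False` for the first two documents and `True` from the third onward, at which point `model` holds the result of training on exactly three summaries.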
Shadow Data Discovery supports file-based scanning for the following SaaS apps onboarded to Data Security (SaaS API):
  • Amazon Simple Storage Service (S3)
  • Atlassian Confluence
  • Azure Disk Storage
  • Bitbucket
  • Box
  • Citrix ShareFile
  • Confluence Data Center
  • Dropbox
  • GitHub
  • Google Base
  • Google Cloud Storage
  • Google Drive
  • Office 365
  • Quip
  • Workday HCM
The Shadow Data Discovery process analyzes documents at rest in the apps you onboarded to Data Security. Enterprise DLP uses machine learning to analyze these documents and automatically discover natural groupings among them based on their content. Through hierarchical clustering, Enterprise DLP creates meaningful categories and subcategories that reflect how your organization actually structures its documents and information, rather than relying on predefined templates.
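As a rough illustration of how hierarchical clustering yields categories and subcategories, here is a minimal pure-Python sketch. It uses bag-of-words cosine similarity and average-linkage agglomerative clustering; these are swapped-in, textbook techniques and not necessarily what Enterprise DLP uses. The function names, threshold values, and sample documents are hypothetical. A loose threshold produces top-level categories, and reclustering each category with a stricter threshold splits it into subcategories.

```python
import math
import re
from collections import Counter
from itertools import combinations
from typing import List


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for real document embeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def agglomerate(vectors: List[Counter], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering over document indices.

    Repeatedly merges the most similar pair of clusters and stops when
    no pair reaches `threshold` similarity.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # combinations() guarantees b > a, so index a stays valid
    return sorted(sorted(c) for c in clusters)


def hierarchy(docs: List[str], coarse: float = 0.1, fine: float = 0.6) -> List[List[List[int]]]:
    """Return categories, each given as a list of subcategories of document indices."""
    vectors = [vectorize(d) for d in docs]
    result = []
    for category in agglomerate(vectors, coarse):
        members = [vectors[i] for i in category]
        # Recluster inside the category with a stricter threshold.
        subs = [[category[k] for k in sub] for sub in agglomerate(members, fine)]
        result.append(subs)
    return result
```

With the hypothetical documents `"quarterly earnings report revenue"`, `"quarterly earnings report guidance"`, `"quarterly merger report acquisition"`, and `"patent design prototype"`, `hierarchy(docs, coarse=0.1, fine=0.6)` returns `[[[0, 1], [2]], [[3]]]`: a financial category split into an earnings subcategory and an M&A subcategory, plus a separate R&D category. Pairwise comparison makes this sketch quadratic or worse in the number of documents, so it only shows the idea and does not reflect how a production system would scale to 100,000 summaries.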
After Enterprise DLP scans your organization's shadow data, your data security administrators can analyze the Shadow Data Discovery results, reviewing the AI-generated categories and document clusters to understand what types of data exist in your organization.
After you analyze the discovered shadow data, take remediating action by creating a category-specific custom document type based on the discovered files. You can then use these custom document types in data profiles to prevent exfiltration of sensitive data the next time a Shadow Data Discovery scan occurs.