Supported EDM Data Set Formats

Supported source file and data type formats for encrypted Exact Data Matching (EDM) data sets.
On May 7, 2025, Palo Alto Networks is introducing new Evidence Storage and Syslog Forwarding service IP addresses to improve performance and expand availability for these services globally.
You must allow these new service IP addresses on your network to avoid disruptions for these services. Review the Enterprise DLP Release Notes for more information.
Where Can I Use This?
  • NGFW (Managed by Panorama or Strata Cloud Manager)
  • Prisma Access (Managed by Panorama or Strata Cloud Manager)
What Do I Need?
  • Enterprise Data Loss Prevention (E-DLP) license
    Review the Supported Platforms for details on the required license for each enforcement point.
Or any of the following licenses that include the Enterprise DLP license:
  • Prisma Access CASB license
  • Next-Generation CASB for Prisma Access and NGFW (CASB-X) license
  • Data Security license
The Exact Data Matching (EDM) CLI app supports CSV and TSV as source files for an encrypted EDM data set upload to Enterprise Data Loss Prevention (E-DLP). Before you upload an encrypted EDM data set to Enterprise DLP, review the supported CSV file, TSV file, and data type formatting.
Enterprise DLP uses an exact match for values that don't follow the supported data type formats listed below and for data types that have no unique formatting requirements. If a value follows a supported format, Enterprise DLP can also match differently formatted instances of that value in the scanned file. For example, if you configure an EDM filtering profile to block files that contain the Social Security number 456-12-7890, Enterprise DLP also matches instances of that number formatted as 456 12 7890 and 456.12.7890. However, if the EDM filtering profile is configured to block files that contain the Social Security number 456127890, Enterprise DLP blocks only files that contain an exact match to this number.
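The following Python sketch illustrates the matching behavior described above. It isn't the Enterprise DLP implementation; it assumes that a data set SSN written with one consistent separator (dash, space, or period) is expanded into all three separator variants, and that any other value, including the unseparated 456127890 from the example, is matched only exactly. The functions ssn_match_candidates and matches are hypothetical names.

```python
import re

# Hypothetical illustration: an SSN written as XXX-XX-XXXX, XXX XX XXXX, or
# XXX.XX.XXXX with one consistent separator. The unseparated form is left out
# because the example above treats 456127890 as exact-match only.
_SEPARATED_SSN = re.compile(r"(\d{3})([-. ])(\d{2})\2(\d{4})")


def ssn_match_candidates(dataset_value: str) -> set[str]:
    """Return the strings this sketch treats as matches for one data set value."""
    m = _SEPARATED_SSN.fullmatch(dataset_value)
    if m is None:
        # Not in a supported separated format: exact match only.
        return {dataset_value}
    area, _, group, serial = m.groups()
    # Expand to the dash, space, and period variants of the same number.
    return {sep.join((area, group, serial)) for sep in ("-", " ", ".")}


def matches(dataset_value: str, scanned_token: str) -> bool:
    """Hypothetical check: does a token from a scanned file match the data set value?"""
    return scanned_token in ssn_match_candidates(dataset_value)
```

For example, matches("456-12-7890", "456.12.7890") returns True, while matches("456127890", "456-12-7890") returns False.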
When preparing an EDM data set for upload, consider the following:
  • A header row is supported.
  • Enterprise DLP supports data sets in CSV and TSV formats.
    Palo Alto Networks recommends that data sets in CSV format adhere to the RFC 4180 standard.
  • Palo Alto Networks recommends atomic columns to ensure accurate matching of sensitive data.
    Atomic columns are columns in which each cell contains a single, discrete Data Type value. For example, your data set has an SSN column, and one of the cells in this column contains the value 123456789;098765432. In this case, Enterprise DLP inspects for all instances of 123456789;098765432 as a single SSN rather than inspecting for 123456789 and 098765432 as separate SSNs.
  • Enterprise DLP supports up to 50 individual Data Type values in a single cell.
    Data Types are data values that Enterprise DLP recognizes. If a cell contains more than 50 recognized Data Type values, Enterprise DLP processes only the first 50 values and ignores the rest.
    For example, "Today is August 02 2020" contains three data type values: "Today" and "is" are Alphabet data types, and "August 02 2020" is a Date data type.
  • Enterprise DLP supports English (Latin script) only.
  • Enterprise DLP supports the following delimiters to separate values in scanned files:
    • Caret (^)
    • Pipe (|)
    • Semicolon (;)
    • Tab (\t)
    • Tilde (~)
  • By default, Enterprise DLP supports a maximum of 30 columns and 130 million rows per EDM data set.
    For example, you have one EDM data set containing 30 columns and 4 million rows and a second EDM data set containing 6 columns and 20 million rows. Both EDM data sets are supported because neither exceeds the maximum number of columns or rows. Each also contains 120 million cells (30 × 4 million and 6 × 20 million), which is within the default per-data-set cell limit.
  • By default, Enterprise DLP supports up to 120 million cells per data set and up to 500 million cells for a single Enterprise DLP tenant across all EDM data sets uploaded to Enterprise DLP.
    Contact Palo Alto Networks Customer Support to increase the maximum number of cells supported for your Enterprise DLP tenant.
    By request, Enterprise DLP can support up to 1 billion cells per EDM data set and up to 2 billion cells per Enterprise DLP tenant across all EDM data sets uploaded to Enterprise DLP.
  • The supported file encoding schemes are UTF-8, UTF-16, ISO-8859-1, and US-ASCII.
  • The EDM CLI app removes all punctuation from data contained in the EDM data set.
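Before running the EDM CLI app, you might check a CSV data set locally against the default limits listed above. The following Python sketch uses the standard csv module. The function check_data_set, the DataSetReport fields, the decision not to count the header row, and the choice to flag any cell containing one of the listed delimiter characters as possibly non-atomic are illustrative assumptions, not behavior of the EDM CLI app. Adjust the constants if Customer Support has raised the limits for your tenant.

```python
import csv
from dataclasses import dataclass, field
from typing import Iterable

# Default per-data-set limits stated in this section.
MAX_COLUMNS = 30
MAX_ROWS = 130_000_000
MAX_CELLS = 120_000_000
# Delimiter characters listed above; a cell containing one may hold several values.
DELIMITERS = frozenset("^|;\t~")


@dataclass
class DataSetReport:
    columns: int = 0
    rows: int = 0
    cells: int = 0
    non_atomic_cells: list = field(default_factory=list)  # (data row number, column index)
    problems: list = field(default_factory=list)


def check_data_set(lines: Iterable[str], has_header: bool = True) -> DataSetReport:
    """Count columns, rows, and cells in CSV text and flag default-limit violations."""
    report = DataSetReport()
    for index, row in enumerate(csv.reader(lines)):
        report.columns = max(report.columns, len(row))
        if index == 0 and has_header:
            continue  # this sketch doesn't count the header row as data
        report.rows += 1
        report.cells += len(row)
        for col, cell in enumerate(row):
            if DELIMITERS.intersection(cell):
                report.non_atomic_cells.append((report.rows, col))
    if report.columns > MAX_COLUMNS:
        report.problems.append(f"{report.columns} columns (limit {MAX_COLUMNS})")
    if report.rows > MAX_ROWS:
        report.problems.append(f"{report.rows} rows (limit {MAX_ROWS})")
    if report.cells > MAX_CELLS:
        report.problems.append(f"{report.cells} cells (limit {MAX_CELLS})")
    return report
```

To check a file, open it with newline="" and the file's encoding (for example, with open("dataset.csv", newline="", encoding="utf-8") as f: report = check_data_set(f), where dataset.csv is a placeholder name), then review report.problems and report.non_atomic_cells.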
The EDM CLI app supports the following data type formats for EDM data sets.
  • Date
    Enterprise DLP supports spaces, dashes (-), slashes (/), periods (.), and any combination of these as separators, and can detect variations of the following formats.
    Format:
    • DD-MM-YYYY
    • DD/MM/YYYY
    • DD MM YYYY
    • MM-DD-YYYY
    • MM/DD/YYYY
    • MM.DD.YYYY
    • MM DD YYYY
    • YYYY-MM-DD
    • YYYY/MM/DD
    • YYYY MM DD
    • DD Month YYYY
    • DD-Month-YYYY
    • DD/Month/YYYY
    • DD.Month.YYYY
    Examples:
    • 8-Aug-2020
    • 13-Aug-2020
    • 13/08/2020
    • 13.08.2020
    • 13 Aug 2020
    • 13 August, 2020
    • 08 August 2020
    • 8. August 2020
    • August 8 2020
    • Sunday 08 August 2020
    Ambiguous dates: An ambiguous date is a date value that Enterprise DLP can't identify or interpret as a single, specific date. For ambiguous dates, Enterprise DLP requires an exact match to render a verdict and can't detect variations.
    Common examples include:
    • Dates where the year component has only two digits (YY instead of YYYY). For example, 23-08-20.
    • Dates where the DD and MM components are both numbers between 1 and 12. For example, 08-10-2020.
  • USA Social Security Number
    Enterprise DLP supports spaces, dashes (-), and periods (.) as separators.
    Format:
    • XXX-XX-XXXX
    • XXX XX XXXX
    • XXX.XX.XXXX
    • XXXXXXXXX
    Examples:
    • 123-45-6789
    • 123 45 6789
    • 123.45.6789
    • 123456789
  • Country Name
    Enterprise DLP requires an exact match for country names to render a verdict.
    Format:
    • Full country name
    • Country name abbreviation
    Examples:
    • US
    • USA
    • United States
    • United States of America
    • The United States of America
  • First, Middle, Last, and Full Name
    Format:
    • Uppercase letters
    • Lowercase letters
    Examples:
    • Bill
    • bill
    • Bill’s
    • bill’s
    • Bill Smith
    • bill smith
    • Bill Smith’s
    • bill smith’s
  • Medical Record Number
    Enterprise DLP requires an exact match for Medical Record Numbers to render a verdict.
    Examples: N/A
  • Member ID and Reward ID
    Enterprise DLP requires an exact match for Member IDs and Reward IDs to render a verdict.
    Examples: N/A
  • Alphabet and Alphanumeric
    Format: Numbers, uppercase letters, and lowercase letters
    Examples:
    • ABCDEFG
    • abcdefg
    • AB123CG
    • AB123cdab123cd
  • USA Driver License
    Format: Alphanumeric
    Examples:
    • E1234567
    • e1234567
  • Email
    Format: RFC 5322 (<emailprefix>@<emaildomain>)
    Examples:
    • bill@business.com
    • BILL@BUSINESS.COM
    • BILL@business.com
    • bill@BUSINESS.com
  • Bank Routing Number and Bank Account Number
    Enterprise DLP requires an exact match for bank routing numbers and bank account numbers to render a verdict.
    Examples: N/A
  • IP Address (IPv4 and IPv6)
    Enterprise DLP requires an exact match for IPv4 and IPv6 addresses to render a verdict.
    Examples: N/A
  • Numbers
    Enterprise DLP requires an exact match for all numbers to render a verdict.
    Enterprise DLP removes the plus sign (+) from positive signed integers and treats them the same as unsigned integers. Enterprise DLP doesn't remove the minus sign (-) from negative signed integers so that it can differentiate between positive and negative values.
    Examples:
    • SI Numbers: 1234, +1234, or -1234
    • Formatted Numbers: 9.00
    • Indian Number System: 12,34,567.89
  • Phone Number
    Format: Ten-digit US phone numbers only.
    The EDM CLI app removes the country code, parentheses (()), dashes (-), spaces, and periods (.).
    Examples:
    • 8001234567
    • (800)1234567
    • 1.800.123.4567
    • +1 (800)123-4567
    • 1 800 123 4567
    • +1 800 123 4567
    • +1 800 123-4567
    • 1-800-123-4567
    • 1 (800) 123-4567
    • (800)123-4567
    • (800) 123 4567
    • 800-123-4567
  • UUID
    Format: RFC 4122, 32 hexadecimal (base-16) digits. With hyphens, the total length is 36 characters.
    Examples:
    • 123e4567e89b12d3a456426614174000
    • 123e4567-e89b-12d3-a456-426614174000
  • Credit Card
    Format: Between 13 and 23 digits, including dashes (-).
    Examples:
    • 4739-5402-9061-0638
    • 4739540290610638
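The two ambiguous-date cases listed under Date can be written as a small check. This Python sketch covers only all-numeric dates with the year last (DD-MM-YYYY or MM-DD-YYYY style, with any mix of the supported separators). The function is_ambiguous_date is a hypothetical name, and the sketch illustrates the rules rather than how Enterprise DLP parses dates.

```python
import re

# Hypothetical pattern: day/month components followed by a 2- or 4-digit year,
# separated by any mix of spaces, dashes, slashes, or periods.
_YEAR_LAST_DATE = re.compile(r"(\d{1,2})[ ./-]+(\d{1,2})[ ./-]+(\d{2}|\d{4})")


def is_ambiguous_date(value: str) -> bool:
    """Return True if a numeric, year-last date matches an ambiguous-date case."""
    m = _YEAR_LAST_DATE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"not a numeric year-last date: {value!r}")
    first, second, year = m.groups()
    # Case 1: the year has only two digits, for example 23-08-20.
    if len(year) == 2:
        return True
    # Case 2: both leading components could be a day or a month, for example 08-10-2020.
    return 1 <= int(first) <= 12 and 1 <= int(second) <= 12
```

With these rules, 13/08/2020 is unambiguous because 13 can only be a day, while 08-10-2020 could be 8 October or 10 August.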
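The Phone Number entry says the EDM CLI app removes the country code and the listed punctuation, leaving a ten-digit US number. This Python sketch shows one way to perform that normalization. It assumes the country code is a leading 1 (optionally written as +1) that appears only in front of a ten-digit number. The function normalize_us_phone is a hypothetical name, not part of the EDM CLI app.

```python
import re


def normalize_us_phone(value: str) -> str:
    """Reduce a US phone number to its ten digits (hypothetical helper)."""
    # Remove parentheses, dashes, whitespace, periods, and the "+" of a country code.
    digits = re.sub(r"[()\-\s.+]", "", value)
    if not digits.isdigit():
        raise ValueError(f"unexpected characters in {value!r}")
    # Drop the US country code (1) when it precedes a ten-digit number.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError(f"not a ten-digit US number: {value!r}")
    return digits
```

Under these assumptions, every example listed for Phone Number, such as +1 (800)123-4567 and 1.800.123.4567, normalizes to 8001234567.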
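The UUID entry allows two layouts: 32 contiguous hexadecimal digits, or 36 characters when hyphens are included. This Python sketch recognizes each layout with a regular expression, assuming the hyphenated form uses the standard RFC 4122 8-4-4-4-12 grouping. The function uuid_layout is a hypothetical name, and this isn't how the EDM CLI app validates UUIDs.

```python
import re
from typing import Optional

# 32 contiguous hexadecimal digits.
_UUID_PLAIN = re.compile(r"[0-9a-fA-F]{32}")
# The same 32 digits grouped 8-4-4-4-12 with hyphens (36 characters in total).
_UUID_HYPHENATED = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def uuid_layout(value: str) -> Optional[str]:
    """Return "plain", "hyphenated", or None if neither layout fits (hypothetical helper)."""
    if _UUID_PLAIN.fullmatch(value):
        return "plain"
    if _UUID_HYPHENATED.fullmatch(value):
        return "hyphenated"
    return None
```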