Enterprise DLP
Supported EDM Data Set Formats
Supported source file and data type formats for encrypted Exact Data Matching (EDM) data
sets.
On May 7, 2025, Palo Alto Networks is introducing new Evidence Storage and Syslog Forwarding service IP
addresses to improve performance and expand availability for these services
globally.
You must allow these new service IP addresses on your network
to avoid disruptions for these services. Review the Enterprise DLP
Release Notes for more
information.
Where Can I Use This? | What Do I Need? |
---|---|
|
Or any of the following licenses that include the Enterprise DLP license
|
The Exact Data Matching (EDM) CLI app
supports CSV and TSV as source files for an encrypted EDM data set upload to Enterprise Data Loss Prevention (E-DLP). Before you upload an encrypted EDM data set to Enterprise DLP, review the supported CSV file, TSV file, and data type
formatting.
Enterprise DLP uses an Exact Match for values that don't follow the supported data type
format below or data types that have no unique formatting requirements. If a data type
follows the supported format, Enterprise DLP can match other instances of the data
type in the scanned file. For example, if you configure a DLP rule to block files that contain the social
security number 456-12-7890, Enterprise DLP also matches
instances of social security numbers formatted as 456 12 7890 and
456.12.7890. However, if you configure the DLP rule to block
files containing the social security number 456127890, Enterprise DLP only blocks files containing an exact match to this social security
number.
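The matching behavior described above can be illustrated with a short Python sketch. This is not how Enterprise DLP is implemented; the function name `match_candidates` and the regular expression are assumptions made for illustration, and the sketch models only the social security number case stated in the text.

```python
import re

# Separators the text lists for formatted social security numbers.
SEPARATORS = ("-", " ", ".")

# A value "follows the supported format" here if it looks like XXX-XX-XXXX
# with any of the listed separators (an assumption made for this sketch).
FORMATTED_SSN = re.compile(r"(\d{3})[-. ](\d{2})[-. ](\d{4})")


def match_candidates(rule_value: str) -> set[str]:
    """Return the strings a rule value would match under the behavior above."""
    match = FORMATTED_SSN.fullmatch(rule_value)
    if match is None:
        # Unformatted value: exact match only.
        return {rule_value}
    area, group, serial = match.groups()
    # Formatted value: every separator variant is a candidate.
    return {sep.join((area, group, serial)) for sep in SEPARATORS}


print(sorted(match_candidates("456-12-7890")))
print(sorted(match_candidates("456127890")))
```

In this sketch, a rule value written with separators expands to all three separator variants, while the unformatted value 456127890 yields only itself.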
When preparing an EDM data set for upload, consider the following:
- Enterprise DLP supports a header row in an EDM data set.
- Enterprise DLP supports data sets in CSV and TSV formats. Palo Alto Networks recommends that data sets in CSV format adhere to the RFC 4180 standard.
- Palo Alto Networks recommends atomic columns to ensure accurate matching of sensitive data. Atomic columns are columns in which each cell contains a single, discrete Data Type value. For example, suppose your data set has an SSN column and one of its cells contains the value 123456789;098765432. In this case, Enterprise DLP inspects for all incidents of 123456789;098765432 as a single SSN rather than inspecting for 123456789 and 098765432 as separate incidents.
- Enterprise DLP supports up to 50 individual Data Type values in a single cell. Data Types are data values recognized by Enterprise DLP. If a cell has more than 50 Data Type values, Enterprise DLP processes only the first 50 values and ignores the remaining values. For example, "Today is August 02 2020" contains three Data Type values: Today and is are Alphabet data types, and August 02 2020 is a Date data type.
- Enterprise DLP supports only English (Latin script).
- Enterprise DLP supports the following delimiters to separate values in scanned files:
- Caret (^)
- Pipe (|)
- Semicolon (;)
- Tab (\t)
- Tilde (~)
- By default, Enterprise DLP supports a maximum of 30 columns and 130 million rows per EDM data set. For example, you have one EDM data set containing 30 columns and 4 million rows and a second EDM data set containing 6 columns and 20 million rows. Enterprise DLP supports both EDM data sets because neither exceeds the maximum number of rows or columns supported.
- By default, Enterprise DLP supports up to 120 million cells per data set and up to 500 million cells for a single Enterprise DLP tenant across all EDM data sets uploaded to Enterprise DLP. By request, Enterprise DLP can support up to 1 billion cells per EDM data set and up to 2 billion cells per Enterprise DLP tenant across all EDM data sets uploaded to Enterprise DLP. Contact Palo Alto Networks Customer Support to increase the maximum number of cells supported for your Enterprise DLP tenant.
- The supported file encoding schemes are UTF-8, UTF-16, ISO-8859-1, and US-ASCII.
- The EDM CLI app removes all punctuation from data contained in the EDM data set.
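Before uploading, it can help to check a data set against the default column and cell limits listed above. The following Python sketch is an illustration only and is not part of the EDM CLI app; the names `data_set_shape` and `within_default_limits` are hypothetical, the sketch assumes every row (including a header row) counts toward the totals, and it checks only the column and cell limits.

```python
import csv
import io

# Default limits taken from the list above.
MAX_COLUMNS = 30
MAX_CELLS_PER_DATA_SET = 120_000_000


def data_set_shape(text: str, delimiter: str = ",") -> tuple[int, int]:
    """Return (columns, rows) for CSV or TSV text; use delimiter='\t' for TSV."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    columns = max((len(row) for row in rows), default=0)
    return columns, len(rows)


def within_default_limits(columns: int, rows: int) -> bool:
    """Check the default column limit and the default per-data-set cell limit."""
    return columns <= MAX_COLUMNS and columns * rows <= MAX_CELLS_PER_DATA_SET


sample = "ssn,name\n123-45-6789,Bill\n"
columns, rows = data_set_shape(sample)
print(columns, rows, within_default_limits(columns, rows))
```

Both data sets from the example above (30 columns by 4 million rows, and 6 columns by 20 million rows) come to exactly 120 million cells, so they pass this check.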
The EDM CLI app supports the following data type formats for EDM data sets.
- Alphabet and Alphanumeric
  Format: Numbers, uppercase letters, and lowercase letters
  Examples:
  - ABCDEFG
  - abcdefg
  - AB123CG
  - AB123cdab123cd
- Bank Routing Number and Bank Account Number
  Format: Enterprise DLP requires an exact match for a bank routing number and bank account number to render a verdict.
  Example: N/A
- Country Name
  Enterprise DLP requires an exact match for country names to render a verdict.
  Format:
  - Full country name
  - Country name abbreviation
  Examples:
  - US
  - USA
  - United States
  - United States of America
  - The United States of America
- Credit Card
  Format: Between 13 and 23 digits, including dashes (-).
  Examples:
  - 4739-5402-9061-0638
  - 4739540290610638
- Currency
- Enterprise DLP supports inspection of the US Dollar (USD, $), the Euro (EUR, €) and Generic Number values.
- All currencies support values from 0 to 1 trillion (1,000,000,000).
- All currencies support negative values. Enterprise DLP matches negative values against their absolute counterparts.
- Enterprise DLP recognizes currency formats when a currency symbol or code is within 5 spaces of the beginning of the number.
- Enterprise DLP treats the following formats as currencies when a symbol or code is present. As standalone values, Enterprise DLP treats these as generic numbers.
- Simple Integer with No Symbol or Code
- No Symbol or Code with Cents
- US/UK Notation with No Symbol or Code
Format | Example |
---|---|
Simple Integer with Symbol | $1000 or €1000 |
Simple Integer with Code | USD1000 or EUR1000 |
Simple Integer with Symbol and Space | $ 1000 or € 1000 |
Simple Integer with Code and Space | USD 1000 or EUR 1000 |
Symbol or Code with Cents | $1000.00 or EUR1000.00 |
Symbol or Code Comma Notation | $1,000 or EUR1,000 |
Comma Notation Symbol or Code with Cents | $1,000.00 or EUR1,000.00 |
Symbol or Code European Dot-Comma Notation | $1.000,00 or €1.000,00 |
Simple Integer with No Symbol or Code | 1000 |
No Symbol or Code with Cents | 1000.00 |
US/UK Notation with No Symbol or Code | 1,000.00 |
- Date
  Enterprise DLP supports spaces, dashes (-), slashes (/), periods (.), and any combination of these as separators. Enterprise DLP can detect variations of the formats described below.
  Enterprise DLP supports 20th and 21st century dates only. These are all dates starting on January 1, 1901.
  Legend:
  - DD: Two-digit day of the month. For example, 02.
  - MMM: Three-letter abbreviation for the month. For example, Aug.
  - MM: Two-digit month. For example, 12.
  - YYYY: Four-digit year. For example, 2020.

  Format | Example |
  ---|---|
  MM-DD-YYYY | 12-02-2020 |
  MM/DD/YYYY | 12/02/2020 |
  MM.DD.YYYY | 12.02.2020 |
  MMM-DD-YYYY | Dec-02-2020 |
  MMM/DD/YYYY | Dec/02/2020 |
  MMM.DD.YYYY | Dec.02.2020 |
  DD-MM-YYYY | 02-12-2020 |
  DD/MM/YYYY | 02/12/2020 |
  DD MM YYYY | 02 12 2020 |
  DD.MM.YYYY | 02.12.2020 |
  DD-MMM-YYYY | 02-Dec-2020 |
  DD/MMM/YYYY | 02/Dec/2020 |
  DD MMM YYYY | 02 Dec 2020 |
  DD.MMM.YYYY | 02.Dec.2020 |
  YYYY-MM-DD | 2020-12-02 |
  YYYY/MM/DD | 2020/12/02 |
  YYYY.MM.DD | 2020.12.02 |
  YYYY-MMM-DD | 2020-Dec-02 |
  YYYY/MMM/DD | 2020/Dec/02 |
  YYYY.MMM.DD | 2020.Dec.02 |
  Ambiguous Dates: An ambiguous date is a date value that Enterprise DLP can't explicitly identify or interpret as a discrete and specific DD, MM, or YYYY value. For ambiguous dates, Enterprise DLP detects and takes action on all possible interpretations of the date. When Enterprise DLP encounters an ambiguous year value, it only checks for dates in the 20th (19xx) and 21st (20xx) centuries.
  For example, consider the ambiguous date of 1/8/1975. This date can be interpreted as January 8th, 1975 or August 1st, 1975 depending on the country.
  Common examples of ambiguous dates include:
  - Dates where the year value of the date consists of 2 digits and is greater than 32.
    Check Type: Enterprise DLP checks for every possible day/month combination.
    Example: 2/06/50
    Dates Enterprise DLP Checks For: Feb 6 1950, Feb 6 2050, June 2 1950, June 2 2050
  - Dates where the day and year values are ambiguous, or the month and year values are ambiguous.
    Check Type: Enterprise DLP assumes the last pair of numbers is the year value and checks for every variation of that date.
    Example: Feb/06/12
    Dates Enterprise DLP Checks For: Feb 6 1912, Feb 6 2012
  - Dates where the day, month, and year values are all ambiguous.
    Check Type: Enterprise DLP assumes the last pair of numbers is the year value and checks for every variation of that date.
    Example: 2/06/12
    Dates Enterprise DLP Checks For: Feb 6 1912, Feb 6 2012, June 2 1912, June 2 2012
- Email
  Format: RFC 5322, <emailprefix>@<emaildomain>
  Examples:
  - bill@business.com
  - BILL@BUSINESS.COM
  - BILL@business.com
  - bill@BUSINESS.com
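The ambiguous date handling described for the Date data type can be sketched in Python as follows. This is a simplified illustration, not Enterprise DLP's actual logic: the function name `expand_ambiguous` is hypothetical, the sketch assumes the year is always the last component, and it handles only numeric or month-first values such as those in the examples.

```python
import re
from datetime import date

# Three-letter month abbreviations mapped to month numbers.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# The text states that supported dates start on January 1, 1901.
EARLIEST = date(1901, 1, 1)


def expand_ambiguous(text: str) -> list[date]:
    """List every interpretation of a date whose last component is the year."""
    first, second, year_text = re.split(r"[-/. ]", text.strip())
    if len(year_text) == 2:
        # Two-digit year: try both the 20th (19xx) and 21st (20xx) centuries.
        years = [1900 + int(year_text), 2000 + int(year_text)]
    else:
        years = [int(year_text)]
    month_key = first[:3].lower()
    if month_key in MONTHS:
        # A named month leaves only the year ambiguous.
        pairs = {(MONTHS[month_key], int(second))}
    else:
        # Numeric day and month: try both (month, day) orders.
        a, b = int(first), int(second)
        pairs = {(a, b), (b, a)}
    results = set()
    for year in years:
        for month, day in pairs:
            try:
                candidate = date(year, month, day)
            except ValueError:
                continue  # Not a real calendar date, for example month 13.
            if candidate >= EARLIEST:
                results.add(candidate)
    return sorted(results)


for value in ("2/06/50", "Feb/06/12", "1/8/1975"):
    print(value, [d.isoformat() for d in expand_ambiguous(value)])
```

For 2/06/50 the sketch yields the same four dates listed above, and for Feb/06/12 it yields the two dates that differ only in century.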
- First, Middle, Last, and Full Name
  Format:
  - Uppercase letters
  - Lowercase letters
  Examples:
  - Bill
  - bill
  - Bill’s
  - bill’s
  - Bill Smith
  - bill smith
  - Bill Smith’s
  - bill smith’s
- IP Address (IPv4 and IPv6)
  Format: Enterprise DLP requires an exact match for IPv4 and IPv6 IP addresses to render a verdict.
  Example: N/A
- Medical Record Number
  Format: Enterprise DLP requires an exact match for Medical Record Numbers to render a verdict.
  Example: N/A
- Member ID and Reward ID
  Format: Enterprise DLP requires an exact match for Member IDs and Reward IDs to render a verdict.
  Example: N/A
- Numbers
  Format: Enterprise DLP requires an exact match for all numbers to render a verdict. Enterprise DLP removes the positive sign (+) from positive signed integers and treats them the same as unsigned integers. Enterprise DLP does not remove the negative sign (-), so that it can differentiate between positive and negative signed integers.
  Examples:
  - SI Numbers: 1234, +1234, or -1234
  - Formatted Numbers: 9.00
  - Indian Number System: 12, 34, 567.89
- Phone Number
  Format: Ten-digit US phone number format only. The EDM CLI app removes the country code, parentheses (()), dashes (-), spaces, and periods (.).
  Examples:
  - 8001234567
  - (800)1234567
  - 1.800.123.4567
  - +1 (800)123-4567
  - 1 800 123 4567
  - +1 800 123 4567
  - +1 800 123-4567
  - 1-800-123-4567
  - 1 (800) 123-4567
  - (800)123-4567
  - (800) 123 4567
  - 800-123-4567
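The phone number cleanup described above can be approximated with the Python sketch below. It is an assumption-laden illustration rather than the EDM CLI app's code: the function name `normalize_us_phone` is hypothetical, and the sketch assumes the country code is a leading 1 (optionally written as +1).

```python
import re

# Characters the text says are removed, plus the "+" of a "+1" country code.
_STRIP = re.compile(r"[()\-. +]")


def normalize_us_phone(value: str) -> str:
    """Reduce a US phone number to its ten digits."""
    digits = _STRIP.sub("", value)
    if re.fullmatch(r"[0-9]+", digits) is None:
        raise ValueError(f"unexpected characters in {value!r}")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # Drop the US country code.
    if len(digits) != 10:
        raise ValueError(f"not a ten-digit US phone number: {value!r}")
    return digits


print(normalize_us_phone("+1 (800)123-4567"))
print(normalize_us_phone("1.800.123.4567"))
```

Under these assumptions, every example listed for the Phone Number data type reduces to the same ten-digit string.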
- USA Driver License
  Format: Alphanumeric
  Examples:
  - E1234567
  - e1234567
- USA Social Security Number
  Enterprise DLP supports spaces, dashes (-), and periods (.) as separators.

  Format | Example |
  ---|---|
  XXX-XX-XXXX | 123-45-6789 |
  XXX XX XXXX | 123 45 6789 |
  XXX.XX.XXXX | 123.45.6789 |
  XXXXXXXXX | 123456789 |
- UUID
  Format: RFC 4122, 32 hexadecimal (base-16) digits. If you’re using hyphens, the total is 36 characters.
  Examples:
  - 123e4567e89b12d3a456426614174000
  - 123e4567-e89b-12d3-a456-426614174000
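To check that a UUID value has the 32 hexadecimal digits described above, with or without hyphens, Python's standard `uuid` module can be used. This is a minimal sketch for preparing data, not a description of how Enterprise DLP validates UUIDs; the helper name `canonical_uuid` is hypothetical, and note that `uuid.UUID` is lenient about where hyphens appear.

```python
import uuid


def canonical_uuid(value: str) -> str:
    """Return the hyphenated 36-character form, or raise ValueError."""
    # uuid.UUID strips hyphens and requires exactly 32 hexadecimal digits.
    return str(uuid.UUID(value))


print(canonical_uuid("123e4567e89b12d3a456426614174000"))
print(canonical_uuid("123E4567-E89B-12D3-A456-426614174000"))
```

A value with fewer or more than 32 hexadecimal digits raises `ValueError`, which makes malformed entries easy to find before upload.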