Supported EDM Data Set Formats
Enterprise DLP

Supported source file and data type formats for encrypted Exact Data Matching (EDM) data sets.
Where Can I Use This?
  • NGFW (Managed by Panorama or Strata Cloud Manager)
  • Prisma Access (Managed by Panorama or Strata Cloud Manager)
What Do I Need?
  • Enterprise Data Loss Prevention (E-DLP) license
    Review the Supported Platforms for details on the required license for each enforcement point.
Or any of the following licenses that include the Enterprise DLP license:
  • Prisma Access CASB license
  • Next-Generation CASB for Prisma Access and NGFW (CASB-X) license
  • Data Security license
The Exact Data Matching (EDM) CLI application supports CSV and TSV as source files for an encrypted EDM data set upload to the DLP cloud service. Before you upload an encrypted EDM data set to the DLP cloud service, review the supported CSV file, TSV file, and data type formatting.
The DLP cloud service performs an Exact Match for values that don't follow a supported data type format below and for data types that have no unique formatting requirements. If a value follows the supported format for its data type, the DLP cloud service can also match differently formatted instances of that value in the scanned file. For example, if you configure an EDM filtering profile to block files that contain the social security number 456-12-7890, the DLP cloud service also matches instances formatted as 456 12 7890 and 456.12.7890. However, if the EDM filtering profile is configured to block files containing the social security number 456127890, only files containing an exact match to this value are blocked.
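The separator behavior described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not how the DLP cloud service is implemented: the function name `ssn_match_variants` and the regular expression are hypothetical, and the sketch covers only the three separators named in the example.

```python
from __future__ import annotations

import re

# Hypothetical pattern: an SSN written with a space, dash, or period separator.
FORMATTED_SSN = re.compile(r"^(\d{3})[-. ](\d{2})[-. ](\d{4})$")


def ssn_match_variants(configured: str) -> set[str]:
    """Return the spellings that would match a configured SSN value.

    Assumption: a value written with a supported separator matches the same
    digits written with any of the three separators; any other value is
    matched exactly, as the paragraph above describes.
    """
    match = FORMATTED_SSN.match(configured)
    if match is None:
        # No separator formatting detected: exact match only.
        return {configured}
    area, group, serial = match.groups()
    return {sep.join((area, group, serial)) for sep in ("-", " ", ".")}


print(sorted(ssn_match_variants("456-12-7890")))
print(ssn_match_variants("456127890"))
```

The first call lists the three separator spellings from the example; the second returns only the unformatted value, mirroring the exact-match case.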
When preparing an EDM data set for upload, consider the following:
  • A header row is supported.
  • Data sets in CSV and TSV formats are supported.
    It is recommended that CSV files adhere to the RFC 4180 standard.
  • Atomic columns are recommended to ensure accurate matching of sensitive data.
    Atomic columns are columns containing cells that are expected to contain a discrete or unique Data Type value. For example, your data set has an SSN column, and one of the cells in this column contains the value "123456789;098765432". In this example, the DLP cloud service inspects for all instances of 123456789;098765432 as a single SSN rather than inspecting for 123456789 and 098765432 as separate values.
  • Up to 50 individual Data Type values are supported in a single cell.
    Data Types are data values recognized by the DLP cloud service. If a cell has more than 50 Data Type values recognized by the DLP cloud service, only the first 50 values are processed and the remaining values are ignored.
    For example, "Today is August 02, 2020" contains three Data Type values: "Today" and "is" are Alphabet data types, and "August 02, 2020" is a Date data type.
  • Only English (Latin script) is supported.
  • Only the comma (,) and tab (\t) delimiters are supported.
  • By default, a maximum of 30 columns and 130 million rows are supported per EDM data set.
    For example, you have one EDM data set containing 30 columns and 4 million rows and a second EDM data set containing 6 columns and 20 million rows. Both EDM data sets are supported because each is within the maximum number of rows and columns supported.
  • By default, Enterprise DLP supports up to 120 million cells per data set and up to 500 million cells for a single Enterprise DLP tenant across all EDM data sets uploaded to the DLP cloud service.
    Contact Palo Alto Networks Customer Support to increase the maximum number of cells supported for your Enterprise DLP tenant.
    By request, Enterprise DLP can support up to 1 billion cells per EDM data set and up to 2 billion cells per Enterprise DLP tenant across all EDM data sets uploaded to the DLP cloud service.
  • The supported file encoding schemes are UTF-8, UTF-16, ISO-8859-1, and US-ASCII.
  • The EDM CLI application removes all punctuation from data contained in the EDM data set.
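As a rough pre-upload check, the default limits in the list above can be sketched in Python. This is a hypothetical helper and not part of the EDM CLI application: the name `check_edm_limits` and its parameters are assumptions, the header row is counted like any other row, and a cell is taken to be one delimited field.

```python
import csv

# Default limits quoted from the list above.
MAX_COLUMNS = 30
MAX_ROWS = 130_000_000
MAX_CELLS_PER_DATA_SET = 120_000_000


def check_edm_limits(lines, delimiter=",", max_columns=MAX_COLUMNS,
                     max_rows=MAX_ROWS, max_cells=MAX_CELLS_PER_DATA_SET):
    """Return a list of problems found; an empty list means none.

    `lines` is any iterable of text lines, such as an open file.
    Assumptions: the header row counts as a row, and a cell is one
    delimited field.
    """
    if delimiter not in (",", "\t"):
        return ["unsupported delimiter: only ',' and tab are supported"]

    rows = cells = widest = 0
    for record in csv.reader(lines, delimiter=delimiter):
        if not record:  # skip blank lines
            continue
        rows += 1
        cells += len(record)
        widest = max(widest, len(record))

    problems = []
    if widest > max_columns:
        problems.append(f"{widest} columns exceeds the limit of {max_columns}")
    if rows > max_rows:
        problems.append(f"{rows} rows exceeds the limit of {max_rows}")
    if cells > max_cells:
        problems.append(f"{cells} cells exceeds the limit of {max_cells}")
    return problems


sample = ["ssn,first_name", "123-45-6789,Bill"]
print(check_edm_limits(sample))
```

For a real file, the same function could be given a file object opened with one of the supported encodings, for example `open(path, newline="", encoding="utf-8")`.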
The EDM CLI application supports the following data type formats for EDM data sets.
Each entry below lists the Data Type, followed by its Format and then its Example.
Date
  • DD-MM-YYYY
    DD/MM/YYYY
    DD.MM.YYYY
    DD,MM,YYYY
    DD MM YYYY
  • MM-DD-YYYY
    MM/DD/YYYY
    MM.DD.YYYY
    MM,DD,YYYY
    MM DD YYYY
  • YYYY-MM-DD
    YYYY/MM/DD
    YYYY.MM.DD
    YYYY,MM,DD
    YYYY MM DD
A space, dash (-), slash (/), comma (,), period (.), and any combination of these separators are supported.
  • 2-Aug-2020
  • 02-Aug-2020
  • 02.08.2020
  • 02 Aug 2020
  • 2 August, 2020
  • 2 Aug, 2020
  • 02 August 2020
  • 2. August 2020
  • August 2, 2020
  • Aug 2, 2020
  • Sunday, August 2, 2020
  • Sunday, August 02, 2020
  • Sunday, 2 August, 2020
  • Sunday 02 August 2020
Exact Data Matching is performed for ambiguous dates.
  • 20-08-02
  • 02.08.20
  • 08/02/20
  • 08 2, 20
  • 02/08/20
  • 8/2/20
  • 2020/08/02
  • 2020-08-02
  • 02/08/2020
  • 2/08/2020
USA Social Security Number
  • XXX-XX-XXXX
  • XXX XX XXXX
  • XXX.XX.XXXX
  • XXXXXXXXX
A space, dash (-), and period (.) are supported separators.
  • 123-45-6789
  • 123 45 6789
  • 123.45.6789
  • 123456789
Country Name
  • Country full name
  • Country name abbreviation
An Exact Match is performed for a country name.
US
USA
United States
United States of America
The United States of America
First Name
Last Name
Middle Name
Full Name
Uppercase and lowercase.
Bill
bill
Bill’s
bill’s
Bill Smith
bill smith
Bill Smith’s
bill smith’s
Medical Record Number
An Exact Match is performed for a Medical Record Number.
N/A
Member ID
Reward ID
An Exact Match is performed for a Member ID and Reward ID.
N/A
Alphanumeric
Alphabet
Numbers, uppercase, and lowercase letters.
ABCDEFG
abcdefg
AB123CG
AB123cdab123cd
USA Driver License
Alphanumeric.
E1234567
e1234567
Email
RFC5322—<emailprefix>@<emaildomain>
bill@business.com
BILL@BUSINESS.COM
BILL@business.com
bill@BUSINESS.com
Bank Routing Number
Bank Account Number
An Exact Match is performed for a bank routing number and bank account number.
N/A
IP Address (IPv4 and IPv6)
An Exact Match is performed for an IPv4 and IPv6 IP address.
N/A
Numbers
An Exact Match is performed for all numbers.
The plus sign (+) on a positive signed integer is removed, so the value is treated the same as an unsigned integer. The minus sign (-) on a negative signed integer isn’t removed, to differentiate between positive and negative integers.
  • SI Numbers: 1234, +1234, or -1234
  • Formatted Numbers: 9.00
  • Indian Number System: 12, 34, 567.89
Phone Number
Ten-digit US phone number format only.
Country code, parentheses, dash, space, and dots are removed.
8001234567
(800)1234567
1.800.123.4567
+1 (800)123-4567
1 800 123 4567
+1 800 123 4567
+1 800 123-4567
1-800-123-4567
1 (800) 123-4567
(800)123-4567
(800) 123 4567
800-123-4567
UUID
RFC4122: 32 hexadecimal (base-16) digits. If you’re using hyphens, the total is 36 characters.
123e4567e89b12d3a456426614174000
123e4567-e89b-12d3-a456-426614174000
Credit Card
Between 13 and 23 digits, including dashes.
4739-5402-9061-0638
4739540290610638
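The normalization rules stated for the Phone Number and Numbers entries above can be sketched in Python. This is an illustration under stated assumptions, not the EDM CLI application's code: the function names `normalize_us_phone` and `normalize_signed_number` are hypothetical, and the sketch assumes a leading 1 on an eleven-digit value is the US country code.

```python
import re
from typing import Optional


def normalize_us_phone(value: str) -> Optional[str]:
    """Reduce a US phone number to its ten digits, or return None."""
    # Remove parentheses, dashes, spaces, dots, and plus signs.
    digits = re.sub(r"[()\-. +]", "", value)
    if not (digits.isascii() and digits.isdigit()):
        return None
    # Assumption: an eleven-digit value starting with 1 carries the US country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None


def normalize_signed_number(value: str) -> str:
    """Drop a leading plus sign; keep a leading minus sign."""
    return value[1:] if value.startswith("+") else value


for example in ("(800) 123 4567", "+1 800 123-4567", "1.800.123.4567"):
    print(normalize_us_phone(example))
print(normalize_signed_number("+1234"), normalize_signed_number("-1234"))
```

Under these assumptions, every phone number example listed above reduces to the same ten digits, which is why the differently formatted spellings can be treated as one value.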