Supported EDM Data Set Formats

Supported source file and data type formats for encrypted Exact Data Matching (EDM) data sets.
Where Can I Use This?
  • NGFW (Panorama Managed)
  • Prisma Access (Cloud Management)
  • SaaS Security
  • NGFW (Cloud Managed)
What Do I Need?
  • Enterprise Data Loss Prevention (E-DLP) license
  • NGFW (Panorama Managed): Support and Panorama device management licenses
  • Prisma Access (Cloud Management): Prisma Access license
  • SaaS Security: SaaS Security license
  • NGFW (Cloud Managed): Support and AIOps for NGFW Premium licenses
Or any of the following licenses that include the Enterprise DLP license:
  • Prisma Access CASB license
  • Next-Generation CASB for Prisma Access and NGFW (CASB-X) license
  • Data Security license
The Exact Data Matching (EDM) CLI application supports CSV and TSV as source files for an encrypted EDM data set upload to the DLP cloud service. Before you upload an encrypted EDM data set to the DLP cloud service, review the supported CSV file, TSV file, and data type formatting.
The DLP cloud service uses an Exact Match for values that do not follow a supported data type format listed below, and for data types that have no unique formatting requirements. If a value follows the supported format, the DLP cloud service can also match other supported formats of the same value in the scanned file. For example, if you configure an EDM filtering profile to block files that contain the social security number 456-12-7890, the DLP cloud service also matches instances of that social security number formatted as 456 12 7890 and 456.12.7890. However, if the EDM filtering profile is configured to block files containing the social security number 456127890, only files containing an exact match to this social security number are blocked.
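The separator behavior described above can be illustrated with a short Python sketch. It is not the DLP cloud service's implementation; the names `ssn_variants` and `matches` are hypothetical, and the logic only mirrors the two cases in the example: a value written with a supported separator also matches the other separated forms, while a value without separators is matched exactly. Whether a separated value also matches the unseparated form is not stated above, so this sketch assumes it does not.

```python
import re

# Separators named in the social security number example above.
SEPARATORS = ["-", " ", "."]
SEPARATED_SSN = re.compile(r"(\d{3})[-. ](\d{2})[-. ](\d{4})")


def ssn_variants(value: str) -> set:
    """Return the strings treated as matches for a configured value."""
    m = SEPARATED_SSN.fullmatch(value)
    if m is None:
        # No separators (or an unsupported format): exact match only.
        return {value}
    parts = m.groups()
    return {sep.join(parts) for sep in SEPARATORS}


def matches(configured: str, scanned: str) -> bool:
    """True if `scanned` counts as a match for `configured` in this sketch."""
    return scanned in ssn_variants(configured)
```

For instance, `matches("456-12-7890", "456.12.7890")` is true in this sketch, while `matches("456127890", "456-12-7890")` is false.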
When preparing an EDM data set for upload, consider the following:
  • A header row is supported.
  • Data sets in CSV and TSV formats are supported.
    CSV files should adhere to the RFC 4180 standard.
  • Atomic columns are recommended to ensure accurate matching of sensitive data.
    Atomic columns are columns in which each cell is expected to contain a single, discrete Data Type value. For example, your data set has an SSN column, and one of the cells in this column contains the value "123456789;098765432". In this example, the DLP cloud service inspects for all instances of 123456789;098765432 as a single SSN rather than inspecting for 123456789 and 098765432 as separate values.
  • Up to 50 individual Data Type values are supported in a single cell.
    Data Types are data values recognized by the DLP cloud service. If a cell has more than 50 Data Type values recognized by the DLP cloud service, only the first 50 values are processed and the remaining values are ignored.
    For example, "Today is August 02, 2020" contains three Data Type values: "Today" and "is" are Alphabet data types, and "August 02, 2020" is a Date data type.
  • Only English (Latin script) is supported.
  • Only the comma (,) and tab (\t) delimiters are supported.
  • A maximum of 120 million rows and 30 columns is supported per EDM data set.
    For example, you have one EDM data set containing 30 columns and 4 million rows and a second EDM data set containing six columns and 120 million rows. Both EDM data sets are supported because neither exceeds the maximum number of rows or columns.
  • By default, up to 500 million cells are supported for a single Enterprise DLP tenant across all EDM data sets uploaded to the DLP cloud service.
    Contact Palo Alto Networks Customer Support to increase the maximum number of cells supported for your Enterprise DLP tenant, up to a maximum of 1 billion cells.
  • The supported file encoding schemes are UTF-8, UTF-16, ISO-8859-1, and US-ASCII.
  • The EDM CLI application removes all punctuation from data contained in the EDM data set.
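Before uploading, it can help to check a source file against the limits listed above. The following Python sketch is one way to do that with the standard `csv` module. It is not part of the EDM CLI application: `check_data_set` is a hypothetical helper, the 120 million row limit follows the worked example in the list, and the sketch assumes the header row does not count as a data row or toward the cell total.

```python
import csv
import io

# Limits taken from the list above (per data set, and per tenant by default).
MAX_ROWS = 120_000_000
MAX_COLUMNS = 30
DEFAULT_TENANT_CELLS = 500_000_000


def check_data_set(stream, delimiter=",", has_header=True):
    """Count rows, columns, and cells in a CSV or TSV text stream.

    Hypothetical helper: returns (rows, columns, cells, problems).
    """
    if delimiter not in (",", "\t"):
        raise ValueError("only comma and tab delimiters are supported")
    problems = []
    rows = columns = cells = 0
    for index, record in enumerate(csv.reader(stream, delimiter=delimiter)):
        if index == 0:
            # The first record fixes the expected column count.
            columns = len(record)
            if columns > MAX_COLUMNS:
                problems.append(f"{columns} columns exceeds {MAX_COLUMNS}")
            if has_header:
                continue
        if len(record) != columns:
            problems.append(
                f"record {index + 1}: expected {columns} fields, "
                f"found {len(record)}"
            )
        rows += 1
        cells += len(record)
    if rows > MAX_ROWS:
        problems.append(f"{rows} rows exceeds {MAX_ROWS}")
    if cells > DEFAULT_TENANT_CELLS:
        problems.append(f"{cells} cells exceeds the default tenant limit")
    return rows, columns, cells, problems


if __name__ == "__main__":
    sample = io.StringIO("SSN,Last Name\n123-45-6789,Smith\n987-65-4321,Jones\n")
    print(check_data_set(sample))
```

For a file on disk, pass a stream opened with `open(path, newline="", encoding="utf-8")`, substituting another supported encoding where needed, and use `delimiter="\t"` for TSV files. The cell check here covers a single data set only; the tenant limit applies across all uploaded data sets.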
The EDM CLI application supports the following data type formats for EDM data sets.
Each data type below is listed with its supported format and examples.
Data Type: Date
Format:
  • DD-MM-YYYY
    DD/MM/YYYY
    DD.MM.YYYY
    DD,MM,YYYY
    DD MM YYYY
  • MM-DD-YYYY
    MM/DD/YYYY
    MM.DD.YYYY
    MM,DD,YYYY
    MM DD YYYY
  • YYYY-MM-DD
    YYYY/MM/DD
    YYYY.MM.DD
    YYYY,MM,DD
    YYYY MM DD
  A space, dash (-), slash (/), comma (,), period (.), and any combination of these separators are supported.
Example:
  • 2-Aug-2020
  • 02-Aug-2020
  • 02.08.2020
  • 02 Aug 2020
  • 2 August, 2020
  • 2 Aug, 2020
  • 02 August 2020
  • 2. August 2020
  • August 2, 2020
  • Aug 2, 2020
  • Sunday, August 2, 2020
  • Sunday, August 02, 2020
  • Sunday, 2 August, 2020
  • Sunday 02 August 2020
  Exact Data Matching is performed for ambiguous dates.
  • 20-08-02
  • 02.08.20
  • 08/02/20
  • 08 2, 20
  • 02/08/20
  • 8/2/20
  • 2020/08/02
  • 2020-08-02
  • 02/08/2020
  • 2/08/2020
Data Type: USA Social Security Number
Format:
  • XXX-XX-XXXX
  • XXX XX XXXX
  • XXX.XX.XXXX
  • XXXXXXXXX
  A space, dash (-), and period (.) are supported separators.
Example:
  • 123-45-6789
  • 123 45 6789
  • 123.45.6789
  • 123456789
Data Type: Country Name
Format:
  • Country full name
  • Country name abbreviation
  An Exact Match is performed for a country name.
Example:
  • US
  • USA
  • United States
  • United States of America
  • The United States of America
Data Type: First Name, Last Name, Middle Name, Full Name
Format:
  Uppercase and lowercase.
Example:
  • Bill
  • bill
  • Bill’s
  • bill’s
  • Bill Smith
  • bill smith
  • Bill Smith’s
  • bill smith’s
Data Type: Medical Record Number
Format:
  An Exact Match is performed for a Medical Record Number.
Example:
  N/A
Data Type: Member ID, Reward ID
Format:
  An Exact Match is performed for a Member ID and a Reward ID.
Example:
  N/A
Data Type: Alphanumeric, Alphabet
Format:
  Numbers, uppercase letters, and lowercase letters.
Example:
  • ABCDEFG
  • abcdefg
  • AB123CG
  • AB123cdab123cd
Data Type: USA Driver License
Format:
  Alphanumeric.
Example:
  • E1234567
  • e1234567
Data Type: Email
Format:
  RFC 5322: <emailprefix>@<emaildomain>
Example:
  • bill@business.com
  • BILL@BUSINESS.COM
  • BILL@business.com
  • bill@BUSINESS.com
Data Type: Bank Routing Number, Bank Account Number
Format:
  An Exact Match is performed for a bank routing number and a bank account number.
Example:
  N/A
Data Type: IP Address (IPv4 and IPv6)
Format:
  An Exact Match is performed for IPv4 and IPv6 addresses.
Example:
  N/A
Data Type: Numbers
Format:
  An Exact Match is performed for all numbers.
  A positive sign (+) is removed, so the value is treated the same as an unsigned integer. A negative sign (-) isn’t removed, to differentiate between positive and negative integers.
Example:
  • SI Numbers: 1234, +1234, or -1234
  • Formatted Numbers: 9.00
  • Indian Number System: 12, 34, 567.89
Data Type: Phone Number
Format:
  Ten-digit US phone number format only.
  Country codes, parentheses, dashes, spaces, and dots are removed.
Example:
  • 8001234567
  • (800)1234567
  • 1.800.123.4567
  • +1 (800)123-4567
  • 1 800 123 4567
  • +1 800 123 4567
  • +1 800 123-4567
  • 1-800-123-4567
  • 1 (800) 123-4567
  • (800)123-4567
  • (800) 123 4567
  • 800-123-4567
Data Type: UUID
Format:
  RFC 4122: 32 hexadecimal (base-16) digits. If you’re using hyphens, the total is 36 characters.
Example:
  • 123e4567e89b12d3a456426614174000
  • 123e4567-e89b-12d3-a456-426614174000
Data Type: Credit Card
Format:
  Between 13 and 23 characters, including dashes.
Example:
  • 4739-5402-9061-0638
  • 4739540290610638
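Several of the format rules above (Numbers, Phone Number, UUID, and Credit Card) can be approximated in a few lines of Python when pre-checking a data set. The sketch below is an illustration under stated assumptions, not the EDM CLI application's logic: all four function names are hypothetical, the hyphenated UUID is assumed to use the 8-4-4-4-12 grouping from RFC 4122, and the credit card check only tests length and character set.

```python
import re


def normalize_number(text: str) -> str:
    """Drop a leading plus sign; keep a leading minus sign."""
    text = text.strip()
    return text[1:] if text.startswith("+") else text


def normalize_us_phone(text: str):
    """Reduce a US phone number to ten digits, or return None."""
    # Remove parentheses, dashes, spaces, dots, and the plus sign.
    digits = re.sub(r"[()\-\s.+]", "", text)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the US country code
    return digits if len(digits) == 10 and digits.isdigit() else None


UUID_PLAIN = re.compile(r"[0-9a-fA-F]{32}")
UUID_HYPHENATED = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_uuid(text: str) -> bool:
    """32 hex digits, or 36 characters when hyphenated (assumed 8-4-4-4-12)."""
    return bool(UUID_PLAIN.fullmatch(text) or UUID_HYPHENATED.fullmatch(text))


def is_credit_card_shape(text: str) -> bool:
    """13 to 23 characters made up only of digits and dashes."""
    return 13 <= len(text) <= 23 and re.fullmatch(r"[0-9-]+", text) is not None
```

With these helpers, every phone number example listed above reduces to `8001234567`, and `normalize_number("+1234")` and `normalize_number("1234")` produce the same string while `normalize_number("-1234")` keeps its sign.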
