Supported source file and data type formats for encrypted Exact Data Matching (EDM) data
sets.
Where Can I Use This?
  NGFW (Panorama Managed)
  Prisma Access (Cloud Management)
  SaaS Security
  NGFW (Cloud Managed)
What Do I Need?
  Enterprise Data Loss Prevention (E-DLP) license
    NGFW (Panorama Managed): Support and Panorama device management licenses
    Prisma Access (Cloud Management): Prisma Access license
    SaaS Security: SaaS Security license
    NGFW (Cloud Managed): Support and AIOps for NGFW Premium licenses
  Or any of the following licenses that include the Enterprise DLP license
    Prisma Access CASB license
    Next-Generation CASB for Prisma Access and NGFW (CASB-X) license
    Data Security license
The Exact Data Matching (EDM) CLI
application supports CSV and TSV as source files for an encrypted EDM data
set upload to the DLP cloud service. Before you upload an encrypted EDM data set to the
DLP cloud service, review the supported CSV file, TSV file, and data type
formatting.
The DLP cloud service uses an Exact Match for values that do not follow a supported data type
format listed below and for data types that have no unique formatting requirements. If a value
follows the supported format, the DLP cloud service can also match other formatted instances
of that value in the scanned file. For example, if you configure an EDM filtering profile to
block files that contain the social security number 456-12-7890, the DLP cloud service also
matches instances of that social security number formatted as 456 12 7890 and 456.12.7890.
However, if the EDM filtering profile is configured to block files containing the social
security number 456127890, only files containing an exact match to this social security
number are blocked.
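The matching behavior described above can be pictured with a small sketch. This is only an illustration of the documented behavior, not the DLP cloud service's implementation; the function names and the regular expression are hypothetical, and the sketch assumes that a formatted value only matches other formatted instances, which the text does not state either way.

```python
import re

# Hypothetical pattern for a "formatted" SSN: three groups of digits separated
# by a dash, space, or period.
FORMATTED_SSN = re.compile(r"\d{3}[- .]\d{2}[- .]\d{4}")


def digits_only(value: str) -> str:
    """Strip every non-digit character from the value."""
    return re.sub(r"\D", "", value)


def ssn_matches(configured: str, candidate: str) -> bool:
    """Return True if candidate would match the configured SSN in this sketch."""
    if FORMATTED_SSN.fullmatch(configured):
        # Formatted value: any supported separator variant with the same digits matches.
        return (
            FORMATTED_SSN.fullmatch(candidate) is not None
            and digits_only(candidate) == digits_only(configured)
        )
    # Unformatted value: fall back to an exact string comparison.
    return candidate == configured
```

With this sketch, `ssn_matches("456-12-7890", "456.12.7890")` is true, while `ssn_matches("456127890", "456-12-7890")` is false because the unformatted value is compared exactly.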
When preparing an EDM data set for upload, consider the following:
A header row is supported.
Data sets in CSV and TSV formats are supported.
  It is recommended that CSV files adhere to the RFC 4180 standard.
Atomic columns are recommended to ensure accurate matching of sensitive data.
  Atomic columns are columns whose cells are each expected to contain a discrete or unique
  Data Type value. For example, your data set has an SSN column, and one of the cells in
  this column contains the value "123456789;098765432". In this example, the DLP cloud
  service inspects for all incidents of 123456789;098765432 as a single SSN rather than
  inspecting for 123456789 and 098765432 as separate incidents.
Up to 50 individual Data Type values are supported in a single cell.
  Data Types are data values recognized by the DLP cloud service. If a cell has more than
  50 Data Type values recognized by the DLP cloud service, only the first 50 values are
  processed and the rest are ignored. For example, "Today is August 02, 2020" contains
  three Data Type values: "Today" and "is" are Alphabet data types, and "August 02, 2020"
  is a Date data type.
Only English (Latin script) is supported.
Only the comma (,) and tab (\t) delimiters are supported.
A maximum of 120 million rows and 30 columns are supported per EDM data set.
  For example, you have one EDM data set containing 30 columns and 4 million rows and a
  second EDM data set containing six columns and 120 million rows. Both EDM data sets are
  supported because neither exceeds the maximum number of rows or columns.
By default, up to 500 million cells are supported for a single Enterprise DLP tenant across
all EDM data sets uploaded to the DLP cloud service. Up to 1 billion cells are supported for
your Enterprise DLP tenant.
The supported file encoding schemes are UTF-8, UTF-16, ISO-8859-1,
and US-ASCII.
The EDM CLI application removes all punctuation from data
contained in the EDM data set.
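Before uploading, it can help to check a source file against the considerations above. The following is a minimal sketch, not part of the EDM CLI application: the function name, the returned dictionary, and the choice to exclude the header row from the counts are all assumptions made for illustration. The limits are the 30-column and 120-million-row figures used in the example above.

```python
import csv
import io

MAX_ROWS = 120_000_000   # per-data-set row figure from the example above
MAX_COLUMNS = 30         # per-data-set column limit
SUPPORTED_DELIMITERS = {",", "\t"}


def check_data_set(stream, delimiter=",", has_header=True):
    """Count rows and cells in a CSV/TSV text stream and list limit violations."""
    if delimiter not in SUPPORTED_DELIMITERS:
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    reader = csv.reader(stream, delimiter=delimiter)
    if has_header:
        # Assumption: the header row is not counted toward the limits.
        next(reader, None)
    rows = 0
    cells = 0
    problems = []
    for rows, row in enumerate(reader, start=1):
        cells += len(row)
        if len(row) > MAX_COLUMNS:
            problems.append(
                f"data row {rows}: {len(row)} columns exceeds {MAX_COLUMNS}"
            )
    if rows > MAX_ROWS:
        problems.append(f"{rows} rows exceeds {MAX_ROWS}")
    return {"rows": rows, "cells": cells, "problems": problems}


# Usage with an in-memory sample; a real file would be opened with one of the
# supported encodings, for example open(path, newline="", encoding="utf-8").
sample = io.StringIO("SSN,First Name\n123-45-6789,Bill\n456-12-7890,Ann\n")
report = check_data_set(sample)
```

The `cells` count from each data set can be added up by the caller to track the tenant-wide cell limit, which applies across all uploaded EDM data sets rather than per file.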
The EDM CLI application supports the following data type formats
for EDM data sets.
Data Type: Date
  Format:
    DD-MM-YYYY
    DD/MM/YYYY
    DD.MM.YYYY
    DD,MM,YYYY
    DD MM YYYY
    MM-DD-YYYY
    MM/DD/YYYY
    MM.DD.YYYY
    MM,DD,YYYY
    MM DD YYYY
    YYYY-MM-DD
    YYYY/MM/DD
    YYYY.MM.DD
    YYYY,MM,DD
    YYYY MM DD
    A space, dash (-), slash (/), comma (,), period (.), and any combination of these
    separators are supported.
  Example:
    2-Aug-2020
    02-Aug-2020
    02.08.2020
    02 Aug 2020
    2 August, 2020
    2 Aug, 2020
    02 August 2020
    2. August 2020
    August 2, 2020
    Aug 2, 2020
    Sunday, August 2, 2020
    Sunday, August 02, 2020
    Sunday, 2 August, 2020
    Sunday 02 August 2020
  Format:
    Exact Data Matching is performed for ambiguous dates.
  Example:
    20-08-02
    02.08.20
    08/02/20
    08 2, 20
    02/08/20
    8/2/20
    2020/08/02
    2020-08-02
    02/08/2020
    2/08/2020

Data Type: USA Social Security Number
  Format:
    XXX-XX-XXXX
    XXX XX XXXX
    XXX.XX.XXXX
    XXXXXXXXX
    A space, dash (-), and period (.) are supported separators.
  Example:
    123-45-6789
    123 45 6789
    123.45.6789
    123456789

Data Type: Country Name
  Format:
    Country full name
    Country name abbreviation
    An Exact Match is performed for a country name.
  Example:
    US
    USA
    United States
    United States of America
    The United States of America

Data Type: First Name, Last Name, Middle Name, Full Name
  Format:
    Uppercase and lowercase.
  Example:
    Bill
    bill
    Bill’s
    bill’s
    Bill Smith
    bill smith
    Bill Smith’s
    bill smith’s

Data Type: Medical Record Number
  Format:
    An Exact Match is performed for a Medical Record Number.
  Example:
    N/A

Data Type: Member ID, Reward ID
  Format:
    An Exact Match is performed for a Member ID and a Reward ID.
  Example:
    N/A

Data Type: Alphanumeric, Alphabet
  Format:
    Numbers, uppercase letters, and lowercase letters.
  Example:
    ABCDEFG
    abcdefg
    AB123CG
    AB123cdab123cd

Data Type: USA Driver License
  Format:
    Alphanumeric.
  Example:
    E1234567
    e1234567

Data Type: Email
  Format:
    RFC 5322: <emailprefix>@<emaildomain>
  Example:
    bill@business.com
    BILL@BUSINESS.COM
    BILL@business.com
    bill@BUSINESS.com

Data Type: Bank Routing Number, Bank Account Number
  Format:
    An Exact Match is performed for a bank routing number and a bank account number.
  Example:
    N/A

Data Type: IP Address (IPv4 and IPv6)
  Format:
    An Exact Match is performed for IPv4 and IPv6 addresses.
  Example:
    N/A

Data Type: Numbers
  Format:
    An Exact Match is performed for all numbers.
    A positive sign (+) is removed, so a positively signed integer is treated the same as
    an unsigned integer. A negative sign (-) isn’t removed, so that positive and negative
    integers remain distinct.
  Example:
    SI Numbers: 1234, +1234, or -1234
    Formatted Numbers: 9.00
    Indian Number System: 12, 34, 567.89

Data Type: Phone Number
  Format:
    Ten-digit US phone number format only.
    Country code, parentheses, dashes, spaces, and dots are removed.
  Example:
    8001234567
    (800)1234567
    1.800.123.4567
    +1 (800)123-4567
    1 800 123 4567
    +1 800 123 4567
    +1 800 123-4567
    1-800-123-4567
    1 (800) 123-4567
    (800)123-4567
    (800) 123 4567
    800-123-4567

Data Type: UUID
  Format:
    RFC 4122: 32 hexadecimal (base-16) digits. If you’re using hyphens, the total is 36
    characters.
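Two of the normalizations described in the table, for phone numbers and for signed integers, can be sketched as follows. This is an illustration of the documented rules only, not the EDM CLI application's code; both function names are hypothetical, and treating a leading 1 on an 11-digit value as the US country code is an assumption.

```python
import re


def normalize_us_phone(value: str):
    """Reduce a US phone number to its ten digits, or return None if it doesn't fit."""
    # Remove parentheses, dashes, whitespace, dots, and a plus sign.
    digits = re.sub(r"[()\-\s.+]", "", value)
    # Assumption: an 11-digit value starting with 1 carries the US country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if re.fullmatch(r"[0-9]{10}", digits) else None


def normalize_integer_sign(value: str) -> str:
    """Drop a leading plus sign; keep a leading minus sign."""
    return value[1:] if value.startswith("+") else value
```

Under this sketch, every phone number example in the table reduces to `8001234567`, `+1234` becomes `1234`, and `-1234` is left unchanged.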