Supported EDM Data Set Formats
Supported source file and data type formats for encrypted
Exact Data Matching (EDM) data sets.
The Exact Data Matching (EDM) CLI application supports CSV and TSV
source files for an encrypted EDM data set upload to the DLP cloud
service. Before you upload an encrypted EDM data set to the DLP cloud
service, review the supported CSV file, TSV file, and data type formatting.
The DLP cloud service uses an Exact Match for values that do
not follow the supported data type formats below and for data types that
have no unique formatting requirements. If a data type follows the
supported format, the DLP cloud service can match other instances
of the data type in the scanned file, even when they use a different
supported separator. For example, if you configure an EDM filtering
profile to block files that contain the social security number
456-12-7890, the DLP cloud service also matches instances of social
security numbers that are formatted as 456 12 7890 and 456.12.7890.
However, if the EDM filtering profile is configured to block files
containing the social security number 456127890, only files containing
an exact match to this social security number are blocked.

When preparing an EDM data set for upload, consider the following:
- A header row is supported.
- Data sets in CSV and TSV formats are supported. It is recommended that CSV files adhere to the RFC 4180 standard.
- Atomic columns are recommended to ensure accurate matching of sensitive data. Atomic columns are columns whose cells are each expected to contain a single, discrete Data Type value. For example, suppose your data set has an SSN column and one of the cells in this column contains the value 123456789;098765432. In this example, the DLP cloud service inspects for all incidents of 123456789;098765432 as a single SSN rather than inspecting for 123456789 and 098765432 as separate incidents.
- Up to 50 individual Data Type values are supported in a single cell. Data Types are data values recognized by the DLP cloud service. If a cell has more than 50 Data Type values recognized by the DLP cloud service, only the first 50 values are processed and the rest are ignored. For example, the cell Today is August 02, 2020 contains 3 Data Type values: Today and is are Alphabet data types, and August 02, 2020 is a Date data type.
- Only English (Latin script) is supported.
- Only the comma (,) and tab (\t) delimiters are supported.
- Up to 120 million cells are supported with a maximum of 30 columns. For example, you have one EDM data set containing 30 columns and 4 million rows and a second EDM data set containing 6 columns and 20 million rows. Both EDM data sets are supported because each contains 120 million cells.
- Up to 500 million cells are supported for a single user across all EDM data sets uploaded to the DLP cloud service. Contact Palo Alto Networks Support or your Palo Alto Networks sales representative if you need this cell maximum increased.
- The supported file encoding schemes are UTF-8, UTF-16, ISO-8859-1, and US-ASCII.
- The EDM CLI application removes all punctuation from data contained in the EDM data set.
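
The size and delimiter limits in the list above can be checked before an upload. The following is a minimal sketch and is not part of the EDM CLI application: the helper name `check_edm_dataset` is hypothetical, and it assumes that the header row counts toward the cell total, which the list above does not state.

```python
import csv
import io

MAX_COLUMNS = 30          # per-data-set column limit from the list above
MAX_CELLS = 120_000_000   # per-data-set cell limit from the list above


def check_edm_dataset(stream, delimiter=","):
    """Return a list of problems found in a CSV/TSV text stream.

    Hypothetical pre-upload check; assumes the header row counts as cells.
    """
    if delimiter not in (",", "\t"):
        return [f"unsupported delimiter: {delimiter!r}"]

    problems = []
    rows = 0
    width = None
    for row in csv.reader(stream, delimiter=delimiter):
        rows += 1
        if width is None:
            width = len(row)  # the first row fixes the expected column count
        elif len(row) != width:
            problems.append(
                f"row {rows} has {len(row)} columns, expected {width}"
            )

    width = width or 0
    if width > MAX_COLUMNS:
        problems.append(f"{width} columns exceeds the limit of {MAX_COLUMNS}")
    if rows * width > MAX_CELLS:
        problems.append(f"{rows * width} cells exceeds the limit of {MAX_CELLS}")
    return problems


# Usage: a small in-memory data set with a header row.
sample = io.StringIO("SSN,First Name\n456-12-7890,Bill\n")
print(check_edm_dataset(sample))  # prints []
```

For a real file, the same function could be called with `open(path, newline="", encoding="utf-8")` as the stream, using whichever supported encoding the file was saved in.
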
The EDM CLI application supports the following data type formats
for EDM data sets.
Data Type | Format | Example |
---|---|---|
Date | A space, dash (-), slash (/), comma (,), period (.), and any combination of these separators are supported. | Exact Data Matching is performed for ambiguous dates. |
USA Social Security Number | A space, dash (-), and period (.) are supported separators. | |
Country Name | An Exact Match is performed for a country name. | US, USA, United States, United States of America, The United States of America |
First Name, Last Name, Middle Name, Full Name | Uppercase and lowercase. | Bill, bill, Bill’s, bill’s, Bill Smith, bill smith, Bill Smith’s, bill smith’s |
Medical Record Number | An Exact Match is performed for a Medical Record Number. | N/A |
Member ID, Reward ID | An Exact Match is performed for a Member ID and Reward ID. | N/A |
Alphanumeric, Alphabet | Numbers, uppercase, and lowercase letters. | ABCDEFG, abcdefg, AB123CG, AB123cd, ab123cd |
USA Driver License | Alphanumeric. | E1234567, e1234567 |
Email | RFC 5322: <emailprefix>@<emaildomain> | bill@business.com, BILL@BUSINESS.COM, BILL@business.com |
Bank Routing Number, Bank Account Number | An Exact Match is performed for a bank routing number and bank account number. | N/A |
IP Address (IPv4 and IPv6) | An Exact Match is performed for an IPv4 and IPv6 IP address. | N/A |
Numbers | An Exact Match is performed for all numbers. A positive sign (+) is removed, so a positive signed integer is treated the same as an unsigned integer. A negative sign (-) is not removed, in order to differentiate between positive and negative signed integers. | |
Phone Number | Ten-digit US phone number format only. Country code, parentheses, dashes, spaces, and dots are removed. | 8001234567, (800)1234567, 1.800.123.4567, +1 (800)123-4567, 1 800 123 4567, +1 800 123 4567, +1 800 123-4567, 1-800-123-4567, 1 (800) 123-4567, (800)123-4567, (800) 123 4567, 800-123-4567 |
UUID | RFC 4122: 32 hexadecimal (base-16) digits. If you are using hyphens, the total is 36 characters. | 123e4567e89b12d3a456426614174000, 123e4567-e89b-12d3-a456-426614174000 |
Credit Card | Between 13 and 23 digits, including dashes. | 4739-5402-9061-0638, 4739540290610638 |
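
The separator rules in the table can be pictured as a normalization step applied before values are compared. The sketch below is only an illustration of that idea for three of the data types; it is not the DLP cloud service's implementation. The function names are hypothetical, and the sketch does not model the fallback to an Exact Match for values that do not follow a supported format.

```python
import re


def normalize_ssn(value):
    """Drop the supported SSN separators: space, dash, and period."""
    return re.sub(r"[ .\-]", "", value)


def normalize_phone(value):
    """Keep digits only, then drop a leading US country code (1)."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def normalize_number(value):
    """Remove a leading '+' sign; a leading '-' sign is kept."""
    return value[1:] if value.startswith("+") else value


# Differently separated forms of the same value normalize to one string.
print(normalize_ssn("456 12 7890") == normalize_ssn("456.12.7890"))
print(normalize_phone("+1 (800)123-4567"))
print(normalize_number("+42"), normalize_number("-42"))
```

Under these assumptions, every phone number example in the table reduces to the same ten digits, which is why they can be treated as the same value.
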