Supported EDM Data Set Formats

Supported source file and data type formats for encrypted Exact Data Matching (EDM) data sets.
The Exact Data Matching (EDM) CLI application supports CSV and TSV files as the source for an encrypted EDM data set upload to the DLP cloud service. Before you upload an encrypted EDM data set, review the supported CSV file, TSV file, and data type formatting.
The DLP cloud service uses an Exact Match for values that do not follow a supported data type format listed below, and for data types that have no unique formatting requirements. If a value follows the supported format, the DLP cloud service can also match other formatted instances of the same value in the scanned file. For example, if you configure an EDM filtering profile to block files that contain the social security number 456-12-7890, the DLP cloud service also matches instances formatted as 456 12 7890 and 456.12.7890. However, if the EDM filtering profile is configured to block files containing the social security number 456127890, only files containing an exact match to this social security number are blocked.
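The social security number example can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the DLP cloud service's actual matching logic; the function name `ssn_variants` and the regular expression are hypothetical.

```python
import re

# Hypothetical illustration: a formatted SSN uses the same separator
# (dash, space, or period) between all three groups.
SSN_FORMATTED = re.compile(r"^(\d{3})([- .])(\d{2})\2(\d{4})$")

def ssn_variants(value):
    """Return the set of strings treated as matches for `value`."""
    m = SSN_FORMATTED.match(value)
    if m is None:
        # Value does not follow a separated format (for example 456127890):
        # only an exact match applies.
        return {value}
    area, _sep, group, serial = m.groups()
    # A formatted value also matches the other supported separators.
    return {sep.join((area, group, serial)) for sep in ("-", " ", ".")}
```

Calling `ssn_variants("456-12-7890")` yields the dash, space, and period forms, while `ssn_variants("456127890")` yields only the value itself.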
When preparing an EDM data set for upload, consider the following:
  • A header row is supported.
  • Only the comma (,) and tab (\t) delimiters are supported.
  • Up to 120 million cells are supported with a maximum of 30 columns.
    For example, one EDM data set containing 30 columns and 4 million rows and a second EDM data set containing 6 columns and 20 million rows are both supported, because each data set contains exactly 120 million cells and neither exceeds 30 columns.
  • The supported file encoding schemes are UTF-8, UTF-16, ISO-8859-1, and US-ASCII.
  • The EDM CLI application removes all punctuation from data contained in the EDM data set.
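The delimiter, column, and cell limits above can be checked before upload with a short script. This is a minimal sketch and not part of the EDM CLI application; the function name `check_edm_limits` is hypothetical, and it assumes that the header row counts toward the cell total, which the limits above do not specify.

```python
import csv
import io

MAX_CELLS = 120_000_000   # documented cell limit
MAX_COLUMNS = 30          # documented column limit
SUPPORTED_DELIMITERS = {",", "\t"}

def check_edm_limits(text_stream, delimiter=","):
    """Return a list of problems; an empty list means the limits are met."""
    if delimiter not in SUPPORTED_DELIMITERS:
        return [f"unsupported delimiter: {delimiter!r}"]
    widest_row = 0
    cells = 0
    # Assumption: every row, including a header row, counts toward the total.
    for row in csv.reader(text_stream, delimiter=delimiter):
        widest_row = max(widest_row, len(row))
        cells += len(row)
    problems = []
    if widest_row > MAX_COLUMNS:
        problems.append(f"{widest_row} columns exceeds {MAX_COLUMNS}")
    if cells > MAX_CELLS:
        problems.append(f"{cells} cells exceeds {MAX_CELLS}")
    return problems

# Small in-memory sample; a real file would be opened with one of the
# supported encodings, for example open(path, encoding="utf-8", newline="").
sample = io.StringIO("ssn,first_name\n456-12-7890,Bill\n")
print(check_edm_limits(sample))
```

The arithmetic in the example above follows directly: 30 × 4,000,000 and 6 × 20,000,000 both equal 120,000,000 cells.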
The EDM CLI application supports the following data type formats for EDM data sets.
Each entry lists the data type, then its supported format, then examples.
Date
  • DD-MM-YYYY
    DD/MM/YYYY
    DD.MM.YYYY
    DD,MM,YYYY
    DD MM YYYY
  • MM-DD-YYYY
    MM/DD/YYYY
    MM.DD.YYYY
    MM,DD,YYYY
    MM DD YYYY
  • YYYY-MM-DD
    YYYY/MM/DD
    YYYY.MM.DD
    YYYY,MM,DD
    YYYY MM DD
A space, dash (-), slash (/), comma (,), period (.), and any combination of these separators are supported.
  • 2-Aug-2020
  • 02-Aug-2020
  • 02.08.2020
  • 02 Aug 2020
  • 2 August, 2020
  • 2 Aug, 2020
  • 02 August 2020
  • 2. August 2020
  • August 2, 2020
  • Aug 2, 2020
  • Sunday, August 2, 2020
  • Sunday, August 02, 2020
  • Sunday, 2 August, 2020
  • Sunday 02 August 2020
An Exact Match is performed for ambiguous dates.
  • 20-08-02
  • 02.08.20
  • 08/02/20
  • 08 2, 20
  • 02/08/20
  • 8/2/20
  • 2020/08/02
  • 2020-08-02
  • 02/08/2020
  • 2/08/2020
USA Social Security Number
  • XXX-XX-XXXX
  • XXX XX XXXX
  • XXX.XX.XXXX
  • XXXXXXXXX
A space, dash (-), and period (.) are supported separators.
  • 123-45-6789
  • 123 45 6789
  • 123.45.6789
  • 123456789
Country Name
  • Country full name
  • Country name abbreviation
An Exact Match is performed for a country name.
US
USA
United States
United States of America
The United States of America
First Name
Last Name
Middle Name
Full Name
Uppercase and lowercase.
Bill
bill
Bill’s
bill’s
Bill Smith
bill smith
Bill Smith’s
bill smith’s
Medical Record Number
An Exact Match is performed for a Medical Record Number.
N/A
Member ID
Reward ID
An Exact Match is performed for a Member ID and a Reward ID.
N/A
Alphanumeric
Alphabet
Numbers, uppercase, and lowercase letters.
ABCDEFG
abcdefg
AB123CG
AB123cdab123cd
USA Driver License
Alphanumeric.
E1234567
e1234567
Email
RFC 5322: <emailprefix>@<emaildomain>
bill@business.com
BILL@BUSINESS.COM
BILL@business.com
bill@BUSINESS.com
Bank Routing Number
Bank Account Number
An Exact Match is performed for a bank routing number and bank account number.
N/A
IP Address (IPv4 and IPv6)
An Exact Match is performed for IPv4 and IPv6 addresses.
N/A
Numbers
An Exact Match is performed for all numbers.
A positive sign (+) is removed, so a positive signed integer is treated the same as an unsigned integer. A negative sign (-) is not removed, in order to differentiate between positive and negative integers.
  • SI Numbers: 1234, +1234, or -1234
  • Formatted Numbers: 9.00
  • Indian Number System: 12,34,567.89
Phone Number
Ten-digit US phone number format only.
The country code, parentheses, dashes, spaces, and dots are removed.
8001234567
(800)1234567
1.800.123.4567
+1 (800)123-4567
1 800 123 4567
+1 800 123 4567
+1 800 123-4567
1-800-123-4567
1 (800) 123-4567
(800)123-4567
(800) 123 4567
800-123-4567
UUID
RFC 4122: 32 hexadecimal (base-16) digits. With hyphens, the total is 36 characters.
123e4567e89b12d3a456426614174000
123e4567-e89b-12d3-a456-426614174000
Credit Card
Between 13 and 23 digits, including dashes.
4739-5402-9061-0638
4739540290610638
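The normalization rules listed above for phone numbers and signed numbers can be sketched as follows. This is a hedged illustration of the documented behavior, not the EDM CLI application's implementation; the function names `normalize_us_phone` and `normalize_sign` are hypothetical.

```python
import re

def normalize_us_phone(value):
    """Reduce a US phone number to its ten digits."""
    # Remove parentheses, dashes, spaces, dots, and the plus sign.
    digits = re.sub(r"[()\-\s.+]", "", value)
    # Assumption: an 11-character result starting with 1 carries the
    # US country code, which is dropped.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

def normalize_sign(value):
    """Drop a leading '+' but keep a leading '-'."""
    return value[1:] if value.startswith("+") else value

for example in ("(800)1234567", "1.800.123.4567", "+1 (800)123-4567"):
    print(example, "->", normalize_us_phone(example))
```

Under these assumptions, every phone number example listed above reduces to 8001234567, and +1234 is treated the same as 1234 while -1234 stays distinct.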
