DataMaskingWithSubstitution_Complete.pdf

7
© 2008 Informatica Corporation Data Masking with Lookup Transformations

Transcript of DataMaskingWithSubstitution_Complete.pdf

Page 1: DataMaskingWithSubstitution_Complete.pdf

© 2008 Informatica Corporation

Data Masking with Lookup Transformations

Page 2: DataMaskingWithSubstitution_Complete.pdf

2

Overview The following example shows how to configure multiple Lookup transformations to retrieve substitute data for data masking.

The example includes the following types of data masking:

Name and address substitution from Lookup tables

Key masking

Blurring

Special mask formats

Customer Source Data A customer database table contains sensitive data that you want to use in a test scenario, but you need to maintain security. You can create a mapping to mask each column of the customer data and write the test data to another table.

The following table describes each source column and how it is masked in this example: Column Datatype Mask Transformation Mask Objectives

CustID Integer Data_Masking Replace with a random number that is repeatable each time the same CustID occurs in the source.

FullName String LKP_Firstnames LKP_Surnames EXPTRANS

Substitute with a combination of a random first name and a random last name from sample files.

Address String

City String

State String

Zip5 String

Country String

LKP_Address

Substitute with a random address from a sample file. Parse the address into the street address, city, state, zip, and country.

Phone String Data_Masking Replace with random numbers and retain the same phone number format.

Fax String Data_Masking Replace with random numbers and retain the same phone number format.

CreatedDate Date Data_Masking Replace with a date that is within one year of the source date.

Email String Data_Masking Replace with random characters and retain the same format as the original email address.

SSN String Data_Masking Replace with an invalid social security number.

CreditCard String Data_Masking Replace with random numbers that have a valid credit card checksum.

Sample Name and Address Data This example uses several files installed with Data Masking. The mapping substitutes production names and addresses with names and addresses from these files.

The following files are in the <PowerCenter installation directory>server\infa_shared\LkpFiles folder:

Firstnames.dic. Contains 21,000 first names. Each record has a serial number, name, and gender code.

Page 3: DataMaskingWithSubstitution_Complete.pdf

3

Surnames.dic. Contains 13,000 last names. Each record has a serial number and a last name.

Address.dic. Contains 81,000 addresses. Each record has a serial number, street, city, state, zip code, and country.

Mapping The mapping includes a Data Masking transformation to mask some of the customer fields. It also includes three Lookup transformations that retrieve substitute first names, last names, and addresses from the sample files.

The following figure shows the mapping:

The mapping has the following transformations: Transformation Type Description

SQ_Customers_Prod Source Qualifier Passes customer data to the Data Masking transformation.

Data_Masking Data Masking Generates random numbers to look up a substitute first name, last name, and address. Applies special mask formats to the phone number, fax, email address, and credit card number.

LKUP_Firstnames Lookup Performs a flat file lookup on Firstnames.dic to retrieve a substitute first name.

LKUP_Surnames Lookup Performs a flat file lookup on Surnames.dic to retrieve a substitute last name.

Exptrans Expression Combines the first and last name. Returns a full name.

LKUP_Address Lookup Performs a flat file lookup on the Address.dic file to retrieve a substitute address.

Source Qualifier Transformation The Source Qualifier passes customer data to the Data Masking transformation. It passes the CustID column to the following ports in the Data Masking transformation:

CustID. Customer number.

Randid1. Random number generator for lookups on the Firstnames.dic file.

Randid2. Random number generator for lookups on the Surnames.dic file.

Randid3. Random number generator for lookups on the Address.dic file.

Page 4: DataMaskingWithSubstitution_Complete.pdf

4

Data Masking Transformation The Data Masking transformation applies special mask formats to the phone number, fax, email address, and credit card number. It generates random numbers to look up the replacement first name, last name, and address.

The Data Masking transformation is a passive transformation. The transformation has one output port for each input port. When you add a port to the transformation, the Designer adds an output port by default. Each output port name is out_<port name>.

Configure the masking properties for input ports on the Masking Properties tab.

The following figure shows the Masking Properties tab:

You can configure the following types of masking for each column:

Key. Mask source data with repeatable values. The Data Masking transformation produces deterministic results for the same source data and seed value. The Data Masking transformation requires a seed value to produce deterministic results.

Random. Mask source data with random, non-repeatable values. Random masking does not require a seed value. The results of random masking are non-deterministic. If you choose random masking, the Designer displays additional masking properties that you can configure to restrict the results.

No Masking. Do not mask the source data. The Data Masking transformation returns the data to the output port without changing it. Default is No Masking.

Special Masks. Mask an SSN, credit card number, phone number, URL, email address, or IP address. The Data Masking transformation applies built-in rules to intelligently mask these common types of sensitive data.

Masking Properties for the Customers Data The Data Masking transformation modifies the following fields from the source:

CustID. Key masking. Produces deterministic results for the same source data and seed value.

Page 5: DataMaskingWithSubstitution_Complete.pdf

5

Phone. Phone masking. The Data Masking transformation retains the format of the original phone number when it returns a new number.

Fax. Phone masking.

CreatedDate. Random masking within a variance of the source date. The Data Masking transformation returns a CreatedDate that is within one year of the source date. Configure blurring to mask the date as a variance of the source date. The unit is a part of the date to apply the variance to. You can select a year, month, day, or year. For this example, the variance is one year. The low and high bounds define the variance above and below the source date year. The Data Masking transformation applies the variance to the unit and masks the rest of the date with random numbers.

Email. Email masking. The Data Masking transformation returns an email address of random ASCII characters when it masks an email address.

SSN. SSN masking. The Data Masking transformation creates a random social security number and verifies that the number is not on the High Group List from the Social Security Administration. The High Group List contains valid numbers that the Social Security Administration has issued.

CreditCard. Credit Card masking. The Data Masking transformation generates a logically valid credit card number when it masks a valid credit card number. The length of the source credit card number must be from 13 to 19 digits. The Data Masking transformation does not mask the six-digit Bank Identification Number (BIN).

RANDID1. Random masking. RANDID1 represents a random serial number in the Firstnames.dic file. RANDID1 must be an integer from 1 to 21,000 because Firstnames.dic file has 21,000 records.

RANDID2. Random masking. RANDID2 represents a random serial number in the Surnames.dic file. RANDID2 must be an integer from 1 to 13,000

RANDID3. Random masking. RANDID3 represents a random serial number in the Address.dic file. RANDID3 must be an integer from 1 to 81,000.

The following figure shows how to configure a range of valid mask values for RANDID1:

Page 6: DataMaskingWithSubstitution_Complete.pdf

6

Lookup Transformations The mapping contains Lookup transformations to retrieve random first name, last name, and address records from the sample name and address files. Each Lookup transformation receives a different random number from the Data Masking transformation.

The following figure shows the ports in the LKP_Firstnames Lookup transformation:

The lookup condition compares the value of the random number with the serial number in the Firstnames.dic file. The Lookup transformation returns the record that contains the serial number equal to the random number.

The Condition tab contains the following lookup condition: SNO = out_RANDID1

Expression Transformation The Expression transformation receives a first name from the LKP_Firstnames transformation and a last name from the LKP_Surnames transformation. The Expression transformation combines the first and last name into a full name column. The Expression transformation output port has an expression that adds a space between the first and last name. The Expression transformation passes the full name to the target.

The following expression concatenates the first and last names: FIRSTNAME || ' ' || SURNAME

Customers Target The Customers_Test target definition receives the new customer data from the following mapping components:

Transformation Column Name

Data Masking transformation CustID, Phone, Fax, CreatedDate, Email, SSN, CreditCard

Expression Transformation Full Name

Lkp_Address Lookup Transformation Address, City, State, Zip5, Country

The Customers_Test table contains realistic data that you can use in a test environment. None of the original data can be derived from the substitute data.

Page 7: DataMaskingWithSubstitution_Complete.pdf

7

Author Ellen Chandler Principal Technical Writer