Data Masking Best Practices

When implementing data masking with Informatica’s Test Data Management (TDM) tool, it helps to understand the building blocks. First, policies and domains are not mandatory. They are useful for assigning rules to many table columns quickly, and that is their only benefit: they reduce the effort of rule assignment.

Once created, a rule can be assigned to any number of table columns. A single column can also have multiple rules assigned, but you then have to select a default rule (this is a somewhat advanced topic, so leave it aside for now).

Rules, like policies, are global. To use a rule within a project, you have to add it to that project, and you can add the same rule to as many projects as you need. Beware that once you modify a rule, every assignment of that rule, in every project where it is used, is impacted, and you will receive the same error as the one in your screen print. That covers the technical side; now let’s talk about the approach.

Create a project for each domain or application within your organization, then import the necessary tables into it. Next, import the rules into the project and assign them to the appropriate table columns. When you create a plan, add the masking rules to it. At the “Criteria” step of the “Plan Creation Wizard”, you will see the table columns that are in scope. If you do not want some of those columns to be masked, select them (via the checkboxes) and turn off the scope (the options are at the top right corner). In your case, you added the rules to a policy and then used that policy in the plan, so you are seeing more tables than required. That is correct behavior, because you are telling TDM to mask according to the policy. In such cases, select the unwanted columns and turn off the scope so that they are not involved in the masking process.

Design your rules so that they will not need to be modified later, because, as mentioned above, a change has a huge impact. If the requirement demands it, you can also create domain-specific or application-specific rules and add them only to the relevant projects. Also, when creating plans, you can generate one mapping per table so that the processing happens in parallel. This is valid only for in-place masking, where the source and target are the same and the tables are only UPDATEd. You cannot do this when the source and target are different, because you need to maintain referential integrity when inserting data.
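To illustrate why the two cases differ, here is a minimal Python sketch. It is not TDM code and does not reflect how TDM generates mappings internally; the table names, the FOREIGN_KEYS dictionary, and the function names are all hypothetical. It assumes that in-place UPDATEs have no ordering constraint between tables, while INSERTs into a different target must load parent tables before their children.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema: each table maps to the parent tables
# that its foreign keys reference.
FOREIGN_KEYS = {
    "CUSTOMER": set(),
    "ORDERS": {"CUSTOMER"},
    "ORDER_ITEMS": {"ORDERS"},
}


def load_order(foreign_keys):
    """Return the tables ordered so that every parent precedes its children."""
    return list(TopologicalSorter(foreign_keys).static_order())


def plan_batches(foreign_keys, in_place):
    """Group tables into batches; tables within one batch may run in parallel.

    in_place=True  -> a single batch, since UPDATEs need no ordering here.
    in_place=False -> one table per batch, in parent-before-child order.
    """
    if in_place:
        return [sorted(foreign_keys)]
    return [[table] for table in load_order(foreign_keys)]


if __name__ == "__main__":
    print(plan_batches(FOREIGN_KEYS, in_place=True))
    print(plan_batches(FOREIGN_KEYS, in_place=False))
```

With in_place=True every table lands in one batch that can be processed concurrently; with in_place=False the tables come out one at a time, parents first, which is the ordering that inserts into a separate target have to respect.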

The TDM tool can subset, mask, and generate test data, and it can also create a test data warehouse. You do not have to use the Data Subset functionality in order to mask data; you can always do in-place (data at rest) masking. However, if the requirement is to move a subset of data from one database (e.g. Production) into another (e.g. QA or Development), you will be using this feature. Again, this is implicit: when you select masking rules together with different source and target connections, that configuration alone is enough for the plan to be treated as a subset and masking plan. So, even though you are not using Subset components, you can still mask the data. Depending on the context, it can still be useful to create a subset entity and use it within a plan.

The TDM tool never generates “mapplets” on its own. However, you can develop mapplets separately and import them into TDM as masking rules. When you generate mappings for these rules, a reusable mapplet is created. You can also use a DMO transformation directly in a mapping. In that case, there is no need to create a separate TDM plan and the corresponding mapping (which TDM generates automatically). However, a traditional DMO transformation does not offer all the attributes that are available when masking via TDM. Additionally, when you work via TDM, the time required to develop mappings is drastically reduced, and you can configure rules against multiple columns easily through the TDM UI.

When you want the same masking results across multiple tables and schemas, use the “repeatable” option on the masking rules. Almost all masking rules have this option. When you create a rule, make it repeatable and specify a “seed” value; as soon as you select the “repeatable” option, the seed value field is enabled so that you can enter a value. This way, the masking algorithm makes sure that a given input value (e.g. ABC) always produces the same masked value (e.g. XYZ) across all masking runs.
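The idea behind a repeatable, seeded rule can be sketched in a few lines of Python. This is only an illustration of deterministic masking in general, not Informatica’s actual algorithm; the function name mask_repeatable, the use of HMAC-SHA256, and the seed value 190 are assumptions made for the example.

```python
import hashlib
import hmac
import string


def mask_repeatable(value: str, seed: int) -> str:
    """Map `value` to a same-length string of uppercase letters.

    The result depends only on (value, seed), so the same input
    yields the same masked output on every run and in every table.
    """
    key = str(seed).encode("utf-8")
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    letters = string.ascii_uppercase
    # Reuse the digest bytes cyclically if the value is longer than the digest.
    return "".join(
        letters[digest[i % len(digest)] % len(letters)]
        for i in range(len(value))
    )


if __name__ == "__main__":
    # The same value and seed always give the same masked result.
    print(mask_repeatable("ABC", seed=190))
    print(mask_repeatable("ABC", seed=190))
```

Because the output is a pure function of the input value and the seed, two columns in different tables or schemas that hold the same value are masked identically as long as they are masked with the same seed, which is what keeps joins between them consistent after masking.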