Data Sovereignty Rules In WKC

guptaneeru
May 23, 2022

In the previous post, we learned how to manage and assign locations in Watson Knowledge Catalog (WKC). Once assets are assigned locations, they can be governed by data location rules and policies so that data movement is restricted according to sovereign rules.

Let's look at sovereign rules in practical terms. Today, to travel internationally, you need a valid passport from your home country and a valid visa for your destination. You acquire these documents according to the policies of both sovereign nations. By definition, a passport is an official document issued by a government, certifying the holder’s identity and citizenship and entitling them to travel under its protection to and from foreign countries. A valid passport certifies that you are free to travel out of your home country: you are not leaving the country illegally, and there is no pending litigation against you that prohibits you from leaving. A valid visa authorizes you to enter another sovereign country. By definition, a visa is an endorsement on a passport indicating that the holder is allowed to enter, leave, or stay in a country for a specified period of time.
In a nutshell, you need to abide by the emigration policies of the origin country and the immigration policies of the target country. In today’s data-driven world, it is equally important that data follows the rules of both the source and the target countries when it crosses sovereign boundaries.

WKC enables you to define equivalent rules for data, called data location rules.
Egress (outgoing data) rules: When data leaves a sovereign country, it needs to follow the emigration rules of the origin country, and users can define data egress rules just like emigration rules. For example, suppose you want to ensure that no data from an EU country leaves its sovereign boundaries unless sensitive information is redacted. You can define an Egress/Outgoing rule in WKC, as shown below.

Egress rule to mask all data from European Union

The above rule applies to data that leaves a sovereign country. If the asset is from ‘European Union’, as specified in the condition of the data location rule, then any column tagged with ‘PII’ is redacted. If your assets are marked with the sovereign location ‘European Union’ or any of its child locations defined in the location reference datasets, such as Germany or France (see the locations management post), the rule applies to all of those assets. Because location rules are enforced against the hierarchical location definitions, there is no need to create multiple conditions or separate rules for each sub-location. Users do not need to state explicitly that the source location is ‘Germany’ or ‘France’; they can define rules for the parent location, and those rules apply to every location grouped under it.
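The hierarchical matching described above can be illustrated with a small Python sketch. It is not WKC's implementation or API: the `PARENT` map is a hypothetical stand-in for the location reference datasets, and `is_within` and `apply_egress_rule` are illustrative names. The sketch assumes the rule's condition is "source is within European Union" and its action is to redact every column tagged ‘PII’.

```python
from __future__ import annotations

# Hypothetical stand-in for the location reference datasets:
# each location maps to its parent location (None at the top of the hierarchy).
PARENT = {
    "Germany": "European Union",
    "France": "European Union",
    "European Union": None,
    "United States": None,
}


def is_within(location: str | None, ancestor: str) -> bool:
    """Return True if `location` is `ancestor` or any of its descendants."""
    while location is not None:
        if location == ancestor:
            return True
        location = PARENT.get(location)  # unknown locations have no parent
    return False


def apply_egress_rule(source: str, column_tags: dict[str, set[str]],
                      rows: list[dict]) -> list[dict]:
    """Redact PII-tagged columns when the asset's source is within the EU."""
    if not is_within(source, "European Union"):
        return rows  # condition not met: the rule does not fire
    pii = {col for col, tags in column_tags.items() if "PII" in tags}
    return [{col: ("REDACTED" if col in pii else val) for col, val in row.items()}
            for row in rows]


column_tags = {"name": {"PII"}, "city": set()}
rows = [{"name": "Anna", "city": "Berlin"}]
print(apply_egress_rule("Germany", column_tags, rows))
# [{'name': 'REDACTED', 'city': 'Berlin'}]
```

Because `is_within` walks up the parent chain, a single rule written for ‘European Union’ also fires for assets located in Germany or France, which is exactly why no per-country rule is needed.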

Ingress (incoming data) rules: WKC also lets you define rules that apply when data enters a specific location. When a country wants to prevent data from entering a certain target location, it can define an Ingress/Incoming data location rule. For instance, if business requirements or sovereign laws prohibit data from traveling to Turkey, you can define a rule stating that if the target sovereignty is ‘Turkey’, access to the data is denied. The target sovereignty is determined by the location of the control plane where the data will be processed.

Data Location Rule to mask EU PII data on entering Turkey

The above rule is triggered only when EU data is accessed from Turkey; in that case, the PII data is masked.
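An Ingress rule can be thought of as a condition on the target sovereignty, optionally combined with a condition on the source. The sketch below uses a hypothetical `IngressRule` record and `evaluate_ingress` function (not part of any WKC API) and models both examples from this section: denying all data bound for Turkey, and masking EU PII when it is accessed from Turkey.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical parent map standing in for the location reference datasets.
PARENT = {"Germany": "European Union", "France": "European Union"}


def is_within(location: str | None, ancestor: str) -> bool:
    """Return True if `location` is `ancestor` or nested under it."""
    while location is not None:
        if location == ancestor:
            return True
        location = PARENT.get(location)
    return False


@dataclass(frozen=True)
class IngressRule:
    target: str                # target sovereignty that triggers the rule
    action: str                # e.g. "deny" or "mask_pii"
    source: str | None = None  # optional source condition; None matches any source


def evaluate_ingress(rule: IngressRule, source: str, target: str) -> str | None:
    """Return the rule's action if it is triggered, otherwise None."""
    if not is_within(target, rule.target):
        return None
    if rule.source is not None and not is_within(source, rule.source):
        return None
    return rule.action


deny_turkey = IngressRule(target="Turkey", action="deny")
mask_eu_pii = IngressRule(target="Turkey", action="mask_pii", source="European Union")

print(evaluate_ingress(mask_eu_pii, "France", "Turkey"))         # mask_pii
print(evaluate_ingress(mask_eu_pii, "United States", "Turkey"))  # None
print(evaluate_ingress(deny_turkey, "United States", "Turkey"))  # deny
```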

Data location rules (DLRs) are enforced based on the sovereign location of the data. Watson Knowledge Catalog lets you specify both the physical and the sovereign location of data assets (see the locations management post).

  • If the sovereign location of an asset is not known but its physical location is, the sovereign location is derived from the physical-to-sovereign location relationships defined in the location reference datasets. For instance, if an asset is located in the IBM-Dallas data center, all location rules defined for the United States are enforced on it, because the location reference dataset defines the relationship between Dallas and the United States (see the locations management post). This is why it is important to establish the right location definitions before defining location rules.
  • If neither a physical nor a sovereign location is defined on an asset (for example, assets cataloged before the location functionality was released), clients can provide a default location, which the rules enforcement engine uses when evaluating rules.
  • If the source and target sovereign locations are the same, data location rules are not enforced and access is allowed.
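The resolution order in the list above might look like the following sketch. It assumes a hypothetical `PHYSICAL_TO_SOVEREIGN` mapping and asset dictionaries with hypothetical `sovereign_location` and `physical_location` keys; none of these are WKC names. The post does not say what happens when a physical location has no mapped sovereign location, so this sketch simply assumes it falls back to the default location.

```python
# Hypothetical physical-to-sovereign relationships from the location reference datasets.
PHYSICAL_TO_SOVEREIGN = {"IBM-Dallas": "United States"}


def effective_sovereign(asset: dict, default_location: str) -> str:
    """Resolve the sovereign location used when evaluating rules for `asset`."""
    # 1. An explicitly assigned sovereign location wins.
    if asset.get("sovereign_location"):
        return asset["sovereign_location"]
    # 2. Otherwise derive it from the physical location, if a relationship is defined.
    physical = asset.get("physical_location")
    if physical in PHYSICAL_TO_SOVEREIGN:
        return PHYSICAL_TO_SOVEREIGN[physical]
    # 3. Fall back to the account's default location (e.g. assets cataloged
    #    before locations existed). Assumption: an unmapped physical location
    #    also ends up here.
    return default_location


def location_rules_apply(source: str, target: str) -> bool:
    """Rules are skipped entirely when source and target sovereignty match."""
    return source != target


legacy_asset = {}
dallas_asset = {"physical_location": "IBM-Dallas"}
print(effective_sovereign(dallas_asset, default_location="Germany"))  # United States
print(effective_sovereign(legacy_asset, default_location="Germany"))  # Germany
print(location_rules_apply("United States", "United States"))         # False
```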

Data location rules are enforced through the Data Fabric Orchestrator (DFO) APIs. DFO acts as a gatekeeper that controls access to the data assets cataloged in WKC by applying data protection and data location rules. Currently, Watson Studio Notebooks is the only service that has adopted the DFO APIs. Once you create data location rules in your account and access assets from WKC using the DFO APIs in a notebook, both the data protection and data location rules and policies are enforced.

At this time, data location rules are enforced in the IBM data center where Data Fabric Orchestrator, the notebook server, and Watson Knowledge Catalog run. DFO treats the location of the notebook runtime as the target location for the data. If Ingress DLRs prevent data access from the location where the notebook is running, the notebook cannot access the data. The target location is currently not based on where the end user is located.
In the experimental release, Egress (Outgoing) rules are enforced at the location where DFO processes the request rather than at the source location where the data resides. This is called ‘Soft Sovereignty Enforcement’. In future releases, egress rules will be enforced at the source.
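To make the target-location behavior concrete, here is a sketch of only the Ingress part of the check. Apart from IBM-Dallas, the data center identifiers are hypothetical, as are the function names; egress evaluation, masking, and the account convention are deliberately left out.

```python
from __future__ import annotations

# Hypothetical data center identifiers mapped to their sovereign location.
DATA_CENTER_SOVEREIGNTY = {
    "IBM-Dallas": "United States",
    "example-frankfurt-dc": "Germany",
    "example-istanbul-dc": "Turkey",
}


def target_sovereignty(runtime_data_center: str, end_user_country: str) -> str:
    """The target is where the notebook runtime runs, not where the user is.

    `end_user_country` is accepted but intentionally unused, mirroring the
    current behavior described above.
    """
    return DATA_CENTER_SOVEREIGNTY[runtime_data_center]


def blocked_by_ingress(source: str, runtime_data_center: str,
                       end_user_country: str, ingress_deny: set[str]) -> bool:
    """True if an Ingress deny rule covers the notebook runtime's location."""
    target = target_sovereignty(runtime_data_center, end_user_country)
    if source == target:
        return False  # same sovereignty: location rules are not enforced
    return target in ingress_deny


# Notebook runs in the (hypothetical) Turkey data center: blocked.
print(blocked_by_ingress("United States", "example-istanbul-dc", "United States", {"Turkey"}))  # True
# User sits in Turkey, but the notebook runs in Germany: not blocked.
print(blocked_by_ingress("United States", "example-frankfurt-dc", "Turkey", {"Turkey"}))  # False
```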

Data location rule enforcement conventions: Data location rules can follow one of two conventions.

Deny Everything Author Allow (DEAA): Data is prevented from crossing sovereign boundaries until the data owners explicitly allow it to flow across them. This means that no one, including the data owners, can access the data from outside its sovereign boundaries unless governance rules and policies are authored specifically to allow that access. DEAA is the default setting for an account, but the convention can be toggled.

Allow Everything Author Deny (AEAD): Users can toggle the setting to AEAD. Under ‘Allow Everything Author Deny’, data remains accessible across sovereign boundaries unless policies prevent it from flowing to or from a specific sovereign location.

Tenants can change the convention only if no DLRs exist in the account. See the WDP documentation for more details.
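In this simplified model, the two conventions differ only in what happens when no authored rule matches. The sketch below uses a hypothetical `Account` record and `decide` function (not the WKC API) and also models the restriction that the convention can be changed only while the account has no DLRs.

```python
from __future__ import annotations
from dataclasses import dataclass, field

CONVENTIONS = ("DEAA", "AEAD")


@dataclass
class Account:
    convention: str = "DEAA"  # default convention for an account
    rules: list[str] = field(default_factory=list)  # authored DLRs (hypothetical representation)

    def set_convention(self, convention: str) -> None:
        if convention not in CONVENTIONS:
            raise ValueError(f"unknown convention: {convention}")
        if self.rules:
            raise RuntimeError("convention can only change while no DLRs exist")
        self.convention = convention


def decide(account: Account, source: str, target: str,
           authored_action: str | None) -> str:
    """Decide access for data moving from `source` to `target`.

    `authored_action` is the outcome of a matching authored rule
    ("allow", "deny", ...) or None when no authored rule matches.
    """
    if source == target:
        return "allow"            # no sovereign boundary is crossed
    if authored_action is not None:
        return authored_action    # an authored rule takes precedence
    # No authored rule matched: fall back to the account's convention.
    return "deny" if account.convention == "DEAA" else "allow"


acct = Account()
print(decide(acct, "Germany", "United States", None))  # deny
acct.set_convention("AEAD")
print(decide(acct, "Germany", "United States", None))  # allow
```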

Please note that this release of Data Fabric Orchestrator is designated “experimental”, which means IBM is delivering it for evaluation purposes.
For more details, and links to a tutorial demonstrating the use of Data Location Rules and Data Fabric Orchestrator, see: https://community.ibm.com/community/user/cloudpakfordata/communities/community-home/d[…]50-ae4c-57d769937235&bm=3899732d-8a8c-48dd-9077-d20df5a00bb9

In this post, we covered:
How to define data location rules in Watson Knowledge Catalog
The different types of data access conventions
When and how data location rules are enforced in IBM Cloud Pak for Data
