How to exchange metadata between WKC and other metadata repositories using Egeria.

guptaneeru
Oct 14, 2021

In my last post, I talked about how we leveraged Egeria’s Open Metadata Repository Services (OMRS) to modernize our data catalog and assist customers in their modernization journey. Starting with the Cloud Pak for Data 4.0 release, we are extending Egeria support so that Watson® Knowledge Catalog (WKC) can participate in metadata exchange with any other Egeria-compliant metadata repository. Let’s dig deeper into how to add WKC to the ecosystem to exchange metadata with other repositories.

WKC provides the ability to discover metadata and makes it actionable by assigning glossary terms and data classes. Hence, there are two aspects of managing the metadata in WKC.

  1. Data Assets: Data Assets like datasets, tables, columns, and files as well as their relationships to glossary artifacts are managed in CAMS (Common Assets Managed Services) repositories.
  2. Glossary Artifacts: Glossary artifacts like business terms and data classes are managed in the business glossary repository.

While you may perceive Assets and Glossary artifacts as one unit, these are two different repositories in WKC and need to participate in cohorts separately.

Figure: A central rectangle labeled “egeria cohort”. A rectangle labeled “cp4d 1” contains two inner rectangles, “business glossary” and “CAMS”, each connected to the central rectangle by a two-way arrow. Three other rectangles, labeled “cp4d 2”, “information server” and “other egeria compliant repositories (like atlas, microsoft)”, are also connected to the central rectangle by two-way arrows.

Let’s start with assets in CAMS. CAMS has an embedded Egeria engine. Assets are owned by catalogs, and WKC can have multiple catalogs. Admins or owners can choose which catalogs should participate in a cohort. Once you add a catalog to an OMRS cohort, any asset added to the catalog triggers OMRS messages that are sent to the cohort over Kafka.

To add a catalog to the cohort, you need a catalog id, a cohort name, and the Kafka configuration. You must be an admin of the catalog to add it to the cohort. You can find the catalog id in the UI (highlighted string in the image below). Alternatively, you can use the list API to get the list of catalogs and their ids (https://api.dataplatform.dev.cloud.ibm.com/v2/cams/explorer/#!/Catalogs/listCatalogsV2).

Once you know the catalog id, you need to invoke two different CAMS APIs to set up the cohort and activate the OMRS instance:

  1. Add CAMS to the OMRS cohort: Use the CAMS API below to add the catalog to the cohort. In the API payload, you specify the Kafka configuration of the cohort.
curl -vk -X POST -H "Authorization: Bearer $token" "https://<ip-address>/v2/catalogs/{catalogId}/open-metadata/cohorts/{cohort_name}"

cohort_name is the name of the cohort you are participating in. It is embedded in the name of the Kafka topic over which the communication takes place. The Egeria format for a Kafka topic name is ‘egeria.omag.openmetadata.repositoryservices.cohort.{cohort_name}.OMRSTopic’.
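The topic naming rule above can be captured in a small shell helper, which is handy when you later need the exact topic name for Kafka tooling. This is only a sketch: `cohort_topic_name` is a hypothetical helper name, not something shipped with WKC or Egeria, and it assumes the topic format quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print the cohort topic name for a given cohort,
# following the format quoted in the text above.
cohort_topic_name() {
    cohort_name="$1"
    if [ -z "$cohort_name" ]; then
        echo "usage: cohort_topic_name <cohort_name>" >&2
        return 2
    fi
    printf 'egeria.omag.openmetadata.repositoryservices.cohort.%s.OMRSTopic\n' "$cohort_name"
}
```

For example, `cohort_topic_name demoCohort` prints `egeria.omag.openmetadata.repositoryservices.cohort.demoCohort.OMRSTopic`, where `demoCohort` is a placeholder cohort name.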

In the call that adds the catalog to the cohort, you provide the Kafka consumer and Kafka producer properties (details are in the WKC documentation). The consumer and producer properties can point to the same Kafka server and port. They specify where the cohort member consumes cohort messages from and where it produces its own messages.
Note that if you are setting up a new cohort, you may need to create the Kafka topic on your Kafka server, and the Kafka server must be accessible from the WKC/CPD system.
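If the topic does not exist yet, creating it could look roughly like the sketch below, which uses the standard `kafka-topics.sh` tool that ships with Apache Kafka. The helper name `create_cohort_topic`, the `KAFKA_TOPICS_BIN` override, and the partition and replication values are assumptions for illustration; check with your Kafka administrator for the values that suit your cluster.

```shell
#!/bin/sh
# Hypothetical helper: create the cohort topic on a Kafka server.
# Assumes kafka-topics.sh is on PATH (or set KAFKA_TOPICS_BIN to its full path).
create_cohort_topic() {
    bootstrap_server="$1"   # placeholder, e.g. kafka.example.com:9092
    cohort_name="$2"
    if [ -z "$bootstrap_server" ] || [ -z "$cohort_name" ]; then
        echo "usage: create_cohort_topic <host:port> <cohort_name>" >&2
        return 2
    fi
    # Topic name in the Egeria format described in the article.
    topic="egeria.omag.openmetadata.repositoryservices.cohort.${cohort_name}.OMRSTopic"
    # Partition count and replication factor are illustrative assumptions.
    "${KAFKA_TOPICS_BIN:-kafka-topics.sh}" --create \
        --bootstrap-server "$bootstrap_server" \
        --topic "$topic" \
        --partitions 1 \
        --replication-factor 1
}
```

You would call it as `create_cohort_topic <remote kafkaserver:port> <cohort_name>` from a machine that can reach the Kafka server.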

  2. Activate the OMRS engine: Once the Kafka configuration is specified, you need to activate the OMRS instance so that the Egeria engine can start producing messages to and consuming messages from this topic. To activate the CAMS OMRS instance, invoke the following API:

curl -vk -X POST -H "Authorization: Bearer $token" "https://<ip-address>/v2/catalogs/<catalog-id>/open-metadata/instance"

If the call is successful, it returns an HTTP 200 (Success) response. If the call does not return, the Kafka server is likely not reachable from WKC. You can look at the catalog-api pod logs for more information. You can also try to ping your Kafka server, or list the Kafka topics on your remote Kafka server, from within the CPD/WKC pods to make sure that the Kafka server is accessible from WKC.
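A quick connectivity check along those lines might look like the sketch below. It only tests whether a TCP connection to the Kafka host and port can be opened; it does not speak the Kafka protocol. The helper name `kafka_reachable` is hypothetical, and the sketch assumes `bash` and the coreutils `timeout` command are available wherever you run it, which may not hold inside every pod.

```shell
#!/bin/sh
# Hypothetical helper: check that a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp pseudo-device; timeout caps the wait at 5 seconds.
kafka_reachable() {
    host="$1"
    port="$2"
    if [ -z "$host" ] || [ -z "$port" ]; then
        echo "usage: kafka_reachable <host> <port>" >&2
        return 2
    fi
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "reachable: $host:$port"
    else
        echo "NOT reachable: $host:$port" >&2
        return 1
    fi
}
```

If this reports the server as not reachable from the environment where WKC runs, the activation call is unlikely to complete until the network path is fixed.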

After successful activation of the Egeria engine, you can see registration messages in the cohort Kafka topic like the ones below. The ‘metadataCollectionId’ in the message is the guid of the participating catalog.

From within the CPD system, you can execute the following command to see the messages on the cohort Kafka topic:

oc exec -it kafka-0 -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server <remote kafkaserver:port> --topic <cohort topic name> --from-beginning

{"class":"OMRSEventV1","protocolVersionId":"OMRS V1.0","timestamp":1650640771383,"originator":{"metadataCollectionId":"29a40148-ed5f-456e-917e-f43816f315fc","serverName":"29a40148-ed5f-456e-917e-f43816f315fc_omag","serverType":"Repository Proxy","organizationName":"IBM"},"eventCategory":"REGISTRY","registryEventSection":{"registryEventType":"REGISTRATION_EVENT","registrationTimestamp":1650640771382,"metadataCollectionName":"WKC_BG_29a40148-ed5f-456e-917e-f43816f315fc","remoteConnection":{"class":"Connection","headerVersion":0,"connectorType":{"class":"ConnectorType","headerVersion":0,"type":{"class":"ElementType","headerVersion":0,"elementOrigin":"LOCAL_COHORT","elementVersion":0,"elementTypeId":"954421eb-33a6-462d-a8ca-b5709a1bd0d4","elementTypeName":"ConnectorType","elementTypeVersion":1,"elementTypeDescription":"A set of properties describing a type of connector."},"guid":"75ea56d1-656c-43fb-bc0c-9d35c5553b9e","qualifiedName":"OMRS REST API Repository Connector","displayName":"OMRS REST API Repository Connector","description":"OMRS Repository Connector that calls the repository services REST API of a remote server.","connectorProviderClassName":"org.odpi.openmetadata.adapters.repositoryservices.rest.repositoryconnector.OMRSRESTRepositoryConnectorProvider"},"endpoint":{"class":"Endpoint","headerVersion":0,"address":"https://localhost:9443/servers/29a40148-ed5f-456e-917e-f43816f315fc_omag"}}}}
{"class":"OMRSEventV1","protocolVersionId":"OMRS V1.0","timestamp":1650640771482,"originator":{"serverName":"29a40148-ed5f-456e-917e-f43816f315fc_omag","serverType":"Repository Proxy","organizationName":"IBM"},"eventCategory":"REGISTRY","registryEventSection":{"registryEventType":"REFRESH_REGISTRATION_REQUEST"}}
{"class":"OMRSEventV1","protocolVersionId":"OMRS V1.0","timestamp":1650643548672,"originator":{"metadataCollectionId":"29a40148-ed5f-456e-917e-f43816f315fc","serverName":"29a40148-ed5f-456e-917e-f43816f315fc_omag","serverType":"Repository Proxy","organizationName":"IBM"},"eventCategory":"REGISTRY","registryEventSection":{"registryEventType":"RE_REGISTRATION_EVENT","registrationTimestamp":1650640771000,"metadataCollectionName":"WKC_BG_29a40148-ed5f-456e-917e-f43816f315fc","remoteConnection":{"class":"Connection","headerVersion":0,"connectorType":{"class":"ConnectorType","headerVersion":0,"type":{"class":"ElementType","headerVersion":0,"elementOrigin":"LOCAL_COHORT","elementVersion":0,"elementTypeId":"954421eb-33a6-462d-a8ca-b5709a1bd0d4","elementTypeName":"ConnectorType","elementTypeVersion":1,"elementTypeDescription":"A set of properties describing a type of connector."},"guid":"75ea56d1-656c-43fb-bc0c-9d35c5553b9e","qualifiedName":"OMRS REST API Repository Connector","displayName":"OMRS REST API Repository Connector","description":"OMRS Repository Connector that calls the repository services REST API of a remote server.","connectorProviderClassName":"org.odpi.openmetadata.adapters.repositoryservices.rest.repositoryconnector.OMRSRESTRepositoryConnectorProvider"},"endpoint":{"class":"Endpoint","headerVersion":0,"address":"https://localhost:9443/servers/29a40148-ed5f-456e-917e-f43816f315fc_omag"}}}}
{"class":"OMRSEventV1","protocolVersionId":"OMRS V1.0","timestamp":1650643548680,"originator":{"serverName":"29a40148-ed5f-456e-917e-f43816f315fc_omag","serverType":"Repository Proxy","organizationName":"IBM"},"eventCategory":"REGISTRY","registryEventSection":{"registryEventType":"REFRESH_REGISTRATION_REQUEST"}}
^CProcessed a total of 4 messages
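The raw events are long, so when checking a registration it can help to reduce the consumer output to a count per registry event type. The sketch below does that with standard text tools; `summarize_registry_events` is a hypothetical helper, and it assumes each event carries a `"registryEventType":"..."` field formatted as in the messages above.

```shell
#!/bin/sh
# Hypothetical helper: read OMRS events (one JSON document per line) on stdin
# and print "<registryEventType>: <count>" for each type seen.
summarize_registry_events() {
    grep -o '"registryEventType":"[^"]*"' \
        | cut -d '"' -f 4 \
        | sort \
        | uniq -c \
        | awk '{ print $2 ": " $1 }'
}
```

You could save the consumer output to a file and run `summarize_registry_events < events.txt`, where `events.txt` is a placeholder file name. For the four messages shown above, it reports one REGISTRATION_EVENT, one RE_REGISTRATION_EVENT, and two REFRESH_REGISTRATION_REQUEST events.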

The two steps above (adding the catalog to the cohort and activating the OMRS instance) ensure that the specific catalog participates in the OMRS cohort. Any asset added to the catalog produces Kafka messages on the cohort topic, and all other participating repositories can start consuming them. You can add another instance of WKC from another Cloud Pak for Data system, or any other catalog, to the same cohort by repeating the two steps for the new catalog id. Once two catalogs (on the same or different Cloud Pak for Data systems) are added to the same cohort, assets start flowing between them.

If other metadata repositories, such as Atlas, participate in the cohort, WKC consumes their messages as well. It can create and persist assets from Atlas in WKC. IBM Information Server also supports setting up cohorts, so you can add an Information Server to the same cohort as WKC. This enables you to exchange metadata between Information Server and WKC.

The catalog where assets are created acts as the master repository of assets. Assets can only be updated and deleted in the master repository. All the updates to the assets are synchronized from the master repository to the metadata cohort. Target repositories can handle the update and delete messages and update the reference copies.

While assets cannot be updated or deleted by the receiving repositories, Egeria allows adding relationships to the reference copies. For instance, you can assign glossary terms, tags, or data classes to the reference copies of assets in your repository that came from other members. The added relationship messages are synced back to the cohort. Just like reference entities, reference relationships cannot be edited or deleted by other repositories. So, a term assignment can only be edited or removed by the repository that made it, which is the master repository of that relationship.

Now, let’s discuss adding Business glossary (BG) to the cohort. Think about the business glossary as another metadata repository participating in the cohort. When you want to assign a term to an asset, that term is managed by the business glossary. So when participating in an external cohort, it is recommended (not required) that both the repositories (BG and CAMS) are added to the cohort. That way assets and asset relationships to glossary artifacts will be synchronized completely to the cohort.

To add the WKC business glossary repository to a cohort, you repeat similar steps for the glossary repository.

Register the glossary repository in the cohort: Use the API below to specify the cohort’s Kafka configuration and add BG to the cohort.

curl --location --request POST 'https://<hostname>/v3/glossary_terms/admin/open-metadata/cohorts/<cohort-name>?topic_url_root=egeria.omag'

In WKC, the business glossary is organized into categories. If you do not want all categories to participate in a cohort, the WKC BG APIs give you the flexibility to add only a specific category and its children. This is controlled by user permissions. While configuring the cohort, you can specify a ‘sending_user_id’. Only the categories where the configured cohort user has admin privileges are sent to the cohort, so a subset of categories can participate. As an administrator, you can create a special ‘cohort_user’ and add that user as an ‘Admin’ to all the categories that should participate in the cohort metadata exchange. If you only wish to receive messages from the cohort and not send any, you can leave this field empty.

If multiple WKC repositories participate in the metadata exchange cohort, you can provide additional configurations in the target WKC. These configurations control which incoming messages are handled in the target WKC repository. Incoming Egeria messages from other categories are persisted in the configured ‘Exceptions category’.

Use the API below to set up the additional configurations.

curl -k -X POST "https://<hostname>/v3/glossary_terms/admin/open-metadata/external_repositories" -H "Authorization: Bearer <token>" -H "Content-Type: application/json" --data '{
  "default_configuration": {
    "receiving_user_id": "<receiving_user_id>",
    "default_category_artifact_id": "<default_category_id>",
    "exceptions_category_artifact_id": "<exception_category_id>"
  },
  "repositories": [
    {
      "source_repository_id": "<source_repository_id>",
      "source_repository_name": "<source_repository_name>",
      "receiving_user_id": "<receiving_user_id>",
      "default_category_artifact_id": "<default_category_id>",
      "exceptions_category_artifact_id": "<exception_category_id>",
      "root_categories": [
        {
          "external_id": "<external_id>",
          "name": "<name>"
        }
      ]
    }
  ]
}'

In the above API, you can specify the list of root categories from external repositories from where you want to accept the artifacts. If the list of root categories is not configured, then all the categories and artifacts are accepted by the WKC BG repository. If a list of external root categories is specified, then the artifacts from other categories are added to the default or specific exceptions category.

In the same API, you can also specify the category that acts as the exceptions category, that is, the home category for incoming artifacts that would otherwise be ignored.
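Because a multi-line JSON payload is easy to break with shell quoting, one option is to keep the payload in a file and let curl read it with `--data @file`. The sketch below wraps the external_repositories call that way. The helper name `post_external_repositories` and the `CURL_BIN` override are assumptions for illustration; the URL path and headers are the ones used in the article.

```shell
#!/bin/sh
# Hypothetical helper: POST a JSON payload file to the BG external_repositories API.
# Arguments: <hostname> <bearer token> <payload file>
post_external_repositories() {
    hostname="$1"
    token="$2"
    payload_file="$3"
    if [ -z "$hostname" ] || [ -z "$token" ] || [ -z "$payload_file" ]; then
        echo "usage: post_external_repositories <hostname> <token> <payload.json>" >&2
        return 2
    fi
    if [ ! -r "$payload_file" ]; then
        echo "cannot read payload file: $payload_file" >&2
        return 1
    fi
    # -k skips TLS certificate verification, as in the article's example.
    "${CURL_BIN:-curl}" -k -X POST \
        "https://${hostname}/v3/glossary_terms/admin/open-metadata/external_repositories" \
        -H "Authorization: Bearer ${token}" \
        -H "Content-Type: application/json" \
        --data "@${payload_file}"
}
```

With the JSON body saved as, say, `external_repos.json` (a placeholder name), you would run `post_external_repositories <hostname> "$token" external_repos.json`.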

With these simple steps, WKC is ready to exchange metadata with other Egeria-compliant repositories. Please check out the official documentation (WKC Documentation) of APIs and asset types that are supported in WKC at this time. We are continuously enhancing the support and are excited to participate in the distributed metadata exchange landscape.

Official, detailed WKC documentation on setting up a cohort: WKC Documentation
The Swagger documentation for CAMS and BG also describes these APIs. You can access the Swagger documentation at:

CPD
BG: https://<host:port>/v3/glossary_terms/api
CAMS: https://<host:port>/v2/cams/explorer

SaaS
CAMS: https://api.dataplatform.dev.cloud.ibm.com/v2/cams/explorer
Business Glossary: https://api.dataplatform.dev.cloud.ibm.com/v3/glossary_terms/api#

Egeria: https://egeria.odpi.org/
