AI and its intrinsic link with data governance

Henry Royce

Martijn Veldkamp

October 20, 2023

I hope you found my previous blog post on AI governance and its link with data governance insightful. The feedback and engagement have been very interesting. As I get back into the habit of writing a blog a month (or more), I wanted to explore my own thoughts on the critical role data plays in the success of AI implementations.

In my last article, I emphasized that AI is only as good as the data it’s built upon, and I tried to highlight the importance of accurate, complete, and traceable data.

So how can we enhance AI governance? By focusing on the foundation: your data.

A small side note on the image I used: in anything, perfection is the end goal, but please be realistic.

1. The Data Lifecycle: From Collection to Utilisation

The journey of data within your organisation, let alone your Salesforce Org, is a complex one, from the moment it’s collected to its utilisation in AI models. To ensure data integrity, you need to pay close attention to every stage of this data lifecycle. This means understanding not only where your data comes from and how it’s stored, but also who has access to it and for what purposes. Customer Journeys are a great vehicle to explore this. Underlying the Customer Journey steps are capabilities that deliver one or more process steps supporting that Journey. Processes are a time-tested way of documenting what goes in and what comes out.

For example, the capability Case Management has the process Case Creation.

Salesforce Service Cloud – Case creation standard view

Within that process we have documented the mandatory fields, who has access to that data and when that data will be removed.

Having such a framework, where one can trace what the customer does and needs to where it ends up in the application landscape, is great, if only from an impact-assessment perspective when discussing changes to that data.
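The lifecycle documentation described above can also be captured in a simple, machine-readable form, so an impact assessment can query it directly. A minimal sketch in Python (the field names, roles, and retention periods are hypothetical examples, not Salesforce defaults):

```python
from dataclasses import dataclass

@dataclass
class FieldPolicy:
    """Lifecycle documentation for one data field in a process step."""
    name: str            # API name of the field, e.g. "Subject"
    mandatory: bool      # must be filled at creation time
    access_roles: list   # who may read/edit this data
    retention_days: int  # when this data will be removed

# Hypothetical documentation for the Case Creation process
case_creation = [
    FieldPolicy("Subject", True, ["Service Agent", "Service Manager"], 365),
    FieldPolicy("Description", False, ["Service Agent"], 365),
]

# An impact assessment can now query the metadata directly:
mandatory_fields = [p.name for p in case_creation if p.mandatory]
print(mandatory_fields)  # ['Subject']
```

Keeping this next to the process documentation makes "what breaks if we change this field?" an answerable question rather than a guess.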

2. Data Governance Frameworks: A Must-Have

Beyond knowing where and how that data was created, manipulated, and stored, you should also establish processes around it; regulations like GDPR make that a must. So to control and maintain the quality of your data, it’s essential to establish a comprehensive data governance framework.

Example framework from Office of the CIO

This framework should encompass data ownership, data stewardship, and clearly defined policies and procedures for data management. It’s not enough to say you’ll comply with privacy laws like the earlier-mentioned GDPR; you need a structured, agreed, and mandated approach, and involvement of your leadership team is key. With this framework in place you can govern your Salesforce implementation and the underlying data model, and make sure users fill in fields like the example below.

Company field on Lead object
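Inside Salesforce this kind of policy would live in a Validation Rule; the same check is often repeated in an integration layer so bad records are caught before they ever reach the Org. A sketch of that idea, assuming a simple dict-shaped record (field names are illustrative):

```python
def validate_lead(lead: dict) -> list:
    """Return a list of policy violations for a lead record.

    Mirrors a Salesforce-style validation rule: Company is mandatory.
    """
    errors = []
    if not (lead.get("Company") or "").strip():
        errors.append("Company is required on Lead")
    return errors

print(validate_lead({"LastName": "Veldkamp", "Company": ""}))
print(validate_lead({"LastName": "Veldkamp", "Company": "Acme"}))
```

The governance framework decides *which* fields are mandatory; code like this merely enforces the decision consistently at every entry point.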

3. Collaboration Across Teams

Data quality is a collective effort that involves various teams within your organization. Don’t know where to start? Look back at the Customer Journeys and where they touch or engage the different departments. Data quality is created across the whole end-to-end Value Stream.

Salesforce Lead generation

A great example is where companies split responsibilities between Lead Generation and Lead Conversion. Usually these are Marketing and Sales, and usually they are measured on different KPIs: Leads Created versus Opportunities Closed Won.

Somewhere you want to measure the probability of success that will lead to an order. So where does this measuring start? What will we track to determine where we can best put our efforts? Since the process in Salesforce is that a Lead is converted into an Account, Contact, and Opportunity by copying key data fields, which fields do we want and need to capture at the beginning? What value does an extra field bring downstream? Where do we report on that data?
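The conversion step can be sketched as a field mapping: whatever is not captured on the Lead, or not mapped across, is simply gone downstream. A minimal sketch (this is an illustrative subset, not Salesforce's full conversion logic):

```python
# Which Lead fields are copied where on conversion (illustrative subset)
LEAD_TO_ACCOUNT = {"Company": "Name", "Industry": "Industry"}
LEAD_TO_CONTACT = {"FirstName": "FirstName", "LastName": "LastName", "Email": "Email"}
LEAD_TO_OPPORTUNITY = {"Company": "Name"}

def convert_lead(lead: dict):
    """Copy key Lead fields onto new Account, Contact and Opportunity records."""
    account = {dst: lead.get(src) for src, dst in LEAD_TO_ACCOUNT.items()}
    contact = {dst: lead.get(src) for src, dst in LEAD_TO_CONTACT.items()}
    opportunity = {dst: lead.get(src) for src, dst in LEAD_TO_OPPORTUNITY.items()}
    return account, contact, opportunity

acc, con, opp = convert_lead(
    {"Company": "Acme", "LastName": "Veldkamp", "Email": "m@acme.example"}
)
```

Reviewing this mapping with both Marketing and Sales is a concrete way to answer "what value does the extra field bring downstream?" before anyone adds it.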

Also, don’t forget the supporting departments, such as IT, data engineering, and data science. All must work in harmony to ensure data quality. Siloed improvement efforts can lead to all kinds of issues, like data fragmentation, duplication, or other inconsistencies.

To address this, awareness is a great first step. Being able to explain it by (again) pointing to the Customer Journey, and showing where which data is captured for use in the next steps, is half the battle. This helps encourage cross-functional collaboration and a shared commitment to the common goal of data integrity.

4. Automate Data Quality Assurance

With the growing volume of data, manual data quality checks become impractical. Leveraging tools for data quality assurance can be a game-changer.

Basic Data Quality dashboard in Salesforce

The example above is a simple data quality dashboard in Salesforce, but there are many options, within Salesforce or through external tools, that can help you on your way.

These tools can automatically identify anomalies, inconsistencies, and potential errors in your data, ensuring that only high-quality data enters your Salesforce Orgs and thus your AI models. I would recommend starting with the tools that Salesforce offers out of the box.

You can standardise data entry with Picklists and Validation Rules, and validate and/or enrich your data against a trusted source, for example by creating a Flow that calls an external validation service to check company information and addresses when an Account record is created.

Salesforce also has Duplicate Detection with all kinds of fuzzy (probabilistic) matching logic. A match returns a confidence percentage. Typically, above an 85% match you auto-merge, below a 65% match you auto-create the new record, and between these thresholds you send the record for manual review. It’s good that you’ve set up Data Governance with Stewards, so you know where to send that merge conflict. To go further, you can call our Merge API, or extend the merge trigger when two records are being deduplicated and you want to influence how field survivorship looks in your Salesforce Org.

For maintaining history we have an Audit Trail, and it is always possible to write all interactions to a so-called Big Object. If you want to integrate the newly created records, we offer a wide set of capabilities, from SOAP, REST, and the Composite API to the durable Streaming API and Change Data Capture (Kafka-based event streaming). And we finish with our great built-in capabilities: Reports, Dashboards, and Einstein Analytics.
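The threshold routing described above can be sketched as follows, using Python's difflib as a crude stand-in for Salesforce's matching engine (the 85%/65% thresholds are the typical values mentioned above, not fixed platform defaults):

```python
from difflib import SequenceMatcher

AUTO_MERGE = 0.85   # above this: merge into the existing record
AUTO_CREATE = 0.65  # below this: create the new record

def match_confidence(a: str, b: str) -> float:
    """Crude fuzzy score; Salesforce's matching rules are far more elaborate."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def route_record(new_name: str, existing_name: str) -> str:
    score = match_confidence(new_name, existing_name)
    if score > AUTO_MERGE:
        return "auto-merge"
    if score < AUTO_CREATE:
        return "auto-create"
    return "manual-review"  # send to a data steward

print(route_record("Acme Corporation", "Acme Corporation"))  # auto-merge
print(route_record("Acme Corp", "Acme Corporation"))         # manual-review
```

The middle band is exactly where your Data Governance setup pays off: the stewards defined in your framework are the ones who receive the manual-review queue.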

If these standard capabilities of Salesforce are no longer up to the task, there is a large ecosystem of partners that integrate with Salesforce directly via an AppExchange package; alternatively, you can custom build those integrations. I would advise the latter only when doing large migrations or big clean-up operations in a very restricted time period, so when the scope is more than 10 million records.

5. Continuous Monitoring and Improvement

Safeguarding your data quality and integrity is not a one-time task; it’s an ongoing process. That is why you need to have that Data Governance in place: to regularly monitor and report on the quality of your data and track changes over time.

This includes data lineage, changes in data sources, and potential security threats. Continuously improve your data governance framework based on these insights.
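A monitoring job can start as simply as tracking field completeness over time, the kind of figure behind the dashboard shown earlier. A minimal sketch (the sample records are made up):

```python
def completeness(records: list, field: str) -> float:
    """Percentage of records where `field` is filled in."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field))
    return round(100.0 * filled / len(records), 1)

leads = [
    {"Company": "Acme", "Phone": "+31 20 1234567"},
    {"Company": "Globex", "Phone": ""},
    {"Company": "", "Phone": None},
]
print(completeness(leads, "Company"))  # 66.7
print(completeness(leads, "Phone"))    # 33.3
```

Running a metric like this on a schedule and plotting the trend is what turns "monitor your data quality" from an intention into a report your stewards can act on.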

Conclusion

In the quest for successful AI implementations, we must never forget that data integrity is the cornerstone of AI governance. By focusing on the quality, security, and integrity of your data, you pave the way for more robust and reliable AI systems. In the next blog in this series, we’ll delve into the evolving landscape of AI governance.











