Welcome to part 2 of turning data into knowledge at scale: the arduous job of cleaning, defining, collapsing, and expanding information and knowledge in the enterprise. Join me as I learn from Jessica Talisman, Senior Taxonomist at Pluralsight.
Clean, structured, and trusted data is the goal of most data engineering teams. But if we want to create business outcomes, we can't stop at the data layer; we need to turn data into knowledge. The trillion-dollar question:
How do we turn data into knowledge at scale?
This week I had the pleasure of exploring this question with Jessica Talisman, Senior Taxonomist at Pluralsight. This is a summary of what I learned in part 2 of our conversation. If you want to dig deeper, check out episode 041 of Discovering Data in your podcast app or visit the episode page.
What I learned
The first takeaway is that information is contextual. The extent to which we can benefit from it and extract value depends on how well we can reconstruct that context. Information becomes knowledge only through synthesis.
The second point is that we all process information differently. Show the same facts to different people and they will draw different conclusions. That's normal: unlike machines, our cognition is not 100% reproducible. We also process information via two routes, a slow, rational brain and a fast, emotional one. Different neural pathways, different results; that's the beauty of being human.
Another point for me is that new ideas come from our ability to absorb and process information. Only when we understand the domain can we turn information into knowledge and build a conceptual understanding. From there, we can do counterfactual analysis: "what would happen if we did X or Y…"
The other big point is that data is a shared responsibility. We are all data stewards and we all have different business purposes. But a true data mesh is a representation of every part of the enterprise, so ALL our voices count and need to be heard.
Not everybody can be a software engineer or a data scientist. But the process of defining terms and their relationships and agreeing on that ground truth must involve everyone. We talk a lot about data democratization. Why don’t we decide to work together to agree on what things mean? It’s a cognitive workout, it’s hard work, but it’s something we must do.
Are we doing this now? Not nearly enough. Many engineers love to be the gatekeepers for how terms are defined. The problem with this is that sales, marketing, or procurement invariably have different ideas.
Jessica also pointed out that governance means having humans in the loop. If we leave everything to machines and let them create our standards, we open the door to bias. Avoiding this requires true collaboration.
One way to decide whether to add or collapse terms is Slack voting. It encourages a conversation so that we reach a democratized data decision and build a shared understanding. When we get our hands dirty and tackle a problem together, we heighten awareness around issues that can end up costing the business a lot of money. Unfortunately, even though the benefits are enormous, there's not always time for a Slack poll. And we don't deal with these sorts of issues until there are fires.
We then spent some time thinking about how to build a truly collaborative data catalog. We imagined a transparent source of truth that everybody can access, coupled with a forum. Ideally, we would be able to search whether somebody has already asked a question and help them resolve conflicts. Jessica and I agreed that this kind of communication loop is essential. We can't just use a bot to build taxonomies and create controlled vocabularies, because who consumes these things? Humans. There has to be consensus and agreement.
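To make the idea concrete, here is a minimal sketch of what the "vote before the vocabulary changes" workflow could look like in code. This is purely illustrative: the class and field names (`TermProposal`, `ControlledVocabulary`, `quorum`) are hypothetical, not from any real catalog tool, and a real system would live in Slack or a catalog UI rather than a Python object.

```python
from dataclasses import dataclass

# Hypothetical sketch: a controlled vocabulary where a new term (or a
# merge of near-duplicate terms) is accepted only after a consensus
# vote, echoing the Slack-poll workflow described above.

@dataclass
class TermProposal:
    term: str
    definition: str
    votes_for: int = 0
    votes_against: int = 0

class ControlledVocabulary:
    def __init__(self, quorum: int = 3):
        self.terms: dict[str, str] = {}  # term -> agreed definition
        self.quorum = quorum             # minimum "yes" votes to accept

    def propose(self, proposal: TermProposal) -> bool:
        """Accept a term only when enough stakeholders agree."""
        enough_support = proposal.votes_for >= self.quorum
        majority = proposal.votes_for > proposal.votes_against
        if enough_support and majority:
            self.terms[proposal.term.lower()] = proposal.definition
            return True
        return False

vocab = ControlledVocabulary(quorum=3)
p = TermProposal("customer", "A party with at least one paid contract",
                 votes_for=4, votes_against=1)
vocab.propose(p)  # accepted: 4 yes votes meet the quorum and win the majority
```

The design choice worth noticing is that the vocabulary itself enforces the consensus rule: no single engineer can gatekeep a definition, because `propose` refuses any term that has not cleared the quorum.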
We also speculated on what can go wrong if we let anyone ask questions on Slack without moderation. Someone has to sort through these requests and categorize them.
What I am thinking
Two years ago I had an intuition that led me to create this podcast, I called it “Data Through a Human Lens”. After the conversation with Jessica, I know why I did it. It's something I felt but couldn’t explain at the time, and it's the fascinating boundary between humans and machines. It's also the process of aligning people. It's deciding to embrace our differences and agree to align on the basics. It's deciding to collaborate in the process of creating and sharing knowledge.
We have yet to see a successful implementation of data mesh. Maybe this is because we've ignored the "socio" half of the socio-technical idea that underpins Data Mesh.
Coming Next
Next week we dive into data observability with the super energetic Salma Bakouk, co-founder of Sifflet. That’s going to be episode 042 - stay tuned and stay curious!
Your ideas help us create useful and relevant content. Send a private message or rate the show on Apple Podcasts or Spotify!