A data catalog promises to help everyone search for and find every document and datapoint in the organization, but it's easier said than done.
The promise of data catalogues is to help us find the right information at the right time. This episode is for people who want to understand the organization's critical assets and their relationships, and to build an enterprise data catalog that actually works.
We will start by defining what a data catalogue really is and explain the strategic value that this capability brings to the business. We then dive deep into how to organize data in data catalogs and other metadata repositories, starting from why most catalogues don't work and discovering some of the techniques we can use to deliver results in both the short and the long term.
My guest today is Ole Olesen-Bagneux, author of the upcoming book “Enterprise Data Catalogues”, published by O’Reilly Media. If there is one thing to remember from this conversation, it is that how we organize data defines how we can search for it, and that we need to rely less on technology and more on the human factor when organizing data.
Ole is passionate about data management, particularly about the problem of search: how do we search for data? He dreams of searches that are seamless and powerful enough to help us answer even the most intricate business questions. Ole writes a LinkedIn newsletter called The Symphony of Search, a supplement to his upcoming book, in which he guides us through the world of search by telling the story of a very special organization, with a focus on the search of the future.
Loris Marini: What if everyone could search for everything in their company? You know, every data asset, all the documents, everything you need.
Loris Marini: Data catalogs promise to help us do just that. But if you have experience building this capability in your own organization, you know how tricky it is to get it right. So this episode is for everyone who wants to understand the organization's critical assets and their relationships, and how to map this whole landscape really effectively. We'll start by defining what a data catalog really is and explain the strategic value that this capability brings to the business.
Loris Marini: We then dive deep into how to organize data in data catalogs and other metadata repositories, starting from why most catalogs don't work and discovering some of the techniques that we can use to deliver results, both short term and long term. If there's one thing to remember from this conversation, it is that how we organize data defines how we can search for it.
Loris Marini: And we need to rely less on technology and more on the human factor when we try to organize data. So today I'm speaking with Ole Olesen-Bagneux, author of the upcoming book Enterprise Data Catalogs, published by O'Reilly Media and soon to be available. Ole is passionate about data management, particularly about the problem of search: how do we search for data?
Loris Marini: What does it mean? He dreams of searches that are seamless and powerful, that can help us answer even the most intricate business questions. Ole has a LinkedIn newsletter called The Symphony of Search, a supplement to his upcoming book, where he guides us through the world of search by telling us the story of a very special organization, which I'm not going to say too much about here, and focuses on the search of the future. You can subscribe now; there will be links as usual in the show notes. And I believe Ole will keep you informed of any upcoming events, particularly around book launches. So if data cataloging and search is a topic dear to you, I strongly recommend grabbing it today. Okay. I'm here with Ole Olesen-Bagneux. Ole, welcome to Discovering Data, and thank you for being with me today.
Ole Olesen-Bagneux: Thank you very much for having me on, and for this fantastic introduction. You are a very smooth communicator, Loris. I couldn't have done this myself, but thank you very much.
Loris Marini: It's a pleasure to have you here. So, Ole, help me understand: what is a data catalog, really? And why is it such a critical capability for the business, for any business?
Ole Olesen-Bagneux: Yeah. So a data catalog, very briefly explained, is a metadata overview of all the data that you have in your IT landscape. It does not contain any data in itself. It contains only descriptions of data that is stored in source systems. So it really is like old-school catalogs, a sort of
Ole Olesen-Bagneux: [00:05:00] gateway to all the knowledge that surrounds the catalog. That, at a very high level, is what a data catalog is.
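To make the "descriptions only" point concrete, here is a minimal Python sketch. The class names, fields and example asset are hypothetical illustrations, not the design of any particular catalog product: each entry stores a description plus a pointer back to its source system, so the catalog can answer "where is this data?" without ever holding the data itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical metadata record; the data itself stays in the source system."""
    name: str
    source_system: str   # where the actual data lives
    location: str        # pointer inside the source system, e.g. schema.table
    description: str
    owner: str
    tags: frozenset = field(default_factory=frozenset)


class DataCatalog:
    """One gateway: a single index of descriptions pointing to many source systems."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def locate(self, name: str) -> str:
        """Answer 'where is this data?' without reading the data."""
        entry = self._entries[name]
        return f"{entry.source_system}:{entry.location}"


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="customer_orders",
    source_system="erp_prod",
    location="sales.orders",
    description="One row per order line",
    owner="sales-ops",
    tags=frozenset({"sales"}),
))
print(catalog.locate("customer_orders"))  # erp_prod:sales.orders
```

Keeping one `DataCatalog` instance for the whole company mirrors the "one gateway, not ten" idea discussed next.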
Loris Marini: I love it. A gateway, right? So, like a point of access: when you don't know what data you need or where things are, you go there and knock on one door instead of knocking on ten. Is that the idea?
Ole Olesen-Bagneux: Yeah, totally. The idea, at its best, is that a company should only have one data catalog, because there is this distinction, which you can find in the literature, between the map and the territory. The territory looks the way it does. It's a piece of land, an island, whatever it is; this territory is something in the world.
Ole Olesen-Bagneux: Now, when we draw a map of it, we draw a distinct map of the territory, and each time different people draw this map, there will be slight differences. So the difference between the map and the territory is something that you really have to watch out for. If we map the territory differently, then things do not match.
Ole Olesen-Bagneux: And we can't really understand the territory. That's why we need only one data catalog in a company, not ten, because several catalogs will not depict what is out there in the same way.
Loris Marini: Right. So it would be the equivalent of taking many different pictures with different cameras, different lenses, different zoom levels and different angles, and then trying to do some complicated 3D rendering to recompose the original image. Obviously it's time-consuming. And obviously this is an analogy; reality is much more complicated
Loris Marini: than that, because we need to dive into the topic of meaning and how people interpret things. And I can't wait. So before I go crazy and jump, you know, 150 miles an hour into this topic, where do you think we should start?
Ole Olesen-Bagneux: I actually think we are beginning this conversation quite well, Loris. I think we should just continue. I want to add that a data catalog most likely works in one of two ways: it can either pull data with crawlers from the data landscape itself, or data can be pushed from the data sources into the catalog via streaming.
Ole Olesen-Bagneux: And so, going back to the analogy with the map and the territory, a crawler that crawls the data source will always be an interpretation. It interprets what counts as an asset in this data source. That's why the result will never be exactly the same if we use several catalogs. But basically, at a high level,
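A rough sketch of the two ingestion styles, assuming an in-memory SQLite database as a stand-in source system. The pull side uses Python's built-in `sqlite3` module with SQLite's `sqlite_master` table and `PRAGMA table_info`; the rule that "every table is one asset" is our own assumption, which is exactly the kind of interpretation a crawler makes. The push side (`PushInbox`, `on_event`) is a hypothetical receiver, not a real streaming API.

```python
import sqlite3


def crawl_sqlite(conn: sqlite3.Connection, source_name: str) -> list[dict]:
    """Pull model: the catalog visits the source and decides what counts as an asset.
    Assumed rule for this sketch: every table is one asset, described by its columns."""
    assets = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        assets.append({"source": source_name, "asset": table, "columns": columns})
    return assets


class PushInbox:
    """Push model (hypothetical): the source emits metadata events, the catalog records them."""

    def __init__(self) -> None:
        self.assets: list[dict] = []

    def on_event(self, event: dict) -> None:
        # Here the source, not the catalog, has decided what an asset is.
        self.assets.append(event)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
pulled = crawl_sqlite(conn, "erp_prod")

inbox = PushInbox()
inbox.on_event({"source": "crm", "asset": "contacts", "columns": ["id", "email"]})
```

Two crawlers with different "what is an asset" rules would produce two different maps of the same territory, which is why the number of catalogs matters.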
Ole Olesen-Bagneux: what I want to talk about is the fine art of searching for data, which is something very different from searching in data. We do not talk very much about searching for data, so I think that's a very important topic to discuss.
Loris Marini: Let's dive in. So what's the difference between searching for data and searching in data?
Ole Olesen-Bagneux: Yeah. So basically the difference is this: when you have a specific question that you want to answer, if you are a data scientist, for example, or work in a similar discipline, you search in data, in databases, writing SQL statements, for example. But when you search for data, you're not searching inside a database to ask questions of the data that is in there.
Ole Olesen-Bagneux: You're searching for places where you can actually find data. So: where can I find the most relevant research data related to this process in the organization, and so on. This searching-for-data discipline is something of its own. It needs to be catered for in other systems.
Ole Olesen-Bagneux: And it works in a slightly different way than searching in data. I think one of the great mysteries of data catalogs, something many people are puzzled about, is this: if they take the best brains in their company and say, "How does this thing actually work? I've bought a data catalog.
Ole Olesen-Bagneux: Can you tell me how it works?", they will not give totally precise answers, because these very bright-minded people are used to searching in data. That's what they do for a living. Searching for data is something a little different. It takes a different skill set.[00:10:00]
Ole Olesen-Bagneux: It takes a lot of patience, and it's definitely not something that is valued very much, so you won't find many employees capable of doing it at a very high level of quality, quite frankly.
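The difference between searching in data and searching for data can be sketched side by side. The first half asks a question of the rows inside one (in-memory SQLite) database; the second half asks where relevant data lives by matching keywords against catalog descriptions. The catalog entries, system names and the naive "all terms must match" rule are hypothetical simplifications; real catalogs rank results far more elaborately.

```python
import sqlite3

# Searching IN data: the question is answered by the rows inside one database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (compound TEXT, phase INTEGER)")
conn.executemany("INSERT INTO trials VALUES (?, ?)", [("X-12", 2), ("Y-7", 3)])
phase3 = conn.execute("SELECT compound FROM trials WHERE phase = 3").fetchall()

# Searching FOR data: the question is "where does relevant data live?",
# answered from descriptions (metadata), never from the rows themselves.
catalog = [
    {"asset": "trials", "system": "research_db", "description": "clinical trial phases per compound"},
    {"asset": "batches", "system": "lims", "description": "manufacturing batch quality results"},
    {"asset": "orders", "system": "erp", "description": "customer orders and invoices"},
]


def search_for(query: str) -> list[str]:
    """Return system.asset locations whose description contains every query term."""
    terms = query.lower().split()
    return [f'{e["system"]}.{e["asset"]}' for e in catalog
            if all(term in e["description"].lower() for term in terms)]


print(phase3)                 # [('Y-7',)]
print(search_for("quality"))  # ['lims.batches']
```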
Loris Marini: That's incredibly insightful, especially in light of a recent conversation I had with Bill Schmarzo about his book on the economics of data and AI. He proposes the idea of the marginal propensity to reuse, which basically means: how likely is a data point or a data set to be reused in the future?
Loris Marini: And his point is that data is an intangible asset, and like all intangible assets, it can be used over and over and over. It doesn't deplete. It's not like a truck you buy that depreciates over time; it's something that can gain value over time. Of course, that rests on a big assumption: can you reuse it? And of course, reuse has many components to it.
Ole Olesen-Bagneux: Mm-hmm
Loris Marini: We can reuse it when, first, we can find it, and then understand it.
Loris Marini: Right. So finding is key if we want to change what we do in the organization from exploitation, that is, taking a data set, trying to extract meaning from it, then dumping it somewhere in a repository or an S3 bucket and forgetting about it, to finding it, refining it, adding meaning to it, sharing it, and going through that loop again and again to add value.
Loris Marini: So it's incredibly important. Strategically, this can make all the difference between an ROI that goes to zero and an ROI that is exponential, at least in the short term.
Loris Marini: So why is it that we struggle so much to understand the mechanisms of organizing data and searching for it, and what can we do to avoid the traps?
Ole Olesen-Bagneux: Oh, I think a lot of basic human behavior plays into this. Of course, the data catalog space is not completely evolved in terms of technology, but I think the major thing at stake here is human nature. No one wants to fill out forms and correctly document that a process has been carried out. No one wants to do that. It's just boring stuff. And because it is so boring, the methodologies around it have not evolved to a very high degree of complexity and potential.
Ole Olesen-Bagneux: That is, unless you find yourself in a very specific discipline: for example, compliance and quality functions in the pharmaceutical or food industry, or the public sector, in national archives, museums or other institutions that are
Ole Olesen-Bagneux: by nature given the task of preserving something for eternity. There you can find these disciplines, and they actually work very well; they are intelligent, and I find them very beautiful. But if you go to a medium-sized company that is not subject to heavy regulation, because it doesn't make something that goes inside the human body or something that poses a threat to nature, there we have this issue of technology that
Ole Olesen-Bagneux: necessitates human activity at a level where it's difficult to engage end users. So I think that is, to be honest, the biggest pain point for data catalogs, and for every data tool out there.
Loris Marini: A few thoughts bubble up in my mind. The first one is the conversation I had recently with Irina Steenbeek on Data Lineage from a Business Perspective, one of the latest books she published. We had a similar conversation around data lineage and metadata management with Jan Ulrich, VP of Research and Education at MANTA.
Loris Marini: And the insights there were really about the frequent disconnect between the real world, the world in which the business operates, and the digital representation of that world, and the importance of creating maps to navigate how we understand, digitally, what the business does.
Loris Marini: We also talked a lot about this with James Price, I think it was episode 17, Time to Wake Up: the problem of motivating the importance of data management to the business, and so understanding the strategic imperative behind any data management initiative. And of course catalogs, as they belong to the macro field of data architecture, which is
Loris Marini: part of data [00:15:00] management, fit into that conversation. From James, I remember the biggest insight was that you have to shift: as a business executive, you need to start thinking about data as an asset, as something you control and can monetize. It's the same thing Doug Laney preaches in his book Infonomics. Beautiful book, by the way; I really enjoyed it.
Loris Marini: And the insights are always the same, right? We need to change our mindset from data being something I use once to something I reuse, something that can grow in value and usefulness over time, assuming we are ready to put in the work of refining it, structuring it and standardizing it, which is what people associate with the boring stuff of data in general, you know, the data life cycle.
Loris Marini: Now, I strongly disagree with that, because I think that managing data, whether it's a catalog, or trying to map the lineage, or designing the systems that need to be in place to move it around, is fundamentally a human challenge. So when you say that the problem with data catalogs is human behavior,
Loris Marini: I'm like, yes, this is what we need to unpack. And I don't think there is a book yet that really dives into what human behavior in data means. We can take that route, or we can focus on search first and then talk about this, but I definitely want to talk about this with you.
Loris Marini: So
Ole Olesen-Bagneux: Yeah, cool. Me as well. I think these topics are kind of overlapping. So I have a couple of comments on what you're saying. First of all, I think there's a very interesting paradigm shift in the way we think of assets in the data mesh movement, where it has been put forward that we could also think of data as a product and not an asset.
Ole Olesen-Bagneux: And so the difference between thinking of data as an asset and as a product, according to the data mesh movement, is that a product is something that consumers want to have. They want to purchase it; they really want to have it. Whereas data as an asset, at least to some extent, protects data in a way that
Ole Olesen-Bagneux: locks it away and focuses more on keeping and protecting it, like oil that you put in highly secured places where very few people can access it. Whether the distinction stands the test of time, let's see. I think it's an interesting concept to consider as an alternative to data as an asset.
Ole Olesen-Bagneux: Now, talking about human behavior and how we can perhaps encourage a better way of searching for data, I think it's overlapping in the sense that many of the systems we use in the data catalog universe also hold data lineage.
Ole Olesen-Bagneux: Most good data catalogs have a data lineage functionality built into them. As I see it, these tools have been developed by the data management community in parallel with the computer science discipline. I should say that I have a background in library and information science.
Ole Olesen-Bagneux: I have a PhD in information science, and it is my experience that computer scientists and data management professionals who do not share this background are not very used to thinking about how to structure data at a conceptual level. I know this is a very provocative thing to say, but the idea behind my statement is not that they don't know how to organize data; it's that they are used to thinking of organizing data in databases,
Ole Olesen-Bagneux: data models and data structures that orchestrate processes within IT systems. Now, if you want to organize data so that you can search for data, rather than in it, in a database, you need to go about it in a slightly different way. So I think many of the tools out there perhaps do not encourage end-user engagement beyond the
Ole Olesen-Bagneux: bare minimum of what is required of the user, simply [00:20:00] because not all of them are fully developed yet. Some of the data catalogs out there are superb, so I don't want to trash-talk the data catalog space at all, but you really see a lot of different things in this space.
Ole Olesen-Bagneux: And the most important thing in this regard is the ability to depict your knowledge in domains and tag it with vocabularies of various degrees of complexity. This is very important. You must have very controlled vocabularies and very loosely controlled vocabularies, and you must be able to navigate the universe of knowledge in your company in a way that is understandable and also very stable and solid. This is very difficult to obtain, and it is not something that is provided out of the box by the technology vendor. They just deliver the platform, right? So the intellectual exercise of actually mapping the territory is up to the end users, up to the company that buys this technology.
Ole Olesen-Bagneux: And that is very important, and it is very difficult, and people have tried to solve it without a background that gets them used to thinking in these ways.
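One way to picture the difference between very controlled and loosely controlled vocabularies is the small sketch below. The terms, the alias table and the idea of counting free tags so they can later be reviewed and promoted are illustrative assumptions, not a feature of any specific catalog.

```python
# Hypothetical vocabularies for illustration only.
CONTROLLED = {"batch release", "deviation", "stability study"}   # only these terms are allowed
ALIASES = {"qc release": "batch release", "stability test": "stability study"}  # variant -> preferred


def tag_controlled(term: str) -> str:
    """Strict vocabulary: normalise known variants, reject anything not in the list."""
    t = term.strip().lower()
    t = ALIASES.get(t, t)
    if t not in CONTROLLED:
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    return t


def tag_loose(term: str, folksonomy: dict[str, int]) -> str:
    """Loosely controlled vocabulary: accept any term, but count how often it is used
    so that popular tags can later be reviewed and promoted into the controlled list."""
    t = term.strip().lower()
    folksonomy[t] = folksonomy.get(t, 0) + 1
    return t


usage: dict[str, int] = {}
tag_loose("Q3 report", usage)
tag_loose("q3 report", usage)
print(tag_controlled("QC Release"))  # batch release
print(usage)                         # {'q3 report': 2}
```

The strict function keeps navigation stable; the loose one keeps the barrier to tagging low. A catalog usually needs both, as Ole says.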
Loris Marini: So if I were to step into an enterprise, an organization that's struggling with mapping its data landscape, and I wanted to lead that intellectual exercise we mentioned before, of aligning people and meaning, and structuring and organizing the data of the company, where do I start? And what kind of pitfalls or roadblocks should I expect along the way, especially at the very beginning?
Ole Olesen-Bagneux: Oh yeah, that's a very messy exercise, but it's also very fun. I really love doing it. I think that just as much as humans hate filling out forms and documenting stuff, it is just as much fun to try to map the territory, right? To actually think: how is this put together?
Ole Olesen-Bagneux: Right. I offer sound and practical advice in my book, and I can definitely mention some of the points here. Data catalogs work in three dimensions. You organize your data vertically in domains, horizontally in lineage, and then as a splash in every direction, in a knowledge graph.
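The three dimensions can be modeled as three separate structures over the same assets: a domain path (vertical), lineage edges (horizontal) and typed graph relations (any direction). This is a minimal sketch with hypothetical asset names and relation types; production catalogs typically keep these in dedicated metadata and graph stores.

```python
from collections import defaultdict

# Vertical: each (hypothetical) asset sits at one place in a domain hierarchy.
domain_of = {
    "hr.onboarding_forms": ("HR", "Onboarding"),
    "hr.employees": ("HR", "Employee management"),
    "finance.payroll": ("Finance", "Payroll"),
}

# Horizontal: lineage edges describe how data flows from one asset to the next.
lineage = [("hr.onboarding_forms", "hr.employees"), ("hr.employees", "finance.payroll")]

# Knowledge graph: typed relations in any direction, including non-data concepts.
graph = defaultdict(set)
for subj, pred, obj in [
    ("finance.payroll", "governed_by", "Salary policy"),
    ("hr.employees", "contains", "Personal data"),
    ("finance.payroll", "contains", "Personal data"),
]:
    graph[subj].add((pred, obj))
    graph[obj].add((f"inverse_{pred}", subj))  # store the reverse edge for navigation


def browse_domain(top: str) -> list[str]:
    """Vertical search: every asset under a top-level domain."""
    return sorted(asset for asset, path in domain_of.items() if path[0] == top)


def next_hops(asset: str) -> list[str]:
    """Horizontal search: where does this asset's data flow next?"""
    return [dst for src, dst in lineage if src == asset]


def related(node: str) -> set:
    """Graph search: every typed relation touching this node."""
    return graph[node]
```

For example, `related("Personal data")` jumps straight to both assets holding personal data, regardless of their domain or position in the lineage.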
Ole Olesen-Bagneux: So you have these three dimensions in a data catalog, and the vertical dimension is the first one you have to work on: the domain, the domain mapping. You have to map your domains. I have talked about one of the problems here on another excellent podcast, just like yours, Loris: Data Mesh Radio, by Scott Hirleman.
Ole Olesen-Bagneux: There I discussed the domain in particular, because the domain is something that is difficult to map. Again, many data scientists and computer scientists come from a domain-driven design background, so they are used to designing domains as something that orchestrates how data flows inside a system or between systems.
Ole Olesen-Bagneux: But if you want to map a domain that is totally abstract, not linked to IT systems in any way, then you must step out of this notion of trying to link domains in terms of how data flows between them. That's the heritage of Eric Evans's book on domain-driven design. That's how they are used to designing software, because domain-driven design is intended for good software design, not good design of domains themselves,
Ole Olesen-Bagneux: of knowledge itself. There's a very important distinction here. So the way you do domain mapping in a data catalog is based on identifying stable containers that can be divided into subparts. You can use either capabilities or processes. If you're a highly regulated company, you have a QMS, a quality management system,
Ole Olesen-Bagneux: and inside that QMS there is a process map. If you have that process map, please do not try to invent a new map. You already have your map; it's right there. So use it as a way to structure all the data in your company. That's the first and most important thing: you organize your domains vertically.
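Reusing an existing process map as the vertical structure might look like this. The process IDs, names and asset placements are hypothetical; the point is that assets are attached to nodes of a map that already exists, and browsing a domain means walking that map's hierarchy.

```python
# Hypothetical excerpt of a QMS process map: process id -> (name, parent id).
PROCESS_MAP = {
    "P": ("Quality management system", None),
    "P1": ("Manufacturing", "P"),
    "P1.1": ("Batch production", "P1"),
    "P1.2": ("Batch release", "P1"),
    "P2": ("Human resources", "P"),
}

# Data assets are placed in the existing map instead of a newly invented one.
asset_domain = {"mes.batch_records": "P1.1", "lims.release_tests": "P1.2", "hris.staff": "P2"}


def path(pid: str) -> list[str]:
    """Breadcrumb from the root of the map down to this process."""
    names = []
    while pid is not None:
        name, parent = PROCESS_MAP[pid]
        names.append(name)
        pid = parent
    return names[::-1]


def in_subtree(pid: str, root: str) -> bool:
    """True if process `pid` is `root` or sits anywhere below it."""
    while pid is not None:
        if pid == root:
            return True
        pid = PROCESS_MAP[pid][1]
    return False


def assets_under(root: str) -> list[str]:
    return sorted(a for a, pid in asset_domain.items() if in_subtree(pid, root))


print(path("P1.2"))        # ['Quality management system', 'Manufacturing', 'Batch release']
print(assets_under("P1"))  # ['lims.release_tests', 'mes.batch_records']
```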
Ole Olesen-Bagneux: Once you've done that, many [00:25:00] data catalogs will provide you with the horizontal way of searching for data, with a lineage function. Lineage functions can be made in many different ways, but lineage is an automatic interpretation. So, unlike domains, lineage is something you get out of the box with the technology you choose. And then the graph: if you also have the possibility of having a graph, then...
Loris Marini: And even then, lineage itself has different levels of abstraction: conceptual, logical and physical lineage. Is cataloging mostly concerned with the conceptual and the logical, or does it often go all the way down to the physical? You know, caring about how integers are represented, the actual underlying binary representation of data.
Loris Marini: As people who are searching for data, should we care about all three layers of lineage, or do we...
Ole Olesen-Bagneux: You...
Ole Olesen-Bagneux: ...choose. There are some...
Loris Marini: ...in the case of just looking for data, not in data, should we just stop at the conceptual and logical levels?
Ole Olesen-Bagneux: I don't know if I can answer that, Loris, to be totally honest, but I can definitely give you some context that I think is relevant to your question.
Loris Marini: Yeah, sure. I mean, it is a conversation.
Ole Olesen-Bagneux: Yeah, I want to be completely honest: I don't know if I can answer it. Let me be honest about it, but I...
Loris Marini: I can't answer 90% of the questions I ask.
Ole Olesen-Bagneux: Okay. Okay.
Loris Marini: ...it's Discovering Data because we're discovering it. So...
Ole Olesen-Bagneux: Yeah. I want to get back to the name of your podcast, because it made me think of something else. But yeah, lineage can have two primary qualities. If you work in analytics in some capacity, it's very interesting for you to go upstream in lineage, because you can see where something breaks in that lineage.
Ole Olesen-Bagneux: You can see why your report, whatever it is about and however complex it is, breaks, because you can go upstream in the lineage. And so, to approach an answer to your question, I'd say the lineage functionality must be precise enough to actually deliver answers in such a browsing experience, searching for data upstream in the lineage.
Ole Olesen-Bagneux: And then there's the opposite direction. The DPO of your company, the data protection officer, will sit close to the source and say, "We have this sensitive data; how is it processed in my company?" That's lineage downstream, the opposite direction. They want to see how this data is processed
Ole Olesen-Bagneux: and whether it is processed in accordance with the consent we have collected. So these are the two big selling points for a good lineage function in a data catalog: reporting people can go upstream to see why and where their reporting breaks,
Ole Olesen-Bagneux: and it's downstream for the CISO, the chief information security officer, and the data protection officer, assessing how this data flows from the source to its destination: is there something that is insecure or not respecting privacy, and so on. I think the best lineage functions go down to column level, so they can look at the actual values in columns and see how they are processed from system to system. I read a source saying there are at least 14 different ways of depicting lineage. I didn't agree with half of them, but it goes to show that there are many ways of depicting lineage.
Ole Olesen-Bagneux: But I definitely think that a conceptual lineage, whatever that would look like, doesn't offer a lot of value.
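The two lineage directions described above, upstream for reporting and downstream for the DPO or CISO, are the same graph walked in opposite directions. This sketch assumes hypothetical column-level edges named `system.table.column` and uses a plain breadth-first search; real lineage tools derive such edges from SQL, ETL jobs and similar sources.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage edges: (source column, derived column).
EDGES = [
    ("crm.customers.email", "staging.customers.email"),
    ("crm.customers.country", "staging.customers.country"),
    ("staging.customers.email", "marketing.campaign_list.email"),
    ("staging.customers.country", "reports.sales_by_country.country"),
    ("erp.orders.amount", "reports.sales_by_country.revenue"),
]

downstream_of = defaultdict(list)
upstream_of = defaultdict(list)
for src, dst in EDGES:
    downstream_of[src].append(dst)
    upstream_of[dst].append(src)


def traverse(start: str, adjacency: dict) -> set:
    """Breadth-first walk collecting every column reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Reporting view: why does this report column look wrong? Walk upstream.
report_sources = traverse("reports.sales_by_country.country", upstream_of)
# Privacy view: where does this personal data end up? Walk downstream from the source.
email_flow = traverse("crm.customers.email", downstream_of)
```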
Loris Marini: Yeah, conceptual is way too abstract. I suppose it also depends on who is asking the question: whether it's someone who then has to further manipulate data or create child models from the upstream ones, or someone who is just looking at high-level interdependencies. Sometimes you do impact analysis, especially when we are refactoring, that is, changing the code that
Loris Marini: describes how we organize data, for example in a data warehouse or lakehouse or whatever we want to call it. Sometimes it's useful, or even necessary, to ask the question: if I change a line of code here, how many systems [00:30:00] downstream will I impact? It might not necessarily be related to privacy or compliance, but it is something that affects our ability to
Loris Marini: keep systems running and keep delivering data to every downstream app that uses it.
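The impact-analysis question (if I change this, how many systems downstream are affected?) is also a downstream traversal, just grouped by system. A sketch over hypothetical table-level lineage, assuming table names follow a `system.table` convention:

```python
from collections import deque

# Hypothetical table-level lineage: table -> tables built directly from it.
DOWNSTREAM = {
    "warehouse.dim_customer": ["warehouse.fct_sales", "ml.churn_features"],
    "warehouse.fct_sales": ["bi.revenue_dashboard"],
    "ml.churn_features": ["ml.churn_model_scores"],
}


def impacted_systems(changed: str) -> set[str]:
    """Systems (the prefix before the first dot, an assumed naming convention)
    that consume the changed table directly or transitively."""
    seen, queue = set(), deque([changed])
    while queue:
        for child in DOWNSTREAM.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return {table.split(".")[0] for table in seen}


print(sorted(impacted_systems("warehouse.dim_customer")))  # ['bi', 'ml', 'warehouse']
```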
Loris Marini: So, is it fair to say that the lineage we need depends on the question we ask? And of course we do want the ability to go all the way down to the physical level, because sometimes the questions require that information, and the tooling has to be ready to provide it. That's...
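Since the lineage needed depends on the question, one approach is to store lineage at the finest grain available and coarsen it on demand. This sketch rolls hypothetical column-level edges up to table level or system level by truncating dotted names; the naming convention and the decision to drop edges that collapse into self-loops are assumptions made for illustration.

```python
# Hypothetical column-level edges, named system.table.column.
COLUMN_EDGES = [
    ("crm.customers.email", "dwh.dim_customer.email"),
    ("crm.customers.id", "dwh.dim_customer.customer_id"),
    ("erp.orders.customer_id", "dwh.fct_sales.customer_id"),
    ("dwh.dim_customer.customer_id", "dwh.fct_sales.customer_key"),
]


def roll_up(edges, depth: int) -> set[tuple[str, str]]:
    """Coarsen lineage by keeping the first `depth` name parts:
    depth=2 gives table-level lineage, depth=1 gives system-level lineage.
    Edges that collapse onto a single table or system are dropped."""
    coarse = set()
    for src, dst in edges:
        s = ".".join(src.split(".")[:depth])
        d = ".".join(dst.split(".")[:depth])
        if s != d:
            coarse.add((s, d))
    return coarse


table_level = roll_up(COLUMN_EDGES, 2)
system_level = roll_up(COLUMN_EDGES, 1)
```

A privacy review might need the column edges, an impact analysis the table level, and an executive overview the system level, all from one stored graph.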
Ole Olesen-Bagneux: Yeah, definitely. But lineage really is a very complex technology. I know there are good lineage vendors out there, and I deeply respect the work they do, because lineage is so complicated to do. It's very complicated.
Loris Marini: I wanted to take a step back, because I wanted to start from lineage: moving my thoughts horizontally is a bit more intuitive to me than going vertically. So I started from the horizontal axis, lineage, but of course lineage also lets us move through the vertical lines.
Loris Marini: So let's dive into the domain aspect, because that's also something really complicated. People have different ideas of what a domain is. How do we wrap our heads around what a domain is? And what are some of the tips you learned along the way when structuring data?
Ole Olesen-Bagneux: Yeah. As I said, I have a background in library and information science. The reason I mention this is that it developed completely in parallel with domain-driven design, that discipline and all the heritage we have from Eric Evans's book. Completely in parallel. There are domain analysis studies in library and information science, and they also study domains, but they do not study domains to build software.
Ole Olesen-Bagneux: It has nothing to do with software; it has to do with how we map the knowledge universe of the world. That's a wonderful discipline, and I really love it. It offers sound and easy advice for understanding domains. So in library and information science, domains are not difficult.
Ole Olesen-Bagneux: The analysis of identifying a domain is difficult, but understanding what a domain is is not that difficult. It's a group of people who share a purpose, means of communication, and methods as well.
Ole Olesen-Bagneux: So a domain in that respect is not super difficult. What I simply suggest is that you use capabilities: try to translate your organization into a set of capabilities with a hierarchy, or use a process map. But when you depict these capabilities, for example IT management as a capability and the capabilities underlying it, do not think of domains as something that has to orchestrate software.
Ole Olesen-Bagneux: Do not think of domains as something that needs to be tied together because data flows between them. It has nothing to do with that. Think of domains as something that represents the universe of knowledge in your company. So inside IT management you will have sub-capabilities, depending on
Ole Olesen-Bagneux: where you are on, for example, your cloud and agile journey. You can have agile team management, or you could have old-school on-premise management, right? So you just depict these capabilities that reside within the IT management capability.
Ole Olesen-Bagneux: And at that point, when you have a map of all the sub-capabilities that link up to the higher-level capability, you pull out your crawlers, or you start pushing with your streaming technology. So at that level, you define how the catalog actually ingests small parts of an IT system into that exact capability.
Ole Olesen-Bagneux: And that lets you represent all the data in your company in a logical way that you can browse vertically.
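To picture the capability-first approach Ole describes, here is a minimal Python sketch, assuming a simple in-memory tree: people define the capability hierarchy first, and only then are data sources attached to specific sub-capabilities, so the catalog can be browsed top-down. The `Capability` class, its methods, and the source names (`jira_cloud`, `cmdb`) are hypothetical illustrations, not any catalog product's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Capability:
    """A node in a human-defined capability map; names are nouns ("... Management")."""

    name: str
    # Hypothetical: identifiers of the data sources a crawler is pointed at for this node.
    sources: list[str] = field(default_factory=list)
    children: list[Capability] = field(default_factory=list)

    def add(self, child: Capability) -> Capability:
        """Attach a sub-capability and return it so the tree can be built fluently."""
        self.children.append(child)
        return child

    def find_path(self, name: str) -> list[str] | None:
        """Breadcrumb from this node down to the named capability, or None if absent."""
        if self.name == name:
            return [self.name]
        for child in self.children:
            path = child.find_path(name)
            if path is not None:
                return [self.name, *path]
        return None

    def all_sources(self) -> list[str]:
        """Sources attached to this capability and every capability beneath it."""
        collected = list(self.sources)
        for child in self.children:
            collected.extend(child.all_sources())
        return collected


# Build the map first; only then attach sources to specific sub-capabilities.
it_management = Capability("IT Management")
it_management.add(Capability("Agile Team Management", sources=["jira_cloud"]))
it_management.add(Capability("On-Premise Infrastructure Management", sources=["cmdb"]))

print(it_management.find_path("Agile Team Management"))  # ['IT Management', 'Agile Team Management']
print(it_management.all_sources())                       # ['jira_cloud', 'cmdb']
```

The point of the design is that the tree exists independently of any software: the crawler configuration hangs off the leaves, not the other way round.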
Loris Marini: So, capability here: I want to make sure I understand the terminology in terms of the business context. Is it a [00:35:00] business line? For example, are marketing, operations, and procurement domains, or...
Ole Olesen-Bagneux: Yeah. So for the definition, there are several sources that define capabilities. Bibo, for example (I can't actually remember what the acronym stands for), is a very good source in this regard, but there are also some research papers out there on capabilities.
Ole Olesen-Bagneux: You can also look at The Practice of Enterprise Architecture. Its author is actually also based in Australia.
Loris Marini: Interesting.
Ole Olesen-Bagneux: Yeah, yeah. So...
Loris Marini: We'll add those to the show notes for those who want to deep dive.
Ole Olesen-Bagneux: Definitely. I'll send you the links, of course, Loris. But the simple definition of a capability is what you do.
Ole Olesen-Bagneux: What do you do? And it is expressed statically, as opposed to a process map, which really describes how you do it. The difference is that with a process map, you have your value chain, your strategic processes, and your supporting processes, and they are interlinked.
Ole Olesen-Bagneux: They support each other in various ways. HR, for example, has onboarding, then employee management, and then offboarding. That is a process; those things are linked. But capabilities are like building blocks. They are not linked. They're just different kinds of activities.
Ole Olesen-Bagneux: And both actually work in a data catalog context. You just shouldn't mix them. That's the big thing you have to remember: don't mix capabilities and processes. But either one works just fine if you want to build a map of your data.
Loris Marini: So, back to the HR example: onboarding and offboarding are two examples of capabilities, and the sequence in which you activate them is the process. Am I
Ole Olesen-Bagneux: Yeah.
Loris Marini: understanding you correctly?
Ole Olesen-Bagneux: Mm. Yeah. I mean, you express capabilities with nouns, not verbs. I know this is very much in the weeds, but if you want to express a capability, use a noun. It's not an activity. So use "management" at the end, for example. Whereas if it's a process, you onboard; that's a process.
Ole Olesen-Bagneux: Something happens after the onboarding, whereas a specific capability doesn't necessarily need anything to happen after it. And if you really want to be meticulous about it, Loris...
Loris Marini: Yeah, I'm loving this.
Ole Olesen-Bagneux: Yeah, that's good. Okay, so now we pull in an extra dimension: capabilities can sit in many places in your process map.
Ole Olesen-Bagneux: Imagine I'm a manager. I have to do one-on-ones with my employees, assess their performance, and so on, regardless of whether I'm a manager in finance or in manufacturing. So capabilities are all over the process map: the same capability can appear in many different parts of it.
Ole Olesen-Bagneux: That's why you shouldn't mix them if you want to depict the universe of knowledge in your company; it just doesn't work. You should stick to either processes or capabilities.
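One way to make the "don't mix them" rule concrete is a check run over a domain tree before it is loaded into a catalog. The Python sketch below is hypothetical: it flags a tree that mixes capability nodes with process nodes, and it reports names that appear more than once, which is what happens when a capability such as performance assessment gets copied into several processes. The `Kind`/`Node` types and the example names are invented for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    CAPABILITY = "capability"  # static: what you do (named with a noun)
    PROCESS = "process"        # dynamic: how you do it (steps in a sequence)


@dataclass
class Node:
    name: str
    kind: Kind
    children: list[Node] = field(default_factory=list)


def walk(root: Node):
    """Yield every node in the tree, depth first."""
    yield root
    for child in root.children:
        yield from walk(child)


def check_tree(root: Node) -> list[str]:
    """Return human-readable problems; an empty list means the tree is consistent."""
    problems = []
    if len({node.kind for node in walk(root)}) > 1:
        problems.append("tree mixes capabilities and processes")
    counts = Counter(node.name for node in walk(root))
    for name, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"'{name}' appears {count} times")
    return problems


# A process map into which the same capability has been copied twice.
mixed = Node("Operations", Kind.PROCESS, [
    Node("Finance Reporting", Kind.PROCESS, [Node("Performance Assessment", Kind.CAPABILITY)]),
    Node("Manufacturing", Kind.PROCESS, [Node("Performance Assessment", Kind.CAPABILITY)]),
])

# A pure capability map: each capability appears once, with no sequence implied.
clean = Node("People Management", Kind.CAPABILITY, [
    Node("Performance Assessment", Kind.CAPABILITY),
    Node("Recruitment Management", Kind.CAPABILITY),
])

print(check_tree(mixed))
print(check_tree(clean))  # []
```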
Loris Marini: So if we mix them, the same capability gets represented in different ways in two different parts of the organization, and the versions will surely differ, because they're produced by different people doing different abstraction exercises, that intellectual effort of defining things that we mentioned before.
Loris Marini: Basically, the bottom line is that there's no interoperability between the two. There are two versions of the same thing, so which one do you look at?
Ole Olesen-Bagneux: So the question here, Loris, I think, was what happens if you mix capabilities and processes?
Ole Olesen-Bagneux: Yeah. So in the spirit of this conversation, I'd say that you lose the possibility of searching for data. You can't navigate that structure; you don't know what's up and what's down. It's just nonsense. I hope it makes sense to the listeners.
Loris Marini: Yeah, I'm definitely building a picture in my mind. We started with domains, then lineage, and then connecting things. I'm [00:40:00] thinking about lineage horizontally and domains vertically, and then, in this matrix, we need the ability to connect these boxes, basically these dots, with each other, right?
Loris Marini: That was the third layer that you mentioned.
Ole Olesen-Bagneux: That was the knowledge graph. So...
Ole Olesen-Bagneux: Yeah. So imagine that you click on the top domain. Let's say we stay in the process space, and we stay with the HR example for the sake of the listeners. We click on the HR domain. We go into the processes carried out in the HR domain, and we see, for example, onboarding. We click that one.
Ole Olesen-Bagneux: We go into onboarding, and there are the very first parts of onboarding, like headhunting or even scouting, right? We go into that sub-domain, and at that point we launch the crawler towards the assets that sit in the data sources relevant for that specific domain. So we do a very precisely defined scan
Ole Olesen-Bagneux: of some data sources that we think belong to this domain. Once we have done that, we can expose them within this domain, and then we can click on those assets and browse their lineage.
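The scoped scan in this walkthrough can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular catalog works: a hypothetical in-memory dictionary stands in for the metadata a real crawler would read from each system, and the domain lists only the sources judged relevant to it, so unrelated systems (here `payroll_db`) are never scanned. All source and table names are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stand-in for real systems: a real crawler would connect to each
# source and read its metadata (tables, columns, owners, ...).
SOURCE_METADATA = {
    "ats_db": ["candidates", "job_postings"],
    "sourcing_crm": ["prospects"],
    "payroll_db": ["salaries"],
}


@dataclass
class Domain:
    path: tuple[str, ...]   # breadcrumb, e.g. ("HR", "Onboarding", "Scouting")
    sources: list[str]      # only the sources judged relevant to this domain
    assets: list[str] = field(default_factory=list)


def crawl(domain: Domain, metadata: dict[str, list[str]]) -> Domain:
    """Scan only the sources registered for this domain and expose their assets under it."""
    for source in domain.sources:
        for table in metadata.get(source, []):
            domain.assets.append(f"{source}.{table}")
    return domain


scouting = Domain(("HR", "Onboarding", "Scouting"), sources=["ats_db", "sourcing_crm"])
crawl(scouting, SOURCE_METADATA)
print(" > ".join(scouting.path), scouting.assets)
# HR > Onboarding > Scouting ['ats_db.candidates', 'ats_db.job_postings', 'sourcing_crm.prospects']
```

Which sources belong to which domain is a human decision recorded in `sources`; the code only executes it.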
Loris Marini: The dream.
Ole Olesen-Bagneux: That is what a data catalog should definitely be capable of doing, and the best data catalogs out there can do that for you. But you need to know all these human practice elements. The technology will not give you this structure; you need to define it. And then we also need to talk about the knowledge graph, right?
Ole Olesen-Bagneux: Because the knowledge graph is the last thing. Each of those assets in those domains will also be browsable in a knowledge graph structure, which is something other than lineage. At least in my view; some would disagree and say that the knowledge graph is also lineage. I don't think it is, because the knowledge graph really tells you that this asset is linked
Ole Olesen-Bagneux: to this asset, which could be a term or a process or whatever you want. It's totally free, right? We define the knowledge graph as we want. It's an ontology. So...
Loris Marini: You,
Ole Olesen-Bagneux: So that's how you navigate a perfect data catalog, in my view.
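To show how a knowledge graph differs from lineage in the sense Ole means, here is a small Python sketch assuming a plain list of (subject, predicate, object) triples. The predicates are free and carry meaning, like an ontology, rather than recording how data flows; lineage edges would be stored and traversed separately. The entity names and predicates (`describes`, `supports`, `governs`, `narrower_than`) are hypothetical.

```python
# Knowledge graph as (subject, predicate, object) triples. The predicates are free:
# they encode meaning (an ontology), not the flow of data, so they live apart from lineage.
TRIPLES = [
    ("ats_db.candidates", "describes", "term:Candidate"),
    ("ats_db.candidates", "supports", "process:Scouting"),
    ("term:Candidate", "narrower_than", "term:Person"),
    ("policy:Data Retention", "governs", "ats_db.candidates"),
]


def related(entity: str, triples: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """List (relation, neighbour) pairs for an entity, following edges in both directions."""
    pairs = []
    for subject, predicate, obj in triples:
        if subject == entity:
            pairs.append((predicate, obj))
        elif obj == entity:
            pairs.append((f"inverse of {predicate}", subject))
    return pairs


for relation, neighbour in related("ats_db.candidates", TRIPLES):
    print(f"{relation:>20}  {neighbour}")
```

Browsing from an asset this way can lead to a business term, a process, or a policy, none of which would ever appear in a lineage diagram.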
Loris Marini: You mentioned crawling a couple of times as the first step once you identify the domain. So in terms of the underlying systems that need to be in place for a data catalog to work, how do we know it's the right time, that we are ready to implement cataloging as a capability for the organization?
Ole Olesen-Bagneux: Oh, yeah, that's another good one, Loris. I have an entire chapter on implementation and persuading stakeholders, because people get this technology wrong to such a degree that I could keep explaining it to stakeholders for the rest of my working life. I could have a perfect working life just explaining, and that's what I do, by the way. But yeah, so.
Ole Olesen-Bagneux: You want to persuade your DPO and your CISO, your data protection officer and your chief information security officer, that they need to stop drawing their own maps of the territory. The DPO has a privacy information management system, a PIMS, wherein the DPO has drawn a map of all the data in the company. The chief information security officer has an ISMS, an information security management system, wherein the chief information security officer has drawn a map of all the data
Ole Olesen-Bagneux: in the company. They have not drawn the same map, because they haven't coordinated this activity. It is manually maintained, and it prevents them from actually doing their jobs. Persuade them to make one map in the catalog: one map that identifies personally identifiable information and the confidential information that we need to protect for security reasons.
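The "one map" idea can be sketched as a single classification record per asset, from which the DPO's and the CISO's views are derived instead of being maintained by hand in two places. This is a minimal Python illustration under assumptions: the `AssetClassification` fields and the asset names are hypothetical, not a PIMS or ISMS schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetClassification:
    asset: str
    contains_pii: bool   # what the DPO cares about (privacy)
    confidential: bool   # what the CISO cares about (security)


# One shared, catalog-maintained map instead of separate PIMS and ISMS inventories.
CATALOG = [
    AssetClassification("hr.employees", contains_pii=True, confidential=True),
    AssetClassification("finance.forecast", contains_pii=False, confidential=True),
    AssetClassification("web.page_views", contains_pii=False, confidential=False),
]


def dpo_view(catalog: list[AssetClassification]) -> list[str]:
    """Assets holding personally identifiable information."""
    return [a.asset for a in catalog if a.contains_pii]


def ciso_view(catalog: list[AssetClassification]) -> list[str]:
    """Assets holding information that must be protected for security reasons."""
    return [a.asset for a in catalog if a.confidential]


print(dpo_view(CATALOG))   # ['hr.employees']
print(ciso_view(CATALOG))  # ['hr.employees', 'finance.forecast']
```

Because both views read the same records, a reclassification made once is immediately visible to both officers.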
Ole Olesen-Bagneux: Then you expand the network of stakeholders to all the analytics people. They're pretty keen on getting this technology, so they won't oppose it. It makes them capable of discovering data sources for analytical use cases. That's the big selling point of data catalogs, really.
Ole Olesen-Bagneux: So they won't oppose it, but if they do, show them the lineage function. Show them that they can see why their reporting breaks. That's the big way to persuade them if [00:45:00] they oppose it. You shouldn't try to implement a data catalog without having persuaded these stakeholders, although you can also persuade them afterwards
Ole Olesen-Bagneux: if, say, a chief data officer wants the data catalog. But it needs persuasion, because the DPO will be worried: are we exposing too much data to too many users? You need to explain that this is a metadata tool. It doesn't expose data; it only exposes metadata. So it's not dangerous, but you need to explain that
Ole Olesen-Bagneux: thoroughly and show it. The same goes for the chief information security officer. Once they get that it's a tool that will allow them to work more efficiently and reach a higher level of professionalism in what they do, they are totally with you. But you need to explain this to them.
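Showing analysts "why their reporting breaks" amounts to an upstream lineage query. The Python sketch below is an illustration, not any product's feature: given hypothetical lineage edges recorded by a catalog, a breadth-first walk from a report lists every upstream asset, which can then be checked against known failures. The asset names and the failed ERP load are invented.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges, upstream -> downstream, as a catalog might record them.
EDGES = [
    ("crm.accounts", "staging.accounts"),
    ("erp.invoices", "staging.invoices"),
    ("staging.accounts", "mart.revenue"),
    ("staging.invoices", "mart.revenue"),
    ("mart.revenue", "report.quarterly_revenue"),
]


def upstream_of(asset: str, edges: list[tuple[str, str]]) -> list[str]:
    """Breadth-first walk from an asset back to everything it depends on."""
    parents = defaultdict(list)
    for up, down in edges:
        parents[down].append(up)
    seen, order, queue = set(), [], deque([asset])
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Suppose the ERP load failed last night: which of the report's inputs are affected?
failed_sources = {"erp.invoices"}
suspects = [a for a in upstream_of("report.quarterly_revenue", EDGES) if a in failed_sources]
print(suspects)  # ['erp.invoices']
```

Note that the walk only touches lineage metadata, never the rows themselves, which is also the argument Ole suggests making to the DPO.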
Loris Marini: Absolutely. And that communication is critical to anything we do in the organization, in any context really, even outside work. Even between husband and wife: where do we put the t-shirts, and where do we put the pants? I'm thinking really down to earth here, but it is a problem.
Loris Marini: We moved houses four times in the last nine years, and every time we move, you need to reorganize your things, right? So if we substitute intangible data objects for physical objects, the underlying problem still remains. Someone has to agree on where we put stuff;
Loris Marini: otherwise we can't find it.
Ole Olesen-Bagneux: Loris, exactly. That is the truth of what I'm trying to say here. That is exactly the point
Loris Marini: Just...
Ole Olesen-Bagneux: of organizing data: so that you can find it again.
Loris Marini: We just need to... yeah, let's agree. Otherwise, what are we doing here? Wasting an hour every time I need to look for my black shoes? That's not possible.
Ole Olesen-Bagneux: And say that you organize your shoes following a different principle than your wife. Let's say she has a lot of shoes and you have a lot of shoes. Why not? If those two principles aren't aligned, then the structure in its totality doesn't make any sense.
Ole Olesen-Bagneux: It's not
Loris Marini: Yeah, and it gets worse, because you can't replicate shoes easily; you'd have to go to the shop and buy them. Whereas with data, if someone can't find a "shoe", quote unquote, they'll just make up a new one, right? They'll create a new copy of a thing and name it differently.
Loris Marini: And all of a sudden you have two black shoes. One is deep black, one is more towards gray, and who knows who defined it, when they
Loris Marini: defined it, and for which business purpose. Multiply that by the number of people in a large organization, and it's no surprise that we end up with a big mess.
Ole Olesen-Bagneux: Exactly. Yeah. That is data cataloging at its core.
Loris Marini: Yeah, I could spend another 24 hours talking about this with you, but I'm conscious of our time, and I think we're getting towards the end. Before we close off, is there anything crucially important that you wanted to communicate that I, as a host, didn't do a good job of asking you about?
Ole Olesen-Bagneux: Oh, I think this was a wonderful conversation, Loris. It really was. No, I'd definitely like to propose something else, a suggestion. I think this conversation was just perfect, so I have nothing to add. I know that the title of your podcast refers to discovering data as a...
Ole Olesen-Bagneux: ...like, what's the universe of data, what's inside it? And I really love the diversity of all your guests. They have so many different backgrounds, and I enjoy listening to the podcast very much. But you could perhaps also spend a couple more episodes on the thing that I thought your podcast was about
Ole Olesen-Bagneux: when I discovered it: discovering data as in data discovery. The very process of data discovery: how does that play out? Because I see a lot of misunderstanding about searching for data versus searching in data. People think data discovery begins when they're searching in data, because that's where the really complex maths actually begins, right?
Ole Olesen-Bagneux: Searching in data. But data discovery begins before that. It begins with searching for data. So that's...
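Ole's distinction between searching for data and searching in data can be shown as two separate steps. In this hypothetical Python sketch, step one searches only catalog metadata (descriptions and tags) to find which dataset exists; step two searches inside that dataset's rows. The catalog entries, rows, and function names are made up for illustration.

```python
# Hypothetical catalog metadata: what exists and what it means (no rows here).
CATALOG = {
    "hr.monthly_headcount": {"description": "Headcount per department per month", "tags": {"hr", "headcount"}},
    "sales.orders": {"description": "One row per customer order", "tags": {"sales"}},
}

# Hypothetical data, normally living in the source systems, not in the catalog.
DATA = {
    "hr.monthly_headcount": [
        {"month": "2022-05", "department": "Finance", "employees": 41},
        {"month": "2022-05", "department": "Manufacturing", "employees": 230},
    ],
    "sales.orders": [],
}


def search_for_data(term: str) -> list[str]:
    """Step 1: search the catalog's metadata to find which datasets exist."""
    term = term.lower()
    return [name for name, meta in CATALOG.items()
            if term in meta["description"].lower() or term in meta["tags"]]


def search_in_data(dataset: str, **filters) -> list[dict]:
    """Step 2: only once a dataset has been found, search inside its rows."""
    return [row for row in DATA[dataset]
            if all(row.get(key) == value for key, value in filters.items())]


found = search_for_data("headcount")
rows = search_in_data(found[0], department="Finance")
print(found, rows)
```

Without step one, an analyst would have to know in advance that `hr.monthly_headcount` exists before any query could begin.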
Loris Marini: I am absolutely on board, absolutely on board. And actually, if there's any particular speaker you think I should get in touch with (and that's true for you, but also for you, the listener: if you're listening to this and someone clicks, and you're like, "Oh, I'd love Loris to have a conversation with [00:50:00] that person"),
Loris Marini: definitely reach out on LinkedIn or via email; you'll find the details on the website. I'm always looking to add new angles to the conversation and really deep dive, one podcast at a time, to try to get everybody closer to some sort of agreeable methodology and framework for thinking about
Loris Marini: data management initiatives. Like Scott Taylor says many times on LinkedIn, one thing is to search for meaning: that's analytics, data science, data storytelling. Another thing is to search for the truth in the data. And data management is really about trying to get to the bottom of what is what, and whether you can trust it.
Loris Marini: These are really the two fundamental questions. So, Ole, it was absolutely a pleasure for me having you on the show.
Ole Olesen-Bagneux: Likewise.
Loris Marini: I genuinely look forward to connecting outside the podcast and doing way more discovery together.
Ole Olesen-Bagneux: Yeah. Likewise, Loris, this has been great. Let's keep the conversation going off air.
Loris Marini: Awesome. Cheers, mate. Thanks.
Ole Olesen-Bagneux: Thank you.
Loris Marini: Enjoy the rest of your day.
Ole Olesen-Bagneux: You too. You too. Bye.