Episode 21

From data governance to data fabric

Loris Marini - Podcast Host, Discovering Data

Managing data assets is not just about technology, but the right platform can make our lives a lot easier. The folks at Ataccama are leading the space. What are they doing differently?


Episode transcripts

[00:06:00] Loris Marini: It's fair to say that the data layer in most organizations today is full of gaps and cracks, those seemingly invisible roadblocks that oppose the flow of information and slow everybody down. They can be as simple as incompatible units of measure and different data types, all the way up the abstraction ladder to the territory of semantics and meaning. Going back to the hospital with 7 definitions of patient or the enterprise with 170 definitions of customer, these gaps remain unnoticed until they become the center of everybody's attention. They might merely hinder innovation, but they can also lead to reputational damage, litigation, and various degrees of psychological suffering.

Fixing this is almost an art that requires a careful balance of tools, skills, and behavior. How do the best teams in the world manage it? Today, I have the pleasure of speaking with Marek Ovčáček, VP of Platform Strategy at Ataccama. Marek's job is to build the next-generation unified data management and governance platform. The two of us connected via LinkedIn over data and leadership; beyond those, we're both very passionate about physics and knowledge sharing. I'm very excited to have him here today. Marek, welcome to The Data Project.

Marek Ovčáček: Happy to be here. Thank you for having me.

Marek's journey in data management

[00:07:34] Loris Marini: My pleasure. So let's start with your journey in data. What was it like to get here?

Marek Ovčáček: Well, thank you for this question. As you mentioned, we are both interested in physics, and I was actually studying physics back at university when I first came across a problem that involved huge data sets and data quality. I was doing experiments that produced tens and hundreds of gigabytes of data, and I needed to figure out how to process that data, clean it up, and get rid of inconsistencies and outliers. Another time, I was conducting an experiment and wanted to compare my results against those from multiple different institutions, and everybody had different ways of storing the data, different slices of it. So, getting it all into the same format was kind of a problem.

I started my journey into data quality very early, and I've kind of stayed there from that point on.

[00:08:47] Loris Marini: Interesting. So, when you made that transition from the academic world into the data industry, what did you expect to see? And, as a broader question, how are we managing data in the world today, in general?

Marek Ovčáček: Well, when I started, I was quite surprised, because I thought the academic world had the cutting edge of technology. But a lot of the proprietary technologies used in the industry were actually better than the tools we had in academia, because the approach is different: in industry, people tend to buy new tools, whereas in academia you tend to create your own solution. Ataccama actually had in the past, and still has, a free product called the Data Quality Analyzer, and I was able to use it to process some of my data sets. It was very useful, and it was just a very small slice of what is available in professional tools in the data governance and data management industry. So it was quite nice to get that glimpse.

[00:10:02] Loris Marini: Yeah, I was impressed, because I saw that you recently published a survey to get a pulse of the market on data governance and data management. There were more than a thousand respondents and, if I remember correctly, almost 74% replied that Excel spreadsheets and emails are the primary way data moves through their organizations, and only 1 in 10 uses a data catalog or has internal procedures and protocols to actually manage data properly.

Were you surprised to see this or was it something that was to be expected?

Marek Ovčáček: Yeah, so actually that's not the primary way to move data. It's one of the ways, but still a very important one. I prefer, of course, the primary way, which is pre-built data pipelines, but yes, 74-75% still mentioned that Excel is somewhere in that process, that manual Excel work and emailing data are involved somewhere in their organization. And honestly, it was not a surprise to me. Everybody knows Excel. If you look at the modern interfaces for data preparation or data transformation, they basically copy Excel because it's familiar.

Where I was a bit surprised was another statistic that came out of that survey: only 10% of people tend to use a data catalog, even if they have one in their organization. So, we dug deeper into those questions and found out that it's not that the data catalog lacks the information people need; it's that it lacks any kind of next step. You have a data catalog, you have the information you need, but then you cannot do anything with the data because the catalog isn't really connected to any kind of data processing or data quality reports or anything like that.

I actually have an idea why that is. Most of the very popular data cataloging and data management tools on the market start from the position of "we want to build a data catalog, and everything else will follow." We actually started in the opposite direction: for four years, we provided data quality and data processing to companies, then we saw a need for a data catalog, so we added it. From that survey, it became very obvious to us that there is a big hole in the market in that area.

Approachable tools to govern your data

[00:13:13] Loris Marini: Yeah, absolutely. And I think the concept of familiarity is crucial because it taps into the psychology of human behavior: how we do things and why we prefer one way over another.

Sometimes we even go to great lengths and try to do the impossible with a tool that wasn't designed for what we need to do. But because it's the only thing we really know how to use, we stick to it, which is paradoxical, because sometimes a task literally takes four, five, or six times as long as it would with the right tools. Because you don't even know what's available out there, you kind of settle for this suboptimal situation.

Recently, a prospect of mine came with a request. They're in the food industry and they wanted to track which allergen claims they can make for a given recipe. They have this whole system in Excel, except it's not working, and they needed it fixed. So they reached out and said, hey, could you help out? And I asked: is Excel a must-have here or a nice-to-have? Can we go somewhere else? Because if we can choose another tool, I can do it in a day. If we're going to stick with Excel, it's going to take a lot longer. We'd have to hardcode this stuff from scratch, and maintainability is going to be a problem, because unless you have someone who knows how to manipulate it via VBA, you won't be able to make changes.
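
To make the tooling point concrete, here is a minimal sketch of that allergen-claims logic in plain Python rather than Excel formulas. Everything in it (the ingredient-to-allergen map, the recipe, the claim categories) is a hypothetical illustration, not the prospect's actual system:

```python
# Hypothetical map from each ingredient to the allergens it carries.
INGREDIENT_ALLERGENS = {
    "wheat flour": {"gluten"},
    "whole milk": {"milk"},
    "almond paste": {"tree nuts"},
    "cane sugar": set(),
}

# The claim categories we care about (illustrative, not a regulatory list).
ALL_ALLERGENS = {"gluten", "milk", "tree nuts", "peanuts", "soy", "egg"}

def allergen_claims(recipe: list[str]) -> dict[str, set[str]]:
    """Given a recipe (a list of ingredients), return which 'contains'
    and 'free-from' claims can be made."""
    contains: set[str] = set()
    for ingredient in recipe:
        # An unknown ingredient should block claims, not pass silently.
        if ingredient not in INGREDIENT_ALLERGENS:
            raise KeyError(f"no allergen data for ingredient: {ingredient}")
        contains |= INGREDIENT_ALLERGENS[ingredient]
    return {"contains": contains, "free_from": ALL_ALLERGENS - contains}

# e.g. contains gluten and milk; free from the remaining allergens.
print(allergen_claims(["wheat flour", "whole milk", "cane sugar"]))
```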

There is a huge element here, and it almost ties in with the concept of integration and continuity between systems.

I mean, think of Excel. It's so familiar. People already have it installed by default on their machines, and they know it's there. 99% of the time, they use it as if it were a platform, but it's really just a Swiss Army knife. It can give you a lot of flexibility and a lot of functionality, but that's about it: you have to build that functionality yourself. And sometimes you're not an expert and don't know the best way to do it. So, making that leap and realizing there is a limit, and that sometimes you just need a better tool, is important. Obviously, that's not the whole story. So, just to crystallize: what is Ataccama? What's the core problem you're solving?

[00:15:55] Marek Ovčáček: So at Ataccama, we create tools that unify data governance, data quality, and master data management, and give you a single platform where you can do all the actions required to govern and master your data. That's the gist of it.

Now, into the details: basically, we provide end-to-end data management solutions, starting with a data catalog, through data quality and master data management, all the way to reporting at the end. We try to leverage all the available information about the data and automate a lot of these parts of the stack, making your life easier and the tools more approachable.

Flexible data governance: how to test ideas quickly with properly governed data sets

Loris Marini: In data management, we hear a lot that technology is one part of the picture, but that alone cannot deliver the outcomes we want, because there are also processes and people involved. So upskilling internally, as we mentioned at the beginning, means getting people to know how to behave and how to use the tech, but also establishing the processes and procedures to align on even something as simple as a generally understood way of interpreting a data set, instead of 50 different ones. In that sense, going back to the original question, how does Ataccama do that? What are the protocols and lessons you learned internally in managing your own data stack?

[00:17:36] Marek Ovčáček: It starts with the culture. At Ataccama, we have what we call an unlimited playground. People can go and try different things to find out what's the best fit for them, and whether it's going to help the company or help our customers. So it's about giving them the freedom to experiment and grow. And we have people moving through different positions in the company, growing wherever they seem like the best fit.

But at the same time, we need to make sure that no one is accessing data they shouldn't, no one is working with tools they shouldn't, and no one gets information about certain customers that is restricted. We apply the same principle to our tools.

We are building tools that allow people to have fast and governed access to data, so they can experiment, sandbox, and test their theories on data sets in a properly governed way. You have some kind of governance in place that keeps you from creating a mess. To what you mentioned in the intro: if you have 170 definitions of a customer, that's not good for the company. It may allow people to experiment and try new things, but in the end, you will end up with a big mess.

We provide tools that allow you to manage all of that information in one place and guide users to create unified definitions and connect them to their data. Because once anybody does any kind of data management or data governance work, that information should be propagated to the rest of the company. And if the next person goes and does the same thing, they should at least be given a glimpse of that information. They should be starting with a good, properly managed data catalog and a glossary that is connected to the catalog in a unified way.

Building good data habits: the role of a platform to nudge user practices without getting in the way

So that's where we come into play. And this is not a tool that enforces governance on people. It's a tool that gives suggestions. When you see the data you want to work with, instead of making you classify your data yourself, the tool will look at other data sets in the organization and at the catalog, and tell you: this looks familiar, this looks like your customer definition.

Is it the customer definition or is it something else? We are not forcing the user to classify it properly. We found out through years of experimentation, and of people using these tools, that if you give them suggestions, they tend to go with what makes more sense. They're not going to create a new definition of a customer out of spite; they see one already exists, so they use what's there.

The way for this to work properly is enablement: making sure that the act of classifying data and properly managing all the definitions is easy for the customer. It doesn't create additional work for them. It's easy because it looks familiar. And the system learns from user actions and makes itself better over time.
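
As a toy illustration of suggestion-based classification (not Ataccama's actual algorithm), one could compare a new data set's column fingerprint against columns already classified under existing glossary terms and surface the closest match as a suggestion. All names here are invented:

```python
def fingerprint(columns: set[str]) -> set[str]:
    """Normalize column names into a simple comparable fingerprint."""
    return {c.strip().lower() for c in columns}

# Columns already classified under existing glossary terms (hypothetical).
KNOWN_TERMS = {
    "customer": fingerprint({"customer_id", "name", "email", "country"}),
    "order": fingerprint({"order_id", "customer_id", "amount", "placed_at"}),
}

def suggest_term(new_columns: set[str], threshold: float = 0.5) -> str | None:
    """Return the best-matching known term by Jaccard similarity,
    or None so the user can create a new definition."""
    new_fp = fingerprint(new_columns)
    best_term, best_score = None, 0.0
    for term, fp in KNOWN_TERMS.items():
        score = len(new_fp & fp) / len(new_fp | fp)
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score >= threshold else None

# "This looks like your customer definition" -- a suggestion, not an enforcement.
print(suggest_term({"Customer_ID", "Name", "Email", "Phone"}))  # customer
```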

[00:21:52] Loris Marini: Yeah, it makes me think of one of my favorite books, Atomic Habits by James Clear, where he really distills the dynamics of habit building. He says, if you decide you want to run for an hour, starting with the idea that your very first run will be a full hour sounds incredibly daunting. You're probably not going to do it. But if you think, well, all I have to do is put on my running shoes and get out the door, and that's it, then you have much lower resistance to overcome. The running example from the book can be applied to anything, and in terms of the data governance workflow, it's the same.

If it takes me contacting five different people and reading hundreds of pages of policy documents just to send an email or move an Excel spreadsheet from one place to another, I'm not going to do it. I will just do whatever makes sense to me at that moment, and the trouble is that what makes sense for me is not what makes sense for you.

Marek Ovčáček: Yes.

Loris Marini: So it's almost guaranteed that we end up with a mess, unless the system nudges people in a direction that is designed for the collective good of the organization.

Workflow and user experience: a tour inside the Ataccama platform

Could you walk me through the life cycle? I'm really interested. Say I'm an analyst who has been tasked with answering a question about what's going on with our sales in a particular region. What would be the first thing I do inside the platform?

[00:23:47] Marek Ovčáček: Yeah, that's a good question. You mentioned "what makes sense at the moment," and that's actually where I want to start. For example, more than half of business users, when they need to do anything like you just described, have to wait more than a day to access and transform the data set. And that kills the moment. I have an idea right now. I want to do something with the data. I know the data set exists. I want to test my theory. And then I submit the request and wait for a day. Tomorrow, I'm in a completely different mindset, a completely different space, and I need to adjust again and get back to it.

It kills the kind of agility that innovation needs. What we are doing at Ataccama: as an analyst, you can go to our catalog and search for the appropriate data set. That's usually the easiest thing you can do in any catalog.

What we do differently is that, since we do both cataloging and data processing with data governance built in, if you have access to any data from the data set, you are free to use it.

Loris Marini: Right.

Marek Ovčáček: And this actually ties in with granular data governance. If something is wrong with the sales data from a particular region, you have to look at that data set connected to the customers. You have to deal with customer-identifying information, PII data, GDPR regulations, and all of that. You might not have access to it. But you actually don't need access to that to get the information you need.

You don't need identifying information about customers to determine where your sales data are off. So if you have granular data governance like ours, you automatically get access to the part of the data set you're allowed to see, and whatever is locked behind a governance barrier is filtered out for you. In most cases, you just don't need the rest of the data. You don't need the personally identifying information there.
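
A minimal sketch of what column-level ("granular") governance might look like: the analyst's request succeeds, but columns they are not cleared for are filtered out rather than the whole request being denied. The policy, clearances, and data below are hypothetical:

```python
# Columns classified as PII by the (hypothetical) governance policy.
PII_COLUMNS = {"name", "email", "phone"}

def authorized_view(rows: list[dict], user_clearances: set[str]) -> list[dict]:
    """Return rows with any column the user is not cleared for removed,
    instead of rejecting the whole request."""
    return [
        {col: val for col, val in row.items()
         if col not in PII_COLUMNS or "pii" in user_clearances}
        for row in rows
    ]

sales = [
    {"region": "WA", "amount": 1200.0, "name": "Jane Doe", "email": "jane@x.com"},
    {"region": "WA", "amount": 840.0, "name": "John Roe", "email": "john@x.com"},
]

# An analyst without PII clearance still gets enough to investigate the sales issue.
print(authorized_view(sales, user_clearances=set()))
# [{'region': 'WA', 'amount': 1200.0}, {'region': 'WA', 'amount': 840.0}]
```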

Then you want to do something with that bit of data. First, you want to figure out whether the data set you found is the right one, and what the basic characteristics of that data set are. What we do at Ataccama is not wait for you to perform that action: we are running statistics on data sets continuously, especially for the most frequently used ones.

So, even while you're searching for a data set, you already have that information. Let's say you need customer data for Western Australia. You search for it; you see some data sets. In our catalog, we have already automatically computed data quality information for those data sets. Let's say you see that one of them has significantly better data quality because it already went through some standardization, so you start using that one. That's additional metadata that is automatically available in your catalog. You don't have to produce it manually.

What if there is something wrong with the data set? We have AI models looking at the statistics from data sets to see if anything is anomalous. We call it anomaly detection, and it gives you that information while you are searching, because we are precomputing it. And we actually have customers who are very happy with it. We have banking customers who were able to detect fraudulent transactions with it. We're saving millions and millions of dollars for our customers just by looking at the basic data quality indicators of a data set, without requiring any user action.
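
A heavily simplified sketch of the idea: profile every load of a data set, then flag loads whose precomputed quality metrics drift far from history. Ataccama uses richer AI models; a bare z-score check just shows the shape of the mechanism:

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag `latest` if it sits more than z_threshold standard deviations
    from the historical mean of a data quality metric."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

# Daily completeness (%) of a customer data set, precomputed at load time.
completeness_history = [99.1, 98.7, 99.3, 99.0, 98.9, 99.2]
print(is_anomalous(completeness_history, 99.0))  # False -- business as usual
print(is_anomalous(completeness_history, 87.5))  # True  -- surface it in the catalog
```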

So, those are all the things you can get from proper data governance and data management without any user interaction. But then, if you're an expert, you usually want to go further. We allow you to build your own business rules to check that data set, beyond the basic data quality rules, so you have statistical and business rules built on top of your data. Once you do that, you can run them on the data set right from the interface, again removing the context switching between different tools and implementations. At the end, you can export the result, usually in Excel format, as you know, Excel is very, very popular, or just share it with the rest of your organization. And not just the results, but also the business rules you created, so other people can use the same definition of a customer, like we discussed before. You can do this with business rules, as well as with multiple data sets from different parts of the company.
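
As a rough illustration of user-defined business rules layered on top of generic quality checks, rules can be small declarative objects evaluated against a data set, with pass rates (and the rules themselves) shareable across the organization. The rule names and data are invented for the example:

```python
from typing import Callable

# A rule is a human-readable name plus a row-level predicate.
Rule = tuple[str, Callable[[dict], bool]]

RULES: list[Rule] = [
    ("amount is positive", lambda r: r["amount"] > 0),
    ("region is known", lambda r: r["region"] in {"WA", "NSW", "VIC"}),
    ("discount below 50%", lambda r: r.get("discount", 0) < 0.5),
]

def run_rules(rows: list[dict], rules: list[Rule]) -> dict[str, float]:
    """Return the pass rate per rule -- the kind of summary you would
    export to Excel or share alongside the rule definitions."""
    return {
        name: sum(check(row) for row in rows) / len(rows)
        for name, check in rules
    }

rows = [
    {"region": "WA", "amount": 120.0, "discount": 0.1},
    {"region": "XX", "amount": -5.0, "discount": 0.7},
]
print(run_rules(rows, RULES))
# {'amount is positive': 0.5, 'region is known': 0.5, 'discount below 50%': 0.5}
```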

Here is how a data platform can encourage collaboration and bring everyone on the same page

[00:30:01] Loris Marini: One example that comes to mind is, again, another prospect. This is an enterprise, a really big company operating everywhere across the world, and they have an internal glossary of terms. It's not really a business glossary; it's mostly like a database behind a web URL. It's super intuitive, anyone in the company can use it. You can type in an acronym and it will tell you what it means. It's connected to the bigger picture of the business activities, customers, and products.

So, I'm thinking, with a system like this, is it possible to connect those two dots? Because sometimes you get an idea while you're looking at the data, especially as an analyst or a data scientist solving a problem. You realize that there are issues with how people perceive or interpret a data set. And so the catalog and the glossary, particularly the glossary, come into play. Because if you find that there is a disagreement, and you get the right stakeholders on the line and reach a new understanding, and you want to correct it, you should be able to do so as seamlessly as possible. Is that something you see happening a lot? Because I'm interested in closing that loop.

Marek Ovčáček: Yes. And that's actually true for the majority of catalog and glossary tools out there in the professional enterprise market right now. You have a glossary and a catalog in one place, and usually some kind of business workflow on top of that. You have an agreement at the beginning, say, that glossary terms have a certain structure, and if you want to change one, it needs to go through some approval process. How we approach that at Ataccama, and what we found works best, is that we open up the ability to change glossary terms to as wide an audience as possible.

[00:32:11] Loris Marini: Yeah.

Marek Ovčáček: Almost anybody can create what we call drafts. You can propose a change. Anyone can look at a business term, and then it goes through some kind of back-and-forth approval process to refine the suggestion. What's important is that instead of looking at the term, finding the owner, and emailing them, or finding them somewhere in the organization and discussing it with them, you can do it right from the interface. You can just say, I want to change the definition of a customer, do it right in the tool, and create a request for change. The rest is facilitated through the standard processes.

Loris Marini: That reminds me of the GitHub workflow, when someone submits a pull request and you've got the full context of the code, line by line. You know what you want to change.

Marek Ovčáček: Yes, that is actually a good example. But we don't like to use it; we use a more business-user-friendly framing. They usually get scared when we start explaining it as, you create code, and this is how we do a pull request. You've lost them at that point. That's why we call them drafts: it's more understandable for them. And this kind of merging is easier anyway. It's not even merging; it's just replacing the term with a new version.
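
A minimal sketch of the drafts workflow as described: anyone proposes a change, it goes through review, and publishing simply replaces the term with a new version. The states and fields are simplified assumptions, not Ataccama's data model:

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    name: str
    definition: str
    version: int = 1

@dataclass
class Draft:
    term: Term
    proposed_definition: str
    author: str
    status: str = "open"   # open -> approved -> published (or rejected)
    comments: list[str] = field(default_factory=list)

def publish(draft: Draft) -> None:
    """Replace the live definition with the approved draft -- 'not even
    merging, just replacing it with a new version'."""
    if draft.status != "approved":
        raise ValueError("only approved drafts can be published")
    draft.term.definition = draft.proposed_definition
    draft.term.version += 1
    draft.status = "published"

customer = Term("customer", "Any party we have invoiced.")
draft = Draft(customer, "Any party with a signed contract.", author="analyst@example")
draft.comments.append("steward: agreed after review")
draft.status = "approved"
publish(draft)
print(customer)  # definition replaced, version bumped to 2
```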

[00:33:53] Loris Marini: Yeah. We've been doing this for a while. You know, we actually synced notes for this episode using Google Docs, because you've got that rich context and you can see when someone proposes a change. You can approve it or discard it. Having that context avoids the back and forth of sending this version and that version around.

How your data catalog can help you bridge the business-technical divide

I don't know the statistics, but intuitively I think the mere fact that you don't have everybody on the same tool, not even the same page, metaphorically speaking, but the actual same tool, is a reason why people don't know what's going on. How can you manage something if you don't know the current state, you don't know who is proposing changes, and you have to reconstruct all of that by reading infinite threads of emails with CCs? It's just not going to happen.

[00:34:41] Marek Ovčáček: The number one reason why data governance and data management projects fail in an organization is a lack of buy-in from users. As you mentioned, they need to be in the same tool. If they're not, then one part of the company is doing one thing, another part is doing another, and everything breaks down because nobody knows what the rest of the company is doing.

In our survey, only about 10% of business users are really using the data catalog. For them, the data catalog is something technical. Developers use it to manage their metadata. The business users have it, and they're told they should be using it, but it doesn't bring them any value. So business user buy-in is very low.

You have a catalog, but it's useless because nobody is using it. So we are trying to get to a state where we can provide useful functionality to both business users and technical users, with different parts of the company using the same tool, so they can communicate with each other. That creates an interface between different parts of the company that glues all of these efforts together.

At the center: CDOs and strategic initiatives

[00:35:59] Loris Marini: Yeah. So, speaking of adoption and users, who are the people who reach out to you most? What do they do? What's their role, and what's the business driver behind their looking for a new solution?

Marek Ovčáček: That's actually a great question, because it started to change in the last three to five years. It used to be that the people who reached out to us were in middle management. They had some project in mind, or they were looking to implement a certain project or functionality, like supporting business users creating marketing campaigns, or cleaning up data because of regulation. But in the last three to five years, more and more people from the data governance organization of a company, all the way up to the CDOs, have started to approach us. Back then, a lot of companies didn't even have a CDO. Now basically everybody has one, and they approach us with really strategic initiatives. They know they need to level up, or even start, data management and data governance initiatives company-wide. And they are looking for tools that can work together with other legacy or newer parts of the organization, and that can integrate with all the different types of data sets.

You have cloud platforms and software-as-a-service applications like Salesforce, Office 365, and SharePoint that you don't actually own, you just lease them from the vendor. And you have to have a tool that brings all of that together. The only people who can drive it together are the people in the data governance organization under the CDO. They need to be able to access all parts of the data landscape of the organization, and the only way to do that is to have a tool, or set of tools, that can physically access the data, gather the statistics, gather the metadata, and put them in a central location.

How a simple trick can multiply adoption and user engagement

[00:38:21] Loris Marini: So, one tool to rule them all. The single point of access seems to be a recurring theme. In terms of adopting and rolling out the solution, what are the human challenges involved? What should people expect once they've decided they found the right platform and are ready to kick things off?

Marek Ovčáček: Yeah. As you mentioned before, adoption and human interaction are among the biggest parts of data governance. A tool is not a silver bullet. It's not going to fix all your problems, and it's not going to create a data governance initiative for you. It requires you to go to the users and guide them through the process.

What does the data governance initiative mean for them? How can they contribute? What kind of value can they get from it? Setting the tools aside, user adoption is as big a part of a data governance initiative as anything else. At the very beginning, you should have introductory information prepared for the different parts of the company: What is the process? What kind of benefit will it give you? How can you contribute, and how can you see the result of your contribution?

What I found works best is when users can actually see the results of their contributions and the value of whatever is built under the data governance initiative. We have customers who manage hospitals in the US, and through data quality statistics they found they could change the behavior of some of the personnel and lower infection rates. This was pre-COVID, in operating rooms, certain check-ups, things like that. Through data quality, they found that changing some settings in their procedures drastically lowered infection rates.

They tell that story in company meetings, they have newsletters, they create the kind of visibility where users can see that their contributions have a positive effect, not only on the company but on their customers and their organization. For me, half of the work is not implementing tooling, connecting the systems, and making sure all the trains run on time; it's making sure that people are actually contributing to the system and feel that they are both receiving and providing value.

The importance of the discovery phase

[00:41:19] Loris Marini: We have a saying, I think in Australia, I heard it recently and it made me laugh: having all the ducks in a row. It really appeals to this image of getting people to manifest coherent behavior. I'm wondering, in terms of industry size, type, and level of information maturity, what does it look like from your end? Are people just looking for a tool, or do they still reach out even when they don't know what they're doing and need a bit of guidance?

Marek Ovčáček: Well, actually both. Let me explain. Sometimes they've already gone through what we call a discovery phase. They've figured out what they need and how they want to approach the data governance project, or they've already gone through data governance and data management initiatives and know where they failed or what they need to update. Those are the organizations that buy the tool and then just work with it on their own. They know what they need.

Then you have organizations that are just starting. They see that their counterparts in the industry are doing better, getting a lot more done, and that the single defining characteristic of why they're doing better is the maturity of their data governance and data management processes. So they see these organizations and want to do something similar, but they don't know exactly what it is. For those, we have our professional services, where we deliver data management and data governance projects using our tools with experts in that industry, or they go through partners, big consulting firms that have their own experts. They start with a discovery, and during the discovery they also look at the different types of tools they can use in the organization to deliver the initiatives.

For me, there's a defining moment when I speak with a customer and discover what they actually need, and whether they know what they need or not.

[00:44:20] Loris Marini: Yeah.

Marek Ovčáček: The people who don't know, they start by mentioning Google or Facebook or Uber. They say, look at these companies.

Loris Marini: We want to be like them.

Marek Ovčáček: We want to be like them. We want to work with data like them. Okay, so you don't know what it takes. You're in a completely different industry, with a completely different landscape. You don't have it figured out, but that's fine. We can help you with that.

And other companies will start talking about metadata management with configurable modelling. We need knowledge graphs. They know why they need it. They know what kind of value it's going to provide. With those organizations, you can tell them what capabilities you have and then mostly let them drive themselves through the process, because they understand how to use it.

Now, these customers are usually much more demanding, but that's fine. We gather a lot of information and feedback from these types of customers who know what they need, and we try to approach it from the perspective that we should provide them with as many capabilities as we can.

Data quality fabric: a blueprint for data management and data governance initiatives

[00:45:52] Loris Marini: Yes. And since we're approaching the end of our time, I'd like to discuss this concept of a data quality fabric with you, because from the consumer's perspective we hear a lot of terms, and sometimes different meanings get attached to them. You're one of the minds behind this. What is the principle behind a data fabric?

Marek Ovčáček: Yeah. So the data fabric itself is a design concept that specifies how to do data governance and data management in a way that allows you to get the most value out of it. It's a kind of blueprint for structuring data management and data governance initiatives, to ensure that you are future-ready and can expand, and that you can provide data to consumers with as little friction as possible.

So what I'm describing is: I search for data, get as much information as I can, and then I can actually do something actionable on top of the data. That's what I need to be able to provide: data for my consumers, which can be people or processes, with the quality, format, and speed they need. Say someone has to analyze the sales data for a region. Traditionally, they need to go through a process first to find where the data is and who owns it, then get access, get additional information, get a data quote. This is traditionally built on top of people and different tools, users, and tribal knowledge. I call it the metadata layer: data in people's heads, in Excel sheets, in the data models inside the data sources themselves.

So that's your normal data governance layer. Now, if you look at how you can automate it, in what we call the data fabric, you first need to gather all of that metadata in one place. That's your data catalog, with a knowledge graph on top of it. Then you can use automated processes for the rest.

For example, what are the different possible relationships with the data? What is the data quality? What are those anomalies that I mentioned before?

You get this information to the users, and your need actually dictates what format, quality, and shape of data you require. The data preparation layer should use the metadata: it knows the structure, granularity, timeliness, and quality of the data it should transform and prepare for you. And at the end, you have some kind of DataOps, some automation on top of that, that allows you to execute the action. All of these things together create your data fabric.
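
An abstract sketch of the consumption step of such a fabric: the consumer declares what they need, and the fabric resolves the request against precomputed catalog metadata (quality, freshness) instead of a human process. All structures here are illustrative assumptions:

```python
# Precomputed metadata about available data sets (hypothetical catalog entries).
CATALOG = [
    {"name": "sales_raw", "region": "WA", "freshness_hours": 1, "quality": 0.82},
    {"name": "sales_curated", "region": "WA", "freshness_hours": 24, "quality": 0.98},
]

def resolve(need: dict) -> dict:
    """Pick the data set whose precomputed metadata satisfies the consumer's
    declared needs (quality floor, maximum staleness)."""
    candidates = [
        ds for ds in CATALOG
        if ds["region"] == need["region"]
        and ds["quality"] >= need["min_quality"]
        and ds["freshness_hours"] <= need["max_staleness_hours"]
    ]
    if not candidates:
        raise LookupError("no data set satisfies the request; route to stewards")
    return max(candidates, key=lambda ds: ds["quality"])

# A consumer (human or process) asks for WA sales, at least 95% quality, <= 48h old.
print(resolve({"region": "WA", "min_quality": 0.95, "max_staleness_hours": 48}))
# picks 'sales_curated'
```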

At Ataccama, having automated data quality gives you huge benefits, not only in business results but also, and this is usually overlooked, in the way you process the data: when you process data of higher quality, you have fewer outliers and fewer anomalies to deal with, and the processes are actually far more effective in most cases.

[00:49:44] Loris Marini: And you have to keep it clean.

Marek Ovčáček: Yeah, especially in something complicated like master data management systems. If you have data quality built in, you can save something like 40% of your processing time and money, and get the result faster, at better quality, and with less processing power.

Loris Marini: So, the keyword is friction?

Marek Ovčáček: Automation of that data governance layer is what the data quality fabric is.

A vision for the future

[00:50:14] Loris Marini: In terms of long-term challenges and macro trends, now that it's getting easier and easier to fix the technology part of the problem, what do you see happening in the next decade? What are the biggest roadblocks we'll need to face as organizations dive into this?

Marek Ovčáček: I think sooner or later you arrive at the conclusion that you will have multiple sources of metadata in your organization. And I think the big new trend will be that what you are doing today in master data management with your data sets, you will need to do with your metadata in the future.

And that's where I see a lot of automation, and loads of AI processing, being needed, because there simply is no standard exchange format for metadata; every tool has a wildly different format and wildly different export capabilities.

There are some initiatives in place that should produce a standard metadata format, but this is a commercial space, and there isn't much incentive to have an exchange format. Look at social media platforms: you cannot simply take data from Facebook and Twitter and merge them together. They are wildly different. And you cannot simply take all your information from one platform and migrate to another.

I'm still hopeful that we'll have some semblance of the most important entities and a common metadata format in the future, but I would rather be prepared for it not to happen. So I think the big trend will be detecting the formats and entities from various metadata tools automatically, merging them together, and creating a master view of your metadata. That's the trend I see in the future of data management.
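
A loose sketch of "master data management for metadata": normalize the wildly different export formats of two hypothetical catalogs onto one schema, detect that they describe the same entity, and merge them into a master record. The field names and matching heuristic are placeholders:

```python
def normalize(record: dict) -> dict:
    """Map tool-specific export fields onto one internal schema."""
    return {
        "table": (record.get("table") or record.get("asset_name", "")).lower(),
        "columns": {c.lower() for c in record.get("columns", record.get("fields", []))},
        "source": record["source"],
    }

def same_entity(a: dict, b: dict) -> bool:
    """Crude match: identical table name, or heavy column overlap."""
    overlap = len(a["columns"] & b["columns"]) / max(len(a["columns"] | b["columns"]), 1)
    return a["table"] == b["table"] or overlap > 0.8

def merge(a: dict, b: dict) -> dict:
    """Create the master view: union the knowledge, remember both sources."""
    return {
        "table": a["table"],
        "columns": a["columns"] | b["columns"],
        "source": f'{a["source"]}+{b["source"]}',
    }

tool_a = normalize({"table": "Customers", "columns": ["id", "email"], "source": "catalog_a"})
tool_b = normalize({"asset_name": "customers", "fields": ["ID", "Email", "Phone"], "source": "catalog_b"})
if same_entity(tool_a, tool_b):
    print(merge(tool_a, tool_b))
# {'table': 'customers', 'columns': {'id', 'email', 'phone'}, 'source': 'catalog_a+catalog_b'}
```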

Towards true interoperability between different systems

[00:52:33] Loris Marini: I think I agree with you that getting to a common standard is going to be really hard. But there are examples of successful projects. I'm thinking of Apache Arrow. That was an incredible effort to unify and standardize how in-memory analytics works, and it's been successful.

I mean, I'm not super up to date with the state of the project, but at some point it becomes super clear that the benefits of interoperability are huge down the line. And I agree with you that it requires a mental shift from "this is my territory, this is my brand and my identity, and I don't share it with anyone" to "let's actually help each other, build on top of each other, and accelerate progress at a global level."

Marek Ovčáček: So, I actually kind of agree with you, although I'm not that optimistic. It has actually worked pretty well for Ataccama in the past. Although we have proprietary software, our endpoints and APIs are not proprietary. And we proudly build our capabilities specifically to be able to integrate with any tool out there.

But instead of relying on the other tool's ability to provide a proper API, we are building metadata transformation capabilities into our tool itself. We have projects where we use Alation as the data catalog instead of ours, and we partner with Collibra on a lot of projects, because it's entrenched in those organizations and people know how to use it.

They have internal buy-in, so it's easier for users. The way we approach it, we are not relying on there being a standard, but at least on some openness between the tools: they provide you the APIs and the documentation for those APIs, and then they let you go and do whatever you want with that metadata.

[00:54:50] Loris Marini: Yeah. It goes back to this concept of removing roadblocks as much as possible and making the tech an aid, rather than a blockage, to the flow of ideas and innovation.

Marek Ovčáček: For us, it's built into our mentality because, as I mentioned, we are not only selling software, we are also delivering projects. Our people on those projects need that functionality to deliver them successfully. So over the years we developed it, and it's working incredibly well for us.

Follow the Ataccama journey

[00:55:24] Loris Marini: That's amazing, Marek. What's the best way to follow you?

Marek Ovčáček: This is actually weird: I'm not that active on Twitter or other social media, but you can follow me on LinkedIn where, since I have a very particular name, you can find me very easily. There are no other people named like me in the data governance space. So that would probably be the best way to follow me.

And also follow the Ataccama Twitter, where we share not only our marketing messages but also some nice industry stories and insights. And of course ataccama.com, where you can find guidelines on how to do data management projects.

Loris Marini: Okay. And you guys are based in Toronto?

Marek Ovčáček: Yes, our headquarters are in Toronto, with around 40 or 50 people. Then we have offices in 10 different cities around the world. Our largest R&D office is in Prague, where we have 300 people.

Loris Marini: Pretty cool. I look forward to a not-too-distant future where borders are open again, and I'll be able to fly back to Europe and meet the people doing this day-to-day. Be where the action is.

Marek Ovčáček: Definitely give us a call. And if you're there, we're happy to give you a tour or discuss anything else. We're always open to visitors.

[00:57:08] Loris Marini: Fantastic, Marek. Thank you very much for your time, and I wish you the best of luck for the future, with COVID and with the data governance and data management challenges that await us in the next decade.

Marek Ovčáček: Thank you for having me. Have a nice day.

Loris Marini: You too. Cheers.
