Chad Sanderson: The semantic data warehouse

Loris Marini - Podcast Host Discovering Data

Business users must be able to communicate their data needs quickly without writing code. This is the idea of the semantic warehouse. But how do you build one? Join me as I learn from Chad Sanderson, Head of Data at Convoy.


Why this episode

The gap between business and technical people costs companies billions every year. Data practitioners need to learn more about the business if they want to create useful and usable data products. But nobody knows the business better than business leaders. What if they were able to communicate their data needs quickly and accurately without writing code? What if their requests were so context-rich that data engineers improved their understanding of the business while producing data assets? That’s the idea behind the semantic warehouse proposed by Chad Sanderson, Head of Data at Convoy. Today we decompose this idea and explore what engineers need to build a comprehensive data “system” that scales. If you want to learn about data products, I highly recommend subscribing to Chad’s Substack here: https://dataproducts.substack.com/


Episode transcripts

**Loris Marini:** If you've been around LinkedIn recently, you've definitely heard of the immutable data warehouse. This is a concept that has been talked about by Chad Sanderson. We met on LinkedIn, as usual, and we got together to do an episode around the semantic layer, the modern data stack, and what is missing, because there's a lot of buzz around these terms at the moment in the industry.

So I really wanna dive into each individual component and hopefully bring a little bit of clarity to it. Chad is the head of product for the data platform at Convoy. Today, we're gonna talk about the pressures that a startup goes through, the process and the lack of design in data products, spaghetti SQL, why DBT doesn't seem to be solving the problem of spaghetti SQL, and the challenge of creating a unified semantic layer that everybody across the organization can tap into to contribute to the organizational knowledge, and so create those positive reinforcement cycles, those closed loops that start from an idea and experimentation, then feedback, testing, and improvement.

And so I'm super excited to have Chad here on the show. Chad, thank you for taking the time to be with me, and welcome to the podcast.

**Chad Sanderson:** Thank you, happy to be here.

**Loris Marini:** All righty. So where should we start? Do you reckon we start with a broader overview of what's been happening in data products, data design, and the modern data stack? How do you understand the space? What's been going on?

**Chad Sanderson:** Yeah. So there's a few really interesting things happening right now. And just for the context of this conversation, I'm mainly gonna be focusing on what we've been seeing in the US, because I do think that there are some major geographic differences in terms of how folks approach data and their data infrastructure.

The United States essentially had a big era of Hadoop followed by this mass migration to the cloud. What happened was, over the last 20 years, you've seen this gradual decline in the practice of data modeling, and the disappearance of the data architect as a common hire, even at large technical companies with lots of data. Not too much thought is put into relational modeling, and this has essentially been more or less abandoned in favor of speed: a high volume of different data sources, more or less being thrown over the fence by producers.

What that has resulted in, and the place we are now, is a fundamentally unscalable data infrastructure and a data swamp in place of a traditional warehouse.

And so a lot of companies are reaching this crossroads where they don't know what to do. They are leveraging the modern data stack, as it's called (DBT, Fivetran, Airflow, Snowflake, Databricks), and they're really struggling to find value in their data. And that's where we are. Those are some of the core problems that we need to solve.

**Loris Marini:** Yeah. Yeah, absolutely. And you've been writing quite a lot about it. But first, before we dive into the immutable data warehouse, I have a question for you. It's about your journey at Convoy, because in our prep call I shared with you my experience at Sendle. Sendle is a logistics startup, very different in the way that it operates compared to Convoy.

I think your business model is a little bit more real time and nuanced. So there are a lot of moving parts. I've been reading your newsletter recently, and it's fascinating what you guys are doing. How did you get to start worrying about data products, and what was your journey at Convoy, briefly?

**Chad Sanderson:** Yeah. So Convoy, just for everyone's understanding, is a freight technology startup, and it's a marketplace that sits in between a shipper on one side and a carrier, which is just a business that ships freight across the country, on the other side. So it's a B2B marketplace, and that fundamentally means we do not have an enormous number of customers, but we do have a pretty decent amount of data.

And machine learning is core to our business. We have machine learning models that predict the prices at which we should be putting shipments onto the marketplace, and models that manage the relevance of the offers that we send to truckers who use our application. We have batching models that essentially group shipments together into a single job, more or less, and many other types of machine learning models.

And you could truly say that ML is like the beating heart of Convoy. If we didn't have machine learning, we wouldn't have a company, and that's oftentimes not the case at a lot of businesses. There, ML is like a nice-to-have, an add-on, sort of a fun side project, but it really is the core of Convoy's entire business model, and it's working. It actually is doing its job.

**Loris Marini:** Mm-hmm

**Chad Sanderson:** But the problems that we faced in data were very different than the problems that some of our counterparts on the marketplace side, Airbnb or Amazon or someone like that, face. They have mass volume. Their issues are around compute and storage costs, optimization, latency, things like that.

We don't really have as many of those problems. We have a complexity problem, where we have hundreds of important entities, dozens of life cycles, and thousands of events. These are real-world behaviors that we care about, from the moment we bid on an RFP to the moment that we complete a shipment and how we pay that shipment out to our customers.

And it's critical to Convoy's business to understand all of those life cycles and to do analytics and machine learning on top of all of that relevant data. So our data at Convoy was a mess. We, like most early-stage startups, began from basically nothing, from scratch. We were taking a lot of our data, pulling it out of S3, and solving simple problems with it.

But as time went on, we started accruing all of this technological debt with our methodology, and we couldn't scale. And so we had to solve this problem of: how do we allow Convoy's data complexity to scale, versus our storage and compute costs?

**Loris Marini:** Right. And this inability to scale, how was that perceived? In what way was the business feeling the problem with scale?

**Chad Sanderson:** We certainly had an issue of data quality, where there were many models that were being built off of training sets, downstream owned by teams that didn't really have anything to do with the data that they were consuming. So you may have a pricing team that's consuming data about the market that they don't produce, and they don't own.

They're consuming data about bids that are coming in from various carriers. They're consuming contract data and tender data. They don't own any of this. And so if that data isn't modeled particularly well, if there's no contract, if it isn't high quality, then they're the ones that suffer.

And they're the ones that look bad, and Convoy is the one that feels the impact there. It's really not incredibly difficult to lose millions of dollars when your pricing model is, is not...

**Loris Marini:** Yeah. Is wrong.

**Chad Sanderson:** And then we also had this issue of just understanding how our business worked and being able to answer critical questions.

If you asked a question like "how many active shippers does Convoy have at any one point in time?", depending on who you asked and depending on the data set that they used, you could get a different answer. And that was a really important metric for us to make decisions off of. So our executive team needed to have trust in the data.

Our machine learning team needed to have trust in the data. And that kind of starts by understanding what data you have and how does the business work? What data do you need, and then how do you ensure that data is high quality?
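To make the "different answer depending on who you ask" problem concrete, here is a minimal Python sketch. The shipment records, the two definitions of "active shipper", and the window lengths are all hypothetical assumptions for illustration, not Convoy's actual data or definitions. It only shows how two reasonable definitions applied to the same records can disagree.

```python
from datetime import date

# Hypothetical shipment records: (shipper_id, shipment_date, status)
SHIPMENTS = [
    ("s1", date(2022, 5, 2), "completed"),
    ("s2", date(2022, 5, 20), "cancelled"),
    ("s3", date(2022, 3, 15), "completed"),
    ("s4", date(2022, 4, 10), "completed"),
]


def active_any_activity(shipments, as_of, window_days=30):
    """Definition A (assumed): any shipment, whatever its status, in the last 30 days."""
    return {
        shipper
        for shipper, day, _status in shipments
        if 0 <= (as_of - day).days < window_days
    }


def active_completed(shipments, as_of, window_days=90):
    """Definition B (assumed): at least one completed shipment in the last 90 days."""
    return {
        shipper
        for shipper, day, status in shipments
        if status == "completed" and 0 <= (as_of - day).days < window_days
    }


AS_OF = date(2022, 5, 31)
count_a = len(active_any_activity(SHIPMENTS, AS_OF))
count_b = len(active_completed(SHIPMENTS, AS_OF))
print(f"Definition A: {count_a} active shippers, definition B: {count_b}")
```

Neither function is wrong on its own terms. The disagreement comes from the definition never having been agreed on and recorded in one place, which is the gap a shared semantic layer is meant to close.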

**Loris Marini:** Absolutely. So that's the whole point of this micro-discipline we call data management, right? Which is a combination of many things. There's engineering, there's architecture, there's modeling, there's governance. There's all sorts of things in it. There's design. The idea is really to map what you just said.

So know what the business case is, what data you need to support that business initiative or decision. And then have visibility, at the very least, on the full life cycle of a data set from the moment it gets in to the moment it's used. Ideally, have some sort of impact analysis that tells you, if you make a change here, what happens downstream, and what the upstream dependencies are for a feature.
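The impact analysis described here can be sketched as a graph traversal. The following is a minimal Python illustration, assuming lineage is available as a simple mapping from each data set to its direct consumers. The data set names are invented for the example, and real lineage tools work at finer granularity (columns, jobs, dashboards).

```python
from collections import deque

# Hypothetical lineage: each data set maps to the data sets that read from it.
DOWNSTREAM = {
    "raw.shipments": ["staging.shipments"],
    "raw.bids": ["staging.bids"],
    "staging.shipments": ["marts.active_shippers", "ml.pricing_features"],
    "staging.bids": ["ml.pricing_features"],
    "ml.pricing_features": ["ml.pricing_model"],
}


def reachable(graph, start):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def invert(graph):
    """Flip edge direction so the same search answers upstream questions."""
    inverted = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


# "If I change staging.shipments, what happens downstream?"
impacted = reachable(DOWNSTREAM, "staging.shipments")

# "What are the upstream dependencies of the pricing model?"
dependencies = reachable(invert(DOWNSTREAM), "ml.pricing_model")

print(sorted(impacted))
print(sorted(dependencies))
```

The same traversal answers both questions; only the direction of the edges changes.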

So really, what a lot of people call lineage. There are many levels of lineage and it gets pretty complicated, but let's talk about this idea of the modern data stack, the role that DBT played in shaping it, and what's wrong with it, because I keep talking to people and, no matter the size of the business, they all complain.

SQL's great. It's declarative. It allows you to write complex queries, join many tables, and do things that would take you forever to do in a prescriptive language. But without the right approach, it can turn into spaghetti just like any other code, and that's exactly what's happening in a lot of companies.

And we see this tendency on the side of data teams to democratize access to data and have more people do more with it. I certainly experienced it firsthand. You've got these relatively new BI tools that allow you to do more. We're talking about reverse ETL,

which is the idea of using models created by consumers directly, perhaps in their BI visualization tool, and bringing them back into the data warehouse. So it's clear that there has to be a cycle, right? The data comes in, and there are transformations done to it.

There's meaning added to it. And then someone has to consume these digital assets to do something with them. Why, from your perspective, isn't the current system working, and what are we missing?

**Chad Sanderson:** So I always like to use a metaphor here because it abstracts the problem enough so that anyone can understand it. We call it a data warehouse, and I think that "data warehouse" is actually quite apt, because when you are designing a house or any building of any kind, there are actually multiple steps, multiple workflows.

The first workflow is requirements gathering and definition. You have to describe: hey, what sort of house are you building? How many floors is it gonna have? Is this a beach house? Or is it a house in the mountains? You have to describe what you want it to look like.

Is it gonna have wooden floors or tile floors or stone floors? And where are the cabinets? And how many windows is it gonna have? You usually do this alongside a designer or an architect, or you provide requirements to a designer or an architect, they take those requirements and then they generate mockups, right?

To show you, this is probably what your thing is gonna look like. They build a floor plan. They say, okay, here's where all the windows are gonna be. Here's where the doors are gonna be. Here's the spacing. Here's how the multiple floors and where the stairways are.

And that floor plan is what is given to a developer to actually construct the house, along with the instructions on how to use the materials and a render of what the house actually looks like. And then, armed with all of that information, armed with the requirements, the materials, the floor plan, the context of the home,

the developer can actually figure out: okay, what are the materials that I actually need to order? How many of them do I need to order? Where should I get them from? What is the context in which I should use them? What team do I need to build this house? And then what are the tools that I need to actually construct the house?

Because those are obviously gonna be very different if you're constructing a mansion or if you're constructing a shack out in the woods. And then finally, after all of that, you actually build the home, right? You go through the process of constructing and sawing and hammering and soldering and whatever else.

I don't know. I'm not a home builder. There is certainly a workflow there. And when we talk about the modern data stack, the metaphor here is that it only focuses on the procurement step down. It's like assuming that we are just dumping a lot of materials at a build site, and that we have developers waiting on the other end with tools like DBT and Airflow, and they know exactly what to construct and how it should be constructed in a way that would be most useful to a customer that is not even present in the process.

And of course that seems totally nonsensical, right? If we built houses that way, the entire housing market across the country would fall into disarray and chaos overnight. And yet this is exactly how we build data warehouses. This is how we scale our data infrastructure.

And so the core problem to me is not the tooling. There's nothing wrong with the tooling that we use in the modern data stack, right? There's nothing wrong with using SQL. I think SQL is great. There's nothing wrong with using DBT. Using a command line tool is great. It's super easy in the cloud.

There's nothing wrong with tools like Airflow to orchestrate your jobs. There's nothing wrong with the BI tools. The problem is, if there's no context for the data, if the data that you are using was not designed and implemented with intent and purpose, you're always going to have this crazy mess, because people don't know what to build.

They're gonna build for siloed use cases. Those siloed use cases are gonna overlap. There'll be business context that emerges over time that people don't think about or remember. Teams will roll off. They'll forget to do documentation. There is no real source of truth that actually describes the real world. And I think that's what's needed.

**Loris Marini:** I wanna dive into this description of the real world, because I think that's key. And I certainly agree with you that the data warehouse should be a digital representation of what's happening outside, and inside as well, the walls of the org, of the company. And there are perhaps two different ways of approaching the problem, right?

If you're looking at external customers versus internal customers, for some reason people tend to understand the concept of an external customer more intuitively, because when we think about a product, a pair of shoes or a bottle of water, there's someone we are selling that to in exchange for cash or something.

And that someone is typically not us or our team. But there are a lot of products that we create. As in, we need to agree on terminology here. I forgot to say at the beginning, but this is very much a conversation. I don't want a one-sided interview at all, so if you have any questions, jump in and stop me anytime.

But I see the term product as this concept of a thing that people want to use and find useful and valuable. It could be more or less so, but it's something that is consumed, and people want to consume it because they've got a need.

And so if you take that definition, a lot of the things we do, arguably 90% of the data transformations we do in the modern data stack, are internal data products, because they are used either by some other model, so another product downstream that uses the stuff that comes from upstream, or they're used by someone in the org. It could be a BI dashboard. It could be a machine learning model. The point is, if you don't use it, it's useless. You've gotta use it to make something out of it. The process of using something, in this case a digital resource, let's call it that (I don't wanna call it an asset), makes it a product.

So someone has to take the time to ask the questions: who's gonna use it, why, and how do we make it useful? What I see though, and it's especially true in a lot of startups, although I have learned that the bigger the company, the bigger the mess, is the tendency, as you just said, of siloing the production of these data products in the realm of engineering, or IT in large enterprises.

And so the folks that are able to use the command line, that are able to write code and look at a CLI and actually interrogate the database, they're the ones that are gonna make a lot of assumptions and integrate those assumptions into the code they write, often with limited visibility across business stakeholders.

And things don't work because we make assumptions behind closed doors. And even if we had the ability to talk to the customer, quote unquote, the internal customer, they wouldn't be able to check and cross-check, and that feedback loop all of a sudden becomes too expensive, unbearable.

And so a lot of teams just go: hey, just do your best, push it, we'll merge it, we'll see. And then we'll get some feedback. And guess what's gonna happen? People are not gonna give you that feedback, because of timing issues and feature requests that keep piling up. McDonald's orders, like Brian O'Neill calls it: we keep serving stuff.

And so you've got this data team that is sitting here, and whether it's hub and spoke or centralized, whatever it is, the common problem is a lot of requests. Bandwidth is limited. People are overwhelmed. There's no time to actually do design. So design is an afterthought, and then things don't work. That's what I'm seeing. So I'm asking you, because you guys solved the problem at Convoy and you led the new way of doing things: what is the right way to do it?

**Chad Sanderson:** Yeah, first of all, I wouldn't say that we've completely solved the problem. We're definitely well on our way, and we're seeing some exciting benefits that come from our approach. But I wanna talk about a couple of things that you said, because I think they're really important. The first thing is, yes.

I definitely agree that anything that is used by a customer and is built through software is a product. And so any data set that is used is certainly a product. However, the entire data warehouse, you could say, or the entire Snowflake platform, you could also argue, is a code base that is being used more broadly speaking.

And it is a product. It is a product that is designed to serve many customers. It's actually a service. In fact, I think everyone intuitively understands that if you use Snowflake or Databricks or these other platforms, they are services. They're just not often treated with the same sort of respect and dignity that we treat all of our other services.

So that's one really big thing. The other thing I would say is that I like to think about what the purpose of data is. Like, why do we even collect data in the first place? The reason that I believe we collect data is because building features and understanding how those features impacted our world are two sides of the same coin.

You can't actually have one without the other. If you have one without the other, it's not a valuable conversation. If you're just building things but you don't know what happened, that's not meaningful.

If you understand what's happening in the world but you don't have the capacity to change it, that's not really meaningful either. You don't have a company. And when you think about what the data warehouse was originally meant to be, it was simply meant to describe our world as accurately as possible, so that downstream consumers could then ask questions of the universe.

Those questions could be infinite and variable, as long as we have the connections and network effects between our data. And this, I think, has been lost. Instead of designing the real world through data, what we often do is create pipelines, right?

We say, oh, we have a use case. And we need to build a pipeline in order to generate the data that we need, and then pipe that into a dashboard. And what that creates is silos. And when you have silos, it means that you lose all the valuable network effects that you could be gaining all the relationships the modeling, right?

This is the core value of doing the data modeling in the first place. And that's why I actually believe that the design effort could be, and should be, separate from how the data is actually used.

It should be this effort to describe the universe as accurately as possible.

And then if you're able to do that effectively, the number of questions the business can answer becomes exponentially...


**Loris Marini:** It's combinatorial. Yeah.

**Chad Sanderson:** Yes. Combinatorial. Yes, exactly. And so I think the place where you ended your comment was: how did we approach this at Convoy, right? How did we think about even solving this problem?

And I'd say it started there, right? It started from that sort of fundamental assumption that the role of the data warehouse, and this is how Bill Inman describes it anyway, is really to describe the real world as accurately as possible. And what that means is these software engineers who are building databases and constructing services are also siloed, right?

They're working on a specific section of that problem. They're not really even thinking about how to use the data to describe the real world. They're building features and applications. And so there has to be something informing the data that is collected from these systems, so that they do create that real-world view.

So we started by creating this request system at Convoy, where a data consumer goes to a data producer and says: this is the way that the world is. We've got entities. Those entities have properties, and they have relationships to other entities.

Those are sort of our nouns and our adjectives. And then we've also got verbs. There are real things that happen in the world: entities acting by themselves, or entities interacting with each other. And then there are properties of those verbs. There are adverbs, right? And once you have...

**Loris Marini:** An ontology.

**Chad Sanderson:** It's an ontology. Exactly. And once you have actually described the ontology, then you need to have a way of converting that ontology into some physical manifestation in the code, where we are capturing the data the way that was intended.

Then if you can do that accurately, you've built your view of the real world.
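The nouns, adjectives, verbs, and adverbs described above can be sketched as plain data structures. This is only an illustrative Python sketch under stated assumptions: the class names, the freight examples, and the `event_columns` helper are hypothetical, and the transcript does not describe how Convoy's request system actually represents its ontology.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A noun. `properties` (the adjectives) maps property name to a Python type."""
    name: str
    properties: dict


@dataclass
class Relationship:
    """A link between two entities, referenced by name."""
    source: str
    kind: str
    target: str


@dataclass
class Event:
    """A verb. `actors` are the entities involved; `properties` are the adverbs."""
    name: str
    actors: tuple
    properties: dict


# Hypothetical freight ontology, invented for this example.
ENTITIES = [
    Entity("shipper", {"name": str}),
    Entity("carrier", {"name": str}),
    Entity("shipment", {"origin": str, "destination": str}),
]
RELATIONSHIPS = [
    Relationship("shipper", "tenders", "shipment"),
    Relationship("carrier", "hauls", "shipment"),
]
EVENTS = [
    Event("shipment_delivered", ("carrier", "shipment"),
          {"delivered_at": str, "on_time": bool}),
]


def undefined_references(entities, relationships, events):
    """Entity names used by relationships or events that were never defined."""
    known = {entity.name for entity in entities}
    missing = set()
    for rel in relationships:
        missing |= {rel.source, rel.target} - known
    for event in events:
        missing |= set(event.actors) - known
    return missing


def event_columns(event):
    """One possible 'physical manifestation': a flat column list for the event."""
    columns = [(f"{actor}_id", "str") for actor in event.actors]
    columns += [(name, typ.__name__) for name, typ in event.properties.items()]
    return columns


print(undefined_references(ENTITIES, RELATIONSHIPS, EVENTS))
print(event_columns(EVENTS[0]))
```

The design choice worth noting is that the description comes first and the table layout is derived from it, rather than the other way around.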

And so this has been one of our big focuses at Convoy: how do we do that?

**Loris Marini:** One of the biggest pushbacks that I often get when I talk about that, because before I even knew about the work of Bill Inman I had this idea of a digital representation of the world, is that in business meetings this got dismissed pretty brutally, very quickly, because it sounds like something that's never gonna be done.

You can't put a budget on it. It's unclear whether you're gonna get an ROI from it. If I was to create a cartoon, I would put business leaders on one side, being the skeptical actor on the stage, looking at the data and going: yeah, everybody's talking about it.

The big consulting firms keep publishing stuff and there's a lot of kind creatives talking about it. But all we care about is, managing risks, minimizing exposure, basically reducing our costs and increasing our profits. That's what we want. Because the bottom line has to go up.

That's all, and of course there's many ways to do it. It's not just that it's also customer retention, customer satisfaction. You can speak about the same coin by looking at the two sides, but that's

**Chad Sanderson:** You have to present ROI, is what...


**Loris Marini:** Yeah, exactly. You gotta bring results.

In your experience, did you have that pushback at Convoy? And what do you think is perhaps an effective way to manage those kinds of arguments and say: hey, yes, we need to worry about right now. We need to have data that is fit for purpose. We don't need a hundred percent quality, whatever the metrics we use to define what a high quality data set is.

But we also need to think about the bigger picture. Because if we do reactive type work, we'll just end up focusing on how many lines of SQL we're pushing to GitHub, and we're never gonna see the impact on the business as a whole. It's tricky.

**Chad Sanderson:** It is tricky and we spend a lot of time figuring out, how to position this, by the way. I think a skill that a lot of data people need to get good at is sales. I've met so many folks that are brilliant. But they do not understand how to sell a project. And that leads to them becoming eternally frustrated, never being able to implement what they feel are best practices.

So the way that we thought about this at Convoy, and the way that I positioned it, was, number one: we have some serious issues with our data. That means we are unable to answer key questions about our business. And I went through our company and I found example after example where that was the case.

And I contextualized those examples and said: listen, we have mutability in our data sets for these key entities. And that means while we always know the status of those entities at any particular point in time, we have no understanding of the history of those entities. That data is invaluable for our machine learning models.
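The difference between knowing the current status and knowing the history can be shown with a small Python sketch. Both classes and the shipment statuses are hypothetical and are not drawn from Convoy's systems. The sketch contrasts a table that overwrites state with an append-only log that can answer "what was the status at time T?".

```python
from datetime import datetime


class MutableStatusTable:
    """Keeps only the latest status; each update overwrites the previous one."""

    def __init__(self):
        self._status = {}

    def set_status(self, entity_id, status, at):
        self._status[entity_id] = status  # the earlier value is lost here

    def current(self, entity_id):
        return self._status[entity_id]


class AppendOnlyStatusLog:
    """Never updates in place; every change is stored as a new timestamped row."""

    def __init__(self):
        self._events = []

    def set_status(self, entity_id, status, at):
        self._events.append((at, entity_id, status))

    def as_of(self, entity_id, at):
        """Status of the entity at a given moment, or None if not yet seen."""
        matches = [(t, s) for t, e, s in self._events if e == entity_id and t <= at]
        return max(matches)[1] if matches else None

    def history(self, entity_id):
        """All statuses the entity has had, in chronological order."""
        return [s for _t, e, s in sorted(self._events) if e == entity_id]


changes = [
    ("sh-1", "tendered", datetime(2022, 1, 1)),
    ("sh-1", "in_transit", datetime(2022, 1, 3)),
    ("sh-1", "delivered", datetime(2022, 1, 5)),
]
table, log = MutableStatusTable(), AppendOnlyStatusLog()
for entity_id, status, at in changes:
    table.set_status(entity_id, status, at)
    log.set_status(entity_id, status, at)

print(table.current("sh-1"))                  # only the latest state survives
print(log.as_of("sh-1", datetime(2022, 1, 4)))  # point-in-time lookup
print(log.history("sh-1"))
```

A model trained on the mutable table can only see where each entity ended up. The log keeps the intermediate states that a training set would typically need.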

And that means we're not actually able to train our ML models on the right data set, which means we're losing money. So that's one way that I framed it. The other way that I framed it is we have soaring tech debt that prevents us from answering questions. And it also makes the lives of our data scientists, which are some of the most valuable resources and expensive resources at the company much worse.

It increases churn, it decreases happiness, it slows time to insight, and it also decreases the velocity of innovations that we can ship. And, you know, Convoy is a startup, and our ability to quickly deliver features is one of our biggest advantages. And if we're saying that our lack of data infrastructure is actually hurting our ability to ship features quickly, and that's because we can't make decisions fast enough,

then that is something that really matters to leadership. And then there's also this element of trust, which is that we can show authoritatively, not everywhere, but in enough places, where the data was just wrong, and that caused us to make a bad decision. As an example, we might say: hey, we were using some third-party application, and it turns out, due to some implementation error, it looked like we were losing 25% of our customers month over month. We invested 150 or 200,000 dollars in marketing campaigns that turned out to be totally...

Those are the types of things that I looked for. And then I framed all of those disparate problems as actually part of the same problem. And the same problem was: we do not have a source of truth for our data. There is no ownership. There is no contract between the producers who are generating the data and the consumers that are leveraging it.

And because there's no contract and we don't have the central source of truth that describes the real world. We're not able to leverage our data effectively. We're not able to move quickly. Our machine learning models break all the time and so on and so forth. And while you're not ever gonna be able to accurately describe the totality of the problem, you can describe it enough that any leader who has common sense is gonna be able to extrapolate, right?

I'll give you one really clear example of this. We had a data scientist that used to work at Convoy that was famous within the company. This is someone that had been there for a really long time. They had worked closely with the CPO. They'd worked with the CEO. And I got a quote from that person that said, "I was trying to answer a simple business question about shipments and it took me two weeks."

And to present that information to an executive, this is someone that they think is unbelievably smart. And if you say they're trying to answer a simple question that on the surface looks like it should take half a second. And this unbelievably smart, skilled person takes two weeks to answer it.

What's happening in the rest of the organization? What's happening in those places where there are more junior data scientists or junior analysts, right? Executives are smart. They can extrapolate, but you have to tell a story and a narrative where you allow them to fill in the blanks about how bad the situation is.

**Loris Marini:** Yeah. Absolutely. So step one is really to speak up, right? Because if we stick our heads in the sand, nothing is gonna happen. Culture plays a huge role there, and this is a field of leadership in general. There are plenty of books, and a lot of really smart people have talked at length about what makes a great culture: how do you drive a team that can achieve much more than the sum of its components?

How do you create that synergy, which sounds like a buzzword, or a bit of a cheeky, yucky word to throw in front of a data and engineering, technical audience? But that's what you're looking for. Ideally, you're looking for a combination of skills and interests and approaches to the problem that gives you that extra velocity that you wouldn't have simply by summing the single parts of the system.

This is the reason at the same time, why I think data is fascinating and extremely hard to get. And it's the fact that, in any other business you do have a luxury of saying, Hey, we're gonna deal with this within this domain.

If you're talking about sales, you can gather your top people and say: hey, we're gonna do this. We're gonna try this and that experiment, and we're gonna do it internally. What we learn, we'll pass to the business. We'll do an executive summary of some sort, and that learning and knowledge, which is gonna be a mix of documents and tribal knowledge in the heads of people, will stay within the sales team.

You do wanna hope that they have knowledge management system. Otherwise when people inevitably leave that knowledge is lost, but that's another problem. But you can do that right within the domain In data, if you could do that, but it leads to siloing. And so you, it kills your ability to reuse.

I think Bo sch smarter calls it a Mar marginal propensity to reuse the asset, which in economic terms is really the big selling point of intangibles, whether it's data, IP knowledge, information, anything that you can't kick with your toe, it's an intangible asset. And that means it can be used and reused over and over of course, to, to reuse it.

You need to know where does it come from? that contract that you were talking about? So what cause a lot of people I see a lot of confusion around even what a contract is. What is a contract in your view?

**Chad Sanderson:** So for me a contract is simply an agreement between two parties. And when we talk about contracts, we're talking about agreements between the producers of data and the consumers. In this context, the reason that's important is because the consumer of the data is the data warehouse. And what has traditionally happened in most American, tech-centric organizations that have adopted the modern data stack is that the primary, and I'll even go as far as saying exclusive, use case of the data in databases is operational. So you have software engineers, and they have implemented these databases that are basically just implementation details of their services. And then, thanks to the magic of ELT, they throw all of that over the fence: they throw it into Snowflake, they push it into Databricks.

It doesn't really represent the real world. It's not modeled correctly. There's a lot of data that's missing that is incredibly important for downstream analytics, but teams essentially work with what they have, and they start to consume that data because of course they do.

It's the best source of information. This first-party data is the truth. And when you start piping that into production use cases like machine learning models, or calculating the margin that your company is gonna use to report out to your shareholders, an engineer could potentially change that at any time, either by negligence, on purpose, or just because of a bug.

And this happens constantly. So there's two things that contracts provide. The first thing that contracts provide is essentially an abstraction that sits in between the operational use case and the analytical and machine learning use cases.

**Loris Marini:** Yeah.

**Chad Sanderson:** And it basically shields the software engineer from having to think about it: oh, there's a lot of data that my service emits that would be very valuable downstream, but it's not particularly valuable to my service. I don't wanna put it directly into my database, but I can surface it through an API, and that's different.

And then the second thing it does is that it guarantees quality and it guarantees ownership. So if something were to break, there's a clear line of sight into exactly who it's breaking, when did it break,

**Loris Marini:** who to reach out

**Chad Sanderson:** And you could reach out to them. If you're making changes that are backwards incompatible, you see everybody that's gonna be affected and you can communicate those changes well in



**Loris Marini:** that would be the dream.

**Chad Sanderson:** Yeah. That way you're essentially ensuring that you are treating Snowflake as a production service. So that's the quality piece, but once you have contracts in place, there's a lot more that you can do. And this is where I think things start getting really exciting, because the contract is essentially this link between the producer and the consumer that never existed before. It's building the relationship and breaking down the silo between those two groups.

And when the silo is broken, then the consumers actually have a mechanism of communication with the producers that they never had before. So now they can say, hey, you have a contract. You're sending me this data in an API. You're telling me here's a shipment entity, and here's 10 properties that I am now ingesting and pushing into my dashboard or my machine learning model.

Can you add this property? This is something that would be really valuable to me. And the reason that I want this property is because I think it would be a valuable feature in my machine learning model, and I think it's gonna improve our accuracy by half a percent.

And that has a clear dollar value. Before, there was not really a good reason for the engineer to do that. But now that they have this contract, and they're already treating the data as a product and like an API, that bond is there. And then you mentioned something before that I think finalizes the last important component of the contract, which is the lineage.

I think the lineage is so important, right? The lineage is what allows the producer to see, down the line, who's gonna be impacted by a change. But it also flows back in the reverse direction. It allows the producer to see how the data that they are generating is being used. They have no idea about that today.
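To make the lineage idea concrete, here is a minimal Python sketch of how a producer might list everything downstream of an asset they own. The asset names and the `LINEAGE` mapping are hypothetical, invented for illustration; a real setup would read this graph from a lineage tool rather than a hard-coded dictionary.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets that consume it.
LINEAGE = {
    "shipments_service.shipment": ["warehouse.shipments"],
    "warehouse.shipments": ["dashboard.on_time_rate", "ml.offer_relevance_model"],
    "dashboard.on_time_rate": [],
    "ml.offer_relevance_model": [],
}


def downstream(asset, lineage):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for consumer in lineage.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


impacted = downstream("shipments_service.shipment", LINEAGE)
print(impacted)
```

The same traversal answers both questions raised here: who would be impacted by a change, and where the producer's data ends up being used.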

**Loris Marini:** Yeah.

**Chad Sanderson:** It's completely a black box to them. And so of course an engineer isn't really gonna care about Snowflake when it's totally abstracted away. But imagine if it was tangible, right? Imagine if you owned a service and you could look at that service and see, oh wow, these three columns that I own are actually powering our offer relevance model in the app. That changes the game in terms of how engineers think about data and how they



**Loris Marini:** Absolutely. It helps you focus on what actually matters. It helps you build a business case when you wanna do stuff, as you just mentioned, but it also closes the loop, right? The whole idea of data is to use it to, at some point, create knowledge and share it within the organization. And knowledge is not something you produce somewhere and then forget about.

It has to be a closed loop. And what you just described is the fact that there is no closed loop in most data systems today. There's production done ad hoc, reactively. Forget about the bigger picture and the representation of the world; we are not even close to that. We produce, we dump it in our warehouse.

We say, okay, job done. The feature branch has been merged. We've done our job. We have no visibility whatsoever. If you don't have visibility, you can't get feedback. If you can't get feedback, you can't improve it. And so the whole property of an intangible asset, which makes it so attractive from an economic standpoint and could be the selling point for a business leader, which is the ability to reuse it and drop that cost of reuse down to zero,

**Chad Sanderson:** Yeah.

**Loris Marini:** that marginal propensity to reuse is gone. You're not creating the system to reuse it. It's just produce, dump, produce, dump, produce, dump. So we gotta close the loop somehow.

**Chad Sanderson:** Yeah. And the amazing thing is that closing the loop is actually beneficial for everyone. It's beneficial for software engineers, because now they're able to add value in ways that they didn't even realize they could add value. If I'm someone who's working on the engineering side and someone says, hey, can I have a new attribute?

Or can you emit a new event that captures some real-world behavior? I'm gonna say, okay, I'm doing this, and then what do I get for it? Basically nothing. It's just work that kind of evaporates into the ether. But if I do the work and then I can visibly see that now it's being used in our pricing model, that's very different.

Now I have skin in the game. I can take credit for that. I can put that in my promotional package. I can say that I was able to help increase the accuracy of our pricing model by 2%. That has real value. And then the other thing that it does is it allows the engineering team to start thinking, to your point, not just in terms of how to improve their services, like what features do we add and how do we make database changes in order to improve our features.

But they start thinking about the consumers in the warehouse as features and customers as well. The warehouse is a feature and the warehouse has customers. And then they start thinking, okay, I thought of some cool things that we should probably be capturing. I'm gonna add those in. I'm gonna create an event.

This is something that's important and meaningful. And I think that there's gonna be some use cases for


Yep. Product managers and data scientists and analysts can start saying, hey, I have business needs. There's questions that I need to answer. And those requests can start going to software engineers, and software engineers can start thinking about how they would fulfill those requests through data, which has never been an option for


**Loris Marini:** And I wanna offer you an extension of that as well, because we are focusing on the technical team. But in a non-technical organization, not a tech startup, say an established enterprise that does food production, a lot of the insights can actually come from outside of the engineering team or the IT team.

And I would actually argue that 90% of the time, the real nuggets are not within IT. They are distributed across the org, in the domains, which is the idea of the mesh: this domain-oriented, decentralized approach to data modeling and data architecture. The folks that are doing the work day in, day out have a feeling; they know what's going on.

And at the moment there's no way for them not just to communicate what they want, but to communicate it in a way that is actionable. If I complain on Teams or Slack, hey, we're struggling with this, it's great, it's the first step, but let's dig deeper. And it would take forever for an engineer to gather all the context around that need.

Instead of putting all the responsibility on the technical team, can we distribute it? Literally distribute, as in those contracts that you talked about, those agreements that capture the metadata. They should capture enough information, enough context, for anyone to be able to look at a number and understand how it fits into the bigger digital model of the business.

And by themselves be able to say, hey, I'm on this. I need this. There's a collision, for example, between two terms. I've been having a conversation over lunch with Greg from sales, and we couldn't agree on what a customer is at the moment. So we need to disambiguate that. We need to create this extra model.

Because it's creating tension. Our reports don't agree. The CEO is getting stressed, right?

So for that feedback, you have to have the channel. And I think one of the biggest elements of the contracts is that if they're done well, in the sense that they're intelligible, so people can understand them, they will create that self-serve functionality for the feedback mechanism.

So the loop doesn't have to be closed by the engineer necessarily. The system can be designed such that it's very easy to get actionable feedback, and people can then evaluate a


**Chad Sanderson:** That's exactly right. And this is something that I'm working on writing, but what you just described is what I call the promise of the semantic warehouse. If business users can communicate data needs in semantics instead of in code, then you are going to have this lifecycle, this sort of flywheel effect, where non-technical people can communicate their asks because they understand the business. They can do that semantically, and the software engineers can implement these contracts as schema that flows into the warehouse.

And it's very easy to manifest that data as an answer to the question. But in order to get there, I think that there's really three levels of maturity that the company has to go through. The first level is: how do you get the engineers to care about data? Because in most cases, like we already described, they're just not thinking about it.

It's an implementation detail. And the way that you do that is by starting with lineage and contracts. Those are two sides of the same coin, right? If you understand the lineage from the services down, then the engineer now has accountability. Having visibility equals accountability. If I know that this is gonna break my pricing model and I do it anyway,

it's my


If I don't know that, if I have no clue that it's gonna break the pricing model, cuz I can't see it, it's not my fault


**Loris Marini:** yeah,

**Chad Sanderson:** You can't blame me for that. And so first is the lineage: making that lineage visible and making it easy to understand, if I make this change, who am I gonna break?

And how severe is that break gonna be? Is it a table that I'm gonna break? Does it have a lot of customers and consumers, and are people referencing it and writing queries on it? Or is it nobody? That's the first thing. And so if you understand that, then the next thing that the engineer's gonna want is: okay, I don't wanna break people.

I wanna give them an API. I wanna version control this. I wanna ensure that nobody comes along after me and does something silly and ends up breaking one of my customers downstream, right? They are two sides of the same coin. One is the understanding of how the data is being used, and then contracts, the other, are the vehicle to prevent the breakage from actually


**Loris Marini:** Absolutely. And you need both.

**Chad Sanderson:** And you need both at the same time. So once you've done that, you've created this bridge that I described before.

You've created the bridge between the consumer and the producer that previously didn't exist. And the next step is the transition to requests, right? People can start to ask for things. You can start to structure your contracts in a way that you're making modifications to them.

I want this new property. I would like to add this new value in an existing property. I want this new entity. That is the next step. And you can tie, like you said, those contracts, the metadata, to the actual business use case, and then things are gonna start getting really interesting because all of this conversation is gonna start happening and there'll be true collaboration, right?
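As a rough illustration of what a request against a contract could look like in practice, the Python sketch below compares two versions of a contract and labels the change. It assumes a contract is simply a mapping from property name to type name, which is a simplification made up for this example, and it only covers adding, removing, and retyping properties, not new allowed values within a property.

```python
def classify_change(old, new):
    """Compare two contract versions (property name -> type name).

    Removing or retyping a property is treated as breaking for consumers;
    adding a property is treated as a safe, additive change.
    """
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    retyped = sorted(name for name in set(old) & set(new) if old[name] != new[name])

    if removed or retyped:
        kind = "breaking"
    elif added:
        kind = "additive"
    else:
        kind = "unchanged"
    return {"kind": kind, "added": added, "removed": removed, "retyped": retyped}


# Hypothetical shipment contract, before and after a consumer's request.
v1 = {"shipment_id": "string", "weight_kg": "double"}
v2 = {"shipment_id": "string", "weight_kg": "double", "pickup_window": "string"}

print(classify_change(v1, v2)["kind"])
print(classify_change(v2, v1)["kind"])
```

A check like this could run when a producer proposes a new contract version, so that additive requests go through easily while breaking ones trigger a conversation with the affected consumers.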

Step two is when the collaboration actually starts to occur. And then step three is when you start describing the world in its entirety and you move to this universe of life cycles and semantics. Now we're not just thinking about one-off contracts that describe entities anymore.

Now we're thinking about how the whole system fits together. How are entities related to each other through events, through real-world behaviors? And once you're living in this semantic world, then you have various levels of abstraction where you can actually start from the semantic context.

You can then move to your ERD, right? The map of your schema and your databases. And then from there you can go down a level of abstraction into the warehouse tables, and you can understand, from your semantic properties all the way down to the physical implementation, where your data is coming from.

What does it mean? How is it used? Who actually owns it? This is what I call the true data catalog, right? When people talk about data catalogs today, they're usually talking about the lowest common denominator of that system, but that's not how people understand data. And without that higher order of capability, it's gonna prevent anybody who is not familiar with the underlying data model from making any requests or having any modifications to make of the data itself.
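One way to picture that top-to-bottom navigation is a catalog entry that carries all three levels at once. The Python sketch below is only an illustration under assumed names: the `CatalogEntry` fields, the example property, and the table paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    semantic_property: str  # business-level name people actually use
    meaning: str            # what it means in the real world
    owner: str              # who owns it
    erd_field: str          # entity.field in the service's schema
    warehouse_column: str   # schema.table.column in the warehouse


# Hypothetical catalog with a single entry.
CATALOG = [
    CatalogEntry(
        semantic_property="shipment pickup time",
        meaning="When the carrier collected the load",
        owner="shipments-team",
        erd_field="shipment.pickup_at",
        warehouse_column="analytics.shipments.pickup_at",
    ),
]


def trace(semantic_property, catalog):
    """Walk from the semantic name down to the physical column."""
    for entry in catalog:
        if entry.semantic_property == semantic_property:
            return [entry.semantic_property, entry.erd_field, entry.warehouse_column]
    raise KeyError(semantic_property)


print(trace("shipment pickup time", CATALOG))
```

Someone who only knows the business term can start at the top of this list, while an engineer can read it from the bottom up.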

Once you have those three pieces, then in effect, this is what I call the semantic warehouse. And this is when the non-technical consumer can really begin to involve themselves in the process in a way that they've been excluded from, essentially, for the past decade or


**Loris Marini:** Yeah, that's the power of collaborative networks, right? Ron, Adam is talking a lot these days about that concept, which I think is fantastic, but it's been in the realm of fantasy and unachievable for most organizations for the last 40, 50 years, if we were to create knowledge systems like that.

So I'm not even talking about the data, the raw ingredient, but go up quite a few layers of abstraction, and go from data to information, knowledge, meaning, and logic at a really high level. If we have visibility vertically across all the layers of the stack and horizontally from source to destination, and if the system is easy enough to navigate that anyone without a degree in computer science can jump in, see correlations, and link in their heads what they're reading on the screen to what the customer, whether internal or external, is experiencing at the moment, that would be incredible. That would be almost like the fabric, the substrate on which knowledge is created, shared, and used across the organization. And yes, it has data.

Yes, it has Kafka. Yes, it has all your bits and pieces to move data around and transform it. But it doesn't stop at the data layer. It connects data to what's actually happening in the real world.

**Chad Sanderson:** Yeah, exactly. I know this is not a new concept. There's been a lot of people who have talked about this philosophically, but what I've tried to do at Convoy is ask myself, how do you actually get there, right? It's one thing to say, yes, we need to have semantics, and those semantics need to be connected to the underlying physical architecture of our data.

And we need to have a way of communicating these things to producers. But if there's not a path on how to make that happen, then all of these things get shut down, right in the dirt. This is a human problem. It's actually a problem of, you mentioned collaboration, and it's collaboration and connectivity.

If you have not built the bridge between the producer and the consumer, if you haven't helped them develop empathy, if the producer doesn't have any empathy for the data consumer, you will never arrive at a point where you have this semantic substrate that you can operate in.

It just won't happen. And I've seen a few products that try to get at this, and they're these very heavy, bulky things. And it makes sense why they're heavy and bulky. It's basically: yeah, if you build out your whole warehouse on our product, then we'll give you all of this really great stuff, from soup to nuts.

**Loris Marini:** Yeah,

**Chad Sanderson:** But obviously that's not where anybody's data actually is. Convoy certainly has thousands of data models. There's no way that we can just start over from scratch. And so that was actually the question I was trying to answer: not only how do you get the people involved to care enough to start doing this,

but also, how do you do it iteratively over time in a way that doesn't require a massive refactor, that doesn't require the entire organization to start moving all at the same time? And that's where this model has come from, because what a person can do is say, okay, I have my one use case, right?

I have my pricing model and it breaks and it sucks. And I don't like that. And I want a contract. By having that lineage, you build the empathy, and the contract gets created. And now you're able to connect, in that single use case, the semantics to the physical layer. And you just gradually start expanding out, use case by use case.

When you have the team that cares about building these relationships, which is your data team, your business intelligence engineers, your data architects, your data engineers, your analytics engineers, they are very naturally gonna wanna start creating these relationships for themselves, right?

They're going to recognize, hey, I could use this. If I just had this one connection, if I had this one foreign key in production, I'd be able to build the data set that I need. And that's all they need to do: hey, software engineer, producer, I want this single foreign key, and here's exactly what that means.

And actually, this has been defined elsewhere in our system, so you don't even have to reinvent the wheel. You can just use the definition of this property that already exists somewhere else. And it makes it tremendously easy for them to do that. And so the model is hyper iterative in that


**Loris Marini:** Absolutely. To me, this is the definition of agility, the true definition. It doesn't matter how fast you can process data. It doesn't matter how many petabytes you can crunch per hour. That doesn't give you agility. It gives you how quickly you can run the job, the single atomic piece of computation. But that's not what we're talking about. We're talking about how quickly you can get an idea from the frontline into the system and back, and distribute that new idea once it's been demonstrated that it's actually valuable. And so you need to have your sandbox environments.

There's a lot of architectural considerations, but if you do that fast, then you can truly innovate and learn and grow as an organization. I just had an aha moment. I wonder if this stuff that we're talking about, the concept of a semantic layer and a data layer and lineage, is related to what Irina Steenbeek was talking about in episode 25.

We did a deep dive into her book, Data Lineage from a Business Perspective, and she was saying that there are actually three types of lineage. Most people think about the physical layer, so the actual data in the table. A row and a column give you one value.

And when we think about lineage, we think about how that number is calculated and who uses that number downstream. So connecting cells in a table, basically, with other cells. But there's another layer. There's, I think, the logical layer just on top of the physical layer, and then the conceptual layer. And what she does in her work is often start by focusing on the conceptual and the logical.

We're talking about large enterprises here, so massive organizations. At some point regulatory compliance becomes an absolute priority. They realize they have no visibility and they call an expert, in this case Irina. There's many other consultants that do this type of thing.

And one of the things they do is map the business terminology, the business use cases, and they really describe it in words. The semantic layer is not an actual table you query. It's a document.

You could do it in Word, right? It describes what the term means and why it matters, what's the plan, and who's gonna use it, before you even worry about how you physically store it in a table, in a graph database, in a relational database. So I'm not sure, maybe this is just a long shot, but I think there is some strong similarity between these things. They might be connected.

And we might borrow perhaps from the more enterprisey sort of stuff and bring it back into smaller organizations and say, what we could do much better than enterprises is what you just said, integrating that. Because one thing is to build lineage. Another thing is to do it in a way that is not just an exercise that dies two weeks after you've shipped the project, right?

**Chad Sanderson:** So there are a lot of data catalogs out there that incorporate this idea of business logic and the conceptual layer. I've been calling it the descriptive layer. I think conceptual layer and descriptive layer both serve the same purpose, which is: what is the real world?

The logical layer is: what can we derive based on the real world? And then the physical layer is obviously how that maps to the underlying tables. The biggest problem is that there is very little incentive for the people who understand the semantic layer, or the descriptive layer, to do that work. Usually these are people on the business side. They're product managers, sometimes analysts, and they already have this stuff up in their head. And so spending the time to actually map and model it out doesn't really have a lot of intrinsic value for their everyday lives.

They're trying to ship features and answer business questions and all these other things. They're like, okay, I could tell you what I know, and maybe you could tell me what you know, and we could put that in a document, and that's really great. But unless I can do something about that, it's not gonna be particularly meaningful to me.

And so I think where it makes a lot of sense is when you have an organization where it's gotten so bad that the company has said, we are going to commit money to actually developing this lineage and we need to start from the semantics. But it takes a long time, if ever, for a lot of organizations to get there.

So one of the questions is how can you incentivize people to do this? And again, this is the beauty and the power of the contract, because if you've moved to a world where you are asking for data, you have to describe what that data actually is in order for the engineer to implement it. If you're gonna ask for a new entity, you have to explain what that entity means, how it relates to other entities, and what that relationship actually is.

The engineers and requesters also communicate through the vehicle of semantics. And it's the same semantics that you can then use to traverse that lineage. So if you're storing those requirements and you're including them as the top layer in your catalog,

and then those requirements are generating schema, and the schema is being produced by a software engineer, and then that's flowing to the physical layer, you now have all three layers. You can connect what's in the warehouse up through the derived properties to the actual semantic requirements.

And if you've created a system that allow that like facilitates integrating these


And so this is the next step of what we're working on at Convoy: a literal design surface where, if I were to create an entity in an interface, I say, okay, I have this brand new entity.

I'm gonna call it an RFP, a request for pricing. And that has meaning, so I need to describe what that meaning is. And then I need to say, these are all the foreign keys. These are all the relationships that this RFP has.

It has a relationship with the shipper, right? It has a relationship with a broker, like a Convoy broker. It has a relationship with our bidding system. And I can say all these relationships exist because I want them as foreign keys.

And then they have a bunch of properties, and I have to describe all those things semantically. And when I do that, I'm able to auto-generate the schema. I know what all these properties are. The schema for the contract is self-evident, right? It's something like, you say entity dot RFP, and you have the whole list: these are all of the properties that need to be there.

These are all the values of those properties, and that gets turned over.

Yeah, it's like self-documenting code in a way, but it's like self-documenting data, where once this is actually implemented by the software engineer, anybody can then go back and navigate that system purely through the business logic alone.
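To give a feel for how a semantic description could turn into schema, here is a minimal Python sketch that builds an Avro-style record definition for the RFP entity. It is not Convoy's design surface: the `Entity` and `Property` classes, the `_id` naming for foreign keys, and the `lane` property are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    name: str
    type: str     # Avro primitive type name, e.g. "string" or "double"
    meaning: str  # the semantic description supplied by the requester


@dataclass
class Entity:
    name: str
    meaning: str
    relationships: list = field(default_factory=list)  # names of related entities
    properties: list = field(default_factory=list)


def to_avro_style_schema(entity):
    """Turn a semantic entity description into an Avro-style record schema."""
    # Each relationship becomes a foreign-key field (naming is an assumption).
    fields = [
        {"name": f"{rel}_id", "type": "string", "doc": f"Foreign key to {rel}"}
        for rel in entity.relationships
    ]
    fields += [
        {"name": p.name, "type": p.type, "doc": p.meaning}
        for p in entity.properties
    ]
    return {"type": "record", "name": entity.name, "doc": entity.meaning, "fields": fields}


rfp = Entity(
    name="RFP",
    meaning="Request for pricing",
    relationships=["shipper", "broker", "bidding_system"],
    properties=[Property("lane", "string", "Origin and destination pair (hypothetical)")],
)
schema = to_avro_style_schema(rfp)
print([f["name"] for f in schema["fields"]])
```

Because every field carries its `doc` string, the business meaning travels with the schema, which is the self-documenting quality described here.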

So I really think that contracts are so useful and multi-purpose. But I always recommend: don't try to go crazy with the semantics stuff at first. That will come over time. As your system matures, you will start to see the benefits of the semantics emerge.

But just leveraging the contract in order to create that relationship between producers and consumers, I really can't stress enough how valuable that



**Loris Marini:** a solid foundation, cuz you can build on top of it. And a lot of people might be thinking, how costly is this? It sounds like it's gonna be super expensive to think about the contract, the agreement and get everybody on board to agree. But actually no, because remember the marginal propensity to reuse, that's where your cost goes down.

It's a cost that you pay upfront to create a contract once, then there's the cost of maintaining the contract. And the ability to do so much more with that piece of data, by combining it with other pieces of data in a combinatorial fashion, is the selling point of this stuff. So you've gotta do it right.

Even if it's not perfect, you have to start with that idea. Otherwise, very quickly, code becomes a swamp and it's too late. And then,

**Chad Sanderson:** Exactly. And the great thing about this system, I'm patting myself on the back here, is that you can do it all with what exists, right? There are open source tools for lineage. There's Apache Atlas. You can use that today. We're getting into the technical details here, but you've got Avro and Protobuf for implementing schema, and those exist today.

At Convoy we have a Kafka pipeline. And we put together a little library, and all someone has to do is update their schema definition with whatever the new schema is for the contract.

They use our library, which handles a lot of the boilerplate code. And they push the new contract, whether it's an entity sort of CRUD update, like you're wrapping a database, or it's some real-world event, like a verb.

We have a Kafka pipeline. So again, that's just open source stuff. We push that into the warehouse and then once it lands we pull the data from schema registry. We pull it from the warehouse and we put it in a very simple cataloging interface, which really could work with any data catalog today.
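The kind of boilerplate such a library might handle can be sketched in plain Python: checking a record against the contract before it is published. This is not Convoy's library; the `CONTRACT` layout and the `violations` helper are hypothetical, and a real pipeline would more likely lean on Avro or Protobuf schemas and a schema registry for this check.

```python
# Hypothetical contract: entity name, version, and expected property types.
CONTRACT = {
    "entity": "shipment",
    "version": 2,
    "properties": {"shipment_id": str, "weight_kg": float, "status": str},
}


def violations(record, contract):
    """List every way `record` departs from the contract (empty list means valid)."""
    problems = []
    for name, expected in contract["properties"].items():
        if name not in record:
            problems.append(f"missing property: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in record:
        if name not in contract["properties"]:
            problems.append(f"unexpected property: {name}")
    return problems


good = {"shipment_id": "s-1", "weight_kg": 120.5, "status": "delivered"}
bad = {"shipment_id": "s-2", "weight_kg": "heavy"}

print(violations(good, CONTRACT))
print(violations(bad, CONTRACT))
```

Rejecting a bad record at the producer, before it reaches the pipeline, is what keeps a downstream consumer from discovering the break after the fact.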

You can push it to something like Amundsen. Wherever you wanna push it, you could push it easily. You could do this all open source, is what I'm saying. And just starting with the smallest conceivable unit of value would be incredibly impactful.

And like you said, you're building this foundation upon which you can begin to layer semantics more and more. I'm not gonna pretend that this is gonna be an overnight


**Loris Marini:** Oh, yeah.

**Chad Sanderson:** I don't want people to come away from this thinking, yeah, okay, cool.

I'll go and implement this semantic warehouse stuff, and then tomorrow my world is utopian and perfect. The hardest part about this is not the technology. The hardest part about this is connecting these two siloed groups and convincing them to care about each other and


**Loris Marini:** building. Absolutely. Yeah.

**Chad Sanderson:** you can effectively do that, you're 80% of the way there. Maybe 75

**Loris Marini:** Yeah, I love it. It has to be grassroots and top-down simultaneously. You can have a sponsor and someone from the board saying, "Yes, here's a million dollars, go do it." But it won't work if people are not motivated. You mentioned this a couple of times in this podcast, and I love that attention to the reward system, to how the brain works.

That dopamine spike we get when we see that our work has real impact, that's what we need to try and leverage in a positive way. Not by leveraging the sense of "you only get promoted if...", but by showing: here's the impact. Here's why you exist in the organization. Here's the impact of your work.

Here's why it's meaningful. So it's gonna be, I think, an interesting decade to see how this evolves, and especially finding that balance, I think, between the end state, which we need to be able to describe and sell to the business (and I'm using the word "sell" here in the most genuine and meaningful way),

**Chad Sanderson:** Mm-hmm mm.

**Loris Marini:** and the ability to execute, especially at a high speed, there's a different velocity as well talked about with Franciscos called a dear friend of mine.

He's an analyst. The CEO dumps a lot of requests on business analysts. They understand the business really well, and they can't wait for a feature to be built, because engineering tends to be slower. So there are so many problems that we need to fix, but I think this is going in the right direction.

And I'm gonna add one last note, because you've been super generous with your time, on something we haven't even mentioned. We talked a lot about business literacy with those contracts and the ability to navigate across different abstraction layers, left and right, up and down.

Imagine how much quicker it is to onboard new people to the team if someone can just see how the business works by navigating at the right level of abstraction, whatever makes sense for them in that moment. If they wanna go down deep to the physical layer, sure, help yourself. If you wanna stay up at the conceptual level, you can do that.

And you can see the relationships that make up the business. The business is literally a bunch of relationships between entities, and those entities can be human beings or machines. With our intelligence, we can be the interface. We can easily navigate the machine-to-human interface because we are the humans and we are smart.

Machines are stupid.

**Chad Sanderson:** You are exactly right. One of the questions that I've always found funny, and I've asked this on LinkedIn a few times and I've made this statement on LinkedIn as well, but like very few people actually understand how their own business works. Like they understand it in these broad, like very vague terms.

"Yeah, I know we ship things from here to there," or "we're a search engine," and maybe I really understand my particular piece of the puzzle. But you don't really see how you're part of a network, how the things that you do are gonna impact someone else, and how the things somebody else does are gonna impact you.

And that shines a totally different light on your work. I'm getting into this utopian vision thing here, but once you really understand that someone two degrees of separation from you can have an impact on the work that you do, right?

Someone two or three teams away that you've never even talked to can roll out a feature that impacts your customers in a way that you never expect and have no insight into. It's gonna change the dynamics of the organization. A lot of organizations today, I keep saying this, but they're very siloed.

They have their problems that they are solving: we're gonna focus on our problems and we're gonna optimize this model or this use case or this set of features. And you might realize: oh, wait a second, what's happening upstream of me? If I fix a problem, something that maybe they don't even care about, that's not a big deal.

If I fix that, my team is actually gonna improve by an order of magnitude. So instead of me spinning my wheels on trying to hyper-optimize a UI, why don't I get these three other teams to work on my behalf to make my

**Loris Marini:** That's the definition of working smart and not working hard. And that's the groundwork you need to have. That's the base on which you're gonna negotiate your pay rise, because this is what I've done. Instead of working in the Ford factory, measuring hours, I'm a knowledge worker, which means I have an understanding of what's happening.

I build relationships internally and my work adds value every single day. If I leave tomorrow, what I've done doesn't come with me. I bring with me my expertise, my experience, but what stays is the contribution that I gave to the company by virtue of building those relationships with the people. It's a completely different way of really understanding.

It's such a fundamental shift that we need to go through, and I can't wait to see that happen. I think it's gonna be very interesting. I wanna do one thing before we close this one: a summary in 60 seconds. You tell me when you're ready. Think about the most important parts for someone who was to tune in now.

They skipped an hour and 20 minutes of conversation, and they just want a 60-second summary. I've got a clock here. You tell me when you're ready.

**Chad Sanderson:** I'm ready.

**Loris Marini:** Smash

**Chad Sanderson:** Okay. So one of the most important problems in data today is that there is a siloed relationship between the people who are generating the data and the consumers of that data. Because they're not talking to each other, you get data that is missing, data that is breaking, data that doesn't represent the real world.

And what needs to happen is building up those relationships. And you can build those relationships starting with lineage. The producers need to understand how the data they generate is actually affecting the downstream and how it's affecting the company. Next, they can invest in data contracts, and contracts enable engineers to ensure that the data they're producing is high quality.

I think that's the best possible foundation for this semantic future.

**Loris Marini:** Nailed it.

**Chad Sanderson:** Not bad.

**Loris Marini:** You did a really good job. Way better than I could have done.

Ah, man, this has been a fantastic conversation. I wanna thank you so much for taking the time. We opened so many doors, we knocked on so many doors. I'm really looking forward to opening some of those together in the future, in one capacity or the other. I have a feeling we're gonna do some work.

So I'm definitely glad that we connected and that we did this.

**Chad Sanderson:** Awesome. Loris, thank you for having me. And I'd love to collaborate on this. I truly believe that this is the future, but it's just gonna take some people actually doing real work to make it happen. And I know you're doing that, and I'm trying to do that as well, and I know there are a few other people who are, so I'm very excited about what's

**Loris Marini:** We should bring them together for sure. The next step for us is to have a channel together and brainstorm ways to do this. Thanks a lot, Chad. And enjoy the rest of whatever's left of the day there.

**Chad Sanderson:** Thank you. Thanks Loris.
