Data and people in a $1B ecommerce business

Loris Marini - Podcast Host Discovering Data

Today we learn about the role that trusted data played to scale an eCommerce business to $1B and beyond with James Edwards, CDO and founder of Pet Circle. We talk about the business impact of fragmented data and strategies to bring everyone on the same page.

Why this episode

Today we learn about the role that trusted data played to scale an eCommerce business to $1B and beyond with James Edwards, CDO and founder of Pet Circle. We talk about the business impact of fragmented information and interpretations of the data, what it means for data to be fit for purpose, the difference between an engineering mindset and a data mindset, the process of standardising definitions and defining ownership, and the cultural / people challenges they faced along the way. We also look at some of the frameworks that the data team at Pet Circle successfully used to support and scale the business, and share some hard-learned tips for new CDOs in startups. You can follow James on LinkedIn.

Episode transcripts

Loris Marini: Welcome to another episode of Discovering Data. I'm here with my friend James Edwards. James and I met at the University of Sydney a few years ago when I graduated. It was my Ph.D. ceremony, and he gave an amazing speech that has stuck with me to this day. I wanted to talk to him about data management because James is the founder of Pet Circle, a company that, as we'll hear in a second, has done wonders with data and information. It has also gone through a number of transformational stages. It's interesting to hear the perspective of someone who has actually used data to drive business outcomes. Without further ado, let's dive in. James, welcome to the show.

James Edwards: Hi Loris. Thanks for having me.

Loris Marini: Give me an overview. What is Pet Circle? What's the story? How did it all start?

James Edwards: Yeah. Pet Circle is 10 or 11 years old. It's a pet-oriented Australian e-commerce business, and as it stands we're branching out into other areas. We're one of the largest pet businesses in the country now, sending dog food, cat litter, meds and toys to people around the country.

Loris Marini: Why do you need data in the first place? How do you sell? How does the business work? What are the core relationships that make up the business?

James Edwards: It's been a journey. We've gone through different phases, as you've said, but what's become really apparent over the years is that we're a specialist player. We're not a marketplace. We're not a grocery store. We're not an Amazon or a Walmart or anything like that.

We are a pet-oriented business. Pets are an incredibly personal aspect of people's lives. There's a whole trend called the humanization of pets: pets are part of the family. My cavoodle might be on the lowest rung in many ways, as she's reminded at every dinner when she's begging for scraps, but she's part of the family. We just went on a holiday, and it was mandatory that we could take her with us.

We need to provide a personal service, but how do you do that at scale? It's one thing if you're in a bricks-and-mortar store and you get to know your customers and their pets, but we're trying to do this digitally, at mass scale.

Data is the way to achieve that: to personalize the experience and, with our vast reservoir of data, to help pet parents be better pet parents. "To pet better", as we like to say. We should know more about what a cavoodle eats and plays with than any individual pet parent does, and we should use that to help people be better pet parents.

Loris Marini: Yeah. So, beyond the end user, what does that data mean for the business? Do you use it to be more efficient in your processes, is it important from a risk mitigation perspective, or is it simply to make the business more scalable? Is it all three?

James Edwards: Yeah. I mean, it runs the gamut. We talked about phases: historically, in the early stages, it was primarily about analytics, helping us as managers understand what's going on in the business. Then we moved into using it to drive operations more efficiently, and I still feel really passionate about that aspect of it.

Ordering products from our suppliers better, for example. Now we're going into the next phase, where it's about operationalizing data for our customers. The classic example, of course, is making better recommendations on-site and firing off recommendations to customers. We want to do a lot more than that in the future.

Loris Marini: There's one thing that we seem to keep finding in the industry when talking to different people: this divide, or gap, between how data is perceived by, say, an engineer, a data scientist or an analyst, and how it is perceived by the business.

You seem to have developed skills across the board because it was imperative for the business. So, I suppose the business came first and then the technical side, or was it the other way around? Tell me the story. How did you develop your data skills through the business?

James Edwards: Personally, I had the advantage that my undergraduate degree was in engineering/computer science, my postgraduate degree was in mathematics, and then, of course, I started a business. So I actually had all three of these perspectives forced upon me, and I do take that into our day-to-day work within the team.

I would say I see them as three different perspectives, not just two. Engineering, analytics and data science are very much complementary, and they actually approach things in different ways. The comment I often make is that engineering is typically rules-oriented, whereas analytics and data science are typically goals-oriented. It can seem similar, and it often is very similar: they both tend to be methodical and detail-oriented. But they do have some distinct differences, which come into play sometimes.

For most business people, what they want out of data is better reporting and insights, but also just operational reporting: are sales up or down? That's great. It's super important. It's probably not what gets the juices going for the data team, though.

Loris Marini: Yeah. I don't know if it's because we like complexity when we're working with data. Some people just love the sheer challenge of massaging huge data sets or trying to extract a signal from the noise. It is a challenging thing. It's easy to become enamored with the idea of becoming almost a wizard in the back end, doing things that other people, mere mortals, cannot do.

I can see it on social media. There are a bunch of polls going around LinkedIn where content creators ask, "What would you like to see next?" When they're talking to data scientists, the top choice is deep learning, and I'm here looking at them going, "Dude, there's only 5% of companies where deep learning can make a tangible, significant impact on the business."

Most businesses don't even know where their data is. So let's be a little realistic and focus on that: the boring but very necessary part of data.

James Edwards: Absolutely. I mean, it's becoming almost a cliché in business to say, "I want to be more data-oriented. I want to be more data-driven. Let's go and hire data scientists." So they hire these expensive, smart people, who come in and go, "Where's the data? What do I do with it?" You've got to get your nuts and bolts and your fundamentals in place.

We only really kicked that off in 2020, and it took a good while. We obviously had databases, with our reporting tool over the top, but the data was disaggregated. The reporting was set up over the top of our transactional systems, which impedes a lot of the ability to produce nice, clean datasets and all that sort of stuff. We really had to get the foundations of our data environment right, which involved getting the data lake and the data warehouse going and consolidating the data.

Loris Marini: Tell me a few stories, if you feel like disclosing them. I'm curious to know more about the actual problems that disconnected data can bring to an organization. From meetings running longer to people not agreeing on basic things, how does it actually feel when you're experiencing that?

James Edwards: That's a great question. So many things pop into my head. One of the worst things that happens with disaggregated data, especially if you're the kind of company that wants to be data-oriented, which we always have been, is that you focus on what can be measured and neglect what can't. So much can't be measured when the data isn't integrated. It's hard enough sometimes when it is integrated, let alone when it's disaggregated.

What I found that meant was that it's always tended to be a lot easier to measure negatives than positives. It tended to be a lot easier to measure costs than loyalty: the loyalty of a customer, so their propensity to repurchase, their propensity to spend more of their share of wallet across pet with us, and the sentiment they have towards us. Are they just buying from us because we're the only pet company they know? Or are they buying from us because they value our delivery, our customer service, our pricing and our range?

That's very hard to measure, but you can absolutely track the cost of your customer service team per contact or something like that. So it tends to lead to a cost-cutting approach rather than the expansive one we want as a growth company.

The other major issue, and we're still struggling with this even now that there are a lot more connections between the data, is fragmentation. I think that's really critical in data management. Because there are different sources of data, it's only natural that people try to calculate the same sorts of things from different sources, and that leads to problems.

When one department talks about customer retention using one set of numbers and metrics, and another department talks about it from a different perspective, you're speaking different languages across the table, and it's actually unhelpful. I'd rather be non-data-driven than have that. So we're trying to resolve those issues.

Loris Marini: It's interesting; I had never thought about it with this clarity: the difference between negatives and positives, how easy it is to measure churn, for example, and how hard it is to measure things like loyalty.

Positives obviously require more data over a longer period of time, but also more data points, because how do you measure loyalty? You can use different metrics. It's a harder question to answer through data, and so you need more.

James Edwards: Yeah. I can speak specifically to this, because our ERP system, our website tracking system and our customer service system were disconnected. In the ERP system it was very easy to see that someone wasn't buying from us anymore. What we didn't know was that they were coming back and visiting the website every week and the product was out of stock. So we would just ignore it.

Products are out of stock all the time, especially at the moment, unfortunately, with supply chain issues. So that on its own doesn't ring any alarm bells. But if the customer is someone who's bought from us 10 times and now they can't get their product, we should be on top of that. We should be helping, doing something, saying, "Hey, we noticed you're looking for X, Y, Z, but there's this other substitute product. It's really good. Let's help you make the transition, because this isn't coming back into stock anytime soon."
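The gap James describes between the ERP and the website tracking is, at heart, a missing join. The Python sketch below is only an illustration. It assumes both systems share a `customer_id`, and the record types, the field names, and the thresholds (10 orders, 3 out-of-stock views) are all hypothetical rather than Pet Circle's actual schema or rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Customer:
    customer_id: str
    order_count: int  # hypothetical field: lifetime orders, as recorded in the ERP


@dataclass
class ProductView:
    customer_id: str
    product_id: str
    viewed_on: date
    in_stock: bool  # hypothetical field: stock status when the page was viewed


def flag_stranded_loyal_customers(customers, views, min_orders=10, min_oos_views=3):
    """Return IDs of repeat buyers who keep viewing out-of-stock products.

    Joins ERP order history with web-tracking events on customer_id; neither
    system can answer this question on its own. Thresholds are illustrative.
    """
    orders = {c.customer_id: c.order_count for c in customers}
    oos_views = {}
    for v in views:
        if not v.in_stock:
            oos_views[v.customer_id] = oos_views.get(v.customer_id, 0) + 1
    return sorted(
        cid
        for cid, n in oos_views.items()
        if n >= min_oos_views and orders.get(cid, 0) >= min_orders
    )


customers = [Customer("c1", 12), Customer("c2", 2)]
views = [ProductView("c1", "kibble-10kg", date(2022, 3, d), False) for d in (1, 8, 15)]
views += [ProductView("c2", "kibble-10kg", date(2022, 3, d), False) for d in (1, 2, 3, 4)]
print(flag_stranded_loyal_customers(customers, views))  # ['c1']
```

Only `c1` is flagged. `c2` viewed the out-of-stock product more often but is not a repeat buyer, and that distinction is invisible if you look at either system alone.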

Loris Marini: Tell me about the types of data products or data problems that you guys solve. Actually, I'm getting ahead of myself. Let's talk about data products and how they're perceived by the data team and by the business. What do you think a data product is?

James Edwards: You can tell me if you've got a different perspective.

The data team was set up with a particular mission in mind: to operationalize data in the business in order to promote loyalty. The number one objective of Pet Circle is to help people pet better. This sounds like corporate fluff, but it's real. This is what we want to do, and I want to do that by promoting loyalty, by giving people the drivers of loyalty. What do they value in a pet business? We want to give them that: fast delivery, expert advice, a great range, and products that suit them and are personalized to their dog or cat or lizard or whatever it is.

We want to operationalize our data so that we can help people do that better. The biggest split I would make in data products is between operational data and analytics-oriented data. By operational, technically speaking, I mean web services that are accessible to the website, to our emailing system and to our customer service system, so they can really shape the website experience, or help a customer service agent understand that when I call up, my delivery is in transit, that's likely what I'm going to be talking about, and this has happened before.

We were definitely focusing on the former, on operationalization. There's a split there between internal efficiencies and quality. I think the best example of internal efficiency would be ordering optimally. We have tens of millions of dollars in stock at any given time, across tens of thousands of products, and of course we're fulfilling tens to hundreds of thousands of orders every week. It's beyond human capability to make sure you're optimizing your stock levels to fulfill all the different customer needs.

We're spending huge amounts of money to set up warehouses all around the country in order to get products to customers faster, but that's all useless if we don't have the right products in those warehouses.

Basically, we optimize the ordering against expected repurchase rates, against the availability of space in the warehouse and all these sorts of things, which are best solved with data.
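To make the ordering problem concrete, the sketch below uses a deliberately simple order-up-to heuristic. It is not the method used at Pet Circle, since the episode mentions only mathematical techniques and linear algebra. The function name, the percentage safety buffer, and the proportional scale-down to fit free warehouse space are all assumptions made for illustration.

```python
def recommended_orders(forecast, on_hand, free_capacity, safety_pct=20):
    """Hypothetical order-up-to heuristic, scaled down to fit warehouse space.

    forecast: expected units sold per SKU before the next delivery lands
    on_hand: units currently in the warehouse per SKU
    free_capacity: total units of free space in the warehouse
    safety_pct: extra stock held as a buffer, as a whole-number percentage
    """
    wanted = {}
    for sku, demand in forecast.items():
        # Integer ceiling of demand * safety_pct / 100, avoiding float rounding.
        safety_stock = -(-demand * safety_pct // 100)
        wanted[sku] = max(0, demand + safety_stock - on_hand.get(sku, 0))

    total = sum(wanted.values())
    if total <= free_capacity:
        return wanted
    # Not enough space: shrink every order proportionally (rounding down).
    return {sku: qty * free_capacity // total for sku, qty in wanted.items()}


print(recommended_orders({"dry_food": 100, "litter": 50}, {"dry_food": 20, "litter": 60}, 1000))
# {'dry_food': 100, 'litter': 0}
```

A production system would also weigh repurchase rates, lead times, and per-SKU volume, typically as a constrained optimization. The proportional cut shown here is the simplest way to respect a single capacity limit.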

Loris Marini: Especially with the current issues in logistics.

James Edwards: It's a primary issue at the moment, of course. We need mathematical techniques for it: we use linear algebra to manipulate the data into the operational output, which is the recommended order. On the customer side, which is where we're branching out now, we have product recommendations, and we have different types of them. One of them is for our order delivery customers.

Dogs and cats need to be fed fairly regularly. We have, of course, a subscription model for that, where we give a lot of flexibility, because it's not like a Netflix kind of subscription that just comes through every month. Your dog might eat more or less each month, but it is a subscription model. We recommend complementary products, which is an interesting effort because there are different ways to solve for that.

The one I'm happiest that we've got in is a pet detector. One of the things I was quite embarrassed about for many years is that, despite being a pet specialist business, we didn't always know whether you had a dog or a cat or something else. This is not good. We were sending out emails saying, "Hey, there's a big sale on dog food," and the customer is thinking, "Well, I've got a cat. I buy cat litter from you constantly. Why are you sending me this?" It's a great question.

We did a lot of work on it, and it was a challenge bringing all the data together and then processing it. We now apparently have well over 99% accuracy in detecting what sort of pet you have. We're looking to delve deeper and segment that out further. Do you have a puppy? Do you have a large breed or a small breed? A tropical fish or a cold-water fish? These sorts of things.
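Beyond bringing the data together, the episode doesn't explain how the pet detector works. A much simpler rule-based stand-in is to infer pet types from the product categories a customer buys. The category names, the mapping, and the 20% share threshold below are hypothetical. The sketch returns a set so that a multi-pet household can be flagged for both dog and cat content.

```python
from collections import Counter

# Hypothetical mapping from product category to the pet it implies.
CATEGORY_TO_PET = {
    "dog_food": "dog",
    "dog_toys": "dog",
    "cat_food": "cat",
    "cat_litter": "cat",
    "fish_food": "fish",
    "aquarium": "fish",
}


def detect_pets(purchased_categories, min_share=0.2):
    """Return the pet types making up at least min_share of a customer's
    pet-specific purchases. Categories that are not pet-specific (e.g. gift
    cards) are ignored. A rule-based sketch, not a trained model."""
    votes = Counter(
        CATEGORY_TO_PET[c] for c in purchased_categories if c in CATEGORY_TO_PET
    )
    total = sum(votes.values())
    if total == 0:
        return set()
    return {pet for pet, n in votes.items() if n / total >= min_share}


print(detect_pets(["cat_litter", "cat_litter", "cat_food", "gift_card"]))  # {'cat'}
```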

Loris Marini: I suppose it comes down to personalization.

James Edwards: Exactly. I've got a cavoodle, so when I log on to Pet Circle, what I want to see is small-dog, pampered-dog sort of stuff covering the website. If there's a litter sale on, it's not relevant to me. Of course, we want to personalize it. I want advice from the vet staff about knee conditions, which cavoodles can suffer from.

That's the thing: it's not about what products to buy. It's about caring for your pet, and that's something that can be informed very much by data, and then we can use data to lead people to the right content.

Loris Marini: I want to dive into the process of cleaning the data: first of all bringing it all together, then structuring it and making sure that people can access it easily. What was it like to do that? How did you organize the data team to accomplish it?

There's a lot of debate going on around who should own the data, and as the founder and the head of data, you are in a sweet spot. I suppose you have a strong opinion, and I want to hear it. Who should own the data?

James Edwards: I do have a strong opinion, being the head of data and the founder, though that doesn't give you quite as much say in a larger organization as you might think.

The way it should work is that the engineering teams, the technical team, should own the transactional systems and structure the data accordingly, focused entirely on transactional efficiency. In technical terms: OLTP, relational databases, MySQL, Postgres, all these sorts of things. Use third normal form, structure it correctly, keep the data clean. That's great. Replicate that into the data lake, which the data team owns, and then let us transform it into usable information.

There are some edge cases there. What happens when a feed comes in that's fairly useless for the transactional system? An example we have of that is couriers. We work with a bunch of different couriers, because it's always a challenge getting high-quality delivery. They send event-based feeds. Should that go into the ERP system? It's useful there, but in my view it's far more useful coming into the data lake. We went one way on that, which was sending it into the transactional system, the ERP.

There was a transformation occurring on that side, and then it came into the data lake. We've had our challenges with that in the data team. Once data has been transformed, I think it's bad practice to transform it again. If you're building a transformation on top of a transformation, that's poor data architecture: it's fragile, and there are multiple sources of change. We operate on a different cadence from the product feature engineering teams. I can't do anything about that; that's just the way the world works. They work so hard, and the engineering teams get features out in the time they need to.

The business changes constantly. We're a high-growth business, and we work in a very dynamic environment, especially under current conditions. That's never going to change. So they can rarely give the time and effort I would like to see to preserving the transformation processes they've developed, in order to keep not just continuity but also coherence in the data environment.

One thing we're looking at is having a dedicated data engineering mini-squad, if you like, in the data team, to do things like process these incoming feeds and be highly tactical about it. The benefit is that the product feature squads then don't have to worry about it, so when very urgent things come into play, they have more availability.

They have less to do, because they don't have to worry about this side issue and can focus on integrating a new courier or changing the processes to handle a new courier system. We're still working our way through that as a business, to be honest. I don't have a definitive answer.

Loris Marini: Yeah, and few companies, few teams, have one. To be honest, it's an ongoing learning experience. It's part of the reason why we changed the name from The Data Project to Discovering Data. We're all discovering. It's a work in progress, definitely.

There's one thing that I keep hearing over at Data Foundations, where I do some consulting: people are struggling just to come up with a structure for the data team. In your case, you led the data team. How did you go about creating it?

James Edwards: Yeah. Our data team is currently five people, with seven or eight roles in market. It's not as large as some teams you'll see out there, but it's also larger than a lot of teams. When I'm interviewing people, quite often they're coming from two- or three-person data teams that are a subset of engineering, whereas we're standalone. We work very closely with engineering, and our structure started very clearly with engineering. We had to get the fundamentals right, and that begins with engineering.

We worked with the technical infrastructure DevOps team. That's not part of data, but we have added those capabilities into our team fairly recently in order to speed things up and have the focus. We still collaborate very closely with the company DevOps team, but we have a lot more self-organization in this respect. It's relatively easy for us because, being the data team, we have our own systems; we're not sharing a system the way the feature squads are across the rest of the platform.

We're running BigQuery databases and such which would be inappropriate for a transactional system.

Loris Marini: Oh yeah, absolutely. The engineer goes, "Should I take a lunch break and wait for this query to come back?"

James Edwards: That's right. I've been acting as the data scientist, and we've hired a couple more data scientists, starting in two weeks, to have full-time focus on that. They're going to sit separately from the engineers, and the analysts will sit separately as well, with different leads, and then they'll collaborate temporarily on the specific initiatives they're working on.

Loris Marini: And the relationship between the data team and the business: do you report directly to them?

James Edwards: What's clear is that we own the data environment. No one has any concerns with that. There's been a lot of support from everyone for the creation of a data warehouse, and people are very happy for us to own it, because it gets very messy and very unglamorous very quickly. We're good with all that.

Data science, and by that I mean the creation of machine-learning-based models for operational purposes in particular, sits with us. No one else really has concerns with that. There have been noises made about, "Oh, it'd be very useful to have a data scientist in this team or in that other team." I don't have any real concerns with that: they'd be involved in quite localized initiatives, and a data scientist working in a single team for their entire time at Pet Circle is compartmentalized. That sounds fine to me.

The concern would be if they deployed models that didn't go through the professional standards that really only a data team, and to an extent an engineering team, can apply: peer-reviewed code, code versioning, testing. A lot of departments simply don't have that in their DNA. Analysis, I think, is where there's a lot of contention, a lot of debate that I see.

There are three different models. Is it centralized? Is it completely decentralized? Or do you go hub and spoke? We're going hub and spoke, and that's from a team perspective.

Loris Marini: Tell me more. How does it actually work?

James Edwards: Yeah. What that means is that obviously we provide the data warehouse. Anyone in the business can make use of the data warehouse through the BI tool and that includes data analysts hired within other departments.

Loris Marini: So, finance would have their own analysts?

James Edwards: I mean, finance is always going to have its own analysts, because they have a very specialized need, as does the commercial side. Broadly, that's not an issue. Commercial people tend to be of a similar mindset to data people and engineering people.

Loris Marini: But they access data that's already been cleaned and prepared?

James Edwards: Exactly. Only data engineers and the data team have access to the data lake, the raw data.

Analysts outside the team have two limitations. One is that they can really only access the data warehouse, which is clean and documented with a data dictionary and, from my perspective most importantly, cohesive and singular. There aren't different definitions of what a new customer is or what a lapsed customer is or what a sales figure is. It's all in there once. The other limitation is their sharing rights in the BI tool. Commercial can share more widely, because commercial has done a lot of standardized reporting on sales and these sorts of things.

Most teams' analysts build reports, and they can only share them within their function. That's to avoid the situation, which has happened repeatedly in Pet Circle's history, where one team genuinely needs a metric, let's say lapsed customers. They do some work on it, and of course it's functional for them, but then it gets disseminated across the whole business. That's not necessarily useful if it's distinct from the other ways people view that metric across the business. Data fragmentation is something I'm relentless in pushing back against.

Once we have fragmented data across the business, meaning fragmented information, fragmented metrics and fragmented interpretation of what's going on, it kills our ability to operationalize it. That's what I meant when I said at the start that I'd rather be non-data-driven than data-driven with three different definitions of whether a customer is lapsed or not lapsed or something else. That's the prism through which I view the team organization.
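One way to illustrate the "single definition" principle is to compute customer status in exactly one place, for example a warehouse transformation that every report reads from. The episode does not say how Pet Circle defines a lapsed customer. The 90-day threshold and the status labels below are hypothetical.

```python
from datetime import date

# Hypothetical threshold: the episode does not state Pet Circle's definition.
LAPSED_AFTER_DAYS = 90


def customer_status(order_dates, as_of):
    """The one company-wide definition of customer status.

    Every team's report should call this (or read its output from the
    warehouse) instead of re-deriving "lapsed" from its own data source.
    """
    if not order_dates:
        return "prospect"
    days_since_last = (as_of - max(order_dates)).days
    if days_since_last > LAPSED_AFTER_DAYS:
        return "lapsed"
    if len(order_dates) == 1:
        return "new"
    return "active"


print(customer_status([date(2022, 1, 1)], date(2022, 6, 1)))  # lapsed
```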

As engineers, and as people who value data more than anyone else, we also need to guard against that Gollum-esque "It's my precious" approach. We want to encourage analysts in all departments to do the best job they can. We're looking at ways to enable them to upload their own unique datasets into the system and keep them separate from the rest. It's not official company data, but it is there, so they can match company data against their own private datasets.

Loris Marini: Especially since, if you don't do it, they'll do it anyway.

James Edwards: That's a good point. Yeah. You want to trust that people are doing good things. We don't want to centralize and bog down. We are passionate about democratizing data. I think I went too far at the start and made it too available: everyone in the company had SQL access to the entire database, and that caused a lot of problems. We've had to pull it back, but I don't want to over-correct.

Loris Marini: Yeah. So your approach to governance is that there are specialized functions within the topology of the data function, and they can be distributed; they can sit in different teams. They access clean, standardized, high-quality data for their needs, but they can also pitch in with their own datasets, as long as we know what is gold data and what is bronze data.

James Edwards: Yeah, and that bronze data shouldn't be shared. It shouldn't become a top-level company metric or a company-wide report. The other thing I've seen on a practical basis within a business is that it can be quite challenging for non-data teams to actually hire, retain and engage high-quality data people.
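The gold/bronze rule from this exchange can be written as a small sharing policy. This is a hypothetical sketch, not Pet Circle's BI configuration. It assumes each dataset carries a tier label and an owning team, and that an audience is either `"company"` or the name of a single team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    tier: str        # hypothetical labels: "gold" = curated warehouse data, "bronze" = analyst upload
    owner_team: str  # team that uploaded or owns the dataset


def can_share(dataset: Dataset, audience: str) -> bool:
    """audience is either "company" or the name of a single team."""
    if dataset.tier == "gold":
        return True
    if dataset.tier == "bronze":
        # Bronze data stays inside the team that uploaded it and never
        # becomes a company-level metric or report.
        return audience == dataset.owner_team
    raise ValueError(f"unknown tier: {dataset.tier}")


promos = Dataset("supplier_promos", "bronze", "commercial")
print(can_share(promos, "commercial"), can_share(promos, "company"))  # True False
```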

People always argue, "Oh, I should have more analysts." The fact is, it's very hard to hire good-quality data people, or any data people, at the moment, so it can be a bit of an academic exercise. A lot of our departments at Pet Circle have actually said, "You know what, I need some analysts. How about you go and hire them?" They have a formal reporting line to us, but they're permanently seconded to that department. I mean, I seeded this approach, but I think it's right from what I've seen. Individuals in the data profession want the professional development, the standards, the collaboration with peers. They want a manager and head of function who actually understands data and is looking to progress it, making use of new technologies and new capabilities.

They also want career development, and someone who's going to promote what they've created, so they're actually more attracted to working within a specialized data team.

Loris Marini: I have this picture of an octopus in mind: a brain with distributed sensing systems. You do want to know what's happening in each function and sub-function, because without that knowledge you can't really build a cohesive, holistic picture of how you're going.

Otherwise you have a bunch of reports, but they might be disconnected. You might not have the domain knowledge to understand them, or to sell the ideas in those reports the right way to the right stakeholders.

James Edwards: That last point is absolutely massive. Say you have a wonderful supply-chain data analyst. They can do all this stuff, but they're limited in what they can do. They're limited in how they can get the data engineers to build a better-quality inbound feed, or to take the output of their insights and incorporate it into the company-wide model. They're going to struggle to get the results of supply-chain analysis into the bigger picture: integrated into the machine learning models the data scientists are building, and into the company-wide data warehouse and reporting.

Every department should want its way of working and its way of thinking integrated into this company brain, the central company brain, raising the visibility of those sorts of things, if nothing else.

I think it's obvious you can't, in any practical sense, have that happen when it's fully decentralized. You can't have a centralized data environment and decentralized data workers, because it would fragment.

Loris Marini: It's interesting because this raises a number of questions around accountability as well. If I am embedded in finance and I am responsible for collecting the data, visualizing it, creating reports and telling stories with data, I have a ton of domain knowledge.

I rely on clean data, but when the data's not clean, I have an issue. I can't do my job, or there are issues within the team. How do you escalate that? Do you go to your manager within finance, then all the way up to the CFO, then from the CFO across to the CTO and back down into engineering to fix it? Or should we become a lot more comfortable with loose lines, establishing channels where information can propagate really quickly from the edge of the system all the way to the central hub?

James Edwards: Yeah. It's not happening that much currently, but it has been happening. To be honest, what works best is when the analyst from another department goes directly to an engineer or an analyst in Data and Algorithms, the data team. They tend to just work together, they solve the problem, and off it goes.

What works badly is when it gets escalated, including to me. I'm like, “No, they can't do that. They should've gone through the process,” and I block it. Half the time they actually don't tell me now; they just go and help each other out. It's usually the managers who are the problem; the actual guys doing the work are fantastic.

As for those loose, informal connections: as long as you can keep the fragmentation below a certain threshold, my experience is that the people doing the work, the engineers and the analysts, can actually solve those problems for the most part.

Loris Marini: I want to take a big magnifying lens and really look into the details of those cracks, gaps, and issues you mentioned before. You said something really interesting: there are issues with fragmentation at the data level. So, the raw ingredients that come in: numbers, units of measure, a whole bunch of that stuff.

Then there are issues at the level of information and knowledge, and then at the level of interpretation: what people see when they look at a feature, a column, or a row in a spreadsheet. How do you go about fixing those? Who should own each of those differences? What are some of the stories?

James Edwards: Very briefly: I spoke about how we created a data warehouse. That took a long time. But really promoting the use of that data warehouse required us to implement definitions that are common across the business.

What's a new customer? What's a sale? What's an order? Really basic questions. I should have known this better than anyone, but I was a bit blind to it. I kind of knew there'd be some little bits, but I wildly underestimated how significant the gaps were in understanding what those basic terms meant. What's an order? You would think that's an unambiguous, objective thing, but it's not.

When you try to create a cohesive data warehouse, a set of data that clearly formalizes a single definition, it causes a lot of problems. Engineers just want the answer. We just want to know the answer. The data engineers will go, “Well, who do I email? Who do I ask what an order is?” What happened was that people didn't know. When you looked around the existing reporting and other operations, there were probably half a dozen different definitions.

This really bogged us down. That might sound silly to people, but I said, “All right. What if you cancel an order? What if you cancel before anything else happens?” We have a subscription service, as I mentioned. What happens is that people leave their subscription open, and then they get a ping from the bank in the morning saying their order has recurred, because they've obviously ignored the emails we sent in the lead-up to it. They call up and they're like, “Oh, I don't want this order. It's cancelled. It's fine.” Maybe it didn't even go through on their credit card. It never went to the warehouse and all that. Is that really an order?

Sometimes we have to split an order because of a stock issue. Are those two orders or one order? Do you see what I mean? Broadly it's the same number. We created 15,000 orders in a day, but was it 15,010? Or was it 15,090? That starts to matter, especially when our mission and vision in the data team is to operationalize this data.

Those 80 customers in between might be sent an email saying, “Hey, you've just had your third order. Isn't that awesome? Here's a free gift,” and they reply, “No, I didn't create an order,” or whatever. We went round and round in circles on this for a while. What worked in the end was this: we decided we needed a single owner for every single definition in the business. That's a huge task.

There are a bunch of issues that go along with it, but to start with we had to get buy-in that this was a good idea. There was a lot of concern about it from the executives. It was actually quite hard at times to understand what the concern was, but there was a lot of it.

What we ended up doing was dividing everything up by business domain. A delivery is obviously in the realm of supply chain, so the head of supply chain, the executive in charge of supply chain, became the owner of most of the delivery metrics. How many days did a delivery take? Again, that sounds really simple, but do you include weekends? Do you divide the elapsed hours by 24 and take only positive numbers? Is it calendar days? If an order comes in at 11:00 PM on Monday night and is delivered at 8:00 AM on Tuesday morning, is that one day, two days, or zero days? I don't know.
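To show why the Monday-night question has no single obvious answer, here is a minimal Python sketch of several candidate definitions of "days to deliver". The function names, the 2024 dates (1 January 2024 was a Monday) and the weekday rule are hypothetical illustrations, not Pet Circle's actual metric; each function is one possible definition an owner might choose.

```python
import math
from datetime import datetime, timedelta

# Each function below is one hypothetical candidate definition of "days to deliver".

def full_24h_periods(placed: datetime, delivered: datetime) -> int:
    """Elapsed hours divided by 24, rounded down (partial days don't count)."""
    hours = (delivered - placed).total_seconds() / 3600
    return math.floor(hours / 24)

def started_24h_periods(placed: datetime, delivered: datetime) -> int:
    """Elapsed hours divided by 24, rounded up (any partial day counts)."""
    hours = (delivered - placed).total_seconds() / 3600
    return math.ceil(hours / 24)

def calendar_date_difference(placed: datetime, delivered: datetime) -> int:
    """Difference between the calendar dates, ignoring the time of day."""
    return (delivered.date() - placed.date()).days

def calendar_days_touched(placed: datetime, delivered: datetime) -> int:
    """Number of calendar dates the delivery spans, counting both ends."""
    return calendar_date_difference(placed, delivered) + 1

def business_days(placed: datetime, delivered: datetime) -> int:
    """Weekdays after the placement date, up to and including the delivery date."""
    count = 0
    day = placed.date()
    while day < delivered.date():
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 ... Friday=4, so weekends are skipped
            count += 1
    return count

DEFINITIONS = {
    "full_24h_periods": full_24h_periods,
    "started_24h_periods": started_24h_periods,
    "calendar_date_difference": calendar_date_difference,
    "calendar_days_touched": calendar_days_touched,
    "business_days": business_days,
}

if __name__ == "__main__":
    placed = datetime(2024, 1, 1, 23, 0)    # Monday 11:00 PM
    delivered = datetime(2024, 1, 2, 8, 0)  # Tuesday 8:00 AM
    for name, definition in DEFINITIONS.items():
        print(f"{name}: {definition(placed, delivered)}")
    # Prints 0, 1, 1, 2 and 1 respectively: same delivery, different answers.
```

The same nine-hour delivery comes out as zero, one, or two days depending on the definition, which is exactly why each metric needs a named owner to pick one.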

The other trick was really just communicating, over and over and over again, that owning a definition did not give anyone authority over all the operations that relate to it. Marketing owns most of the definitions relating to customers by default, especially new customers, but that doesn't mean everything to do with the customer falls under the umbrella of the marketing function. Product and operations all have a part to play in that. Reiterating that message again and again really helped.

We had a few sessions, and the executives got really into it after a while. It was actually great, and it's been very helpful for everyone. Some of these terms have finally got definitions after 10 or 11 years, or whatever it is, so we're actually starting to speak a common language across the business. Now what we're doing in the data team is implementing a lot of these definitions, and implementing them as close to the definition as we can. Part of the deal is that if you own a definition, you can change it any time you want. That doesn't automatically trigger an engineer to go and reimplement it.

We will, in good faith, go and reimplement it as soon as we can, but there can be a gap between the formal definition and the implementation. For us it's fantastic, because now there is a single implementation of each definition at any given time. People are talking a common language; it's cohesive.

If one department says this customer's lapsing and someone else says, “No, they're not lapsing,” it's like, well, hang on. There's something wrong here. It's not just a different interpretation. It should be in the database. Let's have a look at who owns that definition. Have we implemented it correctly? Yeah? Okay. Well, then there should be an objective answer to this. That's working quite well.
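The ownership model described here (one owner per definition, a formal definition the owner can change at any time, and a single implementation that may briefly lag behind it) can be sketched as a small registry. This is a minimal Python illustration under stated assumptions: the `Definition` fields, the owner name and the 60-day lapsing rule are all hypothetical, not Pet Circle's actual definitions or tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict

@dataclass
class Definition:
    name: str
    owner: str                # the executive who owns the wording
    text: str                 # the formal, agreed definition
    definition_version: int   # bumped whenever the owner revises the text
    implemented_version: int  # the version the data team has actually built
    implementation: Callable[..., object]

    def implementation_is_current(self) -> bool:
        return self.implemented_version == self.definition_version

REGISTRY: Dict[str, Definition] = {}

def register(defn: Definition) -> None:
    """Exactly one owner (and one implementation) per business term."""
    if defn.name in REGISTRY:
        raise ValueError(f"{defn.name!r} is already owned by {REGISTRY[defn.name].owner}")
    REGISTRY[defn.name] = defn

def revise_definition(name: str, new_text: str) -> None:
    """Owners can change the wording at any time; this does NOT re-implement it."""
    defn = REGISTRY[name]
    defn.text = new_text
    defn.definition_version += 1

def evaluate(name: str, *args: object) -> object:
    """Every department calls the same implementation, so answers cannot diverge."""
    return REGISTRY[name].implementation(*args)

def is_lapsing(last_order: date, today: date) -> bool:
    # Hypothetical rule for illustration only: no order in the last 60 days.
    return (today - last_order) > timedelta(days=60)

register(Definition(
    name="lapsing_customer",
    owner="Head of Marketing",  # hypothetical owner
    text="A customer whose last order was more than 60 days ago.",
    definition_version=1,
    implemented_version=1,
    implementation=is_lapsing,
))
```

With this in place, `evaluate("lapsing_customer", date(2024, 1, 1), date(2024, 3, 15))` gives every department the same `True`. If the owner later calls `revise_definition`, `implementation_is_current()` returns `False` until the data team re-implements the rule and bumps `implemented_version`, which makes the gap between definition and implementation visible rather than silent.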

Loris Marini: I'm interested to know more about the process of realizing that you had different, disconnected interpretations. Evolutionarily speaking, we are machines that, even when we're not thinking about it, are creating models of the world around us to navigate complexity. We build up abstractions. We create a mental model for how this bottle works. I know there's something called gravity, even if I don't know the equation for it. I know how water behaves. I don't have to question it. Is this water? Is this a liquid? Is there gravity? Yeah.

We have intuitions for this stuff. Most of the time they run undetected; they're part of our subconscious. They just help us navigate the world and achieve more with less. How does it work when a team realizes, “Hey, what we thought was obvious is not obvious”? You need to actually ask the trivial questions, because they're not trivial, and get buy-in from the business. I imagine it's a long and intensive process.

James Edwards: Yeah, it's interesting. Okay, I'll pick an example: what's a customer? How many customers do we have right now? How many customers joined us today?

The way the process played out is that we went in to do it and there were a bunch of edge cases. Engineers are constantly annoying people with edge-case questions; I used to be an engineer and I did this to other people. Is it the day they were added to the customer table in the database? Is it the date of their first order?

Well, who do we ask? Okay, I don't know. Marketing and commercial were the go-tos, and we got back different and quite passionate responses. If I can boil it down a little, marketing's perspective was that a customer is a customer: if they come on with an intention to buy and they try to, then they're a customer. Even if they had to cancel their order because we don't have stock or whatever, that's still a customer. Commercial's perspective was that they have to be revenue-generating: “We've got these existing reports that use that implicit definition,” one that no one had ever agreed to across the entire business, “and you're wrecking everything if you go that way.”

From the data side, the first thing we tried to do was mediate. We just wanted an answer. I was like, “Well, I might prefer marketing's definition, because it doesn't change in the future.” Under the commercial definition, if someone places an order today and it's cancelled tomorrow, the summary table we generated the night before is out of date; it's wrong. We've actually lost yesterday's customer today. That's horrible from a database management perspective. That was my interest.
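The stability argument can be made concrete. In this minimal Python sketch (the `Order` type, the function names and the two customers are hypothetical), marketing's definition counts anyone who tried to buy, including cancelled orders, while commercial's definition counts only customers with a non-cancelled, revenue-generating order. Both agree on last night's snapshot, but only marketing's snapshot is still correct after a cancellation the next morning.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class Order:
    customer_id: str
    cancelled: bool = False  # may flip to True after the nightly summary has run

def customers_marketing(orders: List[Order]) -> Set[str]:
    """Marketing's view: anyone who came on intending to buy and tried to.
    Cancelled orders (e.g. out of stock) still count."""
    return {o.customer_id for o in orders}

def customers_commercial(orders: List[Order]) -> Set[str]:
    """Commercial's view: only customers with a revenue-generating order."""
    return {o.customer_id for o in orders if not o.cancelled}

def snapshot_still_valid(definition: Callable[[List[Order]], Set[str]],
                         snapshot: Set[str], orders: List[Order]) -> bool:
    """Does last night's summary still match what the definition says today?"""
    return definition(orders) == snapshot

if __name__ == "__main__":
    orders = [Order("alice"), Order("bob")]
    # Nightly summary tables, built last night under each definition.
    marketing_snapshot = customers_marketing(orders)
    commercial_snapshot = customers_commercial(orders)

    orders[1].cancelled = True  # Bob cancels his order the next morning

    print(snapshot_still_valid(customers_marketing, marketing_snapshot, orders))    # True
    print(snapshot_still_valid(customers_commercial, commercial_snapshot, orders))  # False
```

Neither definition is "right" in the abstract; the sketch only shows why a definition whose answer can change retroactively is harder to maintain in precomputed summary tables.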

It was a six-week process of trying to get the definition right: mediating, looking for common ground, and by the end of it there just wasn't any. They were just different perspectives. Where we ended up was, “Well, okay. Who owns that definition?” We had an all-in meeting with the executive team and people agreed to it. There was a really good attitude around that. We agreed on an owner and we got the definition, but in practice what actually happened is that we were stuck.

We couldn't progress this initiative for, I'd say, six weeks, and then there was another delay while we got the meeting organized and such. I can't recall exactly what it was, but essentially it killed a project. There was a small mini-project going on that required this, and we just couldn't do it because we couldn't get the definition. So it wasn't just some reporting thing; it ended up killing off a business initiative. We want to avoid that, and hence we're going through this process now.

Loris Marini: If you were to give advice to the CEO of a startup, what would you say are the top three things they should really keep in mind and have very clear in their heads when it comes to data analytics?

James Edwards: It depends on the startup and on the nature of the business. If they were in my world, which is particularly retail e-commerce: become a data-driven, data-native company, not just a digital-native company. This is something I'm becoming more and more aware of.

Twenty-five years ago, digital-native companies started, and we have relentlessly risen and taken business from bricks-and-mortar-native businesses. Very few of them have made the transition; very few of them have adapted. Obviously, there's still a need for bricks-and-mortar businesses, and I don't think they're going away anytime soon. But by being digital native, by putting engineering processes and efficiencies like that at the heart of the business, we've fueled this relentless growth, and now that's becoming commoditized, I would say. So don't confuse being digital native with being data native; most startups are digital native.

If I were to start a new business today, I'd be data native from the very beginning. Put the database at the center of your organization, not the engineering. Engineering is incredibly important and I would never dismiss it, but start with data. Centralize your database, and find ways to evolve it along the path of the business.

There used to be two of us, then there were 10, 30, 100, and now there are many hundreds of employees at Pet Circle; 500 or 600, I think, was the last number I heard. Find ways to disseminate data and genuinely empower everyone in the organization to make use of it in a way that's appropriate for them. That starts with having a BI tool or a SQL tool. If you're cash-constrained when starting a business, BI tools aren't always cheap, but the lesson I've learned is to get the definitions clear.

Look at the transformations as much as you can. That's easy advice to give; I remember that at the beginning of a startup you don't have a lot of spare time. But if you can do those transformations to create an order table, a customer table, a product table, or whatever's relevant to your business, always work off those, and then update the definitions underlying them, it's going to make the business more cohesive, especially as it grows past 30 employees.
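James's advice to build shared tables such as a customer table, always work off them, and update the definition underneath can be sketched with Python's built-in `sqlite3` module. The `raw_orders` schema, the `customers` view and both definitions are hypothetical; the point of the design is that reports only ever query one shared view, so revising the definition changes every report at once instead of leaving half a dozen competing versions.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for raw transactional data (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE raw_orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id TEXT NOT NULL,
            status      TEXT NOT NULL  -- 'placed', 'dispatched' or 'cancelled'
        );
        INSERT INTO raw_orders (customer_id, status) VALUES
            ('alice', 'dispatched'),
            ('bob',   'cancelled'),
            ('carol', 'placed');
    """)
    return conn

def define_customers(conn: sqlite3.Connection, where_clause: str) -> None:
    """(Re)build the one shared `customers` view that every report reads from.

    `where_clause` is assumed to be a trusted, reviewed SQL fragment supplied
    by the data team, never user input, since it is interpolated directly.
    """
    conn.executescript(f"""
        DROP VIEW IF EXISTS customers;
        CREATE VIEW customers AS
            SELECT DISTINCT customer_id FROM raw_orders WHERE {where_clause};
    """)

def customer_count(conn: sqlite3.Connection) -> int:
    """A 'report': it only knows about the view, never the raw table."""
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

if __name__ == "__main__":
    conn = build_demo_db()
    define_customers(conn, "1 = 1")                  # anyone who placed an order
    print(customer_count(conn))                      # 3
    define_customers(conn, "status != 'cancelled'")  # revised definition
    print(customer_count(conn))                      # 2
```

A view is used here only for brevity; a materialized table rebuilt on a schedule would follow the same pattern of one definition, one place to change it.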

Loris Marini: I love the distinction you made between being digital and being data-driven, data-informed, or data-centric. It really speaks to what the business is all about. We are here to learn and to adapt to the changes that happen inside and outside the organization, and to react to those changes. At the first level we react, but ideally we want to be a lot more strategic: not just react, but plan.

James Edwards: Absolutely. Anticipate things. I was guilty of it: I conflated engineering and data for so long, and they're just such different things. You talk about strategic; I think I said this to you the other day, everyone wants to be strategic. Data is the path that can make everyone strategic, because you come and talk to us about your goals, your strategy. That's how a CEO talks, right? They talk about goals. If you come to us and talk about the rules you want to apply, that's not what the CEO does. You're not being strategic. That's an engineering type of approach: “We want you to apply this if statement and that while loop,” and so forth. Data takes that out of your hands.

You talk to us about goals, and we'll set up the data structures that allow the machine to interpret them correctly and make the right decisions. And we have to get comfortable with taking ourselves, as humans, out of that process, because computers can do it better in so many different use cases now.

Loris Marini: Yeah, it's funny how ego creeps in from every angle. No matter how you look at it, engineering teams pride themselves on their technical abilities. It's a tribe. There's a certain culture within it, and a certain lingo that people from the outside don't really understand.

Sometimes even other engineers can't understand a particular subset of engineers. It happened to me: jumping into some Slack channels, you're like, “What the hell is going on here?” They have custom emojis and custom acronyms whose meaning only insiders really know.

There are so many invisible barriers within each team. It's true for engineering; it's true for any team. But think about developing a culture in the organization where people feel empowered to use data and it's treated as an asset, where they know there are incentives in place, so that if you find an anomaly, a nonconformity, a gap, or a mismatch, you should report it. You should know how to do it, and you should be rewarded for it.

That's part of your job. If you pause what you're doing and spend 15 minutes or half an hour submitting a ticket or a log, it doesn't mean you're not doing your job. If anything, you're doing it. Not only that, you're making sure that tomorrow's version of yourself, and everybody else who will need that piece of data, will find something that's usable and fit for purpose, as opposed to nobody knowing what's going on. So there's a change in perspective and mentality that we need to make to get around that.

I know this is hard, because we left uni a bunch of years ago, and even more so for you, but what would you suggest to someone who is clearly passionate about data and information? They want to develop a career, and maybe they have a degree in data science. What skills are these people lacking that prevent them from having an impact in the business?

James Edwards: It's a classic, as with engineering and pretty much across the board: data is just a tool. It's a way of approaching a problem. You have to immerse yourself in the domain to be taken seriously by the rest of the business, if you like.

I would never say that the business is just sales and marketing and such. Engineering and data are an equal part of the business, but you have to be practical about it. You have to be prepared to go out to the warehouse. If you're working on a supply chain issue, you might have to go and work with the delivery drivers. If it's another sort of issue, you have to really get your head around digital marketing, non-digital marketing, and so on.

You can't just isolate yourself and focus entirely on the data and the techniques. If you're having a problem, the solution isn't doubling down by trying a different type of database, or a random forest rather than a decision tree, or something like that. It's to go and talk, immerse yourself in the business, and spend the time. It's worth it. It's actually fun, once you get over that hurdle, to really learn about the domain you're analyzing.

Loris Marini: Yeah. If you're listening to the stream right now, I hope you haven't noticed all the cuts; we had so many technical issues today. So, James, I want to wrap this up, but I want to thank you for your time and for your patience in sticking with me for almost two hours to record this. I'm so sorry for the technical issues, but through the magic of editing we will get rid of them. I just want to say what an honor it has been to have this conversation with you on Discovering Data. So, thank you again.
