Taxonomy is the practice and science of categorization or classification, and it’s what we need if we want to be able to search for data.
If you are a data engineer, you probably feel that turning data into knowledge is super hard, no matter the architectural paradigm you use. Data sprawl is a nightmare. It erodes trust and adoption, and it makes it really hard to improve on existing data products, so we are constantly reinventing the wheel and going nowhere.
A big part of this problem is search. We can’t improve or build on top of existing data if we can’t find it. Taxonomies are what enable search and information retrieval, but when was the last time you heard someone talk about them? Today we are going to look at exactly what they are, how to do them well, and how to communicate their value to the business.
My guest today is Helen Lippel. Helen is a taxonomy consultant with over 15 years of experience in the field. Helen just wrote a new book titled “Taxonomies: Practical Approaches to Developing and Managing Vocabularies for Digital Information”, published by Facet.
You can follow Helen on LinkedIn.
'Taxonomies' Book Giveaway
We are giving away 10 copies of Helen's new book: "Taxonomies: Practical Approaches to Developing and Managing Vocabularies for Digital Information". If you want to start cleaning up your data mess but don't know where to start, this is the book for you. Want your copy? You have until Dec 17th to be part of this. Visit https://www.discoveringdata.com/taxonomies to win your copy!
A word on the publisher
Facet is the global publisher of books for library, information and heritage professionals, and the publishing arm of CILIP: The Chartered Institute for Library and Information Professionals. Facet publishes a range of titles for practitioners, researchers and students authored by some of the leading minds in the field, and has a commitment to producing quality content that advances the information disciplines.
If you’re interested in metadata, taxonomies, or cataloging, then Facet is where to look. All Facet titles are available in print and digitally through all major online booksellers and on the Facet website.
Plus we have a special offer for you – get 25% off the RRP for Taxonomies when you use the code TAXONOMY25. Just go to: www.facetpublishing.co.uk/taxonomies.
Customers in the US and Canada can purchase the book through the American Library Association, and customers in Australia or New Zealand can purchase the book through Taylor & Francis.
Join the Discovering Data community!
Do you want to turn data into business outcomes and get promoted? Discovering Data just launched a new Discord server to connect you with people like you. Discover new ideas, frameworks, jobs and strategies to maximise the impact of your work. Data can be a lonely and challenging career, don’t do it alone!
Request access now: https://bit.ly/discovering-data-discord
Do you want to showcase your thought leadership with great content and build trust with a global audience of data leaders? We publish conversations with industry leaders to help practitioners create more business outcomes. Explore all the ways to tell your data story here https://www.discoveringdata.com/brands.
Want to help educate the next generation of data leaders? As a sponsor, you get to hang out with the very best in the industry. Want to see if you are a match? Apply now: https://www.discoveringdata.com/sponsors
Do you enjoy educating an audience? Do you want to help data leaders build indispensable data products? That's awesome! Great episodes start with a clear transformation. Pitch your idea at https://www.discoveringdata.com/guest.
💬 Feedback, ideas, and reviews
Want to help me steer the direction of this show? Want to see this show grow? Get in touch privately or leave me a review with one of the forms at discoveringdata.com/review.
Your ideas help us create useful and relevant content. Send a private message or rate the show on Apple Podcasts or Spotify!
**Loris Marini:** As a data engineer, you probably feel that turning data into knowledge is super hard. No matter the architectural paradigm you use, data sprawl is a nightmare. It erodes trust and adoption, and it makes it really hard in general to improve an existing data product.
So we are constantly reinventing the wheel and going nowhere. And a big part of the problem, I think, is search, because we can't improve or build on top of something that already exists if we can't find it in the first place. Taxonomies are what enable search and information retrieval. But when was the last time you heard about taxonomies?
I certainly hadn't. So today I wanna talk about taxonomies: what they are, how to do them well, and how to communicate their value to the business. My guest is Helen Lippel. Helen is a taxonomy consultant with over 15 years of experience in the field. She recently authored the book "Taxonomies: Practical Approaches to Developing and Managing Vocabularies for Digital Information".
She loves sorting out messy problems with digital content and data, and harnessing the power of semantics to help organizations make the most of what they create, whether they're trying to make money, work efficiently, or create knowledge. Her clients have included Electronic Arts, Pearson, the BBC, the Department for International Trade, the Financial Times, Phillips, and the Metropolitan Police.
She has been the program chair of the Taxonomy Bootcamp London since its inception in 2016, and she contributes articles to the search network and speaks and writes regularly for taxonomy practitioners. So I'm super excited to be here. Helen, thank you for being with me on the podcast. Thanks for taking the time.
**Helen Lippel:** Oh, thank you. No, really great to be here and to be able to discuss this great topic with your listeners and maybe get them thinking about things in different ways.
**Loris Marini:** I'm so excited about this episode because we don't talk enough about the topic, and I think it's fundamental. So let's lay the foundations for this chat. What is taxonomy from your perspective, and why does it matter so much?
**Helen Lippel:** Um, well, there's the kind of dictionary definition that you'll find on Wikipedia, which is a hierarchical structure of words and phrases. So it's an idea that came out of the library science world. I tend to characterize it as a knowledge organization system. So it gives you a set of agreed definitions and labels for words and concepts, and puts them in a structure that makes logical sense to the users of that information system or to the wider organization's needs. That's the headline, but I'm probably not doing it justice, because it's something that's incredibly powerful and has been increasingly prominent in the tech sector since I started out in my career.
**Loris Marini:** Yeah. So let's talk about that for a second. How did you start in the first place? How did you find yourself being a taxonomist, and did you know at the time that you were a taxonomist?
**Helen Lippel:** No, a lot of people don't. Yeah, I fell into it accidentally, but once I got into it, I realized this was absolutely what I was born to do. Because I'd done economics as my degree, I started out with a financial newspaper here in the UK, on their business and economic news.
They were aggregating sources from all over the world and adding tags to them. And this was at a time when people were only just starting to develop information services over the internet and selling them to people. I started out tagging news articles that were coming in from all over the world. You might get something about the steel industry in Malaysia, or you might get political news from the UK, or unemployment stats from America, and it all just comes into a massive bucket.
And in order to make sense of it and to package it up and sell it to people, you have to add that context. And that's where the tagging comes in. And you don't just type in a few words to describe the article; you're picking from predesigned taxonomies for individual things. So for example, we had a geography taxonomy that included all the countries, because if you are interested in United States of America economic news, you want stuff tagged with United States of America and economic news, unemployment figures, whatever it might be.
So I started by seeing all this kind of unstructured text, essentially, coming in, and realizing the power of the context and the information, the value that we were adding to it through this tagging. And I just thought, this is way cool, and I've never looked back really. But it's very natural for me to organize information.
It's very natural for humans to organize information, actually. It's a very primal thing. You need to know if this mushroom is gonna kill you or feed your family. So that is a categorization. And obviously, as we've evolved from roaming the savanna and we generate information, especially digital information now, being able to give it context and meaning so it can be used in different ways is, I think, really quite a fundamental process.
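The tagging Helen describes, picking terms from predesigned taxonomies rather than typing free text, could be sketched roughly as follows. This is a minimal Python illustration, not the system she worked on: the vocabularies, the `Article` class, and the `tag` and `find` helpers are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabularies, one per facet.
# The terms are illustrative only, not taken from a real system.
VOCABULARIES = {
    "geography": {"United States of America", "United Kingdom", "Malaysia"},
    "topic": {"Economic news", "Unemployment figures", "Steel industry", "Political news"},
}


@dataclass
class Article:
    headline: str
    tags: dict = field(default_factory=dict)  # facet -> set of terms


def tag(article, facet, term):
    """Attach a term to an article, rejecting anything outside the vocabulary."""
    if facet not in VOCABULARIES:
        raise ValueError(f"unknown facet: {facet!r}")
    if term not in VOCABULARIES[facet]:
        raise ValueError(f"{term!r} is not in the {facet} vocabulary")
    article.tags.setdefault(facet, set()).add(term)


def find(articles, **criteria):
    """Return the articles that carry every requested facet/term pair."""
    return [
        a for a in articles
        if all(term in a.tags.get(facet, set()) for facet, term in criteria.items())
    ]


us_jobs = Article("US unemployment falls")
tag(us_jobs, "geography", "United States of America")
tag(us_jobs, "topic", "Unemployment figures")

steel = Article("Malaysian steel output rises")
tag(steel, "geography", "Malaysia")
tag(steel, "topic", "Steel industry")

matches = find([us_jobs, steel], geography="United States of America")
print([a.headline for a in matches])
```

Because every tag must come from a vocabulary, two taggers cannot describe the same country in two different ways, which is what makes the later retrieval step reliable.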
**Loris Marini:** I just love the story of the mushroom and the savanna. A bunch of images popped up in my mind, of many examples of everyday categorization. It's so intuitive, yet when you get a bunch of engineers in a room trying to build data products, for some reason that intuition does not turn into work that makes assets easy to find.
**Helen Lippel:** Exactly.
**Loris Marini:** That's what fascinates me, right? What, how...
**Helen Lippel:** Yeah. Sometimes people in business, and that's on the tech and the business side, just don't necessarily see it because they haven't necessarily been exposed to it. So they might be creating a new app or a data-driven product, but actually organizing that information so it has value for the end users needs to be better understood as something that can be done by people who live and breathe this thing, and who can really take a very basic front end of a database and make it something that's really valuable, searchable, and reusable for different kinds of audiences.
**Loris Marini:** Right. So this extends beyond data products, right? Categorization, as you said, can be applied to anything, even to the real world. Take a warehouse, IKEA: how many times did you walk into an IKEA warehouse and find things that are divided per aisle, per level, and you've got a code, and everything makes sense and you can find stuff easily? So there's a user experience component. There is an architectural component. Everything has to work seamlessly together so that when you go to IKEA and you wanna buy things and you have the product code, it doesn't take you five hours to find it,
**Helen Lippel:** Yep.
**Loris Marini:** even when maybe one product is split into three or four parts and you have to combine them together. So yeah, absolutely, the intuition is there. I think the business value, when you move from the physical world into the digital world, can be lost. So I wanted to dive in, if it's okay with you, for a few minutes, to understand really what the business impact of a taxonomy is. Do we talk about a taxonomy? Do we talk about taxonomies? And how do we communicate this message to a business leader?
**Helen Lippel:** Yeah, no, great set of questions there. I think it's okay to talk about a taxonomy or multiple taxonomies, because it depends, and most of the projects that I would do would probably have multiple taxonomies doing slightly different things or covering slightly different facets of information. That's unless your data or your content is completely homogeneous. If you have a physical library full of books and periodicals, you probably only need one taxonomy that covers all the range of topics, and that's fine.
But when we get into the digital world, which is what I've spent a hundred percent of my career on, you are dealing with text, with images, with video and audio, with physical products, especially in the marketing space. So you are dealing with heterogeneous inputs and outputs. And what I really enjoy, if I go in somewhere as a consultant, is designing what that ecosystem of taxonomies will look like.
And sometimes people just say, "Oh, we just want one great big taxonomy." And actually what they need is smaller, focused ones that they're actually gonna be able to manage. So that kind of answers one of your questions, I think. But I think the other part of what you wanted was the business value, how you start communicating that.
**Loris Marini:** Sorry if I... because the business, practical value. When I think about my real world, what surrounds me, I've got objects, I've got pants, I've got papers, I've got clothes that I wear, I have toys for my kid, I have a car, I have documents about everything. To me, knowing where things are and being able to collect them together by purpose... So if I am reviewing all the insurances we have, I want to be able to very easily go somewhere where all the documentation around insurance is, so I don't have to spend five hours...
**Helen Lippel:** Yeah. So can I give you some example applications? Because sometimes people struggle to get their head round it, just because it is so powerful and can fit in anywhere, and it's not like saying, "Oh, I've got a nail, and I'm gonna bring you a hammer and bosh." So I think one of the places where people might be familiar with it is e-commerce.
E-commerce would not work without proper metadata and categorization in order to enable search filtering. If you are looking for shoes, you wanna know: are they designed for men or women, the sizes, the fit, the styles, the colors. And someone at some point has had to think about all of that possible information and decide which are the most relevant ones to put up on your website or your app.
That's just one example. I also worked on a restaurant recommendations app where we had a whole bunch of raw data that was coming in from different data brokers and from restaurants themselves. And I worked with the product team to pick out which were the most important things and then to make connections between them so that we could do the recommendations.
The aim was to make it not just really practical, so that if I want a Chinese restaurant in Soho in central London, then bosh, there I go, but to be able to search by budget: is it a really cheap and cheerful place, or is this somewhere you would take people for a family occasion? And if we know you like Chinese restaurants, can we recommend you some Thai fusion or Vietnamese in the same street that you might wanna try?
Because people often want to explore. Yeah, exactly. And so I was working on various taxonomies and different kinds of knowledge models so that, for the end user, all they see is they type in a couple of search words and they get what they want, and then a whole bunch of other stuff that they might not have thought about, or "here's this really new interesting place that you've never seen before".
So it's helping that kind of exploration. So that's the kind of e-commerce thing, where it's really about saying: do some proper modeling of your knowledge and your data and categorization, and you will get results. Your product will be seen as authoritative and interesting.
And people will keep coming back. They'll spend money. So in a sense, those can be the easier kind of projects to derive the value from, because you're saying: put this much in, and we'll try and find metrics for what success looks like with your product once we've done all the hard work to organize the data and present it in a way that makes sense.
**Loris Marini:** I love it. I love this because, as you speak about the business impact and examples of what you actually do, e-commerce being one, I see almost one-to-one parallels with the world of data management, as in the discipline of ensuring that data is trusted, is unambiguous, and is something you can rely on, right?
Instead of having millions of copies of the same thing, we wanna have one thing. Instead of having 10 or 15 different interpretations of a term, we wanna have one term that ideally people within the organization agree on. And when they see it, they have a similar mental picture, hopefully, of what that term actually means from a business operational perspective.
**Helen Lippel:** Yeah, sorry, I didn't mean to interrupt. And yeah, sometimes you have to come to agreed definitions, but that doesn't mean you can't map synonyms into the taxonomy. That's one of the really powerful things, and that actually leads on to the search kind of use case. I've also done quite a lot of work in the UK public sector, which generates a huge amount of really important information, which is always changing: policies are changing, benefits information is changing.
Governments and political parties are changing. Those projects are always super interesting for me because you really get a huge bucket of stuff to try and sort out, essentially. And that's where synonyms can come in really handy. So I worked on a UK central government web portal where people don't know the official name for a benefit or a scheme or a particular policy.
They'll just type in a couple of random words that have stuck in their head. And your job as someone working on that digital team is: can you guide them to the exact thing that they actually need, even if they've got no idea what it should be called? And why should they? People shouldn't have to be politics wonks in order to find out how to do stuff and interact with government.
So it's about producing all those kinds of synonyms and relationships between stuff. So there are a few overlaps with SEO, but it's not quite the same.
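The synonym mapping Helen describes, guiding people from whatever words they type to the officially named thing, might look something like the sketch below. It is a simplified Python illustration under stated assumptions: the preferred terms and synonyms are invented for the example (they are not real scheme names), `build_index` and `resolve` are hypothetical helpers, and matching is exact after normalisation, whereas a production search system would do much more.

```python
# Hypothetical vocabulary: each preferred (official) term maps to the
# informal phrases people might type instead. All labels are invented.
VOCABULARY = {
    "Childcare support scheme": ["help with nursery costs", "childcare money"],
    "Household waste collection": ["bin day", "rubbish pickup"],
}


def normalise(text):
    """Lower-case and collapse whitespace so lookups ignore formatting."""
    return " ".join(text.lower().split())


def build_index(vocabulary):
    """Map every label, preferred or synonym, to its preferred term."""
    index = {}
    for preferred, synonyms in vocabulary.items():
        for label in [preferred, *synonyms]:
            index[normalise(label)] = preferred
    return index


def resolve(query, index):
    """Return the preferred term for a query, or None if nothing matches."""
    return index.get(normalise(query))


INDEX = build_index(VOCABULARY)
print(resolve("  Help with NURSERY costs ", INDEX))
print(resolve("Bin day", INDEX))
```

The design point is that the organization still agrees on one preferred label per concept, while the synonyms absorb the variety in how people actually talk.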
**Loris Marini:** Again, I see parallels with what happens in data teams, particularly in organizations that are trying to experiment with self-serve capabilities, platforms that allow self-serve. A lot of people consume data and might make changes to it. We have increasingly capable front-end applications, dashboards and visualization tools, but also reverse ETL and technologies that take business logic that has been implemented at the very edges of the architecture, where the data is consumed and visualized, which is at the end of the chain, and bring it back. Sometimes one would want to bring it back into the source of truth.
That cycle is often as chaotic as what you just described: many people with different interpretations doing things on the fly. And yet we... mess. So is it right for me, as a person that doesn't know much about taxonomies, to think that taxonomies are what enables this mess to be manageable, so that whoever uses things knows where things are and how they're related to other things, and they're a little bit more aware when they create last-minute... Or am I...
**Helen Lippel:** Yeah, no. I think you are talking about both sides of it, really. You're talking about end users, who might just be the general public who need to know when a particular bin collection happens with their local council. So they need to be able to get that information really quickly.
But I think what you're also alluding to is people inside the organization who might need the most up-to-date version of a policy or a particular document. So taxonomies are really helpful in managing that internal mess of content that everywhere has. And it's not unique to the public sector.
The private sector generates lots of stuff that's all in a big mess. And taxonomies are not the only tool. I think they're fantastic, obviously, but it's one way of starting to get stuff into ways that people can use it, so it's not lost, because every organization invests money in creating content, information, data, whatever it might be.
Some of that will be wasted because it's not retrievable, or people don't know it's there, or it's not turning up in the right place for the right users.
**Loris Marini:** I wonder if there are applications in regulatory compliance as well, like ensuring that you have an idea of what's happening in your data ecosystem: which fields hold proprietary information, which fields should be hashed. I have a feeling that there's some connection, right?
So the overall business value is that you can't manage what you can't measure and can't see. We talk a lot about dark data, dark data being things that are there, wasting space or using space in a database, but you have no idea they're there because you can't find them. You can't search for them. A lot of that has to do with this sprawling effect of many engineers, people naturally moving from one role to a different role or leaving the organization, and features being built and just left there. They're hanging in this digital multiverse where we don't really know whether they are still up to date, whether they're... It's just crazy to think...
**Helen Lippel:** Yeah, it makes my head spin sometimes, but it keeps me in gainful employment. Are you aware of the open data movement?
**Loris Marini:** Yes.
**Helen Lippel:** Yeah, cool. Because the public sector in particular, to go back to that use case, generates all this stuff, and open data was very much about getting that out of the silos, getting it out from behind people's desks and into ways that people could use it.
I think open data really benefits from having context and metadata and provenance information, so that you know where this has come from and you can judge whether it's gonna be useful to you or not, or how often it's updated. And all of this can be helped with classification schemes, coding schemes, metadata.
And this is where you start to see a lot of overlap between the kind of pure taxonomies that I do and data schemas and data models, which are probably pretty familiar to some of your listeners.
**Loris Marini:** Yes. Recently on LinkedIn there has been a bit of a buzz around the semantic layer,
**Helen Lippel:** Yeah.
**Loris Marini:** the data layer, the semantic layer. I wanted to dive into this a little bit with you. Where do you see semantics fitting into the world of taxonomies, and into the problem specifically of sprawling data features and tables all over the place, not connected, not searchable?
**Helen Lippel:** Well, essentially, with taxonomies you could get into a really massive deep dive philosophically, but we don't necessarily need to do that. Taxonomies are trying to create meaning and to make it visible and understood and agreed. On lots of the projects I work on, half of the work is just talking to people, understanding their mental models and saying, "Oh, so you call it X, they call it this, and the people over here use that word for something completely different. So how do we reconcile that?" Or you just get lots of differences of terminology. And that's true of any sector that I've worked in, media, government or retail. There are always different understandings.
So taxonomies are trying to flush that out and to get that down on a page, agreed, so that people can use it. And it doesn't mean you can't still have local definitions and local ways of doing things. But now you have an asset in the business which is hopefully being maintained and governed properly, like a data dictionary, essentially,
saying this thing has this definition and these relationships to other things. So essentially that is semantic, and taxonomies and knowledge organization systems have been part of that whole push for semantics for a long time. I know Tim Berners-Lee has had that vision of the Semantic Web forever, it seems, and it was stuck in academia for a long time, and it was always just on the cusp of being a thing and breaking out into the mainstream.
And I think we are finally getting there. So are you aware of the concept of knowledge graphs?
**Loris Marini:** Yes. Often confused with knowledge graph databases, which are not the same thing, obviously. One is the technology, one is...
**Helen Lippel:** Yeah.
**Loris Marini:** So, I'll give you my interpretation of a knowledge graph, and tell me if it's correct, because it could be completely wrong. A knowledge graph, the way that I understood it, is a way of representing knowledge as an actual graph made of nodes and edges. The nodes are the entities; the edges are the relationships between entities, or the connections. And so it allows you to quickly navigate and find connections, for lack of a better word, between one entity and the other. For example, if you are a guest, you are Helen Lippel, and I know you work in taxonomy, then taxonomy is in common between you and five other guests. And so with the knowledge graph, I can easily follow that connection, that relationship, something that with a relational system, like a relational database, would take me a bunch of queries and a lot of work to find, to walk the graph.
**Helen Lippel:** Yeah, absolutely. I think that's a pretty good summary. People have long, endless arguments about what a knowledge graph is, but I think you just have to try and nail down the basics of what it's trying to do. And that's exactly it. You have entities, and they are related to things, and you know what the nature of the relationship is.
Because in a taxonomy, the relationships are relatively limited. You will have a parent term of something, a child term, a related term or a synonym, and essentially that's it, which is fine for a lot of applications. But then there are also these things called ontologies, which have those kinds of named relationships, so that you can say that Helen Lippel works in taxonomies, or Helen Lippel lives in London, and you've got a lot more flexibility in defining what those named relationships are.
And that's where you build up all your lovely nodes and edges that can be really easily queried. Think about LinkedIn: that's got a big knowledge graph, or probably multiple knowledge graphs, behind it. Airbnb are using knowledge graphs a lot, because obviously they have a lot of rich data, and they want to improve recommendations and those things that we were discussing earlier, search quality. Or IMDb:
you wanna see, for this person that was in that show you liked, what else they've done and who else they co-star with. So yeah, it's exactly like you're saying. You can traverse the graph to find out useful information, and taxonomy is one component of it.
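To make the distinction in this exchange concrete, here is a small Python sketch of a graph stored as subject, predicate, object triples. It is an illustration only, not how LinkedIn, Airbnb or IMDb implement anything: the `Graph` class, the `peers` helper, the predicate names (`broader`, `related`, `works_in`, `lives_in`) and the entity "Guest B" are all assumptions made up for the example. The taxonomy-style edges use a small fixed set of relationship types, while the ontology-style edges carry whatever named relationship the model needs.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.edges = defaultdict(list)  # subject -> [(predicate, object), ...]

    def add(self, subject, predicate, obj):
        self.edges[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """Everything `subject` points to via `predicate`."""
        return [o for p, o in self.edges.get(subject, []) if p == predicate]

    def subjects(self, predicate, obj):
        """Everything that points to `obj` via `predicate`."""
        return [s for s, pairs in self.edges.items()
                for p, o in pairs if p == predicate and o == obj]


def peers(graph, person, predicate):
    """Other subjects sharing at least one `predicate` target with `person`."""
    found = []
    for target in graph.objects(person, predicate):
        for other in graph.subjects(predicate, target):
            if other != person and other not in found:
                found.append(other)
    return found


g = Graph()
# Taxonomy-style edges: only a few generic relationship types.
g.add("Capacitor microphones", "broader", "Microphones")
g.add("Taxonomies", "related", "Ontologies")
# Ontology-style edges: the relationship itself is named.
g.add("Helen Lippel", "works_in", "Taxonomies")
g.add("Helen Lippel", "lives_in", "London")
g.add("Guest B", "works_in", "Taxonomies")  # hypothetical second guest

print(g.objects("Helen Lippel", "lives_in"))
print(peers(g, "Helen Lippel", "works_in"))
```

Walking from one guest to others who share a field is a two-hop traversal here, which is the kind of question Loris describes as awkward to answer with a chain of relational queries.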
**Loris Marini:** It's getting meta, right? Really quickly, let's get meta,
**Helen Lippel:** Why not?
**Loris Marini:** the taxonomy and the ontology of taxonomies and ontologies. So a taxonomy is a hierarchical classification. So an example is, I don't know, microphones. Within microphones we have capacitor microphones; within capacitor microphones we have a particular brand at a particular price point. That would be a hierarchy. So if you picked up this thing that's in front of me, that thing belongs to capacitor-based microphones, which belongs to the category of microphones. It's very linear.
So the equivalent in everyday experience is the folder and the subfolder, right? We have tax invoices of this financial year for that particular quarter and so on. With the ontology, you did something fantastic there, because you said ontologies allow you to jump in another direction, whereas taxonomies are vertical and they go one way or the other, but along a line. And ontology is what allows you to escape the...
**Helen Lippel:** Um, sort of.
**Detached audio:** right
**Helen Lippel:** Again, I'm not too dogmatic about terminology, but in taxonomies you can also have the idea of a related term, so that you can say Helen Lippel and taxonomies are two concepts that are related. But what you don't have is any idea of how they're related. They're just two strings, essentially, that you've drawn a line between.
That is fine for a lot of purposes, but like we were saying with the ontology example, that's where you can add the appropriate name to that relationship, and instantly you've got a next level of semantic meaning.
**Loris Marini:** In that example, if I have to contextualize this particular episode, Helen Lippel is a guest of the Discovering Data podcast. In that relationship, Helen Lippel is one node in the graph, the Discovering Data podcast is another node, and you being a guest is the link between you as an entity and the Discovering Data podcast as another.
**Helen Lippel:** Yes, exactly. And if I was modeling my life, I could say Helen Lippel is a guest on this podcast or that podcast. And we would know that the two podcasts were related because they were both podcasts. They might be in their own separate controlled vocabulary of podcast names. But I could say that I was a guest on this podcast, or I spoke at this conference.
That's essentially some of what LinkedIn are trying to do, because obviously people are inputting all this wonderful data all day long. Again, it comes down to a big bucket, and how can you derive value from that?
**Loris Marini:** Yeah, awesome. I'm actually... I want to dive now into this and then really explore it with you, because it's nice that we keep drawing parallels between the front end, the application, and the back end, the data. How does a taxonomy project typically start?
**Helen Lippel:** I guess the glib answer is when someone realizes something is broken. But normally what will happen is some project or idea will start being discussed in the organization, whether it's launching a new app or a new portal, or doing a migration to a new content management system or digital asset management.
And at some point somebody will realize, whether that's top down or someone on the project team, that there's something missing in the middle, that there's a lack of context or meaning, or we just can't find anything. If the organization has an in-house taxonomist, they would probably try and get onto the project and say, "What you need is vocabularies and a metadata schema," and try and do something.
So you'll have that kind of initial scoping phase, which is really important, so that, like with any data project, you don't try and boil the ocean and make it far too complicated. But you also want to make something appropriate and useful. If it's a system that's gonna have lots of documents in it, say like a government website, you don't just rely on a free-text search for people to be able to find stuff. You need some context in there and some tagging, some way of presenting that to people to be useful.
So I would say the scoping is the really critical part, but that's true of any technology project. A lot of tech projects go wrong because the scoping is wrong.
**Loris Marini:** I just love that. As you mention these names and describe it, I love the fact that taxonomists touch a lot of different people within the organization. I'd like to map the stakeholder map of a taxonomy project.
**Helen Lippel:** Yeah. And that's one of the bits of the job that I enjoy. Yeah, I like playing around with information and language and modeling stuff. But actually, on a big project, I'll be dealing with business analysts, product managers, project managers, technical architects, developers,
**Detached audio:** UX
**Helen Lippel:** managers.
Yeah, UX designers. Because it's such a critical component, if people are thinking about it early enough in the project, then it's got a much higher chance of success. Sometimes stuff will get built and then it'll be, "Oh, maybe we should slap a taxonomy in here." And that's not always an optimal approach.
It's much better to design all the different components holistically. There's no point in investing in a really high-end database system or a knowledge graph database system if you're not gonna do that sort of information layer, that hard work of: what are the models, what are the schemas, what are the concepts we're interested in, how are we gonna get this adopted?
Who's gonna maintain it? It's really fascinating. It's not just a case of I sit in a room for two weeks and play and then hand something over and we just plug it in like a SIM card.
**Loris Marini:** The more you talk, the more I see data management and the problems we have in data management. It's like, come on, why is this stuff not being taught in data management 101? Before you even try to think of naming yourself a data engineer, responsible for modeling data and serving data features across the organization at scale, reliably, you should know about this.
So in a way I'm getting excited, because my intuition two years ago was that data is a multidisciplinary exercise, and that no matter who you are, whether you call yourself a data scientist, a data steward, or a data engineer, that title doesn't narrow the sort of problems you should worry about to do a good job. You have to open up and look at a bigger picture. And so we're mentioning UX professionals and architects, worrying about the end-to-end flow of things, who's going to use it, and what kind of experience they're gonna get.
**Helen Lippel:** Even if you are a UX designer and you don't really touch data at all, if you are designing the search interface page for an e-commerce site and you have an awareness of the taxonomist's work on your project, you know which filters you might wanna put up, how you wanna present them, how you want the interaction to work, and where you're gonna use different kinds of color and graphic devices to make things more usable. And the more people have an appreciation of any of that, the easier it makes my life, because you're kicking at an open door.
And I've worked with fantastic people all across different disciplines, and there's a few developers that understand semantics and taxonomies and you just have a really great interaction because you're talking the same language essentially in the end.
**Loris Marini:** Yeah. And there's an element also of educating people and upskilling them. Imagine you make a new hire, someone joins the engineering team. I feel like a lot of the business logic is embedded in the relationships between these entities. All we do in the end is mapping, or trying to replicate a model that is ideally as close as possible to the real world.
But in the digital world, in databases, whether they're graph or relational SQL, I don't wanna say it doesn't matter, because it does matter, but that's the role of the architect, to pick the right technology. The point is we are trying to create a digital representation of the world, and the world is entities, people, objects, and relationships between them.
And so if we don't have a good grasp or understanding of the relationships between the entities that make up a business, we can't really understand the business. And I almost feel like that's a byproduct of the taxonomy, of this hard, intellectual work that you cannot outsource and that is hard to buy. You can get help from a consultant, but you can't push a button and it magically happens. You have to sit in that room. You have to have those conversations.
**Helen Lippel:** Yeah, I want people to see their taxonomies or their knowledge graphs, or whatever it might be, as being a really integral business asset rather than just a thing you do as a one-off for a project and then it's done. People treat their technology as a business asset, so why not treat your business knowledge, which is codified in all these different vocabularies and data schemas, as a business asset that needs to be looked after? Because ultimately it's what you derive value from as a business, whether you are trying to make stuff easier for the general public, or you're trying to make money, or you're trying to use content efficiently. So I always try and hammer that into people.
**Loris Marini:** What makes a great taxonomist?
**Helen Lippel:** Gosh, lots of things. I think being a good communicator, being able to understand different people's motivations, respecting that not everybody is as fascinated by taxonomies as me.
It's hard to believe, but some people just want to get stuff done. And those people you try and talk to in a slightly different way. That's where you emphasize the ROI or the business value or the cost efficiencies. With other people you do wanna dig straight into a lot more detail, because you're trying to hammer stuff out on a day-to-day basis.
So I think it's understanding motivation styles, being able to get along with a lot of different people. I think curiosity is a really good thing. In my book I pulled together my sort of fantasy squad of many of the best taxonomists I've ever worked with. What they all have is that kind of curiosity: whatever sector or project they're working on,
they will find it interesting. I've never worked on a boring project, because there's always something interesting, whether it's restaurants or passport application processes or a supermarket website. There are always interesting discussions around language and meaning to be pulled from that. So there's that, and it's also having a good awareness of the technical skills of how to build a taxonomy and how to think logically about meaning.
And sometimes this gets characterized as being a lumper or a splitter in the taxonomy. Do you lump all vehicles together, or do you split things out into sports cars, tricycles, trams? And the correct answer will be, it depends on the particular system.
**Loris Marini:** It depends.
**Helen Lippel:** So there's very rarely a one-size-fits-all.
That's why I often steer people away from buying in a taxonomy from somewhere else, unless it's something that they shouldn't need to think about themselves. Take a list of all the countries in the world and their official titles and their capitals and all that sort of thing. An e-commerce site might not want to bother about managing that themselves.
They can just pull that in from the United Nations or whatever. But usually, if you are trying to create your own value and your own differentiation for whatever product or system it is, that's why you should be building your own taxonomies.
**Loris Marini:** I love it. It's the stress on pragmatism. Yes, we can keep asking questions and go deep into the philosophy and the science of knowledge. We would probably spend weeks and never get to,
**Helen Lippel:** Yeah. People spend their whole career arguing about this.
**Loris Marini:** Yeah, there are people that definitely did that and they are still doing it. Or we can get really pragmatic and say: this is what we care about as a business. This has business value. It makes sense doing it. I can see the operational, the risk management impact of this. If I can't see it, then there's no point in doing it. So "do you split or do you not split, do you leave them together as a macro category or do you go down one extra layer" is a pretty critical question.
And that also means that, I suppose, if you read the taxonomy of a business and it's well built, you should be able to infer what matters to the business and what doesn't.
**Helen Lippel:** Yeah, if I come in and there are already taxonomies in the business, you get a sense of where they are with their stuff, and whether it's really messy. Like, I went in somewhere and they hadn't had anyone managing the taxonomy for five years, even though it completely underpinned their business model.
So it really was a complete mess. It was like going into a garden full of weeds and just hacking it away to find the bits that were still working. You see the kind of language the business uses, the relationships they care about. You can also get a sense of ethical dimensions.
This is quite an interesting facet of taxonomy construction, because I think maybe a few decades ago people would just build taxonomies or classifications that were just their view of the world, and they would say, "This is universal. This is the only perspective that matters." And I think we're so much more mature now in understanding diverse perspectives.
If you think about some of the place names around the world, and particularly in Australia, where things have a kind of indigenous name as well as what they've been called by other communities for the last few decades. It's knowing that there are different official names and recognizing that your perspective is not the only one in the universe.
So that wasn't meant to be a big rant, but I just think it is a really interesting perspective, that sometimes this stuff isn't neutral. Even if you're doing a taxonomy of lawn mower parts, there's probably some weird ethical dimension to that that we haven't thought of.
**Loris Marini:** Interesting. I wonder whether getting the full diversity and inclusion picture requires a taxonomy and the full knowledge graph, because I'm thinking of situations where there are things that you can't really place in a box. They belong to one or another depending on how you see them, or to multiple. So what do you do in those cases? Do you pick a box, or do you pick three boxes and make copies, or express the relationships in a graph?
**Helen Lippel:** It really depends on the application. If I was doing a taxonomy of Australian place names, I would be really careful to make sure that we had the right relationships between the different official names, because that's really important to get right. If you were doing something for the Australian government, you need to get that right.
But if you're doing something like music genres, maybe you don't need to be so dogmatic about "this has to fit in a box" or "this has to be shown as a subgenre of this", because nobody really owns that. Some of those ideas have very different interpretations, and maybe that doesn't matter so much.
We are not really in the world where the only way you can get music is to go to a local record shop and look in the rock section or the reggae section. With digital information you have so much more power, and you have a lot more ability to get lost. But maybe you can be a lot looser in those definitions if it's appropriate to the project.
**Loris Marini:** I love it, because in the book there's a line that really stuck with me. You said, rather than owning a taxonomy, think of yourself as a steward of organizational knowledge.
**Helen Lippel:** Yes.
**Loris Marini:** That's referring to, where
**Helen Lippel:** Yeah, yeah, yeah,
**Loris Marini:** imposing one way, and this is the only way, it's inevitable that there will be clashes and issues of,
**Helen Lippel:** Yeah. And there might not even be a massive ethical issue in terms of diverse perspectives. It just might be that different people use the same word for completely different things. I found this on a marketing project. I can't remember the term in question, but the different definitions in the teams were completely diametrically opposed to each other.
So you are trying to bring in those perspectives and saying, can we try and reach some agreement between all these different teams? Because these are the benefits. In this particular project, they were being really inefficient, because stuff was being created and then it was going into the wrong systems because of what it was called and how those different systems understood the meaning of a particular kind of asset. So you start saying, you're doing stuff that hurts your business. Can we do things in a better way?
**Loris Marini:** I think this is a perfect segue for what's next, which is how to build taxonomies. And I know most of the book is really focused on this. We have nowhere near the time that we need to cover it all. But perhaps it's fun to start from what can go wrong, because I started the book and I thought, oh, infinite loops, polyhierarchy, a lot of fun stuff can go wrong. What's your summary of what can go wrong in a taxonomy project?
**Helen Lippel:** Oh gosh, they can go wrong in so many ways.
I'd say the kind of boring answer is that if the project hasn't been set up properly, with the right scope and the right time and the right expectations, that will make everything go downhill. But then once you're actually starting to build the taxonomy, I think it's things like: if you don't get proper agreement on what things are, if you still have a bit too much vagueness, then that can potentially go wrong.
Yeah. And I know the very funny chapter in the book talks about infinite loops, where things are basically related to each other in a circle, and that just really confuses a computer, because computers are really stupid, essentially. Polyhierarchy, I love that word, is where you have terms that live in multiple branches of the taxonomy.
So if you had the term cats, that might be a child term of pets, or it might be a child term of felines, or a child term of furry animals, whatever.
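The two structural issues mentioned here, polyhierarchy and infinite loops, can be sketched in a few lines of Python. This is only an illustration under assumptions: the `BROADER` mapping and the `find_cycle` helper are hypothetical names, not something from the book, and a real taxonomy tool would have its own validation.

```python
from collections import defaultdict

# Hypothetical child -> parents mapping. "cats" has three parents,
# which is what polyhierarchy means: one term, several branches.
BROADER = {
    "cats": {"pets", "felines", "furry animals"},
    "pets": {"animals"},
    "felines": {"animals"},
    "furry animals": {"animals"},
    "animals": set(),
}

def find_cycle(broader):
    """Return one list of terms forming a loop, or None if there is no loop."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    state = defaultdict(int)
    path = []

    def visit(term):
        state[term] = GREY
        path.append(term)
        for parent in sorted(broader.get(term, ())):
            if state[parent] == GREY:
                # We walked back onto the current path: that is a loop.
                return path[path.index(parent):] + [parent]
            if state[parent] == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[term] = BLACK
        return None

    for term in sorted(broader):
        if state[term] == WHITE:
            found = visit(term)
            if found:
                return found
    return None

# Polyhierarchy alone is fine; a loop only appears if a broad term is
# (wrongly) also filed under one of its own descendants.
looped = dict(BROADER, animals={"cats"})
```

Running `find_cycle(BROADER)` finds nothing, while `find_cycle(looped)` returns a path that starts and ends on the same term, which is the kind of check worth running before a vocabulary is loaded into a system.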
**Loris Marini:** Is that what we're seeing on Wikipedia with the disambiguation page? Did
**Helen Lippel:** yeah,
**Loris Marini:** this context? Yeah.
**Helen Lippel:** It's a similar thing, but yeah, that's another good example. We talk about disambiguation because language is messy, and it feels like the English language is particularly messy for a lot of things. If you don't set that up right, it can cause problems, because users have searched for something and they end up somewhere completely unexpected.
I often use the example of turkey as a foodstuff and Turkey the country. When I was working at the BBC, they had lots and lots of turkey recipes, 'cause they do lots of food stuff, and they had lots of news about Turkey. So across the whole website you've got lots of both.
You can think of infinite examples of that. And when someone searches, you want to try and steer them in the way that best suits their needs. On projects, we always have massive problems with Iceland, because it's the name of a country, but it's also the name of a frozen food supermarket in the UK.
**Loris Marini:** Yeah
**Helen Lippel:** And that can go wrong. We had a big problem a few years ago when a company from Iceland the country bought Iceland the supermarket. So all of the little disambiguation rules that we'd set up in the system, to try and make sure we were tagging correctly with the supermarket or the country, all just went up in the air, because suddenly both of these concepts were being mentioned in the same place.
So that isn't really an example of where a taxonomy goes wrong, but it's an example of the ambiguity of language, and it can be difficult to ameliorate that.
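A disambiguation rule of the kind described can be sketched as a lookup of context cue words. Everything here is an assumption for illustration: the concept IDs, the cue words, and the `candidate_tags` function are made up, and the actual rules used on those projects are not described in the conversation.

```python
import re

# Hypothetical concept IDs and cue words for two ambiguous labels.
SENSES = {
    "iceland": {
        "place/iceland-country": {"reykjavik", "country", "volcano", "island"},
        "org/iceland-supermarket": {"supermarket", "frozen", "store", "retailer"},
    },
    "turkey": {
        "place/turkey-country": {"ankara", "istanbul", "country"},
        "food/turkey-meat": {"recipe", "roast", "oven", "stuffing"},
    },
}

def candidate_tags(text, term):
    """Return the concept IDs for `term` whose cue words appear in `text`."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if term not in words:
        return []
    return sorted(
        concept for concept, cues in SENSES[term].items() if words & cues
    )

recipe = candidate_tags("Roast turkey recipe for the oven", "turkey")
takeover = candidate_tags(
    "A company from Iceland, the country, bought the frozen food supermarket Iceland",
    "iceland",
)
```

The recipe text resolves to a single concept, while the takeover headline matches both senses at once. That second case is where simple cue rules stop being enough and a person, or a richer model, has to decide.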
**Loris Marini:** And when you combine the ambiguity of language with change, especially in the context of data management, where data sets change all the time. A new business initiative might mean new data sources altogether, new schemas, new meaning, and somehow they have to be connected. That's the reason why we join tables in relational databases: to get information and dive deeper into the root causes of something.
I think that's the hard part, right? Because as an engineer you have to pick your fights. You can't solve them all. You have to prioritize relentlessly, because the business keeps asking questions. The analysts want data. The scientists want new features. You are constantly asking the question, "Do we really need it?" And the answer to that question can be, yes, everything is needed, but is it essential? And how do you make that call when things keep changing all the time? Have you come across anything, or have you developed a system, to ensure that taxonomies can scale and can react to change as well?
**Helen Lippel:** Often the answer to this is not technology, it's business process and people. If the taxonomy is being used in an end-user, public place, you need to be aware of how language and concepts will change. I'm thinking of colleagues who work on fashion projects, and how often completely new types of style or completely new items come along.
And that's where you need the business process, whatever it is: analyzing search logs, or customer research, or talking to your marketing team and saying, "Oh, there's this brand new category of thing that's come along. We need to make sure that's in the taxonomy so that when people search for it, they will get it."
And that's why you need really good maintenance and governance processes, to make sure that level of change, which is knowledge that everybody in that organization will have floating around their heads, gets reflected and codified into the data and knowledge models that you are relying on for your search or your digital asset management or your content management, whatever it might be.
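The search log analysis mentioned above could start as something as small as this. It is a sketch under assumptions: the log is taken to be a plain list of query strings, and `candidate_terms` and the sample data are hypothetical.

```python
from collections import Counter

def candidate_terms(search_log, taxonomy_labels, min_count=2):
    """Queries seen at least `min_count` times that match no existing label."""
    known = {label.lower() for label in taxonomy_labels}
    counts = Counter(query.strip().lower() for query in search_log)
    return [
        (query, n)
        for query, n in counts.most_common()
        if n >= min_count and query not in known
    ]

# Made-up search log for a fashion site and its current category labels.
log = ["Cargo pants", "jeans", "cargo pants", "shackets",
       "shackets", "shackets", "jeans"]
labels = ["Jeans", "Dresses"]
suggestions = candidate_terms(log, labels)
```

The output is a list of frequent queries with no matching label, which is a prompt for the conversation with the marketing team about whether a new category is worth adding. It is not an automatic update to the taxonomy.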
**Loris Marini:** I'm getting a lot of clarity. I see at least two things that I want to try and summarize to crystallize it, and please correct me if I'm getting it wrong. One is the hierarchy, the relationship between data, information, knowledge, and wisdom, which comes after knowledge. Data is just bare numbers. It could be binary, it could be any other base, but it's a very basic representation of information. It's not information per se, because if it's not structured and you don't know how to parse it, how to read it, it's just a bunch of bits. When you give it structure, it becomes information.
When information is in context, it becomes knowledge. So "the date today is the 15th of August" is a piece of information. But when you have it in context with something else, "today is the 15th of August and I'm recording a podcast with Helen Lippel", that is knowledge.
Now I know what's happening within that day, who the actors are, and what their relationship is. At the knowledge layer we need a vocabulary, because we need to know what terms mean. Otherwise we can't build a mental picture of things.
We need a hierarchy between terms, because it helps our thinking and it helps us find similarities between objects and create
**Helen Lippel:** Yep.
**Loris Marini:** and we need a graph, or say an ontology, because the ontology gives us the relationships between entities. So it's a subject, an object, and the verb between them. So in the example before, "Helen Lippel is recording a podcast with Loris", that's an ontology. But what does "Helen Lippel" mean? If you go to the vocabulary, we should be able to know what that means. What does "podcast" mean? That is in the vocabulary. The knowledge is the combination of the hierarchy, the ontology, and the vocabulary. Is that right?
**Helen Lippel:** Yeah, no, that's a great summary. That's better than I normally manage when I'm trying to explain my job to people. I'd say the extra kind of layer on that is appropriateness, and a lot of the design decisions that we make when we are looking at those systems. For example, we've exchanged lots of emails and information.
You don't need to know my shoe size. That is a piece of data about me, but it's not relevant to the use case that we are dealing with here. So it's knowing what to discard, what to keep, and how to classify it, really, at a baseline.
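The three layers in that summary (vocabulary, hierarchy, and subject-verb-object statements) can be put side by side in a small sketch. The definitions, the `BROADER` entries, and the `Triple` class are illustrative assumptions, not a standard model.

```python
from dataclasses import dataclass

# Vocabulary: what each term means (definitions here are illustrative).
VOCABULARY = {
    "Helen Lippel": "Taxonomy consultant and guest on this episode",
    "Loris Marini": "Host of the podcast",
    "podcast": "An audio programme published as episodes",
}

# Hierarchy: narrower term -> broader term.
BROADER = {"podcast": "audio content", "audio content": "content"}

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

# Ontology-style statements: subject, verb, object.
TRIPLES = [
    Triple("Helen Lippel", "records", "podcast"),
    Triple("Loris Marini", "records", "podcast"),
]

def ancestors(term):
    """Walk up the hierarchy from `term` to its broadest category."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain

def describe(triple):
    """Combine the three layers into one readable statement."""
    meaning = VOCABULARY.get(triple.obj, "undefined term")
    kinds = " > ".join(ancestors(triple.obj)) or "no broader term"
    return f"{triple.subject} {triple.predicate} {triple.obj} ({meaning}; {kinds})"
```

Note what is deliberately absent: there is no shoe size field. Deciding which facts never enter the model is part of the design, as the answer above points out.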
**Loris Marini:** This is huge. I don't hear this argument often. This is massive for a data engineer. You're prioritizing every single day, every instant of your life. If only you knew what to focus on. One of the big, huge pain points is impact analysis. We create all these models, we write all the SQL, and then you wanna make a change and you're like, "Oh, I guess we just have to hope for the best, that some dashboard downstream is not gonna break."
And then we get a CFO shouting, "Where is my report? What's going on?" Yeah, 'cause that's what happens in reality.
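Impact analysis of the kind described comes down to walking a dependency graph before making a change. The sketch below assumes the lineage is already known as a simple mapping; the asset names and the `impacted_by` helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each asset -> the assets that read from it directly.
DOWNSTREAM = {
    "raw.orders": ["model.orders_clean"],
    "model.orders_clean": ["model.revenue", "dashboard.ops"],
    "model.revenue": ["report.cfo_monthly"],
    "dashboard.ops": [],
    "report.cfo_monthly": [],
}

def impacted_by(asset, downstream=DOWNSTREAM):
    """Breadth-first walk returning everything downstream of `asset`."""
    seen, queue, order = {asset}, deque([asset]), []
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

affected = impacted_by("model.orders_clean")
```

With this made-up lineage, a change to `model.orders_clean` reaches the CFO's monthly report two hops away. The hard part in practice is collecting and maintaining the mapping, which is a knowledge management problem more than a coding one.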
**Helen Lippel:** Yeah, and having the right things in the reports. Sometimes you'll get someone saying, "We must have dashboards" or "We must have reports", and you can chuck loads of stuff at them, but what is it they actually need in order to make decisions, or understand what's going on, or understand what's going wrong with the system?
So it is really a logical design mindset to these things. And I've been accused of just being a list maker in my role. This was a long time ago, but it's rankled with me. It's not just about writing out all the examples of the particular thing and then sticking that in the spreadsheet to plug into a system.
It's about that value and meaning, in order to do something else. So I'm not quite sure where that fits on your stack of data, information, knowledge and wisdom, but arguably it could come in at any level.
**Loris Marini:** Yeah, the process of asking questions and probing and testing. Testing is a big part. I know you have a section in the book entirely dedicated to testing and validation, because I think the main trick is governing the process. So building it is one part of the problem.
How do you go from zero to a taxonomy, and how do you get to a taxonomy that keeps following the changes that naturally happen in the real world? Because there's a difference. With Australia, countries, and the different languages that, for example, Aboriginal people speak, that's a long-established body of knowledge.
We have books that have been written, information that has been collected. It's been documented. It's unlikely that tomorrow a new slang will come up. Especially now, unfortunately; maybe in the past. But there are things that change much, much faster. Like the example of a feature that a data science team needs, a new interpretation of customer, or of conversion for a user as part of a user journey.
So those things can change much faster. How do we, as data engineers, as taxonomists, manage and govern a taxonomy without being hated by everyone else?
**Helen Lippel:** That is the $64 million question for some of my colleagues. No, it should never be hated. I think it's about knowing the right places to look for where you might see changes coming through, for a start. So there's a really great example in the user testing and validation chapter of the book.
The use case was a charity. The guy who wrote the chapter was working on the information architecture for the website, for particular events that they run for fundraising. And he just thought the most logical thing was to put marathons in one place. And they did a bunch of user testing, and a massive majority of the users thought the marathons were somewhere else.
That was a really surprising thing, because it challenged his assumptions, and he was like, "What I did was not illogical," but they had to make the change. And the trade-off of that is, as long as it's not something really daft, 'cause you wouldn't necessarily wanna do that for brand reputation. But they changed the website and more people signed up to the events.
More people are fundraising for the charity. So you've got a nice end-to-end example of looking in the right place, adapting your thinking, and making a change that leads to better outcomes. And so you have to keep making the case that this is a valuable process, that this is not a one-off thing. And it's sometimes frustrating that, with other technology projects, people wouldn't ever see that as a one-off.
"Oh, we've installed a new system for managing all our HR data. We're never, ever gonna look at it again." Of course you don't do that. You adapt as new features, new systems, new requirements come into play. So why not do the same with your vocabularies and your definitions and your schemas? That's why you do need people who can keep an eye on this all the time: keeping in touch with the business, doing that testing, analyzing user data or product data, wherever it's coming from. Having those discussions with people about, should we put this brand new fashion item up on the website as a category? Is it really important enough? Is it gonna grow?
Is it gonna make us money? Or is it something that we think is really minor and can be handled another way? But it is that constant iteration because, as you say, data never really sits still. You are dealing with,
**Loris Marini:** Yeah, if we wanna thrive in the economy of knowledge, we need to learn, and we need to keep learning and adapting.
**Helen Lippel:** Did I actually answer your question there?
**Loris Marini:** Yeah. No, it does. It's just that you see me puzzled because I kept thinking about analogies with other fields, thinking about something similar for taxonomies and ontologies and data management more broadly. The current state in most organizations is a mess. We are lacking lots of things. We lack accountability. We lack visibility. We lack top leadership. We lack many things.
But in general, going from state A, a mess, to state B: okay, now the system is ordered enough that at least, when it gets out of control, we have enough visibility, enough data, enough of an understanding, enough awareness that we can feel whether something is trustworthy or not. Because it's never done.
It's clearly a continuous process. Things change. We keep trying to put order in chaos, but it gets messy again, and we have to reorder systems forever and ever. As long as the business runs, you're gonna still worry about this sort of stuff.
Is this something that you see from experience, that there is some sort of minimum level of order that you need to have for the system to then have a good chance of becoming ordered enough?
Or does it not really matter, and you can start with a small, very,
**Helen Lippel:** Again, I think we've mentioned it a few times: being pragmatic. Maybe when confronted by a mess, you are not gonna design the knowledge organization system to rule them all.
Maybe it's just too complex, and you need to chunk it down and start small. You know, sometimes I go into an organization and they've got nothing in place. They've just got siloed information, data streams, dashboards, people with all sorts of different requirements, sometimes quite conflicting requirements.
You can't come in like the white knight on the horse and say, "Don't worry, I'll just make a big ontology. Everything will be fixed." I'd love it if I could, but maybe you pick your battles and say, let's start with this project. Can we do it as a pilot or a prototype? And just start with something small, demonstrate a bit of value, get people using it, get people thinking about it.
Sometimes people turn off and say, "Oh, I don't wanna do tagging. I'm already really busy creating content for all these different channels that we're supposed to run. I don't want more work." But it's about trying to help them and make it intuitive and automated if possible, and then showing the benefits: that actually the stuff is gonna be better organized, it's gonna be better directed to the places where it needs to go. Sometimes it is better to work in that sort of almost grassroots way,
rather than designing absolutely everything because
**Detached audio:** Yeah.
**Helen Lippel:** it looks great in my head. It can be a kind of five-dimensional chess game. But actually real people are not working like that. They want systems that work for them, that make use of what they're doing. So get them realizing the value of it, get the higher-ups realizing the value, and then maybe next time you build on that, you build out from it.
**Loris Marini:** Yeah, you just gave me an idea, 'cause there are interesting conversations in software development and software engineering. There's a famous book, Clean Code by Robert Martin, a handbook of agile software craftsmanship. There are a lot of people that argue that test-driven development is the way to go and has to be pure.
Every single line of code has to be tested. You start with the test. Actually, that's what test-driven development is really saying: you start with a test because the test forces you to express the meaning of what you're trying to achieve. And then you write the code that does the function. But then there are people that argue that no, behavior-driven development is the way to go.
Testing everything is impossible. You're trying to keep things clean and make sure everything is tested, and you spend more time writing tests than writing code. So the folks that are technical, that are developing the software we use every day, have this sort of argument every day. Imagine when you now open the gates to everyone within the organization, and instead of worrying about software,
you worry about knowledge within the company. So many people with so many ideas. So it would be nearly impossible, I think, to do something equivalent to TDD, test-driven development, where every single thing is tested and you're aiming for a hundred percent test coverage or 95% test coverage so that you have clean code, or in this case clean taxonomies, stable vocabularies.
It's almost a mirage, right?
**Helen Lippel:** Yeah, it can be. You have to see it as a kind of daily work. It is the work. It's just an equivalent to what you are saying: you code a thing and then it's done? No decent engineer would ever think like that. So why is it any different with your vocabularies and your data? But yeah.
**Detached audio:** The point is to be pragmatic. Yeah.
You only do the things that drive business value and not
**Helen Lippel:** Yeah. It's like, the second you finish the taxonomy, you probably think, "Great, job well done. I'm off to the pub, it's perfect." But within about 10 minutes, probably something's changed. Not
**Detached audio:** Yeah.
**Helen Lippel:** but just things are always evolving and that's not a bad thing.
That's what makes life dynamic and interesting. Yeah, there probably is that millisecond where you can go, "Yep, this is the baseline. We are happy. Yes, stick this into the system."
**Loris Marini:** Perfect. Helen, I think we're fast approaching the end of our time. I wanted to ask: is there anything that we haven't covered that is an absolute must-remember from the broader context of knowledge management?
**Helen Lippel:** God, I could talk all day on all this stuff, but I think we've covered the main things I'd like to get across to your audience: that information vocabularies and metadata schemas are such an important business asset and should be treated with care and love, and not just forgotten about or shoved in at the last minute on your project.
You don't have to be perfect. You don't have to be a philosopher of semantic knowledge to make these things work.
**Loris Marini:** Fantastic. I think I wanna do something fun that I've started doing recently: the 60-second summary. I've got a timer here with me. You tell me when you're ready and I'm gonna fire it off. Imagine you are at a pub, at a bar, drinking tea with a junior taxonomist, someone that is super excited and wants to get into the field or sees the value, but they have zero experience.
What would you tell them, in what order, in terms of skills and upskilling, but also strategies, things to remember, the top tips you learned in your career? You tell me when you're ready.
**Helen Lippel:** I think being interested in taxonomy is a fantastic thing. It's been really good for me. I would think about what experience or skills or interests you have now, and think about what applies to your situation. Lots of taxonomists come from different roles. They might be in other digital roles or technical roles, or, like someone I worked with, a town planner for a local council.
So think about those things. There are loads of free resources all over: YouTube, industry associations, conferences, webinars, if you can make time for those. I'd also say read my book, because it's full of fantastic perspectives from all sorts of different people. We've got 17 different contributors. I hope I haven't scared them off completely. But
I think the sort of formal education in this stuff is still not quite there. It's still quite scattered between library science, or you might cover a little bit in computer science with semantics and data dictionaries.
People who do digital stuff like product management might do a little bit on data. So I'd like to see stuff get a lot more coherent. I've done some teaching of postgraduates. They come from a kind of library science background, and you can show them that there are all these really great applications across the
digital world, which is only growing. It's only gonna be more important in our lives. And there are lots and lots of free resources if you can't do a postgraduate course or get your employer to pay for training. Look at the videos on YouTube, read a few books, read a few blogs, talk to people on Twitter, or there are various communities of practice.
So I think being curious and being a self-starter will get you going a lot more than doing a particular course and trying to follow a pathway, because that just doesn't exist, like for a lot of roles in the data world.
**Loris Marini:** Fantastic. So, "Taxonomies: Practical Approaches to Developing and Managing Vocabularies for Digital Information". I started reading this book. I'm 30% through and I've really liked it so far, so I'm really looking forward to smashing the remaining 70%. Helen, thank you so much. This has been great fun for me.
I hope you enjoyed it too. The process of just exploring and trying to connect
**Helen Lippel:** Yeah.
**Loris Marini:** complex stuff less complex, so that we make more order in the digital mess we live in.
**Helen Lippel:** Yeah, absolutely. No, thank you for this opportunity to articulate a lot of stuff that feels like second nature to me, but I want more and more people to get their heads around and think this is really cool and useful. So I hope I've achieved that a little bit and not gone on too many tangents.
**Loris Marini:** Absolutely. No, it worked for me, and I'm super excited. I can't wait to release this episode. As usual, we'll be in touch. Helen, thank you again for being with me, and I look forward to many more chats like this in the future.
**Helen Lippel:** Yeah,
**Loris Marini:** Thanks.
**Helen Lippel:** No, thank you so much for having me.
**Loris Marini:** Cheers.