Introduction to Data Deduplication in Equipment Management: how to unify data from various sources (ERP, maintenance, telematics) into a single record, preventing data fragmentation and improving efficiency.
Class Transcript
[00:00:00] Hi everyone, so this recording will be about something called *deduplication*. Deduplication is not something you often talk about when you're, I don't know, drinking a beer with people, but when it comes to equipment, equipment management and software, it's actually pretty important, which is why I wanted to record this, to be of help.
For those of you who know already a bit about this topic, I hope this will clarify some things.
For those of you who don't know anything about it, hopefully it will give a quick explanation. Let's start. Imagine you have a fleet, and in the fleet you've obviously got various assets. Let's pick one of them.
An excavator. A Caterpillar 336 for example. The data about this Caterpillar actually sits in different places. You have information about this excavator in your ERP. Let's say in Viewpoint or in Spectrum or whatever your ERP is. You also [00:01:00] have data about this excavator in your maintenance system. So your maintenance system could be E360 or B2W or it could be Clue or something else.
And that system also has information about the same excavator. And then you also have telematics. So if it's a Caterpillar, it might be on VisionLink. Or maybe it's an old Caterpillar, so the information about it is actually coming from a GPS system, because you put a tracker on it.
So when we look at just one small excavator, we realize the data about that machine sits in different places. Your mechanic doesn't care that the information sits in different places. They just want to find out the service history, or when you acquired this asset, or the status of the warranty.
And all that information needs to come from different systems. So when it comes to merging the [00:02:00] information from these systems, the computer will end up having one record coming from your ERP, another record coming from the maintenance system, another record coming from your telematics, and potentially another record coming from the GPS system.
And you need to find a way to tell the single pane of glass system, whether it's Clue or maybe something you built internally, that all these records are the same object you're talking about. It's the same Caterpillar asset. Okay?
Think about it as lines in a spreadsheet. You don't need four different lines in a spreadsheet just because the data comes from different places. You want to have it in one place. This process of taking data from various sources and making it unified is basically what's called deduplication. We are taking duplicated records that are coming from different places, and we need to create [00:03:00] some rule to deduplicate them, so we end up with one.
Now, this is a fairly complex thing to do, actually, for computers. In an ideal world, you might only have five assets, and you would give them the same name in all systems. Then the computer program can go and deduplicate and say, hey, if Caterpillar 336, equipment number EX001, exists in more than one system, let's put them together.
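As a minimal sketch of that ideal case in Python, assuming every system uses the exact same equipment number: the record layout, field names and values below are hypothetical and only there for illustration, not taken from any real ERP or telematics export.

```python
from collections import defaultdict

# Hypothetical records, one per source system, for illustration only.
records = [
    {"source": "erp", "equipment_number": "EX001", "model": "Caterpillar 336"},
    {"source": "maintenance", "equipment_number": "EX001", "last_service": "2024-03-01"},
    {"source": "telematics", "equipment_number": "EX001", "engine_hours": 4210},
    {"source": "erp", "equipment_number": "EX002", "model": "Caterpillar 320"},
]

def merge_by_exact_number(records):
    """Group records that share the exact same equipment number and
    fold their fields into one unified record per asset."""
    unified = defaultdict(lambda: {"sources": []})
    for record in records:
        asset = unified[record["equipment_number"]]
        asset["sources"].append(record["source"])
        for field, value in record.items():
            if field != "source":
                # First value seen wins; later sources only fill gaps.
                asset.setdefault(field, value)
    return dict(unified)

assets = merge_by_exact_number(records)
print(len(assets))                  # 2
print(assets["EX001"]["sources"])   # ['erp', 'maintenance', 'telematics']
```

Four lines collapse into two assets here only because the equipment numbers match character for character, which is exactly the assumption that tends to break in practice.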
Unfortunately, you probably have different names for the same equipment in different systems. For example, if in your ERP you called this excavator "EX123", but in your maintenance system you called it "EX 123", for computers that's not the same thing. And if you never bothered updating the asset name in VisionLink, maybe it's still just using the serial number, so it doesn't even know [00:04:00] which asset that is.
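To show why "EX123" and "EX 123" differ for a computer, and one simple way to bridge that, here is a small Python sketch. The `normalize_asset_name` helper is a hypothetical illustration, not how Clue or any particular product does it.

```python
import re

def normalize_asset_name(name):
    """Uppercase the name and drop everything except letters and digits,
    so 'EX123', 'EX 123' and 'ex-123' all compare equal."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())

print("EX123" == "EX 123")  # False
print(normalize_asset_name("EX123") == normalize_asset_name("EX 123"))  # True
```

A rule like this only covers spacing, punctuation and letter case. It does nothing for the case where one system holds a serial number instead of a name, and an overly aggressive rule can make two different assets look identical.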
So a lot of the work we do in Clue, behind the scenes, is a very complex set of rules, computer code, algorithms, and a bit of artificial intelligence to be able to match, merge and deduplicate all this information into one single place.
Now, why is that important? The main problem I see a lot of you guys face when the fleet grows is just that the data is all over the place.
It is so hard for you to take the data from one system and merge it with the data from another system, because you have no ability to deduplicate it. If you have passed this stage, and maybe you took a week to deduplicate it, you then run into a different problem, and that is that this is not a one-off job.
You have to do it around the [00:05:00] clock. I see a lot of systems that will deduplicate the data for you manually upfront. But then a month later, six months later, a year later, you're back to the same problem. You bought a bunch of assets, and again, duplicated assets appear.
This is the stage where I hope I haven't lost all of you... but this is just the beginning because everything I described could sound fairly simple, right?
Take four records, find some common denominator, and merge them into one.
But then we enter what in the computer world we call edge cases, the complex cases.
Let's give a few examples. What happens if, for example, you put a tracker on this asset, and then you remove the tracker from the asset? What would happen?
If, for example, we put a GPS tracker on the asset, all of a sudden it's the GPS tracker that determines the location of that asset [00:06:00] in the system. And if you deduplicated it, in that case, you will have one record that represents this excavator, and the data will come from the GPS tracker.
But the second you physically move this same GPS tracker onto another asset, you have to know and carefully manage what's happening, because you already merged these two records into one. And before you know it, if the system doesn't know how to handle it, you will start getting engine hours reported on the wrong asset, because you haven't reversed this deduplication.
And that's a problem we're seeing regularly with various GPS systems, whether it's Samsara or Geotab or whatever you're using, because it's a logical problem.
There's a way to solve it, of course, but most systems are not doing a good job at that.
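One possible way to handle the moving tracker, sketched in Python under the assumption that you record when each tracker was mounted and removed, is to link a tracker to an asset for a time window instead of merging them permanently. The class, the function names and the IDs ("GPS-1", "EX123", "DZ045") are hypothetical; this is not a description of how Samsara, Geotab or Clue handle it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TrackerAssignment:
    tracker_id: str
    asset_id: str
    start: datetime
    end: Optional[datetime] = None  # None means the tracker is still mounted

def asset_for_reading(assignments, tracker_id, timestamp):
    """Return the asset a tracker was mounted on at the given time, or None."""
    for a in assignments:
        if (a.tracker_id == tracker_id and a.start <= timestamp
                and (a.end is None or timestamp < a.end)):
            return a.asset_id
    return None

def move_tracker(assignments, tracker_id, new_asset_id, moved_at):
    """Close the open assignment for this tracker and open a new one."""
    for a in assignments:
        if a.tracker_id == tracker_id and a.end is None:
            a.end = moved_at
    assignments.append(TrackerAssignment(tracker_id, new_asset_id, moved_at))

assignments = [TrackerAssignment("GPS-1", "EX123", datetime(2024, 1, 1))]
move_tracker(assignments, "GPS-1", "DZ045", datetime(2024, 6, 1))

before = asset_for_reading(assignments, "GPS-1", datetime(2024, 3, 15))
after = asset_for_reading(assignments, "GPS-1", datetime(2024, 7, 1))
print(before, after)  # EX123 DZ045
```

With this shape, engine hours reported in March still land on the first asset and hours reported in July land on the second, so nothing has to be unmerged after the fact.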
What happens if, for example, you gave the wrong name to an asset by mistake, which caused two completely separate assets to get merged into a single [00:07:00] record? That's another problem that you need to know how to deal with.
What happens if you have data from United Rentals, or Sunbelt, or some other rental house, and you also merge it into your single pane of glass system? How does it know how to handle it when the equipment doesn't actually have a name, so now you have to deduplicate it by serial number?
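A rough Python sketch of that fallback, assuming each record may carry a `name`, a `serial_number`, both, or neither. The field names and the rule of preferring the serial number when both records have one are assumptions made for illustration.

```python
def clean(value):
    """Normalize an identifier for comparison; empty or missing becomes ''."""
    return (value or "").strip().upper()

def same_asset(a, b):
    """Compare by serial number when both records have one,
    otherwise fall back to the name. Missing data never matches."""
    serial_a, serial_b = clean(a.get("serial_number")), clean(b.get("serial_number"))
    if serial_a and serial_b:
        return serial_a == serial_b
    name_a, name_b = clean(a.get("name")), clean(b.get("name"))
    return bool(name_a) and name_a == name_b

# Hypothetical records: the rental feed has no name, only a serial number.
rental = {"source": "rental_house", "name": None, "serial_number": " abc12345 "}
telematics = {"source": "telematics", "name": "EX123", "serial_number": "ABC12345"}
unnamed = {"source": "gps", "name": "", "serial_number": None}

print(same_asset(rental, telematics))  # True
print(same_asset(rental, unnamed))     # False
```

The important detail is the last line of `same_asset`: two records that are both missing a name must not be treated as equal just because their empty names match.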
What happens if your name is not unique? I see a lot of companies calling it "Rental (space) 123", or just "Rental Excavator". All of a sudden, a system that doesn't know how to deduplicate it correctly will end up merging all the rental assets into a single one.
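One hedged way to guard against that, sketched in Python, is to check whether a name points at more than one serial number before trusting it as a merge key. The record fields and the serial numbers ("SN-A" and so on) are made up for illustration.

```python
from collections import defaultdict

def ambiguous_names(records):
    """Return names that point at more than one serial number.
    Merging on such a name would collapse distinct machines into one."""
    serials_by_name = defaultdict(set)
    for record in records:
        serials_by_name[record["name"]].add(record["serial_number"])
    return {name for name, serials in serials_by_name.items() if len(serials) > 1}

records = [
    {"name": "Rental Excavator", "serial_number": "SN-A"},
    {"name": "Rental Excavator", "serial_number": "SN-B"},
    {"name": "EX123", "serial_number": "SN-C"},
    {"name": "EX123", "serial_number": "SN-C"},
]
print(ambiguous_names(records))  # {'Rental Excavator'}
```

Names flagged this way would be matched on serial number instead, or sent to a person to review, so they are never merged automatically.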
And I can go on and on, but really the key to deduplication is finding a partner that really understands the ins and outs of all these edge cases and complexity. This is not an easy problem to solve, which is potentially why many people haven't even tried to [00:08:00] solve it at all.
You just end up with a system where, when we log in for the first time to audit and review it, we see a lot of need to scrub and clean the data.
Now, you may wonder, what is the solution to that?
A couple of solutions. First, you can certainly do it on your own. If you do it on your own, the best advice I would give is to be as organized and as clean as you can in entering the data into your systems. Whoever is creating the equipment record in your ERP, whether it's Viewpoint or whatever you're using, I would highly recommend you have only one person in the entire team that has the ability to serialize, as we call it, an asset.
And that person takes care to enter the correct asset name in the correct format. They take the extra minute to ensure that the serial number they put for the asset is correct and so on.
The second option is, of course, to outsource it, whether to a solution like Clue or [00:09:00] to a vendor.
But in general, once you hit a fleet of, I'd say, maybe 500 assets plus, though in some companies it could be even fewer than that, this problem of knowing where the information about the assets comes from and dealing with deduplication becomes something that's very important.
The last thing I would say about why this is important is that in this era we rely more and more on algorithms and data, for example even just for generating preventive maintenance, and on AI, so having clean, normalized data becomes really important. And you will end up having different tools for different things. Your ERP is not going away, so whatever information there is about the financial status of the asset, your repair and maintenance budget, depreciation and whatnot, will come from the ERP, [00:10:00] but other pieces of data will come from other systems.
You will have telematics providing you the fault codes and information about the asset. You might have a GPS tracker providing you more granular GPS data. You will obviously have the maintenance system sending you all the maintenance data. So the ability to deduplicate and integrate various sources for the same physical object or asset is absolutely important.
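As a small Python sketch of that idea, assuming the records for one asset have already been deduplicated, each field can be taken only from the system that owns it. The `FIELD_OWNER` mapping, the field names and the sample values are hypothetical and would differ from fleet to fleet.

```python
# Hypothetical mapping of each field to the system trusted to supply it.
FIELD_OWNER = {
    "depreciation": "erp",
    "fault_codes": "telematics",
    "location": "gps",
    "service_history": "maintenance",
}

def build_unified_record(records_by_source):
    """Assemble one record per asset by taking each field only from
    the system that owns it; fields with no data are left out."""
    unified = {}
    for field, owner in FIELD_OWNER.items():
        value = records_by_source.get(owner, {}).get(field)
        if value is not None:
            unified[field] = value
    return unified

# Illustrative data for one already-deduplicated excavator.
records_by_source = {
    "erp": {"depreciation": 1200.0, "location": "Yard A"},
    "telematics": {"fault_codes": ["FAULT-1"]},
    "gps": {"location": (40.71, -74.0)},
}
unified = build_unified_record(records_by_source)
print(unified["location"])  # (40.71, -74.0)
```

The ERP also reports a location in this example, but the unified record takes it from the GPS tracker because that is the system named as the owner of that field.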
So again, I gave you fair warning that this topic is somewhat boring for most people, but it's highly complex as well.
The goal of this is just to give you a bit of understanding of what deduplication is, why it's important to consider, and how to try to address it if your fleet is sizable enough and varied enough in terms of the sources of data.
If you have any questions, or if we can be of help, answering questions, [00:11:00] walking through edge cases or issues you're facing as you're trying to deal with the duplication of data, please don't hesitate to reach out. Thanks.