Making Data Better
Making Data Better is a podcast about data quality and the impact it has on how we protect, manage, and use the digital data critical to our lives. Through conversation and examination, George Peabody and Stephen Wilson look at data's role in risk management, at use cases like identification, lending, age verification, and healthcare, and at more personal concerns. Privacy and data ownership are topics, as are our data failures.
EP2: Data in our digital lives: Many moving parts | 01
Online trust? Digital identification? Wallets? Privacy? Data quality? If these topics resonate, join Lockstep's George Peabody and Steve Wilson in this, the first episode of Making Data Better, a podcast about data quality and the impact it has on how we protect, manage, and use the digital data critical to our lives.
We focus on making data better because data quality has everything to do with making good decisions. We introduce the critical and complex concerns that affect online trust and risk management.
One area where this really applies is in assessing risk. We believe data quality has an outsized impact on risk management. Decisions from the relatively trivial, like where to go out to eat, to the consequential, like opening a new bank account or sending money cross-border, all carry risk. Do we trust the data? Does it let us know who or what is on the other end of our digital interaction?
There are so many moving parts to understand, evaluate, and discuss. We’ll unpack all this with experts from fintech, cyber, healthcare, government services, and academia.
So take a listen and join us on this journey.
Speaker 2: Yeah, you bet, George, thank you. Great to be working with you. We've been working together for, what, about a decade or more? Something like that? Yep. So these days I'm an analyst and consultant and innovator working in what we call digital identity, but a theme of this podcast is going to be how that term is breaking down. Let's say I'm a computer scientist by training, and I've been consulting and doing, I think, interesting cross-disciplinary work with lawyers and academics and practitioners and healthcare people for a long time. Yeah, 25 years in the field of digital identity now.
Speaker 1: I'm a payment strategy consultant; that's been my career for the last 20 years, working for several firms, most recently as a partner at Glenbrook Partners. Marvelous company, lovely team. Great to be working with you, Steve, at Lockstep and in this area. And obviously the payments area intersects with the problem of data quality in a big way.
Speaker 2: And I don't want to be reductionist about this, and I get accused of being a bit too dry and a bit too clinical sometimes; hopefully a podcast is a good medium to be a bit more humane. But all of this stuff boils down to data, doesn't it? Digital identity and payments and digital health and e-voting: every way into this, it all comes down to data. How good is that data? Why is the data so crappy? And what are we going to do to make it better?
Speaker 1: Yeah, and you're setting the context for the podcast. We're going to be focused on making data better, because data quality really has everything to do with making good decisions. One area where this really applies is in assessing risk, and we believe that data quality has an outsized impact on risk management, and we're going to define that in the broadest sense. Think about the span between figuring out where to go out to eat and opening a new bank account; each carries a certain level of risk. Making a bad choice because an online review steered you wrong, well, that's just the cost of a meal you didn't enjoy. Having someone open a bank account with your name and your credentials is a wholly different level of risk, and that's got everything to do with data trust. Do we trust what we're using and seeing? Does it let us know who or what's on the other side of our digital interaction? That knowledge is really essential. This space is complex, sometimes confusing, and it has so many moving parts that Steve and I decided that, rather than setting the context in one episode, we're going to do it across two. We're going to dive into most, if not all, of those parts during the course of this podcast, and our expectation is that this podcast is going to run for a very long time.
Speaker 1: A couple of things I want to say before I start asking Steve questions. One of the things that's really intriguing to me about this area is what happens when we talk about technology. For a lot of us from a tech background, technology fills up the whole screen; others come at this from government, as regulators and policymakers. At that intersection of tech and policy, judgment can get clouded on both sides, because our human intuitions create assumptions about the facts. We hope, through the course of our discussions on this podcast, and hopefully you're listening, that we'll be able to surface how human intuition can actually steer us wrong with respect to the use of data and data quality online. So let's get started. We're going to lay out some thoughts on some of the more pressing issues. Steve, let's jump in; I'll ask some questions, and you'll ask me some too.
Speaker 1: Before we do that, I want to begin with a question I love to ask folks, and I've been asking it for a while now: what was the strangest, most unexpected, or most startling use of your digital data that you've encountered? I call this your personal data tale. So what's your data tale?
Speaker 2: That's a good one, isn't it? Mine is kind of mundane, and luckily there's no terrible data crime or identity mishap, but it is about intuitions, like you just said. I set up my business about 15 years ago. Small business, and like any small business person, what do you do? You open a bank account, and you go to the branch where you do all of your normal personal banking. So I did that: I opened a Lockstep bank account. In Australia, I guess just like the US, you have to do KYC, show some business and incorporation papers, and set up a bank account that is controlled by this guy called Steve Wilson but whose holder is Lockstep. And a year later I get paid by check.
Speaker 2: It's happened about five times now. I took the check to the bank, handed it to the teller, and she said to me, which account do you want this to go into? And I said, well, the Lockstep account. And she said, no, I can see a Visa account, I can see a personal checking account, I can see a mortgage account. It was an amazing dislocation. It was a cognitive problem for me, because I'd walked into the bank in the identity of Lockstep, and all of a sudden I had this meeting of worlds where the teller was going to put these things into my personal accounts.
Speaker 2: And I'm not a lawyer, George. We're going to talk to some lawyers in the course of this podcast, thank God. But what I do know is that Lockstep is the holder of that bank account, not Steve Wilson. So there was this very strange, multi-personality moment, a real cognitive disruption. Not to be too weird about it, but it honestly felt that I had been violated: the bank had formed a single view of customer, and in doing so they had merged, had exposed, my banking details across two completely distinct identities. It was an unintended consequence of single view of customer. So I get it, but gee, it was weird. It felt really wrong.
Speaker 1: That's the know your customer process, and the distinction between you as a corporation and you as an individual is, from their risk management point of view, a very thin line indeed. So they were more than happy to conflate the two.
Speaker 2: And we are good at that as human beings: we do this seamlessly, we can hold multiple identities in our heads. But when you give this to computers, they screw it up. It doesn't work; it's very messy.
Speaker 1: And that'll be a theme too: our intuitions, our wants, our assumptions don't translate to the digital realm at all.
Speaker 2: What about you? What's your data tale, George?
Speaker 1: I think this is one most people will relate to. I'm addicted to Amazon, Apple, and Google. They're listening to me half the time; I've subscribed, I've got Google hardware in the house, I've got Google TV. So it's a little disconcerting when I have a conversation with my wife, Nancy, and within a day or so I start seeing ads on television that are broadly relevant to that conversation. My daughter has complained of the same thing.
Speaker 1: And then, you know, I'm also a big critic of ad tech, not just because it's the business model of the internet, but because it's just not very good. I guess my true data tale is that I get so irritated when I'm at a website seeing ads for something I've already bought, and those linger for a month. It's just annoying. How can it be so bad, with so many really bright people working in an industry, that it can't do better and hasn't done better? All right, so let's move on from our data tales of woe, and let me ask you: what do you think are the hardest and most impactful challenges we face with respect to making data better? I know, that's a big one there for you, buddy.
Speaker 2: Yeah, well, we're going to aim high on this series, aren't we? This is big stuff; this is as big as society. We say that data is the most important resource of the digital economy, and of course, putting "digital" in front of "economy" is probably redundant; the digital economy is the economy these days. Now, if data is the most important resource going around, when are we going to start treating it with the sort of respect it deserves?
Speaker 2: We use data in an entirely ad hoc manner. I think there's a handful of professionals, maybe statisticians, who treat data seriously, but the rest of us don't give a moment's thought to where it's come from. What's our contribution to the data? How can you even tell? And now this is going to hit us like a freight train, because we've already seen the deepfakes: the famous Tom Cruise videos, the famous pictures of a president under arrest. We know that pretty soon we're not going to be able to trust the evidence of our own eyes online, and that is serious. But there's much more. Every day, credit card numbers are flying around; as we said, we have our stuff stolen behind our backs and abused and used without our knowledge.
Speaker 2: Data is supposed to be as important as clean drinking water, and we just don't have any systems or structures or rules in place yet that treat it as such. So I think that's my starting point, and I work down from that.
Speaker 1: Just to underline one piece of that: data quality has a big impact on the AI systems we're using to make decisions about people. If the data set is built and curated by a bunch of young white men, it's generally not going to be favorable to a Black applicant for a loan.
Speaker 2: Yeah, or somebody who works and lives and commutes in the burbs gets into their self-driving car that's been trained on the manicured streets of Silicon Valley, and you take it out into a rural town and it doesn't know what it's doing.
Speaker 1: Well, thank God we don't have that problem quite yet; the other one we do. All right, what else is on your mind?
Speaker 2: Well, working down from that: we've had tools, security tools, and privacy laws around the world for a long time now. The security industry is old. I see emerging a blending of cybersecurity and data privacy into a bigger sense of data protection, and one of the slogans on our Lockstep website these days is that we are into data protection writ large. By that we mean: what is it that makes data valuable, and how are you going to safeguard those properties? It's a positive orientation, where security alone is really defensive; it's about confidentiality and integrity and availability and keeping the bad guys out.
Speaker 2: I think we, collectively, the whole digital industry, need to think more about what it is about data that's valuable: the originality of data, the authorship of data. Your point about training AI, it's like, where does the training data come from? Is it biased? Who's responsible for it? Who's licensed it? There are fantastic intellectual property issues at the moment around artwork and literature: who owns that, and is OpenAI entitled to pick it up just because it's in the public domain? This comes back to what makes that stuff valuable. If we're training an AI on artwork, clearly that artwork is valuable. So how do you protect it? How do we even think about and talk about it? How do you measure the value of data, and how do you protect that?
Speaker 2: So I see this merging, and we've got some fantastic tools. We're going to talk about verifiable credentials. We're going to talk about infrastructure. How do you plumb data so that you know where it's come from, what its originality is, whether it's been tampered with? As it moves through the economy and gets value-added and processed and transformed, how do you tell the story of the journey that data has gone through? That story is part of what makes data valuable, and I see our tools being used to protect it. So maybe think of it as data engineering. Think about reticulated water: clean water depends on professions and certifications and standards and clean pipes and good metalwork, and all of that, metaphorically, applies to data as well. Let's think about that.
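[Editor's note: to make the plumbing metaphor concrete, here is a minimal sketch of tamper-evident provenance, the kind of "story of the journey" Steve describes. Each processing step appends a record hash-linked to the previous one and authenticated, so the history can be checked later. This is an illustration only: the field names and the shared HMAC key are hypothetical, and a real deployment would more likely use asymmetric signatures along the lines of verifiable credentials.]

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key"  # hypothetical shared key; real systems would use per-party signing keys


def add_provenance_step(chain, actor, action, payload):
    """Append a tamper-evident record of who did what to the data."""
    prev_hash = chain[-1]["record_hash"] if chain else ""
    record = {
        "actor": actor,                                              # who touched the data
        "action": action,                                            # what they did to it
        "payload_hash": hashlib.sha256(payload.encode()).hexdigest(),
        "prev_hash": prev_hash,                                      # link to the prior step
    }
    body = json.dumps(record, sort_keys=True).encode()
    record["record_hash"] = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return chain + [record]


def verify_chain(chain):
    """Recompute every link; editing any earlier step breaks everything after it."""
    prev_hash = ""
    for record in chain:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if body["prev_hash"] != prev_hash:
            return False
        expected = hmac.new(SECRET_KEY, json.dumps(body, sort_keys=True).encode(),
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, record["record_hash"]):
            return False
        prev_hash = record["record_hash"]
    return True


chain = add_provenance_step([], "lab-A", "collected", "raw test results")
chain = add_provenance_step(chain, "clinic-B", "transformed", "normalized results")
print(verify_chain(chain))  # True; altering any field of any record makes this False
```

The point of the chain is that the data's history, not just its current value, becomes checkable: a downstream user can verify where the data came from and whether any step was rewritten.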
Speaker 1: The story of the data itself, the story of the attribute: having that metadata and being able to use it as an input to making a decision. Now, as individuals listening to this, I know we really can't get away from the topic of privacy. Or said better: as individuals talking about data, privacy comes up, and I joked earlier that I don't know that I've got any, given my addiction to Amazon, Apple, and Google. It's a huge concern for us as citizens: how our data is managed, where it lives, how it's collected and shared. It's a big concern.
Speaker 2: And do you know what's going on? Are you prepared, intellectually and in terms of behaviors, as an individual, to know what's going on with your data, and what can you do about it? I think one of the tragedies at the moment is that we are victim blaming all the time. We say that people are too promiscuous online; we blame kids for going on Facebook and telling stuff about themselves. That is such a massive distraction.
Speaker 2: The world's worst data breaches occur at badly administered databases. We've had one in Australia recently where over 50% of the adult population has had their driver's licenses and Medicare numbers and residential addresses stolen. That's through no fault of their own. I hate to use the old cliché of the older generation, but your grandmother might never have gone on the internet in her life; if she's used a credit card and she's opened a bank account, her details are in a database somewhere, and they're going to be breached. It doesn't matter what she does. So I'm really, really concerned about this victim blaming. We're not thinking intelligently about how data flows; it's not to do with promiscuity all the time. So what can you do as an individual, what should you do as an individual to take charge, and where's the dividing line between personal responsibility and civic accountability?
Speaker 1: And there aren't very many tools that have been put into the hands of individuals to manage their data. I know most of the folks listening to this are technically sophisticated and are making choices about whether to use ad blockers and password managers and the rest of the tools that are available. But those tools are still hard to use, they often degrade the experience, and they're brittle; they break easily.
Speaker 2: Imagine if car safety were done entirely on an aftermarket basis and we had to install our own seat belts, or if airbags were optional. We need to treat this much more seriously, as a matter of public safety.
Speaker 1: I'm an admirer of the European Union, of its ability to herd the cats, and of its GDPR, the General Data Protection Regulation. They just slapped a $1.3 billion fine on Meta for hauling data about EU citizens back to the US and even sharing it with US stakeholders. So someone out there is taking this seriously. Silicon Valley obviously has a huge focus on technology, on the belief that technology is the answer, the cure. But you and I agree that regulation, and there's nothing like a regulation to move a market, has a real role here. We'll be talking later on about how regulators can expand their vision as to what they can do and what they ought to do with respect to data.
Speaker 1: Okay, here's another one I think is really interesting: data ownership. Meta and Google have all this data about me. Whose data is it? Is it mine? Is it theirs? Is it theirs because I'm using their tools and they were free to me? Yeah, I'm pretty sure that in the Ts and Cs, the terms and conditions that I didn't read, it says it's their data, corporate data.
Speaker 2: I think, without being cynical: what answer do you want? The reason we ask the question about data ownership is that there's an intuition that if we did own our data, then we would have more control over it and we would get better privacy as an outcome. But wow, that's the intuition trap. Let's just work through it. How much data about me is flowing behind my back, sight unseen? Do I really want to see all of that, and own all of that, and have a say in every single thing? We fixate on the abuses of data, but so much of this data flowing behind our backs is in our interest.
Speaker 2: I mean, I've had some complicated hospital encounters over time. You spend a couple of days in hospital and you're generating test results, many different specialists are seeing you, referrals are going back to your GP, your family doctor, insurance claims are being made. There's a ton of stuff going on in my interest, and I trust the system by and large. I'm well and healthy, I'm in a lucky demographic, I'm lucky by birth, but wow, there's so much data flowing that's in my interest, and I just need to trust the system.
Speaker 1: I was just going to say that a lot of that data is put in place by the system to facilitate those interactions between you and the system.
Speaker 2: Yeah. So here's a good research question; I would like an assistant to work this out. How much data about me, in terms of megabytes, is produced entirely behind my back, entirely automatically, by systems? I bet it's 80 or 90%. So how do I own that? And the other thing about intellectual property: if I'm a data scientist and I've come up with a really clever algorithm, a good algorithm for predicting health outcomes based on our digital breadcrumbs, I might do a PhD on this thing. It might be really, really smart, groundbreaking work.
Speaker 2: Who owns that? As the author of the algorithm, I'd kind of hope that I had a say in it. And yet, if the algorithm is generating data about individuals, then they absolutely should have a say too, and they should be protected in what happens with the personal data that's synthesized about them. So it's not as simple as anybody owning it. The final thing I'd say on this point, and again, I'm not a lawyer, is that most privacy law I come across around the world, including the GDPR, doesn't even use the word "own". The GDPR, behind that fearsome $1.3 billion fine against Meta, is serious stuff, and the word "ownership" does not appear anywhere in it. So you can get good privacy outcomes, assuming we think the fine was a good thing, without worrying about who owns the data. That's a paradox; that's counterintuitive.
Speaker 1: So let's move on to some use cases. This one I'm going to take, because I want to talk about the use case that leaps to mind first for me, and I'm going to explicitly refer to it as the use case of identification. This is the process I need to go through to know who I'm dealing with online, and similarly, how I identify myself to someone on the other side of the transaction who might be taking...
Speaker 2: Hang on. You want to identify the merchant that you're doing business with, and not just the other way around?
Speaker 1: I know it's a radical idea, Steve, but there are such things as phony websites. I think it's fair for both parties to have a pretty good idea about who they are working with, and that's the identification use case. It's been addressed by the identity industry, which gives us sets of tools for different risk areas as well as for specific use cases. But there are a lot of different use cases, or subcategories within identification, that are specific to the transaction context. Healthcare has its own sets of risks and its own identification needs.
Speaker 1: Obviously, government benefits is another. Here's one, and it's kind of sad and pitiful that I bring it up, but think about university and professional credentials, the sharing of them, and the knowledge and certainty about them. We happen to have an idiot congressman in the US who was elected on the strength of claims that he went to this university and was on that sports team, and none of it is true. Wouldn't it be interesting if we had the ability to validate those identifiers being presented, those attributes being claimed? So identification is a huge one. And then there's one that, for me, frames a lot of my Lockstep work, and that's risk ownership in a transaction. In a digital transaction, I have a lot of sympathy for the party that's taking on the risk.
Speaker 2: Now, who would that be, typically? This would be like a merchant, when you present your credit card.
Speaker 1: Absolutely. The merchant's got a bunch of tools they use, some from the identity industry, some from the payments industry. And it's not just technical tools; there are also regulatory ones. One of the things the merchant gets at the physical point of sale, when it says "approved" on the terminal after you tap your card or your phone, is a signal that they have a guarantee they are going to get paid. Unless something goes wrong, and then there are liability rules around that which help satisfy the parties involved. So, again, it's not just technology. That's the merchant case. Now, if I'm using one of these modern fast payment systems, Steve, and I'm sending money for the first time to someone, this is where you can use a phone number as the destination.
Speaker 2: I'm going to pay somebody by their phone number in my app.
Speaker 1: Exactly. And what if I've never sent money to that individual, and maybe I don't know that individual that well? By the rules of these fast payment systems, I send the money to that phone number or that email address, and generally speaking, the rules say that when I hit submit, it's on my head. If I send it to the wrong party, or send the wrong amount, the banks and the system that facilitates this are not responsible for making me whole. These are authorized push payment systems. And of course, as human beings, we've always been subject to scammers, and scammers love these tools. I mean, what could be better than that?
Speaker 2: Yeah, "Hey Mom" fraud. I mean: "Mom, wire me that money. I just need $2,000. To this (somebody else's) phone number, please."
Speaker 1: Exactly. And the account holder's financial institution, or the third-party system that provides this service, isn't obligated by the terms and conditions of my relationship to reverse those charges, unlike the card system, one of whose attributes is the promise of being able to unwind and charge back a fraudulent transaction, for example.
Speaker 2: And to me, I just love the fact that security is more than just the cryptography and the code and the ones and zeros. I think your average bank account holder would feel that security means the system is going to look after them if, through no fault of their own, they fall victim to something that results from a weakness in the system. A broader view of security is what we're looking for here: it's about the rules. For that reason I find that the credit card system is, in a sense, more secure than the authorized push payment system, and that's got nothing to do with the cryptography or the encryption or the end-to-end whatever; it's got to do with the rules.
Speaker 1: I will push back on that a little. I do think there are situations where a sender who has been defrauded could quite legitimately own that liability, where the financial institution or the operator doesn't have to make them whole. That said, I think there's room for better use of data to communicate with the sender: for example, the history of a mobile number or an email address and the underlying account. I don't need all of that information, but give me some signal about the behavior of that destination account.
Speaker 2: It all comes down to data. It all comes down to the data and the quality of the data that's used to make those decisions, those fateful decisions to send a couple of thousand bucks to a phone number.
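[Editor's note: one way to picture the signal George is asking for. Before releasing an authorized push payment, the sending app could surface a coarse warning derived from the destination alias's history. A minimal sketch follows; the fields, thresholds, and directory record are all hypothetical, standing in for whatever signals a real payment network might share.]

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AliasRecord:
    """What a (hypothetical) directory might know about a phone or email alias."""
    alias: str
    account_opened: date    # when the alias was bound to the receiving account
    rebinding_count: int    # how often the alias has moved between accounts
    disputed_payments: int  # prior payments to it reported as fraud


def risk_signal(record: AliasRecord, today: date) -> str:
    """Turn the alias history into a coarse warning shown to the sender."""
    age_days = (today - record.account_opened).days
    if record.disputed_payments > 0:
        return "STOP: this destination has prior fraud reports"
    if age_days < 30 or record.rebinding_count > 2:
        return "CAUTION: new or frequently re-bound destination"
    return "OK: established destination"


# Example: a number re-bound days ago, typical of a "Hey Mom" scam setup.
suspect = AliasRecord("+15550100", date(2024, 5, 1), rebinding_count=3, disputed_payments=0)
print(risk_signal(suspect, date(2024, 5, 10)))  # CAUTION: new or frequently re-bound destination
```

The design choice here matches the conversation: the sender never sees the raw history, only a signal derived from it, so the data quality of the directory is what makes the warning trustworthy.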
Speaker 1: Right, well, look, Steve, let's leave it there for now, and we will return in the next episode. I think we're going to call this one Many Moving Parts, Part One, and the next one will be called Many Moving Parts, Part Two. After that I think we'll just throw up our hands, admit that it's very complex, and accept that we need a lot of episodes to paint the full picture.
Speaker 2: Reasons to be cheerful, my friend. We will find other people to talk to, and this won't all be doom and gloom; there are some lights on the horizon, bright, shining lights on the horizon. I'm optimistic.
Speaker 1: I am too, I am too, knowing full well it's going to take a lot of work. Amen to that. All right, with that, thank you all for listening. We really appreciate your time and attention; we know how valuable it is. By all means, please go to lockstep.com.au if you'd like to read more about what Lockstep does. Steve and I have put in place a model that we think is a way for data quality to be curated more efficiently than anything out there in the world thus far. But this podcast is not about that. This podcast is really about stimulating the conversation around those moving parts we keep talking about, and about how industry and government come together. I think that's where the bright lights you're talking about are, Steve. Indeed. All right, with that, thanks again very much. We'll see you next time.
Speaker 2: Cheers, George. Here's to better data.