Beginner’s Guide to Voice Biometrics: Transforming Call Center Security and Customer Experience
Voice Biometrics uses the unique properties of a speaker’s voice to confirm their identity (authentication) or identify them from a group of known speakers (identification) for fraud prevention. Voice Biometrics technology and its application in contact centres have evolved rapidly over the last few years, and it is now an incredibly effective way to enhance security, and improve customer experience and efficiency.
From this video, you will learn about the fundamentals of Voice Biometrics, its advantages over traditional authentication methods, common use cases, future trends and considerations for implementation.
Matt is the author of “Unlock Your Call Centre: A proven way to upgrade security, efficiency and caller experience”, a book based on his more than a decade’s experience transforming the security processes of the world’s most customer-centric organisations.
Matt’s mission is to remove “Security Farce” from the call centre and all our lives. All organisations need to secure their call centre interactions, but very few do this effectively today. The processes and methods they use should deliver real security appropriate to the risk, with as little impact on the caller and agent experience as possible. Matt is an independent consultant engaged by end-users of the latest authentication and fraud prevention technologies. As a direct result of his guidance, his clients are some of the most innovative users of modern security technology and have the highest levels of customer adoption. He is currently leading the business design and implementation of modern security for multiple clients in the US and UK.
[00:00:00] Matt Smallman: Hi, good afternoon everyone, and- and thank you very much for joining us. Uh, we have quite a lot of people on the call today, which is really great to see. So, my name is, uh, Matt Smallman. I’m the author of, “Unlock Your Call Center,” which is tactically positioned just here. Uh, and I’m also the founder of the Modern Security Community. My work is helping organizations to improve the usability, efficiency, and security of their call center security processes.
[00:00:23] But I’m also, [laughs], just really, really passionate about removing those time-consuming, frustrating, and often pointless, security processes that- that all of us experience on a, on a daily basis, uh, which is one of the reasons for- for starting the community. Uh, I’m gonna be joined this afternoon by, uh, Ian McGuire from Nuance Communications, who’s gonna be, uh, helping us, uh, present the Beginner’s Guide to Voice Biometrics. Um, and this has been a- a really, uh, interesting topic that lots… has seemed to have gathered an awful lot of interest, and it’s something that I do with every client that I work with before we start any engagement, so I’m delighted to be able to bring it to the- the community session today.
[00:01:00] Before we kick off, though, I’d just like to introduce, uh, Ian, who should come up, uh, here very shortly. Good afternoon, Ian, thank you so much for joining us.
[00:01:07] Hi there-
[00:01:08] Uh, I- I- I-
[00:01:09] … no problem. Quite-
[00:01:10] … Ian and I first met, uh, almost a decade ago, uh, in Glasgow, and you might not be able to tell that from his accent, uh, when I was working for, uh, as a, for an end user client, uh, and, our orbits have continued to, um, cross, and paths have crossed over the last few years. Um, Ian, do you just wanna give yourself a- a quick introduction to the group?
[00:01:30] Ian McGuire: Yeah, no, thank you for that. Yeah, so,
[00:01:32] um, yeah, a decade ago Matt, uh-
[00:01:34] [laughs].
[00:01:34] … we were both less gray then, weren’t we? Um, yeah, so I’m a Fraud and Biometrics Specialist at Nuance, so I’ve been involved in just about every single, uh, voice biometric deployment that Nuance have done in the UK over the last decade, and, uh, quite a few in the Middle East, and in Africa and, uh, various other parts of the globe.
[00:01:53] Um, although predominantly I get involved in doing authentication solutions, [laughs], because in the UK we seem to have more than our fair share of fraud, um, I’ve, eh, de facto, I’ve become a bit of a fraud expert, so I tend to get pulled into talk about fraud and advise on counter fraud measures for other parts of the world who might not normally have the same levels as we have here, so don’t have the same levels of experience as we have, and, uh, like Matt, I share a passion for trying to… for this technology, and actually I too- when I used to manage developers, I used to say, uh, “Imagine it’s your mum,” you know, so whenever somebody’s using a system, “Imagine it’s your mum,” and you don’t wanna mess it up for your mum, you want it to make life easier for your mum.
[00:02:36] So my, I, um, my view on the world, if you like, is to try and make life easier for people like my mum. If my mum can get through fine, then I’m doing a good job, that’s my mantra.
[00:02:47] Matt Smallman: Th- thanks, [laughs], very much Ian. Uh, it just reminds you, m- my motivation is- is obvious- is slightly different to that, it’s to stop my mother-in-law complaining at me about it, but there we go-
[00:02:56] [laughs].
[00:02:56] … it’s a- a similar relationship.
[00:02:58] Ian McGuire: Well, my mum’s quite scary, so maybe we’ve similar sit- situation then.
[00:03:02] [laughs].
[00:03:03] Matt Smallman: Great, so, um, th- this afternoon’s session then i- is- is entitled, “A Beginner’s Guide to Voice Biometrics.” Um, we are gonna, we’re gonna start off right at the beginning with some fundamentals, and we’re- we’re gonna start unpeeling the onion, uh, to what voice biometrics is, and- and how it works beneath the kind of surface, because, er, to some degree this technology can seem like magic, but for those of us who are considering implementing it in our organizations, or helping other organizations deploy it, I think, it’s really important to have a level of understanding about what’s happening underneath the hood, because there are a variety of decisions and choices that need to get made when we’re implementing this technology.
[00:03:40] So, that’s really the purpose of this afternoon’s session. We’re gonna look at the fundamentals, we’re gonna talk about the challenges in call center security, and why voice biometrics helps address those. We’re gonna look at some use cases, and talk about the implications of those, uh, and then we’re gonna look at some implementation considerations. Uh, I would encourage you, if you have questions… as we go through the session to put them into the chat or Q&A feature. Uh, if you put them in the chat, then everyone else on the call can see whatever name you’ve, um, had assigned by Zoom, or you’ve- you’ve put into Zoom.
[00:04:11] If you use the Q&A feature, then only I can see your questions and answer. If the question’s a little bit more sensitive, and you want me to kind of position it differently, then please use the Q&A feature. We will save those up until we get to the penultimate section, so that we- we cover the majority of the ground before we go into those, but this will definitely be better if you’ve got specific questions that you want answered as we go through.
[00:04:32] Matt Smallman: So- so without further ado, let’s look at a bit of a, bit of a definition, very simply, I think, most people on this call will already understand this, but voice biometrics is using unique properties of a speaker’s voice to either confirm their identity, which is what we call, “Authentication,” or to identify them from a, a kn- a group of known speakers, which is what we call, “Identification,” or, and we will come back to the distinction, uh, between those two pieces later.
[00:04:55] Matt Smallman: So, that’s all very well, but what does that actually mean beneath the surface, uh, eh, and I find the easiest way… because for some reason, from about aged eight or nine, we all have this inherent understanding of how a fingerprint works, whether it’s Sherlock Holmes novels, period dramas, modern detective shows, or Hawaii Five-0, which is my son’s current favorite, for some reason the fingerprint and how it works i- it’s part of our innate understanding. So, let’s just look at that as an example, and then we’ll use that to build a framework to help us understand voice biometrics and how that operates in a little bit more detail.
[00:05:36] So, the first step, when the detective finds the murder weapon or appears on the crime scene, is that he needs to find or observe and locate the fingerprint. When they’ve done that, they need to dust it for, dust it… to put those, um, and put their sellotape on it, or as Hawaii Five-0’s UV-powered light, uh, and they then have, um, a representation of the fingerprint. It’s not a perfect version of the real person’s fingerprint, but it’s good enough, and then they need to use that, and they need to compare it with something, either with a list of existing suspects, or with a large database of, uh, known criminals.
[00:06:12] Eh, and when they’re doing that comparison, they’re not necessarily doing a comparison of every single photographic detail of the fingerprint… they’re looking for some key attributes like the positions of the- the center of the loop, like the number of, um, rings in, uh, in those loops, the br- the way in which those break up, uh, and those features were codified, uh, I think, by a- a Scotland Yard detective, uh, more than a century and a half ago, and there’re, uh, before, be- between 40 and 50 really key attributes, and that gives you some kind of index to look at the fingerprint.
[00:06:45] Once you’ve got that fingerprint, and I’ve compared it with my records, and I’ve identified somebody who I have reason to believe, um, was the murderer, or was at the crime scene, I can then investigate further, and it becomes part of the evidence, or it could become part of the evidence to prosecute that person, but it’s never gonna be the whole evidence, because there is still some risk that it is not that right person. It is still indicative, uh, and there are still some challenges with it, so when it gets to the court, it will be considered amongst all of the other evidence to decide if given the- the weight of the crime, whether it is sufficient in order to prosecute and to find that person guilty.
[00:07:22] So, I think, we all understand how a fingerprint works, and, er, how does that then apply to- to voice biometrics?
[00:07:28] Well, we have exactly the same set of processes going on, and we’re gonna dig into things in a little bit more detail, but I break them down into these four steps: we need to observe, and fortunately in the case of the phones- phone and the call center, we have a great observation device built into the u- device that the customer’s already using. We need to extract the key features from that voice in order to compare with an existing voice that we have on file, uh, and given that comparison make a decision as to whether or not we’re comfortable enough to do what the customer is asking us to do at that time, or to take the action that we might want to take.
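The four steps described above (observe, extract, compare, decide) can be sketched roughly in Python. This is only a toy illustration: the function names, the "features" (average level per frame), the cosine similarity score and the 0.95 threshold are all assumptions made for the example, and no real voice biometric engine is implied to work this way.

```python
import math

def observe(raw_samples):
    """Step 1 (observe): capture the audio. Here 'audio' is just a list of numbers."""
    return list(raw_samples)

def extract_features(samples, frame=4):
    """Step 2 (extract): reduce the audio to a small feature vector.
    Toy stand-in: the average absolute level of each frame."""
    frames = [samples[i:i + frame] for i in range(0, len(samples), frame)]
    return [sum(abs(x) for x in f) / len(f) for f in frames]

def compare(features, voiceprint):
    """Step 3 (compare): score similarity against the stored voiceprint
    using cosine similarity (an assumed, simplistic scoring method)."""
    dot = sum(a * b for a, b in zip(features, voiceprint))
    norm = math.sqrt(sum(a * a for a in features)) * math.sqrt(sum(b * b for b in voiceprint))
    return dot / norm if norm else 0.0

def decide(score, threshold=0.95):
    """Step 4 (decide): accept only if the score reaches the chosen threshold."""
    return score >= threshold

# Hypothetical data: one enrolled voiceprint and two incoming calls.
enrolled = extract_features(observe([1, 2, 3, 4, 4, 3, 2, 1]))
same_caller = extract_features(observe([1, 2, 3, 5, 4, 3, 2, 1]))
other_caller = extract_features(observe([9, 9, 9, 9, 0, 0, 0, 1]))

same_accepted = decide(compare(same_caller, enrolled))
other_accepted = decide(compare(other_caller, enrolled))
print(same_accepted, other_accepted)
```

With these toy numbers the first call is accepted and the second is rejected; the point is only the shape of the pipeline, not the scoring method.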
[00:08:01] Matt Smallman: So, we’re gonna look at each of those in a little bit more detail. We’re gonna start off with, uh, ob- observation then. So, with all these things, they’re never quite as simple as you might, uh, imagine. First, let’s look at this from the, from the start at the beginning, uh, the- the purple head on the, on the- the- the left of your screens, um, is the speaker, and on the right, the magnifying glass is where we’re extracting those features, that’s our call center, that’s our voice biometric application, that- that’s- that’s- that’s the key, that’s the feature extractor.
[00:08:30] In practice, that voice is going through a microphone, uh, and in today’s modern mobile phones, it may go through an amount of signal processing before doing that. That microphone is also picking up, um, elements of the real world that are taking place around you, it’s then transmitted over the cell network, which may change the way in which the- the sounds, uh, the signal is, um, processed. It’s then received in your corporate network or your, in your phone system or wi- by your, um, contact center provider, and again some more change might take place, before it finally arrives, uh, at the point to detect, at the point of features being extracted.
[00:09:03] Um, now I am really pleased to say that from when I started this work, and probably when- when- when I first met, uh, Ian more than a decade ago, this- this was a really challenging, uh, situation, because, um, the quality of the device being used was pretty poor, um, phone networks were pretty inter- were more intermittent than they are now… I still have some problems here, uh, and corporate networks took all sorts of shortcuts, but I- I don’t know Ian, in your most recent work, how- how have you found the, kind of the quality of audio that you’re getting into your voice biometric systems at the start?
[00:09:35] Ian McGuire: Well, it’s really interesting actually because when I… so, my background… I- I’ve been [doing] voice biometric solutions and speech recognition solutions for over 30 years, and I remember when I was looking at, um, systems in the ’90s, when the mobile phone was just starting, the quality of the mobile phone audio capture was initially [poor] for both recognition and voice biometric systems, and what’s happened in the last few years… really since the iPhone, I guess, uh, when the, er, phone, the- the smartphone really came around and became something that people relied upon in their day-to-day use, the quality of the components went up… the price went up as well-
[00:10:14] [laughs].
[00:10:14] … but the quality of the components went up, and the- the microphone in a modern, uh, iPhone, or Samsung, or whatever is a phenomenal bit of equipment, and as you say, it goes through some signal processing carried out within the device that will remove background noise… it will try its best to give you a very cl- clean, high-quality signal, and it’s probably now at a point where the signal that the smartphone, you know, your- your iPhone or whatever is capturing is better than the signal being transmitted across the telephone network, so the telephone network reduces the bandwidth to 8 kHz, uh, whereas if you get it straight from source, from the mobile phone, it’s gonna be much higher fidelity.
[00:10:53] So, we definitely see that in the performance of both speech recognition and voice biometric systems that the quality of the signal captured is greater fidelity, and that means you’ve got more data… more accurate data to deal with when you’re processing later on. I think, the other thing I would just highlight is that for voice biometrics the audio capture device is the microphone, and I think, the ubiquity of the microphone, and, you know, the relative cheapness of a microphone… which allows it to be ubiquitous, means that you can apply it everywhere, so we- we’re probably, most of us on this call are sitting in front of a laptop, which will have a microphone and speakers in front of it, our mobile phone has it, tablets have it.
[00:11:35] The microphone is ubiquitous, and that makes it a great choice for, you know, voice biometrics.
[00:11:43] Matt Smallman: Yeah, no, a- absolutely, and I think those, um, the quality of devices, and in fact you’re- you’re wearing AirPods, which was against my recommendations for these webinars, because they have a- a tendency to go, to go wrong, but I think we, just the- the multiplicity-
[00:11:55] [laughs].
[00:11:55] … of the different ways in which microphones and the quality of that audio, uh, get proce- uh, has certainly, uh, improved. Uh, I- I think, the only… the thing that I’m kinda most excited about is probably the move to services like, uh, HD Voice, that I know in the, in the US a number of carriers are starting to pick up, and whilst they’re really great from a consumer perspective, they are actually also really good from a voice biometrics perspective, because we’re not doing that kind of final, um, switch down to eight kilohertz when we send it off the device, so again, I’m- I’m excited about the potential of those, and what that might do for voice biometrics performance in the future.
[00:12:27] Yeah.
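To illustrate why the step down to eight kilohertz loses information, here is a small sketch. It assumes the 8 kHz figure mentioned above is a sampling rate, in which case only frequencies below half that rate (4 kHz) can be represented. The function names, the tone frequencies and the 16 kHz "wideband" rate used for contrast are illustrative assumptions, not details taken from the session.

```python
import math

def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2

def sample_tone(freq_hz, sample_rate_hz, n_samples):
    """Sample a pure cosine tone at the given rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def same_samples(a, b):
    """True when two sample sequences are numerically indistinguishable."""
    return all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b))

narrowband_hz = 8_000   # assumed telephone-network sampling rate
wideband_hz = 16_000    # assumed higher-fidelity rate, for contrast

# A 6 kHz component is above the 4 kHz limit of 8 kHz sampling,
# so once sampled it looks exactly like a 2 kHz component (aliasing).
lost_at_8k = same_samples(sample_tone(6_000, narrowband_hz, 16),
                          sample_tone(2_000, narrowband_hz, 16))

# At 16 kHz the limit is 8 kHz, so the two tones stay distinct.
lost_at_16k = same_samples(sample_tone(6_000, wideband_hz, 16),
                           sample_tone(2_000, wideband_hz, 16))

print(nyquist_limit(narrowband_hz), lost_at_8k, lost_at_16k)
```

In practice a telephone network filters out the higher frequencies rather than letting them alias, but the consequence is the same: whatever the voice carried above the limit is not available to the feature extractor.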
[00:12:28] Matt Smallman: If we move onto then to feature extraction, er, eh, I’m not a speech scientist and I, and I don’t want to dwelve in- delve into this- this too much, um, there are two real things that drive the variation in human speech, one are physical characteristics, that’s the- the build up of our body, and that’s very much driven by genetics, so the length of your vocal tract, the size of your chest cavity, lung capacity, position of teeth, there are, there are thousands of these little permutations and combinations that influence the- the sounds that I make when I try to sink- s- when I try to say different words, and I’m struggling with those words, [laughs], today… apologies, um, and the rate of change of frequencies, et cetera, between those.
[00:13:05] When a, when a voice biometric system is looking at those features, it is not creating some measurement of my, um, mouth cavity size. It is not doing that, it’s looking at the impact of those things, um, in many cases, we don’t care what is driving those, uh, that variation, and i- in some of today’s latest machine learning, eh, uh, and, um, versions of this technology, um, in some cases we don’t even know, the system is making the decisions about the… which things are most differentiating, but the physical characteristics, and those being related to your genetics are important, because tho- for those people for which you share genetics are going to share some similar traits with you. So, they are not… whilst they may not be identical to you, they are going to be more similar to you than the population as a whole.
[00:13:51] And the same goes for these behavioral characteristics… these are things that you learn, uh, and they are some- and whilst they may not change as quickly as your physical characterist- or sorry, they are more likely to change quickly than your physical characteristics, uh, I do a lot of work with clients, uh, in- in the US, and I’ll often spend a- a week or two, uh, in the US every year, and my… what I’d like to think of as rather defiant- refined English accent, um, uh, picks on, picks up elements of whatever location I’ve been working in, and my- my family have struggled to- to recognize me when I come back through the door, [laughs].
[00:14:23] Fortunately it- it reverts back over time, but again, some of those things are learnt, they can, some can change quickly, and some take longer to change, and again, people for whom you share demographics and other similarities are gonna have similar attributes of those, but because of the multiplicity of these things that drive how your voice actually sounds, um, actually the- the comparison i- is- i- is fairly easy. I- I don’t know Ian, you- you always have the- the Glaswegian example in- in this case… I don’t know if you want to add anything e-
[00:14:50] Yeah, [laughs].
[00:14:50] … [laughs]?
[00:14:50] Ian McGuire: Well, it was interesting, I was thinking about you modifying your accent when you- you’re speaking with Americans and so on, um, because Americans love a Scottish accent, I just become-
[00:15:00] [laughs].
[00:15:00] … more Scottish when I’m in their company.
[00:15:01] [laughs].
[00:15:02] Um, and we, the- the- the- the example I tend to use with people is that, we get, we get used to what we hear frequently, right, so, um, my Glaswegian accent is maybe less common than many of the other accents people would hear, so imagine we’re on a conference call, uh, we might be say 20 people on it, as soon as I s- open my mouth and start talking, the Glasgow accent kicks in, and people think, “Well, that’s Ian.” Now that- that simple decision-making process only works if I’m the only Glaswegian on the call, you know, I, when w- I first met Matt, and, uh, he- he was working-
[00:15:37] [laughs].
[00:15:37] … up in Glasgow, we were lots of Glaswegians in a meeting there, so suddenly my accent was of no use at identifying me from the other individuals, so yes, things like accent can be important, but they’re not the be all and end all, and, uh, you have to… our systems are designed to pick out the features that are most relevant for that calling population to allow you to identify particular individuals… and being Scottish obviously helps, you know?
[00:16:04] [laughs].
[00:16:04] Matt Smallman: And- and I- I think, well, it- it’s- it’s important to make those distinctions, like there are always variations, uh, and as we think about how we make a decision later on, what we have to decide is whether those variations are big enough to be worth making a different decision, and we will look at that in- in just a second.
[00:16:21] Ian McGuire: I do think, oh sorry, I- I meant, uh, I meant to say that… you made a really, really good point actually that all of these, uh, things, like lung capacity, eh, eh, length of your vocal tract, all of these things are… they’re the source of the characteristic that we end up measuring… we’re not measuring those directly, so quite often we’ll get people asking, “Oh, can you tell, you know, how many teeth he’s got,” or, “What size his lung capacity is,” and the answer to that is, “No,” the system’s not designed to do that, it’s looking for the- the, uh, the side effects, if you like, the- the knock-on effects of all of those different physical and behavioral characteristics.
[00:16:56] It’s not measuring those per se, and, uh, I think, that’s really quite important to- to emphasize as you did earlier, Matt.
[00:17:02] Matt Smallman: Al- although I do think, uh, and slightly as an aside, there- there is some interesting work taking place in the medical field right now to understand, uh, whether or not certain conditions will have detectable characteristics in people’s voices, and therefore whether that can be used-
[00:17:17] Yeah.
[00:17:17] … for detection. I- I think the jury’s still out quite a lot on- on- on those, but, uh, it is an interesting application.
[00:17:23] Matt Smallman: Just- just moving onto- to think about comparison then, so now we have the features, we need to compare them, er, and the first thing to be really clear about with comparison is, we need something to compare with, er, and this is what leads to the enrollment requirement for voice biometrics, whether that is enrolling a genuine customer, and creating a… what we often call, “A voiceprint,” for them, or whether that is creating a voiceprint for a known fraudster to add them to a watch list.
[00:17:50] We need to create that kind of source of base knowledge… the base truth about who this person is, and that enrollment process is something we’ve talked about previously on the Modern Security Community, and because it is so key to the success of voice biometrics, I’m sure we’ll come back to it many times, but just looking first off at the- the mechanism for comparison, we- we often see these two different methods in use, and we’ll- we’ll talk about how these apply to use cases in a minute, but the first is text dependent, and that is comparing the say, the way in which I say the same thing each time, and typically that’ll be some form of static passphrase, or random number challenge, where we have recorded and extracted a voiceprint for those specific words on previous occasions.
[00:18:33] Uh, and the second is text independent, where we are s- listening to the way in which people generally speak, and I always forget if I’m wrong, but there, is there, is it 43 phonemes in the English language or something like that, um-
[00:18:45] Ian McGuire: Uh, pho- phonemes, that, yeah, phonemes rather-
[00:18:47] … phonemes, yeah. Yeah.
[00:18:48] … phonemes, yeah, there- I think, there’s 42, 43, something like that, yeah.
[00:18:51] Matt Smallman: So, it’s not necessarily the words they’re saying, it’s the sounds that make up those words, and a- as you can imagine, the difference between those two, i- is, i- is computational, so if I’m always saying the same thing, I’m always comparing it with the same thing, and that’s a slightly easier task than figuring out what I’m saying, and making sure I’m comparing it with the way in which I would normally say that kind of sound, so that’s the difference between text dependent, and text independent, but as we’ll see a bit later, um, those technologies have come on significantly, uh, in the, in the last two, three years, let alone the- the five, 10 years where I’ve been looking at this space.
[00:19:24] Matt Smallman: The second piece of comparison is to decide whether we’re really talking about authentication or identification, and that’s what we talked about right at the start. Authentication is a one-to-one comparison, it’s saying, “Does this speaker that I’m listening to now sound like the speaker they are claiming to be in my database,” and that- that’s one-to-one, and that’s- that’s quite a relatively easy computational, um, task, versus identification, which is figuring out which of the potentially many speakers in my database this particular user is claiming to be today, and that is a far more challenging and error-prone potentially task than authentication, uh, and it’s why we see, um, authentication being the- the- the main use case for this, and that in most, y- well sorry, in most processes, um, an identification for a user is claimed by some other means before we attempt to use voice biometrics to authenticate that individual.
[00:20:18] There are exceptions to that, and we- we will discuss the- the fraud prevention use cases, uh, later. I do- Ian, did you have any- you- you have any interesting identification examples that you’ve seen?
[00:20:29] Ian McGuire: Um, so we’re in the- the fraud use cases, by far and away the biggest one that we’re seeing, those are t- when you’re trying to reduce the amount of fraud, if you can improve authentication, if you can strengthen the front door, then you’ll automatically get fraud reduction benefits, so having good authentication will m- minimize fraud. There are some situations where that’s just not practical, so new account sign up as an example, or talking with organization- organizations where they don’t talk with their customers that frequently, so the rate of en- um, enrollment is going to be much lower… in those situations then you’ll still want to be able to do some counter fraud, and being able to compare, uh, the callers to watch lists, and that inevitably means a one-to-many comparison, those are the use cases where the one-to-many really comes into play.
[00:21:14] We’ve had a couple of use cases where, let’s say you might have, uh, multiple individuals that are associated with an account, then if that number is quite small, and small probably means less than 10, or maybe less than 20, then we could do an identification task as well as an authentication task to say, “Okay, the telephone number indicates that it’s… that’s the telephone account,” and there, we know there’s five people in that household, “Let’s see which of the five it is,” so there are use cases where you can mix the identification with authentication, but in the vast majority of situations, people make a claim to their identity.
[00:21:51] You know, it’s, [laughs], very, very rare that people pick up the phone, dial somebody, and just expect to be identified automatically. You know, if you’re phoning a close family friend, or relative-
[00:22:01] [laughs].
[00:22:01] … then they might do that, but by and large you’re gonna have to tell somebody, “Hi, it’s Ian McGuire here,” so that’s your claimed identity, and then we can compare against it, so the one-to-one authentication, um, not only is it more accurate, but actually it tends to be the way we operate in- in our lives anyway. We introduce ourselves, and then we expect to be verified after that.
[00:22:23] Cool.
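The difference between one-to-one authentication and one-to-many identification discussed above can be sketched as follows. The voiceprints are made-up three-number vectors, and the names (`authenticate`, `identify`, `household`), the cosine scoring and the 0.9 threshold are all hypothetical choices for illustration rather than anything a real product is known to use.

```python
import math

def cosine(a, b):
    """Similarity between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def authenticate(sample, claimed_id, voiceprints, threshold=0.9):
    """One-to-one: compare only against the voiceprint of the claimed identity."""
    return cosine(sample, voiceprints[claimed_id]) >= threshold

def identify(sample, voiceprints, threshold=0.9):
    """One-to-many: compare against every voiceprint in the group and return
    the best match that reaches the threshold, or None if nobody does."""
    best_id, best_score = None, threshold
    for speaker_id, voiceprint in voiceprints.items():
        score = cosine(sample, voiceprint)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id

# Hypothetical small group, such as the people associated with one account.
household = {
    "alice": [0.9, 0.1, 0.3],
    "bob":   [0.2, 0.8, 0.5],
    "carol": [0.4, 0.4, 0.8],
}
caller = [0.85, 0.15, 0.35]    # sounds most like "alice"
stranger = [0.0, 1.0, 0.0]     # not close enough to anyone

claim_alice = authenticate(caller, "alice", household)
claim_bob = authenticate(caller, "bob", household)
who_is_caller = identify(caller, household)
who_is_stranger = identify(stranger, household)
print(claim_alice, claim_bob, who_is_caller, who_is_stranger)
```

Authentication makes a single comparison however large the database is, while identification makes one comparison per enrolled speaker, so both the work and the opportunities for a wrong match grow with the size of the group. That is consistent with the point above that mixing the two works best when the group is small.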
[00:22:25] Matt Smallman: Gonna move onto think about the, uh, decision-making now. So, th- this is the point o- after which we have made that comparison, and- and the outcome of that comparison is not a, “Yes,” or, “No,” it is a probability, because there is variation in everything that we do all t- all the time, so, my voice sounds very different… well, it doesn’t sound very different, it sounds different now than it did maybe half an hour ago, and it might sound a l- quite a bit more different than it maybe did five years ago, uh, so, that comparison is a probability… the result of that comparison is a probability, not the deterministic outcome.
[00:22:59] And as an organization we need to decide how confident we need to be, but fortunately, and- and this is a very generic example that you’ll see on the screen, the- the- the separation between genuine speakers and imposter speakers has improved dramatically over the last five to 10 years. Um, in- in this model, and- and this is no one’s model in particular, this is the one I use to help people understand this, say we have a score between nought and a 100 that represents the probability of this person being who they claim to be, um, as you can see, a 100 would effectively be, “This piece of audio, and this speaker sounds identical to the speaker who created the voiceprint,” and we never really sound identical, er, more than once.
[00:23:38] So, we can see that the ma- but the majority of people sound mostly like themselves most of the time, and that’s where you can see that peak of genuine speakers, that’s the frequency of people who score in that, uh, that level of score, and it’s pretty close to a 100, but there is a long tail of that down towards the zero line and at- at some point it intersects with the imposter line, uh, and that’s because we have colds, and colds do affect the way in which we speak. Uh, things happen to us, uh, people can have life-changing injuries, uh, people can be in loud and noisy environments when they’re attempting to authenticate.
[00:24:11] There’s all sorts of variation that may mean that they don’t score quite as highly on some occasions as they might otherwise do, uh, and that’s what creates this long tail of, uh, genuine speakers. Most imposters, most people trying to access that account on somebody else’s part sound nothing like the real person, uh, and they will tend to score very lowly, um, but again, there are, the- the closer those imposters are in terms of genetics, in terms of, uh, accent and upbringing and loc- locale, the more like the real person they start to sound, uh, and that’s where we see this kind of overlap in the center of our chart, oo, gotta press the right button, that’s where we see this overlap in the center of our chart where those two lines, um, intersect and- and cross each other.
[00:24:56] Matt Smallman: Now th- thankfully, like tho- the separation of those two has increased immensely over the years, and- and I didn’t used to have to kind of put this magnifying glass over the line, because the crossover was far more apparent than it is, uh, i- in this example. But at some point, we as an organization have to make a decision as to how comfort- how confident we- we need to be in order to do the thing the customer is asking us to do in the case of authentication, and we have to establish a pr- threshold probability, a threshold score where we are happy for, to consider that person a… to be who they claim to be, and beyond… underneath which we are not, uh, comfortable with that.
[00:25:32] And that creates these two error types. That creates the error type that we incorrectly reject the person who is who they claim to be [a false reject], or that we correctly, we incorrectly accept the person who isn’t who they claim to be, and that’s called a false accept, and this- this is what we’re trying to manage, uh, in the voice biometric process, to try and get this down to the minimal possible level, and when we think about performance in the con- in the context of voice biometrics, what we usually mean is, the level, the minimizing the level of false accept for the smallest possible level of false reject, so we will have an appetite potentially for false accepts, or for risk, uh, and we will try to minimize the level of false reject associated with that.
[00:26:10] Uh, and- and again, for… over the f- last five to 10 years, um, for example, we would, we would often think of a threshold criteria being somewhere with kind of one percent false accept for five percent false reject, er, and we are now at decimal places of that, which, uh, is significantly better now, it will vary by everyone’s environment and- and the rest, but, um, tho- those things have dramatically improved.
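As an illustration of the threshold decision described above, the following Python sketch counts the two error types at a few candidate thresholds. The scores, the 0-to-1 scale and the function name `error_rates` are assumptions made up for the example; real systems use their own score scales and far larger samples.

```python
from typing import Sequence


def error_rates(
    genuine: Sequence[float], impostor: Sequence[float], threshold: float
) -> tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) at the given threshold.

    A caller is accepted when their score is at or above the threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    false_accepts = sum(1 for score in impostor if score >= threshold)
    false_rejects = sum(1 for score in genuine if score < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Hypothetical scores: genuine speakers mostly score high with a long tail,
# imposters mostly score low with the occasional sound-alike.
genuine_scores = [0.95, 0.91, 0.88, 0.97, 0.62, 0.93, 0.85, 0.99, 0.90, 0.78]
impostor_scores = [0.05, 0.12, 0.30, 0.08, 0.66, 0.02, 0.15, 0.22, 0.10, 0.04]

for threshold in (0.5, 0.7, 0.9):
    far, frr = error_rates(genuine_scores, impostor_scores, threshold)
    print(f"threshold={threshold:.1f}  false accept={far:.0%}  false reject={frr:.0%}")
```

Raising the threshold in this toy data removes the false accept but starts to reject genuine callers, which is the trade-off the threshold is meant to manage.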
[00:26:34] Ian McGuire: Just very quickly Matt-
[00:26:34] Go for it.
[00:26:34] … when, before you move on from that slide, the- there’s, um, in some ways and we in the voice biometric industry created a wee bit of a rod for our own back when we introduced all of this concept, because we felt the need to explain how good the technology was, and to convince customers, our, you know, people that are buying our solutions that it does the work, so we had to explain some of this maths and so on. The reality is that all ID&V systems have a false accept-
[00:27:01] Mm.
[00:27:01] … and false reject, right, but most organizations never measure it, or don’t truly understand it, so if somebody was to phone up a- a telephone system linked to a pin, claim to be me, and enter the pin from- from my account, they would get access, and that would actually be, um, a- a false accept, but it’s not measured in any way, uh, shape or form, and if I phone up and forget my password, that’s a false reject, because you’re rejecting a genuine customer, so all systems have a false accept and false reject, and, um, it’s, you know, the bringing it to light is actually a good thing, because it increases our understanding of the problem and the situation, but when we’re comparing it to knowledge-based systems, which tends to be what we’re replacing, people don’t actually understand what the true false accept, false reject is, or was for those systems, so it can make it quite difficult really, because you’re not always comparing like for like.
[00:27:54] Matt Smallman: Yeah, I th- uh, we- we will definitely get to this point in a second as well, I have a slide that we could use to- to illustrate that. I think that’s a really- really interesting point, because we- we don’t often think about it in the same way, and- and there are some good reasons for that, as well, because, um, when someone forgets their passwords, the kind of the, “Yes, we decided as an organization they should have a password,” but there is some tendency for that person to think it’s their fault, um, but when they are using-
[00:28:17] Yeah.
[00:28:17] … their voice, they are using their voice, and there is every- the- the tendency is to blame the organization that fails to recognize them in that point, so there are some important kind of psychological features there, but the- I- I think that’s really important-
[00:28:28] Absolutely, yep.
[00:28:28] … and when we’re doing this comparison, like, I- I, it’s not uncommon for me to go to an organization and find that more than, more than five percent… sometimes more than 10 percent of callers are being restricted, at least, the service they can offer, because we haven’t managed to get them through an authentication process, that is a false reject in this context, and in the context of voice biometrics we’re often talking about, uh, an order of magnitude, smaller proportion of the user base. Uh, I just wanna highlight this point again, between these- these phrases, because you will hear… if you’re gonna get involved in voice biometrics in your organization, you will hear these terms again and again, and again.
[00:29:04] So, it’s really important to try and make sure you understand them at this point. So, this is the management consultant’s two by two matrix that I use on… in all of these cases, [laughs]. Um, and on the bottom we have whether it’s a customer or an imposter, and on the left-hand side axis, we have the, whether it’s rejected or accepted. So, we have the happy case situations, which is where a customer is accepted, and that’s the top left, and that’s a true accept, and we have the other happy case, where an imposter is rejected, which is a true reject, but always when we’re thinking about imposters though, we need to remember that not every caller is an imposter.
[00:29:38] In practice, very few of your callers are imposters, and- and even those imposters, even those who are imposters may not be malicious in their intent. Often, it is far easier for a loved one, or someone who has caring responsibilities for an individual to just pretend to be them to your organization, because there is no other way of accessing your services, or the process is quite challenging, uh, and therefore we do see not only malicious imposters, but a huge number of, uh, non-malicious imposters that might start to fall in that true reject category that might not previously have done.
[00:30:12] The two error cases then, we’ve talked about then, are clearly the customer false reject situation, where we incorrectly reject them, and remembering that there might be two to three th- hundred times more, um, real customers than there are imposters, or even 1,000 times more real customers than there are imposters, and on the right-hand side, the, uh, imposters being falsely accepted. When you add those buckets together then, what you can see is, if you, if you read left to right, in- in the reject bucket, whilst the boxes on my square are, the boxes on my square are the same size, of all of those people who fall out of the back of a biometric system who are rejected, um, the vast, vast majority of those will be genuine customers, uh, and I always encourage people to be very careful about their language, particularly because of those psychological aspects we mentioned at the start.
[00:31:04] I- I would refer those t- as a mismatch, they are not a failure. If you call that, “A failure,” or, uh, “An error,” or other types of negative language, then there is a tendency that your teams and your processes might start, might start to treat them, uh, differently, so I would always encourage you to think about that as a mismatch, and accept that a bunch of real customers are knowingly gonna fall up, fall into that- that group. It will be small, in terms of proportion of your total customer base, but don’t treat that group as if they are, they are, they are all bad guys.
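The point that most rejected callers are genuine follows from simple arithmetic on the base rates. The sketch below is a hedged illustration: it borrows the older "one percent false accept for five percent false reject" figures and the customer-to-imposter ratios mentioned in the talk, and the function name is hypothetical.

```python
def genuine_share_of_rejects(
    customers_per_impostor: float,
    false_reject_rate: float,
    false_accept_rate: float,
) -> float:
    """Fraction of all rejected callers who are actually genuine customers."""
    # Work per single imposter call: N genuine calls arrive for each one.
    genuine_rejected = customers_per_impostor * false_reject_rate
    impostors_rejected = 1.0 - false_accept_rate
    return genuine_rejected / (genuine_rejected + impostors_rejected)


# Ratios quoted in the talk; error rates are the older illustrative figures.
for ratio in (200, 300, 1000):
    share = genuine_share_of_rejects(ratio, false_reject_rate=0.05, false_accept_rate=0.01)
    print(f"{ratio} customers per imposter: {share:.1%} of rejects are genuine")
```

Even with much lower false reject rates the genuine share stays high, because genuine callers outnumber imposters by hundreds to one, which is why treating the reject group as "mismatches" rather than bad actors matters.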
[00:31:34] So that’s voice biometrics at a very high-level… it’s taken slightly longer than we might’ve hoped, but what I’m gonna look through now-
[00:31:40] [laughs].
[00:31:40] … is just some of those, uh, challenges we see in the call center. Um, this will be familiar to- to many of you, but I think, it’s a helpful, uh, analogy to- to- to go back through. We have seen for many years this trade-off between security and convenience, uh, and- and the diagram and the- the picture in the background is my favorite of all slides that I ever use. This bike is entirely perfectly secure… it is also entirely unusable, you cannot take that bike to the shops, just to nip out and-
[00:32:09] [laughs].
[00:32:09] … get some milk, but it is very secure. So, we always have this trade-off, yeah? If I wanted to make that slightly more convenient, I’d probably have to make it less secure, and we generally see this trade-off curve as- as- as you’ll see in the diagram, and- and historically as we’ve thought about building call centers, we’ve had to make choices like, th- there is a risk that somebody could claim to be somebody else, so we had to introduce some degree of security, uh, and we chose to do what was most convenient, stuff people remember, knowledge-based authentication typically, um, and that’s reasonably convenient, but not very high levels of security, particularly as the internet carries on and you can find out anything you ever wanted to know about me on LinkedIn or Facebook, and guess probably my cat’s name and its date of birth, and all sorts of things that might lead you to- to even a password, or at least like the static information you might need.
[00:32:58] So, many organizations recognize that that’s not as good as it could be, um, but what they did, uh, in order to, um, improve security, was in many cases trade off a bit of secur- a bit of convenience, uh, and- and here I’m thinking about pins and passwords largely… yes, they are more secure, because they are secret to some degree, although social engineering teaches us that they may not be as secure as- as we think they are, but they are significantly less convenient, both for us as an organization, and for our customers, because they need to remember those things. And in high-value use cases, er, we’ve found that those aren’t successful either, so we’ve traded off more convenience for a little bit more security.
[00:33:35] And I’m thinking here about things like SMS two-factor authentication or hardware-based authenticators, which are significantly less convenient but maybe provide us a little bit more, um, security. The- the security value however of these things, and the convenience… the perceived convenience of these is all reducing over time… if I can unlock my phone with my face without doing anything, then all of a sudden the pin or password that 15 years ago seemed perfectly acceptable suddenly I perceive that to be significantly less convenient, and at the same time fraudsters are using the kind of technologies that we use ourselves to improve efficiency in order to, er, erode the security value of these tokens.
[00:34:13] So, what we really need, and what voice biometrics provide us is- is a security to convenience trade-off that- that’s slightly different, so, we are moving the position and modern authentication, modern security methods move the position of this security convenience trade-off. We get higher levels of security at every level of convenience, and the difference here as well is that tunable threshold. I- I can choose where on that curve I want to operate and- and we may well come to this in another session, as we talk about tuning and calibrating voice biometric systems… this is almost exactly the curve that we show people to say, “Where do you want to operate,” for this particular type of transaction, “Where do you want to operate on the curve?”
[00:34:50] So, voice biometrics helps address those security and convenience challenges by giving us a new curve that we can dial up and down, and- and in practice they’re not nearly as comparable as this chart makes out, but I think it’s a helpful, um, illustration.
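One way to picture "where do you want to operate on the curve" is to pick, for each transaction type, the lowest threshold that keeps false accepts within the organization's appetite, since that also gives the fewest false rejects. This is a minimal sketch under stated assumptions: the scores, the transaction names, the appetite values and `pick_threshold` are all hypothetical, and it is not a description of how any vendor's calibration works.

```python
from typing import Optional, Sequence


def pick_threshold(
    genuine: Sequence[float],
    impostor: Sequence[float],
    max_false_accept_rate: float,
) -> Optional[tuple[float, float, float]]:
    """Return (threshold, false_accept_rate, false_reject_rate) or None.

    Candidate thresholds are the observed scores, tried from lowest to highest,
    so the first one that meets the appetite has the lowest false reject rate.
    """
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(1 for s in impostor if s >= threshold) / len(impostor)
        if far <= max_false_accept_rate:
            frr = sum(1 for s in genuine if s < threshold) / len(genuine)
            return threshold, far, frr
    return None  # no observed score meets the appetite


# Hypothetical scores and hypothetical risk appetites per transaction type.
genuine_scores = [0.95, 0.91, 0.88, 0.97, 0.62, 0.93, 0.85, 0.99, 0.90, 0.78]
impostor_scores = [0.05, 0.12, 0.30, 0.08, 0.66, 0.02, 0.15, 0.22, 0.10, 0.04]
appetites = {"balance enquiry": 0.10, "large payment": 0.0}

for transaction, appetite in appetites.items():
    print(transaction, pick_threshold(genuine_scores, impostor_scores, appetite))
```

In this toy data the lower-risk transaction can run at a threshold that rejects no genuine callers, while the higher-risk one accepts some false rejects in exchange for no false accepts.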
[00:35:03] Matt Smallman: The second point, uh, I want to make about this i- is related to efficiency, uh, and particularly, um, the discussion that- that Ian just raised about the- the reject rate and the performance of existing processes.
[00:35:16] This is the, uh, security path visualization that we use with- with all clients that you can find in the, in the book, uh, “Unlock Your Call Center,” and in fact we even have an online tool that you can use, and- and it’s incredibly helpful for helping organizations understand the performance of their security processes. It starts with 100 percent of calls arriving on the left, going through an automatic identification stage, moving at the top to automatic authentication, and then through to self-service on that pink A path, which is our happiest path, which from an organization’s perspective requires no agent activity, and in fact may even av- avoid the need for a call altogether, as customers can have their needs met entirely in self-service.
[00:35:55] Every time we’re unable to do one of those steps, whether that be identification or authentication, then we start to incur some, we- we s- we lose the opportunity, first off, to, um, self-serve that customer, to keep them within automation, and we most often have to take that customer to an agent, uh, in order to carry out some form of manual process, so that’s our manual authentication processes in the C and E path, and again, each of those also has reject paths. The- the false rejects that, um, Ian mentions in- in this chart sum to, um, er, almost 12 percent, 12 percent of total call volume, having identified them.
[00:36:31] So we- we at least know that this person is a real customer. Lots of organizations also have calls from people who aren’t customers, who are not yet customers, who don’t have identification, um, but 12 percent of calls in this case were, um, rejected because they couldn’t be authenticated manually, and each of those calls took time, effort and money, probably drove, uh, the customer to maybe visit, er, some physical infrastructure, or most certainly to call back, maybe to write in, maybe even to leave your organization altogether. So, all of those had a cost far beyond just the, uh, the agent handle time on that interaction.
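The security path view described above amounts to splitting total call volume into paths and adding up the reject branches. The sketch below shows that arithmetic only. The call counts are invented so that the rejects come to just under 12 percent, as in the talk; the A, C and E labels come from the talk, but the path descriptions and the split between them are assumptions, and this is not the online tool mentioned.

```python
# Hypothetical breakdown of 1,000 calls: (path description, calls, is_reject).
PATHS = [
    ("A: identified and authenticated automatically, self-service", 300, False),
    ("C: identified automatically, authenticated manually by an agent", 350, False),
    ("C reject: identified, but manual authentication failed", 70, True),
    ("E: identified and authenticated manually by an agent", 232, False),
    ("E reject: identified, but manual authentication failed", 48, True),
]


def reject_share(paths: list[tuple[str, int, bool]]) -> float:
    """Share of total call volume that ends on a reject path."""
    total = sum(calls for _, calls, _ in paths)
    rejected = sum(calls for _, calls, is_reject in paths if is_reject)
    return rejected / total


for name, calls, _ in PATHS:
    print(f"{calls / 10:5.1f}%  {name}")
print(f"Rejected share of total calls: {reject_share(PATHS):.1%}")
```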
[00:37:04] So, when I think about call center security performance, I always come back to these key dimensions, usability, efficiency, and security, er, uh, and that’s where voice biometrics is really helping, it is quicker, there is often not a lot… nothing for the customer to do, they just need to speak, uh, and- and it’s done. It’s easier, there’s nothing for the customer to remember, their retention rates are higher, and importantly it’s more secure.
[00:37:29] Matt Smallman: Ian has a fantastic, uh, version of why it’s more secure. Even though you might have some doubts, based on what we talked about- about genetic similarities and all the rest of it, uh, I think you, I think you’ll- you’ll- you’ll love this explanation that- that Ian’s got for you.
[00:37:44] Ian McGuire: Yeah, so what I’ve tried to show in this, uh, little diagram is how the risk versus the ability to impersonate for different actors, if you like, so let’s imagine, uh, we have a close family member, so let’s imagine we have a- a customer, and, uh, she has a- a sister, and maybe even an identical twin sister, so you’ve got that close family member, so think about any sort of security mechanism, think about traditional knowledge-based security mechanisms, your twin sister, or your twin brother, or even your non-twin, uh, sibling, is likely to know things like your mother’s maiden name, because it’s the same as theirs, right?
[00:38:19] They’re gonna know your date of birth, because hopefully they give you a present. They’re gonna know what school you went to, because they probably went to the same one, et cetera, so their ability to impersonate you is high. Now, the good news is, that in most cases, family members tend to get on, so they represent a low-risk to you. The chances of your brother, um, or your sister stealing from you is pretty low, because hopefully, uh, your brother and sister like you, so, they represent a lower risk.
[00:38:45] Let- let’s consider next a- a random stranger, so let’s imagine our customer here, um, has dropped her purse in the, uh, street, okay, um, and some, uh, random stranger has found it… his ability to impersonate, you know, from knowledge-based questions is dictated by the knowledge he’s able to get from that purse, so it might be address, it might be date of birth, you know, it- it mentioned the driving license is in there, there’s a fair bit of information you can get from that. So, his ability to impersonate is restricted by the contents of the purse. Now, let’s- let’s imagine that human nature is generally good, so then his abil- his desire to commit, uh, fraud against you hopefully is quite low, but it’s gonna be more than a family member, and, you know, he might be down on his luck, he sees some, a purse with a- a credit card in it, he’s gonna use it for some contactless, uh, transactions, because he’s, uh, hard, ha- on hard times.
[00:39:37] [laughs].
[00:39:37] Let’s imagine then that we got a fraudster, these are the guys you really gotta worry about, because their- their business model, their way of living is to steal from other people, so they’re out to try and do harm to you, and they research their customers, they research their victims, so their ability to impersonate can be quite high, so they will do this scraping of LinkedIn and Facebook as Matt suggested, and suddenly they’ll know everything about Matt, and they’ll be able to impersonate Matt at a knowledge-based question level. So, we see family members on the left, uh, low-risk, high ability to authenticate through to fraudsters on the right at high-risk, but also fairly high ability to, uh, impersonate.
[00:40:19] You step forward Matt, just to show what happens when you introduce voice biometrics is that right across-the-board the ability to impersonate drops, even if you have an identical twin, their ability to impersonate you will drop dramatically. Their ability to impersonate you will be higher than a random stranger or a professional fraudster, but it’s still low, and that’s when… even when we have had news articles about, uh, identical twins being able to break into accounts, they don’t do it at will, they’ve had to take maybe, you know, 15, 20 attempts to be able to get through, so it’s not a trivial exercise even for an identical twin.
[00:40:59] So, what we’re seeing here is that the risk levels remain the same, because, you know, your family member’s still unlikely to steal from you, and professional fraudsters still want to steal from you, but their ability to impersonate you when you have voice biometrics in play drops dramatically, and that’s why you get much stronger security.
[00:41:17] Er,
[00:41:19] Matt Smallman: oo, and I think so, just wrong slide, but yeah, I think that’s- that’s really important. I would always use the phrase, “More secure,” rather than, “Perfectly secure.” There- there are no perfectly secure security systems, um, let’s just be clear-
[00:41:32] Absolutely, I-
[00:41:32] … about that right now, [laughs].
[00:41:34] Ian McGuire: … I will absolutely reiterate that, [laughs], yes.
[00:41:37] Matt Smallman: But that’s why voice biometrics is more secure than conventional knowledge-based traditional authentication methods.
[00:41:43] Matt Smallman: If we move on now just to finally look at those use cases for the contact center, um, tradition- oh, we have them broadly split into two categories, one of, “Fraud prevention,” and then, “Authentication.” Now, con- conscious that we’re- we’re- we’re running short on time, I think, there’s a trade-off between these two, um, that we need to… the vast majority of use case- and I would always say, “The best form of fraud prevention is strong authentication,” and with strong authentication you also get some of those usability and efficiency benefits that you might not be able to, um, derive from fraud prevention.
[00:42:17] But, there are a range of use cases, and a range of situations particularly where callers aren’t necessarily very frequent, and your fraudsters are very frequent, um, or where there may be challenges creating that enrollment, because of the type of use case, where fraud prevention technologies where we might have that identification feature, looking in watch lists or comparing between speakers to find bad actors could- could be really, really valuable.
[00:42:40] Matt Smallman: I- in the majority of cases, that would also be supplemented with authentication, and traditionally authentication is formed into two buckets and- and those who’ve been tracking this technology for many years will be familiar with the kind of, the automated systems that we might see in, uh, interactive voice response, where you typically have a passphrase or something similar, where following identification you are asked to repeat a passphrase to an automated system.
[00:43:08] Now- now those systems have some advantages, yeah? It all takes place in automation, uh, and therefore the customer can be retained in the automated system and go through to self-service, but they also have disadvantages, and- and- and Ian- Ian might be able to- to chip in with a few more of these, like because we have to get the customer to repeat stuff, we are often having to, um, interrupt the flow that they might otherwise have to be serviced, so it is often more challenging to get people to enroll, uh, these, “My voice is my password,” or other type, um, challenges that we see i- in- in these situations.
[00:43:39] I don’t know, Ian, do you wanna add anything more to that?
[00:43:42] Ian McGuire: I think the other, um, disadvantage to it is more, uh, psychological, um, the fact that people are saying the same passphrase o- uh, each time, that can, uh, flag that they’re going through a security process, right, so somebody’s eavesdropping on them, and they know what’s happening and we were also very cognizant of the fact that on the early systems, one of the things we used to suggest was, you know, at bank X, “My voice is my password,” and we said, “Well, it’s a chance for you to advertise your brand.” It turns out customers hate that-
[00:44:12] [laughs].
[00:44:13] … they absolutely hate that, you know, they don’t like to be seen to be- being used to advertise other-
[00:44:18] [laughs].
[00:44:18] … brands, so, the, having something that’s innocuous, “My voice, my password,” is quite innocuous, was great, but people saying it, it makes them uncomfortable saying it in, uh, an open-plan office or whatever, so, there’s some psychological disadvantages to it, uh, uh, rather than technological disadvantages to it.
[00:44:37] Matt Smallman: I- I think, I think the other side is it’s- it’s also predictable, from a fraudster’s perspective, yeah? So, it- it’s a predictable-
[00:44:43] Yeah.
[00:44:43] … response, now there- there are mechanisms to prevent that being useful, but, um, we’re not gonna cover those today, we’ll- we’ll cover those in some later sessions, but it is predictable.
[00:44:51] Ian McGuire: But what I would say to that though is that what we see from all of the data is that accounts that have voice biometrics protecting them, whether it’s, uh, eh, text dependent, or text independent, suffer significantly less fraud and it- it appears that fraudsters just say, “Okay, I’ll skip that account and move onto the next one.”
[00:45:09] [laughs].
[00:45:09] They’re looking for a weakest link, and it’s not an account that has voice biometrics.
[00:45:13] Matt Smallman: Uh, and- and what, and we’re actually gonna cover that, uh, in our next session, [laughs].
[00:45:17] [laughs], okay.
[00:45:17] Matt Smallman: So, um, just- just thinking then about the- the other traditional use case, which has been this passive use case, and- and this is where Ian and I first met a decade ago, which is when we take, authentication is taking place in parallel with the agent conversation, and this is… we call it, “Passive,” because the, but neither the customer nor the agent need to do anything different, they just have the conversation, they get straight to the meat of the problem, the customer says what they’re calling for, they complete your normal identification step, or potentially even using a caller ID lookup, or ANI lookup, they- they can be identified, and then your authentication takes place i- in- in the background, and, um, the cust- and the agent is able to get hol- get on with servicing the customer.
[00:45:53] It has tremendous advantages in terms of that usability, it’s a fantastic customer experience when it works, um, but the challenge has been that you have to be speaking to an agent in order to have these conversations, so the whole opportunity, uh, to automate or to take care of the customer’s needs through some sort of self-service capability has already been negated by the time you get here, uh, and that’s historically been because of the amount of audio that was needed to do these text independent… if you remember from our previous slide, comparisons.
[00:46:21] We used to be talking 10 to 12 seconds of customer talking, uh, what we call, “net speech,” and that- that would often be, uh, “I would like to get the balance of my accounts,” uh, “And my name is Matt Smallman,” and a few other facts before I could actually get enough audio to make that comparison.
[00:46:37] Matt Smallman: Fortunately though, um, I think, we’re now in a, eh, in what I call a, “Hybrid,” but Ian’s preferred, um, phrase, is, “Passive-
[00:46:44] [laughs].
[00:46:44] … everywhere world,” where the technology has advanced to such a state that, um, whilst we still need to capture enrollment audio from that, from a longer set of utterances, often with an agent, um, actually the, kind of the two or three second utterances the customers will typically give us when explaining their reasons for calling, or during an identification process in an IVR are- are sufficient to authenticate them in- in this case. Uh, and- and it, and it’s for this reason, I think, we- we, [laughs], we had on our pre-caller agreement, general agreements, this is the, i- in most cases that… whilst there may be some exceptions, this- this would be our generally recommended, uh, implementation pattern, to enroll customers with agents, uh, and then to use those voiceprints for, passively in IVRs, in nat- with natural language understanding, in conversational, uh, AI.
[00:47:34] Ian, I don’t know if you have, er, disagree violently with that… we didn’t on the previous one?
[00:47:36] Ian McGuire: No, I don’t, I- I-
[00:47:38] [laughs].
[00:47:38] … I- I don’t disagree at all. I mean, when we first met Matt, all those years ago when I- I was less gray and I didn’t need glasses the… when we were talking to any client about voice biometrics, the first question we’d be asking is, “Do you want to be doing this with automation, or do you want to be doing this with the agent,” and that would drive you down a text dependent or text independent route, and… so you’d have that very early decision point, and quite often customers wouldn’t know which one was best for their setup, and then you might actually spend quite a bit of time discussing that one point, whereas now… the biggest change that’s happened in the voice biometrics world for- from a Nuance perspective is the amount of audio needed to do that verification.
[00:48:16] So, as you said, it was maybe, uh, you know, 12 to 15 seconds for, speaking with an agent, now we’re looking at, er, the- the most recent live data that I’ve seen from my deployment in North America is that 88 percent of the, uh, customers have authenticated, uh, after two seconds of net audio, so a massive difference, and two seconds of net audio is round about the same amount of audio needed to say, “My voice is my password.” So, we can take that passive algorithm and apply it everywhere, so we no longer have to ask the clients upfront, “Are you doing this with automation, or, are you doing this with agent,” that question disappears, we can just get straight into applying the technology, getting the voiceprint, and making that applicable in any channel where we can capture audio.
[00:49:02] Matt Smallman: A- a- and- and for me that, I mean, that… I call it the, “Penultimate frontier,” as opposed to, “The final frontier,” but that- that- that really does bring together the advantages of that passive enrollment process, which is far easier for customers to en- engage with, because it requires far less effort on their part, um, and therefore gets higher levels of adoption, and- and we’re not gonna necessarily talk about enrollment in detail on this call, but that- that is the main frustra- if- if we, if we don’t enroll anyone, we don’t get those voiceprints, then we can’t make the value, we can’t realize the value of this technology.
[00:49:31] So, this passive use case, I think, is the- the passive everywhere, hybrid use case is certainly the- the- the future from my perspective. I- I- I’m just interested in your thoughts that the final frontier for me however is to avoid the need to have an agent involved, uh, or a conversation involved in that enrollment, because-
[00:49:47] Yeah.
[00:49:47] … for many organizations, uh, a huge chunk of customers never a- uh, may never actually speak to an agent, but we still want the, uh, advantages of this technology, and I’m just interested, um, I- I’ve- I’ve been prodding your teams for years-
[00:49:59] [laughs].
[00:49:59] … on this particular issue, h- ho- how’s that advancing, [laughs]?
[00:50:00] Ian McGuire: Well, if you think about the… I mentioned the big advances in reducing the amount of audio needed to do the authentication, so removing it from that 15 seconds down to sort of two seconds, and so on, that- that’s been done, right, and we, our s- our research team like to talk about the fact that they can actually get results with as little as half a second of audio, and that’s kind of pointless, that’s showing off, right-
[00:50:24] [laughs].
[00:50:24] … there’s no- nobody speaks for as little, [laughs], as half a second, that’s a grunt, that’s not a meaningful interaction. So, you know, down to two seconds is good enough, right, there’s no need to go much beyond that. Where we’ll need to focus now is reducing the amount of audio needed to create an enrollment, and then being able to expand that so that we can enroll in different channels, now the most obvious channel… you’re right, your customers might entirely deal with the IVR and never need to speak to an agent, is to use the audio that we’re capturing in the IVR and to actually use that to enroll, and we are at the point where we can do that, so if there’s sufficient audio in the automated system, and if it’s a speech-based system, then there could well be… if not on one call, then maybe in two calls, three calls and you can aggregate it over that.
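The idea of aggregating IVR audio over two or three calls until there is enough for an enrollment can be sketched as a running total of net speech. The 10-second requirement and the per-call durations below are hypothetical placeholders, since the talk does not state how much audio an enrollment needs, and `calls_needed_to_enroll` is an illustrative name rather than any product's API.

```python
from typing import Iterable, Optional


def calls_needed_to_enroll(
    net_speech_per_call: Iterable[float], required_seconds: float
) -> Optional[int]:
    """Return how many calls it takes to accumulate enough net speech.

    Returns None if the calls seen so far do not add up to the requirement.
    """
    total = 0.0
    for call_number, seconds in enumerate(net_speech_per_call, start=1):
        total += seconds
        if total >= required_seconds:
            return call_number
    return None


# Hypothetical: seconds of net speech captured in the IVR on successive calls.
ivr_calls = [4.0, 3.5, 5.0]
print(calls_needed_to_enroll(ivr_calls, required_seconds=10.0))
```

A real deployment would also need to be confident that each call came from the same person before pooling the audio, which this sketch does not attempt.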
[00:51:11] Ian McGuire: I think, the other area that’s really interesting to me is to look at the digital domain, right, so we know that a huge number of customers now are digital first. They want to use the mobile app, they want to go online, they want to interact via the website, so what we want to do is actually use those channels to capture enrollment audio, because what we know from those channels is that those are the convenience channels. People go to the website, and they go to the mobile app to do their day-to-day stuff. So, if I go to my… you know, I bank with First Direct, I use the mobile app all the time, if there’s something I can’t do in the app, well, I have to phone up the agent, and I’m a wee bit annoyed I’ve actually got to phone up the agent, because I wanted to do it in the app.
[00:51:51] Now, if I then have to go through a torturous ID&V process, I’m gonna get even more frustrated. Now fortunately, First Direct do have voice biometrics, so I get to this, you know, seamlessly, but if the customer has never spoken, and never had the opportunity to enroll, then they’ve gone from, uh, ha- their happy place… the mobile app or online to a less happy place, and then you inflict pain on them by- by giving them the torturous ID&V. If we can enroll customers in the mobile app, or enroll them from the website, then they can then be transferred to the agent as needed, and pass security seamlessly.
[00:52:26] Um, Fidelity in the US are probably one of the first com- uh, companies to look at doing this, and if you bank with Fidelity in the US, you can enroll via the website, you can enroll on the mobile app, so it’s about making it easier to get those customers to enroll, to then subsequently use the voice biometrics when we need to use it.
[00:52:45] But, I think, that was quite a long-winded answer there, Matt-
[00:52:46] Matt Smallman: That- that was, and, yeah, we- we’ll get a-
[00:52:48] … on your timing.
[00:52:49] … we were gonna get to that issue, but you’ve already got us there. I th- I think the other advantage to talk about-
[00:52:53] [laughs].
[00:52:53] … in that situation is just the higher-quality audio that’s available from those places, because they- they’re skipping the phone network-
[00:52:57] Oh, definitely, yeah, yeah.
[00:52:58] … uh, and therefore we get better, better quality, better opportunities with tho- with those kind of services. I’m, just, uh, conscious of- of time, we- we’re gonna talk about implementation considerations now, and there have been a few questions that have come up, so we already… so Ian has already got ahead of himself, and talked about the digital aspects-
[00:53:14] [laughs].
[00:53:14] Matt Smallman: … of these implementation considerations… there’s a question about privacy, and we’re gonna talk about that in a second, uh, and then we may get a chance to go to advances in AI, but there’re a few other questions, so, um, we’ll probably prefer to- to answer those, so if we get rid of the slides and just go to a, to a, to a one-to-one conversation, so the- the first is around privacy, now, um, there is something about the voice being an inherently personal thing, yeah, “My voice is my voice,” er, if you like I, [laughs], now ironically we never sound like we think we sound, because the way in which we hear is through our bone structure, but it is very personal, um, and because of the unique, uh, and identifying nature of it, in many reg- regions, uh, in many countries, in many jurisdictions, there are specific privacy legislations that apply to, um, the use of biometric data, including voice biometric data for the purposes of identification or authentication.
[00:54:06] Now, um, I first implemented systems before any of this technology, uh, eh, sorry, technology, before any of these regulations existed, uh, and that doesn’t mean that we completely, uh, er, ran roughshod over, uh, people’s rights, because these are people who are customers of our organization who have expectations of security and privacy, and responsibility, uh, for us, so I don’t think I’ve ever been involved in an implementation where we would not ask, or at least tell the customer that we were doing this kind of thing. Uh, I used to, er, mark it on my slides as the, kind of the weird zone, like, all of a sudden I used to be asked for a PIN and passwords, and now you don’t ask me for anything when I call, that’s just really disorientating from a customer’s perspective, um, altogether.
[00:54:48] Uh, I- I- I- my 100 percent advice, uh, other than to watch the previous session we’ve done on an- uh, 10 Best, 10 Top Tips-
[00:54:55] [laughs].
[00:54:55] … of enrollment, um, is to, is to do it from the customer’s perspective, “You are the customer, how would you like to go through this process, what would be the right thing, what would feel right from your perspective as the customer?” And 99 times out of 100, when you subsequently go and check that that is compliant with the law, um, you will find that it is. There- there may be some minor additional things you need to do, there might be some specific disclaimers you need to make, there might be some secondary processes that are required, and again, we had, um, Douwe Korff on a previous session, uh, er, eh, eh, an esteemed privacy lawyer talking about some of those challenges, um, but I would always encourage you to design for the customer, not for the regulation, to design for the customer and then check that it is compliant with the regulations that apply in your jurisdiction, rather than go, um, rather than design for the regulation, or rather than, er, no- no offense to any lawyers, in my opinion, rather than use lawyers to design your process, use your customers to design your process.
[00:55:49] Um, so that- that’s the-
[00:55:50] Wise word, wise words.
[00:55:50] … the priv- privacy- privacy aspect. We did have, um, uh, well there’s- there’s one easy to, um, cover question, so, uh, o- o- on the call someone has asked, um,
[00:56:02] Matt Smallman: “What’s the difference between voice ID and speaker ID?” Um, and- and I th- Ian I don’t, you- you can take that one.
[00:56:10] Ian McGuire: [laughs], well, um, speaker ID is effectively an identification, I, er, er, if I’m interpreting what the person who asked the question intended those terms to mean, is trying to identify who the speaker is, so the one-to-many comparison, whereas voice ID tends to be the one-to-one, “This person is claiming to be Matt Smallman, is it Matt Smallman?” Uh, whereas you’ve got, you know, 10 potential people it could be, “Here’s the sample of the voice, which one-
[00:56:37] Yeah.
[00:56:37] … is it,” that’s identifying the speaker.
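The one-to-one versus one-to-many distinction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular vendor's system works: it assumes each voice has already been reduced to a short numeric vector (a "voiceprint"), compares vectors with cosine similarity, and uses an arbitrary threshold of 0.8. The function names `verify` and `identify`, the toy vectors, and the threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length vectors, from -1.0 to 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(sample, claimed_voiceprint, threshold=0.8):
    """One-to-one: does the sample match the voiceprint of the claimed person?"""
    return cosine_similarity(sample, claimed_voiceprint) >= threshold


def identify(sample, voiceprints, threshold=0.8):
    """One-to-many: which enrolled speaker, if any, best matches the sample?"""
    best_name, best_score = None, threshold
    for name, voiceprint in voiceprints.items():
        score = cosine_similarity(sample, voiceprint)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name  # None when nobody clears the threshold


# Toy voiceprints (assumed values; real ones have hundreds of dimensions).
enrolled = {
    "Matt": [0.9, 0.1, 0.3],
    "Ian": [0.2, 0.8, 0.5],
}
sample = [0.85, 0.15, 0.35]  # audio from the current call, already vectorised

print(verify(sample, enrolled["Matt"]))  # caller claims to be Matt
print(verify(sample, enrolled["Ian"]))   # caller claims to be Ian
print(identify(sample, enrolled))        # no claim: who is it?
```

In this sketch the authentication case ("voice ID") makes a single comparison against the claimed identity, while the identification case ("speaker ID") compares against every enrolled voiceprint and can also return no match.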
[00:56:39] Matt Smallman: I- I- I think, I think in- in the, in the general scientific literature, you would see this refer- this whole area referred to as, “Speaker identification,” that is the, kind of the- the scientific topic, um, and that includes both authentication and identification use cases that we’ve talked about, um, that is different from speech recognition, which is about the things that people are saying, as opposed to how they speak. Um, but voice ID is in many ways just a- a brand name, a branding of the service in order to make it recognizable to customers. Uh, I can’t remember who first started the trend, but- but-
[00:57:14] I think it was HSBC.
[00:57:14] … was it, right?
[00:57:14] Ian McGuire: Yeah, and they-
[00:57:16] they made it the- you could say, “They made a mistake by not trademarking it,”-
[00:57:19] [laughs].
[00:57:19] … because everybody now uses it, but, um, it’s- it’s quite convenient.
[00:57:19] Matt Smallman: I- I think I- I do actually think there was a trademark before
[00:57:24] that, but, um, er, eh, it seems to have become the accepted phrase for this, and it has reached some level of conscious understanding certainly in many jurisdictions that makes it helpful to latch onto, but it is just a form of speaker identification. Um, there are two other questions which I think were quite interesting as well, um, one was about,
[00:57:41] Matt Smallman: “How does the system handle voice, uh, authentication for, uh, trans people,” I think, which is a, which is a, is an issue, I know, for many organizations, and, um, diversity and inclusion agendas require us to do some bias testing in many cases, is it worth just covering that?
[00:57:57] Ian McGuire: Well the, it’s actually great for trans people, because the- the system doesn’t care about your gender in any way, shape or form, okay, it cares about, “Is this the voice that enrolled,” right, so it’s completely gender agnostic in that respect. The, we did have, um, uh, one of the banks that we deal with, uh, their company trans community, uh, raised this, and we did some work with them to assess the- the quality of the voice from enrollment to now, so I’m thinking, it worked fine, it worked really well. There are certain situations where, um, trans customers… there’s different levels that you can go through during the transition and the very end level… I can’t remember what the num- the terminology is for it, but they can have their vocal tract lengthened or shortened… very, very few people ever go as far as that, they tend not to deem it necessary.
[00:58:52] That would fundamentally change your voiceprint, and you would have to re-enroll, but that’s an exceptional edge case, for the vast majority of trans people, it’s actually beneficial, because you remove that issue about the agent thinking, “Wait a minute, this says it’s Miss Smith, but it sounds more like a Mr. Smith,” and the agent is using their unconscious bias to maybe alter their decision. If they’ve, if the voice biometrics system says, “It really is Matt Smallman, or it really is Ian McGuire,” then they can go with that, so, it actually removes that unconscious bias that might come from a human being, making some attributed- uh, a- attributions to-
[00:59:29] Matt Smallman: I’m- I’m very conscious Ian that we are, we are, we are over time today, so, uh, I, thank you so much for the questions that we’ve had, and for everyone who’s been- been with us today. I- I just need to highlight to you that we have two more events coming up over the next month or so, uh, related to this topic in our Voice Biometrics season, so on- on May the 4th, um, I’ll be covering, um, voice biometrics vulnerabilities and how to mitigate those in a bit more depth, uh, and then on May the 25th, we’ll be looking at the very topical subject of, “How to counter deep fakes and the challenges that synthetic voices raise to voice biometrics systems,” so thank you very much for joining us this afternoon everyone, and thank you Ian for- for your contribution-
[01:00:07] … okay.
[01:00:08] … um, and we look forward to seeing you at those events, uh, in the future. Thank you very much.
[01:00:13] Ian McGuire: Thanks a lot, bye- bye.