High Accuracy from Short Utterances
VoiceVault voice biometric technology achieves high accuracy, speed, quality, predictability, and testability when verifying a person's identity from a series of short, text dependent utterances. For the amount of speech that we use (typically 1-5s in duration), the accuracy of our technology is unparalleled.
The accuracy that we achieve is largely independent of background and environmental noise, the length of the utterance, the variability of the channel, and the 'state' of the individual (health, narcotic intake, etc.). These are benefits that arise from the 'battlefield / military' provenance of our biometric technology. In other words, the challenging environments that the technology was originally designed to perform in now provide significant benefits in the non-military world.
The accuracy of the verification process is determined in part by the number of data points measured for a given speech sample, which are then used in the comparison with the enrolment voiceprint. 42 measurements are made on speech samples every 20ms.
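To give a feel for the data volume these figures imply, the arithmetic can be sketched as below. The 42 measurements and the 20ms interval come from the text; the assumption that frames are back-to-back and non-overlapping, and the function name `measurement_count`, are illustrative only and not a description of the VoiceVault engine.

```python
FRAME_MS = 20                 # one set of measurements every 20 ms (from the text)
MEASUREMENTS_PER_FRAME = 42   # measurements made per 20 ms interval (from the text)


def measurement_count(duration_s: float) -> int:
    """Total measurements for an utterance of the given length.

    Assumes back-to-back, non-overlapping 20 ms frames (an assumption,
    not something the text states).
    """
    duration_ms = int(round(duration_s * 1000))
    frames = duration_ms // FRAME_MS
    return frames * MEASUREMENTS_PER_FRAME


if __name__ == "__main__":
    # The typical utterance length quoted above is 1-5 seconds.
    for seconds in (1, 5):
        print(f"{seconds}s utterance: {measurement_count(seconds)} measurements")
```

Under these assumptions, a 1s utterance yields 50 frames and a 5s utterance yields 250 frames, each carrying 42 measurements.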
Exclusive Focus on Voice Biometrics
Our technology (our science) is mature and is our sole focus. We are not engaged in any activity other than voice biometrics. The technology is based on a single biometric engine that has been in continuous development since 1988.
The speech data used to test and refine the engine over the years comes from approximately 5000,000 enrolments, which equates to 1TB of data. This amount of data throughput is reflected in the algorithm development and the number of inventions and patents that have been filed by our scientists. In all, 160+ man-years of development work have gone into our core biometric engine. This has leveraged core mathematics and programming skills in the USA and UK.
Text Dependent System
One of the key differentiators in voice biometric systems stems from the type of speech samples they use, and how those samples are acquired from users. In one type of system, the words or phrases needed to produce a biometric voiceprint are system-defined and specific. These systems are text dependent. In the other, there are no limitations on the phrases that can be used to generate a voiceprint. These systems are text independent.
VoiceVault is a text dependent system. In such a system, the phrases that the user is prompted to say are derived from a small vocabulary with known phonetic content, normally digits, that is tailored to capture as many aspects of the person's vocal tract as possible. The phrases are derived from an understanding of the words and phrases in a particular language that expose the most vocal tract characteristics. This ensures that the resulting voiceprint reflects as much of the user's voice biometrics as possible, which in turn has a significant impact on system accuracy.
Detection of Recordings
Biometric systems, as with all security systems, are vulnerable to fraudster and imposter attacks. In a voice biometrics system these attacks are generally based on one or more recordings of a genuine caller.
Recordings for use in a fraudster attack may be obtained in various ways. They may have been made in one or more phishing / social engineering calls where callers are prompted to say phrases that are known to be used in the biometric system. Or the caller may have been clandestinely recorded during the enrolment or verification processes.
The fraudster can then use the recordings 'as made' or modify them in an attempt to circumvent record-detect algorithms. These modifications may take the form of slowing down or speeding up the speech sample; altering the frequency or pitch of the speech; or even adding noise or harmonics to the recorded voice. Furthermore, in order to make up for the lack of certain required phrases that it may not have been possible to record, the fraudster may splice together fragments of recordings to synthesise 'new' phrases.
The challenge in detecting fraudster attacks based on recordings is that the audio presented to the biometric system (which may or may not be a recording) must be processed in real time with minimal loss of performance, in terms of either user experience or false reject rate (FRR).
Standards and Accreditation
VoiceVault is accredited and certified to ISO 27001 - the internationally recognised information security standard.
The ISO 27001 standard sets out best practice control requirements for:
- Security policy and organization of information security
- Asset management
- Physical, environmental and human resources security
- Communications and operations management
- Access control
- Information systems acquisition, development and maintenance
- Information security incident management
- Business continuity management
In addition, VoiceVault is the first and only voice verification provider to be certified and accredited to issue Advanced Electronic Signatures (voice digital certificates). This certification enables VoiceVault to act as a trusted third party in transactions between a company and its customers.
The legal framework for Advanced Electronic Signatures was agreed by the European Parliament and Council when it adopted Directive 1999/93/EC - an EU-wide framework for electronic signatures. Since then, individual member states of the EU have implemented legislation that gives legal recognition to electronic signatures.
Language Support
VoiceVault voice biometric technology is text dependent but language independent.
In a VoiceVault language pack, language-specific word choices are made that best expose the vocal tract characteristics for that language. Specific, tuned voice models are then used to identify and separate individual words. The resulting data sets are then used to support and improve the accuracy and predictability of the scores for that language.
The use of a language pack enables a system to be tuned for a particular country / language. It is also inherent in the design of the language packs that they continue to refine the accuracy of the system as more and more users of the pack enrol and verify themselves using their speech.
Simple Enrolment and Verification Processes
A caller needs to enrol with a voice biometric system before they can verify themselves using their voice. The enrolment process is typically triggered by a business process that identifies the caller as being eligible for voice verification and invites them to enrol. This involves establishing ground truth in relation to the caller's identity.
The act of enrolment varies depending on the mode of operation of the voice biometric system, but in a text dependent challenge response mode it will consist of the caller repeating back to the system a number of prompted, short numerical phrases, for example “0579”. Between 6 and 12 such phrases are sufficient to achieve very high accuracy levels. The voice biometric system transforms these speech samples into a voiceprint that can subsequently be used to verify the caller.
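The prompting step of a challenge response enrolment can be sketched as follows. This is a minimal illustration only: the function name `make_enrolment_prompts` is hypothetical, and random digits are used as a stand-in, whereas the text indicates that real phrases are chosen for their phonetic content. The 6 to 12 phrase range and the four-digit example come from the text.

```python
import secrets

DIGITS = "0123456789"


def make_enrolment_prompts(count: int = 8, digits_per_phrase: int = 4) -> list[str]:
    """Return `count` short numerical phrases for the caller to repeat.

    Random digits are a placeholder; a production system would select
    phrases for phonetic coverage rather than at random.
    """
    if not 6 <= count <= 12:
        raise ValueError("the text suggests between 6 and 12 phrases")
    if digits_per_phrase < 1:
        raise ValueError("each phrase needs at least one digit")
    return [
        "".join(secrets.choice(DIGITS) for _ in range(digits_per_phrase))
        for _ in range(count)
    ]


if __name__ == "__main__":
    for phrase in make_enrolment_prompts():
        print(f"Please repeat: {phrase}")
```

Each recorded response would then be passed to the biometric engine to build the voiceprint; that step is outside the scope of this sketch.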
Once a user has enrolled with a voice biometric system they can be verified against their enrolment voiceprint.
During the verification process, the caller will be asked to provide a form of claimed identity, which can be arbitrary but is typically an account or card number, before being asked to provide a speech sample. In a text dependent challenge response mode this speech sample will be one or more of the short numerical phrases used at enrolment time. Many variables will go into determining how many such phrases need to be spoken, including accuracy requirements, liveness testing, user experience, and so forth.
The voice biometric system will process the speech sample against the enrolment voiceprint in order to determine whether the caller definitely is, definitely is not, or could possibly be the claimed speaker.
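The three possible outcomes can be illustrated with a simple two-threshold rule. This is a sketch under stated assumptions: the 0 to 1 score scale, the threshold values, and the names `Decision` and `decide` are all hypothetical and are not taken from the VoiceVault engine. In practice the thresholds would be set from the accuracy and user-experience requirements mentioned above.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "definitely is the claimed speaker"
    REJECT = "definitely is not the claimed speaker"
    UNCERTAIN = "could possibly be the claimed speaker"


def decide(score: float, reject_below: float = 0.3, accept_above: float = 0.7) -> Decision:
    """Map a similarity score to one of three outcomes.

    The score scale and both thresholds are illustrative assumptions.
    """
    if reject_below > accept_above:
        raise ValueError("reject_below must not exceed accept_above")
    if score >= accept_above:
        return Decision.ACCEPT
    if score < reject_below:
        return Decision.REJECT
    # Scores between the two thresholds are inconclusive; a caller might
    # be asked for a further phrase or routed to another check.
    return Decision.UNCERTAIN


if __name__ == "__main__":
    for s in (0.9, 0.5, 0.1):
        print(s, decide(s).value)
```

Keeping an explicit uncertain band, rather than a single cut-off, lets the surrounding business process decide how to handle borderline callers.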