Tuning

Last updated
23 Jun 2023

Tuning, in the context of voice biometrics, refers to the process of adjusting the configuration and parameters of a voice biometric system to optimize its performance for a particular task or environment.


Voice biometric systems use machine learning algorithms trained on generic datasets to recognise and differentiate individual voices. Tuning optimises system performance in three ways: selecting the best comparison algorithm, finding the best configuration for any audio pre-processing steps, and augmenting the existing training with data from the actual operating environment to improve discrimination between speakers. Tuning generally involves several steps:

  1. Data Collection – Collect a representative sample of verification and enrollment utterances from the operating environment. This should contain multiple verification utterances for each speaker and ideally be gathered over a significant period to capture natural variation in the speakers’ voices and channel usage.
  2. Training and Testing – Part of the data set is used to augment or, in some cases, replace the Voice Biometric model’s existing training, often producing a custom Background Model (BGM). The new model’s performance is then measured with a True User Imposter Test (TUIT) run on a different portion of the sample data. Other parameters, such as minimum enrollment and verification audio lengths, and audio processing configurations, such as signal-to-noise ratio, are also evaluated to understand their impact on performance. This may also include evaluating different detective measures, such as synthetic speech detection.
  3. Decision – Test data is reviewed, and a decision is made on the optimal biometric threshold and other key configuration parameters based on the implementing organisation’s risk and performance objectives.
  4. Implementation – The new model and updated configuration are implemented into production, which may require retraining existing speakers against the new model using their original enrollment audio.
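
The Decision step can be illustrated with a small calculation. The sketch below is not any vendor's API. It assumes the test produces two lists of comparison scores (true users against their own voiceprints, and imposters against someone else's), that a higher score means a closer match, and that a comparison is accepted when its score is at or above the threshold. The function names and the sample scores are hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple


def error_rates(
    true_user: Sequence[float], imposter: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) at a given threshold.

    Assumption: a comparison is accepted when score >= threshold.
    """
    false_accepts = sum(1 for s in imposter if s >= threshold)
    false_rejects = sum(1 for s in true_user if s < threshold)
    return false_accepts / len(imposter), false_rejects / len(true_user)


def pick_threshold(
    true_user: Sequence[float], imposter: Sequence[float], max_far: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, FAR, FRR) for the lowest observed score whose
    false accept rate does not exceed max_far, or None if no observed
    score meets the target.

    Raising the threshold can only lower the false accept rate, so the
    lowest qualifying threshold also gives the lowest false reject rate
    among the candidates.
    """
    for t in sorted(set(true_user) | set(imposter)):
        far, frr = error_rates(true_user, imposter, t)
        if far <= max_far:
            return t, far, frr
    return None


# Made-up scores for illustration only; real tests use far more comparisons.
true_user_scores = [0.91, 0.85, 0.78, 0.66, 0.95, 0.88, 0.72, 0.59]
imposter_scores = [0.12, 0.35, 0.41, 0.62, 0.28, 0.19, 0.70, 0.33, 0.25, 0.48]

result = pick_threshold(true_user_scores, imposter_scores, max_far=0.10)
print(result)
```

In practice the acceptable false accept rate comes from the organisation's risk objectives, and the resulting false reject rate is then checked against its performance objectives. If the two cannot both be met, that usually points back to the earlier steps: more data, a different model, or different audio settings.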
