Project Synopsis
The project involved building a speech analytics engine that extracts meaningful insights from large volumes of audio files. The solution needed to have the following features:
- High Accuracy
- Ability to transcribe speech from multiple speakers in a single audio file
- Mapping of insights to different aspects of customer service
Our Solution
Since speech recognition is a very research-oriented field, the project involved exploring many libraries, including CMU Sphinx, Julius, and HTK. We designed and implemented an end-to-end analytics service that extracted audio files along with their metadata, performed high-accuracy transcription, enriched the results, and extracted insights from them.
Our solution had the following features:
- High Accuracy Transcription
- Ability to handle files of practically any size and type
- Ability to handle audio containing multiple speakers
Project Highlights
- Explored various libraries and tested them against metrics such as overall accuracy, ability to distinguish multiple speakers, and ability to recognize difficult words.
- Built a tool that visualizes the difference between the original text and the transcribed text.
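As a rough illustration of the last two highlights, the sketch below scores a transcript against a reference text and marks where the two differ. It is not the project's actual tooling: the function names `word_error_rate` and `word_diff` are hypothetical, word error rate is assumed here as the accuracy metric (the write-up only says "overall accuracy"), and the bracket markers stand in for whatever visual highlighting the real tool used.

```python
import difflib


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: case-insensitive comparison on whitespace-separated words.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_diff(reference: str, hypothesis: str) -> str:
    """Render a word-level diff as plain text.

    Words only in the reference appear as [-word-]; words only in the
    transcript appear as {+word+}. These markers are an illustrative choice.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)

    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(ref[i1:i2])
            continue
        if i2 > i1:  # words dropped or replaced by the recognizer
            parts.append("[-" + " ".join(ref[i1:i2]) + "-]")
        if j2 > j1:  # words the recognizer added or substituted
            parts.append("{+" + " ".join(hyp[j1:j2]) + "+}")
    return " ".join(parts)
```

For example, comparing the reference "please reset my account password" with the transcript "please reset my count password" gives a word error rate of 0.2 and the diff `please reset my [-account-] {+count+} password`. Running the same scoring over each candidate library's output is one simple way to compare them on a shared test set.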