Exam AI-100 topic 1 question 45 discussion

Actual exam question from Microsoft's AI-100

Question #: 45
Topic #: 1

You are designing a real-time speech-to-text AI feature for an Android mobile app. The feature will stream data to the Speech service.
You need to recommend which audio format to use to serialize the audio. The solution must minimize the amount of data transferred to the cloud.
What should you recommend?

A. MP3
B. WAV/PCM
C. MP4a


Suggested Answer: B
Currently, only the following configuration is supported:
Audio samples in PCM format, one channel, 16 bits per sample, 8000 or 16000 samples per second (16000 or 32000 bytes per second), block align of two bytes (16 bits per sample, including padding).
Reference:
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-audio-input-streams
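
The byte rates quoted above follow directly from the PCM parameters. As a minimal sketch (the function names here are illustrative, not part of the Speech SDK), the arithmetic can be checked like this:

```python
def pcm_block_align(channels: int = 1, bits_per_sample: int = 16) -> int:
    """Bytes occupied by one sample frame across all channels."""
    return channels * bits_per_sample // 8


def pcm_byte_rate(sample_rate: int, channels: int = 1, bits_per_sample: int = 16) -> int:
    """Bytes of raw PCM audio produced per second of streaming."""
    return sample_rate * pcm_block_align(channels, bits_per_sample)


if __name__ == "__main__":
    # Hypothetical helper names; the values come from the configuration quoted above.
    for rate in (8000, 16000):
        print(f"{rate} Hz mono 16-bit: {pcm_byte_rate(rate)} bytes/s, "
              f"block align {pcm_block_align()}")
```

With one channel and 16 bits per sample, each sample frame is two bytes, so choosing 8000 samples per second instead of 16000 halves the data sent while staying within the supported configuration.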

by Cornholioz at Feb. 15, 2021, 4:36 a.m.

Comments


rveney

1 year, 12 months ago

The recommended audio format for serializing the audio in order to minimize data transfer is A. MP3.

upvoted 1 times


berserkguts

4 years ago

This was in the AI-100 exam I took today, May 31.

upvoted 1 times


vhx

4 years, 2 months ago

Now, Speech-to-Text supports "audio/wav" and "audio/mp3" as output formats via the format parameter.

upvoted 2 times

lollo1234

4 years ago

Compressed audio such as MP3 is only supported through GStreamer, which decompresses the audio before sending it, so it won't reduce the data transferred. From https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams?tabs=debian&pivots=programming-language-csharp#speech-sdk-version-required-for-compressed-audio-input: "The Speech SDK and Speech CLI can accept compressed audio formats using GStreamer. GStreamer decompresses the audio before it is sent over the wire to the Speech service as raw PCM."

upvoted 2 times


Cornholioz

4 years, 4 months ago

Correct. Although the question asks to minimize the amount of data, which would favor a compressed audio format such as MP3, only PCM (WAV) is supported by the Azure Speech service. https://dejanstojanovic.net/aspnet/2019/january/mp3-to-text-using-azure-cognitive-services/

upvoted 3 times
