Update: Microsoft will be moving away from UserVoice sites on a product-by-product basis throughout the 2021 calendar year. We will use first-party solutions for customer feedback. Learn more here.

Azure Cognitive Services

Customer Feedback & Ideas for Azure Cognitive Services

Share your ideas for making Cognitive Services and the accompanying APIs work better for the applications you develop.


Catch up on the latest News and Updates


Share your Ideas and Feedback

To share your ideas on how we can make Cognitive Services better, click one of the categories underneath "Give Feedback" located in the sidebar menu to access the forum.


Documentation

API documentation is available here. Within, you'll find:

  • Getting started samples
  • API References
  • Testing Consoles

Using one or more of the APIs as a "Free" preview?  Be sure to read our Terms of Service.

Contact Support

UserVoice is intended for product feedback. If you need product support, please either contact Azure support (https://azure.microsoft.com/en-us/support/plans/) or ask a question on Stack Overflow (https://stackoverflow.com/questions/tagged/microsoft-cognitive).


Become a Cloud Design Insider!

Join Cloud Design Insiders, and help shape the future of Cognitive Services! As an insider, you’ll speak with program managers, designers & researchers, see new designs and ideas, provide feedback through surveys, and try out prototypes. Take the short survey to join the Cloud Design Insiders now, and we’ll see you in the community.


  1. Speaker diarization for more than 2 speakers

    Speaker diarization for more than 2 speakers.

    See this one: https://cognitive.uservoice.com/forums/555925-speaker-recognition/suggestions/34823824-add-support-for-speaker-diarization-for-untrained

    I don't feel this should be marked as resolved. I would expect support for at least 10 speakers. Additionally, it's currently really poor and switches between speaker 1 and 2 almost randomly. Please make this more intelligent. It's a deal breaker for us and, I'm sure, many others, especially considering the Google alternative can handle unlimited speakers and is far more accurate at identifying them.

    https://cloud.google.com/speech-to-text/docs/multiple-voices

    And no... expecting a sample to train it for each voice is not an option. We literally just need it to assign a number…

    8 votes

    1 comment  ·  Speech
  2. Generate accurate audio clip for each utterance

    Getting an audio clip for each utterance will make it possible to generate a basis for a human-labeled transcript for training a custom model. This will make it possible to gradually improve the recognition accuracy after every "session", by checking the transcription and the corresponding audio clip and fixing the text for incorrect transcriptions.

    Additionally the audio clip can be used as a live read-back of the original audio.

    6 votes

    0 comments  ·  Speech
  3. Need a new metric to check the number of characters used for text to speech on the Azure portal

    We need to be able to check the number of characters used for text to speech.
    Under the metrics tab on the Azure portal, we can only see the number of requests that have been made.

    6 votes

    1 comment  ·  Speech
  4. Improve the Speech Studio Text Editor.

    It would be very practical to be able to change the font type, color, and size in the text editor, and even to highlight text with colors.

    4 votes

    1 comment  ·  Speech
  5. Add speech profiles in Speech Studio.

    Add the option of saving voice profiles for dialogues. A profile would include the voice, tone, rate, volume and intonation, so that to apply it you select the desired text, press the profile, and all of those values are applied.

    4 votes

    1 comment  ·  Speech
  6. Dictionary function in Speech Studio to ignore words.

    Add a dictionary function to Speech Studio that makes it possible to ignore a word, or change its pronunciation, throughout the whole document. That is, once a word is added to the dictionary, it is not read even if it appears 100 times in the same document, without having to mark each occurrence one by one.

    3 votes

    1 comment  ·  Speech
  7. Audio Offset / Duration for Best Result on normalized words

    The JSON and/or result object needs to have the offset and duration of the whole normalized word.
    I've reviewed the JSON and it still doesn't solve the problem. I need to know the relationship of the DisplayText words to the Word Timings in the detail. When the DisplayText outputs 007 and the Word Timings output "double" "oh" "seven" as 3 different words, I don't know that 007 = those three words, as there is no reference. There needs to be a display word reference to the audio word to track offset/duration of an underlying audio file. The only option that…

    3 votes

    0 comments  ·  Speech
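
One way to picture the reference requested above is a mapping from each display token to the range of word timings it covers. The sketch below is only an illustration under assumptions: the `WordTiming` class, the `span_timing` helper, and the sample numbers are hypothetical, and the index range for "007" is written by hand because, as the post says, the result does not provide that link.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    """One spoken word with its timing (hypothetical shape, units as reported)."""
    word: str
    offset: int
    duration: int


def span_timing(words, start, end):
    """Return (offset, duration) covering words[start:end] as one unit."""
    first = words[start]
    last = words[end - 1]
    # The span starts where the first word starts and ends where the last one ends.
    return first.offset, last.offset + last.duration - first.offset


# Sample values invented for illustration only.
words = [
    WordTiming("double", 1000, 300),
    WordTiming("oh", 1300, 200),
    WordTiming("seven", 1500, 400),
]

# Hand-written link between a display token and its spoken words;
# this is the piece of information the post asks the service to return.
display_to_words = {"007": (0, 3)}

offset, duration = span_timing(words, *display_to_words["007"])
print(offset, duration)
```

With such a mapping in the result, the offset and duration of a normalized token would follow directly from the first and last spoken word it covers.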
  8. Actionable Error Messaging in Speech Portal

    When a Dataset upload fails the error messaging is literally "Failed" and clicking on the Dataset displays "Failed to upload data. Please check your data format and try to upload again."

    This is not actionable error messaging. I have checked the data multiple times. I have been uploading this data, with additions, using an automated process for a year without issue.

    Tell us why it failed. Give us a hint. I have 15,000 files and entries in the Trans.txt file. "It failed" is not useful information. Especially when it could easily be a problem server-side and Microsoft provides no validation…

    3 votes

    1 comment  ·  Speech
  9. Site banner when there is a known issue

    Twice now the Speech portal has been broken by the owning Product Group.

    Twice now I have wasted hours of my time as well as MS support personnel time trying to debug something only to find out that the portal (and associated APIs) were broken and it was known by the group.

    Twice now the fix has taken weeks to deploy, so God knows how much of other customers' time has been wasted.

    If you have a known issue that affects your customers, especially given the woeful error messaging on the portal, then please add a banner on the…

    2 votes

    0 comments  ·  Speech
  10. Include Sanskrit language. It’s the most computer friendly language. It would help translate epics & ancient knowledge of highest order.

    I am surprised not to see Sanskrit as one of the supported languages here. Include Sanskrit language. It’s the most computer friendly language. It would help translate epics & ancient knowledge of highest order, and make Meghadutam, the Sama Veda, and many complex works of literature easy to translate.

    1 vote

    0 comments  ·  Speech
  11. Azure Custom Keyword in Spanish

    Currently, Azure Custom Keyword supports only English and Chinese. It would be great to add support for Spanish users.

    Thanks

    1 vote

    0 comments  ·  Speech
  12. Need a heartbeat alert every few seconds to confirm that the network is still fine

    Sometimes when using continuous recognition, the network gets stuck or breaks after an hour or some other length of time.
    Because the client side doesn't know about this, it keeps sending binary data to Azure even though the connection has already failed.
    With a heartbeat every few seconds, the client could tell that the network is fine, or reconnect when it is not.

    1 vote

    0 comments  ·  Speech
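
Until something like this exists in the service, a client can approximate the idea by tracking when it last heard anything from the connection. The following is a minimal sketch, assuming the application calls `beat()` from whatever event callbacks it already receives; `HeartbeatMonitor` and its methods are hypothetical names and are not part of any Speech SDK.

```python
import time


class HeartbeatMonitor:
    """Tracks the time since the last sign of life from a connection."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_beat = clock()

    def beat(self):
        """Record that something was just received from the service."""
        self._last_beat = self._clock()

    def is_stale(self):
        """True when nothing has been received for longer than the timeout."""
        return self._clock() - self._last_beat > self._timeout_s


# Example with a fake clock, so no real waiting is needed.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=5.0, clock=lambda: fake_now[0])
fake_now[0] = 6.0
if monitor.is_stale():
    print("no activity for more than 5 s; consider reconnecting")
```

A quiet period is not proof of a broken network, so a check like this is a hint to reconnect rather than a replacement for the server-side heartbeat the post asks for.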
  13. mstts:backgroundaudio SSML tag is not working on SDK

    SSML tried -
    <speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-IN"> <mstts:backgroundaudio src="https://cdn.yellowmessenger.com/3Ix9bm4Blriv1620110811289.wav" volume="0.3" fadein="3000" fadeout="4000"/><voice name="Microsoft Server Speech Text to Speech Voice (en-IN, NeerjaNeural)"><prosody rate="-10.00%" volume="+30.00%" contour="(27%, -13%) (49%, +7%) (73%, -11%)">Hi, Good Morning. Welcome! </prosody><prosody rate="-10.00%" volume="+30.00%" contour="(13%, -7%) (26%, +12%) (60%, -14%) (76%, +15%) (90%, -6%)">I can help you answer your queries regarding place an order, locate your order and do much more.</prosody><prosody rate="+10.00%" volume="+30.00%" contour="(18%, -6%) (40%, +12%) (64%, -10%)"> Please note that this call might be recorded for internal quality and training purposes.</prosody><prosody volume="+30.00%" contour="(41%, +6%) (74%, -26%)">Let us get started</prosody></voice></speak>

    1 vote

    0 comments  ·  Speech
  14. Allow log files to be retrieved through the API by the transcription uuid only

    I have been able to retrieve the logs via the API. However, the log name has the form

    2021-03-241546289ac72a58-5c3d-4778-aede-e2eaae32982bwav

    which makes retrieving a specific log file difficult. The transcription uuid

    9ac72a58-5c3d-4778-aede-e2eaae32982b

    is retrievable from the SDK; however, the timestamp

    2021-03-24_154628

    is not. Would it be possible for the API to be modified so that the files are retrievable using the transcription id only?

    1 vote

    0 comments  ·  Speech
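
As a possible client-side workaround while the API requires the full name, the list of log names can be filtered by the uuid alone. This is a sketch under assumptions: `find_logs_for_transcription` is a hypothetical helper, the list of names is assumed to have been fetched already, and the second sample name is invented. The first name is copied as it appears in the post.

```python
def find_logs_for_transcription(log_names, transcription_id):
    """Return the log names that contain the given transcription uuid."""
    needle = transcription_id.lower()
    # Substring match, so the timestamp part of the name does not need to be known.
    return [name for name in log_names if needle in name.lower()]


log_names = [
    "2021-03-241546289ac72a58-5c3d-4778-aede-e2eaae32982bwav",
    # Invented second entry, present only to show that it is filtered out.
    "2021-03-2416000000000000-0000-0000-0000-000000000000wav",
]

matches = find_logs_for_transcription(
    log_names, "9ac72a58-5c3d-4778-aede-e2eaae32982b"
)
print(matches)
```

This still requires listing every log first, which is the cost the post hopes a uuid-only lookup in the API would remove.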
  15. Add ability to reuse pronunciations in Speech Studio

    We're using Speech Studio to generate phone menus. One common phoneme customization we have to do is for our company name. Every new file we create that has our company name somewhere in it, we have to copy/paste the phoneme from an already-completed File.

    It would be nice if we could create a library of reusable custom pronunciations, so this was a bit less manual.

    1 vote

    1 comment  ·  Speech
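
Outside Speech Studio, a reusable pronunciation library can be approximated by keeping the terms in one place and wrapping each occurrence in an SSML `phoneme` element before synthesis. The sketch below assumes plain-text input and case-sensitive whole-word matching; `apply_pronunciations`, the company name "Contoso", and its `ph` value are placeholders for illustration, not a real transcription.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_pronunciations(text, library):
    """Wrap every whole-word occurrence of a library term in an SSML phoneme tag."""
    escaped = escape(text)  # make the plain text safe to embed in XML
    if not library:
        return escaped
    # Longest terms first, so a longer term wins over a shorter one it contains.
    terms = sorted(library, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")

    def wrap(match):
        term = match.group(0)
        return '<phoneme alphabet="ipa" ph={}>{}</phoneme>'.format(
            quoteattr(library[term]), term
        )

    # A single pass, so inserted markup is never matched again.
    return pattern.sub(wrap, escaped)


# Placeholder library; the phoneme string is not a verified transcription.
library = {"Contoso": "kon.to.so"}
menu = apply_pronunciations("Welcome to Contoso. Press 1 for sales & support.", library)
print(menu)
```

The result is an SSML fragment that would still need to be placed inside `speak` and `voice` elements before being sent for synthesis.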
  16. Remove requirement to type "DELETE" to delete a file in Speech Studio

    A simple confirmation popup is fine. Having to type delete is excessive when using the tool to create content.

    1 vote

    0 comments  ·  Speech
  17. Enterprise pricing for Speech to text and Speech to Text neural, would provide extension to the current pricing for large volume users.

    Enterprise pricing for Speech to text and Speech to Text neural, would provide extension to the current pricing for large volume users. We have clients that currently use hundreds of millions of characters using traditional data capture methods, and see the current pricing as not addressing their enterprise client market.

    1 vote

    0 comments  ·  Speech
  18. JavaScript support for keyword recognition

    Not necessarily an idea, and please let me know if this is not the right place for this. But it would be great to have the JavaScript SDK support Keyword Recognition (specifically Custom Keywords).

    1 vote

    0 comments  ·  Speech
  19. I'd Like To Use C++ To Create An Exercise App With Voice Commands

    I have a degree in Exercise and Sports Science and just got into coding 8 months ago, and I now want to create an Exercise App that uses voice command to run the app. For example, I'd like for the user to be able to use the commands "What's today's workout?", "What's the first exercise?", "What's the next exercise?", and "I'm finished with the workout?". I've used C++ before for simple projects, but I've never used it to create an app with voice commands. I'd really like to start from scratch and have somebody guide me through a Teams meeting…

    1 vote

    1 comment  ·  Speech
  20. Speaking some letters such as "A" and "E" using the English neural voices sounds bad.

    The sound of the neural voices speaking some single letters such as would be done when speaking multiple choice test options does not match with other utterances by the same voice. One particular example of this is the letter "A" sounds very very short and also of a lower volume than "B" "C" and "D". It sticks out like a sore thumb especially since much of the rest of the utterances of words and sentences sounds so good. Compare the single letter utterances of Guy (neural) with Noah and you will find the latter are much more natural sounding and…

    1 vote

    1 comment  ·  Speech