Update: Microsoft will be moving away from UserVoice sites on a product-by-product basis throughout the 2021 calendar year. We will leverage 1st party solutions for customer feedback. Learn more here.

Azure Cognitive Services

Customer Feedback & Ideas for Azure Cognitive Services

Share your ideas for making Cognitive Services and the accompanying APIs work better for the applications you develop.




Share your Ideas and Feedback

To share your ideas on how we can make Cognitive Services better, click one of the categories underneath "Give Feedback" located in the sidebar menu to access the forum.


Documentation

API documentation is available here. Within it, you'll find:

  • Getting started samples
  • API references
  • Testing consoles

Using one or more of the APIs as a "Free" preview?  Be sure to read our Terms of Service.

Contact Support

UserVoice is intended for product feedback. If you need product support, please either contact Azure support (https://azure.microsoft.com/en-us/support/plans/) or ask a question on Stack Overflow (https://stackoverflow.com/questions/tagged/microsoft-cognitive).




  1. Generate accurate audio clip for each utterance

    Getting an audio clip for each utterance will make it possible to generate a basis for a human-labeled transcript for training a custom model. This will make it possible to gradually improve the recognition accuracy after every "session", by checking the transcription and the corresponding audio clip and fixing the text for incorrect transcriptions.

    Additionally, the audio clip can be used as a live read-back of the original audio.

    6 votes

    1 comment  ·  Speech
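As a rough illustration of idea 1, here is a minimal sketch of how a per-utterance clip could be cut from the original audio once each utterance's offset and duration are known. It assumes raw PCM input and assumes that offsets and durations are expressed in 100-nanosecond ticks; the function name `slice_pcm` and the sample values are hypothetical and are not part of any Speech SDK.

```python
# Assumption: offsets and durations are given in 100-nanosecond ticks.
TICKS_PER_SECOND = 10_000_000


def slice_pcm(pcm: bytes, sample_rate: int, bytes_per_frame: int,
              offset_ticks: int, duration_ticks: int) -> bytes:
    """Return the raw PCM bytes that cover one utterance (hypothetical helper)."""
    # Convert tick positions to whole frame indices (rounding down).
    start_frame = offset_ticks * sample_rate // TICKS_PER_SECOND
    end_frame = (offset_ticks + duration_ticks) * sample_rate // TICKS_PER_SECOND
    # Slice on frame boundaries so samples are never split in half.
    return pcm[start_frame * bytes_per_frame:end_frame * bytes_per_frame]


# Made-up input: 2 seconds of 16 kHz, 16-bit mono silence.
audio = bytes(2 * 16_000 * 2)

# Utterance starting at 0.5 s and lasting 1 s.
clip = slice_pcm(audio, sample_rate=16_000, bytes_per_frame=2,
                 offset_ticks=5_000_000, duration_ticks=10_000_000)
print(len(clip))  # 32000 bytes, i.e. 1 second of audio
```

A clip produced this way could be paired with the recognized text for review, which is the labeling workflow the idea describes.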
  2. Speaker diarization for more than 2 speakers

    Speaker diarization for more than 2 speakers.

    See this one: https://cognitive.uservoice.com/forums/555925-speaker-recognition/suggestions/34823824-add-support-for-speaker-diarization-for-untrained

    I don't feel this should be marked as resolved. I would expect support for at least 10 speakers. Additionally, it's currently really poor and switches between speaker 1 and 2 almost randomly. Please make this more intelligent. It's a deal breaker for us and, I'm sure, many others, especially considering the Google alternative can handle unlimited speakers and is far more accurate at identifying them.

    https://cloud.google.com/speech-to-text/docs/multiple-voices

    And no... expecting a sample to train it for each voice is not an option. We literally just need it to assign a number…

    6 votes

    1 comment  ·  Speech
  3. Need the new metric to check the number of characters used for text to speech on the Azure portal

    We need to be able to check the number of characters used for text to speech.
    Under the metrics tab on the Azure portal, we can only see the number of requests that have been made.

    5 votes

    0 comments  ·  Speech
  4. Audio Offset / Duration for Best Result on normalized words

    The JSON and/or result object needs to have the offset and duration of the whole normalized word.
    I've reviewed the JSON and it still doesn't solve the problem. I need to know the relationship of the DisplayText words to the Word Timings in the detail. When the DisplayText outputs 007 and the Word Timings output "double" "oh" "seven" as 3 different words, I don't know that 007 = those three words, as there is no reference. There needs to be a display word reference to the audio word to track offset/duration of an underlying audio file. The only option that…

    3 votes

    0 comments  ·  Speech
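To make idea 4 concrete, the sketch below shows what the requested reference would enable. The word-timing entries are made-up values shaped like the word timings the post describes, and `display_map` is a hypothetical mapping from each display token to the spoken words behind it. That mapping is exactly the piece the post says is missing, so it is an assumption here, not an existing field.

```python
from typing import Dict, List, Tuple

# Made-up word timings (offset and duration per spoken word).
words: List[Dict] = [
    {"Word": "agent", "Offset": 1_000_000, "Duration": 4_000_000},
    {"Word": "double", "Offset": 6_000_000, "Duration": 3_000_000},
    {"Word": "oh", "Offset": 9_500_000, "Duration": 2_000_000},
    {"Word": "seven", "Offset": 12_000_000, "Duration": 4_000_000},
]

# Hypothetical reference: display token -> indices of the spoken words it covers.
display_map: List[Tuple[str, List[int]]] = [("Agent", [0]), ("007", [1, 2, 3])]


def display_spans(words: List[Dict],
                  display_map: List[Tuple[str, List[int]]]) -> List[Tuple[str, int, int]]:
    """Return (display token, offset, duration) for each display token."""
    spans = []
    for token, indices in display_map:
        first, last = words[indices[0]], words[indices[-1]]
        offset = first["Offset"]
        # The span runs from the start of the first word to the end of the last.
        duration = last["Offset"] + last["Duration"] - offset
        spans.append((token, offset, duration))
    return spans


spans = display_spans(words, display_map)
for token, offset, duration in spans:
    print(token, offset, duration)
```

With such a reference, the normalized word 007 gets a single offset and duration covering "double", "oh", and "seven", including the pauses between them.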
  5. Site banner when there is a known issue

    Twice now the Speech portal has been broken by the owning Product Group.

    Twice now I have wasted hours of my time as well as MS support personnel time trying to debug something only to find out that the portal (and associated APIs) were broken and it was known by the group.

    Twice now the fix has been weeks in the deploying, so god knows how many other customers' time has been wasted.

    If you have a known issue that affects your customers, especially given the woeful error messaging on the portal, then please add a banner on the…

    2 votes

    0 comments  ·  Speech
  6. Improve the Speech Studio Text Editor.

    Being able to change the font type, color, and size, and even highlight text with colors, in the text editor would be very practical.

    1 vote

    0 comments  ·  Speech
  7. Dictionary function in Speech Studio to ignore words.

    Add a dictionary function to Speech Studio that lets you ignore or change the pronunciation of a word throughout the whole document. That is, once a word is added to the dictionary, it is not read, even if it appears 100 times in the same document, and you don't have to mark each occurrence one by one.

    1 vote

    0 comments  ·  Speech
  8. Add speech profiles in Speech Studio.

    Have the option of saving voice profiles for dialogues. These profiles would include the voice, tone, rate, volume, and intonation, so that when you want to apply a profile, you select the desired text, press the profile, and all of those values are applied.

    1 vote

    0 comments  ·  Speech
  9. Enterprise pricing for Speech to text and Speech to Text neural, would provide extension to the current pricing for large volume users.

    Enterprise pricing for Speech to text and Speech to Text neural would provide an extension to the current pricing for large-volume users. We have clients that currently use hundreds of millions of characters using traditional data capture methods, and they see the current pricing as not addressing their enterprise client market.

    1 vote

    0 comments  ·  Speech
  10. JavaScript support for keyword recognition

    Not necessarily an idea, and please let me know if this is not the right place for this. But it would be great to have the JavaScript SDK support Keyword Recognition (specifically Custom Keywords).

    1 vote

    0 comments  ·  Speech
  11. Actionable Error Messaging in Speech Portal

    When a Dataset upload fails the error messaging is literally "Failed" and clicking on the Dataset displays "Failed to upload data. Please check your data format and try to upload again."

    This is not actionable error messaging. I have checked the data multiple times. I have been uploading this data, with additions, using an automated process for a year without issue.

    Tell us why it failed. Give us a hint. I have 15,000 files and entries in the Trans.txt file. "It failed" is not useful information. Especially when it could easily be a problem server-side and Microsoft provides no validation…

    1 vote

    0 comments  ·  Speech
  12. I'd Like To Use C++ To Create An Exercise App With Voice Commands

    I have a degree in Exercise and Sports Science and just got into coding 8 months ago, and I now want to create an Exercise App that uses voice command to run the app. For example, I'd like for the user to be able to use the commands "What's today's workout?", "What's the first exercise?", "What's the next exercise?", and "I'm finished with the workout?". I've used C++ before for simple projects, but I've never used it to create an app with voice commands. I'd really like to start from scratch and have somebody guide me through a Teams meeting…

    1 vote

    0 comments  ·  Speech
  13. Speaking some letters such as "A" and "E" using the English neural voices sounds bad.

    The sound of the neural voices speaking some single letters such as would be done when speaking multiple choice test options does not match with other utterances by the same voice. One particular example of this is the letter "A" sounds very very short and also of a lower volume than "B" "C" and "D". It sticks out like a sore thumb especially since much of the rest of the utterances of words and sentences sounds so good. Compare the single letter utterances of Guy (neural) with Noah and you will find the latter are much more natural sounding and…

    1 vote

    1 comment  ·  Speech
  14. iOS Speech SDK: 'SPXDialogServiceConnector' class is missing

    With ref. to https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/860#issuecomment-726436315 raising it here.

    Missing Wrapper Class:
    Connection to Bot service using 'SPXDialogServiceConnector' class is unavailable in iOS Speech SDK.
    Note: It is available for Windows SDK and Android SDK.

    Alternative: Developers need to write their own Objective-C++ wrapper to utilize the core C++ SDK class.

    If it were available natively in the iOS Speech SDK, developers would not each have to write their own wrapper.

    This is the need of the hour: the SDK has more potential when we can connect Speech to the Bot service.

    1 vote

    0 comments  ·  Speech
  15. Azure TTS bug: <prosody rate="100%"> not handled correctly

    Problem you have encountered:
    <prosody rate="108%"> does not work as per the W3C spec for SSML.
    Neither does <prosody rate="100%">

    These result in the TTS being spoken at about twice the normal rate, which is not right.

    What you expected to happen:
    I expect the speaking rate to be DEFAULT with rate="100%", as the W3C spec your documentation references at: https://www.w3.org/TR/speech-synthesis/#S3.3.2
    literally says: "For example, a value of 100% means no change in speaking rate"

    However if instead we used: <prosody rate="+100%"> (With a '+') - THEN the speed should be doubled. The "+" and "-" are critical for relative…

    1 vote

    0 comments  ·  Speech
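The behavior that idea 15 expects can be written down as a small function. This is only a sketch of the post's reading of the rate attribute: an unsigned percentage is a multiple of the default rate, and a signed percentage is a relative change. It is not a description of what the service actually does. The name `rate_multiplier` is hypothetical, and treating "-" as a relative decrease is an assumption based on the post's mention of "+" and "-".

```python
def rate_multiplier(rate: str) -> float:
    """Map a prosody rate percentage to a speed multiplier (post's expected reading)."""
    value = rate.strip()
    if not value.endswith("%"):
        raise ValueError(f"not a percentage: {rate!r}")
    number = float(value[:-1])
    if value[0] in "+-":
        # Signed value: relative change from the default rate.
        return 1.0 + number / 100.0
    # Unsigned value: direct multiple of the default rate.
    return number / 100.0


for example in ["100%", "108%", "+100%", "-50%"]:
    print(example, rate_multiplier(example))
```

Under this reading, rate="100%" leaves the speed unchanged, and only rate="+100%" doubles it.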