OCR Cognitive Skill for both printed and handwritten text
When parsing documents and images through the OCR cognitive skill, the 'handwritten' text extraction algorithm fails on printed documents, and the 'printed' algorithm fails on handwritten ones. This isn't a bug, but it becomes a problem when indexing data dumps that mix both document types. One fix might be a small binary classifier model that infers which algorithm is appropriate for each document.
An alternative would be a simple way to flag each document as handwritten or printed, so that it is processed with the appropriate algorithm.
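A minimal sketch of the routing idea in the two paragraphs above, assuming each document may carry an optional handwritten/printed flag and that some binary classifier returns a probability that a document is handwritten. `Document`, `handwritten_flag`, `choose_algorithm`, `route`, and the 0.5 threshold are hypothetical names chosen for illustration, and the classifier is a stand-in rather than any Azure API. The strings "printed" and "handwritten" are assumed to correspond to the OCR skill's two extraction modes. An explicit flag takes precedence, and the classifier is consulted only for unflagged documents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumed to mirror the OCR skill's two text extraction modes.
PRINTED = "printed"
HANDWRITTEN = "handwritten"


@dataclass
class Document:
    name: str
    # Hypothetical metadata flag set at upload time; None means "not flagged".
    handwritten_flag: Optional[bool] = None


def choose_algorithm(
    doc: Document,
    classifier: Callable[[Document], float],
    threshold: float = 0.5,
) -> str:
    """Pick an extraction mode: use the explicit flag if present, else the classifier."""
    if doc.handwritten_flag is not None:
        return HANDWRITTEN if doc.handwritten_flag else PRINTED
    # The (hypothetical) classifier returns P(document is handwritten).
    score = classifier(doc)
    return HANDWRITTEN if score >= threshold else PRINTED


def route(
    docs: List[Document],
    classifier: Callable[[Document], float],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Group document names by chosen mode so each batch can use the matching OCR setting."""
    batches: Dict[str, List[str]] = {PRINTED: [], HANDWRITTEN: []}
    for doc in docs:
        batches[choose_algorithm(doc, classifier, threshold)].append(doc.name)
    return batches


if __name__ == "__main__":
    # Toy stand-in classifier: treats any file name containing "note" as handwritten.
    def toy_classifier(doc: Document) -> float:
        return 0.9 if "note" in doc.name else 0.1

    docs = [
        Document("invoice.pdf"),
        Document("field_note.jpg"),
        Document("scan.png", handwritten_flag=True),
    ]
    print(route(docs, toy_classifier))
    # {'printed': ['invoice.pdf'], 'handwritten': ['field_note.jpg', 'scan.png']}
```

Each resulting batch could then be indexed with an OCR skill configured for that mode.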
Thank you for your feedback. While it is unlikely we’ll address this suggestion in the near future, we’ll reassess based on the number of votes it receives.
While we're not currently planning to solve this out of the box, we are exploring a new cognitive service for custom document classification. You could use it to build your own classifier and then wire it into your search pipeline as a cognitive skill. Feel free to reach out to us if you're interested in exploring this further.
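A custom classifier wired into the pipeline as described would most likely be exposed as a custom Web API skill. The sketch below assumes the custom skill contract in which the request body carries a `values` array of records, each with a `recordId` and a `data` object, and each response record returns `recordId`, `data`, `errors`, and `warnings`. The input field `imageUrl`, the output fields `textType` and `score`, the function name `classify_records`, and the classifier callable are all hypothetical; no real document classification service is called.

```python
from typing import Any, Callable, Dict, List


def classify_records(
    payload: Dict[str, Any],
    classifier: Callable[[str], float],
    threshold: float = 0.5,
) -> Dict[str, Any]:
    """Handle one custom-skill request body and return a printed/handwritten label per record."""
    results: List[Dict[str, Any]] = []
    for record in payload.get("values", []):
        record_id = record.get("recordId")
        data = record.get("data", {})
        image_ref = data.get("imageUrl")  # hypothetical input field name
        if image_ref is None:
            # Report a per-record error instead of failing the whole batch.
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": "missing imageUrl"}],
                "warnings": [],
            })
            continue
        # The (hypothetical) classifier returns P(image is handwritten).
        score = classifier(image_ref)
        label = "handwritten" if score >= threshold else "printed"
        results.append({
            "recordId": record_id,
            "data": {"textType": label, "score": score},  # hypothetical output fields
            "errors": [],
            "warnings": [],
        })
    return {"values": results}


if __name__ == "__main__":
    request = {
        "values": [
            {"recordId": "1", "data": {"imageUrl": "note.jpg"}},
            {"recordId": "2", "data": {}},
        ]
    }
    response = classify_records(request, lambda ref: 0.8 if "note" in ref else 0.2)
    print(response["values"][0]["data"]["textType"])  # handwritten
```

A downstream step could then read the `textType` output to select the matching OCR configuration for each document.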
Azure Search Product Team