Extract text in a readable manner for pages using landscape orientation in PDF
There is a known issue with PDFs that display certain pages with text oriented vertically. The text is extracted, but in an unreadable form, because current extraction methods expect text to run horizontally. Unfortunately, there is currently no workaround within Azure Search.
We would need to inspect thousands of documents and manually rotate the vertical/landscape pages of each PDF before extraction works as expected.
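The inspection step, at least, might be scriptable outside Azure Search. Below is a minimal sketch, assuming the third-party pypdf package is available and that the affected pages record their orientation in the page's rotation entry. Pages whose text is rotated inside the content itself would not be caught this way. The function names `rotated_page_numbers` and `scan_pdf` are made up for illustration.

```python
from typing import Iterable, List


def rotated_page_numbers(rotations: Iterable[int]) -> List[int]:
    """Return 1-based page numbers whose rotation is not a multiple of 360."""
    return [n for n, r in enumerate(rotations, start=1) if r % 360 != 0]


def scan_pdf(path: str) -> List[int]:
    """List the rotated pages of one PDF.

    Assumption: pypdf is installed. It is imported here, not at module
    level, so the helper above stays usable without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    # page.rotation is the page's declared rotation in degrees (0, 90, 180, 270).
    return rotated_page_numbers(page.rotation for page in reader.pages)
```

Running `scan_pdf` over a folder of documents would give a list of pages to review, which narrows the manual work but does not by itself make the extracted text readable.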
Thank you for your feedback. We’re considering this for a future release of Azure Search.
Azure Search Product Team