Allow web crawler robots on a case-by-case basis
As per an Azure AD blog post:
"As part of our continuous effort to improve the security posture of applications that are published by Azure AD Application Proxy, we have started to block Web crawler robots from indexing and archiving your applications.
Every time a Web crawler robot tries to retrieve the robots settings for a published application, the proxy will reply with a robots.txt file that has the following content:
User-agent: *
Disallow: /
No action is needed to turn this on. All Application Proxy customers will automatically get this functionality."
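You can confirm this behavior against your own published app by fetching /robots.txt from the external URL. Here is a minimal Python sketch, with app.contoso.com standing in as a placeholder for your published hostname:

    import urllib.request

    # app.contoso.com is a placeholder for the external URL of an
    # application published through Azure AD Application Proxy.
    with urllib.request.urlopen("https://app.contoso.com/robots.txt") as resp:
        print(resp.read().decode("utf-8"))

    # Per the blog post above, the proxy is expected to answer with:
    #   User-agent: *
    #   Disallow: /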
I am using AADAP within education (read: no money) to publish some public web pages. Excellent, it's amazing, I love it!
That is, until I found that no search engines were indexing the sites.
Although I understand the security stance described in the blog post above, I would like the ability to re-enable robots on a case-by-case basis, as this is functionality I actually want and need in this scenario.
It will now be considerably more expensive for me to set up alternative ways of publishing these websites, and that is hard in a very cost-conscious environment.
Jayson Knight commented
Please allow us to do this. Many companies use Azure Application Proxy for wildcard publishing, and many of those sites are outward-facing and NEED to be crawled by Google and other search engines. It's kind of ridiculous to take control out of the hands of those who need their external sites crawled. Now we'll have to configure an on-premises reverse proxy, which means maintaining it, etc.
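If it helps anyone in the meantime, the core of that workaround is just intercepting /robots.txt at your own proxy and passing everything else through. A rough Python sketch, assuming a hypothetical internal backend at internal-web.contoso.local (in production you would use a proper reverse proxy such as nginx or IIS ARR instead):

    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    BACKEND = "http://internal-web.contoso.local"  # assumed internal host
    ROBOTS = b"User-agent: *\nAllow: /\n"          # crawler-friendly policy

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/robots.txt":
                # Answer crawlers ourselves instead of letting anything
                # upstream decide.
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.send_header("Content-Length", str(len(ROBOTS)))
                self.end_headers()
                self.wfile.write(ROBOTS)
                return
            # Forward every other GET request to the internal site unchanged.
            with urllib.request.urlopen(BACKEND + self.path) as resp:
                body = resp.read()
                self.send_response(resp.status)
                self.send_header("Content-Type",
                                 resp.headers.get("Content-Type", "text/html"))
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), ProxyHandler).serve_forever()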