What is Robots.txt, Its Purpose and Usage:
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.
It works like this: before a robot visits a Web site URL, say http://www.example.com/welcome.html, it first checks for http://www.example.com/robots.txt, and finds, for example:
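```
User-agent: *
Disallow: /
```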
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
There are two important considerations when using /robots.txt: first, robots can ignore it entirely, since it is purely advisory and malicious crawlers (malware scanners, email-address harvesters) will pay no attention to it; second, the file is publicly readable, so anyone can see which parts of your site you would rather robots stayed out of, which means it should never be used to hide information.
The solution does not appear to work in DXA 2.0. Is there a plan to support DXA 2.0?
Quite a nice one :)
I guess one way to do it would be to have logic in your template that checks the "IsPreviewCapable" flag of the current TargetType - this will only work if you're using Web 8+ _AND_ if your staging environment is configured to use Session Preview... but it's probably a good start. Then use this in your template to determine which robots.txt to render.
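A very rough C# sketch of what that could look like in a TOM.NET template building block. To be clear about the assumptions: it assumes that on Web 8+ engine.PublishingContext exposes the current TargetType and that the TargetType carries the IsPreviewCapable property mentioned above - both need verifying against your Tridion version - and the class name and robots.txt strings are just illustrative:

```csharp
using Tridion.ContentManager.Templating;
using Tridion.ContentManager.Templating.Assembly;

[TcmTemplateTitle("Render robots.txt per target")]
public class RenderRobotsTxt : ITemplate
{
    public void Transform(Engine engine, Package package)
    {
        // Assumption: Web 8+ exposes the current TargetType here, and it has an
        // IsPreviewCapable flag when the target is set up for Session Preview.
        var targetType = engine.PublishingContext.TargetType;
        bool isStaging = targetType != null && targetType.IsPreviewCapable;

        // Block crawlers on the preview-capable (staging) target, allow them on live.
        string robots = isStaging
            ? "User-agent: *\nDisallow: /"
            : "User-agent: *\nDisallow:";

        // Push the chosen robots.txt content as the template output.
        package.PushItem(Package.OutputName,
            package.CreateStringItem(ContentType.Text, robots));
    }
}
```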
In a robots.txt file with multiple User-agent directives, each Disallow or Allow rule applies only to the user agent(s) named in that particular blank-line-separated group. If more than one group could apply to a crawler, it pays attention to (and follows the directives in) only the most specific matching group. This also means you can block a specific web crawler from a specific folder by giving it its own "User-agent" group.
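For example, in the hypothetical robots.txt below, a crawler identifying itself as Googlebot follows only its own, more specific group and stays out of /private/, while every other crawler follows the catch-all group and stays out of /tmp/:

```
# Applies only to Googlebot (the most specific matching group wins)
User-agent: Googlebot
Disallow: /private/

# Applies to all other crawlers
User-agent: *
Disallow: /tmp/
```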
But I am also not quite sure how to handle this for different target types (e.g. Staging and Live)... I will investigate, try to get more information, and post back as soon as I can.
This seems pretty straight-forward, but what if you need to have a different robots.txt depending on target? For instance, I may want to disallow crawling on my staging site, but allow on my live site?
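Concretely, the two variants would be something along these lines: the staging target serving a file that keeps all crawlers out, and the live target serving one that allows everything (an empty Disallow means nothing is blocked):

```
# staging: keep all crawlers out
User-agent: *
Disallow: /

# live: allow all crawlers everywhere
User-agent: *
Disallow:
```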