Monday, September 24, 2007
It's really been over ten years since I wrote “An Extended Standard for Robot Exclusion”? Wow …
Around the same time the IETF draft was being discussed, Sean “Captain Napalm” Connor [sic] proposed his own extension to the Robots Exclusion Protocol, which included Allow rules as well as regular expression syntax for rules, and new Robot-version, Visit-time, Request-rate, and Comment rules. Less than 100 of the sites I visited use rules unique to this spec.
Via email from Steve Smith, robots.txt Adventure
[As a small aside, I don't know why people insist on spelling my last name with an “O-R” instead of an “E-R”. It's not like I misspelled my own name on that page. Sigh. —Editor]
That's not the only place “An Extended Standard for Robot Exclusion” has been referenced—it's also mentioned in O'Reilly's HTTP: The Definitive Guide, but until Steve reminded me of it, I had basically forgotten about it. Understandable, since the last time it was edited was November of 2002 (and even then, the previous edit was six years earlier—it's old).
This probably means it's time once again to check the links and make sure they all work.
And maybe clean up the HTML while I'm at it.
Just as soon as I can reproduce that insipid Heisenbug.