Yesterday, while reviewing some logs, I came across a curious entry in an Apache error log:
[Wed Apr 19 08:51:48.119666 2017] [core:error] [pid 29210] (36)File name too long: [client 18.104.22.168:40907] AH00036: access to /YesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForR esearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongReques tURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALo okAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurpos eWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisI sAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPu rposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWe AreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUs erAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreSca nningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyL ongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePlea seHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingI tOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTH XYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScann failed (filesystem path '[...]')
Formatted to plain English: Yes, this is a really long request URL but we are doing it on purpose. We are scanning for research purpose. Please have a look at the user agent. Thanks!
What does the user agent for this request have to say?
Here is the access log entry:
22.214.171.124 - - [19/Apr/2017:08:51:48 -0400] "GET [...] HTTP/1.1" 403 1471 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36 Scanning for research (researchscan.comsys.rwth-aachen.de)"
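As an aside, pulling the user agent out of an entry like this is easy to script. Below is a minimal sketch, assuming the log uses Apache's combined log format; the names COMBINED, parse_entry and agent_contact_host are my own, and the regex does not handle escaped quotes inside fields.

```python
import re

# Assumed layout: Apache "combined" log format.
# host ident user [time] "request" status size "referer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# A trailing "(some.host.name)" at the very end of the user agent string.
TRAILING_HOST = re.compile(r'\(([^()\s]+\.[^()\s]+)\)\s*$')


def parse_entry(line):
    """Return the fields of one combined-format line as a dict, or None."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def agent_contact_host(agent):
    """Return the hostname in trailing parentheses of a user agent, or None."""
    match = TRAILING_HOST.search(agent)
    return match.group(1) if match else None


# The access log entry from above, with the elided request left as-is.
SAMPLE = (
    '22.214.171.124 - - [19/Apr/2017:08:51:48 -0400] "GET [...] HTTP/1.1" '
    '403 1471 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 '
    'Safari/537.36 Scanning for research '
    '(researchscan.comsys.rwth-aachen.de)"'
)

entry = parse_entry(SAMPLE)
contact = agent_contact_host(entry["agent"])
```

Here entry["status"] holds the 403 and contact holds the hostname the scanner asks us to look at.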
The website referenced in the user agent, researchscan.comsys.rwth-aachen.de, explains that this request is part of a research project at RWTH Aachen University in Germany, and 126.96.36.199 is indeed part of the university's network.
Interestingly, this is the first time I have seen such a request in any web server log (I have had occasion to look through more than a few). The Wayback Machine has an archive of the user agent page from 6 Dec 2015, so it seems that the research has been going on for at least a year. I checked for the YesThisIsAReallyLongRequestURL string in (a lot of) logs for about 10 different websites of varying size and did not find any other instances. I wonder how they determine which sites to scan...
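For anyone who wants to run the same check on their own servers, here is a rough sketch of that kind of search. It assumes the logs sit under a single directory, that rotated logs may be gzip-compressed, and that log file names contain "log"; the function names and the default pattern are mine, not anything Apache prescribes.

```python
import gzip
from pathlib import Path

# The start of the scanner's request path, as seen in the error log entry.
MARKER = "YesThisIsAReallyLongRequestURL"


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def find_marker(log_dir, marker=MARKER, pattern="*log*"):
    """Yield (path, line number, line) for every log line containing marker.

    The pattern default is an assumption about how the log files are named
    (error.log, access.log.1, access.log.2.gz and so on).
    """
    for path in sorted(Path(log_dir).rglob(pattern)):
        if not path.is_file():
            continue
        with open_log(path) as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    yield path, number, line.rstrip("\n")


# Example use (the directory is hypothetical; adjust it to your setup):
# for path, number, line in find_marker("/var/log/apache2"):
#     print(f"{path}:{number}: {line[:120]}")
```

A plain substring match is enough here because the marker is distinctive; nothing about the scanner's behaviour is inferred beyond what the log lines themselves show.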