Has the British Library started scraping this site?

This topic was created by Vimes.

  1. Vimes

    I've included entries from my own site's access log below. Note that the only likely reason they came to my site at all was a posting here: the referrer on the image request is a thread on this forum.

    This is the first time it has happened, even though I have posted links to other files on my site before.

    So the British Library think they are entitled to a copy of this website, as well as anything it links to. Their own page effectively says they'll only respect robots.txt when they feel like it. And paid-for content? They can force access to that too.

    (Not that this is new, admittedly: where the law is concerned, it has apparently been like this since 2013.)

    Time to block that user agent, and any remote host ending in .bl.uk, I think...
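A minimal sketch of that check in Python, assuming the application can see both the client's user-agent string and its reverse-DNS hostname. The names `should_block`, `BLOCKED_UA_TOKENS` and `BLOCKED_HOST_SUFFIXES` are hypothetical; the only values taken from the log entries below are the `bl.uk_lddc_bot` token and the `.bl.uk` domain. In practice the same rule would more likely go in the web server's own configuration.

```python
# Hypothetical blocklist values: the token and domain come from the log
# entries in this post; the structure itself is purely illustrative.
BLOCKED_UA_TOKENS = ("bl.uk_lddc_bot",)
BLOCKED_HOST_SUFFIXES = (".bl.uk",)


def should_block(user_agent: str, hostname: str) -> bool:
    """Return True if the request should be refused.

    A request is blocked when the user-agent string contains a blocked token,
    or when the client's hostname is a blocked domain or any subdomain of it.
    Both comparisons are case-insensitive.
    """
    ua = user_agent.lower()
    if any(token.lower() in ua for token in BLOCKED_UA_TOKENS):
        return True
    # Strip a trailing root dot ("crawler04.bl.uk.") that some resolvers return.
    host = hostname.lower().rstrip(".")
    for suffix in BLOCKED_HOST_SUFFIXES:
        # Matching on ".bl.uk" (with the leading dot) avoids catching
        # unrelated names such as "notbl.uk"; the bare domain is checked too.
        if host == suffix.lstrip(".") or host.endswith(suffix):
            return True
    return False


ua = ("bl.uk_lddc_bot/3.3.0-LBS-2016-02 "
      "(+http://www.bl.uk/aboutus/legaldeposit/websites/websites/faqswebmaster/index.html)")
print(should_block(ua, "crawler04.bl.uk"))         # True: UA and host both match
print(should_block("Mozilla/5.0", "example.org"))  # False
```

A reverse-DNS name is set by whoever controls the address block, so before trusting a hostname match it is worth forward-confirming it (resolving the name and checking it maps back to the client IP). The user-agent string, likewise, is whatever the client chooses to send.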

    crawler04.bl.uk - - [12/Dec/2016:02:46:15 -0700] "GET /robots.txt HTTP/1.0" 404 277 "-" "bl.uk_lddc_bot/3.3.0-LBS-2016-02 (+http://www.bl.uk/aboutus/legaldeposit/websites/websites/faqswebmaster/index.html)" patrick.seurre.com

    crawler04.bl.uk - - [12/Dec/2016:02:46:20 -0700] "GET /wp-content/uploads/2016/04/register_ad3.PNG HTTP/1.0" 200 168523 "http://m.forums.theregister.co.uk/forum/1/2016/04/05/Alistair_Hey_what_is_that_oddball_box_on_the_left/" "bl.uk_lddc_bot/3.3.0-LBS-2016-02 (+http://www.bl.uk/aboutus/legaldeposit/websites/websites/faqswebmaster/index.html)" patrick.seurre.com

    crawler04.bl.uk - - [12/Dec/2016:02:46:33 -0700] "GET /favicon.ico HTTP/1.0" 404 278 "https://patrick.seurre.com/wp-content/uploads/2016/04/register_ad3.PNG" "bl.uk_lddc_bot/3.3.0-LBS-2016-02 (+http://www.bl.uk/aboutus/legaldeposit/websites/websites/faqswebmaster/index.html)" patrick.seurre.com
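As a rough sketch of pulling the relevant fields out of entries like these, the Python below parses each line into named fields. It assumes the lines are Apache's "combined" log format with the virtual host appended at the end, which is inferred from their layout rather than stated anywhere; `LOG_PATTERN`, `SAMPLE_LINES` and `parse_line` are illustrative names. The referer on the PNG request is the forum thread, which is what suggests a posting here brought the crawler to the site.

```python
import re
from typing import Optional

# Assumed layout (combined log format plus a trailing virtual host):
# host ident user [time] "request" status bytes "referer" "user-agent" vhost
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<vhost>\S+)'
)

# The three entries from the post, copied verbatim.
SAMPLE_LINES = [
    'crawler04.bl.uk - - [12/Dec/2016:02:46:15 -0700] "GET /robots.txt HTTP/1.0" 404 277 "-" "bl.uk_lddc_bot/3.3.0-LBS-2016-02 (+http://www.bl.uk/aboutus/legaldeposit/websites/websites/faqswebmaster/index.html)" patrick.seurre.com',
    'crawler04.bl.uk - - [12/Dec/2016:02:46:20 -0700] "GET /wp-content/uploads/2016/04/register_ad3.PNG HTTP/1.0" 200 168523 "http://m.forums.theregister.co.uk/forum/1/2016/04/05/Alistair_Hey_what_is_that_oddball_box_on_the_left/" "bl.uk_lddc_bot/3.3.0-LBS-2016-02 (+http://www.bl.uk/aboutus/legaldeposit/websites/websites/faqswebmaster/index.html)" patrick.seurre.com',
    'crawler04.bl.uk - - [12/Dec/2016:02:46:33 -0700] "GET /favicon.ico HTTP/1.0" 404 278 "https://patrick.seurre.com/wp-content/uploads/2016/04/register_ad3.PNG" "bl.uk_lddc_bot/3.3.0-LBS-2016-02 (+http://www.bl.uk/aboutus/legaldeposit/websites/websites/faqswebmaster/index.html)" patrick.seurre.com',
]


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it doesn't match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    return match.groupdict() if match else None


for line in SAMPLE_LINES:
    entry = parse_line(line)
    if entry is None:
        continue
    # The request field looks like "GET /path HTTP/1.0".
    method, path, protocol = entry["request"].split(" ", 2)
    print(entry["status"], method, path, "referer:", entry["referer"])
```

Filtering parsed entries on the `host` or `agent` field is then enough to list everything a given crawler fetched.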
