You may, or may not, have seen an interesting article about one chap deciding to save a load of BBC websites before the BBC delete them in the name of “saving money” (a rationale people are dubious about, as the BBC do have form for deleting important and interesting stuff and only asking questions later).
But how did he do it for so little money?
Well, our first guess would be that this little beauty – HTTrack – was responsible for the actual mirroring work.
HTTrack (or WinHTTrack, as the Windows version is known) is a free software offline browser. That is to say, it downloads a website to your hard drive, adjusts all the links so the copy no longer needs the original web server, and lets you browse it offline.
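To give a feel for the link-adjusting part, here’s a rough Python sketch of the idea. It is not how HTTrack actually does it: it’s a naive, regex-based illustration, and the function names (local_path, rewrite_links) and the “host/path, with index.html for directories” file layout are our own assumptions. It turns same-site links in a saved page into relative paths pointing at the local copies, and leaves links to other sites alone.

```python
import posixpath
import re
from urllib.parse import urljoin, urlparse

# Naive pattern: only matches double-quoted href="..." and src="..." attributes.
LINK_ATTR = re.compile(r'(href|src)="([^"]*)"')


def local_path(url):
    """Map a URL to a file path in the mirror (assumed layout: host + path)."""
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory URLs are saved as index.html
    return parts.netloc + path


def rewrite_links(html, page_url):
    """Point same-site links at the local copies, relative to this page."""
    host = urlparse(page_url).netloc
    page_dir = posixpath.dirname(local_path(page_url))

    def fix(match):
        attr, link = match.group(1), match.group(2)
        target = urljoin(page_url, link)  # resolve relative links first
        if urlparse(target).netloc != host:
            return match.group(0)  # off-site (and mailto:) links are left untouched
        # Query strings and #fragments are dropped by local_path in this sketch.
        rel = posixpath.relpath(local_path(target), page_dir)
        return f'{attr}="{rel}"'

    return LINK_ATTR.sub(fix, html)


page = ('<a href="/about/">About</a> '
        '<img src="http://example.com/img/logo.png"> '
        '<a href="https://other.org/">Elsewhere</a>')
print(rewrite_links(page, "http://example.com/news/index.html"))
```

Run against a page saved from http://example.com/news/index.html, href="/about/" becomes href="../about/index.html" and the logo becomes ../img/logo.png, while the link to other.org stays as it was. A real offline browser also has to cope with query strings, fragments, unquoted attributes, CSS and JavaScript, which is a good reason to use a proper tool rather than roll your own.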
Sounds simple, and perhaps in this day and age – when we’re no longer paying for Internet connectivity by the second and desperately trying to squeeze every last byte out of a wheezing 28.8k modem – it’s far less necessary than it used to be. Which is why the timely nature of the post above got us thinking about it. (Of course, it’s likely that the Internet Archive’s Wayback Machine has already captured those sites, but there’s nothing like being sure.)
While copying an entire website may, or may not, be legal, depending on the circumstances, there’s no doubt that it’s extremely useful. (In my many years programming for the web I’ve had to do it a couple of times to help build new versions of sites, and I’m sure that if I could be bothered to think about it for more than the requisite thirty seconds, I’d come up with a dozen more scenarios that would send both of us to sleep.)
However, mirroring a website like that can be a bit rude – especially if the server at the other end doesn’t have a particularly large Internet connection (and stressing your own connection probably isn’t a good idea, either). So there are built-in options to limit bandwidth. The default is 20KB a second – a fairly robust tradeoff between performance and politeness – but you can limit it to far less, or tell it to go hell for leather and work as hard as it can (which I’ve also used at times to load-test web servers).
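For the curious, a bandwidth limit generally boils down to something like the following Python sketch: keep a running count of the bytes transferred and sleep whenever you get ahead of the allowed average. This is our own illustration rather than HTTrack’s code; RateLimiter and copy_throttled are hypothetical names, and we’ve treated 20KB as 20,000 bytes.

```python
import time


class RateLimiter:
    """Keep the average transfer rate at or below max_bytes_per_sec by sleeping."""

    def __init__(self, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.max_rate = max_bytes_per_sec
        self.clock = clock  # injectable, so timing can be checked without real waits
        self.sleep = sleep
        self.start = clock()
        self.transferred = 0

    def throttle(self, nbytes):
        """Record nbytes just transferred; pause if we are ahead of the allowed rate.

        Returns the number of seconds slept (0.0 if no pause was needed).
        """
        self.transferred += nbytes
        # Earliest moment this many bytes should have gone through at max_rate.
        earliest = self.start + self.transferred / self.max_rate
        delay = earliest - self.clock()
        if delay > 0:
            self.sleep(delay)
        return max(delay, 0.0)


def copy_throttled(src, dst, limiter, chunk_size=8192):
    """Copy a binary stream (e.g. an HTTP response) to dst, chunk by chunk, politely."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        limiter.throttle(len(chunk))
```

In use, you’d hand copy_throttled a response from urllib.request.urlopen() and a local file opened in "wb" mode, with RateLimiter(20_000) for roughly 20KB a second. Averaging over the whole transfer, rather than per chunk, means a brief stall on the network doesn’t make the sketch slower overall; it just sleeps less afterwards.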
In any event, this is one of those tools to always have around. You never know when you will need it. But it’s likely that when you do, you’re going to need it now. And you know what I mean: the kind of Right. This. Second. now that requires both capital letters and punctuation marks.