As LiveJournal has moved to a new implementation of their image hosting service, the venerable fotoup.pl no longer seems to work for backing up pictures. I did find a more up-to-date copy of fotoup.pl, but it still didn't know that it was supposed to use the new ic.pics.livejournal.com server.
I didn't relish the thought of trying to update the fotoup.pl script myself, but fortunately I found an alternate set of backup instructions using wget that were easier to modify.
Here are the steps I followed. Note that I use a bunch of Unix commands, which I ran under Cygwin but which should work just as well on Linux or at any other Unix prompt.
- In Chrome, install the cookies.txt export extension. If you're using Firefox, you could use a similar extension such as Export Cookies.
- Log in to LiveJournal.
- Export the cookies for livejournal.com using your handy browser extension and save them in a file called "cookies.txt".
- Replacing "USERNAME" with your LiveJournal username, run this command to download all of the web pages under your new Scrapbook "catalog" directory:
wget --load-cookies cookies.txt -nc -np -r -o crawl_log.txt http://USERNAME.livejournal.com/pics/catalog
- From all of that HTML that you just downloaded, you want to extract all of the URLs that link to your original images, all of which end with original.jpg. This command should do it (again, replacing USERNAME with your username):
grep -r original.jpg USERNAME.livejournal.com | grep _blank | cut -d '"' -f 6 | sort | uniq > original_urls.txt
- Just to make sure it's all right, take a look at the first few lines of the file you created:
head original_urls.txt
It should be a list of links that look like this, with your username in place of USERNAME and with different numbers:
http://ic.pics.livejournal.com/USERNAME/8675309/24601/original.jpg
- Now that you have the list of images to download, it's time to download them:
wget --load-cookies cookies.txt -i original_urls.txt -np -o dl_log.txt -x
This should create an ic.pics.livejournal.com subdirectory and download all your images under it.
- Finally, there's the problem that all of your images have been downloaded with the same name, "original.jpg", each in a different directory whose name is a different number. In order to put all the pictures into a single directory, I went into the parent directory of the pictures' directories (something like "ic.pics.livejournal.com/USERNAME/8675309") and ran the following shell script there:
for NUM in *; do mv "$NUM/original.jpg" "$NUM.jpg"; rmdir "$NUM"; done
(I actually made a backup of all of my ic.pics.livejournal.com directory before doing this, just so that I wouldn't need to re-download things if anything went wrong with the script.)
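If you'd rather not rely on a backup alone, the renaming loop can be written a bit more defensively. This is only a sketch under some assumptions: the function name flatten_originals is made up for illustration, and it assumes each picture directory holds exactly one file called original.jpg. It skips anything that isn't a directory containing original.jpg and refuses to overwrite a NUM.jpg that already exists.

```shell
#!/bin/sh
# Sketch: turn NUM/original.jpg into NUM.jpg inside one parent directory.
# flatten_originals is a hypothetical helper name, not part of any tool.
flatten_originals() {
    parent=${1:-.}
    for dir in "$parent"/*/; do
        dir=${dir%/}                    # strip the trailing slash from the glob
        src="$dir/original.jpg"
        dest="$dir.jpg"
        [ -f "$src" ] || continue       # not a picture directory, leave it alone
        if [ -e "$dest" ]; then
            echo "skipping $src: $dest already exists" >&2
            continue
        fi
        mv -- "$src" "$dest"
        # rmdir only succeeds on an empty directory, so leftovers are kept.
        rmdir -- "$dir" 2>/dev/null || echo "left $dir in place (not empty)" >&2
    done
}
```

You would call it with the parent directory as its argument, for example: flatten_originals ic.pics.livejournal.com/USERNAME/8675309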
Of course, I can think of a couple of ways to improve on this. In particular, it would be cool if I had kept the original album structure. This should be feasible, because the original album structure is captured in all of that HTML that I downloaded with the first wget command. So, either by processing each album separately or by looking back at the downloaded HTML after downloading the images, it should be possible to figure out which pictures were in which album.
I haven't gotten that far yet, though. In the meantime, I figured I should share what I have.
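For anyone who wants to pick up the album idea, here is one possible starting point for the "look back at the downloaded HTML" approach. It is an untested-against-real-Scrapbook sketch with a few assumptions: that the image URLs appear literally in the HTML in the ic.pics.livejournal.com form shown earlier, and that knowing which page mentions an image tells you something useful about its album. The function name list_page_images is made up for illustration.

```shell
#!/bin/sh
# Sketch: print "html-file<TAB>image-url" for every original.jpg URL found
# under a crawl directory. list_page_images is a hypothetical helper name.
list_page_images() {
    crawl_dir=$1
    tab=$(printf '\t')
    # -r: recurse, -o: print only the matched text, -H: always show the file name.
    # The pattern assumes URLs of the form
    # http://ic.pics.livejournal.com/USERNAME/NUMBER/NUMBER/original.jpg
    grep -r -o -H 'http://ic\.pics\.livejournal\.com/[^"<> ]*/original\.jpg' "$crawl_dir" |
        sed "s/:http:/${tab}http:/" |   # turn grep's "file:match" into "file<TAB>match"
        sort -u
}
```

Running something like list_page_images USERNAME.livejournal.com > page_images.txt would give a two-column list of pages and the images they link to, which could then be grouped by page to recover a rough album layout.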