Danbooru

Downloading while keeping tags intact?

Posted under General

I am trying to save pictures from the site because I am rebuilding my personal image collection, and I have found that when I download the .jpg files, they do not retain their tags. Is there any way to retain the tags while downloading, or do the tags exist only in the website's code rather than being embedded in the file?

Thanks.

Updated by Akiraka8

By "do not retain their tags", do you mean in the file name or something similar? That's beside the point, since the latter option is true, i.e. tags are completely external to the image files. If you wanted to rename files to match their tags, that would be doable programmatically using the site API (and has perhaps already been done in one of the other threads we have about downloader scripts), but it would also be inherently unstable, since tags can and do change over time. It would be better to keep a local mirror of (a portion of) the tags DB instead.
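To make the "local mirror" idea a bit more concrete, here is a minimal sketch of what such a mirror could look like using SQLite. The table layout and the function names (open_mirror, store_tags, find_by_tags) are made up for illustration and are not taken from the site or from any existing downloader script; the images are keyed by MD5 only because that is what the downloaded filenames already contain.

```python
import sqlite3


def open_mirror(path=":memory:"):
    """Open (or create) a small local tag database. Hypothetical schema."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS post_tags ("
        " md5 TEXT NOT NULL,"
        " tag TEXT NOT NULL,"
        " PRIMARY KEY (md5, tag))"
    )
    # Index on tag so lookups by tag do not scan the whole table.
    db.execute("CREATE INDEX IF NOT EXISTS idx_post_tags_tag ON post_tags(tag)")
    return db


def store_tags(db, md5, tags):
    """Replace the stored tags for one image, so re-syncing picks up tag edits."""
    with db:  # commits on success, rolls back on error
        db.execute("DELETE FROM post_tags WHERE md5 = ?", (md5,))
        db.executemany(
            "INSERT INTO post_tags (md5, tag) VALUES (?, ?)",
            [(md5, tag) for tag in set(tags)],
        )


def find_by_tags(db, tags):
    """Return the MD5s of images that carry every one of the given tags."""
    wanted = sorted(set(tags))
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    rows = db.execute(
        f"SELECT md5 FROM post_tags WHERE tag IN ({placeholders})"
        " GROUP BY md5 HAVING COUNT(*) = ? ORDER BY md5",
        (*wanted, len(wanted)),
    )
    return [row[0] for row in rows]
```

Because the image files themselves are never touched, re-running store_tags after a tag change on the site is enough to bring the mirror up to date.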

葉月 said:
By "do not retain their tags", do you mean in the file name or something similar? That's beside the point, since the latter option is true, i.e. tags are completely external to the image files. If you wanted to rename files to match their tags, that would be doable programmatically using the site API (and has perhaps already been done in one of the other threads we have about downloader scripts), but it would also be inherently unstable, since tags can and do change over time. It would be better to keep a local mirror of (a portion of) the tags DB instead.

I mean: when I open Picasa and edit photos locally, tags are embedded into the data of the .jpg files, and I can search for them by tag both inside and outside of Picasa, i.e. I can search using Windows Search for "short hair" and all those pictures show up.

As I imagine it works here on Danbooru, the tags are probably not embedded in the files themselves as they are on my PC -- they're probably linked through a database somehow that allows them to be modified quickly and easily with no actual modification of the files themselves. Oh well, it was a thought. I figured it'd be neat if I could just download what images I wanted and search for them by the same tags that are here on Danbooru.

Oh, I see. What Picasa uses is in-file EXIF metadata. But that's not something we can apply here, because it destroys the byte-level identity between the source file and the stored/downloaded copy. Since Danbooru by definition only collects external content, unlike Picasa, which is meant to be *the* place for pictures to live in, the tags have to live in an external database.

(Technical nitpick: you cannot possibly search for tags at a sensible speed without mirroring them in a real DB, so Picasa has to do it too, and most probably only applies the tags when you actually download a copy. But that's an implementation detail; conceptually you can think of their tags as living inside the files themselves.)

If you really want the tags and happen to know a thing or two about programming, you might want to try writing a script like the one Bap suggested in forum #35161.

I don't know anything about programming myself, but from what I understand a script like that would do exactly the thing you want.

thebornotaku said:
I am trying to save pictures from the site because I am rebuilding my personal image collection, and I have found that when I download the .jpg files, they do not retain their tags. Is there any way to retain the tags while downloading, or do the tags exist only in the website's code rather than being embedded in the file?

Thanks.

Sorry if it's frowned upon to revive old threads, but I've seen nothing else on this topic lately.

There's a Python script called Sheska (https://github.com/xiongchiamiov/sheska) that does exactly what you want, basically using the methods described by Bap in forum #35161.

Well, I should say it used to do that... It's not working right now and I'm hoping someone here can help me fix it so I can update my local image collection.

When it works properly, it uses the MD5 from the image's filename to look the post up through the API, and it uses ExifTool (with a custom configuration) to write the tags into the file's EXIF fields. The result is a perfectly searchable tag list for your images in Picasa.
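For anyone curious what those two steps look like, here is a rough sketch. This is not Sheska's actual code. The lookup URL (posts.json with an md5: search) and the tag_string field are assumptions about the site API, the helper names are made up, and the sketch writes to the standard IPTC:Keywords field rather than reproducing Sheska's custom ExifTool configuration. It only builds the URL and the command line; actually fetching the JSON and running ExifTool is left out.

```python
import re
from pathlib import Path
from urllib.parse import urlencode

# 32 hex digits that are not part of a longer hex run.
MD5_RE = re.compile(r"(?<![0-9a-f])[0-9a-f]{32}(?![0-9a-f])")


def md5_from_filename(path):
    """Pull the MD5 out of a filename such as '<md5>.jpg', or return None."""
    match = MD5_RE.search(Path(path).stem.lower())
    return match.group(0) if match else None


def lookup_url(md5, base="https://danbooru.donmai.us"):
    """Build the API lookup URL. The endpoint and query form are assumptions."""
    return f"{base}/posts.json?" + urlencode({"tags": f"md5:{md5}"})


def tags_from_post(post):
    """Split the space-separated tag list. 'tag_string' is an assumed field name."""
    return post.get("tag_string", "").split()


def exiftool_command(path, tags):
    """Build an ExifTool command that writes one keyword entry per tag.

    IPTC:Keywords is a stand-in for whatever fields a custom config would use.
    """
    args = ["exiftool", "-overwrite_original"]
    args += [f"-IPTC:Keywords={tag}" for tag in tags]
    args.append(str(path))
    return args
```

The returned list can be handed to subprocess.run as-is, which avoids any shell quoting problems with unusual tag names.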

It apparently also saves the original MD5 as an EXIF field for reference (even though it's in the filename anyway), because writing the tags into the metadata obviously alters the file's hash. I think this might be where the problem is, though.
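A quick way to see the hash problem for yourself is to compare the MD5 in the filename with the MD5 of the file's current contents. The sketch below assumes the file is still named after its original MD5; the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 of a file's current contents, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_untouched(path):
    """True if the file's contents still hash to the MD5 in its filename."""
    return Path(path).stem.lower() == file_md5(path)
```

A file that has had tags written into it will report False here, which is why the original MD5 has to be kept somewhere (the filename or a metadata field) if the image is ever going to be looked up again.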
