Danbooru

Danbooru2020 mirror/dataset

Posted under General

First: Thank you for your work on this dataset.

Isn't the warning about the MD5 hashes in the metadata ("The MD5s are often incorrect.") misleading?

Those MD5s are correct (i.e. the MD5 in Danbooru's metadata matches the MD5 of the file stored on Danbooru's servers).

What can differ is Danbooru's MD5 and the MD5 you'd get if you downloaded the image again from the original source URL and hashed it, typically because the image has since changed at the source.
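To make the distinction concrete, here is a minimal Python sketch. It is only an illustration: `compare_hashes` and the byte strings are hypothetical stand-ins, not part of Danbooru's API or the dataset's tooling, and real use would read the actual image files instead.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the lowercase hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def compare_hashes(metadata_md5: str, stored_file: bytes, source_file: bytes) -> dict:
    """Compare a metadata MD5 against two different copies of an image.

    stored_file: bytes of the file as kept on the server (assumed input).
    source_file: bytes fetched again from the original source URL (assumed input).
    """
    expected = metadata_md5.lower()
    return {
        "metadata_matches_stored_file": md5_hex(stored_file) == expected,
        "metadata_matches_source_refetch": md5_hex(source_file) == expected,
    }


# Hypothetical stand-ins for real image bytes.
stored = b"image bytes as uploaded"
refetched = b"image bytes after the source site changed the file"

# The metadata MD5 is modelled here as the hash of the stored file.
result = compare_hashes(md5_hex(stored), stored, refetched)
print(result)
```

Under these assumptions the metadata hash agrees with the stored file and disagrees only with the re-fetched source copy, which is the situation described above.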

The second footnote (the one about recompressing images in a way that is lossless in terms of image data) reinforces this misunderstanding of md5_mismatch: "[...], but note that the original MD5 hashes are available in the metadata, and many thousands of them are incorrect even on the original Danbooru server".
