Danbooru

A question about uploading almost the same images from a new source

Posted under General

This is a bit of a headscratcher for me. I wanted to help complete pool #11487 using images from here, a repost of the comic that went up after the artist took down the original Pixiv post, which some of the existing uploads came from. Thing is, it turns out that some of the images in there are larger and maybe of higher quality than the original uploads, but the difference is small enough that I’m not sure if I should upload all of the images. Also, for some reason the site isn’t detecting the images from the linked Pixiv post that have already been uploaded.

If any part of this is unclear, let me know and I’ll try to clear up the confusion.

If the images are the same file type and resolution, and the size difference is something like 180kb vs 200kb, then it's very likely they're pixel-perfect duplicates and not worth uploading. If you don't want to risk increasing your deletion percentage, it's safer to just upload the ones we don't already have.

If the file type is better (png>jpg) or the resolution is larger, assuming the extra resolution isn't just empty space, then it's probably fine to upload them regardless of file size.

As an extra note, if an image is an actually pixel-perfect duplicate, it'll show up like this on the upload page:

These images are not worth uploading and won't be approved. Like blindVigil says, in other cases (even when the message down there says "Duplicate"), it may be worth it depending on the specific circumstances.

In addition to the situations he outlined, for JPEGs it's also possible for a file to be the same resolution but still be higher quality, and thus worth posting. post #7496277 and its child post #7276930 are an example of this. If you zoom in you should be able to discern that the parent post has fewer (almost no) JPEG artifacts, especially compared to the child post. This difference is usually visible from the filesize, but you should always zoom in to make sure before uploading in cases like this.

岩戸鈴芽 said:

In addition to the situations he outlined, for JPEGs it's also possible for a file to be the same resolution but still be higher quality, and thus worth posting. post #7496277 and its child post #7276930 are an example of this. If you zoom in you should be able to discern that the parent post has fewer (almost no) JPEG artifacts, especially compared to the child post. This difference is usually visible from the filesize, but you should always zoom in to make sure before uploading in cases like this.

The parent is five times the file size. It's 3.7mb vs 700kb. Of course it's going to be better quality. An example of posts with a more similar file size would better illustrate your point.

blindVigil said:

The parent is five times the file size. It's 3.7mb vs 700kb. Of course it's going to be better quality. An example of posts with a more similar file size would better illustrate your point.

I was explaining that the site gives a warning for actual pixel-perfect duplicates, and that while a larger filesize *usually* means higher JPEG quality, this isn't always the case. The example post I linked was chosen because the JPEG artifacts are very noticeable on the Twitter source but hardly, if at all, on the Weibo one, not as an example of "zoom in to make sure". Besides, asset #20606247 is 3.3x the filesize of its Twitter variant, post #7248647, but if you zoom in the artifacts are clearly the same (this is an example of "zoom in to make sure").

Yeah, the site did detect any pixel-perfect duplicates and I have no intention of uploading those, but some images were merely very similar to already-uploaded posts due to a slightly larger size/resolution. Those are the ones I was asking about.

Also, what’s the site’s policy on changing the source of existing posts from ones that have been deleted to a new one that hasn’t?

Blank_User said:

As an extra extra note about pixel-perfect duplicates, anything you upload will be shown under My Uploads, even if you don't post it. If you care about this and want to avoid unneeded clutter, use https://duplicatebooru.zipfiled.info to check the files before uploading them.

I think I might do that, thanks.

Valiran9 said:

Yeah, the site did detect any pixel-perfect duplicates and I have no intention of uploading those, but some images were merely very similar to already-uploaded posts due to a slightly larger size/resolution. Those are the ones I was asking about.

In general, higher res is always fine to upload (assuming the images aren't upscaled, obviously). Same resolution depends on the file characteristics (png vs jpeg, jpeg vs jpeg but with way fewer artifacts). Do note that a higher res jpeg is considered superior to a lower res png, at least on paper. In practice, if the resolution difference isn't big or the jpeg is really bad, the other post may be set as parent.
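To put the rules of thumb above into one place, here's a rough sketch in Python. Everything in it is made up for illustration (`ImageInfo`, `upload_hint`, the `LOSSLESS` set, comparing resolution by total pixel count), and it is not how the site decides anything. It can't tell whether an image is upscaled or padded with empty space, and it can't see JPEG artifacts, so you still have to zoom in yourself.

```python
from dataclasses import dataclass

# Assumption: the thread only says png > jpg, so png is the only
# format treated as "better" here.
LOSSLESS = {"png"}


@dataclass
class ImageInfo:
    file_type: str  # e.g. "png" or "jpg"
    width: int
    height: int


def upload_hint(existing: ImageInfo, candidate: ImageInfo) -> str:
    """Return a rough hint based on the rules of thumb in this thread."""
    old_px = existing.width * existing.height
    new_px = candidate.width * candidate.height

    # Resolution is checked first: a higher res jpeg beats a lower res png.
    if new_px > old_px:
        return "probably fine: higher resolution (if not upscaled or padded)"
    if new_px < old_px:
        return "probably skip: lower resolution"

    # Same resolution: fall back to the file type.
    cand_lossless = candidate.file_type in LOSSLESS
    old_lossless = existing.file_type in LOSSLESS
    if cand_lossless and not old_lossless:
        return "probably fine: same resolution, better file type"
    if old_lossless and not cand_lossless:
        return "probably skip: same resolution, worse file type"

    # Same resolution and same kind of file: only a visual check can tell.
    return "zoom in and compare artifacts before deciding"


if __name__ == "__main__":
    old = ImageInfo("jpg", 1200, 1600)
    new = ImageInfo("jpg", 1200, 1600)
    print(upload_hint(old, new))
```

The last branch is the jpeg vs jpeg case from earlier in the thread, where file size alone doesn't settle it.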

Also, what’s the site’s policy on changing the source of existing posts from ones that have been deleted to a new one that hasn’t?

The source field should refer to the place that exact file was obtained from, byte for byte. You should only change the source of a dead one to a non-dead one if you're 100% certain the entire file is an MD5 match. In these cases, trying to upload from the old source should redirect you to the existing post.
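If you'd rather check the MD5 match locally before touching the source field, something like this would do it. It's a minimal sketch that assumes you've already saved both files to disk. The function names and file paths are made up for the example, and only the `hashlib` calls are standard Python.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_md5(path_a: str, path_b: str) -> bool:
    """True if both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)


if __name__ == "__main__":
    # Hypothetical file names: one saved from the post, one from the new source.
    print(same_md5("from_existing_post.jpg", "from_new_source.jpg"))
```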

岩戸鈴芽 said:

The source field should refer to the place that exact file was obtained from, byte for byte. You should only change the source of a dead one to a non-dead one if you're 100% certain the entire file is an MD5 match. In these cases, trying to upload from the old source should redirect you to the existing post.

I did that recently with some official Asami Kei art. The uploader did not provide a valid source, but after seeing some with their source fields updated, I started comparing posts with images from that site and updating the source field when the files were exactly the same. Hex editors like HxD are great for this because they can make byte comparisons between any two files. And while the chance of accidental MD5 hash collisions is very low, direct byte comparison will reduce the chance of any false positive matches to 0.
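For anyone who doesn't want to open a hex editor, the same direct byte comparison can be sketched in a few lines of Python. This isn't what HxD does internally, just an illustration of the idea. The function names are hypothetical, and `filecmp.cmp` with `shallow=False` is the standard library call that compares file contents rather than just size and timestamps.

```python
import filecmp
from typing import Optional


def files_identical(path_a: str, path_b: str) -> bool:
    """Byte-for-byte comparison of two files (no hashing involved)."""
    return filecmp.cmp(path_a, path_b, shallow=False)


def first_difference(path_a: str, path_b: str,
                     chunk_size: int = 1 << 16) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical.

    If one file is a prefix of the other, the offset returned is the
    length of the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                # Find the exact byte inside this chunk that differs.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file ended early.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                # Both files ended at the same point with no difference.
                return None
            offset += len(chunk_a)
```

`first_difference` is only there to show where two near-identical files diverge, similar to what a hex editor's compare view highlights. For a plain yes/no answer, `files_identical` is enough.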
