Danbooru

Image queue after upload limit expires.

Posted under General

Ph.D said:
I thought of that. Supposedly, the post would get deleted from the queue, but I have no idea how the queue can use the Find Similar function. Can't it borrow that function for a bit? I don't think it would need to be duplicated entirely.

Whether a post is a duplicate isn't something a computer can tell for sure. Some posts, for instance game_cgs, are so similar that an automated system might think they're identical even though they're not. Conversely, true duplicates might not be caught because heavy jpeg_artifacts or low resolution make them look too different.

Changing even a single bit defeats Danbooru's built-in duplicate detection, which is based on the MD5 hash of the file. "Find Similar" links to Piespy's visual similarity search, which isn't as sensitive to small changes.
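A minimal sketch of why a one-bit change is enough, assuming the built-in check compares the MD5 digest of the raw file bytes. The byte string below is a hypothetical stand-in for an image file, not real Danbooru data.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of raw file bytes."""
    return hashlib.md5(data).hexdigest()

# Hypothetical stand-in for an uploaded image file's bytes.
original = bytes(range(256)) * 4

# Flip a single bit in one byte, as a re-save or metadata tweak might.
modified = bytearray(original)
modified[100] ^= 0x01
modified = bytes(modified)

# Exact-hash matching only recognises byte-identical files.
print(md5_hex(original) == md5_hex(modified))  # False
```

Because MD5 is designed so that any input change scrambles the whole digest, the two files look completely unrelated to an exact-match check even though they would render almost identically.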

Because the latter search is much more computationally expensive and runs on someone else's bandwidth, I'd be wary of having an automated system use it for every upload.

Add to that Coconut's points above: even if we tell the system that anything above 97% similarity is a match, we could still end up with a lot of false positives and false negatives. You really need a human eye to make the final decision. Things like monochrome comic strips are especially likely to come up as false matches.
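Piespy's actual algorithm isn't described here, so as a stand-in this sketch uses a simple "average hash" (a swapped-in perceptual-hash technique, not Piespy's method) with the 97% cut-off mentioned above. Images are hypothetical grayscale grids. It shows how two genuinely different images, like game_cg variants, can still score as a match.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255, row-major

def average_hash(pixels: Grid, size: int = 8) -> List[int]:
    """Block-average the image down to size x size cells, then emit one bit
    per cell: 1 if the cell is brighter than the mean of all cells.
    Assumes both grid dimensions are multiples of `size`."""
    bh, bw = len(pixels) // size, len(pixels[0]) // size
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [pixels[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if c > mean else 0 for c in cells]

def similarity(a: List[int], b: List[int]) -> float:
    """Fraction of hash bits that agree (1.0 means identical hashes)."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

THRESHOLD = 0.97  # "anything above 97% similarity is a match"

def is_match(a: Grid, b: Grid, threshold: float = THRESHOLD) -> bool:
    return similarity(average_hash(a), average_hash(b)) > threshold

# Hypothetical 16x16 "CG": a horizontal gradient.
base = [[x * 16 for x in range(16)] for _ in range(16)]

# A variant with a small 2x2 patch changed (think: a different expression).
variant = [row[:] for row in base]
for y in range(2):
    for x in range(2):
        variant[y][x] = 100

print(variant != base)          # True: the images differ
print(is_match(base, variant))  # True: yet they clear the 97% threshold
```

With 64-bit hashes, a threshold above 97% tolerates at most one differing bit, yet small local edits often flip none at all. That is exactly the false-positive case a human reviewer would catch.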

Another point: we accept duplicates in the form of better-quality versions (higher resolution, no JPEG artifacts). Piespy's similarity search would almost certainly flag all of those as matches.
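A hypothetical sketch of how an automated pre-screen could avoid throwing those away: instead of rejecting anything above the threshold, it only labels uploads for a human moderator, and flags likely matches that have more pixels than the existing post as possible better versions. The `Image` class, `triage` function and label strings are invented for illustration; the similarity score is assumed to come from some external search, and JPEG quality is not assessed.

```python
from dataclasses import dataclass

@dataclass
class Image:
    """Hypothetical minimal metadata for a post's image."""
    width: int
    height: int

    @property
    def pixels(self) -> int:
        return self.width * self.height

def triage(similarity: float, upload: Image, existing: Image,
           threshold: float = 0.97) -> str:
    """Label an upload for a human moderator; never deletes anything.

    `similarity` is assumed to be a 0.0-1.0 score from an external
    visual similarity search (hypothetical input)."""
    if similarity <= threshold:
        return "no_match"
    if upload.pixels > existing.pixels:
        # Resembles an existing post but is larger: possibly a better version.
        return "possible_better_version"
    # Only resolution is compared here; JPEG artifacts are not checked.
    return "possible_duplicate"

print(triage(0.99, Image(1600, 1200), Image(800, 600)))  # possible_better_version
print(triage(0.99, Image(800, 600), Image(800, 600)))    # possible_duplicate
print(triage(0.80, Image(800, 600), Image(800, 600)))    # no_match
```

Keeping the final decision with a person reflects the point above: the score alone can't distinguish a reupload from an improved version.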
