Danbooru

Line artifacts from Twitter posts?

Posted under General

kittey said:

I’d argue that the files aren’t corrupted. They’re perfectly fine JPEGs. The problem is that the images seem to have been processed incorrectly by Twitter.

The issue is that whatever software twitter uses to process images received a corrupted file, and basically took a random row of pixels close to the end of the file and slapped it where the missing data was. Maybe it's not corrupted in the sense that it won't raise an error if you try to open it, but it's still corrupted by any other definition of the term. It's a garbled file, though only for one or two rows of pixels.

post #6704143 is a good example of this: you can see the shoes and white border in the last row of pixels because the program just copy-pasted a random line close to the end in an attempt to repair the file.

Pixiv has similar issues sometimes, but they don't attempt to repair the files, so those files usually can't be uploaded to Danbooru because our processing software blocks them. I've seen similar cases get through before, though: images with rows of pixels getting lost along the way. post #3844117 comes to mind.

nonamethanks said:

The issue is that whatever software twitter uses to process images received a corrupted file, and basically took a random row of pixels close to the end of the file and slapped it where the missing data was. Maybe it's not corrupted in the sense that it won't raise an error if you try to open it, but it's still corrupted by any other definition of the term. It's a garbled file, though only for one or two rows of pixels.

If the edge between the normal and incorrect image data was at a height that’s a multiple of 16, I’d agree. The corrupted images are JPEGs, and if a JPEG has corrupted data, you’ll get a 16x16 block of garbage. If data is missing somewhere, you’ll get garbage until the end of the image, but it’ll start at a 16x16 block boundary. However, in the five examples above, that’s not the case. The incorrect data starts at a height that’s not a multiple of 16, so this is clearly Twitter’s image processing clobbering image data that should be there, or, more likely, tacking on extra image data, thereby enlarging the image.

My guess is that this is a decoding error in the normally invisible gutter. Every block must be 16x16 pixels if the image has chroma subsampling, which is pretty common; otherwise it’s 8x8. The image might not have a matching height, so the last row of blocks is encoded as if it had some extra lines to fill up to 16, but those extra lines are never shown because the correct height is saved in the JPEG header. Now if the image decoder is bugged, it will decode the full 16-pixel-high last row of blocks and then not discard the invisible pixel rows, instead filling them with some other part of the image. It’s quite likely that the extra data is actually leaked memory: the decoder allocated memory for the full 16 rows, which contained old data from a few steps ago, and then overwrote only the rows it was supposed to decode.

Either way, I’m pretty sure this is an artifact of incorrectly decoding a perfectly fine JPEG. This is also why all the “broken” images (that I’ve seen) have a height that’s a multiple of 16.

Edit: The post #3844117 that you mentioned uses 8x8 blocks (because it has no chroma subsampling) and the corrupt blocks are all 8x8, so that one actually is a corrupt file.
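The block-padding argument above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of what Twitter's software does. The function names are made up for this example, and it follows the simplification used in the post (16-pixel blocks with chroma subsampling, 8-pixel blocks without); real JPEGs can also use other sampling layouts with different block shapes.

```python
def block_size(chroma_subsampled: bool) -> int:
    """Block edge length as described in the post (an assumption, not a full JPEG model)."""
    return 16 if chroma_subsampled else 8


def padded_height(height: int, block: int) -> int:
    """Height the encoder actually stores: rounded up to a whole number of blocks."""
    return -(-height // block) * block  # ceiling division, then scale back up


def gutter_rows(height: int, block: int) -> int:
    """Number of encoded rows below the visible image that a decoder should discard."""
    return padded_height(height, block) - height


def starts_on_block_boundary(row: int, block: int) -> bool:
    """True if damage starting at this row lines up with a block edge."""
    return row % block == 0
```

For example, a subsampled image that is 1000 pixels high is stored as 1008 rows, so `gutter_rows(1000, 16)` gives 8 hidden rows. Damage that begins at a row where `starts_on_block_boundary` is false does not look like ordinary corrupted block data under this model.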

kittey said:

The incorrect data starts at a height that’s not a multiple of 16, so this is clearly Twitter’s image processing clobbering image data that should be there, or, more likely, tacking on extra image data, thereby enlarging the image. My guess is that this is a decoding error in the normally invisible gutter.

This would make sense to me in a vacuum, but we actually have evidence that it's not the case. This problem has been happening for a few years now, and the non-corrupted versions of files with this kind of visual artifact actually show more details on those rows of pixels (or at least the ones I found did). There's no extra height being added. Compare post #6462506 vs post #6462530, or post #6565655 vs post #6564341, or post #5725505 vs post #5722659. That's what's making me think the files end up being truncated somewhere along the way.
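A comparison like the ones above could be automated roughly as follows. This is a hedged sketch, not a tool Danbooru uses: it assumes both versions have already been decoded into grayscale rows (lists of 0-255 integers) of the same width, the function names are invented for this example, and the threshold of 30 is an arbitrary guess meant to ignore ordinary recompression noise.

```python
def row_difference(row_a, row_b):
    """Mean absolute pixel difference between two rows of equal width."""
    if not row_a:
        return 0.0
    return sum(abs(a - b) for a, b in zip(row_a, row_b)) / len(row_a)


def suspicious_rows(clean, suspect, threshold=30.0):
    """Indices of rows where the suspect image differs strongly from the clean one.

    Only rows present in both images are compared; extra rows in the
    taller image are ignored.
    """
    return [
        y
        for y, (row_a, row_b) in enumerate(zip(clean, suspect))
        if row_difference(row_a, row_b) > threshold
    ]


# Tiny made-up example: only the last row was replaced with unrelated data.
clean = [[10, 10, 10, 10], [20, 20, 20, 20], [200, 200, 200, 200]]
suspect = [[11, 11, 11, 11], [20, 20, 20, 20], [10, 10, 10, 10]]
damaged = suspicious_rows(clean, suspect)
```

If the flagged rows all sit at the very bottom and both versions have the same height, that matches the observation that image data is being overwritten rather than extra rows being added.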

Some weird memory leak problem also makes sense, but in the end it doesn't change the result, which is that we have a version of a file that somehow ended up getting corrupted in transit and is a visually broken (albeit in a minor, barely noticeable way) version of the original. Saying that it's a corrupted file vs a valid file that was visually corrupted due to a technical glitch seems like a debate on semantics to me.

The question is, do we need an extra tag specifically for this brand of corruption? Is it going to be more useful than just source:*twitter* corrupted_file?


I don't see the point in a separate tag.
The current tag is more of a technical one: it shows that the artifacts are not part of the original image, even if they are not displayed when viewing.

may be added for some types of invisible corruption.

If possible, the file is replaced or uploaded from another source.

Having a tag would be useful for short-term problems with files on X (Twitter), making it easier to replace files (without searching by the source). As far as I remember, there was a temporary tag on Danbooru when it had problems with thumbnails (topic #15640?). The situation under discussion looks different, since it has no time frame.
