Line artifacts from Twitter posts?

luntoer

I've been noticing that some posts from Twitter have a few extra lines of pixels at the bottom that look like some kind of artifact:

From my uploads:
post #6704143
post #6703776
post #6704144

From others:
post #6704090
post #6704084

I'm curious whether it's Twitter doing this or the artists themselves and whether it's something to be concerned about in terms of upload quality.

nonamethanks

almost 2 years ago

It's Twitter doing it, see one of those images' original link. Usually it's a sign of corruption during upload on their side, and I'd expect it to only happen rarely, but you never know what kind of rubberbands-and-prayers solution Twitter's servers rely on these days.

Unbreakable

almost 2 years ago

I think someone attempted to make a tag for this once but it was either shot down or nuked.

nonamethanks

almost 2 years ago

The tag is corrupted file.

kittey

almost 2 years ago

I’d argue that the files aren’t corrupted. They’re perfectly fine JPEGs. The problem is that the images seem to have been processed incorrectly by Twitter.

nonamethanks

almost 2 years ago

kittey said:
I’d argue that the files aren’t corrupted. They’re perfectly fine JPEGs. The problem is that the images seem to have been processed incorrectly by Twitter.

The issue is that whatever software twitter uses to process images received a corrupted file, and basically took a random row of pixels close to the end of the file and slapped it where the missing data was. Maybe it's not corrupted in the sense that it won't raise an error if you try to open it, but it's still corrupted by any other definition of the term. It's a garbled file, though only for one or two rows of pixels.

post #6704143 is a good example of this: you can see the shoes and white border in the last row of pixels because the program just copy-pasted a random line close to the end in an attempt to repair the file.

Pixiv sometimes has similar issues, but they don't attempt to repair the files, so those files usually can't be uploaded to Danbooru because our processing software rejects them. I've seen similar cases get through before, though: images with rows of pixels lost along the way. post #3844117 comes to mind.
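
The "copied row" idea above can be checked roughly like this. This is only a minimal sketch: it assumes the image has already been decoded into a list of pixel rows by some other tool, and `find_copied_source_rows`, `bottom_count` and `search_window` are made-up names for illustration. It uses exact equality, so on real JPEG output you would probably need a tolerance, and flat areas like a white border will match many rows.

```python
def find_copied_source_rows(rows, bottom_count=2, search_window=32):
    """For each of the last `bottom_count` rows, look for an identical
    earlier row within `search_window` rows above the bottom area.

    `rows` is a list of rows; each row is a list of pixel values.
    Returns {row_index: matching_earlier_index_or_None}.
    """
    height = len(rows)
    if not 0 < bottom_count < height:
        raise ValueError("bottom_count must be between 1 and height - 1")

    first_bottom = height - bottom_count
    start = max(0, first_bottom - search_window)

    result = {}
    for y in range(first_bottom, height):
        # Search upwards from just above the bottom area, nearest row first.
        result[y] = next(
            (src for src in range(first_bottom - 1, start - 1, -1)
             if rows[src] == rows[y]),
            None,
        )
    return result
```

If the last row maps to an earlier index while the row above it maps to `None`, that would at least be consistent with a single line having been duplicated near the end.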

kittey

almost 2 years ago

nonamethanks said:
The issue is that whatever software twitter uses to process images received a corrupted file, and basically took a random row of pixels close to the end of the file and slapped it where the missing data was. Maybe it's not corrupted in the sense that it won't raise an error if you try to open it, but it's still corrupted by any other definition of the term. It's a garbled file, though only for one or two rows of pixels.

If the edge between the normal and incorrect image data was at a height that’s a multiple of 16, I’d agree. The corrupted images are JPEGs, and if a JPEG has corrupted data, you’ll get a 16x16 block of garbage. If data is missing somewhere, you’ll get garbage until the end of the image, but it’ll start at a 16x16 block boundary. However, in the five examples above, that’s not the case. The incorrect data starts at a height that’s not a multiple of 16, so this is clearly Twitter’s image processing clobbering image data that should be there, or, more likely, tacking on extra image data, thereby enlarging the image.

My guess is that this is a decoding error in the normally invisible gutter. Every block must be 16x16 pixels (if the image uses chroma subsampling, which is pretty common; otherwise it’s 8x8), but the image might not have a matching height. In that case the last row of blocks is encoded as if it had some extra lines to fill it up to 16, and those extra lines are never shown because the correct height is saved in the JPEG header. If the image decoder is bugged, it will decode the full 16-pixel-high last row of blocks and then not discard the invisible pixel rows, instead filling them with some other part of the image. It’s quite likely that the extra data is actually leaked memory: the decoder allocated memory for the full 16 rows, which contained old data from a few steps ago, and then overwrote only the rows it was supposed to decode.

Either way, I’m pretty sure this is an artifact of incorrectly decoding a perfectly fine JPEG. This is also why all the “broken” images (that I’ve seen) have a height that’s a multiple of 16.

Edit: The post #3844117 that you mentioned uses 8x8 blocks (because of no chroma subsampling) and the corrupt blocks are all 8x8, so this is an actually corrupt file.
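
For anyone who wants to check the block size and the hidden padding rows of a file themselves, here is a rough sketch that reads them from the JPEG frame header (the SOF segment). The function name `jpeg_block_layout` and the returned keys are made up for illustration, and it assumes the usual interleaved layout where the block (MCU) height is 8 times the largest vertical sampling factor.

```python
import struct

# SOF markers that carry the frame dimensions. 0xC4 (DHT), 0xC8 (JPG)
# and 0xCC (DAC) sit in the same range but are not frame headers.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_block_layout(data: bytes) -> dict:
    """Return height, block (MCU) height and hidden padding rows of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")

    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == 0xDA:  # start of scan: the frame header should come first
            break

        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Segment body: precision, height, width, number of components.
            _precision, height, _width, ncomp = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            v_factors = []
            for i in range(ncomp):
                # Each component: id, sampling byte (H in the high nibble,
                # V in the low nibble), quantization table index.
                _cid, sampling, _qt = data[pos + 10 + 3 * i:pos + 13 + 3 * i]
                v_factors.append(sampling & 0x0F)
            mcu_height = 8 * max(v_factors)
            return {
                "height": height,
                "mcu_height": mcu_height,
                # Rows encoded in the last block row but never displayed.
                "padding_rows": (-height) % mcu_height,
            }
        pos += 2 + length

    raise ValueError("no SOF marker found")
```

Running it on the raw bytes of a file (for example `jpeg_block_layout(open(path, "rb").read())`) gives a `padding_rows` value greater than 0 when the image has a gutter. A height that is already a multiple of the block height gives 0.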

nonamethanks

almost 2 years ago

kittey said:
The incorrect data starts at a height that’s not a multiple of 16, so this is clearly Twitter’s image processing clobbering image data that should be there, or, more likely, tacking on extra image data, thereby enlarging the image. My guess is that this is a decoding error in the normally invisible gutter.

This would make sense to me in a vacuum, but we actually have evidence that it's not the case. This problem has been happening for a few years now, and the non-corrupted versions of files with this kind of visual artifact actually show more details on those rows of pixels (or at least the ones I found did). There's no extra height being added. Compare post #6462506 vs post #6462530, or post #6565655 vs post #6564341, or post #5725505 vs post #5722659. That's what's making me think the files end up being truncated somewhere along the way.
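
A comparison like the ones above could be sketched roughly as follows. It assumes both versions have already been decoded into lists of rows of RGB tuples by whatever image library you prefer. `count_glitched_bottom_rows` and `tolerance` are hypothetical names, and the default tolerance is just a guess meant to absorb normal JPEG recompression noise.

```python
def row_difference(row_a, row_b):
    """Mean absolute per-channel difference between two rows of pixels."""
    total = sum(abs(ca - cb)
                for pa, pb in zip(row_a, row_b)
                for ca, cb in zip(pa, pb))
    return total / (len(row_a) * len(row_a[0]))


def count_glitched_bottom_rows(clean, glitched, tolerance=8.0):
    """Count how many rows at the bottom differ between the two versions.

    Raises ValueError if the dimensions differ, which would itself
    suggest that rows were added or removed rather than overwritten.
    """
    if len(clean) != len(glitched) or any(
            len(a) != len(b) for a, b in zip(clean, glitched)):
        raise ValueError("images must have identical dimensions")

    count = 0
    # Walk upwards from the last row until the two versions agree again.
    for row_a, row_b in zip(reversed(clean), reversed(glitched)):
        if row_difference(row_a, row_b) <= tolerance:
            break
        count += 1
    return count
```

If the two versions have the same height and only the last one or two rows differ, that fits the "no extra height being added" observation.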

Some weird memory leak problem also makes sense, but in the end it doesn't change the result, which is that we have a version of a file that somehow ended up getting corrupted in transit and is a visually broken (albeit in a minor, barely noticeable way) version of the original. Saying that it's a corrupted file vs a valid file that was visually corrupted due to a technical glitch seems like a debate on semantics to me.

The question is, do we need an extra tag specifically for this brand of corruption? Is it going to be more useful than just source:*twitter* corrupted_file?

Updated by nonamethanks almost 2 years ago

Dolmatov

almost 2 years ago

I don't see the point in a separate tag.
The current tag is more of a technical marker that says "the artifacts are not part of the original image," even if they are not displayed when viewing.

It may also be added for some types of invisible corruption.

If possible, the file is replaced or uploaded from another source.

Having a tag would be useful for short-term problems with files on X (Twitter), making it easier to replace files (without searching by the source). As far as I remember, there was a temporary tag on Danbooru when thumbnails were broken (topic #15640?). The situation under discussion looks different, since it has no time frame.

nonamethanks

almost 2 years ago

Dolmatov said:
making it easier to replace files (without searching by the source).

Twitter files are immutable and likely won't be fixed, so it's doubtful we'll ever replace those.

evazion

almost 2 years ago

I created corrupted twitter file for this. I'd like to reserve corrupted file for images so corrupted that we can't even properly read them or generate thumbnails for them. You can find posts like this with exif:Vips:Error.
