Danbooru

Pixiv 1000 Question

Posted under General

I'm against deleting the pool.
Danbooru has over 1 million images. If you're just looking for very high quality/interesting images, the Pixiv 1000 acts as a good filter.
A counterargument might be that someone could just search based on the Danbooru score. However, as Cyberia-mix pointed out, there are differences between what scores high on Pixiv and Danbooru. Many of the Pixiv 1000 images actually have a relatively low Danbooru score.

I haven't seen any strong arguments against keeping the pool.
One of the common arguments is that it's irrelevant data. First, irrelevant or not, I find it to be useful data. If you don't find it useful, that's fine. There seems to be a decent number of users that do. Second, I think it is fairly relevant. It's not like Pixiv is some arbitrary site. It's the biggest source of images on Danbooru.

I'm not clear on what harm the pool causes. There's a cost for updating it, but that's just like anything else on Danbooru--tags, notes, other pools, wiki entries, etc.

Also, it sounds like it might be possible to somewhat automate the process based on what DakuTree says.

I did a quick source check against my copy of the Pixiv database, and there are about 5400 posts that could be added to it, some of which lack the tag.

I assume someone could do a bulk edit on those 5400 posts and add them to the pool.

john1980 said:
I'm against deleting the pool.
Danbooru has over 1 million images. If you're just looking for very high quality/interesting images, the Pixiv 1000 acts as a good filter.
A counterargument might be that someone could just search based on the Danbooru score. However, as Cyberia-mix pointed out, there are differences between what scores high on Pixiv and Danbooru. Many of the Pixiv 1000 images actually have a relatively low Danbooru score.

I haven't seen any strong arguments against keeping the pool.
One of the common arguments is that it's irrelevant data. First, irrelevant or not, I find it to be useful data. If you don't find it useful, that's fine. There seems to be a decent number of users that do.

The same argument can be used to defend anything posted in pointless pools and it's not actually any more substantive here.

If you find Pixiv's favorites data useful you can find it on Pixiv.

I'm for deleting the pool. I won't restate what others have said, but... One minor reason, in my opinion, is that popular artists tend to accumulate bookmarks incredibly fast regardless of the quality of their image. Yes, in general popular artists have high quality stuff, but sometimes their reputation is what gets their images into the Pixiv 1000.

If it's legitimately broken and/or causes problems with the site, I can't argue to keep it, but I can see it being useful. People shouldn't necessarily need to register and jump sites to get this information if someone here is willing to curate it. I'm generally hesitant to delete pools.

+1 for keeping it.

The pool can be useful for finding good images that are not noticed due to not being a popular copyright, or due to containing only male characters.
It could be worth bumping it up to 2500/5000/10000 though; 1000 is a bit low considering the number of people who actually use Pixiv. This should also filter out bad images that get into the Pixiv 1000 just because they are drawn by a popular artist.
Post count wise, 2500+ has 3865 posts, 5000+ has 1347 posts, 10000+ has 362 posts.

If it does get deleted though, I could easily set up a page for this using the Pixiv database if people really need it.

Hinacle said:
The same argument can be used to defend anything posted in pointless pools and it's not actually any more substantive here.

How many pointless pools have you met that enable finding good images on such a wide scale?
Forget the principles for a second and focus on what matters: is it relevant to Danbooru? Hell yes it is. Is it cost-effective? That's what the discussion should be about.

DakuTree said:
It could be worth bumping it up to 2500/5000/10000 though; 1000 is a bit low considering the number of people who actually use Pixiv. This should also filter out bad images that get into the Pixiv 1000 just because they are drawn by a popular artist.

This doesn't change the core of the issue. I suppose pixiv 5000 and 10000 images are also mostly owned by insanely popular artists.
If we want to make navigation easier, much like with comics, we could exclude the most popular artists entirely and list them in the pool/tag's description. Not sure if it's really needed though.

Playing around with it, can someone point out what navigation issues people are having? It's large, and you'd never browse the whole thing sequentially, but it seems to be working perfectly fine to me.

As for needing to exclude popular artists, I'm not sure why that would be helpful. If it's good art, it's good art. We would just get stuck in the position where it's impossible to objectively define "popular artist".

If anything, this pool is a pretty nice way to have something like what we had with vip_quality but keep our users from directly biasing it. Being based on an external site, it's a sort of quasi-objective quality indicator. Scrolling through the content, it seems to do a very good job of that (though it's still a bit too Touhou heavy for my tastes).

I sort of hedged up above, but I'll formally go in as a +1 keep, barring any legitimate technical barriers being caused by the pool.

Shinjidude said:
Being based on an external site, it's a sort of quasi-objective quality indicator. Scrolling through the content, it seems to do a very good job of that (though it's still a bit too Touhou heavy for my tastes).

It seems pretty redundant as a quality indicator if evazion's numbers for Touhou are still correct:

evazion said:
The median score for touhou is around 2 or 3 and ~90% of touhou_project_1000_users has a score of 3 or higher. So most touhou_project_1000_users posts have a higher than average score relative to regular touhou posts and will in fact be covered by a touhou score:>x search.

That's ignoring the fact that something rated >= 3 on Danbooru may not have a lot of favorites on Pixiv (this is frequently true of, but not exclusive to, explicit content for example). Using the pool can still provide a different (and perhaps to some better) discrimination on "good quality" than our internal scoring system.

It's not really possible to figure out which criteria describe posts with high Danbooru scores but few Pixiv favorites and exclude them. This pool would do that automatically, and still has the benefit of being something explicitly and objectively defined (so far as our control goes).

It's not a "VIP quality" or "favorites" pool per se, which are rightfully banned since they are always subject to the personal preferences of one or a handful of our users, but it can achieve a similar effect of creating a set of "high quality posts" via more objective means.

Fred1515 said:
It seems pretty redundant as a quality indicator if evazion's numbers for Touhou are still correct:

I think there's a lot of overlap, but I don't think they're redundant at all. Based on the number of pages of results (for example, pool:2778 score:<3 and pool:2778 score:>=3):

8.4% (47/(47+510)) have a score of less than 1.
18.9% (105/(105+452)) have a score of less than 2.
29.4% (164/(164+393)) have a score of less than 3.
38.6% (215/(215+342)) have a score of less than 4.
46.5% (259/(259+298)) have a score of less than 5.

If you consider all Danbooru posts, for example score:<3 and score:>=3

17.9% (9673/(9673+44230)) less than 1
34.9% (18791/(18791+35111)) less than 2
49.7% (26771/(26771+27131)) less than 3
61.8% (33323/(33323+20580)) less than 4
71.1% (38332/(38332+15571)) less than 5
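
The percentages above can be recomputed with a short script. This is only a sketch: the page counts are copied from the figures in this post, and the names `POOL_PAGES`, `ALL_PAGES`, `share_below`, and `summarize` are made up for illustration, not anything Danbooru provides.

```python
# Page counts copied from the post above, keyed by score threshold:
# (pages of results below the threshold, pages at or above it).
POOL_PAGES = {
    1: (47, 510),
    2: (105, 452),
    3: (164, 393),
    4: (215, 342),
    5: (259, 298),
}
ALL_PAGES = {
    1: (9673, 44230),
    2: (18791, 35111),
    3: (26771, 27131),
    4: (33323, 20580),
    5: (38332, 15571),
}


def share_below(below: int, at_or_above: int) -> float:
    """Percentage of result pages that fall below the score threshold."""
    return 100.0 * below / (below + at_or_above)


def summarize(pages: dict) -> dict:
    """Map each threshold to its share of pages below it, rounded to one decimal."""
    return {t: round(share_below(b, a), 1) for t, (b, a) in pages.items()}


if __name__ == "__main__":
    pool = summarize(POOL_PAGES)
    everything = summarize(ALL_PAGES)
    for threshold in sorted(pool):
        print(f"score < {threshold}: pool {pool[threshold]}%, "
              f"all posts {everything[threshold]}%")
```

Since these are page counts rather than post counts, the shares are approximate, but the gap between the pool and the site as a whole is large enough that the rounding doesn't matter.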

Hinacle said:
If you find Pixiv's favorites data useful you can find it on Pixiv.

For the data to be useful for searching things on Danbooru, it obviously needs to be on Danbooru.

Regarding bumping up from 1000, I think 1000 is fine. We could add additional Pixiv 2500/5000/10000 pools if people wanted a more select set of images.

Shinjidude said:
Playing around with it, can someone point out what navigation issues people are having? It's large, and you'd never browse the whole thing sequentially, but it seems to be working perfectly fine to me.

Mainly, the database frequently times out when trying to add/remove posts. So we have to trim down the pool in some way or turn it into a tag if we want to continue using it.

Shinjidude said:
As for needing to exclude popular artists, I'm not sure why that would be helpful. If it's good art, it's good art. We would just get stuck in the position where it's impossible to objectively define "popular artist".

The pool's usefulness is reduced by the fact that popular artists easily get anything in.
The Disgustingly Adorable pool is well known for having the exact same issue with artists that specialize in cute chibis. For pretty much every new work, some user will add it to the pool. At some point you go "okay okay we've got it, stop doing that, I can search for the artist tag myself", because it just drowns out images from lesser-known artists.

We can simply make a rule that will exclude artists who previously had X or more images in the pool, and list their names in the description.
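
A rough sketch of how such a rule could be applied, assuming the pool can be exported as (post ID, artist tag) pairs. The function name, the data layout, and the threshold value are all hypothetical; nothing here reflects an actual Danbooru feature.

```python
from collections import Counter


def split_by_artist_cap(pool_posts, max_per_artist):
    """Apply the "X or more images" rule to a pool.

    pool_posts: iterable of (post_id, artist_tag) pairs (hypothetical layout).
    max_per_artist: the X in the rule; artists with this many posts or more
        are dropped from the pool and listed separately.
    Returns (kept_posts, excluded_artists).
    """
    posts = list(pool_posts)
    counts = Counter(artist for _, artist in posts)
    excluded = {artist for artist, n in counts.items() if n >= max_per_artist}
    kept = [(post_id, artist) for post_id, artist in posts
            if artist not in excluded]
    # Sorted so the list can be pasted into the pool description as-is.
    return kept, sorted(excluded)


if __name__ == "__main__":
    sample = [(1, "artist_a"), (2, "artist_a"), (3, "artist_a"), (4, "artist_b")]
    kept, excluded = split_by_artist_cap(sample, max_per_artist=3)
    print("kept:", kept)
    print("listed in description:", excluded)
```

The open question is still what X should be; the sketch only shows that once a number is picked, the rule itself is mechanical.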

Shinjidude said:
(though it's still a bit too Touhou heavy for my tastes)

Could it be that you only looked at the first pages? It used to be a Touhou 1000 pool. Touhou is 17% of the pool right now.

Fred1515 said:
It seems pretty redundant as a quality indicator if evazion's numbers for Touhou are still correct

Aside from what Shinjidude and john1980 said, evazion also ignores the fact that scores up to 3-4 are still equivalent to 0 because of a few (20?) priv+ users who fav (or fav and upvote) a gigantic amount of posts. If IchiMashiPotatos, BlueFox and Zarath pay a visit to a post it can already get a free +5.
And meanwhile plenty of Pixiv 1000 images have a score lower than that.

If I may, I'll quote a suggestion that another user and I came up with in forum #74359.

BlueFox said:
I've also run into this problem a couple of times before. Since the pool has such a large number of posts, why not separate it into its own pools? For example, Madoka - Pixiv 1000, Persona - Pixiv 1000, Touhou - Pixiv 1000 and so forth. That way, it would cut down the time, plus it would be easier to manage. Forgive me if someone suggested this already.

Benit149 said:
I like that idea, although I think the naming should be more like Pixiv 1000 - Touhou, Pixiv 1000 - Madoka, Pixiv 1000 - Precure, Pixiv 1000 - Other, and so forth so it's easier to find in the pools list.

Basically, just place each "Pixiv 1000" upload into its own pool instead of grouping them all together. It will result in more pools but easier organization and less time to add a post to a pool.

@BlueFox
If you're suggesting that we should have a separate Pixiv 1000 pool for each copyright, then I think that's a bit much IMO. Not to mention the fact that you can just use tags if you want a specific copyright. I would instead propose separate pools for Pixiv 1000, Pixiv 5000, Pixiv 10000, and so on.

Pixiv1000 alone is already out of Danbooru's scope, so I don't suggest adding even more of those pools.

BlueFox said:
Basically, just place each "Pixiv 1000" upload into its own pool instead of grouping them all together. It will result in more pools but easier organization and less time to add a post to a pool.

It would only be a matter of time until the individual pools reach problematic sizes.

+1 for the tag too. Seems unnecessary to have multiple pools for it.

Although, considering this could happen in the future with any other pool, wouldn't it be better to see about getting it fixed?
I'm assuming the issue with the pool at the moment is due to it having to order the list of images (even if they haven't been ordered). Something like an "Unordered" option would most likely get rid of the issue with the database timing out.
