Danbooru

Mass update toosaka_* -> tohsaka_*, toono_* -> tohno_*

Posted under Tags

NWF_Renim said:

If we followed Shinjidude's suggestion on changing to this "passport" system for romanization, that sounds like it would probably mean most cases from the Fate properties would be resolved. Do we have an idea what kind of impact this might have on the romanization done for other series?

One of the nice (flexible) and less than nice (not fully internally consistent in a 1:1 sense) aspects of the "passport system" / 外務省旅券規定, Gaimushou Ryoken Kitei is that it is permissive in the cases we're talking about rather than restrictive.

It's a modified version of the Hepburn romanization system we already use, with some additional allowances.

Relevant to the topic here, for long versions of Japanese "o", either "おお" as in "Toosaka" or "おう" as in "Azumanga Daiou", you are allowed (not required) to use "oo", "ou", "oh", or "o". Nothing needs to change since the "oo" and "ou" we are already using are permitted. It would also allow us to change "oo" in "Toosaka" and "ou" in "Daiou" to "Tohsaka" and "Daioh" (the "official" spellings) without breaking the romanization rules. I'd suggest against using a single "o" since you lose the distinction of the long vowel, but that's how you go from proper "Toukyou" to the common "Tokyo" (though it also allows you to spell it as "Tohkyoh", which looks dumb).

It also allows you to put "m" instead of "n" before "b", "p", and "m" consonants. Proper Hepburn would have you romanize "先輩"/"せんぱい" as "senpai", whereas the passport system allows you to spell it as "sempai" which is actually somewhat closer to how it gets pronounced much of the time.
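To make the "m" allowance concrete, here's a minimal sketch in Python. The function name `n_to_m_variant` is just something made up for illustration, and it assumes lowercase Hepburn input, where an "n" directly followed by a consonant can only be the syllabic ん. It produces the optional variant; it doesn't say the variant is required.

```python
import re

def n_to_m_variant(romaji: str) -> str:
    """Return the optional spelling with "m" in place of "n" before b, p, or m.

    Assumes lowercase Hepburn input (hypothetical helper, not an official tool).
    """
    # The lookahead keeps the following consonant in place; only the "n" changes.
    return re.sub(r"n(?=[bpm])", "m", romaji)

print(n_to_m_variant("senpai"))
print(n_to_m_variant("shinbun"))
```

Names with no "n" before "b", "p", or "m" pass through unchanged, so running this over a tag like "tohsaka_rin" is a no-op.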

tl;dr: Switching from proper Hepburn to the "passport system" wouldn't require us to change anything anywhere, but it would give us the flexibility to use "oh" where official spellings indicate it, along with a few other minor optional variations.

Ah, Toosaka (遠坂 とおさか /toosaka/) isn't a long vowel (indeed, you're supposed to say both the 'to' and 'o' semi-separately), while the ou in Azumanga Daiou (大王 だいおう /daioɯ/) is a long vowel (the ɯ at the end). Writing Rin's name as Tousaka is incorrect, as it makes it look like it would be とうさか and not とおさか.

The passport system even says the same, with "[例] 大野(おおの)→ONO 又は OHNO 又は OONO" not having "OUNO" as an option. (This also means that if you wanted to write Osaka out 'in full', it has to be Oosaka or Ohsaka, and not Ousaka.)

As Albert said in topic #1510, this basically comes down to a debate over descriptivism vs prescriptivism. The prescriptivist stance is that language should always follow the rules. If language doesn't follow the rules, then it should be changed to fit the rules. The descriptivist stance is that the rules should follow the language. If language doesn't follow the rules, then the rules should be changed to fit the language.

The prescriptivist stance would be to say that Tokyo, for example, should be romanized as Toukyou because that's what the rules say, so that's what we must do. Clean and simple. The descriptivist stance is to say hey, nobody actually calls it Toukyou. Everyone calls it Tokyo. If everyone calls it Tokyo, but the rules say it should be called Toukyou, then the rules are wrong. The rules are what should be changed, not the language.

I'm a descriptivist. I don't believe it's our place to say we're right and everyone else is wrong. At a certain point you have to accept that language is messy, that not everything is going to fit into a clean set of rules, and that you just have to deal with it. You have to accept things as they are, not as you wish them to be.

I value accessibility and usability over consistency. Users shouldn't go "wait, what? Why are they calling her that?" when they see a character's name in the tag list. Sure, Toosaka Rin isn't hard to figure out, but it still looks weird to people used to seeing her called Tohsaka Rin in most places outside of Danbooru.

And before someone says "Who cares about casual users anyway?" - I do. It's important to me that casual users are able to use the site and see tags that are spelled how they expect. This is more important to me than having a clean set of romanization rules, just so Japanese speakers can map tags back to kana slightly easier.

Regarding the "other sites use Toosaka too" argument, I'll say this:

  • Wikipedia uses Tohsaka Rin. Their policy is that conventional or commonly accepted romanizations are preferred over their own system. For media this usually means official romanizations. If we're looking at what other sites do, I would give Wikipedia more credibility than sites like MAL.
  • Pixiv translates 遠坂凛 as Rin Tohsaka. They're not precious about romanization, they just care that English speakers can use their site.
  • Rin Tohsaka is vastly more popular in Google, to the point that Google autocorrects "Rin Toosaka" to "Rin Tohsaka".
  • Speaking of Google, using uncommon names kills us in search rankings. If someone searches Google for fanart of some popular character or copyright, they should at least have a chance of finding Danbooru. Not showing up in Google for popular characters because we use uncommon names really hurts us in terms of being able to bring in new users to the site. This is a major concern for me.

All of this to say that I agree with this alias and I would agree with moving back to things like Tohno Shiki or Azumanga Daioh too. The passport system sounds fine, and if it lets us romanize these tags in a consistent framework then great, but consistency isn't something I'm primarily concerned about. I'm very willing to sacrifice consistency for usability if some well-known character or copyright has a romanization that is nonstandard yet it's ingrained and everyone recognizes it.

Paracite said:

Ah, Toosaka (遠坂 とおさか /toosaka/) isn't a long vowel (indeed, you're supposed to say both the 'to' and 'o' semi-separately), while the ou in Azumanga Daiou (大王 だいおう /daioɯ/) is a long vowel (the ɯ at the end). Writing Rin's name as Tousaka is incorrect, as it makes it look like it would be とうさか and not とおさか.

Even better then. I was working off of a summary rather than referencing the whole thing. I'd rather we not use "ou" for "おお" since it sort of conflates "おう" and "おお" in a way that doesn't quite make sense.

It *does* allow for "oh" in both cases though (大野(おおの)→ONO 又は OHNO 又は OONO; 洋子(ようこ)→YOKO 又は YOHKO 又は YOUKO) which is what we'd want for the sort of Romanization brought up in this thread.
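As a rough sketch of what those two examples imply, here's some Python that lists the permitted spellings for a name once you know which kana pair makes up the long "o". The table and the function `long_o_variants` are my own illustration built only from the 大野 and 洋子 examples quoted above, not a full implementation of the passport rules, and the caller is assumed to split the romaji around the long vowel by hand.

```python
# Options taken from the two quoted examples:
# おお -> O / OH / OO, and おう -> O / OH / OU.
LONG_O_OPTIONS = {
    "おお": ("o", "oh", "oo"),
    "おう": ("o", "oh", "ou"),
}

def long_o_variants(before: str, long_kana: str, after: str) -> list[str]:
    """List the spellings allowed for one long "o" (hypothetical helper).

    `before` and `after` are the romaji on either side of the long vowel,
    e.g. ("t", "おお", "saka") for 遠坂 (とおさか).
    """
    return [before + option + after for option in LONG_O_OPTIONS[long_kana]]

print(long_o_variants("", "おお", "no"))
print(long_o_variants("y", "おう", "ko"))
print(long_o_variants("t", "おお", "saka"))
```

Under these assumptions "tohsaka" is in the list for とおさか and "tousaka" isn't, which matches Paracite's point.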

evazion said:

The prescriptivist stance would be to say that Tokyo, for example, should be romanized as Toukyou because that's what the rules say, so that's what we must do. Clean and simple. The descriptivist stance is to say hey, nobody actually calls it Toukyou. Everyone calls it Tokyo. If everyone calls it Tokyo, but the rules say it should be called Toukyou, then the rules are wrong. The rules are what should be changed, not the language.

I'm a descriptivist. I don't believe it's our place to say we're right and everyone else is wrong. At a certain point you have to accept that language is messy, that not everything is going to fit into a clean set of rules, and that you just have to deal with it. You have to accept things as they are, not as you wish them to be.

In general I completely agree, descriptivism is better than prescriptivism (definitely so when you're describing a language, which is where those terms are usually used). Note both 11 years ago and now, I prefer "Tohsaka" over "Toosaka". But I think having rules and consistency is valuable too, especially in the case of romaji where rules very often *aren't* followed or agreed to. Like you say, it's best to change the rules to fit the usage. I think changing our romanization standard to something less restrictive will for the most part allow us to have our cake and eat it too. It allows us to cover these common use cases, and still have a standard to fall back on. I like changing *that* rule more so than just saying we should make exceptions for each and every individual case. Changing my vote to reflect the discussion here.

Alright, if we're agreeing with this, I'm going to move on with the follow-up BUR.

I'm sorry for repeating myself, but I feel like this topic deserves more attention than it is currently getting, mainly to keep other people who haven't participated in this discussion up-to-date. Should I make a new topic?

Mysterious_Uploader said:

I'm sorry for repeating myself, but I feel like this topic deserves more attention than it is currently getting, mainly to keep other people who haven't participated in this discussion up-to-date. Should I make a new topic?

Ok, I've written up topic #17011 for the larger romanization standard change proposal, and to give us a place to discuss revisions to the Howto:Romanize wiki entry. Any further discussion on that front can take place there.

Shinjidude said:

In general I completely agree, descriptivism is better than prescriptivism (definitely so when you're describing a language, which is where those terms are usually used). Note both 11 years ago and now, I prefer "Tohsaka" over "Toosaka". But I think having rules and consistency is valuable too, especially in the case of romaji where rules very often *aren't* followed or agreed to. Like you say, it's best to change the rules to fit the usage. I think changing our romanization standard to something less restrictive will for the most part allow us to have our cake and eat it too. It allows us to cover these common use cases, and still have a standard to fall back on. I like changing *that* rule more so than just saying we should make exceptions for each and every individual case. Changing my vote to reflect the discussion here.

It's a question of internal consistency (consistency within our own rules) versus external consistency (consistency with the rest of the world). These are often in conflict. We should strive for internal consistency, but we shouldn't focus on it to the detriment of everything else.

I would say it's fine to use our own romanization system for most lesser-known characters or artists, since as you say, most Japanese creators don't really care about romanization and just do whatever. But if a character or franchise is very popular, and their name is well known outside of Danbooru under a certain spelling, then I think we should be very reluctant to change that spelling to something else. Otherwise we end up with people insisting on things like changing Yazawa Nico to Yazawa Niko (topic #15176) just to fit our system, even though that would put us at odds with the entire rest of the internet.

evazion said:

But if a character or franchise is very popular, and their name is well known outside of Danbooru under a certain spelling, then I think we should be very reluctant to change that spelling to something else. Otherwise we end up with people insisting on things like changing Yazawa Nico to Yazawa Niko (topic #15176) just to fit our system, even though that would put us at odds with the entire rest of the internet.

I don't have any real problems with this. In topic #17011, I wrote up cases like Yazawa Nico as:

If there are official spellings that follow neither the old Hepburn system nor the new revised "passport system" (e.g. they use "si" for し, "tu" for つ, "ci" for き, etc.), you should bring them up in the forum for discussion. As per Evazion in forum #168056, if these alternative spellings are pervasive and well accepted by the English-speaking fan community, there is a fair argument that we approve an exception for them. If standard romanization tends to be accepted by the English community, or no romanization is well established (e.g. the official spelling consists of a brief on-screen appearance but nowhere else), I think there's a fair argument to follow the standard system rather than the official spelling.

Does that sound acceptable for us, policy-wise?
