On Penkyamp and other atrocities

September 8, 2009

I recently stumbled across a romanization system for Cantonese called Pênkyämp (which, amazingly enough, is supposed to be pronounced [pʰɪŋ³³jɐm⁵⁵], or “ping yum”, for the non-IPA-readers).

Yes, it is even discussed on mainland bulletin boards and promoted in blogs.

Pênkyämp is, hands down, the most confusing Cantonese romanization ever devised. I suppose the distinguishing feature is that it encodes the length distinction, e.g. between [sɐm⁵⁵] ‘heart’ and [saːm⁵⁵] ‘three’. But it does this by adding a consonant symbol at the end of the short-vowel syllable. So, ‘three’ is spelled “sam”, which is fine, but ‘heart’ is spelled “samp”, which just looks ridiculous. Similarly, ‘square’ [fɔːŋ⁵⁵] is “fong”, whereas ‘wind’ [fʊŋ⁵⁵] is “fonk”; and ‘eat’ [sɪk²²] is “sek”, while ‘rock’ [sɛːk²²] is “seg”.
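To make the reading procedure concrete, here is a minimal sketch in Python of how a reader has to recover vowel length from a Pênkyämp spelling. It is based only on the six examples above and on my understanding that a final -p, -t, -k, -y, or -w signals a short vowel; the names `SHORT_MARKERS` and `looks_short` are my own inventions, and this is not an official description of the scheme.

```python
# Assumption: a final p, t, k, y, or w marks the preceding vowel as short.
# This is inferred from the examples in this post, not from a formal spec.
SHORT_MARKERS = ("p", "t", "k", "y", "w")


def looks_short(spelling: str) -> bool:
    """Guess whether a toneless Pênkyämp syllable has a short vowel."""
    return spelling.lower().endswith(SHORT_MARKERS)


# The six example spellings from the text, with the expected vowel length.
EXAMPLES = {
    "sam": False,   # 'three', long vowel
    "samp": True,   # 'heart', short vowel
    "fong": False,  # 'square', long vowel
    "fonk": True,   # 'wind', short vowel
    "sek": True,    # 'eat', short vowel
    "seg": False,   # 'rock', long vowel
}

if __name__ == "__main__":
    for spelling, expected in EXAMPLES.items():
        label = "short" if looks_short(spelling) else "long"
        print(f"{spelling}: {label} (matches text: {looks_short(spelling) == expected})")
```

Note that the function never looks at the vowel letter at all: the length information lives entirely in the last letter of the syllable.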

This scheme offends me because it’s wildly non-iconic. In an iconic system, things that are longer should look longer. But in “sam” vs. “samp”, the vowels, which are crucially different here, look exactly the same, and to make the vowel shorter you add something to the end rather than modifying the vowel symbol itself, which I think would be the desirable approach, since the vowel is the most salient part of the syllable. Most systems for Cantonese modify the vowel, viz. “saam” vs. “sam”.

Now, it is true cross-linguistically that voiceless coda consonants (like -p, -t, -k) make vowel durations shorter, and voiced ones (-b, -d, -g) make vowel durations longer. However, this is only subphonemically true for languages that already have a voicing contrast in that position. I know of no orthographies that use this fact to indicate actual, phonemic vowel length. Furthermore, syllables like [saːm⁵⁵] and [sɐm⁵⁵] tend to have the same overall duration. When the vowel shortens, the coda consonant lengthens to make up for it. So if you’re going to tack something on the end, why not write “saam” and “samm”? Isn’t that much more intuitive?

My next objection is the tone marks. Apparently you can choose between numbers and diacritics, and since the numbers are standard (1 through 6 stand for high 55, rising 25, mid 33, low 21/11, low rising 23, and low-mid 22), I have no problems with them. The terrible design choice is the diacritic tone marks. Going from tones 1 through 6, we have ä, ã, â, a, á, à.
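For reference, the correspondence between tone numbers and diacritics can be written down as a small lookup table. The Python sketch below attaches the diacritic to a single vowel letter using Unicode combining characters; the names `TONE_MARKS` and `mark_vowel` are hypothetical, and I make no claim about which letter of a multi-vowel rhyme the scheme actually marks.

```python
import unicodedata

# Tone number -> combining diacritic, following the list above.
TONE_MARKS = {
    1: "\u0308",  # diaeresis (umlaut), high tone
    2: "\u0303",  # tilde, rising tone
    3: "\u0302",  # circumflex, mid tone
    4: "",        # unmarked, low tone
    5: "\u0301",  # acute, low rising tone
    6: "\u0300",  # grave, low-mid tone
}


def mark_vowel(vowel: str, tone: int) -> str:
    """Return the vowel letter carrying the diacritic for the given tone."""
    if tone not in TONE_MARKS:
        raise ValueError(f"tone must be 1-6, got {tone!r}")
    # NFC composes letter + combining mark into one character where possible.
    return unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])


if __name__ == "__main__":
    print(", ".join(mark_vowel("a", tone) for tone in range(1, 7)))
```

Laid out this way, it is easy to see that the table has no internal logic: nothing about the shape of a mark predicts the pitch it stands for.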

First of all: an umlaut for the high tone? An umlaut?! Umlauts, whenever they are used, are used to change vowel quality, i.e., the vowel itself, and not length or pitch or stress or whatever. Umlauts are not appropriate for marking tone. Ever. (As an example, look at pinyin “u” vs. “ü”.) And besides, what’s wrong with the macron? Wouldn’t “ā” do just as well, if not better?

Next, the second and fifth tones. The second tone is by far the more common of the two, and so should get the less weird tone mark. If you’re going to use an acute accent for rising tone, then “á” should mark second tone, not the fifth tone. I suppose the use of the tilde for rising tone may be inspired by its use for the glottalized rising tone in Vietnamese; however, in most orthographies, the tilde is used for nasalization. The tilde is also reminiscent of the IPA falling-rising tone mark, a complex symbol which looks like this: [a᷈]. But this association also seems inappropriate for the straightforwardly rising tone in Cantonese. For the fifth tone, a haček would seem more appropriate (cf. the third tone in Mandarin).

The third tone, a level tone marked with a circumflex, is completely puzzling to me. Why mark a level tone with a hat? It makes no sense. In Vietnamese, circumflexes are used to distinguish vowels; in IPA, they’re used for falling tones. There’s just no motivation for this usage here.

Finally, the low (fourth) tone is left unmarked in Pênkyämp. This decision also seems counter-intuitive to me. If any tone should be unmarked, it should be the first tone. This is the tone that most (stressed syllables of) loanwords and many onomatopoeic words have. Yale romanization doesn’t take this strategy, choosing instead to mark the more extreme (high or low) tones, and leaving the mid tones unmarked, which I suppose is also a reasonable strategy. But an unmarked low tone for Cantonese? Again, there is no motivation for this, and no easy way to remember this.

The choices of vowel notation for [y] and [œ], namely “eu” and “eo”, are also sub-optimal. Cantonese romanizations have been using “ue” for [y] for years. Jyutping uses “yu”, which is also OK. “eu” looks like it should be [ew]; alternatively, “eu” is the common (and Yale) romanization for [œ] (cf. my own name 祥 “Cheung”). This is a poor design choice. Similarly, why use “eo” for [œ] when “oe” looks more like IPA and “eu” is the romanization you see on the street? These choices are especially illogical considering that the [-ɛ] rhyme is spelled “-e”, and considering the existence of the rhyme [-ɛːw]. Well, OK, that should be spelled “-eu”, right? No! That’s been taken by [-y] already, so instead Pênkyämp makes an awkward work-around and spells it “-eau”. This problem could have been avoided by choosing more sensible vowel spellings in the first place.

But back to vowel length. This system makes a choice. It chooses to represent the vowel length distinction in Cantonese as primary, and kind of ignores vowel quality. Most other romanizations go the other way, distinguishing vowel quality but not representing length. But the fact of the matter is that you get both. There is a length distinction, and the short vowels all happen to be higher and more central than their long vowel counterparts. So who’s right? Is it vowel length, or vowel quality? The answer is that it’s both; the system is redundant. Why don’t we just let our romanization system be redundant as well? Take, for example, the case of Taiwanese romanization, where the [-wa] rhyme is spelled “-oa”, and the [-wi] final is spelled “-ui”. Why not use “u” or “w” as the medial for both? Well, because they’re different rhymes, and you might as well make them visually distinct. It might be a surprising design choice, but it’s not a bad one.

Pênkyämp basically makes everything it can make obscure, obscure. The vowels are spelled funny. The tones are marked funny. Short/long vowels are distinguished, but not in any normal way: no doubled letters, no colons or IPA length marks, no macrons. No, to figure out if a vowel is long or short (which, remember, essentially changes what vowel it is), you have to glance over one or two letters and see if there’s a -p, -t, -k, -y, or -w there, then modify the vowel in your head to match. (One could argue that you’re supposed to read the entire rhyme as a unit, but the question remains: how to make these units, which are composed of alphabetic symbols, most easily learned/parsed?) Moreover, making this short/long distinction serves no purpose. It just makes it more confusing.

I actually tried reading a sample text written in Pênkyämp, and it was pure torture. When every symbol is used in a nonconventional way, as it is in Pênkyämp, it becomes a monumental task just to parse one syllable. Does Pênkyämp offer any ideas or insights of value to the larger issue of Cantonese romanization? I’m afraid the answer to that is an emphatic “no”.