2009 Archives



Apfeaeiip stands for "A Place For Everything, And Everything In Its Place". In the fourth edition of The Macintosh Bible (which, by the way, was an excellent book... I don't know of any modern introduction to the Mac that does it with the same humor), Arthur Naiman asserts that the word also, coincidentally enough, spells the Fijian word for "good housekeeping".

Anyway, this supposed factoid, which I didn't know whether to take seriously or not, has been bothering me for almost two decades now, so now that we have Wikipedia, I finally looked up "Fijian language". It turns out that it can't possibly be true: for one thing, "apfeaeiip" violates Fijian (C)V syllable structure; furthermore, [p] and [f] occur in Fijian only in loanwords, not in the native vocabulary.

So there, mystery solved. And Mr. Naiman... you got me.

BibTeX and Chinese names


As wonderful as BibTeX is, it's always bothered me that it formats names (when they aren't abbreviated to initials) as "First Last" even when they're Chinese names (or Japanese, etc., but most of the citations I use are Chinese), which customarily put the surname first. "But how would BibTeX know that the name's Chinese?" you ask. Actually, it's nothing magical: I stick the actual characters into my bibliography file, e.g., Author = {Ikeda, Takumi \TC{池田巧}}, where \TC means "use a traditional Chinese font" (defined in my XeLaTeX file). If only I could get BibTeX to check whether the name includes that string, and handle it differently...

WARNING: the following code is in a language that uses Reverse Polish notation. Viewers who might be offended by RPN should not view this program.

As it turns out, the fix is quite simple. Just add a function to your .bst file:

STRINGS{text}
FUNCTION{cjk.contains}
{ 'text := #0
    { text empty$ not }
    { text #1 #3 substring$ duplicate$ "\SC" = swap$ "\TC" = or
        { pop$ #1 "" 'text := }
        { text #2 global.max$ substring$ 'text := }
      if$
    }
  while$
}

And then, where the file calls format.name$, you can add an if statement to see if it contains Chinese or not. So, this:

s nameptr "{ff~}{vv~}{ll}{, jj}" format.name$

would turn into this:

s nameptr
    s nameptr "{ff}" format.name$ cjk.contains
      { "{vv~}{ll}{~ff}{, jj}" }
      { "{ff~}{vv~}{ll}{, jj}" }
    if$
  format.name$

Neat, huh? By the way, BibTeX doesn't have built-in boolean functions not and or, both of which the code above uses, so if your .bst file doesn't define them (standard styles such as plain.bst do), you'll have to add them in. For a down-and-dirty guide to BibTeX, check out this link:


Or more generally, this page:


Update on 2012 August 2: Wow, I can barely read my own code anymore! Let me try to clarify:

First, the code assumes that (1) you have identified all your CJK text using your custom commands \TC or \SC (if you use different commands you should change the code accordingly, including the length 3 in #1 #3 substring$ if your command names are a different length), and (2) any author in your bibliography file whose first name contains either of the strings \TC or \SC should be formatted with the last name first.

Now we look at the function with some comments added:

STRINGS{text} %% define "text" as a variable
FUNCTION{cjk.contains} %% the name of this function is "cjk.contains"
{ 'text := %% store whatever value is on the top of the stack in "text"
  #0 %% return 0 (false) unless the following code changes that

    { text empty$ not }
    %% the condition for the "while" loop: while text is not empty...

    { text #1 #3 substring$ duplicate$ "\SC" = swap$ "\TC" = or
    %% the "if" clause: if the first through third characters of "text"
    %% equal "\SC" or "\TC" (we have to duplicate and swap
    %% because we do two equality tests, and then "or" them)
      { pop$ %% get rid of the 0
        #1 %% and put 1 (true) on the stack
        "" 'text := %% set "text" to empty, which ends the loop
      }
      { text #2 global.max$ substring$ 'text :=
        %% if "text" does not start with "\SC" or "\TC",
        %% delete the first character and try again
      }
      if$ %% run one of the two branches above
    }
  while$ %% keep running the body while the condition holds
}
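For readers who don't think in RPN, here is a rough Python model of what the cjk.contains loop computes. It is only an illustration: cjk_contains is a hypothetical name, not something BibTeX can call, and the two markers are the same three-character strings \SC and \TC that the .bst code tests for.

```python
def cjk_contains(text: str) -> bool:
    """Python model of the .bst cjk.contains loop (illustration only)."""
    result = False                   # #0: assume "no CJK marker"
    while text:                      # { text empty$ not } ... while$
        head = text[:3]              # text #1 #3 substring$
        if head == r"\SC" or head == r"\TC":
            result = True            # pop$ #1: replace the 0 with a 1
            text = ""                # "" 'text := ends the loop
        else:
            text = text[1:]          # text #2 global.max$ substring$
    return result


print(cjk_contains(r"Takumi \TC{池田巧}"))  # True
print(cjk_contains("Arthur"))              # False
```

In Python the whole loop collapses to r"\SC" in text or r"\TC" in text; as far as I know, BibTeX has no built-in substring search, which is why the .bst version walks through the string one character at a time.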

And now for the formatting code: s nameptr "{ff}" format.name$ extracts only the "first name" portion of the name and passes it to cjk.contains. If it returns true we order the last name before the first name, with no comma in between; otherwise we use the same format as before. So basically this format string

"{ff~}{vv~}{ll}{, jj}"

gets surrounded by a giant "if" clause, like so:

    s nameptr "{ff}" format.name$ cjk.contains
      { "{vv~}{ll}{~ff}{, jj}" }
      { "{ff~}{vv~}{ll}{, jj}" }
    if$

If this is still too abstract, perhaps a concrete example will help: I have now posted both the original and modified bst file that I used for my dissertation, which follows the Linguistic Inquiry stylesheet:



Hopefully people out there will find this useful!
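To see the effect of the formatting code above without running BibTeX, here is a toy Python model of the name-ordering switch: if the first-name part contains \SC or \TC, the surname comes first with no comma; otherwise the usual "First Last" order is kept. The function format_name and its parameters are hypothetical. It treats each ~ in the two format strings literally and passes each name part through unchanged, whereas the real format.name$ has its own rules for spacing and ties between tokens, so its exact output may differ.

```python
CJK_MARKERS = (r"\SC", r"\TC")


def format_name(first: str, last: str, von: str = "", jr: str = "") -> str:
    """Toy model of the two format strings; treats every ~ literally."""
    is_cjk = any(marker in first for marker in CJK_MARKERS)
    parts = []
    if is_cjk:
        # "{vv~}{ll}{~ff}{, jj}": surname first, no comma before the given name
        if von:
            parts.append(von + "~")
        parts.append(last)
        if first:
            parts.append("~" + first)
    else:
        # "{ff~}{vv~}{ll}{, jj}": the usual "First Last" order
        if first:
            parts.append(first + "~")
        if von:
            parts.append(von + "~")
        parts.append(last)
    if jr:
        parts.append(", " + jr)
    return "".join(parts)


print(format_name(r"Takumi \TC{池田巧}", "Ikeda"))  # Ikeda~Takumi \TC{池田巧}
print(format_name("Arthur", "Naiman"))              # Arthur~Naiman
```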

One of the terms that's always confused me in linguistics is that of languages being genetically related. Linguistic relatedness doesn't necessarily have anything to do with biological relatedness, so why use this term?

As it turns out, the term "genetic" here doesn't mean "having to do with genes", but rather "having a common origin" (think "genesis"). I.e., there are two distinct meanings of the word genetic, one for biology and one for linguistics. Of course, no one's ever bothered to explain this to me... a glaring oversight, especially since the biological meaning is the common one.

Now how about the Chinese translation? In Chinese, to say languages x, y, and z are genetically related, you can use an awkward phrase like this: "x, y, z 等語言在發生學上具有淵源關係". WTF? What does that even mean? I mean, I know it's supposed to mean "x, y, z etc. are genetically related", but what it really translates to is something like "in genetics, x, y, and z have a common-origin relationship", since 發生學, short for 發育生物學 (according to the Wikipedia redirect), means genetics or developmental biology, in the biological sense. This is a terrible mistranslation of the English term, imbuing it with a biological significance that it really shouldn't have.

So, I object to the use of the term 發生學 to mean "genetic" in the linguistic sense. It's a good thing I've figured this out... Whereas before I would furrow my eyebrows in confusion whenever I encountered the term in Chinese, I will now shake my head in disapproval instead.

Waikiki Gun Clubs



[Guest post by Andrew]: While walking along Waikiki Beach, I was struck by the large numbers of people carrying flyers advertising indoor shooting ranges. The predominant language in all of these flyers was Japanese (see image for a representative sample). The people passing out flyers were clearly profiling, aiming for Japanese-looking tourists and generally avoiding white tourists. When I made the mistake of expressing the slightest bit of curiosity, I was subjected to a rapid-fire sales pitch in broken Japanese. I forget the exact phrasing of the pitch, but it more or less corresponded to the copy on the flyer. To give some idea of the sales pitch: “SHOOT WITH LIVE AMMUNITION: FEEL THE REFRESHING POWER OF EACH SHOT” and, even more interesting, “The only Hawaii Gun Club with a Japanese owner.” The back of the flyer lists the various “courses” that one can select. The cheapest option ($25) offers 32 shots with a combination of small-caliber (.22) pistol, rifle, and revolver. The VIP ($95) and SWAT ($115) options include higher-caliber weapons (12-gauge shotgun, .44 Magnum) and assault weapons (M-16, AK-47, H&K USC, etc.).

The verbal sales pitches often had a barely veiled sexual subtext: e.g. “be a real man, and go upstairs and fire a few rounds” or “Haven’t you ever fired a gun before? You’ll never know what it’s like until you’ve tried it.” The gun club advertisements' emphasis on the “reality” of firing “real” weapons with “real” ammunition seems like an overcompensation for the essential artificiality of walking off Waikiki Beach into a supremely artificial and controlled shooting-range environment in order to get a taste of the “real” experience of firing assault weapons. There also seems to be a strange historical irony at work: more than fifty years after Pearl Harbor, visitors from a quasi-demilitarized Japan are being encouraged to pick up assault weapons and release their frustration and stress in a safe and friendly tourist environment. I suppose one way to read the Waikiki gun clubs is that they are, in a way, the most honest and authentic elements in the Waikiki bubble: the closet references to the violent history of American annexation and military occupation of Hawaiʻi. While the hotels and shopping malls of Waikiki all maintain an atmosphere of carefully composed tranquility and paradise, the gun club, in all its awkward display, might be the closest thing to reality.

On Penkyamp and other atrocities


I recently stumbled across a romanization system for Cantonese called Pênkyämp (which, amazingly enough, is supposed to be pronounced [pʰɪŋ³³jɐm⁵⁵], or "ping yum", for the non-IPA-readers).

Yes, it is even discussed on mainland bulletin boards (e.g. http://bbs.cantonese.asia/viewthread.php?tid=9970) and promoted in blogs.

Pênkyämp is, hands down, the most confusing Cantonese romanization ever devised. I suppose the distinguishing feature is that it encodes the length distinction, e.g. between [sɐm⁵⁵] 'heart' and [saːm⁵⁵] 'three'. But it does this by adding a consonant symbol at the end of the short-vowel syllable. So, 'three' is spelled "sam", which is fine, but 'heart' is spelled "samp", which just looks ridiculous. Similarly, 'square' [fɔːŋ⁵⁵] is "fong", whereas 'wind' [fʊŋ⁵⁵] is "fonk"; and 'eat' [sɪk²²] is "sek", while 'rock' [sɛk²²] is "seg".

This scheme offends me because it's wildly non-iconic: things that are longer should look longer. But for "sam" vs. "samp", the vowels, which are crucially different here, look exactly the same, and to make the vowel shorter you add something to the end rather than modify the vowel symbol itself, which I think would be the desirable thing, since the vowel is the most salient part of the syllable. Most systems for Cantonese mark the difference on the vowel, viz. "saam" vs. "sam".

Now, it is true cross-linguistically that voiceless coda consonants (like -p, -t, -k) make vowel durations shorter, and voiced ones (-b, -d, -g) make vowel durations longer. However, this is only subphonemically true for languages that already have a voicing contrast in that position. I know of no orthographies that use this fact to indicate actual, phonemic vowel length. Furthermore, syllables like [saːm⁵⁵] and [sɐm⁵⁵] tend to have the same overall duration. When the vowel shortens, the coda consonant lengthens to make up for it. So if you're going to tack something on the end, why not write "saam" and "samm"? Isn't that much more intuitive?

My next objection is the tone marks. Apparently you can choose between numbers and diacritics, and since the numbers are standard (1 through 6 stand for high 55, rising 25, mid 33, low 21/11, low rising 23, and low-mid 22), I have no problems with them. The terrible design choice is the diacritic tone marks. Going from tones 1 through 6, we have ä, ã, â, a, á, à.
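To make the diacritic scheme concrete, here is a small Python sketch that applies the six marks listed above as Unicode combining characters. Two details are my own assumptions, not documented Pênkyämp rules: the input is a toneless Pênkyämp spelling, and the mark goes on the first vowel letter of the syllable. Under those assumptions it does reproduce the system's own name from "penk" (tone 3) and "yamp" (tone 1).

```python
import unicodedata

# Combining diacritics for tones 1-6, in the order listed above:
# ä (diaeresis), ã (tilde), â (circumflex), a (unmarked), á (acute), à (grave)
TONE_MARKS = {1: "\u0308", 2: "\u0303", 3: "\u0302", 4: "", 5: "\u0301", 6: "\u0300"}
VOWELS = "aeiou"


def mark_tone(syllable: str, tone: int) -> str:
    """Put the tone mark on the first vowel letter (assumed placement rule)."""
    for i, ch in enumerate(syllable):
        if ch in VOWELS:
            marked = syllable[: i + 1] + TONE_MARKS[tone] + syllable[i + 1 :]
            return unicodedata.normalize("NFC", marked)  # compose e + ^ into ê
    return syllable  # no vowel letter: leave unchanged


# [pʰɪŋ³³] "penk" (tone 3) + [jɐm⁵⁵] "yamp" (tone 1)
print(mark_tone("penk", 3) + mark_tone("yamp", 1))  # pênkyämp
```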

First of all: an umlaut for the high tone? An umlaut?! Umlauts, whenever they are used, are used to change vowel quality, i.e., the vowel itself, and not length or pitch or stress or whatever. Umlauts are not appropriate for marking tone. Ever. (As an example, look at pinyin "u" vs. "ü".) And besides, what's wrong with the macron? Wouldn't "ā" do just as well, if not better?

Next, the second and fifth tones. The second tone is by far the more common of the two, and so should get the less weird tone mark. If you're going to use an acute accent for rising tone, then "á" should mark second tone, not the fifth tone. I suppose the use of the tilde for rising tone may be inspired by its use for the glottalized rising tone in Vietnamese; however, in most orthographies, the tilde is used for nasalization. The tilde is also reminiscent of the IPA falling-rising tone mark, a complex symbol which looks like this: [a᷈]. But this association also seems inappropriate for the straightforwardly rising tone in Cantonese. For the fifth tone, a haček would seem more appropriate (cf. the third tone in Mandarin).

The third tone, a level tone marked with a circumflex, is completely puzzling to me. Why mark a level tone with a hat? It makes no sense. In Vietnamese, circumflexes are used to distinguish vowels; in IPA, they're used for falling tones. There's just no motivation for this usage here.

Finally, the low (fourth) tone is left unmarked in Pênkyämp. This decision also seems counter-intuitive to me. If any tone should be unmarked, it should be the first tone. This is the tone that most (stressed syllables of) loanwords and many onomatopoeic words have. Yale romanization doesn't take this strategy, choosing instead to mark the more extreme (high or low) tones, and leaving the mid tones unmarked, which I suppose is also a reasonable strategy. But an unmarked low tone for Cantonese? Again, there is no motivation for this, and no easy way to remember this.

The choice of vowel notation for [y] and [œ], namely "eu" and "eo", is also suboptimal. Cantonese romanizations have been using "ue" for [y] for years. Jyutping uses "yu", which is also OK. "eu" looks like it should be [ew]; alternatively, "eu" is the common (and Yale) romanization for [œ] (cf. my own name 祥 "Cheung"). This is a poor design choice. Similarly, why use "eo" for [œ] when "oe" looks more like IPA and "eu" is the romanization you see on the street? These choices are especially illogical considering that the [-ɛ] rhyme is spelled "-e", and considering the existence of the rhyme [-ɛːw]. Well, OK, that should be spelled "-eu", right? No! That's been taken by [-y] already, so instead Pênkyämp makes an awkward work-around and spells it "-eau". This problem could have been avoided by choosing more sensible vowel spellings in the first place.

But back to vowel length. This system makes a choice. It chooses to represent the vowel length distinction in Cantonese as primary, and kind of ignores vowel quality. Most other romanizations go the other way, distinguishing vowel quality but not representing length. But the fact of the matter is that you get both. There is a length distinction, and the short vowels all happen to be higher and more central than their long vowel counterparts. So who's right? Is it vowel length, or vowel quality? The answer is that it's both; the system is redundant. Why don't we just let our romanization system be redundant as well? Take, for example, the case of Taiwanese romanization, where the [-wa] rhyme is spelled "-oa", and the [-wi] final is spelled "-ui". Why not use "u" or "w" as the medial for both? Well, because they're different rhymes, and you might as well make them visually distinct. It might be a surprising design choice, but it's not a bad one.

Pênkyämp basically makes everything it can make obscure, obscure. The vowels are spelled funny. The tones are marked funny. Short/long vowels are distinguished, but not in any normal way: no doubled letters, no colons or IPA length marks, no macrons. No, to figure out if a vowel is long or short (which, remember, essentially changes what vowel it is), you have to glance over one or two letters and see if there's a -p, -t, -k, -y, or -w there, then modify the vowel in your head to match. (One could argue that you're supposed to read the entire rhyme as a unit, but the question remains: how to make these units, which are composed of alphabetic symbols, most easily learned/parsed?) Moreover, making this short/long distinction serves no purpose. It just makes it more confusing.

I actually tried reading a sample text written in Pênkyämp, and it was pure torture. When every symbol is used in a nonconventional way, as it is in Pênkyämp, it becomes a monumental task just to parse one syllable. Does Pênkyämp offer any ideas or insights of value to the larger issue of Cantonese romanization? I'm afraid the answer to that is an emphatic "no".

handy chart for Yi script


The Yi (Nuosu) script is crazy! I've been trying to learn Nuosu, and have made a handy reference chart (inspired by the jiǎnzhì 简志, which has a foldout chart in the back).

Some of the characters are adapted from Chinese. See, e.g., cyp 'one', nyip 'two', suo 'three', ly 'four', fut 'six'. The characters have been turned 90 degrees clockwise since their inception, so unturn them in your mind and you'll see the resemblance.

By the way, the -t and -p are tone marks: -t is high tone, -p is low tone (and -x is rising).

PDF and HTML versions below:



Japanese delicacies


Last night, at the Association for Linguistic Typology banquet, I was picking up some sushi rolls from the table when an elderly gentleman (his name is E---- K----, I later discovered) said to me, "Aren't those delicacies from your country?"

I was like, "What?"

E---- K---- clarified by asking, "those are Japanese, aren't they?"

I was so shocked that I didn't know what to say. I think I said something like, "Close, but not quite," and ran off.

Who says linguists can't be racist?

Korean totem poles



While at the Korean Bell of Friendship in San Pedro, I saw these two wooden figures inscribed with 天下大將軍 ('Great General Under Heaven') and 地下女將軍 ('Female General Under the Earth'). (They also look like they've seen better days.) I was puzzled, took a picture, and have since looked them up on the internet.

Apparently these are called jangseung 장승, and traditionally they're placed outside villages to ward off demons, etc. There's even a 184-page photo book of them, called Changsŭng, Village Guardian God of Korea (1993, Hwang Hŏn-man 黃憲萬).

Traveler's Tales: Tibet


I've been reading Traveler's Tales: Tibet (link to Google Books), and I must say it has some pretty incredible stories... I recommend it!

Everybody's Cantonese


I've been going through an old book entitled Everybody's Cantonese (1949, by Chan Yeung Kwong), and although the vocabulary is pretty basic, I did find some old pronunciations and interesting characters. For example, 咁 is transcribed as gom3, with a back rounded vowel (nowadays usually pronounced gɐm3); and 粒 is transcribed as nɐp5 (which I've always heard as lɐp5). These appear to be old pronunciations that have gone out of fashion.

Interesting characters include 氈 dzin1 'blanket', 笪 daːt3 'classifier for places', 樖 pɔ1 'classifier for trees', and 擸𢶍 laːp6saːp3 'trash' (now usually written 垃圾). I've always wondered about the word for trash, which in mainland Mandarin is pronounced la1ji1, but in Taiwan is pronounced le4se4. Why the difference? Are one or both of the variants related to the Cantonese word, and how?

fonts: oldies but goodies


I finally got an Intel Mac, and I've been spending part of the last few days reinstalling files that didn't get transferred automatically. Imagine my horror when I opened my web page and discovered that the font had suddenly turned ugly! What had happened to my beloved Charcoal font?

As it turns out, Charcoal is a Classic-only font. I ended up retrieving it from my old OS 9 System Folder, along with such oldies-but-goodies as Gadget and SteveHand. I also downloaded a San Francisco-inspired TrueType ransom-note font called St. Francis:

St. Francis font

Ah, the memories!

first QP!


It's finally done! You can download it from the following link and look at all the pretty charts.

Lizu and Proto-Tibeto-Burman

The newspapers are making a big deal about how the mainland translation skips out on "communism" and "dissent", which got me looking for the full, uncut translation from Hong Kong-based broadcaster Phoenix Satellite Television, which is mentioned—but, rather inconveniently, not linked to—in the English-language media. So I extracted the text and have posted it below for general (and translators') interest's sake.

2009年01月21日 02:23 北方網
Text of President Obama’s inaugural address on Tuesday [2009 January 20], as prepared for delivery and released by the Presidential Inaugural Committee.
各位同胞: My fellow citizens:
今天我站在這裡,為眼前的重責大任感到謙卑,對各位的信任心懷感激,對先賢的犧牲銘記在心。我要謝謝布希總統為這個國家的服務,也感謝他在政權轉移期間的寬厚和配合。 I stand here today humbled by the task before us, grateful for the trust you have bestowed, mindful of the sacrifices borne by our ancestors. I thank President Bush for his service to our nation, as well as the generosity and cooperation he has shown throughout this transition.

surrounded by white people


Having been surrounded by Asian American people all my life, I've been kind of shocked at how white grad school is. My house has almost 60 people, and last semester around 6 of them were Asian American, including me, which is an increase from previous years. My department has the same ratio: about 6 Asian Americans out of almost 60 grad students.

According to the official figures from UC Berkeley (download a copy), out of 8372 graduate students from the U.S., 1776, or 21%, are Asian American or Pacific Islander. Compare that with 4258, or almost 51%, who are White. API is certainly the second-largest group (third is "No data" at 860, followed by Chicano/Latino at 640, "Other" at 395, Black at 328, and Native American at 115), but it pales in comparison with the undergraduate population: 10456 API out of 24076, which is 43%, compared with 7740 White (32%).
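The percentages above are easy to double-check. This short Python sketch recomputes them from the quoted counts (the variable names are just for illustration); the seven graduate categories add up exactly to 8372, and the White share comes out to about 50.9%.

```python
# Counts quoted above for U.S. graduate students at UC Berkeley
grad = {
    "White": 4258,
    "Asian American/Pacific Islander": 1776,
    "No data": 860,
    "Chicano/Latino": 640,
    "Other": 395,
    "Black": 328,
    "Native American": 115,
}
GRAD_TOTAL = 8372

# Undergraduate comparison figures quoted above
undergrad = {"Asian American/Pacific Islander": 10456, "White": 7740}
UNDERGRAD_TOTAL = 24076


def share(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The seven categories should add up to the stated total
assert sum(grad.values()) == GRAD_TOTAL

for group, count in grad.items():
    print(f"grad {group}: {share(count, GRAD_TOTAL)}%")
for group, count in undergrad.items():
    print(f"undergrad {group}: {share(count, UNDERGRAD_TOTAL)}%")
```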

When I was admitted to grad school, I thought it a little odd that my financial aid was from the Eugene Cota-Robles Fellowship, which (as I understand it) is for supporting underrepresented students, but now it makes total sense.

As a side note, there are 1886 international grad students; of those, 1305 are male and 581 are female. That's almost 70% male!

About this Archive

This page is an archive of entries from January 2009 listed from newest to oldest.


Powered by Movable Type 5.2.7