BibTeX and Chinese names

As wonderful as BibTeX is, it’s always bothered me that it formats every name after the first in an author list as “First Last”, even when the name is Chinese (or Japanese, etc., but most of the citations I use are Chinese) and would customarily put the surname first. “But how would BibTeX know that the name’s Chinese?” you ask. Actually, it’s nothing magical: I stick the actual characters into my bibliography file. E.g., Author = {Ikeda, Takumi \TC{池田巧}}, where \TC means “use a Traditional Chinese font” (defined in my XeLaTeX file). If only I could get BibTeX to check whether the name includes that string, and handle it differently…

WARNING: the following code is in a language that uses Reverse Polish notation. Viewers who might be offended by RPN should not view this program.

As it turns out, the fix is quite simple. Just add a function to your .bst file:

STRINGS{text}
FUNCTION{cjk.contains}
{ 'text :=
  #0
    { text empty$ not }
    { text #1 #3 substring$ duplicate$ "\SC" = swap$ "\TC" = or
        { pop$
          #1
          "" 'text :=
        }
        { text #2 global.max$ substring$ 'text := }
      if$
    }
  while$
}

And then, wherever the file calls format.name$, you can add an if statement that checks whether the name contains Chinese. So, this:

s nameptr "{ff~}{vv~}{ll}{, jj}" format.name$

would turn into this:

s nameptr
  s nameptr "{ff}" format.name$ cjk.contains
    { "{vv~}{ll}{~ff}{, jj}" }
    { "{ff~}{vv~}{ll}{, jj}" }
  if$
format.name$

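Neat, huh? By the way, the boolean functions not and or are not built into BibTeX; most .bst files define them themselves, so if yours doesn’t, you’ll have to add them (above cjk.contains, since a function must be defined before it is used). For a down-and-dirty guide to BibTeX, check out this link: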

http://www.lsv.ens-cachan.fr/~markey/BibTeX/doc/ttb_en.pdf

Or more generally, this page:

http://www.lsv.ens-cachan.fr/~markey/bibla.php
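
For reference, this is how the standard styles such as plain.bst define the boolean helpers, as far as I recall. Treat it as a sketch and compare it against your own .bst file before pasting: cjk.contains only needs not and or, and BibTeX complains if a function is defined twice.

```bst
%% Boolean helpers as defined (to my recollection) in plain.bst.
%% Each one treats a positive integer on the stack as "true".

FUNCTION {not}
{   { #0 }
    { #1 }
  if$
}

FUNCTION {and}
{   'skip$
    { pop$ #0 }
  if$
}

FUNCTION {or}
{   { pop$ #1 }
    'skip$
  if$
}
```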

Update on 2012 August 2: Wow, I can barely read my own code anymore! Let me try to clarify:

First, the code assumes that (1) you have marked all your CJK text with custom commands named \TC{...} or \SC{...} (if you use different commands, change the strings in the code accordingly, and adjust the substring length if the new names are not three characters long), and (2) any author in your bibliography file whose first name contains either of the strings \TC or \SC should be formatted with the last name first.

Now we look at the function with some comments added:

STRINGS{text} %% declare "text" as a global string variable
FUNCTION{cjk.contains} %% the name of this function is "cjk.contains"
{ 'text := %% pop the string on top of the stack and store it in "text"
  #0 %% push 0 (false); this is the result unless the loop below changes it

    { text empty$ not }
    %% the condition for the "while" loop: while "text" is not empty...

    { text #1 #3 substring$ duplicate$ "\SC" = swap$ "\TC" = or
    %% the "if" condition: the first three characters of "text"
    %% equal "\SC" or "\TC" (we have to duplicate and swap
    %% because we do two equality tests, and then "or" them)
        { pop$ %% get rid of the 0
          #1 %% and put 1 (true) on the stack
          "" 'text := %% set "text" to empty, which ends the loop
        }
        { text #2 global.max$ substring$ 'text :=
          %% if "text" does not start with "\SC" or "\TC",
          %% delete the first character and try again
        }
      if$
    }
  while$
}
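
To make the duplicate-and-swap step concrete, here is a hand trace of the stack for one pass through the test. It is an annotated fragment rather than a standalone style file, and it assumes that "text" currently begins with \TC and that or is defined the way plain.bst defines it.

```bst
%% Stack contents are shown bottom to top, left to right.
%% Assumption: text = "\TC{...}" at this point in the loop.
                          %% 0            (the #0 pushed before the loop)
text #1 #3 substring$     %% 0 "\TC"
duplicate$                %% 0 "\TC" "\TC"
"\SC" =                   %% 0 "\TC" 0    (not equal to "\SC")
swap$                     %% 0 0 "\TC"
"\TC" =                   %% 0 0 1        (equal to "\TC")
or                        %% 0 1
%% if$ then consumes the 1 and runs the first branch:
%% pop$ discards the remaining 0, and #1 becomes the return value.
```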

And now for the formatting code: s nameptr "{ff}" format.name$ extracts only the “first name” portion of the name and passes it to cjk.contains. (In an entry like Ikeda, Takumi \TC{池田巧}, everything after the comma counts as the first name, so that is where the \TC tag ends up.) If cjk.contains returns true, we put the last name before the first name, with no comma in between; otherwise we use the same format as before. So basically this format string

"{ff~}{vv~}{ll}{, jj}"

gets surrounded by a giant “if” clause, like so:

  s nameptr "{ff}" format.name$ cjk.contains
    { "{vv~}{ll}{~ff}{, jj}" }
    { INSERT ORIGINAL FORMAT STRING HERE }
  if$
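
As a rough sketch of where this lands in a complete function, here is format.names as I recall it from the standard plain.bst, with the CJK test wrapped around the format string. This assumes your style follows plain.bst closely; the variables s, t, nameptr, namesleft and numnames are the ones plain.bst already declares, and the function would replace the existing format.names rather than sit alongside it.

```bst
%% Sketch only: plain.bst's format.names (from memory) plus the CJK test.
%% cjk.contains, not and or must all be defined earlier in the file.
FUNCTION {format.names}
{ 's :=
  #1 'nameptr :=
  s num.names$ 'numnames :=
  numnames 'namesleft :=
    { namesleft #0 > }
    { s nameptr
        s nameptr "{ff}" format.name$ cjk.contains
          { "{vv~}{ll}{~ff}{, jj}" }   %% surname first for tagged CJK names
          { "{ff~}{vv~}{ll}{, jj}" }   %% plain.bst's original format string
        if$
      format.name$ 't :=
      nameptr #1 >
        { namesleft #1 >
            { ", " * t * }
            { numnames #2 >
                { "," * }
                'skip$
              if$
              t "others" =
                { " et~al." * }
                { " and " * t * }
              if$
            }
          if$
        }
        't
      if$
      nameptr #1 + 'nameptr :=
      namesleft #1 - 'namesleft :=
    }
  while$
}
```

Only the lines between the first s nameptr and format.name$ differ from the stock function; the rest of the loop that joins the names with commas and “and” is untouched.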

If this is still too abstract, perhaps a concrete example will help: I have now posted both the original and the modified .bst files that I used for my dissertation, which follows the Linguistic Inquiry stylesheet:

linquiry2.bst

linquiry2-cjk.bst

Hopefully people out there will find this useful!