BibTeX and Chinese names
As wonderful as BibTeX is, it’s always bothered me that it formats non-initial names as “First Last” even when they’re Chinese names (or Japanese, etc., but most of the works I cite are Chinese), which customarily put the surname first. “But how would BibTeX know that the name’s Chinese?” you ask. Actually, it’s nothing magical: I put the actual characters into my bibliography file, e.g., Author = {Ikeda, Takumi \TC{池田巧}}, where \TC means “use a traditional Chinese font” (defined in my XeLaTeX file). If only I could get BibTeX to check whether the name includes that string, and handle it differently…
WARNING: the following code is in a language that uses Reverse Polish notation. Viewers who might be offended by RPN should not view this program.
As it turns out, the fix is quite simple. Just add a function to your .bst file:
STRINGS{text}
FUNCTION{cjk.contains}
{ 'text :=
  #0
  { text empty$ not }
  { text #1 #3 substring$ duplicate$ "\SC" = swap$ "\TC" = or
      { pop$
        #1
        "" 'text :=
      }
      { text #2 global.max$ substring$ 'text :=
      }
    if$
  }
  while$
}
And then, where the file calls format.name$, you can add an if statement to see whether the name contains Chinese or not. So, this:
s nameptr "{ff~}{vv~}{ll}{, jj}" format.name$
would turn into this:
s nameptr
s nameptr "{ff}" format.name$ cjk.contains
{ "{vv~}{ll}{~ff}{, jj}" }
{ "{ff~}{vv~}{ll}{, jj}" }
if$
format.name$
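Neat, huh? By the way, BibTeX has no built-in boolean functions: not and or (both used in cjk.contains) are ordinary functions defined inside the .bst file itself. Standard styles such as plain.bst define them, but if your .bst file doesn’t, you’ll have to add them, above cjk.contains, since a function must be defined before it is used. For a down-and-dirty guide to BibTeX, check out this link: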
http://www.lsv.ens-cachan.fr/~markey/BibTeX/doc/ttb_en.pdf
Or more generally, this page:
http://www.lsv.ens-cachan.fr/~markey/bibla.php
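cjk.contains calls both not and or, neither of which is a BibTeX built-in. For reference, here are their definitions as they appear in the standard plain.bst. Treat this as a sketch of what to paste in only if your style lacks them: check first that your .bst doesn’t already define them, and place them above cjk.contains.

```bst
% Boolean helpers, as defined in plain.bst.
% BibTeX represents true as 1 and false as 0 on the integer stack.

FUNCTION {not}        % pops one boolean, pushes its negation
{   { #0 }
    { #1 }
  if$
}

FUNCTION {or}         % pops two booleans, pushes 1 if either is true
{   { pop$ #1 }       % top one true: discard the other, push 1
    'skip$            % top one false: the other one is the answer
  if$
}
```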
Update on 2012 August 2: Wow, I can barely read my own code anymore! Let me try to clarify:
First, the code assumes that (1) you have marked all your CJK text with your custom commands \TC or \SC (if you use different commands you should change the code accordingly), and (2) any author in your bibliography file whose first name contains either of the strings \TC or \SC should be formatted with the last name first.
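As one illustration of changing the code accordingly: if you also used a third command, say a hypothetical \JP for Japanese text, you could keep duplicating the three-character prefix and or the three tests together. This is a sketch that would replace the original cjk.contains (including its STRINGS line); it assumes every command name is exactly three characters long, since a command of a different length would need its own substring$ length. It also relies on not and or being defined earlier in the file.

```bst
STRINGS{text}
FUNCTION{cjk.contains}
{ 'text :=
  #0                                   % default result: 0 (false)
  { text empty$ not }
  { text #1 #3 substring$              % stack: 0 prefix
    duplicate$ "\SC" =                 % stack: 0 prefix isSC
    swap$ duplicate$ "\TC" =           % stack: 0 isSC prefix isTC
    swap$ "\JP" =                      % stack: 0 isSC isTC isJP (\JP is hypothetical)
    or or                              % stack: 0 anyMatch
      { pop$ #1 "" 'text := }          % match: replace 0 with 1 and end the loop
      { text #2 global.max$ substring$ 'text := }  % no match: drop first character
    if$
  }
  while$
}
```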
Now we look at the function with some comments added:
STRINGS{text}            %% define "text" as a variable
FUNCTION{cjk.contains}   %% the name of this function is "cjk.contains"
{ 'text :=               %% store whatever value is on the top of the stack in "text"
  #0                     %% push 0 (false): the result unless the loop below changes it
  { text empty$ not }
  %% the condition for the "while" loop: while "text" is not empty...
  { text #1 #3 substring$ duplicate$ "\SC" = swap$ "\TC" = or
    %% the "if" condition: do the first three characters of "text"
    %% equal "\SC" or "\TC"? (we have to duplicate and swap
    %% because we do two equality tests, and then "or" them)
      { pop$             %% get rid of the 0
        #1               %% and put 1 (true) on the stack
        "" 'text :=      %% set "text" to empty, which ends the loop
      }
      { text #2 global.max$ substring$ 'text :=
        %% if "text" does not start with "\SC" or "\TC",
        %% delete the first character and try again
      }
    if$
  }
  while$
}
And now for the formatting code: s nameptr "{ff}" format.name$ extracts only the “first name” portion of the name and passes it to cjk.contains. If it returns true, we put the last name before the first name, with no comma in between; otherwise we use the same format as before. So basically this format string
"{ff~}{vv~}{ll}{, jj}"
gets surrounded by a giant “if” clause, like so:
s nameptr "{ff}" format.name$ cjk.contains
{ "{vv~}{ll}{~ff}{, jj}" }
{ INSERT ORIGINAL FORMAT STRING HERE }
if$
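As an illustration only (this is not my Linguistic Inquiry file), here is how the change might look inside the format.names function of the standard plain.bst, assuming your style’s name loop has the same shape. Only the format.name$ call differs from plain.bst; s, t, nameptr, namesleft and numnames are plain.bst’s own variables, and cjk.contains (plus not and or) must be defined earlier in the file.

```bst
% Assumes plain.bst's earlier declaration: STRINGS { s t }
INTEGERS { nameptr namesleft numnames }

FUNCTION {format.names}
{ 's :=
  #1 'nameptr :=
  s num.names$ 'numnames :=
  numnames 'namesleft :=
    { namesleft #0 > }
    { s nameptr                                     % arguments for format.name$
      s nameptr "{ff}" format.name$ cjk.contains    % does the first name contain \SC or \TC?
        { "{vv~}{ll}{~ff}{, jj}" }                  % yes: surname first, no comma
        { "{ff~}{vv~}{ll}{, jj}" }                  % no: plain.bst's original format
      if$
      format.name$ 't :=
      nameptr #1 >
        { namesleft #1 >
            { ", " * t * }
            { numnames #2 >
                { "," * }
                'skip$
              if$
              t "others" =
                { " et~al." * }
                { " and " * t * }
              if$
            }
          if$
        }
        't
      if$
      nameptr #1 + 'nameptr :=
      namesleft #1 - 'namesleft :=
    }
  while$
}
```

Note that “others” (as in “and others”) has an empty first name, so cjk.contains returns 0 for it and the “et al.” check still sees the unchanged string.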
If this is still too abstract, perhaps a concrete example will help: I have now posted both the original and the modified .bst files that I used for my dissertation, which follow the Linguistic Inquiry stylesheet:
Hopefully people out there will find this useful!