Mac OS X, the Terminal, Unicode and ls

Posted on July 17, 2011

Years ago I figured out how to configure Mac OS X’s Terminal and the .inputrc to support UTF-8 encoded umlauts in terminal windows. But one essential piece was still missing. I could enter umlauts, vim displayed them, ls | cat showed correct umlauts in file names, and even bash completion completed them properly, but a plain ls failed to display UTF-8 encoded umlauts correctly. Today, after probably five years, I found the first workaround: ls -w.

   -w    Force raw printing of non-printable characters.  This is the default
         when output is not to a terminal.
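For anyone who wants to reproduce the difference, here is a minimal sketch. The scratch directory and the file name are made up for illustration (the variable names demo_dir, demo_name and piped are mine), and the umlaut is written as octal escapes so the snippet does not depend on the editor's encoding.

```shell
# Create a scratch directory holding one file whose name contains "ü"
# encoded as UTF-8 (bytes 0xC3 0xBC, written here as octal escapes).
demo_dir=$(mktemp -d)
demo_name=$(printf 'm\303\274ller.txt')
: > "$demo_dir/$demo_name"

# Piped output: ls is not writing to a terminal here, which is the case
# where the man page says raw printing is the default.
piped=$(ls "$demo_dir" | cat)
printf '%s\n' "$piped"

# Direct output: run this line in a terminal window and compare by eye.
ls "$demo_dir"
```

Adding -w to the last command on OS X should then make both listings look the same. The scratch directory can be removed afterwards with rm -r "$demo_dir".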

Though this workaround is ugly, it works. It indicates that ls fails to recognize umlauts as printable characters even though the locale is set to a German UTF-8 locale. Forcing ls to ignore the fact that a character is supposedly non-printable may well cause trouble as soon as a file name contains characters that really are non-printable. Until that happens I’ll be happy with this workaround. At long last, it seems feasible to use proper German file names; this was the last missing piece for me.
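To avoid typing the option every time, the workaround could go into a shell startup file. The following is only a sketch under a few assumptions: the helper name ls_raw_flags is hypothetical, and the check on the kernel name is my own addition, because -w means something different in other ls implementations (GNU ls, for example, expects a column width after it).

```shell
# Hypothetical helper: prints the ls option to use for raw file names.
# Takes the kernel name (as printed by `uname -s`) as its only argument.
ls_raw_flags() {
    case "$1" in
        Darwin) printf '%s\n' '-w' ;;  # ls on OS X: -w forces raw printing
        *)      : ;;                   # elsewhere: print nothing, leave ls alone
    esac
}

# Define the alias only when the helper returned a flag.
flags=$(ls_raw_flags "$(uname -s)")
if [ -n "$flags" ]; then
    alias ls="ls $flags"
fi
```

Passing the kernel name in as an argument, instead of calling uname inside the helper, keeps the decision easy to check on any machine.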

In the end it turns out there seems to be one more programmer (the author of OS X’s ls) who should have read Joel Spolsky’s blog entry “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)”. I wish there were a law forcing everyone to read it before being allowed to write a single line of code. It would probably have saved me hundreds of hours of my life.