
2.3: Collating Elements vs. Characters

POSIX generalizes the notion of a character to that of a collating element. It defines a collating element to be ``a sequence of one or more bytes defined in the current collating sequence as a unit of collation.''

This generalizes the notion of a character in two ways. First, a single character can map into two or more collating elements. For example, the German eszett (ß) collates as the collating element s followed by a second collating element s. Second, two or more characters can map into one collating element. For example, in traditional Spanish collation the two characters ll form a single collating element that sorts after l and before m.
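To make the two mappings concrete, here is a minimal sketch of splitting a string into collating elements and sorting by them. It assumes a hypothetical toy collating sequence (lowercase Latin letters plus ll between l and m, with ß expanding to s s); the names `SEQUENCE`, `EXPANSIONS`, `collating_elements`, and `collation_key` are illustrative only. A real system takes these rules from the current locale's collating sequence rather than from a hard-coded table.

```python
# Hypothetical toy collating sequence (not a real locale): the lowercase
# Latin letters, with the two-character element "ll" ranked after "l"
# and before "m", as in traditional Spanish collation.
SEQUENCE = list("abcdefghijkl") + ["ll"] + list("mnopqrstuvwxyz")
RANK = {elem: i for i, elem in enumerate(SEQUENCE)}

# Hypothetical expansion table: one character maps to two collating
# elements, as the German eszett does.
EXPANSIONS = {"ß": ["s", "s"]}

# Length of the longest collating element, used to bound the greedy match.
MAX_LEN = max(len(e) for e in SEQUENCE)


def collating_elements(text):
    """Split text into collating elements, preferring the longest match."""
    elements = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in EXPANSIONS:
            # One character -> several collating elements.
            elements.extend(EXPANSIONS[ch])
            i += 1
            continue
        # Several characters -> one collating element, if the table has one.
        for n in range(MAX_LEN, 0, -1):
            candidate = text[i:i + n]
            if candidate in RANK:
                elements.append(candidate)
                i += len(candidate)
                break
        else:
            raise ValueError(f"{ch!r} is not in the toy collating sequence")
    return elements


def collation_key(text):
    """Sort key: the ranks of the text's collating elements, in order."""
    return [RANK[e] for e in collating_elements(text)]


if __name__ == "__main__":
    print(collating_elements("llama"))   # ['ll', 'a', 'm', 'a']
    print(collating_elements("straße"))  # ['s', 't', 'r', 'a', 's', 's', 'e']
    print(sorted(["mano", "llama", "luz"], key=collation_key))
    # ['luz', 'llama', 'mano']
```

Plain character-by-character ordering would put llama before luz; treating ll as one element ranked after l moves it between luz and mano. The greedy longest-match rule is one simple way to pick out multi-character elements; it is chosen here for brevity, not as a description of any particular library's algorithm.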

Since POSIX's ``collating element'' preserves the essential idea of a ``character,'' we use the latter, more familiar, term in this document.