Manual:$wgCategoryCollation

Category: $wgCategoryCollation
The collation that categories use for sorting
Introduced in version: 1.17.0 (r72308)
Removed in version: still in use
Allowed values: string
Default value: uppercase


Details

This setting determines which collation algorithm is used to sort category listings. See also MediaWiki 1.17/Category sorting.

Currently supports:

  • uppercase [default]: converts everything to uppercase for sorting.
  • uca-default [MW 1.17+]: a more complex collation based on the Unicode Collation Algorithm (UCA) that is much friendlier to multilingual content.
  • identity [MW 1.18+]: sorts by the binary value of the string as stored in UTF-8, which is essentially sorting by code point.
  • uca-<langcode> [MW 1.21+]: uca-default with language-specific adjustments; see below.
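
The difference between the two simplest collations can be illustrated with a small Python sketch. This is not MediaWiki's implementation: the function names uppercase_key and identity_key are hypothetical, and Python's str.upper() only approximates the uppercasing MediaWiki performs. The uca-* collations are not modelled here, since they depend on the ICU library.

```python
def uppercase_key(title: str) -> str:
    """Approximate the 'uppercase' collation: compare titles after uppercasing."""
    return title.upper()


def identity_key(title: str) -> bytes:
    """Approximate the 'identity' collation: compare the raw UTF-8 bytes."""
    return title.encode("utf-8")


titles = ["banana", "Apple", "apple", "Zebra", "Émile"]

# Case is ignored, so "Apple" and "apple" end up next to each other.
by_uppercase = sorted(titles, key=uppercase_key)
print(by_uppercase)  # ['Apple', 'apple', 'banana', 'Zebra', 'Émile']

# Byte order puts every ASCII capital before every ASCII lowercase letter.
by_identity = sorted(titles, key=identity_key)
print(by_identity)  # ['Apple', 'Zebra', 'apple', 'banana', 'Émile']
```

In both cases "Émile" lands after the unaccented titles, which is the kind of result the uca-* collations are meant to improve on.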

Since MediaWiki 1.18, extensions can add extra collations via the Collation::factory hook.

The value is also stored inside the categorylinks table to determine which rows need updating when the collation algorithm changes.
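
As a rough illustration of why the collation name is stored with each row, the following Python sketch picks out the rows that were sorted under a different collation than the one currently configured. It assumes each row is a plain dict with a cl_collation field holding that name; the helper rows_needing_update and the sample data are hypothetical, and the real maintenance script works against the database rather than in-memory lists.

```python
from typing import Dict, List


def rows_needing_update(rows: List[Dict[str, str]], current_collation: str) -> List[Dict[str, str]]:
    """Return the rows whose stored collation differs from the configured one."""
    return [row for row in rows if row["cl_collation"] != current_collation]


# Hypothetical sample rows; only cl_collation matters for this sketch.
sample_rows = [
    {"page": "Apple", "category": "Fruit", "cl_collation": "uppercase"},
    {"page": "Banana", "category": "Fruit", "cl_collation": "uca-default"},
    {"page": "Cherry", "category": "Fruit", "cl_collation": "uppercase"},
]

# After switching the wiki from 'uppercase' to 'uca-default',
# the rows still tagged 'uppercase' are the stale ones.
stale = rows_needing_update(sample_rows, "uca-default")
print([row["page"] for row in stale])  # ['Apple', 'Cherry']
```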

Warning: Updating collations is slow and may lock up the database on large wikis. See bug 56041 for details.
Warning:
  • After changing this option, you must run updateCollation.php to recompute sort keys for all pages, or your categories will be sorted inconsistently.
  • uca-default/uca-xx collations require the PHP intl extension.
  • If you're seeing unexpected results after changing this to uca-default/uca-xx (such as pages starting with "A" sorted under the "⅍" symbol, etc.), you probably have to generate collation data appropriate for your version of ICU using the language/generateCollationData.php maintenance script (bug 43740), then rerun updateCollation.php with a --force parameter.
  • You have to purge category pages after running updateCollation.php to see the results.

Language-specific collations

Since version 1.21 MediaWiki also supports 68 collations designed for specific languages. These are based on uca-default and have the same requirements; they are named uca-<langcode>, where <langcode> is one of: af, ast, az, be, be-tarask, bg, br, bs, ca, co, cs, cy, da, de, dsb, el, en, eo, es, et, eu, fi, fo, fr, fur, fy, ga, gd, gl, hr, hsb, hu, is, it, kk, kl, ku, ky, la, lb, lt, lv, mk, mo, mt, nl, no, oc, pl, pt, rm, ro, ru, rup, sco, sk, sl, smn, sq, sr, sv, tk, tl, tr, tt, uk, uz, vi. For example, to use a collation for Spanish, one would use the uca-es collation.

Using these collations provides both the correct sorting order for the given language and proper headings for the first letters of article titles.
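
The naming scheme can be sketched in Python as follows. The helper collation_for_language is hypothetical and not part of MediaWiki; falling back to uca-default for an unlisted language is an assumption made for illustration, not documented behaviour.

```python
# Language codes listed above as having a uca-<langcode> collation (MW 1.21+).
SUPPORTED_LANGCODES = frozenset(
    """
    af ast az be be-tarask bg br bs ca co cs cy da de dsb el en eo es et
    eu fi fo fr fur fy ga gd gl hr hsb hu is it kk kl ku ky la lb
    lt lv mk mo mt nl no oc pl pt rm ro ru rup sco sk sl smn sq sr
    sv tk tl tr tt uk uz vi
    """.split()
)


def collation_for_language(langcode: str) -> str:
    """Return 'uca-<langcode>' if the code is listed, else 'uca-default'."""
    if langcode in SUPPORTED_LANGCODES:
        return f"uca-{langcode}"
    return "uca-default"


print(len(SUPPORTED_LANGCODES))          # 68
print(collation_for_language("es"))      # uca-es
print(collation_for_language("ja"))      # uca-default
```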

Getting new collations added

Supporting a new language requires two things:

  • The language must be supported by the ICU library (the list of language codes it supports is available at [1]).
  • The language must additionally be supported by MediaWiki itself. This essentially requires listing the additional characters, or character groups, that are considered separate letters in the given language, beyond the basic alphabet. The up-to-date list of currently supported languages is available at [2].

It might also be the case that the default ICU ordering (the uca-default collation) orders the titles correctly but does not correctly separate the letters; in that case it can be used as a first step. Sometimes the letter ordering of a related language fits yours, and a custom collation can be provided in such a case. One already exists for Sorani Kurdish / Central Kurdish (ckb), called xx-uca-ckb [3].
