Manual:searchindex table

MediaWiki database layout


The searchindex table is used to provide full-text searches. These can only be done with the MyISAM table type, while the text table (the cur table in 1.4 and earlier) uses the InnoDB type to improve concurrency, so a separate copy of the current page text is required. If using PostgreSQL, this table does not exist: the full-text information is stored directly as columns in the page and pagecontent tables.
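Because searchindex holds a copy of the text, its row for a page has to be refreshed whenever that page is saved. The following is a minimal Python sketch of that step, not MediaWiki's own code. It assumes a DB-API driver that uses the `%s` placeholder style (for example PyMySQL). The function name `build_searchindex_update` and the `munge` parameter are hypothetical: `munge` stands in for whatever normalization produces the "munged" title and text, and is not MediaWiki's actual routine. Because si_page is a unique key, MySQL's REPLACE deletes any existing row for the page before inserting the new one.

```python
from typing import Callable, Tuple


def build_searchindex_update(
    page_id: int,
    title: str,
    text: str,
    munge: Callable[[str], str] = lambda s: s,  # hypothetical normalizer
) -> Tuple[str, Tuple[int, str, str]]:
    """Return (sql, params) that overwrite the searchindex row for one page.

    REPLACE relies on the unique si_page key: an existing row for the
    same page is deleted and the new row is inserted in its place.
    """
    sql = (
        "REPLACE INTO searchindex (si_page, si_title, si_text) "
        "VALUES (%s, %s, %s)"
    )
    return sql, (page_id, munge(title), munge(text))


if __name__ == "__main__":
    # str.lower is only a placeholder for the real munging step.
    sql, params = build_searchindex_update(42, "Main Page", "Hello world", str.lower)
    print(sql)
    print(params)  # (42, 'main page', 'hello world')
```

With a live connection the pair would be passed as `cursor.execute(sql, params)`, letting the driver quote the values.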

Fields

si_page

Key to page_id.

si_title

Munged version of the page title.

si_text

Munged version of the current text of the page.

Versions

MediaWiki version: 1.19

mysql> describe searchindex;
+----------+------------------+------+-----+---------+-------+
| Field    | Type             | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| si_page  | int(10) unsigned | NO   | PRI | NULL    |       |
| si_title | varchar(255)     | NO   | MUL |         |       |
| si_text  | mediumtext       | NO   | MUL | NULL    |       |
+----------+------------------+------+-----+---------+-------+

MediaWiki version: 1.18

"DESCRIBE searchindex" in MediaWiki 1.18 gives the following:

+----------+------------------+------+-----+---------+-------+
| Field    | Type             | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| si_page  | int(10) unsigned | NO   | PRI | NULL    |       |
| si_title | varbinary(255)   | NO   | MUL | NULL    |       |
| si_text  | mediumblob       | NO   | MUL | NULL    |       |
+----------+------------------+------+-----+---------+-------+

MediaWiki version: 1.9

mysql> describe mw_searchindex;
+----------+-----------------+------+-----+---------+-------+
| Field    | Type            | Null | Key | Default | Extra |
+----------+-----------------+------+-----+---------+-------+
| si_page  | int(8) unsigned | NO   | PRI | NULL    |       |
| si_title | varchar(255)    | NO   | MUL | NULL    |       |
| si_text  | mediumtext      | NO   | MUL | NULL    |       |
+----------+-----------------+------+-----+---------+-------+
3 rows in set (0.02 sec)

The fields are:

si_page: the page ID, from the page table.
si_title: the page title, from the page table.
si_text: the indexed text, from the text table.

Indexes on this table, with cardinality figures taken from the English Wikipedia:

+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table       | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| searchindex |          0 | si_page  |            1 | si_page     | A         |      797275 |     NULL | NULL   |      | BTREE      |         |
| searchindex |          1 | si_title |            1 | si_title    | A         |      265758 |     NULL | NULL   |      | FULLTEXT   |         |
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

A common search query is of the form:

SELECT page_id, page_namespace, page_title FROM `page`,`searchindex` WHERE page_id=si_page AND MATCH(si_text) AGAINST('+searchterm' IN BOOLEAN MODE) AND page_is_redirect=0 AND page_namespace IN (0) LIMIT 20
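The query above can be assembled from user input roughly as follows. This is a hedged Python sketch, not MediaWiki's search code: the helper names `require_all_terms` and `build_search_query` are hypothetical, and it assumes a DB-API driver with the `%s` placeholder style (for example PyMySQL). Each word is prefixed with `+`, as in `'+searchterm'`, so that every word must be present in BOOLEAN MODE; operator characters at the start or end of a word are stripped first, which is a simplification that leaves characters inside a word untouched.

```python
from typing import Sequence, Tuple

# Characters with special meaning in MySQL boolean-mode full-text search.
BOOLEAN_OPERATORS = '+-<>()~*"@'


def require_all_terms(user_input: str) -> str:
    """Turn free text into a boolean-mode string in which every word is required."""
    # Strip operator characters from the edges of each word, drop empty words,
    # then mark each remaining word as mandatory with a leading '+'.
    words = (raw.strip(BOOLEAN_OPERATORS) for raw in user_input.split())
    return " ".join("+" + w for w in words if w)


def build_search_query(
    user_input: str, namespaces: Sequence[int], limit: int = 20
) -> Tuple[str, tuple]:
    """Return (sql, params) mirroring the common page/searchindex search query."""
    if not namespaces:
        raise ValueError("at least one namespace is required")
    # One %s placeholder per namespace for the IN (...) list.
    placeholders = ", ".join(["%s"] * len(namespaces))
    sql = (
        "SELECT page_id, page_namespace, page_title "
        "FROM page JOIN searchindex ON page_id = si_page "
        "WHERE MATCH(si_text) AGAINST(%s IN BOOLEAN MODE) "
        "AND page_is_redirect = 0 "
        f"AND page_namespace IN ({placeholders}) "
        "LIMIT %s"
    )
    # Parameter order follows the placeholder order in the SQL text.
    params = (require_all_terms(user_input), *namespaces, limit)
    return sql, params
```

For example, `build_search_query("hello world", [0])` yields the parameters `("+hello +world", 0, 20)`. Keeping the search string as a bound parameter, rather than splicing it into the SQL, leaves quoting to the driver.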

In MediaWiki version 1.4 and earlier:

SELECT cur_id,cur_namespace,cur_title,cur_text FROM cur,searchindex WHERE cur_id=si_page AND (MATCH(si_text) AGAINST('+searchterm' IN BOOLEAN MODE) AND cur_is_redirect=0) AND cur_namespace IN (0,9,11) LIMIT 0, 20;

EXPLAIN on the 1.4 query shows the following:

+-------------+--------+-------------------------------------------+---------------+---------+------------+--------+-------------+
| table       | type   | possible_keys                             | key           | key_len | ref        | rows   | Extra       |
+-------------+--------+-------------------------------------------+---------------+---------+------------+--------+-------------+
| cur         | range  | cur_id,cur_namespace,name_title_timestamp | cur_namespace |       1 | NULL       | 317499 | Using where |
| searchindex | eq_ref | si_page                                   | si_page       |       4 | cur.cur_id |      1 | Using where |
+-------------+--------+-------------------------------------------+---------------+---------+------------+--------+-------------+

Known problems

A range scan over namespaces containing a large number of records requires a large portion of the searchindex table to be loaded into RAM. This is very costly, partly because the searchindex table contains the full text of each article. A long-term plan is to remove the text from the searchindex table. Temporary measures may involve adding fields to searchindex and indexing them (using a self join so that both that index and the full-text index can be used), or adding another index on searchindex to reduce the number of complete searchindex records that must be read to match the query.
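The self-join measure can be illustrated with the Python sketch below, which only builds the statement. It assumes a hypothetical `si_namespace` column, copied from page_namespace and given an ordinary index; no such column exists in the schemas listed above, and the function name is likewise hypothetical. One alias of searchindex is filtered on that column and the other carries the MATCH, with the intent that the optimizer uses the ordinary index for one alias and the full-text index for the other. Whether MySQL actually picks that plan would need to be confirmed with EXPLAIN.

```python
from typing import Sequence, Tuple


def build_self_join_query(
    boolean_terms: str, namespaces: Sequence[int], limit: int = 20
) -> Tuple[str, tuple]:
    """Return (sql, params) for a searchindex self join.

    Assumes a hypothetical, indexed si_namespace column on searchindex.
    Alias ns filters by namespace; alias ft runs the full-text match.
    """
    if not namespaces:
        raise ValueError("at least one namespace is required")
    placeholders = ", ".join(["%s"] * len(namespaces))
    sql = (
        "SELECT ns.si_page "
        "FROM searchindex AS ns "
        "JOIN searchindex AS ft ON ft.si_page = ns.si_page "
        f"WHERE ns.si_namespace IN ({placeholders}) "
        "AND MATCH(ft.si_text) AGAINST(%s IN BOOLEAN MODE) "
        "LIMIT %s"
    )
    # Parameter order must follow placeholder order in the SQL text.
    params = (*namespaces, boolean_terms, limit)
    return sql, params
```

The join is on si_page, the unique key, so each namespace-filtered row matches exactly one row in the full-text alias.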
