Database optimization

From MediaWiki.org
Jump to: navigation, search

Database optimization is an important consideration in MediaWiki coding, particularly when the code is intended for deployment on Wikimedia projects.

Tips[edit | edit source]

General[edit | edit source]

ORDER BY expression --> filesort == bad. Use COUNT(), SUM(), GROUP BY, etc.; there is no limit on rows scanned. MAX()/MIN() of indexed field on entire table is okay. In some cases, denormalize for performance. Add information duplicated from other tables. Use summary tables, counter tables, cache tables, etc. Think about how the DB will run your query. Add indexes where needed. Batch queries (when it makes sense).

Extensions generally shouldn't add fields to core tables. Instead, they should make their own table(s), and JOIN them if necessary.

Profiling[edit | edit source]

When in doubt, don't make assumptions; profile.

Select only needed columns[edit | edit source]

Unless all columns are needed, avoid 'SELECT * FROM ...' as this consumes resources transferring extraneous data. Select only the needed columns.

WHERE[edit | edit source]

Avoid full table scans, except in edge cases in which it would actually improve performance since almost everything is going to be selected anyway.

Indices[edit | edit source]

Weigh the advantages of an index in making queries more efficient against the efficiency losses of having to change the index when insertions, deletions, and updates are done. Generally, you shouldn't run unindexed queries. WHERE on rarely false conditions is usually okay. Unindexed ORDER BY (filesort) is never okay.

LIMIT[edit | edit source]

Use LIMIT to limit number of rows returned. When possible, also limit the number of rows scanned. Remember, LIMIT 10 OFFSET 5000 scans 5010 rows. If you can, use WHERE foo_id >= 12345 instead.

JOIN[edit | edit source]

Use JOINs rather than looping SELECT queries.

Calculations[edit | edit source]

Do calculations in such a way that the indexes are still being used.

Multiple row operations[edit | edit source]

Use multiple row operations to avoid looped INSERT and UPDATE queries.[1]

Communication[edit | edit source]

If you have questions on this topic that are not answered by the documentation, you may wish to go on #wikimedia-tech and talk to binasher's successor (since he's gone now) or Preilly, both of whom are experts on Wikimedia and performance. If possible, be prepared to explain which queries are involved in the situation. Roan Kattouw, another MySQL optimization expert, answers questions on the wikitech-l mailing list and is sometimes available at #mediawiki.

See also[edit | edit source]

References[edit | edit source]

  1. Galperin, Eran (14 May 2009) Multiple row operations in MySQL / PHP, Techfounder.

External links[edit | edit source]

Databases Engines: MySQLOraclePostgreSQLSQLite
Technical documentation: Schema (tables) – API property associationsField prefixesPrimary key storage in other fieldsWikimedia extension tables
Configuration: SettingsSharing
Development: AccessOptimizationPolicyUpdaterExtension schema updatesPatch file
Core tables: archivecategorycategorylinkschange_tagconfigexternallinksfilearchivehitcounterimageimagelinksinterwikiiwlinksipblocksjobl10n_cachelanglinkslogginglog_searchmsg_resourcemsg_resource_linksmodule_depsobjectcacheoldimagepagepagelinkspage_propspage_restrictionsprotected_titlesquerycachequerycachetwoquerycache_inforecentchangesredirectrevisionsearchindexsitessite_statstag_summarytemplatelinkstexttranscacheupdateloguploadstashuseruser_former_groupsuser_groupsuser_newtalkuser_propertiesvalid_tagwatchlist