Extension:DataModel
DataModel | |
---|---|
Release status | beta |
Description | DataModel allows data to be written to and read from a backend, while caching the results and providing an easy interface for developers to work with. |
MediaWiki | 1.21wmf8 |
Database changes | No |
License | GPLv2 |
Installation
To install this extension, add the following to LocalSettings.php:
require_once( "$IP/extensions/DataModel/DataModel.php" );
This extension depends on an addition to MediaWiki core, Gerrit change 25879 ("Added merge() function to BagOStuff for CAS-like functionality"), which will be part of MediaWiki version 1.21.
Technical details
DataModel
DataModel.php can be easily extended (as sample/DataModelSample.php does) for a specific implementation. Basically, these methods can then be used to get/manipulate the data:
Public methods
::get( $id, $shard ): Will return an object (e.g. DataModelSample) that represents all data for the requested id
::getList( $name, $shard = null, $offset = 0, $sort = null, $order = 'ASC' ): Will return an instance of DataModelList (implements Iterator) that contains objects (e.g. DataModelSample) for a certain "list" (more info on lists under "Static vars")
::getCount( $name, $shard = null ): Will return the number (integer) of entries in a certain list
->insert(): Will insert the data of a newly created object (e.g. DataModelSample) into the DB (and temporarily cache it)
->update(): Will update altered data of an existing object in the DB (and cache)
->delete(): Will delete the object's data from both the DB and the cache
Static vars
static $table, $idColumn, $shardColumn: These static vars should be set in the extending class (e.g. sample/DataModelSample.php) so that the DataModel code knows which table to write the data to, which unique id to save for the lists, and which value the key should be sharded over (sharding is assumed, whether or not the data is actually sharded).
static $lists, $sorts (the "lists" referred to by ::getList): Performing a selection query (e.g. "... WHERE visible = 1 ORDER BY title DESC") is painful when the data may be spread over multiple sharded DBs. Instead, a specific DataModel implementation (e.g. DataModelSample, which extends DataModel) should define "public static $lists = array( )" and "public static $sorts = array( )". The $lists array can contain multiple entries; each key is a "list name" (the first parameter to ::getList) and each value holds one or more conditions (the WHERE alternative). Sorts are defined through ::$sorts (the ORDER BY alternative).
All possible condition and sort combinations need to be hardcoded in the implementing model so that, upon saving data (->insert, ->update or ->delete), the entry can be re-evaluated against all of these conditions and sorts. This is what allows the list totals (which are expensive to fetch from the DB) to be updated and the caches to be purged (only when data has changed). If a list or sort later needs to be added or changed, you can purge all caches by running maintenance/DataModelPurgeCache.php.
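To illustrate why the conditions have to be declared up front, here is a minimal, self-contained sketch of re-evaluating an entry against every list when it is saved. It does not use the extension's code: the class name ListCounterSketch, the "column => required value" condition format, and the in-memory $counts array are assumptions made purely for illustration. The real format of $lists is whatever sample/DataModelSample.php defines.

```php
<?php
// Hypothetical sketch, not the extension's implementation.
class ListCounterSketch {
    // list name => conditions (column => required value); this format is assumed
    public static $lists = array(
        'all' => array(),
        'visible' => array( 'ds_visible' => true ),
        'hidden' => array( 'ds_visible' => false ),
    );

    // list name => cached total (stands in for the cached getCount values)
    public static $counts = array( 'all' => 0, 'visible' => 0, 'hidden' => 0 );

    // Does a row satisfy every condition of the named list?
    public static function matches( array $row, $list ) {
        foreach ( self::$lists[$list] as $column => $value ) {
            if ( !array_key_exists( $column, $row ) || $row[$column] !== $value ) {
                return false;
            }
        }
        return true;
    }

    // Adjust totals when a row changes from $old to $new (null means "absent").
    public static function onSave( $old, $new ) {
        foreach ( array_keys( self::$lists ) as $list ) {
            $was = $old !== null && self::matches( $old, $list );
            $is = $new !== null && self::matches( $new, $list );
            if ( $was !== $is ) {
                self::$counts[$list] += $is ? 1 : -1;
            }
        }
    }
}

// Insert a visible row, then update it to hidden.
$row = array( 'ds_id' => 1, 'ds_visible' => true );
ListCounterSketch::onSave( null, $row );
$hidden = array( 'ds_id' => 1, 'ds_visible' => false );
ListCounterSketch::onSave( $row, $hidden );
```

Because every list is known in advance, a save only needs to compare the old and new row against each list; no "SELECT COUNT(*)" query is required to keep the totals current.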
When a list's data is not in cache and needs to query the database, it will pre-fetch more data and cache it right away, reducing potential follow-up queries.
Cache
Unless a cache is explicitly set through DataModel::setCache(), $wgMemc will be used as cache. If $wgMemc is an EmptyBagOStuff, a HashBagOStuff will be created for DataModel instead (HashBagOStuff is supported on any system and makes sure that ::get() calls following a ::getList() do not fetch the same data twice within one request).
Data to be cached, per "type":
get: All data for ::get() is cached for an hour every time it is requested. This ensures that "popular" entries are in cache pretty much all the time, while old and neglected entries do not occupy a cache slot when they are only rarely requested.
getList: All data for ::getList() is cached for an hour, for the same reason as "get" (e.g. for AFT, the list of "deleted" items is only visible to oversighters, who will probably not view it all that often, let alone its latest 50 entries).
getListValidity: Lists are saved to cache in several smaller chunks (for an hour). If data is added to or updated in a list, its cache should be purged. Instead of looping over and purging all chunks, the purge date is saved, and a chunk's cache is not purged until it is actually requested.
getCount: For all list/shard combinations, the number of matching entries (integer) is stored indefinitely. It's just an int, so it won't consume much memory, and the alternative is a slightly more expensive "SELECT COUNT(*) FROM ..." query.
generateId: Briefly saves a value when generating a new id, to ensure that the id is unique and that no two identical ids are generated at the same time.
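The lazy purge described under getListValidity can be sketched as follows. This is an illustration of the general technique only, not the extension's code: the class ChunkCacheSketch, its method names, the key layout and the explicit integer timestamps are all hypothetical, and a plain PHP array stands in for the real cache.

```php
<?php
// Hypothetical sketch of lazy chunk invalidation; not the extension's code.
class ChunkCacheSketch {
    private $store = array();

    // Cache one chunk of a list, remembering when it was written.
    public function setChunk( $list, $offset, array $rows, $now ) {
        $this->store["chunk:$list:$offset"] = array( 'time' => $now, 'rows' => $rows );
    }

    // "Purge" a list by recording a single timestamp instead of deleting chunks.
    public function invalidate( $list, $now ) {
        $this->store["validity:$list"] = $now;
    }

    // Return the cached rows, or null if the chunk is missing or stale.
    public function getChunk( $list, $offset ) {
        $key = "chunk:$list:$offset";
        if ( !isset( $this->store[$key] ) ) {
            return null;
        }
        $validFrom = isset( $this->store["validity:$list"] )
            ? $this->store["validity:$list"]
            : 0;
        if ( $this->store[$key]['time'] <= $validFrom ) {
            // The stale chunk is only dropped now, when it is requested.
            unset( $this->store[$key] );
            return null;
        }
        return $this->store[$key]['rows'];
    }
}

$cache = new ChunkCacheSketch();
$cache->setChunk( 'visible', 0, array( 1, 2, 3 ), 100 );
$fresh = $cache->getChunk( 'visible', 0 );  // written after any purge: served from cache
$cache->invalidate( 'visible', 200 );
$stale = $cache->getChunk( 'visible', 0 );  // written before the purge: treated as a miss
$cache->setChunk( 'visible', 0, array( 1, 2, 3, 4 ), 300 );
$again = $cache->getChunk( 'visible', 0 );  // re-cached after the purge: served again
```

The point of the design is that a purge costs one cache write regardless of how many chunks a list has; the cost of discarding each chunk is deferred to the next read of that chunk.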
DataModelBackend
DataModel has a method ::getBackend() that will return an object (depending on $wgDataModelBackendClass) that extends from DataModelBackend.php (e.g. DataModelBackend.LBFactory.php). *.LBFactory.php is a single-DB implementation, using plain old wfGetDB(). Other backends could be created at will: e.g. one that supports sharded DBs and fetches all data from multiple servers, or even one that fetches data from a non-DatabaseBase source.
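The pluggable-backend idea can be sketched in isolation. Everything below is a hypothetical stand-in: BackendSketch, ArrayBackendSketch and getBackendSketch() are invented for this example, the get( $id, $shard ) signature is only assumed to mirror the public method of the same name, and $backendClass merely plays the role of $wgDataModelBackendClass.

```php
<?php
// Hypothetical stand-ins; the real base class lives in DataModelBackend.php.
abstract class BackendSketch {
    abstract public function get( $id, $shard );
}

// A trivial in-memory backend, standing in for a non-database source.
class ArrayBackendSketch extends BackendSketch {
    private $rows = array(
        1 => array( 'ds_id' => 1, 'ds_title' => 'Example' ),
    );

    public function get( $id, $shard ) {
        return isset( $this->rows[$id] ) ? $this->rows[$id] : null;
    }
}

// Instantiate the configured class, refusing anything that is not a backend.
function getBackendSketch( $class ) {
    if ( !is_subclass_of( $class, 'BackendSketch' ) ) {
        throw new InvalidArgumentException( "$class does not extend BackendSketch" );
    }
    return new $class();
}

$backendClass = 'ArrayBackendSketch'; // plays the role of $wgDataModelBackendClass
$backend = getBackendSketch( $backendClass );
$row = $backend->get( 1, null );
```

Resolving the class from a configuration string keeps the model code unaware of where the data actually lives, which is what makes a sharded or non-database backend a drop-in replacement.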
DataModelList
This is a simple Iterator (extends from FakeResultWrapper) that allows all entries from the requested list to be traversed easily. It also adds two new methods:
->hasMore(): Will return true/false as an indicator of whether or not there is additional data to be fetched after the requested chunk
->nextoffset(): Will return the value that should be used as $offset to fetch the next chunk
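A typical way to consume such a list is to keep fetching chunks until hasMore() returns false. The sketch below shows that loop against a hypothetical stand-in: ListSketch and fetchChunkSketch() are invented for this example (built on PHP's ArrayIterator rather than FakeResultWrapper), and it assumes the offset is a plain integer position, which the real nextoffset() value need not be.

```php
<?php
// Hypothetical stand-in for DataModelList; only the paging contract is modelled.
class ListSketch extends ArrayIterator {
    private $more;
    private $next;

    public function __construct( array $rows, $more, $next ) {
        parent::__construct( $rows );
        $this->more = $more;
        $this->next = $next;
    }

    // Is there additional data after this chunk?
    public function hasMore() {
        return $this->more;
    }

    // The $offset to pass when fetching the next chunk.
    public function nextoffset() {
        return $this->next;
    }
}

// Hypothetical data source: returns $limit rows starting at $offset.
function fetchChunkSketch( array $all, $offset, $limit ) {
    $rows = array_slice( $all, $offset, $limit );
    $more = $offset + $limit < count( $all );
    return new ListSketch( $rows, $more, $offset + $limit );
}

$all = array( 'a', 'b', 'c', 'd', 'e' );
$seen = array();
$offset = 0;
do {
    $list = fetchChunkSketch( $all, $offset, 2 );
    foreach ( $list as $entry ) {
        $seen[] = $entry;
    }
    $offset = $list->nextoffset();
} while ( $list->hasMore() );
```

With the real extension, the call to fetchChunkSketch() would correspond to ::getList() with the current $offset.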
DataModelPurgeCache
If cached data becomes corrupted, or when the $lists or $sorts of an implemented model change, running this maintenance script will purge all existing caches, which forces all data to be read from the DB and re-cached. The script expects the parameter --model, which is the classname whose caches you're looking to purge. E.g.: php DataModelPurgeCache.php --model=DataModelSample
DataModelSample
This is a really basic example implementation of DataModel (it is also used by the unit tests that accompany the extension). Once a model (like DataModelSample) has been created, it can be accessed like:
// create a new entry and insert it
$sample = new DataModelSample;
$sample->ds_shard = 1;
$sample->ds_title = "This is an example entry";
$sample->ds_email = "[email protected]";
$sample->ds_visible = true;
$sample->insert();

// fetch the first chunk of visible entries (see: $lists['visible'])
$offset = 0;
$visibleEntries = DataModelSample::getList( 'visible', null, $offset, 'title', 'ASC' );
foreach ( $visibleEntries as $entry ) {
    var_dump( $entry->ds_title );

    // update the data, set all of them to hidden
    $entry->ds_visible = false;
    $entry->update();
}

// get the number of hidden entries
$amountHidden = DataModelSample::getCount( 'hidden', null );
Feedback
Open tickets in Wikimedia's Bugzilla under the "DataModel" component in the "MediaWiki extensions" product.
This extension is being used on one or more Wikimedia projects. This probably means that the extension is stable and works well enough to be used by such high-traffic websites. Look for this extension's name in Wikimedia's CommonSettings.php and InitialiseSettings.php configuration files to see where it's installed. A full list of the extensions installed on a particular wiki can be seen on the wiki's Special:Version page.