IntroductionAt this moment, there is more information about import here: synarcher. MediaWikiMy beloved WAMP tutorial pages: Ultimate How To, in Russian. Download MediaWiki. Unzip it, e.g. to: D:\all\inetpub\mediawiki-1.16.0.enwikt.20100824 Open file httpd.conf gvim "C:\w\Apache Software Foundation\Apache2.2\conf\httpd.conf" Add the following text to httpd.conf: <Directory "D:/all/inetpub"> Order Allow,Deny Allow from All </Directory> NameVirtualHost *:80 <VirtualHost *:80> DocumentRoot D:/all/inetpub/mediawiki-1.16.0.enwikt.20100824 ServerName enwikt ErrorLog D:/all/inetpub/logs/wikt_ru_error_log.txt CustomLog D:/all/inetpub/logs/wikt_ru_log.txt common </VirtualHost>
Add the line: $wgCapitalLinks=false; at the end of file LocalSettings.php. Xml2sqlFast upload dump file (e.g. enwiktionary-20100824-pages-articles.xml) to MySQL. Errors (Redirect, DiscussionThreading)IntroThere are problems with "<redirect />", "<DiscussionThreading>", "<ns>" and "<sha1 />" in the XML file. The error message is: ./xml2sql unexpected element <redirect> ./xml2sql: parsing aborted at line 33 pos 16. SolutionStrip of the tags (Redirect and others) by using a perl script `xml2sql_helper.pl` with the following format: perl xml2sql_helper.pl in_file out_file Where in_file is enwiktionary-20100824-pages-articles.xml. Xml2sqlDownload Xml2sql. $ cd /var/lib/mediawiki-1.16.0.enwikt.20100824/ $ bzip2 -d enwiktionary-20100824-pages-articles.xml.bz2 $ xml2sql -m enwiktionary-20100824-pages-articles.xml Xml2sql generates three files: page.sql, revision.sql and text.sql MySQL$ mysql -u root -p USE enwikt20100824 mysql$ SOURCE /temp/page.sql mysql$ SOURCE /temp/text.sql mysql$ SOURCE /temp/revision.sql revision.sqlIf there are the errors (due to SOURCE revision.sql):
remove three fields (rev_len, rev_parent_id, rev_sha1), change rev_comment from TINYBLOB to BLOB, load by SOURCE, and return (add) three fields back: mysql$ ALTER TABLE revision DROP COLUMN rev_len, DROP COLUMN rev_parent_id, DROP COLUMN rev_sha1; mysql$ ALTER TABLE revision CHANGE COLUMN `rev_comment` `rev_comment` BLOB NOT NULL; mysql$ SOURCE /temp/revision.sql // restore back to square one mysql$ ALTER TABLE revision ADD COLUMN rev_len INT(10) UNSIGNED DEFAULT NULL AFTER rev_deleted, ADD COLUMN rev_parent_id INT(10) UNSIGNED DEFAULT NULL AFTER rev_len; mysql$ ALTER TABLE revision ADD COLUMN rev_sha1 VARBINARY(32) NULL AFTER rev_deleted; pagelinks and categorylinksI do not recommend "SOURCE pagelinks.sql;SOURCE categorylinks.sql;" for very big Wikipedias, e.g. English Wikipedia, see description here Synarcher (read text after "Tables pagelinks and categorylinks require special attention..."). The result database is ready. It could be used as a source for the Wiktionary parser. Let's dump it into file (just in case):
Grant privilegiesAdd the user (e.g. 'javawiki') to MySQL database. Grant privileges at database levels (e.g. 'ruwiki' database). Open MySQL command-line and run commands: mysql>CREATE USER javawiki; mysql>GRANT SELECT ON ruwiki.* TO javawiki@'%'; (with password; from any computer>GRANT SELECT PRIVILEGES ON ruwiki.* TO javawiki identified by '12345') mysql>FLUSH PRIVILEGES; Next stepSee alsoLinks
|