Extension:NotEvil

From MediaWiki.org
NotEvil

Release status: unknown

Implementation: $wgFilterCallback
Description: Blocks any edit which contains a URL, unless the user is on the Not Evil list.
Author(s): James Paige (talk)
License: Public domain
Download: no link provided

This extension (technically a $wgFilterCallback function, not a regular extension) blocks any edit that contains a URL unless the user is on the "Not Evil" list. An example mechanism for updating the Not Evil list from a protected wiki page is also provided.

The Not Evil list is simply a text file of usernames belonging to users who are allowed to make edits that include URLs.
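For illustration, a not.evil.txt file might look like this (the usernames are hypothetical examples; lines beginning with "#" are skipped as comments by the checkWhiteList function below, and names are matched case-insensitively):

```
# Contributors who are not robots
WikiSysop
Bob the Hamster
```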

This filter also attempts to deceive spammers into believing that their edits have succeeded. This means that anonymous users who make legitimate edits containing non-spammy URLs are blocked without any feedback, but in real-life usage this is an astonishingly rare edge case (you would think it would be common, but when you are only blocking edits with URLs, it isn't).

Development

This code is in use, and is highly effective, but could really use some cleanup and simplification. Help is welcome!

  • Is there a better way to detect which lines changed? The current method is slow, and I don't trust my own temp-file code.
  • Can $basePath be removed or replaced with something provided by MediaWiki? It would make configuration simpler.
Use $IP. —Emufarmers(T|C) 01:01, 30 April 2008 (UTC)
Thanks! Fixed. --Bob the Hamster 16:35, 15 February 2011 (UTC)
  • How could this filter be best implemented as a real extension, rather than just a $wgFilterCallback?
  • Would it be possible to implement NotEvilness as a real account permission?
  • The example script for refreshing the not.evil.txt file needs to be rewritten in PHP. Not everybody can run shell scripts (and even fewer have the pcregrep command that the example requires), but everybody who is running MediaWiki will be able to run PHP.
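As a starting point for that rewrite, here is a rough PHP sketch that mirrors the shell script in the "Generating not.evil.txt" section below: fetch the protected wiki page, extract the contents of its first <pre> block, check for the sanity marker, and atomically replace the whitelist file. The URL, file path, and sanity string are placeholders to adapt, and file_get_contents() on a URL assumes allow_url_fopen is enabled; treat this as an untested sketch, not the official script.

```php
<?php
// Extract the text between the first <pre> and </pre> tags,
// or return false if no <pre> block is found.
function extractPre($html) {
  if (preg_match('/<pre>(.*?)<\/pre>/s', $html, $m)) {
    return trim($m[1]);
  }
  return false;
}

// Fetch the wiki page, pull out the whitelist, sanity check it,
// and atomically replace the existing not.evil.txt on success.
function refreshNotEvil($url, $file, $sanity) {
  $html = file_get_contents($url);
  if ($html === false) return false;
  $list = extractPre($html);
  // refuse to install the new list unless the sanity marker is present
  if ($list === false || strpos($list, $sanity) === false) return false;
  // write to a temp file, then rename for an atomic replace
  $tmp = $file . '.new';
  if (file_put_contents($tmp, $list . "\n") === false) return false;
  return rename($tmp, $file);
}

// Example with placeholder values:
// refreshNotEvil('http://example.com/~username/wiki/index.php/Not_evil',
//                '/home/username/public_html/wiki/not.evil.txt',
//                'This is a list of wiki contributors who are not robots');
```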

Cleaned-up code

Update: the latest version (below) has support for internationalised domain name detection.

Here's some help: I've cleaned up the code and separated it into two extensions, "TrustedLinks" and "UntrustedLinksLogger". You can check out the files here:

I'll write more on the talk page. --Laird (talk) 23:12, 9 July 2012 (UTC)

Code

<?php
///////////////////////////////////////////////////////////////////////
// Silent spam blocker for MediaWiki. Blocks spam using MediaWiki's
// $wgFilterCallback and then attempts to deceive the spammer into
// believing that their edit was posted successfully.
//
// USAGE: In LocalSettings.php:
//
//    require_once('mediawiki-spamcallback.php');
//    $wgFilterCallback = 'spamCallBack';
//
// This code is public domain. You may use and modify it in any way.
///////////////////////////////////////////////////////////////////////
 
// these files will be in the main mediawiki installation folder.
// make sure that spammer.log is writeable
 
$spamNOTEVIL   = $IP . '/not.evil.txt';
$spamLOG       = $IP . '/spammer.log';
 
///////////////////////////////////////////////////////////////////////
function checkWhiteList($filename,$who){
  // The not evil file is a list of valid usernames
  if($handle=fopen($filename,'r')){
    while (!feof($handle)) {
      $line = trim(fgets($handle));
      if($line and $line[0] != '#'){
        if(strcasecmp($line,$who) == 0){
          fclose($handle);
          return true;
        }
      }
    }
    fclose($handle);
  }
  return false;
}
 
///////////////////////////////////////////////////////////////////////
function spamCallBack($title, $body, $section){
  global $spamNOTEVIL;
  global $spamLOG;
 
  global $wgEmergencyContact;
  global $wgSitename;
  global $wgOut;
  global $wgParser;
  global $wgUser;
 
  $block = false;
  $do_filter = true;
  $who = $wgUser->mName;
 
  if(in_array('sysop',$wgUser->mRights)){
    //no filtering for sysops
    $do_filter = false;
  }
 
  if(checkWhiteList($spamNOTEVIL,$who)){
    $do_filter = false;
  }
 
  if($do_filter){
 
    //Create a diff, for better filtering  
    $old_page = new Article($title);
    $old = $old_page->fetchContent();
    $diff = getDiff($old,$body);
    $diff = implode("\n",$diff);
    unset($old_page);
 
    if (!$block){
      if(preg_match('/https?:\/\//i',$diff)){
        $reason = 'direct links are forbidden';
        $block = true;
      }
    }
  }
 
  if($block){
    // log the spam attempt
    $ip = $_SERVER['REMOTE_ADDR'];
    $ip_name = gethostbyaddr($ip);
    $log_name = $who;
    if($who != $ip) $log_name .= " ".$ip;
    if($ip != $ip_name) $log_name .= " ".$ip_name;
    if (is_writable($spamLOG)){
      if($fh = fopen($spamLOG,'a')){
        fwrite($fh, sprintf("%s\t%s\t%s\t%s\n",
                    date('Y-m-d H:i:s'),
                    $log_name,
                    $title->mTextform,
                    $reason));
        fclose($fh);
      }
    }
 
    // alert the administrator of the spam attempt.
    mail($wgEmergencyContact,
         sprintf('%s %s',$wgSitename,$title->mTextform),
         sprintf("spam attempt blocked from \"%s\"\nReason: %s\n\n%s",
                 $log_name,$reason,$diff),
         sprintf('From: %s',$wgEmergencyContact));
 
    // attempt to deceive the spammer into thinking their edit succeeded
    $parserOptions = ParserOptions::newFromUser( $wgUser );
    $parserOutput = $wgParser->parse( $body, $title, $parserOptions );
    $deceitHTML = $parserOutput->mText;
    $wgOut->addHTML($deceitHTML);
    $wgOut->addHTML( "<br style=\"clear:both;\" />\n" );
    return true;
  }
  return false;
}
 
/**
* Get a diff, as an array of changed lines.
* Returns false on error
*/
function getDiff($old, $new) {
  // use tempnam() to avoid predictable (race-prone) temp filenames
  $o = fopen($oldfile = tempnam('/tmp', 'wiki.old'), "w");
  $n = fopen($newfile = tempnam('/tmp', 'wiki.new'), "w");
 
  fwrite($o,$old);
  fwrite($n,$new);
 
  fclose($o);
  fclose($n);
 
  $res = shell_exec('diff ' . escapeshellarg($oldfile) . ' ' . escapeshellarg($newfile));
 
  unlink($oldfile);
  unlink($newfile);
 
  $lines = explode("\n",$res);

  $diff = array();

  // lines added in the new revision are marked with "> " by diff
  foreach($lines as $line) {
    if($line !== '' && $line[0] == ">") {
        $diff[] = substr($line,2);
    }
  }
  return $diff;
}

Generating not.evil.txt

You can use a cron job like this to update the not.evil.txt file once an hour.

#!/bin/sh
 
NOTEVILFILE="/home/username/public_html/wiki/not.evil.txt"
NOTEVILURL="http://example.com/~username/wiki/index.php/Not_evil"
NOTEVILSANE="This is a list of wiki contributors who are not robots"
 
wget -O - --quiet ${NOTEVILURL} \
 | pcregrep -M "(?s)<pre>.*</pre>" \
 | pcregrep -v "</?pre>" \
 > "${NOTEVILFILE}".new
 
SANITY=`grep "${NOTEVILSANE}" "${NOTEVILFILE}".new`
 
if [ "$SANITY" ] ; then
  mv "${NOTEVILFILE}".new "${NOTEVILFILE}"
else
  echo "ERROR: not evil sanity check failed!"
  cat "${NOTEVILFILE}".new
  rm "${NOTEVILFILE}".new
fi
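To run the script once an hour, a crontab entry along these lines would do it (the script path is a placeholder for wherever you save the script above):

```
0 * * * * /home/username/bin/update-notevil.sh
```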