Simple spam defense for Django comments
The comments application included with Django 1.0 fights spam in several ways, but that wasn't enough here. Fortunately, it also includes a hook that lets you do additional validation of comments before they appear on your site.
Using that hook, I whipped up a quick checker that runs each comment through a number of validators and, depending on the cumulative score, either marks it for review or rejects it outright.
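To make the scoring concrete, here is a minimal sketch of that decision on its own, separate from Django. The two threshold values are the defaults used in the module listed further down; the function name `disposition` and its return strings are made up for this illustration.

```python
# Default thresholds, matching the fallbacks in the module's code.
PUBLIC_THRESHOLD = 0.2
REJECT_THRESHOLD = 0.6


def disposition(scores, public_threshold=PUBLIC_THRESHOLD,
                reject_threshold=REJECT_THRESHOLD):
    """Return 'reject', 'review' or 'publish' for a list of validator scores.

    The name and the return strings are hypothetical; the real module
    returns False or flips comment.is_public instead.
    """
    total = sum(scores)  # scores from all validators are simply added up
    if total > reject_threshold:
        return 'reject'
    if total > public_threshold:
        return 'review'
    return 'publish'
```

With the default link counter, each link is worth 0.1, so a comment with three links would land in 'review' and one with seven links would be rejected. Both comparisons are strict, so a total of exactly 0.2 is still published.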
Most of my personal spambot friends are using the same template, which includes more links than any actual reader is likely to include in a comment. The simple link counter included with this has solved my problem (for now), but it should be obvious how you could write more complicated validators to do additional scoring.
To use this, just download the attached file, unzip it somewhere on your PYTHONPATH, and add 'fairview.comments' to your Django INSTALLED_APPS setting.
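For reference, the relevant part of a settings file might look roughly like the fragment below. Only the 'fairview.comments' entry is required; the three COMMENT_* settings are optional and are shown here with the same values the module falls back to. The commented-out 'myproject.spam.validate_keywords' path is a hypothetical example, not something that ships with this code.

```python
# settings.py (fragment)
INSTALLED_APPS = (
    # ... your other apps ...
    'django.contrib.comments',
    'fairview.comments',
)

# Optional: dotted paths to the callables used for scoring comments.
COMMENT_VALIDATORS = (
    'fairview.comments.validate_link_limit',
    # 'myproject.spam.validate_keywords',  # hypothetical extra validator
)

# Optional: a total score above the first threshold hides the comment,
# a total above the second rejects it outright.
COMMENT_SPAM_PUBLIC_THRESHOLD = 0.2
COMMENT_SPAM_REJECT_THRESHOLD = 0.6
```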
"""
Comment validation for Django 1.0. By default, it runs one validator,
which derives its score from the number of links in a comment.
To add more validators, configure COMMENT_VALIDATORS in your Django settings,
adding the path to any callable you want to use for scoring comments. A
validator function should return a score between 0 and 1.
The scores from all validators are totalled to determine whether a comment is
spam. There are two thresholds determining the disposition of a comment: if
the comment scores above COMMENT_SPAM_PUBLIC_THRESHOLD, its is_public
attribute will be set to False. If it scores above
COMMENT_SPAM_REJECT_THRESHOLD, the request will be rejected outright.
"""
import logging
import re

from django.conf import settings
from django.contrib.comments.models import Comment
from django.contrib.comments.signals import comment_will_be_posted
from django.core.exceptions import ImproperlyConfigured

DEFAULT_COMMENT_VALIDATORS = (
    'fairview.comments.validate_link_limit',
)

LINK_RE = re.compile('https?://')


def validate_link_limit(comment):
    # Each link adds 0.1 to the score, capped at 1.0 for ten or more links.
    link_count = len(LINK_RE.findall(comment.comment))
    score = min(link_count, 10.0) / 10.0
    return score


def validate_comment(sender, comment, request, **kwargs):
    logger = logging.getLogger('fairview.comments.validate_comment')

    # Resolve each dotted path from the settings to a callable validator.
    validators = []
    for path in getattr(settings, 'COMMENT_VALIDATORS', DEFAULT_COMMENT_VALIDATORS):
        i = path.rfind('.')
        module, attr = path[:i], path[i+1:]
        try:
            mod = __import__(module, {}, {}, [attr])
        except ImportError, e:
            raise ImproperlyConfigured('Error importing comment validation module %s: "%s"' % (module, e))
        try:
            func = getattr(mod, attr)
        except AttributeError:
            raise ImproperlyConfigured('Module "%s" does not define a "%s" callable comment validator' % (module, attr))
        validators.append(func)

    # Total the scores from all validators.
    score = 0.0
    for validator in validators:
        score += validator(comment)

    # Returning False from this signal receiver discards the comment.
    if score > getattr(settings, 'COMMENT_SPAM_REJECT_THRESHOLD', 0.6):
        logger.info("""Comment spam score is %s; rejecting it.""" % score)
        logger.info("""Rejected comment info: IP Address: "%(ip_address)s" Name: "%(user_name)s" Email: "%(user_email)s" URL: "%(user_url)s" Comment: "%(comment)s" """ % comment.__dict__)
        return False

    # Above the lower threshold, keep the comment but hold it for review.
    if score > getattr(settings, 'COMMENT_SPAM_PUBLIC_THRESHOLD', 0.2):
        logger.info("""Comment spam score is %s; marking it non-public.""" % score)
        comment.is_public = False
    return True


comment_will_be_posted.connect(validate_comment, sender=Comment)
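
As an illustration of the "more complicated validators" mentioned above, here is a rough sketch of a keyword-based one. It follows the same contract as validate_link_limit: it takes the comment object and returns a score between 0 and 1. The names `validate_keywords`, `SPAM_WORDS` and `FakeComment`, the word list, and the 0.25-per-hit weighting are all assumptions made up for this example.

```python
import re

# Hypothetical word list; tune it to the spam your own site attracts.
SPAM_WORDS = ('viagra', 'casino', 'payday loan')
WORD_RE = re.compile('|'.join(re.escape(w) for w in SPAM_WORDS), re.IGNORECASE)


def validate_keywords(comment):
    """Score 0.25 per spam-word hit in the comment body, capped at 1.0."""
    hits = len(WORD_RE.findall(comment.comment))
    return min(hits * 0.25, 1.0)


class FakeComment(object):
    """Stand-in for the Django Comment model, for trying the validator out."""

    def __init__(self, text):
        self.comment = text
```

To enable something like this, you would add its dotted path (say, 'myproject.spam.validate_keywords', a hypothetical location) to COMMENT_VALIDATORS alongside the link counter. Because the scores are added together, a comment with two links and one spam word would total 0.45 and be held for review under the default thresholds.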
Update: 18 May 2009
After several requests and way too long, I've set this up as a Bitbucket project. There you'll find better instructions, the full source in both downloadable form and a Mercurial repository, and an issue tracker. The code itself has had numerous improvements since this was posted, too, including support for Akismet or TypePad AntiSpam.
Comments (7)
http://sciyoshi.com/blog/2008/aug/27/using-akismet-djangos-new-comments-framework/
Thanks for sharing.
Currently I'm using James Bennett's comment_utils app, but as the functionality of this package has been greatly reduced with Django 1.0, I'm looking forward to implementing my own comment moderation utilities. Your code shows great promise. :)