[H-GEN] Experiences with SpamAssassin

Greg Black gjb at gbch.net
Fri Nov 22 22:28:32 EST 2002


[ Humbug *General* list - semi-serious discussions about Humbug and     ]
[ Unix-related topics. Posts from non-subscribed addresses will vanish. ]

Joel Michael wrote:

| On Fri, 2002-11-22 at 17:26, Greg Black wrote:
| >     Total archive size      84516
| >     Actual spam              7918
| >     False positives           218  ( 0.28% of good messages)
| >     False negatives          1652  (20.86% of actual spam)
| >     Time to process         13:58
| 
| Actually, I'd be interested to see how the numbers above change over
| time.  Say, how much different are the scores from this month than the
| scores from the first month you have email?  Or how scores for 1998 are
| different to scores in 2002.

I'm not going to do a month-by-month analysis -- there are not
enough data points to derive meaningful conclusions from that
(and I'd have to do some actual work to extract the data
anyway).  Nor is there any point in analysing the false
positives -- there are so few that this would also be pretty
meaningless.

However, taking the known spam and the failures to identify it,
we can get some trends.  Each row of the table gives the number
of spam messages in each category for that year, followed by its
percentage of that year's total.

  Year     Correctly Identified     Not Identified     Total

  1998          244    68%            116    32%         360
  1999          223    44%            283    56%         506
  2000          268    60%            180    40%         448
  2001          846    74%            290    26%        1136
  2002         4471    80%           1134    20%        5605

  Totals       6052    75%           2003    25%        8055
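
The percentages in the table can be rechecked from the raw counts.
What follows is only a sketch in Python: the counts are copied
from the table, while the names counts and detection_rate are made
up for illustration and are not part of SpamAssassin or of the
scripts used to produce the table.

```python
# Per-year spam counts copied from the table:
# year -> (correctly identified, not identified)
counts = {
    1998: (244, 116),
    1999: (223, 283),
    2000: (268, 180),
    2001: (846, 290),
    2002: (4471, 1134),
}


def detection_rate(caught, missed):
    """Share of spam that was caught, as a whole-number percentage."""
    return round(100 * caught / (caught + missed))


# One line per year: year, total spam, percentage correctly identified.
for year, (caught, missed) in counts.items():
    print(year, caught + missed, f"{detection_rate(caught, missed)}%")

# Column totals, computed the same way as the "Totals" row.
total_caught = sum(caught for caught, _ in counts.values())
total_missed = sum(missed for _, missed in counts.values())
print("Totals", total_caught + total_missed,
      f"{detection_rate(total_caught, total_missed)}%")
```

Run as is, this prints one line per year plus a totals line, which
can be compared against the table by eye.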

If we treat 1999 as an outlier, SpamAssassin has generally got
better over time at identifying spam.  Since it's a recent tool
and was presumably written to cater for the kind of spam we've
been seeing over the past couple of years, that's not a surprise
-- but it's encouraging.  However, at the 2002 rate it still
means that I'd get to see 3 or 4 spam messages a day that got
through the barrier.
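
The "3 or 4 a day" figure can be reproduced roughly from the 2002
row.  This is a back-of-the-envelope sketch that assumes the 2002
data runs from 1 January up to the date of this message; the
archive's real cutoff is not stated, so the dates and variable
names below are assumptions.

```python
from datetime import date

missed_2002 = 1134             # "Not Identified" count for 2002

# Assumed period covered by the 2002 row (not stated in the post).
start = date(2002, 1, 1)
end = date(2002, 11, 22)       # date of this message

days = (end - start).days + 1  # count both endpoints
per_day = missed_2002 / days

print(f"{per_day:.1f} missed spam messages per day over {days} days")
```

Under those assumptions the rate comes out at about 3.5 a day.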

As things stand, I'll keep using SpamAssassin for a while and
see how it goes.  Later, I'll have a look at crm114 as it seems
to be an interesting tool.  In the end, I suspect that my initial
negative reaction to TMDA when it was first announced will have
waned sufficiently for me to adopt it as the last bastion.

Part of the problem for TMDA is that quite a bit of the spam I
receive comes courtesy of mailing lists that allow anybody to
post -- I think this is stupid in the 21st century, but I don't
get to manage those lists and I'm not willing to give them up.

Greg

--
* This is list (humbug) general handled by majordomo at lists.humbug.org.au .
* Postings to this list are only accepted from subscribed addresses of
* lists 'general' or 'general-post'.  See http://www.humbug.org.au/


