[H-SASIG] [SysAdmin] #12: http://lists.humbug.org.au/mailman/admindb/mailman returns 500 Internal Server Error

HUMBUG System Administrators sasig at lists.humbug.org.au
Sat May 1 21:26:13 EDT 2010


#12: http://lists.humbug.org.au/mailman/admindb/mailman returns 500 Internal
Server Error
---------------------------+------------------------------------------------
  Reporter:  russell       |       Owner:  russell 
      Type:  defect        |      Status:  closed  
  Priority:  major         |   Milestone:  Sysadmin
 Component:  mailing-list  |     Version:  NA      
Resolution:  fixed         |    Keywords:          
---------------------------+------------------------------------------------
Changes (by russell):

  * status:  assigned => closed
  * resolution:  => fixed


Comment:

 Fixed.

 The chronology was as follows:

 1.  I recall someone getting sick of Mailman spam notifications for the
 general list.  It seems the solution was to send them to /dev/null.

 2.  Mailman still carefully filed away all spam messages in its pending
 queue, waiting for someone who cared to look at them.

 3.  Three years later, someone who cared came along (me).  The pending
 queue was held in a single Python pickle, which by this stage contained
 some 16,000 messages.  When I attempted to look at the queue, Mailman died.

 4.  It turns out this initial failure was caused by Python running out of
 memory when trying to process this pickle.  (This much I had guessed.)

 5.  Mailman died so horribly that it didn't clean up its locks.

 6.  The next time anyone went through the web interface, Mailman went into
 what was effectively an infinite loop, waiting to acquire the lock.
 Unfortunately, that loop contains a memory leak, so it again ran out of
 memory and to all appearances looked like the original problem.

 7.  It turns out that our VM at that point (around 10 PM last night) ran
 out of memory.  The Linux OOM killer fired up, but unfortunately chose the
 wrong process to kill.  The process it did choose refused to die, so the
 OOM killer looped indefinitely trying to kill it.

 8.  As a consequence of that, we lost control of the VM.

 9.  Stephen Thorn wrestled control of the VM back for us this morning.

 10. I have now removed /var/lib/mailman/lists/mailman/request.pck.  This
 was the original trigger for the problem.

 11. I have restored normal processing of spam messages.  All notifications
 are now being sent to list-bounces at humbug.org.au, which for now is aliased
 to president at humbug.org.au.
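
 Steps 5 and 6 hinge on a lock that outlived the process holding it.  The
 sketch below is NOT Mailman's own locking code (Mailman has its own lock
 module, which is not reproduced here).  It is a hypothetical, minimal
 illustration assuming a lock file that records the holder's PID: a waiter
 that finds the lock taken checks whether that PID is still running, breaks
 the lock if it is not, and gives up after a timeout instead of looping
 forever.  The names acquire_lock and release_lock and the timeout values
 are assumptions for illustration only.

```python
import os
import time


def _pid_alive(pid: int) -> bool:
    """Return True if a process with this PID appears to exist."""
    if pid <= 0:
        return False  # kill(0, ...) would target our process group; never valid here
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_lock(path: str, timeout: float = 10.0, poll: float = 0.1) -> bool:
    """Take a PID lock file at `path` (hypothetical scheme).

    Returns True once the lock is held, or False after `timeout` seconds.
    A lock whose recorded holder is no longer running is treated as stale
    and removed, rather than waited on forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails atomically if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                with open(path) as f:
                    holder = int(f.read().strip() or 0)
            except (FileNotFoundError, ValueError):
                holder = 0  # vanished, empty or garbled: treated as stale
            if not _pid_alive(holder):
                try:
                    os.unlink(path)  # break the stale lock, then retry
                except FileNotFoundError:
                    pass
                continue
            if time.monotonic() >= deadline:
                return False  # live holder and out of time: give up
            time.sleep(poll)
        else:
            with os.fdopen(fd, "w") as f:
                f.write(str(os.getpid()))
            return True


def release_lock(path: str) -> None:
    """Remove the lock file if it is still there."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
```

 Note that this sketch treats an empty lock file as stale, so a lock that a
 new holder has just created but not yet written its PID into could be
 broken by mistake; a production lock would need to close that window.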

 Thanks to Greg for loaning me his VM and time to help track down this
 problem.  Thanks to Stephen for being patient with me when I rang him this
 morning.

-- 
Ticket URL: <http://trac.humbug.org.au/ticket/12#comment:6>
SysAdmin <http://trac.humbug.org.au>
HUMBUG System Administration

