[H-SASIG] rdiff-image-cron /etc/rdiff-image/rdiff-image.conf FAILED!

Russell Stuart russell-humbug at stuart.id.au
Thu Apr 14 22:06:04 EDT 2011


On Fri, 2011-04-15 at 10:16 +1000, Daniel Devine wrote: 
> On Thu, 14 Apr 2011 17:44:03 -0400 (EDT), root at conference.osdc.com.au wrote:
> >   rdiff-image-cron: Backup completed, but S3 modified by someone else.
> 
>  Backup seems to have worked, but what would cause "S3 modified by 
>  someone else"? What exactly does that mean? Could that mean that it sees 
>  unexpected (or missing) files in the bucket due to a broken backup?

After rdiff-image completes a backup it records a list of every file on
S3, along with its MD5 hash. The next time it connects to S3, the first
thing it does is verify that the list still accurately describes what is
on S3. If it doesn't, we get the email you are responding to.
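A minimal sketch of that kind of check in Python, assuming the saved list is a simple mapping from S3 key to MD5 hex digest. The `Manifest` type and the function names are hypothetical, not rdiff-image's actual code, and fetching the current listing from S3 is left out. Any key that appears, disappears, or changes hash counts as S3 having been "modified by someone else":

```python
import hashlib
from typing import Dict, List

# Hypothetical manifest: S3 object key -> MD5 hex digest of its contents.
Manifest = Dict[str, str]


def build_manifest(objects: Dict[str, bytes]) -> Manifest:
    """Record the MD5 of every object, as would be done after a backup."""
    return {key: hashlib.md5(body).hexdigest() for key, body in objects.items()}


def diff_manifest(recorded: Manifest, current: Manifest) -> Dict[str, List[str]]:
    """Compare the manifest saved after the last run with what S3 reports now."""
    recorded_keys, current_keys = set(recorded), set(current)
    return {
        # Keys S3 has that we don't expect, e.g. a deleted file reappearing.
        "added": sorted(current_keys - recorded_keys),
        # Keys we expect that S3 no longer has.
        "missing": sorted(recorded_keys - current_keys),
        # Keys present in both whose MD5 no longer matches.
        "changed": sorted(
            k for k in recorded_keys & current_keys if recorded[k] != current[k]
        ),
    }


def modified_by_someone_else(recorded: Manifest, current: Manifest) -> bool:
    """True if S3 no longer matches the recorded manifest in any way."""
    return any(diff_manifest(recorded, current).values())


if __name__ == "__main__":
    before = build_manifest({"vol/0001": b"block one", "vol/0002": b"block two"})
    # Simulate a file that the last run deleted reappearing on S3.
    after = {**before, "vol/stale": hashlib.md5(b"old block").hexdigest()}
    print(diff_manifest(before, after))
    print(modified_by_someone_else(before, after))
```

Reporting the three kinds of difference separately, rather than a single yes/no, makes it easier to tell a reappearing file from a lost or corrupted one.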

Rdiff-image writes an audit trail of every action it takes on S3. I
provided this so you could double-check S3's charges, but it turned out
to be useful for debugging. I added the check described above when I
noticed rdiff-image doing odd things, such as deleting a file and then
having to delete it again on the next run.
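The symptom that prompted the check can be spotted mechanically in an audit trail. A minimal sketch, assuming a hypothetical trail of (run number, action, key) records; rdiff-image's real log format is not described here. It flags any key that was deleted in one run and deleted again in a later run with no upload of that key in between, which is what a reappearing file looks like:

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical audit record: (run number, action, S3 key).
AuditRecord = Tuple[int, str, str]


def repeated_deletes(trail: Iterable[AuditRecord]) -> List[Tuple[str, int, int]]:
    """Return (key, earlier run, later run) for each key deleted twice without a PUT between."""
    deleted_in: Dict[str, int] = {}  # key -> run that last deleted it, if not re-uploaded since
    anomalies: List[Tuple[str, int, int]] = []
    for run, action, key in trail:
        if action == "PUT":
            # A fresh upload legitimately brings the key back, so forget the delete.
            deleted_in.pop(key, None)
        elif action == "DELETE":
            prev = deleted_in.get(key)
            # A retry within the same run is not suspicious; a later run is.
            if prev is not None and prev != run:
                anomalies.append((key, prev, run))
            deleted_in[key] = run
    return anomalies


if __name__ == "__main__":
    trail = [
        (1, "PUT", "vol/0003"),
        (1, "DELETE", "vol/stale"),
        (2, "PUT", "vol/0004"),
        (2, "DELETE", "vol/stale"),  # the same file has to be deleted again
    ]
    print(repeated_deletes(trail))
```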

After spending hours cross-checking rdiff-image's logs against S3's
logs, I decided the "someone else" that was modifying S3 was S3 itself.
(The cross-checking was mainly to verify that my logs of what had
happened agreed with S3's idea of what had happened.)

As a wild guess, I'd say it is a consequence of the CAP Theorem.
Suffice it to say S3 is trying to provide Consistency, Availability and
Partition tolerance (CAP), and the CAP Theorem says no distributed system
can guarantee all three at once. Since that is impossible, something will
break if nodes inside S3 go down at inopportune times, and I am guessing
we are seeing that breakage as deleted files reappearing.

Interestingly, as time has gone on we have received fewer of these
emails. I think Amazon is gradually improving their cluster.
