Re: Postfix multi-queue support
Subject: Re: Postfix multi-queue support
From: James Youngman (jay at gnu.org)
Date: Mon Jan 03 2000 - 15:22:13 CST
wietse at porcupine.org (Wietse Venema) writes:
> If you want to start a discussion on multiple queues, that is fine
> by me. Issues to consider:
>
> - One message arrives that must end up in multiple queues.
> Unfortunately, the system crashes after Postfix has inserted the
> message into some queues but before it has finished updating all
> queues.
I'll speak up, despite not knowing anything of importance about the
file formats used for the Postfix queue files. I've solved this
problem before (and tested the solution) in another existence...
Here, I'm reconstructing my earlier design from memory. The actual
design was thoroughly tested, but this is just a description from
memory. It's quite possible that I've not remembered it correctly.
A fairly generic case is the copying of a file, with optional
modification, into two other directories. Let's assume that they're
all on the same filesystem (are we willing to assume this for the
purposes of the larger discussion?)
Let us assume that all state information needed for the program to
decide to perform the move is contained within the source file
itself, or in external files which are "never" moved (e.g. map
files). The latter external information is assumed to remain outside
the scope of the discussion.
Suppose that the file starts in directory /S, and is to be copied into
directories /A and /B.
Suppose that the source file is called /S/sXXX.
C1. We start by atomically creating a file S/tXXX.
C1.1 Into this file we write the values of A and B, and then flush and
close it.
C1.2 S/tXXX is then (non-atomically) renamed to S/mXXX.
C2. We then copy the source file (modifying if necessary) to /A/dXXX and
/B/dXXX. This operation is not atomic and (let us suppose) the system
may crash leaving these files containing a random amount of undefined
data.
C3. We flush, sync and close /A/dXXX and /B/dXXX. In the failure case, we
delete the destination files (and log errors, etc.).
C4. At this stage, the destination files are correct.
C5. We then (atomically) delete S/sXXX.
C6. We then (atomically) delete S/mXXX.
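Steps C1 to C6 can be sketched roughly as follows. This is only an
illustration, not the original implementation: the names fan_out and
fsync_dir are hypothetical, the "m" file format (one destination directory
per line) is an assumption, and removing the "m" file in the failure case
is an addition that the steps above do not spell out.

```python
import os
import shutil

def fsync_dir(path):
    """Flush a directory's entries to disk (POSIX only; an assumption here)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def fan_out(src_dir, name, dest_dirs):
    """Copy src_dir/s<name> to d<name> in every directory of dest_dirs."""
    s_path = os.path.join(src_dir, "s" + name)
    t_path = os.path.join(src_dir, "t" + name)
    m_path = os.path.join(src_dir, "m" + name)

    # C1: create the "t" file; O_EXCL makes this fail if it already exists.
    fd = os.open(t_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as t:
        # C1.1: record the destinations (assumed format: one per line).
        t.write("\n".join(dest_dirs) + "\n")
        t.flush()
        os.fsync(t.fileno())
    # C1.2: from here on the "m" file marks a copy in progress.
    os.rename(t_path, m_path)
    fsync_dir(src_dir)

    written = []
    try:
        for d in dest_dirs:
            d_path = os.path.join(d, "d" + name)
            written.append(d_path)
            # C2: copy the data (a real program might modify it here).
            with open(s_path, "rb") as src, open(d_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
                # C3: flush and sync before the file is closed.
                dst.flush()
                os.fsync(dst.fileno())
            fsync_dir(d)
    except OSError:
        # C3 failure case: delete the destination files, keep the source.
        for p in written:
            if os.path.exists(p):
                os.unlink(p)
        os.unlink(m_path)  # assumption: nothing is left to roll back
        raise

    # C4: the destination files are correct. C5 and C6 follow in order.
    os.unlink(s_path)
    fsync_dir(src_dir)
    os.unlink(m_path)
    fsync_dir(src_dir)
```

A call such as fan_out("/S", "XXX", ["/A", "/B"]) would correspond to the
example in the text. The ordering of the two final unlink() calls is what
lets a recovery process tell a finished copy from an unfinished one.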
At this point, the operation is complete, but we need to consider how
information is preserved in the event of a crash. For this we need a
"recovery" process which we assume to know all the possible values of
S. The recovery process does the following in each possible
directory S:-
R1. "Loop 1": Loop through all files in the directory (using readdir).
When all files have been exhausted, continue at step R6.
R2. If the filename returned by readdir() cannot be stat()ed, ignore
it. This happens because step R2.1.1 may delete a file which has
not yet been returned by readdir(), but which has already been
obtained from the kernel via getdents() or similar.
For each stat()able file, pluck out the initial character.
If that character is not "m", go to step R4.
R2.1. (Looking at "m" file) Look for S/sXXX. If S/sXXX doesn't exist,
go to step R3.
R2.1.1 If the file S/tXXX exists, we experienced a crash at step C1.2.
No destination files will exist, so delete the "t" and "m"
files, leaving the "s" file in place. Continue at step R2.
Otherwise, continue at the following step.
R2.1.2 For each directory listed in S/mXXX, delete any instance of
dXXX in that directory. Also delete the file S/mXXX.
This cleans up after any crash between C2 and C5.
R3. (the "m" file exists but the "s" file does not). Delete the
S/mXXX file. There must have been a crash in step C6.
R4. If the initial character is "t", we ignore it in this loop.
Continue at step R2.
R5. If that character is "s", ignore it; it is an ordinary queue file
and will be processed in the ordinary course of events. Otherwise,
it's not an "m", "t" or "s" file, and we log its existence, and take
appropriate action.
R6. "Loop 2": Loop through all files in the directory (using readdir).
When all files have been exhausted, continue at step R7.
R6.1 If the filename starts with "t", delete it. There must have been
a crash at step C1 or C2.
R6.2 If the filename starts with "m", and is stat()able, there has
presumably been a failure in the first loop. This is a
catastrophic failure of the strategy!
R6.3 Otherwise, ignore the file (you may want to log its existence if
it's not an "s" file).
R7 Stop.
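The recovery steps might look something like the sketch below. It is a
reading of R1 to R7 under stated assumptions, not the original code: the
names recover and read_manifest are hypothetical, the "m" file is assumed
to list one destination directory per line, and the queue directory is
assumed to contain only regular files.

```python
import os

def read_manifest(m_path):
    """Return the destination directories listed in an "m" file."""
    with open(m_path) as m:
        return [line.strip() for line in m if line.strip()]

def recover(src_dir):
    """Clean up interrupted copies in src_dir; return unexpected file names."""
    unexpected = []

    # Loop 1 (R1 to R5). os.listdir() returns a snapshot, so a name may
    # refer to a file that an earlier iteration has already deleted.
    for entry in os.listdir(src_dir):
        path = os.path.join(src_dir, entry)
        if not os.path.lexists(path):
            continue  # R2: cannot be stat()ed any more, ignore it
        prefix, name = entry[0], entry[1:]
        if prefix == "m":
            s_path = os.path.join(src_dir, "s" + name)
            t_path = os.path.join(src_dir, "t" + name)
            if not os.path.exists(s_path):
                # R3: "m" without "s", the crash happened in step C6.
                os.unlink(path)
            elif os.path.exists(t_path):
                # R2.1.1: crash in step C1.2, no destination files exist.
                os.unlink(t_path)
                os.unlink(path)
            else:
                # R2.1.2: remove possibly partial destination files.
                for d in read_manifest(path):
                    d_path = os.path.join(d, "d" + name)
                    if os.path.exists(d_path):
                        os.unlink(d_path)
                os.unlink(path)
        elif prefix not in ("t", "s"):
            # R5: neither "m", "t" nor "s"; report it to the caller.
            unexpected.append(entry)
        # R4 and R5: "t" and "s" files are left alone in this loop.

    # Loop 2 (R6).
    for entry in os.listdir(src_dir):
        path = os.path.join(src_dir, entry)
        if not os.path.lexists(path):
            continue
        if entry.startswith("t"):
            os.unlink(path)  # R6.1: leftover temporary file
        elif entry.startswith("m"):
            # R6.2: loop 1 should have removed every "m" file.
            raise RuntimeError("unrecovered manifest: " + path)

    return unexpected  # R7
```

Because every branch only deletes files and is safe to repeat, running
recover() again after a crash during recovery should reach the same end
state, which is the property the quoted question asks about.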
The point of using the single character prefixes is that you then know
the purpose of the file (and enough of the system state) without
having to worry about the validity of the data in the file. In the
discussion above, file creation is assumed to be atomic, but you may
wish to mess about with link(), stat() and unlink() to avoid NFS
problems or something.
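One common form of the link()/stat()/unlink() approach is sketched below.
It is a general technique rather than anything taken from the original
design; the function name create_exclusive and the temporary file naming
are assumptions made for illustration.

```python
import os
import socket

def create_exclusive(path):
    """Create an empty file at path; return False if it already existed.

    Avoids relying on O_EXCL by linking a uniquely named temporary file
    to the target name and then checking the link count.
    """
    directory = os.path.dirname(path) or "."
    # Assumed naming scheme: host name plus process id should be unique.
    # A recovery process would need to know about these names as well.
    tmp = os.path.join(
        directory, ".tmp.%s.%d" % (socket.gethostname(), os.getpid())
    )
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT, 0o600)
    os.close(fd)
    try:
        try:
            os.link(tmp, path)
        except OSError:
            # The result of link() is not trusted; the link count decides.
            pass
        created = os.stat(tmp).st_nlink == 2
    finally:
        os.unlink(tmp)
    return created
```

The stat() check is there because, over NFS, link() can report failure even
though the server carried out the operation. A link count of 2 shows that
the target name now refers to the temporary file.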
If the destination files are ever to be moved by another program, you
probably need to hold a (cooperative) file lock on them until after
step C5, to inform potential users that they can't use them yet. The
intent here is to avoid duplicate processing.
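As a rough illustration of such a cooperative lock, the sketch below uses
flock() through Python's fcntl module. The choice of flock() is an
assumption (the text does not say which locking call was used), and the
helper names open_locked and try_claim are hypothetical.

```python
import fcntl
import os

def open_locked(path):
    """Writer side: create the destination file and hold an exclusive lock.

    The caller keeps the descriptor open until after step C5 and then
    closes it, which releases the lock.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX)
    return fd

def try_claim(path):
    """Reader side: return a locked descriptor, or None if still in use."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # The writer still holds the lock, so the file is not ready yet.
        os.close(fd)
        return None
    return fd
```

The lock is advisory: it only helps if every program that picks up
destination files calls something like try_claim() before touching them.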
More generally, the intent of the processing above is to ensure that
the recovery process either rolls back the "transaction" or completes
it.
> - How does Postfix recover without sending duplicate messages and
> without losing mail?
>
> - How does Postfix recover if the system crashes while it is
> recovering without sending duplicate messages and without losing
> mail?
>
> If this can be solved efficiently, multiple queues are feasible.
Again, if there are any problems with the above (other than that
perhaps you find the approach unsuitable), I'm sure these are errors
of recall rather than problems with the original design -- though I'm
open to criticism...
One thing in particular: my description above seems not to rely on A, B
and S all being on the same filesystem, but I seem to remember that
the original scheme did, in fact, rely on this (at least the
documentation stated that this was mandatory).
-- James Youngman Manchester, UK. +44 161 226 7339
This archive was generated by hypermail 2b27 : Mon Jan 03 2000 - 16:38:09 CST