Re: postfix died!

From: Victor Duchovni (Victor.DuchovniMorganStanley.com)
Date: Thu Feb 01 2007 - 21:33:43 CST


On Thu, Feb 01, 2007 at 09:21:05PM -0500, Wietse Venema wrote:

> Rich Morin:
> > Feb 1 16:05:18 g3po postfix/master[38]: panic: master_status_event:
> > pointer corruption: 0x3049f800 != 0x3049f8
>
> I have never seen this in 10 years.
>
> Does the machine have ECC memory? If not, perhaps it is a good
> idea to run a hardware diagnostic.
>
> 0x3049f800 seems a bit high for a heap address to me (772 MB from
> the bottom of the process address space).

Interestingly with MacOSX shipping 2.1.5, it should have the
process generation sanity check:

    if ((proc = (MASTER_PROC *) binhash_find(master_child_table,
                                        (char *) &pid, sizeof(pid))) == 0) {
        if (msg_verbose)
            msg_info("%s: process id not found: %d", myname, stat.pid);
        return;
    }
    if (proc->gen != stat.gen) {
        msg_info("ignoring status update from child pid %d generation %u",
                 pid, stat.gen);
        return;
    }
    if (proc->serv != serv)
        msg_panic("%s: pointer corruption: %p != %p",
                  myname, (void *) proc->serv, (void *) serv);

If so, we got the right "proc" structure, but somehow either proc->serv
or "serv" changed. The "serv" value looks much more reasonable (it is
unlikely that the master virtual size was > 700MB), so we must conclude
that proc->serv is wrong, while proc->gen is right. This in turn suggests
that the "proc" pointer itself is fine. We also see from:

    proc->serv: 0x3049f800
    serv:       0x003049f8

that loading proc->serv shifted the data 8 bits to the left. It is unlikely
that this is corruption of the stored data; rather, the corruption is in
the relative offset of the "serv" structure member from the beginning of
the proc structure.

This means that the load instruction that reads proc->serv from memory
used an offset that is one byte larger than it should be, due to a single
bit error turning an even offset (all the structure members have an even
size) into an odd offset that is one bigger.

    typedef struct MASTER_PROC {
        MASTER_PID pid; /* child process id */
        unsigned gen; /* child generation number */
        int avail; /* availability */
        MASTER_SERV *serv; /* parent linkage */
        int use_count; /* number of service requests */
    } MASTER_PROC;
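
To make the offset argument concrete, here is a minimal sketch (not Postfix
code). It assumes a 32-bit big-endian machine, where each of the five
MASTER_PROC members occupies 4 bytes, so "serv" sits at offset 12. The OFF_*
constants, the helper names, and the pid/gen/avail values are made up for
illustration; only the two pointer values come from the panic message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 32-bit layout of MASTER_PROC: five 4-byte members, no padding. */
enum {
    OFF_PID = 0,
    OFF_GEN = 4,
    OFF_AVAIL = 8,
    OFF_SERV = 12,
    OFF_USE_COUNT = 16,
    PROC_SIZE = 20
};

/* Store a 32-bit value most-significant byte first (big-endian order). */
static void store_be32(unsigned char *buf, size_t off, uint32_t v)
{
    assert(off + 4 <= PROC_SIZE);
    buf[off] = (unsigned char) (v >> 24);
    buf[off + 1] = (unsigned char) (v >> 16);
    buf[off + 2] = (unsigned char) (v >> 8);
    buf[off + 3] = (unsigned char) v;
}

/* Load a 32-bit big-endian value starting at an arbitrary byte offset. */
static uint32_t load_be32(const unsigned char *buf, size_t off)
{
    assert(off + 4 <= PROC_SIZE);
    return ((uint32_t) buf[off] << 24)
        | ((uint32_t) buf[off + 1] << 16)
        | ((uint32_t) buf[off + 2] << 8)
        | (uint32_t) buf[off + 3];
}

/* Build the in-memory image of one proc record (illustrative values). */
static void fill_proc(unsigned char *buf, uint32_t serv, uint32_t use_count)
{
    store_be32(buf, OFF_PID, 1234);     /* hypothetical child pid */
    store_be32(buf, OFF_GEN, 1);        /* hypothetical generation */
    store_be32(buf, OFF_AVAIL, 0);      /* hypothetical availability */
    store_be32(buf, OFF_SERV, serv);
    store_be32(buf, OFF_USE_COUNT, use_count);
}
```

With serv stored as 0x003049f8 and a use_count below 2^24 (so its top byte
is zero), load_be32(proc, OFF_SERV) returns 0x003049f8, while
load_be32(proc, OFF_SERV + 1) returns 0x3049f800, the pair seen in the
panic message. Offsets 12 (binary 1100) and 13 (binary 1101) differ only in
the lowest bit.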

The problem could be a transient memory issue, in which case it will
be rare, or corruption on disk, in which case the master binary will
fail consistently.

--
        Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.

If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.