Closed Bug 550455 Opened 14 years ago Closed 14 years ago

Crash when m_foldersToStat.Count() == 0 [@ nsImapIncomingServer::OnStopRunningUrl]

Categories

(MailNews Core :: Networking: IMAP, defect)

x86
Linux
defect
Not set
critical

Tracking

(thunderbird3.1 beta2-fixed, blocking-thunderbird3.0 -, thunderbird3.0 .5-fixed)

RESOLVED FIXED
Tracking Status
thunderbird3.1 --- beta2-fixed
blocking-thunderbird3.0 --- -
thunderbird3.0 --- .5-fixed

People

(Reporter: jhorak, Assigned: Bienvenu)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 3 obsolete files)

Crash occurs in nsImapIncomingServer.cpp:
nsImapIncomingServer::OnStopRunningUrl(nsIURI *url, nsresult exitCode):
...
    case nsIImapUrl::nsImapFolderStatus:
    {
      PRInt32 folderCount = m_foldersToStat.Count();
      nsCOMPtr<nsIMsgFolder> msgFolder(
->        do_QueryInterface(m_foldersToStat[folderCount - 1]));

when m_foldersToStat array is empty. I'm unable to reproduce but we have couple of automatic reports at https://bugzilla.redhat.com/show_bug.cgi?id=570391 (please see for full backtrace). I can reproduce only by manually setting folderCount variable to 0.
(In reply to comment #1)
> related to any of these?
> *
I don't think so. I can't find any crash in nsImapIncomingServer::OnStopRunningUrl on crashstats nor something similar.
This is happening to me multiple times a day:  https://bugzilla.redhat.com/show_bug.cgi?id=570041

Is this ticket getting the visibility required?  Having thunderbird constantly segfault on me is surprising since it's always been fairly solid in the past.
Attached patch proposed fix (obsolete) — Splinter Review
I don't know why this situation should happen, or how to test for it, but this should fix it.
Assignee: nobody → bienvenu
Status: NEW → ASSIGNED
Attachment #431555 - Flags: superreview?(neil)
Attachment #431555 - Flags: review?(neil)
Could this happen when getting new messages for non inbox folders while there are still folders waiting to update statue?
FWIW, I took a spin through crashes with OnStopRunningUrl in the stack and did not find any matches with it in the top couple frames.
* nsXPTCStubBase::Release()  bp-2bab6ac8-4f77-4780-beeb-923892100308
* nsImapProtocol::SetupWithUrl(nsIURI*, nsISupports*) bp-fc01edc8-dbf5-42c8-846a-6faa92100306
(In reply to comment #5)
> Could this happen when getting new messages for non inbox folders while there
> are still folders waiting to update statue?

Well, we're running a STATUS url when this happens, and the incoming server is the listener for the url. AFAIK, that combination only happens when doing a STATUS to check non-inbox folders for new mail. We run other STATUS urls, but the imap incoming server is not the listener for those urls.
Summary: Crash when m_foldersToStat.Count() == 0 @nsImapIncomingServer::OnStopRunningUrl → Crash when m_foldersToStat.Count() == 0 [@ nsImapIncomingServer::OnStopRunningUrl]
Comment on attachment 431555 [details] [diff] [review]
proposed fix

So, as discussed on IRC, this just wallpapers over the crash. The real cause is that each call to GetNewMessagesForNonInboxFolders triggers a chain of folder updates, but there's no protection against multiple chains simultaneously using the same array.
Attachment #431555 - Flags: superreview?(neil)
Attachment #431555 - Flags: superreview-
Attachment #431555 - Flags: review?(neil)
This crash is rather frequent for significant amount of Linux users (see https://bugzilla.redhat.com/show_bug.cgi?id=570655 for example). Could this be a blocker for next release (3.0.5)?
blocking-thunderbird3.0: --- → ?
It's not showing up high in our crashstats for some reason. I am planning on fixing it, though I have to convince Neil of the rightness of the fix...the first thing is that I need to remove the assumption that the current folder is also the last folder in the stat array, which can be wrong when you get multiple calls to GetNewMessagesForNonInboxFolders.
I'm looking at related crash sigs/bugs and may have more info later. But I want add that Matej in https://bugzilla.redhat.com/show_bug.cgi?id=570655#c3 mentions their stack is the same as bug 411147. I haven't verified that. Not also, I wouldn't bank on bug 411147 being windows-only because my comment "windows only per crash-stats" simply means I didn't see any linux crashes.
I've just dug deep into crash-stats. I've found only one instance of this reported in the last two weeks:

bp-ab9ff0c2-3e7f-429d-a482-ad2a32100316

Therefore not blocking on this as that isn't anywhere near a top-crasher. I would say its wanted though.

jhorak: Are you sure this isn't something more specific to the Fedora build? An additional patch (I don't know what you guys do) or something?.

Also I notice from the first comment in that bug, it looks like your users are running a 64 bit build there and that's something we don't support/test/run at all at the moment (we don't even have nightlies). So you may want to take a look around and see if there's a 64-bit bug hiding somewhere.
blocking-thunderbird3.0: ? → -
I believe this is a case of linux users being much more likely to be checking all folders for new messages, because of server side filters, and being likely to have multiple accounts. I've seen this crash myself, and I'm only checking a few folders for new messages.
That's where it would be nice to receive crash stats from linux builds by distros.
(In reply to comment #14)
> That's where it would be nice to receive crash stats from linux builds by
> distros.

exactly.

My plan for fixing this is not to leave the currently statted folder in m_foldersToStat. This will make things a lot simpler as far as dealing with multiple actors on the array is concerned.
This change makes it so the current STATUS operation doesn't have its folder in m_foldersToStat, which means we have to check if there are any folders left before doing the next stat. It also means we won't be potentially removing the wrong folder when the STATUS operation finishes. And I put some code to prevent us from putting the same folder in the array multiple times, which should ameliorate the case of the user hitting get new mail twice.
Attachment #431555 - Attachment is obsolete: true
Attachment #435225 - Flags: superreview?(neil)
Attachment #435225 - Flags: review?(neil)
Comment on attachment 435225 [details] [diff] [review]
less hacky but more complicated fix

Although this obviously avoids removing the wrong folder, it doesn't actually stop multiple attempts to check for status, and in fact would make it harder to retrofit such a check.

>+      // if we get an error running the url, it's better
>+      // not to chain the next url.
>+      if (NS_FAILED(exitCode))
>+        break;
What about the remaining folders?

>+        nsCOMPtr<nsIMsgImapMailFolder> folder = m_foldersToStat[folderCount - 1];
>+        m_foldersToStat.RemoveObjectAt(folderCount - 1);
>+        folder->UpdateStatus(this, nsnull);
Is UpdateStatus not always async?

>-      if (imapFolder && !isServer)
>+      if (imapFolder && !isServer &&
>+          m_foldersToStat.IndexOfObject(imapFolder) == -1)
Nit: IndexOfObject is really slow because it QIs the argument and each member to nsISupports first. If you don't need that just use IndexOf instead.
(In reply to comment #17)
> (From update of attachment 435225 [details] [diff] [review])
> Although this obviously avoids removing the wrong folder, it doesn't actually
> stop multiple attempts to check for status, and in fact would make it harder to
> retrofit such a check.

It does, by preventing duplicates from getting added to the array of folders to stat.


> 
> >+      // if we get an error running the url, it's better
> >+      // not to chain the next url.
> >+      if (NS_FAILED(exitCode))
> >+        break;
> What about the remaining folders?

An error at this point is generally something that's not recoverable, e.g., the server is down, or we're shutting down. The remaining folders will still be in the array, so if we do STAT again, we'll get them on the next biff round. 

> 
> >+        nsCOMPtr<nsIMsgImapMailFolder> folder = m_foldersToStat[folderCount - 1];
> >+        m_foldersToStat.RemoveObjectAt(folderCount - 1);
> >+        folder->UpdateStatus(this, nsnull);
> Is UpdateStatus not always async?

It is. I've changed m_foldersToStat to mean the folders still remaining to stat. Once we've started statting a folder, it is removed from the array.

> 
> >-      if (imapFolder && !isServer)
> >+      if (imapFolder && !isServer &&
> >+          m_foldersToStat.IndexOfObject(imapFolder) == -1)
> Nit: IndexOfObject is really slow because it QIs the argument and each member
> to nsISupports first. If you don't need that just use IndexOf instead.

OK, that's good to fix.
(In reply to comment #18)

> > Although this obviously avoids removing the wrong folder, it doesn't actually
> > stop multiple attempts to check for status, and in fact would make it harder to
> > retrofit such a check.
> 
> It does, by preventing duplicates from getting added to the array of folders to
> stat.
> 
I suppose it makes it so the currently stat'd folder could again be requested to be stat'd. That doesn't bother me. You might actually get a different answer the next time.
(In reply to comment #18)
> (In reply to comment #17)
> > (From update of attachment 435225 [details] [diff] [review])
> > Although this obviously avoids removing the wrong folder, it doesn't actually
> > stop multiple attempts to check for status, and in fact would make it harder to
> > retrofit such a check.
> It does, by preventing duplicates from getting added to the array of folders to
> stat.
Sorry for being unclear, I was referring to multiple checks on the same server, rather than the same folder. (If you clicked Get Messages too many times, could you max out the cached connections?)

> > >+      // if we get an error running the url, it's better
> > >+      // not to chain the next url.
> > >+      if (NS_FAILED(exitCode))
> > >+        break;
> > What about the remaining folders?
> An error at this point is generally something that's not recoverable, e.g., the
> server is down, or we're shutting down. The remaining folders will still be in
> the array, so if we do STAT again, we'll get them on the next biff round. 
So the idea is that because you won't add them again, you don't need to remove them?

> > >+        nsCOMPtr<nsIMsgImapMailFolder> folder = m_foldersToStat[folderCount - 1];
> > >+        m_foldersToStat.RemoveObjectAt(folderCount - 1);
> > >+        folder->UpdateStatus(this, nsnull);
> > Is UpdateStatus not always async?
> It is. I've changed m_foldersToStat to mean the folders still remaining to
> stat. Once we've started statting a folder, it is removed from the array.
I was just wondering whether you could write
m_foldersToStat[folderCount - 1]->UpdateStatus(this, nsnull);
or is that being too sneaky?
(In reply to comment #19)
> (In reply to comment #18)
> > > Although this obviously avoids removing the wrong folder, it doesn't actually
> > > stop multiple attempts to check for status, and in fact would make it harder to
> > > retrofit such a check.
> > It does, by preventing duplicates from getting added to the array of folders to
> > stat.
> I suppose it makes it so the currently stat'd folder could again be requested
> to be stat'd. That doesn't bother me.
On the other hand you could perhaps work around that with a hybrid approach whereby you get the folder from the url (rather than the existing assumption that it's the last folder) and removing that from the array (helpfully RemoveObject returns PR_TRUE only if the object was in fact in the array.)
(In reply to comment #20)

> Sorry for being unclear, I was referring to multiple checks on the same server,
> rather than the same folder. (If you clicked Get Messages too many times, could
> you max out the cached connections?)

Maybe, but that would just mean that the requests would get queued. It wouldn't break anything. Ideally, we would prevent duplicate attempts, but the cure might be worse than the disease, if we get it wrong and permanently think we're doing a stat when we're not.

> So the idea is that because you won't add them again, you don't need to remove
> them?

Well, the idea was more that the next time around we should pick up where we left off. But since it's going to be the same set of folders the next time around, you're right that I might as well clear them.

> I was just wondering whether you could write
> m_foldersToStat[folderCount - 1]->UpdateStatus(this, nsnull);
> or is that being too sneaky?

Too sneaky, and that would cause issues if UpdateStatus failed synchronously, which is a possibility we should allow for (I guess that's what you were asking).
(In reply to comment #21)

> On the other hand you could perhaps work around that with a hybrid approach
> whereby you get the folder from the url (rather than the existing assumption
> that it's the last folder) and removing that from the array (helpfully
> RemoveObject returns PR_TRUE only if the object was in fact in the array.)

I started off doing just that, and decided not to, for a very good reason which I'll try to remember :-)
(In reply to comment #23)
> (In reply to comment #21)
> > On the other hand you could perhaps work around that with a hybrid approach
> > whereby you get the folder from the url (rather than the existing assumption
> > that it's the last folder) and removing that from the array (helpfully
> > RemoveObject returns PR_TRUE only if the object was in fact in the array.)
> I started off doing just that, and decided not to, for a very good reason which
> I'll try to remember :-)
Please do, that's the only remaining issue blocking my review!
Attached patch fix addressing comments (obsolete) — Splinter Review
The idea of not having to find the currently STATUS'd folder and remove it when done was the main appeal, iirc. But it occurred to me that if I start with the first folder in the array instead of the last, finding the current folder should in general be quick, and the code could be simplified a bit...
Attachment #435225 - Attachment is obsolete: true
Attachment #436199 - Flags: superreview?(neil)
Attachment #436199 - Flags: review?(neil)
Attachment #435225 - Flags: superreview?(neil)
Attachment #435225 - Flags: review?(neil)
Comment on attachment 436199 [details] [diff] [review]
fix addressing comments

>+      if (NS_FAILED(exitCode))
>+      {
>+        m_foldersToStat.Clear();
>+        break;
>+      }
>+      if (m_foldersToStat.Count() > 0)
>+        m_foldersToStat[0]->UpdateStatus(this, nsnull);
Nit: could use else or could just leave out the break
(the compiler should notice that the count will be zero anyway)
Attachment #436199 - Flags: superreview?(neil)
Attachment #436199 - Flags: superreview+
Attachment #436199 - Flags: review?(neil)
Attachment #436199 - Flags: review+
Attached patch patch landedSplinter Review
I just moved the break to the end of the case, because a case w/o a break is a bug waiting to happen.
Attachment #436199 - Attachment is obsolete: true
Attachment #436533 - Flags: approval-thunderbird3.0.5?
fixed on trunk.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Can we have this fix in 3.0 line? It affects many of Fedora users.
(In reply to comment #29)
> Can we have this fix in 3.0 line? It affects many of Fedora users.

Probably - it is already flagged for approval, but as we haven't yet got a firm date for 3.0.5 I've not looked at the approvals yet.
Attachment #436533 - Flags: approval-thunderbird3.0.5? → approval-thunderbird3.0.5+
FWIW, it also affects seamonkey 2.0.x
(In reply to comment #31)
> FWIW, it also affects seamonkey 2.0.x

That's why it is in Mailnews Core and not Thunderbird product...
fixed for 3.0.5
Crash Signature: [@ nsImapIncomingServer::OnStopRunningUrl]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: