Bug #62208
openmds: use MDSRank::abort to ceph_abort so necessary sync is done
0%
Description
The MDS calls ceph_abort("msg") in various places. If there is any pending cluster log messages to be sent to the mons, those messages may be lost when the MDS hard stops due to abort. To guarantee those messages are flushed before abort, use the new MDSRank::abort method instead of ceph_abort.
See discussion: https://github.com/ceph/ceph/pull/52638#discussion_r1276289186
Updated by Venky Shankar 9 months ago
- Backport changed from reef to reef,quincy,pacific
backport note - we might need to pull in additional commits for p/q.
Updated by Venky Shankar 9 months ago
- Category set to Administration/Usability
- Assignee set to Leonid Usov
Leonid, please take this one.
Updated by Leonid Usov 9 months ago
- Status changed from New to Need More Info
- Assignee changed from Leonid Usov to Patrick Donnelly
Hi Venky!
I've started looking at this one, but the ticket doesn't provide sufficient information so I don't know how to proceed with it.
It mentions a PR that seemingly adds the functionality requested here, and it's referencing a different ticket in the system: https://tracker.ceph.com/issues/62164. There is no info on how to reproduce this problem and no mentions of the other ticket which might be a duplicate
I'm assigning this issue to Patric with Need More Info
Updated by Patrick Donnelly 9 months ago
- Description updated (diff)
- Status changed from Need More Info to New
- Assignee changed from Patrick Donnelly to Leonid Usov
Updated by Leonid Usov 9 months ago
- Blocked by Bug #62164: qa: "cluster [ERR] MDS abort because newly corrupt dentry to be committed: [dentry #0x1/a [fffffffffffffff6,head] auth (dversion lock) v=13..." added
Updated by Leonid Usov 9 months ago
- Status changed from New to In Progress
OK thanks Patrick! So this will wait until you submit the linked issue that introduces the new method.
What are other strategies one could use? Is it a common practice to base PRs on other unmerged PRs they depend on?
Updated by Patrick Donnelly 9 months ago
Leonid Usov wrote:
OK thanks Patrick! So this will wait until you submit the linked issue that introduces the new method.
What are other strategies one could use? Is it a common practice to base PRs on other unmerged PRs they depend on?
It's uncommon but totally fine.
Updated by Leonid Usov 9 months ago
- Status changed from In Progress to New
updating the status to New until the API becomes available
Updated by Leonid Usov 9 months ago
- Status changed from New to In Progress
- Pull request ID set to 53014
Updated by Leonid Usov 9 months ago
- Status changed from In Progress to Fix Under Review
Updated by Leonid Usov 7 months ago
- Assignee changed from Leonid Usov to Patrick Donnelly
Reassigning to Patrick following our discussion on this topic. Patrick wanted to add some generic safe async signal handler framework