Bug #40341
multisite: failed assert(cursor) in mdlog trimming
Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
multisite
Backport:
luminous mimic nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Description
If the metadata master zone doesn't have a local copy of every period, radosgw can crash in mdlog trimming:
/builddir/build/BUILD/ceph-12.2.8/src/rgw/rgw_sync.cc: 2387: FAILED assert(cursor)
ceph version 12.2.8-128.el7cp (030358773c5213a14c1444a5147258672b2dc15f) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7ff642ce7f50]
2: (PurgePeriodLogsCR::operate()+0xa25) [0x5555715996a5]
3: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x7e) [0x5555713a2b4e]
4: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x41b) [0x5555713a567b]
5: (RGWSyncLogTrimThread::process()+0x1c5) [0x5555714734e5]
6: (RGWRadosThread::Worker::entry()+0x123) [0x555571404453]
7: (()+0x7dd5) [0x7ff64b718dd5]
8: (clone()+0x6d) [0x7ff63fb9fead]
The problem is identified on startup with errors like this:
2019-06-11 03:26:41.526812 7fe217523000 1 error read_lastest_epoch .rgw.root:periods.9c817b51-79ce-4c88-afe4-5ad70d112e30.latest_epoch
2019-06-11 03:26:41.526821 7fe217523000 0 failed to use_latest_epoch period id 9c817b51-79ce-4c88-afe4-5ad70d112e30 realm id : (2) No such file or directory
2019-06-11 03:26:41.526837 7fe217523000 1 rgw period puller: metadata master failed to read period 9c817b51-79ce-4c88-afe4-5ad70d112e30 from local storage: (2) No such file or directory
2019-06-11 03:26:41.526839 7fe217523000 1 failed to read period id=9c817b51-79ce-4c88-afe4-5ad70d112e30 for mdlog history: (2) No such file or directory
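The failure pattern can be illustrated with a minimal, self-contained sketch. This is not the RGW code and not necessarily the change made in the linked pull request: `Cursor`, `PeriodStore`, `lookup`, `purge_strict` and `purge_tolerant` are hypothetical names invented here. It only assumes what the backtrace and log suggest, namely that a period lookup can fail with ENOENT and that the trimming code asserts on the result.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a period history cursor; not the RGW type.
struct Cursor {
  int error;              // 0 when valid, negative errno otherwise
  std::string period_id;  // empty when the lookup failed
  Cursor(int err, std::string id) : error(err), period_id(std::move(id)) {}
  explicit operator bool() const { return error == 0; }
};

// Hypothetical local period store: realm epoch -> period id.
using PeriodStore = std::map<int, std::string>;

// Look up the period for a realm epoch; a gap yields an invalid cursor.
Cursor lookup(const PeriodStore& store, int realm_epoch) {
  auto it = store.find(realm_epoch);
  if (it == store.end()) {
    return Cursor(-ENOENT, "");
  }
  return Cursor(0, it->second);
}

// Mirrors the failing pattern: assumes every epoch resolves to a period,
// so a missing period aborts the process.
int purge_strict(const PeriodStore& store, int realm_epoch) {
  Cursor cursor = lookup(store, realm_epoch);
  assert(cursor);
  return 0;
}

// Tolerant variant (an assumption, for illustration): report the error to
// the caller instead of asserting, so trimming can skip or retry.
int purge_tolerant(const PeriodStore& store, int realm_epoch) {
  Cursor cursor = lookup(store, realm_epoch);
  if (!cursor) {
    return cursor.error;
  }
  return 0;
}
```

With a store that holds epochs 1 and 3 but not 2, `purge_tolerant(store, 2)` returns `-ENOENT`, whereas `purge_strict(store, 2)` would abort in a build with assertions enabled.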
Related issues
History
#1 Updated by Casey Bodley almost 4 years ago
- Source set to Q/A
#2 Updated by fang yuxiang over 3 years ago
Is there any progress on this?
#3 Updated by Shilpa MJ over 3 years ago
- Status changed from New to 7
- Pull request ID set to 31873
#4 Updated by Shilpa MJ over 3 years ago
- Copied to Backport #43134: nautilus: multisite: failed assert(cursor) in mdlog trimming added
#5 Updated by Patrick Donnelly over 3 years ago
- Status changed from 7 to Fix Under Review
#6 Updated by Casey Bodley about 3 years ago
- Status changed from Fix Under Review to Pending Backport
#7 Updated by Nathan Cutler about 3 years ago
- Copied to Backport #43633: mimic: multisite: failed assert(cursor) in mdlog trimming added
#8 Updated by Nathan Cutler about 3 years ago
- Copied to Backport #43634: luminous: multisite: failed assert(cursor) in mdlog trimming added
#9 Updated by Nathan Cutler about 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".