Bug #17569
closed
multisite: assertion in RGWRados::wakeup_data_sync_shards
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
If RGWDataSyncControlCR returns an error to RGWRemoteDataLog::run_sync(), RGWRemoteDataLog::data_sync_cr is not reset to null even though the CR itself has been freed. When async notifications come in later to signal the data_sync_cr, this results in a use-after-free.
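The failure mode described above can be illustrated with a minimal sketch. This is not the actual Ceph patch: `SyncControl`, `RemoteDataLog`, and `notify_after_failed_sync` are hypothetical stand-ins for RGWDataSyncControlCR, RGWRemoteDataLog, and the notification path. The sketch only assumes that the fix amounts to clearing the pointer under the lock on every exit path, including the error path, so that a later wakeup finds null instead of freed memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for RGWDataSyncControlCR. wakeup() takes a lock
// owned by the object, so calling it on a freed object is undefined
// behaviour (in the report it surfaced as a failed assert in Mutex::Lock).
class SyncControl {
 public:
  explicit SyncControl(int result) : result_(result) {}
  int run() { return result_; }  // pretend to run the sync coroutine
  void wakeup(int /*shard_id*/, const std::set<std::string>& keys) {
    std::lock_guard<std::mutex> l(lock_);
    woken_keys_ += keys.size();
  }

 private:
  std::mutex lock_;
  int result_;
  std::size_t woken_keys_ = 0;
};

// Hypothetical stand-in for RGWRemoteDataLog.
class RemoteDataLog {
 public:
  // Runs one sync attempt; simulated_result plays the role of the
  // coroutine's return code (for example -5 for EIO).
  int run_sync(int simulated_result) {
    SyncControl* cr = nullptr;
    {
      std::lock_guard<std::mutex> l(lock_);
      data_sync_cr_.reset(new SyncControl(simulated_result));
      cr = data_sync_cr_.get();
    }
    int r = cr->run();
    {
      // Clear the pointer on success AND on error, under the same lock
      // that wakeup() takes, so no caller can observe a dangling pointer.
      std::lock_guard<std::mutex> l(lock_);
      data_sync_cr_.reset();
    }
    return r;
  }

  // Returns true if the notification reached a live sync coroutine.
  bool wakeup(int shard_id, const std::set<std::string>& keys) {
    std::lock_guard<std::mutex> l(lock_);
    if (!data_sync_cr_) {
      return false;  // sync is not running; drop the notification
    }
    data_sync_cr_->wakeup(shard_id, keys);
    return true;
  }

 private:
  std::mutex lock_;
  std::unique_ptr<SyncControl> data_sync_cr_;
};

// Replays the sequence from the report: sync fails with -5, then an
// async notification arrives for shard 3.
inline bool notify_after_failed_sync() {
  RemoteDataLog log;
  int r = log.run_sync(-5);
  assert(r == -5);
  std::set<std::string> keys{"example-bucket-shard-key"};
  return log.wakeup(3, keys);
}
```

In this sketch `notify_after_failed_sync()` returns false: the late notification is ignored rather than dereferencing a freed object. Holding one lock around both the reset and the wakeup is the design choice that matters here; checking the pointer without the lock would leave a race between teardown and notification.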
Log of RGWDataSyncControlCR failure:
2016-10-11 14:55:09.676991 7f6196698700 1 -- 10.17.151.111:0/1018465644 <== osd.0 10.17.151.111:6800/18857 88 ==== osd_op_reply(90 datalog.sync-status.88c0bc4d-e840-448c-ad0b-6f4576e4dded [read 0~0] v0'0 uv104 ondisk = 0) v7 ==== 176+0+0 (3008961858 0 0) 0x55add5466680 con 0x55add5877000
2016-10-11 14:55:09.677454 7f6180e6d700 20 rados->read r=0 bl.length=0
2016-10-11 14:55:09.677542 7f617c664700 20 cr:s=0x55add5c6f750:op=0x55add5ce2000:20RGWSimpleRadosReadCRI18rgw_data_sync_infoE: operate()
2016-10-11 14:55:09.677581 7f617c664700 20 cr:s=0x55add5c6f750:op=0x55add5ce2000:20RGWSimpleRadosReadCRI18rgw_data_sync_infoE: operate() returned r=-5
2016-10-11 14:55:09.677590 7f617c664700 20 cr:s=0x55add5c6f750:op=0x55add550d600:30RGWReadDataSyncStatusCoroutine: operate()
2016-10-11 14:55:09.677591 7f617c664700 4 data sync: failed to read sync status info with (5) Input/output error
2016-10-11 14:55:09.677595 7f617c664700 20 cr:s=0x55add5c6f750:op=0x55add550d600:30RGWReadDataSyncStatusCoroutine: operate() returned r=-5
2016-10-11 14:55:09.677598 7f617c664700 20 cr:s=0x55add5c6f750:op=0x55add5a0e000:13RGWDataSyncCR: operate()
2016-10-11 14:55:09.677599 7f617c664700 0 data sync: ERROR: failed to fetch sync status, retcode=-5
2016-10-11 14:55:09.677600 7f617c664700 20 cr:s=0x55add5c6f750:op=0x55add5a0e000:13RGWDataSyncCR: operate() returned r=-5
2016-10-11 14:55:09.677601 7f617c664700 20 cr:s=0x55add5c6f750:op=0x55add5448700:20RGWDataSyncControlCR: operate()
2016-10-11 14:55:09.677604 7f617c664700 5 data sync: Sync:88c0bc4d:data:Data:all:finish
2016-10-11 14:55:09.677618 7f617c664700 0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -5
2016-10-11 14:55:09.677619 7f617c664700 20 cr:s=0x55add5c6f750:op=0x55add5448700:20RGWDataSyncControlCR: operate() returned r=-5
2016-10-11 14:55:09.677620 7f617c664700 20 stack->operate() returned ret=-5
2016-10-11 14:55:09.677621 7f617c664700 20 run: stack=0x55add5c6f750 is done
2016-10-11 14:55:09.677627 7f617c664700 20 run(stacks) returned r=-5
2016-10-11 14:55:09.677629 7f617c664700 0 data sync: ERROR: failed to run sync
Log of the assertion:
2016-10-11 14:57:59.836653 7f615060c700 20 execute(): read data: [{"key":3,"val":["byxliz-11:e0ca40ce-cc60-4a95-bb0b-a9a5b29f689d.4116.12"]}]
2016-10-11 14:57:59.836880 7f615060c700 20 execute(): updated shard=3
2016-10-11 14:57:59.836886 7f615060c700 20 execute(): modified key=byxliz-11:e0ca40ce-cc60-4a95-bb0b-a9a5b29f689d.4116.12
2016-10-11 14:57:59.836889 7f615060c700 20 wakeup_data_sync_shards: source_zone=88c0bc4d-e840-448c-ad0b-6f4576e4dded, shard_ids={3=byxliz-11:e0ca40ce-cc60-4a95-bb0b-a9a5b29f689d.4116.12}
2016-10-11 14:57:59.844090 7f615060c700 -1 /home/cbodley/ceph/src/common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f615060c700 time 2016-10-11 14:57:59.836896
/home/cbodley/ceph/src/common/Mutex.cc: 113: FAILED assert(r == 0)
ceph version Development (no_version)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x55adca9f1610]
 2: (Mutex::Lock(bool)+0x14f) [0x55adca9b72a5]
 3: (RGWDataSyncControlCR::wakeup(int, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&)+0x34) [0x55adca90d5e6]
 4: (RGWRemoteDataLog::wakeup(int, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&)+0x70) [0x55adca8eebb2]
 5: (RGWDataSyncStatusManager::wakeup(int, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&)+0x2c) [0x55adca7685ca]
 6: (RGWDataSyncProcessorThread::wakeup_sync_shards(std::map<int, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<int>, std::allocator<std::pair<int const, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > >&)+0x8a) [0x55adca76a2d4]
 7: (RGWRados::wakeup_data_sync_shards(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<int, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<int>, std::allocator<std::pair<int const, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > >&)+0x38e) [0x55adca70f5b6]
 8: (RGWOp_DATALog_Notify::execute()+0x78e) [0x55adca7e056a]
 9: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)+0x5ef) [0x55adca6f4c5b]
 10: (process_request(RGWRados*, RGWREST*, RGWRequest*, RGWStreamIO*, OpsLogSocket*)+0xbbd) [0x55adca6f58ba]
 11: (()+0xa12998) [0x55adca5b6998]
 12: (()+0xa326f7) [0x55adca5d66f7]
 13: (()+0xa34ddf) [0x55adca5d8ddf]
 14: (()+0xa3538a) [0x55adca5d938a]
 15: (()+0xa35483) [0x55adca5d9483]
 16: (()+0x761a) [0x7f61a096f61a]
 17: (clone()+0x6d) [0x7f619f16159d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Casey Bodley over 7 years ago
- Related to Bug #17568: multisite: race between ReadSyncStatus and InitSyncStatus leads to EIO errors added
Updated by Casey Bodley over 7 years ago
- Status changed from New to Fix Under Review
- Backport set to jewel
Updated by Yehuda Sadeh over 7 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Loïc Dachary over 7 years ago
- Copied to Backport #17786: jewel: multisite: assertion in RGWRados::wakeup_data_sync_shards added
Updated by Nathan Cutler almost 7 years ago
- Status changed from Pending Backport to Resolved