Bug #16230
rbd-mirror: potential race condition accessing local image journal
Status: Resolved
Priority: High
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
If the image watch is lost, the exclusive lock is released and reacquired for safety reasons. A race can therefore occur if the journal is closed in the background while rbd-mirror is accessing it without holding the necessary locks.
One specific example is in 'ImageReplayer::shut_down' where 'librbd::Journal::stop_external_replay' is invoked without holding the owner lock.
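The pattern described above can be illustrated with a minimal sketch. This is not the actual fix and not the librbd API: 'FakeJournal', 'FakeImageCtx', 'close_journal' and 'stop_replay_safely' are hypothetical stand-ins, and a plain 'std::mutex' is assumed in place of the real owner lock. It only shows the idea of taking the lock and re-checking the journal pointer before touching it.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for librbd::Journal; not the real class.
struct FakeJournal {
  bool replaying = true;
  void stop_external_replay() { replaying = false; }
};

// Hypothetical stand-in for the local image context.
struct FakeImageCtx {
  std::mutex owner_lock;                 // assumption: models the owner lock
  std::unique_ptr<FakeJournal> journal;  // null once the journal is closed

  // Models the background path: a lost watch releases the exclusive
  // lock, which closes (destroys) the journal.
  void close_journal() {
    std::lock_guard<std::mutex> locker(owner_lock);
    journal.reset();
  }
};

// Stops the external replay only while holding the owner lock and only
// if the journal still exists. Returns false if it was already closed.
bool stop_replay_safely(FakeImageCtx& ctx) {
  std::lock_guard<std::mutex> locker(ctx.owner_lock);
  if (!ctx.journal) {
    return false;  // journal was closed in the background; nothing to stop
  }
  ctx.journal->stop_external_replay();
  assert(!ctx.journal->replaying);
  return true;
}
```

Without the lock and the null check, the call in 'stop_replay_safely' could dereference a journal that the background path has already destroyed, which would be consistent with the kind of segmentation fault shown in the traces in this ticket.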
Updated by Jason Dillaman almost 8 years ago
2016-06-13 18:32:21.109156 7f652a7fc700 20 rbd::mirror::ImageReplayer: 0x7f64f000c2e0 [3/48f20160-802f-4e40-b07e-867eacc34435] handle_init_remote_journaler: r=0
2016-06-13 18:32:21.109159 7f652a7fc700 20 rbd::mirror::ImageReplayer: 0x7f64f000c2e0 [3/48f20160-802f-4e40-b07e-867eacc34435] start_replay:
2016-06-13 18:32:21.110085 7f652a7fc700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f652a7fc700 thread_name:tp_journal
 ceph version 10.2.0-2257-g2dcb259 (2dcb259e11b050e3e6409e3709fb040778b39661)
 1: (()+0x3bf817) [0x7f653da44817]
 2: (()+0xf8d0) [0x7f653427f8d0]
 3: (librbd::Journal<librbd::ImageCtx>::start_external_replay(librbd::journal::Replay<librbd::ImageCtx>**, Context*)+0x2a) [0x7f653d8fecca]
 4: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::start_replay()+0x9e) [0x7f653d86b8de]
 5: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::handle_init_remote_journaler(int)+0x240) [0x7f653d86bcd0]
 6: (Context::complete(int)+0x9) [0x7f653d8550e9]
 7: (Context::complete(int)+0x9) [0x7f653d8550e9]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb7f) [0x7f653da8382f]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x7f653da84880]
 10: (()+0x80a4) [0x7f65342780a4]
 11: (clone()+0x6d) [0x7f6532be004d]

2016-06-13 15:19:17.593575 7f620d7fa700 20 rbd::mirror::ImageReplayer: 0x7f61f403c000 [3/60415c11-9d81-48fd-9d26-10d93f37cbc2] handle_get_remote_tag: r=0
2016-06-13 15:19:17.593578 7f620d7fa700 20 rbd::mirror::ImageReplayer: 0x7f61f403c000 [3/60415c11-9d81-48fd-9d26-10d93f37cbc2] handle_get_remote_tag: decoded remote tag 2: [mirror_uuid=, predecessor_mirror_uuid=, predecessor_tag_tid=1, predecessor_entry_tid=99]
2016-06-13 15:19:17.593579 7f620d7fa700 20 rbd::mirror::ImageReplayer: 0x7f61f403c000 [3/60415c11-9d81-48fd-9d26-10d93f37cbc2] allocate_local_tag:
2016-06-13 15:19:17.593796 7f61f0ff9700 20 rbd::mirror::image_sync::SyncPointPruneRequest: 0x7f61c80d9900 handle_remove_snap: r=0
2016-06-13 15:19:17.593798 7f61f0ff9700 20 rbd::mirror::image_sync::SyncPointPruneRequest: 0x7f61c80d9900 send_refresh_image
2016-06-13 15:19:17.594357 7f620d7fa700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f620d7fa700 thread_name:fn_anonymous
 ceph version 10.2.0-2257-g2dcb259 (2dcb259e11b050e3e6409e3709fb040778b39661)
 1: (()+0x3bf817) [0x7f6240555817]
 2: (()+0xf8d0) [0x7f6236d908d0]
 3: (librbd::Journal<librbd::ImageCtx>::allocate_tag(std::string const&, std::string const&, bool, unsigned long, unsigned long, Context*)+0x42) [0x7f624040f882]
 4: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::allocate_local_tag()+0x16e) [0x7f624037d69e]
 5: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::handle_get_remote_tag(int)+0xcd) [0x7f624037dbbd]
 6: (Context::complete(int)+0x9) [0x7f62403660e9]
 7: (()+0x34fca3) [0x7f62404e5ca3]
 8: (()+0xa543d) [0x7f623766643d]
 9: (()+0x8cf39) [0x7f623764df39]
 10: (()+0x17d08e) [0x7f623773e08e]
 11: (()+0x80a4) [0x7f6236d890a4]
 12: (clone()+0x6d) [0x7f62356f104d]
Updated by Jason Dillaman almost 8 years ago
- Priority changed from Normal to High
Updated by Jason Dillaman almost 8 years ago
- Status changed from New to In Progress
- Assignee set to Jason Dillaman
- Backport set to jewel
Updated by Jason Dillaman almost 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Mykola Golub almost 8 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler almost 8 years ago
- Copied to Backport #16425: jewel: rbd-mirror: potential race condition accessing local image journal added
Updated by Loïc Dachary over 7 years ago
- Status changed from Pending Backport to Resolved