Bug #16230

closed

rbd-mirror: potential race condition accessing local image journal

Added by Jason Dillaman almost 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
Jason Dillaman
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If the image watch is lost, the exclusive lock is released and reacquired for safety reasons. A race can therefore occur if the journal is closed in the background while rbd-mirror is accessing it without holding the necessary locks.

One specific example is in 'ImageReplayer::shut_down' where 'librbd::Journal::stop_external_replay' is invoked without holding the owner lock.
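The hazard can be sketched in isolation. The following is only an illustration under stated assumptions, not the actual librbd code or the fix that was merged: `ImageCtx`, `Journal`, `close_journal`, and `stop_replay_safely` are hypothetical stand-ins, and `std::shared_mutex` stands in for the owner lock. The idea is that the caller takes the owner lock and re-checks the journal pointer before touching it, so a background close cannot free the journal mid-call.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the image journal; not the real librbd::Journal.
struct Journal {
  bool replaying = true;
  void stop_external_replay() {
    assert(replaying);  // precondition: an external replay is in progress
    replaying = false;
  }
};

// Hypothetical stand-in for the image context.
struct ImageCtx {
  std::shared_mutex owner_lock;      // stand-in for the owner lock
  std::unique_ptr<Journal> journal;  // may be destroyed in the background
};

// Simulates the background path: when the exclusive lock is released,
// the journal is closed and destroyed under the write side of the lock.
void close_journal(ImageCtx& ictx) {
  std::unique_lock<std::shared_mutex> locker(ictx.owner_lock);
  ictx.journal.reset();
}

// Caller-side guard: hold the owner lock (shared) and re-check the pointer
// before using the journal. Returns false if the journal is already gone.
bool stop_replay_safely(ImageCtx& ictx) {
  std::shared_lock<std::shared_mutex> locker(ictx.owner_lock);
  if (!ictx.journal) {
    return false;
  }
  ictx.journal->stop_external_replay();
  return true;
}
```

Without the lock and the null check in `stop_replay_safely`, a concurrent `close_journal` could leave the caller dereferencing a destroyed journal, which would be consistent with a segmentation fault inside a journal method.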

http://teuthology.ovh.sepia.ceph.com/teuthology/jdillaman-2016-06-10_03:05:15-rbd-master---basic-openstack/21667/teuthology.log


Related issues 1 (0 open, 1 closed)

Copied to rbd - Backport #16425: jewel: rbd-mirror: potential race condition accessing local image journal (Resolved, Loïc Dachary)
Actions #1

Updated by Jason Dillaman almost 8 years ago

  • Description updated (diff)
Actions #3

Updated by Jason Dillaman almost 8 years ago

2016-06-13 18:32:21.109156 7f652a7fc700 20 rbd::mirror::ImageReplayer: 0x7f64f000c2e0 [3/48f20160-802f-4e40-b07e-867eacc34435] handle_init_remote_journaler: r=0
2016-06-13 18:32:21.109159 7f652a7fc700 20 rbd::mirror::ImageReplayer: 0x7f64f000c2e0 [3/48f20160-802f-4e40-b07e-867eacc34435] start_replay:
2016-06-13 18:32:21.110085 7f652a7fc700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f652a7fc700 thread_name:tp_journal

 ceph version 10.2.0-2257-g2dcb259 (2dcb259e11b050e3e6409e3709fb040778b39661)
 1: (()+0x3bf817) [0x7f653da44817]
 2: (()+0xf8d0) [0x7f653427f8d0]
 3: (librbd::Journal<librbd::ImageCtx>::start_external_replay(librbd::journal::Replay<librbd::ImageCtx>**, Context*)+0x2a) [0x7f653d8fecca]
 4: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::start_replay()+0x9e) [0x7f653d86b8de]
 5: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::handle_init_remote_journaler(int)+0x240) [0x7f653d86bcd0]
 6: (Context::complete(int)+0x9) [0x7f653d8550e9]
 7: (Context::complete(int)+0x9) [0x7f653d8550e9]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb7f) [0x7f653da8382f]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x7f653da84880]
 10: (()+0x80a4) [0x7f65342780a4]
 11: (clone()+0x6d) [0x7f6532be004d]
2016-06-13 15:19:17.593575 7f620d7fa700 20 rbd::mirror::ImageReplayer: 0x7f61f403c000 [3/60415c11-9d81-48fd-9d26-10d93f37cbc2] handle_get_remote_tag: r=0
2016-06-13 15:19:17.593578 7f620d7fa700 20 rbd::mirror::ImageReplayer: 0x7f61f403c000 [3/60415c11-9d81-48fd-9d26-10d93f37cbc2] handle_get_remote_tag: decoded remote tag 2: [mirror_uuid=, predecessor_mirror_uuid=, predecessor_tag_tid=1, predecessor_entry_tid=99]
2016-06-13 15:19:17.593579 7f620d7fa700 20 rbd::mirror::ImageReplayer: 0x7f61f403c000 [3/60415c11-9d81-48fd-9d26-10d93f37cbc2] allocate_local_tag:
2016-06-13 15:19:17.593796 7f61f0ff9700 20 rbd::mirror::image_sync::SyncPointPruneRequest: 0x7f61c80d9900 handle_remove_snap: r=0
2016-06-13 15:19:17.593798 7f61f0ff9700 20 rbd::mirror::image_sync::SyncPointPruneRequest: 0x7f61c80d9900 send_refresh_image
2016-06-13 15:19:17.594357 7f620d7fa700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f620d7fa700 thread_name:fn_anonymous

 ceph version 10.2.0-2257-g2dcb259 (2dcb259e11b050e3e6409e3709fb040778b39661)
 1: (()+0x3bf817) [0x7f6240555817]
 2: (()+0xf8d0) [0x7f6236d908d0]
 3: (librbd::Journal<librbd::ImageCtx>::allocate_tag(std::string const&, std::string const&, bool, unsigned long, unsigned long, Context*)+0x42) [0x7f624040f882]
 4: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::allocate_local_tag()+0x16e) [0x7f624037d69e]
 5: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::handle_get_remote_tag(int)+0xcd) [0x7f624037dbbd]
 6: (Context::complete(int)+0x9) [0x7f62403660e9]
 7: (()+0x34fca3) [0x7f62404e5ca3]
 8: (()+0xa543d) [0x7f623766643d]
 9: (()+0x8cf39) [0x7f623764df39]
 10: (()+0x17d08e) [0x7f623773e08e]
 11: (()+0x80a4) [0x7f6236d890a4]
 12: (clone()+0x6d) [0x7f62356f104d]
Actions #4

Updated by Jason Dillaman almost 8 years ago

  • Priority changed from Normal to High
Actions #5

Updated by Jason Dillaman almost 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman
  • Backport set to jewel
Actions #6

Updated by Jason Dillaman almost 8 years ago

  • Status changed from In Progress to Fix Under Review
Actions #7

Updated by Mykola Golub almost 8 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #8

Updated by Nathan Cutler almost 8 years ago

  • Copied to Backport #16425: jewel: rbd-mirror: potential race condition accessing local image journal added
Actions #9

Updated by Loïc Dachary over 7 years ago

  • Status changed from Pending Backport to Resolved