Bug #4249

librbd: recursive lock of snap_lock during snap_rollback

Added by Sage Weil about 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

2013-02-23T06:25:37.353 INFO:teuthology.task.workunit.client.0.err:test_rbd.TestImage.test_rollback_to_snap ... common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f5ab043b700 time 2013-02-23 06:25:33.204919
2013-02-23T06:25:37.354 INFO:teuthology.task.workunit.client.0.err:common/Mutex.cc: 93: FAILED assert(r == 0)
2013-02-23T06:25:37.354 INFO:teuthology.task.workunit.client.0.err: ceph version 0.57-404-g8c05af5 (8c05af5dc3c398dda4c196a64f344db7ea69d209)
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7f5aac8be90d]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 2: (Mutex::Lock(bool)+0x14e) [0x7f5aac8aa2d4]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 3: (librbd::LibrbdWriteback::write(object_t const&, object_locator_t const&, unsigned long, unsigned long, SnapContext const&, ceph::buffer::list const&, utime_t, unsigned long, unsigned int, Context*)+0x47) [0x7f5aa832e445]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 4: (ObjectCacher::bh_write(ObjectCacher::BufferHead*)+0x30a) [0x7f5aa83361e4]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 5: (ObjectCacher::flush(ObjectCacher::Object*, long, long)+0x33e) [0x7f5aa833cea0]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 6: (ObjectCacher::flush_set(ObjectCacher::ObjectSet*, Context*)+0x311) [0x7f5aa833d249]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 7: (librbd::ImageCtx::flush_cache()+0xe7) [0x7f5aa8308c7f]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 8: (librbd::ImageCtx::invalidate_cache()+0xc7) [0x7f5aa830909b]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 9: (librbd::snap_rollback(librbd::ImageCtx*, char const*, librbd::ProgressContext&)+0x375) [0x7f5aa831f22b]
2013-02-23T06:25:37.356 INFO:teuthology.task.workunit.client.0.err: 10: (rbd_snap_rollback()+0x3c) [0x7f5aa82f1613]
2013-02-23T06:25:37.356 INFO:teuthology.task.workunit.client.0.err: 11: (ffi_call_unix64()+0x4c) [0x7f5aad87eea4]
2013-02-23T06:25:37.356 INFO:teuthology.task.workunit.client.0.err: 12: (ffi_call()+0x1e5) [0x7f5aad87e8c5]

The job was:
ubuntu@teuthology:/a/teuthology-2013-02-23_01:00:04-regression-next-testing-basic/10046$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 92a49fb0f79f3300e6e50ddf56238e70678e4202
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 8c05af5dc3c398dda4c196a64f344db7ea69d209
  s3tests:
    branch: next
  workunit:
    sha1: 8c05af5dc3c398dda4c196a64f344db7ea69d209
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph:
    conf:
      client:
        rbd cache: true
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rbd/test_librbd_python.sh
    env:
      RBD_FEATURES: '1'

Associated revisions

Revision 5806226c (diff)
Added by Josh Durgin about 7 years ago

librbd: drop snap_lock before invalidating cache

Writeback will take the snap_lock, so read everything we need under it
before invalidating the cache. This avoids a recursive lock when writeback
uses snap_lock while snap_rollback() was holding it.

Remove a not-very-useful debugging message that depended on snap_lock being held.

Fixes: #4249
Backport: bobtail
Signed-off-by: Josh Durgin <>
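
The ordering described in the commit message can be sketched in isolation. This is a minimal illustration under stated assumptions, not the librbd code: `ImageCtx` here is a hypothetical stand-in that holds a plain `std::mutex` named `snap_lock`, the names `writeback_write()`, `RollbackArgs`, and `snap_rollback_sketch()` are invented for the example, and the fields copied under the lock are placeholders for "everything we need".

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for librbd::ImageCtx; only what this sketch needs.
struct ImageCtx {
    std::mutex snap_lock;      // non-recursive lock, as in the report
    uint64_t snap_id = 0;      // example state protected by snap_lock
    uint64_t size = 0;         // example state protected by snap_lock
    int writebacks = 0;        // counts writeback calls, for illustration only

    // Models the writeback path (frame 3 in the trace): it takes snap_lock.
    void writeback_write() {
        std::lock_guard<std::mutex> l(snap_lock);
        ++writebacks;
    }

    // Models invalidate_cache() -> flush_cache() -> ... -> writeback.
    void invalidate_cache() { writeback_write(); }
};

// Values read under snap_lock and used after it is released (assumed fields).
struct RollbackArgs {
    uint64_t snap_id;
    uint64_t size;
};

// Fixed ordering: copy state under snap_lock, drop the lock, then invalidate.
RollbackArgs snap_rollback_sketch(ImageCtx& ictx) {
    RollbackArgs args{};
    {
        std::lock_guard<std::mutex> l(ictx.snap_lock);
        args.snap_id = ictx.snap_id;
        args.size = ictx.size;
    }  // snap_lock is released here

    // Safe: writeback can now take snap_lock without re-locking it
    // on a thread that already holds it.
    ictx.invalidate_cache();
    return args;
}
```

Calling `snap_rollback_sketch()` returns the copied values and runs one writeback with `snap_lock` free. If `invalidate_cache()` were moved inside the locked scope, the same thread would lock `snap_lock` twice, which is the pattern the trace above reports.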

History

#1 Updated by Josh Durgin about 7 years ago

  • Subject changed from python rbd test failure on next to librbd: recursive lock of snap_lock during snap_rollback
  • Status changed from New to Fix Under Review

wip-4249

#2 Updated by Sage Weil about 7 years ago

  • Status changed from Fix Under Review to Resolved

commit:9096d70642880946b0b477e33f7debabbefec9fa
