Bug #4249
librbd: recursive lock of snap_lock during snap_rollback
% Done: 0%
Source: Q/A
Regression: No
Severity: 3 - minor
Description
2013-02-23T06:25:37.353 INFO:teuthology.task.workunit.client.0.err:test_rbd.TestImage.test_rollback_to_snap ... common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7f5ab043b700 time 2013-02-23 06:25:33.204919
2013-02-23T06:25:37.354 INFO:teuthology.task.workunit.client.0.err:common/Mutex.cc: 93: FAILED assert(r == 0)
2013-02-23T06:25:37.354 INFO:teuthology.task.workunit.client.0.err: ceph version 0.57-404-g8c05af5 (8c05af5dc3c398dda4c196a64f344db7ea69d209)
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7f5aac8be90d]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 2: (Mutex::Lock(bool)+0x14e) [0x7f5aac8aa2d4]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 3: (librbd::LibrbdWriteback::write(object_t const&, object_locator_t const&, unsigned long, unsigned long, SnapContext const&, ceph::buffer::list const&, utime_t, unsigned long, unsigned int, Context*)+0x47) [0x7f5aa832e445]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 4: (ObjectCacher::bh_write(ObjectCacher::BufferHead*)+0x30a) [0x7f5aa83361e4]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 5: (ObjectCacher::flush(ObjectCacher::Object*, long, long)+0x33e) [0x7f5aa833cea0]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 6: (ObjectCacher::flush_set(ObjectCacher::ObjectSet*, Context*)+0x311) [0x7f5aa833d249]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 7: (librbd::ImageCtx::flush_cache()+0xe7) [0x7f5aa8308c7f]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 8: (librbd::ImageCtx::invalidate_cache()+0xc7) [0x7f5aa830909b]
2013-02-23T06:25:37.355 INFO:teuthology.task.workunit.client.0.err: 9: (librbd::snap_rollback(librbd::ImageCtx*, char const*, librbd::ProgressContext&)+0x375) [0x7f5aa831f22b]
2013-02-23T06:25:37.356 INFO:teuthology.task.workunit.client.0.err: 10: (rbd_snap_rollback()+0x3c) [0x7f5aa82f1613]
2013-02-23T06:25:37.356 INFO:teuthology.task.workunit.client.0.err: 11: (ffi_call_unix64()+0x4c) [0x7f5aad87eea4]
2013-02-23T06:25:37.356 INFO:teuthology.task.workunit.client.0.err: 12: (ffi_call()+0x1e5) [0x7f5aad87e8c5]
The job was:
ubuntu@teuthology:/a/teuthology-2013-02-23_01:00:04-regression-next-testing-basic/10046$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 92a49fb0f79f3300e6e50ddf56238e70678e4202
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 8c05af5dc3c398dda4c196a64f344db7ea69d209
  s3tests:
    branch: next
  workunit:
    sha1: 8c05af5dc3c398dda4c196a64f344db7ea69d209
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph:
    conf:
      client:
        rbd cache: true
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rbd/test_librbd_python.sh
    env:
      RBD_FEATURES: '1'
Associated revisions
librbd: drop snap_lock before invalidating cache
Writeback will take the snap_lock, so read everything we need under it
before invalidating the cache. This avoids a recursive lock when writeback
uses snap_lock while snap_rollback() was holding it.
Remove a not-very-useful debugging message that depended on snap_lock being held.
Fixes: #4249
Backport: bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
History
#1 Updated by Josh Durgin about 11 years ago
- Subject changed from python rbd test failure on next to librbd: recursive lock of snap_lock during snap_rollback
- Status changed from New to Fix Under Review
wip-4249
#2 Updated by Sage Weil about 11 years ago
- Status changed from Fix Under Review to Resolved
commit:9096d70642880946b0b477e33f7debabbefec9fa