Actions
Bug #2046
closedfilestore: do_op running during commit
Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
commit_start() is supposed to quiesce writes, but I see
2012-02-08 16:12:43.487876 7f5324ccb700 filestore(/tmp/cephtest/data/osd.3.data) sync_entry committing 127294 sync_epoch 787 2012-02-08 16:12:43.487912 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) _flush_op_queue waiting for apply finisher 2012-02-08 16:12:43.487928 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) flush complete 2012-02-08 16:12:43.487939 7f531bab7700 osd.3 1388 _remove_pg 0.17 taking osd_lock 2012-02-08 16:12:43.487954 7f531bab7700 osd.3 1388 _remove_pg 0.17 removing final 2012-02-08 16:12:43.487972 7f5324ccb700 filestore(/tmp/cephtest/data/osd.3.data) taking async snap 'snap_127294' 2012-02-08 16:12:43.487998 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) queue_transactions existing osr 0x2bfd2d0/0x2c05770 2012-02-08 16:12:43.488018 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) queue_transactions (parallel) 127295 0x7e71c00 2012-02-08 16:12:43.488048 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) queue_op 0xa66f6c0 seq 127295 193 bytes (queue has 1 ops and 193 bytes) 2012-02-08 16:12:43.488081 7f531bab7700 osd.3 1388 pg[0.17( empty n=17 ec=1 les/c 1340/1367 1377/1377/1377) [2,5] r=-1 lpr=1377 lcod 1363'306 stray DELETING] remove_watchers 2012-02-08 16:12:43.488103 7f531bab7700 osd.3 1388 _put_pool 0 1 -> 0 2012-02-08 16:12:43.488116 7f531bab7700 osd.3 1388 _remove_pg 0.17 all done 2012-02-08 16:12:43.488136 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _do_op 0xa66f6c0 127295 osr 0x2bfd2d0/0x2c05770 start 2012-02-08 16:12:43.488153 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _do_transaction on 0x7e71c00 2012-02-08 16:12:43.488167 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_1df 2012-02-08 16:12:43.488396 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_1df = 0 2012-02-08 16:12:43.488410 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_244 2012-02-08 16:12:43.488546 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_244 = 0 2012-02-08 16:12:43.488558 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_254 2012-02-08 16:12:43.491911 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_254 = 0 2012-02-08 16:12:43.491940 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/5172cdfb/pglog_0.17/0 2012-02-08 16:12:43.493447 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/5172cdfb/pglog_0.17/0 = 0 2012-02-08 16:12:43.493464 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/28da3187/pginfo_0.17/0 2012-02-08 16:12:43.496779 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/28da3187/pginfo_0.17/0 = 0 2012-02-08 16:12:43.496796 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_head 2012-02-08 16:12:43.497054 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_head = 0 2012-02-08 16:12:43.497068 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _do_op 0xa66f6c0 127295 r = 0, finisher 0x2be4f10 0 2012-02-08 16:12:43.497079 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _finish_op on osr 0x2bfd2d0/0x2c05770 2012-02-08 16:12:43.502713 7f5324ccb700 filestore(/tmp/cephtest/data/osd.3.data) async snap create 'snap_127294' transid 0 got (22) Invalid argument os/FileStore.cc: In function 'void FileStore::sync_entry()' thread 7f5324ccb700 time 2012-02-08 16:12:43.569505 os/FileStore.cc: 3078: FAILED assert(0 == "async snap ioctl error")
Updated by Sage Weil about 12 years ago
this was broken by 259c509a8941bf7cdad8bd4ede0ccd73ca8a83d3, way back in v0.25! Sigh. The wait condition for _op_apply_start() was broken.
Updated by Sage Weil about 12 years ago
- Status changed from New to Resolved
Actions