Bug #2046
filestore: do_op running during commit
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
commit_start() is supposed to quiesce writes, but I see
2012-02-08 16:12:43.487876 7f5324ccb700 filestore(/tmp/cephtest/data/osd.3.data) sync_entry committing 127294 sync_epoch 787 2012-02-08 16:12:43.487912 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) _flush_op_queue waiting for apply finisher 2012-02-08 16:12:43.487928 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) flush complete 2012-02-08 16:12:43.487939 7f531bab7700 osd.3 1388 _remove_pg 0.17 taking osd_lock 2012-02-08 16:12:43.487954 7f531bab7700 osd.3 1388 _remove_pg 0.17 removing final 2012-02-08 16:12:43.487972 7f5324ccb700 filestore(/tmp/cephtest/data/osd.3.data) taking async snap 'snap_127294' 2012-02-08 16:12:43.487998 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) queue_transactions existing osr 0x2bfd2d0/0x2c05770 2012-02-08 16:12:43.488018 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) queue_transactions (parallel) 127295 0x7e71c00 2012-02-08 16:12:43.488048 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) queue_op 0xa66f6c0 seq 127295 193 bytes (queue has 1 ops and 193 bytes) 2012-02-08 16:12:43.488081 7f531bab7700 osd.3 1388 pg[0.17( empty n=17 ec=1 les/c 1340/1367 1377/1377/1377) [2,5] r=-1 lpr=1377 lcod 1363'306 stray DELETING] remove_watchers 2012-02-08 16:12:43.488103 7f531bab7700 osd.3 1388 _put_pool 0 1 -> 0 2012-02-08 16:12:43.488116 7f531bab7700 osd.3 1388 _remove_pg 0.17 all done 2012-02-08 16:12:43.488136 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _do_op 0xa66f6c0 127295 osr 0x2bfd2d0/0x2c05770 start 2012-02-08 16:12:43.488153 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _do_transaction on 0x7e71c00 2012-02-08 16:12:43.488167 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_1df 2012-02-08 16:12:43.488396 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_1df = 0 2012-02-08 16:12:43.488410 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_244 2012-02-08 16:12:43.488546 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_244 = 0 2012-02-08 16:12:43.488558 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_254 2012-02-08 16:12:43.491911 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_254 = 0 2012-02-08 16:12:43.491940 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/5172cdfb/pglog_0.17/0 2012-02-08 16:12:43.493447 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/5172cdfb/pglog_0.17/0 = 0 2012-02-08 16:12:43.493464 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/28da3187/pginfo_0.17/0 2012-02-08 16:12:43.496779 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/28da3187/pginfo_0.17/0 = 0 2012-02-08 16:12:43.496796 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_head 2012-02-08 16:12:43.497054 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_head = 0 2012-02-08 16:12:43.497068 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _do_op 0xa66f6c0 127295 r = 0, finisher 0x2be4f10 0 2012-02-08 16:12:43.497079 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _finish_op on osr 0x2bfd2d0/0x2c05770 2012-02-08 16:12:43.502713 7f5324ccb700 filestore(/tmp/cephtest/data/osd.3.data) async snap create 'snap_127294' transid 0 got (22) Invalid argument os/FileStore.cc: In function 'void FileStore::sync_entry()' thread 7f5324ccb700 time 2012-02-08 16:12:43.569505 os/FileStore.cc: 3078: FAILED assert(0 == "async snap ioctl error")
Associated revisions
filestore: fix op queue quiesce during commit
When I added the ordering constraint fix back in 259c509a I got the
check backwards. We want to wait if we are blocked OR we are not in the
front of the line (i.e., proceed if we are not blocked AND first in line).
Fixes: #2046
Signed-off-by: Sage Weil <sage@newdream.net>
History
#1 Updated by Sage Weil almost 12 years ago
this was broken by 259c509a8941bf7cdad8bd4ede0ccd73ca8a83d3, way back in v0.25! Sigh. The wait condition for _op_apply_start() was broken.
#2 Updated by Sage Weil almost 12 years ago
- Status changed from New to Resolved