Project

General

Profile

Bug #2046

filestore: do_op running during commit

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

commit_start() is supposed to quiesce writes, but I see

2012-02-08 16:12:43.487876 7f5324ccb700 filestore(/tmp/cephtest/data/osd.3.data) sync_entry committing 127294 sync_epoch 787
2012-02-08 16:12:43.487912 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) _flush_op_queue waiting for apply finisher
2012-02-08 16:12:43.487928 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) flush complete
2012-02-08 16:12:43.487939 7f531bab7700 osd.3 1388 _remove_pg 0.17 taking osd_lock
2012-02-08 16:12:43.487954 7f531bab7700 osd.3 1388 _remove_pg 0.17 removing final
2012-02-08 16:12:43.487972 7f5324ccb700 filestore(/tmp/cephtest/data/osd.3.data) taking async snap 'snap_127294'
2012-02-08 16:12:43.487998 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) queue_transactions existing osr 0x2bfd2d0/0x2c05770
2012-02-08 16:12:43.488018 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) queue_transactions (parallel) 127295 0x7e71c00
2012-02-08 16:12:43.488048 7f531bab7700 filestore(/tmp/cephtest/data/osd.3.data) queue_op 0xa66f6c0 seq 127295 193 bytes   (queue has 1 ops and 193 bytes)
2012-02-08 16:12:43.488081 7f531bab7700 osd.3 1388 pg[0.17( empty n=17 ec=1 les/c 1340/1367 1377/1377/1377) [2,5] r=-1 lpr=1377 lcod 1363'306 stray DELETING] remove_watchers
2012-02-08 16:12:43.488103 7f531bab7700 osd.3 1388 _put_pool 0 1 -> 0
2012-02-08 16:12:43.488116 7f531bab7700 osd.3 1388 _remove_pg 0.17 all done
2012-02-08 16:12:43.488136 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _do_op 0xa66f6c0 127295 osr 0x2bfd2d0/0x2c05770 start
2012-02-08 16:12:43.488153 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _do_transaction on 0x7e71c00
2012-02-08 16:12:43.488167 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_1df
2012-02-08 16:12:43.488396 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_1df = 0
2012-02-08 16:12:43.488410 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_244
2012-02-08 16:12:43.488546 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_244 = 0
2012-02-08 16:12:43.488558 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_254
2012-02-08 16:12:43.491911 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_254 = 0
2012-02-08 16:12:43.491940 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/5172cdfb/pglog_0.17/0
2012-02-08 16:12:43.493447 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/5172cdfb/pglog_0.17/0 = 0
2012-02-08 16:12:43.493464 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/28da3187/pginfo_0.17/0
2012-02-08 16:12:43.496779 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) remove meta/28da3187/pginfo_0.17/0 = 0
2012-02-08 16:12:43.496796 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_head
2012-02-08 16:12:43.497054 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _destroy_collection /tmp/cephtest/data/osd.3.data/current/0.17_head = 0
2012-02-08 16:12:43.497068 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _do_op 0xa66f6c0 127295 r = 0, finisher 0x2be4f10 0
2012-02-08 16:12:43.497079 7f5322cc7700 filestore(/tmp/cephtest/data/osd.3.data) _finish_op on osr 0x2bfd2d0/0x2c05770
2012-02-08 16:12:43.502713 7f5324ccb700 filestore(/tmp/cephtest/data/osd.3.data) async snap create 'snap_127294' transid 0 got (22) Invalid argument
os/FileStore.cc: In function 'void FileStore::sync_entry()' thread 7f5324ccb700 time 2012-02-08 16:12:43.569505
os/FileStore.cc: 3078: FAILED assert(0 == "async snap ioctl error")

Associated revisions

Revision 1009d1a0 (diff)
Added by Sage Weil over 9 years ago

filestore: fix op queue quiesce during commit

When I added the ordering constraint fix back in 259c509a I got the
check backwards. We want to wait if we are blocked OR we are not in the
front of the line (i.e., proceed if we are not blocked AND first in line).

Fixes: #2046
Signed-off-by: Sage Weil <>

History

#1 Updated by Sage Weil over 9 years ago

this was broken by 259c509a8941bf7cdad8bd4ede0ccd73ca8a83d3, way back in v0.25! Sigh. The wait condition for _op_apply_start() was broken.

Also available in: Atom PDF