
Bug #1490

cfuse assert failure: assert(ob->last_commit_tid < tid)

Added by Josh Durgin almost 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
OSD
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is from a tiobench run:

$ zcat ~teuthworker/archive/nightly_coverage_2011-09-01/586/remote/ubuntu@sepia72.ceph.dreamhost.com/log/client.0.log.gz 
2011-09-01 15:52:22.036627 7fb026cc5720 ceph version 0.34-375-g7f640a9 (commit:7f640a9256b8258c78c917ac684e85901cf63eea), process cfuse, pid 1133
osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t)', in thread '0x7fb0223f3700'
osdc/ObjectCacher.cc: 672: FAILED assert(ob->last_commit_tid < tid)
 ceph version 0.34-375-g7f640a9 (commit:7f640a9256b8258c78c917ac684e85901cf63eea)
 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x64b8ad]
 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x661aca]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xd8a) [0x63070a]
 4: (Client::ms_dispatch(Message*)+0x627) [0x4b6437]
 5: (SimpleMessenger::dispatch_entry()+0x9d2) [0x520c12]
 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45a8ec]
 7: (Thread::_entry_func(void*)+0x12) [0x516182]
 8: (()+0x7971) [0x7fb026668971]
 9: (clone()+0x6d) [0x7fb0250fc92d]


Related issues

Related to Ceph - Feature #1863: qa: tester for osd op reply order Resolved 12/28/2011

Associated revisions

Revision 46cfda7b (diff)
Added by Sage Weil almost 10 years ago

osd: preserve ordering when throttling races with missing/degraded requeue

When we delay an op because the op_queue is full, we can violate the op
order:

- op1 comes in, waits because object is missing
- op2 comes in, throttles on op queue
- op1 is requeued (no longer missing)
- queue drains, op2 happens
- op1 happens

To avoid this, if we delay, requeue ourselves... after whatever else is
on the queue.

Fixes: #1490
Signed-off-by: Sage Weil <>
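The requeue rule this commit describes can be illustrated with a toy queue model (plain Python, not OSD code; all names are made up for illustration):

```python
from collections import deque

# Toy model of the fix: an op delayed by the op_queue throttle must
# re-enqueue itself at the *back* of the queue when its delay ends, so it
# lands behind any ops (like op1) that were requeued in the meantime.

def finish_throttle(queue, op):
    """The delayed op goes after whatever else is already on the queue."""
    queue.append(op)

queue = deque()
queue.append("op1")            # op1 requeued once its object is no longer missing
finish_throttle(queue, "op2")  # op2's throttle delay ends
assert list(queue) == ["op1", "op2"]   # client ordering preserved
```

Without the fix, op2 would simply resume where it was delayed and run before the requeued op1, which is exactly the reordering the bh_write_commit assert catches on the client.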

Revision b43b8645 (diff)
Added by Sage Weil over 9 years ago

osd: fix race between op requeueing and _dispatch

If a message is working its way through _dispatch, and another thread
requeues waiting messages under pg->lock (e.g.
osd->take_waiting(waiting_for_active)), the requeued ops are processed
after the one _dispatch() is chewing on, breaking client ordering.

Instead, add a new OSD::requeue_ops() that reinjects ops back into the
op queue by feeding them to the handle*() helpers. Those do last minute
checks before enqueuing the ops.

Fixes: #1490 (again)
Signed-off-by: Sage Weil <>
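A minimal sketch of the race this commit describes (hypothetical names, not the real OSD internals):

```python
from collections import deque

# Toy model: ops requeued by another thread (take_waiters) must end up ahead
# of the message _dispatch is still processing, or client ordering breaks.

def buggy_requeue(current, waiters):
    q = deque([current])   # _dispatch has already committed to this op
    q.extend(waiters)      # requeued waiters land behind it
    return list(q)

def fixed_requeue(current, waiters):
    q = deque()
    for op in waiters:     # requeue_ops(): reinject each op through the
        q.append(op)       # normal handle_*() path ahead of the current op
    q.append(current)
    return list(q)

assert buggy_requeue("op3", ["op1", "op2"]) == ["op3", "op1", "op2"]  # bug
assert fixed_requeue("op3", ["op1", "op2"]) == ["op1", "op2", "op3"]  # fixed
```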

Revision bebd393b (diff)
Added by Sage Weil over 9 years ago

objecter: ignore replies from old request attempts

If we know the request attempt, ignore old attempts.

If we do not know the attempt (because the server is old), accept the
reply. This could lead to doing some ACK callbacks we shouldn't in
extreme failure/recovery scenarios, but that is better than doing
the callbacks out of order.

Partially fixes: #1490
Signed-off-by: Sage Weil <>
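The attempt check could look roughly like this (a sketch under assumed names, not the Objecter's actual API):

```python
def should_handle_reply(request_attempt, reply_attempt):
    """Decide whether to process an osd_op_reply for a resent request.

    reply_attempt is None when the server is too old to report attempts.
    """
    if reply_attempt is None:
        # Old server: accept, risking a rare stale ACK in extreme
        # failure/recovery cases rather than delivering callbacks out of order.
        return True
    return reply_attempt == request_attempt

assert should_handle_reply(2, 1) is False    # reply to an old resend: ignored
assert should_handle_reply(2, 2) is True     # current attempt: handled
assert should_handle_reply(2, None) is True  # legacy server: accepted
```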

Revision b1da5115 (diff)
Added by Josh Durgin over 9 years ago

pg: add a configurable lower bound on log size

This helps prevent problems with retrying requests being detected as
duplicates. The problem occurs when the log is trimmed too
aggressively, and an earlier tid is removed from the log, while a
later one is not. The later request will be detected as a duplicate
and responded to immediately, possibly violating the ordering of the
requests.

Partially fixes #1490.
Signed-off-by: Josh Durgin <>
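The hazard can be modeled in a few lines (illustrative only; the tids and log contents are made-up values):

```python
def ack_order(pg_log, retried_tids):
    """Return the order in which resent tids get acked.

    A tid still present in the pg log is detected as a duplicate and acked
    immediately; a trimmed tid is re-executed and only acked afterwards.
    """
    duplicates = [t for t in retried_tids if t in pg_log]
    reexecuted = [t for t in retried_tids if t not in pg_log]
    return duplicates + reexecuted

# tid 10 was trimmed but tid 11 survived: tid 11 is acked first, out of order.
assert ack_order(pg_log={11}, retried_tids=[10, 11]) == [11, 10]

# With a lower bound on log size keeping both entries, ordering is preserved.
assert ack_order(pg_log={10, 11}, retried_tids=[10, 11]) == [10, 11]
```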

History

#1 Updated by Sage Weil almost 10 years ago

  • Category changed from 11 to OSD
  • Assignee set to Sage Weil

Just reproduced this. The problem is on the OSD side: out-of-order acks:

ubuntu@sepia27:/tmp/cephtest$ grep 10000000004.00000001 archive/log/client.0.log | grep -- --
2011-09-04 06:35:36.574115 7f23bd6c9700 -- 10.3.14.154:0/23374 --> 10.3.14.159:6801/14836 -- osd_op(client4108.0:38 10000000004.00000001 [write 0~1044480] 0.c05947b1 snapc 1=[]) v1 -- ?+0 0x5925000 con 0x25da000
2011-09-04 06:35:37.228911 7f23bd6c9700 -- 10.3.14.154:0/23374 --> 10.3.14.159:6801/14836 -- osd_op(client4108.0:48 10000000004.00000001 [write 1044480~1441792] 0.c05947b1 snapc 1=[]) v1 -- ?+0 0x6231240 con 0x25da000
2011-09-04 06:35:37.576335 7f23bcec8700 -- 10.3.14.154:0/23374 <== osd0 10.3.14.159:6801/14836 24 ==== osd_op_reply(38 10000000004.00000001 [write 0~1044480] ondisk = 0) v2 ==== 111+0+0 (1102015817 0 0) 0x5c37c40 con 0x25da000
2011-09-04 06:35:37.905950 7f23bd6c9700 -- 10.3.14.154:0/23374 --> 10.3.14.159:6801/14836 -- osd_op(client4108.0:58 10000000004.00000001 [write 2486272~1519616] 0.c05947b1 snapc 1=[]) v1 -- ?+0 0x5925b40 con 0x25da000
2011-09-04 06:35:38.216935 7f23bcec8700 -- 10.3.14.154:0/23374 <== osd0 10.3.14.159:6801/14836 31 ==== osd_op_reply(48 10000000004.00000001 [write 1044480~1441792] ondisk = 0) v2 ==== 111+0+0 (2962237175 0 0) 0x7fe5a80 con 0x25da000
2011-09-04 06:35:38.370871 7f23bd6c9700 -- 10.3.14.154:0/23374 --> 10.3.14.159:6801/14836 -- osd_op(client4108.0:68 10000000004.00000001 [write 4005888~188416] 0.c05947b1 snapc 1=[]) v1 -- ?+0 0x80e4480 con 0x25da000
2011-09-04 06:35:39.545406 7f23bcec8700 -- 10.3.14.154:0/23374 <== osd0 10.3.14.159:6801/14836 44 ==== osd_op_reply(68 10000000004.00000001 [write 4005888~188416] ondisk = 0) v2 ==== 111+0+0 (909842800 0 0) 0x7fe58c0 con 0x25da000
2011-09-04 06:35:39.545703 7f23bcec8700 -- 10.3.14.154:0/23374 <== osd0 10.3.14.159:6801/14836 45 ==== osd_op_reply(58 10000000004.00000001 [write 2486272~1519616] ondisk = 0) v2 ==== 111+0+0 (4128546514 0 0) 0x7fe5a80 con 0x25da000

#2 Updated by Greg Farnum almost 10 years ago

The last time I saw this it was actually due to message processing priority... I don't think that should be the case here, but make sure the OSD is sending the messages out in the same order cfuse is processing them!

#3 Updated by Sage Weil almost 10 years ago

  • Target version changed from v0.35 to v0.36

#4 Updated by Sage Weil almost 10 years ago

  • Position set to 5

#5 Updated by Sage Weil almost 10 years ago

  • Position deleted (23)
  • Position set to 47

#6 Updated by Josh Durgin almost 10 years ago

Reproduced with debugging for the osd and cfuse when thrashing with bonnie. Logs are at vit:~joshd/bug_1490

#7 Updated by Sage Weil almost 10 years ago

  • Status changed from New to Resolved

Should be fixed by a711f2935c0637b1b8ee88e5c7ce8d6e05c29467, which removes throttle_op_queue entirely.

#8 Updated by Josh Durgin over 9 years ago

  • Status changed from Resolved to In Progress
  • Target version changed from v0.36 to v0.37

Happened again with tiobench. From teuthology:~teuthworker/archive/nightly_coverage_2011-09-29/814/teuthology.log:

2011-09-29T00:50:54.774 INFO:teuthology.task.workunit.client.0.err:+ for i in '`seq 1 10`'
2011-09-29T00:50:54.774 INFO:teuthology.task.workunit.client.0.err:+ tiotest -f 20 -t 10 -d . -T -c -D 20 -r 1000
2011-09-29T00:50:55.792 INFO:teuthology.task.workunit.client.0.err:Waiting write threads to finish...do_write_test: initial seek 0
2011-09-29T00:50:55.792 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.794 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.796 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.807 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.808 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.809 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.811 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.812 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.829 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:51:04.610 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t)', in thread '0x7fd8e964d700'
2011-09-29T00:51:04.610 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: 672: FAILED assert(ob->last_commit_tid < tid)
2011-09-29T00:51:04.610 INFO:teuthology.task.cfuse.cfuse.0.err: ceph version 0.35-158-gbd91444 (commit:bd9144408d928cf92036166e20bcc2ef41819534)
2011-09-29T00:51:04.610 INFO:teuthology.task.cfuse.cfuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x64e22d]
2011-09-29T00:51:04.611 INFO:teuthology.task.cfuse.cfuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x66444a]
2011-09-29T00:51:04.611 INFO:teuthology.task.cfuse.cfuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xd8a) [0x6335da]
2011-09-29T00:51:04.611 INFO:teuthology.task.cfuse.cfuse.0.err: 4: (Client::ms_dispatch(Message*)+0x627) [0x4b47d7]
2011-09-29T00:51:04.611 INFO:teuthology.task.cfuse.cfuse.0.err: 5: (SimpleMessenger::dispatch_entry()+0xa0a) [0x530ada]
2011-09-29T00:51:04.611 INFO:teuthology.task.cfuse.cfuse.0.err: 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45aa1c]
2011-09-29T00:51:04.612 INFO:teuthology.task.cfuse.cfuse.0.err: 7: (Thread::_entry_func(void*)+0x12) [0x5229f2]
2011-09-29T00:51:04.612 INFO:teuthology.task.cfuse.cfuse.0.err: 8: (()+0x7971) [0x7fd8ed8c2971]
2011-09-29T00:51:04.612 INFO:teuthology.task.cfuse.cfuse.0.err: 9: (clone()+0x6d) [0x7fd8ec35692d]

#9 Updated by Sage Weil over 9 years ago

  • Target version changed from v0.37 to v0.38

#10 Updated by Sage Weil over 9 years ago

Reproduced. This time ordering is violated by a race between scrub completion doing take_waiters(waiting_for_active) and a new request in _dispatch waiting for the pg lock.

That same race will apply to any caller of take_waiters under pg->lock where we care about ordering. The ops need to be requeued in front of what _dispatch is chewing on.

#11 Updated by Sage Weil over 9 years ago

  • Status changed from In Progress to Resolved

#12 Updated by Sage Weil over 9 years ago

  • Status changed from Resolved to 12
  • Priority changed from Normal to High
  • Target version changed from v0.38 to v0.39

Happened again on /var/lib/teuthworker/archive/nightly_coverage_2011-11-21-b/2818

This may have the same root cause as #1727, though that is not certain.


2011-11-21T13:59:04.426 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t)', in thread '7f0764dff700'
2011-11-21T13:59:04.426 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: 672: FAILED assert(ob->last_commit_tid < tid)
2011-11-21T13:59:04.438 INFO:teuthology.task.cfuse.cfuse.0.err: ceph version 0.38-207-g88963a1 (commit:88963a181adb8fadb3df5a4a6c8d9d7fd56ad972)
2011-11-21T13:59:04.438 INFO:teuthology.task.cfuse.cfuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x67651d]
2011-11-21T13:59:04.438 INFO:teuthology.task.cfuse.cfuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x68d53a]
2011-11-21T13:59:04.439 INFO:teuthology.task.cfuse.cfuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe2e) [0x65e51e]
2011-11-21T13:59:04.439 INFO:teuthology.task.cfuse.cfuse.0.err: 4: (Client::ms_dispatch(Message*)+0x677) [0x4b8bd7]
2011-11-21T13:59:04.439 INFO:teuthology.task.cfuse.cfuse.0.err: 5: (SimpleMessenger::dispatch_entry()+0x99a) [0x52fd2a]
2011-11-21T13:59:04.439 INFO:teuthology.task.cfuse.cfuse.0.err: 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45e1fc]
2011-11-21T13:59:04.439 INFO:teuthology.task.cfuse.cfuse.0.err: 7: (Thread::_entry_func(void*)+0x12) [0x52a872]
2011-11-21T13:59:04.440 INFO:teuthology.task.cfuse.cfuse.0.err: 8: (()+0x7971) [0x7f0769279971]
2011-11-21T13:59:04.440 INFO:teuthology.task.cfuse.cfuse.0.err: 9: (clone()+0x6d) [0x7f0767b0892d]
2011-11-21T13:59:04.440 INFO:teuthology.task.cfuse.cfuse.0.err: ceph version 0.38-207-g88963a1 (commit:88963a181adb8fadb3df5a4a6c8d9d7fd56ad972)
2011-11-21T13:59:04.440 INFO:teuthology.task.cfuse.cfuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x67651d]
2011-11-21T13:59:04.440 INFO:teuthology.task.cfuse.cfuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x68d53a]
2011-11-21T13:59:04.441 INFO:teuthology.task.cfuse.cfuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe2e) [0x65e51e]
2011-11-21T13:59:04.441 INFO:teuthology.task.cfuse.cfuse.0.err: 4: (Client::ms_dispatch(Message*)+0x677) [0x4b8bd7]
2011-11-21T13:59:04.441 INFO:teuthology.task.cfuse.cfuse.0.err: 5: (SimpleMessenger::dispatch_entry()+0x99a) [0x52fd2a]
2011-11-21T13:59:04.441 INFO:teuthology.task.cfuse.cfuse.0.err: 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45e1fc]
2011-11-21T13:59:04.441 INFO:teuthology.task.cfuse.cfuse.0.err: 7: (Thread::_entry_func(void*)+0x12) [0x52a872]
2011-11-21T13:59:04.442 INFO:teuthology.task.cfuse.cfuse.0.err: 8: (()+0x7971) [0x7f0769279971]
2011-11-21T13:59:04.442 INFO:teuthology.task.cfuse.cfuse.0.err: 9: (clone()+0x6d) [0x7f0767b0892d]

#13 Updated by Sage Weil over 9 years ago

  • Position deleted (112)
  • Position set to 6

#14 Updated by Sage Weil over 9 years ago

  • Assignee deleted (Sage Weil)

#15 Updated by Sage Weil over 9 years ago

  • Target version changed from v0.39 to v0.40

#16 Updated by Greg Farnum over 9 years ago

This didn't turn out to have anything to do with #1727, did it?

#17 Updated by Greg Farnum over 9 years ago

  • Status changed from 12 to 7
  • Assignee set to Sage Weil

If we're lucky this was caused by taking waiters improperly, which Sage fixed in 8bbe576cab9ecdbfea939ad3d78664b638ec2f6a

#18 Updated by Sage Weil over 9 years ago

  • Status changed from 7 to Resolved

#19 Updated by Josh Durgin over 9 years ago

  • Status changed from Resolved to New

Happened again in teuthology:~/log/client.0.log.gz

Build is in teuthology:~teuthworker/archive/ceph_builds/nightly_coverage_2011-12-15-b.tgz

#20 Updated by Sage Weil over 9 years ago

  • Assignee deleted (Sage Weil)

#21 Updated by Josh Durgin over 9 years ago

Happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-12-28-b/5258/teuthology.log.

#22 Updated by Sage Weil over 9 years ago

  • Position deleted (36)
  • Position set to 30

#23 Updated by Sage Weil over 9 years ago

  • Priority changed from High to Urgent

#24 Updated by Josh Durgin over 9 years ago

It looks like there are at least two bugs here: one client-side and one server-side. I'm reproducing with more logs for the server side, but I think I've figured out the client-side bug:

When the objecter gets an osd_op_reply, it ignores the reply if the message's connection is not the connection on which the request was last sent. When OSDs are thrashing, there is a race that lets a stale reply slip through this check. Consider two writes to the same object:

1. tid 1 -> osd 0
2. tid 2 -> osd 0
3. map change! primary is now osd 1
4. ack tid 1 <- osd 0 (ignored, because op->session->con is now to osd 1)
5. map change! primary is now osd 0 again
6. ack tid 2 <- osd 0 (problem: op->session->con is to osd 0 again, so this ack is not ignored)
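That check can be modeled directly (illustrative Python, not the Objecter code):

```python
# Identifying replies only by "same connection as the last send" fails when
# the primary flaps back to the original OSD between the resend and the
# arrival of the stale ack.

class Op:
    def __init__(self, con):
        self.con = con          # connection the op was last sent on

def accept_reply(op, reply_con):
    return reply_con == op.con  # the buggy identity check

op = Op(con="osd0")             # op sent to osd0
op.con = "osd1"                 # map change: resent to osd1
assert not accept_reply(op, "osd0")  # stale ack from osd0 ignored, as intended
op.con = "osd0"                 # map flaps back: resent to osd0 again
assert accept_reply(op, "osd0")      # the *old* ack is now accepted: the bug
```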

Here are the logs of two requests where this happened:

2012-01-05 15:40:33.091799 7f650e0e1760 client.4108.objecter recalc_op_target tid 5747 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:40:33.091819 7f650e0e1760 client.4108.objecter op_submit oid sepia8721721-76 @0 [write 1787555~34265] tid 5747 osd.5
2012-01-05 15:40:33.091829 7f650e0e1760 client.4108.objecter send_op 5747 to osd.5
2012-01-05 15:40:33.091850 7f650e0e1760 -- 10.3.14.214:0/1021721 --> 10.3.14.136:6800/2349 -- osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513) v1 -- ?+0 0x683010 con 0x67f6d0
2012-01-05 15:40:33.091866 7f650e0e1760 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513) v1 remote 10.3.14.136:6800/2349
2012-01-05 15:40:33.091925 7f65085ab700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).writer encoding 1489 0x683010 osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513) v1
2012-01-05 15:40:34.421184 7f65084aa700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).reader got message 1089 0x7f64fc08b990 osd_op_reply(5747 sepia8721721-76 [write 1787555~34265] ondisk = 0) v2
2012-01-05 15:41:03.742356 7f65091b0700 client.4108.objecter  tid 5747 on osd.5 is laggy
2012-01-05 15:41:03.752547 7f650b2b5700 client.4108.objecter recalc_op_target tid 5747 pgid 0.7bedc513 acting [0,2]
2012-01-05 15:41:03.755626 7f650b2b5700 client.4108.objecter send_op 5747 to osd.0
2012-01-05 15:41:03.755650 7f650b2b5700 -- 10.3.14.214:0/1021721 --> 10.3.14.207:6800/3781 -- osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1 -- ?+0 0x7f650430e050 con 0x7f65040385c0
2012-01-05 15:41:03.755666 7f650b2b5700 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1 remote 10.3.14.207:6800/3781
2012-01-05 15:41:03.755735 7f65083a9700 -- 10.3.14.214:0/1021721 >> 10.3.14.207:6800/3781 pipe(0x7f6504038350 sd=11 pgs=5 cs=1 l=1).writer encoding 1363 0x7f650430e050 osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1
2012-01-05 15:41:03.776711 7f650b2b5700 -- 10.3.14.214:0/1021721 <== osd.5 10.3.14.136:6800/2349 1089 ==== osd_op_reply(5747 sepia8721721-76 [write 1787555~34265] ondisk = 0) v2 ==== 106+0+0 (1121950433 0 0) 0x7f64fc08b990 con 0x67f6d0
2012-01-05 15:41:03.776748 7f650b2b5700 client.4108.objecter handle_osd_op_reply 5747 ondisk v 116'163 in 0.7bedc513
2012-01-05 15:41:03.778104 7f650b2b5700 client.4108.objecter recalc_op_target tid 5747 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:41:03.780673 7f650b2b5700 client.4108.objecter recalc_op_target tid 5747 pgid 0.7bedc513 acting [0,2]
2012-01-05 15:41:03.786047 7f650b2b5700 client.4108.objecter recalc_op_target tid 5747 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:41:03.789286 7f650b2b5700 client.4108.objecter send_op 5747 to osd.5
2012-01-05 15:41:03.789313 7f650b2b5700 -- 10.3.14.214:0/1021721 --> 10.3.14.136:6800/2349 -- osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1 -- ?+0 0x7f64fc08b620 con 0x67f6d0
2012-01-05 15:41:03.789332 7f650b2b5700 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1 remote 10.3.14.136:6800/2349
2012-01-05 15:41:03.789436 7f65085ab700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).writer encoding 1602 0x7f64fc08b620 osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1
2012-01-05 15:41:03.792539 7f65084aa700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).reader got message 1186 0x7f64fc08b620 osd_op_reply(5747 sepia8721721-76 [write 1787555~34265] ondisk = 0) v2
2012-01-05 15:41:03.846297 7f650b2b5700 -- 10.3.14.214:0/1021721 <== osd.5 10.3.14.136:6800/2349 1186 ==== osd_op_reply(5747 sepia8721721-76 [write 1787555~34265] ondisk = 0) v2 ==== 106+0+0 (3803515885 0 0) 0x7f64fc08b620 con 0x67f6d0
2012-01-05 15:41:03.846317 7f650b2b5700 client.4108.objecter handle_osd_op_reply 5747 ondisk v 116'163 in 0.7bedc513
2012-01-05 15:40:33.349909 7f650e0e1760 client.4108.objecter recalc_op_target tid 5795 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:40:33.349924 7f650e0e1760 client.4108.objecter op_submit oid sepia8721721-76 @0 [write 3772767~4901] tid 5795 osd.5
2012-01-05 15:40:33.349933 7f650e0e1760 client.4108.objecter send_op 5795 to osd.5
2012-01-05 15:40:33.349952 7f650e0e1760 -- 10.3.14.214:0/1021721 --> 10.3.14.136:6800/2349 -- osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513) v1 -- ?+0 0xc4b8b0 con 0x67f6d0
2012-01-05 15:40:33.349967 7f650e0e1760 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513) v1 remote 10.3.14.136:6800/2349
2012-01-05 15:40:33.350027 7f65085ab700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).writer encoding 1537 0xc4b8b0 osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513) v1
2012-01-05 15:40:34.431497 7f65084aa700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).reader got message 1137 0x7f64fc02fd30 osd_op_reply(5795 sepia8721721-76 [write 3772767~4901] ondisk = 0) v2
2012-01-05 15:41:03.742803 7f65091b0700 client.4108.objecter  tid 5795 on osd.5 is laggy
2012-01-05 15:41:03.753316 7f650b2b5700 client.4108.objecter recalc_op_target tid 5795 pgid 0.7bedc513 acting [0,2]
2012-01-05 15:41:03.760375 7f650b2b5700 client.4108.objecter send_op 5795 to osd.0
2012-01-05 15:41:03.760443 7f650b2b5700 -- 10.3.14.214:0/1021721 --> 10.3.14.207:6800/3781 -- osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1 -- ?+0 0x73d2b0 con 0x7f65040385c0
2012-01-05 15:41:03.760471 7f650b2b5700 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1 remote 10.3.14.207:6800/3781
2012-01-05 15:41:03.774484 7f65083a9700 -- 10.3.14.214:0/1021721 >> 10.3.14.207:6800/3781 pipe(0x7f6504038350 sd=11 pgs=5 cs=1 l=1).writer encoding 1411 0x73d2b0 osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1
2012-01-05 15:41:03.778788 7f650b2b5700 client.4108.objecter recalc_op_target tid 5795 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:41:03.781393 7f650b2b5700 client.4108.objecter recalc_op_target tid 5795 pgid 0.7bedc513 acting [0,2]
2012-01-05 15:41:03.787665 7f650b2b5700 client.4108.objecter recalc_op_target tid 5795 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:41:03.800543 7f650b2b5700 client.4108.objecter send_op 5795 to osd.5
2012-01-05 15:41:03.800584 7f650b2b5700 -- 10.3.14.214:0/1021721 --> 10.3.14.136:6800/2349 -- osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1 -- ?+0 0xc315d0 con 0x67f6d0
2012-01-05 15:41:03.800605 7f650b2b5700 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1 remote 10.3.14.136:6800/2349
2012-01-05 15:41:03.810690 7f65085ab700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).writer encoding 1650 0xc315d0 osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1
2012-01-05 15:41:03.837877 7f650b2b5700 -- 10.3.14.214:0/1021721 <== osd.5 10.3.14.136:6800/2349 1137 ==== osd_op_reply(5795 sepia8721721-76 [write 3772767~4901] ondisk = 0) v2 ==== 106+0+0 (2563624606 0 0) 0x7f64fc02fd30 con 0x67f6d0
2012-01-05 15:41:03.837897 7f650b2b5700 client.4108.objecter handle_osd_op_reply 5795 ondisk v 116'211 in 0.7bedc513

#25 Updated by Sage Weil over 9 years ago

  • Assignee set to Josh Durgin

#26 Updated by Josh Durgin over 9 years ago

One server-side cause of this (I'm not sure there aren't more) is duplicate request tracking. Currently requests are immediately acked if they are determined to be duplicates by looking them up in the pg log, which is trimmed aggressively. This produces out-of-order acks when requests are retried: the earlier request is no longer in the pg log, so it is applied again, while the later request is still in the log and is immediately acked as a duplicate, before the earlier one can finish.

This is a problem for consistency for non-idempotent operations as well. The short term fix is to make pg log trimming less aggressive. I've opened #1895 for the long term fix.

#27 Updated by Sage Weil over 9 years ago

  • Priority changed from Urgent to Normal

#28 Updated by Sage Weil over 9 years ago

  • Target version deleted (v0.40)
  • Position deleted (51)
  • Position set to 1

#29 Updated by Sage Weil over 9 years ago

  • Target version set to v0.41

#30 Updated by Sage Weil over 9 years ago


2012-01-29T12:54:12.196 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t)' thread 7f850acc4700 time 2012-01-29 12:31:51.372320
2012-01-29T12:54:12.196 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: 672: FAILED assert(ob->last_commit_tid < tid)
2012-01-29T12:54:12.197 INFO:teuthology.task.cfuse.cfuse.0.err: ceph version 0.41-15-g9bb3875 (commit:9bb3875b1671b89a74895f3c97a27845867b3941)
2012-01-29T12:54:12.197 INFO:teuthology.task.cfuse.cfuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x68450d]
2012-01-29T12:54:12.197 INFO:teuthology.task.cfuse.cfuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x69b64a]
2012-01-29T12:54:12.197 INFO:teuthology.task.cfuse.cfuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x173f) [0x66c5cf]
2012-01-29T12:54:12.197 INFO:teuthology.task.cfuse.cfuse.0.err: 4: (Client::ms_dispatch(Message*)+0x677) [0x4baaf7]
2012-01-29T12:54:12.198 INFO:teuthology.task.cfuse.cfuse.0.err: 5: (SimpleMessenger::dispatch_entry()+0xa1a) [0x53464a]
2012-01-29T12:54:12.198 INFO:teuthology.task.cfuse.cfuse.0.err: 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45fd2c]
2012-01-29T12:54:12.198 INFO:teuthology.task.cfuse.cfuse.0.err: 7: (Thread::_entry_func(void*)+0x12) [0x52eb82]
2012-01-29T12:54:12.198 INFO:teuthology.task.cfuse.cfuse.0.err: 8: (()+0x7971) [0x7f850f13e971]
2012-01-29T12:54:12.198 INFO:teuthology.task.cfuse.cfuse.0.err: 9: (clone()+0x6d) [0x7f850d9cd92d]
2012-01-29T12:54:12.199 INFO:teuthology.task.cfuse.cfuse.0.err: ceph version 0.41-15-g9bb3875 (commit:9bb3875b1671b89a74895f3c97a27845867b3941)
2012-01-29T12:54:12.199 INFO:teuthology.task.cfuse.cfuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x68450d]
2012-01-29T12:54:12.199 INFO:teuthology.task.cfuse.cfuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x69b64a]
2012-01-29T12:54:12.199 INFO:teuthology.task.cfuse.cfuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x173f) [0x66c5cf]
2012-01-29T12:54:12.199 INFO:teuthology.task.cfuse.cfuse.0.err: 4: (Client::ms_dispatch(Message*)+0x677) [0x4baaf7]
2012-01-29T12:54:12.200 INFO:teuthology.task.cfuse.cfuse.0.err: 5: (SimpleMessenger::dispatch_entry()+0xa1a) [0x53464a]
2012-01-29T12:54:12.200 INFO:teuthology.task.cfuse.cfuse.0.err: 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45fd2c]
2012-01-29T12:54:12.200 INFO:teuthology.task.cfuse.cfuse.0.err: 7: (Thread::_entry_func(void*)+0x12) [0x52eb82]
2012-01-29T12:54:12.200 INFO:teuthology.task.cfuse.cfuse.0.err: 8: (()+0x7971) [0x7f850f13e971]
2012-01-29T12:54:12.200 INFO:teuthology.task.cfuse.cfuse.0.err: 9: (clone()+0x6d) [0x7f850d9cd92d]

on ubuntu@teuthology:/a/nightly_coverage_2012-01-29-b/9647

#31 Updated by Sage Weil over 9 years ago

  • Target version changed from v0.41 to v0.42

#32 Updated by Sage Weil over 9 years ago

  • Status changed from New to Resolved
