https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2011-09-04T14:00:35Z
Ceph
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6079
2011-09-04T14:00:35Z
Sage Weil
sage@newdream.net
<ul><li><strong>Category</strong> changed from <i>11</i> to <i>OSD</i></li><li><strong>Assignee</strong> set to <i>Sage Weil</i></li></ul><p>Just reproduced this. The problem is on the OSD side... out of order acks:<br /><pre>
ubuntu@sepia27:/tmp/cephtest$ grep 10000000004.00000001 archive/log/client.0.log | grep -- --
2011-09-04 06:35:36.574115 7f23bd6c9700 -- 10.3.14.154:0/23374 --> 10.3.14.159:6801/14836 -- osd_op(client4108.0:38 10000000004.00000001 [write 0~1044480] 0.c05947b1 snapc 1=[]) v1 -- ?+0 0x5925000 con 0x25da000
2011-09-04 06:35:37.228911 7f23bd6c9700 -- 10.3.14.154:0/23374 --> 10.3.14.159:6801/14836 -- osd_op(client4108.0:48 10000000004.00000001 [write 1044480~1441792] 0.c05947b1 snapc 1=[]) v1 -- ?+0 0x6231240 con 0x25da000
2011-09-04 06:35:37.576335 7f23bcec8700 -- 10.3.14.154:0/23374 <== osd0 10.3.14.159:6801/14836 24 ==== osd_op_reply(38 10000000004.00000001 [write 0~1044480] ondisk = 0) v2 ==== 111+0+0 (1102015817 0 0) 0x5c37c40 con 0x25da000
2011-09-04 06:35:37.905950 7f23bd6c9700 -- 10.3.14.154:0/23374 --> 10.3.14.159:6801/14836 -- osd_op(client4108.0:58 10000000004.00000001 [write 2486272~1519616] 0.c05947b1 snapc 1=[]) v1 -- ?+0 0x5925b40 con 0x25da000
2011-09-04 06:35:38.216935 7f23bcec8700 -- 10.3.14.154:0/23374 <== osd0 10.3.14.159:6801/14836 31 ==== osd_op_reply(48 10000000004.00000001 [write 1044480~1441792] ondisk = 0) v2 ==== 111+0+0 (2962237175 0 0) 0x7fe5a80 con 0x25da000
2011-09-04 06:35:38.370871 7f23bd6c9700 -- 10.3.14.154:0/23374 --> 10.3.14.159:6801/14836 -- osd_op(client4108.0:68 10000000004.00000001 [write 4005888~188416] 0.c05947b1 snapc 1=[]) v1 -- ?+0 0x80e4480 con 0x25da000
2011-09-04 06:35:39.545406 7f23bcec8700 -- 10.3.14.154:0/23374 <== osd0 10.3.14.159:6801/14836 44 ==== osd_op_reply(68 10000000004.00000001 [write 4005888~188416] ondisk = 0) v2 ==== 111+0+0 (909842800 0 0) 0x7fe58c0 con 0x25da000
2011-09-04 06:35:39.545703 7f23bcec8700 -- 10.3.14.154:0/23374 <== osd0 10.3.14.159:6801/14836 45 ==== osd_op_reply(58 10000000004.00000001 [write 2486272~1519616] ondisk = 0) v2 ==== 111+0+0 (4128546514 0 0) 0x7fe5a80 con 0x25da000
</pre></p>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6081
2011-09-04T17:08:36Z
Greg Farnum
gfarnum@redhat.com
<ul></ul><p>The last time I saw this it was actually due to message processing priority...I don't think that should be the case here, but make sure the OSD is sending the messages out in the same order cfuse is processing them!</p>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6121
2011-09-06T21:49:18Z
Sage Weil
sage@newdream.net
<ul><li><strong>Target version</strong> changed from <i>v0.35</i> to <i>v0.36</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6131
2011-09-06T21:52:08Z
Sage Weil
sage@newdream.net
<ul><li><strong>translation missing: en.field_position</strong> set to <i>5</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6333
2011-09-09T22:05:06Z
Sage Weil
sage@newdream.net
<ul><li><strong>translation missing: en.field_position</strong> deleted (<del><i>23</i></del>)</li><li><strong>translation missing: en.field_position</strong> set to <i>47</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6383
2011-09-19T14:12:25Z
Josh Durgin
<ul></ul><p>Reproduced with debugging for the osd and cfuse when thrashing with bonnie. Logs are at vit:~joshd/bug_1490</p>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6385
2011-09-20T11:03:38Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li></ul><p>should be fixed by <a class="changeset" title="osd: remove throttle_op_queue() There are subtle annoying problems with throttling and requeuein..." href="https://tracker.ceph.com/projects/ceph/repository/revisions/a711f2935c0637b1b8ee88e5c7ce8d6e05c29467">a711f2935c0637b1b8ee88e5c7ce8d6e05c29467</a>, which removes throttle_op_queue entirely.</p>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6514
2011-09-29T07:40:26Z
Josh Durgin
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>In Progress</i></li><li><strong>Target version</strong> changed from <i>v0.36</i> to <i>v0.37</i></li></ul><p>Happened again with tiobench. From teutholoyg:~teuthworker/archive/nightly_coverage_2011-09-29/814/teuthology.log:</p>
<pre>
2011-09-29T00:50:54.774 INFO:teuthology.task.workunit.client.0.err:+ for i in '`seq 1 10`'
2011-09-29T00:50:54.774 INFO:teuthology.task.workunit.client.0.err:+ tiotest -f 20 -t 10 -d . -T -c -D 20 -r 1000
2011-09-29T00:50:55.792 INFO:teuthology.task.workunit.client.0.err:Waiting write threads to finish...do_write_test: initial seek 0
2011-09-29T00:50:55.792 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.794 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.796 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.807 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.808 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.809 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.811 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.812 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:50:55.829 INFO:teuthology.task.workunit.client.0.err:do_write_test: initial seek 0
2011-09-29T00:51:04.610 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t)', in thread '0x7fd8e964d700'
2011-09-29T00:51:04.610 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: 672: FAILED assert(ob->last_commit_tid < tid)
2011-09-29T00:51:04.610 INFO:teuthology.task.cfuse.cfuse.0.err: ceph version 0.35-158-gbd91444 (commit:bd9144408d928cf92036166e20bcc2ef41819534)
2011-09-29T00:51:04.610 INFO:teuthology.task.cfuse.cfuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x64e22d]
2011-09-29T00:51:04.611 INFO:teuthology.task.cfuse.cfuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x66444a]
2011-09-29T00:51:04.611 INFO:teuthology.task.cfuse.cfuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xd8a) [0x6335da]
2011-09-29T00:51:04.611 INFO:teuthology.task.cfuse.cfuse.0.err: 4: (Client::ms_dispatch(Message*)+0x627) [0x4b47d7]
2011-09-29T00:51:04.611 INFO:teuthology.task.cfuse.cfuse.0.err: 5: (SimpleMessenger::dispatch_entry()+0xa0a) [0x530ada]
2011-09-29T00:51:04.611 INFO:teuthology.task.cfuse.cfuse.0.err: 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45aa1c]
2011-09-29T00:51:04.612 INFO:teuthology.task.cfuse.cfuse.0.err: 7: (Thread::_entry_func(void*)+0x12) [0x5229f2]
2011-09-29T00:51:04.612 INFO:teuthology.task.cfuse.cfuse.0.err: 8: (()+0x7971) [0x7fd8ed8c2971]
2011-09-29T00:51:04.612 INFO:teuthology.task.cfuse.cfuse.0.err: 9: (clone()+0x6d) [0x7fd8ec35692d]
</pre>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6592
2011-10-09T20:39:54Z
Sage Weil
sage@newdream.net
<ul><li><strong>Target version</strong> changed from <i>v0.37</i> to <i>v0.38</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6662
2011-10-10T16:44:45Z
Sage Weil
sage@newdream.net
<ul></ul><p>Reproduced. This time ordering is violated by a race between scrub completion doing take_waiters(waiting_for_active) and a new request in _dispatch waiting for the pg lock.</p>
<p>That same race will apply for any callers of take_waiters under pg->lock where we care about ordering. the ops need to be requeued in front of what _dispatch is chewing on.</p>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=6668
2011-10-11T13:41:12Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7308
2011-11-21T21:56:24Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>12</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li><li><strong>Target version</strong> changed from <i>v0.38</i> to <i>v0.39</i></li></ul><p>happened again on /var/lib/teuthworker/archive/nightly_coverage_2011-11-21-b/2818</p>
<p>This may be the same root cause as <a class="issue tracker-1 status-3 priority-5 priority-high3 closed" title="Bug: osd: failed assert(pending_ops > 0) in dequeue_op (Resolved)" href="https://tracker.ceph.com/issues/1727">#1727</a> tho? maybe? hmm..</p>
<pre>
2011-11-21T13:59:04.426 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t)', in thread '7f0764dff700'
2011-11-21T13:59:04.426 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: 672: FAILED assert(ob->last_commit_tid < tid)
2011-11-21T13:59:04.438 INFO:teuthology.task.cfuse.cfuse.0.err: ceph version 0.38-207-g88963a1 (commit:88963a181adb8fadb3df5a4a6c8d9d7fd56ad972)
2011-11-21T13:59:04.438 INFO:teuthology.task.cfuse.cfuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x67651d]
2011-11-21T13:59:04.438 INFO:teuthology.task.cfuse.cfuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x68d53a]
2011-11-21T13:59:04.439 INFO:teuthology.task.cfuse.cfuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe2e) [0x65e51e]
2011-11-21T13:59:04.439 INFO:teuthology.task.cfuse.cfuse.0.err: 4: (Client::ms_dispatch(Message*)+0x677) [0x4b8bd7]
2011-11-21T13:59:04.439 INFO:teuthology.task.cfuse.cfuse.0.err: 5: (SimpleMessenger::dispatch_entry()+0x99a) [0x52fd2a]
2011-11-21T13:59:04.439 INFO:teuthology.task.cfuse.cfuse.0.err: 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45e1fc]
2011-11-21T13:59:04.439 INFO:teuthology.task.cfuse.cfuse.0.err: 7: (Thread::_entry_func(void*)+0x12) [0x52a872]
2011-11-21T13:59:04.440 INFO:teuthology.task.cfuse.cfuse.0.err: 8: (()+0x7971) [0x7f0769279971]
2011-11-21T13:59:04.440 INFO:teuthology.task.cfuse.cfuse.0.err: 9: (clone()+0x6d) [0x7f0767b0892d]
2011-11-21T13:59:04.440 INFO:teuthology.task.cfuse.cfuse.0.err: ceph version 0.38-207-g88963a1 (commit:88963a181adb8fadb3df5a4a6c8d9d7fd56ad972)
2011-11-21T13:59:04.440 INFO:teuthology.task.cfuse.cfuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x67651d]
2011-11-21T13:59:04.440 INFO:teuthology.task.cfuse.cfuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x68d53a]
2011-11-21T13:59:04.441 INFO:teuthology.task.cfuse.cfuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe2e) [0x65e51e]
2011-11-21T13:59:04.441 INFO:teuthology.task.cfuse.cfuse.0.err: 4: (Client::ms_dispatch(Message*)+0x677) [0x4b8bd7]
2011-11-21T13:59:04.441 INFO:teuthology.task.cfuse.cfuse.0.err: 5: (SimpleMessenger::dispatch_entry()+0x99a) [0x52fd2a]
2011-11-21T13:59:04.441 INFO:teuthology.task.cfuse.cfuse.0.err: 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45e1fc]
2011-11-21T13:59:04.441 INFO:teuthology.task.cfuse.cfuse.0.err: 7: (Thread::_entry_func(void*)+0x12) [0x52a872]
2011-11-21T13:59:04.442 INFO:teuthology.task.cfuse.cfuse.0.err: 8: (()+0x7971) [0x7f0769279971]
2011-11-21T13:59:04.442 INFO:teuthology.task.cfuse.cfuse.0.err: 9: (clone()+0x6d) [0x7f0767b0892d]
</pre>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7311
2011-11-21T22:08:08Z
Sage Weil
sage@newdream.net
<ul><li><strong>translation missing: en.field_position</strong> deleted (<del><i>112</i></del>)</li><li><strong>translation missing: en.field_position</strong> set to <i>6</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7314
2011-11-22T23:33:31Z
Sage Weil
sage@newdream.net
<ul><li><strong>Assignee</strong> deleted (<del><i>Sage Weil</i></del>)</li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7340
2011-11-28T09:29:29Z
Sage Weil
sage@newdream.net
<ul><li><strong>Target version</strong> changed from <i>v0.39</i> to <i>v0.40</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7403
2011-11-30T10:03:25Z
Greg Farnum
gfarnum@redhat.com
<ul></ul><p>This didn't turn out to have anything to do with <a class="issue tracker-1 status-3 priority-5 priority-high3 closed" title="Bug: osd: failed assert(pending_ops > 0) in dequeue_op (Resolved)" href="https://tracker.ceph.com/issues/1727">#1727</a>, did it?</p>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7474
2011-12-02T15:43:34Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>7</i></li><li><strong>Assignee</strong> set to <i>Sage Weil</i></li></ul><p>If we're lucky this was caused by taking waiters improperly, which Sage fixed in <a class="changeset" title="osd: safely requeue waiting_for_ondisk waiters on_role_change This could conceivably cause the r..." href="https://tracker.ceph.com/projects/ceph/repository/revisions/8bbe576cab9ecdbfea939ad3d78664b638ec2f6a">8bbe576cab9ecdbfea939ad3d78664b638ec2f6a</a></p>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7522
2011-12-06T14:00:14Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>7</i> to <i>Resolved</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7658
2011-12-19T10:00:16Z
Josh Durgin
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>New</i></li></ul><p>Happened again in teuthology:~<a class="email" href="mailto:teuthworker/archive/nightly_coverage_2011-12-15-b/4357/remote/ubuntu@sepia63.ceph.dreamhost.com">teuthworker/archive/nightly_coverage_2011-12-15-b/4357/remote/ubuntu@sepia63.ceph.dreamhost.com</a>/log/client.0.log.gz</p>
<p>Build is in teuthology:~teuthworker/archive/ceph_builds/nightly_coverage_2011-12-15-b.tgz</p>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7711
2011-12-29T09:53:52Z
Sage Weil
sage@newdream.net
<ul><li><strong>Assignee</strong> deleted (<del><i>Sage Weil</i></del>)</li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7717
2011-12-29T10:33:43Z
Josh Durgin
<ul></ul><p>Happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-12-28-b/5258/teuthology.log.</p>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7720
2011-12-29T11:05:42Z
Sage Weil
sage@newdream.net
<ul><li><strong>translation missing: en.field_position</strong> deleted (<del><i>36</i></del>)</li><li><strong>translation missing: en.field_position</strong> set to <i>30</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7852
2012-01-05T10:14:50Z
Sage Weil
sage@newdream.net
<ul><li><strong>Priority</strong> changed from <i>High</i> to <i>Urgent</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7865
2012-01-05T17:08:04Z
Josh Durgin
<ul></ul><p>It looks like there are at least two bugs here: one client side, and one server side. I'm reproducing with more logs for the server side, but I think I've figued out the client side bug:</p>
<p>When the objecter gets an osd_op_reply, it ignores it iff the connection of the message is not the connection on which it was sent. When osds are thrashing, there's a race for ops to be ignored. Consider two writes to the same object:</p>
<p>1. tid 1 <del>> osd 0<br />2. tid 2 -> osd 0<br />3. map change! primary is now osd 1<br />3. ack tid 1 <</del> osd 0 (ignored, because op->session->con is now to osd 1<br />4. map change! primary is now osd 0 again<br />5. ack tid 2 <- osd 0 ! problem - op->session->con is to osd 0 again, so this ack is not ignored</p>
<p>Here are the logs of two requests where this happened:</p>
<pre>
2012-01-05 15:40:33.091799 7f650e0e1760 client.4108.objecter recalc_op_target tid 5747 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:40:33.091819 7f650e0e1760 client.4108.objecter op_submit oid sepia8721721-76 @0 [write 1787555~34265] tid 5747 osd.5
2012-01-05 15:40:33.091829 7f650e0e1760 client.4108.objecter send_op 5747 to osd.5
2012-01-05 15:40:33.091850 7f650e0e1760 -- 10.3.14.214:0/1021721 --> 10.3.14.136:6800/2349 -- osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513) v1 -- ?+0 0x683010 con 0x67f6d0
2012-01-05 15:40:33.091866 7f650e0e1760 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513) v1 remote 10.3.14.136:6800/2349
2012-01-05 15:40:33.091925 7f65085ab700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).writer encoding 1489 0x683010 osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513) v1
2012-01-05 15:40:34.421184 7f65084aa700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).reader got message 1089 0x7f64fc08b990 osd_op_reply(5747 sepia8721721-76 [write 1787555~34265] ondisk = 0) v2
2012-01-05 15:41:03.742356 7f65091b0700 client.4108.objecter tid 5747 on osd.5 is laggy
2012-01-05 15:41:03.752547 7f650b2b5700 client.4108.objecter recalc_op_target tid 5747 pgid 0.7bedc513 acting [0,2]
2012-01-05 15:41:03.755626 7f650b2b5700 client.4108.objecter send_op 5747 to osd.0
2012-01-05 15:41:03.755650 7f650b2b5700 -- 10.3.14.214:0/1021721 --> 10.3.14.207:6800/3781 -- osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1 -- ?+0 0x7f650430e050 con 0x7f65040385c0
2012-01-05 15:41:03.755666 7f650b2b5700 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1 remote 10.3.14.207:6800/3781
2012-01-05 15:41:03.755735 7f65083a9700 -- 10.3.14.214:0/1021721 >> 10.3.14.207:6800/3781 pipe(0x7f6504038350 sd=11 pgs=5 cs=1 l=1).writer encoding 1363 0x7f650430e050 osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1
2012-01-05 15:41:03.776711 7f650b2b5700 -- 10.3.14.214:0/1021721 <== osd.5 10.3.14.136:6800/2349 1089 ==== osd_op_reply(5747 sepia8721721-76 [write 1787555~34265] ondisk = 0) v2 ==== 106+0+0 (1121950433 0 0) 0x7f64fc08b990 con 0x67f6d0
2012-01-05 15:41:03.776748 7f650b2b5700 client.4108.objecter handle_osd_op_reply 5747 ondisk v 116'163 in 0.7bedc513
2012-01-05 15:41:03.778104 7f650b2b5700 client.4108.objecter recalc_op_target tid 5747 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:41:03.780673 7f650b2b5700 client.4108.objecter recalc_op_target tid 5747 pgid 0.7bedc513 acting [0,2]
2012-01-05 15:41:03.786047 7f650b2b5700 client.4108.objecter recalc_op_target tid 5747 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:41:03.789286 7f650b2b5700 client.4108.objecter send_op 5747 to osd.5
2012-01-05 15:41:03.789313 7f650b2b5700 -- 10.3.14.214:0/1021721 --> 10.3.14.136:6800/2349 -- osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1 -- ?+0 0x7f64fc08b620 con 0x67f6d0
2012-01-05 15:41:03.789332 7f650b2b5700 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1 remote 10.3.14.136:6800/2349
2012-01-05 15:41:03.789436 7f65085ab700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).writer encoding 1602 0x7f64fc08b620 osd_op(client.4108.0:5747 sepia8721721-76 [write 1787555~34265] 0.7bedc513 RETRY) v1
2012-01-05 15:41:03.792539 7f65084aa700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).reader got message 1186 0x7f64fc08b620 osd_op_reply(5747 sepia8721721-76 [write 1787555~34265] ondisk = 0) v2
2012-01-05 15:41:03.846297 7f650b2b5700 -- 10.3.14.214:0/1021721 <== osd.5 10.3.14.136:6800/2349 1186 ==== osd_op_reply(5747 sepia8721721-76 [write 1787555~34265] ondisk = 0) v2 ==== 106+0+0 (3803515885 0 0) 0x7f64fc08b620 con 0x67f6d0
2012-01-05 15:41:03.846317 7f650b2b5700 client.4108.objecter handle_osd_op_reply 5747 ondisk v 116'163 in 0.7bedc513
</pre>
<pre>
2012-01-05 15:40:33.349909 7f650e0e1760 client.4108.objecter recalc_op_target tid 5795 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:40:33.349924 7f650e0e1760 client.4108.objecter op_submit oid sepia8721721-76 @0 [write 3772767~4901] tid 5795 osd.5
2012-01-05 15:40:33.349933 7f650e0e1760 client.4108.objecter send_op 5795 to osd.5
2012-01-05 15:40:33.349952 7f650e0e1760 -- 10.3.14.214:0/1021721 --> 10.3.14.136:6800/2349 -- osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513) v1 -- ?+0 0xc4b8b0 con 0x67f6d0
2012-01-05 15:40:33.349967 7f650e0e1760 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513) v1 remote 10.3.14.136:6800/2349
2012-01-05 15:40:33.350027 7f65085ab700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).writer encoding 1537 0xc4b8b0 osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513) v1
2012-01-05 15:40:34.431497 7f65084aa700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).reader got message 1137 0x7f64fc02fd30 osd_op_reply(5795 sepia8721721-76 [write 3772767~4901] ondisk = 0) v2
2012-01-05 15:41:03.742803 7f65091b0700 client.4108.objecter tid 5795 on osd.5 is laggy
2012-01-05 15:41:03.753316 7f650b2b5700 client.4108.objecter recalc_op_target tid 5795 pgid 0.7bedc513 acting [0,2]
2012-01-05 15:41:03.760375 7f650b2b5700 client.4108.objecter send_op 5795 to osd.0
2012-01-05 15:41:03.760443 7f650b2b5700 -- 10.3.14.214:0/1021721 --> 10.3.14.207:6800/3781 -- osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1 -- ?+0 0x73d2b0 con 0x7f65040385c0
2012-01-05 15:41:03.760471 7f650b2b5700 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1 remote 10.3.14.207:6800/3781
2012-01-05 15:41:03.774484 7f65083a9700 -- 10.3.14.214:0/1021721 >> 10.3.14.207:6800/3781 pipe(0x7f6504038350 sd=11 pgs=5 cs=1 l=1).writer encoding 1411 0x73d2b0 osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1
2012-01-05 15:41:03.778788 7f650b2b5700 client.4108.objecter recalc_op_target tid 5795 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:41:03.781393 7f650b2b5700 client.4108.objecter recalc_op_target tid 5795 pgid 0.7bedc513 acting [0,2]
2012-01-05 15:41:03.787665 7f650b2b5700 client.4108.objecter recalc_op_target tid 5795 pgid 0.7bedc513 acting [5,0]
2012-01-05 15:41:03.800543 7f650b2b5700 client.4108.objecter send_op 5795 to osd.5
2012-01-05 15:41:03.800584 7f650b2b5700 -- 10.3.14.214:0/1021721 --> 10.3.14.136:6800/2349 -- osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1 -- ?+0 0xc315d0 con 0x67f6d0
2012-01-05 15:41:03.800605 7f650b2b5700 -- 10.3.14.214:0/1021721 submit_message osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1 remote 10.3.14.136:6800/2349
2012-01-05 15:41:03.810690 7f65085ab700 -- 10.3.14.214:0/1021721 >> 10.3.14.136:6800/2349 pipe(0x67fd90 sd=10 pgs=6 cs=1 l=1).writer encoding 1650 0xc315d0 osd_op(client.4108.0:5795 sepia8721721-76 [write 3772767~4901] 0.7bedc513 RETRY) v1
2012-01-05 15:41:03.837877 7f650b2b5700 -- 10.3.14.214:0/1021721 <== osd.5 10.3.14.136:6800/2349 1137 ==== osd_op_reply(5795 sepia8721721-76 [write 3772767~4901] ondisk = 0) v2 ==== 106+0+0 (2563624606 0 0) 0x7f64fc02fd30 con 0x67f6d0
2012-01-05 15:41:03.837897 7f650b2b5700 client.4108.objecter handle_osd_op_reply 5795 ondisk v 116'211 in 0.7bedc513
</pre>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7903
2012-01-06T12:21:22Z
Sage Weil
sage@newdream.net
<ul><li><strong>Assignee</strong> set to <i>Josh Durgin</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7907
2012-01-06T15:16:35Z
Josh Durgin
<ul></ul><p>One server side cause of this (I'm not sure there aren't more) is duplicate request tracking. Currently requests are immediately acked if they are determined to be duplicates by looking in the pg log, which is trimmed aggressively. This triggers out of order acks if requests are retried, and the first one is no longer in the pg log, so it is applied again. The second request is still in the log, so it is immediately acked as a duplicate, before the first can finish.</p>
<p>This is a problem for consistency for non-idempotent operations as well. The short term fix is to make pg log trimming less aggressive. I've opened <a class="issue tracker-2 status-6 priority-4 priority-default closed" title="Feature: osd: detect duplicate requests by tracking per-client last_acked_tid instead of using pg log (Rejected)" href="https://tracker.ceph.com/issues/1895">#1895</a> for the long term fix.</p>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7932
2012-01-09T14:45:18Z
Sage Weil
sage@newdream.net
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Normal</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=7965
2012-01-09T14:50:06Z
Sage Weil
sage@newdream.net
<ul><li><strong>Target version</strong> deleted (<del><i>v0.40</i></del>)</li><li><strong>translation missing: en.field_position</strong> deleted (<del><i>51</i></del>)</li><li><strong>translation missing: en.field_position</strong> set to <i>1</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=8029
2012-01-09T15:20:44Z
Sage Weil
sage@newdream.net
<ul><li><strong>Target version</strong> set to <i>v0.41</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=8350
2012-01-29T17:51:56Z
Sage Weil
sage@newdream.net
<ul></ul><pre>
2012-01-29T12:54:12.196 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t, tid_t)' thread 7f850acc4700 time 2012-01-29 12:31:51.372320
2012-01-29T12:54:12.196 INFO:teuthology.task.cfuse.cfuse.0.err:osdc/ObjectCacher.cc: 672: FAILED assert(ob->last_commit_tid < tid)
2012-01-29T12:54:12.197 INFO:teuthology.task.cfuse.cfuse.0.err: ceph version 0.41-15-g9bb3875 (commit:9bb3875b1671b89a74895f3c97a27845867b3941)
2012-01-29T12:54:12.197 INFO:teuthology.task.cfuse.cfuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x68450d]
2012-01-29T12:54:12.197 INFO:teuthology.task.cfuse.cfuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x69b64a]
2012-01-29T12:54:12.197 INFO:teuthology.task.cfuse.cfuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x173f) [0x66c5cf]
2012-01-29T12:54:12.197 INFO:teuthology.task.cfuse.cfuse.0.err: 4: (Client::ms_dispatch(Message*)+0x677) [0x4baaf7]
2012-01-29T12:54:12.198 INFO:teuthology.task.cfuse.cfuse.0.err: 5: (SimpleMessenger::dispatch_entry()+0xa1a) [0x53464a]
2012-01-29T12:54:12.198 INFO:teuthology.task.cfuse.cfuse.0.err: 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45fd2c]
2012-01-29T12:54:12.198 INFO:teuthology.task.cfuse.cfuse.0.err: 7: (Thread::_entry_func(void*)+0x12) [0x52eb82]
2012-01-29T12:54:12.198 INFO:teuthology.task.cfuse.cfuse.0.err: 8: (()+0x7971) [0x7f850f13e971]
2012-01-29T12:54:12.198 INFO:teuthology.task.cfuse.cfuse.0.err: 9: (clone()+0x6d) [0x7f850d9cd92d]
2012-01-29T12:54:12.199 INFO:teuthology.task.cfuse.cfuse.0.err: ceph version 0.41-15-g9bb3875 (commit:9bb3875b1671b89a74895f3c97a27845867b3941)
2012-01-29T12:54:12.199 INFO:teuthology.task.cfuse.cfuse.0.err: 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long, unsigned long)+0xc7d) [0x68450d]
2012-01-29T12:54:12.199 INFO:teuthology.task.cfuse.cfuse.0.err: 2: (ObjectCacher::C_WriteCommit::finish(int)+0x7a) [0x69b64a]
2012-01-29T12:54:12.199 INFO:teuthology.task.cfuse.cfuse.0.err: 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x173f) [0x66c5cf]
2012-01-29T12:54:12.199 INFO:teuthology.task.cfuse.cfuse.0.err: 4: (Client::ms_dispatch(Message*)+0x677) [0x4baaf7]
2012-01-29T12:54:12.200 INFO:teuthology.task.cfuse.cfuse.0.err: 5: (SimpleMessenger::dispatch_entry()+0xa1a) [0x53464a]
2012-01-29T12:54:12.200 INFO:teuthology.task.cfuse.cfuse.0.err: 6: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x45fd2c]
2012-01-29T12:54:12.200 INFO:teuthology.task.cfuse.cfuse.0.err: 7: (Thread::_entry_func(void*)+0x12) [0x52eb82]
2012-01-29T12:54:12.200 INFO:teuthology.task.cfuse.cfuse.0.err: 8: (()+0x7971) [0x7f850f13e971]
2012-01-29T12:54:12.200 INFO:teuthology.task.cfuse.cfuse.0.err: 9: (clone()+0x6d) [0x7f850d9cd92d]
</pre><br />on ubuntu@teuthology:/a/nightly_coverage_2012-01-29-b/9647
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=8361
2012-01-30T09:49:19Z
Sage Weil
sage@newdream.net
<ul><li><strong>Target version</strong> changed from <i>v0.41</i> to <i>v0.42</i></li></ul>
Ceph - Bug #1490: cfuse assert failure: assert(ob->last_commit_tid < tid)
https://tracker.ceph.com/issues/1490?journal_id=8430
2012-02-06T10:00:56Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li></ul>