https://tracker.ceph.com/ https://tracker.ceph.com/favicon.ico 2015-09-22T14:54:20Z Ceph
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=58978 2015-09-22T14:54:20Z Samuel Just sjust@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=59143 2015-09-25T02:44:12Z Sage Weil sage@newdream.net
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/59143/diff?detail_id=57059">diff</a>)</li><li><strong>Status</strong> changed from <i>New</i> to <i>Need More Info</i></li></ul><p>Is this the only OSD that crashed? Have you seen it more than once?</p>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=59160 2015-09-25T06:09:35Z lu shi shi.lu@h3c.com
<ul></ul><p>Yes, only one OSD crashed, and I have seen it just once. But someone else on ceph-devel has also encountered this problem.</p>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=59161 2015-09-25T06:12:43Z Jianhui Yuan zuiwanyuan@gmail.com
<ul></ul><p>In my environment, many OSDs crash because of it.<br />I am using the async messenger. It seems like a push_op is wrongly sent to the replica.</p>
<pre>
-1842> 2015-09-25 12:06:20.467166 7f1463a03700 1 -- 172.16.50.7:6829/30641 >> 172.16.50.10:6821/32703 conn(0x1eb93000 sd=263 :-1 s=STATE_OPEN pgs=455 cs=1 l=0). == tx == 0x1efbda00 MOSDPGPushReply(5.55d 94505 [PushReplyOp(85ea9d5d/rbd_data.8f19f2ae8944a.000000000002e950/head//5)]) v2
-1290> 2015-09-25 12:06:20.474333 7f146ae48700 1 -- 172.16.50.7:6829/30641 >> 172.16.50.10:6821/32703 conn(0x1eb93000 sd=263 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=455 cs=1 l=0). == rx == 0x1f665800 MOSDPGPush(5.55d 94505 [PushOp(85ea9d5d/rbd_data.8f19f2ae8944a.000000000002e950/head//5, version: 90491'484667, data_included: [8388608~3014656], data_size: 3014656, omap_header_size: 0, omap_entries_size: 0, attrset_size: 0, recovery_info: ObjectRecoveryInfo(85ea9d5d/rbd_data.8f19f2ae8944a.000000000002e950/head//5@90491'484667, copy_subset: [0~11403264], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:11403264, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(!first, data_recovered_to:8388608, data_complete:false, omap_recovered_to:, omap_complete:true))]) v2
-1204> 2015-09-25 12:06:20.474705 7f1459db6700 10 osd.10 pg_epoch: 94505 pg[5.55d( v 94488'484671 lc 94488'484670 (75232'481654,94488'484671] local-les=94505 n=130 ec=288 les/c 94505/94464 94504/94504/93603) [38,10] r=1 lpr=94504 pi=93491-94503/25 luod=0'0 crt=94488'484671 lcod 0'0 active m=1] on_local_recover: 85ea9d5d/rbd_data.8f19f2ae8944a.000000000002e950/head//5
</pre>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=59429 2015-10-01T13:21:37Z Sage Weil sage@newdream.net
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>High</i></li></ul><p>This sounds like a side effect of an async msgr bug, but without a full log it is hard to say.</p>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=59961 2015-10-10T08:31:18Z lu shi shi.lu@h3c.com
<ul></ul><p><a class="external" href="https://github.com/s09816/kk/blob/master/ceph-osd.17.log.5">https://github.com/s09816/kk/blob/master/ceph-osd.17.log.5</a><br />I have put the full log at this address.</p>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=60137 2015-10-13T14:51:21Z Sage Weil sage@newdream.net
<ul></ul><p>lu shi wrote:</p>
<blockquote>
<p><a class="external" href="https://github.com/s09816/kk/blob/master/ceph-osd.17.log.5">https://github.com/s09816/kk/blob/master/ceph-osd.17.log.5</a><br />I have put the full log at this address.</p>
</blockquote>
<p>Need to reproduce with debug osd = 20, debug ms = 1.</p>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=60504 2015-10-22T03:31:02Z shawn chen cxwshawn@gmail.com
<ul></ul><p>Actually, this problem is not too hard to reproduce. The cause is that the object info saved on the file system is inconsistent with the pglog, so the primary may send a lower-version object to replicas.</p>
<p>See <a class="external" href="https://github.com/ceph/ceph/pull/6345">https://github.com/ceph/ceph/pull/6345</a></p>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=60811 2015-10-28T06:25:03Z Loïc Dachary loic@dachary.org
<ul><li><strong>Status</strong> changed from <i>Need More Info</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>Zhiqiang Wang</i></li></ul>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=60968 2015-11-03T09:40:21Z Kefu Chai tchaikov@gmail.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li><li><strong>Assignee</strong> deleted (<del><i>Zhiqiang Wang</i></del>)</li></ul>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=61691 2015-11-17T15:34:56Z Sage Weil sage@newdream.net
<ul><li><strong>Priority</strong> changed from <i>High</i> to <i>Urgent</i></li></ul>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=62266 2015-12-01T15:22:53Z Sage Weil sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>17</i></li></ul>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=77376 2016-08-25T16:49:14Z Samuel Just sjust@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Low</i></li></ul>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=89114 2017-04-12T17:42:52Z Sage Weil sage@newdream.net
<ul></ul><p>Is this still a problem?</p>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=92719 2017-06-13T22:00:09Z Greg Farnum gfarnum@redhat.com
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>RADOS</i></li></ul><p>I don't really get how the AsyncMessenger could have caused this issue...?</p>
RADOS - Bug #13111: replicatedPG:the assert occurs in the fuction ReplicatedPG::on_local_recover. https://tracker.ceph.com/issues/13111?journal_id=153599 2019-12-05T21:43:57Z Patrick Donnelly pdonnell@redhat.com
<ul><li><strong>Status</strong> changed from <i>17</i> to <i>Fix Under Review</i></li></ul>