Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=507752015-04-20T04:01:05ZSamuel Justsjust@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>7</i></li></ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=507762015-04-20T04:03:58ZSamuel Justsjust@redhat.com
<ul><li><strong>Status</strong> changed from <i>7</i> to <i>12</i></li></ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=508922015-04-21T20:10:09ZSage Weilsage@newdream.net
<ul></ul><p>It seems like the safest option here would be to have users manually run ceph-objectstore-tool remove.</p>
<p>We could make the OSD automatically delete PGs when the map is ancient, but that seems dangerous to me since an epoch-related bug could trigger deletion.</p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=509892015-04-22T22:57:59ZKen Dreyerkdreyer@redhat.com
<ul></ul><p>Was there a patch that went in for this?</p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=510052015-04-23T08:59:11ZSamuel Justsjust@redhat.com
<ul></ul><p>Yeah, I tried to make it remove the pg automatically, but it turned out to be complicated. Instead, it'll just skip the pg and complain into the log that the user should manually clean up the pg at some point in the future.</p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=510692015-04-26T14:44:47ZTuomas Juntunentuomas.juntunen@databasement.fi
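<p>The skip-and-warn behaviour Samuel Just describes above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name <code>load_pgs_sketch</code>, its inputs, and the exact condition (a PG whose referenced osdmap epoch is no longer stored on the OSD) are assumptions made for the example.</p>

```python
import logging
from typing import Dict, List, Set, Tuple

log = logging.getLogger("load_pgs_sketch")


def load_pgs_sketch(pg_epochs: Dict[str, int],
                    available_epochs: Set[int]) -> Tuple[List[str], List[str]]:
    """Split PGs into those that can be loaded and those that are skipped.

    pg_epochs: hypothetical mapping of PG id -> osdmap epoch the PG references.
    available_epochs: hypothetical set of osdmap epochs still stored on the OSD.
    """
    loaded: List[str] = []
    skipped: List[str] = []
    for pgid, epoch in sorted(pg_epochs.items()):
        if epoch not in available_epochs:
            # Nothing is deleted automatically: the PG is skipped and the
            # operator is asked to clean it up by hand later.
            log.warning(
                "skipping pg %s: osdmap epoch %d is not available; "
                "clean it up manually with ceph-objectstore-tool",
                pgid, epoch,
            )
            skipped.append(pgid)
            continue
        loaded.append(pgid)
    return loaded, skipped
```

<p>Skipping instead of deleting keeps the concern raised earlier in the thread in view: an epoch-related bug would then produce a log message rather than data loss.</p>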
<ul></ul><p>Could someone describe the process for using ceph-objectstore-tool remove?</p>
<p>The advice to get the PGs and compare them to the invalid PGs in 'ceph osd pool ls detail' seems too vague.</p>
<p>T</p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=512842015-05-04T23:37:59ZSamuel Justsjust@redhat.com
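<p>One possible reading of the comparison Tuomas Juntunen asks about: a PG id has the form <code>pool.seq</code> (for example <code>3.1f</code>), so PGs stored on an OSD can be checked against the pool ids that 'ceph osd pool ls detail' still reports. The sketch below assumes the operator has already collected both lists by hand; the helper names are hypothetical, and treating "pool no longer exists" as the meaning of "invalid" is an assumption, not something the thread confirms.</p>

```python
from typing import Iterable, List, Set


def pool_of(pgid: str) -> int:
    """Return the pool id of a PG id such as '3.1f' (the part before the dot)."""
    return int(pgid.split(".", 1)[0])


def orphaned_pgs(pgids_on_osd: Iterable[str], existing_pools: Set[int]) -> List[str]:
    """Return the PGs on the OSD whose pool id is not among the existing pools.

    pgids_on_osd: PG ids found on the OSD (assumed to be gathered by the operator).
    existing_pools: pool ids currently listed by 'ceph osd pool ls detail'.
    """
    return [pg for pg in pgids_on_osd if pool_of(pg) not in existing_pools]


# Example: pools 0 and 2 still exist, pool 5 was deleted.
candidates = orphaned_pgs(["0.1a", "5.3", "2.7f"], {0, 2})
print(candidates)  # → ['5.3']
```

<p>The result is only a list of candidates to review before any manual removal with ceph-objectstore-tool.</p>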
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>Pending Backport</i></li></ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=512862015-05-05T00:28:14ZKen Dreyerkdreyer@redhat.com
<ul></ul><p>Patch that went into master: <a class="external" href="https://github.com/ceph/ceph/pull/4539">https://github.com/ceph/ceph/pull/4539</a></p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=512922015-05-05T08:58:44ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Severity</strong> changed from <i>3 - minor</i> to <i>1 - critical</i></li></ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=513362015-05-06T02:34:07ZXinxin Shuxinxin.shu5040@gmail.com
<ul></ul><ul>
<li>firefly backport <a class="external" href="https://github.com/ceph/ceph/pull/4556">https://github.com/ceph/ceph/pull/4556</a></li>
</ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=513402015-05-06T08:47:27ZLoïc Dacharyloic@dachary.org
<ul></ul><ul>
<li>hammer backport <a class="external" href="https://github.com/ceph/ceph/pull/4559">https://github.com/ceph/ceph/pull/4559</a></li>
</ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=514102015-05-06T16:56:36ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Regression</strong> set to <i>No</i></li></ul><ul>
<li>ceph-qa-suite master <a class="external" href="https://github.com/ceph/ceph-qa-suite/pull/428">https://github.com/ceph/ceph-qa-suite/pull/428</a></li>
</ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=514122015-05-06T17:01:01ZLoïc Dacharyloic@dachary.org
<ul></ul><ul>
<li>ceph-qa-suite hammer backport <a class="external" href="https://github.com/ceph/ceph-qa-suite/pull/432">https://github.com/ceph/ceph-qa-suite/pull/432</a></li>
</ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=514502015-05-07T10:09:38ZLoïc Dacharyloic@dachary.org
<ul></ul><ul>
<li>ceph-qa-suite firefly backport <a class="external" href="https://github.com/ceph/ceph-qa-suite/pull/435">https://github.com/ceph/ceph-qa-suite/pull/435</a></li>
</ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=514742015-05-07T13:18:33ZLoïc Dacharyloic@dachary.org
<ul></ul><p>Since the task installs firefly to reproduce the problem, it will become a no-op as soon as the bug is fixed in v0.80.10+. It should install v0.80.9 instead of firefly.</p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=515492015-05-08T15:25:39ZLoïc Dacharyloic@dachary.org
<ul></ul><pre>
&lt;sjustwork&gt; loicd: http://tracker.ceph.com/issues/11429 I think the task installs v0.80.8
&lt;sjustwork&gt; not firefly
&lt;loicd&gt; sjustwork: ah cool, my mistake
&lt;loicd&gt; sjustwork: I wonder why I thought it installed firefly... sorry for the noise
&lt;sjustwork&gt; ok
&lt;loicd&gt; I probably read print: '**** done installing firefly'
&lt;loicd&gt; instead of the line before
&lt;loicd&gt; branch: v0.80.8
&lt;loicd&gt; oh well
</pre> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=519542015-05-15T15:04:28ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=524112015-05-26T06:00:52ZIrek Fasikhovmalmyzh@gmail.com
<ul><li><strong>File</strong> <i>ceph-osd.25.log.tar.gz</i> added</li></ul><p>Hi Loic, Samuel,</p>
<p>The problem remains despite the patch.<br />Please see the attached log file.</p>
<pre>
[root@ceph03p24 ceph]# ceph -v
ceph version 0.94.1-116-g63832d4 (63832d4039889b6b704b88b86eaba4aadcfceb2e)
</pre>
<p>Configuration<br /><pre>
[osd]
osd journal size = 10000
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = rw,noatime,inode64,logbsize=256k,allocsize=1m
filestore xattr use omap = true
osd scrub load threshold = 2
osd recovery op priority = 2
osd max backfills = 1
osd recovery max active = 1
osd recovery threads = 1
osd crush update on start = false
osd recovery delay start = 5
osd snap trim sleep = 0.5
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
debug_objecter = 10/10
debug_ms = 10/10
debug_filestore = 10/10
debug_osd = 10/10
debug_journal = 10/10
</pre></p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=524132015-05-26T06:45:02ZIrek Fasikhovmalmyzh@gmail.com
<ul><li><strong>File</strong> <a href="/attachments/download/1720/ceph-osd.25.log.tar.gz">ceph-osd.25.log.tar.gz</a> added</li></ul><pre>
debug_objecter = 20/20
debug_ms = 20/20
debug_filestore = 20/20
debug_osd = 20/20
debug_journal = 20/20
</pre> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=524152015-05-26T07:00:02ZLoïc Dacharyloic@dachary.org
<ul><li><strong>File</strong> deleted (<del><i>ceph-osd.25.log.tar.gz</i></del>)</li></ul> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=524162015-05-26T07:04:08ZLoïc Dacharyloic@dachary.org
<ul></ul><p>From ceph-osd.25.log.tar.gz <br /><pre>
-5> 2015-05-26 09:42:54.997851 7f0fbfa34880 10 _load_class version success
-4> 2015-05-26 09:42:54.997862 7f0fbfa34880 20 osd.25 0 get_map 17735 - loading and decoding 0x4589200
-3> 2015-05-26 09:42:54.997869 7f0fbfa34880 15 filestore(/var/lib/ceph/osd/ceph-25) read meta/4e928679/osdmap.17735/0//-1 0~0
-2> 2015-05-26 09:42:54.997890 7f0fbfa34880 10 filestore(/var/lib/ceph/osd/ceph-25) error opening file /var/lib/ceph/osd/ceph-25/current/meta/DIR_9/DIR_7/osdmap.17735__0_4E928679__none with flags=2: (2) No such file or directory
-1> 2015-05-26 09:42:54.997899 7f0fbfa34880 10 filestore(/var/lib/ceph/osd/ceph-25) FileStore::read(meta/4e928679/osdmap.17735/0//-1) open error: (2) No such file or directory
0> 2015-05-26 09:42:54.999254 7f0fbfa34880 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f0fbfa34880 time 2015-05-26 09:42:54.997908
osd/OSD.h: 716: FAILED assert(ret)
ceph version 0.94.1-116-g63832d4 (63832d4039889b6b704b88b86eaba4aadcfceb2e)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc4e15]
2: (OSDService::get_map(unsigned int)+0x3f) [0x6ffa9f]
3: (OSD::init()+0x6b7) [0x6b8e17]
4: (main()+0x27f3) [0x643b63]
5: (__libc_start_main()+0xf5) [0x7f0fbcdd2af5]
6: /usr/bin/ceph-osd() [0x65cdc9]
NOTE: a copy of the executable, or `objdump -rdS &lt;executable&gt;` is needed to interpret this.
</pre></p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=524172015-05-26T07:09:30ZIrek Fasikhovmalmyzh@gmail.com
<ul></ul><p>Loic, this output was already produced with the patch applied: <a class="external" href="https://github.com/ceph/ceph/pull/4559">https://github.com/ceph/ceph/pull/4559</a><br />Or does the OSD need to be recreated to fix this?<br />Thanks</p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=524192015-05-26T07:27:20ZLoïc Dacharyloic@dachary.org
<ul></ul><p>What you're seeing is different: it's a failure to load the osdmap because the epoch in the OSD superblock refers to an osdmap that does not exist. This bug is about a failure to load an osdmap referenced from a resurrected pg. Would you mind creating another bug report with the same information?</p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=524202015-05-26T07:36:18ZIrek Fasikhovmalmyzh@gmail.com
<ul></ul><p>Loic Dachary wrote:</p>
<blockquote>
<p>What you're seeing is different: it's a failure to load the osdmap because the epoch in the OSD superblock refers to an osdmap that does not exist. This bug is about a failure to load an osdmap referenced from a resurrected pg. Would you mind creating another bug report with the same information?</p>
</blockquote>
<p>Of course, I will create one.</p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=524212015-05-26T07:44:04ZIrek Fasikhovmalmyzh@gmail.com
<ul></ul><p>Loic, there is already a report, and it is linked to this issue: <a class="issue tracker-1 status-10 priority-4 priority-default closed" title="Bug: OSD crash in OSDService::get_map (Duplicate)" href="https://tracker.ceph.com/issues/11373">#11373</a></p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=524232015-05-26T09:20:39ZLoïc Dacharyloic@dachary.org
<ul></ul><p>The bug <a class="issue tracker-1 status-10 priority-4 priority-default closed" title="Bug: OSD crash in OSDService::get_map (Duplicate)" href="https://tracker.ceph.com/issues/11373">#11373</a> is a duplicate of this one and the trace shows it crashes in load_pgs. Your problem seems slightly different: it does not involve load_pgs.</p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=524242015-05-26T09:33:34ZIrek Fasikhovmalmyzh@gmail.com
<ul></ul><p><a class="issue tracker-1 status-10 priority-4 priority-default closed" title="Bug: Failure to load the osdmap (Duplicate)" href="https://tracker.ceph.com/issues/11757">#11757</a></p> Ceph - Bug #11429: OSD::load_pgs: we need to handle the case where an upgrade from earlier versions which ignored non-existent pgs resurrects a pg with a prehistoric osdmaphttps://tracker.ceph.com/issues/11429?journal_id=530522015-06-03T05:54:37ZSrikanth Madugundimsrikant1999@gmail.com
<ul></ul><p>We recently started seeing this crash in some of our OSDs. We applied the patch to firefly, but it did not fix the crash.</p>
<pre><code>-5> 2015-06-03 05:31:25.986906 7f54fc5ee780 10 register_cxx_method kvs.create_with_omap flags 2 0x7f54ee423000
 -4> 2015-06-03 05:31:25.986908 7f54fc5ee780 10 register_cxx_method kvs.omap_remove flags 2 0x7f54ee422110
 -3> 2015-06-03 05:31:25.986909 7f54fc5ee780 10 register_cxx_method kvs.maybe_read_for_balance flags 1 0x7f54ee422820
 -2> 2015-06-03 05:31:25.986911 7f54fc5ee780 10 _load_class kvs success
 -1> 2015-06-03 05:31:25.986926 7f54fc5ee780 20 osd.61 0 get_map 11487 - loading and decoding 0x219f000
 0> 2015-06-03 05:31:25.987927 7f54fc5ee780 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f54fc5ee780 time 2015-06-03 05:31:25.986975
osd/OSD.h: 634: FAILED assert(ret)</code></pre>
<pre><code>ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
 1: (OSDService::get_map(unsigned int)+0x3f) [0x68e86f]
 2: (OSD::init()+0x2259) [0x64e529]
 3: (main()+0x35aa) [0x5f991a]
 4: (__libc_start_main()+0xfd) [0x3f3281ed5d]
 5: /home/y/bin64/ceph-osd() [0x5f5fb9]
 NOTE: a copy of the executable, or `objdump -rdS &lt;executable&gt;` is needed to interpret this.</code></pre>
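<p>Earlier in the thread, Loïc Dachary separates two cases by whether the trace goes through load_pgs: this bug crashes in load_pgs, while a <code>get_map</code> assert without load_pgs points to a different problem (see #11757). A rough way to apply that criterion to a backtrace is sketched below. The function name and return labels are invented for the example, and this is only a heuristic: an inlined or stripped frame would hide load_pgs from the trace.</p>

```python
def crash_path(backtrace: str) -> str:
    """Classify a ceph-osd backtrace by the criterion discussed in the thread.

    Returns one of three hypothetical labels:
      'unrelated' - the trace does not contain OSDService::get_map at all
      'load_pgs'  - get_map was reached through OSD::load_pgs (this bug)
      'other'     - get_map failed without load_pgs in the trace
    """
    if "OSDService::get_map" not in backtrace:
        return "unrelated"
    if "OSD::load_pgs" in backtrace:
        return "load_pgs"
    return "other"


# Frames copied from the trace above: get_map is called from OSD::init.
trace = (
    "1: (OSDService::get_map(unsigned int)+0x3f) [0x68e86f]\n"
    "2: (OSD::init()+0x2259) [0x64e529]\n"
    "3: (main()+0x35aa) [0x5f991a]\n"
)
print(crash_path(trace))  # → other
```

<p>By this heuristic, a trace like the one above falls into the second group, which would explain why the load_pgs patch does not change the outcome. That conclusion should be checked against the full log before acting on it.</p>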