https://tracker.ceph.com/
2016-06-24T22:00:59Z
Ceph
Ceph - Bug #16474: OSD scrubs same PG over and over
https://tracker.ceph.com/issues/16474?journal_id=73366
2016-06-24T22:00:59Z
Paul Emmerich
paul.emmerich@oocero.de
<ul><li><strong>File</strong> <a href="/attachments/download/2396/ceph-osd-scrub-log.txt.gz">ceph-osd-scrub-log.txt.gz</a> added</li></ul><p>The attached log is a few seconds of output from an OSD with debug osd = 20/20 injected while the bug is present.<br />There are several scrubs of PG 46.1d1 in the log file.</p>
Ceph - Bug #16474: OSD scrubs same PG over and over
https://tracker.ceph.com/issues/16474?journal_id=76456
2016-08-11T15:20:57Z
Robert Klikics
support@robhost.de
<ul></ul><p>We are facing the same issue after doubling the number of PGs. Restarting all OSDs stopped it for now.</p>
<p>Example from PG 0.25a:</p>
<pre>
2016-08-11 08:51:24.120671 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 08:51:44.393595 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:03:44.204498 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:04:09.605254 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:06:39.226237 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:07:00.564817 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:08:32.238131 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:08:51.314839 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:11:51.261601 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:12:12.390471 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:18:05.300804 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:18:26.596946 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:29:15.378694 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:29:37.197375 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:30:13.385262 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:30:34.708098 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:34:04.411574 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:34:27.984084 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:35:26.420798 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:35:41.591375 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:37:28.432331 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:37:53.735487 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:42:43.468118 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:43:06.689449 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:43:14.470394 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:43:30.385951 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:44:32.479196 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:44:53.231835 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:44:53.481041 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:45:13.230173 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 09:47:26.499868 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 09:47:44.678449 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 10:00:13.590341 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 10:00:49.909739 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
2016-08-11 10:07:26.640703 7f92c7520700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub starts
2016-08-11 10:07:45.944285 7f92c9d25700 0 log_channel(cluster) log [INF] : 0.25a deep-scrub ok
</pre>
Ceph - Bug #16474: OSD scrubs same PG over and over
https://tracker.ceph.com/issues/16474?journal_id=78802
2016-09-21T13:50:58Z
Warren Wang
warren@wangspeed.com
<ul></ul><p>We are also seeing this problem.</p>
<pre>
2016-09-21 13:19:00.417766 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub ok
2016-09-21 13:19:02.415674 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:02.421542 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:05.416005 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:05.419691 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:06.417942 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:06.424146 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:08.420367 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:08.424397 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:09.420088 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub starts
2016-09-21 13:19:09.423914 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub ok
2016-09-21 13:19:11.432099 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:11.437787 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:13.429745 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:13.433968 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:20.441754 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:20.448617 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:23.437290 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub starts
2016-09-21 13:19:23.439749 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub ok
2016-09-21 13:19:26.441915 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub starts
2016-09-21 13:19:26.449701 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub ok
2016-09-21 13:19:28.448141 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:28.451956 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:31.449108 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:31.453150 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:34.461895 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:34.471187 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:40.458086 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub starts
2016-09-21 13:19:40.461601 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub ok
2016-09-21 13:19:43.463643 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:43.468535 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:45.465415 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:45.471669 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:56.479685 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:19:56.482410 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:19:58.483394 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub starts
2016-09-21 13:19:58.495058 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub ok
2016-09-21 13:20:02.484480 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:20:02.488747 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:20:03.487368 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub starts
2016-09-21 13:20:03.493172 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub ok
2016-09-21 13:20:05.492350 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
2016-09-21 13:20:05.495008 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 scrub ok
2016-09-21 13:20:06.500451 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub starts
2016-09-21 13:20:06.510892 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub ok
2016-09-21 13:20:10.500003 7f704106c700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub starts
2016-09-21 13:20:10.507958 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 deep-scrub ok
2016-09-21 13:20:12.505763 7f7043871700 0 log_channel(cluster) log [INF] : 4.c6 scrub starts
</pre>
<p>ceph version 10.2.2-508-g9bfc0cf (9bfc0cf178dc21b0fe33e0ce3b90a18858abaf1b)</p>
Ceph - Bug #16474: OSD scrubs same PG over and over
https://tracker.ceph.com/issues/16474?journal_id=78845
2016-09-22T04:02:10Z
David Zafman
dzafman@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>12</i></li><li><strong>Assignee</strong> set to <i>David Zafman</i></li></ul><p>Two logs show that OSD::sched_scrub() is entered and the PG returned by service.first_scrub_stamp() is always the same PG. It appears that the PG is never removed from the ScrubJob set, because the sched_time isn't advancing, which would be impossible if the job were being removed and re-added.</p>
<pre>
2016-09-21 13:25:44.603314 7fc0e051b700 20 osd.12 550 sched_scrub load_is_low=0
2016-09-21 13:25:44.603333 7fc0e051b700 10 osd.12 550 sched_scrub scrubbing 25.3f at 2016-09-14 15:01:22.013447 deadline < now
... SCRUBBING HERE ...
2016-09-21 13:25:44.608794 7fc0ccb5e700 10 osd.12 pg_epoch: 550 pg[25.3f( v 550'50326 (538'47314,550'50326] local-les=534 n=22 ec=292 les/c/f 534/538/0 528/529/378) [12,39,1] r=0 lpr=529 crt=550'50324 lcod 550'50325 mlcod 550'50325 active+clean+scrubbing] scrub got 22/22 objects, 0/0 clones, 22/22 dirty, 22/22 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 bytes, 0/0 hit_set_archive bytes.
...
2016-09-21 13:25:47.603858 7fc0e051b700 20 osd.12 550 sched_scrub load_is_low=0
2016-09-21 13:25:47.603871 7fc0e051b700 10 osd.12 550 sched_scrub scrubbing 25.3f at 2016-09-14 15:01:22.013447 deadline < now
</pre>
<p>As far as I can tell, unreg_next_scrub() and then reg_next_scrub() must be called in scrub_finish(). I do see the log message from _scrub_finish(), which scrub_finish() calls.</p>
<p>One explanation would be is_primary() returning false, which would cause unreg_next_scrub() and reg_next_scrub() to do nothing. However, PG::sched_scrub() won't proceed if !is_primary(), so that can't be what's happening.</p>
Ceph - Bug #16474: OSD scrubs same PG over and over
https://tracker.ceph.com/issues/16474?journal_id=78869
2016-09-22T15:00:16Z
Samuel Just
sjust@redhat.com
<ul><li><strong>Duplicated by</strong> <i><a class="issue tracker-1 status-10 priority-5 priority-high3 closed" href="/issues/17361">Bug #17361</a>: pg scrubbing far too often (with logs)</i> added</li></ul>
Ceph - Bug #16474: OSD scrubs same PG over and over
https://tracker.ceph.com/issues/16474?journal_id=78871
2016-09-22T15:02:15Z
Samuel Just
sjust@redhat.com
<ul></ul><p>Another instance came up on the list (subject: '[ceph-users] Same pg scrubbed over and over (Jewel)') Log available at <a class="external" href="http://filedump.ceph.com/samuelj/ceph-osd.12.log.bz2">http://filedump.ceph.com/samuelj/ceph-osd.12.log.bz2</a>.</p>
Ceph - Bug #16474: OSD scrubs same PG over and over
https://tracker.ceph.com/issues/16474?journal_id=78895
2016-09-22T18:53:49Z
David Zafman
dzafman@redhat.com
<ul></ul><p>I easily reproduced this and found two entries for 0.c, which is the PG that keeps being scrubbed:</p>
<pre>
(gdb) print osd->sched_scrub_pg
$7 = std::set with 3 elements = {[0] = {pgid = {pgid = {m_pool = 0, m_seed = 12, m_preferred = -1, static calc_name_buf_size = 36 '$'}, shard = {id = -1 '\377', static NO_SHARD = {id = -1 '\377',
static NO_SHARD = <same as static member of an already seen type>}}, static calc_name_buf_size = 40 '('}, sched_time = {tv = {tv_sec = 1474569250, tv_nsec = 335163700}}, deadline = {tv = {
tv_sec = 1475173891, tv_nsec = 355625841}}}, [1] = {pgid = {pgid = {m_pool = 0, m_seed = 4, m_preferred = -1, static calc_name_buf_size = 36 '$'}, shard = {id = -1 '\377', static NO_SHARD = {
id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}, static calc_name_buf_size = 40 '('}, sched_time = {tv = {tv_sec = 1474569266, tv_nsec = 535410882}},
deadline = {tv = {tv_sec = 1475173891, tv_nsec = 359957349}}}, [2] = {pgid = {pgid = {m_pool = 0, m_seed = 12, m_preferred = -1, static calc_name_buf_size = 36 '$'}, shard = {id = -1 '\377',
static NO_SHARD = {id = -1 '\377', static NO_SHARD = <same as static member of an already seen type>}}, static calc_name_buf_size = 40 '('}, sched_time = {tv = {tv_sec = 1474569571,
tv_nsec = 149387499}}, deadline = {tv = {tv_sec = 1475174242, tv_nsec = 818689957}}}}
(gdb) print this->scrubber.scrub_reg_stamp
$8 = {tv = {tv_sec = 1474569571, tv_nsec = 149387499}}
</pre>
Ceph - Bug #16474: OSD scrubs same PG over and over
https://tracker.ceph.com/issues/16474?journal_id=78896
2016-09-22T22:55:03Z
David Zafman
dzafman@redhat.com
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>17</i></li><li><strong>Backport</strong> set to <i>jewel</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/11206">https://github.com/ceph/ceph/pull/11206</a></p>
Ceph - Bug #16474: OSD scrubs same PG over and over
https://tracker.ceph.com/issues/16474?journal_id=80219
2016-10-22T01:17:45Z
David Zafman
dzafman@redhat.com
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/17666">Backport #17666</a>: jewel: OSD scrubs same PG over and over</i> added</li></ul>
Ceph - Bug #16474: OSD scrubs same PG over and over
https://tracker.ceph.com/issues/16474?journal_id=87451
2017-03-10T01:47:10Z
David Zafman
dzafman@redhat.com
<ul><li><strong>Status</strong> changed from <i>17</i> to <i>Resolved</i></li></ul>