https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2015-12-07T21:09:07Z
Ceph
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62620
2015-12-07T21:09:07Z
Loïc Dachary
loic@dachary.org
<ul></ul><p><a class="external" href="http://jenkins.ceph.dachary.org/job/ceph/LABELS=ubuntu-14.04&&x86_64/9932/consoleFull">http://jenkins.ceph.dachary.org/job/ceph/LABELS=ubuntu-14.04&&x86_64/9932/consoleFull</a></p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62692
2015-12-08T12:29:01Z
Loïc Dachary
loic@dachary.org
<ul></ul><p><a class="external" href="http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/9966/console">http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/9966/console</a></p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62693
2015-12-08T12:58:33Z
Loïc Dachary
loic@dachary.org
<ul></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/6857">https://github.com/ceph/ceph/pull/6857</a> to collect more info about the possible race</p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62694
2015-12-08T13:42:58Z
Loïc Dachary
loic@dachary.org
<ul></ul><p>Running</p>
<pre><code>while : ; do echo -n . ; test/osd/osd-scrub-snaps.sh > test.out 2>&1 || break ; done</code></pre>
<p>on rex003</p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62700
2015-12-08T15:35:54Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Status</strong> changed from <i>Need More Info</i> to <i>Duplicate</i></li></ul><p><a class="issue tracker-1 status-3 priority-6 priority-high2 closed" title="Bug: new OSD re-using old OSD id fails to boot (Resolved)" href="https://tracker.ceph.com/issues/13988">#13988</a></p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62701
2015-12-08T15:36:12Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Duplicates</strong> <i><a class="issue tracker-1 status-3 priority-6 priority-high2 closed" href="/issues/13988">Bug #13988</a>: new OSD re-using old OSD id fails to boot</i> added</li></ul>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62714
2015-12-08T16:56:25Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Duplicates</strong> deleted (<i><a class="issue tracker-1 status-3 priority-6 priority-high2 closed" href="/issues/13988">Bug #13988</a>: new OSD re-using old OSD id fails to boot</i>)</li></ul>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62716
2015-12-08T16:56:52Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>File</strong> <a href="/attachments/download/2124/test.out.gz">test.out.gz</a> added</li></ul>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62717
2015-12-08T16:57:10Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Status</strong> changed from <i>Duplicate</i> to <i>Fix Under Review</i></li><li><strong>Assignee</strong> set to <i>Loïc Dachary</i></li></ul>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62718
2015-12-08T17:00:03Z
Loïc Dachary
loic@dachary.org
<ul></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/6860">https://github.com/ceph/ceph/pull/6860</a></p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62719
2015-12-08T17:02:17Z
Loïc Dachary
loic@dachary.org
<ul></ul><p>The relevant lines of the osd logs are:</p>
<pre>
2015-12-08 14:45:53.349784 7f3a6c534700 20 osd.0 19 scrub_load_below_threshold loadavg 32.23 >= max 0.5 and ( >= daily_loadavg 13.3244 or >= 15m avg 13.39) = no
2015-12-08 14:45:53.349806 7f3a6c534700 20 osd.0 19 sched_scrub load_is_low=0
2015-12-08 14:45:53.349809 7f3a6c534700 10 osd.0 19 sched_scrub 2.0 schedued at 2015-12-09 19:55:20.323080 > 2015-12-08 14:45:53.349745
</pre>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62738
2015-12-09T01:08:09Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>File</strong> <a href="/attachments/download/2127/test2.out.gz">test2.out.gz</a> added</li></ul>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62740
2015-12-09T10:20:13Z
Loïc Dachary
loic@dachary.org
<ul></ul><p>crash with monitor logs and --osd-scrub-load-threshold=2000 <a class="external" href="http://dachary.org/loic/crash.gz">http://dachary.org/loic/crash.gz</a></p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62745
2015-12-09T10:51:43Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-4 priority-default closed" href="/issues/14027">Bug #14027</a>: tests: set --osd-scrub-load-threshold=2000</i> added</li></ul>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62747
2015-12-09T12:56:48Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>In Progress</i></li></ul>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62749
2015-12-09T13:02:17Z
Loïc Dachary
loic@dachary.org
<ul></ul><p>To reproduce the problem:</p>
<ul>
<li>in the background run<br /><pre>
while : ; do echo -n . ; ./run-make-check.sh > run.out 2>&1 ; done
</pre></li>
<li>on master, after a ./run-make-check.sh<br /><pre>
cd src ; while : ; do echo -n . ; test/osd/osd-scrub-snaps.sh > test.out 2>&1 || break ; done
</pre></li>
</ul>
<p>Using the commit from <a class="external" href="https://github.com/ceph/ceph/pull/6860">https://github.com/ceph/ceph/pull/6860</a> to display the monitor and osd log, test.out will have all the necessary diagnostic information.</p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62784
2015-12-10T11:44:44Z
Loïc Dachary
loic@dachary.org
<ul></ul><p><a class="external" href="http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/10041/console">http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/10041/console</a></p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62786
2015-12-10T12:32:46Z
Loïc Dachary
loic@dachary.org
<ul></ul><p><a class="external" href="http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/10048/console">http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/10048/console</a></p>
<p>it has a new kind of error log. The kill_daemon function returned on error, meaning it was unable to kill the daemons before the timeout (~3 minutes). I've never seen this but it's not impossible if the machine is heavily overloaded.</p>
<pre>
./qa/workunits/ceph-helpers.sh:1147: wait_for_scrub: test '2015-12-10 11:22:18.817416' '!=' '2015-12-10 11:23:06.796504'
../qa/workunits/ceph-helpers.sh:1148: wait_for_scrub: return 0
./test/osd/osd-scrub-snaps.sh:158: TEST_scrub_snaps: grep log_channel testdir/osd-scrub-snaps/osd.0.log
2015-12-10 11:23:06.791343 7f7b7e136700 0 log_channel(cluster) log [INF] : 2.0 scrub starts
2015-12-10 11:23:06.794271 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/2acecc8b/obj10/1 is missing in clone_overlap
2015-12-10 11:23:06.794328 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/666934a3/obj5/7 no '_' attr
2015-12-10 11:23:06.794336 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/666934a3/obj5/7 is an unexpected clone
2015-12-10 11:23:06.794377 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/666934a3/obj5/4 on disk size (4608) does not match object info size (512) adjusted for ondisk to (512)
2015-12-10 11:23:06.794443 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/666934a3/obj5/head expected clone 2/666934a3/obj5/2
2015-12-10 11:23:06.794449 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/666934a3/obj5/head expected clone 2/666934a3/obj5/1
2015-12-10 11:23:06.794454 7f7b7e136700 0 log_channel(cluster) log [INF] : scrub 2.0 2/666934a3/obj5/head 2 missing clone(s)
2015-12-10 11:23:06.794481 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/d3a9faf5/obj12/head snapset.head_exists=false, but head exists
2015-12-10 11:23:06.794637 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/8df7eaa5/obj8/head snaps.seq not set
2015-12-10 11:23:06.794717 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/5c889059/obj7/head snapset.head_exists=false, but head exists
2015-12-10 11:23:06.794741 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/5c889059/obj7/1 is an unexpected clone
2015-12-10 11:23:06.794750 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/61f68bb1/obj3/head on disk size (3840) does not match object info size (768) adjusted for ondisk to (768)
2015-12-10 11:23:06.794900 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/83425cc4/obj6/1 is an unexpected clone
2015-12-10 11:23:06.794928 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/3f1ee208/obj2/snapdir no 'snapset' attr
2015-12-10 11:23:06.794948 7f7b7e136700 0 log_channel(cluster) log [INF] : scrub 2.0 2/3f1ee208/obj2/7 clone ignored due to missing snapset
2015-12-10 11:23:06.794967 7f7b7e136700 0 log_channel(cluster) log [INF] : scrub 2.0 2/3f1ee208/obj2/4 clone ignored due to missing snapset
2015-12-10 11:23:06.795017 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/a8759770/obj4/snapdir expected clone 2/a8759770/obj4/7
2015-12-10 11:23:06.795023 7f7b7e136700 0 log_channel(cluster) log [INF] : scrub 2.0 2/a8759770/obj4/snapdir 1 missing clone(s)
2015-12-10 11:23:06.796212 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/6cf8deff/obj1/1 is an unexpected clone
2015-12-10 11:23:06.796283 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/e478ac7f/obj9/1 is missing in clone_size
2015-12-10 11:23:06.796337 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/29547577/obj11/1 is an unexpected clone
2015-12-10 11:23:06.796412 7f7b7e136700 -1 log_channel(cluster) log [ERR] : scrub 2.0 2/94122507/obj14/1 size 1032 != clone_size 1033
2015-12-10 11:23:06.796482 7f7b7e136700 -1 log_channel(cluster) log [ERR] : 2.0 scrub stat mismatch, got 21/31 objects, 10/19 clones, 21/31 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 7681/13064 bytes, 0/0 hit_set_archive bytes.
2015-12-10 11:23:06.796500 7f7b7e136700 -1 log_channel(cluster) log [ERR] : 2.0 scrub 19 errors
../test/osd/osd-scrub-snaps.sh:34: TEST_scrub_snaps: seq 1 7
./test/osd/osd-scrub-snaps.sh:160: TEST_scrub_snaps: for i in '`seq 1 7`'
./test/osd/osd-scrub-snaps.sh:162: TEST_scrub_snaps: rados -p test rmsnap snap1
removed pool test snap snap1
./test/osd/osd-scrub-snaps.sh:160: TEST_scrub_snaps: for i in '`seq 1 7`'
./test/osd/osd-scrub-snaps.sh:162: TEST_scrub_snaps: rados -p test rmsnap snap2
removed pool test snap snap2
./test/osd/osd-scrub-snaps.sh:160: TEST_scrub_snaps: for i in '`seq 1 7`'
./test/osd/osd-scrub-snaps.sh:162: TEST_scrub_snaps: rados -p test rmsnap snap3
removed pool test snap snap3
./test/osd/osd-scrub-snaps.sh:160: TEST_scrub_snaps: for i in '`seq 1 7`'
./test/osd/osd-scrub-snaps.sh:162: TEST_scrub_snaps: rados -p test rmsnap snap4
removed pool test snap snap4
./test/osd/osd-scrub-snaps.sh:160: TEST_scrub_snaps: for i in '`seq 1 7`'
./test/osd/osd-scrub-snaps.sh:162: TEST_scrub_snaps: rados -p test rmsnap snap5
removed pool test snap snap5
./test/osd/osd-scrub-snaps.sh:160: TEST_scrub_snaps: for i in '`seq 1 7`'
./test/osd/osd-scrub-snaps.sh:162: TEST_scrub_snaps: rados -p test rmsnap snap6
removed pool test snap snap6
./test/osd/osd-scrub-snaps.sh:160: TEST_scrub_snaps: for i in '`seq 1 7`'
./test/osd/osd-scrub-snaps.sh:162: TEST_scrub_snaps: rados -p test rmsnap snap7
removed pool test snap snap7
./test/osd/osd-scrub-snaps.sh:165: TEST_scrub_snaps: ERRORS=0
../test/osd/osd-scrub-snaps.sh:167: TEST_scrub_snaps: find testdir/osd-scrub-snaps
../test/osd/osd-scrub-snaps.sh:167: TEST_scrub_snaps: grep '[^/]*\.pid'
./test/osd/osd-scrub-snaps.sh:167: TEST_scrub_snaps: pidfile='testdir/osd-scrub-snaps/mon.a.pid
testdir/osd-scrub-snaps/osd.0.pid'
../test/osd/osd-scrub-snaps.sh:168: TEST_scrub_snaps: cat testdir/osd-scrub-snaps/mon.a.pid testdir/osd-scrub-snaps/osd.0.pid
./test/osd/osd-scrub-snaps.sh:168: TEST_scrub_snaps: pid='32405
16134'
./test/osd/osd-scrub-snaps.sh:169: TEST_scrub_snaps: kill -0 32405 16134
./test/osd/osd-scrub-snaps.sh:176: TEST_scrub_snaps: kill_daemons testdir/osd-scrub-snaps
.../qa/workunits/ceph-helpers.sh:192: kill_daemons: shopt -q -o xtrace
.../qa/workunits/ceph-helpers.sh:192: kill_daemons: echo true
../qa/workunits/ceph-helpers.sh:192: kill_daemons: local trace=true
../qa/workunits/ceph-helpers.sh:193: kill_daemons: true
../qa/workunits/ceph-helpers.sh:193: kill_daemons: shopt -u -o xtrace
../qa/workunits/ceph-helpers.sh:219: kill_daemons: return 1
./test/osd/osd-scrub-snaps.sh:176: TEST_scrub_snaps: return 1
</pre>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62787
2015-12-10T12:34:38Z
Loïc Dachary
loic@dachary.org
<ul></ul><p>@david do you have an opinion on the error log above ? It's the first time it happens.</p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62788
2015-12-10T13:04:38Z
Loïc Dachary
loic@dachary.org
<ul></ul><p>See <a class="external" href="https://github.com/ceph/ceph/pull/6884">https://github.com/ceph/ceph/pull/6884</a> for an explanation of false negatives.</p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62789
2015-12-10T15:46:43Z
Loïc Dachary
loic@dachary.org
<ul></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/6890">https://github.com/ceph/ceph/pull/6890</a></p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62852
2015-12-11T17:26:10Z
Loïc Dachary
loic@dachary.org
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62853
2015-12-11T17:26:58Z
Loïc Dachary
loic@dachary.org
<ul></ul><p><a class="external" href="http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/10086/console">http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/10086/console</a></p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62854
2015-12-11T17:47:41Z
Loïc Dachary
loic@dachary.org
<ul></ul><p><a class="external" href="http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/10092/">http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/10092/</a></p>
Ceph - Bug #13986: TEST_scrub_snaps: timeout > 300
https://tracker.ceph.com/issues/13986?journal_id=62857
2015-12-11T19:55:33Z
Loïc Dachary
loic@dachary.org
<ul></ul><p><a class="external" href="http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/10094/console">http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&&x86_64/10094/console</a></p>