Bug #44662
qa/standalone/osd/osd-markdown.sh: markdown_N_impl fails in TEST_markdown_boot
% Done: 0%
Regression: No
Severity: 3 - minor
Description
2020-03-16T19:53:52.312 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:45: markdown_N_impl: ceph osd tree
2020-03-16T19:53:52.575 INFO:tasks.workunit.client.0.smithi131.stdout:ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
2020-03-16T19:53:52.576 INFO:tasks.workunit.client.0.smithi131.stdout:-1 0.29306 root default
2020-03-16T19:53:52.576 INFO:tasks.workunit.client.0.smithi131.stdout:-3 0.29306 host smithi131
2020-03-16T19:53:52.576 INFO:tasks.workunit.client.0.smithi131.stdout: 0 hdd 0.09769 osd.0 down 1.00000 1.00000
2020-03-16T19:53:52.576 INFO:tasks.workunit.client.0.smithi131.stdout: 1 hdd 0.09769 osd.1 up 1.00000 1.00000
2020-03-16T19:53:52.576 INFO:tasks.workunit.client.0.smithi131.stdout: 2 hdd 0.09769 osd.2 up 1.00000 1.00000
2020-03-16T19:53:52.585 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:46: markdown_N_impl: ceph osd tree
2020-03-16T19:53:52.585 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:46: markdown_N_impl: grep osd.0
2020-03-16T19:53:52.586 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:46: markdown_N_impl: grep up
2020-03-16T19:53:52.834 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:46: markdown_N_impl: return 1
2020-03-16T19:53:52.835 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:100: TEST_markdown_boot: sleep 15
2020-03-16T19:54:07.836 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:101: TEST_markdown_boot: ceph osd tree
2020-03-16T19:54:07.836 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:101: TEST_markdown_boot: grep up
2020-03-16T19:54:07.836 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:101: TEST_markdown_boot: grep osd.0
2020-03-16T19:54:08.111 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:101: TEST_markdown_boot: return 1
/a/nojha-2020-03-16_17:35:35-rados:standalone-master-distro-basic-smithi/4860655/
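For reference, the two failing assertions correspond to checks of roughly the following shape in qa/standalone/osd/osd-markdown.sh (a paraphrased sketch reconstructed from the xtrace output above, not the exact source):

    # markdown_N_impl (around line 46): before marking osd.0 down again,
    # verify it shows as "up" in the osd tree
    ceph osd tree
    ceph osd tree | grep osd.0 | grep up || return 1

    # TEST_markdown_boot (around lines 100-101): after the markdown sequence,
    # wait and expect osd.0 to boot back up on its own
    sleep 15
    ceph osd tree
    ceph osd tree | grep up | grep osd.0 || return 1

Both checks return 1 in this run because osd.0 is still reported as "down" in the osd tree output above.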
Related issues
History
#1 Updated by xie xingguo over 3 years ago
aah, this looks like it has the same root cause as https://tracker.ceph.com/issues/44518.
On the monitor side:
2020-03-16T19:53:42.160+0000 7f3bfc664700 10 mon.a@0(leader).log v82 logging 2020-03-16T19:53:42.161223+0000 mon.a (mon.0) 142 : audit [INF] from='client.? 127.0.0.1:0/2347291138' entity='client.admin' cmd=[{"prefix": "osd down", "ids": ["0"]}]: dispatch
2020-03-16T19:53:42.160+0000 7f3bfc664700 10 mon.a@0(leader).paxosservice(logm 1..82) setting proposal_timer 0x559f39316540 with delay of 0.0124679
2020-03-16T19:53:42.160+0000 7f3bfc664700 20 -- v2:127.0.0.1:7108/0 done calling dispatch on 0x559f393468c0
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxosservice(logm 1..82) propose_pending
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).log v82 encode_full log v 82
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).log v82 encode_pending v83
2020-03-16T19:53:42.173+0000 7f3bfee69700 5 mon.a@0(leader).paxos(paxos active c 1..135) queue_pending_finisher 0x559f37ef7850
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos active c 1..135) trigger_propose active, proposing now
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos active c 1..135) propose_pending 136 20090 bytes
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos updating c 1..135) begin for 136 20090 bytes
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos updating c 1..135) commit_start 136
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).paxosservice(osdmap 1..29) propose_pending
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 encode_pending e 30
2020-03-16T19:53:42.198+0000 7f3bfee69700 1 mon.a@0(leader).osd e29 do_prune osdmap full prune enabled
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 should_prune currently holding only 28 epochs (min osdmap epochs: 500); do not prune.
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 maybe_prime_pg_temp estimate 4 pgs on 1 osds >= 0.25 of total 4 pgs, all
2020-03-16T19:53:42.198+0000 7f3bfee69700 20 queue 0x7f3bfee64cc0 1 [0,4)
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 _process 0x7f3bfee64cc0 pool 1 [0,4) pgs []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.0 [1,0,2]/[] -> [1,2]/[1,2], priming []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.1 [2,0,1]/[] -> [2,1]/[2,1], priming []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.2 [0,1,2]/[] -> [1,2]/[1,2], priming []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.3 [1,2,0]/[] -> [1,2]/[1,2], priming []
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 maybe_prime_pg_temp done in 0.000100
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 scan_for_creating_pgs already created 1
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs 0 pools queued
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs 0 pgs removed because they're created
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs queue remaining: 0 pools
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs 0/0 pgs added from queued pools
2020-03-16T19:53:42.198+0000 7f3bfee69700 2 mon.a@0(leader).osd e29 osd.0 DOWN
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 encode_pending encoding full map with octopus features 1080873256688364036
2020-03-16T19:53:42.198+0000 7f3bfee69700 20 mon.a@0(leader).osd e29 full_crc 113280054 inc_crc 473227981
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader) e1 log_health updated 1 previous 0
2020-03-16T19:53:42.198+0000 7f3bfee69700 0 log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)
i.e., osd.0 has been marked down at e29.
On the OSD side, osd.0 was still stuck at e28:
2020-03-16T19:54:07.999+0000 7f944a1b8700 10 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] do_peering_event: epoch_sent: 28 epoch_requested: 28 MLeaseAck epoch 28 from osd.1 pg_lease_ack(ruub 71.698279127s)
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recalc_readable_until peer osd.1 ruub 71.698279127s
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recalc_readable_until peer osd.2 ruub 71.698279127s
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recalc_readable_until readable_until[_ub] 71.698279127s (sent 71.698279127s)
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recheck_readable wasn't wait or laggy
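In other words, the monitor committed osdmap e29 with osd.0 marked down, while osd.0 was still handling peering events at pg_epoch 28, so it had apparently not yet processed the map that marked it down and therefore never reacted to it. When reproducing locally, one way to confirm this kind of epoch lag is to compare the cluster's current osdmap epoch with the newest map the OSD itself has received (a minimal sketch; assumes the standalone/vstart cluster is still running and the client can reach osd.0's admin socket):

    # current osdmap epoch according to the monitors
    ceph osd dump | head -1

    # oldest_map/newest_map as reported by the OSD itself
    ceph daemon osd.0 status

If the OSD's newest_map stays behind the cluster epoch, the OSD never sees itself marked down and so never boots back up, which would explain the failed "grep up" checks above.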
#2 Updated by Neha Ojha over 3 years ago
- Status changed from New to Fix Under Review
- Assignee set to xie xingguo
- Pull request ID set to 34056
#3 Updated by Kefu Chai over 3 years ago
- Status changed from Fix Under Review to Resolved
#4 Updated by Neha Ojha over 3 years ago
- Related to Bug #45139: osd/osd-markdown.sh: markdown_N_impl failure added