Bug #44662


qa/standalone/osd/osd-markdown.sh: markdown_N_impl fails in TEST_markdown_boot

Added by Neha Ojha about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Urgent
Assignee: xie xingguo
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID: 34056
Crash signature (v1):
Crash signature (v2):

Description

2020-03-16T19:53:52.312 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:45: markdown_N_impl:  ceph osd tree
2020-03-16T19:53:52.575 INFO:tasks.workunit.client.0.smithi131.stdout:ID  CLASS  WEIGHT   TYPE NAME           STATUS  REWEIGHT  PRI-AFF                                     
2020-03-16T19:53:52.576 INFO:tasks.workunit.client.0.smithi131.stdout:-1         0.29306  root default
2020-03-16T19:53:52.576 INFO:tasks.workunit.client.0.smithi131.stdout:-3         0.29306      host smithi131
2020-03-16T19:53:52.576 INFO:tasks.workunit.client.0.smithi131.stdout: 0    hdd  0.09769          osd.0         down   1.00000  1.00000
2020-03-16T19:53:52.576 INFO:tasks.workunit.client.0.smithi131.stdout: 1    hdd  0.09769          osd.1           up   1.00000  1.00000
2020-03-16T19:53:52.576 INFO:tasks.workunit.client.0.smithi131.stdout: 2    hdd  0.09769          osd.2           up   1.00000  1.00000
2020-03-16T19:53:52.585 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:46: markdown_N_impl:  ceph osd tree
2020-03-16T19:53:52.585 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:46: markdown_N_impl:  grep osd.0
2020-03-16T19:53:52.586 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:46: markdown_N_impl:  grep up
2020-03-16T19:53:52.834 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:46: markdown_N_impl:  return 1
2020-03-16T19:53:52.835 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:100: TEST_markdown_boot:  sleep 15
2020-03-16T19:54:07.836 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:101: TEST_markdown_boot:  ceph osd tree
2020-03-16T19:54:07.836 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:101: TEST_markdown_boot:  grep up
2020-03-16T19:54:07.836 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:101: TEST_markdown_boot:  grep osd.0
2020-03-16T19:54:08.111 INFO:tasks.workunit.client.0.smithi131.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:101: TEST_markdown_boot:  return 1

/a/nojha-2020-03-16_17:35:35-rados:standalone-master-distro-basic-smithi/4860655/
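
For context, a minimal sketch of the failing check, reconstructed from the trace above (the surrounding test logic is assumed, not copied from the script):

# osd-markdown.sh:46 (markdown_N_impl) fails first: osd.0 is still shown
# as down in the tree, so the grep pipeline returns 1.
ceph osd tree | grep osd.0 | grep up || return 1

# osd-markdown.sh:100-101 (TEST_markdown_boot) then sleeps and retries the
# same check, which also returns 1 because osd.0 never came back up.
sleep 15
ceph osd tree | grep up | grep osd.0 || return 1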


Related issues 1 (1 open, 0 closed)

Related to RADOS - Bug #45139: osd/osd-markdown.sh: markdown_N_impl failure (New)

#1 Updated by xie xingguo about 4 years ago

Aah, this looks like it has the same root cause as https://tracker.ceph.com/issues/44518.

On the monitor side:

2020-03-16T19:53:42.160+0000 7f3bfc664700 10 mon.a@0(leader).log v82  logging 2020-03-16T19:53:42.161223+0000 mon.a (mon.0) 142 : audit [INF] from='client.? 127.0.0.1:0/2347291138' entity='client.admin' cmd=[{"prefix": "osd down", "ids": ["0"]}]: dispatch
2020-03-16T19:53:42.160+0000 7f3bfc664700 10 mon.a@0(leader).paxosservice(logm 1..82)  setting proposal_timer 0x559f39316540 with delay of 0.0124679
2020-03-16T19:53:42.160+0000 7f3bfc664700 20 -- v2:127.0.0.1:7108/0 done calling dispatch on 0x559f393468c0
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxosservice(logm 1..82) propose_pending
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).log v82 encode_full log v 82
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).log v82 encode_pending v83
2020-03-16T19:53:42.173+0000 7f3bfee69700  5 mon.a@0(leader).paxos(paxos active c 1..135) queue_pending_finisher 0x559f37ef7850
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos active c 1..135) trigger_propose active, proposing now
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos active c 1..135) propose_pending 136 20090 bytes
2020-03-16T19:53:42.173+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos updating c 1..135) begin for 136 20090 bytes
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).paxos(paxos updating c 1..135) commit_start 136
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).paxosservice(osdmap 1..29) propose_pending
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 encode_pending e 30
2020-03-16T19:53:42.198+0000 7f3bfee69700  1 mon.a@0(leader).osd e29 do_prune osdmap full prune enabled
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 should_prune currently holding only 28 epochs (min osdmap epochs: 500); do not prune.
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 maybe_prime_pg_temp estimate 4 pgs on 1 osds >= 0.25 of total 4 pgs, all
2020-03-16T19:53:42.198+0000 7f3bfee69700 20 queue 0x7f3bfee64cc0 1 [0,4)
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 _process 0x7f3bfee64cc0 pool 1 [0,4) pgs []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.0 [1,0,2]/[] -> [1,2]/[1,2], priming []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.1 [2,0,1]/[] -> [2,1]/[2,1], priming []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.2 [0,1,2]/[] -> [1,2]/[1,2], priming []
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp next_up == next_acting now, clear pg_temp
2020-03-16T19:53:42.198+0000 7f3bfce65700 20 mon.a@0(leader).osd e29 prime_pg_temp 1.3 [1,2,0]/[] -> [1,2]/[1,2], priming []
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 maybe_prime_pg_temp done in 0.000100
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 scan_for_creating_pgs already created 1
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs 0 pools queued
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs 0 pgs removed because they're created
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs queue remaining: 0 pools
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 update_pending_pgs 0/0 pgs added from queued pools
2020-03-16T19:53:42.198+0000 7f3bfee69700  2 mon.a@0(leader).osd e29  osd.0 DOWN
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader).osd e29 encode_pending encoding full map with octopus features 1080873256688364036
2020-03-16T19:53:42.198+0000 7f3bfee69700 20 mon.a@0(leader).osd e29  full_crc 113280054 inc_crc 473227981
2020-03-16T19:53:42.198+0000 7f3bfee69700 10 mon.a@0(leader) e1 log_health updated 1 previous 0
2020-03-16T19:53:42.198+0000 7f3bfee69700  0 log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)

i.e., osd.0 has been marked down at e29.
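
The audit entry at the top of the mon log corresponds to the test marking the OSD down via the plain CLI, i.e. something equivalent to:

# Dispatched as cmd=[{"prefix": "osd down", "ids": ["0"]}] in the audit line above.
ceph osd down 0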

On the osd.0 side, however, osd.0 was still stuck at e28 at 19:54:07, so it had not yet consumed the map that marked it down:

2020-03-16T19:54:07.999+0000 7f944a1b8700 10 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] do_peering_event: epoch_sent: 28 epoch_requested: 28 MLeaseAck epoch 28 from osd.1 pg_lease_ack(ruub 71.698279127s)
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recalc_readable_until peer osd.1 ruub 71.698279127s
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recalc_readable_until peer osd.2 ruub 71.698279127s
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recalc_readable_until readable_until[_ub] 71.698279127s (sent 71.698279127s)
2020-03-16T19:54:07.999+0000 7f944a1b8700 20 osd.0 pg_epoch: 28 pg[1.2( v 22'2 (0'0,22'2] local-lis/les=26/27 n=1 ec=17/17 lis/c=26/26 les/c/f=27/27/0 sis=26 pruub=11.761122524s) [0,1,2] r=0 lpr=26 crt=22'2 lcod 21'1 mlcod 0'0 active+clean ps=[2~1]] recheck_readable wasn't wait or laggy
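
(Not from the original report: an illustrative way to confirm this kind of epoch lag on a live cluster is to compare the newest map the OSD has consumed against the epoch the monitors are publishing.)

# Illustrative commands, assuming a reachable admin socket for osd.0:
ceph daemon osd.0 status    # "newest_map" is the latest epoch osd.0 has consumed (e28 here)
ceph osd dump | head -1     # "epoch N" as currently published by the monitors (e29 here)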
#2 Updated by Neha Ojha about 4 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to xie xingguo
  • Pull request ID set to 34056
#3 Updated by Kefu Chai about 4 years ago

  • Status changed from Fix Under Review to Resolved
#4 Updated by Neha Ojha about 4 years ago

  • Related to Bug #45139: osd/osd-markdown.sh: markdown_N_impl failure added