Bug #45647

"ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failures/many.yaml

Added by Kefu Chai 2 months ago. Updated 26 days ago.

Status: New
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

rados/singleton/{all/dump-stuck.yaml msgr-failures/many.yaml msgr/async.yaml objectstore/bluestore-bitmap.yaml rados.yaml supported-random-distro$/{rhel_8.yaml}}

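In teuthology.log, the second "osd last-stat-seq" call never returns and is killed by "timeout 120" (exit status 124):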

2020-05-21T11:33:25.479 INFO:teuthology.orchestra.run.smithi158:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd last-stat-seq osd.0
2020-05-21T11:33:25.736 INFO:teuthology.orchestra.run.smithi158.stdout:30064771091
2020-05-21T11:33:25.747 INFO:tasks.dump_stuck.ceph_manager:need seq 30064771093 got 30064771091 for osd.0
2020-05-21T11:33:26.748 INFO:teuthology.orchestra.run.smithi158:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd last-stat-seq osd.0
2020-05-21T11:35:26.788 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-05-21T11:35:26.789 ERROR:teuthology.run_tasks:Saw exception from tasks.
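
For readers unfamiliar with the pattern in the log above, here is a minimal sketch of a "need seq / got seq" polling loop around the same command line. It is not the teuthology code: the function names, the attempts/delay parameters, and the injectable run/sleep arguments are hypothetical and exist only to make the sketch testable. The one external fact relied on is that GNU timeout exits with status 124 when the time limit is hit, which matches "got remote process result: 124".

```python
import subprocess
import time

# GNU coreutils `timeout` exits with 124 when the command hits the time limit.
TIMEOUT_EXIT = 124


def last_stat_seq(osd_id, run=subprocess.run):
    """Ask the monitor for the last stat seq it has recorded for one OSD.

    `run` is injectable (hypothetical parameter) so the sketch can be
    exercised without a cluster.
    """
    cmd = ["timeout", "120", "ceph", "--cluster", "ceph", "--log-early",
           "osd", "last-stat-seq", f"osd.{osd_id}"]
    proc = run(cmd, capture_output=True, text=True)
    if proc.returncode == TIMEOUT_EXIT:
        # The CLI hung for the full 120 s, as in the second call in the log.
        raise TimeoutError(f"last-stat-seq for osd.{osd_id} timed out")
    if proc.returncode != 0:
        raise RuntimeError(f"command failed with status {proc.returncode}")
    return int(proc.stdout.strip())


def wait_for_stat_seq(osd_id, need, attempts=300, delay=1.0,
                      run=subprocess.run, sleep=time.sleep):
    """Poll until the monitor reports a seq >= `need`, or give up."""
    for _ in range(attempts):
        got = last_stat_seq(osd_id, run=run)
        print(f"need seq {need} got {got} for osd.{osd_id}")
        if got >= need:
            return got
        sleep(delay)
    raise RuntimeError(f"osd.{osd_id} never reached seq {need}")
```

In this sketch a single hung CLI invocation aborts the whole wait, which mirrors what the log shows: the first poll returns a stale seq, the loop sleeps for a second, and the retry then blocks until timeout kills it.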

On the monitor side:

2020-05-21T11:33:25.732+0000 7feb1b348700 10 mon.a@0(leader).log v56  logging 2020-05-21T11:33:25.733449+0000 mon.a (mon.0) 93 : audit [DBG] from='client.? 172.21.15.158:0/2030381803' entity='client.admin' cmd=[{"prefix": "osd last-stat-seq", "id": 0}]: dispatch
2020-05-21T11:33:25.732+0000 7feb1b348700 10 mon.a@0(leader).paxosservice(logm 1..56)  proposal_timer already set
2020-05-21T11:33:25.733+0000 7feb1a346700  1 -- [v2:172.21.15.158:3300/0,v1:172.21.15.158:6789/0] >> 172.21.15.158:0/2030381803 conn(0x55ed36e1d000 msgr2=0x55ed36dffb00 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 42
2020-05-21T11:33:25.733+0000 7feb1a346700  1 -- [v2:172.21.15.158:3300/0,v1:172.21.15.158:6789/0] >> 172.21.15.158:0/2030381803 conn(0x55ed36e1d000 msgr2=0x55ed36dffb00 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2020-05-21T11:33:25.733+0000 7feb1a346700  1 --2- [v2:172.21.15.158:3300/0,v1:172.21.15.158:6789/0] >> 172.21.15.158:0/2030381803 conn(0x55ed36e1d000 0x55ed36dffb00 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55ed36978b10 tx=0x55ed36c819f0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2020-05-21T11:33:25.733+0000 7feb1a346700  1 --2- [v2:172.21.15.158:3300/0,v1:172.21.15.158:6789/0] >> 172.21.15.158:0/2030381803 conn(0x55ed36e1d000 0x55ed36dffb00 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55ed36978b10 tx=0x55ed36c819f0).stop
2020-05-21T11:33:25.733+0000 7feb1a346700  1 -- [v2:172.21.15.158:3300/0,v1:172.21.15.158:6789/0] reap_dead start

/a/kchai-2020-05-21_10:34:02-rados-wip-kefu-testing-2020-05-21-1652-distro-basic-smithi/5076350


Related issues

Related to RADOS - Bug #39039: mon connection reset, command not resent (Status: Need More Info)

History

#1 Updated by Neha Ojha 2 months ago

  • Priority changed from Normal to High

/a/nojha-2020-05-21_19:33:40-rados-wip-32601-distro-basic-smithi/5076944/

#2 Updated by Neha Ojha about 1 month ago

  • Related to Bug #39039: mon connection reset, command not resent added

#3 Updated by Neha Ojha 26 days ago

  • Backport set to octopus

/a/yuriw-2020-07-06_19:37:47-rados-wip-yuri7-testing-2020-07-06-1754-octopus-distro-basic-smithi/5204335
