Bug #45647
"ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failures/many.yaml
Description
rados/singleton/{all/dump-stuck.yaml msgr-failures/many.yaml msgr/async.yaml objectstore/bluestore-bitmap.yaml rados.yaml supported-random-distro$/{rhel_8.yaml}}
in teuthology.log
2020-05-21T11:33:25.479 INFO:teuthology.orchestra.run.smithi158:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd last-stat-seq osd.0
2020-05-21T11:33:25.736 INFO:teuthology.orchestra.run.smithi158.stdout:30064771091
2020-05-21T11:33:25.747 INFO:tasks.dump_stuck.ceph_manager:need seq 30064771093 got 30064771091 for osd.0
2020-05-21T11:33:26.748 INFO:teuthology.orchestra.run.smithi158:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd last-stat-seq osd.0
2020-05-21T11:35:26.788 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-05-21T11:35:26.789 ERROR:teuthology.run_tasks:Saw exception from tasks.
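For readers unfamiliar with the pattern in the log above: the test repeatedly asks the monitor for the OSD's last stat sequence until it reaches the value it needs, and each call is wrapped in `timeout 120`. GNU `timeout` exits with status 124 when it kills the command, which is the "remote process result: 124" seen here. The following is a minimal sketch of that wait loop, not the actual teuthology code. The names `wait_for_stat_seq` and `ceph_last_stat_seq`, the retry count, and the one-second delay are assumptions made for illustration.

```python
import subprocess
import time
from typing import Callable


def ceph_last_stat_seq(osd: str = "osd.0", timeout: float = 120.0) -> int:
    """Hypothetical helper: run the same CLI command the log shows.

    subprocess.run raises subprocess.TimeoutExpired if the command hangs,
    which plays the role of `timeout 120` returning 124 in the log.
    """
    out = subprocess.run(
        ["ceph", "--cluster", "ceph", "osd", "last-stat-seq", osd],
        check=True, capture_output=True, text=True, timeout=timeout,
    )
    return int(out.stdout.strip())


def wait_for_stat_seq(get_seq: Callable[[], int], need: int,
                      retries: int = 10, delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll get_seq() until it returns at least `need`, or give up."""
    for _ in range(retries):
        got = get_seq()
        if got >= need:
            return got
        # Mirrors the "need seq ... got ..." message in teuthology.log.
        print(f"need seq {need} got {got}")
        sleep(delay)
    raise TimeoutError(f"stat seq never reached {need}")
```

In this failure the loop never gets a second answer: the first call returns a stale sequence, and the retry hangs until the timeout fires.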
on monitor side:
2020-05-21T11:33:25.732+0000 7feb1b348700 10 mon.a@0(leader).log v56 logging 2020-05-21T11:33:25.733449+0000 mon.a (mon.0) 93 : audit [DBG] from='client.? 172.21.15.158:0/2030381803' entity='client.admin' cmd=[{"prefix": "osd last-stat-seq", "id": 0}]: dispatch
2020-05-21T11:33:25.732+0000 7feb1b348700 10 mon.a@0(leader).paxosservice(logm 1..56) proposal_timer already set
2020-05-21T11:33:25.733+0000 7feb1a346700 1 -- [v2:172.21.15.158:3300/0,v1:172.21.15.158:6789/0] >> 172.21.15.158:0/2030381803 conn(0x55ed36e1d000 msgr2=0x55ed36dffb00 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 42
2020-05-21T11:33:25.733+0000 7feb1a346700 1 -- [v2:172.21.15.158:3300/0,v1:172.21.15.158:6789/0] >> 172.21.15.158:0/2030381803 conn(0x55ed36e1d000 msgr2=0x55ed36dffb00 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2020-05-21T11:33:25.733+0000 7feb1a346700 1 --2- [v2:172.21.15.158:3300/0,v1:172.21.15.158:6789/0] >> 172.21.15.158:0/2030381803 conn(0x55ed36e1d000 0x55ed36dffb00 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55ed36978b10 tx=0x55ed36c819f0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2020-05-21T11:33:25.733+0000 7feb1a346700 1 --2- [v2:172.21.15.158:3300/0,v1:172.21.15.158:6789/0] >> 172.21.15.158:0/2030381803 conn(0x55ed36e1d000 0x55ed36dffb00 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55ed36978b10 tx=0x55ed36c819f0).stop
2020-05-21T11:33:25.733+0000 7feb1a346700 1 -- [v2:172.21.15.158:3300/0,v1:172.21.15.158:6789/0] reap_dead start
/a/kchai-2020-05-21_10:34:02-rados-wip-kefu-testing-2020-05-21-1652-distro-basic-smithi/5076350
Related issues
History
#1 Updated by Neha Ojha almost 4 years ago
- Priority changed from Normal to High
/a/nojha-2020-05-21_19:33:40-rados-wip-32601-distro-basic-smithi/5076944/
#2 Updated by Neha Ojha over 3 years ago
- Related to Bug #39039: mon connection reset, command not resent added
#3 Updated by Neha Ojha over 3 years ago
- Backport set to octopus
/a/yuriw-2020-07-06_19:37:47-rados-wip-yuri7-testing-2020-07-06-1754-octopus-distro-basic-smithi/5204335
#4 Updated by Deepika Upadhyay over 3 years ago
/a/yuriw-2020-08-27_00:49:53-rados-wip-yuri8-testing-2020-08-26-2329-octopus-distro-basic-smithi/5379206/
rados/singleton/{all/thrash-eio msgr-failures/few msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{ubuntu_latest}}
2020-08-27T04:56:57.176 INFO:teuthology.orchestra.run.smithi086.stdout:47244640258
2020-08-27T04:56:57.187 INFO:tasks.ceph.ceph_manager.ceph:need seq 47244640259 got 47244640258 for osd.0
2020-08-27T04:56:58.188 INFO:teuthology.orchestra.run.smithi086:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd last-stat-seq osd.0
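A side note on reading the "need seq ... got ..." values: the numbers in these logs split cleanly into a small high 32-bit half and a small low 32-bit counter, so the gap between "need" and "got" is only one or two reports. The sketch below decodes them that way. The interpretation of the high half (for example, as the epoch at which the OSD came up) is an assumption and is not confirmed by anything in this ticket; `split_stat_seq` is a hypothetical helper name.

```python
from typing import Tuple


def split_stat_seq(seq: int) -> Tuple[int, int]:
    """Split a 64-bit stat sequence into (high 32 bits, low 32 bits).

    Assumption: the value is packed as (high << 32) + counter.
    """
    return seq >> 32, seq & 0xFFFFFFFF


# Values taken from the logs in this ticket.
for label, value in [("need", 47244640259), ("got", 47244640258)]:
    high, low = split_stat_seq(value)
    print(f"{label}: high={high} low={low}")
```

Under that reading, the monitor in this run was exactly one stat report behind when the follow-up command hung.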
#5 Updated by Zhenyi Shu over 3 years ago
/a/shuzhenyi-2020-08-28_05:41:22-rados:thrash-wip-shuzhenyi-testing-2020-08-28-0955-distro-basic-smithi/5381787/
rados:thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/short_pg_log 2-recovery-overrides/{more-async-recovery} backoff/peering_and_degraded ceph clusters/{fixed-2 openstack} crc-failures/default d-balancer/on msgr-failures/osd-delay msgr/async objectstore/bluestore-comp-snappy rados supported-random-distro$/{rhel_8} thrashers/pggrow thrashosds-health workloads/dedup_tier}
2020-08-28T06:41:07.433 INFO:teuthology.orchestra.run.smithi198:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early osd last-stat-seq osd.2
2020-08-28T06:43:07.470 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-08-28T06:43:07.499 ERROR:teuthology.run_tasks:Manager failed: thrashosds
#6 Updated by Neha Ojha over 3 years ago
rados/singleton/{all/peer mon_election/connectivity msgr-failures/many msgr/async-v2only objectstore/bluestore-bitmap rados supported-random-distro$/{ubuntu_latest}}
/a/teuthology-2020-09-28_07:01:02-rados-master-distro-basic-smithi/5476986
#7 Updated by Neha Ojha over 3 years ago
rados/singleton/{all/max-pg-per-osd.from-replica mon_election/connectivity msgr-failures/many msgr/async-v2only objectstore/bluestore-comp-zstd rados supported-random-distro$/{centos_8}}
/a/teuthology-2020-11-09_07:01:01-rados-master-distro-basic-smithi/5605014
#8 Updated by Neha Ojha about 3 years ago
rados/thrash-erasure-code-shec/{ceph clusters/{fixed-4 openstack} mon_election/classic msgr-failures/few objectstore/bluestore-hybrid rados recovery-overrides/{default} supported-random-distro$/{centos_8} thrashers/careful thrashosds-health workloads/ec-rados-plugin=shec-k=4-m=3-c=2}
/a/nojha-2021-01-07_00:06:49-rados-master-distro-basic-smithi/5760847
#9 Updated by Deepika Upadhyay about 3 years ago
2021-02-07T16:57:57.811+0000 7fb612561700 1 -- [v2:172.21.15.37:3300/0,v1:172.21.15.37:6789/0] >> 172.21.15.37:0/253127657 conn(0x5614574f1800 msgr2=0x5614574e7680 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2021-02-07T16:57:57.811+0000 7fb612561700 1 --2- [v2:172.21.15.37:3300/0,v1:172.21.15.37:6789/0] >> 172.21.15.37:0/253127657 conn(0x5614574f1800 0x5614574e7680 secure :-1 s=READY pgs=1 cs=0 l=1 rx=0x56145706d710 tx=0x5614570534f0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2021-02-07T16:57:57.811+0000 7fb612561700 1 --2- [v2:172.21.15.37:3300/0,v1:172.21.15.37:6789/0] >> 172.21.15.37:0/253127657 conn(0x5614574f1800 0x5614574e7680 secure :-1 s=READY pgs=1 cs=0 l=1 rx=0x56145706d710 tx=0x5614570534f0).stop
2021-02-07T16:57:57.811+0000 7fb60e559700 10 mon.a@0(leader) e1 ms_handle_reset 0x5614574f1800 172.21.15.37:0/253127657
2021-02-07T16:57:57.811+0000 7fb60e559700 10 mon.a@0(leader) e1 reset/close on session client.? 172.21.15.37:0/253127657
2021-02-07T16:57:57.811+0000 7fb60d557700 1 -- [v2:172.21.15.37:3300/0,v1:172.21.15.37:6789/0] reap_dead start
/ceph/teuthology-archive/yuriw-2021-02-07_16:27:00-rados-wip-yuri8-testing-2021-01-27-1208-octopus-distro-basic-smithi/5865216/remote/ubuntu@smithi037.front.sepia.ceph.com/log/89a3c39e-6965-11eb-8fde-001a4aab830c/ceph-mon.a.log.gz
description: rados/cephadm/upgrade/{1-start 2-repo_digest/defaut 3-start-upgrade 4-wait distro$/{centos_latest} fixed-2}
failure_reason: reached maximum tries (180) after waiting for 180 seconds
/ceph/teuthology-archive/yuriw-2021-02-07_16:27:00-rados-wip-yuri8-testing-2021-01-27-1208-octopus-distro-basic-smithi/5865216/teuthology.log
another one in same batch:
/ceph/teuthology-archive/yuriw-2021-02-07_16:27:00-rados-wip-yuri8-testing-2021-01-27-1208-octopus-distro-basic-smithi/5865216/remote/ubuntu@smithi037.front.sepia.ceph.com/log/89a3c39e-6965-11eb-8fde-001a4aab830c/ceph-mon.a.log.gz
#10 Updated by Deepika Upadhyay about 3 years ago
- Related to Bug #49212: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd' added
#11 Updated by Deepika Upadhyay about 3 years ago
- Related to deleted (Bug #49212: mon/crush_ops.sh fails: Error EBUSY: osd.1 has already bound to class 'ssd', can not reset class to 'hdd')
#12 Updated by Neha Ojha about 3 years ago
- Backport changed from octopus to octopus, pacific
/a/teuthology-2021-02-17_03:31:03-rados-pacific-distro-basic-smithi/5889472
#13 Updated by Xiubo Li over 2 years ago
- Related to Bug #53436: mds, mon: mds beacon messages get dropped? (mds never reaches up:active state) added
#14 Updated by Neha Ojha about 2 years ago
- Priority changed from High to Normal