Bug #45593
qa: removing network bridge appears to cause dropped packets
% Done: 0%
Regression: No
Severity: 3 - minor
Component(FS): qa-suite
Labels (FS): qa
Description
2020-05-16T13:01:44.540 INFO:tasks.cephfs.mount:Removing the 'ceph-brx'
2020-05-16T13:01:44.540 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.540 INFO:teuthology.orchestra.run.smithi114:> sudo bash -c 'ip link set ceph-brx down'
2020-05-16T13:01:44.605 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.606 INFO:teuthology.orchestra.run.smithi114:> sudo bash -c 'ip link delete ceph-brx'
2020-05-16T13:01:44.606 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:44.601+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:44.607 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:44.601+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:44.607 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:44.601+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:44.607 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:44.601+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:44.614 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T13:01:44.609+0000 7f72cece0700 -1 osd.2 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 ever on either front or back, first ping sent 2020-05-16T12:59:55.305823+0000 (oldest deadline 2020-05-16T13:00:15.305823+0000)
2020-05-16T13:01:44.614 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T13:01:44.609+0000 7f72cece0700 -1 osd.2 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 ever on either front or back, first ping sent 2020-05-16T12:59:55.305823+0000 (oldest deadline 2020-05-16T13:00:15.305823+0000)
2020-05-16T13:01:44.615 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T13:01:44.609+0000 7f72cece0700 -1 osd.2 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 ever on either front or back, first ping sent 2020-05-16T12:59:55.305823+0000 (oldest deadline 2020-05-16T13:00:15.305823+0000)
2020-05-16T13:01:44.615 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T13:01:44.609+0000 7f72cece0700 -1 osd.2 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 ever on either front or back, first ping sent 2020-05-16T12:59:55.305823+0000 (oldest deadline 2020-05-16T13:00:15.305823+0000)
2020-05-16T13:01:44.615 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T13:01:44.613+0000 7f72cece0700 -1 osd.2 28 get_health_metrics reporting 104 slow ops, oldest is osd_op(mds.0.5:115966 3.5 3:a6630910:::1000000c882.00000000:head [omap-set-header in=274b,omap-set-vals in=5968b,omap-rm-keys in=472b] snapc 0=[] ondisk+write+known_if_redirected+full_force e28)
2020-05-16T13:01:44.713 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.714 INFO:teuthology.orchestra.run.smithi114:> route
2020-05-16T13:01:44.758 INFO:teuthology.orchestra.run.smithi114.stdout:Kernel IP routing table
2020-05-16T13:01:44.758 INFO:teuthology.orchestra.run.smithi114.stdout:Destination Gateway Genmask Flags Metric Ref Use Iface
2020-05-16T13:01:44.759 INFO:teuthology.orchestra.run.smithi114.stdout:default _gateway 0.0.0.0 UG 100 0 0 enp3s0f1
2020-05-16T13:01:44.759 INFO:teuthology.orchestra.run.smithi114.stdout:172.21.0.0 0.0.0.0 255.255.240.0 U 0 0 0 enp3s0f1
2020-05-16T13:01:44.759 INFO:teuthology.orchestra.run.smithi114.stdout:_gateway 0.0.0.0 255.255.255.255 UH 100 0 0 enp3s0f1
2020-05-16T13:01:44.759 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.760 INFO:teuthology.orchestra.run.smithi114:> sudo bash -c 'iptables -D FORWARD -o enp3s0f1 -i ceph-brx -j ACCEPT'
2020-05-16T13:01:44.857 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.857 INFO:teuthology.orchestra.run.smithi114:> sudo bash -c 'iptables -D FORWARD -i enp3s0f1 -o ceph-brx -j ACCEPT'
2020-05-16T13:01:44.876 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.877 INFO:teuthology.orchestra.run.smithi114:> sudo bash -c 'iptables -t nat -D POSTROUTING -s 192.168.255.254/16 -o enp3s0f1 -j MASQUERADE'
2020-05-16T13:01:45.125 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:01:45.121+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 ever on either front or back, first ping sent 2020-05-16T12:59:55.310340+0000 (oldest deadline 2020-05-16T13:00:15.310340+0000)
2020-05-16T13:01:45.126 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:01:45.121+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 ever on either front or back, first ping sent 2020-05-16T12:59:55.310340+0000 (oldest deadline 2020-05-16T13:00:15.310340+0000)
2020-05-16T13:01:45.126 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:01:45.121+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 ever on either front or back, first ping sent 2020-05-16T12:59:55.310340+0000 (oldest deadline 2020-05-16T13:00:15.310340+0000)
2020-05-16T13:01:45.126 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:01:45.121+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 ever on either front or back, first ping sent 2020-05-16T12:59:55.310340+0000 (oldest deadline 2020-05-16T13:00:15.310340+0000)
2020-05-16T13:01:45.127 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:01:45.121+0000 7f7ae1214700 -1 osd.3 28 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.4160.0:29 2.2 2:4e99cc3e:::rbd_mirror_snapshot_schedule:head [omap-get-vals in=16b] snapc 0=[] ondisk+read+known_if_redirected e28)
2020-05-16T13:01:45.352 INFO:tasks.ceph.mon.a.smithi114.stderr:2020-05-16T13:01:45.349+0000 7f65f7f05700 -1 mon.a@1(probing) e1 get_health_metrics reporting 8638 slow ops, oldest is osd_failure(failed timeout osd.4 [v2:172.21.15.38:6816/12946,v1:172.21.15.38:6817/12946] for 25sec e28 v28)
2020-05-16T13:01:45.408 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T13:01:45.405+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 ever on either front or back, first ping sent 2020-05-16T12:59:35.123796+0000 (oldest deadline 2020-05-16T12:59:55.123796+0000)
2020-05-16T13:01:45.409 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T13:01:45.405+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 ever on either front or back, first ping sent 2020-05-16T12:59:35.123796+0000 (oldest deadline 2020-05-16T12:59:55.123796+0000)
2020-05-16T13:01:45.409 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T13:01:45.405+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 ever on either front or back, first ping sent 2020-05-16T12:59:35.123796+0000 (oldest deadline 2020-05-16T12:59:55.123796+0000)
2020-05-16T13:01:45.409 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T13:01:45.405+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 ever on either front or back, first ping sent 2020-05-16T12:59:35.123796+0000 (oldest deadline 2020-05-16T12:59:55.123796+0000)
2020-05-16T13:01:45.561 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:45.557+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:45.561 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:45.557+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:45.561 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:45.557+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:45.561 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:45.557+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
From: /ceph/teuthology-archive/pdonnell-2020-05-16_06:07:05-fs-wip-pdonnell-testing-20200516.030215-distro-basic-smithi/5060521/teuthology.log
History
#1 Updated by Xiubo Li 10 months ago
- Status changed from New to In Progress
This does not seem to be caused by removing the NAT rules. The heartbeat failures began much earlier and had already been going on for minutes by then:
2020-05-16T12:38:29.797 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/
2020-05-16T12:38:29.797 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__init__.py
2020-05-16T12:38:29.797 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/authorization_code.py
2020-05-16T12:38:29.797 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/base.py
2020-05-16T12:38:29.924 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/dispatchers.py
2020-05-16T12:38:29.924 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/exceptions.py
2020-05-16T12:38:29.924 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/hybrid.py
2020-05-16T12:38:29.924 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/implicit.py
2020-05-16T12:38:29.925 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/
2020-05-16T12:38:29.925 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/__init__.cpython-38.pyc
2020-05-16T12:38:29.925 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/authorization_code.cpython-38.pyc
2020-05-16T12:38:29.925 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/base.cpython-38.pyc
2020-05-16T12:38:29.925 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/dispatchers.cpython-38.pyc
2020-05-16T12:38:29.926 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/exceptions.cpython-38.pyc
2020-05-16T12:38:29.926 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/hybrid.cpython-38.pyc
2020-05-16T12:38:29.926 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/implicit.cpython-38.pyc
2020-05-16T12:38:29.926 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile-0.46.egg-info/
2020-05-16T12:38:29.926 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile-0.46.egg-info/PKG-INFO
2020-05-16T12:38:29.927 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile-0.46.egg-info/dependency_links.txt
2020-05-16T12:38:29.927 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile-0.46.egg-info/top_level.txt
2020-05-16T12:38:29.927 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile/
2020-05-16T12:38:29.927 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile/__init__.py
2020-05-16T12:38:29.927 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile/olefile.py
2020-05-16T12:38:54.287 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T12:38:54.276+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 since back 2020-05-16T12:38:28.623014+0000 front 2020-05-16T12:38:28.623561+0000 (oldest deadline 2020-05-16T12:38:53.922886+0000)
2020-05-16T12:38:54.288 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T12:38:54.276+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 since back 2020-05-16T12:38:28.623455+0000 front 2020-05-16T12:38:28.623480+0000 (oldest deadline 2020-05-16T12:38:53.922886+0000)
2020-05-16T12:38:54.288 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T12:38:54.276+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 since back 2020-05-16T12:38:28.627164+0000 front 2020-05-16T12:38:28.627247+0000 (oldest deadline 2020-05-16T12:38:53.922886+0000)
2020-05-16T12:38:54.289 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T12:38:54.276+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 since back 2020-05-16T12:38:28.623311+0000 front 2020-05-16T12:38:28.623426+0000 (oldest deadline 2020-05-16T12:38:53.922886+0000)
2020-05-16T12:38:54.293 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T12:38:54.280+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 since back 2020-05-16T12:38:26.468398+0000 front 2020-05-16T12:38:26.469517+0000 (oldest deadline 2020-05-16T12:38:52.367345+0000)
2020-05-16T12:38:54.293 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T12:38:54.280+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 since back 2020-05-16T12:38:26.467510+0000 front 2020-05-16T12:38:26.467465+0000 (oldest deadline 2020-05-16T12:38:52.367345+0000)
2020-05-16T12:38:54.294 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T12:38:54.280+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 since back 2020-05-16T12:38:26.468007+0000 front 2020-05-16T12:38:26.468556+0000 (oldest deadline 2020-05-16T12:38:52.367345+0000)
2020-05-16T12:38:54.294 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T12:38:54.280+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 since back 2020-05-16T12:38:26.468678+0000 front 2020-05-16T12:38:26.468054+0000 (oldest deadline 2020-05-16T12:38:52.367345+0000)
2020-05-16T12:38:54.297 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T12:38:54.280+0000 7f72cece0700 -1 osd.2 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 since back 2020-05-16T12:38:28.062805+0000 front 2020-05-16T12:38:28.062526+0000 (oldest deadline 2020-05-16T12:38:53.362437+0000)
...
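To check when the heartbeat failures first appear in a given teuthology.log, the log can be scanned for the earliest heartbeat_check line per peer OSD. The following is only a minimal sketch, assuming the line layout seen in the excerpts above; the regular expression and the function name first_heartbeat_failures are hypothetical and not part of teuthology or Ceph.

```python
import re
from typing import Dict, Iterable

# Assumed layout, taken from the excerpts above:
# <teuthology ts> INFO:tasks.ceph.osd.N.<host>.stderr:... heartbeat_check: no reply from <addr> osd.M ...
HEARTBEAT_RE = re.compile(
    r"(?P<ts>\S+) INFO:tasks\.ceph\.(?P<reporter>osd\.\d+)\.\S+\.stderr:"
    r".*heartbeat_check: no reply from (?P<addr>\S+) (?P<peer>osd\.\d+)"
)


def first_heartbeat_failures(lines: Iterable[str]) -> Dict[str, str]:
    """Map each unreachable peer OSD to the teuthology timestamp of the
    first heartbeat_check line that reports it (hypothetical helper)."""
    first: Dict[str, str] = {}
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match and match.group("peer") not in first:
            first[match.group("peer")] = match.group("ts")
    return first
```

Feeding it the lines of teuthology.log (for example `first_heartbeat_failures(open(path))`, with `path` pointing at a local copy) gives one timestamp per peer, which can then be compared against the time the bridge was removed.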
More likely the smithi038 node was down, or something else went wrong:
front or back, first ping sent 2020-05-16T12:59:55.310340+0000 (oldest deadline 2020-05-16T13:00:15.310340+0000)
2020-05-16T13:00:44.439 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:00:44.434+0000 7f7ae1214700 -1 osd.3 28 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.4160.0:29 2.2 2:4e99cc3e:::rbd_mirror_snapshot_schedule:head [omap-get-vals in=16b] snapc 0=[] ondisk+read+known_if_redirected e28)
2020-05-16T13:00:44.446 DEBUG:teuthology.orchestra.remote:[Errno None] Unable to connect to port 22 on 172.21.15.38
2020-05-16T13:00:44.447 DEBUG:tasks.ceph:Missed logrotate, node 'smithi038' is offline
As I recall, I also hit this before the netns unsharing patches. At the time I assumed it was a network issue and did not pay any attention to it.
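The ordering argument above (the node was already offline before the bridge was removed) can be checked mechanically on a log. This is a rough sketch, assuming each line starts with a teuthology timestamp in the format shown in the excerpts and that the marker strings "is offline" and "Removing the 'ceph-brx'" appear as they do here; the helper names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional

# Timestamp format of the leading teuthology field, e.g. 2020-05-16T13:00:44.447
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def first_timestamp(lines: Iterable[str], needle: str) -> Optional[datetime]:
    """Return the leading timestamp of the first line containing `needle`."""
    for line in lines:
        if needle in line:
            return datetime.strptime(line.split(" ", 1)[0], TS_FORMAT)
    return None


def offline_before_bridge_removal(lines: Iterable[str]) -> Optional[bool]:
    """True if a node was reported offline before 'ceph-brx' was removed,
    False if after, None if either marker is missing (hypothetical helper)."""
    lines = list(lines)  # allow two passes over a one-shot iterator
    offline = first_timestamp(lines, "is offline")
    removed = first_timestamp(lines, "Removing the 'ceph-brx'")
    if offline is None or removed is None:
        return None
    return offline < removed
```

A True result supports the reading that the bridge removal is not the cause, though it does not by itself explain why the node went offline.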