Bug #45593

qa: removing network bridge appears to cause dropped packets

Added by Patrick Donnelly 4 months ago. Updated 3 months ago.

Status: Rejected
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa
Pull request ID:
Crash signature:

Description

2020-05-16T13:01:44.540 INFO:tasks.cephfs.mount:Removing the 'ceph-brx'
2020-05-16T13:01:44.540 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.540 INFO:teuthology.orchestra.run.smithi114:> sudo bash -c 'ip link set ceph-brx down'
2020-05-16T13:01:44.605 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.606 INFO:teuthology.orchestra.run.smithi114:> sudo bash -c 'ip link delete ceph-brx'
2020-05-16T13:01:44.606 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:44.601+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:44.607 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:44.601+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:44.607 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:44.601+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:44.607 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:44.601+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:44.614 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T13:01:44.609+0000 7f72cece0700 -1 osd.2 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 ever on either front or back, first ping sent 2020-05-16T12:59:55.305823+0000 (oldest deadline 2020-05-16T13:00:15.305823+0000)
2020-05-16T13:01:44.614 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T13:01:44.609+0000 7f72cece0700 -1 osd.2 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 ever on either front or back, first ping sent 2020-05-16T12:59:55.305823+0000 (oldest deadline 2020-05-16T13:00:15.305823+0000)
2020-05-16T13:01:44.615 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T13:01:44.609+0000 7f72cece0700 -1 osd.2 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 ever on either front or back, first ping sent 2020-05-16T12:59:55.305823+0000 (oldest deadline 2020-05-16T13:00:15.305823+0000)
2020-05-16T13:01:44.615 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T13:01:44.609+0000 7f72cece0700 -1 osd.2 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 ever on either front or back, first ping sent 2020-05-16T12:59:55.305823+0000 (oldest deadline 2020-05-16T13:00:15.305823+0000)
2020-05-16T13:01:44.615 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T13:01:44.613+0000 7f72cece0700 -1 osd.2 28 get_health_metrics reporting 104 slow ops, oldest is osd_op(mds.0.5:115966 3.5 3:a6630910:::1000000c882.00000000:head [omap-set-header in=274b,omap-set-vals in=5968b,omap-rm-keys in=472b] snapc 0=[] ondisk+write+known_if_redirected+full_force e28)
2020-05-16T13:01:44.713 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.714 INFO:teuthology.orchestra.run.smithi114:> route
2020-05-16T13:01:44.758 INFO:teuthology.orchestra.run.smithi114.stdout:Kernel IP routing table
2020-05-16T13:01:44.758 INFO:teuthology.orchestra.run.smithi114.stdout:Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
2020-05-16T13:01:44.759 INFO:teuthology.orchestra.run.smithi114.stdout:default         _gateway        0.0.0.0         UG    100    0        0 enp3s0f1
2020-05-16T13:01:44.759 INFO:teuthology.orchestra.run.smithi114.stdout:172.21.0.0      0.0.0.0         255.255.240.0   U     0      0        0 enp3s0f1
2020-05-16T13:01:44.759 INFO:teuthology.orchestra.run.smithi114.stdout:_gateway        0.0.0.0         255.255.255.255 UH    100    0        0 enp3s0f1
2020-05-16T13:01:44.759 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.760 INFO:teuthology.orchestra.run.smithi114:> sudo bash -c 'iptables -D FORWARD -o enp3s0f1 -i ceph-brx -j ACCEPT'
2020-05-16T13:01:44.857 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.857 INFO:teuthology.orchestra.run.smithi114:> sudo bash -c 'iptables -D FORWARD -i enp3s0f1 -o ceph-brx -j ACCEPT'
2020-05-16T13:01:44.876 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-05-16T13:01:44.877 INFO:teuthology.orchestra.run.smithi114:> sudo bash -c 'iptables -t nat -D POSTROUTING -s 192.168.255.254/16 -o enp3s0f1 -j MASQUERADE'
2020-05-16T13:01:45.125 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:01:45.121+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 ever on either front or back, first ping sent 2020-05-16T12:59:55.310340+0000 (oldest deadline 2020-05-16T13:00:15.310340+0000)
2020-05-16T13:01:45.126 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:01:45.121+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 ever on either front or back, first ping sent 2020-05-16T12:59:55.310340+0000 (oldest deadline 2020-05-16T13:00:15.310340+0000)
2020-05-16T13:01:45.126 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:01:45.121+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 ever on either front or back, first ping sent 2020-05-16T12:59:55.310340+0000 (oldest deadline 2020-05-16T13:00:15.310340+0000)
2020-05-16T13:01:45.126 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:01:45.121+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 ever on either front or back, first ping sent 2020-05-16T12:59:55.310340+0000 (oldest deadline 2020-05-16T13:00:15.310340+0000)
2020-05-16T13:01:45.127 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:01:45.121+0000 7f7ae1214700 -1 osd.3 28 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.4160.0:29 2.2 2:4e99cc3e:::rbd_mirror_snapshot_schedule:head [omap-get-vals in=16b] snapc 0=[] ondisk+read+known_if_redirected e28)
2020-05-16T13:01:45.352 INFO:tasks.ceph.mon.a.smithi114.stderr:2020-05-16T13:01:45.349+0000 7f65f7f05700 -1 mon.a@1(probing) e1 get_health_metrics reporting 8638 slow ops, oldest is osd_failure(failed timeout osd.4 [v2:172.21.15.38:6816/12946,v1:172.21.15.38:6817/12946] for 25sec e28 v28)
2020-05-16T13:01:45.408 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T13:01:45.405+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 ever on either front or back, first ping sent 2020-05-16T12:59:35.123796+0000 (oldest deadline 2020-05-16T12:59:55.123796+0000)
2020-05-16T13:01:45.409 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T13:01:45.405+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 ever on either front or back, first ping sent 2020-05-16T12:59:35.123796+0000 (oldest deadline 2020-05-16T12:59:55.123796+0000)
2020-05-16T13:01:45.409 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T13:01:45.405+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 ever on either front or back, first ping sent 2020-05-16T12:59:35.123796+0000 (oldest deadline 2020-05-16T12:59:55.123796+0000)
2020-05-16T13:01:45.409 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T13:01:45.405+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 ever on either front or back, first ping sent 2020-05-16T12:59:35.123796+0000 (oldest deadline 2020-05-16T12:59:55.123796+0000)
2020-05-16T13:01:45.561 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:45.557+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:45.561 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:45.557+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:45.561 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:45.557+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)
2020-05-16T13:01:45.561 INFO:tasks.ceph.osd.0.smithi114.stderr:2020-05-16T13:01:45.557+0000 7f2768605700 -1 osd.0 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 ever on either front or back, first ping sent 2020-05-16T12:59:59.405725+0000 (oldest deadline 2020-05-16T13:00:19.405725+0000)

From: /ceph/teuthology-archive/pdonnell-2020-05-16_06:07:05-fs-wip-pdonnell-testing-20200516.030215-distro-basic-smithi/5060521/teuthology.log

History

#1 Updated by Xiubo Li 4 months ago

  • Status changed from New to In Progress

This does not seem to be caused by removing the NAT rule; the heartbeat failures began much earlier and had already lasted for minutes:

2020-05-16T12:38:29.797 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/
2020-05-16T12:38:29.797 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__init__.py
2020-05-16T12:38:29.797 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/authorization_code.py
2020-05-16T12:38:29.797 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/base.py
2020-05-16T12:38:29.924 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/dispatchers.py
2020-05-16T12:38:29.924 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/exceptions.py
2020-05-16T12:38:29.924 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/hybrid.py
2020-05-16T12:38:29.924 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/implicit.py
2020-05-16T12:38:29.925 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/
2020-05-16T12:38:29.925 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/__init__.cpython-38.pyc
2020-05-16T12:38:29.925 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/authorization_code.cpython-38.pyc
2020-05-16T12:38:29.925 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/base.cpython-38.pyc
2020-05-16T12:38:29.925 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/dispatchers.cpython-38.pyc
2020-05-16T12:38:29.926 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/exceptions.cpython-38.pyc
2020-05-16T12:38:29.926 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/hybrid.cpython-38.pyc
2020-05-16T12:38:29.926 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/oauthlib/openid/connect/core/grant_types/__pycache__/implicit.cpython-38.pyc
2020-05-16T12:38:29.926 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile-0.46.egg-info/
2020-05-16T12:38:29.926 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile-0.46.egg-info/PKG-INFO
2020-05-16T12:38:29.927 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile-0.46.egg-info/dependency_links.txt
2020-05-16T12:38:29.927 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile-0.46.egg-info/top_level.txt
2020-05-16T12:38:29.927 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile/
2020-05-16T12:38:29.927 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile/__init__.py
2020-05-16T12:38:29.927 INFO:tasks.workunit.client.0.smithi114.stdout:multiple_rsync_payload.115994/python3/dist-packages/olefile/olefile.py
2020-05-16T12:38:54.287 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T12:38:54.276+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 since back 2020-05-16T12:38:28.623014+0000 front 2020-05-16T12:38:28.623561+0000 (oldest deadline 2020-05-16T12:38:53.922886+0000)
2020-05-16T12:38:54.288 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T12:38:54.276+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 since back 2020-05-16T12:38:28.623455+0000 front 2020-05-16T12:38:28.623480+0000 (oldest deadline 2020-05-16T12:38:53.922886+0000)
2020-05-16T12:38:54.288 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T12:38:54.276+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 since back 2020-05-16T12:38:28.627164+0000 front 2020-05-16T12:38:28.627247+0000 (oldest deadline 2020-05-16T12:38:53.922886+0000)
2020-05-16T12:38:54.289 INFO:tasks.ceph.osd.1.smithi114.stderr:2020-05-16T12:38:54.276+0000 7f473bcfe700 -1 osd.1 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 since back 2020-05-16T12:38:28.623311+0000 front 2020-05-16T12:38:28.623426+0000 (oldest deadline 2020-05-16T12:38:53.922886+0000)
2020-05-16T12:38:54.293 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T12:38:54.280+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 since back 2020-05-16T12:38:26.468398+0000 front 2020-05-16T12:38:26.469517+0000 (oldest deadline 2020-05-16T12:38:52.367345+0000)
2020-05-16T12:38:54.293 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T12:38:54.280+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6812 osd.5 since back 2020-05-16T12:38:26.467510+0000 front 2020-05-16T12:38:26.467465+0000 (oldest deadline 2020-05-16T12:38:52.367345+0000)
2020-05-16T12:38:54.294 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T12:38:54.280+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6804 osd.6 since back 2020-05-16T12:38:26.468007+0000 front 2020-05-16T12:38:26.468556+0000 (oldest deadline 2020-05-16T12:38:52.367345+0000)
2020-05-16T12:38:54.294 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T12:38:54.280+0000 7f7ae1214700 -1 osd.3 28 heartbeat_check: no reply from 172.21.15.38:6828 osd.7 since back 2020-05-16T12:38:26.468678+0000 front 2020-05-16T12:38:26.468054+0000 (oldest deadline 2020-05-16T12:38:52.367345+0000)
2020-05-16T12:38:54.297 INFO:tasks.ceph.osd.2.smithi114.stderr:2020-05-16T12:38:54.280+0000 7f72cece0700 -1 osd.2 28 heartbeat_check: no reply from 172.21.15.38:6820 osd.4 since back 2020-05-16T12:38:28.062805+0000 front 2020-05-16T12:38:28.062526+0000 (oldest deadline 2020-05-16T12:38:53.362437+0000)
...
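
The timing argument above can be checked mechanically. What follows is a minimal sketch, not part of teuthology, that compares the first heartbeat failure in a log with the moment the bridge removal starts. It assumes the line format shown in these excerpts and a chronologically ordered log; the helper names `first_match_time` and `failures_precede_bridge_removal` are hypothetical.

```python
import re
from datetime import datetime

# Leading teuthology timestamp, e.g. "2020-05-16T12:38:54.287 INFO:..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) ")


def _timestamp(line):
    """Parse the leading timestamp of a log line, or return None."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def first_match_time(lines, needle):
    """Timestamp of the first timestamped line containing `needle`, or None."""
    for line in lines:
        if needle in line:
            ts = _timestamp(line)
            if ts is not None:
                return ts
    return None


def failures_precede_bridge_removal(lines):
    """True if heartbeat failures start before the bridge removal.

    Returns None when either event is missing from `lines`.
    """
    hb = first_match_time(lines, "heartbeat_check: no reply from")
    rm = first_match_time(lines, "Removing the 'ceph-brx'")
    if hb is None or rm is None:
        return None
    return hb < rm
```

Feeding it the lines of teuthology.log (for example via `open(path).read().splitlines()`) would return True when the heartbeat failures come first, which is what the excerpts in this ticket suggest.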

Most likely the smithi038 node was down, or something else went wrong with it:

front or back, first ping sent 2020-05-16T12:59:55.310340+0000 (oldest deadline 2020-05-16T13:00:15.310340+0000)
2020-05-16T13:00:44.439 INFO:tasks.ceph.osd.3.smithi114.stderr:2020-05-16T13:00:44.434+0000 7f7ae1214700 -1 osd.3 28 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.4160.0:29 2.2 2:4e99cc3e:::rbd_mirror_snapshot_schedule:head [omap-get-vals in=16b] snapc 0=[] ondisk+read+known_if_redirected e28)
2020-05-16T13:00:44.446 DEBUG:teuthology.orchestra.remote:[Errno None] Unable to connect to port 22 on 172.21.15.38
2020-05-16T13:00:44.447 DEBUG:tasks.ceph:Missed logrotate, node 'smithi038' is offline
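
The two DEBUG lines above are the kind of evidence that points at the node rather than at the test. As a rough sketch, assuming only the message wording seen in this excerpt, a log could be scanned for such lines like this; the function name `offline_evidence` is hypothetical and is not a teuthology API.

```python
import re

# Message fragments taken from the log excerpt in this ticket.
OFFLINE_RE = re.compile(r"node '([^']+)' is offline")
SSH_FAIL_RE = re.compile(r"Unable to connect to port 22 on (\S+)")


def offline_evidence(lines):
    """Collect node names reported offline and addresses refusing SSH.

    Returns a (nodes, addresses) pair of sets.
    """
    nodes, addresses = set(), set()
    for line in lines:
        m = OFFLINE_RE.search(line)
        if m:
            nodes.add(m.group(1))
        m = SSH_FAIL_RE.search(line)
        if m:
            addresses.add(m.group(1))
    return nodes, addresses
```

If the address reported here is the one the failing heartbeat peers are on (172.21.15.38 in this run), the OSD errors are more likely a symptom of the node being unreachable than of the bridge teardown.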

As I recall, I also hit this before the netns unsharing patches; I assumed it was a network issue and did not pay any attention to it.

#2 Updated by Xiubo Li 4 months ago

The netns was only set up on smithi114, but the connection to the smithi038 test node was lost (node 'smithi038' is offline) from both the deploy node and smithi114 at the same time.

#3 Updated by Xiubo Li 3 months ago

  • Status changed from In Progress to Rejected

This is not a bug in the Ceph QA test suite; the root cause is that the node itself went offline.
