Bug #64471

open

kernel: upgrades from quincy/v18.2.[01]/reef to main|squid fail with kernel oops

Added by Patrick Donnelly 3 months ago. Updated 10 days ago.

Status: New
Priority: Urgent
Assignee:
Category: fs/ceph
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:{
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:    "target_image": "quay.ceph.io/ceph-ci/ceph:f78a58c0ffd401d1493058a1022c35f011d65275",
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:    "in_progress": true,
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:    "which": "Upgrading all daemon types on all hosts",
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:    "services_complete": [],
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:    "progress": "",
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:    "message": "Doing first pull of quay.ceph.io/ceph-ci/ceph:f78a58c0ffd401d1493058a1022c35f011d65275 image",
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:    "is_paused": false
2024-02-16T02:34:47.515 INFO:teuthology.orchestra.run.smithi032.stdout:}
...
2024-02-16T02:34:52.075 INFO:tasks.workunit.client.0.smithi032.stderr:+ pushd fsstress
2024-02-16T02:34:52.076 INFO:tasks.workunit.client.0.smithi032.stdout:~/cephtest/mnt.0/client.0/tmp/fsstress ~/cephtest/mnt.0/client.0/tmp
2024-02-16T02:34:52.076 INFO:tasks.workunit.client.0.smithi032.stderr:+ wget -q -O ltp-full.tgz http://download.ceph.com/qa/ltp-full-20091231.tgz
2024-02-16T02:35:10.364 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:09 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:09.988+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6804 osd.3 since back 2024-02-16T02:34:45.611069+0000 front 2024-02-16T02:34:45.611050+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
2024-02-16T02:35:10.364 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:09 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:09.988+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6812 osd.4 since back 2024-02-16T02:34:45.611191+0000 front 2024-02-16T02:34:45.611160+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
2024-02-16T02:35:10.364 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:09 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:09.988+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6820 osd.5 since back 2024-02-16T02:34:45.611126+0000 front 2024-02-16T02:34:45.611214+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
2024-02-16T02:35:11.316 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:10 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:10.982+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6804 osd.3 since back 2024-02-16T02:34:45.611069+0000 front 2024-02-16T02:34:45.611050+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
2024-02-16T02:35:11.316 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:10 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:10.982+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6812 osd.4 since back 2024-02-16T02:34:45.611191+0000 front 2024-02-16T02:34:45.611160+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)
2024-02-16T02:35:11.317 INFO:journalctl@ceph.osd.2.smithi032.stdout:Feb 16 02:35:10 smithi032 ceph-a2883a70-cc72-11ee-95ba-87774f69a715-osd-2[62304]: 2024-02-16T02:35:10.982+0000 7f5e5529a700 -1 osd.2 46 heartbeat_check: no reply from 172.21.15.196:6820 osd.5 since back 2024-02-16T02:34:45.611126+0000 front 2024-02-16T02:34:45.611214+0000 (oldest deadline 2024-02-16T02:35:09.711083+0000)

From: /teuthology/pdonnell-2024-02-16_01:25:08-fs:upgrade:mds_upgrade_sequence-wip-batrick-testing-20240215.160715-distro-default-smithi/7561891/teuthology.log

The same failure shows up in many other jobs in that run. It's quite reproducible. I don't think it happens with quincy -> main.

This might be related to #55258.
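Because #55258 concerns the same "heartbeat_check: no reply from" messages, it can help during triage to group those lines by the peer address that stopped replying. The following is a minimal sketch, assuming the OSD log line format shown in the excerpt above. The function name `summarize_heartbeat_failures` and the trimmed `SAMPLE` lines are hypothetical illustrations, not part of Ceph or teuthology. In the excerpt, osd.2 reports missing replies only from osd.3, osd.4 and osd.5, all at 172.21.15.196.

```python
import re
from collections import defaultdict

# Assumed format, taken from the OSD journal lines in the excerpt:
#   "osd.2 46 heartbeat_check: no reply from 172.21.15.196:6804 osd.3 since back ..."
HEARTBEAT_RE = re.compile(
    r"(?P<reporter>osd\.\d+) (?P<epoch>\d+) heartbeat_check: no reply from "
    r"(?P<addr>[\d.]+):(?P<port>\d+) (?P<peer>osd\.\d+) since back"
)


def summarize_heartbeat_failures(lines):
    """Hypothetical helper: map each unresponsive peer address to its OSD ids."""
    by_host = defaultdict(set)
    for line in lines:
        m = HEARTBEAT_RE.search(line)
        if m:
            by_host[m.group("addr")].add(m.group("peer"))
    # Sort peers numerically so that osd.10 sorts after osd.9.
    return {
        addr: sorted(peers, key=lambda p: int(p.split(".")[1]))
        for addr, peers in by_host.items()
    }


# Trimmed copies of lines from the excerpt, plus one unrelated line.
SAMPLE = [
    "+ pushd fsstress",
    "osd.2 46 heartbeat_check: no reply from 172.21.15.196:6804 osd.3 since back 2024-02-16T02:34:45.611069+0000",
    "osd.2 46 heartbeat_check: no reply from 172.21.15.196:6812 osd.4 since back 2024-02-16T02:34:45.611191+0000",
    "osd.2 46 heartbeat_check: no reply from 172.21.15.196:6820 osd.5 since back 2024-02-16T02:34:45.611126+0000",
    "osd.2 46 heartbeat_check: no reply from 172.21.15.196:6804 osd.3 since back 2024-02-16T02:34:45.611069+0000",
]

if __name__ == "__main__":
    # prints {'172.21.15.196': ['osd.3', 'osd.4', 'osd.5']}
    print(summarize_heartbeat_failures(SAMPLE))
```

Since the function accepts any iterable of strings, it could be pointed at a whole log (for example an open `teuthology.log` file object) to see whether every failing job in the run loses heartbeats from a single host.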


Related issues: 1 (0 open, 1 closed)

Related to Linux kernel client - Bug #55258: lots of "heartbeat_check: no reply from X.X.X.X" in OSD logs (Resolved, Jeff Layton)
