Bug #59380

open

rados/singleton-nomsgr: test failing from "Health check failed: 1 full osd(s) (OSD_FULL)" and "Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)"

Added by Laura Flores about 1 year ago. Updated about 1 month ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2023-03-31T00:25:12.169 INFO:tasks.rgw.client.0.smithi157.stdout:2023-03-31T00:25:12.167+0000 7f470012f700 -1 received  signal: Terminated from /usr/bin/python3 /bin/daemon-helper term radosgw --rgw-frontends beast port=80 -n client.0 --cluster ceph -k /etc/ceph/ceph.client.0.keyring --log-file /var/log/ceph/rgw.ceph.client.0.log --rgw_ops_log_socket_path /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock --foreground  (PID: 110884) UID: 0
2023-03-31T00:25:12.169 INFO:tasks.rgw.client.0.smithi157.stdout:2023-03-31T00:25:12.167+0000 7f47041857c0 -1 shutting down
2023-03-31T00:25:14.065 INFO:tasks.ceph.mon.a.smithi157.stderr:2023-03-31T00:25:14.063+0000 7f8bfe441700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2023-03-31T00:25:15.662 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~0s
2023-03-31T00:25:17.523 INFO:tasks.ceph.mds.a.smithi157.stderr:2023-03-31T00:25:17.522+0000 7f1f542c9700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2023-03-31T00:25:18.270 INFO:tasks.rgw.client.0:Stopped
2023-03-31T00:25:18.271 DEBUG:teuthology.orchestra.run.smithi157:> rm -f /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock
2023-03-31T00:25:18.298 DEBUG:teuthology.orchestra.run.smithi157:> sudo rm -f /etc/ceph/vault-root-token
2023-03-31T00:25:18.372 DEBUG:teuthology.orchestra.run.smithi157:> radosgw-admin gc process --include-all
2023-03-31T00:25:21.168 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 has been restored
2023-03-31T12:11:11.597 DEBUG:teuthology.exit:Got signal 15; running 1 handler...
2023-03-31T12:11:11.634 DEBUG:teuthology.task.console_log:Killing console logger for smithi157
2023-03-31T12:11:11.635 DEBUG:teuthology.exit:Finished running handlers

/a/yuriw-2023-03-30_21:29:24-rados-wip-yuri2-testing-2023-03-30-0826-distro-default-smithi/7227455
/a/yuriw-2023-04-04_15:24:40-rados-wip-yuri4-testing-2023-03-31-1237-distro-default-smithi/7231345

Actions #1

Updated by Laura Flores about 1 year ago

Also in the logs:

/a/yuriw-2023-04-04_15:22:56-rados-wip-yuri2-testing-2023-03-30-0826-distro-default-smithi/7231083

2023-04-04T18:50:13.147 INFO:tasks.rgw.client.0.smithi191.stdout:2023-04-04T18:50:13.142+0000 7ff108317700 -1 received  signal: Terminated from /usr/bin/python3 /usr/bin/daemon-helper term radosgw --rgw-frontends beast port=80 -n client.0 --cluster ceph -k /etc/ceph/ceph.client.0.keyring --log-file /var/log/ceph/rgw.ceph.client.0.log --rgw_ops_log_socket_path /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock --foreground  (PID: 29378) UID: 0
2023-04-04T18:50:13.147 INFO:tasks.rgw.client.0.smithi191.stdout:2023-04-04T18:50:13.142+0000 7ff10bb62f80 -1 shutting down
2023-04-04T18:50:13.997 INFO:tasks.ceph.mon.a.smithi191.stderr:2023-04-04T18:50:13.994+0000 7fe05488a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
2023-04-04T18:50:17.024 INFO:tasks.ceph.mon.a.smithi191.stderr:2023-04-04T18:50:17.022+0000 7fe05488a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2023-04-04T18:50:18.211 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~0s
2023-04-04T18:50:19.020 INFO:tasks.ceph.mds.a.smithi191.stderr:2023-04-04T18:50:19.018+0000 7fecb7b6d700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2023-04-04T18:50:19.249 INFO:tasks.rgw.client.0:Stopped
2023-04-04T18:50:19.249 DEBUG:teuthology.orchestra.run.smithi191:> rm -f /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock
2023-04-04T18:50:19.258 DEBUG:teuthology.orchestra.run.smithi191:> sudo rm -f /etc/ceph/vault-root-token
2023-04-04T18:50:19.274 DEBUG:teuthology.orchestra.run.smithi191:> radosgw-admin gc process --include-all
2023-04-04T18:50:23.714 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 has been restored
2023-04-05T06:39:49.154 DEBUG:teuthology.exit:Got signal 15; running 1 handler...
2023-04-05T06:39:49.186 DEBUG:teuthology.task.console_log:Killing console logger for smithi191
2023-04-05T06:39:49.187 DEBUG:teuthology.exit:Finished running handlers

Actions #2

Updated by Laura Flores about 1 year ago

  • Subject changed from rados/singleton-nomsgr: Health check failed: 1 full osd(s) (OSD_FULL) to rados/singleton-nomsgr: test failing from "Health check failed: 1 full osd(s) (OSD_FULL)" and "Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)"
Actions #3

Updated by Laura Flores 12 months ago

/a/yuriw-2023-04-25_21:30:50-rados-wip-yuri3-testing-2023-04-25-1147-distro-default-smithi/7253420

Actions #4

Updated by Nitzan Mordechai 12 months ago

  • Project changed from RADOS to rgw

It doesn't look like this is an OSD_FULL issue. We never got to the point of scanning for log-ignore messages; the job was stuck for 12 hours before teuthology killed it.
I think we are waiting on the rgw client, which may have dumped core without us capturing it:

  -306> 2023-04-04T22:32:57.496+0000 7f8b3005f7c0  5 rgw main: tl::expected<std::pair<boost::container::flat_map<long unsigned int, logback_generation>, obj_version>, boost::system::error_code> logback_generations::read(const DoutPrefixProvider*, optional_yield):410: oid=data_loggenerations_metadata not found
  -305> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding auth protocol: cephx
  -304> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding auth protocol: cephx
  -303> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding auth protocol: cephx
  -302> 2023-04-04T22:32:57.526+0000 7f8b11fd4700  2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
  -301> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
  -300> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
  -299> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
  -298> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
  -297> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
  -296> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
  -295> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
  -294> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
  -293> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
  -292> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
  -291> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
  -290> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.0.keyring
  -289> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 asok(0x55f851b3e000) register_command cache list hook 0x55f851854050
  -288> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 asok(0x55f851b3e000) register_command cache inspect hook 0x55f851854050
  -287> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 asok(0x55f851b3e000) register_command cache erase hook 0x55f851854050
  -286> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  5 asok(0x55f851b3e000) register_command cache zap hook 0x55f851854050
  -285> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 10 monclient: get_monmap_and_config
  -284> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 10 monclient: build_initial_monmap
  -283> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0  1 build_initial for_mkfs: 0
  -282> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 10 monclient: monmap:

Could someone from RGW take a look at this?
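The stall described above shows up in teuthology.log as a long silence between two consecutive timestamps, just before the "Got signal 15" line. Below is a minimal sketch for spotting that gap and listing the health-check codes in a log. The helper names (`largest_gap`, `health_codes`) are hypothetical and not part of teuthology; the sample lines are copied from the excerpt in comment #1.

```python
import re
from datetime import datetime

# Leading teuthology timestamp, e.g. "2023-04-04T18:50:23.714 "
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) ")
# Cluster health errors, e.g. "Health check failed: 1 full osd(s) (OSD_FULL)"
HEALTH_RE = re.compile(r"Health check failed: (.+?) \((\w+)\)")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f") if m else None


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest silence."""
    best = (0.0, None, None)
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation lines carry no timestamp
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


def health_codes(lines):
    """Return health-check codes (OSD_FULL, ...) in order of appearance."""
    return [m.group(2) for line in lines if (m := HEALTH_RE.search(line))]


# Sample lines taken from the excerpt in comment #1 of this ticket.
SAMPLE = [
    "2023-04-04T18:50:13.997 INFO:tasks.ceph.mon.a.smithi191.stderr:2023-04-04T18:50:13.994+0000 7fe05488a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)",
    "2023-04-04T18:50:17.024 INFO:tasks.ceph.mon.a.smithi191.stderr:2023-04-04T18:50:17.022+0000 7fe05488a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)",
    "2023-04-04T18:50:23.714 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 has been restored",
    "2023-04-05T06:39:49.154 DEBUG:teuthology.exit:Got signal 15; running 1 handler...",
]

if __name__ == "__main__":
    gap, before, after = largest_gap(SAMPLE)
    print(f"longest silence: {gap / 3600:.1f} h")
    print("last line before the gap:", before)
    print("health checks seen:", health_codes(SAMPLE))
```

To run it against a real job, pass the lines of that job's teuthology.log instead of `SAMPLE`. A multi-hour gap ending at "Got signal 15" would be consistent with a hang rather than a failure raised by the health checks themselves.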

Actions #5

Updated by Laura Flores 12 months ago

/a/yuriw-2023-04-24_22:54:45-rados-wip-yuri7-testing-2023-04-19-1343-distro-default-smithi/7250243

Actions #6

Updated by Casey Bodley 12 months ago

I don't see evidence of an rgw crash. But if an OSD is full, couldn't that cause radosgw to hang waiting on requests to it?

Actions #7

Updated by Nitzan Mordechai 12 months ago

Casey Bodley, can you please check this log, for example: /a/yuriw-2023-04-24_22:54:45-rados-wip-yuri7-testing-2023-04-19-1343-distro-default-smithi/7250243/remote/smithi138/log/rgw.ceph.client.0.log.gz

Actions #8

Updated by Laura Flores 11 months ago

/a/yuriw-2023-05-16_23:44:06-rados-wip-yuri10-testing-2023-05-16-1243-distro-default-smithi/7276080
/a/yuriw-2023-05-16_23:44:06-rados-wip-yuri10-testing-2023-05-16-1243-distro-default-smithi/7276235

Actions #9

Updated by Laura Flores 11 months ago

/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271099
/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271254

Actions #10

Updated by Laura Flores 11 months ago

/a/yuriw-2023-05-19_19:19:25-rados-wip-yuri11-testing-2023-05-19-0836-distro-default-smithi/7279399
/a/yuriw-2023-05-19_19:19:25-rados-wip-yuri11-testing-2023-05-19-0836-distro-default-smithi/7279244

Actions #11

Updated by Laura Flores 11 months ago

/a/yuriw-2023-06-01_19:33:38-rados-wip-yuri-testing-2023-06-01-0746-distro-default-smithi/7293994

Actions #12

Updated by Laura Flores 11 months ago

/a/yuriw-2023-06-06_14:45:18-rados-wip-yuri7-testing-2023-06-05-1505-distro-default-smithi/7296832

Actions #13

Updated by Laura Flores 10 months ago

/a/yuriw-2023-06-13_18:33:48-rados-wip-yuri10-testing-2023-06-02-1406-distro-default-smithi/7302949

Actions #14

Updated by Laura Flores 10 months ago

/a/yuriw-2023-07-03_15:32:36-rados-wip-yuri5-testing-2023-06-28-1515-distro-default-smithi/7325426

Actions #15

Updated by Laura Flores 10 months ago

/a/yuriw-2023-07-03_15:30:40-rados-wip-yuri7-testing-2023-06-23-1022-distro-default-smithi/7324875

Actions #16

Updated by Casey Bodley 10 months ago

I do see rgw crashes in these logs:

   -12> 2023-05-17T02:30:05.789+0000 7f8007d53700  1 do_command 'config diff' '{prefix=config diff}'
   -11> 2023-05-17T02:30:05.789+0000 7f8007d53700  1 do_command 'config diff' '{prefix=config diff}' result is 0 bytes
   -10> 2023-05-17T02:30:05.789+0000 7f8007d53700  1 do_command 'config help' '{prefix=config help}'
    -9> 2023-05-17T02:30:05.797+0000 7f8007d53700  1 do_command 'config help' '{prefix=config help}' result is 0 bytes
    -8> 2023-05-17T02:30:05.801+0000 7f8007d53700  1 do_command 'config show' '{prefix=config show}'
    -7> 2023-05-17T02:30:05.805+0000 7f8007d53700  1 do_command 'config show' '{prefix=config show}' result is 0 bytes
    -6> 2023-05-17T02:30:05.805+0000 7f8007d53700  1 do_command 'counter dump' '{prefix=counter dump}'
    -5> 2023-05-17T02:30:05.805+0000 7f8007d53700  1 do_command 'counter dump' '{prefix=counter dump}' result is 0 bytes
    -4> 2023-05-17T02:30:05.805+0000 7f8007d53700  1 do_command 'counter schema' '{prefix=counter schema}'
    -3> 2023-05-17T02:30:05.805+0000 7f8007d53700  1 do_command 'counter schema' '{prefix=counter schema}' result is 0 bytes
    -2> 2023-05-17T02:30:05.805+0000 7f8007d53700  1 do_command 'injectargs' '{prefix=injectargs}'
    -1> 2023-05-17T02:30:05.805+0000 7f8007d53700  1 do_command 'injectargs' '{prefix=injectargs}' result is 0 bytes
     0> 2023-05-17T02:30:05.805+0000 7f8007d53700  1 do_command 'log dump' '{prefix=log dump}'

But there's no stack trace, and ceph_test_admin_socket_output passes anyway.

Actions #17

Updated by Kamoltat (Junior) Sirivadhna 10 months ago

/a/yuriw-2023-07-10_18:41:02-rados-wip-yuri6-testing-2023-07-10-0816-distro-default-smithi/7332405

Actions #18

Updated by Laura Flores 9 months ago

/a/yuriw-2023-07-17_14:37:31-rados-wip-yuri-testing-2023-07-14-1641-distro-default-smithi/7341584

Actions #19

Updated by Laura Flores 8 months ago

/a/yuriw-2023-08-17_21:18:20-rados-wip-yuri11-testing-2023-08-17-0823-distro-default-smithi/7372055

Actions #20

Updated by Matan Breizman 8 months ago

/a//yuriw-2023-08-22_18:16:03-rados-wip-yuri10-testing-2023-08-17-1444-distro-default-smithi/7376756
/a//yuriw-2023-08-22_18:16:03-rados-wip-yuri10-testing-2023-08-17-1444-distro-default-smithi/7376912

Actions #21

Updated by Laura Flores 8 months ago

/a/yuriw-2023-08-15_18:58:56-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7369398

Actions #22

Updated by Laura Flores 8 months ago

/a/lflores-2023-09-05_22:05:24-rados-wip-yuri8-testing-2023-08-28-1340-distro-default-smithi/7389178

Actions #23

Updated by Laura Flores 8 months ago

/a/lflores-2023-09-06_18:23:03-rados-wip-yuri-testing-2023-08-25-0809-distro-default-smithi/7390358

Actions #24

Updated by Laura Flores 8 months ago

/a/lflores-2023-09-08_20:36:06-rados-wip-lflores-testing-2-2023-09-08-1755-distro-default-smithi/7391626

Actions #25

Updated by Laura Flores 6 months ago

/a/yuriw-2023-10-24_00:11:54-rados-wip-yuri4-testing-2023-10-23-0903-distro-default-smithi/7435687

Actions #26

Updated by Nitzan Mordechai 6 months ago

/a/yuriw-2023-10-30_15:34:36-rados-wip-yuri10-testing-2023-10-27-0804-distro-default-smithi/7441165
/a/yuriw-2023-10-30_15:34:36-rados-wip-yuri10-testing-2023-10-27-0804-distro-default-smithi/7441319

Actions #27

Updated by Laura Flores 6 months ago

  • Backport set to reef

/a/yuriw-2023-10-31_14:43:48-rados-wip-yuri4-testing-2023-10-30-1117-distro-default-smithi/7442070

Actions #28

Updated by Laura Flores 6 months ago

/a/yuriw-2023-10-24_00:11:03-rados-wip-yuri2-testing-2023-10-23-0917-distro-default-smithi/7435871

Actions #29

Updated by Laura Flores 5 months ago

/a/yuriw-2023-12-07_16:42:12-rados-wip-yuri2-testing-2023-12-06-1239-distro-default-smithi/7482272

Actions #30

Updated by Nitzan Mordechai 5 months ago

/a/yuriw-2023-12-11_23:27:14-rados-wip-yuri8-testing-2023-12-11-1101-distro-default-smithi/7487804
/a/yuriw-2023-12-11_23:27:14-rados-wip-yuri8-testing-2023-12-11-1101-distro-default-smithi/7487806
/a/yuriw-2023-12-11_23:27:14-rados-wip-yuri8-testing-2023-12-11-1101-distro-default-smithi/7487647

Actions #31

Updated by Aishwarya Mathuria 4 months ago

/a/yuriw-2024-01-03_16:19:00-rados-wip-yuri6-testing-2024-01-02-0832-distro-default-smithi/7505468

Actions #32

Updated by Laura Flores 3 months ago

/a/yuriw-2024-01-18_21:18:17-rados-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/7521174

Actions #33

Updated by Matan Breizman 3 months ago

/a/yuriw-2024-02-09_00:15:46-rados-wip-yuri2-testing-2024-02-08-0727-distro-default-smithi/7553401

Actions #34

Updated by Aishwarya Mathuria about 1 month ago

/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603379/
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603639/

Actions #35

Updated by Aishwarya Mathuria about 1 month ago

/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609873

Actions #36

Updated by Aishwarya Mathuria about 1 month ago

/a/yuriw-2024-03-20_14:28:32-rados-wip-yuri2-testing-2024-03-13-0827-distro-default-smithi/7612365/

Actions #37

Updated by Laura Flores about 1 month ago

/a/yuriw-2024-03-20_18:33:32-rados-wip-yuri6-testing-2024-03-18-1406-squid-distro-default-smithi/7613140
