Bug #59380
open
rados/singleton-nomsgr: test failing from "Health check failed: 1 full osd(s) (OSD_FULL)" and "Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)"
0%
Description
2023-03-31T00:25:12.169 INFO:tasks.rgw.client.0.smithi157.stdout:2023-03-31T00:25:12.167+0000 7f470012f700 -1 received signal: Terminated from /usr/bin/python3 /bin/daemon-helper term radosgw --rgw-frontends beast port=80 -n client.0 --cluster ceph -k /etc/ceph/ceph.client.0.keyring --log-file /var/log/ceph/rgw.ceph.client.0.log --rgw_ops_log_socket_path /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock --foreground (PID: 110884) UID: 0
2023-03-31T00:25:12.169 INFO:tasks.rgw.client.0.smithi157.stdout:2023-03-31T00:25:12.167+0000 7f47041857c0 -1 shutting down
2023-03-31T00:25:14.065 INFO:tasks.ceph.mon.a.smithi157.stderr:2023-03-31T00:25:14.063+0000 7f8bfe441700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2023-03-31T00:25:15.662 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~0s
2023-03-31T00:25:17.523 INFO:tasks.ceph.mds.a.smithi157.stderr:2023-03-31T00:25:17.522+0000 7f1f542c9700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2023-03-31T00:25:18.270 INFO:tasks.rgw.client.0:Stopped
2023-03-31T00:25:18.271 DEBUG:teuthology.orchestra.run.smithi157:> rm -f /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock
2023-03-31T00:25:18.298 DEBUG:teuthology.orchestra.run.smithi157:> sudo rm -f /etc/ceph/vault-root-token
2023-03-31T00:25:18.372 DEBUG:teuthology.orchestra.run.smithi157:> radosgw-admin gc process --include-all
2023-03-31T00:25:21.168 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 has been restored
2023-03-31T12:11:11.597 DEBUG:teuthology.exit:Got signal 15; running 1 handler...
2023-03-31T12:11:11.634 DEBUG:teuthology.task.console_log:Killing console logger for smithi157
2023-03-31T12:11:11.635 DEBUG:teuthology.exit:Finished running handlers
/a/yuriw-2023-03-30_21:29:24-rados-wip-yuri2-testing-2023-03-30-0826-distro-default-smithi/7227455
/a/yuriw-2023-04-04_15:24:40-rados-wip-yuri4-testing-2023-03-31-1237-distro-default-smithi/7231345
Updated by Laura Flores about 1 year ago
Also in the logs:
/a/yuriw-2023-04-04_15:22:56-rados-wip-yuri2-testing-2023-03-30-0826-distro-default-smithi/7231083
2023-04-04T18:50:13.147 INFO:tasks.rgw.client.0.smithi191.stdout:2023-04-04T18:50:13.142+0000 7ff108317700 -1 received signal: Terminated from /usr/bin/python3 /usr/bin/daemon-helper term radosgw --rgw-frontends beast port=80 -n client.0 --cluster ceph -k /etc/ceph/ceph.client.0.keyring --log-file /var/log/ceph/rgw.ceph.client.0.log --rgw_ops_log_socket_path /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock --foreground (PID: 29378) UID: 0
2023-04-04T18:50:13.147 INFO:tasks.rgw.client.0.smithi191.stdout:2023-04-04T18:50:13.142+0000 7ff10bb62f80 -1 shutting down
2023-04-04T18:50:13.997 INFO:tasks.ceph.mon.a.smithi191.stderr:2023-04-04T18:50:13.994+0000 7fe05488a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
2023-04-04T18:50:17.024 INFO:tasks.ceph.mon.a.smithi191.stderr:2023-04-04T18:50:17.022+0000 7fe05488a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2023-04-04T18:50:18.211 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~0s
2023-04-04T18:50:19.020 INFO:tasks.ceph.mds.a.smithi191.stderr:2023-04-04T18:50:19.018+0000 7fecb7b6d700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2023-04-04T18:50:19.249 INFO:tasks.rgw.client.0:Stopped
2023-04-04T18:50:19.249 DEBUG:teuthology.orchestra.run.smithi191:> rm -f /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock
2023-04-04T18:50:19.258 DEBUG:teuthology.orchestra.run.smithi191:> sudo rm -f /etc/ceph/vault-root-token
2023-04-04T18:50:19.274 DEBUG:teuthology.orchestra.run.smithi191:> radosgw-admin gc process --include-all
2023-04-04T18:50:23.714 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 has been restored
2023-04-05T06:39:49.154 DEBUG:teuthology.exit:Got signal 15; running 1 handler...
2023-04-05T06:39:49.186 DEBUG:teuthology.task.console_log:Killing console logger for smithi191
2023-04-05T06:39:49.187 DEBUG:teuthology.exit:Finished running handlers
Updated by Laura Flores about 1 year ago
- Subject changed from rados/singleton-nomsgr: Health check failed: 1 full osd(s) (OSD_FULL) to rados/singleton-nomsgr: test failing from "Health check failed: 1 full osd(s) (OSD_FULL)" and "Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)"
Updated by Laura Flores 12 months ago
/a/yuriw-2023-04-25_21:30:50-rados-wip-yuri3-testing-2023-04-25-1147-distro-default-smithi/7253420
Updated by Nitzan Mordechai 12 months ago
- Project changed from RADOS to rgw
This doesn't look like an OSD_FULL issue. The job never reached the point where it scans for log-ignore messages; it was stuck for about 12 hours before teuthology killed it.
I think we are waiting on the rgw client, which may have dumped core without us capturing it:
-306> 2023-04-04T22:32:57.496+0000 7f8b3005f7c0 5 rgw main: tl::expected<std::pair<boost::container::flat_map<long unsigned int, logback_generation>, obj_version>, boost::system::error_code> logback_generations::read(const DoutPrefixProvider*, optional_yield):410: oid=data_loggenerations_metadata not found
-305> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding auth protocol: cephx
-304> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding auth protocol: cephx
-303> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding auth protocol: cephx
-302> 2023-04-04T22:32:57.526+0000 7f8b11fd4700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
-301> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-300> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
-299> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-298> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
-297> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-296> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
-295> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-294> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
-293> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-292> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
-291> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-290> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.0.keyring
-289> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 asok(0x55f851b3e000) register_command cache list hook 0x55f851854050
-288> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 asok(0x55f851b3e000) register_command cache inspect hook 0x55f851854050
-287> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 asok(0x55f851b3e000) register_command cache erase hook 0x55f851854050
-286> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 asok(0x55f851b3e000) register_command cache zap hook 0x55f851854050
-285> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 10 monclient: get_monmap_and_config
-284> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 10 monclient: build_initial_monmap
-283> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 1 build_initial for_mkfs: 0
-282> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 10 monclient: monmap:
Could someone from the RGW team check this?
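The 12-hour stall described above can be located mechanically by looking for large gaps between consecutive teuthology.log timestamps. The following is a minimal sketch, not part of teuthology: the function name `find_stalls` and the one-hour threshold are illustrative assumptions, and the sample lines are copied from the description of this bug.

```python
from datetime import datetime, timedelta


def find_stalls(lines, threshold=timedelta(hours=1)):
    """Return (previous_line, next_line, gap) for every timestamp gap above threshold.

    Hypothetical helper: assumes each relevant line starts with a
    teuthology-style timestamp such as 2023-03-31T00:25:21.168.
    """
    stalls = []
    prev_ts, prev_line = None, None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            ts = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev_ts is not None and ts - prev_ts > threshold:
            stalls.append((prev_line, line, ts - prev_ts))
        prev_ts, prev_line = ts, line
    return stalls


sample = [
    "2023-03-31T00:25:18.372 DEBUG:teuthology.orchestra.run.smithi157:> radosgw-admin gc process --include-all",
    "2023-03-31T00:25:21.168 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 has been restored",
    "2023-03-31T12:11:11.597 DEBUG:teuthology.exit:Got signal 15; running 1 handler...",
]

for before, after, gap in find_stalls(sample):
    print(f"stalled for {gap} after: {before}")
```

On the sample lines this reports a single gap of roughly 11 hours 45 minutes, between the daemon watchdog message and the signal 15 from teuthology.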
Updated by Laura Flores 12 months ago
/a/yuriw-2023-04-24_22:54:45-rados-wip-yuri7-testing-2023-04-19-1343-distro-default-smithi/7250243
Updated by Casey Bodley 12 months ago
I don't see evidence of an rgw crash. But if an OSD is full, couldn't that cause radosgw to hang while waiting on requests to it?
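One way to test the hang hypothesis by hand would be to run the last command teuthology issues before the gap (`radosgw-admin gc process --include-all`, per the logs in the description) under a timeout, so that a blocked request surfaces in minutes rather than after 12 hours. This is only a sketch: `run_with_timeout` is a hypothetical helper, not a teuthology API, and whether that command is the one that hangs has not been established in this thread.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run argv and return (returncode, timed_out).

    Hypothetical helper: subprocess.run kills the child and raises
    TimeoutExpired if it does not finish within timeout_s seconds.
    """
    try:
        proc = subprocess.run(argv, timeout=timeout_s, check=False)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        return None, True


if __name__ == "__main__":
    # Stand-in child process so the sketch runs anywhere; on a test node the
    # argv would instead be ["radosgw-admin", "gc", "process", "--include-all"].
    argv = [sys.executable, "-c", "import time; time.sleep(5)"]
    code, timed_out = run_with_timeout(argv, timeout_s=1)
    print("timed out" if timed_out else f"exit code {code}")
```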
Updated by Nitzan Mordechai 12 months ago
Casey Bodley, can you please check this log, for example:
/a/yuriw-2023-04-24_22:54:45-rados-wip-yuri7-testing-2023-04-19-1343-distro-default-smithi/7250243/remote/smithi138/log/rgw.ceph.client.0.log.gz
Updated by Laura Flores 11 months ago
/a/yuriw-2023-05-16_23:44:06-rados-wip-yuri10-testing-2023-05-16-1243-distro-default-smithi/7276080
/a/yuriw-2023-05-16_23:44:06-rados-wip-yuri10-testing-2023-05-16-1243-distro-default-smithi/7276235
Updated by Laura Flores 11 months ago
/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271099
/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271254
Updated by Laura Flores 11 months ago
/a/yuriw-2023-05-19_19:19:25-rados-wip-yuri11-testing-2023-05-19-0836-distro-default-smithi/7279399
/a/yuriw-2023-05-19_19:19:25-rados-wip-yuri11-testing-2023-05-19-0836-distro-default-smithi/7279244
Updated by Laura Flores 11 months ago
/a/yuriw-2023-06-01_19:33:38-rados-wip-yuri-testing-2023-06-01-0746-distro-default-smithi/7293994
Updated by Laura Flores 11 months ago
/a/yuriw-2023-06-06_14:45:18-rados-wip-yuri7-testing-2023-06-05-1505-distro-default-smithi/7296832
Updated by Laura Flores 10 months ago
/a/yuriw-2023-06-13_18:33:48-rados-wip-yuri10-testing-2023-06-02-1406-distro-default-smithi/7302949
Updated by Laura Flores 10 months ago
/a/yuriw-2023-07-03_15:32:36-rados-wip-yuri5-testing-2023-06-28-1515-distro-default-smithi/7325426
Updated by Laura Flores 10 months ago
/a/yuriw-2023-07-03_15:30:40-rados-wip-yuri7-testing-2023-06-23-1022-distro-default-smithi/7324875
Updated by Casey Bodley 10 months ago
I do see rgw crashes in these logs:
-12> 2023-05-17T02:30:05.789+0000 7f8007d53700 1 do_command 'config diff' '{prefix=config diff}'
-11> 2023-05-17T02:30:05.789+0000 7f8007d53700 1 do_command 'config diff' '{prefix=config diff}' result is 0 bytes
-10> 2023-05-17T02:30:05.789+0000 7f8007d53700 1 do_command 'config help' '{prefix=config help}'
-9> 2023-05-17T02:30:05.797+0000 7f8007d53700 1 do_command 'config help' '{prefix=config help}' result is 0 bytes
-8> 2023-05-17T02:30:05.801+0000 7f8007d53700 1 do_command 'config show' '{prefix=config show}'
-7> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'config show' '{prefix=config show}' result is 0 bytes
-6> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'counter dump' '{prefix=counter dump}'
-5> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'counter dump' '{prefix=counter dump}' result is 0 bytes
-4> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'counter schema' '{prefix=counter schema}'
-3> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'counter schema' '{prefix=counter schema}' result is 0 bytes
-2> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'injectargs' '{prefix=injectargs}'
-1> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'injectargs' '{prefix=injectargs}' result is 0 bytes
0> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'log dump' '{prefix=log dump}'
but there's no stack trace, and ceph_test_admin_socket_output passes anyway
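To check a dump like the one above for crash evidence, the recent-events entries can be split on their `-N>` index markers and scanned for crash markers. This is a rough sketch: `split_recent_events` and `looks_like_crash` are hypothetical helpers, and the marker strings ("Caught signal", "FAILED ceph_assert") are assumptions about what a crashing daemon would log, not something confirmed in this thread.

```python
import re

# An entry starts with an index such as "-12>" or "0>" followed by a timestamp.
ENTRY = re.compile(r"(?:^|\s)(-?\d+)> (?=\d{4}-\d{2}-\d{2}T)")

# Assumed crash markers; adjust to whatever the daemon actually logs.
CRASH_MARKERS = ("Caught signal", "FAILED ceph_assert")


def split_recent_events(text):
    """Split a recent-events dump into (index, message) pairs."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((int(match.group(1)), text[match.end():end].strip()))
    return entries


def looks_like_crash(entries, markers=CRASH_MARKERS):
    """Return True if any entry contains one of the assumed crash markers."""
    return any(marker in message for _, message in entries for marker in markers)


dump = (
    "-2> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'injectargs' '{prefix=injectargs}' "
    "-1> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'injectargs' '{prefix=injectargs}' result is 0 bytes "
    "0> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'log dump' '{prefix=log dump}'"
)

entries = split_recent_events(dump)
print("last entry:", entries[-1][1])
print("crash markers found:", looks_like_crash(entries))
```

In the excerpt above, the final entry is the `log dump` admin socket command itself. One possible reading, which is an assumption rather than something established here, is that the dump was requested through the admin socket by the test and not produced by a crash.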
Updated by Kamoltat (Junior) Sirivadhna 10 months ago
/a/yuriw-2023-07-10_18:41:02-rados-wip-yuri6-testing-2023-07-10-0816-distro-default-smithi/7332405
Updated by Laura Flores 9 months ago
/a/yuriw-2023-07-17_14:37:31-rados-wip-yuri-testing-2023-07-14-1641-distro-default-smithi/7341584
Updated by Laura Flores 8 months ago
/a/yuriw-2023-08-17_21:18:20-rados-wip-yuri11-testing-2023-08-17-0823-distro-default-smithi/7372055
Updated by Matan Breizman 8 months ago
/a/yuriw-2023-08-22_18:16:03-rados-wip-yuri10-testing-2023-08-17-1444-distro-default-smithi/7376756
/a/yuriw-2023-08-22_18:16:03-rados-wip-yuri10-testing-2023-08-17-1444-distro-default-smithi/7376912
Updated by Laura Flores 8 months ago
/a/yuriw-2023-08-15_18:58:56-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7369398
Updated by Laura Flores 8 months ago
/a/lflores-2023-09-05_22:05:24-rados-wip-yuri8-testing-2023-08-28-1340-distro-default-smithi/7389178
Updated by Laura Flores 8 months ago
/a/lflores-2023-09-06_18:23:03-rados-wip-yuri-testing-2023-08-25-0809-distro-default-smithi/7390358
Updated by Laura Flores 8 months ago
/a/lflores-2023-09-08_20:36:06-rados-wip-lflores-testing-2-2023-09-08-1755-distro-default-smithi/7391626
Updated by Laura Flores 6 months ago
/a/yuriw-2023-10-24_00:11:54-rados-wip-yuri4-testing-2023-10-23-0903-distro-default-smithi/7435687
Updated by Nitzan Mordechai 6 months ago
/a/yuriw-2023-10-30_15:34:36-rados-wip-yuri10-testing-2023-10-27-0804-distro-default-smithi/7441165
/a/yuriw-2023-10-30_15:34:36-rados-wip-yuri10-testing-2023-10-27-0804-distro-default-smithi/7441319
Updated by Laura Flores 6 months ago
- Backport set to reef
/a/yuriw-2023-10-31_14:43:48-rados-wip-yuri4-testing-2023-10-30-1117-distro-default-smithi/7442070
Updated by Laura Flores 6 months ago
/a/yuriw-2023-10-24_00:11:03-rados-wip-yuri2-testing-2023-10-23-0917-distro-default-smithi/7435871
Updated by Laura Flores 5 months ago
/a/yuriw-2023-12-07_16:42:12-rados-wip-yuri2-testing-2023-12-06-1239-distro-default-smithi/7482272
Updated by Nitzan Mordechai 5 months ago
/a/yuriw-2023-12-11_23:27:14-rados-wip-yuri8-testing-2023-12-11-1101-distro-default-smithi/7487804
/a/yuriw-2023-12-11_23:27:14-rados-wip-yuri8-testing-2023-12-11-1101-distro-default-smithi/7487806
/a/yuriw-2023-12-11_23:27:14-rados-wip-yuri8-testing-2023-12-11-1101-distro-default-smithi/7487647
Updated by Aishwarya Mathuria 4 months ago
/a/yuriw-2024-01-03_16:19:00-rados-wip-yuri6-testing-2024-01-02-0832-distro-default-smithi/7505468
Updated by Laura Flores 3 months ago
/a/yuriw-2024-01-18_21:18:17-rados-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/7521174
Updated by Matan Breizman 3 months ago
/a/yuriw-2024-02-09_00:15:46-rados-wip-yuri2-testing-2024-02-08-0727-distro-default-smithi/7553401
Updated by Aishwarya Mathuria about 1 month ago
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603379/
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603639/
Updated by Aishwarya Mathuria about 1 month ago
/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609873
Updated by Aishwarya Mathuria about 1 month ago
/a/yuriw-2024-03-20_14:28:32-rados-wip-yuri2-testing-2024-03-13-0827-distro-default-smithi/7612365/
Updated by Laura Flores about 1 month ago
/a/yuriw-2024-03-20_18:33:32-rados-wip-yuri6-testing-2024-03-18-1406-squid-distro-default-smithi/7613140