Bug #59380
rados/singleton-nomsgr: test failing from "Health check failed: 1 full osd(s) (OSD_FULL)" and "Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)"
Added by Laura Flores about 1 year ago. Updated 5 days ago.
Description
2023-03-31T00:25:12.169 INFO:tasks.rgw.client.0.smithi157.stdout:2023-03-31T00:25:12.167+0000 7f470012f700 -1 received signal: Terminated from /usr/bin/python3 /bin/daemon-helper term radosgw --rgw-frontends beast port=80 -n client.0 --cluster ceph -k /etc/ceph/ceph.client.0.keyring --log-file /var/log/ceph/rgw.ceph.client.0.log --rgw_ops_log_socket_path /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock --foreground (PID: 110884) UID: 0
2023-03-31T00:25:12.169 INFO:tasks.rgw.client.0.smithi157.stdout:2023-03-31T00:25:12.167+0000 7f47041857c0 -1 shutting down
2023-03-31T00:25:14.065 INFO:tasks.ceph.mon.a.smithi157.stderr:2023-03-31T00:25:14.063+0000 7f8bfe441700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2023-03-31T00:25:15.662 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~0s
2023-03-31T00:25:17.523 INFO:tasks.ceph.mds.a.smithi157.stderr:2023-03-31T00:25:17.522+0000 7f1f542c9700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2023-03-31T00:25:18.270 INFO:tasks.rgw.client.0:Stopped
2023-03-31T00:25:18.271 DEBUG:teuthology.orchestra.run.smithi157:> rm -f /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock
2023-03-31T00:25:18.298 DEBUG:teuthology.orchestra.run.smithi157:> sudo rm -f /etc/ceph/vault-root-token
2023-03-31T00:25:18.372 DEBUG:teuthology.orchestra.run.smithi157:> radosgw-admin gc process --include-all
2023-03-31T00:25:21.168 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 has been restored
2023-03-31T12:11:11.597 DEBUG:teuthology.exit:Got signal 15; running 1 handler...
2023-03-31T12:11:11.634 DEBUG:teuthology.task.console_log:Killing console logger for smithi157
2023-03-31T12:11:11.635 DEBUG:teuthology.exit:Finished running handlers
/a/yuriw-2023-03-30_21:29:24-rados-wip-yuri2-testing-2023-03-30-0826-distro-default-smithi/7227455
/a/yuriw-2023-04-04_15:24:40-rados-wip-yuri4-testing-2023-03-31-1237-distro-default-smithi/7231345
Updated by Laura Flores about 1 year ago
Also in the logs:
/a/yuriw-2023-04-04_15:22:56-rados-wip-yuri2-testing-2023-03-30-0826-distro-default-smithi/7231083
2023-04-04T18:50:13.147 INFO:tasks.rgw.client.0.smithi191.stdout:2023-04-04T18:50:13.142+0000 7ff108317700 -1 received signal: Terminated from /usr/bin/python3 /usr/bin/daemon-helper term radosgw --rgw-frontends beast port=80 -n client.0 --cluster ceph -k /etc/ceph/ceph.client.0.keyring --log-file /var/log/ceph/rgw.ceph.client.0.log --rgw_ops_log_socket_path /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock --foreground (PID: 29378) UID: 0
2023-04-04T18:50:13.147 INFO:tasks.rgw.client.0.smithi191.stdout:2023-04-04T18:50:13.142+0000 7ff10bb62f80 -1 shutting down
2023-04-04T18:50:13.997 INFO:tasks.ceph.mon.a.smithi191.stderr:2023-04-04T18:50:13.994+0000 7fe05488a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
2023-04-04T18:50:17.024 INFO:tasks.ceph.mon.a.smithi191.stderr:2023-04-04T18:50:17.022+0000 7fe05488a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)
2023-04-04T18:50:18.211 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~0s
2023-04-04T18:50:19.020 INFO:tasks.ceph.mds.a.smithi191.stderr:2023-04-04T18:50:19.018+0000 7fecb7b6d700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2023-04-04T18:50:19.249 INFO:tasks.rgw.client.0:Stopped
2023-04-04T18:50:19.249 DEBUG:teuthology.orchestra.run.smithi191:> rm -f /home/ubuntu/cephtest/rgw.opslog.ceph.client.0.sock
2023-04-04T18:50:19.258 DEBUG:teuthology.orchestra.run.smithi191:> sudo rm -f /etc/ceph/vault-root-token
2023-04-04T18:50:19.274 DEBUG:teuthology.orchestra.run.smithi191:> radosgw-admin gc process --include-all
2023-04-04T18:50:23.714 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 has been restored
2023-04-05T06:39:49.154 DEBUG:teuthology.exit:Got signal 15; running 1 handler...
2023-04-05T06:39:49.186 DEBUG:teuthology.task.console_log:Killing console logger for smithi191
2023-04-05T06:39:49.187 DEBUG:teuthology.exit:Finished running handlers
Updated by Laura Flores about 1 year ago
- Subject changed from rados/singleton-nomsgr: Health check failed: 1 full osd(s) (OSD_FULL) to rados/singleton-nomsgr: test failing from "Health check failed: 1 full osd(s) (OSD_FULL)" and "Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)"
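The two health checks named in the new subject can be pulled out of a captured log mechanically, so that each run can be sorted into OSD_FULL, MDS_ALL_DOWN, or both. What follows is a minimal Python triage sketch. It is not part of teuthology, the helper name failed_health_checks is hypothetical, and it assumes the monitor lines end with the health-check code in parentheses, as in the lines quoted above.

```python
import re
from typing import Iterable, List, Tuple

# Matches monitor cluster-log lines such as
#   "... log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)"
# and captures the human-readable summary and the health-check code.
HEALTH_FAILED = re.compile(
    r"Health check failed: (?P<summary>.+) \((?P<code>[A-Z_]+)\)\s*$"
)

# Two mon lines from job 7231083 above, plus one unrelated watchdog line.
SAMPLE = [
    "2023-04-04T18:50:13.997 INFO:tasks.ceph.mon.a.smithi191.stderr:2023-04-04T18:50:13.994+0000 7fe05488a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)",
    "2023-04-04T18:50:17.024 INFO:tasks.ceph.mon.a.smithi191.stderr:2023-04-04T18:50:17.022+0000 7fe05488a700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 full osd(s) (OSD_FULL)",
    "2023-04-04T18:50:18.211 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 is failed for ~0s",
]


def failed_health_checks(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (code, summary) for every 'Health check failed' line, in order."""
    found = []
    for line in lines:
        match = HEALTH_FAILED.search(line)
        if match:
            found.append((match.group("code"), match.group("summary")))
    return found


if __name__ == "__main__":
    for code, summary in failed_health_checks(SAMPLE):
        print(f"{code}: {summary}")
```

Lines without a "Health check failed" message, such as the watchdog line, are simply skipped.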
Updated by Laura Flores about 1 year ago
/a/yuriw-2023-04-25_21:30:50-rados-wip-yuri3-testing-2023-04-25-1147-distro-default-smithi/7253420
Updated by Nitzan Mordechai about 1 year ago
- Project changed from RADOS to rgw
It doesn't look like an OSD_FULL issue. We never reached the point where the log-ignore messages are scanned; the job hung for 12 hours before teuthology killed it.
I think it's the rgw client we are waiting on, and it may have had a core dump that we didn't collect:
-306> 2023-04-04T22:32:57.496+0000 7f8b3005f7c0 5 rgw main: tl::expected<std::pair<boost::container::flat_map<long unsigned int, logback_generation>, obj_version>, boost::system::error_code> logback_generations::read(const DoutPrefixProvider*, optional_yield):410: oid=data_loggenerations_metadata not found
-305> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding auth protocol: cephx
-304> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding auth protocol: cephx
-303> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding auth protocol: cephx
-302> 2023-04-04T22:32:57.526+0000 7f8b11fd4700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
-301> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-300> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
-299> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-298> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
-297> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-296> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
-295> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-294> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
-293> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-292> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: crc
-291> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 AuthRegistry(0x7ffe6a3da110) adding con mode: secure
-290> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.0.keyring
-289> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 asok(0x55f851b3e000) register_command cache list hook 0x55f851854050
-288> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 asok(0x55f851b3e000) register_command cache inspect hook 0x55f851854050
-287> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 asok(0x55f851b3e000) register_command cache erase hook 0x55f851854050
-286> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 5 asok(0x55f851b3e000) register_command cache zap hook 0x55f851854050
-285> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 10 monclient: get_monmap_and_config
-284> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 10 monclient: build_initial_monmap
-283> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 1 build_initial for_mkfs: 0
-282> 2023-04-04T22:32:57.526+0000 7f8b3005f7c0 10 monclient: monmap:
Could someone from RGW check this?
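The "hung for 12 hours" observation can be quantified from the timestamps already quoted in the description. This is a minimal Python sketch. The helper names line_time and idle_gap are hypothetical and are not teuthology code. It assumes that each line starts with the teuthology timestamp followed by a space. It measures the gap between the last task output ("has been restored") and teuthology receiving signal 15.

```python
from datetime import datetime, timedelta

# teuthology prefixes every line with a timestamp like "2023-03-31T00:25:21.168".
TEUTHOLOGY_TS = "%Y-%m-%dT%H:%M:%S.%f"


def line_time(line: str) -> datetime:
    """Parse the teuthology timestamp at the start of a log line."""
    return datetime.strptime(line.split(" ", 1)[0], TEUTHOLOGY_TS)


def idle_gap(last_activity: str, sigterm: str) -> timedelta:
    """Time between the last task output and teuthology receiving signal 15."""
    return line_time(sigterm) - line_time(last_activity)


# (last task output, SIGTERM) line pairs copied from the two runs quoted above.
RUNS = {
    "2023-03-31 run": (
        "2023-03-31T00:25:21.168 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 has been restored",
        "2023-03-31T12:11:11.597 DEBUG:teuthology.exit:Got signal 15; running 1 handler...",
    ),
    "7231083": (
        "2023-04-04T18:50:23.714 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.rgw.client.0 has been restored",
        "2023-04-05T06:39:49.154 DEBUG:teuthology.exit:Got signal 15; running 1 handler...",
    ),
}

if __name__ == "__main__":
    for job, (last, kill) in RUNS.items():
        # prints "2023-03-31 run 11:45:50.429000" and "7231083 11:49:25.440000"
        print(job, idle_gap(last, kill))
```

Both gaps come out just under 12 hours, which matches the hang described in the comment.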
Updated by Laura Flores about 1 year ago
/a/yuriw-2023-04-24_22:54:45-rados-wip-yuri7-testing-2023-04-19-1343-distro-default-smithi/7250243
Updated by Casey Bodley about 1 year ago
I don't see evidence of an rgw crash. But if an OSD is full, couldn't that cause radosgw to hang while waiting on requests to it?
Updated by Nitzan Mordechai about 1 year ago
Casey Bodley, can you please check this log, for example: /a/yuriw-2023-04-24_22:54:45-rados-wip-yuri7-testing-2023-04-19-1343-distro-default-smithi/7250243/remote/smithi138/log/rgw.ceph.client.0.log.gz
Updated by Laura Flores about 1 year ago
/a/yuriw-2023-05-16_23:44:06-rados-wip-yuri10-testing-2023-05-16-1243-distro-default-smithi/7276080
/a/yuriw-2023-05-16_23:44:06-rados-wip-yuri10-testing-2023-05-16-1243-distro-default-smithi/7276235
Updated by Laura Flores about 1 year ago
/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271099
/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271254
Updated by Laura Flores 12 months ago
/a/yuriw-2023-05-19_19:19:25-rados-wip-yuri11-testing-2023-05-19-0836-distro-default-smithi/7279399
/a/yuriw-2023-05-19_19:19:25-rados-wip-yuri11-testing-2023-05-19-0836-distro-default-smithi/7279244
Updated by Laura Flores 12 months ago
/a/yuriw-2023-06-01_19:33:38-rados-wip-yuri-testing-2023-06-01-0746-distro-default-smithi/7293994
Updated by Laura Flores 12 months ago
/a/yuriw-2023-06-06_14:45:18-rados-wip-yuri7-testing-2023-06-05-1505-distro-default-smithi/7296832
Updated by Laura Flores 11 months ago
/a/yuriw-2023-06-13_18:33:48-rados-wip-yuri10-testing-2023-06-02-1406-distro-default-smithi/7302949
Updated by Laura Flores 11 months ago
/a/yuriw-2023-07-03_15:32:36-rados-wip-yuri5-testing-2023-06-28-1515-distro-default-smithi/7325426
Updated by Laura Flores 11 months ago
/a/yuriw-2023-07-03_15:30:40-rados-wip-yuri7-testing-2023-06-23-1022-distro-default-smithi/7324875
Updated by Casey Bodley 10 months ago
I do see rgw crashes in these logs:
-12> 2023-05-17T02:30:05.789+0000 7f8007d53700 1 do_command 'config diff' '{prefix=config diff}'
-11> 2023-05-17T02:30:05.789+0000 7f8007d53700 1 do_command 'config diff' '{prefix=config diff}' result is 0 bytes
-10> 2023-05-17T02:30:05.789+0000 7f8007d53700 1 do_command 'config help' '{prefix=config help}'
-9> 2023-05-17T02:30:05.797+0000 7f8007d53700 1 do_command 'config help' '{prefix=config help}' result is 0 bytes
-8> 2023-05-17T02:30:05.801+0000 7f8007d53700 1 do_command 'config show' '{prefix=config show}'
-7> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'config show' '{prefix=config show}' result is 0 bytes
-6> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'counter dump' '{prefix=counter dump}'
-5> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'counter dump' '{prefix=counter dump}' result is 0 bytes
-4> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'counter schema' '{prefix=counter schema}'
-3> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'counter schema' '{prefix=counter schema}' result is 0 bytes
-2> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'injectargs' '{prefix=injectargs}'
-1> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'injectargs' '{prefix=injectargs}' result is 0 bytes
0> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'log dump' '{prefix=log dump}'
But there's no stack trace, and ceph_test_admin_socket_output passes anyway.
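The daemon log dumps in this issue were pasted as single lines, which makes them hard to read. This is a minimal Python sketch for splitting such a paste back into one entry per line. The helper split_dump is hypothetical. It assumes that every entry begins with an offset marker like "-12>" or "0>" followed by an ISO-8601 timestamp, as in the excerpts above.

```python
import re
from typing import List

# Each pasted entry starts with its offset marker ("-12>", ..., "-1>", "0>")
# followed by a timestamp; split on the whitespace just before each marker.
ENTRY_START = re.compile(r"\s+(?=-?\d+> \d{4}-\d{2}-\d{2}T)")

# Tail of the rgw dump quoted above, as it looked when pasted on one line.
SAMPLE = (
    "-2> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'injectargs' '{prefix=injectargs}' "
    "-1> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'injectargs' '{prefix=injectargs}' result is 0 bytes "
    "0> 2023-05-17T02:30:05.805+0000 7f8007d53700 1 do_command 'log dump' '{prefix=log dump}'"
)


def split_dump(blob: str) -> List[str]:
    """Turn a single-line pasted log dump back into one entry per line."""
    return [entry for entry in ENTRY_START.split(blob.strip()) if entry]


if __name__ == "__main__":
    for entry in split_dump(SAMPLE):
        print(entry)
```

The lookahead leaves each marker attached to the entry it starts. Only the separating whitespace is consumed.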
Updated by Kamoltat (Junior) Sirivadhna 10 months ago
/a/yuriw-2023-07-10_18:41:02-rados-wip-yuri6-testing-2023-07-10-0816-distro-default-smithi/7332405
Updated by Laura Flores 10 months ago
/a/yuriw-2023-07-17_14:37:31-rados-wip-yuri-testing-2023-07-14-1641-distro-default-smithi/7341584
Updated by Laura Flores 9 months ago
/a/yuriw-2023-08-17_21:18:20-rados-wip-yuri11-testing-2023-08-17-0823-distro-default-smithi/7372055
Updated by Matan Breizman 9 months ago
/a/yuriw-2023-08-22_18:16:03-rados-wip-yuri10-testing-2023-08-17-1444-distro-default-smithi/7376756
/a/yuriw-2023-08-22_18:16:03-rados-wip-yuri10-testing-2023-08-17-1444-distro-default-smithi/7376912
Updated by Laura Flores 9 months ago
/a/yuriw-2023-08-15_18:58:56-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7369398
Updated by Laura Flores 9 months ago
/a/lflores-2023-09-05_22:05:24-rados-wip-yuri8-testing-2023-08-28-1340-distro-default-smithi/7389178
Updated by Laura Flores 9 months ago
/a/lflores-2023-09-06_18:23:03-rados-wip-yuri-testing-2023-08-25-0809-distro-default-smithi/7390358
Updated by Laura Flores 8 months ago
/a/lflores-2023-09-08_20:36:06-rados-wip-lflores-testing-2-2023-09-08-1755-distro-default-smithi/7391626
Updated by Laura Flores 7 months ago
/a/yuriw-2023-10-24_00:11:54-rados-wip-yuri4-testing-2023-10-23-0903-distro-default-smithi/7435687
Updated by Nitzan Mordechai 7 months ago
/a/yuriw-2023-10-30_15:34:36-rados-wip-yuri10-testing-2023-10-27-0804-distro-default-smithi/7441165
/a/yuriw-2023-10-30_15:34:36-rados-wip-yuri10-testing-2023-10-27-0804-distro-default-smithi/7441319
Updated by Laura Flores 7 months ago
- Backport set to reef
/a/yuriw-2023-10-31_14:43:48-rados-wip-yuri4-testing-2023-10-30-1117-distro-default-smithi/7442070
Updated by Laura Flores 7 months ago
/a/yuriw-2023-10-24_00:11:03-rados-wip-yuri2-testing-2023-10-23-0917-distro-default-smithi/7435871
Updated by Laura Flores 5 months ago
/a/yuriw-2023-12-07_16:42:12-rados-wip-yuri2-testing-2023-12-06-1239-distro-default-smithi/7482272
Updated by Nitzan Mordechai 5 months ago
/a/yuriw-2023-12-11_23:27:14-rados-wip-yuri8-testing-2023-12-11-1101-distro-default-smithi/7487804
/a/yuriw-2023-12-11_23:27:14-rados-wip-yuri8-testing-2023-12-11-1101-distro-default-smithi/7487806
/a/yuriw-2023-12-11_23:27:14-rados-wip-yuri8-testing-2023-12-11-1101-distro-default-smithi/7487647
Updated by Aishwarya Mathuria 5 months ago
/a/yuriw-2024-01-03_16:19:00-rados-wip-yuri6-testing-2024-01-02-0832-distro-default-smithi/7505468
Updated by Laura Flores 4 months ago
/a/yuriw-2024-01-18_21:18:17-rados-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/7521174
Updated by Matan Breizman 3 months ago
/a/yuriw-2024-02-09_00:15:46-rados-wip-yuri2-testing-2024-02-08-0727-distro-default-smithi/7553401
Updated by Aishwarya Mathuria 2 months ago
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603379/
/a/yuriw-2024-03-15_19:59:43-rados-wip-yuri6-testing-2024-03-15-0709-distro-default-smithi/7603639/
Updated by Aishwarya Mathuria 2 months ago
/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609873
Updated by Aishwarya Mathuria 2 months ago
/a/yuriw-2024-03-20_14:28:32-rados-wip-yuri2-testing-2024-03-13-0827-distro-default-smithi/7612365/
Updated by Laura Flores about 2 months ago
/a/yuriw-2024-03-20_18:33:32-rados-wip-yuri6-testing-2024-03-18-1406-squid-distro-default-smithi/7613140
Updated by Aishwarya Mathuria 18 days ago · Edited
/a/yuriw-2024-04-30_14:17:59-rados-wip-yuri5-testing-2024-04-17-1400-distro-default-smithi/7681095/
/a/yuriw-2024-04-30_14:17:59-rados-wip-yuri5-testing-2024-04-17-1400-distro-default-smithi/7680996/
Updated by Laura Flores 15 days ago
/a/yuriw-2024-05-02_23:59:28-rados-wip-yuriw11-testing-20240501.200505-squid-distro-default-smithi/7687043
Updated by Sridhar Seshasayee 5 days ago
/a/yuriw-2024-05-14_00:32:08-rados-wip-yuri4-testing-2024-04-29-0642-distro-default-smithi/7705412
/a/yuriw-2024-05-14_00:32:08-rados-wip-yuri4-testing-2024-04-29-0642-distro-default-smithi/7705439