Bug #54030
mgr/dashboard: cephadm e2e failing because of rgw commands getting stuck
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Description
Pasted below is an excerpt of the log in which the radosgw-admin command hangs because of the health of the cluster (the pgmap entries report the only PG as "unknown").
According to https://github.com/ceph/ceph/pull/44384#issuecomment-1019969906, the tests need to be adapted to account for this.
Jan 21 15:19:21 ceph-node-00 ceph-mgr[11741]: [cephadm INFO cephadm.serve] Checking dashboard <-> RGW credentials
Jan 21 15:19:21 ceph-node-00 ceph-mgr[11741]: log_channel(cephadm) log [INF] : Checking dashboard <-> RGW credentials
Jan 21 15:19:21 ceph-node-00 ceph-mgr[11741]: [dashboard INFO rgw_client] Configuring dashboard RGW credentials
Jan 21 15:19:21 ceph-node-00 ceph-mgr[11741]: [dashboard INFO orchestrator] is orchestrator available: True,
Jan 21 15:19:21 ceph-node-00 ceph-mgr[11741]: [dashboard INFO request] [::ffff:192.168.100.1:40492] [GET] [200] [0.015s] [admin] [385.0B] /api/service
Jan 21 15:19:22 ceph-node-00 ceph-mgr[11741]: log_channel(cluster) log [DBG] : pgmap v74: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
Jan 21 15:19:23 ceph-node-00 ceph-mgr[11741]: [progress INFO root] Writing back 2 completed events
Jan 21 15:19:23 ceph-node-00 ceph-mgr[11741]: [progress INFO root] Processing OSDMap change 5..5
Jan 21 15:19:24 ceph-node-00 ceph-mgr[11741]: log_channel(cluster) log [DBG] : pgmap v75: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
Jan 21 15:19:26 ceph-node-00 ceph-mgr[11741]: log_channel(cluster) log [DBG] : pgmap v76: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
Jan 21 15:19:26 ceph-node-00 ceph-mgr[11741]: [dashboard INFO request] [::ffff:192.168.100.1:40492] [GET] [200] [0.005s] [admin] [22.0B] /api/prometheus/notifications
Jan 21 15:19:26 ceph-node-00 ceph-mgr[11741]: [dashboard INFO request] [::ffff:192.168.100.1:40492] [GET] [200] [0.011s] [admin] [877.0B] /api/summary
Jan 21 15:19:26 ceph-node-00 ceph-mgr[11741]: [dashboard INFO orchestrator] is orchestrator available: True,
Jan 21 15:19:26 ceph-node-00 ceph-mgr[11741]: [dashboard INFO request] [::ffff:192.168.100.1:40492] [GET] [200] [0.011s] [admin] [385.0B] /api/service
Jan 21 15:19:28 ceph-node-00 ceph-mgr[11741]: log_channel(cluster) log [DBG] : pgmap v77: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
Jan 21 15:19:28 ceph-node-00 ceph-mgr[11741]: [progress INFO root] Processing OSDMap change 5..5
Jan 21 15:19:30 ceph-node-00 ceph-mgr[11741]: log_channel(cluster) log [DBG] : pgmap v78: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [dashboard INFO request] [::ffff:192.168.100.1:40492] [GET] [200] [0.007s] [admin] [22.0B] /api/prometheus/notifications
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [dashboard INFO request] [::ffff:192.168.100.1:40492] [GET] [200] [0.012s] [admin] [877.0B] /api/summary
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [dashboard ERROR root] Timeout (10s) executing radosgw-admin ['radosgw-admin', '-c', '/etc/ceph/ceph.conf', '-k', '/var/lib/ceph/mgr/ceph-ceph-node-00.njzmln/keyring', '-n', 'mgr.ceph-node-00.njzmln', 'realm', 'list']
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [dashboard ERROR rgw_client] Command '['radosgw-admin', '-c', '/etc/ceph/ceph.conf', '-k', '/var/lib/ceph/mgr/ceph-ceph-node-00.njzmln/keyring', '-n', 'mgr.ceph-node-00.njzmln', 'realm', 'list']' timed out after 10 seconds
Traceback (most recent call last):
  File "/lib64/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/lib64/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/lib64/python3.6/subprocess.py", line 1535, in _communicate
    self._check_timeout(endtime, orig_timeout)
  File "/lib64/python3.6/subprocess.py", line 891, in _check_timeout
    raise TimeoutExpired(self.args, orig_timeout)
subprocess.TimeoutExpired: Command '['radosgw-admin', '-c', '/etc/ceph/ceph.conf', '-k', '/var/lib/ceph/mgr/ceph-ceph-node-00.njzmln/keyring', '-n', 'mgr.ceph-node-00.njzmln', 'realm', 'list']' timed out after 10 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 243, in configure_rgw_credentials
    _, out, err = mgr.send_rgwadmin_command(['realm', 'list'])
  File "/usr/share/ceph/mgr/mgr_module.py", line 2235, in send_rgwadmin_command
    timeout=10,
  File "/lib64/python3.6/subprocess.py", line 430, in run
    stderr=stderr)
subprocess.TimeoutExpired: Command '['radosgw-admin', '-c', '/etc/ceph/ceph.conf', '-k', '/var/lib/ceph/mgr/ceph-ceph-node-00.njzmln/keyring', '-n', 'mgr.ceph-node-00.njzmln', 'realm', 'list']' timed out after 10 seconds
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [progress WARNING root] complete: ev a17c58b8-dd16-4393-8694-6c4c78723bd8 does not exist
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [progress WARNING root] complete: ev a092e721-a7cb-47d5-a218-4ef2698477b6 does not exist
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [progress WARNING root] complete: ev 609ee482-a1d1-49d6-8a6b-9e5076d0b978 does not exist
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [progress WARNING root] complete: ev 25dd2cd7-80c6-4450-bd53-d9d8384616ca does not exist
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [cephadm INFO cephadm.serve] Purge service mds.test
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: log_channel(cephadm) log [INF] : Purge service mds.test
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [dashboard INFO orchestrator] is orchestrator available: True,
Jan 21 15:19:31 ceph-node-00 ceph-mgr[11741]: [dashboard INFO request] [::ffff:192.168.100.1:40492] [GET] [200] [0.014s] [admin] [336.0B] /api/service
Jan 21 15:19:32 ceph-node-00 ceph-mgr[11741]: log_channel(cluster) log [DBG] : pgmap v79: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
Jan 21 15:19:33 ceph-node-00 ceph-mgr[11741]: [progress INFO root] Processing OSDMap change 5..5
Jan 21 15:19:34 ceph-node-00 ceph-mgr[11741]: [cephadm INFO cephadm.serve] Checking dashboard <-> RGW credentials
Jan 21 15:19:34 ceph-node-00 ceph-mgr[11741]: log_channel(cephadm) log [INF] : Checking dashboard <-> RGW credentials
Jan 21 15:19:34 ceph-node-00 ceph-mgr[11741]: [dashboard INFO rgw_client] Configuring dashboard RGW credentials
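The traceback in the log above shows subprocess.run() raising subprocess.TimeoutExpired once the 10-second limit passes. The following is a minimal sketch of that mechanism, not the code from mgr_module.py or the fix in the linked pull request. The helper name run_with_timeout is hypothetical, and a sleeping Python child process stands in for a radosgw-admin call that hangs while the cluster is unhealthy.

```python
import subprocess
import sys
from typing import List, Tuple


def run_with_timeout(cmd: List[str], timeout: float) -> Tuple[int, str, str]:
    """Hypothetical helper: run cmd and return (returncode, stdout, stderr).

    On timeout, return (-1, "", <message>) instead of letting the
    TimeoutExpired exception propagate to the caller.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() kills the child process before re-raising,
        # so no cleanup of the child is needed here.
        return -1, "", str(exc)
    return result.returncode, result.stdout, result.stderr


# Stand-in for a hanging radosgw-admin call (assumption for illustration):
# a child process that sleeps far longer than the allowed timeout.
hanging_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]
code, out, err = run_with_timeout(hanging_cmd, timeout=0.5)
print(code, err)
```

A caller that receives the -1 return code can log the failure and retry on a later pass, once the cluster reports healthy placement groups, rather than failing with a chained traceback.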
Related issues
History
#1 Updated by Ernesto Puerta about 2 years ago
- Status changed from New to Pending Backport
- Assignee set to Nizamudeen A
- Pull request ID set to 44825
#2 Updated by Backport Bot about 2 years ago
- Copied to Backport #54178: pacific: mgr/dashboard: cephadm e2e failing because of rgw commands getting stuck added
#3 Updated by Ernesto Puerta about 2 years ago
- Status changed from Pending Backport to Resolved