Bug #53680
Closed
ERROR:tasks.rook:'waiting for service removal' reached maximum tries (90) after waiting for 900 seconds
Description
/a/yuriw-2021-12-17_22:45:37-rados-wip-yuri10-testing-2021-12-17-1119-distro-default-smithi/6569344/
2021-12-18T01:35:01.041 INFO:teuthology.orchestra.run.smithi185.stdout:[{"placement": {"host_pattern": "*"}, "service_name": "crash", "service_type": "crash", "status": {"container_image_id": "1b23219043771c7aeaf383e73d414c625d03fae66d69ca7172f26ede96eefd1d", "container_image_name": "quay.ceph.io/ceph-ci/ceph:91fdab49fed87aa0a3dbbceccc27e84ab4f80130", "created": "2021-12-18T01:11:19.000000Z", "last_refresh": "2021-12-18T01:35:00.947402Z", "running": 1, "size": 1}}, {"placement": {"count": 1}, "service_name": "mgr", "service_type": "mgr", "status": {"container_image_id": "1b23219043771c7aeaf383e73d414c625d03fae66d69ca7172f26ede96eefd1d", "container_image_name": "quay.ceph.io/ceph-ci/ceph:91fdab49fed87aa0a3dbbceccc27e84ab4f80130", "created": "2021-12-18T01:02:05.000000Z", "last_refresh": "2021-12-18T01:35:00.947402Z", "running": 1, "size": 1}}, {"placement": {"count": 1}, "service_name": "mon", "service_type": "mon", "status": {"container_image_id": "1b23219043771c7aeaf383e73d414c625d03fae66d69ca7172f26ede96eefd1d", "container_image_name": "quay.ceph.io/ceph-ci/ceph:91fdab49fed87aa0a3dbbceccc27e84ab4f80130", "created": "2021-12-18T01:01:38.000000Z", "last_refresh": "2021-12-18T01:35:00.947402Z", "running": 1, "size": 1}}, {"service_name": "osd", "service_type": "osd", "spec": {"filter_logic": "AND", "objectstore": "bluestore"}, "status": {"container_image_id": "1b23219043771c7aeaf383e73d414c625d03fae66d69ca7172f26ede96eefd1d", "container_image_name": "quay.ceph.io/ceph-ci/ceph:91fdab49fed87aa0a3dbbceccc27e84ab4f80130", "created": "2021-12-18T01:03:41.000000Z", "last_refresh": "2021-12-18T01:35:00.947402Z", "running": 8, "size": 4}, "unmanaged": true}, {"placement": {"host_pattern": "*"}, "service_id": "all-available-devices", "service_name": "osd.all-available-devices", "service_type": "osd", "spec": {"data_devices": {"all": true}, "filter_logic": "AND", "objectstore": "bluestore"}, "status": {"last_refresh": "2021-12-18T01:35:00.947402Z", "running": 0, "size": 0}}, 
{"placement": {"count": 1}, "service_id": "foo", "service_name": "rgw.foo", "service_type": "rgw", "spec": {"rgw_frontend_port": 80}, "status": {"container_image_id": "1b23219043771c7aeaf383e73d414c625d03fae66d69ca7172f26ede96eefd1d", "container_image_name": "quay.ceph.io/ceph-ci/ceph:91fdab49fed87aa0a3dbbceccc27e84ab4f80130", "created": "2021-12-18T01:13:57.000000Z", "last_refresh": "2021-12-18T01:35:00.947402Z", "running": 1, "size": 1}}]
2021-12-18T01:35:01.064 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 33, in nested
yield vars
File "/home/teuthworker/src/github.com_ceph_ceph-c_91fdab49fed87aa0a3dbbceccc27e84ab4f80130/qa/tasks/rook.py", line 669, in task
while proceed():
File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 133, in __call__
raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: 'waiting for service removal' reached maximum tries (90) after waiting for 900 seconds
2021-12-18T01:35:01.065 ERROR:tasks.rook:'waiting for service removal' reached maximum tries (90) after waiting for 900 seconds
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_91fdab49fed87aa0a3dbbceccc27e84ab4f80130/qa/tasks/rook.py", line 530, in ceph_config_keyring
yield
File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 33, in nested
yield vars
File "/home/teuthworker/src/github.com_ceph_ceph-c_91fdab49fed87aa0a3dbbceccc27e84ab4f80130/qa/tasks/rook.py", line 669, in task
while proceed():
File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 133, in __call__
raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: 'waiting for service removal' reached maximum tries (90) after waiting for 900 seconds
2021-12-18T01:35:01.065 INFO:tasks.rook:Cleaning up config and client.admin keyring
2021-12-18T01:35:01.066 DEBUG:teuthology.orchestra.run.smithi185:> sudo rm -f /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring
2021-12-18T01:35:01.079 ERROR:tasks.rook:'waiting for service removal' reached maximum tries (90) after waiting for 900 seconds
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_91fdab49fed87aa0a3dbbceccc27e84ab4f80130/qa/tasks/rook.py", line 478, in rook_post_config
yield
File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 33, in nested
yield vars
File "/home/teuthworker/src/github.com_ceph_ceph-c_91fdab49fed87aa0a3dbbceccc27e84ab4f80130/qa/tasks/rook.py", line 669, in task
while proceed():
File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 133, in __call__
raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: 'waiting for service removal' reached maximum tries (90) after waiting for 900 seconds
2021-12-18T01:35:01.080 ERROR:tasks.rook:'waiting for service removal' reached maximum tries (90) after waiting for 900 seconds
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_91fdab49fed87aa0a3dbbceccc27e84ab4f80130/qa/tasks/rook.py", line 442, in rook_toolbox
yield
File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 33, in nested
yield vars
File "/home/teuthworker/src/github.com_ceph_ceph-c_91fdab49fed87aa0a3dbbceccc27e84ab4f80130/qa/tasks/rook.py", line 669, in task
while proceed():
File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 133, in __call__
raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: 'waiting for service removal' reached maximum tries (90) after waiting for 900 seconds
2021-12-18T01:35:01.141 DEBUG:teuthology.orchestra.remote:smithi185:rook/cluster/examples/kubernetes/ceph/operator.yaml is 22KB
2021-12-18T01:35:01.191 DEBUG:teuthology.orchestra.run.smithi185:> kubectl delete -f rook/cluster/examples/kubernetes/ceph/toolbox.yaml
2021-12-18T01:35:01.255 INFO:teuthology.orchestra.run.smithi185.stdout:deployment.apps "rook-ceph-tools" deleted
2021-12-18T01:35:01.287 ERROR:tasks.rook:'waiting for service removal' reached maximum tries (90) after waiting for 900 seconds
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_91fdab49fed87aa0a3dbbceccc27e84ab4f80130/qa/tasks/rook.py", line 379, in rook_cluster
yield
File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 33, in nested
yield vars
File "/home/teuthworker/src/github.com_ceph_ceph-c_91fdab49fed87aa0a3dbbceccc27e84ab4f80130/qa/tasks/rook.py", line 669, in task
while proceed():
File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 133, in __call__
raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: 'waiting for service removal' reached maximum tries (90) after waiting for 900 seconds
2021-12-18T01:35:01.288 DEBUG:teuthology.orchestra.run.smithi185:> kubectl delete -f cluster.yaml
2021-12-18T01:35:01.380 INFO:teuthology.orchestra.run.smithi185.stdout:cephcluster.ceph.rook.io "rook-ceph" deleted
2021-12-18T01:35:01.383 INFO:tasks.rook.operator.smithi185.stdout:2021-12-18 01:35:01.383570 I | ceph-cluster-controller: CR "rook-ceph" is going be deleted, cancelling any ongoing orchestration
2021-12-18T01:35:01.685 INFO:tasks.rook.operator.smithi185.stdout:2021-12-18 01:35:01.685102 I | ceph-cluster-controller: CephCluster "rook-ceph/rook-ceph" will not be deleted until all dependents are removed: CephObjectStores: [foo]
2021-12-18T01:35:01.696 INFO:tasks.rook.operator.smithi185.stdout:2021-12-18 01:35:01.696075 E | ceph-cluster-controller: failed to reconcile CephCluster "rook-ceph/rook-ceph". CephCluster "rook-ceph/rook-ceph" will not be deleted until all dependents are removed: CephObjectStores: [foo]
2021-12-18T01:35:01.696 INFO:tasks.rook.operator.smithi185.stdout:2021-12-18 01:35:01.696111 I | op-k8sutil: Reporting Event rook-ceph:rook-ceph Warning:ReconcileFailed:CephCluster "rook-ceph/rook-ceph" will not be deleted until all dependents are removed: CephObjectStores: [foo]
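For context, the repeated MaxWhileTries in the tracebacks above comes from teuthology's retry helper: the rook task polls `proceed()` in a loop, and once the try budget is exhausted the callable raises instead of returning True. A minimal sketch (assumed, simplified from `teuthology/contextutil.py`; the real helper is a context manager and sleeps between attempts):

```python
# Simplified sketch (assumed) of teuthology's safe_while retry pattern.
# The rook task loops on proceed(); after `tries` attempts the callable
# raises MaxWhileTries with the message seen in the logs above.

class MaxWhileTries(Exception):
    pass


def safe_while(sleep=10, tries=90, action="waiting"):
    """Return a `proceed` callable: True while tries remain, then raise."""
    state = {"attempt": 0}

    def proceed():
        state["attempt"] += 1
        if state["attempt"] > tries:
            raise MaxWhileTries(
                "'%s' reached maximum tries (%d) after waiting for %d seconds"
                % (action, tries, tries * sleep))
        # the real helper sleeps `sleep` seconds between attempts
        return True

    return proceed


# Usage mirroring qa/tasks/rook.py: with tries=90 and sleep=10 this
# gives up after 900 seconds, matching the logged error.
proceed = safe_while(sleep=10, tries=3, action="waiting for service removal")
try:
    while proceed():
        pass  # the real task checks the service list here
except MaxWhileTries as e:
    print(e)
```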
Updated by Kamoltat (Junior) Sirivadhna over 2 years ago
/a/yuriw-2021-12-21_18:01:07-rados-wip-yuri3-testing-2021-12-21-0749-distro-default-smithi/6576218/
Updated by Laura Flores over 2 years ago
/a/yuriw-2022-01-04_21:52:15-rados-wip-yuri7-testing-2022-01-04-1159-distro-default-smithi/6595518
Updated by Joseph Sawaya over 2 years ago
The first two logs are due to this ListBuckets call failing in the RGW pod: https://github.com/rook/rook/blob/0d8fd9d8a47799fbb2607fded7bab757fee2fd6a/pkg/operator/ceph/object/dependents.go#L97. This is the error we get when that happens:
failed to reconcile CephObjectStore "rook-ceph/foo". failed to get dependents of CephObjectStore "rook-ceph/foo": failed to list buckets in CephObjectStore "rook-ceph/foo": Get "http://rook-ceph-rgw-foo.rook-ceph.svc:80/admin/bucket?": dial tcp 10.98.24.136:80: connect: connection refused
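The "connection refused" in that error is simply the TCP dial to the RGW Service failing before the pod is serving requests. A minimal illustration of that failure mode (the endpoint name is taken from the log; any unreachable host/port behaves the same):

```python
import socket

# Illustration of the failure mode above: Rook's dependents check dials
# the CephObjectStore's RGW Service over TCP, and if nothing is
# listening yet the connection is refused and the reconcile fails.

def rgw_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to the RGW endpoint succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, timeouts, DNS failures
        return False

# In-cluster, the operator is effectively probing
#   rook-ceph-rgw-foo.rook-ceph.svc:80
# and retrying the reconcile until the call succeeds.
print(rgw_reachable("127.0.0.1", 1))  # nothing listening there, so False
```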
Earlier in the log we also get an error saying that multisite can't be created for the CephObjectStore, but that one goes away, and we end up with the error above when the CephObjectStore tries to reconcile:
failed to reconcile CephObjectStore "rook-ceph/foo". failed to create object store deployments: failed to configure multisite for object store: failed create ceph multisite for object-store ["foo"]: failed to update period%!(EXTRA []string=[]): exit status 2
In the third log we're getting a different error, related to the creation of the "rgw-admin-ops-user"; it looks like it signifies an error with the radosgw-admin command. This is the error we get when that happens:
failed to reconcile CephObjectStore "rook-ceph/foo". failed to check for object buckets. failed to get admin ops API context: failed to create or retrieve rgw admin ops user: failed to create object user "rgw-admin-ops-user". error code 1 for object store "foo": failed to create s3 user. 2022-01-05T03:22:34.737+0000 7f4cb28aa340 0 failed reading zonegroup info: ret -2 (2) No such file or directory
To me it seems that the issue isn't caused by the orchestrator or Rook, since there were no changes to the orchestrator or to Rook 1.7.2, and the error changed between runs. If the same error persists, it warrants another look.
Updated by Laura Flores over 2 years ago
Thanks Joseph. I frequently review teuthology runs, so I'll update this tracker if the problem persists. Hopefully if there are more occurrences, we can narrow down the root cause.
Updated by Laura Flores over 2 years ago
Happened again here: /a/yuriw-2022-01-13_14:57:55-rados-wip-yuri5-testing-2022-01-12-1534-distro-default-smithi/6612758
Updated by Laura Flores about 2 years ago
/a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698305
/a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698462
/a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698542
/a/yuriw-2022-02-22_16:14:07-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6700744
/a/yuriw-2022-02-22_16:14:07-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6700746
/a/yuriw-2022-02-22_16:14:07-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6700752
/a/yuriw-2022-02-22_16:14:07-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6700754
Updated by Laura Flores about 2 years ago
/a/yuriw-2022-03-01_22:42:19-rados-wip-yuri4-testing-2022-03-01-1206-distro-default-smithi/6715405
Updated by Laura Flores about 2 years ago
- Priority changed from Normal to High
Upping the priority of this because it is failing a lot in the rados suite.
Updated by Laura Flores about 2 years ago
/a/dgalloway-2022-03-09_02:34:58-rados-wip-45272-distro-basic-smithi/6727547
Updated by Laura Flores about 2 years ago
/a/yuriw-2022-03-10_01:04:51-rados-wip-yuri5-testing-2022-03-07-0958-distro-default-smithi/6728619
Updated by Aishwarya Mathuria about 2 years ago
/a/yuriw-2022-03-14_18:47:44-rados-wip-yuri3-testing-2022-03-14-0946-distro-default-smithi/6736585
/a/yuriw-2022-03-14_18:47:44-rados-wip-yuri3-testing-2022-03-14-0946-distro-default-smithi/6736425
Updated by Sridhar Seshasayee about 2 years ago
/a/yuriw-2022-03-16_20:38:07-rados-wip-yuri3-testing-2022-03-16-1030-distro-default-smithi/6739268
Updated by Joseph Sawaya about 2 years ago
Neha brought it to my attention that this issue is still coming up, so I'm going to attempt to fix it by removing the orchestrator commands from the test suite, so that the rook suite just creates a Rook cluster and runs radosbench. The Rook orchestrator is not being maintained at the moment, so I think it's safe to remove testing for it. A broken orchestrator could easily be the cause of this issue, so it makes sense to remove it from the test suite for now and to think of a more appropriate solution for running radosbench on Rook in teuthology later.
Updated by Neha Ojha about 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 45749
Updated by Aishwarya Mathuria about 2 years ago
/a/yuriw-2022-04-06_16:35:43-rados-wip-yuri5-testing-2022-04-05-1720-distro-default-smithi/6779888
Updated by Laura Flores about 2 years ago
- Status changed from Fix Under Review to Resolved