Project

General

Profile

Actions

Bug #47306

closed

rgw: FAIL: test_all (tasks.mgr.dashboard.test_rgw.RgwBucketTest)

Added by Ernesto Puerta over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Immediate
Assignee:
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
octopus, nautilus
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The error report and why this is not a flapping one

2020-09-03 18:03:07,331.331 INFO:__main__:----------------------------------------------------------------------
2020-09-03 18:03:07,331.331 INFO:__main__:Traceback (most recent call last):
2020-09-03 18:03:07,331.331 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/test_rgw.py", line 288, in test_all
2020-09-03 18:03:07,331.331 INFO:__main__:    self.assertEqual(len(data), 0)
2020-09-03 18:03:07,332.332 INFO:__main__:AssertionError: 1 != 0
2020-09-03 18:03:07,332.332 INFO:__main__:
2020-09-03 18:03:07,332.332 INFO:__main__:----------------------------------------------------------------------
2020-09-03 18:03:07,332.332 INFO:__main__:Ran 193 tests in 2142.009s
2020-09-03 18:03:07,332.332 INFO:__main__:
2020-09-03 18:03:07,332.332 INFO:__main__:

This error first appeared in PR https://github.com/ceph/ceph/pull/31663 (https://jenkins.ceph.com/job/ceph-api/2359/), which was merged with that error raising. All subsequent builds (99%) are consistently failing due to this.

RGW team (Casey and and J. Eric Ivancich) launched a revert PR to verify whether this was the reason. The PR reverted resulted in the same failure. This may mean:

Issue analysis

  • Failed API test (qa/tasks/mgr/dashboard/test_rgw.py:288): It's basically using MFA Deletes/TOTP. On removing a bucket, RGW is returning 200 while logging ERROR: could not remove bucket teuth-test-bucket // req 31 0.024000438s :remove_bucket remove_bucket returned ret=-125. 200 for DELETE means that the deletion was successful. However, a later bucket listing indeed shows that the bucket was not deleted.
  • Logs from radosgw:
    2020-09-07T12:10:11.978+0000 7f0ca3937700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:12.054+0000 7f0c92915700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.076001385s ======
    2020-09-07T12:10:12.054+0000 7f0c92915700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:12.054338+0000] "PUT /teuth-test-bucket HTTP/1.1" 200 880 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.175+0000 7f0c9691d700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:12.179+0000 7f0c9691d700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.004000073s ======
    2020-09-07T12:10:12.179+0000 7f0c9691d700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:12.179340+0000] "GET /admin/bucket HTTP/1.1" 200 21 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.294+0000 7f0c91112700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:12.296+0000 7f0c9f92f700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.002000036s ======
    2020-09-07T12:10:12.296+0000 7f0c9f92f700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:12.296342+0000] "GET /admin/bucket?bucket=teuth-test-bucket HTTP/1.1" 200 723 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.301+0000 7f0ca0931700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:12.302+0000 7f0ca0931700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.001000018s ======
    2020-09-07T12:10:12.302+0000 7f0ca0931700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:12.302342+0000] "GET /teuth-test-bucket?versioning HTTP/1.1" 200 2 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.306+0000 7f0c9e92d700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:12.307+0000 7f0c9e92d700  1 ====== req done req=0x7f0ce8935640 op status=-2046 http_status=404 latency=0.001000018s ======
    2020-09-07T12:10:12.307+0000 7f0c9e92d700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:12.307342+0000] "GET /teuth-test-bucket?object-lock HTTP/1.1" 404 174 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.373+0000 7f0c98120700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:12.417+0000 7f0c98120700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.044000802s ======
    2020-09-07T12:10:12.417+0000 7f0c98120700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:12.417344+0000] "PUT /admin/bucket?bucket=teuth-test-bucket&bucket-id=a98d0c78-a903-4fa3-9e14-032fb698e9ee.4769.1&uid=teuth-test-user HTTP/1.1" 200 0 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.428+0000 7f0c98921700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:12.429+0000 7f0c98921700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.001000018s ======
    2020-09-07T12:10:12.429+0000 7f0c98921700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:12.429344+0000] "GET /admin/user?uid=teuth-test-user HTTP/1.1" 200 1075 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.486+0000 7f0c8f90f700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:12.487+0000 7f0c8f90f700  1 ====== req done req=0x7f0ce88b4640 op status=0 http_status=200 latency=0.001000018s ======
    2020-09-07T12:10:12.487+0000 7f0c8f90f700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:12.487345+0000] "GET /teuth-test-bucket?versioning HTTP/1.1" 200 2 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.497+0000 7f0c9d12a700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:12.513+0000 7f0c8b907700  1 ====== req done req=0x7f0ce88b4640 op status=0 http_status=200 latency=0.016000291s ======
    2020-09-07T12:10:12.513+0000 7f0c8b907700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:12.513346+0000] "PUT /teuth-test-bucket?versioning HTTP/1.1" 200 0 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.525+0000 7f0c8c909700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:12.527+0000 7f0c8c909700  1 ====== req done req=0x7f0ce88b4640 op status=-2046 http_status=404 latency=0.002000037s ======
    2020-09-07T12:10:12.527+0000 7f0c8c909700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:12.527346+0000] "GET /teuth-test-bucket?object-lock HTTP/1.1" 404 174 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.573+0000 7f0c8b106700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:12.574+0000 7f0c8b106700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.001000018s ======
    2020-09-07T12:10:12.574+0000 7f0c8b106700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:12.574347+0000] "GET /admin/bucket?bucket=teuth-test-bucket HTTP/1.1" 200 733 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.579+0000 7f0c89903700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:12.579+0000 7f0c89903700  1 ====== req done req=0x7f0ce88b4640 op status=0 http_status=200 latency=0s ======
    2020-09-07T12:10:12.579+0000 7f0c89903700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:12.579347+0000] "GET /teuth-test-bucket?versioning HTTP/1.1" 200 43 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.582+0000 7f0c860fc700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:12.583+0000 7f0c860fc700  1 ====== req done req=0x7f0ce88b4640 op status=-2046 http_status=404 latency=0.001000018s ======
    2020-09-07T12:10:12.583+0000 7f0c860fc700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:12.583347+0000] "GET /teuth-test-bucket?object-lock HTTP/1.1" 404 174 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.688+0000 7f0c868fd700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:12.735+0000 7f0c868fd700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.047000857s ======
    2020-09-07T12:10:12.735+0000 7f0c868fd700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:12.735350+0000] "PUT /admin/bucket?bucket=teuth-test-bucket&bucket-id=a98d0c78-a903-4fa3-9e14-032fb698e9ee.4769.1&uid=teuth-test-user HTTP/1.1" 200 0 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.748+0000 7f0c8e90d700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:12.749+0000 7f0c7a8e5700  1 ====== req done req=0x7f0ce88b4640 op status=0 http_status=200 latency=0.001000018s ======
    2020-09-07T12:10:12.749+0000 7f0c7a8e5700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:12.749350+0000] "GET /teuth-test-bucket?versioning HTTP/1.1" 200 43 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.758+0000 7f0c7e8ed700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:12.782+0000 7f0c788e1700  1 ====== req done req=0x7f0ce88b4640 op status=0 http_status=200 latency=0.024000438s ======
    2020-09-07T12:10:12.782+0000 7f0c788e1700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:12.782351+0000] "PUT /teuth-test-bucket?versioning HTTP/1.1" 200 0 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.787+0000 7f0c790e2700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:12.787+0000 7f0c790e2700  1 ====== req done req=0x7f0ce88b4640 op status=-2046 http_status=404 latency=0s ======
    2020-09-07T12:10:12.787+0000 7f0c790e2700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:12.787351+0000] "GET /teuth-test-bucket?object-lock HTTP/1.1" 404 174 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.878+0000 7f0c780e0700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:12.879+0000 7f0c780e0700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.001000018s ======
    2020-09-07T12:10:12.879+0000 7f0c780e0700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:12.879353+0000] "GET /admin/bucket?bucket=teuth-test-bucket HTTP/1.1" 200 733 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.884+0000 7f0c7f0ee700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:12.885+0000 7f0c7f0ee700  1 ====== req done req=0x7f0ce88b4640 op status=0 http_status=200 latency=0.001000019s ======
    2020-09-07T12:10:12.885+0000 7f0c7f0ee700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:12.885353+0000] "GET /teuth-test-bucket?versioning HTTP/1.1" 200 42 - "python-requests/2.20.0" -
    2020-09-07T12:10:12.890+0000 7f0caa945700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:12.890+0000 7f0caa945700  1 ====== req done req=0x7f0ce88b4640 op status=-2046 http_status=404 latency=0s ======
    2020-09-07T12:10:12.890+0000 7f0caa945700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:12.890353+0000] "GET /teuth-test-bucket?object-lock HTTP/1.1" 404 174 - "python-requests/2.20.0" -
    2020-09-07T12:10:15.947+0000 7f0c708d1700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:16.011+0000 7f0c6f0ce700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.064001167s ======
    2020-09-07T12:10:16.011+0000 7f0c6f0ce700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:16.011410+0000] "PUT /admin/bucket?bucket=teuth-test-bucket&bucket-id=a98d0c78-a903-4fa3-9e14-032fb698e9ee.4769.1&uid=teuth-test-user HTTP/1.1" 200 0 - "python-requests/2.20.0" -
    2020-09-07T12:10:16.015+0000 7f0c838f7700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:16.016+0000 7f0c838f7700  1 ====== req done req=0x7f0ce88b4640 op status=0 http_status=200 latency=0.001000018s ======
    2020-09-07T12:10:16.016+0000 7f0c838f7700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:16.016410+0000] "GET /teuth-test-bucket?versioning HTTP/1.1" 200 42 - "python-requests/2.20.0" -
    2020-09-07T12:10:16.021+0000 7f0c830f6700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:16.042+0000 7f0c808f1700  1 ====== req done req=0x7f0ce88b4640 op status=0 http_status=200 latency=0.021000383s ======
    2020-09-07T12:10:16.042+0000 7f0c808f1700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:16.042410+0000] "PUT /teuth-test-bucket?versioning HTTP/1.1" 200 0 - "python-requests/2.20.0" -
    2020-09-07T12:10:16.047+0000 7f0c820f4700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:16.047+0000 7f0c820f4700  1 ====== req done req=0x7f0ce88b4640 op status=-2046 http_status=404 latency=0s ======
    2020-09-07T12:10:16.047+0000 7f0c820f4700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:16.047410+0000] "GET /teuth-test-bucket?object-lock HTTP/1.1" 404 174 - "python-requests/2.20.0" -
    2020-09-07T12:10:16.148+0000 7f0c698c3700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:16.150+0000 7f0c6e0cc700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.002000037s ======
    2020-09-07T12:10:16.150+0000 7f0c6e0cc700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:16.150412+0000] "GET /admin/bucket?bucket=teuth-test-bucket HTTP/1.1" 200 733 - "python-requests/2.20.0" -
    2020-09-07T12:10:16.155+0000 7f0c640b8700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:16.155+0000 7f0c638b7700  1 ====== req done req=0x7f0ce88b4640 op status=0 http_status=200 latency=0s ======
    2020-09-07T12:10:16.155+0000 7f0c638b7700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:16.155412+0000] "GET /teuth-test-bucket?versioning HTTP/1.1" 200 45 - "python-requests/2.20.0" -
    2020-09-07T12:10:16.162+0000 7f0c640b8700  1 ====== starting new request req=0x7f0ce88b4640 =====
    2020-09-07T12:10:16.163+0000 7f0c640b8700  1 ====== req done req=0x7f0ce88b4640 op status=-2046 http_status=404 latency=0.001000019s ======
    2020-09-07T12:10:16.163+0000 7f0c640b8700  1 beast: 0x7f0ce88b4640: 172.20.0.3 - - [2020-09-07T12:10:16.163412+0000] "GET /teuth-test-bucket?object-lock HTTP/1.1" 404 174 - "python-requests/2.20.0" -
    2020-09-07T12:10:16.253+0000 7f0c610b2700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:16.277+0000 7f0c610b2700 -1 ERROR: could not remove bucket teuth-test-bucket
    2020-09-07T12:10:16.277+0000 7f0c610b2700  0 req 31 0.024000438s :remove_bucket remove_bucket returned ret=-125
    2020-09-07T12:10:16.277+0000 7f0c6d0ca700  1 ====== req done req=0x7f0ce8935640 op status=-125 http_status=200 latency=0.024000438s ======
    2020-09-07T12:10:16.277+0000 7f0c6d0ca700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:16.277415+0000] "DELETE /admin/bucket?bucket=teuth-test-bucket&purge-objects=true HTTP/1.1" 200 0 - "python-requests/2.20.0" -
    2020-09-07T12:10:16.348+0000 7f0c618b3700  1 ====== starting new request req=0x7f0ce8935640 =====
    2020-09-07T12:10:16.353+0000 7f0c6b8c7700  1 ====== req done req=0x7f0ce8935640 op status=0 http_status=200 latency=0.005000092s ======
    2020-09-07T12:10:16.353+0000 7f0c6b8c7700  1 beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:16.353416+0000] "GET /admin/bucket HTTP/1.1" 200 21 - "python-requests/2.20.0" -
    

Files

failures.png (555 KB) failures.png Ernesto Puerta, 09/07/2020 08:22 AM
osd.0.log.gz (417 Bytes) osd.0.log.gz Mark Kogan, 09/08/2020 04:29 PM
radosgw.8000.log.gz (7.14 KB) radosgw.8000.log.gz Mark Kogan, 09/08/2020 04:29 PM

Related issues 2 (0 open2 closed)

Copied to rgw - Backport #47686: nautilus: rgw: FAIL: test_all (tasks.mgr.dashboard.test_rgw.RgwBucketTest)ResolvedNathan CutlerActions
Copied to rgw - Backport #47687: octopus: rgw: FAIL: test_all (tasks.mgr.dashboard.test_rgw.RgwBucketTest)ResolvedNathan CutlerActions
Actions #1

Updated by Ernesto Puerta over 3 years ago

Actions #2

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #3

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #4

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #5

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #6

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #7

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #8

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #9

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #10

Updated by Mark Kogan over 3 years ago

Attempting to reproduce the bucket deletion as in:

DELETE /admin/bucket?bucket=teuth-test-bucket&purge-objects=true

similarly to the log above:
====== starting new request req=0x7f0ce8935640 =====
ERROR: could not remove bucket teuth-test-bucket
req 31 0.024000438s :remove_bucket remove_bucket returned ret=-125
====== req done req=0x7f0ce8935640 op status=-125 http_status=200 latency=0.024000438s ======
beast: 0x7f0ce8935640: 172.20.0.3 - - [2020-09-07T12:10:16.277415+0000] "DELETE /admin/bucket?bucket=teuth-test-bucket&purge-objects=true HTTP/1.1" 200 0 - "python-requests/2.20.0" -

@Ernesto, is it correct that this is a versioned bucket? and if so, were all the previous versions of the object removed before attempting to delete the bucket? or was versioning turned off on the bucket before deleting?

Actions #11

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #12

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #13

Updated by Ernesto Puerta over 3 years ago

Mark Kogan wrote:

Attempting to reproduce the bucket deletion as in:
[...]
similarly to the log above:
[...]

@Ernesto, is it correct that this is a versioned bucket? and if so, were all the previous versions of the object removed before attempting to delete the bucket? or was versioning turned off on the bucket before deleting?

Mark, by looking at the test code it looks like only 1 DELETE is issued once the versioning is enabled... I cannot say if this should be the expected behavior, but what is clear to me is that this might be breaking the current one...

Actions #14

Updated by Ernesto Puerta over 3 years ago

  • Description updated (diff)
Actions #15

Updated by Mark Kogan over 3 years ago

Thank you Ernesto,

Searching for PUT in the attached log to determin the operations flow for reproduction,
the flow looks as following:
- create bucket 'teuth-test-bucket'
- perform bucket versioning status change operations by: 'PUT /teuth-test-bucket?versioning' 3 times

The log at his level does not contain enought detail, would like to ask please for each of the 'PUT /teuth-test-bucket?versioning' in order enabling or suspending the versioning?

Attempint guessing various permutations like below did not reproduce the issue.

aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket bkt --versioning-configuration Status=Enabled
aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket bkt --versioning-configuration Status=Suspend
aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket bkt --versioning-configuration Status=Enabled

Another question please, what does the below admin operation that prefaces each versioning operation perform?
is it to change the bucker ownership to a different user or subuser?
will `uid=teuth-test-user` be the same user as the user that will perform the bucket delete operation?

"PUT /admin/bucket?bucket=teuth-test-bucket&bucket-id=a98d0c78-a903-4fa3-9e14-032fb698e9ee.4769.1&uid=teuth-test-user HTTP/1.1" 200 0 - "python-requests/2.20.0" -

Actions #16

Updated by Ernesto Puerta over 3 years ago

@Mark, keep in mind that this is consuming RGW through Ceph (Dashboard) API, but for RGW it is basically an HTTP Proxy (using Python requests library).

The test workflow is:
  1. POST to create new bucket without any specific features (default zonegroup and placement target). admin is the owner of this bucket.
  2. GET to list all buckets
  3. GET to fetch this bucket
  4. PUT to enable versioning and change owner (from admin to teuth-test-user)
  5. GET to ensure versioning is enabled
  6. PUT to enable MFA Delete
  7. GET to ensure MFA Delete & Versioning are enabled
  8. PUT to suspend versioning and disable MFA Delete
  9. GET to ensure Versioning suspended and MFA Delete is disabled
  10. DELETE to delete bucket
  11. GET to list all buckets (0 to be expected)
Actions #17

Updated by Mark Kogan over 3 years ago

Thank you very much, will try the outlined steps - just one clarification please:

The operation:

"PUT /admin/bucket?bucket=teuth-test-bucket&bucket-id=a98d0c78-a903-4fa3-9e14-032fb698e9ee.4769.1&uid=teuth-test-user HTTP/1.1" 200 0 - "python-requests/2.20.0" 

Per- https://docs.ceph.com/docs/master/radosgw/adminops/#link-bucket : Link a bucket to a specified user, unlinking the bucket from any previous user.

is the user that creates and deletes the bucket the same:

uid=teuth-test-user

or is it a another user that needs to be created and perform the user linking PUT operation during tthe reproduction?

Actions #18

Updated by Ernesto Puerta over 3 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 37027
Actions #19

Updated by Ernesto Puerta over 3 years ago

Mark Kogan wrote:

Thank you very much, will try the outlined steps - just one clarification please:

The operation:
[...]
Per- https://docs.ceph.com/docs/master/radosgw/adminops/#link-bucket : Link a bucket to a specified user, unlinking the bucket from any previous user.

is the user that creates and deletes the bucket the same:
[...]
or is it a another user that needs to be created and perform the user linking PUT operation during tthe reproduction?

Mark, yes, there's a change in the bucket ownership. admin creates the bucket initially, and then it's linked to the user teuth-test-user. I'll update my previous comment with these data.

BTW, not sure if you already saw that, but the causing PR has already been identified and a revert has been proposed (basically because this issue might be masking other API test errors). If you're ok with it, I'd suggest to merge the revert PR and then perform the necessary root cause analysis.

Actions #20

Updated by Mark Kogan over 3 years ago

With the provided information it is possible to reproduce the issue using the following command flow:

pip3 install --user rgwadmin

cat << EOF > ~/tmp/rgwadmin-DELETE_bucket.py
from rgwadmin import RGWAdmin
rgw = RGWAdmin(access_key='b2345678901234567890', secret_key='b234567890123456789012345678901234567890', server='localhost:8000', secure=False)

rgw.remove_bucket(bucket='teuth-test-bucket', purge_objects="true")
EOF

cat << EOF > ~/tmp/rgwadmin-LINK_bucket.py
from rgwadmin import RGWAdmin
rgw = RGWAdmin(access_key='b2345678901234567890', secret_key='b234567890123456789012345678901234567890', server='localhost:8000', secure=False)

rgw.link_bucket(bucket='teuth-test-bucket', bucket_id="", uid="teuth-test-user")
EOF

env MON=1 OSD=1 MDS=0 MGR=1 RGW=1 GANESHA=0 ../src/vstart.sh -n -d --rgw_frontend "beast" 

./bin/radosgw-admin user create --display-name="cosbench_user" --uid=cosbench --access-key b2345678901234567890 --secret b234567890123456789012345678901234567890 --system true
./bin/radosgw-admin user create --display-name="teuth_test_user" --uid=teuth-test-user --access-key t2345678901234567890 --secret t234567890123456789012345678901234567890 --system false

aws configure
...

aws --endpoint-url http://localhost:8000 s3 mb s3://teuth-test-bucket
aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket teuth-test-bucket --versioning-configuration Status=Enabled
aws --endpoint-url http://localhost:8000 s3 cp ./ceph.conf s3://teuth-test-bucket/1.dat
aws --endpoint-url http://localhost:8000 s3 cp ./CMakeCache.txt s3://teuth-test-bucket/1.dat
aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket teuth-test-bucket --versioning-configuration Status=Suspended
python3 ~/tmp/rgwadmin-LINK_bucket.py |& ccze -Aonolookups
python3 ~/tmp/rgwadmin-DELETE_bucket.py |& ccze -Aonolookups

tail -F ./out/radosgw.8000.log
2020-09-08T16:30:48.615+0300 7f0e16130700 10 cls_bucket_list_unordered teuth-test-bucket[f77a021a-487c-45a2-8f63-7812757235b0.4170.36] start_after [] num_entries 1100                        
2020-09-08T16:30:48.615+0300 7f0e16130700 20 WARNING: blocking librados call                                                                                                                  
2020-09-08T16:30:48.616+0300 7f0e16130700 20 WARNING: blocking librados call                                                                                                                  
2020-09-08T16:30:48.620+0300 7f0e16130700 20 get_system_obj_state: rctx=0x7f0dbd87cb78 obj=default.rgw.meta:root:teuth-test-bucket state=0x4686e20 s->prefetch_data=0                         
2020-09-08T16:30:48.620+0300 7f0e16130700 10 cache get: name=default.rgw.meta+root+teuth-test-bucket : hit (requested=0x16, cached=0x17)                                                      
2020-09-08T16:30:48.620+0300 7f0e16130700 20 get_system_obj_state: s->obj_tag was set empty                                                                                                   
2020-09-08T16:30:48.620+0300 7f0e16130700 10 cache get: name=default.rgw.meta+root+teuth-test-bucket : hit (requested=0x11, cached=0x17)                                                      
2020-09-08T16:30:48.620+0300 7f0e16130700 10 removing default.rgw.meta+root+teuth-test-bucket from cache                                                                                      
2020-09-08T16:30:48.620+0300 7f0e16130700 10 distributing notification oid=default.rgw.control:notify.1 bl.length()=161                                                                       
2020-09-08T16:30:48.620+0300 7f0e16130700 20 WARNING: blocking librados call                                                                                                                  
2020-09-08T16:30:48.620+0300 7f0fe24c8700 10 RGWWatcher::handle_notify()  notify_id 244813136043 cookie 25699328 notifier 4168 bl.length()=161                                                
2020-09-08T16:30:48.620+0300 7f0e16130700 20 WARNING: blocking librados call                                                                                                                  
2020-09-08T16:30:48.621+0300 7f0e16130700 -1 ERROR: could not remove bucket teuth-test-bucket                                                                                                 
2020-09-08T16:30:48.621+0300 7f0e16130700  0 req 288 0.005999965s :remove_bucket remove_bucket returned ret=-125  <<<-----------                                                                             
2020-09-08T16:30:48.621+0300 7f0e16130700  2 req 288 0.005999965s :remove_bucket completing                                                                                                   
2020-09-08T16:30:48.621+0300 7f0e16130700  2 req 288 0.005999965s :remove_bucket op status=-125                                                                                               
2020-09-08T16:30:48.621+0300 7f0e16130700  2 req 288 0.005999965s :remove_bucket http status=200                                                                                              
2020-09-08T16:30:48.621+0300 7f0e16130700  1 ====== req done req=0x7f0dbd87ea70 op status=-125 http_status=200 latency=0.005999965s ======                                                    
2020-09-08T16:30:48.621+0300 7f0e16130700  1 beast: 0x7f0dbd87ea70: 127.0.0.1 - - [2020-09-08T16:30:48.621133+0300] "DELETE /admin/bucket?format=json&bucket=teuth-test-bucket&purge-objects=t
rue HTTP/1.1" 200 0 - "python-requests/2.22.0" -    

aws --endpoint-url http://localhost:8000 s3 ls s3://teuth-test-bucket
2020-09-08 16:30:47      55313 1.dat
Actions #21

Updated by Mark Kogan over 3 years ago

narrowing the reproducer further, occures without versioning, link operation alone is enoughth to reproduce:

radosgw-admin bucket rm --bucket=teuth-test-bucket --purge-objects
aws --endpoint-url http://localhost:8000 s3 mb s3://teuth-test-bucket
python3 ~/tmp/rgwadmin-LINK_bucket.py |& ccze -Aonolookups
python3 ~/tmp/rgwadmin-DELETE_bucket.py |& ccze -Aonolookups
aws --endpoint-url http://localhost:8000 s3 ls

Updated by Mark Kogan over 3 years ago

attaching debug logs from rgw debug_rgw=20 debug_ms=1
and osd
debug_objclass=20

Actions #23

Updated by Casey Bodley over 3 years ago

testing https://github.com/ceph/ceph/pull/37054 as a fix. i also opened https://github.com/ceph/ceph/pull/37052 so the admin api won't return '200 OK' on error

Actions #24

Updated by Ernesto Puerta over 3 years ago

  • Pull request ID changed from 37027 to 37054
Actions #25

Updated by Laura Paduano over 3 years ago

  • Assignee set to Casey Bodley
  • Affected Versions v16.0.0 added
Actions #26

Updated by Casey Bodley over 3 years ago

  • Status changed from Fix Under Review to Resolved
Actions #27

Updated by Casey Bodley over 3 years ago

thanks for everyone's help bisecting and debugging this one!

Actions #28

Updated by Nathan Cutler over 3 years ago

  • Backport set to octopus
Actions #29

Updated by Nathan Cutler over 3 years ago

  • Backport changed from octopus to octopus, nautilus
Actions #30

Updated by Nathan Cutler over 3 years ago

  • Status changed from Resolved to Pending Backport
Actions #31

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47686: nautilus: rgw: FAIL: test_all (tasks.mgr.dashboard.test_rgw.RgwBucketTest) added
Actions #32

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47687: octopus: rgw: FAIL: test_all (tasks.mgr.dashboard.test_rgw.RgwBucketTest) added
Actions #33

Updated by Nathan Cutler over 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions

Also available in: Atom PDF