Bug #22963
radosgw-admin usage show loops indefinitly - again
Status: Closed
% Done: 0%
Description
After updating our cluster from Jewel to Luminous, radosgw-admin usage show loops. I thought #21196 should have fixed this bug.
E.g.: radosgw-admin usage show --uid="DunkelDatenaustausch" --start-date=2018-01-01 --end-date=2018-02-01
{ "entries": [ { "user": "DunkelDatenaustausch", "buckets": [
    { "bucket": "", "time": "2018-01-17 10:00:00.000000Z", "epoch": 1516183200, "owner": "DunkelDatenaustausch", "categories": [ { "category": "list_buckets", "bytes_sent": 344, "bytes_received": 0, "ops": 1, "successful_ops": 1 } ] },
    { "bucket": "", "time": "2018-01-17 10:00:00.000000Z", "epoch": 1516183200, "owner": "DunkelDatenaustausch", "categories": [ { "category": "list_buckets", "bytes_sent": 344, "bytes_received": 0, "ops": 1, "successful_ops": 1 } ] },
    { "bucket": "", "time": "2018-01-17 10:00:00.000000Z", "epoch": 1516183200, "owner": "DunkelDatenaustausch", "categories": [ { "category": "list_buckets", "bytes_sent": 344, "bytes_received": 0, "ops": 1, "successful_ops": 1 } ] },
[...]
Interestingly, the amount of data, bytes_sent, and bytes_received seem to be correct and to cover the complete time range. We wrote a small wrapper that detects and breaks the loop; when we summed the bytes afterwards, we still got the correct accounting.
For some accounts, we still get the original behaviour.
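A wrapper like the one described could look roughly like the following sketch in Python. This is purely illustrative (the function name, the entry layout, and the dedup key are assumptions based on the JSON above, not the reporter's actual script): it aggregates usage categories and stops as soon as an identical record comes around again.

```python
# Hypothetical sketch of a loop-breaking wrapper for the looping
# 'radosgw-admin usage show' output. Assumes entries shaped like the
# JSON excerpt above; stops at the first exact repeat of a record.
def collect_usage(entries):
    seen = set()
    totals = {"bytes_sent": 0, "bytes_received": 0, "ops": 0}
    for entry in entries:
        for cat in entry.get("categories", []):
            key = (entry["bucket"], entry["epoch"], cat["category"],
                   cat["bytes_sent"], cat["bytes_received"], cat["ops"])
            if key in seen:
                # Same record again: the output has started looping.
                return totals
            seen.add(key)
            totals["bytes_sent"] += cat["bytes_sent"]
            totals["bytes_received"] += cat["bytes_received"]
            totals["ops"] += cat["ops"]
    return totals
```

On the excerpt above this would count the list_buckets entry once (344 bytes sent, 1 op) and bail out when the identical entry repeats.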
The cluster is rather old; it has been upgraded over the years, probably starting from Bobtail. I have included the zonegroup config.
Updated by Robin Johnson about 6 years ago
- Status changed from New to 12
- Priority changed from Normal to High
- Tags set to rgw, usage
- Backport set to luminous
- Severity changed from 3 - minor to 2 - major
- Release set to master
This bug definitely still exists and is affecting billing.
Here's the interesting part from --debug-rgw=20 --debug-ms=1 on the client, I'm going to track it to the OSD as well, but I think it's in the client-side class code.
2018-02-15 19:03:31.179167 7f3ba1219b80 1 -- [XXXX::5752]:0/598436282 --> [XXXX::7098]:6838/2803930 -- osd_op(unknown.0.0:70 12.2 12:51cb4789:::usage.21:head [call rgw.user_usage_log_read] snapc 0=[] ondisk+read+known_if_redirected e1292041) v8 -- 0x5618934a01d0 con 0
2018-02-15 19:03:31.192370 7f3b8f1a5700 1 -- [XXXX::5752]:0/598436282 <== osd.604 [XXXX::7098]:6838/2803930 1 ==== osd_op_reply(70 usage.21 [call] v0'0 uv194985623 ondisk = 0) v8 ==== 152+0+399 (1593127871 0 1061501781) 0x7f3b80012070 con 0x56189349bbb0
2018-02-15 19:03:31.192649 7f3ba1219b80 1 -- [XXXX::5752]:0/598436282 --> [XXXX::7098]:6838/2803930 -- osd_op(unknown.0.0:71 12.2 12:51cb4789:::usage.21:head [call rgw.user_usage_log_read] snapc 0=[] ondisk+read+known_if_redirected e1292041) v8 -- 0x56189349a360 con 0
2018-02-15 19:03:31.209750 7f3b8f1a5700 1 -- [XXXX::5752]:0/598436282 <== osd.604 [XXXX::7098]:6838/2803930 2 ==== osd_op_reply(71 usage.21 [call] v0'0 uv194985623 ondisk = 0) v8 ==== 152+0+399 (1593127871 0 1061501781) 0x7f3b80012070 con 0x56189349bbb0
2018-02-15 19:03:31.209975 7f3ba1219b80 1 -- [XXXX::5752]:0/598436282 --> [XXXX::7098]:6838/2803930 -- osd_op(unknown.0.0:72 12.2 12:51cb4789:::usage.21:head [call rgw.user_usage_log_read] snapc 0=[] ondisk+read+known_if_redirected e1292041) v8 -- 0x56189349a360 con 0
2018-02-15 19:03:31.227569 7f3b8f1a5700 1 -- [XXXX::5752]:0/598436282 <== osd.604 [XXXX::7098]:6838/2803930 3 ==== osd_op_reply(72 usage.21 [call] v0'0 uv194985623 ondisk = 0) v8 ==== 152+0+399 (1593127871 0 1061501781) 0x7f3b80010850 con 0x56189349bbb0
2018-02-15 19:03:31.227796 7f3ba1219b80 1 -- [XXXX::5752]:0/598436282 --> [XXXX::7098]:6838/2803930 -- osd_op(unknown.0.0:73 12.2 12:51cb4789:::usage.21:head [call rgw.user_usage_log_read] snapc 0=[] ondisk+read+known_if_redirected e1292041) v8 -- 0x56189349a360 con 0
2018-02-15 19:03:31.245164 7f3b8f1a5700 1 -- [XXXX::5752]:0/598436282 <== osd.604 [XXXX::7098]:6838/2803930 4 ==== osd_op_reply(73 usage.21 [call] v0'0 uv194985623 ondisk = 0) v8 ==== 152+0+399 (1593127871 0 1061501781) 0x7f3b80012070 con 0x56189349bbb0
Updated by Robin Johnson about 6 years ago
Debugging so far:
1. The bug only occurs on the Congress cluster, not on the Benjamin cluster (which was rebuilt from scratch in the Jewel era), so it probably depends on old data. Congress has LOTS of old usage data.
2. I haven't seen the ENOENT path get triggered yet.
Updated by Robin Johnson about 6 years ago
The comment by Marcin Gibula in issue 21196 is actually spot on.
This DOES actually complete, eventually, if nothing else kills it first for taking too long.
I created a brand new user, and started a capture on it, asking for yesterday's data (of which there should be NONE):
time radosgw-admin usage show --uid ceph-22963-test --start-date=2018-02-17
It did eventually complete, with the time output of:
real 37m3.425s
user 0m39.836s
sys 0m14.912s
If I pick another username, the runtime is very similar.
Updated by Marcin Gibula about 6 years ago
Hi,
just FYI - I've uploaded some logs in this ticket as well - http://tracker.ceph.com/issues/22314
If you need anything more, just ask - it's currently a major blocker for us as well.
Updated by Orit Wasserman about 6 years ago
- Related to Bug #2050: rgw: crash at Objecter::_linger_commit() added
Updated by Orit Wasserman about 6 years ago
- Related to deleted (Bug #2050: rgw: crash at Objecter::_linger_commit())
Updated by Orit Wasserman about 6 years ago
- Related to Bug #22050: ERROR type entries of pglog do not update min_last_complete_ondisk, potentially ballooning memory usage added
Updated by Greg Farnum about 6 years ago
Looked through the bucket_list.log and do see some inexplicably long op times. While I could maybe see the watch-notify stuff (3) taking a while, most listing ops take 1 ms instead of 350 (1), and a create shouldn't take 196 ms (2) unless the PG is blocked for some reason. (Which it... might be?)
2018-02-07 07:23:43.501849 7f87efd3f700 1 -- 10.198.10.101:0/734401551 --> 10.198.10.14:6832/271459 -- osd_op(unknown.0.0:21 16.1 16:80000000::::head [pgnls start_epoch 2083520] snapc 0=[] ondisk+read+ignore_overlay+known_if_redirected e2083520) v8 -- 0x7f87e000c900 con 0
2018-02-07 07:23:43.825047 7f87ef53e700 1 -- 10.198.10.101:0/734401551 <== osd.215 10.198.10.14:6832/271459 1 ==== osd_op_reply(21 [pgnls start_epoch 2083520] v987079'84 uv84 ondisk = 1) v8 ==== 144+0+107 (3045876259 0 1982614520) 0x7f87e4009c00 con 0x7f87e0008240
...
2018-02-07 07:23:43.846484 7f8802235cc0 1 -- 10.198.10.101:0/734401551 --> 10.198.10.15:6830/158365 -- osd_op(unknown.0.0:52 8.7 8:f95f44c2:::notify.0:head [create] snapc 0=[] ondisk+write+known_if_redirected e2083520) v8 -- 0x56443269d7c0 con 0
2018-02-07 07:23:44.042682 7f87ef53e700 1 -- 10.198.10.101:0/734401551 <== osd.27 10.198.10.15:6830/158365 1 ==== osd_op_reply(52 notify.0 [create] v2083520'1819325 uv1819325 ondisk = 0) v8 ==== 152+0+0 (872112348 0 0) 0x7f87e400a900 con 0x5644326990a0
...
2018-02-07 07:23:44.042912 7f8802235cc0 1 -- 10.198.10.101:0/734401551 --> 10.198.10.15:6830/158365 -- osd_op(unknown.0.0:53 8.7 8:f95f44c2:::notify.0:head [watch watch cookie 94850903562176] snapc 0=[] ondisk+write+known_if_redirected e2083520) v8 -- 0x56443269e1b0 con 0
2018-02-07 07:23:45.013615 7f87ef53e700 1 -- 10.198.10.101:0/734401551 <== osd.27 10.198.10.15:6830/158365 2 ==== osd_op_reply(53 notify.0 [watch watch cookie 94850903562176] v2083520'1819326 uv1819325 ondisk = 0) v8 ==== 152+0+0 (739656982 0 0) 0x7f87e400a900 con 0x5644326990a0
2018-02-07 07:23:45.013746 7f8802235cc0 20 add_watcher() i=0
But since that log appears to end successfully, I'm confused.
Updated by Greg Farnum about 6 years ago
Also, Sage is asking whether missing https://github.com/ceph/ceph/pull/13021 might be causing the repeated loops.
Updated by Greg Farnum about 6 years ago
Okay, dug into the rgw side of things a bit more and I think https://github.com/ceph/ceph/pull/19131 contains some incorrect logic?
[16:04:51] <gregsfortytwo> it looks to me like that's a little broken: in cls_rgw_client, we keep looping until we receive ENODATA
[16:05:23] <gregsfortytwo> in cls_rgw, we only return ENODATA if "iter.empty()", which in usage_iterate_range is key.iter
[16:05:30] <gregsfortytwo> s/key.iter/key_iter/
[16:06:07] <gregsfortytwo> but in usage_iterate_range we set key_iter to non-empty if we run out of user data to trim, meaning that we're done
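The non-termination Greg describes can be modeled with a small sketch. This is not the real C++ cls_rgw/cls_rgw_client code: the function names, batch size, and the "<end>" sentinel are invented for illustration. The point is that the client keeps calling until it sees ENODATA, while the buggy server path leaves its resume key non-empty even once the scan is exhausted, so ENODATA is never returned and the client keeps issuing empty reads.

```python
# Illustrative model of the termination bug: the server only returns
# ENODATA when its resume key is empty, but the buggy end-of-scan path
# sets the key to a non-empty sentinel, so the client loops forever.
ENODATA = 61

def usage_log_read(resume_key, records):
    """Server side (buggy). Returns (rc, batch, next_key)."""
    if resume_key == "":
        start = 0
    elif resume_key == "<end>":
        start = len(records)
    else:
        start = int(resume_key)
    if start >= len(records):
        # Bug: empty batch, but next_key is non-empty, so the
        # "key empty -> ENODATA" condition never fires.
        return 0, [], "<end>"
    return 0, records[start:start + 2], str(start + 2)

def client_read_all(records, max_calls=100):
    """Client side: keep calling until rc == ENODATA (never happens)."""
    out, key, calls = [], "", 0
    while calls < max_calls:
        rc, batch, key = usage_log_read(key, records)
        calls += 1
        if rc == ENODATA:
            break
        out.extend(batch)
    return out, calls

# With the bug, the client reads all records correctly but then issues
# empty reads until something external stops it -- matching both the
# "data is actually correct" observation and the 37-minute runtimes.
```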
Updated by Greg Farnum about 6 years ago
- Status changed from 12 to 4
It kinda looks to me like we should just set "*truncated = false" in the cases where usage_iterate_range() reaches the end of what we care about, and that the callers should trust that truncated parameter about whether they're done or not. But perhaps I'm missing something?
Anyway, pushed a PR: https://github.com/ceph/ceph/pull/20730
Assuming that's correct, it will for sure need a backport.
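The direction of the fix can be sketched in the same toy model (again an illustration, not the PR's actual C++): the server computes a truncated flag from whether anything remains past the batch it returned, and the caller loops on that flag instead of waiting for ENODATA.

```python
# Sketch of the suggested fix: server reports truncated=False at the
# end of the range, and the caller trusts that flag to stop looping.
def usage_log_read_fixed(resume_key, records):
    start = int(resume_key) if resume_key else 0
    batch = records[start:start + 2]
    next_key = str(start + len(batch))
    truncated = start + len(batch) < len(records)
    return batch, next_key, truncated

def client_read_all_fixed(records):
    out, key, truncated = [], "", True
    while truncated:
        batch, key, truncated = usage_log_read_fixed(key, records)
        out.extend(batch)
    return out
```

Under this scheme a user with no usage data terminates after a single round trip instead of spinning until a timeout.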
Updated by Orit Wasserman about 6 years ago
- Related to Bug #23229: usage trim loops forever: infinite calls to rgw.user_usage_log_trim added
Updated by Orit Wasserman about 6 years ago
- Related to deleted (Bug #23229: usage trim loops forever: infinite calls to rgw.user_usage_log_trim)
Updated by Orit Wasserman about 6 years ago
- Status changed from 4 to Fix Under Review
- Assignee set to Orit Wasserman
Updated by Robin Johnson about 6 years ago
@Orit Wasserman & @Greg Farnum:
I confirm this fix works for the Benjamin cluster! It now completes in a few seconds.
Updated by Orit Wasserman about 6 years ago
- Status changed from Fix Under Review to 17
Updated by Orit Wasserman about 6 years ago
- Assignee changed from Orit Wasserman to Yehuda Sadeh
Updated by Robin Johnson about 6 years ago
My own further fix is here:
https://github.com/dreamhost/ceph/commit/6c0cb1c1491df1f69309ff57a5b0f9d7929e1990
It will hopefully be submitted for master this week.
Further digging suggests this might have been broken since the Jewel release.
Updated by Yehuda Sadeh about 6 years ago
Updated by Casey Bodley about 6 years ago
- Status changed from 7 to Pending Backport
Updated by Nathan Cutler about 6 years ago
- Copied to Backport #23686: luminous: radosgw-admin usage show loops indefinitly - again added
Updated by Abhishek Lekshmanan about 6 years ago
- Status changed from Pending Backport to Resolved