Bug #22963
radosgw-admin usage show loops indefinitly - again
Status: Closed
% Done: 0%
Description
After updating our cluster from Jewel to Luminous, radosgw-admin usage show loops. I thought #21196 should have fixed this bug.
E.g.: radosgw-admin usage show --uid="DunkelDatenaustausch" --start-date=2018-01-01 --end-date=2018-02-01
{ "entries": [ { "user": "DunkelDatenaustausch", "buckets": [
    { "bucket": "", "time": "2018-01-17 10:00:00.000000Z", "epoch": 1516183200, "owner": "DunkelDatenaustausch", "categories": [ { "category": "list_buckets", "bytes_sent": 344, "bytes_received": 0, "ops": 1, "successful_ops": 1 } ] },
    { "bucket": "", "time": "2018-01-17 10:00:00.000000Z", "epoch": 1516183200, "owner": "DunkelDatenaustausch", "categories": [ { "category": "list_buckets", "bytes_sent": 344, "bytes_received": 0, "ops": 1, "successful_ops": 1 } ] },
    { "bucket": "", "time": "2018-01-17 10:00:00.000000Z", "epoch": 1516183200, "owner": "DunkelDatenaustausch", "categories": [ { "category": "list_buckets", "bytes_sent": 344, "bytes_received": 0, "ops": 1, "successful_ops": 1 } ] },
[...]
Interestingly, the amount of data, bytes_sent, and bytes_received seem to be correct and to cover the complete time range. We wrote a small wrapper that detects and breaks the loop; when we summed the bytes afterwards, we still got the correct accounting.
For some accounts, we still get the original behaviour.
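A wrapper like the one described could look roughly like the following sketch in Python. This is purely illustrative (the function name, the entry layout, and the dedup key are assumptions based on the JSON above, not the reporter's actual script): it aggregates usage categories and stops as soon as an identical record comes around again.

```python
# Hypothetical sketch of a loop-breaking wrapper for the looping
# 'radosgw-admin usage show' output. Assumes entries shaped like the
# JSON excerpt above; stops at the first exact repeat of a record.
def collect_usage(entries):
    seen = set()
    totals = {"bytes_sent": 0, "bytes_received": 0, "ops": 0}
    for entry in entries:
        for cat in entry.get("categories", []):
            key = (entry["bucket"], entry["epoch"], cat["category"],
                   cat["bytes_sent"], cat["bytes_received"], cat["ops"])
            if key in seen:
                # Same record again: the output has started looping.
                return totals
            seen.add(key)
            totals["bytes_sent"] += cat["bytes_sent"]
            totals["bytes_received"] += cat["bytes_received"]
            totals["ops"] += cat["ops"]
    return totals
```

On the excerpt above this would count the list_buckets entry once (344 bytes sent, 1 op) and bail out when the identical entry repeats.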
The cluster is rather old; it has been upgraded over the years, probably starting from Bobtail. I have included the zonegroup config.
Updated by Robin Johnson about 6 years ago
- Status changed from New to 12
- Priority changed from Normal to High
- Tags set to rgw, usage
- Backport set to luminous
- Severity changed from 3 - minor to 2 - major
- Release set to master
This bug definitely still exists and is affecting billing.
Here's the interesting part from --debug-rgw=20 --debug-ms=1 on the client, I'm going to track it to the OSD as well, but I think it's in the client-side class code.
2018-02-15 19:03:31.179167 7f3ba1219b80 1 -- [XXXX::5752]:0/598436282 --> [XXXX::7098]:6838/2803930 -- osd_op(unknown.0.0:70 12.2 12:51cb4789:::usage.21:head [call rgw.user_usage_log_read] snapc 0=[] ondisk+read+known_if_redirected e1292041) v8 -- 0x5618934a01d0 con 0
2018-02-15 19:03:31.192370 7f3b8f1a5700 1 -- [XXXX::5752]:0/598436282 <== osd.604 [XXXX::7098]:6838/2803930 1 ==== osd_op_reply(70 usage.21 [call] v0'0 uv194985623 ondisk = 0) v8 ==== 152+0+399 (1593127871 0 1061501781) 0x7f3b80012070 con 0x56189349bbb0
2018-02-15 19:03:31.192649 7f3ba1219b80 1 -- [XXXX::5752]:0/598436282 --> [XXXX::7098]:6838/2803930 -- osd_op(unknown.0.0:71 12.2 12:51cb4789:::usage.21:head [call rgw.user_usage_log_read] snapc 0=[] ondisk+read+known_if_redirected e1292041) v8 -- 0x56189349a360 con 0
2018-02-15 19:03:31.209750 7f3b8f1a5700 1 -- [XXXX::5752]:0/598436282 <== osd.604 [XXXX::7098]:6838/2803930 2 ==== osd_op_reply(71 usage.21 [call] v0'0 uv194985623 ondisk = 0) v8 ==== 152+0+399 (1593127871 0 1061501781) 0x7f3b80012070 con 0x56189349bbb0
2018-02-15 19:03:31.209975 7f3ba1219b80 1 -- [XXXX::5752]:0/598436282 --> [XXXX::7098]:6838/2803930 -- osd_op(unknown.0.0:72 12.2 12:51cb4789:::usage.21:head [call rgw.user_usage_log_read] snapc 0=[] ondisk+read+known_if_redirected e1292041) v8 -- 0x56189349a360 con 0
2018-02-15 19:03:31.227569 7f3b8f1a5700 1 -- [XXXX::5752]:0/598436282 <== osd.604 [XXXX::7098]:6838/2803930 3 ==== osd_op_reply(72 usage.21 [call] v0'0 uv194985623 ondisk = 0) v8 ==== 152+0+399 (1593127871 0 1061501781) 0x7f3b80010850 con 0x56189349bbb0
2018-02-15 19:03:31.227796 7f3ba1219b80 1 -- [XXXX::5752]:0/598436282 --> [XXXX::7098]:6838/2803930 -- osd_op(unknown.0.0:73 12.2 12:51cb4789:::usage.21:head [call rgw.user_usage_log_read] snapc 0=[] ondisk+read+known_if_redirected e1292041) v8 -- 0x56189349a360 con 0
2018-02-15 19:03:31.245164 7f3b8f1a5700 1 -- [XXXX::5752]:0/598436282 <== osd.604 [XXXX::7098]:6838/2803930 4 ==== osd_op_reply(73 usage.21 [call] v0'0 uv194985623 ondisk = 0) v8 ==== 152+0+399 (1593127871 0 1061501781) 0x7f3b80012070 con 0x56189349bbb0
Updated by Robin Johnson about 6 years ago
Debugging so far:
1. The bug only occurs on the Congress cluster, not on the Benjamin cluster (which was rebuilt from scratch in the Jewel era), so it probably depends on old data. Congress has LOTS of old usage data.
2. I haven't seen the ENOENT path get triggered yet.
Updated by Robin Johnson about 6 years ago
The comment by Marcin Gibula in issue 21196 is actually spot on.
This DOES actually complete, eventually, if nothing else kills it first for taking too long.
I created a brand new user, and started a capture on it, asking for yesterday's data (of which there should be NONE):
time radosgw-admin usage show --uid ceph-22963-test --start-date=2018-02-17
It did eventually complete, with the time output of:
real 37m3.425s
user 0m39.836s
sys 0m14.912s
If I pick another username, the runtime is very similar.
Updated by Marcin Gibula about 6 years ago
Hi,
just FYI - I've uploaded some logs in this ticket as well - http://tracker.ceph.com/issues/22314
If you need anything more, just ask - it's currently a major blocker for us as well.
Updated by Orit Wasserman about 6 years ago
- Related to Bug #2050: rgw: crash at Objecter::_linger_commit() added
Updated by Orit Wasserman about 6 years ago
- Related to deleted (Bug #2050: rgw: crash at Objecter::_linger_commit())
Updated by Orit Wasserman about 6 years ago
- Related to Bug #22050: ERROR type entries of pglog do not update min_last_complete_ondisk, potentially ballooning memory usage added
Updated by Greg Farnum about 6 years ago
Looked through the bucket_list.log and do see some inexplicably long op times. While I could maybe see the watch-notify stuff (3) taking a while, most listing ops take 1 ms instead of 350 (1), and a create shouldn't take 196 ms (2) unless the PG is blocked for some reason. (Which it... might be?)
2018-02-07 07:23:43.501849 7f87efd3f700 1 -- 10.198.10.101:0/734401551 --> 10.198.10.14:6832/271459 -- osd_op(unknown.0.0:21 16.1 16:80000000::::head [pgnls start_epoch 2083520] snapc 0=[] ondisk+read+ignore_overlay+known_if_redirected e2083520) v8 -- 0x7f87e000c900 con 0
2018-02-07 07:23:43.825047 7f87ef53e700 1 -- 10.198.10.101:0/734401551 <== osd.215 10.198.10.14:6832/271459 1 ==== osd_op_reply(21 [pgnls start_epoch 2083520] v987079'84 uv84 ondisk = 1) v8 ==== 144+0+107 (3045876259 0 1982614520) 0x7f87e4009c00 con 0x7f87e0008240
...
2018-02-07 07:23:43.846484 7f8802235cc0 1 -- 10.198.10.101:0/734401551 --> 10.198.10.15:6830/158365 -- osd_op(unknown.0.0:52 8.7 8:f95f44c2:::notify.0:head [create] snapc 0=[] ondisk+write+known_if_redirected e2083520) v8 -- 0x56443269d7c0 con 0
2018-02-07 07:23:44.042682 7f87ef53e700 1 -- 10.198.10.101:0/734401551 <== osd.27 10.198.10.15:6830/158365 1 ==== osd_op_reply(52 notify.0 [create] v2083520'1819325 uv1819325 ondisk = 0) v8 ==== 152+0+0 (872112348 0 0) 0x7f87e400a900 con 0x5644326990a0
...
2018-02-07 07:23:44.042912 7f8802235cc0 1 -- 10.198.10.101:0/734401551 --> 10.198.10.15:6830/158365 -- osd_op(unknown.0.0:53 8.7 8:f95f44c2:::notify.0:head [watch watch cookie 94850903562176] snapc 0=[] ondisk+write+known_if_redirected e2083520) v8 -- 0x56443269e1b0 con 0
2018-02-07 07:23:45.013615 7f87ef53e700 1 -- 10.198.10.101:0/734401551 <== osd.27 10.198.10.15:6830/158365 2 ==== osd_op_reply(53 notify.0 [watch watch cookie 94850903562176] v2083520'1819326 uv1819325 ondisk = 0) v8 ==== 152+0+0 (739656982 0 0) 0x7f87e400a900 con 0x5644326990a0
2018-02-07 07:23:45.013746 7f8802235cc0 20 add_watcher() i=0
But since that log appears to end successfully, I'm confused.
Updated by Greg Farnum about 6 years ago
Also, Sage is asking whether missing https://github.com/ceph/ceph/pull/13021 might be causing the repeated loops.
Updated by Greg Farnum about 6 years ago
Okay, dug into the rgw side of things a bit more and I think https://github.com/ceph/ceph/pull/19131 contains some incorrect logic?
[16:04:51] <gregsfortytwo> it looks to me like that's a little broken: in cls_rgw_client, we keep looping until we receive ENODATA
[16:05:23] <gregsfortytwo> in cls_rgw, we only return ENODATA if "iter.empty()", which in usage_iterate_range is key.iter
[16:05:30] <gregsfortytwo> s/key.iter/key_iter/
[16:06:07] <gregsfortytwo> but in usage_iterate_range we set key_iter to non-empty if we run out of user data to trim, meaning that we're done
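The non-termination Greg describes can be modeled with a small sketch. This is not the real C++ cls_rgw/cls_rgw_client code: the function names, batch size, and the "<end>" sentinel are invented for illustration. The point is that the client keeps calling until it sees ENODATA, while the buggy server path leaves its resume key non-empty even once the scan is exhausted, so ENODATA is never returned and the client keeps issuing empty reads.

```python
# Illustrative model of the termination bug: the server only returns
# ENODATA when its resume key is empty, but the buggy end-of-scan path
# sets the key to a non-empty sentinel, so the client loops forever.
ENODATA = 61

def usage_log_read(resume_key, records):
    """Server side (buggy). Returns (rc, batch, next_key)."""
    if resume_key == "":
        start = 0
    elif resume_key == "<end>":
        start = len(records)
    else:
        start = int(resume_key)
    if start >= len(records):
        # Bug: empty batch, but next_key is non-empty, so the
        # "key empty -> ENODATA" condition never fires.
        return 0, [], "<end>"
    return 0, records[start:start + 2], str(start + 2)

def client_read_all(records, max_calls=100):
    """Client side: keep calling until rc == ENODATA (never happens)."""
    out, key, calls = [], "", 0
    while calls < max_calls:
        rc, batch, key = usage_log_read(key, records)
        calls += 1
        if rc == ENODATA:
            break
        out.extend(batch)
    return out, calls

# With the bug, the client reads all records correctly but then issues
# empty reads until something external stops it -- matching both the
# "data is actually correct" observation and the 37-minute runtimes.
```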
Updated by Greg Farnum about 6 years ago
- Status changed from 12 to 4
It kinda looks to me like we should just set "*truncated = false" in the cases where usage_iterate_range() reaches the end of what we care about, and that the callers should trust that truncated parameter about whether they're done or not. But perhaps I'm missing something?
Anyway, pushed a PR: https://github.com/ceph/ceph/pull/20730
Assuming that's correct, it will for sure need a backport.
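The direction of the fix can be sketched in the same toy model (again an illustration, not the PR's actual C++): the server computes a truncated flag from whether anything remains past the batch it returned, and the caller loops on that flag instead of waiting for ENODATA.

```python
# Sketch of the suggested fix: server reports truncated=False at the
# end of the range, and the caller trusts that flag to stop looping.
def usage_log_read_fixed(resume_key, records):
    start = int(resume_key) if resume_key else 0
    batch = records[start:start + 2]
    next_key = str(start + len(batch))
    truncated = start + len(batch) < len(records)
    return batch, next_key, truncated

def client_read_all_fixed(records):
    out, key, truncated = [], "", True
    while truncated:
        batch, key, truncated = usage_log_read_fixed(key, records)
        out.extend(batch)
    return out
```

Under this scheme a user with no usage data terminates after a single round trip instead of spinning until a timeout.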
Updated by Orit Wasserman about 6 years ago
- Related to Bug #23229: usage trim loops forever: infinite calls to rgw.user_usage_log_trim added
Updated by Orit Wasserman about 6 years ago
- Related to deleted (Bug #23229: usage trim loops forever: infinite calls to rgw.user_usage_log_trim)
Updated by Orit Wasserman about 6 years ago
- Status changed from 4 to Fix Under Review
- Assignee set to Orit Wasserman
Updated by Robin Johnson about 6 years ago
@Orit Wasserman & @Greg Farnum:
I confirm this fix works for the Benjamin cluster! It now completes in a few seconds.
Updated by Orit Wasserman about 6 years ago
- Status changed from Fix Under Review to 17
Updated by Orit Wasserman about 6 years ago
- Assignee changed from Orit Wasserman to Yehuda Sadeh
Updated by Robin Johnson about 6 years ago
My own further fix is here:
https://github.com/dreamhost/ceph/commit/6c0cb1c1491df1f69309ff57a5b0f9d7929e1990
It will hopefully be submitted for master this week.
Further digging suggests this might have been broken since the Jewel release.
Updated by Yehuda Sadeh about 6 years ago
Updated by Casey Bodley about 6 years ago
- Status changed from 7 to Pending Backport
Updated by Nathan Cutler about 6 years ago
- Copied to Backport #23686: luminous: radosgw-admin usage show loops indefinitly - again added
Updated by Abhishek Lekshmanan about 6 years ago
- Status changed from Pending Backport to Resolved