Project

General

Profile

Actions

Bug #19870

closed

rgw: bytes_send and bytes_recv in the msg of usage show returning is 0 in master branch

Added by Enming Zhang almost 7 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

➜ build git:(master) ✗ ./bin/radosgw-admin usage show {
"entries": [ {
"user": "tester",
"buckets": [ {
"bucket": "",
"time": "2017-05-05 06:00:00.000000Z",
"epoch": 1493964000,
"owner": "tester",
"categories": [ {
"category": "list_buckets",
"bytes_sent": 0,
"bytes_received": 0,
"ops": 2,
"successful_ops": 2
}
]
}, {
"bucket": "222",
"time": "2017-05-05 06:00:00.000000Z",
"epoch": 1493964000,
"owner": "tester",
"categories": [ {
"category": "create_bucket",
"bytes_sent": 0,
"bytes_received": 0,
"ops": 1,
"successful_ops": 1
}
]
}
]
}
],
"summary": [ {
"user": "tester",
"categories": [ {
"category": "create_bucket",
"bytes_sent": 0,
"bytes_received": 0,
"ops": 1,
"successful_ops": 1
}, {
"category": "list_buckets",
"bytes_sent": 0,
"bytes_received": 0,
"ops": 2,
"successful_ops": 2
}
],
"total": {
"bytes_sent": 0,
"bytes_received": 0,
"ops": 3,
"successful_ops": 3
}
}
]
}


Related issues 2 (0 open2 closed)

Related to rgw - Bug #19194: after s3cmd put file to bucket, 'radosgw-admin usage show --categories=put_obj' bytes_received=0DuplicateMarcus Watts03/06/2017

Actions
Copied to rgw - Backport #20968: luminous: rgw: bytes_send and bytes_recv in the msg of usage show returning is 0 in master branchResolvedNathan CutlerActions
Actions #1

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to rgw
Actions #2

Updated by Orit Wasserman almost 7 years ago

  • Assignee set to Marcus Watts
Actions #3

Updated by Casey Bodley almost 7 years ago

  • Related to Bug #19194: after s3cmd put file to bucket, 'radosgw-admin usage show --categories=put_obj' bytes_received=0 added
Actions #5

Updated by Yehuda Sadeh almost 7 years ago

  • Subject changed from bytes_send and bytes_recv in the msg of usage show returning is 0 in master branch to rgw: bytes_send and bytes_recv in the msg of usage show returning is 0 in master branch
Actions #6

Updated by Yehuda Sadeh almost 7 years ago

while we're at it, we need to make sure that there's sufficient teuthology coverage for this issue. I think we do test usage, but maybe not look at the bytes_received / bytes_sent fields.

Actions #7

Updated by Marcus Watts almost 7 years ago

I may have found the teuthology tests. in ceph, master branch, "qa/tasks/radosgw_admin.py" and "qa/tasks/radosgw_admin_rest.py". Is that the current thing? Are there any other places I should look? It looks like that just exercises radosgw-admin interfaces, so may not fully cover all the s3/swift possible accounting paths. There's no logic there to look at bytes_sent|bytes_received. Looks like it's mainly checking existance and "successful_ops".

jewel has relatively straightforward code for this. Does it calculate bytes_sent and bytes_received correctly?

Looking at master, it appears it has significantly more complex code using c++ classes, templates, and such. One of the classes involved is "BufferingFilter" which appears to have the property of shifting byte counts from send_body() to a later complete_request() call. So far my gdb experiments suggest that this later count is not wrapped in AccountingFilter and so winds up not being accumulated.

Actions #8

Updated by Marcus Watts almost 7 years ago

cbodley has a fix for a similar problem with swift:
https://github.com/ceph/ceph/pull/14042

I believe it may be correct that "create_bucket" lists 0 bytes sent/received. The count appears to be intended to cover the body of the request/response only, not the headers. create_bucket may not have any data for either. I'll verify that this is the case with jewel/boto.

"list_buckets" should definitely have a nz sent_bytes count. Jewel certainly shows a nz count. I've been able to confirm with debug code that the count is being lost in process_request() when it calls complete_request.

In my experiments with s3cmd, I believe get/put object do update byte counts right.

I'm looking at what it takes to run teuthology's radmin_admin command standalone. I appear to have managed this once upon a time, so it should be just a matter of updating my process. Once I can do that, adding code to test out usage should be much more feasible.

Actions #9

Updated by Marcus Watts almost 7 years ago

I've been able to run a modified version of radosgw_admin.py against jewel & master. Modifications included instrumenting boto calls to track sent & received bytes, moving the "usage show" logic to near the end (after all the boto) calls, and added a few more boto calls to exercise more interfaces.

Both jewel & master seem to have various oddities in counts.

jewel,
list_buckets: actually received 299 bytes, but ceph claims to have sent 320.
create_bucket: in 9 ops, 8 successes returned 0 bytes, one failed returning 218 bytes. Ceph claims to have sent 152 bytes in 8 ops.
get_acls: got 1026 bytes back, but ceph claims to have sent 1068.
list_bucket: got 484 bytes, but ceph claims to have sent 505.
put_acls: got 0 bytes back, but ceph claims to have sent 38.

master,
create_bucket: received 220 bytes in 1 failure of 9 ops total, but ceph claims to have sent 0 bytes in 8 ops.
get_acls: received 1026 bytes, but ceph claism to have sent 0 bytes.
list_bucket: received 484 butes, but ceph claims to have sent 0 bytes.
list_buckets: received 299 bytes, but ceph claims to have sent 0 bytes.

My build of ceph master doesn't have casey's patch. I'll have to look at that to see if it fixes s3 too.
Ideally these tests should be doing swift too, and be capable of testing casey's patch more directly.

Actions #10

Updated by Marcus Watts almost 7 years ago

AccountingFilter is missing "complete_request". So that's why master drops "bytes_sent" in most cases. Implementing that function in reveals another problem: BufferingFilter does some trickery of its own. The big problem is that in complete_request, it calls send_content_length/complete_header - which return counts that should not be counted. The mechanism that's supposed to control when bytes are counted, "AccountingFilter::set_account()", is only supposed to be called after complete_header() is called, but BufferingFilter cleverly defers calling that. I have a not very elegant fix for this, which seems to be giving me mostly correct counts.

Actions #11

Updated by Marcus Watts almost 7 years ago

github.com:mdw-at-linuxbox/ceph-1 policy is a copy of my current changes to master to do this. Lots of temp debug messages and deadend code get_pending_data_count() to remove--needs significant cleanup before making a PR. Also there's a dmclock build fix which at the very least has a different sha1 in master.

Actions #12

Updated by Matt Benjamin almost 7 years ago

per Marcus 6/21, he has a candidate fix with some "ugly" aspects, is working on revised version

Actions #13

Updated by Matt Benjamin over 6 years ago

  • Status changed from New to In Progress
Actions #14

Updated by Casey Bodley over 6 years ago

can Marcus propose a simple fix so other developers can review and suggest alternatives?

Actions #15

Updated by Marcus Watts over 6 years ago

I've posted a fix against master for the bytes usage fix.
https://github.com/ceph/ceph/pull/16834

I got rid of some unnecessary detours for fixing this bug (fixes based on theories of the bug that turned out to be wrong.)

This passes modified teuthology test code that I'll shortly post as a PR against teuthology.

Actions #16

Updated by Marcus Watts over 6 years ago

Turns out the actual teuthologic logic to test usage now lives in the ceph repo! So

https://github.com/ceph/ceph/pull/16835

This points to a changed radosgw_admin.py file that checks usage a lot more thoroughly.

I tested this logic out against a jewel without any special patches - it passes this test as well. So we could use this to catch future regressions in jewel if we want.

Actions #17

Updated by Matt Benjamin over 6 years ago

does this need backport (L?)

Actions #18

Updated by Matt Benjamin over 6 years ago

  • Status changed from In Progress to Pending Backport
Actions #19

Updated by Nathan Cutler over 6 years ago

  • Backport set to luminous
Actions #20

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #20968: luminous: rgw: bytes_send and bytes_recv in the msg of usage show returning is 0 in master branch added
Actions #21

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF