Bug #53199
bucket stats inconsistency from racing multipart completions
Status: Closed
% Done: 100%
Description
steps to reproduce:
1. create a fresh bucket
$ s3cmd mb s3://testbucket
Bucket 's3://testbucket/' created
2. create a 32MB object (any size over the 15MB multipart threshold)
$ truncate -s 32M 32m.iso
3. perform 8 concurrent multipart uploads to the same object
$ for i in {0..7}; do s3cmd put 32m.iso s3://testbucket & done
...
[8]+ Done s3cmd put 32m.iso s3://testbucket
4. check 'bucket stats'
$ radosgw-admin bucket stats --bucket testbucket
...
    "usage": {
        "rgw.main": {
            "size": 134217728,
            "size_actual": 134217728,
            "size_utilized": 134217728,
            "size_kb": 131072,
            "size_kb_actual": 131072,
            "size_kb_utilized": 131072,
            "num_objects": 10
        },
expected stats: 1 object of size=33554432
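A quick arithmetic check of the gap between expected and observed stats (the numbers come straight from the `bucket stats` output above; the interpretation of the extra entries as leftover multipart parts is confirmed by the analysis in the comments below):

```python
# expected: the single completed 32MB object
expected_size, expected_objects = 33554432, 1
# observed: values reported by 'radosgw-admin bucket stats' above
observed_size, observed_objects = 134217728, 10

# the bucket is accounting 4x the data and 9 extra index entries
print(observed_size // expected_size)       # 4
print(observed_objects - expected_objects)  # 9
```

9 extra entries carrying 3 objects' worth of extra size is consistent with 3 uploads leaving their part entries behind (3 parts each, summing to 32MB per upload, assuming s3cmd's default 15MB chunking).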
Updated by Casey Bodley over 2 years ago
it looks like this is the result of ECANCELED errors on the head object during CompleteMultipart. the rgw log shows 3 of these errors out of the 8 multipart uploads:
2021-11-08T15:56:35.266-0500 7f0756ffd640 1 -- 192.168.245.128:0/2416006518 <== osd.0 v2:192.168.245.128:6800/920291 583 ==== osd_op_reply(589 f9658b86-62b3-45c0-831b-e7bcabd754a6.4137.1_32m.iso [cmpxattr (63) op 1 mode 1,create,call,setxattr (62),setxattr (62),setxattr (347),setxattr (147),setxattr (25),setxattr (34),setxattr (8),setxattr (4),setxattr (65),setxattr (17),setxattr (145)] v20'103 uv101 ondisk = -125 ((125) Operation canceled)) v8 ==== 741+0+0 (crc 0 0 0) 0x7f074c0c2040 con 0x5648d187cda0
2021-11-08T15:56:35.280-0500 7f0756ffd640 1 -- 192.168.245.128:0/2416006518 <== osd.0 v2:192.168.245.128:6800/920291 597 ==== osd_op_reply(603 f9658b86-62b3-45c0-831b-e7bcabd754a6.4137.1_32m.iso [cmpxattr (62) op 1 mode 1,create,call,setxattr (62),setxattr (62),setxattr (347),setxattr (147),setxattr (25),setxattr (34),setxattr (8),setxattr (4),setxattr (65),setxattr (17),setxattr (145)] v20'108 uv107 ondisk = -125 ((125) Operation canceled)) v8 ==== 741+0+0 (crc 0 0 0) 0x7f074c0c2040 con 0x5648d187cda0
2021-11-08T15:56:35.282-0500 7f0756ffd640 1 -- 192.168.245.128:0/2416006518 <== osd.0 v2:192.168.245.128:6800/920291 598 ==== osd_op_reply(604 f9658b86-62b3-45c0-831b-e7bcabd754a6.4137.1_32m.iso [cmpxattr (62) op 1 mode 1,create,call,setxattr (62),setxattr (62),setxattr (347),setxattr (147),setxattr (25),setxattr (34),setxattr (8),setxattr (4),setxattr (65),setxattr (17),setxattr (145)] v20'109 uv107 ondisk = -125 ((125) Operation canceled)) v8 ==== 741+0+0 (crc 0 0 0) 0x7f074c0c2040 con 0x5648d187cda0
when the head object write fails during a bucket index transaction, we cancel the transaction by calling cls_rgw_bucket_complete_op() with op=CLS_RGW_OP_CANCEL instead of CLS_RGW_OP_ADD. the main difference between CLS_RGW_OP_CANCEL and the other ops is that the cls call bails out early on "rgw_bucket_complete_op(): cancel requested" and doesn't try to handle remove_objs
in multipart uploads, remove_objs contains the list of multipart part entries, and we rely on cls to remove their index entries and deduct their stats from the bucket totals
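The cancel-vs-add behavior described above can be sketched as a simplified Python model. This is a toy illustration only (the real logic is C++ in src/cls/rgw/cls_rgw.cc, and all names and structures here are invented for the sketch, not the actual cls API):

```python
CLS_RGW_OP_ADD = "add"
CLS_RGW_OP_CANCEL = "cancel"

def bucket_complete_op(index, stats, op, head, remove_objs):
    """Toy model: 'index' maps entry name -> size, 'stats' holds bucket totals."""
    if op == CLS_RGW_OP_CANCEL:
        # the bug: cancel bails out immediately, so remove_objs is never
        # processed and the multipart part entries stay accounted
        return
    # link the head object and account for it
    index[head["name"]] = head["size"]
    stats["num_objects"] += 1
    stats["size"] += head["size"]
    # successful completion unlinks each part entry and deducts its stats
    for name in remove_objs:
        stats["num_objects"] -= 1
        stats["size"] -= index.pop(name)

# one upload: 3 part entries already in the index, then CompleteMultipart
MB = 1024 * 1024
index = {"part.1": 15 * MB, "part.2": 15 * MB, "part.3": 2 * MB}
stats = {"num_objects": 3, "size": 32 * MB}
parts = list(index)
head = {"name": "32m.iso", "size": 32 * MB}

# canceled transaction: parts stay accounted, stats unchanged (inflated)
bucket_complete_op(index, stats, CLS_RGW_OP_CANCEL, head, parts)
print(stats)  # {'num_objects': 3, 'size': 33554432}

# successful completion: parts removed, only the head object remains
bucket_complete_op(index, stats, CLS_RGW_OP_ADD, head, parts)
print(stats)  # {'num_objects': 1, 'size': 33554432}
```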
a grep for 'remove_objs' in the osd log only shows 5 cases where remove_objs was non-empty:
...
2021-11-08T15:56:35.244-0500 7f9c33fff640 20 <cls> ../src/cls/rgw/cls_rgw.cc:1121: rgw_bucket_complete_op(): remove_objs.size()=0
2021-11-08T15:56:35.265-0500 7f9c33fff640 20 <cls> ../src/cls/rgw/cls_rgw.cc:1121: rgw_bucket_complete_op(): remove_objs.size()=3
2021-11-08T15:56:35.279-0500 7f9c33fff640 20 <cls> ../src/cls/rgw/cls_rgw.cc:1121: rgw_bucket_complete_op(): remove_objs.size()=3
2021-11-08T15:56:35.286-0500 7f9c33fff640 20 <cls> ../src/cls/rgw/cls_rgw.cc:1121: rgw_bucket_complete_op(): remove_objs.size()=3
2021-11-08T15:56:35.290-0500 7f9c33fff640 20 <cls> ../src/cls/rgw/cls_rgw.cc:1121: rgw_bucket_complete_op(): remove_objs.size()=0
2021-11-08T15:56:35.301-0500 7f9c33fff640 20 <cls> ../src/cls/rgw/cls_rgw.cc:1121: rgw_bucket_complete_op(): remove_objs.size()=3
2021-11-08T15:56:35.304-0500 7f9c33fff640 20 <cls> ../src/cls/rgw/cls_rgw.cc:1121: rgw_bucket_complete_op(): remove_objs.size()=0
2021-11-08T15:56:35.308-0500 7f9c33fff640 20 <cls> ../src/cls/rgw/cls_rgw.cc:1121: rgw_bucket_complete_op(): remove_objs.size()=0
2021-11-08T15:56:35.310-0500 7f9c33fff640 20 <cls> ../src/cls/rgw/cls_rgw.cc:1121: rgw_bucket_complete_op(): remove_objs.size()=3
2021-11-08T15:56:35.313-0500 7f9c33fff640 20 <cls> ../src/cls/rgw/cls_rgw.cc:1121: rgw_bucket_complete_op(): remove_objs.size()=0
...
these correspond to the 5 multipart uploads which completed successfully. the 3 canceled transactions did not clean up the index entries for their parts (3 entries each), leaving 9 stale part entries plus the 1 head object, which explains why num_objects=10 instead of 1
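The tally above can be reproduced programmatically. This sketch parses an abbreviated copy of the osd log excerpt (lines trimmed to the cls message; the regex is keyed to the log format shown):

```python
import re

# abbreviated copy of the rgw_bucket_complete_op() lines from the excerpt above
log = """
rgw_bucket_complete_op(): remove_objs.size()=0
rgw_bucket_complete_op(): remove_objs.size()=3
rgw_bucket_complete_op(): remove_objs.size()=3
rgw_bucket_complete_op(): remove_objs.size()=3
rgw_bucket_complete_op(): remove_objs.size()=0
rgw_bucket_complete_op(): remove_objs.size()=3
rgw_bucket_complete_op(): remove_objs.size()=0
rgw_bucket_complete_op(): remove_objs.size()=0
rgw_bucket_complete_op(): remove_objs.size()=3
rgw_bucket_complete_op(): remove_objs.size()=0
"""
sizes = [int(s) for s in re.findall(r"remove_objs\.size\(\)=(\d+)", log)]
cleaned = [s for s in sizes if s > 0]

print(len(cleaned))  # 5 completions processed their part entries
print(sum(cleaned))  # 15 part entries removed
# 8 uploads x 3 parts = 24 entries written; 24 - 15 = 9 left behind,
# plus the 1 head object -> num_objects=10
print(24 - sum(cleaned) + 1)  # 10
```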
Updated by Casey Bodley over 2 years ago
- Status changed from New to Fix Under Review
- Tags changed from stats to stats multipart
- Pull request ID set to 43854
Updated by Casey Bodley over 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot over 2 years ago
- Copied to Backport #53285: pacific: bucket stats inconsistency from racing multipart completions added
Updated by Backport Bot over 2 years ago
- Copied to Backport #53286: octopus: bucket stats inconsistency from racing multipart completions added
Updated by Backport Bot over 1 year ago
- Tags changed from stats multipart to stats multipart backport_processed
Updated by Konstantin Shalygin about 1 year ago
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100
- Source set to Development