Bug #23209

Strange RGW behaviour after running Cosbench tests (heavy read/write on cluster, then delete objects and dispose buckets)

Added by pratyush ranjan over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This could be a Cosbench issue. I am not 100% sure, but I have raised a ticket there.

I ran a couple of Cosbench tests on Luminous. Each test involved creating 100 buckets with 100 objects, a workload mix of 70% read and 30% write, and a final stage in which the objects are deleted and the buckets are cleaned up.

After the test finishes its cleanup and dispose stages, I still see a lot of writes happening on the cluster.
What's strange is that no test is running during that period, yet the writes continue.

ceph -s shows high write activity, with the object count increasing and the available space decreasing.
When the RGW is stopped and restarted, these write ops do not recur.

The RGW log shows the following:

2018-03-04 23:34:12.611104 7fefff729700 1 civetweb: 0x7ff07911e000: 10.32.65.6 - - [04/Mar/2018:20:42:25 +0530] "PUT /lmns376/myobjects193 HTTP/1.1" 1 0 - aws-sdk-java/1.10.76 Linux/3.16.0-4-amd64 OpenJDK_64-Bit_Server_VM/24.151-b01/1.7.0_151
2018-03-04 23:34:12.611240 7fefff729700 1 ====== starting new request req=0x7fefff723300 =====
2018-03-04 23:34:12.616991 7fefff729700 1 ====== req done req=0x7fefff723300 op status=0 http_status=404 ======
2018-03-04 23:34:12.617875 7ff02076b700 1 civetweb: 0x7ff078fc8000: 10.32.65.6 - - [04/Mar/2018:18:56:39 +0530] "PUT /lmns425/myobjects925 HTTP/1.1" 1 0 - aws-sdk-java/1.10.76 Linux/3.16.0-4-amd64 OpenJDK_64-Bit_Server_VM/24.151-b01/1.7.0_151
2018-03-04 23:34:12.617950 7ff02076b700 1 ====== starting new request req=0x7ff020765300 =====
2018-03-04 23:34:12.622921 7ff02076b700 1 ====== req done req=0x7ff020765300 op status=0 http_status=404 ======
2018-03-04 23:34:12.625062 7fef7be22700 1 civetweb: 0x7ff079671000: 10.32.65.6 - - [04/Mar/2018:15:23:48 +0530] "PUT /lmns380/myobjects2446 HTTP/1.1" 1 0 - aws-sdk-java/1.10.76 Linux/3.16.0-4-amd64 OpenJDK_64-Bit_Server_VM/24.151-b01/1.7.0_151
2018-03-04 23:34:12.625146 7fef7be22700 1 ====== starting new request req=0x7fef7be1c300 =====
2018-03-04 23:34:12.626678 7ff03b7a1700 1 civetweb: 0x7ff078e40000: 10.32.65.6 - - [04/Mar/2018:15:23:48 +0530] "PUT /lmns638/myobjects1543 HTTP/1.1" 1 0 - aws-sdk-java/1.10.76 Linux/3.16.0-4-amd64 OpenJDK_64-Bit_Server_VM/24.151-b01/1.7.0_151
2018-03-04 23:34:12.626779 7ff03b7a1700 1 ====== starting new request req=0x7ff03b79b300 =====
2018-03-04 23:34:12.627077 7fef7be22700 1 ====== req done req=0x7fef7be1c300 op status=0 http_status=404 ======
2018-03-04 23:34:12.627809 7ff03b7a1700 1 ====== req done req=0x7ff03b79b300 op status=0 http_status=404 ======
2018-03-04 23:34:12.634188 7ff03afa0700 1 ====== req done req=0x7ff03af9a300 op status=0 http_status=200 ======
2018-03-04 23:34:12.634232 7ff03afa0700 1 civetweb: 0x7ff078e45000: 10.32.77.24 - - [04/Mar/2018:15:23:48 +0530] "PUT /lmns352/myobjects5558 HTTP/1.1" 1 0 - aws-sdk-java/1.10.76 Linux/3.16.0-4-amd64 OpenJDK_64-Bit_Server_VM/24.151-b01/1.7.0_151
2018-03-04 23:34:12.637091 7ff03afa0700 1 ====== starting new request req=0x7ff03af9a300 =====

...
...
...
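
One thing that may be worth checking in the excerpt above is the gap between the time a civetweb line was written and the bracketed timestamp inside it. The sketch below is only an illustration and not part of any Ceph tooling: the function name `request_lag` and the regular expression are made up for this ticket, and it assumes both timestamps are in the same time zone and that month names are in English.

```python
import re
from datetime import datetime

# Hypothetical pattern for the civetweb lines pasted in this ticket.
CIVETWEB_RE = re.compile(
    r'^(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ .*civetweb: .*'
    r'\[(?P<req>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\] '
    r'"(?P<method>\S+) (?P<path>\S+)'
)

def request_lag(line):
    """Return (method, path, lag_seconds) for a civetweb line, else None.

    lag_seconds is the log-line time minus the bracketed time.
    Assumption: both timestamps use the same time zone.
    """
    m = CIVETWEB_RE.match(line)
    if m is None:
        return None
    logged = datetime.strptime(m.group("logged"), "%Y-%m-%d %H:%M:%S")
    req = datetime.strptime(m.group("req"), "%d/%b/%Y:%H:%M:%S")
    lag = int((logged - req).total_seconds())
    return m.group("method"), m.group("path"), lag
```

Under those assumptions, the first civetweb line above (logged at 23:34:12, bracketed time 20:42:25) yields a lag of 10307 seconds, i.e. almost three hours.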

And the final moments before the shutdown:

2018-03-04 23:39:21.582410 7f2f258f5700 1 ====== req done req=0x7f2f258ef300 op status=0 http_status=404 ======
2018-03-04 23:39:21.596783 7f2f280fa700 1 ====== req done req=0x7f2f280f4300 op status=0 http_status=200 ======
2018-03-04 23:39:27.762299 7efcd6d43700 1 ====== req done req=0x7efcd6d3d300 op status=0 http_status=200 ======
2018-03-04 23:39:27.762344 7efcd6d43700 1 civetweb: 0x7efd17dc7000: 10.32.153.41 - - [04/Mar/2018:23:39:23 +0530] "PUT /lmns354/myobjects4941 HTTP/1.1" 1 0 - aws-sdk-java/1.10.76 Linux/3.16.0-4-amd64 OpenJDK_64-Bit_Server_VM/24.151-b01/1.7.0_151
2018-03-04 23:39:27.812220 7efcd8546700 0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error
2018-03-04 23:39:27.812266 7efcd8546700 1 ====== req done req=0x7efcd8540300 op status=0 http_status=200 ======
2018-03-04 23:39:27.812311 7efcd8546700 1 civetweb: 0x7efd17db8000: 10.32.65.6 - - [04/Mar/2018:23:39:23 +0530] "PUT /lmns390/myobjects337 HTTP/1.1" 1 0 - aws-sdk-java/1.10.76 Linux/3.16.0-4-amd64 OpenJDK_64-Bit_Server_VM/24.151-b01/1.7.0_151
2018-03-04 23:39:27.862686 7efcd7544700 0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error
2018-03-04 23:39:27.862729 7efcd7544700 1 ====== req done req=0x7efcd753e300 op status=0 http_status=200 ======
2018-03-04 23:39:27.862768 7efcd7544700 1 civetweb: 0x7efd17dc2000: 10.32.77.24 - - [04/Mar/2018:23:39:23 +0530] "PUT /lmns354/myobjects5291 HTTP/1.1" 1 0 - aws-sdk-java/1.10.76 Linux/3.16.0-4-amd64 OpenJDK_64-Bit_Server_VM/24.151-b01/1.7.0_151
2018-03-04 23:39:27.913195 7efd15549d00 1 final shutdown
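
To get a rough picture of how these requests ended, the "req done" lines can be tallied by HTTP status, together with a count of the ERROR lines. This is a minimal sketch written against the log format pasted above; the function name `summarize` and the regular expression are illustrative assumptions, not an existing Ceph utility.

```python
import re
from collections import Counter

# Hypothetical pattern for the "req done" lines shown in this ticket.
REQ_DONE_RE = re.compile(
    r'req done req=0x[0-9a-f]+ op status=(-?\d+) http_status=(\d+)'
)

def summarize(lines):
    """Count completed requests per HTTP status, plus ERROR lines."""
    statuses = Counter()
    errors = 0
    for line in lines:
        m = REQ_DONE_RE.search(line)
        if m:
            statuses[int(m.group(2))] += 1
        elif " ERROR: " in line:
            errors += 1
    return statuses, errors
```

It could be run over the full log with something like `summarize(open(path))`, where `path` points at the RGW log file.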

It could be a bug in the Cosbench testing tool or an RGW issue; I am not sure at the moment.
I will update this ticket if I get more information.
