Bug #54482
closed
octopus: heap memory leak in radosgw
Description
I've been trying to narrow down why radosgw is hogging so much memory, first on 15.2.15. Then I found https://github.com/ceph/ceph/pull/43381, so I upgraded radosgw to 15.2.16, but the issue is still there.
We are heavy users of the admin REST API to get buckets information etc.
With valgrind's massif tool I can see that heap usage grows between every snapshot, and the growth gets faster the longer radosgw runs.
I've attached the massif output as a file; it's the same on 15.2.15 and 15.2.16. I thought the issue was the one fixed by the PR above, because this sticks out:
--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 58 478,065,211,953    1,119,989,200    1,058,526,445    61,462,755            0
94.51% (1,058,526,445B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->72.80% (815,392,000B) 0x5629069: rgw::sal::RGWRadosUser::list_buckets(std::string const&, std::string const&, unsigned long, bool, rgw::sal::RGWBucketList&) (in /usr/lib64/libradosgw.so.2.0.0)
| ->72.80% (815,392,000B) 0x534B93E: RGWBucketAdminOp::info(rgw::sal::RGWRadosStore*, RGWBucketAdminOpState&, RGWFormatterFlusher&) (in /usr/lib64/libradosgw.so.2.0.0)
|   ->72.80% (815,392,000B) 0x523561B: RGWOp_Bucket_Info::execute() (in /usr/lib64/libradosgw.so.2.0.0)
|     ->72.80% (815,392,000B) 0x522DAA1: rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool) (in /usr/lib64/libradosgw.so.2.0.0)
|       ->72.80% (815,392,000B) 0x52315C7: process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, std::string const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*) (in /usr/lib64/libradosgw.so.2.0.0)
|         ->72.80% (815,392,000B) 0x519A9F8: RGWCivetWebFrontend::process(mg_connection*) (in /usr/lib64/libradosgw.so.2.0.0)
|           ->72.80% (815,392,000B) 0x52EBA4D: ??? (in /usr/lib64/libradosgw.so.2.0.0)
|             ->72.80% (815,392,000B) 0x52ED6EE: ??? (in /usr/lib64/libradosgw.so.2.0.0)
|               ->72.80% (815,392,000B) 0x52EDB97: ??? (in /usr/lib64/libradosgw.so.2.0.0)
|                 ->72.80% (815,392,000B) 0xFEA5EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
|                   ->72.80% (815,392,000B) 0x116409FC: clone (in /usr/lib64/libc-2.17.so)
|
->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
This is what the graph looks like:
[massif heap graph: y-axis in GB, peaking at 1.051 GB; x-axis in instructions, from 0 to 448.2 Gi; usage rises from 0 to the peak at snapshot 58]
Number of snapshots: 62
Detailed snapshots: [18, 21, 22, 24, 25, 29, 34, 35, 36, 37, 43, 44, 48, 57, 58 (peak)]
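For anyone wanting to quantify the growth between snapshots rather than read it off the graph, here is a minimal sketch that parses the raw massif.out file (not the ms_print rendering). The snapshot 58 values are taken from the output above; the snapshot 57 values and the helper names are invented purely for illustration.

```python
import re

# Excerpt in the raw massif.out key=value format.
# Snapshot 58 uses the numbers from this report; snapshot 57 is made up.
SAMPLE = """\
time_unit: i
#-----------
snapshot=57
#-----------
time=470000000000
mem_heap_B=1000000000
mem_heap_extra_B=58000000
mem_stacks_B=0
heap_tree=detailed
#-----------
snapshot=58
#-----------
time=478065211953
mem_heap_B=1058526445
mem_heap_extra_B=61462755
mem_stacks_B=0
heap_tree=peak
"""

FIELD = re.compile(r"(snapshot|time|mem_heap_B|mem_heap_extra_B|mem_stacks_B)=(\d+)")


def parse_snapshots(text):
    """Return one dict per snapshot holding its integer fields."""
    snapshots = []
    for line in text.splitlines():
        match = FIELD.fullmatch(line)
        if not match:
            continue  # skip separators, heap_tree lines and stack frames
        key, value = match.group(1), int(match.group(2))
        if key == "snapshot":
            snapshots.append({"snapshot": value})
        elif snapshots:
            snapshots[-1][key] = value
    return snapshots


def total_bytes(snap):
    """Matches the total(B) column: useful heap + extra heap + stacks."""
    return snap["mem_heap_B"] + snap["mem_heap_extra_B"] + snap["mem_stacks_B"]


def growth(snapshots):
    """Bytes gained between each pair of consecutive snapshots."""
    totals = [total_bytes(s) for s in snapshots]
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


snaps = parse_snapshots(SAMPLE)
for snap in snaps:
    print(snap["snapshot"], total_bytes(snap))
print(growth(snaps))
```

Pointing parse_snapshots at the contents of a real massif.out file should give the per-snapshot totals, so a steadily positive growth list would confirm memory is never returned.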
Updated by Tobias Urdin about 2 years ago
Updated by Tobias Urdin about 2 years ago
Seems like this is related to the admin API.
Updated by Tobias Urdin about 2 years ago
I should probably mention that we have thousands of buckets that we retrieve statistics for, but we don't expect the memory used for these operations to remain allocated afterwards like this.
Updated by Tobias Urdin about 2 years ago
After a while with:
GET /admin/bucket?uid=<uid here>&stats=true
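As a rough sketch of how that request could be built for a reproduction loop, assuming a hypothetical endpoint and uid (both placeholders). Request signing is left out here; the admin API expects authenticated requests, so a real client would still need to sign each call.

```python
from urllib.parse import urlencode, urljoin


def bucket_stats_url(endpoint, uid):
    """Build the admin API URL that lists a user's buckets with stats."""
    # urlencode escapes characters in the uid that are not URL-safe.
    query = urlencode({"uid": uid, "stats": "true"})
    return urljoin(endpoint, "/admin/bucket") + "?" + query


# Placeholder endpoint and uid, not taken from this report.
print(bucket_stats_url("http://rgw.example.com:8080", "alice"))
```

Calling this URL repeatedly for a user with many buckets while watching the radosgw resident memory would be one way to check whether the leak tracks this request.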
Updated by Tobias Urdin about 2 years ago
I would guess this is solved by upgrading, because of the major refactor in https://github.com/ceph/ceph/commit/99f7c4aa1286edfea6961b92bb44bb8fe22bd599; however, upgrading is not a feasible solution for this issue.
Updated by Casey Bodley about 2 years ago
- Subject changed from heap memory leak in radosgw to octopus: heap memory leak in radosgw
- Status changed from New to Fix Under Review
- Pull request ID set to 45283
Updated by Yuri Weinstein almost 2 years ago
Updated by Casey Bodley almost 2 years ago
- Status changed from Fix Under Review to Resolved