Bug #20542

rgw: not initialized pointer cause rgw crash with ec data pool

Added by Aleksei Gutikov 4 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee: -
Target version: -
Start date: 07/07/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: jewel kraken
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rgw
Release:
Needs Doc: No

Description

In RGWPutObjProcessor_Atomic::complete_writing_data(),
when pending_data_bl.length() > 0 and next_part_ofs == data_ofs,
the uninitialized void *handle results in an invalid librados::AioCompletion::pc pointer,
which crashes rgw.

For EC pools, alignment = osd_pool_erasure_code_stripe_unit * data_chunk_count.
(I'm not sure this is correct, but it is what I've observed.)

So, for example, with
rgw_obj_stripe_size=4M
rgw_max_chunk_size=4M
osd_pool_erasure_code_stripe_unit=4k
data_chunk_count=3 (k=3, m=2)
RGWRados::get_max_chunk_size(const rgw_pool& pool, uint64_t *max_chunk_size) returns max_chunk_size = 4M-4k

Then, upload a non-multipart object of size 16M to S3.

With rgw_obj_stripe_size=4M,
this leads (somehow) to entering the while (pending_data_bl.length()) { loop
in int RGWPutObjProcessor_Atomic::complete_writing_data():2710

There, the uninitialized void *handle results in an invalid librados::AioCompletion::pc pointer,
which crashes rgw:

1: (()+0x1fcee2) [0x55f98c2e0ee2]
2: (()+0x11390) [0x7f43341c8390]
3: (Mutex::Lock(bool)+0xd) [0x7f432b5d1a2d]
4: (librados::AioCompletion::wait_for_safe()+0x15) [0x7f4334425995]
5: (RGWRados::aio_wait(void*)+0x11) [0x55f98c3efd81]
6: (RGWPutObjProcessor_Aio::wait_pending_front()+0x4c) [0x55f98c3fc51c]
7: (RGWPutObjProcessor_Aio::drain_pending()+0x20) [0x55f98c3fc5a0]
8: (RGWPutObjProcessor_Atomic::complete_writing_data()+0x30d) [0x55f98c40f6bd]
9: (RGWPutObjProcessor_Atomic::do_complete(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > >&, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, char const, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)+0x67) [0x55f98c440037]
10: (RGWPutObjProcessor::complete(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > >&, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)+0x22) [0x55f98c3eea52]
11: (RGWPutObj::execute()+0x22a9) [0x55f98c3be659]
12: (rgw_process_authenticated(RGWHandler_REST, RGWOp*&, RGWRequest*, req_state*, bool)+0x165) [0x55f98c3e8eb5]
13: (process_request(RGWRados*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*)+0x1abc) [0x55f98c3ead5c]
14: (RGWCivetWebFrontend::process(mg_connection*)+0x371) [0x55f98c299711]
15: (()+0x1ee2a9) [0x55f98c2d22a9]
16: (()+0x1efc79) [0x55f98c2d3c79]
17: (()+0x76ba) [0x7f43341be6ba]
18: (clone()+0x6d) [0x7f4329c453dd]


Related issues

Copied to rgw - Backport #20712: jewel: rgw: not initialized pointer cause rgw crash with ec data pool Resolved
Copied to rgw - Backport #20713: kraken: rgw: not initialized pointer cause rgw crash with ec data pool Rejected

History

#2 Updated by Jos Collin 4 months ago

  • Status changed from New to In Progress

#3 Updated by Aleksei Gutikov 4 months ago

How to reproduce in a vstart.sh environment (luminous branch, v12.1.0):

$ ../src/vstart.sh -d -n -x -l --osd_num 5 --rgw_num 1 --bluestore
$ ./bin/ceph osd erasure-code-profile set ec-profile k=3 m=2 ruleset-failure-domain=osd
$ ./bin/ceph osd pool create default.rgw.buckets.data 12 12 erasure ec-profile

$ s3cmd mb s3://1111
$ dd if=/dev/urandom of=./test16M.raw bs=1M count=16
$ s3cmd put test16M.raw s3://1111/xxx

$ cat ./out/rgw.0.log
...
ceph version 12.1.0-10-gf1938d7a0d (f1938d7a0d04ef9b16ae752c0c0621aa3d7485c9) luminous (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f0580afeb02]
2: (Mutex::Lock(bool)+0x1a6) [0x7f0580ac6b16]
3: (librados::AioCompletion::wait_for_safe()+0x1e) [0x7f05894dd6fe]
4: (RGWRados::aio_wait(void*)+0x37) [0x558b4edb0c17]
5: (RGWPutObjProcessor_Aio::wait_pending_front()+0x55) [0x558b4edbd295]
6: (RGWPutObjProcessor_Aio::drain_pending()+0x20) [0x558b4edbd430]
7: (RGWPutObjProcessor_Atomic::complete_writing_data()+0xb61) [0x558b4edd5101]
8: (RGWPutObjProcessor_Atomic::do_complete(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > >&, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, char const, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)+0x67) [0x558b4ee00be7]
9: (RGWPutObjProcessor::complete(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > >&, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)+0x22) [0x558b4edafd92]
10: (RGWPutObj::execute()+0x258b) [0x558b4ed7b8bb]
11: (rgw_process_authenticated(RGWHandler_REST, RGWOp*&, RGWRequest*, req_state*, bool)+0x172) [0x558b4eda97e2]
12: (process_request(RGWRados*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*)+0x1ecc) [0x558b4edaba8c]
13: (RGWCivetWebFrontend::process(mg_connection*)+0x3bc) [0x558b4ec4b55c]
14: (()+0x1d3ac6) [0x558b4ec84ac6]
15: (()+0x1d5bb8) [0x558b4ec86bb8]
16: (()+0x7494) [0x7f0580469494]
17: (clone()+0x3f) [0x7f057e04b93f]

#5 Updated by Casey Bodley 3 months ago

  • Status changed from In Progress to Pending Backport
  • Backport set to jewel kraken

#6 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #20712: jewel: rgw: not initialized pointer cause rgw crash with ec data pool added

#7 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #20713: kraken: rgw: not initialized pointer cause rgw crash with ec data pool added

#8 Updated by Nathan Cutler about 1 month ago

  • Status changed from Pending Backport to Resolved
