Bug #46332

boost::asio::async_write() does not return error when the remote endpoint is not connected

Added by Mark Kogan about 1 month ago. Updated 26 days ago.

Status: Pending Backport
Priority: High
Assignee:
Target version:
% Done: 0%
Source: other
Tags: beast
Backport: nautilus octopus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

In a flow where the remote side disconnects the TCP connection in the middle of an RGW transmit,
boost::asio::async_write(... yield[ec]) sporadically doesn't return an error.

debug log:

2020-07-02T15:12:17.981+0300 7f332de07700  1 ====== starting new request req=0x7f332cc81218 =====
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s initializing for trans_id = tx000000000000000000004-005efdcf21-1634-default
2020-07-02T15:12:17.982+0300 7f332de07700 10 rgw api priority: s3=8 s3website=7
2020-07-02T15:12:17.982+0300 7f332de07700 20 req 4 0s get_handler handler=26RGWHandler_REST_Service_S3
2020-07-02T15:12:17.982+0300 7f332de07700 10 handler=26RGWHandler_REST_Service_S3
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s getting op 0
2020-07-02T15:12:17.982+0300 7f332de07700 10 req 4 0s s3:list_buckets scheduling with dmclock/throttler client=3 cost=1
2020-07-02T15:12:17.982+0300 7f332de07700 10 op=26RGWListBuckets_ObjStore_S3
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets verifying requester
2020-07-02T15:12:17.982+0300 7f332de07700 20 req 4 0s s3:list_buckets rgw::auth::StrategyRegistry::s3_main_strategy_t: trying rgw::auth::s3::AWSAuthStrategy
2020-07-02T15:12:17.982+0300 7f332de07700 20 req 4 0s s3:list_buckets rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::S3AnonymousEngine
2020-07-02T15:12:17.982+0300 7f332de07700 20 req 4 0s s3:list_buckets rgw::auth::s3::S3AnonymousEngine granted access
2020-07-02T15:12:17.982+0300 7f332de07700 20 req 4 0s s3:list_buckets rgw::auth::s3::AWSAuthStrategy granted access
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets normalizing buckets and tenants
2020-07-02T15:12:17.982+0300 7f332de07700 10 s->object=<NULL> s->bucket=
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets init permissions
2020-07-02T15:12:17.982+0300 7f332de07700 20 RGWSI_User_RADOS::read_user_info(): anonymous user
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets recalculating target
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets reading permissions
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets init op
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets verifying op mask
2020-07-02T15:12:17.982+0300 7f332de07700 20 req 4 0s s3:list_buckets required_mask= 1 user.op_mask=7
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets verifying op permissions
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets verifying op params
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets pre-executing
2020-07-02T15:12:17.982+0300 7f332de07700  2 req 4 0s s3:list_buckets executing
2020-07-02T15:12:17.982+0300 7f332de07700 20 RGWSI_User_RADOS::list_buckets(): anonymous user
2020-07-02T15:12:17.982+0300 7f3332e11700  4 write_data failed: Connection reset by peer
#                                            ^^^^^^^^^^
2020-07-02T15:12:17.983+0300 7f3331e0f700  4 write_data failed: Broken pipe
#                                            ^^^^^^^^^^
2020-07-02T15:12:17.983+0300 7f3331e0f700  2 req 4 0.001000006s s3:list_buckets completing

The third call to boost::asio::async_write(...), reached through the path below, does not return an error:

src/rgw/rgw_process.cc

process_request(...)
...
done:
  try {
    client_io->complete_request();

proceeds to
src/rgw/rgw_asio_frontend.cc

write_data()
auto bytes = boost::asio::async_write(stream, boost::asio::buffer(buf, len),
                                          yield[ec]);

Adding a remote_endpoint() check before writing to the socket, for example:

size_t write_data(const char* buf, size_t len) override {
    boost::system::error_code ec;
    // - - - -  8<  - - - -
    stream.lowest_layer().remote_endpoint(ec);
    if (ec) {
      ldout(cct, 4) << "write_data failed: " << ec.message() << dendl;
      throw rgw::io::Exception(ec.value(), std::system_category());
    }
    // - - - -  8<  - - - -
    auto bytes = boost::asio::async_write(stream, boost::asio::buffer(buf, len),
                                          yield[ec]);

will resolve the issue: the stream.lowest_layer().remote_endpoint(ec) call fails with
ec=system:107 (Transport endpoint is not connected), which write_data() then throws.
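
For readers unfamiliar with the "system:107" notation: on Linux, 107 is the errno value ENOTCONN in the system error category. The sketch below is only an illustration using the standard library, not RGW code. The helper name describe_errno is hypothetical, and both the numeric values and the message texts are platform-specific.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical helper (not part of RGW or Boost): renders an errno value as
// "<category>:<value> / <message>", mirroring the notation used in this report.
std::string describe_errno(int err) {
  const std::error_code ec(err, std::system_category());
  return std::string(ec.category().name()) + ":" +
         std::to_string(ec.value()) + " / " + ec.message();
}
```

On Linux with glibc, describe_errno(ENOTCONN) should produce the same "system:107 / Transport endpoint is not connected" text, and ECONNRESET and EPIPE correspond to the "Connection reset by peer" and "Broken pipe" messages seen in the log lines of this report.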

2020-07-02T15:45:01.086+0300 7ff183e5b700  1 ====== starting new request req=0x7ff17d64d218 =====
2020-07-02T15:45:01.086+0300 7ff183e5b700  2 req 4 0s initializing for trans_id = tx000000000000000000004-005efdd6cd-1638-default
2020-07-02T15:45:01.086+0300 7ff183e5b700 10 rgw api priority: s3=8 s3website=7
2020-07-02T15:45:01.086+0300 7ff183e5b700 20 req 4 0s get_handler handler=26RGWHandler_REST_Service_S3
2020-07-02T15:45:01.087+0300 7ff183e5b700 10 handler=26RGWHandler_REST_Service_S3
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s getting op 0
2020-07-02T15:45:01.087+0300 7ff183e5b700 10 req 4 0.001000006s s3:list_buckets scheduling with dmclock/throttler client=3 cost=1
2020-07-02T15:45:01.087+0300 7ff183e5b700 10 op=26RGWListBuckets_ObjStore_S3
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets verifying requester
2020-07-02T15:45:01.087+0300 7ff183e5b700 20 req 4 0.001000006s s3:list_buckets rgw::auth::StrategyRegistry::s3_main_strategy_t: trying rgw::auth::s3::AWSAuthStrategy
2020-07-02T15:45:01.087+0300 7ff183e5b700 20 req 4 0.001000006s s3:list_buckets rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::S3AnonymousEngine
2020-07-02T15:45:01.087+0300 7ff183e5b700 20 req 4 0.001000006s s3:list_buckets rgw::auth::s3::S3AnonymousEngine granted access
2020-07-02T15:45:01.087+0300 7ff183e5b700 20 req 4 0.001000006s s3:list_buckets rgw::auth::s3::AWSAuthStrategy granted access
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets normalizing buckets and tenants
2020-07-02T15:45:01.087+0300 7ff183e5b700 10 s->object=<NULL> s->bucket=
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets init permissions
2020-07-02T15:45:01.087+0300 7ff183e5b700 20 RGWSI_User_RADOS::read_user_info(): anonymous user
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets recalculating target
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets reading permissions
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets init op
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets verifying op mask
2020-07-02T15:45:01.087+0300 7ff183e5b700 20 req 4 0.001000006s s3:list_buckets required_mask= 1 user.op_mask=7
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets verifying op permissions
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets verifying op params
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets pre-executing
2020-07-02T15:45:01.087+0300 7ff183e5b700  2 req 4 0.001000006s s3:list_buckets executing
2020-07-02T15:45:01.087+0300 7ff183e5b700 20 RGWSI_User_RADOS::list_buckets(): anonymous user
2020-07-02T15:45:01.087+0300 7ff17ee51700  4 write_data failed: Transport endpoint is not connected
#                                            ^^^^^^^^^^
2020-07-02T15:45:01.087+0300 7ff17ee51700  4 write_data failed: Transport endpoint is not connected
#                                            ^^^^^^^^^^
2020-07-02T15:45:01.087+0300 7ff17ee51700  2 req 4 0.001000006s s3:list_buckets completing
2020-07-02T15:45:01.087+0300 7ff17ee51700  4 write_data failed: Transport endpoint is not connected
#                                            ^^^^^^^^^^
2020-07-02T15:45:01.087+0300 7ff17ee51700  0 ERROR: client_io->complete_request() returned Transport endpoint is not connected
#                                                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2020-07-02T15:45:01.087+0300 7ff17ee51700  2 req 4 0.001000006s s3:list_buckets op status=0
2020-07-02T15:45:01.087+0300 7ff17ee51700  2 req 4 0.001000006s s3:list_buckets http status=200
2020-07-02T15:45:01.087+0300 7ff17ee51700  1 ====== req done req=0x7ff17d64d218 op status=0 http_status=200 latency=0.001000006s ======
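
The check-before-write idea from the proposed change can also be sketched with plain POSIX calls, without Asio. This is a minimal illustration assuming Linux (MSG_NOSIGNAL is not available everywhere); peer_is_connected and checked_write are hypothetical names, and the sketch assumes that asking the kernel for the peer address via getpeername() is roughly what remote_endpoint(ec) amounts to on POSIX systems.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <system_error>

#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: asks the kernel for the peer address of fd.
// On failure, stores errno (for example ENOTCONN) in ec and returns false.
bool peer_is_connected(int fd, std::error_code& ec) {
  assert(fd >= 0);
  sockaddr_storage addr{};
  socklen_t len = sizeof(addr);
  if (::getpeername(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    ec.assign(errno, std::system_category());
    return false;
  }
  ec.clear();
  return true;
}

// Hypothetical wrapper: refuses to write when the peer check fails.
// Returns the number of bytes sent, or -1 with ec set.
ssize_t checked_write(int fd, const char* buf, size_t len,
                      std::error_code& ec) {
  if (!peer_is_connected(fd, ec)) {
    return -1;  // the caller sees the error before any data is queued
  }
  // MSG_NOSIGNAL (Linux) avoids SIGPIPE if the peer vanishes mid-write.
  const ssize_t n = ::send(fd, buf, len, MSG_NOSIGNAL);
  if (n < 0) {
    ec.assign(errno, std::system_category());
  }
  return n;
}
```

Calling checked_write() on a TCP socket that has no peer is expected to return -1 with ec set to ENOTCONN. Note that a check of this kind narrows the window but does not close it: the peer can still disconnect between the check and the send.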

Related issues

Copied to rgw - Backport #46518: octopus: boost::asio::async_write() does not return error when the remote endpoint is not connected New
Copied to rgw - Backport #46519: nautilus: boost::asio::async_write() does not return error when the remote endpoint is not connected New

History

#1 Updated by Casey Bodley about 1 month ago

  • Tags set to beast
  • Backport set to nautilus octopus

#2 Updated by Mark Kogan about 1 month ago

  • Pull request ID set to 35904

#3 Updated by Casey Bodley about 1 month ago

  • Status changed from In Progress to Fix Under Review

#4 Updated by Casey Bodley 26 days ago

  • Status changed from Fix Under Review to Pending Backport

#5 Updated by Nathan Cutler 26 days ago

  • Copied to Backport #46518: octopus: boost::asio::async_write() does not return error when the remote endpoint is not connected added

#6 Updated by Nathan Cutler 26 days ago

  • Copied to Backport #46519: nautilus: boost::asio::async_write() does not return error when the remote endpoint is not connected added
