Project

General

Profile

Actions

Feature #15308

closed

LIBRADOS modify Pipe::connect() to return the error code

Added by Vikhyat Umrao about 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Support
Tags:
Backport:
jewel
Reviewed:
Affected Versions:
Pull request ID:

Description

LIBRADOS modify Pipe::connect() to return the error code

Steps to Reproduce:

1. create a qemu instance
2. reduce the fd limits with prlimit command
3. try to attach rbd image to this qemu instance
4. ceph client side log file will have logs as given below :

7f0b769e7700 1 - 192.168.128.30:0/2021513 >> 192.168.128.35:6800/24374 pipe(0x7f0bcabc0000 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0bc55e1ce0).connect couldn't created socket (24) Too many open files

5. But qemu logs will not log anything regarding this error as rados_connect call from qemu wont have proper return code from librados Pipe::connect().

Ceph source file : src/msg/simple/Pipe.cc

 870 int Pipe::connect()
 871 {
 872   bool got_bad_auth = false;
 873 
 874   ldout(msgr->cct,10) << "connect " << connect_seq << dendl;
 875   assert(pipe_lock.is_locked());

....................
.....................

 901   // create socket?
 902   sd = ::socket(peer_addr.get_family(), SOCK_STREAM, 0);
 903   if (sd < 0) {
 904     lderr(msgr->cct) << "connect couldn't created socket " << cpp_strerror(errno) << dendl;
 905     goto fail;
 906   }

........................
.........................

1215  fail:
1216   if (conf->ms_inject_internal_delays) {
1217     ldout(msgr->cct, 10) << " sleep for " << msgr->cct->_conf->ms_inject_internal_delays << dendl;
1218     utime_t t;
1219     t.set_from_double(msgr->cct->_conf->ms_inject_internal_delays);
1220     t.sleep();
1221   }
1222 
1223   pipe_lock.Lock();
1224  fail_locked:
1225   if (state == STATE_CONNECTING)
1226     fault();
1227   else
1228     ldout(msgr->cct,3) << "connect fault, but state = " << get_state_name()
1229                        << " != connecting, stopping" << dendl;
1230 
1231  stop_locked:
1232   delete authorizer;
1233   return -1;
1234 }

- "connect couldn't created socket (24) Too many open files"
- This error is coming from this function socket() from Pipe::connect() function.
- Here we are going to fail label but if we check fail we returning "-1" but we need proper error handling in this function.


Subtasks 1 (1 open0 closed)

Feature #16310: Take pipe::connect() returned errno to rados_connect() Part2 of http://tracker.ceph.com/issues/15308NewBrad Hubbard06/15/2016

Actions

Related issues 2 (0 open2 closed)

Has duplicate Ceph - Bug #17573: librados doesn't properly report failure to create socket DuplicateJosh Durgin10/13/2016

Actions
Copied to Ceph - Backport #17377: jewel: LIBRADOS modify Pipe::connect() to return the error codeResolvedNathan CutlerActions
Actions #2

Updated by Vikhyat Umrao about 8 years ago

  • Description updated (diff)
Actions #3

Updated by Vikhyat Umrao about 8 years ago

  • Source changed from other to Support
Actions #4

Updated by Vikhyat Umrao about 8 years ago

  • Assignee deleted (Vikhyat Umrao)
Actions #5

Updated by Vikhyat Umrao about 8 years ago

Does async messenger have this problem?

async has this problem as well, see AsyncConnection::_process_connection(), and search for "sd = net.nonblock_connect(get_peer_addr());".

Probably specifying a function-local error code and making sure that gets set everywhere before jumping through to the failure handling.

yeah, we need to save the errno somewhere and assign it to rval of the context of the request.

Actions #6

Updated by Vikhyat Umrao about 8 years ago

Thanks Kefu for your inputs and nice discussion.

- Now we need to fix both the messengers : simple messenger and async messenger connect function definitions.

- Then I and Kefu discussed in detail more about it and we concluded we need to fix all the caller functions of these functions with proper return codes.

- I checked parent AsyncConnection::_process_connection() parent functions and they look good.

Like :_process_connection() calls -> nonblock_connect() calls ->   generic_connect() -> and generic connect definition is proper.
int NetHandler::generic_connect(const entity_addr_t& addr, bool nonblock)
{
  int ret;
  int s = create_socket(addr.get_family());
  if (s < 0)
    return s;

  if (nonblock) {
    ret = set_nonblock(s);
    if (ret < 0) {
      close(s);
      return ret;
    }
  }

  set_socket_options(s);

  ret = ::connect(s, (sockaddr*)&addr.addr, addr.addr_size());
  if (ret < 0) {
    if (errno == EINPROGRESS && nonblock)
      return s;

    ldout(cct, 10) << __func__ << " connect: " << strerror(errno) << dendl;
    close(s);
    return -errno;
  }

  return s;
}
Actions #7

Updated by Vikhyat Umrao about 8 years ago

Hello Josh,

Thanks for your help and nice discussion.

As discussed I have sent first part of pipe::connect() fix : https://github.com/ceph/ceph/pull/8416

Will work on qemu patch for fixing error number function fix.

Actions #9

Updated by Sage Weil almost 8 years ago

  • Status changed from New to Resolved
Actions #10

Updated by Samuel Just over 7 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to jewel
Actions #11

Updated by Nathan Cutler over 7 years ago

  • Copied to Backport #17377: jewel: LIBRADOS modify Pipe::connect() to return the error code added
Actions #12

Updated by Nathan Cutler over 7 years ago

  • Status changed from Pending Backport to Resolved
Actions #13

Updated by Josh Durgin about 7 years ago

  • Has duplicate Bug #17573: librados doesn't properly report failure to create socket added
Actions

Also available in: Atom PDF