Project

General

Profile

Feature #15308

LIBRADOS modify Pipe::connect() to return the error code

Added by Vikhyat Umrao 10 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
06/15/2016
Due date:
% Done:

0%

Source:
Support
Tags:
Backport:
jewel
Reviewed:
User Impact:
Affected Versions:
Release:
Needs Doc:
No

Description

LIBRADOS modify Pipe::connect() to return the error code

Steps to Reproduce:

1. create a qemu instance
2. reduce the fd limits with prlimit command
3. try to attach rbd image to this qemu instance
4. ceph client side log file will have logs as given below :

7f0b769e7700 1 - 192.168.128.30:0/2021513 >> 192.168.128.35:6800/24374 pipe(0x7f0bcabc0000 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0bc55e1ce0).connect couldn't created socket (24) Too many open files

5. But qemu logs will not log anything regarding this error as rados_connect call from qemu wont have proper return code from librados Pipe::connect().

Ceph source file : src/msg/simple/Pipe.cc

 870 int Pipe::connect()
 871 {
 872   bool got_bad_auth = false;
 873 
 874   ldout(msgr->cct,10) << "connect " << connect_seq << dendl;
 875   assert(pipe_lock.is_locked());

....................
.....................

 901   // create socket?
 902   sd = ::socket(peer_addr.get_family(), SOCK_STREAM, 0);
 903   if (sd < 0) {
 904     lderr(msgr->cct) << "connect couldn't created socket " << cpp_strerror(errno) << dendl;
 905     goto fail;
 906   }

........................
.........................

1215  fail:
1216   if (conf->ms_inject_internal_delays) {
1217     ldout(msgr->cct, 10) << " sleep for " << msgr->cct->_conf->ms_inject_internal_delays << dendl;
1218     utime_t t;
1219     t.set_from_double(msgr->cct->_conf->ms_inject_internal_delays);
1220     t.sleep();
1221   }
1222 
1223   pipe_lock.Lock();
1224  fail_locked:
1225   if (state == STATE_CONNECTING)
1226     fault();
1227   else
1228     ldout(msgr->cct,3) << "connect fault, but state = " << get_state_name()
1229                        << " != connecting, stopping" << dendl;
1230 
1231  stop_locked:
1232   delete authorizer;
1233   return -1;
1234 }

- "connect couldn't created socket (24) Too many open files"
- This error is coming from this function socket() from Pipe::connect() function.
- Here we are going to fail label but if we check fail we returning "-1" but we need proper error handling in this function.


Subtasks

Feature #16310: Take pipe::connect() returned errno to rados_connect() Part2 of http://tracker.ceph.com/issues/15308NewBrad Hubbard


Related issues

Copied to Backport #17377: jewel: LIBRADOS modify Pipe::connect() to return the error code Resolved

History

#2 Updated by Vikhyat Umrao 10 months ago

  • Description updated (diff)

#3 Updated by Vikhyat Umrao 10 months ago

  • Source changed from other to Support

#4 Updated by Vikhyat Umrao 10 months ago

  • Assignee deleted (Vikhyat Umrao)

#5 Updated by Vikhyat Umrao 10 months ago

Does async messenger have this problem?

async has this problem as well, see AsyncConnection::_process_connection(), and search for "sd = net.nonblock_connect(get_peer_addr());".

Probably specifying a function-local error code and making sure that gets set everywhere before jumping through to the failure handling.

yeah, we need to save the errno somewhere and assign it to rval of the context of the request.

#6 Updated by Vikhyat Umrao 10 months ago

Thanks Kefu for your inputs and nice discussion.

- Now we need to fix both the messengers : simple messenger and async messenger connect function definitions.

- Then I and Kefu discussed in detail more about it and we concluded we need to fix all the caller functions of these functions with proper return codes.

- I checked parent AsyncConnection::_process_connection() parent functions and they look good.

Like :_process_connection() calls -> nonblock_connect() calls ->   generic_connect() -> and generic connect definition is proper.
int NetHandler::generic_connect(const entity_addr_t& addr, bool nonblock)
{
  int ret;
  int s = create_socket(addr.get_family());
  if (s < 0)
    return s;

  if (nonblock) {
    ret = set_nonblock(s);
    if (ret < 0) {
      close(s);
      return ret;
    }
  }

  set_socket_options(s);

  ret = ::connect(s, (sockaddr*)&addr.addr, addr.addr_size());
  if (ret < 0) {
    if (errno == EINPROGRESS && nonblock)
      return s;

    ldout(cct, 10) << __func__ << " connect: " << strerror(errno) << dendl;
    close(s);
    return -errno;
  }

  return s;
}

#7 Updated by Vikhyat Umrao 10 months ago

Hello Josh,

Thanks for your help and nice discussion.

As discussed I have sent first part of pipe::connect() fix : https://github.com/ceph/ceph/pull/8416

Will work on qemu patch for fixing error number function fix.

#9 Updated by Sage Weil 9 months ago

  • Status changed from New to Resolved

#10 Updated by Samuel Just 4 months ago

  • Status changed from Resolved to Pending Backport
  • Backport set to jewel

#11 Updated by Nathan Cutler 4 months ago

  • Copied to Backport #17377: jewel: LIBRADOS modify Pipe::connect() to return the error code added

#12 Updated by Nathan Cutler 3 months ago

  • Status changed from Pending Backport to Resolved
  • Needs Doc set to No

Also available in: Atom PDF