Bug #8939

closed

stalled LibRadosTwoPoolsPP.TryFlushReadRace; client failed to reconnect?

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Duplicate
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It appears the OSD was behaving properly, but things stalled because one of the stat replies got dropped:

2014-07-26 01:13:59.903140 7febbc03c700  1 -- 10.214.132.10:6814/28297 --> 10.214.133.37:0/3025740 -- osd_op_reply(1309 foo [stat] v0'0 uv174 ondisk = 0) v6 -- ?+0 0x1de6000 con 0x2524dc0
2014-07-26 01:13:59.903151 7febbc03c700  0 -- 10.214.132.10:6814/28297 submit_message osd_op_reply(1309 foo [stat] v0'0 uv174 ondisk = 0) v6 remote, 10.214.133.37:0/3025740, failed lossy con, dropping message 0x1de6000

and then the client never reconnected. Unfortunately, there are only limited client-side logs:

2014-07-26 01:10:27.960542 7f3b4f40e780 -1 asok(0xded690) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.25740.asok': (17) File exists
2014-07-26 01:10:39.209225 7f3b4f40e780 -1 asok(0xe2a010) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.25740.asok': (17) File exists
2014-07-26 01:10:49.929050 7f3b4f40e780 -1 asok(0xded690) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.25740.asok': (17) File exists
2014-07-26 01:10:58.051348 7f3b487c5700  0 -- 10.214.133.37:0/3025740 >> 10.214.133.37:6789/0 pipe(0xe0a0e0 sd=8 :33838 s=2 pgs=128 cs=1 l=1 c=0xe0a370).injecting socket failure
2014-07-26 01:10:58.053691 7f3b48fc6700  0 monclient: hunting for new mon
2014-07-26 01:10:58.928719 7f3b485c3700  0 -- 10.214.133.37:0/3025740 >> 10.214.132.10:6814/28297 pipe(0xe026e0 sd=9 :52691 s=2 pgs=3 cs=1 l=1 c=0xe02970).injecting socket failure
2014-07-26 01:11:22.993299 7f3b4f406700  0 -- 10.214.133.37:0/3025740 >> 10.214.133.37:6810/18762 pipe(0xe0e9a0 sd=11 :58753 s=2 pgs=29 cs=1 l=1 c=0xe28e90).injecting socket failure
2014-07-26 01:11:32.131231 7f3b406f7700  0 -- 10.214.133.37:0/3025740 >> 10.214.133.37:6800/24538 pipe(0x7f3b34003270 sd=12 :44060 s=2 pgs=8 cs=1 l=1 c=0x7f3b34002420).injecting socket failure
2014-07-26 01:13:14.313710 7f3b4f406700  0 -- 10.214.133.37:0/3025740 >> 10.214.133.37:6810/18762 pipe(0x7f3b2800ab20 sd=11 :58788 s=2 pgs=32 cs=1 l=1 c=0x7f3b28003350).injecting socket failure
2014-07-26 01:13:59.900747 7f3b485c3700  0 -- 10.214.133.37:0/3025740 >> 10.214.132.10:6814/28297 pipe(0x7f3b2800c750 sd=9 :52737 s=2 pgs=8 cs=1 l=1 c=0x7f3b28002320).injecting socket failure

ubun@teuthology:/var/lib/teuthworker/archive/sage-2014-07-25_22:40:14-rados-wip-sage-testing-testing-basic-plana/378419
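
The timing relationship between the two logs can be checked mechanically. The following is a minimal sketch, not a Ceph tool: it assumes the log line shapes seen in the excerpts above, and the regular expressions, function names, and the one-second window are all hypothetical choices made for illustration. It pairs each reply dropped on a failed lossy connection with a client-side injected socket failure on the same client/OSD address pair.

```python
import re
from collections import namedtuple
from datetime import datetime

# Sample lines copied from the excerpts in this report.
OSD_LOG = """\
2014-07-26 01:13:59.903151 7febbc03c700  0 -- 10.214.132.10:6814/28297 submit_message osd_op_reply(1309 foo [stat] v0'0 uv174 ondisk = 0) v6 remote, 10.214.133.37:0/3025740, failed lossy con, dropping message 0x1de6000
"""
CLIENT_LOG = """\
2014-07-26 01:10:58.928719 7f3b485c3700  0 -- 10.214.133.37:0/3025740 >> 10.214.132.10:6814/28297 pipe(0xe026e0 sd=9 :52691 s=2 pgs=3 cs=1 l=1 c=0xe02970).injecting socket failure
2014-07-26 01:13:59.900747 7f3b485c3700  0 -- 10.214.133.37:0/3025740 >> 10.214.132.10:6814/28297 pipe(0x7f3b2800c750 sd=9 :52737 s=2 pgs=8 cs=1 l=1 c=0x7f3b28002320).injecting socket failure
"""

# Assumed patterns, derived only from the lines quoted above.
DROP_RE = re.compile(
    r"-- (\S+) submit_message osd_op_reply\((\d+) .*"
    r"remote, (\S+), failed lossy con, dropping message"
)
INJECT_RE = re.compile(r"-- (\S+) >> (\S+) pipe\(.*\)\.injecting socket failure")

Drop = namedtuple("Drop", "ts osd client tid")
Injection = namedtuple("Injection", "ts client osd")


def parse_ts(line):
    """Parse the leading 26-character timestamp of a log line."""
    return datetime.strptime(line[:26], "%Y-%m-%d %H:%M:%S.%f")


def find_drops(lines):
    """Return replies the OSD dropped because the lossy connection had failed."""
    drops = []
    for line in lines:
        m = DROP_RE.search(line)
        if m:
            drops.append(Drop(parse_ts(line), m.group(1), m.group(3), int(m.group(2))))
    return drops


def find_injections(lines):
    """Return socket failures injected on the client side."""
    found = []
    for line in lines:
        m = INJECT_RE.search(line)
        if m:
            found.append(Injection(parse_ts(line), m.group(1), m.group(2)))
    return found


def correlate(drops, injections, window_s=1.0):
    """Pair each drop with injections on the same client/OSD pair within window_s."""
    pairs = []
    for d in drops:
        for i in injections:
            same_pair = (i.client, i.osd) == (d.client, d.osd)
            if same_pair and abs((d.ts - i.ts).total_seconds()) <= window_s:
                pairs.append((d, i))
    return pairs


drops = find_drops(OSD_LOG.splitlines())
injections = find_injections(CLIENT_LOG.splitlines())
pairs = correlate(drops, injections)
for d, i in pairs:
    delta = (d.ts - i.ts).total_seconds()
    print(f"tid {d.tid}: dropped {delta:.6f}s after injected failure on {i.client}")
```

On the sample lines, the only pairing is the drop of tid 1309 with the injected failure at 01:13:59.900747. A match like this only shows that the two events are close in time on the same connection; it does not explain why the client failed to reconnect and resend.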


Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #8891: rados bench hang during thrashing | Resolved | Sage Weil | 07/21/2014

Actions #1

Updated by Sage Weil over 9 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-07-25_22:40:14-rados-wip-sage-testing-testing-basic-plana/378563 (same test!)

A different test with the same stall:

ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-07-25_22:40:14-rados-wip-sage-testing-testing-basic-plana/378579

Actions #2

Updated by Sage Weil over 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil
Actions #3

Updated by Sage Weil over 9 years ago

  • Status changed from In Progress to Duplicate