Bug #4369: osd: msgr connection not cleanly shut down - Ceph - Ceph

Bug #4369

closed

osd: msgr connection not cleanly shut down

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Urgent
Assignee: Sage Weil
Category: OSD
Target version:
% Done:
Source: Development
Tags:
Backport: bobtail
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

i saw a hang during umount due to the FILE_BUFFER pin on a bunch of disconnected inodes. at first this looked like the fh leak fixed in wip-traceless earlier, but that fix was already applied. it didn't come up during the last run, but we should hammer this job a few times to make sure it isn't still a problem.

kernel:
  branch: testing
  kdb: true
nuke-on-error: true
machine_type: mira
overrides:
  ceph:
    conf:
      mds:
        mds inject traceless reply probability: 0.5
      client:
        debug client: 20
        debug ms: 20
        debug objectcacher: 20
        debug objecter: 20
      osd:
        osd debug op order: 1
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    branch: wip-traceless
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh

Related issues 1 (0 open — 1 closed)

Updated by Ian Colle about 11 years ago

Assignee set to Sage Weil

Updated by Sage Weil about 11 years ago

Status changed from New to In Progress

this is actually caused by requests not coming back from the osd, possibly/probably due to #4079.

reproducing with osd+journal logs.

Updated by Sage Weil about 11 years ago

now i see that the pgs aren't clean because of a msgr weirdness: a notify reply isn't delivered due to the msgr throwing out old messages. the incoming message has a low seq #, but the local connection state has a very high seq number. reproducing with debug ms = 20.
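
to illustrate the symptom described above, here is a toy model of a receiver that drops any message whose seq is not higher than what its local connection state has already seen. this is only a sketch: the Connection class, receive(), reset() and the seq value 5000 are hypothetical names and numbers chosen for illustration, and the real msgr logic is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Toy receiver-side connection state (hypothetical, not Ceph's actual code)."""
    in_seq: int = 0  # highest sequence number accepted so far
    delivered: list = field(default_factory=list)

    def receive(self, seq: int, payload: str) -> bool:
        """Deliver the message unless its seq looks like an old/duplicate one."""
        if seq <= self.in_seq:
            # Looks like something already seen: dropped without delivery.
            return False
        self.in_seq = seq
        self.delivered.append(payload)
        return True

    def reset(self) -> None:
        """In this model, a clean shutdown forgets the old counter."""
        self.in_seq = 0


# Stale state: the old session reached a high seq and was never cleaned up.
stale = Connection(in_seq=5000)
# The peer starts over and numbers its messages from 1 again.
dropped = not stale.receive(1, "notify reply")

# Same message after the local state has been reset.
clean = Connection(in_seq=5000)
clean.reset()
accepted = clean.receive(1, "notify reply")

print(dropped, accepted)
```

in this model the stale connection drops the notify reply because 1 <= 5000, while the reset connection delivers it. that matches the low incoming seq vs. high local seq pattern noted above, under the assumption that the receiver compares against a counter left over from an earlier session.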

Updated by Sage Weil about 11 years ago

suspicious of commit 0f42eddef5da6c1babe9ed51ceaa3212a42c2ec4 for #4271 ...

Updated by Sage Weil about 11 years ago

Subject changed from ceph-fuse: hang on shutdown after ffsb to osd: msgr connection not cleanly shut down
Category set to OSD
Backport set to bobtail

wip-osd-map

now passes my test!
