Bug #4369

closed

osd: msgr connection not cleanly shut down

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
bobtail
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

i saw a hang during umount due to the FILE_BUFFER pin on a bunch of disconnected inodes. at first this looked like the fh leak fixed in wip-traceless earlier, but that fix was already applied. it didn't come up during the last run, but we should hammer this job a few times to make sure it isn't still a problem.

kernel:
  branch: testing
  kdb: true
nuke-on-error: true
machine_type: mira
overrides:
  ceph:
    conf:
      mds:
        mds inject traceless reply probability: 0.5
      client:
        debug client: 20
        debug ms: 20
        debug objectcacher: 20
        debug objecter: 20
      osd:
        osd debug op order: 1
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    branch: wip-traceless
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh

Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #4271: osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid) (Resolved, Sage Weil, 02/26/2013)

Actions #1

Updated by Ian Colle about 11 years ago

  • Assignee set to Sage Weil
Actions #2

Updated by Sage Weil about 11 years ago

  • Status changed from New to In Progress

this is actually requests not coming back from the osd, possibly/probably due to #4079?

reproducing with osd+journal logs.

Actions #3

Updated by Sage Weil about 11 years ago

now i see that the pgs aren't clean because of a msgr weirdness: a notify reply isn't delivered due to the msgr throwing out old messages. the incoming message has a low seq #, while the local connection state has a very high seq number. reproducing with debug ms = 20.
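
to illustrate the failure mode described above, here is a minimal sketch in Python. it is not Ceph's messenger code: the `ConnectionState` class, its `in_seq` field, and the `receive`/`reset` methods are hypothetical names made up for this example. the assumption is that the receiver treats any message whose seq is at or below the highest seq it has already seen as an old duplicate and discards it, so a peer that starts over from a low seq is silently ignored until the stale local state is torn down.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionState:
    """Hypothetical receiver-side connection state (not Ceph's actual class)."""
    in_seq: int = 0                      # highest sequence number seen so far
    delivered: list = field(default_factory=list)

    def receive(self, seq: int, payload: str) -> bool:
        # Assumption: anything at or below in_seq is treated as an old
        # duplicate and thrown away without being delivered.
        if seq <= self.in_seq:
            return False
        self.in_seq = seq
        self.delivered.append(payload)
        return True

    def reset(self) -> None:
        # A clean shutdown would discard the old sequence state.
        self.in_seq = 0


def demo():
    # Stale state left over from an earlier session with a very high seq.
    stale = ConnectionState(in_seq=5000)
    dropped = not stale.receive(1, "notify reply")

    # Same starting point, but the connection state is reset first.
    clean = ConnectionState(in_seq=5000)
    clean.reset()
    accepted = clean.receive(1, "notify reply")
    return dropped, accepted


if __name__ == "__main__":
    print(demo())
```

in this sketch the stale connection drops the low-seq reply, while the reset one delivers it, which matches the symptom of a notify reply never arriving.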

Actions #4

Updated by Sage Weil about 11 years ago

suspicious of commit 0f42eddef5da6c1babe9ed51ceaa3212a42c2ec4 for #4271 ...

Actions #5

Updated by Sage Weil about 11 years ago

  • Subject changed from ceph-fuse: hang on shutdown after ffsb to osd: msgr connection not cleanly shut down
  • Category set to OSD
  • Backport set to bobtail

wip-osd-map

now passes my test!

Actions #6

Updated by Sage Weil about 11 years ago

  • Status changed from In Progress to Fix Under Review
Actions #7

Updated by Sage Weil about 11 years ago

  • Status changed from Fix Under Review to Resolved