Bug #2802

closed

msgr: mds session hangs on direct_io test

Added by Sage Weil almost 12 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description


2012-07-19 15:20:54.125864 7f57acfe2700  0 -- 10.214.133.32:6809/4094 >> 10.214.133.34:0/508443620 pipe(0x1ed0b40 sd=24 pgs=58 cs=23 l=0).injecting socket failure
2012-07-19 15:20:54.125932 7f57acfe2700  0 -- 10.214.133.32:6809/4094 >> 10.214.133.34:0/508443620 pipe(0x1ed0b40 sd=24 pgs=58 cs=23 l=0).fault initiating reconnect
2012-07-19 15:21:58.826692 7f57afcf3700  0 mds.0.1 ms_handle_reset on 10.214.133.32:6800/4084
2012-07-19 15:21:58.827353 7f57afcf3700  0 mds.0.1 ms_handle_connect on 10.214.133.32:6800/4084
2012-07-19 15:25:58.753929 7f57ae3ef700  0 log [INF] : closing stale session client.4120 10.214.133.34:0/508443620 after 304.610037

Seen on ceph sha1 6e064446b538530240fa84bac9c686d89b1bdaf7.

ubuntu@teuthology:/a/sage-2012-07-19_15:03:51-regression-wip-msgr-cleanup-testing-basic/14343$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 14240f8208136dbbe7e825caedc0104806027aae
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 6e064446b538530240fa84bac9c686d89b1bdaf7
  workunit:
    sha1: 6e064446b538530240fa84bac9c686d89b1bdaf7
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana80.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCv9ah6KrNde/qjaxGaNCBWMlzzprTwkPmQetLrGhY+MsTz9VN6CRBegYq94TcOjE6A+w3swGNkuPzgG+meNGKV3fq+rggCEDJq+lPPgvM2dU86d0eoPr3m4BYgHcEdxcIzkaZ6f+F+vC956Lc1a1DGRyz6ygAuD8HJh/4FKhGe+PaaQ2EtvrwY8zEj+i9PARj9PHttLYLgsaOrdhszJkXBPGHGSJ7jaBI/AViv12WmYN9WpXWE+khICRpqfm2PIhzKZOgrRBMUuH8YpHTh8hUH5HjBtC2kNS6FPowrNhJU4luFQWV5un85CDRHthfGW+e8/ZelH56NfS1oOxnRM7jL
  ubuntu@plana82.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKbuUXtRWySmQ24tUH6g2Y3MU0N5+VTNSxHzX5QhVnLUY2djFFUHUhivJGutTfIOdIpw+jQP2fwQoryqC/oqSKyE2fwunP+M1ZEobQILQFthKeUKpDH6rC1gnaJI6CU3k0voVUTQpHUyKLApO5A4knXLfzmPJeE4lbi8Tds9Kx1TtID4+R8Z0UgJgfu8LVsMw8225hvuE0lOG+2n3Ms97uLh9MIxuELm4HAerp2etlVQVeE99udYeST+LMhrltxFRdKD8WUiuycIrzaT9D446ELfN+Vlugh6jSwbJycCzeqfEcPBEKNd4A9oPBThR7T0iHKRNGXHv7QhwpEgPUTT8f
  ubuntu@plana84.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDPXDuoRV49Ph3aE0EZpWxR2CoBoNtfwJ1ZeXaq6R+IAR9Daehj5J0En+0aGVuwwFvsrxlUjZYU1XO9jYHjkqemw/YArcwdiCfnX9au/qS7NdzjSRLzgcHiEAV47H/q6EZuQVew3ppuueM3GZwgpf9yuDFCWy85+MoKGCLAmVqb/Q836SKgQFkAR3gWEXyDaXyZzK97np7pNCJfh3uuPEOmKSSYeqHRE/lkv3DLTvyHJdymuJ4k3RBy4jy/t2M9Hh8IhcFydhQM/IiwOFAatDRX972hwrtu1JTO/N5qTVc+Don5um0cLk386iTCDlo6fIdM7ohc19NTEKvdgJEvEF77
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - direct_io

Actions #1

Updated by Sage Weil almost 12 years ago

  • Subject changed from msgr: initiating reconnect on server side to msgr: mds session hangs on direct_io test
  • Status changed from New to In Progress

The "initiating reconnect" line on the server side was just a misleading message; that has been cleaned up. The direct_io test (which leaves the MDS connection mostly idle) still fails on the MDS channel, though.
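
For reference, the timeline in the description's log can be checked from the timestamps alone. A minimal sketch, assuming the "after 304.610037" figure in the stale-session message is the number of seconds since the MDS last heard from the client (that reading is an assumption, not something confirmed in this ticket). The dictionary keys and the helper name are made up for illustration.

```python
from datetime import datetime

# Timestamps copied verbatim from the log lines in the description.
FMT = "%Y-%m-%d %H:%M:%S.%f"
events = {
    "fault": "2012-07-19 15:20:54.125932",        # "fault initiating reconnect"
    "reset": "2012-07-19 15:21:58.826692",        # "ms_handle_reset"
    "stale_close": "2012-07-19 15:25:58.753929",  # "closing stale session"
}

# Value reported in the "closing stale session ... after 304.610037" line.
# ASSUMPTION: this is seconds since the session was last heard from.
reported_idle = 304.610037


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two log timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


fault_to_reset = seconds_between(events["fault"], events["reset"])
fault_to_close = seconds_between(events["fault"], events["stale_close"])

# Under the assumption above, this is how long before the injected
# fault the client session was last heard from.
last_seen_before_fault = fault_to_close - reported_idle

print(f"fault -> ms_handle_reset:     {fault_to_reset:.6f} s")
print(f"fault -> stale session close: {fault_to_close:.6f} s")
print(f"last seen before fault:       {last_seen_before_fault:.6f} s")
```

If that reading of the log is right, the session was last heard from only a few milliseconds before the injected socket failure and never again afterwards, which fits a mostly idle MDS connection that does not recover after the fault.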

Actions #2

Updated by Sage Weil almost 12 years ago

  • Status changed from In Progress to Resolved

Actions #3

Updated by Greg Farnum about 5 years ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)