Bug #10011

closed

Journaler: failed on shutdown or EBLACKLISTED

Added by Greg Farnum over 9 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: Q/A
Severity: 3 - minor
Component(FS): MDS

Description

http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_23:08:01-kcephfs-giant-testing-basic-multi/585648/

teuthology.log:

2014-11-04T13:33:30.518 INFO:teuthology.misc:Shutting down mds daemons...
2014-11-04T13:33:36.467 INFO:tasks.ceph.mds.a:Stopped
2014-11-04T13:33:37.026 ERROR:teuthology.misc:Saw exception from mds.a-s
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 1084, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/daemon.py", line 45, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 387, in wait
    proc.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 105, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on burnupi25 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f -i a-s'

The MDS log is pretty boring (note that despite the "-s" in its name, mds.a-s was not a standby at the end):

2014-11-04 12:54:01.477223 7f8d4009f700  0 mds.beacon.a-s handle_mds_beacon no longer laggy
2014-11-04 12:54:04.035561 7f8d3d89a700 -1 mds.0.journaler(rw) _finish_flush got (108) Cannot send after transport endpoint shutdown
2014-11-04 12:54:04.035643 7f8d3d89a700 -1 mds.0.journaler(rw) handle_write_error (108) Cannot send after transport endpoint shutdown
2014-11-04 12:54:04.035678 7f8d3d89a700 -1 mds.0.journaler(rw) _finish_write_head got (108) Cannot send after transport endpoint shutdown
2014-11-04 12:54:04.035687 7f8d3d89a700 -1 mds.0.journaler(rw) handle_write_error (108) Cannot send after transport endpoint shutdown
2014-11-04 12:54:04.056040 7f8d3d89a700 -1 osdc/Journaler.cc: In function 'void Journaler::handle_write_error(int)' thread 7f8d3d89a700 time 2014-11-04 12:54:04.035711
osdc/Journaler.cc: 1180: FAILED assert(0 == "unhandled write error")
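
To make the error sequence above easier to read, here is a minimal Python sketch of a log-triage helper. The function name `decode_journaler_errors` and its regex are hypothetical (not part of Ceph or teuthology), and the pattern assumes only the journaler line format quoted above. It decodes the errno: on Linux, 108 is ESHUTDOWN, and Ceph defines EBLACKLISTED as an alias for ESHUTDOWN, so the number alone does not say whether the MDS was blacklisted or its client was shut down. That ambiguity appears to be why the title names both.

```python
import errno
import re

# Hypothetical pattern, derived only from the journaler lines quoted above, e.g.
# "... 7f8d3d89a700 -1 mds.0.journaler(rw) _finish_flush got (108) Cannot send ..."
JOURNALER_ERR = re.compile(
    r"^(?P<stamp>\S+ \S+) \S+ +-1 mds\.\d+\.journaler\(\w+\) "
    r"(?P<func>\w+) (?:got )?\((?P<code>\d+)\) (?P<msg>.*)$"
)


def decode_journaler_errors(lines):
    """Yield (func, code, errno_name, message) for each journaler error line.

    Hypothetical triage helper; lines that do not match the pattern
    (beacon messages, the assert banner, etc.) are skipped.
    """
    for line in lines:
        m = JOURNALER_ERR.match(line)
        if m is None:
            continue
        code = int(m.group("code"))
        # errno.errorcode maps the number to this platform's symbolic name;
        # on Linux, 108 is ESHUTDOWN.
        name = errno.errorcode.get(code, "UNKNOWN")
        yield m.group("func"), code, name, m.group("msg")


if __name__ == "__main__":
    sample = [
        "2014-11-04 12:54:04.035561 7f8d3d89a700 -1 mds.0.journaler(rw) "
        "_finish_flush got (108) Cannot send after transport endpoint shutdown",
        "2014-11-04 12:54:04.035643 7f8d3d89a700 -1 mds.0.journaler(rw) "
        "handle_write_error (108) Cannot send after transport endpoint shutdown",
    ]
    for func, code, name, msg in decode_journaler_errors(sample):
        print(f"{func}: {code} ({name}) {msg}")
```

Fed the four journaler lines above, it yields `_finish_flush`, `handle_write_error`, `_finish_write_head`, `handle_write_error`, each with code 108, i.e. both the flush and the head write failed the same way before `handle_write_error` hit its assert.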

This doesn't look like the MDS was trying to shut down, so I'm not sure exactly what happened. We probably want to check whether the OSDs were explicitly blacklisting it for some reason, whether they had been shut down, or something else.
