Bug #10011: Journaler: failed on shutdown or EBLACKLISTED - CephFS - Ceph

Actions

Copy link

Bug #10011

closed

Journaler: failed on shutdown or EBLACKLISTED

Added by Greg Farnum over 9 years ago. Updated almost 8 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

John Spray

Category:

Target version:

% Done:

Source:

Q/A

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

MDS

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-03_23:08:01-kcephfs-giant-testing-basic-multi/585648/

teuthology.log:

2014-11-04T13:33:30.518 INFO:teuthology.misc:Shutting down mds daemons...
2014-11-04T13:33:36.467 INFO:tasks.ceph.mds.a:Stopped
2014-11-04T13:33:37.026 ERROR:teuthology.misc:Saw exception from mds.a-s
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/misc.py", line 1084, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/daemon.py", line 45, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 387, in wait
    proc.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 105, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on burnupi25 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f -i a-s'

The main log is pretty boring (note that despite the name it was not a standby at the end):

2014-11-04 12:54:01.477223 7f8d4009f700  0 mds.beacon.a-s handle_mds_beacon no longer laggy
2014-11-04 12:54:04.035561 7f8d3d89a700 -1 mds.0.journaler(rw) _finish_flush got (108) Cannot send after transport endpoint shutdown
2014-11-04 12:54:04.035643 7f8d3d89a700 -1 mds.0.journaler(rw) handle_write_error (108) Cannot send after transport endpoint shutdown
2014-11-04 12:54:04.035678 7f8d3d89a700 -1 mds.0.journaler(rw) _finish_write_head got (108) Cannot send after transport endpoint shutdown
2014-11-04 12:54:04.035687 7f8d3d89a700 -1 mds.0.journaler(rw) handle_write_error (108) Cannot send after transport endpoint shutdown
2014-11-04 12:54:04.056040 7f8d3d89a700 -1 osdc/Journaler.cc: In function 'void Journaler::handle_write_error(int)' thread 7f8d3d89a700 time 2014-11-04 12:54:04.035711
osdc/Journaler.cc: 1180: FAILED assert(0 == "unhandled write error")

This doesn't look like the MDS was trying to shut down, so I'm not sure exactly what happened. Probably want to look and see if the OSDs were explicitly blacklisting it for some reason, or if they got shut down, or what.

Actions

Copy link

Updated by John Spray over 9 years ago

mon.a says:

2014-11-04 12:53:56.430421 7f096d00b700 10 mon.a@1(leader).mds e18 no beacon from 6220 10.214.134.2:6811/37422 mds.0.3 up:active since 2014-11-04 12:53:39.506502
2014-11-04 12:53:56.430433 7f096d00b700 10 mon.a@1(leader).mds e18  replacing 6220 10.214.134.2:6811/37422 mds.0.3 up:active with 7525/a 10.214.136.134:6811/29541
2014-11-04 12:54:36.433472 7f096d00b700 10 mon.a@1(leader).mds e18 no beacon from 6220 10.214.134.2:6811/37422 mds.0.3 up:active since 2014-11-04 12:54:18.531904
2014-11-04 12:54:36.433487 7f096d00b700 10 mon.a@1(leader).mds e18  replacing 6220 10.214.134.2:6811/37422 mds.0.3 up:active with 7525/a 10.214.136.134:6811/29541

The second message is a bit weird, because the 'since' time isn't a time that we actually received a beacon at. Also, why are we going through the same replacement twice?

2014-11-04 12:53:20.869914 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(6220/a-s up:active seq 210 v18) v3 from mds.0 10.214.134.2:6811/37422
2014-11-04 12:53:20.870211 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(7525/a up:standby seq 50 v18) v3 from mds.? 10.214.136.134:6811/29541
2014-11-04 12:53:20.870433 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(7525/a up:standby seq 51 v18) v3 from mds.? 10.214.136.134:6811/29541
2014-11-04 12:53:20.871085 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(6220/a-s up:active seq 211 v18) v3 from mds.0 10.214.134.2:6811/37422
2014-11-04 12:53:21.424787 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(7525/a up:standby seq 52 v18) v3 from mds.? 10.214.136.134:6811/29541
2014-11-04 12:53:23.506505 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(6220/a-s up:active seq 212 v18) v3 from mds.0 10.214.134.2:6811/37422
2014-11-04 12:53:25.424944 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(7525/a up:standby seq 53 v18) v3 from mds.? 10.214.136.134:6811/29541
2014-11-04 12:53:27.506369 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(6220/a-s up:active seq 213 v18) v3 from mds.0 10.214.134.2:6811/37422
2014-11-04 12:53:29.425082 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(7525/a up:standby seq 54 v18) v3 from mds.? 10.214.136.134:6811/29541
2014-11-04 12:53:31.506457 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(6220/a-s up:active seq 214 v18) v3 from mds.0 10.214.134.2:6811/37422
2014-11-04 12:53:33.425300 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(7525/a up:standby seq 55 v18) v3 from mds.? 10.214.136.134:6811/29541
2014-11-04 12:53:35.506452 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(6220/a-s up:active seq 215 v18) v3 from mds.0 10.214.134.2:6811/37422
2014-11-04 12:53:37.425595 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(7525/a up:standby seq 56 v18) v3 from mds.? 10.214.136.134:6811/29541
2014-11-04 12:53:39.506443 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(6220/a-s up:active seq 216 v18) v3 from mds.0 10.214.134.2:6811/37422
2014-11-04 12:53:41.425912 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(7525/a up:standby seq 57 v18) v3 from mds.? 10.214.136.134:6811/29541
2014-11-04 12:54:18.538614 7f096c80a700 10 mon.a@1(leader).paxosservice(mdsmap 1..18) dispatch mdsbeacon(7525/a up:standby seq 63 v18) v3 from mds.? 10.214.136.134:6811/29541

There's lots of "no longer laggy" spam in the MDS log, so either the mons are not responding promptly or somethings wrong with the way we're reporting that.

The weird thing then is that the last MDS messages are stamped 12:54:04.056040, ~30 seconds before the mon has decided to replace this MDS due to it being laggy.

I note that the failing MDS (a-s) was sharing a host with two mons and two OSDs.

The relevant period in the cluster log:

2014-11-04 12:54:01.464986 mon.0 10.214.134.2:6789/0 6 : cluster [INF] mdsmap e18: 1/1/1 up {0=a-s=up:active}, 1 up:standby
2014-11-04 12:54:01.465174 mon.0 10.214.134.2:6789/0 7 : cluster [INF] osdmap e9: 4 osds: 4 up, 4 in
2014-11-04 12:54:01.464986 mon.0 10.214.134.2:6789/0 6 : cluster [INF] mdsmap e18: 1/1/1 up {0=a-s=up:active}, 1 up:standby
2014-11-04 12:54:01.465174 mon.0 10.214.134.2:6789/0 7 : cluster [INF] osdmap e9: 4 osds: 4 up, 4 in
2014-11-04 12:54:02.508375 mon.0 10.214.134.2:6789/0 11 : cluster [INF] scrub ok on 0,1,2: ScrubResult(keys {auth=64,logm=877,mdsmap=20,monmap=3,osdmap=21,pgmap=281,pgmap_meta=6,pgmap_osd=4,pgmap_pg=528} crc {auth=3204458802,logm=3129699316,mdsmap=1435440698,monmap=650486722,osdmap=3984225998,pgmap=2217722726,pgmap_meta=1976240145,pgmap_osd=1705388810,pgmap_pg=213665068})
2014-11-04 12:54:02.508375 mon.0 10.214.134.2:6789/0 11 : cluster [INF] scrub ok on 0,1,2: ScrubResult(keys {auth=64,logm=877,mdsmap=20,monmap=3,osdmap=21,pgmap=281,pgmap_meta=6,pgmap_osd=4,pgmap_pg=528} crc {auth=3204458802,logm=3129699316,mdsmap=1435440698,monmap=650486722,osdmap=3984225998,pgmap=2217722726,pgmap_meta=1976240145,pgmap_osd=1705388810,pgmap_pg=213665068})
2014-11-04 12:54:18.531924 mon.1 10.214.136.134:6789/0 44 : cluster [INF] mdsmap e18: 1/1/1 up {0=a-s=up:active}, 1 up:standby
2014-11-04 12:54:18.532039 mon.1 10.214.136.134:6789/0 45 : cluster [INF] osdmap e9: 4 osds: 4 up, 4 in
2014-11-04 12:54:36.439108 mon.1 10.214.136.134:6789/0 50 : cluster [INF] osdmap e10: 4 osds: 4 up, 4 in
2014-11-04 12:54:36.443530 mon.1 10.214.136.134:6789/0 51 : cluster [INF] mdsmap e19: 1/1/1 up {0=a=up:replay}
2014-11-04 12:54:37.455944 mon.1 10.214.136.134:6789/0 54 : cluster [INF] mdsmap e20: 1/1/1 up {0=a=up:reconnect}
2014-11-04 12:54:02.508375 mon.0 10.214.134.2:6789/0 11 : cluster [INF] scrub ok on 0,1,2: ScrubResult(keys {auth=64,logm=877,mdsmap=20,monmap=3,osdmap=21,pgmap=281,pgmap_meta=6,pgmap_osd=4,pgmap_pg=528} crc {auth=3204458802,logm=3129699316,mdsmap=1435440698,monmap=650486722,osdmap=3984225998,pgmap=2217722726,pgmap_meta=1976240145,pgmap_osd=1705388810,pgmap_pg=213665068})
2014-11-04 12:54:37.455944 mon.1 10.214.136.134:6789/0 54 : cluster [INF] mdsmap e20: 1/1/1 up {0=a=up:reconnect}
2014-11-04 12:54:39.274024 mon.0 10.214.134.2:6789/0 6 : cluster [INF] mdsmap e21: 1/1/1 up {0=a=up:rejoin}
2014-11-04 12:54:39.274099 mon.0 10.214.134.2:6789/0 7 : cluster [INF] osdmap e10: 4 osds: 4 up, 4 in
2014-11-04 12:54:39.274024 mon.0 10.214.134.2:6789/0 6 : cluster [INF] mdsmap e21: 1/1/1 up {0=a=up:rejoin}
2014-11-04 12:54:39.274099 mon.0 10.214.134.2:6789/0 7 : cluster [INF] osdmap e10: 4 osds: 4 up, 4 in
2014-11-04 12:54:40.272907 mon.0 10.214.134.2:6789/0 11 : cluster [INF] scrub ok on 0,1,2: ScrubResult(keys {auth=65,logm=895,mdsmap=23,monmap=3,osdmap=23,pgmap=287,pgmap_meta=6,pgmap_osd=4,pgmap_pg=528} crc {auth=4175571516,logm=1239312095,mdsmap=136519059,monmap=650486722,osdmap=1911257321,pgmap=1019801681,pgmap_meta=930017212,pgmap_osd=2681313588,pgmap_pg=3075239165})
2014-11-04 12:54:40.272907 mon.0 10.214.134.2:6789/0 11 : cluster [INF] scrub ok on 0,1,2: ScrubResult(keys {auth=65,logm=895,mdsmap=23,monmap=3,osdmap=23,pgmap=287,pgmap_meta=6,pgmap_osd=4,pgmap_pg=528} crc {auth=4175571516,logm=1239312095,mdsmap=136519059,monmap=650486722,osdmap=1911257321,pgmap=1019801681,pgmap_meta=930017212,pgmap_osd=2681313588,pgmap_pg=3075239165})

In the audit log nothing but a bunch of scrubs:

2014-11-04 12:54:02.461793 mon.1 10.214.136.134:6789/0 38 : audit [INF] from='client.? 10.214.136.134:0/1014326' entity='client.admin' cmd=[{"prefix": "scrub"}]: dispatch
2014-11-04 12:54:02.464953 mon.0 10.214.134.2:6789/0 10 : audit [INF] from='client.7924 :/0' entity='forwarded-request' cmd=[{"prefix": "scrub"}]: dispatch
2014-11-04 12:54:02.461793 mon.1 10.214.136.134:6789/0 38 : audit [INF] from='client.? 10.214.136.134:0/1014326' entity='client.admin' cmd=[{"prefix": "scrub"}]: dispatch
2014-11-04 12:54:02.464953 mon.0 10.214.134.2:6789/0 10 : audit [INF] from='client.7924 :/0' entity='forwarded-request' cmd=[{"prefix": "scrub"}]: dispatch
2014-11-04 12:54:02.461793 mon.1 10.214.136.134:6789/0 38 : audit [INF] from='client.? 10.214.136.134:0/1014326' entity='client.admin' cmd=[{"prefix": "scrub"}]: dispatch
2014-11-04 12:54:02.464953 mon.0 10.214.134.2:6789/0 10 : audit [INF] from='client.7924 :/0' entity='forwarded-request' cmd=[{"prefix": "scrub"}]: dispatch
2014-11-04 12:54:40.233713 mon.1 10.214.136.134:6789/0 60 : audit [INF] from='client.? 10.214.136.134:0/1014757' entity='client.admin' cmd=[{"prefix": "scrub"}]: dispatch
2014-11-04 12:54:40.235130 mon.0 10.214.134.2:6789/0 10 : audit [INF] from='client.7939 :/0' entity='forwarded-request' cmd=[{"prefix": "scrub"}]: dispatch
2014-11-04 12:54:40.233713 mon.1 10.214.136.134:6789/0 60 : audit [INF] from='client.? 10.214.136.134:0/1014757' entity='client.admin' cmd=[{"prefix": "scrub"}]: dispatch
2014-11-04 12:54:40.235130 mon.0 10.214.134.2:6789/0 10 : audit [INF] from='client.7939 :/0' entity='forwarded-request' cmd=[{"prefix": "scrub"}]: dispatch

Actions

Copy link

Updated by John Spray over 9 years ago

Ah... I've just realised why the "respawn on blacklist" thing I put in a while back isn't kicking in here: because Journaler is calling the write error handler via a finisher, the lock is not protecting handle_write_error from another call before the error handler has been called. So before the respawn code has a chance to run, a second call to handle_write_error is causing an assertion.

Actions

Copy link