Bug #4565

MDS/client: issue decoding MClientReconnect on MDS

Added by Sage Weil about 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-03-27T08:32:15.272 INFO:teuthology.task.mds_thrash.mds_thrasher.failure_group.[a, b-s-a]:waiting till mds map indicates mds.a is laggy/crashed, in failed state, or mds.a is removed from mdsmap
2013-03-27T08:32:16.509 INFO:teuthology.task.ceph.mds.b-s-a.err:2013-03-27 08:32:08.953553 7fa0dc23e700 -1 failed to decode message of type 23 v3: buffer::end_of_buffer
2013-03-27T08:32:16.511 INFO:teuthology.task.ceph.mds.b-s-a.err:2013-03-27 08:32:08.955332 7fa0dc33f700 -1 failed to decode message of type 23 v3: buffer::end_of_buffer
2013-03-27T08:32:16.513 INFO:teuthology.task.ceph.mds.b-s-a.err:2013-03-27 08:32:08.957340 7fa0dc440700 -1 failed to decode message of type 23 v3: buffer::end_of_buffer
2013-03-27T08:32:16.520 INFO:teuthology.task.ceph.mds.b-s-a.err:mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7fa0dfd4f700 time 2013-03-27 08:32:08.961574
2013-03-27T08:32:16.520 INFO:teuthology.task.ceph.mds.b-s-a.err:mds/Server.cc: 601: FAILED assert(in->snaprealm)
2013-03-27T08:32:16.520 INFO:teuthology.task.ceph.mds.b-s-a.err: ceph version 0.59-530-g3f2e711 (3f2e71169a540519f669f406789b319cf6989686)
2013-03-27T08:32:16.520 INFO:teuthology.task.ceph.mds.b-s-a.err: 1: (Server::handle_client_reconnect(MClientReconnect*)+0x25de) [0x54cb2e]
2013-03-27T08:32:16.521 INFO:teuthology.task.ceph.mds.b-s-a.err: 2: (Server::dispatch(Message*)+0x285) [0x55e965]
2013-03-27T08:32:16.521 INFO:teuthology.task.ceph.mds.b-s-a.err: 3: (MDS::handle_deferrable_message(Message*)+0x910) [0x4c93f0]
2013-03-27T08:32:16.521 INFO:teuthology.task.ceph.mds.b-s-a.err: 4: (MDS::_dispatch(Message*)+0x6f4) [0x4dcea4]
2013-03-27T08:32:16.521 INFO:teuthology.task.ceph.mds.b-s-a.err: 5: (MDS::ms_dispatch(Message*)+0x1db) [0x4de27b]
2013-03-27T08:32:16.521 INFO:teuthology.task.ceph.mds.b-s-a.err: 6: (DispatchQueue::entry()+0x341) [0x8201e1]
2013-03-27T08:32:16.522 INFO:teuthology.task.ceph.mds.b-s-a.err: 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x79ce4d]
2013-03-27T08:32:16.522 INFO:teuthology.task.ceph.mds.b-s-a.err: 8: (()+0x7e9a) [0x7fa0e40a7e9a]
2013-03-27T08:32:16.522 INFO:teuthology.task.ceph.mds.b-s-a.err: 9: (clone()+0x6d) [0x7fa0e28604bd]

The job was:
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-27_01:01:46-fs-master-testing-basic/4394$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: d51342bbcc8c0a30466356934197c93186e2ff2d
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        mds inject delay type: osd mds
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject socket failures: 2500
    log-whitelist:
    - slow request
    sha1: 3f2e71169a540519f669f406789b319cf6989686
  s3tests:
    branch: master
  workunit:
    sha1: 3f2e71169a540519f669f406789b319cf6989686
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
  - mds.b-s-a
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- mds_thrash: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/fsstress.sh

Actions #1

Updated by Sage Weil about 11 years ago

  • Project changed from Ceph to CephFS
Actions #2

Updated by Sage Weil about 11 years ago

  • Priority changed from Normal to Urgent
Actions #3

Updated by Sam Lang about 11 years ago

Update on trying to track this down: running this test in teuthology, I don't hit the same assertion, but I do see the test hang after the mds gets restarted a number of times. Doing the restarting by hand during the same fsstress run shows the client hanging in get_caps():

#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x000000000047d127 in Wait (mutex=..., this=0x7fff18faac80) at ./common/Cond.h:55
#2 Client::wait_on_list (this=0x1ee9800, ls=...) at client/Client.cc:2590
#3 0x00000000004931af in Client::get_caps (this=0x1ee9800, in=0x3619d80, need=4096, want=8192, phave=0x7fff18fab0e8, endoff=773989)
at client/Client.cc:2259
#4 0x00000000004a4860 in Client::_write (this=0x1ee9800, f=0x1ece6e0, offset=763371, size=10618, buf=<optimized out>)
at client/Client.cc:5943
#5 0x00000000004a55a7 in Client::ll_write (this=0x1ee9800, fh=0x1ece6e0, off=763371, len=10618,
data=0x1f04050 '=' <repeats 200 times>...) at client/Client.cc:7603
#6 0x0000000000478697 in fuse_ll_write (req=0x304b300, ino=<optimized out>, buf=0x1f04050 '=' <repeats 200 times>..., size=10618,
off=763371, fi=0x7fff18fab3a0) at client/fuse_ll.cc:358
#7 0x00007f3194d3cce3 in ?? () from /lib/libfuse.so.2
#8 0x00007f3194d3ad65 in fuse_session_loop () from /lib/libfuse.so.2

Detaching from the fuse process causes the hang to disappear (the test starts going again), which may be due to the session timing out at the mds, causing the client to retry that write request with a new session. During the hang the client doesn't show any outstanding requests to the mds.

Once the test finishes, it hangs for a bit, and then the unmount returns EBUSY instead of unmounting cleanly.
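
For context on where that backtrace is blocked: the write path waits on a condition until the MDS has issued the capabilities it needs. Below is a minimal sketch of that wait pattern, using std:: primitives and made-up names (Inode, wait_for_caps, handle_cap_grant) rather than the real Client code, just to show how a lost cap grant leaves the writer parked forever.

// Minimal sketch of the wait pattern implied by the backtrace above.
// All names here are hypothetical; the real client uses its own Mutex/Cond
// wrappers and a much richer cap state machine.
#include <condition_variable>
#include <mutex>

struct Inode {
  unsigned caps_issued = 0;          // capability bits currently granted
  std::condition_variable cap_cv;    // signalled when a grant/revoke arrives
};

class Client {
  std::mutex client_lock;            // one big client lock

public:
  // Block the writer until the MDS has issued the caps it needs for this inode.
  // If the grant never arrives (e.g. the session quietly died while the mds
  // was being thrashed), this wait never wakes up -- matching the hang seen
  // in get_caps().
  void wait_for_caps(Inode& in, unsigned need) {
    std::unique_lock<std::mutex> l(client_lock);
    in.cap_cv.wait(l, [&] { return (in.caps_issued & need) == need; });
  }

  // Called from message dispatch when the MDS grants caps.
  void handle_cap_grant(Inode& in, unsigned issued) {
    {
      std::lock_guard<std::mutex> l(client_lock);
      in.caps_issued |= issued;
    }
    in.cap_cv.notify_all();
  }
};

If nothing ever delivers the grant for that inode, only an external kick (a session timeout and reconnect, or here detaching from the fuse process) gets the write moving again, which would line up with the behavior described above.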

Actions #4

Updated by Ian Colle about 11 years ago

  • Assignee set to Sam Lang
Actions #5

Updated by Greg Farnum about 11 years ago

  • Assignee changed from Sam Lang to Greg Farnum

I'll look into the code around this today.

Actions #6

Updated by Greg Farnum about 11 years ago

  • Subject changed from mds/Server.cc: 601: FAILED assert(in->snaprealm) to MDS/client: issue decoding MClientReconnect on MDS

[Meant to post this yesterday but I guess I forgot to hit submit.]
Sadly, this test didn't slurp up any logs, so all we have is what's in the teuthology log (the important bit of which is pasted above). This log tells us that the MClientReconnect decode failed three times and then suddenly (apparently) succeeded on the fourth try, and then the MDS failed on the assert that the inode rooted a snaprealm. That makes me think we just got "lucky" with the inode number and it's actually totally invalid. Especially because this is the fourth attempt at the message?
I do note that a fix to the decode (which was previously never decoding the realms, and is now doing so, thanks to a reversed conditional) went in not a ton before this, but it should have been in for some tests before this (and certainly after). :/

Going to try and reproduce with logging, but I'm not super optimistic...
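
To make the reversed-conditional theory concrete, here is a toy decoder illustrating the failure mode: with the conditional reversed, the realm section is silently skipped; with it corrected, a sender that didn't actually encode realms (or a truncated payload) either throws end_of_buffer or decodes stray bytes into an inode number. None of the names or structures below are Ceph's; it's only a sketch of the mechanism, not the MClientReconnect code.

// Toy illustration of how a reversed version check in a message decoder can
// hide a section of the payload, and how fixing it can then overrun a short
// buffer or yield garbage values. Hypothetical names throughout.
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct Cursor {
  const uint8_t* p;
  const uint8_t* end;
  template <typename T> T get() {
    if (p + sizeof(T) > end)
      throw std::runtime_error("end_of_buffer");  // analogous to buffer::end_of_buffer
    T v;
    std::memcpy(&v, p, sizeof(T));
    p += sizeof(T);
    return v;
  }
};

struct Reconnect {
  std::vector<uint64_t> cap_inodes;
  std::vector<uint64_t> realm_inodes;
};

void decode_reconnect(Cursor& c, Reconnect& m, uint8_t version) {
  uint32_t ncaps = c.get<uint32_t>();
  for (uint32_t i = 0; i < ncaps; ++i)
    m.cap_inodes.push_back(c.get<uint64_t>());

  // Buggy form: `if (version < 2)` -- reversed, so realms were never decoded
  // and any realm bytes were silently ignored.
  // Corrected form: decode realms for new-enough senders. If the sender did
  // not actually encode realms, or the payload is short, this either throws
  // end_of_buffer (like the repeated decode failures in the log) or reads
  // whatever bytes follow as an inode number -- a "lucky" value that could
  // then trip assert(in->snaprealm) on the MDS.
  if (version >= 2) {
    uint32_t nrealms = c.get<uint32_t>();
    for (uint32_t i = 0; i < nrealms; ++i)
      m.realm_inodes.push_back(c.get<uint64_t>());
  }
}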

Actions #7

Updated by Greg Farnum about 11 years ago

  • Status changed from New to Can't reproduce

I've had this running for more than 24 hours and it still hasn't reproduced. I'll let it keep going, but I don't believe this was an error in the code.

Actions #8

Updated by Greg Farnum about 11 years ago

  • Status changed from Can't reproduce to 12
  • Assignee deleted (Greg Farnum)

This just happened again at /a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14248 (it's still running, for now...)

Actions #9

Updated by Greg Farnum about 11 years ago

And also /a/teuthology-2013-04-16_01:00:52-fs-next-testing-basic/13665

Actions #10

Updated by Greg Farnum about 11 years ago

Those machines are cleared out again, of course (d'oh!). Next time we see this we need to gather up everything we can. I'm going to turn up the MDS debug log levels so we can hopefully have more data out of the MDS next time we see this, too. (Testing those now.)

Actions #11

Updated by Greg Farnum about 11 years ago

Also, both of these are the same job as the first incident: fsstress workunit on ceph-fuse, messenger failure injection, and mds thrashing.

Actions #12

Updated by Greg Farnum about 11 years ago

  • Priority changed from Urgent to High

Okay, pushed the update for more debugging, and am downgrading this to "High" since it only appears under so many failure conditions to begin with.

Actions #13

Updated by Greg Farnum about 11 years ago

/a/teuthology-2013-04-26_02:29:14-fs-next-testing-basic/1450

Actions #14

Updated by Tamilarasi muthamizhan almost 11 years ago

  • Assignee set to Greg Farnum
  • Priority changed from High to Urgent

Hitting this pretty consistently when upgrading the mds from argonaut to cuttlefish.

I have this reproduced on burnupi39 and burnupi45 with debug logging on.

Actions #15

Updated by Greg Farnum almost 11 years ago

Are you still running old clients when you hit this?

Actions #16

Updated by Tamilarasi muthamizhan almost 11 years ago

steps to reproduce:

1. Bring up a cluster of 2 nodes running argonaut and run the blogbench workload on it from the client.
2. Upgrade only the monitors on all the nodes to bobtail.
3. Wait for quorum, then upgrade all daemons to the next branch in the order mon, osd, mds.

ubuntu@burnupi14:~/qa-smoke-tests/cephfs$ dpkg -l | grep ceph
ii  ceph-common                             0.60-683-g5fa3cbf-1precise      common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse                               0.60-683-g5fa3cbf-1precise      FUSE-based client for the Ceph distributed file system
ii  ceph-test                               0.60-683-g5fa3cbf-1precise      Ceph test and benchmarking tools.
ii  libcephfs1                              0.60-683-g5fa3cbf-1precise      Ceph distributed file system client library
ii  linux-image-3.6.1-ceph-00057-gc6092bb   3.6.1-ceph-00057-gc6092bb-1     Linux kernel, version 3.6.1-ceph-00057-gc6092bb

ceph.conf 

ubuntu@burnupi39:~$ sudo cat /etc/ceph/ceph.conf
[global]
    debug ms = 1
    auth client required = none
    auth service required = none
    auth cluster required = none
[osd]
    osd journal size = 1000
    filestore xattr use omap = true

[osd.1]
    host = burnupi39

[osd.2]
    host = burnupi45

[mon.a]
    host = burnupi39
    mon addr = 10.214.135.12:6789

[mon.b]
    host = burnupi45
    mon addr = 10.214.136.40:6789

[mon.c]
    host = burnupi45
    mon addr = 10.214.136.40:6790

[mds.a]
    host = burnupi39

I thought I had debug enabled for the mds and osd as well; I just now found that it's missing from the conf file :(

Actions #17

Updated by Greg Farnum almost 11 years ago

  • Priority changed from Urgent to High

This is happening with the argonaut ceph-fuse daemon, not a cuttlefish one. Going to turn this down to High again and bring it up as a data point in the Cuttlefish blockers meeting later today.

Actions #18

Updated by Greg Farnum almost 11 years ago

  • Assignee deleted (Greg Farnum)
Actions #19

Updated by Tamilarasi muthamizhan almost 11 years ago

log: ubuntu@teuthology:/a/teuthology-2013-05-10_01:00:36-fs-master-testing-basic/10442

2013-05-10T13:15:53.069 INFO:teuthology.task.ceph.mds.b-s-a.err:2013-05-10 13:16:10.329312 7f4bb13f6700 -1 failed to decode message of type 23 v3: buffer::end_of_buffer
2013-05-10T13:15:53.077 INFO:teuthology.task.ceph.mds.b-s-a.err:mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7f4bb530b700 time 2013-05-10 13:16:10.335774
2013-05-10T13:15:53.077 INFO:teuthology.task.ceph.mds.b-s-a.err:mds/Server.cc: 602: FAILED assert(in->snaprealm)
2013-05-10T13:15:53.077 INFO:teuthology.task.ceph.mds.b-s-a.err: ceph version 0.61-210-g261aaba (261aaba1172945bb8368e12481db24c22104f1fa)
2013-05-10T13:15:53.077 INFO:teuthology.task.ceph.mds.b-s-a.err: 1: (Server::handle_client_reconnect(MClientReconnect*)+0x237b) [0x54f8fb]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 2: (Server::dispatch(Message*)+0x275) [0x5624b5]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 3: (MDS::handle_deferrable_message(Message*)+0x910) [0x4cc0c0]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 4: (MDS::_dispatch(Message*)+0x6f4) [0x4e04d4]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 5: (MDS::ms_dispatch(Message*)+0x1d3) [0x4e1893]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 6: (DispatchQueue::entry()+0x3f1) [0x82f311]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ac07d]
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err: 8: (()+0x7e9a) [0x7f4bb9667e9a]
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err: 9: (clone()+0x6d) [0x7f4bb7e1dccd]
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err:2013-05-10 13:16:10.337162 7f4bb530b700 -1 mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7f4bb530b700 time 2013-05-10 13:16:10.335774
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err:mds/Server.cc: 602: FAILED assert(in->snaprealm)
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err:
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: ceph version 0.61-210-g261aaba (261aaba1172945bb8368e12481db24c22104f1fa)
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: 1: (Server::handle_client_reconnect(MClientReconnect*)+0x237b) [0x54f8fb]
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: 2: (Server::dispatch(Message*)+0x275) [0x5624b5]
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: 3: (MDS::handle_deferrable_message(Message*)+0x910) [0x4cc0c0]
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: 4: (MDS::_dispatch(Message*)+0x6f4) [0x4e04d4]
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: 5: (MDS::ms_dispatch(Message*)+0x1d3) [0x4e1893]
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err: 6: (DispatchQueue::entry()+0x3f1) [0x82f311]
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err: 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ac07d]
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err: 8: (()+0x7e9a) [0x7f4bb9667e9a]
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err: 9: (clone()+0x6d) [0x7f4bb7e1dccd]
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err:
2013-05-10T13:15:53.224 DEBUG:teuthology.misc:with jobid basedir: 10442
2013-05-10T13:15:53.224 DEBUG:teuthology.orchestra.run:Running [10.214.132.2]: '/home/ubuntu/cephtest/10442/enable-coredump ceph-coverage /home/ubuntu/cephtest/10442/archive/coverage ceph --concise mds dump --format=json'
2013-05-10T13:15:53.293 INFO:teuthology.task.ceph.mds.b-s-a.err:   -36> 2013-05-10 13:16:10.329312 7f4bb13f6700 -1 failed to decode message of type 23 v3: buffer::end_of_buffer
2013-05-10T13:15:53.294 INFO:teuthology.task.ceph.mds.b-s-a.err:     0> 2013-05-10 13:16:10.337162 7f4bb530b700 -1 mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7f4bb530b700 time 2013-05-10 13:16:10.335774

Actions #20

Updated by Tamilarasi muthamizhan almost 11 years ago

ubuntu@teuthology:/a/teuthology-2013-05-10_01:00:36-fs-master-testing-basic/10442$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: b5b09be30cf99f9c699e825629f02e3bce555d44
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        debug client: 10
      global:
        mds inject delay type: osd mds
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject socket failures: 2500
      mds:
        debug mds: 20
        debug ms: 1
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    sha1: b353da6f682d223ba14812da0fe814eca72ad6f5
  s3tests:
    branch: master
  workunit:
    sha1: b353da6f682d223ba14812da0fe814eca72ad6f5
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
  - mds.b-s-a
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- mds_thrash: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/fsstress.sh

Actions #21

Updated by Sage Weil almost 11 years ago

  • Priority changed from High to Urgent
Actions #22

Updated by Sage Weil almost 11 years ago

  • Assignee set to Sage Weil
Actions #23

Updated by Sage Weil almost 11 years ago

  • Status changed from 12 to Fix Under Review
Actions #24

Updated by Ian Colle almost 11 years ago

  • Assignee changed from Sage Weil to Greg Farnum

Greg, please review.

Actions #25

Updated by Sage Weil almost 11 years ago

  • Status changed from Fix Under Review to Resolved
Actions #26

Updated by Greg Farnum almost 11 years ago

commit:314cf046b0b787ca69665e8751eab6fe7adb4037 in the next branch.
