Bug #4565
MDS/client: issue decoding MClientReconnect on MDS
Status: Closed
Description
2013-03-27T08:32:15.272 INFO:teuthology.task.mds_thrash.mds_thrasher.failure_group.[a, b-s-a]:waiting till mds map indicates mds.a is laggy/crashed, in failed state, or mds.a is removed from mdsmap
2013-03-27T08:32:16.509 INFO:teuthology.task.ceph.mds.b-s-a.err:2013-03-27 08:32:08.953553 7fa0dc23e700 -1 failed to decode message of type 23 v3: buffer::end_of_buffer
2013-03-27T08:32:16.511 INFO:teuthology.task.ceph.mds.b-s-a.err:2013-03-27 08:32:08.955332 7fa0dc33f700 -1 failed to decode message of type 23 v3: buffer::end_of_buffer
2013-03-27T08:32:16.513 INFO:teuthology.task.ceph.mds.b-s-a.err:2013-03-27 08:32:08.957340 7fa0dc440700 -1 failed to decode message of type 23 v3: buffer::end_of_buffer
2013-03-27T08:32:16.520 INFO:teuthology.task.ceph.mds.b-s-a.err:mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7fa0dfd4f700 time 2013-03-27 08:32:08.961574
2013-03-27T08:32:16.520 INFO:teuthology.task.ceph.mds.b-s-a.err:mds/Server.cc: 601: FAILED assert(in->snaprealm)
2013-03-27T08:32:16.520 INFO:teuthology.task.ceph.mds.b-s-a.err: ceph version 0.59-530-g3f2e711 (3f2e71169a540519f669f406789b319cf6989686)
2013-03-27T08:32:16.520 INFO:teuthology.task.ceph.mds.b-s-a.err: 1: (Server::handle_client_reconnect(MClientReconnect*)+0x25de) [0x54cb2e]
2013-03-27T08:32:16.521 INFO:teuthology.task.ceph.mds.b-s-a.err: 2: (Server::dispatch(Message*)+0x285) [0x55e965]
2013-03-27T08:32:16.521 INFO:teuthology.task.ceph.mds.b-s-a.err: 3: (MDS::handle_deferrable_message(Message*)+0x910) [0x4c93f0]
2013-03-27T08:32:16.521 INFO:teuthology.task.ceph.mds.b-s-a.err: 4: (MDS::_dispatch(Message*)+0x6f4) [0x4dcea4]
2013-03-27T08:32:16.521 INFO:teuthology.task.ceph.mds.b-s-a.err: 5: (MDS::ms_dispatch(Message*)+0x1db) [0x4de27b]
2013-03-27T08:32:16.521 INFO:teuthology.task.ceph.mds.b-s-a.err: 6: (DispatchQueue::entry()+0x341) [0x8201e1]
2013-03-27T08:32:16.522 INFO:teuthology.task.ceph.mds.b-s-a.err: 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x79ce4d]
2013-03-27T08:32:16.522 INFO:teuthology.task.ceph.mds.b-s-a.err: 8: (()+0x7e9a) [0x7fa0e40a7e9a]
2013-03-27T08:32:16.522 INFO:teuthology.task.ceph.mds.b-s-a.err: 9: (clone()+0x6d) [0x7fa0e28604bd]
The job config was:
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-27_01:01:46-fs-master-testing-basic/4394$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: d51342bbcc8c0a30466356934197c93186e2ff2d
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        mds inject delay type: osd mds
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject socket failures: 2500
    log-whitelist:
    - slow request
    sha1: 3f2e71169a540519f669f406789b319cf6989686
  s3tests:
    branch: master
  workunit:
    sha1: 3f2e71169a540519f669f406789b319cf6989686
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
  - mds.b-s-a
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- mds_thrash: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/fsstress.sh
Updated by Sam Lang about 11 years ago
Update on trying to track this down: running this test in teuthology, I don't hit the same assertion, but I do see the test hang after the mds gets restarted a number of times. Doing the restarting by hand during the same fsstress run shows the client hanging in get_caps():
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x000000000047d127 in Wait (mutex=..., this=0x7fff18faac80) at ./common/Cond.h:55
#2 Client::wait_on_list (this=0x1ee9800, ls=...) at client/Client.cc:2590
#3 0x00000000004931af in Client::get_caps (this=0x1ee9800, in=0x3619d80, need=4096, want=8192, phave=0x7fff18fab0e8, endoff=773989)
at client/Client.cc:2259
#4 0x00000000004a4860 in Client::_write (this=0x1ee9800, f=0x1ece6e0, offset=763371, size=10618, buf=<optimized out>)
at client/Client.cc:5943
#5 0x00000000004a55a7 in Client::ll_write (this=0x1ee9800, fh=0x1ece6e0, off=763371, len=10618,
data=0x1f04050 '=' <repeats 200 times>...) at client/Client.cc:7603
#6 0x0000000000478697 in fuse_ll_write (req=0x304b300, ino=<optimized out>, buf=0x1f04050 '=' <repeats 200 times>..., size=10618,
off=763371, fi=0x7fff18fab3a0) at client/fuse_ll.cc:358
#7 0x00007f3194d3cce3 in ?? () from /lib/libfuse.so.2
#8 0x00007f3194d3ad65 in fuse_session_loop () from /lib/libfuse.so.2
Detaching from the fuse process makes the hang disappear (the test starts going again), which may be because the session times out at the mds, causing the client to retry that write request with a new session. During the hang the client doesn't show any outstanding requests to the mds.
Once the test finishes, it hangs for a bit, and then the unmount returns EBUSY instead of unmounting cleanly.
Updated by Greg Farnum about 11 years ago
- Assignee changed from Sam Lang to Greg Farnum
I'll look into the code around this today.
Updated by Greg Farnum about 11 years ago
- Subject changed from mds/Server.cc: 601: FAILED assert(in->snaprealm) to MDS/client: issue decoding MClientReconnect on MDS
[Meant to post this yesterday but I guess I forgot to hit submit.]
Sadly, this test didn't slurp up any logs, so all we have is what's in the teuthology log (the important bit of which is pasted above). This log tells us that the MClientReconnect decode failed three times and then suddenly (apparently) succeeded on the fourth try, and then the MDS failed on the assert that the inode rooted a snaprealm. That makes me think we just got "lucky" with the inode number and it's actually totally invalid. Especially because this is the fourth attempt at the message?
I do note that a fix to the decode went in not long before this (it was previously never decoding the realms, thanks to a reversed conditional, and now does so), but it should have been in for some tests before this (and certainly after). :/
Going to try and reproduce with logging, but I'm not super optimistic...
Updated by Greg Farnum about 11 years ago
- Status changed from New to Can't reproduce
I've had this running for more than 24 hours and it still hasn't reproduced. I'll let it keep going, but I don't believe this was an error in code.
Updated by Greg Farnum about 11 years ago
- Status changed from Can't reproduce to 12
- Assignee deleted (Greg Farnum)
This just happened again at /a/teuthology-2013-04-17_01:00:56-fs-master-testing-basic/14248 (it's still running, for now...)
Updated by Greg Farnum about 11 years ago
And also /a/teuthology-2013-04-16_01:00:52-fs-next-testing-basic/13665
Updated by Greg Farnum about 11 years ago
Those machines are cleared out again, of course (d'oh!). Next time we see this we need to gather up everything we can. I'm going to turn up the MDS debug log levels so we can hopefully have more data out of the MDS next time we see this, too. (Testing those now.)
Updated by Greg Farnum about 11 years ago
Also, both of these are the same job as the first incident was: fsstress workunit on ceph-fuse, messenger failure injection, and mds thrashing.
Updated by Greg Farnum about 11 years ago
- Priority changed from Urgent to High
Okay, pushed the update for more debugging, and am downgrading this to "High" since it only appears when this many failure conditions are combined to begin with.
Updated by Greg Farnum about 11 years ago
/a/teuthology-2013-04-26_02:29:14-fs-next-testing-basic/1450
Updated by Tamilarasi muthamizhan almost 11 years ago
- Assignee set to Greg Farnum
- Priority changed from High to Urgent
Hitting this pretty consistently when upgrading the mds from argonaut to cuttlefish.
Have this reproduced on burnupi39 and burnupi45 with debugs on.
Updated by Greg Farnum almost 11 years ago
Are you still running old clients when you hit this?
Updated by Tamilarasi muthamizhan almost 11 years ago
steps to reproduce:
bring up a cluster of 2 nodes running argonaut and run a blogbench workload on it from the client.
upgrade only the monitors on all the nodes to bobtail.
wait for quorum, then upgrade all daemons to the next branch in the order of mon, osd, and mds.
ubuntu@burnupi14:~/qa-smoke-tests/cephfs$ dpkg -l | grep ceph
ii  ceph-common                            0.60-683-g5fa3cbf-1precise   common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse                              0.60-683-g5fa3cbf-1precise   FUSE-based client for the Ceph distributed file system
ii  ceph-test                              0.60-683-g5fa3cbf-1precise   Ceph test and benchmarking tools.
ii  libcephfs1                             0.60-683-g5fa3cbf-1precise   Ceph distributed file system client library
ii  linux-image-3.6.1-ceph-00057-gc6092bb  3.6.1-ceph-00057-gc6092bb-1  Linux kernel, version 3.6.1-ceph-00057-gc6092bb
ceph.conf:
ubuntu@burnupi39:~$ sudo cat /etc/ceph/ceph.conf
[global]
debug ms = 1
auth client required = none
auth service required = none
auth cluster required = none
[osd]
osd journal size = 1000
filestore xattr use omap = true
[osd.1]
host = burnupi39
[osd.2]
host = burnupi45
[mon.a]
host = burnupi39
mon addr = 10.214.135.12:6789
[mon.b]
host = burnupi45
mon addr = 10.214.136.40:6789
[mon.c]
host = burnupi45
mon addr = 10.214.136.40:6790
[mds.a]
host = burnupi39
Thought I had debugs enabled for mds and osd as well; just now found that they're missing from the conf file :(
Updated by Greg Farnum almost 11 years ago
- Priority changed from Urgent to High
This is happening with the argonaut ceph-fuse daemon, not a cuttlefish one. Going to turn this down to High again and bring it up as a data point in the Cuttlefish blockers meeting later today.
Updated by Tamilarasi muthamizhan almost 11 years ago
log: ubuntu@teuthology:/a/teuthology-2013-05-10_01:00:36-fs-master-testing-basic/10442
2013-05-10T13:15:53.069 INFO:teuthology.task.ceph.mds.b-s-a.err:2013-05-10 13:16:10.329312 7f4bb13f6700 -1 failed to decode message of type 23 v3: buffer::end_of_buffer
2013-05-10T13:15:53.077 INFO:teuthology.task.ceph.mds.b-s-a.err:mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7f4bb530b700 time 2013-05-10 13:16:10.335774
2013-05-10T13:15:53.077 INFO:teuthology.task.ceph.mds.b-s-a.err:mds/Server.cc: 602: FAILED assert(in->snaprealm)
2013-05-10T13:15:53.077 INFO:teuthology.task.ceph.mds.b-s-a.err: ceph version 0.61-210-g261aaba (261aaba1172945bb8368e12481db24c22104f1fa)
2013-05-10T13:15:53.077 INFO:teuthology.task.ceph.mds.b-s-a.err: 1: (Server::handle_client_reconnect(MClientReconnect*)+0x237b) [0x54f8fb]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 2: (Server::dispatch(Message*)+0x275) [0x5624b5]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 3: (MDS::handle_deferrable_message(Message*)+0x910) [0x4cc0c0]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 4: (MDS::_dispatch(Message*)+0x6f4) [0x4e04d4]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 5: (MDS::ms_dispatch(Message*)+0x1d3) [0x4e1893]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 6: (DispatchQueue::entry()+0x3f1) [0x82f311]
2013-05-10T13:15:53.078 INFO:teuthology.task.ceph.mds.b-s-a.err: 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ac07d]
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err: 8: (()+0x7e9a) [0x7f4bb9667e9a]
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err: 9: (clone()+0x6d) [0x7f4bb7e1dccd]
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err:2013-05-10 13:16:10.337162 7f4bb530b700 -1 mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7f4bb530b700 time 2013-05-10 13:16:10.335774
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err:mds/Server.cc: 602: FAILED assert(in->snaprealm)
2013-05-10T13:15:53.079 INFO:teuthology.task.ceph.mds.b-s-a.err:
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: ceph version 0.61-210-g261aaba (261aaba1172945bb8368e12481db24c22104f1fa)
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: 1: (Server::handle_client_reconnect(MClientReconnect*)+0x237b) [0x54f8fb]
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: 2: (Server::dispatch(Message*)+0x275) [0x5624b5]
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: 3: (MDS::handle_deferrable_message(Message*)+0x910) [0x4cc0c0]
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: 4: (MDS::_dispatch(Message*)+0x6f4) [0x4e04d4]
2013-05-10T13:15:53.080 INFO:teuthology.task.ceph.mds.b-s-a.err: 5: (MDS::ms_dispatch(Message*)+0x1d3) [0x4e1893]
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err: 6: (DispatchQueue::entry()+0x3f1) [0x82f311]
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err: 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ac07d]
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err: 8: (()+0x7e9a) [0x7f4bb9667e9a]
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err: 9: (clone()+0x6d) [0x7f4bb7e1dccd]
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-05-10T13:15:53.081 INFO:teuthology.task.ceph.mds.b-s-a.err:
2013-05-10T13:15:53.224 DEBUG:teuthology.misc:with jobid basedir: 10442
2013-05-10T13:15:53.224 DEBUG:teuthology.orchestra.run:Running [10.214.132.2]: '/home/ubuntu/cephtest/10442/enable-coredump ceph-coverage /home/ubuntu/cephtest/10442/archive/coverage ceph --concise mds dump --format=json'
2013-05-10T13:15:53.293 INFO:teuthology.task.ceph.mds.b-s-a.err: -36> 2013-05-10 13:16:10.329312 7f4bb13f6700 -1 failed to decode message of type 23 v3: buffer::end_of_buffer
2013-05-10T13:15:53.294 INFO:teuthology.task.ceph.mds.b-s-a.err: 0> 2013-05-10 13:16:10.337162 7f4bb530b700 -1 mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7f4bb530b700 time 2013-05-10 13:16:10.335774
Updated by Tamilarasi muthamizhan almost 11 years ago
ubuntu@teuthology:/a/teuthology-2013-05-10_01:00:36-fs-master-testing-basic/10442$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: b5b09be30cf99f9c699e825629f02e3bce555d44
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        debug client: 10
      global:
        mds inject delay type: osd mds
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject socket failures: 2500
      mds:
        debug mds: 20
        debug ms: 1
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    sha1: b353da6f682d223ba14812da0fe814eca72ad6f5
  s3tests:
    branch: master
  workunit:
    sha1: b353da6f682d223ba14812da0fe814eca72ad6f5
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
  - mds.b-s-a
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- mds_thrash: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/fsstress.sh
Updated by Sage Weil almost 11 years ago
- Status changed from 12 to Fix Under Review
Updated by Ian Colle almost 11 years ago
- Assignee changed from Sage Weil to Greg Farnum
Greg, please review.
Updated by Sage Weil almost 11 years ago
- Status changed from Fix Under Review to Resolved
Updated by Greg Farnum almost 11 years ago
The fix is commit:314cf046b0b787ca69665e8751eab6fe7adb4037 in the next branch.