Bug #15314

closed

failed to decode message of type 43 v7: buffer::end_of_buffer

Added by Mark Nelson about 8 years ago. Updated about 8 years ago.

Status:
Closed
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

At the start of the CBT cache tiering tests of v10.1.0 on incerta05-08, we're hitting a bug when cbt runs the rbd command to create the RBD images. The same failure also appears with other rbd commands such as "ls". I've included log snippets from the client and OSD with ms, objecter, and osd debugging turned up to 20. Message type 43 appears to be osd_op_reply, as can be seen in the OSD log (also, thanks Greg for pointing out that this message type is defined in ceph_fs.h, not message.h!)

If I have time, I'll see if I can convert the hex dump to binary and run it through ceph-dencoder. The issue is quite repeatable and can be reproduced simply by running "sudo rbd ls" on perf@incerta[05-08].

version:

ceph version 10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)

command run:

sudo /usr/bin/rbd -c /home/perf/tmp/cbt/ceph/ceph.conf create cbt-librbdfio-incerta05-foo --size 131072 --pool cbt-librbdfio --order 22 --debug-rbd 20 --debug-ms 20 --debug-objecter 20

OSD message sent:

2016-03-29 19:16:35.181958 7f46a6dc6700 20 -- 10.0.10.107:6808/127105 >> 10.0.10.107:0/3904314947 pipe(0x7f46e10cc000 sd=174 :6808 s=2 pgs=314 cs=1 l=1 c=0x7f46e13c4c00).writer encoding 1 features 576460752032874495 0x7f46ddc8edc0 osd_op_reply(1 cbt-librbdfio-incerta05-foo.rbd [stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v7

Hex dump from the client:

2016-03-29 19:16:35.186561 7f756c3ca700  1 dump: 
00000000  1f 00 00 00 63 62 74 2d  6c 69 62 72 62 64 66 69  |....cbt-librbdfi|
00000010  6f 2d 69 6e 63 65 72 74  61 30 35 2d 66 6f 6f 2e  |o-incerta05-foo.|
00000020  72 62 64 01 02 00 00 00  00 00 00 00 c7 68 9a 98  |rbd..........h..|
00000030  ff ff ff ff 19 00 40 00  00 00 00 00 fe ff ff ff  |......@.........|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 79 00 00 00  |............y...|
00000050  01 00 00 00 02 12 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 3b 01 00 00 00 00  |..........;.....|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00                              |.......|
00000097
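The start of this dump can be read by hand. As a minimal sketch, assuming the payload begins with a string encoded as a 32-bit little-endian length followed by that many bytes (which is what the first bytes of the dump suggest), the following parses the first three dump lines and extracts the object name. The helper names `parse_hexdump` and `read_string` are illustrative and are not part of Ceph.

```python
import struct

# First three lines of the client dump above, copied verbatim.
DUMP = """\
00000000  1f 00 00 00 63 62 74 2d  6c 69 62 72 62 64 66 69  |....cbt-librbdfi|
00000010  6f 2d 69 6e 63 65 72 74  61 30 35 2d 66 6f 6f 2e  |o-incerta05-foo.|
00000020  72 62 64 01 02 00 00 00  00 00 00 00 c7 68 9a 98  |rbd..........h..|
"""


def parse_hexdump(text):
    """Turn 'offset  hex bytes  |ascii|' lines into raw bytes."""
    out = bytearray()
    for line in text.splitlines():
        # Everything from the first '|' onward is the ASCII column; drop it.
        hex_part = line.split("|", 1)[0]
        fields = hex_part.split()
        # fields[0] is the offset; the remaining fields are byte values.
        for tok in fields[1:]:
            out.append(int(tok, 16))
    return bytes(out)


def read_string(buf, offset=0):
    """Read a u32 little-endian length prefix, then that many bytes."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        # Same class of failure as the report: not enough bytes for the field.
        raise ValueError("length prefix runs past end of buffer")
    return buf[start:end].decode("ascii"), end


payload = parse_hexdump(DUMP)
name, next_offset = read_string(payload)
print(name, next_offset)  # cbt-librbdfio-incerta05-foo.rbd 35
```

The length prefix 0x1f (31) matches the 31-character object name from the OSD log line, so at least the first field of the received bytes looks intact.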


Files

client.log (11.1 KB), Mark Nelson, 03/30/2016 12:01 AM
osd.22.log (15.6 KB), Mark Nelson, 03/30/2016 12:01 AM
runtests.tiered.xfs.yaml (2.5 KB), Mark Nelson, 03/30/2016 12:02 AM
ceph.conf.filestore.tiered (10.5 KB), Mark Nelson, 03/30/2016 12:02 AM
Actions #1

Updated by Mark Nelson about 8 years ago

For what it's worth, I tried converting the hex dump to a binary file (with some sed magic found on the internet to make xxd accept it):

sed 's/^[0-9]*//' hexdump | xxd -r -p | dd conv=swab of=binaryfile

and running it through ceph-dencoder:

[ubuntu@incerta05 issue_15314]$ ceph-dencoder import binaryfile type MOSDOpReply dump_json
{
    "summary": "osd_op_reply(0  [] v0'0 uv0 ack = 0)" 
}

If we add the decode step in, it fails though:

[ubuntu@incerta05 issue_15314]$ ceph-dencoder import binaryfile type MOSDOpReply decode dump_json
error: buffer::end_of_buffer
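An alternative to the sed/xxd/dd pipeline is to convert the dump with a short script. This is only a sketch, assuming the offsets and the ASCII column should be discarded and that no byte swapping is needed, since the dump lists bytes in address order (the ASCII column matches the hex byte for byte). The function name `hexdump_to_bytes` is hypothetical, and the sample is the first two lines of the client dump with a closing offset line added for illustration.

```python
import re

# A data line: 8-digit hex offset, 1 to 16 byte values, optional |ascii| column.
DATA_LINE = re.compile(
    r"^([0-9a-fA-F]{8})((?:\s+[0-9a-fA-F]{2}){1,16})\s*(?:\|.*\|)?\s*$"
)
# The closing line: an offset on its own, equal to the total byte count.
END_LINE = re.compile(r"^([0-9a-fA-F]{8})\s*$")


def hexdump_to_bytes(text):
    """Convert a canonical hex dump to bytes, checking offsets along the way."""
    data = bytearray()
    expected_total = None
    for line in text.splitlines():
        m = DATA_LINE.match(line)
        if m:
            offset = int(m.group(1), 16)
            if offset != len(data):
                raise ValueError(f"gap or overlap at offset {offset:#x}")
            data.extend(int(tok, 16) for tok in m.group(2).split())
            continue
        m = END_LINE.match(line)
        if m:
            expected_total = int(m.group(1), 16)
        # Any other line (log prefixes, blank lines) is ignored.
    if expected_total is not None and expected_total != len(data):
        raise ValueError("byte count does not match the closing offset")
    return bytes(data)


# Truncated sample for illustration: log prefix, two data lines, closing offset.
SAMPLE = """\
2016-03-29 19:16:35.186561 7f756c3ca700  1 dump: 
00000000  1f 00 00 00 63 62 74 2d  6c 69 62 72 62 64 66 69  |....cbt-librbdfi|
00000010  6f 2d 69 6e 63 65 72 74  61 30 35 2d 66 6f 6f 2e  |o-incerta05-foo.|
00000020
"""

data = hexdump_to_bytes(SAMPLE)
print(len(data))  # 32
```

Writing the result with `open("binaryfile", "wb").write(data)` would give a file suitable for `ceph-dencoder import`. Checking the byte count against the closing offset guards against a conversion that silently drops or adds bytes.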
Actions #2

Updated by Josh Durgin about 8 years ago

You've got librbd1 10.0.2 installed - hopefully this is just a temporary incompatibility from a dev branch. Try librbd 10.1.0.

Actions #3

Updated by Mark Nelson about 8 years ago

Hi Josh,

This is 10.1.0 compiled locally and installed in /usr/local using make install:

[ubuntu@incerta05 lib]$ which rbd
/usr/local/bin/rbd
[ubuntu@incerta05 lib]$ ldd /usr/local/bin/rbd | grep librbd
    librbd.so.1 => /usr/local/lib/librbd.so.1 (0x00007f17e6d88000)
[ubuntu@incerta05 lib]$ ls -al librbd.so.1
lrwxrwxrwx. 1 root root 15 Mar 29 09:49 librbd.so.1 -> librbd.so.1.0.0
[perf@incerta05 librbd]$ strings /usr/local/lib/librbd.so.1.0.0 | grep 10.1.0
10.1.0

[perf@incerta05 librbd]$ strings /usr/local/lib/librbd.so.1.0.0 | grep 10.0.2
[perf@incerta05 librbd]$ 
Actions #4

Updated by Mark Nelson about 8 years ago

Aha, thanks to Josh we found that we did in fact have a stray copy of 10.0.2 installed, so this is perhaps an incompatibility with an earlier version. Closing now.
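A check along these lines might help spot a stray copy sooner. It is a rough sketch that mirrors the `strings | grep` checks above: it lists every `librbd.so*` file in a set of directories and reports which version strings each one contains. The directory list and the function names are assumptions for illustration; the thread does not say where the stray 10.0.2 copy was installed.

```python
import os


def find_library_copies(search_dirs, prefix="librbd.so"):
    """Return the distinct real files behind every matching name."""
    found = set()
    for directory in search_dirs:
        if not os.path.isdir(directory):
            continue
        for name in os.listdir(directory):
            if name.startswith(prefix):
                # Resolve symlinks so librbd.so.1 -> librbd.so.1.0.0 counts once.
                found.add(os.path.realpath(os.path.join(directory, name)))
    return sorted(found)


def embedded_versions(path, candidates):
    """Return the candidate version strings that occur literally in the file."""
    with open(path, "rb") as handle:
        blob = handle.read()
    return [v for v in candidates if v.encode("ascii") in blob]


if __name__ == "__main__":
    # Assumed locations; adjust for the distribution and install prefix in use.
    dirs = ["/usr/lib", "/usr/lib64", "/usr/local/lib"]
    for lib in find_library_copies(dirs):
        print(lib, embedded_versions(lib, ["10.0.2", "10.1.0"]))
```

More than one path in the output, or a path reporting an unexpected version, would point to mixed installs. Note that this only shows which copies exist on disk; which one a given process actually loads still depends on the dynamic linker's search order.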

Actions #5

Updated by Mark Nelson about 8 years ago

  • Category deleted (OSD)
  • Status changed from New to Closed
  • Assignee deleted (Samuel Just)