Bug #15314
failed to decode message of type 43 v7: buffer::end_of_buffer
Status: Closed
Description
While starting the CBT cache tiering tests of v10.1.0 on incerta05-08, we're hitting a bug when CBT runs the rbd command to create the RBD images. It also happens with other rbd commands, such as "ls". I've included log snippets from the client and the OSD with ms, objecter, and osd debugging turned up to 20. Message type 43 appears to be osd_op_reply, as can be seen in the OSD log (thanks to Greg for pointing out that this message type is defined in ceph_fs.h, not message.h!).
If I have time, I'll see if I can convert the hex dump into binary and run it through ceph-dencoder. The issue is very repeatable and can be triggered simply by running "sudo rbd ls" on perf@incerta[05-08].
version:
ceph version 10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)
command run:
sudo /usr/bin/rbd -c /home/perf/tmp/cbt/ceph/ceph.conf create cbt-librbdfio-incerta05-foo --size 131072 --pool cbt-librbdfio --order 22 --debug-rbd 20 --debug-ms 20 --debug-objecter 20
OSD message sent:
2016-03-29 19:16:35.181958 7f46a6dc6700 20 -- 10.0.10.107:6808/127105 >> 10.0.10.107:0/3904314947 pipe(0x7f46e10cc000 sd=174 :6808 s=2 pgs=314 cs=1 l=1 c=0x7f46e13c4c00).writer encoding 1 features 576460752032874495 0x7f46ddc8edc0 osd_op_reply(1 cbt-librbdfio-incerta05-foo.rbd [stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v7
Hex dump from the client:
2016-03-29 19:16:35.186561 7f756c3ca700 1 dump:
00000000 1f 00 00 00 63 62 74 2d 6c 69 62 72 62 64 66 69 |....cbt-librbdfi|
00000010 6f 2d 69 6e 63 65 72 74 61 30 35 2d 66 6f 6f 2e |o-incerta05-foo.|
00000020 72 62 64 01 02 00 00 00 00 00 00 00 c7 68 9a 98 |rbd..........h..|
00000030 ff ff ff ff 19 00 40 00 00 00 00 00 fe ff ff ff |......@.........|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 79 00 00 00 |............y...|
00000050 01 00 00 00 02 12 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 00 00 00 00 00 00 00 3b 01 00 00 00 00 |..........;.....|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 |.......|
00000097
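As a quick cross-check between the client and OSD logs, the start of the client's payload can be decoded by hand. The Python sketch below assumes that the reply payload begins with the object name, encoded the way Ceph encodes strings (a little-endian 32-bit length followed by that many bytes). decode_length_prefixed_string is a hypothetical helper, not a Ceph API, and only the first three rows of the dump are used.

```python
import struct
from typing import Tuple

# First three rows (48 bytes) of the client-side hex dump in this report.
DUMP_PREFIX = bytes.fromhex(
    "1f 00 00 00 63 62 74 2d 6c 69 62 72 62 64 66 69 "
    "6f 2d 69 6e 63 65 72 74 61 30 35 2d 66 6f 6f 2e "
    "72 62 64 01 02 00 00 00 00 00 00 00 c7 68 9a 98"
)


def decode_length_prefixed_string(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Hypothetical helper: read a little-endian u32 length, then that many bytes.

    Returns the decoded text and the offset just past it. Raises ValueError when
    the buffer runs out, the same kind of condition Ceph reports as
    buffer::end_of_buffer.
    """
    if len(buf) < offset + 4:
        raise ValueError("end of buffer while reading the length prefix")
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if len(buf) < end:
        raise ValueError("end of buffer while reading the string body")
    return buf[start:end].decode("ascii"), end


if __name__ == "__main__":
    name, next_offset = decode_length_prefixed_string(DUMP_PREFIX)
    print(name, next_offset)  # cbt-librbdfio-incerta05-foo.rbd 35
```

Here the length prefix is 0x1f (31), and the 31 bytes that follow spell the same object name that appears in the OSD's osd_op_reply log line, so at least the start of the payload matches what the OSD logged.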
Updated by Mark Nelson about 8 years ago
For what it's worth, I tried converting the hex dump to a binary file (with some sed magic found on the internet to make xxd accept it):
sed 's/^[0-9]*//' hexdump | xxd -r -p | dd conv=swab of=binaryfile
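The sed, xxd, and dd pipeline above relies on the exact shape of the log line. A more explicit alternative is sketched below in Python: it keeps only the hex byte columns of each dump row and ignores the timestamp prefix, the offsets, and the |...| ASCII column, so it handles the rows either as logged or flattened onto one line. The row pattern (an 8-digit hex offset, up to 16 two-digit hex bytes, then the ASCII column) is an assumption based on the dump in this report, and hexdump_to_bytes is a hypothetical helper name. Unlike the pipeline above, it does not pass the bytes through dd conv=swab, which swaps each adjacent pair of bytes, so the output keeps the byte order shown in the dump.

```python
import re
from pathlib import Path

# Client-side hex dump from this report: offset, hex bytes, ASCII column.
HEX_DUMP = """\
00000000 1f 00 00 00 63 62 74 2d 6c 69 62 72 62 64 66 69 |....cbt-librbdfi|
00000010 6f 2d 69 6e 63 65 72 74 61 30 35 2d 66 6f 6f 2e |o-incerta05-foo.|
00000020 72 62 64 01 02 00 00 00 00 00 00 00 c7 68 9a 98 |rbd..........h..|
00000030 ff ff ff ff 19 00 40 00 00 00 00 00 fe ff ff ff |......@.........|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 79 00 00 00 |............y...|
00000050 01 00 00 00 02 12 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 00 00 00 00 00 00 00 3b 01 00 00 00 00 |..........;.....|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 |.......|
00000097
"""

# Assumed row layout: 8 hex digits of offset, 1-16 hex byte pairs, then "|".
ROW_RE = re.compile(
    r"\b[0-9a-f]{8}((?:[ \t]+[0-9a-f]{2}){1,16})[ \t]+\|", re.IGNORECASE
)


def hexdump_to_bytes(text: str) -> bytes:
    """Collect the hex byte columns of every dump row, ignoring offsets and ASCII."""
    out = bytearray()
    for match in ROW_RE.finditer(text):
        # Join the pairs without whitespace so bytes.fromhex sees plain hex.
        out += bytes.fromhex("".join(match.group(1).split()))
    return bytes(out)


if __name__ == "__main__":
    payload = hexdump_to_bytes(HEX_DUMP)
    Path("binaryfile").write_bytes(payload)
    print(len(payload))  # 151 (0x97), matching the dump's final offset
```

Running it writes binaryfile, which could then be fed to the same ceph-dencoder commands shown next.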
and running it through ceph-dencoder:
[ubuntu@incerta05 issue_15314]$ ceph-dencoder import binaryfile type MOSDOpReply dump_json
{
    "summary": "osd_op_reply(0 [] v0'0 uv0 ack = 0)"
}
If we add the decode step in, it fails though:
[ubuntu@incerta05 issue_15314]$ ceph-dencoder import binaryfile type MOSDOpReply decode dump_json
error: buffer::end_of_buffer
Updated by Josh Durgin about 8 years ago
You've got librbd1 10.0.2 installed; hopefully this is just a temporary incompatibility from a dev branch. Try librbd 10.1.0.
Updated by Mark Nelson about 8 years ago
Hi Josh,
This is 10.1.0 compiled locally and installed in /usr/local using make install:
[ubuntu@incerta05 lib]$ which rbd
/usr/local/bin/rbd
[ubuntu@incerta05 lib]$ ldd /usr/local/bin/rbd | grep librbd
librbd.so.1 => /usr/local/lib/librbd.so.1 (0x00007f17e6d88000)
[ubuntu@incerta05 lib]$ ls -al librbd.so.1
lrwxrwxrwx. 1 root root 15 Mar 29 09:49 librbd.so.1 -> librbd.so.1.0.0
[perf@incerta05 librbd]$ strings /usr/local/lib/librbd.so.1.0.0 | grep 10.1.0
10.1.0
[perf@incerta05 librbd]$ strings /usr/local/lib/librbd.so.1.0.0 | grep 10.0.2
[perf@incerta05 librbd]$
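The strings checks above only inspect /usr/local/lib/librbd.so.1.0.0, so a stray copy of a library installed somewhere else would not show up in them. A rough Python equivalent of the strings | grep check, sketched below, can be pointed at several candidate files in one run. Which files to pass (for example other librbd or librados copies on the system) is an assumption left to the reader, and the script name in the usage comment is hypothetical. It matches version strings literally, whereas grep treats the dots in 10.1.0 as wildcards.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, List

# Runs of at least 4 printable ASCII bytes, roughly what `strings` prints by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def versions_in_bytes(data: bytes, versions: Iterable[str]) -> List[str]:
    """Return the entries of `versions` found inside a printable run of `data`."""
    runs = PRINTABLE_RUN.findall(data)
    found = []
    for version in versions:
        needle = version.encode("ascii")
        # Literal substring match, unlike grep's regex interpretation of ".".
        if any(needle in run for run in runs):
            found.append(version)
    return found


def versions_in_file(path: Path, versions: Iterable[str]) -> List[str]:
    """Read a whole file (e.g. a shared library) and scan it for version strings."""
    return versions_in_bytes(path.read_bytes(), versions)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python find_versions.py 10.1.0,10.0.2 FILE [FILE ...]
    wanted = sys.argv[1].split(",")
    for name in sys.argv[2:]:
        print(name, versions_in_file(Path(name), wanted))
```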
Updated by Mark Nelson about 8 years ago
Aha, thanks to Josh we found that we did in fact have a stray copy of 10.0.2 installed, so this is probably an incompatibility with that earlier version. Closing now.
Updated by Mark Nelson about 8 years ago
- Category deleted (OSD)
- Status changed from New to Closed
- Assignee deleted (Samuel Just)