Project

General

Profile

Actions

Bug #3413

closed

rbd bench-write fails with assert when rbd caching turned on

Added by Dan Mick over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With 1152656c620c347cfd17035ea3e41711a79b86ce (and earlier 59839a8f9030c28391c7f99aa52a8c58a74c0066). Some kind of race with caching
I'm guessing.

2012-10-25 22:33:00.520533 7f26fa32f780 -1 ./common/Mutex.h: In function 'void Mutex::Lock(bool)' thread 7f26fa32f780 time 2012-10-25 22:33:00.519048
./common/Mutex.h: 113: FAILED assert(r 0)

ceph version 0.53-463-g1152656 (1152656c620c347cfd17035ea3e41711a79b86ce)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x4ad64d]
2: (Mutex::Lock(bool)+0x70) [0x489eb6]
3: /home/dmick/src/ceph/ceph/src/.libs/lt-rbd() [0x482801]
4: (librbd::AioCompletion::complete()+0xe0) [0x7f26f9e95310]
5: (librbd::AioCompletion::finish_adding_requests()+0x80) [0x7f26f9ec7c24]
6: (librbd::aio_write(librbd::ImageCtx*, unsigned long, unsigned long, char const*, librbd::AioCompletion*)+0xa19) [0x7f26f9ec54dc]
7: (librbd::Image::aio_write(unsigned long, unsigned long, ceph::buffer::list&, librbd::RBD::AioCompletion*)+0x75) [0x7f26f9e8f4c3]
8: (rbd_bencher::start_write(int, unsigned long, unsigned long, ceph::buffer::list&)+0xa9) [0x48b13b]
9: /home/dmick/src/ceph/ceph/src/.libs/lt-rbd() [0x4829fa]
10: (main()+0x2fe3) [0x488d93]
11: (__libc_start_main()+0xed) [0x7f26f6e0a76d]
12: /home/dmick/src/ceph/ceph/src/.libs/lt-rbd() [0x47fd79]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
49> 2012-10-25 22:33:00.453654 7f26fa32f780 5 asok(0x1574a50) register_command perfcounters_dump hook 0x15749b0
-48> 2012-10-25 22:33:00.453678 7f26fa32f780 5 asok(0x1574a50) register_command 1 hook 0x15749b0
-47> 2012-10-25 22:33:00.453682 7f26fa32f780 5 asok(0x1574a50) register_command perf dump hook 0x15749b0
-46> 2012-10-25 22:33:00.453687 7f26fa32f780 5 asok(0x1574a50) register_command perfcounters_schema hook 0x15749b0
-45> 2012-10-25 22:33:00.453689 7f26fa32f780 5 asok(0x1574a50) register_command 2 hook 0x15749b0
-44> 2012-10-25 22:33:00.453691 7f26fa32f780 5 asok(0x1574a50) register_command perf schema hook 0x15749b0
-43> 2012-10-25 22:33:00.453694 7f26fa32f780 5 asok(0x1574a50) register_command config show hook 0x15749b0
-42> 2012-10-25 22:33:00.453696 7f26fa32f780 5 asok(0x1574a50) register_command config set hook 0x15749b0
-41> 2012-10-25 22:33:00.453699 7f26fa32f780 5 asok(0x1574a50) register_command log flush hook 0x15749b0
-40> 2012-10-25 22:33:00.453702 7f26fa32f780 5 asok(0x1574a50) register_command log dump hook 0x15749b0
-39> 2012-10-25 22:33:00.453706 7f26fa32f780 5 asok(0x1574a50) register_command log reopen hook 0x15749b0
-38> 2012-10-25 22:33:00.456682 7f26fa32f780 1 librados: starting msgr at :/0
-37> 2012-10-25 22:33:00.456697 7f26fa32f780 1 librados: starting objecter
-36> 2012-10-25 22:33:00.456812 7f26fa32f780 1 -
:/0 messenger.start
35> 2012-10-25 22:33:00.456830 7f26fa32f780 1 librados: setting wanted keys
-34> 2012-10-25 22:33:00.456835 7f26fa32f780 1 librados: calling monclient init
-33> 2012-10-25 22:33:00.456881 7f26fa32f780 2 monclient(hunting): cephx auth is not supported, ignoring absence of keyring
-32> 2012-10-25 22:33:00.457146 7f26fa32f780 1 -
:/1015930 --> 192.168.107.227:6790/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x157f890 con 0x157f500
31> 2012-10-25 22:33:00.457512 7f26fa327700 1 - 192.168.107.227:0/1015930 learned my addr 192.168.107.227:0/1015930
30> 2012-10-25 22:33:00.458380 7f26f5de6700 1 - 192.168.107.227:0/1015930 < mon.1 192.168.107.227:6790/0 1 ==== mon_map v1 ==== 473+0+0 (344731583 0 0) 0x7f26e4000b10 con 0x157f500
29> 2012-10-25 22:33:00.458548 7f26f5de6700 1 monclient(hunting): found mon.b
-28> 2012-10-25 22:33:00.458581 7f26f5de6700 1 -
192.168.107.227:0/1015930 <== mon.1 192.168.107.227:6790/0 2 ==== auth_reply(proto 1 0 Success) v1 ==== 24+0+0 (1636446754 0 0) 0x7f26e4000eb0 con 0x157f500
27> 2012-10-25 22:33:00.458622 7f26f5de6700 1 - 192.168.107.227:0/1015930 --> 192.168.107.227:6790/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x157fc50 con 0x157f500
26> 2012-10-25 22:33:00.458699 7f26fa32f780 5 monclient: authenticate success, global_id 10807
-25> 2012-10-25 22:33:00.458782 7f26fa32f780 5 asok(0x1574a50) register_command objecter_requests hook 0x157f170
-24> 2012-10-25 22:33:00.458825 7f26fa32f780 1 -
192.168.107.227:0/1015930 --> 192.168.107.227:6790/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x157fc50 con 0x157f500
23> 2012-10-25 22:33:00.458866 7f26fa32f780 1 - 192.168.107.227:0/1015930 --> 192.168.107.227:6790/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x1580600 con 0x157f500
22> 2012-10-25 22:33:00.458887 7f26fa32f780 1 librados: waiting for osdmap
-21> 2012-10-25 22:33:00.459029 7f26f5de6700 1 -
192.168.107.227:0/1015930 <== mon.1 192.168.107.227:6790/0 3 ==== mon_map v1 ==== 473+0+0 (344731583 0 0) 0x7f26e4000b10 con 0x157f500
20> 2012-10-25 22:33:00.459081 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== mon.1 192.168.107.227:6790/0 4 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3537913511 0 0) 0x7f26e4000eb0 con 0x157f500
19> 2012-10-25 22:33:00.459332 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== mon.1 192.168.107.227:6790/0 5 ==== osd_map(168..168 src has 1..168) v3 ==== 3359+0+0 (2050252776 0 0) 0x7f26e40008c0 con 0x157f500
18> 2012-10-25 22:33:00.460334 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== mon.1 192.168.107.227:6790/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3537913511 0 0) 0x7f26e4001b00 con 0x157f500
17> 2012-10-25 22:33:00.460376 7f26fa32f780 1 librados: init done
-16> 2012-10-25 22:33:00.460373 7f26f5de6700 1 -
192.168.107.227:0/1015930 <== mon.1 192.168.107.227:6790/0 7 ==== osd_map(168..168 src has 1..168) v3 ==== 3359+0+0 (2050252776 0 0) 0x7f26e4002a30 con 0x157f500
15> 2012-10-25 22:33:00.460427 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== mon.1 192.168.107.227:6790/0 8 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3537913511 0 0) 0x7f26e4002d40 con 0x157f500
14> 2012-10-25 22:33:00.461192 7f26fa32f780 1 - 192.168.107.227:0/1015930 --> 192.168.107.227:6803/15569 -- osd_op(client.10807.0:1 im2.rbd [stat] 7.8ef026f9) v4 -- ?+0 0x15825d0 con 0x1582280
13> 2012-10-25 22:33:00.462314 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== osd.1 192.168.107.227:6803/15569 1 ==== osd_op_reply(1 im2.rbd [stat] = 2 (No such file or directory)) v4 ==== 106+0+0 (1278069516 0 0) 0x7f26d40009a0 con 0x1582280
-12> 2012-10-25 22:33:00.462529 7f26fa32f780 1 -
192.168.107.227:0/1015930 --> 192.168.107.227:6803/15569 -- osd_op(client.10807.0:2 rbd_id.im2 [stat] 7.17a09b08) v4 -- ?+0 0x15825d0 con 0x1582280
11> 2012-10-25 22:33:00.463381 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== osd.1 192.168.107.227:6803/15569 2 ==== osd_op_reply(2 rbd_id.im2 [stat] = 0) v4 ==== 109+0+16 (526847941 0 2667632798) 0x7f26d40009a0 con 0x1582280
10> 2012-10-25 22:33:00.463573 7f26fa32f780 1 - 192.168.107.227:0/1015930 --> 192.168.107.227:6803/15569 -- osd_op(client.10807.0:3 rbd_id.im2 [call rbd.get_id] 7.17a09b08) v4 -- ?+0 0x1582b90 con 0x1582280
9> 2012-10-25 22:33:00.466001 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== osd.1 192.168.107.227:6803/15569 3 ==== osd_op_reply(3 rbd_id.im2 [call] = 0) v4 ==== 109+0+16 (3358251403 0 281962594) 0x7f26d4000a50 con 0x1582280
8> 2012-10-25 22:33:00.466220 7f26fa32f780 1 - 192.168.107.227:0/1015930 --> 192.168.107.227:6803/15569 -- osd_op(client.10807.0:4 rbd_header.246c6b85a474 [call rbd.get_size,call rbd.get_object_prefix] 7.9e454233) v4 -- ?+0 0x1586010 con 0x1582280
7> 2012-10-25 22:33:00.467638 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== osd.1 192.168.107.227:6803/15569 4 ==== osd_op_reply(4 rbd_header.246c6b85a474 [call,call] = 0) v4 ==== 164+0+34 (2072588659 0 3902593591) 0x7f26d4000a50 con 0x1582280
6> 2012-10-25 22:33:00.467846 7f26fa32f780 1 - 192.168.107.227:0/1015930 --> 192.168.107.227:6803/15569 -- osd_op(client.10807.0:5 rbd_header.246c6b85a474 [call rbd.get_stripe_unit_count] 7.9e454233) v4 -- ?+0 0x1581bb0 con 0x1582280
5> 2012-10-25 22:33:00.469472 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== osd.1 192.168.107.227:6803/15569 5 ==== osd_op_reply(5 rbd_header.246c6b85a474 [call] = 0) v4 ==== 122+0+16 (2859042505 0 74706562) 0x7f26d4000a50 con 0x1582280
4> 2012-10-25 22:33:00.469837 7f26fa32f780 1 - 192.168.107.227:0/1015930 --> 192.168.107.227:6803/15569 -- osd_op(client.10807.0:6 rbd_header.246c6b85a474 [call rbd.get_size,call rbd.get_features,call rbd.get_snapcontext,call rbd.get_parent,call lock.get_info] 7.9e454233) v4 -- ?+0 0x1586010 con 0x1582280
3> 2012-10-25 22:33:00.472602 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== osd.1 192.168.107.227:6803/15569 6 ==== osd_op_reply(6 rbd_header.246c6b85a474 [call,call,call,call,call] = 0) v4 ==== 290+0+80 (345967075 0 4056393194) 0x7f26d4000a50 con 0x1582280
2> 2012-10-25 22:33:00.473028 7f26fa32f780 1 - 192.168.107.227:0/1015930 --> 192.168.107.227:6803/15569 -- osd_op(client.10807.0:7 rbd_header.246c6b85a474 [assert-version 0~4,watch 1~0] 7.9e454233) v4 -- ?+0 0x1583240 con 0x1582280
1> 2012-10-25 22:33:00.518295 7f26f5de6700 1 - 192.168.107.227:0/1015930 <== osd.1 192.168.107.227:6803/15569 7 ==== osd_op_reply(7 rbd_header.246c6b85a474 [assert-version 0~4,watch 1~0] = 0) v4 ==== 164+0+0 (3112557614 0 0) 0x7f26d4000a20 con 0x1582280
0> 2012-10-25 22:33:00.520533 7f26fa32f780 -1 ./common/Mutex.h: In function 'void Mutex::Lock(bool)' thread 7f26fa32f780 time 2012-10-25 22:33:00.519048
./common/Mutex.h: 113: FAILED assert(r == 0)

Actions #1

Updated by Josh Durgin over 11 years ago

  • Status changed from New to 12
  • Assignee set to Josh Durgin
Actions #2

Updated by Ian Colle over 11 years ago

  • Project changed from Ceph to rbd
Actions #3

Updated by Josh Durgin over 11 years ago

  • Status changed from 12 to Fix Under Review

branch wip-rbd-bench-write

Actions #4

Updated by Josh Durgin over 11 years ago

  • Status changed from Fix Under Review to Resolved

commit:d81ac8418f9e6bbc9adcc69b2e7cb98dd4db6abb

Actions

Also available in: Atom PDF