Bug #19983
Closed
osds abort on shutdown with assert(/build/ceph-12.0.2/src/os/bluestore/KernelDevice.cc: 364: FAILED assert(r >= 0))
Description
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
Setup: BlueStore + RBD + EC + overwrite + iSCSI; EC k/m = 4/2; 12 OSDs
2017-05-17 17:00:24.446351 7f994a645700 -1 /build/ceph-12.0.2/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f994a645700 time 2017-05-17 17:00:24.442872
/build/ceph-12.0.2/src/os/bluestore/KernelDevice.cc: 364: FAILED assert(r >= 0)
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563f2f015072]
2: (KernelDevice::_aio_thread()+0x1301) [0x563f2ef9ac61]
3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x563f2ef9d45d]
4: (()+0x76ba) [0x7f99523176ba]
5: (clone()+0x6d) [0x7f995138e82d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Brad Hubbard almost 7 years ago
/a/bhubbard-2017-05-24_05:25:43-rados-wip-badone-testing---basic-smithi/1224591/teuthology.log
2017-05-24T07:27:09.928 INFO:tasks.workunit.client.0.smithi084.stderr:2017-05-24 07:27:09.927439 7f8f003e9700 0 -- 172.21.15.84:0/674029674 >> 172.21.15.84:6801/836721 pipe(0x7f8f0801bd10 sd=9 :56304 s=2 pgs=168 cs=1 l=1 c=0x7f8f0801e360).injecting socket failure
2017-05-24T07:27:09.931 INFO:tasks.workunit.client.0.smithi084.stderr:2017-05-24 07:27:09.927602 7f8f2d86c980 0 -- 172.21.15.84:0/674029674 submit_message osd_op(client.4304.0:127 5.7 5:e49d7777:::benchmark_data_smithi084_844084_object126:head [omap-set-vals] snapc 0=[] ondisk+write+known_if_redirected e40) v8 remote, 172.21.15.84:6801/836721, failed lossy con, dropping message 0x55f4a4642fd0
2017-05-24T07:27:10.052 INFO:tasks.ceph.osd.2.smithi084.stderr:/build/ceph-12.0.2-1221-geb5c02d/src/os/bluestore/BlockDevice.h: In function 'void IOContext::aio_wake()' thread 7f6732579700 time 2017-05-24 07:27:10.052106
2017-05-24T07:27:10.055 INFO:tasks.ceph.osd.2.smithi084.stderr:/build/ceph-12.0.2-1221-geb5c02d/src/os/bluestore/BlockDevice.h: 65: FAILED assert(num_running == 0)
2017-05-24T07:27:10.059 INFO:tasks.ceph.osd.2.smithi084.stderr: ceph version 12.0.2-1221-geb5c02d (eb5c02df634b34ca45ecbf1eaf4440ccf806845f)
2017-05-24T07:27:10.063 INFO:tasks.ceph.osd.2.smithi084.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x56299e8f14e2]
2017-05-24T07:27:10.067 INFO:tasks.ceph.osd.2.smithi084.stderr: 2: (KernelDevice::_aio_thread()+0xd94) [0x56299e898384]
2017-05-24T07:27:10.076 INFO:tasks.ceph.osd.2.smithi084.stderr: 3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x56299e89aa9d]
2017-05-24T07:27:10.079 INFO:tasks.ceph.osd.2.smithi084.stderr: 4: (()+0x770a) [0x7f673d09170a]
2017-05-24T07:27:10.082 INFO:tasks.ceph.osd.2.smithi084.stderr: 5: (clone()+0x6d) [0x7f673c10882d]
2017-05-24T07:27:10.088 INFO:tasks.ceph.osd.2.smithi084.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actually, this may be a different issue, but it appears to be a similar race in the same function.
Updated by Brad Hubbard almost 7 years ago
- Priority changed from Normal to Urgent
Updated by xw zhang almost 7 years ago
I pulled out a disk, and then the problem occurred.
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category set to Correctness/Safety
- Component(RADOS) BlueStore added
Updated by Sage Weil almost 7 years ago
- Status changed from New to Need More Info
Do you mean you pulled out the disk, and then ceph-osd crashed? That is normal--the disk is gone!
Or, do you mean that you pulled the disk, rebooted the server, and ceph-osd crashed on startup?
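To illustrate why a vanished disk can end in a failed `assert(r >= 0)`, here is a minimal sketch. It is not the Ceph source: it assumes that `r` is an I/O result where a negative value is an error code, which the report does not confirm, and `Completion`, `strict_check`, and `first_failure` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one completed asynchronous I/O request.
struct Completion {
    long result;  // bytes transferred on success, negative errno on failure
};

// Strict policy with the same shape as the failed check in the log:
// any negative result aborts the process (assumption, see lead-in).
void strict_check(const Completion& c) {
    assert(c.result >= 0);
}

// Returns the index of the first failed completion in a batch, or -1 if
// every request succeeded. A strict loop would abort at that index.
int first_failure(const std::vector<Completion>& batch) {
    for (std::size_t i = 0; i < batch.size(); ++i) {
        if (batch[i].result < 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Under this assumption, a batch such as `{4096, -EIO, 4096}` (a read failing with an I/O error once the device is gone) would trip `strict_check` on its second entry, which matches the "disk is gone" explanation above.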
Updated by Sage Weil almost 7 years ago
- Status changed from Need More Info to Closed