Bug #23426
aio thread got No space left on device
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
smoke
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Seems reproducible on all distros tested (xenial and centos runs below).
Runs:
http://pulpito.ceph.com/teuthology-2018-03-20_05:02:01-smoke-master-testing-basic-ovh/ xenial
http://pulpito.ceph.com/teuthology-2018-03-20_07:02:02-smoke-master-testing-basic-ovh/ centos
Jobs:
['2308443', '2308423'] xenial
['2307549', '2307569'] centos
2018-03-20T09:06:06.125 INFO:teuthology.orchestra.run.ovh086:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:06:36.430 INFO:teuthology.orchestra.run.ovh007:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:06:36.552 INFO:teuthology.orchestra.run.ovh076:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:06:36.669 INFO:teuthology.orchestra.run.ovh086:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:06.856 INFO:teuthology.orchestra.run.ovh007:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:06.974 INFO:teuthology.orchestra.run.ovh076:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:07.087 INFO:teuthology.orchestra.run.ovh086:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:37.342 INFO:teuthology.orchestra.run.ovh007:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:37.402 INFO:teuthology.orchestra.run.ovh076:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:37.510 INFO:teuthology.orchestra.run.ovh086:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2018-03-20T09:07:52.297 INFO:tasks.ceph.osd.0.ovh076.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.0.1-3181-g820dac9/rpm/el7/BUILD/ceph-13.0.1-3181-g820dac9/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7fa42ae61700 time 2018-03-20 09:07:52.286519
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.0.1-3181-g820dac9/rpm/el7/BUILD/ceph-13.0.1-3181-g820dac9/src/os/bluestore/KernelDevice.cc: 417: FAILED assert(0 == "got unexpected error from io_getevents")
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr:2018-03-20 09:07:52.285 7fa42ae61700 -1 bdev(0x56304ec1a000 /var/lib/ceph/osd/ceph-0/block) _aio_thread got (28) No space left on device
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: ceph version 13.0.1-3181-g820dac9 (820dac980e9416fe05998d50cac633c81a87b9e3) mimic (dev)
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7fa43774bc8f]
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: 2: (()+0x278e77) [0x7fa43774be77]
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: 3: (KernelDevice::_aio_thread()+0xd71) [0x56304c85fe51]
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: 4: (KernelDevice::AioCompletionThread::entry()+0xd) [0x56304c86366d]
2018-03-20T09:07:52.298 INFO:tasks.ceph.osd.0.ovh076.stderr: 5: (()+0x7e25) [0x7fa434d25e25]
2018-03-20T09:07:52.299 INFO:tasks.ceph.osd.0.ovh076.stderr: 6: (clone()+0x6d) [0x7fa433e1934d]
2018-03-20T09:07:52.299 INFO:tasks.ceph.osd.0.ovh076.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Related issues
History
#1 Updated by Yuri Weinstein about 6 years ago
- Related to Bug #23333: bluestore: ENODATA on aio added
#2 Updated by Yuri Weinstein about 6 years ago
might be dupe of #23333
#3 Updated by Radoslaw Zarzynski about 6 years ago
- Assignee set to Radoslaw Zarzynski
#4 Updated by Radoslaw Zarzynski about 6 years ago
- Status changed from New to 12
Yeah, the assertion came from aio_t::get_return_value. It might be because, e.g., the driver returned BLK_STS_NOSPC or the node really is running out of space. To know more I would need to look at dmesg, which the linked directory lacks. Is it obtainable?
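For readers unfamiliar with Linux AIO, here is a minimal sketch of how a single completion's result can be interpreted. It assumes only the kernel convention that a failed request reports a negative errno in the `res` field of its completion event (the block layer maps BLK_STS_NOSPC to -ENOSPC). `CompletedIo`, `describe_error` and `completion_ok` are hypothetical names, not Ceph's `aio_t` API, and the real `KernelDevice::_aio_thread()` handling may differ.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical stand-in for the `res` field of a completed Linux AIO event.
// On success it holds the byte count; on failure, a negative errno.
struct CompletedIo {
  long res;
};

// Format a negative result the way the OSD log line reads,
// e.g. "(28) No space left on device" on Linux.
std::string describe_error(long res) {
  const int err = static_cast<int>(-res);
  return "(" + std::to_string(err) + ") " + std::strerror(err);
}

// Returns true if the completion succeeded. In this sketch any negative
// result is a failure and its description is written to *error_out;
// in the crash above, such a failure hit the FAILED assert.
bool completion_ok(const CompletedIo& ev, std::string* error_out) {
  if (ev.res >= 0) {
    return true;  // res is the number of bytes transferred
  }
  if (error_out != nullptr) {
    *error_out = describe_error(ev.res);
  }
  return false;
}
```

With `res == -ENOSPC`, `completion_ok` returns false and fills in the same "(28) No space left on device" text that appears in the log, which is why the message alone cannot distinguish a driver-reported NOSPC from a genuinely full host.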
#5 Updated by Sage Weil almost 6 years ago
see remote/*/log/syslog/*
#6 Updated by Sage Weil almost 6 years ago
- Subject changed from "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102)" in smoke to aio thread got No space left on device
- Status changed from 12 to Won't Fix
this looks like a provisioning/test error, not a bug, if we're getting ENOSPC.
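If the ENOSPC originates on the host rather than in the device driver, a free-space check before the run would surface the provisioning problem before the OSD asserts. Below is a minimal C++17 sketch using `std::filesystem::space`, assuming the OSD's `block` is backed by a file on a host filesystem (the report does not say so). `has_enough_space`, the path and the threshold are hypothetical, not teuthology settings.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical pre-run check: does the filesystem holding `path` still have
// at least `min_free_bytes` available to unprivileged writers?
bool has_enough_space(const std::filesystem::path& path,
                      std::uintmax_t min_free_bytes) {
  std::error_code ec;
  const std::filesystem::space_info info = std::filesystem::space(path, ec);
  if (ec) {
    return false;  // an unreadable or missing path counts as "not enough space"
  }
  return info.available >= min_free_bytes;
}
```

For example, `has_enough_space("/var/lib/ceph", 10ull << 30)` would require 10 GiB available. Older GCC releases may need `-lstdc++fs` at link time.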