Bug #23598
closed
hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade test
Description
Test description: rados/upgrade/{hammer-x-singleton/{0-cluster/{openstack.yaml start.yaml} 1-hammer-install/hammer.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/{rbd-cls.yaml rbd-import-export.yaml readwrite.yaml snaps-few-objects.yaml} 6-next-mon/monb.yaml 7-workload/{radosbench.yaml rbd_api.yaml} 8-next-mon/monc.yaml 9-workload/{ec-rados-plugin=jerasure-k=3-m=1.yaml rbd-python.yaml rgw-swift.yaml snaps-many-objects.yaml test_cache-pool-snaps.yaml}} rados.yaml}
Symptom: crash during radosbench
Log excerpt:
2018-04-08T20:51:22.888 INFO:teuthology.task.full_sequential:In full_sequential, running task radosbench...
2018-04-08T20:51:22.888 INFO:tasks.radosbench:Beginning radosbench...
After some time, but still within radosbench:
2018-04-08T21:06:37.005 INFO:tasks.rados.rados.0.smithi130.stderr:./test/osd/RadosModel.h: In function 'virtual void CopyFromOp::_finish(TestOp::CallbackInfo*)' thread 7f11527fc700 time 2018-04-08 21:06:37.005490
2018-04-08T21:06:37.006 INFO:tasks.rados.rados.0.smithi130.stderr:./test/osd/RadosModel.h: 1597: FAILED assert(!version || comp->get_version64() == version)
2018-04-08T21:06:37.006 INFO:tasks.rados.rados.0.smithi130.stderr: ceph version 0.94.10-85-ga8e54ce (a8e54cee69fc2fdc8df27f35ebe1b56444f43317)
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x4e2cb5]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 2: (CopyFromOp::_finish(TestOp::CallbackInfo*)+0x4bb) [0x4c940b]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 3: (write_callback(void*, void*)+0x19) [0x4d9e49]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 4: (()+0x99b4d) [0x7f115f8bcb4d]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 5: (()+0x73379) [0x7f115f896379]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 6: (()+0x13eb88) [0x7f115f961b88]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 7: (()+0x7e25) [0x7f115ec8be25]
2018-04-08T21:06:37.008 INFO:tasks.rados.rados.0.smithi130.stderr: 8: (clone()+0x6d) [0x7f115dd8c34d]
2018-04-08T21:06:37.008 INFO:tasks.rados.rados.0.smithi130.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2018-04-08T21:06:37.008 INFO:tasks.rados.rados.0.smithi130.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
Alternatively (in some runs), the crash is:
2018-04-08T09:02:33.922 INFO:tasks.rados.rados.0.smithi016.stderr:Error: finished tid 1 when last_acked_tid was 6
2018-04-08T09:02:33.922 INFO:tasks.rados.rados.0.smithi016.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f953ffff700 time 2018-04-08 09:02:33.913642
2018-04-08T09:02:33.922 INFO:tasks.rados.rados.0.smithi016.stderr:./test/osd/RadosModel.h: 854: FAILED assert(0)
2018-04-08T09:02:33.922 INFO:tasks.rados.rados.0.smithi016.stderr: ceph version 0.94.10-85-ga8e54ce (a8e54cee69fc2fdc8df27f35ebe1b56444f43317)
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x4e2cb5]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x4a3) [0x4c9ce3]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 3: (write_callback(void*, void*)+0x19) [0x4d9e49]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 4: (()+0x99b4d) [0x7f9559a7ab4d]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 5: (()+0x73379) [0x7f9559a54379]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 6: (()+0x13eb88) [0x7f9559b1fb88]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 7: (()+0x7e25) [0x7f9558e49e25]
2018-04-08T09:02:33.924 INFO:tasks.rados.rados.0.smithi016.stderr: 8: (clone()+0x6d) [0x7f9557f4a34d]
2018-04-08T09:02:33.925 INFO:tasks.rados.rados.0.smithi016.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2018-04-08T09:02:33.925 INFO:tasks.rados.rados.0.smithi016.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
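The "finished tid 1 when last_acked_tid was 6" message suggests the test client expects write completions to arrive with strictly increasing tids. Below is a minimal Python sketch of such a check, inferred from the error message only. The class and method names are hypothetical; the real logic is C++ in test/osd/RadosModel.h and may differ in detail.

```python
class AckTracker:
    """Hypothetical model of an in-order completion check (not Ceph code)."""

    def __init__(self):
        self.last_acked_tid = 0

    def finish(self, tid):
        # A completion whose tid is not newer than the last acked one
        # is treated as out of order, mirroring the logged error text.
        if tid <= self.last_acked_tid:
            raise AssertionError(
                "Error: finished tid %d when last_acked_tid was %d"
                % (tid, self.last_acked_tid)
            )
        self.last_acked_tid = tid


tracker = AckTracker()
for tid in range(1, 7):
    tracker.finish(tid)  # tids 1..6 complete in order

try:
    tracker.finish(1)  # a late or repeated completion for tid 1
except AssertionError as err:
    message = str(err)
    print(message)
```

Under this model, the logged state (tid 1 finishing after tid 6 was acked) would correspond to a stale or repeated completion reaching the client. That is an interpretation of the log, not a confirmed diagnosis.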
Reproducibility: HIGH (4 in 4 tries)
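Since the crash shows up with two different assertion signatures, it may help to scan a teuthology log for every "FAILED assert" line when comparing runs. Below is a minimal Python sketch. The regular expression is an assumption based only on the line format in the excerpts above, and `find_failed_asserts` is a hypothetical helper, not part of teuthology.

```python
import re

# Matches "<file>: <line>: FAILED assert(<expr>)" as seen in the excerpts.
# The file part excludes ':' so the teuthology log prefix is not captured.
FAILED_ASSERT = re.compile(
    r"(?P<file>[^\s:]+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def find_failed_asserts(lines):
    """Return (file, line number, expression) for each FAILED assert line."""
    hits = []
    for text in lines:
        match = FAILED_ASSERT.search(text)
        if match:
            hits.append(
                (match.group("file"), int(match.group("line")), match.group("expr"))
            )
    return hits


sample = [
    "2018-04-08T21:06:37.006 INFO:tasks.rados.rados.0.smithi130.stderr:"
    "./test/osd/RadosModel.h: 1597: FAILED assert(!version || comp->get_version64() == version)",
    "2018-04-08T09:02:33.922 INFO:tasks.rados.rados.0.smithi016.stderr:"
    "./test/osd/RadosModel.h: 854: FAILED assert(0)",
    "2018-04-08T20:51:22.888 INFO:tasks.radosbench:Beginning radosbench...",
]

results = find_failed_asserts(sample)
for source_file, line_no, expr in results:
    print(source_file, line_no, expr)
```

To use it on a real log, pass an open file object (for example `find_failed_asserts(open(path))`, where `path` is wherever the teuthology log was saved).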
Updated by Nathan Cutler about 6 years ago
Set priority to Urgent because this prevents us from getting a clean rados run in jewel 10.2.11 integration testing.
Updated by Nathan Cutler about 6 years ago
- Related to Bug #22123: osd: objecter sends out of sync with pg epochs for proxied ops added
Updated by Nathan Cutler about 6 years ago
This problem was not happening so reproducibly before the current integration run, so one of the following PRs might be implicated:
https://github.com/ceph/ceph/pull/21200 - jewel: osd/PrimaryLogPG: dump snap_trimq size
https://github.com/ceph/ceph/pull/21199 - jewel: osd: replica read can trigger cache promotion
https://github.com/ceph/ceph/pull/21197 - jewel: ceph_authtool: add mode option
https://github.com/ceph/ceph/pull/20381 - jewel: librados: Double free in rados_getxattrs_next
https://github.com/ceph/ceph/pull/18010 - jewel: core: enable rocksdb for filestore
Updated by Nathan Cutler about 6 years ago
- Related to Bug #22063: "RadosModel.h: 1703: FAILED assert(!version || comp->get_version64() == version)" in rados-jewel-distro-basic-smithi added
Updated by Nathan Cutler about 6 years ago
- Subject changed from FAILED assert(!version || comp->get_version64() == version) in jewel rados upgrade test to FAILED assert(!version || comp->get_version64() == version) in radosbench in jewel rados upgrade test
Updated by Nathan Cutler about 6 years ago
rados bisect
Reproducer: --suite rados --filter="rados/upgrade/{hammer-x-singleton/{0-cluster/{openstack.yaml start.yaml} 1-hammer-install/hammer.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/{rbd-cls.yaml rbd-import-export.yaml readwrite.yaml snaps-few-objects.yaml} 6-next-mon/monb.yaml 7-workload/{radosbench.yaml rbd_api.yaml} 8-next-mon/monc.yaml 9-workload/{ec-rados-plugin=jerasure-k=3-m=1.yaml rbd-python.yaml rgw-swift.yaml snaps-many-objects.yaml test_cache-pool-snaps.yaml}} rados.yaml}" --num 5
Jewel baseline
- fail (all 5 failed) http://pulpito.ceph.com/smithfarm-2018-04-09_02:53:17-rados-jewel-distro-basic-smithi/
wip-jewel-backports
Updated by Nathan Cutler about 6 years ago
- Subject changed from FAILED assert(!version || comp->get_version64() == version) in radosbench in jewel rados upgrade test to ceph_test_rados crashes in radosbench task in jewel rados upgrade test
Updated by Nathan Cutler about 6 years ago
- Subject changed from ceph_test_rados crashes in radosbench task in jewel rados upgrade test to ceph_test_rados crashes during radosbench task in jewel rados upgrade test
Updated by Greg Farnum about 6 years ago
- Project changed from Ceph to RADOS
This is a dupe of...something. We can track it down later.
For now, note that the crash is happening with Hammer clients during an upgrade to Jewel.
Updated by Sage Weil almost 6 years ago
- Subject changed from ceph_test_rados crashes during radosbench task in jewel rados upgrade test to hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade test
Updated by Kefu Chai almost 6 years ago
- Related to Bug #23947: ceph_test_rados dumped core, Error: finished tid 1 when last_acked_tid was 6 added
Updated by Kefu Chai almost 6 years ago
- Related to Bug #23290: "/test/osd/RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi added
Updated by Kefu Chai almost 6 years ago
- Related to deleted (Bug #23290: "/test/osd/RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi)
Updated by Kefu Chai almost 6 years ago
- Is duplicate of Bug #23290: "/test/osd/RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi added
Updated by Kefu Chai almost 6 years ago
- Status changed from New to Duplicate
#23290 does not contain any of the PRs mentioned above, so it's not a regression.