Bug #23598

closed

hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade test

Added by Nathan Cutler about 6 years ago. Updated almost 6 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Test description: rados/upgrade/{hammer-x-singleton/{0-cluster/{openstack.yaml start.yaml} 1-hammer-install/hammer.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/{rbd-cls.yaml rbd-import-export.yaml readwrite.yaml snaps-few-objects.yaml} 6-next-mon/monb.yaml 7-workload/{radosbench.yaml rbd_api.yaml} 8-next-mon/monc.yaml 9-workload/{ec-rados-plugin=jerasure-k=3-m=1.yaml rbd-python.yaml rgw-swift.yaml snaps-many-objects.yaml test_cache-pool-snaps.yaml}} rados.yaml}

Symptom: crash during radosbench

Log excerpt:

2018-04-08T20:51:22.888 INFO:teuthology.task.full_sequential:In full_sequential, running task radosbench...
2018-04-08T20:51:22.888 INFO:tasks.radosbench:Beginning radosbench...

After some time, but still within radosbench:

2018-04-08T21:06:37.005 INFO:tasks.rados.rados.0.smithi130.stderr:./test/osd/RadosModel.h: In function 'virtual void CopyFromOp::_finish(TestOp::CallbackInfo*)' thread 7f11527fc700 time 2018-04-08 21:06:37.005490
2018-04-08T21:06:37.006 INFO:tasks.rados.rados.0.smithi130.stderr:./test/osd/RadosModel.h: 1597: FAILED assert(!version || comp->get_version64() == version)
2018-04-08T21:06:37.006 INFO:tasks.rados.rados.0.smithi130.stderr: ceph version 0.94.10-85-ga8e54ce (a8e54cee69fc2fdc8df27f35ebe1b56444f43317)
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x4e2cb5]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 2: (CopyFromOp::_finish(TestOp::CallbackInfo*)+0x4bb) [0x4c940b]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 3: (write_callback(void*, void*)+0x19) [0x4d9e49]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 4: (()+0x99b4d) [0x7f115f8bcb4d]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 5: (()+0x73379) [0x7f115f896379]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 6: (()+0x13eb88) [0x7f115f961b88]
2018-04-08T21:06:37.007 INFO:tasks.rados.rados.0.smithi130.stderr: 7: (()+0x7e25) [0x7f115ec8be25]
2018-04-08T21:06:37.008 INFO:tasks.rados.rados.0.smithi130.stderr: 8: (clone()+0x6d) [0x7f115dd8c34d]
2018-04-08T21:06:37.008 INFO:tasks.rados.rados.0.smithi130.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2018-04-08T21:06:37.008 INFO:tasks.rados.rados.0.smithi130.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'

Alternatively (in some runs), the crash is:

2018-04-08T09:02:33.922 INFO:tasks.rados.rados.0.smithi016.stderr:Error: finished tid 1 when last_acked_tid was 6
2018-04-08T09:02:33.922 INFO:tasks.rados.rados.0.smithi016.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f953ffff700 time 2018-04-08 09:02:33.913642
2018-04-08T09:02:33.922 INFO:tasks.rados.rados.0.smithi016.stderr:./test/osd/RadosModel.h: 854: FAILED assert(0)
2018-04-08T09:02:33.922 INFO:tasks.rados.rados.0.smithi016.stderr: ceph version 0.94.10-85-ga8e54ce (a8e54cee69fc2fdc8df27f35ebe1b56444f43317)
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x4e2cb5]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x4a3) [0x4c9ce3]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 3: (write_callback(void*, void*)+0x19) [0x4d9e49]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 4: (()+0x99b4d) [0x7f9559a7ab4d]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 5: (()+0x73379) [0x7f9559a54379]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 6: (()+0x13eb88) [0x7f9559b1fb88]
2018-04-08T09:02:33.923 INFO:tasks.rados.rados.0.smithi016.stderr: 7: (()+0x7e25) [0x7f9558e49e25]
2018-04-08T09:02:33.924 INFO:tasks.rados.rados.0.smithi016.stderr: 8: (clone()+0x6d) [0x7f9557f4a34d]
2018-04-08T09:02:33.925 INFO:tasks.rados.rados.0.smithi016.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2018-04-08T09:02:33.925 INFO:tasks.rados.rados.0.smithi016.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
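
The "finished tid 1 when last_acked_tid was 6" message suggests that the test client expects write completions to arrive with increasing tids. A minimal Python sketch of that kind of ordering check follows. It is an illustration inferred only from the log message above, not the code in RadosModel.h; the class name, method name, and the strictly-increasing rule are assumptions.

```python
class AckOrderChecker:
    """Hypothetical model of an in-order completion check.

    Assumption: a completion is valid only if its tid is strictly
    greater than the last tid that was acknowledged.
    """

    def __init__(self):
        self.last_acked_tid = 0

    def finish(self, tid):
        # A tid that is not newer than the last acked one is out of order.
        if tid <= self.last_acked_tid:
            raise AssertionError(
                "Error: finished tid %d when last_acked_tid was %d"
                % (tid, self.last_acked_tid)
            )
        self.last_acked_tid = tid
```

Under these assumptions, acknowledging tids 1 through 6 and then receiving a completion for tid 1 again raises an error with the same wording as the log line.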

Full log: http://qa-proxy.ceph.com/teuthology/smithfarm-2018-04-08_20:06:36-rados-wip-jewel-backports-distro-basic-smithi/2371658/teuthology.log
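
To pull the assertion failures out of a downloaded copy of the full log, something like the following sketch may help. It assumes the log lines have the same shape as the excerpts above; the function name and the regular expression are illustrative and not part of teuthology.

```python
import re

# Matches the tail of lines such as
# "...stderr:./test/osd/RadosModel.h: 854: FAILED assert(0)".
# The file name is taken as the run of non-space, non-colon characters
# immediately before ": <line>: FAILED assert(".
ASSERT_RE = re.compile(
    r"(?P<file>[^\s:]+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def find_failed_asserts(lines):
    """Return (source_file, line_number, expression) for each FAILED assert line."""
    hits = []
    for text in lines:
        match = ASSERT_RE.search(text)
        if match:
            hits.append(
                (match.group("file"), int(match.group("line")), match.group("expr"))
            )
    return hits
```

Passing the lines of a saved teuthology.log (for example `find_failed_asserts(open("teuthology.log"))`, where the local file name is an assumption) yields one tuple per failed assertion, which makes it easy to see whether a run hit the CopyFromOp or the WriteOp variant of the crash.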

Reproducibility: HIGH (4 out of 4 tries)


Related issues 4 (2 open, 2 closed)

Related to RADOS - Bug #22123: osd: objecter sends out of sync with pg epochs for proxied ops (Resolved, Sage Weil, 11/14/2017)
Related to RADOS - Bug #22063: "RadosModel.h: 1703: FAILED assert(!version || comp->get_version64() == version)" in rados-jewel-distro-basic-smithi (Duplicate, 11/07/2017)
Related to Ceph - Bug #23947: ceph_test_rados dumped core, Error: finished tid 1 when last_acked_tid was 6 (New)
Is duplicate of RADOS - Bug #23290: "/test/osd/RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi (New, 03/09/2018)
