Bug #9610

Crash "RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)'" in multi-version-giant-testing-basic-multi run

Added by Yuri Weinstein almost 5 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: High
Category: -
Target version: -
Start date: 09/27/2014
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-09-26_23:20:01-multi-version-giant-testing-basic-multi/514667/

2014-09-27T04:12:05.704 INFO:tasks.rados.rados.0.plana09.stderr:Error: finished tid 1 when last_acked_tid was 5
2014-09-27T04:12:05.705 INFO:tasks.rados.rados.0.plana09.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f9f75d26700 time 2014-09-27 04:12:05.704068
2014-09-27T04:12:05.705 INFO:tasks.rados.rados.0.plana09.stderr:./test/osd/RadosModel.h: 828: FAILED assert(0)
2014-09-27T04:12:05.705 INFO:tasks.rados.rados.0.plana09.stderr: ceph version 0.80.5-248-g1fafd6b (1fafd6bf2ef03672dfa27ec7a201a274927040b7)
2014-09-27T04:12:05.705 INFO:tasks.rados.rados.0.plana09.stderr: 1: (WriteOp::_finish(TestOp::CallbackInfo*)+0x318) [0x41a058]
2014-09-27T04:12:05.706 INFO:tasks.rados.rados.0.plana09.stderr: 2: (write_callback(void*, void*)+0x21) [0x427c21]
2014-09-27T04:12:05.706 INFO:tasks.rados.rados.0.plana09.stderr: 3: (librados::C_AioSafe::finish(int)+0x1d) [0x7f9f7c2daa0d]
2014-09-27T04:12:05.706 INFO:tasks.rados.rados.0.plana09.stderr: 4: (Context::complete(int)+0x9) [0x7f9f7c2b7fc9]
2014-09-27T04:12:05.706 INFO:tasks.rados.rados.0.plana09.stderr: 5: (Finisher::finisher_thread_entry()+0x1c0) [0x7f9f7c36bcd0]
2014-09-27T04:12:05.706 INFO:tasks.rados.rados.0.plana09.stderr: 6: (()+0x7e9a) [0x7f9f7bf1fe9a]
2014-09-27T04:12:05.706 INFO:tasks.rados.rados.0.plana09.stderr: 7: (clone()+0x6d) [0x7f9f7b7363fd]
2014-09-27T04:12:05.706 INFO:tasks.rados.rados.0.plana09.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-09-27T04:12:05.706 INFO:tasks.rados.rados.0.plana09.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
2014-09-27T04:12:06.056 ERROR:teuthology.run_tasks:Manager failed: rados
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 113, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/rados.py", line 190, in task
    running.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 331, in get
    raise self._exception
CommandCrashedError: Command crashed: 'CEPH_CLIENT_ID=0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_rados --op read 100 --op write 100 --op delete 50 --max-ops 4000 --objects 500 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op snap_create 50 --op snap_remove 50 --op rollback 50 --pool unique_pool_0'
2014-09-27T04:12:06.056 DEBUG:teuthology.run_tasks:Unwinding manager ceph
archive_path: /var/lib/teuthworker/archive/teuthology-2014-09-26_23:20:01-multi-version-giant-testing-basic-multi/514667
branch: giant
description: multi-version/dumpling-x/basic/{0-cluster/start.yaml 1-install/dumpling-firefly.yaml
  2-workload/rados_snap_many_objects.yaml}
email: ceph-qa@ceph.com
job_id: '514667'
kernel: &id001
  kdb: true
  sha1: e41b769d0435df25e88017418d335f41340a1e0e
last_in_suite: false
machine_type: plana,burnupi,mira
name: teuthology-2014-09-26_23:20:01-multi-version-giant-testing-basic-multi
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: giant
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    sha1: f8ac2248af8a7f04094c6d2f8844e928212ff6b0
  ceph-deploy:
    branch:
      dev: giant
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: f8ac2248af8a7f04094c6d2f8844e928212ff6b0
  s3tests:
    branch: giant
  workunit:
    sha1: f8ac2248af8a7f04094c6d2f8844e928212ff6b0
owner: scheduled_teuthology@teuthology
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
suite: multi-version
suite_branch: giant
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_giant
targets:
  ubuntu@mira089.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCvvy2N6V9ovNPiJHmAs24nAZ4IaTpweFC+gSle0LBbI6errtA20YzKOgL4sAXr5dKVOLys7Syg65IUJD4HkRLGh9Gkhoo4D7xnlYAR1vjiXcGEc2oH4W079SjMjskVCwWiWxCoUUnEni+MmHsHLGCPAHSrsFAIuDunS1UMtOtNYUWhZnnXik4D3i2/A5YJh6jV8Xq9AdaA+f6izBFcu7aErToz4G4263snGzpeMH4oi/FqrnspxqZOFAIX6OtKjXciyOukncx8uqu5gyl8+AkEIiaz+CQmihEyetVfhKOwv9c6bsshtSJIOkGmci23+HJwJhIVjMs0syJLvUSTFZxv
  ubuntu@plana09.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEYHVcLAeMRhz4qfwrWIFsf79HIqaChMbGC2LGuvVgkMM/a1uVGhsYFHWlfDKMo9UnShyfjTUUmpD9jjjNpfuejhynQeTYBLjjrvE2yS7J9chyxRhSCXUN1rnAa5UmcDzd9CJjltX9h18iHNDPRGu1H3gzaZzonQo9Hwk6H+Xhubz/Y7GYBRq4jySYCnQ11hNj2pdnwxfiqjNawjaB3yYYTnA6NT9QUEPNNtTgFStyuACTKBI1JowDdHaxafRz0lMnCoU+r84gSZ+TWqyNkj9+8tyir6QqXQRV9CDlAQPIPiMPVnl9u17vEISaIR/I5RL7G3KR5CxER/Kc0oIhXPn1
  ubuntu@plana13.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2MDThFziLaiDvsp2V/5t6Ua4N3WPUgbyQB48nGv1gshBgy/DJh5s7aM+UJMW/BSu2bUnggSJRKQfmm4HQz/9b0noL1jjJWfhto5I5Cz6p9vngFRGpoejTqZfVQLEJkZNG4RbN18y1N33rEfSfnVQhSByf9UTWyBFa+66FbuwSkRs2vlDsZS4k7Go02ZyiDjdhhvZ5BzjEQtmYqyEAW9S4M0q0DmY1v4S5LgfqESWriGBEQnuXbzvskIHjHj7qxRdxGJ3EvsS7FvBfC9tBqqvWf2DwhU7oaVQpblQ2LIQCJpRXC5cPcem/wgbN/HUpw3u6owvEMPLnLhiUbCCTbkI3
tasks:
- internal.lock_machines:
  - 3
  - plana,burnupi,mira
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- install.upgrade:
    client.0:
      branch: firefly
- ceph: null
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: master
tube: multi
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.multi.3138
client.0-kernel-sha1: e41b769d0435df25e88017418d335f41340a1e0e
description: multi-version/dumpling-x/basic/{0-cluster/start.yaml 1-install/dumpling-firefly.yaml
  2-workload/rados_snap_many_objects.yaml}
duration: 441.0656759738922
failure_reason: 'Command crashed: ''CEPH_CLIENT_ID=0 adjust-ulimits ceph-coverage
  /home/ubuntu/cephtest/archive/coverage ceph_test_rados --op read 100 --op write
  100 --op delete 50 --max-ops 4000 --objects 500 --max-in-flight 16 --size 4000000
  --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op snap_create
  50 --op snap_remove 50 --op rollback 50 --pool unique_pool_0'''
flavor: basic
mon.a-kernel-sha1: e41b769d0435df25e88017418d335f41340a1e0e
mon.b-kernel-sha1: e41b769d0435df25e88017418d335f41340a1e0e
owner: scheduled_teuthology@teuthology
success: false

Related issues

Duplicated by Ceph - Bug #9705: "RadosModel.h: 829: FAILED assert(0)" in multi-version-giant-distro-basic-multi run Duplicate 10/08/2014

Associated revisions

Revision 2687b3d8 (diff)
Added by Tamilarasi muthamizhan almost 5 years ago

s/ceph_test_rados/rados_loadgen_big workload
fixes: #9610

Signed-off-by: tamil <>

History

#1 Updated by Yuri Weinstein almost 5 years ago

Another similar crash in job http://pulpito.front.sepia.ceph.com/teuthology-2014-09-26_23:20:01-multi-version-giant-testing-basic-multi/514671/

Assertion: ./test/osd/RadosModel.h: 829: FAILED assert(0)
ceph version 0.85-998-g3f05fbf (3f05fbf55bc94f057e04ef5640057a1c29b0d1ba)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x7fb24a43aecf]
 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x318) [0x419ee8]
 3: (write_callback(void*, void*)+0x21) [0x4278b1]
 4: (librados::C_AioSafe::finish(int)+0x1d) [0x7fb24a39c11d]
 5: (Context::complete(int)+0x9) [0x7fb24a3786e9]
 6: (Finisher::finisher_thread_entry()+0x160) [0x7fb24a43a0c0]
 7: (()+0x7e9a) [0x7fb249fdae9a]
 8: (clone()+0x6d) [0x7fb2497f231d]

#2 Updated by Sage Weil almost 5 years ago

  • Status changed from New to Resolved

Pushed a fix to the dumpling branch: 503f865d6432bead72aac0ffba0539d807f078c4

#5 Updated by Yuri Weinstein almost 5 years ago

  • Project changed from teuthology to Ceph

#6 Updated by Yuri Weinstein almost 5 years ago

Still an issue: http://pulpito.front.sepia.ceph.com/teuthology-2014-10-08_23:20:03-multi-version-giant-distro-basic-multi/534751

Assertion: ./test/osd/RadosModel.h: 829: FAILED assert(0)
ceph version 0.86-49-g3bfb5fa (3bfb5fab41b6247259183c3f52c786e35beb3b01)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x7f98623f358f]
 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x318) [0x419ee8]
 3: (write_callback(void*, void*)+0x21) [0x4278c1]
 4: (librados::C_AioSafe::finish(int)+0x1d) [0x7f98623541ed]
 5: (Context::complete(int)+0x9) [0x7f9862330739]
 6: (Finisher::finisher_thread_entry()+0x160) [0x7f98623f2780]
 7: (()+0x7e9a) [0x7f9861f92e9a]
 8: (clone()+0x6d) [0x7f98617a93fd]

#7 Updated by Ian Colle almost 5 years ago

  • Status changed from Resolved to Verified
  • Assignee set to Samuel Just

#8 Updated by Sage Weil almost 5 years ago

  • Priority changed from Normal to Urgent

#9 Updated by Tamilarasi muthamizhan almost 5 years ago

Seeing this on the upgrade test from v0.67.11 to firefly (v0.80.7).

logs: ubuntu@teuthology:/a/teuthology-2014-10-20_18:40:02-upgrade:firefly:older-firefly-distro-basic-vps/561549

#10 Updated by Sage Weil almost 5 years ago

  • Priority changed from Urgent to Immediate

#11 Updated by Tamilarasi muthamizhan almost 5 years ago

also: ubuntu@teuthology:/a/teuthology-2014-10-20_18:40:02-upgrade:firefly:older-firefly-distro-basic-vps/561550

2014-10-21T02:57:02.430 INFO:tasks.rados.rados.0.vpm154.stderr:Error: finished tid 1 when last_acked_tid was 4
2014-10-21T02:57:02.431 INFO:tasks.rados.rados.0.vpm154.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f1790f46700 time 2014-10-21 09:57:02.428940
2014-10-21T02:57:02.431 INFO:tasks.rados.rados.0.vpm154.stderr:./test/osd/RadosModel.h: 828: FAILED assert(0)
2014-10-21T02:57:02.434 INFO:tasks.rados.rados.0.vpm154.stderr: ceph version 0.80.7-73-g5a10b95 (5a10b95f7968ecac1f2af4abf9fb91347a290544)
2014-10-21T02:57:02.435 INFO:tasks.rados.rados.0.vpm154.stderr: 1: (WriteOp::_finish(TestOp::CallbackInfo*)+0x318) [0x41a058]
2014-10-21T02:57:02.435 INFO:tasks.rados.rados.0.vpm154.stderr: 2: (write_callback(void*, void*)+0x21) [0x427c21]
2014-10-21T02:57:02.435 INFO:tasks.rados.rados.0.vpm154.stderr: 3: (librados::C_AioSafe::finish(int)+0x1d) [0x7f1797cd89ad]
2014-10-21T02:57:02.435 INFO:tasks.rados.rados.0.vpm154.stderr: 4: (Context::complete(int)+0x9) [0x7f1797cb5f69]
2014-10-21T02:57:02.436 INFO:tasks.rados.rados.0.vpm154.stderr: 5: (Finisher::finisher_thread_entry()+0x1c0) [0x7f1797d69f30]
2014-10-21T02:57:02.436 INFO:tasks.rados.rados.0.vpm154.stderr: 6: (()+0x7e9a) [0x7f179791de9a]
2014-10-21T02:57:02.436 INFO:tasks.rados.rados.0.vpm154.stderr: 7: (clone()+0x6d) [0x7f179713531d]
2014-10-21T02:57:02.437 INFO:tasks.rados.rados.0.vpm154.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-10-21T02:57:02.438 INFO:tasks.rados.rados.0.vpm154.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
2014-10-21T02:57:02.508 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/sequential.py", line 55, in task
    mgr.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/rados.py", line 170, in task
    running.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 331, in get
    raise self._exception
CommandCrashedError: Command crashed: 'CEPH_CLIENT_ID=0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph_test_rados --op read 100 --op write 100 --op delete 50 --max-ops 2000 --objects 50 --max-in-flight 16 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op snap_create 50 --op snap_remove 50 --op rollback 50 --pool unique_pool_0'
2014-10-21T02:57:02.510 ERROR:teuthology.run_tasks:Saw exception from tasks.

#12 Updated by Samuel Just almost 5 years ago

  • Assignee changed from Samuel Just to Tamilarasi muthamizhan
  • Priority changed from Immediate to High

The newer ceph_test_rados is too strict for dumpling OSDs. We should only run the dumpling ceph_test_rados against clusters whose OSDs are still on dumpling.
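The logged failure ("Error: finished tid 1 when last_acked_tid was 5") suggests that the newer test asserts that the completions of a single write arrive in the order they were submitted. The following is a minimal, simplified model of that kind of ordering check, written only to illustrate why out-of-order acks trip the assert. It is not the RadosModel.h source; the function name `check_ack_order` and the assumption that tids start at 1 and must strictly increase are hypothetical.

```python
from typing import Iterable, Optional


def check_ack_order(completed_tids: Iterable[int]) -> Optional[str]:
    """Hypothetical model of an in-order completion check.

    Assumes each sub-write gets a tid starting at 1 and that every
    completion must carry a tid greater than the last one acked.
    Returns an error string on the first violation, or None if all
    completions arrived in order.
    """
    last_acked_tid = 0  # nothing acked yet
    for tid in completed_tids:
        if tid <= last_acked_tid:
            # Mirrors the wording of the logged error message.
            return f"Error: finished tid {tid} when last_acked_tid was {last_acked_tid}"
        last_acked_tid = tid
    return None
```

With completions `[1, 2, 3, 4, 5]` the check passes; with `[2, 3, 4, 5, 1]` it reports "finished tid 1 when last_acked_tid was 5", the same shape as the error in the logs. A strict check like this is harmless against OSDs that ack in order, which is consistent with the decision to pair each ceph_test_rados version only with OSDs of the same release.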

#13 Updated by Tamilarasi muthamizhan almost 5 years ago

  • Status changed from Verified to Resolved

Already fixed in the multi-version suite: commit b966da7b71c8aee22ff8e58b3b0c105b1d7ca4bf

Now also fixed in the upgrade:firefly/older suite: 2687b3d8996a4985fc790aec16af25fe09fbebb3

#14 Updated by bo cai almost 4 years ago

Can you give us a GitHub commit link showing how you resolved this?
The commit SHA-1s you mentioned cannot be found.
