
Bug #8425

Ceph - Bug #7068: os/FileStore.cc: 4035: FAILED assert(omap_attrs.size() == omap_aset.size()) (dumpling)

osd crashed in rados-dumpling-testing-basic-plana suite

Added by Yuri Weinstein almost 10 years ago. Updated almost 10 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-05-20_19:00:34-rados-dumpling-testing-basic-plana/267269/

Coredump info from ceph-osd.3.log.gz (ubuntu@teuthology:/a/teuthology-2014-05-20_19:00:/log):

ceph-osd.3.log.gz:835407114-    -6> 2014-05-21 19:36:27.016800 7fb451ee7700 20 journal write_thread_entry going to sleep
ceph-osd.3.log.gz:835407207-    -5> 2014-05-21 19:36:27.016801 7fb44f6e2700  5 filestore(/var/lib/ceph/osd/ceph-3) queue_op 0x2d6aa50 seq 58784 osr(3.14 0x25c2a70) 1196 bytes   (queue has 50 ops and 8215300 bytes)
ceph-osd.3.log.gz:835407393-    -4> 2014-05-21 19:36:27.016804 7fb44f6e2700  5 filestore(/var/lib/ceph/osd/ceph-3) _journaled_ahead 0x2d6acd0 seq 58785 osr(3.15 0x32154d0) 0x37ba900
ceph-osd.3.log.gz:835407547-    -3> 2014-05-21 19:36:27.016807 7fb44f6e2700  5 filestore(/var/lib/ceph/osd/ceph-3) queue_op 0x2d6acd0 seq 58785 osr(3.15 0x32154d0) 1196 bytes   (queue has 50 ops and 8215300 bytes)
ceph-osd.3.log.gz:835407733-    -2> 2014-05-21 19:36:27.016810 7fb44f6e2700  5 filestore(/var/lib/ceph/osd/ceph-3) _journaled_ahead 0x2d6abe0 seq 58786 osr(3.15 0x32154d0) 0x37ba300
ceph-osd.3.log.gz:835407887-    -1> 2014-05-21 19:36:27.016812 7fb44f6e2700  5 filestore(/var/lib/ceph/osd/ceph-3) queue_op 0x2d6abe0 seq 58786 osr(3.15 0x32154d0) 1196 bytes   (queue has 50 ops and 8215300 bytes)
ceph-osd.3.log.gz:835408073:     0> 2014-05-21 19:36:27.051066 7fb4459b4700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log.gz:835408155- in thread 7fb4459b4700
ceph-osd.3.log.gz:835408179-
ceph-osd.3.log.gz:835408180- ceph version 0.67.8-15-gb638d19 (b638d19d126646d2a8f6da11067c5f392a62525e)
ceph-osd.3.log.gz:835408256- 1: ceph-osd() [0x7fe46a]
ceph-osd.3.log.gz:835408282- 2: (()+0xfcb0) [0x7fb458ac0cb0]
ceph-osd.3.log.gz:835408315- 3: (gsignal()+0x35) [0x7fb456b46425]
ceph-osd.3.log.gz:835408353- 4: (abort()+0x17b) [0x7fb456b49b8b]
ceph-osd.3.log.gz:835408390- 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb45749969d]
ceph-osd.3.log.gz:835408460- 6: (()+0xb5846) [0x7fb457497846]
ceph-osd.3.log.gz:835408494- 7: (()+0xb5873) [0x7fb457497873]
ceph-osd.3.log.gz:835408528- 8: (()+0xb596e) [0x7fb45749796e]
ceph-osd.3.log.gz:835408562- 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x8c841f]
ceph-osd.3.log.gz:835408654- 10: (FileStore::getattrs(coll_t, hobject_t const&, std::map<std::string, ceph::buffer::ptr, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::ptr> > >&, bool)+0xbde) [0x7a735e]
ceph-osd.3.log.gz:835408864- 11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x18a4) [0x606ac4]
ceph-osd.3.log.gz:835408985- 12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x6f) [0x6107ff]
ceph-osd.3.log.gz:835409068- 13: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x35a0) [0x618ac0]
ceph-osd.3.log.gz:835409146- 14: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x619) [0x7098a9]
ceph-osd.3.log.gz:835409241- 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x330) [0x65cd90]
ceph-osd.3.log.gz:835409363- 16: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x478) [0x673828]
ceph-osd.3.log.gz:835409456- 17: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6aee1c]
ceph-osd.3.log.gz:835409647- 18: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b8c56]
ceph-osd.3.log.gz:835409715- 19: (ThreadPool::WorkThread::entry()+0x10) [0x8baa60]
ceph-osd.3.log.gz:835409770- 20: (()+0x7e9a) [0x7fb458ab8e9a]
ceph-osd.3.log.gz:835409804- 21: (clone()+0x6d) [0x7fb456c043fd]
ceph-osd.3.log.gz:835409841- NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ceph-osd.3.log.gz:835409934-
ceph-osd.3.log.gz:835409935---- logging levels ---
ceph-osd.3.log.gz:835409958-   0/ 5 none
ceph-osd.3.log.gz:835409971-   0/ 1 lockdep
ceph-osd.3.log.gz:835409987-   0/ 1 context
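
The excerpt above is grep-style output, so each line carries a byte offset into the compressed log rather than a line number. For reference only, a minimal Python sketch of pulling the same crash context back out of ceph-osd.3.log.gz; the local filename is assumed to be a copy fetched from the qa-proxy URL above, and the marker string is taken from the excerpt (this is not part of the test harness):

# Sketch only: scan a downloaded copy of the compressed OSD log for the
# fatal signal line and print the few lines of context before it.
# The local path is hypothetical; the marker comes from the excerpt above.
import gzip
from collections import deque

LOG = "ceph-osd.3.log.gz"        # assumed local copy of the archived log
MARKER = "*** Caught signal"     # the fatal line shown in the excerpt

context = deque(maxlen=10)       # keep the last few lines before the crash
with gzip.open(LOG, "rt", errors="replace") as f:
    for line in f:
        context.append(line.rstrip("\n"))
        if MARKER in line:
            print("\n".join(context))
            break

The teuthology output below shows the harness then timing out while waiting for osd.3 to come back after this crash: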

2014-05-21T19:37:51.050 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x2b67f90>
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-dumpling/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/thrashosds.py", line 167, in task
    thrash_proc.do_join()
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph_manager.py", line 106, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 331, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.3 restart
2014-05-21T19:37:51.186 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x2aaa610>
2014-05-21T19:37:51.186 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-dumpling/teuthology/contextutil.py", line 27, in nested
    yield vars
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph.py", line 1168, in task
    yield
  File "/home/teuthworker/teuthology-dumpling/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/thrashosds.py", line 167, in task
    thrash_proc.do_join()
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph_manager.py", line 106, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 331, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.3 restart
2014-05-21T19:37:51.280 INFO:teuthology.misc:Shutting down mds daemons...
2014-05-21T19:37:51.281 DEBUG:teuthology.task.ceph.mds.a:waiting for process to exit
2014-05-21T19:37:51.288 INFO:teuthology.task.ceph.mds.a:Stopped
2014-05-21T19:37:51.288 INFO:teuthology.misc:Shutting down osd daemons...
2014-05-21T19:37:51.288 DEBUG:teuthology.task.ceph.osd.1:waiting for process to exit
2014-05-21T19:37:51.307 INFO:teuthology.task.ceph.osd.1:Stopped
2014-05-21T19:37:51.307 DEBUG:teuthology.task.ceph.osd.0:waiting for process to exit
2014-05-21T19:37:51.359 INFO:teuthology.task.ceph.osd.0:Stopped
2014-05-21T19:37:51.359 DEBUG:teuthology.task.ceph.osd.3:waiting for process to exit
2014-05-21T19:37:51.359 ERROR:teuthology.misc:Saw exception from osd.3
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-dumpling/teuthology/misc.py", line 831, in stop_daemons_of_type
    daemon.stop()
  File "/home/teuthworker/teuthology-dumpling/teuthology/task/ceph.py", line 36, in stop
    run.wait([self.proc])
  File "/home/teuthworker/teuthology-dumpling/teuthology/orchestra/run.py", line 282, in wait
    proc.exitstatus.get()
  File "/usr/lib/python2.7/dist-packages/gevent/event.py", line 207, in get
    raise self._exception
CommandFailedError: Command failed on 10.214.131.26 with status 1: '/home/ubuntu/cephtest/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage sudo /home/ubuntu/cephtest/daemon-helper kill ceph-osd -f -i 3'
2014-05-21T19:37:51.385 DEBUG:teuthology.task.ceph.osd.2:waiting for process to exit
2014-05-21T19:37:51.412 INFO:teuthology.task.ceph.osd.2:Stopped
2014-05-21T19:37:51.412 ERROR:teuthology.task.ceph.osd.5:tried to stop a non-running daemon
2014-05-21T19:37:51.412 ERROR:teuthology.task.ceph.osd.4:tried to stop a non-running daemon
archive_path: /var/lib/teuthworker/archive/teuthology-2014-05-20_19:00:34-rados-dumpling-testing-basic-plana/267269
branch: dumpling
description: rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/few.yaml
  thrashers/default.yaml workloads/snaps-few-objects.yaml}
email: null
job_id: '267269'
kernel: &id001
  kdb: true
  sha1: 335cb91ce950ce0e12294af671c64a468d89194c
last_in_suite: false
machine_type: plana
name: teuthology-2014-05-20_19:00:34-rados-dumpling-testing-basic-plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: dumpling
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: b638d19d126646d2a8f6da11067c5f392a62525e
  ceph-deploy:
    branch:
      dev: dumpling
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: b638d19d126646d2a8f6da11067c5f392a62525e
  s3tests:
    branch: dumpling
  workunit:
    sha1: b638d19d126646d2a8f6da11067c5f392a62525e
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
suite: rados
targets:
  ubuntu@plana14.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCkQBfSVIjF3fRKmdHm4NeHMWyt0vTcQFKvQISTEmBNe/DVwHRjLnsV0Cx5LsWjUNlVCXkHEYidQbhvQQ6KuWRs1EbRQWYjYOdv/EZh+uxqik8zBPV3L7hC3O34GiUng+EEaX22ta804jrsIZyoeBFa6r9jnKcc4Uk5eCafw3RVUEKILnYrAFOGWrTpXzoZGljuG6GujV7kKrnOvCyAfY4PdSRdQY1j4QOyvkr1zix9iBctAVaJG+xa/Ff48Oa7mqxioQPtUcn03ZOxqkQ70U24URjqPMZl3JD7aHG/Riz/DSvbeESQftHWVmqp03WIm9qTIgzClEcPbhKbLUNVewh7
  ubuntu@plana35.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDN2DvVrroCqjz4NeYHoqEhj35Ea5bEEMc2YHJY0rfXO0J9cRSnQBfzOQtSC6LrMYXJrQ4Wr/UQmmovs6a42F/fVmYygUeCdAjy6RT5bQ7eHdd+bJB3zKSZ+oY72g8UHlHRekSFBDI8ivFLC7PWscUx4v1o5vgQFgUDXkBjZpje7VpB3Sp8P/Dpqf5zcEo/9DxYxntaxn3dg2LrGZ4jKG5A49ivCo6NOxFRyth3dlcTMlhPjVs6dKWzohppx9h6bW5k0ijF3FLfPfc72Nkdjiw8W3njDbXl8JjKTvMrfKc1wxDEeq7602oIuT7CFC4hrBZtxLPinrtQhguXB0j35git
tasks:
- internal.lock_machines:
  - 2
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: dumpling
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.17189
description: rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/few.yaml
  thrashers/default.yaml workloads/snaps-few-objects.yaml}
duration: 1520.7286908626556
failure_reason: timed out waiting for admin_socket to appear after osd.3 restart
flavor: basic
mon.a-kernel-sha1: 335cb91ce950ce0e12294af671c64a468d89194c
mon.b-kernel-sha1: 335cb91ce950ce0e12294af671c64a468d89194c
owner: scheduled_teuthology@teuthology
success: false
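
The recorded failure_reason is the thrasher timing out while waiting for osd.3's admin socket to reappear after the restart; the daemon never came back because it had already aborted on the assert above. As a rough illustration only, a hypothetical Python sketch of that kind of wait loop, assuming the default /var/run/ceph/ceph-osd.<id>.asok path (the real teuthology/ceph_manager logic is more involved):

# Hypothetical sketch of waiting for an OSD admin socket after a restart;
# not the actual teuthology/ceph_manager code. Assumes the default asok path.
import os
import time

def wait_for_admin_socket(osd_id, timeout=300, interval=5):
    path = "/var/run/ceph/ceph-osd.%d.asok" % osd_id
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return path
        time.sleep(interval)
    raise RuntimeError(
        "timed out waiting for admin_socket to appear after osd.%d restart" % osd_id)

# In this run the equivalent wait for osd.3 could never succeed, since
# ceph-osd -i 3 had already died on the FileStore::getattrs assert.
# wait_for_admin_socket(3)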

History

#1 Updated by Tamilarasi muthamizhan almost 10 years ago

  • Status changed from New to Duplicate
  • Parent task set to #7068
