Bug #7068

closed

os/FileStore.cc: 4035: FAILED assert(omap_attrs.size() == omap_aset.size()) (dumpling)

Added by Sage Weil over 10 years ago. Updated almost 9 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: OSD
Target version: -
% Done: 100%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/dumpling
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-12-28 09:22:57.789552 7f502d93c700 -1 os/FileStore.cc: In function 'virtual int FileStore::getattrs(coll_t, const hobject_t&, std::map<std::basic_string<char>, ceph::buffer::ptr>&, bool)' thread 7f502d93c700 time 2013-12-28 09:22:57.787645
os/FileStore.cc: 4035: FAILED assert(omap_attrs.size() == omap_aset.size())

 ceph version 0.67.5 (a60ac9194718083a4b6a225fc17cad6096c69bd1)
 1: (FileStore::getattrs(coll_t, hobject_t const&, std::map<std::string, ceph::buffer::ptr, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::ptr> > >&, bool)+0xa3f) [0x7af44f]
 2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x18a4) [0x610644]
 3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x6f) [0x61a36f]
 4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x35a0) [0x622630]
 5: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x619) [0x7120f9]
 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x330) [0x6660a0]
 7: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x4a0) [0x67c7a0]
 8: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6b7e9c]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8c0296]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x8c20a0]
 11: (()+0x7e9a) [0x7f50417f5e9a]
 12: (clone()+0x6d) [0x7f503f9883fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This was an upgrade test.

ubuntu@teuthology:/a/sage-2013-12-28_09:01:26-upgrade:upgrade-parallel-next-testing-basic-plana/17328$ cat orig.config.yaml 
archive_path: /var/lib/teuthworker/archive/sage-2013-12-28_09:01:26-upgrade:upgrade-parallel-next-testing-basic-plana/17328
description: upgrade/upgrade-parallel/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/snaps-few-objects.yaml
  6-next-mon/monb.yaml 7-workload/rados_api_tests.yaml 8-next-mon/monc.yaml 9-workload/rados_api_tests.yaml
  distro/ubuntu_12.04.yaml}
email: null
job_id: '17328'
kernel:
  kdb: true
  sha1: e2a63181d78fb157eafcdc036c7b9b00e9ac1bd7
last_in_suite: false
machine_type: plana
name: sage-2013-12-28_09:01:26-upgrade:upgrade-parallel-next-testing-basic-plana
nuke-on-error: true
os_type: ubuntu
os_version: '12.04'
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug ms: 1
        debug osd: 5
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 4f0784898767f40982a2aa94b35fb429d4f5965d
  ceph-deploy:
    branch:
      dev: next
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: 4f0784898767f40982a2aa94b35fb429d4f5965d
  s3tests:
    branch: next
  workunit:
    sha1: 4f0784898767f40982a2aa94b35fb429d4f5965d
owner: scheduled_sage@vapre
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - client.0
  - mon.c
tasks:
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: emperor
    clients:
      client.0:
      - rados/test.sh
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: emperor
    clients:
      client.0:
      - rados/test.sh
teuthology_branch: next
verbose: true


Subtasks 1 (0 open, 1 closed)

rbd - Bug #8425: osd crashed in rados-dumpling-testing-basic-plana suite (Duplicate, 05/22/2014)

Actions #1

Updated by Sage Weil about 10 years ago

  • Status changed from New to Need More Info
Actions #2

Updated by Sage Weil about 10 years ago

  • Severity changed from 3 - minor to 2 - major
Actions #3

Updated by Sage Weil about 10 years ago

  • Priority changed from Urgent to High
Actions #4

Updated by Samuel Just about 10 years ago

  • Status changed from Need More Info to Can't reproduce
Actions #5

Updated by Sage Weil almost 10 years ago

  • Subject changed from os/FileStore.cc: 4035: FAILED assert(omap_attrs.size() == omap_aset.size()) to os/FileStore.cc: 4035: FAILED assert(omap_attrs.size() == omap_aset.size()) (dumpling)
  • Status changed from Can't reproduce to 12
  • Priority changed from High to Urgent

ubuntu@teuthology:/a/teuthology-2014-05-20_19:00:34-rados-dumpling-testing-basic-plana/267269

Actions #6

Updated by Samuel Just almost 10 years ago

  • Status changed from 12 to 7
  • Assignee set to Samuel Just

The bug is that the snap trimmer in dumpling does not take a lock on the object. The fix is probably to backport b87bc2311aa4da065477f402a869e2edc1558e2f; currently testing (wip-7068-dumpling).
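
To illustrate how a missing object lock could trip an assert of this shape, here is a minimal single-threaded sketch. It is not Ceph code: `ToyObject`, `list_names`, `fetch_values`, `trim`, and `sizes_match_after_read` are hypothetical names, the lock is modeled as a plain flag, and the attribute names are only illustrative. It assumes the read happens in two steps (list the attribute names, then fetch their values) and that a trimmer removes an attribute between those steps.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for one object's attributes; not Ceph code.
struct ToyObject {
  bool locked = false;                       // modeled per-object lock (a flag, since this sketch has no threads)
  std::map<std::string, std::string> attrs;  // attribute name -> value
};

// Step 1 of the read: collect the attribute names that exist right now.
std::set<std::string> list_names(const ToyObject& o) {
  std::set<std::string> names;
  for (const auto& kv : o.attrs) names.insert(kv.first);
  return names;
}

// Step 2 of the read: fetch values for the names found in step 1.
// Names that have disappeared in the meantime are silently skipped.
std::map<std::string, std::string> fetch_values(const ToyObject& o,
                                                const std::set<std::string>& names) {
  std::map<std::string, std::string> out;
  for (const auto& n : names) {
    auto it = o.attrs.find(n);
    if (it != o.attrs.end()) out[n] = it->second;
  }
  return out;
}

// Modeled trimmer: removes one attribute. When respect_lock is true it
// backs off (returns false) if the object is currently locked.
bool trim(ToyObject& o, const std::string& name, bool respect_lock) {
  if (respect_lock && o.locked) return false;
  o.attrs.erase(name);
  return true;
}

// Two-step read under the modeled lock. `between` stands in for whatever
// another thread does between the two steps. Returns whether the two
// sizes agree, i.e. the condition the failed assert checks.
bool sizes_match_after_read(ToyObject& o, const std::function<void()>& between) {
  o.locked = true;
  std::set<std::string> names = list_names(o);
  between();
  std::map<std::string, std::string> values = fetch_values(o, names);
  o.locked = false;
  return names.size() == values.size();
}

// Runs both scenarios: a trimmer that ignores the lock, then one that honors it.
void demo() {
  ToyObject racy;
  racy.attrs = {{"_", "object info"}, {"snapset", "snap set"}};
  bool ok = sizes_match_after_read(racy, [&racy] { trim(racy, "snapset", false); });
  assert(!ok);  // names has 2 entries, values has 1

  ToyObject safe;
  safe.attrs = {{"_", "object info"}, {"snapset", "snap set"}};
  ok = sizes_match_after_read(safe, [&safe] { trim(safe, "snapset", true); });
  assert(ok);                      // trimmer backed off, sizes agree
  assert(safe.attrs.size() == 2);  // nothing was removed during the read
}
```

In this model the mismatch only appears when the trimmer skips the lock check, which matches the diagnosis in the comment above. Whether the real race follows exactly this interleaving is an assumption.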

Actions #7

Updated by Sage Weil almost 10 years ago

  • Status changed from 7 to Resolved
Actions #8

Updated by Yuri Weinstein about 9 years ago

  • Status changed from Resolved to New

Seen again in run: http://pulpito-rdu.front.sepia.ceph.com/teuthology-2015-04-25_15:00:01-upgrade:dumpling-dumpling-distro-basic-typica/
Job: ['4529']
Logs: http://typica002.front.sepia.ceph.com/teuthology-2015-04-25_15:00:01-upgrade:dumpling-dumpling-distro-basic-typica/4529/teuthology.log

Assertion: os/FileStore.cc: 3936: FAILED assert(omap_attrs.size() == omap_aset.size())
ceph version 0.67.1 (e23b817ad0cf1ea19c0a7b7c9999b30bed37d533)
 1: (FileStore::getattrs(coll_t, hobject_t const&, std::map<std::string, ceph::buffer::ptr, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::ptr> > >&, bool)+0xbae) [0x795d4e]
 2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x28db) [0x5f354b]
 3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x23f) [0x5ff17f]
 4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3cd2) [0x6044e2]
 5: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x334) [0x6fe444]
 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x343) [0x64dbf3]
 7: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x19f) [0x66274f]
 8: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x69e20c]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0x8a4c01]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x8a5af0]
 11: (()+0x8182) [0x7f4df5528182]
 12: (clone()+0x6d) [0x7f4df343247d]
Actions #9

Updated by Yuri Weinstein about 9 years ago

  • ceph-qa-suite upgrade/dumpling-x added
Actions #10

Updated by Yuri Weinstein about 9 years ago

  • ceph-qa-suite upgrade/dumpling added
  • ceph-qa-suite deleted (upgrade/dumpling-x)
Actions #11

Updated by Samuel Just almost 9 years ago

  • Status changed from New to Can't reproduce
  • Regression set to No