Bug #9891 (Closed): "Assertion: os/DBObjectMap.cc: 1214: FAILED assert(0)" in upgrade:firefly-x-giant-distro-basic-multi run
Description
(from scrape)
Assertion: os/DBObjectMap.cc: 1214: FAILED assert(0)
ceph version 0.86-104-gb05efdd (b05efddb77290b86eb5c150776c761ab84f66f37)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb81feb]
 2: (DBObjectMap::lookup_parent(std::tr1::shared_ptr<DBObjectMap::_Header>)+0x798) [0xa2da28]
 3: (DBObjectMap::_clear(std::tr1::shared_ptr<DBObjectMap::_Header>, std::tr1::shared_ptr<KeyValueDB::TransactionImpl>)+0xf4) [0xa30e74]
 4: (DBObjectMap::clear(ghobject_t const&, SequencerPosition const*)+0x202) [0xa35962]
 5: (FileStore::lfn_unlink(coll_t, ghobject_t const&, SequencerPosition const&, bool)+0x16b) [0x8ff01b]
 6: (FileStore::_remove(coll_t, ghobject_t const&, SequencerPosition const&)+0x8b) [0x8ff4ab]
 7: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x2cc1) [0x90ff01]
 8: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x913624]
 9: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x18f) [0x9137cf]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa46) [0xb73176]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xb74220]
 12: (()+0x8182) [0x7ffbf6ff8182]
 13: (clone()+0x6d) [0x7ffbf5563fbd]
archive_path: /var/lib/teuthworker/archive/teuthology-2014-10-25_18:13:01-upgrade:firefly-x-giant-distro-basic-multi/571584
branch: giant
description: upgrade:firefly-x/parallel/{0-cluster/start.yaml 1-firefly-install/firefly.yaml 2-workload/{ec-rados-default.yaml rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-all.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/ubuntu_14.04.yaml}
email: ceph-qa@ceph.com
job_id: '571584'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: plana,burnupi,mira
name: teuthology-2014-10-25_18:13:01-upgrade:firefly-x-giant-distro-basic-multi
nuke-on-error: true
os_type: ubuntu
os_version: '14.04'
overrides:
  admin_socket:
    branch: giant
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
    sha1: b05efddb77290b86eb5c150776c761ab84f66f37
  ceph-deploy:
    branch:
      dev: giant
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: b05efddb77290b86eb5c150776c761ab84f66f37
  s3tests:
    branch: giant
  workunit:
    sha1: b05efddb77290b86eb5c150776c761ab84f66f37
owner: scheduled_teuthology@teuthology
priority: 99
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
suite: upgrade:firefly-x
suite_branch: giant
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_giant
targets:
  ubuntu@burnupi55.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCoNkQL+cTE/GCwShmnIodptdNXomyURFXp3UtW2a5/5qqq/4ic1ujJ5KAxPSs+uJCP5u+RVpoX6O7WIT0RC9OjPo4yQRU2CXk0AneE0rS2JK0DBz5KJlpKXwjQ5hkU71uaXMkUDJE9YE1faZ3C92hnlyIixjsmWfNEdxcGWUCFRc1/wXNbJsp5wyTIy+RHqwxuy+IRp3Oa1qvrbM2u0wEx5qCFUvlj4M1F+Oi7c531BWXRMlh9VbDa3scs2qc3Cu5foglcMe4B6Fxoj9J8ZPfdX/lI3qHlgRdnTZzB3fCCNldc9ZOreAbL/8eV+fHRMxG6uuCKp2b9M+scxh4puZ2J
  ubuntu@plana29.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDgHl5M5raEaWWKwH7thuC+Glhc6UlVLHpvIMDGMezlVOELXUB197ZAyurxpWurUteDEaqevUqjUgi08lEL0dHW4G9ulst8B3FqHrAvIvCmhKwGs1znrorZx2Bq+/8mreE4ocMWfT/R8sFzBsM3npgdgSqCAdDBSgI78S92WSlHGAqUz1A0iJoGdwiRjTKhCrI/tuIYyXUU9z/2vIR9bJTp0fwt2S3z4LnpKdsMU8BGpJwm8CPrVj/odRDZ/epgEFBySb4OUq68QDXlg8RHnL1D+zFVsoi/uU5o9rW55EAo7KnPDcdNCQSsZmaurwRaZxVSqhomF0kIU9oL1wlD+s7N
  ubuntu@plana38.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCnqJmM2k0mJ+z0UPorFQERz0n6Dm6kgc51D9b/Mxebr4tNb2JIhI0L23thzhnsWW/q9UzIyVFZriTFmjdrvHjmfqByoihjUiX2QnHF0jDskMuywSUm+hvhwU+P7s+BSC5HZEWf/oQJ/w8WuLXOfsruMJu2cQc6ow3Zj+UzeGwO0wab5knKwJDG70aVfrs3wQjcm1RQWSqv+oVG4/CXQnC+TmAMR1/2G42XBNjjF5vl8M47M5zZKW+czP5r40dC5LmNU8lyvmvikxIaoxQS5n9SPSUtuAF1PM6UnSy+Rq2RZNvI/yRVWyYuV0UBwwFN8EarZ9icstCv2P1rzsMMeNYv
tasks:
- internal.lock_machines:
  - 3
  - plana,burnupi,mira
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: firefly
- print: '**** done installing firefly'
- ceph:
    fs: xfs
- print: '**** done ceph'
- parallel:
  - workload
  - upgrade-sequence
- print: '**** done parallel'
- rados:
    clients:
    - client.1
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- workunit:
    clients:
      client.1:
      - rados/load-gen-mix.sh
- mon_thrash:
    revive_delay: 20
    thrash_delay: 1
- workunit:
    clients:
      client.1:
      - rados/test.sh
- workunit:
    clients:
      client.1:
      - cls/test_cls_rbd.sh
- workunit:
    clients:
      client.1:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- rgw:
  - client.1
- swift:
    client.1:
      rgw_server: client.1
teuthology_branch: master
tube: multi
upgrade-sequence:
  sequential:
  - install.upgrade:
      mon.a: null
      mon.b: null
  - ceph.restart:
    - mon.a
    - mon.b
    - mon.c
    - mds.a
    - osd.0
    - osd.1
    - osd.2
    - osd.3
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.multi.3115
workload:
  sequential:
  - rados:
      clients:
      - client.0
      ec_pool: true
      objects: 50
      op_weights:
        append: 100
        copy_from: 50
        delete: 50
        read: 100
        rmattr: 25
        rollback: 50
        setattr: 25
        snap_create: 50
        snap_remove: 50
        write: 0
      ops: 4000
  - workunit:
      branch: firefly
      clients:
        client.0:
        - cls
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rados/load-gen-big.sh
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd.sh
  - workunit:
      branch: firefly
      clients:
        client.0:
        - rbd/test_librbd_python.sh
description: upgrade:firefly-x/parallel/{0-cluster/start.yaml 1-firefly-install/firefly.yaml 2-workload/{ec-rados-default.yaml rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-all.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/ubuntu_14.04.yaml}
duration: 2967.701498031616
failure_reason: '''wait_until_healthy'' reached maximum tries (150) after waiting for 900 seconds'
flavor: basic
owner: scheduled_teuthology@teuthology
status: fail
success: false
Updated by Samuel Just over 9 years ago
2014-10-25 18:56:23.243456 7fefdcc3e700 20 filestore dbobjectmap: seq is 485
2014-10-25 18:56:23.243559 7fefdd43f700 20 filestore dbobjectmap: seq is 486
2014-10-25 18:56:23.243602 7fefdd43f700 20 filestore dbobjectmap: seq is 487
2014-10-25 18:56:23.243628 7fefdd43f700 20 filestore set_header: setting seq 324
2014-10-25 18:56:23.243642 7fefdd43f700 20 filestore set_map_header: setting 485 oid fb2248dc/burnupi5544081-40/45//3/18446744073709551615/2 parent seq 324
2014-10-25 18:56:23.243655 7fefdd43f700 20 filestore set_map_header: setting 486 oid fb2248dc/burnupi5544081-40/45//3/484/2 parent seq 324
...
2014-10-25 18:56:30.801289 7ffbf8345900 0 ceph version 0.86-104-gb05efdd (b05efddb77290b86eb5c150776c761ab84f66f37), process ceph-osd, pid 18817
...
2014-10-25 18:56:31.104055 7ffbf8345900 20 filestore (init)dbobjectmap: seq is 485
...
2014-10-25 18:56:40.629704 7ffbe8e4d700 20 filestore(/var/lib/ceph/osd/ceph-2) objectmap clone
2014-10-25 18:56:40.629834 7ffbe8e4d700 20 filestore dbobjectmap: seq is 487
2014-10-25 18:56:40.629878 7ffbe8e4d700 20 filestore dbobjectmap: seq is 488
2014-10-25 18:56:40.629907 7ffbe8e4d700 20 filestore set_header: setting seq 453
2014-10-25 18:56:40.629926 7ffbe8e4d700 20 filestore set_map_header: setting 486 oid ac9ea7a1/burnupi5544081-15/head//3/18446744073709551615/1 parent seq 453
2014-10-25 18:56:40.629959 7ffbe8e4d700 20 filestore set_map_header: setting 487 oid ac9ea7a1/burnupi5544081-15/64//3/18446744073709551615/1 parent seq 453
...
2014-10-25 18:56:41.920041 7ffbe8e4d700 20 filestore remove_map_header: removing 486 oid fb2248dc/burnupi5544081-40/45//3/484/2
2014-10-25 18:56:41.920056 7ffbe8e4d700 20 filestore clear_header: clearing seq 486
...
2014-10-25 18:56:41.933829 7ffbe8e4d700 20 filestore remove_map_header: removing 489 oid ac9ea7a1/burnupi5544081-15/head//3/360/1
2014-10-25 18:56:41.933842 7ffbe964e700 10 filestore oid: 1113f7/pglog_3.as2/0//-1 not skipping op, *spos 7669.0.4
2014-10-25 18:56:41.933844 7ffbe8e4d700 20 filestore clear_header: clearing seq 489
2014-10-25 18:56:41.933847 7ffbe964e700 10 filestore > header.spos 0.0.0
2014-10-25 18:56:41.933890 7ffbe964e700 15 filestore(/var/lib/ceph/osd/ceph-2) _omap_setkeys meta/1113f7/pglog_3.as2/0//-1
2014-10-25 18:56:41.933932 7ffbe964e700 10 filestore oid: 1113f7/pglog_3.as2/0//-1 not skipping op, *spos 7669.0.5
2014-10-25 18:56:41.933937 7ffbe964e700 10 filestore > header.spos 0.0.0
2014-10-25 18:56:41.934078 7ffbe964e700 10 filestore(/var/lib/ceph/osd/ceph-2) -ERANGE, len is 802
2014-10-25 18:56:41.934107 7ffbe964e700 10 filestore(/var/lib/ceph/osd/ceph-2) -ERANGE, got 401
2014-10-25 18:56:41.934108 7ffbe8e4d700 20 filestore lookup_parent: parent 486 for seq 489
So, the transaction updating dbobjectmap seq didn't apply for some reason. It would have been submitted before the transaction actually writing out the nodes using it, so that's definitely odd since burnupi5544081-40/45//3/484/2 was apparently written out with seq 486. This must be either a filesystem thing or a leveldb thing. It's not clear to me how it could have been a filesystem thing though.
Updated by Sage Weil over 9 years ago
- Priority changed from Urgent to High
Does not appear to be a ceph issue: either a bad disk, leveldb corruption, or something similar. Lowering priority.
Updated by Loïc Dachary over 9 years ago
http://pulpito.ceph.com/loic-2015-01-10_11:54:25-rados-giant-backports-testing-basic-vps/694324/
-6> 2015-01-11 09:18:46.785891 7f6d14685700 1 -- 10.214.140.199:6803/19206 <== osd.5 10.214.138.167:0/6293 12 ==== osd_ping(ping e459 stamp 2015-01-11 09:18:46.774003) v2 ==== 47+0+0 (3607071168 0 0) 0x5d42a00 con 0x5abd000
-5> 2015-01-11 09:18:46.785913 7f6d14685700 1 -- 10.214.140.199:6803/19206 --> 10.214.138.167:0/6293 -- osd_ping(ping_reply e459 stamp 2015-01-11 09:18:46.774003) v2 -- ?+0 0x5a4ec00 con 0x5abd000
-4> 2015-01-11 09:18:46.785929 7f6d14685700 20 osd.2 459 share_map_peer 0x5a45000 already has epoch 459
-3> 2015-01-11 09:18:46.785822 7f6d12e82700 1 -- 10.214.140.199:6802/19206 <== osd.5 10.214.138.167:0/6293 12 ==== osd_ping(ping e459 stamp 2015-01-11 09:18:46.774003) v2 ==== 47+0+0 (3607071168 0 0) 0x5dbfa00 con 0x5abd160
-2> 2015-01-11 09:18:46.785998 7f6d12e82700 1 -- 10.214.140.199:6802/19206 --> 10.214.138.167:0/6293 -- osd_ping(ping_reply e459 stamp 2015-01-11 09:18:46.774003) v2 -- ?+0 0x5499200 con 0x5abd160
-1> 2015-01-11 09:18:46.786011 7f6d12e82700 20 osd.2 459 share_map_peer 0x5a45000 already has epoch 459
0> 2015-01-11 09:18:46.804341 7f6d1c7c3700 -1 os/DBObjectMap.cc: In function 'DBObjectMap::Header DBObjectMap::lookup_parent(DBObjectMap::Header)' thread 7f6d1c7c3700 time 2015-01-11 09:18:46.775993
os/DBObjectMap.cc: 1214: FAILED assert(0)
ceph version 0.87-87-g0ea20e6 (0ea20e6c51208d6710f469454ab3f964bfa7c9d2)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xaa04bf]
 2: (DBObjectMap::lookup_parent(std::tr1::shared_ptr<DBObjectMap::_Header>)+0x925) [0x9790e5]
 3: (DBObjectMap::_clear(std::tr1::shared_ptr<DBObjectMap::_Header>, std::tr1::shared_ptr<KeyValueDB::TransactionImpl>)+0xe5) [0x9792c5]
 4: (DBObjectMap::clear(ghobject_t const&, SequencerPosition const*)+0x1a5) [0x97d325]
 5: (FileStore::lfn_unlink(coll_t, ghobject_t const&, SequencerPosition const&, bool)+0x250) [0x8ab400]
 6: (FileStore::_remove(coll_t, ghobject_t const&, SequencerPosition const&)+0x172) [0x8ab872]
 7: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x3bcc) [0x8bbc5c]
 8: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x8be064]
 9: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x297) [0x8be317]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x48e) [0xa900ee]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xa92e80]
 12: (()+0x7e9a) [0x7f6d27eabe9a]
 13: (clone()+0x6d) [0x7f6d268762ed]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Yuri Weinstein about 9 years ago
- Status changed from Rejected to New
Seen in the giant v0.87.1 RC release.
Run: http://pulpito.front.sepia.ceph.com/teuthology-2015-02-12_17:29:32-rados-giant-distro-basic-multi/
Job: ['754710']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-02-12_17:29:32-rados-giant-distro-basic-multi/754710/
Assertion: os/DBObjectMap.cc: 1214: FAILED assert(0)
ceph version 0.87-159-gccb0914 (ccb0914f76da23acdd7374233cd1939ab80ef3c8)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb82c4b]
 2: (DBObjectMap::lookup_parent(std::tr1::shared_ptr<DBObjectMap::_Header>)+0x798) [0xa2e6d8]
 3: (DBObjectMap::_clear(std::tr1::shared_ptr<DBObjectMap::_Header>, std::tr1::shared_ptr<KeyValueDB::TransactionImpl>)+0xf4) [0xa31b24]
 4: (DBObjectMap::clear(ghobject_t const&, SequencerPosition const*)+0x202) [0xa36612]
 5: (FileStore::lfn_unlink(coll_t, ghobject_t const&, SequencerPosition const&, bool)+0x16b) [0x8ff55b]
 6: (FileStore::_remove(coll_t, ghobject_t const&, SequencerPosition const&)+0x8b) [0x8ff9eb]
 7: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x2cc1) [0x910441]
 8: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x913b64]
 9: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x18f) [0x913d0f]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa46) [0xb73dd6]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xb74e80]
 12: (()+0x8182) [0x7f5c252de182]
 13: (clone()+0x6d) [0x7f5c23848efd]
Updated by Yuri Weinstein about 9 years ago
- Priority changed from High to Urgent
Updated by Yuri Weinstein about 9 years ago
Verified that this test:
rados/thrash-erasure-code/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/few.yaml thrashers/default.yaml workloads/ec-rados-plugin=jerasure-k=2-m=1.yaml}
passed in run http://pulpito.front.sepia.ceph.com/teuthology-2015-02-17_14:43:58-rados-giant-distro-basic-multi/
Updated by Samuel Just about 9 years ago
- Status changed from New to 7
- Assignee set to Samuel Just
- Backport changed from giant to hammer,giant,firefly,dumpling
2015-02-13 03:40:47.208535 7f0e9b5ec700 20 filestore dbobjectmap: seq is 271
2015-02-13 03:40:47.208564 7f0e9adeb700 20 filestore dbobjectmap: seq is 271
2015-02-13 03:40:47.208579 7f0e9b5ec700 20 filestore dbobjectmap: seq is 272
Note that there are two threads participating there! DBObjectMap::sync() doesn't take the header_lock!
Updated by Samuel Just about 9 years ago
This can be ignored for the present giant release though.
Updated by Samuel Just about 9 years ago
- Status changed from 7 to Pending Backport
Updated by Loïc Dachary about 9 years ago
- firefly backport https://github.com/ceph/ceph/pull/3948
Updated by Loïc Dachary about 9 years ago
- giant backport https://github.com/ceph/ceph/pull/4094
Updated by Loïc Dachary about 9 years ago
- Backport changed from hammer,giant,firefly,dumpling to hammer,giant,firefly
dumpling is end of life
Updated by Loïc Dachary about 9 years ago
- Backport changed from hammer,giant,firefly to giant,firefly
already in hammer
Updated by Loïc Dachary about 9 years ago
- Status changed from Pending Backport to Resolved