Bug #37656
FileStore::_do_transaction() crashed with error 17 (merge collection vs osd restart)
0%
Description
2018-12-13T16:44:55.693 INFO:teuthology.orchestra.run.smithi091:Running: 2018-12-13T16:44:55.693 INFO:teuthology.orchestra.run.smithi091:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.6.asok dump_historic_ops 2018-12-13T16:44:55.698 INFO:tasks.ceph.osd.6.smithi091.stderr:2018-12-13 16:44:55.693 7f62e5052700 -1 filestore(/var/lib/ceph/osd/ceph-6) error (17) File exists not handled on operation 0xec8e000 (8205.0.0, or op 0, counting from 0) 2018-12-13T16:44:55.702 INFO:tasks.ceph.osd.6.smithi091.stderr:/build/ceph-14.0.1-1674-g822b02c/src/os/filestore/FileStore.cc: In function 'void FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*, const char*)' thread 7f62e5052700 time 2018-12-13 16:44:55.697661 2018-12-13T16:44:55.702 INFO:tasks.ceph.osd.6.smithi091.stderr:/build/ceph-14.0.1-1674-g822b02c/src/os/filestore/FileStore.cc: 3293: abort() 2018-12-13T16:44:55.702 INFO:tasks.ceph.osd.6.smithi091.stderr: ceph version 14.0.1-1674-g822b02c (822b02ce1b5d8d2126974c0b6aec117223b299b2) nautilus (dev) 2018-12-13T16:44:55.702 INFO:tasks.ceph.osd.6.smithi091.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xda) [0x82cd64] 2018-12-13T16:44:55.702 INFO:tasks.ceph.osd.6.smithi091.stderr: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*, char const*)+0xa58) [0xd01e18] 2018-12-13T16:44:55.703 INFO:tasks.ceph.osd.6.smithi091.stderr: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*, char const*)+0x48) [0xd08d08] 2018-12-13T16:44:55.703 INFO:tasks.ceph.osd.6.smithi091.stderr: 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x1a4) [0xd08ee4] 2018-12-13T16:44:55.703 INFO:tasks.ceph.osd.6.smithi091.stderr: 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xbee) [0xfc12de] 2018-12-13T16:44:55.703 INFO:tasks.ceph.osd.6.smithi091.stderr: 6: (ThreadPool::WorkThread::entry()+0x10) [0xfca540] 2018-12-13T16:44:55.703 INFO:tasks.ceph.osd.6.smithi091.stderr: 7: (()+0x76ba) [0x7f62f6ebd6ba] 2018-12-13T16:44:55.703 INFO:tasks.ceph.osd.6.smithi091.stderr: 8: (clone()+0x6d) [0x7f62f64c441d]
/a/nojha-2018-12-13_15:13:17-rados-master-distro-basic-smithi/3358402/
History
#1 Updated by Josh Durgin almost 5 years ago
/a/dzafman-2018-12-14_11:02:20-rados-wip-zafman-testing-distro-basic-smithi/3362534
#2 Updated by Neha Ojha almost 5 years ago
- Priority changed from Normal to High
#3 Updated by Josh Durgin about 4 years ago
- Status changed from New to Can't reproduce
#4 Updated by Neha Ojha over 3 years ago
- Status changed from Can't reproduce to New
/a/teuthology-2020-02-11_02:30:03-upgrade:mimic-x-nautilus-distro-basic-smithi/4753470/
upgrade:mimic-x/stress-split/{0-cluster/{openstack.yaml start.yaml} 1-ceph-install/mimic.yaml 1.1-pg-log-overrides/normal_pg_log.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-workload/{radosbench.yaml rbd-cls.yaml rbd-import-export.yaml rbd_api.yaml readwrite.yaml rgw_ragweed_prepare.yaml snaps-few-objects.yaml} 5-finish-upgrade.yaml 6-msgr2.yaml 6-nautilus.yaml 7-final-workload/{rbd-python.yaml rgw-swift-ragweed_check.yaml snaps-many-objects.yaml} objectstore/filestore-xfs.yaml supported-all-distro/ubuntu_16.04.yaml thrashosds-health.yaml}
020-02-11 09:28:01.145 7fdb10310700 10 filestore(/var/lib/ceph/osd/ceph-2) _do_transaction(2773): on 0xf358520 2020-02-11 09:28:01.145 7fdb10310700 15 filestore(/var/lib/ceph/osd/ceph-2) _create_collection(5448): /var/lib/ceph/osd/ceph-2/current/1.165_head 2020-02-11 09:28:01.145 7fdb10310700 10 filestore(/var/lib/ceph/osd/ceph-2) _create_collection(5454): /var/lib/ceph/osd/ceph-2/current/1.165_head = -17 2020-02-11 09:28:01.145 7fdb10310700 -1 filestore(/var/lib/ceph/osd/ceph-2) error (17) File exists not handled on operation 0x10553000 (866015.0.0, or op 0, counting from 0)
2020-02-11 09:28:01.433 7fdb10310700 -1 /build/ceph-14.2.7-508-g6036662/src/os/filestore/FileStore.cc: In function 'void FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*, const char*)' thread 7fdb10310700 time 2020-02-11 09:28:01.148817 /build/ceph-14.2.7-508-g6036662/src/os/filestore/FileStore.cc: 3320: ceph_abort_msg("unexpected error") ceph version 14.2.7-508-g6036662 (6036662bc04e4d9a0f8ab284b1a18ca54100c1d7) nautilus (stable) 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdf) [0x83c5f0] 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*, char const*)+0x6124) [0xc7db84] 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*, char const*)+0x48) [0xc7f168] 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x1a4) [0xc7f344] 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x9d6) [0xee3346] 6: (ThreadPool::WorkThread::entry()+0x11) [0xee41d1] 7: (()+0x76ba) [0x7fdb21a5f6ba] 8: (clone()+0x6d) [0x7fdb2106641d]
#5 Updated by Sage Weil over 3 years ago
-737> 2020-03-04T09:28:58.109+0000 7effc1c02700 10 filestore(/var/lib/ceph/osd/ceph-6) _create_collection(5447): /var/lib/ceph/osd/ceph-6/current/2.10_head = -17 -732> 2020-03-04T09:28:58.109+0000 7effc1c02700 -1 filestore(/var/lib/ceph/osd/ceph-6) error (17) File exists not handled on operation 0x55fcf753f000 (11684.0.0, or op 0, counting from 0) -731> 2020-03-04T09:28:58.109+0000 7effc1c02700 0 filestore(/var/lib/ceph/osd/ceph-6) unexpected error code -672> 2020-03-04T09:28:58.109+0000 7effc1c02700 0 filestore(/var/lib/ceph/osd/ceph-6) transaction dump: { "ops": [ { "op_num": 0, "op_name": "mkcoll", "collection": "2.10_head" }, { "op_num": 1, "op_name": "coll_hint", "collection": "2.10_head", "type": 1, "pg_num": 18, "expected_num_objects": 0 }, { "op_num": 2, "op_name": "touch", "collection": "2.10_head", "oid": "#2:08000000::::head#" }, { "op_num": 3, "op_name": "omap_setkeys", "collection": "2.10_head", "oid": "#2:08000000::::head#", "attr_lens": { "_infover": 1 } }, { "op_num": 4, "op_name": "touch", "collection": "2.10_head", "oid": "#2:08000000::::head#" }, { "op_num": 5, "op_name": "omap_setkeys", "collection": "2.10_head", "oid": "#2:08000000::::head#", "attr_lens": { "_biginfo": 95, "_epoch": 4, "_info": 948, "may_include_deletes_in_missing": 0 } }, { "op_num": 6, "op_name": "omap_rmkeys", "collection": "2.10_head", "oid": "#2:08000000::::head#", "attrs": [ "_fastinfo" ] } ] } -7> 2020-03-04T09:28:58.110+0000 7effc1c02700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/huge/release/15.1.0-1606-gecd8670/rpm/el8/BUILD/ceph-15.1.0-1606-gecd8670/src/os/filestore/FileStore.cc: In function 'void FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*, const char*)' thread 7effc1c02700 time 2020-03-04T09:28:58.109601+0000
/a/sage-2020-03-03_19:20:19-rados-wip-sage2-testing-2020-03-03-0959-distro-basic-smithi/4821257
#6 Updated by Sage Weil over 3 years ago
the merge happens right before we shut down:
2020-03-04T09:21:35.006+0000 7f44cc7a0700 10 filestore(/var/lib/ceph/osd/ceph-6) _destroy_collection(5505): /var/lib/ceph/osd/ceph-6/current/2.10_head = 0 2020-03-04T09:21:35.006+0000 7f44cc7a0700 10 filestore(/var/lib/ceph/osd/ceph-6) _close_replay_guard(2653): 3445.0.1 2020-03-04T09:21:35.006+0000 7f44cc7a0700 20 filestore dbobjectmap: seq is 27 2020-03-04T09:21:35.006+0000 7f44cc7a0700 10 filestore(/var/lib/ceph/osd/ceph-6) _close_replay_guard(2683): 3445.0.1 done 2020-03-04T09:21:35.006+0000 7f44cc7a0700 10 filestore(/var/lib/ceph/osd/ceph-6) _close_replay_guard(2653): 3445.0.1 2020-03-04T09:21:35.006+0000 7f44cc7a0700 20 filestore dbobjectmap: seq is 27 2020-03-04T09:21:35.006+0000 7f44cc7a0700 10 filestore(/var/lib/ceph/osd/ceph-6) _close_replay_guard(2683): 3445.0.1 done 2020-03-04T09:21:35.006+0000 7f44cc7a0700 15 filestore(/var/lib/ceph/osd/ceph-6) _merge_collection(5908): 2.10_head 2.0_head bits 4 = 0 ... 2020-03-04T09:21:36.095+0000 7f44de034700 10 --1- v1:172.21.15.130:6805/15572 >> v1:172.21.15.130:0/15835 conn(0x55e15d8ef800 0x55e15e9bc800 :6805 s=OPENED pgs=18 cs=1 l=1).write_message sending 0x55e15df05080 done. 2020-03-04T09:21:54.508+0000 7effd522aec0 0 ceph version 15.1.0-1606-gecd8670 (ecd867096a473d425947a07b70a4dfca72c6fb88) octopus (rc), process ceph-osd, pid 31676 ... 2020-03-04T09:21:54.519+0000 7effd522aec0 20 filestore(/var/lib/ceph/osd/ceph-6) ls [meta,2.4_head,2.4_TEMP,2.c_head,2.c_TEMP,1.1_head,1.1_TEMP,1.a_head,1.a_TEMP,3.3_head,3.3_TEMP,2.0_head,2.0_TEMP,2.10_head,2.10_TEMP,1.6_head,1.6_TEMP]
the thrasher is doing this:
2020-03-04T09:21:36.234 INFO:tasks.thrashosds.thrasher:Testing ceph-objectstore-tool on down osd.6 2020-03-04T09:21:36.235 INFO:teuthology.orchestra.run.smithi130:> type ceph-objectstore-tool 2020-03-04T09:21:36.289 INFO:teuthology.orchestra.run.smithi130.stdout:ceph-objectstore-tool is /usr/bin/ceph-objectstore-tool 2020-03-04T09:21:36.290 INFO:teuthology.orchestra.run.smithi130:> sudo adjust-ulimits ceph-objectstore-tool --no-mon-config '--log-file=/var/log/ceph/objectstore_tool.$pid.log' --data-path /var/lib/ceph/osd/ceph-6 --journal-path /var/lib/ceph/osd/ceph-6/journal --op list-pgs 2020-03-04T09:21:36.447 INFO:teuthology.orchestra.run.smithi130:> sudo adjust-ulimits ceph-objectstore-tool --no-mon-config '--log-file=/var/log/ceph/objectstore_tool.$pid.log' --data-path /var/lib/ceph/osd/ceph-6 --journal-path /var/lib/ceph/osd/ceph-6/journal --op export --pgid 1.6 --file /var/log/ceph/exp.1.6.6 2020-03-04T09:21:36.501 INFO:teuthology.orchestra.run.smithi130:> sudo adjust-ulimits ceph-objectstore-tool --no-mon-config '--log-file=/var/log/ceph/objectstore_tool.$pid.log' --data-path /var/lib/ceph/osd/ceph-6 --journal-path /var/lib/ceph/osd/ceph-6/journal --force --op remove --pgid 1.6 2020-03-04T09:21:36.560 INFO:teuthology.orchestra.run.smithi130:> sudo adjust-ulimits ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 --journal-path /var/lib/ceph/osd/ceph-6/journal '--log-file=/var/log/ceph/objectstore_tool.$pid.log' --op import --file /var/log/ceph/exp.1.6.6 2020-03-04T09:21:36.923 INFO:teuthology.orchestra.run.smithi130:> CEPH_ARGS='--filestore-merge-threshold 1 --filestore-split-multiple 1' sudo -E ceph-objectstore-tool --no-mon-config --log-file=/var/log/ceph/objectstore_tool.$pid.log --data-path /var/lib/ceph/osd/ceph-6 --journal-path /var/lib/ceph/osd/ceph-6/journal --op apply-layout-settings --pool device_health_metrics 2020-03-04T09:21:36.977 INFO:teuthology.orchestra.run.smithi130:> CEPH_ARGS='--filestore-merge-threshold 1 --filestore-split-multiple 1' sudo -E ceph-objectstore-tool --no-mon-config --log-file=/var/log/ceph/objectstore_tool.$pid.log --data-path /var/lib/ceph/osd/ceph-6 --journal-path /var/lib/ceph/osd/ceph-6/journal --op apply-layout-settings --pool rbd 2020-03-04T09:21:37.073 INFO:teuthology.orchestra.run.smithi130:> CEPH_ARGS='--filestore-merge-threshold 1 --filestore-split-multiple 1' sudo -E ceph-objectstore-tool --no-mon-config --log-file=/var/log/ceph/objectstore_tool.$pid.log --data-path /var/lib/ceph/osd/ceph-6 --journal-path /var/lib/ceph/osd/ceph-6/journal --op apply-layout-settings --pool base 2020-03-04T09:21:37.179 INFO:teuthology.orchestra.run.smithi130:> CEPH_ARGS='--filestore-merge-threshold 1 --filestore-split-multiple 1' sudo -E ceph-objectstore-tool --no-mon-config --log-file=/var/log/ceph/objectstore_tool.$pid.log --data-path /var/lib/ceph/osd/ceph-6 --journal-path /var/lib/ceph/osd/ceph-6/journal --op apply-layout-settings --pool cache
the first cot incantation objectstore_tool.30268.log:
2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _do_transaction(2771): on 0x5651ddb8fe60 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _check_replay_guard(2716): 2.10_head dne 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 15 filestore(/var/lib/ceph/osd/ceph-6) _create_collection(5441): /var/lib/ceph/osd/ceph-6/current/2.10_head 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _create_collection(5447): /var/lib/ceph/osd/ceph-6/current/2.10_head = 0 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 20 read_settings split_rand_factor = 6 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _collection_set_bits(5078): /var/lib/ceph/osd/ceph-6/current/2.10_head 5 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _collection_set_bits(5091): /var/lib/ceph/osd/ceph-6/current/2.10_head 5 = 4 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 15 filestore(/var/lib/ceph/osd/ceph-6) _create_collection(5441): /var/lib/ceph/osd/ceph-6/current/2.10_TEMP 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _create_collection(5447): /var/lib/ceph/osd/ceph-6/current/2.10_TEMP = 0 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 20 read_settings split_rand_factor = 17 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _collection_set_bits(5078): /var/lib/ceph/osd/ceph-6/current/2.10_TEMP 0 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _collection_set_bits(5091): /var/lib/ceph/osd/ceph-6/current/2.10_TEMP 0 = 4 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _set_replay_guard(2589): 3416.0.0 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 20 filestore dbobjectmap: seq is 27 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _set_replay_guard(2629): 3416.0.0 done 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _set_replay_guard(2589): 3416.0.0 2020-03-04T09:21:36.391+0000 7fd7cbc35cc0 20 filestore dbobjectmap: seq is 27 2020-03-04T09:21:36.392+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _set_replay_guard(2629): 3416.0.0 done 2020-03-04T09:21:36.392+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _check_replay_guard(2760): object has 3416.0.0 < current pos 3416.0.1, in past, will replay 2020-03-04T09:21:36.392+0000 7fd7cbc35cc0 15 filestore(/var/lib/ceph/osd/ceph-6) _collection_hint_expected_num_objs(5407): collection: 2.10_head pg number: 17 expected number of objects: 0 2020-03-04T09:21:36.392+0000 7fd7cbc35cc0 15 filestore(/var/lib/ceph/osd/ceph-6) collection_empty(5048): 2.10_head ... 2020-03-04T09:21:36.438+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) _remove(3642): 2.10_head/#2:08000000::::head# = 0 2020-03-04T09:21:36.438+0000 7fd7cbc35cc0 15 filestore(/var/lib/ceph/osd/ceph-6) _merge_collection(5801): 2.10_head 2.0_head bits 4 2020-03-04T09:21:36.438+0000 7fd7cbc35cc0 15 filestore(/var/lib/ceph/osd/ceph-6) collection_stat(5026): /var/lib/ceph/osd/ceph-6/current/2.10_head 2020-03-04T09:21:36.438+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) collection_stat(5030): /var/lib/ceph/osd/ceph-6/current/2.10_head = 0 2020-03-04T09:21:36.438+0000 7fd7cbc35cc0 15 filestore(/var/lib/ceph/osd/ceph-6) collection_stat(5026): /var/lib/ceph/osd/ceph-6/current/2.0_head 2020-03-04T09:21:36.438+0000 7fd7cbc35cc0 10 filestore(/var/lib/ceph/osd/ceph-6) collection_stat(5030): /var/lib/ceph/osd/ceph-6/current/2.0_head = 0
so.. looks like a problem with filestore replay vs merge_collection
#7 Updated by Sage Weil over 3 years ago
- Subject changed from FileStore::_do_transaction() crashed with error 17 to FileStore::_do_transaction() crashed with error 17 (merge collection vs osd restart)
- Status changed from New to Triaged
#8 Updated by Neha Ojha about 3 years ago
- Priority changed from High to Normal