Bug #5154
closedosd/SnapMapper.cc: 270: FAILED assert(check(oid))
0%
Description
0> 2013-05-23 01:52:34.870678 7f8a88dd0700 -1 osd/SnapMapper.cc: In function 'int SnapMapper::remove_oid(const hobject_t&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' thread 7f8a88dd0700 time 2013-05-23 01:52:34.842007 osd/SnapMapper.cc: 270: FAILED assert(check(oid)) ceph version 0.62-198-gc2e262f (c2e262fc9493b4bb22c2b7b4990aa1ee7846940e) 1: ceph-osd() [0x692392] 2: (remove_dir(ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::tr1::shared_ptr<DeletingState>)+0x3ac) [0x60a2cc] 3: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >)+0x2d0) [0x60ace0] 4: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> > >::_process(std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x52) [0x65a822] 5: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0xd4) [0x65a954] 6: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x83a486] 7: (ThreadPool::WorkThread::entry()+0x10) [0x83c2b0] 8: (()+0x7e9a) [0x7f8a9b76be9a] 9: (clone()+0x6d) [0x7f8a998feccd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
job was
ubuntu@teuthology:/a/teuthology-2013-05-23_01:00:08-rados-next-testing-basic/20329$ cat orig.config.yaml kernel: kdb: true sha1: cb53a7ee6c6a42a0d8b472eb1d296219def2d877 machine_type: plana nuke-on-error: true overrides: ceph: conf: global: ms inject socket failures: 5000 mon: debug mon: 20 debug ms: 20 debug paxos: 20 fs: xfs log-whitelist: - slow request sha1: c2e262fc9493b4bb22c2b7b4990aa1ee7846940e install: ceph: sha1: c2e262fc9493b4bb22c2b7b4990aa1ee7846940e s3tests: branch: next workunit: sha1: c2e262fc9493b4bb22c2b7b4990aa1ee7846940e roles: - - mon.a - mon.c - osd.0 - osd.1 - osd.2 - - mon.b - mds.a - osd.3 - osd.4 - osd.5 - client.0 tasks: - chef: null - clock.check: null - install: null - ceph: log-whitelist: - wrongly marked me down - objects unfound and apparently lost - thrashosds: chance_pgnum_grow: 1 chance_pgpnum_fix: 1 timeout: 1200 - radosbench: clients: - client.0 time: 1800
Updated by Sage Weil almost 11 years ago
- Status changed from Resolved to 12
just hit this on job
ubuntu@teuthology:/a/teuthology-2013-06-24_01:00:12-rados-master-testing-basic/43959$ cat orig.config.yaml kernel: kdb: true sha1: 3d740946b3b79d51f07d9a735a5fb77a849f57dd machine_type: plana nuke-on-error: true overrides: admin_socket: branch: master ceph: conf: global: ms inject delay max: 1 ms inject delay probability: 0.005 ms inject delay type: osd ms inject socket failures: 2500 mon: debug mon: 20 debug ms: 20 debug paxos: 20 fs: xfs log-whitelist: - slow request sha1: 134d08a9654f66634b893d493e4a92f38acc63cf install: ceph: sha1: 134d08a9654f66634b893d493e4a92f38acc63cf s3tests: branch: master workunit: sha1: 134d08a9654f66634b893d493e4a92f38acc63cf roles: - - mon.a - mon.c - osd.0 - osd.1 - osd.2 - - mon.b - mds.a - osd.3 - osd.4 - osd.5 - client.0 tasks: - chef: null - clock.check: null - install: null - ceph: log-whitelist: - wrongly marked me down - objects unfound and apparently lost - thrashosds: chance_pgnum_grow: 1 chance_pgpnum_fix: 1 timeout: 1200 - rados: clients: - client.0 objects: 500 op_weights: delete: 10 read: 45 write: 45 ops: 4000
Updated by Samuel Just almost 11 years ago
ubuntu@teuthology:/a/teuthology-2013-07-01_01:00:09-rados-master-testing-basic/51502/remote$ less ceph-osd.4.log
0> 2013-07-01 01:35:20.186234 7fd8e8f78700 -1 osd/SnapMapper.cc: In function 'int SnapMapper::remove_oid(const hobject_t&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
thread 7fd8e8f78700 time 2013-07-01 01:35:20.183963
osd/SnapMapper.cc: 270: FAILED assert(check(oid))
ceph version 0.65-225-ge235958 (e235958fb13e654aa1ad5e1c5a6ccf76a95196ca)
1: ceph-osd() [0x6ddd32]
2: (remove_dir(ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::tr1::shared_ptr<DeletingState>)+0x3e1) [0x657431]
3: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >)+0x2d0) [0x657ef0]
4: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> > >::_process(std::pai
r<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x52) [0x6a5362]
5: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> > >::_void_process(voi
d*, ThreadPool::TPHandle&)+0xd4) [0x6a5494]
6: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8a5136]
7: (ThreadPool::WorkThread::entry()+0x10) [0x8a6f60]
8: (()+0x7e9a) [0x7fd8fc915e9a]
9: (clone()+0x6d) [0x7fd8faaa8ccd]
Updated by Sage Weil almost 11 years ago
ubuntu@teuthology:/a/teuthology-2013-07-15_01:00:16-rados-next-testing-basic/67483
Updated by Samuel Just almost 11 years ago
- Status changed from 12 to Need More Info
Updated by Samuel Just almost 11 years ago
2013-07-17 15:14:10.588979 7f87c4c3a700 15 filestore(/var/lib/ceph/osd/ceph-0) _split_collection 3.0_head bits: 3
2013-07-17 15:14:10.588985 7f87c4c3a700 15 filestore(/var/lib/ceph/osd/ceph-0) collection_stat /var/lib/ceph/osd/ceph-0/current/3.0_head
2013-07-17 15:14:10.588993 7f87c4c3a700 10 filestore(/var/lib/ceph/osd/ceph-0) collection_stat /var/lib/ceph/osd/ceph-0/current/3.0_head = 0
2013-07-17 15:14:10.588997 7f87c4c3a700 15 filestore(/var/lib/ceph/osd/ceph-0) collection_stat /var/lib/ceph/osd/ceph-0/current/3.7_head
2013-07-17 15:14:10.589001 7f87c4c3a700 10 filestore(/var/lib/ceph/osd/ceph-0) collection_stat /var/lib/ceph/osd/ceph-0/current/3.7_head = 0
2013-07-17 15:14:10.589003 7f87c4c3a700 15 filestore(/var/lib/ceph/osd/ceph-0) collection_empty 3.7_head
2013-07-17 15:14:10.589010 7f87c4c3a700 20 _collection_list_partial 0//0//-1 1-1 ls.size 0
2013-07-17 15:14:10.589039 7f87c4c3a700 20 prefixes
2013-07-17 15:14:10.589052 7f87c4c3a700 10 filestore(/var/lib/ceph/osd/ceph-0) _set_replay_guard 840.0.43 START
2013-07-17 15:14:10.589057 7f87c4c3a700 20 filestore dbobjectmap: seq is 36
2013-07-17 15:14:10.622281 7f87c4c3a700 10 filestore(/var/lib/ceph/osd/ceph-0) _set_replay_guard 840.0.43 done
2013-07-17 15:14:10.622351 7f87c4c3a700 10 filestore(/var/lib/ceph/osd/ceph-0) _set_replay_guard 840.0.43 START
2013-07-17 15:14:10.622360 7f87c4c3a700 20 filestore dbobjectmap: seq is 36
2013-07-17 15:14:10.648775 7f87bf42f700 1 -- 10.214.133.30:6802/12604 <== osd.4 10.214.132.3:0/26804 13 ==== osd_ping(ping e11 stamp 2013-07-17 15:14:10.649236) v2 ==== 47+0+0 (421875485 0 0) 0x29f6700 con 0x28d5b00
2013-07-17 15:14:10.648798 7f87c0431700 1 -- 10.214.133.30:6803/12604 <== osd.4 10.214.132.3:0/26804 13 ==== osd_ping(ping e11 stamp 2013-07-17 15:14:10.649236) v2 ==== 47+0+0 (421875485 0 0) 0x27c4380 con 0x28282c0
2013-07-17 15:14:10.648812 7f87bf42f700 1 -- 10.214.133.30:6802/12604 --> 10.214.132.3:0/26804 -- osd_ping(ping_reply e11 stamp 2013-07-17 15:14:10.649236) v2 -- ?+0 0x27d31c0 con 0x28d5b00
2013-07-17 15:14:10.648837 7f87bf42f700 20 osd.0 11 _share_map_outgoing 0x2793b00 already has epoch 11
2013-07-17 15:14:10.648857 7f87c0431700 1 -- 10.214.133.30:6803/12604 --> 10.214.132.3:0/26804 -- osd_ping(ping_reply e11 stamp 2013-07-17 15:14:10.649236) v2 -- ?+0 0x27b7e00 con 0x28282c0
2013-07-17 15:14:10.648889 7f87c0431700 20 osd.0 11 _share_map_outgoing 0x2793b00 already has epoch 11
2013-07-17 15:14:10.655607 7f87c4c3a700 10 filestore(/var/lib/ceph/osd/ceph-0) _set_replay_guard 840.0.43 done
...
2013-07-17 15:14:21.408944 7f9595f68780 0 ceph version 0.66-673-g0ebf23c (0ebf23cee84180a0ae8b9fc0d8c2463ca31e6cbc), process ceph-osd, pid 14016
...
2013-07-17 15:14:21.724675 7f9595f68780 10 filestore(/var/lib/ceph/osd/ceph-0) do_transaction on 0x16c4180
2013-07-17 15:14:21.724733 7f9595f68780 10 filestore(/var/lib/ceph/osd/ceph-0) error opening file /var/lib/ceph/osd/ceph-0/current/3.0_head/plana7527305-271_head_3A673EB6__3 with flags=2: (2) No such file or directory
2013-07-17 15:14:21.724757 7f9595f68780 10 filestore(/var/lib/ceph/osd/ceph-0) check_replay_guard 3.0_head 3a673eb6/plana7527305-271/head//3 dne
2013-07-17 15:14:21.724763 7f9595f68780 15 filestore(/var/lib/ceph/osd/ceph-0) write 3.0_head/3a673eb6/plana7527305-271/head//3 1593331~588713
2013-07-17 15:14:21.725360 7f9595f68780 10 filestore(/var/lib/ceph/osd/ceph-0) write 3.0_head/3a673eb6/plana7527305-271/head//3 1593331~588713 = 588713
2013-07-17 15:14:21.725437 7f9595f68780 20 filestore(/var/lib/ceph/osd/ceph-0) _check_replay_guard no xattr
2013-07-17 15:14:21.725477 7f9595f68780 15 filestore(/var/lib/ceph/osd/ceph-0) setattrs 3.0_head/3a673eb6/plana7527305-271/head//3
2013-07-17 15:14:21.725531 7f9595f68780 10 filestore(/var/lib/ceph/osd/ceph-0) setattrs 3.0_head/3a673eb6/plana7527305-271/head//3 = -2
2013-07-17 15:14:21.725543 7f9595f68780 10 filestore(/var/lib/ceph/osd/ceph-0) tolerating ENOENT on replay
2013-07-17 15:14:21.725568 7f9595f68780 20 filestore(/var/lib/ceph/osd/ceph-0) _check_replay_guard no xattr
2013-07-17 15:14:21.725590 7f9595f68780 20 filestore(/var/lib/ceph/osd/ceph-0) fgetattrs 22 getting ''
2013-07-17 15:14:21.725602 7f9595f68780 15 filestore(/var/lib/ceph/osd/ceph-0) setattrs 3.0_head/3a673eb6/plana7527305-271/head//3
2013-07-17 15:14:21.725637 7f9595f68780 10 filestore(/var/lib/ceph/osd/ceph-0) setattrs 3.0_head/3a673eb6/plana7527305-271/head//3 = -2
2013-07-17 15:14:21.725647 7f9595f68780 10 filestore(/var/lib/ceph/osd/ceph-0) tolerating ENOENT on replay
2013-07-17 15:14:21.729082 7f9595f68780 10 filestore(/var/lib/ceph/osd/ceph-0) _do_transaction on 0x16c4180
2013-07-17 15:14:21.729088 7f9595f68780 15 filestore(/var/lib/ceph/osd/ceph-0) _omap_setkeys meta/103a0d1e/pglog_3.0/0//-1
2013-07-17 15:14:21.729167 7f9595f68780 10 filestore hoid: 103a0d1e/pglog_3.0/0//-1 not skipping op, *spos 828.0.0
2013-07-17 15:14:21.729178 7f9595f68780 10 filestore > header.spos 0.0.0
ubuntu@teuthology:/a/dzafman-5154/70686
Splits need to do a sync and prevent replay of any operation prior to that point within the affected collections. Or something.
Updated by Samuel Just almost 11 years ago
- Status changed from Need More Info to Resolved