Bug #5270
osd: crash in PG::peek_map_epoch()
Status: Closed
% Done: 0%
Source: Q/A
Severity: 3 - minor
Description
    -3> 2013-06-06 22:11:43.947436 7f69f8cda780 10 osd.1 245 pgid 76.5 coll 76.5_head
    -2> 2013-06-06 22:11:43.947446 7f69f8cda780 15 filestore(/var/lib/ceph/osd/ceph-1) collection_getattr /var/lib/ceph/osd/ceph-1/current/76.5_head 'info'
    -1> 2013-06-06 22:11:43.947468 7f69f8cda780 10 filestore(/var/lib/ceph/osd/ceph-1) collection_getattr /var/lib/ceph/osd/ceph-1/current/76.5_head 'info' = -61
     0> 2013-06-06 22:11:43.950057 7f69f8cda780 -1 *** Caught signal (Aborted) **
 in thread 7f69f8cda780

 ceph version 0.63-393-g08923eb (08923eb842a9768ff556939221e63b983724e9bf)
 1: ceph-osd() [0x7a825a]
 2: (()+0xfcb0) [0x7f69f86bdcb0]
 3: (gsignal()+0x35) [0x7f69f678b425]
 4: (abort()+0x17b) [0x7f69f678eb8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f69f70dd69d]
 6: (()+0xb5846) [0x7f69f70db846]
 7: (()+0xb5873) [0x7f69f70db873]
 8: (()+0xb596e) [0x7f69f70db96e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0x85ab37]
 10: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x112) [0x6c1e82]
 11: (OSD::load_pgs()+0x14dd) [0x64cd6d]
 12: (OSD::init()+0xf85) [0x64f1c5]
 13: (main()+0x1cff) [0x5836cf]
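For reference, -61 is -ENODATA on Linux, meaning the 'info' attribute was not found. The backtrace is consistent with the decode in PG::peek_map_epoch() then reading from an empty buffer. A minimal Python sketch of that pattern follows. It is not the actual Ceph code: the helper names and the 4-byte little-endian epoch layout are made up for illustration.

```python
import errno
import struct


def collection_getattr(attrs, name):
    """Hypothetical stand-in for the store call: returns (rc, data)."""
    if name not in attrs:
        # Missing attribute: negative errno and no data.
        return -errno.ENODATA, b""
    return len(attrs[name]), attrs[name]


def peek_epoch_unchecked(attrs):
    """Ignore the return code and decode anyway (the failing pattern)."""
    _rc, data = collection_getattr(attrs, "info")
    # Raises struct.error when data is shorter than 4 bytes.
    (epoch,) = struct.unpack_from("<I", data)
    return epoch


def peek_epoch_checked(attrs):
    """Check the return code first and report a missing attribute cleanly."""
    rc, data = collection_getattr(attrs, "info")
    if rc < 0:
        return rc, None
    (epoch,) = struct.unpack_from("<I", data)
    return 0, epoch
```

With an empty attribute set, peek_epoch_unchecked({}) raises, while peek_epoch_checked({}) returns the negative errno so the caller can decide how to handle the damaged collection.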
Several instances on current master. One test was:
ubuntu@teuthology:/a/sage-2013-06-06_17:56:44-rados-wip-mon-testing-basic/32596$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 19bb6a83cb93383b363cc5956e304213f0f1b79f
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 08923eb842a9768ff556939221e63b983724e9bf
  install:
    ceph:
      sha1: 08923eb842a9768ff556939221e63b983724e9bf
  s3tests:
    branch: master
  workunit:
    sha1: 08923eb842a9768ff556939221e63b983724e9bf
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test.sh
Looking at the store, the 'info' xattr is not there:
root@plana24:/var/lib/ceph/osd/ceph-1/current/76.5_head# getfattr -d .
# file: .
user.cephos.collection_version=0sAwAAAA==
user.cephos.phash.contents=0sAQAAAAAAAAAAAAAAAAAAAAA=
user.cephos.seq=0sAQEQAAAAHRMAAAAAAAAAAAAABQAAAAA=
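The 0s prefix in getfattr output marks a base64-encoded value (0x marks hex). A small sketch for decoding the values above. Treating collection_version as a little-endian 32-bit integer is an assumption about the on-disk encoding, not something confirmed in this ticket, and the function name is made up.

```python
import base64
import struct


def decode_getfattr_value(text):
    """Decode a getfattr value with a '0s' (base64) or '0x' (hex) prefix."""
    if text.startswith("0s"):
        return base64.b64decode(text[2:])
    if text.startswith("0x"):
        return bytes.fromhex(text[2:])
    raise ValueError("unsupported getfattr encoding: %r" % text)


raw = decode_getfattr_value("0sAwAAAA==")
print(raw.hex())  # 03000000

# Assumption: the value is a little-endian unsigned 32-bit integer.
(version,) = struct.unpack("<I", raw)
print(version)  # 3
```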
Other jobs, also with PG splitting:
ubuntu@teuthology:/a/sage-2013-06-06_17:56:44-rados-wip-mon-testing-basic/32578
ubuntu@teuthology:/a/sage-2013-06-06_17:56:44-rados-wip-mon-testing-basic/32584
ubuntu@teuthology:/a/sage-2013-06-06_17:56:44-rados-wip-mon-testing-basic/32590
Updated by Samuel Just almost 11 years ago
Very odd. That xattr is written atomically on pg collection creation and never overwritten thereafter.
Updated by Sergey Fionov almost 11 years ago
I got the same error when some pginfo files were lost due to XFS corruption. Removing the PG collection allowed the OSD to start again.