Bug #11152

"Crash: 'wait_until_healthy'" in upgrade:giant-giant-distro-basic-vps run

Added by Yuri Weinstein about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
Start date: 03/18/2015
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

fea29b1bcbd17b3d1f642398ec70dbe258bbc98f

Run: http://pulpito.ceph.com/teuthology-2015-03-15_17:15:01-upgrade:giant-giant-distro-basic-vps/
Jobs: ['805105', '805106', '805109']
Logs for one: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-15_17:15:01-upgrade:giant-giant-distro-basic-vps/805105/

Crash: 'wait_until_healthy' reached maximum tries (150) after waiting for 900 seconds
ceph version 0.87.1-1-g938e036 (938e03630e075af03780da139ae879b5b0377734)
 1: ceph-osd() [0xba734c]
 2: (()+0xf030) [0x7f18e0f9e030]
 3: (gsignal()+0x35) [0x7f18df909475]
 4: (abort()+0x180) [0x7f18df90c6f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f18e015e89d]
 6: (()+0x63996) [0x7f18e015c996]
 7: (()+0x639c3) [0x7f18e015c9c3]
 8: (()+0x63bee) [0x7f18e015cbee]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0xc8ccb7]
 10: (pg_log_entry_t::decode(ceph::buffer::list::iterator&)+0x22) [0x894ec2]
 11: (pg_log_entry_t::decode_with_checksum(ceph::buffer::list::iterator&)+0x104) [0x895844]
 12: (PGLog::read_log(ObjectStore*, coll_t, hobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*)+0x1250) [0x87ca90]
 13: (PGLog::read_log(ObjectStore*, coll_t, hobject_t, pg_info_t const&, std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&)+0xa1) [0x901271]
 14: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0xdd) [0x8eafed]
 15: (OSD::load_pgs()+0x8f9) [0x7c6f09]
 16: (OSD::init()+0x13fd) [0x7cb11d]
 17: (main()+0x24c3) [0x7710e3]
 18: (__libc_start_main()+0xfd) [0x7f18df8f5ead]
 19: ceph-osd() [0x776089]

And jobs: ['805098', '805100', '805102', '805103', '805107', '805108']
Logs for one: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-15_17:15:01-upgrade:giant-giant-distro-basic-vps/805102/

Crash: 'wait_until_healthy' reached maximum tries (150) after waiting for 900 seconds
ceph version 0.87.1-1-g938e036 (938e03630e075af03780da139ae879b5b0377734)
 1: ceph-osd() [0xa02fc5]
 2: (()+0xf710) [0x7f983106a710]
 3: (gsignal()+0x35) [0x7f982ff3c635]
 4: (abort()+0x175) [0x7f982ff3de15]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7f98307f6a5d]
 6: (()+0xbcbe6) [0x7f98307f4be6]
 7: (()+0xbcc13) [0x7f98307f4c13]
 8: (()+0xbcd0e) [0x7f98307f4d0e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x13e) [0xa595fe]
 10: (pg_log_entry_t::decode(ceph::buffer::list::iterator&)+0x2c) [0x7b06ec]
 11: (pg_log_entry_t::decode_with_checksum(ceph::buffer::list::iterator&)+0x177) [0x7b0d97]
 12: (PGLog::read_log(ObjectStore*, coll_t, hobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*)+0x1ae2) [0x7d46f2]
 13: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2bc) [0x817a4c]
 14: (OSD::load_pgs()+0x1532) [0x67a482]
 15: (OSD::init()+0x186b) [0x67d99b]
 16: (main()+0x3609) [0x6132d9]
 17: (__libc_start_main()+0xfd) [0x7f982ff28d5d]
 18: ceph-osd() [0x60ec89]


Related issues

Related to Ceph - Bug #10157: PGLog::(read|write)_log don't write out rollback_info_trimmed_to (Resolved, 11/20/2014)

History

#1 Updated by Sage Weil about 4 years ago

  • Status changed from New to Resolved

Giant was missing the backport of 1fe8b846641486cc294fe7e1d2450132c38d2dba, the fix for #10157. The latest firefly was setting the pglog key rollback_info_trimmed_to, but giant did not recognize that key, interpreted it as an update key (a log entry), and crashed while decoding it.
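
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Ceph's actual code: the byte layout, the `read_log` and `decode_entry` helpers, the `DecodeError` type, and the use of `can_rollback_to` as a second metadata key are all hypothetical. Only the key name rollback_info_trimmed_to comes from this report. A reader that does not know a metadata key falls through to the log-entry decoder, which runs out of bytes, much like the `copy` call under `pg_log_entry_t::decode` in the traces.

```python
import struct


class DecodeError(Exception):
    """Toy stand-in for a buffer overrun while decoding a log entry."""


def encode_entry(version, op):
    # Hypothetical entry layout: 8-byte version, 4-byte name length, name bytes.
    name = op.encode()
    return struct.pack("<QI", version, len(name)) + name


def decode_entry(blob):
    # Fail if the value is too short to hold the entry header or body.
    if len(blob) < 12:
        raise DecodeError("short read in header")
    version, n = struct.unpack_from("<QI", blob, 0)
    if len(blob) < 12 + n:
        raise DecodeError("short read in body")
    return version, blob[12:12 + n].decode()


def read_log(omap, meta_keys):
    """Parse known metadata keys as metadata; treat every other key as a log entry."""
    meta, entries = {}, []
    for key in sorted(omap):
        value = omap[key]
        if key in meta_keys:
            meta[key] = struct.unpack("<Q", value)[0]
        else:
            entries.append(decode_entry(value))
    return meta, entries


# Assumption: the older reader lacks the newer key, the fixed reader has it.
OLD_READER_KEYS = {"can_rollback_to"}
NEW_READER_KEYS = OLD_READER_KEYS | {"rollback_info_trimmed_to"}


def demo():
    # Data as written by a newer writer: one entry plus two metadata keys.
    omap = {
        "0000000001": encode_entry(1, "modify"),
        "can_rollback_to": struct.pack("<Q", 1),
        "rollback_info_trimmed_to": struct.pack("<Q", 1),
    }
    new_result = read_log(omap, NEW_READER_KEYS)
    try:
        read_log(omap, OLD_READER_KEYS)
        old_failed = False
    except DecodeError:
        old_failed = True
    return new_result, old_failed
```

In this model the reader with the extended key set parses the data cleanly, while the older key set raises `DecodeError` on the unrecognized key. In the real daemon the equivalent exception was not caught, which would be consistent with the `abort()` frames in both traces.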

#2 Updated by Loic Dachary about 4 years ago

  • Description updated (diff)
