Bug #10985

Updated by Loic Dachary over 5 years ago

h3. Workaround

* ceph-osd -i X --flush-journal (X is the id of the affected OSD)
* restart the OSD
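
When several OSDs are affected, the first workaround step can be scripted. The following is only a minimal sketch: the helper names and the example ids are hypothetical, it defaults to a dry run that just prints the commands, and restarting each OSD is left out because the restart command depends on the init system in use.

```python
import subprocess


def flush_journal_command(osd_id):
    """Build the argv for the first workaround step on one OSD."""
    return ["ceph-osd", "-i", str(osd_id), "--flush-journal"]


def flush_journals(osd_ids, dry_run=True):
    """Flush the journal of each listed OSD.

    With dry_run=True (the default) the commands are only printed.
    Returns the list of argv lists that were printed or executed.
    """
    commands = [flush_journal_command(osd_id) for osd_id in osd_ids]
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises on the first failure, so a failing OSD
            # is not skipped silently.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical ids; replace with the OSDs that are actually down.
    flush_journals([3, 7], dry_run=True)
```

Review the printed commands first, then call @flush_journals@ with @dry_run=False@ on the host that owns those OSDs and restart each OSD afterwards.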

h3. Description

Hello,

After upgrading my Ceph cluster from 0.92 to 0.93 earlier this morning, 10 of the 24 OSDs are down. Each failed OSD aborts during journal replay with the following assertion:

-2> 2015-03-02 09:37:15.734525 7f128a0bd880 3 journal journal_replay: applying op seq 3629506
-1> 2015-03-02 09:37:15.734706 7f128a0bd880 10 journal op_apply_start 3629506 open_ops 0 -> 1
0> 2015-03-02 09:37:15.737648 7f128a0bd880 -1 os/Transaction.cc: In function 'void ObjectStore::Transaction::_build_actions_from_tbl()' thread 7f128a0bd880 time 2015-03-02 09:37:15.734724
os/Transaction.cc: 504: FAILED assert(ops == data.ops)

ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc7e75]
2: (ObjectStore::Transaction::_build_actions_from_tbl()+0x3476) [0x9b2156]
3: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x3af0) [0x9359e0]
4: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x937954]
5: (JournalingObjectStore::journal_replay(unsigned long)+0x5db) [0x9500bb]
6: (FileStore::mount()+0x3730) [0x922180]
7: (OSD::init()+0x26c) [0x6b828c]
8: (main()+0x27f3) [0x6438e3]
9: (__libc_start_main()+0xf5) [0x7f128745faf5]
10: /usr/bin/ceph-osd() [0x65c849]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Attaching a tarball containing:
* backtrace: file "osd.3.backtrace" (for osd.3 only, but the other OSDs produce the same error)
* "ceph report" output: file "ceph_report"
* log files of the failed OSDs: directory "logs_of_failed_osds"
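
To check which of the collected logs show this same failure, the directory can be scanned for the assertion text quoted above. This is a rough sketch, assuming uncompressed plain-text log files; the function name is hypothetical and the directory path is whatever location the logs were extracted to.

```python
from pathlib import Path

# Assertion text exactly as it appears in the OSD log above.
ASSERT_SIGNATURE = "FAILED assert(ops == data.ops)"


def logs_with_assert(log_dir):
    """Return the sorted names of files in log_dir containing the assertion."""
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has stray bytes.
        text = path.read_text(errors="replace")
        if ASSERT_SIGNATURE in text:
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    # Directory name taken from the attached tarball; adjust the path as needed.
    for name in logs_with_assert("logs_of_failed_osds"):
        print(name)
```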
