Bug #10985

Updated by Loïc Dachary about 9 years ago

h3. Workaround 

 * get a ceph-osd binary from v0.92 
 * run ceph-osd -i X --flush-journal with that binary, where X is the id of the affected OSD 
 * restart the OSD 
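
The steps above can be sketched as a small shell script. This is a sketch under stated assumptions, not a tested procedure: the path @/opt/ceph-0.92/bin/ceph-osd@ and the @service ceph start@ command are hypothetical and depend on where the v0.92 binary was unpacked and on the init system in use. Only @ceph-osd -i X --flush-journal@ comes from the steps themselves. With @DRY_RUN=1@ (the default here) the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the workaround: flush the journal of each given OSD with a
# v0.92 ceph-osd binary, then start the OSD again.
# ASSUMPTIONS (not from the report): the OLD_BIN path and the service command.
set -eu

OLD_BIN="${OLD_BIN:-/opt/ceph-0.92/bin/ceph-osd}"  # hypothetical location of the v0.92 binary
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

flush_osd() {
    osd_id="$1"
    # The affected OSDs are already down, so no stop step is issued here.
    run "$OLD_BIN" -i "$osd_id" --flush-journal
    run service ceph start "osd.$osd_id"   # assumed sysvinit-style command
}

# Process every OSD id given on the command line.
for id in "$@"; do
    flush_osd "$id"
done
```

Saved under a name of your choice (for example @flush_journal.sh@), it could be invoked as @DRY_RUN=0 sh flush_journal.sh 3@ for osd.3 once the printed commands look right for the host.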

h3. Description 

 Hello, 

 After upgrading my Ceph cluster from 0.92 to 0.93 earlier this morning, 10 of the 24 OSDs are down. The affected OSDs abort during journal replay with the following assertion: 


     -2> 2015-03-02 09:37:15.734525 7f128a0bd880    3 journal journal_replay: applying op seq 3629506 
     -1> 2015-03-02 09:37:15.734706 7f128a0bd880 10 journal op_apply_start 3629506 open_ops 0 -> 1 
      0> 2015-03-02 09:37:15.737648 7f128a0bd880 -1 os/Transaction.cc: In function 'void ObjectStore::Transaction::_build_actions_from_tbl()' thread 7f128a0bd880 time 2015-03-02 09:37:15.734724 
 os/Transaction.cc: 504: FAILED assert(ops == data.ops) 

  ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4) 
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc7e75] 
  2: (ObjectStore::Transaction::_build_actions_from_tbl()+0x3476) [0x9b2156] 
  3: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x3af0) [0x9359e0] 
  4: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x937954] 
  5: (JournalingObjectStore::journal_replay(unsigned long)+0x5db) [0x9500bb] 
  6: (FileStore::mount()+0x3730) [0x922180] 
  7: (OSD::init()+0x26c) [0x6b828c] 
  8: (main()+0x27f3) [0x6438e3] 
  9: (__libc_start_main()+0xf5) [0x7f128745faf5] 
  10: /usr/bin/ceph-osd() [0x65c849] 
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 



 I am attaching a tarball containing: 
 - the backtrace, in file "osd.3.backtrace" (for osd.3 only; the other OSDs produce the same error) 
 - the output of "ceph report", in file "ceph_report" 
 - the log files of the failed OSDs, in directory "logs_of_failed_osds"
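
To check which OSD logs contain the same assertion, the log directory can be searched for the failure string from the backtrace. A minimal sketch, assuming the logs are unpacked into a directory such as @logs_of_failed_osds@; the function name @list_asserting_logs@ is illustrative only.

```shell
#!/bin/sh
# List log files that contain the journal-replay assertion from this report.
set -eu

# Failure string copied from the backtrace.
PATTERN='FAILED assert(ops == data.ops)'

list_asserting_logs() {
    log_dir="$1"
    # -F: treat the pattern as a fixed string (parentheses are literal)
    # -l: print only the names of matching files
    # -r: descend into the directory
    grep -rlF -- "$PATTERN" "$log_dir" || true   # finding no match is not an error
}

# Run only when a directory is given on the command line.
if [ "$#" -gt 0 ]; then
    list_asserting_logs "$1"
fi
```

For example, @sh find_assert.sh logs_of_failed_osds@ (the script name is arbitrary) prints one path per log file that hit the assertion.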
