Bug #8047

0.79: new OSD crashed within minutes

Added by Dmitry Smirnov almost 10 years ago. Updated almost 10 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On 0.79 I added a new OSD (on btrfs). Shortly after rebalancing began, the newly added OSD crashed:

    -5> 2014-04-09 16:44:26.998155 7f82d7908700  1 -- 192.168.0.2:6819/13171 <== osd.1 192.168.0.250:0/14833 75 ==== osd_ping(ping e14837 stamp 2014-04-09 16:44:26.980489) v2 ==== 47+0+0 (2626240657 0 0) 0x7f82f4a8a380 con 0x7f82eb58edc0
    -4> 2014-04-09 16:44:26.998175 7f82d7908700  1 -- 192.168.0.2:6819/13171 --> 192.168.0.250:0/14833 -- osd_ping(ping_reply e14837 stamp 2014-04-09 16:44:26.980489) v2 -- ?+0 0x7f82f370a8c0 con 0x7f82eb58edc0
    -3> 2014-04-09 16:44:27.212032 7f82dda55700  0 filestore(/var/lib/ceph/osd/ceph-11)  error (1) Operation not permitted not handled on operation 32 (4143.0.0, or op 0, counting from 0)
    -2> 2014-04-09 16:44:27.212056 7f82dda55700  0 filestore(/var/lib/ceph/osd/ceph-11) unexpected error code
    -1> 2014-04-09 16:44:27.212058 7f82dda55700  0 filestore(/var/lib/ceph/osd/ceph-11)  transaction dump:
{ "ops": [
        { "op_num": 0,
          "op_name": "omap_setkeys",
          "collection": "meta",
          "oid": "516b9a0b\/pglog_2.63\/0\/\/-1",
          "attr_lens": { "0000014837.00000000000000491457": 172}},
        { "op_num": 1,
          "op_name": "collection_setattr",
          "collection": "2.63_head",
          "name": "info",
          "length": 1},
        { "op_num": 2,
          "op_name": "omap_setkeys",
          "collection": "meta",
          "oid": "16ef7597\/infos\/head\/\/-1",
          "attr_lens": { "2.63_biginfo": 138,
              "2.63_epoch": 4,
              "2.63_info": 650}},
        { "op_num": 3,
          "op_name": "omap_rmkeys",
          "collection": "meta",
          "oid": "516b9a0b\/pglog_2.63\/0\/\/-1"},
        { "op_num": 4,
          "op_name": "omap_setkeys",
          "collection": "meta",
          "oid": "516b9a0b\/pglog_2.63\/0\/\/-1",
          "attr_lens": { "0000014837.00000000000000491457": 172,
              "can_rollback_to": 12}}]}
     0> 2014-04-09 16:44:27.238338 7f82dda55700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f82dda55700 time 2014-04-09 16:44:27.227975
os/FileStore.cc: 2540: FAILED assert(0 == "unexpected error")

 ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xb7c) [0x7f82e9f505ac]
 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x6c) [0x7f82e9f542dc]
 3: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x160) [0x7f82e9f54460]
 4: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0x7f82ea10fb91]
 5: (ThreadPool::WorkThread::entry()+0x10) [0x7f82ea110a80]
 6: (()+0x8062) [0x7f82e91f6062]
 7: (clone()+0x6d) [0x7f82e771da3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.11.log
--- end dump of recent events ---
2014-04-09 16:44:27.345195 7f82dda55700 -1 *** Caught signal (Aborted) **
 in thread 7f82dda55700

 ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)
 1: (()+0x59d26f) [0x7f82ea03e26f]
 2: (()+0xf880) [0x7f82e91fd880]
 3: (gsignal()+0x39) [0x7f82e766d3a9]
 4: (abort()+0x148) [0x7f82e76704c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f82e7f5a5e5]
 6: (()+0x5e746) [0x7f82e7f58746]
 7: (()+0x5e773) [0x7f82e7f58773]
 8: (()+0x5e9b2) [0x7f82e7f589b2]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0x7f82ea11ecd2]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xb7c) [0x7f82e9f505ac]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x6c) [0x7f82e9f542dc]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x160) [0x7f82e9f54460]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0x7f82ea10fb91]
 14: (ThreadPool::WorkThread::entry()+0x10) [0x7f82ea110a80]
 15: (()+0x8062) [0x7f82e91f6062]
 16: (clone()+0x6d) [0x7f82e771da3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2014-04-09 16:44:27.345195 7f82dda55700 -1 *** Caught signal (Aborted) **
 in thread 7f82dda55700

 ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)
 1: (()+0x59d26f) [0x7f82ea03e26f]
 2: (()+0xf880) [0x7f82e91fd880]
 3: (gsignal()+0x39) [0x7f82e766d3a9]
 4: (abort()+0x148) [0x7f82e76704c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f82e7f5a5e5]
 6: (()+0x5e746) [0x7f82e7f58746]
 7: (()+0x5e773) [0x7f82e7f58773]
 8: (()+0x5e9b2) [0x7f82e7f589b2]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0x7f82ea11ecd2]
 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xb7c) [0x7f82e9f505ac]
 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x6c) [0x7f82e9f542dc]
 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x160) [0x7f82e9f54460]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0x7f82ea10fb91]
 14: (ThreadPool::WorkThread::entry()+0x10) [0x7f82ea110a80]
 15: (()+0x8062) [0x7f82e91f6062]
 16: (clone()+0x6d) [0x7f82e771da3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.11.log
--- end dump of recent events ---

History

#1 Updated by Dmitry Smirnov almost 10 years ago

Never mind, the root cause of this one is physical media errors on the HDD. Apologies for the noise; please close this bug.
Thanks.

#2 Updated by Ian Colle almost 10 years ago

  • Status changed from New to Closed
