Project

General

Profile

Actions

Bug #8270

closed

0.80~rc1: OSD crash during replication after repair

Added by Dmitry Smirnov about 10 years ago. Updated almost 10 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

    -5> 2014-05-02 17:36:59.305440 7f46b148c700  5 -- op tracker -- , seq: 124083, time: 2014-05-02 17:36:59.305310, event: throttled, request: pg_trim(0.2b to 28876'3931 e28876) v1
    -4> 2014-05-02 17:36:59.305442 7f46b148c700  5 -- op tracker -- , seq: 124083, time: 2014-05-02 17:36:59.305382, event: all_read, request: pg_trim(0.2b to 28876'3931 e28876) v1
    -3> 2014-05-02 17:36:59.305445 7f46b148c700  5 -- op tracker -- , seq: 124083, time: 2014-05-02 17:36:59.305427, event: dispatched, request: pg_trim(0.2b to 28876'3931 e28876) v1
    -2> 2014-05-02 17:36:59.305450 7f46b148c700  5 -- op tracker -- , seq: 124083, time: 2014-05-02 17:36:59.305449, event: waiting_for_osdmap, request: pg_trim(0.2b to 28876'3931 e28876) v1
    -1> 2014-05-02 17:36:59.305460 7f46b148c700  5 -- op tracker -- , seq: 124083, time: 2014-05-02 17:36:59.305458, event: started, request: pg_trim(0.2b to 28876'3931 e28876) v1
     0> 2014-05-02 17:36:59.330794 7f46aac7f700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)' thread 7f46aac7f700 time 2014-05-02 17:36:59.215010
osd/ReplicatedPG.cc: 9682: FAILED assert(latest->is_update())

 ceph version 0.80-rc1 (6769f4dc88425396921f94e1a37a1c90758aa3ea)
 1: (ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)+0x441) [0x7f46ca066a71]
 2: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&, int*)+0x54b) [0x7f46ca095feb]
 3: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x2ae) [0x7f46c9eb488e]
 4: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x17) [0x7f46c9f11437]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0x7f46ca2c9bb1]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x7f46ca2caaa0]
 7: (()+0x8062) [0x7f46c93a9062]
 8: (clone()+0x6d) [0x7f46c78d1a3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.6.log
--- end dump of recent events ---
2014-05-02 17:36:59.399677 7f46aac7f700 -1 *** Caught signal (Aborted) **
 in thread 7f46aac7f700

 ceph version 0.80-rc1 (6769f4dc88425396921f94e1a37a1c90758aa3ea)
 1: (()+0x5a5daf) [0x7f46ca1f7daf]
 2: (()+0xf880) [0x7f46c93b0880]
 3: (gsignal()+0x39) [0x7f46c78213a9]
 4: (abort()+0x148) [0x7f46c78244c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f46c810e5e5]
 6: (()+0x5e746) [0x7f46c810c746]
 7: (()+0x5e773) [0x7f46c810c773]
 8: (()+0x5e9b2) [0x7f46c810c9b2]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0x7f46ca2d8cf2]
 10: (ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)+0x441) [0x7f46ca066a71]
 11: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&, int*)+0x54b) [0x7f46ca095feb]
 12: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x2ae) [0x7f46c9eb488e]
 13: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x17) [0x7f46c9f11437]
 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0x7f46ca2c9bb1]
 15: (ThreadPool::WorkThread::entry()+0x10) [0x7f46ca2caaa0]
 16: (()+0x8062) [0x7f46c93a9062]
 17: (clone()+0x6d) [0x7f46c78d1a3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -26> 2014-05-02 17:36:59.351778 7f46ab480700  5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.351777, event: started, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4
   -25> 2014-05-02 17:36:59.351796 7f46ab480700  5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.351795, event: started, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4
   -24> 2014-05-02 17:36:59.351832 7f46ab480700  5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.351831, event: waiting for subops from 1,7,9, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4
   -23> 2014-05-02 17:36:59.351850 7f46ab480700  1 -- 192.168.0.204:6811/1007885 --> osd.1 192.168.0.250:6809/19720 -- osd_sub_op(mds.0.164:467954 0.19 2575c599/100002d177c.00000000/head//0 [] v 28876'322141 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46d3db6b00
   -22> 2014-05-02 17:36:59.351867 7f46ab480700  1 -- 192.168.0.204:6811/1007885 --> osd.7 192.168.0.2:6814/17668 -- osd_sub_op(mds.0.164:467954 0.19 2575c599/100002d177c.00000000/head//0 [] v 28876'322141 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46e703b700
   -21> 2014-05-02 17:36:59.351881 7f46ab480700  1 -- 192.168.0.204:6811/1007885 --> osd.9 192.168.0.7:6801/1155 -- osd_sub_op(mds.0.164:467954 0.19 2575c599/100002d177c.00000000/head//0 [] v 28876'322141 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46eabb1700
   -20> 2014-05-02 17:36:59.351935 7f46ab480700  5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.351934, event: commit_queued_for_journal_write, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4
   -19> 2014-05-02 17:36:59.351969 7f46c08de700  5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.351968, event: write_thread_in_journal_buffer, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4
   -18> 2014-05-02 17:36:59.352159 7f46c08de700  5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.352159, event: journaled_completion_queued, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4
   -17> 2014-05-02 17:36:59.352177 7f46bf8dc700  5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.352176, event: op_commit, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4
   -16> 2014-05-02 17:36:59.352729 7f46be0d9700  5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.352729, event: op_applied, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4
   -15> 2014-05-02 17:36:59.355718 7f46aec87700  1 -- 192.168.0.204:6812/1007885 <== osd.0 192.168.0.250:0/19386 978 ==== osd_ping(ping e28876 stamp 2014-05-02 17:36:59.352828) v2 ==== 47+0+0 (1505424456 0 0) 0x7f46e85b0300 con 0x7f46e6d56000
   -14> 2014-05-02 17:36:59.355737 7f46aec87700  1 -- 192.168.0.204:6812/1007885 --> 192.168.0.250:0/19386 -- osd_ping(ping_reply e28876 stamp 2014-05-02 17:36:59.352828) v2 -- ?+0 0x7f46e0e7c540 con 0x7f46e6d56000
   -13> 2014-05-02 17:36:59.355959 7f46afc89700  1 -- 192.168.0.204:6813/1007885 <== osd.0 192.168.0.250:0/19386 978 ==== osd_ping(ping e28876 stamp 2014-05-02 17:36:59.352828) v2 ==== 47+0+0 (1505424456 0 0) 0x7f46dd0504c0 con 0x7f46e6d566e0
   -12> 2014-05-02 17:36:59.355975 7f46afc89700  1 -- 192.168.0.204:6813/1007885 --> 192.168.0.250:0/19386 -- osd_ping(ping_reply e28876 stamp 2014-05-02 17:36:59.352828) v2 -- ?+0 0x7f46e76e72c0 con 0x7f46e6d566e0
   -11> 2014-05-02 17:36:59.375383 7f46abc81700  5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375383, event: started, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4
   -10> 2014-05-02 17:36:59.375413 7f46abc81700  5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375412, event: started, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4
    -9> 2014-05-02 17:36:59.375461 7f46abc81700  5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375460, event: waiting for subops from 3,4,9, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4
    -8> 2014-05-02 17:36:59.375480 7f46abc81700  1 -- 192.168.0.204:6811/1007885 --> osd.3 192.168.0.2:6821/16947 -- osd_sub_op(mds.0.164:467953 0.71 e530bf1/100002d177b.00000000/head//0 [] v 28876'272097 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46ebaa6000
    -7> 2014-05-02 17:36:59.375500 7f46abc81700  1 -- 192.168.0.204:6811/1007885 --> osd.4 192.168.0.250:6819/20528 -- osd_sub_op(mds.0.164:467953 0.71 e530bf1/100002d177b.00000000/head//0 [] v 28876'272097 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46db7ee200
    -6> 2014-05-02 17:36:59.375515 7f46abc81700  1 -- 192.168.0.204:6811/1007885 --> osd.9 192.168.0.7:6801/1155 -- osd_sub_op(mds.0.164:467953 0.71 e530bf1/100002d177b.00000000/head//0 [] v 28876'272097 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46e747a100
    -5> 2014-05-02 17:36:59.375573 7f46abc81700  5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375572, event: commit_queued_for_journal_write, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4
    -4> 2014-05-02 17:36:59.375611 7f46c08de700  5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375610, event: write_thread_in_journal_buffer, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4
    -3> 2014-05-02 17:36:59.375807 7f46c08de700  5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375806, event: journaled_completion_queued, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4
    -2> 2014-05-02 17:36:59.375938 7f46bf8dc700  5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375937, event: op_commit, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4
    -1> 2014-05-02 17:36:59.376326 7f46be0d9700  5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.376325, event: op_applied, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4
     0> 2014-05-02 17:36:59.399677 7f46aac7f700 -1 *** Caught signal (Aborted) **
 in thread 7f46aac7f700

 ceph version 0.80-rc1 (6769f4dc88425396921f94e1a37a1c90758aa3ea)
 1: (()+0x5a5daf) [0x7f46ca1f7daf]
 2: (()+0xf880) [0x7f46c93b0880]
 3: (gsignal()+0x39) [0x7f46c78213a9]
 4: (abort()+0x148) [0x7f46c78244c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f46c810e5e5]
 6: (()+0x5e746) [0x7f46c810c746]
 7: (()+0x5e773) [0x7f46c810c773]
 8: (()+0x5e9b2) [0x7f46c810c9b2]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0x7f46ca2d8cf2]
 10: (ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)+0x441) [0x7f46ca066a71]
 11: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&, int*)+0x54b) [0x7f46ca095feb]
 12: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x2ae) [0x7f46c9eb488e]
 13: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x17) [0x7f46c9f11437]
 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0x7f46ca2c9bb1]
 15: (ThreadPool::WorkThread::entry()+0x10) [0x7f46ca2caaa0]
 16: (()+0x8062) [0x7f46c93a9062]
 17: (clone()+0x6d) [0x7f46c78d1a3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.6.log
--- end dump of recent events ---
Actions

Also available in: Atom PDF