Actions
Bug #8270
closed0.80~rc1: OSD crash during replication after repair
Status:
Can't reproduce
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
-5> 2014-05-02 17:36:59.305440 7f46b148c700 5 -- op tracker -- , seq: 124083, time: 2014-05-02 17:36:59.305310, event: throttled, request: pg_trim(0.2b to 28876'3931 e28876) v1 -4> 2014-05-02 17:36:59.305442 7f46b148c700 5 -- op tracker -- , seq: 124083, time: 2014-05-02 17:36:59.305382, event: all_read, request: pg_trim(0.2b to 28876'3931 e28876) v1 -3> 2014-05-02 17:36:59.305445 7f46b148c700 5 -- op tracker -- , seq: 124083, time: 2014-05-02 17:36:59.305427, event: dispatched, request: pg_trim(0.2b to 28876'3931 e28876) v1 -2> 2014-05-02 17:36:59.305450 7f46b148c700 5 -- op tracker -- , seq: 124083, time: 2014-05-02 17:36:59.305449, event: waiting_for_osdmap, request: pg_trim(0.2b to 28876'3931 e28876) v1 -1> 2014-05-02 17:36:59.305460 7f46b148c700 5 -- op tracker -- , seq: 124083, time: 2014-05-02 17:36:59.305458, event: started, request: pg_trim(0.2b to 28876'3931 e28876) v1 0> 2014-05-02 17:36:59.330794 7f46aac7f700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)' thread 7f46aac7f700 time 2014-05-02 17:36:59.215010 osd/ReplicatedPG.cc: 9682: FAILED assert(latest->is_update()) ceph version 0.80-rc1 (6769f4dc88425396921f94e1a37a1c90758aa3ea) 1: (ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)+0x441) [0x7f46ca066a71] 2: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&, int*)+0x54b) [0x7f46ca095feb] 3: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x2ae) [0x7f46c9eb488e] 4: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x17) [0x7f46c9f11437] 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0x7f46ca2c9bb1] 6: (ThreadPool::WorkThread::entry()+0x10) [0x7f46ca2caaa0] 7: (()+0x8062) [0x7f46c93a9062] 8: (clone()+0x6d) [0x7f46c78d1a3d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- logging levels --- 0/ 5 none 0/ 1 lockdep 0/ 1 context 1/ 1 crush 1/ 5 mds 1/ 5 mds_balancer 1/ 5 mds_locker 1/ 5 mds_log 1/ 5 mds_log_expire 1/ 5 mds_migrator 0/ 1 buffer 0/ 1 timer 0/ 1 filer 0/ 1 striper 0/ 1 objecter 0/ 5 rados 0/ 5 rbd 0/ 5 journaler 0/ 5 objectcacher 0/ 5 client 0/ 5 osd 0/ 5 optracker 0/ 5 objclass 1/ 3 filestore 1/ 3 keyvaluestore 1/ 3 journal 0/ 5 ms 1/ 5 mon 0/10 monc 1/ 5 paxos 0/ 5 tp 1/ 5 auth 1/ 5 crypto 1/ 1 finisher 1/ 5 heartbeatmap 1/ 5 perfcounter 1/ 5 rgw 1/ 5 javaclient 1/ 5 asok 1/ 1 throttle -2/-2 (syslog threshold) -1/-1 (stderr threshold) max_recent 10000 max_new 1000 log_file /var/log/ceph/ceph-osd.6.log --- end dump of recent events --- 2014-05-02 17:36:59.399677 7f46aac7f700 -1 *** Caught signal (Aborted) ** in thread 7f46aac7f700 ceph version 0.80-rc1 (6769f4dc88425396921f94e1a37a1c90758aa3ea) 1: (()+0x5a5daf) [0x7f46ca1f7daf] 2: (()+0xf880) [0x7f46c93b0880] 3: (gsignal()+0x39) [0x7f46c78213a9] 4: (abort()+0x148) [0x7f46c78244c8] 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f46c810e5e5] 6: (()+0x5e746) [0x7f46c810c746] 7: (()+0x5e773) [0x7f46c810c773] 8: (()+0x5e9b2) [0x7f46c810c9b2] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0x7f46ca2d8cf2] 10: (ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)+0x441) [0x7f46ca066a71] 11: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&, int*)+0x54b) [0x7f46ca095feb] 12: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x2ae) [0x7f46c9eb488e] 13: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x17) [0x7f46c9f11437] 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0x7f46ca2c9bb1] 15: (ThreadPool::WorkThread::entry()+0x10) [0x7f46ca2caaa0] 16: (()+0x8062) [0x7f46c93a9062] 17: (clone()+0x6d) [0x7f46c78d1a3d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- begin dump of recent events --- -26> 2014-05-02 17:36:59.351778 7f46ab480700 5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.351777, event: started, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4 -25> 2014-05-02 17:36:59.351796 7f46ab480700 5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.351795, event: started, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4 -24> 2014-05-02 17:36:59.351832 7f46ab480700 5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.351831, event: waiting for subops from 1,7,9, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4 -23> 2014-05-02 17:36:59.351850 7f46ab480700 1 -- 192.168.0.204:6811/1007885 --> osd.1 192.168.0.250:6809/19720 -- osd_sub_op(mds.0.164:467954 0.19 2575c599/100002d177c.00000000/head//0 [] v 28876'322141 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46d3db6b00 -22> 2014-05-02 17:36:59.351867 7f46ab480700 1 -- 192.168.0.204:6811/1007885 --> osd.7 192.168.0.2:6814/17668 -- osd_sub_op(mds.0.164:467954 0.19 2575c599/100002d177c.00000000/head//0 [] v 28876'322141 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46e703b700 -21> 2014-05-02 17:36:59.351881 7f46ab480700 1 -- 192.168.0.204:6811/1007885 --> osd.9 192.168.0.7:6801/1155 -- osd_sub_op(mds.0.164:467954 0.19 2575c599/100002d177c.00000000/head//0 [] v 28876'322141 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46eabb1700 -20> 2014-05-02 17:36:59.351935 7f46ab480700 5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.351934, event: commit_queued_for_journal_write, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4 -19> 2014-05-02 17:36:59.351969 7f46c08de700 5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.351968, event: write_thread_in_journal_buffer, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4 -18> 2014-05-02 17:36:59.352159 7f46c08de700 5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.352159, event: journaled_completion_queued, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4 -17> 2014-05-02 17:36:59.352177 7f46bf8dc700 5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.352176, event: op_commit, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4 -16> 2014-05-02 17:36:59.352729 7f46be0d9700 5 -- op tracker -- , seq: 124082, time: 2014-05-02 17:36:59.352729, event: op_applied, request: osd_op(mds.0.164:467954 100002d177c.00000000 [delete] 0.2575c599 snapc 1=[] ondisk+write e28876) v4 -15> 2014-05-02 17:36:59.355718 7f46aec87700 1 -- 192.168.0.204:6812/1007885 <== osd.0 192.168.0.250:0/19386 978 ==== osd_ping(ping e28876 stamp 2014-05-02 17:36:59.352828) v2 ==== 47+0+0 (1505424456 0 0) 0x7f46e85b0300 con 0x7f46e6d56000 -14> 2014-05-02 17:36:59.355737 7f46aec87700 1 -- 192.168.0.204:6812/1007885 --> 192.168.0.250:0/19386 -- osd_ping(ping_reply e28876 stamp 2014-05-02 17:36:59.352828) v2 -- ?+0 0x7f46e0e7c540 con 0x7f46e6d56000 -13> 2014-05-02 17:36:59.355959 7f46afc89700 1 -- 192.168.0.204:6813/1007885 <== osd.0 192.168.0.250:0/19386 978 ==== osd_ping(ping e28876 stamp 2014-05-02 17:36:59.352828) v2 ==== 47+0+0 (1505424456 0 0) 0x7f46dd0504c0 con 0x7f46e6d566e0 -12> 2014-05-02 17:36:59.355975 7f46afc89700 1 -- 192.168.0.204:6813/1007885 --> 192.168.0.250:0/19386 -- osd_ping(ping_reply e28876 stamp 2014-05-02 17:36:59.352828) v2 -- ?+0 0x7f46e76e72c0 con 0x7f46e6d566e0 -11> 2014-05-02 17:36:59.375383 7f46abc81700 5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375383, event: started, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4 -10> 2014-05-02 17:36:59.375413 7f46abc81700 5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375412, event: started, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4 -9> 2014-05-02 17:36:59.375461 7f46abc81700 5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375460, event: waiting for subops from 3,4,9, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4 -8> 2014-05-02 17:36:59.375480 7f46abc81700 1 -- 192.168.0.204:6811/1007885 --> osd.3 192.168.0.2:6821/16947 -- osd_sub_op(mds.0.164:467953 0.71 e530bf1/100002d177b.00000000/head//0 [] v 28876'272097 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46ebaa6000 -7> 2014-05-02 17:36:59.375500 7f46abc81700 1 -- 192.168.0.204:6811/1007885 --> osd.4 192.168.0.250:6819/20528 -- osd_sub_op(mds.0.164:467953 0.71 e530bf1/100002d177b.00000000/head//0 [] v 28876'272097 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46db7ee200 -6> 2014-05-02 17:36:59.375515 7f46abc81700 1 -- 192.168.0.204:6811/1007885 --> osd.9 192.168.0.7:6801/1155 -- osd_sub_op(mds.0.164:467953 0.71 e530bf1/100002d177b.00000000/head//0 [] v 28876'272097 snapset=0=[]:[] snapc=0=[]) v10 -- ?+125 0x7f46e747a100 -5> 2014-05-02 17:36:59.375573 7f46abc81700 5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375572, event: commit_queued_for_journal_write, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4 -4> 2014-05-02 17:36:59.375611 7f46c08de700 5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375610, event: write_thread_in_journal_buffer, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4 -3> 2014-05-02 17:36:59.375807 7f46c08de700 5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375806, event: journaled_completion_queued, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4 -2> 2014-05-02 17:36:59.375938 7f46bf8dc700 5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.375937, event: op_commit, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4 -1> 2014-05-02 17:36:59.376326 7f46be0d9700 5 -- op tracker -- , seq: 124081, time: 2014-05-02 17:36:59.376325, event: op_applied, request: osd_op(mds.0.164:467953 100002d177b.00000000 [delete] 0.e530bf1 snapc 1=[] ondisk+write e28876) v4 0> 2014-05-02 17:36:59.399677 7f46aac7f700 -1 *** Caught signal (Aborted) ** in thread 7f46aac7f700 ceph version 0.80-rc1 (6769f4dc88425396921f94e1a37a1c90758aa3ea) 1: (()+0x5a5daf) [0x7f46ca1f7daf] 2: (()+0xf880) [0x7f46c93b0880] 3: (gsignal()+0x39) [0x7f46c78213a9] 4: (abort()+0x148) [0x7f46c78244c8] 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f46c810e5e5] 6: (()+0x5e746) [0x7f46c810c746] 7: (()+0x5e773) [0x7f46c810c773] 8: (()+0x5e9b2) [0x7f46c810c9b2] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0x7f46ca2d8cf2] 10: (ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)+0x441) [0x7f46ca066a71] 11: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&, int*)+0x54b) [0x7f46ca095feb] 12: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x2ae) [0x7f46c9eb488e] 13: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x17) [0x7f46c9f11437] 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0x7f46ca2c9bb1] 15: (ThreadPool::WorkThread::entry()+0x10) [0x7f46ca2caaa0] 16: (()+0x8062) [0x7f46c93a9062] 17: (clone()+0x6d) [0x7f46c78d1a3d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- logging levels --- 0/ 5 none 0/ 1 lockdep 0/ 1 context 1/ 1 crush 1/ 5 mds 1/ 5 mds_balancer 1/ 5 mds_locker 1/ 5 mds_log 1/ 5 mds_log_expire 1/ 5 mds_migrator 0/ 1 buffer 0/ 1 timer 0/ 1 filer 0/ 1 striper 0/ 1 objecter 0/ 5 rados 0/ 5 rbd 0/ 5 journaler 0/ 5 objectcacher 0/ 5 client 0/ 5 osd 0/ 5 optracker 0/ 5 objclass 1/ 3 filestore 1/ 3 keyvaluestore 1/ 3 journal 0/ 5 ms 1/ 5 mon 0/10 monc 1/ 5 paxos 0/ 5 tp 1/ 5 auth 1/ 5 crypto 1/ 1 finisher 1/ 5 heartbeatmap 1/ 5 perfcounter 1/ 5 rgw 1/ 5 javaclient 1/ 5 asok 1/ 1 throttle -2/-2 (syslog threshold) -1/-1 (stderr threshold) max_recent 10000 max_new 1000 log_file /var/log/ceph/ceph-osd.6.log --- end dump of recent events ---
Actions