Bug #13428
multiple OSDs crashed on same node
Status: Rejected
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Description
Stack traces from different OSDs follow (full logs attached).
The OSDs are on filestore. Most of them belong to erasure-coded pools, but one belongs to the replicated cache pool:
-5> 2015-10-09 08:50:24.476720 7f5249539700 5 -- op tracker -- seq: 68376217, time: 2015-10-09 08:50:24.476720, event: commit_queued_for_journal_write, op: osd_repop(client.205172.0:507328 1.1f1 1/3e80a1f1/10003ec73cb.00000022/head v 5330'782762)
-4> 2015-10-09 08:50:24.477057 7f525d9aa700 5 -- op tracker -- seq: 68376217, time: 2015-10-09 08:50:24.477057, event: write_thread_in_journal_buffer, op: osd_repop(client.205172.0:507328 1.1f1 1/3e80a1f1/10003ec73cb.00000022/head v 5330'782762)
-3> 2015-10-09 08:50:24.486811 7f525d1a9700 5 -- op tracker -- seq: 68376217, time: 2015-10-09 08:50:24.486811, event: journaled_completion_queued, op: osd_repop(client.205172.0:507328 1.1f1 1/3e80a1f1/10003ec73cb.00000022/head v 5330'782762)
-2> 2015-10-09 08:50:24.486851 7f525a9a4700 5 -- op tracker -- seq: 68376217, time: 2015-10-09 08:50:24.486851, event: commit_sent, op: osd_repop(client.205172.0:507328 1.1f1 1/3e80a1f1/10003ec73cb.00000022/head v 5330'782762)
-1> 2015-10-09 08:50:24.486902 7f525a9a4700 1 -- 10.143.16.19:6806/1035881 --> 10.143.16.13:6803/28439 -- osd_repop_reply(client.205172.0:507328 1.1f1 ondisk, result = 0) v1 -- ?+0 0xd945100 con 0x11759b80
0> 2015-10-09 08:50:24.489418 7f522bba7700 -1 *** Caught signal (Aborted) **
in thread 7f522bba7700
ceph version 9.0.3 (7295612d29f953f46e6e88812ef372b89a43b9da)
1: /usr/bin/ceph-osd() [0xb476d2]
2: (()+0xf130) [0x7f526c476130]
3: (gsignal()+0x37) [0x7f526ac545d7]
4: (abort()+0x148) [0x7f526ac55cc8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f526b5589b5]
6: (()+0x5e926) [0x7f526b556926]
7: (()+0x5e953) [0x7f526b556953]
8: (()+0x5eb73) [0x7f526b556b73]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xc4bc1a]
10: (Thread::create(unsigned long)+0x8a) [0xc2f06a]
11: (Pipe::accept()+0x37db) [0xd36edb]
12: (Pipe::reader()+0x193d) [0xd3a95d]
13: (Pipe::Reader::entry()+0xd) [0xd3d52d]
14: (()+0x7df5) [0x7f526c46edf5]
15: (clone()+0x6d) [0x7f526ad151ad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

-4> 2015-10-09 09:00:50.862266 7fa38872b700 5 -- op tracker -- seq: 51428597, time: 2015-10-09 09:00:50.862266, event: started, op: osd_sub_op(unknown.0.0:0 2.60es7 MIN [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[])
-3> 2015-10-09 09:00:50.862351 7fa38872b700 1 -- 10.143.16.19:6814/36794 --> 10.143.16.18:6820/36052 -- osd_sub_op_reply(unknown.0.0:0 2.60es0 MIN [scrub-reserve] ack, result = 0) v2 -- ?+1 0x56420b00 con 0x3ea10f20
-2> 2015-10-09 09:00:50.862432 7fa38872b700 5 -- op tracker -- seq: 51428597, time: 2015-10-09 09:00:50.862432, event: done, op: osd_sub_op(unknown.0.0:0 2.60es7 MIN [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[])
-1> 2015-10-09 09:00:50.862508 7fa38872b700 5 -- op tracker -- seq: 51428589, time: 2015-10-09 09:00:50.862508, event: reached_pg, op: MOSDPGPush(2.82s3 5458 [PushOp(2/82680082/100030d21bb.00000023/head, version: 4550'40080, data_included: [0~419744], data_size: 419744, omap_header_size: 0, omap_entries_size: 0, attrset_size: 3, recovery_info: ObjectRecoveryInfo(2/82680082/100030d21bb.00000023/head@4550'40080, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4197440, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true)),PushOp(2/63680082/1000214a741.0000032f/head, version: 3287'22904, data_included: [0~419744], data_size: 419744, omap_header_size: 0, omap_entries_size: 0, attrset_size: 3, recovery_info: ObjectRecoveryInfo(2/63680082/1000214a741.0000032f/head@3287'22904, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4197440, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true)),PushOp(2/55680082/1000295f89d.00000000/head, version: 3805'26247, data_included: [0~832], data_size: 832, omap_header_size: 0, omap_entries_size: 0, attrset_size: 5, recovery_info: ObjectRecoveryInfo(2/55680082/1000295f89d.00000000/head@3805'26247, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:8320, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true)),PushOp(2/e5680082/10001f0405a.000029f8/head, version: 2454'15544, data_included: [0~419744], data_size: 419744, omap_header_size: 0, omap_entries_size: 0, attrset_size: 3, recovery_info: ObjectRecoveryInfo(2/e5680082/10001f0405a.000029f8/head@2454'15544, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4197440, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true)),PushOp(2/89680082/100032cfb43.00000000/head, version: 4550'44639, data_included: [0~11648], data_size: 11648, omap_header_size: 0, omap_entries_size: 0, attrset_size: 5, recovery_info: ObjectRecoveryInfo(2/89680082/100032cfb43.00000000/head@4550'44639, copy_subset: [], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:116480, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true))])
0> 2015-10-09 09:00:50.863997 7fa376394700 -1 *** Caught signal (Aborted) **
in thread 7fa376394700
ceph version 9.0.3 (7295612d29f953f46e6e88812ef372b89a43b9da)
1: /usr/bin/ceph-osd() [0xb476d2]
2: (()+0xf130) [0x7fa3ab463130]
3: (gsignal()+0x37) [0x7fa3a9c415d7]
4: (abort()+0x148) [0x7fa3a9c42cc8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fa3aa5459b5]
6: (()+0x5e926) [0x7fa3aa543926]
7: (()+0x5e953) [0x7fa3aa543953]
8: (()+0x5eb73) [0x7fa3aa543b73]
9: (ceph::buffer::create_aligned(unsigned int, unsigned int)+0x1fa) [0xc5433a]
10: (Pipe::read_message(Message**, AuthSessionHandler*)+0x22bd) [0xd26d7d]
11: (Pipe::reader()+0xa91) [0xd39ab1]
12: (Pipe::Reader::entry()+0xd) [0xd3d52d]
13: (()+0x7df5) [0x7fa3ab45bdf5]
14: (clone()+0x6d) [0x7fa3a9d021ad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
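When comparing crashes across several OSD logs, it can help to pull out only the named frames of each backtrace. The following is a minimal sketch, not part of the original report; the helper names `named_frames` and `first_ceph_frame` and the prefix list are hypothetical, and the regular expression only assumes the `N: (symbol+0xoffset) [0xaddress]` frame layout seen in the traces above.

```python
import re

# Matches frames such as
#   9: (ceph::buffer::create_aligned(unsigned int, unsigned int)+0x1fa) [0xc5433a]
# Frames without a resolved symbol, e.g. "2: (()+0xf130) [0x7fa3ab463130]",
# are matched with the symbol "()" and filtered out below.
FRAME_RE = re.compile(r"(\d+): \((.*?)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]")


def named_frames(trace):
    """Return (frame number, symbol) pairs for frames that have a symbol name."""
    frames = []
    for number, symbol in FRAME_RE.findall(trace):
        if symbol != "()":
            frames.append((int(number), symbol))
    return frames


def first_ceph_frame(trace, prefixes=("ceph::", "Pipe::", "Thread::")):
    """Return the first frame whose symbol starts with one of the given prefixes.

    The prefix list is an illustrative assumption based on the symbols that
    appear in this report; adjust it for other traces.
    """
    for _, symbol in named_frames(trace):
        if symbol.startswith(prefixes):
            return symbol
    return None
```

Applied to the two traces in this report, such a filter would surface `ceph::__ceph_assert_fail(...)` (called from `Thread::create`) for the first crash and `ceph::buffer::create_aligned(...)` for the second.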
Files
Updated by Kenneth Waegeman over 8 years ago
- File cephlog1.tar.gz added
I can't attach all the logs, even though they are only 1.2 MB gzipped. The upload fails with:
413 Request Entity Too Large
So only the log of one OSD is attached.
Updated by Loïc Dachary over 8 years ago
- Status changed from New to Rejected
The only way ceph::buffer::create_aligned can fail is if memory allocation fails. I think the failures you saw were caused by insufficient memory on the host. If you disagree, please let me know :-)
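One way to test the out-of-memory hypothesis on the affected node is to look at how much memory the kernel reports as available. The sketch below is an illustration only, under the assumption of a Linux host exposing `/proc/meminfo`; the function names are hypothetical, and the fallback sum of `MemFree`, `Buffers`, and `Cached` is just a rough estimate for kernels that do not report `MemAvailable`.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number.

    Most fields are reported in kB; a few (such as HugePages_Total) are
    plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_kb(fields):
    """Return MemAvailable if present, else a rough MemFree+Buffers+Cached estimate."""
    if "MemAvailable" in fields:
        return fields["MemAvailable"]
    return sum(fields.get(key, 0) for key in ("MemFree", "Buffers", "Cached"))
```

On the node itself, this could be used as `available_kb(parse_meminfo(open("/proc/meminfo").read()))`. A value that is small compared with the combined memory use of the OSD daemons on the host would be consistent with the allocation failure described above.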