Bug #36232

open

node crash && bad crc in data ->kernel: libceph: socket closed (con state OPEN)

Added by lgb bin over 5 years ago. Updated over 5 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

ceph: Luminous 12.2.7
os: Linux version 3.10.0-862.11.6.el7.x86_64
cluster: 5 nodes

Symptoms:
1. After 20 days, one node crashed. Its vmcore-dmesg.txt is included below.
2. Two nodes sometimes log “kernel: libceph: osdxx socket closed (con state OPEN)” in /var/log/messages
and “bad crc in data” in ceph-osdxx.log. Is this a network problem?
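
To help judge whether the “socket closed” messages cluster on particular OSDs, a minimal sketch like the following could tally them per OSD from /var/log/messages. The function name `count_socket_closed`, the regular expression, and the sample lines are assumptions made for illustration; the exact message layout on the affected nodes (for example, whether a peer address follows the OSD name) may differ.

```python
import re
from collections import Counter

# Assumed message shape: "libceph: osdN [peer address] socket closed (con state STATE)".
# The text between the OSD name and "socket closed" is treated as optional.
SOCKET_CLOSED = re.compile(
    r"libceph: (osd\d+)\b.*?socket closed \(con state (\w+)\)"
)


def count_socket_closed(lines):
    """Count 'socket closed' kernel messages per (osd, connection state)."""
    counts = Counter()
    for line in lines:
        match = SOCKET_CLOSED.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the affected nodes.
    sample = [
        "Oct  1 10:00:00 node1 kernel: libceph: osd12 10.0.0.5:6800 socket closed (con state OPEN)",
        "Oct  1 10:05:00 node1 kernel: libceph: osd12 10.0.0.5:6800 socket closed (con state OPEN)",
        "Oct  1 10:07:00 node1 kernel: libceph: osd3 socket closed (con state CONNECTING)",
    ]
    for (osd, state), n in sorted(count_socket_closed(sample).items()):
        print(osd, state, n)
```

On a real node the input would be the lines of /var/log/messages (for example, `count_socket_closed(open("/var/log/messages"))`). If the counts concentrate on OSDs hosted on one machine, that may point toward a link or NIC on that host rather than toward Ceph itself, but this is only a heuristic.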

vmcore-dmesg.txt:
[984179.687002] kernel BUG at kernel/hrtimer.c:1238!
[984179.687048] invalid opcode: 0000 [#1] SMP
[984179.687093] Modules linked in: ebt_arp ebt_among ip6table_raw nf_conntrack_ipv6 nf_defrag_ipv6 xt_mac xt_comment xt_physdev xt_set xt_multiport ip_set_hash_net ip_set nfnetlink vhost_net vhost macvtap macvlan nfsv3 ceph libceph iptable_raw xt_addrtype overlay xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcsec_gss_krb5 nfsv4 dns_resolver nfs br_netfilter bridge stp fscache llc dm_mirror dm_region_hash dm_log dm_mod sb_edac coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg joydev mei_me i2c_i801 iTCO_wdt

[984179.687895] iTCO_vendor_support mei mxm_wmi pcspkr lpc_ich shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ast drm_kms_helper mlx5_core syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci igb mlxfw crct10dif_pclmul libahci crct10dif_common crc32c_intel devlink libata megaraid_sas ptp pps_core dca i2c_algo_bit i2c_core

[984179.688345] CPU: 14 PID: 108583 Comm: kworker/14:2 Kdump: loaded Not tainted 3.10.0-862.11.6.el7.x86_64 #1
[984179.688428] Hardware name: Sugon I620-G20/60G24-US, BIOS 222 11/01/2016
[984179.688504] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[984179.688557] task: ffff8d41e67fcf10 ti: ffff8d41f9dd0000 task.ti: ffff8d41f9dd0000
[984179.688623] RIP: 0010:[<ffffffff936c22ac>] [<ffffffff936c22ac>] __hrtimer_run_queues+0x25c/0x260
[984179.688711] RSP: 0018:ffff8d41fee03f28 EFLAGS: 00010002
[984179.688760] RAX: 0000000000000001 RBX: ffff8d37cacc1c10 RCX: 0000000000000001
[984179.688824] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff8d41fee139e0
[984179.688888] RBP: ffff8d41fee03f70 R08: 0000000000000101 R09: 0000000000000006
[984179.688951] R10: 0000000000000000 R11: ffff8d41fee03de8 R12: ffff8d41fee139e0
[984179.689014] R13: ffff8d41fee13a20 R14: 0000000000000001 R15: ffff8d41fee13b18
[984179.689079] FS: 0000000000000000(0000) GS:ffff8d41fee00000(0000) knlGS:0000000000000000
[984179.689151] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[984179.689203] CR2: 000007fffffd6478 CR3: 0000001000752000 CR4: 00000000003627e0
[984179.689267] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[984179.689331] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[984179.689394] Call Trace:
[984179.689420] <IRQ>
[984179.689448] [<ffffffff936c26bf>] hrtimer_interrupt+0xaf/0x1d0
[984179.689510] [<ffffffff9365967b>] local_apic_timer_interrupt+0x3b/0x60
[984179.689572] [<ffffffff93d2a083>] smp_apic_timer_interrupt+0x43/0x60
[984179.689637] [<ffffffff93d267b2>] apic_timer_interrupt+0x162/0x170
[984179.689692] <EOI>
[984179.689720] [<ffffffff93d1b2c5>] ? _raw_spin_unlock_irqrestore+0x15/0x20
[984179.689784] [<ffffffff936a766d>] mod_timer+0x14d/0x230
[984179.689836] [<ffffffff93bd59c8>] sk_reset_timer+0x18/0x30
[984179.689890] [<ffffffff93c4fbde>] tcp_rearm_rto+0x7e/0x100
[984179.689943] [<ffffffff93c533b8>] tcp_event_new_data_sent+0xa8/0xb0
[984179.690002] [<ffffffff93c55245>] tcp_write_xmit+0x185/0xd00
[984179.690057] [<ffffffff93c5603e>] __tcp_push_pending_frames+0x2e/0xc0
[984179.690117] [<ffffffff93c442ec>] tcp_push+0xec/0x120
[984179.692464] [<ffffffff93c47b67>] tcp_sendpage+0x527/0x5c0
[984179.694809] [<ffffffff93c73a60>] inet_sendpage+0x70/0xe0
[984179.697143] [<ffffffff93bd1407>] ? kernel_sendmsg+0x37/0x50
[984179.699474] [<ffffffff93bd06ae>] kernel_sendpage+0x1e/0x30
[984179.701797] [<ffffffffc09977ec>] ceph_tcp_sendpage+0x4c/0xf0 [libceph]
[984179.704120] [<ffffffffc0998a73>] try_write+0x153/0xe90 [libceph]
[984179.706431] [<ffffffff9362a59e>] ? __switch_to+0xce/0x580
[984179.708698] [<ffffffffc099ade9>] ceph_con_workfn+0xc9/0x670 [libceph]
[984179.710915] [<ffffffff936b613f>] process_one_work+0x17f/0x440
[984179.713075] [<ffffffff936b71d6>] worker_thread+0x126/0x3c0
[984179.715173] [<ffffffff936b70b0>] ? manage_workers.isra.24+0x2a0/0x2a0
[984179.717222] [<ffffffff936bdf21>] kthread+0xd1/0xe0
[984179.719204] [<ffffffff936bde50>] ? insert_kthread_work+0x40/0x40
[984179.721139] [<ffffffff93d255f7>] ret_from_fork_nospec_begin+0x21/0x21
[984179.723020] [<ffffffff936bde50>] ? insert_kthread_work+0x40/0x40
[984179.724841] Code: d8 fe ff ff 0f 1f 00 48 8b 4b 28 48 8b 53 48 4c 8d 43 50 8b 73 40 45 31 c9 48 89 df e8 0e 3a 04 00 e9 64 fe ff ff e8 84 20 fd ff <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 55 48 8d 75 d8 41 54
[984179.728675] RIP [<ffffffff936c22ac>] __hrtimer_run_queues+0x25c/0x260
[984179.730484] RSP <ffff8d41fee03f28>

Actions #1

Updated by Greg Farnum over 5 years ago

  • Project changed from Ceph to Linux kernel client