From James Eckersall, "GPF kernel panics" on ceph-users.

I've had a fun time with ceph this week.
We have a cluster with 4 OSD servers (20 OSDs each), 3 mons, and a server mapping ~200 rbds and presenting CIFS shares.

We're using cephx and the export node has its own cephx auth key.

I made a change to the key last week, adding rwx access to another pool.

Since then, we have had sporadic kernel panics on the export node.

It got to the point where it would barely finish booting before panicking.

Once I removed the extra pool I had added to the auth key, it hasn't crashed again.

I'm a bit concerned that a change to an auth key can cause this type of crash.
There were no log entries on the mon/osd/export nodes regarding the key at all, so it was only by searching my memory for what had changed that I was able to resolve the problem.

From what I could tell from the key, the format was correct and the pool that I added did exist, so I am confused as to how this would have caused kernel panics.

The rbd's are mapped by the rbdmap service on boot.
All our ceph servers are running Ubuntu 14.04 (kernel 3.13.0-30-generic). Ceph packages are from the Ubuntu repos, version 0.80.1-0ubuntu1.1.
I should have probably mentioned this info in the initial mail :)

This problem also seemed to get gradually worse over time.
We had a couple of sporadic crashes at the start of the week, escalating to the node being unable to stay up for more than a couple of minutes before panicking.

[   32.713504] general protection fault: 0000 [#1] SMP 
[   32.724718] Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables rbd libceph libcrc32c gpio_ich dcdbas intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul joydev crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core shpchp lpc_ich mei_me mei wmi ipmi_si mac_hid acpi_power_meter 8021q garp stp mrp llc bonding lp parport nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache hid_generic igb ixgbe i2c_algo_bit usbhid dca hid ptp ahci libahci pps_core megaraid_sas mdio
[   32.843936] CPU: 18 PID: 5030 Comm: tr Not tainted 3.13.0-30-generic #54-Ubuntu
[   32.860163] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 1.6.0 03/07/2013
[   32.876774] task: ffff880417b15fc0 ti: ffff8804273f4000 task.ti: ffff8804273f4000
[   32.893384] RIP: 0010:[<ffffffff811a19c5>]  [<ffffffff811a19c5>] kmem_cache_alloc+0x75/0x1e0
[   32.912198] RSP: 0018:ffff8804273f5d40  EFLAGS: 00010286
[   32.924015] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000011ed
[   32.939856] RDX: 00000000000011ec RSI: 00000000000080d0 RDI: ffff88042f803700
[   32.955696] RBP: ffff8804273f5d70 R08: 0000000000017260 R09: ffffffff811be63c
[   32.971559] R10: 8080808080808080 R11: 0000000000000000 R12: 7d10f8ec0c3cb928
[   32.987421] R13: 00000000000080d0 R14: ffff88042f803700 R15: ffff88042f803700
[   33.003284] FS:  0000000000000000(0000) GS:ffff88042fd20000(0000) knlGS:0000000000000000
[   33.021281] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   33.034068] CR2: 00007f01a8fced40 CR3: 000000040e52f000 CR4: 00000000000407e0
[   33.049929] Stack:
[   33.054456]  ffffffff811be63c 0000000000000000 ffff88041be52780 ffff880428052000
[   33.071259]  ffff8804273f5f2c 00000000ffffff9c ffff8804273f5d98 ffffffff811be63c
[   33.088084]  0000000000000080 ffff8804273f5f2c ffff8804273f5e40 ffff8804273f5e30
[   33.104908] Call Trace:
[   33.110399]  [<ffffffff811be63c>] ? get_empty_filp+0x5c/0x180
[   33.123188]  [<ffffffff811be63c>] get_empty_filp+0x5c/0x180
[   33.135593]  [<ffffffff811cc03d>] path_openat+0x3d/0x620
[   33.147422]  [<ffffffff811cd47a>] do_filp_open+0x3a/0x90
[   33.159250]  [<ffffffff811a1985>] ? kmem_cache_alloc+0x35/0x1e0
[   33.172405]  [<ffffffff811cc6bf>] ? getname_flags+0x4f/0x190
[   33.185004]  [<ffffffff811da237>] ? __alloc_fd+0xa7/0x130
[   33.197025]  [<ffffffff811bbb99>] do_sys_open+0x129/0x280
[   33.209049]  [<ffffffff81020d25>] ? syscall_trace_enter+0x145/0x250
[   33.222992]  [<ffffffff811bbd0e>] SyS_open+0x1e/0x20
[   33.234053]  [<ffffffff8172aeff>] tracesys+0xe1/0xe6
[   33.245112] Code: dc 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 17 01 00 00 48 85 c0 0f 84 0e 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63 
[   33.292549] RIP  [<ffffffff811a19c5>] kmem_cache_alloc+0x75/0x1e0
[   33.306192]  RSP <ffff8804273f5d40>

#1 Updated by Sage Weil over 9 years ago

The auth caps were as follows:

caps: [mon] allow r
caps: [osd] allow rwx pool=hosting_windows_sharedweb, allow rwx pool=infra_systems, allow rwx pool=hosting_linux_sharedweb

I changed them (just adding a pool to the list) to:

caps: [mon] allow r
caps: [osd] allow rwx pool=hosting_windows_sharedweb, allow rwx pool=infra_systems, allow rwx pool=hosting_linux_sharedweb, allow rwx pool=test
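For reference, the difference between the two osd caps strings can be checked mechanically. The snippet below is only a rough sketch: `pools_in_caps` is a hypothetical helper written for this comparison, not Ceph's caps parser, and it assumes the simple `allow rwx pool=NAME` form quoted above.

```python
import re
from typing import List

# The two osd caps strings quoted above, joined onto one line each.
OLD_OSD_CAPS = (
    "allow rwx pool=hosting_windows_sharedweb, "
    "allow rwx pool=infra_systems, "
    "allow rwx pool=hosting_linux_sharedweb"
)
NEW_OSD_CAPS = OLD_OSD_CAPS + ", allow rwx pool=test"


def pools_in_caps(caps: str) -> List[str]:
    """Return the pool names in a caps string (illustrative helper, not Ceph's parser)."""
    return re.findall(r"pool=([^,\s]+)", caps)


# Pools present only in the new caps string.
added_pools = [
    pool for pool in pools_in_caps(NEW_OSD_CAPS)
    if pool not in pools_in_caps(OLD_OSD_CAPS)
]

# How much longer the caps text became, in bytes.
growth_bytes = len(NEW_OSD_CAPS.encode()) - len(OLD_OSD_CAPS.encode())

print(added_pools, growth_bytes)
```

The only change is the extra `test` pool, so the caps text is simply a little longer than before.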

I suspect a simple buffer overflow on the auth ticket size ...
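To illustrate the kind of bug suspected here, the following is a userspace model in Python, not the libceph code. The buffer size (`TICKET_BUF_SIZE = 256`), the little-endian 32-bit length prefix, and the names `decode_ticket_fixed` and `TicketTooLarge` are all assumptions made for the sketch. It shows the length check that a fixed-size destination buffer needs before the copy; in C, omitting that check lets a longer ticket write past the end of the buffer.

```python
import struct

TICKET_BUF_SIZE = 256  # assumed size, for illustration only


class TicketTooLarge(Exception):
    """Raised when the ticket blob does not fit in the fixed buffer."""


def decode_ticket_fixed(reply: bytes, buf: bytearray) -> int:
    """Copy a length-prefixed blob from `reply` into the fixed-size `buf`.

    Returns the number of bytes copied.
    """
    if len(reply) < 4:
        raise ValueError("truncated reply: missing length prefix")
    (blob_len,) = struct.unpack_from("<I", reply, 0)
    if len(reply) - 4 < blob_len:
        raise ValueError("truncated reply: blob shorter than its length prefix")
    if blob_len > len(buf):
        # Without this check, a C implementation would overrun the buffer.
        raise TicketTooLarge(
            "ticket is %d bytes, buffer holds %d" % (blob_len, len(buf))
        )
    buf[:blob_len] = reply[4:4 + blob_len]
    return blob_len
```

With a 256-byte buffer, a reply carrying a 300-byte blob is rejected with `TicketTooLarge` instead of being copied.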

#2 Updated by Sage Weil over 9 years ago



#3 Updated by Sage Weil over 9 years ago

Pushed wip-8979, which removes the fixed buffer size. But we still need to make things not crash when the auth reply processing fails. That could still happen if we get a huge ticket (>4k) and kmalloc fails on a large page size, or if the auth reply from the mon is simply not understood by the client.
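The two remaining failure modes can be modelled in userspace as follows. This is a sketch in Python and not the wip-8979 patch: `process_auth_reply`, its injectable `alloc` parameter, and the length-prefixed reply layout are assumptions made for illustration. The buffer is sized from the ticket itself, and both an allocation failure and an unparseable reply come back as an errno-style status for the caller to handle.

```python
import errno
import struct
from typing import Optional, Tuple


def process_auth_reply(reply: bytes, alloc=bytearray) -> Tuple[int, Optional[bytearray]]:
    """Decode a length-prefixed ticket into a buffer sized to fit it.

    Returns (0, ticket) on success, or (-errno, None) on failure.
    `alloc` is injectable so that an allocation failure can be simulated.
    """
    if len(reply) < 4:
        return -errno.EINVAL, None  # reply not understood
    (blob_len,) = struct.unpack_from("<I", reply, 0)
    if len(reply) - 4 < blob_len:
        return -errno.EINVAL, None  # reply shorter than it claims
    try:
        buf = alloc(blob_len)  # no fixed size: allocate what the ticket needs
    except MemoryError:
        return -errno.ENOMEM, None  # allocation failed: report it, do not crash
    buf[:] = reply[4:4 + blob_len]
    return 0, buf
```

A ticket larger than 4k then decodes normally, while a failed allocation or a malformed reply yields `-ENOMEM` or `-EINVAL` respectively.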

#4 Updated by Sage Weil about 9 years ago

  • Status changed from New to Pending Backport

#5 Updated by Ilya Dryomov about 9 years ago

  • Status changed from Pending Backport to Resolved

Landed in 3.17-rc5. Opened #9560 and #9561 for the issues mentioned above.

