Bug #13712


Page allocation failure in Kernel 4.2.3.

Added by Jérôme Poulin over 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Possibly related to issue #8000. We upgraded Debian from Wheezy to Jessie; we had been running backport kernel 3.16, and after the upgrade we moved to backport kernel 4.2.3 to get more accurate BTRFS quotas. After a week of use we hit two kernel panics on this machine, all with RBD in the stack trace.

The kernel panic is ultimately caused by AppArmor, which tries to read capabilities through the VFS, which reads from BTRFS, which is blocked writing to RBD.

Nov  6 09:46:43 CASRV0104 kernel: [692354.650311] kworker/3:1: page allocation failure: order:1, mode:0x204020
Nov  6 09:46:43 CASRV0104 kernel: [692354.650316] CPU: 3 PID: 11596 Comm: kworker/3:1 Not tainted 4.2.0-0.bpo.1-amd64 #1 Debian 4.2.3-2~bpo8+1
Nov  6 09:46:43 CASRV0104 kernel: [692354.650318] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.5.2 10/15/2010
Nov  6 09:46:43 CASRV0104 kernel: [692354.650325] Workqueue: rbd rbd_queue_workfn [rbd]
Nov  6 09:46:43 CASRV0104 kernel: [692354.650327]  0000000000000000 0000000000000001 ffffffff8154dbb1 0000000000204020
Nov  6 09:46:43 CASRV0104 kernel: [692354.650329]  ffffffff8115508f ffff88043fff0b00 0000000000000000 0000000000000001
Nov  6 09:46:43 CASRV0104 kernel: [692354.650331]  0000000100000000 0000000000000000 ffff88043fff2fa8 0000000000000046
Nov  6 09:46:43 CASRV0104 kernel: [692354.650334] Call Trace:
Nov  6 09:46:43 CASRV0104 kernel: [692354.650340]  [<ffffffff8154dbb1>] ? dump_stack+0x40/0x50
Nov  6 09:46:43 CASRV0104 kernel: [692354.650344]  [<ffffffff8115508f>] ? warn_alloc_failed+0xcf/0x130
Nov  6 09:46:43 CASRV0104 kernel: [692354.650347]  [<ffffffff811587e0>] ? __alloc_pages_nodemask+0x2d0/0xa00
Nov  6 09:46:43 CASRV0104 kernel: [692354.650351]  [<ffffffff811a0691>] ? kmem_getpages+0x61/0x110
Nov  6 09:46:43 CASRV0104 kernel: [692354.650353]  [<ffffffff811a235c>] ? fallback_alloc+0x14c/0x200
Nov  6 09:46:43 CASRV0104 kernel: [692354.650356]  [<ffffffff811a3ec4>] ? kmem_cache_alloc+0x1f4/0x440
Nov  6 09:46:43 CASRV0104 kernel: [692354.650367]  [<ffffffffa07c3e95>] ? ceph_osdc_alloc_request+0x55/0x250 [libceph]
Nov  6 09:46:43 CASRV0104 kernel: [692354.650371]  [<ffffffffa0809935>] ? rbd_osd_req_create.isra.26+0x55/0x1b0 [rbd]
Nov  6 09:46:43 CASRV0104 kernel: [692354.650374]  [<ffffffffa080c20f>] ? rbd_img_request_fill+0x22f/0x850 [rbd]
Nov  6 09:46:43 CASRV0104 kernel: [692354.650378]  [<ffffffffa080d90f>] ? rbd_queue_workfn+0x2bf/0x3c0 [rbd]
Nov  6 09:46:43 CASRV0104 kernel: [692354.650382]  [<ffffffff8108671a>] ? process_one_work+0x14a/0x3d0
Nov  6 09:46:43 CASRV0104 kernel: [692354.650385]  [<ffffffff81087105>] ? worker_thread+0x65/0x470
Nov  6 09:46:43 CASRV0104 kernel: [692354.650387]  [<ffffffff810870a0>] ? rescuer_thread+0x2f0/0x2f0
Nov  6 09:46:43 CASRV0104 kernel: [692354.650389]  [<ffffffff8108c583>] ? kthread+0xd3/0xf0
Nov  6 09:46:43 CASRV0104 kernel: [692354.650391]  [<ffffffff8108c4b0>] ? kthread_create_on_node+0x170/0x170
Nov  6 09:46:43 CASRV0104 kernel: [692354.650394]  [<ffffffff8155381f>] ? ret_from_fork+0x3f/0x70
Nov  6 09:46:43 CASRV0104 kernel: [692354.650396]  [<ffffffff8108c4b0>] ? kthread_create_on_node+0x170/0x170
Nov  6 09:46:43 CASRV0104 kernel: [692354.650397] Mem-Info:
Nov  6 09:46:43 CASRV0104 kernel: [692354.650401] active_anon:1750510 inactive_anon:237494 isolated_anon:0
Nov  6 09:46:43 CASRV0104 kernel: [692354.650401]  active_file:915398 inactive_file:938093 isolated_file:0
Nov  6 09:46:43 CASRV0104 kernel: [692354.650401]  unevictable:1 dirty:31831 writeback:256 unstable:0
Nov  6 09:46:43 CASRV0104 kernel: [692354.650401]  slab_reclaimable:174720 slab_unreclaimable:17676
Nov  6 09:46:43 CASRV0104 kernel: [692354.650401]  mapped:17254 shmem:26883 pagetables:10815 bounce:0
Nov  6 09:46:43 CASRV0104 kernel: [692354.650401]  free:22847 free_pcp:1043 free_cma:0
Nov  6 09:46:43 CASRV0104 kernel: [692354.650405] Node 0 DMA free:15900kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov  6 09:46:43 CASRV0104 kernel: [692354.650409] lowmem_reserve[]: 0 2969 16020 16020
Nov  6 09:46:43 CASRV0104 kernel: [692354.650412] Node 0 DMA32 free:56132kB min:2988kB low:3732kB high:4480kB active_anon:1145880kB inactive_anon:310196kB active_file:648040kB inactive_file:734252kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3119716kB managed:3043968kB mlocked:0kB dirty:21844kB writeback:440kB mapped:8536kB shmem:2084kB slab_reclaimable:97944kB slab_unreclaimable:16276kB kernel_stack:2400kB pagetables:9744kB unstable:0kB bounce:0kB free_pcp:2264kB local_pcp:364kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? no
Nov  6 09:46:43 CASRV0104 kernel: [692354.650417] lowmem_reserve[]: 0 0 13050 13050
Nov  6 09:46:43 CASRV0104 kernel: [692354.650419] Node 0 Normal free:19356kB min:13148kB low:16432kB high:19720kB active_anon:5856160kB inactive_anon:639780kB active_file:3013552kB inactive_file:3017944kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13364192kB mlocked:4kB dirty:105480kB writeback:584kB mapped:60480kB shmem:105448kB slab_reclaimable:600936kB slab_unreclaimable:54428kB kernel_stack:5008kB pagetables:33516kB unstable:0kB bounce:0kB free_pcp:1880kB local_pcp:668kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov  6 09:46:43 CASRV0104 kernel: [692354.650424] lowmem_reserve[]: 0 0 0 0
Nov  6 09:46:43 CASRV0104 kernel: [692354.650426] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
Nov  6 09:46:43 CASRV0104 kernel: [692354.650436] Node 0 DMA32: 13976*4kB (UEM) 44*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56256kB
Nov  6 09:46:43 CASRV0104 kernel: [692354.650443] Node 0 Normal: 4624*4kB (UEM) 83*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19160kB
Nov  6 09:46:43 CASRV0104 kernel: [692354.650450] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov  6 09:46:43 CASRV0104 kernel: [692354.650451] 1889223 total pagecache pages
Nov  6 09:46:43 CASRV0104 kernel: [692354.650453] 8854 pages in swap cache
Nov  6 09:46:43 CASRV0104 kernel: [692354.650454] Swap cache stats: add 998038, delete 989184, find 4654933/4901647
Nov  6 09:46:43 CASRV0104 kernel: [692354.650455] Free swap  = 43508kB
Nov  6 09:46:43 CASRV0104 kernel: [692354.650456] Total swap = 974844kB
Nov  6 09:46:43 CASRV0104 kernel: [692354.650457] 4191797 pages RAM
Nov  6 09:46:43 CASRV0104 kernel: [692354.650458] 0 pages HighMem/MovableOnly
Nov  6 09:46:43 CASRV0104 kernel: [692354.650459] 85782 pages reserved
Nov  6 09:46:43 CASRV0104 kernel: [692354.650459] 0 pages hwpoisoned
Nov  6 09:46:43 CASRV0104 kernel: [692354.650483] rbd: rbd0: write 40000 at 41e5268000 result -12
Nov  6 09:46:43 CASRV0104 kernel: [692354.650485] blk_update_request: I/O error, dev rbd0, sector 552768320
Nov  6 09:47:14 CASRV0104 kernel: [692385.095115] vmalloc: allocation failure: 18446744072439583573 bytes
Nov  6 09:47:14 CASRV0104 kernel: [692385.095118] kworker/2:1: page allocation failure: order:0, mode:0x22
Nov  6 09:47:14 CASRV0104 kernel: [692385.095121] CPU: 2 PID: 22092 Comm: kworker/2:1 Not tainted 4.2.0-0.bpo.1-amd64 #1 Debian 4.2.3-2~bpo8+1
Nov  6 09:47:14 CASRV0104 kernel: [692385.095123] Hardware name: Dell Inc. PowerEdge R310/05XKKK, BIOS 1.5.2 10/15/2010
Nov  6 09:47:14 CASRV0104 kernel: [692385.095129] Workqueue: rbd rbd_queue_workfn [rbd]
Nov  6 09:47:14 CASRV0104 kernel: [692385.095132]  0000000000000000 0000000000000000 ffffffff8154dbb1 0000000000000022
Nov  6 09:47:14 CASRV0104 kernel: [692385.095134]  ffffffff8115508f 0000000000000000 ffffffff818000f8 ffff88001194bb00
Nov  6 09:47:14 CASRV0104 kernel: [692385.095136]  0000000000000018 ffff88001194bb78 ffff88001194bb18 0000000000000000
Nov  6 09:47:14 CASRV0104 kernel: [692385.095138] Call Trace:
Nov  6 09:47:14 CASRV0104 kernel: [692385.095144]  [<ffffffff8154dbb1>] ? dump_stack+0x40/0x50
Nov  6 09:47:14 CASRV0104 kernel: [692385.095148]  [<ffffffff8115508f>] ? warn_alloc_failed+0xcf/0x130
Nov  6 09:47:14 CASRV0104 kernel: [692385.095151]  [<ffffffff8118acdb>] ? __vmalloc_node_range+0x22b/0x2b0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095153]  [<ffffffff8118ada7>] ? __vmalloc+0x47/0x50
Nov  6 09:47:14 CASRV0104 kernel: [692385.095161]  [<ffffffffa07ba147>] ? ceph_kvmalloc+0x47/0x50 [libceph]
Nov  6 09:47:14 CASRV0104 kernel: [692385.095165]  [<ffffffffa07ba147>] ? ceph_kvmalloc+0x47/0x50 [libceph]
Nov  6 09:47:14 CASRV0104 kernel: [692385.095171]  [<ffffffffa07bca6c>] ? ceph_msg_new+0xbc/0x170 [libceph]
Nov  6 09:47:14 CASRV0104 kernel: [692385.095178]  [<ffffffffa07c4062>] ? ceph_osdc_alloc_request+0x222/0x250 [libceph]
Nov  6 09:47:14 CASRV0104 kernel: [692385.095181]  [<ffffffffa0809935>] ? rbd_osd_req_create.isra.26+0x55/0x1b0 [rbd]
Nov  6 09:47:14 CASRV0104 kernel: [692385.095185]  [<ffffffffa080c20f>] ? rbd_img_request_fill+0x22f/0x850 [rbd]
Nov  6 09:47:14 CASRV0104 kernel: [692385.095189]  [<ffffffffa080d90f>] ? rbd_queue_workfn+0x2bf/0x3c0 [rbd]
Nov  6 09:47:14 CASRV0104 kernel: [692385.095192]  [<ffffffff8108671a>] ? process_one_work+0x14a/0x3d0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095194]  [<ffffffff81087105>] ? worker_thread+0x65/0x470
Nov  6 09:47:14 CASRV0104 kernel: [692385.095196]  [<ffffffff8154f38f>] ? __schedule+0x28f/0x8b0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095198]  [<ffffffff810870a0>] ? rescuer_thread+0x2f0/0x2f0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095200]  [<ffffffff8108c583>] ? kthread+0xd3/0xf0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095202]  [<ffffffff8108c4b0>] ? kthread_create_on_node+0x170/0x170
Nov  6 09:47:14 CASRV0104 kernel: [692385.095205]  [<ffffffff8155381f>] ? ret_from_fork+0x3f/0x70
Nov  6 09:47:14 CASRV0104 kernel: [692385.095206]  [<ffffffff8108c4b0>] ? kthread_create_on_node+0x170/0x170
Nov  6 09:47:14 CASRV0104 kernel: [692385.095208] Mem-Info:
Nov  6 09:47:14 CASRV0104 kernel: [692385.095212] active_anon:1751330 inactive_anon:237499 isolated_anon:0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095212]  active_file:916518 inactive_file:936558 isolated_file:0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095212]  unevictable:1 dirty:33367 writeback:219 unstable:0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095212]  slab_reclaimable:174302 slab_unreclaimable:17722
Nov  6 09:47:14 CASRV0104 kernel: [692385.095212]  mapped:17266 shmem:26885 pagetables:11083 bounce:0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095212]  free:22950 free_pcp:870 free_cma:0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095215] Node 0 DMA free:15900kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov  6 09:47:14 CASRV0104 kernel: [692385.095220] lowmem_reserve[]: 0 2969 16020 16020
Nov  6 09:47:14 CASRV0104 kernel: [692385.095222] Node 0 DMA32 free:55796kB min:2988kB low:3732kB high:4480kB active_anon:1145880kB inactive_anon:310196kB active_file:651900kB inactive_file:732276kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3119716kB managed:3043968kB mlocked:0kB dirty:27320kB writeback:0kB mapped:8536kB shmem:2084kB slab_reclaimable:97936kB slab_unreclaimable:16320kB kernel_stack:2208kB pagetables:9744kB unstable:0kB bounce:0kB free_pcp:1960kB local_pcp:376kB free_cma:0kB writeback_tmp:0kB pages_scanned:760 all_unreclaimable? no
Nov  6 09:47:14 CASRV0104 kernel: [692385.095227] lowmem_reserve[]: 0 0 13050 13050
Nov  6 09:47:14 CASRV0104 kernel: [692385.095229] Node 0 Normal free:20104kB min:13148kB low:16432kB high:19720kB active_anon:5859440kB inactive_anon:639800kB active_file:3014172kB inactive_file:3013956kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13364192kB mlocked:4kB dirty:106148kB writeback:876kB mapped:60528kB shmem:105456kB slab_reclaimable:599272kB slab_unreclaimable:54568kB kernel_stack:5008kB pagetables:34588kB unstable:0kB bounce:0kB free_pcp:1520kB local_pcp:188kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov  6 09:47:14 CASRV0104 kernel: [692385.095234] lowmem_reserve[]: 0 0 0 0
Nov  6 09:47:14 CASRV0104 kernel: [692385.095236] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
Nov  6 09:47:14 CASRV0104 kernel: [692385.095246] Node 0 DMA32: 13919*4kB (UEM) 13*8kB (UEM) 1*16kB (E) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55796kB
Nov  6 09:47:14 CASRV0104 kernel: [692385.095253] Node 0 Normal: 5046*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20184kB
Nov  6 09:47:14 CASRV0104 kernel: [692385.095260] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov  6 09:47:14 CASRV0104 kernel: [692385.095261] 1888830 total pagecache pages
Nov  6 09:47:14 CASRV0104 kernel: [692385.095263] 8856 pages in swap cache
Nov  6 09:47:14 CASRV0104 kernel: [692385.095264] Swap cache stats: add 998051, delete 989195, find 4655306/4902030
Nov  6 09:47:14 CASRV0104 kernel: [692385.095265] Free swap  = 43552kB
Nov  6 09:47:14 CASRV0104 kernel: [692385.095266] Total swap = 974844kB
Nov  6 09:47:14 CASRV0104 kernel: [692385.095267] 4191797 pages RAM
Nov  6 09:47:14 CASRV0104 kernel: [692385.095268] 0 pages HighMem/MovableOnly
Nov  6 09:47:14 CASRV0104 kernel: [692385.095268] 85782 pages reserved
Nov  6 09:47:14 CASRV0104 kernel: [692385.095269] 0 pages hwpoisoned
Nov  6 09:47:14 CASRV0104 kernel: [692385.095273] rbd: rbd0: write 40000 at 425f19f000 result -12
Nov  6 09:47:14 CASRV0104 kernel: [692385.095275] blk_update_request: I/O error, dev rbd0, sector 556764408
Nov  6 09:47:14 CASRV0104 kernel: [692385.173843] vmalloc: allocation failure: 18446744072439583573 bytes
Nov  6 09:47:14 CASRV0104 kernel: [692385.173847] kworker/2:1: page allocation failure: order:0, mode:0x22
...
Nov  6 09:47:14 CASRV0104 kernel: [692385.821256] rbd: rbd0: write 4000 at 55b9c98000 result -12
Nov  6 09:47:14 CASRV0104 kernel: [692385.821257] blk_update_request: I/O error, dev rbd0, sector 719119552
Nov  6 09:47:14 CASRV0104 kernel: [692385.899612] rbd: rbd0: write 4000 at 55b9cb0000 result -12
Nov  6 09:47:14 CASRV0104 kernel: [692385.899623] rbd: rbd0: write 4000 at 55b9cc4000 result -12
Nov  6 09:47:14 CASRV0104 kernel: [692385.899630] rbd: rbd0: write c000 at 55b9cdc000 result -12
Nov  6 09:47:14 CASRV0104 kernel: [692385.899639] rbd: rbd0: write 4000 at 55b9d10000 result -12
Nov  6 09:47:14 CASRV0104 kernel: [692385.899645] rbd: rbd0: write 4000 at 5599d54000 result -12
Nov  6 09:47:14 CASRV0104 kernel: [692385.899649] rbd: rbd0: write 4000 at 5599d5c000 result -12
...

Files

dmesg.201511280201 (98.6 KB), Daniel Swarbrick, 12/01/2015 03:41 PM
dmesg.201512010153 (90 KB), Daniel Swarbrick, 12/01/2015 03:41 PM
Actions #1

Updated by Jason Dillaman over 8 years ago

  • Project changed from rbd to Linux kernel client
Actions #2

Updated by c sights over 8 years ago

I hit this backtrace as well, also on the Debian backport kernel 4.2.3.
Ubuntu Wily uses 4.2.x, so this backtrace may be encountered often.

Actions #3

Updated by Ilya Dryomov over 8 years ago

  • Assignee set to Ilya Dryomov

Hmm, "vmalloc: allocation failure: 18446744072439583573 bytes" stands out in the second trace. I'll try to investigate.

Actions #4

Updated by Daniel Swarbrick over 8 years ago

We've seen this problem repeatedly on an AMD Opteron system here, running Debian Jessie, which panics every few days. I have attached the two latest dmesg outputs, including call traces.

The crashes have so far occurred on jessie-backports kernels 4.2.3, 4.2.5 and 4.2.6.

Actions #5

Updated by Ilya Dryomov over 8 years ago

I looked into the "vmalloc: allocation failure: 18446744072439583573 bytes" message, and it turns out we have a buggy memory-allocation-failure error path: we free a data structure before we are done with it, which then leads to rbd writing to invalid memory. That explains the huge number in the vmalloc message, and it is probably the reason for the crashes Daniel saw. I have a patch for it, and it will be pushed to stable kernels ASAP.

The data structure I mentioned above manages snapshots. Jérôme, Daniel, do you use rbd snapshots? More specifically, do any of the mapped images have snapshots?

Now, that error path is triggered by a memory allocation failure, which by itself doesn't cause a panic. It's a lot harder to triage; it could be that 4.2 is simply more willing to fail allocation requests. The proper long-term fix would be to change rbd to preallocate more of what it needs, but that's not a trivial task, so I guess we should at least try... If one of you could describe your setup in more detail (cluster topology, number of images mapped, workload, etc.) and whether you made any changes other than upgrading the kernel, that would be a good start.

Actions #6

Updated by c sights over 8 years ago

> The data structure I mentioned above manages snapshots. Jérôme, Daniel, do you use rbd snapshots? More specifically, do any of the mapped images have snapshots?

I do not use snapshots, and I experienced a backtrace with similar keywords (4.2.3). But I don't know if my problem is exactly the same as the other reports.

I hope others can confirm or deny using snapshots!

C.

Actions #7

Updated by Jérôme Poulin over 8 years ago

We were not using snapshots before. After the first corruption I created a snapshot and a clone, and if anything it made the problem worse, since the next corruption/panic came a week later instead.

Actions #8

Updated by Daniel Swarbrick over 8 years ago

We are also not using snapshots. Just a simple RBD, formatted as ext4. We had another panic just under two days ago, but kdump-tools didn't manage to capture a useful dmesg that time, and the core dump was incomplete.

We have four Ceph nodes: three mons and three OSD nodes (two of which are the same hosts as the mons), with qemu/kvm VMs (using librbd) also running on them. It's not an ideal setup, but our hardware at this site is limited. Only one host (outside the Ceph cluster) maps a krbd, and that is the host with the kernel panics. It runs Bacula, writing to a local LVM2 volume on directly attached RAID. We use a single krbd mount for spooling Postgres transaction logs (16 MB each) from the Bacula catalog DB. It is not under any significant load and only writes to the krbd mount for a couple of hours every 24 hours. The panics have always occurred during this window, but most of the time it makes it through unscathed.

Actions #9

Updated by Ilya Dryomov over 8 years ago

The structure in question manages snapshots, but it is allocated even if you have no snapshots. It just gets bigger if you are using snapshots, so you are more likely to hit memory corruption after rbd frees it prematurely on the (unrelated) memory-allocation-failure error path. The patch fixing it was just merged into 4.4-rc4 ("rbd: don't put snap_context twice in rbd_queue_workfn()") and should show up in stable kernels, including a Debian kernel, soon.

As for the memory allocation failure itself, I'm inclined to say that 4.2 is simply more willing to fail allocations and/or pushes requests down more aggressively. That's not to say the rest of the kernel is to blame: we will work on properly preallocating everything rbd needs to process a request, targeting 4.5-4.6. In the meantime, you could try raising the vm sysctls (e.g. vm.swappiness=1, vm.min_free_kbytes=131072) so that more memory is kept in reserve and these failures are hopefully avoided.
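Ilya's suggested sysctl values could be made persistent with a drop-in file. The file name below is hypothetical; the values are the examples from his comment:

```
# /etc/sysctl.d/90-rbd-memory.conf  (hypothetical file name)
# Keep more memory in reserve so rbd's request allocations are
# less likely to fail under write pressure.
vm.swappiness = 1
vm.min_free_kbytes = 131072
```

After writing the file, `sysctl --system` (or a reboot) applies the settings.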

Actions #10

Updated by Ilya Dryomov almost 8 years ago

  • Status changed from New to Resolved

Ah, this was a GFP_ATOMIC allocation (which is not allowed to block and is therefore more likely to fail). We used to require GFP_ATOMIC, but we no longer really need it. "rbd: use GFP_NOIO consistently for request allocations" in 4.6-rc3 (backported to 3.18+) got rid of it. Sorry, I missed this when I originally looked at the ticket.
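The fix amounts to swapping the allocation mode at call sites like the `ceph_msg_new` frame visible in the trace above. This is an illustrative sketch of the shape of the change, not the literal upstream diff:

```
- msg = ceph_msg_new(type, front_len, GFP_ATOMIC, true);  /* cannot block, fails easily */
+ msg = ceph_msg_new(type, front_len, GFP_NOIO, true);    /* may block and reclaim, but
+                                                            won't recurse into I/O */
```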
