Project

General

Profile

Actions

Bug #56423

closed

kernel BUG at mm/filemap.c:1559!

Added by Patrick Donnelly almost 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

2022-06-24T00:58:15.514035+00:00 smithi067 kernel: page:00000000c9746ff1 refcount:2 mapcount:0 mapping:00000000dc2785bb index:0x1 pfn:0x141afc
2022-06-24T00:58:15.514240+00:00 smithi067 kernel: memcg:ffff88810f766000
2022-06-24T00:58:15.514297+00:00 smithi067 kernel: aops:ceph_aops [ceph] ino:100000005e7 dentry name:"postgresql-Fri.log" 
2022-06-24T00:58:15.514357+00:00 smithi067 kernel: flags: 0x5ffc000000201c(uptodate|dirty|lru|private|node=0|zone=2|lastcpupid=0x7ff)
2022-06-24T00:58:15.514408+00:00 smithi067 kernel: raw: 005ffc000000201c ffffea000a9eeb48 ffffea00060ade48 ffff888193ed8228
2022-06-24T00:58:15.514457+00:00 smithi067 kernel: raw: 0000000000000001 ffff88810cc96500 00000002ffffffff ffff88810f766000
2022-06-24T00:58:15.514509+00:00 smithi067 kernel: page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
2022-06-24T00:58:15.514558+00:00 smithi067 kernel: ------------[ cut here ]------------
2022-06-24T00:58:15.514606+00:00 smithi067 kernel: kernel BUG at mm/filemap.c:1559!
2022-06-24T00:58:15.514658+00:00 smithi067 kernel: invalid opcode: 0000 [#1] PREEMPT SMP PTI
2022-06-24T00:58:15.514708+00:00 smithi067 kernel: CPU: 4 PID: 131697 Comm: postmaster Tainted: G S                5.19.0-rc2-ceph-g822a4c74e05d #1
2022-06-24T00:58:15.514757+00:00 smithi067 kernel: Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
2022-06-24T00:58:15.514810+00:00 smithi067 kernel: RIP: 0010:folio_unlock+0x26/0x30
2022-06-24T00:58:15.514860+00:00 smithi067 kernel: Code: 00 0f 1f 00 0f 1f 44 00 00 48 8b 07 a8 01 74 0e f0 80 27 fe 78 01 c3 31 f6 e9 d6 fe ff ff 48 c7 c6 c0 81 37 82 e8 aa 64 04 00 <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 b8 01 00 00
2022-06-24T00:58:15.514909+00:00 smithi067 kernel: RSP: 0018:ffffc90004377bc8 EFLAGS: 00010246
2022-06-24T00:58:15.514958+00:00 smithi067 kernel: RAX: 000000000000003f RBX: ffff888193ed8228 RCX: 0000000000000001
2022-06-24T00:58:15.515017+00:00 smithi067 kernel: RDX: 0000000000000000 RSI: ffffffff823a3569 RDI: 00000000ffffffff
2022-06-24T00:58:15.515068+00:00 smithi067 kernel: RBP: ffffffff828a0058 R08: 0000000000000001 R09: 0000000000000001
2022-06-24T00:58:15.532749+00:00 smithi067 kernel: R10: 000000007c6b0fd2 R11: 0000000000000034 R12: 0000000000000001
2022-06-24T00:58:15.532807+00:00 smithi067 kernel: R13: 00000000fffffe00 R14: ffffea000506bf00 R15: ffff888193ed8000
2022-06-24T00:58:15.532856+00:00 smithi067 kernel: FS:  00007f4993626340(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
2022-06-24T00:58:15.532928+00:00 smithi067 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-06-24T00:58:15.532981+00:00 smithi067 kernel: CR2: 0000555789ee8000 CR3: 000000017a52a006 CR4: 00000000001706e0
2022-06-24T00:58:15.533040+00:00 smithi067 kernel: Call Trace:
2022-06-24T00:58:15.533116+00:00 smithi067 kernel: <TASK>
2022-06-24T00:58:15.533189+00:00 smithi067 kernel: netfs_write_begin+0x130/0x950 [netfs]
2022-06-24T00:58:15.533238+00:00 smithi067 kernel: ceph_write_begin+0x46/0xd0 [ceph]
2022-06-24T00:58:15.533284+00:00 smithi067 kernel: generic_perform_write+0xef/0x200
2022-06-24T00:58:15.533350+00:00 smithi067 kernel: ? file_update_time+0xd4/0x110
2022-06-24T00:58:15.533403+00:00 smithi067 kernel: ceph_write_iter+0xb01/0xcd0 [ceph]
2022-06-24T00:58:15.533459+00:00 smithi067 kernel: ? lock_is_held_type+0xe3/0x140
2022-06-24T00:58:15.533514+00:00 smithi067 kernel: ? new_sync_write+0x106/0x180
2022-06-24T00:58:15.533563+00:00 smithi067 kernel: new_sync_write+0x106/0x180
2022-06-24T00:58:15.533611+00:00 smithi067 kernel: vfs_write+0x29a/0x3a0
2022-06-24T00:58:15.533656+00:00 smithi067 kernel: ksys_write+0x5c/0xd0
2022-06-24T00:58:15.533700+00:00 smithi067 kernel: do_syscall_64+0x34/0x80
2022-06-24T00:58:15.533744+00:00 smithi067 kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0
2022-06-24T00:58:15.533792+00:00 smithi067 kernel: RIP: 0033:0x7f49903205c8
2022-06-24T00:58:15.549754+00:00 smithi067 kernel: Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 d5 3f 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
2022-06-24T00:58:15.549826+00:00 smithi067 kernel: RSP: 002b:00007fff104bd178 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
2022-06-24T00:58:15.549884+00:00 smithi067 kernel: RAX: ffffffffffffffda RBX: 0000000000000048 RCX: 00007f49903205c8
2022-06-24T00:58:15.549933+00:00 smithi067 kernel: RDX: 0000000000000048 RSI: 000055944d3c1ea0 RDI: 000000000000000b
2022-06-24T00:58:15.549985+00:00 smithi067 kernel: RBP: 000055944d3c1ea0 R08: 000055944d3963d0 R09: 00007fff1055b080
2022-06-24T00:58:15.550036+00:00 smithi067 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000055944d3962f0
2022-06-24T00:58:15.550085+00:00 smithi067 kernel: R13: 0000000000000048 R14: 00007f49905bb880 R15: 0000000000000048
2022-06-24T00:58:15.550133+00:00 smithi067 kernel: </TASK>

From: /ceph/teuthology-archive/pdonnell-2022-06-24_00:00:32-fs:workload-wip-pdonnell-testing-20220623.153748-distro-default-smithi/6895938/remote/smithi067/syslog/kern.log.gz

Only thing interesting happening is postgres running on this kclient.

Actions #1

Updated by Xiubo Li almost 2 years ago

  • Status changed from New to In Progress
  • Assignee set to Xiubo Li
330 int netfs_write_begin(struct netfs_inode *ctx,
331                       struct file *file, struct address_space *mapping,                                                                            
332                       loff_t pos, unsigned int len, struct folio **_folio,
333                       void **_fsdata)
334 {
335         struct netfs_io_request *rreq;
336         struct folio *folio;
337         unsigned int fgp_flags = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE;
338         pgoff_t index = pos >> PAGE_SHIFT;
339         int ret;
340 
341         DEFINE_READAHEAD(ractl, file, NULL, mapping, index);
342 
343 retry:
344         folio = __filemap_get_folio(mapping, index, fgp_flags,
345                                     mapping_gfp_mask(mapping));
346         if (!folio)
347                 return -ENOMEM;
348 
349         if (ctx->ops->check_write_begin) {
350                 /* Allow the netfs (eg. ceph) to flush conflicts. */
351                 ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
352                 if (ret < 0) {
353                         trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
354                         if (ret == -EAGAIN)
355                                 goto retry;
356                         goto error;
357                 }
358         }
359

The __filemap_get_folio() will return a folio by locking it. While in ctx->ops->check_write_begin() in ceph it will unlock it conditionally.

1302 static int ceph_netfs_check_write_begin(struct file *file, loff_t pos, unsigned int len,                                                          
1303                                         struct folio *folio, void **_fsdata)
1304 {                             
1305         struct inode *inode = file_inode(file);
1306         struct ceph_inode_info *ci = ceph_inode(inode);
1307         struct ceph_snap_context *snapc;    
1308   
1309         snapc = ceph_find_incompatible(folio_page(folio, 0));
1310         if (snapc) {
1311                 int r;
1312   
1313                 folio_unlock(folio);                
1314                 folio_put(folio);
1315                 if (IS_ERR(snapc))
1316                         return PTR_ERR(snapc);              
1317   
1318                 ceph_queue_writeback(inode);        
1319                 r = wait_event_killable(ci->i_cap_wq, 
1320                                         context_is_writeable_or_written(inode, snapc));
1321                 ceph_put_snap_context(snapc);       
1322                 return r == 0 ? -EAGAIN : r;        
1323         }
1324         return 0;
1325 }

Not sure why we didn't see this call trace before.

Actions #2

Updated by Xiubo Li almost 2 years ago

  • Status changed from In Progress to Fix Under Review

Fixed it in both netfs and ceph, and the patchwork link: https://patchwork.kernel.org/project/ceph-devel/list/?series=655616

Actions #3

Updated by Xiubo Li almost 2 years ago

  • Status changed from Fix Under Review to Resolved
Actions #4

Updated by Xiubo Li almost 2 years ago

  • Status changed from Resolved to Fix Under Review
Actions #5

Updated by Xiubo Li over 1 year ago

  • Status changed from Fix Under Review to Resolved

Applied to the mainline and closing this tracker.

Actions

Also available in: Atom PDF