Bug #55421

kclient: kernel BUG at fs/ceph/addr.c:125! when running the ffsb.sh

Added by Xiubo Li almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 100%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: kcephfs
Crash signature (v1):
Crash signature (v2):

Description

$ cat /proc/kmsg
...
<6>[153596.561572] libceph: mon2 (1)10.72.47.117:40380 session established
<6>[153596.570725] libceph: client4214 fsid b838c647-6b7a-4095-9613-e426d05a3b5c
<4>[154162.616948] ------------[ cut here ]------------
<2>[154162.617046] kernel BUG at fs/ceph/addr.c:125!

This happened while running the `ffsb.sh` test based on the 'testing' branch.


Related issues 1 (0 open, 1 closed)

Has duplicate: Linux kernel client - Bug #55425: kclient: with 'wsync' option enabled with ffsb.sh will crash (Duplicate) - Xiubo Li

Actions #1

Updated by Xiubo Li almost 2 years ago

The relevant code is:

  75 /*
  76  * Dirty a page.  Optimistically adjust accounting, on the assumption
  77  * that we won't race with invalidate.  If we do, readjust.
  78  */
  79 static bool ceph_dirty_folio(struct address_space *mapping, struct folio *folio)
  80 {
     ...
 121         /*
 122          * Reference snap context in folio->private.  Also set
 123          * PagePrivate so that we get invalidate_folio callback.
 124          */
 125         BUG_ON(folio_get_private(folio));                                                                                                         
 126         folio_attach_private(folio, snapc);
 127 
 128         return ceph_fscache_dirty_folio(mapping, folio);
 129 }
Actions #2

Updated by Xiubo Li almost 2 years ago

  • Status changed from New to Fix Under Review
Actions #3

Updated by Xiubo Li almost 2 years ago

  • Related to Bug #55240: mds: stuck 2 seconds and keeps retrying to find ino from auth MDS added
Actions #4

Updated by Xiubo Li almost 2 years ago

  • Related to deleted (Bug #55240: mds: stuck 2 seconds and keeps retrying to find ino from auth MDS)
Actions #5

Updated by Xiubo Li almost 2 years ago

Xiubo Li wrote:

The patchwork link: https://patchwork.kernel.org/project/ceph-devel/list/?series=635030

Besides this case, there are some other cases that could cause this crash; I have reproduced it even with this fix applied.

Actions #6

Updated by Xiubo Li almost 2 years ago

<4>[ 2246.035727] ------------[ cut here ]------------
<2>[ 2246.035805] kernel BUG at fs/ceph/addr.c:125!
<4>[ 2246.036160] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
<4>[ 2246.036232] CPU: 17 PID: 31535 Comm: ffsb Tainted: G            E     5.18.0-rc4+ #106
<4>[ 2246.036239] Hardware name: Red Hat RHEV Hypervisor, BIOS 1.11.0-2.el7 04/01/2014
<4>[ 2246.036243] RIP: 0010:ceph_dirty_folio+0x2d8/0x2e0 [ceph]
<4>[ 2246.036514] Code: c6 48 83 c4 30 e9 d5 fe ff ff 48 89 ef e8 d0 a1 d7 c5 4c 89 ff e8 68 09 cf c5 44 8b 8d 30 05 00 00 e9 90 fe ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41
<4>[ 2246.036520] RSP: 0018:ffffc900019af8c8 EFLAGS: 00010206
<4>[ 2246.036529] RAX: 0000000000000000 RBX: ffffea001a1110c0 RCX: ffffffffc0fe9f81
<4>[ 2246.036533] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffffea001a1110e8
<4>[ 2246.036538] RBP: ffffea001a1110e8 R08: ffffed10203e6051 R09: 0000000000000b4e
<4>[ 2246.036542] R10: ffff888101f30283 R11: ffffed10203e6050 R12: ffff88831b7dae18
<4>[ 2246.036547] R13: ffff888101f30280 R14: ffff88831b7daf38 R15: ffff88831b7db1d0
<4>[ 2246.036554] FS:  000014b7e7dfe700(0000) GS:ffff8887cd480000(0000) knlGS:0000000000000000
<4>[ 2246.036562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2246.036586] CR2: 000014b7e4d5a000 CR3: 0000000341272006 CR4: 00000000007706e0
<4>[ 2246.036590] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 2246.036594] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[ 2246.036598] PKRU: 55555554
<4>[ 2246.036601] Call Trace:
<4>[ 2246.036605]  <TASK>
<4>[ 2246.036610]  ceph_write_end+0x90/0x1b0 [ceph]
<4>[ 2246.036666]  generic_perform_write+0x26a/0x300
<4>[ 2246.036728]  ? folio_unlock+0x30/0x30
<4>[ 2246.036734]  ? file_update_time+0x1bc/0x220
<4>[ 2246.036774]  ? inode_update_time+0x60/0x60
<4>[ 2246.036780]  ? rwsem_down_read_slowpath+0x5c0/0x5c0
<4>[ 2246.036811]  ceph_write_iter+0xf87/0x1240 [ceph]
<4>[ 2246.036875]  ? free_unref_page_list+0x366/0x380
<4>[ 2246.036899]  ? release_pages+0x221/0xa20
<4>[ 2246.036915]  ? ceph_direct_read_write+0xe80/0xe80 [ceph]
<4>[ 2246.036969]  ? cgroup_rstat_updated+0x66/0x1c0
<4>[ 2246.036997]  ? __mod_memcg_lruvec_state+0x9e/0x110
<4>[ 2246.037018]  ? __rcu_read_unlock+0x4e/0x250
<4>[ 2246.037064]  ? pagevec_add_and_need_flush+0x85/0xa0
<4>[ 2246.037075]  ? new_sync_write+0x21a/0x330
<4>[ 2246.037102]  new_sync_write+0x21a/0x330
<4>[ 2246.037110]  ? new_sync_read+0x320/0x320
<4>[ 2246.037115]  ? __handle_mm_fault+0xc51/0xc90
<4>[ 2246.037160]  ? sysvec_apic_timer_interrupt+0x9c/0xb0
<4>[ 2246.037177]  ? __inode_security_revalidate+0x49/0x80
<4>[ 2246.037197]  ? avc_policy_seqno+0x22/0x30
<4>[ 2246.037203]  ? security_file_permission+0x4b/0x2b0
<4>[ 2246.037216]  vfs_write+0x335/0x3d0
<4>[ 2246.037222]  ksys_write+0xb4/0x150
<4>[ 2246.037228]  ? __ia32_sys_read+0x50/0x50
<4>[ 2246.037233]  ? ktime_get_coarse_real_ts64+0x51/0x70
<4>[ 2246.037267]  do_syscall_64+0x37/0x80
<4>[ 2246.037273]  entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[ 2246.037280] RIP: 0033:0x14b7f5211947
<4>[ 2246.037329] Code: c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 1b fd ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 54 fd ff ff 48
<4>[ 2246.037334] RSP: 002b:000014b7e7dfdda0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
<4>[ 2246.037340] RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 000014b7f5211947
<4>[ 2246.037345] RDX: 0000000000001000 RSI: 00000000015ca000 RDI: 0000000000000017
<4>[ 2246.037349] RBP: 00000000015ca000 R08: 0000000000000000 R09: 000000000042a57c
<4>[ 2246.037371] R10: 00007fffbf13d080 R11: 0000000000000293 R12: 0000000000001000
<4>[ 2246.037375] R13: 0000000000000017 R14: 0000000000001000 R15: 00000000015ca000
<4>[ 2246.037381]  </TASK>
Actions #7

Updated by Luis Henriques almost 2 years ago

FWIW, I've seen this too with two different stack traces: the one in the previous comment and this one:

[ 4643.181014] Call Trace:                                                                                                                                                   
[ 4643.182034]  <TASK>                                                                                                                                                       
[ 4643.182896]  ceph_page_mkwrite+0x41a/0x590 [ceph]                                                                                                                         
[ 4643.184690]  do_page_mkwrite+0x55/0xb0                                                                                                                                    
[ 4643.186066]  __handle_mm_fault+0x64a/0xad0                                                                                                                                
[ 4643.187554]  handle_mm_fault+0x5f/0x180                                                                                                                                   
[ 4643.189065]  do_user_addr_fault+0x1de/0x680                                                                                                                               
[ 4643.189792]  exc_page_fault+0x65/0x130                                                                                                                                    
[ 4643.190440]  ? asm_exc_page_fault+0x8/0x30                                         
[ 4643.191145]  asm_exc_page_fault+0x1e/0x30                                                                                                                                 

Unfortunately, I don't really have a reproducer; it happens occasionally.

Actions #8

Updated by Xiubo Li almost 2 years ago

Luis Henriques wrote:

FWIW, I've seen this too with two different stack traces: the one in the previous comment and this one:

[...]

Unfortunately, I don't really have a reproducer; it happens occasionally.

I can see the kernel get stuck indefinitely by running ffsb with max_mds = 3 and the wsync option used for the kernel mount, but only sometimes can I see the call trace; mostly I couldn't see anything in the terminal.

There are still some cases where the page's dirty bit could be cleared without detaching the private data from it. Both of these should be done under the folio_lock()/folio_unlock() scope, as sketched below.
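
To make that invariant concrete, here is a minimal, hypothetical sketch (example_clean_folio() is not code from the kernel tree): the dirty bit is cleared and the private snap context is detached within the same folio_lock()/folio_unlock() scope, so no other path can observe a clean folio that still carries private data.

/*
 * Hypothetical illustration of the invariant above; not kernel-tree code.
 * Assumes the caller already holds a reference on the folio.
 */
static void example_clean_folio(struct folio *folio)
{
	struct ceph_snap_context *snapc;

	folio_lock(folio);
	if (folio_clear_dirty_for_io(folio)) {
		/* Drop the snap context reference attached by ceph_dirty_folio(). */
		snapc = folio_detach_private(folio);
		ceph_put_snap_context(snapc);
	}
	folio_unlock(folio);
}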

Actions #9

Updated by Xiubo Li almost 2 years ago

It seems buggy in:

 787 static int ceph_writepages_start(struct address_space *mapping,
 788                                  struct writeback_control *wbc)
 789 {
     ...
 944                         if (!clear_page_dirty_for_io(page)) {
 945                                 dout("%p !clear_page_dirty_for_io\n", page);
 946                                 unlock_page(page);
 947                                 continue;
 948                         }
     ...
1215 }

When doing writeback it will lock the pages and clear their dirty bits, and then do the IOs. After the writeback finishes, in:

 700 static void writepages_finish(struct ceph_osd_request *req)
 701 {
     ...
 763                         unlock_page(page);
 764                 }
 765                 dout("writepages_finish %p wrote %llu bytes cleaned %d pages\n",
 766                      inode, osd_data->length, rc >= 0 ? num_pages : 0);
 767 
 768                 release_pages(osd_data->pages, num_pages);
     ...
 782 } 

It will unlock and release the pages from the mapping, but release_pages() says that if the page refcount doesn't reach 0 the page will be kept. So the next time the page is reused, its stale private data will cause the BUG.

Actions #10

Updated by Xiubo Li almost 2 years ago

Xiubo Li wrote:

It seems buggy in:

[...]

When doing writeback it will lock the pages and clear their dirty bits, and then do the IOs. After the writeback finishes, in:

[...]

It will unlock and release the pages from the mapping, but release_pages() says that if the page refcount doesn't reach 0 the page will be kept. So the next time the page is reused, its stale private data will cause the BUG.

Sorry, I misread the code; it has already detached the private data.

Actions #11

Updated by Jeff Layton almost 2 years ago

  • Status changed from Fix Under Review to In Progress

I hit the problem again as well in testing. The kernel has the proposed patch:

http://qa-proxy.ceph.com/teuthology/jlayton-2022-05-02_14:27:24-fs:thrash-master-testing-default-smithi/6818503/console_logs/smithi071.log

So I think the proposed fix is not sufficient.

Actions #12

Updated by Xiubo Li almost 2 years ago

Jeff Layton wrote:

I hit the problem again as well in testing. The kernel has the proposed patch:

http://qa-proxy.ceph.com/teuthology/jlayton-2022-05-02_14:27:24-fs:thrash-master-testing-default-smithi/6818503/console_logs/smithi071.log

So I think the proposed fix is not sufficient.

I have gone through the related code again in both the VFS and kceph code, and didn't find any clue yet. From my reading, all the related code is properly protected by folio_lock()/folio_unlock() or lock_page()/unlock_page(). Not sure what I have missed. Will dig into it further later.

Actions #13

Updated by Jeff Layton almost 2 years ago

One thing that might be helpful is to turn this BUG_ON into a VM_BUG_ON, so we at least get some information about page flags, etc. That might give us more insight into the page state at the time.
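
A minimal sketch of that suggestion (the page dumps in the later comments indicate VM_BUG_ON_FOLIO() was the variant used, but treat the exact form here as an assumption):

	/*
	 * Reference snap context in folio->private.  Also set
	 * PagePrivate so that we get invalidate_folio callback.
	 */
	/* Was: BUG_ON(folio_get_private(folio)); the folio variant also dumps
	 * the folio's flags, refcount, mapping, etc. before crashing. */
	VM_BUG_ON_FOLIO(folio_get_private(folio), folio);
	folio_attach_private(folio, snapc);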

Actions #14

Updated by Xiubo Li almost 2 years ago

Jeff Layton wrote:

One thing that might be helpful is to turn this BUG_ON into a VM_BUG_ON, so we at least get some information about page flags, etc. That might give us more insight into the page state at the time.

Sure, will try this.

Actions #15

Updated by Jeff Layton almost 2 years ago

I ran ffsb.sh in a loop for 10 iterations on my home test rig and never saw a crash or hang. Perhaps this is related to some bug unrelated to ceph (possibly even something hw-specific)? I guess we'll just have to wait for a VM_BUG_ON_FOLIO report to pop up and see what it looks like.

Actions #16

Updated by Xiubo Li almost 2 years ago

Jeff Layton wrote:

I ran ffsb.sh in a loop for 10 iterations on my home test rig and never saw a crash or hang. Perhaps this is related to some bug unrelated to ceph (possibly even something hw-specific)? I guess we'll just have to wait for a VM_BUG_ON_FOLIO report to pop up and see what it looks like.

Strangely, with the patch [1] I cannot reproduce any kernel crash or hang by running ffsb.sh, even after 15 hours so far. Before, it was very easy to reproduce.

[1] https://patchwork.kernel.org/project/ceph-devel/list/?series=638723

Actions #17

Updated by Xiubo Li almost 2 years ago

<4>[  690.237190] page:0000000072bce213 refcount:2 mapcount:0 mapping:0000000055310797 index:0x5e1c pfn:0x85148
<4>[  690.237362] memcg:ffff888156070000
<4>[  690.237373] aops:ceph_aops [ceph] ino:10000000541 dentry name:"datafile96" 
<4>[  690.237857] flags: 0xfffffc0000035(locked|uptodate|lru|active|node=0|zone=1|lastcpupid=0x1fffff)
<4>[  690.237878] raw: 000fffffc0000035 ffffea0002145248 ffffea00021451c8 ffff8882688ea7b8
<4>[  690.237907] raw: 0000000000005e1c 0000000000070000 00000002ffffffff ffff888156070000
<4>[  690.237912] page dumped because: VM_BUG_ON_FOLIO(folio_get_private(folio))
<4>[  690.240212]  generic_perform_write+0x26d/0x300
<4>[  690.240861]  ? folio_unlock+0x60/0x60
<4>[  690.240870]  ? file_update_time+0x1bc/0x220
<4>[  690.240979]  ? inode_update_time+0x60/0x60
<4>[  690.240988]  ? rwsem_down_read_slowpath+0x5c0/0x5c0
<4>[  690.241521]  ? kmem_cache_alloc+0x13e/0x270
<4>[  690.241662]  ceph_write_iter+0xf87/0x1240 [ceph]

<4>[  690.239297] ------------[ cut here ]------------
<2>[  690.239303] kernel BUG at fs/ceph/addr.c:125!
<4>[  690.239611] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
<4>[  690.239739] CPU: 30 PID: 18279 Comm: ffsb Tainted: G            E     5.18.0-rc5+ #113
<4>[  690.239756] Hardware name: Red Hat RHEV Hypervisor, BIOS 1.11.0-2.el7 04/01/2014
<4>[  690.239764] RIP: 0010:ceph_dirty_folio+0x31d/0x340 [ceph]
<4>[  690.239919] Code: 05 00 00 e9 69 fe ff ff 48 c7 c6 80 52 f4 c0 48 89 df e8 66 61 94 e7 0f 0b 0f 0b 48 c7 c6 a0 53 f4 c0 48 89 df e8 53 61 94 e7 <0f> 0b 48 c7 c6 00 54 f4 c0 48 89 df e8 42 61 94 e7 0f 0b 0f 0b 66
<4>[  690.239930] RSP: 0018:ffffc9001a60f8c8 EFLAGS: 00010282
<4>[  690.239942] RAX: 000000000000003e RBX: ffffea0002145200 RCX: 0000000000000000
<4>[  690.239951] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffffab67c900
<4>[  690.239963] RBP: ffff8882688ea640 R08: ffffed10f9b65f0e R09: ffffed10f9b65f0e
<4>[  690.239972] R10: ffff8887cdb2f86b R11: ffffed10f9b65f0d R12: ffff8882688ea7b8
<4>[  690.239980] R13: ffff88815d071500 R14: ffffea0002145228 R15: ffff8882688eab70
<4>[  690.239998] FS:  000014b21b2f3700(0000) GS:ffff8887cdb00000(0000) knlGS:0000000000000000
<4>[  690.240011] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  690.240026] CR2: 000055f8e6b68860 CR3: 00000001a7038005 CR4: 00000000007706e0
<4>[  690.240035] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[  690.240042] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[  690.240050] PKRU: 55555554
<4>[  690.240056] Call Trace:
<4>[  690.240062]  <TASK>
<4>[  690.240076]  ceph_write_end+0x8e/0x1e0 [ceph]
<4>[  690.241894]  ? ceph_direct_read_write+0xe80/0xe80 [ceph]
<4>[  690.242079]  ? update_load_avg+0x126/0xcd0
<4>[  690.242193]  ? update_load_avg+0x126/0xcd0
<4>[  690.242232]  ? rb_erase+0x5b2/0x770
<4>[  690.242408]  ? load_new_mm_cr3+0xb8/0x250
<4>[  690.242556]  ? preempt_count_sub+0x14/0xc0
<4>[  690.242627]  ? _raw_spin_unlock+0x15/0x30
<4>[  690.242638]  ? finish_task_switch+0x11b/0x450
<4>[  690.242649]  ? __update_load_avg_cfs_rq+0x5a/0x560
<4>[  690.242688]  ? osq_unlock+0x25/0x100
<4>[  690.242727]  ? new_sync_write+0x21d/0x330
<4>[  690.242766]  new_sync_write+0x21d/0x330
<4>[  690.243195]  ? new_sync_read+0x320/0x320
<4>[  690.243204]  ? update_load_avg+0x126/0xcd0
<4>[  690.243217]  ? load_new_mm_cr3+0xb8/0x250
<4>[  690.243229]  ? __inode_security_revalidate+0x49/0x80
<4>[  690.243312]  ? avc_policy_seqno+0x22/0x30
<4>[  690.243325]  ? security_file_permission+0x4b/0x2b0
<4>[  690.243375]  vfs_write+0x335/0x3d0
<4>[  690.243389]  ksys_write+0xb4/0x150
<4>[  690.243401]  ? __ia32_sys_read+0x50/0x50
<4>[  690.243411]  ? ktime_get_coarse_real_ts64+0x51/0x70
<4>[  690.243475]  do_syscall_64+0x3a/0x80
<4>[  690.243544]  entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[  690.243558] RIP: 0033:0x14b222411947
<4>[  690.243586] Code: c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 1b fd ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 54 fd ff ff 48
<4>[  690.243642] RSP: 002b:000014b21b2f2da0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
<4>[  690.243656] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 000014b222411947
<4>[  690.243665] RDX: 0000000000001000 RSI: 0000000000eaf000 RDI: 0000000000000007
<4>[  690.243673] RBP: 0000000000eaf000 R08: 0000000000000000 R09: 0000000000148ec6
<4>[  690.243681] R10: 00007fff745bc080 R11: 0000000000000293 R12: 0000000000001000
<4>[  690.243746] R13: 0000000000000007 R14: 0000000000001000 R15: 0000000000eaf000
<4>[  690.243762]  </TASK>
Actions #18

Updated by Xiubo Li almost 2 years ago

I found a very easy way to reproduce it: run the ffsb.sh test in more than one terminal at the same time and also run `cat /proc/kmsg` in two terminals at the same time; I can see the call traces almost every time.

Actions #19

Updated by Xiubo Li almost 2 years ago

Added some debug logs and they are:

<4>[  656.970751] __filemap_get_folio:1956 -- page flags: 17ffffc0000014, page->private: 100000
<4>[  656.970767] __filemap_get_folio:1977 -- page flags: 17ffffc0000015, page->private: 100000
<4>[  656.970775] __filemap_get_folio:1987 -- page flags: 17ffffc0000015, page->private: 100000
<4>[  656.970790] netfs: netfs_write_begin:354 -- page flags: 17ffffc0000015, page->private: 100000
<4>[  656.970811] netfs: netfs_write_begin:367 -- page flags: 17ffffc0000015, page->private: 100000
<4>[  656.970820] generic_perform_write:3803 -- page flags: 17ffffc0000015, page->private: 100000
<4>[  656.970828] generic_perform_write:3808 -- page flags: 17ffffc0000015, page->private: 100000
<4>[  656.970838] generic_perform_write:3811 -- page flags: 17ffffc0000015, page->private: 100000
<4>[  656.970847] page:0000000008d21958 refcount:2 mapcount:0 mapping:000000008847cd54 index:0x4f04 pfn:0x4e96a8
<4>[  656.970924] memcg:ffff888110bcc000
<4>[  656.970953] aops:ceph_aops [ceph] ino:10000000544 dentry name:"datafile107" 
<4>[  656.971322] flags: 0x17ffffc0000015(locked|uptodate|lru|node=0|zone=2|lastcpupid=0x1fffff)
<4>[  656.971373] raw: 0017ffffc0000015 ffffea0013a5aa48 ffffea0013a5a9c8 ffff8882968a6ff8
<4>[  656.971392] raw: 0000000000004f04 0000000000100000 00000002ffffffff ffff888110bcc000
<4>[  656.971404] page dumped because: VM_BUG_ON_FOLIO(cond)

The folio is not newly allocated, just an existing one:

1940 struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
1941                 int fgp_flags, gfp_t gfp)
1942 {
1943         struct folio *folio;
1944 
1945 repeat:
1946         folio = mapping_get_entry(mapping, index);
1947         if (xa_is_value(folio)) {
1948                 if (fgp_flags & FGP_ENTRY)
1949                         return folio;
1950                 folio = NULL;
1951         }
1952         if (!folio)
1953                 goto no_page;
1954 
1955         if (!folio_test_dirty(folio) && folio->private)
1956                 pr_warn("%s:%d -- page flags: %lx, page->private: %llx\n", __func__, __LINE__, folio->flags, (long long unsigned)folio->private);
1957         if (fgp_flags & FGP_LOCK) {
1958                 if (fgp_flags & FGP_NOWAIT) {
1959                         if (!folio_trylock(folio)) {
1960                                 folio_put(folio);
1961                                 return NULL;
1962                         }

Actions #20

Updated by Xiubo Li almost 2 years ago

2083489 <4>[  539.179879] __filemap_add_folio:900 -- folio: 00000000dffbb8ef, flags: fffffc0000001, private: 0

...

3695883 <4>[  694.980052] __filemap_get_folio:1957 -- folio: 00000000dffbb8ef, flags: fffffc0000034, page->private: 40000
3695884 <4>[  694.980061] page:00000000dffbb8ef refcount:2 mapcount:0 mapping:00000000e246cdc1 index:0x2451 pfn:0x71e08
3695885 <4>[  694.980129] memcg:ffff8881215a4000                                                                                                       
3695886 <4>[  694.980131] __filemap_add_folio:900 -- folio: 000000004e4b7b22, flags: fffffc0000001, private: 0
3695887 <4>[  694.980148] __filemap_add_folio:900 -- folio: 00000000932e5e2e, flags: fffffc0000001, private: 0
3695888 <4>[  694.980296] __filemap_add_folio:900 -- folio: 00000000158888af, flags: fffffc0000001, private: 0
3695889 <4>[  694.980138] aops:ceph_aops [ceph] ino:20000000257 dentry name:"datafile65" 
3695890 <4>[  694.980352] __filemap_add_folio:900 -- folio: 0000000010a8632f, flags: fffffc0000001, private: 0
3695891 <4>[  694.980385] __filemap_add_folio:900 -- folio: 00000000d1b6e3b0, flags: fffffc0000001, private: 0
3695892 <4>[  694.980355] flags: 0xfffffc0000035(locked|uptodate|lru|active|node=0|zone=1|lastcpupid=0x1fffff)
3695893 <4>[  694.980400] raw: 000fffffc0000035 ffffea0001c78248 ffffea0001c781c8 ffff888215199b78
3695894 <4>[  694.980410] raw: 0000000000002451 0000000000040000 00000002ffffffff ffff8881215a4000
3695895 <4>[  694.980414] page dumped because: VM_BUG_ON_FOLIO(cond)

It's corrupted after being inserted into the xarray.

The patch:

diff --git a/mm/filemap.c b/mm/filemap.c
index 9a1eef6c5d35..e4bee3176abc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -897,6 +897,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
                        }
                }

+               pr_warn("%s:%d -- folio: %p, ino: %lx, flags: %lx, private: %llx\n", __func__, __LINE__, folio, mapping->host->i_ino, folio->flags, (long long unsigned)folio->private);
                xas_store(&xas, folio);
                if (xas_error(&xas))
                        goto unlock;
Actions #21

Updated by Xiubo Li almost 2 years ago

Tried the following patch, covering the places where it is possible to set or change the private data for a page or folio. When the issue reproduced, none of these added logs were printed, which means the private data does not seem to be corrupted by ceph-related code:

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8834e38c06a4..981203382c3e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -324,6 +324,10 @@ static inline atomic_t *compound_pincount_ptr(struct page *page)

 static inline void set_page_private(struct page *page, unsigned long private)
 {
+       if (private < 0x100000 && private > 0x1000) {
+               pr_warn("%s:%d -- folio: %p, flags: %lx, invalidate private: %llx\n", __func__, __LINE__, page, page->flags, (long long unsigned)private);
+               dump_stack();
+       }
        page->private = private;
 }

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 993994cd943a..4d1a5eaee148 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -412,6 +412,10 @@ static inline struct inode *folio_inode(struct folio *folio)
  */
 static inline void folio_attach_private(struct folio *folio, void *data)
 {
+       if ((long long unsigned)data < 0x100000 && (long long unsigned)data > 0x1000) {
+               pr_warn("%s:%d -- folio: %p, flags: %lx, invalidate private: %llx\n", __func__, __LINE__, folio, folio->flags, (long long unsigned)data);
+               dump_stack();
+       }
        folio_get(folio);
        folio->private = data;
        folio_set_private(folio);
@@ -432,6 +436,10 @@ static inline void *folio_change_private(struct folio *folio, void *data)
 {
        void *old = folio_get_private(folio);

+       if ((long long unsigned)data < 0x100000 && (long long unsigned)data > 0x1000) {
+               pr_warn("%s:%d -- folio: %p, flags: %lx, invalidate private: %llx\n", __func__, __LINE__, folio, folio->flags, (long long unsigned)data);
+               dump_stack();
+       }
        folio->private = data;
        return old;
 }
@@ -449,6 +457,11 @@ static inline void *folio_detach_private(struct folio *folio)
 {
        void *data = folio_get_private(folio);

+       if (!folio_test_private(folio) && folio->private) {
+               pr_warn("%s:%d -- folio: %p, flags: %lx, invalidate private: %llx\n", __func__, __LINE__, folio, folio->flags, (long long unsigned)data);
+               dump_stack();
+       }
+
        if (!folio_test_private(folio))
                return NULL;
        folio_clear_private(folio);
Actions #22

Updated by Xiubo Li almost 2 years ago

I added logs to print the inode number at the same time:

When the folio was added to the mapping, the inode# is 100000005d8:

2478322 <4>[  489.394209] __filemap_add_folio:900 -- folio: 00000000b236a241, ino: 100000005d8, flags: fffffc0000001, private: 0

While when BUG the inode# changed to 10000000598:

4591166 <4>[  621.187854] __filemap_get_folio:1957 -- folio: 00000000b236a241, ino: 10000000598, flags: fffffc0000034, page->private: 40000            
4591167 <4>[  621.187864] page:00000000b236a241 refcount:2 mapcount:0 mapping:00000000ffde338c index:0x107c pfn:0xa0628
4591168 <4>[  621.187889] memcg:ffff8881116dc000
4591169 <4>[  621.187897] aops:ceph_aops [ceph] ino:10000000598 dentry name:"datafile20" 
4591170 <4>[  621.188109] flags: 0xfffffc0000035(locked|uptodate|lru|active|node=0|zone=1|lastcpupid=0x1fffff)
4591171 <4>[  621.188136] raw: 000fffffc0000035 ffffea0002818ac8 ffffea0002818988 ffff8881ca0d6978
4591172 <4>[  621.188150] raw: 000000000000107c 0000000000040000 00000002ffffffff ffff8881116dc000
4591173 <4>[  621.188152] __filemap_add_folio:900 -- folio: 000000009ac5e4ab, ino: 100000005aa, flags: fffffc0000001, private: 0
4591174 <4>[  621.188155] page dumped because: VM_BUG_ON_FOLIO(cond)

IMO this could happen when the folio 00000000b236a241 was reclaimed by the MM from inode# 100000005d8, and then reused by inode# 10000000598 in a page fault.

Actions #23

Updated by Xiubo Li almost 2 years ago

Added the index in the mapping to the logs and reproduced it again; the same index 3439 was backed by a different folio:

954368 <4>[ 1749.186719] __filemap_add_folio:900 -- folio: 0000000056ef187e, ino: 10000000559 -- 3439, flags: fffffc0000001, private: 0
830874 <4>[ 1769.011478] __filemap_get_folio:1957 -- folio: 00000000688c5288, ino: 10000000559 -- 3439, flags: fffffc0000034, private: 2e0000          
830875 <4>[ 1769.011487] page:00000000688c5288 refcount:2 mapcount:0 mapping:00000000eb4fdbf3 index:0x3439 pfn:0x7febd
830876 <4>[ 1769.011499] __filemap_add_folio:900 -- folio: 00000000ff47f521, ino: 20000000431 -- 205, flags: 17ffffc0000001, private: 0
830877 <4>[ 1769.011511] memcg:ffff88815448c000
830878 <4>[ 1769.011514] __filemap_add_folio:900 -- folio: 00000000bd443883, ino: 20000000427 -- fa6, flags: 17ffffc0000001, private: 0
830879 <4>[ 1769.011547] __filemap_add_folio:900 -- folio: 00000000325a1e79, ino: 200000003f8 -- 5946, flags: 17ffffc0000001, private: 0
830880 <4>[ 1769.011633] __filemap_add_folio:900 -- folio: 00000000cdbfa8de, ino: 10000000533 -- e9c, flags: 17ffffc0000001, private: 0
830881 <4>[ 1769.011640] __filemap_add_folio:900 -- folio: 00000000556b9c1b, ino: 200000003f8 -- 5bbc, flags: 17ffffc0000001, private: 0
830882 <4>[ 1769.011519] aops:ceph_aops [ceph] ino:10000000559 dentry name:"datafile31" 
830883 <4>[ 1769.011727] flags: 0xfffffc0000035(locked|uptodate|lru|active|node=0|zone=1|lastcpupid=0x1fffff)
830884 <4>[ 1769.011734] __filemap_add_folio:900 -- folio: 00000000a76525f7, ino: 200000003f8 -- 4ec4, flags: 17ffffc0000001, private: 0
830885 <4>[ 1769.011742] raw: 000fffffc0000035 ffffea0001ffaf88 ffffea0001ffaf08 ffff8881b485aef8
830886 <4>[ 1769.011752] raw: 0000000000003439 00000000002e0000 00000002ffffffff ffff88815448c000
830887 <4>[ 1769.011755] page dumped because: VM_BUG_ON_FOLIO(cond)
Actions #24

Updated by Xiubo Li almost 2 years ago

The pages in the file mapping may be reclaimed and reused by other subsystems, and page->private may be used as a flags field or for something else. If those pages are later used by the page cache again, page->private may not have been cleared as expected.

The fix is to check the PG_private bit instead of folio->private; see the sketch after the patchwork link below.

The patchwork link: https://patchwork.kernel.org/project/ceph-devel/list/?series=639462
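
A minimal sketch of that approach in ceph_dirty_folio() (an assumption based on the description above, not necessarily the exact change in the linked series): only the PG_private flag is trusted, so a stale folio->private value left behind by a previous user of the page no longer trips the check.

	/*
	 * Sketch only: check the PG_private flag rather than the raw
	 * folio->private pointer, which can hold a stale value if the page
	 * was reclaimed and reused by another subsystem in the meantime.
	 */
	VM_WARN_ON_FOLIO(folio_test_private(folio), folio);
	folio_attach_private(folio, snapc);

	return ceph_fscache_dirty_folio(mapping, folio);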

Actions #25

Updated by Xiubo Li almost 2 years ago

  • Status changed from In Progress to Fix Under Review
Actions #26

Updated by Ilya Dryomov almost 2 years ago

  • Has duplicate Bug #55425: kclient: with 'wsync' option enabled with ffsb.sh will crash added
Actions #27

Updated by Xiubo Li almost 2 years ago

  • Status changed from Fix Under Review to Resolved