Bug #78

bdi_init list bug

Added by Sage Weil almost 14 years ago. Updated over 13 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Two clients were mounted here, so it is unclear which was which. One was behaving fine.

The other was forcefully unmounted. A while later, it was remounted, and crashed immediately.

[424827.793247] ceph: mon0 10.3.14.136:6789 connection failed
[424837.792949] ceph: mon0 10.3.14.136:6789 connection failed
[424847.794571] ceph: authentication error 1
[424857.794727] ceph: authentication error -1
[424863.096033] ceph: mds0 hung
[424864.881653] ceph: writepage_start ffff8800e0e4b490 on forced umount
[432041.361331] -----------[ cut here ]------------
[432041.366109] WARNING: at lib/list_debug.c:26 __list_add+0x42/0x87()
[432041.372421] Hardware name: H8SSL
[432041.375777] list_add corruption. next->prev should be prev (ffffffff816b7ae0), but was ffffea000211e0a0. (next=ffff880048a15008).
[432041.387572] Modules linked in: ceph aes_x86_64 aes_generic fan ac battery psmouse ide_pci_generic ehci_hcd ohci_hcd thermal processor button [last unloaded: ceph]
[432041.402639] Pid: 22962, comm: mount.ceph Not tainted 2.6.34-rc3 #26
[432041.409041] Call Trace:
[432041.411608] [<ffffffff812283cb>] ? __list_add+0x42/0x87
[432041.417053] [<ffffffff810357e0>] warn_slowpath_common+0x77/0x8f
[432041.423184] [<ffffffff8103586d>] warn_slowpath_fmt+0x64/0x66
[432041.429062] [<ffffffff814260c7>] ? mutex_lock_nested+0x2e2/0x32c
[432041.435292] [<ffffffff81055aa7>] ? debug_mutex_free_waiter+0x4f/0x53
[432041.441860] [<ffffffff814260f4>] ? mutex_lock_nested+0x30f/0x32c
[432041.448080] [<ffffffff8122b774>] ? __percpu_counter_init+0x6d/0x9c
[432041.454467] [<ffffffff81056a72>] ? lockdep_init_map+0xa5/0x43a
[432041.460510] [<ffffffff812283cb>] __list_add+0x42/0x87
[432041.465777] [<ffffffff8122b78b>] __percpu_counter_init+0x84/0x9c
[432041.472004] [<ffffffff8108f87a>] bdi_init+0x134/0x192
[432041.477290] [<ffffffffa018829f>] ceph_get_sb+0x559/0xf18 [ceph]
[432041.483430] [<ffffffff810a5f7f>] ? alloc_pages_current+0x96/0x9f
[432041.489657] [<ffffffff810b1a82>] vfs_kern_mount+0xaf/0x164
[432041.495356] [<ffffffff810b1b94>] do_kern_mount+0x47/0xee
[432041.500886] [<ffffffff810c6a08>] do_mount+0x750/0x7cb
[432041.506155] [<ffffffff810c6b02>] sys_mount+0x7f/0xbf
[432041.511334] [<ffffffff81427645>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[432041.517908] [<ffffffff810029eb>] system_call_fastpath+0x16/0x1b
[432041.524040] ---[ end trace 7e46f668d5014977 ]---
[432043.272686] BUG: unable to handle kernel paging request at 00000000001d93fc
[432043.272815] IP: [<ffffffffa0195c6f>] writepages_finish+0x1be/0x44f [ceph]
[432043.272815] PGD f7c5d067 PUD f7f42067 PMD 0
[432043.272815] Oops: 0002 [#1] PREEMPT SMP
[432043.272815] last sysfs file: /sys/kernel/uevent_seqnum
[432043.272815] CPU 1
[432043.272815] Modules linked in: ceph aes_x86_64 aes_generic fan ac battery psmouse ide_pci_generic ehci_hcd ohci_hcd thermal processor button [last unloaded: ceph]
[432043.272815]
[432043.272815] Pid: 9873, comm: ceph-msgr/1 Tainted: G W 2.6.34-rc3 #26 H8SSL/H8SSL
[432043.272815] RIP: 0010:[<ffffffffa0195c6f>] [<ffffffffa0195c6f>] writepages_finish+0x1be/0x44f [ceph]
[432043.272815] RSP: 0018:ffff8800f51a3ac0 EFLAGS: 00010206
[432043.272815] RAX: 0000000000002eec RBX: ffff880048a82f48 RCX: 0000000000004c13
[432043.272815] RDX: 0000000000000fa4 RSI: ffffea00020d0428 RDI: 00000000001d93fc
[432043.272815] RBP: ffff8800f51a3bd0 R08: 0000000000000000 R09: 0000000000000000
[432043.272815] R10: ffffea00020d0428 R11: 0000000000000000 R12: 0000000000000274
[432043.272815] R13: 0000000000000202 R14: ffff8800f4fa94c8 R15: ffff8800f6fa2000
[432043.272815] FS: 00007f1c87d856e0(0000) GS:ffff880002800000(0000) knlGS:0000000000000000
[432043.272815] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[432043.272815] CR2: 00000000001d93fc CR3: 00000000f7fdc000 CR4: 00000000000006e0
[432043.272815] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[432043.272815] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[432043.272815] Process ceph-msgr/1 (pid: 9873, threadinfo ffff8800f51a2000, task ffff8800f7e00840)
[432043.272815] Stack:
[432043.272815] 0000000000000001 00000000001d2840 ffff8800f51a3b00 000004008104f3ea
[432043.272815] <0> ffff8800f6fa2ea0 ffff8800a0f7e810 ffff8800a0f7e3c0 0000020200000046
[432043.272815] <0> ffff8800f4cb7880 ffff8800b3e0fd30 00003fdd00000002 0000000000000202
[432043.272815] Call Trace:
[432043.272815] [<ffffffff81057f33>] ? mark_held_locks+0x49/0x64
[432043.272815] [<ffffffff81425c54>] ? __mutex_unlock_slowpath+0x10d/0x130
[432043.272815] [<ffffffff81058061>] ? trace_hardirqs_on_caller+0x113/0x13e
[432043.272815] [<ffffffff81058099>] ? trace_hardirqs_on+0xd/0xf
[432043.272815] [<ffffffffa01b1bdd>] dispatch+0x238/0x40b [ceph]
[432043.272815] [<ffffffffa01a3a4c>] try_read+0xd4a/0x1358 [ceph]
[432043.272815] [<ffffffff810099e3>] ? native_sched_clock+0x37/0x71
[432043.272815] [<ffffffff8104f2c2>] ? sched_clock_local+0x11/0x73
[432043.272815] [<ffffffff8105abf5>] ? __lock_acquire+0x7eb/0x84e
[432043.272815] [<ffffffff81056719>] ? put_lock_stats+0xe/0x27
[432043.272815] [<ffffffffa01a59f4>] con_work+0x11a/0x6bc [ceph]
[432043.272815] [<ffffffff810477ea>] worker_thread+0x1e8/0x2fa
[432043.272815] [<ffffffff81047791>] ? worker_thread+0x18f/0x2fa
[432043.272815] [<ffffffffa01a58da>] ? con_work+0x0/0x6bc [ceph]
[432043.272815] [<ffffffff8104a8b0>] ? autoremove_wake_function+0x0/0x38
[432043.272815] [<ffffffff81047602>] ? worker_thread+0x0/0x2fa
[432043.272815] [<ffffffff8104a57e>] kthread+0x7d/0x85
[432043.272815] [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[432043.272815] [<ffffffff81428700>] ? restore_args+0x0/0x30
[432043.272815] [<ffffffff8104a501>] ? kthread+0x0/0x85
[432043.272815] [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
[432043.272815] Code: 16 f6 05 f7 26 2e e2 02 74 0d 80 3d 02 39 04 00 00 0f 85 ee 01 00 00 48 8b 85 38 ff ff ff 48 ff 40 28 48 8b 7b 10 48 85 ff 74 0f <f0> ff 0f 0f 94 c0 84 c0 74 05 e8 4f 41 f1 e0 48 c7 43 10 00 00
[432043.272815] RIP [<ffffffffa0195c6f>] writepages_finish+0x1be/0x44f [ceph]
[432043.272815] RSP <ffff8800f51a3ac0>
[432043.272815] CR2: 00000000001d93fc
[432043.612206] ---[ end trace 7e46f668d5014978 ]---
[432092.939403] ceph: mds0 caps stale
[432107.938999] ceph: mds0 caps stale
[432139.566006] ceph: tid 4207031 timed out on osd0, will reset osd
[432199.568295] ceph: tid 4207031 timed out on osd0, will reset osd
[432259.570589] ceph: tid 4207031 timed out on osd0, will reset osd
[432319.572886] ceph: tid 4207031 timed out on osd0, will reset osd
[432337.932428] ceph: mds0 hung
[432379.575183] ceph: tid 4207031 timed out on osd0, will reset osd
[432439.577478] ceph: tid 4207031 timed out on osd0, will reset osd
[432499.579774] ceph: tid 4207031 timed out on osd0, will reset osd
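
For readers unfamiliar with the first warning: the list debug code checks, before linking a new entry, that the neighbours still point at each other. The sketch below is a userspace illustration of that check, not the kernel source. The names checked_list_add, struct counter and simulate are hypothetical, and the scenario (an entry whose memory is reused without first being unlinked) is only an assumption about how a list could end up in the state reported above.

```c
#include <stdio.h>
#include <string.h>

/* Minimal doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/*
 * Hypothetical helper: insert new_entry right after head, but first verify
 * that the neighbours are consistent. Returns 0 on success and -1 if the
 * list looks corrupted (in which case nothing is linked).
 */
static int checked_list_add(struct list_head *new_entry, struct list_head *head)
{
    struct list_head *prev = head;
    struct list_head *next = head->next;

    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), "
                "but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return -1;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be next.\n");
        return -1;
    }

    next->prev = new_entry;
    new_entry->next = next;
    new_entry->prev = prev;
    prev->next = new_entry;
    return 0;
}

static void list_del_entry(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
}

/* Hypothetical stand-in for an object that lives on a global list. */
struct counter {
    long count;
    struct list_head list;
};

/*
 * Register one counter, optionally unlink it, then let its memory be
 * "reused" (overwritten) and register a second counter.
 * Returns the result of the second checked_list_add().
 */
int simulate(int unlink_before_reuse)
{
    struct list_head counters;
    struct counter old_counter, fresh_counter;

    init_list_head(&counters);
    checked_list_add(&old_counter.list, &counters);

    if (unlink_before_reuse)
        list_del_entry(&old_counter.list);

    /* The memory is recycled for something else; its contents change. */
    memset(&old_counter, 0xea, sizeof(old_counter));

    return checked_list_add(&fresh_counter.list, &counters);
}
```

Under these assumptions, simulate(1) returns 0 because the old entry was unlinked before its memory changed, while simulate(0) returns -1 and prints a message of the same shape as the warning in the log, since the list head still points at the recycled entry.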

History

#1 Updated by Sage Weil almost 14 years ago

  • Target version set to v2.6.35

#2 Updated by Sage Weil almost 14 years ago

  • Status changed from New to 7

I suspect this was fixed by commit:5dfc589a8467470226feccdc50f1b32713318e7b

#3 Updated by Sage Weil almost 14 years ago

  • Status changed from 7 to Resolved
