Project

General

Profile

Actions

Bug #1723

closed

timeouts during ffsb

Added by Anonymous over 12 years ago. Updated over 12 years ago.

Status:
Rejected
Priority:
High
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During the 11/11 nightlies, in suite 1827, sepia65 experienced multiple timeout events.
The first (from /syslog/kern.log.gz) is:

libceph: loaded (mon/osd proto 15/24, osdmap 5/6 5/6)
2011-11-11T02:05:35.283053-08:00 sepia65 kernel: [ 1902.633565] ceph: loaded (mds proto 32)
2011-11-11T02:05:37.963057-08:00 sepia65 kernel: [ 1905.317225] Intel AES-NI instructions are not detected.
2011-11-11T02:05:40.473054-08:00 sepia65 kernel: [ 1907.821603] libceph: client4100 fsid cdcbfe2a-7c1b-df86-0ede-33cf6a15b32e
2011-11-11T02:05:40.473076-08:00 sepia65 kernel: [ 1907.821667] libceph: mon2 10.3.14.192:6791 session established
2011-11-11T02:24:18.156345-08:00 sepia65 kernel: [ 3025.480098] libceph: tid 3788 timed out on osd1, will reset osd
2011-11-11T02:24:18.156368-08:00 sepia65 kernel: [ 3025.485062] libceph: tid 3925 timed out on osd0, will reset osd
2011-11-11T02:25:18.363050-08:00 sepia65 kernel: [ 3085.710029] libceph: tid 7285 timed out on osd0, will reset osd
2011-11-11T02:25:23.383051-08:00 sepia65 kernel: [ 3090.730022] libceph: tid 8558 timed out on osd1, will reset osd
2011-11-11T02:25:55.317449-08:00 sepia65 kernel: [ 3122.660050] INFO: task ffsb:5535 blocked for more than 120 seconds.
2011-11-11T02:25:55.317471-08:00 sepia65 kernel: [ 3122.666394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2011-11-11T02:25:55.317477-08:00 sepia65 kernel: [ 3122.674361] ffsb D 00000000ffffffff 0 5535 3840 0x00000000
2011-11-11T02:25:55.317708-08:00 sepia65 kernel: [ 3122.674429] ffff8800ed1d5de8 0000000000000046 0000000000000000 ffffffffa025fbb5
2011-11-11T02:25:55.317713-08:00 sepia65 kernel: [ 3122.674434] ffff880111f40000 0000000000013b80 ffff8800ed1d5fd8 ffff8800ed1d4010
2011-11-11T02:25:55.317719-08:00 sepia65 kernel: [ 3122.674438] ffff8800ed1d5fd8 0000000000013b80 ffff8800a230bf60 ffff880111f40000
2011-11-11T02:25:55.317722-08:00 sepia65 kernel: [ 3122.674443] Call Trace:
2011-11-11T02:25:55.317732-08:00 sepia65 kernel: [ 3122.674465] [<ffffffffa025fbb5>] ? ceph_llseek+0x45/0x160 [ceph]
2011-11-11T02:25:55.317738-08:00 sepia65 kernel: [ 3122.674470] [<ffffffff81600e0f>] schedule+0x3f/0x60
2011-11-11T02:25:55.317743-08:00 sepia65 kernel: [ 3122.674474] [<ffffffff81601b77>] __mutex_lock_common+0x187/0x3d0
2011-11-11T02:25:55.317748-08:00 sepia65 kernel: [ 3122.674483] [<ffffffffa025fbb5>] ? ceph_llseek+0x45/0x160 [ceph]
2011-11-11T02:25:55.317753-08:00 sepia65 kernel: [ 3122.674488] [<ffffffff8113d30e>] ? might_fault+0x8e/0xa0
2011-11-11T02:25:55.317763-08:00 sepia65 kernel: [ 3122.674491] [<ffffffff8113d2c5>] ? might_fault+0x45/0xa0
2011-11-11T02:25:55.317768-08:00 sepia65 kernel: [ 3122.674625] [<ffffffffa025fbb5>] ? ceph_llseek+0x45/0x160 [ceph]
2011-11-11T02:25:55.317813-08:00 sepia65 kernel: [ 3122.674699] [<ffffffff81601ee7>] mutex_lock_nested+0x37/0x50
2011-11-11T02:25:55.317818-08:00 sepia65 kernel: [ 3122.674708] [<ffffffffa025fbb5>] ceph_llseek+0x45/0x160 [ceph]
2011-11-11T02:25:55.317826-08:00 sepia65 kernel: [ 3122.674713] [<ffffffff81174a33>] vfs_llseek+0x33/0x40
2011-11-11T02:25:55.317832-08:00 sepia65 kernel: [ 3122.674716] [<ffffffff81174ba6>] sys_lseek+0x76/0x80
2011-11-11T02:25:55.317837-08:00 sepia65 kernel: [ 3122.674720] [<ffffffff8160bb42>] system_call_fastpath+0x16/0x1b
2011-11-11T02:25:55.317842-08:00 sepia65 kernel: [ 3122.674722] INFO: lockdep is turned off.

Actions #1

Updated by Sage Weil over 12 years ago

  • Project changed from Linux kernel client to Ceph
Actions #2

Updated by Sage Weil over 12 years ago

  • Target version set to v0.39
Actions #3

Updated by Sage Weil over 12 years ago

  • Priority changed from Normal to High
Actions #4

Updated by Sage Weil over 12 years ago

  • Translation missing: en.field_position set to 9
Actions #5

Updated by Sage Weil over 12 years ago

also didn't have the umount bug fix.

i think the osd timeouts are just sluggish server, not actual errors per se.. since both of these tests did complete (and then crash on umount).

we may want to refine the post-checks to look for BUG or WARNING in kern.log, though. hmm!

Actions #6

Updated by Sage Weil over 12 years ago

  • Status changed from New to Rejected
Actions

Also available in: Atom PDF