Bug #1723
closedtimeouts during ffsb
0%
Description
During the 11/11 nightlies, in suite 1827, sepia65 experienced multiple timeout events.
The first (from 1827/remote/ubuntu@sepia65.ceph.dreamhost.com/syslog/kern.log.gz) is:
libceph: loaded (mon/osd proto 15/24, osdmap 5/6 5/6)
2011-11-11T02:05:35.283053-08:00 sepia65 kernel: [ 1902.633565] ceph: loaded (mds proto 32)
2011-11-11T02:05:37.963057-08:00 sepia65 kernel: [ 1905.317225] Intel AES-NI instructions are not detected.
2011-11-11T02:05:40.473054-08:00 sepia65 kernel: [ 1907.821603] libceph: client4100 fsid cdcbfe2a-7c1b-df86-0ede-33cf6a15b32e
2011-11-11T02:05:40.473076-08:00 sepia65 kernel: [ 1907.821667] libceph: mon2 10.3.14.192:6791 session established
2011-11-11T02:24:18.156345-08:00 sepia65 kernel: [ 3025.480098] libceph: tid 3788 timed out on osd1, will reset osd
2011-11-11T02:24:18.156368-08:00 sepia65 kernel: [ 3025.485062] libceph: tid 3925 timed out on osd0, will reset osd
2011-11-11T02:25:18.363050-08:00 sepia65 kernel: [ 3085.710029] libceph: tid 7285 timed out on osd0, will reset osd
2011-11-11T02:25:23.383051-08:00 sepia65 kernel: [ 3090.730022] libceph: tid 8558 timed out on osd1, will reset osd
2011-11-11T02:25:55.317449-08:00 sepia65 kernel: [ 3122.660050] INFO: task ffsb:5535 blocked for more than 120 seconds.
2011-11-11T02:25:55.317471-08:00 sepia65 kernel: [ 3122.666394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2011-11-11T02:25:55.317477-08:00 sepia65 kernel: [ 3122.674361] ffsb D 00000000ffffffff 0 5535 3840 0x00000000
2011-11-11T02:25:55.317708-08:00 sepia65 kernel: [ 3122.674429] ffff8800ed1d5de8 0000000000000046 0000000000000000 ffffffffa025fbb5
2011-11-11T02:25:55.317713-08:00 sepia65 kernel: [ 3122.674434] ffff880111f40000 0000000000013b80 ffff8800ed1d5fd8 ffff8800ed1d4010
2011-11-11T02:25:55.317719-08:00 sepia65 kernel: [ 3122.674438] ffff8800ed1d5fd8 0000000000013b80 ffff8800a230bf60 ffff880111f40000
2011-11-11T02:25:55.317722-08:00 sepia65 kernel: [ 3122.674443] Call Trace:
2011-11-11T02:25:55.317732-08:00 sepia65 kernel: [ 3122.674465] [<ffffffffa025fbb5>] ? ceph_llseek+0x45/0x160 [ceph]
2011-11-11T02:25:55.317738-08:00 sepia65 kernel: [ 3122.674470] [<ffffffff81600e0f>] schedule+0x3f/0x60
2011-11-11T02:25:55.317743-08:00 sepia65 kernel: [ 3122.674474] [<ffffffff81601b77>] __mutex_lock_common+0x187/0x3d0
2011-11-11T02:25:55.317748-08:00 sepia65 kernel: [ 3122.674483] [<ffffffffa025fbb5>] ? ceph_llseek+0x45/0x160 [ceph]
2011-11-11T02:25:55.317753-08:00 sepia65 kernel: [ 3122.674488] [<ffffffff8113d30e>] ? might_fault+0x8e/0xa0
2011-11-11T02:25:55.317763-08:00 sepia65 kernel: [ 3122.674491] [<ffffffff8113d2c5>] ? might_fault+0x45/0xa0
2011-11-11T02:25:55.317768-08:00 sepia65 kernel: [ 3122.674625] [<ffffffffa025fbb5>] ? ceph_llseek+0x45/0x160 [ceph]
2011-11-11T02:25:55.317813-08:00 sepia65 kernel: [ 3122.674699] [<ffffffff81601ee7>] mutex_lock_nested+0x37/0x50
2011-11-11T02:25:55.317818-08:00 sepia65 kernel: [ 3122.674708] [<ffffffffa025fbb5>] ceph_llseek+0x45/0x160 [ceph]
2011-11-11T02:25:55.317826-08:00 sepia65 kernel: [ 3122.674713] [<ffffffff81174a33>] vfs_llseek+0x33/0x40
2011-11-11T02:25:55.317832-08:00 sepia65 kernel: [ 3122.674716] [<ffffffff81174ba6>] sys_lseek+0x76/0x80
2011-11-11T02:25:55.317837-08:00 sepia65 kernel: [ 3122.674720] [<ffffffff8160bb42>] system_call_fastpath+0x16/0x1b
2011-11-11T02:25:55.317842-08:00 sepia65 kernel: [ 3122.674722] INFO: lockdep is turned off.
Updated by Sage Weil over 12 years ago
- Project changed from Linux kernel client to Ceph
Updated by Sage Weil over 12 years ago
- Translation missing: en.field_position set to 9
Updated by Sage Weil over 12 years ago
also didn't have the umount bug fix.
i think the osd timeouts are just sluggish server, not actual errors per se.. since both of these tests did complete (and then crash on umount).
we may want to refine the post-checks to look for BUG or WARNING in kern.log, though. hmm!