Bug #10905

kdb on ceph_tcp_sendmsg

Added by Greg Farnum over 4 years ago. Updated about 4 years ago.

Target version:
Start date:
Due date:
% Done:


Severity: 3 - minor
Affected Versions:


0xffff88022316c360    19267        2  1    0   R  0xffff88022316c948 *kworker/0:0
 ffff8800bb52ba48 0000000000000018 ffffffff8161b84e ffffffff8161fc94
 00000001ca6905e8 0000000000000000 0000001500000246 0000000000000000
 0000000180220019 0000000000000000 ffff8800bb52bbb8 ffff8800bb52bad8
Call Trace:
 [<ffffffff8161b84e>] ? do_sock_sendmsg+0xbe/0xf0
 [<ffffffff8161fc94>] ? release_sock+0x34/0x1d0
 [<ffffffff816187f3>] ? sock_destroy_inode+0x33/0x40
 [<ffffffff8161b890>] ? sock_sendmsg+0x10/0x20
 [<ffffffff8161b901>] ? kernel_sendmsg+0x61/0x80
 [<ffffffffa071127b>] ? ceph_tcp_sendmsg+0x4b/0x60 [libceph]
 [<ffffffffa0713de1>] ? con_work+0xc41/0x2d00 [libceph]
 [<ffffffff810985e3>] ? pick_next_task_fair+0xf3/0x560
 [<ffffffff81081aab>] ? finish_task_switch+0x4b/0x130
 [<ffffffff81081aab>] ? finish_task_switch+0x4b/0x130
 [<ffffffff81075442>] ? process_one_work+0x142/0x530
 [<ffffffff81075442>] ? process_one_work+0x142/0x530
 [<ffffffff810754b9>] ? process_one_work+0x1b9/0x530
 [<ffffffff81075442>] ? process_one_work+0x142/0x530
 [<ffffffff81075ccf>] ? worker_thread+0x11f/0x480
 [<ffffffff81075bb0>] ? rescuer_thread+0x340/0x340
 [<ffffffff8107bc8f>] ? kthread+0xef/0x110
 [<ffffffff8107bba0>] ? flush_kthread_worker+0xf0/0xf0
 [<ffffffff8174beac>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff8107bba0>] ? flush_kthread_worker+0xf0/0xf0

The teuthology run hung when this machine disappeared, and when I went in via ipmi it was in kdb with that backtrace.


#1 Updated by Benoît Canet about 4 years ago

I will work on it.

14:38 < benoit> dis: so maybe it would be better that I try to exercise myself on another bug if you will throw away this code
14:39 -!- kefu [~] has quit [Max SendQ exceeded]
14:39 < benoit> dis: do you have one ?
14:40 -!- kefu [~] has joined #ceph-devel
14:40 < dis> well, not entirely, but yeah - i'm working on it
14:41 < dis> benoit: a kernel client one? what exactly are you looking for?
14:41 < benoit> dis: something in the kernel part yes. And I am stronger in block device things than fs stuff
14:41 < rzarzynski> loicd: the mini-cluster seems to be more than enough for Tempest. I guess we don't need a sophisticated nor powerful test env. to verify API implementation
14:41 -!- kefu [~] has quit []
14:42 -!- smithfarm [~] has quit [Quit: Leaving.]
14:43 -!- wenjunhuang [~] has quit [Ping timeout: 480 seconds]
14:43 < loicd> rzarzynski: there is no restriction on what you can do in teuthology, but it needs a teuthology cluster to run. the scripts run via make check should be short running (no more than a few minutes) because they run every time a change is made in the code, regardless.
14:46 -!- jashank42 [~] has quit [Ping timeout: 480 seconds]
14:46 < loicd> rzarzynski: that's all I have. I suspect the tempest test are not for make check to run, rather teuthology
14:46 < loicd> but
14:46 < dis> benoit: is the one we haven't triaged
14:46 < loicd> having a script that could be run via make check allows you to run it with make TESTS=tempest-test check without running teuthology, if you're patient enough
14:46 < rzarzynski> loicd: full run of Tempest object storage campaign took 40s :-)
14:47 < loicd> rzarzynski: oh, then it's a good candidate for make check, very good news
14:47 -!- kefu [~] has joined #ceph-devel
14:47 < dis> benoit: it's from cephfs run but not in the fs bowels
14:47 < benoit> dis: ok
