kdb on ceph_tcp_sendmsg
0xffff88022316c360 19267 2 1 0 R 0xffff88022316c948 *kworker/0:0
 ffff8800bb52ba48 0000000000000018 ffffffff8161b84e ffffffff8161fc94
 00000001ca6905e8 0000000000000000 0000001500000246 0000000000000000
 0000000180220019 0000000000000000 ffff8800bb52bbb8 ffff8800bb52bad8
Call Trace:
 [<ffffffff8161b84e>] ? do_sock_sendmsg+0xbe/0xf0
 [<ffffffff8161fc94>] ? release_sock+0x34/0x1d0
 [<ffffffff816187f3>] ? sock_destroy_inode+0x33/0x40
 [<ffffffff8161b890>] ? sock_sendmsg+0x10/0x20
 [<ffffffff8161b901>] ? kernel_sendmsg+0x61/0x80
 [<ffffffffa071127b>] ? ceph_tcp_sendmsg+0x4b/0x60 [libceph]
 [<ffffffffa0713de1>] ? con_work+0xc41/0x2d00 [libceph]
 [<ffffffff810985e3>] ? pick_next_task_fair+0xf3/0x560
 [<ffffffff81081aab>] ? finish_task_switch+0x4b/0x130
 [<ffffffff81081aab>] ? finish_task_switch+0x4b/0x130
 [<ffffffff81075442>] ? process_one_work+0x142/0x530
 [<ffffffff81075442>] ? process_one_work+0x142/0x530
 [<ffffffff810754b9>] ? process_one_work+0x1b9/0x530
 [<ffffffff81075442>] ? process_one_work+0x142/0x530
 [<ffffffff81075ccf>] ? worker_thread+0x11f/0x480
 [<ffffffff81075bb0>] ? rescuer_thread+0x340/0x340
 [<ffffffff8107bc8f>] ? kthread+0xef/0x110
 [<ffffffff8107bba0>] ? flush_kthread_worker+0xf0/0xf0
 [<ffffffff8174beac>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff8107bba0>] ? flush_kthread_worker+0xf0/0xf0
The teuthology run hung when this machine disappeared, and when I went in via ipmi it was in kdb with that backtrace.
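For context, the interesting frame is libceph's ceph_tcp_sendmsg(), a thin wrapper around kernel_sendmsg(). Roughly what it looks like in net/ceph/messenger.c for a kernel of this vintage (a sketch from memory, not necessarily the exact code this machine was running):

#include <linux/net.h>
#include <linux/socket.h>

/* Approximate sketch of libceph's ceph_tcp_sendmsg(), shown only to
 * explain the ceph_tcp_sendmsg -> kernel_sendmsg -> do_sock_sendmsg
 * path visible in the trace above. */
static int ceph_tcp_sendmsg(struct socket *sock, struct kvec *iov,
                            size_t kvlen, size_t len, int more)
{
        struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL };
        int r;

        if (more)
                msg.msg_flags |= MSG_MORE;      /* caller has more data queued */
        else
                msg.msg_flags |= MSG_EOR;

        /* kernel_sendmsg() ends up in the socket's sendmsg path, i.e. the
         * sock_sendmsg/do_sock_sendmsg frames in the backtrace. */
        r = kernel_sendmsg(sock, &msg, iov, kvlen, len);
        if (r == -EAGAIN)
                r = 0;
        return r;
}

The con_work() frame above it is the libceph messenger workqueue callback, which matches the kworker thread the trace came from.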
#1 Updated by Benoît Canet about 4 years ago
I will work on it.
14:38 < benoit> dis: so maybe it would be better that I try to exercise myself on another bug if you will throw away this code
14:39 < benoit> dis: do you have one ?
14:40 < dis> well, not entirely, but yeah - i'm working on it
14:41 < dis> benoit: a kernel client one? what exactly are you looking for?
14:41 < benoit> dis: something in the kernel part yes. And I am stronger in block device things than fs stuff
14:41 < rzarzynski> loicd: the mini-cluster seems to be more than enough for Tempest. I guess we don't need a sophisticated or powerful test env. to verify API implementation
14:43 < loicd> rzarzynski: there is no restriction on what you can do in teuthology, but it needs a teuthology cluster to run. the scripts run via make check should be short running (no more than a few minutes) because they run every time a change is made in the code, regardless.
14:46 < loicd> rzarzynski: that's all I have. I suspect the tempest tests are not for make check to run, rather teuthology
14:46 < loicd> but
14:46 < dis> benoit: http://tracker.ceph.com/issues/10905 is the one we haven't triaged
14:46 < loicd> having a script that could be run via make check allows you to run it with make TESTS=tempest-test check without running teuthology, if you're patient enough
14:46 < rzarzynski> loicd: full run of Tempest object storage campaign took 40s :-)
14:47 < loicd> rzarzynski: oh, then it's a good candidate for make check, very good news
14:47 < dis> benoit: it's from cephfs run but not in the fs bowels
14:47 < benoit> dis: ok