Activity

From 04/01/2018 to 04/30/2018

04/30/2018

07:52 AM Bug #23537 (Pending Backport): libceph: monX xxxxxx session lost, hunting for new mon
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b4c443d139f1d2b5570da475f7a9cbcef86740... Ilya Dryomov
07:52 AM Bug #23706 (Pending Backport): NULL sock gets passed to ceph_tcp_sendmsg()
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c55ad1c214d9f8c4594ac2c3fa392c1c32431a... Ilya Dryomov

04/27/2018

05:18 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Ilya Dryomov
12:20 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Ilya Dryomov wrote:
> No, this doesn't look related at first sight. Can you paste more so I can see which kernel, w...
geng jichao
10:01 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
No, this doesn't look related at first sight. Can you paste more so I can see which kernel, what happened before the... Ilya Dryomov
09:55 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
We encountered a similar problem. I don't know whether it has the same cause.
Task dump for CPU 14:
Call Tr...
geng jichao

04/26/2018

07:35 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Does it happen right after you map an rbd image or mount cephfs? If so, and all your monitors are up, you are hittin... Ilya Dryomov
03:29 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Yes.
2018-04-26T11:18:36.435400+08:00 node53 kernel: libceph: mon2 10.0.30.53:6789 session established
2018-04-26...
Yong Wang

04/25/2018

10:32 AM Bug #23706 (Fix Under Review): NULL sock gets passed to ceph_tcp_sendmsg()
Ilya Dryomov
09:28 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
async messenger was still experimental in jewel. If you are on jewel, you should be using simple messenger. Ilya Dryomov
09:16 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
No, I'll post the patch for the kernel panic later today. The monitor session issue is separate.
Do you see "sess...
Ilya Dryomov
07:03 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
libceph: mon2 10.244.73.5:6789 session lost, hunting for new mon

The above message is usually seen in the kernel log...
Yong Wang
06:58 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
http://tracker.ceph.com/issues/17664
https://github.com/ceph/ceph/pull/11601
Do you mean that the async server not...
Yong Wang
06:25 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
We hit it twice in total. Yong Wang
06:13 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
We use Jewel 10.2.10, and ms_type is async, yes.
vmcore-dmesg has been attached.
Yong Wang

04/24/2018

02:53 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
It happened at least 9 times (from Jan 8 to Mar 31). I only have the last 3 crash logs on the 3 servers.... Bertrand Gouny
12:58 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Yes, I wanted to make sure you were using async messenger.
I'm still looking into the crash. Did it happen just o...
Ilya Dryomov
10:09 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Thanks, I will follow #23537 :)
I must have some serious config error when I run ...
Bertrand Gouny
08:38 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Bertrand, Yong, what is the output of... Ilya Dryomov
12:55 PM Bug #23537: libceph: monX xxxxxx session lost, hunting for new mon
Ilya Dryomov

04/23/2018

05:02 PM Bug #23537: libceph: monX xxxxxx session lost, hunting for new mon
I think I found the issue. The fix should be in soon and will be backported to stable kernels. Ilya Dryomov
05:00 PM Bug #23537 (Fix Under Review): libceph: monX xxxxxx session lost, hunting for new mon
Ilya Dryomov

04/20/2018

12:40 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
OS: CentOS 7.3, kernel version 3.10.514
No "session lost" found near the panic timestamp.
libceph prints a lot of osds u...
Yong Wang
06:50 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
I'm working on a fix. What I'm wondering is why it has popped up now and not in the past.
Yong, which kernel is t...
Ilya Dryomov
03:05 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Will ceph_con_workfn continue to do the following if no flag is set on the connection?
write_partial_kvec
ceph_tcp_sen...
Yong Wang
03:02 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
We hit the same panic:
=================
2129 [11093.424272] Call Trace:
2130 [11093.424438] [<ffffffffa05...
Yong Wang

04/18/2018

03:38 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
@Ilya
I guess this is a bad thing but the monitors are run in containers, one on each machine.
From time to time we st...
Bertrand Gouny
03:07 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Yeah:
cancel_con calls cancel_delayed_work, but that can return while the work is still running. So, suppose a cal...
Jeff Layton
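Jeff's point about `cancel_delayed_work` can be shown in a small user-space model. This is a hedged Python sketch, not libceph code. `Conn`, `work_fn`, `cancel_nowait` and `run_race` are hypothetical stand-ins. A thread plays the queued work item, and events force the problematic interleaving. Like `cancel_delayed_work`, `cancel_nowait` only marks the work as cancelled and returns without waiting for an instance that is already running. As a result, the caller's teardown (clearing `sock`) races with the worker's send.

```python
import threading

class Conn:
    """Hypothetical stand-in for a libceph connection (not the kernel struct)."""
    def __init__(self):
        self.sock = object()        # non-None means "socket open"
        self.cancelled = False
        self.seen_sock = "unset"    # what the worker used when it "sent"

def work_fn(con, started, proceed):
    # The work item is already executing when cancellation is requested.
    started.set()
    proceed.wait()                  # simulate being preempted mid-work
    # No re-check of con.sock here: this models the unsafe send path.
    con.seen_sock = con.sock        # stand-in for ceph_tcp_sendmsg(con->sock, ...)

def cancel_nowait(con):
    # Like cancel_delayed_work(): flag pending work as cancelled, but do NOT
    # wait for an instance that is already running.
    con.cancelled = True

def run_race():
    con = Conn()
    started, proceed = threading.Event(), threading.Event()
    worker = threading.Thread(target=work_fn, args=(con, started, proceed))
    worker.start()
    started.wait()                  # worker is now in the middle of its run
    cancel_nowait(con)              # returns immediately
    con.sock = None                 # caller tears down the socket
    proceed.set()                   # let the still-running worker continue
    worker.join()
    return con.seen_sock

print(run_race())  # None: the worker "sent" on a NULL sock
```

In this model, waiting for the running instance before clearing `sock` closes the window. The kernel's `cancel_delayed_work_sync()` waits in that way.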
03:02 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
That is supposed to be protected by con->mutex. It is probably a bug in connection state handling code, I'll take a ... Ilya Dryomov
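The protection Ilya describes means the teardown path and the worker take the same mutex. The worker then validates the connection state under that mutex before touching the socket. Below is a hedged Python model of that pattern. `Conn`, `close_con`, `write_if_open` and the `"OPEN"`/`"CLOSED"` states are all hypothetical, and nothing here claims to be how libceph implements it.

```python
import threading

class Conn:
    """Hypothetical model of a connection guarded by a single mutex."""
    def __init__(self):
        self.mutex = threading.Lock()
        self.state = "OPEN"         # hypothetical state names
        self.sock = object()
        self.sent = 0

def close_con(con):
    # Teardown updates state and clears sock while holding the mutex,
    # so the worker never observes a half-closed connection.
    with con.mutex:
        con.state = "CLOSED"
        con.sock = None

def write_if_open(con):
    # Worker path: re-validate state under the mutex before using sock.
    with con.mutex:
        if con.state != "OPEN" or con.sock is None:
            return False            # connection was closed; skip the send
        con.sent += 1               # stand-in for the actual socket write
        return True
```

Usage: `write_if_open(con)` returns `True` on a fresh `Conn()`, and returns `False` after `close_con(con)`. The lock only helps if every path that clears `sock` goes through it and the worker checks state first. That is why a gap in state handling, as suggested above, can still produce a NULL sock.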
02:00 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Ilya Dryomov wrote:
> [...]where ceph_tcp_sendmsg() got called with a NULL sock, meaning that con->sock was NULL a...
Jeff Layton
01:16 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
The "session lost" message is logged on all (3) machines.
It seems to start when a monitor goes down and comes back up, or is replace...
Bertrand Gouny
12:40 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Do you see "session lost" on every machine you mount cephfs on or on just some of them? Ilya Dryomov
12:32 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
@Ilya Dryomov
The workload is very, very low; osd up/down seems to happen from time to time on reads and writes, but t...
Bertrand Gouny
12:19 PM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Ilya Dryomov wrote:
> That said, I managed to find what I hope is the correct kernel module and "ceph_msg_new+0x108e...
Jos Collin
11:03 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
It looks like CoreOS buildbot compiler is doing some really weird inlining. The backtrace doesn't make any sense: ce... Ilya Dryomov
09:27 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Can you describe what the workload is -- it looks like this is cephfs? Are those "osd up/down" and "session lost" me... Ilya Dryomov
08:48 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
Maybe not; I had these errors on 3 machines,
but I noticed them while checking the logs after an update. Will let you ...
Bertrand Gouny
08:00 AM Bug #23706: NULL sock gets passed to ceph_tcp_sendmsg()
... Ilya Dryomov
09:21 AM Bug #23272: switch port down ,cephfs kernel client lost session, blocked not recover ok util port...
... Zheng Yan
08:32 AM Bug #23097 (Closed): Stale directories and files in CentOS (release <= 7.3 or kernel version < 3....
No place to commit it, because recent RHEL kernels also include a backport of the d_invalidate change. Zheng Yan

04/13/2018

08:58 AM Bug #23706 (Resolved): NULL sock gets passed to ceph_tcp_sendmsg()
Hello,
we noticed some server crashes with this kind of log:...
Bertrand Gouny

04/12/2018

04:13 PM Feature #12902 (In Progress): krbd: support object-map and fast-diff
Ilya Dryomov
04:13 PM Feature #23073 (Resolved): Allow set CEPH_OSD_REQUEST_TIMEOUT_DEFAULT on rbd map
Ilya Dryomov
04:00 PM Feature #23688 (Resolved): get_features with readonly=true for parent images
Pass the optional read-only flag down to the 'get_features' rbd class method. For the parent image case, it would be... Ilya Dryomov

04/11/2018

08:47 AM Bug #22702 (Resolved): cephfs crashed under high memory pressure due to reserved caps number mism...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e30ee58121e34831b9665934d70dbc72ab0fe2f... Ilya Dryomov
08:40 AM Feature #4770 (Resolved): krbd: consider including write data with layered existence check
Done in 4.17. Ilya Dryomov
08:39 AM Feature #3837 (Resolved): krbd: support format 2 striping
Done in 4.17. Ilya Dryomov

04/10/2018

12:40 PM Bug #23112: rbd kernel client might hang when write to a quota-full pool
This isn't specific to the kernel client; I believe other ceph clients behave the same way.
Ilya Dryomov

04/09/2018

05:38 PM Bug #23537: libceph: monX xxxxxx session lost, hunting for new mon
v12.2.2 includes the fix for #17664.
Do these messages appear right after you mount or later? Do they go away if ...
Ilya Dryomov
05:29 PM Bug #23537: libceph: monX xxxxxx session lost, hunting for new mon
Марк Коренберг wrote:
> Important: on another machine with same OS everything is fine.
Another client machine whe...
Ilya Dryomov

04/03/2018

01:58 PM Bug #18130: soft lockups in ceph.ko
Reassigning to Ilya since he's working on this. Jeff Layton

04/01/2018

06:48 PM Bug #23537: libceph: monX xxxxxx session lost, hunting for new mon
Important: on another machine with same OS everything is fine. Марк Коренберг
06:46 PM Bug #23537 (Resolved): libceph: monX xxxxxx session lost, hunting for new mon
Maybe related to #17664.
I use Luminous 12.2.2 on both client and cluster. Kernel at cephfs client: Linux mmwor...
Марк Коренберг
 
