Activity

From 03/09/2017 to 04/07/2017

04/07/2017

01:38 PM Bug #18690: kclient: FAILED assert(0 == "old msgs despite reconnect_seq feature")
I suspect simple msgr also has this problem; maybe we can try this. Haomai Wang

04/06/2017

06:39 PM Feature #17204: Implement new-style ENOSPC handling in kclient
Ilya seems to be OK with v7 of the set, and I merged that into the kernel testing branch yesterday. How do we turn on... Jeff Layton

04/04/2017

12:50 PM Bug #19127: NULL pointer dereference in ceph_readdir
Jeff Layton wrote:
> Yes, we also don't hold a reference to CEPH_CAP_FILE_SHARED in this code. I suppose that's the ...
Zheng Yan
12:31 PM Bug #19127: NULL pointer dereference in ceph_readdir
Jeff Layton wrote:
> So, my thinking was to call ceph_try_get_caps in ceph_readdir to grab CEPH_CAP_FILE_RD and CEPH...
Zheng Yan
10:20 AM Bug #18690: kclient: FAILED assert(0 == "old msgs despite reconnect_seq feature")
I read a little of kernel/net/ceph/messenger.cc; it looks like it doesn't actually respect reconnect seq. Maybe we could disabl... Haomai Wang

04/03/2017

02:28 PM Bug #19127: NULL pointer dereference in ceph_readdir
So, my thinking was to call ceph_try_get_caps in ceph_readdir to grab CEPH_CAP_FILE_RD and CEPH_CAP_FILE_SHARED. ceph... Jeff Layton
01:51 PM Bug #19127: NULL pointer dereference in ceph_readdir
Zheng Yan wrote:
> Jeff Layton wrote:
> > Zheng Yan wrote:
> >
> > > no vmcore
> >
> > Bummer. Do you have th...
Jeff Layton
01:45 PM Bug #19127: NULL pointer dereference in ceph_readdir
Jeff Layton wrote:
> Zheng Yan wrote:
>
> > no vmcore
>
> Bummer. Do you have the text in the log from before ...
Zheng Yan
01:12 PM Bug #19127: NULL pointer dereference in ceph_readdir

Zheng Yan wrote:
> no vmcore
Bummer. Do you have the text in the log from before the Oops line? It'd be goo...
Jeff Layton

04/01/2017

03:24 PM Bug #11555 (Resolved): lock inversion related to memory reclaim
We were allocating (with GFP_KERNEL) and destroying the cipher context on each encrypt/decrypt operation:
https://...
Ilya Dryomov
03:04 PM Bug #18543 (Closed): rbd map lun02 -p hdd2 rbd: sysfs write failed rbd: map failed: (5) Input/ou...
Ilya Dryomov
10:33 AM Bug #18130 (In Progress): soft lockups in ceph.ko
No, not quite resolved yet. I have a couple of patches for this in the testing branch, but they're marked DNM for now... Jeff Layton
02:44 AM Bug #18130 (Resolved): soft lockups in ceph.ko
The relevant code has been removed (VFS helpers are used now). Zheng Yan
02:50 AM Bug #15432: kcephfs: umount -f can fail after mds reconnect failure
Based on Jeff's ENOSPC work, it should be easy to implement a function that aborts pending OSD requests for 'umount -f'. Zheng Yan
02:18 AM Bug #19127: NULL pointer dereference in ceph_readdir
Jeff Layton wrote:
> Zheng Yan wrote:
> >
> > we call ceph_dir_clear_ordered() before splice_dentry(). But only f...
Zheng Yan
01:11 AM Bug #19127: NULL pointer dereference in ceph_readdir
Jeff Layton wrote:
> Another question too... why are we using CEPH_CAP_FILE_SHARED in this code? Shouldn't we require ...
Zheng Yan

03/31/2017

07:17 PM Bug #19419: XFS filesystem on RBD image was corrupt after remount
Hi Ilya,
There were Samba client tests occurring on the platform that were timing out with oplocks. The RBD was be...
Bryan Apperson
02:52 PM Bug #19419 (Need More Info): XFS filesystem on RBD image was corrupt after remount
There is no way to tell at this point. It's a possibility, although if that were the case it would have manifested s... Ilya Dryomov
03:29 PM Feature #17524 (In Progress): krbd: support disabling auto-exclusive lock transition logic
Ilya Dryomov
12:02 PM Bug #19127: NULL pointer dereference in ceph_readdir
Zheng Yan wrote:
>
> we call ceph_dir_clear_ordered() before splice_dentry(). But only for dentry's current parent...
Jeff Layton
07:41 AM Bug #19127: NULL pointer dereference in ceph_readdir
Jeff Layton wrote:
> We clear the parent's complete bit whenever we call splice_dentry, AFAICT. The only exception i...
Zheng Yan

03/30/2017

07:39 PM Bug #19127: NULL pointer dereference in ceph_readdir
We clear the parent's complete bit whenever we call splice_dentry, AFAICT. The only exception is in the readdir code ... Jeff Layton
06:20 PM Bug #19419: XFS filesystem on RBD image was corrupt after remount
The RBD client system was gracefully rebooted at Mar 17 10:22:36. Bryan Apperson
06:13 PM Bug #19419: XFS filesystem on RBD image was corrupt after remount
Bryan Apperson wrote:
> Hi Ilya,
>
> The log starts before the reboot - the corruption appeared before the reboot...
Bryan Apperson
06:13 PM Bug #19419: XFS filesystem on RBD image was corrupt after remount
Hi Ilya,
The log starts before the reboot - the corruption appeared before the reboot in the logs. The server had ...
Bryan Apperson
04:19 PM Bug #19419: XFS filesystem on RBD image was corrupt after remount
BTW, not only is firefly EOL, but that RHEL 7.0 kernel client is also very old. I'd recommend upgrading to the 7.3 ke... Ilya Dryomov
04:09 PM Bug #19419: XFS filesystem on RBD image was corrupt after remount
Hi Bryan,
> Then on March 16th the image became unresponsive. After attempting a reboot of the server and a remoun...
Ilya Dryomov
06:09 PM Feature #17204: Implement new-style ENOSPC handling in kclient
Up to v6 of the set now, just re-posted it today. The main difference from v5 is some changes to address Ilya's comme... Jeff Layton
04:01 PM Bug #19385 (Need More Info): rbd stuck on read/write op to ceph_osd
Let me know if you can reproduce this on 4.9.z. Ilya Dryomov
10:07 AM Bug #19122 (Resolved): pre-jewel "osd rm" incrementals are misinterpreted (kernel client)
In 4.4.58, 4.9.19, 4.10.7. Ilya Dryomov

03/29/2017

05:32 PM Bug #19419 (Need More Info): XFS filesystem on RBD image was corrupt after remount
An RBD image was being used in production to back a Samba and NFS export. The 100TB image was formatted with XFS and ... Bryan Apperson
01:07 PM Bug #18690: kclient: FAILED assert(0 == "old msgs despite reconnect_seq feature")
Still seeing this here:
http://pulpito.ceph.com/jspray-2017-03-29_01:19:13-multimds-wip-jcsp-testing-20170328-test...
John Spray

03/28/2017

05:03 PM Bug #19385: rbd stuck on read/write op to ceph_osd
Ilya, thanks.
I'll try it.
Sergey Jerusalimov
02:33 PM Bug #19385: rbd stuck on read/write op to ceph_osd
There are known issues with resending code in older kernels and manually restarting OSDs is indeed one of the workaro... Ilya Dryomov
11:18 AM Bug #19385: rbd stuck on read/write op to ceph_osd
Ilya, Hi, it's 3.18.43-40 Sergey Jerusalimov
01:41 PM Bug #19127: NULL pointer dereference in ceph_readdir
I suspect the first check does not properly handle the d_splice_alias() case. Zheng Yan
01:39 PM Bug #19127: NULL pointer dereference in ceph_readdir
The idea is borrowed from ncpfs. When receiving a reply from the server, the readdir code fills dentry pointers in page cac... Zheng Yan

03/27/2017

01:12 PM Bug #19127: NULL pointer dereference in ceph_readdir
Ouch...the ceph readdir code seems to keep pointers to dentries in the ceph_readdir_cache_control, but doesn't hold r... Jeff Layton
01:10 PM Bug #19385: rbd stuck on read/write op to ceph_osd
Which kernel is this on? Ilya Dryomov
09:24 AM Bug #19309 (Pending Backport): IO Hang on raw rbd device using libceph kernel module
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=633ee407b9d15a75ac9740ba9d3338815e1fcb9... Ilya Dryomov

03/24/2017

10:08 PM Bug #19385 (Closed): rbd stuck on read/write op to ceph_osd
Sometimes, after a massive OSD restart (server with shelf), an rbd image (mapped and mounted as xfs) gets stuck on read or write ope... Sergey Jerusalimov

03/22/2017

11:13 AM Bug #19309 (Fix Under Review): IO Hang on raw rbd device using libceph kernel module
[PATCH] libceph: force GFP_NOIO for socket allocations Ilya Dryomov
02:12 AM Bug #19309 (In Progress): IO Hang on raw rbd device using libceph kernel module
Ilya Dryomov

03/20/2017

05:13 PM Bug #19309: IO Hang on raw rbd device using libceph kernel module
Sorry for the "raw" mistake in the subject. Sergey Jerusalimov
07:35 AM Bug #19309: IO Hang on raw rbd device using libceph kernel module
Ilya, hi.
I have three rbd devices mapped on this host. They are formatted as xfs and mounted.
It's stuck on "rm ...
Sergey Jerusalimov

03/19/2017

11:16 PM Bug #19309: IO Hang on raw rbd device using libceph kernel module
Hi Sergey,
The title says "raw rbd device", but the traces suggest that there is an xfs filesystem on top of the d...
Ilya Dryomov
10:00 PM Bug #19309: IO Hang on raw rbd device using libceph kernel module
What's interesting is:
Mar 19 22:18:00 kikimr0056 kernel: [288420.742789] Workqueue: events handle_osds_timeout [libceph]...
Sergey Jerusalimov
09:55 PM Bug #19309 (Resolved): IO Hang on raw rbd device using libceph kernel module
Ilya, hi!
We have a periodic problem with the ceph rbd kernel module.
IO gets stuck, and our working process is put in D stat...
Sergey Jerusalimov

03/14/2017

10:46 AM Bug #19275 (Resolved): stable-writes flag gets reset underneath rbd since 4.4
... Ilya Dryomov
09:25 AM Bug #19095 (Resolved): handle image feature mismatches
Ilya Dryomov
08:07 AM Bug #19272 (New): kcephfs: got forward for unsafe request
... Zheng Yan

03/13/2017

03:42 AM Bug #19189 (Resolved): cephfs kernel 4.9.13 file read hangs
The fix is merged into the stable-4.9 tree. Zheng Yan

03/12/2017

09:44 AM Bug #19122 (Pending Backport): pre-jewel "osd rm" incrementals are misinterpreted (kernel client)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b581a5854eee4b7851dedb0f8c2ceb54fb902c0... Ilya Dryomov

03/10/2017

10:15 PM Bug #19095 (Fix Under Review): handle image feature mismatches
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8767b293a4ab6632f9288f34bcf2ab9ba20dca3a
...
Ilya Dryomov

03/09/2017

03:08 PM Bug #18690: kclient: FAILED assert(0 == "old msgs despite reconnect_seq feature")
BTW, it looks like this crash only happens with multimds? Kernel client + multimds is a clue. Haomai Wang
03:08 PM Bug #18690: kclient: FAILED assert(0 == "old msgs despite reconnect_seq feature")
Ah, I checked Patrick Donnelly's log before. But I really have no experience with the ceph kernel code; I don't have any... Haomai Wang
02:56 PM Bug #18690: kclient: FAILED assert(0 == "old msgs despite reconnect_seq feature")
Lots more of these on latest run:
http://pulpito.ceph.com/jspray-2017-03-08_14:08:01-multimds-master-testing-basic-s...
John Spray