Activity

From 08/17/2015 to 09/15/2015

09/15/2015

04:08 PM Bug #12691: ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
Is the dup op log that small? That's <200 tids we are talking about.
Josh, can you attach those logs? It'd be inte...
Ilya Dryomov
04:00 PM Bug #12763: rbd: unmap failed: (16) Device or resource busy
The location of osd dirs shouldn't matter here. Typically, you get -EBUSY on rbd unmap if your block device is still... Ilya Dryomov
03:47 PM Bug #12867 (Resolved): support CEPH_FEATURE_MSGR_KEEPALIVE2
With "libceph: don't access invalid memory in keepalive2 path" and "libceph: advertise support for keepalive2" queued... Ilya Dryomov
09:47 AM Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
The issue in this ticket was a problem triggered by a change to the core kernel and is fixed. The follow up comments... Ilya Dryomov
09:25 AM Bug #8818: IO Hang on raw rbd device - Workqueue: ceph-msgr con_work [libceph]
Hi!
Is there any news here?
We are still having exactly the same issues with our cluster. And our *ONLY* usecase...
Christian Eichelmann
07:10 AM Bug #13081: data on rbd image get corrupted when pool quota is smaller than the size of the rbd i...
This is due to http://tracker.ceph.com/issues/12018 - fixed on the osd and userspace client. Moving this to the kerne... Josh Durgin

09/14/2015

02:19 PM Bug #13081 (Resolved): data on rbd image get corrupted when pool quota is smaller than the size o...
1. linux kernel is 4.1.1... runsisi hust

09/10/2015

05:32 PM Bug #12691: ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
Josh,
can you let me know what values I can try for osd_min_pg_log_entries and osd_max_pg_log_entries.
Vasu Kulkarni
02:10 AM Bug #12691: ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
Increasing osd_min_pg_log_entries and osd_max_pg_log_entries should avoid the problem. We should probably add that to... Josh Durgin
02:04 AM Bug #12691: ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
Looking at the osd logs, it seems like in one case at least this happened due to dup op detection having limited hist... Josh Durgin
01:50 AM Bug #12691: ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
Does this problem make sense http://tracker.ceph.com/issues/12691 Haomai Wang

09/09/2015

05:49 PM Bug #12691: ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
Following is the config file, you might have to modify it based on the nodes and os-type you reserve, this config has... Vasu Kulkarni
12:55 AM Bug #12763: rbd: unmap failed: (16) Device or resource busy
The image im1 is on the osd. And the osd is not installed on the disk.
It is installed on /var/cache/local.
[root@...
ceph zte

09/08/2015

10:49 PM Bug #12691: ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
The logs I attached were to show the backtrace, but here is the test that I am running from ceph-qa-suites on rh 7.1 ,... Vasu Kulkarni
05:27 PM Bug #12691: ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
A thrash means lingering requests code is in play, which is a large part of what I'm rewriting for 4.4. Still, it wo... Ilya Dryomov
05:24 PM Bug #12691 (Need More Info): ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 ==...
Ilya Dryomov
03:53 PM Bug #12763 (Need More Info): rbd: unmap failed: (16) Device or resource busy
Sorry for the late reply - this was lingering in the Ceph project.
Are you sure there wasn't a filesystem mounted on t...
Ilya Dryomov

09/02/2015

03:01 AM Bug #12867: support CEPH_FEATURE_MSGR_KEEPALIVE2
https://github.com/ceph/ceph-client/commit/492da3d689d6009a2192898fbd42bdfb547012bb Zheng Yan

09/01/2015

03:55 PM Feature #12902 (Resolved): krbd: support object-map and fast-diff
Josh Durgin
02:18 AM Bug #11898 (Resolved): Writing on forced umount causes silent data loss
Zheng Yan
01:54 AM Bug #12695: rbd flatten makes clone's snap crash
Ilya Dryomov wrote:
> science luo wrote:
> > Ilya Dryomov wrote:
> > > This was fixed in kernel 3.17, commit 4d9b6...
science luo

08/31/2015

05:59 PM Bug #12811 (Resolved): ENXIO from kernel client (cephfs)
fix deployed to teuthology, so far so good! Reviewed-by: Sage Weil
02:37 PM Feature #12874 (New): krbd: get rid of RBD_MAX_SNAP_COUNT
There is no reason for that limit to exist, need to work out a better client - msgr interface and not insist on preal... Ilya Dryomov
02:31 PM Bug #12815 (Closed): Stop the rbd map command due to the linux kernal crush.
There are two problems here. The first one is... Ilya Dryomov
11:27 AM Bug #12867 (Resolved): support CEPH_FEATURE_MSGR_KEEPALIVE2
Without this feature, the kernel client can't detect that a connection was lost quietly (without receiving a TCP RST). ... Zheng Yan
07:43 AM Bug #12695: rbd flatten makes clone's snap crash
science luo wrote:
> Ilya Dryomov wrote:
> > This was fixed in kernel 3.17, commit 4d9b67cddd9b ("rbd: take snap_id...
Ilya Dryomov
03:48 AM Bug #12695: rbd flatten makes clone's snap crash
Ilya Dryomov wrote:
> This was fixed in kernel 3.17, commit 4d9b67cddd9b ("rbd: take snap_id into account when readi...
science luo

08/30/2015

09:23 PM Bug #12695 (Closed): rbd flatten makes clone's snap crash
This was fixed in kernel 3.17, commit 4d9b67cddd9b ("rbd: take snap_id into account when reading in parent info"). A... Ilya Dryomov

08/28/2015

10:03 AM Bug #12811 (Fix Under Review): ENXIO from kernel client (cephfs)
https://github.com/ceph/ceph-client/commit/4a18ede97ba5029587b362446360cfd42379f9d0... Zheng Yan
08:57 AM Bug #12811: ENXIO from kernel client (cephfs)
I compared the osdmaps of good and bad mounts and found that the 'exist' flag is missing in the bad mount's osdmap.... Zheng Yan
03:05 AM Bug #12815 (Closed): Stop the rbd map command due to the linux kernal crush.
My ceph version is 0.87
[root@song ~]# uname -a
Linux song 3.10.0-123.el7.x86_64 #1 SMP Thu Jun 4 17:17:49 CST 20...
jack ma

08/27/2015

06:56 PM Bug #12811: ENXIO from kernel client (cephfs)
during this time the apama osds were flapping and we were rebalancing lots of data... either of those may be the trig... Sage Weil
06:47 PM Bug #12811 (Resolved): ENXIO from kernel client (cephfs)
this is on the lab cluster. It has happened twice in 2 days.
In each case, the requests are sent to the second ac...
Sage Weil

08/25/2015

03:39 PM Bug #12691: ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
Can you post the yaml config for this test? We haven't seen it in the usual nightlies, and it'd be good to add it the... Josh Durgin

08/24/2015

06:15 AM Bug #12697: [RBD] rbd_dev_id_put causes system crash
Ilya Dryomov wrote:
> Any chance you could share the relevant Go pieces, pre and post adjustment? I suspect your co...
Zhi Zhang
05:54 AM Bug #12763 (Closed): rbd: unmap failed: (16) Device or resource busy
My ceph version is 0.87.
The linux system info
[root@client27 ~]# uname -a
Linux client27 3.10.0-123.el7.x86_64 ...
ceph zte

08/21/2015

10:29 AM Bug #12697: [RBD] rbd_dev_id_put causes system crash
Any chance you could share the relevant Go pieces, pre and post adjustment? I suspect your code isn't doing any udev... Ilya Dryomov
10:23 AM Bug #12697: [RBD] rbd_dev_id_put causes system crash
Hi Ilya,
Looks like this is caused by rbd unmap, but it is due to our codes to map rbd firstly, then unmap rbd via...
Zhi Zhang

08/17/2015

11:18 AM Bug #12697: [RBD] rbd_dev_id_put causes system crash
Ilya Dryomov wrote:
> I think the problem is that the reference dropped by put_device() caused by delayed fput was t...
Zhi Zhang
10:44 AM Bug #12697: [RBD] rbd_dev_id_put causes system crash
And the rest suggests a use-after-free on struct rbd_device. Ilya Dryomov
10:10 AM Bug #12697 (Need More Info): [RBD] rbd_dev_id_put causes system crash
Ilya Dryomov
10:10 AM Bug #12697: [RBD] rbd_dev_id_put causes system crash
I think the problem is that the reference dropped by put_device() caused by delayed fput was the last one. Unless I'... Ilya Dryomov
 
