Activity

From 01/24/2017 to 02/22/2017

02/22/2017

11:44 PM Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
Well, sort of. last_epoch_clean is really about when we can forget OSDMaps. Should we retain OSDMaps on the mon (an... Samuel Just
11:33 PM Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
2017-02-20 20:45:59.104093 7f75c93f8700 10 osd.3 pg_epoch: 284 pg[1.16( v 278'379 (0'0,278'379] local-les=277 n=1 ec=... Samuel Just
12:09 AM Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
2017-02-20 20:46:28.567065 7ffa3242c700 10 osd.4 pg_epoch: 255 pg[1.16( v 254'369 (0'0,254'369] local-les=164 n=3 ec=... Samuel Just
12:05 AM Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
2017-02-20 20:46:40.165108 7f9e2ffc3700 10 osd.0 pg_epoch: 300 pg[1.16( DNE empty local-les=0 n=0 ec=0 les/c/f 0/0/0 ... Samuel Just
12:03 AM Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
2017-02-20 20:46:41.743173 7f9e277b2700 10 osd.0 pg_epoch: 301 pg[1.16( empty local-les=0 n=0 ec=141 les/c/f 164/164/... Samuel Just
07:46 AM Bug #18926: Why osds do not release memory?
Hello,
Version: L12.0.0, bluestore, 2x replication.
Memory size: 16 GB
Number of OSDs: 12

After trying to...
yongqiang guo

02/21/2017

11:39 PM Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
Notably, when it goes active at the end there, it's missing the 10 commits which happened during the [3,1] interval. Samuel Just
11:38 PM Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
At epoch 255, 1.16 is on [4,3] and is active+clean
2017-02-20 20:45:10.962790 7fd9b7cba700 10 osd.4 pg_epoch: 255 ...
Samuel Just
01:35 AM Bug #19023: ceph_test_rados invalid read caused apparently by lost intervals due to mons trimming...
I assume from your description that this was a dirty interval the monitor shouldn't have trimmed? Or did osd.4 perhap... Greg Farnum
01:27 AM Bug #19023 (Resolved): ceph_test_rados invalid read caused apparently by lost intervals due to mo...
samuelj@teuthology:/a/samuelj-2017-02-20_18:45:04-rados-wip-18937---basic-smithi/839771/remote
If you look back in...
Samuel Just

02/20/2017

11:32 AM Bug #18996: api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure
the authentication would time out at "15:59:24.639011"... Kefu Chai
10:28 AM Bug #18996 (New): api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure
... Kefu Chai

02/18/2017

09:51 PM Documentation #18986 (New): Need to document monitor health configuration values

All configuration variables referenced in OSDMonitor::get_health() need to be documented. These values affect the ...
David Zafman

02/16/2017

11:20 AM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
This was discussed during yesterday's performance meeting, and Sage suggested that this is indeed a memory leak.
Al...
Wido den Hollander
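A hedged aside on verifying that (not from the ticket; osd.0 is an illustrative id): on tcmalloc builds, the heap admin commands help distinguish a genuine leak from memory the allocator is merely caching.
    ceph tell osd.0 heap stats      # dump tcmalloc heap usage for osd.0
    ceph tell osd.0 heap release    # ask tcmalloc to hand free pages back to the OS
If usage stays high after a release, that points at a real leak rather than allocator caching.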
11:17 AM Bug #18926: Why osds do not release memory?
Seems to be related to #18924, doesn't it?
Machines seem to be running out of memory with BlueStore.
Wido den Hollander

02/15/2017

10:47 PM Feature #18943: crush: add devices class that rules can use as a filter
<loicd> sage: I'm confused by how we should handle the weights with the device classes. The weight of the generated b... Loïc Dachary
03:00 PM Feature #18943: crush: add devices class that rules can use as a filter
Instead of ... Loïc Dachary
12:24 PM Feature #18943 (Resolved): crush: add devices class that rules can use as a filter
h3. Problem
1. We want to have different types of devices (SSD, HDD, NVMe) backing different OSDs within the same ...
Loïc Dachary
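A hedged sketch of the intended usage, not part of the ticket text; the commands reflect the syntax that later shipped, and names such as "fast" are illustrative.
    ceph osd crush set-device-class ssd osd.0 osd.1              # tag OSDs with a device class
    ceph osd crush rule create-replicated fast default host ssd  # rule restricted to class "ssd"
Pools mapped to such a rule would then only place data on OSDs carrying that class.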

02/14/2017

05:16 PM Bug #18930 (New): received Segmentation fault in PGLog::IndexedLog::add
2017-02-15 00:12:04.566736 7fee7b9ec700 -1 *** Caught signal (Segmentation fault) **
in thread 7fee7b9ec700 thread_...
Haodong Tang
12:09 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
Marek Panek wrote:
> We observe the same effect in 11.2 with bluestore. After some time OSDs consume ~6G RAM memory ...
Marek Panek
12:07 PM Bug #18924: kraken-bluestore 11.2.0 memory leak issue
We observe the same effect in 11.2. After some time OSDs consume ~6G RAM memory (12 OSDs per 64G RAM server) and fina... Marek Panek
08:58 AM Bug #18924 (Resolved): kraken-bluestore 11.2.0 memory leak issue
Hi All,
On all of our 5-node clusters with ceph 11.2.0 we encounter memory leak issues.
Cluster details: 5 node wi...
Muthusamy Muthiah
11:40 AM Bug #18926 (Duplicate): Why osds do not release memory?
Version: K11.2.0, bluestore, 2x replication.
test: testing cluster with fio, with parameters "-direct=1 -iodepth 6...
yongqiang guo
10:38 AM Bug #18925 (Can't reproduce): Leak_DefinitelyLost in KernelDevice::aio_write

See on fs test branch based on master.
http://pulpito.ceph.com/jspray-2017-02-14_02:39:19-fs-wip-jcsp-testing-20...
John Spray

02/10/2017

10:05 PM Bug #18749: OSD: allow EC PGs to do recovery below min_size
See https://www.mail-archive.com/ceph-users@lists.ceph.com/msg35273.html for user discovery. Greg Farnum

02/09/2017

11:30 PM Cleanup #18875 (New): osd: give deletion ops a cost when performing backfill
From PrimaryLogPG, line 11134 (at time of writing)... Greg Farnum
02:15 PM Bug #18871 (New): problem about create pool with expected-num-objects does not cause collection s...
I created a pool and wanted PG folder splitting to happen at pool creation time, but I found it did not happen.
1. ...
peng zhang
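As a hedged illustration of the setup being described (pool and rule names are hypothetical): pre-splitting at creation is requested via the trailing expected-num-objects argument, and on filestore it is commonly paired with a negative merge threshold so the pre-split folders are never merged back.
    # ceph.conf (filestore):  filestore merge threshold = -10
    ceph osd pool create testpool 128 128 replicated replicated_ruleset 1000000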

02/08/2017

08:10 PM Bug #18859 (Closed): kraken monitor fails to bootstrap off jewel monitors if it has booted before
To reproduce: bootstrap a quorum off of jewel. Stop one of the monitors, remove its filesystem contents, re-create i... Kjetil Joergensen
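A hedged sketch of those reproduction steps (mon id "a", paths, and unit names are illustrative):
    systemctl stop ceph-mon@a
    rm -rf /var/lib/ceph/mon/ceph-a && mkdir /var/lib/ceph/mon/ceph-a   # wipe its on-disk store
    ceph auth get mon. -o /tmp/mon.keyring        # fetch the mon keyring from the quorum
    ceph mon getmap -o /tmp/monmap                # fetch the current monmap
    ceph-mon --mkfs -i a --monmap /tmp/monmap --keyring /tmp/mon.keyring
    systemctl start ceph-mon@a                    # the kraken mon must now sync off the jewel quorum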
05:00 PM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
Sorry, github is off-limits for me (it tries to run non-Free Software on my browser, and it refuses to work if I don'... Alexandre Oliva

02/07/2017

01:33 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
FWIW: in our case, the rbd pool is tiered in write-back mode. Kjetil Joergensen
01:27 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
This particular snapshot was created on the 20th of January, and I'm relatively certain clients/osds/monitors/etc. r... Kjetil Joergensen
01:15 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
I suspect we're hitting the same.... Kjetil Joergensen

02/06/2017

07:58 AM Feature #18826 (New): [RFE] Allow using an external DB file for extended attributes (xattr)
Allow using an external DB file for extended attributes (xattr), like Samba does [1]; this would bring ceph to OSes whi... jiri b

02/03/2017

08:34 PM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
Running the repro scenario with `bluefs_allocator = stupid` did not reproduce the issue after running all night. It d... Jared Watts
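For reference, the workaround being tested above is a single ceph.conf setting; it presumably only takes effect after an OSD restart.
    [osd]
    bluefs allocator = stupid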

01/31/2017

09:09 PM Bug #18752 (New): LibRadosList.EnumerateObjects failure
... Sage Weil
08:02 PM Bug #18750 (New): handle_pg_remove: pg_map_lock held for write when taking pg_lock
This could block the fast dispatch path, since fast dispatch takes pg_map_lock for read, so this makes it possibly bl... Josh Durgin
06:37 PM Bug #18749 (Resolved): OSD: allow EC PGs to do recovery below min_size
PG::choose_acting has a stanza which prevents EC PGs from peering if they are below min_size because, at the time, Sa... Greg Farnum
02:15 PM Bug #18746 (Resolved): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+k...
Afternoon! It would be great if anyone could shed any light on a pretty serious issue we had last week.
Essentiall...
Yiorgos Stamoulis

01/30/2017

04:30 PM Cleanup #18734 (Resolved): crush: transparently deprecated ruleset/ruleid difference
The crush tools and ceph commands will make sure there is no difference between ruleset and ruleid. However, existing... Loïc Dachary
04:15 PM Bug #16236: cache/proxied ops from different primaries (cache interval change) don't order proper...
This bug is now haunting rados runs in jewel 10.2.6 integration testing:
/a/smithfarm-2017-01-30_11:11:11-rados-wi...
Nathan Cutler

01/27/2017

09:37 PM Bug #18599 (Pending Backport): bluestore: full osd will not start. _do_alloc_write failed to res...
Sage Weil
04:30 PM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
I also have core-files and full symbols but those are hundreds of MB's. I'd be happy to share those as needed. Jared Watts
04:29 PM Bug #18698 (Can't reproduce): BlueFS FAILED assert(0 == "allocate failed... wtf")
We are seeing this failed assertion and crash using embedded ceph in the rook project: https://github.com/rook/rook.
...
Jared Watts
02:15 PM Bug #18696 (New): OSD might assert when LTTNG tracing is enabled
The following assert happens occasionally when LTTNG is enabled:
2017-01-27 13:52:07.451981 7f9edbf80700 -1 /root/ceph/r...
Igor Fedotov
01:48 PM Bug #18681: ceph-disk prepare/activate misses steps and fails on [Bluestore]
Wido den Hollander wrote:
> I see you split WAL and RocksDB out to different disks. If you try without that, does th...
Leonid Prytuliak
09:55 AM Bug #18681: ceph-disk prepare/activate misses steps and fails on [Bluestore]
I see you split WAL and RocksDB out to different disks. If you try without that, does that work?
I tried with the ...
Wido den Hollander

01/26/2017

06:48 PM Bug #18599 (Fix Under Review): bluestore: full osd will not start. _do_alloc_write failed to res...
https://github.com/ceph/ceph/pull/13140 Sage Weil
04:05 PM Bug #18687 (Resolved): bluestore: ENOSPC writing to XFS block file on smithi
... Sage Weil
10:52 AM Bug #18681 (Won't Fix): ceph-disk prepare/activate misses steps and fails on [Bluestore]
After preparing a disk for bluestore, ceph-disk did not chown the db and wal partitions, and the activate action failed.
Debian 8....
Leonid Prytuliak
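A hedged sketch of the reported setup and the manual workaround implied by the report (device names and partition numbers are illustrative):
    ceph-disk prepare --bluestore /dev/sdb --block.db /dev/nvme0n1 --block.wal /dev/nvme0n1
    chown ceph:ceph /dev/nvme0n1p1 /dev/nvme0n1p2   # fix ownership of the db/wal partitions left owned by root
    ceph-disk activate /dev/sdb1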

01/25/2017

03:35 PM Feature #8609: Improve ceph pg repair
Has anyone fixed this bug?
My test results show that pg repair is smart enough.
ceph can find the right replica, an...
cheng li
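The commands under discussion, for reference (the pg id 1.16 is illustrative):
    ceph pg deep-scrub 1.16            # surface inconsistencies
    rados list-inconsistent-obj 1.16   # inspect which copy differs (jewel and later)
    ceph pg repair 1.16                # the repair whose behaviour is being evaluated here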
02:54 PM Bug #18667 (Can't reproduce): [cache tiering] omap data time-traveled to stale version
Noticed an oddity while examining the logs of an upgrade test failure [1] against a Jewel (v10.2.5+) cluster. An imag... Jason Dillaman
12:25 AM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
Nope, that fix didn't work. Backfill doesn't put objects into the needs_recovery_map. Reverting. Samuel Just

01/24/2017

06:48 PM Bug #18599: bluestore: full osd will not start. _do_alloc_write failed to reserve 0x10000, etc.
I have yet to spend time to figure out how to tell ceph-disk what size to make the partitions (whether through the co... Heath Jepson
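One hedged possibility, worth verifying against the ceph-disk docs for this release: ceph-disk sizes the bluestore db/wal partitions from these config options, so they can be set in ceph.conf before running prepare; the values below are illustrative.
    [osd]
    bluestore block db size  = 16106127360   # 15 GiB
    bluestore block wal size = 1073741824    # 1 GiB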
12:48 AM Bug #18599: bluestore: full osd will not start. _do_alloc_write failed to reserve 0x10000, etc.
I think the root cause here is that the space reporting should not include the db partition, because that space canno... Sage Weil
01:40 PM Bug #15653 (In Progress): crush: low weight devices get too many objects for num_rep > 1
Loïc Dachary
12:13 PM Bug #15653: crush: low weight devices get too many objects for num_rep > 1
The test Adam wrote to demonstrate the problem, made into a pull request: https://github.com/ceph/ceph/pull/13083 Loïc Dachary
11:52 AM Bug #15653: crush: low weight devices get too many objects for num_rep > 1
See https://github.com/ceph/ceph/pull/10218 for a discussion and a tentative fix. Loïc Dachary
04:48 AM Bug #18647 (Resolved): ceph df output with erasure coded pools
I have 2 clusters with erasure coded pools. Since I upgraded to Jewel, the ceph df output shows erroneous data for t... David Turner
12:20 AM Bug #18643 (Closed): SnapTrimmer: inconsistencies may lead to snaptrimmer hang
In PrimaryLogPG::trim_object(), there are a few inconsistencies between clone state and the snapmapper that cause the... Josh Durgin
 
