Activity

From 01/09/2017 to 02/07/2017

02/07/2017

01:33 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
FWIW: in our case, the rbd pool is tiered in write-back mode. Kjetil Joergensen
01:27 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
This particular snapshot was created on the 20th of January, and I'm relatively certain clients/osds/monitors/etc. r... Kjetil Joergensen
01:15 AM Backport #17445: jewel: list-snap cache tier missing promotion logic (was: rbd cli segfault when ...
I suspect we're hitting the same.... Kjetil Joergensen

02/06/2017

07:58 AM Feature #18826 (New): [RFE] Allow using an external DB file for extended attributes (xattr)
Allow using an external DB file for extended attributes (xattr) like Samba does[1]; this would bring ceph on OSes whi... jiri b

02/03/2017

08:34 PM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
Running the repro scenario with `bluefs_allocator = stupid` did not reproduce the issue after running all night. It d... Jared Watts
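(For reference, a minimal sketch of how the allocator can be switched for such a repro, assuming the option is applied in ceph.conf under [osd] and the OSDs are restarted afterwards; the section placement is an assumption, not taken from the report:)

    [osd]
    # use the "stupid" extent allocator for BlueFS instead of the default
    bluefs_allocator = stupid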

01/31/2017

09:09 PM Bug #18752 (New): LibRadosList.EnumerateObjects failure
... Sage Weil
08:02 PM Bug #18750 (New): handle_pg_remove: pg_map_lock held for write when taking pg_lock
This could block the fast dispatch path, since fast dispatch takes pg_map_lock for read, so this makes it possibly bl... Josh Durgin
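(Illustrative only, not the actual OSD code: a hedged C++ sketch of the general hazard described above, where a thread holding a reader-writer lock for write then blocks on a second lock, stalling every reader of the first. The names pg_map_lock and pg_lock mirror the report; everything else is assumed.)

    #include <mutex>
    #include <shared_mutex>

    std::shared_mutex pg_map_lock;   // guards the PG map
    std::mutex        pg_lock;       // per-PG lock, potentially held for a long time elsewhere

    void handle_pg_remove_like() {
        std::unique_lock wmap(pg_map_lock);   // exclusive: no new readers can enter
        std::lock_guard  pg(pg_lock);         // if this blocks, readers of pg_map_lock stall behind us
        // ... remove the PG from the map ...
    }

    void fast_dispatch_like() {
        std::shared_lock rmap(pg_map_lock);   // read side: blocked while the writer above is waiting
        // ... look up the PG and dispatch the op ...
    }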
06:37 PM Bug #18749 (Resolved): OSD: allow EC PGs to do recovery below min_size
PG::choose_acting has a stanza which prevents EC PGs from peering if they are below min_size because, at the time, Sa... Greg Farnum
02:15 PM Bug #18746 (Resolved): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+k...
Afternoon! It would be great if anyone could shed any light on a pretty serious issue we had last week.
Essentiall...
Yiorgos Stamoulis

01/30/2017

04:30 PM Cleanup #18734 (Resolved): crush: transparently deprecated ruleset/ruleid difference
The crush tools and ceph commands will make sure there is no difference between ruleset and ruleid. However, existing... Loïc Dachary
04:15 PM Bug #16236: cache/proxied ops from different primaries (cache interval change) don't order proper...
This bug is now haunting rados runs in jewel 10.2.6 integration testing:
/a/smithfarm-2017-01-30_11:11:11-rados-wi...
Nathan Cutler

01/27/2017

09:37 PM Bug #18599 (Pending Backport): bluestore: full osd will not start. _do_alloc_write failed to res...
Sage Weil
04:30 PM Bug #18698: BlueFS FAILED assert(0 == "allocate failed... wtf")
I also have core files and full symbols, but those are hundreds of MBs. I'd be happy to share those as needed. Jared Watts
04:29 PM Bug #18698 (Can't reproduce): BlueFS FAILED assert(0 == "allocate failed... wtf")
We are seeing this failed assertion and crash using embedded ceph in the rook project: https://github.com/rook/rook.
...
Jared Watts
02:15 PM Bug #18696 (New): OSD might assert when LTTNG tracing is enabled
The following assert happens occasionally when LTTNG is enabled:
2017-01-27 13:52:07.451981 7f9edbf80700 -1 /root/ceph/r...
Igor Fedotov
01:48 PM Bug #18681: ceph-disk prepare/activate misses steps and fails on [Bluestore]
Wido den Hollander wrote:
> I see you split WAL and RocksDB out to different disks. If you try without that, does th...
Leonid Prytuliak
09:55 AM Bug #18681: ceph-disk prepare/activate misses steps and fails on [Bluestore]
I see you split WAL and RocksDB out to different disks. If you try without that, does that work?
I tried with the ...
Wido den Hollander

01/26/2017

06:48 PM Bug #18599 (Fix Under Review): bluestore: full osd will not start. _do_alloc_write failed to res...
https://github.com/ceph/ceph/pull/13140 Sage Weil
04:05 PM Bug #18687 (Resolved): bluestore: ENOSPC writing to XFS block file on smithi
... Sage Weil
10:52 AM Bug #18681 (Won't Fix): ceph-disk prepare/activate misses steps and fails on [Bluestore]
After preparing a disk for bluestore, ceph-disk did not chown the db and wal partitions, and the activate action failed.
Debian 8....
Leonid Prytuliak

01/25/2017

03:35 PM Feature #8609: Improve ceph pg repair
Has anyone fixed this bug?
My test results show that pg repair is smart enough.
ceph can find the right replica, an...
cheng li
02:54 PM Bug #18667 (Can't reproduce): [cache tiering] omap data time-traveled to stale version
Noticed an oddity while examining the logs of an upgrade test failure [1] against a Jewel (v10.2.5+) cluster. An imag... Jason Dillaman
12:25 AM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
Nope, that fix didn't work. Backfill doesn't put objects into the needs_recovery_map. Reverting. Samuel Just

01/24/2017

06:48 PM Bug #18599: bluestore: full osd will not start. _do_alloc_write failed to reserve 0x10000, etc.
I have yet to spend time to figure out how to tell ceph-disk what size to make the partitions (whether through the co... Heath Jepson
12:48 AM Bug #18599: bluestore: full osd will not start. _do_alloc_write failed to reserve 0x10000, etc.
I think the root cause here is that the space reporting should not include the db partition, because that space canno... Sage Weil
01:40 PM Bug #15653 (In Progress): crush: low weight devices get too many objects for num_rep > 1
Loïc Dachary
12:13 PM Bug #15653: crush: low weight devices get too many objects for num_rep > 1
The test Adam wrote to demonstrate the problem has been made into a pull request: https://github.com/ceph/ceph/pull/13083 Loïc Dachary
11:52 AM Bug #15653: crush: low weight devices get too many objects for num_rep > 1
See https://github.com/ceph/ceph/pull/10218 for a discussion and a tentative fix. Loïc Dachary
04:48 AM Bug #18647 (Resolved): ceph df output with erasure coded pools
I have 2 clusters with erasure coded pools. Since I upgraded to Jewel, the ceph df output shows erroneous data for t... David Turner
12:20 AM Bug #18643 (Closed): SnapTrimmer: inconsistencies may lead to snaptrimmer hang
In PrimaryLogPG::trim_object(), there are a few inconsistencies between clone state and the snapmapper that cause the... Josh Durgin

01/22/2017

11:11 PM Bug #18328 (Need More Info): crush: flaky unitest:
Both links are 404; Jenkins expired them. Loïc Dachary
01:47 PM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
> Now, is there anyone watching this space, that could integrate these patches, or should I post them elsewhere?
Y...
Nathan Cutler
02:33 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
Alexandre Oliva wrote:
> Here's another incremental patch, that fixes a problem in which the presence of multiple re...
Aaron T
02:30 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
Here's another incremental patch, that fixes a problem in which the presence of multiple read errors in the same PG m... Alexandre Oliva

01/21/2017

01:09 AM Bug #18629: osd: unclear error when authentication with monitors fails
The full log includes:... Josh Durgin
12:15 AM Bug #18629 (New): osd: unclear error when authentication with monitors fails
Seen on a cluster of Linode VMs (16 osds, 6 fail with this error). Here's the backtrace:... Patrick Donnelly

01/20/2017

06:38 PM Bug #18595: bluestore: allocator fails for 0x80000000 allocations
PRs:
* master https://github.com/ceph/ceph/pull/13010
* kraken https://github.com/ceph/ceph/pull/13011
Nathan Cutler
05:14 PM Bug #18595 (Resolved): bluestore: allocator fails for 0x80000000 allocations
Sage Weil
01:48 AM Bug #18595: bluestore: allocator fails for 0x80000000 allocations
Turns out this is an int64_t -> int thing. bah! Sage Weil
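(As a generic illustration only, not the allocator code itself: 0x80000000 does not fit in a signed 32-bit int, so narrowing an int64_t length through an int flips it negative and trips any size/validity check downstream. A minimal sketch:)

    #include <cstdint>
    #include <cstdio>

    int main() {
        int64_t want = 0x80000000;       // a 2 GiB request, fine as int64_t
        int narrowed = (int)want;        // implementation-defined narrowing: typically -2147483648
        std::printf("want=%lld narrowed=%d\n", (long long)want, narrowed);
        // any check along the lines of "narrowed <= 0 -> fail" now rejects a valid request
        return 0;
    }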
01:11 AM Bug #18595 (Resolved): bluestore: allocator fails for 0x80000000 allocations
I have a bluestore OSD that is near full where bluestore is calling BlueFS::reclaim_blocks. There is lots of space f... Sage Weil
11:05 AM Bug #18599 (Resolved): bluestore: full osd will not start. _do_alloc_write failed to reserve 0x1...
Excited to see how fast I could get into trouble with kraken, I created a small test cluster with 3x 32 GB bluestore O... Heath Jepson

01/18/2017

08:21 PM Bug #18591 (New): Putting objects which are larger than 4MiB in EC pool displays `(95) Operation ...
# ./bin/ceph -v
ceph version 11.1.0-6210-gfcb8df1 (fcb8df1b57a9fcff75fa7496485f2ac5e85e7973)
# ./bin/ceph osd poo...
Shinobu Kinjo

01/13/2017

10:25 PM Bug #18527 (New): entity_addr_t comparison uses memcmp improperly (endianness bug across the wire)
We use a memcmp on the entity_addr_t object for its comparator, and we use that in resolving connection races within ... Greg Farnum
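(Illustrative only, not Ceph's actual entity_addr_t: a hedged sketch of why a raw memcmp over a struct containing multi-byte integers orders values by their in-memory byte layout rather than numerically, so little-endian and big-endian hosts can disagree about which address "wins" a connection race.)

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    struct addr_like { uint16_t port; };   // stand-in for a multi-byte field inside an address struct

    int main() {
        addr_like a{0x0100};   // 256
        addr_like b{0x0001};   // 1
        // Numerically a > b, but on a little-endian host a is stored as bytes 00 01
        // and b as 01 00, so memcmp reports a < b; a big-endian host reports the opposite.
        int by_bytes = std::memcmp(&a, &b, sizeof(addr_like));
        std::printf("numeric a>b: %d, memcmp(a,b): %d\n", a.port > b.port, by_bytes);
        return 0;
    }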
10:11 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
The previous patch was a bit too conservative when constructing the 'have' set from missing_loc in ECBackend::get_min... Alexandre Oliva
08:02 AM Bug #17949: make check: unittest_bit_alloc get_used_blocks() >= 0
The fix for this is in pull request:
https://github.com/ceph/ceph/pull/12733
Ramesh Chander
02:53 AM Bug #18445: ceph: ping <mon.id> doesn't connect to cluster
https://github.com/ceph/ceph/pull/12891 Dan Mick
02:32 AM Bug #18445: ceph: ping <mon.id> doesn't connect to cluster
Ah. It depends on how build_initial_monmap works.
In my case, ping mon.<specific> returns ENOENT, which the CLI ...
Dan Mick
12:30 AM Bug #18445: ceph: ping <mon.id> doesn't connect to cluster
Huh. So it does. I don't understand how both of these haven't just been failing consistently then. Dan Mick

01/12/2017

11:26 PM Bug #18178: Unfound objects lost after OSD daemons restarted
The issue is that repair uses recovery to fix a PG and in this scenario the recovery can't complete because the objec... David Zafman
09:03 PM Bug #18165 (Pending Backport): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfil...
Sam, this issue has "Backport: kraken, jewel" set. Have the backports been done already? Nathan Cutler
09:00 PM Bug #18165 (Resolved): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_target...
Samuel Just
08:39 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
*master PR*: https://github.com/ceph/ceph/pull/12888 Nathan Cutler
08:18 PM Bug #18165 (Pending Backport): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfil...
Sage Weil
03:51 PM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
https://jenkins.ceph.com/job/ceph-pull-requests/16883/consoleFull#-108728127277933967-90d1-4877-8d60-89cb08ef4eb1 Kefu Chai
03:34 PM Bug #18445: ceph: ping <mon.id> doesn't connect to cluster
workunits/mon/ping.py does test @ceph ping mon.{mon_id}@, see https://github.com/ceph/ceph/blob/master/qa/workunits/m... Kefu Chai
01:53 PM Support #18508: PGs of EC pool stuck in peering state
While looking at this with George I noticed that the async messenger was being used. We set it back to SimpleMessenge... Wido den Hollander
11:31 AM Support #18508 (Closed): PGs of EC pool stuck in peering state
We have a 30-host, 1080-OSD cluster with a mix of replicated and EC 8+3 pools, running Jewel on SL7.... George Vasilakakos

01/10/2017

04:17 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
Samuel Just
01:57 PM Bug #18467: ceph ping mon.* can fail
That's with "ms inject socket failures: 500" which is unchanged. What's a reasonable higher value to try - 1000? 5000? Nathan Cutler

01/09/2017

10:42 PM Bug #18467: ceph ping mon.* can fail
This isn't a particularly frequent error: http://pulpito.ceph.com/sage-2017-01-09_21:59:24-rados-wip-sage-testing---b... Sage Weil
10:21 PM Bug #18467: ceph ping mon.* can fail
The offending code in ... Nathan Cutler
09:57 PM Bug #18467 (Resolved): ceph ping mon.* can fail
... Sage Weil
10:07 PM Bug #18368 (Resolved): bluestore: bluefs reclaim broken
Sage Weil
08:25 PM Bug #18445 (Fix Under Review): ceph: ping <mon.id> doesn't connect to cluster
Dan Mick
06:35 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
I looked at it more closely. This is kind of weird. Really, missing_loc is what's supposed to be the location-of-rec... Samuel Just
