Activity
From 12/26/2016 to 01/24/2017
01/24/2017
- 06:48 PM Bug #18599: bluestore: full osd will not start. _do_alloc_write failed to reserve 0x10000, etc.
- I have yet to spend time to figure out how to tell ceph-disk what size to make the partitions (whether through the co...
- 12:48 AM Bug #18599: bluestore: full osd will not start. _do_alloc_write failed to reserve 0x10000, etc.
- I think the root cause here is that the space reporting should not include the db partition, because that space canno...
- 01:40 PM Bug #15653 (In Progress): crush: low weight devices get too many objects for num_rep > 1
- 12:13 PM Bug #15653: crush: low weight devices get too many objects for num_rep > 1
- The test Adam wrote to demonstrate the problem, made into a pull request: https://github.com/ceph/ceph/pull/13083
- 11:52 AM Bug #15653: crush: low weight devices get too many objects for num_rep > 1
- See https://github.com/ceph/ceph/pull/10218 for a discussion and a tentative fix.
- 04:48 AM Bug #18647 (Resolved): ceph df output with erasure coded pools
- I have 2 clusters with erasure coded pools. Since I upgraded to Jewel, the ceph df output shows erroneous data for t...
- 12:20 AM Bug #18643 (Closed): SnapTrimmer: inconsistencies may lead to snaptrimmer hang
- In PrimaryLogPG::trim_object(), there are a few inconsistencies between clone state and the snapmapper that cause the...
01/22/2017
- 11:11 PM Bug #18328 (Need More Info): crush: flaky unitest:
- both links are 404, jenkins expired them
- 01:47 PM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
- > Now, is there anyone watching this space, that could integrate these patches, or should I post them elsewhere?
Y...
- 02:33 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
- Alexandre Oliva wrote:
> Here's another incremental patch, that fixes a problem in which the presence of multiple re...
- 02:30 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
- Here's another incremental patch, that fixes a problem in which the presence of multiple read errors in the same PG m...
01/21/2017
- 01:09 AM Bug #18629: osd: unclear error when authentication with monitors fails
- The full log includes:...
- 12:15 AM Bug #18629 (New): osd: unclear error when authentication with monitors fails
- Seen on a cluster of Linode VMs (16 osds, 6 fail with this error). Here's the backtrace:...
01/20/2017
- 06:38 PM Bug #18595: bluestore: allocator fails for 0x80000000 allocations
- PRs:
* master https://github.com/ceph/ceph/pull/13010
* kraken https://github.com/ceph/ceph/pull/13011
- 05:14 PM Bug #18595 (Resolved): bluestore: allocator fails for 0x80000000 allocations
- 01:48 AM Bug #18595: bluestore: allocator fails for 0x80000000 allocations
- Turns out this is an int64_t -> int thing. bah!
- 01:11 AM Bug #18595 (Resolved): bluestore: allocator fails for 0x80000000 allocations
- I have a bluestore OSD that is near full where bluestore is calling BlueFS::reclaim_blocks. There is lots of space f...
- 11:05 AM Bug #18599 (Resolved): bluestore: full osd will not start. _do_alloc_write failed to reserve 0x1...
- Excited to see how fast I could get into trouble with kraken, I created a small test cluster with 3x 32gb bluestore O...
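The "int64_t -> int thing" noted on Bug #18595 above can be illustrated with a minimal sketch. This is not the BlueStore allocator code; `fits_in_int` and `checked_narrow` are hypothetical helpers, and the sketch assumes a platform where `int` is 32 bits. It only shows why a length of 0x80000000 is the first value that no longer survives conversion to `int`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: true when a 64-bit length can be stored in an int
// without changing its value.
bool fits_in_int(std::int64_t v) {
  return v >= std::numeric_limits<int>::min() &&
         v <= std::numeric_limits<int>::max();
}

// Hypothetical helper: narrow only when the value is preserved, otherwise
// fail loudly instead of silently producing a different number.
int checked_narrow(std::int64_t v) {
  if (!fits_in_int(v)) {
    throw std::out_of_range("length does not fit in int");
  }
  int r = static_cast<int>(v);
  assert(static_cast<std::int64_t>(r) == v);  // value is unchanged
  return r;
}
```

With a 32-bit `int`, `fits_in_int(0x7fffffff)` holds while `fits_in_int(0x80000000LL)` does not, which matches the allocation size named in the bug title.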
01/18/2017
- 08:21 PM Bug #18591 (New): Putting objects which are larger than 4MiB in EC pool displays `(95) Operation ...
- # ./bin/ceph -v
ceph version 11.1.0-6210-gfcb8df1 (fcb8df1b57a9fcff75fa7496485f2ac5e85e7973)
# ./bin/ceph osd poo...
01/13/2017
- 10:25 PM Bug #18527 (New): entity_addr_t comparison uses memcmp improperly (endianness bug across the wire)
- We use a memcmp on the entity_addr_t object for its comparator, and we use that in resolving connection races within ...
- 10:11 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
- The previous patch was a bit too conservative when constructing the 'have' set from missing_loc in ECBackend::get_min...
- 08:02 AM Bug #17949: make check: unittest_bit_alloc get_used_blocks() >= 0
- The fix for this is in pull request:
https://github.com/ceph/ceph/pull/12733
- 02:53 AM Bug #18445: ceph: ping <mon.id> doesn't connect to cluster
- https://github.com/ceph/ceph/pull/12891
- 02:32 AM Bug #18445: ceph: ping <mon.id> doesn't connect to cluster
- Ah. It depends on how build_initial_monmap works.
In my case, ping mon.<specific> returns ENOENT, which the CLI ...
- 12:30 AM Bug #18445: ceph: ping <mon.id> doesn't connect to cluster
- Huh. So it does. I don't understand how both of these haven't just been failing consistently then.
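The memcmp concern raised in Bug #18527 above can be sketched in isolation. The code below does not use the real entity_addr_t layout; it takes a single hypothetical 32-bit field, lays it out as a little-endian and as a big-endian host would store it, and compares the raw bytes. The point is only that a byte-wise comparator can order the same two values differently on the two hosts.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Bytes = std::array<unsigned char, 4>;

// Bytes of a 32-bit value as a little-endian host stores it in memory.
Bytes as_little_endian(std::uint32_t v) {
  return {static_cast<unsigned char>(v & 0xff),
          static_cast<unsigned char>((v >> 8) & 0xff),
          static_cast<unsigned char>((v >> 16) & 0xff),
          static_cast<unsigned char>((v >> 24) & 0xff)};
}

// Bytes of the same value as a big-endian host stores it in memory.
Bytes as_big_endian(std::uint32_t v) {
  return {static_cast<unsigned char>((v >> 24) & 0xff),
          static_cast<unsigned char>((v >> 16) & 0xff),
          static_cast<unsigned char>((v >> 8) & 0xff),
          static_cast<unsigned char>(v & 0xff)};
}

// "Less than" as a memcmp-based comparator sees it: raw byte order only.
bool bytes_less(const Bytes& a, const Bytes& b) {
  return std::memcmp(a.data(), b.data(), a.size()) < 0;
}
```

For the values 1 and 256, `bytes_less` reports 1 < 256 on the big-endian layout but not on the little-endian one, so two peers of different endianness would not agree on which value is smaller. Comparing the fields numerically avoids that dependence on memory layout.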
01/12/2017
- 11:26 PM Bug #18178: Unfound objects lost after OSD daemons restarted
- The issue is that repair uses recovery to fix a PG and in this scenario the recovery can't complete because the objec...
- 09:03 PM Bug #18165 (Pending Backport): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfil...
- Sam, this issue has "Backport: kraken, jewel" set. Have the backports been done already?
- 09:00 PM Bug #18165 (Resolved): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_target...
- 08:39 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- *master PR*: https://github.com/ceph/ceph/pull/12888
- 08:18 PM Bug #18165 (Pending Backport): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfil...
- 03:51 PM Bug #17743: ceph_test_objectstore & test_objectstore_memstore.sh crashes in qa run (kraken)
- https://jenkins.ceph.com/job/ceph-pull-requests/16883/consoleFull#-108728127277933967-90d1-4877-8d60-89cb08ef4eb1
- 03:34 PM Bug #18445: ceph: ping <mon.id> doesn't connect to cluster
- workunits/mon/ping.py does test @ceph ping mon.{mon_id}@, see https://github.com/ceph/ceph/blob/master/qa/workunits/m...
- 01:53 PM Support #18508: PGs of EC pool stuck in peering state
- While looking at this with George I noticed that the async messenger was being used. We set it back to SimpleMessenge...
- 11:31 AM Support #18508 (Closed): PGs of EC pool stuck in peering state
- We have a 30 host, 1080 OSD cluster with a mix of replicated and EC 8+3 pools, running Jewel on SL7....
01/10/2017
- 04:17 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- 01:57 PM Bug #18467: ceph ping mon.* can fail
- That's with "ms inject socket failures: 500" which is unchanged. What's a reasonable higher value to try - 1000? 5000?
01/09/2017
- 10:42 PM Bug #18467: ceph ping mon.* can fail
- This isn't a particularly frequent error: http://pulpito.ceph.com/sage-2017-01-09_21:59:24-rados-wip-sage-testing---b...
- 10:21 PM Bug #18467: ceph ping mon.* can fail
- The offending code in ...
- 09:57 PM Bug #18467 (Resolved): ceph ping mon.* can fail
- ...
- 10:07 PM Bug #18368 (Resolved): bluestore: bluefs reclaim broken
- 08:25 PM Bug #18445 (Fix Under Review): ceph: ping <mon.id> doesn't connect to cluster
- 06:35 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- I looked at it more closely. This is kind of weird. Really, missing_loc is what's supposed to be the location-of-rec...
01/07/2017
- 04:55 AM Bug #18445 (Won't Fix): ceph: ping <mon.id> doesn't connect to cluster
- I guess no one uses this feature. The misplaced 'connect()' call only applies to mon.*. rados/singleton should have caug...
01/06/2017
- 02:24 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
- Here's a patch for jewel that, on top of David Zafman's patch for 13937 and Kefu Chai's for 17857, enables osds t...
01/04/2017
- 06:50 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- If anyone else hits this, you can work around it by extracting the object from the osd which has it, using mark_unfou...
- 06:49 PM Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
- The bug is still present and fairly straightforward: it's possible that the newest version isn't on an osd in the cur...
01/03/2017
- 06:09 PM Bug #16279: assert(objiter->second->version > last_divergent_update) failed
- I don't think so, (0'0,234'1034] is the master log, 234'1034 isn't divergent. You should trace back in the log to tr...
- 07:11 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
- Never mind the patch; clearing backfills_in_flight seems to be wrong and dangerous; I think it might even make the obj...
- 01:34 AM Bug #18162: osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but ...
- I've been trying to address the problem that backfills abort when encountering a read error or a corrupt file in jewe...
12/30/2016
- 05:23 PM Bug #18368 (Fix Under Review): bluestore: bluefs reclaim broken
- https://github.com/ceph/ceph/pull/12725
- 05:22 PM Bug #18368 (Resolved): bluestore: bluefs reclaim broken
- return extent is always offset 0, and only one extent