Activity

From 03/11/2018 to 04/09/2018

04/09/2018

10:24 PM Feature #23616 (New): osd: admin socket should help debug status at all times
Last week I was looking at an LRC OSD which was having trouble, and it wasn't clear why.
The cause ended up being ...
Greg Farnum
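A minimal sketch of the kind of admin socket queries this is about (osd.3 is a placeholder id; these commands already exist, the request is that they stay useful even when the OSD is in trouble):

    ceph daemon osd.3 status               # basic state of the daemon
    ceph daemon osd.3 dump_ops_in_flight   # ops currently stuck in the OSD
    ceph daemon osd.3 dump_historic_ops    # recently completed slow ops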
10:18 PM Bug #22882 (Resolved): Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Whoops, this merged way back then with a slightly different plan than discussed here (see PR discussion). Greg Farnum
09:59 PM Bug #22525: auth: ceph auth add does not sanity-check caps
https://github.com/ceph/ceph/pull/21311 Sage Weil
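A hedged illustration of the issue (the entity name and cap strings are hypothetical): a malformed capability is accepted at add time and only bites later when the key is used:

    # the typo 'allw' is not rejected by 'ceph auth add'
    ceph auth add client.broken mon 'allw r' osd 'allow rwx'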
09:21 PM Feature #23045 (Resolved): mon: warn on slow ops in OpTracker
That PR got merged a while ago and we've been working through the slow ops warnings that turn up since. Seems to be a... Greg Farnum
08:59 PM Feature #21084 (Resolved): auth: add osd auth caps based on pool metadata
Patrick Donnelly
06:53 PM Bug #23614: local_reserver double-reservation of backfilled pg
Looking through the code I don't see where the reservation is supposed to be released. I see releases for
- the p...
Sage Weil
06:52 PM Bug #23614 (Resolved): local_reserver double-reservation of backfilled pg
- pg gets reservations (incl local_reserver)
- pg backfills, finishes
- ...apparently never releases the reservatio...
Sage Weil
06:15 PM Bug #23365: CEPH device class not honored for erasure encoding.
A quote from Greg Farnum on the crash from another ticket:... Brian Woods
06:13 PM Bug #23365: CEPH device class not honored for erasure encoding.
I put 12.2.2, but that is incorrect. It is version ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) lu... Brian Woods
05:38 PM Bug #23365: CEPH device class not honored for erasure encoding.
What version are you running? How are your OSDs configured?
There was a bug with BlueStore SSDs being misreported ...
Greg Farnum
05:36 PM Bug #23371: OSDs flaps when cluster network is made down
You tested this on a version prior to luminous and the behavior has *changed*?
This must be a result of some chang...
Greg Farnum
05:24 PM Documentation #23613 (Closed): doc: add description of new fs-client auth profile
Patrick Donnelly
05:23 PM Documentation #23613 (Closed): doc: add description of new fs-client auth profile
On that page: http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
https:...
Patrick Donnelly
05:23 PM Documentation #23612 (New): doc: add description of new auth profiles
On that page: http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
https:...
Patrick Donnelly
05:18 PM Support #23455 (Resolved): osd: large number of inconsistent objects after recover or backfilling
fiemap is disabled by default precisely because there are a number of known bugs in the local filesystems across kern... Greg Farnum
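For context, a minimal sketch of the option being discussed (assuming the FileStore default described above):

    # ceph.conf, [osd] section; fiemap stays off by default because of those kernel/filesystem bugs
    filestore fiemap = false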
05:07 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
https://github.com/ceph/ceph/pull/21310 Kefu Chai
05:02 PM Bug #23610 (Resolved): pg stuck in activating because of dropped pg-temp message
http://pulpito.ceph.com/yuriw-2018-04-05_22:33:03-rados-wip-yuri3-testing-2018-04-05-1940-luminous-distro-basic-smith... Kefu Chai
05:06 PM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
This is a dupe of...something. We can track it down later.
For now, note that the crash is happening with Hammer c...
Greg Farnum
06:17 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
Hm hm hm Nathan Cutler
02:56 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
h3. rados bisect
Reproducer: ...
Nathan Cutler
02:11 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
This problem was not happening so reproducibly before the current integration run, so one of the following PRs might ... Nathan Cutler
02:05 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
Set priority to Urgent because this prevents us from getting a clean rados run in jewel 10.2.11 integration testing. Nathan Cutler
02:04 AM Bug #23598 (Duplicate): hammer->jewel: ceph_test_rados crashes during radosbench task in jewel ra...
Test description: rados/upgrade/{hammer-x-singleton/{0-cluster/{openstack.yaml start.yaml} 1-hammer-install/hammer.ya... Nathan Cutler
04:39 PM Bug #23595: osd: recovery/backfill is extremely slow
*I have it figured out!*
The issue was "osd_recovery_sleep_hdd", which defaults to 0.1 seconds.
After setting
...
Niklas Hambuechen
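A minimal sketch of how that sleep can be adjusted on a running Luminous cluster (the value 0 here is illustrative):

    # apply at runtime to all OSDs
    ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0'
    # or persist it in ceph.conf under [osd]
    osd recovery sleep hdd = 0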
03:23 PM Bug #23595: osd: recovery/backfill is extremely slow
OK, if I only have the 6 large files in the cephfs AND set the options... Niklas Hambuechen
02:55 PM Bug #23595: osd: recovery/backfill is extremely slow
I have now tested with only the 6*1GB files, having deleted the 270k empty files from cephfs.
I continue to see ex...
Niklas Hambuechen
12:30 PM Bug #23595: osd: recovery/backfill is extremely slow
You can find a core dump of the -O0 version created with GDB at http://nh2.me/ceph-issue-23595-osd-O0.core.xz Niklas Hambuechen
12:06 PM Bug #23595: osd: recovery/backfill is extremely slow
Attached are two GDB runs of a sender node.
In the release build there were many values "<optimized out>", so I re...
Niklas Hambuechen
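A sketch of the sort of gdb invocation used to pull such backtraces out of a core (the binary and core paths are placeholders):

    gdb -batch -ex 'thread apply all bt full' /usr/bin/ceph-osd /path/to/core > osd-backtrace.txt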
11:45 AM Bug #23595: osd: recovery/backfill is extremely slow
On https://forum.proxmox.com/threads/increase-ceph-recovery-speed.36728/ people reported the same number as me of 10 ... Niklas Hambuechen
10:43 AM Bug #23601 (Resolved): Cannot see the value of parameter osd_op_threads via ceph --show-config
I have set the "osd op threads" parameter in the configuration file,
but I cannot see the value of the parameter "osd op t...
Cyril Chang
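As a hedged alternative while --show-config is being sorted out, the running daemon can be asked directly over its admin socket (osd.0 is a placeholder; the option may also simply no longer exist in this release):

    ceph daemon osd.0 config show | grep osd_op
    ceph daemon osd.0 config get osd_op_threads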
10:17 AM Bug #23403 (Need More Info): Mon cannot join quorum
Brad Hubbard
07:23 AM Bug #23578 (In Progress): large-omap-object-warnings test fails
https://github.com/ceph/ceph/pull/21295 Brad Hubbard
01:33 AM Bug #23578: large-omap-object-warnings test fails
We instruct the OSDs to scrub at around 16:15.... Brad Hubbard
04:31 AM Bug #23593 (Fix Under Review): RESTControllerTest.test_detail_route and RESTControllerTest.test_f...
Kefu Chai
02:08 AM Bug #22123: osd: objecter sends out of sync with pg epochs for proxied ops
Despite the jewel backport of this fix being merged, this problem has reappeared in jewel 10.2.11 integration testing... Nathan Cutler

04/08/2018

07:55 PM Bug #23595: osd: recovery/backfill is extremely slow
For the record, I installed the following debugging packages for gdb stack traces:... Niklas Hambuechen
07:53 PM Bug #23595: osd: recovery/backfill is extremely slow
I have read https://www.spinics.net/lists/ceph-devel/msg38331.html which suggests that there is some throttling going... Niklas Hambuechen
06:17 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
I made a Ceph 12.2.4 (luminous stable) cluster of 3 machines with 10-Gigabit networking on Ubuntu 16.04, using pretty... Niklas Hambuechen
05:40 PM Bug #23593: RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
PR: https://github.com/ceph/ceph/pull/21290 Ricardo Dias
03:10 PM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
... Kefu Chai
04:31 PM Documentation #23594: auth: document what to do when locking client.admin out
I found one way to fix it on the mailing list:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/01...
Niklas Hambuechen
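A hedged sketch of the general recovery technique (an assumption about the usual approach, not necessarily the recipe linked above): authenticate as mon., whose key has implicit full permissions, and restore the admin caps; the keyring path and caps shown are illustrative:

    ceph -n mon. -k /var/lib/ceph/mon/ceph-<mon-id>/keyring \
        auth caps client.admin mon 'allow *' osd 'allow *' mds 'allow *' mgr 'allow *'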
04:23 PM Documentation #23594 (New): auth: document what to do when locking client.admin out
I accidentally ran ... Niklas Hambuechen
11:06 AM Bug #23590: kstore: statfs: (95) Operation not supported
https://github.com/ceph/ceph/pull/21287 Honggang Yang
11:01 AM Bug #23590 (Fix Under Review): kstore: statfs: (95) Operation not supported
2018-04-07 16:19:07.248 7fdec4675700 -1 osd.0 0 statfs() failed: (95) Operation not supported
2018-04-07 16:19:08....
Honggang Yang
08:50 AM Bug #23589 (New): jewel: KStore Segmentation fault in ceph_test_objectstore --gtest_filter=-*/2:-*/3
Test description: rados/objectstore/objectstore.yaml
Log excerpt:...
Nathan Cutler
08:39 AM Bug #23588 (New): LibRadosAioEC.IsCompletePP test fails in jewel 10.2.11 integration testing
Test description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yam... Nathan Cutler
06:53 AM Bug #23511: forwarded osd_failure leak in mon
Greg, no. Both tests below include the no_reply() fix.
see
- http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-r...
Kefu Chai
06:42 AM Bug #23585 (Duplicate): osd: safe_timer segfault
... Alex Gorbachev

04/07/2018

03:04 AM Bug #23195: Read operations segfaulting multiple OSDs

Change the test-erasure-eio.sh test as follows:...
David Zafman

04/06/2018

10:23 PM Bug #22165 (Fix Under Review): split pg not actually created, gets stuck in state unknown
Fixed by https://github.com/ceph/ceph/pull/20469 Sage Weil
09:29 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
You'll definitely get more attention and advice if somebody else has hit this issue before. Greg Farnum
08:45 PM Bug #23195: Read operations segfaulting multiple OSDs
For anyone running into the send_all_remaining_reads() crash, a workaround is to use these osd settings:... Josh Durgin
04:17 PM Bug #23195 (Fix Under Review): Read operations segfaulting multiple OSDs
https://github.com/ceph/ceph/pull/21273
I'm going to treat this issue as tracking the first crash, in send_all_rem...
Josh Durgin
03:10 AM Bug #23195 (In Progress): Read operations segfaulting multiple OSDs
Josh Durgin
08:41 PM Bug #23200 (Resolved): invalid JSON returned when querying pool parameters
Nathan Cutler
08:40 PM Backport #23312 (Resolved): luminous: invalid JSON returned when querying pool parameters
Nathan Cutler
07:28 PM Backport #23312: luminous: invalid JSON returned when querying pool parameters
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20890
merged
Yuri Weinstein
08:40 PM Bug #23324 (Resolved): delete type mismatch in CephContext teardown
Nathan Cutler
08:40 PM Backport #23412 (Resolved): luminous: delete type mismatch in CephContext teardown
Nathan Cutler
07:28 PM Backport #23412: luminous: delete type mismatch in CephContext teardown
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20998
merged
Yuri Weinstein
08:38 PM Bug #23477 (Resolved): should not check for VERSION_ID
Nathan Cutler
08:38 PM Backport #23478 (Resolved): should not check for VERSION_ID
Nathan Cutler
07:26 PM Backport #23478: should not check for VERSION_ID
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21090
merged
Yuri Weinstein
06:03 PM Bug #21833 (Resolved): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
Nathan Cutler
06:02 PM Backport #23160 (Resolved): luminous: Multiple asserts caused by DNE pgs left behind after lots o...
Nathan Cutler
03:57 PM Backport #23160: luminous: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
Prashant D wrote:
> Waiting for code review for backport PR : https://github.com/ceph/ceph/pull/20668
merged
Yuri Weinstein
06:02 PM Bug #23078 (Resolved): SRV resolution fails to lookup AAAA records
Nathan Cutler
06:02 PM Backport #23174 (Resolved): luminous: SRV resolution fails to lookup AAAA records
Nathan Cutler
03:56 PM Backport #23174: luminous: SRV resolution fails to lookup AAAA records
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20710
merged
Yuri Weinstein
05:57 PM Backport #23472 (Resolved): luminous: add --add-bucket and --move options to crushtool
Nathan Cutler
03:53 PM Backport #23472: luminous: add --add-bucket and --move options to crushtool
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21079
mergedReviewed-by: Josh Durgin <jdurgin@redhat.com>
Yuri Weinstein
05:37 PM Bug #23578 (Resolved): large-omap-object-warnings test fails
... Sage Weil
03:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
Sorry, forgot to mention I am running 12.2.4. Michael Sudnick
03:50 PM Bug #23576 (Can't reproduce): osd: active+clean+inconsistent pg will not scrub or repair
My apologies if I'm too premature in posting this.
Myself and so far two others on the mailing list: http://lists....
Michael Sudnick
03:44 AM Bug #23345 (Resolved): `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
https://github.com/ceph/ceph/pull/20986 Joao Eduardo Luis
01:57 AM Bug #21737 (Resolved): OSDMap cache assert on shutdown
Nathan Cutler
01:56 AM Backport #21786 (Resolved): jewel: OSDMap cache assert on shutdown
Nathan Cutler

04/05/2018

09:12 PM Bug #22887 (Duplicate): osd/ECBackend.cc: 2202: FAILED assert((offset + length) <= (range.first.g...
Greg Farnum
09:12 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
From #22887, this also appeared in /ceph/teuthology-archive/pdonnell-2018-01-30_23:38:56-kcephfs-wip-pdonnell-i22627-... Greg Farnum
09:09 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
That was the fix I was wondering about, but it was merged to master as https://github.com/ceph/ceph/pull/15712 and so... Greg Farnum
09:05 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
https://github.com/ceph/ceph/pull/15712 Greg Farnum
09:10 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
https://github.com/ceph/ceph/pull/15712 Greg Farnum
06:35 PM Bug #22351 (Resolved): Couldn't init storage provider (RADOS)
Nathan Cutler
06:35 PM Backport #23349 (Resolved): luminous: Couldn't init storage provider (RADOS)
Nathan Cutler
05:22 PM Backport #23349: luminous: Couldn't init storage provider (RADOS)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20896
merged
Yuri Weinstein
06:33 PM Bug #22114 (Resolved): mon: ops get stuck in "resend forwarded message to leader"
Nathan Cutler
06:33 PM Backport #23077 (Resolved): luminous: mon: ops get stuck in "resend forwarded message to leader"
Nathan Cutler
04:57 PM Backport #23077: luminous: mon: ops get stuck in "resend forwarded message to leader"
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21016
merged
Yuri Weinstein
04:57 PM Backport #23077: luminous: mon: ops get stuck in "resend forwarded message to leader"
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21016
merged
Yuri Weinstein
06:31 PM Bug #22752 (Resolved): snapmapper inconsistency, crash on luminous
Nathan Cutler
06:31 PM Backport #23500 (Resolved): luminous: snapmapper inconsistency, crash on luminous
Nathan Cutler
04:55 PM Backport #23500: luminous: snapmapper inconsistency, crash on luminous
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21118
merged
Yuri Weinstein
05:14 PM Bug #23565 (Fix Under Review): Inactive PGs don't seem to cause HEALTH_ERR
In looking at https://tracker.ceph.com/issues/23562, there were inactive PGs starting at... Greg Farnum
04:43 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
... Sage Weil
04:18 PM Bug #23564 (Duplicate): OSD Segfaults
Apr 5 11:40:31 roc05r-sc3a100 kernel: [126029.543698] safe_timer[28863]: segfault at 8d ip 00007fa9ad4dcccb sp 00007... Alex Gorbachev
12:24 PM Bug #23562 (New): VDO OSD caused cluster to hang
I awoke to alerts that apache serving teuthology logs on the Octo Long Running Cluster was unresponsive.
Here was ...
David Galloway
08:37 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
Hi Greg,
thanks for your response.
> That URL denies access. You can use ceph-post-file instead to upload logs ...
Jan Marquardt
03:31 AM Bug #23403: Mon cannot join quorum
My apologies. It appears my previous analysis was incorrect.
I've pored over the logs and it appears the issue is ...
Brad Hubbard

04/04/2018

11:19 PM Bug #23554: mon: mons need to be aware of VDO statistics
Right, but AFAICT the monitor is then not even aware of VDO being involved. Which seems fine to my naive thoughts, bu... Greg Farnum
11:05 PM Bug #23554: mon: mons need to be aware of VDO statistics
Of course Sage is already on it :)
I don't know where the ...
David Galloway
10:46 PM Bug #23554: mon: mons need to be aware of VDO statistics
At least this: https://github.com/ceph/ceph/pull/20516 Josh Durgin
10:44 PM Bug #23554: mon: mons need to be aware of VDO statistics
What would we expect this monitor awareness to look like? Extra columns duplicating the output of vdostats? Greg Farnum
05:48 PM Bug #23554 (New): mon: mons need to be aware of VDO statistics
I created an OSD on top of a logical volume with a VDO device underneath.
Ceph is unaware of how much compression ...
David Galloway
09:58 PM Bug #23128: invalid values in ceph.conf do not issue visible warnings
http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/ has been updated with information about this Josh Durgin
09:53 PM Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
Can you reproduce with osds configured with:... Josh Durgin
09:43 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
That URL denies access. You can use ceph-post-file instead to upload logs to a secure location.
It's not clear wha...
Greg Farnum
09:39 PM Bug #23320 (Fix Under Review): OSD suicide itself because of a firewall rule but reports a receiv...
github.com/ceph/ceph/pull/21000 Greg Farnum
09:37 PM Bug #23487: There is no 'ceph osd pool get erasure allow_ec_overwrites' command
Greg Farnum
09:31 PM Bug #23510 (Resolved): rocksdb spillover for hard drive configurations
Greg Farnum
09:31 PM Bug #23511: forwarded osd_failure leak in mon
Kefu, did your latest no_reply() PR resolve this? Greg Farnum
09:29 PM Bug #23535 (Closed): 'ceph --show-config --conf /dev/null' does not work any more
Yeah, you should use the monitor config commands now! :) Greg Farnum
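A hedged sketch of what that might look like, assuming the mimic-era centralized config commands:

    # old invocation that no longer works
    ceph --show-config --conf /dev/null
    # monitor-based equivalents
    ceph config ls                     # list known option names
    ceph config get osd osd_op_queue   # ask the mon for an option's value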
09:28 PM Bug #23258: OSDs keep crashing.
Brian, that's a separate bug; the code address you've picked up on is just part of the generic failure handling code.... Greg Farnum
09:19 PM Bug #23258: OSDs keep crashing.
I was about to start a new bug and found this, I am also seeing 0xa74234 and ceph::__ceph_assert_fail...
A while b...
Brian Woods
09:22 PM Bug #20924: osd: leaked Session on osd.7
/a/sage-2018-04-04_02:28:04-rados-wip-sage2-testing-2018-04-03-1634-distro-basic-smithi/2351291
rados/verify/{ceph...
Sage Weil
09:21 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
Under discussion on the PR, which is good on its own terms but suffering from a prior CephFS bug. :( Greg Farnum
09:19 PM Bug #23297: mon-seesaw 'failed to become clean before timeout' due to laggy pg create
I suspect this is resolved in https://github.com/ceph/ceph/pull/19973 by the commit that has the OSDs proactively go ... Greg Farnum
09:16 PM Bug #23490: luminous: osd: double recovery reservation for PG when EIO injected (while already re...
David, can you look at this when you get a chance? I think it's due to EIO triggering recovery when recovery is alrea... Josh Durgin
09:13 PM Bug #23204: missing primary copy of object in mixed luminous<->master cluster with bluestore
We should see this again as we run the upgrade suite for mimic... Greg Farnum
09:08 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
https://github.com/ceph/ceph/pull/20933 Josh Durgin
09:07 PM Bug #23267 (Pending Backport): scrub errors not cleared on replicas can cause inconsistent pg sta...
Greg Farnum
07:25 PM Backport #23413 (Resolved): jewel: delete type mismatch in CephContext teardown
Nathan Cutler
07:23 PM Bug #20471 (Resolved): Can't repair corrupt object info due to bad oid on all replicas
Nathan Cutler
07:23 PM Backport #23181 (Resolved): jewel: Can't repair corrupt object info due to bad oid on all replicas
Nathan Cutler
06:24 PM Bug #21758 (Resolved): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
Nathan Cutler
06:24 PM Backport #21784 (Resolved): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make check...
Nathan Cutler
06:18 PM Feature #23242 (Resolved): ceph-objectstore-tool command to trim the pg log
Nathan Cutler
06:18 PM Backport #23307 (Resolved): jewel: ceph-objectstore-tool command to trim the pg log
Nathan Cutler
08:14 AM Feature #23552 (New): cache PK11Context in Connection and probably other consumers of CryptoKeyHa...
please see attached flamegraph, the 0.67% CPU cycle is used by PK11_CreateContextBySymKey(), if we cache the PK11Cont... Kefu Chai

04/03/2018

08:40 PM Bug #23145: OSD crashes during recovery of EC pg
Investigation results to date:
1. The local PGLog claims its _pg_log_t::can_rollback_to_ is **17348'18588**...
Radoslaw Zarzynski
08:59 AM Backport #22906 (Need More Info): jewel: bluestore: New OSD - Caught signal - bstore_kv_sync (thr...
non-trivial backport Nathan Cutler
08:56 AM Backport #22808 (Need More Info): jewel: "osd pool stats" shows recovery information bugly
non-trivial backport Nathan Cutler
08:33 AM Backport #22808 (In Progress): jewel: "osd pool stats" shows recovery information bugly
Nathan Cutler
08:28 AM Backport #22449 (In Progress): jewel: Visibility for snap trim queue length
https://github.com/ceph/ceph/pull/21200 Piotr Dalek
08:13 AM Backport #22449: jewel: Visibility for snap trim queue length
I don't think it's possible to backport the entire feature without breaking the Jewel->Luminous upgrade, so just the first commit... Piotr Dalek
08:22 AM Backport #22403 (In Progress): jewel: osd: replica read can trigger cache promotion
Nathan Cutler
08:15 AM Backport #22390 (In Progress): jewel: ceph-objectstore-tool: Add option "dump-import" to examine ...
Nathan Cutler
04:05 AM Backport #23486 (In Progress): jewel: scrub errors not cleared on replicas can cause inconsistent...
Nathan Cutler
02:35 AM Backport #21786 (In Progress): jewel: OSDMap cache assert on shutdown
Nathan Cutler

04/02/2018

05:35 PM Bug #23145: OSD crashes during recovery of EC pg
Anything new or info on what to do to try and recover this cluster? I don't even know how to get the pool deleted pro... Peter Woodman
10:28 AM Bug #23535: 'ceph --show-config --conf /dev/null' does not work any more
I just realized `--show-config` does not exist anymore. Probably it was removed intentionally? Mykola Golub

04/01/2018

07:49 AM Bug #23535 (Closed): 'ceph --show-config --conf /dev/null' does not work any more
Previously it could be used by users to return the default ceph configuration (see e.g. [1]), now it fails (even if w... Mykola Golub
07:03 AM Backport #21784 (In Progress): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make ch...
Nathan Cutler
06:58 AM Backport #22449 (Need More Info): jewel: Visibility for snap trim queue length
Backporting this feature to jewel at this late stage seems risky. Do we really need it in jewel? Nathan Cutler

03/30/2018

05:10 PM Bug #22123 (Resolved): osd: objecter sends out of sync with pg epochs for proxied ops
Nathan Cutler
05:09 PM Backport #23076 (Resolved): jewel: osd: objecter sends out of sync with pg epochs for proxied ops
Nathan Cutler
03:31 PM Bug #23511: forwarded osd_failure leak in mon
rerunning the tests at http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-rados-wip-slow-mon-ops-kefu-distro-basic-smi... Kefu Chai
01:02 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
Moving this to CI. This failure would only occur if the cls_XYX.so libraries could not be loaded during the execution... Jason Dillaman
02:59 AM Bug #23517 (Resolved): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
... Kefu Chai
05:25 AM Bug #23510: rocksdb spillover for hard drive configurations
Igor Fedotov wrote:
> Ben,
> this has been fixed by https://github.com/ceph/ceph/pull/19257
> Not sure about an ex...
Nathan Cutler
12:10 AM Bug #23403 (Triaged): Mon cannot join quorum
... Brad Hubbard

03/29/2018

06:39 PM Bug #21218 (Resolved): thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing...
Nathan Cutler
06:39 PM Backport #23024 (Resolved): luminous: thrash-eio + bluestore (hangs with unfound objects or read_...
Nathan Cutler
01:20 PM Backport #23024: luminous: thrash-eio + bluestore (hangs with unfound objects or read_log_and_mis...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20495
merged
Yuri Weinstein
03:39 PM Bug #23510: rocksdb spillover for hard drive configurations
Ben,
this has been fixed by https://github.com/ceph/ceph/pull/19257
Not sure about an exact Luminous build it lande...
Igor Fedotov
03:02 PM Bug #23510 (Resolved): rocksdb spillover for hard drive configurations
version: ceph-*-12.2.1-34.el7cp.x86_64
One of Bluestore's best use cases is to accelerate performance for writes o...
Ben England
03:33 PM Bug #22413 (Resolved): can't delete object from pool when Ceph out of space
Nathan Cutler
03:33 PM Backport #23114 (Resolved): luminous: can't delete object from pool when Ceph out of space
Nathan Cutler
01:19 PM Backport #23114: luminous: can't delete object from pool when Ceph out of space
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20585
merged
Yuri Weinstein
03:08 PM Bug #23511 (Can't reproduce): forwarded osd_failure leak in mon
see http://pulpito.ceph.com/kchai-2018-03-29_13:20:02-rados-wip-slow-mon-ops-kefu-distro-basic-smithi/2334154/
<p...
Kefu Chai
01:24 PM Bug #22847 (Resolved): ceph osd force-create-pg cause all ceph-mon to crash and unable to come up...
Nathan Cutler
01:24 PM Backport #22942 (Resolved): luminous: ceph osd force-create-pg cause all ceph-mon to crash and un...
Nathan Cutler
01:21 PM Backport #22942: luminous: ceph osd force-create-pg cause all ceph-mon to crash and unable to com...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20399
merged
Yuri Weinstein
01:23 PM Backport #23075 (Resolved): luminous: osd: objecter sends out of sync with pg epochs for proxied ops
Nathan Cutler
01:18 PM Backport #23075: luminous: osd: objecter sends out of sync with pg epochs for proxied ops
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20609
merged
Yuri Weinstein
10:28 AM Bug #19737 (Resolved): EAGAIN encountered during pg scrub (jewel)
Nathan Cutler
09:54 AM Backport #23500 (In Progress): luminous: snapmapper inconsistency, crash on luminous
Nathan Cutler
08:20 AM Backport #23500 (Resolved): luminous: snapmapper inconsistency, crash on luminous
https://github.com/ceph/ceph/pull/21118 Nathan Cutler
09:16 AM Bug #21844 (Resolved): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -ENOENT
Nathan Cutler
09:16 AM Backport #21923 (Resolved): jewel: Objecter::C_ObjectOperation_sparse_read throws/catches excepti...
Nathan Cutler
09:16 AM Bug #23403: Mon cannot join quorum
Hi all,
As asked on the ceph-users mailing list, here are the results of the following commands on the 3 monitors:...
Julien Lavesque
09:09 AM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
Happened again (jewel 10.2.11 integration testing) - http://qa-proxy.ceph.com/teuthology/smithfarm-2018-03-28_20:31:4... Nathan Cutler
08:25 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
I've seen this on our cluster (luminous, bluestore based), but was unable to reproduce it...
Restarting primary mon...
Marcin Gibula
01:43 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
When we reboot one host, some OSDs take a long time to start,
and one OSD only succeeds in starting after several tim...
tangwenjun tang
01:11 AM Bug #17170 (New): mon/monclient: update "unable to obtain rotating service keys when osd init" to...
We hit this issue again in Luminous. xie xingguo
08:16 AM Backport #23186 (Resolved): luminous: ceph tell mds.* <command> prints only one matching usage
Nathan Cutler
08:15 AM Bug #23212 (Resolved): bluestore: should recalc_allocated when decoding bluefs_fnode_t
Nathan Cutler
08:15 AM Backport #23256 (Resolved): luminous: bluestore: should recalc_allocated when decoding bluefs_fno...
Nathan Cutler
08:15 AM Bug #23298 (Resolved): filestore: do_copy_range replay bad return value
Nathan Cutler
08:14 AM Backport #23351 (Resolved): luminous: filestore: do_copy_range replay bad return value
Nathan Cutler
04:10 AM Bug #23228: scrub mismatch on objects

Just bytes
dzafman-2018-03-28_18:21:29-rados-wip-zafman-testing-distro-basic-smithi/2332093
[ERR] 3.0 scrub s...
David Zafman
04:07 AM Bug #23495 (Resolved): Need (SLOW_OPS) in whitelist for another yaml

A job may have failed because (SLOW_OPS) is missing from tasks/mon_clock_with_skews.yaml
dzafman-2018-03-28_18:2...
David Zafman
02:09 AM Feature #23493 (Resolved): config: strip/escape single-quotes in values when setting them via con...
At the moment, the config parsing state machine does not account for single-quotes as potential value enclosures, as ... Joao Eduardo Luis
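An illustrative ceph.conf fragment (the option and value are hypothetical) showing the behaviour described, where the quotes end up inside the parsed value instead of being stripped:

    [global]
    # intended value: 1/5 -- value actually stored today: '1/5' (quotes included)
    debug ms = '1/5'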
01:09 AM Bug #23492 (Resolved): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-e...

dzafman-2018-03-28_15:20:23-rados:standalone-wip-zafman-testing-distro-basic-smithi/2331804
In TEST_rados_get_ba...
David Zafman
12:29 AM Bug #22752 (Pending Backport): snapmapper inconsistency, crash on luminous
Kefu Chai

03/28/2018

10:58 PM Bug #23490 (Duplicate): luminous: osd: double recovery reservation for PG when EIO injected (whil...
During a luminous test run, this was hit:
http://pulpito.ceph.com/yuriw-2018-03-27_21:16:27-rados-wip-yuri5-testin...
Josh Durgin
10:26 PM Backport #23186: luminous: ceph tell mds.* <command> prints only one matching usage
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20664
merged
Yuri Weinstein
10:26 PM Backport #23256: luminous: bluestore: should recalc_allocated when decoding bluefs_fnode_t
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20771
merged
Yuri Weinstein
10:22 PM Backport #23351: luminous: filestore: do_copy_range replay bad return value
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20957
merged
Yuri Weinstein
06:06 PM Bug #23487 (Fix Under Review): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
PR: https://github.com/ceph/ceph/pull/21102 Mykola Golub
05:58 PM Bug #23487 (Resolved): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
We have the `ceph osd pool set erasure allow_ec_overwrites` command but there is no corresponding command to get the ... Mykola Golub
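A minimal illustration with a hypothetical pool name, showing the asymmetry this ticket is about:

    # this direction exists
    ceph osd pool set ecpool allow_ec_overwrites true
    # this is the missing counterpart requested here
    ceph osd pool get ecpool allow_ec_overwrites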
05:42 PM Backport #23486 (Resolved): jewel: scrub errors not cleared on replicas can cause inconsistent pg...
https://github.com/ceph/ceph/pull/21194 David Zafman
05:42 PM Backport #23485 (Resolved): luminous: scrub errors not cleared on replicas can cause inconsistent...
https://github.com/ceph/ceph/pull/21103 David Zafman
05:27 PM Bug #23267 (Fix Under Review): scrub errors not cleared on replicas can cause inconsistent pg sta...
https://github.com/ceph/ceph/pull/21101 David Zafman
11:21 AM Bug #22114 (Pending Backport): mon: ops get stuck in "resend forwarded message to leader"
Kefu Chai
08:15 AM Backport #23478 (In Progress): should not check for VERSION_ID
https://github.com/ceph/ceph/pull/21090 Kefu Chai
08:08 AM Backport #23478 (Resolved): should not check for VERSION_ID
https://github.com/ceph/ceph/pull/21090 Kefu Chai
08:07 AM Bug #23477 (Pending Backport): should not check for VERSION_ID
* https://github.com/ceph/ceph/pull/17787
* https://github.com/ceph/ceph/pull/21052
Kefu Chai
08:06 AM Bug #23477 (Resolved): should not check for VERSION_ID
As per os-release(5), VERSION_ID is optional. Kefu Chai
07:06 AM Bug #23352: osd: segfaults under normal operation
For those who want to check the coredump: you should use apport-unpack to unpack it first.
It crashed at /bui...
Kefu Chai
05:55 AM Backport #23413 (In Progress): jewel: delete type mismatch in CephContext teardown
https://github.com/ceph/ceph/pull/21084 Prashant D
01:28 AM Backport #23472 (In Progress): luminous: add --add-bucket and --move options to crushtool
https://github.com/ceph/ceph/pull/21079 Kefu Chai
12:57 AM Backport #23472 (Resolved): luminous: add --add-bucket and --move options to crushtool
https://github.com/ceph/ceph/pull/21079 Kefu Chai
12:50 AM Bug #23471 (Pending Backport): add --add-bucket and --move options to crushtool
https://github.com/ceph/ceph/pull/20183 Kefu Chai
12:49 AM Bug #23471 (Resolved): add --add-bucket and --move options to crushtool
When using crushtool to create a CRUSH map, it is not possible to create a complex CRUSH map; we have to edit the CRU... Kefu Chai
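A hedged sketch of how the new options might be used offline (the exact flag spelling, in particular --loc, is an assumption based on the ticket and PR):

    crushtool -i crushmap --add-bucket rack1 rack -o crushmap.new
    crushtool -i crushmap.new --move host1 --loc rack rack1 -o crushmap.final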

03/27/2018

10:46 PM Bug #23352: osd: segfaults under normal operation
Chris,
Was your stack identical to Alex's original description or was it more like the stack in #23431 ?
Brad Hubbard
10:39 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
I agree these are similar and the cause may indeed be the same however there are only two stack frames in this instan... Brad Hubbard
07:36 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
There's a coredump-in-apport on google drive in http://tracker.ceph.com/issues/23352 - it looks at the face of it sim... Kjetil Joergensen
01:06 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
I have seen this as well, on our cluster. We're using bluestore, ubuntu 16, latest luminous.
The crashes were totall...
Marcin Gibula
10:58 AM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
The ceph-osd comes from https://download.ceph.com/rpm-luminous/el7/x86_64/
I verified via md5sum whether the local co...
Dietmar Rieder
09:43 AM Bug #23431 (Need More Info): OSD Segmentation fault in thread_name:safe_timer
What's the exact version of the ceph-osd you are using (exact package URL if possible please).
You could try 'objd...
Brad Hubbard
02:52 PM Feature #22420 (Resolved): Add support for obtaining a list of available compression options
https://github.com/ceph/ceph/pull/20558 Kefu Chai
02:45 PM Bug #23215 (Resolved): config.cc: ~/.ceph/$cluster.conf is passed unexpanded to fopen()
https://github.com/ceph/ceph/pull/20774 Kefu Chai
09:49 AM Backport #23077: luminous: mon: ops get stuck in "resend forwarded message to leader"
might want to include https://github.com/ceph/ceph/pull/21057 also. Kefu Chai
09:49 AM Bug #22114 (Fix Under Review): mon: ops get stuck in "resend forwarded message to leader"
and https://github.com/ceph/ceph/pull/21057 Kefu Chai
01:35 AM Bug #22220 (Resolved): osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at...
Resolved for Fedora and just waiting on next DTS to ship on rhel/CentOS. Brad Hubbard

03/26/2018

11:27 PM Bug #23465: "Mutex.cc: 110: FAILED assert(r == 0)" ("AttributeError: 'tuple' object has no attrib...
This isn't related to that suite commit. Run manually, 'file' returns "remote/smithi150/coredump/1522085413.12350.cor... Josh Durgin
07:42 PM Bug #23465 (New): "Mutex.cc: 110: FAILED assert(r == 0)" ("AttributeError: 'tuple' object has no ...
I see latest commit https://github.com/ceph/ceph/commit/c6760eba50860d40e25483c3e4cee772f3ad4468#diff-289c6ff15fd25ac... Yuri Weinstein
09:11 AM Backport #23316 (Need More Info): jewel: pool create cmd's expected_num_objects is not correctly ...
To backport this to jewel, we need to skip mgr changes and qa/standalone/mon/osd-pool-create.sh related changes to be... Prashant D

03/24/2018

11:01 AM Bug #23352 (New): osd: segfaults under normal operation
Raising priority because it's a possible regression in Luminous. Nathan Cutler
07:52 AM Support #23455: osd: large number of inconsistent objects after recover or backfilling
The head object is also affected, but only a very small portion. Yao Ning
07:46 AM Support #23455: osd: large number of inconsistent objects after recover or backfilling
It is also affected in v10.2.5. It affects all snap objects, and no head object is affected. Yao Ning
07:36 AM Support #23455: osd: large number of inconsistent objects after recover or backfilling
It seems quite similar to issue http://tracker.ceph.com/issues/21388 Yao Ning
07:20 AM Support #23455 (Resolved): osd: large number of inconsistent objects after recover or backfilling
Large number of inconsistent objects after recovery or backfilling.
Reproduction method:
1) create rbd volume and, ...
Yao Ning
07:09 AM Bug #23430 (Resolved): PGs are stuck in 'creating+incomplete' status on vstart cluster
Mykola Golub

03/23/2018

04:47 PM Bug #23352: osd: segfaults under normal operation
"Me too". I had a brief look at the coredump, without becoming all that much wiser. Judging by the lock attached to t... Kjetil Joergensen
02:44 PM Bug #23145: OSD crashes during recovery of EC pg
Sorry Josh, I saw the ec pool min_size is equal to 5; I need to verify with our test engineer tomorrow... the two environme... Zengran Zhang
06:21 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
And the next three ... Jan Marquardt
05:18 AM Bug #23440: duplicated "commit_queued_for_journal_write" events in OpTracker
OK. then I'm going to close mine. Yanhu Cao
05:13 AM Bug #23440: duplicated "commit_queued_for_journal_write" events in OpTracker
Yanhu Cao wrote:
> https://github.com/ceph/ceph/pull/21017
Hi, Yanhu, thank you for your contribution. My intern ...
Chang Liu
03:37 AM Bug #23440: duplicated "commit_queued_for_journal_write" events in OpTracker
https://github.com/ceph/ceph/pull/21017 Yanhu Cao
02:41 AM Backport #23077 (In Progress): luminous: mon: ops get stuck in "resend forwarded message to leader"
Include both PRs from comment#2:
https://github.com/ceph/ceph/pull/21016
Prashant D

03/22/2018

06:56 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
And the next three OSDs crashed:... Jan Marquardt
11:49 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
With #23258 we already had a similar issue and I am wondering if this is something you always have to expect with Cep... Jan Marquardt
11:46 AM Bug #23439 (New): Crashing OSDs after 'ceph pg repair'
Yesterday, ceph reported scrub errors.... Jan Marquardt
06:14 PM Bug #23430 (Fix Under Review): PGs are stuck in 'creating+incomplete' status on vstart cluster
PR: https://github.com/ceph/ceph/pull/21008 Mykola Golub
05:54 PM Bug #23430 (In Progress): PGs are stuck in 'creating+incomplete' status on vstart cluster
Mykola Golub
05:54 PM Bug #23430: PGs are stuck in 'creating+incomplete' status on vstart cluster
I think the problem is that `ceph config` sets osd_pool_default_erasure_code_profile too late: when the cluster alrea... Mykola Golub
05:05 PM Bug #23430: PGs are stuck in 'creating+incomplete' status on vstart cluster
I think it is still worth investigating.
Previously the default profile just worked on vstart clusters, and now it d...
Mykola Golub
03:51 PM Bug #23430: PGs are stuck in 'creating+incomplete' status on vstart cluster
I did further investigation here and figured out this issue occurs due to the "special" situation of my vstart enviro... Tatjana Dehler
01:59 PM Bug #23440 (In Progress): duplicated "commit_queued_for_journal_write" events in OpTracker
... Chang Liu
01:38 PM Bug #23352: osd: segfaults under normal operation
Also seeing these, no core dump but have now had 3 segfaults in 2 weeks since upgrading to 12.2.4 from a very stable ... Chris Hoy Poy
11:15 AM Bug #23145: OSD crashes during recovery of EC pg
Hi Josh, here is the log I said I would offer in APAC, with debug_osd=30 & debug_bluestore=30. It's another environment, s... Zengran Zhang
08:08 AM Bug #23145: OSD crashes during recovery of EC pg
Xiaofei Cui wrote:
> We think we have met the same problem.
> The pginfos:
>
> [...]
>
> We have no idea why...
Josh Durgin
05:21 AM Backport #23412 (In Progress): luminous: delete type mismatch in CephContext teardown
https://github.com/ceph/ceph/pull/20998 Prashant D
02:55 AM Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica tak...
I reproduced this by creating an inconsistent pg and then causing it to split.
A pool of size 2 with 1 pg, and I crea...
David Zafman
02:02 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Got it! I couldn't use the pause feature because none of the get/get-bytes/stat stuff would work, it all got stuck.
...
Ryan Anstey

03/21/2018

11:38 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...

I'm confused because you are dealing with 2 different objects.
Does rb.0.854e.238e1f29.000000140b6d still have a...
David Zafman
11:12 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Ah, I got confused by stuff. Should I just not stop the OSDs, or just stop during the get, start for the put?
I ju...
Ryan Anstey
11:00 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...

I don't know how rados get/put could work while the PG's OSDs are all stopped. Also, 'rados get' will give EIO erro...
David Zafman
10:39 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
So the stuff I did above did not work, the result of the repair after get/put:... Ryan Anstey
10:08 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
So for some reason my rados get/put commands are working now, not sure why. After I complete all my steps, the repair... Ryan Anstey
09:31 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
I'm trying the same thing on a different broken pg, while it's stuck the pg detail is:... Ryan Anstey
09:00 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
The time to write should be on the same order as reading.
You forgot to restart your osd before running rados.
David Zafman
08:38 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
David Zafman wrote:
> With client activity stopped, read the data from this object and write it again using rados ge...
Ryan Anstey
06:15 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...

You have a data_digest issue, not an omap_digest one. You can remove the temporary omap entry. Since shards 8, 13 ...
David Zafman
05:57 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
I'm trying to fix my problems but I'm kind of a noob, having trouble getting things to work. My cluster seems to be d... Ryan Anstey
07:38 PM Bug #23228: scrub mismatch on objects
/a/dzafman-2018-03-21_09:57:19-rados:thrash-wip-zafman-testing2-distro-basic-smithi/2312125
rados:thrash/{0-size-m...
David Zafman
11:33 AM Backport #23408 (In Progress): luminous: mgrc's ms_handle_reset races with send_pgstats()
https://github.com/ceph/ceph/pull/20987 Prashant D
09:14 AM Bug #23431 (Duplicate): OSD Segmentation fault in thread_name:safe_timer
I noticed an OSD segmentation fault in one of our OSDs' logs.
See the attached log entries. There is no core file tha...
Dietmar Rieder
08:27 AM Bug #23430 (Resolved): PGs are stuck in 'creating+incomplete' status on vstart cluster
Hi,
The PGs are stuck in 'creating+incomplete' status after creating an erasure coded pool on a vstart cluster.
...
Tatjana Dehler
03:05 AM Bug #23428 (New): Snapset inconsistency is hard to diagnose because authoritative copy used by li...
... David Zafman
02:27 AM Bug #23145: OSD crashes during recovery of EC pg
We think we have met the same problem.
The pginfos:...
Xiaofei Cui

03/20/2018

12:32 PM Bug #23145 (In Progress): OSD crashes during recovery of EC pg
Radoslaw Zarzynski
12:32 PM Bug #23145: OSD crashes during recovery of EC pg
Sorry for missing your updates, Peter. :-( I've just scripted my Gmail for _X-Redmine-Project: bluestore_.
From th...
Radoslaw Zarzynski

03/19/2018

09:35 PM Bug #23145 (New): OSD crashes during recovery of EC pg
Nathan Cutler
07:32 PM Bug #23145: OSD crashes during recovery of EC pg
Can't seem to flip this ticket out of 'Needs more info', unfortunately.. Peter Woodman
04:42 PM Backport #23413 (Resolved): jewel: delete type mismatch in CephContext teardown
https://github.com/ceph/ceph/pull/21084 Nathan Cutler
04:42 PM Backport #23412 (Resolved): luminous: delete type mismatch in CephContext teardown
https://github.com/ceph/ceph/pull/20998 Nathan Cutler
04:42 PM Backport #23408 (Resolved): luminous: mgrc's ms_handle_reset races with send_pgstats()
https://github.com/ceph/ceph/pull/23791 Nathan Cutler
04:26 PM Bug #23267 (In Progress): scrub errors not cleared on replicas can cause inconsistent pg state wh...
David Zafman
04:00 PM Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica tak...
David Zafman
01:00 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
It appears the error is in calculating the host weight:
it has been set to 43.664 when it should be 43.668
...
Warren Jeffs
10:34 AM Bug #23403 (Closed): Mon cannot join quorum
Hi all,
On a 3-mon cluster running infernalis, one of the mons left the quorum and we are unable to make it come bac...
Gauvain Pocentek
10:23 AM Backport #23351 (In Progress): luminous: filestore: do_copy_range replay bad return value
https://github.com/ceph/ceph/pull/20957 Prashant D
09:24 AM Bug #23402 (Duplicate): objecter: does not resend op on split interval
... Sage Weil
09:01 AM Bug #23370 (Pending Backport): mgrc's ms_handle_reset races with send_pgstats()
Kefu Chai

03/18/2018

10:19 PM Bug #23339 (Resolved): Scrub errors after ec-small-objects-overwrites test
http://pulpito.ceph.com/sage-2018-03-18_09:19:17-rados-wip-sage-testing-2018-03-18-0231-distro-basic-smithi/ Sage Weil

03/17/2018

02:08 AM Bug #23395: qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core dump

../qa/run-standalone.sh ceph_objectstore_tool.py
--- ../qa/standalone/special/ceph_objectstore_tool.py ---
vst...
David Zafman
02:05 AM Bug #23395 (Can't reproduce): qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core...

I assume erasure code profile handling must have changed. It shouldn't crash but we may need a test change too.
...
David Zafman

03/16/2018

10:38 PM Feature #23364: Special scrub handling of hinfo_key errors
https://github.com/ceph/ceph/pull/20947 David Zafman
08:37 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
It appears Paul Emmerich has found the problem, and it's down to the weights.
The email chain can be seen from the mailin...
Warren Jeffs
09:22 AM Bug #23386 (Resolved): crush device class: Monitor Crash when moving Bucket into Default root
When moving prestaged hosts whose disks sit outside of a root into that root, the monitor crashes... Warren Jeffs
08:08 PM Bug #23339 (Fix Under Review): Scrub errors after ec-small-objects-overwrites test
http://pulpito.ceph.com/sage-2018-03-16_17:59:04-rados:thrash-erasure-code-overwrites-wip-sage-testing-2018-03-16-112... Sage Weil
05:09 PM Bug #23352: osd: segfaults under normal operation
Here is the link to the core dump https://drive.google.com/open?id=1tOTqSOaS94gOhHfXmGbbfuXLNFFfOVuf Alex Gorbachev
04:34 PM Bug #23324 (Pending Backport): delete type mismatch in CephContext teardown
Kefu Chai
03:03 AM Bug #23324 (In Progress): delete type mismatch in CephContext teardown
https://github.com/ceph/ceph/pull/20930 Brad Hubbard
01:38 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
Forgot to mention the exact place it breaks:... Daniel Glaser
10:21 AM Bug #23387 (Resolved): Building Ceph on armhf fails due to out-of-memory
Hi,
I'm currently struggling with building ceph through make-deps.sh on an armhf board (namely the ODROID HC2). Everythin...
Daniel Glaser
09:16 AM Bug #23385: osd: master osd crash when pg scrub
The ceph version is 10.2.3 rongzhen zhan
09:11 AM Bug #23385 (New): osd: master osd crash when pg scrub
My ceph is on arm 4.4.52-armada-17.06.2. I put an object into rados. When the pg is scrubbed by hand, the master osd crashes. Bel... rongzhen zhan
08:56 AM Bug #23320: OSD suicide itself because of a firewall rule but reports a received signal
Can I have some input on this topic? I can make the PR but I'd love to have your opinion on it.
Thx,
Anonymous

03/15/2018

06:00 PM Bug #23145: OSD crashes during recovery of EC pg
Let me know if you need anything else off this cluster, I probably will have to trash this busted PG at some point so... Peter Woodman
05:37 AM Bug #23370 (Fix Under Review): mgrc's ms_handle_reset races with send_pgstats()
https://github.com/ceph/ceph/pull/20909 Kefu Chai
05:34 AM Bug #23370 (Resolved): mgrc's ms_handle_reset races with send_pgstats()
2018-03-14T12:29:45.168 INFO:teuthology.orchestra.run.mira056:Running: 'sudo adjust-ulimits ceph-coverage /home/ubunt... Kefu Chai
05:34 AM Bug #23371 (New): OSDs flaps when cluster network is made down
We have a 5-node cluster with 5 mons and 120 OSDs equally distributed.
As part of our resiliency tests we ma...
Nokia ceph-users
04:06 AM Backport #23315 (In Progress): luminous: pool create cmd's expected_num_objects is not correctly ...
https://github.com/ceph/ceph/pull/20907 Prashant D

03/14/2018

09:37 PM Bug #22346: OSD_ORPHAN issues after jewel->luminous upgrade, but orphaned osds not in crushmap
Hi Jun,
It's not really possible to pinpoint an exact PR at this stage as it's possible there was more than one an...
Brad Hubbard
10:19 AM Bug #22346: OSD_ORPHAN issues after jewel->luminous upgrade, but orphaned osds not in crushmap
Brad Hubbard wrote:
> Hi Graham,
>
> The consensus is that this was caused by a bug in a previous release which f...
huang jun
08:41 PM Bug #23365 (New): CEPH device class not honored for erasure encoding.
To start, this cluster isn't happy. It is my destructive testing/learning cluster.
Recently I rebuilt the cluster...
Brian Woods
08:36 PM Feature #23364 (Resolved): Special scrub handling of hinfo_key errors

We shouldn't handle hinfo_key as just another user xattr
Add the following errors specific to hinfo_key for eras...
David Zafman
06:32 PM Bug #23361 (New): /build/ceph-12.2.4/src/osd/PGLog.h: 888: FAILED assert(i->prior_version == last...

Log with debug_osd=20 and debug_bluestore=20 enabled:
https://drive.google.com/open?id=1Yr_MIXHzrgWUR5ZsV1xKlPUqZH...
Christoffer Lilja
04:49 PM Bug #23360 (Duplicate): call to 'ceph osd erasure-code-profile set' asserts the monitors
duplicate of http://tracker.ceph.com/issues/23345 Joao Eduardo Luis
04:16 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
A proper fix would be to provide a proper error message in @OSDMonitor::parse_erasure_code_profile@ instead of assert... Sebastian Wagner
04:15 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
Found the cause of this. From the mon.a.log:... Sebastian Wagner
03:08 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
Hm, quite possible that this is in fact not a classic deadlock.
Turns out, the `ceph` command line tool is also br...
Sebastian Wagner
02:52 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
The @send_command()@ function visible in this traceback is: https://github.com/ceph/ceph/pull/20865/files#diff-188b91... Sebastian Wagner
02:48 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
Could you point to the code, or provide a small python example, that triggers this deadlock? Ricardo Dias
02:37 PM Bug #23360 (Duplicate): call to 'ceph osd erasure-code-profile set' asserts the monitors
I've attached `thread apply all bt` mixed with `thread apply all py-bt`
Threads 38, 35, 34, 32 and 31 are waiting for...
Sebastian Wagner
03:48 PM Bug #23352: osd: segfaults under normal operation
Sage, I PM'ed you the public download link, hope it works. Alex Gorbachev
03:39 PM Bug #23352: osd: segfaults under normal operation
Hi Sage, I do have the core dump. Where can I upload the file? It's rather large, 850 MB compressed. Alex Gorbachev
01:54 PM Bug #23352 (Need More Info): osd: segfaults under normal operation
Do you have a core file? I haven't seen this crash before. Sage Weil
02:13 AM Bug #23352 (Resolved): osd: segfaults under normal operation
-1> 2018-03-13 22:03:27.390956 7f42eec36700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1520993007390955, "job": 454,... Alex Gorbachev
01:58 PM Bug #23345: `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
Sage Weil
01:55 PM Bug #23339: Scrub errors after ec-small-objects-overwrites test
Sage Weil
10:59 AM Bug #22351: Couldn't init storage provider (RADOS)
@Brad - that's perfect, thanks. Backport PR open. Nathan Cutler
10:27 AM Bug #22351: Couldn't init storage provider (RADOS)
@Nathan Oops, sorry mate, my bad.
These are the two we need.
https://github.com/ceph/ceph/pull/20022
https:/...
Brad Hubbard
09:44 AM Bug #22351: Couldn't init storage provider (RADOS)
@Brad - I was confused because you changed the status to Resolved, apparently before the backport was done.
Could ...
Nathan Cutler
12:25 AM Bug #22351: Couldn't init storage provider (RADOS)
@Nathan There wasn't one, I just set the backport field?
Just let me know if you need any action from me on this.
Brad Hubbard
10:57 AM Backport #23349 (In Progress): luminous: Couldn't init storage provider (RADOS)
Nathan Cutler
07:13 AM Documentation #23354 (Resolved): doc: osd_op_queue & osd_op_queue_cut_off
In the docs the osd_op_queue default is listed as `prio`, but the real default is `wpq`, so this is a docs bug.
If I understand properly: if o...
Konstantin Shalygin
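A quick way to confirm the effective values on a running OSD, which is how the mismatch with the docs shows up (osd.0 is a placeholder):

    ceph daemon osd.0 config get osd_op_queue
    ceph daemon osd.0 config get osd_op_queue_cut_off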
05:12 AM Backport #23312 (In Progress): luminous: invalid JSON returned when querying pool parameters
https://github.com/ceph/ceph/pull/20890 Prashant D

03/13/2018

11:15 PM Backport #23307 (In Progress): jewel: ceph-objectstore-tool command to trim the pg log
David Zafman
10:22 PM Backport #23351 (Resolved): luminous: filestore: do_copy_range replay bad return value
https://github.com/ceph/ceph/pull/20957 Nathan Cutler
10:22 PM Bug #23298 (Pending Backport): filestore: do_copy_range replay bad return value
Sage Weil
10:13 PM Backport #23323 (Resolved): luminous: ERROR type entries of pglog do not update min_last_complete...
Nathan Cutler
09:58 PM Backport #23349 (Resolved): luminous: Couldn't init storage provider (RADOS)
https://github.com/ceph/ceph/pull/20896 Nathan Cutler
09:46 PM Bug #22351 (Pending Backport): Couldn't init storage provider (RADOS)
@Brad, I missed which PR is the luminous backport PR? Nathan Cutler
09:27 PM Bug #22887: osd/ECBackend.cc: 2202: FAILED assert((offset + length) <= (range.first.get_off() + r...
Here's another: /ceph/teuthology-archive/pdonnell-2018-03-11_22:42:18-multimds-wip-pdonnell-testing-20180311.180352-t... Patrick Donnelly
09:20 PM Bug #23345: `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
Running either... Joao Eduardo Luis
09:09 PM Bug #23345 (Resolved): `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
Coming into OSDMonitor::parse_erasure_code_profile() will trigger an assert that probably should be an error instead.... Joao Eduardo Luis
08:58 PM Bug #22902 (In Progress): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine eve...
David Zafman
08:55 PM Bug #23282 (New): If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0...
Greg Farnum
11:44 AM Bug #23282 (Closed): If you add extra characters to an fsid, it gets parsed as "00000000-0000-000...
John Spray
04:00 AM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
Greg Farnum wrote:
> So it got better when you took away the extra "80" prefix?
Yes, my mistake.
Amine Liu
05:16 PM Bug #23339 (Resolved): Scrub errors after ec-small-objects-overwrites test

dzafman-2018-03-12_08:11:53-rados-wip-zafman-testing-distro-basic-smithi/2283533...
David Zafman
07:13 AM Bug #23258: OSDs keep crashing.
... Jan Marquardt
07:04 AM Bug #23258: OSDs keep crashing.
We are now having the same issue on osd.1, osd.11, osd.20 and osd.25, each located on a different host. osd.1 uses file... Jan Marquardt
06:13 AM Bug #23324: delete type mismatch in CephContext teardown
This has to do with the use of placement new in the overload of Log::create_entry with the expected_size argument. I'... Brad Hubbard

03/12/2018

10:56 PM Bug #22902: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")

OSD 4 is the primary [6,5,4]/[4,5,7] with osd.6 crashing...
David Zafman
09:18 PM Bug #22050 (Resolved): ERROR type entries of pglog do not update min_last_complete_ondisk, potent...
Josh Durgin
04:45 PM Bug #22050 (Pending Backport): ERROR type entries of pglog do not update min_last_complete_ondisk...
Josh Durgin
09:12 PM Bug #23325: osd_max_pg_per_osd.py: race between pool creation and wait_for_clean
Seen here as well: http://pulpito.ceph.com/nojha-2018-03-02_23:59:23-rados-wip-async-recovery-2018-03-02-distro-basic... Neha Ojha
09:06 PM Bug #23325 (New): osd_max_pg_per_osd.py: race between pool creation and wait_for_clean
Seen in http://pulpito.ceph.com/joshd-2018-03-12_15:49:43-rados-wip-pg-log-trim-error-luminous-distro-basic-smithi/22... Josh Durgin
06:22 PM Bug #23324: delete type mismatch in CephContext teardown
It looks more to me like we're allocating an object of one type (Entry) and then casting it to another (Log)? Is ther... Jeff Layton
05:16 PM Bug #23324: delete type mismatch in CephContext teardown
I don't recognize this from elsewhere and it looks like the kind of issue that could arise from trying to delete some... Greg Farnum
04:56 PM Bug #23324: delete type mismatch in CephContext teardown
Package in this case is:
librados2-13.0.1-2356.gf2b88f364515.fc27.x86_64
Jeff Layton
04:51 PM Bug #23324 (Resolved): delete type mismatch in CephContext teardown
I've been hunting some memory corruption in ganesha and ran across this. Seems unlikely to be the cause of the crashe... Jeff Layton
05:19 PM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
So it got better when you took away the extra "80" prefix? Greg Farnum
06:31 AM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
My mistake. I don't know why there's an extra "80" in the fsid in my conf.
Amine Liu
05:19 PM Bug #23290: "/test/osd/RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basi...
Is that the "the disk errored out" bug? Greg Farnum
04:44 PM Backport #23323 (Resolved): luminous: ERROR type entries of pglog do not update min_last_complete...
https://github.com/ceph/ceph/pull/20851 Josh Durgin
01:39 PM Bug #22656: scrub mismatch on bytes (cache pools)
/a/sage-2018-03-11_23:03:25-rados-wip-sage2-testing-2018-03-10-1616-distro-basic-smithi/2280391
description: rados...
Sage Weil
01:09 PM Bug #23320: OSD suicide itself because of a firewall rule but reports a received signal
I used this url https://www.mkssoftware.com/docs/man5/siginfo_t.5.asp#Signal_Codes to get a better understanding of t... Anonymous
01:08 PM Bug #23320: OSD suicide itself because of a firewall rule but reports a received signal
I'm attaching the patch for more readability. Anonymous
11:13 AM Bug #23320 (Resolved): OSD suicide itself because of a firewall rule but reports a received signal
We (leseb & I) had an issue where the OSD crashes with the following message:
2018-03-08 14:30:26.042607 7f6142b7...
Anonymous
10:40 AM Bug #23281 (Resolved): run-tox-ceph-disk fails in luminous's "make check" run by jenkins
Nathan Cutler
10:39 AM Bug #23283 (Duplicate): os/bluestore:cache arise a Segmentation fault
Duplicated https://tracker.ceph.com/issues/21259 Igor Fedotov
10:23 AM Bug #23258: OSDs keep crashing.
After extending the cluster to 40 osds and removing osd.11 from it, the problem has moved to osd.1:... Jan Marquardt
09:16 AM Backport #23316 (Resolved): jewel: pool create cmd's expected_num_objects is not correctly interp...
https://github.com/ceph/ceph/pull/22050 Nathan Cutler
09:16 AM Backport #23315 (Resolved): luminous: pool create cmd's expected_num_objects is not correctly int...
https://github.com/ceph/ceph/pull/20907 Nathan Cutler
09:14 AM Backport #23312 (Resolved): luminous: invalid JSON returned when querying pool parameters
https://github.com/ceph/ceph/pull/20890 Nathan Cutler
09:14 AM Backport #23307 (Resolved): jewel: ceph-objectstore-tool command to trim the pg log
https://github.com/ceph/ceph/pull/20882 Nathan Cutler

03/11/2018

11:04 PM Bug #23297: mon-seesaw 'failed to become clean before timeout' due to laggy pg create
/a/sage-2018-03-11_02:12:48-rados-wip-sage2-testing-2018-03-10-1616-distro-basic-smithi/2276594 Sage Weil
02:19 AM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
Anyways, the only place where this can happen is if @snap_seq < max(removed_snaps)@ because the deletion request inse... Paul Emmerich
12:36 AM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
Well, turns out there were both 12.2.1 and 12.2.4 clients doing snapshot operations. This messed up removed_snaps due... Paul Emmerich
 
