Activity
From 03/10/2018 to 04/08/2018
04/08/2018
- 07:55 PM Bug #23595: osd: recovery/backfill is extremely slow
- For the record, I installed the following debugging packages for gdb stack traces:...
- 07:53 PM Bug #23595: osd: recovery/backfill is extremely slow
- I have read https://www.spinics.net/lists/ceph-devel/msg38331.html which suggests that there is some throttling going...
- 06:17 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
- I made a Ceph 12.2.4 (luminous stable) cluster of 3 machines with 10-Gigabit networking on Ubuntu 16.04, using pretty...
- 05:40 PM Bug #23593: RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- PR: https://github.com/ceph/ceph/pull/21290
- 03:10 PM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- ...
- 04:31 PM Documentation #23594: auth: document what to do when locking client.admin out
- I found one way to fix it on the mailing list:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/01...
- 04:23 PM Documentation #23594 (New): auth: document what to do when locking client.admin out
- I accidentally ran ...
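For readers hitting the same lockout, one commonly cited recovery path is to authenticate as `mon.` using the monitor's local keyring, which retains full monitor caps. This is a sketch, not taken from the truncated mailing-list post; the data-dir path assumes a default cluster named "ceph":

```shell
# Sketch, assuming a default "ceph" cluster name; adjust the mon data dir.
# The mon. key in the monitor's own data directory retains 'allow *' on the
# monitor, so it can restore client.admin's capabilities:
ceph -n mon. -k "/var/lib/ceph/mon/ceph-$(hostname -s)/keyring" \
    auth caps client.admin \
    mon 'allow *' osd 'allow *' mds 'allow *' mgr 'allow *'
```

Afterwards, `ceph -s` with the usual admin keyring should work again.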
- 11:06 AM Bug #23590: kstore: statfs: (95) Operation not supported
- https://github.com/ceph/ceph/pull/21287
- 11:01 AM Bug #23590 (Fix Under Review): kstore: statfs: (95) Operation not supported
- 2018-04-07 16:19:07.248 7fdec4675700 -1 osd.0 0 statfs() failed: (95) Operation not supported
2018-04-07 16:19:08....
- 08:50 AM Bug #23589 (New): jewel: KStore Segmentation fault in ceph_test_objectstore --gtest_filter=-*/2:-*/3
- Test description: rados/objectstore/objectstore.yaml
Log excerpt:...
- 08:39 AM Bug #23588 (New): LibRadosAioEC.IsCompletePP test fails in jewel 10.2.11 integration testing
- Test description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yam...
- 06:53 AM Bug #23511: forwarded osd_failure leak in mon
- Greg, no. both tests below include the no_reply() fix.
see
- http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-r...
- 06:42 AM Bug #23585 (Duplicate): osd: safe_timer segfault
- ...
04/07/2018
- 03:04 AM Bug #23195: Read operations segfaulting multiple OSDs
Change the test-erasure-eio.sh test as follows:...
04/06/2018
- 10:23 PM Bug #22165 (Fix Under Review): split pg not actually created, gets stuck in state unknown
- Fixed by https://github.com/ceph/ceph/pull/20469
- 09:29 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
- You'll definitely get more attention and advice if somebody else has hit this issue before.
- 08:45 PM Bug #23195: Read operations segfaulting multiple OSDs
- For anyone running into the send_all_remaining_reads() crash, a workaround is to use these osd settings:...
- 04:17 PM Bug #23195 (Fix Under Review): Read operations segfaulting multiple OSDs
- https://github.com/ceph/ceph/pull/21273
I'm going to treat this issue as tracking the first crash, in send_all_rem...
- 03:10 AM Bug #23195 (In Progress): Read operations segfaulting multiple OSDs
- 08:41 PM Bug #23200 (Resolved): invalid JSON returned when querying pool parameters
- 08:40 PM Backport #23312 (Resolved): luminous: invalid JSON returned when querying pool parameters
- 07:28 PM Backport #23312: luminous: invalid JSON returned when querying pool parameters
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20890
merged
- 08:40 PM Bug #23324 (Resolved): delete type mismatch in CephContext teardown
- 08:40 PM Backport #23412 (Resolved): luminous: delete type mismatch in CephContext teardown
- 07:28 PM Backport #23412: luminous: delete type mismatch in CephContext teardown
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20998
merged
- 08:38 PM Bug #23477 (Resolved): should not check for VERSION_ID
- 08:38 PM Backport #23478 (Resolved): should not check for VERSION_ID
- 07:26 PM Backport #23478: should not check for VERSION_ID
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21090
merged
- 06:03 PM Bug #21833 (Resolved): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
- 06:02 PM Backport #23160 (Resolved): luminous: Multiple asserts caused by DNE pgs left behind after lots o...
- 03:57 PM Backport #23160: luminous: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
- Prashant D wrote:
> Waiting for code review for backport PR : https://github.com/ceph/ceph/pull/20668
merged
- 06:02 PM Bug #23078 (Resolved): SRV resolution fails to lookup AAAA records
- 06:02 PM Backport #23174 (Resolved): luminous: SRV resolution fails to lookup AAAA records
- 03:56 PM Backport #23174: luminous: SRV resolution fails to lookup AAAA records
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20710
merged
- 05:57 PM Backport #23472 (Resolved): luminous: add --add-bucket and --move options to crushtool
- 03:53 PM Backport #23472: luminous: add --add-bucket and --move options to crushtool
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21079
merged
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
- 05:37 PM Bug #23578 (Resolved): large-omap-object-warnings test fails
- ...
- 03:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Sorry, forgot to mention I am running 12.2.4.
- 03:50 PM Bug #23576 (Can't reproduce): osd: active+clean+inconsistent pg will not scrub or repair
- My apologies if I'm too premature in posting this.
Myself and so far two others on the mailing list: http://lists....
- 03:44 AM Bug #23345 (Resolved): `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
- https://github.com/ceph/ceph/pull/20986
- 01:57 AM Bug #21737 (Resolved): OSDMap cache assert on shutdown
- 01:56 AM Backport #21786 (Resolved): jewel: OSDMap cache assert on shutdown
04/05/2018
- 09:12 PM Bug #22887 (Duplicate): osd/ECBackend.cc: 2202: FAILED assert((offset + length) <= (range.first.g...
- 09:12 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- From #22887, this also appeared in /ceph/teuthology-archive/pdonnell-2018-01-30_23:38:56-kcephfs-wip-pdonnell-i22627-...
- 09:09 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- That was the fix I was wondering about, but it was merged to master as https://github.com/ceph/ceph/pull/15712 and so...
- 09:05 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- https://github.com/ceph/ceph/pull/15712
- 09:10 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
- https://github.com/ceph/ceph/pull/15712
- 06:35 PM Bug #22351 (Resolved): Couldn't init storage provider (RADOS)
- 06:35 PM Backport #23349 (Resolved): luminous: Couldn't init storage provider (RADOS)
- 05:22 PM Backport #23349: luminous: Couldn't init storage provider (RADOS)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20896
merged
- 06:33 PM Bug #22114 (Resolved): mon: ops get stuck in "resend forwarded message to leader"
- 06:33 PM Backport #23077 (Resolved): luminous: mon: ops get stuck in "resend forwarded message to leader"
- 04:57 PM Backport #23077: luminous: mon: ops get stuck in "resend forwarded message to leader"
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21016
merged
- 06:31 PM Bug #22752 (Resolved): snapmapper inconsistency, crash on luminous
- 06:31 PM Backport #23500 (Resolved): luminous: snapmapper inconsistency, crash on luminous
- 04:55 PM Backport #23500: luminous: snapmapper inconsistency, crash on luminous
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21118
merged
- 05:14 PM Bug #23565 (Fix Under Review): Inactive PGs don't seem to cause HEALTH_ERR
- In looking at https://tracker.ceph.com/issues/23562, there were inactive PGs starting at...
- 04:43 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
- ...
- 04:18 PM Bug #23564 (Duplicate): OSD Segfaults
- Apr 5 11:40:31 roc05r-sc3a100 kernel: [126029.543698] safe_timer[28863]: segfault at 8d ip 00007fa9ad4dcccb sp 00007...
- 12:24 PM Bug #23562 (New): VDO OSD caused cluster to hang
- I awoke to alerts that apache serving teuthology logs on the Octo Long Running Cluster was unresponsive.
Here was ...
- 08:37 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
- Hi Greg,
thanks for your response.
> That URL denies access. You can use ceph-post-file instead to upload logs ...
- 03:31 AM Bug #23403: Mon cannot join quorum
- My apologies. It appears my previous analysis was incorrect.
I've pored over the logs and it appears the issue is ...
04/04/2018
- 11:19 PM Bug #23554: mon: mons need to be aware of VDO statistics
- Right, but AFAICT the monitor is then not even aware of VDO being involved. Which seems fine to my naive thoughts, bu...
- 11:05 PM Bug #23554: mon: mons need to be aware of VDO statistics
- Of course Sage is already on it :)
I don't know where the ...
- 10:46 PM Bug #23554: mon: mons need to be aware of VDO statistics
- At least this: https://github.com/ceph/ceph/pull/20516
- 10:44 PM Bug #23554: mon: mons need to be aware of VDO statistics
- What would we expect this monitor awareness to look like? Extra columns duplicating the output of vdostats?
- 05:48 PM Bug #23554 (New): mon: mons need to be aware of VDO statistics
- I created an OSD on top of a logical volume with a VDO device underneath.
Ceph is unaware of how much compression ...
- 09:58 PM Bug #23128: invalid values in ceph.conf do not issue visible warnings
- http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/ has been updated with information about this
- 09:53 PM Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
- Can you reproduce with osds configured with:...
- 09:43 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
- That URL denies access. You can use ceph-post-file instead to upload logs to a secure location.
It's not clear wha...
- 09:39 PM Bug #23320 (Fix Under Review): OSD suicide itself because of a firewall rule but reports a receiv...
- github.com/ceph/ceph/pull/21000
- 09:37 PM Bug #23487: There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- 09:31 PM Bug #23510 (Resolved): rocksdb spillover for hard drive configurations
- 09:31 PM Bug #23511: forwarded osd_failure leak in mon
- Kefu, did your latest no_reply() PR resolve this?
- 09:29 PM Bug #23535 (Closed): 'ceph --show-config --conf /dev/null' does not work any more
- Yeah, you should use the monitor config commands now! :)
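To spell out the suggested replacements: the daemon names `mon.a` and `osd.0` below are examples, and the `ceph config` subcommands assume a post-luminous cluster with the monitor-backed config database.

```shell
# Effective configuration of a running daemon, via its admin socket:
ceph daemon mon.a config show

# With the centralized config database (mimic and later):
ceph config dump                         # all options stored in the monitors
ceph config get osd.0 osd_max_backfills  # one option for one daemon
```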
- 09:28 PM Bug #23258: OSDs keep crashing.
- Brian, that's a separate bug; the code address you've picked up on is just part of the generic failure handling code....
- 09:19 PM Bug #23258: OSDs keep crashing.
- I was about to start a new bug and found this, I am also seeing 0xa74234 and ceph::__ceph_assert_fail...
A while b...
- 09:22 PM Bug #20924: osd: leaked Session on osd.7
- /a/sage-2018-04-04_02:28:04-rados-wip-sage2-testing-2018-04-03-1634-distro-basic-smithi/2351291
rados/verify/{ceph...
- 09:21 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
- Under discussion on the PR, which is good on its own terms but suffering from a prior CephFS bug. :(
- 09:19 PM Bug #23297: mon-seesaw 'failed to become clean before timeout' due to laggy pg create
- I suspect this is resolved in https://github.com/ceph/ceph/pull/19973 by the commit that has the OSDs proactively go ...
- 09:16 PM Bug #23490: luminous: osd: double recovery reservation for PG when EIO injected (while already re...
- David, can you look at this when you get a chance? I think it's due to EIO triggering recovery when recovery is alrea...
- 09:13 PM Bug #23204: missing primary copy of object in mixed luminous<->master cluster with bluestore
- We should see this again as we run the upgrade suite for mimic...
- 09:08 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
- https://github.com/ceph/ceph/pull/20933
- 09:07 PM Bug #23267 (Pending Backport): scrub errors not cleared on replicas can cause inconsistent pg sta...
- 07:25 PM Backport #23413 (Resolved): jewel: delete type mismatch in CephContext teardown
- 07:23 PM Bug #20471 (Resolved): Can't repair corrupt object info due to bad oid on all replicas
- 07:23 PM Backport #23181 (Resolved): jewel: Can't repair corrupt object info due to bad oid on all replicas
- 06:24 PM Bug #21758 (Resolved): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
- 06:24 PM Backport #21784 (Resolved): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make check...
- 06:18 PM Feature #23242 (Resolved): ceph-objectstore-tool command to trim the pg log
- 06:18 PM Backport #23307 (Resolved): jewel: ceph-objectstore-tool command to trim the pg log
- 08:14 AM Feature #23552 (New): cache PK11Context in Connection and probably other consumers of CryptoKeyHa...
- please see attached flamegraph, the 0.67% CPU cycle is used by PK11_CreateContextBySymKey(), if we cache the PK11Cont...
04/03/2018
- 08:40 PM Bug #23145: OSD crashes during recovery of EC pg
- Investigation results up to the date:
1. The local PGLog claims its _pg_log_t::can_rollback_to_ is **17348'18588**...
- 08:59 AM Backport #22906 (Need More Info): jewel: bluestore: New OSD - Caught signal - bstore_kv_sync (thr...
- non-trivial backport
- 08:56 AM Backport #22808 (Need More Info): jewel: "osd pool stats" shows recovery information bugly
- non-trivial backport
- 08:33 AM Backport #22808 (In Progress): jewel: "osd pool stats" shows recovery information bugly
- 08:28 AM Backport #22449 (In Progress): jewel: Visibility for snap trim queue length
- https://github.com/ceph/ceph/pull/21200
- 08:13 AM Backport #22449: jewel: Visibility for snap trim queue length
- I don't think it's possible to backport entire feature without breaking Jewel->Luminous upgrade, so just first commit...
- 08:22 AM Backport #22403 (In Progress): jewel: osd: replica read can trigger cache promotion
- 08:15 AM Backport #22390 (In Progress): jewel: ceph-objectstore-tool: Add option "dump-import" to examine ...
- 04:05 AM Backport #23486 (In Progress): jewel: scrub errors not cleared on replicas can cause inconsistent...
- 02:35 AM Backport #21786 (In Progress): jewel: OSDMap cache assert on shutdown
04/02/2018
- 05:35 PM Bug #23145: OSD crashes during recovery of EC pg
- Anything new or info on what to do to try and recover this cluster? I don't even know how to get the pool deleted pro...
- 10:28 AM Bug #23535: 'ceph --show-config --conf /dev/null' does not work any more
- I just realized `--show-config` does not exist anymore. Probably it was removed intentionally?
04/01/2018
- 07:49 AM Bug #23535 (Closed): 'ceph --show-config --conf /dev/null' does not work any more
- Previously it could be used by users to return the default ceph configuration (see e.g. [1]), now it fails (even if w...
- 07:03 AM Backport #21784 (In Progress): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make ch...
- 06:58 AM Backport #22449 (Need More Info): jewel: Visibility for snap trim queue length
- Backporting this feature to jewel at this late stage seems risky. Do we really need it in jewel?
03/30/2018
- 05:10 PM Bug #22123 (Resolved): osd: objecter sends out of sync with pg epochs for proxied ops
- 05:09 PM Backport #23076 (Resolved): jewel: osd: objecter sends out of sync with pg epochs for proxied ops
- 03:31 PM Bug #23511: forwarded osd_failure leak in mon
- rerunning the tests at http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-rados-wip-slow-mon-ops-kefu-distro-basic-smi...
- 01:02 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- Moving this to CI. This failure would only occur if the cls_XYX.so libraries could not be loaded during the execution...
- 02:59 AM Bug #23517 (Resolved): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- 02:59 AM Bug #23517 (Resolved): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- ...
- 05:25 AM Bug #23510: rocksdb spillover for hard drive configurations
- Igor Fedotov wrote:
> Ben,
> this has been fixed by https://github.com/ceph/ceph/pull/19257
> Not sure about an ex...
- 12:10 AM Bug #23403 (Triaged): Mon cannot join quorum
- ...
03/29/2018
- 06:39 PM Bug #21218 (Resolved): thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing...
- 06:39 PM Backport #23024 (Resolved): luminous: thrash-eio + bluestore (hangs with unfound objects or read_...
- 01:20 PM Backport #23024: luminous: thrash-eio + bluestore (hangs with unfound objects or read_log_and_mis...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20495
merged
- 03:39 PM Bug #23510: rocksdb spillover for hard drive configurations
- Ben,
this has been fixed by https://github.com/ceph/ceph/pull/19257
Not sure about an exact Luminous build it lande...
- 03:02 PM Bug #23510 (Resolved): rocksdb spillover for hard drive configurations
- version: ceph-*-12.2.1-34.el7cp.x86_64
One of Bluestore's best use cases is to accelerate performance for writes o...
- 03:33 PM Bug #22413 (Resolved): can't delete object from pool when Ceph out of space
- 03:33 PM Backport #23114 (Resolved): luminous: can't delete object from pool when Ceph out of space
- 01:19 PM Backport #23114: luminous: can't delete object from pool when Ceph out of space
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20585
merged
- 03:08 PM Bug #23511 (Can't reproduce): forwarded osd_failure leak in mon
- see http://pulpito.ceph.com/kchai-2018-03-29_13:20:02-rados-wip-slow-mon-ops-kefu-distro-basic-smithi/2334154/
<p...
- 01:24 PM Bug #22847 (Resolved): ceph osd force-create-pg cause all ceph-mon to crash and unable to come up...
- 01:24 PM Backport #22942 (Resolved): luminous: ceph osd force-create-pg cause all ceph-mon to crash and un...
- 01:21 PM Backport #22942: luminous: ceph osd force-create-pg cause all ceph-mon to crash and unable to com...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20399
merged
- 01:23 PM Backport #23075 (Resolved): luminous: osd: objecter sends out of sync with pg epochs for proxied ops
- 01:18 PM Backport #23075: luminous: osd: objecter sends out of sync with pg epochs for proxied ops
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20609
merged
- 10:28 AM Bug #19737 (Resolved): EAGAIN encountered during pg scrub (jewel)
- 09:54 AM Backport #23500 (In Progress): luminous: snapmapper inconsistency, crash on luminous
- 08:20 AM Backport #23500 (Resolved): luminous: snapmapper inconsistency, crash on luminous
- https://github.com/ceph/ceph/pull/21118
- 09:16 AM Bug #21844 (Resolved): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -ENOENT
- 09:16 AM Backport #21923 (Resolved): jewel: Objecter::C_ObjectOperation_sparse_read throws/catches excepti...
- 09:16 AM Bug #23403: Mon cannot join quorum
- Hi all,
As asked on the ceph-users mailing list, here are the results of the following commands on the 3 monitors:...
- 09:09 AM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
- Happened again (jewel 10.2.11 integration testing) - http://qa-proxy.ceph.com/teuthology/smithfarm-2018-03-28_20:31:4...
- 08:25 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- I've seen this on our cluster (luminous, bluestore based), but was unable to reproduce it...
Restarting primary mon...
- 01:43 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- when we reboot one host, some OSDs take a long time to start,
and one OSD finally succeeded to start after several tim...
- 01:11 AM Bug #17170 (New): mon/monclient: update "unable to obtain rotating service keys when osd init" to...
- We hit this issue again in Luminous.
- 08:16 AM Backport #23186 (Resolved): luminous: ceph tell mds.* <command> prints only one matching usage
- 08:15 AM Bug #23212 (Resolved): bluestore: should recalc_allocated when decoding bluefs_fnode_t
- 08:15 AM Backport #23256 (Resolved): luminous: bluestore: should recalc_allocated when decoding bluefs_fno...
- 08:15 AM Bug #23298 (Resolved): filestore: do_copy_range replay bad return value
- 08:14 AM Backport #23351 (Resolved): luminous: filestore: do_copy_range replay bad return value
- 04:10 AM Bug #23228: scrub mismatch on objects
Just bytes
dzafman-2018-03-28_18:21:29-rados-wip-zafman-testing-distro-basic-smithi/2332093
[ERR] 3.0 scrub s...
- 04:07 AM Bug #23495 (Resolved): Need (SLOW_OPS) in whitelist for another yaml
A job may have failed because (SLOW_OPS) is missing from tasks/mon_clock_with_skews.yaml
dzafman-2018-03-28_18:2...
- 02:09 AM Feature #23493 (Resolved): config: strip/escape single-quotes in values when setting them via con...
- At the moment, the config parsing state machine does not account for single-quotes as potential value enclosures, as ...
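The requested behaviour can be illustrated with a small shell sketch; the variable names are hypothetical, and the real fix of course belongs in Ceph's config parser itself:

```shell
# A value that arrives wrapped in single quotes...
value="'my rack'"
# ...should have one layer of quotes stripped before use:
stripped=${value#\'}
stripped=${stripped%\'}
echo "$stripped"   # prints: my rack
```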
- 01:09 AM Bug #23492 (Resolved): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-e...
dzafman-2018-03-28_15:20:23-rados:standalone-wip-zafman-testing-distro-basic-smithi/2331804
In TEST_rados_get_ba...
- 12:29 AM Bug #22752 (Pending Backport): snapmapper inconsistency, crash on luminous
03/28/2018
- 10:58 PM Bug #23490 (Duplicate): luminous: osd: double recovery reservation for PG when EIO injected (whil...
- During a luminous test run, this was hit:
http://pulpito.ceph.com/yuriw-2018-03-27_21:16:27-rados-wip-yuri5-testin...
- 10:26 PM Backport #23186: luminous: ceph tell mds.* <command> prints only one matching usage
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20664
merged
- 10:26 PM Backport #23256: luminous: bluestore: should recalc_allocated when decoding bluefs_fnode_t
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20771
merged
- 10:22 PM Backport #23351: luminous: filestore: do_copy_range replay bad return value
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20957
merged
- 06:06 PM Bug #23487 (Fix Under Review): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- PR: https://github.com/ceph/ceph/pull/21102
- 05:58 PM Bug #23487 (Resolved): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- We have the `ceph osd pool set erasure allow_ec_overwrites` command but do not have a corresponding command to get the ...
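Concretely ("ecpool" is an example pool name; the setter already exists, while the getter is what this ticket requests):

```shell
# Setting the flag on an erasure-coded pool works today:
ceph osd pool set ecpool allow_ec_overwrites true

# The symmetric getter this ticket asks for:
ceph osd pool get ecpool allow_ec_overwrites
```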
- 05:42 PM Backport #23486 (Resolved): jewel: scrub errors not cleared on replicas can cause inconsistent pg...
- https://github.com/ceph/ceph/pull/21194
- 05:42 PM Backport #23485 (Resolved): luminous: scrub errors not cleared on replicas can cause inconsistent...
- https://github.com/ceph/ceph/pull/21103
- 05:27 PM Bug #23267 (Fix Under Review): scrub errors not cleared on replicas can cause inconsistent pg sta...
- https://github.com/ceph/ceph/pull/21101
- 11:21 AM Bug #22114 (Pending Backport): mon: ops get stuck in "resend forwarded message to leader"
- 08:15 AM Backport #23478 (In Progress): should not check for VERSION_ID
- https://github.com/ceph/ceph/pull/21090
- 08:08 AM Backport #23478 (Resolved): should not check for VERSION_ID
- https://github.com/ceph/ceph/pull/21090
- 08:07 AM Bug #23477 (Pending Backport): should not check for VERSION_ID
- * https://github.com/ceph/ceph/pull/17787
* https://github.com/ceph/ceph/pull/21052
- 08:06 AM Bug #23477 (Resolved): should not check for VERSION_ID
- as per os-release(5), VERSION_ID is optional.
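Consumers should therefore treat VERSION_ID as optional. A sketch with a fabricated os-release file (rolling-release distros such as Arch Linux really do omit VERSION_ID):

```shell
# Build a sample os-release that legally omits VERSION_ID:
cat > /tmp/os-release <<'EOF'
NAME="Arch Linux"
ID=arch
EOF

# Source it and fall back gracefully instead of assuming the key exists:
. /tmp/os-release
echo "${ID}:${VERSION_ID:-unset}"   # prints: arch:unset
```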
- 07:06 AM Bug #23352: osd: segfaults under normal operation
- For those who want to check the coredump: you should use apport-unpack to unpack it first,
and it crashed at /bui...
- 05:55 AM Backport #23413 (In Progress): jewel: delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/21084
- 01:28 AM Backport #23472 (In Progress): luminous: add --add-bucket and --move options to crushtool
- https://github.com/ceph/ceph/pull/21079
- 12:57 AM Backport #23472 (Resolved): luminous: add --add-bucket and --move options to crushtool
- https://github.com/ceph/ceph/pull/21079
- 12:50 AM Bug #23471 (Pending Backport): add --add-bucket and --move options to crushtool
- https://github.com/ceph/ceph/pull/20183
- 12:49 AM Bug #23471 (Resolved): add --add-bucket and --move options to crushtool
- When using crushtool to create a CRUSH map, it is not possible to create a complex CRUSH map, we have to edit the CRU...
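The new options could look roughly like this. The option names come from the ticket, but the argument shapes below are assumptions, so consult `crushtool --help` on a build containing the fix:

```shell
# Assumed invocation shapes, not verified against the merged PR:
crushtool -i crushmap --add-bucket rack1 rack -o crushmap.new                 # add a bucket of type "rack"
crushtool -i crushmap.new --move rack1 --loc root default -o crushmap.final   # place it under the default root
```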
03/27/2018
- 10:46 PM Bug #23352: osd: segfaults under normal operation
- Chris,
- Was your stack identical to Alex's original description, or was it more like the stack in #23431?
- 10:39 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- I agree these are similar and the cause may indeed be the same however there are only two stack frames in this instan...
- 07:36 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- There's a coredump-in-apport on Google Drive in http://tracker.ceph.com/issues/23352 - on the face of it, it looks sim...
- 01:06 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- I have seen this as well, on our cluster. We're using bluestore, ubuntu 16, latest luminous.
The crashes were totall...
- 10:58 AM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- The ceph-osd comes from https://download.ceph.com/rpm-luminous/el7/x86_64/
I verified via md5sum that the local co...
- 09:43 AM Bug #23431 (Need More Info): OSD Segmentation fault in thread_name:safe_timer
- What's the exact version of the ceph-osd you are using (exact package URL if possible please).
You could try 'objd...
- 02:52 PM Feature #22420 (Resolved): Add support for obtaining a list of available compression options
- https://github.com/ceph/ceph/pull/20558
- 02:45 PM Bug #23215 (Resolved): config.cc: ~/.ceph/$cluster.conf is passed unexpanded to fopen()
- https://github.com/ceph/ceph/pull/20774
- 09:49 AM Backport #23077: luminous: mon: ops get stuck in "resend forwarded message to leader"
- might want to include https://github.com/ceph/ceph/pull/21057 also.
- 09:49 AM Bug #22114 (Fix Under Review): mon: ops get stuck in "resend forwarded message to leader"
- and https://github.com/ceph/ceph/pull/21057
- 01:35 AM Bug #22220 (Resolved): osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at...
- Resolved for Fedora and just waiting on next DTS to ship on rhel/CentOS.
03/26/2018
- 11:27 PM Bug #23465: "Mutex.cc: 110: FAILED assert(r == 0)" ("AttributeError: 'tuple' object has no attrib...
- This isn't related to that suite commit. Run manually, 'file' returns "remote/smithi150/coredump/1522085413.12350.cor...
- 07:42 PM Bug #23465 (New): "Mutex.cc: 110: FAILED assert(r == 0)" ("AttributeError: 'tuple' object has no ...
- I see latest commit https://github.com/ceph/ceph/commit/c6760eba50860d40e25483c3e4cee772f3ad4468#diff-289c6ff15fd25ac...
- 09:11 AM Backport #23316 (Need More Info): jewel: pool create cmd's expected_num_objects is not correctly ...
- To backport this to jewel, we need to skip mgr changes and qa/standalone/mon/osd-pool-create.sh related changes to be...
03/24/2018
- 11:01 AM Bug #23352 (New): osd: segfaults under normal operation
- Raising priority because it's a possible regression in Luminous.
- 07:52 AM Support #23455: osd: large number of inconsistent objects after recover or backfilling
- It also affected head objects, but only a very small portion.
- 07:46 AM Support #23455: osd: large number of inconsistent objects after recover or backfilling
- v10.2.5 is also affected. It affects only snap objects; no head objects are affected
- 07:36 AM Support #23455: osd: large number of inconsistent objects after recover or backfilling
- It seems quite similar to issue http://tracker.ceph.com/issues/21388
- 07:20 AM Support #23455 (Resolved): osd: large number of inconsistent objects after recover or backfilling
- A large number of inconsistent objects appear after recovery or backfilling.
Reproduce method:
1) create rbd volume and, ...
- 07:09 AM Bug #23430 (Resolved): PGs are stuck in 'creating+incomplete' status on vstart cluster
03/23/2018
- 04:47 PM Bug #23352: osd: segfaults under normal operation
- "Me too". I had a brief look at the coredump, without becoming all that much wiser. Judging by the lock attached to t...
- 02:44 PM Bug #23145: OSD crashes during recovery of EC pg
- Sorry Josh, I saw the EC pool min_size is equal to 5; I need to verify with our test engineer tomorrow... the two environme...
- 06:21 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
- And the next three ...
- 05:18 AM Bug #23440: duplicated "commit_queued_for_journal_write" events in OpTracker
- OK. then I'm going to close mine.
- 05:13 AM Bug #23440: duplicated "commit_queued_for_journal_write" events in OpTracker
- Yanhu Cao wrote:
> https://github.com/ceph/ceph/pull/21017
Hi, Yanhu, thank you for your contribution. My intern ... - 03:37 AM Bug #23440: duplicated "commit_queued_for_journal_write" events in OpTracker
- https://github.com/ceph/ceph/pull/21017
- 02:41 AM Backport #23077 (In Progress): luminous: mon: ops get stuck in "resend forwarded message to leader"
- Include both PRs from comment#2:
https://github.com/ceph/ceph/pull/21016
03/22/2018
- 06:56 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
- And the next three OSDs crashed:...
- 11:49 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
- With #23258 we already had a similar issue and I am wondering if this is something you always have to expect with Cep...
- 11:46 AM Bug #23439 (New): Crashing OSDs after 'ceph pg repair'
- Yesterday, ceph reported scrub errors....
- 06:14 PM Bug #23430 (Fix Under Review): PGs are stuck in 'creating+incomplete' status on vstart cluster
- PR: https://github.com/ceph/ceph/pull/21008
- 05:54 PM Bug #23430 (In Progress): PGs are stuck in 'creating+incomplete' status on vstart cluster
- 05:54 PM Bug #23430: PGs are stuck in 'creating+incomplete' status on vstart cluster
- I think the problem is that `ceph config` sets osd_pool_default_erasure_code_profile too late: when the cluster alrea...
- 05:05 PM Bug #23430: PGs are stuck in 'creating+incomplete' status on vstart cluster
- I think it is still worth investigating.
Previously the default profile just worked on vstart clusters, and now it d... - 03:51 PM Bug #23430: PGs are stuck in 'creating+incomplete' status on vstart cluster
- I did further investigation here and figured out this issue occurs due to the "special" situation of my vstart enviro...
- 01:59 PM Bug #23440 (In Progress): duplicated "commit_queued_for_journal_write" events in OpTracker
- ...
- 01:38 PM Bug #23352: osd: segfaults under normal operation
- Also seeing these, no core dump but have now had 3 segfaults in 2 weeks since upgrading to 12.2.4 from a very stable ...
- 11:15 AM Bug #23145: OSD crashes during recovery of EC pg
- Hi Josh, here is the log I said I would offer in APAC, with debug_osd=30 & debug_bluestore=30. It's another environment, s...
- 08:08 AM Bug #23145: OSD crashes during recovery of EC pg
- Xiaofei Cui wrote:
> We think we have met the same problem.
> The pginfos:
>
> [...]
>
> We have no idea why... - 05:21 AM Backport #23412 (In Progress): luminous: delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/20998
- 02:55 AM Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica tak...
- I reproduced this by creating an inconsistent pg and then causing it to split.
pool of size 2 with 1 pg and I crea... - 02:02 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- Got it! I couldn't use the pause feature because none of the get/get-bytes/stat stuff would work, it all got stuck.
...
03/21/2018
- 11:38 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
I'm confused because you are dealing with 2 different objects.
Does rb.0.854e.238e1f29.000000140b6d still have a...
- 11:12 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- Ah, I got confused by stuff. Should I just not stop the OSDs, or just stop during the get, start for the put?
I ju...
- 11:00 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
I don't know how rados get/put could work while the PG's OSDs are all stopped. Also, 'rados get' will give EIO erro...
- 10:39 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- So the stuff I did above did not work, the result of the repair after get/put:...
- 10:08 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- So for some reason my rados get/put commands are working now, not sure why. After I complete all my steps, the repair...
- 09:31 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- I'm trying the same thing on a different broken pg, while it's stuck the pg detail is:...
- 09:00 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- The time to write should be on the same order as reading.
You forgot to restart your osd before running rados. - 08:38 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- David Zafman wrote:
> With client activity stopped, read the data from this object and write it again using rados ge... - 06:15 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
You have a data_digest issue, not an omap_digest one. You can remove the temporary omap entry. Since shards 8, 13 ...
- 05:57 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- I'm trying to fix my problems but I'm kind of a noob, having trouble getting things to work. My cluster seems to be d...
- 07:38 PM Bug #23228: scrub mismatch on objects
- /a/dzafman-2018-03-21_09:57:19-rados:thrash-wip-zafman-testing2-distro-basic-smithi/2312125
rados:thrash/{0-size-m...
- 11:33 AM Backport #23408 (In Progress): luminous: mgrc's ms_handle_reset races with send_pgstats()
- https://github.com/ceph/ceph/pull/20987
- 09:14 AM Bug #23431 (Duplicate): OSD Segmentation fault in thread_name:safe_timer
- I noticed an OSD segmentation fault in one of our OSDs logs.
See the attached log entries. There is no core file tha...
- 08:27 AM Bug #23430 (Resolved): PGs are stuck in 'creating+incomplete' status on vstart cluster
- Hi,
The PGs are stuck in 'creating+incomplete' status after creating an erasure coded pool on a vstart cluster.
...
- 03:05 AM Bug #23428 (New): Snapset inconsistency is hard to diagnose because authoritative copy used by li...
- ...
- 02:27 AM Bug #23145: OSD crashes during recovery of EC pg
- We think we have met the same problem.
The pginfos:...
03/20/2018
- 12:32 PM Bug #23145 (In Progress): OSD crashes during recovery of EC pg
- 12:32 PM Bug #23145: OSD crashes during recovery of EC pg
- Sorry for missing your updates, Peter. :-( I've just scripted my Gmail for _X-Redmine-Project: bluestore_.
From th...
03/19/2018
- 09:35 PM Bug #23145 (New): OSD crashes during recovery of EC pg
- 07:32 PM Bug #23145: OSD crashes during recovery of EC pg
- Can't seem to flip this ticket out of 'Needs more info', unfortunately..
- 04:42 PM Backport #23413 (Resolved): jewel: delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/21084
- 04:42 PM Backport #23412 (Resolved): luminous: delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/20998
- 04:42 PM Backport #23408 (Resolved): luminous: mgrc's ms_handle_reset races with send_pgstats()
- https://github.com/ceph/ceph/pull/23791
- 04:26 PM Bug #23267 (In Progress): scrub errors not cleared on replicas can cause inconsistent pg state wh...
- 04:00 PM Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica tak...
- 01:00 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- It appears the error is in calculating the host weight:
it has been set to 43.664 when it should be 43.668.
...
- 10:34 AM Bug #23403 (Closed): Mon cannot join quorum
- Hi all,
On a 3-mon cluster running infernalis, one of the mons left the quorum and we are unable to make it come bac...
- 10:23 AM Backport #23351 (In Progress): luminous: filestore: do_copy_range replay bad return value
- https://github.com/ceph/ceph/pull/20957
- 09:24 AM Bug #23402 (Duplicate): objecter: does not resend op on split interval
- ...
- 09:01 AM Bug #23370 (Pending Backport): mgrc's ms_handle_reset races with send_pgstats()
03/18/2018
- 10:19 PM Bug #23339 (Resolved): Scrub errors after ec-small-objects-overwrites test
- http://pulpito.ceph.com/sage-2018-03-18_09:19:17-rados-wip-sage-testing-2018-03-18-0231-distro-basic-smithi/
03/17/2018
- 02:08 AM Bug #23395: qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core dump
../qa/run-standalone.sh ceph_objectstore_tool.py
--- ../qa/standalone/special/ceph_objectstore_tool.py ---
vst...
- 02:05 AM Bug #23395 (Can't reproduce): qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core...
I assume erasure code profile handling must have changed. It shouldn't crash but we may need a test change too.
...
03/16/2018
- 10:38 PM Feature #23364: Special scrub handling of hinfo_key errors
- https://github.com/ceph/ceph/pull/20947
- 08:37 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- It appears Paul Emmerich has found the problem, and it's down to the weights.
The email chain can be seen on the mailin...
- 09:22 AM Bug #23386 (Resolved): crush device class: Monitor Crash when moving Bucket into Default root
- Moving prestaged hosts, whose disks sit outside of a root, into that root causes the monitor to crash...
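For background on the weight discrepancy discussed in this ticket (43.664 vs. 43.668): CRUSH stores item weights as 16.16 fixed-point integers, so converting each disk's float weight independently and summing can disagree with converting the summed total. A minimal sketch of that effect, using hypothetical disk weights rather than the reporter's actual map:

```python
FIXED_POINT = 0x10000  # CRUSH weights are 16.16 fixed-point integers

def to_fixed(w):
    """Convert a float weight to CRUSH's integer representation (truncating)."""
    return int(w * FIXED_POINT)

def from_fixed(f):
    """Convert a fixed-point weight back to a float."""
    return f / FIXED_POINT

# hypothetical host with 8 disks of weight 5.458 each
disks = [5.458] * 8

# sum of the per-disk fixed-point weights (what a bucket accumulates)
host_from_rounded_disks = from_fixed(sum(to_fixed(w) for w in disks))

# fixed-point conversion of the float sum (what you might compute by hand)
host_from_float_sum = from_fixed(to_fixed(sum(disks)))

# each disk loses a fraction of 1/0x10000 to truncation, so the two
# totals differ slightly even though both "should" be 43.664
```

This is only an illustration of how small per-item rounding differences can arise in CRUSH weight sums; the specific root cause found in this bug may differ.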
- 08:08 PM Bug #23339 (Fix Under Review): Scrub errors after ec-small-objects-overwrites test
- http://pulpito.ceph.com/sage-2018-03-16_17:59:04-rados:thrash-erasure-code-overwrites-wip-sage-testing-2018-03-16-112...
- 05:09 PM Bug #23352: osd: segfaults under normal operation
- Here is the link to the core dump https://drive.google.com/open?id=1tOTqSOaS94gOhHfXmGbbfuXLNFFfOVuf
- 04:34 PM Bug #23324 (Pending Backport): delete type mismatch in CephContext teardown
- 03:03 AM Bug #23324 (In Progress): delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/20930
- 01:38 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
- Forgot to mention the exact place it breaks:...
- 10:21 AM Bug #23387 (Resolved): Building Ceph on armhf fails due to out-of-memory
- Hi,
I'm currently struggling to build Ceph through make-deps.sh on an armhf board (namely the ODROID HC2). Everythin...
- 09:16 AM Bug #23385: osd: master osd crash when pg scrub
- The ceph version is 10.2.3
- 09:11 AM Bug #23385 (New): osd: master osd crash when pg scrub
- My Ceph runs on ARM, kernel 4.4.52-armada-17.06.2. I put an object into RADOS; when I scrub the PG manually, the master OSD crashes. Bel...
- 08:56 AM Bug #23320: OSD suicide itself because of a firewall rule but reports a received signal
- Can I have some input on this topic? I can make the PR, but I'd love to have your opinion on it.
Thx,
03/15/2018
- 06:00 PM Bug #23145: OSD crashes during recovery of EC pg
- Let me know if you need anything else off this cluster, I probably will have to trash this busted PG at some point so...
- 05:37 AM Bug #23370 (Fix Under Review): mgrc's ms_handle_reset races with send_pgstats()
- https://github.com/ceph/ceph/pull/20909
- 05:34 AM Bug #23370 (Resolved): mgrc's ms_handle_reset races with send_pgstats()
- 2018-03-14T12:29:45.168 INFO:teuthology.orchestra.run.mira056:Running: 'sudo adjust-ulimits ceph-coverage /home/ubunt...
- 05:34 AM Bug #23371 (New): OSDs flaps when cluster network is made down
We have a 5-node cluster with 5 mons and 120 OSDs equally distributed.
As part of our resiliency tests we ma...
- 04:06 AM Backport #23315 (In Progress): luminous: pool create cmd's expected_num_objects is not correctly ...
- https://github.com/ceph/ceph/pull/20907
03/14/2018
- 09:37 PM Bug #22346: OSD_ORPHAN issues after jewel->luminous upgrade, but orphaned osds not in crushmap
- Hi Jun,
It's not really possible to pinpoint an exact PR at this stage as it's possible there was more than one an...
- 10:19 AM Bug #22346: OSD_ORPHAN issues after jewel->luminous upgrade, but orphaned osds not in crushmap
- Brad Hubbard wrote:
> Hi Graham,
>
> The consensus is that this was caused by a bug in a previous release which f...
- 08:41 PM Bug #23365 (New): CEPH device class not honored for erasure encoding.
- To start, this cluster isn't happy. It is my destructive testing/learning cluster.
Recently I rebuilt the cluster...
- 08:36 PM Feature #23364 (Resolved): Special scrub handling of hinfo_key errors
We shouldn't handle hinfo_key as just another user xattr
Add the following errors specific to hinfo_key for eras...
- 06:32 PM Bug #23361 (New): /build/ceph-12.2.4/src/osd/PGLog.h: 888: FAILED assert(i->prior_version == last...
Log with debug_osd=20 and debug_bluestore=20 enabled:
https://drive.google.com/open?id=1Yr_MIXHzrgWUR5ZsV1xKlPUqZH...
- 04:49 PM Bug #23360 (Duplicate): call to 'ceph osd erasure-code-profile set' asserts the monitors
- duplicate of http://tracker.ceph.com/issues/23345
- 04:16 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
- A proper fix would be to provide a proper error message in @OSDMonitor::parse_erasure_code_profile@ instead of assert...
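The assert-vs-error distinction suggested above can be sketched as follows. This is a hypothetical Python illustration, not the actual C++ in @OSDMonitor::parse_erasure_code_profile@: a parser that returns an error lets the monitor reject a malformed profile instead of aborting.

```python
def parse_erasure_code_profile(tokens):
    """Hypothetical sketch: validate 'key=value' tokens and return a
    (profile, error) pair instead of asserting on malformed input."""
    profile = {}
    for tok in tokens:
        if "=" not in tok:
            # an assert here would take the monitor down on bad user input;
            # returning an error surfaces it to the client instead
            return None, "element '%s' is not of the form key=value" % tok
        k, v = tok.split("=", 1)
        profile[k] = v
    return profile, ""

ok_profile, err = parse_erasure_code_profile(["k=2", "m=1", "plugin=jerasure"])
bad_profile, err2 = parse_erasure_code_profile(["oops"])
```

With this shape, the malformed input produces @(None, "element 'oops' is not of the form key=value")@ rather than a monitor crash.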
- 04:15 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
- Found the cause of this. From the mon.a.log:...
- 03:08 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
- Hm, quite possible that this is in fact not a classic deadlock.
Turns out, the `ceph` command line tool is also br...
- 02:52 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
- The @send_command()@ function visible in this traceback is: https://github.com/ceph/ceph/pull/20865/files#diff-188b91...
- 02:48 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
- Could you point to the code, or provide a small python example, that triggers this deadlock?
- 02:37 PM Bug #23360 (Duplicate): call to 'ceph osd erasure-code-profile set' asserts the monitors
- I've attached `thread apply all bt` mixed with `thread apply all py-bt`
Threads 38, 35, 34, 32, and 31 are waiting for...
- 03:48 PM Bug #23352: osd: segfaults under normal operation
- Sage, PM'ed to you the public download link, hope it works.
- 03:39 PM Bug #23352: osd: segfaults under normal operation
- Hi Sage, I do have the core dump. Where can I upload the file? It's rather large, 850 MB compressed.
- 01:54 PM Bug #23352 (Need More Info): osd: segfaults under normal operation
- Do you have a core file? I haven't seen this crash before.
- 02:13 AM Bug #23352 (Resolved): osd: segfaults under normal operation
- -1> 2018-03-13 22:03:27.390956 7f42eec36700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1520993007390955, "job": 454,...
- 01:58 PM Bug #23345: `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
- 01:55 PM Bug #23339: Scrub errors after ec-small-objects-overwrites test
- 10:59 AM Bug #22351: Couldn't init storage provider (RADOS)
- @Brad - that's perfect, thanks. Backport PR open.
- 10:27 AM Bug #22351: Couldn't init storage provider (RADOS)
- @Nathan Oops, sorry mate, my bad.
These are the two we need.
https://github.com/ceph/ceph/pull/20022
https:/...
- 09:44 AM Bug #22351: Couldn't init storage provider (RADOS)
- @Brad - I was confused because you changed the status to Resolved, apparently before the backport was done.
Could ...
- 12:25 AM Bug #22351: Couldn't init storage provider (RADOS)
- @Nathan There wasn't one, I just set the backport field?
Just let me know if you need any action from me on this.
- 10:57 AM Backport #23349 (In Progress): luminous: Couldn't init storage provider (RADOS)
- 07:13 AM Documentation #23354 (Resolved): doc: osd_op_queue & osd_op_queue_cut_off
- In the docs, the default for osd_op_queue is listed as `prio`, but the real default is `wpq`, so this is a docs bug.
If I understand properly: if o...
- 05:12 AM Backport #23312 (In Progress): luminous: invalid JSON returned when querying pool parameters
- https://github.com/ceph/ceph/pull/20890
03/13/2018
- 11:15 PM Backport #23307 (In Progress): jewel: ceph-objectstore-tool command to trim the pg log
- 10:22 PM Backport #23351 (Resolved): luminous: filestore: do_copy_range replay bad return value
- https://github.com/ceph/ceph/pull/20957
- 10:22 PM Bug #23298 (Pending Backport): filestore: do_copy_range replay bad return value
- 10:13 PM Backport #23323 (Resolved): luminous: ERROR type entries of pglog do not update min_last_complete...
- 09:58 PM Backport #23349 (Resolved): luminous: Couldn't init storage provider (RADOS)
- https://github.com/ceph/ceph/pull/20896
- 09:46 PM Bug #22351 (Pending Backport): Couldn't init storage provider (RADOS)
- @Brad, I missed which PR is the luminous backport PR?
- 09:27 PM Bug #22887: osd/ECBackend.cc: 2202: FAILED assert((offset + length) <= (range.first.get_off() + r...
- Here's another: /ceph/teuthology-archive/pdonnell-2018-03-11_22:42:18-multimds-wip-pdonnell-testing-20180311.180352-t...
- 09:20 PM Bug #23345: `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
- Running either...
- 09:09 PM Bug #23345 (Resolved): `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
- Coming into OSDMonitor::parse_erasure_code_profile() will trigger an assert that probably should be an error instead....
- 08:58 PM Bug #22902 (In Progress): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine eve...
- 08:55 PM Bug #23282 (New): If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0...
- 11:44 AM Bug #23282 (Closed): If you add extra characters to an fsid, it gets parsed as "00000000-0000-000...
- 04:00 AM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
- Greg Farnum wrote:
> So it got better when you took away the extra "80" prefix?
Yes, my mistake.
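For the fsid-parsing issue in #23282: a strict UUID parser rejects input with extra characters outright rather than yielding an all-zero fsid. As an illustration using Python's stdlib (not Ceph's actual C++ parser, and with a made-up example fsid):

```python
import uuid

ZERO_FSID = "00000000-0000-0000-0000-000000000000"

def parse_fsid(s):
    """Return the canonical fsid string, or None if the input is malformed.
    Never silently fall back to a zeroed UUID."""
    try:
        return str(uuid.UUID(s))
    except ValueError:
        return None

good = parse_fsid("d5a2f21f-66c8-4e22-bd54-138936a14cfb")   # example fsid
bad = parse_fsid("80d5a2f21f-66c8-4e22-bd54-138936a14cfb")  # extra "80" prefix
```

Here the string with the stray "80" prefix has 34 hex digits instead of 32, so `uuid.UUID` raises ValueError and the caller gets None instead of a bogus all-zero id.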
dzafman-2018-03-12_08:11:53-rados-wip-zafman-testing-distro-basic-smithi/2283533...
- 07:13 AM Bug #23258: OSDs keep crashing.
- ...
- 07:04 AM Bug #23258: OSDs keep crashing.
- We are now having the same issue on osd.1, osd.11, osd.20 and osd.25, each located on a different host. osd.1 uses file...
- 06:13 AM Bug #23324: delete type mismatch in CephContext teardown
- 06:13 AM Bug #23324: delete type mismatch in CephContext teardown
- This has to do with the use of placement new in the overload of Log::create_entry with the expected_size argument. I'...
03/12/2018
- 10:56 PM Bug #22902: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
OSD 4 is the primary [6,5,4]/[4,5,7] with osd.6 crashing...
- 09:18 PM Bug #22050 (Resolved): ERROR type entries of pglog do not update min_last_complete_ondisk, potent...
- 04:45 PM Bug #22050 (Pending Backport): ERROR type entries of pglog do not update min_last_complete_ondisk...
- 09:12 PM Bug #23325: osd_max_pg_per_osd.py: race between pool creation and wait_for_clean
- Seen here as well: http://pulpito.ceph.com/nojha-2018-03-02_23:59:23-rados-wip-async-recovery-2018-03-02-distro-basic...
- 09:06 PM Bug #23325 (New): osd_max_pg_per_osd.py: race between pool creation and wait_for_clean
- Seen in http://pulpito.ceph.com/joshd-2018-03-12_15:49:43-rados-wip-pg-log-trim-error-luminous-distro-basic-smithi/22...
- 06:22 PM Bug #23324: delete type mismatch in CephContext teardown
- It looks more to me like we're allocating an object of one type (Entry) and then casting it to another (Log)? Is ther...
- 05:16 PM Bug #23324: delete type mismatch in CephContext teardown
- I don't recognize this from elsewhere and it looks like the kind of issue that could arise from trying to delete some...
- 04:56 PM Bug #23324: delete type mismatch in CephContext teardown
- Package in this case is:
librados2-13.0.1-2356.gf2b88f364515.fc27.x86_64 - 04:51 PM Bug #23324 (Resolved): delete type mismatch in CephContext teardown
- I've been hunting some memory corruption in ganesha and ran across this. Seems unlikely to be the cause of the crashe...
- 05:19 PM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
- So it got better when you took away the extra "80" prefix?
- 06:31 AM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
- My mistake. I don't know why there's an extra "80" in the fsid in my conf.
- 05:19 PM Bug #23290: "/test/osd/RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basi...
- Is that the "the disk errored out" bug?
- 04:44 PM Backport #23323 (Resolved): luminous: ERROR type entries of pglog do not update min_last_complete...
- https://github.com/ceph/ceph/pull/20851
- 01:39 PM Bug #22656: scrub mismatch on bytes (cache pools)
- /a/sage-2018-03-11_23:03:25-rados-wip-sage2-testing-2018-03-10-1616-distro-basic-smithi/2280391
description: rados... - 01:09 PM Bug #23320: OSD suicide itself because of a firewall rule but reports a received signal
- I used this url https://www.mkssoftware.com/docs/man5/siginfo_t.5.asp#Signal_Codes to get a better understanding of t...
- 01:08 PM Bug #23320: OSD suicide itself because of a firewall rule but reports a received signal
- I'm attaching the patch for more readability.
- 11:13 AM Bug #23320 (Resolved): OSD suicide itself because of a firewall rule but reports a received signal
- We (leseb & I) had an issue where the OSD crashes with the following message:
2018-03-08 14:30:26.042607 7f6142b7... - 10:40 AM Bug #23281 (Resolved): run-tox-ceph-disk fails in luminous's "make check" run by jenkins
- 10:39 AM Bug #23283 (Duplicate): os/bluestore:cache arise a Segmentation fault
- Duplicated https://tracker.ceph.com/issues/21259
- 10:23 AM Bug #23258: OSDs keep crashing.
- After extending the cluster to 40 osds and removing osd.11 from it, the problem has moved to osd.1:...
- 09:16 AM Backport #23316 (Resolved): jewel: pool create cmd's expected_num_objects is not correctly interp...
- https://github.com/ceph/ceph/pull/22050
- 09:16 AM Backport #23315 (Resolved): luminous: pool create cmd's expected_num_objects is not correctly int...
- https://github.com/ceph/ceph/pull/20907
- 09:14 AM Backport #23312 (Resolved): luminous: invalid JSON returned when querying pool parameters
- https://github.com/ceph/ceph/pull/20890
- 09:14 AM Backport #23307 (Resolved): jewel: ceph-objectstore-tool command to trim the pg log
- https://github.com/ceph/ceph/pull/20882
03/11/2018
- 11:04 PM Bug #23297: mon-seesaw 'failed to become clean before timeout' due to laggy pg create
- /a/sage-2018-03-11_02:12:48-rados-wip-sage2-testing-2018-03-10-1616-distro-basic-smithi/2276594
- 02:19 AM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
- Anyway, the only place where this can happen is if @snap_seq < max(removed_snaps)@ because the deletion request inse...
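The failure mode described in #18746 can be modeled with a toy interval set whose insert, like the one in @include/interval_set.h@, asserts that the value being inserted is not already present. This is a simplified hypothetical Python model, not the actual C++:

```python
class ToyIntervalSet:
    """Toy model of Ceph's interval_set: insert() asserts on overlap,
    loosely mirroring the FAILED assert at include/interval_set.h:355."""

    def __init__(self):
        self.ids = set()

    def insert(self, snapid):
        # inserting an id that is already in the set is the overlap
        # condition that crashes the monitors in this bug
        assert snapid not in self.ids, "overlapping insert"
        self.ids.add(snapid)

removed_snaps = ToyIntervalSet()
removed_snaps.insert(4)          # first deletion of snap 4: fine
try:
    removed_snaps.insert(4)      # duplicate/replayed deletion: trips the assert
    tripped = False
except AssertionError:
    tripped = True
```

In the real code the overlap arises when a deletion request re-inserts a snap range that is already in removed_snaps, which is consistent with the mixed-version-client scenario described below.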
- 12:36 AM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
- Well, turns out there were both 12.2.1 and 12.2.4 clients doing snapshot operations. This messed up removed_snaps due...
03/10/2018
- 11:28 PM Bug #22351 (Resolved): Couldn't init storage provider (RADOS)
- All of these PRs have merged on the RADOS side.
- 09:00 PM Bug #23298: filestore: do_copy_range replay bad return value
- https://github.com/ceph/ceph/pull/20832
- 08:55 PM Bug #23298 (Resolved): filestore: do_copy_range replay bad return value
- + if (r < 0 && replaying) {
+ assert(r == -ERANGE);
+ derr << "Filestore: short source tolerated because we ...
- 08:41 PM Bug #23297 (Fix Under Review): mon-seesaw 'failed to become clean before timeout' due to laggy pg...
- The OSD gets the pg_create but for a future osdmap and never gets the osdmap due to the mons being slow and thrashy.
... - 12:31 AM Bug #22050 (Fix Under Review): ERROR type entries of pglog do not update min_last_complete_ondisk...
- https://github.com/ceph/ceph/pull/20827
Backport only needed to luminous since error pg log entries did not exist ... - 12:04 AM Bug #23294 (New): OSD booted with noup never got marked in; pgs stuck peering while osd up, but out
- http://pulpito.ceph.com/joshd-2018-03-09_22:47:53-rados-master-distro-basic-smithi/2273020/
This test restarts osd...