Activity
From 03/14/2018 to 04/12/2018
04/12/2018
- 11:08 PM Feature #23364: Special scrub handling of hinfo_key errors
- Follow on pull request included in backport to this tracker
https://github.com/ceph/ceph/pull/21362
- 09:49 PM Bug #23610 (Resolved): pg stuck in activating because of dropped pg-temp message
- 09:48 PM Backport #23630 (Resolved): luminous: pg stuck in activating
- 09:28 PM Backport #23630: luminous: pg stuck in activating
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21330
merged
- 05:35 PM Bug #23228: scrub mismatch on objects
- My change only affects the scrub error counts in the stats. However, if setting dirty_info in proc_primary_info() wo...
- 04:27 PM Bug #23228: scrub mismatch on objects
- The original report was an EC test, so it looks like a dup of #23339.
David, your failures are not EC. Could they...
- 04:43 PM Bug #20439 (Can't reproduce): PG never finishes getting created
- 04:29 PM Bug #22656: scrub mismatch on bytes (cache pools)
- Just bytes
dzafman-2018-03-28_18:21:29-rados-wip-zafman-testing-distro-basic-smithi/2332093
[ERR] 3.0 scrub sta...
- 02:29 PM Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second...
- /a/sage-2018-04-11_22:26:40-rados-wip-sage-testing-2018-04-11-1604-distro-basic-smithi/2387226
- 02:25 PM Backport #23668 (In Progress): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrit...
- https://github.com/ceph/ceph/pull/21378
- 01:34 AM Backport #23668 (Resolved): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrites'...
- https://github.com/ceph/ceph/pull/21378
- 07:19 AM Backport #23675 (In Progress): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- 07:07 AM Backport #23675 (Resolved): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- https://github.com/ceph/ceph/pull/21368
- 03:27 AM Bug #23662 (Resolved): osd: regression causes SLOW_OPS warnings in multimds suite
- 02:59 AM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- not able to move this to CI somehow... moving it to RADOS.
- 02:54 AM Backport #23315 (Resolved): luminous: pool create cmd's expected_num_objects is not correctly int...
- 02:41 AM Bug #23622 (Pending Backport): qa/workunits/mon/test_mon_config_key.py fails on master
- 02:01 AM Bug #23564: OSD Segfaults
- Correct, Bluestore and Luminous 12.2.4
- 01:57 AM Backport #23673 (In Progress): jewel: auth: ceph auth add does not sanity-check caps
- 01:43 AM Backport #23673 (Resolved): jewel: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/21367
- 01:53 AM Bug #23578 (Resolved): large-omap-object-warnings test fails
- 01:52 AM Backport #23633 (Rejected): luminous: large-omap-object-warnings test fails
- We can close this if that test isn't present in luminous.
- 01:35 AM Backport #23633 (Need More Info): luminous: large-omap-object-warnings test fails
- Brad,
Backporting PR#21295 to luminous is unrelated unless we get qa/suites/rados/singleton-nomsgr/all/large-omap-ob...
- 01:41 AM Backport #23670 (In Progress): luminous: auth: ceph auth add does not sanity-check caps
- 01:34 AM Backport #23670 (Resolved): luminous: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/24906
- 01:34 AM Backport #23654 (New): luminous: Special scrub handling of hinfo_key errors
- 01:33 AM Bug #22525 (Pending Backport): auth: ceph auth add does not sanity-check caps
04/11/2018
- 11:22 PM Bug #23662 (Fix Under Review): osd: regression causes SLOW_OPS warnings in multimds suite
- https://github.com/ceph/teuthology/pull/1166
- 09:38 PM Bug #23662: osd: regression causes SLOW_OPS warnings in multimds suite
- Looks like the obvious cause: https://github.com/ceph/ceph/pull/20660
- 07:56 PM Bug #23662 (Resolved): osd: regression causes SLOW_OPS warnings in multimds suite
- See: [1], first instance of the problem at [0].
The last run which did not cause most multimds jobs to fail with S...
- 11:20 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
Any scrub that completes without errors will set num_scrub_errors in pg stats to 0. That will cause the inconsiste...
- 10:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- David, is there any way a missing object wouldn't be reported in list-inconsistent output?
- 11:01 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- Let's see if this happens again now that sage's fast peering branch is merged.
- 10:58 PM Backport #23485 (Resolved): luminous: scrub errors not cleared on replicas can cause inconsistent...
- 10:58 PM Bug #23585: osd: safe_timer segfault
- Possibly the same as http://tracker.ceph.com/issues/23431
- 02:10 PM Bug #23585: osd: safe_timer segfault
- Got segfault in safe_timer too. Got it just once so can not provide more info at the moment.
2018-04-03 05:53:07...
- 10:57 PM Bug #23564: OSD Segfaults
- Is this on bluestore? There are a few reports of this occurring on bluestore, including your other bug http://tracker....
- 10:44 PM Bug #23590: kstore: statfs: (95) Operation not supported
- 10:42 PM Bug #23601 (Resolved): Cannot see the value of parameter osd_op_threads via ceph --show-config
- 10:37 PM Bug #23614: local_reserver double-reservation of backfilled pg
- This may be the same root cause as http://tracker.ceph.com/issues/23490
- 10:36 PM Bug #23565: Inactive PGs don't seem to cause HEALTH_ERR
- Brad, can you take a look at this? I think it can be handled by the stuck pg code, that iirc already warns about pgs ...
- 10:25 PM Bug #23664 (Resolved): cache-try-flush hits wrlock, busy loops
- ...
- 10:12 PM Bug #23403 (Closed): Mon cannot join quorum
- Thanks for letting us know.
- 01:15 PM Bug #23403: Mon cannot join quorum
- After more investigation we discovered that one of the bonds on the machine was not behaving properly. We removed the...
- 11:28 AM Bug #23403: Mon cannot join quorum
- Thanks for the investigation Brad.
The "fault, initiating reconnect" and "RESETSESSION" messages only appear when ...
- 07:57 PM Bug #23595: osd: recovery/backfill is extremely slow
- @Greg Farnum: Ah, great that part is already handled!
What about my other questions though, like
> I think it i...
- 06:45 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
- https://tracker.ceph.com/issues/23141
Sorry you ran into this, it's a bug in BlueStore/BlueFS. The fix will be in ...
- 07:49 PM Backport #23315: luminous: pool create cmd's expected_num_objects is not correctly interpreted
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20907
merged
- 05:45 PM Feature #23660 (New): when scrub errors are due to disk read errors, ceph status can say "likely ...
- If some of the scrub errors are due to disk read errors, we can also say in the status output "likely disk errors" an...
- 03:49 PM Bug #23487 (Pending Backport): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- 03:39 PM Backport #23654 (Resolved): luminous: Special scrub handling of hinfo_key errors
- https://github.com/ceph/ceph/pull/21397
- 03:09 PM Bug #23621 (Resolved): qa/standalone/mon/misc.sh fails on master
- 01:40 PM Backport #23316: jewel: pool create cmd's expected_num_objects is not correctly interpreted
- -https://github.com/ceph/ceph/pull/21042-
but test/mon/osd-pool-create.sh failing, looking into it.
- 05:00 AM Bug #23610 (Pending Backport): pg stuck in activating because of dropped pg-temp message
- 04:56 AM Bug #23648 (New): max-pg-per-osd.from-primary fails because of activating pg
- the reason why we have activating pg when the number of pg is under the hard limit of max-pg-per-osd is that:
1. o...
- 03:01 AM Bug #23647 (In Progress): thrash-eio test can prevent recovery
- We are injecting random EIOs. However, in a recovery situation an EIO leads us to decide the object is missing in on...
04/10/2018
- 11:38 PM Feature #23364 (Pending Backport): Special scrub handling of hinfo_key errors
- 09:13 PM Bug #23428: Snapset inconsistency is hard to diagnose because authoritative copy used by list-inc...
- In the pull request https://github.com/ceph/ceph/pull/20947 there is a change to partially address this issue. Unfor...
- 09:08 PM Bug #23646 (Resolved): scrub interaction with HEAD boundaries and clones is broken
- Scrub will work in chunks, accumulating work in cleaned_meta_map. A single object's clones may stretch across two su...
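The chunk-boundary problem described above can be illustrated with a minimal, hypothetical Python sketch (the object names and chunk size are made up; this is not the OSD implementation): clones sort before their head object, so a boundary chosen purely by object count can leave the clones in one chunk while the head, which carries the snapset, lands in the next.
<pre>
# Hypothetical illustration of the chunk-boundary problem described above;
# names and chunk size are invented, this is not the actual scrub code.

# Scrub-relevant objects sorted as (oid, snap); clones sort before the head,
# and the snapset needed to validate the clones is stored on the head.
objects = [
    ("obj1", 4), ("obj1", 7), ("obj1", "head"),
    ("obj2", 2), ("obj2", "head"),
]

def chunk_by_count(objs, max_per_chunk):
    """Naive chunking by object count, ignoring head/clone grouping."""
    for i in range(0, len(objs), max_per_chunk):
        yield objs[i:i + max_per_chunk]

for chunk in chunk_by_count(objects, 2):
    print(chunk)
# The first chunk holds obj1's clones but not its head, so the snapset is not
# available when those clones are scrubbed -- the situation the bug describes.
</pre>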
- 06:12 PM Backport #23630 (In Progress): luminous: pg stuck in activating
- 05:53 PM Backport #23630 (Resolved): luminous: pg stuck in activating
- https://github.com/ceph/ceph/pull/21330
- 05:53 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
- 05:53 PM Bug #23610 (Pending Backport): pg stuck in activating because of dropped pg-temp message
- 05:53 PM Backport #23633 (Rejected): luminous: large-omap-object-warnings test fails
- 05:47 PM Bug #18746 (Fix Under Review): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) ...
- 04:26 PM Bug #23495 (Resolved): Need (SLOW_OPS) in whitelist for another yaml
- 11:26 AM Bug #23495 (Fix Under Review): Need (SLOW_OPS) in whitelist for another yaml
- https://github.com/ceph/ceph/pull/21324
- 01:55 PM Bug #23627 (Resolved): Error EACCES: problem getting command descriptions from mgr.None from 'cep...
- ...
- 01:32 PM Bug #23622 (Fix Under Review): qa/workunits/mon/test_mon_config_key.py fails on master
- https://github.com/ceph/ceph/pull/21329
- 03:42 AM Bug #23622: qa/workunits/mon/test_mon_config_key.py fails on master
- see https://github.com/ceph/ceph/pull/21317 (not a fix)
- 02:56 AM Bug #23622 (Resolved): qa/workunits/mon/test_mon_config_key.py fails on master
- ...
- 07:04 AM Bug #20919 (Resolved): osd: replica read can trigger cache promotion
- 06:59 AM Backport #22403 (Resolved): jewel: osd: replica read can trigger cache promotion
- 06:22 AM Bug #23585: osd: safe_timer segfault
- https://drive.google.com/open?id=1x_0p9s9JkQ1zo-LCx6mHxm0DQO5sc1UA is too large (about 1.2G). And ceph-osd.297.log.gz di...
- 05:53 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- The docs weren't updated, so I created a PR: https://github.com/ceph/ceph/pull/21319.
- 04:57 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- It was removed in commit 08731c3567300b28d83b1ac1c2ba. Maybe the docs weren't updated, or you read old docs.
- 04:27 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- But I can see this option in the documentation!! The setting works in Jewel.
So osd_op_threads was removed in Luminous??
- 03:14 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- There is no "osd_op_threads" any more. It is now called osd_op_num_shards/osd_op_num_shards_hdd/osd_op_num_shards_ssd.
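For anyone wanting to confirm which of the replacement options an OSD is actually running with, a minimal sketch along these lines queries a local daemon's admin socket (it assumes a daemon named osd.0 on this host and the ceph CLI installed; adjust the daemon name as needed):
<pre>
# Minimal sketch: dump the running config of a local OSD via its admin socket
# and print the sharded op-queue settings that replaced osd_op_threads.
# Assumes "ceph daemon osd.0 config show" works on this host.
import json
import subprocess

out = subprocess.check_output(["ceph", "daemon", "osd.0", "config", "show"])
config = json.loads(out)

for key, value in sorted(config.items()):
    if key.startswith("osd_op_num_"):
        # e.g. osd_op_num_shards, osd_op_num_shards_hdd, osd_op_num_shards_ssd
        print(f"{key} = {value}")
</pre>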
- 05:34 AM Bug #23595: osd: recovery/backfill is extremely slow
- Whether a device is hdd or ssd is determined by code when the osd starts, and it is not changed after starting.
I think we need to increase the log level fo...
- 05:19 AM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- 04:29 AM Bug #23621 (In Progress): qa/standalone/mon/misc.sh fails on master
- https://github.com/ceph/ceph/pull/21318
- 04:17 AM Bug #23621: qa/standalone/mon/misc.sh fails on master
- bc5df2b4497104c2a8747daf0530bb5184f9fecb added ceph::features::mon::FEATURE_OSDMAP_PRUNE so the output that's failing...
- 02:53 AM Bug #23621: qa/standalone/mon/misc.sh fails on master
- http://pulpito.ceph.com/sage-2018-04-10_02:19:56-rados-master-distro-basic-smithi/2377263
http://pulpito.ceph.com/sa...
- 02:51 AM Bug #23621 (Resolved): qa/standalone/mon/misc.sh fails on master
- This appears to be from the addition of the osdmap-prune mon feature?
- 02:49 AM Bug #23620 (Fix Under Review): tasks.mgr.test_failover.TestFailover failure
- https://github.com/ceph/ceph/pull/21315
- 02:43 AM Bug #23620 (Resolved): tasks.mgr.test_failover.TestFailover failure
- http://pulpito.ceph.com/sage-2018-04-10_02:19:56-rados-master-distro-basic-smithi/2377255...
- 12:57 AM Bug #23578 (Pending Backport): large-omap-object-warnings test fails
- Just a note that my analysis above was incorrect and this was not due to the lost coin flips but due to a pg map upda...
- 12:18 AM Backport #23485 (In Progress): luminous: scrub errors not cleared on replicas can cause inconsist...
04/09/2018
- 10:24 PM Feature #23616 (New): osd: admin socket should help debug status at all times
- Last week I was looking at an LRC OSD which was having trouble, and it wasn't clear why.
The cause ended up being ...
- 10:18 PM Bug #22882 (Resolved): Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
- Whoops, this merged way back then with a slightly different plan than discussed here (see PR discussion).
- 09:59 PM Bug #22525: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/21311
- 09:21 PM Feature #23045 (Resolved): mon: warn on slow ops in OpTracker
- That PR got merged a while ago and we've been working through the slow ops warnings that turn up since. Seems to be a...
- 08:59 PM Feature #21084 (Resolved): auth: add osd auth caps based on pool metadata
- 06:53 PM Bug #23614: local_reserver double-reservation of backfilled pg
- Looking through the code I don't see where the reservation is supposed to be released. I see releases for
- the p...
- 06:52 PM Bug #23614 (Resolved): local_reserver double-reservation of backfilled pg
- - pg gets reservations (incl local_reserver)
- pg backfills, finishes
- ...apparently never releases the reservatio...
- 06:15 PM Bug #23365: CEPH device class not honored for erasure encoding.
- A quote from Greg Farnum on the crash from another ticket:...
- 06:13 PM Bug #23365: CEPH device class not honored for erasure encoding.
- I put 12.2.2, but that is incorrect. It is ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) lu...
- 05:38 PM Bug #23365: CEPH device class not honored for erasure encoding.
- What version are you running? How are your OSDs configured?
There was a bug with BlueStore SSDs being misreported ... - 05:36 PM Bug #23371: OSDs flaps when cluster network is made down
- You tested this on a version prior to luminous and the behavior has *changed*?
This must be a result of some chang...
- 05:24 PM Documentation #23613 (Closed): doc: add description of new fs-client auth profile
- 05:23 PM Documentation #23613 (Closed): doc: add description of new fs-client auth profile
- On that page: http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
https:...
- 05:23 PM Documentation #23612 (New): doc: add description of new auth profiles
- On that page: http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
https:...
- 05:18 PM Support #23455 (Resolved): osd: large number of inconsistent objects after recover or backfilling
- fiemap is disabled by default precisely because there are a number of known bugs in the local filesystems across kern...
- 05:07 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
- https://github.com/ceph/ceph/pull/21310
- 05:02 PM Bug #23610 (Resolved): pg stuck in activating because of dropped pg-temp message
- http://pulpito.ceph.com/yuriw-2018-04-05_22:33:03-rados-wip-yuri3-testing-2018-04-05-1940-luminous-distro-basic-smith...
- 05:06 PM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- This is a dupe of...something. We can track it down later.
For now, note that the crash is happening with Hammer c...
- 06:17 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- Hm hm hm
- 02:56 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- h3. rados bisect
Reproducer: ...
- 02:11 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- This problem was not happening so reproducibly before the current integration run, so one of the following PRs might ...
- 02:05 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- Set priority to Urgent because this prevents us from getting a clean rados run in jewel 10.2.11 integration testing.
- 02:04 AM Bug #23598 (Duplicate): hammer->jewel: ceph_test_rados crashes during radosbench task in jewel ra...
- Test description: rados/upgrade/{hammer-x-singleton/{0-cluster/{openstack.yaml start.yaml} 1-hammer-install/hammer.ya...
- 04:39 PM Bug #23595: osd: recovery/backfill is extremely slow
- *I have it figured out!*
The issue was "osd_recovery_sleep_hdd", which defaults to 0.1 seconds.
After setting
...
- 03:23 PM Bug #23595: osd: recovery/backfill is extremely slow
- OK, if I only have the 6 large files in the cephfs AND set the options...
- 02:55 PM Bug #23595: osd: recovery/backfill is extremely slow
- I have now tested with only the 6*1GB files, having deleted the 270k empty files from cephfs.
I continue to see ex...
- 12:30 PM Bug #23595: osd: recovery/backfill is extremely slow
- You can find a core dump of the -O0 version created with GDB at http://nh2.me/ceph-issue-23595-osd-O0.core.xz
- 12:06 PM Bug #23595: osd: recovery/backfill is extremely slow
- Attached are two GDB runs of a sender node.
In the release build there were many values "<optimized out>", so I re...
- 11:45 AM Bug #23595: osd: recovery/backfill is extremely slow
- On https://forum.proxmox.com/threads/increase-ceph-recovery-speed.36728/ people reported the same number as me of 10 ...
- 10:43 AM Bug #23601 (Resolved): Cannot see the value of parameter osd_op_threads via ceph --show-config
- I have set the parameter of "osd op threads" in configuration file
but I cannot see the value of parameter "osd op t...
- 10:17 AM Bug #23403 (Need More Info): Mon cannot join quorum
- 07:23 AM Bug #23578 (In Progress): large-omap-object-warnings test fails
- https://github.com/ceph/ceph/pull/21295
- 01:33 AM Bug #23578: large-omap-object-warnings test fails
- We instruct the OSDs to scrub at around 16:15....
- 04:31 AM Bug #23593 (Fix Under Review): RESTControllerTest.test_detail_route and RESTControllerTest.test_f...
- 02:08 AM Bug #22123: osd: objecter sends out of sync with pg epochs for proxied ops
- Despite the jewel backport of this fix being merged, this problem has reappeared in jewel 10.2.11 integration testing...
04/08/2018
- 07:55 PM Bug #23595: osd: recovery/backfill is extremely slow
- For the record, I installed the following debugging packages for gdb stack traces:...
- 07:53 PM Bug #23595: osd: recovery/backfill is extremely slow
- I have read https://www.spinics.net/lists/ceph-devel/msg38331.html which suggests that there is some throttling going...
- 06:17 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
- I made a Ceph 12.2.4 (luminous stable) cluster of 3 machines with 10-Gigabit networking on Ubuntu 16.04, using pretty...
- 05:40 PM Bug #23593: RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- PR: https://github.com/ceph/ceph/pull/21290
- 03:10 PM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- ...
- 04:31 PM Documentation #23594: auth: document what to do when locking client.admin out
- I found one way to fix it on the mailing list:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/01...
- 04:23 PM Documentation #23594 (New): auth: document what to do when locking client.admin out
- I accidentally ran ...
- 11:06 AM Bug #23590: kstore: statfs: (95) Operation not supported
- https://github.com/ceph/ceph/pull/21287
- 11:01 AM Bug #23590 (Fix Under Review): kstore: statfs: (95) Operation not supported
- 2018-04-07 16:19:07.248 7fdec4675700 -1 osd.0 0 statfs() failed: (95) Operation not supported
2018-04-07 16:19:08....
- 08:50 AM Bug #23589 (New): jewel: KStore Segmentation fault in ceph_test_objectstore --gtest_filter=-*/2:-*/3
- Test description: rados/objectstore/objectstore.yaml
Log excerpt:...
- 08:39 AM Bug #23588 (New): LibRadosAioEC.IsCompletePP test fails in jewel 10.2.11 integration testing
- Test description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yam...
- 06:53 AM Bug #23511: forwarded osd_failure leak in mon
- Greg, no. both tests below include the no_reply() fix.
see
- http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-r...
- 06:42 AM Bug #23585 (Duplicate): osd: safe_timer segfault
- ...
04/07/2018
- 03:04 AM Bug #23195: Read operations segfaulting multiple OSDs
Change the test-erasure-eio.sh test as following:...
04/06/2018
- 10:23 PM Bug #22165 (Fix Under Review): split pg not actually created, gets stuck in state unknown
- Fixed by https://github.com/ceph/ceph/pull/20469
- 09:29 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
- You'll definitely get more attention and advice if somebody else has hit this issue before.
- 08:45 PM Bug #23195: Read operations segfaulting multiple OSDs
- For anyone running into the send_all_remaining_reads() crash, a workaround is to use these osd settings:...
- 04:17 PM Bug #23195 (Fix Under Review): Read operations segfaulting multiple OSDs
- https://github.com/ceph/ceph/pull/21273
I'm going to treat this issue as tracking the first crash, in send_all_rem... - 03:10 AM Bug #23195 (In Progress): Read operations segfaulting multiple OSDs
- 08:41 PM Bug #23200 (Resolved): invalid JSON returned when querying pool parameters
- 08:40 PM Backport #23312 (Resolved): luminous: invalid JSON returned when querying pool parameters
- 07:28 PM Backport #23312: luminous: invalid JSON returned when querying pool parameters
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20890
merged
- 08:40 PM Bug #23324 (Resolved): delete type mismatch in CephContext teardown
- 08:40 PM Backport #23412 (Resolved): luminous: delete type mismatch in CephContext teardown
- 07:28 PM Backport #23412: luminous: delete type mismatch in CephContext teardown
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20998
merged
- 08:38 PM Bug #23477 (Resolved): should not check for VERSION_ID
- 08:38 PM Backport #23478 (Resolved): should not check for VERSION_ID
- 07:26 PM Backport #23478: should not check for VERSION_ID
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21090
merged
- 06:03 PM Bug #21833 (Resolved): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
- 06:02 PM Backport #23160 (Resolved): luminous: Multiple asserts caused by DNE pgs left behind after lots o...
- 03:57 PM Backport #23160: luminous: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
- Prashant D wrote:
> Waiting for code review for backport PR : https://github.com/ceph/ceph/pull/20668
merged
- 06:02 PM Bug #23078 (Resolved): SRV resolution fails to lookup AAAA records
- 06:02 PM Backport #23174 (Resolved): luminous: SRV resolution fails to lookup AAAA records
- 03:56 PM Backport #23174: luminous: SRV resolution fails to lookup AAAA records
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20710
merged
- 05:57 PM Backport #23472 (Resolved): luminous: add --add-bucket and --move options to crushtool
- 03:53 PM Backport #23472: luminous: add --add-bucket and --move options to crushtool
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21079
merged
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
- 05:37 PM Bug #23578 (Resolved): large-omap-object-warnings test fails
- ...
- 03:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Sorry, forgot to mention I am running 12.2.4.
- 03:50 PM Bug #23576 (Can't reproduce): osd: active+clean+inconsistent pg will not scrub or repair
- My apologies if I'm too premature in posting this.
Myself and so far two others on the mailing list: http://lists....
- 03:44 AM Bug #23345 (Resolved): `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
- https://github.com/ceph/ceph/pull/20986
- 01:57 AM Bug #21737 (Resolved): OSDMap cache assert on shutdown
- 01:56 AM Backport #21786 (Resolved): jewel: OSDMap cache assert on shutdown
04/05/2018
- 09:12 PM Bug #22887 (Duplicate): osd/ECBackend.cc: 2202: FAILED assert((offset + length) <= (range.first.g...
- 09:12 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- From #22887, this also appeared in /ceph/teuthology-archive/pdonnell-2018-01-30_23:38:56-kcephfs-wip-pdonnell-i22627-...
- 09:09 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- That was the fix I was wondering about, but it was merged to master as https://github.com/ceph/ceph/pull/15712 and so...
- 09:05 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- https://github.com/ceph/ceph/pull/15712
- 09:10 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
- https://github.com/ceph/ceph/pull/15712
- 06:35 PM Bug #22351 (Resolved): Couldn't init storage provider (RADOS)
- 06:35 PM Backport #23349 (Resolved): luminous: Couldn't init storage provider (RADOS)
- 05:22 PM Backport #23349: luminous: Couldn't init storage provider (RADOS)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20896
merged
- 06:33 PM Bug #22114 (Resolved): mon: ops get stuck in "resend forwarded message to leader"
- 06:33 PM Backport #23077 (Resolved): luminous: mon: ops get stuck in "resend forwarded message to leader"
- 04:57 PM Backport #23077: luminous: mon: ops get stuck in "resend forwarded message to leader"
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21016
merged
- 06:31 PM Bug #22752 (Resolved): snapmapper inconsistency, crash on luminous
- 06:31 PM Backport #23500 (Resolved): luminous: snapmapper inconsistency, crash on luminous
- 04:55 PM Backport #23500: luminous: snapmapper inconsistency, crash on luminous
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21118
merged
- 05:14 PM Bug #23565 (Fix Under Review): Inactive PGs don't seem to cause HEALTH_ERR
- In looking at https://tracker.ceph.com/issues/23562, there were inactive PGs starting at...
- 04:43 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
- ...
- 04:18 PM Bug #23564 (Duplicate): OSD Segfaults
- Apr 5 11:40:31 roc05r-sc3a100 kernel: [126029.543698] safe_timer[28863]: segfault at 8d ip 00007fa9ad4dcccb sp 00007...
- 12:24 PM Bug #23562 (New): VDO OSD caused cluster to hang
- I awoke to alerts that apache serving teuthology logs on the Octo Long Running Cluster was unresponsive.
Here was ...
- 08:37 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
- Hi Greg,
thanks for your response.
> That URL denies access. You can use ceph-post-file instead to upload logs ...
- 03:31 AM Bug #23403: Mon cannot join quorum
- My apologies. It appears my previous analysis was incorrect.
I've pored over the logs and it appears the issue is ...
04/04/2018
- 11:19 PM Bug #23554: mon: mons need to be aware of VDO statistics
- Right, but AFAICT the monitor is then not even aware of VDO being involved. Which seems fine to my naive thoughts, bu...
- 11:05 PM Bug #23554: mon: mons need to be aware of VDO statistics
- Of course Sage is already on it :)
I don't know where the ...
- 10:46 PM Bug #23554: mon: mons need to be aware of VDO statistics
- At least this: https://github.com/ceph/ceph/pull/20516
- 10:44 PM Bug #23554: mon: mons need to be aware of VDO statistics
- What would we expect this monitor awareness to look like? Extra columns duplicating the output of vdostats?
- 05:48 PM Bug #23554 (New): mon: mons need to be aware of VDO statistics
- I created an OSD on top of a logical volume with a VDO device underneath.
Ceph is unaware of how much compression ...
- 09:58 PM Bug #23128: invalid values in ceph.conf do not issue visible warnings
- http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/ has been updated with information about this
- 09:53 PM Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
- Can you reproduce with osds configured with:...
- 09:43 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
- That URL denies access. You can use ceph-post-file instead to upload logs to a secure location.
It's not clear wha...
- 09:39 PM Bug #23320 (Fix Under Review): OSD suicide itself because of a firewall rule but reports a receiv...
- github.com/ceph/ceph/pull/21000
- 09:37 PM Bug #23487: There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- 09:31 PM Bug #23510 (Resolved): rocksdb spillover for hard drive configurations
- 09:31 PM Bug #23511: forwarded osd_failure leak in mon
- Kefu, did your latest no_reply() PR resolve this?
- 09:29 PM Bug #23535 (Closed): 'ceph --show-config --conf /dev/null' does not work any more
- Yeah, you should use the monitor config commands now! :)
- 09:28 PM Bug #23258: OSDs keep crashing.
- Brian, that's a separate bug; the code address you've picked up on is just part of the generic failure handling code....
- 09:19 PM Bug #23258: OSDs keep crashing.
- I was about to start a new bug and found this, I am also seeing 0xa74234 and ceph::__ceph_assert_fail...
A while b...
- 09:22 PM Bug #20924: osd: leaked Session on osd.7
- /a/sage-2018-04-04_02:28:04-rados-wip-sage2-testing-2018-04-03-1634-distro-basic-smithi/2351291
rados/verify/{ceph...
- 09:21 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
- Under discussion on the PR, which is good on its own terms but suffering from a prior CephFS bug. :(
- 09:19 PM Bug #23297: mon-seesaw 'failed to become clean before timeout' due to laggy pg create
- I suspect this is resolved in https://github.com/ceph/ceph/pull/19973 by the commit that has the OSDs proactively go ...
- 09:16 PM Bug #23490: luminous: osd: double recovery reservation for PG when EIO injected (while already re...
- David, can you look at this when you get a chance? I think it's due to EIO triggering recovery when recovery is alrea...
- 09:13 PM Bug #23204: missing primary copy of object in mixed luminous<->master cluster with bluestore
- We should see this again as we run the upgrade suite for mimic...
- 09:08 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
- https://github.com/ceph/ceph/pull/20933
- 09:07 PM Bug #23267 (Pending Backport): scrub errors not cleared on replicas can cause inconsistent pg sta...
- 07:25 PM Backport #23413 (Resolved): jewel: delete type mismatch in CephContext teardown
- 07:23 PM Bug #20471 (Resolved): Can't repair corrupt object info due to bad oid on all replicas
- 07:23 PM Backport #23181 (Resolved): jewel: Can't repair corrupt object info due to bad oid on all replicas
- 06:24 PM Bug #21758 (Resolved): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
- 06:24 PM Backport #21784 (Resolved): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make check...
- 06:18 PM Feature #23242 (Resolved): ceph-objectstore-tool command to trim the pg log
- 06:18 PM Backport #23307 (Resolved): jewel: ceph-objectstore-tool command to trim the pg log
- 08:14 AM Feature #23552 (New): cache PK11Context in Connection and probably other consumers of CryptoKeyHa...
- please see attached flamegraph, the 0.67% CPU cycle is used by PK11_CreateContextBySymKey(), if we cache the PK11Cont...
04/03/2018
- 08:40 PM Bug #23145: OSD crashes during recovery of EC pg
- Investigation results up to the date:
1. The local PGLog claims its _pg_log_t::can_rollback_to_ is **17348'18588**...
- 08:59 AM Backport #22906 (Need More Info): jewel: bluestore: New OSD - Caught signal - bstore_kv_sync (thr...
- non-trivial backport
- 08:56 AM Backport #22808 (Need More Info): jewel: "osd pool stats" shows recovery information bugly
- non-trivial backport
- 08:33 AM Backport #22808 (In Progress): jewel: "osd pool stats" shows recovery information bugly
- 08:28 AM Backport #22449 (In Progress): jewel: Visibility for snap trim queue length
- https://github.com/ceph/ceph/pull/21200
- 08:13 AM Backport #22449: jewel: Visibility for snap trim queue length
- I don't think it's possible to backport entire feature without breaking Jewel->Luminous upgrade, so just first commit...
- 08:22 AM Backport #22403 (In Progress): jewel: osd: replica read can trigger cache promotion
- 08:15 AM Backport #22390 (In Progress): jewel: ceph-objectstore-tool: Add option "dump-import" to examine ...
- 04:05 AM Backport #23486 (In Progress): jewel: scrub errors not cleared on replicas can cause inconsistent...
- 02:35 AM Backport #21786 (In Progress): jewel: OSDMap cache assert on shutdown
04/02/2018
- 05:35 PM Bug #23145: OSD crashes during recovery of EC pg
- Anything new or info on what to do to try and recover this cluster? I don't even know how to get the pool deleted pro...
- 10:28 AM Bug #23535: 'ceph --show-config --conf /dev/null' does not work any more
- I just realized `--show-config` does not exist anymore. Probably it was removed intentionally?
04/01/2018
- 07:49 AM Bug #23535 (Closed): 'ceph --show-config --conf /dev/null' does not work any more
- Previously it could be used by users to return the default ceph configuration (see e.g. [1]), now it fails (even if w...
- 07:03 AM Backport #21784 (In Progress): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make ch...
- 06:58 AM Backport #22449 (Need More Info): jewel: Visibility for snap trim queue length
- Backporting this feature to jewel at this late stage seems risky. Do we really need it in jewel?
03/30/2018
- 05:10 PM Bug #22123 (Resolved): osd: objecter sends out of sync with pg epochs for proxied ops
- 05:09 PM Backport #23076 (Resolved): jewel: osd: objecter sends out of sync with pg epochs for proxied ops
- 03:31 PM Bug #23511: forwarded osd_failure leak in mon
- rerunning the tests at http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-rados-wip-slow-mon-ops-kefu-distro-basic-smi...
- 01:02 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- Moving this to CI. This failure would only occur if the cls_XYX.so libraries could not be loaded during the execution...
- 02:59 AM Bug #23517 (Resolved): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- ...
- 05:25 AM Bug #23510: rocksdb spillover for hard drive configurations
- Igor Fedotov wrote:
> Ben,
> this has been fixed by https://github.com/ceph/ceph/pull/19257
> Not sure about an ex...
- 12:10 AM Bug #23403 (Triaged): Mon cannot join quorum
- ...
03/29/2018
- 06:39 PM Bug #21218 (Resolved): thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing...
- 06:39 PM Backport #23024 (Resolved): luminous: thrash-eio + bluestore (hangs with unfound objects or read_...
- 01:20 PM Backport #23024: luminous: thrash-eio + bluestore (hangs with unfound objects or read_log_and_mis...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20495
merged
- 03:39 PM Bug #23510: rocksdb spillover for hard drive configurations
- Ben,
this has been fixed by https://github.com/ceph/ceph/pull/19257
Not sure about an exact Luminous build it lande...
- 03:02 PM Bug #23510 (Resolved): rocksdb spillover for hard drive configurations
- version: ceph-*-12.2.1-34.el7cp.x86_64
One of Bluestore's best use cases is to accelerate performance for writes o...
- 03:33 PM Bug #22413 (Resolved): can't delete object from pool when Ceph out of space
- 03:33 PM Backport #23114 (Resolved): luminous: can't delete object from pool when Ceph out of space
- 01:19 PM Backport #23114: luminous: can't delete object from pool when Ceph out of space
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20585
merged
- 03:08 PM Bug #23511 (Can't reproduce): forwarded osd_failure leak in mon
- see http://pulpito.ceph.com/kchai-2018-03-29_13:20:02-rados-wip-slow-mon-ops-kefu-distro-basic-smithi/2334154/
<p...
- 01:24 PM Bug #22847 (Resolved): ceph osd force-create-pg cause all ceph-mon to crash and unable to come up...
- 01:24 PM Backport #22942 (Resolved): luminous: ceph osd force-create-pg cause all ceph-mon to crash and un...
- 01:21 PM Backport #22942: luminous: ceph osd force-create-pg cause all ceph-mon to crash and unable to com...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20399
merged - 01:23 PM Backport #23075 (Resolved): luminous: osd: objecter sends out of sync with pg epochs for proxied ops
- 01:18 PM Backport #23075: luminous: osd: objecter sends out of sync with pg epochs for proxied ops
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20609
merged - 10:28 AM Bug #19737 (Resolved): EAGAIN encountered during pg scrub (jewel)
- 09:54 AM Backport #23500 (In Progress): luminous: snapmapper inconsistency, crash on luminous
- 08:20 AM Backport #23500 (Resolved): luminous: snapmapper inconsistency, crash on luminous
- https://github.com/ceph/ceph/pull/21118
- 09:16 AM Bug #21844 (Resolved): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -ENOENT
- 09:16 AM Backport #21923 (Resolved): jewel: Objecter::C_ObjectOperation_sparse_read throws/catches excepti...
- 09:16 AM Bug #23403: Mon cannot join quorum
- Hi all,
As asked on the ceph-users mailing list, here are the results of the following commands on the 3 monitors:...
- 09:09 AM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
- Happened again (jewel 10.2.11 integration testing) - http://qa-proxy.ceph.com/teuthology/smithfarm-2018-03-28_20:31:4...
- 08:25 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- I've seen this on our cluster (luminous, bluestore based), but was unable to reproduce it...
Restarting primary mon...
- 01:43 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- when we reboot one host, some osds take a long time to start,
and one osd finally succeeds to start after several tim...
- 01:11 AM Bug #17170 (New): mon/monclient: update "unable to obtain rotating service keys when osd init" to...
- We hit this issue again in Luminous.
- 08:16 AM Backport #23186 (Resolved): luminous: ceph tell mds.* <command> prints only one matching usage
- 08:15 AM Bug #23212 (Resolved): bluestore: should recalc_allocated when decoding bluefs_fnode_t
- 08:15 AM Backport #23256 (Resolved): luminous: bluestore: should recalc_allocated when decoding bluefs_fno...
- 08:15 AM Bug #23298 (Resolved): filestore: do_copy_range replay bad return value
- 08:14 AM Backport #23351 (Resolved): luminous: filestore: do_copy_range replay bad return value
- 04:10 AM Bug #23228: scrub mismatch on objects
Just bytes
dzafman-2018-03-28_18:21:29-rados-wip-zafman-testing-distro-basic-smithi/2332093
[ERR] 3.0 scrub s...
- 04:07 AM Bug #23495 (Resolved): Need (SLOW_OPS) in whitelist for another yaml
A job may have failed because (SLOW_OPS) is missing from tasks/mon_clock_with_skews.yaml
dzafman-2018-03-28_18:2...
- 02:09 AM Feature #23493 (Resolved): config: strip/escape single-quotes in values when setting them via con...
- At the moment, the config parsing state machine does not account for single-quotes as potential value enclosures, as ...
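As a rough, hypothetical sketch of the requested behaviour (the real parser is C++ and may differ), stripping one matching pair of enclosing quotes from a config value could look like this:
<pre>
# Hypothetical sketch of the requested quote handling for config values;
# not the actual Ceph config parser.
def strip_value_quotes(value: str) -> str:
    """Strip one matching pair of enclosing single or double quotes."""
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    return value

assert strip_value_quotes("'rack'") == "rack"
assert strip_value_quotes('"host=foo"') == "host=foo"
assert strip_value_quotes("plain") == "plain"
</pre>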
- 01:09 AM Bug #23492 (Resolved): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-e...
dzafman-2018-03-28_15:20:23-rados:standalone-wip-zafman-testing-distro-basic-smithi/2331804
In TEST_rados_get_ba...
- 12:29 AM Bug #22752 (Pending Backport): snapmapper inconsistency, crash on luminous
03/28/2018
- 10:58 PM Bug #23490 (Duplicate): luminous: osd: double recovery reservation for PG when EIO injected (whil...
- During a luminous test run, this was hit:
http://pulpito.ceph.com/yuriw-2018-03-27_21:16:27-rados-wip-yuri5-testin...
- 10:26 PM Backport #23186: luminous: ceph tell mds.* <command> prints only one matching usage
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20664
merged
- 10:26 PM Backport #23256: luminous: bluestore: should recalc_allocated when decoding bluefs_fnode_t
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20771
merged
- 10:22 PM Backport #23351: luminous: filestore: do_copy_range replay bad return value
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20957
merged
- 06:06 PM Bug #23487 (Fix Under Review): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- PR: https://github.com/ceph/ceph/pull/21102
- 05:58 PM Bug #23487 (Resolved): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- We have `ceph osd pool set erasure allow_ec_overwrites` command but does not have a corresponding command to get the ...
- 05:42 PM Backport #23486 (Resolved): jewel: scrub errors not cleared on replicas can cause inconsistent pg...
- https://github.com/ceph/ceph/pull/21194
- 05:42 PM Backport #23485 (Resolved): luminous: scrub errors not cleared on replicas can cause inconsistent...
- https://github.com/ceph/ceph/pull/21103
- 05:27 PM Bug #23267 (Fix Under Review): scrub errors not cleared on replicas can cause inconsistent pg sta...
- https://github.com/ceph/ceph/pull/21101
- 11:21 AM Bug #22114 (Pending Backport): mon: ops get stuck in "resend forwarded message to leader"
- 08:15 AM Backport #23478 (In Progress): should not check for VERSION_ID
- https://github.com/ceph/ceph/pull/21090
- 08:08 AM Backport #23478 (Resolved): should not check for VERSION_ID
- https://github.com/ceph/ceph/pull/21090
- 08:07 AM Bug #23477 (Pending Backport): should not check for VERSION_ID
- * https://github.com/ceph/ceph/pull/17787
* https://github.com/ceph/ceph/pull/21052
- 08:06 AM Bug #23477 (Resolved): should not check for VERSION_ID
- as per os-release(5), VERSION_ID is optional.
- 07:06 AM Bug #23352: osd: segfaults under normal operation
- for those who wants to check the coredump. you should use apport-unpack to unpack it first.
and it crashed at /bui... - 05:55 AM Backport #23413 (In Progress): jewel: delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/21084
- 01:28 AM Backport #23472 (In Progress): luminous: add --add-bucket and --move options to crushtool
- https://github.com/ceph/ceph/pull/21079
- 12:57 AM Backport #23472 (Resolved): luminous: add --add-bucket and --move options to crushtool
- https://github.com/ceph/ceph/pull/21079
- 12:50 AM Bug #23471 (Pending Backport): add --add-bucket and --move options to crushtool
- https://github.com/ceph/ceph/pull/20183
- 12:49 AM Bug #23471 (Resolved): add --add-bucket and --move options to crushtool
- When using crushtool to create a CRUSH map, it is not possible to create a complex CRUSH map, we have to edit the CRU...
03/27/2018
- 10:46 PM Bug #23352: osd: segfaults under normal operation
- Chris,
Was your stack identical to Alex's original description or was it more like the stack in #23431?
- 10:39 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- I agree these are similar and the cause may indeed be the same however there are only two stack frames in this instan...
- 07:36 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- There's a coredump-in-apport on google drive in http://tracker.ceph.com/issues/23352 - it looks at the face of it sim...
- 01:06 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- I have seen this as well, on our cluster. We're using bluestore, ubuntu 16, latest luminous.
The crashes were totall...
- 10:58 AM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- The ceph-osd comes from https://download.ceph.com/rpm-luminous/el7/x86_64/
I verified via md5sum that the local co...
- 09:43 AM Bug #23431 (Need More Info): OSD Segmentation fault in thread_name:safe_timer
- What's the exact version of the ceph-osd you are using (exact package URL if possible please).
You could try 'objd...
- 02:52 PM Feature #22420 (Resolved): Add support for obtaining a list of available compression options
- https://github.com/ceph/ceph/pull/20558
- 02:45 PM Bug #23215 (Resolved): config.cc: ~/.ceph/$cluster.conf is passed unexpanded to fopen()
- https://github.com/ceph/ceph/pull/20774
- 09:49 AM Backport #23077: luminous: mon: ops get stuck in "resend forwarded message to leader"
- might want to include https://github.com/ceph/ceph/pull/21057 also.
- 09:49 AM Bug #22114 (Fix Under Review): mon: ops get stuck in "resend forwarded message to leader"
- and https://github.com/ceph/ceph/pull/21057
- 01:35 AM Bug #22220 (Resolved): osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at...
- Resolved for Fedora and just waiting on next DTS to ship on rhel/CentOS.
03/26/2018
- 11:27 PM Bug #23465: "Mutex.cc: 110: FAILED assert(r == 0)" ("AttributeError: 'tuple' object has no attrib...
- This isn't related to that suite commit. Run manually, 'file' returns "remote/smithi150/coredump/1522085413.12350.cor...
- 07:42 PM Bug #23465 (New): "Mutex.cc: 110: FAILED assert(r == 0)" ("AttributeError: 'tuple' object has no ...
- I see latest commit https://github.com/ceph/ceph/commit/c6760eba50860d40e25483c3e4cee772f3ad4468#diff-289c6ff15fd25ac...
- 09:11 AM Backport #23316 (Need More Info): jewel: pool create cmd's expected_num_objects is not correctly ...
- To backport this to jewel, we need to skip mgr changes and qa/standalone/mon/osd-pool-create.sh related changes to be...
03/24/2018
- 11:01 AM Bug #23352 (New): osd: segfaults under normal operation
- Raising priority because it's a possible regression in Luminous.
- 07:52 AM Support #23455: osd: large number of inconsistent objects after recover or backfilling
- Head objects are also affected, but only a very small portion of them.
- 07:46 AM Support #23455: osd: large number of inconsistent objects after recover or backfilling
- v10.2.5 is also affected. Only snap objects are affected; no head object is affected.
- 07:36 AM Support #23455: osd: large number of inconsistent objects after recover or backfilling
- it seems quite similar to issue http://tracker.ceph.com/issues/21388
- 07:20 AM Support #23455 (Resolved): osd: large number of inconsistent objects after recover or backfilling
- large number of inconsistent objects after recover or backfilling.
reproduce method:
1) create rbd volume and, ...
- 07:09 AM Bug #23430 (Resolved): PGs are stuck in 'creating+incomplete' status on vstart cluster
03/23/2018
- 04:47 PM Bug #23352: osd: segfaults under normal operation
- "Me too". I had a brief look at the coredump, without becoming all that much wiser. Judging by the lock attached to t...
- 02:44 PM Bug #23145: OSD crashes during recovery of EC pg
- sorry Josh, I saw the ec pool min_size is equal to 5, I need to verify with our test engineer tomorrow... the two environme...
- 06:21 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
- And the next three ,,,...
- 05:18 AM Bug #23440: duplicated "commit_queued_for_journal_write" events in OpTracker
- OK. then I'm going to close mine.
- 05:13 AM Bug #23440: duplicated "commit_queued_for_journal_write" events in OpTracker
- Yanhu Cao wrote:
> https://github.com/ceph/ceph/pull/21017
Hi, Yanhu, thank you for your contribution. My intern ...
- 03:37 AM Bug #23440: duplicated "commit_queued_for_journal_write" events in OpTracker
- https://github.com/ceph/ceph/pull/21017
- 02:41 AM Backport #23077 (In Progress): luminous: mon: ops get stuck in "resend forwarded message to leader"
- Include both PRs from comment#2:
https://github.com/ceph/ceph/pull/21016
03/22/2018
- 06:56 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
- And the next three OSDs crashed:...
- 11:49 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
- With #23258 we already had a similar issue and I am wondering if this is something you always have to expect with Cep...
- 11:46 AM Bug #23439 (New): Crashing OSDs after 'ceph pg repair'
- Yesterday, ceph reported scrub errors....
- 06:14 PM Bug #23430 (Fix Under Review): PGs are stuck in 'creating+incomplete' status on vstart cluster
- PR: https://github.com/ceph/ceph/pull/21008
- 05:54 PM Bug #23430 (In Progress): PGs are stuck in 'creating+incomplete' status on vstart cluster
- 05:54 PM Bug #23430: PGs are stuck in 'creating+incomplete' status on vstart cluster
- I think the problem is that `ceph config` sets osd_pool_default_erasure_code_profile too late: when the cluster alrea...
- 05:05 PM Bug #23430: PGs are stuck in 'creating+incomplete' status on vstart cluster
- I think it is still worth investigating.
Previously the default profile just worked on vstart clusters, and now it d...
- 03:51 PM Bug #23430: PGs are stuck in 'creating+incomplete' status on vstart cluster
- I did further investigation here and figured out this issue occurs due to the "special" situation of my vstart enviro...
- 01:59 PM Bug #23440 (In Progress): duplicated "commit_queued_for_journal_write" events in OpTracker
- ...
- 01:38 PM Bug #23352: osd: segfaults under normal operation
- Also seeing these, no core dump but have now had 3 segfaults in 2 weeks since upgrading to 12.2.4 from a very stable ...
- 11:15 AM Bug #23145: OSD crashes during recovery of EC pg
- hi Josh, here is the log i said to offer in APAC with debug_osd=30 & debug_bluestore = 30. its another environment, s...
- 08:08 AM Bug #23145: OSD crashes during recovery of EC pg
- Xiaofei Cui wrote:
> We think we have met the same problem.
> The pginfos:
>
> [...]
>
> We have no idea why...
- 05:21 AM Backport #23412 (In Progress): luminous: delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/20998
- 02:55 AM Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica tak...
- I reproduced this by creating an inconsistent pg and then causing it to split.
pool of size 2 with 1 pg and I crea...
- 02:02 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- Got it! I couldn't use the pause feature because none of the get/get-bytes/stat stuff would work, it all got stuck.
...
03/21/2018
- 11:38 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
I'm confused because you are dealing with 2 different objects.
Does rb.0.854e.238e1f29.000000140b6d still have a...
- 11:12 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- Ah, I got confused by stuff. Should I just not stop the OSDs, or just stop during the get, start for the put?
I ju...
- 11:00 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
I don't know how rados get/put could work while the PG's OSDs are all stopped. Also, 'rados get' will give EIO erro...
- 10:39 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- So the stuff I did above did not work, the result of the repair after get/put:...
- 10:08 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- So for some reason my rados get/put commands are working now, not sure why. After I complete all my steps, the repair...
- 09:31 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- I'm trying the same thing on a different broken pg, while it's stuck the pg detail is:...
- 09:00 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- The time to write should be on the same order as reading.
You forgot to restart your osd before running rados.
- 08:38 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- David Zafman wrote:
> With client activity stopped, read the data from this object and write it again using rados ge...
- 06:15 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
You have a data_digest issue, not an omap_digest one. You can remove the temporary omap entry. Since shards 8, 13 ...
- 05:57 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
- I'm trying to fix my problems but I'm kind of a noob, having trouble getting things to work. My cluster seems to be d...
- 07:38 PM Bug #23228: scrub mismatch on objects
- /a/dzafman-2018-03-21_09:57:19-rados:thrash-wip-zafman-testing2-distro-basic-smithi/2312125
rados:thrash/{0-size-m...
- 11:33 AM Backport #23408 (In Progress): luminous: mgrc's ms_handle_reset races with send_pgstats()
- https://github.com/ceph/ceph/pull/20987
- 09:14 AM Bug #23431 (Duplicate): OSD Segmentation fault in thread_name:safe_timer
- I noticed an OSD segmentation fault in one of our OSDs logs.
See the attached log entries. There is no core file tha...
- 08:27 AM Bug #23430 (Resolved): PGs are stuck in 'creating+incomplete' status on vstart cluster
- Hi,
The PGs are stuck in 'creating+incomplete' status after creating an erasure coded pool on a vstart cluster.
...
- 03:05 AM Bug #23428 (New): Snapset inconsistency is hard to diagnose because authoritative copy used by li...
- ...
- 02:27 AM Bug #23145: OSD crashes during recovery of EC pg
- We think we have met the same problem.
The pginfos:...
03/20/2018
- 12:32 PM Bug #23145 (In Progress): OSD crashes during recovery of EC pg
- 12:32 PM Bug #23145: OSD crashes during recovery of EC pg
- Sorry for missing your updates, Peter. :-( I've just scripted my Gmail for _X-Redmine-Project: bluestore_.
From th...
03/19/2018
- 09:35 PM Bug #23145 (New): OSD crashes during recovery of EC pg
- 07:32 PM Bug #23145: OSD crashes during recovery of EC pg
- Can't seem to flip this ticket out of 'Needs more info', unfortunately..
- 04:42 PM Backport #23413 (Resolved): jewel: delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/21084
- 04:42 PM Backport #23412 (Resolved): luminous: delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/20998
- 04:42 PM Backport #23408 (Resolved): luminous: mgrc's ms_handle_reset races with send_pgstats()
- https://github.com/ceph/ceph/pull/23791
- 04:26 PM Bug #23267 (In Progress): scrub errors not cleared on replicas can cause inconsistent pg state wh...
- 04:00 PM Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica tak...
- 01:00 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- Appears the error is with calculating the host weight.
It has set it at 43.664 when it should be set to 43.668
...
- 10:34 AM Bug #23403 (Closed): Mon cannot join quorum
- Hi all,
On a 3-mon cluster running infernalis, one of the mons left the quorum and we are unable to make it come bac...
- 10:23 AM Backport #23351 (In Progress): luminous: filestore: do_copy_range replay bad return value
- https://github.com/ceph/ceph/pull/20957
- 09:24 AM Bug #23402 (Duplicate): objecter: does not resend op on split interval
- ...
- 09:01 AM Bug #23370 (Pending Backport): mgrc's ms_handle_reset races with send_pgstats()
03/18/2018
- 10:19 PM Bug #23339 (Resolved): Scrub errors after ec-small-objects-overwrites test
- http://pulpito.ceph.com/sage-2018-03-18_09:19:17-rados-wip-sage-testing-2018-03-18-0231-distro-basic-smithi/
03/17/2018
- 02:08 AM Bug #23395: qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core dump
../qa/run-standalone.sh ceph_objectstore_tool.py
--- ../qa/standalone/special/ceph_objectstore_tool.py ---
vst...
- 02:05 AM Bug #23395 (Can't reproduce): qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core...
I assume erasure code profile handling must have changed. It shouldn't crash but we may need a test change too.
...
03/16/2018
- 10:38 PM Feature #23364: Special scrub handling of hinfo_key errors
- https://github.com/ceph/ceph/pull/20947
- 08:37 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- Appears Paul Emmerich has found the problem and it's down to the weights.
The email chain can be seen from the mailin...
- 09:22 AM Bug #23386 (Resolved): crush device class: Monitor Crash when moving Bucket into Default root
- When moving prestaged hosts with disks that out side of a root moving them into the root, causes the monitor to crash...
- 08:08 PM Bug #23339 (Fix Under Review): Scrub errors after ec-small-objects-overwrites test
- http://pulpito.ceph.com/sage-2018-03-16_17:59:04-rados:thrash-erasure-code-overwrites-wip-sage-testing-2018-03-16-112...
- 05:09 PM Bug #23352: osd: segfaults under normal operation
- Here is the link to the core dump https://drive.google.com/open?id=1tOTqSOaS94gOhHfXmGbbfuXLNFFfOVuf
- 04:34 PM Bug #23324 (Pending Backport): delete type mismatch in CephContext teardown
- 03:03 AM Bug #23324 (In Progress): delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/20930
- 01:38 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
- Forgot to mention the exact place it breaks:...
- 10:21 AM Bug #23387 (Resolved): Building Ceph on armhf fails due to out-of-memory
- Hi,
I'm currently struggling with building ceph through make-deps.sh on an armhf (namely the ODROID HC2). Everythin...
- 09:16 AM Bug #23385: osd: master osd crash when pg scrub
- The ceph version is 10.2.3
- 09:11 AM Bug #23385 (New): osd: master osd crash when pg scrub
- my ceph on arm is 4.4.52-armada-17.06.2. I put an object into rados. When scrubbing the pg with handle, the master osd crashes. Bel...
- 08:56 AM Bug #23320: OSD suicide itself because of a firewall rule but reports a received signal
- Can I have some input on this topic? I can make the PR but I'd love to have your opinion on it.
Thx,
03/15/2018
- 06:00 PM Bug #23145: OSD crashes during recovery of EC pg
- Let me know if you need anything else off this cluster, I probably will have to trash this busted PG at some point so...
- 05:37 AM Bug #23370 (Fix Under Review): mgrc's ms_handle_reset races with send_pgstats()
- https://github.com/ceph/ceph/pull/20909
- 05:34 AM Bug #23370 (Resolved): mgrc's ms_handle_reset races with send_pgstats()
- 2018-03-14T12:29:45.168 INFO:teuthology.orchestra.run.mira056:Running: 'sudo adjust-ulimits ceph-coverage /home/ubunt...
- 05:34 AM Bug #23371 (New): OSDs flaps when cluster network is made down
- we are having a 5 node cluster with 5 mons and 120 OSDs equally distributed.
As part of our resiliency test we ma...
- 04:06 AM Backport #23315 (In Progress): luminous: pool create cmd's expected_num_objects is not correctly ...
- https://github.com/ceph/ceph/pull/20907
03/14/2018
- 09:37 PM Bug #22346: OSD_ORPHAN issues after jewel->luminous upgrade, but orphaned osds not in crushmap
- Hi Jun,
It's not really possible to pinpoint an exact PR at this stage as it's possible there was more than one an...
- 10:19 AM Bug #22346: OSD_ORPHAN issues after jewel->luminous upgrade, but orphaned osds not in crushmap
- Brad Hubbard wrote:
> Hi Graham,
>
> The consensus is that this was caused by a bug in a previous release which f...
- 08:41 PM Bug #23365 (New): CEPH device class not honored for erasure encoding.
- To start, this cluster isn't happy. It is my destructive testing/learning cluster.
Recently I rebuilt the cluster...
- 08:36 PM Feature #23364 (Resolved): Special scrub handling of hinfo_key errors
We shouldn't handle hinfo_key as just another user xattr
Add the following errors specific to hinfo_key for eras...
- 06:32 PM Bug #23361 (New): /build/ceph-12.2.4/src/osd/PGLog.h: 888: FAILED assert(i->prior_version == last...
Log with debug_osd=20 and debug_bluestore=20 enabled:
https://drive.google.com/open?id=1Yr_MIXHzrgWUR5ZsV1xKlPUqZH...
- 04:49 PM Bug #23360 (Duplicate): call to 'ceph osd erasure-code-profile set' asserts the monitors
- duplicate of http://tracker.ceph.com/issues/23345
- 04:16 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
- A proper fix would be to provide a proper error message in @OSDMonitor::parse_erasure_code_profile@ instead of assert...
- 04:15 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
- Found the cause of this. From the mon.a.log:...
- 03:08 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
- Hm, quite possible that this is in fact not a classic deadlock.
Turns out, the `ceph` command line tool is also br...
- 02:52 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
- The @send_command()@ function visible in this traceback is: https://github.com/ceph/ceph/pull/20865/files#diff-188b91...
- 02:48 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
- Could you point to the code, or provide a small python example, that triggers this deadlock?
- 02:37 PM Bug #23360 (Duplicate): call to 'ceph osd erasure-code-profile set' asserts the monitors
- I've attached `thread apply all bt` mixed with `thread apply all py-bt`
Threads 38 35 34 32 and 31 are waiting for...
- 03:48 PM Bug #23352: osd: segfaults under normal operation
- Sage, PM'ed to you the public download link, hope it works.
- 03:39 PM Bug #23352: osd: segfaults under normal operation
- HI Sage, I do have the core dump. Where can I upload the file, it's rather large, 850 MB compressed.
- 01:54 PM Bug #23352 (Need More Info): osd: segfaults under normal operation
- Do you have a core file? I haven't seen this crash before.
- 02:13 AM Bug #23352 (Resolved): osd: segfaults under normal operation
- -1> 2018-03-13 22:03:27.390956 7f42eec36700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1520993007390955, "job": 454,...
- 01:58 PM Bug #23345: `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
- 01:55 PM Bug #23339: Scrub errors after ec-small-objects-overwrites test
- 10:59 AM Bug #22351: Couldn't init storage provider (RADOS)
- @Brad - that's perfect, thanks. Backport PR open.
- 10:27 AM Bug #22351: Couldn't init storage provider (RADOS)
- @Nathan Oops, sorry mate, my bad.
These are the two we need.
https://github.com/ceph/ceph/pull/20022
https:/...
- 09:44 AM Bug #22351: Couldn't init storage provider (RADOS)
- @Brad - I was confused because you changed the status to Resolved, apparently before the backport was done.
Could ...
- 12:25 AM Bug #22351: Couldn't init storage provider (RADOS)
- @Nathan There wasn't one, I just set the backport field?
Just let me know if you need any action from me on this.
- 10:57 AM Backport #23349 (In Progress): luminous: Couldn't init storage provider (RADOS)
- 07:13 AM Documentation #23354 (Resolved): doc: osd_op_queue & osd_op_queue_cut_off
- In docs:
osd_op_queue's default is listed as `prio`, but the real value is `wpq`, so this is a docs bug.
If I understand properly: if o...
- 05:12 AM Backport #23312 (In Progress): luminous: invalid JSON returned when querying pool parameters
- https://github.com/ceph/ceph/pull/20890