Activity
From 03/25/2018 to 04/23/2018
04/23/2018
- 09:40 PM Bug #23646 (In Progress): scrub interaction with HEAD boundaries and clones is broken
The osd log for primary osd.1 shows that pg 3.0 is a cache pool in a cache tiering configuration. The message "_de...
- 09:12 PM Bug #23830 (Can't reproduce): rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
- ...
- 07:45 PM Bug #23828 (Can't reproduce): ec gen object leaks into different filestore collection just after ...
- ...
- 05:11 PM Bug #23827 (Resolved): osd sends op_reply out of order
- ...
- 03:13 AM Bug #23713 (Fix Under Review): High MON cpu usage when cluster is changing
- https://github.com/ceph/ceph/pull/21532
04/22/2018
- 08:07 PM Bug #21977: null map from OSDService::get_map in advance_pg
- From the latest logs, the peering thread id does not appear at all in the log until the crash.
I'm wondering if we...
- 08:05 PM Bug #21977: null map from OSDService::get_map in advance_pg
- Seen again here:
http://pulpito.ceph.com/yuriw-2018-04-20_20:02:29-upgrade:jewel-x-luminous-distro-basic-ovh/2420862/
04/21/2018
- 04:06 PM Bug #23793: ceph-osd consumed 10+GB rss memory
- Setting osd_debug_op_order to false fixes this problem.
My ceph cluster is created through vstart.sh, which sets osd_deb...
- 03:57 PM Bug #23816: disable bluestore cache caused a rocksdb error
- https://github.com/ceph/ceph/pull/21583
- 03:53 PM Bug #23816 (Resolved): disable bluestore cache caused a rocksdb error
- I disabled bluestore/rocksdb cache to estimate ceph-osd's memory consumption
by setting bluestore_cache_size_ssd/bluesto...
- 06:55 AM Bug #23145: OSD crashes during recovery of EC pg
- `2018-03-09 08:29:09.170227 7f901e6b30 10 merge_log log((17348'18587,17348'18587], crt=17348'18585) from osd.6(2) int...
04/20/2018
- 09:09 PM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- You don't really have authentication without the message signing. Since we don't do full encryption, signing is the o...
- 03:07 PM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- How costly is just the authentication piece, i.e. keep cephx but turn off message signing?
- 07:21 AM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- Summary of the discussion:
`check_message_signature` in `AsyncConnection::process` is already protected by `...
- 06:38 AM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- per Radoslaw Zarzynski
> the overhead between `CreateContextBySym` and `DigestBegin` is small
and probably we c...
- 08:53 PM Bug #23517 (Resolved): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- 02:27 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- https://github.com/ceph/ceph/pull/21571
- 08:47 PM Bug #23811: RADOS stat slow for some objects on same OSD
- We are still debugging this. On a further look, it looks like all objects on that PG (aka _79.1f9_) show similar slow...
- 05:30 PM Bug #23811 (New): RADOS stat slow for some objects on same OSD
- We have observed that queries have been slow for some RADOS objects while others on the same OSD respond much more quickly...
- 05:19 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
- I guess the intention is that scrubbing takes priority and proceeds even if trimming is in progress. Before more tri...
- 04:45 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
We don't start trimming if scrubbing is happening, so maybe the only hole is that scrubbing doesn't check for trimm...
- 04:38 PM Bug #23810: ceph mon dump outputs verbose text to stderr
- As a simple verification, running:...
- 04:26 PM Bug #23810 (New): ceph mon dump outputs verbose text to stderr
- When executing...
- 02:41 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- My opinion is that this is different from a problem where the inconsistent flag reappears after repairing a PG becaus...
- 12:55 PM Backport #23808 (In Progress): luminous: upgrade: bad pg num and stale health status in mixed lum...
- https://github.com/ceph/ceph/pull/21556
- 12:55 PM Backport #23808 (Resolved): luminous: upgrade: bad pg num and stale health status in mixed lumnio...
- https://github.com/ceph/ceph/pull/21556
- 11:11 AM Bug #23763 (Fix Under Review): upgrade: bad pg num and stale health status in mixed lumnious/mimi...
- https://github.com/ceph/ceph/pull/21555
- 10:09 AM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- I think the pg_num = 11 is set by LibRadosList.EnumerateObjects...
- 12:32 AM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- Yuri reproduced the bad pg_num in 1 of 2 runs:...
- 12:48 AM Bug #22881 (In Progress): scrub interaction with HEAD boundaries and snapmapper repair is broken
04/19/2018
- 02:18 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- 12:34 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- https://github.com/ceph/ceph/pull/21280
- 07:42 AM Bug #23517 (Fix Under Review): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- ...
- 09:33 AM Backport #23784 (In Progress): luminous: osd: Warn about objects with too many omap entries
- 09:33 AM Backport #23784: luminous: osd: Warn about objects with too many omap entries
- h3. description
As discussed in this PR - https://github.com/ceph/ceph/pull/16332
- 07:29 AM Bug #23793: ceph-osd consumed 10+GB rss memory
- the "mon max pg per osd" is 1024 in my test.
- 07:14 AM Bug #23793 (New): ceph-osd consumed 10+GB rss memory
- After 26GB data is written, ceph-osd's memory(rss) reached 10+GB.
The objectstore backend is *KStore*. master branc...
- 06:42 AM Backport #22934 (In Progress): luminous: filestore journal replay does not guard omap operations
04/18/2018
- 09:10 PM Bug #23620 (Resolved): tasks.mgr.test_failover.TestFailover failure
- 08:25 PM Bug #23788 (Duplicate): luminous->mimic: EIO (crc mismatch) on copy-get from ec pool
- ...
- 08:01 PM Bug #23787 (Rejected): luminous: "osd-scrub-repair.sh'" failures in rados
- This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
- 07:57 PM Backport #23786 (Resolved): luminous: "utilities/env_librados.cc:175:33: error: unused parameter ...
- This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
- 07:52 PM Bug #23785 (Resolved): "test_prometheus (tasks.mgr.test_module_selftest.TestModuleSelftest) ... E...
- This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
- 06:45 PM Backport #23784 (Resolved): luminous: osd: Warn about objects with too many omap entries
- https://github.com/ceph/ceph/pull/21518
- 03:34 PM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- the pgs with creating or unknown status in "pg dump" were active+clean after 2018-04-16 22:47. So the output of the last "pg...
- 01:29 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Any update on this?
- 12:14 PM Bug #23701 (Fix Under Review): oi(object_info_t) size mismatch with real object size during old w...
- 12:12 PM Bug #23701: oi(object_info_t) size mismatch with real object size during old write arrives after ...
- I think this issue only exists in jewel.
- 02:47 AM Documentation #23777: doc: description of OSD_OUT_OF_ORDER_FULL problem
- the default values:
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
- 02:39 AM Documentation #23777 (Resolved): doc: description of OSD_OUT_OF_ORDER_FULL problem
- The description of OSD_OUT_OF_ORDER_FULL is...
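For context, the OSD_OUT_OF_ORDER_FULL health warning fires when these thresholds are not ascending (nearfull < backfillfull < full). A minimal sketch of that ordering rule — illustrative only, not Ceph's actual code:

```python
def full_ratios_out_of_order(nearfull: float, backfillfull: float, full: float) -> bool:
    """Return True when the thresholds are not strictly ascending,
    which is the condition OSD_OUT_OF_ORDER_FULL warns about."""
    return not (nearfull < backfillfull < full)

# The defaults listed above are correctly ordered, so no warning:
print(full_ratios_out_of_order(0.85, 0.90, 0.95))  # False
# Raising backfillfull_ratio above full_ratio would trigger the warning:
print(full_ratios_out_of_order(0.85, 0.96, 0.95))  # True
```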
- 12:30 AM Feature #23364 (Resolved): Special scrub handling of hinfo_key errors
- 12:30 AM Backport #23654 (Resolved): luminous: Special scrub handling of hinfo_key errors
04/17/2018
- 07:00 PM Backport #23772 (Resolved): luminous: ceph status shows wrong number of objects
- https://github.com/ceph/ceph/pull/22680
- 06:36 PM Bug #23769 (Resolved): osd/EC: slow/hung ops in multimds suite test
- ...
- 03:40 PM Feature #23364: Special scrub handling of hinfo_key errors
- This pull request is another follow on:
https://github.com/ceph/ceph/pull/21450
- 11:41 AM Bug #20924: osd: leaked Session on osd.7
- /a/sage-2018-04-17_04:17:03-rados-wip-sage3-testing-2018-04-16-2028-distro-basic-smithi/2404155
this time on osd.4...
- 07:37 AM Bug #23767: "ceph ping mon" doesn't work
- so "ceph ping mon.<id>" will tell you that mon.<id> doesn't exist. However, if you run "ceph ping mon.a", you can get ...
- 07:33 AM Bug #23767 (New): "ceph ping mon" doesn't work
- if there is only mon_host= ip1, ip2...in the ceph.conf, then "ceph ping mon.<id>" doesn't work.
Root cause is in the...
- 06:14 AM Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
- Sorry for the late reply, but it's hard to reproduce. We reproduced it once with...
- 02:09 AM Documentation #23765 (New): librbd hangs if permissions are incorrect
- I've been building rust bindings for librbd against ceph jewel and luminous. I found out by accident that if a cephx...
- 12:14 AM Bug #23763 (Resolved): upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- This happened in a luminous-x/point-to-point run. Logs in teuthology:/home/yuriw/logs/2387999/
Versions at this po...
04/16/2018
- 05:52 PM Bug #23760 (New): mon: `config get <who>` does not allow `who` as 'mon'/'osd'
- `config set mon` is allowed, but `config get mon` is not.
This is due to <who> on `get` being parsed as an EntityN...
- 04:39 PM Bug #23753: "Error ENXIO: problem getting command descriptions from osd.4" in upgrade:kraken-x-lu...
- This generally means the OSD isn't on?
04/15/2018
- 10:22 PM Bug #23753 (Can't reproduce): "Error ENXIO: problem getting command descriptions from osd.4" in u...
- Run: http://pulpito.ceph.com/teuthology-2018-04-15_03:25:02-upgrade:kraken-x-luminous-distro-basic-smithi/
Jobs: '23...
- 05:44 PM Bug #23471 (Resolved): add --add-bucket and --move options to crushtool
- 02:51 PM Bug #22095 (Pending Backport): ceph status shows wrong number of objects
- 08:52 AM Bug #19348: "ceph ping mon.c" cli prints assertion failure on timeout
- https://github.com/ceph/ceph/pull/21432
04/14/2018
- 06:11 AM Support #23719: Three nodes shutdown,only boot two of the nodes,many pgs down.(host failure-domai...
- fix description: If pg1.1 has acting set [1,2,3], I power down osd.3 first, then power down osd.2. In case I boot osd...
- 05:50 AM Support #23719 (New): Three nodes shutdown,only boot two of the nodes,many pgs down.(host failure...
- The interval mechanism of PG will cause a problem in the process of cluster restart. If I have 3 nodes (host failure-do...
04/13/2018
- 10:40 PM Bug #23716: osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (on upgrade f...
- ...
- 10:21 PM Bug #23716 (Resolved): osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (o...
- ...
- 07:33 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- Live multimds run: /ceph/teuthology-archive/pdonnell-2018-04-13_04:19:33-multimds-wip-pdonnell-testing-20180413.02283...
- 07:30 PM Bug #21992: osd: src/common/interval_map.h: 161: FAILED assert(len > 0)
- /ceph/teuthology-archive/pdonnell-2018-04-13_04:19:33-multimds-wip-pdonnell-testing-20180413.022831-testing-basic-smi...
- 06:28 PM Bug #23713: High MON cpu usage when cluster is changing
- My guess is that this is the compat reencoding of the OSDMap for the pre-luminous clients.
Are you by chance makin...
- 06:10 PM Bug #23713 (Resolved): High MON cpu usage when cluster is changing
- After upgrading to Luminous 12.2.4 (from Jewel 10.2.5), we consistently see high cpu usage when OSDMap changes, esp...
- 03:03 PM Bug #23228 (Closed): scrub mismatch on objects
- The failure in comment (2) looks unrelated, but it was a test branch. Let's see if it happens again.
The original ...
- 01:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Is there any testing, logs, etc. that would be helpful for tracking down the cause of this problem? I had a fairly bad...
- 08:20 AM Bug #23701: oi(object_info_t) size mismatch with real object size during old write arrives after ...
- here is my pull request to fix this problem
https://github.com/ceph/ceph/pull/21408
- 08:08 AM Bug #23701 (Fix Under Review): oi(object_info_t) size mismatch with real object size during old w...
- currently, in our test environment (jewel : cephfs + cache tier + ec pool), we found several osd coredump
in the fol...
- 01:52 AM Backport #23654 (In Progress): luminous: Special scrub handling of hinfo_key errors
- https://github.com/ceph/ceph/pull/21397
04/12/2018
- 11:08 PM Feature #23364: Special scrub handling of hinfo_key errors
- Follow on pull request included in backport to this tracker
https://github.com/ceph/ceph/pull/21362
- 09:49 PM Bug #23610 (Resolved): pg stuck in activating because of dropped pg-temp message
- 09:48 PM Backport #23630 (Resolved): luminous: pg stuck in activating
- 09:28 PM Backport #23630: luminous: pg stuck in activating
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21330
merged
- 05:35 PM Bug #23228: scrub mismatch on objects
- My change only affects the scrub error counts in the stats. However, if setting dirty_info in proc_primary_info() wo...
- 04:27 PM Bug #23228: scrub mismatch on objects
- The original report was an EC test, so it looks like a dup of #23339.
David, your failures are not EC. Could they...
- 04:43 PM Bug #20439 (Can't reproduce): PG never finishes getting created
- 04:29 PM Bug #22656: scrub mismatch on bytes (cache pools)
- Just bytes
dzafman-2018-03-28_18:21:29-rados-wip-zafman-testing-distro-basic-smithi/2332093
[ERR] 3.0 scrub sta...
- 02:29 PM Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second...
- /a/sage-2018-04-11_22:26:40-rados-wip-sage-testing-2018-04-11-1604-distro-basic-smithi/2387226
- 02:25 PM Backport #23668 (In Progress): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrit...
- https://github.com/ceph/ceph/pull/21378
- 01:34 AM Backport #23668 (Resolved): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrites'...
- https://github.com/ceph/ceph/pull/21378
- 07:19 AM Backport #23675 (In Progress): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- 07:07 AM Backport #23675 (Resolved): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- https://github.com/ceph/ceph/pull/21368
- 03:27 AM Bug #23662 (Resolved): osd: regression causes SLOW_OPS warnings in multimds suite
- 02:59 AM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- not able to move this to CI somehow... moving it to RADOS.
- 02:54 AM Backport #23315 (Resolved): luminous: pool create cmd's expected_num_objects is not correctly int...
- 02:41 AM Bug #23622 (Pending Backport): qa/workunits/mon/test_mon_config_key.py fails on master
- 02:01 AM Bug #23564: OSD Segfaults
- Correct, Bluestore and Luminous 12.2.4
- 01:57 AM Backport #23673 (In Progress): jewel: auth: ceph auth add does not sanity-check caps
- 01:43 AM Backport #23673 (Resolved): jewel: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/21367
- 01:53 AM Bug #23578 (Resolved): large-omap-object-warnings test fails
- 01:52 AM Backport #23633 (Rejected): luminous: large-omap-object-warnings test fails
- We can close this if that test isn't present in luminous.
- 01:35 AM Backport #23633 (Need More Info): luminous: large-omap-object-warnings test fails
- Brad,
Backporting PR#21295 to luminous is unrelated unless we get qa/suites/rados/singleton-nomsgr/all/large-omap-ob...
- 01:41 AM Backport #23670 (In Progress): luminous: auth: ceph auth add does not sanity-check caps
- 01:34 AM Backport #23670 (Resolved): luminous: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/24906
- 01:34 AM Backport #23654 (New): luminous: Special scrub handling of hinfo_key errors
- 01:33 AM Bug #22525 (Pending Backport): auth: ceph auth add does not sanity-check caps
04/11/2018
- 11:22 PM Bug #23662 (Fix Under Review): osd: regression causes SLOW_OPS warnings in multimds suite
- https://github.com/ceph/teuthology/pull/1166
- 09:38 PM Bug #23662: osd: regression causes SLOW_OPS warnings in multimds suite
- Looks like the obvious cause: https://github.com/ceph/ceph/pull/20660
- 07:56 PM Bug #23662 (Resolved): osd: regression causes SLOW_OPS warnings in multimds suite
- See: [1], first instance of the problem at [0].
The last run which did not cause most multimds jobs to fail with S...
- 11:20 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
Any scrub that completes without errors will set num_scrub_errors in pg stats to 0. That will cause the inconsiste...
- 10:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- David, is there any way a missing object wouldn't be reported in list-inconsistent output?
- 11:01 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- Let's see if this happens again now that sage's fast peering branch is merged.
- 10:58 PM Backport #23485 (Resolved): luminous: scrub errors not cleared on replicas can cause inconsistent...
- 10:58 PM Bug #23585: osd: safe_timer segfault
- Possibly the same as http://tracker.ceph.com/issues/23431
- 02:10 PM Bug #23585: osd: safe_timer segfault
- Got a segfault in safe_timer too. Got it just once so I cannot provide more info at the moment.
2018-04-03 05:53:07...
- 10:57 PM Bug #23564: OSD Segfaults
- Is this on bluestore? There are a few reports of this occurring on bluestore including your other bug http://tracker....
- 10:44 PM Bug #23590: kstore: statfs: (95) Operation not supported
- 10:42 PM Bug #23601 (Resolved): Cannot see the value of parameter osd_op_threads via ceph --show-config
- 10:37 PM Bug #23614: local_reserver double-reservation of backfilled pg
- This may be the same root cause as http://tracker.ceph.com/issues/23490
- 10:36 PM Bug #23565: Inactive PGs don't seem to cause HEALTH_ERR
- Brad, can you take a look at this? I think it can be handled by the stuck pg code, that iirc already warns about pgs ...
- 10:25 PM Bug #23664 (Resolved): cache-try-flush hits wrlock, busy loops
- ...
- 10:12 PM Bug #23403 (Closed): Mon cannot join quorum
- Thanks for letting us know.
- 01:15 PM Bug #23403: Mon cannot join quorum
- After more investigation we discovered that one of the bonds on the machine was not behaving properly. We removed the...
- 11:28 AM Bug #23403: Mon cannot join quorum
- Thanks for the investigation Brad.
The "fault, initiating reconnect" and "RESETSESSION" messages only appear when ...
- 07:57 PM Bug #23595: osd: recovery/backfill is extremely slow
- @Greg Farnum: Ah, great that part is already handled!
What about my other questions though, like
> I think it i...
- 06:45 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
- https://tracker.ceph.com/issues/23141
Sorry you ran into this, it's a bug in BlueStore/BlueFS. The fix will be in ...
- 07:49 PM Backport #23315: luminous: pool create cmd's expected_num_objects is not correctly interpreted
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20907
merged
- 05:45 PM Feature #23660 (New): when scrub errors are due to disk read errors, ceph status can say "likely ...
- If some of the scrub errors are due to disk read errors, we can also say in the status output "likely disk errors" an...
- 03:49 PM Bug #23487 (Pending Backport): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- 03:39 PM Backport #23654 (Resolved): luminous: Special scrub handling of hinfo_key errors
- https://github.com/ceph/ceph/pull/21397
- 03:09 PM Bug #23621 (Resolved): qa/standalone/mon/misc.sh fails on master
- 01:40 PM Backport #23316: jewel: pool create cmd's expected_num_objects is not correctly interpreted
- -https://github.com/ceph/ceph/pull/21042-
but test/mon/osd-pool-create.sh is failing, looking into it.
- 05:00 AM Bug #23610 (Pending Backport): pg stuck in activating because of dropped pg-temp message
- 04:56 AM Bug #23648 (New): max-pg-per-osd.from-primary fails because of activating pg
- the reason why we have activating pg when the number of pg is under the hard limit of max-pg-per-osd is that:
1. o...
- 03:01 AM Bug #23647 (In Progress): thrash-eio test can prevent recovery
- We are injecting random EIOs. However, in a recovery situation an EIO leads us to decide the object is missing in on...
04/10/2018
- 11:38 PM Feature #23364 (Pending Backport): Special scrub handling of hinfo_key errors
- 09:13 PM Bug #23428: Snapset inconsistency is hard to diagnose because authoritative copy used by list-inc...
- In the pull request https://github.com/ceph/ceph/pull/20947 there is a change to partially address this issue. Unfor...
- 09:08 PM Bug #23646 (Resolved): scrub interaction with HEAD boundaries and clones is broken
- Scrub will work in chunks, accumulating work in cleaned_meta_map. A single object's clones may stretch across two su...
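As a toy illustration of the boundary problem described above (purely illustrative — the names and listing below are hypothetical, not Ceph's actual data structures): if scrub splits the sorted object listing into fixed-size chunks, a head object and its clones can land in different chunks, so per-chunk processing has to carry state across the boundary.

```python
def chunked(objects, chunk_size):
    """Split a sorted object listing into scrub-style chunks."""
    return [objects[i:i + chunk_size] for i in range(0, len(objects), chunk_size)]

# Hypothetical listing where obj2's clone sorts adjacent to its head.
listing = ["obj1", "obj2(clone1)", "obj2(head)", "obj3"]
chunks = chunked(listing, 2)
print(chunks)  # [['obj1', 'obj2(clone1)'], ['obj2(head)', 'obj3']]
# The clone and its head fall into different chunks: any per-chunk scrub
# logic must accumulate work (as cleaned_meta_map does) across chunks.
```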
- 06:12 PM Backport #23630 (In Progress): luminous: pg stuck in activating
- 05:53 PM Backport #23630 (Resolved): luminous: pg stuck in activating
- https://github.com/ceph/ceph/pull/21330
- 05:53 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
- 05:53 PM Bug #23610 (Pending Backport): pg stuck in activating because of dropped pg-temp message
- 05:53 PM Backport #23633 (Rejected): luminous: large-omap-object-warnings test fails
- 05:47 PM Bug #18746 (Fix Under Review): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) ...
- 04:26 PM Bug #23495 (Resolved): Need (SLOW_OPS) in whitelist for another yaml
- 11:26 AM Bug #23495 (Fix Under Review): Need (SLOW_OPS) in whitelist for another yaml
- https://github.com/ceph/ceph/pull/21324
- 01:55 PM Bug #23627 (Resolved): Error EACCES: problem getting command descriptions from mgr.None from 'cep...
- ...
- 01:32 PM Bug #23622 (Fix Under Review): qa/workunits/mon/test_mon_config_key.py fails on master
- https://github.com/ceph/ceph/pull/21329
- 03:42 AM Bug #23622: qa/workunits/mon/test_mon_config_key.py fails on master
- see https://github.com/ceph/ceph/pull/21317 (not a fix)
- 02:56 AM Bug #23622 (Resolved): qa/workunits/mon/test_mon_config_key.py fails on master
- ...
- 07:04 AM Bug #20919 (Resolved): osd: replica read can trigger cache promotion
- 06:59 AM Backport #22403 (Resolved): jewel: osd: replica read can trigger cache promotion
- 06:22 AM Bug #23585: osd: safe_timer segfault
- https://drive.google.com/open?id=1x_0p9s9JkQ1zo-LCx6mHxm0DQO5sc1UA is too large (about 1.2G). And ceph-osd.297.log.gz di...
- 05:53 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- The docs weren't updated, so I created a PR: https://github.com/ceph/ceph/pull/21319.
- 04:57 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- It was removed in commit 08731c3567300b28d83b1ac1c2ba. Maybe the docs weren't updated, or you read old docs.
- 04:27 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- But I can see this option in the documentation!! The setting works in Jewel.
So osd_op_threads was removed in Luminous??
- 03:14 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- There is no "osd_op_threads". It is now called osd_op_num_shards/osd_op_num_shards_hdd/osd_op_num_shards_ssd.
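For reference, a ceph.conf sketch of the sharded op-queue options named in the comment above. The values shown are assumptions (commonly cited Luminous-era defaults), not authoritative — check the defaults for your release:

```ini
[osd]
# Sharded op queue sizing; these superseded the removed osd_op_threads.
# Assumed defaults shown — verify with `ceph daemon osd.N config show`.
osd_op_num_shards_hdd = 5
osd_op_num_shards_ssd = 8
```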
- 05:34 AM Bug #23595: osd: recovery/backfill is extremely slow
- hdd vs ssd is determined in code when the osd starts and is not changed after starting.
I think we need to increase the log level fo...
- 05:19 AM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- 04:29 AM Bug #23621 (In Progress): qa/standalone/mon/misc.sh fails on master
- https://github.com/ceph/ceph/pull/21318
- 04:17 AM Bug #23621: qa/standalone/mon/misc.sh fails on master
- bc5df2b4497104c2a8747daf0530bb5184f9fecb added ceph::features::mon::FEATURE_OSDMAP_PRUNE so the output that's failing...
- 02:53 AM Bug #23621: qa/standalone/mon/misc.sh fails on master
- http://pulpito.ceph.com/sage-2018-04-10_02:19:56-rados-master-distro-basic-smithi/2377263
http://pulpito.ceph.com/sa... - 02:51 AM Bug #23621 (Resolved): qa/standalone/mon/misc.sh fails on master
- This appears to be from the addition of the osdmap-prune mon feature?
- 02:49 AM Bug #23620 (Fix Under Review): tasks.mgr.test_failover.TestFailover failure
- https://github.com/ceph/ceph/pull/21315
- 02:43 AM Bug #23620 (Resolved): tasks.mgr.test_failover.TestFailover failure
- http://pulpito.ceph.com/sage-2018-04-10_02:19:56-rados-master-distro-basic-smithi/2377255...
- 12:57 AM Bug #23578 (Pending Backport): large-omap-object-warnings test fails
- Just a note that my analysis above was incorrect and this was not due to the lost coin flips but due to a pg map upda...
- 12:18 AM Backport #23485 (In Progress): luminous: scrub errors not cleared on replicas can cause inconsist...
04/09/2018
- 10:24 PM Feature #23616 (New): osd: admin socket should help debug status at all times
- Last week I was looking at an LRC OSD which was having trouble, and it wasn't clear why.
The cause ended up being ...
- 10:18 PM Bug #22882 (Resolved): Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
- Whoops, this merged way back then with a slightly different plan than discussed here (see PR discussion).
- 09:59 PM Bug #22525: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/21311
- 09:21 PM Feature #23045 (Resolved): mon: warn on slow ops in OpTracker
- That PR got merged a while ago and we've been working through the slow ops warnings that turn up since. Seems to be a...
- 08:59 PM Feature #21084 (Resolved): auth: add osd auth caps based on pool metadata
- 06:53 PM Bug #23614: local_reserver double-reservation of backfilled pg
- Looking through the code I don't see where the reservation is supposed to be released. I see releases for
- the p...
- 06:52 PM Bug #23614 (Resolved): local_reserver double-reservation of backfilled pg
- - pg gets reservations (incl local_reserver)
- pg backfills, finishes
- ...apparently never releases the reservatio...
- 06:15 PM Bug #23365: CEPH device class not honored for erasure encoding.
- A quote from Greg Farnum on the crash from another ticket:...
- 06:13 PM Bug #23365: CEPH device class not honored for erasure encoding.
- I put 12.2.2, but that is incorrect. It is version ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) lu...
- 05:38 PM Bug #23365: CEPH device class not honored for erasure encoding.
- What version are you running? How are your OSDs configured?
There was a bug with BlueStore SSDs being misreported ...
- 05:36 PM Bug #23371: OSDs flaps when cluster network is made down
- You tested this on a version prior to luminous and the behavior has *changed*?
This must be a result of some chang...
- 05:24 PM Documentation #23613 (Closed): doc: add description of new fs-client auth profile
- 05:23 PM Documentation #23613 (Closed): doc: add description of new fs-client auth profile
- On that page: http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
https:...
- 05:23 PM Documentation #23612 (New): doc: add description of new auth profiles
- On that page: http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
https:...
- 05:18 PM Support #23455 (Resolved): osd: large number of inconsistent objects after recover or backfilling
- fiemap is disabled by default precisely because there are a number of known bugs in the local filesystems across kern...
- 05:07 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
- 05:07 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
- https://github.com/ceph/ceph/pull/21310
- 05:02 PM Bug #23610 (Resolved): pg stuck in activating because of dropped pg-temp message
- http://pulpito.ceph.com/yuriw-2018-04-05_22:33:03-rados-wip-yuri3-testing-2018-04-05-1940-luminous-distro-basic-smith...
- 05:06 PM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- This is a dupe of...something. We can track it down later.
For now, note that the crash is happening with Hammer c...
- 06:17 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- Hm hm hm
- 02:56 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- h3. rados bisect
Reproducer: ...
- 02:11 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- This problem was not happening so reproducibly before the current integration run, so one of the following PRs might ...
- 02:05 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- Set priority to Urgent because this prevents us from getting a clean rados run in jewel 10.2.11 integration testing.
- 02:04 AM Bug #23598 (Duplicate): hammer->jewel: ceph_test_rados crashes during radosbench task in jewel ra...
- Test description: rados/upgrade/{hammer-x-singleton/{0-cluster/{openstack.yaml start.yaml} 1-hammer-install/hammer.ya...
- 04:39 PM Bug #23595: osd: recovery/backfill is extremely slow
- *I have it figured out!*
The issue was "osd_recovery_sleep_hdd", which defaults to 0.1 seconds.
After setting
...
- 03:23 PM Bug #23595: osd: recovery/backfill is extremely slow
- OK, if I only have the 6 large files in the cephfs AND set the options...
- 02:55 PM Bug #23595: osd: recovery/backfill is extremely slow
- I have now tested with only the 6*1GB files, having deleted the 270k empty files from cephfs.
I continue to see ex...
- 12:30 PM Bug #23595: osd: recovery/backfill is extremely slow
- You can find a core dump of the -O0 version created with GDB at http://nh2.me/ceph-issue-23595-osd-O0.core.xz
- 12:06 PM Bug #23595: osd: recovery/backfill is extremely slow
- Attached are two GDB runs of a sender node.
In the release build there were many values "<optimized out>", so I re...
- 11:45 AM Bug #23595: osd: recovery/backfill is extremely slow
- On https://forum.proxmox.com/threads/increase-ceph-recovery-speed.36728/ people reported the same number as me of 10 ...
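The thread above attributes the slow recovery to osd_recovery_sleep_hdd (reported default 0.1 s per recovery op). A hedged ceph.conf sketch of overriding it — tune with care, since the sleep exists to limit recovery's impact on client I/O, and the option name is taken from the comments above:

```ini
[osd]
# Reduce the per-recovery-op sleep on HDD-backed OSDs.
# Reported default above: 0.1; 0.0 disables the throttling entirely.
osd_recovery_sleep_hdd = 0.0
```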
- 10:43 AM Bug #23601 (Resolved): Cannot see the value of parameter osd_op_threads via ceph --show-config
- I have set the parameter of "osd op threads" in configuration file
but I cannot see the value of parameter "osd op t...
- 10:17 AM Bug #23403 (Need More Info): Mon cannot join quorum
- 07:23 AM Bug #23578 (In Progress): large-omap-object-warnings test fails
- https://github.com/ceph/ceph/pull/21295
- 01:33 AM Bug #23578: large-omap-object-warnings test fails
- We instruct the OSDs to scrub at around 16:15....
- 04:31 AM Bug #23593 (Fix Under Review): RESTControllerTest.test_detail_route and RESTControllerTest.test_f...
- 02:08 AM Bug #22123: osd: objecter sends out of sync with pg epochs for proxied ops
- Despite the jewel backport of this fix being merged, this problem has reappeared in jewel 10.2.11 integration testing...
04/08/2018
- 07:55 PM Bug #23595: osd: recovery/backfill is extremely slow
- For the record, I installed the following debugging packages for gdb stack traces:...
- 07:53 PM Bug #23595: osd: recovery/backfill is extremely slow
- I have read https://www.spinics.net/lists/ceph-devel/msg38331.html which suggests that there is some throttling going...
- 06:17 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
- I made a Ceph 12.2.4 (luminous stable) cluster of 3 machines with 10-Gigabit networking on Ubuntu 16.04, using pretty...
- 05:40 PM Bug #23593: RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- PR: https://github.com/ceph/ceph/pull/21290
- 03:10 PM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- ...
- 04:31 PM Documentation #23594: auth: document what to do when locking client.admin out
- I found one way to fix it on the mailing list:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/01...
- 04:23 PM Documentation #23594 (New): auth: document what to do when locking client.admin out
- I accidentally ran ...
- 11:06 AM Bug #23590: kstore: statfs: (95) Operation not supported
- https://github.com/ceph/ceph/pull/21287
- 11:01 AM Bug #23590 (Fix Under Review): kstore: statfs: (95) Operation not supported
- 2018-04-07 16:19:07.248 7fdec4675700 -1 osd.0 0 statfs() failed: (95) Operation not supported
2018-04-07 16:19:08....
- 08:50 AM Bug #23589 (New): jewel: KStore Segmentation fault in ceph_test_objectstore --gtest_filter=-*/2:-*/3
- Test description: rados/objectstore/objectstore.yaml
Log excerpt:...
- 08:39 AM Bug #23588 (New): LibRadosAioEC.IsCompletePP test fails in jewel 10.2.11 integration testing
- Test description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yam...
- 06:53 AM Bug #23511: forwarded osd_failure leak in mon
- Greg, no. Both tests below include the no_reply() fix.
see
- http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-r...
- 06:42 AM Bug #23585 (Duplicate): osd: safe_timer segfault
- ...
04/07/2018
- 03:04 AM Bug #23195: Read operations segfaulting multiple OSDs
Change the test-erasure-eio.sh test as follows:...
04/06/2018
- 10:23 PM Bug #22165 (Fix Under Review): split pg not actually created, gets stuck in state unknown
- Fixed by https://github.com/ceph/ceph/pull/20469
- 09:29 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
- You'll definitely get more attention and advice if somebody else has hit this issue before.
- 08:45 PM Bug #23195: Read operations segfaulting multiple OSDs
- For anyone running into the send_all_remaining_reads() crash, a workaround is to use these osd settings:...
- 04:17 PM Bug #23195 (Fix Under Review): Read operations segfaulting multiple OSDs
- https://github.com/ceph/ceph/pull/21273
I'm going to treat this issue as tracking the first crash, in send_all_rem...
- 03:10 AM Bug #23195 (In Progress): Read operations segfaulting multiple OSDs
- 08:41 PM Bug #23200 (Resolved): invalid JSON returned when querying pool parameters
- 08:40 PM Backport #23312 (Resolved): luminous: invalid JSON returned when querying pool parameters
- 07:28 PM Backport #23312: luminous: invalid JSON returned when querying pool parameters
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20890
merged
- 08:40 PM Bug #23324 (Resolved): delete type mismatch in CephContext teardown
- 08:40 PM Backport #23412 (Resolved): luminous: delete type mismatch in CephContext teardown
- 07:28 PM Backport #23412: luminous: delete type mismatch in CephContext teardown
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20998
merged
- 08:38 PM Bug #23477 (Resolved): should not check for VERSION_ID
- 08:38 PM Backport #23478 (Resolved): should not check for VERSION_ID
- 07:26 PM Backport #23478: should not check for VERSION_ID
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21090
merged
- 06:03 PM Bug #21833 (Resolved): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
- 06:02 PM Backport #23160 (Resolved): luminous: Multiple asserts caused by DNE pgs left behind after lots o...
- 03:57 PM Backport #23160: luminous: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
- Prashant D wrote:
> Waiting for code review for backport PR : https://github.com/ceph/ceph/pull/20668
merged
- 06:02 PM Bug #23078 (Resolved): SRV resolution fails to lookup AAAA records
- 06:02 PM Backport #23174 (Resolved): luminous: SRV resolution fails to lookup AAAA records
- 03:56 PM Backport #23174: luminous: SRV resolution fails to lookup AAAA records
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20710
merged
- 05:57 PM Backport #23472 (Resolved): luminous: add --add-bucket and --move options to crushtool
- 03:53 PM Backport #23472: luminous: add --add-bucket and --move options to crushtool
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21079
merged
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
- 05:37 PM Bug #23578 (Resolved): large-omap-object-warnings test fails
- ...
- 03:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Sorry, forgot to mention I am running 12.2.4.
- 03:50 PM Bug #23576 (Can't reproduce): osd: active+clean+inconsistent pg will not scrub or repair
- My apologies if I'm too premature in posting this.
Myself and so far two others on the mailing list: http://lists....
- 03:44 AM Bug #23345 (Resolved): `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
- https://github.com/ceph/ceph/pull/20986
- 01:57 AM Bug #21737 (Resolved): OSDMap cache assert on shutdown
- 01:56 AM Backport #21786 (Resolved): jewel: OSDMap cache assert on shutdown
04/05/2018
- 09:12 PM Bug #22887 (Duplicate): osd/ECBackend.cc: 2202: FAILED assert((offset + length) <= (range.first.g...
- 09:12 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
From #22887, this also appeared in /ceph/teuthology-archive/pdonnell-2018-01-30_23:38:56-kcephfs-wip-pdonnell-i22627-...
- 09:09 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- 09:09 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
That was the fix I was wondering about, but it was merged to master as https://github.com/ceph/ceph/pull/15712 and so...
- 09:05 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- 09:05 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- https://github.com/ceph/ceph/pull/15712
- 09:10 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
- https://github.com/ceph/ceph/pull/15712
- 06:35 PM Bug #22351 (Resolved): Couldn't init storage provider (RADOS)
- 06:35 PM Backport #23349 (Resolved): luminous: Couldn't init storage provider (RADOS)
- 05:22 PM Backport #23349: luminous: Couldn't init storage provider (RADOS)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20896
merged
- 06:33 PM Bug #22114 (Resolved): mon: ops get stuck in "resend forwarded message to leader"
- 06:33 PM Backport #23077 (Resolved): luminous: mon: ops get stuck in "resend forwarded message to leader"
- 04:57 PM Backport #23077: luminous: mon: ops get stuck in "resend forwarded message to leader"
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21016
merged
- 06:31 PM Bug #22752 (Resolved): snapmapper inconsistency, crash on luminous
- 06:31 PM Backport #23500 (Resolved): luminous: snapmapper inconsistency, crash on luminous
- 04:55 PM Backport #23500: luminous: snapmapper inconsistency, crash on luminous
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21118
merged
- 05:14 PM Bug #23565 (Fix Under Review): Inactive PGs don't seem to cause HEALTH_ERR
- In looking at https://tracker.ceph.com/issues/23562, there were inactive PGs starting at...
- 04:43 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
- ...
- 04:18 PM Bug #23564 (Duplicate): OSD Segfaults
- Apr 5 11:40:31 roc05r-sc3a100 kernel: [126029.543698] safe_timer[28863]: segfault at 8d ip 00007fa9ad4dcccb sp 00007...
- 12:24 PM Bug #23562 (New): VDO OSD caused cluster to hang
- I awoke to alerts that apache serving teuthology logs on the Octo Long Running Cluster was unresponsive.
Here was ...
- 08:37 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
- Hi Greg,
thanks for your response.
> That URL denies access. You can use ceph-post-file instead to upload logs ...
- 03:31 AM Bug #23403: Mon cannot join quorum
- My apologies. It appears my previous analysis was incorrect.
I've pored over the logs and it appears the issue is ...
04/04/2018
- 11:19 PM Bug #23554: mon: mons need to be aware of VDO statistics
- Right, but AFAICT the monitor is then not even aware of VDO being involved. Which seems fine to my naive thoughts, bu...
- 11:05 PM Bug #23554: mon: mons need to be aware of VDO statistics
- Of course Sage is already on it :)
I don't know where the ...
- 10:46 PM Bug #23554: mon: mons need to be aware of VDO statistics
- At least this: https://github.com/ceph/ceph/pull/20516
- 10:44 PM Bug #23554: mon: mons need to be aware of VDO statistics
- What would we expect this monitor awareness to look like? Extra columns duplicating the output of vdostats?
- 05:48 PM Bug #23554 (New): mon: mons need to be aware of VDO statistics
- I created an OSD on top of a logical volume with a VDO device underneath.
Ceph is unaware of how much compression ...
- 09:58 PM Bug #23128: invalid values in ceph.conf do not issue visible warnings
- http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/ has been updated with information about this
- 09:53 PM Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
- Can you reproduce with osds configured with:...
- 09:43 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
- That URL denies access. You can use ceph-post-file instead to upload logs to a secure location.
It's not clear wha...
- 09:39 PM Bug #23320 (Fix Under Review): OSD suicide itself because of a firewall rule but reports a receiv...
https://github.com/ceph/ceph/pull/21000
- 09:37 PM Bug #23487: There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- 09:31 PM Bug #23510 (Resolved): rocksdb spillover for hard drive configurations
- 09:31 PM Bug #23511: forwarded osd_failure leak in mon
- Kefu, did your latest no_reply() PR resolve this?
- 09:29 PM Bug #23535 (Closed): 'ceph --show-config --conf /dev/null' does not work any more
- Yeah, you should use the monitor config commands now! :)
- 09:28 PM Bug #23258: OSDs keep crashing.
- Brian, that's a separate bug; the code address you've picked up on is just part of the generic failure handling code....
- 09:19 PM Bug #23258: OSDs keep crashing.
- I was about to start a new bug and found this, I am also seeing 0xa74234 and ceph::__ceph_assert_fail...
A while b...
- 09:22 PM Bug #20924: osd: leaked Session on osd.7
- /a/sage-2018-04-04_02:28:04-rados-wip-sage2-testing-2018-04-03-1634-distro-basic-smithi/2351291
rados/verify/{ceph...
- 09:21 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
- Under discussion on the PR, which is good on its own terms but suffering from a prior CephFS bug. :(
- 09:19 PM Bug #23297: mon-seesaw 'failed to become clean before timeout' due to laggy pg create
- I suspect this is resolved in https://github.com/ceph/ceph/pull/19973 by the commit that has the OSDs proactively go ...
- 09:16 PM Bug #23490: luminous: osd: double recovery reservation for PG when EIO injected (while already re...
- David, can you look at this when you get a chance? I think it's due to EIO triggering recovery when recovery is alrea...
- 09:13 PM Bug #23204: missing primary copy of object in mixed luminous<->master cluster with bluestore
- We should see this again as we run the upgrade suite for mimic...
- 09:08 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
- https://github.com/ceph/ceph/pull/20933
- 09:07 PM Bug #23267 (Pending Backport): scrub errors not cleared on replicas can cause inconsistent pg sta...
- 07:25 PM Backport #23413 (Resolved): jewel: delete type mismatch in CephContext teardown
- 07:23 PM Bug #20471 (Resolved): Can't repair corrupt object info due to bad oid on all replicas
- 07:23 PM Backport #23181 (Resolved): jewel: Can't repair corrupt object info due to bad oid on all replicas
- 06:24 PM Bug #21758 (Resolved): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
- 06:24 PM Backport #21784 (Resolved): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make check...
- 06:18 PM Feature #23242 (Resolved): ceph-objectstore-tool command to trim the pg log
- 06:18 PM Backport #23307 (Resolved): jewel: ceph-objectstore-tool command to trim the pg log
- 08:14 AM Feature #23552 (New): cache PK11Context in Connection and probably other consumers of CryptoKeyHa...
please see the attached flamegraph; 0.67% of CPU cycles are used by PK11_CreateContextBySymKey(). If we cache the PK11Cont...
04/03/2018
- 08:40 PM Bug #23145: OSD crashes during recovery of EC pg
Investigation results to date:
1. The local PGLog claims its _pg_log_t::can_rollback_to_ is **17348'18588**...
- 08:59 AM Backport #22906 (Need More Info): jewel: bluestore: New OSD - Caught signal - bstore_kv_sync (thr...
- non-trivial backport
- 08:56 AM Backport #22808 (Need More Info): jewel: "osd pool stats" shows recovery information bugly
- non-trivial backport
- 08:33 AM Backport #22808 (In Progress): jewel: "osd pool stats" shows recovery information bugly
- 08:28 AM Backport #22449 (In Progress): jewel: Visibility for snap trim queue length
- https://github.com/ceph/ceph/pull/21200
- 08:13 AM Backport #22449: jewel: Visibility for snap trim queue length
I don't think it's possible to backport the entire feature without breaking the Jewel->Luminous upgrade, so just the first commit...
- 08:22 AM Backport #22403 (In Progress): jewel: osd: replica read can trigger cache promotion
- 08:15 AM Backport #22390 (In Progress): jewel: ceph-objectstore-tool: Add option "dump-import" to examine ...
- 04:05 AM Backport #23486 (In Progress): jewel: scrub errors not cleared on replicas can cause inconsistent...
- 02:35 AM Backport #21786 (In Progress): jewel: OSDMap cache assert on shutdown
04/02/2018
- 05:35 PM Bug #23145: OSD crashes during recovery of EC pg
- Anything new or info on what to do to try and recover this cluster? I don't even know how to get the pool deleted pro...
- 10:28 AM Bug #23535: 'ceph --show-config --conf /dev/null' does not work any more
- I just realized `--show-config` does not exist anymore. Probably it was removed intentionally?
04/01/2018
- 07:49 AM Bug #23535 (Closed): 'ceph --show-config --conf /dev/null' does not work any more
- Previously it could be used by users to return the default ceph configuration (see e.g. [1]), now it fails (even if w...
- 07:03 AM Backport #21784 (In Progress): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make ch...
- 06:58 AM Backport #22449 (Need More Info): jewel: Visibility for snap trim queue length
- Backporting this feature to jewel at this late stage seems risky. Do we really need it in jewel?
03/30/2018
- 05:10 PM Bug #22123 (Resolved): osd: objecter sends out of sync with pg epochs for proxied ops
- 05:09 PM Backport #23076 (Resolved): jewel: osd: objecter sends out of sync with pg epochs for proxied ops
- 03:31 PM Bug #23511: forwarded osd_failure leak in mon
- rerunning the tests at http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-rados-wip-slow-mon-ops-kefu-distro-basic-smi...
- 01:02 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- Moving this to CI. This failure would only occur if the cls_XYX.so libraries could not be loaded during the execution...
- 02:59 AM Bug #23517 (Resolved): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- ...
- 05:25 AM Bug #23510: rocksdb spillover for hard drive configurations
- Igor Fedotov wrote:
> Ben,
> this has been fixed by https://github.com/ceph/ceph/pull/19257
> Not sure about an ex...
- 12:10 AM Bug #23403 (Triaged): Mon cannot join quorum
- ...
03/29/2018
- 06:39 PM Bug #21218 (Resolved): thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing...
- 06:39 PM Backport #23024 (Resolved): luminous: thrash-eio + bluestore (hangs with unfound objects or read_...
- 01:20 PM Backport #23024: luminous: thrash-eio + bluestore (hangs with unfound objects or read_log_and_mis...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20495
merged
- 03:39 PM Bug #23510: rocksdb spillover for hard drive configurations
- Ben,
this has been fixed by https://github.com/ceph/ceph/pull/19257
Not sure about an exact Luminous build it lande...
- 03:02 PM Bug #23510 (Resolved): rocksdb spillover for hard drive configurations
- version: ceph-*-12.2.1-34.el7cp.x86_64
One of Bluestore's best use cases is to accelerate performance for writes o...
- 03:33 PM Bug #22413 (Resolved): can't delete object from pool when Ceph out of space
- 03:33 PM Backport #23114 (Resolved): luminous: can't delete object from pool when Ceph out of space
- 01:19 PM Backport #23114: luminous: can't delete object from pool when Ceph out of space
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20585
merged
- 03:08 PM Bug #23511 (Can't reproduce): forwarded osd_failure leak in mon
- see http://pulpito.ceph.com/kchai-2018-03-29_13:20:02-rados-wip-slow-mon-ops-kefu-distro-basic-smithi/2334154/
<p...
- 01:24 PM Bug #22847 (Resolved): ceph osd force-create-pg cause all ceph-mon to crash and unable to come up...
- 01:24 PM Backport #22942 (Resolved): luminous: ceph osd force-create-pg cause all ceph-mon to crash and un...
- 01:21 PM Backport #22942: luminous: ceph osd force-create-pg cause all ceph-mon to crash and unable to com...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20399
merged
- 01:23 PM Backport #23075 (Resolved): luminous: osd: objecter sends out of sync with pg epochs for proxied ops
- 01:18 PM Backport #23075: luminous: osd: objecter sends out of sync with pg epochs for proxied ops
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20609
merged
- 10:28 AM Bug #19737 (Resolved): EAGAIN encountered during pg scrub (jewel)
- 09:54 AM Backport #23500 (In Progress): luminous: snapmapper inconsistency, crash on luminous
- 08:20 AM Backport #23500 (Resolved): luminous: snapmapper inconsistency, crash on luminous
- https://github.com/ceph/ceph/pull/21118
- 09:16 AM Bug #21844 (Resolved): Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -ENOENT
- 09:16 AM Backport #21923 (Resolved): jewel: Objecter::C_ObjectOperation_sparse_read throws/catches excepti...
- 09:16 AM Bug #23403: Mon cannot join quorum
- Hi all,
As asked on the ceph-users mailing list, here are the results of the following commands on the 3 monitors:...
- 09:09 AM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
- Happened again (jewel 10.2.11 integration testing) - http://qa-proxy.ceph.com/teuthology/smithfarm-2018-03-28_20:31:4...
- 08:25 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- I've seen this on our cluster (luminous, bluestore based), but was unable to reproduce it...
Restarting primary mon...
- 01:43 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- When we reboot one host, some OSDs take a long time to start,
and one OSD finally succeeds in starting after several tim...
- 01:11 AM Bug #17170 (New): mon/monclient: update "unable to obtain rotating service keys when osd init" to...
- We hit this issue again in Luminous.
- 08:16 AM Backport #23186 (Resolved): luminous: ceph tell mds.* <command> prints only one matching usage
- 08:15 AM Bug #23212 (Resolved): bluestore: should recalc_allocated when decoding bluefs_fnode_t
- 08:15 AM Backport #23256 (Resolved): luminous: bluestore: should recalc_allocated when decoding bluefs_fno...
- 08:15 AM Bug #23298 (Resolved): filestore: do_copy_range replay bad return value
- 08:14 AM Backport #23351 (Resolved): luminous: filestore: do_copy_range replay bad return value
- 04:10 AM Bug #23228: scrub mismatch on objects
Just bytes
dzafman-2018-03-28_18:21:29-rados-wip-zafman-testing-distro-basic-smithi/2332093
[ERR] 3.0 scrub s...
- 04:07 AM Bug #23495 (Resolved): Need (SLOW_OPS) in whitelist for another yaml
A job may have failed because (SLOW_OPS) is missing from tasks/mon_clock_with_skews.yaml
dzafman-2018-03-28_18:2...
- 02:09 AM Feature #23493 (Resolved): config: strip/escape single-quotes in values when setting them via con...
- At the moment, the config parsing state machine does not account for single-quotes as potential value enclosures, as ...
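The requested behavior can be sketched as follows. This is a minimal illustration only, not Ceph's actual config parser, and the function name is hypothetical: a value enclosed in a matching pair of single or double quotes has the enclosure stripped, while unquoted or unbalanced values pass through unchanged.

```python
# Hypothetical sketch of stripping quote enclosures from an ini-style
# config value; not Ceph's actual implementation.
def strip_value_quotes(value: str) -> str:
    value = value.strip()
    for quote in ("'", '"'):
        # Only strip when the same quote character encloses both ends.
        if len(value) >= 2 and value[0] == quote and value[-1] == quote:
            return value[1:-1]
    return value

print(strip_value_quotes("'rbd data'"))  # rbd data
```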
- 01:09 AM Bug #23492 (Resolved): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-e...
dzafman-2018-03-28_15:20:23-rados:standalone-wip-zafman-testing-distro-basic-smithi/2331804
In TEST_rados_get_ba...
- 12:29 AM Bug #22752 (Pending Backport): snapmapper inconsistency, crash on luminous
03/28/2018
- 10:58 PM Bug #23490 (Duplicate): luminous: osd: double recovery reservation for PG when EIO injected (whil...
- During a luminous test run, this was hit:
http://pulpito.ceph.com/yuriw-2018-03-27_21:16:27-rados-wip-yuri5-testin...
- 10:26 PM Backport #23186: luminous: ceph tell mds.* <command> prints only one matching usage
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20664
merged
- 10:26 PM Backport #23256: luminous: bluestore: should recalc_allocated when decoding bluefs_fnode_t
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20771
merged
- 10:22 PM Backport #23351: luminous: filestore: do_copy_range replay bad return value
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20957
merged
- 06:06 PM Bug #23487 (Fix Under Review): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- PR: https://github.com/ceph/ceph/pull/21102
- 05:58 PM Bug #23487 (Resolved): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- We have a `ceph osd pool set erasure allow_ec_overwrites` command but do not have a corresponding command to get the ...
- 05:42 PM Backport #23486 (Resolved): jewel: scrub errors not cleared on replicas can cause inconsistent pg...
- https://github.com/ceph/ceph/pull/21194
- 05:42 PM Backport #23485 (Resolved): luminous: scrub errors not cleared on replicas can cause inconsistent...
- https://github.com/ceph/ceph/pull/21103
- 05:27 PM Bug #23267 (Fix Under Review): scrub errors not cleared on replicas can cause inconsistent pg sta...
- https://github.com/ceph/ceph/pull/21101
- 11:21 AM Bug #22114 (Pending Backport): mon: ops get stuck in "resend forwarded message to leader"
- 08:15 AM Backport #23478 (In Progress): should not check for VERSION_ID
- https://github.com/ceph/ceph/pull/21090
- 08:08 AM Backport #23478 (Resolved): should not check for VERSION_ID
- https://github.com/ceph/ceph/pull/21090
- 08:07 AM Bug #23477 (Pending Backport): should not check for VERSION_ID
- * https://github.com/ceph/ceph/pull/17787
* https://github.com/ceph/ceph/pull/21052
- 08:06 AM Bug #23477 (Resolved): should not check for VERSION_ID
- As per os-release(5), VERSION_ID is optional.
- 07:06 AM Bug #23352: osd: segfaults under normal operation
- for those who wants to check the coredump. you should use apport-unpack to unpack it first.
and it crashed at /bui... - 05:55 AM Backport #23413 (In Progress): jewel: delete type mismatch in CephContext teardown
- https://github.com/ceph/ceph/pull/21084
- 01:28 AM Backport #23472 (In Progress): luminous: add --add-bucket and --move options to crushtool
- https://github.com/ceph/ceph/pull/21079
- 12:57 AM Backport #23472 (Resolved): luminous: add --add-bucket and --move options to crushtool
- https://github.com/ceph/ceph/pull/21079
- 12:50 AM Bug #23471 (Pending Backport): add --add-bucket and --move options to crushtool
- https://github.com/ceph/ceph/pull/20183
- 12:49 AM Bug #23471 (Resolved): add --add-bucket and --move options to crushtool
- When using crushtool to create a CRUSH map, it is not possible to create a complex CRUSH map; we have to edit the CRU...
03/27/2018
- 10:46 PM Bug #23352: osd: segfaults under normal operation
- Chris,
Was your stack identical to Alex's original description or was it more like the stack in #23431?
- I agree these are similar, and the cause may indeed be the same; however, there are only two stack frames in this instan...
- 07:36 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- There's a coredump-in-apport on google drive in http://tracker.ceph.com/issues/23352 - it looks, on the face of it, sim...
- 01:06 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- I have seen this as well, on our cluster. We're using bluestore, ubuntu 16, latest luminous.
The crashes were totall... - 10:58 AM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- The ceph-osd comes from https://download.ceph.com/rpm-luminous/el7/x86_64/
I verified via md5sum that the local co...
- What's the exact version of the ceph-osd you are using? (exact package URL if possible, please)
You could try 'objd... - 02:52 PM Feature #22420 (Resolved): Add support for obtaining a list of available compression options
- https://github.com/ceph/ceph/pull/20558
- 02:45 PM Bug #23215 (Resolved): config.cc: ~/.ceph/$cluster.conf is passed unexpanded to fopen()
- https://github.com/ceph/ceph/pull/20774
- 09:49 AM Backport #23077: luminous: mon: ops get stuck in "resend forwarded message to leader"
- We might want to include https://github.com/ceph/ceph/pull/21057 as well.
- 09:49 AM Bug #22114 (Fix Under Review): mon: ops get stuck in "resend forwarded message to leader"
- and https://github.com/ceph/ceph/pull/21057
- 01:35 AM Bug #22220 (Resolved): osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at...
- Resolved for Fedora; just waiting on the next DTS to ship on RHEL/CentOS.
03/26/2018
- 11:27 PM Bug #23465: "Mutex.cc: 110: FAILED assert(r == 0)" ("AttributeError: 'tuple' object has no attrib...
- This isn't related to that suite commit. Run manually, 'file' returns "remote/smithi150/coredump/1522085413.12350.cor...
- 07:42 PM Bug #23465 (New): "Mutex.cc: 110: FAILED assert(r == 0)" ("AttributeError: 'tuple' object has no ...
- I see latest commit https://github.com/ceph/ceph/commit/c6760eba50860d40e25483c3e4cee772f3ad4468#diff-289c6ff15fd25ac...
- 09:11 AM Backport #23316 (Need More Info): jewel: pool create cmd's expected_num_objects is not correctly ...
- To backport this to jewel, we need to skip mgr changes and qa/standalone/mon/osd-pool-create.sh related changes to be...