Activity
From 01/03/2021 to 02/01/2021
02/01/2021
- 07:41 PM Backport #48595 (Resolved): nautilus: nautilus: qa/standalone/scrub/osd-scrub-test.sh: _scrub_abo...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39125
m...
- 05:31 PM Backport #48595: nautilus: nautilus: qa/standalone/scrub/osd-scrub-test.sh: _scrub_abort: return 1
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/39125
merged
- 07:39 PM Backport #48379 (Resolved): nautilus: invalid values of crush-failure-domain should not be allowe...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39124
m...
- 05:29 PM Backport #48379: nautilus: invalid values of crush-failure-domain should not be allowed while cre...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/39124
merged
- 06:59 PM Bug #49064 (In Progress): test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeys...
- 06:15 PM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
- rados/thrash/{0-size-min-size-overrides/2-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overrides/{more...
- 06:05 PM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
- ...
- 02:49 PM Bug #49069 (Fix Under Review): mds crashes on v15.2.8 -> master upgrade decoding MMgrConfigure
- 11:57 AM Bug #48909: clog slow request overwhelm monitors
- PR: https://github.com/ceph/ceph/pull/39199
- 04:20 AM Backport #49073 (In Progress): nautilus: crash in Objecter and CRUSH map lookup
- 04:19 AM Backport #49073 (Resolved): nautilus: crash in Objecter and CRUSH map lookup
- https://github.com/ceph/ceph/pull/39197
01/31/2021
- 10:39 PM Bug #49072: Segmentation fault in thread_name:tp_osd_tp apparently in libpthread
- Note that Kefu did the heavy lifting in comment 3.
- 10:37 PM Bug #49072 (Resolved): Segmentation fault in thread_name:tp_osd_tp apparently in libpthread
- /a/kchai-2021-01-11_11:52:22-rados-wip-kefu2-testing-2021-01-10-1949-distro-basic-smithi/5777646
PG::recovery_stat...
- 10:35 PM Bug #49072: Segmentation fault in thread_name:tp_osd_tp apparently in libpthread
- /a/jafaj-2021-01-05_16:20:30-rados-wip-jan-testing-2021-01-05-1401-distro-basic-smithi/5756811 with logs, coredump is...
- 10:34 PM Bug #49072: Segmentation fault in thread_name:tp_osd_tp apparently in libpthread
- Looks like this might be it....
- 10:29 PM Bug #49072 (Resolved): Segmentation fault in thread_name:tp_osd_tp apparently in libpthread
- I suspect there is memory corruption involved and that this is a badly corrupted stack....
- 02:48 PM Bug #49069 (Resolved): mds crashes on v15.2.8 -> master upgrade decoding MMgrConfigure
- ...
01/30/2021
- 12:45 PM Bug #48793 (Fix Under Review): out of order op
- 12:29 AM Bug #48990: rados/dashboard: Health check failed: Telemetry requires re-opt-in (TELEMETRY_CHANGED...
- Trying to reproduce the issue:
https://pulpito.ceph.com/yaarit-2021-01-29_19:21:30-rados:dashboard-pacific-distro-ba...
01/29/2021
- 07:16 PM Backport #48986 (In Progress): pacific: ceph osd df tree reporting incorrect SIZE value for rack ...
- https://github.com/ceph/ceph/pull/39180
- 07:12 PM Backport #49058 (In Progress): pacific: thrash_cache_writeback_proxy_none: FAILED ceph_assert(ver...
- https://github.com/ceph/ceph/pull/39179
- 03:40 PM Backport #49058 (Resolved): pacific: thrash_cache_writeback_proxy_none: FAILED ceph_assert(versio...
- https://github.com/ceph/ceph/pull/39179
- 06:36 PM Bug #49064: test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder ...
- ...
- 06:22 PM Bug #49064 (Resolved): test_envlibrados_for_rocksdb.sh: EnvLibradosMutipoolTest.DBBulkLoadKeysInR...
- ...
- 03:38 PM Bug #46323 (Pending Backport): thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == o...
- 01:18 AM Bug #46323 (Fix Under Review): thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == o...
- 08:17 AM Backport #48496 (In Progress): octopus: Paxos::restart() and Paxos::shutdown() can race leading t...
- 08:15 AM Backport #48495 (In Progress): nautilus: Paxos::restart() and Paxos::shutdown() can race leading ...
- 08:14 AM Backport #48495: nautilus: Paxos::restart() and Paxos::shutdown() can race leading to use-after-f...
- > @Nathan, Definitely. It was originally "Seen in Nautilus" (downstream BZ) and the race condition still exists.
@...
- 07:30 AM Backport #49055 (Resolved): nautilus: pick_a_shard() always selects shard 0
- https://github.com/ceph/ceph/pull/39651
- 07:30 AM Backport #49054 (Resolved): pacific: pick_a_shard() always selects shard 0
- https://github.com/ceph/ceph/pull/39977
- 07:30 AM Backport #49053 (Resolved): octopus: pick_a_shard() always selects shard 0
- https://github.com/ceph/ceph/pull/39978
- 07:25 AM Bug #49052 (Resolved): pick_a_shard() always selects shard 0
- 07:18 AM Bug #47003: ceph_test_rados test error. Responses out of order because the connection drops data.
- /a//kchai-2021-01-28_03:28:19-rados-wip-kefu-testing-2021-01-27-1353-distro-basic-smithi/5834177
- 07:00 AM Bug #48613: Reproduce https://tracker.ceph.com/issues/48417
- I did a run with https://github.com/ceph/ceph/pull/38906/commits (passes)
http://pulpito.front.sepia.ceph.com/idee...
- 01:17 AM Bug #49050 (New): Make thrash_cache_writeback_proxy_none work with writeback overlay
- In https://github.com/ceph/ceph/pull/39152, we have disabled some tests since
(1) they cause noise in daily rados r...
01/28/2021
- 11:04 PM Backport #48496: octopus: Paxos::restart() and Paxos::shutdown() can race leading to use-after-fr...
- Nathan Cutler wrote:
> @Brad, are you sure this backport is applicable to octopus? If it's not applicable, please ch...
- 11:11 AM Backport #48496 (Need More Info): octopus: Paxos::restart() and Paxos::shutdown() can race leadin...
- @Brad, are you sure this backport is applicable to octopus? If it's not applicable, please change Status to "Rejected...
- 11:02 PM Backport #48495: nautilus: Paxos::restart() and Paxos::shutdown() can race leading to use-after-f...
- Nathan Cutler wrote:
> @Brad, are you sure this backport is applicable to nautilus? If it's not applicable, please c...
- 11:10 AM Backport #48495 (Need More Info): nautilus: Paxos::restart() and Paxos::shutdown() can race leadi...
- @Brad, are you sure this backport is applicable to nautilus? If it's not applicable, please change Status to "Rejecte...
- 06:22 PM Bug #48997: rados/singleton/all/recovery-preemption: defer backfill|defer recovery not found in logs
- /a/teuthology-2021-01-26_19:05:09-rados-pacific-distro-basic-smithi/5831527
- 05:08 PM Bug #48946 (Fix Under Review): Disable and re-enable clog_to_monitors could trigger assertion
- 01:52 AM Bug #48946 (In Progress): Disable and re-enable clog_to_monitors could trigger assertion
- 01:22 PM Bug #48793 (In Progress): out of order op
- In the revised scrub code there is a period in which:
- the scrub is marked as 'preempted', and
- preemption is alr...
- 11:16 AM Backport #48987 (In Progress): nautilus: ceph osd df tree reporting incorrect SIZE value for rack...
- 11:13 AM Backport #48595 (In Progress): nautilus: nautilus: qa/standalone/scrub/osd-scrub-test.sh: _scrub_...
- 11:07 AM Backport #48379 (In Progress): nautilus: invalid values of crush-failure-domain should not be all...
01/27/2021
- 08:10 PM Bug #36473 (Resolved): hung osd_repop, bluestore committed but failed to trigger repop_commit
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:07 PM Bug #39525 (Resolved): lz4 compressor corrupts data when buffers are unaligned
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:06 PM Bug #40792 (Resolved): monc: send_command to specific down mon breaks other mon msgs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:05 PM Bug #41190 (Resolved): osd: pg stuck in waitactingchange when new acting set doesn't change
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:04 PM Bug #42452 (Resolved): msg/async: the event center is blocked by rdma construct conection for tra...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:04 PM Bug #42477 (Resolved): Rados should use the '-o outfile' convention
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:03 PM Bug #42977 (Resolved): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:03 PM Bug #43311 (Resolved): asynchronous recovery + backfill might spin pg undersized for a long time
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:03 PM Bug #43582 (Resolved): rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:02 PM Bug #44407 (Resolved): mon: Get session_map_lock before remove_session
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:01 PM Bug #45076 (Resolved): rados: Sharded OpWQ drops suicide_grace after waiting for work
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:00 PM Bug #47044 (Resolved): PG::_delete_some isn't optimal iterating objects
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:00 PM Bug #47328 (Resolved): nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:56 PM Documentation #23354 (Resolved): doc: osd_op_queue & osd_op_queue_cut_off
- 07:54 PM Backport #48481 (Rejected): mimic: PG::_delete_some isn't optimal iterating objects
- 07:54 PM Backport #47992 (Rejected): mimic: nautilus: ObjectStore/SimpleCloneTest: invalid rm coll
- 07:54 PM Backport #44467 (Rejected): mimic: mon: Get session_map_lock before remove_session
- 07:54 PM Backport #45025 (Rejected): mimic: hung osd_repop, bluestore committed but failed to trigger repo...
- 07:54 PM Backport #44369 (Rejected): mimic: msg/async: the event center is blocked by rdma construct conec...
- 07:54 PM Backport #44088 (Rejected): mimic: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- 07:54 PM Backport #44368 (Rejected): mimic: Rados should use the '-o outfile' convention
- 07:54 PM Backport #44086 (Rejected): mimic: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 07:54 PM Backport #43622 (Rejected): mimic: pg: fastinfo incorrect when last_update moves backward in time
- 07:54 PM Backport #43991 (Rejected): mimic: objecter doesn't send osd_op
- 07:54 PM Backport #41732 (Rejected): mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missin...
- 07:54 PM Backport #42847 (Rejected): mimic: "failing miserably..." in Infiniband.cc
- 07:54 PM Backport #41546 (Rejected): mimic: monc: send_command to specific down mon breaks other mon msgs
- 07:53 PM Backport #45892 (Rejected): mimic: osd: pg stuck in waitactingchange when new acting set doesn't ...
- mimic EOL
- 07:53 PM Backport #45358 (Rejected): mimic: rados: Sharded OpWQ drops suicide_grace after waiting for work
- mimic EOL
- 07:52 PM Backport #45038 (Rejected): mimic: mon: reset min_size when changing pool size
- 07:51 PM Backport #44489 (Rejected): mimic: lz4 compressor corrupts data when buffers are unaligned
- mimic EOL
- 07:51 PM Backport #43470 (Rejected): mimic: asynchronous recovery + backfill might spin pg undersized for ...
- mimic EOL
- 07:33 PM Backport #45891 (Rejected): luminous: osd: pg stuck in waitactingchange when new acting set doesn...
- luminous EOL
- 07:20 PM Bug #43929 (Resolved): osd: Allow 64-char hostname to be added as the "host" in CRUSH
- 07:19 PM Backport #43988 (Rejected): luminous: osd: Allow 64-char hostname to be added as the "host" in CRUSH
- luminous EOL
- 07:18 PM Bug #42114 (Resolved): mon: /var/lib/ceph/mon/* data (esp rocksdb) is not 0600
- 07:18 PM Backport #42201 (Rejected): mimic: mon: /var/lib/ceph/mon/* data (esp rocksdb) is not 0600
- mimic EOL
- 07:18 PM Bug #42577 (Rejected): acting_recovery_backfill won't catch all up peers
- 07:17 PM Backport #42202 (Rejected): luminous: mon: /var/lib/ceph/mon/* data (esp rocksdb) is not 0600
- 07:16 PM Backport #42996 (Rejected): luminous: acting_recovery_backfill won't catch all up peers
- eol
- 07:14 PM Bug #23816 (Resolved): disable bluestore cache caused a rocksdb error
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:14 PM Bug #24664 (Resolved): osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:12 PM Bug #38724 (Resolved): _txc_add_transaction error (39) Directory not empty not handled on operati...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:12 PM Bug #39174 (Resolved): crushtool crash on Fedora 28 and newer
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:12 PM Bug #39390 (Resolved): filestore pre-split may not split enough directories
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:12 PM Bug #39439 (Resolved): osd: segv in _preboot -> heartbeat
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:10 PM Bug #40483 (Resolved): Pool settings aren't populated to OSD after restart.
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:09 PM Bug #40634 (Resolved): mon: auth mon isn't loading full KeyServerData after restart
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:09 PM Bug #40804 (Resolved): ceph mgr module ls -f plain crashes mon
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:08 PM Bug #41601 (Resolved): oi(object_info_t).size does not match on disk size
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:07 PM Bug #43306 (Resolved): segv in collect_sys_info
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 07:02 PM Backport #44084 (Rejected): luminous: rebuild-mondb doesn't populate mgr commands -> pg dump EINVAL
- 07:02 PM Backport #44087 (Rejected): luminous: mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
- 07:02 PM Backport #43621 (Rejected): luminous: pg: fastinfo incorrect when last_update moves backward in time
- 07:02 PM Backport #43632 (Rejected): luminous: segv in collect_sys_info
- 07:02 PM Backport #41702 (Rejected): luminous: oi(object_info_t).size does not match on disk size
- 07:02 PM Backport #40889 (Rejected): luminous: Pool settings aren't populated to OSD after restart.
- 07:02 PM Backport #41547 (Rejected): luminous: monc: send_command to specific down mon breaks other mon msgs
- 07:02 PM Backport #40883 (Rejected): luminous: ceph mgr module ls -f plain crashes mon
- 07:02 PM Backport #39694 (Rejected): luminous: _txc_add_transaction error (39) Directory not empty not han...
- 07:02 PM Backport #40731 (Rejected): luminous: mon: auth mon isn't loading full KeyServerData after restart
- 07:02 PM Backport #39681 (Rejected): luminous: filestore pre-split may not split enough directories
- 07:02 PM Backport #39309 (Rejected): luminous: crushtool crash on Fedora 28 and newer
- 07:02 PM Backport #39515 (Rejected): luminous: osd: segv in _preboot -> heartbeat
- 07:02 PM Backport #24888 (Rejected): luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::ge...
- 07:02 PM Backport #23926 (Rejected): luminous: disable bluestore cache caused a rocksdb error
- 04:32 PM Bug #48793 (Triaged): out of order op
- 12:44 AM Bug #48793: out of order op
- Ronen, can you please check if this is related to your scrub refactor?
- 12:37 AM Bug #48793: out of order op
- /a/jafaj-2021-01-05_16:20:30-rados-wip-jan-testing-2021-01-05-1401-distro-basic-smithi/5756733
Following is the pr... - 11:17 AM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
- rados/singleton/{all/thrash_cache_writeback_proxy_none mon_election/classic msgr-failures/few msgr/async-v2only objec...
01/26/2021
- 10:53 PM Bug #49020 (New): rados subcommand rmomapkey does not report an error when the provided key is not found
- In the following command, the object exists in the pool, but there is no omap key "waldo". When the following is run:...
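The failing command itself is truncated above; a minimal sketch of the reported behaviour, assuming a hypothetical pool and object name (the key "waldo" is the one quoted in the report):
  rados -p testpool create obj1            # object exists, but has no omap keys
  rados -p testpool rmomapkey obj1 waldo   # remove a key that was never set
  echo $?                                  # reportedly prints 0 instead of an error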
- 10:05 PM Bug #48793 (New): out of order op
- 07:42 PM Bug #48793 (Need More Info): out of order op
- rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/classic msgr-failures/osd-delay objectst...
- 07:39 PM Bug #48793: out of order op
- This is not related to https://github.com/ceph/ceph/pull/38111.
rados/thrash/{0-size-min-size-overrides/2-size-2-m... - 09:30 AM Backport #49009 (Resolved): octopus: osd crash in OSD::heartbeat when dereferencing null session
- https://github.com/ceph/ceph/pull/40277
- 09:30 AM Backport #49008 (Resolved): pacific: osd crash in OSD::heartbeat when dereferencing null session
- https://github.com/ceph/ceph/pull/40246
- 09:28 AM Bug #48821 (Pending Backport): osd crash in OSD::heartbeat when dereferencing null session
- 01:20 AM Bug #48998: Scrubbing terminated -- not all pgs were active and clean
- PG 2.4 is in active+recovering+undersized+degraded+remapped...
- 01:16 AM Bug #48998 (New): Scrubbing terminated -- not all pgs were active and clean
- ...
- 12:38 AM Bug #48997 (Can't reproduce): rados/singleton/all/recovery-preemption: defer backfill|defer recov...
- ...
01/25/2021
- 11:43 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- /a/teuthology-2021-01-23_07:01:02-rados-master-distro-basic-gibba/5819503
- 06:58 PM Bug #48990 (Resolved): rados/dashboard: Health check failed: Telemetry requires re-opt-in (TELEME...
- ...
- 06:20 PM Backport #48987 (Resolved): nautilus: ceph osd df tree reporting incorrect SIZE value for rack ha...
- https://github.com/ceph/ceph/pull/39126
- 06:20 PM Backport #48986 (Resolved): pacific: ceph osd df tree reporting incorrect SIZE value for rack hav...
- 06:20 PM Backport #48985 (Resolved): octopus: ceph osd df tree reporting incorrect SIZE value for rack hav...
- https://github.com/ceph/ceph/pull/39970
- 06:17 PM Bug #48884 (Pending Backport): ceph osd df tree reporting incorrect SIZE value for rack having an...
- 06:01 PM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
- /a/ideepika-2021-01-22_07:01:14-rados-wip-deepika-testing-master-2021-01-22-0047-distro-basic-smithi/5814986
- 04:19 PM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
- /a/sage-2021-01-25_15:21:38-rados:cephadm:thrash-wip-sage-testing-2021-01-23-1326-distro-basic-smithi/5828000
- 05:48 PM Bug #48984 (Resolved): lazy_omap_stats_test: "ceph osd deep-scrub all" hangs
- ...
- 04:19 PM Bug #48786: api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2 failed
- /a/sage-2021-01-25_15:21:38-rados:cephadm:thrash-wip-sage-testing-2021-01-23-1326-distro-basic-smithi/5828000
01/24/2021
- 08:27 PM Bug #46437 (Fix Under Review): Admin Socket leaves behind .asok files after daemons (ex: RGW) shu...
01/22/2021
- 11:12 PM Bug #43893 (Duplicate): lingering osd_failure ops (due to failure_info holding references?)
- Let's use https://tracker.ceph.com/issues/47380 to track this.
- 08:14 PM Bug #48965 (New): qa/standalone/osd/osd-force-create-pg.sh: TEST_reuse_id: return 1
- ...
- 06:46 PM Bug #48871: nautilus: rados/test_crash.sh: "kill ceph-osd" times out
- ...
- 11:32 AM Backport #48378 (Resolved): octopus: invalid values of crush-failure-domain should not be allowed...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38347
m...
- 01:40 AM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
- The rados/singleton MON_DOWN failures with msgr-failures/many should be safe to ignore.
/a/nojha-2021-01-20_23:46:...
01/21/2021
- 07:36 PM Bug #48959 (Resolved): Primary OSD crash caused corrupted object and further crashes during backf...
- Hi,
We ran into an issue on our EC, object storage, Nautilus cluster just before Christmas, that although we 'reso...
- 04:52 PM Bug #48906 (Resolved): wait_for_recovery: failed before timeout expired with tests that override ...
- 04:52 PM Backport #48378: octopus: invalid values of crush-failure-domain should not be allowed while crea...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/38347
merged
- 04:52 PM Backport #48949 (Resolved): pacific: wait_for_recovery: failed before timeout expired with tests ...
- https://github.com/ceph/ceph/pull/38993
- 11:20 AM Backport #48949 (Resolved): pacific: wait_for_recovery: failed before timeout expired with tests ...
- 02:42 PM Bug #48954 (New): the results of _merge_object_divergent_entries on Primary and Replica are incon...
- ...
- 08:46 AM Bug #48946: Disable and re-enable clog_to_monitors could trigger assertion
- PR
https://github.com/ceph/ceph/pull/38997
- 08:06 AM Bug #48946 (Resolved): Disable and re-enable clog_to_monitors could trigger assertion
- Steps to reproduce:
1. Disable clog_to_monitors for OSDs:
ceph tell osd.* injectargs '--clog_to_monitors=false'
...
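Based on the issue title ("Disable and re-enable"), the truncated steps presumably continue by turning the option back on; an assumed completion of the sequence:
  ceph tell osd.* injectargs '--clog_to_monitors=true'   # 2. re-enable; the assertion reportedly fires after this toggle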
01/20/2021
- 10:12 PM Bug #48786 (In Progress): api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcou...
- 08:44 PM Bug #48906: wait_for_recovery: failed before timeout expired with tests that override osd_async_r...
- pacific backport: https://github.com/ceph/ceph/pull/38993
- 08:38 PM Bug #48906 (Pending Backport): wait_for_recovery: failed before timeout expired with tests that o...
- 03:29 PM Bug #48609 (Fix Under Review): osd/PGLog: don’t fast-forward can_rollback_to during merge_log if ...
- 01:07 AM Bug #48918 (Resolved): valgrind issue in PgScrubber::dump: Use of uninitialised value of size 8
- 12:02 AM Bug #48918 (Pending Backport): valgrind issue in PgScrubber::dump: Use of uninitialised value of ...
- Pacific backport https://github.com/ceph/ceph/pull/38979
- 12:53 AM Bug #48899 (Resolved): api_list: LibRadosList.EnumerateObjects and LibRadosList.EnumerateObjectsS...
- Should be fixed by https://github.com/ceph/ceph/pull/38959.
01/19/2021
- 11:14 PM Bug #48899: api_list: LibRadosList.EnumerateObjects and LibRadosList.EnumerateObjectsSplit failed
- EnumerateObjectsSplit and EnumerateObjects rely on a healthy cluster. When this happens, this is the state of the clu...
- 10:24 PM Bug #48884 (Fix Under Review): ceph osd df tree reporting incorrect SIZE value for rack having an...
- 07:49 PM Bug #48909 (In Progress): clog slow request overwhelm monitors
- The clog introduced by issue #43975 should either be removed or made configurable, to prevent issues on large, high-throughp...
- 07:37 PM Bug #48915 (In Progress): api_tier_pp: LibRadosTwoPoolsPP.ManifestFlushDupCount failed
- 08:38 AM Bug #48915: api_tier_pp: LibRadosTwoPoolsPP.ManifestFlushDupCount failed
- https://github.com/ceph/ceph/pull/38937
- 02:43 PM Bug #43948: Remapped PGs are sometimes not deleted from previous OSDs
- I have the exact same problem with version 15.2.8 as described above and verified it following Eric's steps.
After r...
- 05:44 AM Bug #48918 (Fix Under Review): valgrind issue in PgScrubber::dump: Use of uninitialised value of ...
- 01:14 AM Bug #48786: api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2 failed
- https://github.com/ceph/ceph/pull/38937
01/18/2021
- 11:43 PM Bug #48918 (Resolved): valgrind issue in PgScrubber::dump: Use of uninitialised value of size 8
- ...
- 10:45 PM Bug #48917 (New): no reply for copy-get
- ...
- 06:59 PM Bug #48915 (Duplicate): api_tier_pp: LibRadosTwoPoolsPP.ManifestFlushDupCount failed
- ...
- 05:58 PM Bug #46318: mon_recovery: quorum_status times out
- We are still seeing these.
/a/teuthology-2021-01-18_07:01:01-rados-master-distro-basic-smithi/5798278 - 04:25 PM Bug #48906 (Fix Under Review): wait_for_recovery: failed before timeout expired with tests that o...
- 03:18 PM Documentation #17871 (Closed): crush-map document could use clearer warning about impact of chang...
- Josh and I spoke about this and agreed that this doesn't require an explicit warning.
I am changing the status of th...
- 02:30 PM Bug #48583 (Resolved): nautilus: Log files are created with rights root:root
- 02:29 PM Bug #48583 (Pending Backport): nautilus: Log files are created with rights root:root
- 08:59 AM Bug #48909 (Duplicate): clog slow request overwhelm monitors
- A recent change, https://tracker.ceph.com/issues/43975, logs details for each slow request and sends them to the monitors.
But o...
- 08:27 AM Bug #48908 (Need More Info): EC Pool OSD crashes
- Recently, I have started seeing a lot of these OSD crashes:...
01/17/2021
- 02:21 AM Bug #48906 (Resolved): wait_for_recovery: failed before timeout expired with tests that override ...
- ...
01/16/2021
- 05:26 AM Bug #48821 (Fix Under Review): osd crash in OSD::heartbeat when dereferencing null session
- 04:44 AM Bug #48821 (In Progress): osd crash in OSD::heartbeat when dereferencing null session
- 01:32 AM Bug #47003: ceph_test_rados test error. Responses out of order because the connection drops data.
- rados/thrash-erasure-code/{ceph clusters/{fixed-2 openstack} fast/fast mon_election/classic msgr-failures/few objects...
- 01:28 AM Bug #48789 (Resolved): qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
- 01:25 AM Bug #47949: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
- /a/yuriw-2021-01-15_19:06:33-rados-wip-yuri8-testing-master-2021-01-15-0935-distro-basic-smithi/5789552
01/15/2021
- 10:35 PM Bug #48821: osd crash in OSD::heartbeat when dereferencing null session
- sounds right, would you like to create a quick PR for this?
- 10:33 PM Bug #48840: Octopus: Assert failure: test_ceph_osd_pool_create_utf8
- Not sure what you mean by "we should fix the test case"; why is this only a problem in octopus?
- 06:22 PM Bug #48609 (In Progress): osd/PGLog: don’t fast-forward can_rollback_to during merge_log if the l...
- 06:02 PM Bug #48899 (Resolved): api_list: LibRadosList.EnumerateObjects and LibRadosList.EnumerateObjectsS...
- ...
- 05:57 PM Bug #48844: api_watch_notify: LibRadosWatchNotify.AioWatchDelete failed
- /a/nojha-2021-01-15_01:03:38-rados-wip-38730-2021-01-14-distro-basic-smithi/5786950
- 05:53 PM Bug #48786: api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2 failed
- /a/nojha-2021-01-15_01:03:38-rados-wip-38730-2021-01-14-distro-basic-smithi/5786912
- 09:30 AM Bug #48786: api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2 failed
- Due to the false-positive design of reference counting,
we do not care whether a decrement operation is complete or not....
- 05:47 PM Bug #48896 (Fix Under Review): osd/OSDMap.cc: FAILED ceph_assert(osd_weight.count(i.first))
- ...
- 05:47 PM Bug #48842 (Resolved): qa/standalone/osd/osd-recovery-prio.sh: TEST_recovery_pool_priority failed
- 01:23 AM Bug #48884 (Resolved): ceph osd df tree reporting incorrect SIZE value for rack having an empty h...
- This was discovered on luminous but master behaves similarly....
01/14/2021
- 08:27 PM Bug #48583 (Resolved): nautilus: Log files are created with rights root:root
- 04:16 PM Bug #48583: nautilus: Log files are created with rights root:root
- https://github.com/ceph/ceph/pull/38558 merged
- 05:44 PM Bug #46978: OSD: shutdown of a OSD Host causes slow requests
- If the PR is merged, can this be backported to Octopus/Nautilus? (I can't update the fields.) Thanks!
- 05:32 PM Bug #46978 (Fix Under Review): OSD: shutdown of a OSD Host causes slow requests
- 05:31 PM Bug #46978: OSD: shutdown of a OSD Host causes slow requests
- Test steps, plus testing with vstart, run-make-check, and qa/run-standalone look good.
https://github.com/ceph/cep...
- 04:06 PM Bug #48153 (Resolved): collection_list_legacy: pg inconsistent
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 03:36 PM Bug #48875 (New): qa: OSDThrasher gets stuck during unwind
- ...
- 07:34 AM Bug #48867: After enabling ec overwrite, the object of the data pool is unfound
- There is a large number of osd IDs, 2147483647....
- 04:33 AM Bug #48867: After enabling ec overwrite, the object of the data pool is unfound
- Is there any other way to recover an unfound pg?
- 04:30 AM Bug #48867 (New): After enabling ec overwrite, the object of the data pool is unfound
- After enabling ec overwrite, the object of the data pool is unfound
Hello there,
I have a cluster, using ceph 1...
- 07:33 AM Bug #48871 (New): nautilus: rados/test_crash.sh: "kill ceph-osd" times out
- failure reason: ...
01/13/2021
- 06:39 AM Bug #48855: OSD_SUPERBLOCK Checksum failed after node restart
- *CRC verification failure also occurred in another OSD; the detailed information is as follows:*
【scene】
CRC verifica...
- 04:03 AM Bug #48855 (New): OSD_SUPERBLOCK Checksum failed after node restart
- 【scene】
After the OSD node is restarted, the OSD power-on process reads OSD_SUPERBLOCK, and checksum verification failed,...
- 03:31 AM Bug #48853 (New): OSD_SUPERBLOCK Checksum failed after node restart
01/12/2021
- 12:11 PM Bug #44945: Mon High CPU usage when another mon syncing from it
- I think this might be related to #42830. If so it may be resolved with Ceph Nautilus 14.2.10 by backport #44464. If s...
- 11:41 AM Backport #48482: nautilus: PG::_delete_some isn't optimal iterating objects
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38478
m...
- 10:46 AM Bug #42830: problem returning mon to cluster
- Am I correct in thinking this is resolved for nautilus by backport #44464 and it is not going to be backported to Mimic?...
- 02:59 AM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
- ...
- 01:34 AM Bug #48842: qa/standalone/osd/osd-recovery-prio.sh: TEST_recovery_pool_priority failed
- David Zafman wrote:
> This can easily be explained by a slow test machine. The 10 second sleep wasn't enough time t...
- 01:34 AM Bug #48842 (In Progress): qa/standalone/osd/osd-recovery-prio.sh: TEST_recovery_pool_priority failed
- 01:12 AM Bug #48786: api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2 failed
- Myoungwon Oh wrote:
> OK. I'll take a look
thanks!
- 01:10 AM Bug #48786: api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2 failed
- OK. I'll take a look
- 12:47 AM Bug #48786: api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2 failed
- Myoungwon Oh, can you please take a look.
- 12:39 AM Bug #48786: api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2 failed
- ...
- 12:54 AM Bug #48844 (Duplicate): api_watch_notify: LibRadosWatchNotify.AioWatchDelete failed
- ...
- 12:35 AM Bug #48793: out of order op
- /a/jafaj-2021-01-05_16:20:30-rados-wip-jan-testing-2021-01-05-1401-distro-basic-smithi/5756733 with logs
- 12:34 AM Bug #47654: test_mon_pg: mon fails to join quorum to due election strategy mismatch
- /a/jafaj-2021-01-05_16:20:30-rados-wip-jan-testing-2021-01-05-1401-distro-basic-smithi/5756701
01/11/2021
- 11:58 PM Bug #48789 (In Progress): qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
- 07:41 PM Bug #48789 (Triaged): qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
- related to https://github.com/ceph/ceph/pull/38651
- 10:54 PM Bug #48842: qa/standalone/osd/osd-recovery-prio.sh: TEST_recovery_pool_priority failed
- This can easily be explained by a slow test machine. The 10 second sleep wasn't enough time to get recovery initia...
- 09:35 PM Bug #48842: qa/standalone/osd/osd-recovery-prio.sh: TEST_recovery_pool_priority failed
- ...
- 07:39 PM Bug #48842 (Resolved): qa/standalone/osd/osd-recovery-prio.sh: TEST_recovery_pool_priority failed
- ...
- 09:57 PM Bug #48843 (Resolved): Get more parallel scrubs within osd_max_scrubs limits
- When a reservation failure prevents a PG from scrubbing, other possible scrubbable PGs aren't tried.
- 07:43 PM Bug #45441: rados: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
- rados/singleton-nomsgr/{all/recovery-unfound-found mon_election/connectivity rados supported-random-distro$/{centos_8...
- 07:32 PM Bug #48841 (Resolved): test_turn_off_module: wait_until_equal timed out
- ...
- 07:10 PM Bug #48840 (Closed): Octopus: Assert failure: test_ceph_osd_pool_create_utf8
- FAIL: test_rados.TestCommand.test_ceph_osd_pool_create_utf8
test_ceph_osd_pool_create_utf8 should also work if a c...
- 06:56 PM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
- Myoungwon Oh wrote:
> Hm... I can't find any clues in /a/nojha-2021-01-07_00:06:49-rados-master-distro-basic-smith...
- 06:09 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
- /ceph/teuthology-archive/yuriw-2021-01-08_16:38:07-rados-wip-yuri4-testing-2021-01-07-1041-octopus-distro-basic-smith...
- 06:08 PM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
- /ceph/teuthology-archive/yuriw-2021-01-08_16:38:07-rados-wip-yuri4-testing-2021-01-07-1041-octopus-distro-basic-smit...
- 04:06 AM Bug #46323: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) i...
- http://pulpito.front.sepia.ceph.com:80/gregf-2021-01-09_02:02:11-rados-wip-stretch-updates-108-2-distro-basic-smithi/...
- 06:05 PM Bug #48793 (Triaged): out of order op
- details in https://tracker.ceph.com/issues/48777#note-7
- 10:16 AM Bug #48793: out of order op
- http://qa-proxy.ceph.com/teuthology/ideepika-2020-12-18_14:27:53-rados:thrash-erasure-code-master-distro-basic-smith...
- 06:05 PM Bug #48485: osd thrasher timeout
- seems related; adding here, will verify later
/ceph/teuthology-archive/yuriw-2021-01-08_16:38:07-rados-wip-yuri4-tes...
- 04:50 PM Backport #48482 (Resolved): nautilus: PG::_delete_some isn't optimal iterating objects
- 04:43 PM Backport #48482: nautilus: PG::_delete_some isn't optimal iterating objects
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38478
merged
- 11:51 AM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
- ...
- 10:52 AM Bug #48821: osd crash in OSD::heartbeat when dereferencing null session
- The fix seems just to check that the session pointer is not null before trying to use it. If the problem is not deepe...
- 10:48 AM Bug #48821 (Resolved): osd crash in OSD::heartbeat when dereferencing null session
- For an unhealthy (unstable) cluster with flip-flopping osds we observed crashes like this:...
- 08:00 AM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
- We got a little relief by reducing mon_osdmap_full_prune_min from the default 10,000 to 1,000 but osdmaps still grew ...
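For context, a runtime tweak like the one described can be applied with a config command; a minimal sketch (1000 is the value quoted above):
  # lower the threshold at which the mons begin pruning full osdmaps (default 10000)
  ceph config set mon mon_osdmap_full_prune_min 1000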
- 03:57 AM Bug #48503: scrub stat mismatch on bytes
- http://pulpito.front.sepia.ceph.com:80/gregf-2021-01-09_02:02:11-rados-wip-stretch-updates-108-2-distro-basic-smithi/...
- 03:40 AM Bug #47719: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
- http://pulpito.front.sepia.ceph.com:80/gregf-2021-01-09_02:02:11-rados-wip-stretch-updates-108-2-distro-basic-smithi/...
01/09/2021
- 10:55 PM Bug #46978: OSD: shutdown of a OSD Host causes slow requests
- Hi Manuel,
Would you be able to test a patch for this issue?
If so, what OS and ceph packages/version do you run?
...
- 04:47 PM Bug #48721: tcmalloc doesn't release memory
- Josh Durgin wrote:
> Those stats show the memory is mostly used by the mon or released by tcmalloc but the kernel ha...
- 04:16 PM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
- Reproduced on another merge cycle. Restarting only the leading mon, waiting 5 minutes and then creating a new epoch r...
- 01:13 PM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
- Apologies, the "leading monitor" bit is misleading. The osdmap data is immediately trimmed the moment the last monito...
- 12:56 PM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
- We're running Ceph Octopus 15.2.8 with the same problem. Our monitors ran out of space after enabling autoscale as os...
- 06:53 AM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
- Hm... I can't find any clues in /a/nojha-2021-01-07_00:06:49-rados-master-distro-basic-smithi/5761073.
Can we repr...
01/08/2021
- 10:31 PM Bug #48721: tcmalloc doesn't release memory
- Those stats show the memory is mostly used by the mon or released by tcmalloc but the kernel hasn't reclaimed it.
...
- 10:30 PM Bug #48775 (Duplicate): FAILED ceph_assert(is_primary()) in PG::scrub()
- 10:27 PM Bug #48732 (Need More Info): Marking OSDs out causes mon daemons to crash following tcmalloc: lar...
- It would be great if you could share a reproducer for this, or reproduce it and capture monitor logs with debugging enabled.
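For reference, one way to capture that debugging, assuming the usual monitor debug knobs are what's wanted here:
  # raise debug levels on each mon, reproduce the crash, then collect /var/log/ceph/ceph-mon.*.log
  ceph tell mon.a injectargs '--debug_mon 20 --debug_ms 1'   # repeat for each monitor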
- 10:21 PM Bug #48790: rados/multimon: MON_DOWN in mon_election/connectivity with mon_clock_no_skews
- Greg, could you please take a look and see if my theory makes sense.
- 10:17 PM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
- Myoungwon Oh: I am assigning this to you for more inputs.
- 12:26 AM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
- Xie Xingguo/Myoungwon Oh: this seems to be new regression in master, do you know what could have caused it? I don't s...
- 12:11 AM Bug #48745: Segmentation fault in PrimaryLogPG::cancel_manifest_ops
- /a/nojha-2021-01-07_00:06:49-rados-master-distro-basic-smithi/5761073
- 07:07 PM Bug #48536 (Rejected): ceph tool: osd crush create-or-move cannot accept multiple crush buckets
- I'm dumb; this works just fine with the expected syntax:...
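The working invocation is truncated above; for context, the documented form accepts several bucket=name pairs in a single call, e.g. (names hypothetical):
  ceph osd crush create-or-move osd.7 1.0 host=node1 rack=rack1 root=default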
- 12:59 PM Bug #48789: qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
- I suspect this is related to the column family changes in RocksDB.
(Maybe https://github.com/ceph/ceph/pull/38310...
- 03:50 AM Bug #48789: qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
- /a//kchai-2021-01-07_03:01:15-rados-wip-kefu-testing-2021-01-05-2058-distro-basic-smithi/5761661
my branch include...
- 12:02 AM Bug #48793 (Resolved): out of order op
- ...
01/07/2021
- 09:06 PM Bug #48789: qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
- /a/nojha-2021-01-07_00:06:49-rados-master-distro-basic-smithi/5761090
- 06:17 PM Bug #48789 (Resolved): qa/standalone/scrub/osd-scrub-snaps.sh: create_scenario: return 1
- ...
- 08:58 PM Bug #48775: FAILED ceph_assert(is_primary()) in PG::scrub()
- @Ronen: /a/nojha-2021-01-07_00:06:49-rados-master-distro-basic-smithi/5760959 has logs
- 08:56 AM Bug #48775: FAILED ceph_assert(is_primary()) in PG::scrub()
- Seems to be the same problem supposedly solved by https://github.com/ceph/ceph/pull/38730.
Verifying.
- 12:25 AM Bug #48775: FAILED ceph_assert(is_primary()) in PG::scrub()
- /a/teuthology-2021-01-05_07:01:02-rados-master-distro-basic-smithi/5755459
- 12:12 AM Bug #48775 (Duplicate): FAILED ceph_assert(is_primary()) in PG::scrub()
- ...
- 08:55 PM Bug #48790: rados/multimon: MON_DOWN in mon_election/connectivity with mon_clock_no_skews
- My first impression is that this is related to election_strategy connectivity.
All mons are in quorum here:
<pr...
- 06:56 PM Bug #48790 (New): rados/multimon: MON_DOWN in mon_election/connectivity with mon_clock_no_skews
- rados/multimon/{clusters/9 mon_election/connectivity msgr-failures/many msgr/async-v1only no_pools objectstore/bluest...
- 06:05 PM Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failu...
- rados/thrash-erasure-code-shec/{ceph clusters/{fixed-4 openstack} mon_election/classic msgr-failures/few objectstore/...
- 04:03 PM Bug #48786 (Resolved): api_tier_pp: LibRadosTwoPoolsPP.ManifestSnapRefcount/ManifestSnapRefcount2...
- Run: https://pulpito.ceph.com/teuthology-2021-01-07_05:00:03-smoke-master-distro-basic-smithi/
Job: 5761861
Logs: h...
- 12:20 PM Backport #48480 (Resolved): octopus: PG::_delete_some isn't optimal iterating objects
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38477
m...
- 12:19 PM Backport #48243 (Resolved): octopus: collection_list_legacy: pg inconsistent
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/38098
m...
01/06/2021
- 04:27 PM Backport #48480: octopus: PG::_delete_some isn't optimal iterating objects
- Igor Fedotov wrote:
> https://github.com/ceph/ceph/pull/38477
merged
- 04:24 PM Backport #48243: octopus: collection_list_legacy: pg inconsistent
- Mykola Golub wrote:
> https://github.com/ceph/ceph/pull/38098
merged
- 10:20 AM Bug #48764 (New): Octopus: RadosModel tests fail when trying DeleteOp::_begin()
- ...
- 09:17 AM Bug #45615: api_watch_notify_pp: LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify/1 f...
- /ceph/teuthology-archive/yuriw-2021-01-04_18:28:05-rados-wip-yuri2-testing-2021-01-04-0837-octopus-distro-basic-smith...
- 06:40 AM Bug #47024: rados/test.sh: api_tier_pp LibRadosTwoPoolsPP.ManifestSnapRefcount failed
- /a//kchai-2021-01-06_02:57:51-rados-wip-kefu-testing-2021-01-05-2058-distro-basic-smithi/5758216
01/05/2021
- 03:27 PM Bug #48750 (Resolved): ceph config set using osd/host mask not working
- this does not work, tested with 14.2.9 and 14.2.16:...
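The failing command is truncated above; for context, the documented osd/host mask syntax being exercised looks like this (hostname hypothetical):
  # set an option only for OSDs on one host, then check what a specific daemon sees
  ceph config set osd/host:node1 osd_max_backfills 2
  ceph config get osd.0 osd_max_backfills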
01/04/2021
- 07:43 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
- This seems related to https://bugzilla.redhat.com/show_bug.cgi?id=1826450; our circumstances are highly similar.
- 05:42 PM Bug #48745 (Resolved): Segmentation fault in PrimaryLogPG::cancel_manifest_ops
- ...