Activity
From 11/21/2021 to 12/20/2021
12/20/2021
- 05:30 PM Bug #53677 (Resolved): qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
- ...
- 11:51 AM Bug #23827: osd sends op_reply out of order
- This bug occurred in my online environment (Nautilus 14.2.5) some days ago, and my application exited because client’s ...
- 09:10 AM Bug #53667: osd cannot be started after being set to stop
- fix in https://github.com/ceph/ceph/pull/44363
- 08:55 AM Bug #53667 (Fix Under Review): osd cannot be started after being set to stop
- After setting the OSD to stop, the OSD cannot be brought up again:
[root@controller-2 ~]# ceph osd status
ID HOST US...
12/19/2021
- 07:56 PM Bug #44286: Cache tiering shows unfound objects after OSD reboots
- The problem still exists on 15.2.15.
I've also got replicated size 3, min_size 2.
The problem occurs only when one O...
- 12:50 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- The only "special" settings I can think of are...
12/18/2021
- 11:16 PM Bug #53663 (Duplicate): Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
- On a 4 node Octopus cluster I am randomly seeing batches of scrub errors, as in:...
12/17/2021
- 04:28 PM Bug #53485: monstore: logm entries are not garbage collected
- fix is in progress
- 03:07 PM Backport #53660 (Resolved): octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
- https://github.com/ceph/ceph/pull/44544
- 03:07 PM Backport #53659 (Resolved): pacific: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
- https://github.com/ceph/ceph/pull/44543
- 03:00 PM Bug #39150 (Pending Backport): mon: "FAILED ceph_assert(session_map.sessions.empty())" when out o...
12/16/2021
- 11:24 PM Bug #53600 (Rejected): Crash in MOSDPGLog::encode_payload
- 11:12 PM Bug #53600: Crash in MOSDPGLog::encode_payload
- It should be noted there were a whole lot of oom-kill events on this node during the times these crashes occurred. Gi...
- 03:11 AM Bug #53600: Crash in MOSDPGLog::encode_payload
- The binaries running when these crashes were seen actually are from this wip branch in the ceph-ci repo.
https://s...
- 05:55 PM Bug #53485: monstore: logm entries are not garbage collected
- I changed the paxos debug level to 20 and found this in the mon store log:...
- 03:36 PM Bug #53485: monstore: logm entries are not garbage collected
- We just grew to a whopping 80 GB metadata server. I'm out of ideas here and don't know how to stop the growth.
Somebody ad...
- 04:35 PM Backport #53644 (Resolved): pacific: Disable health warning when autoscaler is on
- https://github.com/ceph/ceph/pull/45152
- 04:33 PM Bug #53516 (Pending Backport): Disable health warning when autoscaler is on
- 03:56 PM Bug #52189: crash in AsyncConnection::maybe_start_delay_thread()
- We observed a few more of those crashes. Six of them were just seconds or minutes apart on different osd / hosts eve...
- 03:45 PM Bug #39150 (Fix Under Review): mon: "FAILED ceph_assert(session_map.sessions.empty())" when out o...
12/15/2021
- 08:04 AM Bug #52488: Pacific mon won't join Octopus mons
- The same problem occurs when migrating from Nautilus to Pacific
12/14/2021
- 10:02 PM Bug #50042: rados/test.sh: api_watch_notify failures
- ...
- 09:56 PM Bug #49524: ceph_test_rados_delete_pools_parallel didn't start
- ...
- 12:31 PM Bug #50657 (Resolved): smart query on monitors
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 12:29 PM Bug #52583 (Resolved): partial recovery become whole object recovery after restart osd
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 12:23 PM Backport #52450 (Resolved): pacific: smart query on monitors
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44164
m...
- 12:22 PM Backport #52451 (Resolved): octopus: smart query on monitors
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44177
m...
- 12:20 PM Backport #51149 (Resolved): octopus: When read failed, ret can not take as data len, in FillInVer...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44174
m...
- 12:20 PM Backport #51171 (Resolved): octopus: regression in ceph daemonperf command output, osd columns ar...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44176
m...
- 12:20 PM Backport #52710 (Resolved): octopus: partial recovery become whole object recovery after restart osd
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44165
m...
- 12:20 PM Backport #53389 (Resolved): octopus: pg-temp entries are not cleared for PGs that no longer exist
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44097
m... - 08:37 AM Bug #53600 (Rejected): Crash in MOSDPGLog::encode_payload
- 3 OSDs crashed on the gibba cluster. All the OSDs were part of the gibba045 node.
*Observations:*
- osd.15 and os...
- 01:22 AM Bug #53584: FAILED ceph_assert(pop.data.length() == sinfo.aligned_logical_offset_to_chunk_offset(...
- Neha Ojha wrote:
> ..., it seems like you have "enough copies available" to remove the problematic OSD but we won't ...
12/13/2021
- 10:56 PM Bug #52416 (Fix Under Review): devices: mon devices appear empty when scraping SMART metrics
- 10:48 PM Bug #53575 (Rejected): Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64
- We could suppress this, but since it is not coming from the Ceph code, we are rejecting it.
- 10:41 PM Bug #53584 (Need More Info): FAILED ceph_assert(pop.data.length() == sinfo.aligned_logical_offset...
- Can you provide OSD logs for the PG that is crashing (from all the shards)? From the error logs, it seems like you ha...
- 10:08 AM Bug #53593: RBD cloned image is slow in 4k write with "waiting for rw locks"
- [Observed Poor Performance]
On an RBD image, we found the 4k write IOPS is much lower than expected.
I understood th...
- 10:05 AM Bug #53593 (Pending Backport): RBD cloned image is slow in 4k write with "waiting for rw locks"
- h1. [Observed Poor Performance]
On an RBD image, we found the 4k write IOPS is much lower than expected.
I understoo...
12/12/2021
- 01:39 PM Bug #53586 (New): rocksdb: build error with rocksdb-6.25.x
- Here we go again, same bug as in #52415; it affects all attempts to build ceph-16.2.7 against rocksdb-6.25-*
Cheers,
...
- 08:49 AM Bug #53584 (Need More Info): FAILED ceph_assert(pop.data.length() == sinfo.aligned_logical_offset...
- ...
12/11/2021
- 04:15 PM Backport #51149: octopus: When read failed, ret can not take as data len, in FillInVerifyExtent
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44174
merged
12/10/2021
- 11:46 PM Backport #51171: octopus: regression in ceph daemonperf command output, osd columns aren't visibl...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44176
merged
- 11:43 PM Backport #52710: octopus: partial recovery become whole object recovery after restart osd
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44165
merged
- 11:43 PM Backport #53389: octopus: pg-temp entries are not cleared for PGs that no longer exist
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44097
merged
- 09:16 PM Bug #53516 (Fix Under Review): Disable health warning when autoscaler is on
- 06:03 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
- ...
12/09/2021
- 11:06 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
- /a/yuriw-2021-12-09_00:18:57-rados-wip-yuri-testing-2021-12-08-1336-distro-default-smithi/6553724/ ----> osd.1.log.gz
- 09:38 PM Bug #53575 (Resolved): Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64
- Found in /a/yuriw-2021-12-09_00:18:57-rados-wip-yuri-testing-2021-12-08-1336-distro-default-smithi/6553724
The fol...
- 04:32 PM Backport #53549 (In Progress): nautilus: [RFE] Provide warning when the 'require-osd-release' fla...
- 01:43 PM Backport #53550 (In Progress): octopus: [RFE] Provide warning when the 'require-osd-release' flag...
- 12:53 PM Backport #53551 (In Progress): pacific: [RFE] Provide warning when the 'require-osd-release' flag...
12/08/2021
- 09:15 PM Backport #53551 (Resolved): pacific: [RFE] Provide warning when the 'require-osd-release' flag do...
- https://github.com/ceph/ceph/pull/44259
- 09:15 PM Backport #53550 (Resolved): octopus: [RFE] Provide warning when the 'require-osd-release' flag do...
- https://github.com/ceph/ceph/pull/44260
- 09:15 PM Backport #53549 (Rejected): nautilus: [RFE] Provide warning when the 'require-osd-release' flag d...
- https://github.com/ceph/ceph/pull/44263
- 09:13 PM Feature #51984 (Pending Backport): [RFE] Provide warning when the 'require-osd-release' flag does...
- 07:08 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- /a/yuriw-2021-12-07_16:04:59-rados-wip-yuri5-testing-2021-12-06-1619-distro-default-smithi/6551120
pg map right be...
- 06:49 PM Bug #53544 (New): src/test/osd/RadosModel.h: ceph_abort_msg("racing read got wrong version") in t...
- ...
- 03:30 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2021-12-07_16:02:55-rados-wip-yuri11-testing-2021-12-06-1619-distro-default-smithi/6550873
- 12:15 PM Backport #53535 (Resolved): pacific: mon: mgrstatmonitor spams mgr with service_map
- https://github.com/ceph/ceph/pull/44721
- 12:15 PM Backport #53534 (Resolved): octopus: mon: mgrstatmonitor spams mgr with service_map
- https://github.com/ceph/ceph/pull/44722
- 12:10 PM Bug #53479 (Pending Backport): mon: mgrstatmonitor spams mgr with service_map
12/07/2021
- 09:27 PM Bug #53516 (Resolved): Disable health warning when autoscaler is on
- the command:
ceph health detail
displays a warning when a pool has many more objects per pg than other pools. Thi...
12/06/2021
- 10:05 PM Backport #53507 (Duplicate): pacific: ceph -s mon quorum age negative number
- 10:03 PM Bug #53306 (Pending Backport): ceph -s mon quorum age negative number
- Needs to be included in https://github.com/ceph/ceph/pull/43698
- 08:42 PM Backport #52450: pacific: smart query on monitors
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44164
merged
- 06:13 PM Bug #53506 (Fix Under Review): mon: frequent cpu_tp had timed out messages
- 06:06 PM Bug #53506 (Closed): mon: frequent cpu_tp had timed out messages
- ...
- 11:06 AM Bug #52416: devices: mon devices appear empty when scraping SMART metrics
- If `ceph-mon` runs as a systemd unit, check if `PrivateDevices=yes` in `/lib/systemd/system/ceph-mon@.service`; if so...
- 10:30 AM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Ist Gab wrote:
> Igor Fedotov wrote:
> > …
>
> Igor, do you think if we put a super fast 2-4TB write optimized n...
- 09:14 AM Bug #52189: crash in AsyncConnection::maybe_start_delay_thread()
- Neha Ojha wrote:
> We'll need more information to debug a crash like this.
@Neha, we observed another one of the...
- 08:49 AM Bug #51307: LibRadosWatchNotify.Watch2Delete fails
- /a/yuriw-2021-12-03_15:27:18-rados-wip-yuri11-testing-2021-12-02-1451-distro-default-smithi/6542889...
- 08:25 AM Bug #53500: rte_eal_init fail will waiting forever
- r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /...
- 08:20 AM Bug #53500 (New): rte_eal_init fail will waiting forever
- The rte_eal_init returns a failure message and does not wake up the waiting msgr-worker thread. As a result, the wait...
12/03/2021
- 09:02 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Igor Fedotov wrote:
> …
Igor, do you think if we put a super fast 2-4TB write optimized nvme in front of each 15....
- 01:16 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Ist Gab wrote:
> Igor Fedotov wrote:
>
> > Right - PG removal/moving are the primary cause of bulk data removals....
- 12:43 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Igor Fedotov wrote:
> Right - PG removal/moving are the primary cause of bulk data removals. We're working on impr...
- 12:39 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Igor Fedotov wrote:
> So if compaction provides some relief (at least temporarily) - I would suggest running periodi...
- 12:31 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Ist Gab wrote:
> Most likely this is related to this pg delete/movement things because after the pg increase the c...
- 12:12 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Igor Fedotov wrote:
> In my opinion this issue is caused by a well-known problem with RocksDB performance degradatio...
- 11:12 AM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- In my opinion this issue is caused by a well-known problem with RocksDB performance degradation after bulk data remov...
- 05:05 PM Backport #53486 (In Progress): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- 01:19 PM Backport #53486: pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- https://github.com/ceph/ceph/pull/44202
- 12:25 PM Backport #53486 (Resolved): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- https://github.com/ceph/ceph/pull/44202
- 12:20 PM Bug #52872 (Pending Backport): LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- 12:20 PM Bug #53485 (Fix Under Review): monstore: logm entries are not garbage collected
- We had to run a ceph cluster with a damaged cephfs for a while that got deleted already. We suspect this was the culp...
- 01:56 AM Bug #53481 (New): rte_exit can't exit when call it in dpdk thread
(gdb) info thr
Id Target Id Frame
* 1 Thread 0xfffc1ba26100 (LW...
12/02/2021
- 11:36 PM Backport #53480 (Resolved): pacific: Segmentation fault under Pacific 16.2.1 when using a custom ...
- https://github.com/ceph/ceph/pull/44897
- 11:33 PM Bug #50659 (Pending Backport): Segmentation fault under Pacific 16.2.1 when using a custom crush ...
- 11:31 PM Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- Myoungwon Oh: should we backport this? Please update the status accordingly.
- 11:14 PM Bug #53479 (Fix Under Review): mon: mgrstatmonitor spams mgr with service_map
- 10:46 PM Bug #53479 (Pending Backport): mon: mgrstatmonitor spams mgr with service_map
- ...
- 08:39 PM Bug #53138: cluster [WRN] Health check failed: Degraded data redundancy: 3/1164 objects degrade...
- @Neha I am seeing these failures more than usual; we may have a performance regression. If not, can we inc...
- 08:34 PM Backport #50274 (In Progress): pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log()....
- 08:20 PM Bug #51652: heartbeat timeouts on filestore OSDs while deleting objects in upgrade:pacific-p2p-pa...
- /a/yuriw-2021-11-28_15:43:54-upgrade:pacific-p2p-pacific-16.2.7_RC1-distro-default-smithi/6531998
- 02:26 PM Support #51609: OSD refuses to start (OOMK) due to pg split
- Tor Martin Ølberg wrote:
> Tor Martin Ølberg wrote:
> > After an upgrade to 15.2.13 from 15.2.4 my small home lab c...
- 07:28 AM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- https://github.com/ceph/ceph/pull/44181
12/01/2021
- 08:57 PM Bug #53454 (New): nautilus: MInfoRec in Started/ToDelete/WaitDeleteReseved causes state machine c...
- ...
- 08:24 PM Backport #52451 (In Progress): octopus: smart query on monitors
- 08:14 PM Backport #51171 (In Progress): octopus: regression in ceph daemonperf command output, osd columns...
- 08:14 PM Backport #51172 (In Progress): pacific: regression in ceph daemonperf command output, osd columns...
- 08:12 PM Backport #51149 (In Progress): octopus: When read failed, ret can not take as data len, in FillIn...
- 08:12 PM Backport #51150 (In Progress): pacific: When read failed, ret can not take as data len, in FillIn...
- 07:38 PM Backport #52710 (In Progress): octopus: partial recovery become whole object recovery after resta...
- 07:05 PM Backport #52450 (In Progress): pacific: smart query on monitors
- 06:21 PM Bug #52261: OSD takes all memory and crashes, after pg_num increase
- Aldo Briessmann wrote:
> Hi, same issue here on a cluster with ceph 16.2.4-r2 on Gentoo. Moving the cluster with the...
- 06:16 PM Bug #52261: OSD takes all memory and crashes, after pg_num increase
- Hi, same issue here on a cluster with ceph 16.2.4-r2 on Gentoo. Moving the cluster with the in-progress PG split to 1...
- 02:30 AM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
- Needs a pacific backport, showed up in pacific...
11/30/2021
- 03:45 AM Support #53432 (Resolved): How to use and optimize ceph dpdk
- Write a Ceph DPDK enabling guide and place it in doc/dev. The document should contain the following:
1. Compilati...
11/29/2021
- 11:19 AM Bug #53237 (Resolved): mon: stretch mode blocks kernel clients from connecting
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 11:19 AM Bug #53258 (Resolved): mon: should always display disallowed leaders when set
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 11:17 AM Backport #53259 (Resolved): pacific: mon: should always display disallowed leaders when set
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43972
m...
- 11:17 AM Backport #53239 (Resolved): pacific: mon: stretch mode blocks kernel clients from connecting
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43971
m...
11/26/2021
- 10:54 AM Bug #52867 (New): pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:...
- moving over to rados
11/24/2021
- 05:29 PM Bug #53308: pg-temp entries are not cleared for PGs that no longer exist
- That makes sense to me, thanks Neha!
- 05:15 PM Bug #53308 (Pending Backport): pg-temp entries are not cleared for PGs that no longer exist
- Cory, I am marking this for backport to octopus and pacific; does that make sense to you?
- 05:29 PM Backport #53389 (In Progress): octopus: pg-temp entries are not cleared for PGs that no longer exist
- 05:20 PM Backport #53389 (Resolved): octopus: pg-temp entries are not cleared for PGs that no longer exist
- https://github.com/ceph/ceph/pull/44097
- 05:29 PM Backport #53388 (In Progress): pacific: pg-temp entries are not cleared for PGs that no longer exist
- 05:20 PM Backport #53388 (Resolved): pacific: pg-temp entries are not cleared for PGs that no longer exist
- https://github.com/ceph/ceph/pull/44096
- 03:50 PM Feature #51984 (Fix Under Review): [RFE] Provide warning when the 'require-osd-release' flag does...
11/23/2021
- 01:53 PM Bug #44286: Cache tiering shows unfound objects after OSD reboots
- Update: Also happens with 16.2.5 :-(
- 01:16 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- New instance seen in below pacific run:
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-20_20:20:29-fs-wip-yuri6...
- 10:54 AM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
- Seems to be the same problem in:
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-20_18:00:22-rados-wip-yuri6-testi...
- 07:40 AM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- /a/yuriw-2021-11-20_18:01:41-rados-wip-yuri8-testing-2021-11-20-0807-distro-basic-smithi/6516396
11/22/2021
- 08:29 PM Feature #21579 (Resolved): [RFE] Stop OSD's removal if the OSD's are part of inactive PGs
- 07:11 PM Feature #51984: [RFE] Provide warning when the 'require-osd-release' flag does not match current ...
- I am providing the history of PRs and commits that resulted in
the loss/removal of the checks for 'require-osd-relea...
- 06:45 PM Bug #53306 (Fix Under Review): ceph -s mon quorum age negative number