Activity

From 11/21/2021 to 12/20/2021

12/20/2021

05:30 PM Bug #53677 (Resolved): qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
... Neha Ojha
11:51 AM Bug #23827: osd sends op_reply out of order
This bug occurred in my online environment (Nautilus 14.2.5) some days ago and my application exited because client’s ... Ivan Guan
09:10 AM Bug #53667: osd cannot be started after being set to stop
fix in https://github.com/ceph/ceph/pull/44363 changzhi tan
08:55 AM Bug #53667 (Fix Under Review): osd cannot be started after being set to stop
After setting an OSD to stop, the OSD cannot be brought up again
[root@controller-2 ~]# ceph osd status
ID HOST US...
changzhi tan

12/19/2021

07:56 PM Bug #44286: Cache tiering shows unfound objects after OSD reboots
the problem still exists on 15.2.15.
I've also got replicated size 3 min_size 2.
the problem occurs only when one O...
marek czardybon
12:50 AM Bug #53663: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
The only "special" settings I can think of are... Christian Rohmann

12/18/2021

11:16 PM Bug #53663 (Duplicate): Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools
On a 4 node Octopus cluster I am randomly seeing batches of scrub errors, as in:... Christian Rohmann

12/17/2021

04:28 PM Bug #53485: monstore: logm entries are not garbage collected
fix is in progress Daniel Poelzleithner
03:07 PM Backport #53660 (Resolved): octopus: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
https://github.com/ceph/ceph/pull/44544 Backport Bot
03:07 PM Backport #53659 (Resolved): pacific: mon: "FAILED ceph_assert(session_map.sessions.empty())" when...
https://github.com/ceph/ceph/pull/44543 Backport Bot
03:00 PM Bug #39150 (Pending Backport): mon: "FAILED ceph_assert(session_map.sessions.empty())" when out o...
Sage Weil

12/16/2021

11:24 PM Bug #53600 (Rejected): Crash in MOSDPGLog::encode_payload
Brad Hubbard
11:12 PM Bug #53600: Crash in MOSDPGLog::encode_payload
It should be noted there were a whole lot of oom-kill events on this node during the times these crashes occurred. Gi... Brad Hubbard
03:11 AM Bug #53600: Crash in MOSDPGLog::encode_payload
The binaries running when these crashes were seen actually are from this wip branch in the ceph-ci repo.
https://s...
Brad Hubbard
05:55 PM Bug #53485: monstore: logm entries are not garbage collected
I changed the paxos debug level to 20 and found this in mon store log:... Daniel Poelzleithner
03:36 PM Bug #53485: monstore: logm entries are not garbage collected
We just grew to a whopping 80 GB metadata server. I'm out of ideas here and don't know how to stop the growth.
Somebody ad...
Daniel Poelzleithner
04:35 PM Backport #53644 (Resolved): pacific: Disable health warning when autoscaler is on
https://github.com/ceph/ceph/pull/45152 Backport Bot
04:33 PM Bug #53516 (Pending Backport): Disable health warning when autoscaler is on
Neha Ojha
03:56 PM Bug #52189: crash in AsyncConnection::maybe_start_delay_thread()
We observed a few more of those crashes. Six of them were just seconds or minutes apart or different osd / hosts eve... Christian Rohmann
03:45 PM Bug #39150 (Fix Under Review): mon: "FAILED ceph_assert(session_map.sessions.empty())" when out o...
Sage Weil

12/15/2021

08:04 AM Bug #52488: Pacific mon won't join Octopus mons
There is the same problem with migrating to Pacific from Nautilus Michael Uleysky

12/14/2021

10:02 PM Bug #50042: rados/test.sh: api_watch_notify failures
... Neha Ojha
09:56 PM Bug #49524: ceph_test_rados_delete_pools_parallel didn't start
... Neha Ojha
12:31 PM Bug #50657 (Resolved): smart query on monitors
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
12:29 PM Bug #52583 (Resolved): partial recovery become whole object recovery after restart osd
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
12:23 PM Backport #52450 (Resolved): pacific: smart query on monitors
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44164
m...
Loïc Dachary
12:22 PM Backport #52451 (Resolved): octopus: smart query on monitors
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44177
m...
Loïc Dachary
12:20 PM Backport #51149 (Resolved): octopus: When read failed, ret can not take as data len, in FillInVer...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44174
m...
Loïc Dachary
12:20 PM Backport #51171 (Resolved): octopus: regression in ceph daemonperf command output, osd columns ar...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44176
m...
Loïc Dachary
12:20 PM Backport #52710 (Resolved): octopus: partial recovery become whole object recovery after restart osd
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44165
m...
Loïc Dachary
12:20 PM Backport #53389 (Resolved): octopus: pg-temp entries are not cleared for PGs that no longer exist
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/44097
m...
Loïc Dachary
08:37 AM Bug #53600 (Rejected): Crash in MOSDPGLog::encode_payload
3 OSDs crashed on the gibba cluster. All the OSDs were a part of gibba045 node.
*Observations:*
- osd.15 and os...
Sridhar Seshasayee
01:22 AM Bug #53584: FAILED ceph_assert(pop.data.length() == sinfo.aligned_logical_offset_to_chunk_offset(...
Neha Ojha wrote:
> ..., it seems like you have "enough copies available" to remove the problematic OSD but we won't ...
玮文 胡

12/13/2021

10:56 PM Bug #52416 (Fix Under Review): devices: mon devices appear empty when scraping SMART metrics
Neha Ojha
10:48 PM Bug #53575 (Rejected): Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64
We could suppress this but since it is not coming from the Ceph code, rejecting it. Neha Ojha
10:41 PM Bug #53584 (Need More Info): FAILED ceph_assert(pop.data.length() == sinfo.aligned_logical_offset...
Can you provide OSD logs for the PG that is crashing (from all the shards)? From the error logs, it seems like you ha... Neha Ojha
10:08 AM Bug #53593: RBD cloned image is slow in 4k write with "waiting for rw locks"
[Observed Poor Performance]
On a rbd image, we found the 4k write IOPS is much lower than expected.
I understood th...
Cuicui Zhao
10:05 AM Bug #53593 (Pending Backport): RBD cloned image is slow in 4k write with "waiting for rw locks"
[Observed Poor Performance]
On a rbd image, we found the 4k write IOPS is much lower than expected.
I understoo...
Cuicui Zhao

12/12/2021

01:39 PM Bug #53586 (New): rocksdb: build error with rocksdb-6.25.x
Here we go again, same bug as in #52415, affects all attempts to build ceph-16.2.7 against rocksdb-6.25-*
Cheers,
...
chris denice
08:49 AM Bug #53584 (Need More Info): FAILED ceph_assert(pop.data.length() == sinfo.aligned_logical_offset...
... 玮文 胡

12/11/2021

04:15 PM Backport #51149: octopus: When read failed, ret can not take as data len, in FillInVerifyExtent
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44174
merged
Yuri Weinstein

12/10/2021

11:46 PM Backport #51171: octopus: regression in ceph daemonperf command output, osd columns aren't visibl...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44176
merged
Yuri Weinstein
11:43 PM Backport #52710: octopus: partial recovery become whole object recovery after restart osd
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44165
merged
Yuri Weinstein
11:43 PM Backport #53389: octopus: pg-temp entries are not cleared for PGs that no longer exist
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44097
merged
Yuri Weinstein
09:16 PM Bug #53516 (Fix Under Review): Disable health warning when autoscaler is on
Neha Ojha
06:03 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
... Neha Ojha

12/09/2021

11:06 PM Bug #52136: Valgrind reports memory "Leak_DefinitelyLost" errors.
/a/yuriw-2021-12-09_00:18:57-rados-wip-yuri-testing-2021-12-08-1336-distro-default-smithi/6553724/ ----> osd.1.log.gz Laura Flores
09:38 PM Bug #53575 (Resolved): Valgrind reports memory "Leak_PossiblyLost" errors concerning lib64
Found in /a/yuriw-2021-12-09_00:18:57-rados-wip-yuri-testing-2021-12-08-1336-distro-default-smithi/6553724
The fol...
Laura Flores
04:32 PM Backport #53549 (In Progress): nautilus: [RFE] Provide warning when the 'require-osd-release' fla...
Sridhar Seshasayee
01:43 PM Backport #53550 (In Progress): octopus: [RFE] Provide warning when the 'require-osd-release' flag...
Sridhar Seshasayee
12:53 PM Backport #53551 (In Progress): pacific: [RFE] Provide warning when the 'require-osd-release' flag...
Sridhar Seshasayee

12/08/2021

09:15 PM Backport #53551 (Resolved): pacific: [RFE] Provide warning when the 'require-osd-release' flag do...
https://github.com/ceph/ceph/pull/44259 Backport Bot
09:15 PM Backport #53550 (Resolved): octopus: [RFE] Provide warning when the 'require-osd-release' flag do...
https://github.com/ceph/ceph/pull/44260 Backport Bot
09:15 PM Backport #53549 (Rejected): nautilus: [RFE] Provide warning when the 'require-osd-release' flag d...
https://github.com/ceph/ceph/pull/44263 Backport Bot
09:13 PM Feature #51984 (Pending Backport): [RFE] Provide warning when the 'require-osd-release' flag does...
Neha Ojha
07:08 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
/a/yuriw-2021-12-07_16:04:59-rados-wip-yuri5-testing-2021-12-06-1619-distro-default-smithi/6551120
pg map right be...
Neha Ojha
06:49 PM Bug #53544 (New): src/test/osd/RadosModel.h: ceph_abort_msg("racing read got wrong version") in t...
... Neha Ojha
03:30 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
/a/yuriw-2021-12-07_16:02:55-rados-wip-yuri11-testing-2021-12-06-1619-distro-default-smithi/6550873 Sridhar Seshasayee
12:15 PM Backport #53535 (Resolved): pacific: mon: mgrstatmonitor spams mgr with service_map
https://github.com/ceph/ceph/pull/44721 Backport Bot
12:15 PM Backport #53534 (Resolved): octopus: mon: mgrstatmonitor spams mgr with service_map
https://github.com/ceph/ceph/pull/44722 Backport Bot
12:10 PM Bug #53479 (Pending Backport): mon: mgrstatmonitor spams mgr with service_map
Sage Weil

12/07/2021

09:27 PM Bug #53516 (Resolved): Disable health warning when autoscaler is on
the command:
ceph health detail
displays a warning when a pool has many more objects per pg than other pools. Thi...
Christopher Hoffman

12/06/2021

10:05 PM Backport #53507 (Duplicate): pacific: ceph -s mon quorum age negative number
Backport Bot
10:03 PM Bug #53306 (Pending Backport): ceph -s mon quorum age negative number
Needs to be included in https://github.com/ceph/ceph/pull/43698 Neha Ojha
08:42 PM Backport #52450: pacific: smart query on monitors
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44164
merged
Yuri Weinstein
06:13 PM Bug #53506 (Fix Under Review): mon: frequent cpu_tp had timed out messages
Sage Weil
06:06 PM Bug #53506 (Closed): mon: frequent cpu_tp had timed out messages
... Sage Weil
11:06 AM Bug #52416: devices: mon devices appear empty when scraping SMART metrics
If `ceph-mon` runs as a systemd unit, check if `PrivateDevices=yes` in `/lib/systemd/system/ceph-mon@.service`; if so... Benoît Knecht
10:30 AM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
Ist Gab wrote:
> Igor Fedotov wrote:
> > …
>
> Igor, do you think if we put a super fast 2-4TB write optimized n...
Igor Fedotov
09:14 AM Bug #52189: crash in AsyncConnection::maybe_start_delay_thread()
Neha Ojha wrote:
> We'll need more information to debug a crash like this.
@Neha, we observed another one of the...
Christian Rohmann
08:49 AM Bug #51307: LibRadosWatchNotify.Watch2Delete fails
/a/yuriw-2021-12-03_15:27:18-rados-wip-yuri11-testing-2021-12-02-1451-distro-default-smithi/6542889... Sridhar Seshasayee
08:25 AM Bug #53500: rte_eal_init fail will waiting forever
r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /...
chunsong feng
08:20 AM Bug #53500 (New): rte_eal_init fail will waiting forever
The rte_eal_init returns a failure message and does not wake up the waiting msgr-worker thread. As a result, the wait... chunsong feng

12/03/2021

09:02 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
Igor Fedotov wrote:
> …
Igor, do you think if we put a super fast 2-4TB write optimized nvme in front of each 15....
Ist Gab
01:16 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
Ist Gab wrote:
> Igor Fedotov wrote:
>
> > Right - PG removal/moving are the primary cause of bulk data removals....
Igor Fedotov
12:43 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
Igor Fedotov wrote:
> Right - PG removal/moving are the primary cause of bulk data removals. We're working on impr...
Ist Gab
12:39 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
Igor Fedotov wrote:
> So if compaction provides some relief (at least temporarily) - I would suggest running periodi...
Ist Gab
12:31 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
Ist Gab wrote:
> Most likely this is related to this pg delete/movement things because after the pg increase the c...
Igor Fedotov
12:12 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
Igor Fedotov wrote:
> In my opinion this issue is caused by a well-known problem with RocksDB performance degradatio...
Ist Gab
11:12 AM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
In my opinion this issue is caused by a well-known problem with RocksDB performance degradation after bulk data remov... Igor Fedotov
05:05 PM Backport #53486 (In Progress): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
Neha Ojha
01:19 PM Backport #53486: pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
https://github.com/ceph/ceph/pull/44202 Myoungwon Oh
12:25 PM Backport #53486 (Resolved): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
https://github.com/ceph/ceph/pull/44202 Backport Bot
12:20 PM Bug #52872 (Pending Backport): LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
Myoungwon Oh
12:20 PM Bug #53485 (Fix Under Review): monstore: logm entries are not garbage collected
We had to run a ceph cluster with a damaged cephfs for a while that got deleted already. We suspect this was the culp... Daniel Poelzleithner
01:56 AM Bug #53481 (New): rte_exit can't exit when call it in dpdk thread

(gdb) info thr
Id Target Id Frame
* 1 Thread 0xfffc1ba26100 (LW...
chunsong feng

12/02/2021

11:36 PM Backport #53480 (Resolved): pacific: Segmentation fault under Pacific 16.2.1 when using a custom ...
https://github.com/ceph/ceph/pull/44897 Backport Bot
11:33 PM Bug #50659 (Pending Backport): Segmentation fault under Pacific 16.2.1 when using a custom crush ...
Neha Ojha
11:31 PM Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
Myoungwon Oh: should we backport this? please update the status accordingly. Neha Ojha
11:14 PM Bug #53479 (Fix Under Review): mon: mgrstatmonitor spams mgr with service_map
Sage Weil
10:46 PM Bug #53479 (Pending Backport): mon: mgrstatmonitor spams mgr with service_map
... Sage Weil
08:39 PM Bug #53138: cluster [WRN] Health check failed: Degraded data redundancy: 3/1164 objects degrade...
@Neha I am seeing these failures more than usual; maybe we are having a performance regression. If not, can we inc... Deepika Upadhyay
08:34 PM Backport #50274 (In Progress): pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log()....
Neha Ojha
08:20 PM Bug #51652: heartbeat timeouts on filestore OSDs while deleting objects in upgrade:pacific-p2p-pa...
/a/yuriw-2021-11-28_15:43:54-upgrade:pacific-p2p-pacific-16.2.7_RC1-distro-default-smithi/6531998 Neha Ojha
02:26 PM Support #51609: OSD refuses to start (OOMK) due to pg split
Tor Martin Ølberg wrote:
> Tor Martin Ølberg wrote:
> > After an upgrade to 15.2.13 from 15.2.4 my small home lab c...
Igor Dell
07:28 AM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
https://github.com/ceph/ceph/pull/44181 Myoungwon Oh

12/01/2021

08:57 PM Bug #53454 (New): nautilus: MInfoRec in Started/ToDelete/WaitDeleteReseved causes state machine c...
... Neha Ojha
08:24 PM Backport #52451 (In Progress): octopus: smart query on monitors
Cory Snyder
08:14 PM Backport #51171 (In Progress): octopus: regression in ceph daemonperf command output, osd columns...
Cory Snyder
08:14 PM Backport #51172 (In Progress): pacific: regression in ceph daemonperf command output, osd columns...
Cory Snyder
08:12 PM Backport #51149 (In Progress): octopus: When read failed, ret can not take as data len, in FillIn...
Cory Snyder
08:12 PM Backport #51150 (In Progress): pacific: When read failed, ret can not take as data len, in FillIn...
Cory Snyder
07:38 PM Backport #52710 (In Progress): octopus: partial recovery become whole object recovery after resta...
Cory Snyder
07:05 PM Backport #52450 (In Progress): pacific: smart query on monitors
Cory Snyder
06:21 PM Bug #52261: OSD takes all memory and crashes, after pg_num increase
Aldo Briessmann wrote:
> Hi, same issue here on a cluster with ceph 16.2.4-r2 on Gentoo. Moving the cluster with the...
Neha Ojha
06:16 PM Bug #52261: OSD takes all memory and crashes, after pg_num increase
Hi, same issue here on a cluster with ceph 16.2.4-r2 on Gentoo. Moving the cluster with the in-progress PG split to 1... Aldo Briessmann
02:30 AM Bug #50192: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_missing(soi...
Needs a pacific backport, showed up in pacific... Neha Ojha

11/30/2021

03:45 AM Support #53432 (Resolved): How to use and optimize ceph dpdk
Write a CEPH DPDK enabling guide and place it in doc/dev. The document contains the following contents:
1. Compilati...
chunsong feng

11/29/2021

11:19 AM Bug #53237 (Resolved): mon: stretch mode blocks kernel clients from connecting
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
11:19 AM Bug #53258 (Resolved): mon: should always display disallowed leaders when set
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
11:17 AM Backport #53259 (Resolved): pacific: mon: should always display disallowed leaders when set
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43972
m...
Loïc Dachary
11:17 AM Backport #53239 (Resolved): pacific: mon: stretch mode blocks kernel clients from connecting
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43971
m...
Loïc Dachary

11/26/2021

10:54 AM Bug #52867 (New): pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:...
moving over to rados Sebastian Wagner

11/24/2021

05:29 PM Bug #53308: pg-temp entries are not cleared for PGs that no longer exist
That makes sense to me, thanks Neha! Cory Snyder
05:15 PM Bug #53308 (Pending Backport): pg-temp entries are not cleared for PGs that no longer exist
Cory, I am marking this for backport to octopus and pacific, makes sense to you? Neha Ojha
05:29 PM Backport #53389 (In Progress): octopus: pg-temp entries are not cleared for PGs that no longer exist
Cory Snyder
05:20 PM Backport #53389 (Resolved): octopus: pg-temp entries are not cleared for PGs that no longer exist
https://github.com/ceph/ceph/pull/44097 Backport Bot
05:29 PM Backport #53388 (In Progress): pacific: pg-temp entries are not cleared for PGs that no longer exist
Cory Snyder
05:20 PM Backport #53388 (Resolved): pacific: pg-temp entries are not cleared for PGs that no longer exist
https://github.com/ceph/ceph/pull/44096 Backport Bot
03:50 PM Feature #51984 (Fix Under Review): [RFE] Provide warning when the 'require-osd-release' flag does...
Sridhar Seshasayee

11/23/2021

01:53 PM Bug #44286: Cache tiering shows unfound objects after OSD reboots
Update: Also happens with 16.2.5 :-( Jan-Philipp Litza
01:16 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
New instance seen in below pacific run:
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-20_20:20:29-fs-wip-yuri6...
Kotresh Hiremath Ravishankar
10:54 AM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
Seems to be the same problem in:
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-20_18:00:22-rados-wip-yuri6-testi...
Ronen Friedman
07:40 AM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
/a/yuriw-2021-11-20_18:01:41-rados-wip-yuri8-testing-2021-11-20-0807-distro-basic-smithi/6516396 Aishwarya Mathuria

11/22/2021

08:29 PM Feature #21579 (Resolved): [RFE] Stop OSD's removal if the OSD's are part of inactive PGs
Vikhyat Umrao
07:11 PM Feature #51984: [RFE] Provide warning when the 'require-osd-release' flag does not match current ...
I am providing the history of PRs and commits that resulted in
the loss/removal of the checks for 'require-osd-relea...
Sridhar Seshasayee
06:45 PM Bug #53306 (Fix Under Review): ceph -s mon quorum age negative number
Sage Weil
 
