Activity
From 11/01/2021 to 11/30/2021
11/30/2021
- 03:45 AM Support #53432 (Resolved): How to use and optimize ceph dpdk
- Write a Ceph DPDK enabling guide and place it in doc/dev. The document covers the following:
1. Compilati...
11/29/2021
- 11:19 AM Bug #53237 (Resolved): mon: stretch mode blocks kernel clients from connecting
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 11:19 AM Bug #53258 (Resolved): mon: should always display disallowed leaders when set
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 11:17 AM Backport #53259 (Resolved): pacific: mon: should always display disallowed leaders when set
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43972
m...
- 11:17 AM Backport #53239 (Resolved): pacific: mon: stretch mode blocks kernel clients from connecting
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43971
m...
11/26/2021
- 10:54 AM Bug #52867 (New): pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:...
- moving over to rados
11/24/2021
- 05:29 PM Bug #53308: pg-temp entries are not cleared for PGs that no longer exist
- That makes sense to me, thanks Neha!
- 05:15 PM Bug #53308 (Pending Backport): pg-temp entries are not cleared for PGs that no longer exist
- Cory, I am marking this for backport to octopus and pacific, makes sense to you?
- 05:29 PM Backport #53389 (In Progress): octopus: pg-temp entries are not cleared for PGs that no longer exist
- 05:20 PM Backport #53389 (Resolved): octopus: pg-temp entries are not cleared for PGs that no longer exist
- https://github.com/ceph/ceph/pull/44097
- 05:29 PM Backport #53388 (In Progress): pacific: pg-temp entries are not cleared for PGs that no longer exist
- 05:20 PM Backport #53388 (Resolved): pacific: pg-temp entries are not cleared for PGs that no longer exist
- https://github.com/ceph/ceph/pull/44096
- 03:50 PM Feature #51984 (Fix Under Review): [RFE] Provide warning when the 'require-osd-release' flag does...
11/23/2021
- 01:53 PM Bug #44286: Cache tiering shows unfound objects after OSD reboots
- Update: Also happens with 16.2.5 :-(
- 01:16 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- New instance seen in the pacific run below:
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-20_20:20:29-fs-wip-yuri6...
- 10:54 AM Bug #51945: qa/workunits/mon/caps.sh: Error: Expected return 13, got 0
- Seems to be the same problem in:
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-20_18:00:22-rados-wip-yuri6-testi...
- 07:40 AM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- /a/yuriw-2021-11-20_18:01:41-rados-wip-yuri8-testing-2021-11-20-0807-distro-basic-smithi/6516396
11/22/2021
- 08:29 PM Feature #21579 (Resolved): [RFE] Stop OSD's removal if the OSD's are part of inactive PGs
- 07:11 PM Feature #51984: [RFE] Provide warning when the 'require-osd-release' flag does not match current ...
- I am providing the history of PRs and commits that resulted in
the loss/removal of the checks for 'require-osd-relea...
- 06:45 PM Bug #53306 (Fix Under Review): ceph -s mon quorum age negative number
11/20/2021
- 01:41 AM Bug #53349 (New): stat_sum.num_bytes of pool is incorrect when randomly writing small IOs to the ...
- In a test, I found that when random writes with an IO size of 512B are performed on the RBD image, the pool's stat_sum.num_...
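A minimal sketch of that kind of test, assuming a scratch image named rbd/stat-test and working rbd/ceph CLIs (names and sizes are illustrative, not taken from the ticket):
<pre>
# small_io_stats.py -- illustrative sketch only; pool/image names and sizes are
# assumptions. Drives 512-byte random writes at an RBD image, then prints the
# pool statistics that the report says disagree.
import subprocess

def run(*args: str) -> str:
    return subprocess.run(list(args), check=True,
                          capture_output=True, text=True).stdout

run("rbd", "create", "--size", "1G", "rbd/stat-test")
run("rbd", "bench", "--io-type", "write", "--io-size", "512",
    "--io-pattern", "rand", "--io-total", "64M", "rbd/stat-test")
print(run("ceph", "df", "detail"))          # per-pool stored/used bytes
print(run("ceph", "pg", "dump", "pools"))   # per-pool stat_sum, incl. num_bytes
</pre>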
- 12:06 AM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- /a/ksirivad-2021-11-19_19:14:07-rados-wip-autoscale-profile-scale-up-default-distro-basic-smithi/6514251
11/19/2021
- 06:23 PM Bug #53342 (Resolved): Exiting scrub checking -- not all pgs scrubbed
- ...
- 04:31 PM Backport #53340 (Rejected): pacific: osd/scrub: OSD crashes at PG removal
- 04:30 PM Backport #53339 (Resolved): pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<cons...
- https://github.com/ceph/ceph/pull/46767
- 04:30 PM Backport #53338 (Resolved): pacific: osd/scrub: src/osd/scrub_machine.cc: 55: FAILED ceph_assert(...
- 04:29 PM Bug #51843 (Pending Backport): osd/scrub: OSD crashes at PG removal
- 04:28 PM Bug #51942 (Pending Backport): src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotA...
- 04:27 PM Bug #52012 (Pending Backport): osd/scrub: src/osd/scrub_machine.cc: 55: FAILED ceph_assert(state_...
- 03:46 AM Bug #53330 (New): ceph client request connection with an old invalid key.
- We have a production ceph cluster with 3 mons and 516 osds.
Ceph version: 14.2.8
CPU: Intel(R) Xeon(R) Gold 5218
...
- 01:20 AM Bug #53329 (Duplicate): Set osd_fast_shutdown_notify_mon=true by default
- 01:18 AM Bug #53328 (Fix Under Review): osd_fast_shutdown_notify_mon option should be true by default
11/18/2021
- 11:10 PM Bug #53329 (Duplicate): Set osd_fast_shutdown_notify_mon=true by default
- This option was introduced in https://github.com/ceph/ceph/pull/38909, but was set false by default. There is a lot o...
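For clusters still running with the old default, a small sketch of flipping the option by hand (illustrative only; assumes the ceph CLI and a release that already ships osd_fast_shutdown_notify_mon and the centralized config store):
<pre>
# enable_fast_shutdown_notify.py -- illustrative sketch; assumes `ceph` is in
# PATH and the running release ships osd_fast_shutdown_notify_mon.
import subprocess

def ceph(*args: str) -> str:
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout.strip()

print("current:", ceph("config", "get", "osd", "osd_fast_shutdown_notify_mon"))
ceph("config", "set", "osd", "osd_fast_shutdown_notify_mon", "true")
print("now:    ", ceph("config", "get", "osd", "osd_fast_shutdown_notify_mon"))
</pre>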
- 09:30 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Ist Gab wrote:
> Neha Ojha wrote:
> > Set osd_delete_sleep to 2 secs and go higher if this does not help. Setting o...
- 09:24 PM Bug #53328: osd_fast_shutdown_notify_mon option should be true by default
- Pull request ID: 44016
- 09:14 PM Bug #53328 (Duplicate): osd_fast_shutdown_notify_mon option should be true by default
- osd_fast_shutdown_notify_mon option is false by default. So users suffer
from error log flood, slow ops, and the lon...
- 09:22 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- Tobias Urdin wrote:
> After upgrading osd.107 to 15.5.15 and waiting 2 hours for it to recover 3,000 objects in a si...
- 09:11 PM Bug #53327 (Resolved): osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shut...
- - it should send MOSDMarkMeDead not MarkMeDown
- we must confirm that we set a flag (preparing to stop?) that makes ...
- 08:57 PM Bug #53326 (Fix Under Review): pgs wait for read lease after osd start
- 08:28 PM Bug #53326 (Resolved): pgs wait for read lease after osd start
- - pg is healthy
- primary osd stops
- wait for things to settle
- restart primary
- pg goes into WAIT state
Th...
- 08:08 PM Bug #51942: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
- rados/thrash/{0-size-min-size-overrides/2-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overrides/{defa...
- 05:58 PM Bug #48298: hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly
- still encountering on ceph octopus 15.2.15 :(
please add the HEALTH_ERROR when the limit is hit, then one at least...
- 12:51 PM Bug #53316 (New): qa: (smithi150) slow request osd_op, currently waiting for sub ops warning
- The warning is seen in the following teuthology run:
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-17_19:02:43-fs-w...
- 03:05 AM Feature #52424 (Fix Under Review): [RFE] Limit slow request details to mgr log
11/17/2021
- 06:14 PM Bug #53308 (Resolved): pg-temp entries are not cleared for PGs that no longer exist
- When scaling down pg_num while it was in the process of scaling up, we consistently end up with stuck pg-temp entries...
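A rough reproduction sketch for that scenario (the pool name, PG counts, and the autoscaler toggle are assumptions added for illustration, not taken from the ticket):
<pre>
# repro_pg_temp.py -- hypothetical reproduction sketch: shrink pg_num while an
# increase is still in flight, then look for leftover pg_temp entries.
import subprocess

def ceph(*args: str) -> str:
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout

pool = "pgtemp-test"                                  # assumed pool name
ceph("osd", "pool", "create", pool, "32")
ceph("osd", "pool", "set", pool, "pg_autoscale_mode", "off")
ceph("osd", "pool", "set", pool, "pg_num", "128")     # scale up...
ceph("osd", "pool", "set", pool, "pg_num", "32")      # ...then shrink mid-flight

# Stale pg_temp mappings, if any remain, show up in the osdmap dump.
for line in ceph("osd", "dump").splitlines():
    if "pg_temp" in line:
        print(line)
</pre>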
- 04:59 PM Bug #53306 (Resolved): ceph -s mon quorum age negative number
- ...
- 06:24 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Seen in this pacific run as well.
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-12_00:33:28-fs-wip-yuri7-testi...
11/16/2021
- 10:15 PM Bug #50659 (Fix Under Review): Segmentation fault under Pacific 16.2.1 when using a custom crush ...
- 03:12 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- Thank you for this fix. It is very much appreciated.
- 08:25 PM Bug #53295 (New): Leak_DefinitelyLost PrimaryLogPG::do_proxy_chunked_read()
- ...
- 08:20 PM Bug #53294 (Resolved): rados/test.sh hangs while running LibRadosTwoPoolsPP.TierFlushDuringFlush
- ...
- 07:14 PM Bug #52867: pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:fd00:3...
- Kefu Chai wrote:
> @John,
>
> per the logging message pasted at http://ix.io/3B1y
>
>
> [...]
>
> it seem...
- 06:27 PM Backport #53259 (In Progress): pacific: mon: should always display disallowed leaders when set
- 06:26 PM Bug #53258 (Pending Backport): mon: should always display disallowed leaders when set
- 06:25 PM Bug #53237 (Pending Backport): mon: stretch mode blocks kernel clients from connecting
- 06:24 PM Backport #53239 (In Progress): pacific: mon: stretch mode blocks kernel clients from connecting
- 07:21 AM Backport #52936 (Resolved): pacific: Primary OSD crash caused corrupted object and further crashe...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43544
m...
- 07:21 AM Backport #52868: stretch mode: allow users to change the tiebreaker monitor
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43457
m...
- 12:32 AM Bug #53240 (Fix Under Review): full-object read crc is mismatch, because truncate modify oi.size ...
11/15/2021
- 03:27 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- After upgrading osd.107 to 15.5.15 and waiting 2 hours for it to recover 3,000 objects in a single PG it crashed agai...
- 03:19 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- ...
- 01:15 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- This is still an issue and it repeatedly hits this during recovery when upgrading the cluster where some (already upg...
- 02:18 AM Bug #53219: LibRadosTwoPoolsPP.ManifestRollbackRefcount failure
- Calculating reference count on manifest snapshotted object requires correct refcount information. So, current unittes...
11/14/2021
- 09:59 PM Bug #52901: osd/scrub: setting then clearing noscrub may lock a PG in 'scrubbing' state
- A test to detect this specific bug pushed as PR 43919
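For reference, the sequence named in the title can be driven by hand roughly as follows (a sketch only; the PG id is a placeholder, and the 11/08 comment further down describes the same reproduction):
<pre>
# noscrub_lock_repro.py -- hand-driven sketch of the sequence in the title:
# set noscrub, request a deep scrub, then clear noscrub. The PG id below is a
# placeholder; substitute one from your own cluster.
import subprocess, time

def ceph(*args: str) -> None:
    subprocess.run(["ceph", *args], check=True)

pgid = "1.0"                                   # hypothetical PG id
ceph("osd", "set", "noscrub")
ceph("pg", "deep-scrub", pgid)                 # ask for a scrub while blocked
time.sleep(30)
ceph("osd", "unset", "noscrub")

# On affected versions the PG can stay in the 'scrubbing' state afterwards.
out = subprocess.run(["ceph", "pg", "dump", "pgs_brief"], check=True,
                     capture_output=True, text=True).stdout
print([l for l in out.splitlines() if "scrub" in l])
</pre>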
- 08:47 AM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Neha Ojha wrote:
> Set osd_delete_sleep to 2 secs and go higher if this does not help. Setting osd_delete_sleep take...
11/13/2021
- 08:01 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Ist Gab wrote:
> Neha Ojha wrote:
> > Can you try to set a higher value of "osd delete sleep" and see if that helps...
- 05:50 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Neha Ojha wrote:
> Can you try to set a higher value of "osd delete sleep" and see if that helps?
Which one speci...
11/12/2021
- 11:11 PM Backport #53259 (Resolved): pacific: mon: should always display disallowed leaders when set
- https://github.com/ceph/ceph/pull/43972
- 11:10 PM Bug #53258 (Resolved): mon: should always display disallowed leaders when set
- I made some usability improvements in https://github.com/ceph/ceph/pull/43373, but accidentally switched things so th...
- 11:08 PM Backport #53238 (Rejected): octopus: mon: stretch mode blocks kernel clients from connecting
- Apparently I sometimes fail at sorting alphanumerically?
- 06:57 PM Bug #51942: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
- Ronen, let's prioritize this.
- 06:56 PM Bug #48909 (Duplicate): clog slow request overwhelm monitors
- 06:51 PM Bug #53138 (Triaged): cluster [WRN] Health check failed: Degraded data redundancy: 3/1164 objec...
- This warning comes up because there are PGs recovering, probably because the test is injecting failures - we can igno...
- 06:46 PM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- minghang zhao wrote:
> My solution is to add a function del_down_out_osd() to PGMap::get_rule_avail() to calculate t...
- 06:43 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Can you try to set a higher value of "osd delete sleep" and see if that helps?
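A minimal sketch of applying that suggestion at runtime (the 2-second value echoes the follow-up advice quoted earlier in this activity list; assumes a release with the centralized config store):
<pre>
# set_osd_delete_sleep.py -- illustrative only; the 2-second value mirrors the
# follow-up advice quoted elsewhere in this activity list.
import subprocess

def ceph(*args: str) -> str:
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout.strip()

ceph("config", "set", "osd", "osd_delete_sleep", "2")
print("osd_delete_sleep =", ceph("config", "get", "osd", "osd_delete_sleep"))
</pre>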
- 06:29 PM Bug #53190: counter num_read_kb is going down
- This seems possible to occur for many such counters in a distributed system like ceph, where these values are not tre...
- 06:26 PM Bug #52901 (Resolved): osd/scrub: setting then clearing noscrub may lock a PG in 'scrubbing' state
- 06:20 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Deepika Upadhyay wrote:
> /ceph/teuthology-archive/ideepika-2021-11-02_12:33:30-rbd-wip-ssd-cache-testing-distro-bas...
- 06:17 PM Bug #53219: LibRadosTwoPoolsPP.ManifestRollbackRefcount failure
- Myoungwon Oh wrote:
> I think this is the same issue as https://tracker.ceph.com/issues/52872.
> Recovery takes alm...
- 07:28 AM Bug #53219: LibRadosTwoPoolsPP.ManifestRollbackRefcount failure
- I think this is the same issue as https://tracker.ceph.com/issues/52872.
Recovery takes almost 8 minutes even if cur...
- 05:58 PM Bug #53251 (Closed): compiler warning about deprecated fmt::format_to()
- ...
- 03:49 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- http://qa-proxy.ceph.com/teuthology/ideepika-2021-11-12_08:56:59-rbd-wip-deepika-testing-2021-11-12-1203-distro-basic...
- 06:31 AM Bug #53240: full-object read crc is mismatch, because truncate modify oi.size and forget to clear...
- My Ceph version is Nautilus 14.2.5.
- 04:08 AM Bug #53240: full-object read crc is mismatch, because truncate modify oi.size and forget to clear...
- https://github.com/ceph/ceph/pull/43902
- 03:27 AM Bug #53240: full-object read crc is mismatch, because truncate modify oi.size and forget to clear...
- The object oi.size should be 4194304, but it is actually 4063232.
The object data_digest is 0xffffffff, but read crc...
- 02:56 AM Bug #53240 (Fix Under Review): full-object read crc is mismatch ...
- I use 100 threads to dd on multiple files under the directory, so the same file can be truncated at any time.
When d...
- 05:53 AM Cleanup #52754: windows warnings
- @Laura, they appear in windows shaman builds, anyone can take a look at the latest windows builds available here http...
11/11/2021
- 09:11 PM Bug #52867: pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:fd00:3...
- John Fulton wrote:
> As per comment #3 I was on the right path but I should have set an OSD setting, not a mon setti...
- 09:10 PM Bug #52867 (Need More Info): pick_address.cc prints: unable to find any IPv4 address in networks ...
- 08:40 PM Backport #53239 (Resolved): pacific: mon: stretch mode blocks kernel clients from connecting
- https://github.com/ceph/ceph/pull/43971
- 08:40 PM Backport #53238 (Rejected): octopus: mon: stretch mode blocks kernel clients from connecting
- This was reported by Red Hat at https://bugzilla.redhat.com/show_bug.cgi?id=2022190
> [66873.543382] libceph: got ...
- 08:30 PM Bug #53237 (Resolved): mon: stretch mode blocks kernel clients from connecting
- This was reported by Red Hat at https://bugzilla.redhat.com/show_bug.cgi?id=2022190
> [66873.543382] libceph: got ...
- 07:48 PM Cleanup #52754: windows warnings
- Deepika, the link is 404 now. Is there a way that we could preserve the Jenkins output and provide a different link?
- 03:26 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- Analysis of logs from JobID: 6443924
osd.3 didn't get initialized while the "ceph" teuthology task was running. As...
- 12:32 AM Bug #53219: LibRadosTwoPoolsPP.ManifestRollbackRefcount failure
- I'll take a look
11/10/2021
11/09/2021
- 06:21 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- I'm also seeing this issue:...
11/08/2021
- 05:45 PM Bug #52901: osd/scrub: setting then clearing noscrub may lock a PG in 'scrubbing' state
- Easy to reproduce: set noscrub, then request a deep-scrub. That will get the PG's Scrubber state-machine
stuck in su...
- 01:42 PM Bug #53190 (New): counter num_read_kb is going down
- h3. Description of problem
An unreasonably high read metric value has been reported by monitoring (28.76TB/s).
...
- 12:36 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Have a look at the latency.png, please; all the spikes are almost outages.
- 12:27 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Igor Fedotov wrote:
> Can we have a relevant OSD log, please. I presume suicide timeout/slow DB operations are prese...
- 12:26 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Neha Ojha wrote:
> Do you have any more information about this crash? How often do you see it?
I have quite a lot...
- 10:47 AM Bug #52760: Monitor unable to rejoin the cluster
- Neha Ojha wrote:
> Can you share mon logs from all the monitors with debug_mon=20 and debug_ms=1?
I will once thi...
11/05/2021
- 09:14 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Can we have a relevant OSD log, please? I presume suicide timeout/slow DB operations are present there.
- 09:12 PM Bug #53142 (Need More Info): OSD crash in PG::do_delete_work when increasing PGs
- Do you have any more information about this crash? How often do you see it?
- 09:10 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- ...
11/04/2021
- 09:17 PM Backport #53167 (Rejected): octopus: api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- 09:16 PM Backport #53166 (Resolved): pacific: api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- https://github.com/ceph/ceph/pull/51261
- 09:11 PM Bug #24990 (Pending Backport): api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- 09:08 PM Bug #24990: api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- https://github.com/ceph/ceph/pull/43700 merged
- 01:51 AM Bug #52126 (Resolved): stretch mode: allow users to change the tiebreaker monitor
- 01:50 AM Backport #52868 (Resolved): stretch mode: allow users to change the tiebreaker monitor
- 12:38 AM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- Sridhar has started looking into this.
11/03/2021
- 10:36 PM Backport #52936: pacific: Primary OSD crash caused corrupted object and further crashes during ba...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43544
merged
- 10:35 PM Backport #52868: stretch mode: allow users to change the tiebreaker monitor
- Greg Farnum wrote:
> https://github.com/ceph/ceph/pull/43457
merged
- 03:10 PM Bug #53142 (Need More Info): OSD crash in PG::do_delete_work when increasing PGs
- I've attached the file and put the crash signature also.
- 01:40 PM Bug #53138 (Triaged): cluster [WRN] Health check failed: Degraded data redundancy: 3/1164 objec...
- ...
- 12:23 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- /ceph/teuthology-archive/ideepika-2021-11-02_12:33:30-rbd-wip-ssd-cache-testing-distro-basic-smithi/6477582/teutholog...
- 10:47 AM Bug #47300: mount.ceph fails to understand AAAA records from SRV record
- Issue still present on 16.2.6 (ceph packages 16.2.6-1focal, kernel 5.11.0-38-generic)...
- 09:57 AM Bug #51463: blocked requests while stopping/starting OSDs
- Hi Sage,
I tested it with fast shutdown enabled (default) and disabled. In both cases I got slow ops (longer than ...
- 07:15 AM Bug #52967: premerge pgs may be backfill_wait for a long time
- Hi Sage, what are the conditions for producing "premerge+backfill_wait"?
- 06:58 AM Bug #52741: pg inconsistent state is lost after the primary osd restart
- yite gu wrote:
> What is the way you remove replica?
In my case it was filestore so I just remove the file on the...
- 03:24 AM Bug #52741: pg inconsistent state is lost after the primary osd restart
- What is the way you remove replica?
11/02/2021
- 11:49 PM Bug #48909: clog slow request overwhelm monitors
- This is being handled over https://tracker.ceph.com/issues/52424.
- 10:17 PM Bug #51527 (Resolved): Ceph osd crashed due to segfault
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:01 PM Backport #52770 (Resolved): pacific: pg scrub stat mismatch with special objects that have hash '...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43512
m...
- 10:01 PM Backport #52620 (Resolved): pacific: partial recovery become whole object recovery after restart osd
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43513
m...
- 10:01 PM Backport #52843 (Resolved): pacific: msg/async/ProtocalV2: recv_stamp of a message is set to a wr...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43511
m...
- 10:00 PM Backport #52831 (Resolved): pacific: osd: pg may get stuck in backfill_toofull after backfill is ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43437
m...
- 01:50 PM Bug #51463 (Need More Info): blocked requests while stopping/starting OSDs
- I easily reproduced this with 'osd fast shutdown = false' (vstart default), but was unable to do so with 'osd fast sh...
- 12:22 AM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- ...
- 12:21 AM Bug #52694 (Duplicate): src/messages/MOSDPGLog.h: virtual void MOSDPGLog::encode_payload(uint64_t...
11/01/2021