Activity
From 10/13/2021 to 11/11/2021
11/11/2021
- 09:11 PM Bug #52867: pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:fd00:3...
- John Fulton wrote:
> As per comment #3 I was on the right path but I should have set an OSD setting, not a mon setti...
- 09:10 PM Bug #52867 (Need More Info): pick_address.cc prints: unable to find any IPv4 address in networks ...
- 08:40 PM Backport #53239 (Resolved): pacific: mon: stretch mode blocks kernel clients from connecting
- https://github.com/ceph/ceph/pull/43971
- 08:40 PM Backport #53238 (Rejected): octopus: mon: stretch mode blocks kernel clients from connecting
- This was reported by Red Hat at https://bugzilla.redhat.com/show_bug.cgi?id=2022190
> [66873.543382] libceph: got ...
- 08:30 PM Bug #53237 (Resolved): mon: stretch mode blocks kernel clients from connecting
- This was reported by Red Hat at https://bugzilla.redhat.com/show_bug.cgi?id=2022190
> [66873.543382] libceph: got ...
- 07:48 PM Cleanup #52754: windows warnings
- Deepika, the link is 404 now. Is there a way that we could preserve the Jenkins output and provide a different link?
- 03:26 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- Analysis of logs from JobID: 6443924
osd.3 during running of the "ceph" teuthology task didn't get initialized. As...
- 12:32 AM Bug #53219: LibRadosTwoPoolsPP.ManifestRollbackRefcount failure
- I'll take a look
11/10/2021
11/09/2021
- 06:21 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- I'm also seeing this issue:...
11/08/2021
- 05:45 PM Bug #52901: osd/scrub: setting then clearing noscrub may lock a PG in 'scrubbing' state
- Easy to reproduce: set noscrub, then request a deep-scrub. That will get the PG's Scrubber state-machine
stuck in su...
- 01:42 PM Bug #53190 (New): counter num_read_kb is going down
- h3. Description of problem
An unreasonably high read metric value has been reported by monitoring (28.76TB/s).
...
- 12:36 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
Have a look at latency.png, please; all the spikes are almost outages.
- 12:27 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Igor Fedotov wrote:
> Can we have a relevant OSD log, please. I presume suicide timeout/slow DB operations are prese...
- 12:26 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Neha Ojha wrote:
> Do you have any more information about this crash? How often do you see it?
I have quite a lot...
- 10:47 AM Bug #52760: Monitor unable to rejoin the cluster
- Neha Ojha wrote:
> Can you share mon logs from all the monitors with debug_mon=20 and debug_ms=1?
I will once thi...
11/05/2021
- 09:14 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- Can we have a relevant OSD log, please. I presume suicide timeout/slow DB operations are present there.
- 09:12 PM Bug #53142 (Need More Info): OSD crash in PG::do_delete_work when increasing PGs
- Do you have any more information about this crash? How often do you see it?
- 09:10 PM Bug #53142: OSD crash in PG::do_delete_work when increasing PGs
- ...
11/04/2021
- 09:17 PM Backport #53167 (Rejected): octopus: api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- 09:16 PM Backport #53166 (Resolved): pacific: api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- https://github.com/ceph/ceph/pull/51261
- 09:11 PM Bug #24990 (Pending Backport): api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- 09:08 PM Bug #24990: api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- https://github.com/ceph/ceph/pull/43700 merged
- 01:51 AM Bug #52126 (Resolved): stretch mode: allow users to change the tiebreaker monitor
- 01:50 AM Backport #52868 (Resolved): stretch mode: allow users to change the tiebreaker monitor
- 12:38 AM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- Sridhar has started looking into this.
11/03/2021
- 10:36 PM Backport #52936: pacific: Primary OSD crash caused corrupted object and further crashes during ba...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43544
merged
- 10:35 PM Backport #52868: stretch mode: allow users to change the tiebreaker monitor
- Greg Farnum wrote:
> https://github.com/ceph/ceph/pull/43457
merged
- 03:10 PM Bug #53142 (Need More Info): OSD crash in PG::do_delete_work when increasing PGs
- I've attached the file and put the crash signature also.
- 01:40 PM Bug #53138 (Triaged): cluster [WRN] Health check failed: Degraded data redundancy: 3/1164 objec...
- ...
- 12:23 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- /ceph/teuthology-archive/ideepika-2021-11-02_12:33:30-rbd-wip-ssd-cache-testing-distro-basic-smithi/6477582/teutholog...
- 10:47 AM Bug #47300: mount.ceph fails to understand AAAA records from SRV record
- Issue still present on 16.2.6 (ceph packages 16.2.6-1focal, kernel 5.11.0-38-generic)...
- 09:57 AM Bug #51463: blocked requests while stopping/starting OSDs
- Hi Sage,
I tested it with fast shutdown enabled (default) and disabled. In both cases I got slow ops (longer than ...
- 07:15 AM Bug #52967: premerge pgs may be backfill_wait for a long time
Hi Sage, what are the conditions for producing "premerge+backfill_wait"?
- 06:58 AM Bug #52741: pg inconsistent state is lost after the primary osd restart
- yite gu wrote:
> What is the way you remove replica?
In my case it was filestore so I just removed the file on the...
- 03:24 AM Bug #52741: pg inconsistent state is lost after the primary osd restart
- What is the way you remove replica?
11/02/2021
- 11:49 PM Bug #48909: clog slow request overwhelm monitors
- This is being handled over https://tracker.ceph.com/issues/52424.
- 10:17 PM Bug #51527 (Resolved): Ceph osd crashed due to segfault
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:01 PM Backport #52770 (Resolved): pacific: pg scrub stat mismatch with special objects that have hash '...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43512
m...
- 10:01 PM Backport #52620 (Resolved): pacific: partial recovery become whole object recovery after restart osd
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43513
m...
- 10:01 PM Backport #52843 (Resolved): pacific: msg/async/ProtocalV2: recv_stamp of a message is set to a wr...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43511
m...
- 10:00 PM Backport #52831 (Resolved): pacific: osd: pg may get stuck in backfill_toofull after backfill is ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43437
m...
- 01:50 PM Bug #51463 (Need More Info): blocked requests while stopping/starting OSDs
- I easily reproduced this with 'osd fast shutdown = false' (vstart default), but was unable to do so with 'osd fast sh...
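For reference, the toggle described above can be flipped cluster-wide; a minimal sketch, assuming a running cluster and the `osd_fast_shutdown` option (the config-database name of the 'osd fast shutdown' setting):

```shell
# Disable fast shutdown on all OSDs (the non-default value that
# reproduced the blocked requests above).
ceph config set osd osd_fast_shutdown false

# Restore the default behavior afterwards.
ceph config set osd osd_fast_shutdown true
```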
- 12:22 AM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- ...
- 12:21 AM Bug #52694 (Duplicate): src/messages/MOSDPGLog.h: virtual void MOSDPGLog::encode_payload(uint64_t...
11/01/2021
- 05:09 PM Feature #51213 (Fix Under Review): [ceph osd set noautoscale] Global on/off flag for PG autoscale...
10/31/2021
10/29/2021
- 02:05 PM Feature #52424: [RFE] Limit slow request details to mgr log
I am working on running the qa suite on the PR. Keeping the status as "In progress" for now, as I may need to push qa suite changes.
10/28/2021
- 11:29 PM Feature #51213: [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
- PR: https://github.com/ceph/ceph/pull/43716
- 03:29 PM Backport #52845 (In Progress): pacific: osd: add scrub duration to pg dump
- 02:12 PM Bug #51942: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
- and /a/sage-2021-10-28_02:19:01-rados-wip-sage3-testing-2021-10-27-1300-distro-basic-smithi/6464056
with logs
- 02:10 PM Bug #51942: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
- /a/sage-2021-10-28_02:19:01-rados-wip-sage3-testing-2021-10-27-1300-distro-basic-smithi/6464393
with osd logs
- 02:08 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- /a/sage-2021-10-28_02:19:01-rados-wip-sage3-testing-2021-10-27-1300-distro-basic-smithi/6464204
with logs!
- 02:06 PM Bug #24990 (Fix Under Review): api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- 02:04 PM Bug #24990: api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- /a/sage-2021-10-28_02:19:01-rados-wip-sage3-testing-2021-10-27-1300-distro-basic-smithi/6464087...
- 11:56 AM Feature #52424 (In Progress): [RFE] Limit slow request details to mgr log
10/27/2021
- 07:55 PM Feature #53050: Support blocklisting a CIDR range
- Greg Farnum wrote:
> Patrick Donnelly wrote:
> > So we're going to put a huge asterisk here that the CIDR range of ...
- 05:00 AM Feature #53050: Support blocklisting a CIDR range
- Patrick Donnelly wrote:
> So we're going to put a huge asterisk here that the CIDR range of machines must be hard-re...
- 01:39 AM Feature #53050: Support blocklisting a CIDR range
- So we're going to put a huge asterisk here that the CIDR range of machines must be hard-rebooted, right? Otherwise w...
- 06:46 PM Bug #53067: Fix client "version" display for kernel clients
- I looked at this at one point and it was moderately irritating, but the display bug is also really confusing for user...
- 01:10 PM Bug #53067 (New): Fix client "version" display for kernel clients
- Hello
When a rhel7 client mounts a cephfs share, it appears in `ceph features` as if it were a jewel client, even if t...
- 10:40 AM Bug #52509: PG merge: PG stuck in premerge+peered state
- We had a similar outage.
We did try to increase the number of PGs on a bucket-index-pool:...
10/26/2021
- 08:30 PM Backport #52770: pacific: pg scrub stat mismatch with special objects that have hash 'ffffffff'
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43512
merged
- 08:29 PM Backport #52620: pacific: partial recovery become whole object recovery after restart osd
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43513
merged
- 08:28 PM Backport #52843: pacific: msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43511
merged
- 08:26 PM Backport #52831: pacific: osd: pg may get stuck in backfill_toofull after backfill is interrupted...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43437
merged
- 07:45 PM Bug #52124: Invalid read of size 8 in handle_recovery_delete()
- /a/yuriw-2021-10-21_13:40:38-rados-wip-yuri2-testing-2021-10-20-1700-pacific-distro-basic-smithi/6454961/remote/smith...
- 06:48 PM Feature #48590 (Rejected): Add ability to blocklist a cephx entity name, a set of entities by a l...
- This is really impractical to do in RADOS. Closing in favor of https://tracker.ceph.com/issues/53050
- 06:48 PM Feature #53050 (Resolved): Support blocklisting a CIDR range
- Disaster recovery use cases want to be able to fence off entire IP ranges, rather than needing to specify individual ...
- 04:55 PM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
- Yes, I tried that, but it does not change the behavior:
>> ceph config set global public_network 10.113.0.0/16
...
- 04:35 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- Neha Ojha wrote:
> This could be related to the removal of allocation metadata from rocksdb work from Gabi, I will h...
10/25/2021
- 09:34 PM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
- The docs suggest setting public_network in the global section, not just for the mons https://docs.ceph.com/en/latest/...
- 09:27 PM Bug #52760 (Need More Info): Monitor unable to rejoin the cluster
- Can you share mon logs from all the monitors with debug_mon=20 and debug_ms=1?
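The requested debug levels can be raised roughly as follows; a sketch, assuming a running cluster:

```shell
# Persist the higher monitor debug levels in the config database...
ceph config set mon debug_mon 20
ceph config set mon debug_ms 1

# ...or inject them into the running monitors without a restart.
ceph tell mon.* injectargs '--debug_mon 20 --debug_ms 1'
```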
- 09:26 PM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- Added logs to teuthology:/post/tracker_52513/
- 09:20 PM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- Konstantin Shalygin wrote:
> I was reproduced this issue:
>
> # put rados object to pool
> # get mapping for thi...
- 09:20 PM Bug #52513 (New): BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- 09:15 PM Bug #15546 (Resolved): json numerical output is between quotes
- Based on https://tracker.ceph.com/issues/15546#note-2 (Thanks Laura!)
- 09:13 PM Bug #52385 (Need More Info): a possible data loss due to recovery_unfound PG after restarting all...
- 09:13 PM Bug #27053: qa: thrashosds: "[ERR] : 2.0 has 1 objects unfound and apparently lost"
- Deepika Upadhyay wrote:
> [...]
>
> /ceph/teuthology-archive/yuriw-2021-10-18_19:03:43-rados-wip-yuri5-testing-20...
- 09:06 PM Bug #44184: Slow / Hanging Ops after pool creation
- Neha Ojha wrote:
> Which version are you using?
Octopus 15.2.14
- 08:50 PM Bug #44184: Slow / Hanging Ops after pool creation
- Ist Gab wrote:
> Wido den Hollander wrote:
> > On a cluster with 1405 OSDs I've ran into a situation for the second...
- 08:59 PM Bug #52948: osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- This could be related to the removal of allocation metadata from rocksdb work from Gabi, I will have him verify.
...
- 06:28 AM Bug #53000: OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_d...
- Yet another example, now for a PR for the master branch [1, 2], and for OSDMap/OSDMapTest.BUG_51842/1:...
10/21/2021
- 06:30 AM Bug #53000 (New): OSDMap/OSDMapTest.BUG_51842/2: ThreadPool::WorkQueue<ParallelPGMapper::Item>::_...
- This failure was reported by Jenkins for pacific branch PR [1], though it does not look like related to that PR, and ...
- 06:07 AM Bug #38219 (Resolved): rebuild-mondb hangs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 05:53 AM Backport #52809 (Resolved): octopus: ceph-erasure-code-tool: new tool to encode/decode files
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43407
m... - 05:52 AM Backport #51552 (Resolved): octopus: rebuild-mondb hangs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43263
m... - 05:49 AM Backport #51569 (Resolved): octopus: pool last_epoch_clean floor is stuck after pg merging
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42837
m...
10/20/2021
- 07:39 PM Bug #52993 (New): upgrade:octopus-x Test: Upgrade test failed due to timeout of the "ceph pg dump...
- /a/teuthology-2021-10-12_13:10:21-upgrade:octopus-x-pacific-distro-basic-smithi/6433896
/a/teuthology-2021-10-12_13:...
- 07:24 PM Feature #52992: Enhance auto-repair capabilities to handle stat mismatch scrub errors
- Downstream bug - https://bugzilla.redhat.com/show_bug.cgi?id=2010447
- 07:24 PM Feature #52992 (New): Enhance auto-repair capabilities to handle stat mismatch scrub errors
- At present the auto repair does not handle stat mismatch scrub errors.
- We plan to enhance the auto-repair capabi...
- 06:30 PM Bug #52925 (Fix Under Review): pg peering alway after trigger async recovery
- 02:58 AM Bug #47838: mon/test_mon_osdmap_prune.sh: first_pinned != trim_to
- ...
- 02:55 AM Bug #27053: qa: thrashosds: "[ERR] : 2.0 has 1 objects unfound and apparently lost"
- ...
10/19/2021
- 08:55 PM Bug #52886: osd: backfill reservation does not take compression into account
- Neha Ojha wrote:
> I'll create a trello card to track this, I think the initial toofull implementation was intention...
- 02:12 AM Bug #52969: use "ceph df" command found pool max avail increase when there are degraded objects i...
- My solution is to add a function del_down_out_osd() to PGMap::get_rule_avail() to calculate the avail value of the st...
- 01:05 AM Bug #52969 (Fix Under Review): use "ceph df" command found pool max avail increase when there are...
- down former:
--- POOLS ---
POOL ID STORED OBJECTS USED %USED MAX AVAIL
device_health_met...
- 01:40 AM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- > Please provide osd logs (roughly 10 mins) from all replicas with debug_osd=20, debug_ms=1, when the osds are restar...
10/18/2021
- 10:24 PM Bug #51576: qa/tasks/radosbench.py times out
- ...
- 05:54 PM Bug #52967 (New): premerge pgs may be backfill_wait for a long time
- ...
10/17/2021
- 11:27 AM Bug #44184: Slow / Hanging Ops after pool creation
- Ist Gab wrote:
> > Are you sure recovery_deletes is set in the OSDMap?
>
> yeah, these are the flags:
> flags no...
10/15/2021
- 04:59 PM Bug #15546: json numerical output is between quotes
- This seems to be okay now. Is it possible that this issue was fixed without getting updated?...
- 02:58 PM Bug #52948 (New): osd: fails to come up: "teuthology.misc:7 of 8 OSDs are up"
- ...
- 02:06 PM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
I reproduced this issue:
# put rados object to pool
# get mapping for this object...
- 12:47 PM Bug #44184: Slow / Hanging Ops after pool creation
- > Are you sure recovery_deletes is set in the OSDMap?
yeah, these are the flags:
flags noout,sortbitwise,recovery...
- 11:58 AM Bug #44184: Slow / Hanging Ops after pool creation
- Ist Gab wrote:
> Wido den Hollander wrote:
> > On a cluster with 1405 OSDs I've ran into a situation for the second...
- 12:20 PM Support #52881: Filtered out host node3.foo.com: does not belong to mon public_network ()
I can now answer my question myself. It was a misconfiguration.
After I entered the unmanaged mode with
@$ ...
10/14/2021
- 07:14 PM Bug #52867: pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:fd00:3...
- As per comment #3 I was on the right path but I should have set an OSD setting, not a mon setting. If I run the follo...
- 06:15 AM Bug #52867: pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:fd00:3...
- @John,
per the logging message pasted at http://ix.io/3B1y... - 04:53 PM Bug #51527 (Pending Backport): Ceph osd crashed due to segfault
- Just sent https://github.com/ceph/ceph/pull/43548 for pacific. Nothing more to do here, I think.
- 04:32 PM Bug #51527: Ceph osd crashed due to segfault
- This is a known issue that has been fixed in master by commit https://github.com/ceph/ceph/commit/d51d80b3234e1769006...
- 03:56 PM Backport #52938 (In Progress): nautilus: Primary OSD crash caused corrupted object and further cr...
- 02:46 PM Backport #52938 (Rejected): nautilus: Primary OSD crash caused corrupted object and further crash...
- https://github.com/ceph/ceph/pull/43547
- 03:51 PM Backport #52937 (In Progress): octopus: Primary OSD crash caused corrupted object and further cra...
- 02:46 PM Backport #52937 (Rejected): octopus: Primary OSD crash caused corrupted object and further crashe...
- https://github.com/ceph/ceph/pull/43545
- 03:47 PM Backport #52936 (In Progress): pacific: Primary OSD crash caused corrupted object and further cra...
- 02:46 PM Backport #52936 (Resolved): pacific: Primary OSD crash caused corrupted object and further crashe...
- https://github.com/ceph/ceph/pull/43544
- 02:43 PM Bug #48959 (Pending Backport): Primary OSD crash caused corrupted object and further crashes duri...
- 02:36 PM Bug #52815 (Resolved): exact_timespan_str()
- 12:49 PM Bug #42861: Libceph-common.so needs to use private link attribute when including dpdk static library
When linking with shared libs, it can find the ethernet port, and the test passed.
[root@ceph1 aarch64-openEuler-linux-gnu]# cat ./src/test...
- 12:49 PM Bug #42861: Libceph-common.so needs to use private link attribute when including dpdk static library
When linking with the static lib, it can't find the ethernet port.
[root@ceph1 aarch64-openEuler-linux-gnu]# cat ./src/test/msgr/CMakeFi...
- 12:49 PM Bug #52930 (New): Cannot get 'quorum_status' output from socket file
- As per doc[1] if we try to get quorum_status it fails with invalid command.
Is this no longer available in ceph 16.x...
- 10:31 AM Bug #52513: BlueStore.cc: 12391: ceph_abort_msg(\"unexpected error\") on operation 15
- @Neha, I will try to reproduce this via removing objects from replicas
- 07:04 AM Bug #52925: pg peering alway after trigger async recovery
- PR: https://github.com/ceph/ceph/pull/43534
- 06:40 AM Bug #52925 (Closed): pg peering alway after trigger async recovery
- my ceph version is 14.2.21. I want to test the pg async recovery function, so I only set osd.9 config "osd_async_recovery_mi...
10/13/2021
- 07:09 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- ...
- 05:58 PM Bug #52162: crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): abort
- Hey Josh,
This issue is marked as 'Duplicate', but it does not have a 'duplicates' relation, only a 'relates to' on...
- 01:53 PM Feature #52609: New PG states for pending scrubs / repairs
- Looks good to me! Thanks Ronen
- 07:21 AM Bug #52901 (Fix Under Review): osd/scrub: setting then clearing noscrub may lock a PG in 'scrubbi...