Activity
From 09/13/2021 to 10/12/2021
10/12/2021
- 10:25 PM Bug #52872 (Fix Under Review): LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- 03:20 AM Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- https://github.com/ceph/ceph/pull/43493
- 09:12 PM Backport #52620 (In Progress): pacific: partial recovery become whole object recovery after resta...
- 09:11 PM Backport #52770 (In Progress): pacific: pg scrub stat mismatch with special objects that have has...
- 09:10 PM Backport #52843 (In Progress): pacific: msg/async/ProtocalV2: recv_stamp of a message is set to a...
- 06:46 PM Bug #52578 (Fix Under Review): CLI - osd pool rm --help message is wrong or misleading
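For reference, a sketch of the invocation the help text should be describing; "mypool" is a placeholder and the mon_allow_pool_delete guard is assumed to be at its default.
<pre>
# Pool deletion requires the pool name twice plus an explicit confirmation flag,
# and the monitors must allow deletion at all.
ceph config set mon mon_allow_pool_delete true
ceph osd pool rm mypool mypool --yes-i-really-really-mean-it
ceph config set mon mon_allow_pool_delete false   # re-arm the guard afterwards
</pre>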
- 04:30 PM Bug #52901 (Resolved): osd/scrub: setting then clearing noscrub may lock a PG in 'scrubbing' state
- Recent scrub scheduling code errs in (at one location) incorrectly considering noscrub as not
precluding deep-scrub.
- 02:42 PM Bug #51463: blocked requests while stopping/starting OSDs
- Sure.
Simple cluster with 5 nodes 125 OSDs in total
one pool replicated size 3, min_size 1
at least this in t...
- 09:23 AM Feature #52609 (Fix Under Review): New PG states for pending scrubs / repairs
- Please see the updated proposed change (https://github.com/ceph/ceph/pull/43403 - new comment from today).
I hope it...
- 06:47 AM Bug #44184: Slow / Hanging Ops after pool creation
- Wido den Hollander wrote:
> On a cluster with 1405 OSDs I've ran into a situation for the second time now where a po...
10/11/2021
- 10:28 PM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- Satoru Takeuchi wrote:
> > Can you share the full set of logs using ceph-post-file (https://docs.ceph.com/en/pacific...
- 10:25 PM Backport #52893 (Rejected): octopus: ceph-kvstore-tool repair segmentfault without bluestore-kv
- 10:25 PM Backport #52892 (Resolved): pacific: ceph-kvstore-tool repair segmentfault without bluestore-kv
- https://github.com/ceph/ceph/pull/51254
- 10:24 PM Bug #52756 (Pending Backport): ceph-kvstore-tool repair segmentfault without bluestore-kv
- based on https://tracker.ceph.com/issues/52756#note-2, looks like the fix needs to be backported all the way.
- 10:20 PM Bug #52513 (Need More Info): BlueStore.cc: 12391: ceph_abort_msg("unexpected error") on operati...
- Can you capture a coredump or osd logs with debug_osd=20,debug_bluestore=20,debug_ms=1 if this crash is reproducible?
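A rough sketch of gathering the requested logs at runtime; osd.3 is a placeholder daemon id.
<pre>
# Raise verbosity on the affected OSD until the abort reproduces, then collect its log.
ceph config set osd.3 debug_osd 20
ceph config set osd.3 debug_bluestore 20
ceph config set osd.3 debug_ms 1
# Once captured, drop the overrides again:
ceph config rm osd.3 debug_osd
ceph config rm osd.3 debug_bluestore
ceph config rm osd.3 debug_ms
</pre>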
- 10:13 PM Bug #52126 (Pending Backport): stretch mode: allow users to change the tiebreaker monitor
- 10:13 PM Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- Myoungwon Oh : I am assigning it to you, in case you have any thoughts on this issue. Feel free to un-assign if you d...
- 10:08 PM Bug #52884 (Fix Under Review): osd: optimize pg peering latency when add new osd that need backfill
- 05:29 AM Bug #52884: osd: optimize pg peering latency when add new osd that need backfill
- https://github.com/ceph/ceph/pull/43482
- 05:28 AM Bug #52884 (Fix Under Review): osd: optimize pg peering latency when add new osd that need backfill
- Reproduce:
(1) ceph cluster not running any client IO
(2) only ceph osd in osd.14 operation ( add new osd to cluste...
- 10:07 PM Bug #52886: osd: backfill reservation does not take compression into account
- I'll create a trello card to track this, I think the initial toofull implementation was intentionally kept simple, bu...
- 08:42 AM Bug #52886 (New): osd: backfill reservation does not take compression into account
- The problem may be observed with the recently added backfill-toofull test when it runs with bluestore-comp-lz4. When ...
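To see compressed-pool sizing in play outside the qa suite, a pool can be given the same lz4 settings the bluestore-comp-lz4 override exercises. This is only an illustrative sketch; "testpool" is a placeholder.
<pre>
ceph osd pool create testpool 32
ceph osd pool set testpool compression_algorithm lz4
ceph osd pool set testpool compression_mode aggressive
ceph osd pool get testpool compression_algorithm
</pre>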
- 10:02 PM Bug #51463: blocked requests while stopping/starting OSDs
- Is it possible for you to share your test reproducer with us? It would be great if we could run it against a vstart c...
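For context, the vstart cluster mentioned here is typically brought up from a build tree roughly as follows; the daemon counts are arbitrary.
<pre>
cd build
MON=3 OSD=5 MDS=0 MGR=1 ../src/vstart.sh -n -d   # -n: new cluster, -d: debug logging
../src/stop.sh                                   # tear it down again
</pre>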
- 05:19 PM Bug #52889 (Triaged): upgrade tests fails because of missing ceph-volume package
- We need something like 4e525127fbb710c1ac074cf61b448055781a69e3, for octopus-x as well.
- 04:43 PM Bug #52889 (Triaged): upgrade tests fails because of missing ceph-volume package
- Recently we separated ceph-volume from ceph-base into a separate one; have not looked deeply into it right now, but I s...
10/10/2021
- 07:03 PM Documentation #22843 (Won't Fix): [doc][luminous] the configuration guide still contains osd_op_t...
- 07:02 PM Documentation #7386 (Won't Fix): librados: document rados_osd_op_timeout and rados_mon_op_timeout...
- Seven years old, marked @Advanced@, making a judgement call and closing this. If you disagree, let me know and I'll ...
10/09/2021
- 10:27 PM Support #52881 (New): Filtered out host node3.foo.com: does not belong to mon public_network ()
- I am running a Ceph Pacific cluster ( version 16.2.6) consisting of 3 nodes with public Internet Addresses. I also ha...
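The empty parentheses in the message suggest no public_network value is set for the mons; a sketch of checking and setting it, where the CIDR is a placeholder.
<pre>
ceph config get mon public_network
ceph config set mon public_network 203.0.113.0/24   # must cover the hosts' public addresses
ceph orch host ls                                    # re-check whether the host is still filtered out
</pre>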
- 07:43 PM Bug #52867: pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:fd00:3...
- I set the following after bootstrap and before adding any OSDs but I got the same error.
`ceph config set mon ms_b...
- 03:12 PM Bug #52878 (Fix Under Review): qa/tasks: python3 'dict' object has no attribute 'iterkeys' error
- 08:07 AM Bug #52878 (Resolved): qa/tasks: python3 'dict' object has no attribute 'iterkeys' error
- os: CentOS8.4
ceph version: ceph16.2.4
Teuthology error log:
2021-08-27T09:18:09.787 DEBUG:teuthology.orchestra....
- 12:05 AM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- ...
10/08/2021
- 10:22 PM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
- merged https://github.com/ceph/ceph/pull/43324
- 09:06 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
- All evidence points to a firmware bug persisting in dozens of used Hitachi 12TB (HGST model HUH721212AL5200) HDDs ...
- 12:18 PM Bug #52872 (Pending Backport): LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
- /a/yuriw-2021-10-04_21:49:48-rados-wip-yuri4-testing-2021-10-04-1236-distro-basic-smithi/6421867...
10/07/2021
- 11:00 PM Backport #52868 (In Progress): stretch mode: allow users to change the tiebreaker monitor
- 10:52 PM Backport #52868 (Resolved): stretch mode: allow users to change the tiebreaker monitor
- https://github.com/ceph/ceph/pull/43457
- 08:59 PM Bug #52867 (New): pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:...
- When using IPv6 for my public and cluster network my mon is able to bootstrap (because I have [1]) but I end up with ...
- 02:25 PM Backport #52809: octopus: ceph-erasure-code-tool: new tool to encode/decode files
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43407
merged
- 12:40 AM Backport #52845 (Rejected): pacific: osd: add scrub duration to pg dump
- https://github.com/ceph/ceph/pull/43704
- 12:35 AM Bug #52605 (Pending Backport): osd: add scrub duration to pg dump
10/06/2021
- 10:45 PM Backport #52843 (Resolved): pacific: msg/async/ProtocalV2: recv_stamp of a message is set to a wr...
- https://github.com/ceph/ceph/pull/43511
- 10:45 PM Backport #52842 (Rejected): octopus: msg/async/ProtocalV2: recv_stamp of a message is set to a wr...
- 10:45 PM Backport #52841 (Resolved): pacific: shard-threads cannot wakeup bug
- https://github.com/ceph/ceph/pull/51262
- 10:45 PM Backport #52840 (Rejected): octopus: shard-threads cannot wakeup bug
- 10:45 PM Backport #52839 (In Progress): pacific: rados: build minimally when "WITH_MGR" is off
- https://github.com/ceph/ceph/pull/51250
- 10:45 PM Backport #52838 (Rejected): octopus: rados: build minimally when "WITH_MGR" is off
- 10:43 PM Cleanup #52796 (Pending Backport): rados: build minimally when "WITH_MGR" is off
- 10:42 PM Bug #52781 (Pending Backport): shard-threads cannot wakeup bug
- 10:40 PM Bug #52739 (Pending Backport): msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
- 06:24 PM Bug #48965: qa/standalone/osd/osd-force-create-pg.sh: TEST_reuse_id: return 1
- https://pulpito.ceph.com/kchai-2021-10-05_16:14:10-rados-wip-kefu-testing-2021-10-05-2221-distro-basic-smithi/6423567
- 05:56 PM Backport #52832 (In Progress): nautilus: osd: pg may get stuck in backfill_toofull after backfill...
- 04:35 PM Backport #52832 (Rejected): nautilus: osd: pg may get stuck in backfill_toofull after backfill is...
- https://github.com/ceph/ceph/pull/43439
- 05:40 PM Backport #52833 (In Progress): octopus: osd: pg may get stuck in backfill_toofull after backfill ...
- 04:35 PM Backport #52833 (Resolved): octopus: osd: pg may get stuck in backfill_toofull after backfill is ...
- https://github.com/ceph/ceph/pull/43438
- 05:39 PM Backport #52831 (In Progress): pacific: osd: pg may get stuck in backfill_toofull after backfill ...
- 04:35 PM Backport #52831 (Resolved): pacific: osd: pg may get stuck in backfill_toofull after backfill is ...
- https://github.com/ceph/ceph/pull/43437
- 04:31 PM Bug #52448 (Pending Backport): osd: pg may get stuck in backfill_toofull after backfill is interr...
10/05/2021
- 02:56 PM Backport #51552: octopus: rebuild-mondb hangs
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/43263
merged
- 02:46 PM Backport #51569: octopus: pool last_epoch_clean floor is stuck after pg merging
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42837
merged
- 12:53 PM Bug #52815 (Fix Under Review): exact_timespan_str()
- 12:25 PM Bug #52815 (Resolved): exact_timespan_str()
- exact_timespan_str() in ceph_time.cc handles some specific time-spans incorrectly:
150.567 seconds, for example, w...
- 04:06 AM Bug #43580 (Resolved): pg: fastinfo incorrect when last_update moves backward in time
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:06 AM Bug #44798 (Resolved): librados mon_command (mgr) command hang
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:05 AM Bug #48611 (Resolved): osd: Delay sending info to new backfill peer resetting last_backfill until...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:04 AM Bug #49894 (Resolved): set a non-zero default value for osd_client_message_cap
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:03 AM Bug #51000 (Resolved): LibRadosTwoPoolsPP.ManifestSnapRefcount failure
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:02 AM Bug #51419 (Resolved): bufferlist::splice() may cause stack corruption in bufferlist::rebuild_ali...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:02 AM Bug #51627 (Resolved): FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:00 AM Backport #51952 (Resolved): pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43099
m...
- 04:00 AM Backport #52322 (Resolved): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43306
m...
- 03:59 AM Backport #51117: pacific: osd: Run osd bench test to override default max osd capacity for mclock.
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41731
m...
- 03:55 AM Backport #51555 (Resolved): octopus: mon: return -EINVAL when handling unknown option in 'ceph os...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43266
m...
- 03:55 AM Backport #51967 (Resolved): octopus: set a non-zero default value for osd_client_message_cap
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42616
m...
- 03:52 AM Backport #51604 (Resolved): octopus: bufferlist::splice() may cause stack corruption in bufferlis...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42975
m...
- 12:03 AM Bug #52126 (Fix Under Review): stretch mode: allow users to change the tiebreaker monitor
10/04/2021
- 09:33 PM Feature #52609 (In Progress): New PG states for pending scrubs / repairs
- 03:50 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
- Igor Fedotov wrote:
> Dmitry Smirnov wrote:
>
> Hey Dmitry,
> I'm not sure why you're saying this is the same co...
- 03:47 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
- Dmitry Smirnov wrote:
> We face the same issue with simultaneously four OSD down events at 4 different hosts and ina...
- 08:46 AM Bug #50683 (Rejected): [RBD] master - cluster [WRN] Health check failed: mon is allowing insecure...
- 08:34 AM Backport #52808 (In Progress): nautilus: ceph-erasure-code-tool: new tool to encode/decode files
- 08:15 AM Backport #52808 (Rejected): nautilus: ceph-erasure-code-tool: new tool to encode/decode files
- https://github.com/ceph/ceph/pull/43408
- 08:17 AM Backport #52809 (In Progress): octopus: ceph-erasure-code-tool: new tool to encode/decode files
- 08:15 AM Backport #52809 (Resolved): octopus: ceph-erasure-code-tool: new tool to encode/decode files
- https://github.com/ceph/ceph/pull/43407
- 08:14 AM Bug #52807 (Resolved): ceph-erasure-code-tool: new tool to encode/decode files
- This tool has already been pushed into pre-pacific master [1] and we have it since Pacific.
There is a demand from...
10/03/2021
- 06:41 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
- We face the same issue with simultaneously four OSD down events at 4 different hosts and inability to restart them al...
- 02:29 PM Bug #50657: smart query on monitors
- Just wanted to add that we have similar situation where we have 3 dedicated mon nodes, each running in their own cont...
10/01/2021
- 02:07 PM Bug #51463: blocked requests while stopping/starting OSDs
- This is still an issue in the newest Pacific release (16.2.5) as well.
The developer documentation mentioned above ...
- 01:03 AM Cleanup #52796 (Pending Backport): rados: build minimally when "WITH_MGR" is off
- Minimize footprint of the MGR when WITH_MGR is off. Include the minimal in MON. Don't include any MGR tests.
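A minimal configure sketch for such a build, assuming a source checkout; do_cmake.sh forwards extra arguments to cmake.
<pre>
./do_cmake.sh -DWITH_MGR=OFF
cd build && ninja ceph-mon   # the mon should still build with only the minimal mgr bits
</pre>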
- 12:51 AM Bug #52781 (Fix Under Review): shard-threads cannot wakeup bug
09/30/2021
- 10:53 PM Backport #51952: pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing()....
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43099
merged
- 10:52 PM Backport #52322: pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43306
merged
- 08:01 PM Backport #52792 (Rejected): octopus: common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_fli...
- 08:01 PM Backport #52791 (Resolved): pacific: common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_fli...
- https://github.com/ceph/ceph/pull/51249
- 08:01 PM Bug #51527 (New): Ceph osd crashed due to segfault
- 02:54 PM Bug #51527: Ceph osd crashed due to segfault
- I was able to reproduce on pacific (v16.2.6). Log attached.
- 07:59 PM Bug #44715 (Pending Backport): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_li...
- 02:52 PM Bug #44715: common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back())->ops_in_...
- https://github.com/ceph/ceph/pull/34624 merged
- 09:38 AM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
- I have kept restarting the incorrectly configured osds daemons until they got the right front_addr. In some cases it ...
- 09:23 AM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
- Upgraded from v16.2.6 to v16.2.6-20210927 to apply the remoto bug fix.
After the upgrade (no reboot of the nodes b...
- 09:19 AM Support #52786 (New): processing of finisher is a block box, any means to observate it?
- 08:34 AM Bug #52781: shard-threads cannot wakeup bug
- https://github.com/ceph/ceph/pull/43360
- 08:34 AM Bug #52781 (Resolved): shard-threads cannot wakeup bug
- osd: fix shard-threads cannot wakeup bug
Reproduce:
(1) ceph cluster not running any client IO
(2) only ceph osd...
09/29/2021
- 06:48 PM Feature #52609: New PG states for pending scrubs / repairs
- While I agree the format is readable, it's a bit narrow in application.
Would it be a significant undertaking to:
...
- 07:11 AM Feature #52609: New PG states for pending scrubs / repairs
- That schedule element seems like a pretty reasonable human-readable summary.
- 05:18 PM Bug #51527: Ceph osd crashed due to segfault
- I've attached the shell script "load-bi.sh". It requires that a cluster be brought up with RGW. It requires that a bu...
- 04:00 PM Bug #52756: ceph-kvstore-tool repair segmentfault without bluestore-kv
- ...
- 03:51 PM Bug #52756: ceph-kvstore-tool repair segmentfault without bluestore-kv
- huang jun wrote:
> [...]
The backtrace like this:...
- 03:09 PM Bug #52756 (Fix Under Review): ceph-kvstore-tool repair segmentfault without bluestore-kv
- 07:26 AM Bug #52756 (Resolved): ceph-kvstore-tool repair segmentfault without bluestore-kv
- ...
- 03:00 PM Backport #52771 (Rejected): nautilus: pg scrub stat mismatch with special objects that have hash ...
- 03:00 PM Backport #52770 (Resolved): pacific: pg scrub stat mismatch with special objects that have hash '...
- https://github.com/ceph/ceph/pull/43512
- 03:00 PM Backport #52769 (Resolved): octopus: pg scrub stat mismatch with special objects that have hash '...
- 12:25 PM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
- In some cases it requires several daemon restarts until it gets to the right configuration.
I don't know if the wr...
- 10:15 AM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
- Restarting the daemons seems to get the correct configuration but it is unclear why this did not happen when they wer...
- 10:00 AM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
- Just as statistics, there are now:
- 51 cases where there is an error in the front_addr or hb_front_addr configura...
- 09:52 AM Bug #52761 (New): OSDs announcing incorrect front_addr after upgrade to 16.2.6
- Ceph cluster configured with a public and cluster network:
>> ceph config dump|grep network
global advanced cl...
- 09:19 AM Bug #52760 (Need More Info): Monitor unable to rejoin the cluster
- Our cluster has three monitors.
After a restart one of our monitors failed to join the cluster with:
Sep 24 07:52...
09/28/2021
- 11:16 PM Cleanup #52754 (New): windows warnings
- ...
- 11:12 PM Cleanup #52753 (Rejected): rbd cls : centos 8 warning
- ...
- 11:11 PM Cleanup #52752 (New): fix warnings
there are warnings in the ceph codebase that need updating with respect to modern c++
eg one of them:
<pre...
- 02:23 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Patrick Donnelly wrote:
> Neha Ojha wrote:
> > Patrick Donnelly wrote:
> > > Patrick Donnelly wrote:
> > > > Neha...
- 01:24 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Neha Ojha wrote:
> Patrick Donnelly wrote:
> > Patrick Donnelly wrote:
> > > Neha Ojha wrote:
> > > > [...]
> > ...
- 07:06 AM Backport #52586 (Resolved): pacific: src/vstart: The command "set config key osd_mclock_max_capac...
- 07:05 AM Fix #52329 (Resolved): src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" ...
- 07:04 AM Backport #52564 (Resolved): pacific: osd: Add config option to skip running the OSD benchmark on ...
- 07:03 AM Fix #52025 (Resolved): osd: Add config option to skip running the OSD benchmark on init.
- 07:03 AM Backport #51988 (Resolved): pacific: osd: Add mechanism to avoid running osd benchmark on osd ini...
- 07:01 AM Fix #51464 (Resolved): osd: Add mechanism to avoid running osd benchmark on osd init when using m...
- 07:00 AM Fix #51116 (Resolved): osd: Run osd bench test to override default max osd capacity for mclock.
- 06:59 AM Backport #51117 (Resolved): pacific: osd: Run osd bench test to override default max osd capacity...
- 05:08 AM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- > Can you share the full set of logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-post-file/)?
...
09/27/2021
- 09:15 PM Backport #52322 (In Progress): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
- 08:12 PM Backport #51555: octopus: mon: return -EINVAL when handling unknown option in 'ceph osd pool get'
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43266
merged
- 08:09 PM Backport #51967: octopus: set a non-zero default value for osd_client_message_cap
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42616
merged
- 07:47 PM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
- lei cao wrote:
> global parms like mon_max_pg_per_osd can be update at runtime?It seems that mon_max_pg_per_osd is ...
- 07:43 PM Bug #52509: PG merge: PG stuck in premerge+peered state
- Konstantin Shalygin wrote:
> We can plan and spent time to setup staging cluster for this and try to reproduce it, i...
- 07:40 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Patrick Donnelly wrote:
> Patrick Donnelly wrote:
> > Neha Ojha wrote:
> > > [...]
> > >
> > > Looks like peeri...
- 07:31 PM Backport #52747 (Resolved): pacific: MON_DOWN during mon_join process
- https://github.com/ceph/ceph/pull/48558
- 07:31 PM Backport #52746 (Rejected): octopus: MON_DOWN during mon_join process
- 07:28 PM Bug #52724: octopus: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
- Seems to be the same issue as https://tracker.ceph.com/issues/43584, marked the original ticket for backport.
- 07:27 PM Bug #43584 (Pending Backport): MON_DOWN during mon_join process
- 07:21 PM Bug #51527 (Need More Info): Ceph osd crashed due to segfault
- 07:21 PM Bug #51527: Ceph osd crashed due to segfault
- Hi Eric,
Could you please share the test that reproduces this crash on a vstart cluster?
- 07:18 PM Bug #52741: pg inconsistent state is lost after the primary osd restart
- Can you please verify this behavior?
- 09:17 AM Bug #52741: pg inconsistent state is lost after the primary osd restart
- Just a note. I see that to re-detect inconsistency just running scrub (not deep-scrub) is enough, which is supposed t...
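The re-detection described above can be exercised with the standard commands; 1.0 is a placeholder pgid.
<pre>
ceph pg scrub 1.0                                     # a plain scrub is enough to re-flag the inconsistency
rados list-inconsistent-obj 1.0 --format=json-pretty  # inspect the inconsistent objects
ceph pg repair 1.0
</pre>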
- 08:49 AM Bug #52741 (New): pg inconsistent state is lost after the primary osd restart
- Steps to reproduce:
- Create a pool (either replicated or erasure)
- Introduce an inconsistency (e.g. put an obje...
- 02:13 PM Bug #52739 (Fix Under Review): msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
- 06:42 AM Bug #52739: msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
- https://github.com/ceph/ceph/pull/43307
- 06:09 AM Bug #52739 (Resolved): msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
- ProtocalV2 sets the recv_stamp after the message is throttled and received completely.
This is wrong because it wa...
- 09:30 AM Bug #43174 (Resolved): pgs inconsistent, union_shard_errors=missing
- 09:29 AM Backport #47365 (Resolved): mimic: pgs inconsistent, union_shard_errors=missing
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37053
m...
- 07:15 AM Bug #52486 (Resolved): test tracker: please ignore
09/26/2021
- 10:30 AM Bug #52737 (Duplicate): osd/tests: stat mismatch
- Test fails with:...
- 05:00 AM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- Neha Ojha wrote:
> Can you provide a link to the failed run?
Trying to reproduce.
09/25/2021
- 11:24 AM Feature #52609: New PG states for pending scrubs / repairs
- Is the following a good enough solution?
Neha Ojha, Josh Durgin, Sam Just - what do you think?
I have drafted a...
09/24/2021
- 06:29 PM Bug #52731 (New): FAILED ceph_assert(!slot->waiting_for_split.empty())
- ...
- 03:21 PM Backport #51117: pacific: osd: Run osd bench test to override default max osd capacity for mclock.
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41731
merged
- 03:16 PM Backport #51604: octopus: bufferlist::splice() may cause stack corruption in bufferlist::rebuild_...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42975
merged
- 01:59 PM Bug #51527: Ceph osd crashed due to segfault
- J. Eric Ivancich wrote:
> I too have run into this segfault in the two latest versions of Octopus -- 15.2.14 and 15....
- 10:01 AM Bug #52724 (Duplicate): octopus: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
- ...
09/23/2021
- 05:56 PM Bug #51527: Ceph osd crashed due to segfault
- ...
- 05:50 PM Bug #51527: Ceph osd crashed due to segfault
- Attached is an example log with backtrace that I get.
- 03:13 PM Bug #52707: mixed pool types reported via telemetry
- So far there are 6 clusters reporting this:
http://telemetry.front.sepia.ceph.com:4000/d/1rDWH5H7k/replicated-pool-w...
09/22/2021
- 10:45 PM Backport #52710 (Resolved): octopus: partial recovery become whole object recovery after restart osd
- https://github.com/ceph/ceph/pull/44165
- 10:35 PM Backport #43623 (Rejected): nautilus: pg: fastinfo incorrect when last_update moves backward in time
- Nautilus is EOL.
- 10:33 PM Backport #44835 (Rejected): nautilus: librados mon_command (mgr) command hang
- Nautilus is EOL
- 10:32 PM Backport #51523 (Resolved): octopus: osd: Delay sending info to new backfill peer resetting last_...
- 06:18 PM Backport #51555 (In Progress): octopus: mon: return -EINVAL when handling unknown option in 'ceph...
- 06:02 PM Backport #51552 (In Progress): octopus: rebuild-mondb hangs
- 05:39 PM Bug #51527: Ceph osd crashed due to segfault
- I too have run into this segfault in the two latest versions of Octopus -- 15.2.14 and 15.2.13.
I can reproduce it...
- 05:24 PM Bug #52707 (New): mixed pool types reported via telemetry
- In telemetry reports there are clusters with pools of type `replicated` with erasure_code_profile defined, for exampl...
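One way to spot such pools on a live cluster is sketched below; the field names are those I'd expect in the osd dump JSON (type 1 is replicated, 3 is erasure), so treat this as an assumption.
<pre>
ceph osd dump -f json | \
  jq '.pools[] | select(.type == 1 and .erasure_code_profile != "") | {pool_name, type, erasure_code_profile}'
</pre>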
- 09:15 AM Support #52700 (New): OSDs wont start
- We have a three node ceph cluster with 3 baremetal nodes running 4 OSDs each (In total 12 OSDs). Recently, my collegu...
09/21/2021
- 11:51 PM Bug #52694 (Duplicate): src/messages/MOSDPGLog.h: virtual void MOSDPGLog::encode_payload(uint64_t...
- ...
- 08:47 PM Bug #52686 (Fix Under Review): scrub: deep-scrub command does not initiate a scrub
- 11:23 AM Bug #52686 (Fix Under Review): scrub: deep-scrub command does not initiate a scrub
- Following an old change that made operator-initiated scrubs into a type of 'scheduled scrubs':
the operator command ...
- 03:30 PM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- I think I found an issue in `ECBackend::get_hash_info` that might be responsible for introducing the inconsistency in...
- 12:13 PM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- Just for the record. In our customer case it was a mix of bluestore and filestore osds. The primary osd was the fails...
- 11:31 AM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- That's awesome, thanks. The behaviour you suggest sounds sensible to me.
Since it's been a while, I should probabl... - 10:44 AM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
- We have a customer who experienced the same issue. In our case the hash info was corrupted only on two shards. I have...
- 10:38 AM Bug #48959 (Fix Under Review): Primary OSD crash caused corrupted object and further crashes duri...
- 10:38 AM Bug #48959 (Fix Under Review): Primary OSD crash caused corrupted object and further crashes duri...
- 01:45 PM Bug #52553: pybind: rados.RadosStateError raised when closed watch object goes out of scope after...
- ...
- 11:43 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- ...
- 09:00 AM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- Thank you for your reply.
My cluster has another trouble now. So I'll take these logs after resolving this problem.
09/20/2021
- 04:29 PM Bug #38931 (Resolved): osd does not proactively remove leftover PGs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:24 PM Bug #52335 (Resolved): ceph df detail reports dirty objects without a cache tier
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:23 PM Backport #51966 (Resolved): nautilus: set a non-zero default value for osd_client_message_cap
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42617
m...
- 04:23 PM Backport #51583 (Resolved): nautilus: osd does not proactively remove leftover PGs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42240
m...
- 04:21 PM Backport #52337 (Resolved): octopus: ceph df detail reports dirty objects without a cache tier
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42862
m...
- 04:19 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Patrick Donnelly wrote:
> Neha Ojha wrote:
> > [...]
> >
> > Looks like peering induced by mapping change by the... - 07:17 AM Bug #52445: OSD asserts on starting too many pushes
- Neha Ojha wrote:
> Thanks, is it possible for you to share the logs using ceph-post-file (https://docs.ceph.com/en/p...
09/19/2021
09/18/2021
- 12:23 PM Bug #52657 (In Progress): MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, ...
- Test: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overr...
- 11:21 AM Backport #51497 (In Progress): nautilus: mgr spamming with repeated set pgp_num_actual while merging
- 11:07 AM Bug #52509: PG merge: PG stuck in premerge+peered state
- We can plan and spent time to setup staging cluster for this and try to reproduce it, if this a bug. With debug_mgr "...
- 11:04 AM Bug #52509: PG merge: PG stuck in premerge+peered state
- Neha, sorry, logs already rotated by logrotate.
When I removed all upmaps from this pool, all PG merges passed ...
- 01:08 AM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
- global parms like mon_max_pg_per_osd can be update at runtime?It seems that mon_max_pg_per_osd is not be observed by ...
- 12:38 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- Neha Ojha wrote:
> [...]
>
> Looks like peering induced by mapping change by the balancer. How often does this ha...
- 12:03 AM Bug #52489 (New): Adding a Pacific MON to an Octopus cluster: All PGs inactive
- Chris Dunlop wrote:
> Neha Ojha wrote:
> > This is expected when mons don't form quorum, here it was caused by http...
09/17/2021
- 11:29 PM Bug #52489: Adding a Pacific MON to an Octopus cluster: All PGs inactive
- Neha Ojha wrote:
> This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issu...
- 10:26 PM Bug #52489 (Duplicate): Adding a Pacific MON to an Octopus cluster: All PGs inactive
- This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issues/52488. Let's use ...
- 10:52 PM Bug #52509 (Need More Info): PG merge: PG stuck in premerge+peered state
- This is similar to https://tracker.ceph.com/issues/44684, which has already been fixed. It seems like the pgs are in ...
- 10:30 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
- Most likely caused by failure injections, might be worth looking at what was going in the osd logs when we started se...
- 10:13 PM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
- Can you share the full set of logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-post-file/)?
- 09:58 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
- Is this reproducible on octopus or pacific, which are not EOL?
- 09:55 PM Bug #52445 (New): OSD asserts on starting too many pushes
- Thanks, is it possible for you to share the logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-po...
- 09:47 PM Bug #52618 (Won't Fix - EOL): Ceph Luminous 12.2.13 OSD assert message
- Please re-open if you happen to see the same issue on a recent release.
- 09:46 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- ...
- 09:35 PM Bug #52640 (Need More Info): when osds out,reduce pool size reports a error "Error ERANGE: pool i...
- We can workaround this by temporarily increasing mon_max_pg_per_osd, right?
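The temporary workaround suggested here would look roughly like this; 400 and "mypool" are arbitrary example values.
<pre>
ceph config get mon mon_max_pg_per_osd
ceph config set global mon_max_pg_per_osd 400   # raise the limit while the pool size is reduced
ceph osd pool set mypool size 2                 # the operation that was hitting ERANGE
ceph config rm global mon_max_pg_per_osd        # restore the default afterwards
</pre>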
- 03:41 AM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
- https://github.com/ceph/ceph/pull/43201
- 03:15 AM Bug #52640 (Need More Info): when osds out,reduce pool size reports a error "Error ERANGE: pool i...
- At first, my cluster has 6 osds and 3 pools whose pg_num are all 256 and size is 2, and mon_max_pg_per_osd is 300. So, we need ...
- 09:30 PM Bug #52562: Thrashosds read error injection failed with error ENXIO
- Deepika Upadhyay wrote:
> [...]
> /ceph/teuthology-archive/yuriw-2021-09-13_19:12:32-rados-wip-yuri6-testing-2021-0...
- 12:32 PM Bug #52562: Thrashosds read error injection failed with error ENXIO
- ...
- 09:24 PM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
- The log attached has a sha1 ca906d0d7a65c8a598d397b764dd262cce645fe3, is this the first time you encountered this iss...
- 05:56 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- /a/sage-2021-09-16_18:04:19-rados-wip-sage-testing-2021-09-16-1020-distro-basic-smithi/6393058
note that this is m...
- 04:27 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
- /a/yuriw-2021-09-16_18:23:18-rados-wip-yuri2-testing-2021-09-16-0923-distro-basic-smithi/6393474
- 03:46 PM Documentation #35968 (Won't Fix): [doc][jewel] sync documentation "OSD Config Reference" default ...
- 09:41 AM Bug #50351 (Resolved): osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd restar...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:35 AM Backport #51605 (Resolved): pacific: bufferlist::splice() may cause stack corruption in bufferlis...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42976
m...
- 08:35 AM Backport #52644 (In Progress): nautilus: pool last_epoch_clean floor is stuck after pg merging
- 08:34 AM Backport #52644 (Rejected): nautilus: pool last_epoch_clean floor is stuck after pg merging
- https://github.com/ceph/ceph/pull/43204
- 06:09 AM Bug #48508 (Resolved): Donot roundoff the bucket weight while decompiling crush map to source
09/16/2021
- 10:14 PM Backport #51966: nautilus: set a non-zero default value for osd_client_message_cap
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42617
merged
- 10:13 PM Backport #51583: nautilus: osd does not proactively remove leftover PGs
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42240
merged
- 07:01 PM Bug #49697 (Resolved): prime pg temp: unexpected optimization
- fan chen wrote:
> Recently, I find patch "https://github.com/ceph/ceph/commit/023524a26d7e12e7ddfc3537582b1a1cb03af6...
- 11:16 AM Bug #52408 (Can't reproduce): osds not peering correctly after startup
- I rebuilt my cluster yesterday using a container image based on commit d906f946e845, and I'm not able to reproduce th...
- 02:08 AM Bug #52624 (New): qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABIL...
- /ceph/teuthology-archive/pdonnell-2021-09-14_01:17:08-fs-wip-pdonnell-testing-20210910.181451-distro-basic-smithi/638...
09/15/2021
- 09:23 PM Bug #52605 (Fix Under Review): osd: add scrub duration to pg dump
- 06:39 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
- I reran the test 10 times in https://pulpito.ceph.com/nojha-2021-09-14_18:44:41-rados:singleton-pacific-distro-basic-...
- 06:05 PM Bug #52621 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: bad ...
- ...
- 03:20 PM Backport #52620 (Resolved): pacific: partial recovery become whole object recovery after restart osd
- https://github.com/ceph/ceph/pull/43513
- 03:18 PM Bug #52583 (Pending Backport): partial recovery become whole object recovery after restart osd
- 02:50 PM Backport #52337: octopus: ceph df detail reports dirty objects without a cache tier
- Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42862
merged
- 02:43 PM Bug #50393: CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client...
- https://github.com/ceph/ceph/pull/42498 merged
- 02:18 PM Bug #52618 (Won't Fix - EOL): Ceph Luminous 12.2.13 OSD assert message
- 2021-09-02 14:25:37.173453 7f2235baf700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/...
09/14/2021
- 05:53 PM Feature #52609 (Fix Under Review): New PG states for pending scrubs / repairs
- Request to add new PG states to provide feedback to the admin when a PG scrub/repair is scheduled ( via command line,...
- 01:14 PM Bug #52605 (Resolved): osd: add scrub duration to pg dump
- We would like to add a new column to the pg dump which would give us the time it took for a pg to get scrubbed.
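For context, the pg dump already carries the last scrub and deep-scrub stamps; the proposed column would add how long the scrub itself took. A quick way to look at what is exposed today (1.0 is a placeholder pgid):
<pre>
ceph pg dump pgs | less -S        # includes the last (deep-)scrub stamps per PG
ceph pg 1.0 query | grep scrub    # per-PG scrub-related fields
</pre>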
09/13/2021
- 10:48 PM Bug #52562 (Triaged): Thrashosds read error injection failed with error ENXIO
- Looking at the osd and mon logs
Here's when the osd was restarted in revive_osd()...
- 09:05 PM Backport #52596 (Rejected): octopus: make bufferlist::c_str() skip rebuild when it isn't necessary
- 09:05 PM Backport #52595 (New): pacific: make bufferlist::c_str() skip rebuild when it isn't necessary
- 09:03 PM Feature #51725 (Pending Backport): make bufferlist::c_str() skip rebuild when it isn't necessary
- This spares a heap allocation on every received message. Marking for backporting to both octopus and pacific as a lo...
- 02:21 PM Bug #45871: Incorrect (0) number of slow requests in health check
- On a...
- 10:35 AM Backport #52586 (Resolved): pacific: src/vstart: The command "set config key osd_mclock_max_capac...
- https://github.com/ceph/ceph/pull/41731
- 10:24 AM Bug #52583: partial recovery become whole object recovery after restart osd
- FIX URL:
https://github.com/ceph/ceph/pull/43146
https://github.com/ceph/ceph/pull/42904
- 09:43 AM Bug #52583 (Resolved): partial recovery become whole object recovery after restart osd
- Problem: After the osd that is undergoing partial recovery is restarted, the data recovery is rolled back from the pa...
- 05:32 AM Bug #52578 (Fix Under Review): CLI - osd pool rm --help message is wrong or misleading
- CLI - osd pool rm --help message is wrong or misleading
Version-Release number of selected component (if applicabl...
- 04:19 AM Bug #52445: OSD asserts on starting too many pushes
- Neha Ojha wrote:
> Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of cep...