Activity

From 09/12/2021 to 10/11/2021

10/11/2021

10:28 PM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
Satoru Takeuchi wrote:
> > Can you share the full set of logs using ceph-post-file (https://docs.ceph.com/en/pacific...
Neha Ojha
10:25 PM Backport #52893 (Rejected): octopus: ceph-kvstore-tool repair segmentfault without bluestore-kv
Backport Bot
10:25 PM Backport #52892 (Resolved): pacific: ceph-kvstore-tool repair segmentfault without bluestore-kv
https://github.com/ceph/ceph/pull/51254 Backport Bot
10:24 PM Bug #52756 (Pending Backport): ceph-kvstore-tool repair segmentfault without bluestore-kv
based on https://tracker.ceph.com/issues/52756#note-2, looks like the fix needs to be backported all the way. Neha Ojha
10:20 PM Bug #52513 (Need More Info): BlueStore.cc: 12391: ceph_abort_msg("unexpected error") on operati...
Can you capture a coredump or osd logs with debug_osd=20,debug_bluestore=20,debug_ms=1 if this crash is reproducible? Neha Ojha
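For reference, a minimal sketch of raising those debug levels on a running OSD (osd.12 is a placeholder id; the settings can be reverted afterwards):
  ceph config set osd.12 debug_osd 20          # persists until removed with "ceph config rm"
  ceph config set osd.12 debug_bluestore 20
  ceph config set osd.12 debug_ms 1
  ceph tell osd.12 config set debug_osd 20     # non-persistent alternative for a live daemon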
10:13 PM Bug #52126 (Pending Backport): stretch mode: allow users to change the tiebreaker monitor
Neha Ojha
10:13 PM Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
Myoungwon Oh : I am assigning it to you, in case you have any thoughts on this issue. Feel free to un-assign if you d... Neha Ojha
10:08 PM Bug #52884 (Fix Under Review): osd: optimize pg peering latency when add new osd that need backfill
Neha Ojha
05:29 AM Bug #52884: osd: optimize pg peering latency when add new osd that need backfill
https://github.com/ceph/ceph/pull/43482 jianwei zhang
05:28 AM Bug #52884 (Fix Under Review): osd: optimize pg peering latency when add new osd that need backfill
Reproduce:
(1) ceph cluster not running any client IO
(2) only ceph osd in osd.14 operation ( add new osd to cluste...
jianwei zhang
10:07 PM Bug #52886: osd: backfill reservation does not take compression into account
I'll create a trello card to track this. I think the initial toofull implementation was intentionally kept simple, bu... Neha Ojha
08:42 AM Bug #52886 (New): osd: backfill reservation does not take compression into account
The problem may be observed with the recently added backfill-toofull test when it runs with bluestore-comp-lz4. When ... Mykola Golub
10:02 PM Bug #51463: blocked requests while stopping/starting OSDs
Is it possible for you to share your test reproducer with us? It would be great if we could run it against a vstart c... Neha Ojha
05:19 PM Bug #52889 (Triaged): upgrade tests fails because of missing ceph-volume package
We need something like 4e525127fbb710c1ac074cf61b448055781a69e3, for octopus-x as well. Neha Ojha
04:43 PM Bug #52889 (Triaged): upgrade tests fails because of missing ceph-volume package
Recently we separated ceph-volume from ceph-base into a separate package; I have not looked deeply into it right now, but I s... Deepika Upadhyay

10/10/2021

07:03 PM Documentation #22843 (Won't Fix): [doc][luminous] the configuration guide still contains osd_op_t...
Anthony D'Atri
07:02 PM Documentation #7386 (Won't Fix): librados: document rados_osd_op_timeout and rados_mon_op_timeout...
Seven years old, marked @Advanced@, making a judgement call and closing this. If you disagree, let me know and I'll ... Anthony D'Atri

10/09/2021

10:27 PM Support #52881 (New): Filtered out host node3.foo.com: does not belong to mon public_network ()
I am running a Ceph Pacific cluster (version 16.2.6) consisting of 3 nodes with public Internet addresses. I also ha... Ralph Soika
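For reference, a hedged sketch of checking and setting the public_network that cephadm validates hosts against (the CIDR below is a placeholder; a comma-separated list is also accepted):
  ceph config get mon public_network
  ceph config set mon public_network 203.0.113.0/24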
07:43 PM Bug #52867: pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:fd00:3...
I set the following after bootstrap and before adding any OSDs but I got the same error.
`ceph config set mon ms_b...
John Fulton
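The quoted command above is truncated and left as-is. Purely as an assumption about what is usually relevant for an IPv6-only deployment, the bind options involved look like this sketch:
  ceph config set global ms_bind_ipv6 true     # assumption: IPv6-only cluster
  ceph config set global ms_bind_ipv4 false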
03:12 PM Bug #52878 (Fix Under Review): qa/tasks: python3 'dict' object has no attribute 'iterkeys' error
Kefu Chai
08:07 AM Bug #52878 (Resolved): qa/tasks: python3 'dict' object has no attribute 'iterkeys' error
os: CentOS8.4
ceph version: ceph16.2.4
Teuthology error log:
2021-08-27T09:18:09.787 DEBUG:teuthology.orchestra....
Zhiwei Dai
12:05 AM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
... Deepika Upadhyay

10/08/2021

10:22 PM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
merged https://github.com/ceph/ceph/pull/43324 Yuri Weinstein
09:06 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
All evidence points to a firmware bug persisting in dozens of used Hitachi 12TB (HGST model HUH721212AL5200) HDDs ... Dmitry Smirnov
12:18 PM Bug #52872 (Pending Backport): LibRadosTwoPoolsPP.ManifestSnapRefcount Failure.
/a/yuriw-2021-10-04_21:49:48-rados-wip-yuri4-testing-2021-10-04-1236-distro-basic-smithi/6421867... Sridhar Seshasayee

10/07/2021

11:00 PM Backport #52868 (In Progress): stretch mode: allow users to change the tiebreaker monitor
Greg Farnum
10:52 PM Backport #52868 (Resolved): stretch mode: allow users to change the tiebreaker monitor
https://github.com/ceph/ceph/pull/43457 Greg Farnum
08:59 PM Bug #52867 (New): pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:...
When using IPv6 for my public and cluster network my mon is able to bootstrap (because I have [1]) but I end up with ... John Fulton
02:25 PM Backport #52809: octopus: ceph-erasure-code-tool: new tool to encode/decode files
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43407
merged
Yuri Weinstein
12:40 AM Backport #52845 (Rejected): pacific: osd: add scrub duration to pg dump
https://github.com/ceph/ceph/pull/43704 Backport Bot
12:35 AM Bug #52605 (Pending Backport): osd: add scrub duration to pg dump
Neha Ojha

10/06/2021

10:45 PM Backport #52843 (Resolved): pacific: msg/async/ProtocalV2: recv_stamp of a message is set to a wr...
https://github.com/ceph/ceph/pull/43511 Backport Bot
10:45 PM Backport #52842 (Rejected): octopus: msg/async/ProtocalV2: recv_stamp of a message is set to a wr...
Backport Bot
10:45 PM Backport #52841 (Resolved): pacific: shard-threads cannot wakeup bug
https://github.com/ceph/ceph/pull/51262 Backport Bot
10:45 PM Backport #52840 (Rejected): octopus: shard-threads cannot wakeup bug
Backport Bot
10:45 PM Backport #52839 (Resolved): pacific: rados: build minimally when "WITH_MGR" is off
https://github.com/ceph/ceph/pull/51250 Backport Bot
10:45 PM Backport #52838 (Rejected): octopus: rados: build minimally when "WITH_MGR" is off
Backport Bot
10:43 PM Bug #52796 (Pending Backport): rados: build minimally when "WITH_MGR" is off
Kefu Chai
10:42 PM Bug #52781 (Pending Backport): shard-threads cannot wakeup bug
Kefu Chai
10:40 PM Bug #52739 (Pending Backport): msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
Kefu Chai
06:24 PM Bug #48965: qa/standalone/osd/osd-force-create-pg.sh: TEST_reuse_id: return 1
https://pulpito.ceph.com/kchai-2021-10-05_16:14:10-rados-wip-kefu-testing-2021-10-05-2221-distro-basic-smithi/6423567 Greg Farnum
05:56 PM Backport #52832 (In Progress): nautilus: osd: pg may get stuck in backfill_toofull after backfill...
Mykola Golub
04:35 PM Backport #52832 (Rejected): nautilus: osd: pg may get stuck in backfill_toofull after backfill is...
https://github.com/ceph/ceph/pull/43439 Backport Bot
05:40 PM Backport #52833 (In Progress): octopus: osd: pg may get stuck in backfill_toofull after backfill ...
Mykola Golub
04:35 PM Backport #52833 (Resolved): octopus: osd: pg may get stuck in backfill_toofull after backfill is ...
https://github.com/ceph/ceph/pull/43438 Backport Bot
05:39 PM Backport #52831 (In Progress): pacific: osd: pg may get stuck in backfill_toofull after backfill ...
Mykola Golub
04:35 PM Backport #52831 (Resolved): pacific: osd: pg may get stuck in backfill_toofull after backfill is ...
https://github.com/ceph/ceph/pull/43437 Backport Bot
04:31 PM Bug #52448 (Pending Backport): osd: pg may get stuck in backfill_toofull after backfill is interr...
Mykola Golub

10/05/2021

02:56 PM Backport #51552: octopus: rebuild-mondb hangs
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/43263
merged
Yuri Weinstein
02:46 PM Backport #51569: octopus: pool last_epoch_clean floor is stuck after pg merging
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42837
merged
Yuri Weinstein
12:53 PM Bug #52815 (Fix Under Review): exact_timespan_str()
Ronen Friedman
12:25 PM Bug #52815 (Resolved): exact_timespan_str()
exact_timespan_str() in ceph_time.cc handles some specific time-spans incorrectly:
150.567 seconds, for example, w...
Ronen Friedman
04:06 AM Bug #43580 (Resolved): pg: fastinfo incorrect when last_update moves backward in time
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:06 AM Bug #44798 (Resolved): librados mon_command (mgr) command hang
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:05 AM Bug #48611 (Resolved): osd: Delay sending info to new backfill peer resetting last_backfill until...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:04 AM Bug #49894 (Resolved): set a non-zero default value for osd_client_message_cap
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:03 AM Bug #51000 (Resolved): LibRadosTwoPoolsPP.ManifestSnapRefcount failure
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:02 AM Bug #51419 (Resolved): bufferlist::splice() may cause stack corruption in bufferlist::rebuild_ali...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:02 AM Bug #51627 (Resolved): FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing().is_...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:00 AM Backport #51952 (Resolved): pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43099
m...
Loïc Dachary
04:00 AM Backport #52322 (Resolved): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43306
m...
Loïc Dachary
03:59 AM Backport #51117: pacific: osd: Run osd bench test to override default max osd capacity for mclock.
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41731
m...
Loïc Dachary
03:55 AM Backport #51555 (Resolved): octopus: mon: return -EINVAL when handling unknown option in 'ceph os...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/43266
m...
Loïc Dachary
03:55 AM Backport #51967 (Resolved): octopus: set a non-zero default value for osd_client_message_cap
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42616
m...
Loïc Dachary
03:52 AM Backport #51604 (Resolved): octopus: bufferlist::splice() may cause stack corruption in bufferlis...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42975
m...
Loïc Dachary
12:03 AM Bug #52126 (Fix Under Review): stretch mode: allow users to change the tiebreaker monitor
Greg Farnum

10/04/2021

09:33 PM Feature #52609 (In Progress): New PG states for pending scrubs / repairs
Neha Ojha
03:50 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
Igor Fedotov wrote:
> Dmitry Smirnov wrote:
>
> Hey Dmitry,
> I'm not sure why you're saying this is the same co...
Igor Fedotov
03:47 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
Dmitry Smirnov wrote:
> We face the same issue with simultaneously four OSD down events at 4 different hosts and ina...
Igor Fedotov
08:46 AM Bug #50683 (Rejected): [RBD] master - cluster [WRN] Health check failed: mon is allowing insecure...
Ilya Dryomov
08:34 AM Backport #52808 (In Progress): nautilus: ceph-erasure-code-tool: new tool to encode/decode files
Mykola Golub
08:15 AM Backport #52808 (Rejected): nautilus: ceph-erasure-code-tool: new tool to encode/decode files
https://github.com/ceph/ceph/pull/43408 Backport Bot
08:17 AM Backport #52809 (In Progress): octopus: ceph-erasure-code-tool: new tool to encode/decode files
Mykola Golub
08:15 AM Backport #52809 (Resolved): octopus: ceph-erasure-code-tool: new tool to encode/decode files
https://github.com/ceph/ceph/pull/43407 Backport Bot
08:14 AM Bug #52807 (Resolved): ceph-erasure-code-tool: new tool to encode/decode files
This tool has already been merged into pre-Pacific master [1] and has been available since Pacific.
There is a demand from...
Mykola Golub

10/03/2021

06:41 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
We face the same issue with simultaneously four OSD down events at 4 different hosts and inability to restart them al... Dmitry Smirnov
02:29 PM Bug #50657: smart query on monitors
Just wanted to add that we have similar situation where we have 3 dedicated mon nodes, each running in their own cont... Matthew Darwin

10/01/2021

02:07 PM Bug #51463: blocked requests while stopping/starting OSDs
This is still an issue in the newest Pacific release (16.2.5) as well.
The developer documentation mentioned above ...
Manuel Lausch
01:03 AM Bug #52796 (Resolved): rados: build minimally when "WITH_MGR" is off
Minimize the footprint of the MGR when WITH_MGR is off. Include only the minimal parts needed by the MON. Don't include any MGR tests.
J. Eric Ivancich
12:51 AM Bug #52781 (Fix Under Review): shard-threads cannot wakeup bug
Kefu Chai

09/30/2021

10:53 PM Backport #51952: pacific: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_missing()....
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43099
merged
Yuri Weinstein
10:52 PM Backport #52322: pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43306
merged
Yuri Weinstein
08:01 PM Backport #52792 (Rejected): octopus: common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_fli...
Backport Bot
08:01 PM Backport #52791 (Resolved): pacific: common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_fli...
https://github.com/ceph/ceph/pull/51249 Backport Bot
08:01 PM Bug #51527 (New): Ceph osd crashed due to segfault
Neha Ojha
02:54 PM Bug #51527: Ceph osd crashed due to segfault
I was able to reproduce on pacific (v16.2.6). Log attached. J. Eric Ivancich
07:59 PM Bug #44715 (Pending Backport): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_li...
Neha Ojha
02:52 PM Bug #44715: common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back())->ops_in_...
https://github.com/ceph/ceph/pull/34624 merged Yuri Weinstein
09:38 AM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
I have kept restarting the incorrectly configured osd daemons until they got the right front_addr. In some cases it ... Javier Cacheiro
09:23 AM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
Upgraded from v16.2.6 to v16.2.6-20210927 to apply the remoto bug fix.
After the upgrade (no reboot of the nodes b...
Javier Cacheiro
09:19 AM Support #52786 (New): processing of finisher is a black box, any means to observe it?
Jack Lv
08:34 AM Bug #52781: shard-threads cannot wakeup bug
https://github.com/ceph/ceph/pull/43360 jianwei zhang
08:34 AM Bug #52781 (Resolved): shard-threads cannot wakeup bug
osd: fix shard-threads cannot wakeup bug
Reproduce:
(1) ceph cluster not running any client IO
(2) only ceph osd...
jianwei zhang

09/29/2021

06:48 PM Feature #52609: New PG states for pending scrubs / repairs
While I agree the format is readable, it's a bit narrow in application.
Would it be a significant undertaking to:
...
Michael Kidd
07:11 AM Feature #52609: New PG states for pending scrubs / repairs
That schedule element seems like a pretty reasonable human-readable summary. Samuel Just
05:18 PM Bug #51527: Ceph osd crashed due to segfault
I've attached the shell script "load-bi.sh". It requires that a cluster be brought up with RGW. It requires that a bu... J. Eric Ivancich
04:00 PM Bug #52756: ceph-kvstore-tool repair segmentfault without bluestore-kv
... huang jun
03:51 PM Bug #52756: ceph-kvstore-tool repair segmentfault without bluestore-kv
huang jun wrote:
> [...]
The backtrace like this:...
huang jun
03:09 PM Bug #52756 (Fix Under Review): ceph-kvstore-tool repair segmentfault without bluestore-kv
Kefu Chai
07:26 AM Bug #52756 (Resolved): ceph-kvstore-tool repair segmentfault without bluestore-kv
... huang jun
03:00 PM Backport #52771 (Rejected): nautilus: pg scrub stat mismatch with special objects that have hash ...
Backport Bot
03:00 PM Backport #52770 (Resolved): pacific: pg scrub stat mismatch with special objects that have hash '...
https://github.com/ceph/ceph/pull/43512 Backport Bot
03:00 PM Backport #52769 (Resolved): octopus: pg scrub stat mismatch with special objects that have hash '...
Backport Bot
12:25 PM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
In some cases it requires several daemon restarts until it gets to the right configuration.
I don't know if the wr...
Javier Cacheiro
10:15 AM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
Restarting the daemons seems to get the correct configuration but it is unclear why this did not happen when they wer... Javier Cacheiro
10:00 AM Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6
Just as statistics, there are now:
- 51 cases where there is an error in the front_addr or hb_front_addr configura...
Javier Cacheiro
09:52 AM Bug #52761 (New): OSDs announcing incorrect front_addr after upgrade to 16.2.6
Ceph cluster configured with a public and cluster network:
>> ceph config dump|grep network
global advanced cl...
Javier Cacheiro
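For reference, a hedged sketch of inspecting the addresses an OSD is currently announcing (the osd id 0 is a placeholder):
  ceph osd metadata 0 | grep -E 'front_addr|back_addr'
  ceph osd dump | grep '^osd\.0 '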
09:19 AM Bug #52760 (Need More Info): Monitor unable to rejoin the cluster
Our cluster has three monitors.
After a restart one of our monitors failed to join the cluster with:
Sep 24 07:52...
Ruben Kerkhof

09/28/2021

11:16 PM Cleanup #52754 (New): windows warnings
... Deepika Upadhyay
11:12 PM Cleanup #52753 (Rejected): rbd cls : centos 8 warning
... Deepika Upadhyay
11:11 PM Cleanup #52752 (New): fix warnings
there are warnings in the ceph codebase that need updating with respect to modern C++
e.g. one of them:
<pre...
Deepika Upadhyay
02:23 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Patrick Donnelly wrote:
> Neha Ojha wrote:
> > Patrick Donnelly wrote:
> > > Patrick Donnelly wrote:
> > > > Neha...
Patrick Donnelly
01:24 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Neha Ojha wrote:
> Patrick Donnelly wrote:
> > Patrick Donnelly wrote:
> > > Neha Ojha wrote:
> > > > [...]
> > ...
Patrick Donnelly
07:06 AM Backport #52586 (Resolved): pacific: src/vstart: The command "set config key osd_mclock_max_capac...
Sridhar Seshasayee
07:05 AM Fix #52329 (Resolved): src/vstart: The command "set config key osd_mclock_max_capacity_iops_ssd" ...
Sridhar Seshasayee
07:04 AM Backport #52564 (Resolved): pacific: osd: Add config option to skip running the OSD benchmark on ...
Sridhar Seshasayee
07:03 AM Fix #52025 (Resolved): osd: Add config option to skip running the OSD benchmark on init.
Sridhar Seshasayee
07:03 AM Backport #51988 (Resolved): pacific: osd: Add mechanism to avoid running osd benchmark on osd ini...
Sridhar Seshasayee
07:01 AM Fix #51464 (Resolved): osd: Add mechanism to avoid running osd benchmark on osd init when using m...
Sridhar Seshasayee
07:00 AM Fix #51116 (Resolved): osd: Run osd bench test to override default max osd capacity for mclock.
Sridhar Seshasayee
06:59 AM Backport #51117 (Resolved): pacific: osd: Run osd bench test to override default max osd capacity...
Sridhar Seshasayee
05:08 AM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
> Can you share the full set of logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-post-file/)?
...
Satoru Takeuchi

09/27/2021

09:15 PM Backport #52322 (In Progress): pacific: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
Neha Ojha
08:12 PM Backport #51555: octopus: mon: return -EINVAL when handling unknown option in 'ceph osd pool get'
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/43266
merged
Yuri Weinstein
08:09 PM Backport #51967: octopus: set a non-zero default value for osd_client_message_cap
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42616
merged
Yuri Weinstein
07:47 PM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
lei cao wrote:
> global params like mon_max_pg_per_osd can be updated at runtime? It seems that mon_max_pg_per_osd is not observed by ...
Neha Ojha
07:43 PM Bug #52509: PG merge: PG stuck in premerge+peered state
Konstantin Shalygin wrote:
> We can plan and spent time to setup staging cluster for this and try to reproduce it, i...
Neha Ojha
07:40 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Patrick Donnelly wrote:
> Patrick Donnelly wrote:
> > Neha Ojha wrote:
> > > [...]
> > >
> > > Looks like peeri...
Neha Ojha
07:31 PM Backport #52747 (Resolved): pacific: MON_DOWN during mon_join process
https://github.com/ceph/ceph/pull/48558 Backport Bot
07:31 PM Backport #52746 (Rejected): octopus: MON_DOWN during mon_join process
Backport Bot
07:28 PM Bug #52724: octopus: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
Seems to be the same issue as https://tracker.ceph.com/issues/43584, marked the original ticket for backport. Neha Ojha
07:27 PM Bug #43584 (Pending Backport): MON_DOWN during mon_join process
Neha Ojha
07:21 PM Bug #51527 (Need More Info): Ceph osd crashed due to segfault
Neha Ojha
07:21 PM Bug #51527: Ceph osd crashed due to segfault
Hi Eric,
Could you please share the test that reproduces this crash on a vstart cluster?
Neha Ojha
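For anyone trying this, a minimal vstart sketch from a built source tree (daemon counts and paths are illustrative):
  cd build
  MON=1 MGR=1 OSD=3 MDS=0 ../src/vstart.sh -n -d   # -n: new cluster, -d: debug logging
  ./bin/ceph -s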
07:18 PM Bug #52741: pg inconsistent state is lost after the primary osd restart
Can you please verify this behavior? Neha Ojha
09:17 AM Bug #52741: pg inconsistent state is lost after the primary osd restart
Just a note. I see that to re-detect inconsistency just running scrub (not deep-scrub) is enough, which is supposed t... Mykola Golub
08:49 AM Bug #52741 (New): pg inconsistent state is lost after the primary osd restart
Steps to reproduce:
- Create a pool (either replicated or erasure)
- Introduce an inconsistency (e.g. put an obje...
Mykola Golub
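For reference, a hedged sketch of re-checking the state after the primary restart, following the note above that a regular scrub is enough to re-detect (the pg id 1.0 is a placeholder):
  ceph pg ls inconsistent              # PGs currently flagged inconsistent
  ceph pg scrub 1.0                    # regular (non-deep) scrub to re-detect
  rados list-inconsistent-obj 1.0      # objects/shards reported inconsistent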
02:13 PM Bug #52739 (Fix Under Review): msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
Kefu Chai
06:42 AM Bug #52739: msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
https://github.com/ceph/ceph/pull/43307 dongdong tao
06:09 AM Bug #52739 (Resolved): msg/async/ProtocalV2: recv_stamp of a message is set to a wrong value
ProtocalV2 sets the recv_stamp after the message is throttled and received completely.
This is wrong because it wa...
dongdong tao
09:30 AM Bug #43174 (Resolved): pgs inconsistent, union_shard_errors=missing
Nathan Cutler
09:29 AM Backport #47365 (Resolved): mimic: pgs inconsistent, union_shard_errors=missing
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37053
m...
Nathan Cutler
07:15 AM Bug #52486 (Resolved): test tracker: please ignore
Deepika Upadhyay

09/26/2021

10:30 AM Bug #52737 (Duplicate): osd/tests: stat mismatch
Test fails with:... Ronen Friedman
05:00 AM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
Neha Ojha wrote:
> Can you provide a link to the failed run?
Trying to reproduce.
Ronen Friedman

09/25/2021

11:24 AM Feature #52609: New PG states for pending scrubs / repairs
Is the following a good enough solution?
Neha Ojha, Josh Durgin, Sam Just - what do you think?
I have drafted a...
Ronen Friedman

09/24/2021

06:29 PM Bug #52731 (New): FAILED ceph_assert(!slot->waiting_for_split.empty())
... Aishwarya Mathuria
03:21 PM Backport #51117: pacific: osd: Run osd bench test to override default max osd capacity for mclock.
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41731
merged
Yuri Weinstein
03:16 PM Backport #51604: octopus: bufferlist::splice() may cause stack corruption in bufferlist::rebuild_...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42975
merged
Yuri Weinstein
01:59 PM Bug #51527: Ceph osd crashed due to segfault
J. Eric Ivancich wrote:
> I too have run into this segfault in the two latest versions of Octopus -- 15.2.14 and 15....
Evgeny Zakharov
10:01 AM Bug #52724 (Duplicate): octopus: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
... Deepika Upadhyay

09/23/2021

05:56 PM Bug #51527: Ceph osd crashed due to segfault
... Neha Ojha
05:50 PM Bug #51527: Ceph osd crashed due to segfault
Attached is an example log with backtrace that I get. J. Eric Ivancich
03:13 PM Bug #52707: mixed pool types reported via telemetry
So far there are 6 clusters reporting this:
http://telemetry.front.sepia.ceph.com:4000/d/1rDWH5H7k/replicated-pool-w...
Yaarit Hatuka

09/22/2021

10:45 PM Backport #52710 (Resolved): octopus: partial recovery become whole object recovery after restart osd
https://github.com/ceph/ceph/pull/44165 Backport Bot
10:35 PM Backport #43623 (Rejected): nautilus: pg: fastinfo incorrect when last_update moves backward in time
Nautilus is EOL. Neha Ojha
10:33 PM Backport #44835 (Rejected): nautilus: librados mon_command (mgr) command hang
Nautilus is EOL Neha Ojha
10:32 PM Backport #51523 (Resolved): octopus: osd: Delay sending info to new backfill peer resetting last_...
Neha Ojha
06:18 PM Backport #51555 (In Progress): octopus: mon: return -EINVAL when handling unknown option in 'ceph...
Cory Snyder
06:02 PM Backport #51552 (In Progress): octopus: rebuild-mondb hangs
Cory Snyder
05:39 PM Bug #51527: Ceph osd crashed due to segfault
I too have run into this segfault in the two latest versions of Octopus -- 15.2.14 and 15.2.13.
I can reproduce it...
J. Eric Ivancich
05:24 PM Bug #52707 (New): mixed pool types reported via telemetry
In telemetry reports there are clusters with pools of type `replicated` with erasure_code_profile defined, for exampl... Yaarit Hatuka
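For reference, a hedged sketch of checking this on a live cluster; the JSON key names in the grep are assumptions:
  ceph osd pool ls detail
  ceph osd pool ls detail -f json-pretty | grep -E '"pool_name"|"type"|"erasure_code_profile"'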
09:15 AM Support #52700 (New): OSDs won't start
We have a three node ceph cluster with 3 baremetal nodes running 4 OSDs each (In total 12 OSDs). Recently, my collegu... Nishith Tiwari

09/21/2021

11:51 PM Bug #52694 (Duplicate): src/messages/MOSDPGLog.h: virtual void MOSDPGLog::encode_payload(uint64_t...
... Neha Ojha
08:47 PM Bug #52686 (Fix Under Review): scrub: deep-scrub command does not initiate a scrub
Neha Ojha
11:23 AM Bug #52686 (Fix Under Review): scrub: deep-scrub command does not initiate a scrub
Following an old change that made operator-initiated scrubs into a type of 'scheduled scrubs':
the operator command ...
Ronen Friedman
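For context, the operator commands in question, as a sketch (target ids are placeholders):
  ceph pg deep-scrub 1.0     # request a deep scrub of one PG
  ceph osd deep-scrub 3      # request deep scrubs of the PGs for which osd.3 is primary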
03:30 PM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
I think I found an issue in `ECBackend::get_hash_info` that might be responsible for introducing the inconsistency in... Mykola Golub
12:13 PM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
Just for the record. In our customer case it was a mix of bluestore and filestore osds. The primary osd was the fails... Mykola Golub
11:31 AM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
That's awesome, thanks. The behaviour you suggest sounds sensible to me.
Since it's been a while, I should probabl...
Tom Byrne
10:44 AM Bug #48959: Primary OSD crash caused corrupted object and further crashes during backfill after s...
We have a customer who experienced the same issue. In our case the hash info was corrupted only on two shards. I have... Mykola Golub
10:38 AM Bug #48959 (Fix Under Review): Primary OSD crash caused corrupted object and further crashes duri...
Mykola Golub
01:45 PM Bug #52553: pybind: rados.RadosStateError raised when closed watch object goes out of scope after...
... Deepika Upadhyay
11:43 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
... Deepika Upadhyay
09:00 AM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
Thank you for your reply.
My cluster has another problem right now, so I'll collect these logs after resolving it.
Satoru Takeuchi

09/20/2021

04:29 PM Bug #38931 (Resolved): osd does not proactively remove leftover PGs
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:24 PM Bug #52335 (Resolved): ceph df detail reports dirty objects without a cache tier
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:23 PM Backport #51966 (Resolved): nautilus: set a non-zero default value for osd_client_message_cap
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42617
m...
Loïc Dachary
04:23 PM Backport #51583 (Resolved): nautilus: osd does not proactively remove leftover PGs
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42240
m...
Loïc Dachary
04:21 PM Backport #52337 (Resolved): octopus: ceph df detail reports dirty objects without a cache tier
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42862
m...
Loïc Dachary
04:19 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Patrick Donnelly wrote:
> Neha Ojha wrote:
> > [...]
> >
> > Looks like peering induced by mapping change by the...
Patrick Donnelly
07:17 AM Bug #52445: OSD asserts on starting too many pushes
Neha Ojha wrote:
> Thanks, is it possible for you to share the logs using ceph-post-file (https://docs.ceph.com/en/p...
Amudhan Pandia n

09/19/2021

01:58 PM Bug #52415 (Resolved): rocksdb: build error with rocksdb-6.22.x
Kefu Chai

09/18/2021

12:23 PM Bug #52657 (Fix Under Review): MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(featu...

Test: rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overr...
Ronen Friedman
11:21 AM Backport #51497 (In Progress): nautilus: mgr spamming with repeated set pgp_num_actual while merging
Konstantin Shalygin
11:07 AM Bug #52509: PG merge: PG stuck in premerge+peered state
We can plan and spend time to set up a staging cluster for this and try to reproduce it, if this is a bug. With debug_mgr "... Konstantin Shalygin
11:04 AM Bug #52509: PG merge: PG stuck in premerge+peered state
Neha, sorry, the logs have already been rotated by logrotate.
When I removed all upmaps from this pool, all PG merges passed ...
Konstantin Shalygin
01:08 AM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
global params like mon_max_pg_per_osd can be updated at runtime? It seems that mon_max_pg_per_osd is not observed by ... lei cao
12:38 AM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
Neha Ojha wrote:
> [...]
>
> Looks like peering induced by mapping change by the balancer. How often does this ha...
Patrick Donnelly
12:03 AM Bug #52489 (New): Adding a Pacific MON to an Octopus cluster: All PGs inactive
Chris Dunlop wrote:
> Neha Ojha wrote:
> > This is expected when mons don't form quorum, here it was caused by http...
Neha Ojha

09/17/2021

11:29 PM Bug #52489: Adding a Pacific MON to an Octopus cluster: All PGs inactive
Neha Ojha wrote:
> This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issu...
Chris Dunlop
10:26 PM Bug #52489 (Duplicate): Adding a Pacific MON to an Octopus cluster: All PGs inactive
This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issues/52488. Let's use ... Neha Ojha
10:52 PM Bug #52509 (Need More Info): PG merge: PG stuck in premerge+peered state
This is similar to https://tracker.ceph.com/issues/44684, which has already been fixed. It seems like the pgs are in ... Neha Ojha
10:30 PM Bug #52503: cli_generic.sh: slow ops when trying rand write on cache pools
Most likely caused by failure injections; might be worth looking at what was going on in the osd logs when we started se... Neha Ojha
10:13 PM Bug #52385: a possible data loss due to recovery_unfound PG after restarting all nodes
Can you share the full set of logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-post-file/)? Neha Ojha
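For reference, a minimal ceph-post-file sketch (description and paths are placeholders); the command prints an identifier that can be pasted into the tracker:
  ceph-post-file -d 'tracker 52385: recovery_unfound after full restart' /var/log/ceph/ceph-osd.*.log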
09:58 PM Bug #45202: Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()
Is this reproducible on octopus or pacific, which are not EOL? Neha Ojha
09:55 PM Bug #52445 (New): OSD asserts on starting too many pushes
Thanks, is it possible for you to share the logs using ceph-post-file (https://docs.ceph.com/en/pacific/man/8/ceph-po... Neha Ojha
09:47 PM Bug #52618 (Won't Fix - EOL): Ceph Luminous 12.2.13 OSD assert message
Please re-open if you happen to see the same issue on a recent release. Neha Ojha
09:46 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
... Neha Ojha
09:35 PM Bug #52640 (Need More Info): when osds out,reduce pool size reports a error "Error ERANGE: pool i...
We can work around this by temporarily increasing mon_max_pg_per_osd, right? Neha Ojha
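As a hedged sketch of that workaround (the value 400 is illustrative only):
  ceph config set global mon_max_pg_per_osd 400
  ceph config get mon mon_max_pg_per_osd       # confirm the effective value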
03:41 AM Bug #52640: when osds out,reduce pool size reports a error "Error ERANGE: pool id # pg_num 256 si...
https://github.com/ceph/ceph/pull/43201 lei cao
03:15 AM Bug #52640 (Need More Info): when osds out,reduce pool size reports a error "Error ERANGE: pool i...
At first, my cluster has 6 osds and 3 pools, each with pg_num 256 and size 2; mon_max_pg_per_osd is 300. So, we need ... lei cao
09:30 PM Bug #52562: Thrashosds read error injection failed with error ENXIO
Deepika Upadhyay wrote:
> [...]
> /ceph/teuthology-archive/yuriw-2021-09-13_19:12:32-rados-wip-yuri6-testing-2021-0...
Neha Ojha
12:32 PM Bug #52562: Thrashosds read error injection failed with error ENXIO
... Deepika Upadhyay
09:24 PM Bug #52535: monitor crashes after an OSD got destroyed: OSDMap.cc: 5686: FAILED ceph_assert(num_d...
The log attached has a sha1 ca906d0d7a65c8a598d397b764dd262cce645fe3, is this the first time you encountered this iss... Neha Ojha
05:56 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
/a/sage-2021-09-16_18:04:19-rados-wip-sage-testing-2021-09-16-1020-distro-basic-smithi/6393058
note that this is m...
Sage Weil
04:27 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
/a/yuriw-2021-09-16_18:23:18-rados-wip-yuri2-testing-2021-09-16-0923-distro-basic-smithi/6393474 Neha Ojha
03:46 PM Documentation #35968 (Won't Fix): [doc][jewel] sync documentation "OSD Config Reference" default ...
Anthony D'Atri
09:41 AM Bug #50351 (Resolved): osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd restar...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
09:35 AM Backport #51605 (Resolved): pacific: bufferlist::splice() may cause stack corruption in bufferlis...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/42976
m...
Loïc Dachary
08:35 AM Backport #52644 (In Progress): nautilus: pool last_epoch_clean floor is stuck after pg merging
Konstantin Shalygin
08:34 AM Backport #52644 (Rejected): nautilus: pool last_epoch_clean floor is stuck after pg merging
https://github.com/ceph/ceph/pull/43204 Konstantin Shalygin
06:09 AM Bug #48508 (Resolved): Do not round off the bucket weight while decompiling crush map to source
Prashant D

09/16/2021

10:14 PM Backport #51966: nautilus: set a non-zero default value for osd_client_message_cap
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42617
merged
Yuri Weinstein
10:13 PM Backport #51583: nautilus: osd does not proactively remove leftover PGs
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42240
merged
Yuri Weinstein
07:01 PM Bug #49697 (Resolved): prime pg temp: unexpected optimization
fan chen wrote:
> Recently, I find patch "https://github.com/ceph/ceph/commit/023524a26d7e12e7ddfc3537582b1a1cb03af6...
Neha Ojha
11:16 AM Bug #52408 (Can't reproduce): osds not peering correctly after startup
I rebuilt my cluster yesterday using a container image based on commit d906f946e845, and I'm not able to reproduce th... Jeff Layton
02:08 AM Bug #52624 (New): qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABIL...
/ceph/teuthology-archive/pdonnell-2021-09-14_01:17:08-fs-wip-pdonnell-testing-20210910.181451-distro-basic-smithi/638... Patrick Donnelly

09/15/2021

09:23 PM Bug #52605 (Fix Under Review): osd: add scrub duration to pg dump
Neha Ojha
06:39 PM Bug #52621: cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_de...
I reran the test 10 times in https://pulpito.ceph.com/nojha-2021-09-14_18:44:41-rados:singleton-pacific-distro-basic-... Neha Ojha
06:05 PM Bug #52621 (Can't reproduce): cephx: verify_authorizer could not decrypt ticket info: error: bad ...
... Neha Ojha
03:20 PM Backport #52620 (Resolved): pacific: partial recovery become whole object recovery after restart osd
https://github.com/ceph/ceph/pull/43513 Backport Bot
03:18 PM Bug #52583 (Pending Backport): partial recovery become whole object recovery after restart osd
Kefu Chai
02:50 PM Backport #52337: octopus: ceph df detail reports dirty objects without a cache tier
Deepika Upadhyay wrote:
> https://github.com/ceph/ceph/pull/42862
merged
Yuri Weinstein
02:43 PM Bug #50393: CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client...
https://github.com/ceph/ceph/pull/42498 merged Yuri Weinstein
02:18 PM Bug #52618 (Won't Fix - EOL): Ceph Luminous 12.2.13 OSD assert message
2021-09-02 14:25:37.173453 7f2235baf700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/... ceph ceph

09/14/2021

05:53 PM Feature #52609 (Fix Under Review): New PG states for pending scrubs / repairs
Request to add new PG states to provide feedback to the admin when a PG scrub/repair is scheduled (via command line,... Michael Kidd
01:14 PM Bug #52605 (Resolved): osd: add scrub duration to pg dump
We would like to add a new column to the pg dump which would give us the time it took for a pg to get scrubbed. Aishwarya Mathuria
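For context, the table in question can be inspected as below; the proposed duration column does not exist yet, so any column name for it would be hypothetical at this point:
  ceph pg dump pgs | less -S     # existing per-PG columns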

09/13/2021

10:48 PM Bug #52562 (Triaged): Thrashosds read error injection failed with error ENXIO
Looking at the osd and mon logs
Here's when the osd was restarted in revive_osd()...
Neha Ojha
09:05 PM Backport #52596 (Rejected): octopus: make bufferlist::c_str() skip rebuild when it isn't necessary
Backport Bot
09:05 PM Backport #52595 (Rejected): pacific: make bufferlist::c_str() skip rebuild when it isn't necessary
Backport Bot
09:03 PM Bug #51725 (Pending Backport): make bufferlist::c_str() skip rebuild when it isn't necessary
This spares a heap allocation on every received message. Marking for backporting to both octopus and pacific as a lo... Ilya Dryomov
02:21 PM Bug #45871: Incorrect (0) number of slow requests in health check
On a... Nico Schottelius
10:35 AM Backport #52586 (Resolved): pacific: src/vstart: The command "set config key osd_mclock_max_capac...
https://github.com/ceph/ceph/pull/41731 Backport Bot
10:24 AM Bug #52583: partial recovery become whole object recovery after restart osd
FIX URL:
https://github.com/ceph/ceph/pull/43146
https://github.com/ceph/ceph/pull/42904
jianwei zhang
09:43 AM Bug #52583 (Resolved): partial recovery become whole object recovery after restart osd
Problem: After the osd that is undergoing partial recovery is restarted, the data recovery is rolled back from the pa... jianwei zhang
05:32 AM Bug #52578 (Fix Under Review): CLI - osd pool rm --help message is wrong or misleading
CLI - osd pool rm --help message is wrong or misleading
Version-Release number of selected component (if applicabl...
Vasishta Shastry
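For reference, a sketch of the command the help text describes (pool names are placeholders; the name is deliberately required twice as a confirmation):
  ceph config set mon mon_allow_pool_delete true
  ceph osd pool rm mypool mypool --yes-i-really-really-mean-it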
04:19 AM Bug #52445: OSD asserts on starting too many pushes
Neha Ojha wrote:
> Can you please provide 1) osd logs with debug_osd=20 and debug_ms=1 2) ceph.conf 3) output of cep...
Amudhan Pandia n
 
