Activity
From 06/06/2021 to 07/05/2021
07/05/2021
- 01:27 PM Bug #46847: Loss of placement information on OSD reboot
- Last week we had a power outage affecting all OSD machines in a 14.2.20 cluster. A small percentage of PGs didn't act...
- 01:18 PM Bug #51527 (Resolved): Ceph osd crashed due to segfault
- Hi everyone,
We have 9 OSD nodes with 12 daemons on each node.
Ceph is used for S3 objects and RBD images.
ceph ...
- 12:06 PM Bug #48965: qa/standalone/osd/osd-force-create-pg.sh: TEST_reuse_id: return 1
- /a/sseshasa-2021-07-05_10:18:42-rados:standalone-wip-test-stdalone-mclk-1-distro-basic-smithi/6253062
- 11:49 AM Bug #45761 (Need More Info): mon_thrasher: "Error ENXIO: mon unavailable" during sync_force comma...
- Stopped Reproducing, please reopen if you hit another instance
- 11:47 AM Bug #48609 (Closed): osd/PGLog: don’t fast-forward can_rollback_to during merge_log if the log is...
- Root cause resolved
- 11:46 AM Backport #51522 (Resolved): pacific: osd: Delay sending info to new backfill peer resetting last_...
- https://github.com/ceph/ceph/pull/41136
- 11:35 AM Backport #51522 (Resolved): pacific: osd: Delay sending info to new backfill peer resetting last_...
- 11:45 AM Backport #51523: octopus: osd: Delay sending info to new backfill peer resetting last_backfill un...
- https://github.com/ceph/ceph/pull/40593/
- 11:35 AM Backport #51523 (Resolved): octopus: osd: Delay sending info to new backfill peer resetting last_...
- 11:36 AM Backport #51525 (Rejected): octopus: osd: Delay sending info to new backfill peer resetting last_...
- 11:35 AM Bug #48611: osd: Delay sending info to new backfill peer resetting last_backfill until backfill a...
- since nautilus has reached EOL removed it
- 11:34 AM Bug #48611 (Pending Backport): osd: Delay sending info to new backfill peer resetting last_backfi...
- 04:19 AM Bug #45457 (Fix Under Review): CEPH Graylog Logging Missing "host" Field
07/03/2021
- 06:16 AM Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())
- Another OSD crash after the scrub assert bug, log attached. Corrupted rocksdb.
07/02/2021
- 06:55 PM Bug #50866: osd: stat mismatch on objects
- /ceph/teuthology-archive/pdonnell-2021-07-02_10:08:50-fs-wip-pdonnell-testing-20210701.192056-distro-basic-smithi/624...
- 05:00 PM Backport #51498 (Resolved): pacific: mgr spamming with repeated set pgp_num_actual while merging
- https://github.com/ceph/ceph/pull/42223
- 05:00 PM Backport #51497 (Rejected): nautilus: mgr spamming with repeated set pgp_num_actual while merging
- https://github.com/ceph/ceph/pull/43218
- 05:00 PM Backport #51496 (Resolved): octopus: mgr spamming with repeated set pgp_num_actual while merging
- https://github.com/ceph/ceph/pull/42420
- 04:59 PM Bug #51433 (Pending Backport): mgr spamming with repeated set pgp_num_actual while merging
07/01/2021
- 09:39 PM Bug #51307: LibRadosWatchNotify.Watch2Delete fails
- Seems very similar to https://tracker.ceph.com/issues/50042#note-2
- 09:05 PM Bug #48417 (Duplicate): unfound EC objects in sepia's LRC after upgrade
- 06:57 PM Backport #51453: pacific: Add simultaneous scrubs to rados/thrash
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42120
merged
- 04:42 PM Bug #48212 (Fix Under Review): pool last_epoch_clean floor is stuck after pg merging
- 02:13 PM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
- Dan van der Ster wrote:
> I suspect the cause is that there's a leftover epoch value for the now-deleted PG in `epoc...
- 01:28 PM Bug #48212: pool last_epoch_clean floor is stuck after pg merging
- I suspect the cause is that there's a leftover epoch value for the now-deleted PG in `epoch_by_pg` in `void LastEpoch...
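For context, here is a minimal sketch (hypothetical simplified types, not the actual `LastEpochClean` code) of why a leftover per-PG entry pins the floor: the pool's last_epoch_clean floor is a minimum over per-PG epochs, so an entry for a PG that no longer exists after merging never advances and keeps the minimum stuck.
<pre><code class="cpp">
#include <algorithm>
#include <cstdint>
#include <map>

using epoch_t = uint32_t;

// Simplified stand-in for the per-pool bookkeeping: one last-epoch-clean
// value per PG, with the pool floor taken as the minimum over all entries.
struct PoolLecSketch {
  std::map<uint32_t, epoch_t> epoch_by_pg;  // pg id -> last epoch clean

  epoch_t floor() const {
    epoch_t f = UINT32_MAX;
    for (const auto& kv : epoch_by_pg)
      f = std::min(f, kv.second);  // a stale entry for a merged-away PG never
                                   // advances, so the reported floor never advances
    return f;
  }

  // Assumed shape of a remedy: drop entries for PGs beyond the new pg_num
  // after a merge, so a deleted PG can no longer pin the minimum.
  void prune_after_merge(uint32_t new_pg_num) {
    for (auto it = epoch_by_pg.begin(); it != epoch_by_pg.end();) {
      if (it->first >= new_pg_num)
        it = epoch_by_pg.erase(it);
      else
        ++it;
    }
  }
};
</code></pre>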
- 04:38 PM Bug #38931 (Fix Under Review): osd does not proactively remove leftover PGs
- Our customer reported a similar case, providing an easy way to reproduce the issue: if when purging a pg the osd is m...
- 11:03 AM Fix #51464 (Fix Under Review): osd: Add mechanism to avoid running osd benchmark on osd init when...
- 08:54 AM Fix #51464 (Resolved): osd: Add mechanism to avoid running osd benchmark on osd init when using m...
- The current behavior is to let the osd benchmark run on each osd
init, which is not necessary. If the underlying dev...
- 07:44 AM Bug #51463 (Resolved): blocked requests while stopping/starting OSDs
- Hi,
we run into a lot of slow requests (IO blocked for several seconds) while stopping or starting one or more OS...
- 07:13 AM Bug #51419: bufferlist::splice() may cause stack corruption in bufferlist::rebuild_aligned_size_a...
- Initially triggered with fio when testing rbd persistent write-back cache in ssd mode:...
06/30/2021
- 09:50 PM Bug #49894 (In Progress): set a non-zero default value for osd_client_message_cap
- Neha Ojha wrote:
> Neha Ojha wrote:
> > The current default of 0 doesn't help and we've tried setting it to 5000 fo...
- 06:53 PM Backport #50790: octopus: osd: write_trunc omitted to clear data digest
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41620
merged
- 06:50 PM Backport #50791: pacific: osd: write_trunc omitted to clear data digest
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/42019
merged
- 06:49 PM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
- https://github.com/ceph/ceph/pull/41944 merged
- 06:44 PM Bug #51457 (New): qa/standalone/scrub/osd-scrub-test.sh: TEST_interval_changes: date check failed
- ...
- 04:44 PM Backport #51453 (In Progress): pacific: Add simultaneous scrubs to rados/thrash
- 04:30 PM Backport #51453 (Resolved): pacific: Add simultaneous scrubs to rados/thrash
- https://github.com/ceph/ceph/pull/42120
- 04:39 PM Bug #45868: rados_api_tests: LibRadosWatchNotify.AioWatchNotify2 fails
- /a/yuriw-2021-06-29_19:12:08-rados-wip-yuri2-testing-2021-06-28-0858-pacific-distro-basic-smithi/6243653
- 04:35 PM Bug #51454 (New): Simultaneous OSD's crash with tp_osd_tp on rocksdb::MergingIterator::Next()
- Ceph v14.2.15
Main use case is RGW.
Bucket indexes on SSD OSDs.
Majority of SSD OSDs under bucket indexes are FileS...
- 04:30 PM Backport #51452 (Resolved): octopus: Add simultaneous scrubs to rados/thrash
- https://github.com/ceph/ceph/pull/42422
- 04:28 PM Bug #51451 (Resolved): Add simultaneous scrubs to rados/thrash
- Motivated by https://tracker.ceph.com/issues/50346.
- 09:18 AM Bug #51419 (Fix Under Review): bufferlist::splice() may cause stack corruption in bufferlist::reb...
- 06:55 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- http://qa-proxy.ceph.com/teuthology/ideepika-2021-06-30_04:28:06-rados-wip-yuri7-testing-2021-06-28-1224-octopus-dist...
06/29/2021
- 09:16 PM Bug #51433 (Fix Under Review): mgr spamming with repeated set pgp_num_actual while merging
- 08:33 PM Bug #51433 (Resolved): mgr spamming with repeated set pgp_num_actual while merging
- While merging PGs our osdmaps are churning through ~2000 epochs per hour.
The osdmap diffs are empty:...
- 08:03 PM Bug #49525: found snap mapper error on pg 3.2s1 oid 3:4abe9991:::smithi10121515-14:e4 snaps missi...
- The sequence looks a little different this time.
/a/rfriedma-2021-06-26_19:32:15-rados-wip-ronenf-scrubs-config-distr...
- 05:03 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- /a/yuriw-2021-06-28_17:32:48-rados-wip-yuri2-testing-2021-06-28-0858-pacific-distro-basic-smithi/6239590
- 04:59 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2021-06-28_17:32:48-rados-wip-yuri2-testing-2021-06-28-0858-pacific-distro-basic-smithi/6239575
- 08:13 AM Bug #48613 (Resolved): Reproduce https://tracker.ceph.com/issues/48417
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:12 AM Bug #49139 (Resolved): rados/perf: cosbench workloads hang forever
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:11 AM Bug #49988 (Resolved): Global Recovery Event never completes
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:11 AM Bug #50230 (Resolved): mon: spawn loop after mon reinstalled
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:10 AM Bug #50466 (Resolved): _delete_some additional unexpected onode list
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:10 AM Bug #50477 (Resolved): mon/MonClient: reset authenticate_err in _reopen_session()
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:08 AM Bug #50964 (Resolved): mon: slow ops due to osd_failure
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:03 AM Backport #51237 (Resolved): nautilus: rebuild-mondb hangs
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41874
m...
- 08:03 AM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
- /a//kchai-2021-06-27_13:33:07-rados-wip-kefu-testing-2021-06-27-1907-distro-basic-smithi/6238237
- 08:00 AM Backport #50987 (Resolved): octopus: unaligned access to member variables of crush_work_bucket
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41622
m...
- 08:00 AM Backport #50796 (Resolved): octopus: mon: spawn loop after mon reinstalled
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41621
m...
- 07:56 AM Backport #51269 (Resolved): octopus: rados/perf: cosbench workloads hang forever
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41922
m...
- 07:56 AM Backport #50990: octopus: mon: slow ops due to osd_failure
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41618
m...
- 07:53 AM Backport #50705 (Resolved): octopus: _delete_some additional unexpected onode list
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41623
m...
- 07:53 AM Backport #50152 (Resolved): octopus: Reproduce https://tracker.ceph.com/issues/48417
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41609
m...
- 07:52 AM Backport #50750 (Resolved): octopus: max_misplaced was replaced by target_max_misplaced_ratio
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41624
m...
- 07:38 AM Backport #51313 (Resolved): pacific: osd:scrub skip some pg
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41971
m...
- 07:37 AM Backport #50505 (Resolved): pacific: mon/MonClient: reset authenticate_err in _reopen_session()
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41019
m...
- 07:37 AM Backport #50986 (Resolved): pacific: unaligned access to member variables of crush_work_bucket
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41983
m...
- 07:36 AM Backport #50989 (Resolved): pacific: mon: slow ops due to osd_failure
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41982
m...
- 07:32 AM Backport #50797 (Resolved): pacific: mon: spawn loop after mon reinstalled
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41768
m...
- 07:28 AM Backport #51215: pacific: Global Recovery Event never completes
- Nathan, would you be so kind as to add a link to this issue in https://github.com/ceph/ceph/pull/41872 ?
- 07:27 AM Backport #51215 (Resolved): pacific: Global Recovery Event never completes
- 07:19 AM Backport #50706 (Resolved): pacific: _delete_some additional unexpected onode list
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41680
m...
- 05:40 AM Bug #51419 (Resolved): bufferlist::splice() may cause stack corruption in bufferlist::rebuild_ali...
- *** stack smashing detected ***: terminated2073 IOPS][eta 02h:59m:36s]
--Type <RET> for more, q to quit, c to contin...
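The operation named in the title, rebuilding a fragmented buffer into aligned memory, can be pictured with the hedged sketch below. This is not Ceph's bufferlist code, only the safe shape of the operation: derive and bound the destination size from the fragments actually present rather than from alignment assumptions, and keep the destination off the stack, which is what rules out the stack-protector abort quoted above.
<pre><code class="cpp">
#include <cstddef>
#include <vector>

// Illustrative only: concatenate fragments into one heap allocation whose size
// is derived from the fragments themselves and then padded up to 'align'.
// Bounds-safe appends into a dynamically sized destination cannot trip the
// "stack smashing detected" check.
std::vector<unsigned char> rebuild_aligned_sketch(
    const std::vector<std::vector<unsigned char>>& fragments, std::size_t align) {
  if (align == 0)
    align = 1;
  std::size_t total = 0;
  for (const auto& f : fragments)
    total += f.size();
  const std::size_t padded = (total + align - 1) / align * align;

  std::vector<unsigned char> out;
  out.reserve(padded);
  for (const auto& f : fragments)
    out.insert(out.end(), f.begin(), f.end());  // never writes past the allocation
  out.resize(padded, 0);                        // zero-pad the tail up to the alignment
  return out;
}
</code></pre>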
06/28/2021
- 07:29 PM Backport #50987: octopus: unaligned access to member variables of crush_work_bucket
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41622
merged
- 07:29 PM Backport #50796: octopus: mon: spawn loop after mon reinstalled
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41621
merged
- 04:25 PM Backport #50505: pacific: mon/MonClient: reset authenticate_err in _reopen_session()
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41019
merged
Reviewed-by: Kefu Chai <kchai@redhat.com>
- 06:54 AM Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors i...
- I see a similar crash on quincy; I suspect it's seen when I try to add mons, going from 1 to 3.
/]# ceph crash info 2021-06...
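For context, here is a minimal sketch of the class of failure this assert guards against (hypothetical simplified types, not the actual MonMap code): a rank index computed against an older, differently sized monmap is used after the ranks vector changed size, and the bounds assert fires.
<pre><code class="cpp">
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for a monmap's rank table.
struct MiniMonMap {
  std::vector<std::string> ranks;  // monitor names ordered by rank

  const std::string& get_name(unsigned m) const {
    assert(m < ranks.size());  // stands in for ceph_assert(m < ranks.size())
    return ranks[m];
  }
};

// Hypothetical trigger: an index that was valid for a 3-mon map is looked up
// against a map that currently holds only 1 entry, e.g. partway through
// shrinking the monitor set or while growing it back from 1 to 3.
</code></pre>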
06/26/2021
- 02:27 PM Backport #50986: pacific: unaligned access to member variables of crush_work_bucket
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41983
merged
- 02:26 PM Backport #50989: pacific: mon: slow ops due to osd_failure
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41982
merged
06/25/2021
- 09:43 PM Bug #51101: rados/test_envlibrados_for_rocksdb.sh: cmake: symbol lookup error: cmake: undefined s...
- /a/yuriw-2021-06-24_16:54:31-rados-wip-yuri-testing-2021-06-24-0708-pacific-distro-basic-smithi/6190738
- 03:50 PM Backport #51371 (Resolved): pacific: OSD crash FAILED ceph_assert(!is_scrubbing())
- https://github.com/ceph/ceph/pull/41944
- 03:48 PM Bug #50346 (Pending Backport): OSD crash FAILED ceph_assert(!is_scrubbing())
- 06:48 AM Backport #50990 (Resolved): octopus: mon: slow ops due to osd_failure
06/24/2021
06/23/2021
- 10:43 PM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
- Andrej Filipcic wrote:
> A related crash happened when I disabled scrubbing:
>
> -1> 2021-06-14T11:17:15.373...
- 10:42 PM Bug #51338 (Duplicate): osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>...
- Originally reported in https://tracker.ceph.com/issues/50346#note-6...
- 06:16 PM Backport #50796: octopus: mon: spawn loop after mon reinstalled
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41621
merged
- 03:31 PM Backport #50797: pacific: mon: spawn loop after mon reinstalled
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41768
merged
- 03:30 PM Bug #49988: Global Recovery Event never completes
- https://github.com/ceph/ceph/pull/41872 merged
06/22/2021
- 10:47 PM Backport #50986 (In Progress): pacific: unaligned access to member variables of crush_work_bucket
- 10:44 PM Backport #50989 (In Progress): pacific: mon: slow ops due to osd_failure
- 05:21 PM Backport #50705: octopus: _delete_some additional unexpected onode list
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41623
merged
- 05:21 PM Backport #50152: octopus: Reproduce https://tracker.ceph.com/issues/48417
- Dan van der Ster wrote:
> Nathan I've done the manual backport here: https://github.com/ceph/ceph/pull/41609
> Copy...
- 11:23 AM Backport #51315 (In Progress): nautilus: osd:scrub skip some pg
- 10:55 AM Backport #51315 (Resolved): nautilus: osd:scrub skip some pg
- https://github.com/ceph/ceph/pull/41973
- 11:11 AM Backport #51314 (In Progress): octopus: osd:scrub skip some pg
- 10:55 AM Backport #51314 (Resolved): octopus: osd:scrub skip some pg
- https://github.com/ceph/ceph/pull/41972
- 10:56 AM Backport #51313 (In Progress): pacific: osd:scrub skip some pg
- 10:55 AM Backport #51313 (Resolved): pacific: osd:scrub skip some pg
- https://github.com/ceph/ceph/pull/41971
- 10:55 AM Backport #51316 (Duplicate): nautilus: osd:scrub skip some pg
- 10:53 AM Bug #49487 (Pending Backport): osd:scrub skip some pg
- 10:28 AM Bug #50346 (Fix Under Review): OSD crash FAILED ceph_assert(!is_scrubbing())
- 06:59 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
- Andrej Filipcic wrote:
> On a 60-node, 1500 HDD cluster, and 16.2.4 release, this issue became very frequent, especi...
06/21/2021
- 08:50 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- FYI I tried with ceph/daemon-base:master-24e1f91-pacific-centos-8-x86_64 (the latest non-devel build at this time) ju...
- 06:07 PM Bug #51307 (Resolved): LibRadosWatchNotify.Watch2Delete fails
- ...
- 03:40 PM Bug #51270: mon: stretch mode clusters do not sanely set default crush rules
- Accidentally requested backports to Octopus/Nautilus, so nuking those.
- 03:40 PM Backport #51289 (Rejected): octopus: mon: stretch mode clusters do not sanely set default crush r...
- Accidental backport request
- 03:40 PM Backport #51288 (Rejected): nautilus: mon: stretch mode clusters do not sanely set default crush ...
- Accidental backport request
06/20/2021
06/19/2021
- 03:00 PM Backport #51290 (Resolved): pacific: mon: stretch mode clusters do not sanely set default crush r...
- https://github.com/ceph/ceph/pull/42909
- 03:00 PM Backport #51289 (Rejected): octopus: mon: stretch mode clusters do not sanely set default crush r...
- 03:00 PM Backport #51288 (Rejected): nautilus: mon: stretch mode clusters do not sanely set default crush ...
- 02:58 PM Bug #51270 (Pending Backport): mon: stretch mode clusters do not sanely set default crush rules
- 01:15 PM Backport #51287 (Resolved): pacific: LibRadosService.StatusFormat failed, Expected: (0) != (retry...
- https://github.com/ceph/ceph/pull/46677
- 01:12 PM Bug #51234 (Pending Backport): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
- 01:11 PM Bug #51234 (Resolved): LibRadosService.StatusFormat failed, Expected: (0) != (retry), actual: 0 vs 0
- 02:19 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- I think this one is related?
/ceph/teuthology-archive/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-2021061...
06/18/2021
- 09:15 PM Bug #51083: Raw space filling up faster than used space
- I don't have any ideas from the logs. Moving this back to RADOS. I doubt it has anything to do with CephFS.
- 11:56 AM Bug #51083: Raw space filling up faster than used space
- Patrick Donnelly wrote:
> Scrub is unlikely to help.
I came to the same conclusion after reading the documentatio...
06/17/2021
- 11:07 PM Bug #51083: Raw space filling up faster than used space
- Jan-Philipp Litza wrote:
> Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems t...
- 09:13 PM Bug #51083: Raw space filling up faster than used space
- Patrick: do you understand how upgrading the MDS daemons helped in this case? There is nothing in the osd/bluestore s...
- 09:03 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
- We definitely do not use cache tiering on any of our clusters. On the cluster above, we do use snapshots (via cephfs...
- 08:48 PM Bug #51254: deep-scrub stat mismatch on last PG in pool
- It seems like you are using cache tiering, and there have been similar bugs reported like this. I don't understand why...
- 09:01 PM Bug #51234 (Fix Under Review): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
- 08:57 PM Bug #50842 (Need More Info): pacific: recovery does not complete because of rw_manager lock not ...
- 08:53 PM Backport #51269 (In Progress): octopus: rados/perf: cosbench workloads hang forever
- 07:14 PM Backport #51269 (Resolved): octopus: rados/perf: cosbench workloads hang forever
- https://github.com/ceph/ceph/pull/41922
- 08:42 PM Bug #51074 (Pending Backport): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with...
- marking Pending Backport, needs to be included with https://github.com/ceph/ceph/pull/41731
- 08:40 PM Bug #51168 (Need More Info): ceph-osd state machine crash during peering process
- Can you please attach the osd log for this crash?
- 08:07 PM Bug #51270 (Fix Under Review): mon: stretch mode clusters do not sanely set default crush rules
- 08:03 PM Bug #51270 (Pending Backport): mon: stretch mode clusters do not sanely set default crush rules
- If you do not specify a crush rule when creating a pool, the OSDMonitor picks the default one for you out of the conf...
- 07:12 PM Bug #49139 (Pending Backport): rados/perf: cosbench workloads hang forever
- 02:32 PM Backport #51237: nautilus: rebuild-mondb hangs
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41874
merged
06/16/2021
- 10:40 PM Bug #51254 (New): deep-scrub stat mismatch on last PG in pool
- In the past few weeks, we got inconsistent PGs in deep-scrub a few times, always on the very last PG in the pool:
...
- 07:25 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
- ...
- 07:22 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- /ceph/teuthology-archive/yuriw-2021-06-14_19:20:57-rados-wip-yuri6-testing-2021-06-14-1106-octopus-distro-basic-smith...
- 06:48 PM Bug #50042: rados/test.sh: api_watch_notify failures
- ...
- 02:49 PM Bug #51246 (New): error in open_pools_parallel: rados_write(0.obj) failed with error: -2
- ...
- 01:22 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- > Will this patch be released in 14.2.22?
Yes, the PR has been merged to the nautilus branch, so it will be in the ...
- 12:59 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- We hit this bug yesterday in a nautilus 14.2.18 cluster.
All monitors went down and started crashing on restart.
...
- 02:18 AM Backport #51237 (In Progress): nautilus: rebuild-mondb hangs
- 02:16 AM Backport #51237 (Resolved): nautilus: rebuild-mondb hangs
- https://github.com/ceph/ceph/pull/41874
- 02:13 AM Bug #38219 (Pending Backport): rebuild-mondb hangs
06/15/2021
- 08:05 PM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
- Just to note:
IMO the ceph-bluestore-tool crash is caused by a bug in AvlAllocator and is a duplicate of https://tracker...
- 06:59 PM Backport #51215 (In Progress): pacific: Global Recovery Event never completes
- 12:55 AM Backport #51215 (Resolved): pacific: Global Recovery Event never completes
- Backport PR https://github.com/ceph/ceph/pull/41872
- 06:50 PM Backport #50706: pacific: _delete_some additional unexpected onode list
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41680
merged
- 06:47 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
- @Neha: I did, but am afraid they are lost, the test was from https://pulpito.ceph.com/ideepika-2021-05-17_10:16:28-ra...
- 06:31 PM Bug #50842: pacific: recovery does not complete because of rw_manager lock not being released
- @Deepika, do you happen to have the logs saved somewhere?
- 06:41 PM Bug #51234 (Pending Backport): LibRadosService.StatusFormat failed, Expected: (0) != (retry), act...
- ...
- 06:30 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- rados/thrash-erasure-code-big/{ceph cluster/{12-osds openstack} mon_election/connectivity msgr-failures/osd-dispatch-...
- 06:26 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- Looks very similar...
- 04:12 PM Backport #50750: octopus: max_misplaced was replaced by target_max_misplaced_ratio
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41624
merged
- 12:20 PM Bug #51223 (New): statfs: a cluster with filestore and bluestore OSD's will report bytes_used == ...
- Cluster migrated from Luminous mixed bluestore+filestore OSD's to Nautilus 14.2.21
After last filestore OSD purged f...
- 10:47 AM Bug #49677 (Resolved): debian ceph-common package post-inst clobbers ownership of cephadm log dirs
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:47 AM Bug #49781 (Resolved): unittest_mempool.check_shard_select failed
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:44 AM Bug #50501 (Resolved): osd/scheduler/mClockScheduler: Async reservers are not updated with the ov...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:44 AM Bug #50558 (Resolved): Data loss propagation after backfill
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:42 AM Backport #50795 (Resolved): nautilus: mon: spawn loop after mon reinstalled
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41762
m...
- 10:41 AM Backport #50704 (Resolved): nautilus: _delete_some additional unexpected onode list
- 10:36 AM Backport #50153 (Resolved): nautilus: Reproduce https://tracker.ceph.com/issues/48417
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41611
m...
- 10:36 AM Backport #49729 (Resolved): nautilus: debian ceph-common package post-inst clobbers ownership of ...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40698
m...
- 10:32 AM Backport #50988: nautilus: mon: slow ops due to osd_failure
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41519
m...
- 09:05 AM Backport #50406: pacific: mon: new monitors may direct MMonJoin to a peon instead of the leader
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41131
m...
- 09:04 AM Backport #50344: pacific: mon: stretch state is inconsistently-maintained on peons, preventing pr...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41130
m...
- 09:04 AM Backport #50794 (Resolved): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41320
m...
- 09:03 AM Backport #50702 (Resolved): pacific: Data loss propagation after backfill
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41236
m...
- 09:03 AM Backport #50606 (Resolved): pacific: osd/scheduler/mClockScheduler: Async reservers are not updat...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41125
m...
- 09:02 AM Backport #49992 (Resolved): pacific: unittest_mempool.check_shard_select failed
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40566
m...
- 12:54 AM Bug #49988 (Pending Backport): Global Recovery Event never completes
06/14/2021
- 03:24 PM Feature #51213 (Resolved): [ceph osd set noautoscale] Global on/off flag for PG autoscale feature
- For now, we do not have a global on/off flag for the PG autoscale feature, analogous to `ceph osd set noout`. We have pool flags[1] ...
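A hypothetical sketch of the requested gate (names and shape are illustrative, not the eventual Ceph implementation): a cluster-wide noautoscale flag consulted before the existing per-pool autoscale mode.
<pre><code class="cpp">
// Illustrative only: the proposed global flag short-circuits autoscaling for
// every pool, while the per-pool mode keeps working as it does today.
enum class PoolAutoscaleMode { OFF, WARN, ON };

struct ClusterFlags {
  bool noautoscale = false;  // would be toggled by "ceph osd set/unset noautoscale" (proposed)
};

bool autoscale_active(const ClusterFlags& cluster, PoolAutoscaleMode pool_mode) {
  if (cluster.noautoscale)
    return false;  // the global off switch wins, like noout does for marking OSDs out
  return pool_mode != PoolAutoscaleMode::OFF;  // otherwise defer to the per-pool flag
}
</code></pre>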
- 09:24 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
- A related crash happened when I disabled scrubbing:
-1> 2021-06-14T11:17:15.373+0200 7fb9916f5700 -1 /home/...
06/13/2021
- 03:34 PM Bug #51194: PG recovery_unfound after scrub repair failed on primary
- To prevent the user IO from being blocked, we took this action:
1. First, we queried the unfound objects. osd.951 ...
- 01:36 PM Bug #51194 (New): PG recovery_unfound after scrub repair failed on primary
- This comes from a mail I send to the ceph-users ML: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3...
- 03:30 PM Backport #51195 (In Progress): pacific: [rfe] increase osd_max_write_op_reply_len default value t...
- https://github.com/ceph/ceph/pull/53470
- 03:28 PM Bug #51166 (Pending Backport): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
06/12/2021
06/10/2021
- 08:35 PM Backport #51173 (Rejected): nautilus: regression in ceph daemonperf command output, osd columns a...
- 08:35 PM Backport #51172 (Resolved): pacific: regression in ceph daemonperf command output, osd columns ar...
- https://github.com/ceph/ceph/pull/44175
- 08:35 PM Backport #51171 (Resolved): octopus: regression in ceph daemonperf command output, osd columns ar...
- https://github.com/ceph/ceph/pull/44176
- 08:32 PM Bug #51002 (Pending Backport): regression in ceph daemonperf command output, osd columns aren't v...
- 06:48 PM Backport #50795: nautilus: mon: spawn loop after mon reinstalled
- Dan van der Ster wrote:
> https://github.com/ceph/ceph/pull/41762
merged
- 03:58 PM Bug #51168 (New): ceph-osd state machine crash during peering process
- ...
- 02:22 PM Bug #51166 (Fix Under Review): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
- 02:16 PM Bug #51166 (Pending Backport): [rfe] increase osd_max_write_op_reply_len default value to 64 bytes
- As agreed in #ceph-devel, Sage, Josh, Neha concurring.
- 01:53 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- For the dead jobs, relevant logs have been uploaded to senta02 under /home/sseshasa/recovery_timeout.
Please let me ...
- 10:25 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
- On a 60-node, 1500 HDD cluster, and 16.2.4 release, this issue became very frequent, especially when RBD writes excee...
06/09/2021
- 01:32 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- From the logs of 6161181, snapshot recovery is not able to proceed since a rwlock on the head version
(3:cb63772d:::...
- 08:07 AM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- Ran the same test repeatedly (5 times) on master by setting osd_op_queue to 'wpq' and 'mclock_scheduler' on different...
- 10:39 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
- Kefu, yes I did read the update and your effort to find the commit(s) that caused the regression in the standalone te...
- 10:25 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
- Sridhar, please read https://tracker.ceph.com/issues/51074#note-3; that's my finding from the last 3 days.
- 09:44 AM Bug #51074: standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad data after pr...
- Raised PR https://github.com/ceph/ceph/pull/41782 to address the test failure.
Please see latest update to https:/...
- 09:05 AM Backport #51151 (Rejected): nautilus: When read failed, ret can not take as data len, in FillInVe...
- 09:05 AM Backport #51150 (Resolved): pacific: When read failed, ret can not take as data len, in FillInVer...
- https://github.com/ceph/ceph/pull/44173
- 09:05 AM Backport #51149 (Resolved): octopus: When read failed, ret can not take as data len, in FillInVer...
- https://github.com/ceph/ceph/pull/44174
- 09:02 AM Bug #51115 (Pending Backport): When read failed, ret can not take as data len, in FillInVerifyExtent
06/08/2021
- 11:10 PM Bug #38219: rebuild-mondb hangs
- http://qa-proxy.ceph.com/teuthology/yuriw-2021-06-08_20:53:36-rados-wip-yuri-testing-2021-06-04-0753-nautilus-distro-...
- 06:39 PM Bug #38219: rebuild-mondb hangs
- 2021-06-04T23:05:38.775 INFO:tasks.ceph.mon.a.smithi071.stderr:/build/ceph-14.2.21-305-gac8fcfa6/src/mon/OSDMonitor.c...
- 07:52 PM Backport #50797 (In Progress): pacific: mon: spawn loop after mon reinstalled
- 05:58 PM Backport #50795: nautilus: mon: spawn loop after mon reinstalled
- https://github.com/ceph/ceph/pull/41762
- 04:50 PM Bug #50681: memstore: apparent memory leak when removing objects
- Sven Anderson wrote:
> Greg Farnum wrote:
> > How long did you wait to see if memory usage dropped? Did you look at...
- 08:48 AM Bug #51074 (Triaged): standalone/osd-rep-recov-eio.sh: TEST_rep_read_unfound failed with "Bad dat...
06/07/2021
- 11:37 AM Backport #51117 (In Progress): pacific: osd: Run osd bench test to override default max osd capac...
- 10:25 AM Backport #51117 (Resolved): pacific: osd: Run osd bench test to override default max osd capacity...
- https://github.com/ceph/ceph/pull/41731
- 10:22 AM Fix #51116 (Resolved): osd: Run osd bench test to override default max osd capacity for mclock.
- 08:28 AM Bug #51115: When read failed, ret can not take as data len, in FillInVerifyExtent
- https://github.com/ceph/ceph/pull/41727
- 08:12 AM Bug #51115 (Fix Under Review): When read failed, ret can not take as data len, in FillInVerifyExtent
- 07:42 AM Bug #51115 (Resolved): When read failed, ret can not take as data len, in FillInVerifyExtent
- When a read fails, such as returning -EIO, FillInVerifyExtent takes ret as the data length.
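A simplified illustration of the intended fix (hypothetical types, not the actual FillInVerifyExtent code): the read completion result is a byte count only when it is non-negative, so a negative errno such as -EIO must never be copied into the extent length.
<pre><code class="cpp">
#include <cstdint>

// Simplified completion handler: 'r' is the read result, either the number of
// bytes read (>= 0) or a negative errno such as -EIO.
struct VerifyExtentSketch {
  uint64_t* data_len;  // reported back to the client
  int*      rval;      // per-op return value

  void finish(int r) {
    *rval = r;
    if (r >= 0)
      *data_len = static_cast<uint64_t>(r);  // success: r really is a length
    else
      *data_len = 0;  // failure: report the error, never treat -EIO as a length
  }
};
</code></pre>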
- 06:59 AM Bug #51083: Raw space filling up faster than used space
- Yesterday evening we finally managed to upgrade the MDS daemons as well, and that seems to have stopped the space was...
06/06/2021
- 11:22 AM Feature #51110 (New): invalidate crc in buffer::ptr::c_str()
- h3. what:
*buffer::ptr* (or more precisely, *buffer::raw*) has the ability to cache CRC codes that are calculated ...
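A minimal sketch of the proposal (hypothetical simplified types, not the actual buffer::raw implementation): the raw buffer caches CRCs keyed by range, and handing out a mutable pointer via c_str() drops that cache because the caller may modify the bytes.
<pre><code class="cpp">
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct RawSketch {
  std::vector<char> data;
  // (offset, length) -> cached CRC of that range
  std::map<std::pair<uint64_t, uint64_t>, uint32_t> crc_cache;

  char* c_str() {
    crc_cache.clear();  // proposed: mutable access invalidates cached CRCs,
                        // since the caller can change the underlying bytes
    return data.data();
  }

  const char* c_str() const {
    return data.data();  // read-only access leaves the cache intact
  }
};
</code></pre>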