Activity
From 04/21/2022 to 05/20/2022
05/20/2022
- 03:33 PM Bug #55726: Drained OSDs are still ACTIVE_PRIMARY - causing high IO latency on clients
- https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NQKDCBJ2SH3DTUCMV6KU4T3EGKOSCGJV/
- 02:14 PM Bug #55726 (Need More Info): Drained OSDs are still ACTIVE_PRIMARY - causing high IO latency on c...
- Hi
I observed high latencies and mount points hanging since Octopus release
and it's still observed on Pacific l...
- 03:24 PM Bug #46847: Loss of placement information on OSD reboot
- we also encountered a similar issue with an EC pool during rebalance; sometimes (OSD overload or PG peering crash), th...
05/19/2022
- 10:01 PM Bug #51076 (Fix Under Review): "wait_for_recovery: failed before timeout expired" during thrashos...
- 09:19 PM Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to...
- Neha Ojha wrote:
> Laura Flores wrote:
> > /a/yuriw-2022-03-25_18:42:52-rados-wip-yuri7-testing-2022-03-24-1341-pac...
- 01:55 PM Bug #55711 (Fix Under Review): mon: race condition between `mgr fail` and MgrMonitor::prepare_bea...
- 01:51 PM Bug #55711 (Resolved): mon: race condition between `mgr fail` and MgrMonitor::prepare_beacon()
- https://gist.github.com/rzarzynski/25ac59c8422e9ad0b1710a765a77f19a#the-race-condition
- 06:01 AM Bug #55708 (Fix Under Review): Reducing 2 Monitors Causes Stray Daemon
- Example of the problem:
Roles:
smithi001: mon.a
smithi002: mon.b
smithi070: mon.c
smithi100: mon.d
smithi2...
- 05:27 AM Bug #55662: EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop.data.length...
- I used /qa/standalone/erasure-code/test-erasure-eio.sh; the test that failed is TEST_ec_object_attr_read_error when i...
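For anyone wanting to re-run it, a hedged sketch of invoking that standalone test from a built Ceph tree (run-standalone.sh lives in qa/; the working directory and quoting are my assumptions, verify against the script's usage text):

  # from the build/ directory of a compiled ceph checkout
  ../qa/run-standalone.sh "test-erasure-eio.sh TEST_ec_object_attr_read_error"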
05/18/2022
- 09:09 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-05-13_14:13:55-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6832699
- 08:56 PM Bug #52316: qa/tasks/mon_thrash.py: _do_thrash AssertionError len(s['quorum']) == len(mons)
- /a/yuriw-2022-05-13_14:13:55-rados-wip-yuri3-testing-2022-05-12-1609-octopus-distro-default-smithi/6832711...
- 07:37 PM Bug #53485 (Fix Under Review): monstore: logm entries are not garbage collected
- 01:37 PM Bug #53485: monstore: logm entries are not garbage collected
- PR https://github.com/ceph/ceph/pull/44511
- 06:26 PM Bug #55662: EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop.data.length...
- Can you please add the test that helped you discover this issue? I believe the same test was passing with other EC pl...
- 06:17 PM Bug #55407: quincy osd's fail to boot and crash
- This looks like something new and unrelated to other crashes in this ticket, so created a new one: https://tracker.ce...
- 06:17 PM Bug #51858: octopus: rados/test_crash.sh failure
- /a/nojha-2022-05-17_22:38:06-rados-wip-lrc-fix-pacific-distro-basic-smithi/6839177
- 06:17 PM Bug #55698 (New): osd: segfault at boot up
- In https://tracker.ceph.com/issues/55407#note-14, an OSD crash during early boot-up is reported:...
- 06:09 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
- The common theme between these failures (this one and #47026) is the @check()@ function of @qa/standalone/osd-backfill/os...
- 03:02 PM Bug #55695: Shutting down a monitor forces Paxos to restart and sometimes disregard subsequent co...
- https://docs.google.com/document/d/1ucVz54vMlm26oiqQoqJ2upUPmiROd4AmwSwbVM_s2A0/edit#
- 03:01 PM Bug #55695 (Fix Under Review): Shutting down a monitor forces Paxos to restart and sometimes disr...
- *Problem:*
mon.a
mon.b
mon.c
mon.d
mon.e
ceph -a stop mon.d
ceph mon remove d
.
.
mon.d is down...
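For anyone following along, a minimal shell sketch of the reproduction sequence above, assuming a five-monitor cluster and stock ceph tooling (the systemd unit name is an assumption; the report itself uses "ceph -a stop mon.d"):

  systemctl stop ceph-mon@d     # stop the monitor daemon
  ceph mon remove d             # remove it from the monmap
  ceph -s                       # watch for the Paxos restart / dropped commands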
05/17/2022
- 11:21 PM Feature #55693 (Fix Under Review): Limit the Health Detail MSG log size in cluster logs
- RHBZ# https://bugzilla.redhat.com/show_bug.cgi?id=2087527
Version-Release number of selected component (if applica...
- 10:58 PM Backport #55513: quincy: mount.ceph fails to understand AAAA records from SRV record
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46113
merged
- 10:56 PM Backport #55280: quincy: mon/OSDMonitor: properly set last_force_op_resend in stretch mode
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45871
merged
- 09:17 PM Bug #53485: monstore: logm entries are not garbage collected
- Sorry, forgot to add - we faced this issue on v15.2.13 and on v14.2.22 as well.
- 09:17 PM Bug #53485: monstore: logm entries are not garbage collected
- We observed this several times on the customer side. Each of the 3 mon store.db instances was rapidly growing and had tons of logm keys ...
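A hedged way to confirm the symptom described above (store path illustrative; run against a stopped monitor, and verify the dump-keys subcommand exists on your release):

  ceph-monstore-tool /var/lib/ceph/mon/ceph-a dump-keys | grep logm | wc -l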
- 06:05 PM Backport #52077 (Resolved): octopus: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- 09:15 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
- ...
- 01:38 AM Bug #52884: osd: optimize pg peering latency when add new osd that need backfill
- ...
05/16/2022
- 05:07 PM Bug #55670 (Fix Under Review): osdmaptool is not mapping child pgs to the target OSDs
- 09:01 AM Bug #55670 (Fix Under Review): osdmaptool is not mapping child pgs to the target OSDs
- Steps to reproduce the issue:
1. ceph osd getmap > osdmap.bin
2. ./bin/osdmaptool --test-map-pgs-dump --pool <pool...
- 02:44 PM Bug #55559 (Duplicate): osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
- 02:41 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
- I opened a new issue since a different test failed this time. The failure does look the same though, so maybe the one...
- 02:43 PM Bug #47026: osd-backfill-stats.sh fails in TEST_backfill_ec_down_all_out
- This was originally tracked in #55559. A different test was affected (TEST_backfill_ec_prim_out), but the failure loo...
- 09:40 AM Bug #52884: osd: optimize pg peering latency when add new osd that need backfill
- https://github.com/ceph/ceph/pull/46281
added code for the master branch
- 09:15 AM Bug #55407: quincy osd's fail to boot and crash
- Hi! Today I've just built master again.
Rebuilt osd and on first boot:
...
-79> 2022-05-16T09:10:21.707+0000...
- 08:51 AM Bug #55669: osd: add log for pg peering and activating complete
- https://github.com/ceph/ceph/pull/46279
- 08:51 AM Bug #55669 (New): osd: add log for pg peering and activating complete
- ...
- 08:22 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- Since I was developing on Ceph 15.2.13, I did not adapt it to master.
If there is no problem with the review, consi...
- 08:21 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- ...
- 08:16 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- ...
- 08:07 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- ...
- 08:06 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- ...
- 08:05 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- The problem was first described at this link:
https://tracker.ceph.com/issues/54966?next_issue_id=54965
- 08:03 AM Bug #55668: osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recovery_unfound
- https://github.com/ceph/ceph/pull/46276
- 07:56 AM Bug #55668 (New): osd/ec: after one-by-one adding a new osd to the ceph cluster, pg stuck recover...
- problem:...
- 06:34 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
- Manuel Lausch wrote:
> Hi Nitzan,
> I checked your patch on the current pacific branch.
>
> unfortunately I stil...
- 06:30 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
- pull_request: https://github.com/ceph/ceph/pull/46273
- 04:09 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
- For scenarios where a single node runs both mon and osd, and the number of OSDs on a single node is large, or the mon e...
- 03:12 AM Bug #55665: osd: osd_fast_fail_on_connection_refused will cause the mon to continuously elect
- My test result:...
- 03:11 AM Bug #55665 (Fix Under Review): osd: osd_fast_fail_on_connection_refused will cause the mon to con...
- The first issue is described at https://tracker.ceph.com/issues/55067
Problem Description:...
- 04:10 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- https://tracker.ceph.com/issues/55665?next_issue_id=55662
05/15/2022
- 06:23 AM Bug #55662 (Rejected): EC: Clay assert fail ../src/osd/ECBackend.cc: 685: FAILED ceph_assert(pop....
- ...
05/13/2022
- 10:49 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- I reproduced the symptoms of this bug locally by incrementing the notify count before an eq check. The extra incremen...
- 09:29 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- My test result:...
05/12/2022
- 11:10 PM Backport #55633 (In Progress): octopus: ceph-osd takes all memory before oom on boot
- https://github.com/ceph/ceph/pull/46253
- 06:07 PM Backport #55633 (Rejected): octopus: ceph-osd takes all memory before oom on boot
- 10:56 PM Backport #52077: octopus: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45320
merged - 10:42 PM Backport #55631 (In Progress): pacific: ceph-osd takes all memory before oom on boot
- https://github.com/ceph/ceph/pull/46252
- 06:06 PM Backport #55631 (Resolved): pacific: ceph-osd takes all memory before oom on boot
- 10:39 PM Backport #55632 (In Progress): quincy: ceph-osd takes all memory before oom on boot
- https://github.com/ceph/ceph/pull/46251
- 06:06 PM Backport #55632 (Resolved): quincy: ceph-osd takes all memory before oom on boot
- 06:52 PM Bug #55559: osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
- Hello Laura! Is there anything that makes you think this isn't a duplicate of #47026?
- 06:48 PM Bug #49689: osd/PeeringState.cc: ceph_abort_msg("past_interval start interval mismatch") start
- If more data is necessary, it might be worth contacting Richard Bateman, who replicated something awfully similar to t...
- 06:29 PM Bug #55582: octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails because `rados_w...
- Yet another in the family of Watch / Notify ENOENT -> ENOTCONN bugs.
- 06:28 PM Bug #44229: monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early
- ...
- 06:25 PM Bug #44229 (New): monclient: _check_auth_rotating possible clock skew, rotating keys expired way ...
- Perhaps this was replicated in:
/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28...
- 06:28 PM Bug #49591: no active mgr (MGR_DOWN)" in cluster log
- I can't find @Degraded data redundancy@ in the mgr's log but I can find messages about expired cephx keys:...
- 06:09 PM Bug #52993: upgrade:octopus-x Test: Upgrade test failed due to timeout of the "ceph pg dump" command
- We haven't backported the fix for https://tracker.ceph.com/issues/51815 to Octopus (per Neha's explanation).
- 06:02 PM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
- Hello! A note from a bug scrub:
1. This issue looks like it is caused by particular data stored in an OSD which
2....
- 09:33 AM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
- ...
- 05:53 PM Bug #48440 (Need More Info): log [ERR] : scrub mismatch
- We would need to ensure the latest recurrence is about the OSD scrub (we haven't seen too many mon scrubbing issues ...
- 05:45 PM Bug #53729 (Pending Backport): ceph-osd takes all memory before oom on boot
- 02:29 PM Backport #55624 (In Progress): quincy: Unable to format `ceph config dump` command output in yaml...
- 02:26 PM Backport #55624 (Resolved): quincy: Unable to format `ceph config dump` command output in yaml us...
- https://github.com/ceph/ceph/pull/46246
- 02:25 PM Bug #53895 (Pending Backport): Unable to format `ceph config dump` command output in yaml using `...
- 10:12 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- I think we should also use the failure_pending queue, like send_failures does, to avoid one OSD sending the target OSD to the mon multip...
- 09:35 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- ...
- 06:08 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
When a node is actively shut down for operation and maintenance,
the osd/mon/mds process on it will automatically...
- 05:55 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- Nitzan Mordechai wrote:
> jianwei zhang wrote:
> > osd_fast_shutdown(true)
> > osd_fast_shutdown_notify_mon(false)...
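For readers following this thread, a hedged illustration of setting the three options under debate (the values mirror the combination quoted above, not a recommendation):

  ceph config set osd osd_fast_shutdown true
  ceph config set osd osd_fast_shutdown_notify_mon false
  ceph config set osd osd_mon_shutdown_timeout 5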
05/11/2022
- 08:59 PM Backport #54568: octopus: mon/MonCommands.h: target_size_ratio range is incorrect
- Kamoltat Sirivadhna wrote:
> https://github.com/ceph/ceph/pull/45398
merged
- 08:46 PM Backport #55012: octopus: librados: check latest osdmap on ENOENT in pool_reverse_lookup()
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45587
merged
- 08:12 PM Backport #53550: octopus: [RFE] Provide warning when the 'require-osd-release' flag does not matc...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/44260
merged
- 04:58 PM Backport #52078 (Resolved): pacific: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- 04:58 PM Backport #55047 (Resolved): quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.Manifest...
- 04:58 PM Backport #55439 (Resolved): quincy: FAILED ceph_assert due to issue manifest API to the original ...
- 04:56 PM Backport #54468 (Resolved): octopus: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely cl...
- 04:15 PM Backport #54468: octopus: Setting osd_pg_max_concurrent_snap_trims to 0 prematurely clears the sn...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45324
merged
- 04:56 PM Backport #55074 (Resolved): octopus: osd: osd_fast_shutdown_notify_mon not quite right
- 04:13 PM Backport #55074: octopus: osd: osd_fast_shutdown_notify_mon not quite right
- Laura Flores wrote:
> https://github.com/ceph/ceph/pull/45655
merged
- 04:15 PM Bug #54592: partial recovery: CEPH_OSD_OP_OMAPRMKEYRANGE should mark omap dirty
- https://github.com/ceph/ceph/pull/45593 merged
- 03:53 PM Bug #52993: upgrade:octopus-x Test: Upgrade test failed due to timeout of the "ceph pg dump" command
- A similar problem happened on a rados/singleton test for Octopus:
/a/yuriw-2022-04-26_20:58:55-rados-wip-yuri2-testi...
- 05:22 AM Bug #48440: log [ERR] : scrub mismatch
- /home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/681...
- 05:20 AM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/681...
- 05:18 AM Bug #49591: no active mgr (MGR_DOWN)" in cluster log
- /home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/681...
05/10/2022
- 12:38 PM Backport #53971 (In Progress): octopus: BufferList.rebuild_aligned_size_and_memory failure
- https://github.com/ceph/ceph/pull/46216
- 12:32 PM Backport #53972 (In Progress): pacific: BufferList.rebuild_aligned_size_and_memory failure
- https://github.com/ceph/ceph/pull/46215
- 04:38 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
- they did: https://tracker.ceph.com/issues/55074
- 01:59 AM Bug #53327: osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify...
- octopus: osd/OSD: osd_fast_shutdown_notify_mon not quite right #45655
https://github.com/ceph/ceph/pull/45655/commit...
- 04:31 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- jianwei zhang wrote:
> osd_fast_shutdown(true)
> osd_fast_shutdown_notify_mon(false)
> osd_mon_shutdown_timeout(5...
- 12:55 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- osd_fast_shutdown(true)
osd_fast_shutdown_notify_mon(false)
osd_mon_shutdown_timeout(5s) --> cannot send MOSDMar...
- 12:49 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- ...
- 12:44 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- mon.a/c has millions of osd_failure (immediate+timeout). There should be messages forwarded by mon.c.
- 12:41 AM Backport #55067: octopus: osd_fast_shutdown_notify_mon option should be true by default
- ceph version: v15.2.13
I found a problem with the mon election, which should be related to it.
Test steps when ...
- 01:02 AM Bug #53328 (Duplicate): osd_fast_shutdown_notify_mon option should be true by default
05/09/2022
- 04:47 PM Bug #55582 (New): octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails because `r...
- /a/lflores-2022-05-09_14:54:06-rados-wip-55077-octopus-distro-default-smithi/6828789...
- 04:10 PM Bug #48793: out of order op
- @Neha @Ronen this Octopus failure looks a lot like this Tracker. Was the revised scrub code backported to Octopus, or...
- 03:37 PM Backport #55581 (Rejected): octopus: api_list: LibRadosList.EnumerateObjects and LibRadosList.Enu...
- 03:35 PM Bug #52553: pybind: rados.RadosStateError raised when closed watch object goes out of scope after...
- /a/lflores-2022-05-04_18:59:38-rados-wip-55077-octopus-distro-default-smithi/6821227...
- 03:31 PM Bug #48899 (Pending Backport): api_list: LibRadosList.EnumerateObjects and LibRadosList.Enumerate...
- /a/lflores-2022-05-04_18:59:38-rados-wip-55077-octopus-distro-default-smithi/6820998...
- 11:43 AM Bug #54182: OSD_TOO_MANY_REPAIRS cannot be cleared in >=Octopus
- I just observed this issue once more and forgot to drop the info that a restart of an OSD actually resets this counte...
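A hedged sketch of the restart workaround described above (OSD id and service manager are illustrative):

  systemctl restart ceph-osd@12        # classic deployments
  ceph orch daemon restart osd.12      # cephadm deployments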
- 10:08 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- (upgrading and restarting OSDs is probably more accurate wording). If I upgrade node #2, an OSD on node #1 would die with ...
- 10:07 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- Always happens when you upgrade nodes, probably some timing issue with PGs going or flapping primary. I never have de...
- 10:06 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- 'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, Obj...
- 07:32 AM Bug #55573: stretch mode: be more sane about changing different size/min_size
- Realized my suggestion/formula in the mailing list wasn't good :)
This is what I intended originally:
- degraded ...
05/06/2022
- 04:46 PM Bug #55573 (New): stretch mode: be more sane about changing different size/min_size
- From the mailing list:
I created 2 additional pools, each with a matching stretch rule:
- size=2/min=1 (not advised...
- 01:01 AM Bug #55549: OSDs crashing
- After days of fighting this (it's on a production cluster) I finally gave up on the least important of the pools -- t...
05/05/2022
- 04:23 PM Bug #47025: rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNotify failed
- This is from the 16.2.8 run.
/a/yuriw-2022-05-04_20:09:21-rados-pacific-distro-default-smithi/6821705...
- 03:22 PM Backport #55439: quincy: FAILED ceph_assert due to issue manifest API to the original object
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/46061
merged
- 03:20 PM Backport #55047: quincy: rados/test.sh hangs while running LibRadosTwoPoolsPP.ManifestFlushDupCount
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/45624
merged
- 03:13 PM Bug #55559 (Duplicate): osd-backfill-stats.sh fails in TEST_backfill_ec_prim_out
- /a/yuriw-2022-04-28_14:23:18-rados-wip-yuri-testing-2022-04-27-1456-quincy-distro-default-smithi/6811107...
- 01:37 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Can I kindly ask if there's an estimate when this will be fixed and backported? We have customers that have been in t...
05/04/2022
- 10:20 PM Bug #55549 (Resolved): OSDs crashing
- My apologies if this is the wrong project; I'm so lost on this particular issue that I'm not even sure where to ask f...
- 08:11 PM Bug #55407: quincy osd's fail to boot and crash
- It seems I messed up everything... Let me start over.
I have a ceph cluster that has been running for a looooong time. Recent...
- 05:56 PM Bug #55407: quincy osd's fail to boot and crash
- Gonzalo Aguilar Delgado wrote:
> It doesn't matter. This is just a side effect. I mean... The bug is not caused by t...
- 05:52 PM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
- I think the lack of @-2@ (@ENOENT@) **might** be caused by the errno normalization @Objecter@ has.
- 07:42 AM Bug #53789: CommandFailedError (rados/test_python.sh): "RADOS object not found" causes test_rados...
- I hit another issue when socket failure injection is active while running the tests. I think this is not only the...
- 05:43 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- To judge how severe the problem really is, we need to know whether the stall is permanent (PG gets stuck and t...
- 01:35 PM Bug #52624: qa: "Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)"
- "These PG_AVAILBILITY warnings are frequently seen with snap-schedule teuthology jobs.":https://pulpito.ceph.com/mcha...
- 05:37 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- Just for the record: we suspect the issue is related to the error injection in async-msgr. Some runs without it are sup...
- 11:30 AM Backport #55543 (In Progress): quincy: should use TCMalloc for better performance
- https://github.com/ceph/ceph/pull/47927
- 11:30 AM Backport #55542 (Rejected): octopus: should use TCMalloc for better performance
- 11:30 AM Backport #55541 (In Progress): pacific: should use TCMalloc for better performance
- https://github.com/ceph/ceph/pull/51282
- 11:29 AM Bug #55519 (Pending Backport): should use TCMalloc for better performance
- 06:13 AM Documentation #46120 (Resolved): Improve ceph-objectstore-tool documentation
- This issue has been resolved. The ceph-objectstore-tool documentation now exists, and there's even a good manpage.
...
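A hedged example of the tool the documentation now covers (data path illustrative; the OSD must be stopped first):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list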
05/03/2022
- 07:48 PM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
- Myoungwon Oh wrote:
> https://github.com/ceph/ceph/pull/46120
Thanks for looking into it and creating the backport!
- 02:18 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
- https://github.com/ceph/ceph/pull/46120
- 01:40 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
- I think this is the same issue as https://tracker.ceph.com/issues/50806.
This issue was already fixed, but not backp...
- 01:07 AM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
- Sure.
- 07:46 PM Backport #50893 (In Progress): pacific: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recover...
- 06:26 PM Backport #55019: octopus: partial recovery: CEPH_OSD_OP_OMAPRMKEYRANGE should mark omap dirty
- Christian Rohmann wrote:
> Sorry for being a nag ... I initially reported https://tracker.ceph.com/issues/53663 and ...
- 04:41 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- 玮文 胡 wrote:
> Maybe we should fix the release note(https://docs.ceph.com/en/latest/releases/quincy/) first? The work... - 04:32 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- 玮文 胡 wrote:
> https://github.com/ceph/ceph/pull/46124
>
> Tested locally with
>
> [...]
Thank you.
- 04:27 PM Bug #55383 (Fix Under Review): monitor cluster logs(ceph.log) appear empty until rotated
- 09:55 AM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- Maybe we should fix the release note(https://docs.ceph.com/en/latest/releases/quincy/) first? The workaround there is...
- 09:44 AM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- https://github.com/ceph/ceph/pull/46124
Tested locally with... - 03:49 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- Ran 100 thrash-erasure-code-big tests in octopus, and the `wait_for_recovery` assertion occurred 18/100 times, with 1...
- 11:19 AM Bug #55407: quincy osd's fail to boot and crash
- It doesn't matter. This is just a side effect. I mean... The bug is not caused by the tool.
The bug is caused becau...
- 07:23 AM Bug #55519 (Fix Under Review): should use TCMalloc for better performance
- 07:22 AM Bug #55519 (Pending Backport): should use TCMalloc for better performance
- We had been using TCMalloc in older releases, but somehow we stopped doing so. Let's bring it back.
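For context, a hedged sketch of selecting the allocator at build time via the ALLOCATOR cmake option (the value shown is what this ticket proposes to restore):

  # from a ceph build directory
  cmake -DALLOCATOR=tcmalloc ..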
05/02/2022
- 06:50 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- To me it looks like this is the problem:
https://github.com/ceph/ceph/commit/7c84e06e6f846f6b4b6fd959218b4d474520f429...
- 05:29 PM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
- Myoungwon Oh: Seeing this in pacific as well, can you confirm if it is the same issue?
/a/yuriw-2022-04-30_17:01:...
- 02:12 PM Bug #53789 (In Progress): CommandFailedError (rados/test_python.sh): "RADOS object not found" cau...
- 01:11 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- Scheduled another run with just the rados/verify test that failed and I can see this happen frequently:
/a/amathu...
- 01:07 PM Backport #55513 (In Progress): quincy: mount.ceph fails to understand AAAA records from SRV record
- 12:57 PM Backport #55513 (Resolved): quincy: mount.ceph fails to understand AAAA records from SRV record
- https://github.com/ceph/ceph/pull/46113
- 01:03 PM Backport #55514 (In Progress): pacific: mount.ceph fails to understand AAAA records from SRV record
- 12:57 PM Backport #55514 (Resolved): pacific: mount.ceph fails to understand AAAA records from SRV record
- https://github.com/ceph/ceph/pull/46112
- 12:52 PM Bug #47300 (Pending Backport): mount.ceph fails to understand AAAA records from SRV record
- 08:34 AM Bug #47300 (Resolved): mount.ceph fails to understand AAAA records from SRV record
05/01/2022
- 05:40 AM Bug #43887 (Fix Under Review): ceph_test_rados_delete_pools_parallel failure
- https://github.com/ceph/ceph/pull/46099
04/28/2022
- 09:29 PM Bug #55488 (New): ENOENT on clone on EC non-primary shard
- ...
- 06:59 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2022-04-27_02:52:22-rados-pacific-distro-default-smithi/6807766...
04/27/2022
- 09:51 PM Backport #55439 (In Progress): quincy: FAILED ceph_assert due to issue manifest API to the origin...
- 09:25 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- Laura Flores wrote:
> This one looks somewhat different from the other reported failures. First of all, it failed on...
- 05:37 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- Let's discuss this on the next RADOS Team Meeting.
- 06:07 PM Bug #55424 (Won't Fix): ceph-mon process exit in dead status, which backtrace displayed has bloc...
- Sorry, the version is EOL :-(.
- 06:06 PM Bug #55419 (Resolved): cephtool/test.sh: failure on blocklist testing
- 05:58 PM Bug #55440: osd-scrub-test.sh: TEST_scrub_test failed due to inconsistent PG
- ...
- 05:56 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
- Laura Flores wrote:
> /a/yuriw-2022-04-26_00:11:14-rados-wip-55324-pacific-backport-distro-default-smithi/6805265/re...
- 05:49 PM Bug #55407: quincy osd's fail to boot and crash
- Hello Gonzalo!
Just a quick note from a bug scrub: we don't support mixing the tool from a newer release with OSDs fr...
- 05:39 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- This was discussed in the rados meeting this week. Laura is trying to check if the bug exists in Octopus or not, to h...
- 05:34 PM Bug #55433 (Closed): common: FAILED ceph_assert(((lock).is_locked()))
- The fix will be merged with the original PR.
- 10:24 AM Bug #47300 (Fix Under Review): mount.ceph fails to understand AAAA records from SRV record
04/26/2022
- 03:58 PM Bug #52657: MOSDPGLog::encode_payload(uint64_t): Assertion `HAVE_FEATURE(features, SERVER_NAUTILUS)'
- This one looks somewhat different from the other reported failures. First of all, it failed on a rados/verify test, n...
- 02:33 PM Bug #49754: osd/OSD.cc: ceph_abort_msg("abort() called") during OSD::shutdown()
- /a/yuriw-2022-04-26_00:11:14-rados-wip-55324-pacific-backport-distro-default-smithi/6805265/remote/smithi061/crash/20...
- 10:27 AM Bug #55450 (Resolved): [DOC] stretch_rule defined in the doc needs updating
- In section [1], the stretch_rule to be added to the crush map needs to be updated.
min size and max size par...
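For reference, a hedged sketch of the usual workflow for applying a corrected stretch_rule to the crush map (file names illustrative; the rule body itself is what the doc fix addresses):

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # edit the stretch_rule section of crush.txt per the updated doc, then:
  crushtool -c crush.txt -o crush.new.bin
  ceph osd setcrushmap -i crush.new.bin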
04/25/2022
- 10:11 PM Bug #55433: common: FAILED ceph_assert(((lock).is_locked()))
- https://github.com/ceph/ceph/pull/46028 has been merged to unblock other master PR merges.
- 06:28 PM Bug #55433 (Fix Under Review): common: FAILED ceph_assert(((lock).is_locked()))
- 05:36 PM Bug #55433 (Closed): common: FAILED ceph_assert(((lock).is_locked()))
- Seen in jenkins make check tests, i.e. https://jenkins.ceph.com/job/ceph-pull-requests/94227/console...
- 09:59 PM Bug #55440 (New): osd-scrub-test.sh: TEST_scrub_test failed due to inconsistent PG
- /a/yuriw-2022-04-22_13:56:48-rados-wip-yuri2-testing-2022-04-22-0500-distro-default-smithi/6800338...
- 07:31 PM Bug #44595: cache tiering: Error: oid 48 copy_from 493 returned error code -2
- /a/yuriw-2022-04-25_14:14:44-rados-wip-yuri3-testing-2022-04-22-0534-quincy-distro-default-smithi/6805186...
- 07:07 PM Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC back...
- /a/yuriw-2022-04-22_21:06:04-rados-wip-yuri3-testing-2022-04-22-0534-quincy-distro-default-smithi/6802072
- 07:00 PM Backport #55439 (Resolved): quincy: FAILED ceph_assert due to issue manifest API to the original ...
- https://github.com/ceph/ceph/pull/46061
- 06:59 PM Bug #54509 (Pending Backport): FAILED ceph_assert due to issue manifest API to the original object
- 06:58 PM Bug #54509: FAILED ceph_assert due to issue manifest API to the original object
- /a/yuriw-2022-04-22_21:06:04-rados-wip-yuri3-testing-2022-04-22-0534-quincy-distro-default-smithi/6802065/remote/smit...
- 06:56 PM Bug #55435: mon/Elector: notify_ranked_removed() does not properly erase dead_ping in the case of...
- In an example scenario where we have 5 monitors:
rank_size = 5
mon.a (rank 0)
mon.b (rank 1)
mon.c (rank 2)
mo...
- 06:29 PM Bug #55435 (Resolved): mon/Elector: notify_ranked_removed() does not properly erase dead_ping in ...
- 05:54 PM Bug #55407: quincy osd's fail to boot and crash
- Gonzalo Aguilar Delgado wrote:
> Neha Ojha wrote:
> > Did you see the same segmentation fault in quincy and pacific...
- 05:50 PM Bug #55407: quincy osd's fail to boot and crash
- The situation is even worse. Any osd created with ceph version 17.1.0 (c675060073a05d40ef404d5921c81178a52af6e0) quin...
- 06:23 AM Bug #55407: quincy osd's fail to boot and crash
- 06:23 AM Bug #55407: quincy osd's fail to boot and crash
- I managed to reproduce...
I installed an OSD with the pacific version. Then I let it run for a while (10 min or so)...
- 05:51 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- Looks like it is happening because of mon/LogMonitor changing it back to RADOS.
- 05:49 PM Bug #55383 (Triaged): monitor cluster logs(ceph.log) appear empty until rotated
- 05:49 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- If you are okay with it, can you please send a quick fix?
- 05:48 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- 玮文 胡 wrote:
> I suspect this issue is due to https://github.com/ceph/ceph/commit/7c84e06e6f846f6b4b6fd959218b4d47452...
- 05:19 PM Bug #55419 (Fix Under Review): cephtool/test.sh: failure on blocklist testing
- 03:23 PM Bug #54458: osd-scrub-snaps.sh: TEST_scrub_snaps failed due to malformed log message
- Perhaps this has resurfaced?
/a/yuriw-2022-04-22_13:56:48-rados-wip-yuri2-testing-2022-04-22-0500-distro-default-s...
- 09:42 AM Bug #55424 (Won't Fix): ceph-mon process exit in dead status, which backtrace displayed has bloc...
- Please see abc.png.
LevelDBstore::close
set thread quit flag, compact_queue_stop = true.
then send signal ...
04/23/2022
- 09:59 AM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- I suspect this issue is due to https://github.com/ceph/ceph/commit/7c84e06e6f846f6b4b6fd959218b4d474520f429 and have ...
- 12:04 AM Bug #55419 (In Progress): cephtool/test.sh: failure on blocklist testing
04/22/2022
- 10:00 PM Bug #52153: crash: int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef): abort
- I have also seen this crash on my monitor running 16.2.7.
- 09:24 PM Bug #55407: quincy osd's fail to boot and crash
- I saw the stacktrace. This time v17.2.0. Latest...
- 09:20 PM Bug #55407: quincy osd's fail to boot and crash
- Ok. This is the situation:
1.- OSD built from scratch in pacific. (docker pull ceph/daemon:latest-pacific)
2.- U...
- 08:47 PM Bug #55407: quincy osd's fail to boot and crash
- Igor Fedotov wrote:
> >2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluefs _replay 0x11000: stop: unrecognized op 12...
- 03:36 PM Bug #55407: quincy osd's fail to boot and crash
- >2022-04-22T13:34:42.419+0000 7fd5798ed080 -1 bluefs _replay 0x11000: stop: unrecognized op 12
@Gonzalo, AFAIU you...
- 01:38 PM Bug #55407: quincy osd's fail to boot and crash
- Neha Ojha wrote:
> Did you see the same segmentation fault in quincy and pacific? Were you testing a custom build of...
- 09:21 PM Bug #55419 (Resolved): cephtool/test.sh: failure on blocklist testing
- /a/yuriw-2022-04-22_13:56:48-rados-wip-yuri2-testing-2022-04-22-0500-distro-default-smithi/6800292...
- 07:52 PM Bug #24057 (Rejected): cbt fails to copy results to the archive dir
- 06:27 PM Bug #43189 (Resolved): pgs stuck in laggy state
- 06:27 PM Backport #43232 (Rejected): nautilus: pgs stuck in laggy state
- Nautilus is EOL
- 06:26 PM Bug #41385 (Resolved): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(from...
- 06:26 PM Backport #41731 (Rejected): nautilus: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_mis...
- Nautilus is EOL
- 02:46 PM Backport #55405 (In Progress): quincy: librados C++ API requires C++17 to build
- https://github.com/ceph/ceph/pull/46005
- 02:41 PM Backport #55406 (In Progress): pacific: librados C++ API requires C++17 to build
- https://github.com/ceph/ceph/pull/46004
04/21/2022
- 11:11 PM Bug #55407 (Need More Info): quincy osd's fail to boot and crash
- Did you see the same segmentation fault in quincy and pacific? Were you testing a custom build of ceph (17.1.0 is a d...
- 08:20 PM Bug #55407 (Rejected): quincy osd's fail to boot and crash
- I have a cluster with pacific. One of the osd started to crash...
So I zapped the disk and recreated it. I foun...
- 08:21 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Gonzalo Aguilar Delgado wrote:
> I suppose this thread can be closed as soon as the fix is in master. But just for r...
- 06:10 PM Bug #53729: ceph-osd takes all memory before oom on boot
- I suppose this thread can be closed as soon as the fix is in master. But just for reference, in case has something to...
- 05:01 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Mykola Golub wrote:
> Gonzalo Aguilar Delgado wrote:
>
> > Mykola, specially thank you for doing the patch.
>
...
- 04:39 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Gonzalo Aguilar Delgado wrote:
> Mykola, specially thank you for doing the patch.
I am not the author of the pa...
- 04:34 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Yesssss!!! Great job team!
It's up & running. It purged dups, booted the ceph-osd, and used only 1/2 GB of RAM fully booted. ...
- 04:23 PM Bug #53729: ceph-osd takes all memory before oom on boot
- Mykola Golub wrote:
> Gonzalo Aguilar Delgado wrote:
>
> > CEPH_ARGS="--osd_pg_log_trim_max=10000 --osd_max_pg_lo...
- 07:30 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Gonzalo Aguilar Delgado wrote:
> CEPH_ARGS="--osd_pg_log_trim_max=10000 --osd_max_pg_log_entries=2000 " LD_LIBRARY...
- 07:01 AM Bug #53729: ceph-osd takes all memory before oom on boot
- Nitzan Mordechai wrote:
> Gonzalo Aguilar Delgado wrote:
> > Nitzan Mordechai wrote:
> > > Can you please add the ...
- 07:28 PM Bug #55383: monitor cluster logs(ceph.log) appear empty until rotated
- - Just for completeness: as expected, this issue happened again today in the gibba cluster during the log rotation window and...
- 05:14 PM Feature #54115 (In Progress): Log pglog entry size in OSD log if it exceeds certain size limit
- 04:25 PM Backport #55406 (In Progress): pacific: librados C++ API requires C++17 to build
- https://github.com/ceph/ceph/pull/46004
- 04:25 PM Backport #55405 (In Progress): quincy: librados C++ API requires C++17 to build
- 04:22 PM Bug #55233: librados C++ API requires C++17 to build
- The C++ API was created only for internal use. It should not be held to such a guarantee. At least, that's what I u...
- 04:20 PM Bug #55233 (Pending Backport): librados C++ API requires C++17 to build
- 03:31 PM Feature #53050 (Pending Backport): Support blocklisting a CIDR range
- 03:31 PM Feature #53050 (Resolved): Support blocklisting a CIDR range
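A hedged usage sketch of the feature above (the range subcommands were added by this work; the CIDR value is illustrative):

  ceph osd blocklist range add 192.168.1.0/24
  ceph osd blocklist range rm 192.168.1.0/24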
- 12:21 PM Feature #55402 (New): rgw: Add dbstore & cloud-transition test-suites to teuthology
- Add new test suites to teuthology for the RGW features below:
* cloud-transition
* dbstore backend
- 03:40 AM Bug #55355: osd thread deadlock
- Thanks for your reply @Radoslaw Zarzynski.
I checked the latest code and found that the code logic is the same. I th...