Activity
From 04/26/2021 to 05/25/2021
05/25/2021
- 10:24 PM Bug #50775: mds and osd unable to obtain rotating service keys
- Out of curiosity, how many iterations of bugshell does it take to reproduce? I might try it on the weekend, but it w...
- 10:21 PM Bug #50775: mds and osd unable to obtain rotating service keys
- The logs that you provided are weird. Some log messages that should be there are not there. For example, I don't se...
- 08:20 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- Here's a bit more info that may be useful. Only because it's a volume already exported to the container out of the bo...
- 04:06 PM Bug #46847: Loss of placement information on OSD reboot
- It seems it does find the data when I issue @ceph pg repeer $pgid@. Observed on MON 14.2.21 with all OSDs 14.2.15.
- 12:30 PM Cleanup #50925 (Fix Under Review): add backfill_unfound test
- 10:24 AM Bug #50950: MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causing PG stuck a...
- And this is what it looks like from top:...
- 10:18 AM Bug #50950: MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causing PG stuck a...
- Finally, I got the cpu killer stack:...
- 09:37 AM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
- Neha, can you draw any conclusions from the above debug_osd=30 log with this issue?
- 09:36 AM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- Neha Ojha wrote:
> Do the OSDs hitting this assert come up fine on restarting? or are they repeatedly hitting this a...
- 06:46 AM Bug #50657: smart query on monitors
- Sorry, I meant version 16.2.1 (Ubuntu packages), by now 16.2.4 of course
@ceph device ls@ doesn't list any devices...
- 06:38 AM Bug #47380: mon: slow ops due to osd_failure
- https://github.com/ceph/ceph/pull/40033 failed to address this issue, i am creating another issue #50964 to track thi...
- 06:37 AM Bug #50964 (Resolved): mon: slow ops due to osd_failure
- ...
05/24/2021
- 09:46 PM Bug #49052 (Resolved): pick_a_shard() always select shard 0
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:42 PM Backport #50701 (Resolved): nautilus: Data loss propagation after backfill
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41238
m...
- 09:35 PM Backport #50793 (Resolved): octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41321
m...
- 09:30 PM Backport #50703 (Resolved): octopus: Data loss propagation after backfill
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41237
m...
- 09:25 PM Backport #49993 (Resolved): octopus: unittest_mempool.check_shard_select failed
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39978
m...
- 09:25 PM Backport #49053 (Resolved): octopus: pick_a_shard() always select shard 0
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39978
m...
- 03:34 PM Bug #50657: smart query on monitors
- Thanks, Jan-Philipp.
I tried to reproduce this issue and get the empty device name, while not having a sudoer perm...
- 11:54 AM Bug #50775: mds and osd unable to obtain rotating service keys
- bugshell is my test case; mon.b is a peon monitor
- 11:25 AM Bug #50775: mds and osd unable to obtain rotating service keys
- wenge song wrote:
> mds.b unable to obtain rotating service keys,this is mds.b log
2021-05-24T18:48:22.934+0800 7...
- 11:18 AM Bug #50775: mds and osd unable to obtain rotating service keys
- mds.b is unable to obtain rotating service keys; this is the mds.b log
- 11:15 AM Bug #50775: mds and osd unable to obtain rotating service keys
- mon leader log
- 07:48 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> Just to be clear, are you saying that if the proposal with the new keys doesn't get sent becau...
- 07:30 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> Can you share the full monitor logs? Specifically, I'm interested in the log where the follow...
- 10:03 AM Bug #50950 (Won't Fix): MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causin...
- I've been using this mimic cluster (about 530 OSDs) for over a year; recently I found some particular OSDs randomly run int...
- 02:34 AM Bug #50943 (Closed): mon crash due to assert failed
- Ceph version 12.2.11
3 mons, 1 mon can't start up due to assert failed
-6> 2021-05-20 16:11:32.755959 7fffd...
05/23/2021
- 08:53 PM Bug #50775: mds and osd unable to obtain rotating service keys
- Just to be clear, are you saying that if the proposal with the new keys doesn't get sent because trigger_propose() re...
- 08:36 PM Bug #50775: mds and osd unable to obtain rotating service keys
- Can you share the full monitor logs? Specifically, I'm interested in the log where the following excerpt came from
...
05/21/2021
- 09:18 PM Bug #50829: nautilus: valgrind leak in SimpleMessenger
- ...
- 03:11 PM Bug #50681: memstore: apparent memory leak when removing objects
- The ceph-osd had a RES memory footprint of 2.6GB while I created above files.
- 03:08 PM Bug #50681: memstore: apparent memory leak when removing objects
- Greg Farnum wrote:
> How long did you wait to see if memory usage dropped? Did you look at any logs or dump any pool...
- 01:57 PM Backport #50793: octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd res...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41321
merged
- 09:22 AM Cleanup #50925 (Fix Under Review): add backfill_unfound test
- Add a teuthology test that would use a scenario similar to the one described in [1].
[1] https://tracker.ceph.com/issues/...
05/20/2021
- 06:07 PM Bug #48385: nautilus: statfs: a cluster with any up but out osd will report bytes_used == stored
- Fixed starting 14.2.16
- 05:04 PM Backport #50701: nautilus: Data loss propagation after backfill
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41238
merged
- 04:50 PM Backport #50911 (Rejected): nautilus: PGs always go into active+clean+scrubbing+deep+repair in th...
- 04:50 PM Backport #50910 (Rejected): octopus: PGs always go into active+clean+scrubbing+deep+repair in the...
- 04:46 PM Bug #50446: PGs always go into active+clean+scrubbing+deep+repair in the LRC
- This issue exists in nautilus and octopus as well. We might want to take a less intrusive approach for the backports.
- 06:29 AM Bug #50446 (Pending Backport): PGs always go into active+clean+scrubbing+deep+repair in the LRC
- 12:12 PM Bug #50903: ceph_objectstore_tool: Slow ops reported during the test.
- JobId:
/a/sseshasa-2021-05-17_11:08:21-rados-wip-sseshasa-testing-2021-05-17-1504-distro-basic-smithi/6118306
Obs...
- 11:58 AM Bug #50903 (Closed): ceph_objectstore_tool: Slow ops reported during the test.
- 10:05 AM Bug #50775 (Fix Under Review): mds and osd unable to obtain rotating service keys
- 09:57 AM Bug #50775: mds and osd unable to obtain rotating service keys
- wenge song wrote:
> Ilya Dryomov wrote:
> > I posted https://github.com/ceph/ceph/pull/41368, please take a look. ...
- 06:30 AM Backport #50900 (Resolved): pacific: PGs always go into active+clean+scrubbing+deep+repair in the...
- https://github.com/ceph/ceph/pull/42398
- 06:20 AM Backport #50893 (Resolved): pacific: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_s...
- https://github.com/ceph/ceph/pull/46120
- 06:17 AM Bug #50806 (Pending Backport): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.g...
- 12:40 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
- I think pacific.
- 01:39 AM Bug #50743: *: crash in pthread_getname_np
- oh, I mean in general, not necessarily in this case.
This was opened automatically by a telemetry-to-redmine bot t...
05/19/2021
- 11:32 PM Bug #50813 (Fix Under Review): mon/OSDMonitor: should clear new flag when do destroy
- 03:10 AM Bug #47025: rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNotify failed
- There are, at this time, three different versions of this problem as seen in https://tracker.ceph.com/issues/50042#no...
- 03:07 AM Bug #50042: rados/test.sh: api_watch_notify failures
- /a/yuriw-2021-04-30_12:58:14-rados-wip-yuri2-testing-2021-04-29-1501-pacific-distro-basic-smithi/6086155 is the same ...
- 01:37 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> I posted https://github.com/ceph/ceph/pull/41368, please take a look. It's probably not going...
- 12:19 AM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
- If the replica has a log entry for write on the object more recent than last_complete_ondisk (iirc), it will bounce t...
- 12:05 AM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
- The following indicates that it is not safe to do a balanced read from the secondary at this time. Making the "c...
05/18/2021
- 09:04 PM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
- Myoungwon Oh wrote:
> https://github.com/ceph/ceph/pull/41373
how far back should we backport this?
- 07:23 PM Bug #50806 (Fix Under Review): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.g...
- 08:55 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
- https://github.com/ceph/ceph/pull/41373
- 07:46 PM Bug #50866 (New): osd: stat mismatch on objects
- ...
- 11:17 AM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- /a/sseshasa-2021-05-17_11:08:21-rados-wip-sseshasa-testing-2021-05-17-1504-distro-basic-smithi/6118250
- 09:52 AM Feature #49089 (Fix Under Review): msg: add new func support_reencode
- 09:33 AM Bug #50775: mds and osd unable to obtain rotating service keys
- I posted https://github.com/ceph/ceph/pull/41368, please take a look. It's probably not going to solve an edge case ...
- 09:01 AM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
- Can I get detailed logs in /remote? It seems that /remote has been removed.
This log seems very strange because Promot...
- 08:46 AM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
- Saw this during the below run:
/a/sseshasa-2021-05-17_11:08:21-rados-wip-sseshasa-testing-2021-05-17-1504-distro-bas...
- 06:12 AM Bug #50853 (Can't reproduce): libcephsqlite: Core dump while running test_libcephsqlite.sh.
- Observed on master during following run:
/a/sseshasa-2021-05-17_11:08:21-rados-wip-sseshasa-testing-2021-05-17-1504-...
- 03:39 AM Bug #50042: rados/test.sh: api_watch_notify failures
- Deepika Upadhyay wrote:
> [...]
>
> /ceph/teuthology-archive/ideepika-2021-05-17_08:09:09-rados-wip-yuri2-testing...
- 01:56 AM Bug #50743: *: crash in pthread_getname_np
- Yaarit Hatuka wrote:
> Hi Patrick,
>
> We don't have the signal number yet in the telemetry crash reports.
>
>...
05/17/2021
- 10:34 PM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
- This issue seems very similar to https://tracker.ceph.com/issues/49427#change-189520, which was fixed by https://gith...
- 11:36 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
- additional relevant log: ...
- 02:50 PM Bug #50842 (Resolved): pacific: recovery does not complete because of rw_manager lock not being ...
- recovery of the snapshot should be complete, and proceed to the head object:...
- 02:23 PM Bug #50743: *: crash in pthread_getname_np
- Hi Patrick,
We don't have the signal number yet in the telemetry crash reports.
You can see other crash events ...
- 10:12 AM Bug #50042: rados/test.sh: api_watch_notify failures
- ...
- 07:13 AM Bug #50657: smart query on monitors
- Sure:...
- 01:16 AM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
- 01:16 AM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
- ok
05/16/2021
- 03:55 PM Backport #50831 (In Progress): pacific: pacific ceph-mon: mon initial failed on aarch64
- https://github.com/ceph/ceph/pull/53463
- 03:51 PM Bug #50384 (Pending Backport): pacific ceph-mon: mon initial failed on aarch64
- 03:47 PM Bug #50384 (Resolved): pacific ceph-mon: mon initial failed on aarch64
05/15/2021
- 09:38 PM Bug #50829 (New): nautilus: valgrind leak in SimpleMessenger
- ...
- 09:20 AM Bug #50775: mds and osd unable to obtain rotating service keys
- No, it's the other way around -- there is some suspicion that https://tracker.ceph.com/issues/50390 is caused by thes...
- 09:11 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> But it is the original octopus release with no substantial backports, correct?
>
> In parti...
- 08:54 AM Bug #50775: mds and osd unable to obtain rotating service keys
- But it is the original octopus release with no substantial backports, correct?
In particular, can you confirm that...
- 08:32 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> So you are running the original octopus release with custom patches added in?
just a little...
- 07:45 AM Bug #50775: mds and osd unable to obtain rotating service keys
- So you are running the original octopus release with custom patches added in?
- 05:26 AM Bug #50775: mds and osd unable to obtain rotating service keys
- Ilya Dryomov wrote:
> Hi Song,
>
> Could you please confirm the ceph version with the output of "ceph-mds --versi...
- 02:00 AM Bug #50743 (Need More Info): *: crash in pthread_getname_np
- How do we know what the signal number was? Not clear to me what to do with this. I don't see anything obviously wrong...
05/14/2021
- 09:58 PM Bug #50692 (Resolved): nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
- 09:56 PM Bug #50746: osd: terminate called after throwing an instance of 'std::out_of_range'
- I ran the same command "MDS=3 OSD=3 MON=3 MGR=1 ../src/vstart.sh -n -X -G --msgr1 --memstore" and everything works fi...
- 09:43 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- Do the OSDs hitting this assert come up fine on restarting? or are they repeatedly hitting this assert?
- 01:34 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- Just purely based on the numbering of OSDs I know for a fact that osd.47 was upgraded before osd.59 so based on that ...
- 08:51 PM Bug #50775: mds and osd unable to obtain rotating service keys
- Hi Song,
Could you please confirm the ceph version with the output of "ceph-mds --version"?
- 08:33 PM Bug #50657 (In Progress): smart query on monitors
- Hi Jan-Philipp,
Thanks for reporting this.
Can you please provide the output of `df` on the host where a monito...
- 05:46 PM Backport #50703: octopus: Data loss propagation after backfill
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41237
merged
- 05:43 PM Backport #49993: octopus: unittest_mempool.check_shard_select failed
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/39978
merged
- 05:43 PM Backport #49053: octopus: pick_a_shard() always select shard 0
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/39978
merged
- 04:42 PM Bug #47380 (Resolved): mon: slow ops due to osd_failure
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 03:47 PM Backport #49919 (Resolved): nautilus: mon: slow ops due to osd_failure
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41213
m...
- 02:02 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- ...
- 07:40 AM Bug #50813 (Duplicate): mon/OSDMonitor: should clear new flag when do destroy
- The new flag in the osdmap affects whether the osd is marked up, according to the option
mon_osd_auto_mark_new_in. So it is safer...
- 01:41 AM Bug #50806 (Resolved): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_lo...
- ...
05/13/2021
- 11:30 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- The crashed OSD was running 15.2.11 but do you happen to know what version osd.59(the primary for pg 6.7a) was runnin...
- 03:59 PM Bug #50692 (Fix Under Review): nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
- 03:15 PM Bug #50761: ceph mon hangs forever while trying to parse config
- removed logs since they are unrelated
- 08:08 AM Backport #50793 (In Progress): octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-p...
- 07:05 AM Backport #50793 (Resolved): octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
- https://github.com/ceph/ceph/pull/41321
- 08:07 AM Backport #50794 (In Progress): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-p...
- 07:05 AM Backport #50794 (Resolved): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
- https://github.com/ceph/ceph/pull/41320
- 07:12 AM Backport #50792 (In Progress): nautilus: osd: FAILED ceph_assert(recovering.count(*i)) after non-...
- The backport is included in https://github.com/ceph/ceph/pull/41293
- 07:05 AM Backport #50792 (Rejected): nautilus: osd: FAILED ceph_assert(recovering.count(*i)) after non-pri...
- 07:05 AM Backport #50797 (Resolved): pacific: mon: spawn loop after mon reinstalled
- https://github.com/ceph/ceph/pull/41768
- 07:05 AM Backport #50796 (Resolved): octopus: mon: spawn loop after mon reinstalled
- https://github.com/ceph/ceph/pull/41621
- 07:05 AM Backport #50795 (Resolved): nautilus: mon: spawn loop after mon reinstalled
- https://github.com/ceph/ceph/pull/41762
- 07:02 AM Bug #50351 (Pending Backport): osd: FAILED ceph_assert(recovering.count(*i)) after non-primary os...
- 07:01 AM Bug #50230 (Pending Backport): mon: spawn loop after mon reinstalled
- 06:55 AM Backport #50791 (Resolved): pacific: osd: write_trunc omitted to clear data digest
- https://github.com/ceph/ceph/pull/42019
- 06:55 AM Backport #50790 (Resolved): octopus: osd: write_trunc omitted to clear data digest
- https://github.com/ceph/ceph/pull/41620
- 06:55 AM Backport #50789 (Rejected): nautilus: osd: write_trunc omitted to clear data digest
- 06:54 AM Bug #50763 (Pending Backport): osd: write_trunc omitted to clear data digest
05/12/2021
- 10:44 PM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
- Myoungwon Oh, any idea what could be causing this? Feel free to unassign, if you are not aware of what is causing thi...
- 10:39 PM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
- /a/yuriw-2021-05-11_19:33:39-rados-wip-yuri2-testing-2021-05-11-1032-pacific-distro-basic-smithi/6110085
- 06:02 PM Bug #50761: ceph mon hangs forever while trying to parse config
- Ilya Dryomov wrote:
> Are you sure that client.admin.4638.log was generated by your hello-world binary? Because the...
- 05:15 PM Bug #50761: ceph mon hangs forever while trying to parse config
- Are you sure that client.admin.4638.log was generated by your hello-world binary? Because the complete log attached ...
- 04:38 PM Bug #50761: ceph mon hangs forever while trying to parse config
- logs: https://drive.google.com/file/d/1RdLToyo3vpL3nFMI2U3hfGrY4tpfQ9Az/view?usp=sharing
- 04:03 PM Bug #50761: ceph mon hangs forever while trying to parse config
- Ilya Dryomov wrote:
> Where client.admin.4803.log came from? How was it captured?
I ran a hello world script:
<...
- 03:51 PM Bug #50761: ceph mon hangs forever while trying to parse config
- Where client.admin.4803.log came from? How was it captured?
- 02:24 PM Bug #50775: mds and osd unable to obtain rotating service keys
- I will reproduce, fix and verify this bug. Then will send code review of bugfix.
- 02:23 PM Bug #50775 (Fix Under Review): mds and osd unable to obtain rotating service keys
- version-15.2.0
error message:
2021-05-04T05:51:54.719+0800 7f105b2737c0 -1 mds.c unable to obtain rotating serv...
- 02:19 PM Bug #50384 (Fix Under Review): pacific ceph-mon: mon initial failed on aarch64
- 12:09 PM Backport #50666 (Resolved): pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat fai...
- 12:09 PM Bug #50595 (Resolved): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
- 07:06 AM Bug #50747 (Fix Under Review): nautilus: osd: backfill_unfound state reset to clean after osd res...
- 05:46 AM Bug #50763 (Fix Under Review): osd: write_trunc omitted to clear data digest
- 02:20 AM Bug #50763: osd: write_trunc omitted to clear data digest
- https://github.com/ceph/ceph/pull/41290
- 02:13 AM Bug #50763 (Resolved): osd: write_trunc omitted to clear data digest
05/11/2021
- 07:43 PM Backport #50666 (In Progress): pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat ...
- 04:32 PM Bug #50692: nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
- /a/yuriw-2021-05-11_14:36:21-rados-wip-yuri2-testing-2021-05-10-1557-nautilus-distro-basic-smithi/6109477
- 03:37 PM Bug #50761 (New): ceph mon hangs forever while trying to parse config
- ...
- 08:58 AM Bug #50004 (Resolved): mon: Modify Paxos trim logic to be more efficient
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:57 AM Bug #50395 (Resolved): filestore: ENODATA error after directory split confuses transaction
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:52 AM Backport #50125 (Resolved): nautilus: mon: Modify Paxos trim logic to be more efficient
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41099
m...
- 08:52 AM Backport #50506 (Resolved): nautilus: mon/MonClient: reset authenticate_err in _reopen_session()
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41016
m...
- 08:52 AM Backport #50481 (Resolved): nautilus: filestore: ENODATA error after directory split confuses tra...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40987
m...
- 08:50 AM Backport #50504 (Resolved): octopus: mon/MonClient: reset authenticate_err in _reopen_session()
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41017
m...
- 08:50 AM Backport #50479 (Resolved): octopus: filestore: ENODATA error after directory split confuses tran...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40988
m...
- 07:48 AM Backport #49918 (Resolved): pacific: mon: slow ops due to osd_failure
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41090
m...
- 07:45 AM Backport #50750 (Resolved): octopus: max_misplaced was replaced by target_max_misplaced_ratio
- https://github.com/ceph/ceph/pull/41624
- 07:45 AM Backport #50749 (Rejected): nautilus: max_misplaced was replaced by target_max_misplaced_ratio
- 07:45 AM Backport #50748 (Resolved): pacific: max_misplaced was replaced by target_max_misplaced_ratio
- https://github.com/ceph/ceph/pull/42250
- 07:41 AM Bug #50745 (Pending Backport): max_misplaced was replaced by target_max_misplaced_ratio
- would be great if we can backport https://github.com/ceph/ceph/pull/41207 along with the https://github.com/ceph/ceph...
- 04:22 AM Bug #50745 (Resolved): max_misplaced was replaced by target_max_misplaced_ratio
- but the document was not sync'ed.
- 07:07 AM Bug #50747 (Fix Under Review): nautilus: osd: backfill_unfound state reset to clean after osd res...
- On nautilus we have been observing an issue when an EC pg is in active+backfill_unfound+degraded state (which happens...
- 07:00 AM Bug #50351 (Fix Under Review): osd: FAILED ceph_assert(recovering.count(*i)) after non-primary os...
- In the mailing list thread [1] I provided some details why I think the current behaviour of `PrimaryLogPG::on_failed_...
- 04:38 AM Bug #50746 (New): osd: terminate called after throwing an instance of 'std::out_of_range'
- ...
- 02:30 AM Bug #50743 (Need More Info): *: crash in pthread_getname_np
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6f...
05/10/2021
- 05:07 PM Bug #50681: memstore: apparent memory leak when removing objects
- How long did you wait to see if memory usage dropped? Did you look at any logs or dump any pool object info?
I rea...
- 07:35 AM Bug #46670: refuse to remove mon from the monmap if the mon is in quorum
- I still believe the "extra" security is important, I mean we do this for pools, mons are almost equally critical...
- 04:03 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
- I have encountered this 4 more times on a 20-OSD cluster now running 16.2.3. If needed, I can provide more info.
05/09/2021
- 05:17 PM Bug #45690: pg_interval_t::check_new_interval is overly generous about guessing when EC PGs could...
- This description doesn't seem quite right to me -- OSDs 1-3 were part of the interval in step 4 so they know that not...
- 06:09 AM Bug #45390 (Closed): FreeBSD: osdmap decode and encode does not give the same OSDMap
- I assume this is fixed by now since the FreeBSD port is under active development? :)
- 05:45 AM Bug #46670: refuse to remove mon from the monmap if the mon is in quorum
- I'm inclined to say that this is fine as-is? I don't know off-hand how we remove monitors from quorum from the Ceph CLI.
- 05:42 AM Bug #46876 (Resolved): osd/ECBackend: optimize remaining read as readop contain multiple objects
- 05:08 AM Feature #47666: Ceph pool history
- Much of this is also maintained in the audit log, but that's not easily digestible back in by Ceph.
- 03:42 AM Feature #48151 (Closed): osd: allow remote read by calling cls method from within cls context
- Remote calls like this are unfortunately not plausible to implement within the object handling workflow.
- 03:19 AM Support #48530 (Closed): ceph pg status in incomplete.
- This kind of question is best served on the ceph-users@ceph.io mailing list if you can't find the answer in the docum...
05/08/2021
- 11:04 PM Bug #49158: doc: ceph-monstore-tools might create wrong monitor store
- This problem was fixed in the following PR.
https://github.com/ceph/ceph/pull/39288
- 02:49 PM Bug #48468: ceph-osd crash before being up again
- Hi Sage,
Hum, I've finally managed to recover my cluster after countless osd restart procedures until they star...
- 11:21 AM Backport #50701 (In Progress): nautilus: Data loss propagation after backfill
- 08:40 AM Backport #50701 (Resolved): nautilus: Data loss propagation after backfill
- https://github.com/ceph/ceph/pull/41238
- 11:20 AM Backport #50703 (In Progress): octopus: Data loss propagation after backfill
- 08:40 AM Backport #50703 (Resolved): octopus: Data loss propagation after backfill
- https://github.com/ceph/ceph/pull/41237
- 11:19 AM Backport #50702 (In Progress): pacific: Data loss propagation after backfill
- 08:40 AM Backport #50702 (Resolved): pacific: Data loss propagation after backfill
- https://github.com/ceph/ceph/pull/41236
- 09:40 AM Backport #50706 (Resolved): pacific: _delete_some additional unexpected onode list
- https://github.com/ceph/ceph/pull/41680
- 09:40 AM Backport #50705 (Resolved): octopus: _delete_some additional unexpected onode list
- https://github.com/ceph/ceph/pull/41623
- 09:40 AM Backport #50704 (Resolved): nautilus: _delete_some additional unexpected onode list
- https://github.com/ceph/ceph/pull/41682
- 09:37 AM Bug #50466 (Pending Backport): _delete_some additional unexpected onode list
- 08:37 AM Bug #50558 (Pending Backport): Data loss propagation after backfill
- 08:30 AM Backport #50697 (In Progress): pacific: common: the dump of thread IDs is in dec instead of hex
- https://github.com/ceph/ceph/pull/53465
- 08:29 AM Bug #50653 (Pending Backport): common: the dump of thread IDs is in dec instead of hex
- 03:26 AM Feature #49089 (In Progress): msg: add new func support_reencode
- 03:24 AM Support #49268 (Closed): Blocked IOs up to 30 seconds when host powered down
- You can also tune how quickly the OSDs report their peers down from missing heartbeats, but in general losing a monit...
- 03:17 AM Support #49489: Getting Long heartbeat and slow requests on ceph luminous 12.2.13
- This is almost certainly a result of cache tiering (which we generally discourage from use) being a bad fit or incorr...
05/07/2021
- 11:39 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- I have attached a coredump. This hook works fine in 15.2.9. I can also run it fine manually from inside a launched OS...
- 09:57 PM Bug #50659 (Need More Info): Segmentation fault under Pacific 16.2.1 when using a custom crush lo...
- Is it possible for you to capture a coredump? Did the same crush_location_hook work fine on your 15.2.9 cluster?
- 10:12 PM Bug #50637: OSD slow ops warning stuck after OSD fail
- This sounds like a bug; we shouldn't be accounting for down+out osds when counting slow ops.
- 08:57 AM Bug #50637: OSD slow ops warning stuck after OSD fail
- I now zapped and re-created the OSDs on this disk. As expected, purging OSD 580 from the cluster cleared the health w...
- 10:01 PM Bug #50657: smart query on monitors
- Yaarit, can you help take a look at this?
- 09:45 PM Bug #50682: Pacific - OSD not starting after upgrade
- This issue has been fixed by https://github.com/ceph/ceph/pull/40845 and will be released in the next pacific point r...
- 07:48 PM Bug #47949: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
- /a/yuriw-2021-05-06_15:20:22-rados-wip-yuri4-testing-2021-05-05-1236-nautilus-distro-basic-smithi/6101282
- 07:47 PM Bug #50692 (Resolved): nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
- ...
- 02:25 PM Bug #50688 (Duplicate): Ceph can't be deployed using cephadm on nodes with /32 ip addresses
- *Preamble*
In certain data centers it is common to assign a /32 ip address to a node and let bgp handle the reacha...
- 12:34 PM Bug #50681: memstore: apparent memory leak when removing objects
- Thanks Greg for your answer. So my expectation was, that at least when there is memory pressure or I am unmounting th...
- 09:16 AM Bug #50683: [RBD] master - cluster [WRN] Health check failed: mon is allowing insecure global_id ...
- Hi Harish,...
- 09:04 AM Bug #50683 (Rejected): [RBD] master - cluster [WRN] Health check failed: mon is allowing insecure...
- Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e826082...
- 09:06 AM Bug #49231: MONs unresponsive over extended periods of time
- After running for a few months with the modified setting, it seems that it fixes the issue. I still see CPU load- and...
- 03:10 AM Backport #49919 (In Progress): nautilus: mon: slow ops due to osd_failure
- 02:13 AM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
- /a/nojha-2021-05-06_22:58:00-rados-wip-default-mclock-2021-05-06-distro-basic-smithi/6102970
- 01:22 AM Bug #50162 (Won't Fix): Backport to Natilus of automatic lowering min_size for repairing tasks (o...
- Nathan Cutler wrote:
> This needs a pull request ID, or a list of master commits that are requested to be backported...
05/06/2021
- 11:48 PM Bug #50682 (New): Pacific - OSD not starting after upgrade
- Copied from https://tracker.ceph.com/issues/50169
Using Ubuntu 20.04, none cephadm, packages from ceph repositorie...
- 09:49 PM Bug #50681: memstore: apparent memory leak when removing objects
- I’m not totally clear on what you’re doing here and what you think the erroneous behavior is. Memstore only stores da...
- 07:05 PM Bug #50681: memstore: apparent memory leak when removing objects
- The title should say "osd objectstore = memstore"
- 06:31 PM Bug #50681 (New): memstore: apparent memory leak when removing objects
- When I create and unlink big files like in this[1] little program in my development environment, the OSD daemon keeps...
- 02:26 AM Bug #50558: Data loss propagation after backfill
- For the record, the following is the sequence of the data loss propagation when readdir error happens on filestore du...
05/05/2021
- 09:38 PM Bug #49809: 1 out of 3 mon crashed in MonitorDBStore::get_synchronizer
- Hi Christian,
No, unfortunately I hit a dead end on this as the log message issue was a red herring.
I'm afraid...
- 02:36 PM Bug #49809: 1 out of 3 mon crashed in MonitorDBStore::get_synchronizer
- Brad were you able to find out more about the root cause of this crash?
- 09:09 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- Possibly related to 40119
- 06:05 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
- /a/sage-2021-05-05_15:58:13-rados-wip-sage-testing-2021-05-04-1814-distro-basic-smithi/6099487
- 07:26 PM Backport #49918: pacific: mon: slow ops due to osd_failure
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41090
merged
- 06:10 PM Backport #50666 (Resolved): pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat fai...
- https://github.com/ceph/ceph/pull/41182
- 06:09 PM Bug #50595 (Pending Backport): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
- 04:26 PM Backport #50504: octopus: mon/MonClient: reset authenticate_err in _reopen_session()
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41017
merged
- 04:25 PM Backport #50479: octopus: filestore: ENODATA error after directory split confuses transaction
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40988
merged
- 03:33 PM Bug #17257 (Can't reproduce): ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
- 02:29 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
- I forgot to add that I tried to diff code I thought was relevant between tags v15.2.9 and v16.2.1 and thought I saw s...
- 02:21 PM Bug #50659 (Resolved): Segmentation fault under Pacific 16.2.1 when using a custom crush location...
- I feel like if this wasn't somehow just my problem, there'd be an issue open on it already, but I'm not seeing one, a...
- 01:58 PM Bug #50658 (New): TEST_backfill_pool_priority fails
- ...
- 12:39 PM Bug #50657 (Resolved): smart query on monitors
- Since the upgrade to Pacific, our manager queries each daemon for smart statistics.
This is fine on the OSDs (at l...
- 10:51 AM Bug #49962 (Fix Under Review): 'sudo ceph --cluster ceph osd crush tunables default' fails due to...
- https://github.com/ceph/ceph/pull/41169
- 03:56 AM Bug #40119: api_tier_pp hung causing a dead job
- /a/bhubbard-2021-04-26_22:38:21-rados-master-distro-basic-smithi/6075940
In this instance the slow requests are on...
05/04/2021
- 07:28 PM Bug #50647: common: the fault handling becomes inoperational when multiple faults happen the same...
- Just for the record: https://gist.github.com/rzarzynski/eb21e48a4458b593912eccd50ab8da46.
- 07:09 PM Bug #50647 (Fix Under Review): common: the fault handling becomes inoperational when multiple fau...
- https://github.com/ceph/ceph/pull/41154
- 02:42 PM Bug #50647 (Fix Under Review): common: the fault handling becomes inoperational when multiple fau...
- The problem arises due to installing the fault handlers with the flag @SA_RESETHAND@. It instructs the kernel to rest...
- 07:26 PM Bug #50653 (Fix Under Review): common: the dump of thread IDs is in dec instead of hex
- https://github.com/ceph/ceph/pull/41155
- 07:08 PM Bug #50653 (Pending Backport): common: the dump of thread IDs is in dec instead of hex
- It's a fallout from 5b8274f09951c7f36eb1ca1a234e7c8a08c30c9c.
- 04:39 PM Backport #50125: nautilus: mon: Modify Paxos trim logic to be more efficient
- https://github.com/ceph/ceph/pull/41099 merged
- 04:38 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- https://github.com/ceph/ceph/pull/41098 merged
- 04:37 PM Backport #50506: nautilus: mon/MonClient: reset authenticate_err in _reopen_session()
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41016
merged
- 04:02 PM Bug #50595 (Fix Under Review): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
- ...
- 03:31 PM Backport #50481: nautilus: filestore: ENODATA error after directory split confuses transaction
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40987
merged
- 03:16 PM Bug #50648 (New): nautilus: ceph status check times out
- ...
- 07:47 AM Bug #50637 (Duplicate): OSD slow ops warning stuck after OSD fail
- We had a disk fail with 2 OSDs deployed on it, ids=580, 581. Since then, the health warning @430 slow ops, oldest one...
05/03/2021
- 11:05 PM Backport #50344: pacific: mon: stretch state is inconsistently-maintained on peons, preventing pr...
- https://github.com/ceph/ceph/pull/41130
- 10:59 PM Backport #50087 (In Progress): pacific: test_mon_pg: mon fails to join quorum due to election str...
- 09:57 PM Backport #50087: pacific: test_mon_pg: mon fails to join quorum due to election strategy mismatch
- https://github.com/ceph/ceph/pull/40484
- 08:53 PM Bug #47719 (Resolved): api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:53 PM Bug #48946 (Resolved): Disable and re-enable clog_to_monitors could trigger assertion
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:52 PM Bug #49392 (Resolved): osd ok-to-stop too conservative
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:50 PM Backport #49640 (Resolved): nautilus: Disable and re-enable clog_to_monitors could trigger assertion
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39912
m...
- 08:49 PM Backport #49567 (Resolved): nautilus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40697
m...
- 08:47 PM Backport #50130: nautilus: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40700
m...
- 08:47 PM Backport #49531 (Resolved): nautilus: osd ok-to-stop too conservative
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40676
m...
- 08:46 PM Backport #50459: nautilus: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40959
m...
- 07:17 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
- Wes Dillingham wrote:
> Hello Dan and Neha. Shortly after filing this bug I went on paternity leave but have returne...
- 07:04 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
- Hello Dan and Neha. Shortly after filing this bug I went on paternity leave but have returned today. I will try and a...
- 05:48 PM Bug #50595: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
- The upgrade test does not fail all the time.
upgrade:nautilus-x/parallel/{0-cluster/{openstack start} 1-ceph-install...
- 05:21 PM Bug #50595: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
- I think the upgrade test just needs to skip that test. It's just looking for a specific string that changed in pacifi...
- 04:31 PM Bug #50595 (Triaged): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
- This seems related to ab0d8f2ae9f551e15a4c7bacbf69161e91263785.
Reverting makes the issue go away http://pulpito.fro...
- 04:36 PM Bug #48997: rados/singleton/all/recovery-preemption: defer backfill|defer recovery not found in logs
- /a/yuriw-2021-04-30_13:45:05-rados-pacific-distro-basic-smithi/6086228
- 04:22 PM Backport #50129: octopus: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40758
m...
- 04:21 PM Backport #49917: octopus: mon: slow ops due to osd_failure
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40558
m...
- 04:21 PM Backport #50123: octopus: mon: Modify Paxos trim logic to be more efficient
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40699
m...
- 04:21 PM Backport #49566: octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40756
m...
- 04:20 PM Backport #49816: octopus: mon: promote_standby does not update available_modules
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40757
m...
- 04:19 PM Backport #50457: octopus: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/d...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40958
m...
- 04:16 PM Bug #48336 (Resolved): monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_t::...
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:14 PM Bug #49778 (Resolved): mon: promote_standby does not update available_modules
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 04:04 PM Backport #50124 (Resolved): pacific: mon: Modify Paxos trim logic to be more efficient
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40691
m...
- 04:04 PM Backport #50480: pacific: filestore: ENODATA error after directory split confuses transaction
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40989
m...
- 04:04 PM Backport #50154 (Resolved): pacific: Reproduce https://tracker.ceph.com/issues/48417
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40759
m...
- 04:04 PM Backport #50131 (Resolved): pacific: monmaptool --create --add nodeA --clobber monmap aborts in e...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40690
m...
- 03:53 PM Backport #50458: pacific: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/d...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40957
m...
- 12:07 PM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
- I had multiple OSDs die during an upgrade from Nautilus to Octopus with this trace. See the attached crash2.txt
- 12:05 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- Longer debug_osd output in crash1.txt file....
- 12:04 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- Longer debug_osd output in crash1.txt file.
-1> 2021-04-29T13:51:57.756+0200 7fa2edae2700 -1 /home/jenkins-b...
- 12:02 PM Bug #50608 (Need More Info): ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
- This was happening on a cluster running Nautilus 14.2.18/14.2.19 during upgrading to Octopus 15.2.11 some 3-4 OSDs cr...
- 09:00 AM Backport #50606 (In Progress): pacific: osd/scheduler/mClockScheduler: Async reservers are not up...
- 08:12 AM Backport #50606 (Resolved): pacific: osd/scheduler/mClockScheduler: Async reservers are not updat...
- https://github.com/ceph/ceph/pull/41125
05/02/2021
- 01:44 PM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
- Neha Ojha wrote:
> Can you please provide osd logs from the primary and replica with debug_osd=20 and debug_ms=1? Th...
05/01/2021
- 09:54 PM Support #49847 (Closed): OSD Fails to init after upgrading to octopus: _deferred_replay failed to...
- 12:12 AM Bug #50595: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
- ...
04/30/2021
- 09:53 PM Bug #50420 (Need More Info): all osd down after mon scrub too long
- Can provide us with the cluster log from this time? How large is your mon db?
- 09:21 PM Bug #50422: Error: finished tid 1 when last_acked_tid was 2
- Looks like a cache tiering+short pg log bug...
- 08:29 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- Dan van der Ster wrote:
> Leaving this open to address the msgr2 abort. Presumably this is caused by the >4GB messag...
- 06:36 AM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- Josh Durgin wrote:
> Ah good catch Dan, that loop does appear to be generating millions of '=' in nautilus. Sounds l...
- 06:07 AM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- Leaving this open to address the msgr2 abort. Presumably this is caused by the >4GB message generated to respond to `...
- 01:08 AM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- Ah good catch Dan, that loop does appear to be generating millions of '=' in nautilus. Sounds like we need to fix tha...
- 07:38 PM Bug #50595 (Resolved): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
- Seems to be ubuntu 18.04 specific
Run: https://pulpito.ceph.com/yuriw-2021-04-29_16:10:26-upgrade:nautilus-x-pacif...
- 06:56 PM Backport #49919: nautilus: mon: slow ops due to osd_failure
- Kefu, can you please help with a minimal backport for nautilus?
- 05:38 PM Bug #50501 (Pending Backport): osd/scheduler/mClockScheduler: Async reservers are not updated wit...
- 04:27 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
- /a/yuriw-2021-04-30_12:58:14-rados-wip-yuri2-testing-2021-04-29-1501-pacific-distro-basic-smithi/6086154
- 04:25 PM Bug #50042: rados/test.sh: api_watch_notify failures
- ...
- 09:21 AM Backport #50125: nautilus: mon: Modify Paxos trim logic to be more efficient
- please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/41099
ceph-backport.sh versi...
04/29/2021
- 11:16 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- commit 5f95ec4457059889bc4dbc2ad25cdc0537255f69 removed that loop in Monitor.cc but wasn't backported to nautilus.
...
- 10:39 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- Something else weird in the mgr log: can negative progress events break the mon ?...
- 10:26 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
- Here are some notes on the timelines between various actors at the start of the incident:
mon.cephbeesly-mon-2a00f...
- 07:14 PM Bug #50587 (Resolved): mon election storm following osd recreation: huge tcmalloc and ceph::msgr:...
- We recreated an osd and seconds later our mons started using 100% CPU and going into an election storm which lasted n...
- 08:55 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
- Wes, did you ever find out more about the root cause of this? We saw something similar today in #50587
- 07:51 PM Bug #50510 (Need More Info): OSD will return -EAGAIN on balance_reads although it can return the ...
- Can you please provide osd logs from the primary and replica with debug_osd=20 and debug_ms=1? That will help us unde...
- 05:41 PM Backport #49918 (In Progress): pacific: mon: slow ops due to osd_failure
- 05:06 PM Backport #50480 (Resolved): pacific: filestore: ENODATA error after directory split confuses tran...
- 12:05 PM Bug #50558 (Fix Under Review): Data loss propagation after backfill
- 11:25 AM Bug #50558: Data loss propagation after backfill
- Hi
I worked with hase-san and submitted PR to handle readdir error correctly in filestore code: https://github.com...
- 10:38 AM Fix #50574: qa/standalone: Modify/re-write failing standalone tests with mclock scheduler
- Standalone failures observed here:
https://pulpito.ceph.com/sseshasa-2021-04-23_15:37:51-rados-wip-mclock-max-backfi...
- 05:43 AM Fix #50574 (Resolved): qa/standalone: Modify/re-write failing standalone tests with mclock scheduler
- A subset of the existing qa/standlone tests are failing with osd_op_queue set to "mclock_scheduler".
This is mainl...
- 10:19 AM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
- We run into the same assert over and over on one OSD. We were upgrading from luminous to nautilus ceph version 14.2.2...
04/28/2021
- 12:17 PM Bug #50558 (Resolved): Data loss propagation after backfill
- Situation:
An OSD data loss has been propagated to other OSDs. If backfill is performed when shard is missing in a p...
04/27/2021
- 09:14 PM Backport #50130 (Resolved): nautilus: monmaptool --create --add nodeA --clobber monmap aborts in ...
- 04:48 PM Backport #50130: nautilus: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40700
merged
- 09:06 PM Backport #49567: nautilus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40697
merged
- 04:47 PM Backport #49531: nautilus: osd ok-to-stop too conservative
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40676
merged
- 12:37 PM Bug #50536 (New): "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on mas...
- /a/sseshasa-2021-04-23_18:11:53-rados-wip-sseshasa-testing-2021-04-23-2212-distro-basic-smithi/6068991
Noticed a t...
- 11:23 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
- Observed the same failure here:
/a/sseshasa-2021-04-23_18:11:53-rados-wip-sseshasa-testing-2021-04-23-2212-distro-ba...
- 06:08 AM Backport #50129 (Resolved): octopus: monmaptool --create --add nodeA --clobber monmap aborts in e...
04/26/2021
- 09:33 PM Backport #50124: pacific: mon: Modify Paxos trim logic to be more efficient
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40691
merged
- 09:30 PM Backport #50480: pacific: filestore: ENODATA error after directory split confuses transaction
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40989
merged
- 09:29 PM Backport #50154: pacific: Reproduce https://tracker.ceph.com/issues/48417
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40759
merged
- 09:27 PM Backport #50131: pacific: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_...
- Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40690
merged
- 06:32 PM Bug #50512: upgrade:nautilus-p2p-nautilus: unhandled event in ToDelete
- ...
- 09:54 AM Bug #50351: osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd restart when in b...
- `PrimaryLogPG::on_failed_pull` [1] looks suspicious to me. We remove the oid from `backfills_in_flight` here only if ...