Activity

From 04/25/2021 to 05/24/2021

05/24/2021

09:46 PM Bug #49052 (Resolved): pick_a_shard() always select shard 0
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
09:42 PM Backport #50701 (Resolved): nautilus: Data loss propagation after backfill
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41238
m...
Loïc Dachary
09:35 PM Backport #50793 (Resolved): octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41321
m...
Loïc Dachary
09:30 PM Backport #50703 (Resolved): octopus: Data loss propagation after backfill
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41237
m...
Loïc Dachary
09:25 PM Backport #49993 (Resolved): octopus: unittest_mempool.check_shard_select failed
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39978
m...
Loïc Dachary
09:25 PM Backport #49053 (Resolved): octopus: pick_a_shard() always select shard 0
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39978
m...
Loïc Dachary
03:34 PM Bug #50657: smart query on monitors
Thanks, Jan-Philipp.
I tried to reproduce this issue and get the empty device name, while not having a sudoer perm...
Yaarit Hatuka
11:54 AM Bug #50775: mds and osd unable to obtain rotating service keys
bugshell is my test case, mon.b is a peon monitor wenge song
11:25 AM Bug #50775: mds and osd unable to obtain rotating service keys
wenge song wrote:
> mds.b unable to obtain rotating service keys, this is mds.b log
2021-05-24T18:48:22.934+0800 7...
wenge song
11:18 AM Bug #50775: mds and osd unable to obtain rotating service keys
mds.b unable to obtain rotating service keys, this is mds.b log wenge song
11:15 AM Bug #50775: mds and osd unable to obtain rotating service keys
mon leader log wenge song
07:48 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> Just to be clear, are you saying that if the proposal with the new keys doesn't get sent becau...
wenge song
07:30 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> Can you share the full monitor logs? Specifically, I'm interested in the log where the follow...
wenge song
10:03 AM Bug #50950 (Won't Fix): MIMIC OSD very high CPU usage(3xx%), stop responding to other osd, causin...
I've been using this mimic cluster (about 530 OSDs) for over a year; recently I found some particular OSDs randomly run int... Bin Guo
02:34 AM Bug #50943 (Closed): mon crash due to assert failed
Ceph version 12.2.11
3 mons; 1 mon can't start up due to an assert failure
-6> 2021-05-20 16:11:32.755959 7fffd...
wencong wan

05/23/2021

08:53 PM Bug #50775: mds and osd unable to obtain rotating service keys
Just to be clear, are you saying that if the proposal with the new keys doesn't get sent because trigger_propose() re... Ilya Dryomov
08:36 PM Bug #50775: mds and osd unable to obtain rotating service keys
Can you share the full monitor logs? Specifically, I'm interested in the log where the following excerpt came from
...
Ilya Dryomov

05/21/2021

09:18 PM Bug #50829: nautilus: valgrind leak in SimpleMessenger
... Neha Ojha
03:11 PM Bug #50681: memstore: apparent memory leak when removing objects
The ceph-osd had a RES memory footprint of 2.6GB while I created the above files. Sven Anderson
03:08 PM Bug #50681: memstore: apparent memory leak when removing objects
Greg Farnum wrote:
> How long did you wait to see if memory usage dropped? Did you look at any logs or dump any pool...
Sven Anderson
01:57 PM Backport #50793: octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd res...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41321
merged
Yuri Weinstein
09:22 AM Cleanup #50925 (Fix Under Review): add backfill_unfound test
Add a teuthology test that would use a scenario similar to the one described in [1].
[1] https://tracker.ceph.com/issues/...
Mykola Golub

05/20/2021

06:07 PM Bug #48385: nautilus: statfs: a cluster with any up but out osd will report bytes_used == stored
Fixed starting with 14.2.16 Igor Fedotov
05:04 PM Backport #50701: nautilus: Data loss propagation after backfill
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41238
merged
Yuri Weinstein
04:50 PM Backport #50911 (Rejected): nautilus: PGs always go into active+clean+scrubbing+deep+repair in th...
Backport Bot
04:50 PM Backport #50910 (Rejected): octopus: PGs always go into active+clean+scrubbing+deep+repair in the...
Backport Bot
04:46 PM Bug #50446: PGs always go into active+clean+scrubbing+deep+repair in the LRC
This issue exists in nautilus and octopus as well. We might want to take a less intrusive approach for the backports. Neha Ojha
06:29 AM Bug #50446 (Pending Backport): PGs always go into active+clean+scrubbing+deep+repair in the LRC
Kefu Chai
12:12 PM Bug #50903: ceph_objectstore_tool: Slow ops reported during the test.
JobId:
/a/sseshasa-2021-05-17_11:08:21-rados-wip-sseshasa-testing-2021-05-17-1504-distro-basic-smithi/6118306
Obs...
Sridhar Seshasayee
11:58 AM Bug #50903 (Closed): ceph_objectstore_tool: Slow ops reported during the test.
Sridhar Seshasayee
10:05 AM Bug #50775 (Fix Under Review): mds and osd unable to obtain rotating service keys
Kefu Chai
09:57 AM Bug #50775: mds and osd unable to obtain rotating service keys
wenge song wrote:
> Ilya Dryomov wrote:
> > I posted https://github.com/ceph/ceph/pull/41368, please take a look. ...
wenge song
06:30 AM Backport #50900 (Resolved): pacific: PGs always go into active+clean+scrubbing+deep+repair in the...
https://github.com/ceph/ceph/pull/42398 Backport Bot
06:20 AM Backport #50893 (Resolved): pacific: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_s...
https://github.com/ceph/ceph/pull/46120 Backport Bot
06:17 AM Bug #50806 (Pending Backport): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.g...
Kefu Chai
12:40 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
I think pacific. Myoungwon Oh
01:39 AM Bug #50743: *: crash in pthread_getname_np
oh, I mean in general, not necessarily in this case.
This was opened automatically by a telemetry-to-redmine bot t...
Yaarit Hatuka

05/19/2021

11:32 PM Bug #50813 (Fix Under Review): mon/OSDMonitor: should clear new flag when do destroy
Neha Ojha
03:10 AM Bug #47025: rados/test.sh: api_watch_notify_pp LibRadosWatchNotifyECPP.WatchNotify failed
There are, at this time, three different versions of this problem as seen in https://tracker.ceph.com/issues/50042#no... Brad Hubbard
03:07 AM Bug #50042: rados/test.sh: api_watch_notify failures
/a/yuriw-2021-04-30_12:58:14-rados-wip-yuri2-testing-2021-04-29-1501-pacific-distro-basic-smithi/6086155 is the same ... Brad Hubbard
01:37 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> I posted https://github.com/ceph/ceph/pull/41368, please take a look. It's probably not going...
wenge song
12:19 AM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
If the replica has a log entry for write on the object more recent than last_complete_ondisk (iirc), it will bounce t... Samuel Just
12:05 AM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
The following indicates that it is not safe to do a balanced read from the secondary at this time. Making the "c... Neha Ojha

05/18/2021

09:04 PM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
Myoungwon Oh wrote:
> https://github.com/ceph/ceph/pull/41373
how far back should we backport this?
Neha Ojha
07:23 PM Bug #50806 (Fix Under Review): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.g...
Neha Ojha
08:55 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
https://github.com/ceph/ceph/pull/41373 Myoungwon Oh
07:46 PM Bug #50866 (New): osd: stat mismatch on objects
... Patrick Donnelly
11:17 AM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
/a/sseshasa-2021-05-17_11:08:21-rados-wip-sseshasa-testing-2021-05-17-1504-distro-basic-smithi/6118250 Sridhar Seshasayee
09:52 AM Feature #49089 (Fix Under Review): msg: add new func support_reencode
Kefu Chai
09:33 AM Bug #50775: mds and osd unable to obtain rotating service keys
I posted https://github.com/ceph/ceph/pull/41368, please take a look. It's probably not going to solve an edge case ... Ilya Dryomov
09:01 AM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
Can I get detailed logs in /remote? It seems that /remote has been removed.
This log seems very strange because Promot...
Myoungwon Oh
08:46 AM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
Saw this during the below run:
/a/sseshasa-2021-05-17_11:08:21-rados-wip-sseshasa-testing-2021-05-17-1504-distro-bas...
Sridhar Seshasayee
06:12 AM Bug #50853 (Can't reproduce): libcephsqlite: Core dump while running test_libcephsqlite.sh.
Observed on master during the following run:
/a/sseshasa-2021-05-17_11:08:21-rados-wip-sseshasa-testing-2021-05-17-1504-...
Sridhar Seshasayee
03:39 AM Bug #50042: rados/test.sh: api_watch_notify failures
Deepika Upadhyay wrote:
> [...]
>
> /ceph/teuthology-archive/ideepika-2021-05-17_08:09:09-rados-wip-yuri2-testing...
Brad Hubbard
01:56 AM Bug #50743: *: crash in pthread_getname_np
Yaarit Hatuka wrote:
> Hi Patrick,
>
> We don't have the signal number yet in the telemetry crash reports.
>
>...
Patrick Donnelly

05/17/2021

10:34 PM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...
This issue seems very similar to https://tracker.ceph.com/issues/49427#change-189520, which was fixed by https://gith... Neha Ojha
11:36 AM Bug #50806: osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_log().get_mis...

additional relevant log: ...
Deepika Upadhyay
02:50 PM Bug #50842 (Resolved): pacific: recovery does not complete because of rw_manager lock not being ...
recovery of the snapshot should complete, and then proceed to the head object:... Deepika Upadhyay
02:23 PM Bug #50743: *: crash in pthread_getname_np
Hi Patrick,
We don't have the signal number yet in the telemetry crash reports.
You can see other crash events ...
Yaarit Hatuka
10:12 AM Bug #50042: rados/test.sh: api_watch_notify failures
... Deepika Upadhyay
07:13 AM Bug #50657: smart query on monitors
Sure:... Jan-Philipp Litza
01:16 AM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
ok Myoungwon Oh

05/16/2021

03:55 PM Backport #50831 (Resolved): pacific: pacific ceph-mon: mon initial failed on aarch64
https://github.com/ceph/ceph/pull/51314 Backport Bot
03:51 PM Bug #50384 (Pending Backport): pacific ceph-mon: mon initial failed on aarch64
Kefu Chai
03:47 PM Bug #50384 (Resolved): pacific ceph-mon: mon initial failed on aarch64
Kefu Chai

05/15/2021

09:38 PM Bug #50829 (New): nautilus: valgrind leak in SimpleMessenger
... Deepika Upadhyay
09:20 AM Bug #50775: mds and osd unable to obtain rotating service keys
No, it's the other way around -- there is some suspicion that https://tracker.ceph.com/issues/50390 is caused by thes... Ilya Dryomov
09:11 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> But it is the original octopus release with no substantial backports, correct?
>
> In parti...
wenge song
08:54 AM Bug #50775: mds and osd unable to obtain rotating service keys
But it is the original octopus release with no substantial backports, correct?
In particular, can you confirm that...
Ilya Dryomov
08:32 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> So you are running the original octopus release with custom patches added in?
just a little...
wenge song
07:45 AM Bug #50775: mds and osd unable to obtain rotating service keys
So you are running the original octopus release with custom patches added in? Ilya Dryomov
05:26 AM Bug #50775: mds and osd unable to obtain rotating service keys
Ilya Dryomov wrote:
> Hi Song,
>
> Could you please confirm the ceph version with the output of "ceph-mds --versi...
wenge song
02:00 AM Bug #50743 (Need More Info): *: crash in pthread_getname_np
How do we know what the signal number was? Not clear to me what to do with this. I don't see anything obviously wrong... Patrick Donnelly

05/14/2021

09:58 PM Bug #50692 (Resolved): nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
Neha Ojha
09:56 PM Bug #50746: osd: terminate called after throwing an instance of 'std::out_of_range'
I ran the same command "MDS=3 OSD=3 MON=3 MGR=1 ../src/vstart.sh -n -X -G --msgr1 --memstore" and everything works fi... Neha Ojha
09:43 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Do the OSDs hitting this assert come up fine on restarting, or are they repeatedly hitting this assert? Neha Ojha
01:34 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Just purely based on the numbering of OSDs I know for a fact that osd.47 was upgraded before osd.59 so based on that ... Tobias Urdin
08:51 PM Bug #50775: mds and osd unable to obtain rotating service keys
Hi Song,
Could you please confirm the ceph version with the output of "ceph-mds --version"?
Ilya Dryomov
08:33 PM Bug #50657 (In Progress): smart query on monitors
Hi Jan-Philipp,
Thanks for reporting this.
Can you please provide the output of `df` on the host where a monito...
Yaarit Hatuka
05:46 PM Backport #50703: octopus: Data loss propagation after backfill
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41237
merged
Yuri Weinstein
05:43 PM Backport #49993: octopus: unittest_mempool.check_shard_select failed
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/39978
merged
Yuri Weinstein
05:43 PM Backport #49053: octopus: pick_a_shard() always select shard 0
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/39978
merged
Yuri Weinstein
04:42 PM Bug #47380 (Resolved): mon: slow ops due to osd_failure
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
03:47 PM Backport #49919 (Resolved): nautilus: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41213
m...
Loïc Dachary
02:02 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
... Deepika Upadhyay
07:40 AM Bug #50813 (Duplicate): mon/OSDMonitor: should clear new flag when do destroy
The "new" flag in the osdmap affects what happens when the osd comes up, according to the option mon_osd_auto_mark_new_in. So it is safer...
Zengran Zhang
01:41 AM Bug #50806 (Resolved): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_lo...
... Neha Ojha

05/13/2021

11:30 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
The crashed OSD was running 15.2.11 but do you happen to know what version osd.59(the primary for pg 6.7a) was runnin... Neha Ojha
03:59 PM Bug #50692 (Fix Under Review): nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
Neha Ojha
03:15 PM Bug #50761: ceph mon hangs forever while trying to parse config
removed logs since they are unrelated Deepika Upadhyay
08:08 AM Backport #50793 (In Progress): octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-p...
Mykola Golub
07:05 AM Backport #50793 (Resolved): octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
https://github.com/ceph/ceph/pull/41321 Backport Bot
08:07 AM Backport #50794 (In Progress): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-p...
Mykola Golub
07:05 AM Backport #50794 (Resolved): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
https://github.com/ceph/ceph/pull/41320 Backport Bot
07:12 AM Backport #50792 (In Progress): nautilus: osd: FAILED ceph_assert(recovering.count(*i)) after non-...
The backport is included in https://github.com/ceph/ceph/pull/41293 Mykola Golub
07:05 AM Backport #50792 (Rejected): nautilus: osd: FAILED ceph_assert(recovering.count(*i)) after non-pri...
Backport Bot
07:05 AM Backport #50797 (Resolved): pacific: mon: spawn loop after mon reinstalled
https://github.com/ceph/ceph/pull/41768 Backport Bot
07:05 AM Backport #50796 (Resolved): octopus: mon: spawn loop after mon reinstalled
https://github.com/ceph/ceph/pull/41621 Backport Bot
07:05 AM Backport #50795 (Resolved): nautilus: mon: spawn loop after mon reinstalled
https://github.com/ceph/ceph/pull/41762 Backport Bot
07:02 AM Bug #50351 (Pending Backport): osd: FAILED ceph_assert(recovering.count(*i)) after non-primary os...
Kefu Chai
07:01 AM Bug #50230 (Pending Backport): mon: spawn loop after mon reinstalled
Kefu Chai
06:55 AM Backport #50791 (Resolved): pacific: osd: write_trunc omitted to clear data digest
https://github.com/ceph/ceph/pull/42019 Backport Bot
06:55 AM Backport #50790 (Resolved): octopus: osd: write_trunc omitted to clear data digest
https://github.com/ceph/ceph/pull/41620 Backport Bot
06:55 AM Backport #50789 (Rejected): nautilus: osd: write_trunc omitted to clear data digest
Backport Bot
06:54 AM Bug #50763 (Pending Backport): osd: write_trunc omitted to clear data digest
Kefu Chai

05/12/2021

10:44 PM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
Myoungwon Oh, any idea what could be causing this? Feel free to unassign, if you are not aware of what is causing thi... Neha Ojha
10:39 PM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
/a/yuriw-2021-05-11_19:33:39-rados-wip-yuri2-testing-2021-05-11-1032-pacific-distro-basic-smithi/6110085 Neha Ojha
06:02 PM Bug #50761: ceph mon hangs forever while trying to parse config
Ilya Dryomov wrote:
> Are you sure that client.admin.4638.log was generated by your hello-world binary? Because the...
Deepika Upadhyay
05:15 PM Bug #50761: ceph mon hangs forever while trying to parse config
Are you sure that client.admin.4638.log was generated by your hello-world binary? Because the complete log attached ... Ilya Dryomov
04:38 PM Bug #50761: ceph mon hangs forever while trying to parse config
logs: https://drive.google.com/file/d/1RdLToyo3vpL3nFMI2U3hfGrY4tpfQ9Az/view?usp=sharing Deepika Upadhyay
04:03 PM Bug #50761: ceph mon hangs forever while trying to parse config
Ilya Dryomov wrote:
> Where client.admin.4803.log came from? How was it captured?
I ran a hello world script:
<...
Deepika Upadhyay
03:51 PM Bug #50761: ceph mon hangs forever while trying to parse config
Where client.admin.4803.log came from? How was it captured? Ilya Dryomov
02:24 PM Bug #50775: mds and osd unable to obtain rotating service keys
I will reproduce, fix, and verify this bug, then send the bugfix for code review. wenge song
02:23 PM Bug #50775 (Fix Under Review): mds and osd unable to obtain rotating service keys
version-15.2.0
error message:
2021-05-04T05:51:54.719+0800 7f105b2737c0 -1 mds.c unable to obtain rotating serv...
wenge song
02:19 PM Bug #50384 (Fix Under Review): pacific ceph-mon: mon initial failed on aarch64
Sage Weil
12:09 PM Backport #50666 (Resolved): pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat fai...
Sage Weil
12:09 PM Bug #50595 (Resolved): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
Sage Weil
07:06 AM Bug #50747 (Fix Under Review): nautilus: osd: backfill_unfound state reset to clean after osd res...
Mykola Golub
05:46 AM Bug #50763 (Fix Under Review): osd: write_trunc omitted to clear data digest
Kefu Chai
02:20 AM Bug #50763: osd: write_trunc omitted to clear data digest
https://github.com/ceph/ceph/pull/41290 Zengran Zhang
02:13 AM Bug #50763 (Resolved): osd: write_trunc omitted to clear data digest
Zengran Zhang

05/11/2021

07:43 PM Backport #50666 (In Progress): pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat ...
Neha Ojha
04:32 PM Bug #50692: nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
/a/yuriw-2021-05-11_14:36:21-rados-wip-yuri2-testing-2021-05-10-1557-nautilus-distro-basic-smithi/6109477 Neha Ojha
03:37 PM Bug #50761 (New): ceph mon hangs forever while trying to parse config
... Deepika Upadhyay
08:58 AM Bug #50004 (Resolved): mon: Modify Paxos trim logic to be more efficient
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:57 AM Bug #50395 (Resolved): filestore: ENODATA error after directory split confuses transaction
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:52 AM Backport #50125 (Resolved): nautilus: mon: Modify Paxos trim logic to be more efficient
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41099
m...
Loïc Dachary
08:52 AM Backport #50506 (Resolved): nautilus: mon/MonClient: reset authenticate_err in _reopen_session()
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41016
m...
Loïc Dachary
08:52 AM Backport #50481 (Resolved): nautilus: filestore: ENODATA error after directory split confuses tra...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40987
m...
Loïc Dachary
08:50 AM Backport #50504 (Resolved): octopus: mon/MonClient: reset authenticate_err in _reopen_session()
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41017
m...
Loïc Dachary
08:50 AM Backport #50479 (Resolved): octopus: filestore: ENODATA error after directory split confuses tran...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40988
m...
Loïc Dachary
07:48 AM Backport #49918 (Resolved): pacific: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41090
m...
Loïc Dachary
07:45 AM Backport #50750 (Resolved): octopus: max_misplaced was replaced by target_max_misplaced_ratio
https://github.com/ceph/ceph/pull/41624 Backport Bot
07:45 AM Backport #50749 (Rejected): nautilus: max_misplaced was replaced by target_max_misplaced_ratio
Backport Bot
07:45 AM Backport #50748 (Resolved): pacific: max_misplaced was replaced by target_max_misplaced_ratio
https://github.com/ceph/ceph/pull/42250 Backport Bot
07:41 AM Bug #50745 (Pending Backport): max_misplaced was replaced by target_max_misplaced_ratio
It would be great if we could backport https://github.com/ceph/ceph/pull/41207 along with the https://github.com/ceph... Kefu Chai
04:22 AM Bug #50745 (Resolved): max_misplaced was replaced by target_max_misplaced_ratio
but the documentation was not synced. Kefu Chai
07:07 AM Bug #50747 (Fix Under Review): nautilus: osd: backfill_unfound state reset to clean after osd res...
On nautilus we have been observing an issue when an EC pg is in active+backfill_unfound+degraded state (which happens... Mykola Golub
07:00 AM Bug #50351 (Fix Under Review): osd: FAILED ceph_assert(recovering.count(*i)) after non-primary os...
In the mailing list thread [1] I provided some details why I think the current behaviour of `PrimaryLogPG::on_failed_... Mykola Golub
04:38 AM Bug #50746 (New): osd: terminate called after throwing an instance of 'std::out_of_range'
... Xiubo Li
02:30 AM Bug #50743 (Need More Info): *: crash in pthread_getname_np

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6f...
Yaarit Hatuka

05/10/2021

05:07 PM Bug #50681: memstore: apparent memory leak when removing objects
How long did you wait to see if memory usage dropped? Did you look at any logs or dump any pool object info?
I rea...
Greg Farnum
07:35 AM Bug #46670: refuse to remove mon from the monmap if the mon is in quorum
I still believe the "extra" security is important; I mean, we do this for pools, and mons are almost equally critical... Sébastien Han
04:03 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
I have encountered this 4 more times on a 20-OSD cluster now running 16.2.3. If needed, I can provide more info. 玮文 胡

05/09/2021

05:17 PM Bug #45690: pg_interval_t::check_new_interval is overly generous about guessing when EC PGs could...
This description doesn't seem quite right to me -- OSDs 1-3 were part of the interval in step 4 so they know that not... Greg Farnum
06:09 AM Bug #45390 (Closed): FreeBSD: osdmap decode and encode does not give the same OSDMap
I assume this is fixed by now since the FreeBSD port is under active development? :) Greg Farnum
05:45 AM Bug #46670: refuse to remove mon from the monmap if the mon is in quorum
I'm inclined to say that this is fine as-is? I don't off-hand know how we remove monitors from quorum from the Ceph CLI. Greg Farnum
05:42 AM Bug #46876 (Resolved): osd/ECBackend: optimize remaining read as readop contain multiple objects
Greg Farnum
05:08 AM Feature #47666: Ceph pool history
Much of this is also maintained in the audit log, but that's not easily ingested back into Ceph. Greg Farnum
03:42 AM Feature #48151 (Closed): osd: allow remote read by calling cls method from within cls context
Remote calls like this are unfortunately not plausible to implement within the object handling workflow. Greg Farnum
03:19 AM Support #48530 (Closed): ceph pg status in incomplete.
This kind of question is best served on the ceph-users@ceph.io mailing list if you can't find the answer in the docum... Greg Farnum

05/08/2021

11:04 PM Bug #49158: doc: ceph-monstore-tools might create wrong monitor store
This problem was fixed in the following PR.
https://github.com/ceph/ceph/pull/39288
Satoru Takeuchi
02:49 PM Bug #48468: ceph-osd crash before being up again
Hi Sage,
Hmm, I've finally managed to recover my cluster after countless osd restart procedures, until they star...
Clément Hampaï
11:21 AM Backport #50701 (In Progress): nautilus: Data loss propagation after backfill
Mykola Golub
08:40 AM Backport #50701 (Resolved): nautilus: Data loss propagation after backfill
https://github.com/ceph/ceph/pull/41238 Backport Bot
11:20 AM Backport #50703 (In Progress): octopus: Data loss propagation after backfill
Mykola Golub
08:40 AM Backport #50703 (Resolved): octopus: Data loss propagation after backfill
https://github.com/ceph/ceph/pull/41237 Backport Bot
11:19 AM Backport #50702 (In Progress): pacific: Data loss propagation after backfill
Mykola Golub
08:40 AM Backport #50702 (Resolved): pacific: Data loss propagation after backfill
https://github.com/ceph/ceph/pull/41236 Backport Bot
09:40 AM Backport #50706 (Resolved): pacific: _delete_some additional unexpected onode list
https://github.com/ceph/ceph/pull/41680 Backport Bot
09:40 AM Backport #50705 (Resolved): octopus: _delete_some additional unexpected onode list
https://github.com/ceph/ceph/pull/41623 Backport Bot
09:40 AM Backport #50704 (Resolved): nautilus: _delete_some additional unexpected onode list
https://github.com/ceph/ceph/pull/41682 Backport Bot
09:37 AM Bug #50466 (Pending Backport): _delete_some additional unexpected onode list
Konstantin Shalygin
08:37 AM Bug #50558 (Pending Backport): Data loss propagation after backfill
Kefu Chai
08:30 AM Backport #50697 (Resolved): pacific: common: the dump of thread IDs is in dec instead of hex
https://github.com/ceph/ceph/pull/53465 Backport Bot
08:29 AM Bug #50653 (Pending Backport): common: the dump of thread IDs is in dec instead of hex
Kefu Chai
03:26 AM Feature #49089 (In Progress): msg: add new func support_reencode
Greg Farnum
03:24 AM Support #49268 (Closed): Blocked IOs up to 30 seconds when host powered down
You can also tune how quickly the OSDs report their peers down from missing heartbeats, but in general losing a monit... Greg Farnum
03:17 AM Support #49489: Getting Long heartbeat and slow requests on ceph luminous 12.2.13
This is almost certainly a result of cache tiering (the use of which we generally discourage) being a bad fit or incorr... Greg Farnum

05/07/2021

11:39 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
I have attached a coredump. This hook works fine in 15.2.9. I can also run it fine manually from inside a launched OS... Andrew Davidoff
09:57 PM Bug #50659 (Need More Info): Segmentation fault under Pacific 16.2.1 when using a custom crush lo...
Is it possible for you to capture a coredump? Did the same crush_location_hook work fine on your 15.2.9 cluster? Neha Ojha
10:12 PM Bug #50637: OSD slow ops warning stuck after OSD fail
This sounds like a bug, we shouldn't be accounting for down+out osds when counting slow ops. Neha Ojha
08:57 AM Bug #50637: OSD slow ops warning stuck after OSD fail
I now zapped and re-created the OSDs on this disk. As expected, purging OSD 580 from the cluster cleared the health w... Frank Schilder
10:01 PM Bug #50657: smart query on monitors
Yaarit, can you help take a look at this? Neha Ojha
09:45 PM Bug #50682: Pacific - OSD not starting after upgrade
This issue has been fixed by https://github.com/ceph/ceph/pull/40845 and will be released in the next pacific point r... Neha Ojha
07:48 PM Bug #47949: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
/a/yuriw-2021-05-06_15:20:22-rados-wip-yuri4-testing-2021-05-05-1236-nautilus-distro-basic-smithi/6101282 Neha Ojha
07:47 PM Bug #50692 (Resolved): nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
... Neha Ojha
02:25 PM Bug #50688 (Duplicate): Ceph can't be deployed using cephadm on nodes with /32 ip addresses
*Preamble*
In certain data centers it is common to assign a /32 IP address to a node and let BGP handle the reacha...
Francesco Pantano
12:34 PM Bug #50681: memstore: apparent memory leak when removing objects
Thanks, Greg, for your answer. So my expectation was that at least when there is memory pressure or I am unmounting th... Sven Anderson
09:16 AM Bug #50683: [RBD] master - cluster [WRN] Health check failed: mon is allowing insecure global_id ...
Hi Harish,... Ilya Dryomov
09:04 AM Bug #50683 (Rejected): [RBD] master - cluster [WRN] Health check failed: mon is allowing insecure...
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e826082...
Harish Munjulur
09:06 AM Bug #49231: MONs unresponsive over extended periods of time
After running for a few months with the modified setting, it seems that it fixes the issue. I still see CPU load- and... Frank Schilder
03:10 AM Backport #49919 (In Progress): nautilus: mon: slow ops due to osd_failure
Kefu Chai
02:13 AM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
/a/nojha-2021-05-06_22:58:00-rados-wip-default-mclock-2021-05-06-distro-basic-smithi/6102970 Neha Ojha
01:22 AM Bug #50162 (Won't Fix): Backport to Natilus of automatic lowering min_size for repairing tasks (o...
Nathan Cutler wrote:
> This needs a pull request ID, or a list of master commits that are requested to be backported...
Neha Ojha

05/06/2021

11:48 PM Bug #50682 (New): Pacific - OSD not starting after upgrade
Copied from https://tracker.ceph.com/issues/50169
Using Ubuntu 20.04, non-cephadm, packages from ceph repositorie...
Greg Farnum
09:49 PM Bug #50681: memstore: apparent memory leak when removing objects
I’m not totally clear on what you’re doing here and what you think the erroneous behavior is. Memstore only stores da... Greg Farnum
07:05 PM Bug #50681: memstore: apparent memory leak when removing objects
The title should say "osd objectstore = memstore" Sven Anderson
06:31 PM Bug #50681 (New): memstore: apparent memory leak when removing objects
When I create and unlink big files like in this[1] little program in my development environment, the OSD daemon keeps... Sven Anderson
02:26 AM Bug #50558: Data loss propagation after backfill
For the record, the following is the sequence of the data loss propagation when a readdir error happens on filestore du... Tomohiro Misono

05/05/2021

09:38 PM Bug #49809: 1 out of 3 mon crashed in MonitorDBStore::get_synchronizer
Hi Christian,
No, unfortunately I hit a dead end on this as the log message issue was a red herring.
I'm afraid...
Brad Hubbard
02:36 PM Bug #49809: 1 out of 3 mon crashed in MonitorDBStore::get_synchronizer
Brad were you able to find out more about the root cause of this crash? Christian Rohmann
09:09 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
Possibly related to 40119 Brad Hubbard
06:05 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
/a/sage-2021-05-05_15:58:13-rados-wip-sage-testing-2021-05-04-1814-distro-basic-smithi/6099487
Sage Weil
07:26 PM Backport #49918: pacific: mon: slow ops due to osd_failure
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41090
merged
Yuri Weinstein
06:10 PM Backport #50666 (Resolved): pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat fai...
https://github.com/ceph/ceph/pull/41182 Backport Bot
06:09 PM Bug #50595 (Pending Backport): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
Sage Weil
04:26 PM Backport #50504: octopus: mon/MonClient: reset authenticate_err in _reopen_session()
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41017
merged
Yuri Weinstein
04:25 PM Backport #50479: octopus: filestore: ENODATA error after directory split confuses transaction
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40988
merged
Yuri Weinstein
03:33 PM Bug #17257 (Can't reproduce): ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
Sage Weil
02:29 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
I forgot to add that I tried to diff code I thought was relevant between tags v15.2.9 and v16.2.1 and thought I saw s... Andrew Davidoff
02:21 PM Bug #50659 (Resolved): Segmentation fault under Pacific 16.2.1 when using a custom crush location...
I feel like if this wasn't somehow just my problem, there'd be an issue open on it already, but I'm not seeing one, a... Andrew Davidoff
01:58 PM Bug #50658 (New): TEST_backfill_pool_priority fails
... Kefu Chai
12:39 PM Bug #50657 (Resolved): smart query on monitors
Since the upgrade to Pacific, our manager queries each daemon for smart statistics.
This is fine on the OSDs (at l...
Jan-Philipp Litza
10:51 AM Bug #49962 (Fix Under Review): 'sudo ceph --cluster ceph osd crush tunables default' fails due to...
https://github.com/ceph/ceph/pull/41169 Radoslaw Zarzynski
03:56 AM Bug #40119: api_tier_pp hung causing a dead job
/a/bhubbard-2021-04-26_22:38:21-rados-master-distro-basic-smithi/6075940
In this instance the slow requests are on...
Brad Hubbard

05/04/2021

07:28 PM Bug #50647: common: the fault handling becomes inoperational when multiple faults happen the same...
Just for the record: https://gist.github.com/rzarzynski/eb21e48a4458b593912eccd50ab8da46. Radoslaw Zarzynski
07:09 PM Bug #50647 (Fix Under Review): common: the fault handling becomes inoperational when multiple fau...
https://github.com/ceph/ceph/pull/41154 Radoslaw Zarzynski
02:42 PM Bug #50647 (Fix Under Review): common: the fault handling becomes inoperational when multiple fau...
The problem arises due to installing the fault handlers with the flag @SA_RESETHAND@. It instructs the kernel to rest... Radoslaw Zarzynski
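A minimal standalone C++ sketch (not the Ceph code) of the SA_RESETHAND behavior described above: the kernel reverts the disposition to SIG_DFL after the first delivery, so a second fault is no longer caught by the handler.

#include <signal.h>
#include <unistd.h>

static void handler(int) {
  // Only async-signal-safe calls are allowed here.
  const char msg[] = "fault handler ran\n";
  write(STDERR_FILENO, msg, sizeof(msg) - 1);
}

int main() {
  struct sigaction sa = {};
  sa.sa_handler = handler;
  sa.sa_flags = SA_RESETHAND;      // one-shot: disposition reverts to SIG_DFL
  sigaction(SIGSEGV, &sa, nullptr);

  raise(SIGSEGV);  // first fault: handler runs, then is uninstalled
  raise(SIGSEGV);  // second fault: default action kills the process
}

Installing the handler without SA_RESETHAND (or re-installing it from within the handler) keeps subsequent faults handled.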
07:26 PM Bug #50653 (Fix Under Review): common: the dump of thread IDs is in dec instead of hex
https://github.com/ceph/ceph/pull/41155 Radoslaw Zarzynski
07:08 PM Bug #50653 (Resolved): common: the dump of thread IDs is in dec instead of hex
It's a fallout from 5b8274f09951c7f36eb1ca1a234e7c8a08c30c9c. Radoslaw Zarzynski
04:39 PM Backport #50125: nautilus: mon: Modify Paxos trim logic to be more efficient
https://github.com/ceph/ceph/pull/41099 merged Yuri Weinstein
04:38 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
https://github.com/ceph/ceph/pull/41098 merged Yuri Weinstein
04:37 PM Backport #50506: nautilus: mon/MonClient: reset authenticate_err in _reopen_session()
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41016
merged
Yuri Weinstein
04:02 PM Bug #50595 (Fix Under Review): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
... Sage Weil
03:31 PM Backport #50481: nautilus: filestore: ENODATA error after directory split confuses transaction
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40987
merged
Yuri Weinstein
03:16 PM Bug #50648 (New): nautilus: ceph status check times out
... Deepika Upadhyay
07:47 AM Bug #50637 (Duplicate): OSD slow ops warning stuck after OSD fail
We had a disk fail with 2 OSDs deployed on it, ids=580, 581. Since then, the health warning @430 slow ops, oldest one... Frank Schilder

05/03/2021

11:05 PM Backport #50344: pacific: mon: stretch state is inconsistently-maintained on peons, preventing pr...
https://github.com/ceph/ceph/pull/41130 Greg Farnum
10:59 PM Backport #50087 (In Progress): pacific: test_mon_pg: mon fails to join quorum to due election str...
Neha Ojha
09:57 PM Backport #50087: pacific: test_mon_pg: mon fails to join quorum to due election strategy mismatch
https://github.com/ceph/ceph/pull/40484 Greg Farnum
08:53 PM Bug #47719 (Resolved): api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:53 PM Bug #48946 (Resolved): Disable and re-enable clog_to_monitors could trigger assertion
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:52 PM Bug #49392 (Resolved): osd ok-to-stop too conservative
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:50 PM Backport #49640 (Resolved): nautilus: Disable and re-enable clog_to_monitors could trigger assertion
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39912
m...
Loïc Dachary
08:49 PM Backport #49567 (Resolved): nautilus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40697
m...
Loïc Dachary
08:47 PM Backport #50130: nautilus: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40700
m...
Loïc Dachary
08:47 PM Backport #49531 (Resolved): nautilus: osd ok-to-stop too conservative
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40676
m...
Loïc Dachary
08:46 PM Backport #50459: nautilus: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40959
m...
Loïc Dachary
07:17 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
Wes Dillingham wrote:
> Hello Dan and Neha. Shortly after filing this bug I went on paternity leave but have returne...
Dan van der Ster
07:04 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
Hello Dan and Neha. Shortly after filing this bug I went on paternity leave but have returned today. I will try and a... Wes Dillingham
05:48 PM Bug #50595: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
The upgrade test does not fail all the time.
upgrade:nautilus-x/parallel/{0-cluster/{openstack start} 1-ceph-install...
Neha Ojha
05:21 PM Bug #50595: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
I think the upgrade test just needs to skip that test. It's just looking for a specific string that changed in pacifi... Sage Weil
04:31 PM Bug #50595 (Triaged): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
This seems related to ab0d8f2ae9f551e15a4c7bacbf69161e91263785.
Reverting makes the issue go away http://pulpito.fro...
Neha Ojha
04:36 PM Bug #48997: rados/singleton/all/recovery-preemption: defer backfill|defer recovery not found in logs
/a/yuriw-2021-04-30_13:45:05-rados-pacific-distro-basic-smithi/6086228 Neha Ojha
04:22 PM Backport #50129: octopus: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40758
m...
Loïc Dachary
04:21 PM Backport #49917: octopus: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40558
m...
Loïc Dachary
04:21 PM Backport #50123: octopus: mon: Modify Paxos trim logic to be more efficient
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40699
m...
Loïc Dachary
04:21 PM Backport #49566: octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40756
m...
Loïc Dachary
04:20 PM Backport #49816: octopus: mon: promote_standby does not update available_modules
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40757
m...
Loïc Dachary
04:19 PM Backport #50457: octopus: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/d...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40958
m...
Loïc Dachary
04:16 PM Bug #48336 (Resolved): monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_t::...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:14 PM Bug #49778 (Resolved): mon: promote_standby does not update available_modules
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:04 PM Backport #50124 (Resolved): pacific: mon: Modify Paxos trim logic to be more efficient
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40691
m...
Loïc Dachary
04:04 PM Backport #50480: pacific: filestore: ENODATA error after directory split confuses transaction
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40989
m...
Loïc Dachary
04:04 PM Backport #50154 (Resolved): pacific: Reproduce https://tracker.ceph.com/issues/48417
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40759
m...
Loïc Dachary
04:04 PM Backport #50131 (Resolved): pacific: monmaptool --create --add nodeA --clobber monmap aborts in e...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40690
m...
Loïc Dachary
03:53 PM Backport #50458: pacific: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/d...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40957
m...
Loïc Dachary
12:07 PM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
I had multiple OSDs die during an upgrade from Nautilus to Octopus with this trace. See the attached crash2.txt Tobias Urdin
12:04 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Longer debug_osd output in crash1.txt file.
-1> 2021-04-29T13:51:57.756+0200 7fa2edae2700 -1 /home/jenkins-b...
Tobias Urdin
12:02 PM Bug #50608 (Need More Info): ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
This was happening on a cluster running Nautilus 14.2.18/14.2.19 during an upgrade to Octopus 15.2.11; some 3-4 OSDs cr... Tobias Urdin
09:00 AM Backport #50606 (In Progress): pacific: osd/scheduler/mClockScheduler: Async reservers are not up...
Sridhar Seshasayee
08:12 AM Backport #50606 (Resolved): pacific: osd/scheduler/mClockScheduler: Async reservers are not updat...
https://github.com/ceph/ceph/pull/41125 Sridhar Seshasayee

05/02/2021

01:44 PM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
Neha Ojha wrote:
> Can you please provide osd logs from the primary and replica with debug_osd=20 and debug_ms=1? Th...
Zulai Wang

05/01/2021

09:54 PM Support #49847 (Closed): OSD Fails to init after upgrading to octopus: _deferred_replay failed to...
Igor Fedotov
12:12 AM Bug #50595: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
... Neha Ojha

04/30/2021

09:53 PM Bug #50420 (Need More Info): all osd down after mon scrub too long
Can you provide us with the cluster log from this time? How large is your mon db? Neha Ojha
09:21 PM Bug #50422: Error: finished tid 1 when last_acked_tid was 2
Looks like a cache tiering+short pg log bug... Neha Ojha
08:29 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Dan van der Ster wrote:
> Leaving this open to address the msgr2 abort. Presumably this is caused by the >4GB messag...
Josh Durgin
06:36 AM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Josh Durgin wrote:
> Ah good catch Dan, that loop does appear to be generating millions of '=' in nautilus. Sounds l...
Dan van der Ster
06:07 AM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Leaving this open to address the msgr2 abort. Presumably this is caused by the >4GB message generated to respond to `... Dan van der Ster
01:08 AM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Ah good catch Dan, that loop does appear to be generating millions of '=' in nautilus. Sounds like we need to fix tha... Josh Durgin
07:38 PM Bug #50595 (Resolved): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
Seems to be Ubuntu 18.04 specific
Run: https://pulpito.ceph.com/yuriw-2021-04-29_16:10:26-upgrade:nautilus-x-pacif...
Yuri Weinstein
06:56 PM Backport #49919: nautilus: mon: slow ops due to osd_failure
Kefu, can you please help with a minimal backport for nautilus? Neha Ojha
05:38 PM Bug #50501 (Pending Backport): osd/scheduler/mClockScheduler: Async reservers are not updated wit...
Neha Ojha
04:27 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2021-04-30_12:58:14-rados-wip-yuri2-testing-2021-04-29-1501-pacific-distro-basic-smithi/6086154 Neha Ojha
04:25 PM Bug #50042: rados/test.sh: api_watch_notify failures
... Neha Ojha
09:21 AM Backport #50125: nautilus: mon: Modify Paxos trim logic to be more efficient
please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/41099
ceph-backport.sh versi...
Aishwarya Mathuria

04/29/2021

11:16 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
commit 5f95ec4457059889bc4dbc2ad25cdc0537255f69 removed that loop in Monitor.cc but wasn't backported to nautilus.
...
Dan van der Ster
10:39 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Something else weird in the mgr log: can negative progress events break the mon?... Dan van der Ster
10:26 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Here are some notes on the timelines between various actors at the start of the incident:
mon.cephbeesly-mon-2a00f...
Dan van der Ster
07:14 PM Bug #50587 (Resolved): mon election storm following osd recreation: huge tcmalloc and ceph::msgr:...
We recreated an osd and seconds later our mons started using 100% CPU and going into an election storm which lasted n... Dan van der Ster
08:55 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
Wes, did you ever find out more about the root cause of this? We saw something similar today in #50587
Dan van der Ster
07:51 PM Bug #50510 (Need More Info): OSD will return -EAGAIN on balance_reads although it can return the ...
Can you please provide osd logs from the primary and replica with debug_osd=20 and debug_ms=1? That will help us unde... Neha Ojha
05:41 PM Backport #49918 (In Progress): pacific: mon: slow ops due to osd_failure
Neha Ojha
05:06 PM Backport #50480 (Resolved): pacific: filestore: ENODATA error after directory split confuses tran...
Nathan Cutler
12:05 PM Bug #50558 (Fix Under Review): Data loss propagation after backfill
Kefu Chai
11:25 AM Bug #50558: Data loss propagation after backfill
Hi,
I worked with hase-san and submitted a PR to handle readdir errors correctly in the filestore code: https://github.com...
Tomohiro Misono
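For background, the readdir() pitfall being fixed here is a classic POSIX one: readdir() returns NULL both at end-of-directory and on error, and only errno distinguishes the two. A hedged C++ sketch of the correct pattern (illustrative only, not the actual filestore patch, whose link is truncated above):

#include <cerrno>
#include <cstdio>
#include <dirent.h>

// Returns 0 on success, -errno on failure.
int list_dir(const char* path) {
  DIR* dir = opendir(path);
  if (!dir)
    return -errno;
  int r = 0;
  for (;;) {
    errno = 0;                        // must be reset before each call
    struct dirent* de = readdir(dir);
    if (!de) {
      if (errno)                      // a real error, not end-of-directory
        r = -errno;                   // propagate instead of pretending the
      break;                          // listing completed
    }
    printf("%s\n", de->d_name);
  }
  closedir(dir);
  return r;
}

Silently treating the error case as a normal end-of-directory makes an incomplete listing look authoritative, which is consistent with the data loss propagation described in #50558.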
10:38 AM Fix #50574: qa/standalone: Modify/re-write failing standalone tests with mclock scheduler
Standalone failures observed here:
https://pulpito.ceph.com/sseshasa-2021-04-23_15:37:51-rados-wip-mclock-max-backfi...
Sridhar Seshasayee
05:43 AM Fix #50574 (Resolved): qa/standalone: Modify/re-write failing standalone tests with mclock scheduler
A subset of the existing qa/standalone tests are failing with osd_op_queue set to "mclock_scheduler".
This is mainl...
Sridhar Seshasayee
10:19 AM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
We run into the same assert over and over on one OSD. We were upgrading from luminous to nautilus, ceph version 14.2.2... Ana Aviles

04/28/2021

12:17 PM Bug #50558 (Resolved): Data loss propagation after backfill
Situation:
An OSD data loss has been propagated to other OSDs. If backfill is performed when a shard is missing in a p...
Jin Hase

04/27/2021

09:14 PM Backport #50130 (Resolved): nautilus: monmaptool --create --add nodeA --clobber monmap aborts in ...
Brad Hubbard
04:48 PM Backport #50130: nautilus: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40700
merged
Yuri Weinstein
09:06 PM Backport #49567: nautilus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40697
merged
Yuri Weinstein
04:47 PM Backport #49531: nautilus: osd ok-to-stop too conservative
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40676
merged
Yuri Weinstein
12:37 PM Bug #50536 (New): "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on mas...
/a/sseshasa-2021-04-23_18:11:53-rados-wip-sseshasa-testing-2021-04-23-2212-distro-basic-smithi/6068991
Noticed a t...
Sridhar Seshasayee
11:23 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
Observed the same failure here:
/a/sseshasa-2021-04-23_18:11:53-rados-wip-sseshasa-testing-2021-04-23-2212-distro-ba...
Sridhar Seshasayee
06:08 AM Backport #50129 (Resolved): octopus: monmaptool --create --add nodeA --clobber monmap aborts in e...
Kefu Chai

04/26/2021

09:33 PM Backport #50124: pacific: mon: Modify Paxos trim logic to be more efficient
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40691
merged
Yuri Weinstein
09:30 PM Backport #50480: pacific: filestore: ENODATA error after directory split confuses transaction
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40989
merged
Yuri Weinstein
09:29 PM Backport #50154: pacific: Reproduce https://tracker.ceph.com/issues/48417
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40759
merged
Yuri Weinstein
09:27 PM Backport #50131: pacific: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40690
merged
Yuri Weinstein
06:32 PM Bug #50512: upgrade:nautilus-p2p-nautilus: unhandled event in ToDelete
... Neha Ojha
09:54 AM Bug #50351: osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd restart when in b...
`PrimaryLogPG::on_failed_pull` [1] looks suspicious to me. We remove the oid from `backfills_in_flight` here only if ... Mykola Golub

04/25/2021

11:41 PM Bug #50512 (Won't Fix - EOL): upgrade:nautilus-p2p-nautilus: unhandled event in ToDelete
Run: https://pulpito.ceph.com/teuthology-2021-04-22_01:25:03-upgrade:nautilus-p2p-nautilus-distro-basic-smithi/
Job:...
Yuri Weinstein
05:59 PM Bug #50510 (Need More Info): OSD will return -EAGAIN on balance_reads although it can return the ...
PrimaryLogPG.cc:
if (!is_primary()) {
  if (!recovery_state.can_serve_replica_read(oid)) {
    dout(20) <...
Or Friedmann
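To paraphrase the check in that excerpt with a hedged, self-contained C++ sketch (all names below are hypothetical stand-ins; only can_serve_replica_read() and the -EAGAIN bounce come from the excerpt and the comments on this bug): a replica may answer a balanced read only if it has no log entry for a write to the object newer than its last_complete_ondisk; otherwise it bounces the read so the client retries at the primary.

#include <cerrno>

// Hypothetical stand-in for Ceph's (epoch, version) ordering.
struct Version {
  unsigned epoch = 0;
  unsigned long version = 0;
};
inline bool operator>(const Version& a, const Version& b) {
  return a.epoch != b.epoch ? a.epoch > b.epoch : a.version > b.version;
}

// True if the replica's data for the object is known to be current:
// no write to it is newer than what is complete on disk locally.
bool can_serve_replica_read(const Version& newest_write_to_obj,
                            const Version& last_complete_ondisk) {
  return !(newest_write_to_obj > last_complete_ondisk);
}

int handle_balanced_read(const Version& newest_write_to_obj,
                         const Version& last_complete_ondisk) {
  if (!can_serve_replica_read(newest_write_to_obj, last_complete_ondisk))
    return -EAGAIN;  // bounce: client redirects the read to the primary
  return 0;          // safe to serve the read locally
}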
02:23 PM Bug #50508: ceph-mon crash when create pool
Hi, the ceph-mon process crashed when I created a pool. ... chen long
02:18 PM Bug #50508: ceph-mon crash when create pool
[root@arsenal-ceph-test-167 ~]# ceph crash ls
ID ENTIT...
chen long
02:15 PM Bug #50508 (New): ceph-mon crash when create pool
chen long
10:00 AM Backport #50505 (In Progress): pacific: mon/MonClient: reset authenticate_err in _reopen_session()
Ilya Dryomov
10:00 AM Backport #50504 (In Progress): octopus: mon/MonClient: reset authenticate_err in _reopen_session()
Ilya Dryomov
09:59 AM Backport #50506 (In Progress): nautilus: mon/MonClient: reset authenticate_err in _reopen_session()
Ilya Dryomov
02:52 AM Backport #49917 (Resolved): octopus: mon: slow ops due to osd_failure
Kefu Chai
02:51 AM Backport #50123 (Resolved): octopus: mon: Modify Paxos trim logic to be more efficient
Kefu Chai
02:49 AM Backport #49566 (Resolved): octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
Kefu Chai
02:49 AM Backport #49816 (Resolved): octopus: mon: promote_standby does not update available_modules
Kefu Chai