Activity

From 04/15/2021 to 05/14/2021

05/14/2021

09:58 PM Bug #50692 (Resolved): nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
Neha Ojha
09:56 PM Bug #50746: osd: terminate called after throwing an instance of 'std::out_of_range'
I ran the same command "MDS=3 OSD=3 MON=3 MGR=1 ../src/vstart.sh -n -X -G --msgr1 --memstore" and everything works fi... Neha Ojha
09:43 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Do the OSDs hitting this assert come up fine on restart, or are they repeatedly hitting this assert? Neha Ojha
01:34 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Purely based on the numbering of OSDs, I know for a fact that osd.47 was upgraded before osd.59, so based on that ... Tobias Urdin
08:51 PM Bug #50775: mds and osd unable to obtain rotating service keys
Hi Song,
Could you please confirm the ceph version with the output of "ceph-mds --version"?
Ilya Dryomov
08:33 PM Bug #50657 (In Progress): smart query on monitors
Hi Jan-Philipp,
Thanks for reporting this.
Can you please provide the output of `df` on the host where a monito...
Yaarit Hatuka
05:46 PM Backport #50703: octopus: Data loss propagation after backfill
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41237
merged
Yuri Weinstein
05:43 PM Backport #49993: octopus: unittest_mempool.check_shard_select failed
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/39978
merged
Yuri Weinstein
05:43 PM Backport #49053: octopus: pick_a_shard() always select shard 0
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/39978
merged
Yuri Weinstein
04:42 PM Bug #47380 (Resolved): mon: slow ops due to osd_failure
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
03:47 PM Backport #49919 (Resolved): nautilus: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41213
m...
Loïc Dachary
02:02 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
... Deepika Upadhyay
07:40 AM Bug #50813 (Duplicate): mon/OSDMonitor: should clear new flag when do destroy
the new flag in the osdmap affects whether the OSD is marked up, according to the option
mon_osd_auto_mark_new_in. So it is safer...
Zengran Zhang
01:41 AM Bug #50806 (Resolved): osd/PrimaryLogPG.cc: FAILED ceph_assert(attrs || !recovery_state.get_pg_lo...
... Neha Ojha

05/13/2021

11:30 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
The crashed OSD was running 15.2.11, but do you happen to know what version osd.59 (the primary for pg 6.7a) was runnin... Neha Ojha
03:59 PM Bug #50692 (Fix Under Review): nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
Neha Ojha
03:15 PM Bug #50761: ceph mon hangs forever while trying to parse config
removed logs since they are unrelated Deepika Upadhyay
08:08 AM Backport #50793 (In Progress): octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-p...
Mykola Golub
07:05 AM Backport #50793 (Resolved): octopus: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
https://github.com/ceph/ceph/pull/41321 Backport Bot
08:07 AM Backport #50794 (In Progress): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-p...
Mykola Golub
07:05 AM Backport #50794 (Resolved): pacific: osd: FAILED ceph_assert(recovering.count(*i)) after non-prim...
https://github.com/ceph/ceph/pull/41320 Backport Bot
07:12 AM Backport #50792 (In Progress): nautilus: osd: FAILED ceph_assert(recovering.count(*i)) after non-...
The backport is included in https://github.com/ceph/ceph/pull/41293 Mykola Golub
07:05 AM Backport #50792 (Rejected): nautilus: osd: FAILED ceph_assert(recovering.count(*i)) after non-pri...
Backport Bot
07:05 AM Backport #50797 (Resolved): pacific: mon: spawn loop after mon reinstalled
https://github.com/ceph/ceph/pull/41768 Backport Bot
07:05 AM Backport #50796 (Resolved): octopus: mon: spawn loop after mon reinstalled
https://github.com/ceph/ceph/pull/41621 Backport Bot
07:05 AM Backport #50795 (Resolved): nautilus: mon: spawn loop after mon reinstalled
https://github.com/ceph/ceph/pull/41762 Backport Bot
07:02 AM Bug #50351 (Pending Backport): osd: FAILED ceph_assert(recovering.count(*i)) after non-primary os...
Kefu Chai
07:01 AM Bug #50230 (Pending Backport): mon: spawn loop after mon reinstalled
Kefu Chai
06:55 AM Backport #50791 (Resolved): pacific: osd: write_trunc omitted to clear data digest
https://github.com/ceph/ceph/pull/42019 Backport Bot
06:55 AM Backport #50790 (Resolved): octopus: osd: write_trunc omitted to clear data digest
https://github.com/ceph/ceph/pull/41620 Backport Bot
06:55 AM Backport #50789 (Rejected): nautilus: osd: write_trunc omitted to clear data digest
Backport Bot
06:54 AM Bug #50763 (Pending Backport): osd: write_trunc omitted to clear data digest
Kefu Chai

05/12/2021

10:44 PM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
Myoungwon Oh, any idea what could be causing this? Feel free to unassign if you are not aware of what is causing thi... Neha Ojha
10:39 PM Bug #49688: FAILED ceph_assert(is_primary()) in submit_log_entries during PromoteManifestCallback...
/a/yuriw-2021-05-11_19:33:39-rados-wip-yuri2-testing-2021-05-11-1032-pacific-distro-basic-smithi/6110085 Neha Ojha
06:02 PM Bug #50761: ceph mon hangs forever while trying to parse config
Ilya Dryomov wrote:
> Are you sure that client.admin.4638.log was generated by your hello-world binary? Because the...
Deepika Upadhyay
05:15 PM Bug #50761: ceph mon hangs forever while trying to parse config
Are you sure that client.admin.4638.log was generated by your hello-world binary? Because the complete log attached ... Ilya Dryomov
04:38 PM Bug #50761: ceph mon hangs forever while trying to parse config
logs: https://drive.google.com/file/d/1RdLToyo3vpL3nFMI2U3hfGrY4tpfQ9Az/view?usp=sharing Deepika Upadhyay
04:03 PM Bug #50761: ceph mon hangs forever while trying to parse config
Ilya Dryomov wrote:
> Where did client.admin.4803.log come from? How was it captured?
I ran a hello world script:
<...
Deepika Upadhyay
03:51 PM Bug #50761: ceph mon hangs forever while trying to parse config
Where did client.admin.4803.log come from? How was it captured? Ilya Dryomov
02:24 PM Bug #50775: mds and osd unable to obtain rotating service keys
I will reproduce, fix, and verify this bug, then send the bugfix for code review. wenge song
02:23 PM Bug #50775 (Fix Under Review): mds and osd unable to obtain rotating service keys
version-15.2.0
error message:
2021-05-04T05:51:54.719+0800 7f105b2737c0 -1 mds.c unable to obtain rotating serv...
wenge song
02:19 PM Bug #50384 (Fix Under Review): pacific ceph-mon: mon initial failed on aarch64
Sage Weil
12:09 PM Backport #50666 (Resolved): pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat fai...
Sage Weil
12:09 PM Bug #50595 (Resolved): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
Sage Weil
07:06 AM Bug #50747 (Fix Under Review): nautilus: osd: backfill_unfound state reset to clean after osd res...
Mykola Golub
05:46 AM Bug #50763 (Fix Under Review): osd: write_trunc omitted to clear data digest
Kefu Chai
02:20 AM Bug #50763: osd: write_trunc omitted to clear data digest
https://github.com/ceph/ceph/pull/41290 Zengran Zhang
02:13 AM Bug #50763 (Resolved): osd: write_trunc omitted to clear data digest
Zengran Zhang

05/11/2021

07:43 PM Backport #50666 (In Progress): pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat ...
Neha Ojha
04:32 PM Bug #50692: nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
/a/yuriw-2021-05-11_14:36:21-rados-wip-yuri2-testing-2021-05-10-1557-nautilus-distro-basic-smithi/6109477 Neha Ojha
03:37 PM Bug #50761 (New): ceph mon hangs forever while trying to parse config
... Deepika Upadhyay
08:58 AM Bug #50004 (Resolved): mon: Modify Paxos trim logic to be more efficient
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:57 AM Bug #50395 (Resolved): filestore: ENODATA error after directory split confuses transaction
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:52 AM Backport #50125 (Resolved): nautilus: mon: Modify Paxos trim logic to be more efficient
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41099
m...
Loïc Dachary
08:52 AM Backport #50506 (Resolved): nautilus: mon/MonClient: reset authenticate_err in _reopen_session()
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41016
m...
Loïc Dachary
08:52 AM Backport #50481 (Resolved): nautilus: filestore: ENODATA error after directory split confuses tra...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40987
m...
Loïc Dachary
08:50 AM Backport #50504 (Resolved): octopus: mon/MonClient: reset authenticate_err in _reopen_session()
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41017
m...
Loïc Dachary
08:50 AM Backport #50479 (Resolved): octopus: filestore: ENODATA error after directory split confuses tran...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40988
m...
Loïc Dachary
07:48 AM Backport #49918 (Resolved): pacific: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/41090
m...
Loïc Dachary
07:45 AM Backport #50750 (Resolved): octopus: max_misplaced was replaced by target_max_misplaced_ratio
https://github.com/ceph/ceph/pull/41624 Backport Bot
07:45 AM Backport #50749 (Rejected): nautilus: max_misplaced was replaced by target_max_misplaced_ratio
Backport Bot
07:45 AM Backport #50748 (Resolved): pacific: max_misplaced was replaced by target_max_misplaced_ratio
https://github.com/ceph/ceph/pull/42250 Backport Bot
07:41 AM Bug #50745 (Pending Backport): max_misplaced was replaced by target_max_misplaced_ratio
It would be great if we could backport https://github.com/ceph/ceph/pull/41207 along with the https://github.com/ceph... Kefu Chai
04:22 AM Bug #50745 (Resolved): max_misplaced was replaced by target_max_misplaced_ratio
but the documentation was not synced. Kefu Chai
07:07 AM Bug #50747 (Fix Under Review): nautilus: osd: backfill_unfound state reset to clean after osd res...
On nautilus we have been observing an issue when an EC pg is in active+backfill_unfound+degraded state (which happens... Mykola Golub
07:00 AM Bug #50351 (Fix Under Review): osd: FAILED ceph_assert(recovering.count(*i)) after non-primary os...
In the mailing list thread [1] I provided some details why I think the current behaviour of `PrimaryLogPG::on_failed_... Mykola Golub
04:38 AM Bug #50746 (New): osd: terminate called after throwing an instance of 'std::out_of_range'
... Xiubo Li
02:30 AM Bug #50743 (Need More Info): *: crash in pthread_getname_np

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6f...
Yaarit Hatuka

05/10/2021

05:07 PM Bug #50681: memstore: apparent memory leak when removing objects
How long did you wait to see if memory usage dropped? Did you look at any logs or dump any pool object info?
I rea...
Greg Farnum
07:35 AM Bug #46670: refuse to remove mon from the monmap if the mon is in quorum
I still believe the "extra" security is important; we do this for pools, and mons are almost equally critical... Sébastien Han
04:03 AM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
I have encountered this 4 more times on a 20-OSD cluster now running 16.2.3. If needed, I can provide more info. 玮文 胡

05/09/2021

05:17 PM Bug #45690: pg_interval_t::check_new_interval is overly generous about guessing when EC PGs could...
This description doesn't seem quite right to me -- OSDs 1-3 were part of the interval in step 4 so they know that not... Greg Farnum
06:09 AM Bug #45390 (Closed): FreeBSD: osdmap decode and encode does not give the same OSDMap
I assume this is fixed by now since the FreeBSD port is under active development? :) Greg Farnum
05:45 AM Bug #46670: refuse to remove mon from the monmap if the mon is in quorum
I'm inclined to say that this is fine as-is? I don't know offhand how we remove monitors from quorum from the Ceph CLI. Greg Farnum
05:42 AM Bug #46876 (Resolved): osd/ECBackend: optimize remaining read as readop contain multiple objects
Greg Farnum
05:08 AM Feature #47666: Ceph pool history
Much of this is also maintained in the audit log, but that's not easily ingested back by Ceph. Greg Farnum
03:42 AM Feature #48151 (Closed): osd: allow remote read by calling cls method from within cls context
Remote calls like this are unfortunately not feasible to implement within the object handling workflow. Greg Farnum
03:19 AM Support #48530 (Closed): ceph pg status in incomplete.
This kind of question is best served on the ceph-users@ceph.io mailing list if you can't find the answer in the docum... Greg Farnum

05/08/2021

11:04 PM Bug #49158: doc: ceph-monstore-tools might create wrong monitor store
This problem was fixed by the following PR:
https://github.com/ceph/ceph/pull/39288
Satoru Takeuchi
02:49 PM Bug #48468: ceph-osd crash before being up again
Hi Sage,
Hmm, I've finally managed to recover my cluster after countless OSD restart procedures, until they star...
Clément Hampaï
11:21 AM Backport #50701 (In Progress): nautilus: Data loss propagation after backfill
Mykola Golub
08:40 AM Backport #50701 (Resolved): nautilus: Data loss propagation after backfill
https://github.com/ceph/ceph/pull/41238 Backport Bot
11:20 AM Backport #50703 (In Progress): octopus: Data loss propagation after backfill
Mykola Golub
08:40 AM Backport #50703 (Resolved): octopus: Data loss propagation after backfill
https://github.com/ceph/ceph/pull/41237 Backport Bot
11:19 AM Backport #50702 (In Progress): pacific: Data loss propagation after backfill
Mykola Golub
08:40 AM Backport #50702 (Resolved): pacific: Data loss propagation after backfill
https://github.com/ceph/ceph/pull/41236 Backport Bot
09:40 AM Backport #50706 (Resolved): pacific: _delete_some additional unexpected onode list
https://github.com/ceph/ceph/pull/41680 Backport Bot
09:40 AM Backport #50705 (Resolved): octopus: _delete_some additional unexpected onode list
https://github.com/ceph/ceph/pull/41623 Backport Bot
09:40 AM Backport #50704 (Resolved): nautilus: _delete_some additional unexpected onode list
https://github.com/ceph/ceph/pull/41682 Backport Bot
09:37 AM Bug #50466 (Pending Backport): _delete_some additional unexpected onode list
Konstantin Shalygin
08:37 AM Bug #50558 (Pending Backport): Data loss propagation after backfill
Kefu Chai
08:30 AM Backport #50697 (Resolved): pacific: common: the dump of thread IDs is in dec instead of hex
https://github.com/ceph/ceph/pull/53465 Backport Bot
08:29 AM Bug #50653 (Pending Backport): common: the dump of thread IDs is in dec instead of hex
Kefu Chai
03:26 AM Feature #49089 (In Progress): msg: add new func support_reencode
Greg Farnum
03:24 AM Support #49268 (Closed): Blocked IOs up to 30 seconds when host powered down
You can also tune how quickly the OSDs report their peers down from missing heartbeats, but in general losing a monit... Greg Farnum
03:17 AM Support #49489: Getting Long heartbeat and slow requests on ceph luminous 12.2.13
This is almost certainly a result of cache tiering (which we generally discourage from use) being a bad fit or incorr... Greg Farnum

05/07/2021

11:39 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
I have attached a coredump. This hook works fine in 15.2.9. I can also run it fine manually from inside a launched OS... Andrew Davidoff
09:57 PM Bug #50659 (Need More Info): Segmentation fault under Pacific 16.2.1 when using a custom crush lo...
Is it possible for you to capture a coredump? Did the same crush_location_hook work fine on your 15.2.9 cluster? Neha Ojha
10:12 PM Bug #50637: OSD slow ops warning stuck after OSD fail
This sounds like a bug; we shouldn't be accounting for down+out OSDs when counting slow ops. Neha Ojha
08:57 AM Bug #50637: OSD slow ops warning stuck after OSD fail
I now zapped and re-created the OSDs on this disk. As expected, purging OSD 580 from the cluster cleared the health w... Frank Schilder
10:01 PM Bug #50657: smart query on monitors
Yaarit, can you help take a look at this? Neha Ojha
09:45 PM Bug #50682: Pacific - OSD not starting after upgrade
This issue has been fixed by https://github.com/ceph/ceph/pull/40845 and will be released in the next pacific point r... Neha Ojha
07:48 PM Bug #47949: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_scrub: return 1
/a/yuriw-2021-05-06_15:20:22-rados-wip-yuri4-testing-2021-05-05-1236-nautilus-distro-basic-smithi/6101282 Neha Ojha
07:47 PM Bug #50692 (Resolved): nautilus: ERROR: test_rados.TestIoctx.test_service_daemon
... Neha Ojha
02:25 PM Bug #50688 (Duplicate): Ceph can't be deployed using cephadm on nodes with /32 ip addresses
*Preamble*
In certain data centers it is common to assign a /32 IP address to a node and let BGP handle the reacha...
Francesco Pantano
12:34 PM Bug #50681: memstore: apparent memory leak when removing objects
Thanks, Greg, for your answer. My expectation was that, at least when there is memory pressure or I am unmounting th... Sven Anderson
09:16 AM Bug #50683: [RBD] master - cluster [WRN] Health check failed: mon is allowing insecure global_id ...
Hi Harish,... Ilya Dryomov
09:04 AM Bug #50683 (Rejected): [RBD] master - cluster [WRN] Health check failed: mon is allowing insecure...
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_19220a3bd6e252c6e826082...
Harish Munjulur
09:06 AM Bug #49231: MONs unresponsive over extended periods of time
After running for a few months with the modified setting, it seems that it fixes the issue. I still see CPU load- and... Frank Schilder
03:10 AM Backport #49919 (In Progress): nautilus: mon: slow ops due to osd_failure
Kefu Chai
02:13 AM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
/a/nojha-2021-05-06_22:58:00-rados-wip-default-mclock-2021-05-06-distro-basic-smithi/6102970 Neha Ojha
01:22 AM Bug #50162 (Won't Fix): Backport to Natilus of automatic lowering min_size for repairing tasks (o...
Nathan Cutler wrote:
> This needs a pull request ID, or a list of master commits that are requested to be backported...
Neha Ojha

05/06/2021

11:48 PM Bug #50682 (New): Pacific - OSD not starting after upgrade
Copied from https://tracker.ceph.com/issues/50169
Using Ubuntu 20.04, no cephadm, packages from the Ceph repositorie...
Greg Farnum
09:49 PM Bug #50681: memstore: apparent memory leak when removing objects
I’m not totally clear on what you’re doing here and what you think the erroneous behavior is. Memstore only stores da... Greg Farnum
07:05 PM Bug #50681: memstore: apparent memory leak when removing objects
The title should say "osd objectstore = memstore" Sven Anderson
06:31 PM Bug #50681 (New): memstore: apparent memory leak when removing objects
When I create and unlink big files like in this[1] little program in my development environment, the OSD daemon keeps... Sven Anderson
02:26 AM Bug #50558: Data loss propagation after backfill
For the record, the following is the sequence of the data loss propagation when readdir error happens on filestore du... Tomohiro Misono

05/05/2021

09:38 PM Bug #49809: 1 out of 3 mon crashed in MonitorDBStore::get_synchronizer
Hi Christian,
No, unfortunately I hit a dead end on this as the log message issue was a red herring.
I'm afraid...
Brad Hubbard
02:36 PM Bug #49809: 1 out of 3 mon crashed in MonitorDBStore::get_synchronizer
Brad were you able to find out more about the root cause of this crash? Christian Rohmann
09:09 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
Possibly related to #40119 Brad Hubbard
06:05 PM Bug #45423: api_tier_pp: [ FAILED ] LibRadosTwoPoolsPP.HitSetWrite
/a/sage-2021-05-05_15:58:13-rados-wip-sage-testing-2021-05-04-1814-distro-basic-smithi/6099487
Sage Weil
07:26 PM Backport #49918: pacific: mon: slow ops due to osd_failure
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41090
merged
Yuri Weinstein
06:10 PM Backport #50666 (Resolved): pacific: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat fai...
https://github.com/ceph/ceph/pull/41182 Backport Bot
06:09 PM Bug #50595 (Pending Backport): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
Sage Weil
04:26 PM Backport #50504: octopus: mon/MonClient: reset authenticate_err in _reopen_session()
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41017
merged
Yuri Weinstein
04:25 PM Backport #50479: octopus: filestore: ENODATA error after directory split confuses transaction
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40988
merged
Yuri Weinstein
03:33 PM Bug #17257 (Can't reproduce): ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
Sage Weil
02:29 PM Bug #50659: Segmentation fault under Pacific 16.2.1 when using a custom crush location hook
I forgot to add that I tried to diff code I thought was relevant between tags v15.2.9 and v16.2.1 and thought I saw s... Andrew Davidoff
02:21 PM Bug #50659 (Resolved): Segmentation fault under Pacific 16.2.1 when using a custom crush location...
I feel like if this wasn't somehow just my problem, there'd be an issue open on it already, but I'm not seeing one, a... Andrew Davidoff
01:58 PM Bug #50658 (New): TEST_backfill_pool_priority fails
... Kefu Chai
12:39 PM Bug #50657 (Resolved): smart query on monitors
Since the upgrade to Pacific, our manager queries each daemon for smart statistics.
This is fine on the OSDs (at l...
Jan-Philipp Litza
10:51 AM Bug #49962 (Fix Under Review): 'sudo ceph --cluster ceph osd crush tunables default' fails due to...
https://github.com/ceph/ceph/pull/41169 Radoslaw Zarzynski
03:56 AM Bug #40119: api_tier_pp hung causing a dead job
/a/bhubbard-2021-04-26_22:38:21-rados-master-distro-basic-smithi/6075940
In this instance the slow requests are on...
Brad Hubbard

05/04/2021

07:28 PM Bug #50647: common: the fault handling becomes inoperational when multiple faults happen the same...
Just for the record: https://gist.github.com/rzarzynski/eb21e48a4458b593912eccd50ab8da46. Radoslaw Zarzynski
07:09 PM Bug #50647 (Fix Under Review): common: the fault handling becomes inoperational when multiple fau...
https://github.com/ceph/ceph/pull/41154 Radoslaw Zarzynski
02:42 PM Bug #50647 (Fix Under Review): common: the fault handling becomes inoperational when multiple fau...
The problem arises due to installing the fault handlers with the flag @SA_RESETHAND@. It instructs the kernel to rest... Radoslaw Zarzynski
07:26 PM Bug #50653 (Fix Under Review): common: the dump of thread IDs is in dec instead of hex
https://github.com/ceph/ceph/pull/41155 Radoslaw Zarzynski
07:08 PM Bug #50653 (Resolved): common: the dump of thread IDs is in dec instead of hex
It's a fallout from 5b8274f09951c7f36eb1ca1a234e7c8a08c30c9c. Radoslaw Zarzynski
04:39 PM Backport #50125: nautilus: mon: Modify Paxos trim logic to be more efficient
https://github.com/ceph/ceph/pull/41099 merged Yuri Weinstein
04:38 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
https://github.com/ceph/ceph/pull/41098 merged Yuri Weinstein
04:37 PM Backport #50506: nautilus: mon/MonClient: reset authenticate_err in _reopen_session()
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/41016
merged
Yuri Weinstein
04:02 PM Bug #50595 (Fix Under Review): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
... Sage Weil
03:31 PM Backport #50481: nautilus: filestore: ENODATA error after directory split confuses transaction
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40987
merged
Yuri Weinstein
03:16 PM Bug #50648 (New): nautilus: ceph status check times out
... Deepika Upadhyay
07:47 AM Bug #50637 (Duplicate): OSD slow ops warning stuck after OSD fail
We had a disk fail with 2 OSDs deployed on it, ids=580, 581. Since then, the health warning @430 slow ops, oldest one... Frank Schilder

05/03/2021

11:05 PM Backport #50344: pacific: mon: stretch state is inconsistently-maintained on peons, preventing pr...
https://github.com/ceph/ceph/pull/41130 Greg Farnum
10:59 PM Backport #50087 (In Progress): pacific: test_mon_pg: mon fails to join quorum to due election str...
Neha Ojha
09:57 PM Backport #50087: pacific: test_mon_pg: mon fails to join quorum to due election strategy mismatch
https://github.com/ceph/ceph/pull/40484 Greg Farnum
08:53 PM Bug #47719 (Resolved): api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:53 PM Bug #48946 (Resolved): Disable and re-enable clog_to_monitors could trigger assertion
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:52 PM Bug #49392 (Resolved): osd ok-to-stop too conservative
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
08:50 PM Backport #49640 (Resolved): nautilus: Disable and re-enable clog_to_monitors could trigger assertion
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/39912
m...
Loïc Dachary
08:49 PM Backport #49567 (Resolved): nautilus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40697
m...
Loïc Dachary
08:47 PM Backport #50130: nautilus: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40700
m...
Loïc Dachary
08:47 PM Backport #49531 (Resolved): nautilus: osd ok-to-stop too conservative
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40676
m...
Loïc Dachary
08:46 PM Backport #50459: nautilus: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40959
m...
Loïc Dachary
07:17 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
Wes Dillingham wrote:
> Hello Dan and Neha. Shortly after filing this bug I went on paternity leave but have returne...
Dan van der Ster
07:04 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
Hello Dan and Neha. Shortly after filing this bug I went on paternity leave but have returned today. I will try and a... Wes Dillingham
05:48 PM Bug #50595: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
The upgrade test does not fail all the time.
upgrade:nautilus-x/parallel/{0-cluster/{openstack start} 1-ceph-install...
Neha Ojha
05:21 PM Bug #50595: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
I think the upgrade test just needs to skip that test. It's just looking for a specific string that changed in pacifi... Sage Weil
04:31 PM Bug #50595 (Triaged): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
This seems related to ab0d8f2ae9f551e15a4c7bacbf69161e91263785.
Reverting makes the issue go away http://pulpito.fro...
Neha Ojha
04:36 PM Bug #48997: rados/singleton/all/recovery-preemption: defer backfill|defer recovery not found in logs
/a/yuriw-2021-04-30_13:45:05-rados-pacific-distro-basic-smithi/6086228 Neha Ojha
04:22 PM Backport #50129: octopus: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40758
m...
Loïc Dachary
04:21 PM Backport #49917: octopus: mon: slow ops due to osd_failure
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40558
m...
Loïc Dachary
04:21 PM Backport #50123: octopus: mon: Modify Paxos trim logic to be more efficient
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40699
m...
Loïc Dachary
04:21 PM Backport #49566: octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40756
m...
Loïc Dachary
04:20 PM Backport #49816: octopus: mon: promote_standby does not update available_modules
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40757
m...
Loïc Dachary
04:19 PM Backport #50457: octopus: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/d...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40958
m...
Loïc Dachary
04:16 PM Bug #48336 (Resolved): monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_t::...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:14 PM Bug #49778 (Resolved): mon: promote_standby does not update available_modules
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Loïc Dachary
04:04 PM Backport #50124 (Resolved): pacific: mon: Modify Paxos trim logic to be more efficient
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40691
m...
Loïc Dachary
04:04 PM Backport #50480: pacific: filestore: ENODATA error after directory split confuses transaction
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40989
m...
Loïc Dachary
04:04 PM Backport #50154 (Resolved): pacific: Reproduce https://tracker.ceph.com/issues/48417
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40759
m...
Loïc Dachary
04:04 PM Backport #50131 (Resolved): pacific: monmaptool --create --add nodeA --clobber monmap aborts in e...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40690
m...
Loïc Dachary
03:53 PM Backport #50458: pacific: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/d...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40957
m...
Loïc Dachary
12:07 PM Bug #47299: Assertion in pg_missing_set: p->second.need <= v || p->second.is_delete()
I had multiple OSDs die during an upgrade from Nautilus to Octopus with this trace. See the attached crash2.txt Tobias Urdin
12:05 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Longer debug_osd output in crash1.txt file.... Tobias Urdin
12:04 PM Bug #50608: ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
Longer debug_osd output in crash1.txt file.
-1> 2021-04-29T13:51:57.756+0200 7fa2edae2700 -1 /home/jenkins-b...
Tobias Urdin
12:02 PM Bug #50608 (Need More Info): ceph_assert(is_primary()) in PrimaryLogPG::on_local_recover
This was happening on a cluster running Nautilus 14.2.18/14.2.19; while upgrading to Octopus 15.2.11, some 3-4 OSDs cr... Tobias Urdin
09:00 AM Backport #50606 (In Progress): pacific: osd/scheduler/mClockScheduler: Async reservers are not up...
Sridhar Seshasayee
08:12 AM Backport #50606 (Resolved): pacific: osd/scheduler/mClockScheduler: Async reservers are not updat...
https://github.com/ceph/ceph/pull/41125 Sridhar Seshasayee

05/02/2021

01:44 PM Bug #50510: OSD will return -EAGAIN on balance_reads although it can return the data
Neha Ojha wrote:
> Can you please provide osd logs from the primary and replica with debug_osd=20 and debug_ms=1? Th...
Zulai Wang

05/01/2021

09:54 PM Support #49847 (Closed): OSD Fails to init after upgrading to octopus: _deferred_replay failed to...
Igor Fedotov
12:12 AM Bug #50595: upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
... Neha Ojha

04/30/2021

09:53 PM Bug #50420 (Need More Info): all osd down after mon scrub too long
Can you provide us with the cluster log from this time? How large is your mon db? Neha Ojha
09:21 PM Bug #50422: Error: finished tid 1 when last_acked_tid was 2
Looks like a cache tiering+short pg log bug... Neha Ojha
08:29 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Dan van der Ster wrote:
> Leaving this open to address the msgr2 abort. Presumably this is caused by the >4GB messag...
Josh Durgin
06:36 AM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Josh Durgin wrote:
> Ah good catch Dan, that loop does appear to be generating millions of '=' in nautilus. Sounds l...
Dan van der Ster
06:07 AM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Leaving this open to address the msgr2 abort. Presumably this is caused by the >4GB message generated to respond to `... Dan van der Ster
01:08 AM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Ah good catch Dan, that loop does appear to be generating millions of '=' in nautilus. Sounds like we need to fix tha... Josh Durgin
07:38 PM Bug #50595 (Resolved): upgrade:nautilus-x-pacific: LibRadosService.StatusFormat failure
Seems to be Ubuntu 18.04 specific
Run: https://pulpito.ceph.com/yuriw-2021-04-29_16:10:26-upgrade:nautilus-x-pacif...
Yuri Weinstein
06:56 PM Backport #49919: nautilus: mon: slow ops due to osd_failure
Kefu, can you please help with a minimal backport for nautilus? Neha Ojha
05:38 PM Bug #50501 (Pending Backport): osd/scheduler/mClockScheduler: Async reservers are not updated wit...
Neha Ojha
04:27 PM Bug #45721: CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_ra...
/a/yuriw-2021-04-30_12:58:14-rados-wip-yuri2-testing-2021-04-29-1501-pacific-distro-basic-smithi/6086154 Neha Ojha
04:25 PM Bug #50042: rados/test.sh: api_watch_notify failures
... Neha Ojha
09:21 AM Backport #50125: nautilus: mon: Modify Paxos trim logic to be more efficient
please link this Backport tracker issue with GitHub PR https://github.com/ceph/ceph/pull/41099
ceph-backport.sh versi...
Aishwarya Mathuria

04/29/2021

11:16 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
commit 5f95ec4457059889bc4dbc2ad25cdc0537255f69 removed that loop in Monitor.cc but wasn't backported to nautilus.
...
Dan van der Ster
10:39 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Something else weird in the mgr log: can negative progress events break the mon ?... Dan van der Ster
10:26 PM Bug #50587: mon election storm following osd recreation: huge tcmalloc and ceph::msgr::v2::FrameA...
Here are some notes on the timelines between various actors at the start of the incident:
mon.cephbeesly-mon-2a00f...
Dan van der Ster
07:14 PM Bug #50587 (Resolved): mon election storm following osd recreation: huge tcmalloc and ceph::msgr:...
We recreated an osd and seconds later our mons started using 100% CPU and going into an election storm which lasted n... Dan van der Ster
08:55 PM Bug #48732: Marking OSDs out causes mon daemons to crash following tcmalloc: large alloc
Wes, did you ever find out more about the root cause of this? We saw something similar today in #50587
Dan van der Ster
07:51 PM Bug #50510 (Need More Info): OSD will return -EAGAIN on balance_reads although it can return the ...
Can you please provide osd logs from the primary and replica with debug_osd=20 and debug_ms=1? That will help us unde... Neha Ojha
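(For reference, debug levels like the ones requested above can usually be raised at runtime via the ceph CLI; osd.0 below is a placeholder id, and which form applies depends on the Ceph release:)

```shell
# Raise logging on a specific OSD at runtime (placeholder id osd.0),
# persisted via the central config store:
ceph config set osd.0 debug_osd 20/20
ceph config set osd.0 debug_ms 1/1
# or injected directly into the running daemon without persisting:
ceph tell osd.0 injectargs '--debug_osd 20 --debug_ms 1'
```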
05:41 PM Backport #49918 (In Progress): pacific: mon: slow ops due to osd_failure
Neha Ojha
05:06 PM Backport #50480 (Resolved): pacific: filestore: ENODATA error after directory split confuses tran...
Nathan Cutler
12:05 PM Bug #50558 (Fix Under Review): Data loss propagation after backfill
Kefu Chai
11:25 AM Bug #50558: Data loss propagation after backfill
Hi
I worked with hase-san and submitted a PR to handle the readdir error correctly in the filestore code: https://github.com...
Tomohiro Misono
10:38 AM Fix #50574: qa/standalone: Modify/re-write failing standalone tests with mclock scheduler
Standalone failures observed here:
https://pulpito.ceph.com/sseshasa-2021-04-23_15:37:51-rados-wip-mclock-max-backfi...
Sridhar Seshasayee
05:43 AM Fix #50574 (Resolved): qa/standalone: Modify/re-write failing standalone tests with mclock scheduler
A subset of the existing qa/standalone tests is failing with osd_op_queue set to "mclock_scheduler".
This is mainl...
Sridhar Seshasayee
10:19 AM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
We run into the same assert over and over on one OSD. We were upgrading from luminous to nautilus ceph version 14.2.2... Ana Aviles

04/28/2021

12:17 PM Bug #50558 (Resolved): Data loss propagation after backfill
Situation:
An OSD's data loss has been propagated to other OSDs. If backfill is performed when a shard is missing in a p... Jin Hase
Jin Hase

04/27/2021

09:14 PM Backport #50130 (Resolved): nautilus: monmaptool --create --add nodeA --clobber monmap aborts in ...
Brad Hubbard
04:48 PM Backport #50130: nautilus: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40700
merged
Yuri Weinstein
09:06 PM Backport #49567: nautilus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40697
merged
Yuri Weinstein
04:47 PM Backport #49531: nautilus: osd ok-to-stop too conservative
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40676
merged
Yuri Weinstein
12:37 PM Bug #50536 (New): "Command failed (workunit test rados/test.sh)" - rados/test.sh times out on mas...
/a/sseshasa-2021-04-23_18:11:53-rados-wip-sseshasa-testing-2021-04-23-2212-distro-basic-smithi/6068991
Noticed a t...
Sridhar Seshasayee
11:23 AM Bug #49888: rados/singleton: radosbench.py: teuthology.exceptions.MaxWhileTries: reached maximum ...
Observed the same failure here:
/a/sseshasa-2021-04-23_18:11:53-rados-wip-sseshasa-testing-2021-04-23-2212-distro-ba...
Sridhar Seshasayee
06:08 AM Backport #50129 (Resolved): octopus: monmaptool --create --add nodeA --clobber monmap aborts in e...
Kefu Chai

04/26/2021

09:33 PM Backport #50124: pacific: mon: Modify Paxos trim logic to be more efficient
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40691
merged
Yuri Weinstein
09:30 PM Backport #50480: pacific: filestore: ENODATA error after directory split confuses transaction
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40989
merged
Yuri Weinstein
09:29 PM Backport #50154: pacific: Reproduce https://tracker.ceph.com/issues/48417
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40759
merged
Yuri Weinstein
09:27 PM Backport #50131: pacific: monmaptool --create --add nodeA --clobber monmap aborts in entity_addr_...
Backport Bot wrote:
> https://github.com/ceph/ceph/pull/40690
merged
Yuri Weinstein
06:32 PM Bug #50512: upgrade:nautilus-p2p-nautilus: unhandled event in ToDelete
... Neha Ojha
09:54 AM Bug #50351: osd: FAILED ceph_assert(recovering.count(*i)) after non-primary osd restart when in b...
`PrimaryLogPG::on_failed_pull` [1] looks suspicious to me. We remove the oid from `backfills_in_flight` here only if ... Mykola Golub

04/25/2021

11:41 PM Bug #50512 (Won't Fix - EOL): upgrade:nautilus-p2p-nautilus: unhandled event in ToDelete
Run: https://pulpito.ceph.com/teuthology-2021-04-22_01:25:03-upgrade:nautilus-p2p-nautilus-distro-basic-smithi/
Job:...
Yuri Weinstein
05:59 PM Bug #50510 (Need More Info): OSD will return -EAGAIN on balance_reads although it can return the ...
PrimaryLogPG.cc:
if (!is_primary()) {
if (!recovery_state.can_serve_replica_read(oid)) {
dout(20) <...
Or Friedmann
02:23 PM Bug #50508: ceph-mon crash when create pool
Hi, the ceph-mon process crashed when I created a pool. ... chen long
02:18 PM Bug #50508: ceph-mon crash when create pool
[root@arsenal-ceph-test-167 ~]# ceph crash ls
ID ENTIT...
chen long
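(The crash listing above comes from the mgr crash module; a typical inspection sequence, with <crash-id> as a placeholder, looks roughly like:)

```shell
# List crashes recorded by the mgr crash module
ceph crash ls
# Show full metadata and backtrace for one entry (placeholder id)
ceph crash info <crash-id>
```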
02:15 PM Bug #50508 (New): ceph-mon crash when create pool
chen long
10:00 AM Backport #50505 (In Progress): pacific: mon/MonClient: reset authenticate_err in _reopen_session()
Ilya Dryomov
10:00 AM Backport #50504 (In Progress): octopus: mon/MonClient: reset authenticate_err in _reopen_session()
Ilya Dryomov
09:59 AM Backport #50506 (In Progress): nautilus: mon/MonClient: reset authenticate_err in _reopen_session()
Ilya Dryomov
02:52 AM Backport #49917 (Resolved): octopus: mon: slow ops due to osd_failure
Kefu Chai
02:51 AM Backport #50123 (Resolved): octopus: mon: Modify Paxos trim logic to be more efficient
Kefu Chai
02:49 AM Backport #49566 (Resolved): octopus: api_watch_notify: LibRadosWatchNotify.AioWatchDelete2 fails
Kefu Chai
02:49 AM Backport #49816 (Resolved): octopus: mon: promote_standby does not update available_modules
Kefu Chai

04/24/2021

05:55 AM Backport #50506 (Resolved): nautilus: mon/MonClient: reset authenticate_err in _reopen_session()
https://github.com/ceph/ceph/pull/41016 Backport Bot
05:55 AM Backport #50505 (Resolved): pacific: mon/MonClient: reset authenticate_err in _reopen_session()
https://github.com/ceph/ceph/pull/41019 Backport Bot
05:55 AM Backport #50504 (Resolved): octopus: mon/MonClient: reset authenticate_err in _reopen_session()
https://github.com/ceph/ceph/pull/41017 Backport Bot
05:51 AM Bug #50477 (Pending Backport): mon/MonClient: reset authenticate_err in _reopen_session()
Kefu Chai
05:20 AM Bug #49961: scrub/osd-recovery-scrub.sh: TEST_recovery_scrub_1 failed
/a/kchai-2021-04-24_04:07:09-rados-wip-kefu-testing-2021-04-23-2026-distro-basic-smithi/6070018 Kefu Chai

04/23/2021

02:40 PM Bug #43489: PG.cc: 953: FAILED assert(0 == "past_interval start interval mismatch")
This has been observed a handful of times at AT&T over the last six months or so. I'm afraid I don't have logs, but t... Steve Taylor
01:45 PM Bug #50501 (Fix Under Review): osd/scheduler/mClockScheduler: Async reservers are not updated wit...
Sridhar Seshasayee
01:06 PM Bug #50501 (Resolved): osd/scheduler/mClockScheduler: Async reservers are not updated with the ov...
The local and remote Async reserver objects are not updated with the new overridden values as part of mClockScheduler... Sridhar Seshasayee
08:58 AM Bug #50466: _delete_some additional unexpected onode list
Konstantin Shalygin wrote:
> Actually, when PG objects are deleted with sleep 1 (default) the NVMe is not loaded, but when...
Dan van der Ster
08:52 AM Bug #50466: _delete_some additional unexpected onode list
Actually, when PG objects are deleted with sleep 1 (default) the NVMe is not loaded, but when the pg header is deleted it is hug... Konstantin Shalygin

04/22/2021

05:53 PM Bug #50466: _delete_some additional unexpected onode list
https://github.com/ceph/ceph/pull/40993 Neha Ojha
04:09 PM Bug #49591: no active mgr (MGR_DOWN)" in cluster log
/a/yuriw-2021-04-21_15:39:30-rados-wip-yuri5-testing-2021-04-20-0819-pacific-distro-basic-smithi/6061973/ Neha Ojha
03:24 PM Backport #50480 (In Progress): pacific: filestore: ENODATA error after directory split confuses t...
Mykola Golub
01:25 PM Backport #50480 (Resolved): pacific: filestore: ENODATA error after directory split confuses tran...
https://github.com/ceph/ceph/pull/40989 Backport Bot
03:23 PM Backport #50479 (In Progress): octopus: filestore: ENODATA error after directory split confuses t...
Mykola Golub
01:25 PM Backport #50479 (Resolved): octopus: filestore: ENODATA error after directory split confuses tran...
https://github.com/ceph/ceph/pull/40988 Backport Bot
03:20 PM Backport #50481 (In Progress): nautilus: filestore: ENODATA error after directory split confuses ...
Mykola Golub
01:25 PM Backport #50481 (Resolved): nautilus: filestore: ENODATA error after directory split confuses tra...
https://github.com/ceph/ceph/pull/40987 Backport Bot
01:25 PM Bug #50352 (Resolved): LibRadosTwoPoolsPP.ManifestSnapRefcount failure
Kefu Chai
01:20 PM Bug #50395 (Pending Backport): filestore: ENODATA error after directory split confuses transaction
Kefu Chai
10:36 AM Bug #50477 (Fix Under Review): mon/MonClient: reset authenticate_err in _reopen_session()
Ilya Dryomov
10:25 AM Bug #50477 (Resolved): mon/MonClient: reset authenticate_err in _reopen_session()
Otherwise, if "mon host" list has at least one unqualified IP address without a port and both msgr1 and msgr2 are tur... Ilya Dryomov
07:11 AM Bug #50245: TEST_recovery_scrub_2: Not enough recovery started simultaneously
/a/kchai-2021-04-22_05:10:24-rados-wip-kefu-testing-2021-04-22-1017-distro-basic-smithi/6063735/ Kefu Chai
02:41 AM Bug #49428: ceph_test_rados_api_snapshots fails with "rados_mon_command osd pool create failed wi...
/a/sage-2021-03-28_19:04:26-rados-wip-sage2-testing-2021-03-28-0933-pacific-distro-basic-smithi/6007274 Brad Hubbard
02:40 AM Bug #49428: ceph_test_rados_api_snapshots fails with "rados_mon_command osd pool create failed wi...
/a/nojha-2021-04-15_20:05:27-rados-wip-50217-distro-basic-smithi/6049636... Brad Hubbard
02:40 AM Bug #50042: rados/test.sh: api_watch_notify failures
The actual failure that caused the segfault for /a/nojha-2021-04-15_20:05:27-rados-wip-50217-distro-basic-smithi/6049... Brad Hubbard
12:39 AM Bug #50042: rados/test.sh: api_watch_notify failures
-Core from /a/nojha-2021-04-15_20:05:27-rados-wip-50217-distro-basic-smithi/6049636 is the issue seen in https://trac... Brad Hubbard
02:31 AM Bug #50473: ceph_test_rados_api_lock_pp segfault in librados::v14_2_0::RadosClient::wait_for_osdm...
I suspect the Rados object was deleted in another, now finished, thread while this thread was still using it. This is... Brad Hubbard
02:16 AM Bug #50473 (Can't reproduce): ceph_test_rados_api_lock_pp segfault in librados::v14_2_0::RadosCli...
/a/nojha-2021-04-15_20:05:27-rados-wip-50217-distro-basic-smithi/6049636... Brad Hubbard

04/21/2021

11:31 PM Bug #50466: _delete_some additional unexpected onode list
I think #178:ec000000::::head# is just a pgmeta object which is skipped until all other objects are removed. Note how... Neha Ojha
02:42 PM Bug #50466: _delete_some additional unexpected onode list
osd log with debugging:
ceph-post-file: 09094430-abdb-4248-812c-47b7babae06c
Dan van der Ster
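(The ceph-post-file id above is produced by uploading a file to the Ceph developers' drop point; a typical invocation, with a placeholder log path, would be:)

```shell
# Upload a log file for the Ceph developers; the tool prints an
# upload id like the one quoted above
ceph-post-file /var/log/ceph/ceph-osd.0.log
```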
02:39 PM Bug #50466 (Resolved): _delete_some additional unexpected onode list
After updating to 14.2.19 and then moving some PGs around we have a few warnings related to the new efficient PG remo... Dan van der Ster
06:06 PM Bug #50468 (New): Simultaneous mon daemon crash on cephfs mount
I have a newly installed 16.2.0 cluster consisting of 5 nodes deployed using cephadm on Ubuntu 18.04.5. I have create... David Prude
05:30 PM Backport #50457 (Resolved): octopus: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReq...
Ernesto Puerta
10:49 AM Backport #50457 (In Progress): octopus: ERROR: test_version (tasks.mgr.dashboard.test_api.Version...
Ernesto Puerta
10:44 AM Backport #50457 (Resolved): octopus: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReq...
https://github.com/ceph/ceph/pull/40958 Ernesto Puerta
05:30 PM Bug #50374 (Resolved): ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/dash...
Ernesto Puerta
10:43 AM Bug #50374 (Pending Backport): ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) ...
Ernesto Puerta
05:29 PM Backport #50458 (Resolved): pacific: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReq...
Ernesto Puerta
10:48 AM Backport #50458 (In Progress): pacific: ERROR: test_version (tasks.mgr.dashboard.test_api.Version...
Ernesto Puerta
10:44 AM Backport #50458 (Resolved): pacific: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReq...
https://github.com/ceph/ceph/pull/40957 Ernesto Puerta
01:56 PM Backport #50459 (Resolved): nautilus: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionRe...
Ernesto Puerta
10:52 AM Backport #50459 (In Progress): nautilus: ERROR: test_version (tasks.mgr.dashboard.test_api.Versio...
Ernesto Puerta
10:45 AM Backport #50459 (Resolved): nautilus: ERROR: test_version (tasks.mgr.dashboard.test_api.VersionRe...
https://github.com/ceph/ceph/pull/40959 Ernesto Puerta
11:30 AM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
Finally, the correct format
The issue started on luminous and it looked like an instance of https://tracker.ceph.com...
Martin Steinigen
11:20 AM Bug #50462: OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.count(clone))
Sorry for the bad formatting
The issue started on luminous and it looked like an instance of https://tracker.c...
Martin Steinigen
11:17 AM Bug #50462 (Won't Fix - EOL): OSDs crash in osd/osd_types.cc: FAILED ceph_assert(clone_overlap.co...

The issue started on luminous and it looked like an instance of https://tracker.ceph.com/issues/23030, so we decide...
Martin Steinigen
09:01 AM Backport #49991 (Resolved): nautilus: unittest_mempool.check_shard_select failed
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/40567
m...
Loïc Dachary
06:45 AM Bug #50042: rados/test.sh: api_watch_notify failures
Neha Ojha wrote:
> Looks similar to https://tracker.ceph.com/issues/50042#note-2, feel free to create a separate tra...
Brad Hubbard
04:16 AM Bug #50042: rados/test.sh: api_watch_notify failures
I believe the log snippet below shows a race. We have just called and sent the unwatch command to osd4 but just after... Brad Hubbard
03:05 AM Bug #50371 (Closed): Segmentation fault (core dumped) ceph_test_rados_api_watch_notify_pp
This looks like an issue that only comes about because of the problem seen in https://tracker.ceph.com/issues/50042. ... Brad Hubbard
12:34 AM Bug #50446 (Fix Under Review): PGs always go into active+clean+scrubbing+deep+repair in the LRC
Neha Ojha
12:29 AM Bug #50446 (Triaged): PGs always go into active+clean+scrubbing+deep+repair in the LRC
... Neha Ojha

04/20/2021

11:51 PM Bug #50446 (Pending Backport): PGs always go into active+clean+scrubbing+deep+repair in the LRC
... Neha Ojha
10:08 PM Bug #50042: rados/test.sh: api_watch_notify failures
Looking at the latest issue (ignoring the segfault which is being tracked in https://tracker.ceph.com/issues/50371) t... Brad Hubbard
03:00 PM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
We are using two distinct conditions to decide whether a candidate PG is already being scrubbed. The OSD checks pg->i... Ronen Friedman
12:59 PM Bug #50441 (Rejected): cephadm bootstrap on arm64 fails to start ceph/ceph-grafana service
Hello,
I installed a new Ceph 15.2.10 cluster on Ubuntu 20.04 arm64 bare metal starting with a first monitor/manag...
M B

04/19/2021

11:02 PM Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors i...
showed up in a pacific->master upgrade test... Neha Ojha
02:49 PM Bug #50368 (Resolved): common/PriorityCache.cc: FAILED ceph_assert(mem_avail >= 0) in radosbench_...
Neha Ojha
12:29 PM Bug #50422 (New): Error: finished tid 1 when last_acked_tid was 2
... Sage Weil
12:26 PM Bug #50396: leak in PrimaryLogPG::inc_refcount_by_set
/a/sage-2021-04-18_22:27:23-rados-wip-sage-testing-2021-04-18-1607-distro-basic-smithi/6056492
/a/sage-2021-04-18_22...
Sage Weil
10:56 AM Bug #50393 (Resolved): CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/cephtest/m...
Kefu Chai
10:07 AM Bug #50420 (Need More Info): all osd down after mon scrub too long
Hi all.
My cluster has 5 mons. Everything is ok.
My ceph mon config ...
hoan nv
08:29 AM Bug #50299 (Resolved): PrimaryLogPG::inc_refcount_by_set leak
Kefu Chai

04/18/2021

01:16 AM Bug #50352: LibRadosTwoPoolsPP.ManifestSnapRefcount failure
https://github.com/ceph/ceph/pull/40900 Myoungwon Oh

04/17/2021

12:59 AM Bug #50368 (Fix Under Review): common/PriorityCache.cc: FAILED ceph_assert(mem_avail >= 0) in rad...
Neha Ojha wrote:
> Tests are passing with fdb4f834486, the commit before https://github.com/ceph/ceph/pull/40731 mer...
Neha Ojha
12:04 AM Bug #50368 (Triaged): common/PriorityCache.cc: FAILED ceph_assert(mem_avail >= 0) in radosbench_o...
Tests are passing with fdb4f834486, the commit before https://github.com/ceph/ceph/pull/40731 merged - https://pulpit... Neha Ojha

04/16/2021

10:20 PM Backport #50406 (Resolved): pacific: mon: new monitors may direct MMonJoin to a peon instead of t...
https://github.com/ceph/ceph/pull/41131 Backport Bot
10:16 PM Bug #50345 (Pending Backport): mon: new monitors may direct MMonJoin to a peon instead of the leader
Neha Ojha
10:14 PM Bug #50346: OSD crash FAILED ceph_assert(!is_scrubbing())
Ronen, can you please take a look at this bug. Neha Ojha
09:51 PM Bug #50396 (Duplicate): leak in PrimaryLogPG::inc_refcount_by_set
Neha Ojha
12:04 PM Bug #50396 (Duplicate): leak in PrimaryLogPG::inc_refcount_by_set
... Sage Weil
08:57 PM Bug #50368: common/PriorityCache.cc: FAILED ceph_assert(mem_avail >= 0) in radosbench_omap_write ...
The tests are consistently failing in master - https://pulpito.ceph.com/nojha-2021-04-16_15:33:38-rados:perf-wip-5021... Neha Ojha
03:23 PM Bug #50368: common/PriorityCache.cc: FAILED ceph_assert(mem_avail >= 0) in radosbench_omap_write ...
rados/perf/{ceph mon_election/classic objectstore/bluestore-basic-min-osd-mem-target openstack scheduler/dmclock_1Sha... Neha Ojha
02:33 PM Bug #50368: common/PriorityCache.cc: FAILED ceph_assert(mem_avail >= 0) in radosbench_omap_write ...
rados/perf/{ceph mon_election/classic objectstore/bluestore-low-osd-mem-target openstack scheduler/dmclock_default_sh... Neha Ojha
08:13 PM Bug #50398 (Duplicate): cls_cas.dup_get fails with ENOENT
Neha Ojha
12:10 PM Bug #50398 (Duplicate): cls_cas.dup_get fails with ENOENT
... Sage Weil
08:05 PM Bug #50404 (New): qa/workunits/mon/crush_ops.sh: Error ENOENT: no weight-set for pool
... Neha Ojha
05:43 PM Bug #50042: rados/test.sh: api_watch_notify failures
Looks similar to https://tracker.ceph.com/issues/50042#note-2, feel free to create a separate tracker, Brad.... Neha Ojha
05:39 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
/a/nojha-2021-04-15_20:05:27-rados-wip-50217-distro-basic-smithi/6049676 Neha Ojha
02:31 PM Bug #50397 (Duplicate): src/common/PriorityCache.cc: 301: FAILED ceph_assert(mem_avail >= 0)
Neha Ojha
12:06 PM Bug #50397 (Duplicate): src/common/PriorityCache.cc: 301: FAILED ceph_assert(mem_avail >= 0)
... Sage Weil
08:38 AM Bug #50395 (Resolved): filestore: ENODATA error after directory split confuses transaction
We had a case reported by our customer where a faulty disk was returning an ENODATA error on directory split and it crea... Mykola Golub
04:19 AM Bug #50393 (Fix Under Review): CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/ce...
Kefu Chai
04:17 AM Bug #50393: CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client...
mon/test_mon_config_key.py
/a/kchai-2021-04-15_08:31:03-rados-wip-kefu-testing-2021-04-15-1359-distro-basic-smithi...
Kefu Chai
04:16 AM Bug #50393 (Resolved): CommandCrashedError: Command crashed: 'mkdir -p -- /home/ubuntu/cephtest/m...
https://sentry.ceph.com/organizations/ceph/issues/7316/... Sentry Bot
03:42 AM Bug #50299 (Fix Under Review): PrimaryLogPG::inc_refcount_by_set leak
Kefu Chai
03:05 AM Bug #50299: PrimaryLogPG::inc_refcount_by_set leak
https://github.com/ceph/ceph/pull/40879 Myoungwon Oh
01:43 AM Bug #50299: PrimaryLogPG::inc_refcount_by_set leak
created https://github.com/ceph/ceph/pull/40878 on my way looking into this issue. Kefu Chai
01:25 AM Bug #50299: PrimaryLogPG::inc_refcount_by_set leak
I'll take a look. Myoungwon Oh
12:54 AM Bug #50299: PrimaryLogPG::inc_refcount_by_set leak
/a/kchai-2021-04-15_08:31:03-rados-wip-kefu-testing-2021-04-15-1359-distro-basic-smithi/6048611 Kefu Chai
12:33 AM Bug #50299: PrimaryLogPG::inc_refcount_by_set leak
seems related to https://github.com/ceph/ceph/pull/39216 Neha Ojha
12:25 AM Bug #50299: PrimaryLogPG::inc_refcount_by_set leak
/a/nojha-2021-04-15_20:05:27-rados-wip-50217-distro-basic-smithi/6049402 Neha Ojha
02:58 AM Bug #37808: osd: osdmap cache weak_refs assert during shutdown
/ceph/teuthology-archive/pdonnell-2021-04-15_01:35:57-fs-wip-pdonnell-testing-20210414.230315-distro-basic-smithi/604... Patrick Donnelly
02:55 AM Bug #50222: osd: 5.2s0 deep-scrub : stat mismatch
https://pulpito.ceph.com/pdonnell-2021-04-15_01:35:57-fs-wip-pdonnell-testing-20210414.230315-distro-basic-smithi/604... Patrick Donnelly
01:48 AM Bug #50384: pacific ceph-mon: mon initial failed on aarch64
As recorded in bug #48681,
https://tracker.ceph.com/issues/48681#note-14 rosin luo wrote:
> this bug has been solved...
Zhiwei Dai
01:31 AM Bug #50384 (Resolved): pacific ceph-mon: mon initial failed on aarch64
OS: centos8.1, 4.18.0, selinux is disabled
ceph: pacific 16.2.0 aarch64
platform: Kunpeng920 5250@2.6GHz
rpm bin...
Zhiwei Dai

04/15/2021

11:41 PM Bug #50371: Segmentation fault (core dumped) ceph_test_rados_api_watch_notify_pp
Here's the log from the coredump that I believe should pinpoint the issue but I'll need to analyse it further today.
...
Brad Hubbard
05:35 AM Bug #50371: Segmentation fault (core dumped) ceph_test_rados_api_watch_notify_pp
Looking at the coredump.... Brad Hubbard
03:43 AM Bug #50371 (New): Segmentation fault (core dumped) ceph_test_rados_api_watch_notify_pp
/a/nojha-2021-04-14_00:54:53-rados-master-distro-basic-smithi/6044164... Brad Hubbard
06:13 PM Bug #50376 (New): upgrade-clients:client-upgrade-octopus-pacific: cluster [WRN] Health check fail...
https://pulpito.ceph.com/ideepika-2021-04-15_12:31:36-upgrade-clients:client-upgrade-octopus-pacific-wip-rgw-dpp-upda... Neha Ojha
05:42 PM Bug #50376 (Resolved): upgrade-clients:client-upgrade-octopus-pacific: cluster [WRN] Health check...
just need to disable it for upgrade suite for now Deepika Upadhyay
02:12 PM Bug #50376: upgrade-clients:client-upgrade-octopus-pacific: cluster [WRN] Health check failed: mo...
https://sentry.ceph.com/organizations/ceph/issues/739/events/8a47fcc562d74339989ae5d93c3d1669/events/?project=2 Deepika Upadhyay
11:05 AM Bug #50376 (New): upgrade-clients:client-upgrade-octopus-pacific: cluster [WRN] Health check fail...
... Deepika Upadhyay
01:33 PM Bug #50374 (Resolved): ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/dash...
Kefu Chai
01:11 PM Bug #50374 (Fix Under Review): ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) ...
Kefu Chai
05:44 AM Bug #50374 (Resolved): ERROR: test_version (tasks.mgr.dashboard.test_api.VersionReqTest) mgr/dash...
... Kefu Chai
01:25 PM Bug #50339 (Resolved): test_cls_cas failure: FAILED cls_cas.dup_get
Kefu Chai
03:37 AM Bug #50339 (Fix Under Review): test_cls_cas failure: FAILED cls_cas.dup_get
Kefu Chai
03:44 AM Bug #50042: rados/test.sh: api_watch_notify failures
Created https://tracker.ceph.com/issues/50371 for the segfault analysis from /a/nojha-2021-04-14_00:54:53-rados-maste... Brad Hubbard
 
