Activity

From 05/05/2019 to 06/03/2019

06/03/2019

09:06 PM Support #40103: ceph monitor cannot start
The ceph-users@ceph.com mailing list is a more reliable way to get help on issues like this. Looks like the OSDMap ha... Greg Farnum
09:00 PM Bug #40117 (Duplicate): PG stuck in WaitActingChange
osd.9 requests a switch to acting set=[5] from [9,5] which never shows up. The teuthology test hangs waiting for tha... Samuel Just
08:44 PM Bug #39282 (Resolved): EIO from process_copy_chunk_manifest
Sage Weil
03:48 PM Bug #40112 (Resolved): mon: rados/multimon tests fail with clock skew
See
http://pulpito.ceph.com/sage-2019-05-30_21:14:09-rados:multimon-master-distro-basic-smithi/
or
http://p...
Sage Weil
02:06 PM Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore
See #39116 for the stack trace.
I initially thought that this and the other issue were two separate problems. How...
Iain Buclaw

06/01/2019

10:23 AM Backport #38850 (Resolved): upgrade: 1 nautilus mon + 1 luminous mon can't automatically form quorum
Nathan Cutler
12:36 AM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
I'd say the odds are high migrating the bucket indexes to bluestore would fix it - the omap structure there is very s... Josh Durgin

05/31/2019

10:32 PM Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore
Since OSDs are crashing, we should get stack traces out of the logs (e.g. osd.9). Per http://tracker.ceph.com/issues/39... David Zafman
08:19 PM Backport #38850: upgrade: 1 nautilus mon + 1 luminous mon can't automatically form quorum
Joao Eduardo Luis wrote:
> backport PR to nautilus: https://github.com/ceph/ceph/pull/28262
merged
Yuri Weinstein
07:38 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
We are planning on migrating all of our clusters to BlueStore, but that's going to take the rest of the year. We cou... Bryan Stillwell
07:08 PM Support #40103 (New): ceph monitor cannot start
I have a Ceph cluster that has been running for over 2 years, and the monitor began crashing yesterday. I had some flapping OSDs up a... JIANYU LI
01:11 AM Bug #40073 (In Progress): PG scrub stamps reset to 0.000000
David Zafman

05/30/2019

10:49 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Ok, so this is a different bug then. Any chance you're planning on migrating to bluestore with part of one of the pro... Josh Durgin
09:39 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Hey Josh,
We backfilled onto the SSDs by creating a new crush rule which just uses the ssd class and switching the...
Bryan Stillwell
02:49 PM Bug #40081: mon: luminous crash attempting to decode maps after nautilus quorum has been formed
-https://github.com/ceph/ceph/pull/28323- (closed; see Pull Request ID field for the real PR)
This actually has us...
Joao Eduardo Luis
10:39 AM Bug #40081 (Closed): mon: luminous crash attempting to decode maps after nautilus quorum has been...
While upgrading, we found a rather annoying corner case:
Assuming we start with 3 luminous ceph-mon, upgrading fro...
Joao Eduardo Luis
01:48 PM Backport #40084 (Resolved): nautilus: osd: Better error message when OSD count is less than osd_p...
https://github.com/ceph/ceph/pull/29992 Nathan Cutler
01:47 PM Backport #40083 (Resolved): mimic: osd: Better error message when OSD count is less than osd_pool...
https://github.com/ceph/ceph/pull/30180 Nathan Cutler
01:47 PM Backport #40082 (Resolved): luminous: osd: Better error message when OSD count is less than osd_p...
https://github.com/ceph/ceph/pull/30298 Nathan Cutler
01:29 PM Feature #38617 (Pending Backport): osd: Better error message when OSD count is less than osd_pool...
Kefu Chai
09:11 AM Backport #39699 (Resolved): nautilus: OSD down on snaptrim.
Nathan Cutler
05:00 AM Bug #23387 (Resolved): Building Ceph on armhf fails due to out-of-memory
I am resolving this issue, as quite a few (probably all) of the issues noted by Louwrentius have been addressed by Daniel... Kefu Chai
12:48 AM Bug #39723 (Duplicate): osd: valgrind Leak_DefinitelyLost
Greg Farnum

05/29/2019

10:34 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Hey Bryan, Neha's out this week. I'd like to verify whether this could be the same bug we'd seen before (http://track... Josh Durgin
10:07 PM Backport #39699: nautilus: OSD down on snaptrim.
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28203
merged
Yuri Weinstein
09:42 PM Bug #38827 (Fix Under Review): valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWir...
https://github.com/ceph/ceph/pull/28305 Radoslaw Zarzynski
02:54 PM Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authent...
Second run (on slightly amended branch): http://pulpito.front.sepia.ceph.com/rzarzynski-2019-05-29_13:08:09-rgw-wip-b... Radoslaw Zarzynski
09:41 PM Bug #39723 (Fix Under Review): osd: valgrind Leak_DefinitelyLost
Greg Farnum
09:24 PM Bug #39723: osd: valgrind Leak_DefinitelyLost
Okay, simple osdmap pointer assignment snafu. Working on a quick PR. Greg Farnum
09:36 PM Bug #40073: PG scrub stamps reset to 0.000000

When auto repair is enabled, a bug causes a regular scrub to reset time stamps, which is only intended to happen when...
David Zafman
07:49 PM Bug #40073: PG scrub stamps reset to 0.000000
The similarity to #40066 is so striking I just had to mention it and create a "Relates to" link. Nathan Cutler
06:47 PM Bug #40073: PG scrub stamps reset to 0.000000
A full pg query:... Greg Farnum
06:47 PM Bug #40073 (Resolved): PG scrub stamps reset to 0.000000
From Ceph-users, https://www.spinics.net/lists/ceph-users/msg52869.html
After upgrading from 14.2.0 to 14.2.1, I'v...
Greg Farnum
09:02 PM Bug #40078 (Resolved): qa/standalone/scrub/osd-scrub-snaps.sh sometimes fails

yuriw-2019-05-16_23:32:37-rados-mimic_v13.2.6_QE-distro-basic-smithi/3959865
Command failed (workunit test scrub...
David Zafman
05:39 PM Bug #40070: mon/OSDMonitor: target_size_bytes integer overflow
This worked fine for me in an earlier version of this cluster, which was running 14.2.0. But it's possible things oth... Nathan Fish
05:37 PM Bug #40070 (Rejected): mon/OSDMonitor: target_size_bytes integer overflow
Nautilus 14.2.1 on Ubuntu 18.04 LTS, kernel 4.18 (HWE)
It appears that the "target_size_bytes" setting has an inte...
Nathan Fish
11:26 AM Backport #39375 (Resolved): nautilus: ceph tell osd.xx bench help : gives wrong help
Nathan Cutler
11:25 AM Backport #39421 (Resolved): nautilus: Don't mark removed osds in when running "ceph osd in any|al...
Nathan Cutler
11:25 AM Backport #39721 (Resolved): nautilus: short pg log+nautilus-p2p-stress-split: "Error: finished ti...
Nathan Cutler
11:25 AM Bug #39441 (Resolved): osd acting cycle
Nathan Cutler
11:25 AM Backport #39512 (Resolved): nautilus: osd acting cycle
Nathan Cutler
11:24 AM Backport #39514 (Resolved): nautilus: osd: segv in _preboot -> heartbeat
Nathan Cutler
11:24 AM Backport #39519 (Resolved): nautilus: snaps missing in mapper, should be: ca was r -2...repaired
Nathan Cutler
11:23 AM Backport #39539 (Resolved): nautilus: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()-...
Nathan Cutler
11:23 AM Backport #39043 (Resolved): nautilus: osd/PGLog: preserve original_crt to check rollbackability
Nathan Cutler
11:21 AM Backport #39432 (Resolved): nautilus: Degraded PG does not discover remapped data on originating OSD
Nathan Cutler
11:21 AM Bug #39263 (Resolved): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) Shutting...
Nathan Cutler
11:21 AM Backport #39419 (Resolved): nautilus: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elect...
Nathan Cutler
11:18 AM Backport #39219 (Resolved): nautilus: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
Nathan Cutler

05/28/2019

08:58 PM Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authent...
Scheduled a resurrected run for validation: http://pulpito.front.sepia.ceph.com/rzarzynski-2019-05-28_20:56:45-rgw-wi... Radoslaw Zarzynski
05:43 PM Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authent...
Changeset: https://github.com/ceph/ceph/compare/master...rzarzynski:wip-bug-38827. Radoslaw Zarzynski
05:13 PM Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authent...
This bug looks like a duplicate of http://tracker.ceph.com/issues/39449, which has been addressed with a pair ... Radoslaw Zarzynski
04:10 PM Backport #39375: nautilus: ceph tell osd.xx bench help : gives wrong help
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28035
merged
Yuri Weinstein
04:10 PM Backport #39421: nautilus: Don't mark removed osds in when running "ceph osd in any|all|*"
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28072
merged
Yuri Weinstein
04:09 PM Backport #39721: nautilus: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when la...
David Zafman wrote:
> https://github.com/ceph/ceph/pull/28088
merged
Yuri Weinstein
04:08 PM Backport #39512: nautilus: osd acting cycle
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28160
merged
Yuri Weinstein
04:08 PM Backport #39514: nautilus: osd: segv in _preboot -> heartbeat
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28164
merged
Yuri Weinstein
04:07 PM Backport #39519: nautilus: snaps missing in mapper, should be: ca was r -2...repaired
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28205
merged
Yuri Weinstein
04:07 PM Backport #39539: nautilus: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log()....
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28219
merged
Yuri Weinstein
04:06 PM Backport #39043: nautilus: osd/PGLog: preserve original_crt to check rollbackability
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27632
merged
Yuri Weinstein
04:04 PM Backport #39432: nautilus: Degraded PG does not discover remapped data on originating OSD
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27744
merged
Yuri Weinstein
04:03 PM Backport #39419: nautilus: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) Shut...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27771
merged
Yuri Weinstein
04:03 PM Backport #39219: nautilus: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(soid...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27839
merged
Yuri Weinstein
03:29 PM Bug #39449 (Resolved): Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxH...
This has been backported with:
* https://github.com/ceph/ceph/pull/27320,
* https://github.com/ceph/ceph/pull/27321...
Radoslaw Zarzynski
10:18 AM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
I removed osd.0 and osd.1 from host-247, and re-ran deployment of osds to host-371. Both got added successfully.
...
Iain Buclaw
09:44 AM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
Attached logs of primary monitor with:
debug mon 10
debug ms 1
Started prior to osd-57 being added, and stopped ...
Iain Buclaw
09:41 AM Backport #38850: upgrade: 1 nautilus mon + 1 luminous mon can't automatically form quorum
backport PR to nautilus: https://github.com/ceph/ceph/pull/28262 Joao Eduardo Luis
03:48 AM Bug #40035 (New): smoke.sh failing in jenkins "make check" test randomly
... Kefu Chai
02:40 AM Backport #39538 (In Progress): mimic: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()-...
https://github.com/ceph/ceph/pull/28259 Prashant D

05/27/2019

04:22 PM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
Happens on any host I create osd.57 on. Iain Buclaw
03:53 PM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
Recreating the OSDs, it seems that the monitors consistently crash when creating osd.57. And they consistently recov... Iain Buclaw
03:21 PM Bug #40029 (Resolved): ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(Cep...
When adding a new osd, all primary monitors crashed.... Iain Buclaw

05/25/2019

08:44 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
Note: I've only seen this in a relatively busy and full environment with quite a few backfills going on. Rene Diepstraten
08:21 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
Thanks for the PR.
The problem itself seems to be caused as follows:
- A backfill starts to a set of osds
- One of th...
Rene Diepstraten

05/24/2019

08:23 PM Bug #20491 (Fix Under Review): objecter leaked OSDMap in handle_osd_map
https://github.com/ceph/ceph/pull/28242
I think we shouldn't backport the fix, as it might upset misbehaved (unloc...
Sage Weil
08:20 PM Bug #20491 (In Progress): objecter leaked OSDMap in handle_osd_map
... Sage Weil
08:45 AM Bug #36405: unittest_seastar_messenger failure on ARM
Another one:... Sebastian Wagner
12:27 AM Backport #39518 (In Progress): mimic: snaps missing in mapper, should be: ca was r -2...repaired
https://github.com/ceph/ceph/pull/28232 Prashant D

05/23/2019

10:30 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Neha,
It was great meeting with you in Barcelona! I can't remember everything you wanted me to gather, but here's...
Bryan Stillwell
06:49 AM Backport #39513 (In Progress): mimic: osd: segv in _preboot -> heartbeat
https://github.com/ceph/ceph/pull/28220 Prashant D
03:33 AM Backport #39539 (In Progress): nautilus: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent...
https://github.com/ceph/ceph/pull/28219 Prashant D
12:47 AM Bug #18643: SnapTrimmer: inconsistencies may lead to snaptrimmer hang
Do we still need to fix something here? https://github.com/ceph/ceph/pull/15635 at least sets a pg to snaptrim_error... David Zafman

05/22/2019

10:00 PM Bug #39555 (In Progress): backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
David Zafman
06:01 AM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
The pull request https://github.com/ceph/ceph/pull/28204 generates a warning with a better message.
health: HE...
David Zafman
02:27 PM Bug #40000 (New): osds do not bound xattrs and/or aggregate xattr data in pg log
Currently we are having our cluster in an HEALTH_ERR state with 4 PGs inactive (3 of which are "peering" and 4th is "... Vaibhav Bhembre
02:11 PM Bug #39978: Adding OSD to Luminous Cluster will crash the active mon
Indeed the issue is related to adding a new host to the crush map.
I fixed it by manually adding the host to the cru...
Henry Spanka
09:32 AM Bug #39997 (New): not able to create osd keyring
I have set up 1 mon, 1 mgr, and two osds on a single node.
When I try to create the osd keyring via the following command:...
pooja gupta
07:07 AM Backport #39475 (In Progress): mimic: segv in fgets() in collect_sys_info reading /proc/cpuinfo
https://github.com/ceph/ceph/pull/28206 Prashant D
07:06 AM Backport #39519 (In Progress): nautilus: snaps missing in mapper, should be: ca was r -2...repaired
https://github.com/ceph/ceph/pull/28205 Prashant D
06:50 AM Bug #24531: Mimic MONs have slow/long running ops
Joao sent this as a possible fix: https://github.com/ceph/ceph/pull/28177 Dan van der Ster
06:41 AM Bug #24531: Mimic MONs have slow/long running ops
The attached file contains the dump_historic_slow_ops output from three mons.
I deployed Ceph v13.2.5 with Rook in a Kubernetes cluster, I...
jun gong

05/21/2019

10:41 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
I know that, but the issue doesn't occur with those osds. The issue occurs with the ssds (checked with `ceph pg ls ba... Rene Diepstraten
10:36 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
Rene, you have what looks more like an expected situation. With some OSDs showing as high as 72% utilization, a big ... David Zafman
09:36 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
Here I can reproduce the issue with the ssd class.
We're in the process of reinstalling/redeploying (one host with a...
Rene Diepstraten
09:21 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
Seeing a "ceph osd df" like Alex provided is helpful in determining what is going on. Looking at it repeatedly while... David Zafman
05:06 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)

Erik:
New code that estimates and reserves final backfill space requirement is not present in v13.2.5. It isn't ...
David Zafman
09:05 PM Backport #39699 (In Progress): nautilus: OSD down on snaptrim.
David Zafman
05:57 PM Backport #39698 (In Progress): mimic: OSD down on snaptrim.
David Zafman
05:51 PM Backport #38341 (In Progress): mimic: pg stuck in backfill_wait with plenty of disk space
David Zafman
02:20 PM Bug #24531: Mimic MONs have slow/long running ops
Same problem as Dan van der Ster, on a v13.2.5 cluster five hours ago.
I restarted osd.0 when the monitor logs showed oldes...
jun gong
03:03 AM Backport #39516 (In Progress): nautilus: osd-backfill-space.sh test failed in TEST_backfill_multi...
https://github.com/ceph/ceph/pull/28187 Prashant D

05/20/2019

11:47 PM Backport #39719 (In Progress): luminous: short pg log+nautilus-p2p-stress-split: "Error: finished...
David Zafman
01:38 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
Just to chime in: We too have seen this (on 13.2.5) on OSDs that are only 10-20% full.
It always (magically) clear...
Erik Lindahl
10:38 AM Bug #39978 (Duplicate): Adding OSD to Luminous Cluster will crash the active mon
I recently upgraded my cluster to Luminous v12.2.11. While adding a new OSD the active monitor crashes (attempt to fr... Henry Spanka
07:34 AM Bug #39972 (Fix Under Review): librados 'buffer::create' and related functions are not exported i...
Jason Dillaman
06:51 AM Bug #39972 (Resolved): librados 'buffer::create' and related functions are not exported in C++ API
Currently, there is no way to create any 'buffer::raw' objects since they are no longer exposed (since Nautilus) via ... Jason Dillaman
02:45 AM Backport #39514 (In Progress): nautilus: osd: segv in _preboot -> heartbeat
https://github.com/ceph/ceph/pull/28164 Prashant D

05/18/2019

10:04 AM Feature #39966 (New): mon: allow log messages to be throttled and/or force trimming
If some daemon is sending a lot of cluster log messages, we need a way to
- throttle, filter, or block them
- for...
Sage Weil

05/17/2019

06:41 AM Bug #39956: OSD:Cancel copy op causes memory leak
If two clients access the same snap object at the same time, and the object needs to promote, before the promote is c... tao ning
02:46 AM Bug #39956 (New): OSD:Cancel copy op causes memory leak
ceph version 12.2.7
==00:00:06:00.712 3722687== 15,237,248 (2,770,560 direct, 12,466,688 indirect) bytes in 3,848 ...
tao ning
03:13 AM Backport #39512 (In Progress): nautilus: osd acting cycle
https://github.com/ceph/ceph/pull/28160 Prashant D

05/16/2019

04:21 PM Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authent...
The RGW verify suite has commented out the lines running valgrind on the mon.
https://github.com/ceph/ceph/pull/2815...
Ali Maredia
11:31 AM Backport #38850: upgrade: 1 nautilus mon + 1 luminous mon can't automatically form quorum
I have been working on it and am able to reproduce it, but have not yet pinned down the cause.
Reproducing basically takes t...
Joao Eduardo Luis
02:00 AM Backport #39422 (In Progress): mimic: Don't mark removed osds in when running "ceph osd in any|al...
https://github.com/ceph/ceph/pull/28142 Prashant D
01:41 AM Backport #39476 (In Progress): nautilus: segv in fgets() in collect_sys_info reading /proc/cpuinfo
https://github.com/ceph/ceph/pull/28141 Prashant D

05/15/2019

03:50 PM Backport #39373 (In Progress): luminous: ceph tell osd.xx bench help : gives wrong help
Nathan Cutler
03:44 PM Backport #38750 (In Progress): luminous: should report EINVAL in ErasureCode::parse() if m<=0
Nathan Cutler
03:36 PM Backport #38880 (In Progress): luminous: ENOENT in collection_move_rename on EC backfill target
Nathan Cutler
06:59 AM Backport #39374 (In Progress): mimic: ceph tell osd.xx bench help : gives wrong help
https://github.com/ceph/ceph/pull/28097 Prashant D

05/14/2019

11:49 AM Backport #39720 (In Progress): mimic: short pg log+nautilus-p2p-stress-split: "Error: finished ti...
Nathan Cutler
11:48 AM Backport #39721 (In Progress): nautilus: short pg log+nautilus-p2p-stress-split: "Error: finished...
Nathan Cutler
11:41 AM Backport #39744 (Resolved): mimic: mon: "FAILED assert(pending_finishers.empty())" when paxos res...
https://github.com/ceph/ceph/pull/28540 Nathan Cutler
11:41 AM Backport #39743 (Resolved): nautilus: mon: "FAILED assert(pending_finishers.empty())" when paxos ...
https://github.com/ceph/ceph/pull/28528 Nathan Cutler
11:40 AM Backport #39738 (Resolved): nautilus: Binary data in OSD log from "CRC header" message
https://github.com/ceph/ceph/pull/28504 Nathan Cutler
11:40 AM Backport #39737 (Resolved): mimic: Binary data in OSD log from "CRC header" message
https://github.com/ceph/ceph/pull/28503 Nathan Cutler

05/13/2019

11:15 PM Bug #39723 (Duplicate): osd: valgrind Leak_DefinitelyLost
... Patrick Donnelly
10:17 PM Backport #39721 (Resolved): nautilus: short pg log+nautilus-p2p-stress-split: "Error: finished ti...
https://github.com/ceph/ceph/pull/28088 David Zafman
10:16 PM Backport #39720 (Resolved): mimic: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3...
https://github.com/ceph/ceph/pull/28089 David Zafman
10:16 PM Backport #39719 (Resolved): luminous: short pg log+nautilus-p2p-stress-split: "Error: finished ti...
https://github.com/ceph/ceph/pull/28185 David Zafman
08:21 PM Bug #39304 (Pending Backport): short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 whe...
David Zafman
08:09 PM Bug #39304 (Resolved): short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_a...
David Zafman
08:08 PM Bug #39582 (Pending Backport): Binary data in OSD log from "CRC header" message
David Zafman
03:31 AM Bug #39665: kstore: memory may leak on KStore::_do_read_stripe
https://github.com/ceph/ceph/pull/28056 Shanchun Lv
02:27 AM Backport #39421 (In Progress): nautilus: Don't mark removed osds in when running "ceph osd in any...
https://github.com/ceph/ceph/pull/28072 Prashant D

05/12/2019

12:24 AM Bug #24974: Segmentation fault in tcmalloc::ThreadCache::ReleaseToCentralCache()
dzafman-2019-05-09_20:06:24-rados-wip-zafman-testing-distro-basic-smithi/3943901... David Zafman

05/11/2019

03:43 PM Backport #39205: nautilus: osd: leaked pg refs on shutdown
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27803
merged
Yuri Weinstein

05/10/2019

09:27 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
The bucket was created on 2017-01-26 while the cluster was running the 0.94.3 (Hammer) release. Also the cluster has... Bryan Stillwell
09:22 PM Bug #39484 (Pending Backport): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
Sage Weil
09:13 PM Backport #38881 (Resolved): nautilus: ENOENT in collection_move_rename on EC backfill target
Nathan Cutler
03:19 PM Backport #38881: nautilus: ENOENT in collection_move_rename on EC backfill target
Neha Ojha wrote:
> https://github.com/ceph/ceph/pull/27654
merged
Yuri Weinstein
09:12 PM Backport #39504 (Resolved): nautilus: Give recovery for inactive PGs a higher priority
Nathan Cutler
03:18 PM Backport #39504: nautilus: Give recovery for inactive PGs a higher priority
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27854
merged
Yuri Weinstein
05:32 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
/a/nojha-2019-05-10_00:33:57-upgrade-wip-parial-recovery-2019-05-09-distro-basic-smithi/3943156/ Neha Ojha
12:33 PM Backport #39206 (Resolved): mimic: osd: leaked pg refs on shutdown
Nathan Cutler
12:23 PM Backport #39220 (Resolved): mimic: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_miss...
Nathan Cutler
12:23 PM Backport #38443 (Resolved): mimic: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nathan Cutler
12:22 PM Backport #38879 (Resolved): mimic: ENOENT in collection_move_rename on EC backfill target
Nathan Cutler
12:13 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
I'm having this issue on Nautilus (14.2.1) but there is no way OSDs can be full as I'm not using more than 6.8% raw s... Alex Cucu
11:00 AM Backport #39700 (Resolved): nautilus: [RFE] If the nodeep-scrub/noscrub flags are set in pools in...
https://github.com/ceph/ceph/pull/29991 Nathan Cutler
11:00 AM Backport #39699 (Resolved): nautilus: OSD down on snaptrim.
https://github.com/ceph/ceph/pull/28203 Nathan Cutler
11:00 AM Backport #39698 (Resolved): mimic: OSD down on snaptrim.
https://github.com/ceph/ceph/pull/28202 Nathan Cutler
10:59 AM Backport #39694 (Rejected): luminous: _txc_add_transaction error (39) Directory not empty not han...
Nathan Cutler
10:59 AM Backport #39693 (Resolved): nautilus: _txc_add_transaction error (39) Directory not empty not han...
https://github.com/ceph/ceph/pull/29115 Nathan Cutler
10:58 AM Backport #39692 (Resolved): mimic: _txc_add_transaction error (39) Directory not empty not handle...
https://github.com/ceph/ceph/pull/29217 Nathan Cutler
10:56 AM Backport #39682 (Resolved): nautilus: filestore pre-split may not split enough directories
https://github.com/ceph/ceph/pull/29988 Nathan Cutler
10:56 AM Backport #39681 (Rejected): luminous: filestore pre-split may not split enough directories
Nathan Cutler
08:29 AM Bug #39665 (Resolved): kstore: memory may leak on KStore::_do_read_stripe
While testing kstore, we found that memory leaks when executing read ops. The root cause is that when executing read ops, the in-... Shanchun Lv
04:02 AM Bug #39661: kstore: memory may leak on KStore::_do_read_stripe
There is no need to cache the in-flight stripes in the read path; we can just discard them on read ops. Shanchun Lv
03:53 AM Bug #39661 (New): kstore: memory may leak on KStore::_do_read_stripe
While testing kstore, we found that memory leaks when executing read ops. The root cause is that when executing read ops, the in-... Shanchun Lv
03:34 AM Bug #39636 (Resolved): osd: PeeringState valgrind error UninitCondition
Kefu Chai
03:34 AM Bug #39636 (Fix Under Review): osd: PeeringState valgrind error UninitCondition
Kefu Chai
12:16 AM Bug #39659 (New): FAILED ceph_assert(info.history.same_interval_since != 0)
http://pulpito.ceph.com/sjust-2019-05-09_13:40:11-smoke-sjust-wip-peering-state-cleanup-distro-basic-smithi/3942704/ ... Samuel Just

05/09/2019

11:38 PM Bug #38893: RuntimeError: expected MON_CLOCK_SKEW but got none
rados/multimon/{clusters/3.yaml msgr-failures/few.yaml msgr/async-v2only.yaml objectstore/bluestore-stupid.yaml rados... Neha Ojha
04:22 PM Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authent...
Is this being actively worked on?
How close are we to a fix on this?
I would like to make this a high priority ...
J. Eric Ivancich
03:48 PM Backport #39206: mimic: osd: leaked pg refs on shutdown
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27938
merged
Yuri Weinstein
02:22 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
The cluster itself followed this upgrade path:
0.94.10 -> 10.2.10 -> 12.2.5 -> 12.2.8
We will look into the histo...
Wes Dillingham
12:12 AM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Hi Wes,
Can you check whether the bucket on which you are seeing issues was present in a jewel cluster? We have seen s...
Neha Ojha
07:14 AM Bug #39390 (Pending Backport): filestore pre-split may not split enough directories
Kefu Chai
07:10 AM Bug #38124: OSD down on snaptrim.
Greg Farnum wrote:
> No ETA; it'll have to wend its way through the backports process. I don't think any releases ar...
Erikas Kučinskis
03:40 AM Bug #39636: osd: PeeringState valgrind error UninitCondition
Found it, testing. Samuel Just
03:05 AM Backport #39375 (In Progress): nautilus: ceph tell osd.xx bench help : gives wrong help
https://github.com/ceph/ceph/pull/28035 Prashant D
02:05 AM Bug #21174 (Rejected): OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_up...
I'm closing this bug. The hardware configuration must keep data safe once it has been sync'ed to disk. This requires th... David Zafman
01:49 AM Documentation #39011 (Resolved): Document how get_recovery_priority() and get_backfill_priority()...
David Zafman
01:49 AM Bug #39304 (Fix Under Review): short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 whe...
David Zafman
12:31 AM Bug #23145: OSD crashes during recovery of EC pg
FWIW, I'm running into this, too (on Nautilus). I've got 2 OSDs in this situation. Let me know if you want any debug... Richard Hesse

05/08/2019

11:57 PM Support #39594: OSD marked as down, had timed out after 15, handle_connect_reply connect got RESE...
The ceph-users mailing list might be a good place to seek help on this kind of issue. Neha Ojha
09:33 PM Bug #38124 (Pending Backport): OSD down on snaptrim.
No ETA; it'll have to wend its way through the backports process. I don't think any releases are imminent so it shoul... Greg Farnum
09:24 PM Bug #39636: osd: PeeringState valgrind error UninitCondition
Rebuilding without inlining to narrow down the problem. Samuel Just
06:40 PM Bug #39636 (Resolved): osd: PeeringState valgrind error UninitCondition
... Patrick Donnelly
07:42 PM Backport #39220: mimic: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(soid) |...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27940
merged
Yuri Weinstein
07:19 PM Backport #38443: mimic: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27907
merged
Yuri Weinstein
07:18 PM Backport #38879: mimic: ENOENT in collection_move_rename on EC backfill target
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27943
merged
Yuri Weinstein
06:04 PM Bug #39581: osd/PG.cc: 2523: FAILED ceph_assert(scrub_queued)
/a/nojha-2019-05-07_17:20:56-rados-fix-pg-notify-distro-basic-smithi/3938003/ Neha Ojha
04:53 PM Bug #38195: osd-backfill-space.sh exposes rocksdb hang
another instance in mimic: /a/yuriw-2019-05-07_14:33:13-rados-wip-yuri-testing-2019-05-06-2158-mimic-distro-basic-smi... Neha Ojha
03:39 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Hi Neha,
I am on Bryan's team. Bryan is out this week but is returning soon.
I was able to inspect logs for abo...
Wes Dillingham
03:19 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)

The backfill_toofull state is like backfill_wait except that it indicates the reason that backfill cannot proceed ...
David Zafman

05/07/2019

05:50 PM Bug #38724 (Pending Backport): _txc_add_transaction error (39) Directory not empty not handled on...
Sage Weil
05:38 PM Feature #38029 (Pending Backport): [RFE] If the nodeep-scrub/noscrub flags are set in pools inste...
Vikhyat Umrao
10:16 AM Bug #38124: OSD down on snaptrim.
Erikas Kučinskis wrote:
> Hi is there any ETA when the bug fix will be live?
Erikas Kučinskis
10:15 AM Bug #38124: OSD down on snaptrim.
Hi, is there any ETA for when the bug fix will be live? Erikas Kučinskis
07:30 AM Backport #39506: mimic: Give recovery for inactive PGs a higher priority
Assigning to Neha based on http://tracker.ceph.com/issues/39099#note-11 Nathan Cutler
07:29 AM Backport #39505: luminous: Give recovery for inactive PGs a higher priority
Assigning to Neha based on http://tracker.ceph.com/issues/39099#note-11 Nathan Cutler
06:13 AM Bug #16553: Removing Writeback Cache Tier Does not clean up Incomplete_Clones
Still hit the same issue on 12.2.10 Jun Yang
05:49 AM Backport #39311 (In Progress): mimic: crushtool crash on Fedora 28 and newer
https://github.com/ceph/ceph/pull/27986 Prashant D

05/06/2019

09:44 PM Bug #25182 (Resolved): Upmaps forgotten after restarting OSDs
Thanks for verifying the fixes Bryan. Looks like those are all backported to mimic + luminous. Josh Durgin
09:52 AM Support #39594 (New): OSD marked as down, had timed out after 15, handle_connect_reply connect go...
Hi,
Recently we saw random slow requests in our cluster. In the monitor ceph.log I could see that at the same time OSD...
Alon Avrahami
09:18 AM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
This may be the case indeed, but I'd expect that unless pgs are evacuated, the state would be backfill_wait, not back... Rene Diepstraten

05/05/2019

09:13 PM Bug #39152: nautilus osd crash: Caught signal (Aborted) tp_osd_tp
Sage Weil wrote:
> I'm guessing this is a dup of #38724
>
> Wen, can you tell us what the cluster workload was? ...
K Jarrett
 
