Activity

From 03/13/2019 to 04/11/2019

04/11/2019

08:40 PM Bug #38840 (In Progress): snaps missing in mapper, should be: ca was r -2...repaired
David Zafman
07:14 PM Bug #39263 (Resolved): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) Shutting...
... Neha Ojha
04:50 PM Bug #21388 (Duplicate): inconsistent pg but repair does nothing reporting head data_digest != dat...
This was merged to master Jul 31, 2018 in https://github.com/ceph/ceph/pull/23217 for a different tracker. David Zafman
04:32 PM Bug #39099 (In Progress): Give recovery for inactive PGs a higher priority
David Zafman
12:36 PM Bug #39249: Some PGs stuck in active+remapped state
OSD.11 previously took part in this PG; I don't know whether it was the primary. The bug happened after I ran `ceph os... Марк Коренберг
12:35 PM Bug #39249: Some PGs stuck in active+remapped state
... Марк Коренберг
12:23 PM Bug #39249: Some PGs stuck in active+remapped state
... Марк Коренберг
12:22 PM Bug #39249: Some PGs stuck in active+remapped state
... Марк Коренберг
12:22 PM Bug #39249 (Closed): Some PGs stuck in active+remapped state
Sometimes my PGs get stuck in this state. When I stop the primary OSD containing this PG, it becomes `active+undersized+degrad... Марк Коренберг
12:14 PM Feature #39248 (New): Add ability to limit number of simultaneously backfilling PGs
I want to reduce the effect of `ceph osd out osd.xxx`. I already set
--osd-recovery-max-active 1
--osd-max-backfills ...
Марк Коренберг
11:46 AM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
Injecting into mgr has solved the issue, thanks! Andrew Mitroshin
11:07 AM Backport #39239: luminous: "sudo yum -y install python34-cephfs" fails on mimic
note to myself or anyone who wants to backport this change to luminous, you need to blacklist the python36 package wh... Nathan Cutler
10:59 AM Backport #39239 (Resolved): luminous: "sudo yum -y install python34-cephfs" fails on mimic
https://github.com/ceph/ceph/pull/28493 Nathan Cutler
10:59 AM Bug #39164 (Pending Backport): "sudo yum -y install python34-cephfs" fails on mimic
Nathan Cutler
10:54 AM Backport #39236 (In Progress): nautilus: "sudo yum -y install python34-cephfs" fails on mimic
Nathan Cutler
02:46 AM Backport #39236: nautilus: "sudo yum -y install python34-cephfs" fails on mimic
https://github.com/ceph/ceph/pull/27505 Kefu Chai
02:44 AM Backport #39236 (Resolved): nautilus: "sudo yum -y install python34-cephfs" fails on mimic
https://github.com/ceph/ceph/pull/27505 Kefu Chai
07:56 AM Bug #39174 (In Progress): crushtool crash on Fedora 28 and newer
Brad Hubbard
07:10 AM Bug #39174 (Fix Under Review): crushtool crash on Fedora 28 and newer
Brad Hubbard
06:02 AM Bug #39174: crushtool crash on Fedora 28 and newer
https://bugzilla.redhat.com/show_bug.cgi?id=1515858 Brad Hubbard
04:36 AM Bug #39174: crushtool crash on Fedora 28 and newer
Turning up verbosity gives clues to what might be the problem.... Brad Hubbard
02:31 AM Bug #39174: crushtool crash on Fedora 28 and newer
Brad Hubbard
02:30 AM Bug #39174: crushtool crash on Fedora 28 and newer
Vasu Kulkarni wrote:
> very good reason to drop one distro in teuthology and replace it with fedora 28, I think Brad...
Brad Hubbard
02:47 AM Backport #39237 (In Progress): mimic: "sudo yum -y install python34-cephfs" fails on mimic
Kefu Chai
02:47 AM Backport #39237 (Resolved): mimic: "sudo yum -y install python34-cephfs" fails on mimic
https://github.com/ceph/ceph/pull/27476 Kefu Chai

04/10/2019

11:51 PM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
Ah, that's because a jewel osd does not know how to deal with this REJECT in the Started/ReplicaActive/RepNotRecoveri... Neha Ojha
02:22 AM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
Fails in 1 out of 20 runs http://pulpito.ceph.com/nojha-2019-04-09_17:54:07-rados:upgrade:jewel-x-singleton-luminous-... Neha Ojha
11:46 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
mon.c timeline:
2019-04-06 08:58:28.846 hits a lease timeout and triggers the election process
2019-04-06 08:58:28....
Greg Farnum
10:03 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
Greg Farnum wrote:
> The monitor was out of quorum for 30 minutes; it probably has to do with holding on to client c...
Patrick Donnelly
09:59 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
The monitor was out of quorum for 30 minutes; it probably has to do with holding on to client connections or else not... Greg Farnum
10:19 PM Backport #38720 (Resolved): mimic: crush: choose_args array size mis-sized when weight-sets are e...
Nathan Cutler
10:18 PM Bug #38826 (Resolved): upmap broken the crush rule
Nathan Cutler
10:18 PM Backport #38858 (Resolved): mimic: upmap broken the crush rule
Nathan Cutler
09:48 PM Bug #39085 (Resolved): monmap created timestamp may be blank
Sage Weil
09:12 PM Bug #39085 (Pending Backport): monmap created timestamp may be blank
Neha Ojha
09:45 PM Bug #38359 (Fix Under Review): osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Sage Weil
09:45 PM Bug #38359: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
nope, that didn't fix it:
/a/sage-2019-04-10_15:25:57-rados-wip-sage4-testing-2019-04-10-0709-distro-basic-smithi/3...
Sage Weil
09:36 PM Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd
Hmm, maybe the pg_map is purged of any OSD marked out? Although you can have up OSDs that are out so that shouldn't b... Greg Farnum
09:30 PM Bug #39174: crushtool crash on Fedora 28 and newer
very good reason to drop one distro in teuthology and replace it with fedora 28, I think Brad brought this up long ti... Vasu Kulkarni
08:30 PM Bug #39174 (Resolved): crushtool crash on Fedora 28 and newer
On Fedora 29, Fedora 30, and RHEL 8, /usr/bin/crushtool crashes when trying to compile the map that Rook uses.
<pr...
Ken Dreyer
09:28 PM Bug #39054 (Closed): osd push failed because local copy is 4394'133607637
As Jewel is an outdated release and you ran the potentially-destructive repair tools, you'll have better luck taking ... Greg Farnum
09:16 PM Backport #38904 (In Progress): mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to ...
Nathan Cutler
09:16 PM Backport #38906 (Resolved): nautilus: osd/PGLog.h: print olog_can_rollback_to before deciding to ...
Nathan Cutler
09:14 PM Bug #39039: mon connection reset, command not resent
So it's not the command specifically but that the client doesn't reconnect to a working monitor, right? Greg Farnum
09:10 PM Backport #38442 (Resolved): luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nathan Cutler
09:07 PM Backport #39220 (Resolved): mimic: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_miss...
https://github.com/ceph/ceph/pull/27940 Nathan Cutler
09:07 PM Bug #36598 (Can't reproduce): osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests ...
This has not shown up recently, so maybe this got resolved as a result of http://tracker.ceph.com/issues/36739 being ... Neha Ojha
09:07 PM Backport #39219 (Resolved): nautilus: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
https://github.com/ceph/ceph/pull/27839 Nathan Cutler
09:07 PM Backport #39218 (Resolved): luminous: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
https://github.com/ceph/ceph/pull/27878 Nathan Cutler
09:05 PM Backport #39206 (Resolved): mimic: osd: leaked pg refs on shutdown
https://github.com/ceph/ceph/pull/27938 Nathan Cutler
09:05 PM Backport #39205 (Resolved): nautilus: osd: leaked pg refs on shutdown
https://github.com/ceph/ceph/pull/27803 Nathan Cutler
09:05 PM Backport #39204 (Resolved): luminous: osd: leaked pg refs on shutdown
https://github.com/ceph/ceph/pull/27810 Nathan Cutler
09:01 PM Bug #39175 (Resolved): RGW DELETE calls partially missed shortly after OSD startup
We have two separate clusters (physically 2,000+ miles apart) that are seeing
PGs going inconsistent while doing reb...
Bryan Stillwell
04:06 PM Feature #39162 (In Progress): Improvements to standalone tests.
David Zafman
05:58 AM Bug #38892: /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation fault
See https://github.com/ceph/ceph/pull/27479 for a viable workaround. Note that this is a bug in gcc7 [1] and the pref... Brad Hubbard
04:46 AM Backport #38567 (In Progress): luminous: osd_recovery_priority is not documented (but osd_recover...
Nathan Cutler
04:16 AM Bug #39164: "sudo yum -y install python34-cephfs" fails on mimic
note to myself or anyone who wants to backport this change to luminous, you need to blacklist the python36 package wh... Kefu Chai
04:13 AM Bug #39164 (Fix Under Review): "sudo yum -y install python34-cephfs" fails on mimic
Kefu Chai
03:24 AM Bug #39164 (Resolved): "sudo yum -y install python34-cephfs" fails on mimic
see http://pulpito.ceph.com/yuriw-2019-04-09_19:20:36-multimds-wip-yuri3-testing-2019-04-08-2038-mimic-testing-basic-... Kefu Chai
03:56 AM Bug #38582: Pool storage MAX AVAIL reduction seems higher when single OSD reweight is done
Correction in the description.
It looks like the pools MAX AVAIL value had dropped after there was a hard disk fail...
Nokia ceph-users

04/09/2019

10:22 PM Bug #38724 (Need More Info): _txc_add_transaction error (39) Directory not empty not handled on o...
logging level isn't high enough to tell what data is in this pg. :( Sage Weil
10:17 PM Bug #38786 (Fix Under Review): autoscale down can lead to max_pg_per_osd limit
https://github.com/ceph/ceph/pull/27473 Sage Weil
09:21 PM Feature #39162 (Resolved): Improvements to standalone tests.

Now that OSDs default to bluestore, need to fix the use of run_osd(). We should replace run_osd_bluestore() with r...
David Zafman
08:29 PM Backport #38567: luminous: osd_recovery_priority is not documented (but osd_recovery_op_priority is)
https://github.com/ceph/ceph/pull/27471 Neha Ojha
02:54 PM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
From the osd log including the thread before the crash.... David Zafman
02:36 PM Bug #38219 (Fix Under Review): rebuild-mondb hangs
Kefu Chai
12:25 PM Bug #39159 (Resolved): qa: Fix ambiguous store_thrash thrash_store in mon_thrash.py
Both store_thrash and thrash_store names are used for the same thing in mon_thrash.py. 'thrash_store' is used here: h... Jos Collin
08:13 AM Bug #39154 (Resolved): Don't mark removed osds in when running "ceph osd in any|all|*"
To reproduce.... Brad Hubbard
01:47 AM Bug #23030 (Fix Under Review): osd: crash during recovery with assert(p != recovery_info.ss.clone...
https://github.com/ceph/ceph/pull/27273 Neha Ojha
01:04 AM Bug #39152 (Duplicate): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
OSD continuously crashed
-1> 2019-04-08 17:47:06.615 7f3f3ef62700 -1 /build/ceph-14.2.0/src/os/bluestore/Bl...
Wen Wei

04/08/2019

11:00 PM Bug #37264 (Resolved): scrub warning check incorrectly uses mon scrub interval
David Zafman
10:49 PM Bug #26971 (Duplicate): failed to become clean before timeout expired
David Zafman
10:18 PM Bug #26971: failed to become clean before timeout expired
see http://tracker.ceph.com/issues/39149 Sage Weil
10:15 PM Bug #26971: failed to become clean before timeout expired
oh, it's because there's also 1/10th the probability of choosing the second host:... Sage Weil
07:59 PM Bug #26971: failed to become clean before timeout expired
This is just CRUSH failing. I extracted the osdmap from the data/mon.a.tgz and verified with osdmaptool that it's ju... Sage Weil
10:37 PM Bug #39150 (Resolved): mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
... Patrick Donnelly
08:42 PM Bug #39148 (New): luminous: powercycle: reached maximum tries (500) after waiting for 3000 seconds
... Neha Ojha
07:02 PM Bug #39145 (New): luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine eve...
... Neha Ojha
05:14 PM Bug #37775: some pg_created messages not sent to mon
/a/yuriw-2019-04-04_00:00:53-rados-luminous-distro-basic-smithi/3806121/ Neha Ojha

04/05/2019

08:56 PM Bug #26971: failed to become clean before timeout expired
The up set seems to be the problem here.
This is the point when we find out that osd.5 is down...
Neha Ojha
08:50 PM Backport #38720: mimic: crush: choose_args array size mis-sized when weight-sets are enabled
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27082
merged
Yuri Weinstein
08:50 PM Backport #38858: mimic: upmap broken the crush rule
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27257
merged
Yuri Weinstein
08:30 PM Bug #39087: ec_lost_unfound: a EC shard has missing object after `osd lost`
/a/yuriw-2019-04-02_20:09:55-rados-wip-yuri3-testing-2019-04-02-1623-mimic-distro-basic-smithi/3801955/ - looks like ... Neha Ojha
08:06 PM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
/a/yuriw-2019-04-02_20:09:55-rados-wip-yuri3-testing-2019-04-02-1623-mimic-distro-basic-smithi/3801823/ Neha Ojha
08:04 PM Backport #38906: nautilus: osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27302
merged
Yuri Weinstein
05:45 PM Feature #38940: Allow marking noout by failure domain for maintainance and planned downtime.
Also related: https://github.com/rook/rook/issues/2825 Blaine Gardner
05:43 PM Feature #38940: Allow marking noout by failure domain for maintainance and planned downtime.
Relevant discussion as this relates to Rook https://github.com/rook/rook/issues/2253 Blaine Gardner
04:16 PM Bug #37509: require past_interval bounds mismatch due to osd oldest_map
/a/yuriw-2019-04-05_00:28:05-rados-wip-yuri2-testing-2019-04-04-1953-nautilus-distro-basic-smithi/3811215/ Neha Ojha
04:12 PM Bug #38238: rados/test.sh: api_aio_pp doesn't seem to start
/a/yuriw-2019-04-05_00:28:05-rados-wip-yuri2-testing-2019-04-04-1953-nautilus-distro-basic-smithi/3811205/ Neha Ojha
12:00 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Yes, most likely the issue was triggered by a power outage, the 2x OSD FAILED assert and the cluster is unable to rec... Grant Slater
11:07 AM Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash
The fix so far is switching the osd back to filestore. Iain Buclaw
08:36 AM Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash
Another PG.... Iain Buclaw
07:40 AM Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash
... Iain Buclaw
06:48 AM Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash
David Zafman wrote:
> Please find a stack trace in the osd log. Is there an assert that would look like this?
> ...
Iain Buclaw
08:23 AM Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore
Another PG, where the missing is reported on osd.0/filestore (not osd.9/bluestore in the previous).... Iain Buclaw
08:08 AM Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore
... Iain Buclaw
07:08 AM Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore
David Zafman wrote:
> It would be helpful to see a ceph pg deep-scrub (wait for it to finish) followed by the output...
Iain Buclaw
07:14 AM Bug #39120 (New): rados: Segmentation fault in thread 7f0aebfff700 thread_name:fn_anonymous
... Iain Buclaw

04/04/2019

09:29 PM Bug #38931: osd does not proactively remove leftover PGs
Greg Farnum wrote:
> So should we backport part of that PR, Neha?
>
> To answer your question more directly, Dan:...
Neha Ojha
08:37 PM Bug #38931: osd does not proactively remove leftover PGs
So should we backport part of that PR, Neha?
To answer your question more directly, Dan: OSDs don't delete PGs the...
Greg Farnum
08:51 PM Bug #38900: EC pools don't self repair on client read error
Yes, client IO is served. The PG is degraded, but the PG state won't necessarily reflect that. David Zafman
08:33 PM Bug #38900: EC pools don't self repair on client read error
Just to be clear, this means the object remains degraded, but client IO continues to be served? Greg Farnum
08:32 PM Backport #38850: upgrade: 1 nautilus mon + 1 luminous mon can't automatically form quorum
This is super weird; the only other recent reference I see to min_mon_release is https://github.com/ceph/ceph/pull/27... Greg Farnum
04:52 PM Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash
Please find a stack trace in the osd log. Is there an assert that would look like this?
/build/ceph-13.2.5-g###...
David Zafman
03:24 PM Bug #39116 (New): Draining filestore osd, removing, and adding new bluestore osd causes OSDs to c...
... Iain Buclaw
04:44 PM Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore
It would be helpful to see a ceph pg deep-scrub (wait for it to finish) followed by the output of rados list-inconsis... David Zafman
03:15 PM Bug #39115 (Duplicate): ceph pg repair doesn't fix itself if osd is bluestore
Running ceph pg repair on an inconsistent PG with missing data, I usually notice that the OSD is marked as down/up be... Iain Buclaw
01:48 PM Bug #39111 (New): "ceph config set" accepts osd ID with letters
... Sébastien Han
09:44 AM Bug #38219: rebuild-mondb hangs
... Kefu Chai
04:19 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)

Here is what the bad log looks like that caused one of the crashes. Clearly _head_ is bad because the log ends wit...
David Zafman
02:40 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Maybe we could check each log in load_pgs(). If it is corrupt (head != head entry's version), move PG aside and igno... David Zafman

04/03/2019

11:28 PM Bug #39099 (Resolved): Give recovery for inactive PGs a higher priority

Backfill inactive gets priority 220 and we should make sure that if we can have inactive that needs recovery only i...
David Zafman
07:23 AM Bug #39087: ec_lost_unfound: a EC shard has missing object after `osd lost`
Is this the `scrub error` we expect? What we should do is find out why ceph doesn't recover PG 2.4s0. Chang Liu
07:16 AM Bug #39087 (New): ec_lost_unfound: a EC shard has missing object after `osd lost`
http://pulpito.ceph.com/kchai-2019-04-01_10:38:29-rados-wip-kefu-testing-2019-04-01-1531-distro-basic-mira/3797065/
...
Chang Liu
04:37 AM Feature #38616 (Resolved): Improvements to auto repair
David Zafman

04/02/2019

10:22 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Per request on irc.
pg log:
1.cas2 on osd.2: ceph-post-file: d74a0006-c0e9-41b1-a904-7bfe41617253
1.96s3 on osd....
Grant Slater
07:51 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Output from: ceph-objectstore-tool --no-mon-config --data-path /var/lib/ceph/osd/ceph-0 --op log --pgid 1.cas0
1.c...
Grant Slater
06:29 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Hi Grant, is there a way you could dump the pg log by using a command like this "ceph-objectstore-tool --no-mon-confi... Neha Ojha
09:51 PM Bug #39085 (Fix Under Review): monmap created timestamp may be blank
Sage Weil
09:51 PM Bug #39085: monmap created timestamp may be blank
https://github.com/ceph/ceph/pull/27327 Sage Weil
07:13 PM Bug #39085 (Resolved): monmap created timestamp may be blank
On at least one old cluster, monmap created timestamp is empty. lab cluster:... Sage Weil
07:27 PM Bug #38219: rebuild-mondb hangs
I reproduced this again on master, http://pulpito.ceph.com/nojha-2019-04-02_17:39:35-rados:singleton-master-distro-ba... Neha Ojha
11:46 AM Bug #38219: rebuild-mondb hangs
http://pulpito.ceph.com/kchai-2019-04-02_08:04:13-rados-wip-kefu-testing-2019-04-01-1531-distro-basic-smithi/ Kefu Chai
01:02 PM Bug #38124: OSD down on snaptrim.
Hello, it's been two months now. Is there any update on this bug? Erikas Kučinskis
12:19 PM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
xie xingguo
12:18 PM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
It's an mgr option. You should instead inject it to the mgr daemon. xie xingguo
05:56 AM Backport #38905 (In Progress): luminous: osd/PGLog.h: print olog_can_rollback_to before deciding ...
https://github.com/ceph/ceph/pull/27715 Prashant D
01:52 AM Backport #38983 (Resolved): nautilus: Improvements to auto repair
Sage Weil
12:12 AM Backport #38906 (In Progress): nautilus: osd/PGLog.h: print olog_can_rollback_to before deciding ...
https://github.com/ceph/ceph/pull/27302 Prashant D

04/01/2019

10:59 PM Bug #37439: Degraded PG does not discover remapped data on originating OSD
My proposal to fix this bug is to call @discover_all_missing@ not only if there are missing objects, but also when th... Jonas Jelten
09:11 PM Bug #37439: Degraded PG does not discover remapped data on originating OSD
Hi Jonas, thanks for creating a fix for this bug. Could you please upload the latest logs from nautilus, that you hav... Neha Ojha
08:58 PM Bug #37439 (Fix Under Review): Degraded PG does not discover remapped data on originating OSD
Neha Ojha
01:07 AM Bug #37439: Degraded PG does not discover remapped data on originating OSD
More findings, now on Nautilus 14.2.0:
OSD.62 once was part of pg 6.65, but content on it got remapped. A restart ...
Jonas Jelten
10:46 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Grant: I notice that the initial event outlined above is from October. Is that the very first anomalous behavior exh... Samuel Just
10:45 PM Feature #3362 (Resolved): Warn users before allowing pools to be created with more than N*<num_os...
Patrick Donnelly
09:22 PM Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26616
merged
Yuri Weinstein
04:10 PM Fix #39071 (New): monclient: initial probe is non-optimal with v2+v1
When we are probing both v2 and v1 addrs for mons, we treat them as separate mons, which means we might be probing N ... Sage Weil
02:25 PM Feature #39066 (Fix Under Review): src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
Kefu Chai
02:21 PM Feature #39066 (Resolved): src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
Currently it's only possible to run `...make; make tests -j8; ctest ...` on the same machine.
Please consider chan...
Kefu Chai
01:48 PM Bug #38945 (Pending Backport): osd: leaked pg refs on shutdown
Kefu Chai
01:09 PM Bug #38219: rebuild-mondb hangs
I am using the following script to reproduce this issue locally; so far no luck... Kefu Chai
11:51 AM Bug #39059 (Can't reproduce): assert in ceph::net::SocketMessenger::unregister_conn()
... Kefu Chai
03:22 AM Bug #39056: localize-reads does not increment pg stats read count
When the '--localize-reads' flag is set, peer_pg may complete the read task, but peer_pg will not count read_num.... Jilong li
03:09 AM Bug #39056 (New): localize-reads does not increment pg stats read count
When I mounted ceph-fuse, I set the '--localize-reads' flag. I found during the test that read_num count was In... Jilong li
12:52 AM Backport #38904: mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
https://github.com/ceph/ceph/pull/27284 Prashant D

03/31/2019

07:02 PM Bug #39055 (New): OSD's crash when specific PG is trying to backfill
Hi,
I've got a peculiar issue whereby a specific PG is trying to backfill its objects to the other peers, but th...
Alex Tijhuis
12:08 PM Bug #39054 (Closed): osd push failed because local copy is 4394'133607637
ceph-osd.1.log:7085:2019-02-27 13:07:21.336004 7f666b5bb700 -1 log_channel(cluster) log [ERR] : 3.33 push 3:ccb8da9c:... yite gu

03/30/2019

07:14 PM Bug #38931: osd does not proactively remove leftover PGs
https://github.com/ceph/ceph/pull/27205/commits/f7c5b01e181630bb15e8b923b0334eb6adfdf50a Neha Ojha
06:15 PM Bug #39053 (New): changing pool crush rule may lead to IO stop

How to reproduce:
1. create some OSDs
2. change their class to, say, "xxx"
3. create replicated crush rule ref...
Марк Коренберг
01:37 PM Backport #38860 (Resolved): nautilus: upmap broken the crush rule
Sage Weil
08:46 AM Bug #38784 (Pending Backport): osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(...
xie xingguo
08:21 AM Backport #38854 (Resolved): luminous: .mgrstat failed to decode mgrstat state; luminous dev version?
Nathan Cutler
08:21 AM Backport #38859 (Resolved): luminous: upmap broken the crush rule
Nathan Cutler
08:20 AM Backport #38857 (Resolved): luminous: should set EPOLLET flag on del_event()
Nathan Cutler
08:18 AM Backport #39044 (Resolved): mimic: osd/PGLog: preserve original_crt to check rollbackability
https://github.com/ceph/ceph/pull/27629 Nathan Cutler
08:18 AM Backport #39043 (Resolved): nautilus: osd/PGLog: preserve original_crt to check rollbackability
https://github.com/ceph/ceph/pull/27632 Nathan Cutler
08:18 AM Backport #39042 (Resolved): luminous: osd/PGLog: preserve original_crt to check rollbackability
https://github.com/ceph/ceph/pull/27715 Nathan Cutler

03/29/2019

11:04 PM Bug #39039 (Duplicate): mon connection reset, command not resent
... Sage Weil
07:45 PM Backport #38854: luminous: .mgrstat failed to decode mgrstat state; luminous dev version?
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27207
merged
Yuri Weinstein
07:45 PM Backport #38859: luminous: upmap broken the crush rule
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27224
merged
Yuri Weinstein
07:44 PM Backport #38857: luminous: should set EPOLLET flag on del_event()
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27226
merged
Yuri Weinstein
07:12 AM Backport #38872 (In Progress): mimic: Rados.get_fsid() returning bytes in python3
https://github.com/ceph/ceph/pull/27259 Prashant D
04:47 AM Backport #38858 (In Progress): mimic: upmap broken the crush rule
https://github.com/ceph/ceph/pull/27257 Prashant D
03:04 AM Backport #38860 (In Progress): nautilus: upmap broken the crush rule
Prashant D

03/28/2019

10:23 PM Bug #39023 (Resolved): osd/PGLog: preserve original_crt to check rollbackability
Related to the issue discovered in https://tracker.ceph.com/issues/21174#note-11. Neha Ojha
07:12 PM Feature #39012 (Resolved): osd: distinguish unfound + impossible to find, vs start some down OSDs...

This may be a command that gets information from the primary of a pg listing unfound objects and where they may be ...
David Zafman
06:59 PM Documentation #39011 (Resolved): Document how get_recovery_priority() and get_backfill_priority()...

Describe the get_recovery_priority() and get_backfill_priority() as it relates to these constants:...
David Zafman
06:57 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Hi Grant,
Thanks for applying the patch and updating the logs. Looks like the earlier crash on osd.2(ENOENT on cl...
Neha Ojha
05:22 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
I am still seeing crashes with https://github.com/ceph/ceph/pull/27200 backported.
Attached are logs.
osd.2 cep...
Grant Slater
02:23 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
https://github.com/ceph/ceph/pull/27200 attempts to resolve the failure seen on osd.2 Neha Ojha
04:03 PM Bug #39006: ceph tell osd.xx bench help : gives wrong help
Moreover, it says that the first number is a count of blocks, but actually it is the count of bytes for the whole operation:
...
Марк Коренберг
04:01 PM Bug #39006 (Resolved): ceph tell osd.xx bench help : gives wrong help
```
$ ceph tell osd.11 bench help
help not valid: help doesn't represent an int
Invalid command: unused arguments...
Марк Коренберг
12:34 PM Backport #38859 (In Progress): luminous: upmap broken the crush rule
Nathan Cutler
01:39 AM Backport #38859: luminous: upmap broken the crush rule
https://github.com/ceph/ceph/pull/27224 xie xingguo
11:10 AM Backport #38510 (Resolved): luminous: ceph CLI ability to change file ownership
Nathan Cutler
11:09 AM Backport #38562 (Resolved): luminous: mgr deadlock
Nathan Cutler
11:06 AM Backport #38903 (Resolved): nautilus: Minor rados related documentation fixes
Nathan Cutler
07:50 AM Bug #38945: osd: leaked pg refs on shutdown
please note, in luminous, we also need to stop @snap_sleep_timer@ and @scrub_sleep_timer@ in @OSDService::shutdown(... Kefu Chai
07:43 AM Bug #38945 (Fix Under Review): osd: leaked pg refs on shutdown
Kefu Chai
06:12 AM Bug #38892: /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation fault
per Brad
> If we see this again we could try temporarily adding "--param ggc-min-expand=1 --param ggc-min-heapsize...
Kefu Chai
03:22 AM Backport #38993 (Resolved): nautilus: unable to link rocksdb library if use system rocksdb
https://github.com/ceph/ceph/pull/27601 Kefu Chai
03:04 AM Bug #38992 (Resolved): unable to link rocksdb library if use system rocksdb
Kefu Chai
02:33 AM Backport #38750 (New): luminous: should report EINVAL in ErasureCode::parse() if m<=0
Prashant D
02:31 AM Backport #38750 (In Progress): luminous: should report EINVAL in ErasureCode::parse() if m<=0
Prashant D
02:21 AM Backport #38857 (In Progress): luminous: should set EPOLLET flag on del_event()
https://github.com/ceph/ceph/pull/27226 Prashant D
02:00 AM Backport #38860: nautilus: upmap broken the crush rule
https://github.com/ceph/ceph/pull/27225 xie xingguo

03/27/2019

10:56 PM Bug #38839: .mgrstat failed to decode mgrstat state; luminous dev version?
Sage, Could this have something to do with #38941 ? The timing is right. Brad Hubbard
05:00 PM Backport #38983 (In Progress): nautilus: Improvements to auto repair
David Zafman
04:24 PM Backport #38983 (Resolved): nautilus: Improvements to auto repair
https://github.com/ceph/ceph/pull/27220 Nathan Cutler
04:38 PM Bug #38784 (Fix Under Review): osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(...
Neha Ojha
04:01 AM Bug #26971: failed to become clean before timeout expired

dzafman-2019-03-26_16:39:54-rados:thrash-wip-zafman-26971-diag-distro-basic-smithi/3776762
Another run with diag...
David Zafman
03:44 AM Backport #38854 (In Progress): luminous: .mgrstat failed to decode mgrstat state; luminous dev ve...
https://github.com/ceph/ceph/pull/27207 Prashant D
01:54 AM Bug #38945 (Resolved): osd: leaked pg refs on shutdown
recovery_request_timer may hold some QueuePeeringEvts which hold a PGRef;
if we don't shut it down earlier, it potentially ca...
Zengran Zhang
01:37 AM Feature #38616: Improvements to auto repair
Also need to backport 0fb951963ff9d03a592bad0d4442049603195e25 with this. David Zafman

03/26/2019

11:49 PM Feature #38616 (Pending Backport): Improvements to auto repair
David Zafman
04:56 PM Backport #38510: luminous: ceph CLI ability to change file ownership
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26758
merged
Reviewed-by: Sébastien Han <seb@redhat.com>
Yuri Weinstein
04:49 PM Backport #38562: luminous: mgr deadlock
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26830
merged
Yuri Weinstein
04:29 PM Bug #38219: rebuild-mondb hangs
/a/sage-2019-03-26_03:52:56-rados-wip-sage-testing-2019-03-25-1934-distro-basic-smithi/3774206 Sage Weil
09:38 AM Backport #38903 (In Progress): nautilus: Minor rados related documentation fixes
Nathan Cutler
09:29 AM Backport #38901 (In Progress): mimic: Minor rados related documentation fixes
Nathan Cutler
09:04 AM Backport #38902 (In Progress): luminous: Minor rados related documentation fixes
Nathan Cutler
04:15 AM Feature #38940 (New): Allow marking noout by failure domain for maintenance and planned downtime.
- Sometimes an entire host can have planned downtime for maintenance.
- Disk failures outside of the affected area ...
Rohan Joseph

03/25/2019

10:02 PM Subtask #37731 (Resolved): upgrade/luminous-x - add "require-osd-release nautilus" and clean up
Yes, done as a part of these.
https://github.com/ceph/ceph/pull/26389
https://github.com/ceph/ceph/pull/26455
Neha Ojha
07:49 PM Subtask #37731: upgrade/luminous-x - add "require-osd-release nautilus" and clean up
@neha I think this is done; I just want to confirm. Please resolve. Yuri Weinstein
09:02 PM Bug #38041 (Resolved): Fix recovery and backfill priority handling
David Zafman
09:01 PM Backport #38275 (Resolved): mimic: Fix recovery and backfill priority handling
David Zafman
06:47 PM Bug #38927 (Resolved): should print min_mon_release correctly
Sage Weil
06:13 PM Bug #38357: ClsLock.TestExclusiveEphemeralStealEphemeral failed
Similar,... Sage Weil
03:04 PM Bug #38195: osd-backfill-space.sh exposes rocksdb hang
Seen in mimic backport testing with new osd-backfill-prio.sh test.
http://pulpito.ceph.com/dzafman-2019-03-20_19:4...
David Zafman
11:14 AM Bug #38931 (Resolved): osd does not proactively remove leftover PGs
(Context: cephfs cluster running v12.2.11)
We had an osd go nearfull this weekend. I reweighted it to move out som...
Dan van der Ster
10:58 AM Bug #38930 (Duplicate): ceph osd safe-to-destroy wrongly approves any out osd
With v12.2.11, we found that ceph osd safe-to-destroy is wrongly reporting that all out osds are safe to destroy.
...
Dan van der Ster
10:07 AM Backport #38850: upgrade: 1 nautilus mon + 1 luminous mon can't automatically form quorum
Agreed, my expectation would be that we can maintain quorum during the entire upgrade period. Even discounting OS upg... Lars Marowsky-Brée

03/24/2019

04:08 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Yes I can still reproduce it, the cluster is still in a broken state.... Grant Slater
03:37 PM Backport #38853 (Resolved): nautilus: .mgrstat failed to decode mgrstat state; luminous dev version?
Kefu Chai
02:31 AM Bug #38927: should print min_mon_release correctly
> Brad Hubbard wrote:
> 15 - 15 !> 2 ?
>
>
> https://github.com/ceph/ceph/pull/27107 should fix this.
Kefu Chai
02:30 AM Bug #38927 (Pending Backport): should print min_mon_release correctly
Kefu Chai
02:30 AM Bug #38927 (Resolved): should print min_mon_release correctly

dzafman-2019-03-20_19:53:02-rados-wip-zafman-testing-distro-basic-smithi/3754307
rados/upgrade/luminous-x-single...
Kefu Chai

03/23/2019

10:48 PM Backport #38901: mimic: Minor rados related documentation fixes
Remove "premerge" pg state which doesn't apply in mimic. David Zafman
09:13 PM Backport #38901 (Resolved): mimic: Minor rados related documentation fixes
https://github.com/ceph/ceph/pull/27188 Nathan Cutler
10:48 PM Backport #38902: luminous: Minor rados related documentation fixes
Remove "premerge" pg state which doesn't apply in luminous. David Zafman
09:13 PM Backport #38902 (Resolved): luminous: Minor rados related documentation fixes
https://github.com/ceph/ceph/pull/27185 Nathan Cutler
09:13 PM Backport #38906 (Resolved): nautilus: osd/PGLog.h: print olog_can_rollback_to before deciding to ...
https://github.com/ceph/ceph/pull/27302 Nathan Cutler
09:13 PM Backport #38905 (Resolved): luminous: osd/PGLog.h: print olog_can_rollback_to before deciding to ...
https://github.com/ceph/ceph/pull/27715 Nathan Cutler
09:13 PM Backport #38904 (Resolved): mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to rol...
https://github.com/ceph/ceph/pull/27284 Nathan Cutler
09:13 PM Backport #38903 (Resolved): nautilus: Minor rados related documentation fixes
https://github.com/ceph/ceph/pull/27189 Nathan Cutler
09:13 PM Backport #38853 (In Progress): nautilus: .mgrstat failed to decode mgrstat state; luminous dev ve...
Nathan Cutler
05:41 PM Bug #38900 (New): EC pools don't self repair on client read error

When a replicated client read fails at the primary, it will pull the object from another OSD (see rep_repair_primar...
David Zafman
11:42 AM Documentation #38896 (Pending Backport): Minor rados related documentation fixes
Kefu Chai
12:22 AM Documentation #38896 (Resolved): Minor rados related documentation fixes

Document all pg states
Add auto repair items
"premerge" is not a pg state in luminous or mimic
David Zafman

03/22/2019

09:27 PM Bug #38845 (Resolved): mon.a@-1(probing) e1 current monmap has min_mon_release 15 (luminous) whic...
Sage Weil
03:57 PM Bug #38845: mon.a@-1(probing) e1 current monmap has min_mon_release 15 (luminous) which is >2 rel...
https://github.com/ceph/ceph/pull/27131 Yuri Weinstein
02:28 PM Bug #38845 (Fix Under Review): mon.a@-1(probing) e1 current monmap has min_mon_release 15 (lumino...
Neha Ojha
02:02 AM Bug #38845: mon.a@-1(probing) e1 current monmap has min_mon_release 15 (luminous) which is >2 rel...
Brad Hubbard wrote:
> 15 - 15 !> 2 ?
https://github.com/ceph/ceph/pull/27107 should fix this.
Neha Ojha
12:10 AM Bug #38845: mon.a@-1(probing) e1 current monmap has min_mon_release 15 (luminous) which is >2 rel...
15 - 15 !> 2 ? Brad Hubbard
09:05 PM Bug #38892: /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation fault
While I was looking into this I noticed this warning in the Jenkins output.... Brad Hubbard
04:46 PM Bug #38892 (Closed): /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation...
... Sebastian Wagner
07:12 PM Bug #38894 (Pending Backport): osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
Neha Ojha
05:20 PM Bug #38894 (Resolved): osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
This is important for debugging failures in merge_object_divergent_entries() before a decision to rollback is made. Neha Ojha
05:16 PM Bug #38893 (Resolved): RuntimeError: expected MON_CLOCK_SKEW but got none
... Neha Ojha
05:09 PM Cleanup #38635: bluestore: test osd_memory_target
https://github.com/ceph/ceph/pull/27083 - Merged
Will mark Pending Backport when Part-2 merges.
Neha Ojha
02:07 PM Bug #37766 (Resolved): rados_shutdown hang forever in ~objecter()
Nathan Cutler
02:06 PM Backport #38398 (Resolved): mimic: rados_shutdown hang forever in ~objecter()
Nathan Cutler
01:05 PM Backport #38881 (Resolved): nautilus: ENOENT in collection_move_rename on EC backfill target
https://github.com/ceph/ceph/pull/27654 Nathan Cutler
01:05 PM Backport #38880 (Resolved): luminous: ENOENT in collection_move_rename on EC backfill target
https://github.com/ceph/ceph/pull/28110 Nathan Cutler
01:04 PM Backport #38879 (Resolved): mimic: ENOENT in collection_move_rename on EC backfill target
https://github.com/ceph/ceph/pull/27943 Nathan Cutler
01:03 PM Backport #38873 (Resolved): luminous: Rados.get_fsid() returning bytes in python3
https://github.com/ceph/ceph/pull/27674 Nathan Cutler
01:03 PM Backport #38872 (Resolved): mimic: Rados.get_fsid() returning bytes in python3
https://github.com/ceph/ceph/pull/27259 Nathan Cutler
01:01 PM Backport #38860 (Resolved): nautilus: upmap broken the crush rule
https://github.com/ceph/ceph/pull/27225 Nathan Cutler
01:01 PM Backport #38859 (Resolved): luminous: upmap broken the crush rule
https://github.com/ceph/ceph/pull/27224 Nathan Cutler
01:01 PM Backport #38858 (Resolved): mimic: upmap broken the crush rule
https://github.com/ceph/ceph/pull/27257 Nathan Cutler
01:00 PM Backport #38857 (Resolved): luminous: should set EPOLLET flag on del_event()
https://github.com/ceph/ceph/pull/27226 Nathan Cutler
01:00 PM Backport #38856 (Resolved): mimic: should set EPOLLET flag on del_event()
https://github.com/ceph/ceph/pull/29250 Nathan Cutler
01:00 PM Backport #38854 (Resolved): luminous: .mgrstat failed to decode mgrstat state; luminous dev version?
https://github.com/ceph/ceph/pull/27207 Nathan Cutler
01:00 PM Backport #38853 (Resolved): nautilus: .mgrstat failed to decode mgrstat state; luminous dev version?
https://github.com/ceph/ceph/pull/27116 Nathan Cutler
01:00 PM Backport #38852 (Resolved): mimic: .mgrstat failed to decode mgrstat state; luminous dev version?
https://github.com/ceph/ceph/pull/29249 Nathan Cutler
11:05 AM Backport #38850: upgrade: 1 nautilus mon + 1 luminous mon can't automatically form quorum
Just to clarify slightly -- I know the upgrade instructions in the Nautilus release announcement say to "upgrade moni... Tim Serong
10:19 AM Backport #38850 (Resolved): upgrade: 1 nautilus mon + 1 luminous mon can't automatically form quorum
Seen while upgrading Luminous (12.2.10) to Nautilus (14.2.0). Three mon hosts, four osd hosts. The process was:
...
Tim Serong
09:30 AM Bug #38839: .mgrstat failed to decode mgrstat state; luminous dev version?
nautilus https://github.com/ceph/ceph/pull/27116 Sage Weil
09:30 AM Bug #38839 (Pending Backport): .mgrstat failed to decode mgrstat state; luminous dev version?
Sage Weil
07:37 AM Bug #38826 (Pending Backport): upmap broken the crush rule
Kefu Chai
01:33 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
It is possible that the crash we are seeing on osd.2 is due to 1:537949df:::20000a2c834.00000105:head incorrectly rol... Neha Ojha
01:05 AM Bug #38846: dump_pgstate_history doesn't really produce useful json output, needs an array around...
It would probably be nice if it dumped the current state stack for each pg as well. Samuel Just

03/21/2019

11:06 PM Bug #38846 (Resolved): dump_pgstate_history doesn't really produce useful json output, needs an a...
... Samuel Just
08:42 PM Bug #38845 (Resolved): mon.a@-1(probing) e1 current monmap has min_mon_release 15 (luminous) whic...

dzafman-2019-03-20_19:53:02-rados-wip-zafman-testing-distro-basic-smithi/3754307
rados/upgrade/luminous-x-single...
David Zafman
06:05 PM Bug #38841 (New): Objects degraded higher than 100%
1. Working Mimic or Nautilus deployment with Bluestore (haven't tested with Filestore)
2. All OSDs up, all PGs activ...
Simon Ironside
05:29 PM Bug #38840 (Resolved): snaps missing in mapper, should be: ca was r -2...repaired

dzafman-2019-03-20_19:53:02-rados-wip-zafman-testing-distro-basic-smithi/3754443
This looks like a cache tier ev...
David Zafman
04:59 PM Bug #38839 (Fix Under Review): .mgrstat failed to decode mgrstat state; luminous dev version?
https://github.com/ceph/ceph/pull/27101 Sage Weil
04:57 PM Bug #38839 (Resolved): .mgrstat failed to decode mgrstat state; luminous dev version?
... Sage Weil
02:26 AM Backport #38719 (In Progress): luminous: crush: choose_args array size mis-sized when weight-sets...
https://github.com/ceph/ceph/pull/27085 Prashant D
01:38 AM Cleanup #38635 (In Progress): bluestore: test osd_memory_target
https://github.com/ceph/ceph/pull/27083 Neha Ojha
01:31 AM Backport #38720 (In Progress): mimic: crush: choose_args array size mis-sized when weight-sets ar...
https://github.com/ceph/ceph/pull/27082 Prashant D

03/20/2019

10:50 PM Bug #26971: failed to become clean before timeout expired

I'm not sure what this means, but pg 1.0 (size 3) needs to pick another one of the 2 remaining OSDs (4 OSDs in) to ...
David Zafman
12:05 PM Bug #38582: Pool storage MAX AVAIL reduction seems higher when single OSD reweight is done
Sorry for the delay. Attaching the requested information.
osd 155 is the OSD mentioned in the description. The one which was manually...
Nokia ceph-users
11:51 AM Bug #38381 (Pending Backport): Rados.get_fsid() returning bytes in python3
Kefu Chai
11:40 AM Bug #38827 (In Progress): valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHa...
Radoslaw Zarzynski
11:24 AM Bug #38827: valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authent...
the test branch contains https://github.com/ceph/ceph/pull/27012 Kefu Chai
11:21 AM Bug #38827 (Resolved): valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWireRxHandl...
... Kefu Chai
11:27 AM Bug #38828 (Resolved): should set EPOLLET flag on del_event()
Kefu Chai
10:41 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
As requested.
osd.0: ceph-post-file: 17efe900-501c-479f-ba56-dd29fef18c58
osd.4: ceph-post-file: ff22f830-e6bc-4f...
Grant Slater
12:36 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Hi Grant,
Looking at the logs, it seems that the first crash was seen on osd.2 on pg id 1.cas2...
Neha Ojha
08:27 AM Bug #38826: upmap broken the crush rule
Here is the crush rule... huang jun
08:24 AM Bug #38826 (Resolved): upmap broken the crush rule
I set up a cluster and want to specify the primary OSDs through a crush rule.
Here is the test script...
huang jun
03:14 AM Backport #38275 (In Progress): mimic: Fix recovery and backfill priority handling
David Zafman
12:43 AM Backport #38244 (Resolved): luminous: scrub warning check incorrectly uses mon scrub interval
David Zafman
12:43 AM Backport #38274 (Resolved): luminous: Fix recovery and backfill priority handling
David Zafman

03/19/2019

11:30 PM Bug #36739 (Pending Backport): ENOENT in collection_move_rename on EC backfill target
Neha Ojha
08:38 PM Backport #38398: mimic: rados_shutdown hang forever in ~objecter()
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26583
merged
Yuri Weinstein

03/18/2019

06:55 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
Err. I believe I mixed up two different bugs, please disregard my previous comment. I don't currently recall what I ... Martin Millnert
06:52 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
For completeness: The root cause of the crashes I experienced was that I had oversized RADOS objects (2-10GB, max ... Martin Millnert
02:22 PM Bug #38124: OSD down on snaptrim.
Hello, are there any updates on this? Erikas Kučinskis
06:35 AM Bug #38793 (New): data inconsistent
I did some tests on rbd snapshots and found inconsistent data.
cluster status:...
hongpeng lu

03/17/2019

10:21 PM Bug #38787 (Fix Under Review): osd: cache tiering flush clone wrongly
Patrick Donnelly
02:38 AM Bug #38787 (Fix Under Review): osd: cache tiering flush clone wrongly
Because the cephfs file snapcontext seq may start from 1, we find that in a never-snapshotted fs,
the flush of a file will dele...
Zengran Zhang
07:21 PM Bug #38294 (Resolved): osd/PG.cc: 6141: FAILED ceph_assert(info.history.same_interval_since != 0)...
Sage Weil
10:01 AM Bug #38294 (Fix Under Review): osd/PG.cc: 6141: FAILED ceph_assert(info.history.same_interval_sin...
https://github.com/ceph/ceph/pull/27018 Sage Weil
09:57 AM Bug #38294 (In Progress): osd/PG.cc: 6141: FAILED ceph_assert(info.history.same_interval_since !=...
/a/sage-2019-03-17_00:28:04-upgrade:luminous-x-wip-sage4-testing-2019-03-16-1713-distro-basic-smithi/3737326
pg 1....
Sage Weil
12:10 AM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
... Sage Weil

03/16/2019

11:20 PM Bug #21174: OSD crash: 903: FAILED assert(objiter->second->version > last_divergent_update)
I have a similar issue with OSDs dropping out:... Grant Slater
06:33 PM Bug #38786 (Resolved): autoscale down can lead to max_pg_per_osd limit
We adjust pgp_num all the way down to the target, which can make OSDs hit the max_pgs_per_osd limit if it goes too far.
...
Sage Weil

03/15/2019

09:45 PM Bug #38623 (Resolved): 2.8s2 past_intervals [6539,6541) start interval does not contain the requi...
Sage Weil
08:31 PM Bug #38655 (Resolved): osd: missing, size mismatch, snap mapper errors
Sage Weil
06:11 PM Bug #36739: ENOENT in collection_move_rename on EC backfill target
https://github.com/ceph/ceph/pull/26996 is a more complete fix for this issue. Neha Ojha
06:06 PM Bug #38784 (Resolved): osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(soid) ||...
... Neha Ojha
05:08 PM Bug #38746 (Resolved): msgr2 leaking buffers
https://github.com/ceph/ceph/pull/26965 Sage Weil
03:20 AM Bug #38746: msgr2 leaking buffers
Hmm, it happens on some OSDs but not others.
I added the rxbuf and txbuf lengths to the dout prefix and got this
...
Sage Weil
03:01 AM Bug #38746 (Resolved): msgr2 leaking buffers
OSDs with bluestore consume too much RAM (seeing 20GB on sepia).
To reproduce with vstart, watch bin/ceph daemon os...
Sage Weil
05:03 PM Bug #38783 (New): Changing mon_pg_warn_max_object_skew has no effect.
... Andrew Mitroshin
03:20 PM Documentation #38051 (Resolved): doc/rados/configuration: refresh osdmap section
Nathan Cutler
03:19 PM Backport #38095 (Resolved): luminous: doc/rados/configuration: refresh osdmap section
Nathan Cutler
12:13 PM Bug #38762 (New): Ubuntu/Debian repo has incorrect InRelease
On Ubuntu Bionic, while trying to update the repo packages, I got this error:
E: Failed to fetch https://download.ceph.com/debian-mi...
Alexander Sytar
08:59 AM Backport #38751 (Resolved): mimic: should report EINVAL in ErasureCode::parse() if m<=0
https://github.com/ceph/ceph/pull/28995 Nathan Cutler
08:58 AM Backport #38750 (Resolved): luminous: should report EINVAL in ErasureCode::parse() if m<=0
https://github.com/ceph/ceph/pull/28111 Nathan Cutler

03/14/2019

04:45 PM Feature #38616: Improvements to auto repair

I don't think we need to set "failed_repair" if primary can't recover itself on a read error. We are already setti...
David Zafman
02:46 PM Feature #38616 (In Progress): Improvements to auto repair
David Zafman
04:17 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Forgot to mention, the op appears to be an MForward. Sage Weil
11:58 AM Bug #38682 (Pending Backport): should report EINVAL in ErasureCode::parse() if m<=0
Sage Weil
12:53 AM Cleanup #38635: bluestore: test osd_memory_target
Part 1: Test with a value of osd_memory_target lower than the default, maybe half or less than that. This can be don... Neha Ojha

03/13/2019

11:03 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
... Sage Weil
10:55 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
... Sage Weil
04:29 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
ceph-post-file: 26dab2cb-36c9-40de-8455-1379406477e8
Sage Weil
04:29 PM Bug #38724 (Resolved): _txc_add_transaction error (39) Directory not empty not handled on operati...
... Sage Weil
03:08 PM Bug #38631 (Resolved): osd-scrub-repair.sh fails due to num_objects wrong
David Zafman
03:05 PM Bug #38678 (Resolved): Minor cleanups in tests and log output
David Zafman
01:16 PM Bug #38705 (Resolved): mgr: segv in module thread, PyArg_ParseTuple
Sage Weil
12:03 PM Backport #38720 (Resolved): mimic: crush: choose_args array size mis-sized when weight-sets are e...
https://github.com/ceph/ceph/pull/27082 Nathan Cutler
12:03 PM Backport #38719 (Resolved): luminous: crush: choose_args array size mis-sized when weight-sets ar...
https://github.com/ceph/ceph/pull/27085 Nathan Cutler
11:56 AM Bug #38664 (Pending Backport): crush: choose_args array size mis-sized when weight-sets are enabled
Sage Weil
11:56 AM Bug #38703 (Resolved): lazy omap stats aren't incorporated into pg_autoscaler size value
Sage Weil
11:55 AM Bug #38238: rados/test.sh: api_aio_pp doesn't seem to start
/a/sage-2019-03-13_02:19:41-rados-wip-sage3-testing-2019-03-12-1657-distro-basic-smithi/3715202 Sage Weil
11:54 AM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
... Sage Weil
11:52 AM Bug #38718 (New): 'osd crush weight-set create-compat' (and other OSDMonitor commands) can leak u...
... Sage Weil
10:55 AM Backport #38506 (Resolved): luminous: ENOENT on setattrs (obj was recently deleted)
Nathan Cutler
10:39 AM Bug #38258 (Resolved): filestore: fsync(2) return value not checked
Nathan Cutler
10:38 AM Backport #38316 (Resolved): luminous: filestore: fsync(2) return value not checked
Nathan Cutler
04:40 AM Backport #38423 (Resolved): luminous: osd/TestPGLog.cc: Verify that dup_index is being trimmed
Brad Hubbard
 
