Activity

From 02/02/2018 to 03/03/2018

03/02/2018

10:40 PM Bug #18165 (Resolved): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_target...
David Zafman
10:23 PM Bug #23204 (Duplicate): missing primary copy of object in mixed luminous<->master cluster with bl...
The dead jobs here failed due to this:
http://pulpito.ceph.com/yuriw-2018-03-01_22:45:38-upgrade:luminous-x-wip-yu...
Josh Durgin
09:21 PM Bug #22050: ERROR type entries of pglog do not update min_last_complete_ondisk, potentially ballo...
This seems to be biting rgw's usage pools when radosgw-admin usage trim occurs in pgs with little other activity. Josh Durgin
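For reference, the trim operation in question looks something like this (uid and date range are illustrative):
# radosgw-admin usage trim --uid=testuser --start-date=2018-01-01 --end-date=2018-02-01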
04:18 PM Bug #23200 (Resolved): invalid JSON returned when querying pool parameters
When requesting JSON formatted results for querying for pool
parameters, the list that comes back is not valid JSON....
Wyllys Ingersoll
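A quick way to check for this (pool name is illustrative; python -m json.tool exits non-zero on invalid JSON):
# ceph osd pool get rbd all -f json | python -m json.tool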
01:56 PM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Both the image HEAD and snapshot "snap" show a size of 10GB, so if your exported sizes are different, the export must... Jason Dillaman
09:49 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
> There are no logs indicating osd crash and the outputs of 'ceph daemon osd.x log dump' are all empty ({}).
The m...
Mykola Golub
08:26 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Jason Dillaman wrote:
> Can you please run "rados -p <pool name> listomapvals rbd_header.<image id>" and provide the...
宏伟 唐
12:06 PM Bug #23194 (Rejected): librados client is sending bad omap value just before program exits
Thanks Jason. You were absolutely right -- the omap get/put at exit is being driven by ganesha. I had missed that bef... Jeff Layton
04:37 AM Bug #23130: No error is shown when "osd_mon_report_interval_min" value is greater than "osd_mon_...
Jewel is scheduled to reach End of Life when Mimic is released (around June 2018). It's possible this issue will not ... Nathan Cutler

03/01/2018

11:46 PM Bug #23195 (Resolved): Read operations segfaulting multiple OSDs
I'm seeing some OSDs crashing at the same time with (mostly) the same error message related to a reading an erasure c... Paul Emmerich
11:14 PM Bug #23194: librados client is sending bad omap value just before program exits
... there was a "omap get" right before the store and the values stored where the (truncated) values that were just r... Jason Dillaman
10:38 PM Bug #23194: librados client is sending bad omap value just before program exits
rados_kv_get does look hinky, but I don't think we're calling into it here. We're basically doing a rados_kv_put into... Jeff Layton
09:53 PM Bug #23194: librados client is sending bad omap value just before program exits
I don't know what nfs-ganesha code to look at, but this [1] looks very suspect to me since you are returning a pointe... Jason Dillaman
09:43 PM Bug #23194: librados client is sending bad omap value just before program exits
Frame 201:
Object: rec-00000000:0000000000000017
Key: 6528071705456279553
Value: ::ffff:192.168.1.243-(37:Linux NF...
Jason Dillaman
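For anyone following along, the omap contents of such a recovery object can be inspected directly with the rados CLI (the pool name here is hypothetical):
# rados -p ganesha-recovery listomapkeys rec-00000000:0000000000000017
# rados -p ganesha-recovery listomapvals rec-00000000:0000000000000017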
09:16 PM Bug #23194: librados client is sending bad omap value just before program exits
I do have the ability to collect client logs within the container, and can turn up debugging in there if it'll help. Jeff Layton
08:56 PM Bug #23194: librados client is sending bad omap value just before program exits
Ahh, the object name is 29 bytes in this case, so maybe there is some confusion about lengths down in the code that i... Jeff Layton
08:49 PM Bug #23194 (Rejected): librados client is sending bad omap value just before program exits
I've been tracking down a problem in nfs-ganesha where an omap value in an object ends up truncated. It doesn't alway... Jeff Layton
11:47 AM Backport #23160 (Need More Info): luminous: Multiple asserts caused by DNE pgs left behind after ...
Waiting for code review for backport PR: https://github.com/ceph/ceph/pull/20668 Prashant D
11:16 AM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
... Kefu Chai
09:55 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Yoann Moulin wrote:
> David Zafman wrote:
> > Yoann Moulin wrote:
> > > is that normal all files in 11.5f_head hav...
Yoann Moulin
09:26 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Attached is the result of the dump for each OSD with the correct args. Yoann Moulin
09:08 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
David Zafman wrote:
> Yoann Moulin wrote:
> > is that normal all files in 11.5f_head have size=0 on each replicate ...
Yoann Moulin
09:06 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Attached are the result of the dump for each OSD
and the extended attributes for the files on disk:...
Yoann Moulin
01:55 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Yoann Moulin wrote:
> is that normal all files in 11.5f_head have size=0 on each replicate of the PG ?
>
> [...]
...
David Zafman
01:11 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Can you dump the object with something like the following.... Brad Hubbard
09:33 AM Backport #23186 (In Progress): luminous: ceph tell mds.* <command> prints only one matching usage
https://github.com/ceph/ceph/pull/20664 Kefu Chai
09:26 AM Backport #23186 (Resolved): luminous: ceph tell mds.* <command> prints only one matching usage
https://github.com/ceph/ceph/pull/20664 Kefu Chai
09:25 AM Bug #23125 (Duplicate): Bad help text when 'ceph osd pool' is run
Kefu Chai
02:38 AM Bug #23125: Bad help text when 'ceph osd pool' is run
I am working on this issue. Thanks. guotao Yao
08:36 AM Feature #23045 (Fix Under Review): mon: warn on slow ops in OpTracker
Kefu Chai
07:56 AM Feature #23045: mon: warn on slow ops in OpTracker
https://github.com/ceph/ceph/pull/20660 Chang Liu
03:30 AM Bug #23124: Status of OSDs are not showing properly after disabling ceph.target and ceph-osd.target
As OSDs are brought up by the udev rules, regardless of the enabled status of "ceph.target" and "ceph-osd.target" hen... Debashis Mondal
03:14 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Can you please run "rados -p <pool name> listomapvals rbd_header.<image id>" and provide the output? You can determin... Jason Dillaman
01:57 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Jason Dillaman wrote:
> Yes, snapshots are read-only so the only thing I can think of is some sort of data corruptio...
宏伟 唐
12:21 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Mykola Golub wrote:
> It looks like your log entries are from an in-memory log dump. Did your osd crash (could be seen i...
宏伟 唐

02/28/2018

10:41 PM Bug #23132 (Triaged): some config values should be unsigned, to disallow negative values
Josh Durgin
10:37 PM Bug #23130 (Triaged): No error is shown when "osd_mon_report_interval_min" value is greater than...
This only affects jewel since the osd_mon_report_interval_max option is no longer used in luminous and later. Josh Durgin
10:35 PM Bug #23128: invalid values in ceph.conf do not issue visible warnings
This is reverting to the default value since 1.1 is not a valid value for the option.
This is being improved with...
Josh Durgin
10:34 PM Bug #23128 (Triaged): invalid values in ceph.conf do not issue visible warnings
Josh Durgin
10:31 PM Bug #23125 (Triaged): Bad help text when 'ceph osd pool' is run
Josh Durgin
10:30 PM Bug #23124 (Won't Fix): Status of OSDs are not showing properly after disabling ceph.target and c...
As Nathan explained, this isn't how the targets are meant to work. Josh Durgin
10:27 PM Bug #23145: OSD crashes during recovery of EC pg
Sage, is this a bluestore issue, or did we lose the rollback info somewhere?
It looks like it's getting enoent for...
Josh Durgin
11:23 AM Backport #23181 (In Progress): jewel: Can't repair corrupt object info due to bad oid on all repl...
Nathan Cutler
11:22 AM Backport #23181 (Resolved): jewel: Can't repair corrupt object info due to bad oid on all replicas
https://github.com/ceph/ceph/pull/20622 Nathan Cutler
11:20 AM Backport #23174 (Resolved): luminous: SRV resolution fails to lookup AAAA records
https://github.com/ceph/ceph/pull/20710 Nathan Cutler
11:19 AM Bug #20471 (Pending Backport): Can't repair corrupt object info due to bad oid on all replicas
Nathan Cutler
10:33 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Is it normal that all files in 11.5f_head have size=0 on each replica of the PG?... Yoann Moulin
08:17 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Here are the results of the 3 commands for each replica of the PG; osd.78 on iccluster020 is the one with the error:
...
Yoann Moulin
06:57 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
It looks like your log entries are from an in-memory log dump. Did your osd crash (could be seen in the log) or did you u... Mykola Golub
12:30 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Jason Dillaman wrote:
> Yes, snapshots are read-only so the only thing I can think of is some sort of data corruptio...
宏伟 唐
01:26 AM Bug #23078 (Pending Backport): SRV resolution fails to lookup AAAA records
Kefu Chai
01:19 AM Bug #22462 (Resolved): mon: unknown message type 1537 in luminous->mimic upgrade tests
Kefu Chai
01:13 AM Bug #22656: scrub mismatch on bytes (cache pools)
http://pulpito.ceph.com/kchai-2018-02-27_10:33:49-rados-wip-kefu-testing-2018-02-27-1348-distro-basic-mira/2232486/
...
Kefu Chai

02/27/2018

11:13 PM Bug #22902: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
This one looks like a similar failure: http://pulpito.ceph.com/nojha-2018-02-23_18:13:41-rados-wip-async-recovery-201... Neha Ojha
06:49 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
To summarize what I've figured out to reproduce this:
* both rbd client and mon are running 12.2.4, happened with ...
Paul Emmerich
05:46 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
Still happening on 12.2.4... Paul Emmerich
04:23 PM Bug #23124: Status of OSDs are not showing properly after disabling ceph.target and ceph-osd.target
The ceph.target and ceph-osd.target cannot be used this way. Assuming ceph-disk is being used, the OSDs are brought u... Nathan Cutler
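A sketch of the intended usage, assuming the standard systemd units: the targets group running daemons, so stopping (rather than disabling) is what actually takes OSDs down:
# systemctl stop ceph-osd.target    (stops every OSD daemon on the host)
# systemctl stop ceph-osd@2         (stops just osd.2)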
04:08 PM Feature #22974 (Resolved): documentation - pg state table missing "activating" state
Nathan Cutler
04:08 PM Backport #23113 (Resolved): luminous: documentation - pg state table missing "activating" state
Nathan Cutler
12:55 PM Backport #23160 (Resolved): luminous: Multiple asserts caused by DNE pgs left behind after lots o...
https://github.com/ceph/ceph/pull/20668 Nathan Cutler
11:48 AM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Nathan Cutler
11:47 AM Backport #21871 (Rejected): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
@smithfarm I am sorry; it turns out that this backport is not needed, because of http://tracker.ceph.com/issues/2... Nathan Cutler
08:58 AM Bug #23145 (Duplicate): OSD crashes during recovery of EC pg
I've got a cluster (running released debs of ceph 12.2.3) that started crashing on OSD startup a little bit ago. I di... Peter Woodman
06:12 AM Backport #23075 (In Progress): luminous: osd: objecter sends out of sync with pg epochs for proxi...
https://github.com/ceph/ceph/pull/20609 Prashant D

02/26/2018

09:18 PM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
... possible, but it actually does say it's replicated cache tiers in front of EC backends, which should rule out data... Jason Dillaman
09:01 PM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Couldn't this be related to #21639 (snapshots were not created/deleted against the data pool)? The reported version here i... Mykola Golub
07:16 PM Bug #23119 (Need More Info): MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glan...
Yes, snapshots are read-only so the only thing I can think of is some sort of data corruption on the OSDs. Have you r... Jason Dillaman
08:23 PM Bug #22996 (Resolved): Snapset inconsistency is no longer detected
David Zafman
08:20 PM Backport #23054 (Resolved): luminous: Snapset inconsistency is no longer detected
https://github.com/ceph/ceph/pull/20501 David Zafman
08:04 PM Backport #23093 (Resolved): luminous: last-stat-seq returns 0 because osd stats are cleared
David Zafman
08:03 PM Bug #21833 (Pending Backport): Multiple asserts caused by DNE pgs left behind after lots of OSD r...
David Zafman
07:32 PM Feature #23087 (Duplicate): Add OSD metrics to keep track of per-client IO
We've discussed "rbd top" before (http://tracker.ceph.com/projects/ceph/wiki/CDM_07-DEC-2016, http://tracker.ceph.com... Greg Farnum
05:07 AM Bug #23132 (Triaged): some config values should be unsigned, to disallow negative values
Execution Steps:
-------------------
1. Set a negative value for the parameter "osd_heartbeat_interval" in ceph.conf
2....
Debashis Mondal
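A minimal sketch for verifying what value the daemon actually applied (assumes a local admin socket for osd.0):
# ceph daemon osd.0 config get osd_heartbeat_interval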
04:56 AM Backport #23114 (In Progress): luminous: can't delete object from pool when Ceph out of space
https://github.com/ceph/ceph/pull/20585 Prashant D
04:54 AM Bug #23130 (Triaged): No error is shown when "osd_mon_report_interval_min" value is greater than...
Execution Steps:
------------------
1. Set the "osd_mon_report_interval_min" value using CLI
# ceph daemon osd...
Debashis Mondal
04:37 AM Feature #23129: After creating a snapshot of a rados pool when we try to rollback the pool it all...
rados -p testpool rollback myobject1 testpool-snap
[Note: only the mentioned object is rolled back from the snapshot]
Debashis Mondal
04:35 AM Feature #23129 (New): After creating a snapshot of a rados pool when we try to rollback the pool ...
Execution Steps:
------------------
1. Creating a pool
# ceph osd pool create testpool 16 16
2. Add ...
Debashis Mondal
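A minimal repro sketch of the behaviour described above (pool, object, and snapshot names are illustrative):
# ceph osd pool create testpool 16 16
# rados -p testpool put myobject1 /etc/hosts
# rados -p testpool mksnap testpool-snap
# rados -p testpool rollback myobject1 testpool-snap
Only myobject1 is rolled back; there is no single command that rolls every object in the pool back to the snapshot, which is what this feature request asks for.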
04:29 AM Bug #23128 (Triaged): invalid values in ceph.conf do not issue visible warnings
Execution Steps
-----------------
1. Change the setting of "mon osd down out interval" in ceph.conf as per below
...
Debashis Mondal
04:24 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Yes, a size-0 object is expected since all copies report '"size": 0'.
The discrepancy appears to be in the omap data...
Brad Hubbard
04:10 AM Bug #23125 (Duplicate): Bad help text when 'ceph osd pool' is run
Execution Steps:
-----------------
1. While executing the CLI for creating a snapshot of a pool
# ceph osd pool ...
Debashis Mondal
04:04 AM Bug #23124 (Won't Fix): Status of OSDs are not showing properly after disabling ceph.target and c...
Execution Steps:
----------------
1. # ceph osd tree [ceph is in running state]
2. # systemctl disab...
Debashis Mondal
03:49 AM Feature #23123 (New): use pwrite to emulate posix_fallocate
This means less IO when using a plain file as the store for testing bluestore if posix_fallocate() is not available.
See ht...
Kefu Chai
03:24 AM Backport #23113 (In Progress): luminous: documentation - pg state table missing "activating" state
https://github.com/ceph/ceph/pull/20584 Prashant D

02/25/2018

08:31 AM Bug #23119 (Need More Info): MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glan...
Ceph Version: 12.2.2 Luminous Stable
Problem description:
We use ceph as the backend storage for OpenStack Glance...
宏伟 唐

02/24/2018

07:14 PM Bug #23117 (Fix Under Review): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has ...
In the following setup:
* 6 OSD hosts
* Each host with 32 disks = 32 OSDs
* Pool with 2048 PGs, EC, k=4, m=2, crus...
Oliver Freyermuth
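For context, the numbers above work out to 2048 PGs x 6 shards (k=4 + m=2) = 12288 PG shards spread over 6 x 32 = 192 OSDs, i.e. an average of 64 PG shards per OSD.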
05:54 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
https://github.com/ceph/ceph/pull/20571 David Zafman
05:54 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
Not sure if this needs a Jewel backport. David Zafman
11:22 AM Backport #23114 (Resolved): luminous: can't delete object from pool when Ceph out of space
https://github.com/ceph/ceph/pull/20585 Nathan Cutler
11:21 AM Backport #23113 (Resolved): luminous: documentation - pg state table missing "activating" state
https://github.com/ceph/ceph/pull/20584 Nathan Cutler
04:39 AM Feature #22974 (Pending Backport): documentation - pg state table missing "activating" state
https://github.com/ceph/ceph/pull/20504 Kefu Chai
04:35 AM Bug #23078: SRV resolution fails to lookup AAAA records
Kefu Chai
04:32 AM Bug #22952 (Duplicate): Monitor stopped responding after awhile
Great! I am marking this ticket as a "duplicate". Please reopen it if you think otherwise.
Happy Chinese New Year ...
Kefu Chai
04:20 AM Bug #22413 (Pending Backport): can't delete object from pool when Ceph out of space
Kefu Chai

02/23/2018

10:24 PM Feature #23096: mon: don't remove auth caps without a flag
We could throw an error instead, yeah. That is probably a wise forcing function. I think we still want the flag thoug... Greg Farnum
11:37 AM Feature #23096: mon: don't remove auth caps without a flag
Bit torn on this one: there is a security downside to changing this behaviour in-place -- any existing scripts that e... John Spray
01:08 AM Feature #23096 (New): mon: don't remove auth caps without a flag
With current syntax, something like... Greg Farnum
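For context, the behaviour at issue is that a caps update is a wholesale replacement (entity and caps here are hypothetical):
# ceph auth caps client.foo mon 'allow r'
This replaces all of client.foo's existing caps, silently dropping any previously granted osd or mds caps rather than merging them.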
08:02 PM Bug #21833 (In Progress): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
David Zafman
02:03 AM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
I was working on this last week, but got distracted by other issues. I'm going to force this scenario and see about f... David Zafman
02:01 PM Backport #23103 (In Progress): luminous: v12.2.2 unable to create bluestore osd using ceph-disk
Nathan Cutler
01:50 PM Backport #23103 (Resolved): luminous: v12.2.2 unable to create bluestore osd using ceph-disk
https://github.com/ceph/ceph/pull/20563 Nathan Cutler
11:54 AM Bug #18165 (Pending Backport): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfil...
This should not have been marked Resolved when one of the backports was still open. Nathan Cutler
08:33 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Hello Brad,
Sorry, I have been too fast:
the rados get with the correct pool returns a file with size=0...
Yoann Moulin
03:42 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Does "rados -p disks ls" list the object? Can you find the actual storage for this object on the disks used for these... Brad Hubbard

02/22/2018

11:56 PM Backport #23093 (In Progress): luminous: last-stat-seq returns 0 because osd stats are cleared
David Zafman
11:43 PM Backport #23093: luminous: last-stat-seq returns 0 because osd stats are cleared
https://github.com/ceph/ceph/pull/20548 David Zafman
05:52 PM Backport #23093 (Resolved): luminous: last-stat-seq returns 0 because osd stats are cleared

I added an assert which crashes ceph-mgr because PGMap::apply_incremental() processes an osd_stat_t that is all zero...
David Zafman
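For reference, the value that regressed to 0 can be queried with (osd name illustrative):
# ceph osd last-stat-seq osd.0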
11:40 PM Bug #22882 (Fix Under Review): Objecter deadlocked on op budget while holding rwlock in ms_handle...
https://github.com/ceph/ceph/pull/20519 Greg Farnum
09:40 PM Bug #22952: Monitor stopped responding after awhile
Thanks; with 12.2.3 + this patch, the cluster is now back to the HEALTH_OK state. Frank Li
06:37 PM Bug #22952: Monitor stopped responding after awhile
Kefu Chai wrote:
> Frank, sorry for the latency. i am just back from the holiday. i pushed 12.2.3 + https://github.c...
Frank Li
10:07 AM Bug #22952: Monitor stopped responding after awhile
Frank, sorry for the latency. I am just back from the holiday. I pushed 12.2.3 + https://github.com/ceph/ceph/pull/20... Kefu Chai
06:06 PM Bug #22662 (Resolved): ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Nathan Cutler
06:05 PM Backport #22866 (Resolved): jewel: ceph osd df json output validation reported invalid numbers (-...
Nathan Cutler
04:03 PM Bug #21121 (Resolved): test_health_warnings.sh can fail
Nathan Cutler
04:03 PM Backport #21239 (Resolved): jewel: test_health_warnings.sh can fail
Nathan Cutler
02:09 PM Backport #23077 (Need More Info): luminous: mon: ops get stuck in "resend forwarded message to le...
This backport has two master PRs:
* https://github.com/ceph/ceph/pull/20467
* https://github.com/ceph/ceph/pull/2...
Nathan Cutler
01:14 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Hello,
I'm also having this issue...
Yoann Moulin
12:54 PM Feature #23087 (Duplicate): Add OSD metrics to keep track of per-client IO
In our online clusters, there are times when some RBD images' size increase rapidly, which could fill up the whole cl... Xuehan Xu
11:10 AM Bug #22413 (Fix Under Review): can't delete object from pool when Ceph out of space
https://github.com/ceph/ceph/pull/20534 Kefu Chai
09:58 AM Bug #22354 (Pending Backport): v12.2.2 unable to create bluestore osd using ceph-disk
Kefu Chai
08:17 AM Bug #23078 (Fix Under Review): SRV resolution fails to lookup AAAA records
Kefu Chai
08:09 AM Bug #23078: SRV resolution fails to lookup AAAA records
In the meantime, by the way, a round-robin IPv6 DNS record works just fine, something like:... Wido den Hollander
07:35 AM Bug #23078: SRV resolution fails to lookup AAAA records
Simon Leinen wrote:
> WANG Guoqin actually noted the lack of IPv6 support in "a comment on issue #14527":http://trac...
Wido den Hollander
06:29 AM Bug #22462 (Fix Under Review): mon: unknown message type 1537 in luminous->mimic upgrade tests
https://github.com/ceph/ceph/pull/20528 Kefu Chai
05:41 AM Bug #22462: mon: unknown message type 1537 in luminous->mimic upgrade tests
MMonHealth (MSG_MON_HEALTH=0x601 (1537)) was removed in https://github.com/ceph/ceph/commit/7b4a741fbda4dc817a003c694... Kefu Chai

02/21/2018

10:46 PM Feature #14527: Lookup monitors through DNS
WANG Guoqin wrote:
> The recent code doesn't support IPv6, apparently. Maybe we can choose among ns_t_a and ns_t_aaa...
Simon Leinen
10:44 PM Bug #23078: SRV resolution fails to lookup AAAA records
WANG Guoqin actually noted the lack of IPv6 support in "a comment on issue #14527":http://tracker.ceph.com/issues/145... Simon Leinen
10:26 PM Bug #23078 (Resolved): SRV resolution fails to lookup AAAA records
We have some IPv6 Rados clusters. So far we have been specifying the addresses of each cluster's three mons using li... Simon Leinen
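The lookup can be checked by hand with dig (the domain is hypothetical; ceph-mon is the default mon_dns_srv_name):
# dig +short SRV _ceph-mon._tcp.example.com
The failure is that only A records are then requested for the hosts the SRV records point at, so IPv6-only (AAAA-only) mons are never found.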
09:56 PM Support #23005: Implement rados for Python library with some problem
Does this work without pyinstaller on your system? Josh Durgin
09:54 PM Bug #23029: osd does not handle eio on meta objects (e.g., osdmap)
We could at least fail more politely here even if we can't recover from it in the short term. Josh Durgin
09:50 PM Bug #23049: ceph Status shows only WARN when traffic to cluster fails
Can reproduce easily - thanks for the report.
2 bugs here - 1) the monitor is still enforcing the mon_osd_min_up_r...
Josh Durgin
09:46 PM Support #23050 (Closed): PG doesn't move to down state in replica pool
'stale' means there haven't been any reports from the primary in a while. Since there's no osd to report the status o... Josh Durgin
09:40 PM Bug #23051: PGs stuck in down state
Can you post the results of 'ceph pg $PGID query' for some of the down pgs? Josh Durgin
09:34 PM Bug #22994: rados bench doesn't use --max-objects
rados tool options are pretty confusing - the help text should make it clearer which options are for bench vs load-gen... Josh Durgin
09:27 PM Backport #23076 (In Progress): jewel: osd: objecter sends out of sync with pg epochs for proxied ops
Nathan Cutler
09:26 PM Backport #23076 (Resolved): jewel: osd: objecter sends out of sync with pg epochs for proxied ops
https://github.com/ceph/ceph/pull/20518 Nathan Cutler
09:26 PM Backport #23077 (Resolved): luminous: mon: ops get stuck in "resend forwarded message to leader"
https://github.com/ceph/ceph/pull/21016 Nathan Cutler
09:26 PM Backport #23075 (Resolved): luminous: osd: objecter sends out of sync with pg epochs for proxied ops
https://github.com/ceph/ceph/pull/20609 Nathan Cutler
07:48 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
Oh, second PR for the OSD beacons and PG create messages: https://github.com/ceph/ceph/pull/20517 Greg Farnum
04:35 PM Bug #22114 (Pending Backport): mon: ops get stuck in "resend forwarded message to leader"
Sage Weil
04:34 PM Bug #22123 (Pending Backport): osd: objecter sends out of sync with pg epochs for proxied ops
Sage Weil
01:06 PM Bug #19737: EAGAIN encountered during pg scrub (jewel)
@Josh - thanks
https://github.com/ceph/ceph/pull/20508
Nathan Cutler
12:49 AM Bug #23031: FAILED assert(!parent->get_log().get_missing().is_missing(soid))

osd.0 was the primary before it crashed, came back up, and crashed again as originally indicated in this bug. This is ...
David Zafman

02/20/2018

04:03 PM Backport #23054 (Resolved): luminous: Snapset inconsistency is no longer detected

The fix for #20243 required additional handling of snapset inconsistency. The Object info and snapset aren't part ...
David Zafman
12:26 PM Bug #23051 (New): PGs stuck in down state
Hello,
We see PGs stuck in down state even when the respective osds are started and recovered from the failure sc...
Nokia ceph-users
10:38 AM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
I can confirm this on 12.2.2. It makes data unavailable.
My output:...
Rafal Wadolowski
10:14 AM Support #23050: PG doesn't move to down state in replica pool
Please let me know if any additional logs/info are required. Nokia ceph-users
10:13 AM Support #23050 (Closed): PG doesn't move to down state in replica pool
Hello,
Environment used - 3 node cluster
Replication - 3
#ceph osd pool ls detail
pool 16 'cdvr_ec' replica...
Nokia ceph-users
09:45 AM Backport #17445 (Resolved): jewel: list-snap cache tier missing promotion logic (was: rbd cli seg...
Nathan Cutler
09:43 AM Feature #15835 (Resolved): filestore: randomize split threshold
Nathan Cutler
09:42 AM Backport #22658 (Resolved): filestore: randomize split threshold
Nathan Cutler
09:35 AM Backport #22794 (Resolved): jewel: heartbeat peers need to be updated when a new OSD added into a...
Nathan Cutler
09:33 AM Bug #20705 (Resolved): repair_test fails due to race with osd start
Nathan Cutler
09:33 AM Backport #22818 (Resolved): jewel: repair_test fails due to race with osd start
Nathan Cutler
09:04 AM Backport #23024 (In Progress): luminous: thrash-eio + bluestore (hangs with unfound objects or re...
https://github.com/ceph/ceph/pull/20495 Prashant D
06:16 AM Backport #23024: luminous: thrash-eio + bluestore (hangs with unfound objects or read_log_and_mis...
I'm on it. Prashant D
08:55 AM Bug #23049: ceph Status shows only WARN when traffic to cluster fails
Please let me know if any additional logs/info are required. Nokia ceph-users
08:54 AM Bug #23049 (New): ceph Status shows only WARN when traffic to cluster fails
Hello,
While using Kraken, I have seen the status change to ERR, but in Luminous we do not see the status of ceph ...
Nokia ceph-users
07:46 AM Bug #22996 (Pending Backport): Snapset inconsistency is no longer detected
https://github.com/ceph/ceph/pull/20450 Kefu Chai
05:30 AM Bug #19737: EAGAIN encountered during pg scrub (jewel)
Looked at the logs from http://pulpito.front.sepia.ceph.com/smithfarm-2018-02-06_21:07:15-rados-wip-jewel-backports-d... Josh Durgin

02/19/2018

10:59 PM Bug #18178 (Won't Fix): Unfound objects lost after OSD daemons restarted

Reasons this is being closed:
1. PG repair is moving to user mode, so on-the-fly object repair probably won't use r...
David Zafman
09:58 PM Feature #23045: mon: warn on slow ops in OpTracker
I've assigned this to myself but I don't know when I can get to it, so if you want to work on this feel free to take it! Greg Farnum
09:56 PM Feature #23045 (Resolved): mon: warn on slow ops in OpTracker
The monitor has an OpTracker now, but it doesn't warn on slow ops the way the MDS or OSD do. We should enable that to... Greg Farnum
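Assuming the mon's OpTracker is exposed over the admin socket the same way the OSD's is, in-flight ops can already be dumped with something like:
# ceph daemon mon.a ops
The feature request is to surface slow entries from that tracker as health warnings.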
09:52 PM Bug #23030: osd: crash during recovery with assert(p != recovery_info.ss.clone_snap)and assert(re...
This snapshot assert looks like "Ceph Luminous - pg is down due to src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)... Greg Farnum
09:02 PM Feature #23044 (New): osd: use madvise with MADV_DONTDUMP to prevent cached data from being core ...
Idea here is to reduce the size of the coredumps but also to prevent sensitive data from being leaked. Patrick Donnelly
02:55 PM Bug #22123 (Fix Under Review): osd: objecter sends out of sync with pg epochs for proxied ops
https://github.com/ceph/ceph/pull/20484
I opted for the marginally more complex solution of cancelling multiple o...
Sage Weil

02/17/2018

02:20 AM Bug #23031 (New): FAILED assert(!parent->get_log().get_missing().is_missing(soid))
Using vstart to start 3 OSDs with -o filestore debug inject read err=1
Manually injectdataerr on all replicas of o...
David Zafman
12:37 AM Bug #23030 (Fix Under Review): osd: crash during recovery with assert(p != recovery_info.ss.clone...
I've got some OSDs in an 5/3 EC pool crashing during recovery. The crash happens simultaneously on 5 to 10 OSDs, some... Paul Emmerich
12:36 AM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
I looked at it briefly and the output doesn't make any sense to me, but I don't have a lot of context around what the... Greg Farnum

02/16/2018

11:49 PM Bug #22114 (Fix Under Review): mon: ops get stuck in "resend forwarded message to leader"
https://github.com/ceph/ceph/pull/20467 Greg Farnum
02:14 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
Greg Farnum wrote:
> Ummm, yep, that looks right to me at a quick glance! Can you submit a PR with that change? :)
...
hongpeng lu
02:04 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
Maybe not. You should check the code on github.com. hongpeng lu
01:22 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
hongpeng lu wrote:
> The messages can not be forwarded appropriately, you must change the code like this.
> [...]
...
Oleg Glushak
01:17 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
The messages cannot be forwarded appropriately; you must change the code like this.... hongpeng lu
12:52 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
We have the same problem on all our Luminous clusters. Any news regarding a fix?
Most stuck messages in our case are o...
Oleg Glushak
10:35 PM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
@nathan This doesn't have cache tier, so it would be a different issue. Maybe related to upgrade? David Zafman
07:58 PM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
@David I guess this is a duplicate, too? Nathan Cutler
04:27 PM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
Seems reproducible, see
http://pulpito.ceph.com/teuthology-2018-02-16_01:15:03-upgrade:hammer-x-jewel-distro-basic-...
Yuri Weinstein
10:01 PM Bug #23029 (New): osd does not handle eio on meta objects (e.g., osdmap)
... Sage Weil
05:00 PM Bug #22063 (Duplicate): "RadosModel.h: 1703: FAILED assert(!version || comp->get_version64() == v...
David Zafman
04:59 PM Bug #22064 (Duplicate): "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi
David Zafman
11:03 AM Backport #23024 (Resolved): luminous: thrash-eio + bluestore (hangs with unfound objects or read_...
https://github.com/ceph/ceph/pull/20495 Nathan Cutler
12:07 AM Bug #21218 (Pending Backport): thrash-eio + bluestore (hangs with unfound objects or read_log_and...
Sage Weil

02/15/2018

06:53 PM Bug #22952: Monitor stopped responding after awhile
Frank Li wrote:
> either 12.2.2 + the patch or 12.2.3 RC + the patch would be good, whichever is more convenient to ...
Frank Li
05:09 PM Bug #22996 (Fix Under Review): Snapset inconsistency is no longer detected
David Zafman
04:13 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
I've got a cluster here where this issue is 100% reproducible when trying to delete snapshots. Let me know if we can ... Paul Emmerich
04:07 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
I'm also seeing this on 12.2.2. The crashing OSD has some bad PG which crashes it on startup. I first assumed the dis... Paul Emmerich
03:47 PM Backport #21871 (In Progress): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Nathan Cutler
03:45 PM Backport #21871 (Need More Info): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Somewhat non-trivial; @Kefu, could you take a look? Nathan Cutler
03:40 PM Backport #21871 (In Progress): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Nathan Cutler
03:39 PM Backport #21872 (Resolved): jewel: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Nathan Cutler
07:34 AM Support #23005 (New): Implement rados for Python library with some problem
Hi all,
This is my first time here.
I use the ceph rados library to implement customized Python code, and ...
Chen BO-YU

02/14/2018

10:02 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
I'm seeing this on Luminous. Some kRBD clients are sending requests of death killing the active monitor.
No special ...
Paul Emmerich
08:30 PM Bug #22462: mon: unknown message type 1537 in luminous->mimic upgrade tests
@Kefu, could you please take a look? Yuri Weinstein
05:48 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
ok, I'll wait for 12.2.4 or a 12.2.3 + the patch then. Frank Li
09:10 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Frank Li wrote:
> just curious, I saw this patch got merged to the master branch and has the target version of 12.2....
Nathan Cutler
06:51 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Just curious: I saw this patch got merged to the master branch and has the target version of 12.2.3; does that mean i... Frank Li
06:50 AM Bug #22952: Monitor stopped responding after awhile
either 12.2.2 + the patch or 12.2.3 RC + the patch would be good, whichever is more convenient to build. Frank Li
06:05 AM Bug #22996: Snapset inconsistency is no longer detected
We also need this fix to include tests that happen in the QA suite to prevent a future regression! :)
(Presumably th...
Greg Farnum
03:39 AM Bug #22996: Snapset inconsistency is no longer detected
David Zafman
03:37 AM Bug #22996 (Resolved): Snapset inconsistency is no longer detected

The fix for #20243 required additional handling of snapset inconsistency. The Object info and snapset aren't part ...
David Zafman

02/13/2018

07:53 PM Bug #22994 (New): rados bench doesn't use --max-objects
It would be useful for testing OSD caching behavior if rados bench respected the --max-objects parameter. It seems t... Ben England
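The invocation being attempted is along these lines (pool name and count are illustrative):
# rados -p testpool bench 60 write --max-objects 100
As reported, the cap appears to apply only to load-gen, so bench keeps creating objects for the full duration.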
07:30 PM Bug #22992: mon: add RAM usage (including avail) to HealthMonitor::check_member_health?
Turned out it was just the monitor being thrashed (didn't realize we were doing that in kcephfs!): #22993
Still, m...
Patrick Donnelly
06:43 PM Bug #22992 (New): mon: add RAM usage (including avail) to HealthMonitor::check_member_health?
I'm looking into several MON_DOWN failures from
http://pulpito.ceph.com/pdonnell-2018-02-13_17:49:41-kcephfs-wip-p...
Patrick Donnelly
06:12 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
https://github.com/ceph/ceph/pull/20410 David Zafman
04:04 AM Bug #21218 (Fix Under Review): thrash-eio + bluestore (hangs with unfound objects or read_log_and...
David Zafman
12:27 PM Bug #22063: "RadosModel.h: 1703: FAILED assert(!version || comp->get_version64() == version)" inr...
Another jewel run with this bug:
* http://qa-proxy.ceph.com/teuthology/smithfarm-2018-02-06_21:07:15-rados-wip-jew...
Nathan Cutler
06:52 AM Bug #22952: Monitor stopped responding after awhile
Kefu Chai wrote:
> > I reproduced the issue in a separate cluster
>
> could you share the steps to reproduce this...
Frank Li

02/12/2018

10:35 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)

This assert can only happen in the following two cases:
osd debug verify missing on start = true. Used in t...
David Zafman
10:07 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
For kefu's run above,... Sage Weil
03:07 AM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
thrash-eio + bluestore
/a/kchai-2018-02-11_04:16:47-rados-wip-kefu-testing-2018-02-11-0959-distro-basic-mira/2181825...
Kefu Chai
10:05 AM Bug #22354 (Fix Under Review): v12.2.2 unable to create bluestore osd using ceph-disk
https://github.com/ceph/ceph/pull/20400
Kefu Chai
09:52 AM Bug #22445: ceph osd metadata reports wrong "back_iface"
John Spray wrote:
> Hmm, this could well be the first time anyone's really tested the IPv6 path here.
https://git...
cory gu
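For reference, the fields in question can be dumped with (osd id illustrative):
# ceph osd metadata 0 | grep -E '(front|back)_iface'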
09:27 AM Backport #22942 (In Progress): luminous: ceph osd force-create-pg cause all ceph-mon to crash and...
Nathan Cutler
08:57 AM Bug #22952: Monitor stopped responding after awhile
> I reproduced the issue in a separate cluster
Could you share the steps to reproduce this issue, so I can try it ...
Kefu Chai
05:58 AM Bug #22949 (Rejected): ceph_test_admin_socket_output --all times out
Kefu Chai
05:57 AM Bug #22949: ceph_test_admin_socket_output --all times out
Thanks Brad. My bad, I thought the bug was in master also. Closing this ticket, as the related PR is not yet merged. Kefu Chai

02/10/2018

08:50 AM Bug #22949: ceph_test_admin_socket_output --all times out
Brad Hubbard
08:39 AM Bug #22949: ceph_test_admin_socket_output --all times out
This is not a problem with the test (although it highlights a deficiency with error reporting which I'll submit a PR ... Brad Hubbard
02:32 AM Bug #22882 (In Progress): Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
I finally realized that the op throttler *does* drop the global rwlock while waiting for throttle, so it at least doe... Greg Farnum

02/09/2018

10:08 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Just FYI: using this new patch, the leader ceph-mon will hang once it is up and any kind of OSD command is run, like... Frank Li
10:06 PM Bug #22952: Monitor stopped responding after awhile
Frank Li wrote:
> Frank Li wrote:
> > I reproduced the issue in a separate cluster, it seems that whichever ceph-mo...
Frank Li
08:40 PM Bug #22952: Monitor stopped responding after awhile
Frank Li wrote:
> I reproduced the issue in a separate cluster, it seems that whichever ceph-mon became the leader w...
Frank Li
08:35 PM Bug #22952: Monitor stopped responding after awhile
I reproduced the issue in a separate cluster; it seems that whichever ceph-mon became the leader would be stuck, as I ... Frank Li
07:50 PM Feature #22973 (Duplicate): log lines when hitting "pg overdose protection"
You're right that it's bad! This will be fixed in the next luminous release after a belated backport finally happened... Greg Farnum
02:15 PM Feature #22973 (Duplicate): log lines when hitting "pg overdose protection"
After upgrading to Luminous we ran into situation where 10% of our pgs remained unavailable, stuck in "activating" st... Dan Stoner
04:24 PM Bug #22300 (Rejected): ceph osd reweightn command seems to change weight value
The parameter of reweightn is an array of fixed-point integers, and the integers are int(weight * 0x10000), where weig... Kefu Chai
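A worked example of that encoding (osd ids and weights are illustrative): a desired weight of 0.5 is passed as int(0.5 * 0x10000) = 32768, and 1.0 as 65536:
# ceph osd reweightn '{"0": 32768, "1": 65536}'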
02:20 PM Feature #22974 (Resolved): documentation - pg state table missing "activating" state
"activating" is not listed in the pg state table:
http://docs.ceph.com/docs/master/rados/operations/pg-states/
...
Dan Stoner
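In the meantime, PGs currently in that state can be listed with something like:
# ceph pg dump pgs_brief | grep activating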
06:41 AM Bug #22949: ceph_test_admin_socket_output --all times out
Sure mate, added a patch to get better debugging and will test as soon as it's built. Brad Hubbard
12:24 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Oh, and I had the LingerOp and Op conflated in my head a bit when looking at that before, but they are different.
...
Greg Farnum
12:03 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Jason, how did you establish the number of in-flight ops? I wonder if maybe it *did* have them but they weren't able ... Greg Farnum
12:02 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Okay, so presumably on resend you shouldn't need to grab op budget again, since it's already budgeted, right?
And ...
Greg Farnum

02/08/2018

02:37 PM Bug #22949: ceph_test_admin_socket_output --all times out
Brad, I am not able to reproduce this issue. Could you help take a look? Kefu Chai
02:25 AM Bug #20086 (Resolved): LibRadosLockECPP.LockSharedDurPP gets EEXIST
Kefu Chai
02:24 AM Bug #22440 (Resolved): New pgs per osd hard limit can cause peering issues on existing clusters
@Nick, if you think this issue deserves a different fix, please feel free to reopen this ticket Kefu Chai
12:51 AM Bug #22848: Pull the cable, 5 mins later put the cable back; pg stuck a long time until resta...
Hi Josh Durgin,
1. They are both fibre-optic cables in our network card.
2. Log files can't be found yet, due to at...
Yong Wang

02/07/2018

11:09 PM Bug #22220: osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at dwarf2out....
Fixed by gcc-7.3.1-2.fc26 gcc-7.3.1-2.fc27 in fc27 Brad Hubbard
10:49 PM Bug #22440: New pgs per osd hard limit can cause peering issues on existing clusters
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20204
merged
Yuri Weinstein
09:44 PM Bug #22848: Pull the cable, 5 mins later put the cable back; pg stuck a long time until resta...
Which cable are you pulling? Do you have logs from the monitors and osds? The default failure detection timeouts can ... Josh Durgin
09:40 PM Bug #22916 (Duplicate): OSD crashing in peering
Josh Durgin
09:40 PM Bug #21287 (Duplicate): 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->i...
Josh Durgin
03:52 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
see https://github.com/ceph/ceph/pull/16675 Chang Liu
02:37 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
We hit this bug too in an EC pool 2+1; I found one peer did not receive one piece of an op message sent from the primary osd, ... lingjie kong
06:12 PM Bug #22952: Monitor stopped responding after awhile
Here is where the first mon server is stuck; running mon_status hangs:
[root@dl1-kaf101 frli]# ceph --admin-daemon /v...
Frank Li
06:06 PM Bug #22952 (Duplicate): Monitor stopped responding after awhile
After a crash of ceph-mon in 12.2.2 and using a private build provided by ceph developers, the ceph-mon would come up... Frank Li
06:06 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
https://tracker.ceph.com/issues/22952
Ticket opened for the ceph-mon not-responding issue.
Frank Li
06:02 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
I'll open a separate ticket to track the monitor not-responding issue. The fix for the force-create-pg issue is good. Frank Li
06:01 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Kefu Chai wrote:
> [...]
>
>
> the cluster formed a quorum of [0,1,2,3,4] since 18:02:21. and it was not in pro...
Frank Li
05:58 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Kefu Chai wrote:
> [...]
>
> was any osd up when you were testing?
Yes, but they were in booting state, all of...
Frank Li
06:56 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
... Kefu Chai
06:12 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
... Kefu Chai
04:05 PM Bug #22746 (Resolved): osd/common: ceph-osd process is terminated by the logratote task
Kefu Chai
03:33 PM Bug #22949 (Rejected): ceph_test_admin_socket_output --all times out
http://pulpito.ceph.com/kchai-2018-02-07_01:22:25-rados-wip-kefu-testing-2018-02-06-1514-distro-basic-mira/2161301/ Kefu Chai
05:50 AM Backport #22942 (Resolved): luminous: ceph osd force-create-pg cause all ceph-mon to crash and un...
https://github.com/ceph/ceph/pull/20399 Nathan Cutler
05:01 AM Backport #22934 (Resolved): luminous: filestore journal replay does not guard omap operations
https://github.com/ceph/ceph/pull/21547 Nathan Cutler
12:54 AM Backport #22866 (In Progress): jewel: ceph osd df json output validation reported invalid numbers...
https://github.com/ceph/ceph/pull/20344 Prashant D

02/06/2018

08:01 PM Bug #22350 (Resolved): nearfull OSD count in 'ceph -w'
Sage Weil
07:49 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
So, is there anything I can do to help recover the cluster? Frank Li
06:50 AM Bug #22847 (Pending Backport): ceph osd force-create-pg cause all ceph-mon to crash and unable to...
Kefu Chai
01:23 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Please see the attached logs for when the monitor was started, and then later got into the stuck mode.
I just replaced t...
Frank Li
04:54 PM Bug #22920: filestore journal replay does not guard omap operations
Lowering the priority since in practice we don't clone objects with omap on them. Sage Weil
04:53 PM Bug #22920 (Pending Backport): filestore journal replay does not guard omap operations
Sage Weil
04:07 PM Bug #22656: scrub mismatch on bytes (cache pools)
Aah, just popped up on luminous: http://pulpito.ceph.com/yuriw-2018-02-05_23:07:16-rados-wip-yuri-testing-2018-02-05-... Sage Weil
02:24 PM Bug #20924: osd: leaked Session on osd.7
/a/yuriw-2018-02-02_20:31:37-rados-wip_yuri_master_2.2.18-distro-basic-smithi/2143177 Sage Weil

02/05/2018

09:06 PM Feature #4305: CRUSH: it should be possible use ssd as primary and hdd for replicas but still mak...
Assuming @Patrick meant "RADOS" and not "rados-java" Nathan Cutler
08:58 PM Bug #21977: null map from OSDService::get_map in advance_pg
Seems to be persisting; see
http://qa-proxy.ceph.com/teuthology/teuthology-2018-02-05_04:23:02-upgrade:jewel-x-lumino...
Yuri Weinstein
08:01 PM Feature #3586 (Resolved): CRUSH: separate library
Patrick Donnelly
07:53 PM Feature #3764: osd: async replicas
Patrick Donnelly
07:33 PM Feature #11046 (Resolved): osd: rados io hints improvements
PR merged. Patrick Donnelly
03:17 PM Bug #22920: filestore journal replay does not guard omap operations
https://github.com/ceph/ceph/pull/20279 Sage Weil
03:16 PM Bug #22920 (Resolved): filestore journal replay does not guard omap operations
omap operations are replayed without checking the guards, which means that omap data can leak between objects that ar... Sage Weil
12:05 PM Bug #20924: osd: leaked Session on osd.7
/a/yuriw-2018-02-02_20:31:37-rados-wip_yuri_master_2.2.18-distro-basic-smithi/2143177/remote/smithi111/log/valgrind/o... Kefu Chai
09:37 AM Support #22917 (New): mon keeps on crashing ( 12.2.2 )
mon keeps on crashing ( 0> 2018-02-05 00:22:49.915541 7f6d0a781700 -1 *** Caught signal (Aborted) **
in thread 7f6d...
yair mackenzi
08:49 AM Bug #22916 (Duplicate): OSD crashing in peering
A Bluestore OSD crashed with a stacktrace:... Artemy Kapitula
03:52 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
> Should I be updating the ceph-osd to the same patched version ??
No need to update ceph-osd.
> but very soon,...
Kefu Chai
01:41 AM Bug #22668 (Resolved): osd/ExtentCache.h: 371: FAILED assert(tid == 0)
Kefu Chai

02/04/2018

07:29 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Also, while the monitors came up and formed a quorum, very soon they would all stop responding again, and then I fi... Frank Li

02/03/2018

09:39 PM Backport #21239 (In Progress): jewel: test_health_warnings.sh can fail
Nathan Cutler
07:01 PM Backport #22450 (Resolved): luminous: Visibility for snap trim queue length
Nathan Cutler
06:37 PM Bug #22409 (Resolved): ceph_objectstore_tool: no flush before collection_empty() calls; ObjectSto...
Nathan Cutler
06:37 PM Backport #22707 (Resolved): luminous: ceph_objectstore_tool: no flush before collection_empty() c...
Nathan Cutler
06:36 PM Bug #21147 (Resolved): Manager daemon x is unresponsive. No standby daemons available
Nathan Cutler
06:35 PM Backport #22399 (Resolved): luminous: Manager daemon x is unresponsive. No standby daemons available
Nathan Cutler
07:18 AM Backport #22906 (Rejected): jewel: bluestore: New OSD - Caught signal - bstore_kv_sync (throttle ...
Nathan Cutler
07:17 AM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
Adding jewel backport on the theory that (1) Jenkins CI is using modern glibc/kernel to run make check on jewel, brea... Nathan Cutler
12:45 AM Backport #22389 (Resolved): luminous: ceph-objectstore-tool: Add option "dump-import" to examine ...
David Zafman
12:43 AM Bug #22837 (In Progress): discover_all_missing() not always called during activating
Part of https://github.com/ceph/ceph/pull/20220 David Zafman
12:41 AM Bug #18162 (Resolved): osd/ReplicatedPG.cc: recover_replicas: object added to missing set for bac...
David Zafman
12:40 AM Backport #22013 (Resolved): jewel: osd/ReplicatedPG.cc: recover_replicas: object added to missing...
David Zafman

02/02/2018

11:08 PM Backport #22707: luminous: ceph_objectstore_tool: no flush before collection_empty() calls; Objec...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/19967
merged
Yuri Weinstein
11:01 PM Backport #22389: luminous: ceph-objectstore-tool: Add option "dump-import" to examine an export
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/19487
merged
Yuri Weinstein
11:00 PM Backport #22399: luminous: Manager daemon x is unresponsive. No standby daemons available
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/19501
merged
Yuri Weinstein
09:15 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Frank Li wrote:
> I've updated all the ceph-mon with the RPMs from the patch repo, they came up fine, and I've resta...
Frank Li
09:14 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
I've updated all the ceph-mon daemons with the RPMs from the patch repo; they came up fine, and I've restarted the OSDs, but ... Frank Li
08:29 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Just for future operational reference, is there any way to revert the monitor map to a previous state in the case of ... Frank Li
06:22 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Please note the crash happened on the monitors, not the OSDs; the OSDs all stayed up, but all the monitors crashed. Frank Li
06:21 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
-4> 2018-01-31 22:47:22.942381 7fc641d0b700 1 -- 10.102.52.37:6789/0 <== mon.0 10.102.52.37:6789/0 0 ==== log(1 ... Frank Li
06:09 PM Bug #22847 (Fix Under Review): ceph osd force-create-pg cause all ceph-mon to crash and unable to...
https://github.com/ceph/ceph/pull/20267 Sage Weil
05:46 PM Bug #22847 (Need More Info): ceph osd force-create-pg cause all ceph-mon to crash and unable to c...
Can you attach the entire osd log for the crashed osd? (In particular, we need to see what assertion failed.) Thanks! Sage Weil
07:32 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")

http://pulpito.ceph.com/dzafman-2018-02-01_09:46:36-rados-wip-zafman-testing-distro-basic-smithi/2138315
I think...
David Zafman
07:23 PM Bug #22834 (Resolved): Primary ends up in peer_info which isn't supposed to be there
David Zafman
09:48 AM Bug #22257 (Resolved): mon: mgrmaps not trimmed
Nathan Cutler
09:48 AM Backport #22258 (Resolved): mon: mgrmaps not trimmed
Nathan Cutler
09:47 AM Backport #22402 (Resolved): luminous: osd: replica read can trigger cache promotion
Nathan Cutler
08:05 AM Backport #22807 (Resolved): luminous: "osd pool stats" shows recovery information bugly
Nathan Cutler
07:54 AM Bug #22715 (Resolved): log entries weirdly zeroed out after 'osd pg-temp' command
Nathan Cutler
07:54 AM Backport #22744 (Resolved): luminous: log entries weirdly zeroed out after 'osd pg-temp' command
Nathan Cutler
05:46 AM Documentation #22843: [doc][luminous] the configuration guide still contains osd_op_threads and d...
For downstream Red Hat products, you should use the Red Hat bugzilla to report bugs. This is the upstream bug tracker... Nathan Cutler
05:15 AM Backport #22013 (In Progress): jewel: osd/ReplicatedPG.cc: recover_replicas: object added to miss...
Nathan Cutler
12:17 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
When I saw the test running for 4 hours my first thought was that the cluster was unhealthy -- but all OSDs were up a... Jason Dillaman