Activity

From 02/06/2018 to 03/07/2018

03/07/2018

11:26 PM Bug #23270 (New): failed mutex assert in PipeConnection::try_get_pipe() (via OSD::do_command())
... Sage Weil
09:29 PM Bug #23269 (New): Early use of clog in OSD startup crashes OSD

This crash occurred because log_weirdness() called osd->clog->error(), probably from init() -> load_pgs() -> read_...
David Zafman
05:13 PM Bug #23128: invalid values in ceph.conf do not issue visible warnings
Josh Durgin wrote:
> This is being improved with a centralized configuration stored in the monitors in mimic.
I...
Joshua Schmid
05:01 PM Bug #23267 (Resolved): scrub errors not cleared on replicas can cause inconsistent pg state when ...

The PG_STATE_INCONSISTENT flag is set based on num_scrub_errors. A pg query can show after scrub inconsistencies r...
David Zafman
12:39 PM Bug #22092 (Resolved): ceph-kvstore-tool's store-crc command does not save result to the file as ...
Chang Liu
11:48 AM Bug #23258: OSDs keep crashing.
Additional info: We were running on Kraken until last week, then upgraded to 12.2.3, where the problems started and u... Jan Marquardt
11:44 AM Bug #23258 (New): OSDs keep crashing.
At least two OSDs (#11 and #20) on two different hosts in our cluster keep crashing, which prevents our cluster from get... Jan Marquardt
10:33 AM Backport #23256 (In Progress): luminous: bluestore: should recalc_allocated when decoding bluefs_...
https://github.com/ceph/ceph/pull/20771 Kefu Chai
10:32 AM Backport #23256 (Resolved): luminous: bluestore: should recalc_allocated when decoding bluefs_fno...
https://github.com/ceph/ceph/pull/20771 Kefu Chai
10:30 AM Bug #23212 (Pending Backport): bluestore: should recalc_allocated when decoding bluefs_fnode_t
Kefu Chai
05:01 AM Feature #23242: ceph-objectstore-tool command to trim the pg log
The assert(num_unsent <= log_queue.size()) probably doesn't relate directly to this feature. The log_weirdness() f... David Zafman
01:02 AM Feature #23242: ceph-objectstore-tool command to trim the pg log

From PG::log_weirdness():
2018-03-06 16:18:57.413 7f0a593b9dc0 -1 log_channel(cluster) log [ERR] : 1.0 log bound...
David Zafman
12:26 AM Feature #23242 (In Progress): ceph-objectstore-tool command to trim the pg log

When testing the log trimming code on master, the OSD crashes like this....
David Zafman
02:34 AM Feature #23236 (Rejected): should allow osd to dump slow ops
Kefu Chai

03/06/2018

07:25 PM Feature #23236: should allow osd to dump slow ops
Oh yep, that'll do it. So I'm a bit confused what this ticket is supposed to mean. Greg Farnum
09:55 AM Feature #23236: should allow osd to dump slow ops
Isn't this what dump_blocked_ops is for? See also https://tracker.ceph.com/issues/23205 John Spray
04:23 AM Feature #23236: should allow osd to dump slow ops
I guess this is saying we don’t have a slow-only output command? dump_ops_in_flight et al certainly will print them o... Greg Farnum
04:11 AM Feature #23236 (Rejected): should allow osd to dump slow ops
After f4b74125e44fe78154fb377fa06fc08b3325859d, we have no way to print out the slow ops of OSDs. Only a summary is o... Kefu Chai
03:01 PM Bug #23145 (Need More Info): OSD crashes during recovery of EC pg
Radoslaw Zarzynski
02:29 PM Bug #23145: OSD crashes during recovery of EC pg
@Peter:
Is there a chance to get a log with both OSD and BlueStore debug levels turned up to 20? At the moment I can't...
Radoslaw Zarzynski
01:51 PM Feature #23242 (Resolved): ceph-objectstore-tool command to trim the pg log
ceph-objectstore-tool command to trim the pg log
The motivation for this feature is to have a command to trim the pg log with...
Vikhyat Umrao
12:54 PM Bug #23200 (Fix Under Review): invalid JSON returned when querying pool parameters
https://github.com/ceph/ceph/pull/20745 Chang Liu
06:07 AM Bug #23233 (Duplicate): The randomness of the hash function causes objects to be distributed unevenly across...
Nathan Cutler
02:41 AM Bug #23233: The randomness of the hash function causes objects to be distributed unevenly across...
Sorry, this report is incomplete. Please ignore it. junwei liao
02:34 AM Bug #23233 (Duplicate): The randomness of the hash function causes objects to be distributed unevenly across...
junwei liao
05:30 AM Backport #23077 (New): luminous: mon: ops get stuck in "resend forwarded message to leader"
These are both done and the backport can proceed. :) Greg Farnum
03:03 AM Bug #23235 (New): The randomness of the hash function causes objects to be distributed unevenly across...
The randomness of the ceph_str_hash_rjenkins hash function causes objects to be distributed unevenly across PGs. The result... junwei liao
12:26 AM Bug #20924: osd: leaked Session on osd.7
/a/kchai-2018-03-05_17:31:09-rados-wip-kefu-testing-2018-03-05-2238-distro-basic-smithi/2252897 Kefu Chai

03/05/2018

07:12 PM Bug #23228 (Closed): scrub mismatch on objects
... Sage Weil
06:27 PM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
saw this again,... Sage Weil
04:45 PM Bug #23215: config.cc: ~/.ceph/$cluster.conf is passed unexpanded to fopen()
(I think this is rbd, right?) John Spray
06:01 AM Bug #23215 (Resolved): config.cc: ~/.ceph/$cluster.conf is passed unexpanded to fopen()
parse_file() in "src/dmclock/sim/src/ConfUtils.cc" receives a filename without the tilde being expanded to correspond... Rishabh Dave
09:34 AM Backport #23174 (In Progress): luminous: SRV resolution fails to lookup AAAA records
https://github.com/ceph/ceph/pull/20710 Prashant D
03:37 AM Bug #23212: bluestore: should recalc_allocated when decoding bluefs_fnode_t
https://github.com/ceph/ceph/pull/20701 Kefu Chai
03:35 AM Bug #23212 (Resolved): bluestore: should recalc_allocated when decoding bluefs_fnode_t
... Kefu Chai
03:20 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
宏伟 唐 wrote:
> 宏伟 唐 wrote:
> > Mykola Golub wrote:
> > > > There are no logs indicating osd crash and the outputs o...
宏伟 唐
02:36 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
宏伟 唐 wrote:
> Mykola Golub wrote:
> > > There are no logs indicating osd crash and the outputs of 'ceph daemon osd....
宏伟 唐

03/04/2018

02:33 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Mykola Golub wrote:
> > There are no logs indicating osd crash and the outputs of 'ceph daemon osd.x log dump' are a...
宏伟 唐

03/02/2018

10:40 PM Bug #18165 (Resolved): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_target...
David Zafman
10:23 PM Bug #23204 (Duplicate): missing primary copy of object in mixed luminous<->master cluster with bl...
The dead jobs here failed due to this:
http://pulpito.ceph.com/yuriw-2018-03-01_22:45:38-upgrade:luminous-x-wip-yu...
Josh Durgin
09:21 PM Bug #22050: ERROR type entries of pglog do not update min_last_complete_ondisk, potentially ballo...
This seems to be biting rgw's usage pools when rgw-admin usage trim occurs in pgs with little other activity. Josh Durgin
04:18 PM Bug #23200 (Resolved): invalid JSON returned when querying pool parameters
When requesting JSON-formatted results when querying pool
parameters, the list that comes back is not valid JSON....
Wyllys Ingersoll
01:56 PM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Both the image HEAD and snapshot "snap" show a size of 10GB, so if your exported sizes are different, the export must... Jason Dillaman
09:49 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
> There are no logs indicating osd crash and the outputs of 'ceph daemon osd.x log dump' are all empty ({}).
The m...
Mykola Golub
08:26 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Jason Dillaman wrote:
> Can you please run "rados -p <pool name> listomapvals rbd_header.<image id>" and provide the...
宏伟 唐
12:06 PM Bug #23194 (Rejected): librados client is sending bad omap value just before program exits
Thanks Jason. You were absolutely right -- the omap get/put at exit is being driven by ganesha. I had missed that bef... Jeff Layton
04:37 AM Bug #23130: No error is shown when "osd_mon_report_interval_min" value is greater than "osd_mon_...
Jewel is scheduled to reach End of Life when Mimic is released (around June 2018). It's possible this issue will not ... Nathan Cutler

03/01/2018

11:46 PM Bug #23195 (Resolved): Read operations segfaulting multiple OSDs
I'm seeing some OSDs crashing at the same time with (mostly) the same error message related to reading an erasure c... Paul Emmerich
11:14 PM Bug #23194: librados client is sending bad omap value just before program exits
... there was an "omap get" right before the store, and the values stored were the (truncated) values that were just r... Jason Dillaman
10:38 PM Bug #23194: librados client is sending bad omap value just before program exits
rados_kv_get does look hinky, but I don't think we're calling into it here. We're basically doing a rados_kv_put into... Jeff Layton
09:53 PM Bug #23194: librados client is sending bad omap value just before program exits
I don't know what nfs-ganesha code to look at, but this [1] looks very suspect to me since you are returning a pointe... Jason Dillaman
09:43 PM Bug #23194: librados client is sending bad omap value just before program exits
Frame 201:
Object: rec-00000000:0000000000000017
Key: 6528071705456279553
Value: ::ffff:192.168.1.243-(37:Linux NF...
Jason Dillaman
09:16 PM Bug #23194: librados client is sending bad omap value just before program exits
I do have the ability to collect client logs within the container, and can turn up debugging in there if it'll help. Jeff Layton
08:56 PM Bug #23194: librados client is sending bad omap value just before program exits
Ahh, the object name is 29 bytes in this case, so maybe there is some confusion about lengths down in the code that i... Jeff Layton
08:49 PM Bug #23194 (Rejected): librados client is sending bad omap value just before program exits
I've been tracking down a problem in nfs-ganesha where an omap value in an object ends up truncated. It doesn't alway... Jeff Layton
11:47 AM Backport #23160 (Need More Info): luminous: Multiple asserts caused by DNE pgs left behind after ...
Waiting for code review for backport PR: https://github.com/ceph/ceph/pull/20668 Prashant D
11:16 AM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
... Kefu Chai
09:55 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Yoann Moulin wrote:
> David Zafman wrote:
> > Yoann Moulin wrote:
> > > Is it normal that all files in 11.5f_head hav...
Yoann Moulin
09:26 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Attached is the result of the dump for each OSD with the correct arguments. Yoann Moulin
09:08 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
David Zafman wrote:
> Yoann Moulin wrote:
> > Is it normal that all files in 11.5f_head have size=0 on each replica ...
Yoann Moulin
09:06 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Attached are the result of the dump for each OSD
and the extended attributes for the files on disk:...
Yoann Moulin
01:55 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Yoann Moulin wrote:
> Is it normal that all files in 11.5f_head have size=0 on each replica of the PG?
>
> [...]
...
David Zafman
01:11 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Can you dump the object with something like the following.... Brad Hubbard
09:33 AM Backport #23186 (In Progress): luminous: ceph tell mds.* <command> prints only one matching usage
https://github.com/ceph/ceph/pull/20664 Kefu Chai
09:26 AM Backport #23186 (Resolved): luminous: ceph tell mds.* <command> prints only one matching usage
https://github.com/ceph/ceph/pull/20664 Kefu Chai
09:25 AM Bug #23125 (Duplicate): Bad help text when 'ceph osd pool' is run
Kefu Chai
02:38 AM Bug #23125: Bad help text when 'ceph osd pool' is run
I am working on this issue. Thanks. guotao Yao
08:36 AM Feature #23045 (Fix Under Review): mon: warn on slow ops in OpTracker
Kefu Chai
07:56 AM Feature #23045: mon: warn on slow ops in OpTracker
https://github.com/ceph/ceph/pull/20660 Chang Liu
03:30 AM Bug #23124: Status of OSDs are not showing properly after disabling ceph.target and ceph-osd.target
As OSDs are brought up by the udev rules, regardless of the enabled status of "ceph.target" and "ceph-osd.target" hen... Debashis Mondal
03:14 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Can you please run "rados -p <pool name> listomapvals rbd_header.<image id>" and provide the output? You can determin... Jason Dillaman
01:57 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Jason Dillaman wrote:
> Yes, snapshots are read-only so the only thing I can think of is some sort of data corruptio...
宏伟 唐
12:21 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Mykola Golub wrote:
> It looks like your log entries are from in memory log dump. Did you osd crash (could be seen i...
宏伟 唐

02/28/2018

10:41 PM Bug #23132 (Triaged): some config values should be unsigned, to disallow negative values
Josh Durgin
10:37 PM Bug #23130 (Triaged): No error is shown when "osd_mon_report_interval_min" value is greater than...
This only affects jewel since the osd_mon_report_interval_max option is no longer used in luminous and later. Josh Durgin
10:35 PM Bug #23128: invalid values in ceph.conf do not issue visible warnings
This is reverting to the default value since 1.1 is not a valid value for the option.
This is being improved with...
Josh Durgin
10:34 PM Bug #23128 (Triaged): invalid values in ceph.conf do not issue visible warnings
Josh Durgin
10:31 PM Bug #23125 (Triaged): Bad help text when 'ceph osd pool' is run
Josh Durgin
10:30 PM Bug #23124 (Won't Fix): Status of OSDs are not showing properly after disabling ceph.target and c...
As Nathan explained, this isn't how the targets are meant to work. Josh Durgin
10:27 PM Bug #23145: OSD crashes during recovery of EC pg
Sage, is this a bluestore issue, or did we lose the rollback info somewhere?
It looks like it's getting ENOENT for...
Josh Durgin
11:23 AM Backport #23181 (In Progress): jewel: Can't repair corrupt object info due to bad oid on all repl...
Nathan Cutler
11:22 AM Backport #23181 (Resolved): jewel: Can't repair corrupt object info due to bad oid on all replicas
https://github.com/ceph/ceph/pull/20622 Nathan Cutler
11:20 AM Backport #23174 (Resolved): luminous: SRV resolution fails to lookup AAAA records
https://github.com/ceph/ceph/pull/20710 Nathan Cutler
11:19 AM Bug #20471 (Pending Backport): Can't repair corrupt object info due to bad oid on all replicas
Nathan Cutler
10:33 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Is it normal that all files in 11.5f_head have size=0 on each replica of the PG?... Yoann Moulin
08:17 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Here is the result of the 3 commands for each replica of the PG; osd.78 on iccluster020 is the one with the error:
...
Yoann Moulin
06:57 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
It looks like your log entries are from in memory log dump. Did you osd crash (could be seen in the log) or did you u... Mykola Golub
12:30 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Jason Dillaman wrote:
> Yes, snapshots are read-only so the only thing I can think of is some sort of data corruptio...
宏伟 唐
01:26 AM Bug #23078 (Pending Backport): SRV resolution fails to lookup AAAA records
Kefu Chai
01:19 AM Bug #22462 (Resolved): mon: unknown message type 1537 in luminous->mimic upgrade tests
Kefu Chai
01:13 AM Bug #22656: scrub mismatch on bytes (cache pools)
http://pulpito.ceph.com/kchai-2018-02-27_10:33:49-rados-wip-kefu-testing-2018-02-27-1348-distro-basic-mira/2232486/
...
Kefu Chai

02/27/2018

11:13 PM Bug #22902: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
This one looks like a similar failure: http://pulpito.ceph.com/nojha-2018-02-23_18:13:41-rados-wip-async-recovery-201... Neha Ojha
06:49 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
To summarize what I've figured out to reproduce this:
* both rbd client and mon are running 12.2.4, happened with ...
Paul Emmerich
05:46 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
Still happening on 12.2.4... Paul Emmerich
04:23 PM Bug #23124: Status of OSDs are not showing properly after disabling ceph.target and ceph-osd.target
The ceph.target and ceph-osd.target cannot be used this way. Assuming ceph-disk is being used, the OSDs are brought u... Nathan Cutler
04:08 PM Feature #22974 (Resolved): documentation - pg state table missing "activating" state
Nathan Cutler
04:08 PM Backport #23113 (Resolved): luminous: documentation - pg state table missing "activating" state
Nathan Cutler
12:55 PM Backport #23160 (Resolved): luminous: Multiple asserts caused by DNE pgs left behind after lots o...
https://github.com/ceph/ceph/pull/20668 Nathan Cutler
11:48 AM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Nathan Cutler
11:47 AM Backport #21871 (Rejected): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
@smithfarm I am sorry, but it turns out that this backport is not needed, because of http://tracker.ceph.com/issues/2... Nathan Cutler
08:58 AM Bug #23145 (Duplicate): OSD crashes during recovery of EC pg
I've got a cluster (running released debs of ceph 12.2.3) that started crashing on OSD startup a little bit ago. I di... Peter Woodman
06:12 AM Backport #23075 (In Progress): luminous: osd: objecter sends out of sync with pg epochs for proxi...
https://github.com/ceph/ceph/pull/20609 Prashant D

02/26/2018

09:18 PM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
... possible, but it actually does say it's replicated cache tiers in front of EC backends which should rule-out data... Jason Dillaman
09:01 PM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Couldn't this be related to #21639 (snapshots was not created/deleted against data pool)? The reported version here i... Mykola Golub
07:16 PM Bug #23119 (Need More Info): MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glan...
Yes, snapshots are read-only so the only thing I can think of is some sort of data corruption on the OSDs. Have you r... Jason Dillaman
08:23 PM Bug #22996 (Resolved): Snapset inconsistency is no longer detected
David Zafman
08:20 PM Backport #23054 (Resolved): luminous: Snapset inconsistency is no longer detected
https://github.com/ceph/ceph/pull/20501 David Zafman
08:04 PM Backport #23093 (Resolved): luminous: last-stat-seq returns 0 because osd stats are cleared
David Zafman
08:03 PM Bug #21833 (Pending Backport): Multiple asserts caused by DNE pgs left behind after lots of OSD r...
David Zafman
07:32 PM Feature #23087 (Duplicate): Add OSD metrics to keep track of per-client IO
We've discussed "rbd top" before (http://tracker.ceph.com/projects/ceph/wiki/CDM_07-DEC-2016, http://tracker.ceph.com... Greg Farnum
05:07 AM Bug #23132 (Triaged): some config values should be unsigned, to disallow negative values
Execution Steps:
-------------------
1. Set a negative value for the parameter "osd_heartbeat_interval" in ceph.conf
2....
Debashis Mondal
04:56 AM Backport #23114 (In Progress): luminous: can't delete object from pool when Ceph out of space
https://github.com/ceph/ceph/pull/20585 Prashant D
04:54 AM Bug #23130 (Triaged): No error is shown when "osd_mon_report_interval_min" value is greater than...
Execution Steps:
------------------
1. Set the "osd_mon_report_interval_min" value using CLI
# ceph daemon osd...
Debashis Mondal
04:37 AM Feature #23129: After creating a snapshot of a rados pool, when we try to roll back the pool it all...
rados -p testpool rollback myobject1 testpool-snap
[Note: only the specified object is rolled back from the snapshot]
Debashis Mondal
04:35 AM Feature #23129 (New): After creating a snapshot of a rados pool, when we try to roll back the pool ...
Execution Steps:
------------------
1. Creating a pool
# ceph osd pool create testpool 16 16
2. Add ...
Debashis Mondal
04:29 AM Bug #23128 (Triaged): invalid values in ceph.conf do not issue visible warnings
Execution Steps
-----------------
1. Change the setting of "mon osd down out interval" in ceph.conf as per below
...
Debashis Mondal
04:24 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Yes, a size-0 object is expected since all copies report '"size": 0'.
The discrepancy appears to be in the omap data...
Brad Hubbard
04:10 AM Bug #23125 (Duplicate): Bad help text when 'ceph osd pool' is run
Execution Steps :
-----------------
1. While executing the CLI for creating a snapshot of a pool
#ceph osd pool ...
Debashis Mondal
04:04 AM Bug #23124 (Won't Fix): Status of OSDs are not showing properly after disabling ceph.target and c...
Execution Steps:
----------------
1. # ceph osd tree [ceph is in running state]
2. # systemctl disab...
Debashis Mondal
03:49 AM Feature #23123 (New): use pwrite to emulate posix_fallocate
Less IO when using a plain file as the store for testing bluestore if posix_fallocate() is not available.
See ht...
Kefu Chai
03:24 AM Backport #23113 (In Progress): luminous: documentation - pg state table missing "activating" state
https://github.com/ceph/ceph/pull/20584 Prashant D

02/25/2018

08:31 AM Bug #23119 (Need More Info): MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glan...
Ceph Version: 12.2.2 Luminous Stable
Problem description:
We use ceph as the backend storage for OpenStack Glance...
宏伟 唐

02/24/2018

07:14 PM Bug #23117 (Fix Under Review): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has ...
In the following setup:
* 6 OSD hosts
* Each host with 32 disks = 32 OSDs
* Pool with 2048 PGs, EC, k=4, m=2, crus...
Oliver Freyermuth
05:54 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
https://github.com/ceph/ceph/pull/20571 David Zafman
05:54 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
Not sure if this needs a Jewel backport. David Zafman
11:22 AM Backport #23114 (Resolved): luminous: can't delete object from pool when Ceph out of space
https://github.com/ceph/ceph/pull/20585 Nathan Cutler
11:21 AM Backport #23113 (Resolved): luminous: documentation - pg state table missing "activating" state
https://github.com/ceph/ceph/pull/20584 Nathan Cutler
04:39 AM Feature #22974 (Pending Backport): documentation - pg state table missing "activating" state
https://github.com/ceph/ceph/pull/20504 Kefu Chai
04:35 AM Bug #23078: SRV resolution fails to lookup AAAA records
Kefu Chai
04:32 AM Bug #22952 (Duplicate): Monitor stopped responding after awhile
Great! I am marking this ticket as a "duplicate". Please reopen it if you think otherwise.
Happy Chinese New Year ...
Kefu Chai
04:20 AM Bug #22413 (Pending Backport): can't delete object from pool when Ceph out of space
Kefu Chai

02/23/2018

10:24 PM Feature #23096: mon: don't remove auth caps without a flag
We could throw an error instead, yeah. That is probably a wise forcing function. I think we still want the flag thoug... Greg Farnum
11:37 AM Feature #23096: mon: don't remove auth caps without a flag
Bit torn on this one: there is a security downside to changing this behaviour in-place -- any existing scripts that e... John Spray
01:08 AM Feature #23096 (New): mon: don't remove auth caps without a flag
With current syntax, something like... Greg Farnum
08:02 PM Bug #21833 (In Progress): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
David Zafman
02:03 AM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
I was working on this last week, but got distracted by other issues. I'm going to force this scenario and see about f... David Zafman
02:01 PM Backport #23103 (In Progress): luminous: v12.2.2 unable to create bluestore osd using ceph-disk
Nathan Cutler
01:50 PM Backport #23103 (Resolved): luminous: v12.2.2 unable to create bluestore osd using ceph-disk
https://github.com/ceph/ceph/pull/20563 Nathan Cutler
11:54 AM Bug #18165 (Pending Backport): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfil...
This should not have been marked Resolved when one of the backports was still open. Nathan Cutler
08:33 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Hello Brad,
Sorry, I was too hasty;
the rados get with the correct pool returns a file with size=0...
Yoann Moulin
03:42 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Does "rados -p disks ls" list the object? Can you find the actual storage for this object on the disks used for these... Brad Hubbard

02/22/2018

11:56 PM Backport #23093 (In Progress): luminous: last-stat-seq returns 0 because osd stats are cleared
David Zafman
11:43 PM Backport #23093: luminous: last-stat-seq returns 0 because osd stats are cleared
https://github.com/ceph/ceph/pull/20548 David Zafman
05:52 PM Backport #23093 (Resolved): luminous: last-stat-seq returns 0 because osd stats are cleared

I added an assert which crashes ceph-mgr because PGMap::apply_incremental() processes an osd_stat_t that is all zero...
David Zafman
11:40 PM Bug #22882 (Fix Under Review): Objecter deadlocked on op budget while holding rwlock in ms_handle...
https://github.com/ceph/ceph/pull/20519 Greg Farnum
09:40 PM Bug #22952: Monitor stopped responding after awhile
Thanks, with 12.2.3 + this patch, the cluster is now back to the HEALTH_OK state. Frank Li
06:37 PM Bug #22952: Monitor stopped responding after awhile
Kefu Chai wrote:
> Frank, sorry for the latency. I am just back from the holiday. I pushed 12.2.3 + https://github.c...
Frank Li
10:07 AM Bug #22952: Monitor stopped responding after awhile
Frank, sorry for the latency. I am just back from the holiday. I pushed 12.2.3 + https://github.com/ceph/ceph/pull/20... Kefu Chai
06:06 PM Bug #22662 (Resolved): ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Nathan Cutler
06:05 PM Backport #22866 (Resolved): jewel: ceph osd df json output validation reported invalid numbers (-...
Nathan Cutler
04:03 PM Bug #21121 (Resolved): test_health_warnings.sh can fail
Nathan Cutler
04:03 PM Backport #21239 (Resolved): jewel: test_health_warnings.sh can fail
Nathan Cutler
02:09 PM Backport #23077 (Need More Info): luminous: mon: ops get stuck in "resend forwarded message to le...
This backport has two master PRs:
* https://github.com/ceph/ceph/pull/20467
* https://github.com/ceph/ceph/pull/2...
Nathan Cutler
01:14 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Hello,
I'm also having this issue...
Yoann Moulin
12:54 PM Feature #23087 (Duplicate): Add OSD metrics to keep track of per-client IO
In our online clusters, there are times when some RBD images' size increase rapidly, which could fill up the whole cl... Xuehan Xu
11:10 AM Bug #22413 (Fix Under Review): can't delete object from pool when Ceph out of space
https://github.com/ceph/ceph/pull/20534 Kefu Chai
09:58 AM Bug #22354 (Pending Backport): v12.2.2 unable to create bluestore osd using ceph-disk
Kefu Chai
08:17 AM Bug #23078 (Fix Under Review): SRV resolution fails to lookup AAAA records
Kefu Chai
08:09 AM Bug #23078: SRV resolution fails to lookup AAAA records
In the meantime btw, a Round Robin IPv6 DNS record works just fine, something like:... Wido den Hollander
07:35 AM Bug #23078: SRV resolution fails to lookup AAAA records
Simon Leinen wrote:
> WANG Guoqin actually noted the lack of IPv6 support in "a comment on issue #14527":http://trac...
Wido den Hollander
06:29 AM Bug #22462 (Fix Under Review): mon: unknown message type 1537 in luminous->mimic upgrade tests
https://github.com/ceph/ceph/pull/20528 Kefu Chai
05:41 AM Bug #22462: mon: unknown message type 1537 in luminous->mimic upgrade tests
MMonHealth (MSG_MON_HEALTH=0x601 (1537)) was removed in https://github.com/ceph/ceph/commit/7b4a741fbda4dc817a003c694... Kefu Chai

02/21/2018

10:46 PM Feature #14527: Lookup monitors through DNS
WANG Guoqin wrote:
> The recent code doesn't support IPv6, apparently. Maybe we can choose among ns_t_a and ns_t_aaa...
Simon Leinen
10:44 PM Bug #23078: SRV resolution fails to lookup AAAA records
WANG Guoqin actually noted the lack of IPv6 support in "a comment on issue #14527":http://tracker.ceph.com/issues/145... Simon Leinen
10:26 PM Bug #23078 (Resolved): SRV resolution fails to lookup AAAA records
We have some IPv6 Rados clusters. So far we have been specifying the addresses of each cluster's three mons using li... Simon Leinen
09:56 PM Support #23005: Implement rados for Python library with some problem
Does this work without pyinstaller on your system? Josh Durgin
09:54 PM Bug #23029: osd does not handle eio on meta objects (e.g., osdmap)
We could at least fail more politely here even if we can't recover from it in the short term. Josh Durgin
09:50 PM Bug #23049: ceph Status shows only WARN when traffic to cluster fails
Can reproduce easily - thanks for the report.
2 bugs here - 1) the monitor is still enforcing the mon_osd_min_up_r...
Josh Durgin
09:46 PM Support #23050 (Closed): PG doesn't move to down state in replica pool
'stale' means there haven't been any reports from the primary in a while. Since there's no osd to report the status o... Josh Durgin
09:40 PM Bug #23051: PGs stuck in down state
Can you post the results of 'ceph pg $PGID query' for some of the down pgs? Josh Durgin
09:34 PM Bug #22994: rados bench doesn't use --max-objects
rados tool options are pretty confusing - the help text should make it clearer which options are for bench vs load-gen... Josh Durgin
09:27 PM Backport #23076 (In Progress): jewel: osd: objecter sends out of sync with pg epochs for proxied ops
Nathan Cutler
09:26 PM Backport #23076 (Resolved): jewel: osd: objecter sends out of sync with pg epochs for proxied ops
https://github.com/ceph/ceph/pull/20518 Nathan Cutler
09:26 PM Backport #23077 (Resolved): luminous: mon: ops get stuck in "resend forwarded message to leader"
https://github.com/ceph/ceph/pull/21016 Nathan Cutler
09:26 PM Backport #23075 (Resolved): luminous: osd: objecter sends out of sync with pg epochs for proxied ops
https://github.com/ceph/ceph/pull/20609 Nathan Cutler
07:48 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
Oh, second PR for the OSD beacons and PG create messages: https://github.com/ceph/ceph/pull/20517 Greg Farnum
04:35 PM Bug #22114 (Pending Backport): mon: ops get stuck in "resend forwarded message to leader"
Sage Weil
04:34 PM Bug #22123 (Pending Backport): osd: objecter sends out of sync with pg epochs for proxied ops
Sage Weil
01:06 PM Bug #19737: EAGAIN encountered during pg scrub (jewel)
@Josh - thanks
https://github.com/ceph/ceph/pull/20508
Nathan Cutler
12:49 AM Bug #23031: FAILED assert(!parent->get_log().get_missing().is_missing(soid))

osd.0 was the primary before it crashed, came back up, and crashed again as originally indicated in this bug. This is ...
David Zafman

02/20/2018

04:03 PM Backport #23054 (Resolved): luminous: Snapset inconsistency is no longer detected

The fix for #20243 required additional handling of snapset inconsistency. The Object info and snapset aren't part ...
David Zafman
12:26 PM Bug #23051 (New): PGs stuck in down state
Hello,
We see PGs stuck in down state even when the respective osds are started and recovered from the failure sc...
Nokia ceph-users
10:38 AM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
I can confirm this on 12.2.2. It makes data unavailable.
My output:...
Rafal Wadolowski
10:14 AM Support #23050: PG doesn't move to down state in replica pool
Please let me know what logs/info should be added, if any. Nokia ceph-users
10:13 AM Support #23050 (Closed): PG doesn't move to down state in replica pool
Hello,
Environment used - 3 node cluster
Replication - 3
#ceph osd pool ls detail
pool 16 'cdvr_ec' replica...
Nokia ceph-users
09:45 AM Backport #17445 (Resolved): jewel: list-snap cache tier missing promotion logic (was: rbd cli seg...
Nathan Cutler
09:43 AM Feature #15835 (Resolved): filestore: randomize split threshold
Nathan Cutler
09:42 AM Backport #22658 (Resolved): filestore: randomize split threshold
Nathan Cutler
09:35 AM Backport #22794 (Resolved): jewel: heartbeat peers need to be updated when a new OSD added into a...
Nathan Cutler
09:33 AM Bug #20705 (Resolved): repair_test fails due to race with osd start
Nathan Cutler
09:33 AM Backport #22818 (Resolved): jewel: repair_test fails due to race with osd start
Nathan Cutler
09:04 AM Backport #23024 (In Progress): luminous: thrash-eio + bluestore (hangs with unfound objects or re...
https://github.com/ceph/ceph/pull/20495 Prashant D
06:16 AM Backport #23024: luminous: thrash-eio + bluestore (hangs with unfound objects or read_log_and_mis...
I'm on it. Prashant D
08:55 AM Bug #23049: ceph Status shows only WARN when traffic to cluster fails
Please let me know what logs/info should be added, if any. Nokia ceph-users
08:54 AM Bug #23049 (New): ceph Status shows only WARN when traffic to cluster fails
Hello,
While using Kraken, I have seen the status change to ERR, but in Luminous we do not see the status of ceph ...
Nokia ceph-users
07:46 AM Bug #22996 (Pending Backport): Snapset inconsistency is no longer detected
https://github.com/ceph/ceph/pull/20450 Kefu Chai
05:30 AM Bug #19737: EAGAIN encountered during pg scrub (jewel)
Looked at the logs from http://pulpito.front.sepia.ceph.com/smithfarm-2018-02-06_21:07:15-rados-wip-jewel-backports-d... Josh Durgin

02/19/2018

10:59 PM Bug #18178 (Won't Fix): Unfound objects lost after OSD daemons restarted

Reasons this is being closed:
1. PG repair is moving to user mode so on the fly object repair probably won't use r...
David Zafman
09:58 PM Feature #23045: mon: warn on slow ops in OpTracker
I've assigned this to myself but I don't know when I can get to it, so if you want to work on this feel free to take it! Greg Farnum
09:56 PM Feature #23045 (Resolved): mon: warn on slow ops in OpTracker
The monitor has an OpTracker now, but it doesn't warn on slow ops the way the MDS or OSD do. We should enable that to... Greg Farnum
09:52 PM Bug #23030: osd: crash during recovery with assert(p != recovery_info.ss.clone_snap)and assert(re...
This snapshot assert looks like "Ceph Luminous - pg is down due to src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)... Greg Farnum
09:02 PM Feature #23044 (New): osd: use madvise with MADV_DONTDUMP to prevent cached data from being core ...
The idea here is to reduce the size of the core dumps but also to prevent sensitive data from being leaked. Patrick Donnelly
02:55 PM Bug #22123 (Fix Under Review): osd: objecter sends out of sync with pg epochs for proxied ops
https://github.com/ceph/ceph/pull/20484
I opted for the marginally more complex solution of cancelling multiple o...
Sage Weil

02/17/2018

02:20 AM Bug #23031 (New): FAILED assert(!parent->get_log().get_missing().is_missing(soid))
Using vstart to start 3 OSDs with -o filestore debug inject read err=1
Manually injectdataerr on all replicas of o...
David Zafman
12:37 AM Bug #23030 (Fix Under Review): osd: crash during recovery with assert(p != recovery_info.ss.clone...
I've got some OSDs in an 5/3 EC pool crashing during recovery. The crash happens simultaneously on 5 to 10 OSDs, some... Paul Emmerich
12:36 AM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
I looked at it briefly and the output doesn't make any sense to me, but I don't have a lot of context around what the... Greg Farnum

02/16/2018

11:49 PM Bug #22114 (Fix Under Review): mon: ops get stuck in "resend forwarded message to leader"
https://github.com/ceph/ceph/pull/20467 Greg Farnum
02:14 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
Greg Farnum wrote:
> Ummm, yep, that looks right to me at a quick glance! Can you submit a PR with that change? :)
...
hongpeng lu
02:04 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
Maybe not. You should check the code on github.com. hongpeng lu
01:22 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
hongpeng lu wrote:
> The messages cannot be forwarded appropriately; you must change the code like this.
> [...]
...
Oleg Glushak
01:17 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
The messages cannot be forwarded appropriately; you must change the code like this.... hongpeng lu
12:52 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
We have the same problem on all our Luminous clusters. Any news regarding a fix?
Most stuck messages in our case are o...
Oleg Glushak
10:35 PM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
@nathan This doesn't have a cache tier, so it would be a different issue. Maybe related to the upgrade? David Zafman
07:58 PM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
@David I guess this is a duplicate, too? Nathan Cutler
04:27 PM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
Seems reproducible; see
http://pulpito.ceph.com/teuthology-2018-02-16_01:15:03-upgrade:hammer-x-jewel-distro-basic-...
Yuri Weinstein
10:01 PM Bug #23029 (New): osd does not handle eio on meta objects (e.g., osdmap)
... Sage Weil
05:00 PM Bug #22063 (Duplicate): "RadosModel.h: 1703: FAILED assert(!version || comp->get_version64() == v...
David Zafman
04:59 PM Bug #22064 (Duplicate): "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi
David Zafman
11:03 AM Backport #23024 (Resolved): luminous: thrash-eio + bluestore (hangs with unfound objects or read_...
https://github.com/ceph/ceph/pull/20495 Nathan Cutler
12:07 AM Bug #21218 (Pending Backport): thrash-eio + bluestore (hangs with unfound objects or read_log_and...
Sage Weil

02/15/2018

06:53 PM Bug #22952: Monitor stopped responding after awhile
Frank Li wrote:
> Either 12.2.2 + the patch or 12.2.3 RC + the patch would be good, whichever is more convenient to ...
Frank Li
05:09 PM Bug #22996 (Fix Under Review): Snapset inconsistency is no longer detected
David Zafman
04:13 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
I've got a cluster here where this issue is 100% reproducible when trying to delete snapshots. Let me know if we can ... Paul Emmerich
04:07 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
I'm also seeing this on 12.2.2. The crashing OSD has some bad PG which crashes it on startup. I first assumed the dis... Paul Emmerich
03:47 PM Backport #21871 (In Progress): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Nathan Cutler
03:45 PM Backport #21871 (Need More Info): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
somewhat non-trivial, @Kefu could you take a look? Nathan Cutler
03:40 PM Backport #21871 (In Progress): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Nathan Cutler
03:39 PM Backport #21872 (Resolved): jewel: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Nathan Cutler
07:34 AM Support #23005 (New): Implement rados for Python library with some problem
Hi all,
This is my first time here.
I use the ceph rados library to implement custom Python code, and ...
Chen BO-YU

02/14/2018

10:02 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
I'm seeing this on Luminous. Some kRBD clients are sending requests of death killing the active monitor.
No special ...
Paul Emmerich
08:30 PM Bug #22462: mon: unknown message type 1537 in luminous->mimic upgrade tests
@Kefu, could you please take a look? Yuri Weinstein
05:48 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
OK, I'll wait for 12.2.4 or 12.2.3 + the patch then. Frank Li
09:10 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Frank Li wrote:
> just curious, I saw this patch got merged to the master branch and has the target version of 12.2....
Nathan Cutler
06:51 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Just curious, I saw this patch got merged to the master branch and has the target version of 12.2.3; does that mean i... Frank Li
06:50 AM Bug #22952: Monitor stopped responding after awhile
Either 12.2.2 + the patch or 12.2.3 RC + the patch would be good, whichever is more convenient to build. Frank Li
06:05 AM Bug #22996: Snapset inconsistency is no longer detected
We also need this fix to include tests that happen in the QA suite to prevent a future regression! :)
(Presumably th...
Greg Farnum
03:39 AM Bug #22996: Snapset inconsistency is no longer detected
David Zafman
03:37 AM Bug #22996 (Resolved): Snapset inconsistency is no longer detected

The fix for #20243 required additional handling of snapset inconsistency. The Object info and snapset aren't part ...
David Zafman

02/13/2018

07:53 PM Bug #22994 (New): rados bench doesn't use --max-objects
It would be useful for testing OSD caching behavior if rados bench respected the --max-objects parameter. It seems t... Ben England
07:30 PM Bug #22992: mon: add RAM usage (including avail) to HealthMonitor::check_member_health?
Turned out it was just the monitor being thrashed (didn't realize we were doing that in kcephfs!): #22993
Still, m...
Patrick Donnelly
06:43 PM Bug #22992 (New): mon: add RAM usage (including avail) to HealthMonitor::check_member_health?
I'm looking into several MON_DOWN failures from
http://pulpito.ceph.com/pdonnell-2018-02-13_17:49:41-kcephfs-wip-p...
Patrick Donnelly
06:12 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
https://github.com/ceph/ceph/pull/20410 David Zafman
04:04 AM Bug #21218 (Fix Under Review): thrash-eio + bluestore (hangs with unfound objects or read_log_and...
David Zafman
12:27 PM Bug #22063: "RadosModel.h: 1703: FAILED assert(!version || comp->get_version64() == version)" inr...
Another jewel run with this bug:
* http://qa-proxy.ceph.com/teuthology/smithfarm-2018-02-06_21:07:15-rados-wip-jew...
Nathan Cutler
06:52 AM Bug #22952: Monitor stopped responding after awhile
Kefu Chai wrote:
> > I reproduced the issue in a seperate cluster
>
> could you share the steps to reproduce this...
Frank Li

02/12/2018

10:35 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)

This assert can only happen in the following two cases:
osd debug verify missing on start = true. Used in t...
David Zafman
10:07 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
For kefu's run above,... Sage Weil
03:07 AM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
thrash-eio + bluestore
/a/kchai-2018-02-11_04:16:47-rados-wip-kefu-testing-2018-02-11-0959-distro-basic-mira/2181825...
Kefu Chai
10:05 AM Bug #22354 (Fix Under Review): v12.2.2 unable to create bluestore osd using ceph-disk
https://github.com/ceph/ceph/pull/20400
Kefu Chai
09:52 AM Bug #22445: ceph osd metadata reports wrong "back_iface"
John Spray wrote:
> Hmm, this could well be the first time anyone's really tested the IPv6 path here.
https://git...
cory gu
09:27 AM Backport #22942 (In Progress): luminous: ceph osd force-create-pg cause all ceph-mon to crash and...
Nathan Cutler
08:57 AM Bug #22952: Monitor stopped responding after awhile
> I reproduced the issue in a separate cluster
Could you share the steps to reproduce this issue, so I can try it ...
Kefu Chai
05:58 AM Bug #22949 (Rejected): ceph_test_admin_socket_output --all times out
Kefu Chai
05:57 AM Bug #22949: ceph_test_admin_socket_output --all times out
Thanks Brad. My bad, I thought the bug was in master also. Closing this ticket, as the related PR is not yet merged. Kefu Chai

02/10/2018

08:50 AM Bug #22949: ceph_test_admin_socket_output --all times out
Brad Hubbard
08:39 AM Bug #22949: ceph_test_admin_socket_output --all times out
This is not a problem with the test (although it highlights a deficiency with error reporting which I'll submit a PR ... Brad Hubbard
02:32 AM Bug #22882 (In Progress): Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
I finally realized that the op throttler *does* drop the global rwlock while waiting for throttle, so it at least doe... Greg Farnum

02/09/2018

10:08 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Just FYI, using this new patch, the leader ceph-mon will hang once it is up and any kind of OSD command is run, like... Frank Li
10:06 PM Bug #22952: Monitor stopped responding after awhile
Frank Li wrote:
> Frank Li wrote:
> > I reproduced the issue in a separate cluster, it seems that whichever ceph-mo...
Frank Li
08:40 PM Bug #22952: Monitor stopped responding after awhile
Frank Li wrote:
> I reproduced the issue in a separate cluster, it seems that whichever ceph-mon became the leader w...
Frank Li
08:35 PM Bug #22952: Monitor stopped responding after awhile
I reproduced the issue in a separate cluster; it seems that whichever ceph-mon became the leader will be stuck, as I ... Frank Li
07:50 PM Feature #22973 (Duplicate): log lines when hitting "pg overdose protection"
You're right that it's bad! This will be fixed in the next luminous release after a belated backport finally happened... Greg Farnum
02:15 PM Feature #22973 (Duplicate): log lines when hitting "pg overdose protection"
After upgrading to Luminous we ran into a situation where 10% of our pgs remained unavailable, stuck in "activating" st... Dan Stoner
04:24 PM Bug #22300 (Rejected): ceph osd reweightn command seems to change weight value
The parameter of reweightn is an array of fixed-point integers, and the integers are int(weight * 0x10000), where weig... Kefu Chai
02:20 PM Feature #22974 (Resolved): documentation - pg state table missing "activating" state
"activating" is not listed in the pg state table:
http://docs.ceph.com/docs/master/rados/operations/pg-states/
...
Dan Stoner
06:41 AM Bug #22949: ceph_test_admin_socket_output --all times out
Sure mate, added a patch to get better debugging and will test as soon as it's built. Brad Hubbard
12:24 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Oh, and I had the LingerOp and Op conflated in my head a bit when looking at that before, but they are different.
...
Greg Farnum
12:03 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Jason, how did you establish the number of in-flight ops? I wonder if maybe it *did* have them but they weren't able ... Greg Farnum
12:02 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Okay, so presumably on resend you shouldn't need to grab op budget again, since it's already budgeted, right?
And ...
Greg Farnum

02/08/2018

02:37 PM Bug #22949: ceph_test_admin_socket_output --all times out
Brad, I am not able to reproduce this issue. Could you help take a look? Kefu Chai
02:25 AM Bug #20086 (Resolved): LibRadosLockECPP.LockSharedDurPP gets EEXIST
Kefu Chai
02:24 AM Bug #22440 (Resolved): New pgs per osd hard limit can cause peering issues on existing clusters
@Nick, if you think this issue deserves a different fix, please feel free to reopen this ticket Kefu Chai
12:51 AM Bug #22848: Pull the cable, 5 mins later put the cable back: pg stuck a long time until resta...
Hi Josh Durgin,
1. They are both fibre-optic cables on our network card.
2. Log files can't be found yet, due to at...
Yong Wang

02/07/2018

11:09 PM Bug #22220: osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at dwarf2out....
Fixed by gcc-7.3.1-2.fc26 gcc-7.3.1-2.fc27 in fc27 Brad Hubbard
10:49 PM Bug #22440: New pgs per osd hard limit can cause peering issues on existing clusters
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20204
merged
Yuri Weinstein
09:44 PM Bug #22848: Pull the cable, 5 mins later put the cable back: pg stuck a long time until resta...
Which cable are you pulling? Do you have logs from the monitors and osds? The default failure detection timeouts can ... Josh Durgin
09:40 PM Bug #22916 (Duplicate): OSD crashing in peering
Josh Durgin
09:40 PM Bug #21287 (Duplicate): 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->i...
Josh Durgin
03:52 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
see https://github.com/ceph/ceph/pull/16675 Chang Liu
02:37 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
We hit this bug too in an EC pool 2+1; I found one peer did not receive one piece of an op message sent from the primary osd, ... lingjie kong
06:12 PM Bug #22952: Monitor stopped responding after awhile
Here is where the first mon server is stuck; running mon_status hangs:
[root@dl1-kaf101 frli]# ceph --admin-daemon /v...
Frank Li
06:06 PM Bug #22952 (Duplicate): Monitor stopped responding after awhile
After a crash of ceph-mon in 12.2.2 and using a private build provided by ceph developers, the ceph-mon would come up... Frank Li
06:06 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
https://tracker.ceph.com/issues/22952
Ticket opened for the ceph-mon not responding issue.
Frank Li
06:02 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
I'll open a separate ticket to track the monitor not responding issue. The fix for the force-create-pg issue is good. Frank Li
06:01 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Kefu Chai wrote:
> [...]
>
>
> the cluster formed a quorum of [0,1,2,3,4] since 18:02:21. and it was not in pro...
Frank Li
05:58 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Kefu Chai wrote:
> [...]
>
> was any osd up when you were testing?
Yes, but they were in Booting State, all of...
Frank Li
06:56 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
... Kefu Chai
06:12 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
... Kefu Chai
04:05 PM Bug #22746 (Resolved): osd/common: ceph-osd process is terminated by the logratote task
Kefu Chai
03:33 PM Bug #22949 (Rejected): ceph_test_admin_socket_output --all times out
http://pulpito.ceph.com/kchai-2018-02-07_01:22:25-rados-wip-kefu-testing-2018-02-06-1514-distro-basic-mira/2161301/ Kefu Chai
05:50 AM Backport #22942 (Resolved): luminous: ceph osd force-create-pg cause all ceph-mon to crash and un...
https://github.com/ceph/ceph/pull/20399 Nathan Cutler
05:01 AM Backport #22934 (Resolved): luminous: filestore journal replay does not guard omap operations
https://github.com/ceph/ceph/pull/21547 Nathan Cutler
12:54 AM Backport #22866 (In Progress): jewel: ceph osd df json output validation reported invalid numbers...
https://github.com/ceph/ceph/pull/20344 Prashant D

02/06/2018

08:01 PM Bug #22350 (Resolved): nearfull OSD count in 'ceph -w'
Sage Weil
07:49 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
So, is there anything I can do to help recover the cluster? Frank Li
06:50 AM Bug #22847 (Pending Backport): ceph osd force-create-pg cause all ceph-mon to crash and unable to...
Kefu Chai
01:23 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Please see the attached logs for when the monitor was started and then later got into the stuck mode.
I just replaced t...
Frank Li
04:54 PM Bug #22920: filestore journal replay does not guard omap operations
Lowering the priority since in practice we don't clone objects with omap on them. Sage Weil
04:53 PM Bug #22920 (Pending Backport): filestore journal replay does not guard omap operations
Sage Weil
04:07 PM Bug #22656: scrub mismatch on bytes (cache pools)
Aah, this just popped up on luminous: http://pulpito.ceph.com/yuriw-2018-02-05_23:07:16-rados-wip-yuri-testing-2018-02-05-... Sage Weil
02:24 PM Bug #20924: osd: leaked Session on osd.7
/a/yuriw-2018-02-02_20:31:37-rados-wip_yuri_master_2.2.18-distro-basic-smithi/2143177 Sage Weil
 
