Activity

From 02/19/2018 to 03/20/2018

03/20/2018

12:32 PM Bug #23145 (In Progress): OSD crashes during recovery of EC pg
Radoslaw Zarzynski
12:32 PM Bug #23145: OSD crashes during recovery of EC pg
Sorry for missing your updates, Peter. :-( I've just scripted my Gmail for _X-Redmine-Project: bluestore_.
From th...
Radoslaw Zarzynski

03/19/2018

09:35 PM Bug #23145 (New): OSD crashes during recovery of EC pg
Nathan Cutler
07:32 PM Bug #23145: OSD crashes during recovery of EC pg
Can't seem to flip this ticket out of 'Needs more info', unfortunately. Peter Woodman
04:42 PM Backport #23413 (Resolved): jewel: delete type mismatch in CephContext teardown
https://github.com/ceph/ceph/pull/21084 Nathan Cutler
04:42 PM Backport #23412 (Resolved): luminous: delete type mismatch in CephContext teardown
https://github.com/ceph/ceph/pull/20998 Nathan Cutler
04:42 PM Backport #23408 (Resolved): luminous: mgrc's ms_handle_reset races with send_pgstats()
https://github.com/ceph/ceph/pull/23791 Nathan Cutler
04:26 PM Bug #23267 (In Progress): scrub errors not cleared on replicas can cause inconsistent pg state wh...
David Zafman
04:00 PM Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica tak...
David Zafman
01:00 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
The error appears to be in calculating the host weight:
it has been set to 43.664 when it should be 43.668
...
Warren Jeffs
10:34 AM Bug #23403 (Closed): Mon cannot join quorum
Hi all,
On a 3-mon cluster running Infernalis, one of the mons left the quorum and we are unable to make it come bac...
Gauvain Pocentek
10:23 AM Backport #23351 (In Progress): luminous: filestore: do_copy_range replay bad return value
https://github.com/ceph/ceph/pull/20957 Prashant D
09:24 AM Bug #23402 (Duplicate): objecter: does not resend op on split interval
... Sage Weil
09:01 AM Bug #23370 (Pending Backport): mgrc's ms_handle_reset races with send_pgstats()
Kefu Chai

03/18/2018

10:19 PM Bug #23339 (Resolved): Scrub errors after ec-small-objects-overwrites test
http://pulpito.ceph.com/sage-2018-03-18_09:19:17-rados-wip-sage-testing-2018-03-18-0231-distro-basic-smithi/ Sage Weil

03/17/2018

02:08 AM Bug #23395: qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core dump

../qa/run-standalone.sh ceph_objectstore_tool.py
--- ../qa/standalone/special/ceph_objectstore_tool.py ---
vst...
David Zafman
02:05 AM Bug #23395 (Can't reproduce): qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core...

I assume erasure code profile handling must have changed. It shouldn't crash but we may need a test change too.
...
David Zafman

03/16/2018

10:38 PM Feature #23364: Special scrub handling of hinfo_key errors
https://github.com/ceph/ceph/pull/20947 David Zafman
08:37 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
It appears Paul Emmerich has found the problem, and it's down to the weights.
The email chain can be seen from the mailin...
Warren Jeffs
09:22 AM Bug #23386 (Resolved): crush device class: Monitor Crash when moving Bucket into Default root
Moving prestaged hosts with disks from outside a root into that root causes the monitor to crash... Warren Jeffs
08:08 PM Bug #23339 (Fix Under Review): Scrub errors after ec-small-objects-overwrites test
http://pulpito.ceph.com/sage-2018-03-16_17:59:04-rados:thrash-erasure-code-overwrites-wip-sage-testing-2018-03-16-112... Sage Weil
05:09 PM Bug #23352: osd: segfaults under normal operation
Here is the link to the core dump https://drive.google.com/open?id=1tOTqSOaS94gOhHfXmGbbfuXLNFFfOVuf Alex Gorbachev
04:34 PM Bug #23324 (Pending Backport): delete type mismatch in CephContext teardown
Kefu Chai
03:03 AM Bug #23324 (In Progress): delete type mismatch in CephContext teardown
https://github.com/ceph/ceph/pull/20930 Brad Hubbard
01:38 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
Forgot to mention the exact place it breaks:... Daniel Glaser
10:21 AM Bug #23387 (Resolved): Building Ceph on armhf fails due to out-of-memory
Hi,
I'm currently struggling with building ceph through make-deps.sh on an armhf board (namely the ODROID HC2). Everythin...
Daniel Glaser
09:16 AM Bug #23385: osd: master osd crash when pg scrub
The ceph version is 10.2.3 rongzhen zhan
09:11 AM Bug #23385 (New): osd: master osd crash when pg scrub
My ceph runs on ARM (4.4.52-armada-17.06.2). I put an object into rados; when scrubbing the pg by hand, the master osd crashes. Bel... rongzhen zhan
08:56 AM Bug #23320: OSD suicide itself because of a firewall rule but reports a received signal
Can I have some input on this topic? I can make the PR, but I'd love to have your opinion on it.
Thx,
Anonymous

03/15/2018

06:00 PM Bug #23145: OSD crashes during recovery of EC pg
Let me know if you need anything else off this cluster; I'll probably have to trash this busted PG at some point so... Peter Woodman
05:37 AM Bug #23370 (Fix Under Review): mgrc's ms_handle_reset races with send_pgstats()
https://github.com/ceph/ceph/pull/20909 Kefu Chai
05:34 AM Bug #23370 (Resolved): mgrc's ms_handle_reset races with send_pgstats()
2018-03-14T12:29:45.168 INFO:teuthology.orchestra.run.mira056:Running: 'sudo adjust-ulimits ceph-coverage /home/ubunt... Kefu Chai
05:34 AM Bug #23371 (New): OSDs flaps when cluster network is made down
We have a 5-node cluster with 5 mons and 120 OSDs equally distributed.
As part of our resiliency tests we ma...
Nokia ceph-users
04:06 AM Backport #23315 (In Progress): luminous: pool create cmd's expected_num_objects is not correctly ...
https://github.com/ceph/ceph/pull/20907 Prashant D

03/14/2018

09:37 PM Bug #22346: OSD_ORPHAN issues after jewel->luminous upgrade, but orphaned osds not in crushmap
Hi Jun,
It's not really possible to pinpoint an exact PR at this stage as it's possible there was more than one an...
Brad Hubbard
10:19 AM Bug #22346: OSD_ORPHAN issues after jewel->luminous upgrade, but orphaned osds not in crushmap
Brad Hubbard wrote:
> Hi Graham,
>
> The consensus is that this was caused by a bug in a previous release which f...
huang jun
08:41 PM Bug #23365 (New): CEPH device class not honored for erasure encoding.
To start, this cluster isn't happy. It is my destructive testing/learning cluster.
Recently I rebuilt the cluster...
Brian Woods
08:36 PM Feature #23364 (Resolved): Special scrub handling of hinfo_key errors

We shouldn't handle hinfo_key as just another user xattr.
Add the following errors specific to hinfo_key for eras...
David Zafman
06:32 PM Bug #23361 (New): /build/ceph-12.2.4/src/osd/PGLog.h: 888: FAILED assert(i->prior_version == last...

Log with debug_osd=20 and debug_bluestore=20 enabled:
https://drive.google.com/open?id=1Yr_MIXHzrgWUR5ZsV1xKlPUqZH...
Christoffer Lilja
04:49 PM Bug #23360 (Duplicate): call to 'ceph osd erasure-code-profile set' asserts the monitors
duplicate of http://tracker.ceph.com/issues/23345 Joao Eduardo Luis
04:16 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
A proper fix would be to emit a clear error message in @OSDMonitor::parse_erasure_code_profile@ instead of assert... Sebastian Wagner
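For illustration, a minimal C++ sketch of the suggested shape (hypothetical names, not the actual Ceph code): validate each key=value argument and return -EINVAL through the error stream instead of asserting, so a malformed profile can no longer take down a monitor.

    #include <cerrno>
    #include <map>
    #include <ostream>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for OSDMonitor::parse_erasure_code_profile:
    // reject malformed "key=value" arguments with -EINVAL instead of assert().
    int parse_profile(const std::vector<std::string>& args,
                      std::map<std::string, std::string>* profile,
                      std::ostream* err)
    {
      for (const auto& arg : args) {
        auto pos = arg.find('=');
        if (pos == std::string::npos || pos == 0) {
          *err << "expected key=value, got '" << arg << "'";
          return -EINVAL;  // previously an assert, which crashed the mon
        }
        (*profile)[arg.substr(0, pos)] = arg.substr(pos + 1);
      }
      return 0;
    }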
04:15 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
Found the cause of this. From the mon.a.log:... Sebastian Wagner
03:08 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
Hm, quite possible that this is in fact not a classic deadlock.
Turns out the `ceph` command line tool is also br...
Sebastian Wagner
02:52 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
The @send_command()@ function visible in this traceback is: https://github.com/ceph/ceph/pull/20865/files#diff-188b91... Sebastian Wagner
02:48 PM Bug #23360: call to 'ceph osd erasure-code-profile set' asserts the monitors
Could you point to the code, or provide a small python example, that triggers this deadlock? Ricardo Dias
02:37 PM Bug #23360 (Duplicate): call to 'ceph osd erasure-code-profile set' asserts the monitors
I've attached `thread apply all bt` mixed with `thread apply all py-bt`
Threads 38, 35, 34, 32, and 31 are waiting for...
Sebastian Wagner
03:48 PM Bug #23352: osd: segfaults under normal operation
Sage, I PM'ed you the public download link; hope it works. Alex Gorbachev
03:39 PM Bug #23352: osd: segfaults under normal operation
Hi Sage, I do have the core dump. Where can I upload the file? It's rather large, 850 MB compressed. Alex Gorbachev
01:54 PM Bug #23352 (Need More Info): osd: segfaults under normal operation
Do you have a core file? I haven't seen this crash before. Sage Weil
02:13 AM Bug #23352 (Resolved): osd: segfaults under normal operation
-1> 2018-03-13 22:03:27.390956 7f42eec36700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1520993007390955, "job": 454,... Alex Gorbachev
01:58 PM Bug #23345: `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
Sage Weil
01:55 PM Bug #23339: Scrub errors after ec-small-objects-overwrites test
Sage Weil
10:59 AM Bug #22351: Couldn't init storage provider (RADOS)
@Brad - that's perfect, thanks. Backport PR open. Nathan Cutler
10:27 AM Bug #22351: Couldn't init storage provider (RADOS)
@Nathan Oops, sorry mate, my bad.
These are the two we need.
https://github.com/ceph/ceph/pull/20022
https:/...
Brad Hubbard
09:44 AM Bug #22351: Couldn't init storage provider (RADOS)
@Brad - I was confused because you changed the status to Resolved, apparently before the backport was done.
Could ...
Nathan Cutler
12:25 AM Bug #22351: Couldn't init storage provider (RADOS)
@Nathan There wasn't one, I just set the backport field?
Just let me know if you need any action from me on this.
Brad Hubbard
10:57 AM Backport #23349 (In Progress): luminous: Couldn't init storage provider (RADOS)
Nathan Cutler
07:13 AM Documentation #23354 (Resolved): doc: osd_op_queue & osd_op_queue_cut_off
In docs:
the osd_op_queue default is given as `prio`, but the real default is `wpq`, so this is a docs bug.
If I understand properly: if o...
Konstantin Shalygin
05:12 AM Backport #23312 (In Progress): luminous: invalid JSON returned when querying pool parameters
https://github.com/ceph/ceph/pull/20890 Prashant D

03/13/2018

11:15 PM Backport #23307 (In Progress): jewel: ceph-objectstore-tool command to trim the pg log
David Zafman
10:22 PM Backport #23351 (Resolved): luminous: filestore: do_copy_range replay bad return value
https://github.com/ceph/ceph/pull/20957 Nathan Cutler
10:22 PM Bug #23298 (Pending Backport): filestore: do_copy_range replay bad return value
Sage Weil
10:13 PM Backport #23323 (Resolved): luminous: ERROR type entries of pglog do not update min_last_complete...
Nathan Cutler
09:58 PM Backport #23349 (Resolved): luminous: Couldn't init storage provider (RADOS)
https://github.com/ceph/ceph/pull/20896 Nathan Cutler
09:46 PM Bug #22351 (Pending Backport): Couldn't init storage provider (RADOS)
@Brad, I missed it: which PR is the luminous backport PR? Nathan Cutler
09:27 PM Bug #22887: osd/ECBackend.cc: 2202: FAILED assert((offset + length) <= (range.first.get_off() + r...
Here's another: /ceph/teuthology-archive/pdonnell-2018-03-11_22:42:18-multimds-wip-pdonnell-testing-20180311.180352-t... Patrick Donnelly
09:20 PM Bug #23345: `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
Running either... Joao Eduardo Luis
09:09 PM Bug #23345 (Resolved): `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
Coming into OSDMonitor::parse_erasure_code_profile() will trigger an assert that probably should be an error instead.... Joao Eduardo Luis
08:58 PM Bug #22902 (In Progress): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine eve...
David Zafman
08:55 PM Bug #23282 (New): If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0...
Greg Farnum
11:44 AM Bug #23282 (Closed): If you add extra characters to an fsid, it gets parsed as "00000000-0000-000...
John Spray
04:00 AM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
Greg Farnum wrote:
> So it got better when you took away the extra "80" prefix?
Yes, my mistake.
Amine Liu
05:16 PM Bug #23339 (Resolved): Scrub errors after ec-small-objects-overwrites test

dzafman-2018-03-12_08:11:53-rados-wip-zafman-testing-distro-basic-smithi/2283533...
David Zafman
07:13 AM Bug #23258: OSDs keep crashing.
... Jan Marquardt
07:04 AM Bug #23258: OSDs keep crashing.
We are now having the same issue on osd.1, osd.11, osd.20 and osd.25, each located on a different host. osd.1 uses file... Jan Marquardt
06:13 AM Bug #23324: delete type mismatch in CephContext teardown
This has to do with the use of placement new in the overload of Log::create_entry with the expected_size argument. I'... Brad Hubbard
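To illustrate the class of problem (a simplified sketch, not the actual Log::create_entry code): an object built with placement new over a malloc'd buffer must be destroyed explicitly and released with free(); a plain delete pairs the wrong allocator and type, which is exactly what shows up as a delete type mismatch.

    #include <cstdlib>
    #include <new>

    struct Entry {            // simplified stand-in for the log Entry type
      char inline_buf[32];
    };

    Entry* create_entry(std::size_t expected_size) {
      // one allocation holds the Entry plus room for its expected payload
      void* raw = std::malloc(sizeof(Entry) + expected_size);
      if (!raw)
        return nullptr;
      return new (raw) Entry();  // placement new: no matching operator delete
    }

    void destroy_entry(Entry* e) {
      e->~Entry();     // run the destructor by hand
      std::free(e);    // release with the allocator that produced the memory
      // 'delete e;' here would mismatch std::malloc() and trip the checker
    }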

03/12/2018

10:56 PM Bug #22902: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")

OSD 4 is the primary [6,5,4]/[4,5,7] with osd.6 crashing...
David Zafman
09:18 PM Bug #22050 (Resolved): ERROR type entries of pglog do not update min_last_complete_ondisk, potent...
Josh Durgin
04:45 PM Bug #22050 (Pending Backport): ERROR type entries of pglog do not update min_last_complete_ondisk...
Josh Durgin
09:12 PM Bug #23325: osd_max_pg_per_osd.py: race between pool creation and wait_for_clean
Seen here as well: http://pulpito.ceph.com/nojha-2018-03-02_23:59:23-rados-wip-async-recovery-2018-03-02-distro-basic... Neha Ojha
09:06 PM Bug #23325 (New): osd_max_pg_per_osd.py: race between pool creation and wait_for_clean
Seen in http://pulpito.ceph.com/joshd-2018-03-12_15:49:43-rados-wip-pg-log-trim-error-luminous-distro-basic-smithi/22... Josh Durgin
06:22 PM Bug #23324: delete type mismatch in CephContext teardown
It looks more to me like we're allocating an object of one type (Entry) and then casting it to another (Log)? Is ther... Jeff Layton
05:16 PM Bug #23324: delete type mismatch in CephContext teardown
I don't recognize this from elsewhere and it looks like the kind of issue that could arise from trying to delete some... Greg Farnum
04:56 PM Bug #23324: delete type mismatch in CephContext teardown
Package in this case is:
librados2-13.0.1-2356.gf2b88f364515.fc27.x86_64
Jeff Layton
04:51 PM Bug #23324 (Resolved): delete type mismatch in CephContext teardown
I've been hunting some memory corruption in ganesha and ran across this. Seems unlikely to be the cause of the crashe... Jeff Layton
05:19 PM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
So it got better when you took away the extra "80" prefix? Greg Farnum
06:31 AM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
My mistake. I don't know why there was an extra "80" in the fsid in my conf.
Amine Liu
05:19 PM Bug #23290: "/test/osd/RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basi...
Is that the "the disk errored out" bug? Greg Farnum
04:44 PM Backport #23323 (Resolved): luminous: ERROR type entries of pglog do not update min_last_complete...
https://github.com/ceph/ceph/pull/20851 Josh Durgin
01:39 PM Bug #22656: scrub mismatch on bytes (cache pools)
/a/sage-2018-03-11_23:03:25-rados-wip-sage2-testing-2018-03-10-1616-distro-basic-smithi/2280391
description: rados...
Sage Weil
01:09 PM Bug #23320: OSD suicide itself because of a firewall rule but reports a received signal
I used this url https://www.mkssoftware.com/docs/man5/siginfo_t.5.asp#Signal_Codes to get a better understanding of t... Anonymous
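As a rough sketch of the idea (assuming Linux/glibc; a hypothetical handler, not the attached patch): an SA_SIGINFO handler can inspect si_code to tell a process-sent signal (kill/tkill, e.g. the suicide path) from one raised by the kernel, so the log stops reporting a self-inflicted abort as an externally received signal.

    #include <csignal>
    #include <cstring>
    #include <unistd.h>

    static void handler(int /*sig*/, siginfo_t* info, void* /*ctx*/) {
      // By convention si_code <= 0 means the signal came from a process
      // (kill/tkill/sigqueue); positive codes are kernel-generated faults.
      const char* msg = (info->si_code <= 0)
          ? "signal was sent by a process (possibly ourselves)\n"
          : "signal was raised by the kernel\n";
      write(STDERR_FILENO, msg, strlen(msg));
      _exit(1);
    }

    int main() {
      struct sigaction sa = {};
      sa.sa_sigaction = handler;
      sa.sa_flags = SA_SIGINFO;
      sigaction(SIGABRT, &sa, nullptr);
      raise(SIGABRT);  // demo: reported as process-sent, not a kernel fault
    }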
01:08 PM Bug #23320: OSD suicide itself because of a firewall rule but reports a received signal
I'm attaching the patch for more readability. Anonymous
11:13 AM Bug #23320 (Resolved): OSD suicide itself because of a firewall rule but reports a received signal
We (leseb & I) had an issue where the OSD crashes with the following message:
2018-03-08 14:30:26.042607 7f6142b7...
Anonymous
10:40 AM Bug #23281 (Resolved): run-tox-ceph-disk fails in luminous's "make check" run by jenkins
Nathan Cutler
10:39 AM Bug #23283 (Duplicate): os/bluestore:cache arise a Segmentation fault
Duplicate of https://tracker.ceph.com/issues/21259 Igor Fedotov
10:23 AM Bug #23258: OSDs keep crashing.
After extending the cluster to 40 osds and removing osd.11 from it, the problem has moved to osd.1:... Jan Marquardt
09:16 AM Backport #23316 (Resolved): jewel: pool create cmd's expected_num_objects is not correctly interp...
https://github.com/ceph/ceph/pull/22050 Nathan Cutler
09:16 AM Backport #23315 (Resolved): luminous: pool create cmd's expected_num_objects is not correctly int...
https://github.com/ceph/ceph/pull/20907 Nathan Cutler
09:14 AM Backport #23312 (Resolved): luminous: invalid JSON returned when querying pool parameters
https://github.com/ceph/ceph/pull/20890 Nathan Cutler
09:14 AM Backport #23307 (Resolved): jewel: ceph-objectstore-tool command to trim the pg log
https://github.com/ceph/ceph/pull/20882 Nathan Cutler

03/11/2018

11:04 PM Bug #23297: mon-seesaw 'failed to become clean before timeout' due to laggy pg create
/a/sage-2018-03-11_02:12:48-rados-wip-sage2-testing-2018-03-10-1616-distro-basic-smithi/2276594 Sage Weil
02:19 AM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
Anyways, the only place where this can happen is if @snap_seq < max(removed_snaps)@ because the deletion request inse... Paul Emmerich
12:36 AM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
Well, turns out there were both 12.2.1 and 12.2.4 clients doing snapshot operations. This messed up removed_snaps due... Paul Emmerich

03/10/2018

11:28 PM Bug #22351 (Resolved): Couldn't init storage provider (RADOS)
All of these PRs have merged on the RADOS side. Brad Hubbard
09:00 PM Bug #23298: filestore: do_copy_range replay bad return value
https://github.com/ceph/ceph/pull/20832 Sage Weil
08:55 PM Bug #23298 (Resolved): filestore: do_copy_range replay bad return value
+ if (r < 0 && replaying) {
+ assert(r == -ERANGE);
+ derr << "Filestore: short source tolerated because we ...
Sage Weil
08:41 PM Bug #23297 (Fix Under Review): mon-seesaw 'failed to become clean before timeout' due to laggy pg...
The OSD gets the pg_create but for a future osdmap and never gets the osdmap due to the mons being slow and thrashy.
...
Sage Weil
12:31 AM Bug #22050 (Fix Under Review): ERROR type entries of pglog do not update min_last_complete_ondisk...
https://github.com/ceph/ceph/pull/20827
A backport is only needed for luminous, since error pg log entries did not exist ...
Josh Durgin
12:04 AM Bug #23294 (New): OSD booted with noup never got marked in; pgs stuck peering while osd up, but out
http://pulpito.ceph.com/joshd-2018-03-09_22:47:53-rados-master-distro-basic-smithi/2273020/
This test restarts osd...
Josh Durgin

03/09/2018

11:38 PM Bug #22351: Couldn't init storage provider (RADOS)
I have another "couldn't init storage provider" case; I think a better error message would definitely help in fixing this... Vasu Kulkarni
10:02 PM Bug #22902: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
Another instance: http://pulpito.ceph.com/joshd-2018-03-09_00:39:29-rados-wip-pg-log-trim-errors-distro-basic-smithi/... Josh Durgin
09:45 PM Bug #22530 (Pending Backport): pool create cmd's expected_num_objects is not correctly interpreted
https://github.com/ceph/ceph/pull/19651 Josh Durgin
03:56 PM Bug #23290 (New): "/test/osd/RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distr...
Run: http://pulpito.ceph.com/teuthology-2018-03-09_01:15:03-upgrade:hammer-x-jewel-distro-basic-smithi/
Jobs: '22671...
Yuri Weinstein
10:33 AM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
[root@s3-master-1 ceph]# ceph --show-config --debug_mon=5|grep fsid
fsid = 00000000-0000-0000-0000-00...
Amine Liu
08:37 AM Bug #23282: If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0000-00...
[root@s3-master-1 my-cluster]# ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.s3-master-1.asok ter-1.aso... Amine Liu
08:08 AM Bug #23282 (New): If you add extra characters to an fsid, it gets parsed as "00000000-0000-0000-0...

[root@s3-master-1 my-cluster]# ceph --show-config |grep fsid
fsid = 00000000-0000-0000-0000-000000000000
[root@...
Amine Liu
08:42 AM Bug #23283 (Duplicate): os/bluestore:cache arise a Segmentation fault
_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_s... tangwenjun tang
08:33 AM Bug #23145: OSD crashes during recovery of EC pg
Sure thing. There's 400MB of logs waiting for you at 67404d04-3a55-4afc-8ea2-1d3fc74c3c28. Peter Woodman
08:17 AM Bug #23281 (Fix Under Review): run-tox-ceph-disk fails in luminous's "make check" run by jenkins
https://github.com/ceph/ceph/pull/20817
Kefu Chai
07:18 AM Bug #23281 (Resolved): run-tox-ceph-disk fails in luminous's "make check" run by jenkins
without full path specified... Kefu Chai
04:03 AM Bug #23200 (Pending Backport): invalid JSON returned when querying pool parameters
Kefu Chai
02:14 AM Support #23279 (New): OSD data directory does not exist
I created the OSD successfully, and the ceph -s cmd reported “health ok”, but with the “journalctl -xe” cmd, why is there still ... Alex Liu
01:22 AM Bug #22050: ERROR type entries of pglog do not update min_last_complete_ondisk, potentially ballo...
running a fix through testing http://pulpito.ceph.com/joshd-2018-03-09_00:39:29-rados-wip-pg-log-trim-errors-distro-b... Josh Durgin
12:48 AM Backport #23275 (Resolved): luminous: ceph-objectstore-tool command to trim the pg log
David Zafman
12:48 AM Feature #23242 (Pending Backport): ceph-objectstore-tool command to trim the pg log
David Zafman

03/08/2018

07:41 PM Backport #23275: luminous: ceph-objectstore-tool command to trim the pg log
Switching backport to use queue_transactions() at least for the rmkeys portion. David Zafman
07:32 PM Backport #23275: luminous: ceph-objectstore-tool command to trim the pg log
Deadlock in Luminous:
Thread 6 (Thread 0x7f236f0b9700 (LWP 21238)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../n...
David Zafman
04:02 PM Backport #23275 (Resolved): luminous: ceph-objectstore-tool command to trim the pg log
https://github.com/ceph/ceph/pull/20803 David Zafman
01:04 PM Feature #23242: ceph-objectstore-tool command to trim the pg log
https://github.com/ceph/ceph/pull/20786 Vikhyat Umrao
07:12 AM Bug #23273 (New): segmentation fault in PrimaryLogPG::recover_got()
We encountered this fault while using cephfs; it seems similar to this issue: http://tracker.ceph.com/issues/17645
...
Yan Jun

03/07/2018

11:26 PM Bug #23270 (New): failed mutex assert in PipeConnection::try_get_pipe() (via OSD::do_command())
... Sage Weil
09:29 PM Bug #23269 (New): Early use of clog in OSD startup crashes OSD

This crash occurred because log_weirdness() called osd->clog->error() probably out of init() -> load_pgs() -> read_...
David Zafman
05:13 PM Bug #23128: invalid values in ceph.conf do not issue visible warnings
Josh Durgin wrote:
> This is being improved with a centralized configuration stored in the monitors in mimic.
I...
Joshua Schmid
05:01 PM Bug #23267 (Resolved): scrub errors not cleared on replicas can cause inconsistent pg state when ...

The PG_STATE_INCONSISTENT flag is set based on num_scrub_errors. A pg query can show after scrub inconsistencies r...
David Zafman
12:39 PM Bug #22092 (Resolved): ceph-kvstore-tool's store-crc command does not save result to the file as ...
Chang Liu
11:48 AM Bug #23258: OSDs keep crashing.
Additional info: We were running on Kraken until last week, then upgraded to 12.2.3, where the problems started and u... Jan Marquardt
11:44 AM Bug #23258 (New): OSDs keep crashing.
At least two OSDs (#11 and #20) on two different hosts in our cluster keep crashing, which prevents our cluster from get... Jan Marquardt
10:33 AM Backport #23256 (In Progress): luminous: bluestore: should recalc_allocated when decoding bluefs_...
https://github.com/ceph/ceph/pull/20771 Kefu Chai
10:32 AM Backport #23256 (Resolved): luminous: bluestore: should recalc_allocated when decoding bluefs_fno...
https://github.com/ceph/ceph/pull/20771 Kefu Chai
10:30 AM Bug #23212 (Pending Backport): bluestore: should recalc_allocated when decoding bluefs_fnode_t
Kefu Chai
05:01 AM Feature #23242: ceph-objectstore-tool command to trim the pg log
The assert(num_unsent <= log_queue.size()) probably doesn't relate directly to this feature. The log_weirdness() f... David Zafman
01:02 AM Feature #23242: ceph-objectstore-tool command to trim the pg log

From PG::log_weirdness():
2018-03-06 16:18:57.413 7f0a593b9dc0 -1 log_channel(cluster) log [ERR] : 1.0 log bound...
David Zafman
12:26 AM Feature #23242 (In Progress): ceph-objectstore-tool command to trim the pg log

When testing the log trimming code on master the OSD crashes like this....
David Zafman
02:34 AM Feature #23236 (Rejected): should allow osd to dump slow ops
Kefu Chai

03/06/2018

07:25 PM Feature #23236: should allow osd to dump slow ops
Oh yep, that'll do it. So I'm a bit confused what this ticket is supposed to mean. Greg Farnum
09:55 AM Feature #23236: should allow osd to dump slow ops
Isn't this what dump_blocked_ops is for? See also https://tracker.ceph.com/issues/23205 John Spray
04:23 AM Feature #23236: should allow osd to dump slow ops
I guess this is saying we don’t have a slow-only output command? dump_ops_in_flight et al certainly will print them o... Greg Farnum
04:11 AM Feature #23236 (Rejected): should allow osd to dump slow ops
After f4b74125e44fe78154fb377fa06fc08b3325859d, we have no way to print out the slow ops of OSDs. Only a summary is o... Kefu Chai
03:01 PM Bug #23145 (Need More Info): OSD crashes during recovery of EC pg
Radoslaw Zarzynski
02:29 PM Bug #23145: OSD crashes during recovery of EC pg
@Peter:
Is there a chance to get a log with both OSD and BlueStore debug levels turned up to 20? At the moment I can't...
Radoslaw Zarzynski
01:51 PM Feature #23242 (Resolved): ceph-objectstore-tool command to trim the pg log
ceph-objectstore-tool command to trim the pg log
The motivation here is to have a command to trim the pg log with...
Vikhyat Umrao
12:54 PM Bug #23200 (Fix Under Review): invalid JSON returned when querying pool parameters
https://github.com/ceph/ceph/pull/20745 Chang Liu
06:07 AM Bug #23233 (Duplicate): The randomness of the hash function causes the object to be inhomogeneous...
Nathan Cutler
02:41 AM Bug #23233: The randomness of the hash function causes the object to be inhomogeneous to the PG.T...
Sorry, this report is incomplete. Please ignore it. junwei liao
02:34 AM Bug #23233 (Duplicate): The randomness of the hash function causes the object to be inhomogeneous...
junwei liao
05:30 AM Backport #23077 (New): luminous: mon: ops get stuck in "resend forwarded message to leader"
These are both done and the backport can proceed. :) Greg Farnum
03:03 AM Bug #23235 (New): The randomness of the hash function causes the object to be inhomogeneous to th...
The randomness of the ceph_str_hash_rjenkins hash function causes objects to be distributed unevenly across PGs. The result... junwei liao
12:26 AM Bug #20924: osd: leaked Session on osd.7
/a/kchai-2018-03-05_17:31:09-rados-wip-kefu-testing-2018-03-05-2238-distro-basic-smithi/2252897 Kefu Chai

03/05/2018

07:12 PM Bug #23228 (Closed): scrub mismatch on objects
... Sage Weil
06:27 PM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
saw this again,... Sage Weil
04:45 PM Bug #23215: config.cc: ~/.ceph/$cluster.conf is passed unexpanded to fopen()
(I think this is rbd, right?) John Spray
06:01 AM Bug #23215 (Resolved): config.cc: ~/.ceph/$cluster.conf is passed unexpanded to fopen()
parse_file() in "src/dmclock/sim/src/ConfUtils.cc" receives a filename without the tilde being expanded to correspond... Rishabh Dave
09:34 AM Backport #23174 (In Progress): luminous: SRV resolution fails to lookup AAAA records
https://github.com/ceph/ceph/pull/20710 Prashant D
03:37 AM Bug #23212: bluestore: should recalc_allocated when decoding bluefs_fnode_t
https://github.com/ceph/ceph/pull/20701 Kefu Chai
03:35 AM Bug #23212 (Resolved): bluestore: should recalc_allocated when decoding bluefs_fnode_t
... Kefu Chai
03:20 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
宏伟 唐 wrote:
> 宏伟 唐 wrote:
> > Mykola Golub wrote:
> > > > There are no logs indicating osd crash and the outputs o...
宏伟 唐
02:36 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
宏伟 唐 wrote:
> Mykola Golub wrote:
> > > There are no logs indicating osd crash and the outputs of 'ceph daemon osd....
宏伟 唐

03/04/2018

02:33 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Mykola Golub wrote:
> > There are no logs indicating osd crash and the outputs of 'ceph daemon osd.x log dump' are a...
宏伟 唐

03/02/2018

10:40 PM Bug #18165 (Resolved): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_target...
David Zafman
10:23 PM Bug #23204 (Duplicate): missing primary copy of object in mixed luminous<->master cluster with bl...
The dead jobs here failed due to this:
http://pulpito.ceph.com/yuriw-2018-03-01_22:45:38-upgrade:luminous-x-wip-yu...
Josh Durgin
09:21 PM Bug #22050: ERROR type entries of pglog do not update min_last_complete_ondisk, potentially ballo...
This seems to be biting rgw's usage pools when rgw-admin usage trim occurs in pgs with little other activity. Josh Durgin
04:18 PM Bug #23200 (Resolved): invalid JSON returned when querying pool parameters
When requesting JSON-formatted results when querying pool parameters, the list that comes back is not valid JSON....
Wyllys Ingersoll
01:56 PM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Both the image HEAD and snapshot "snap" show a size of 10GB, so if your exported sizes are different, the export must... Jason Dillaman
09:49 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
> There are no logs indicating osd crash and the outputs of 'ceph daemon osd.x log dump' are all empty ({}).
The m...
Mykola Golub
08:26 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Jason Dillaman wrote:
> Can you please run "rados -p <pool name> listomapvals rbd_header.<image id>" and provide the...
宏伟 唐
12:06 PM Bug #23194 (Rejected): librados client is sending bad omap value just before program exits
Thanks Jason. You were absolutely right -- the omap get/put at exit is being driven by ganesha. I had missed that bef... Jeff Layton
04:37 AM Bug #23130: No error is shown when "osd_mon_report_interval_min" value is greater than "osd_mon_...
Jewel is scheduled to reach End of Life when Mimic is released (around June 2018). It's possible this issue will not ... Nathan Cutler

03/01/2018

11:46 PM Bug #23195 (Resolved): Read operations segfaulting multiple OSDs
I'm seeing some OSDs crashing at the same time with (mostly) the same error message related to reading an erasure c... Paul Emmerich
11:14 PM Bug #23194: librados client is sending bad omap value just before program exits
... there was an "omap get" right before the store, and the values stored were the (truncated) values that were just r... Jason Dillaman
10:38 PM Bug #23194: librados client is sending bad omap value just before program exits
rados_kv_get does look hinky, but I don't think we're calling into it here. We're basically doing a rados_kv_put into... Jeff Layton
09:53 PM Bug #23194: librados client is sending bad omap value just before program exits
I don't know what nfs-ganesha code to look at, but this [1] looks very suspect to me since you are returning a pointe... Jason Dillaman
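To make the suspicion concrete, a hypothetical sketch of the pattern (not the actual nfs-ganesha code): returning a pointer into a stack buffer hands the caller memory that is reused as soon as the function returns, which can later surface as a truncated or garbled omap value.

    #include <cstdio>

    // BROKEN: the returned pointer dangles once the function returns.
    const char* format_key_broken(int id) {
      char buf[64];
      std::snprintf(buf, sizeof(buf), "rec-%08x", id);
      return buf;  // bug: buf's storage is reclaimed here
    }

    // Safer alternative: the caller owns the storage.
    void format_key(int id, char* out, std::size_t outlen) {
      std::snprintf(out, outlen, "rec-%08x", id);
    }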
09:43 PM Bug #23194: librados client is sending bad omap value just before program exits
Frame 201:
Object: rec-00000000:0000000000000017
Key: 6528071705456279553
Value: ::ffff:192.168.1.243-(37:Linux NF...
Jason Dillaman
09:16 PM Bug #23194: librados client is sending bad omap value just before program exits
I do have the ability to collect client logs within the container, and can turn up debugging in there if it'll help. Jeff Layton
08:56 PM Bug #23194: librados client is sending bad omap value just before program exits
Ahh, the object name is 29 bytes in this case, so maybe there is some confusion about lengths down in the code that i... Jeff Layton
08:49 PM Bug #23194 (Rejected): librados client is sending bad omap value just before program exits
I've been tracking down a problem in nfs-ganesha where an omap value in an object ends up truncated. It doesn't alway... Jeff Layton
11:47 AM Backport #23160 (Need More Info): luminous: Multiple asserts caused by DNE pgs left behind after ...
Waiting for code review of backport PR: https://github.com/ceph/ceph/pull/20668 Prashant D
11:16 AM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
... Kefu Chai
09:55 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Yoann Moulin wrote:
> David Zafman wrote:
> > Yoann Moulin wrote:
> > > is that normal all files in 11.5f_head hav...
Yoann Moulin
09:26 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Attached is the result of the dump for each OSD with the correct args Yoann Moulin
09:08 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
David Zafman wrote:
> Yoann Moulin wrote:
> > is that normal all files in 11.5f_head have size=0 on each replicate ...
Yoann Moulin
09:06 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Attached is the result of the dump for each OSD,
along with the extended attributes for the files on disk:...
Yoann Moulin
01:55 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Yoann Moulin wrote:
> is that normal all files in 11.5f_head have size=0 on each replicate of the PG ?
>
> [...]
...
David Zafman
01:11 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Can you dump the object with something like the following.... Brad Hubbard
09:33 AM Backport #23186 (In Progress): luminous: ceph tell mds.* <command> prints only one matching usage
https://github.com/ceph/ceph/pull/20664 Kefu Chai
09:26 AM Backport #23186 (Resolved): luminous: ceph tell mds.* <command> prints only one matching usage
https://github.com/ceph/ceph/pull/20664 Kefu Chai
09:25 AM Bug #23125 (Duplicate): Bad help text when 'ceph osd pool' is run
Kefu Chai
02:38 AM Bug #23125: Bad help text when 'ceph osd pool' is run
I am working on this issue. Thanks. guotao Yao
08:36 AM Feature #23045 (Fix Under Review): mon: warn on slow ops in OpTracker
Kefu Chai
07:56 AM Feature #23045: mon: warn on slow ops in OpTracker
https://github.com/ceph/ceph/pull/20660 Chang Liu
03:30 AM Bug #23124: Status of OSDs are not showing properly after disabling ceph.target and ceph-osd.target
As OSDs are brought up by the udev rules, regardless of the enabled status of "ceph.target" and "ceph-osd.target" hen... Debashis Mondal
03:14 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Can you please run "rados -p <pool name> listomapvals rbd_header.<image id>" and provide the output? You can determin... Jason Dillaman
01:57 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Jason Dillaman wrote:
> Yes, snapshots are read-only so the only thing I can think of is some sort of data corruptio...
宏伟 唐
12:21 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Mykola Golub wrote:
> It looks like your log entries are from in memory log dump. Did you osd crash (could be seen i...
宏伟 唐

02/28/2018

10:41 PM Bug #23132 (Triaged): some config values should be unsigned, to disallow negative values
Josh Durgin
10:37 PM Bug #23130 (Triaged): No error is shown when "osd_mon_report_interval_min" value is greater than...
This only affects jewel since the osd_mon_report_interval_max option is no longer used in luminous and later. Josh Durgin
10:35 PM Bug #23128: invalid values in ceph.conf do not issue visible warnings
This is reverting to the default value since 1.1 is not a valid value for the option.
This is being improved with...
Josh Durgin
10:34 PM Bug #23128 (Triaged): invalid values in ceph.conf do not issue visible warnings
Josh Durgin
10:31 PM Bug #23125 (Triaged): Bad help text when 'ceph osd pool' is run
Josh Durgin
10:30 PM Bug #23124 (Won't Fix): Status of OSDs are not showing properly after disabling ceph.target and c...
As Nathan explained, this isn't how the targets are meant to work. Josh Durgin
10:27 PM Bug #23145: OSD crashes during recovery of EC pg
Sage, is this a bluestore issue, or did we lose the rollback info somewhere?
It looks like it's getting enoent for...
Josh Durgin
11:23 AM Backport #23181 (In Progress): jewel: Can't repair corrupt object info due to bad oid on all repl...
Nathan Cutler
11:22 AM Backport #23181 (Resolved): jewel: Can't repair corrupt object info due to bad oid on all replicas
https://github.com/ceph/ceph/pull/20622 Nathan Cutler
11:20 AM Backport #23174 (Resolved): luminous: SRV resolution fails to lookup AAAA records
https://github.com/ceph/ceph/pull/20710 Nathan Cutler
11:19 AM Bug #20471 (Pending Backport): Can't repair corrupt object info due to bad oid on all replicas
Nathan Cutler
10:33 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Is it normal that all files in 11.5f_head have size=0 on each replica of the PG?... Yoann Moulin
08:17 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Here are the results of the 3 commands for each replica of the PG; osd.78 on iccluster020 is the one with the error:
...
Yoann Moulin
06:57 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
It looks like your log entries are from an in-memory log dump. Did your osd crash (it could be seen in the log), or did you u... Mykola Golub
12:30 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Jason Dillaman wrote:
> Yes, snapshots are read-only so the only thing I can think of is some sort of data corruptio...
宏伟 唐
01:26 AM Bug #23078 (Pending Backport): SRV resolution fails to lookup AAAA records
Kefu Chai
01:19 AM Bug #22462 (Resolved): mon: unknown message type 1537 in luminous->mimic upgrade tests
Kefu Chai
01:13 AM Bug #22656: scrub mismatch on bytes (cache pools)
http://pulpito.ceph.com/kchai-2018-02-27_10:33:49-rados-wip-kefu-testing-2018-02-27-1348-distro-basic-mira/2232486/
...
Kefu Chai

02/27/2018

11:13 PM Bug #22902: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
This one looks like a similar failure: http://pulpito.ceph.com/nojha-2018-02-23_18:13:41-rados-wip-async-recovery-201... Neha Ojha
06:49 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
To summarize what I've figured out to reproduce this:
* both rbd client and mon are running 12.2.4, happened with ...
Paul Emmerich
05:46 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
Still happening on 12.2.4... Paul Emmerich
04:23 PM Bug #23124: Status of OSDs are not showing properly after disabling ceph.target and ceph-osd.target
The ceph.target and ceph-osd.target cannot be used this way. Assuming ceph-disk is being used, the OSDs are brought u... Nathan Cutler
04:08 PM Feature #22974 (Resolved): documentation - pg state table missing "activating" state
Nathan Cutler
04:08 PM Backport #23113 (Resolved): luminous: documentation - pg state table missing "activating" state
Nathan Cutler
12:55 PM Backport #23160 (Resolved): luminous: Multiple asserts caused by DNE pgs left behind after lots o...
https://github.com/ceph/ceph/pull/20668 Nathan Cutler
11:48 AM Bug #21716 (Resolved): ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Nathan Cutler
11:47 AM Backport #21871 (Rejected): luminous: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
@smithfarm I am sorry, but it turns out that this backport is not needed, because of http://tracker.ceph.com/issues/2... Nathan Cutler
08:58 AM Bug #23145 (Duplicate): OSD crashes during recovery of EC pg
I've got a cluster (running released debs of ceph 12.2.3) that started crashing on OSD startup a little bit ago. I di... Peter Woodman
06:12 AM Backport #23075 (In Progress): luminous: osd: objecter sends out of sync with pg epochs for proxi...
https://github.com/ceph/ceph/pull/20609 Prashant D

02/26/2018

09:18 PM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
... possible, but it actually does say it's replicated cache tiers in front of EC backends, which should rule out data... Jason Dillaman
09:01 PM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Couldn't this be related to #21639 (snapshots were not created/deleted against the data pool)? The reported version here i... Mykola Golub
07:16 PM Bug #23119 (Need More Info): MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glan...
Yes, snapshots are read-only so the only thing I can think of is some sort of data corruption on the OSDs. Have you r... Jason Dillaman
08:23 PM Bug #22996 (Resolved): Snapset inconsistency is no longer detected
David Zafman
08:20 PM Backport #23054 (Resolved): luminous: Snapset inconsistency is no longer detected
https://github.com/ceph/ceph/pull/20501 David Zafman
08:04 PM Backport #23093 (Resolved): luminous: last-stat-seq returns 0 because osd stats are cleared
David Zafman
08:03 PM Bug #21833 (Pending Backport): Multiple asserts caused by DNE pgs left behind after lots of OSD r...
David Zafman
07:32 PM Feature #23087 (Duplicate): Add OSD metrics to keep track of per-client IO
We've discussed "rbd top" before (http://tracker.ceph.com/projects/ceph/wiki/CDM_07-DEC-2016, http://tracker.ceph.com... Greg Farnum
05:07 AM Bug #23132 (Triaged): some config values should be unsigned, to disallow negative values
Execution Steps:
-------------------
1. Set a negative value for the parameter "osd_heartbeat_interval" in ceph.conf
2....
Debashis Mondal
04:56 AM Backport #23114 (In Progress): luminous: can't delete object from pool when Ceph out of space
https://github.com/ceph/ceph/pull/20585 Prashant D
04:54 AM Bug #23130 (Triaged): No error is shown when "osd_mon_report_interval_min" value is greater than...
Execution Steps:
------------------
1. Set the "osd_mon_report_interval_min" value using CLI
# ceph daemon osd...
Debashis Mondal
04:37 AM Feature #23129: After creating a snapshot of a rados pool when we try to rollback the pool it all...
rados -p testpool rollback myobject1 testpool-snap
[Note: only the mentioned object is rolled back from the snapshot]
Debashis Mondal
04:35 AM Feature #23129 (New): After creating a snapshot of a rados pool when we try to rollback the pool ...
Execution Steps:
------------------
1. Creating a pool
# ceph osd pool create testpool 16 16
2. Add ...
Debashis Mondal
04:29 AM Bug #23128 (Triaged): invalid values in ceph.conf do not issue visible warnings
Execution Steps
-----------------
1. Change the setting of "mon osd down out interval" in ceph.conf as per below
...
Debashis Mondal
04:24 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Yes, a size-0 object is expected since all copies report '"size": 0'.
The discrepancy appears to be in the omap data...
Brad Hubbard
04:10 AM Bug #23125 (Duplicate): Bad help text when 'ceph osd pool' is run
Execution Steps :
-----------------
1. While executing the CLI for creating a snapshot of a pool
#ceph osd pool ...
Debashis Mondal
04:04 AM Bug #23124 (Won't Fix): Status of OSDs are not showing properly after disabling ceph.target and c...
Execution Steps:
----------------
1. # ceph osd tree [ceph is in running state]
2. # systemctl disab...
Debashis Mondal
03:49 AM Feature #23123 (New): use pwrite to emulate posix_fallocate
less IO when using a plain file as the store for testing bluestore if posix_fallocate() is not available.
see ht...
Kefu Chai
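A minimal sketch of the idea (assuming Linux and a fresh plain file; this writes zeros, so a faithful emulation would read-modify-write each block the way glibc's fallback does): touching one byte per filesystem block forces block allocation with far less IO than zero-filling the whole range.

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int fallocate_by_pwrite(int fd, off_t offset, off_t len) {
      struct stat st;
      if (len <= 0 || fstat(fd, &st) < 0)
        return -1;
      const off_t block = st.st_blksize > 0 ? st.st_blksize : 4096;
      const char zero = 0;
      for (off_t pos = offset; pos < offset + len; pos += block) {
        if (pwrite(fd, &zero, 1, pos) != 1)  // touch one byte per block
          return -1;
      }
      // make sure the final byte of the range is allocated as well
      if (pwrite(fd, &zero, 1, offset + len - 1) != 1)
        return -1;
      return 0;
    }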
03:24 AM Backport #23113 (In Progress): luminous: documentation - pg state table missing "activating" state
https://github.com/ceph/ceph/pull/20584 Prashant D

02/25/2018

08:31 AM Bug #23119 (Need More Info): MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glan...
Ceph Version: 12.2.2 Luminous Stable
Problem description:
We use ceph as the backend storage for OpenStack Glance...
宏伟 唐

02/24/2018

07:14 PM Bug #23117 (Fix Under Review): PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has ...
In the following setup:
* 6 OSD hosts
* Each host with 32 disks = 32 OSDs
* Pool with 2048 PGs, EC, k=4, m=2, crus...
Oliver Freyermuth
05:54 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
https://github.com/ceph/ceph/pull/20571 David Zafman
05:54 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
Not sure if this needs a Jewel backport. David Zafman
11:22 AM Backport #23114 (Resolved): luminous: can't delete object from pool when Ceph out of space
https://github.com/ceph/ceph/pull/20585 Nathan Cutler
11:21 AM Backport #23113 (Resolved): luminous: documentation - pg state table missing "activating" state
https://github.com/ceph/ceph/pull/20584 Nathan Cutler
04:39 AM Feature #22974 (Pending Backport): documentation - pg state table missing "activating" state
https://github.com/ceph/ceph/pull/20504 Kefu Chai
04:35 AM Bug #23078: SRV resolution fails to lookup AAAA records
Kefu Chai
04:32 AM Bug #22952 (Duplicate): Monitor stopped responding after awhile
Great! I am marking this ticket as a "duplicate". Please reopen it if you think otherwise.
Happy Chinese New Year ...
Kefu Chai
04:20 AM Bug #22413 (Pending Backport): can't delete object from pool when Ceph out of space
Kefu Chai

02/23/2018

10:24 PM Feature #23096: mon: don't remove auth caps without a flag
We could throw an error instead, yeah. That is probably a wise forcing function. I think we still want the flag thoug... Greg Farnum
11:37 AM Feature #23096: mon: don't remove auth caps without a flag
Bit torn on this one: there is a security downside to changing this behaviour in-place -- any existing scripts that e... John Spray
01:08 AM Feature #23096 (New): mon: don't remove auth caps without a flag
With current syntax, something like... Greg Farnum
08:02 PM Bug #21833 (In Progress): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
David Zafman
02:03 AM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
I was working on this last week, but got distracted by other issues. I'm going to force this scenario and see about f... David Zafman
02:01 PM Backport #23103 (In Progress): luminous: v12.2.2 unable to create bluestore osd using ceph-disk
Nathan Cutler
01:50 PM Backport #23103 (Resolved): luminous: v12.2.2 unable to create bluestore osd using ceph-disk
https://github.com/ceph/ceph/pull/20563 Nathan Cutler
11:54 AM Bug #18165 (Pending Backport): OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfil...
This should not have been marked Resolved when one of the backports was still open. Nathan Cutler
08:33 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Hello Brad,
Sorry, I was too hasty;
the rados get with the correct pool returns a file with size=0...
Yoann Moulin
03:42 AM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Does "rados -p disks ls" list the object? Can you find the actual storage for this object on the disks used for these... Brad Hubbard

02/22/2018

11:56 PM Backport #23093 (In Progress): luminous: last-stat-seq returns 0 because osd stats are cleared
David Zafman
11:43 PM Backport #23093: luminous: last-stat-seq returns 0 because osd stats are cleared
https://github.com/ceph/ceph/pull/20548 David Zafman
05:52 PM Backport #23093 (Resolved): luminous: last-stat-seq returns 0 because osd stats are cleared

I added an assert which crashes ceph-mgr because PGMap::apply_incremental() processes an osd_stat_t that is all zero...
David Zafman
11:40 PM Bug #22882 (Fix Under Review): Objecter deadlocked on op budget while holding rwlock in ms_handle...
https://github.com/ceph/ceph/pull/20519 Greg Farnum
09:40 PM Bug #22952: Monitor stopped responding after awhile
Thanks, with the 12.2.3 + this patch, the cluster is now back to HEALTH_OK state Frank Li
06:37 PM Bug #22952: Monitor stopped responding after awhile
Kefu Chai wrote:
> Frank, sorry for the latency. i am just back from the holiday. i pushed 12.2.3 + https://github.c...
Frank Li
10:07 AM Bug #22952: Monitor stopped responding after awhile
Frank, sorry for the latency. I am just back from the holiday. I pushed 12.2.3 + https://github.com/ceph/ceph/pull/20... Kefu Chai
06:06 PM Bug #22662 (Resolved): ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Nathan Cutler
06:05 PM Backport #22866 (Resolved): jewel: ceph osd df json output validation reported invalid numbers (-...
Nathan Cutler
04:03 PM Bug #21121 (Resolved): test_health_warnings.sh can fail
Nathan Cutler
04:03 PM Backport #21239 (Resolved): jewel: test_health_warnings.sh can fail
Nathan Cutler
02:09 PM Backport #23077 (Need More Info): luminous: mon: ops get stuck in "resend forwarded message to le...
This backport has two master PRs:
* https://github.com/ceph/ceph/pull/20467
* https://github.com/ceph/ceph/pull/2...
Nathan Cutler
01:14 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Hello,
I'm also having this issue...
Yoann Moulin
12:54 PM Feature #23087 (Duplicate): Add OSD metrics to keep track of per-client IO
In our online clusters, there are times when some RBD images' sizes increase rapidly, which could fill up the whole cl... Xuehan Xu
11:10 AM Bug #22413 (Fix Under Review): can't delete object from pool when Ceph out of space
https://github.com/ceph/ceph/pull/20534 Kefu Chai
09:58 AM Bug #22354 (Pending Backport): v12.2.2 unable to create bluestore osd using ceph-disk
Kefu Chai
08:17 AM Bug #23078 (Fix Under Review): SRV resolution fails to lookup AAAA records
Kefu Chai
08:09 AM Bug #23078: SRV resolution fails to lookup AAAA records
In the meantime btw, a Round Robin IPv6 DNS record works just fine, something like:... Wido den Hollander
07:35 AM Bug #23078: SRV resolution fails to lookup AAAA records
Simon Leinen wrote:
> WANG Guoqin actually noted the lack of IPv6 support in "a comment on issue #14527":http://trac...
Wido den Hollander
06:29 AM Bug #22462 (Fix Under Review): mon: unknown message type 1537 in luminous->mimic upgrade tests
https://github.com/ceph/ceph/pull/20528 Kefu Chai
05:41 AM Bug #22462: mon: unknown message type 1537 in luminous->mimic upgrade tests
MMonHealth (MSG_MON_HEALTH=0x601 (1537)) was removed in https://github.com/ceph/ceph/commit/7b4a741fbda4dc817a003c694... Kefu Chai

02/21/2018

10:46 PM Feature #14527: Lookup monitors through DNS
WANG Guoqin wrote:
> The recent code doesn't support IPv6, apparently. Maybe we can choose among ns_t_a and ns_t_aaa...
Simon Leinen
10:44 PM Bug #23078: SRV resolution fails to lookup AAAA records
WANG Guoqin actually noted the lack of IPv6 support in "a comment on issue #14527":http://tracker.ceph.com/issues/145... Simon Leinen
10:26 PM Bug #23078 (Resolved): SRV resolution fails to lookup AAAA records
We have some IPv6 Rados clusters. So far we have been specifying the addresses of each cluster's three mons using li... Simon Leinen
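For context, a minimal sketch of the resolver side (assuming glibc's res_query; the mon hostname is hypothetical): the lookup has to try ns_t_aaaa as well as ns_t_a, otherwise mons on IPv6-only networks found via SRV records resolve to nothing. Link with -lresolv.

    #include <arpa/nameser.h>
    #include <cstdio>
    #include <resolv.h>

    int main() {
      unsigned char answer[NS_PACKETSZ];
      const char* host = "mon1.example.com";  // hypothetical mon from an SRV record
      int len = res_query(host, ns_c_in, ns_t_a, answer, sizeof(answer));
      if (len < 0) {
        // No A record: an IPv6-only mon still has an AAAA record to find.
        len = res_query(host, ns_c_in, ns_t_aaaa, answer, sizeof(answer));
      }
      std::printf("raw DNS answer length: %d\n", len);
      return len < 0;
    }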
09:56 PM Support #23005: Implement rados for Python library with some problem
Does this work without pyinstaller on your system? Josh Durgin
09:54 PM Bug #23029: osd does not handle eio on meta objects (e.g., osdmap)
We could at least fail more politely here even if we can't recover from it in the short term. Josh Durgin
09:50 PM Bug #23049: ceph Status shows only WARN when traffic to cluster fails
Can reproduce easily - thanks for the report.
2 bugs here - 1) the monitor is still enforcing the mon_osd_min_up_r...
Josh Durgin
09:46 PM Support #23050 (Closed): PG doesn't move to down state in replica pool
'stale' means there haven't been any reports from the primary in a while. Since there's no osd to report the status o... Josh Durgin
09:40 PM Bug #23051: PGs stuck in down state
Can you post the results of 'ceph pg $PGID query' for some of the down pgs? Josh Durgin
09:34 PM Bug #22994: rados bench doesn't use --max-objects
rados tool options are pretty confusing - the help text should make clearer what the options are for bench vs load-gen... Josh Durgin
09:27 PM Backport #23076 (In Progress): jewel: osd: objecter sends out of sync with pg epochs for proxied ops
Nathan Cutler
09:26 PM Backport #23076 (Resolved): jewel: osd: objecter sends out of sync with pg epochs for proxied ops
https://github.com/ceph/ceph/pull/20518 Nathan Cutler
09:26 PM Backport #23077 (Resolved): luminous: mon: ops get stuck in "resend forwarded message to leader"
https://github.com/ceph/ceph/pull/21016 Nathan Cutler
09:26 PM Backport #23075 (Resolved): luminous: osd: objecter sends out of sync with pg epochs for proxied ops
https://github.com/ceph/ceph/pull/20609 Nathan Cutler
07:48 PM Bug #22114: mon: ops get stuck in "resend forwarded message to leader"
Oh, second PR for the OSD beacons and PG create messages: https://github.com/ceph/ceph/pull/20517 Greg Farnum
04:35 PM Bug #22114 (Pending Backport): mon: ops get stuck in "resend forwarded message to leader"
Sage Weil
04:34 PM Bug #22123 (Pending Backport): osd: objecter sends out of sync with pg epochs for proxied ops
Sage Weil
01:06 PM Bug #19737: EAGAIN encountered during pg scrub (jewel)
@Josh - thanks
https://github.com/ceph/ceph/pull/20508
Nathan Cutler
12:49 AM Bug #23031: FAILED assert(!parent->get_log().get_missing().is_missing(soid))

osd.0 was the primary before it crashed, came back up, and crashed again, as originally indicated in this bug. This is ...
David Zafman

02/20/2018

04:03 PM Backport #23054 (Resolved): luminous: Snapset inconsistency is no longer detected

The fix for #20243 required additional handling of snapset inconsistency. The Object info and snapset aren't part ...
David Zafman
12:26 PM Bug #23051 (New): PGs stuck in down state
Hello,
We see PGs stuck in down state even when the respective osds are started and recovered from the failure sc...
Nokia ceph-users
10:38 AM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
I can confirm this on 12.2.2. It makes data unavailable.
My output:...
Rafal Wadolowski
10:14 AM Support #23050: PG doesn't move to down state in replica pool
Please let me know which logs/info should be added, if any. Nokia ceph-users
10:13 AM Support #23050 (Closed): PG doesn't move to down state in replica pool
Hello,
Environment used - 3 node cluster
Replication - 3
#ceph osd pool ls detail
pool 16 'cdvr_ec' replica...
Nokia ceph-users
09:45 AM Backport #17445 (Resolved): jewel: list-snap cache tier missing promotion logic (was: rbd cli seg...
Nathan Cutler
09:43 AM Feature #15835 (Resolved): filestore: randomize split threshold
Nathan Cutler
09:42 AM Backport #22658 (Resolved): filestore: randomize split threshold
Nathan Cutler
09:35 AM Backport #22794 (Resolved): jewel: heartbeat peers need to be updated when a new OSD added into a...
Nathan Cutler
09:33 AM Bug #20705 (Resolved): repair_test fails due to race with osd start
Nathan Cutler
09:33 AM Backport #22818 (Resolved): jewel: repair_test fails due to race with osd start
Nathan Cutler
09:04 AM Backport #23024 (In Progress): luminous: thrash-eio + bluestore (hangs with unfound objects or re...
https://github.com/ceph/ceph/pull/20495 Prashant D
06:16 AM Backport #23024: luminous: thrash-eio + bluestore (hangs with unfound objects or read_log_and_mis...
I'm on it. Prashant D
08:55 AM Bug #23049: ceph Status shows only WARN when traffic to cluster fails
Please let me know which logs/info should be added, if any. Nokia ceph-users
08:54 AM Bug #23049 (New): ceph Status shows only WARN when traffic to cluster fails
Hello,
While using Kraken, I have seen the status change to ERR, but in Luminous we do not see the status of ceph ...
Nokia ceph-users
07:46 AM Bug #22996 (Pending Backport): Snapset inconsistency is no longer detected
https://github.com/ceph/ceph/pull/20450 Kefu Chai
05:30 AM Bug #19737: EAGAIN encountered during pg scrub (jewel)
Looked at the logs from http://pulpito.front.sepia.ceph.com/smithfarm-2018-02-06_21:07:15-rados-wip-jewel-backports-d... Josh Durgin

02/19/2018

10:59 PM Bug #18178 (Won't Fix): Unfound objects lost after OSD daemons restarted

Reasons this is being closed:
1. PG repair is moving to user mode, so on-the-fly object repair probably won't use r...
David Zafman
09:58 PM Feature #23045: mon: warn on slow ops in OpTracker
I've assigned this to myself but I don't know when I can get to it, so if you want to work on this feel free to take it! Greg Farnum
09:56 PM Feature #23045 (Resolved): mon: warn on slow ops in OpTracker
The monitor has an OpTracker now, but it doesn't warn on slow ops the way the MDS or OSD do. We should enable that to... Greg Farnum
09:52 PM Bug #23030: osd: crash during recovery with assert(p != recovery_info.ss.clone_snap)and assert(re...
This snapshot assert looks like "Ceph Luminous - pg is down due to src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)... Greg Farnum
09:02 PM Feature #23044 (New): osd: use madvise with MADV_DONTDUMP to prevent cached data from being core ...
The idea here is to reduce the size of the core dumps but also to prevent sensitive data from being leaked. Patrick Donnelly
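A minimal sketch of the proposal (assuming Linux 3.4+ and an anonymous mapping): buffers holding cached object data are marked MADV_DONTDUMP so the kernel skips them when writing a core file; MADV_DODUMP reverses it.

    #include <cstdio>
    #include <sys/mman.h>

    int main() {
      const size_t len = 1 << 20;  // stand-in for a 1 MiB object cache buffer
      void* buf = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (buf == MAP_FAILED)
        return 1;
      // Exclude the range from core dumps so cached data cannot leak there.
      if (madvise(buf, len, MADV_DONTDUMP) != 0)
        std::perror("madvise");
      munmap(buf, len);
      return 0;
    }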
02:55 PM Bug #22123 (Fix Under Review): osd: objecter sends out of sync with pg epochs for proxied ops
https://github.com/ceph/ceph/pull/20484
I opted for the marginally more complex solution of cancelling multiple o...
Sage Weil
 
