Activity

From 12/27/2017 to 01/25/2018

01/25/2018

07:59 PM Bug #20243 (Resolved): Improve size scrub error handling and ignore system attrs in xattr checking
David Zafman
07:59 PM Backport #21051 (Resolved): luminous: Improve size scrub error handling and ignore system attrs i...
David Zafman
07:58 PM Bug #21382 (Resolved): Erasure code recovery should send additional reads if necessary
David Zafman
07:56 PM Bug #22145 (Resolved): PG stuck in recovery_unfound
David Zafman
07:56 PM Bug #20059 (Resolved): miscounting degraded objects
David Zafman
07:55 PM Backport #22724 (Resolved): luminous: miscounting degraded objects
David Zafman
07:33 PM Backport #22724 (Fix Under Review): luminous: miscounting degraded objects
Included in https://github.com/ceph/ceph/pull/20055 David Zafman
07:55 PM Backport #22387 (Resolved): luminous: PG stuck in recovery_unfound
David Zafman
07:35 PM Backport #22387 (Fix Under Review): luminous: PG stuck in recovery_unfound
David Zafman
07:54 PM Backport #21653 (Resolved): luminous: Erasure code recovery should send additional reads if neces...
David Zafman
07:53 PM Backport #22069 (Resolved): luminous: osd/ReplicatedPG.cc: recover_replicas: object added to miss...
David Zafman
04:18 PM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Chang Liu wrote:
> Enrico Labedzki wrote:
> > Chang Liu wrote:
> > > Enrico Labedzki wrote:
> > > > Chang Liu wro...
Enrico Labedzki
03:45 PM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Enrico Labedzki wrote:
> Chang Liu wrote:
> > Enrico Labedzki wrote:
> > > Chang Liu wrote:
> > > > Sage Weil wro...
Chang Liu
09:40 AM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Chang Liu wrote:
> Enrico Labedzki wrote:
> > Chang Liu wrote:
> > > Sage Weil wrote:
> > > > 1. it's not valid j...
Enrico Labedzki
09:30 AM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Enrico Labedzki wrote:
> Chang Liu wrote:
> > Sage Weil wrote:
> > > 1. it's not valid json.. Formatter shouldn't ...
Chang Liu
08:52 AM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Chang Liu wrote:
> Sage Weil wrote:
> > 1. it's not valid json.. Formatter shouldn't allow it
> > 2. we should hav...
Enrico Labedzki
06:36 AM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Sage Weil wrote:
> 1. it's not valid json.. Formatter shouldn't allow it
> 2. we should have a valid value (or 0) t...
Chang Liu
04:02 AM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)

This bug has been fixed by https://github.com/ceph/ceph/pull/13531. We should backport it to Jewel.
Chang Liu
04:08 PM Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi

As Josh said it seems easier to trigger in Jewel. This is based on my attempt to reproduce in master.
All 50 ma...
David Zafman
02:22 AM Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi
Looking through the logs more with David, we found this sequence of events in 1946610:
1) osd.5 gets a write to ob...
Josh Durgin
12:45 PM Bug #22266: mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)
Master PR for second round of backporting: https://github.com/ceph/ceph/pull/19780
Luminous backport PR: https://g...
Nathan Cutler
08:44 AM Bug #22656: scrub mismatch on bytes (cache pools)
Happened here as well: http://pulpito.ceph.com/smithfarm-2018-01-24_19:46:55-rados-wip-smithfarm-testing-distro-basic... Nathan Cutler
04:24 AM Backport #22794 (In Progress): jewel: heartbeat peers need to be updated when a new OSD added int...
https://github.com/ceph/ceph/pull/20108 Kefu Chai
04:14 AM Backport #22794 (Resolved): jewel: heartbeat peers need to be updated when a new OSD added into a...
https://github.com/ceph/ceph/pull/20108 Kefu Chai
04:13 AM Backport #22793 (Rejected): osd: sends messages to marked-down peers
I wanted to backport the fix for #18004, not this one. Kefu Chai
04:12 AM Backport #22793 (Rejected): osd: sends messages to marked-down peers
the async osdmap updates introduce a new problem:
- handle_osd_map map X marks down osd Y
- pg thread uses map X-...
Kefu Chai

01/24/2018

09:18 PM Backport #21636: luminous: ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18754
merged
Yuri Weinstein
09:10 PM Bug #22329: mon: Valgrind: mon (Leak_DefinitelyLost, Leak_IndirectlyLost)
New one:
/ceph/teuthology-archive/yuriw-2018-01-23_20:26:59-multimds-wip-yuri-testing-2018-01-22-1653-luminous-tes...
Patrick Donnelly
07:56 PM Backport #22502: luminous: Pool Compression type option doesn't apply to new OSD's
Master commit was reverted - redoing the backport. Nathan Cutler
06:12 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
Disregard my previous comment; different error message for the same assert was unfortunately buried in the logs. Sorr... Joao Eduardo Luis
06:04 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
FWIW, I am currently reproducing this quite reliably on my dev env, on a quite outdated version of master (cbe78ae629... Joao Eduardo Luis
01:55 PM Bug #21407 (Resolved): backoff causes out of order op
Nathan Cutler
01:54 PM Backport #21794 (Resolved): luminous: backoff causes out of order op
Nathan Cutler
11:23 AM Backport #22450 (In Progress): luminous: Visibility for snap trim queue length
https://github.com/ceph/ceph/pull/20098 Piotr Dalek

01/23/2018

11:57 PM Bug #21566 (Resolved): OSDService::recovery_need_sleep read+updated without locking
Nathan Cutler
11:57 PM Backport #21697 (Resolved): luminous: OSDService::recovery_need_sleep read+updated without locking
Nathan Cutler
11:06 PM Backport #21697: luminous: OSDService::recovery_need_sleep read+updated without locking
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18753
merged
Yuri Weinstein
11:56 PM Backport #21785 (Resolved): luminous: OSDMap cache assert on shutdown
Nathan Cutler
11:07 PM Backport #21785: luminous: OSDMap cache assert on shutdown
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18749
merged
Yuri Weinstein
11:55 PM Bug #21845 (Resolved): Objecter::_send_op unnecessarily constructs costly hobject_t
Nathan Cutler
11:55 PM Backport #21921 (Resolved): luminous: Objecter::_send_op unnecessarily constructs costly hobject_t
Nathan Cutler
11:09 PM Backport #21921: luminous: Objecter::_send_op unnecessarily constructs costly hobject_t
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18745
merged
Yuri Weinstein
11:54 PM Backport #21922 (Resolved): luminous: Objecter::C_ObjectOperation_sparse_read throws/catches exce...
Nathan Cutler
11:10 PM Backport #21922: luminous: Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -...
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18744
merged
Yuri Weinstein
11:25 PM Bug #21818 (Resolved): ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic/1 (filestore) ...
Nathan Cutler
11:25 PM Backport #21924 (Resolved): luminous: ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic...
Nathan Cutler
11:10 PM Backport #21924: luminous: ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic/1 (filesto...
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18742
merged
Yuri Weinstein
08:30 PM Backport #22423 (Closed): luminous: osd: initial minimal efforts to clean up PG interface
I was able to cleanly backport http://tracker.ceph.com/issues/22069 without this large change. David Zafman
11:01 AM Bug #22351: Couldn't init storage provider (RADOS)
No, I set it to Luminous based on the request by theanalyst in https://github.com/ceph/ceph/pull/20023. I'm fine with... Brad Hubbard
10:24 AM Bug #22351: Couldn't init storage provider (RADOS)
@Brad Assigning to you and leaving the backport field on "luminous" (but feel free to zero it out if it's enough to m... Nathan Cutler
10:14 AM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
@David I can only guess that this is not reproducible in master and that's why it requires a luminous-only fix. Could... Nathan Cutler
10:01 AM Backport #22761 (In Progress): luminous: osd checks out-of-date osdmap for DESTROYED flag on start
Nathan Cutler
09:40 AM Backport #22761 (Resolved): luminous: osd checks out-of-date osdmap for DESTROYED flag on start
https://github.com/ceph/ceph/pull/20068 Nathan Cutler
07:48 AM Bug #22673 (Pending Backport): osd checks out-of-date osdmap for DESTROYED flag on start
Kefu Chai
06:38 AM Bug #22727: "osd pool stats" shows recovery information incorrectly
Need to backport this to jewel and luminous, but the bug dates back at least to 9.2.0. See also http://lists.ceph.com/piperm... Kefu Chai
06:32 AM Bug #22727 (Fix Under Review): "osd pool stats" shows recovery information incorrectly
Kefu Chai

01/22/2018

11:50 PM Bug #22419 (Pending Backport): Pool Compression type option doesn't apply to new OSD's
Sage Weil
08:12 AM Bug #22419 (Fix Under Review): Pool Compression type option doesn't apply to new OSD's
https://github.com/ceph/ceph/pull/20044 Kefu Chai
11:46 PM Bug #22711 (Resolved): qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands: expect...
Sage Weil
12:53 PM Bug #22711 (Fix Under Review): qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands:...
https://github.com/ceph/ceph/pull/20046 Kefu Chai
11:06 AM Bug #22711: qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands: expect_false test...
the weirdness of this issue is that some PGs are mapped to a single OSD:... Kefu Chai
03:13 AM Bug #22711: qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands: expect_false test...
the curr_object_copies_rate value in PGMap.cc dump_object_stat_sum is .5, which is counteracting the 2x replication f... Sage Weil
07:04 PM Bug #22752: snapmapper inconsistency, crash on luminous
https://github.com/ceph/ceph/pull/20040 Sage Weil
07:03 PM Bug #22752 (Resolved): snapmapper inconsistency, crash on luminous
from Stefan Priebe on ceph-devel ML:... Sage Weil
06:47 PM Backport #22387 (In Progress): luminous: PG stuck in recovery_unfound

Included with another dependent backport as https://github.com/ceph/ceph/pull/20055
David Zafman
12:40 PM Backport #22387 (Need More Info): luminous: PG stuck in recovery_unfound
Non-trivial backport Nathan Cutler
02:27 PM Feature #22750 (Fix Under Review): libradosstriper conditional compile
-https://github.com/ceph/ceph/pull/18197- Nathan Cutler
01:21 PM Feature #22750 (Resolved): libradosstriper conditional compile
Currently libradosstriper is a hard dependency of the rados CLI tool.
Please add a "WITH_LIBRADOSSTRIPER" compile-...
Nathan Cutler
02:16 PM Bug #22746 (Fix Under Review): osd/common: ceph-osd process is terminated by the logrotate task
John Spray
11:51 AM Bug #22746 (Resolved): osd/common: ceph-osd process is terminated by the logrotate task
1. Steps to reproduce:
(1) Step 1:
Open terminal 1 and
prepare the cmd: "killall -q -1 ceph-mon ...
huanwen ren
12:59 PM Support #22749 (Closed): dmClock OP classification
Why does the dmClock algorithm in Ceph attribute recovery's read and write OPs to osd_op_queue_mclock_osd_sub, so that whe... 何 伟俊
12:41 PM Backport #22724 (Need More Info): luminous: miscounting degraded objects
Nathan Cutler
12:41 PM Backport #22724: luminous: miscounting degraded objects
David, while you're doing this one, can you include https://tracker.ceph.com/issues/22387 as well? Nathan Cutler
12:23 PM Support #22680 (Resolved): mons segmentation faults New 12.2.2 cluster
Nathan Cutler
03:04 AM Bug #22715 (Pending Backport): log entries weirdly zeroed out after 'osd pg-temp' command
Kefu Chai
03:04 AM Backport #22744 (In Progress): luminous: log entries weirdly zeroed out after 'osd pg-temp' command
https://github.com/ceph/ceph/pull/20042 Kefu Chai
03:03 AM Backport #22744 (Resolved): luminous: log entries weirdly zeroed out after 'osd pg-temp' command
https://github.com/ceph/ceph/pull/20042 Kefu Chai

01/21/2018

08:29 PM Bug #22715 (Resolved): log entries weirdly zeroed out after 'osd pg-temp' command
Sage Weil
06:56 PM Bug #22743 (New): "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-sm...
Run: http://pulpito.ceph.com/teuthology-2018-01-19_01:15:02-upgrade:hammer-x-jewel-distro-basic-smithi/
Job: 2088826...
Yuri Weinstein

01/20/2018

11:18 PM Bug #22351 (In Progress): Couldn't init storage provider (RADOS)
Reopening this and reassigning it to RADOS as there are a couple of changes we can make to logging to make this easie... Brad Hubbard

01/19/2018

04:16 PM Support #20108: PGs are not remapped correctly when one host fails
Hi,
Thank you for your answer!
I've seen that page before, but which tunable are you suggesting for the problem...
Laszlo Budai
09:59 AM Bug #22233 (Fix Under Review): prime_pg_temp breaks on uncreated pgs
https://github.com/ceph/ceph/pull/20025 Kefu Chai
09:08 AM Bug #22711: qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands: expect_false test...
... Chang Liu
02:51 AM Support #22553: ceph-object-tool can not remove metadata pool's object
Nothing is wrong with the disk; the problem is repeatable. peng zhang

01/18/2018

10:57 PM Support #20108: PGs are not remapped correctly when one host fails
http://docs.ceph.com/docs/master/rados/operations/crush-map/?highlight=tunables#tunables Greg Farnum
07:02 PM Bug #22351 (Closed): Couldn't init storage provider (RADOS)
Yehuda Sadeh
10:47 AM Bug #22351: Couldn't init storage provider (RADOS)
Brad Hubbard wrote:
>
> (6*1024)*3 = 18432, thus 18432/47 ~ 392 PGs per OSD. You omitted the size of the pools.
...
Nikos Kormpakis
03:21 AM Bug #22351: Couldn't init storage provider (RADOS)
https://ceph.com/pgcalc/ should be used as a guide/starting point. Brad Hubbard
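The calculator Brad links encodes a simple heuristic; a minimal sketch, assuming the commonly cited target of roughly 100 PGs per OSD (the real tool also weights pools by expected usage, so treat this as an approximation):

```python
# Rough pg_num suggestion: (OSDs * target-per-OSD) / pool size,
# rounded up to the next power of two. Mirrors the pgcalc idea,
# not the tool itself.
def suggested_pg_num(num_osds, pool_size, target_per_osd=100):
    raw = num_osds * target_per_osd / pool_size
    power = 1
    while power < raw:
        power *= 2
    return power

# For the 47-OSD cluster discussed above, with 3x replication:
print(suggested_pg_num(num_osds=47, pool_size=3))  # 2048
```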
03:07 PM Bug #22727: "osd pool stats" shows recovery information bugly
https://github.com/ceph/ceph/pull/20009 Chang Liu
05:18 AM Bug #22727 (In Progress): "osd pool stats" shows recovery information bugly
Chang Liu
03:16 AM Bug #22727 (Resolved): "osd pool stats" shows recovery information incorrectly
... Chang Liu
03:51 AM Bug #22715 (Fix Under Review): log entries weirdly zeroed out after 'osd pg-temp' command
https://github.com/ceph/ceph/pull/19998 Sage Weil

01/17/2018

10:28 PM Bug #22351: Couldn't init storage provider (RADOS)
Nikos Kormpakis wrote:
> But I still cannot understand why I'm hitting this error.
> Regarding my cluster, I have t...
Brad Hubbard
01:15 PM Bug #22351: Couldn't init storage provider (RADOS)
Brad Hubbard wrote:
> I'm able to reproduce something like what you are seeing, the messages are a little different....
Nikos Kormpakis
03:30 AM Bug #22351: Couldn't init storage provider (RADOS)
I'm able to reproduce something like what you are seeing, the messages are a little different.
What I see is this....
Brad Hubbard
12:12 AM Bug #22351: Couldn't init storage provider (RADOS)
It turns out what we need is the hexadecimal int representation of '-34' from the ltrace output.
$ c++filt </tmp/l...
Brad Hubbard
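For reference, the two's-complement form Brad is grepping for can be computed directly (a quick illustration, not part of the original debugging session):

```python
# -ERANGE is -34; in a 32-bit register an ltrace dump shows its
# two's-complement hexadecimal form. Compute the value to search for.
ERANGE = 34
hex_form = hex(-ERANGE & 0xFFFFFFFF)
print(hex_form)  # 0xffffffde
```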
10:26 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Ryan Anstey wrote:
> I'm working on fixing all my inconsistent pgs but I'm having issues with rados get... hopefully...
Brian Andrus
09:07 PM Bug #22656: scrub mismatch on bytes (cache pools)
/a/sage-2018-01-17_14:40:55-rados-wip-sage-testing-2018-01-16-2156-distro-basic-smithi/2082959
description: rados/...
Sage Weil
07:54 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
David Zafman
07:48 PM Bug #20059: miscounting degraded objects
https://github.com/ceph/ceph/pull/19850 David Zafman
07:36 PM Bug #21387 (Can't reproduce): mark_unfound_lost hangs
Multiple fixes to mark_all_unfound_lost() have fixed this. Possibly the most important master branch commit is 689bff... David Zafman
06:00 PM Bug #22668 (Fix Under Review): osd/ExtentCache.h: 371: FAILED assert(tid == 0)
https://github.com/ceph/ceph/pull/19989 Sage Weil
05:10 PM Backport #22724 (Resolved): luminous: miscounting degraded objects
on bigbang,... David Zafman
04:39 PM Bug #22673 (Fix Under Review): osd checks out-of-date osdmap for DESTROYED flag on start
note: you can work around this by waiting a bit until some osd maps trim from the monitor.
https://github.com/ceph...
Sage Weil
02:54 PM Bug #22673: osd checks out-of-date osdmap for DESTROYED flag on start
It looks like the _preboot destroyed check should go after we catch up on maps. Sage Weil
02:53 PM Bug #22673: osd checks out-of-date osdmap for DESTROYED flag on start
This is a real bug, should be straightforward to fix. Thanks for the report! Sage Weil
02:59 PM Bug #22544: objecter cannot resend split-dropped op when racing with con reset
Hmm, I'm not sure what the best fix is. Do you see a good path to fixing this with ms_handle_connect()? Sage Weil
02:57 PM Bug #22659 (In Progress): During the cache tiering configuration ,ceph-mon daemon getting crashed...
This will need to be backported to luminous and jewel once merged. Joao Eduardo Luis
09:36 AM Bug #22659: During the cache tiering configuration ,ceph-mon daemon getting crashed after setting...
https://github.com/ceph/ceph/pull/19983 Jing Li
02:55 PM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
1. it's not valid json.. Formatter shouldn't allow it
2. we should have a valid value (or 0) to use
Sage Weil
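A quick demonstration of Sage's first point: a bare -nan token is rejected by standard JSON parsers (the field name below is hypothetical, standing in for the `ceph osd df` fields affected):

```python
import json

# The broken output embeds a bare -nan token where a number belongs,
# e.g.: {"utilization": -nan}. Standard parsers reject it, which is
# what the monitoring script in this ticket tripped over.
payload = '{"utilization": -nan}'

try:
    json.loads(payload)
    parsed = True
except ValueError:  # JSONDecodeError subclasses ValueError
    parsed = False

print(parsed)  # False: -nan is not valid JSON
```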
02:52 PM Bug #22661 (Triaged): Segmentation fault occurs when the following CLI is executed
Joao Eduardo Luis
02:51 PM Bug #22672 (Triaged): OSDs frequently segfault in PrimaryLogPG::find_object_context() with empty ...
Joao Eduardo Luis
02:28 PM Bug #22597 (Fix Under Review): "sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-0'" fails in upgra...
https://github.com/ceph/ceph/pull/19987 Kefu Chai
01:32 PM Bug #22233 (In Progress): prime_pg_temp breaks on uncreated pgs
Kefu Chai
11:24 AM Support #22664: some random OSD are down (with a Abort signal on exception) after replace/rebuild...
Hi Greg,
can you point me to the link? As far as we have seen so far, all ulimits are 10 times higher than needed on all nodes....
Enrico Labedzki

01/16/2018

09:49 PM Bug #22715 (Resolved): log entries weirdly zeroed out after 'osd pg-temp' command
... Sage Weil
07:59 PM Bug #20059 (Pending Backport): miscounting degraded objects
David Zafman
07:10 PM Bug #22711 (Resolved): qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands: expect...
... Sage Weil
07:09 PM Bug #22677 (Resolved): rados/test_rados_tool.sh failure
Sage Weil
04:16 PM Bug #22351: Couldn't init storage provider (RADOS)
Hello,
we're facing the same issue on a Luminous cluster.
Some info about the cluster:
Version: ceph version 1...
Nikos Kormpakis
03:08 PM Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second...
/a/sage-2018-01-16_03:08:54-rados-wip-sage2-testing-2018-01-15-1257-distro-basic-smithi/2077982... Sage Weil
01:33 PM Backport #22707 (In Progress): luminous: ceph_objectstore_tool: no flush before collection_empty(...
Nathan Cutler
01:30 PM Backport #22707 (Resolved): luminous: ceph_objectstore_tool: no flush before collection_empty() c...
https://github.com/ceph/ceph/pull/19967 Nathan Cutler
01:21 PM Bug #22409 (Pending Backport): ceph_objectstore_tool: no flush before collection_empty() calls; O...
Sage Weil
12:53 PM Support #20108: PGs are not remapped correctly when one host fails
Hello,
I'm sorry I've missed your message. Can you please give me some clues about the "newer crush tunables" that...
Laszlo Budai
12:48 PM Bug #22668: osd/ExtentCache.h: 371: FAILED assert(tid == 0)
/a/sage-2018-01-15_18:49:16-rados-wip-sage-testing-2018-01-14-1341-distro-basic-smithi/2076047 Sage Weil
12:48 PM Bug #22668: osd/ExtentCache.h: 371: FAILED assert(tid == 0)
/a/sage-2018-01-15_18:49:16-rados-wip-sage-testing-2018-01-14-1341-distro-basic-smithi/2075822 Sage Weil
11:10 AM Support #22680: mons segmentation faults New 12.2.2 cluster
Thanks! We had jemalloc in LD_PRELOAD since Infernalis, so i didn't think about that. I removed this from sysconfig, ... Kenneth Waegeman

01/15/2018

07:26 PM Feature #22442: ceph daemon mon.id mon_status -> ceph daemon mon.id status
Joao, did mon_status just precede the other status commands, or was there a reason for them to be different? Greg Farnum
07:22 PM Bug #22486: ceph shows wrong MAX AVAIL with hybrid (chooseleaf firstn 1, chooseleaf firstn -1) CR...
Well, the hybrid ruleset isn't giving you as much host isolation as you're probably thinking, since it can select an ... Greg Farnum
07:11 PM Support #22664 (Closed): some random OSD are down (with a Abort signal on exception) after replac...
It's failing to create a new thread. You probably need to bump the ulimit; this is discussed in the documentation. :) Greg Farnum
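For anyone hitting the same abort, a minimal check along the lines Greg suggests (the limit values in the comment are illustrative, not recommendations from this ticket):

```shell
# Check the per-process limits that commonly cause
# "failed to create thread" aborts on OSD nodes.
ulimit -u    # max user processes/threads
ulimit -n    # max open file descriptors

# To raise them persistently, entries like the following in
# /etc/security/limits.conf are one common approach (values
# are illustrative):
#   ceph  soft  nproc   32768
#   ceph  soft  nofile  65536
```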
07:08 PM Support #22680: mons segmentation faults New 12.2.2 cluster
This is buried in the depths of RocksDB doing IO, so the only causes I know of/can think of are
1) you've found an u...
Greg Farnum
10:39 AM Support #22680 (Resolved): mons segmentation faults New 12.2.2 cluster

Hi all,
I installed a new Luminous 12.2.2 cluster. The monitors were up at first, but quickly started failing, s...
Kenneth Waegeman
05:48 PM Backport #22387: luminous: PG stuck in recovery_unfound
Include commit 64047e1 "osd: Don't start recovery for missing until active pg state set" from https://github.com/ceph... David Zafman
11:00 AM Support #22531: OSD flapping under repair/scrub after receiving inconsistent PG LFNIndex.cc: 439: F...
Josh Durgin wrote:
> Can you provide a directory listing for pg 1.f? It seems a file that does not obey the internal...
Jan Michlik
06:12 AM Bug #22351: Couldn't init storage provider (RADOS)
Brad Hubbard wrote:
> If this is a RADOS function returning ERANGE (34) then it should be possible to find it by att...
Amine Liu
05:05 AM Bug #22351: Couldn't init storage provider (RADOS)
If this is a RADOS function returning ERANGE (34) then it should be possible to find it by attempting to start the ra... Brad Hubbard
03:26 AM Bug #20059 (Fix Under Review): miscounting degraded objects
David Zafman
02:56 AM Bug #22668: osd/ExtentCache.h: 371: FAILED assert(tid == 0)
/a//kchai-2018-01-11_06:11:31-rados-wip-kefu-testing-2018-01-11-1036-distro-basic-mira/2058373/remote/mira002/log/cep... Kefu Chai

01/14/2018

10:46 PM Bug #22672: OSDs frequently segfault in PrimaryLogPG::find_object_context() with empty clone_snap...
To (relatively) stabilise the frequently crashing OSDs, we've added an early -ENOENT return to PrimaryLogPG::find_obj... David Disseldorp
04:37 PM Bug #22677: rados/test_rados_tool.sh failure
https://github.com/ceph/ceph/pull/19946 Sage Weil

01/13/2018

03:54 PM Bug #22677 (Resolved): rados/test_rados_tool.sh failure
... Sage Weil

01/12/2018

10:43 PM Bug #22438 (Resolved): mon: leak in lttng dlopen / __tracepoints__init
Patrick Donnelly
06:29 AM Bug #22438: mon: leak in lttng dlopen / __tracepoints__init
https://github.com/ceph/teuthology/pull/1144 Kefu Chai
10:23 PM Bug #22672: OSDs frequently segfault in PrimaryLogPG::find_object_context() with empty clone_snap...
That looks like a good way to investigate. We've seen a few reports of issues with cache tier snapshots since that re... Greg Farnum
02:54 PM Bug #22672: OSDs frequently segfault in PrimaryLogPG::find_object_context() with empty clone_snap...
to detect this case during scrub, I'm currently testing the following change:
-https://github.com/ddiss/ceph/commit/...
David Disseldorp
12:55 AM Bug #22672 (Triaged): OSDs frequently segfault in PrimaryLogPG::find_object_context() with empty ...
Environment is a Luminous cache-tiered deployment with some of the hot-tier OSDs converted to bluestore. The remainin... David Disseldorp
07:38 PM Bug #22063: "RadosModel.h: 1703: FAILED assert(!version || comp->get_version64() == version)" inr...
Also in http://qa-proxy.ceph.com/teuthology/teuthology-2017-11-17_18:17:24-rados-jewel-distro-basic-smithi/1857527/te... David Zafman
07:36 PM Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi
Yuri Weinstein wrote:
> Also in http://qa-proxy.ceph.com/teuthology/teuthology-2017-11-17_18:17:24-rados-jewel-distr...
David Zafman
07:18 PM Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi
As 17815 has to do with when scrub is allowed to start, it wouldn't be related to this bug. David Zafman
01:03 PM Bug #22673 (Resolved): osd checks out-of-date osdmap for DESTROYED flag on start
When trying an in-place migration of a filestore to bluestore OSD, we encountered a situation where ceph-osd would re... J Mozdzen
07:45 AM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
i am rerunning the failed test at http://pulpito.ceph.com/kchai-2018-01-12_07:44:06-multimds-wip-pdonnell-testing-201... Kefu Chai
07:29 AM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
i agree it's a bug in osd. but i don't think osd should return -ENOENT in this case. as Sage pointed out, it should c... Kefu Chai
01:15 AM Bug #22351: Couldn't init storage provider (RADOS)
Abhishek Lekshmanan wrote:
> can you tell us the ceph pg num and pgp num setting in ceph.conf (or rather paste teh c...
Amine Liu

01/11/2018

09:43 PM Bug #22668 (Resolved): osd/ExtentCache.h: 371: FAILED assert(tid == 0)
... Sage Weil
06:52 PM Bug #22351: Couldn't init storage provider (RADOS)
can you tell us the ceph pg num and pgp num setting in ceph.conf (or rather paste the ceph.conf, redacting sensitive ... Abhishek Lekshmanan
04:05 PM Bug #22561: PG stuck during recovery, requires OSD restart
OSD 32 was running and actively serving client IO. Paul Emmerich
02:39 PM Support #22664 (Closed): some random OSD are down (with a Abort signal on exception) after replac...
Hello,
currently we are facing a strange behavior, where some OSDs randomly go down with an Abort signal,...
Enrico Labedzki
12:57 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Recovery from non-starting OSDs in this case is as follows. Run the OSD with debug:... Zdenek Janda
10:55 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Also several osds (as you can see the ceph osd tree output) are getting dumped out of the crush map. After putting th... Michal Cila
10:44 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
More info on affected PG... Zdenek Janda
10:39 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
I have succeeded in identifying faulty PG:... Zdenek Janda
10:17 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Adding last 10000 lines of strace of OSD affected by the bug.
The ABRT signal is generated right after ...
Zdenek Janda
09:45 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
also adding our current ceph -s/ceph osd tree state:... Josef Zelenka
09:44 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
We are also affected by this bug. We are running luminous 12.2.2 on Ubuntu 16.04, 3-node cluster, 8 HDDs per node, bl... Josef Zelenka
10:30 AM Bug #22662 (Resolved): ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Hi,
we have a monitoring script which parses the 'ceph osd df -f json' output, but from time to time it will happe...
Enrico Labedzki
08:36 AM Bug #22661 (Triaged): Segmentation fault occurs when the following CLI is executed
Observation:
--------------
It is observed that when a user executes the CLI without providing the value of osd-u...
Debashis Mondal
07:34 AM Bug #22659 (In Progress): During the cache tiering configuration ,ceph-mon daemon getting crashed...
Observation:
--------------
Before setting the value of "hit_set_count" Ceph health was OK but after configuring th...
Debashis Mondal
02:54 AM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
The OSD should reply -ENOENT in that case; this should be an OSD bug. Zheng Yan

01/10/2018

11:38 PM Bug #22351: Couldn't init storage provider (RADOS)
Related to the ERROR: failed to initialize watch: (34) Numerical result out of range, it looks like a class path issue. Th... Javier M. Mellid
11:38 PM Backport #22658 (In Progress): filestore: randomize split threshold
Josh Durgin
10:39 PM Backport #22658 (Resolved): filestore: randomize split threshold
https://github.com/ceph/ceph/pull/19906 Josh Durgin
10:16 PM Feature #15835 (Pending Backport): filestore: randomize split threshold
Josh Durgin
10:03 PM Support #22531: OSD flapping under repair/scrub after receiving inconsistent PG LFNIndex.cc: 439: F...
Can you provide a directory listing for pg 1.f? It seems a file that does not obey the internal naming rules of files... Josh Durgin
09:48 PM Bug #22561: PG stuck during recovery, requires OSD restart
Was OSD 32 running at the time? It sounds like correct behavior if OSD 32 was not reachable. It might have been marke... Josh Durgin
09:44 PM Support #22566: Some osd remain 100% CPU after upgrade jewel => luminous (v12.2.2) and some work
This is likely the one-time startup cost of accounting for a bug in omap, where the osd has to scan the whole omap ... Josh Durgin
09:39 PM Bug #22597: "sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-0'" fails in upgrade test
IIRC we didn't have the ceph user in hammer - need to account for that in the suite if we want to keep running it at ... Josh Durgin
09:36 PM Bug #22641 (Resolved): uninit condition in PrimaryLogPG::process_copy_chunk_manifest
Josh Durgin
09:22 PM Bug #22641: uninit condition in PrimaryLogPG::process_copy_chunk_manifest
myoungwon oh wrote:
> https://github.com/ceph/ceph/pull/19874
merged
Yuri Weinstein
09:22 PM Bug #22656 (New): scrub mismatch on bytes (cache pools)
... Sage Weil
09:21 PM Bug #21557: osd.6 found snap mapper error on pg 2.0 oid 2:0e781f33:::smithi14431805-379 ... :187 ...
/a/yuriw-2018-01-09_21:50:35-rados-wip-yuri2-testing-2018-01-09-1813-distro-basic-smithi/2050823
another one.
<...
Sage Weil
09:01 PM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
/a/yuriw-2018-01-09_21:50:35-rados-wip-yuri2-testing-2018-01-09-1813-distro-basic-smithi/2050802
Sage Weil
03:34 PM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
https://github.com/ceph/ceph/pull/19759 Kefu Chai
03:33 PM Bug #22539 (Pending Backport): bluestore: New OSD - Caught signal - bstore_kv_sync
Kefu Chai
02:56 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
That would be an fs bug, sure.
However, shouldn't the OSD avoid asserting when an object doesn't exist?
Patrick Donnelly
02:48 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
I think the problem here is that the object doesn't exist but we're doing omap_setkeys on it.. which doesn't implicit... Sage Weil
08:57 AM Bug #22438 (Fix Under Review): mon: leak in lttng dlopen / __tracepoints__init
https://github.com/ceph/teuthology/pull/1143 Kefu Chai
08:16 AM Bug #22525 (Fix Under Review): auth: ceph auth add does not sanity-check caps
Jos Collin

01/09/2018

10:39 PM Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi
Actually, I may have seen an instance of the failure in a run that did not include 17815, so please don't take what I... Nathan Cutler
05:49 PM Bug #21557: osd.6 found snap mapper error on pg 2.0 oid 2:0e781f33:::smithi14431805-379 ... :187 ...
Not 100% sure if that's the same issue but we have a customer who faces an assert in SnapMapper::get_snaps()
2018-01...
Igor Fedotov
04:02 PM Bug #22641: uninit condition in PrimaryLogPG::process_copy_chunk_manifest
https://github.com/ceph/ceph/pull/19874 Myoungwon Oh
02:43 PM Bug #22641 (Resolved): uninit condition in PrimaryLogPG::process_copy_chunk_manifest
... Sage Weil
03:54 PM Bug #22278: FreeBSD fails to build with WITH_SPDK=ON
patch merged in DPDK. waiting for SPDK to pick up the latest DPDK. Kefu Chai
03:49 PM Support #22520 (Closed): nearfull threshold is not cleared when osd really is not nearfull.
You need to change this in the osd map, not the config. "ceph osd set-nearfull-ratio" or something similar. Greg Farnum
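A minimal sketch of what adjusting the osdmap ratios might look like (the ratio values here are assumptions, not recommendations; pick values appropriate for your cluster):

```shell
# Set the nearfull/full thresholds stored in the osdmap (not ceph.conf).
ceph osd set-nearfull-ratio 0.85
ceph osd set-full-ratio 0.95

# Verify the ratios the osdmap now carries.
ceph osd dump | grep ratio
```

These commands change cluster state, so they require a running cluster and admin credentials.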
02:59 PM Bug #22409 (Resolved): ceph_objectstore_tool: no flush before collection_empty() calls; ObjectSto...
Kefu Chai
01:52 AM Bug #22351: Couldn't init storage provider (RADOS)
Orit Wasserman wrote:
> what is your pool configuration?
all default, just a default pool 'rbd'.
Amine Liu

01/08/2018

11:54 PM Bug #22624 (Duplicate): filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No suc...
... Patrick Donnelly
12:35 PM Bug #22409 (Fix Under Review): ceph_objectstore_tool: no flush before collection_empty() calls; O...
Igor Fedotov
12:35 PM Bug #22409: ceph_objectstore_tool: no flush before collection_empty() calls; ObjectStore/StoreTes...
https://github.com/ceph/ceph/pull/19764 Igor Fedotov
08:21 AM Bug #22409: ceph_objectstore_tool: no flush before collection_empty() calls; ObjectStore/StoreTes...
Sage, I am taking this ticket from you, as it's simple enough and it won't cause too much duplication of effort.
...
Kefu Chai
07:22 AM Bug #22415 (Duplicate): 'pg dump' fails after mon rebuild
Kefu Chai

01/06/2018

01:29 AM Bug #22220: osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at dwarf2out....
For DTS this should be fixed in the 7.1 release. Brad Hubbard
12:35 AM Bug #20439: PG never finishes getting created
Same thing in http://pulpito.ceph.com/yuriw-2018-01-04_20:43:14-rados-wip-yuri4-testing-2018-01-04-1750-distro-basic-... Josh Durgin

01/05/2018

03:57 PM Bug #22597 (Resolved): "sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-0'" fails in upgrade test
http://pulpito.ceph.com/kchai-2018-01-05_15:34:38-upgrade-wip-kefu-testing-2018-01-04-1836-distro-basic-mira/
<pre...
Kefu Chai
09:51 AM Bug #22525: auth: ceph auth add does not sanity-check caps
-https://github.com/ceph/ceph/pull/19794- Jing Li

01/04/2018

07:13 PM Bug #22351 (Need More Info): Couldn't init storage provider (RADOS)
what is your pool configuration? Orit Wasserman
02:42 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
So my OSDs had the default Bluestore layout the first time around, i.e. a 100MB DB/WAL (xfs) partition followed by th... Jon Heese
07:06 AM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
Jon Heese wrote:
> Unfortunately, `ceph-disk zap /dev/sde` does not wipe enough of the disk to avoid this issue. As...
Hua Liu
02:35 PM Bug #22266 (Pending Backport): mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)
Kefu Chai
02:32 PM Bug #22266 (Resolved): mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)
Sage Weil
01:23 PM Bug #22266 (Fix Under Review): mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)
http://tracker.ceph.com/issues/22266 Kefu Chai
01:52 PM Support #22566 (New): Some osd remain 100% CPU after upgrade jewel => luminous (v12.2.2) and some...
h1. I have some OSDs that remain at 100% startup without any debug info in the logs :... David Casier
07:12 AM Support #22422: Block fsid does not match our fsid
See, [[http://tracker.ceph.com/issues/22354]] Hua Liu
01:07 AM Bug #22561 (New): PG stuck during recovery, requires OSD restart
We are sometimes encountering issues with PGs getting stuck in recovery.
For example, we ran some stress tests wit...
Paul Emmerich

01/03/2018

09:28 PM Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi
So Nathan seems to have narrowed it down to https://github.com/ceph/ceph/pull/17815 - can you look at this when you'r... Josh Durgin
09:23 PM Support #22422: Block fsid does not match our fsid
It looks like you may have had a partial prepare there in the past - if you're sure it's the right disk, wipe it with... Josh Durgin
09:22 PM Bug #22438 (Resolved): mon: leak in lttng dlopen / __tracepoints__init
Josh Durgin
09:17 PM Support #22466 (Closed): PG failing to map to any OSDs
Josh Durgin
09:08 PM Support #22553: ceph-object-tool can not remove metadata pool's object
Is there possibly something wrong with that disk? Josh Durgin
03:28 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
Jon Heese wrote:
> Unfortunately, `ceph-disk zap /dev/sde` does not wipe enough of the disk to avoid this issue. As...
Curt Bruns
01:41 AM Bug #22346 (Resolved): OSD_ORPHAN issues after jewel->luminous upgrade, but orphaned osds not in ...
Not for me.
$ crushtool -d crushmap.bad -o crushmap.bad.txt
$ crushtool -d crushmap.good -o crushmap.good.txt
$ ...
Brad Hubbard

01/02/2018

09:03 PM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
Alright, that fixed it!
It also fixed the heavy IO issue as well as the rather large amount of consumption I was s...
Brian Woods
06:20 PM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
Sorry for the spam.
That broke it good!!!...
Brian Woods
06:15 PM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
Was able to out them all:... Brian Woods
06:14 PM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
I can't mark the OSDs out.... Brian Woods
03:42 PM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
Hard to say exactly, but I would not be surprised to see any manner of odd behaviors with a huge map like that--we ha... Sage Weil
04:28 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
Unfortunately, `ceph-disk zap /dev/sde` does not wipe enough of the disk to avoid this issue. As I mentioned above, ... Jon Heese
01:01 PM Support #22553 (New): ceph-object-tool can not remove metadata pool's object
I put an object into the rbd pool
rados -p rbd put qinli.sh
then stop osd and remove it
[root@lab71 ~]# ceph-objec...
peng zhang
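The steps in the report above can be sketched roughly as follows (the OSD id, data path, and PG are assumptions for illustration; the OSD must be stopped before ceph-objectstore-tool touches its store):

```shell
# Store an object in the rbd pool (object name taken from the report).
rados -p rbd put qinli.sh ./qinli.sh

# Stop the OSD that holds a copy, then remove the object offline.
systemctl stop ceph-osd@0
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 qinli.sh remove
```

Whether this works for metadata-pool objects is exactly what the ticket questions; the sketch only restates the reported procedure.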

12/31/2017

11:13 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
I'm working on fixing all my inconsistent pgs but I'm having issues with rados get... hopefully I'm just doing the co... Ryan Anstey

12/30/2017

02:30 AM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
I had no idea the ID would impact the map calculations that way (makes sense now)!!! Very good to know! And those I... Brian Woods

12/29/2017

10:34 PM Bug #22539 (In Progress): bluestore: New OSD - Caught signal - bstore_kv_sync
Brian, note that one reason why this triggered is that your osdmap is huge... because you have some osds with very la... Sage Weil

12/28/2017

11:02 PM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
I'm a bit lost, hence trying to re-arrange things:
Let's handle the crash first.
IMO it's caused by throttle value...
Igor Fedotov

12/27/2017

04:46 AM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
A chunk from the mon log:
https://pastebin.com/MA1BStEc
Some screenshots of the IO:
https://imgur.com/a/BOKWc
...
Brian Woods
04:29 AM Bug #22544 (Resolved): objecter cannot resend split-dropped op when racing with con reset
@
if (split && con && con->has_features(CEPH_FEATUREMASK_RESEND_ON_SPLIT)) {
return RECALC_OP_TARGET_NEED_RES...
mingxin liu
 

Also available in: Atom