Activity

From 01/16/2018 to 02/14/2018

02/14/2018

10:02 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
I'm seeing this on Luminous. Some kRBD clients are sending requests of death killing the active monitor.
No special ...
Paul Emmerich
08:30 PM Bug #22462: mon: unknown message type 1537 in luminous->mimic upgrade tests
@Kefu could you please take a look? Yuri Weinstein
05:48 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
ok, I'll wait for 12.2.4 or a 12.2.3 + the patch then. Frank Li
09:10 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Frank Li wrote:
> just curious, I saw this patch got merged to the master branch and has the target version of 12.2....
Nathan Cutler
06:51 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
just curious, I saw this patch got merged to the master branch and has the target version of 12.2.3, does that mean i... Frank Li
06:50 AM Bug #22952: Monitor stopped responding after awhile
either 12.2.2 + the patch or 12.2.3 RC + the patch would be good, whichever is more convenient to build. Frank Li
06:05 AM Bug #22996: Snapset inconsistency is no longer detected
We also need this fix to include tests that happen in the QA suite to prevent a future regression! :)
(Presumably th...
Greg Farnum
03:39 AM Bug #22996: Snapset inconsistency is no longer detected
David Zafman
03:37 AM Bug #22996 (Resolved): Snapset inconsistency is no longer detected

The fix for #20243 required additional handling of snapset inconsistency. The Object info and snapset aren't part ...
David Zafman

02/13/2018

07:53 PM Bug #22994 (New): rados bench doesn't use --max-objects
It would be useful for testing OSD caching behavior if rados bench would respect --max-objects parameter. It seems t... Ben England
07:30 PM Bug #22992: mon: add RAM usage (including avail) to HealthMonitor::check_member_health?
Turned out it was just the monitor being thrashed (didn't realize we were doing that in kcephfs!): #22993
Still, m...
Patrick Donnelly
06:43 PM Bug #22992 (New): mon: add RAM usage (including avail) to HealthMonitor::check_member_health?
I'm looking into several MON_DOWN failures from
http://pulpito.ceph.com/pdonnell-2018-02-13_17:49:41-kcephfs-wip-p...
Patrick Donnelly
06:12 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
https://github.com/ceph/ceph/pull/20410 David Zafman
04:04 AM Bug #21218 (Fix Under Review): thrash-eio + bluestore (hangs with unfound objects or read_log_and...
David Zafman
12:27 PM Bug #22063: "RadosModel.h: 1703: FAILED assert(!version || comp->get_version64() == version)" inr...
Another jewel run with this bug:
* http://qa-proxy.ceph.com/teuthology/smithfarm-2018-02-06_21:07:15-rados-wip-jew...
Nathan Cutler
06:52 AM Bug #22952: Monitor stopped responding after awhile
Kefu Chai wrote:
> > I reproduced the issue in a separate cluster
>
> could you share the steps to reproduce this...
Frank Li

02/12/2018

10:35 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)

This assert can only happen in the following two cases:
osd debug verify missing on start = true. Used in t...
David Zafman
10:07 PM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
For kefu's run above,... Sage Weil
03:07 AM Bug #21218: thrash-eio + bluestore (hangs with unfound objects or read_log_and_missing assert)
thrash-eio + bluestore
/a/kchai-2018-02-11_04:16:47-rados-wip-kefu-testing-2018-02-11-0959-distro-basic-mira/2181825...
Kefu Chai
10:05 AM Bug #22354 (Fix Under Review): v12.2.2 unable to create bluestore osd using ceph-disk
https://github.com/ceph/ceph/pull/20400
Kefu Chai
09:52 AM Bug #22445: ceph osd metadata reports wrong "back_iface"
John Spray wrote:
> Hmm, this could well be the first time anyone's really tested the IPv6 path here.
https://git...
cory gu
09:27 AM Backport #22942 (In Progress): luminous: ceph osd force-create-pg cause all ceph-mon to crash and...
Nathan Cutler
08:57 AM Bug #22952: Monitor stopped responding after awhile
> I reproduced the issue in a separate cluster
could you share the steps to reproduce this issue? so i can try it ...
Kefu Chai
05:58 AM Bug #22949 (Rejected): ceph_test_admin_socket_output --all times out
Kefu Chai
05:57 AM Bug #22949: ceph_test_admin_socket_output --all times out
thanks Brad. my bad, i thought the bug was in master also. closing this ticket, as the related PR is not yet merged. Kefu Chai

02/10/2018

08:50 AM Bug #22949: ceph_test_admin_socket_output --all times out
Brad Hubbard
08:39 AM Bug #22949: ceph_test_admin_socket_output --all times out
This is not a problem with the test (although it highlights a deficiency with error reporting which I'll submit a PR ... Brad Hubbard
02:32 AM Bug #22882 (In Progress): Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
I finally realized that the op throttler *does* drop the global rwlock while waiting for throttle, so it at least doe... Greg Farnum

02/09/2018

10:08 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Just FYI: using this new patch, the leader ceph-mon will hang once it is up and any kind of OSD command is run, like... Frank Li
10:06 PM Bug #22952: Monitor stopped responding after awhile
Frank Li wrote:
> Frank Li wrote:
> > I reproduced the issue in a separate cluster, it seems that whichever ceph-mo...
Frank Li
08:40 PM Bug #22952: Monitor stopped responding after awhile
Frank Li wrote:
> I reproduced the issue in a separate cluster, it seems that whichever ceph-mon became the leader w...
Frank Li
08:35 PM Bug #22952: Monitor stopped responding after awhile
I reproduced the issue in a separate cluster; it seems that whichever ceph-mon became the leader would be stuck, as I ... Frank Li
07:50 PM Feature #22973 (Duplicate): log lines when hitting "pg overdose protection"
You're right that it's bad! This will be fixed in the next luminous release after a belated backport finally happened... Greg Farnum
02:15 PM Feature #22973 (Duplicate): log lines when hitting "pg overdose protection"
After upgrading to Luminous we ran into situation where 10% of our pgs remained unavailable, stuck in "activating" st... Dan Stoner
04:24 PM Bug #22300 (Rejected): ceph osd reweightn command seems to change weight value
the parameter of reweightn is an array of fixed-point integers, and the integers are int(weight * 0x10000), where weig... Kefu Chai
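For reference, the fixed-point encoding described in that comment can be sketched as follows. This is only an illustration of the int(weight * 0x10000) conversion; `to_fixed` and `from_fixed` are hypothetical helper names, not Ceph functions:

```python
# Illustrative sketch of the 16.16 fixed-point weight encoding mentioned
# above. to_fixed/from_fixed are hypothetical helpers, not part of Ceph.

def to_fixed(weight: float) -> int:
    """Convert a float weight (e.g. 1.0) to the fixed-point integer form."""
    return int(weight * 0x10000)

def from_fixed(value: int) -> float:
    """Convert a fixed-point integer back to a float weight."""
    return value / 0x10000

# A weight of 1.0 encodes to 0x10000 (65536); 0.5 encodes to 0x8000.
print(to_fixed(1.0), to_fixed(0.5), from_fixed(0x10000))
```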
02:20 PM Feature #22974 (Resolved): documentation - pg state table missing "activating" state
"activating" is not listed in the pg state table:
http://docs.ceph.com/docs/master/rados/operations/pg-states/
...
Dan Stoner
06:41 AM Bug #22949: ceph_test_admin_socket_output --all times out
Sure mate, added a patch to get better debugging and will test as soon as it's built. Brad Hubbard
12:24 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Oh, and I had the LingerOp and Op conflated in my head a bit when looking at that before, but they are different.
...
Greg Farnum
12:03 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Jason, how did you establish the number of in-flight ops? I wonder if maybe it *did* have them but they weren't able ... Greg Farnum
12:02 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Okay, so presumably on resend you shouldn't need to grab op budget again, since it's already budgeted, right?
And ...
Greg Farnum
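The resend reasoning in the comments above can be illustrated with a minimal sketch. The `Op` and `Throttle` classes here are hypothetical stand-ins, not the actual Objecter types: budget is charged once when the op is first submitted, and a resend reuses the budget the op already holds instead of taking it again.

```python
# Minimal illustration of "don't re-take op budget on resend".
# Op and Throttle are hypothetical stand-ins, not Ceph's Objecter code.

class Throttle:
    def __init__(self, max_budget):
        self.max_budget = max_budget
        self.used = 0

    def take(self, amount):
        # In the real Objecter this would block waiting for budget.
        if self.used + amount > self.max_budget:
            raise RuntimeError("would block waiting for budget")
        self.used += amount

    def put(self, amount):
        self.used -= amount

class Op:
    def __init__(self, cost):
        self.cost = cost
        self.budgeted = False

def submit(op, throttle):
    # Charge budget only on first submission; a resend after a session
    # reset keeps the budget it already holds.
    if not op.budgeted:
        throttle.take(op.cost)
        op.budgeted = True

throttle = Throttle(max_budget=10)
op = Op(cost=8)
submit(op, throttle)   # first send: takes 8 units
submit(op, throttle)   # resend: no additional budget taken
print(throttle.used)   # 8, not 16
```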

02/08/2018

02:37 PM Bug #22949: ceph_test_admin_socket_output --all times out
Brad, i am not able to reproduce this issue. could you help take a look? Kefu Chai
02:25 AM Bug #20086 (Resolved): LibRadosLockECPP.LockSharedDurPP gets EEXIST
Kefu Chai
02:24 AM Bug #22440 (Resolved): New pgs per osd hard limit can cause peering issues on existing clusters
@Nick, if you think this issue deserves a different fix, please feel free to reopen this ticket Kefu Chai
12:51 AM Bug #22848: Pull the cable, 5 mins later, put back the cable, pg stuck a long time until resta...
Hi Josh Durgin,
1. They are both fibre-optic cables on our network card.
2. Log files can't be found yet, due to at...
Yong Wang

02/07/2018

11:09 PM Bug #22220: osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at dwarf2out....
Fixed by gcc-7.3.1-2.fc26 gcc-7.3.1-2.fc27 in fc27 Brad Hubbard
10:49 PM Bug #22440: New pgs per osd hard limit can cause peering issues on existing clusters
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/20204
merged
Yuri Weinstein
09:44 PM Bug #22848: Pull the cable, 5 mins later, put back the cable, pg stuck a long time until resta...
Which cable are you pulling? Do you have logs from the monitors and osds? The default failure detection timeouts can ... Josh Durgin
09:40 PM Bug #22916 (Duplicate): OSD crashing in peering
Josh Durgin
09:40 PM Bug #21287 (Duplicate): 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->i...
Josh Durgin
03:52 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
see https://github.com/ceph/ceph/pull/16675 Chang Liu
02:37 AM Bug #21287: 1 PG down, OSD fails with "FAILED assert(i->prior_version == last || i->is_error())"
we hit this bug too in an ec pool 2+1; I found that one peer did not receive a piece of an op message sent from the primary osd, ... lingjie kong
06:12 PM Bug #22952: Monitor stopped responding after awhile
here is where the first mon server is stuck; running mon_status hangs:
[root@dl1-kaf101 frli]# ceph --admin-daemon /v...
Frank Li
06:06 PM Bug #22952 (Duplicate): Monitor stopped responding after awhile
After a crash of ceph-mon in 12.2.2 and using a private build provided by ceph developers, the ceph-mon would come up... Frank Li
06:06 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
https://tracker.ceph.com/issues/22952
ticket opened for ceph-mon not responding issue.
Frank Li
06:02 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
I'll open a separate ticket to track the monitor not responding issue. the fix for the force-create-pg issue is good. Frank Li
06:01 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Kefu Chai wrote:
> [...]
>
>
> the cluster formed a quorum of [0,1,2,3,4] since 18:02:21. and it was not in pro...
Frank Li
05:58 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Kefu Chai wrote:
> [...]
>
> was any osd up when you were testing?
Yes, but they were in Booting State, all of...
Frank Li
06:56 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
... Kefu Chai
06:12 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
... Kefu Chai
04:05 PM Bug #22746 (Resolved): osd/common: ceph-osd process is terminated by the logratote task
Kefu Chai
03:33 PM Bug #22949 (Rejected): ceph_test_admin_socket_output --all times out
http://pulpito.ceph.com/kchai-2018-02-07_01:22:25-rados-wip-kefu-testing-2018-02-06-1514-distro-basic-mira/2161301/ Kefu Chai
05:50 AM Backport #22942 (Resolved): luminous: ceph osd force-create-pg cause all ceph-mon to crash and un...
https://github.com/ceph/ceph/pull/20399 Nathan Cutler
05:01 AM Backport #22934 (Resolved): luminous: filestore journal replay does not guard omap operations
https://github.com/ceph/ceph/pull/21547 Nathan Cutler
12:54 AM Backport #22866 (In Progress): jewel: ceph osd df json output validation reported invalid numbers...
https://github.com/ceph/ceph/pull/20344 Prashant D

02/06/2018

08:01 PM Bug #22350 (Resolved): nearfull OSD count in 'ceph -w'
Sage Weil
07:49 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
so is there anything I can do to help recover the cluster? Frank Li
06:50 AM Bug #22847 (Pending Backport): ceph osd force-create-pg cause all ceph-mon to crash and unable to...
Kefu Chai
01:23 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
please see attached logs for when the monitor was started, and then later got into the stuck mode.
I just replaced t...
Frank Li
04:54 PM Bug #22920: filestore journal replay does not guard omap operations
lowering the priority since in practice we don't clone objects with omap on them. Sage Weil
04:53 PM Bug #22920 (Pending Backport): filestore journal replay does not guard omap operations
Sage Weil
04:07 PM Bug #22656: scrub mismatch on bytes (cache pools)
aah, just popped up on luminous: http://pulpito.ceph.com/yuriw-2018-02-05_23:07:16-rados-wip-yuri-testing-2018-02-05-... Sage Weil
02:24 PM Bug #20924: osd: leaked Session on osd.7
/a/yuriw-2018-02-02_20:31:37-rados-wip_yuri_master_2.2.18-distro-basic-smithi/2143177 Sage Weil

02/05/2018

09:06 PM Feature #4305: CRUSH: it should be possible use ssd as primary and hdd for replicas but still mak...
Assuming @Patrick meant "RADOS" and not "rados-java" Nathan Cutler
08:58 PM Bug #21977: null map from OSDService::get_map in advance_pg
Seems to be persisting; seen in
http://qa-proxy.ceph.com/teuthology/teuthology-2018-02-05_04:23:02-upgrade:jewel-x-lumino...
Yuri Weinstein
08:01 PM Feature #3586 (Resolved): CRUSH: separate library
Patrick Donnelly
07:53 PM Feature #3764: osd: async replicas
Patrick Donnelly
07:33 PM Feature #11046 (Resolved): osd: rados io hints improvements
PR merged. Patrick Donnelly
03:17 PM Bug #22920: filestore journal replay does not guard omap operations
https://github.com/ceph/ceph/pull/20279 Sage Weil
03:16 PM Bug #22920 (Resolved): filestore journal replay does not guard omap operations
omap operations are replayed without checking the guards, which means that omap data can leak between objects that ar... Sage Weil
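The guard check that the omap replay path skips can be sketched roughly like this. All names here are hypothetical and the real FileStore guard logic differs; it only shows the idea that an operation whose journal sequence number predates an object's recorded guard must be dropped on replay:

```python
# Rough sketch of journal-replay guarding, with hypothetical names.
# The real FileStore implementation differs; this only illustrates the
# idea: an op whose sequence is <= the object's recorded guard must be
# skipped on replay, otherwise stale data is re-applied to the object.

guards = {}   # object -> last applied journal sequence
store = {}    # object -> omap data

def apply_op(obj, seq, omap_value):
    if seq <= guards.get(obj, -1):
        return  # already applied (or superseded); skip on replay
    store[obj] = omap_value
    guards[obj] = seq

apply_op("a", seq=1, omap_value="old")
apply_op("a", seq=2, omap_value="new")
# Replaying seq 1 with the guard honored leaves the newer data intact:
apply_op("a", seq=1, omap_value="old")
print(store["a"])  # "new"
```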
12:05 PM Bug #20924: osd: leaked Session on osd.7
/a/yuriw-2018-02-02_20:31:37-rados-wip_yuri_master_2.2.18-distro-basic-smithi/2143177/remote/smithi111/log/valgrind/o... Kefu Chai
09:37 AM Support #22917 (New): mon keeps on crashing ( 12.2.2 )
mon keeps on crashing ( 0> 2018-02-05 00:22:49.915541 7f6d0a781700 -1 *** Caught signal (Aborted) **
in thread 7f6d...
yair mackenzi
08:49 AM Bug #22916 (Duplicate): OSD crashing in peering
Bluestore OSD is crashed with a stacktrace:... Artemy Kapitula
03:52 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
> Should I be updating the ceph-osd to the same patched version ??
no need to update ceph-osd.
> but very soon,...
Kefu Chai
01:41 AM Bug #22668 (Resolved): osd/ExtentCache.h: 371: FAILED assert(tid == 0)
Kefu Chai

02/04/2018

07:29 AM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Also, while the monitors came up and formed a quorum, very soon they would all stop responding again, and then I fi... Frank Li

02/03/2018

09:39 PM Backport #21239 (In Progress): jewel: test_health_warnings.sh can fail
Nathan Cutler
07:01 PM Backport #22450 (Resolved): luminous: Visibility for snap trim queue length
Nathan Cutler
06:37 PM Bug #22409 (Resolved): ceph_objectstore_tool: no flush before collection_empty() calls; ObjectSto...
Nathan Cutler
06:37 PM Backport #22707 (Resolved): luminous: ceph_objectstore_tool: no flush before collection_empty() c...
Nathan Cutler
06:36 PM Bug #21147 (Resolved): Manager daemon x is unresponsive. No standby daemons available
Nathan Cutler
06:35 PM Backport #22399 (Resolved): luminous: Manager daemon x is unresponsive. No standby daemons available
Nathan Cutler
07:18 AM Backport #22906 (Rejected): jewel: bluestore: New OSD - Caught signal - bstore_kv_sync (throttle ...
Nathan Cutler
07:17 AM Bug #22539: bluestore: New OSD - Caught signal - bstore_kv_sync
Adding jewel backport on the theory that (1) Jenkins CI is using modern glibc/kernel to run make check on jewel, brea... Nathan Cutler
12:45 AM Backport #22389 (Resolved): luminous: ceph-objectstore-tool: Add option "dump-import" to examine ...
David Zafman
12:43 AM Bug #22837 (In Progress): discover_all_missing() not always called during activating
Part of https://github.com/ceph/ceph/pull/20220 David Zafman
12:41 AM Bug #18162 (Resolved): osd/ReplicatedPG.cc: recover_replicas: object added to missing set for bac...
David Zafman
12:40 AM Backport #22013 (Resolved): jewel: osd/ReplicatedPG.cc: recover_replicas: object added to missing...
David Zafman

02/02/2018

11:08 PM Backport #22707: luminous: ceph_objectstore_tool: no flush before collection_empty() calls; Objec...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/19967
merged
Yuri Weinstein
11:01 PM Backport #22389: luminous: ceph-objectstore-tool: Add option "dump-import" to examine an export
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/19487
merged
Yuri Weinstein
11:00 PM Backport #22399: luminous: Manager daemon x is unresponsive. No standby daemons available
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/19501
merged
Yuri Weinstein
09:15 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Frank Li wrote:
> I've updated all the ceph-mon with the RPMs from the patch repo, they came up fine, and I've resta...
Frank Li
09:14 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
I've updated all the ceph-mon with the RPMs from the patch repo, they came up fine, and I've restarted the OSDs, but ... Frank Li
08:29 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Just for future operational reference, is there any way to revert the monitor map to a previous state in the case of ... Frank Li
06:22 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
Please note the crash happened on the monitors, not the OSDs; the OSDs all stayed up, but all the monitors crashed. Frank Li
06:21 PM Bug #22847: ceph osd force-create-pg cause all ceph-mon to crash and unable to come up again
-4> 2018-01-31 22:47:22.942381 7fc641d0b700 1 -- 10.102.52.37:6789/0 <== mon.0 10.102.52.37:6789/0 0 ==== log(1 ... Frank Li
06:09 PM Bug #22847 (Fix Under Review): ceph osd force-create-pg cause all ceph-mon to crash and unable to...
https://github.com/ceph/ceph/pull/20267 Sage Weil
05:46 PM Bug #22847 (Need More Info): ceph osd force-create-pg cause all ceph-mon to crash and unable to c...
Can you attach the entire osd log for the crashed osd? (In particular, we need to see what assertion failed.) Thanks! Sage Weil
07:32 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")

http://pulpito.ceph.com/dzafman-2018-02-01_09:46:36-rados-wip-zafman-testing-distro-basic-smithi/2138315
I think...
David Zafman
07:23 PM Bug #22834 (Resolved): Primary ends up in peer_info which isn't supposed to be there
David Zafman
09:48 AM Bug #22257 (Resolved): mon: mgrmaps not trimmed
Nathan Cutler
09:48 AM Backport #22258 (Resolved): mon: mgrmaps not trimmed
Nathan Cutler
09:47 AM Backport #22402 (Resolved): luminous: osd: replica read can trigger cache promotion
Nathan Cutler
08:05 AM Backport #22807 (Resolved): luminous: "osd pool stats" shows recovery information bugly
Nathan Cutler
07:54 AM Bug #22715 (Resolved): log entries weirdly zeroed out after 'osd pg-temp' command
Nathan Cutler
07:54 AM Backport #22744 (Resolved): luminous: log entries weirdly zeroed out after 'osd pg-temp' command
Nathan Cutler
05:46 AM Documentation #22843: [doc][luminous] the configuration guide still contains osd_op_threads and d...
For downstream Red Hat products, you should use the Red Hat bugzilla to report bugs. This is the upstream bug tracker... Nathan Cutler
05:15 AM Backport #22013 (In Progress): jewel: osd/ReplicatedPG.cc: recover_replicas: object added to miss...
Nathan Cutler
12:17 AM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
When I saw the test running for 4 hours my first thought was that the cluster was unhealthy -- but all OSDs were up a... Jason Dillaman

02/01/2018

11:26 PM Bug #22117 (Resolved): crushtool decompile prints bogus when osd < max_osd_id are missing
Nathan Cutler
11:25 PM Backport #22199 (Resolved): crushtool decompile prints bogus when osd < max_osd_id are missing
Nathan Cutler
11:24 PM Bug #22113 (Resolved): osd: pg limit on replica test failure
Nathan Cutler
11:24 PM Backport #22176 (Resolved): luminous: osd: pg limit on replica test failure
Nathan Cutler
11:24 PM Bug #21907 (Resolved): On pg repair the primary is not favored as was intended
Nathan Cutler
11:23 PM Backport #22213 (Resolved): luminous: On pg repair the primary is not favored as was intended
Nathan Cutler
11:10 PM Backport #22258: mon: mgrmaps not trimmed
Kefu Chai wrote:
> mgrmonitor does not trim old mgrmaps. these can accumulate forever.
>
> https://github.com/ce...
Yuri Weinstein
11:08 PM Backport #22402: luminous: osd: replica read can trigger cache promotion
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/19499
merged
Yuri Weinstein
11:04 PM Bug #22673 (Resolved): osd checks out-of-date osdmap for DESTROYED flag on start
Nathan Cutler
11:03 PM Backport #22761 (Resolved): luminous: osd checks out-of-date osdmap for DESTROYED flag on start
Nathan Cutler
11:01 PM Backport #22761: luminous: osd checks out-of-date osdmap for DESTROYED flag on start
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20068
merged
Yuri Weinstein
11:00 PM Backport #22807: luminous: "osd pool stats" shows recovery information bugly
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20150
merged
Yuri Weinstein
10:59 PM Bug #22419 (Resolved): Pool Compression type option doesn't apply to new OSD's
Nathan Cutler
10:59 PM Backport #22502 (Resolved): luminous: Pool Compression type option doesn't apply to new OSD's
Nathan Cutler
09:04 PM Backport #22502: luminous: Pool Compression type option doesn't apply to new OSD's
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20106
merged
Yuri Weinstein
10:43 PM Bug #22887 (Duplicate): osd/ECBackend.cc: 2202: FAILED assert((offset + length) <= (range.first.g...
... Patrick Donnelly
10:29 PM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
It's not quite that simple; ops on a failed OSD or closed session get moved into the homeless_session and at a quick ... Greg Farnum
06:52 PM Bug #22882: Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
http://qa-proxy.ceph.com/teuthology/jdillaman-2018-02-01_08:21:33-rbd-wip-jd-testing-luminous-distro-basic-smithi/213... Jason Dillaman
06:51 PM Bug #22882 (Resolved): Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
... Jason Dillaman
09:06 PM Bug #22715: log entries weirdly zeroed out after 'osd pg-temp' command
merged https://github.com/ceph/ceph/pull/20042 Yuri Weinstein
06:09 PM Bug #22881 (Resolved): scrub interaction with HEAD boundaries and snapmapper repair is broken
symptom:... Sage Weil
11:45 AM Bug #22842: (luminous) ceph-disk prepare of simple filestore failed with 'Unable to set partition...
John Spray wrote:
> I would suspect that something is strange about the disk (non-GPT partition table perhaps?), and...
Enrico Labedzki
11:11 AM Bug #22842: (luminous) ceph-disk prepare of simple filestore failed with 'Unable to set partition...
I would suspect that something is strange about the disk (non-GPT partition table perhaps?), and you're getting less-... John Spray
11:43 AM Backport #22449: jewel: Visibility for snap trim queue length
presumably non-trivial backport; assigning to the developer Nathan Cutler
11:40 AM Feature #22448 (Pending Backport): Visibility for snap trim queue length
Nathan Cutler
10:49 AM Backport #22866 (Resolved): jewel: ceph osd df json output validation reported invalid numbers (-...
https://github.com/ceph/ceph/pull/20344 Nathan Cutler
08:12 AM Bug #21750 (Resolved): scrub stat mismatch on bytes
The code is gone. xie xingguo
05:42 AM Bug #22848: Pull the cable, 5 mins later, put back the cable, pg stuck a long time until resta...
Why is the pg status always peering? I am sure that the monitors and osds are both ok.
The pg state machine should wo...
Yong Wang
05:32 AM Bug #22848 (New): Pull the cable, 5 mins later, put back the cable, pg stuck a long time until...
Hi all,
We have a 3-node ceph cluster, version 10.2.10.
It is a newly installed environment with professional rpms from downlo...
Yong Wang
04:29 AM Bug #22847 (Resolved): ceph osd force-create-pg cause all ceph-mon to crash and unable to come up...
during the course of trouble-shooting an osd issue, I ran this command:
ceph osd force-create-pg 1.ace11d67
then al...
Frank Li

01/31/2018

10:39 PM Bug #22656: scrub mismatch on bytes (cache pools)
We just aren't assigning that much priority to cache tiering. Greg Farnum
10:27 PM Bug #22752 (Fix Under Review): snapmapper inconsistency, crash on luminous
Greg Farnum
03:32 PM Bug #22440: New pgs per osd hard limit can cause peering issues on existing clusters
https://github.com/ceph/ceph/pull/20204 Kefu Chai
01:53 PM Documentation #22843 (Won't Fix): [doc][luminous] the configuration guide still contains osd_op_t...
The configuration guide for RHCS 3 still mentions osd_op_threads, which is no longer part of the RHCS 3 code.
...
Tomas Petr
01:51 PM Bug #22842 (New): (luminous) ceph-disk prepare of simple filestore failed with 'Unable to set par...
Hi,
can't create a simple filestore with the help of ceph-disk under ubuntu trusty; please have a look at this...
<p...
Enrico Labedzki
12:50 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
Yes, I believe so. Anonymous
12:48 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
Sorry, is it fine now?
Kallepalli Mounika Smitha
12:44 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
No, the line should be:... Anonymous
12:34 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
Made the changes. Please tell me whether the changes are correct or not. Please review. Sorry if I am wrong. Kallepalli Mounika Smitha
12:03 PM Bug #22142 (Resolved): mon doesn't send health status after paxos service is inactive temporarily
John Spray
12:03 PM Backport #22421 (Resolved): mon doesn't send health status after paxos service is inactive tempor...
John Spray
04:10 AM Bug #22837 (Resolved): discover_all_missing() not always called during activating

Sometimes discover_all_missing() isn't called so we don't get a complete picture of misplaced objects. This makes ...
David Zafman
12:44 AM Backport #22164: luminous: cluster [ERR] Unhandled exception from module 'balancer' while running...
Prashant D wrote:
> https://github.com/ceph/ceph/pull/19023
merged
Yuri Weinstein
12:44 AM Backport #22167: luminous: Various odd clog messages for mons
Prashant D wrote:
> https://github.com/ceph/ceph/pull/19031
merged
Yuri Weinstein
12:43 AM Backport #22199: crushtool decompile prints bogus when osd < max_osd_id are missing
Jan Fajerski wrote:
> https://github.com/ceph/ceph/pull/19039
merged
Yuri Weinstein
12:41 AM Backport #22176: luminous: osd: pg limit on replica test failure
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/19059
merged
Yuri Weinstein
12:40 AM Backport #22213: luminous: On pg repair the primary is not favored as was intended
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/19083
merged
Yuri Weinstein
12:13 AM Bug #22834: Primary ends up in peer_info which isn't supposed to be there

Workaround
https://github.com/ceph/ceph/pull/20189
David Zafman

01/30/2018

11:43 PM Bug #22834 (Resolved): Primary ends up in peer_info which isn't supposed to be there

rados/singleton/{all/lost-unfound.yaml msgr-failures/few.yaml msgr/random.yaml objectstore/bluestore-bitmap.yaml ra...
David Zafman
04:01 PM Bug #22440: New pgs per osd hard limit can cause peering issues on existing clusters
will backport https://github.com/ceph/ceph/pull/18614 to luminous. it helps to make this status more visible to user. Kefu Chai

01/29/2018

09:26 PM Bug #22656: scrub mismatch on bytes (cache pools)
/a/sage-2018-01-29_18:07:24-rados-wip-sage-testing-2018-01-29-0927-distro-basic-smithi/2122957
description: rados/th...
Sage Weil
08:01 PM Bug #22743: "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
I don't think a bug in a hammer binary during an upgrade test to jewel is an urgent problem at this point? Greg Farnum
03:15 PM Bug #22201: PG removal with ceph-objectstore-tool segfaulting
We're getting close to converting the OSDs in this cluster to Bluestore. If you would like any tests to be run on th... David Turner
02:56 PM Bug #22668: osd/ExtentCache.h: 371: FAILED assert(tid == 0)
simpler fix: https://github.com/ceph/ceph/pull/20169 Sage Weil
02:38 PM Bug #22440: New pgs per osd hard limit can cause peering issues on existing clusters
First, perhaps this will help to make these issues more visible: https://github.com/ceph/ceph/pull/20167
Second, i...
Dan van der Ster
10:23 AM Bug #20086 (Fix Under Review): LibRadosLockECPP.LockSharedDurPP gets EEXIST
https://github.com/ceph/ceph/pull/20161 Kefu Chai
07:28 AM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
/a/kchai-2018-01-28_09:53:35-rados-wip-kefu-testing-2018-01-27-2356-distro-basic-mira/2120659... Kefu Chai
01:15 AM Bug #21471 (Resolved): mon osd feature checks for osdmap flags and require-osd-release fail if 0 ...
Brad Hubbard

01/28/2018

11:59 PM Backport #22807 (In Progress): luminous: "osd pool stats" shows recovery information bugly
https://github.com/ceph/ceph/pull/20150 Prashant D
12:31 AM Backport #22818 (In Progress): jewel: repair_test fails due to race with osd start
Nathan Cutler

01/27/2018

08:35 AM Backport #21872 (In Progress): jewel: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
Nathan Cutler
06:49 AM Bug #22662 (Pending Backport): ceph osd df json output validation reported invalid numbers (-nan)...
Nathan Cutler

01/26/2018

06:01 PM Backport #22818 (Resolved): jewel: repair_test fails due to race with osd start
https://github.com/ceph/ceph/pull/20146 Nathan Cutler
05:54 PM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
+1 for null, which is an English word and hence far more comprehensible than "NaN", which is what I would call "Progr... Nathan Cutler
05:42 PM Bug #21577 (Resolved): ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
Nathan Cutler
05:41 PM Backport #21636 (Resolved): luminous: ceph-monstore-tool --readable mode doesn't understand FSMap...
Nathan Cutler
05:21 PM Bug #20705 (Pending Backport): repair_test fails due to race with osd start
Seen in Jewel so marking for backport
http://qa-proxy.ceph.com/teuthology/dzafman-2018-01-25_13:41:04-rados-wip-za...
David Zafman
05:16 PM Backport #21872: jewel: ObjectStore/StoreTest.FiemapHoles/3 fails with kstore
This backport is needed as seen in:
http://qa-proxy.ceph.com/teuthology/dzafman-2018-01-25_13:41:04-rados-wip-zafm...
David Zafman
11:55 AM Bug #18239 (New): nan in ceph osd df again
Chang Liu
11:54 AM Bug #18239 (Duplicate): nan in ceph osd df again
duplicated with #22662 Chang Liu
08:00 AM Backport #22808 (Rejected): jewel: "osd pool stats" shows recovery information bugly
Nathan Cutler
08:00 AM Backport #22807 (Resolved): luminous: "osd pool stats" shows recovery information bugly
https://github.com/ceph/ceph/pull/20150 Nathan Cutler
07:30 AM Bug #22727 (Pending Backport): "osd pool stats" shows recovery information bugly
Kefu Chai

01/25/2018

07:59 PM Bug #20243 (Resolved): Improve size scrub error handling and ignore system attrs in xattr checking
David Zafman
07:59 PM Backport #21051 (Resolved): luminous: Improve size scrub error handling and ignore system attrs i...
David Zafman
07:58 PM Bug #21382 (Resolved): Erasure code recovery should send additional reads if necessary
David Zafman
07:56 PM Bug #22145 (Resolved): PG stuck in recovery_unfound
David Zafman
07:56 PM Bug #20059 (Resolved): miscounting degraded objects
David Zafman
07:55 PM Backport #22724 (Resolved): luminous: miscounting degraded objects
David Zafman
07:33 PM Backport #22724 (Fix Under Review): luminous: miscounting degraded objects
Included in https://github.com/ceph/ceph/pull/20055 David Zafman
07:55 PM Backport #22387 (Resolved): luminous: PG stuck in recovery_unfound
David Zafman
07:35 PM Backport #22387 (Fix Under Review): luminous: PG stuck in recovery_unfound
David Zafman
07:54 PM Backport #21653 (Resolved): luminous: Erasure code recovery should send additional reads if neces...
David Zafman
07:53 PM Backport #22069 (Resolved): luminous: osd/ReplicatedPG.cc: recover_replicas: object added to miss...
David Zafman
04:18 PM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Chang Liu wrote:
> Enrico Labedzki wrote:
> > Chang Liu wrote:
> > > Enrico Labedzki wrote:
> > > > Chang Liu wro...
Enrico Labedzki
03:45 PM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Enrico Labedzki wrote:
> Chang Liu wrote:
> > Enrico Labedzki wrote:
> > > Chang Liu wrote:
> > > > Sage Weil wro...
Chang Liu
09:40 AM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Chang Liu wrote:
> Enrico Labedzki wrote:
> > Chang Liu wrote:
> > > Sage Weil wrote:
> > > > 1. it's not valid j...
Enrico Labedzki
09:30 AM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Enrico Labedzki wrote:
> Chang Liu wrote:
> > Sage Weil wrote:
> > > 1. it's not valid json.. Formatter shouldn't ...
Chang Liu
08:52 AM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Chang Liu wrote:
> Sage Weil wrote:
> > 1. it's not valid json.. Formatter shouldn't allow it
> > 2. we should hav...
Enrico Labedzki
06:36 AM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
Sage Weil wrote:
> 1. it's not valid json.. Formatter shouldn't allow it
> 2. we should have a valid value (or 0) t...
Chang Liu
04:02 AM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)

This bug has been fixed by https://github.com/ceph/ceph/pull/13531. We should backport it to Jewel.
Chang Liu
04:08 PM Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi

As Josh said it seems easier to trigger in Jewel. This is based on my attempt to reproduce in master.
All 50 ma...
David Zafman
02:22 AM Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi
Looking through the logs more with David, we found this sequence of events in 1946610:
1) osd.5 gets a write to ob...
Josh Durgin
12:45 PM Bug #22266: mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)
Master PR for second round of backporting: https://github.com/ceph/ceph/pull/19780
Luminous backport PR: https://g...
Nathan Cutler
08:44 AM Bug #22656: scrub mismatch on bytes (cache pools)
Happened here as well: http://pulpito.ceph.com/smithfarm-2018-01-24_19:46:55-rados-wip-smithfarm-testing-distro-basic... Nathan Cutler
04:24 AM Backport #22794 (In Progress): jewel: heartbeat peers need to be updated when a new OSD added int...
https://github.com/ceph/ceph/pull/20108 Kefu Chai
04:14 AM Backport #22794 (Resolved): jewel: heartbeat peers need to be updated when a new OSD added into a...
https://github.com/ceph/ceph/pull/20108 Kefu Chai
04:13 AM Backport #22793 (Rejected): osd: sends messages to marked-down peers
I wanted to backport the fix of #18004, not this one. Kefu Chai
04:12 AM Backport #22793 (Rejected): osd: sends messages to marked-down peers
the async osdmap updates introduce a new problem:
- handle_osd_map map X marks down osd Y
- pg thread uses map X-...
Kefu Chai

01/24/2018

09:18 PM Backport #21636: luminous: ceph-monstore-tool --readable mode doesn't understand FSMap, MgrMap
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18754
merged
Yuri Weinstein
09:10 PM Bug #22329: mon: Valgrind: mon (Leak_DefinitelyLost, Leak_IndirectlyLost)
New one:
/ceph/teuthology-archive/yuriw-2018-01-23_20:26:59-multimds-wip-yuri-testing-2018-01-22-1653-luminous-tes...
Patrick Donnelly
07:56 PM Backport #22502: luminous: Pool Compression type option doesn't apply to new OSD's
Master commit was reverted - redoing the backport. Nathan Cutler
06:12 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
Disregard my previous comment; different error message for the same assert was unfortunately buried in the logs. Sorr... Joao Eduardo Luis
06:04 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
FWIW, I am currently reproducing this quite reliably on my dev env, on a quite outdated version of master (cbe78ae629... Joao Eduardo Luis
01:55 PM Bug #21407 (Resolved): backoff causes out of order op
Nathan Cutler
01:54 PM Backport #21794 (Resolved): luminous: backoff causes out of order op
Nathan Cutler
11:23 AM Backport #22450 (In Progress): luminous: Visibility for snap trim queue length
https://github.com/ceph/ceph/pull/20098 Piotr Dalek

01/23/2018

11:57 PM Bug #21566 (Resolved): OSDService::recovery_need_sleep read+updated without locking
Nathan Cutler
11:57 PM Backport #21697 (Resolved): luminous: OSDService::recovery_need_sleep read+updated without locking
Nathan Cutler
11:06 PM Backport #21697: luminous: OSDService::recovery_need_sleep read+updated without locking
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18753
merged
Yuri Weinstein
11:56 PM Backport #21785 (Resolved): luminous: OSDMap cache assert on shutdown
Nathan Cutler
11:07 PM Backport #21785: luminous: OSDMap cache assert on shutdown
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18749
merged
Yuri Weinstein
11:55 PM Bug #21845 (Resolved): Objecter::_send_op unnecessarily constructs costly hobject_t
Nathan Cutler
11:55 PM Backport #21921 (Resolved): luminous: Objecter::_send_op unnecessarily constructs costly hobject_t
Nathan Cutler
11:09 PM Backport #21921: luminous: Objecter::_send_op unnecessarily constructs costly hobject_t
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18745
merged
Yuri Weinstein
11:54 PM Backport #21922 (Resolved): luminous: Objecter::C_ObjectOperation_sparse_read throws/catches exce...
Nathan Cutler
11:10 PM Backport #21922: luminous: Objecter::C_ObjectOperation_sparse_read throws/catches exceptions on -...
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18744
merged
Yuri Weinstein
11:25 PM Bug #21818 (Resolved): ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic/1 (filestore) ...
Nathan Cutler
11:25 PM Backport #21924 (Resolved): luminous: ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic...
Nathan Cutler
11:10 PM Backport #21924: luminous: ceph_test_objectstore fails ObjectStore/StoreTest.Synthetic/1 (filesto...
Shinobu Kinjo wrote:
> https://github.com/ceph/ceph/pull/18742
merged
Yuri Weinstein
08:30 PM Backport #22423 (Closed): luminous: osd: initial minimal efforts to clean up PG interface
I was able to cleanly backport http://tracker.ceph.com/issues/22069 without this large change. David Zafman
11:01 AM Bug #22351: Couldn't init storage provider (RADOS)
No, I set it to Luminous based on the request by theanalyst in https://github.com/ceph/ceph/pull/20023. I'm fine with... Brad Hubbard
10:24 AM Bug #22351: Couldn't init storage provider (RADOS)
@Brad Assigning to you and leaving the backport field on "luminous" (but feel free to zero it out if it's enough to m... Nathan Cutler
10:14 AM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
@David I can only guess that this is not reproducible in master and that's why it requires a luminous-only fix. Could... Nathan Cutler
10:01 AM Backport #22761 (In Progress): luminous: osd checks out-of-date osdmap for DESTROYED flag on start
Nathan Cutler
09:40 AM Backport #22761 (Resolved): luminous: osd checks out-of-date osdmap for DESTROYED flag on start
https://github.com/ceph/ceph/pull/20068 Nathan Cutler
07:48 AM Bug #22673 (Pending Backport): osd checks out-of-date osdmap for DESTROYED flag on start
Kefu Chai
06:38 AM Bug #22727: "osd pool stats" shows recovery information bugly
We need to backport it to jewel and luminous, but the bug dates back to at least 9.2.0. See also http://lists.ceph.com/piperm... Kefu Chai
06:32 AM Bug #22727 (Fix Under Review): "osd pool stats" shows recovery information bugly
Kefu Chai

01/22/2018

11:50 PM Bug #22419 (Pending Backport): Pool Compression type option doesn't apply to new OSD's
Sage Weil
08:12 AM Bug #22419 (Fix Under Review): Pool Compression type option doesn't apply to new OSD's
https://github.com/ceph/ceph/pull/20044 Kefu Chai
11:46 PM Bug #22711 (Resolved): qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands: expect...
Sage Weil
12:53 PM Bug #22711 (Fix Under Review): qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands:...
https://github.com/ceph/ceph/pull/20046 Kefu Chai
11:06 AM Bug #22711: qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands: expect_false test...
the weirdness of this issue is that some PGs are mapped to a single OSD:... Kefu Chai
03:13 AM Bug #22711: qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands: expect_false test...
the curr_object_copies_rate value in PGMap.cc dump_object_stat_sum is .5, which is counteracting the 2x replication f... Sage Weil
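A minimal numeric sketch of the effect Sage describes (the variable names and values here are illustrative assumptions, not Ceph's actual code):

```python
# Hypothetical numbers: with 2x replication, raw usage should be twice the
# logical bytes stored, but a copies rate of .5 cancels that factor out.
replication_factor = 2.0
curr_object_copies_rate = 0.5   # the value observed in dump_object_stat_sum
stored_bytes = 100.0

reported = stored_bytes * replication_factor * curr_object_copies_rate
print(reported)  # 100.0 -- the 2x replication no longer shows up
```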
07:04 PM Bug #22752: snapmapper inconsistency, crash on luminous
https://github.com/ceph/ceph/pull/20040 Sage Weil
07:03 PM Bug #22752 (Resolved): snapmapper inconsistency, crash on luminous
from Stefan Priebe on ceph-devel ML:... Sage Weil
06:47 PM Backport #22387 (In Progress): luminous: PG stuck in recovery_unfound

Included with another dependent backport as https://github.com/ceph/ceph/pull/20055
David Zafman
12:40 PM Backport #22387 (Need More Info): luminous: PG stuck in recovery_unfound
Non-trivial backport Nathan Cutler
02:27 PM Feature #22750 (Fix Under Review): libradosstriper conditional compile
-https://github.com/ceph/ceph/pull/18197- Nathan Cutler
01:21 PM Feature #22750 (Resolved): libradosstriper conditional compile
Currently libradosstriper is a hard dependency of the rados CLI tool.
Please add a "WITH_LIBRADOSSTRIPER" compile-...
Nathan Cutler
02:16 PM Bug #22746 (Fix Under Review): osd/common: ceph-osd process is terminated by the logrotate task
John Spray
11:51 AM Bug #22746 (Resolved): osd/common: ceph-osd process is terminated by the logrotate task
1. Steps to reproduce:
(1) step 1:
Open terminal 1, and
prepare the command: "killall -q -1 ceph-mon ...
huanwen ren
12:59 PM Support #22749 (Closed): dmClock OP classification
Why does the dmClock algorithm in Ceph attribute recovery read and write OPs to osd_op_queue_mclock_osd_sub, so that whe... 何 伟俊
12:41 PM Backport #22724 (Need More Info): luminous: miscounting degraded objects
Nathan Cutler
12:41 PM Backport #22724: luminous: miscounting degraded objects
David, while you're doing this one, can you include https://tracker.ceph.com/issues/22387 as well? Nathan Cutler
12:23 PM Support #22680 (Resolved): mons segmentation faults New 12.2.2 cluster
Nathan Cutler
03:04 AM Bug #22715 (Pending Backport): log entries weirdly zeroed out after 'osd pg-temp' command
Kefu Chai
03:04 AM Backport #22744 (In Progress): luminous: log entries weirdly zeroed out after 'osd pg-temp' command
https://github.com/ceph/ceph/pull/20042 Kefu Chai
03:03 AM Backport #22744 (Resolved): luminous: log entries weirdly zeroed out after 'osd pg-temp' command
https://github.com/ceph/ceph/pull/20042 Kefu Chai

01/21/2018

08:29 PM Bug #22715 (Resolved): log entries weirdly zeroed out after 'osd pg-temp' command
Sage Weil
06:56 PM Bug #22743 (New): "RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-sm...
Run: http://pulpito.ceph.com/teuthology-2018-01-19_01:15:02-upgrade:hammer-x-jewel-distro-basic-smithi/
Job: 2088826...
Yuri Weinstein

01/20/2018

11:18 PM Bug #22351 (In Progress): Couldn't init storage provider (RADOS)
Reopening this and reassigning it to RADOS as there are a couple of changes we can make to logging to make this easie... Brad Hubbard

01/19/2018

04:16 PM Support #20108: PGs are not remapped correctly when one host fails
Hi,
Thank you for your answer!
I've seen that page before, but which tunable are you suggesting for the problem...
Laszlo Budai
09:59 AM Bug #22233 (Fix Under Review): prime_pg_temp breaks on uncreated pgs
https://github.com/ceph/ceph/pull/20025 Kefu Chai
09:08 AM Bug #22711: qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands: expect_false test...
... Chang Liu
02:51 AM Support #22553: ceph-object-tool can not remove metadata pool's object
Nothing is wrong with the disk; the issue is reproducible. peng zhang

01/18/2018

10:57 PM Support #20108: PGs are not remapped correctly when one host fails
http://docs.ceph.com/docs/master/rados/operations/crush-map/?highlight=tunables#tunables Greg Farnum
07:02 PM Bug #22351 (Closed): Couldn't init storage provider (RADOS)
Yehuda Sadeh
10:47 AM Bug #22351: Couldn't init storage provider (RADOS)
Brad Hubbard wrote:
>
> (6*1024)*3 = 18432, thus 18432/47 ~ 392 PGs per OSD. You omitted the size of the pools.
...
Nikos Kormpakis
03:21 AM Bug #22351: Couldn't init storage provider (RADOS)
https://ceph.com/pgcalc/ should be used as a guide/starting point. Brad Hubbard
03:07 PM Bug #22727: "osd pool stats" shows recovery information bugly
https://github.com/ceph/ceph/pull/20009 Chang Liu
05:18 AM Bug #22727 (In Progress): "osd pool stats" shows recovery information bugly
Chang Liu
03:16 AM Bug #22727 (Resolved): "osd pool stats" shows recovery information bugly
... Chang Liu
03:51 AM Bug #22715 (Fix Under Review): log entries weirdly zeroed out after 'osd pg-temp' command
https://github.com/ceph/ceph/pull/19998 Sage Weil

01/17/2018

10:28 PM Bug #22351: Couldn't init storage provider (RADOS)
Nikos Kormpakis wrote:
> But I still cannot understand why I'm hitting this error.
> Regarding my cluster, I have t...
Brad Hubbard
01:15 PM Bug #22351: Couldn't init storage provider (RADOS)
Brad Hubbard wrote:
> I'm able to reproduce something like what you are seeing, the messages are a little different....
Nikos Kormpakis
03:30 AM Bug #22351: Couldn't init storage provider (RADOS)
I'm able to reproduce something like what you are seeing; the messages are a little different.
What I see is this....
Brad Hubbard
12:12 AM Bug #22351: Couldn't init storage provider (RADOS)
It turns out what we need is the hexadecimal int representation of '-34' from the ltrace output.
$ c++filt </tmp/l...
Brad Hubbard
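Brad's note above can be illustrated with a quick sketch (assuming -34 is -ERANGE, as in POSIX errno, and that ltrace prints the return value as an unsigned hex integer):

```python
# How a negative errno-style return value (-34, i.e. -ERANGE) appears
# once reinterpreted as an unsigned integer, the form seen in hex output.
import errno

ret = -errno.ERANGE                       # -34
print(hex(ret & 0xffffffff))              # 32-bit view: 0xffffffde
print(hex(ret & 0xffffffffffffffff))      # 64-bit view: 0xffffffffffffffde
```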
10:26 PM Bug #21388: inconsistent pg but repair does nothing reporting head data_digest != data_digest fro...
Ryan Anstey wrote:
> I'm working on fixing all my inconsistent pgs but I'm having issues with rados get... hopefully...
Brian Andrus
09:07 PM Bug #22656: scrub mismatch on bytes (cache pools)
/a/sage-2018-01-17_14:40:55-rados-wip-sage-testing-2018-01-16-2156-distro-basic-smithi/2082959
description: rados/...
Sage Weil
07:54 PM Bug #21833: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
David Zafman
07:48 PM Bug #20059: miscounting degraded objects
https://github.com/ceph/ceph/pull/19850 David Zafman
07:36 PM Bug #21387 (Can't reproduce): mark_unfound_lost hangs
Multiple fixes to mark_all_unfound_lost() have fixed this. Possibly the most important master branch commit is 689bff... David Zafman
06:00 PM Bug #22668 (Fix Under Review): osd/ExtentCache.h: 371: FAILED assert(tid == 0)
https://github.com/ceph/ceph/pull/19989 Sage Weil
05:10 PM Backport #22724 (Resolved): luminous: miscounting degraded objects
on bigbang,... David Zafman
04:39 PM Bug #22673 (Fix Under Review): osd checks out-of-date osdmap for DESTROYED flag on start
note: you can work around this by waiting a bit until some osd maps trim from the monitor.
https://github.com/ceph...
Sage Weil
02:54 PM Bug #22673: osd checks out-of-date osdmap for DESTROYED flag on start
It looks like the _preboot destroyed check should go after we catch up on maps. Sage Weil
02:53 PM Bug #22673: osd checks out-of-date osdmap for DESTROYED flag on start
This is a real bug, should be straightforward to fix. Thanks for the report! Sage Weil
02:59 PM Bug #22544: objecter cannot resend split-dropped op when racing with con reset
Hmm, I'm not sure what the best fix is. Do you see a good path to fixing this with ms_handle_connect()? Sage Weil
02:57 PM Bug #22659 (In Progress): During the cache tiering configuration ,ceph-mon daemon getting crashed...
This will need to be backported to luminous and jewel once merged. Joao Eduardo Luis
09:36 AM Bug #22659: During the cache tiering configuration ,ceph-mon daemon getting crashed after setting...
https://github.com/ceph/ceph/pull/19983 Jing Li
02:55 PM Bug #22662: ceph osd df json output validation reported invalid numbers (-nan) (jewel)
1. it's not valid JSON; the Formatter shouldn't allow it
2. we should have a valid value (or 0) to use
Sage Weil
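Point 1 can be demonstrated with Python's json module: by default it emits the non-standard token NaN, while a strict serializer (the behavior the Formatter should have) rejects the value outright:

```python
import json

# Default behavior: emits "NaN", which is not valid per the JSON spec.
print(json.dumps({"kb_used": float("nan")}))  # {"kb_used": NaN}

# Strict behavior: refuse to serialize out-of-range floats at all.
try:
    json.dumps({"kb_used": float("nan")}, allow_nan=False)
except ValueError as e:
    print("rejected:", e)
```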
02:52 PM Bug #22661 (Triaged): Segmentation fault occurs when the following CLI is executed
Joao Eduardo Luis
02:51 PM Bug #22672 (Triaged): OSDs frequently segfault in PrimaryLogPG::find_object_context() with empty ...
Joao Eduardo Luis
02:28 PM Bug #22597 (Fix Under Review): "sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-0'" fails in upgra...
https://github.com/ceph/ceph/pull/19987 Kefu Chai
01:32 PM Bug #22233 (In Progress): prime_pg_temp breaks on uncreated pgs
Kefu Chai
11:24 AM Support #22664: some random OSD are down (with a Abort signal on exception) after replace/rebuild...
Hi Greg,
Can you point me to the link? As far as we have seen so far, all ulimits are set 10 times higher than needed on all nodes....
Enrico Labedzki

01/16/2018

09:49 PM Bug #22715 (Resolved): log entries weirdly zeroed out after 'osd pg-temp' command
... Sage Weil
07:59 PM Bug #20059 (Pending Backport): miscounting degraded objects
David Zafman
07:10 PM Bug #22711 (Resolved): qa/workunits/cephtool/test.sh fails with test_mon_cephdf_commands: expect...
... Sage Weil
07:09 PM Bug #22677 (Resolved): rados/test_rados_tool.sh failure
Sage Weil
04:16 PM Bug #22351: Couldn't init storage provider (RADOS)
Hello,
we're facing the same issue on a Luminous cluster.
Some info about the cluster:
Version: ceph version 1...
Nikos Kormpakis
03:08 PM Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second...
/a/sage-2018-01-16_03:08:54-rados-wip-sage2-testing-2018-01-15-1257-distro-basic-smithi/2077982... Sage Weil
01:33 PM Backport #22707 (In Progress): luminous: ceph_objectstore_tool: no flush before collection_empty(...
Nathan Cutler
01:30 PM Backport #22707 (Resolved): luminous: ceph_objectstore_tool: no flush before collection_empty() c...
https://github.com/ceph/ceph/pull/19967 Nathan Cutler
01:21 PM Bug #22409 (Pending Backport): ceph_objectstore_tool: no flush before collection_empty() calls; O...
Sage Weil
12:53 PM Support #20108: PGs are not remapped correctly when one host fails
Hello,
I'm sorry I've missed your message. Can you please give me some clues about the "newer crush tunables" that...
Laszlo Budai
12:48 PM Bug #22668: osd/ExtentCache.h: 371: FAILED assert(tid == 0)
/a/sage-2018-01-15_18:49:16-rados-wip-sage-testing-2018-01-14-1341-distro-basic-smithi/2076047 Sage Weil
12:48 PM Bug #22668: osd/ExtentCache.h: 371: FAILED assert(tid == 0)
/a/sage-2018-01-15_18:49:16-rados-wip-sage-testing-2018-01-14-1341-distro-basic-smithi/2075822 Sage Weil
11:10 AM Support #22680: mons segmentation faults New 12.2.2 cluster
Thanks! We had jemalloc in LD_PRELOAD since Infernalis, so i didn't think about that. I removed this from sysconfig, ... Kenneth Waegeman
 
