Activity

From 04/08/2018 to 05/07/2018

05/07/2018

08:38 PM Bug #24037 (Resolved): osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_nod...
... Patrick Donnelly
07:24 PM Bug #23909: snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a1,1a3 was r...

Never mind. I see your branch was still on the ci repo.
$ git branch --contains c20a95b0b9f4082dcebb339135683b91fe39e...
David Zafman
07:18 PM Bug #23909 (Need More Info): snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,...

Does your branch include c20a95b0b9f4082dcebb339135683b91fe39ec0a? The change I made was needed to make that fix w...
David Zafman
05:25 PM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
Alternative Mimic fix: https://github.com/ceph/ceph/pull/21859 Jason Dillaman
02:55 PM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
will reset the member variables of C_notify_Finish in its dtor for debugging, to see if it has been destroyed or not ... Kefu Chai
07:43 AM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
the test still fails with the fixes above: /a/kchai-2018-05-06_15:50:41-rados-wip-kefu-testing-2018-05-06-2204-distro... Kefu Chai
03:34 PM Bug #24033 (Fix Under Review): rados: not all exceptions accept keyargs
Patrick Donnelly
01:19 PM Bug #24033: rados: not all exceptions accept keyargs
https://github.com/ceph/ceph/pull/21853 Rishabh Dave
12:55 PM Bug #24033 (Resolved): rados: not all exceptions accept keyargs
The method make_ex() in rados.pyx raises exceptions irrespective of whether an exception can or cannot handl... Rishabh Dave
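A hedged sketch of the general shape of such a fix (illustrative Python only; the factory name and fallback behavior are assumptions, not the actual rados.pyx code): forward keyword arguments only to exception classes whose constructors accept them.
<pre>
# Illustrative sketch, not the actual rados.pyx make_ex(): fall back to a
# plain message when the exception type rejects keyword arguments.
def make_ex(exc_class, msg, **kwargs):
    try:
        return exc_class(msg, **kwargs)
    except TypeError:
        # This exception type's constructor does not accept kwargs.
        return exc_class(msg)
</pre>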
02:15 AM Backport #23925 (In Progress): luminous: assert on pg upmap
Prashant D
12:05 AM Bug #24023: Segfault on OSD in 12.2.5
Another one occurred today on a different OSD:
2018-05-06 19:48:33.636221 7f0f55922700 -1 *** Caught signal (Segme...
Alex Gorbachev

05/06/2018

09:01 AM Backport #23925: luminous: assert on pg upmap
https://github.com/ceph/ceph/pull/21818 xie xingguo
08:57 AM Bug #23921 (Pending Backport): pg-upmap cannot balance in some case
xie xingguo
03:35 AM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
mimic: https://github.com/ceph/ceph/pull/21834 Kefu Chai
03:32 AM Backport #24027 (In Progress): mimic: ceph_daemon.py format_dimless units list index out of range
Kefu Chai
03:30 AM Backport #24027 (Resolved): mimic: ceph_daemon.py format_dimless units list index out of range
https://github.com/ceph/ceph/pull/21836 Kefu Chai
03:29 AM Bug #23962 (Pending Backport): ceph_daemon.py format_dimless units list index out of range
Kefu Chai
03:28 AM Backport #24026 (In Progress): mimic: pg-upmap cannot balance in some case
Kefu Chai
03:27 AM Backport #24026 (Resolved): mimic: pg-upmap cannot balance in some case
https://github.com/ceph/ceph/pull/21835 Kefu Chai
03:24 AM Bug #23627 (Resolved): Error EACCES: problem getting command descriptions from mgr.None from 'cep...
Kefu Chai

05/05/2018

08:32 PM Bug #24025: RocksDB compression is not supported at least on Debian.
I use:
deb https://download.ceph.com/debian-luminous/ stretch main
Ceph 12.2.5 and Debian 9.
Марк Коренберг
08:31 PM Bug #24025 (Resolved): RocksDB compression is not supported at least on Debian.
... Марк Коренберг
04:20 PM Bug #23909: snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a1,1a3 was r...
http://pulpito.ceph.com/kchai-2018-05-05_14:56:43-rados-wip-kefu-testing-2018-05-05-1912-distro-basic-smithi/
<pre...
Kefu Chai
01:47 PM Backport #23904 (Resolved): luminous: Deleting a pool with active watch/notify linger ops can res...
Nathan Cutler
11:55 AM Bug #24023 (Duplicate): Segfault on OSD in 12.2.5
2018-05-05 06:33:42.383231 7f83289a4700 -1 *** Caught signal (Segmentation fault) **
in thread 7f83289a4700 thread_...
Alex Gorbachev
11:23 AM Bug #24022: "ceph tell osd.x bench" writes resulting JSON to stderr instead of stdout.
Maybe not only this command, but also some others. Марк Коренберг
11:23 AM Bug #24022 (Resolved): "ceph tell osd.x bench" writes resulting JSON to stderr instead of stdout.
Марк Коренберг
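A hedged workaround sketch, assuming the report is accurate and the JSON really lands on stderr: capture both streams and parse whichever one holds valid JSON.
<pre>
import json
import subprocess

# Capture stdout and stderr; parse whichever stream contains the JSON.
p = subprocess.run(["ceph", "tell", "osd.0", "bench"],
                   capture_output=True, text=True)
result = None
for stream in (p.stdout, p.stderr):
    try:
        result = json.loads(stream)
        break
    except json.JSONDecodeError:
        continue
print(result)
</pre>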
11:05 AM Bug #23627: Error EACCES: problem getting command descriptions from mgr.None from 'ceph tell mgr'
https://github.com/ceph/ceph/pull/21832 Kefu Chai
10:57 AM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
master: https://github.com/ceph/ceph/pull/21831 Kefu Chai
08:57 AM Bug #21977 (Resolved): null map from OSDService::get_map in advance_pg
Nathan Cutler
08:57 AM Backport #23870 (Resolved): luminous: null map from OSDService::get_map in advance_pg
Nathan Cutler
08:56 AM Backport #24016: luminous: scrub interaction with HEAD boundaries and snapmapper repair is broken
Quoting David Zafman, PR to backport is:
https://github.com/ceph/ceph/pull/21546
Backport the entire pull reque...
Nathan Cutler

05/04/2018

07:01 PM Backport #23784 (Resolved): luminous: osd: Warn about objects with too many omap entries
Nathan Cutler
05:13 PM Backport #23784: luminous: osd: Warn about objects with too many omap entries
Vikhyat Umrao wrote:
> https://github.com/ceph/ceph/pull/21518
merged
Yuri Weinstein
06:22 PM Bug #24000: mon: snap delete on deleted pool returns 0 without proper payload
Jason put a client-side handler in, but we should change the monitor as well so that we don't break older clients (or... Greg Farnum
05:16 PM Backport #23904: luminous: Deleting a pool with active watch/notify linger ops can result in seg ...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21752
merged
Yuri Weinstein
05:14 PM Backport #23870: luminous: null map from OSDService::get_map in advance_pg
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21737
merged
Yuri Weinstein
03:20 PM Backport #24016 (Resolved): luminous: scrub interaction with HEAD boundaries and snapmapper repai...
Included in
https://github.com/ceph/ceph/pull/22044
Nathan Cutler
03:19 PM Backport #24015 (Resolved): luminous: UninitCondition in PG::RecoveryState::Incomplete::react(PG:...
https://github.com/ceph/ceph/pull/21993 Nathan Cutler
02:30 PM Bug #23921 (Fix Under Review): pg-upmap cannot balance in some case
xie xingguo
02:30 PM Bug #23921: pg-upmap cannot balance in some case
https://github.com/ceph/ceph/pull/21815 xie xingguo
08:29 AM Bug #24007 (New): rados.connect get a segmentation fault
If I try to use librados in the following way, I get a segmentation fault.
!http://img0.ph.126.net/ekMbDVzMROb-o_...
xianpao chen
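For comparison, a minimal python-rados connect sequence that is generally expected to work (the conffile path is an assumption; the reporter's failing code is only available via the truncated image link above).
<pre>
import rados

# Minimal, known-good initialization: configure from a conffile, connect,
# and always shut down the handle.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
try:
    cluster.connect()
    print("connected, fsid:", cluster.get_fsid())
finally:
    cluster.shutdown()
</pre>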
04:04 AM Feature #22420 (Fix Under Review): Add support for obtaining a list of available compression options
https://github.com/ceph/ceph/pull/21809 Kefu Chai
01:47 AM Bug #24006 (New): ceph-osd --mkfs has nondeterministic output
On 12.2.3, my `ceph-osd` has nondeterministic output. I'm running it as root.
Sometimes it prints "created object s...
Niklas Hambuechen
12:46 AM Bug #22881: scrub interaction with HEAD boundaries and snapmapper repair is broken
https://github.com/ceph/ceph/pull/21546
Backport the entire pull request which also fixes http://tracker.ceph.com/...
David Zafman
12:43 AM Bug #22881 (Pending Backport): scrub interaction with HEAD boundaries and snapmapper repair is br...
David Zafman
12:45 AM Bug #23909 (Resolved): snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a...
Included in https://github.com/ceph/ceph/pull/21546 David Zafman

05/03/2018

10:30 PM Bug #23980 (Pending Backport): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap...
Sage Weil
01:45 PM Bug #23980 (Fix Under Review): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap...
https://github.com/ceph/ceph/pull/21798 Sage Weil
01:03 AM Bug #23980 (Resolved): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap const&)
... Sage Weil
08:56 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
Are there messages "not scheduling scrubs due to active recovery" in the logs on any of the primary OSDs? That messa... David Zafman
08:40 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
Ran into something similar this past week (active+clean+inconsistent), where forced scrubs would not run. The foll... Sebastian Sobolewski
07:27 PM Bug #24000 (Fix Under Review): mon: snap delete on deleted pool returns 0 without proper payload
*PR*: https://github.com/ceph/ceph/pull/21804 Jason Dillaman
07:21 PM Bug #24000 (Resolved): mon: snap delete on deleted pool returns 0 without proper payload
It can lead to an abort in the client application since an empty reply w/o an error code is constructed in the monito... Jason Dillaman
03:44 PM Documentation #23999 (Resolved): osd_recovery_priority is not documented (but osd_recovery_op_pri...
Please document osd_recovery_priority and how it differs from osd_recovery_op_priority. Марк Коренберг
02:48 PM Bug #23961 (Duplicate): valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::re...
Kefu Chai
02:18 PM Backport #23998 (Resolved): luminous: osd/EC: slow/hung ops in multimds suite test
https://github.com/ceph/ceph/pull/24393 Nathan Cutler
02:08 PM Backport #23915 (Resolved): luminous: monitors crashing ./include/interval_set.h: 355: FAILED ass...
Nathan Cutler
01:51 PM Backport #23915: luminous: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jew...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21717
merged
Yuri Weinstein
01:40 PM Bug #23769 (Pending Backport): osd/EC: slow/hung ops in multimds suite test
Sage Weil
11:58 AM Feature #22420 (New): Add support for obtaining a list of available compression options
I am reopening this ticket, as the plugin registry is empty before any of the supported compressor plugins is created ... Kefu Chai
11:27 AM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
I didn't import or export any pgs; that was a working osd in the cluster.
Is it possible that the restart of the osd ...
Rafal Wadolowski
10:28 AM Backport #23988 (Resolved): luminous: luminous->master: luminous crashes with AllReplicasRecovere...
https://github.com/ceph/ceph/pull/21964 Nathan Cutler
10:27 AM Backport #23986 (Resolved): luminous: recursive lock of objecter session::lock on cancel
https://github.com/ceph/ceph/pull/21939 Nathan Cutler
05:21 AM Bug #22220: osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at dwarf2out....
https://access.redhat.com/errata/RHBA-2018:1293 Brad Hubbard
01:37 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Jason Dillaman wrote:
> Moving to RADOS since it sounds like it's an issue of corruption on your cache tier.
How ...
宏伟 唐
01:00 AM Bug #22656: scrub mismatch on bytes (cache pools)
/a/sage-2018-05-02_22:22:16-rados-wip-sage3-testing-2018-05-02-1448-distro-basic-smithi/2468046
description: rados...
Sage Weil
12:20 AM Feature #23979 (Resolved): Limit pg log length during recovery/backfill so that we don't run out ...

This means if there's another failure, we'll need to restart backfill or go from recovery to backfill, but that's b...
David Zafman

05/02/2018

09:02 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
Did you import or export any PGs? The on-disk pg info from comment #2 indicates the pg doesn't exist on osd.33 yet.
...
Josh Durgin
08:53 PM Bug #23961: valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::react(PG::AdvM...
What PRs were in the test branch that hit this? Did any of them change the PG class or related structures? Josh Durgin
12:23 PM Bug #23961: valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::react(PG::AdvM...
rerunning this test with another branch did not reproduce this issue.
http://pulpito.ceph.com/kchai-2018-05-02_11:...
Kefu Chai
01:50 AM Bug #23961 (Duplicate): valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::re...
... Kefu Chai
08:48 PM Bug #23830: rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
The pg meta object has been expected to be empty for many versions now. IIRC sage suggested this may be from a race that ... Josh Durgin
08:42 PM Bug #23860 (Pending Backport): luminous->master: luminous crashes with AllReplicasRecovered in St...
Josh Durgin
08:40 PM Bug #23942 (Duplicate): test_mon_osdmap_prune.sh failures
Josh Durgin
07:50 PM Bug #23769 (Fix Under Review): osd/EC: slow/hung ops in multimds suite test
https://github.com/ceph/ceph/pull/21684 Sage Weil
05:26 PM Bug #23966 (Fix Under Review): Deleting a pool with active notify linger ops can result in seg fault
*PR*: https://github.com/ceph/ceph/pull/21786 Jason Dillaman
04:00 PM Bug #23966 (In Progress): Deleting a pool with active notify linger ops can result in seg fault
Jason Dillaman
03:51 PM Bug #23966 (Resolved): Deleting a pool with active notify linger ops can result in seg fault
It's possible that if a notification is sent while a pool is being deleted, the Objecter will fail the Op w/ -ENOENT ... Jason Dillaman
02:50 PM Bug #23965 (New): FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cach...
teuthology run with debug-ms 1 at http://pulpito.ceph.com/joshd-2018-05-01_18:40:57-rgw-master-distro-basic-smithi/ Casey Bodley
01:42 PM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
http://pulpito.ceph.com/pdonnell-2018-05-01_20:58:18-multimds-wip-pdonnell-testing-20180501.191840-testing-basic-smit... Patrick Donnelly
11:47 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
Moving to RADOS since it sounds like it's an issue of corruption on your cache tier. Jason Dillaman
02:41 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
More discovery:
The snapshot exported from the cache tier (rep_glance pool) is an all-zero file (viewed by "od xxx.snap...
宏伟 唐
11:40 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
We frequently experience this with 12.2.3 running Ceph in a Kubernetes cluster, cf. https://github.com/ceph/ceph-cont... Tim Niemueller
11:32 AM Bug #23952: "ceph -f json osd pool ls detail" has missing pool name and pool id
Sorry, pool_name is here. Only pool id is missing. Марк Коренберг
10:11 AM Bug #23952: "ceph -f json osd pool ls detail" has missing pool name and pool id
Are you sure you're not getting pool name? I'm getting a pool_name field when I try this, and it appears to have bee... John Spray
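A quick verification sketch (assumes a running cluster and the ceph CLI): print which keys each pool entry actually carries in the JSON output, so a missing pool_name or pool id field stands out.
<pre>
import json
import subprocess

out = subprocess.run(["ceph", "-f", "json", "osd", "pool", "ls", "detail"],
                     capture_output=True, text=True, check=True).stdout
for pool in json.loads(out):
    # Show every key so missing identifiers stand out.
    print(sorted(pool.keys()))
</pre>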
11:04 AM Backport #23924 (In Progress): luminous: LibRadosAio.PoolQuotaPP failed
https://github.com/ceph/ceph/pull/21778 Prashant D
06:53 AM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
Any update? The mentioned workaround is not a good idea for us. Jarek Owsiewski
06:42 AM Bug #23949 (Resolved): osd: "failed to encode map e19 with expected crc" in cluster log "
Kefu Chai
05:22 AM Bug #23962 (Fix Under Review): ceph_daemon.py format_dimless units list index out of range
https://github.com/ceph/ceph/pull/21765 Kefu Chai
04:02 AM Bug #23962: ceph_daemon.py format_dimless units list index out of range
Sorry, the actual max magnitude is the EB level instead of ZB. Ivan Guan
03:48 AM Bug #23962 (Resolved): ceph_daemon.py format_dimless units list index out of range
The largest order of magnitude in the original list only goes up to the PB level; however, the ceph cluster Objecter actv metri... Ivan Guan
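An illustrative sketch of the failure mode and the obvious guard (not the actual ceph_daemon.py code; the units list here is an assumption based on the report that it stopped at the PB level):
<pre>
UNITS = [' ', 'k', 'M', 'G', 'T', 'P']  # assumed to end at P, per the report

def format_dimless(n):
    value = float(n)
    magnitude = 0
    # Clamp the index so a huge value cannot walk past the end of UNITS.
    while abs(value) >= 1000.0 and magnitude < len(UNITS) - 1:
        value /= 1000.0
        magnitude += 1
    return '%.1f%s' % (value, UNITS[magnitude].strip())

print(format_dimless(3 * 10**18))  # stays at the last unit instead of raising
</pre>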
03:31 AM Backport #23914 (In Progress): luminous: cache-try-flush hits wrlock, busy loops
https://github.com/ceph/ceph/pull/21764 Prashant D

05/01/2018

06:31 PM Bug #23827: osd sends op_reply out of order
For object 10000000004.00000004 osd_op_reply for 102425 is received before 93353.... Neha Ojha
05:52 PM Bug #23949 (Fix Under Review): osd: "failed to encode map e19 with expected crc" in cluster log "
https://github.com/ceph/ceph/pull/21756 Sage Weil
03:53 PM Bug #23949: osd: "failed to encode map e19 with expected crc" in cluster log "
/a/sage-2018-05-01_15:25:33-fs-master-distro-basic-smithi/2462491
reproduces on master
Sage Weil
03:09 PM Bug #23949 (In Progress): osd: "failed to encode map e19 with expected crc" in cluster log "
Sage Weil
03:09 PM Bug #23949: osd: "failed to encode map e19 with expected crc" in cluster log "
... Sage Weil
02:17 PM Bug #23949: osd: "failed to encode map e19 with expected crc" in cluster log "
More from master: http://pulpito.ceph.com/pdonnell-2018-05-01_03:21:36-fs-master-testing-basic-smithi/ Patrick Donnelly
05:26 PM Bug #23940 (Pending Backport): recursive lock of objecter session::lock on cancel
Sage Weil
02:39 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
The problem of left-over OSD data still persists when the partition table has been removed before "ceph-disk zap" is ... Geert Kloosterman
12:42 PM Backport #23905 (In Progress): jewel: Deleting a pool with active watch/notify linger ops can res...
https://github.com/ceph/ceph/pull/21754 Prashant D
11:36 AM Backport #23904 (In Progress): luminous: Deleting a pool with active watch/notify linger ops can ...
https://github.com/ceph/ceph/pull/21752 Prashant D
07:01 AM Bug #23952 (New): "ceph -f json osd pool ls detail" has missing pool name and pool id
`ceph osd pool ls detail` shows information about pool id and pool name, but with '-f json' this information disappears. Марк Коренберг

04/30/2018

11:10 PM Bug #23949 (Resolved): osd: "failed to encode map e19 with expected crc" in cluster log "
http://pulpito.ceph.com/pdonnell-2018-04-30_21:17:21-fs-wip-pdonnell-testing-20180430.193008-testing-basic-smithi/245... Patrick Donnelly
05:46 PM Bug #23860: luminous->master: luminous crashes with AllReplicasRecovered in Started/Primary/Activ...
Sage Weil
05:25 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025569.html
Paul Emmerich wrote:
> looks like it fai...
Greg Farnum
12:28 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
(Pulling backtrace into the ticket) John Spray
03:57 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
This pg has a 0 value in same_interval_since. I checked this with the following output:
https://paste.fedoraproject.org/pa...
Rafal Wadolowski
01:12 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
I found a little more... Rafal Wadolowski
03:48 PM Bug #23942 (Duplicate): test_mon_osdmap_prune.sh failures
... Sage Weil
02:55 PM Bug #23922 (Resolved): ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
Sage Weil
01:44 PM Bug #23922 (Fix Under Review): ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
https://github.com/ceph/ceph/pull/21739 Kefu Chai
01:32 PM Bug #23922: ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
... Kefu Chai
01:06 PM Bug #23922: ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
failed to reproduce this issue locally.
adding...
Kefu Chai
11:00 AM Bug #23922: ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
http://pulpito.ceph.com/kchai-2018-04-30_00:59:17-rados-wip-kefu-testing-2018-04-29-1248-distro-basic-smithi/2454246/ Kefu Chai
02:53 PM Bug #23940 (Fix Under Review): recursive lock of objecter session::lock on cancel
https://github.com/ceph/ceph/pull/21742 Sage Weil
02:30 PM Bug #23940 (Resolved): recursive lock of objecter session::lock on cancel
... Sage Weil
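A conceptual illustration only (Python threading, not the Objecter's C++): acquiring a non-reentrant lock twice on the same thread is the failure class this ticket names; Ceph's debug mutexes assert on it rather than silently hanging.
<pre>
import threading

session_lock = threading.Lock()  # non-reentrant, like a plain mutex

def cancel_op():
    with session_lock:   # second acquisition on the same thread
        pass

with session_lock:       # first acquisition, e.g. inside a completion path
    pass                 # calling cancel_op() here would deadlock
</pre>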
12:30 PM Backport #23870 (In Progress): luminous: null map from OSDService::get_map in advance_pg
https://github.com/ceph/ceph/pull/21737 Prashant D

04/29/2018

11:46 PM Bug #23937 (New): FAILED assert(info.history.same_interval_since != 0)
Two of our osds hit these assert and now they are down.... Rafal Wadolowski
10:23 AM Bug #22354 (Resolved): v12.2.2 unable to create bluestore osd using ceph-disk
Nathan Cutler
10:23 AM Backport #23103 (Resolved): luminous: v12.2.2 unable to create bluestore osd using ceph-disk
Nathan Cutler
10:22 AM Bug #22082 (Resolved): Various odd clog messages for mons
Nathan Cutler
10:21 AM Backport #22167 (Resolved): luminous: Various odd clog messages for mons
Nathan Cutler
10:21 AM Bug #22090 (Resolved): cluster [ERR] Unhandled exception from module 'balancer' while running on ...
Nathan Cutler
10:20 AM Backport #22164 (Resolved): luminous: cluster [ERR] Unhandled exception from module 'balancer' wh...
Nathan Cutler
10:20 AM Bug #21993 (Resolved): "ceph osd create" is not idempotent
Nathan Cutler
10:20 AM Backport #22019 (Resolved): luminous: "ceph osd create" is not idempotent
Nathan Cutler
10:19 AM Bug #21203 (Resolved): build_initial_pg_history doesn't update up/acting/etc
Nathan Cutler
10:19 AM Backport #21236 (Resolved): luminous: build_initial_pg_history doesn't update up/acting/etc
Nathan Cutler
07:07 AM Bug #21206 (Resolved): thrashosds read error injection doesn't take live_osds into account
Nathan Cutler
07:07 AM Backport #21235 (Resolved): luminous: thrashosds read error injection doesn't take live_osds into...
Nathan Cutler
06:22 AM Backport #23915 (In Progress): luminous: monitors crashing ./include/interval_set.h: 355: FAILED ...
Nathan Cutler
05:44 AM Backport #22934: luminous: filestore journal replay does not guard omap operations
https://github.com/ceph/ceph/pull/21547 Victor Denisov

04/28/2018

10:32 PM Backport #23915: luminous: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jew...
https://github.com/ceph/ceph/pull/21717 Paul Emmerich
07:11 PM Backport #23926 (Rejected): luminous: disable bluestore cache caused a rocksdb error
Nathan Cutler
07:11 PM Backport #23925 (Resolved): luminous: assert on pg upmap
https://github.com/ceph/ceph/pull/21818 Nathan Cutler
07:11 PM Backport #23924 (Resolved): luminous: LibRadosAio.PoolQuotaPP failed
https://github.com/ceph/ceph/pull/21778 Nathan Cutler
06:19 PM Bug #23816 (Pending Backport): disable bluestore cache caused a rocksdb error
Sage Weil
06:17 PM Bug #23878 (Pending Backport): assert on pg upmap
Sage Weil
06:17 PM Bug #23916 (Pending Backport): LibRadosAio.PoolQuotaPP failed
Sage Weil
06:16 PM Bug #23922 (Resolved): ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
... Sage Weil
04:23 AM Bug #23921: pg-upmap cannot balance in some case
But if I unlink all osds from 'root default / host huangjun', everything works OK.... huang jun
04:04 AM Bug #23921 (Resolved): pg-upmap cannot balance in some case
I have a cluster with 21 osds, cluster topology is... huang jun

04/27/2018

10:38 PM Bug #23916 (Fix Under Review): LibRadosAio.PoolQuotaPP failed
https://github.com/ceph/ceph/pull/21709 Sage Weil
09:22 PM Bug #23916 (Resolved): LibRadosAio.PoolQuotaPP failed
http://qa-proxy.ceph.com/teuthology/yuriw-2018-04-27_16:52:05-rados-wip-yuri-testing-2018-04-27-1519-distro-basic-smi... Josh Durgin
10:27 PM Bug #23917 (Duplicate): LibRadosAio.PoolQuotaPP failure
Josh Durgin
10:24 PM Bug #23917 (Duplicate): LibRadosAio.PoolQuotaPP failure
... Sage Weil
08:07 PM Backport #23915 (Resolved): luminous: monitors crashing ./include/interval_set.h: 355: FAILED ass...
https://github.com/ceph/ceph/pull/21717 Nathan Cutler
08:06 PM Backport #23914 (Resolved): luminous: cache-try-flush hits wrlock, busy loops
https://github.com/ceph/ceph/pull/21764 Nathan Cutler
08:01 PM Bug #23860 (Fix Under Review): luminous->master: luminous crashes with AllReplicasRecovered in St...
https://github.com/ceph/ceph/pull/21706 Sage Weil
07:30 PM Bug #18746 (Pending Backport): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) ...
Sage Weil
07:28 PM Bug #23664 (Pending Backport): cache-try-flush hits wrlock, busy loops
Sage Weil
07:28 PM Bug #21165 (Can't reproduce): 2 pgs stuck in unknown during thrashing
Sage Weil
07:27 PM Bug #23788 (Duplicate): luminous->mimic: EIO (crc mismatch) on copy-get from ec pool
I think this was a dup of #23871 Sage Weil
07:24 PM Backport #23912 (Resolved): luminous: mon: High MON cpu usage when cluster is changing
https://github.com/ceph/ceph/pull/21968 Nathan Cutler
07:17 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
The zap run in this is definitely not zeroing the first block, based on log output... Vasu Kulkarni
06:49 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
We clean more than 100M, but I think it's from the end:
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph...
Vasu Kulkarni
06:25 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
Thanks, Alfredo.
It shows that zap is not working now; I think we should fix ceph-disk zap to properly clean the...
Vasu Kulkarni
06:07 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
Looking at the logs for the OSD that failed:... Alfredo Deza
05:48 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
Seen on 14.04, 16.04, and CentOS, with the bluestore option only.
14.04:
http://qa-proxy.ceph.com/teuthology/teuth...
Vasu Kulkarni
05:45 PM Bug #23911 (Won't Fix - EOL): ceph:luminous: osd out/down when setup with ubuntu/bluestore
This could be a systemd issue or more:
a) set up the cluster using ceph-deploy
b) use the ceph-disk/bluestore option for ...
Vasu Kulkarni
05:26 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
Moving this back to RADOS as it seems the new consensus is that it's a RADOS bug. Patrick Donnelly
06:46 AM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
From message: "error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, count... jianpeng ma
04:38 PM Bug #23893 (Resolved): jewel clients fail to decode mimic osdmap
it was a bug in wip-osdmap-encode, fixed before merge Sage Weil
04:14 PM Bug #23713 (Pending Backport): High MON cpu usage when cluster is changing
Sage Weil
03:01 PM Bug #23909 (Resolved): snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a...

New code for tracker #22881 in pull request https://github.com/ceph/ceph/pull/21546 now calls _scan_snaps() on each ...
David Zafman
01:23 PM Bug #23627 (Fix Under Review): Error EACCES: problem getting command descriptions from mgr.None f...
https://github.com/ceph/ceph/pull/21698 Kefu Chai
01:16 PM Bug #23627: Error EACCES: problem getting command descriptions from mgr.None from 'ceph tell mgr'
... Kefu Chai
12:22 PM Bug #23627: Error EACCES: problem getting command descriptions from mgr.None from 'ceph tell mgr'
/a//kchai-2018-04-27_07:23:02-rados-wip-kefu-testing-2018-04-27-0902-distro-basic-smithi/2444194 Kefu Chai
10:43 AM Backport #23905 (Resolved): jewel: Deleting a pool with active watch/notify linger ops can result...
https://github.com/ceph/ceph/pull/21754 Nathan Cutler
10:42 AM Backport #23904 (Resolved): luminous: Deleting a pool with active watch/notify linger ops can res...
https://github.com/ceph/ceph/pull/21752 Nathan Cutler
10:39 AM Backport #23850 (New): luminous: Read operations segfaulting multiple OSDs
Status can change to "In Progress" when the PR is open and the PR URL is mentioned in a comment. Nathan Cutler
06:29 AM Backport #23850 (In Progress): luminous: Read operations segfaulting multiple OSDs
Victor Denisov
10:17 AM Bug #23899: run cmd 'ceph daemon osd.0 smart' cause osd daemon Segmentation fault
Root cause: sometimes output.read_fd() can return zero-length data.
ret = output.read_fd(smartctl.get_stdout(), 1...
cory gu
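An illustrative guard in Python (not the actual C++ handler): output that can legitimately be empty must be checked before it is parsed or indexed.
<pre>
def handle_smartctl_output(data: bytes):
    # The reported crash path: a zero-length read must not be indexed.
    if not data:
        return None
    return data.decode(errors="replace").splitlines()
</pre>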
10:15 AM Bug #23899 (Resolved): run cmd 'ceph daemon osd.0 smart' cause osd daemon Segmentation fault

2018-04-27 09:44:51.572 7fb787a05700 -1 osd.0 57 smartctl output is:
2018-04-27 09:44:51.576 7fb787a05700 -1 *** C...
cory gu
09:00 AM Bug #23879: test_mon_osdmap_prune.sh fails
... Kefu Chai
01:34 AM Bug #23878: assert on pg upmap
This PR (#21670) passed the tests that failed before in my local cluster; it needs QA. huang jun
12:55 AM Bug #23872 (Pending Backport): Deleting a pool with active watch/notify linger ops can result in ...
Kefu Chai

04/26/2018

11:27 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh

Seen again:
http://qa-proxy.ceph.com/teuthology/dzafman-2018-04-26_10:04:07-rados-wip-zafman-testing-distro-basi...
David Zafman
10:33 PM Bug #23893 (Resolved): jewel clients fail to decode mimic osdmap
http://pulpito.ceph.com/sage-2018-04-26_19:17:57-rados:thrash-old-clients-wip-sage-testing-2018-04-26-1251-distro-bas... Sage Weil
10:22 PM Bug #23871 (Resolved): luminous->mimic: missing primary copy of xxx, will try copies on 3, then fu...
Sage Weil
10:20 PM Bug #23892 (Can't reproduce): luminous->mimic: mon segv in ~MonOpRequest from OpHistoryServiceThread
... Sage Weil
05:06 PM Bug #23785 (Resolved): "test_prometheus (tasks.mgr.test_module_selftest.TestModuleSelftest) ... E...
test is passing now Sage Weil
02:23 PM Bug #23769 (In Progress): osd/EC: slow/hung ops in multimds suite test
Sage Weil
01:55 PM Bug #23878 (Fix Under Review): assert on pg upmap
https://github.com/ceph/ceph/pull/21670 Sage Weil
01:55 PM Bug #23878: assert on pg upmap
Sage Weil
09:52 AM Bug #23878: assert on pg upmap
I’ll prepare a patch soon xie xingguo
06:44 AM Bug #23878: assert on pg upmap
And then if I do a pg-upmap operation.... huang jun
05:35 AM Bug #23878: assert on pg upmap
After picking the PR https://github.com/ceph/ceph/pull/21325,
it works fine.
But I have some questions:
the upmap items...
huang jun
04:31 AM Bug #23878 (Resolved): assert on pg upmap
I used the following script to test upmap... huang jun
10:09 AM Backport #23863 (In Progress): luminous: scrub interaction with HEAD boundaries and clones is broken
Prashant D
09:16 AM Backport #23863: luminous: scrub interaction with HEAD boundaries and clones is broken
https://github.com/ceph/ceph/pull/21665 Prashant D
07:46 AM Bug #23879 (Can't reproduce): test_mon_osdmap_prune.sh fails
... Kefu Chai
02:46 AM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
/kchai-2018-04-26_00:52:32-rados-wip-kefu-testing-2018-04-25-2253-distro-basic-smithi/2439501/ Kefu Chai
12:02 AM Bug #20924: osd: leaked Session on osd.7
osd.3 here:
http://pulpito.ceph.com/yuriw-2018-04-23_23:19:23-rados-wip-yuri-testing-2018-04-23-1502-distro-basic-...
Josh Durgin

04/25/2018

10:10 PM Bug #23875 (Resolved): Removal of snapshot with corrupt replica crashes osd

This may be a completely legitimate crash due to the corruption.
See pending test case TEST_scrub_snaps_replica ...
David Zafman
09:46 PM Bug #23816 (Fix Under Review): disable bluestore cache caused a rocksdb error
Josh Durgin
09:29 PM Bug #23204 (Duplicate): missing primary copy of object in mixed luminous<->master cluster with bl...
Sage Weil
09:28 PM Bug #21992 (Duplicate): osd: src/common/interval_map.h: 161: FAILED assert(len > 0)
Sage Weil
09:14 PM Backport #23786 (Fix Under Review): luminous: "utilities/env_librados.cc:175:33: error: unused pa...
https://github.com/ceph/ceph/pull/21655 Sage Weil
09:09 PM Bug #23827: osd sends op_reply out of order
Sage Weil
06:26 AM Bug #23827: osd sends op_reply out of order
Same as bug #20742. jianpeng ma
03:46 AM Bug #23827: osd sends op_reply out of order
Ignore my statement. Dispatch does put_back, so no race. jianpeng ma
03:24 AM Bug #23827: osd sends op_reply out of order
For this case: slot->to_process is null, Op1 does enqueue_front, and at the same time Op2 dispatches. Because two threads... jianpeng ma
09:09 PM Bug #23664 (Fix Under Review): cache-try-flush hits wrlock, busy loops
Sage Weil
08:35 PM Bug #23664: cache-try-flush hits wrlock, busy loops
reproducing this semi-frequently, see #23847
This should fix it: https://github.com/ceph/ceph/pull/21653
Sage Weil
09:07 PM Bug #23871 (In Progress): luminous->mimic: missing primary copy of xxx, will try copies on 3, then...
Sage Weil
04:25 PM Bug #23871 (Resolved): luminous->mimic: missing primary copy of xxx, will try copies on 3, then fu...
... Sage Weil
08:34 PM Bug #23847 (Duplicate): osd stuck recovery
Sage Weil
05:52 PM Bug #23847: osd stuck recovery
Recovery is starved by #23664, a cache tiering infinite loop. Sage Weil
05:36 PM Bug #23847: osd stuck recovery
recovery on 3.3 stalls out here... Sage Weil
05:26 PM Bug #23872: Deleting a pool with active watch/notify linger ops can result in seg fault
Original test failure where this issue was discovered: http://pulpito.ceph.com/trociny-2018-04-24_08:17:18-rbd-wip-mg... Jason Dillaman
05:24 PM Bug #23872 (Fix Under Review): Deleting a pool with active watch/notify linger ops can result in ...
*PR*: https://github.com/ceph/ceph/pull/21649 Jason Dillaman
05:17 PM Bug #23872 (Resolved): Deleting a pool with active watch/notify linger ops can result in seg fault
... Jason Dillaman
04:24 PM Backport #23870 (Resolved): luminous: null map from OSDService::get_map in advance_pg
https://github.com/ceph/ceph/pull/21737 Nathan Cutler
04:23 PM Backport #23863 (Resolved): luminous: scrub interaction with HEAD boundaries and clones is broken
https://github.com/ceph/ceph/pull/22044 Nathan Cutler
04:00 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
master commit says:
Consider a scenario like:
- scrub [3:2525d100:::earlier:head,3:2525d12f:::foo:200]
- we see...
Sage Weil
03:58 PM Bug #23646 (Pending Backport): scrub interaction with HEAD boundaries and clones is broken
Sage Weil
03:48 PM Bug #21977 (Pending Backport): null map from OSDService::get_map in advance_pg
Sage Weil
03:45 PM Bug #23857: flush (manifest) vs async recovery causes out of order op
maybe: /a/sage-2018-04-25_02:28:01-rados-wip-sage3-testing-2018-04-24-1729-distro-basic-smithi/2436808
rados/thras...
Sage Weil
03:37 PM Bug #23857: flush (manifest) vs async recovery causes out of order op
maybe: /a/sage-2018-04-25_02:28:01-rados-wip-sage3-testing-2018-04-24-1729-distro-basic-smithi/2436663
rados/thras...
Sage Weil
01:53 PM Bug #23857: flush (manifest) vs async recovery causes out of order op
The core problem is that the requeue logic assumes that objects always go from degraded to not degraded, never the o... Sage Weil
01:49 PM Bug #23857 (Can't reproduce): flush (manifest) vs async recovery causes out of order op
... Sage Weil
03:44 PM Bug #23860 (Resolved): luminous->master: luminous crashes with AllReplicasRecovered in Started/Pr...
... Sage Weil
11:10 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
We hit this today on the Jewel release (10.2.7); all OSDs connected to one of the monitors in the quorum are having this issue... Xiaoxi Chen
08:23 AM Backport #23852 (In Progress): luminous: OSD crashes on empty snapset
Nathan Cutler
08:18 AM Backport #23852 (Resolved): luminous: OSD crashes on empty snapset
https://github.com/ceph/ceph/pull/21638 Nathan Cutler
08:18 AM Bug #23851 (Resolved): OSD crashes on empty snapset
Fix merged to master: https://github.com/ceph/ceph/pull/21058 Nathan Cutler
04:49 AM Backport #23850 (Resolved): luminous: Read operations segfaulting multiple OSDs
https://github.com/ceph/ceph/pull/21911 Nathan Cutler

04/24/2018

10:33 PM Bug #21931 (In Progress): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (ra...
This is a bug in trimtrunc handling with EC pools. Josh Durgin
10:27 PM Bug #23195 (Pending Backport): Read operations segfaulting multiple OSDs
https://github.com/ceph/ceph/pull/21273 Josh Durgin
10:20 PM Bug #23195 (Resolved): Read operations segfaulting multiple OSDs
Sage Weil
10:25 PM Bug #23847 (Duplicate): osd stuck recovery
... Sage Weil
10:24 PM Bug #23827 (In Progress): osd sends op_reply out of order
Sage Weil
09:26 AM Bug #23827: osd sends op_reply out of order
... Zheng Yan
08:36 PM Bug #23646 (Fix Under Review): scrub interaction with HEAD boundaries and clones is broken
https://github.com/ceph/ceph/pull/21628
I think this will fix it?
Sage Weil
12:42 AM Bug #23646: scrub interaction with HEAD boundaries and clones is broken

The commit below adds code to honor the no_whiteout flag even when it looks like clones exist or will exist soon. ...
David Zafman
06:03 PM Bug #21977: null map from OSDService::get_map in advance_pg
https://github.com/ceph/ceph/pull/21623 Sage Weil
06:02 PM Bug #21977 (Fix Under Review): null map from OSDService::get_map in advance_pg
advance_pg ran before init() published the initial map to OSDService. Sage Weil
05:33 PM Bug #23716 (Resolved): osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (o...
This seems to be resolved. My guess is it's fallout from https://github.com/ceph/ceph/pull/21604 Sage Weil
03:45 PM Bug #23763 (Pending Backport): upgrade: bad pg num and stale health status in mixed lumnious/mimi...
Kefu Chai

04/23/2018

09:40 PM Bug #23646 (In Progress): scrub interaction with HEAD boundaries and clones is broken

The osd log for primary osd.1 shows that pg 3.0 is a cache pool in a cache tiering configuration. The message "_de...
David Zafman
09:12 PM Bug #23830 (Can't reproduce): rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
... Sage Weil
07:45 PM Bug #23828 (Can't reproduce): ec gen object leaks into different filestore collection just after ...
... Sage Weil
05:11 PM Bug #23827 (Resolved): osd sends op_reply out of order
... Patrick Donnelly
03:13 AM Bug #23713 (Fix Under Review): High MON cpu usage when cluster is changing
https://github.com/ceph/ceph/pull/21532 Sage Weil

04/22/2018

08:07 PM Bug #21977: null map from OSDService::get_map in advance_pg
From the latest logs, the peering thread id does not appear at all in the log until the crash.
I'm wondering if we...
Josh Durgin
08:05 PM Bug #21977: null map from OSDService::get_map in advance_pg
Seen again here:
http://pulpito.ceph.com/yuriw-2018-04-20_20:02:29-upgrade:jewel-x-luminous-distro-basic-ovh/2420862/
Josh Durgin

04/21/2018

04:06 PM Bug #23793: ceph-osd consumed 10+GB rss memory
Setting osd_debug_op_order to false can fix this problem.
My ceph cluster is created through vstart.sh, which sets osd_deb...
Honggang Yang
03:57 PM Bug #23816: disable bluestore cache caused a rocksdb error
https://github.com/ceph/ceph/pull/21583 Honggang Yang
03:53 PM Bug #23816 (Resolved): disable bluestore cache caused a rocksdb error
I disabled bluestore/rocksdb cache to estimate ceph-osd's memory consumption
by setting bluestore_cache_size_ssd/bluesto...
Honggang Yang
06:55 AM Bug #23145: OSD crashes during recovery of EC pg
`2018-03-09 08:29:09.170227 7f901e6b30 10 merge_log log((17348'18587,17348'18587], crt=17348'18585) from osd.6(2) int... Zengran Zhang

04/20/2018

09:09 PM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
You don't really have authentication without the message signing. Since we don't do full encryption, signing is the o... Greg Farnum
03:07 PM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
How costly is just the authentication piece, i.e. keep cephx but turn off message signing? Josh Durgin
07:21 AM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
Summary of the discussion:
`check_message_signature` in `AsyncConnection::process` is being already protected by `...
Radoslaw Zarzynski
06:38 AM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
per Radoslaw Zarzynski
> the overhead between `CreateContextBySym` and `DigestBegin` is small
and probably we c...
Kefu Chai
08:53 PM Bug #23517 (Resolved): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
Jason Dillaman
02:27 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
https://github.com/ceph/ceph/pull/21571 Kefu Chai
08:47 PM Bug #23811: RADOS stat slow for some objects on same OSD
We are still debugging this. On a further look, it looks like all objects on that PG (aka _79.1f9_) show similar slow... Vaibhav Bhembre
05:30 PM Bug #23811 (New): RADOS stat slow for some objects on same OSD
We have observed that queries have been slow for some RADOS objects while others on the same OSD respond much more quickly... Vaibhav Bhembre
05:19 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
I guess the intention is that scrubbing takes priority and proceeds even if trimming is in progress. Before more tri... David Zafman
04:45 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken

We don't start trimming if scrubbing is happening, so maybe the only hole is that scrubbing doesn't check for trimm...
David Zafman
04:38 PM Bug #23810: ceph mon dump outputs verbose text to stderr
As a simple verification, running:... Anonymous
04:26 PM Bug #23810 (New): ceph mon dump outputs verbose text to stderr
When executing... Anonymous
02:41 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
My opinion is that this is different from a problem where the inconsistent flag reappears after repairing a PG becaus... David Turner
12:55 PM Backport #23808 (In Progress): luminous: upgrade: bad pg num and stale health status in mixed lum...
https://github.com/ceph/ceph/pull/21556 Kefu Chai
12:55 PM Backport #23808 (Resolved): luminous: upgrade: bad pg num and stale health status in mixed lumnio...
https://github.com/ceph/ceph/pull/21556 Kefu Chai
11:11 AM Bug #23763 (Fix Under Review): upgrade: bad pg num and stale health status in mixed lumnious/mimi...
https://github.com/ceph/ceph/pull/21555 Kefu Chai
10:09 AM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
I think the pg_num = 11 is set by LibRadosList.EnumerateObjects... Kefu Chai
12:32 AM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
Yuri reproduced the bad pg_num in 1 of 2 runs:... Josh Durgin
12:48 AM Bug #22881 (In Progress): scrub interaction with HEAD boundaries and snapmapper repair is broken
David Zafman

04/19/2018

02:18 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
Kefu Chai
12:34 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
https://github.com/ceph/ceph/pull/21280 Kefu Chai
07:42 AM Bug #23517 (Fix Under Review): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
... Kefu Chai
09:33 AM Backport #23784 (In Progress): luminous: osd: Warn about objects with too many omap entries
Nathan Cutler
09:33 AM Backport #23784: luminous: osd: Warn about objects with too many omap entries
h3. description
As discussed in this PR - https://github.com/ceph/ceph/pull/16332
Nathan Cutler
07:29 AM Bug #23793: ceph-osd consumed 10+GB rss memory
the "mon max pg per osd" is 1024 in my test. Honggang Yang
07:14 AM Bug #23793 (New): ceph-osd consumed 10+GB rss memory
After 26 GB of data is written, ceph-osd's memory (RSS) reaches 10+ GB.
The objectstore backed is *KStore*. master branc...
Honggang Yang
06:42 AM Backport #22934 (In Progress): luminous: filestore journal replay does not guard omap operations
Victor Denisov

04/18/2018

09:10 PM Bug #23620 (Resolved): tasks.mgr.test_failover.TestFailover failure
Josh Durgin
08:25 PM Bug #23788 (Duplicate): luminous->mimic: EIO (crc mismatch) on copy-get from ec pool
... Sage Weil
08:01 PM Bug #23787 (Rejected): luminous: "osd-scrub-repair.sh'" failures in rados
This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
Yuri Weinstein
07:57 PM Backport #23786 (Resolved): luminous: "utilities/env_librados.cc:175:33: error: unused parameter ...
This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
Yuri Weinstein
07:52 PM Bug #23785 (Resolved): "test_prometheus (tasks.mgr.test_module_selftest.TestModuleSelftest) ... E...
This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
Yuri Weinstein
06:45 PM Backport #23784 (Resolved): luminous: osd: Warn about objects with too many omap entries
https://github.com/ceph/ceph/pull/21518 Vikhyat Umrao
03:34 PM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
The pgs with creating or unknown status in "pg dump" were active+clean after 2018-04-16 22:47, so the output of the last "pg... Kefu Chai
01:29 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
Any update on this? David Turner
12:14 PM Bug #23701 (Fix Under Review): oi(object_info_t) size mismatch with real object size during old w...
Kefu Chai
12:12 PM Bug #23701: oi(object_info_t) size mismatch with real object size during old write arrives after ...
I think this issue only exists in jewel. Kefu Chai
02:47 AM Documentation #23777: doc: description of OSD_OUT_OF_ORDER_FULL problem
The default values:
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
long li
02:39 AM Documentation #23777 (Resolved): doc: description of OSD_OUT_OF_ORDER_FULL problem
The description of OSD_OUT_OF_ORDER_FULL is... long li
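A minimal sketch, assuming OSD_OUT_OF_ORDER_FULL flags thresholds that are not in strictly ascending order (nearfull < backfillfull < full):
<pre>
def out_of_order_full(nearfull, backfillfull, full):
    # Healthy ordering is nearfull < backfillfull < full.
    return not (nearfull < backfillfull < full)

assert not out_of_order_full(0.85, 0.90, 0.95)  # the defaults are ordered
assert out_of_order_full(0.96, 0.90, 0.95)      # nearfull above full: warn
</pre>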
12:30 AM Feature #23364 (Resolved): Special scrub handling of hinfo_key errors
David Zafman
12:30 AM Backport #23654 (Resolved): luminous: Special scrub handling of hinfo_key errors
David Zafman

04/17/2018

07:00 PM Backport #23772 (Resolved): luminous: ceph status shows wrong number of objects
https://github.com/ceph/ceph/pull/22680 Nathan Cutler
06:36 PM Bug #23769 (Resolved): osd/EC: slow/hung ops in multimds suite test
... Patrick Donnelly
03:40 PM Feature #23364: Special scrub handling of hinfo_key errors
This pull request is another follow on:
https://github.com/ceph/ceph/pull/21450
David Zafman
11:41 AM Bug #20924: osd: leaked Session on osd.7
/a/sage-2018-04-17_04:17:03-rados-wip-sage3-testing-2018-04-16-2028-distro-basic-smithi/2404155
this time on osd.4...
Sage Weil
07:37 AM Bug #23767: "ceph ping mon" doesn't work
so "ceph ping mon.<id>" will remind you mon.<id> doesn't existed. however, if you run "ceph ping mon.a", you can get ... cory gu
07:33 AM Bug #23767 (New): "ceph ping mon" doesn't work
If there is only mon_host = ip1, ip2, ... in the ceph.conf, then "ceph ping mon.<id>" doesn't work.
Root cause is in the...
cory gu
06:14 AM Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
Sorry for the late reply, but it's hard to reproduce. We reproduced it once with... Yan Jun
02:09 AM Documentation #23765 (New): librbd hangs if permissions are incorrect
I've been building rust bindings for librbd against ceph jewel and luminous. I found out by accident that if a cephx... Chris Holcombe
12:14 AM Bug #23763 (Resolved): upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
This happened in a luminous-x/point-to-point run. Logs in teuthology:/home/yuriw/logs/2387999/
Versions at this po...
Josh Durgin

04/16/2018

05:52 PM Bug #23760 (New): mon: `config get <who>` does not allow `who` as 'mon'/'osd'
`config set mon` is allowed, but `config get mon` is not.
This is due to <who> on `get` being parsed as an EntityN...
Joao Eduardo Luis
04:39 PM Bug #23753: "Error ENXIO: problem getting command descriptions from osd.4" in upgrade:kraken-x-lu...
This generally means the OSD isn't on? Greg Farnum

04/15/2018

10:22 PM Bug #23753 (Can't reproduce): "Error ENXIO: problem getting command descriptions from osd.4" in u...
Run: http://pulpito.ceph.com/teuthology-2018-04-15_03:25:02-upgrade:kraken-x-luminous-distro-basic-smithi/
Jobs: '23...
Yuri Weinstein
05:44 PM Bug #23471 (Resolved): add --add-bucket and --move options to crushtool
Nathan Cutler
02:51 PM Bug #22095 (Pending Backport): ceph status shows wrong number of objects
Kefu Chai
08:52 AM Bug #19348: "ceph ping mon.c" cli prints assertion failure on timeout
https://github.com/ceph/ceph/pull/21432 Rishabh Dave

04/14/2018

06:11 AM Support #23719: Three nodes shutdown,only boot two of the nodes,many pgs down.(host failure-domai...
Fixed description: If pg 1.1 has acting set [1,2,3], I power down osd.3 first, then power down osd.2. In case I boot osd... junwei liao
05:50 AM Support #23719 (New): Three nodes shutdown,only boot two of the nodes,many pgs down.(host failure...
The interval mechanism of PG will cause a problem in the process of a cluster restart. If I have 3 nodes (host failure-do... junwei liao

04/13/2018

10:40 PM Bug #23716: osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (on upgrade f...
... Sage Weil
10:21 PM Bug #23716 (Resolved): osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (o...
... Sage Weil
07:33 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
Live multimds run: /ceph/teuthology-archive/pdonnell-2018-04-13_04:19:33-multimds-wip-pdonnell-testing-20180413.02283... Patrick Donnelly
07:30 PM Bug #21992: osd: src/common/interval_map.h: 161: FAILED assert(len > 0)
/ceph/teuthology-archive/pdonnell-2018-04-13_04:19:33-multimds-wip-pdonnell-testing-20180413.022831-testing-basic-smi... Patrick Donnelly
06:28 PM Bug #23713: High MON cpu usage when cluster is changing
My guess is that this is the compat reencoding of the OSDMap for the pre-luminous clients.
Are you by chance makin...
Sage Weil
06:10 PM Bug #23713 (Resolved): High MON cpu usage when cluster is changing
After upgrading to Luminous 12.2.4 (from Jewel 10.2.5), we consistently see high cpu usage when the OSDMap changes, esp... Xiaoxi Chen
03:03 PM Bug #23228 (Closed): scrub mismatch on objects
The failure in comment (2) looks unrelated, but it was a test branch. Let's see if it happens again.
The original ...
Sage Weil
01:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
Are there any tests, logs, etc. that would be helpful for tracking down the cause of this problem? I had a fairly bad... David Turner
08:20 AM Bug #23701: oi(object_info_t) size mismatch with real object size during old write arrives after ...
here is my pull request to fix this problem
https://github.com/ceph/ceph/pull/21408
Peng Xie
08:08 AM Bug #23701 (Fix Under Review): oi(object_info_t) size mismatch with real object size during old w...
Currently, in our test environment (jewel: cephfs + cache tier + ec pool), we found several osd coredumps
in the fol...
Peng Xie
01:52 AM Backport #23654 (In Progress): luminous: Special scrub handling of hinfo_key errors
https://github.com/ceph/ceph/pull/21397 Kefu Chai

04/12/2018

11:08 PM Feature #23364: Special scrub handling of hinfo_key errors
Follow on pull request included in backport to this tracker
https://github.com/ceph/ceph/pull/21362
David Zafman
09:49 PM Bug #23610 (Resolved): pg stuck in activating because of dropped pg-temp message
Nathan Cutler
09:48 PM Backport #23630 (Resolved): luminous: pg stuck in activating
Nathan Cutler
09:28 PM Backport #23630: luminous: pg stuck in activating
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21330
merged
Yuri Weinstein
05:35 PM Bug #23228: scrub mismatch on objects
My change only affects the scrub error counts in the stats. However, if setting dirty_info in proc_primary_info() wo... David Zafman
04:27 PM Bug #23228: scrub mismatch on objects
The original report was an EC test, so it looks like a dup of #23339.
David, your failures are not EC. Could they...
Sage Weil
04:43 PM Bug #20439 (Can't reproduce): PG never finishes getting created
Sage Weil
04:29 PM Bug #22656: scrub mismatch on bytes (cache pools)
Just bytes
dzafman-2018-03-28_18:21:29-rados-wip-zafman-testing-distro-basic-smithi/2332093
[ERR] 3.0 scrub sta...
Sage Weil
02:29 PM Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second...
/a/sage-2018-04-11_22:26:40-rados-wip-sage-testing-2018-04-11-1604-distro-basic-smithi/2387226 Sage Weil
02:25 PM Backport #23668 (In Progress): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrit...
https://github.com/ceph/ceph/pull/21378 Prashant D
01:34 AM Backport #23668 (Resolved): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrites'...
https://github.com/ceph/ceph/pull/21378 Nathan Cutler
07:19 AM Backport #23675 (In Progress): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
Nathan Cutler
07:07 AM Backport #23675 (Resolved): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
https://github.com/ceph/ceph/pull/21368 Nathan Cutler
03:27 AM Bug #23662 (Resolved): osd: regression causes SLOW_OPS warnings in multimds suite
Sage Weil
02:59 AM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
not able to move this to CI somehow... moving it to RADOS. Kefu Chai
02:54 AM Backport #23315 (Resolved): luminous: pool create cmd's expected_num_objects is not correctly int...
Nathan Cutler
02:41 AM Bug #23622 (Pending Backport): qa/workunits/mon/test_mon_config_key.py fails on master
Sage Weil
02:01 AM Bug #23564: OSD Segfaults
Correct, Bluestore and Luminous 12.2.4 Alex Gorbachev
01:57 AM Backport #23673 (In Progress): jewel: auth: ceph auth add does not sanity-check caps
Nathan Cutler
01:43 AM Backport #23673 (Resolved): jewel: auth: ceph auth add does not sanity-check caps
https://github.com/ceph/ceph/pull/21367 Nathan Cutler
01:53 AM Bug #23578 (Resolved): large-omap-object-warnings test fails
Brad Hubbard
01:52 AM Backport #23633 (Rejected): luminous: large-omap-object-warnings test fails
We can close this if that test isn't present in luminous. Brad Hubbard
01:35 AM Backport #23633 (Need More Info): luminous: large-omap-object-warnings test fails
Brad,
Backporting PR#21295 to luminous is unrelated unless we get qa/suites/rados/singleton-nomsgr/all/large-omap-ob...
Prashant D
01:41 AM Backport #23670 (In Progress): luminous: auth: ceph auth add does not sanity-check caps
Nathan Cutler
01:34 AM Backport #23670 (Resolved): luminous: auth: ceph auth add does not sanity-check caps
https://github.com/ceph/ceph/pull/24906 Nathan Cutler
01:34 AM Backport #23654 (New): luminous: Special scrub handling of hinfo_key errors
Nathan Cutler
01:33 AM Bug #22525 (Pending Backport): auth: ceph auth add does not sanity-check caps
Nathan Cutler

04/11/2018

11:22 PM Bug #23662 (Fix Under Review): osd: regression causes SLOW_OPS warnings in multimds suite
https://github.com/ceph/teuthology/pull/1166 Patrick Donnelly
09:38 PM Bug #23662: osd: regression causes SLOW_OPS warnings in multimds suite
Looks like the obvious cause: https://github.com/ceph/ceph/pull/20660 Patrick Donnelly
07:56 PM Bug #23662 (Resolved): osd: regression causes SLOW_OPS warnings in multimds suite
See: [1], first instance of the problem at [0].
The last run which did not cause most multimds jobs to fail with S...
Patrick Donnelly
11:20 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair

Any scrub that completes without errors will set num_scrub_errors in pg stats to 0. That will cause the inconsiste...
David Zafman
10:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
David, is there any way a missing object wouldn't be reported in list-inconsistent output? Josh Durgin
11:01 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
Let's see if this happens again now that sage's fast peering branch is merged. Josh Durgin
10:58 PM Backport #23485 (Resolved): luminous: scrub errors not cleared on replicas can cause inconsistent...
David Zafman
10:58 PM Bug #23585: osd: safe_timer segfault
Possibly the same as http://tracker.ceph.com/issues/23431 Josh Durgin
02:10 PM Bug #23585: osd: safe_timer segfault
Got a segfault in safe_timer too. Got it just once, so I cannot provide more info at the moment.
2018-04-03 05:53:07...
Sergey Malinin
10:57 PM Bug #23564: OSD Segfaults
Is this on bluestore? there are a few reports of this occurring on bluestore including your other bug http://tracker.... Josh Durgin
10:44 PM Bug #23590: kstore: statfs: (95) Operation not supported
Josh Durgin
10:42 PM Bug #23601 (Resolved): Cannot see the value of parameter osd_op_threads via ceph --show-config
Josh Durgin
10:37 PM Bug #23614: local_reserver double-reservation of backfilled pg
This may be the same root cause as http://tracker.ceph.com/issues/23490 Josh Durgin
10:36 PM Bug #23565: Inactive PGs don't seem to cause HEALTH_ERR
Brad, can you take a look at this? I think it can be handled by the stuck pg code, that iirc already warns about pgs ... Josh Durgin
10:25 PM Bug #23664 (Resolved): cache-try-flush hits wrlock, busy loops
... Sage Weil
10:12 PM Bug #23403 (Closed): Mon cannot join quorum
Thanks for letting us know. Brad Hubbard
01:15 PM Bug #23403: Mon cannot join quorum
After more investigation we discovered that one of the bonds on the machine was not behaving properly. We removed the... Gauvain Pocentek
11:28 AM Bug #23403: Mon cannot join quorum
Thanks for the investigation Brad.
The "fault, initiating reconnect" and "RESETSESSION" messages only appear when ...
Gauvain Pocentek
07:57 PM Bug #23595: osd: recovery/backfill is extremely slow
@Greg Farnum: Ah, great that part is already handled!
What about my other questions though, like
> I think it i...
Niklas Hambuechen
06:45 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
https://tracker.ceph.com/issues/23141
Sorry you ran into this, it's a bug in BlueStore/BlueFS. The fix will be in ...
Greg Farnum
07:49 PM Backport #23315: luminous: pool create cmd's expected_num_objects is not correctly interpreted
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20907
merged
Yuri Weinstein
05:45 PM Feature #23660 (New): when scrub errors are due to disk read errors, ceph status can say "likely ...
If some of the scrub errors are due to disk read errors, we can also say in the status output "likely disk errors" an... Vasu Kulkarni
03:49 PM Bug #23487 (Pending Backport): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
Sage Weil
03:39 PM Backport #23654 (Resolved): luminous: Special scrub handling of hinfo_key errors
https://github.com/ceph/ceph/pull/21397 David Zafman
03:09 PM Bug #23621 (Resolved): qa/standalone/mon/misc.sh fails on master
Kefu Chai
01:40 PM Backport #23316: jewel: pool create cmd's expected_num_objects is not correctly interpreted
-https://github.com/ceph/ceph/pull/21042-
but test/mon/osd-pool-create.sh is failing; looking into it.
Prashant D
05:00 AM Bug #23610 (Pending Backport): pg stuck in activating because of dropped pg-temp message
Kefu Chai
04:56 AM Bug #23648 (New): max-pg-per-osd.from-primary fails because of activating pg
The reason why we have an activating pg when the number of pgs is under the hard limit of max-pg-per-osd is that:
1. o...
Kefu Chai
03:01 AM Bug #23647 (In Progress): thrash-eio test can prevent recovery
We are injecting random EIOs. However, in a recovery situation an EIO leads us to decide the object is missing in on... Sage Weil

04/10/2018

11:38 PM Feature #23364 (Pending Backport): Special scrub handling of hinfo_key errors
David Zafman
09:13 PM Bug #23428: Snapset inconsistency is hard to diagnose because authoritative copy used by list-inc...
In the pull request https://github.com/ceph/ceph/pull/20947 there is a change to partially address this issue. Unfor... David Zafman
09:08 PM Bug #23646 (Resolved): scrub interaction with HEAD boundaries and clones is broken
Scrub will work in chunks, accumulating work in cleaned_meta_map. A single object's clones may stretch across two su... Sage Weil
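A sketch of that boundary problem, with simplified stand-in types (real scrubbing chunks hobject_t ranges, not a flat vector; as in the real ordering, clones here sort before their head):

    // Sketch only: a naive chunk cut can land between an object's clones
    // and its head, so two scrub chunks each see only part of the
    // snapset; extending the boundary keeps them in one chunk.
    #include <algorithm>
    #include <string>
    #include <vector>

    struct obj { std::string name; int snap; };  // snap == -1 means head

    size_t choose_boundary(const std::vector<obj>& objs, size_t start,
                           size_t max_chunk) {
      size_t end = std::min(start + max_chunk, objs.size());
      // extend past any remaining clones/head of the last object in the chunk
      while (end > 0 && end < objs.size() &&
             objs[end].name == objs[end - 1].name)
        ++end;
      return end;
    }

    int main() {
      std::vector<obj> objs = {{"a", 1}, {"a", 2}, {"a", -1}, {"b", -1}};
      // A naive cut at 2 would split a's clones from its head; the
      // extended boundary lands after all of "a" instead.
      return choose_boundary(objs, 0, 2) == 3 ? 0 : 1;
    }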
06:12 PM Backport #23630 (In Progress): luminous: pg stuck in activating
Nathan Cutler
05:53 PM Backport #23630 (Resolved): luminous: pg stuck in activating
https://github.com/ceph/ceph/pull/21330 Nathan Cutler
05:53 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
Nathan Cutler
05:53 PM Bug #23610 (Pending Backport): pg stuck in activating because of dropped pg-temp message
Nathan Cutler
05:53 PM Backport #23633 (Rejected): luminous: large-omap-object-warnings test fails
Nathan Cutler
05:47 PM Bug #18746 (Fix Under Review): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) ...
Greg Farnum
04:26 PM Bug #23495 (Resolved): Need (SLOW_OPS) in whitelist for another yaml
Kefu Chai
11:26 AM Bug #23495 (Fix Under Review): Need (SLOW_OPS) in whitelist for another yaml
https://github.com/ceph/ceph/pull/21324 Kefu Chai
01:55 PM Bug #23627 (Resolved): Error EACCES: problem getting command descriptions from mgr.None from 'cep...
... Sage Weil
01:32 PM Bug #23622 (Fix Under Review): qa/workunits/mon/test_mon_config_key.py fails on master
https://github.com/ceph/ceph/pull/21329 Sage Weil
03:42 AM Bug #23622: qa/workunits/mon/test_mon_config_key.py fails on master
see https://github.com/ceph/ceph/pull/21317 (not a fix) Sage Weil
02:56 AM Bug #23622 (Resolved): qa/workunits/mon/test_mon_config_key.py fails on master
... Sage Weil
07:04 AM Bug #20919 (Resolved): osd: replica read can trigger cache promotion
Nathan Cutler
06:59 AM Backport #22403 (Resolved): jewel: osd: replica read can trigger cache promotion
Nathan Cutler
06:22 AM Bug #23585: osd: safe_timer segfault
https://drive.google.com/open?id=1x_0p9s9JkQ1zo-LCx6mHxm0DQO5sc1UA is too large (about 1.2G). And ceph-osd.297.log.gz di... jianpeng ma
05:53 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
The docs weren't updated, so I created a PR: https://github.com/ceph/ceph/pull/21319. jianpeng ma
04:57 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
It was removed in commit 08731c3567300b28d83b1ac1c2ba. Maybe the docs weren't updated, or you read old docs. jianpeng ma
04:27 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
But I can see this option in the documentation!! The setting works in Jewel.
So osd_op_threads was removed in Luminous??
Cyril Chang
03:14 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
There is no "osd_op_threads". It is now called osd_op_num_shards/osd_op_num_shards_hdd/osd_op_num_shards_ssd. jianpeng ma
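To see which of these options a running osd actually has (assuming the default admin socket path):

    $ ceph daemon osd.0 config show | grep osd_op_num_shards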
05:34 AM Bug #23595: osd: recovery/backfill is extremely slow
Whether the device is hdd or ssd is checked in code when the osd starts, and is not changed after starting.
I think we need to increase the log level fo...
jianpeng ma
05:19 AM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
Kefu Chai
04:29 AM Bug #23621 (In Progress): qa/standalone/mon/misc.sh fails on master
https://github.com/ceph/ceph/pull/21318 Brad Hubbard
04:17 AM Bug #23621: qa/standalone/mon/misc.sh fails on master
bc5df2b4497104c2a8747daf0530bb5184f9fecb added ceph::features::mon::FEATURE_OSDMAP_PRUNE so the output that's failing... Brad Hubbard
02:53 AM Bug #23621: qa/standalone/mon/misc.sh fails on master
http://pulpito.ceph.com/sage-2018-04-10_02:19:56-rados-master-distro-basic-smithi/2377263
http://pulpito.ceph.com/sa...
Sage Weil
02:51 AM Bug #23621 (Resolved): qa/standalone/mon/misc.sh fails on master
This appears to be from the addition of the osdmap-prune mon feature? Sage Weil
02:49 AM Bug #23620 (Fix Under Review): tasks.mgr.test_failover.TestFailover failure
https://github.com/ceph/ceph/pull/21315 Sage Weil
02:43 AM Bug #23620 (Resolved): tasks.mgr.test_failover.TestFailover failure
http://pulpito.ceph.com/sage-2018-04-10_02:19:56-rados-master-distro-basic-smithi/2377255... Sage Weil
12:57 AM Bug #23578 (Pending Backport): large-omap-object-warnings test fails
Just a note that my analysis above was incorrect and this was not due to the lost coin flips but due to a pg map upda... Brad Hubbard
12:18 AM Backport #23485 (In Progress): luminous: scrub errors not cleared on replicas can cause inconsist...
David Zafman

04/09/2018

10:24 PM Feature #23616 (New): osd: admin socket should help debug status at all times
Last week I was looking at an LRC OSD which was having trouble, and it wasn't clear why.
The cause ended up being ...
Greg Farnum
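For reference, the existing admin socket query this feature would build on looks like this (assuming default socket paths; the point of the ticket is that it should stay informative even when the osd is wedged):

    $ ceph daemon osd.<id> status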
10:18 PM Bug #22882 (Resolved): Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
Whoops, this merged way back then with a slightly different plan than discussed here (see PR discussion). Greg Farnum
09:59 PM Bug #22525: auth: ceph auth add does not sanity-check caps
https://github.com/ceph/ceph/pull/21311 Sage Weil
09:21 PM Feature #23045 (Resolved): mon: warn on slow ops in OpTracker
That PR got merged a while ago and we've been working through the slow ops warnings that have turned up since. Seems to be a... Greg Farnum
08:59 PM Feature #21084 (Resolved): auth: add osd auth caps based on pool metadata
Patrick Donnelly
06:53 PM Bug #23614: local_reserver double-reservation of backfilled pg
Looking through the code I don't see where the reservation is supposed to be released. I see releases for
- the p...
Sage Weil
06:52 PM Bug #23614 (Resolved): local_reserver double-reservation of backfilled pg
- pg gets reservations (incl local_reserver)
- pg backfills, finishes
- ...apparently never releases the reservatio...
Sage Weil
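A sketch of the leak pattern described here, with a plain counter standing in for the actual local_reserver:

    // Sketch only: a counter in place of the real reserver. If the
    // backfill-finished path never calls release(), the slot stays
    // consumed and later reservation attempts stall.
    struct toy_reserver {
      int in_use = 0;
      static const int max_slots = 1;
      bool reserve() {
        if (in_use >= max_slots) return false;  // later pgs wait here
        ++in_use;
        return true;
      }
      void release() { --in_use; }
    };

    int main() {
      toy_reserver r;
      r.reserve();                 // pg gets its local reservation
      // ... pg backfills, finishes, release() is never called ...
      return r.reserve() ? 1 : 0;  // next attempt fails: slot leaked
    }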
06:15 PM Bug #23365: CEPH device class not honored for erasure encoding.
A quote from Greg Farnum on the crash from another ticket:... Brian Woods
06:13 PM Bug #23365: CEPH device class not honored for erasure encoding.
I put 12.2.2, but that is incorrect. It is ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) lu... Brian Woods
05:38 PM Bug #23365: CEPH device class not honored for erasure encoding.
What version are you running? How are your OSDs configured?
There was a bug with BlueStore SSDs being misreported ...
Greg Farnum
05:36 PM Bug #23371: OSDs flaps when cluster network is made down
You tested this on a version prior to luminous and the behavior has *changed*?
This must be a result of some chang...
Greg Farnum
05:24 PM Documentation #23613 (Closed): doc: add description of new fs-client auth profile
Patrick Donnelly
05:23 PM Documentation #23613 (Closed): doc: add description of new fs-client auth profile
On that page: http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
https:...
Patrick Donnelly
05:23 PM Documentation #23612 (New): doc: add description of new auth profiles
On that page: http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
https:...
Patrick Donnelly
05:18 PM Support #23455 (Resolved): osd: large number of inconsistent objects after recover or backfilling
fiemap is disabled by default precisely because there are a number of known bugs in the local filesystems across kern... Greg Farnum
05:07 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
https://github.com/ceph/ceph/pull/21310 Kefu Chai
05:02 PM Bug #23610 (Resolved): pg stuck in activating because of dropped pg-temp message
http://pulpito.ceph.com/yuriw-2018-04-05_22:33:03-rados-wip-yuri3-testing-2018-04-05-1940-luminous-distro-basic-smith... Kefu Chai
05:06 PM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
This is a dupe of...something. We can track it down later.
For now, note that the crash is happening with Hammer c...
Greg Farnum
06:17 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
Hm hm hm Nathan Cutler
02:56 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
h3. rados bisect
Reproducer: ...
Nathan Cutler
02:11 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
This problem was not happening so reproducibly before the current integration run, so one of the following PRs might ... Nathan Cutler
02:05 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
Set priority to Urgent because this prevents us from getting a clean rados run in jewel 10.2.11 integration testing. Nathan Cutler
02:04 AM Bug #23598 (Duplicate): hammer->jewel: ceph_test_rados crashes during radosbench task in jewel ra...
Test description: rados/upgrade/{hammer-x-singleton/{0-cluster/{openstack.yaml start.yaml} 1-hammer-install/hammer.ya... Nathan Cutler
04:39 PM Bug #23595: osd: recovery/backfill is extremely slow
*I have it figured out!*
The issue was "osd_recovery_sleep_hdd", which defaults to 0.1 seconds.
After setting
...
Niklas Hambuechen
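For anyone else hitting this, a runtime override would look roughly like the following (takes effect immediately; put the option under [osd] in ceph.conf to persist it across restarts):

    $ ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0'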
03:23 PM Bug #23595: osd: recovery/backfill is extremely slow
OK, if I only have the 6 large files in the cephfs AND set the options... Niklas Hambuechen
02:55 PM Bug #23595: osd: recovery/backfill is extremely slow
I have now tested with only the 6*1GB files, having deleted the 270k empty files from cephfs.
I continue to see ex...
Niklas Hambuechen
12:30 PM Bug #23595: osd: recovery/backfill is extremely slow
You can find a core dump of the -O0 version created with GDB at http://nh2.me/ceph-issue-23595-osd-O0.core.xz Niklas Hambuechen
12:06 PM Bug #23595: osd: recovery/backfill is extremely slow
Attached are two GDB runs of a sender node.
In the release build there were many values "<optimized out>", so I re...
Niklas Hambuechen
11:45 AM Bug #23595: osd: recovery/backfill is extremely slow
On https://forum.proxmox.com/threads/increase-ceph-recovery-speed.36728/ people reported the same number as me of 10 ... Niklas Hambuechen
10:43 AM Bug #23601 (Resolved): Cannot see the value of parameter osd_op_threads via ceph --show-config
I have set the "osd op threads" parameter in the configuration file,
but I cannot see the value of the parameter "osd op t...
Cyril Chang
10:17 AM Bug #23403 (Need More Info): Mon cannot join quorum
Brad Hubbard
07:23 AM Bug #23578 (In Progress): large-omap-object-warnings test fails
https://github.com/ceph/ceph/pull/21295 Brad Hubbard
01:33 AM Bug #23578: large-omap-object-warnings test fails
We instruct the OSDs to scrub at around 16:15.... Brad Hubbard
04:31 AM Bug #23593 (Fix Under Review): RESTControllerTest.test_detail_route and RESTControllerTest.test_f...
Kefu Chai
02:08 AM Bug #22123: osd: objecter sends out of sync with pg epochs for proxied ops
Despite the jewel backport of this fix being merged, this problem has reappeared in jewel 10.2.11 integration testing... Nathan Cutler

04/08/2018

07:55 PM Bug #23595: osd: recovery/backfill is extremely slow
For the record, I installed the following debugging packages for gdb stack traces:... Niklas Hambuechen
07:53 PM Bug #23595: osd: recovery/backfill is extremely slow
I have read https://www.spinics.net/lists/ceph-devel/msg38331.html which suggests that there is some throttling going... Niklas Hambuechen
06:17 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
I made a Ceph 12.2.4 (luminous stable) cluster of 3 machines with 10-Gigabit networking on Ubuntu 16.04, using pretty... Niklas Hambuechen
05:40 PM Bug #23593: RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
PR: https://github.com/ceph/ceph/pull/21290 Ricardo Dias
03:10 PM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
... Kefu Chai
04:31 PM Documentation #23594: auth: document what to do when locking client.admin out
I found one way to fix it on the mailing list:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/01...
Niklas Hambuechen
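One commonly cited recovery path (not necessarily the exact one in that thread) is to authenticate as mon. with the monitor's own keyring, which is not subject to the broken client.admin caps, and restore them; paths assume a default deployment:

    $ ceph -n mon. -k /var/lib/ceph/mon/ceph-<mon-id>/keyring \
          auth caps client.admin mon 'allow *' osd 'allow *' mds 'allow *'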
04:23 PM Documentation #23594 (New): auth: document what to do when locking client.admin out
I accidentally ran ... Niklas Hambuechen
11:06 AM Bug #23590: kstore: statfs: (95) Operation not supported
https://github.com/ceph/ceph/pull/21287 Honggang Yang
11:01 AM Bug #23590 (Fix Under Review): kstore: statfs: (95) Operation not supported
2018-04-07 16:19:07.248 7fdec4675700 -1 osd.0 0 statfs() failed: (95) Operation not supported
2018-04-07 16:19:08....
Honggang Yang
08:50 AM Bug #23589 (New): jewel: KStore Segmentation fault in ceph_test_objectstore --gtest_filter=-*/2:-*/3
Test description: rados/objectstore/objectstore.yaml
Log excerpt:...
Nathan Cutler
08:39 AM Bug #23588 (New): LibRadosAioEC.IsCompletePP test fails in jewel 10.2.11 integration testing
Test description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yam... Nathan Cutler
06:53 AM Bug #23511: forwarded osd_failure leak in mon
Greg, no. Both tests below include the no_reply() fix.
see
- http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-r...
Kefu Chai
06:42 AM Bug #23585 (Duplicate): osd: safe_timer segfault
... Alex Gorbachev
 
