Activity
From 04/04/2018 to 05/03/2018
05/03/2018
- 10:30 PM Bug #23980 (Pending Backport): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap...
- 01:45 PM Bug #23980 (Fix Under Review): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap...
- https://github.com/ceph/ceph/pull/21798
- 01:03 AM Bug #23980 (Resolved): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap const&)
- ...
- 08:56 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Are there messages "not scheduling scrubs due to active recovery" in the logs on any of the primary OSDs? That messa...
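A quick way to check the primary OSDs' logs for that message (a sketch assuming the default /var/log/ceph log locations; adjust the glob for your deployment):
```python
import glob

needle = "not scheduling scrubs due to active recovery"
for path in glob.glob("/var/log/ceph/ceph-osd.*.log"):
    with open(path, errors="replace") as f:
        hits = sum(needle in line for line in f)
    if hits:
        print(path, hits)
```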
- 08:40 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Ran into something similar this past week. ( active+clean+inconsistent) where forced scrubs would not run. The foll...
- 07:27 PM Bug #24000 (Fix Under Review): mon: snap delete on deleted pool returns 0 without proper payload
- *PR*: https://github.com/ceph/ceph/pull/21804
- 07:21 PM Bug #24000 (Resolved): mon: snap delete on deleted pool returns 0 without proper payload
- It can lead to an abort in the client application since an empty reply w/o an error code is constructed in the monito...
- 03:44 PM Documentation #23999 (Resolved): osd_recovery_priority is not documented (but osd_recovery_op_pri...
- Please document osd_recovery_priority and how it differs from osd_recovery_op_priority.
- 02:48 PM Bug #23961 (Duplicate): valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::re...
- 02:18 PM Backport #23998 (Resolved): luminous: osd/EC: slow/hung ops in multimds suite test
- https://github.com/ceph/ceph/pull/24393
- 02:08 PM Backport #23915 (Resolved): luminous: monitors crashing ./include/interval_set.h: 355: FAILED ass...
- 01:51 PM Backport #23915: luminous: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jew...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21717
merged
- 01:40 PM Bug #23769 (Pending Backport): osd/EC: slow/hung ops in multimds suite test
- 11:58 AM Feature #22420 (New): Add support for obtaining a list of available compression options
- I am reopening this ticket, as the plugin registry is empty before any of the supported compressor plugins is created ...
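A toy model of the behaviour described above (hypothetical code, not the actual Ceph plugin registry): a registry that is only populated when a compressor is first instantiated looks empty to a "list available options" query made before that point.
```python
class LazyRegistry:
    """Plugins register themselves only when first created."""
    def __init__(self):
        self._plugins = {}                  # name -> factory

    def create(self, name):
        # registration happens as a side effect of first creation
        self._plugins.setdefault(name, object)
        return self._plugins[name]()

    def list_options(self):
        return sorted(self._plugins)

reg = LazyRegistry()
print(reg.list_options())    # [] -- nothing registered yet
reg.create("snappy")
print(reg.list_options())    # ['snappy']
```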
- 11:27 AM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
- I didn't import or export any pgs; that was a working osd in the cluster.
Is it possible that the restart of the osd ...
- 10:28 AM Backport #23988 (Resolved): luminous: luminous->master: luminous crashes with AllReplicasRecovere...
- https://github.com/ceph/ceph/pull/21964
- 10:27 AM Backport #23986 (Resolved): luminous: recursive lock of objecter session::lock on cancel
- https://github.com/ceph/ceph/pull/21939
- 05:21 AM Bug #22220: osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at dwarf2out....
- https://access.redhat.com/errata/RHBA-2018:1293
- 01:37 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
- Jason Dillaman wrote:
> Moving to RADOS since it sounds like it's an issue of corruption on your cache tier.
How ...
- 01:00 AM Bug #22656: scrub mismatch on bytes (cache pools)
- /a/sage-2018-05-02_22:22:16-rados-wip-sage3-testing-2018-05-02-1448-distro-basic-smithi/2468046
description: rados...
- 12:20 AM Feature #23979 (Resolved): Limit pg log length during recovery/backfill so that we don't run out ...
This means if there's another failure, we'll need to restart backfill or go from recovery to backfill, but that's b...
05/02/2018
- 09:02 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
- Did you import or export any PGs? The on-disk pg info from comment #2 indicates the pg doesn't exist on osd.33 yet.
...
- 08:53 PM Bug #23961: valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::react(PG::AdvM...
- What PRs were in the test branch that hit this? Did any of them change the PG class or related structures?
- 12:23 PM Bug #23961: valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::react(PG::AdvM...
- rerunning this test with another branch did not reproduce this issue.
http://pulpito.ceph.com/kchai-2018-05-02_11:...
- 01:50 AM Bug #23961 (Duplicate): valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::re...
- ...
- 08:48 PM Bug #23830: rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
- The pg meta object has been expected to be empty for many versions now. IIRC sage suggested this may be from a race that ...
- 08:42 PM Bug #23860 (Pending Backport): luminous->master: luminous crashes with AllReplicasRecovered in St...
- 08:40 PM Bug #23942 (Duplicate): test_mon_osdmap_prune.sh failures
- 07:50 PM Bug #23769 (Fix Under Review): osd/EC: slow/hung ops in multimds suite test
- https://github.com/ceph/ceph/pull/21684
- 05:26 PM Bug #23966 (Fix Under Review): Deleting a pool with active notify linger ops can result in seg fault
- *PR*: https://github.com/ceph/ceph/pull/21786
- 04:00 PM Bug #23966 (In Progress): Deleting a pool with active notify linger ops can result in seg fault
- 03:51 PM Bug #23966 (Resolved): Deleting a pool with active notify linger ops can result in seg fault
- It's possible that if a notification is sent while a pool is being deleted, the Objecter will fail the Op w/ -ENOENT ...
- 02:50 PM Bug #23965 (New): FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cach...
- teuthology run with debug-ms 1 at http://pulpito.ceph.com/joshd-2018-05-01_18:40:57-rgw-master-distro-basic-smithi/
- 01:42 PM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- http://pulpito.ceph.com/pdonnell-2018-05-01_20:58:18-multimds-wip-pdonnell-testing-20180501.191840-testing-basic-smit...
- 11:47 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
- Moving to RADOS since it sounds like it's an issue of corruption on your cache tier.
- 02:41 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
- More discovery:
The snapshot exported from cache tier (rep_glance pool) is an all-zero file (viewed by "od xxx.snap...
- 11:40 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- We frequently experience this with 12.2.3 running Ceph in a Kubernetes cluster, cf. https://github.com/ceph/ceph-cont...
- 11:32 AM Bug #23952: "ceph -f json osd pool ls detail" has missing pool namd and pool id
- Sorry, pool_name is there. Only the pool id is missing.
- 10:11 AM Bug #23952: "ceph -f json osd pool ls detail" has missing pool namd and pool id
- Are you sure you're not getting pool name? I'm getting a pool_name field when I try this, and it appears to have bee...
- 11:04 AM Backport #23924 (In Progress): luminous: LibRadosAio.PoolQuotaPP failed
- https://github.com/ceph/ceph/pull/21778
- 06:53 AM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- Any update? The mentioned workaround is not a good idea for us.
- 06:42 AM Bug #23949 (Resolved): osd: "failed to encode map e19 with expected crc" in cluster log "
- 05:22 AM Bug #23962 (Fix Under Review): ceph_daemon.py format_dimless units list index out of range
- https://github.com/ceph/ceph/pull/21765
- 04:02 AM Bug #23962: ceph_daemon.py format_dimless units list index out of range
- Sorry, the actual max magnitude is the EB level instead of ZB.
- 03:48 AM Bug #23962 (Resolved): ceph_daemon.py format_dimless units list index out of range
- The largest order of magnitude in the original list only goes up to the PB level; however, the ceph cluster Objecter actv metri...
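A minimal sketch of the failure mode (assumed behaviour, not the actual ceph_daemon.py code): a magnitude-suffix list that stops at 'P' gets indexed past its end once a counter grows into exabyte territory.
```python
UNITS = [' ', 'k', 'M', 'G', 'T', 'P']          # stops at peta

def format_dimless(n, width=5):
    unit = 0
    # keep scaling down until the value fits in width-1 characters
    while len(str(n // (1000 ** unit))) > width - 1:
        unit += 1
    return "%d%s" % (n // (1000 ** unit), UNITS[unit])   # IndexError once unit == 6

print(format_dimless(3 * 10 ** 16))   # '30P' -- still fine
# format_dimless(3 * 10 ** 19)        # ~30 EB -> unit becomes 6 -> IndexError
```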
- 03:31 AM Backport #23914 (In Progress): luminous: cache-try-flush hits wrlock, busy loops
- https://github.com/ceph/ceph/pull/21764
05/01/2018
- 06:31 PM Bug #23827: osd sends op_reply out of order
- For object 10000000004.00000004 osd_op_reply for 102425 is received before 93353....
- 05:52 PM Bug #23949 (Fix Under Review): osd: "failed to encode map e19 with expected crc" in cluster log "
- https://github.com/ceph/ceph/pull/21756
- 03:53 PM Bug #23949: osd: "failed to encode map e19 with expected crc" in cluster log "
- /a/sage-2018-05-01_15:25:33-fs-master-distro-basic-smithi/2462491
reproduces on master
- 03:09 PM Bug #23949 (In Progress): osd: "failed to encode map e19 with expected crc" in cluster log "
- 03:09 PM Bug #23949: osd: "failed to encode map e19 with expected crc" in cluster log "
- ...
- 02:17 PM Bug #23949: osd: "failed to encode map e19 with expected crc" in cluster log "
- More from master: http://pulpito.ceph.com/pdonnell-2018-05-01_03:21:36-fs-master-testing-basic-smithi/
- 05:26 PM Bug #23940 (Pending Backport): recursive lock of objecter session::lock on cancel
- 02:39 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
- The problem of left-over OSD data still persists when the partition table has been removed before "ceph-disk zap" is ...
- 12:42 PM Backport #23905 (In Progress): jewel: Deleting a pool with active watch/notify linger ops can res...
- https://github.com/ceph/ceph/pull/21754
- 11:36 AM Backport #23904 (In Progress): luminous: Deleting a pool with active watch/notify linger ops can ...
- https://github.com/ceph/ceph/pull/21752
- 07:01 AM Bug #23952 (New): "ceph -f json osd pool ls detail" has missing pool namd and pool id
- `ceph osd pool ls detail` shows information about pool id and pool name, but with '-f json' this information disappears.
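One way to confirm the report (assumes a running cluster with the `ceph` CLI on PATH, and that the JSON form is a list of per-pool objects, as it was at the time):
```python
import json
import subprocess

out = subprocess.check_output(
    ["ceph", "-f", "json", "osd", "pool", "ls", "detail"]).decode()
pools = json.loads(out)
if isinstance(pools, dict):          # be tolerant of either top-level shape
    pools = pools.get("pools", [pools])
for pool in pools:
    print(sorted(pool))              # per the report: 'pool_name' shows up, the numeric pool id does not
```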
04/30/2018
- 11:10 PM Bug #23949 (Resolved): osd: "failed to encode map e19 with expected crc" in cluster log "
- http://pulpito.ceph.com/pdonnell-2018-04-30_21:17:21-fs-wip-pdonnell-testing-20180430.193008-testing-basic-smithi/245...
- 05:46 PM Bug #23860: luminous->master: luminous crashes with AllReplicasRecovered in Started/Primary/Activ...
- 05:25 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025569.html
Paul Emmerich wrote:
> looks like it fai...
- 12:28 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- (Pulling backtrace into the ticket)
- 03:57 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
- This pg has a 0 value in same_interval_since. I checked this with the following output:
https://paste.fedoraproject.org/pa...
- 01:12 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
- I found a little more...
- 03:48 PM Bug #23942 (Duplicate): test_mon_osdmap_prune.sh failures
- ...
- 02:55 PM Bug #23922 (Resolved): ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- 01:44 PM Bug #23922 (Fix Under Review): ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- https://github.com/ceph/ceph/pull/21739
- 01:32 PM Bug #23922: ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- ...
- 01:06 PM Bug #23922: ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- failed to reproduce this issue locally.
adding...
- 11:00 AM Bug #23922: ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- http://pulpito.ceph.com/kchai-2018-04-30_00:59:17-rados-wip-kefu-testing-2018-04-29-1248-distro-basic-smithi/2454246/
- 02:53 PM Bug #23940 (Fix Under Review): recursive lock of objecter session::lock on cancel
- https://github.com/ceph/ceph/pull/21742
- 02:30 PM Bug #23940 (Resolved): recursive lock of objecter session::lock on cancel
- ...
- 12:30 PM Backport #23870 (In Progress): luminous: null map from OSDService::get_map in advance_pg
- https://github.com/ceph/ceph/pull/21737
04/29/2018
- 11:46 PM Bug #23937 (New): FAILED assert(info.history.same_interval_since != 0)
- Two of our osds hit this assert and now they are down....
- 10:23 AM Bug #22354 (Resolved): v12.2.2 unable to create bluestore osd using ceph-disk
- 10:23 AM Backport #23103 (Resolved): luminous: v12.2.2 unable to create bluestore osd using ceph-disk
- 10:22 AM Bug #22082 (Resolved): Various odd clog messages for mons
- 10:21 AM Backport #22167 (Resolved): luminous: Various odd clog messages for mons
- 10:21 AM Bug #22090 (Resolved): cluster [ERR] Unhandled exception from module 'balancer' while running on ...
- 10:20 AM Backport #22164 (Resolved): luminous: cluster [ERR] Unhandled exception from module 'balancer' wh...
- 10:20 AM Bug #21993 (Resolved): "ceph osd create" is not idempotent
- 10:20 AM Backport #22019 (Resolved): luminous: "ceph osd create" is not idempotent
- 10:19 AM Bug #21203 (Resolved): build_initial_pg_history doesn't update up/acting/etc
- 10:19 AM Backport #21236 (Resolved): luminous: build_initial_pg_history doesn't update up/acting/etc
- 07:07 AM Bug #21206 (Resolved): thrashosds read error injection doesn't take live_osds into account
- 07:07 AM Backport #21235 (Resolved): luminous: thrashosds read error injection doesn't take live_osds into...
- 06:22 AM Backport #23915 (In Progress): luminous: monitors crashing ./include/interval_set.h: 355: FAILED ...
- 05:44 AM Backport #22934: luminous: filestore journal replay does not guard omap operations
- https://github.com/ceph/ceph/pull/21547
04/28/2018
- 10:32 PM Backport #23915: luminous: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jew...
- https://github.com/ceph/ceph/pull/21717
- 07:11 PM Backport #23926 (Rejected): luminous: disable bluestore cache caused a rocksdb error
- 07:11 PM Backport #23925 (Resolved): luminous: assert on pg upmap
- https://github.com/ceph/ceph/pull/21818
- 07:11 PM Backport #23924 (Resolved): luminous: LibRadosAio.PoolQuotaPP failed
- https://github.com/ceph/ceph/pull/21778
- 06:19 PM Bug #23816 (Pending Backport): disable bluestore cache caused a rocksdb error
- 06:17 PM Bug #23878 (Pending Backport): assert on pg upmap
- 06:17 PM Bug #23916 (Pending Backport): LibRadosAio.PoolQuotaPP failed
- 06:16 PM Bug #23922 (Resolved): ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- ...
- 04:23 AM Bug #23921: pg-upmap cannot balance in some case
- But if I unlink all osds from 'root default / host huangjun', everything works ok....
- 04:04 AM Bug #23921 (Resolved): pg-upmap cannot balance in some case
- I have a cluster with 21 osds, cluster topology is...
04/27/2018
- 10:38 PM Bug #23916 (Fix Under Review): LibRadosAio.PoolQuotaPP failed
- https://github.com/ceph/ceph/pull/21709
- 09:22 PM Bug #23916 (Resolved): LibRadosAio.PoolQuotaPP failed
- http://qa-proxy.ceph.com/teuthology/yuriw-2018-04-27_16:52:05-rados-wip-yuri-testing-2018-04-27-1519-distro-basic-smi...
- 10:27 PM Bug #23917 (Duplicate): LibRadosAio.PoolQuotaPP failure
- 10:24 PM Bug #23917 (Duplicate): LibRadosAio.PoolQuotaPP failure
- ...
- 08:07 PM Backport #23915 (Resolved): luminous: monitors crashing ./include/interval_set.h: 355: FAILED ass...
- https://github.com/ceph/ceph/pull/21717
- 08:06 PM Backport #23914 (Resolved): luminous: cache-try-flush hits wrlock, busy loops
- https://github.com/ceph/ceph/pull/21764
- 08:01 PM Bug #23860 (Fix Under Review): luminous->master: luminous crashes with AllReplicasRecovered in St...
- https://github.com/ceph/ceph/pull/21706
- 07:30 PM Bug #18746 (Pending Backport): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) ...
- 07:28 PM Bug #23664 (Pending Backport): cache-try-flush hits wrlock, busy loops
- 07:28 PM Bug #21165 (Can't reproduce): 2 pgs stuck in unknown during thrashing
- 07:27 PM Bug #23788 (Duplicate): luminous->mimic: EIO (crc mismatch) on copy-get from ec pool
- I think this was a dup of #23871
- 07:24 PM Backport #23912 (Resolved): luminous: mon: High MON cpu usage when cluster is changing
- https://github.com/ceph/ceph/pull/21968
- 07:17 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
- The zap run in this is definitely not zero'ing the first block based on log output...
- 06:49 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
- We clean more than 100m, but I think it's from the end
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph...
- 06:25 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
- Thanks alfredo
It shows that zap is not working now; I think we should fix the ceph-disk zap to properly clean the...
- 06:07 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
- Looking at the logs for the OSD that failed:...
- 05:48 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
- Seen on 14.04, 16.04 and centos, with the bluestore option only
14.04:
http://qa-proxy.ceph.com/teuthology/teuth...
- 05:45 PM Bug #23911 (Won't Fix - EOL): ceph:luminous: osd out/down when setup with ubuntu/bluestore
- this could be a systemd issue or more,
a) setup cluster using ceph-deploy
b) use ceph-disk/bluestore option for ...
- 05:26 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
- Moving this back to RADOS as it seems the new consensus is that it's a RADOS bug.
- 06:46 AM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
- From message: "error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, count...
- 04:38 PM Bug #23893 (Resolved): jewel clients fail to decode mimic osdmap
- it was a bug in wip-osdmap-encode, fixed before merge
- 04:14 PM Bug #23713 (Pending Backport): High MON cpu usage when cluster is changing
- 03:01 PM Bug #23909 (Resolved): snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a...
New code for tracker #22881 in pull request https://github.com/ceph/ceph/pull/21546 no calls _scan_snaps() on each ...
- 01:23 PM Bug #23627 (Fix Under Review): Error EACCES: problem getting command descriptions from mgr.None f...
- https://github.com/ceph/ceph/pull/21698
- 01:16 PM Bug #23627: Error EACCES: problem getting command descriptions from mgr.None from 'ceph tell mgr'
- ...
- 12:22 PM Bug #23627: Error EACCES: problem getting command descriptions from mgr.None from 'ceph tell mgr'
- /a//kchai-2018-04-27_07:23:02-rados-wip-kefu-testing-2018-04-27-0902-distro-basic-smithi/2444194
- 10:43 AM Backport #23905 (Resolved): jewel: Deleting a pool with active watch/notify linger ops can result...
- https://github.com/ceph/ceph/pull/21754
- 10:42 AM Backport #23904 (Resolved): luminous: Deleting a pool with active watch/notify linger ops can res...
- https://github.com/ceph/ceph/pull/21752
- 10:39 AM Backport #23850 (New): luminous: Read operations segfaulting multiple OSDs
- Status can change to "In Progress" when the PR is open and URL of PR is mentioned in a comment.
- 06:29 AM Backport #23850 (In Progress): luminous: Read operations segfaulting multiple OSDs
- 10:17 AM Bug #23899: run cmd 'ceph daemon osd.0 smart' cause osd daemon Segmentation fault
- The root cause is that sometimes output.read_fd() can return zero-length data.
ret = output.read_fd(smartctl.get_stdout(), 1... - 10:15 AM Bug #23899 (Resolved): run cmd 'ceph daemon osd.0 smart' cause osd daemon Segmentation fault
2018-04-27 09:44:51.572 7fb787a05700 -1 osd.0 57 smartctl output is:
2018-04-27 09:44:51.576 7fb787a05700 -1 *** C...
- 09:00 AM Bug #23879: test_mon_osdmap_prune.sh fails
- ...
- 01:34 AM Bug #23878: assert on pg upmap
- This PR (#21670) passes the tests that failed before in my local cluster; needs qa
- 12:55 AM Bug #23872 (Pending Backport): Deleting a pool with active watch/notify linger ops can result in ...
04/26/2018
- 11:27 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
Seen again:
http://qa-proxy.ceph.com/teuthology/dzafman-2018-04-26_10:04:07-rados-wip-zafman-testing-distro-basi...
- 10:33 PM Bug #23893 (Resolved): jewel clients fail to decode mimic osdmap
- http://pulpito.ceph.com/sage-2018-04-26_19:17:57-rados:thrash-old-clients-wip-sage-testing-2018-04-26-1251-distro-bas...
- 10:22 PM Bug #23871 (Resolved): luminous->mimic: missing primary copy of xxx, wil try copies on 3, then fu...
- 10:20 PM Bug #23892 (Can't reproduce): luminous->mimic: mon segv in ~MonOpRequest from OpHistoryServiceThread
- ...
- 05:06 PM Bug #23785 (Resolved): "test_prometheus (tasks.mgr.test_module_selftest.TestModuleSelftest) ... E...
- test is passing now
- 02:23 PM Bug #23769 (In Progress): osd/EC: slow/hung ops in multimds suite test
- 01:55 PM Bug #23878 (Fix Under Review): assert on pg upmap
- https://github.com/ceph/ceph/pull/21670
- 01:55 PM Bug #23878: assert on pg upmap
- 09:52 AM Bug #23878: assert on pg upmap
- I’ll prepare a patch soon
- 06:44 AM Bug #23878: assert on pg upmap
- And then if I do a pg-upmap operation....
- 05:35 AM Bug #23878: assert on pg upmap
- After picking the pr https://github.com/ceph/ceph/pull/21325
It works fine.
But I have some questions:
the upmap items...
- 04:31 AM Bug #23878 (Resolved): assert on pg upmap
- I used the following script to test upmap...
- 10:09 AM Backport #23863 (In Progress): luminous: scrub interaction with HEAD boundaries and clones is broken
- 09:16 AM Backport #23863: luminous: scrub interaction with HEAD boundaries and clones is broken
- https://github.com/ceph/ceph/pull/21665
- 07:46 AM Bug #23879 (Can't reproduce): test_mon_osdmap_prune.sh fails
- ...
- 02:46 AM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- /kchai-2018-04-26_00:52:32-rados-wip-kefu-testing-2018-04-25-2253-distro-basic-smithi/2439501/
- 12:02 AM Bug #20924: osd: leaked Session on osd.7
- osd.3 here:
http://pulpito.ceph.com/yuriw-2018-04-23_23:19:23-rados-wip-yuri-testing-2018-04-23-1502-distro-basic-...
04/25/2018
- 10:10 PM Bug #23875 (Resolved): Removal of snapshot with corrupt replica crashes osd
This may be a completely legitimate crash due to the corruption.
See pending test case TEST_scrub_snaps_replica ...
- 09:46 PM Bug #23816 (Fix Under Review): disable bluestore cache caused a rocksdb error
- 09:29 PM Bug #23204 (Duplicate): missing primary copy of object in mixed luminous<->master cluster with bl...
- 09:28 PM Bug #21992 (Duplicate): osd: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 09:14 PM Backport #23786 (Fix Under Review): luminous: "utilities/env_librados.cc:175:33: error: unused pa...
- https://github.com/ceph/ceph/pull/21655
- 09:09 PM Bug #23827: osd sends op_reply out of order
- 06:26 AM Bug #23827: osd sends op_reply out of order
- Same as bug #20742
- 03:46 AM Bug #23827: osd sends op_reply out of order
- Ignore my statement. Dispatch does put_back, so there is no race.
- 03:24 AM Bug #23827: osd sends op_reply out of order
- For this case: if slot->to_process is null, and Op1 does enqueue_front while Op2 dispatches at the same time. Because two threads...
- 09:09 PM Bug #23664 (Fix Under Review): cache-try-flush hits wrlock, busy loops
- 08:35 PM Bug #23664: cache-try-flush hits wrlock, busy loops
- reproducing this semi-frequently, see #23847
This should fix it: https://github.com/ceph/ceph/pull/21653 - 09:07 PM Bug #23871 (In Progress): luminous->mimic: missing primary copy of xxx, wil try copies on 3, then...
- 04:25 PM Bug #23871 (Resolved): luminous->mimic: missing primary copy of xxx, wil try copies on 3, then fu...
- ...
- 08:34 PM Bug #23847 (Duplicate): osd stuck recovery
- 05:52 PM Bug #23847: osd stuck recovery
- Recovery is starved by #23664, a cache tiering infinite loop.
- 05:36 PM Bug #23847: osd stuck recovery
- recovery on 3.3 stalls out here...
- 05:26 PM Bug #23872: Deleting a pool with active watch/notify linger ops can result in seg fault
- Original test failure where this issue was discovered: http://pulpito.ceph.com/trociny-2018-04-24_08:17:18-rbd-wip-mg...
- 05:24 PM Bug #23872 (Fix Under Review): Deleting a pool with active watch/notify linger ops can result in ...
- *PR*: https://github.com/ceph/ceph/pull/21649
- 05:17 PM Bug #23872 (Resolved): Deleting a pool with active watch/notify linger ops can result in seg fault
- ...
- 04:24 PM Backport #23870 (Resolved): luminous: null map from OSDService::get_map in advance_pg
- https://github.com/ceph/ceph/pull/21737
- 04:23 PM Backport #23863 (Resolved): luminous: scrub interaction with HEAD boundaries and clones is broken
- https://github.com/ceph/ceph/pull/22044
- 04:00 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
- master commit says:
Consider a scenario like:
- scrub [3:2525d100:::earlier:head,3:2525d12f:::foo:200]
- we see...
- 03:58 PM Bug #23646 (Pending Backport): scrub interaction with HEAD boundaries and clones is broken
- 03:48 PM Bug #21977 (Pending Backport): null map from OSDService::get_map in advance_pg
- 03:45 PM Bug #23857: flush (manifest) vs async recovery causes out of order op
- maybe: /a/sage-2018-04-25_02:28:01-rados-wip-sage3-testing-2018-04-24-1729-distro-basic-smithi/2436808
rados/thras...
- 03:37 PM Bug #23857: flush (manifest) vs async recovery causes out of order op
- maybe: /a/sage-2018-04-25_02:28:01-rados-wip-sage3-testing-2018-04-24-1729-distro-basic-smithi/2436663
rados/thras...
- 01:53 PM Bug #23857: flush (manifest) vs async recovery causes out of order op
- The core problem is that the requeue logic assumes that objects always go from degraded to not degraded.. never the o...
- 01:49 PM Bug #23857 (Can't reproduce): flush (manifest) vs async recovery causes out of order op
- ...
- 03:44 PM Bug #23860 (Resolved): luminous->master: luminous crashes with AllReplicasRecovered in Started/Pr...
- ...
- 11:10 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- We hit this today on the Jewel release (10.2.7); all OSDs connected to one of the monitors in the quorum have this issue...
- 08:23 AM Backport #23852 (In Progress): luminous: OSD crashes on empty snapset
- 08:18 AM Backport #23852 (Resolved): luminous: OSD crashes on empty snapset
- https://github.com/ceph/ceph/pull/21638
- 08:18 AM Bug #23851 (Resolved): OSD crashes on empty snapset
- Fix merged to master: https://github.com/ceph/ceph/pull/21058
- 04:49 AM Backport #23850 (Resolved): luminous: Read operations segfaulting multiple OSDs
- https://github.com/ceph/ceph/pull/21911
04/24/2018
- 10:33 PM Bug #21931 (In Progress): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (ra...
- This is a bug in trimtrunc handling with EC pools.
- 10:27 PM Bug #23195 (Pending Backport): Read operations segfaulting multiple OSDs
- https://github.com/ceph/ceph/pull/21273
- 10:20 PM Bug #23195 (Resolved): Read operations segfaulting multiple OSDs
- 10:25 PM Bug #23847 (Duplicate): osd stuck recovery
- ...
- 10:24 PM Bug #23827 (In Progress): osd sends op_reply out of order
- 09:26 AM Bug #23827: osd sends op_reply out of order
- ...
- 08:36 PM Bug #23646 (Fix Under Review): scrub interaction with HEAD boundaries and clones is broken
- https://github.com/ceph/ceph/pull/21628
I think this will fix it?
- 12:42 AM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
The commit below adds code to honor the no_whiteout flag even when it looks like clones exist or will exist soon. ...
- 06:03 PM Bug #21977: null map from OSDService::get_map in advance_pg
- https://github.com/ceph/ceph/pull/21623
- 06:02 PM Bug #21977 (Fix Under Review): null map from OSDService::get_map in advance_pg
- advance_pg ran before init() published the initial map to OSDService.
- 05:33 PM Bug #23716 (Resolved): osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (o...
- This seems to be resolved. My guess is it's fallout from https://github.com/ceph/ceph/pull/21604
- 03:45 PM Bug #23763 (Pending Backport): upgrade: bad pg num and stale health status in mixed lumnious/mimi...
04/23/2018
- 09:40 PM Bug #23646 (In Progress): scrub interaction with HEAD boundaries and clones is broken
The osd log for primary osd.1 shows that pg 3.0 is a cache pool in a cache tiering configuration. The message "_de...
- 09:12 PM Bug #23830 (Can't reproduce): rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
- ...
- 07:45 PM Bug #23828 (Can't reproduce): ec gen object leaks into different filestore collection just after ...
- ...
- 05:11 PM Bug #23827 (Resolved): osd sends op_reply out of order
- ...
- 03:13 AM Bug #23713 (Fix Under Review): High MON cpu usage when cluster is changing
- https://github.com/ceph/ceph/pull/21532
04/22/2018
- 08:07 PM Bug #21977: null map from OSDService::get_map in advance_pg
- From the latest logs, the peering thread id does not appear at all in the log until the crash.
I'm wondering if we...
- 08:05 PM Bug #21977: null map from OSDService::get_map in advance_pg
- Seen again here:
http://pulpito.ceph.com/yuriw-2018-04-20_20:02:29-upgrade:jewel-x-luminous-distro-basic-ovh/2420862/
04/21/2018
- 04:06 PM Bug #23793: ceph-osd consumed 10+GB rss memory
- Setting osd_debug_op_order to false can fix this problem.
My ceph cluster is created through vstart.sh, which sets osd_deb...
- 03:57 PM Bug #23816: disable bluestore cache caused a rocksdb error
- https://github.com/ceph/ceph/pull/21583
- 03:53 PM Bug #23816 (Resolved): disable bluestore cache caused a rocksdb error
- I disabled bluestore/rocksdb cache to estimate ceph-osd's memory consumption
by setting bluestore_cache_size_ssd/bluesto...
- 06:55 AM Bug #23145: OSD crashes during recovery of EC pg
- `2018-03-09 08:29:09.170227 7f901e6b30 10 merge_log log((17348'18587,17348'18587], crt=17348'18585) from osd.6(2) int...
04/20/2018
- 09:09 PM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- You don't really have authentication without the message signing. Since we don't do full encryption, signing is the o...
- 03:07 PM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- How costly is just the authentication piece, i.e. keep cephx but turn off message signing?
- 07:21 AM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- Summary of the discussion:
`check_message_signature` in `AsyncConnection::process` is already protected by `...
- 06:38 AM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- per Radoslaw Zarzynski
> the overhead between `CreateContextBySym` and `DigestBegin` is small
and probably we c...
- 08:53 PM Bug #23517 (Resolved): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- 02:27 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- https://github.com/ceph/ceph/pull/21571
- 08:47 PM Bug #23811: RADOS stat slow for some objects on same OSD
- We are still debugging this. On a further look, it looks like all objects on that PG (aka _79.1f9_) show similar slow...
- 05:30 PM Bug #23811 (New): RADOS stat slow for some objects on same OSD
- We have observed that queries have been slow for some RADOS objects while others on the same OSD respond much more quickly...
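A simple way to quantify that difference (a sketch assuming the `rados` CLI and a test pool; the pool and object names below are placeholders):
```python
import subprocess
import time

def stat_latency(pool, obj):
    t0 = time.time()
    subprocess.check_output(["rados", "-p", pool, "stat", obj])
    return time.time() - t0

# compare a "slow" object with another object mapped to the same PG/OSD
for obj in ["slow-object", "fast-object"]:
    print(obj, "%.3fs" % stat_latency("mypool", obj))
```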
- 05:19 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
- I guess the intention is that scrubbing takes priority and proceeds even if trimming is in progress. Before more tri...
- 04:45 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
We don't start trimming if scrubbing is happening, so maybe the only hole is that scrubbing doesn't check for trimm...
- 04:38 PM Bug #23810: ceph mon dump outputs verbose text to stderr
- As a simple verification, running:...
- 04:26 PM Bug #23810 (New): ceph mon dump outputs verbose text to stderr
- When executing...
- 02:41 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- My opinion is that this is different from a problem where the inconsistent flag reappears after repairing a PG becaus...
- 12:55 PM Backport #23808 (In Progress): luminous: upgrade: bad pg num and stale health status in mixed lum...
- https://github.com/ceph/ceph/pull/21556
- 12:55 PM Backport #23808 (Resolved): luminous: upgrade: bad pg num and stale health status in mixed lumnio...
- https://github.com/ceph/ceph/pull/21556
- 11:11 AM Bug #23763 (Fix Under Review): upgrade: bad pg num and stale health status in mixed lumnious/mimi...
- https://github.com/ceph/ceph/pull/21555
- 10:09 AM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- I think the pg_num = 11 is set by LibRadosList.EnumerateObjects...
- 12:32 AM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- Yuri reproduced the bad pg_num in 1 of 2 runs:...
- 12:48 AM Bug #22881 (In Progress): scrub interaction with HEAD boundaries and snapmapper repair is broken
04/19/2018
- 02:18 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- 12:34 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- https://github.com/ceph/ceph/pull/21280
- 07:42 AM Bug #23517 (Fix Under Review): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- ...
- 09:33 AM Backport #23784 (In Progress): luminous: osd: Warn about objects with too many omap entries
- 09:33 AM Backport #23784: luminous: osd: Warn about objects with too many omap entries
- h3. description
As discussed in this PR - https://github.com/ceph/ceph/pull/16332
- 07:29 AM Bug #23793: ceph-osd consumed 10+GB rss memory
- the "mon max pg per osd" is 1024 in my test.
- 07:14 AM Bug #23793 (New): ceph-osd consumed 10+GB rss memory
- After 26GB of data is written, ceph-osd's memory (rss) reached 10+GB.
The objectstore backend is *KStore*. master branc...
- 06:42 AM Backport #22934 (In Progress): luminous: filestore journal replay does not guard omap operations
04/18/2018
- 09:10 PM Bug #23620 (Resolved): tasks.mgr.test_failover.TestFailover failure
- 08:25 PM Bug #23788 (Duplicate): luminous->mimic: EIO (crc mismatch) on copy-get from ec pool
- ...
- 08:01 PM Bug #23787 (Rejected): luminous: "osd-scrub-repair.sh'" failures in rados
- This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
- 07:57 PM Backport #23786 (Resolved): luminous: "utilities/env_librados.cc:175:33: error: unused parameter ...
- This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
- 07:52 PM Bug #23785 (Resolved): "test_prometheus (tasks.mgr.test_module_selftest.TestModuleSelftest) ... E...
- This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
- 06:45 PM Backport #23784 (Resolved): luminous: osd: Warn about objects with too many omap entries
- https://github.com/ceph/ceph/pull/21518
- 03:34 PM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- The pgs with creating or unknown status in "pg dump" were active+clean after 2018-04-16 22:47, so the output of the last "pg...
- 01:29 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Any update on this?
- 12:14 PM Bug #23701 (Fix Under Review): oi(object_info_t) size mismatch with real object size during old w...
- 12:12 PM Bug #23701: oi(object_info_t) size mismatch with real object size during old write arrives after ...
- I think this issue only exists in jewel.
- 02:47 AM Documentation #23777: doc: description of OSD_OUT_OF_ORDER_FULL problem
- the default values :
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
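For reference, the OSD_OUT_OF_ORDER_FULL check is about these thresholds being ascending; a small sketch of the expected ordering (assumed semantics, based on the defaults quoted above):
```python
def out_of_order_full(nearfull, backfillfull, full):
    """True when the thresholds are not strictly ascending."""
    return not (nearfull < backfillfull < full)

print(out_of_order_full(0.85, 0.90, 0.95))   # False: the defaults are fine
print(out_of_order_full(0.90, 0.85, 0.95))   # True: this would raise the warning
```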
- 02:39 AM Documentation #23777 (Resolved): doc: description of OSD_OUT_OF_ORDER_FULL problem
- The description of OSD_OUT_OF_ORDER_FULL is...
- 12:30 AM Feature #23364 (Resolved): Special scrub handling of hinfo_key errors
- 12:30 AM Backport #23654 (Resolved): luminous: Special scrub handling of hinfo_key errors
04/17/2018
- 07:00 PM Backport #23772 (Resolved): luminous: ceph status shows wrong number of objects
- https://github.com/ceph/ceph/pull/22680
- 06:36 PM Bug #23769 (Resolved): osd/EC: slow/hung ops in multimds suite test
- ...
- 03:40 PM Feature #23364: Special scrub handling of hinfo_key errors
- This pull request is another follow on:
https://github.com/ceph/ceph/pull/21450
- 11:41 AM Bug #20924: osd: leaked Session on osd.7
- /a/sage-2018-04-17_04:17:03-rados-wip-sage3-testing-2018-04-16-2028-distro-basic-smithi/2404155
this time on osd.4...
- 07:37 AM Bug #23767: "ceph ping mon" doesn't work
- so "ceph ping mon.<id>" will remind you mon.<id> doesn't existed. however, if you run "ceph ping mon.a", you can get ...
- 07:33 AM Bug #23767 (New): "ceph ping mon" doesn't work
- If there is only mon_host = ip1, ip2... in ceph.conf, then "ceph ping mon.<id>" doesn't work.
Root cause is in the...
- 06:14 AM Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
- Sorry for the late reply, but it's hard to reproduce. We reproduced it once with...
- 02:09 AM Documentation #23765 (New): librbd hangs if permissions are incorrect
- I've been building rust bindings for librbd against ceph jewel and luminous. I found out by accident that if a cephx...
- 12:14 AM Bug #23763 (Resolved): upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- This happened in a luminous-x/point-to-point run. Logs in teuthology:/home/yuriw/logs/2387999/
Versions at this po...
04/16/2018
- 05:52 PM Bug #23760 (New): mon: `config get <who>` does not allow `who` as 'mon'/'osd'
- `config set mon` is allowed, but `config get mon` is not.
This is due to <who> on `get` being parsed as an EntityN...
- 04:39 PM Bug #23753: "Error ENXIO: problem getting command descriptions from osd.4" in upgrade:kraken-x-lu...
- This generally means the OSD isn't on?
04/15/2018
- 10:22 PM Bug #23753 (Can't reproduce): "Error ENXIO: problem getting command descriptions from osd.4" in u...
- Run: http://pulpito.ceph.com/teuthology-2018-04-15_03:25:02-upgrade:kraken-x-luminous-distro-basic-smithi/
Jobs: '23...
- 05:44 PM Bug #23471 (Resolved): add --add-bucket and --move options to crushtool
- 02:51 PM Bug #22095 (Pending Backport): ceph status shows wrong number of objects
- 08:52 AM Bug #19348: "ceph ping mon.c" cli prints assertion failure on timeout
- https://github.com/ceph/ceph/pull/21432
04/14/2018
- 06:11 AM Support #23719: Three nodes shutdown,only boot two of the nodes,many pgs down.(host failure-domai...
- Fix to the description: If pg1.1 has acting set [1,2,3], I power down osd.3 first, then power down osd.2. In case I boot osd...
- 05:50 AM Support #23719 (New): Three nodes shutdown,only boot two of the nodes,many pgs down.(host failure...
- The interval mechanism of PG will cause a problem in the process of a cluster restart. If I have 3 nodes (host failure-do...
04/13/2018
- 10:40 PM Bug #23716: osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (on upgrade f...
- ...
- 10:21 PM Bug #23716 (Resolved): osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (o...
- ...
- 07:33 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- Live multimds run: /ceph/teuthology-archive/pdonnell-2018-04-13_04:19:33-multimds-wip-pdonnell-testing-20180413.02283...
- 07:30 PM Bug #21992: osd: src/common/interval_map.h: 161: FAILED assert(len > 0)
- /ceph/teuthology-archive/pdonnell-2018-04-13_04:19:33-multimds-wip-pdonnell-testing-20180413.022831-testing-basic-smi...
- 06:28 PM Bug #23713: High MON cpu usage when cluster is changing
- My guess is that this is the compat reencoding of the OSDMap for the pre-luminous clients.
Are you by chance makin...
- 06:10 PM Bug #23713 (Resolved): High MON cpu usage when cluster is changing
- After upgrading to Luminous 12.2.4 (from Jewel 10.2.5), we consistently see high cpu usage when the OSDMap changes, esp...
- 03:03 PM Bug #23228 (Closed): scrub mismatch on objects
- The failure in comment (2) looks unrelated, but it was a test branch. Let's see if it happens again.
The original ...
- 01:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Is there any testing, logs, etc. that would be helpful for tracking down the cause of this problem? I had a fairly bad...
- 08:20 AM Bug #23701: oi(object_info_t) size mismatch with real object size during old write arrives after ...
- Here is my pull request to fix this problem:
https://github.com/ceph/ceph/pull/21408
- 08:08 AM Bug #23701 (Fix Under Review): oi(object_info_t) size mismatch with real object size during old w...
- Currently, in our test environment (jewel: cephfs + cache tier + ec pool), we found several osd coredumps
in the fol...
- 01:52 AM Backport #23654 (In Progress): luminous: Special scrub handling of hinfo_key errors
- https://github.com/ceph/ceph/pull/21397
04/12/2018
- 11:08 PM Feature #23364: Special scrub handling of hinfo_key errors
- Follow-on pull request included in the backport for this tracker:
https://github.com/ceph/ceph/pull/21362
- 09:49 PM Bug #23610 (Resolved): pg stuck in activating because of dropped pg-temp message
- 09:48 PM Backport #23630 (Resolved): luminous: pg stuck in activating
- 09:28 PM Backport #23630: luminous: pg stuck in activating
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21330
merged
- 05:35 PM Bug #23228: scrub mismatch on objects
- My change only affects the scrub error counts in the stats. However, if setting dirty_info in proc_primary_info() wo...
- 04:27 PM Bug #23228: scrub mismatch on objects
- The original report was an EC test, so it looks like a dup of #23339.
David, your failures are not EC. Could they...
- 04:43 PM Bug #20439 (Can't reproduce): PG never finishes getting created
- 04:29 PM Bug #22656: scrub mismatch on bytes (cache pools)
- Just bytes
dzafman-2018-03-28_18:21:29-rados-wip-zafman-testing-distro-basic-smithi/2332093
[ERR] 3.0 scrub sta...
- 02:29 PM Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second...
- /a/sage-2018-04-11_22:26:40-rados-wip-sage-testing-2018-04-11-1604-distro-basic-smithi/2387226
- 02:25 PM Backport #23668 (In Progress): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrit...
- https://github.com/ceph/ceph/pull/21378
- 01:34 AM Backport #23668 (Resolved): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrites'...
- https://github.com/ceph/ceph/pull/21378
- 07:19 AM Backport #23675 (In Progress): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- 07:07 AM Backport #23675 (Resolved): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- https://github.com/ceph/ceph/pull/21368
- 03:27 AM Bug #23662 (Resolved): osd: regression causes SLOW_OPS warnings in multimds suite
- 02:59 AM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- not able to move this to CI somehow... moving it to RADOS.
- 02:54 AM Backport #23315 (Resolved): luminous: pool create cmd's expected_num_objects is not correctly int...
- 02:41 AM Bug #23622 (Pending Backport): qa/workunits/mon/test_mon_config_key.py fails on master
- 02:01 AM Bug #23564: OSD Segfaults
- Correct, Bluestore and Luminous 12.2.4
- 01:57 AM Backport #23673 (In Progress): jewel: auth: ceph auth add does not sanity-check caps
- 01:43 AM Backport #23673 (Resolved): jewel: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/21367
- 01:53 AM Bug #23578 (Resolved): large-omap-object-warnings test fails
- 01:52 AM Backport #23633 (Rejected): luminous: large-omap-object-warnings test fails
- We can close this if that test isn't present in luminous.
- 01:35 AM Backport #23633 (Need More Info): luminous: large-omap-object-warnings test fails
- Brad,
Backporting PR#21295 to luminous is unrelated unless we get qa/suites/rados/singleton-nomsgr/all/large-omap-ob...
- 01:41 AM Backport #23670 (In Progress): luminous: auth: ceph auth add does not sanity-check caps
- 01:34 AM Backport #23670 (Resolved): luminous: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/24906
- 01:34 AM Backport #23654 (New): luminous: Special scrub handling of hinfo_key errors
- 01:33 AM Bug #22525 (Pending Backport): auth: ceph auth add does not sanity-check caps
04/11/2018
- 11:22 PM Bug #23662 (Fix Under Review): osd: regression causes SLOW_OPS warnings in multimds suite
- https://github.com/ceph/teuthology/pull/1166
- 09:38 PM Bug #23662: osd: regression causes SLOW_OPS warnings in multimds suite
- Looks like the obvious cause: https://github.com/ceph/ceph/pull/20660
- 07:56 PM Bug #23662 (Resolved): osd: regression causes SLOW_OPS warnings in multimds suite
- See: [1], first instance of the problem at [0].
The last run which did not cause most multimds jobs to fail with S...
- 11:20 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
Any scrub that completes without errors will set num_scrub_errors in pg stats to 0. That will cause the inconsiste...
- 10:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- David, is there any way a missing object wouldn't be reported in list-inconsistent output?
- 11:01 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- Let's see if this happens again now that sage's fast peering branch is merged.
- 10:58 PM Backport #23485 (Resolved): luminous: scrub errors not cleared on replicas can cause inconsistent...
- 10:58 PM Bug #23585: osd: safe_timer segfault
- Possibly the same as http://tracker.ceph.com/issues/23431
- 02:10 PM Bug #23585: osd: safe_timer segfault
- Got segfault in safe_timer too. Got it just once so can not provide more info at the moment.
2018-04-03 05:53:07...
- 10:57 PM Bug #23564: OSD Segfaults
- Is this on bluestore? there are a few reports of this occurring on bluestore including your other bug http://tracker....
- 10:44 PM Bug #23590: kstore: statfs: (95) Operation not supported
- 10:42 PM Bug #23601 (Resolved): Cannot see the value of parameter osd_op_threads via ceph --show-config
- 10:37 PM Bug #23614: local_reserver double-reservation of backfilled pg
- This may be the same root cause as http://tracker.ceph.com/issues/23490
- 10:36 PM Bug #23565: Inactive PGs don't seem to cause HEALTH_ERR
- Brad, can you take a look at this? I think it can be handled by the stuck pg code, that iirc already warns about pgs ...
- 10:25 PM Bug #23664 (Resolved): cache-try-flush hits wrlock, busy loops
- ...
- 10:12 PM Bug #23403 (Closed): Mon cannot join quorum
- Thanks for letting us know.
- 01:15 PM Bug #23403: Mon cannot join quorum
- After more investigation we discovered that one of the bonds on the machine was not behaving properly. We removed the...
- 11:28 AM Bug #23403: Mon cannot join quorum
- Thanks for the investigation Brad.
The "fault, initiating reconnect" and "RESETSESSION" messages only appear when ... - 07:57 PM Bug #23595: osd: recovery/backfill is extremely slow
- @Greg Farnum: Ah, great that part is already handled!
What about my other questions though, like
> I think it i...
- 06:45 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
- https://tracker.ceph.com/issues/23141
Sorry you ran into this, it's a bug in BlueStore/BlueFS. The fix will be in ...
- 07:49 PM Backport #23315: luminous: pool create cmd's expected_num_objects is not correctly interpreted
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20907
merged
- 05:45 PM Feature #23660 (New): when scrub errors are due to disk read errors, ceph status can say "likely ...
- If some of the scrub errors are due to disk read errors, we can also say in the status output "likely disk errors" an...
- 03:49 PM Bug #23487 (Pending Backport): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- 03:39 PM Backport #23654 (Resolved): luminous: Special scrub handling of hinfo_key errors
- https://github.com/ceph/ceph/pull/21397
- 03:09 PM Bug #23621 (Resolved): qa/standalone/mon/misc.sh fails on master
- 01:40 PM Backport #23316: jewel: pool create cmd's expected_num_objects is not correctly interpreted
- -https://github.com/ceph/ceph/pull/21042-
but test/mon/osd-pool-create.sh is failing, looking into it.
- 05:00 AM Bug #23610 (Pending Backport): pg stuck in activating because of dropped pg-temp message
- 04:56 AM Bug #23648 (New): max-pg-per-osd.from-primary fails because of activating pg
- the reason why we have activating pg when the number of pg is under the hard limit of max-pg-per-osd is that:
1. o...
- 03:01 AM Bug #23647 (In Progress): thrash-eio test can prevent recovery
- We are injecting random EIOs. However, in a recovery situation an EIO leads us to decide the object is missing in on...
04/10/2018
- 11:38 PM Feature #23364 (Pending Backport): Special scrub handling of hinfo_key errors
- 09:13 PM Bug #23428: Snapset inconsistency is hard to diagnose because authoritative copy used by list-inc...
- In the pull request https://github.com/ceph/ceph/pull/20947 there is a change to partially address this issue. Unfor...
- 09:08 PM Bug #23646 (Resolved): scrub interaction with HEAD boundaries and clones is broken
- Scrub will work in chunks, accumulating work in cleaned_meta_map. A single object's clones may stretch across two su...
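A toy illustration of that boundary problem (hypothetical code, not PG::chunky_scrub): if a chunk is cut at an arbitrary object, a head and its clones can land in different chunks, so the end of a chunk has to be extended to a head object.
```python
# clones sort before their head object
entries = [
    ("earlier", "head"),
    ("foo", "clone:100"),
    ("foo", "clone:200"),
    ("foo", "head"),
    ("later", "head"),
]

def chunk_end(entries, start, max_len):
    """Pick an end index that never separates clones from their head."""
    end = min(start + max_len, len(entries))
    while end < len(entries) and entries[end - 1][1] != "head":
        end += 1          # extend until the chunk ends on a head
    return end

print(chunk_end(entries, 0, 2))   # 4: extended so foo's clones and head stay together
```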
- 06:12 PM Backport #23630 (In Progress): luminous: pg stuck in activating
- 05:53 PM Backport #23630 (Resolved): luminous: pg stuck in activating
- https://github.com/ceph/ceph/pull/21330
- 05:53 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
- 05:53 PM Bug #23610 (Pending Backport): pg stuck in activating because of dropped pg-temp message
- 05:53 PM Backport #23633 (Rejected): luminous: large-omap-object-warnings test fails
- 05:47 PM Bug #18746 (Fix Under Review): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) ...
- 04:26 PM Bug #23495 (Resolved): Need (SLOW_OPS) in whitelist for another yaml
- 11:26 AM Bug #23495 (Fix Under Review): Need (SLOW_OPS) in whitelist for another yaml
- https://github.com/ceph/ceph/pull/21324
- 01:55 PM Bug #23627 (Resolved): Error EACCES: problem getting command descriptions from mgr.None from 'cep...
- ...
- 01:32 PM Bug #23622 (Fix Under Review): qa/workunits/mon/test_mon_config_key.py fails on master
- https://github.com/ceph/ceph/pull/21329
- 03:42 AM Bug #23622: qa/workunits/mon/test_mon_config_key.py fails on master
- see https://github.com/ceph/ceph/pull/21317 (not a fix)
- 02:56 AM Bug #23622 (Resolved): qa/workunits/mon/test_mon_config_key.py fails on master
- ...
- 07:04 AM Bug #20919 (Resolved): osd: replica read can trigger cache promotion
- 06:59 AM Backport #22403 (Resolved): jewel: osd: replica read can trigger cache promotion
- 06:22 AM Bug #23585: osd: safe_timer segfault
- https://drive.google.com/open?id=1x_0p9s9JkQ1zo-LCx6mHxm0DQO5sc1UA is too large, about 1.2G. And ceph-osd.297.log.gz di...
- 05:53 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- The docs weren't updated, so I created a PR: https://github.com/ceph/ceph/pull/21319.
- 04:57 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- It was removed in commit 08731c3567300b28d83b1ac1c2ba. Maybe the docs weren't updated or you read old docs.
- 04:27 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- But I can see this option in the documentation!! The setting works in Jewel.
So osd_op_threads was removed in Luminous??
- 03:14 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- There is no "osd_op_threads". Now it is called osd_op_num_shards/osd_op_num_shards_hdd/osd_op_num_shards_ssd.
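One way to check which of these options the running binaries still know about (a sketch assuming the `ceph` CLI; output is the usual `name = value` lines):
```python
import subprocess

cfg = subprocess.check_output(["ceph", "--show-config"]).decode()
for line in cfg.splitlines():
    if line.startswith(("osd_op_threads", "osd_op_num_shards")):
        print(line)
# per the discussion above, luminous lists osd_op_num_shards* but no osd_op_threads
```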
- 05:34 AM Bug #23595: osd: recovery/backfill is extremely slow
- The code checks hdd vs ssd when the osd starts, and it is not changed after starting.
I think we need to increase the log level fo...
- 05:19 AM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- 04:29 AM Bug #23621 (In Progress): qa/standalone/mon/misc.sh fails on master
- https://github.com/ceph/ceph/pull/21318
- 04:17 AM Bug #23621: qa/standalone/mon/misc.sh fails on master
- bc5df2b4497104c2a8747daf0530bb5184f9fecb added ceph::features::mon::FEATURE_OSDMAP_PRUNE so the output that's failing...
- 02:53 AM Bug #23621: qa/standalone/mon/misc.sh fails on master
- http://pulpito.ceph.com/sage-2018-04-10_02:19:56-rados-master-distro-basic-smithi/2377263
http://pulpito.ceph.com/sa...
- 02:51 AM Bug #23621 (Resolved): qa/standalone/mon/misc.sh fails on master
- This appears to be from the addition of the osdmap-prune mon feature?
- 02:49 AM Bug #23620 (Fix Under Review): tasks.mgr.test_failover.TestFailover failure
- https://github.com/ceph/ceph/pull/21315
- 02:43 AM Bug #23620 (Resolved): tasks.mgr.test_failover.TestFailover failure
- http://pulpito.ceph.com/sage-2018-04-10_02:19:56-rados-master-distro-basic-smithi/2377255...
- 12:57 AM Bug #23578 (Pending Backport): large-omap-object-warnings test fails
- Just a note that my analysis above was incorrect and this was not due to the lost coin flips but due to a pg map upda...
- 12:18 AM Backport #23485 (In Progress): luminous: scrub errors not cleared on replicas can cause inconsist...
04/09/2018
- 10:24 PM Feature #23616 (New): osd: admin socket should help debug status at all times
- Last week I was looking at an LRC OSD which was having trouble, and it wasn't clear why.
The cause ended up being ...
- 10:18 PM Bug #22882 (Resolved): Objecter deadlocked on op budget while holding rwlock in ms_handle_reset()
- Whoops, this merged way back then with a slightly different plan than discussed here (see PR discussion).
- 09:59 PM Bug #22525: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/21311
- 09:21 PM Feature #23045 (Resolved): mon: warn on slow ops in OpTracker
- That PR got merged a while ago and we've been working through the slow ops warnings that turn up since. Seems to be a...
- 08:59 PM Feature #21084 (Resolved): auth: add osd auth caps based on pool metadata
- 06:53 PM Bug #23614: local_reserver double-reservation of backfilled pg
- Looking through the code I don't see where the reservation is supposed to be released. I see releases for
- the p...
- 06:52 PM Bug #23614 (Resolved): local_reserver double-reservation of backfilled pg
- - pg gets reservations (incl local_reserver)
- pg backfills, finishes
- ...apparently never releases the reservatio...
- 06:15 PM Bug #23365: CEPH device class not honored for erasure encoding.
- A quote from Greg Farnum on the crash from another ticket:...
- 06:13 PM Bug #23365: CEPH device class not honored for erasure encoding.
- I put 12.2.2, but that is incorrect. It is version ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) lu...
- 05:38 PM Bug #23365: CEPH device class not honored for erasure encoding.
- What version are you running? How are your OSDs configured?
There was a bug with BlueStore SSDs being misreported ... - 05:36 PM Bug #23371: OSDs flaps when cluster network is made down
- You tested this on a version prior to luminous and the behavior has *changed*?
This must be a result of some chang...
- 05:24 PM Documentation #23613 (Closed): doc: add description of new fs-client auth profile
- 05:23 PM Documentation #23613 (Closed): doc: add description of new fs-client auth profile
- On that page: http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
https:...
- 05:23 PM Documentation #23612 (New): doc: add description of new auth profiles
- On that page: http://docs.ceph.com/docs/master/rados/operations/user-management/#authorization-capabilities
https:...
- 05:18 PM Support #23455 (Resolved): osd: large number of inconsistent objects after recover or backfilling
- fiemap is disabled by default precisely because there are a number of known bugs in the local filesystems across kern...
- 05:07 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
- https://github.com/ceph/ceph/pull/21310
- 05:02 PM Bug #23610 (Resolved): pg stuck in activating because of dropped pg-temp message
- http://pulpito.ceph.com/yuriw-2018-04-05_22:33:03-rados-wip-yuri3-testing-2018-04-05-1940-luminous-distro-basic-smith...
- 05:06 PM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- This is a dupe of...something. We can track it down later.
For now, note that the crash is happening with Hammer c...
- 06:17 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- Hm hm hm
- 02:56 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- h3. rados bisect
Reproducer: ...
- 02:11 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- This problem was not happening so reproducibly before the current integration run, so one of the following PRs might ...
- 02:05 AM Bug #23598: hammer->jewel: ceph_test_rados crashes during radosbench task in jewel rados upgrade ...
- Set priority to Urgent because this prevents us from getting a clean rados run in jewel 10.2.11 integration testing.
- 02:04 AM Bug #23598 (Duplicate): hammer->jewel: ceph_test_rados crashes during radosbench task in jewel ra...
- Test description: rados/upgrade/{hammer-x-singleton/{0-cluster/{openstack.yaml start.yaml} 1-hammer-install/hammer.ya...
- 04:39 PM Bug #23595: osd: recovery/backfill is extremely slow
- *I have it figured out!*
The issue was "osd_recovery_sleep_hdd", which defaults to 0.1 seconds.
After setting...
- 03:23 PM Bug #23595: osd: recovery/backfill is extremely slow
- OK, if I only have the 6 large files in the cephfs AND set the options...
- 02:55 PM Bug #23595: osd: recovery/backfill is extremely slow
- I have now tested with only the 6*1GB files, having deleted the 270k empty files from cephfs.
I continue to see ex...
- 12:30 PM Bug #23595: osd: recovery/backfill is extremely slow
- You can find a core dump of the -O0 version created with GDB at http://nh2.me/ceph-issue-23595-osd-O0.core.xz
- 12:06 PM Bug #23595: osd: recovery/backfill is extremely slow
- Attached are two GDB runs of a sender node.
In the release build there were many values "<optimized out>", so I re...
- 11:45 AM Bug #23595: osd: recovery/backfill is extremely slow
- On https://forum.proxmox.com/threads/increase-ceph-recovery-speed.36728/ people reported the same number as me of 10 ...
- 10:43 AM Bug #23601 (Resolved): Cannot see the value of parameter osd_op_threads via ceph --show-config
- I have set the "osd op threads" parameter in the configuration file,
but I cannot see the value of the parameter "osd op t...
- 10:17 AM Bug #23403 (Need More Info): Mon cannot join quorum
- 07:23 AM Bug #23578 (In Progress): large-omap-object-warnings test fails
- https://github.com/ceph/ceph/pull/21295
- 01:33 AM Bug #23578: large-omap-object-warnings test fails
- We instruct the OSDs to scrub at around 16:15....
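For reference, a scrub can also be requested by hand on a single PG while watching the logs; the pgid 2.0 below is only an example:
  ceph pg scrub 2.0        # ask the primary OSD for a regular scrub
  ceph pg deep-scrub 2.0   # or a deep scrub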
- 04:31 AM Bug #23593 (Fix Under Review): RESTControllerTest.test_detail_route and RESTControllerTest.test_f...
- 02:08 AM Bug #22123: osd: objecter sends out of sync with pg epochs for proxied ops
- Despite the jewel backport of this fix being merged, this problem has reappeared in jewel 10.2.11 integration testing...
04/08/2018
- 07:55 PM Bug #23595: osd: recovery/backfill is extremely slow
- For the record, I installed the following debugging packages for gdb stack traces:...
- 07:53 PM Bug #23595: osd: recovery/backfill is extremely slow
- I have read https://www.spinics.net/lists/ceph-devel/msg38331.html which suggests that there is some throttling going...
- 06:17 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
- I made a Ceph 12.2.4 (luminous stable) cluster of 3 machines with 10-Gigabit networking on Ubuntu 16.04, using pretty...
- 05:40 PM Bug #23593: RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- PR: https://github.com/ceph/ceph/pull/21290
- 03:10 PM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- ...
- 04:31 PM Documentation #23594: auth: document what to do when locking client.admin out
- I found one way to fix it on the mailing list:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/01...
- 04:23 PM Documentation #23594 (New): auth: document what to do when locking client.admin out
- I accidentally ran ...
- 11:06 AM Bug #23590: kstore: statfs: (95) Operation not supported
- https://github.com/ceph/ceph/pull/21287
- 11:01 AM Bug #23590 (Fix Under Review): kstore: statfs: (95) Operation not supported
- 2018-04-07 16:19:07.248 7fdec4675700 -1 osd.0 0 statfs() failed: (95) Operation not supported
2018-04-07 16:19:08....
- 08:50 AM Bug #23589 (New): jewel: KStore Segmentation fault in ceph_test_objectstore --gtest_filter=-*/2:-*/3
- Test description: rados/objectstore/objectstore.yaml
Log excerpt:...
- 08:39 AM Bug #23588 (New): LibRadosAioEC.IsCompletePP test fails in jewel 10.2.11 integration testing
- Test description: rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yam...
- 06:53 AM Bug #23511: forwarded osd_failure leak in mon
- Greg, no. Both tests below include the no_reply() fix.
see
- http://pulpito.ceph.com/kchai-2018-03-30_15:29:52-r...
- 06:42 AM Bug #23585 (Duplicate): osd: safe_timer segfault
- ...
04/07/2018
- 03:04 AM Bug #23195: Read operations segfaulting multiple OSDs
Change the test-erasure-eio.sh test as following:...
04/06/2018
- 10:23 PM Bug #22165 (Fix Under Review): split pg not actually created, gets stuck in state unknown
- Fixed by https://github.com/ceph/ceph/pull/20469
- 09:29 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
- You'll definitely get more attention and advice if somebody else has hit this issue before.
- 08:45 PM Bug #23195: Read operations segfaulting multiple OSDs
- For anyone running into the send_all_remaining_reads() crash, a workaround is to use these osd settings:...
- 04:17 PM Bug #23195 (Fix Under Review): Read operations segfaulting multiple OSDs
- https://github.com/ceph/ceph/pull/21273
I'm going to treat this issue as tracking the first crash, in send_all_rem...
- 03:10 AM Bug #23195 (In Progress): Read operations segfaulting multiple OSDs
- 08:41 PM Bug #23200 (Resolved): invalid JSON returned when querying pool parameters
- 08:40 PM Backport #23312 (Resolved): luminous: invalid JSON returned when querying pool parameters
- 07:28 PM Backport #23312: luminous: invalid JSON returned when querying pool parameters
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20890
merged
- 08:40 PM Bug #23324 (Resolved): delete type mismatch in CephContext teardown
- 08:40 PM Backport #23412 (Resolved): luminous: delete type mismatch in CephContext teardown
- 07:28 PM Backport #23412: luminous: delete type mismatch in CephContext teardown
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20998
merged
- 08:38 PM Bug #23477 (Resolved): should not check for VERSION_ID
- 08:38 PM Backport #23478 (Resolved): should not check for VERSION_ID
- 07:26 PM Backport #23478: should not check for VERSION_ID
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21090
merged
- 06:03 PM Bug #21833 (Resolved): Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
- 06:02 PM Backport #23160 (Resolved): luminous: Multiple asserts caused by DNE pgs left behind after lots o...
- 03:57 PM Backport #23160: luminous: Multiple asserts caused by DNE pgs left behind after lots of OSD restarts
- Prashant D wrote:
> Waiting for code review for backport PR : https://github.com/ceph/ceph/pull/20668
merged
- 06:02 PM Bug #23078 (Resolved): SRV resolution fails to lookup AAAA records
- 06:02 PM Backport #23174 (Resolved): luminous: SRV resolution fails to lookup AAAA records
- 03:56 PM Backport #23174: luminous: SRV resolution fails to lookup AAAA records
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20710
merged
- 05:57 PM Backport #23472 (Resolved): luminous: add --add-bucket and --move options to crushtool
- 03:53 PM Backport #23472: luminous: add --add-bucket and --move options to crushtool
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21079
merged
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
- 05:37 PM Bug #23578 (Resolved): large-omap-object-warnings test fails
- ...
- 03:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Sorry, forgot to mention I am running 12.2.4.
- 03:50 PM Bug #23576 (Can't reproduce): osd: active+clean+inconsistent pg will not scrub or repair
- My apologies if I'm too premature in posting this.
Myself and so far two others on the mailing list: http://lists.... - 03:44 AM Bug #23345 (Resolved): `ceph osd erasure-code-profile set` crashes the monitors on vstart clusters
- https://github.com/ceph/ceph/pull/20986
- 01:57 AM Bug #21737 (Resolved): OSDMap cache assert on shutdown
- 01:56 AM Backport #21786 (Resolved): jewel: OSDMap cache assert on shutdown
04/05/2018
- 09:12 PM Bug #22887 (Duplicate): osd/ECBackend.cc: 2202: FAILED assert((offset + length) <= (range.first.g...
- 09:12 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- From #22887, this also appeared in /ceph/teuthology-archive/pdonnell-2018-01-30_23:38:56-kcephfs-wip-pdonnell-i22627-...
- 09:09 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- That was the fix I was wondering about, but it was merged to master as https://github.com/ceph/ceph/pull/15712 and so...
- 09:05 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- https://github.com/ceph/ceph/pull/15712
- 09:10 PM Bug #19882: rbd/qemu: [ERR] handle_sub_read: Error -2 reading 1:e97125f5:::rbd_data.0.10251ca0c5f...
- https://github.com/ceph/ceph/pull/15712
- 06:35 PM Bug #22351 (Resolved): Couldn't init storage provider (RADOS)
- 06:35 PM Backport #23349 (Resolved): luminous: Couldn't init storage provider (RADOS)
- 05:22 PM Backport #23349: luminous: Couldn't init storage provider (RADOS)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20896
merged
- 06:33 PM Bug #22114 (Resolved): mon: ops get stuck in "resend forwarded message to leader"
- 06:33 PM Backport #23077 (Resolved): luminous: mon: ops get stuck in "resend forwarded message to leader"
- 04:57 PM Backport #23077: luminous: mon: ops get stuck in "resend forwarded message to leader"
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21016
merged
- 06:31 PM Bug #22752 (Resolved): snapmapper inconsistency, crash on luminous
- 06:31 PM Backport #23500 (Resolved): luminous: snapmapper inconsistency, crash on luminous
- 04:55 PM Backport #23500: luminous: snapmapper inconsistency, crash on luminous
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21118
merged
- 05:14 PM Bug #23565 (Fix Under Review): Inactive PGs don't seem to cause HEALTH_ERR
- In looking at https://tracker.ceph.com/issues/23562, there were inactive PGs starting at...
- 04:43 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
- ...
- 04:18 PM Bug #23564 (Duplicate): OSD Segfaults
- Apr 5 11:40:31 roc05r-sc3a100 kernel: [126029.543698] safe_timer[28863]: segfault at 8d ip 00007fa9ad4dcccb sp 00007...
- 12:24 PM Bug #23562 (New): VDO OSD caused cluster to hang
- I awoke to alerts that apache serving teuthology logs on the Octo Long Running Cluster was unresponsive.
Here was ...
- 08:37 AM Bug #23439: Crashing OSDs after 'ceph pg repair'
- Hi Greg,
thanks for your response.
> That URL denies access. You can use ceph-post-file instead to upload logs ...
- 03:31 AM Bug #23403: Mon cannot join quorum
- My apologies. It appears my previous analysis was incorrect.
I've pored over the logs and it appears the issue is ...
04/04/2018
- 11:19 PM Bug #23554: mon: mons need to be aware of VDO statistics
- Right, but AFAICT the monitor is then not even aware of VDO being involved. Which seems fine to my naive thoughts, bu...
- 11:05 PM Bug #23554: mon: mons need to be aware of VDO statistics
- Of course Sage is already on it :)
I don't know where the ...
- 10:46 PM Bug #23554: mon: mons need to be aware of VDO statistics
- At least this: https://github.com/ceph/ceph/pull/20516
- 10:44 PM Bug #23554: mon: mons need to be aware of VDO statistics
- What would we expect this monitor awareness to look like? Extra columns duplicating the output of vdostats?
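For comparison, these are the numbers an administrator currently has to pull from VDO by hand, along the lines of (volume name vdo0 is only an example):
  vdostats --human-readable /dev/mapper/vdo0   # used/available space and space-saving percentage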
- 05:48 PM Bug #23554 (New): mon: mons need to be aware of VDO statistics
- I created an OSD on top of a logical volume with a VDO device underneath.
Ceph is unaware of how much compression ...
- 09:58 PM Bug #23128: invalid values in ceph.conf do not issue visible warnings
- http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/ has been updated with information about this
- 09:53 PM Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
- Can you reproduce with osds configured with:...
- 09:43 PM Bug #23439: Crashing OSDs after 'ceph pg repair'
- That URL denies access. You can use ceph-post-file instead to upload logs to a secure location.
It's not clear wha...
- 09:39 PM Bug #23320 (Fix Under Review): OSD suicide itself because of a firewall rule but reports a receiv...
- https://github.com/ceph/ceph/pull/21000
- 09:37 PM Bug #23487: There is no 'ceph osd pool get erasure allow_ec_overwrites' command
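For context, the flag can currently only be set and then read back indirectly from the pool flags; the pool name ecpool below is only an example:
  ceph osd pool set ecpool allow_ec_overwrites true
  ceph osd pool ls detail | grep ecpool   # the ec_overwrites flag appears in the pool's flag list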
- 09:31 PM Bug #23510 (Resolved): rocksdb spillover for hard drive configurations
- 09:31 PM Bug #23511: forwarded osd_failure leak in mon
- Kefu, did your latest no_reply() PR resolve this?
- 09:29 PM Bug #23535 (Closed): 'ceph --show-config --conf /dev/null' does not work any more
- Yeah, you should use the monitor config commands now! :)
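For reference, the centralized config commands referred to here look roughly like this; the option name is only an example:
  ceph config dump                                  # everything stored in the monitors' config database
  ceph config get osd.0 osd_recovery_sleep_hdd      # effective value for one daemon
  ceph config set osd osd_recovery_sleep_hdd 0.05   # set it for all OSDs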
- 09:28 PM Bug #23258: OSDs keep crashing.
- Brian, that's a separate bug; the code address you've picked up on is just part of the generic failure handling code....
- 09:19 PM Bug #23258: OSDs keep crashing.
- I was about to start a new bug and found this, I am also seeing 0xa74234 and ceph::__ceph_assert_fail...
A while b...
- 09:22 PM Bug #20924: osd: leaked Session on osd.7
- /a/sage-2018-04-04_02:28:04-rados-wip-sage2-testing-2018-04-03-1634-distro-basic-smithi/2351291
rados/verify/{ceph...
- 09:21 PM Bug #18746: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+kraken)
- Under discussion on the PR, which is good on its own terms but suffering from a prior CephFS bug. :(
- 09:19 PM Bug #23297: mon-seesaw 'failed to become clean before timeout' due to laggy pg create
- I suspect this is resolved in https://github.com/ceph/ceph/pull/19973 by the commit that has the OSDs proactively go ...
- 09:16 PM Bug #23490: luminous: osd: double recovery reservation for PG when EIO injected (while already re...
- David, can you look at this when you get a chance? I think it's due to EIO triggering recovery when recovery is alrea...
- 09:13 PM Bug #23204: missing primary copy of object in mixed luminous<->master cluster with bluestore
- We should see this again as we run the upgrade suite for mimic...
- 09:08 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
- https://github.com/ceph/ceph/pull/20933
- 09:07 PM Bug #23267 (Pending Backport): scrub errors not cleared on replicas can cause inconsistent pg sta...
- 07:25 PM Backport #23413 (Resolved): jewel: delete type mismatch in CephContext teardown
- 07:23 PM Bug #20471 (Resolved): Can't repair corrupt object info due to bad oid on all replicas
- 07:23 PM Backport #23181 (Resolved): jewel: Can't repair corrupt object info due to bad oid on all replicas
- 06:24 PM Bug #21758 (Resolved): cli/crushtools/build.t sometimes fails in jenkins' "make check" run
- 06:24 PM Backport #21784 (Resolved): jewel: cli/crushtools/build.t sometimes fails in jenkins' "make check...
- 06:18 PM Feature #23242 (Resolved): ceph-objectstore-tool command to trim the pg log
- 06:18 PM Backport #23307 (Resolved): jewel: ceph-objectstore-tool command to trim the pg log
- 08:14 AM Feature #23552 (New): cache PK11Context in Connection and probably other consumers of CryptoKeyHa...
- Please see the attached flamegraph: 0.67% of CPU cycles are spent in PK11_CreateContextBySymKey(). If we cache the PK11Cont...