Activity
From 04/10/2018 to 05/09/2018
05/09/2018
- 09:24 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
- The pg is waiting for state from osd.33 - can you use ceph-post-file to upload the full log from the crash?
You mi...
- 09:11 PM Bug #24000: mon: snap delete on deleted pool returns 0 without proper payload
- 09:11 PM Bug #24006: ceph-osd --mkfs has nondeterministic output
- Sounds like we need to flush the log before exiting in ceph-osd.
- 09:08 PM Bug #23879: test_mon_osdmap_prune.sh fails
- Sounds like we need to block for trimming sometimes when there's a constant propose workload.
- 09:02 PM Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value...
- Sounds like a use-after-free of some sort, unrelated to other crashes we've seen.
- 08:48 PM Bug #24057: cbt fails to copy results to the archive dir
- This seems to be an issue with cbt not being able to copy output files to its archive dir, and hence we don't find th...
- 12:00 PM Bug #24057: cbt fails to copy results to the archive dir
- Neha, mind taking a look? I've run into this failure a couple of times.
- 11:59 AM Bug #24057 (Rejected): cbt fails to copy results to the archive dir
- /a/kchai-2018-05-08_12:15:21-rados-wip-kefu-testing2-2018-05-08-1834-distro-basic-mira/2501280...
- 06:44 PM Backport #24068 (Resolved): luminous: osd sends op_reply out of order
- https://github.com/ceph/ceph/pull/23137
- 06:38 PM Bug #23827 (Pending Backport): osd sends op_reply out of order
- 04:45 PM Bug #23827 (Fix Under Review): osd sends op_reply out of order
- The cause for this issue is that we are not tracking enough dup ops for this test, which does multiple writes to the ...
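As a rough sketch (not the actual fix), the number of dup ops tracked per PG is governed by a config option; assuming the osd_pg_log_dups_tracked name, it could be inspected and temporarily widened on a test cluster like so:
$ ceph daemon osd.0 config get osd_pg_log_dups_tracked    # show the current dup-op tracking window
$ ceph tell osd.* injectargs '--osd_pg_log_dups_tracked 6000'    # widen it for a long write test (may need an OSD restart to fully apply)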
- 04:01 PM Backport #24059 (Resolved): luminous: Deleting a pool with active notify linger ops can result in...
- https://github.com/ceph/ceph/pull/22143
- 04:01 PM Backport #24058 (Resolved): jewel: Deleting a pool with active notify linger ops can result in se...
- https://github.com/ceph/ceph/pull/22188
- 02:21 PM Bug #24022 (Fix Under Review): "ceph tell osd.x bench" writes resulting JSON to stderr instead of...
- I tend to agree: https://github.com/ceph/ceph/pull/21905
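A minimal way to see the symptom, and to capture the output until the fix lands (assuming a cluster with an osd.0):
$ ceph tell osd.0 bench > bench.out      # bench.out stays empty; the JSON still appears on the terminal via stderr
$ ceph tell osd.0 bench 2> bench.json    # redirecting stderr captures the JSON instead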
- 02:09 PM Backport #24026 (Resolved): mimic: pg-upmap cannot balance in some case
- 12:11 PM Bug #23966 (Pending Backport): Deleting a pool with active notify linger ops can result in seg fault
- 08:05 AM Bug #23851 (Resolved): OSD crashes on empty snapset
- 08:05 AM Backport #23852 (Resolved): luminous: OSD crashes on empty snapset
05/08/2018
- 11:09 PM Support #22531: OSD flapping under repair/scrub after recieve inconsistent PG LFNIndex.cc: 439: F...
- For the record...
I was also suffering this problem on a pg repair. That was because I was following the procedure...
- 11:05 PM Backport #23852: luminous: OSD crashes on empty snapset
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21638
merged
- 09:37 PM Bug #23909 (Resolved): snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a...
- 08:56 PM Backport #24048 (Resolved): luminous: pg-upmap cannot balance in some case
- https://github.com/ceph/ceph/pull/22115
- 05:08 PM Bug #20876: BADAUTHORIZER on mgr, hung ceph tell mon.*
- Triggered on Luminous 12.2.5 again.
Mon quorum worked as expected; after restarting all monitors, it was not healed. All pgs...
- 04:53 PM Bug #24045 (Resolved): Eviction still raced with scrub due to preemption
We put code in cache tier eviction to check the scrub range, but that isn't sufficient. During scrub preemption re...
- 06:58 AM Backport #23850 (In Progress): luminous: Read operations segfaulting multiple OSDs
- -https://github.com/ceph/ceph/pull/21873-
- 06:48 AM Bug #23402: objecter: does not resend op on split interval
- We also met this problem with osd_debug_op_order=true, which results in an "out of order" assert
- 04:30 AM Backport #24042 (In Progress): luminous: ceph-disk log is written to /var/run/ceph
- 04:30 AM Backport #24042 (Resolved): luminous: ceph-disk log is written to /var/run/ceph
- https://github.com/ceph/ceph/pull/21870
- 04:28 AM Bug #24041: ceph-disk log is written to /var/run/ceph
- https://github.com/ceph/ceph/pull/18375
- 04:28 AM Bug #24041 (Resolved): ceph-disk log is written to /var/run/ceph
- it should go to /var/log/ceph
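A quick check of where the log actually lands, assuming default paths:
$ ls -l /var/run/ceph/ | grep -i disk    # the reported (wrong) location
$ ls -l /var/log/ceph/ | grep -i disk    # where it should be after the fix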
05/07/2018
- 08:38 PM Bug #24037 (Resolved): osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_nod...
- ...
- 07:24 PM Bug #23909: snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a1,1a3 was r...
Nevermind. I see your branch was still on the ci repo.
$ git branch --contains c20a95b0b9f4082dcebb339135683b91fe39e...
- 07:18 PM Bug #23909 (Need More Info): snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,...
Does your branch include c20a95b0b9f4082dcebb339135683b91fe39ec0a? The change I made was needed to make that fix w...
- 05:25 PM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
- Alternative Mimic fix: https://github.com/ceph/ceph/pull/21859
- 02:55 PM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
- will reset the member variables of C_notify_Finish in its dtor for debugging, to see if it has been destroyed or not ...
- 07:43 AM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
- the test still fails with the fixes above: /a/kchai-2018-05-06_15:50:41-rados-wip-kefu-testing-2018-05-06-2204-distro...
- 03:34 PM Bug #24033 (Fix Under Review): rados: not all exceptions accept keyargs
- 01:19 PM Bug #24033: rados: not all exceptions accept keyargs
- https://github.com/ceph/ceph/pull/21853
- 12:55 PM Bug #24033 (Resolved): rados: not all exceptions accept keyargs
- The method make_ex() in rados.pyx raises exceptions irrespective of whether an exception can or cannot handl...
- 02:15 AM Backport #23925 (In Progress): luminous: assert on pg upmap
- 12:05 AM Bug #24023: Segfault on OSD in 12.2.5
- Another one occurred today on a different OSD:
2018-05-06 19:48:33.636221 7f0f55922700 -1 *** Caught signal (Segme...
05/06/2018
- 09:01 AM Backport #23925: luminous: assert on pg upmap
- https://github.com/ceph/ceph/pull/21818
- 08:57 AM Bug #23921 (Pending Backport): pg-upmap cannot balance in some case
- 03:35 AM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
- mimic: https://github.com/ceph/ceph/pull/21834
- 03:32 AM Backport #24027 (In Progress): mimic: ceph_daemon.py format_dimless units list index out of range
- 03:30 AM Backport #24027 (Resolved): mimic: ceph_daemon.py format_dimless units list index out of range
- https://github.com/ceph/ceph/pull/21836
- 03:29 AM Bug #23962 (Pending Backport): ceph_daemon.py format_dimless units list index out of range
- 03:28 AM Backport #24026 (In Progress): mimic: pg-upmap cannot balance in some case
- 03:27 AM Backport #24026 (Resolved): mimic: pg-upmap cannot balance in some case
- https://github.com/ceph/ceph/pull/21835
- 03:24 AM Bug #23627 (Resolved): Error EACCES: problem getting command descriptions from mgr.None from 'cep...
05/05/2018
- 08:32 PM Bug #24025: RocksDB compression is not supported at least on Debian.
- I use:
deb https://download.ceph.com/debian-luminous/ stretch main
Ceph 12.2.5 and Debian 9.
- 08:31 PM Bug #24025 (Resolved): RocksDB compression is not supported at least on Debian.
- ...
- 04:20 PM Bug #23909: snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a1,1a3 was r...
- http://pulpito.ceph.com/kchai-2018-05-05_14:56:43-rados-wip-kefu-testing-2018-05-05-1912-distro-basic-smithi/
...
- 01:47 PM Backport #23904 (Resolved): luminous: Deleting a pool with active watch/notify linger ops can res...
- 11:55 AM Bug #24023 (Duplicate): Segfault on OSD in 12.2.5
- 2018-05-05 06:33:42.383231 7f83289a4700 -1 *** Caught signal (Segmentation fault) **
in thread 7f83289a4700 thread_...
- 11:23 AM Bug #24022: "ceph tell osd.x bench" writes resulting JSON to stderr instead of stdout.
- Maybe not only this command, but also some others.
- 11:23 AM Bug #24022 (Resolved): "ceph tell osd.x bench" writes resulting JSON to stderr instead of stdout.
- 11:05 AM Bug #23627: Error EACCES: problem getting command descriptions from mgr.None from 'ceph tell mgr'
- https://github.com/ceph/ceph/pull/21832
- 10:57 AM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
- master: https://github.com/ceph/ceph/pull/21831
- 08:57 AM Bug #21977 (Resolved): null map from OSDService::get_map in advance_pg
- 08:57 AM Backport #23870 (Resolved): luminous: null map from OSDService::get_map in advance_pg
- 08:56 AM Backport #24016: luminous: scrub interaction with HEAD boundaries and snapmapper repair is broken
- Quoting David Zafman, PR to backport is:
https://github.com/ceph/ceph/pull/21546
Backport the entire pull reque...
05/04/2018
- 07:01 PM Backport #23784 (Resolved): luminous: osd: Warn about objects with too many omap entries
- 05:13 PM Backport #23784: luminous: osd: Warn about objects with too many omap entries
- Vikhyat Umrao wrote:
> https://github.com/ceph/ceph/pull/21518
merged
- 06:22 PM Bug #24000: mon: snap delete on deleted pool returns 0 without proper payload
- Jason put a client-side handler in, but we should change the monitor as well so that we don't break older clients (or...
- 05:16 PM Backport #23904: luminous: Deleting a pool with active watch/notify linger ops can result in seg ...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21752
merged
- 05:14 PM Backport #23870: luminous: null map from OSDService::get_map in advance_pg
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21737
merged
- 03:20 PM Backport #24016 (Resolved): luminous: scrub interaction with HEAD boundaries and snapmapper repai...
- Included in
https://github.com/ceph/ceph/pull/22044
- 03:19 PM Backport #24015 (Resolved): luminous: UninitCondition in PG::RecoveryState::Incomplete::react(PG:...
- https://github.com/ceph/ceph/pull/21993
- 02:30 PM Bug #23921 (Fix Under Review): pg-upmap cannot balance in some case
- 02:30 PM Bug #23921: pg-upmap cannot balance in some case
- https://github.com/ceph/ceph/pull/21815
- 08:29 AM Bug #24007 (New): rados.connect get a segmentation fault
- If I try to use librados in the following way, I get a segmentation fault.
!http://img0.ph.126.net/ekMbDVzMROb-o_...
- 04:04 AM Feature #22420 (Fix Under Review): Add support for obtaining a list of available compression options
- https://github.com/ceph/ceph/pull/21809
- 01:47 AM Bug #24006 (New): ceph-osd --mkfs has nondeterministic output
- On 12.2.3, my `ceph-osd` has nondeterministic output. I'm running it as root.
Sometimes it prints "created object s...
- 12:46 AM Bug #22881: scrub interaction with HEAD boundaries and snapmapper repair is broken
- https://github.com/ceph/ceph/pull/21546
Backport the entire pull request which also fixes http://tracker.ceph.com/...
- 12:43 AM Bug #22881 (Pending Backport): scrub interaction with HEAD boundaries and snapmapper repair is br...
- 12:45 AM Bug #23909 (Resolved): snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a...
- Included in https://github.com/ceph/ceph/pull/21546
05/03/2018
- 10:30 PM Bug #23980 (Pending Backport): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap...
- 01:45 PM Bug #23980 (Fix Under Review): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap...
- https://github.com/ceph/ceph/pull/21798
- 01:03 AM Bug #23980 (Resolved): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap const&)
- ...
- 08:56 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Are there messages "not scheduling scrubs due to active recovery" in the logs on any of the primary OSDs? That messa...
- 08:40 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Ran into something similar this past week. ( active+clean+inconsistent) where forced scrubs would not run. The foll...
- 07:27 PM Bug #24000 (Fix Under Review): mon: snap delete on deleted pool returns 0 without proper payload
- *PR*: https://github.com/ceph/ceph/pull/21804
- 07:21 PM Bug #24000 (Resolved): mon: snap delete on deleted pool returns 0 without proper payload
- It can lead to an abort in the client application since an empty reply w/o an error code is constructed in the monito...
- 03:44 PM Documentation #23999 (Resolved): osd_recovery_priority is not documented (but osd_recovery_op_pri...
- Please document osd_recovery_priority and how it differs from osd_recovery_op_priority.
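Until the documentation lands, a sketch of how both values can be looked up on a running OSD (osd.0 is just an example):
$ ceph daemon osd.0 config get osd_recovery_priority
$ ceph daemon osd.0 config get osd_recovery_op_priority
Roughly, the first sets the priority of recovery work as a whole and the second the priority of individual recovery ops relative to client ops, as I understand it; the precise wording is what this ticket asks to document.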
- 02:48 PM Bug #23961 (Duplicate): valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::re...
- 02:18 PM Backport #23998 (Resolved): luminous: osd/EC: slow/hung ops in multimds suite test
- https://github.com/ceph/ceph/pull/24393
- 02:08 PM Backport #23915 (Resolved): luminous: monitors crashing ./include/interval_set.h: 355: FAILED ass...
- 01:51 PM Backport #23915: luminous: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jew...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21717
merged
- 01:40 PM Bug #23769 (Pending Backport): osd/EC: slow/hung ops in multimds suite test
- 11:58 AM Feature #22420 (New): Add support for obtaining a list of available compression options
- I am reopening this ticket, as the plugin registry is empty before any of the supported compressor plugins is created ...
- 11:27 AM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
- I didn't import or export any pgs; that was a working osd in the cluster.
Is it possible that the restart of the osd ...
- 10:28 AM Backport #23988 (Resolved): luminous: luminous->master: luminous crashes with AllReplicasRecovere...
- https://github.com/ceph/ceph/pull/21964
- 10:27 AM Backport #23986 (Resolved): luminous: recursive lock of objecter session::lock on cancel
- https://github.com/ceph/ceph/pull/21939
- 05:21 AM Bug #22220: osd/ReplicatedPG.h:1667:14: internal compiler error: in force_type_die, at dwarf2out....
- https://access.redhat.com/errata/RHBA-2018:1293
- 01:37 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
- Jason Dillaman wrote:
> Moving to RADOS since it sounds like it's an issue of corruption on your cache tier.
How ...
- 01:00 AM Bug #22656: scrub mismatch on bytes (cache pools)
- /a/sage-2018-05-02_22:22:16-rados-wip-sage3-testing-2018-05-02-1448-distro-basic-smithi/2468046
description: rados...
- 12:20 AM Feature #23979 (Resolved): Limit pg log length during recovery/backfill so that we don't run out ...
This means if there's another failure, we'll need to restart backfill or go from recovery to backfill, but that's b...
05/02/2018
- 09:02 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
- Did you import or export any PGs? The on-disk pg info from comment #2 indicates the pg doesn't exist on osd.33 yet.
...
- 08:53 PM Bug #23961: valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::react(PG::AdvM...
- What PRs were in the test branch that hit this? Did any of them change the PG class or related structures?
- 12:23 PM Bug #23961: valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::react(PG::AdvM...
- rerunning this test with another branch did not reproduce this issue.
http://pulpito.ceph.com/kchai-2018-05-02_11:...
- 01:50 AM Bug #23961 (Duplicate): valgrind reports UninitCondition in osd PG::RecoveryState::Incomplete::re...
- ...
- 08:48 PM Bug #23830: rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
- The pg meta object is supposed to be empty since many versions ago. IIRC sage suggested this may be from a race that ...
- 08:42 PM Bug #23860 (Pending Backport): luminous->master: luminous crashes with AllReplicasRecovered in St...
- 08:40 PM Bug #23942 (Duplicate): test_mon_osdmap_prune.sh failures
- 07:50 PM Bug #23769 (Fix Under Review): osd/EC: slow/hung ops in multimds suite test
- https://github.com/ceph/ceph/pull/21684
- 05:26 PM Bug #23966 (Fix Under Review): Deleting a pool with active notify linger ops can result in seg fault
- *PR*: https://github.com/ceph/ceph/pull/21786
- 04:00 PM Bug #23966 (In Progress): Deleting a pool with active notify linger ops can result in seg fault
- 03:51 PM Bug #23966 (Resolved): Deleting a pool with active notify linger ops can result in seg fault
- It's possible that if a notification is sent while a pool is being deleted, the Objecter will fail the Op w/ -ENOENT ...
- 02:50 PM Bug #23965 (New): FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cach...
- teuthology run with debug-ms 1 at http://pulpito.ceph.com/joshd-2018-05-01_18:40:57-rgw-master-distro-basic-smithi/
- 01:42 PM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- http://pulpito.ceph.com/pdonnell-2018-05-01_20:58:18-multimds-wip-pdonnell-testing-20180501.191840-testing-basic-smit...
- 11:47 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
- Moving to RADOS since it sounds like it's an issue of corruption on your cache tier.
- 02:41 AM Bug #23119: MD5-checksum of the snapshot for rbd image in Ceph(as OpenStack-Glance backend Storag...
- More discovery:
The snapshot exported from cache tier(rep_glance pool) is an all-zero file (viewed by "od xxx.snap...
- 11:40 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- We frequently experience this with 12.2.3 running Ceph in a Kubernetes cluster, cf. https://github.com/ceph/ceph-cont...
- 11:32 AM Bug #23952: "ceph -f json osd pool ls detail" has missing pool namd and pool id
- Sorry, pool_name is here. Only pool id is missing.
- 10:11 AM Bug #23952: "ceph -f json osd pool ls detail" has missing pool namd and pool id
- Are you sure you're not getting pool name? I'm getting a pool_name field when I try this, and it appears to have bee...
- 11:04 AM Backport #23924 (In Progress): luminous: LibRadosAio.PoolQuotaPP failed
- https://github.com/ceph/ceph/pull/21778
- 06:53 AM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- Any update? The mentioned workaround is not a good idea for us.
- 06:42 AM Bug #23949 (Resolved): osd: "failed to encode map e19 with expected crc" in cluster log "
- 05:22 AM Bug #23962 (Fix Under Review): ceph_daemon.py format_dimless units list index out of range
- https://github.com/ceph/ceph/pull/21765
- 04:02 AM Bug #23962: ceph_daemon.py format_dimless units list index out of range
- sorry, the actual max magnitude is EB level instead of ZB.
- 03:48 AM Bug #23962 (Resolved): ceph_daemon.py format_dimless units list index out of range
- The largest order of magnitude in the original list only goes up to the PB level; however, the ceph cluster Objecter actv metri...
- 03:31 AM Backport #23914 (In Progress): luminous: cache-try-flush hits wrlock, busy loops
- https://github.com/ceph/ceph/pull/21764
05/01/2018
- 06:31 PM Bug #23827: osd sends op_reply out of order
- For object 10000000004.00000004 osd_op_reply for 102425 is received before 93353....
- 05:52 PM Bug #23949 (Fix Under Review): osd: "failed to encode map e19 with expected crc" in cluster log "
- https://github.com/ceph/ceph/pull/21756
- 03:53 PM Bug #23949: osd: "failed to encode map e19 with expected crc" in cluster log "
- /a/sage-2018-05-01_15:25:33-fs-master-distro-basic-smithi/2462491
reproduces on master
- 03:09 PM Bug #23949 (In Progress): osd: "failed to encode map e19 with expected crc" in cluster log "
- 03:09 PM Bug #23949: osd: "failed to encode map e19 with expected crc" in cluster log "
- ...
- 02:17 PM Bug #23949: osd: "failed to encode map e19 with expected crc" in cluster log "
- More from master: http://pulpito.ceph.com/pdonnell-2018-05-01_03:21:36-fs-master-testing-basic-smithi/
- 05:26 PM Bug #23940 (Pending Backport): recursive lock of objecter session::lock on cancel
- 02:39 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
- The problem of left-over OSD data still persists when the partition table has been removed before "ceph-disk zap" is ...
- 12:42 PM Backport #23905 (In Progress): jewel: Deleting a pool with active watch/notify linger ops can res...
- https://github.com/ceph/ceph/pull/21754
- 11:36 AM Backport #23904 (In Progress): luminous: Deleting a pool with active watch/notify linger ops can ...
- https://github.com/ceph/ceph/pull/21752
- 07:01 AM Bug #23952 (New): "ceph -f json osd pool ls detail" has missing pool namd and pool id
- `ceph osd pool ls detail` shows information about pool id and pool name, but with '-f json' this information disappears.
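A quick way to compare the two forms on any running cluster:
$ ceph osd pool ls detail                          # plain output: "pool <id> '<name>' replicated size ..."
$ ceph osd pool ls detail -f json-pretty | head    # JSON output, reportedly missing the pool id field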
04/30/2018
- 11:10 PM Bug #23949 (Resolved): osd: "failed to encode map e19 with expected crc" in cluster log "
- http://pulpito.ceph.com/pdonnell-2018-04-30_21:17:21-fs-wip-pdonnell-testing-20180430.193008-testing-basic-smithi/245...
- 05:46 PM Bug #23860: luminous->master: luminous crashes with AllReplicasRecovered in Started/Primary/Activ...
- 05:25 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025569.html
Paul Emmerich wrote:
> looks like it fai...
- 12:28 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- (Pulling backtrace into the ticket)
- 03:57 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
- This pg has a 0 value in same_interval_since. I checked this with the following output:
https://paste.fedoraproject.org/pa...
- 01:12 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
- I found a little more...
- 03:48 PM Bug #23942 (Duplicate): test_mon_osdmap_prune.sh failures
- ...
- 02:55 PM Bug #23922 (Resolved): ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- 01:44 PM Bug #23922 (Fix Under Review): ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- https://github.com/ceph/ceph/pull/21739
- 01:32 PM Bug #23922: ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- ...
- 01:06 PM Bug #23922: ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- failed to reproduce this issue locally.
adding...
- 11:00 AM Bug #23922: ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- http://pulpito.ceph.com/kchai-2018-04-30_00:59:17-rados-wip-kefu-testing-2018-04-29-1248-distro-basic-smithi/2454246/
- 02:53 PM Bug #23940 (Fix Under Review): recursive lock of objecter session::lock on cancel
- https://github.com/ceph/ceph/pull/21742
- 02:30 PM Bug #23940 (Resolved): recursive lock of objecter session::lock on cancel
- ...
- 12:30 PM Backport #23870 (In Progress): luminous: null map from OSDService::get_map in advance_pg
- https://github.com/ceph/ceph/pull/21737
04/29/2018
- 11:46 PM Bug #23937 (New): FAILED assert(info.history.same_interval_since != 0)
- Two of our osds hit this assert and now they are down....
- 10:23 AM Bug #22354 (Resolved): v12.2.2 unable to create bluestore osd using ceph-disk
- 10:23 AM Backport #23103 (Resolved): luminous: v12.2.2 unable to create bluestore osd using ceph-disk
- 10:22 AM Bug #22082 (Resolved): Various odd clog messages for mons
- 10:21 AM Backport #22167 (Resolved): luminous: Various odd clog messages for mons
- 10:21 AM Bug #22090 (Resolved): cluster [ERR] Unhandled exception from module 'balancer' while running on ...
- 10:20 AM Backport #22164 (Resolved): luminous: cluster [ERR] Unhandled exception from module 'balancer' wh...
- 10:20 AM Bug #21993 (Resolved): "ceph osd create" is not idempotent
- 10:20 AM Backport #22019 (Resolved): luminous: "ceph osd create" is not idempotent
- 10:19 AM Bug #21203 (Resolved): build_initial_pg_history doesn't update up/acting/etc
- 10:19 AM Backport #21236 (Resolved): luminous: build_initial_pg_history doesn't update up/acting/etc
- 07:07 AM Bug #21206 (Resolved): thrashosds read error injection doesn't take live_osds into account
- 07:07 AM Backport #21235 (Resolved): luminous: thrashosds read error injection doesn't take live_osds into...
- 06:22 AM Backport #23915 (In Progress): luminous: monitors crashing ./include/interval_set.h: 355: FAILED ...
- 05:44 AM Backport #22934: luminous: filestore journal replay does not guard omap operations
- https://github.com/ceph/ceph/pull/21547
04/28/2018
- 10:32 PM Backport #23915: luminous: monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jew...
- https://github.com/ceph/ceph/pull/21717
- 07:11 PM Backport #23926 (Rejected): luminous: disable bluestore cache caused a rocksdb error
- 07:11 PM Backport #23925 (Resolved): luminous: assert on pg upmap
- https://github.com/ceph/ceph/pull/21818
- 07:11 PM Backport #23924 (Resolved): luminous: LibRadosAio.PoolQuotaPP failed
- https://github.com/ceph/ceph/pull/21778
- 06:19 PM Bug #23816 (Pending Backport): disable bluestore cache caused a rocksdb error
- 06:17 PM Bug #23878 (Pending Backport): assert on pg upmap
- 06:17 PM Bug #23916 (Pending Backport): LibRadosAio.PoolQuotaPP failed
- 06:16 PM Bug #23922 (Resolved): ENOMEM from ceph_test_rados and ceph_test_cls_rbd (hammer client)
- ...
- 04:23 AM Bug #23921: pg-upmap cannot balance in some case
- But if I unlink all osds from 'root default / host huangjun', everything works ok....
- 04:04 AM Bug #23921 (Resolved): pg-upmap cannot balance in some case
- I have a cluster with 21 osds, cluster topology is...
04/27/2018
- 10:38 PM Bug #23916 (Fix Under Review): LibRadosAio.PoolQuotaPP failed
- https://github.com/ceph/ceph/pull/21709
- 09:22 PM Bug #23916 (Resolved): LibRadosAio.PoolQuotaPP failed
- http://qa-proxy.ceph.com/teuthology/yuriw-2018-04-27_16:52:05-rados-wip-yuri-testing-2018-04-27-1519-distro-basic-smi...
- 10:27 PM Bug #23917 (Duplicate): LibRadosAio.PoolQuotaPP failure
- 10:24 PM Bug #23917 (Duplicate): LibRadosAio.PoolQuotaPP failure
- ...
- 08:07 PM Backport #23915 (Resolved): luminous: monitors crashing ./include/interval_set.h: 355: FAILED ass...
- https://github.com/ceph/ceph/pull/21717
- 08:06 PM Backport #23914 (Resolved): luminous: cache-try-flush hits wrlock, busy loops
- https://github.com/ceph/ceph/pull/21764
- 08:01 PM Bug #23860 (Fix Under Review): luminous->master: luminous crashes with AllReplicasRecovered in St...
- https://github.com/ceph/ceph/pull/21706
- 07:30 PM Bug #18746 (Pending Backport): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) ...
- 07:28 PM Bug #23664 (Pending Backport): cache-try-flush hits wrlock, busy loops
- 07:28 PM Bug #21165 (Can't reproduce): 2 pgs stuck in unknown during thrashing
- 07:27 PM Bug #23788 (Duplicate): luminous->mimic: EIO (crc mismatch) on copy-get from ec pool
- I think this was a dup of #23871
- 07:24 PM Backport #23912 (Resolved): luminous: mon: High MON cpu usage when cluster is changing
- https://github.com/ceph/ceph/pull/21968
- 07:17 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
- The zap run in this is definitely not zero'ing the first block based on log output...
- 06:49 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
- We clean more than 100m, but I think it's from the end
https://github.com/ceph/ceph/blob/luminous/src/ceph-disk/ceph...
- 06:25 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
- Thanks alfredo
It shows that zap is not working now; I think we should fix the ceph-disk zap to properly clean the...
- 06:07 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
- Looking at the logs for the OSD that failed:...
- 05:48 PM Bug #23911: ceph:luminous: osd out/down when setup with ubuntu/bluestore
- Seen on 14.04, 16.04, and CentOS, for the bluestore option only
14.04:
http://qa-proxy.ceph.com/teuthology/teuth...
- 05:45 PM Bug #23911 (Won't Fix - EOL): ceph:luminous: osd out/down when setup with ubuntu/bluestore
- this could be a systemd issue or more,
a) setup cluster using ceph-deploy
b) use ceph-disk/bluestore option for ...
- 05:26 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
- Moving this back to RADOS as it seems the new consensus is that it's a RADOS bug.
- 06:46 AM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
- From message: "error (2) No such file or directory not handled on operation 0x55e1ce80443c (21888.1.0, or op 0, count...
- 04:38 PM Bug #23893 (Resolved): jewel clients fail to decode mimic osdmap
- it was a bug in wip-osdmap-encode, fixed before merge
- 04:14 PM Bug #23713 (Pending Backport): High MON cpu usage when cluster is changing
- 03:01 PM Bug #23909 (Resolved): snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a...
New code for tracker #22881 in pull request https://github.com/ceph/ceph/pull/21546 no calls _scan_snaps() on each ...
- 01:23 PM Bug #23627 (Fix Under Review): Error EACCES: problem getting command descriptions from mgr.None f...
- https://github.com/ceph/ceph/pull/21698
- 01:16 PM Bug #23627: Error EACCES: problem getting command descriptions from mgr.None from 'ceph tell mgr'
- ...
- 12:22 PM Bug #23627: Error EACCES: problem getting command descriptions from mgr.None from 'ceph tell mgr'
- /a//kchai-2018-04-27_07:23:02-rados-wip-kefu-testing-2018-04-27-0902-distro-basic-smithi/2444194
- 10:43 AM Backport #23905 (Resolved): jewel: Deleting a pool with active watch/notify linger ops can result...
- https://github.com/ceph/ceph/pull/21754
- 10:42 AM Backport #23904 (Resolved): luminous: Deleting a pool with active watch/notify linger ops can res...
- https://github.com/ceph/ceph/pull/21752
- 10:39 AM Backport #23850 (New): luminous: Read operations segfaulting multiple OSDs
- Status can change to "In Progress" when the PR is open and URL of PR is mentioned in a comment.
- 06:29 AM Backport #23850 (In Progress): luminous: Read operations segfaulting multiple OSDs
- 10:17 AM Bug #23899: run cmd 'ceph daemon osd.0 smart' cause osd daemon Segmentation fault
- The root cause is that output.read_fd() can sometimes return 0-length data.
ret = output.read_fd(smartctl.get_stdout(), 1...
- 10:15 AM Bug #23899 (Resolved): run cmd 'ceph daemon osd.0 smart' cause osd daemon Segmentation fault
2018-04-27 09:44:51.572 7fb787a05700 -1 osd.0 57 smartctl output is:
2018-04-27 09:44:51.576 7fb787a05700 -1 *** C...- 09:00 AM Bug #23879: test_mon_osdmap_prune.sh fails
- ...
- 01:34 AM Bug #23878: assert on pg upmap
- This PR (#21670) passed the tests that failed before in my local cluster; needs qa
- 12:55 AM Bug #23872 (Pending Backport): Deleting a pool with active watch/notify linger ops can result in ...
04/26/2018
- 11:27 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
Seen again:
http://qa-proxy.ceph.com/teuthology/dzafman-2018-04-26_10:04:07-rados-wip-zafman-testing-distro-basi...
- 10:33 PM Bug #23893 (Resolved): jewel clients fail to decode mimic osdmap
- http://pulpito.ceph.com/sage-2018-04-26_19:17:57-rados:thrash-old-clients-wip-sage-testing-2018-04-26-1251-distro-bas...
- 10:22 PM Bug #23871 (Resolved): luminous->mimic: missing primary copy of xxx, wil try copies on 3, then fu...
- 10:20 PM Bug #23892 (Can't reproduce): luminous->mimic: mon segv in ~MonOpRequest from OpHistoryServiceThread
- ...
- 05:06 PM Bug #23785 (Resolved): "test_prometheus (tasks.mgr.test_module_selftest.TestModuleSelftest) ... E...
- test is passing now
- 02:23 PM Bug #23769 (In Progress): osd/EC: slow/hung ops in multimds suite test
- 01:55 PM Bug #23878 (Fix Under Review): assert on pg upmap
- https://github.com/ceph/ceph/pull/21670
- 01:55 PM Bug #23878: assert on pg upmap
- 09:52 AM Bug #23878: assert on pg upmap
- I’ll prepare a patch soon
- 06:44 AM Bug #23878: assert on pg upmap
- And then if I do a pg-upmap operation....
- 05:35 AM Bug #23878: assert on pg upmap
- After picking the pr https://github.com/ceph/ceph/pull/21325
It works fine.
But I have some questions:
the upmap items...
- 04:31 AM Bug #23878 (Resolved): assert on pg upmap
- I use the following script to test upmap...
- 10:09 AM Backport #23863 (In Progress): luminous: scrub interaction with HEAD boundaries and clones is broken
- 09:16 AM Backport #23863: luminous: scrub interaction with HEAD boundaries and clones is broken
- https://github.com/ceph/ceph/pull/21665
- 07:46 AM Bug #23879 (Can't reproduce): test_mon_osdmap_prune.sh fails
- ...
- 02:46 AM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- /kchai-2018-04-26_00:52:32-rados-wip-kefu-testing-2018-04-25-2253-distro-basic-smithi/2439501/
- 12:02 AM Bug #20924: osd: leaked Session on osd.7
- osd.3 here:
http://pulpito.ceph.com/yuriw-2018-04-23_23:19:23-rados-wip-yuri-testing-2018-04-23-1502-distro-basic-...
04/25/2018
- 10:10 PM Bug #23875 (Resolved): Removal of snapshot with corrupt replica crashes osd
This may be a completely legitimate crash due to the corruption.
See pending test case TEST_scrub_snaps_replica ...
- 09:46 PM Bug #23816 (Fix Under Review): disable bluestore cache caused a rocksdb error
- 09:29 PM Bug #23204 (Duplicate): missing primary copy of object in mixed luminous<->master cluster with bl...
- 09:28 PM Bug #21992 (Duplicate): osd: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 09:14 PM Backport #23786 (Fix Under Review): luminous: "utilities/env_librados.cc:175:33: error: unused pa...
- https://github.com/ceph/ceph/pull/21655
- 09:09 PM Bug #23827: osd sends op_reply out of order
- 06:26 AM Bug #23827: osd sends op_reply out of order
- same bug #20742
- 03:46 AM Bug #23827: osd sends op_reply out of order
- Ignore my statement. Dispatch does put_back, so no race.
- 03:24 AM Bug #23827: osd sends op_reply out of order
- For this case: slot->to_process is null, Op1 does enqueue_front, and at the same time Op2 dispatches. Because two threads...
- 09:09 PM Bug #23664 (Fix Under Review): cache-try-flush hits wrlock, busy loops
- 08:35 PM Bug #23664: cache-try-flush hits wrlock, busy loops
- reproducing this semi-frequently, see #23847
This should fix it: https://github.com/ceph/ceph/pull/21653
- 09:07 PM Bug #23871 (In Progress): luminous->mimic: missing primary copy of xxx, wil try copies on 3, then...
- 04:25 PM Bug #23871 (Resolved): luminous->mimic: missing primary copy of xxx, wil try copies on 3, then fu...
- ...
- 08:34 PM Bug #23847 (Duplicate): osd stuck recovery
- 05:52 PM Bug #23847: osd stuck recovery
- Recovery is starved by #23664, a cache tiering infinite loop.
- 05:36 PM Bug #23847: osd stuck recovery
- recovery on 3.3 stalls out here...
- 05:26 PM Bug #23872: Deleting a pool with active watch/notify linger ops can result in seg fault
- Original test failure where this issue was discovered: http://pulpito.ceph.com/trociny-2018-04-24_08:17:18-rbd-wip-mg...
- 05:24 PM Bug #23872 (Fix Under Review): Deleting a pool with active watch/notify linger ops can result in ...
- *PR*: https://github.com/ceph/ceph/pull/21649
- 05:17 PM Bug #23872 (Resolved): Deleting a pool with active watch/notify linger ops can result in seg fault
- ...
- 04:24 PM Backport #23870 (Resolved): luminous: null map from OSDService::get_map in advance_pg
- https://github.com/ceph/ceph/pull/21737
- 04:23 PM Backport #23863 (Resolved): luminous: scrub interaction with HEAD boundaries and clones is broken
- https://github.com/ceph/ceph/pull/22044
- 04:00 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
- master commit says:
Consider a scenario like:
- scrub [3:2525d100:::earlier:head,3:2525d12f:::foo:200]
- we see...
- 03:58 PM Bug #23646 (Pending Backport): scrub interaction with HEAD boundaries and clones is broken
- 03:48 PM Bug #21977 (Pending Backport): null map from OSDService::get_map in advance_pg
- 03:45 PM Bug #23857: flush (manifest) vs async recovery causes out of order op
- maybe: /a/sage-2018-04-25_02:28:01-rados-wip-sage3-testing-2018-04-24-1729-distro-basic-smithi/2436808
rados/thras...
- 03:37 PM Bug #23857: flush (manifest) vs async recovery causes out of order op
- maybe: /a/sage-2018-04-25_02:28:01-rados-wip-sage3-testing-2018-04-24-1729-distro-basic-smithi/2436663
rados/thras...
- 01:53 PM Bug #23857: flush (manifest) vs async recovery causes out of order op
- The core problem is that the requeue logic assumes that objects always go from degraded to not degraded.. never the o...
- 01:49 PM Bug #23857 (Can't reproduce): flush (manifest) vs async recovery causes out of order op
- ...
- 03:44 PM Bug #23860 (Resolved): luminous->master: luminous crashes with AllReplicasRecovered in Started/Pr...
- ...
- 11:10 AM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- We hit this today on the Jewel release (10.2.7); all OSDs connected to one of the monitors in the quorum are having this issue...
- 08:23 AM Backport #23852 (In Progress): luminous: OSD crashes on empty snapset
- 08:18 AM Backport #23852 (Resolved): luminous: OSD crashes on empty snapset
- https://github.com/ceph/ceph/pull/21638
- 08:18 AM Bug #23851 (Resolved): OSD crashes on empty snapset
- Fix merged to master: https://github.com/ceph/ceph/pull/21058
- 04:49 AM Backport #23850 (Resolved): luminous: Read operations segfaulting multiple OSDs
- https://github.com/ceph/ceph/pull/21911
04/24/2018
- 10:33 PM Bug #21931 (In Progress): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (ra...
- This is a bug in trimtrunc handling with EC pools.
- 10:27 PM Bug #23195 (Pending Backport): Read operations segfaulting multiple OSDs
- https://github.com/ceph/ceph/pull/21273
- 10:20 PM Bug #23195 (Resolved): Read operations segfaulting multiple OSDs
- 10:25 PM Bug #23847 (Duplicate): osd stuck recovery
- ...
- 10:24 PM Bug #23827 (In Progress): osd sends op_reply out of order
- 09:26 AM Bug #23827: osd sends op_reply out of order
- ...
- 08:36 PM Bug #23646 (Fix Under Review): scrub interaction with HEAD boundaries and clones is broken
- https://github.com/ceph/ceph/pull/21628
I think this will fix it?
- 12:42 AM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
The commit below adds code to honor the no_whiteout flag even when it looks like clones exist or will exist soon. ...
- 06:03 PM Bug #21977: null map from OSDService::get_map in advance_pg
- https://github.com/ceph/ceph/pull/21623
- 06:02 PM Bug #21977 (Fix Under Review): null map from OSDService::get_map in advance_pg
- advance_pg ran before init() published the initial map to OSDService.
- 05:33 PM Bug #23716 (Resolved): osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (o...
- This seems to be resolved. My guess is it's fallout from https://github.com/ceph/ceph/pull/21604
- 03:45 PM Bug #23763 (Pending Backport): upgrade: bad pg num and stale health status in mixed lumnious/mimi...
04/23/2018
- 09:40 PM Bug #23646 (In Progress): scrub interaction with HEAD boundaries and clones is broken
The osd log for primary osd.1 shows that pg 3.0 is a cache pool in a cache tiering configuration. The message "_de...
- 09:12 PM Bug #23830 (Can't reproduce): rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
- ...
- 07:45 PM Bug #23828 (Can't reproduce): ec gen object leaks into different filestore collection just after ...
- ...
- 05:11 PM Bug #23827 (Resolved): osd sends op_reply out of order
- ...
- 03:13 AM Bug #23713 (Fix Under Review): High MON cpu usage when cluster is changing
- https://github.com/ceph/ceph/pull/21532
04/22/2018
- 08:07 PM Bug #21977: null map from OSDService::get_map in advance_pg
- From the latest logs, the peering thread id does not appear at all in the log until the crash.
I'm wondering if we... - 08:05 PM Bug #21977: null map from OSDService::get_map in advance_pg
- Seen again here:
http://pulpito.ceph.com/yuriw-2018-04-20_20:02:29-upgrade:jewel-x-luminous-distro-basic-ovh/2420862/
04/21/2018
- 04:06 PM Bug #23793: ceph-osd consumed 10+GB rss memory
- Setting osd_debug_op_order to false can fix this problem.
My ceph cluster is created through vstart.sh, which sets osd_deb...
- 03:57 PM Bug #23816: disable bluestore cache caused a rocksdb error
- https://github.com/ceph/ceph/pull/21583
- 03:53 PM Bug #23816 (Resolved): disable bluestore cache caused a rocksdb error
- I disabled the bluestore/rocksdb cache to estimate ceph-osd's memory consumption
by setting bluestore_cache_size_ssd/bluesto...
- 06:55 AM Bug #23145: OSD crashes during recovery of EC pg
- `2018-03-09 08:29:09.170227 7f901e6b30 10 merge_log log((17348'18587,17348'18587], crt=17348'18585) from osd.6(2) int...
04/20/2018
- 09:09 PM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- You don't really have authentication without the message signing. Since we don't do full encryption, signing is the o...
- 03:07 PM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- How costly is just the authentication piece, i.e. keep cephx but turn off message signing?
- 07:21 AM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- Summary of the discussion:
`check_message_signature` in `AsyncConnection::process` is already protected by `...
- 06:38 AM Feature #23552: cache PK11Context in Connection and probably other consumers of CryptoKeyHandler
- per Radoslaw Zarzynski
> the overhead between `CreateContextBySym` and `DigestBegin` is small
and probably we c...
- 08:53 PM Bug #23517 (Resolved): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- 02:27 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- https://github.com/ceph/ceph/pull/21571
- 08:47 PM Bug #23811: RADOS stat slow for some objects on same OSD
- We are still debugging this. On a further look, it looks like all objects on that PG (aka _79.1f9_) show similar slow...
- 05:30 PM Bug #23811 (New): RADOS stat slow for some objects on same OSD
- We have observed that queries have been slow for some RADOS objects while others on the same OSD respond much quickly...
- 05:19 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
- I guess the intention is that scrubbing takes priority and proceeds even if trimming is in progress. Before more tri...
- 04:45 PM Bug #23646: scrub interaction with HEAD boundaries and clones is broken
We don't start trimming if scrubbing is happening, so maybe the only hole is that scrubbing doesn't check for trimm...
- 04:38 PM Bug #23810: ceph mon dump outputs verbose text to stderr
- As a simple verification, running:...
- 04:26 PM Bug #23810 (New): ceph mon dump outputs verbose text to stderr
- When executing...
- 02:41 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- My opinion is that this is different from a problem where the inconsistent flag reappears after repairing a PG becaus...
- 12:55 PM Backport #23808 (In Progress): luminous: upgrade: bad pg num and stale health status in mixed lum...
- https://github.com/ceph/ceph/pull/21556
- 12:55 PM Backport #23808 (Resolved): luminous: upgrade: bad pg num and stale health status in mixed lumnio...
- https://github.com/ceph/ceph/pull/21556
- 11:11 AM Bug #23763 (Fix Under Review): upgrade: bad pg num and stale health status in mixed lumnious/mimi...
- https://github.com/ceph/ceph/pull/21555
- 10:09 AM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- i think the pg_num = 11 is set by LibRadosList.EnumerateObjects...
- 12:32 AM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- Yuri reproduced the bad pg_num in 1 of 2 runs:...
- 12:48 AM Bug #22881 (In Progress): scrub interaction with HEAD boundaries and snapmapper repair is broken
04/19/2018
- 02:18 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- 12:34 PM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- https://github.com/ceph/ceph/pull/21280
- 07:42 AM Bug #23517 (Fix Under Review): TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- ...
- 09:33 AM Backport #23784 (In Progress): luminous: osd: Warn about objects with too many omap entries
- 09:33 AM Backport #23784: luminous: osd: Warn about objects with too many omap entries
- Description:
As discussed in this PR - https://github.com/ceph/ceph/pull/16332
- 07:29 AM Bug #23793: ceph-osd consumed 10+GB rss memory
- the "mon max pg per osd" is 1024 in my test.
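For reference, a sketch of checking the two settings discussed in this ticket (daemon names are examples; see also the 04/21 comment above about osd_debug_op_order):
$ ceph daemon mon.a config get mon_max_pg_per_osd     # the 1024 limit mentioned here
$ ceph daemon osd.0 config get osd_debug_op_order     # the debug option later blamed for the memory growth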
- 07:14 AM Bug #23793 (New): ceph-osd consumed 10+GB rss memory
- After 26GB of data is written, ceph-osd's memory (rss) reached 10+GB.
The objectstore backend is *KStore*. master branc...
- 06:42 AM Backport #22934 (In Progress): luminous: filestore journal replay does not guard omap operations
04/18/2018
- 09:10 PM Bug #23620 (Resolved): tasks.mgr.test_failover.TestFailover failure
- 08:25 PM Bug #23788 (Duplicate): luminous->mimic: EIO (crc mismatch) on copy-get from ec pool
- ...
- 08:01 PM Bug #23787 (Rejected): luminous: "osd-scrub-repair.sh'" failures in rados
- This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
- 07:57 PM Backport #23786 (Resolved): luminous: "utilities/env_librados.cc:175:33: error: unused parameter ...
- This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
- 07:52 PM Bug #23785 (Resolved): "test_prometheus (tasks.mgr.test_module_selftest.TestModuleSelftest) ... E...
- This is v12.2.5 QE validation
Run: http://pulpito.ceph.com/yuriw-2018-04-17_20:36:16-rados-luminous-distro-basic-s...
- 06:45 PM Backport #23784 (Resolved): luminous: osd: Warn about objects with too many omap entries
- https://github.com/ceph/ceph/pull/21518
- 03:34 PM Bug #23763: upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- The pgs with creating or unknown status in "pg dump" were active+clean after 2018-04-16 22:47, so the output of the last "pg...
- 01:29 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Any update on this?
- 12:14 PM Bug #23701 (Fix Under Review): oi(object_info_t) size mismatch with real object size during old w...
- 12:12 PM Bug #23701: oi(object_info_t) size mismatch with real object size during old write arrives after ...
- i think this issue only exists in jewel.
- 02:47 AM Documentation #23777: doc: description of OSD_OUT_OF_ORDER_FULL problem
- The default values:
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
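These thresholds are expected to be ascending (nearfull < backfillfull < full); OSD_OUT_OF_ORDER_FULL fires when they are not. A sketch of checking and restoring the defaults on luminous:
$ ceph osd dump | grep ratio             # shows full_ratio, backfillfull_ratio, nearfull_ratio
$ ceph osd set-nearfull-ratio 0.85
$ ceph osd set-backfillfull-ratio 0.90
$ ceph osd set-full-ratio 0.95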
- 02:39 AM Documentation #23777 (Resolved): doc: description of OSD_OUT_OF_ORDER_FULL problem
- The description of OSD_OUT_OF_ORDER_FULL is...
- 12:30 AM Feature #23364 (Resolved): Special scrub handling of hinfo_key errors
- 12:30 AM Backport #23654 (Resolved): luminous: Special scrub handling of hinfo_key errors
04/17/2018
- 07:00 PM Backport #23772 (Resolved): luminous: ceph status shows wrong number of objects
- https://github.com/ceph/ceph/pull/22680
- 06:36 PM Bug #23769 (Resolved): osd/EC: slow/hung ops in multimds suite test
- ...
- 03:40 PM Feature #23364: Special scrub handling of hinfo_key errors
- This pull request is another follow on:
https://github.com/ceph/ceph/pull/21450
- 11:41 AM Bug #20924: osd: leaked Session on osd.7
- /a/sage-2018-04-17_04:17:03-rados-wip-sage3-testing-2018-04-16-2028-distro-basic-smithi/2404155
this time on osd.4...
- 07:37 AM Bug #23767: "ceph ping mon" doesn't work
- So "ceph ping mon.<id>" will remind you that mon.<id> doesn't exist. However, if you run "ceph ping mon.a", you can get ...
- 07:33 AM Bug #23767 (New): "ceph ping mon" doesn't work
- If there is only mon_host = ip1, ip2... in the ceph.conf, then "ceph ping mon.<id>" doesn't work.
Root cause is in the...
- 06:14 AM Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
- Sorry for the late reply, but it's hard to reproduce. We reproduced it once with...
- 02:09 AM Documentation #23765 (New): librbd hangs if permissions are incorrect
- I've been building rust bindings for librbd against ceph jewel and luminous. I found out by accident that if a cephx...
- 12:14 AM Bug #23763 (Resolved): upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- This happened in a luminous-x/point-to-point run. Logs in teuthology:/home/yuriw/logs/2387999/
Versions at this po...
04/16/2018
- 05:52 PM Bug #23760 (New): mon: `config get <who>` does not allow `who` as 'mon'/'osd'
- `config set mon` is allowed, but `config get mon` is not.
This is due to <who> on `get` being parsed as an EntityN...
- 04:39 PM Bug #23753: "Error ENXIO: problem getting command descriptions from osd.4" in upgrade:kraken-x-lu...
- This generally means the OSD isn't on?
04/15/2018
- 10:22 PM Bug #23753 (Can't reproduce): "Error ENXIO: problem getting command descriptions from osd.4" in u...
- Run: http://pulpito.ceph.com/teuthology-2018-04-15_03:25:02-upgrade:kraken-x-luminous-distro-basic-smithi/
Jobs: '23...
- 05:44 PM Bug #23471 (Resolved): add --add-bucket and --move options to crushtool
- 02:51 PM Bug #22095 (Pending Backport): ceph status shows wrong number of objects
- 08:52 AM Bug #19348: "ceph ping mon.c" cli prints assertion failure on timeout
- https://github.com/ceph/ceph/pull/21432
04/14/2018
- 06:11 AM Support #23719: Three nodes shutdown,only boot two of the nodes,many pgs down.(host failure-domai...
- fix description: If pg1.1 has acting set [1,2,3], I power down osd.3 first, then power down osd.2. In case I boot osd...
- 05:50 AM Support #23719 (New): Three nodes shutdown,only boot two of the nodes,many pgs down.(host failure...
- The interval mechanism of PG will cause a problem in the process of a cluster restart. If I have 3 nodes (host failure-do...
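A hedged sketch of how such a down pg is usually diagnosed (the pgid is hypothetical):
$ ceph pg 1.1 query | grep -A4 'down_osds_we_would_probe\|blocked'   # shows which down OSD peering is waiting for
$ ceph osd tree | grep down                                          # which hosts/OSDs are still missing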
04/13/2018
- 10:40 PM Bug #23716: osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (on upgrade f...
- ...
- 10:21 PM Bug #23716 (Resolved): osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (o...
- ...
- 07:33 PM Bug #21931: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range.first.get_...
- Live multimds run: /ceph/teuthology-archive/pdonnell-2018-04-13_04:19:33-multimds-wip-pdonnell-testing-20180413.02283...
- 07:30 PM Bug #21992: osd: src/common/interval_map.h: 161: FAILED assert(len > 0)
- /ceph/teuthology-archive/pdonnell-2018-04-13_04:19:33-multimds-wip-pdonnell-testing-20180413.022831-testing-basic-smi...
- 06:28 PM Bug #23713: High MON cpu usage when cluster is changing
- My guess is that this is the compat reencoding of the OSDMap for the pre-luminous clients.
Are you by chance makin...
- 06:10 PM Bug #23713 (Resolved): High MON cpu usage when cluster is changing
- After upgrading to Luminous 12.2.4 (from Jewel 10.2.5), we consistently see high cpu usage when OSDMap changes, esp...
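One quick check of that theory, since the compat reencode is only needed for pre-luminous clients (any mon will answer; a sketch):
$ ceph features    # summarizes connected clients/daemons by feature release; look for entries still reporting 'jewel'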
- 03:03 PM Bug #23228 (Closed): scrub mismatch on objects
- The failure in comment (2) looks unrelated, but it was a test branch. Let's see if it happens again.
The original ... - 01:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Is there any testing, logs, etc. that would be helpful for tracking down the cause of this problem? I had a fairly bad...
- 08:20 AM Bug #23701: oi(object_info_t) size mismatch with real object size during old write arrives after ...
- here is my pull request to fix this problem
https://github.com/ceph/ceph/pull/21408
- 08:08 AM Bug #23701 (Fix Under Review): oi(object_info_t) size mismatch with real object size during old w...
- Currently, in our test environment (jewel: cephfs + cache tier + ec pool), we found several osd coredumps
in the fol...
- 01:52 AM Backport #23654 (In Progress): luminous: Special scrub handling of hinfo_key errors
- https://github.com/ceph/ceph/pull/21397
04/12/2018
- 11:08 PM Feature #23364: Special scrub handling of hinfo_key errors
- Follow on pull request included in backport to this tracker
https://github.com/ceph/ceph/pull/21362
- 09:49 PM Bug #23610 (Resolved): pg stuck in activating because of dropped pg-temp message
- 09:48 PM Backport #23630 (Resolved): luminous: pg stuck in activating
- 09:28 PM Backport #23630: luminous: pg stuck in activating
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21330
merged
- 05:35 PM Bug #23228: scrub mismatch on objects
- My change only affects the scrub error counts in the stats. However, if setting dirty_info in proc_primary_info() wo...
- 04:27 PM Bug #23228: scrub mismatch on objects
- The original report was an EC test, so it looks like a dup of #23339.
David, your failures are not EC. Could they...
- 04:43 PM Bug #20439 (Can't reproduce): PG never finishes getting created
- 04:29 PM Bug #22656: scrub mismatch on bytes (cache pools)
- Just bytes
dzafman-2018-03-28_18:21:29-rados-wip-zafman-testing-distro-basic-smithi/2332093
[ERR] 3.0 scrub sta...
- 02:29 PM Bug #20874: osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end() || (miter->second...
- /a/sage-2018-04-11_22:26:40-rados-wip-sage-testing-2018-04-11-1604-distro-basic-smithi/2387226
- 02:25 PM Backport #23668 (In Progress): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrit...
- https://github.com/ceph/ceph/pull/21378
- 01:34 AM Backport #23668 (Resolved): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrites'...
- https://github.com/ceph/ceph/pull/21378
- 07:19 AM Backport #23675 (In Progress): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- 07:07 AM Backport #23675 (Resolved): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- https://github.com/ceph/ceph/pull/21368
- 03:27 AM Bug #23662 (Resolved): osd: regression causes SLOW_OPS warnings in multimds suite
- 02:59 AM Bug #23517: TestMockDeepCopyRequest.SimpleCopy fails in run-rbd-unit-tests.sh
- not able to move this to CI somehow... moving it to RADOS.
- 02:54 AM Backport #23315 (Resolved): luminous: pool create cmd's expected_num_objects is not correctly int...
- 02:41 AM Bug #23622 (Pending Backport): qa/workunits/mon/test_mon_config_key.py fails on master
- 02:01 AM Bug #23564: OSD Segfaults
- Correct, Bluestore and Luminous 12.2.4
- 01:57 AM Backport #23673 (In Progress): jewel: auth: ceph auth add does not sanity-check caps
- 01:43 AM Backport #23673 (Resolved): jewel: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/21367
- 01:53 AM Bug #23578 (Resolved): large-omap-object-warnings test fails
- 01:52 AM Backport #23633 (Rejected): luminous: large-omap-object-warnings test fails
- We can close this if that test isn't present in luminous.
- 01:35 AM Backport #23633 (Need More Info): luminous: large-omap-object-warnings test fails
- Brad,
Backporting PR#21295 to luminous is unrelated unless we get qa/suites/rados/singleton-nomsgr/all/large-omap-ob...
- 01:41 AM Backport #23670 (In Progress): luminous: auth: ceph auth add does not sanity-check caps
- 01:34 AM Backport #23670 (Resolved): luminous: auth: ceph auth add does not sanity-check caps
- https://github.com/ceph/ceph/pull/24906
- 01:34 AM Backport #23654 (New): luminous: Special scrub handling of hinfo_key errors
- 01:33 AM Bug #22525 (Pending Backport): auth: ceph auth add does not sanity-check caps
04/11/2018
- 11:22 PM Bug #23662 (Fix Under Review): osd: regression causes SLOW_OPS warnings in multimds suite
- https://github.com/ceph/teuthology/pull/1166
- 09:38 PM Bug #23662: osd: regression causes SLOW_OPS warnings in multimds suite
- Looks like the obvious cause: https://github.com/ceph/ceph/pull/20660
- 07:56 PM Bug #23662 (Resolved): osd: regression causes SLOW_OPS warnings in multimds suite
- See: [1], first instance of the problem at [0].
The last run which did not cause most multimds jobs to fail with S...
- 11:20 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
Any scrub that completes without errors will set num_scrub_errors in pg stats to 0. That will cause the inconsiste...
- 10:51 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- David, is there any way a missing object wouldn't be reported in list-inconsistent output?
- 11:01 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- Let's see if this happens again now that sage's fast peering branch is merged.
- 10:58 PM Backport #23485 (Resolved): luminous: scrub errors not cleared on replicas can cause inconsistent...
- 10:58 PM Bug #23585: osd: safe_timer segfault
- Possibly the same as http://tracker.ceph.com/issues/23431
- 02:10 PM Bug #23585: osd: safe_timer segfault
- Got a segfault in safe_timer too. Got it just once, so I cannot provide more info at the moment.
2018-04-03 05:53:07...
- 10:57 PM Bug #23564: OSD Segfaults
- Is this on bluestore? there are a few reports of this occurring on bluestore including your other bug http://tracker....
- 10:44 PM Bug #23590: kstore: statfs: (95) Operation not supported
- 10:42 PM Bug #23601 (Resolved): Cannot see the value of parameter osd_op_threads via ceph --show-config
- 10:37 PM Bug #23614: local_reserver double-reservation of backfilled pg
- This may be the same root cause as http://tracker.ceph.com/issues/23490
- 10:36 PM Bug #23565: Inactive PGs don't seem to cause HEALTH_ERR
- Brad, can you take a look at this? I think it can be handled by the stuck pg code, that iirc already warns about pgs ...
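A hedged aside, not the fix itself: the stuck-PG check mentioned above flags PGs that have been inactive too long, and the same set can be listed from the CLI with `ceph pg dump_stuck inactive`. A minimal sketch, assuming CLI access; the 60-second threshold is just a placeholder.
```python
# Hedged sketch, not part of the proposed fix: list PGs the cluster considers
# "stuck inactive" for at least `threshold_sec` seconds.
import subprocess

def dump_stuck_inactive(threshold_sec=60):
    return subprocess.check_output(
        ["ceph", "pg", "dump_stuck", "inactive", str(threshold_sec)], text=True)

print(dump_stuck_inactive())
```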
- 10:25 PM Bug #23664 (Resolved): cache-try-flush hits wrlock, busy loops
- ...
- 10:12 PM Bug #23403 (Closed): Mon cannot join quorum
- Thanks for letting us know.
- 01:15 PM Bug #23403: Mon cannot join quorum
- After more investigation we discovered that one of the bonds on the machine was not behaving properly. We removed the...
- 11:28 AM Bug #23403: Mon cannot join quorum
- Thanks for the investigation Brad.
The "fault, initiating reconnect" and "RESETSESSION" messages only appear when ...
- 07:57 PM Bug #23595: osd: recovery/backfill is extremely slow
- @Greg Farnum: Ah, great that part is already handled!
What about my other questions though, like
> I think it i...
- 06:45 PM Bug #23595 (Duplicate): osd: recovery/backfill is extremely slow
- https://tracker.ceph.com/issues/23141
Sorry you ran into this, it's a bug in BlueStore/BlueFS. The fix will be in ...
- 07:49 PM Backport #23315: luminous: pool create cmd's expected_num_objects is not correctly interpreted
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/20907
merged
- 05:45 PM Feature #23660 (New): when scrub errors are due to disk read errors, ceph status can say "likely ...
- If some of the scrub errors are due to disk read errors, we can also say in the status output "likely disk errors" an...
- 03:49 PM Bug #23487 (Pending Backport): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- 03:39 PM Backport #23654 (Resolved): luminous: Special scrub handling of hinfo_key errors
- https://github.com/ceph/ceph/pull/21397
- 03:09 PM Bug #23621 (Resolved): qa/standalone/mon/misc.sh fails on master
- 01:40 PM Backport #23316: jewel: pool create cmd's expected_num_objects is not correctly interpreted
- -https://github.com/ceph/ceph/pull/21042-
but test/mon/osd-pool-create.sh is failing; looking into it.
- 05:00 AM Bug #23610 (Pending Backport): pg stuck in activating because of dropped pg-temp message
- 04:56 AM Bug #23648 (New): max-pg-per-osd.from-primary fails because of activating pg
- The reason why we have an activating pg when the number of pgs is under the hard limit of max-pg-per-osd is that:
1. o...
- 03:01 AM Bug #23647 (In Progress): thrash-eio test can prevent recovery
- We are injecting random EIOs. However, in a recovery situation an EIO leads us to decide the object is missing in on...
04/10/2018
- 11:38 PM Feature #23364 (Pending Backport): Special scrub handling of hinfo_key errors
- 09:13 PM Bug #23428: Snapset inconsistency is hard to diagnose because authoritative copy used by list-inc...
- In the pull request https://github.com/ceph/ceph/pull/20947 there is a change to partially address this issue. Unfor...
- 09:08 PM Bug #23646 (Resolved): scrub interaction with HEAD boundaries and clones is broken
- Scrub will work in chunks, accumulating work in cleaned_meta_map. A single object's clones may stretch across two su...
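To make the failure mode above concrete, here is a toy sketch in plain Python (not Ceph's scrub code; all names are invented): when scrub work is split into chunks, ending each chunk only on a whole-object boundary keeps a head and all of its clones inside the same chunk.
```python
# Toy illustration only: end each chunk on a whole-object boundary so that one
# object's head and clones are never split across two chunks.
def chunk_on_object_boundaries(entries, max_chunk):
    """entries: list of (object_name, snap) pairs, grouped by object."""
    chunk, chunks = [], []
    for i, (obj, snap) in enumerate(entries):
        chunk.append((obj, snap))
        next_obj = entries[i + 1][0] if i + 1 < len(entries) else None
        # Only cut the chunk once the next entry belongs to a different object.
        if len(chunk) >= max_chunk and next_obj != obj:
            chunks.append(chunk)
            chunk = []
    if chunk:
        chunks.append(chunk)
    return chunks

demo = [("foo", "head"), ("foo", 4), ("foo", 2), ("bar", "head")]
print(chunk_on_object_boundaries(demo, max_chunk=2))
```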
- 06:12 PM Backport #23630 (In Progress): luminous: pg stuck in activating
- 05:53 PM Backport #23630 (Resolved): luminous: pg stuck in activating
- https://github.com/ceph/ceph/pull/21330
- 05:53 PM Bug #23610 (Fix Under Review): pg stuck in activating because of dropped pg-temp message
- 05:53 PM Bug #23610 (Pending Backport): pg stuck in activating because of dropped pg-temp message
- 05:53 PM Backport #23633 (Rejected): luminous: large-omap-object-warnings test fails
- 05:47 PM Bug #18746 (Fix Under Review): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) ...
- 04:26 PM Bug #23495 (Resolved): Need (SLOW_OPS) in whitelist for another yaml
- 11:26 AM Bug #23495 (Fix Under Review): Need (SLOW_OPS) in whitelist for another yaml
- https://github.com/ceph/ceph/pull/21324
- 01:55 PM Bug #23627 (Resolved): Error EACCES: problem getting command descriptions from mgr.None from 'cep...
- ...
- 01:32 PM Bug #23622 (Fix Under Review): qa/workunits/mon/test_mon_config_key.py fails on master
- https://github.com/ceph/ceph/pull/21329
- 03:42 AM Bug #23622: qa/workunits/mon/test_mon_config_key.py fails on master
- see https://github.com/ceph/ceph/pull/21317 (not a fix)
- 02:56 AM Bug #23622 (Resolved): qa/workunits/mon/test_mon_config_key.py fails on master
- ...
- 07:04 AM Bug #20919 (Resolved): osd: replica read can trigger cache promotion
- 06:59 AM Backport #22403 (Resolved): jewel: osd: replica read can trigger cache promotion
- 06:22 AM Bug #23585: osd: safe_timer segfault
- https://drive.google.com/open?id=1x_0p9s9JkQ1zo-LCx6mHxm0DQO5sc1UA is too large (about 1.2 GB). And ceph-osd.297.log.gz di...
- 05:53 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- The docs weren't updated, so I created a PR: https://github.com/ceph/ceph/pull/21319.
- 04:57 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- It was removed in commit 08731c3567300b28d83b1ac1c2ba. Maybe the docs weren't updated, or you were reading old docs.
- 04:27 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- But I can see this option in the documentation! The setting works in Jewel.
So osd_op_threads was removed in Luminous?
- 03:14 AM Bug #23601: Cannot see the value of parameter osd_op_threads via ceph --show-config
- There is no "osd_op_threads" anymore. It is now osd_op_num_shards / osd_op_num_shards_hdd / osd_op_num_shards_ssd.
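A hedged illustration, not part of this thread: the sharding options named above can be read back from a running OSD over its admin socket with `ceph daemon ... config get`. The osd.0 id is only a placeholder.
```python
# Hedged sketch: read the current values of the sharded-workqueue options that
# replaced osd_op_threads, via the OSD's admin socket.
import json
import subprocess

def osd_config_get(osd_id, option):
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "config", "get", option])
    # `ceph daemon ... config get` returns a small JSON object keyed by the option name.
    return json.loads(out)[option]

for opt in ("osd_op_num_shards", "osd_op_num_shards_hdd", "osd_op_num_shards_ssd"):
    print(opt, "=", osd_config_get(0, opt))
```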
- 05:34 AM Bug #23595: osd: recovery/backfill is extremely slow
- check hdd or ssd by code at osd started and not changed after starting.
I think we need increase the log level fo... - 05:19 AM Bug #23593 (Resolved): RESTControllerTest.test_detail_route and RESTControllerTest.test_fill fail
- 04:29 AM Bug #23621 (In Progress): qa/standalone/mon/misc.sh fails on master
- https://github.com/ceph/ceph/pull/21318
- 04:17 AM Bug #23621: qa/standalone/mon/misc.sh fails on master
- bc5df2b4497104c2a8747daf0530bb5184f9fecb added ceph::features::mon::FEATURE_OSDMAP_PRUNE so the output that's failing...
- 02:53 AM Bug #23621: qa/standalone/mon/misc.sh fails on master
- http://pulpito.ceph.com/sage-2018-04-10_02:19:56-rados-master-distro-basic-smithi/2377263
http://pulpito.ceph.com/sa... - 02:51 AM Bug #23621 (Resolved): qa/standalone/mon/misc.sh fails on master
- This appears to be from the addition of the osdmap-prune mon feature?
- 02:49 AM Bug #23620 (Fix Under Review): tasks.mgr.test_failover.TestFailover failure
- https://github.com/ceph/ceph/pull/21315
- 02:43 AM Bug #23620 (Resolved): tasks.mgr.test_failover.TestFailover failure
- http://pulpito.ceph.com/sage-2018-04-10_02:19:56-rados-master-distro-basic-smithi/2377255...
- 12:57 AM Bug #23578 (Pending Backport): large-omap-object-warnings test fails
- Just a note that my analysis above was incorrect and this was not due to the lost coin flips but due to a pg map upda...
- 12:18 AM Backport #23485 (In Progress): luminous: scrub errors not cleared on replicas can cause inconsist...