Activity
From 05/14/2018 to 06/12/2018
06/12/2018
- 08:01 AM Backport #24501 (Resolved): luminous: osd: eternal stuck PG in 'unfound_recovery'
- https://github.com/ceph/ceph/pull/22546
- 08:01 AM Backport #24500 (Resolved): mimic: osd: eternal stuck PG in 'unfound_recovery'
- https://github.com/ceph/ceph/pull/22545
- 08:00 AM Backport #24495 (Resolved): luminous: osd: segv in Session::have_backoff
- https://github.com/ceph/ceph/pull/22729
- 08:00 AM Backport #24494 (Resolved): mimic: osd: segv in Session::have_backoff
- https://github.com/ceph/ceph/pull/22730
- 03:22 AM Bug #24486 (Pending Backport): osd: segv in Session::have_backoff
06/11/2018
- 09:32 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I am going to add this test for upgrade as well, steps to recreate...
- 04:19 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I have also experienced this issue while continuing the Bluestore conversion of OSDs on my Ceph cluster, after carryi...
- 02:16 PM Backport #24059: luminous: Deleting a pool with active notify linger ops can result in seg fault
- Casey Bodley wrote:
> https://github.com/ceph/ceph/pull/22143
merged
- 02:33 AM Bug #24487: osd: choose_acting loop
- It looks like the "choose_async_recovery_ec candidates by cost are: 178,2(0)" line is different in the second case.. ...
- 01:45 AM Bug #24487 (Resolved): osd: choose_acting loop
- ec pg looping between [2,3,0,1] and [-,3,0,1].
osd.3 says...
06/10/2018
- 06:41 PM Bug #24486 (Fix Under Review): osd: segv in Session::have_backoff
- https://github.com/ceph/ceph/pull/22497
- 06:34 PM Bug #24486 (Resolved): osd: segv in Session::have_backoff
- ...
- 04:41 PM Bug #24485 (Resolved): LibRadosTwoPoolsPP.ManifestUnset failure
- ...
- 03:30 PM Bug #24484 (Fix Under Review): osdc: wrong offset in BufferHead
- 03:15 PM Bug #24484: osdc: wrong offset in BufferHead
- This bug will lead to a "buffer::end_of_buffer" exception, which is thrown in the function "buffer::list::substr_of".
Thi...
- 03:08 PM Bug #24484: osdc: wrong offset in BufferHead
- PR: https://github.com/ceph/ceph/pull/22495
- 03:07 PM Bug #24484 (Resolved): osdc: wrong offset in BufferHead
- The offset of BufferHead should be "opos - bh->start()"
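Note: to make the one-line fix quoted above concrete, here is a minimal sketch (hypothetical structures, not the actual osdc code) of the offset computation. The source offset into a cached extent must be relative to the BufferHead's own start, i.e. opos - bh->start(); passing the absolute object position opos points past the cached buffer, which is what makes buffer::list::substr_of() throw buffer::end_of_buffer.

// Hypothetical sketch of the offset computation (not the actual osdc code).
#include <cassert>
#include <cstdint>

struct BufferHead {
    int64_t object_off;  // object offset where this cached extent begins
    int64_t length;      // number of cached bytes
    int64_t start() const { return object_off; }
    int64_t end() const { return object_off + length; }
};

// Offset within bh's internal buffer for absolute object position opos.
// Using opos directly instead of (opos - bh->start()) reads past the
// cached data and triggers buffer::end_of_buffer in substr_of().
int64_t offset_in_bufferhead(const BufferHead *bh, int64_t opos) {
    assert(opos >= bh->start() && opos < bh->end());
    return opos - bh->start();
}

int main() {
    BufferHead bh{4096, 8192};  // caches object bytes [4096, 12288)
    return offset_in_bufferhead(&bh, 6144) == 2048 ? 0 : 1;
}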
- 02:12 AM Backport #24329 (In Progress): mimic: assert manager.get_num_active_clean() == pg_num on rados/si...
06/09/2018
- 07:21 PM Bug #24321 (Pending Backport): assert manager.get_num_active_clean() == pg_num on rados/singleton...
- 05:56 AM Bug #24321 (Fix Under Review): assert manager.get_num_active_clean() == pg_num on rados/singleton...
- https://github.com/ceph/ceph/pull/22485
- 06:50 PM Bug #22462: mon: unknown message type 1537 in luminous->mimic upgrade tests
- Maybe I have the same issue during upgrade Jewel->Luminous http://tracker.ceph.com/issues/24481?next_issue_id=24480&p...
- 02:23 PM Bug #24373 (Pending Backport): osd: eternal stuck PG in 'unfound_recovery'
- 11:20 AM Backport #24478 (Resolved): luminous: read object attrs failed at EC recovery
- https://github.com/ceph/ceph/pull/24327
- 11:18 AM Backport #24473 (Resolved): mimic: cosbench stuck at booting cosbench driver
- https://github.com/ceph/ceph/pull/22887
- 11:18 AM Backport #24472 (Resolved): mimic: Ceph-osd crash when activate SPDK
- https://github.com/ceph/ceph/pull/22684
- 11:18 AM Backport #24471 (Resolved): luminous: Ceph-osd crash when activate SPDK
- https://github.com/ceph/ceph/pull/22686
- 11:18 AM Backport #24468 (Resolved): mimic: tell ... config rm <foo> not idempotent
- https://github.com/ceph/ceph/pull/22552
- 06:07 AM Bug #24452 (Resolved): Backfill hangs in a test case in master not mimic
06/08/2018
- 11:03 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I can't reproduce this on any new Mimic cluster, it only happens on clusters upgraded from Luminous (which is why we ...
- 09:04 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I'm trying to make new OSDs with ceph-volume osd create --dmcrypt --bluestore --data /dev/sdg and am getting the same...
- 07:05 PM Bug #24454 (Duplicate): failed to recover before timeout expired
- #24452
- 12:29 PM Bug #24454 (Duplicate): failed to recover before timeout expired
- tons of this on current master
http://pulpito.ceph.com/kchai-2018-06-06_04:56:43-rados-wip-kefu-testing-2018-06-06...
- 07:05 PM Bug #24452 (Fix Under Review): Backfill hangs in a test case in master not mimic
- https://github.com/ceph/ceph/pull/22478
- 02:48 PM Bug #24452: Backfill hangs in a test case in master not mimic
Final messages on primary during backfill about pg 1.0....
- 04:57 AM Bug #24452 (Resolved): Backfill hangs in a test case in master not mimic
../qa/run-standalone.sh "osd-backfill-stats.sh TEST_backfill_down_out" 2>&1 | tee obs.log
This test times out wa...
- 02:34 PM Backport #23912: luminous: mon: High MON cpu usage when cluster is changing
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21968
merged
- 02:33 PM Backport #24245: luminous: Manager daemon y is unresponsive during teuthology cluster teardown
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22331
merged
- 02:31 PM Backport #24374: luminous: mon: auto compaction on rocksdb should kick in more often
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22360
merged
- 08:18 AM Bug #23352: osd: segfaults under normal operation
- Experiencing a safe_timer segfault with a freshly deployed cluster. No data on the cluster yet. Just an empty poo...
06/07/2018
- 03:20 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- We are also seeing this when creating OSDs with IDs that existed previously.
I verified that the old osd was delet...
- 01:21 PM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
- https://github.com/ceph/ceph/pull/22456
- 01:14 PM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
- Okay, I see the problem. Two fixes: first, reset every pg on down->up (simpler approach), but the bigger issue is th...
- 12:58 PM Bug #24450: OSD Caught signal (Aborted)
- I have the same problem.
http://tracker.ceph.com/issues/24423
- 12:03 PM Bug #24450 (Duplicate): OSD Caught signal (Aborted)
- Hi,
I have done a rolling_upgrade to mimic with ceph-ansible. It works perfectly! Now, I want to deploy new OSDs, bu...
- 11:46 AM Bug #24448 (Won't Fix): (Filestore) ABRT report for package ceph has reached 10 occurrences
- https://retrace.fedoraproject.org/faf/reports/bthash/fe768f98e5fff65f0c850668c4bdae8d4da7e086/
https://retrace.fedor...
06/06/2018
- 09:11 PM Bug #24264 (Closed): ssd-primary crush rule not working as intended
- I don't think there's a good way to express that requirement in the current crush language. The rule in the docs does...
- 09:06 PM Bug #24362 (Triaged): ceph-objectstore-tool incorrectly invokes crush_location_hook
- Seems like the way to fix this is to stop ceph-objectstore-tool from trying to use the crush location hook at all.
...
- 07:15 AM Bug #23145: OSD crashes during recovery of EC pg
- -3> 2018-06-06 15:00:40.462930 7fffddb25700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2...
- 02:45 AM Bug #23145: OSD crashes during recovery of EC pg
- @Sage Weil
@Zengran Zhang
we hit the same issue, and the osd crash has not recovered until now.
env is 12.2.5 ec 2+1 b...
- 06:02 AM Backport #24293 (In Progress): jewel: mon: slow op on log message
- https://github.com/ceph/ceph/pull/22431
- 02:34 AM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
- Attached full log (download ceph-osd.3.log.gz).
Points are:...
- 12:33 AM Bug #24371 (Pending Backport): Ceph-osd crash when activate SPDK
06/05/2018
- 05:34 PM Bug #24365 (Pending Backport): cosbench stuck at booting cosbench driver
- 01:33 AM Bug #24365 (Fix Under Review): cosbench stuck at booting cosbench driver
- https://github.com/ceph/ceph/pull/22405
- 04:04 PM Bug #24408 (Pending Backport): tell ... config rm <foo> not idempotent
- 11:00 AM Bug #24423 (Resolved): failed to load OSD map for epoch X, got 0 bytes
- After upgrading to Mimic I deleted a non-lvm OSD and recreated it with 'ceph-volume lvm prepare --bluestore --data /d...
- 10:37 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- Same as https://tracker.ceph.com/issues/21475, and I already modified bluestore_deferred_throttle_bytes = 0
bluest...
- 10:31 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- 2018-06-05T17:46:28.273183+08:00 node54 ceph-osd: /work/build/rpmbuild/BUILD/infinity-3.2.5/src/os/bluestore/BlueStor...
- 10:31 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- 鹏 张 wrote:
> ceph version: 12.2.5
> data pool uses EC 2 + 1.
> When restarting one osd, it crashes and restar...
- 10:26 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- 1.-45> 2018-06-05 17:47:56.886142 7f8972974700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2)...
- 10:25 AM Bug #24422 (Duplicate): Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- ceph version: 12.2.5
data pool uses EC 3 + 1.
When restarting one osd, it crashes and restarts more and more.
...
- 04:42 AM Bug #24419 (Won't Fix): ceph-objectstore-tool unable to open mon store
- Hi, everyone;
I use luminous v12.2.5, and I try to recover the monitor database from the osds,
I perform step by step acc...
- 03:32 AM Backport #24291 (In Progress): jewel: common: JSON output from rados bench write has typo in max_...
- https://github.com/ceph/ceph/pull/22407
- 02:37 AM Bug #23875: Removal of snapshot with corrupt replica crashes osd
If update_snap_map() ignores the error from remove_oid() we still crash because an op from the primary related to...
- 02:20 AM Backport #24292 (In Progress): mimic: common: JSON output from rados bench write has typo in max_...
- https://github.com/ceph/ceph/pull/22406
06/04/2018
- 06:32 PM Bug #24368: osd: should not restart on permanent failures
- It would, but the previous settings were there for a reason so I'm not sure if it's feasible to backport this for cep...
- 05:10 PM Bug #24371 (Fix Under Review): Ceph-osd crash when activate SPDK
- 04:00 PM Bug #24408 (Fix Under Review): tell ... config rm <foo> not idempotent
- https://github.com/ceph/ceph/pull/22395
- 03:56 PM Bug #24408 (Resolved): tell ... config rm <foo> not idempotent
- ...
- 02:56 PM Backport #24407 (In Progress): mimic: read object attrs failed at EC recovery
- 02:56 PM Backport #24407 (Resolved): mimic: read object attrs failed at EC recovery
- https://github.com/ceph/ceph/pull/22394
- 02:54 PM Bug #24406 (Resolved): read object attrs failed at EC recovery
- https://github.com/ceph/ceph/pull/22196
- 02:18 PM Backport #24290 (In Progress): luminous: common: JSON output from rados bench write has typo in m...
- https://github.com/ceph/ceph/pull/22391
- 11:53 AM Bug #24366 (Pending Backport): omap_digest handling still not correct
- 06:27 AM Bug #23352: osd: segfaults under normal operation
- Looking at the crash in http://tracker.ceph.com/issues/23352#note-14 there's a fairly glaring problem....
- 12:14 AM Bug #23352: osd: segfaults under normal operation
- Hi Kjetil,
Sure, worth a look, but AFAICT all access is protected by SafeTimer's locks.
- 02:08 AM Backport #24258 (In Progress): luminous: crush device class: Monitor Crash when moving Bucket int...
- https://github.com/ceph/ceph/pull/22381
06/02/2018
- 12:04 AM Bug #24365 (In Progress): cosbench stuck at booting cosbench driver
- Two things caused this issue:
1. cosbench requires openjdk-8. The cbt task does install this dependency, but we al...
06/01/2018
- 08:05 PM Bug #23352: osd: segfaults under normal operation
- Brad Hubbard wrote:
> I've confirmed that in all of the SafeTimer segfaults the 'schedule' multimap is empty, indica...
- 06:01 PM Bug #24368: osd: should not restart on permanent failures
- Sounds like something that would be useful in our stable releases - Greg, do you agree?
- 05:56 PM Backport #24360 (Need More Info): luminous: osd: leaked Session on osd.7
- Do Not Backport For Now
see https://github.com/ceph/ceph/pull/22339#issuecomment-393574371 for details
- 05:44 PM Backport #24383 (Resolved): mimic: osd: stray osds in async_recovery_targets cause out of order ops
- https://github.com/ceph/ceph/pull/22889
- 05:28 PM Backport #24381 (Resolved): luminous: omap_digest handling still not correct
- https://github.com/ceph/ceph/pull/22375
- 05:28 PM Backport #24380 (Resolved): mimic: omap_digest handling still not correct
- https://github.com/ceph/ceph/pull/22374
- 08:02 AM Bug #24342: Monitor's routed_requests leak
- Greg Farnum wrote:
> What version are you running? The MRoute handling is all pretty old; though we've certainly dis...
- 07:16 AM Bug #24373 (Fix Under Review): osd: eternal stuck PG in 'unfound_recovery'
- 05:22 AM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
- https://github.com/ceph/ceph/pull/22358
- 04:57 AM Bug #24373 (Resolved): osd: eternal stuck PG in 'unfound_recovery'
- A PG might be eternally stuck in 'unfound_recovery' after some OSDs are marked down.
For example, the following st...
- 06:12 AM Backport #24375 (In Progress): mimic: mon: auto compaction on rocksdb should kick in more often
- 06:11 AM Backport #24375 (Resolved): mimic: mon: auto compaction on rocksdb should kick in more often
- https://github.com/ceph/ceph/pull/22361
- 06:10 AM Backport #24374 (In Progress): luminous: mon: auto compaction on rocksdb should kick in more often
- 06:08 AM Backport #24374 (Resolved): luminous: mon: auto compaction on rocksdb should kick in more often
- https://github.com/ceph/ceph/pull/22360
- 06:08 AM Bug #24361 (Pending Backport): auto compaction on rocksdb should kick in more often
- 04:47 AM Bug #24371: Ceph-osd crash when activate SPDK
- This is a bug in NVMEDevice, the bug fix has been committed.
Please review PR https://github.com/ceph/ceph...
- 02:02 AM Bug #24371: Ceph-osd crash when activate SPDK
- I'm working on the issue.
- 02:01 AM Bug #24371 (Resolved): Ceph-osd crash when activate SPDK
- Enable SPDK and configure bluestore as mentioned in http://docs.ceph.com/docs/master/rados/configuration/bluestore-co...
- 02:56 AM Feature #24363: Configure DPDK with mellanox NIC
- Next, compiling passes, but none of the binaries can run.
Output error:
EAL: VFIO_RESOURCE_LIST tailq is already registere...
- 02:38 AM Feature #24363: Configure DPDK with mellanox NIC
- log details
mellanox NIC over fabric
When compiling, output error:
1. lack of numa and cryptopp libraries
I ...
- 12:23 AM Feature #24363: Configure DPDK with mellanox NIC
- Append
NIC over optical fiber
- 12:07 AM Bug #24160 (Resolved): Monitor down when large store data needs to compact triggered by ceph tell...
05/31/2018
- 11:34 PM Bug #24368 (In Progress): osd: should not restart on permanent failures
- https://github.com/ceph/ceph/pull/22349 has the simple restart interval change. Will investigate the options for cond...
- 11:25 PM Bug #24368: osd: should not restart on permanent failures
- See https://www.freedesktop.org/software/systemd/man/systemd.service.html#Restart= for the details on Restart options.
- 11:17 PM Bug #24368 (Resolved): osd: should not restart on permanent failures
- Last week at OpenStack I heard a few users report OSDs were not failing hard and fast as they should be on disk issue...
- 07:01 PM Bug #24366 (In Progress): omap_digest handling still not correct
- https://github.com/ceph/ceph/pull/22346
- 05:39 PM Bug #24366 (Resolved): omap_digest handling still not correct
When running bluestore the object info data_digest is not needed. In that case the omap_digest handling is still b...
- 06:08 PM Bug #24349 (Pending Backport): osd: stray osds in async_recovery_targets cause out of order ops
- 12:51 AM Bug #24349: osd: stray osds in async_recovery_targets cause out of order ops
- https://github.com/ceph/ceph/pull/22330
- 12:46 AM Bug #24349 (Resolved): osd: stray osds in async_recovery_targets cause out of order ops
- Related to https://tracker.ceph.com/issues/23827
http://pulpito.ceph.com/yuriw-2018-05-24_17:07:20-powercycle-mast...
- 05:07 PM Bug #24365 (Resolved): cosbench stuck at booting cosbench driver
- ...
- 03:54 PM Bug #24342: Monitor's routed_requests leak
- What version are you running? The MRoute handling is all pretty old; though we've certainly discovered a number of le...
- 02:17 PM Feature #24363 (New): Configure DPDK with mellanox NIC
- Hi all,
Does ceph-13.1.0 support DPDK on a mellanox NIC?
I found many issues when compiling. I even though handle t...
- 01:22 PM Bug #24362 (Triaged): ceph-objectstore-tool incorrectly invokes crush_location_hook
- Ceph release being used: 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
/etc/ceph/ceph.conf c...
- 11:50 AM Backport #24359 (In Progress): mimic: osd: leaked Session on osd.7
- 07:39 AM Backport #24359 (Resolved): mimic: osd: leaked Session on osd.7
- https://github.com/ceph/ceph/pull/22339
- 09:40 AM Bug #24361 (Fix Under Review): auto compaction on rocksdb should kick in more often
- https://github.com/ceph/ceph/pull/22337
- 09:07 AM Bug #24361 (Resolved): auto compaction on rocksdb should kick in more often
- In RocksDB, by default, "max_bytes_for_level_base" is 256MB and "max_bytes_for_level_multiplier" is 10, so with this set...
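Note: a back-of-the-envelope illustration of those defaults (the two option values are from the comment above; the arithmetic is only a sketch): level N's target size is max_bytes_for_level_base * multiplier^(N-1), so the deeper levels take tens or hundreds of GB to fill and automatic compaction rarely has much to do on a mon-sized store.

#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t base = 256ull << 20;  // max_bytes_for_level_base = 256MB
    const uint64_t multiplier = 10;      // max_bytes_for_level_multiplier
    uint64_t target = base;
    for (int level = 1; level <= 4; ++level) {
        // L1 = 256MB, L2 = 2.5GB, L3 = 25GB, L4 = 250GB
        std::printf("L%d target: %llu MB\n", level,
                    (unsigned long long)(target >> 20));
        target *= multiplier;
    }
    return 0;
}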
- 07:39 AM Backport #24360 (Resolved): luminous: osd: leaked Session on osd.7
- https://github.com/ceph/ceph/pull/29859
- 07:38 AM Backport #24350 (In Progress): mimic: slow mon ops from osd_failure
- 07:37 AM Backport #24350 (Resolved): mimic: slow mon ops from osd_failure
- https://github.com/ceph/ceph/pull/22297
- 07:38 AM Backport #24356 (Resolved): luminous: osd: pg hard limit too easy to hit
- https://github.com/ceph/ceph/pull/22592
- 07:38 AM Backport #24355 (Resolved): mimic: osd: pg hard limit too easy to hit
- https://github.com/ceph/ceph/pull/22621
- 07:37 AM Backport #24351 (Resolved): luminous: slow mon ops from osd_failure
- https://github.com/ceph/ceph/pull/22568
- 05:31 AM Bug #20924 (Pending Backport): osd: leaked Session on osd.7
- I think https://github.com/ceph/ceph/pull/22292 indeed addresses this issue
https://github.com/ceph/ceph/pull/22384
- 04:51 AM Backport #24246 (In Progress): mimic: Manager daemon y is unresponsive during teuthology cluster ...
- https://github.com/ceph/ceph/pull/22333
- 02:55 AM Backport #24245 (In Progress): luminous: Manager daemon y is unresponsive during teuthology clust...
- https://github.com/ceph/ceph/pull/22331
05/30/2018
- 11:31 PM Bug #24160 (Fix Under Review): Monitor down when large store data needs to compact triggered by c...
- 10:45 PM Bug #23830: rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
- This looks like a similar failure: http://pulpito.ceph.com/nojha-2018-05-30_20:43:02-rados-wip-async-up2-2018-05-30-d...
- 02:17 PM Bug #24342: Monitor's routed_requests leak
- It seems that this problem has been fixed by https://github.com/ceph/ceph/commit/39e06ef8f070e136e54452bdea3f6105cd79...
- 01:10 PM Bug #24342 (Closed): Monitor's routed_requests leak
- 12:09 PM Bug #24342: Monitor's routed_requests leak
- Sorry, it seems that the latest version doesn't have this problem. Really sorry. Please close this.
- 09:36 AM Bug #24342: Monitor's routed_requests leak
- https://github.com/ceph/ceph/pull/22315
- 08:54 AM Bug #24342 (Closed): Monitor's routed_requests leak
- Recently, we found that, in our non-leader monitors, there are a lot of routed requests that have not been recycled, a...
- 01:58 PM Bug #24327: osd: segv in pg_log_entry_t::encode()
- 01:58 PM Bug #24327: osd: segv in pg_log_entry_t::encode()
- Sage Weil wrote:
> This crash doesn't look familiar, and it's not clear to me what might cause segfault here. Do yo...
- 01:48 PM Bug #24327 (Need More Info): osd: segv in pg_log_entry_t::encode()
- This crash doesn't look familiar, and it's not clear to me what might cause segfault here. Do you have a core file?
- 01:55 PM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
- Josh and I noticed this by code inspection. I'm nailing down out of space handling nits in the kernel client and wan...
- 01:46 PM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
- 01:46 PM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
- This is somewhat by design (or lack thereof)... the fail-safe check is there to prevent us from writing when we are *...
- 05:40 AM Backport #24215 (In Progress): mimic: "process (unknown)" in ceph logs
- https://github.com/ceph/ceph/pull/22311
- 03:29 AM Backport #24214 (In Progress): luminous: Module 'balancer' has failed: could not find bucket -14
- https://github.com/ceph/ceph/pull/22308
05/29/2018
- 11:01 PM Feature #23979: Limit pg log length during recovery/backfill so that we don't run out of memory.
- Initial testing is referenced here: https://github.com/ceph/ceph/pull/21508
- 10:59 PM Bug #24243 (Pending Backport): osd: pg hard limit too easy to hit
- https://github.com/ceph/ceph/pull/22187
- 10:59 PM Bug #24304 (Fix Under Review): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
- wrong bug
- 10:58 PM Bug #24304 (Pending Backport): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
- https://github.com/ceph/ceph/pull/22187
- 10:03 PM Feature #11601: osd: share cached osdmaps across osd daemons
- A vague possibility that the future seastar-based OSD may run each logical disk OSD inside a single process, which co...
- 07:38 PM Bug #24339 (New): FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in sca...
- FULL_FORCE ops are dropped if fail-safe full check fails in do_op(). scan_requests() uses op->respects_full() which ...
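Note: a toy model of the mismatch being described (hypothetical names, not the actual OSD/Objecter code): the fail-safe check drops the op regardless of FULL_FORCE, but the resend scan only revisits ops that respect full flags, so a dropped FULL_FORCE op is never resent.

#include <vector>

struct Op {
    bool full_force = false;   // analogous to CEPH_OSD_FLAG_FULL_FORCE
    bool dropped = false;
    bool respects_full() const { return !full_force; }  // FULL_FORCE ops don't
};

// Fail-safe full check drops everything, even FULL_FORCE ops.
void do_op(Op &op, bool failsafe_full) {
    if (failsafe_full)
        op.dropped = true;
}

// Resend pass after the full condition clears: only ops that respect
// full flags are reconsidered, so a dropped FULL_FORCE op is lost.
void scan_requests(std::vector<Op> &ops, bool still_full) {
    for (auto &op : ops) {
        if (op.respects_full() && op.dropped && !still_full)
            op.dropped = false;  // resent
    }
}

int main() {
    std::vector<Op> ops{{true}, {false}};
    for (auto &op : ops) do_op(op, true);  // cluster trips the fail-safe
    scan_requests(ops, false);             // full condition clears
    return ops[0].dropped ? 0 : 1;         // the FULL_FORCE op stays dropped
}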
- 06:49 PM Bug #23646 (Resolved): scrub interaction with HEAD boundaries and clones is broken
- 01:11 PM Bug #24322 (Pending Backport): slow mon ops from osd_failure
- mimic: https://github.com/ceph/ceph/pull/22297
- 12:53 PM Backport #24328 (In Progress): luminous: assert manager.get_num_active_clean() == pg_num on rados...
- 09:40 AM Backport #24328 (Resolved): luminous: assert manager.get_num_active_clean() == pg_num on rados/si...
- https://github.com/ceph/ceph/pull/22296
- 12:47 PM Backport #24329 (Resolved): mimic: assert manager.get_num_active_clean() == pg_num on rados/singl...
- 09:40 AM Backport #24329 (Resolved): mimic: assert manager.get_num_active_clean() == pg_num on rados/singl...
- https://github.com/ceph/ceph/pull/22492
- 10:02 AM Bug #22530 (Resolved): pool create cmd's expected_num_objects is not correctly interpreted
- 10:02 AM Backport #23316 (Resolved): jewel: pool create cmd's expected_num_objects is not correctly interp...
- 10:01 AM Backport #24058 (Resolved): jewel: Deleting a pool with active notify linger ops can result in se...
- 09:59 AM Backport #24244 (Resolved): jewel: osd/EC: slow/hung ops in multimds suite test
- 09:59 AM Backport #24244 (In Progress): jewel: osd/EC: slow/hung ops in multimds suite test
- 09:56 AM Backport #24294 (Resolved): mimic: control-c on ceph cli leads to segv
- 09:55 AM Backport #24294 (In Progress): mimic: control-c on ceph cli leads to segv
- 09:52 AM Backport #24256 (Resolved): mimic: osd: Assertion `!node_algorithms::inited(this->priv_value_tra...
- 09:41 AM Backport #24333 (Resolved): luminous: local_reserver double-reservation of backfilled pg
- https://github.com/ceph/ceph/pull/23493
- 09:41 AM Backport #24332 (Resolved): mimic: local_reserver double-reservation of backfilled pg
- https://github.com/ceph/ceph/pull/22559
- 08:26 AM Feature #24231: librbd/libcephfs/librgw should ignore rados_mon/osd_op_timeouts options (requires...
- libcephfs doesn't use librados, so it doesn't need any changes.
The rados_mon_op_timeout affects anything that use...
- 07:55 AM Bug #20924: osd: leaked Session on osd.7
- https://github.com/ceph/ceph/pull/22292 might address this issue.
- 07:37 AM Bug #24327 (Need More Info): osd: segv in pg_log_entry_t::encode()
- The affected osd restarted itself and everything seems fine then. But what is the cause of the crash?...
- 06:37 AM Backport #24204 (In Progress): mimic: LibRadosMiscPool.PoolCreationRace segv
- https://github.com/ceph/ceph/pull/22291
- 06:20 AM Backport #24216 (In Progress): luminous: "process (unknown)" in ceph logs
- https://github.com/ceph/ceph/pull/22290
- 03:32 AM Bug #24321: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max-pg-per-osd...
- mimic: https://github.com/ceph/ceph/pull/22288
- 03:31 AM Bug #24321 (Pending Backport): assert manager.get_num_active_clean() == pg_num on rados/singleton...
05/28/2018
- 10:54 PM Feature #24176: osd: add command to drop OSD cache
- Anyone looking into this? If not, I can pick it up.
- 03:21 PM Bug #24145 (Duplicate): osdmap decode error in rados/standalone/*
- 03:19 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- /a/kchai-2018-05-28_09:21:54-rados-wip-kefu-testing-2018-05-28-1113-distro-basic-smithi/2601187
on mimic branch.
...
- 11:51 AM Bug #24321 (Fix Under Review): assert manager.get_num_active_clean() == pg_num on rados/singleton...
- https://github.com/ceph/ceph/pull/22275
- 05:28 AM Bug #23352: osd: segfaults under normal operation
- I've confirmed that in all of the SafeTimer segfaults the 'schedule' multimap is empty, indicating this is the last e...
- 05:16 AM Bug #23352: osd: segfaults under normal operation
- If we look at the coredump from 23585 and compare it to this message:
[117735.930255] safe_timer[52573]: segfault ...
- 04:32 AM Bug #24023 (Duplicate): Segfault on OSD in 12.2.5
- Duplicate of 23352
- 04:30 AM Bug #23564 (Duplicate): OSD Segfaults
- Duplicate of 23352
- 04:28 AM Bug #23585 (Duplicate): osd: safe_timer segfault
- Duplicate of 23352
- 02:47 AM Bug #24160: Monitor down when large store data needs to compact triggered by ceph tell mon.xx com...
- PR:
https://github.com/ceph/ceph/pull/22056/
05/27/2018
- 05:58 PM Feature #11601: osd: share cached osdmaps across osd daemons
- Attached the file CephScaleTestMarch2015.pdf
Do we have any plan for this, guys?
- 02:55 PM Bug #24322 (Fix Under Review): slow mon ops from osd_failure
- https://github.com/ceph/ceph/pull/22259
- 02:46 PM Bug #23585: osd: safe_timer segfault
- Hi Brad, sure, thanks.
05/26/2018
- 01:51 PM Bug #24322 (Resolved): slow mon ops from osd_failure
- ...
- 01:39 PM Bug #24162 (Resolved): control-c on ceph cli leads to segv
- 01:38 PM Bug #24219 (Resolved): osd: InProgressOp freed by on_change(); in-flight op may use-after-free in...
- 01:36 PM Bug #24321 (Resolved): assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max...
- ...
- 01:29 PM Bug #24320 (Resolved): out of order reply and/or osd assert with set-chunks-read.yaml
- ...
- 02:00 AM Bug #23614 (Pending Backport): local_reserver double-reservation of backfilled pg
- 01:59 AM Bug #23490 (Duplicate): luminous: osd: double recovery reservation for PG when EIO injected (whil...
- 01:25 AM Bug #23352: osd: segfaults under normal operation
- Thanks,
That gives us seven cores across 12.2.4-12.2.5 on Xenial and Centos and one core from the MMgrReport::enco...
- 12:35 AM Bug #23431 (Duplicate): OSD Segmentation fault in thread_name:safe_timer
- Closing as a duplicate of #23352 where we are focussing.
- 12:33 AM Bug #23564: OSD Segfaults
- Since the stack from this core is the following, can we also close this as a duplicate of 23352?
(gdb) bt
#0 0x00...
- 12:31 AM Bug #23585: osd: safe_timer segfault
- Alex,
Can we close this bug also as a duplicate of 23352?
- 12:28 AM Bug #24023: Segfault on OSD in 12.2.5
- Alex,
Why are we running multiple trackers for the same issue?
Can we close this as a duplicate?
05/25/2018
- 10:25 PM Bug #23614 (Fix Under Review): local_reserver double-reservation of backfilled pg
- Explanation of the problem and resolution included in the pull request.
https://github.com/ceph/ceph/pull/22255
- 10:06 PM Bug #24219 (Pending Backport): osd: InProgressOp freed by on_change(); in-flight op may use-after...
- 09:25 PM Bug #24304 (Fix Under Review): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
- This is due to the fast-path decoding for object_stat_sum_t not being updated in the backport. Fix: https://github.co...
- 04:22 PM Bug #24304 (Closed): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
- 04:22 PM Bug #24304 (Closed): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
- This appears to be specific to a downstream build, closing.
- 12:29 PM Bug #24304 (Resolved): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
- ...
- 03:08 PM Backport #24297 (Resolved): mimic: RocksDB compression is not supported at least on Debian.
- 11:03 AM Backport #24297 (Resolved): mimic: RocksDB compression is not supported at least on Debian.
- https://github.com/ceph/ceph/pull/22183
- 03:06 PM Bug #24023: Segfault on OSD in 12.2.5
- Also posted this in bug http://tracker.ceph.com/issues/23352
Hi Brad, we had one too just now, core dump and log:
...
- 08:04 AM Bug #24023: Segfault on OSD in 12.2.5
- Hi,
I've noticed a similar/same segfault on my deployment. Random segfaults on random osds appear under load or wit...
- 03:05 PM Bug #23352: osd: segfaults under normal operation
- Hi Brad, we had one too just now, core dump and log:
https://drive.google.com/open?id=1t1jfjqwjhUUBzWjxamos3Hr7ghj...
- 07:54 AM Bug #23352: osd: segfaults under normal operation
- Thanks Beom-Seok,
I've set up a centos environment to debug those cores along with the Xenial ones. I will update ...
- 03:11 AM Bug #23352: osd: segfaults under normal operation
- Today two osd crashes.
coredump at:
https://drive.google.com/open?id=1rXtW0riZMBwP5OqrJ7QdRIOAsKFr-kYw
https://d...
- 02:10 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
- https://github.com/ceph/ceph/pull/22126 merged to remove failures from rgw suite. moving to rados project
- 12:28 PM Backport #24259 (Resolved): mimic: crush device class: Monitor Crash when moving Bucket into Defa...
- 11:03 AM Backport #24294 (Resolved): mimic: control-c on ceph cli leads to segv
- https://github.com/ceph/ceph/pull/22225
- 11:03 AM Backport #24293 (Resolved): jewel: mon: slow op on log message
- https://github.com/ceph/ceph/pull/22431
- 11:03 AM Backport #24292 (Resolved): mimic: common: JSON output from rados bench write has typo in max_lat...
- https://github.com/ceph/ceph/pull/22406
- 11:03 AM Backport #24291 (Resolved): jewel: common: JSON output from rados bench write has typo in max_lat...
- https://github.com/ceph/ceph/pull/22407
- 11:03 AM Backport #24290 (Resolved): luminous: common: JSON output from rados bench write has typo in max_...
- https://github.com/ceph/ceph/pull/22391
- 03:47 AM Bug #24045 (Resolved): Eviction still raced with scrub due to preemption
- 03:47 AM Bug #22881 (Resolved): scrub interaction with HEAD boundaries and snapmapper repair is broken
- 03:46 AM Backport #24016 (Resolved): luminous: scrub interaction with HEAD boundaries and snapmapper repai...
- 03:43 AM Backport #23863 (Resolved): luminous: scrub interaction with HEAD boundaries and clones is broken
- 03:39 AM Backport #24153 (Resolved): luminous: Eviction still raced with scrub due to preemption
- 03:38 AM Bug #23267 (Resolved): scrub errors not cleared on replicas can cause inconsistent pg state when ...
- 03:37 AM Backport #23486 (Resolved): jewel: scrub errors not cleared on replicas can cause inconsistent pg...
- 03:30 AM Bug #23811: RADOS stat slow for some objects on same OSD
- ...
05/24/2018
- 08:41 PM Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica tak...
- merged https://github.com/ceph/ceph/pull/21194
- 08:38 PM Backport #23316: jewel: pool create cmd's expected_num_objects is not correctly interpreted
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22050
merged
- 08:37 PM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
- merged https://github.com/ceph/ceph/pull/22188
- 08:36 PM Bug #23769: osd/EC: slow/hung ops in multimds suite test
- jewel backport PR https://github.com/ceph/ceph/pull/22189 merged
- 06:07 PM Bug #24192: cluster [ERR] Corruption detected: object 2:f59d1934:::smithi14913526-5822:head is mi...
- ...
- 06:05 PM Bug #24199 (Pending Backport): common: JSON output from rados bench write has typo in max_latency...
- 06:03 PM Bug #24162 (Pending Backport): control-c on ceph cli leads to segv
- mimic backport https://github.com/ceph/ceph/pull/22225
- 05:59 PM Bug #23879: test_mon_osdmap_prune.sh fails
- /a/sage-2018-05-23_14:50:29-rados-wip-sage2-testing-2018-05-22-1410-distro-basic-smithi/2576533
- 03:40 PM Feature #24232: Add new command ceph mon status
- added a card to the backlog: https://trello.com/c/PTgwBpmx
- 01:27 PM Feature #24232: Add new command ceph mon status
- Sorry for the confusion, I did not check that we have ceph osd stat and ceph mon stat has the same purpose. I wanted ...
- 10:55 AM Feature #24232: Add new command ceph mon status
- copy/pasting from the PR opened to address this issue (https://github.com/ceph/ceph/pull/22202):...
- 01:44 PM Bug #24037 (Resolved): osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_nod...
- 01:42 PM Bug #24145: osdmap decode error in rados/standalone/*
- ...
- 01:39 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
- ...
- 12:08 PM Backport #24279 (In Progress): luminous: RocksDB compression is not supported at least on Debian.
- 12:08 PM Backport #24279 (Resolved): luminous: RocksDB compression is not supported at least on Debian.
- https://github.com/ceph/ceph/pull/22215
- 09:48 AM Bug #24025 (Pending Backport): RocksDB compression is not supported at least on Debian.
- 09:43 AM Bug #24025: RocksDB compression is not supported at least on Debian.
- tested...
- 08:22 AM Bug #23352: osd: segfaults under normal operation
- Hi Alex,
- I notice there are several more coredumps attached to the related bug reports. Are they all separate cras...
- 03:07 AM Bug #24264: ssd-primary crush rule not working as intended
- Sorry, here's my updated rule instead of the one in the document.
rule ssd-primary {
id 2
type r...
- 03:05 AM Bug #24264 (Closed): ssd-primary crush rule not working as intended
- I've set up the rule according to the doc, but some of the PGs are still being assigned to the same host though my fa...
05/23/2018
- 09:36 PM Bug #23787 (Rejected): luminous: "osd-scrub-repair.sh'" failures in rados
- This is an incompatibility between the OSD version 64ffa817000d59d91379f7335439845930f58530 (luminous) and the versio...
- 06:40 PM Bug #22920 (Resolved): filestore journal replay does not guard omap operations
- 06:40 PM Backport #22934 (Resolved): luminous: filestore journal replay does not guard omap operations
- 06:35 PM Bug #23878 (Resolved): assert on pg upmap
- 06:34 PM Backport #23925 (Resolved): luminous: assert on pg upmap
- 06:32 PM Backport #24259 (Resolved): mimic: crush device class: Monitor Crash when moving Bucket into Defa...
- https://github.com/ceph/ceph/pull/22169
- 06:32 PM Backport #24258 (Resolved): luminous: crush device class: Monitor Crash when moving Bucket into D...
- https://github.com/ceph/ceph/pull/22381
- 06:32 PM Backport #24244 (New): jewel: osd/EC: slow/hung ops in multimds suite test
- 05:09 PM Backport #24244 (Resolved): jewel: osd/EC: slow/hung ops in multimds suite test
- https://github.com/ceph/ceph/pull/22189
partial backport for mdsmonitor - 06:31 PM Backport #24256 (Resolved): mimic: osd: Assertion `!node_algorithms::inited(this->priv_value_tra...
- https://github.com/ceph/ceph/pull/22160
- 06:31 PM Backport #24246 (Resolved): mimic: Manager daemon y is unresponsive during teuthology cluster tea...
- https://github.com/ceph/ceph/pull/22333
- 06:31 PM Backport #24245 (Resolved): luminous: Manager daemon y is unresponsive during teuthology cluster ...
- https://github.com/ceph/ceph/pull/22331
- 04:27 PM Bug #23352: osd: segfaults under normal operation
- Sage, I had tried to do this, but we don't know when these crashes would happen, just that they will occur. Random t...
- 04:10 PM Bug #23352 (Need More Info): osd: segfaults under normal operation
- Alex, how reproducible is this for you? Could you reproduce with debug timer = 20?
- 04:21 PM Backport #24058 (In Progress): jewel: Deleting a pool with active notify linger ops can result in...
- https://github.com/ceph/ceph/pull/22188
- 04:15 PM Bug #24243 (Resolved): osd: pg hard limit too easy to hit
- The default ratio of 2x mon_max_pg_per_osd is easy to hit for clusters that have differently weighted disks (e.g. 1 a...
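Note: a sketch of the arithmetic behind this (the limit values and disk mix are assumptions for illustration, not necessarily the shipped defaults): PGs follow crush weight, so one disk much larger than its peers collects a proportionally larger share of PGs and trips the hard cap first.

#include <cstdio>

int main() {
    const int soft_limit = 200;     // assumed mon_max_pg_per_osd
    const double hard_ratio = 2.0;  // assumed osd_max_pg_per_osd_hard_ratio
    const int hard_cap = (int)(soft_limit * hard_ratio);  // 400

    // Assumed cluster: 9 x 1TB OSDs plus 1 x 8TB OSD, 2000 PG replicas
    // placed proportionally to crush weight.
    const double total_weight = 9 * 1.0 + 8.0;  // 17.0
    const int pg_replicas = 2000;
    const int pgs_on_big_osd = (int)(pg_replicas * 8.0 / total_weight);  // ~941

    std::printf("big OSD: ~%d PGs vs hard cap %d\n", pgs_on_big_osd, hard_cap);
    return 0;
}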
- 03:27 PM Bug #24025: RocksDB compression is not supported at least on Debian.
- mimic: https://github.com/ceph/ceph/pull/22183
- 03:25 PM Bug #24025 (Fix Under Review): RocksDB compression is not supported at least on Debian.
- https://github.com/ceph/ceph/pull/22181
- 02:53 PM Bug #24025: RocksDB compression is not supported at least on Debian.
- Because we fail to pass -DWITH_SNAPPY etc. to cmake while building rocksdb. This bug also impacts the rpm package. i can h...
- 01:51 PM Bug #24229 (Triaged): Libradosstriper successfully removes nonexistent objects instead of returni...
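Note: a minimal reproducer sketch for this report using the libradosstriper C++ API (the pool name and client id are placeholders; assumes a reachable test cluster):

#include <radosstriper/libradosstriper.hpp>
#include <cstdio>

int main() {
    librados::Rados cluster;
    cluster.init("admin");            // placeholder client id
    cluster.conf_read_file(nullptr);  // default ceph.conf search path
    if (cluster.connect() < 0)
        return 1;

    librados::IoCtx ioctx;
    if (cluster.ioctx_create("rbd", ioctx) < 0)  // placeholder pool name
        return 1;

    libradosstriper::RadosStriper striper;
    libradosstriper::RadosStriper::striper_create(ioctx, &striper);

    // Per this report: expected -ENOENT, observed 0.
    int r = striper.remove("no-such-object");
    std::printf("remove() on a nonexistent object returned %d\n", r);
    return 0;
}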
- 11:57 AM Bug #24242 (New): tcmalloc::ThreadCache::ReleaseToCentralCache on rhel (w/ centos packages)
- ...
- 11:43 AM Bug #24222 (Pending Backport): Manager daemon y is unresponsive during teuthology cluster teardown
- 08:41 AM Bug #23145: OSD crashes during recovery of EC pg
- The osd in the last peering stage will call pg_log.roll_forward (at the end of PG::activate); is it possible the entry rollbf...
- 06:52 AM Bug #23386 (Pending Backport): crush device class: Monitor Crash when moving Bucket into Default ...
- https://github.com/ceph/ceph/pull/22169
- 01:21 AM Bug #24037 (Pending Backport): osd: Assertion `!node_algorithms::inited(this->priv_value_traits(...
05/22/2018
- 09:55 PM Bug #24222 (Fix Under Review): Manager daemon y is unresponsive during teuthology cluster teardown
- https://github.com/ceph/ceph/pull/22158
- 02:20 AM Bug #24222 (Resolved): Manager daemon y is unresponsive during teuthology cluster teardown
- ...
- 08:47 PM Feature #24232 (Fix Under Review): Add new command ceph mon status
- Add new command ceph mon status
For more information please check - https://tracker.ceph.com/issues/24217
Changed...
- 08:32 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
- Josh Durgin wrote:
> Casey, could you or someone else familiar with rgw look through the logs for this and identify ...
- 03:19 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
- Casey, could you or someone else familiar with rgw look through the logs for this and identify the relevant OSD reque...
- 07:17 PM Feature #24231 (New): librbd/libcephfs/librgw should ignore rados_mon/osd_op_timeouts options (re...
- 07:17 PM Feature #24231 (New): librbd/libcephfs/librgw should ignore rados_mon/osd_op_timeouts options (re...
- librbd/libcephfs/librgw should ignore rados_mon/osd_op_timeouts options
https://bugzilla.redhat.com/show_bug.cgi?id=... - 04:09 PM Bug #24025 (In Progress): RocksDB compression is not supported at least on Debian.
- ...
- 03:48 PM Bug #24037 (Fix Under Review): osd: Assertion `!node_algorithms::inited(this->priv_value_traits(...
- https://github.com/ceph/ceph/pull/22156
- 02:35 PM Bug #24229 (Triaged): Libradosstriper successfully removes nonexistent objects instead of returni...
- libradosstriper remove() call on nonexistent objects returns zero instead of ENOENT.
Tested on luminous 12.2.5-1xe...
- 11:35 AM Feature #24099: osd: Improve workflow when creating OSD on raw block device if there was bluestor...
> Point out that it found existing data on the OSD, and possibly suggest using `ceph-volume lvm zap` if that's what...
- 10:51 AM Bug #24199 (Fix Under Review): common: JSON output from rados bench write has typo in max_latency...
- 07:00 AM Bug #23371: OSDs flaps when cluster network is made down
- we have not observed this behavior in kraken.
whenever the cluster interface is made down, a few OSDs which go do...
- 03:55 AM Bug #23352: osd: segfaults under normal operation
- OSD log attached
- 03:15 AM Bug #23352: osd: segfaults under normal operation
- It's an internal comment for others looking at this - though if you (Alex) have an osd log to go with the 'MMgrReport...
- 02:59 AM Bug #23352: osd: segfaults under normal operation
- 02:59 AM Bug #23352: osd: segfaults under normal operation
- Josh, is this something I can extract from the OSD node for you, or is this an internal comment?
- 01:10 AM Bug #23352: osd: segfaults under normal operation
- I put the core file from comment #14 and binaries from 12.2.5 in senta02:/slow/jdurgin/ceph/bugs/tracker_23352/2018-0...
- 03:49 AM Backport #24059 (In Progress): luminous: Deleting a pool with active notify linger ops can result...
- 03:49 AM Backport #24059 (In Progress): luminous: Deleting a pool with active notify linger ops can result...
- https://github.com/ceph/ceph/pull/22143
05/21/2018
- 10:04 PM Bug #24219: osd: InProgressOp freed by on_change(); in-flight op may use-after-free in op_commit()
- /a/teuthology-2018-05-21_20:00:50-powercycle-mimic-distro-basic-smithi/2563192
powercycle/osd/{clusters/3osd-1per-...
- 09:40 PM Bug #24219 (Fix Under Review): osd: InProgressOp freed by on_change(); in-flight op may use-after...
- https://github.com/ceph/ceph/pull/22133
- 09:28 PM Bug #24219 (Resolved): osd: InProgressOp freed by on_change(); in-flight op may use-after-free in...
- ...
- 07:29 PM Bug #22330 (Need More Info): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- need to capture some logs...
- 07:15 PM Bug #23031: FAILED assert(!parent->get_log().get_missing().is_missing(soid))
- I hit this issue a couple of times while trying to reproduce #23614...
- 06:36 PM Backport #24200 (Resolved): mimic: PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
- 08:48 AM Backport #24200 (Resolved): mimic: PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
- 06:24 PM Bug #23386 (Fix Under Review): crush device class: Monitor Crash when moving Bucket into Default ...
- https://github.com/ceph/ceph/pull/22127
- 05:14 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- reproduces on luminous with...
- 01:52 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
- I suspect the recent pr https://github.com/ceph/ceph/pull/22091 fixed this, but figuring out how to reproduce to be s...
- 05:59 PM Bug #23965 (Fix Under Review): FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part...
- https://github.com/ceph/ceph/pull/22126 removes ec-cache pools from the rgw suite
- 04:55 PM Bug #22656: scrub mismatch on bytes (cache pools)
- http://qa-proxy.ceph.com/teuthology/dzafman-2018-05-18_11:33:31-rados-wip-zafman-testing-mimic-distro-basic-smithi/25...
- 04:21 PM Backport #22934: luminous: filestore journal replay does not guard omap operations
- Victor Denisov wrote:
> https://github.com/ceph/ceph/pull/21547
merged
- 04:13 PM Backport #23925: luminous: assert on pg upmap
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21818
merged
- 04:01 PM Backport #24213 (In Progress): mimic: Module 'balancer' has failed: could not find bucket -14
- 03:59 PM Backport #24213 (Resolved): mimic: Module 'balancer' has failed: could not find bucket -14
- https://github.com/ceph/ceph/pull/22120
- 03:59 PM Backport #24216 (Resolved): luminous: "process (unknown)" in ceph logs
- https://github.com/ceph/ceph/pull/22290
- 03:59 PM Backport #24215 (Resolved): mimic: "process (unknown)" in ceph logs
- https://github.com/ceph/ceph/pull/22311
- 03:59 PM Backport #24214 (Resolved): luminous: Module 'balancer' has failed: could not find bucket -14
- https://github.com/ceph/ceph/pull/22308
- 03:03 PM Bug #23585 (Triaged): osd: safe_timer segfault
- 02:17 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- We are experiencing this too. Majority of the OSDs went down. We tried removing the intervals. It works on some OSDs ...
- 01:44 PM Bug #24167: Module 'balancer' has failed: could not find bucket -14
- mimic backport: https://github.com/ceph/ceph/pull/22120
- 01:42 PM Bug #24167 (Pending Backport): Module 'balancer' has failed: could not find bucket -14
- 01:00 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
- Hi.
We have the same issue: ...
- 12:07 PM Bug #24123 (Pending Backport): "process (unknown)" in ceph logs
- 09:50 AM Backport #24048 (In Progress): luminous: pg-upmap cannot balance in some case
- https://github.com/ceph/ceph/pull/22115
- 09:43 AM Bug #24199: common: JSON output from rados bench write has typo in max_latency key
- PR: https://github.com/ceph/ceph/pull/22112
- 06:23 AM Bug #24199 (Resolved): common: JSON output from rados bench write has typo in max_latency key
- The JSON output from `rados bench write --format json/json-pretty` has a typo in the `max_latency` key.
It contains ...
- 08:48 AM Backport #24204 (Resolved): mimic: LibRadosMiscPool.PoolCreationRace segv
- https://github.com/ceph/ceph/pull/22291
- 08:43 AM Bug #24174: PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
- mimic: https://github.com/ceph/ceph/pull/22113
- 08:41 AM Bug #24174 (Pending Backport): PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
- 07:11 AM Bug #24076 (Duplicate): rados/test.sh fails in "bin/ceph_test_rados_api_misc --gtest_filter=*Pool...
- 06:24 AM Backport #24198 (In Progress): luminous: mon: slow op on log message
- 06:23 AM Backport #24198 (Resolved): luminous: mon: slow op on log message
- https://github.com/ceph/ceph/pull/22109
- 06:20 AM Backport #24195 (Resolved): mimic: mon: slow op on log message
- 02:51 AM Bug #20924: osd: leaked Session on osd.7
- osd.4
/a/sage-2018-05-20_18:11:15-rados-wip-sage3-testing-2018-05-20-1031-distro-basic-smithi/2558319
rados/ver...
- 02:24 AM Bug #24150 (Pending Backport): LibRadosMiscPool.PoolCreationRace segv
05/20/2018
- 06:58 PM Bug #18239 (Duplicate): nan in ceph osd df again
- 10:32 AM Bug #24023: Segfault on OSD in 12.2.5
- Alexander M wrote:
> Alex Gorbachev wrote:
> > This continues to happen every day, usually during scrub
>
> I've...
- 10:30 AM Bug #24023: Segfault on OSD in 12.2.5
- Alex Gorbachev wrote:
> This continues to happen every day, usually during scrub
I've faced the same issue
...
- 09:45 AM Backport #24195 (In Progress): mimic: mon: slow op on log message
- https://github.com/ceph/ceph/pull/22104
- 09:42 AM Backport #24195 (Resolved): mimic: mon: slow op on log message
- 09:40 AM Bug #24180 (Pending Backport): mon: slow op on log message
05/19/2018
- 07:04 PM Bug #24192 (Duplicate): cluster [ERR] Corruption detected: object 2:f59d1934:::smithi14913526-582...
davidz@teuthology:/a/dzafman-2018-05-18_11:36:58-rados-wip-zafman-testing-distro-basic-smithi/2549009...
05/18/2018
- 08:45 PM Bug #24180: mon: slow op on log message
- https://github.com/ceph/ceph/pull/22098
- 08:44 PM Bug #24180 (Fix Under Review): mon: slow op on log message
- https://github.com/ceph/ceph/pull/22098
- 08:41 PM Bug #24180 (Resolved): mon: slow op on log message
- ...
- 08:37 PM Bug #20924: osd: leaked Session on osd.7
- osd.7
/a/sage-2018-05-18_16:20:24-rados-wip-sage-testing-2018-05-18-0817-distro-basic-smithi/2548324
rados/veri...
- 02:26 PM Bug #20924: osd: leaked Session on osd.7
- osd.7
/a/sage-2018-05-18_13:08:19-rados-wip-sage2-testing-2018-05-17-0701-distro-basic-smithi/2546923
rados/ver...
- 08:16 PM Backport #24149 (Resolved): mimic: Eviction still raced with scrub due to preemption
- 07:24 PM Bug #24162 (Fix Under Review): control-c on ceph cli leads to segv
- hacky workaround: https://github.com/ceph/ceph/pull/22093
- 07:18 PM Bug #24162: control-c on ceph cli leads to segv
- ...
- 07:09 PM Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value...
- related?...
- 01:26 PM Bug #24037 (In Progress): osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_...
- 01:15 PM Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value...
- Scenario I can see after static analysis:
1. An instance of `TrackedOp` in `STATE_LIVE` is being dereferenced - th...
- 06:59 PM Bug #23352: osd: segfaults under normal operation
- The latest ones look like this, below.
Crash dump at https://drive.google.com/open?id=12v95-TCHlkrBZ16ni5UkhYkXRt...
- 06:41 PM Bug #23352: osd: segfaults under normal operation
- For some reason we are also seeing more of these happening, simultaneous failures and recoveries are occurring during...
- 02:36 AM Bug #23352: osd: segfaults under normal operation
- I ran into this issue with 12.2.5; it affects cluster stability heavily.
- 06:12 PM Bug #24167 (Fix Under Review): Module 'balancer' has failed: could not find bucket -14
- https://github.com/ceph/ceph/pull/22091
- 05:02 PM Feature #24176 (Resolved): osd: add command to drop OSD cache
- Idea here is to basically make it possible for performance testing on the same data set in RADOS without restarting t...
- 04:24 PM Feature #22420 (Resolved): Add support for obtaining a list of available compression options
- 04:04 PM Bug #23487 (Resolved): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- 04:04 PM Backport #23668 (Resolved): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrites'...
- 04:03 PM Bug #23664 (Resolved): cache-try-flush hits wrlock, busy loops
- 04:03 PM Backport #23914 (Resolved): luminous: cache-try-flush hits wrlock, busy loops
- 04:02 PM Bug #23860 (Resolved): luminous->master: luminous crashes with AllReplicasRecovered in Started/Pr...
- 04:02 PM Backport #23988 (Resolved): luminous: luminous->master: luminous crashes with AllReplicasRecovere...
- 04:02 PM Bug #23980 (Resolved): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap const&)
- 04:01 PM Backport #24015 (Resolved): luminous: UninitCondition in PG::RecoveryState::Incomplete::react(PG:...
- 02:30 PM Backport #24135 (Resolved): mimic: Add support for obtaining a list of available compression options
- 02:25 PM Bug #24174: PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
- https://github.com/ceph/ceph/pull/22084
- 02:24 PM Bug #24174 (Resolved): PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
- ...
05/17/2018
- 10:22 PM Bug #24167: Module 'balancer' has failed: could not find bucket -14
- It looks like we also don't create weight-sets for new buckets. And if you create buckets and move things into them ...
- 09:58 PM Bug #24167 (Resolved): Module 'balancer' has failed: could not find bucket -14
- crushmap may contain choose_args for deleted buckets...
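Note: a hedged sketch of the cleanup this implies (hypothetical containers, not the real CrushWrapper API): prune choose_args weight-set entries whose bucket id no longer exists in the crush map, so the balancer never looks up a deleted bucket such as -14.

#include <map>
#include <set>
#include <vector>

using BucketId = int;                   // crush bucket ids are negative
using WeightSet = std::vector<double>;  // per-item weights for a bucket

void prune_stale_choose_args(std::map<BucketId, WeightSet> &choose_args,
                             const std::set<BucketId> &live_buckets) {
    for (auto it = choose_args.begin(); it != choose_args.end();) {
        if (!live_buckets.count(it->first))
            it = choose_args.erase(it);  // bucket was deleted; drop entry
        else
            ++it;
    }
}

int main() {
    std::map<BucketId, WeightSet> args{{-1, {1.0}}, {-14, {0.5}}};
    prune_stale_choose_args(args, {-1});  // bucket -14 no longer exists
    return args.count(-14) ? 1 : 0;
}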
- 05:39 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
- 03:52 PM Bug #23763 (Resolved): upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
- 03:52 PM Backport #23808 (Resolved): luminous: upgrade: bad pg num and stale health status in mixed lumnio...
- 03:42 PM Backport #23808: luminous: upgrade: bad pg num and stale health status in mixed lumnious/mimic cl...
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21556
merged
- 03:45 PM Bug #24162 (Resolved): control-c on ceph cli leads to segv
- ...
- 03:43 PM Backport #23668: luminous: There is no 'ceph osd pool get erasure allow_ec_overwrites' command
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21378
merged
- 03:42 PM Backport #23914: luminous: cache-try-flush hits wrlock, busy loops
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21764
merged
- 03:41 PM Backport #23988: luminous: luminous->master: luminous crashes with AllReplicasRecovered in Starte...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21964
merged
- 03:40 PM Backport #23988: luminous: luminous->master: luminous crashes with AllReplicasRecovere...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21964
merged
- 03:38 PM Backport #24015: luminous: UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap con...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21993
merged
- 01:55 PM Backport #23786 (Resolved): luminous: "utilities/env_librados.cc:175:33: error: unused parameter ...
- 01:55 PM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 01:50 PM Bug #23145: OSD crashes during recovery of EC pg
- Peter Woodman wrote:
> Each OSD is on its own host- these are small arm64 machines. Unfortunately i've already tried...
- 11:49 AM Bug #24159 (Duplicate): Monitor down when large store data needs to compact triggered by ceph tel...
- 10:38 AM Bug #24159 (Duplicate): Monitor down when large store data needs to compact triggered by ceph tel...
- I have hit a monitor problem with the store capacity growing too large in our production environment.
This logical volume for monito...
- 10:38 AM Bug #24160 (Resolved): Monitor down when large store data needs to compact triggered by ceph tell...
- I have hit a monitor problem with the store capacity growing too large in our production environment.
This logical volume for monito...
- 09:04 AM Bug #23598 (Duplicate): hammer->jewel: ceph_test_rados crashes during radosbench task in jewel ra...
- #23290 does not contain any of the PRs mentioned above, so it's not a regression.
- 08:33 AM Backport #24153 (In Progress): luminous: Eviction still raced with scrub due to preemption
- 08:33 AM Backport #24149 (In Progress): mimic: Eviction still raced with scrub due to preemption
- 08:33 AM Backport #24149 (New): mimic: Eviction still raced with scrub due to preemption
- 08:27 AM Bug #23962 (Resolved): ceph_daemon.py format_dimless units list index out of range
- 08:26 AM Bug #24000 (Resolved): mon: snap delete on deleted pool returns 0 without proper payload
- 08:25 AM Bug #23899 (Resolved): run cmd 'ceph daemon osd.0 smart' cause osd daemon Segmentation fault
- 07:37 AM Backport #23316 (In Progress): jewel: pool create cmd's expected_num_objects is not correctly int...
05/16/2018
- 10:29 PM Backport #24153: luminous: Eviction still raced with scrub due to preemption
- I'm pulling in these pull requests also on top of existing pull request (not yet merged) https://github.com/ceph/ceph...
- 10:26 PM Backport #24153 (Resolved): luminous: Eviction still raced with scrub due to preemption
- https://github.com/ceph/ceph/pull/22044
- 08:52 PM Bug #24150 (Fix Under Review): LibRadosMiscPool.PoolCreationRace segv
- https://github.com/ceph/ceph/pull/22042
- 08:51 PM Bug #24150 (Resolved): LibRadosMiscPool.PoolCreationRace segv
- ...
- 08:36 PM Backport #24149 (Resolved): mimic: Eviction still raced with scrub due to preemption
- https://github.com/ceph/ceph/pull/22041
- 08:28 PM Bug #24045 (Pending Backport): Eviction still raced with scrub due to preemption
- 07:31 PM Bug #24148: Segmentation fault out of ObcLockManager::get_lock_type()
- The pg 3.3 involved here was never scrubbed, so unrelated to my changes.
- 07:16 PM Bug #24148 (Duplicate): Segmentation fault out of ObcLockManager::get_lock_type()
teuthology:/a/dzafman-2018-05-16_09:57:45-rados:thrash-wip-zafman-testing-distro-basic-smithi/2539708
remote/smi...
- 06:53 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
- kobi ginon wrote:
> Note: i still believe there is a relation to rocksdb somehow and the clearing of disk's forces t...
- 03:52 PM Backport #24027 (Resolved): mimic: ceph_daemon.py format_dimless units list index out of range
- 03:51 PM Backport #24103 (Resolved): mimic: mon: snap delete on deleted pool returns 0 without proper payload
- 03:50 PM Backport #24104 (Resolved): mimic: run cmd 'ceph daemon osd.0 smart' cause osd daemon Segmentatio...
- 03:49 PM Bug #24145 (Duplicate): osdmap decode error in rados/standalone/*
- ...
- 12:03 PM Feature #24099: osd: Improve workflow when creating OSD on raw block device if there was bluestor...
- This is not a ceph-volume issue, the description of this issue doesn't point to a ceph-volume operation, but rather, ...
05/15/2018
- 10:58 PM Bug #23145: OSD crashes during recovery of EC pg
- Each OSD is on its own host- these are small arm64 machines. Unfortunately i've already tried stopping osd6, it just ...
- 10:37 PM Bug #23145: OSD crashes during recovery of EC pg
- Hmm, it's possible that if you stop osd.6 that this PG will be able to peer with the remaining OSDs... want to give i...
- 10:34 PM Bug #23145: OSD crashes during recovery of EC pg
- Peter Woodman wrote:
> For the record, I discovered recently that a number of OSDs were operating with write caching... - 10:33 PM Bug #23145: OSD crashes during recovery of EC pg
- Hmm, I think the problem comes before that. This is problematic:...
- 10:20 PM Bug #23145: OSD crashes during recovery of EC pg
- For the record, I discovered recently that a number of OSDs were operating with write caching enabled, and because th...
- 10:15 PM Bug #23145: OSD crashes during recovery of EC pg
- This code appears to be the culprit, at least in this case:...
- 02:48 PM Bug #24023: Segfault on OSD in 12.2.5
- This continues to happen every day, usually during scrub
- 01:15 PM Backport #24135 (In Progress): mimic: Add support for obtaining a list of available compression o...
- 01:13 PM Backport #24135 (Resolved): mimic: Add support for obtaining a list of available compression options
- https://github.com/ceph/ceph/pull/22004
- 12:29 PM Feature #22448 (Resolved): Visibility for snap trim queue length
- Already merged to master, luminous and jewel.
- 12:28 PM Backport #22449 (Resolved): jewel: Visibility for snap trim queue length
- 10:44 AM Bug #23767: "ceph ping mon" doesn't work
- Confirmed on my cluster (13.0.2-1969-g49365c7).
- 10:37 AM Fix #24126: ceph osd purge command error message improvement
- How are you seeing that ugly logfile style output? When I try it, it looks like this:...
- 10:32 AM Feature #24127: "osd purge" should print more helpful message when daemon is up
- This is completely reasonable as a general point, but not really actionable as a tracker ticket -- we aren't ever goi...
- 10:31 AM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
- I can't post using ceph-post-file, so I uploaded file here https://eocloud.eu:8080/swift/v1/rwadolowski/ceph-osd.33.l...
- 06:31 AM Bug #24007: rados.connect get a segmentation fault
- John Spray wrote:
> Is there a backtrace or any other message from the crash?
there are many different backtraces. - 03:15 AM Backport #24015 (In Progress): luminous: UninitCondition in PG::RecoveryState::Incomplete::react(...
- https://github.com/ceph/ceph/pull/21993
05/14/2018
- 09:59 PM Bug #23145: OSD crashes during recovery of EC pg
- Happy to see action on this ticket. For the record, I still have the data for this pg.
- 09:38 PM Bug #22837 (Resolved): discover_all_missing() not always called during activating
- 09:36 PM Bug #23576 (Can't reproduce): osd: active+clean+inconsistent pg will not scrub or repair
- 06:12 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- Sorry for the lack of updates, there were no messages of any sort in the logs when attempting to deep scrub or repair...
- 04:26 PM Bug #23576: osd: active+clean+inconsistent pg will not scrub or repair
- No, I never had that message in any of our logs. After a month the PGs ran their own deep-scrub again and I was able...
- 09:29 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
- Hi again,
indeed your method also works.
In my simple test I just cleared 2 GB out of the disk
before zap setting ...
- 08:45 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
- Hi Jon, thanks a lot for the reply.
I've been fighting with this issue for a day now, and I have a very strange observation
...
- 08:01 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
- I think I ran into the same thing last week reusing an OSD disk. I did a dd of /dev/zero to the disk for ~10-15 minu...
- 02:38 AM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
- Hi all,
I'm using the following version: ceph-12.2.2-0.el7.x86_64.
It seems that even with dd of 100MB or 110MB
i s...
- 07:22 PM Feature #24127 (New): "osd purge" should print more helpful message when daemon is up
- Compilers like GCC and clang are sometimes able to make suggestions when a user makes certain
common mistakes for wh...
- 07:19 PM Fix #24126 (New): ceph osd purge command error message improvement
- In response to the command "ceph osd purge 1 --yes-i-really-mean-it", we
get:
2018-05-10 15:18:03.444 7f29c0ae2700...
- 06:54 PM Bug #24123 (Fix Under Review): "process (unknown)" in ceph logs
- PR: https://github.com/ceph/ceph/pull/21985
- 06:47 PM Bug #24123 (Resolved): "process (unknown)" in ceph logs
- get_process_name from libcommon was broken when cleaning up headers (95fc248). As a result we don't log process name ...
- 02:44 PM Bug #24007: rados.connect get a segmentation fault
- Is there a backtrace or any other message from the crash?
- 10:42 AM Bug #24077 (Resolved): test_pool_create_fail (tasks.mgr.dashboard.test_pool.PoolTest) fails
- 08:05 AM Backport #23912 (In Progress): luminous: mon: High MON cpu usage when cluster is changing
- 06:52 AM Backport #23912: luminous: mon: High MON cpu usage when cluster is changing
- -http://tracker.ceph.com/issues/23912-
https://github.com/ceph/ceph/pull/21968
- 04:20 AM Backport #23988 (In Progress): luminous: luminous->master: luminous crashes with AllReplicasRecov...
- https://github.com/ceph/ceph/pull/21964