Activity

From 05/15/2018 to 06/13/2018

06/13/2018

10:01 PM Backport #24198: luminous: mon: slow op on log message
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22109
merged
Yuri Weinstein
10:00 PM Backport #24216: luminous: "process (unknown)" in ceph logs
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22290
merged
Yuri Weinstein
09:59 PM Backport #24214: luminous: Module 'balancer' has failed: could not find bucket -14
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22308
merged
Yuri Weinstein
08:13 PM Bug #24515 (New): "[WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, mon.c ha...
This seems to be RHEL-specific.
Run: http://pulpito.ceph.com/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smith...
Yuri Weinstein
05:19 PM Bug #23966 (Resolved): Deleting a pool with active notify linger ops can result in seg fault
Nathan Cutler
05:19 PM Backport #24059 (Resolved): luminous: Deleting a pool with active notify linger ops can result in...
Nathan Cutler
04:46 PM Backport #24468 (In Progress): mimic: tell ... config rm <foo> not idempotent
Nathan Cutler
04:35 PM Backport #24245 (Resolved): luminous: Manager daemon y is unresponsive during teuthology cluster ...
Nathan Cutler
04:34 PM Backport #24374 (Resolved): luminous: mon: auto compaction on rocksdb should kick in more often
Nathan Cutler
12:56 PM Bug #24511 (Duplicate): osd crashed at thread_name:safe_timer
h1. ENV
*ceph version*...
Lei Liu
11:29 AM Bug #23049: ceph Status shows only WARN when traffic to cluster fails
Hi,
which release is the fix expected in?
Thanks,
Nokia ceph-users
10:16 AM Backport #24501 (In Progress): luminous: osd: eternal stuck PG in 'unfound_recovery'
Nathan Cutler
10:16 AM Backport #24500 (In Progress): mimic: osd: eternal stuck PG in 'unfound_recovery'
Nathan Cutler

06/12/2018

08:01 AM Backport #24501 (Resolved): luminous: osd: eternal stuck PG in 'unfound_recovery'
https://github.com/ceph/ceph/pull/22546 Nathan Cutler
08:01 AM Backport #24500 (Resolved): mimic: osd: eternal stuck PG in 'unfound_recovery'
https://github.com/ceph/ceph/pull/22545 Nathan Cutler
08:00 AM Backport #24495 (Resolved): luminous: osd: segv in Session::have_backoff
https://github.com/ceph/ceph/pull/22729 Nathan Cutler
08:00 AM Backport #24494 (Resolved): mimic: osd: segv in Session::have_backoff
https://github.com/ceph/ceph/pull/22730 Nathan Cutler
03:22 AM Bug #24486 (Pending Backport): osd: segv in Session::have_backoff
Sage Weil

06/11/2018

09:32 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I am going to add this test for upgrade as well, steps to recreate... Vasu Kulkarni
04:19 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I have also experienced this issue while continuing the Bluestore conversion of OSDs on my Ceph cluster, after carryi... Gavin Baker
02:16 PM Backport #24059: luminous: Deleting a pool with active notify linger ops can result in seg fault
Casey Bodley wrote:
> https://github.com/ceph/ceph/pull/22143
merged
Yuri Weinstein
02:33 AM Bug #24487: osd: choose_acting loop
It looks like the "choose_async_recovery_ec candidates by cost are: 178,2(0)" line is different in the second case.. ... Sage Weil
01:45 AM Bug #24487 (Resolved): osd: choose_acting loop
ec pg looping between [2,3,0,1] and [-,3,0,1].
osd.3 says...
Sage Weil

06/10/2018

06:41 PM Bug #24486 (Fix Under Review): osd: segv in Session::have_backoff
https://github.com/ceph/ceph/pull/22497 Sage Weil
06:34 PM Bug #24486 (Resolved): osd: segv in Session::have_backoff
... Sage Weil
04:41 PM Bug #24485 (Resolved): LibRadosTwoPoolsPP.ManifestUnset failure
... Sage Weil
03:30 PM Bug #24484 (Fix Under Review): osdc: wrong offset in BufferHead
Kefu Chai
03:15 PM Bug #24484: osdc: wrong offset in BufferHead
This bug leads to a "buffer::end_of_buffer" exception, which is thrown in "buffer::list::substr_of".
Thi...
dongdong tao
03:08 PM Bug #24484: osdc: wrong offset in BufferHead
PR: https://github.com/ceph/ceph/pull/22495 dongdong tao
03:07 PM Bug #24484 (Resolved): osdc: wrong offset in BufferHead
The offset of BufferHead should be "opos - bh->start()" dongdong tao
02:12 AM Backport #24329 (In Progress): mimic: assert manager.get_num_active_clean() == pg_num on rados/si...
Kefu Chai

06/09/2018

07:21 PM Bug #24321 (Pending Backport): assert manager.get_num_active_clean() == pg_num on rados/singleton...
Sage Weil
05:56 AM Bug #24321 (Fix Under Review): assert manager.get_num_active_clean() == pg_num on rados/singleton...
https://github.com/ceph/ceph/pull/22485 Kefu Chai
06:50 PM Bug #22462: mon: unknown message type 1537 in luminous->mimic upgrade tests
Maybe I have the same issue during the Jewel->Luminous upgrade: http://tracker.ceph.com/issues/24481?next_issue_id=24480&p... Aleksandr Rudenko
02:23 PM Bug #24373 (Pending Backport): osd: eternal stuck PG in 'unfound_recovery'
Kefu Chai
11:20 AM Backport #24478 (Resolved): luminous: read object attrs failed at EC recovery
https://github.com/ceph/ceph/pull/24327 Nathan Cutler
11:18 AM Backport #24473 (Resolved): mimic: cosbench stuck at booting cosbench driver
https://github.com/ceph/ceph/pull/22887 Nathan Cutler
11:18 AM Backport #24472 (Resolved): mimic: Ceph-osd crash when activate SPDK
https://github.com/ceph/ceph/pull/22684 Nathan Cutler
11:18 AM Backport #24471 (Resolved): luminous: Ceph-osd crash when activate SPDK
https://github.com/ceph/ceph/pull/22686 Nathan Cutler
11:18 AM Backport #24468 (Resolved): mimic: tell ... config rm <foo> not idempotent
https://github.com/ceph/ceph/pull/22552 Nathan Cutler
06:07 AM Bug #24452 (Resolved): Backfill hangs in a test case in master not mimic
Kefu Chai

06/08/2018

11:03 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I can't reproduce this on any new Mimic cluster, it only happens on clusters upgraded from Luminous (which is why we ... Paul Emmerich
09:04 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I'm trying to make new OSDs with ceph-volume osd create --dmcrypt --bluestore --data /dev/sdg and am getting the same... Michael Sudnick
07:05 PM Bug #24454 (Duplicate): failed to recover before timeout expired
#24452 Sage Weil
12:29 PM Bug #24454 (Duplicate): failed to recover before timeout expired
tons of this on current master
http://pulpito.ceph.com/kchai-2018-06-06_04:56:43-rados-wip-kefu-testing-2018-06-06...
Sage Weil
07:05 PM Bug #24452 (Fix Under Review): Backfill hangs in a test case in master not mimic
https://github.com/ceph/ceph/pull/22478 Sage Weil
02:48 PM Bug #24452: Backfill hangs in a test case in master not mimic

Final messages on primary during backfill about pg 1.0....
David Zafman
04:57 AM Bug #24452 (Resolved): Backfill hangs in a test case in master not mimic

../qa/run-standalone.sh "osd-backfill-stats.sh TEST_backfill_down_out" 2>&1 | tee obs.log
This test times out wa...
David Zafman
02:34 PM Backport #23912: luminous: mon: High MON cpu usage when cluster is changing
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21968
merged
Yuri Weinstein
02:33 PM Backport #24245: luminous: Manager daemon y is unresponsive during teuthology cluster teardown
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22331
merged
Yuri Weinstein
02:31 PM Backport #24374: luminous: mon: auto compaction on rocksdb should kick in more often
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22360
merged
Yuri Weinstein
08:18 AM Bug #23352: osd: segfaults under normal operation
Experiencing a safe_timer segfault with a freshly deployed cluster. No data on the cluster yet. Just an empty poo... Vangelis Tasoulas

06/07/2018

03:20 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
We are also seeing this when creating OSDs with IDs that existed previously.
I verified that the old osd was delet...
Paul Emmerich
01:21 PM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
https://github.com/ceph/ceph/pull/22456 Sage Weil
01:14 PM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
Okay, I see the problem. Two fixes: first, reset every pg on down->up (simpler approach), but the bigger issue is th... Sage Weil
12:58 PM Bug #24450: OSD Caught signal (Aborted)
I have the same problem.
http://tracker.ceph.com/issues/24423
Sergey Malinin
12:03 PM Bug #24450 (Duplicate): OSD Caught signal (Aborted)
Hi,
I have done a rolling_upgrade to mimic with ceph-ansible. It works perfectly! Now, I want to deploy new OSDs, bu...
Peter Schulz
11:46 AM Bug #24448 (Won't Fix): (Filestore) ABRT report for package ceph has reached 10 occurrences
https://retrace.fedoraproject.org/faf/reports/bthash/fe768f98e5fff65f0c850668c4bdae8d4da7e086/
https://retrace.fedor...
Kaleb KEITHLEY

06/06/2018

09:11 PM Bug #24264 (Closed): ssd-primary crush rule not working as intended
I don't think there's a good way to express that requirement in the current crush language. The rule in the docs does... Josh Durgin
09:06 PM Bug #24362 (Triaged): ceph-objectstore-tool incorrectly invokes crush_location_hook
Seems like the way to fix this is to stop ceph-objectstore-tool from trying to use the crush location hook at all.
...
Josh Durgin
07:15 AM Bug #23145: OSD crashes during recovery of EC pg
-3> 2018-06-06 15:00:40.462930 7fffddb25700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2... Yong Wang
02:45 AM Bug #23145: OSD crashes during recovery of EC pg
@Sage Weil
@Zengran Zahng
we are hitting the same issue, and the crashed OSD has not recovered until now.
env is 12.2.5 ec 2+1 b...
Yong Wang
06:02 AM Backport #24293 (In Progress): jewel: mon: slow op on log message
https://github.com/ceph/ceph/pull/22431 Prashant D
02:34 AM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
Attached full log (download ceph-osd.3.log.gz).
Points are:...
Kouya Shimura
12:33 AM Bug #24371 (Pending Backport): Ceph-osd crash when activate SPDK
Kefu Chai

06/05/2018

05:34 PM Bug #24365 (Pending Backport): cosbench stuck at booting cosbench driver
Neha Ojha
01:33 AM Bug #24365 (Fix Under Review): cosbench stuck at booting cosbench driver
https://github.com/ceph/ceph/pull/22405 Neha Ojha
04:04 PM Bug #24408 (Pending Backport): tell ... config rm <foo> not idempotent
Kefu Chai
11:00 AM Bug #24423 (Resolved): failed to load OSD map for epoch X, got 0 bytes
After upgrading to Mimic I deleted a non-lvm OSD and recreated it with 'ceph-volume lvm prepare --bluestore --data /d... Sergey Malinin
10:37 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
Same as https://tracker.ceph.com/issues/21475, and I have already set bluestore_deferred_throttle_bytes = 0
bluest...
鹏 张
10:31 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
2018-06-05T17:46:28.273183+08:00 node54 ceph-osd: /work/build/rpmbuild/BUILD/infinity-3.2.5/src/os/bluestore/BlueStor... 鹏 张
10:31 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
鹏 张 wrote:
> ceph version: 12.2.5
> data pool use Ec module 2 + 1.
> When restart one osd,it case crash and restar...
鹏 张
10:26 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
1.-45> 2018-06-05 17:47:56.886142 7f8972974700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2)... 鹏 张
10:25 AM Bug #24422 (Duplicate): Ceph OSDs crashing in BlueStore::queue_transactions() using EC
ceph version: 12.2.5
The data pool uses an EC profile of 3 + 1.
Restarting one OSD causes it to crash and restart over and over.
...
鹏 张
04:42 AM Bug #24419 (Won't Fix): ceph-objectstore-tool unable to open mon store
Hi everyone,
I use luminous v12.2.5, and I am trying to recover the monitor database from the OSDs.
I perform step by step acc...
dovefi Z
03:32 AM Backport #24291 (In Progress): jewel: common: JSON output from rados bench write has typo in max_...
https://github.com/ceph/ceph/pull/22407 Prashant D
02:37 AM Bug #23875: Removal of snapshot with corrupt replica crashes osd

If update_snap_map() ignores the error from remove_oid() we still crash because an op from the primary related to...
David Zafman
02:20 AM Backport #24292 (In Progress): mimic: common: JSON output from rados bench write has typo in max_...
https://github.com/ceph/ceph/pull/22406 Prashant D

06/04/2018

06:32 PM Bug #24368: osd: should not restart on permanent failures
It would, but the previous settings were there for a reason so I'm not sure if it's feasible to backport this for cep... Greg Farnum
05:10 PM Bug #24371 (Fix Under Review): Ceph-osd crash when activate SPDK
Greg Farnum
04:00 PM Bug #24408 (Fix Under Review): tell ... config rm <foo> not idempotent
https://github.com/ceph/ceph/pull/22395 Sage Weil
03:56 PM Bug #24408 (Resolved): tell ... config rm <foo> not idempotent
... Sage Weil
02:56 PM Backport #24407 (In Progress): mimic: read object attrs failed at EC recovery
Kefu Chai
02:56 PM Backport #24407 (Resolved): mimic: read object attrs failed at EC recovery
https://github.com/ceph/ceph/pull/22394 Kefu Chai
02:54 PM Bug #24406 (Resolved): read object attrs failed at EC recovery
https://github.com/ceph/ceph/pull/22196 Kefu Chai
02:18 PM Backport #24290 (In Progress): luminous: common: JSON output from rados bench write has typo in m...
https://github.com/ceph/ceph/pull/22391 Prashant D
11:53 AM Bug #24366 (Pending Backport): omap_digest handling still not correct
Kefu Chai
06:27 AM Bug #23352: osd: segfaults under normal operation
Looking at the crash in http://tracker.ceph.com/issues/23352#note-14 there's a fairly glaring problem.... Brad Hubbard
12:14 AM Bug #23352: osd: segfaults under normal operation
Hi Kjetil,
Sure, worth a look, but AFAICT all access is protected by SafeTimer's locks.
Brad Hubbard
02:08 AM Backport #24258 (In Progress): luminous: crush device class: Monitor Crash when moving Bucket int...
https://github.com/ceph/ceph/pull/22381 Prashant D

06/02/2018

12:04 AM Bug #24365 (In Progress): cosbench stuck at booting cosbench driver
Two things caused this issue:
1. cosbench requires openjdk-8. The cbt task does install this dependency, but we al...
Neha Ojha

06/01/2018

08:05 PM Bug #23352: osd: segfaults under normal operation
Brad Hubbard wrote:
> I've confirmed that in all of the SafeTimer segfaults the 'schedule' multimap is empty, indica...
Kjetil Joergensen
06:01 PM Bug #24368: osd: should not restart on permanent failures
Sounds like something that would be useful in our stable releases - Greg, do you agree? Nathan Cutler
05:56 PM Backport #24360 (Need More Info): luminous: osd: leaked Session on osd.7
Do Not Backport For Now
see https://github.com/ceph/ceph/pull/22339#issuecomment-393574371 for details
Nathan Cutler
05:44 PM Backport #24383 (Resolved): mimic: osd: stray osds in async_recovery_targets cause out of order ops
https://github.com/ceph/ceph/pull/22889 Nathan Cutler
05:28 PM Backport #24381 (Resolved): luminous: omap_digest handling still not correct
https://github.com/ceph/ceph/pull/22375 David Zafman
05:28 PM Backport #24380 (Resolved): mimic: omap_digest handling still not correct
https://github.com/ceph/ceph/pull/22374 David Zafman
08:02 AM Bug #24342: Monitor's routed_requests leak
Greg Farnum wrote:
> What version are you running? The MRoute handling is all pretty old; though we've certainly dis...
Xuehan Xu
07:16 AM Bug #24373 (Fix Under Review): osd: eternal stuck PG in 'unfound_recovery'
Mykola Golub
05:22 AM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
https://github.com/ceph/ceph/pull/22358
Kouya Shimura
04:57 AM Bug #24373 (Resolved): osd: eternal stuck PG in 'unfound_recovery'
A PG might be eternally stuck in 'unfound_recovery' after some OSDs are marked down.
For example, the following st...
Kouya Shimura
06:12 AM Backport #24375 (In Progress): mimic: mon: auto compaction on rocksdb should kick in more often
Kefu Chai
06:11 AM Backport #24375 (Resolved): mimic: mon: auto compaction on rocksdb should kick in more often
https://github.com/ceph/ceph/pull/22361 Kefu Chai
06:10 AM Backport #24374 (In Progress): luminous: mon: auto compaction on rocksdb should kick in more often
Kefu Chai
06:08 AM Backport #24374 (Resolved): luminous: mon: auto compaction on rocksdb should kick in more often
https://github.com/ceph/ceph/pull/22360 Kefu Chai
06:08 AM Bug #24361 (Pending Backport): auto compaction on rocksdb should kick in more often
Kefu Chai
04:47 AM Bug #24371: Ceph-osd crash when activate SPDK
This is a bug in NVMEDevice; the fix has been committed.
Please review the PR: https://github.com/ceph...
Anonymous
02:02 AM Bug #24371: Ceph-osd crash when activate SPDK
I'm working on the issue. Anonymous
02:01 AM Bug #24371 (Resolved): Ceph-osd crash when activate SPDK
Enable SPDK and configure bluestore as mentioned in http://docs.ceph.com/docs/master/rados/configuration/bluestore-co... Anonymous
02:56 AM Feature #24363: Configure DPDK with mellanox NIC
Next, compilation passes, but none of the binaries can run.
Output error:
EAL: VFIO_RESOURCE_LIST tailq is already registere...
YongSheng Zhang
02:38 AM Feature #24363: Configure DPDK with mellanox NIC
Log details:
Mellanox NIC over fabric.
When compiling, output errors:
1. missing numa and cryptopp libraries
I ...
YongSheng Zhang
12:23 AM Feature #24363: Configure DPDK with mellanox NIC
Addendum:
NIC over optical fiber
YongSheng Zhang
12:07 AM Bug #24160 (Resolved): Monitor down when large store data needs to compact triggered by ceph tell...
Kefu Chai

05/31/2018

11:34 PM Bug #24368 (In Progress): osd: should not restart on permanent failures
https://github.com/ceph/ceph/pull/22349 has the simple restart interval change. Will investigate the options for cond... Greg Farnum
11:25 PM Bug #24368: osd: should not restart on permanent failures
See https://www.freedesktop.org/software/systemd/man/systemd.service.html#Restart= for the details on Restart options. Greg Farnum
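For context, a drop-in of roughly this shape (the values and file path here are made up for illustration, not taken from the PR) is how systemd can be told to stop resurrecting an OSD that keeps failing:

```ini
# Hypothetical drop-in: /etc/systemd/system/ceph-osd@.service.d/override.conf
[Unit]
# Give up if the unit fails 3 times within 30 minutes.
StartLimitIntervalSec=30min
StartLimitBurst=3

[Service]
# Restart on unclean exits only, rather than unconditionally.
Restart=on-failure
```

After editing, `systemctl daemon-reload` is needed for the override to take effect.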
11:17 PM Bug #24368 (Resolved): osd: should not restart on permanent failures
Last week at OpenStack I heard a few users report OSDs were not failing hard and fast as they should be on disk issue... Greg Farnum
07:01 PM Bug #24366 (In Progress): omap_digest handling still not correct
https://github.com/ceph/ceph/pull/22346 David Zafman
05:39 PM Bug #24366 (Resolved): omap_digest handling still not correct

When running bluestore the object info data_digest is not needed. In that case the omap_digest handling is still b...
David Zafman
06:08 PM Bug #24349 (Pending Backport): osd: stray osds in async_recovery_targets cause out of order ops
Josh Durgin
12:51 AM Bug #24349: osd: stray osds in async_recovery_targets cause out of order ops
https://github.com/ceph/ceph/pull/22330 Josh Durgin
12:46 AM Bug #24349 (Resolved): osd: stray osds in async_recovery_targets cause out of order ops
Related to https://tracker.ceph.com/issues/23827
http://pulpito.ceph.com/yuriw-2018-05-24_17:07:20-powercycle-mast...
Neha Ojha
05:07 PM Bug #24365 (Resolved): cosbench stuck at booting cosbench driver
... Neha Ojha
03:54 PM Bug #24342: Monitor's routed_requests leak
What version are you running? The MRoute handling is all pretty old; though we've certainly discovered a number of le... Greg Farnum
02:17 PM Feature #24363 (New): Configure DPDK with mellanox NIC
Hi all,
Does ceph-13.1.0 support DPDK on Mellanox NICs?
I found many issues when compiling. I even though handle t...
YongSheng Zhang
01:22 PM Bug #24362 (Triaged): ceph-objectstore-tool incorrectly invokes crush_location_hook
Ceph release being used: 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
/etc/ceph/ceph.conf c...
Roman Chebotarev
11:50 AM Backport #24359 (In Progress): mimic: osd: leaked Session on osd.7
Kefu Chai
07:39 AM Backport #24359 (Resolved): mimic: osd: leaked Session on osd.7
https://github.com/ceph/ceph/pull/22339 Nathan Cutler
09:40 AM Bug #24361 (Fix Under Review): auto compaction on rocksdb should kick in more often
https://github.com/ceph/ceph/pull/22337 Kefu Chai
09:07 AM Bug #24361 (Resolved): auto compaction on rocksdb should kick in more often
In RocksDB, by default, "max_bytes_for_level_base" is 256MB and "max_bytes_for_level_multiplier" is 10, so with this set... Kefu Chai
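As a hypothetical illustration of the sizing math behind those two options (the helper below is not Ceph code, just a sketch of how RocksDB derives per-level target capacities): with a 256 MiB base and a multiplier of 10, each level's target grows geometrically, which is why compaction into the lower levels triggers so rarely on a small store.

```python
# Illustrative sketch: RocksDB level target sizes derived from
# max_bytes_for_level_base and max_bytes_for_level_multiplier.
def level_target_bytes(level, base=256 << 20, multiplier=10):
    """Target capacity in bytes of level `level` (level >= 1)."""
    return base * multiplier ** (level - 1)

if __name__ == "__main__":
    for lvl in range(1, 4):
        gib = level_target_bytes(lvl) / (1 << 30)
        print(f"L{lvl} target: {gib:.2f} GiB")  # 0.25, 2.50, 25.00 GiB
```

With these defaults, L1 fills at 256 MiB while L2 does not compact until roughly 2.5 GiB accumulates, so a mon store that hovers below those thresholds sees little automatic compaction.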
07:39 AM Backport #24360 (Resolved): luminous: osd: leaked Session on osd.7
https://github.com/ceph/ceph/pull/29859 Nathan Cutler
07:38 AM Backport #24350 (In Progress): mimic: slow mon ops from osd_failure
Nathan Cutler
07:37 AM Backport #24350 (Resolved): mimic: slow mon ops from osd_failure
https://github.com/ceph/ceph/pull/22297 Nathan Cutler
07:38 AM Backport #24356 (Resolved): luminous: osd: pg hard limit too easy to hit
https://github.com/ceph/ceph/pull/22592 Nathan Cutler
07:38 AM Backport #24355 (Resolved): mimic: osd: pg hard limit too easy to hit
https://github.com/ceph/ceph/pull/22621 Nathan Cutler
07:37 AM Backport #24351 (Resolved): luminous: slow mon ops from osd_failure
https://github.com/ceph/ceph/pull/22568 Nathan Cutler
05:31 AM Bug #20924 (Pending Backport): osd: leaked Session on osd.7
I think https://github.com/ceph/ceph/pull/22292 indeed addresses this issue
https://github.com/ceph/ceph/pull/22384
Kefu Chai
04:51 AM Backport #24246 (In Progress): mimic: Manager daemon y is unresponsive during teuthology cluster ...
https://github.com/ceph/ceph/pull/22333 Prashant D
02:55 AM Backport #24245 (In Progress): luminous: Manager daemon y is unresponsive during teuthology clust...
https://github.com/ceph/ceph/pull/22331 Prashant D

05/30/2018

11:31 PM Bug #24160 (Fix Under Review): Monitor down when large store data needs to compact triggered by c...
Josh Durgin
10:45 PM Bug #23830: rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
This looks like a similar failure: http://pulpito.ceph.com/nojha-2018-05-30_20:43:02-rados-wip-async-up2-2018-05-30-d... Neha Ojha
02:17 PM Bug #24342: Monitor's routed_requests leak
It seems that this problem has been fixed by https://github.com/ceph/ceph/commit/39e06ef8f070e136e54452bdea3f6105cd79... Xuehan Xu
01:10 PM Bug #24342 (Closed): Monitor's routed_requests leak
Joao Eduardo Luis
12:09 PM Bug #24342: Monitor's routed_requests leak
Sorry, it seems that the latest version doesn't have this problem. Really sorry. Please close this. Xuehan Xu
09:36 AM Bug #24342: Monitor's routed_requests leak
https://github.com/ceph/ceph/pull/22315 Xuehan Xu
08:54 AM Bug #24342 (Closed): Monitor's routed_requests leak
Recently, we found that, in our non-leader monitors, there are a lot of routed requests that have not been recycled, a... Xuehan Xu
01:58 PM Bug #24327: osd: segv in pg_log_entry_t::encode()
Sage Weil wrote:
> This crash doesn't look familiar, and it's not clear to me what might cause segfault here. Do yo...
frank lin
01:48 PM Bug #24327 (Need More Info): osd: segv in pg_log_entry_t::encode()
This crash doesn't look familiar, and it's not clear to me what might cause segfault here. Do you have a core file? Sage Weil
01:55 PM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
Josh and I noticed this by code inspection. I'm nailing down out of space handling nits in the kernel client and wan... Ilya Dryomov
01:46 PM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
This is somewhat by design (or lack thereof)... the fail-safe check is there to prevent us from writing when we are *... Sage Weil
05:40 AM Backport #24215 (In Progress): mimic: "process (unknown)" in ceph logs
https://github.com/ceph/ceph/pull/22311 Prashant D
03:29 AM Backport #24214 (In Progress): luminous: Module 'balancer' has failed: could not find bucket -14
https://github.com/ceph/ceph/pull/22308 Prashant D

05/29/2018

11:01 PM Feature #23979: Limit pg log length during recovery/backfill so that we don't run out of memory.
Initial testing is referenced here: https://github.com/ceph/ceph/pull/21508 Josh Durgin
10:59 PM Bug #24243 (Pending Backport): osd: pg hard limit too easy to hit
https://github.com/ceph/ceph/pull/22187 Josh Durgin
10:59 PM Bug #24304 (Fix Under Review): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
wrong bug Josh Durgin
10:58 PM Bug #24304 (Pending Backport): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
https://github.com/ceph/ceph/pull/22187 Josh Durgin
10:03 PM Feature #11601: osd: share cached osdmaps across osd daemons
There is a vague possibility that the future seastar-based OSD may run each logical-disk OSD inside a single process, which co... Greg Farnum
07:38 PM Bug #24339 (New): FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in sca...
FULL_FORCE ops are dropped if fail-safe full check fails in do_op(). scan_requests() uses op->respects_full() which ... Ilya Dryomov
06:49 PM Bug #23646 (Resolved): scrub interaction with HEAD boundaries and clones is broken
David Zafman
01:11 PM Bug #24322 (Pending Backport): slow mon ops from osd_failure
mimic: https://github.com/ceph/ceph/pull/22297 Kefu Chai
12:53 PM Backport #24328 (In Progress): luminous: assert manager.get_num_active_clean() == pg_num on rados...
Kefu Chai
09:40 AM Backport #24328 (Resolved): luminous: assert manager.get_num_active_clean() == pg_num on rados/si...
https://github.com/ceph/ceph/pull/22296 Nathan Cutler
12:47 PM Backport #24329 (Resolved): mimic: assert manager.get_num_active_clean() == pg_num on rados/singl...
Kefu Chai
09:40 AM Backport #24329 (Resolved): mimic: assert manager.get_num_active_clean() == pg_num on rados/singl...
https://github.com/ceph/ceph/pull/22492 Nathan Cutler
10:02 AM Bug #22530 (Resolved): pool create cmd's expected_num_objects is not correctly interpreted
Nathan Cutler
10:02 AM Backport #23316 (Resolved): jewel: pool create cmd's expected_num_objects is not correctly interp...
Nathan Cutler
10:01 AM Backport #24058 (Resolved): jewel: Deleting a pool with active notify linger ops can result in se...
Nathan Cutler
09:59 AM Backport #24244 (Resolved): jewel: osd/EC: slow/hung ops in multimds suite test
Nathan Cutler
09:59 AM Backport #24244 (In Progress): jewel: osd/EC: slow/hung ops in multimds suite test
Nathan Cutler
09:56 AM Backport #24294 (Resolved): mimic: control-c on ceph cli leads to segv
Nathan Cutler
09:55 AM Backport #24294 (In Progress): mimic: control-c on ceph cli leads to segv
Nathan Cutler
09:52 AM Backport #24256 (Resolved): mimic: osd: Assertion `!node_algorithms::inited(this->priv_value_tra...
Nathan Cutler
09:41 AM Backport #24333 (Resolved): luminous: local_reserver double-reservation of backfilled pg
https://github.com/ceph/ceph/pull/23493 Nathan Cutler
09:41 AM Backport #24332 (Resolved): mimic: local_reserver double-reservation of backfilled pg
https://github.com/ceph/ceph/pull/22559 Nathan Cutler
08:26 AM Feature #24231: librbd/libcephfs/librgw should ignore rados_mon/osd_op_timeouts options (requires...
libcephfs doesn't use librados, so it doesn't need any changes.
The rados_mon_op_timeout affects anything that use...
John Spray
07:55 AM Bug #20924: osd: leaked Session on osd.7
https://github.com/ceph/ceph/pull/22292 might address this issue. Kefu Chai
07:37 AM Bug #24327 (Need More Info): osd: segv in pg_log_entry_t::encode()
The affected OSD restarted itself and everything has seemed fine since then. But what is the cause of the crash?... frank lin
06:37 AM Backport #24204 (In Progress): mimic: LibRadosMiscPool.PoolCreationRace segv
https://github.com/ceph/ceph/pull/22291 Prashant D
06:20 AM Backport #24216 (In Progress): luminous: "process (unknown)" in ceph logs
https://github.com/ceph/ceph/pull/22290 Prashant D
03:32 AM Bug #24321: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max-pg-per-osd...
mimic: https://github.com/ceph/ceph/pull/22288 Kefu Chai
03:31 AM Bug #24321 (Pending Backport): assert manager.get_num_active_clean() == pg_num on rados/singleton...
Kefu Chai

05/28/2018

10:54 PM Feature #24176: osd: add command to drop OSD cache
Anyone looking into this? If not, I can pick it up. Mohamad Gebai
03:21 PM Bug #24145 (Duplicate): osdmap decode error in rados/standalone/*
Kefu Chai
03:19 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
/a/kchai-2018-05-28_09:21:54-rados-wip-kefu-testing-2018-05-28-1113-distro-basic-smithi/2601187
on mimic branch.
...
Kefu Chai
11:51 AM Bug #24321 (Fix Under Review): assert manager.get_num_active_clean() == pg_num on rados/singleton...
https://github.com/ceph/ceph/pull/22275 Kefu Chai
05:28 AM Bug #23352: osd: segfaults under normal operation
I've confirmed that in all of the SafeTimer segfaults the 'schedule' multimap is empty, indicating this is the last e... Brad Hubbard
05:16 AM Bug #23352: osd: segfaults under normal operation
If we look at the coredump from 23585 and compare it to this message.
[117735.930255] safe_timer[52573]: segfault ...
Brad Hubbard
04:32 AM Bug #24023 (Duplicate): Segfault on OSD in 12.2.5
Duplicate of 23352 Brad Hubbard
04:30 AM Bug #23564 (Duplicate): OSD Segfaults
Duplicate of 23352 Brad Hubbard
04:28 AM Bug #23585 (Duplicate): osd: safe_timer segfault
Duplicate of 23352 Brad Hubbard
02:47 AM Bug #24160: Monitor down when large store data needs to compact triggered by ceph tell mon.xx com...
PR: https://github.com/ceph/ceph/pull/22056/
相洋 于

05/27/2018

05:58 PM Feature #11601: osd: share cached osdmaps across osd daemons
Attached the file CephScaleTestMarch2015.pdf
Do we have any plan for this, guys?
Chuong Le
02:55 PM Bug #24322 (Fix Under Review): slow mon ops from osd_failure
https://github.com/ceph/ceph/pull/22259 Sage Weil
02:46 PM Bug #23585: osd: safe_timer segfault
Hi Brad, sure, thanks. Alex Gorbachev

05/26/2018

01:51 PM Bug #24322 (Resolved): slow mon ops from osd_failure
... Sage Weil
01:39 PM Bug #24162 (Resolved): control-c on ceph cli leads to segv
Sage Weil
01:38 PM Bug #24219 (Resolved): osd: InProgressOp freed by on_change(); in-flight op may use-after-free in...
Sage Weil
01:36 PM Bug #24321 (Resolved): assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max...
... Sage Weil
01:29 PM Bug #24320 (Resolved): out of order reply and/or osd assert with set-chunks-read.yaml
... Sage Weil
02:00 AM Bug #23614 (Pending Backport): local_reserver double-reservation of backfilled pg
Josh Durgin
01:59 AM Bug #23490 (Duplicate): luminous: osd: double recovery reservation for PG when EIO injected (whil...
Josh Durgin
01:25 AM Bug #23352: osd: segfaults under normal operation
Thanks,
That gives us seven cores across 12.2.4-12.2.5 on Xenial and Centos and one core from the MMgrReport::enco...
Brad Hubbard
12:35 AM Bug #23431 (Duplicate): OSD Segmentation fault in thread_name:safe_timer
Closing as a duplicate of #23352 where we are focussing. Brad Hubbard
12:33 AM Bug #23564: OSD Segfaults
Since the stack from this core is the following, can we also close this as a duplicate of 23352?
(gdb) bt
#0 0x00...
Brad Hubbard
12:31 AM Bug #23585: osd: safe_timer segfault
Alex,
Can we close this bug also as a duplicate of 23352?
Brad Hubbard
12:28 AM Bug #24023: Segfault on OSD in 12.2.5
Alex,
Why are we running multiple trackers for the same issue?
Can we close this as a duplicate?
Brad Hubbard

05/25/2018

10:25 PM Bug #23614 (Fix Under Review): local_reserver double-reservation of backfilled pg
Explanation of the problem and resolution included in the pull request.
https://github.com/ceph/ceph/pull/22255
Neha Ojha
10:06 PM Bug #24219 (Pending Backport): osd: InProgressOp freed by on_change(); in-flight op may use-after...
Sage Weil
09:25 PM Bug #24304 (Fix Under Review): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
This is due to the fast-path decoding for object_stat_sum_t not being updated in the backport. Fix: https://github.co... Josh Durgin
04:22 PM Bug #24304 (Closed): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
This appears to be specific to a downstream build, closing. John Spray
12:29 PM Bug #24304 (Resolved): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
... John Spray
03:08 PM Backport #24297 (Resolved): mimic: RocksDB compression is not supported at least on Debian.
Kefu Chai
11:03 AM Backport #24297 (Resolved): mimic: RocksDB compression is not supported at least on Debian.
https://github.com/ceph/ceph/pull/22183 Nathan Cutler
03:06 PM Bug #24023: Segfault on OSD in 12.2.5
Also posted this in bug http://tracker.ceph.com/issues/23352
Hi Brad, we had one too just now, core dump and log:
...
Alex Gorbachev
08:04 AM Bug #24023: Segfault on OSD in 12.2.5
Hi,
I've noticed a similar/same segfault on my deployment. Random segfaults on random OSDs appear under load or wit...
Jan Krcmar
03:05 PM Bug #23352: osd: segfaults under normal operation
Hi Brad, we had one too just now, core dump and log:
https://drive.google.com/open?id=1t1jfjqwjhUUBzWjxamos3Hr7ghj...
Alex Gorbachev
07:54 AM Bug #23352: osd: segfaults under normal operation
Thanks Beom-Seok,
I've set up a centos environment to debug those cores along with the Xenial ones. I will update ...
Brad Hubbard
03:11 AM Bug #23352: osd: segfaults under normal operation
Today two osd crashes.
coredump at:
https://drive.google.com/open?id=1rXtW0riZMBwP5OqrJ7QdRIOAsKFr-kYw
https://d...
Beom-Seok Park
02:10 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
https://github.com/ceph/ceph/pull/22126 merged to remove failures from rgw suite. moving to rados project Casey Bodley
12:28 PM Backport #24259 (Resolved): mimic: crush device class: Monitor Crash when moving Bucket into Defa...
Kefu Chai
11:03 AM Backport #24294 (Resolved): mimic: control-c on ceph cli leads to segv
https://github.com/ceph/ceph/pull/22225 Nathan Cutler
11:03 AM Backport #24293 (Resolved): jewel: mon: slow op on log message
https://github.com/ceph/ceph/pull/22431 Nathan Cutler
11:03 AM Backport #24292 (Resolved): mimic: common: JSON output from rados bench write has typo in max_lat...
https://github.com/ceph/ceph/pull/22406 Nathan Cutler
11:03 AM Backport #24291 (Resolved): jewel: common: JSON output from rados bench write has typo in max_lat...
https://github.com/ceph/ceph/pull/22407 Nathan Cutler
11:03 AM Backport #24290 (Resolved): luminous: common: JSON output from rados bench write has typo in max_...
https://github.com/ceph/ceph/pull/22391 Nathan Cutler
03:47 AM Bug #24045 (Resolved): Eviction still raced with scrub due to preemption
David Zafman
03:47 AM Bug #22881 (Resolved): scrub interaction with HEAD boundaries and snapmapper repair is broken
David Zafman
03:46 AM Backport #24016 (Resolved): luminous: scrub interaction with HEAD boundaries and snapmapper repai...
David Zafman
03:43 AM Backport #23863 (Resolved): luminous: scrub interaction with HEAD boundaries and clones is broken
David Zafman
03:39 AM Backport #24153 (Resolved): luminous: Eviction still raced with scrub due to preemption
David Zafman
03:38 AM Bug #23267 (Resolved): scrub errors not cleared on replicas can cause inconsistent pg state when ...
David Zafman
03:37 AM Backport #23486 (Resolved): jewel: scrub errors not cleared on replicas can cause inconsistent pg...
David Zafman
03:30 AM Bug #23811: RADOS stat slow for some objects on same OSD
... Chang Liu

05/24/2018

08:41 PM Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica tak...
merged https://github.com/ceph/ceph/pull/21194 Yuri Weinstein
08:38 PM Backport #23316: jewel: pool create cmd's expected_num_objects is not correctly interpreted
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22050
merged
Yuri Weinstein
08:37 PM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
merged https://github.com/ceph/ceph/pull/22188 Yuri Weinstein
08:36 PM Bug #23769: osd/EC: slow/hung ops in multimds suite test
jewel backport PR https://github.com/ceph/ceph/pull/22189 merged Yuri Weinstein
06:07 PM Bug #24192: cluster [ERR] Corruption detected: object 2:f59d1934:::smithi14913526-5822:head is mi...
... David Zafman
06:05 PM Bug #24199 (Pending Backport): common: JSON output from rados bench write has typo in max_latency...
Sage Weil
06:03 PM Bug #24162 (Pending Backport): control-c on ceph cli leads to segv
mimic backport https://github.com/ceph/ceph/pull/22225 Sage Weil
05:59 PM Bug #23879: test_mon_osdmap_prune.sh fails
/a/sage-2018-05-23_14:50:29-rados-wip-sage2-testing-2018-05-22-1410-distro-basic-smithi/2576533 Sage Weil
03:40 PM Feature #24232: Add new command ceph mon status
added a card to the backlog: https://trello.com/c/PTgwBpmx Joao Eduardo Luis
01:27 PM Feature #24232: Add new command ceph mon status
Sorry for the confusion; I did not check that we already have ceph osd stat and that ceph mon stat has the same purpose. I wanted ... Vikhyat Umrao
10:55 AM Feature #24232: Add new command ceph mon status
copy/pasting from the PR opened to address this issue (https://github.com/ceph/ceph/pull/22202):... Joao Eduardo Luis
01:44 PM Bug #24037 (Resolved): osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_nod...
Sage Weil
01:42 PM Bug #24145: osdmap decode error in rados/standalone/*
... Sage Weil
01:39 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
... Sage Weil
12:08 PM Backport #24279 (In Progress): luminous: RocksDB compression is not supported at least on Debian.
Kefu Chai
12:08 PM Backport #24279 (Resolved): luminous: RocksDB compression is not supported at least on Debian.
https://github.com/ceph/ceph/pull/22215 Kefu Chai
09:48 AM Bug #24025 (Pending Backport): RocksDB compression is not supported at least on Debian.
Kefu Chai
09:43 AM Bug #24025: RocksDB compression is not supported at least on Debian.
tested... Kefu Chai
08:22 AM Bug #23352: osd: segfaults under normal operation
Hi Alex,
I notice there are several more coredumps attached to the related bug reports. Are they all separate cras...
Brad Hubbard
03:07 AM Bug #24264: ssd-primary crush rule not working as intended
Sorry, here's my updated rule instead of the one in the document.
rule ssd-primary {
id 2
type r...
Horace Ng
03:05 AM Bug #24264 (Closed): ssd-primary crush rule not working as intended
I've set up the rule according to the doc, but some of the PGs are still being assigned to the same host though my fa... Horace Ng
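For reference, the ssd-primary example in the CRUSH documentation has roughly this shape (the id, sizes and bucket names here are illustrative, not taken from the reporter's map):

```
rule ssd-primary {
        id 2
        type replicated
        min_size 2
        max_size 10
        step take ssd
        step chooseleaf firstn 1 type host
        step emit
        step take hdd
        step chooseleaf firstn -1 type host
        step emit
}
```

One caveat worth noting: the two take/emit passes select hosts independently, so unless the ssd and hdd roots sit on disjoint hosts, the same host can be chosen in both passes, which matches the symptom described above.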

05/23/2018

09:36 PM Bug #23787 (Rejected): luminous: "osd-scrub-repair.sh'" failures in rados
This is an incompatibility between the OSD version 64ffa817000d59d91379f7335439845930f58530 (luminous) and the versio... David Zafman
06:40 PM Bug #22920 (Resolved): filestore journal replay does not guard omap operations
Nathan Cutler
06:40 PM Backport #22934 (Resolved): luminous: filestore journal replay does not guard omap operations
Nathan Cutler
06:35 PM Bug #23878 (Resolved): assert on pg upmap
Nathan Cutler
06:34 PM Backport #23925 (Resolved): luminous: assert on pg upmap
Nathan Cutler
06:32 PM Backport #24259 (Resolved): mimic: crush device class: Monitor Crash when moving Bucket into Defa...
https://github.com/ceph/ceph/pull/22169 Nathan Cutler
06:32 PM Backport #24258 (Resolved): luminous: crush device class: Monitor Crash when moving Bucket into D...
https://github.com/ceph/ceph/pull/22381 Nathan Cutler
06:32 PM Backport #24244 (New): jewel: osd/EC: slow/hung ops in multimds suite test
Nathan Cutler
05:09 PM Backport #24244 (Resolved): jewel: osd/EC: slow/hung ops in multimds suite test
https://github.com/ceph/ceph/pull/22189
partial backport for mdsmonitor
Abhishek Lekshmanan
06:31 PM Backport #24256 (Resolved): mimic: osd: Assertion `!node_algorithms::inited(this->priv_value_tra...
https://github.com/ceph/ceph/pull/22160 Nathan Cutler
06:31 PM Backport #24246 (Resolved): mimic: Manager daemon y is unresponsive during teuthology cluster tea...
https://github.com/ceph/ceph/pull/22333 Nathan Cutler
06:31 PM Backport #24245 (Resolved): luminous: Manager daemon y is unresponsive during teuthology cluster ...
https://github.com/ceph/ceph/pull/22331 Nathan Cutler
04:27 PM Bug #23352: osd: segfaults under normal operation
Sage, I had tried to do this, but we don't know when these crashes would happen, just that they will occur. Random t... Alex Gorbachev
04:10 PM Bug #23352 (Need More Info): osd: segfaults under normal operation
Alex, how reproducible is this for you? Could you reproduce with debug timer = 20? Sage Weil
04:21 PM Backport #24058 (In Progress): jewel: Deleting a pool with active notify linger ops can result in...
https://github.com/ceph/ceph/pull/22188 Kefu Chai
04:15 PM Bug #24243 (Resolved): osd: pg hard limit too easy to hit
The default ratio of 2x mon_max_pg_per_osd is easy to hit for clusters that have differently weighted disks (e.g. 1 a... Josh Durgin
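A back-of-the-envelope sketch of why mixed disk sizes hit the limit (this is not Ceph code, and all cluster numbers below are hypothetical): PGs land on OSDs roughly in proportion to CRUSH weight, so the larger disks in a mixed cluster carry a multiple of the average PG count.

```python
# Sketch: approximate PG placement by CRUSH weight share.
# Assumes mon_max_pg_per_osd = 200 (the luminous-era default) and
# osd_max_pg_per_osd_hard_ratio = 2.
def pgs_per_osd(total_pg_replicas, weights):
    """Approximate PG replica count landing on each OSD, by weight share."""
    total_w = sum(weights)
    return [total_pg_replicas * w / total_w for w in weights]

# Hypothetical cluster: 8 OSDs of 1 TB and 4 OSDs of 2 TB,
# 2048 PGs at replica size 3 -> 6144 PG replicas to place.
weights = [1.0] * 8 + [2.0] * 4
per_osd = pgs_per_osd(2048 * 3, weights)

mon_max_pg_per_osd = 200
hard_limit = mon_max_pg_per_osd * 2      # the 2x hard ratio

print(max(per_osd))                      # 768.0 on each 2 TB OSD
print(max(per_osd) > hard_limit)         # True -- hard limit exceeded
```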
03:27 PM Bug #24025: RocksDB compression is not supported at least on Debian.
mimic: https://github.com/ceph/ceph/pull/22183 Kefu Chai
03:25 PM Bug #24025 (Fix Under Review): RocksDB compression is not supported at least on Debian.
https://github.com/ceph/ceph/pull/22181 Kefu Chai
02:53 PM Bug #24025: RocksDB compression is not supported at least on Debian.
because we fail to pass -DWITH_SNAPPY etc to cmake while building rocksdb. this bug also impacts rpm package. i can h... Kefu Chai
01:51 PM Bug #24229 (Triaged): Libradosstriper successfully removes nonexistent objects instead of returni...
Sage Weil
11:57 AM Bug #24242 (New): tcmalloc::ThreadCache::ReleaseToCentralCache on rhel (w/ centos packages)
... Sage Weil
11:43 AM Bug #24222 (Pending Backport): Manager daemon y is unresponsive during teuthology cluster teardown
Sage Weil
08:41 AM Bug #23145: OSD crashes during recovery of EC pg
An OSD in the last peering stage will call pg_log.roll_forward (at the end of PG::activate); is it possible that the entry rollbf...  Zengran Zhang
06:52 AM Bug #23386 (Pending Backport): crush device class: Monitor Crash when moving Bucket into Default ...
https://github.com/ceph/ceph/pull/22169 Kefu Chai
01:21 AM Bug #24037 (Pending Backport): osd: Assertion `!node_algorithms::inited(this->priv_value_traits(...
Sage Weil

05/22/2018

09:55 PM Bug #24222 (Fix Under Review): Manager daemon y is unresponsive during teuthology cluster teardown
https://github.com/ceph/ceph/pull/22158 Sage Weil
02:20 AM Bug #24222 (Resolved): Manager daemon y is unresponsive during teuthology cluster teardown
... Sage Weil
08:47 PM Feature #24232 (Fix Under Review): Add new command ceph mon status
Add new command ceph mon status
For more information please check - https://tracker.ceph.com/issues/24217
Changed...
Vikhyat Umrao
08:32 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
Josh Durgin wrote:
> Casey, could you or someone else familiar with rgw look through the logs for this and identify ...
Casey Bodley
03:19 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
Casey, could you or someone else familiar with rgw look through the logs for this and identify the relevant OSD reque... Josh Durgin
07:17 PM Feature #24231 (New): librbd/libcephfs/librgw should ignore rados_mon/osd_op_timeouts options (re...
librbd/libcephfs/librgw should ignore rados_mon/osd_op_timeouts options
https://bugzilla.redhat.com/show_bug.cgi?id=...
Vikhyat Umrao
04:09 PM Bug #24025 (In Progress): RocksDB compression is not supported at least on Debian.
... Radoslaw Zarzynski
03:48 PM Bug #24037 (Fix Under Review): osd: Assertion `!node_algorithms::inited(this->priv_value_traits(...
https://github.com/ceph/ceph/pull/22156 Radoslaw Zarzynski
02:35 PM Bug #24229 (Triaged): Libradosstriper successfully removes nonexistent objects instead of returni...
libradosstriper remove() call on nonexistent objects returns zero instead of ENOENT.
Tested on luminous 12.2.5-1xe...
Stan K
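As an illustration only (this is a toy in-memory stand-in, not the libradosstriper API), the contract the reporter expects is POSIX-like: removing a nonexistent object should fail with ENOENT rather than silently return success.

```python
import errno

class FakeStriperPool:
    """Toy in-memory stand-in for a striper pool (hypothetical)."""
    def __init__(self):
        self.objects = {}

    def remove(self, name):
        # Expected semantics: missing object -> ENOENT, not success.
        if name not in self.objects:
            raise OSError(errno.ENOENT, "object not found", name)
        del self.objects[name]

pool = FakeStriperPool()
try:
    pool.remove("no-such-object")
except OSError as e:
    print(e.errno == errno.ENOENT)  # True
```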
11:35 AM Feature #24099: osd: Improve workflow when creating OSD on raw block device if there was bluestor...

> Point out that it found existing data on the OSD, and possibly suggest using `ceph-volume lvm zap` if that's what...
John Spray
10:51 AM Bug #24199 (Fix Under Review): common: JSON output from rados bench write has typo in max_latency...
John Spray
07:00 AM Bug #23371: OSDs flaps when cluster network is made down
We have not observed this behavior in Kraken.
Whenever the cluster interface is made down, a few OSDs which go do...
Nokia ceph-users
03:55 AM Bug #23352: osd: segfaults under normal operation
OSD log attached Alex Gorbachev
03:15 AM Bug #23352: osd: segfaults under normal operation
It's an internal comment for others looking at this - though if you (Alex) have an osd log to go with the 'MMgrReport... Josh Durgin
02:59 AM Bug #23352: osd: segfaults under normal operation
Josh, is this something I can extract from the OSD node for you, or is this an internal comment? Alex Gorbachev
01:10 AM Bug #23352: osd: segfaults under normal operation
I put the core file from comment #14 and binaries from 12.2.5 in senta02:/slow/jdurgin/ceph/bugs/tracker_23352/2018-0... Josh Durgin
03:49 AM Backport #24059 (In Progress): luminous: Deleting a pool with active notify linger ops can result...
https://github.com/ceph/ceph/pull/22143 Prashant D

05/21/2018

10:04 PM Bug #24219: osd: InProgressOp freed by on_change(); in-flight op may use-after-free in op_commit()
/a/teuthology-2018-05-21_20:00:50-powercycle-mimic-distro-basic-smithi/2563192
powercycle/osd/{clusters/3osd-1per-...
Sage Weil
09:40 PM Bug #24219 (Fix Under Review): osd: InProgressOp freed by on_change(); in-flight op may use-after...
https://github.com/ceph/ceph/pull/22133 Sage Weil
09:28 PM Bug #24219 (Resolved): osd: InProgressOp freed by on_change(); in-flight op may use-after-free in...
... Sage Weil
07:29 PM Bug #22330 (Need More Info): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
need to capture some logs... Sage Weil
07:15 PM Bug #23031: FAILED assert(!parent->get_log().get_missing().is_missing(soid))
I hit this issue a couple of times while trying to reproduce #23614... Neha Ojha
06:36 PM Backport #24200 (Resolved): mimic: PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
Sage Weil
08:48 AM Backport #24200 (Resolved): mimic: PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
Nathan Cutler
06:24 PM Bug #23386 (Fix Under Review): crush device class: Monitor Crash when moving Bucket into Default ...
https://github.com/ceph/ceph/pull/22127 Sage Weil
05:14 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
reproduces on luminous with... Sage Weil
01:52 PM Bug #23386: crush device class: Monitor Crash when moving Bucket into Default root
I suspect the recent pr https://github.com/ceph/ceph/pull/22091 fixed this, but figuring out how to reproduce to be s... Sage Weil
05:59 PM Bug #23965 (Fix Under Review): FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part...
https://github.com/ceph/ceph/pull/22126 removes ec-cache pools from the rgw suite Casey Bodley
04:55 PM Bug #22656: scrub mismatch on bytes (cache pools)
http://qa-proxy.ceph.com/teuthology/dzafman-2018-05-18_11:33:31-rados-wip-zafman-testing-mimic-distro-basic-smithi/25... David Zafman
04:21 PM Backport #22934: luminous: filestore journal replay does not guard omap operations
Victor Denisov wrote:
> https://github.com/ceph/ceph/pull/21547
merged
Yuri Weinstein
04:13 PM Backport #23925: luminous: assert on pg upmap
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21818
merged
Yuri Weinstein
04:01 PM Backport #24213 (In Progress): mimic: Module 'balancer' has failed: could not find bucket -14
Nathan Cutler
03:59 PM Backport #24213 (Resolved): mimic: Module 'balancer' has failed: could not find bucket -14
https://github.com/ceph/ceph/pull/22120 Nathan Cutler
03:59 PM Backport #24216 (Resolved): luminous: "process (unknown)" in ceph logs
https://github.com/ceph/ceph/pull/22290 Nathan Cutler
03:59 PM Backport #24215 (Resolved): mimic: "process (unknown)" in ceph logs
https://github.com/ceph/ceph/pull/22311 Nathan Cutler
03:59 PM Backport #24214 (Resolved): luminous: Module 'balancer' has failed: could not find bucket -14
https://github.com/ceph/ceph/pull/22308 Nathan Cutler
03:03 PM Bug #23585 (Triaged): osd: safe_timer segfault
Josh Durgin
02:17 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
We are experiencing this too. The majority of the OSDs went down. We tried removing the intervals. It works on some OSDs ... Dexter John Genterone
01:44 PM Bug #24167: Module 'balancer' has failed: could not find bucket -14
mimic backport: https://github.com/ceph/ceph/pull/22120 Sage Weil
01:42 PM Bug #24167 (Pending Backport): Module 'balancer' has failed: could not find bucket -14
Sage Weil
01:00 PM Bug #23431: OSD Segmentation fault in thread_name:safe_timer
Hi.
We have the same issue: ...
Aleksei Zakharov
12:07 PM Bug #24123 (Pending Backport): "process (unknown)" in ceph logs
Sage Weil
09:50 AM Backport #24048 (In Progress): luminous: pg-upmap cannot balance in some case
https://github.com/ceph/ceph/pull/22115 Prashant D
09:43 AM Bug #24199: common: JSON output from rados bench write has typo in max_latency key
PR: https://github.com/ceph/ceph/pull/22112 Sandor Zeestraten
06:23 AM Bug #24199 (Resolved): common: JSON output from rados bench write has typo in max_latency key
The JSON output from `rados bench write --format json/json-pretty` has a typo in the `max_latency` key.
It contains ...
Sandor Zeestraten
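A hedged sketch of one way a consumer could catch malformed keys in rados bench JSON output. The sample document and the broken key below are hypothetical; the actual typo is in the (truncated) report above.

```python
import json
import re

# Valid keys in this output are plain snake_case identifiers.
IDENT = re.compile(r"^[a-z][a-z0-9_]*$")

def malformed_keys(doc):
    """Return top-level JSON keys that are not clean snake_case identifiers."""
    return [k for k in doc if not IDENT.match(k)]

# Hypothetical sample: one clean key, one key with a stray character.
sample = json.loads('{"avg_latency": 0.01, "max_latency:": 0.2}')
print(malformed_keys(sample))  # ['max_latency:']
```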
08:48 AM Backport #24204 (Resolved): mimic: LibRadosMiscPool.PoolCreationRace segv
https://github.com/ceph/ceph/pull/22291 Nathan Cutler
08:43 AM Bug #24174: PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
mimic: https://github.com/ceph/ceph/pull/22113 Kefu Chai
08:41 AM Bug #24174 (Pending Backport): PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
Kefu Chai
07:11 AM Bug #24076 (Duplicate): rados/test.sh fails in "bin/ceph_test_rados_api_misc --gtest_filter=*Pool...
Kefu Chai
06:24 AM Backport #24198 (In Progress): luminous: mon: slow op on log message
Kefu Chai
06:23 AM Backport #24198 (Resolved): luminous: mon: slow op on log message
https://github.com/ceph/ceph/pull/22109 Kefu Chai
06:20 AM Backport #24195 (Resolved): mimic: mon: slow op on log message
Kefu Chai
02:51 AM Bug #20924: osd: leaked Session on osd.7
osd.4
/a/sage-2018-05-20_18:11:15-rados-wip-sage3-testing-2018-05-20-1031-distro-basic-smithi/2558319
rados/ver...
Sage Weil
02:24 AM Bug #24150 (Pending Backport): LibRadosMiscPool.PoolCreationRace segv
Sage Weil

05/20/2018

06:58 PM Bug #18239 (Duplicate): nan in ceph osd df again
Sage Weil
10:32 AM Bug #24023: Segfault on OSD in 12.2.5
Alexander M wrote:
> Alex Gorbachev wrote:
> > This continues to happen every day, usually during scrub
>
> I've...
Alexander Morozov
10:30 AM Bug #24023: Segfault on OSD in 12.2.5
Alex Gorbachev wrote:
> This continues to happen every day, usually during scrub
I've faced the same issue
...
Alexander Morozov
09:45 AM Backport #24195 (In Progress): mimic: mon: slow op on log message
https://github.com/ceph/ceph/pull/22104 Kefu Chai
09:42 AM Backport #24195 (Resolved): mimic: mon: slow op on log message
Kefu Chai
09:40 AM Bug #24180 (Pending Backport): mon: slow op on log message
Kefu Chai

05/19/2018

07:04 PM Bug #24192 (Duplicate): cluster [ERR] Corruption detected: object 2:f59d1934:::smithi14913526-582...

davidz@teuthology:/a/dzafman-2018-05-18_11:36:58-rados-wip-zafman-testing-distro-basic-smithi/2549009...
David Zafman

05/18/2018

08:45 PM Bug #24180: mon: slow op on log message
https://github.com/ceph/ceph/pull/22098 Sage Weil
08:44 PM Bug #24180 (Fix Under Review): mon: slow op on log message
https://github.com/ceph/ceph/pull/22098 Sage Weil
08:41 PM Bug #24180 (Resolved): mon: slow op on log message
... Sage Weil
08:37 PM Bug #20924: osd: leaked Session on osd.7
osd.7
/a/sage-2018-05-18_16:20:24-rados-wip-sage-testing-2018-05-18-0817-distro-basic-smithi/2548324
rados/veri...
Sage Weil
02:26 PM Bug #20924: osd: leaked Session on osd.7
osd.7
/a/sage-2018-05-18_13:08:19-rados-wip-sage2-testing-2018-05-17-0701-distro-basic-smithi/2546923
rados/ver...
Sage Weil
08:16 PM Backport #24149 (Resolved): mimic: Eviction still raced with scrub due to preemption
David Zafman
07:24 PM Bug #24162 (Fix Under Review): control-c on ceph cli leads to segv
hacky workaround: https://github.com/ceph/ceph/pull/22093 Sage Weil
07:18 PM Bug #24162: control-c on ceph cli leads to segv
... Sage Weil
07:09 PM Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value...
related?... Sage Weil
01:26 PM Bug #24037 (In Progress): osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_...
Radoslaw Zarzynski
01:15 PM Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value...
Scenario I can see after static analysis:
1. An instance of `TrackedOp` in `STATE_LIVE` is being dereferenced - th...
Radoslaw Zarzynski
06:59 PM Bug #23352: osd: segfaults under normal operation
The latest ones look like this, below.
Crash dump at https://drive.google.com/open?id=12v95-TCHlkrBZ16ni5UkhYkXRt...
Alex Gorbachev
06:41 PM Bug #23352: osd: segfaults under normal operation
For some reason we are also seeing more of these happening, simultaneous failures and recoveries are occurring during... Alex Gorbachev
02:36 AM Bug #23352: osd: segfaults under normal operation
I ran into this issue with 12.2.5; it affects cluster stability heavily. wei jin
06:12 PM Bug #24167 (Fix Under Review): Module 'balancer' has failed: could not find bucket -14
https://github.com/ceph/ceph/pull/22091 Sage Weil
05:02 PM Feature #24176 (Resolved): osd: add command to drop OSD cache
Idea here is to basically make it possible for performance testing on the same data set in RADOS without restarting t... Patrick Donnelly
04:24 PM Feature #22420 (Resolved): Add support for obtaining a list of available compression options
Nathan Cutler
04:04 PM Bug #23487 (Resolved): There is no 'ceph osd pool get erasure allow_ec_overwrites' command
Nathan Cutler
04:04 PM Backport #23668 (Resolved): luminous: There is no 'ceph osd pool get erasure allow_ec_overwrites'...
Nathan Cutler
04:03 PM Bug #23664 (Resolved): cache-try-flush hits wrlock, busy loops
Nathan Cutler
04:03 PM Backport #23914 (Resolved): luminous: cache-try-flush hits wrlock, busy loops
Nathan Cutler
04:02 PM Bug #23860 (Resolved): luminous->master: luminous crashes with AllReplicasRecovered in Started/Pr...
Nathan Cutler
04:02 PM Backport #23988 (Resolved): luminous: luminous->master: luminous crashes with AllReplicasRecovere...
Nathan Cutler
04:02 PM Bug #23980 (Resolved): UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap const&)
Nathan Cutler
04:01 PM Backport #24015 (Resolved): luminous: UninitCondition in PG::RecoveryState::Incomplete::react(PG:...
Nathan Cutler
02:30 PM Backport #24135 (Resolved): mimic: Add support for obtaining a list of available compression options
Sage Weil
02:25 PM Bug #24174: PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
https://github.com/ceph/ceph/pull/22084 Sage Weil
02:24 PM Bug #24174 (Resolved): PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
... Sage Weil

05/17/2018

10:22 PM Bug #24167: Module 'balancer' has failed: could not find bucket -14
It looks like we also don't create weight-sets for new buckets. And if you create buckets and move things into them ... Sage Weil
09:58 PM Bug #24167 (Resolved): Module 'balancer' has failed: could not find bucket -14
crushmap may contain choose_args for deleted buckets... Sage Weil
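A toy illustration (not Ceph code) of the stale-entry problem: choose_args maps bucket ids (negative numbers) to weight sets, and deleting a bucket without pruning its entry leaves behind an id that later lookups ("could not find bucket -14") trip over.

```python
# Hypothetical crushmap fragments: bucket ids -> names / weight sets.
buckets = {-1: "default", -14: "rack-a"}
choose_args = {-1: [1.0], -14: [0.5]}

del buckets[-14]  # bucket removed...

# ...but its choose_args entry remains and now dangles.
stale = [b for b in choose_args if b not in buckets]
print(stale)  # [-14]
```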
05:39 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
Casey Bodley
03:52 PM Bug #23763 (Resolved): upgrade: bad pg num and stale health status in mixed lumnious/mimic cluster
Kefu Chai
03:52 PM Backport #23808 (Resolved): luminous: upgrade: bad pg num and stale health status in mixed lumnio...
Kefu Chai
03:42 PM Backport #23808: luminous: upgrade: bad pg num and stale health status in mixed lumnious/mimic cl...
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/21556
merged
Yuri Weinstein
03:45 PM Bug #24162 (Resolved): control-c on ceph cli leads to segv
... Sage Weil
03:43 PM Backport #23668: luminous: There is no 'ceph osd pool get erasure allow_ec_overwrites' command
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21378
merged
Yuri Weinstein
03:42 PM Backport #23914: luminous: cache-try-flush hits wrlock, busy loops
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21764
merged
Yuri Weinstein
03:41 PM Backport #23988: luminous: luminous->master: luminous crashes with AllReplicasRecovered in Starte...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21964
merged
Yuri Weinstein
03:38 PM Backport #24015: luminous: UninitCondition in PG::RecoveryState::Incomplete::react(PG::AdvMap con...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21993
merged
Yuri Weinstein
01:55 PM Backport #23786 (Resolved): luminous: "utilities/env_librados.cc:175:33: error: unused parameter ...
Sage Weil
01:55 PM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Sage Weil
01:50 PM Bug #23145: OSD crashes during recovery of EC pg
Peter Woodman wrote:
> Each OSD is on its own host- these are small arm64 machines. Unfortunately i've already tried...
Sage Weil
11:49 AM Bug #24159 (Duplicate): Monitor down when large store data needs to compact triggered by ceph tel...
Nathan Cutler
10:38 AM Bug #24159 (Duplicate): Monitor down when large store data needs to compact triggered by ceph tel...
I have hit a monitor problem with an overly large store in our production environment.
This logical volume for monito...
相洋 于
10:38 AM Bug #24160 (Resolved): Monitor down when large store data needs to compact triggered by ceph tell...
I have hit a monitor problem with an overly large store in our production environment.
This logical volume for monito...
相洋 于
09:04 AM Bug #23598 (Duplicate): hammer->jewel: ceph_test_rados crashes during radosbench task in jewel ra...
#23290 does not contain any of the PRs mentioned above, so it's not a regression. Kefu Chai
08:33 AM Backport #24153 (In Progress): luminous: Eviction still raced with scrub due to preemption
Nathan Cutler
08:33 AM Backport #24149 (In Progress): mimic: Eviction still raced with scrub due to preemption
Nathan Cutler
08:33 AM Backport #24149 (New): mimic: Eviction still raced with scrub due to preemption
Nathan Cutler
08:27 AM Bug #23962 (Resolved): ceph_daemon.py format_dimless units list index out of range
Nathan Cutler
08:26 AM Bug #24000 (Resolved): mon: snap delete on deleted pool returns 0 without proper payload
Nathan Cutler
08:25 AM Bug #23899 (Resolved): run cmd 'ceph daemon osd.0 smart' cause osd daemon Segmentation fault
Nathan Cutler
07:37 AM Backport #23316 (In Progress): jewel: pool create cmd's expected_num_objects is not correctly int...
Kefu Chai

05/16/2018

10:29 PM Backport #24153: luminous: Eviction still raced with scrub due to preemption
I'm also pulling in these pull requests on top of the existing (not yet merged) pull request https://github.com/ceph/ceph... David Zafman
10:26 PM Backport #24153 (Resolved): luminous: Eviction still raced with scrub due to preemption
https://github.com/ceph/ceph/pull/22044 David Zafman
08:52 PM Bug #24150 (Fix Under Review): LibRadosMiscPool.PoolCreationRace segv
https://github.com/ceph/ceph/pull/22042 Sage Weil
08:51 PM Bug #24150 (Resolved): LibRadosMiscPool.PoolCreationRace segv
... Sage Weil
08:36 PM Backport #24149 (Resolved): mimic: Eviction still raced with scrub due to preemption
https://github.com/ceph/ceph/pull/22041 David Zafman
08:28 PM Bug #24045 (Pending Backport): Eviction still raced with scrub due to preemption
Sage Weil
07:31 PM Bug #24148: Segmentation fault out of ObcLockManager::get_lock_type()
The pg 3.3 involved here was never scrubbed, so unrelated to my changes. David Zafman
07:16 PM Bug #24148 (Duplicate): Segmentation fault out of ObcLockManager::get_lock_type()

teuthology:/a/dzafman-2018-05-16_09:57:45-rados:thrash-wip-zafman-testing-distro-basic-smithi/2539708
remote/smi...
David Zafman
06:53 PM Bug #22354: v12.2.2 unable to create bluestore osd using ceph-disk
kobi ginon wrote:
> Note: i still believe there is a relation to rocksdb somehow and the clearing of disk's forces t...
Jon Heese
03:52 PM Backport #24027 (Resolved): mimic: ceph_daemon.py format_dimless units list index out of range
Sage Weil
03:51 PM Backport #24103 (Resolved): mimic: mon: snap delete on deleted pool returns 0 without proper payload
Sage Weil
03:50 PM Backport #24104 (Resolved): mimic: run cmd 'ceph daemon osd.0 smart' cause osd daemon Segmentatio...
Sage Weil
03:49 PM Bug #24145 (Duplicate): osdmap decode error in rados/standalone/*
... Sage Weil
12:03 PM Feature #24099: osd: Improve workflow when creating OSD on raw block device if there was bluestor...
This is not a ceph-volume issue; the description of this issue doesn't point to a ceph-volume operation, but rather, ... Alfredo Deza

05/15/2018

10:58 PM Bug #23145: OSD crashes during recovery of EC pg
Each OSD is on its own host - these are small arm64 machines. Unfortunately I've already tried stopping osd.6; it just ... Peter Woodman
10:37 PM Bug #23145: OSD crashes during recovery of EC pg
Hmm, it's possible that if you stop osd.6 that this PG will be able to peer with the remaining OSDs... want to give i... Sage Weil
10:34 PM Bug #23145: OSD crashes during recovery of EC pg
Peter Woodman wrote:
> For the record, I discovered recently that a number of OSDs were operating with write caching...
Sage Weil
10:33 PM Bug #23145: OSD crashes during recovery of EC pg
Hmm, I think the problem comes before that. This is problematic:... Sage Weil
10:20 PM Bug #23145: OSD crashes during recovery of EC pg
For the record, I discovered recently that a number of OSDs were operating with write caching enabled, and because th... Peter Woodman
10:15 PM Bug #23145: OSD crashes during recovery of EC pg
This code appears to be the culprit, at least in this case:... Sage Weil
02:48 PM Bug #24023: Segfault on OSD in 12.2.5
This continues to happen every day, usually during scrub. Alex Gorbachev
01:15 PM Backport #24135 (In Progress): mimic: Add support for obtaining a list of available compression o...
Kefu Chai
01:13 PM Backport #24135 (Resolved): mimic: Add support for obtaining a list of available compression options
https://github.com/ceph/ceph/pull/22004 Kefu Chai
12:29 PM Feature #22448 (Resolved): Visibility for snap trim queue length
Already merged to master, luminous and jewel. Piotr Dalek
12:28 PM Backport #22449 (Resolved): jewel: Visibility for snap trim queue length
Piotr Dalek
10:44 AM Bug #23767: "ceph ping mon" doesn't work
Confirmed on my cluster (13.0.2-1969-g49365c7). John Spray
10:37 AM Fix #24126: ceph osd purge command error message improvement
How are you seeing that ugly logfile style output? When I try it, it looks like this:... John Spray
10:32 AM Feature #24127: "osd purge" should print more helpful message when daemon is up
This is completely reasonable as a general point, but not really actionable as a tracker ticket -- we aren't ever goi... John Spray
10:31 AM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
I can't post using ceph-post-file, so I uploaded the file here https://eocloud.eu:8080/swift/v1/rwadolowski/ceph-osd.33.l... Rafal Wadolowski
06:31 AM Bug #24007: rados.connect get a segmentation fault
John Spray wrote:
> Is there a backtrace or any other message from the crash?
there are many different backtraces.
xianpao chen
03:15 AM Backport #24015 (In Progress): luminous: UninitCondition in PG::RecoveryState::Incomplete::react(...
https://github.com/ceph/ceph/pull/21993 Prashant D
 
