Activity

From 05/22/2018 to 06/20/2018

06/20/2018

10:13 PM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
Sage Weil wrote:
> Can you generate an osd log with 'debug osd = 20' for the crashing osd that leads up to the crash...
Sage Weil
10:13 PM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
Can you generate an osd log with 'debug osd = 20' for the crashing osd that leads up to the crash? Sage Weil
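For reference, a minimal sketch of how the requested debug level could be raised on one OSD (osd.12 is a placeholder id; both forms are standard Ceph mechanisms):

<pre>
# at runtime, via the admin socket on the OSD's host (osd.12 is a placeholder id)
ceph daemon osd.12 config set debug_osd 20

# or persistently, in that host's ceph.conf before restarting the daemon:
#   [osd]
#   debug osd = 20
</pre>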
09:50 PM Bug #24422 (Duplicate): Ceph OSDs crashing in BlueStore::queue_transactions() using EC
Josh Durgin
10:11 PM Bug #23145: OSD crashes during recovery of EC pg
Two basic theories:
1. There is a bug that prematurely advances can_rollback_to
2. One of Peter's OSDs warped bac...
Sage Weil
10:05 PM Bug #23145: OSD crashes during recovery of EC pg
Sage Weil wrote:
> Zengran Zhang wrote:
> > osd in last peering stage will call pg_log.roll_forward(at last of PG:...
Sage Weil
10:03 PM Bug #23145 (Need More Info): OSD crashes during recovery of EC pg
Yong Wang, can you provide a full osd log with debug osd = 20 for the primary osd for the PG leading up to the crash... Sage Weil
09:22 PM Bug #23145: OSD crashes during recovery of EC pg
Zengran Zhang wrote:
> osd in last peering stage will call pg_log.roll_forward(at last of PG::activate), is there p...
Sage Weil
01:46 AM Bug #23145: OSD crashes during recovery of EC pg
@Sage Weil @Zengran Zhang
could you share any recent updates about this bug?
Yong Wang
01:44 AM Bug #23145: OSD crashes during recovery of EC pg
Hi all, are there any updates on this, please? Yong Wang
10:02 PM Backport #24599 (In Progress): mimic: failed to load OSD map for epoch X, got 0 bytes
Nathan Cutler
10:01 PM Backport #24599 (Resolved): mimic: failed to load OSD map for epoch X, got 0 bytes
https://github.com/ceph/ceph/pull/22651 Nathan Cutler
09:47 PM Bug #24448 (Won't Fix): (Filestore) ABRT report for package ceph has reached 10 occurrences
This is likely due to filestore becoming overloaded (hence waiting on throttles) and hitting the filestore op thread ... Josh Durgin
09:38 PM Bug #24511 (Duplicate): osd crushed at thread_name:safe_timer
Josh Durgin
09:37 PM Bug #24515: "[WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, mon.c has slow...
Kefu, can you take a look at this? Josh Durgin
09:36 PM Bug #24531: Mimic MONs have slow/long running ops
Joao, could you take a look at this? Josh Durgin
09:34 PM Bug #24549 (Won't Fix): FileStore::read assert (ABRT report for package ceph has reached 1000 occ...
As John described, this is not a bug in ceph but due to failing hardware or the filesystem below. Josh Durgin
09:25 PM Bug #23753 (Can't reproduce): "Error ENXIO: problem getting command descriptions from osd.4" in u...
re-open if it recurs Josh Durgin
09:19 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
Josh Durgin
09:12 PM Bug #22085 (Can't reproduce): jewel->luminous: "[ FAILED ] LibRadosAioEC.IsSafe" in upgrade:jew...
assuming this is the mon crush testing timeout, logs are gone so can't be sure Josh Durgin
08:10 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
backport for mimic: https://github.com/ceph/ceph/pull/22651 Sage Weil
08:07 PM Bug #24423 (Pending Backport): failed to load OSD map for epoch X, got 0 bytes
Sage Weil
07:46 PM Bug #24597 (Resolved): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_m...
... Neha Ojha
06:32 PM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
... Neha Ojha
03:01 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh

Now that I've looked at the code there is nothing surprising about the map handling. There is code in dequeue_op()...
David Zafman
12:37 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh

I was able to reproduce by running a loop of a single test case in qa/standalone/erasure-code/test-erasure-eio.sh
...
David Zafman
01:00 PM Backport #23673 (Resolved): jewel: auth: ceph auth add does not sanity-check caps
Nathan Cutler
12:52 PM Bug #23872 (Resolved): Deleting a pool with active watch/notify linger ops can result in seg fault
Nathan Cutler
12:52 PM Backport #23905 (Resolved): jewel: Deleting a pool with active watch/notify linger ops can result...
Nathan Cutler
12:21 PM Backport #24383 (In Progress): mimic: osd: stray osds in async_recovery_targets cause out of orde...
https://github.com/ceph/ceph/pull/22642 Prashant D
08:42 AM Bug #24588 (Fix Under Review): osd: may get empty info at recovery
-https://github.com/ceph/ceph/pull/22362- John Spray
01:42 AM Bug #24588 (Resolved): osd: may get empty info at recovery
2018-06-15 20:34:16.421720 7f89d2c24700 -1 /home/zzr/ceph.sf/src/osd/PG.cc: In function 'void PG::start_peering_inter... tao ning
08:40 AM Bug #24593: s390x: Ceph Monitor crashed with Caught signal (Aborted)
I expect that only people in possession of s390x hardware will be able to debug this.
I see that there is another t...
John Spray
05:33 AM Bug #24593 (New): s390x: Ceph Monitor crashed with Caught signal (Aborted)
We are trying to set up a ceph cluster on the s390x platform.
ceph-mon service crashed with an error: *** Caught signal ...
Nayana Thorat
05:50 AM Feature #24591 (Fix Under Review): FileStore hasn't impl to get kv-db's statistics
Kefu Chai
03:22 AM Feature #24591: FileStore hasn't impl to get kv-db's statistics
https://github.com/ceph/ceph/pull/22633 Jack Lv
03:22 AM Feature #24591 (Fix Under Review): FileStore hasn't impl to get kv-db's statistics
In BlueStore, you can see kv-db's statistics by "ceph daemon osd.X dump_objectstore_kv_stats", but FileStore hasn't i... Jack Lv
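For comparison, the BlueStore command referenced above looks like this (osd.0 is a placeholder id; it must be run on the host that owns that osd's admin socket):

<pre>
# BlueStore-only: dump the kv-db statistics via the admin socket
ceph daemon osd.0 dump_objectstore_kv_stats
</pre>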
03:22 AM Feature #22147: Set multiple flags in a single command line
I don’t think we should skip it entirely. Many of the places that implement a check like that are using a common flag... Greg Farnum

06/19/2018

11:44 PM Bug #24487 (In Progress): osd: choose_acting loop
This happens when an osd which is part of the acting set but not part of the up set gets chosen as an async_recovery_t... Neha Ojha
10:51 PM Backport #23673: jewel: auth: ceph auth add does not sanity-check caps
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21367
merged
Yuri Weinstein
10:50 PM Backport #23905: jewel: Deleting a pool with active watch/notify linger ops can result in seg fault
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21754
merged
Yuri Weinstein
10:49 PM Feature #22147: Set multiple flags in a single command line
It seems fair to assume that "unset" should support this also.
Question: should settings that require --yes-i-real...
Jesse Williamson
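To illustrate the feature request (the multi-flag form is the proposed syntax, which does not exist yet):

<pre>
# today: one flag per invocation
ceph osd set noout
ceph osd set nodown

# proposed (hypothetical): several flags in a single command
ceph osd set noout nodown
</pre>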
10:40 PM Bug #24587: librados api aio tests race condition
http://pulpito.ceph.com/yuriw-2018-06-13_14:55:30-rados-wip-yuri4-testing-2018-06-12-2037-jewel-distro-basic-smithi/2... Josh Durgin
10:38 PM Bug #24587 (Resolved): librados api aio tests race condition
Seen in a jewel integration branch with no OSD changes:
http://pulpito.ceph.com/yuriw-2018-06-12_22:32:43-rados-wi...
Josh Durgin
09:58 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
I did a run based on d9284902e1b2e292595696caf11cdead18acec96 which is a branch off of master.
http://pulpito.ceph...
David Zafman
07:24 PM Backport #24584 (Resolved): luminous: osdc: wrong offset in BufferHead
https://github.com/ceph/ceph/pull/22865 Nathan Cutler
07:24 PM Backport #24583 (Resolved): mimic: osdc: wrong offset in BufferHead
https://github.com/ceph/ceph/pull/22869 Nathan Cutler
06:02 PM Bug #19971 (Resolved): osd: deletes are performed inline during pg log processing
Nathan Cutler
06:01 PM Backport #22406 (Rejected): jewel: osd: deletes are performed inline during pg log processing
This change was deemed too invasive at such a late stage in Jewel's life cycle. Nathan Cutler
06:01 PM Backport #22405 (Rejected): jewel: store longer dup op information
This change was deemed too invasive at such a late stage in Jewel's life cycle. Nathan Cutler
06:00 PM Backport #22400 (Rejected): jewel: PR #16172 causing performance regression
This change was deemed too invasive at such a late stage in Jewel's life cycle. Nathan Cutler
04:10 PM Bug #24484 (Pending Backport): osdc: wrong offset in BufferHead
Jason Dillaman
11:54 AM Bug #24448: (Filestore) ABRT report for package ceph has reached 10 occurrences
OSD killed by signal, something like OOM incidents perhaps? John Spray
11:53 AM Bug #24450 (Duplicate): OSD Caught signal (Aborted)
http://tracker.ceph.com/issues/24423 Igor Fedotov
11:51 AM Bug #24559 (Fix Under Review): building error for QAT decompress
John Spray
02:10 AM Bug #24559 (Fix Under Review): building error for QAT decompress
The parameter of decompress changes from 'bufferlist::iterator' to 'bufferlist::const_iterator', but this change miss... Qiaowei Ren
11:34 AM Bug #24549: FileStore::read assert (ABRT report for package ceph has reached 1000 occurrences)
Presumably this is underlying FS failures tripping asserts rather than a bug (perhaps people using ZFS on centos, or ... John Spray
07:26 AM Backport #24355 (In Progress): mimic: osd: pg hard limit too easy to hit
https://github.com/ceph/ceph/pull/22621 Prashant D

06/18/2018

05:51 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
Sage Weil
11:45 AM Bug #24549 (Won't Fix): FileStore::read assert (ABRT report for package ceph has reached 1000 occ...
FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)
...
Kaleb KEITHLEY
07:11 AM Backport #24356 (In Progress): luminous: osd: pg hard limit too easy to hit
https://github.com/ceph/ceph/pull/22592 Prashant D

06/16/2018

02:16 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
How can an installed Mimic cluster (upgraded from Luminous) be repaired with this fix? Is there any way to make a starting OSD not requestin... Lazuardi Nasution

06/15/2018

11:40 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I've fixed it here: https://github.com/ceph/ceph/pull/22585 Paul Emmerich
01:36 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
Not sure if this is related, but for a few days, I'm not able to modify crushmap (like adding or removing OSD) on a l... Michel Nicol
09:23 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
Seeing the same here with a new Mimic cluster.
I purged a few OSDs (deployment went wrong) and now they can't star...
Wido den Hollander
03:56 PM Bug #24057: cbt fails to copy results to the archive dir
Neha Ojha
02:48 PM Bug #24531: Mimic MONs have slow/long running ops
... Wido den Hollander
02:41 PM Bug #24531: Mimic MONs have slow/long running ops
What's the output of "ceph versions" on this cluster?
We had issues in the lab with OSD failure reports not gettin...
Greg Farnum
02:20 PM Bug #24531 (Resolved): Mimic MONs have slow/long running ops
When setting up a Mimic 13.2.0 cluster I saw a message like this:... Wido den Hollander
08:39 AM Bug #24529 (New): monitor report empty client io rate when clock not synchronized
We ran rados bench while the cluster was in the WARN state and the clock was not synchronized. On the other hand, we watched the io speed from the resu... hikdata hik
05:08 AM Backport #24351 (In Progress): luminous: slow mon ops from osd_failure
https://github.com/ceph/ceph/pull/22568 Prashant D

06/14/2018

10:21 PM Bug #21142 (Need More Info): OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Sage Weil
10:20 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Tim, Dexter, is this something that is reproducible in your environment? I haven't seen this one, which makes me ver... Sage Weil
07:41 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh

This might be caused by 52dd99e3011bfc787042fe105e02c11b28867c4c which was included in https://github.com/ceph/ceph...
David Zafman
07:27 PM Bug #24526: Mimic OSDs do not start after deleting some pools with size=1
I solved this issue by monkey-patching OSD code:... Vitaliy Filippov
03:48 PM Bug #24526: Mimic OSDs do not start after deleting some pools with size=1
P.S: This happened just after deleting some pool with size=1 - several OSDs died immediately and the latest error mes... Vitaliy Filippov
03:24 PM Bug #24526 (New): Mimic OSDs do not start after deleting some pools with size=1
After some amount of test actions involving creating pools with size=min_size=1 and then deleting them, most OSDs fai... Vitaliy Filippov
07:06 PM Feature #24527 (New): Need a pg query that doens't include invalid peer information

Some fields in the peer info remain unchanged after a peer transitions from being the primary. This information ma...
David Zafman
01:13 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I am getting the same issue.
I also upgraded from Luminous to Mimic.
I used: ceph osd purge
Grant Slater
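A sketch of the reported sequence, pieced together from this thread (osd id and device are placeholders):

<pre>
# remove the old osd, then recreate it with the same id
ceph osd purge 7 --yes-i-really-mean-it
ceph-volume lvm prepare --bluestore --data /dev/sdX
# on startup the new osd then fails with "failed to load OSD map for epoch X, got 0 bytes"
</pre>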
11:48 AM Backport #24198 (Resolved): luminous: mon: slow op on log message
Nathan Cutler
11:47 AM Backport #24216 (Resolved): luminous: "process (unknown)" in ceph logs
Nathan Cutler
11:46 AM Bug #24167 (Resolved): Module 'balancer' has failed: could not find bucket -14
Nathan Cutler
11:46 AM Backport #24213 (Resolved): mimic: Module 'balancer' has failed: could not find bucket -14
Nathan Cutler
11:45 AM Backport #24214 (Resolved): luminous: Module 'balancer' has failed: could not find bucket -14
Nathan Cutler
05:54 AM Backport #24332 (In Progress): mimic: local_reserver double-reservation of backfilled pg
https://github.com/ceph/ceph/pull/22559 Prashant D

06/13/2018

10:01 PM Backport #24198: luminous: mon: slow op on log message
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22109
merged
Yuri Weinstein
10:00 PM Backport #24216: luminous: "process (unknown)" in ceph logs
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22290
merged
Yuri Weinstein
09:59 PM Backport #24214: luminous: Module 'balancer' has failed: could not find bucket -14
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22308
merged
Yuri Weinstein
08:13 PM Bug #24515 (New): "[WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, mon.c ha...
This seems to be rhel specific
Run: http://pulpito.ceph.com/yuriw-2018-06-12_21:09:43-fs-master-distro-basic-smith...
Yuri Weinstein
05:19 PM Bug #23966 (Resolved): Deleting a pool with active notify linger ops can result in seg fault
Nathan Cutler
05:19 PM Backport #24059 (Resolved): luminous: Deleting a pool with active notify linger ops can result in...
Nathan Cutler
04:46 PM Backport #24468 (In Progress): mimic: tell ... config rm <foo> not idempotent
Nathan Cutler
04:35 PM Backport #24245 (Resolved): luminous: Manager daemon y is unresponsive during teuthology cluster ...
Nathan Cutler
04:34 PM Backport #24374 (Resolved): luminous: mon: auto compaction on rocksdb should kick in more often
Nathan Cutler
12:56 PM Bug #24511 (Duplicate): osd crushed at thread_name:safe_timer
h1. ENV
*ceph version*...
Lei Liu
11:29 AM Bug #23049: ceph Status shows only WARN when traffic to cluster fails
Hi,
which release is the fix expected in?
Thanks,
Nokia ceph-users
10:16 AM Backport #24501 (In Progress): luminous: osd: eternal stuck PG in 'unfound_recovery'
Nathan Cutler
10:16 AM Backport #24500 (In Progress): mimic: osd: eternal stuck PG in 'unfound_recovery'
Nathan Cutler

06/12/2018

08:01 AM Backport #24501 (Resolved): luminous: osd: eternal stuck PG in 'unfound_recovery'
https://github.com/ceph/ceph/pull/22546 Nathan Cutler
08:01 AM Backport #24500 (Resolved): mimic: osd: eternal stuck PG in 'unfound_recovery'
https://github.com/ceph/ceph/pull/22545 Nathan Cutler
08:00 AM Backport #24495 (Resolved): luminous: osd: segv in Session::have_backoff
https://github.com/ceph/ceph/pull/22729 Nathan Cutler
08:00 AM Backport #24494 (Resolved): mimic: osd: segv in Session::have_backoff
https://github.com/ceph/ceph/pull/22730 Nathan Cutler
03:22 AM Bug #24486 (Pending Backport): osd: segv in Session::have_backoff
Sage Weil

06/11/2018

09:32 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I am going to add this test for upgrade as well, steps to recreate... Vasu Kulkarni
04:19 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I have also experienced this issue while continuing the Bluestore conversion of OSDs on my Ceph cluster, after carryi... Gavin Baker
02:16 PM Backport #24059: luminous: Deleting a pool with active notify linger ops can result in seg fault
Casey Bodley wrote:
> https://github.com/ceph/ceph/pull/22143
merged
Yuri Weinstein
02:33 AM Bug #24487: osd: choose_acting loop
It looks like the "choose_async_recovery_ec candidates by cost are: 178,2(0)" line is different in the second case.. ... Sage Weil
01:45 AM Bug #24487 (Resolved): osd: choose_acting loop
ec pg looping between [2,3,0,1] and [-,3,0,1].
osd.3 says...
Sage Weil

06/10/2018

06:41 PM Bug #24486 (Fix Under Review): osd: segv in Session::have_backoff
https://github.com/ceph/ceph/pull/22497 Sage Weil
06:34 PM Bug #24486 (Resolved): osd: segv in Session::have_backoff
... Sage Weil
04:41 PM Bug #24485 (Resolved): LibRadosTwoPoolsPP.ManifestUnset failure
... Sage Weil
03:30 PM Bug #24484 (Fix Under Review): osdc: wrong offset in BufferHead
Kefu Chai
03:15 PM Bug #24484: osdc: wrong offset in BufferHead
this bug will lead to an exception "buffer::end_of_buffer" which is thrown in function "buffer::list::substr_of"
Thi...
dongdong tao
03:08 PM Bug #24484: osdc: wrong offset in BufferHead
PR: https://github.com/ceph/ceph/pull/22495 dongdong tao
03:07 PM Bug #24484 (Resolved): osdc: wrong offset in BufferHead
The offset of BufferHead should be "opos - bh->start()" dongdong tao
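In other words, for a read at object position opos that lands inside a BufferHead bh, the intra-buffer offset is (illustrative numbers, not from the report):

\[ \mathrm{offset} = \mathrm{opos} - bh.\mathrm{start}(), \qquad \text{e.g. } 6144 - 4096 = 2048. \]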
02:12 AM Backport #24329 (In Progress): mimic: assert manager.get_num_active_clean() == pg_num on rados/si...
Kefu Chai

06/09/2018

07:21 PM Bug #24321 (Pending Backport): assert manager.get_num_active_clean() == pg_num on rados/singleton...
Sage Weil
05:56 AM Bug #24321 (Fix Under Review): assert manager.get_num_active_clean() == pg_num on rados/singleton...
https://github.com/ceph/ceph/pull/22485 Kefu Chai
06:50 PM Bug #22462: mon: unknown message type 1537 in luminous->mimic upgrade tests
Maybe i have the same issue during upgrade Jewel->Luminous http://tracker.ceph.com/issues/24481?next_issue_id=24480&p... Aleksandr Rudenko
02:23 PM Bug #24373 (Pending Backport): osd: eternal stuck PG in 'unfound_recovery'
Kefu Chai
11:20 AM Backport #24478 (Resolved): luminous: read object attrs failed at EC recovery
https://github.com/ceph/ceph/pull/24327 Nathan Cutler
11:18 AM Backport #24473 (Resolved): mimic: cosbench stuck at booting cosbench driver
https://github.com/ceph/ceph/pull/22887 Nathan Cutler
11:18 AM Backport #24472 (Resolved): mimic: Ceph-osd crash when activate SPDK
https://github.com/ceph/ceph/pull/22684 Nathan Cutler
11:18 AM Backport #24471 (Resolved): luminous: Ceph-osd crash when activate SPDK
https://github.com/ceph/ceph/pull/22686 Nathan Cutler
11:18 AM Backport #24468 (Resolved): mimic: tell ... config rm <foo> not idempotent
https://github.com/ceph/ceph/pull/22552 Nathan Cutler
06:07 AM Bug #24452 (Resolved): Backfill hangs in a test case in master not mimic
Kefu Chai

06/08/2018

11:03 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I can't reproduce this on any new Mimic cluster, it only happens on clusters upgraded from Luminous (which is why we ... Paul Emmerich
09:04 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I'm trying to make new OSDs with ceph-volume osd create --dmcrypt --bluestore --data /dev/sdg and am getting the same... Michael Sudnick
07:05 PM Bug #24454 (Duplicate): failed to recover before timeout expired
#24452 Sage Weil
12:29 PM Bug #24454 (Duplicate): failed to recover before timeout expired
tons of this on current master
http://pulpito.ceph.com/kchai-2018-06-06_04:56:43-rados-wip-kefu-testing-2018-06-06...
Sage Weil
07:05 PM Bug #24452 (Fix Under Review): Backfill hangs in a test case in master not mimic
https://github.com/ceph/ceph/pull/22478 Sage Weil
02:48 PM Bug #24452: Backfill hangs in a test case in master not mimic

Final messages on primary during backfill about pg 1.0....
David Zafman
04:57 AM Bug #24452 (Resolved): Backfill hangs in a test case in master not mimic

../qa/run-standalone.sh "osd-backfill-stats.sh TEST_backfill_down_out" 2>&1 | tee obs.log
This test times out wa...
David Zafman
02:34 PM Backport #23912: luminous: mon: High MON cpu usage when cluster is changing
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21968
merged
Yuri Weinstein
02:33 PM Backport #24245: luminous: Manager daemon y is unresponsive during teuthology cluster teardown
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22331
merged
Yuri Weinstein
02:31 PM Backport #24374: luminous: mon: auto compaction on rocksdb should kick in more often
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22360
merged
Yuri Weinstein
08:18 AM Bug #23352: osd: segfaults under normal operation
Experiencing the a safe_timer segfault with a freshly deployed cluster. No data on the cluster yet. Just an empty poo... Vangelis Tasoulas

06/07/2018

03:20 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
We are also seeing this when creating OSDs with IDs that existed previously.
I verified that the old osd was delet...
Paul Emmerich
01:21 PM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
https://github.com/ceph/ceph/pull/22456 Sage Weil
01:14 PM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
Okay, I see the problem. Two fixes: first, reset every pg on down->up (simpler approach), but the bigger issue is th... Sage Weil
12:58 PM Bug #24450: OSD Caught signal (Aborted)
I have the same problem.
http://tracker.ceph.com/issues/24423
Sergey Malinin
12:03 PM Bug #24450 (Duplicate): OSD Caught signal (Aborted)
Hi,
I have done a rolling_upgrade to mimic with ceph-ansible. It works perfectly! Now, I want to deploy new OSDs, bu...
Peter Schulz
11:46 AM Bug #24448 (Won't Fix): (Filestore) ABRT report for package ceph has reached 10 occurrences
https://retrace.fedoraproject.org/faf/reports/bthash/fe768f98e5fff65f0c850668c4bdae8d4da7e086/
https://retrace.fedor...
Kaleb KEITHLEY

06/06/2018

09:11 PM Bug #24264 (Closed): ssd-primary crush rule not working as intended
I don't think there's a good way to express that requirement in the current crush language. The rule in the docs does... Josh Durgin
09:06 PM Bug #24362 (Triaged): ceph-objectstore-tool incorrectly invokes crush_location_hook
Seems like the way to fix this is to stop ceph-objectstore-tool from trying to use the crush location hook at all.
...
Josh Durgin
07:15 AM Bug #23145: OSD crashes during recovery of EC pg
-3> 2018-06-06 15:00:40.462930 7fffddb25700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2... Yong Wang
02:45 AM Bug #23145: OSD crashes during recovery of EC pg
@Sage Weil
@Zengran Zhang
we hit the same issue, and the osd crash has not recovered until now.
env is 12.2.5 ec 2+1 b...
Yong Wang
06:02 AM Backport #24293 (In Progress): jewel: mon: slow op on log message
https://github.com/ceph/ceph/pull/22431 Prashant D
02:34 AM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
Attached full log (download ceph-osd.3.log.gz).
Points are:...
Kouya Shimura
12:33 AM Bug #24371 (Pending Backport): Ceph-osd crash when activate SPDK
Kefu Chai

06/05/2018

05:34 PM Bug #24365 (Pending Backport): cosbench stuck at booting cosbench driver
Neha Ojha
01:33 AM Bug #24365 (Fix Under Review): cosbench stuck at booting cosbench driver
https://github.com/ceph/ceph/pull/22405 Neha Ojha
04:04 PM Bug #24408 (Pending Backport): tell ... config rm <foo> not idempotent
Kefu Chai
11:00 AM Bug #24423 (Resolved): failed to load OSD map for epoch X, got 0 bytes
After upgrading to Mimic I deleted a non-lvm OSD and recreated it with 'ceph-volume lvm prepare --bluestore --data /d... Sergey Malinin
10:37 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
The same as https://tracker.ceph.com/issues/21475, and I have already set bluestore_deferred_throttle_bytes = 0
bluest...
鹏 张
10:31 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
2018-06-05T17:46:28.273183+08:00 node54 ceph-osd: /work/build/rpmbuild/BUILD/infinity-3.2.5/src/os/bluestore/BlueStor... 鹏 张
10:31 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
鹏 张 wrote:
> ceph version: 12.2.5
> data pool uses EC mode 2 + 1.
> When one osd is restarted, it crashes and restar...
鹏 张
10:26 AM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
1.-45> 2018-06-05 17:47:56.886142 7f8972974700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2)... 鹏 张
10:25 AM Bug #24422 (Duplicate): Ceph OSDs crashing in BlueStore::queue_transactions() using EC
ceph version: 12.2.5
data pool uses EC mode 3 + 1.
When one osd is restarted, it crashes, and more and more osds crash and restart.
...
鹏 张
04:42 AM Bug #24419 (Won't Fix): ceph-objectstore-tool unable to open mon store
Hi everyone,
I use luminous v12.2.5, and I tried to recover the monitor database from the osds.
I proceeded step by step acc...
dovefi Z
03:32 AM Backport #24291 (In Progress): jewel: common: JSON output from rados bench write has typo in max_...
https://github.com/ceph/ceph/pull/22407 Prashant D
02:37 AM Bug #23875: Removal of snapshot with corrupt replica crashes osd

If update_snap_map() ignores the error from remove_oid() we still crash because an op from the primary related to...
David Zafman
02:20 AM Backport #24292 (In Progress): mimic: common: JSON output from rados bench write has typo in max_...
https://github.com/ceph/ceph/pull/22406 Prashant D

06/04/2018

06:32 PM Bug #24368: osd: should not restart on permanent failures
It would, but the previous settings were there for a reason so I'm not sure if it's feasible to backport this for cep... Greg Farnum
05:10 PM Bug #24371 (Fix Under Review): Ceph-osd crash when activate SPDK
Greg Farnum
04:00 PM Bug #24408 (Fix Under Review): tell ... config rm <foo> not idempotent
https://github.com/ceph/ceph/pull/22395 Sage Weil
03:56 PM Bug #24408 (Resolved): tell ... config rm <foo> not idempotent
... Sage Weil
02:56 PM Backport #24407 (In Progress): mimic: read object attrs failed at EC recovery
Kefu Chai
02:56 PM Backport #24407 (Resolved): mimic: read object attrs failed at EC recovery
https://github.com/ceph/ceph/pull/22394 Kefu Chai
02:54 PM Bug #24406 (Resolved): read object attrs failed at EC recovery
https://github.com/ceph/ceph/pull/22196 Kefu Chai
02:18 PM Backport #24290 (In Progress): luminous: common: JSON output from rados bench write has typo in m...
https://github.com/ceph/ceph/pull/22391 Prashant D
11:53 AM Bug #24366 (Pending Backport): omap_digest handling still not correct
Kefu Chai
06:27 AM Bug #23352: osd: segfaults under normal operation
Looking at the crash in http://tracker.ceph.com/issues/23352#note-14 there's a fairly glaring problem.... Brad Hubbard
12:14 AM Bug #23352: osd: segfaults under normal operation
Hi Kjetil,
Sure, worth a look, but AFAICT all access is protected by SafeTimers locks.
Brad Hubbard
02:08 AM Backport #24258 (In Progress): luminous: crush device class: Monitor Crash when moving Bucket int...
https://github.com/ceph/ceph/pull/22381 Prashant D

06/02/2018

12:04 AM Bug #24365 (In Progress): cosbench stuck at booting cosbench driver
Two things caused this issue:
1. cosbench requires openjdk-8. The cbt task does install this dependency, but we al...
Neha Ojha
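For reference, the missing dependency on an apt-based test node would be installed along these lines (a sketch; the cbt task normally handles this itself):

<pre>
# assumption: Debian/Ubuntu node; cbt is expected to install this dependency itself
sudo apt-get install -y openjdk-8-jdk
</pre>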

06/01/2018

08:05 PM Bug #23352: osd: segfaults under normal operation
Brad Hubbard wrote:
> I've confirmed that in all of the SafeTimer segfaults the 'schedule' multimap is empty, indica...
Kjetil Joergensen
06:01 PM Bug #24368: osd: should not restart on permanent failures
Sounds like something that would be useful in our stable releases - Greg, do you agree? Nathan Cutler
05:56 PM Backport #24360 (Need More Info): luminous: osd: leaked Session on osd.7
Do Not Backport For Now
see https://github.com/ceph/ceph/pull/22339#issuecomment-393574371 for details
Nathan Cutler
05:44 PM Backport #24383 (Resolved): mimic: osd: stray osds in async_recovery_targets cause out of order ops
https://github.com/ceph/ceph/pull/22889 Nathan Cutler
05:28 PM Backport #24381 (Resolved): luminous: omap_digest handling still not correct
https://github.com/ceph/ceph/pull/22375 David Zafman
05:28 PM Backport #24380 (Resolved): mimic: omap_digest handling still not correct
https://github.com/ceph/ceph/pull/22374 David Zafman
08:02 AM Bug #24342: Monitor's routed_requests leak
Greg Farnum wrote:
> What version are you running? The MRoute handling is all pretty old; though we've certainly dis...
Xuehan Xu
07:16 AM Bug #24373 (Fix Under Review): osd: eternal stuck PG in 'unfound_recovery'
Mykola Golub
05:22 AM Bug #24373: osd: eternal stuck PG in 'unfound_recovery'
https://github.com/ceph/ceph/pull/22358
Kouya Shimura
04:57 AM Bug #24373 (Resolved): osd: eternal stuck PG in 'unfound_recovery'
A PG might be eternally stuck in 'unfound_recovery' after some OSDs are marked down.
For example, the following st...
Kouya Shimura
06:12 AM Backport #24375 (In Progress): mimic: mon: auto compaction on rocksdb should kick in more often
Kefu Chai
06:11 AM Backport #24375 (Resolved): mimic: mon: auto compaction on rocksdb should kick in more often
https://github.com/ceph/ceph/pull/22361 Kefu Chai
06:10 AM Backport #24374 (In Progress): luminous: mon: auto compaction on rocksdb should kick in more often
Kefu Chai
06:08 AM Backport #24374 (Resolved): luminous: mon: auto compaction on rocksdb should kick in more often
https://github.com/ceph/ceph/pull/22360 Kefu Chai
06:08 AM Bug #24361 (Pending Backport): auto compaction on rocksdb should kick in more often
Kefu Chai
04:47 AM Bug #24371: Ceph-osd crash when activate SPDK
This is a bug in NVMEDevice; the fix has been committed.
Please review PR https://github.com/ceph/ceph...
Anonymous
02:02 AM Bug #24371: Ceph-osd crash when activate SPDK
I'm working on the issue. Anonymous
02:01 AM Bug #24371 (Resolved): Ceph-osd crash when activate SPDK
Enable SPDK and configure bluestore as mentioned in http://docs.ceph.com/docs/master/rados/configuration/bluestore-co... Anonymous
02:56 AM Feature #24363: Configure DPDK with mellanox NIC
Next, compilation passes, but none of the binaries can run.
Output error:
EAL: VFIO_RESOURCE_LIST tailq is already registere...
YongSheng Zhang
02:38 AM Feature #24363: Configure DPDK with mellanox NIC
Log details:
mellanox NIC over fabric
When compiling, output errors:
1. lacking numa and cryptopp libraries
I ...
YongSheng Zhang
12:23 AM Feature #24363: Configure DPDK with mellanox NIC
Addendum:
NIC over optical fiber
YongSheng Zhang
12:07 AM Bug #24160 (Resolved): Monitor down when large store data needs to compact triggered by ceph tell...
Kefu Chai

05/31/2018

11:34 PM Bug #24368 (In Progress): osd: should not restart on permanent failures
https://github.com/ceph/ceph/pull/22349 has the simple restart interval change. Will investigate the options for cond... Greg Farnum
11:25 PM Bug #24368: osd: should not restart on permanent failures
See https://www.freedesktop.org/software/systemd/man/systemd.service.html#Restart= for the details on Restart options. Greg Farnum
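A hedged sketch of what such a restart policy could look like as a systemd drop-in (values are illustrative, not necessarily the ones chosen in the PR):

<pre>
# sketch only: edit a drop-in override for the OSD unit; values are illustrative
sudo systemctl edit ceph-osd@.service
# then, in the drop-in:
#   [Service]
#   Restart=on-failure
#   StartLimitInterval=30min
#   StartLimitBurst=3
</pre>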
11:17 PM Bug #24368 (Resolved): osd: should not restart on permanent failures
Last week at OpenStack I heard a few users report OSDs were not failing hard and fast as they should be on disk issue... Greg Farnum
07:01 PM Bug #24366 (In Progress): omap_digest handling still not correct
https://github.com/ceph/ceph/pull/22346 David Zafman
05:39 PM Bug #24366 (Resolved): omap_digest handling still not correct

When running bluestore the object info data_digest is not needed. In that case the omap_digest handling is still b...
David Zafman
06:08 PM Bug #24349 (Pending Backport): osd: stray osds in async_recovery_targets cause out of order ops
Josh Durgin
12:51 AM Bug #24349: osd: stray osds in async_recovery_targets cause out of order ops
https://github.com/ceph/ceph/pull/22330 Josh Durgin
12:46 AM Bug #24349 (Resolved): osd: stray osds in async_recovery_targets cause out of order ops
Related to https://tracker.ceph.com/issues/23827
http://pulpito.ceph.com/yuriw-2018-05-24_17:07:20-powercycle-mast...
Neha Ojha
05:07 PM Bug #24365 (Resolved): cosbench stuck at booting cosbench driver
... Neha Ojha
03:54 PM Bug #24342: Monitor's routed_requests leak
What version are you running? The MRoute handling is all pretty old; though we've certainly discovered a number of le... Greg Farnum
02:17 PM Feature #24363 (New): Configure DPDK with mellanox NIC
Hi all
Does ceph-13.1.0 support DPDK on a mellanox NIC?
I found many issues when compiling. I even though handle t...
YongSheng Zhang
01:22 PM Bug #24362 (Triaged): ceph-objectstore-tool incorrectly invokes crush_location_hook
Ceph release being used: 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
/etc/ceph/ceph.conf c...
Roman Chebotarev
11:50 AM Backport #24359 (In Progress): mimic: osd: leaked Session on osd.7
Kefu Chai
07:39 AM Backport #24359 (Resolved): mimic: osd: leaked Session on osd.7
https://github.com/ceph/ceph/pull/22339 Nathan Cutler
09:40 AM Bug #24361 (Fix Under Review): auto compaction on rocksdb should kick in more often
https://github.com/ceph/ceph/pull/22337 Kefu Chai
09:07 AM Bug #24361 (Resolved): auto compaction on rocksdb should kick in more often
in rocksdb, by default, "max_bytes_for_level_base" is 256MB, "max_bytes_for_level_multiplier" is 10. so with this set... Kefu Chai
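Arithmetic only, from the defaults quoted above, the per-level size thresholds are:

\[ \mathrm{max\_bytes}(L_n) = 256\,\mathrm{MB} \times 10^{\,n-1}: \quad L_1 = 256\,\mathrm{MB},\; L_2 = 2.56\,\mathrm{GB},\; L_3 = 25.6\,\mathrm{GB},\; L_4 = 256\,\mathrm{GB}. \]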
07:39 AM Backport #24360 (Resolved): luminous: osd: leaked Session on osd.7
https://github.com/ceph/ceph/pull/29859 Nathan Cutler
07:38 AM Backport #24350 (In Progress): mimic: slow mon ops from osd_failure
Nathan Cutler
07:37 AM Backport #24350 (Resolved): mimic: slow mon ops from osd_failure
https://github.com/ceph/ceph/pull/22297 Nathan Cutler
07:38 AM Backport #24356 (Resolved): luminous: osd: pg hard limit too easy to hit
https://github.com/ceph/ceph/pull/22592 Nathan Cutler
07:38 AM Backport #24355 (Resolved): mimic: osd: pg hard limit too easy to hit
https://github.com/ceph/ceph/pull/22621 Nathan Cutler
07:37 AM Backport #24351 (Resolved): luminous: slow mon ops from osd_failure
https://github.com/ceph/ceph/pull/22568 Nathan Cutler
05:31 AM Bug #20924 (Pending Backport): osd: leaked Session on osd.7
I think https://github.com/ceph/ceph/pull/22292 indeed addresses this issue
https://github.com/ceph/ceph/pull/22384
Kefu Chai
04:51 AM Backport #24246 (In Progress): mimic: Manager daemon y is unresponsive during teuthology cluster ...
https://github.com/ceph/ceph/pull/22333 Prashant D
02:55 AM Backport #24245 (In Progress): luminous: Manager daemon y is unresponsive during teuthology clust...
https://github.com/ceph/ceph/pull/22331 Prashant D

05/30/2018

11:31 PM Bug #24160 (Fix Under Review): Monitor down when large store data needs to compact triggered by c...
Josh Durgin
10:45 PM Bug #23830: rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
This looks like a similar failure: http://pulpito.ceph.com/nojha-2018-05-30_20:43:02-rados-wip-async-up2-2018-05-30-d... Neha Ojha
02:17 PM Bug #24342: Monitor's routed_requests leak
It seems that this problem has been fixed by https://github.com/ceph/ceph/commit/39e06ef8f070e136e54452bdea3f6105cd79... Xuehan Xu
01:10 PM Bug #24342 (Closed): Monitor's routed_requests leak
Joao Eduardo Luis
12:09 PM Bug #24342: Monitor's routed_requests leak
Sorry, it seems that the latest version doesn't have this problem. Really sorry. please close this. Xuehan Xu
09:36 AM Bug #24342: Monitor's routed_requests leak
https://github.com/ceph/ceph/pull/22315 Xuehan Xu
08:54 AM Bug #24342 (Closed): Monitor's routed_requests leak
Recently, we found that, in our non-leader monitors, there are a lot of routed requests that have not been recycled, a... Xuehan Xu
01:58 PM Bug #24327: osd: segv in pg_log_entry_t::encode()
Sage Weil wrote:
> This crash doesn't look familiar, and it's not clear to me what might cause segfault here. Do yo...
frank lin
01:48 PM Bug #24327 (Need More Info): osd: segv in pg_log_entry_t::encode()
This crash doesn't look familiar, and it's not clear to me what might cause segfault here. Do you have a core file? Sage Weil
01:55 PM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
Josh and I noticed this by code inspection. I'm nailing down out of space handling nits in the kernel client and wan... Ilya Dryomov
01:46 PM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
This is somewhat by design (or lack thereof)... the fail-safe check is there to prevent us from writing when we are *... Sage Weil
05:40 AM Backport #24215 (In Progress): mimic: "process (unknown)" in ceph logs
https://github.com/ceph/ceph/pull/22311 Prashant D
03:29 AM Backport #24214 (In Progress): luminous: Module 'balancer' has failed: could not find bucket -14
https://github.com/ceph/ceph/pull/22308 Prashant D

05/29/2018

11:01 PM Feature #23979: Limit pg log length during recovery/backfill so that we don't run out of memory.
Initial testing is referenced here: https://github.com/ceph/ceph/pull/21508 Josh Durgin
10:59 PM Bug #24243 (Pending Backport): osd: pg hard limit too easy to hit
https://github.com/ceph/ceph/pull/22187 Josh Durgin
10:59 PM Bug #24304 (Fix Under Review): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
wrong bug Josh Durgin
10:58 PM Bug #24304 (Pending Backport): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
https://github.com/ceph/ceph/pull/22187 Josh Durgin
10:03 PM Feature #11601: osd: share cached osdmaps across osd daemons
A vague possibility that the future seastar-based OSD may run each logical disk OSD inside a single process, which co... Greg Farnum
07:38 PM Bug #24339 (New): FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in sca...
FULL_FORCE ops are dropped if fail-safe full check fails in do_op(). scan_requests() uses op->respects_full() which ... Ilya Dryomov
06:49 PM Bug #23646 (Resolved): scrub interaction with HEAD boundaries and clones is broken
David Zafman
01:11 PM Bug #24322 (Pending Backport): slow mon ops from osd_failure
mimic: https://github.com/ceph/ceph/pull/22297 Kefu Chai
12:53 PM Backport #24328 (In Progress): luminous: assert manager.get_num_active_clean() == pg_num on rados...
Kefu Chai
09:40 AM Backport #24328 (Resolved): luminous: assert manager.get_num_active_clean() == pg_num on rados/si...
https://github.com/ceph/ceph/pull/22296 Nathan Cutler
12:47 PM Backport #24329 (Resolved): mimic: assert manager.get_num_active_clean() == pg_num on rados/singl...
Kefu Chai
09:40 AM Backport #24329 (Resolved): mimic: assert manager.get_num_active_clean() == pg_num on rados/singl...
https://github.com/ceph/ceph/pull/22492 Nathan Cutler
10:02 AM Bug #22530 (Resolved): pool create cmd's expected_num_objects is not correctly interpreted
Nathan Cutler
10:02 AM Backport #23316 (Resolved): jewel: pool create cmd's expected_num_objects is not correctly interp...
Nathan Cutler
10:01 AM Backport #24058 (Resolved): jewel: Deleting a pool with active notify linger ops can result in se...
Nathan Cutler
09:59 AM Backport #24244 (Resolved): jewel: osd/EC: slow/hung ops in multimds suite test
Nathan Cutler
09:59 AM Backport #24244 (In Progress): jewel: osd/EC: slow/hung ops in multimds suite test
Nathan Cutler
09:56 AM Backport #24294 (Resolved): mimic: control-c on ceph cli leads to segv
Nathan Cutler
09:55 AM Backport #24294 (In Progress): mimic: control-c on ceph cli leads to segv
Nathan Cutler
09:52 AM Backport #24256 (Resolved): mimic: osd: Assertion `!node_algorithms::inited(this->priv_value_tra...
Nathan Cutler
09:41 AM Backport #24333 (Resolved): luminous: local_reserver double-reservation of backfilled pg
https://github.com/ceph/ceph/pull/23493 Nathan Cutler
09:41 AM Backport #24332 (Resolved): mimic: local_reserver double-reservation of backfilled pg
https://github.com/ceph/ceph/pull/22559 Nathan Cutler
08:26 AM Feature #24231: librbd/libcephfs/librgw should ignore rados_mon/osd_op_timeouts options (requires...
libcephfs doesn't use librados, so it doesn't need any changes.
The rados_mon_op_timeout affects anything that use...
John Spray
07:55 AM Bug #20924: osd: leaked Session on osd.7
https://github.com/ceph/ceph/pull/22292 might address this issue. Kefu Chai
07:37 AM Bug #24327 (Need More Info): osd: segv in pg_log_entry_t::encode()
The affected osd restarted itself and everything seemed fine afterwards. But what is the cause of the crash?... frank lin
06:37 AM Backport #24204 (In Progress): mimic: LibRadosMiscPool.PoolCreationRace segv
https://github.com/ceph/ceph/pull/22291 Prashant D
06:20 AM Backport #24216 (In Progress): luminous: "process (unknown)" in ceph logs
https://github.com/ceph/ceph/pull/22290 Prashant D
03:32 AM Bug #24321: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max-pg-per-osd...
mimic: https://github.com/ceph/ceph/pull/22288 Kefu Chai
03:31 AM Bug #24321 (Pending Backport): assert manager.get_num_active_clean() == pg_num on rados/singleton...
Kefu Chai

05/28/2018

10:54 PM Feature #24176: osd: add command to drop OSD cache
Anyone looking into this? If not, I can pick it up. Mohamad Gebai
03:21 PM Bug #24145 (Duplicate): osdmap decode error in rados/standalone/*
Kefu Chai
03:19 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
/a/kchai-2018-05-28_09:21:54-rados-wip-kefu-testing-2018-05-28-1113-distro-basic-smithi/2601187
on mimic branch.
...
Kefu Chai
11:51 AM Bug #24321 (Fix Under Review): assert manager.get_num_active_clean() == pg_num on rados/singleton...
https://github.com/ceph/ceph/pull/22275 Kefu Chai
05:28 AM Bug #23352: osd: segfaults under normal operation
I've confirmed that in all of the SafeTimer segfaults the 'schedule' multimap is empty, indicating this is the last e... Brad Hubbard
05:16 AM Bug #23352: osd: segfaults under normal operation
If we look at the coredump from 23585 and compare it to this message.
[117735.930255] safe_timer[52573]: segfault ...
Brad Hubbard
04:32 AM Bug #24023 (Duplicate): Segfault on OSD in 12.2.5
Duplicate of 23352 Brad Hubbard
04:30 AM Bug #23564 (Duplicate): OSD Segfaults
Duplicate of 23352 Brad Hubbard
04:28 AM Bug #23585 (Duplicate): osd: safe_timer segfault
Duplicate of 23352 Brad Hubbard
02:47 AM Bug #24160: Monitor down when large store data needs to compact triggered by ceph tell mon.xx com...
PR :
https://github.com/ceph/ceph/pull/22056/
相洋 于

05/27/2018

05:58 PM Feature #11601: osd: share cached osdmaps across osd daemons
Attached the file CephScaleTestMarch2015.pdf
Do we have any plan for this, guys?
Chuong Le
02:55 PM Bug #24322 (Fix Under Review): slow mon ops from osd_failure
https://github.com/ceph/ceph/pull/22259 Sage Weil
02:46 PM Bug #23585: osd: safe_timer segfault
Hi Brad, sure, thanks. Alex Gorbachev

05/26/2018

01:51 PM Bug #24322 (Resolved): slow mon ops from osd_failure
... Sage Weil
01:39 PM Bug #24162 (Resolved): control-c on ceph cli leads to segv
Sage Weil
01:38 PM Bug #24219 (Resolved): osd: InProgressOp freed by on_change(); in-flight op may use-after-free in...
Sage Weil
01:36 PM Bug #24321 (Resolved): assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max...
... Sage Weil
01:29 PM Bug #24320 (Resolved): out of order reply and/or osd assert with set-chunks-read.yaml
... Sage Weil
02:00 AM Bug #23614 (Pending Backport): local_reserver double-reservation of backfilled pg
Josh Durgin
01:59 AM Bug #23490 (Duplicate): luminous: osd: double recovery reservation for PG when EIO injected (whil...
Josh Durgin
01:25 AM Bug #23352: osd: segfaults under normal operation
Thanks,
That gives us seven cores across 12.2.4-12.2.5 on Xenial and Centos and one core from the MMgrReport::enco...
Brad Hubbard
12:35 AM Bug #23431 (Duplicate): OSD Segmentation fault in thread_name:safe_timer
Closing as a duplicate of #23352 where we are focussing. Brad Hubbard
12:33 AM Bug #23564: OSD Segfaults
Since the stack from this core is the following, can we also close this as a duplicate of 23352?
(gdb) bt
#0 0x00...
Brad Hubbard
12:31 AM Bug #23585: osd: safe_timer segfault
Alex,
Can we close this bug also as a duplicate of 23352?
Brad Hubbard
12:28 AM Bug #24023: Segfault on OSD in 12.2.5
Alex,
Why are we running multiple trackers for the same issue?
Can we close this as a duplicate?
Brad Hubbard

05/25/2018

10:25 PM Bug #23614 (Fix Under Review): local_reserver double-reservation of backfilled pg
Explanation of the problem and resolution included in the pull request.
https://github.com/ceph/ceph/pull/22255
Neha Ojha
10:06 PM Bug #24219 (Pending Backport): osd: InProgressOp freed by on_change(); in-flight op may use-after...
Sage Weil
09:25 PM Bug #24304 (Fix Under Review): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
This is due to the fast-path decoding for object_stat_sum_t not being updated in the backport. Fix: https://github.co... Josh Durgin
04:22 PM Bug #24304 (Closed): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
This appears to be specific to a downstream build, closing. John Spray
12:29 PM Bug #24304 (Resolved): MgrStatMonitor decode crash on 12.2.4->12.2.5 upgrade
... John Spray
03:08 PM Backport #24297 (Resolved): mimic: RocksDB compression is not supported at least on Debian.
Kefu Chai
11:03 AM Backport #24297 (Resolved): mimic: RocksDB compression is not supported at least on Debian.
https://github.com/ceph/ceph/pull/22183 Nathan Cutler
03:06 PM Bug #24023: Segfault on OSD in 12.2.5
ALso posted this in bug http://tracker.ceph.com/issues/23352
Hi Brad, we had one too just now, core dump and log:
...
Alex Gorbachev
08:04 AM Bug #24023: Segfault on OSD in 12.2.5
Hi,
I've noticed a similar/same segfault on my deployment. Random segfaults on random osds appear under load or wit...
Jan Krcmar
03:05 PM Bug #23352: osd: segfaults under normal operation
Hi Brad, we had one too just now, core dump and log:
https://drive.google.com/open?id=1t1jfjqwjhUUBzWjxamos3Hr7ghj...
Alex Gorbachev
07:54 AM Bug #23352: osd: segfaults under normal operation
Thanks Beom-Seok,
I've set up a centos environment to debug those cores along with the Xenial ones. I will update ...
Brad Hubbard
03:11 AM Bug #23352: osd: segfaults under normal operation
Two osd crashes today.
Coredumps at:
https://drive.google.com/open?id=1rXtW0riZMBwP5OqrJ7QdRIOAsKFr-kYw
https://d...
Beom-Seok Park
02:10 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
https://github.com/ceph/ceph/pull/22126 merged to remove failures from rgw suite. moving to rados project Casey Bodley
12:28 PM Backport #24259 (Resolved): mimic: crush device class: Monitor Crash when moving Bucket into Defa...
Kefu Chai
11:03 AM Backport #24294 (Resolved): mimic: control-c on ceph cli leads to segv
https://github.com/ceph/ceph/pull/22225 Nathan Cutler
11:03 AM Backport #24293 (Resolved): jewel: mon: slow op on log message
https://github.com/ceph/ceph/pull/22431 Nathan Cutler
11:03 AM Backport #24292 (Resolved): mimic: common: JSON output from rados bench write has typo in max_lat...
https://github.com/ceph/ceph/pull/22406 Nathan Cutler
11:03 AM Backport #24291 (Resolved): jewel: common: JSON output from rados bench write has typo in max_lat...
https://github.com/ceph/ceph/pull/22407 Nathan Cutler
11:03 AM Backport #24290 (Resolved): luminous: common: JSON output from rados bench write has typo in max_...
https://github.com/ceph/ceph/pull/22391 Nathan Cutler
03:47 AM Bug #24045 (Resolved): Eviction still raced with scrub due to preemption
David Zafman
03:47 AM Bug #22881 (Resolved): scrub interaction with HEAD boundaries and snapmapper repair is broken
David Zafman
03:46 AM Backport #24016 (Resolved): luminous: scrub interaction with HEAD boundaries and snapmapper repai...
David Zafman
03:43 AM Backport #23863 (Resolved): luminous: scrub interaction with HEAD boundaries and clones is broken
David Zafman
03:39 AM Backport #24153 (Resolved): luminous: Eviction still raced with scrub due to preemption
David Zafman
03:38 AM Bug #23267 (Resolved): scrub errors not cleared on replicas can cause inconsistent pg state when ...
David Zafman
03:37 AM Backport #23486 (Resolved): jewel: scrub errors not cleared on replicas can cause inconsistent pg...
David Zafman
03:30 AM Bug #23811: RADOS stat slow for some objects on same OSD
... Chang Liu

05/24/2018

08:41 PM Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica tak...
merged https://github.com/ceph/ceph/pull/21194 Yuri Weinstein
08:38 PM Backport #23316: jewel: pool create cmd's expected_num_objects is not correctly interpreted
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22050
merged
Yuri Weinstein
08:37 PM Bug #23966: Deleting a pool with active notify linger ops can result in seg fault
merged https://github.com/ceph/ceph/pull/22188 Yuri Weinstein
08:36 PM Bug #23769: osd/EC: slow/hung ops in multimds suite test
jewel backport PR https://github.com/ceph/ceph/pull/22189 merged Yuri Weinstein
06:07 PM Bug #24192: cluster [ERR] Corruption detected: object 2:f59d1934:::smithi14913526-5822:head is mi...
... David Zafman
06:05 PM Bug #24199 (Pending Backport): common: JSON output from rados bench write has typo in max_latency...
Sage Weil
06:03 PM Bug #24162 (Pending Backport): control-c on ceph cli leads to segv
mimic backport https://github.com/ceph/ceph/pull/22225 Sage Weil
05:59 PM Bug #23879: test_mon_osdmap_prune.sh fails
/a/sage-2018-05-23_14:50:29-rados-wip-sage2-testing-2018-05-22-1410-distro-basic-smithi/2576533 Sage Weil
03:40 PM Feature #24232: Add new command ceph mon status
added a card to the backlog: https://trello.com/c/PTgwBpmx Joao Eduardo Luis
01:27 PM Feature #24232: Add new command ceph mon status
Sorry for the confusion; I did not realize that we already have ceph osd stat and that ceph mon stat serves the same purpose. I wanted ... Vikhyat Umrao
10:55 AM Feature #24232: Add new command ceph mon status
copy/pasting from the PR opened to address this issue (https://github.com/ceph/ceph/pull/22202):... Joao Eduardo Luis
01:44 PM Bug #24037 (Resolved): osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_nod...
Sage Weil
01:42 PM Bug #24145: osdmap decode error in rados/standalone/*
... Sage Weil
01:39 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
... Sage Weil
12:08 PM Backport #24279 (In Progress): luminous: RocksDB compression is not supported at least on Debian.
Kefu Chai
12:08 PM Backport #24279 (Resolved): luminous: RocksDB compression is not supported at least on Debian.
https://github.com/ceph/ceph/pull/22215 Kefu Chai
09:48 AM Bug #24025 (Pending Backport): RocksDB compression is not supported at least on Debian.
Kefu Chai
09:43 AM Bug #24025: RocksDB compression is not supported at least on Debian.
tested... Kefu Chai
08:22 AM Bug #23352: osd: segfaults under normal operation
Hi Alex,
I notice there are several more coredumps attached to the related bug reports. Are they all separate cras...
Brad Hubbard
03:07 AM Bug #24264: ssd-primary crush rule not working as intended
Sorry, here's my updated rule instead of the one in the document.
rule ssd-primary {
id 2
type r...
Horace Ng
03:05 AM Bug #24264 (Closed): ssd-primary crush rule not working as intended
I've set up the rule according to the doc, but some of the PGs are still being assigned to the same host though my fa... Horace Ng

05/23/2018

09:36 PM Bug #23787 (Rejected): luminous: "osd-scrub-repair.sh'" failures in rados
This is an incompatibility between the OSD version 64ffa817000d59d91379f7335439845930f58530 (luminous) and the versio... David Zafman
06:40 PM Bug #22920 (Resolved): filestore journal replay does not guard omap operations
Nathan Cutler
06:40 PM Backport #22934 (Resolved): luminous: filestore journal replay does not guard omap operations
Nathan Cutler
06:35 PM Bug #23878 (Resolved): assert on pg upmap
Nathan Cutler
06:34 PM Backport #23925 (Resolved): luminous: assert on pg upmap
Nathan Cutler
06:32 PM Backport #24259 (Resolved): mimic: crush device class: Monitor Crash when moving Bucket into Defa...
https://github.com/ceph/ceph/pull/22169 Nathan Cutler
06:32 PM Backport #24258 (Resolved): luminous: crush device class: Monitor Crash when moving Bucket into D...
https://github.com/ceph/ceph/pull/22381 Nathan Cutler
06:32 PM Backport #24244 (New): jewel: osd/EC: slow/hung ops in multimds suite test
Nathan Cutler
05:09 PM Backport #24244 (Resolved): jewel: osd/EC: slow/hung ops in multimds suite test
https://github.com/ceph/ceph/pull/22189
partial backport for mdsmonitor
Abhishek Lekshmanan
06:31 PM Backport #24256 (Resolved): mimic: osd: Assertion `!node_algorithms::inited(this->priv_value_tra...
https://github.com/ceph/ceph/pull/22160 Nathan Cutler
06:31 PM Backport #24246 (Resolved): mimic: Manager daemon y is unresponsive during teuthology cluster tea...
https://github.com/ceph/ceph/pull/22333 Nathan Cutler
06:31 PM Backport #24245 (Resolved): luminous: Manager daemon y is unresponsive during teuthology cluster ...
https://github.com/ceph/ceph/pull/22331 Nathan Cutler
04:27 PM Bug #23352: osd: segfaults under normal operation
Sage, I had tried to do this, but we don't know when these crashes would happen, just that they will occur. Random t... Alex Gorbachev
04:10 PM Bug #23352 (Need More Info): osd: segfaults under normal operation
Alex, how reproducible is this for you? Could you reproduce with debug timer = 20? Sage Weil
04:21 PM Backport #24058 (In Progress): jewel: Deleting a pool with active notify linger ops can result in...
https://github.com/ceph/ceph/pull/22188 Kefu Chai
04:15 PM Bug #24243 (Resolved): osd: pg hard limit too easy to hit
The default ratio of 2x mon_max_pg_per_osd is easy to hit for clusters that have differently weighted disks (e.g. 1 a... Josh Durgin
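For scale, assuming the luminous-era default mon_max_pg_per_osd = 200 (an assumption; the 2x ratio is the one quoted above):

\[ \text{hard limit per OSD} = 2 \times \mathrm{mon\_max\_pg\_per\_osd} = 2 \times 200 = 400 \text{ PGs}. \]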
03:27 PM Bug #24025: RocksDB compression is not supported at least on Debian.
mimic: https://github.com/ceph/ceph/pull/22183 Kefu Chai
03:25 PM Bug #24025 (Fix Under Review): RocksDB compression is not supported at least on Debian.
https://github.com/ceph/ceph/pull/22181 Kefu Chai
02:53 PM Bug #24025: RocksDB compression is not supported at least on Debian.
Because we fail to pass -DWITH_SNAPPY etc. to cmake while building rocksdb. This bug also impacts the rpm package. I can h... Kefu Chai
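A sketch of the kind of switches that would need to reach the bundled rocksdb's cmake (option names as used by rocksdb's own CMakeLists; the exact ceph build wiring may differ):

<pre>
# illustrative: compression switches rocksdb's cmake understands, which the
# ceph build was not forwarding when building the bundled rocksdb
cmake -DWITH_SNAPPY=ON -DWITH_ZLIB=ON -DWITH_LZ4=ON ..
</pre>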
01:51 PM Bug #24229 (Triaged): Libradosstriper successfully removes nonexistent objects instead of returni...
Sage Weil
11:57 AM Bug #24242 (New): tcmalloc::ThreadCache::ReleaseToCentralCache on rhel (w/ centos packages)
... Sage Weil
11:43 AM Bug #24222 (Pending Backport): Manager daemon y is unresponsive during teuthology cluster teardown
Sage Weil
08:41 AM Bug #23145: OSD crashes during recovery of EC pg
The osd in the last peering stage will call pg_log.roll_forward (at the end of PG::activate); is it possible that the entry rollbf... Zengran Zhang
06:52 AM Bug #23386 (Pending Backport): crush device class: Monitor Crash when moving Bucket into Default ...
https://github.com/ceph/ceph/pull/22169 Kefu Chai
01:21 AM Bug #24037 (Pending Backport): osd: Assertion `!node_algorithms::inited(this->priv_value_traits(...
Sage Weil

05/22/2018

09:55 PM Bug #24222 (Fix Under Review): Manager daemon y is unresponsive during teuthology cluster teardown
https://github.com/ceph/ceph/pull/22158 Sage Weil
02:20 AM Bug #24222 (Resolved): Manager daemon y is unresponsive during teuthology cluster teardown
... Sage Weil
08:47 PM Feature #24232 (Fix Under Review): Add new command ceph mon status
Add new command ceph mon status
For more information please check - https://tracker.ceph.com/issues/24217
Changed...
Vikhyat Umrao
08:32 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
Josh Durgin wrote:
> Casey, could you or someone else familiar with rgw look through the logs for this and identify ...
Casey Bodley
03:19 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
Casey, could you or someone else familiar with rgw look through the logs for this and identify the relevant OSD reque... Josh Durgin
07:17 PM Feature #24231 (New): librbd/libcephfs/librgw should ignore rados_mon/osd_op_timeouts options (re...
librbd/libcephfs/librgw should ignore rados_mon/osd_op_timeouts options
https://bugzilla.redhat.com/show_bug.cgi?id=...
Vikhyat Umrao
04:09 PM Bug #24025 (In Progress): RocksDB compression is not supported at least on Debian.
... Radoslaw Zarzynski
03:48 PM Bug #24037 (Fix Under Review): osd: Assertion `!node_algorithms::inited(this->priv_value_traits(...
https://github.com/ceph/ceph/pull/22156 Radoslaw Zarzynski
02:35 PM Bug #24229 (Triaged): Libradosstriper successfully removes nonexistent objects instead of returni...
libradosstriper remove() call on nonexistent objects returns zero instead of ENOENT.
Tested on luminous 12.2.5-1xe...
Stan K
11:35 AM Feature #24099: osd: Improve workflow when creating OSD on raw block device if there was bluestor...

> Point out that it found existing data on the OSD, and possibly suggest using `ceph-volume lvm zap` if that's what...
John Spray
10:51 AM Bug #24199 (Fix Under Review): common: JSON output from rados bench write has typo in max_latency...
John Spray
07:00 AM Bug #23371: OSDs flaps when cluster network is made down
We have not observed this behavior in kraken.
Whenever the cluster interface is brought down, the few OSDs which go do...
Nokia ceph-users
03:55 AM Bug #23352: osd: segfaults under normal operation
OSD log attached Alex Gorbachev
03:15 AM Bug #23352: osd: segfaults under normal operation
It's an internal comment for others looking at this - though if you (Alex) have an osd log to go with the 'MMgrReport... Josh Durgin
02:59 AM Bug #23352: osd: segfaults under normal operation
Josh, is this something I can extract from the OSD node for you, or is this an internal comment? Alex Gorbachev
01:10 AM Bug #23352: osd: segfaults under normal operation
I put the core file from comment #14 and binaries from 12.2.5 in senta02:/slow/jdurgin/ceph/bugs/tracker_23352/2018-0... Josh Durgin
03:49 AM Backport #24059 (In Progress): luminous: Deleting a pool with active notify linger ops can result...
https://github.com/ceph/ceph/pull/22143 Prashant D
 
