Activity

From 10/08/2018 to 11/06/2018

11/06/2018

01:22 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
/a/sage-2018-11-05_22:04:25-rados-wip-sage3-testing-2018-11-05-1406-distro-basic-smithi/3227352 Sage Weil
11:54 AM Support #36326: Huge traffic spike and assert(is_primary())
Thanks for the answer! It looks like traffic spike was caused by another issue: ceph-mon's db grows up to 15GB and it... Aleksei Zakharov
10:07 AM Bug #36709 (Closed): OSD stuck while flushing rocksdb WAL
Hi all,
We use:
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
Clients work on:
...
Aleksei Zakharov
01:30 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
Quoting my reply to ceph-devel for reference:
"Nathan, I don't think we want to revert it for 13.2.2.
This is b...
Neha Ojha

11/05/2018

10:42 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
Nathan Cutler
10:32 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
So, the luminous revert was merged. Neha, will there be a mimic revert as well? Since the pg hard limit patches are p... Nathan Cutler
10:13 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
https://github.com/ceph/ceph/pull/24903 merged Yuri Weinstein
10:28 PM Bug #36508 (Resolved): gperftools-libs-2.6.1-1 or newer required for binaries linked against corr...
Nathan Cutler
10:28 PM Backport #36552 (Resolved): luminous: gperftools-libs-2.6.1-1 or newer required for binaries link...
Nathan Cutler
10:10 PM Backport #36552: luminous: gperftools-libs-2.6.1-1 or newer required for binaries linked against ...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24706
merged
Yuri Weinstein
10:25 PM Bug #34541 (Resolved): deep scrub cannot find the bitrot if the object is cached
Nathan Cutler
10:25 PM Backport #35067 (Resolved): luminous: deep scrub cannot find the bitrot if the object is cached
Nathan Cutler
10:08 PM Backport #35067: luminous: deep scrub cannot find the bitrot if the object is cached
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24802
merged
Yuri Weinstein
10:18 PM Backport #36678 (Resolved): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state...
David Zafman
05:20 PM Feature #24917: Gracefully deal with upgrades when bluestore skipping of data_digest becomes active
Let's include this with any other feature bit addition. David Zafman
01:30 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
> I suspect it shouldn't.
But it does exactly that.
> That will only re-copy the data to the HEAD revision.
...
Vitaliy Filippov

11/04/2018

06:55 PM Bug #36677 (Fix Under Review): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')'...
A fix is already available. See Sage's PR: https://github.com/ceph/ceph/pull/24835. Radoslaw Zarzynski

11/03/2018

11:27 PM Bug #24923 (Resolved): doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
Nathan Cutler
11:27 PM Backport #25055 (Resolved): mimic: doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
Nathan Cutler
11:26 PM Backport #35071 (In Progress): mimic: FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor...
Nathan Cutler
04:42 AM Backport #23670 (In Progress): luminous: auth: ceph auth add does not sanity-check caps
Kefu Chai
04:24 AM Backport #23670 (New): luminous: auth: ceph auth add does not sanity-check caps
Kefu did the jewel backport, so assigning this to him in hopes he'll pick it up. Nathan Cutler
04:00 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
-Also, is this bug reproducible in master and mimic as well? If not, the Backport field should probably be modified..... Nathan Cutler
03:58 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
Neha, 12.2.9 has already been cut, so we'll need to expedite 12.2.10 to push the revert out to users. Nathan Cutler
03:52 AM Backport #36678 (In Progress): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad st...
Nathan Cutler

11/02/2018

11:57 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
The immediate fix is to revert this for luminous before 12.2.9: https://github.com/ceph/ceph/pull/24903
Neha Ojha
11:51 PM Bug #36686 (Resolved): osd: pg log hard limit can cause crash during upgrade
During an upgrade from an earlier version, a primary running the new code will send a trim_to value to a replica that... Josh Durgin
05:14 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
Ceph has already moved to C++17. The main question is: have we also transitioned our public headers to C++17 xor put ... Radoslaw Zarzynski
04:58 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
The no-message-taking variant of *static_assert* was introduced in C++17. The code is being compiled with *-std=... Radoslaw Zarzynski
04:55 PM Bug #36677 (In Progress): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
Radoslaw Zarzynski
05:14 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
Back-and-forth question answering like this is probably better for the mailing list (the ticket is currently closed F... Jason Dillaman
04:57 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
since you've identified that this is an RBD workload, assigning it to that project so that RBD team notices it. HTH. Ben England
02:37 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
Oops. That's more than 2 questions. But anyway :) Vitaliy Filippov
02:36 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
OK, I looked into OSD datastore using ceph-objectstore-tool and I see that for almost every object there are two copi... Vitaliy Filippov
01:39 PM Bug #24835: osd daemon spontaneous segfault
We do use some configuration set by "ceph config set" or "ceph config-key set":... Soenke Schippmann

11/01/2018

11:46 PM Backport #36678 (Resolved): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state...
https://github.com/ceph/ceph/pull/24902 David Zafman
11:19 PM Bug #22902 (Pending Backport): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machin...
Based on similar failures seen in luminous: http://pulpito.ceph.com/yuriw-2018-10-31_22:45:22-rados-wip-yuri4-testing... Neha Ojha
09:10 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
... Neha Ojha
09:06 PM Bug #36677 (Resolved): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
... Neha Ojha
04:44 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
Looking through the ceph/rocksdb repo I don't see how it's possible for rocksdb to be compiled without snappy support... David Turner
03:35 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
This seems to be a problem where rocksdb on CentOS doesn't support snappy compression but the ceph-kvstore-tool is co... David Turner
06:14 AM Bug #36667 (New): OSD object_map sync returned error
I deployed a CephFS and then used the vdbench tool to write data to the CephFS mount point; after a while an OSD appeared down.
...
yp dai

10/31/2018

09:21 PM Bug #36411 (Closed): OSD crash starting recovery/backfill with EC pool
It's my current belief that these objects were broken as a result of intentional metadata manipulation when some of t... Greg Farnum
09:18 PM Bug #36572: ceph-in: --connect-timeout doesn't work while pinging mon
New PR: https://github.com/ceph/ceph/pull/24733 Greg Farnum
09:17 PM Support #36584 (Closed): OSD Anomaly behaviour in ceph-reweight
Are you running the command repeatedly? reweight-by-utilization does not provide a stable balance; it's really just a... Greg Farnum
08:43 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
https://github.com/ceph/ceph/pull/24868 Sage Weil
05:35 PM Feature #36661: osd: add sanity check on startup to compare osd memory target to available memory...
- in OSD::handle_conf_change, we should sanity check this against current memory available on the system and refuse t... Sage Weil
04:59 PM Feature #36661 (New): osd: add sanity check on startup to compare osd memory target to available ...
This is needed so that we do not fail due to osd_memory_target being set too high compared to the amount of memory ... Neha Ojha
11:42 AM Backport #36658 (Resolved): mimic: Cache-tier forward mode hang in luminous (again)
https://github.com/ceph/ceph/pull/25075 Nathan Cutler
11:42 AM Backport #36657 (Resolved): luminous: Cache-tier forward mode hang in luminous (again)
https://github.com/ceph/ceph/pull/25074 Nathan Cutler

10/30/2018

08:08 PM Bug #36345 (Resolved): librados C API aio read empty buffer
Sage Weil
08:07 PM Bug #36406 (Pending Backport): Cache-tier forward mode hang in luminous (again)
Sage Weil
05:16 PM Backport #36647 (Resolved): mimic: librados api aio tests race condition
https://github.com/ceph/ceph/pull/25027 Patrick Donnelly
05:16 PM Backport #36646 (Resolved): luminous: librados api aio tests race condition
https://github.com/ceph/ceph/pull/25028 Patrick Donnelly
05:14 PM Backport #36637 (Resolved): mimic: osd: race condition opening heartbeat connection
https://github.com/ceph/ceph/pull/25026 Patrick Donnelly
05:14 PM Backport #36636 (Resolved): luminous: osd: race condition opening heartbeat connection
https://github.com/ceph/ceph/pull/25035 Patrick Donnelly
04:06 PM Bug #36634 (New): LibRadosWatchNotify.WatchNotify2Timeout failure
... Sage Weil
03:33 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
Yes, I'm using EC with RBD and partial overwrites enabled. CephFS pools are only created recently for tests and do no... Vitaliy Filippov
01:05 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
How are you writing these objects? Most sites that used EC were using RGW, but I don't see all the pools that go wit... Ben England
10:31 AM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
In fact it doesn't seem that it will self-heal, and nobody seems to care about it in the mailing list by now...)
C...
Vitaliy Filippov
02:33 PM Bug #36631 (In Progress): potential deadlock in PG::_scan_snaps when repairing snap mapper
If during a pg scrub a snap mapper error is detected in PG::_scan_snaps, on repair `ObjectStore::apply_transactions` ... Mykola Golub
02:28 PM Backport #36630 (Resolved): luminous: potential deadlock in PG::_scan_snaps when repairing snap m...
If during a pg scrub a snap mapper error is detected in PG::_scan_snaps, on repair `ObjectStore::apply_transactions` ... Mykola Golub
02:00 PM Bug #36629 (New): osd:the new file was stored in cache pool which mode was none
ceph version:13.2.1
kernel client 4.17
I created the cache data pool following Ceph's instructions:
(1) ceph osd tier add...
qinglong li
01:41 AM Bug #36620: osd:the vim will be hanged when I saved the file
the client: 4.17 kernel client qinglong li
01:36 AM Bug #36620 (New): osd:the vim will be hanged when I saved the file
ceph version: 13.2.1
situation: the data pool tiered by a cache data pool and the cache tier pool's mode was read...
qinglong li

10/29/2018

10:33 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
Thanks for the response, I wrote to the mailing list ceph-users (is it the correct place?) :) Vitaliy Filippov
08:37 PM Support #36614 (Closed): Cluster uses substantially more space after rebalance (erasure codes)
The mailing list is a better place to resolve this. My guess is data hasn't been cleaned up from its old locations ye... Greg Farnum
12:13 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
How to heal it? If I don't heal it I'll need to purge the whole cluster? O_o... Vitaliy Filippov
12:12 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
ceph df output:... Vitaliy Filippov
11:11 AM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
Proof from our Prometheus monitoring. Two graphs from yesterday: one with the number of objects in the cluster and the other wit... Vitaliy Filippov
10:17 AM Support #36614 (Closed): Cluster uses substantially more space after rebalance (erasure codes)
Hi
After I recreated one OSD + increased pg count of my erasure-coded (2+1) pool (which was way too low, only 100 ...
Vitaliy Filippov
10:21 PM Bug #36525: osd-scrub-snaps.sh failure

Looking at the log, another scrub has raised the number of "_scan_snaps start" entries in the log from 2 to 4. It results in ...
David Zafman
01:06 AM Bug #36525: osd-scrub-snaps.sh failure
/a/sage-2018-10-28_14:12:19-rados-master-distro-basic-smithi/3196520
another instance on current master
Sage Weil
09:48 PM Bug #23827 (Resolved): osd sends op_reply out of order
Nathan Cutler
09:47 PM Backport #25010 (Resolved): mimic: osd sends op_reply out of order
Nathan Cutler
08:47 PM Backport #25010: mimic: osd sends op_reply out of order
https://github.com/ceph/ceph/pull/23136 has merged, can we resolve this issue? Neha Ojha
09:43 PM Bug #25154 (Resolved): librados application's symbol could conflict with the libceph-common
Nathan Cutler
09:42 PM Backport #26839 (Resolved): mimic: librados application's symbol could conflict with the libceph-...
Nathan Cutler
08:21 PM Backport #26839: mimic: librados application's symbol could conflict with the libceph-common
Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/24708
merged
Yuri Weinstein
09:40 PM Bug #35969 (Resolved): "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
Nathan Cutler
09:39 PM Backport #36553 (Resolved): mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked ...
Nathan Cutler
08:16 PM Backport #36553: mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked against cor...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24260
merged
Yuri Weinstein
09:39 PM Backport #36132 (Resolved): mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on ...
Nathan Cutler
08:16 PM Backport #36132: mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24260
merged
Yuri Weinstein
08:47 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
The above change is not entirely correct. This section needs to be omitted:... Louwrentius Louwrentius
08:13 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
Hello!
I've used the instruction created by Daniel Glasser and with some small code adjustments in a few files I w...
Louwrentius Louwrentius
04:17 PM Bug #36610 (Fix Under Review): filestore merge collection replay problem
https://github.com/ceph/ceph/pull/24806 Sage Weil
03:51 PM Bug #36610: filestore merge collection replay problem
the osd is stopped during the merge operation:... Sage Weil
03:46 PM Bug #36182 (Resolved): osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is ...
Neha Ojha
02:59 PM Bug #36473 (Resolved): hung osd_repop, bluestore committed but failed to trigger repop_commit
this is presumably https://github.com/ceph/ceph/pull/24761 Sage Weil
02:58 PM Bug #36548 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh
Sage Weil
01:34 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
/a/sage-2018-10-29_01:11:58-rados-wip-sage-testing-2018-10-28-0943-distro-basic-smithi/3197984 Sage Weil
01:10 AM Bug #36408 (Resolved): [cache tier] failed guarded write + promotion results in "success" op result
Sage Weil

10/28/2018

02:40 PM Bug #36602 (Pending Backport): osd: race condition opening heartbeat connection
Sage Weil
02:37 PM Bug #36610 (Resolved): filestore merge collection replay problem
/a/sage-2018-10-27_02:10:33-rados-wip-sage-testing-2018-10-26-1411-distro-basic-smithi/3188976
osd.3 was partway t...
Sage Weil

10/26/2018

07:24 PM Feature #24591: FileStore hasn't impl to get kv-db's statistics
Jack Lv wrote:
> https://github.com/ceph/ceph/pull/22633
merged
Yuri Weinstein
06:30 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
https://github.com/ceph/ceph/pull/24761 Neha Ojha
05:41 PM Bug #36602: osd: race condition opening heartbeat connection
Greg Farnum
03:39 PM Bug #36602 (Fix Under Review): osd: race condition opening heartbeat connection
https://github.com/ceph/ceph/pull/24780 Sage Weil
03:37 PM Bug #36602 (Resolved): osd: race condition opening heartbeat connection
... Sage Weil
05:10 PM Bug #20694: osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_log().get_log().obje...
/a/yuriw-2018-10-25_15:31:28-rados-wip-yuri4-testing-2018-10-24-2310-mimic-distro-basic-smithi/3183476/ Neha Ojha
04:22 PM Bug #36345 (Fix Under Review): librados C API aio read empty buffer
imirc tw, thank you for your analysis. i am approving https://github.com/ceph/ceph/pull/24534. so "unshared buffer" o... Kefu Chai
09:52 AM Bug #36345: librados C API aio read empty buffer
I figured it out. In Objecter.cc:3279... imirc tw
09:02 AM Bug #36345: librados C API aio read empty buffer
without osd_op_timeout, in Objecter::handle_osd_op_reply, Objecter.cc:3473
op->con px is an AsyncConnection on whic...
imirc tw
07:54 AM Bug #36345: librados C API aio read empty buffer
Some more info from what I can see while debugging.
Without 'rados osd op timeout', the buffer in librados::IoCtx...
imirc tw
02:18 PM Bug #24587 (Pending Backport): librados api aio tests race condition
Sage Weil
11:01 AM Bug #24180 (Resolved): mon: slow op on log message
Nathan Cutler
11:01 AM Backport #24293 (Resolved): jewel: mon: slow op on log message
Nathan Cutler
06:42 AM Bug #24835: osd daemon spontaneous segfault
Our ceph.conf:... Christian Schlittchen
03:46 AM Bug #24615 (Need More Info): error message for 'unable to find any IP address' not shown
Francois,
Can you try reproducing your issue on the latest master?
I fixed a similar issue in master and also fro...
Victor Denisov
03:28 AM Bug #24615 (In Progress): error message for 'unable to find any IP address' not shown
Victor Denisov
02:34 AM Bug #25153 (Resolved): output format is invalid of the crush tree json dumper
Nathan Cutler
02:33 AM Backport #36149 (Resolved): luminous: output format is invalid of the crush tree json dumper
Nathan Cutler
02:33 AM Bug #35845 (Resolved): osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
Nathan Cutler
02:32 AM Backport #36393 (Resolved): luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
Nathan Cutler
02:30 AM Bug #36183 (Resolved): [objecter] client socket failure leads to hung connection
Nathan Cutler
02:30 AM Backport #36295 (Resolved): luminous: [objecter] client socket failure leads to hung connection
Nathan Cutler
02:29 AM Bug #21931 (Resolved): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range...
Nathan Cutler
02:29 AM Backport #36440 (Resolved): luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + le...
Nathan Cutler
02:28 AM Bug #22330 (Resolved): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler
02:28 AM Backport #36438 (Resolved): luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler
02:27 AM Bug #36417 (Resolved): osd: get loadavg per cpu for scrub load threshold check
Nathan Cutler
02:27 AM Backport #36419 (Resolved): luminous: osd: get loadavg per cpu for scrub load threshold check
Nathan Cutler
02:26 AM Bug #36174 (Resolved): ceph pg ls creating: EINVAL
Nathan Cutler
02:26 AM Backport #36297 (Resolved): luminous: ceph pg ls creating: EINVAL
Nathan Cutler
02:25 AM Bug #23614 (Resolved): local_reserver double-reservation of backfilled pg
Nathan Cutler
02:25 AM Backport #24333 (Resolved): luminous: local_reserver double-reservation of backfilled pg
Nathan Cutler
02:24 AM Backport #26932 (Resolved): luminous: scrub livelock
Nathan Cutler

10/25/2018

10:22 PM Backport #36149: luminous: output format is invalid of the crush tree json dumper
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24482
merged
Yuri Weinstein
10:21 PM Backport #36393: luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
David Zafman wrote:
> https://github.com/ceph/ceph/pull/24532
merged
Yuri Weinstein
10:20 PM Backport #36295: luminous: [objecter] client socket failure leads to hung connection
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24574
merged
Yuri Weinstein
10:20 PM Backport #36440: luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (r...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24582
merged
Yuri Weinstein
10:20 PM Backport #36438: luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24582
merged
Yuri Weinstein
10:19 PM Backport #36419: luminous: osd: get loadavg per cpu for scrub load threshold check
David Zafman wrote:
> https://github.com/ceph/ceph/pull/24593
merged
Yuri Weinstein
10:19 PM Backport #36297: luminous: ceph pg ls creating: EINVAL
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24602
merged
Reviewed-by: Neha Ojha <nojha@redhat.com>
Yuri Weinstein
10:18 PM Bug #26890: scrub livelock
merged https://github.com/ceph/ceph/pull/24659 Yuri Weinstein
08:01 PM Bug #36345: librados C API aio read empty buffer
Kefu Chai, same happens on master:... imirc tw
04:36 PM Bug #36345: librados C API aio read empty buffer
13.2.2, I will give it a go on master ASAP. imirc tw
03:57 PM Bug #36345: librados C API aio read empty buffer
imirc tw, on which release did you reproduce this issue? is master affected? Kefu Chai
01:18 PM Bug #36345: librados C API aio read empty buffer
Hi Kefu,
I'm not that deep into the Ceph code, I was making an assumption based on my observations and past ticket...
imirc tw
08:52 AM Bug #36345: librados C API aio read empty buffer
imirc tw, i don't understand how "rados_osd_op_timeout" is related to this issue. i agree that current @librados::IoC... Kefu Chai
07:02 AM Bug #36345: librados C API aio read empty buffer
Hi Wido,
The 2nd assumption isn't true, that was because the client.admin ceph.conf file used didn't have the osd_o...
imirc tw
06:51 AM Bug #36345: librados C API aio read empty buffer
Updating this ticket as the issue seems to be related to two things:
- When using osd_op_timeout
- When using a u...
Wido den Hollander
06:21 PM Bug #36598 (Can't reproduce): osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests ...
... Patrick Donnelly
04:20 PM Backport #24333: luminous: local_reserver double-reservation of backfilled pg
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23493
merged
Yuri Weinstein
05:18 AM Feature #36474: Add support for osd_delete_sleep configuration value
https://github.com/ceph/ceph/pull/24749 David Zafman

10/24/2018

09:43 PM Bug #25182: Upmaps forgotten after restarting OSDs
One thing I've noticed after living with this for a while is that the upmap entries that are forgotten are always for... Bryan Stillwell
09:31 PM Bug #36517: client crashes osd with empty object name
Attached Noah Watkins
09:17 PM Bug #36517: client crashes osd with empty object name
Noah, the paste doesn't show now, could you paste the trace in the tracker. Neha Ojha
09:21 PM Bug #24485 (Resolved): LibRadosTwoPoolsPP.ManifestUnset failure
Greg Farnum
09:11 PM Bug #36166 (Resolved): pg merge can collide with remapped, upmap pgs
Neha Ojha
02:29 PM Bug #36345: librados C API aio read empty buffer
Kefu, I'm also experiencing this issue. It seems to be related to `rados osd op timeout`. Once this value is set in t... imirc tw
11:01 AM Support #36584 (Closed): OSD Anomaly behaviour in ceph-reweight
ceph version 10.2.5
We have this behaviour with 2 OSDs in the cluster causing a backfilling loop.
I'm executing thi...
JUan Galan
10:35 AM Bug #19348 (Can't reproduce): "ceph ping mon.c" cli prints assertion failure on timeout
not able to reproduce with master HEAD anymore. Kefu Chai
10:34 AM Bug #19348: "ceph ping mon.c" cli prints assertion failure on timeout
https://github.com/ceph/ceph/pull/24733 Kefu Chai

10/23/2018

09:36 PM Bug #36040: mon: Valgrind: mon (InvalidFree, InvalidWrite, InvalidRead)
/ceph/teuthology-archive/pdonnell-2018-10-17_19:54:38-multimds-wip-pdonnell-testing-20181017.175152-distro-basic-smit... Patrick Donnelly
09:30 PM Bug #36497: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::replace()
/ceph/teuthology-archive/pdonnell-2018-10-17_19:54:38-multimds-wip-pdonnell-testing-20181017.175152-distro-basic-smit... Patrick Donnelly
09:28 PM Bug #36411: OSD crash starting recovery/backfill with EC pool
I have to add to the previous update, which did not explain the resolution of the problem.
The true solution was w...
Graham Allan
08:27 PM Bug #24587 (Fix Under Review): librados api aio tests race condition
https://github.com/ceph/ceph/pull/24724 Josh Durgin
07:53 PM Bug #36572: ceph-in: --connect-timeout doesn't work while pinging mon
Submitted a "PR":https://github.com/ceph/ceph/pull/24723 for this. Rishabh Dave
07:44 PM Bug #36572 (Closed): ceph-in: --connect-timeout doesn't work while pinging mon
Saw the following output while working on "PR 21432":https://github.com/ceph/ceph/pull/21432 -... Rishabh Dave
03:53 PM Bug #36548: qa/standalone/osd/osd-rep-recov-eio.sh
The failed run did not include the changes in https://github.com/ceph/ceph/pull/24651 (master). This pull request mi... David Zafman
01:43 AM Bug #36548 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh
pg ended up in active+clean, not recovery_unfound
/a/sage-2018-10-22_21:29:13-rados-wip-sage-testing-2018-10-22-11...
Sage Weil
06:04 AM Backport #36553 (In Progress): mimic: gperftools-libs-2.6.1-1 or newer required for binaries link...
Nathan Cutler
05:44 AM Backport #36553 (Resolved): mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked ...
https://github.com/ceph/ceph/pull/24260 Nathan Cutler
05:52 AM Backport #36552 (In Progress): luminous: gperftools-libs-2.6.1-1 or newer required for binaries l...
Nathan Cutler
05:43 AM Backport #36552 (Resolved): luminous: gperftools-libs-2.6.1-1 or newer required for binaries link...
https://github.com/ceph/ceph/pull/24706 Nathan Cutler
05:45 AM Backport #36557 (Resolved): mimic: RBD client IOPS pool stats are incorrect (2x higher; includes ...
https://github.com/ceph/ceph/pull/25024 Nathan Cutler
05:45 AM Backport #36556 (Resolved): luminous: RBD client IOPS pool stats are incorrect (2x higher; includ...
https://github.com/ceph/ceph/pull/25025 Nathan Cutler
05:43 AM Backport #35909 (Resolved): mimic: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
Nathan Cutler
05:31 AM Backport #36439 (Resolved): mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + lengt...
Nathan Cutler
05:31 AM Backport #36437 (Resolved): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler
05:30 AM Backport #36296 (Resolved): mimic: [objecter] client socket failure leads to hung connection
Nathan Cutler
05:30 AM Backport #36298 (Resolved): mimic: ceph pg ls creating: EINVAL
Nathan Cutler
04:41 AM Bug #24835: osd daemon spontaneous segfault
I'd say the cause of most, if not all, of these crashes is memory corruption caused by code responsible for manipulat... Brad Hubbard
04:31 AM Bug #24835: osd daemon spontaneous segfault
The 'safe_timer.5246' is again similar but this time tcmalloc is 'popping' a
single value rather than a range.
<p...
Brad Hubbard
03:54 AM Bug #24835: osd daemon spontaneous segfault
The 'msgr-worker-1.5278' is almost identical to 'tp_osd_tp' except this time 'i'
= 499 so doing that manually is bey...
Brad Hubbard
01:58 AM Bug #24835: osd daemon spontaneous segfault
For the rest of the coredumps adding the debuginfo for libtcmalloc really helps
to understand the problem as we end ...
Brad Hubbard
12:53 AM Bug #24835: osd daemon spontaneous segfault
Starting with the bluestore bufferlist destructor crash.... Brad Hubbard

10/22/2018

11:39 PM Bug #36508 (Pending Backport): gperftools-libs-2.6.1-1 or newer required for binaries linked agai...
Brad Hubbard
11:38 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
Haven't been able to reproduce this on luminous and mimic, so clearing the Backport fields for now. Neha Ojha
07:05 PM Bug #24909 (Pending Backport): RBD client IOPS pool stats are incorrect (2x higher; includes IO h...
Jason Dillaman
03:40 PM Backport #35909: mimic: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
David Zafman wrote:
> https://github.com/ceph/ceph/pull/24017
merged
Yuri Weinstein
03:35 PM Backport #36439: mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (rang...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24581
merged
Yuri Weinstein
03:35 PM Backport #36437: mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24581
merged
Yuri Weinstein
03:34 PM Backport #36296: mimic: [objecter] client socket failure leads to hung connection
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24600
merged
Yuri Weinstein
03:32 PM Backport #36298: mimic: ceph pg ls creating: EINVAL
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24601
merged
Yuri Weinstein
02:31 PM Bug #24956 (Resolved): osd: parent process need to restart log service after fork, or ceph-osd wi...
Kefu Chai
02:25 PM Bug #36546 (Duplicate): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back...
... Sage Weil
02:21 PM Bug #36485 (Resolved): dump-stuck.yaml fails assert len(inactive) == num_inactive
Sage Weil

10/21/2018

03:53 PM Bug #36485 (Fix Under Review): dump-stuck.yaml fails assert len(inactive) == num_inactive
https://github.com/ceph/ceph/pull/24689 Sage Weil
09:25 AM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
https://github.com/ceph/ceph/pull/24687
Myoungwon Oh

10/20/2018

09:43 PM Bug #22144: *** Caught signal (Aborted) ** in thread thread_name:tp_peering
we can confirm we are experiencing the same issue on version 12.2.7 and currently have some random osds that went off... Rams C

10/19/2018

09:12 PM Bug #16279 (Closed): assert(objiter->second->version > last_divergent_update) failed
Closing this ticket since the linked PR was also closed. Greg Farnum
09:09 PM Bug #17252: [Librados] Deadlock on RadosClient::watch_flush
Greg Farnum
08:18 PM Bug #24368: osd: should not restart on permanent failures
Clearing backport field on the assumption that's what was intended by the previous edit. Nathan Cutler
08:01 PM Bug #24368 (Resolved): osd: should not restart on permanent failures
Okay, after discussing with CERN I've merged the PR to master so this isn't an issue going forward.
But unfortunat...
Greg Farnum

10/18/2018

10:35 PM Bug #22561: PG stuck during recovery, requires OSD restart
I've just encountered this again with about 20 OSDs being non-responsive like this. Restarting the OSDs in that state... Paul Emmerich
10:20 PM Bug #36485: dump-stuck.yaml fails assert len(inactive) == num_inactive
/a/sage-2018-10-17_20:20:33-rados-nautilus-distro-basic-smithi/3154379
fails every time
Sage Weil
08:27 PM Bug #36525 (Resolved): osd-scrub-snaps.sh failure
... Sage Weil
08:25 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
/a/sage-2018-10-17_20:20:33-rados-nautilus-distro-basic-smithi/3154168 Sage Weil
06:11 PM Bug #36408 (Fix Under Review): [cache tier] failed guarded write + promotion results in "success"...
*PR*: https://github.com/ceph/ceph/pull/24666 Jason Dillaman
06:08 PM Bug #36408 (In Progress): [cache tier] failed guarded write + promotion results in "success" op r...
Jason Dillaman
04:02 PM Bug #24368: osd: should not restart on permanent failures
From a user:
>There is some class of OSD out there (all filestore, IIRC) that are ultra slow to start at boot time i...
Greg Farnum
03:56 PM Bug #22233: prime_pg_temp breaks on uncreated pgs
I don't understand why the bug happened (or what the proposed fix is trying to do). Given the description above, the ... Sage Weil
03:04 PM Bug #36517 (New): client crashes osd with empty object name
I found a RADOS client causing OSDs to crash running bluestore (haven't tried filestore) producing the following erro... Noah Watkins
02:51 PM Bug #36515 (Resolved): config options: 'services' field is empty for many config options
The 'services' field is empty for many config options, e.g.:... Tatjana Dehler
11:17 AM Feature #21902: Support bytearray in python binding
... Kefu Chai
10:36 AM Backport #26932 (In Progress): luminous: scrub livelock
https://github.com/ceph/ceph/pull/24659
I am resetting the target version and changing its status to "In Progress...
Kefu Chai
01:28 AM Bug #36508 (In Progress): gperftools-libs-2.6.1-1 or newer required for binaries linked against c...
https://github.com/ceph/ceph/pull/24652 Brad Hubbard
12:42 AM Bug #36508 (Resolved): gperftools-libs-2.6.1-1 or newer required for binaries linked against corr...
Binaries compiled against the 2.6 version of libtcmalloc.so.4 (in this case ceph-osd) have the following undefined sy... Brad Hubbard
12:50 AM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
David Zafman wrote:
> The original code is operating as intended when issuing this warning:
>
> WARNING: Split oc...
huang jun
12:46 AM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
Greg Farnum wrote:
> Did you try importing 2.1f with that original 2.f PG dump?
No, the test script doesn't impor...
huang jun
12:44 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
See http://tracker.ceph.com/issues/36508 Brad Hubbard

10/17/2018

11:24 PM Bug #36412 (Closed): ceph-objectstore-tool import after pg splits which will lost objects
The original code is operating as intended when issuing this warning:
WARNING: Split occurred, some objects may ...
David Zafman
11:19 PM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
As Greg pointed out, you would use the --pgid 2.1f option with --op import to get the objects that split into that p...
David Zafman
09:28 PM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
Did you try importing 2.1f with that original 2.f PG dump? Greg Farnum
09:31 PM Bug #36405: unittest_seastar_messenger failure on ARM
Kefu, could you please take a look. Neha Ojha
09:24 PM Bug #22727 (Resolved): "osd pool stats" shows recovery information bugly
Nathan Cutler
09:24 PM Backport #22808 (Rejected): jewel: "osd pool stats" shows recovery information bugly
Jewel is EOL Nathan Cutler
09:24 PM Bug #22539 (Resolved): bluestore: New OSD - Caught signal - bstore_kv_sync
Nathan Cutler
09:23 PM Backport #22906 (Rejected): jewel: bluestore: New OSD - Caught signal - bstore_kv_sync (throttle ...
Jewel is EOL Nathan Cutler
09:20 PM Backport #36506 (Resolved): luminous: mon osdmap cash too small during upgrade to mimic
https://github.com/ceph/ceph/pull/25021 Nathan Cutler
09:20 PM Backport #36505 (Resolved): mimic: mon osdmap cash too small during upgrade to mimic
https://github.com/ceph/ceph/pull/25019 Nathan Cutler
09:19 PM Bug #36163 (Pending Backport): mon osdmap cash too small during upgrade to mimic
Nathan Cutler
09:07 PM Bug #36163 (Resolved): mon osdmap cash too small during upgrade to mimic
Neha Ojha
09:12 PM Bug #22329 (Closed): mon: Valgrind: mon (Leak_DefinitelyLost, Leak_IndirectlyLost)
Please feel free to reopen it, if this appears again. Neha Ojha
09:09 PM Bug #23879: test_mon_osdmap_prune.sh fails
Joao, we've been seeing this one for a while, could you please take a look. Thanks! Neha Ojha
08:33 PM Backport #24889 (Resolved): mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_h...
Nathan Cutler
07:58 PM Backport #24889: mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23026
merged
Yuri Weinstein
08:31 PM Bug #36170 (Resolved): pg dout log had backfill=[] and bft= which are the same thing
Nathan Cutler
08:31 PM Backport #36292 (Resolved): mimic: pg dout log had backfill=[] and bft= which are the same thing
Nathan Cutler
07:56 PM Backport #36292: mimic: pg dout log had backfill=[] and bft= which are the same thing
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24573
merged
Yuri Weinstein
07:13 PM Bug #36498 (New): failed to recover before timeout expired due to pg stuck in creating+peering
... Neha Ojha
06:58 PM Bug #24866: FAILED assert(0 == "past_interval start interval mismatch") in check_past_interval_bo...
Seeing this on master.... Neha Ojha
06:50 PM Bug #36497 (Resolved): FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::repla...
... Neha Ojha
05:56 PM Bug #36494 (Fix Under Review): Change osd_objectstore default to bluestore
-https://github.com/ceph/ceph/pull/24642- Vikhyat Umrao
05:05 PM Bug #36494 (Resolved): Change osd_objectstore default to bluestore
Change osd_objectstore default to bluestore
https://bugzilla.redhat.com/show_bug.cgi?id=1640257
Vikhyat Umrao
05:56 PM Bug #36485: dump-stuck.yaml fails assert len(inactive) == num_inactive
/a/nojha-2018-10-16_22:54:08-rados-master-distro-basic-smithi/3149382/ Neha Ojha
01:44 PM Bug #36485 (Resolved): dump-stuck.yaml fails assert len(inactive) == num_inactive
/a/sage-2018-10-17_01:58:53-rados-wip-sage-testing-2018-10-16-1758-distro-basic-smithi/3149554 Sage Weil
03:07 PM Bug #35847 (Resolved): wrong cluster_network doesn't cause any errors and ends up using monitor n...
Sage Weil
01:43 PM Bug #36418 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh fails to parse pg dump
Sage Weil
12:43 AM Bug #36473: hung osd_repop, bluestore committed but failed to trigger repop_commit
See https://github.com/ceph/ceph/pull/23317#issuecomment-423432234 (should be the same issue):
I also took a clos...
xie xingguo

10/16/2018

10:53 PM Bug #36473: hung osd_repop, bluestore committed but failed to trigger repop_commit
... Sage Weil
10:45 PM Bug #36473 (Resolved): hung osd_repop, bluestore committed but failed to trigger repop_commit
/a/sage-2018-10-16_18:31:27-rados-wip-sage-testing-2018-10-16-0724-distro-basic-smithi/3148851
Usually after the b...
Sage Weil
10:50 PM Bug #36105 (Resolved): OSD hangs during shutdown
David Zafman
10:48 PM Feature #36474: Add support for osd_delete_sleep configuration value
David Zafman
10:48 PM Feature #36474 (Resolved): Add support for osd_delete_sleep configuration value
[RFE] Introduce an option or flag to throttle the pg deletion process
https://bugzilla.redhat.com/show_bug.cgi?id=16...
David Zafman
10:14 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
This can be reproduced with the fs:basic_workload suite, using --filter 'cfuse_workunit_suites_fsx.yaml'.
Particular...
Neha Ojha
12:19 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
/a/sage-2018-10-15_22:20:16-rados-wip-sage4-testing-2018-10-15-1501-distro-basic-smithi/3145753 Sage Weil
10:04 AM Backport #36150 (Resolved): mimic: output format is invalid of the crush tree json dumper
Nathan Cutler
08:51 AM Bug #24768 (Resolved): rgw workload makes osd memory explode
Kefu Chai
08:48 AM Backport #24847 (Resolved): jewel: rgw workload makes osd memory explode
Kefu Chai
07:56 AM Backport #36437 (In Progress): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler
07:54 AM Backport #36437 (New): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler
07:54 AM Backport #36438 (In Progress): luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler
06:59 AM Bug #36345 (Can't reproduce): librados C API aio read empty buffer
Wido, I am closing this issue as "can't reproduce". If you manage to reproduce it, please feel free to reopen it. th... Kefu Chai
05:29 AM Bug #24835: osd daemon spontaneous segfault
... Brad Hubbard
01:00 AM Bug #36418 (Fix Under Review): qa/standalone/osd/osd-rep-recov-eio.sh fails to parse pg dump
xie xingguo

10/15/2018

11:58 PM Backport #36297 (In Progress): luminous: ceph pg ls creating: EINVAL
https://github.com/ceph/ceph/pull/24602 Prashant D
11:57 PM Backport #36298 (In Progress): mimic: ceph pg ls creating: EINVAL
https://github.com/ceph/ceph/pull/24601 Prashant D
11:54 PM Backport #36296 (In Progress): mimic: [objecter] client socket failure leads to hung connection
https://github.com/ceph/ceph/pull/24600 Prashant D
10:08 PM Bug #36411: OSD crash starting recovery/backfill with EC pool
This resolved itself, though in a way that doesn't exactly make any sense...
Eventually I noticed that one of the ...
Graham Allan
08:49 PM Backport #36150: mimic: output format is invalid of the crush tree json dumper
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24481
merged
Yuri Weinstein
07:57 PM Backport #36419 (In Progress): luminous: osd: get loadavg per cpu for scrub load threshold check
David Zafman
04:59 PM Bug #24615: error message for 'unable to find any IP address' not shown
I haven't compiled Ceph: it was installed on CentOS via the RPM Ceph repository (https://download.ceph.com) version 1... Francois Lafont
12:31 AM Bug #24615 (Need More Info): error message for 'unable to find any IP address' not shown
Francois, did you compile your Ceph with the WITH_SEASTAR option? Victor Denisov
02:13 PM Documentation #23777: doc: description of OSD_OUT_OF_ORDER_FULL problem
Any progress? I'm facing the same issue. Stefan Priebe
10:33 AM Backport #36440 (In Progress): luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset +...
Nathan Cutler
10:28 AM Backport #36440 (Resolved): luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + le...
https://github.com/ceph/ceph/pull/24582 Nathan Cutler
10:31 AM Backport #36439 (In Progress): mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + le...
Nathan Cutler
10:27 AM Backport #36439 (Resolved): mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + lengt...
https://github.com/ceph/ceph/pull/24581 Nathan Cutler
10:30 AM Backport #36437 (In Progress): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler
10:27 AM Backport #36437 (Resolved): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
https://github.com/ceph/ceph/pull/24581 Nathan Cutler
10:27 AM Backport #36438 (Resolved): luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
https://github.com/ceph/ceph/pull/24582 Nathan Cutler
10:26 AM Backport #36436 (Resolved): luminous: rados rm --force-full is blocked when cluster is in full st...
https://github.com/ceph/ceph/pull/25018 Nathan Cutler
10:26 AM Backport #36435 (Resolved): mimic: rados rm --force-full is blocked when cluster is in full status
https://github.com/ceph/ceph/pull/25017 Nathan Cutler
10:25 AM Backport #36434 (Resolved): luminous: monstore tool rebuild does not generate creating_pgs
https://github.com/ceph/ceph/pull/25825 Nathan Cutler
10:25 AM Backport #36433 (Resolved): mimic: monstore tool rebuild does not generate creating_pgs
https://github.com/ceph/ceph/pull/25016 Nathan Cutler
10:25 AM Backport #36432 (Resolved): mimic: Interactive mode CLI prints no output since Mimic
https://github.com/ceph/ceph/pull/24971 Nathan Cutler
09:17 AM Bug #36418: qa/standalone/osd/osd-rep-recov-eio.sh fails to parse pg dump
Sorry, I'm late.
fixup: https://github.com/ceph/ceph/pull/24579
huanwen ren
12:50 AM Backport #36295 (In Progress): luminous: [objecter] client socket failure leads to hung connection
https://github.com/ceph/ceph/pull/24574 Prashant D
12:48 AM Backport #36292 (In Progress): mimic: pg dout log had backfill=[] and bft= which are the same thing
https://github.com/ceph/ceph/pull/24573 Prashant D

10/14/2018

01:05 PM Bug #36300 (Resolved): Clients receive "wrong fsid" error when CephX is disabled
Sage Weil
01:04 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
/a/sage-2018-10-13_00:36:33-rados-wip-sage-testing-2018-10-12-1741-distro-basic-smithi/3133276 Sage Weil
01:10 AM Bug #35847 (Fix Under Review): wrong cluster_network doesn't cause any errors and ends up using m...
Victor Denisov

10/13/2018

02:19 AM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Note that there is a common PR to be backported for this issue and https://tracker.ceph.com/issues/21931 Neha Ojha
02:17 AM Bug #22330 (Pending Backport): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Neha Ojha
02:16 AM Bug #21931 (Pending Backport): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <...
Neha Ojha

10/12/2018

09:26 PM Bug #36186 (Resolved): failed to become clean before timeout expired - pg stuck in clean+premerge...
this run predates fcb1679eab4240c046ba922060c20423fb35ce43, which fixed the problem! Sage Weil
09:14 PM Feature #24176 (Resolved): osd: add command to drop OSD cache
Sage Weil
09:13 PM Bug #36358 (Pending Backport): Interactive mode CLI prints no output since Mimic
Sage Weil
09:12 PM Backport #36419 (Resolved): luminous: osd: get loadavg per cpu for scrub load threshold check
https://github.com/ceph/ceph/pull/24593 David Zafman
08:43 PM Bug #36418 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh fails to parse pg dump
... Sage Weil
08:39 PM Bug #26958: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objec...
/a/sage-2018-10-12_13:16:07-rados-wip-sage-testing-2018-10-11-1437-distro-basic-smithi/3131789 Sage Weil
08:25 PM Bug #22330 (Fix Under Review): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
https://github.com/ceph/ceph/pull/24564 Neha Ojha
08:25 PM Bug #21931 (Fix Under Review): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <...
https://github.com/ceph/ceph/pull/24564 Neha Ojha
05:39 PM Bug #36417 (Pending Backport): osd: get loadavg per cpu for scrub load threshold check
David Zafman
04:53 PM Bug #36417 (Resolved): osd: get loadavg per cpu for scrub load threshold check
https://github.com/ceph/ceph/pull/17718 David Zafman
03:52 PM Bug #36406 (Fix Under Review): Cache-tier forward mode hang in luminous (again)
John Spray
09:32 AM Bug #36406: Cache-tier forward mode hang in luminous (again)
Patch https://github.com/ceph/ceph/pull/24548 Iain Bucław
11:16 AM Bug #36345: librados C API aio read empty buffer
I tested both v13.2.2 and v12.2.8 with the provided source files, and still no luck: I am not able to reproduce this... Kefu Chai
02:59 AM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
@David Zafman Do you have time to take a look? huang jun
02:57 AM Bug #36412 (Closed): ceph-objectstore-tool import after pg splits which will lost objects
Hi, I have a test cluster and performed the following steps; the pool is erasure-coded with k:m=3:1
step 1: export pg 2.f from osd.2, ori...
huang jun
02:15 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
@Nathan, Understood, will open a new issue. Brad Hubbard
02:12 AM Bug #36250 (Need More Info): ceph-osd process crashing
Brad Hubbard
02:11 AM Bug #24835: osd daemon spontaneous segfault
Thanks Soenke,
These should help to isolate the problem.
Brad Hubbard

10/11/2018

10:06 PM Bug #36411 (Closed): OSD crash starting recovery/backfill with EC pool
We have one pg on a 4+2 EC pool in which the OSDs will crash with the following error, on reaching an active set of m... Graham Allan
07:25 PM Bug #36177 (Pending Backport): rados rm --force-full is blocked when cluster is in full status
Sage Weil
07:19 PM Bug #23879: test_mon_osdmap_prune.sh fails
/a/sage-2018-10-10_15:50:53-rados-wip-sage-testing-2018-10-10-0850-distro-basic-smithi/3125020 Sage Weil
06:53 PM Bug #36306 (Pending Backport): monstore tool rebuild does not generate creating_pgs
https://github.com/ceph/ceph/pull/24506 Sage Weil
06:36 PM Bug #36408 (Resolved): [cache tier] failed guarded write + promotion results in "success" op result
Simple reproducer: ... Jason Dillaman
06:25 PM Bug #35845 (Pending Backport): osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
Nathan Cutler
06:09 PM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
Greg Farnum
05:08 PM Bug #36406: Cache-tier forward mode hang in luminous (again)
Iain Bucław wrote:
> Similar to https://tracker.ceph.com/issues/23296
>
Looking at the fix for the other issue....
Iain Bucław
04:46 PM Bug #36406: Cache-tier forward mode hang in luminous (again)
Iain Bucław wrote:
> In the logs, it looks like the client/server enters an infinite loop.
>
> [...]
These are...
Iain Bucław
04:28 PM Bug #36406: Cache-tier forward mode hang in luminous (again)
In the logs, it looks like the client/server enters an infinite loop.... Iain Bucław
04:19 PM Bug #36406 (Resolved): Cache-tier forward mode hang in luminous (again)
Similar to https://tracker.ceph.com/issues/23296
Commands run to reproduce (in vstart.sh)...
Iain Bucław
03:14 PM Bug #36405 (Resolved): unittest_seastar_messenger failure on ARM
We often ignore these failures, but when I looked at the log I realised it's actually a recently added test that's fa... John Spray
01:48 PM Bug #24835: osd daemon spontaneous segfault
Coredump: 258b1ec0-ebc6-43df-b35e-f16a780148b5... Soenke Schippmann
01:44 PM Bug #24835: osd daemon spontaneous segfault
Coredump: bf9b2d5c-96f5-4d30-b852-3888dda66a6b... Soenke Schippmann
01:33 PM Bug #24835: osd daemon spontaneous segfault
We do have some more core dumps with different stack traces.
Coredump: ebb8eff9-b0d6-4321-b85b-d31be87ed7c2
<pr...
Soenke Schippmann
02:53 AM Bug #24835 (New): osd daemon spontaneous segfault
Looking into this. Will update when I have analysed these coredumps.
In the meantime, if you get any that have a d...
Brad Hubbard
11:10 AM Bug #24956 (Fix Under Review): osd: parent process need to restart log service after fork, or cep...
Kefu Chai
09:17 AM Bug #35969 (Pending Backport): "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on cent...
Nathan Cutler
09:16 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
@Brad: The backporting process for the original fix is already well along. If a follow-up fix is required, could you ... Nathan Cutler
03:00 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
Not resolved as per https://github.com/ceph/ceph/pull/24260#issuecomment-427144712. Looking into this further. Brad Hubbard
07:17 AM Bug #36345: librados C API aio read empty buffer
I am personally not running into the issue, but the reporter is. The reporter contacted me to forward the fix which s... Wido den Hollander
06:40 AM Bug #36345 (Fix Under Review): librados C API aio read empty buffer
PR posted by Wido: https://github.com/ceph/ceph/pull/24534 Kefu Chai
06:19 AM Bug #36345: librados C API aio read empty buffer
Wido, I am not able to reproduce this issue on master:... Kefu Chai

10/10/2018

11:51 PM Backport #36393 (Resolved): luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
https://github.com/ceph/ceph/pull/24532 David Zafman
11:25 PM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
https://github.com/ceph/ceph/pull/24535 Greg Farnum
09:53 PM Backport #36321: luminous: Add support for osd_delete_sleep configuration value
Thank you, David.
I hope you will do new patches to mimic and master as this is very specific to luminous.
Vikhyat Umrao
08:53 PM Backport #36321: luminous: Add support for osd_delete_sleep configuration value
https://github.com/ceph/ceph/pull/24501 David Zafman
09:33 PM Support #36326: Huge traffic spike and assert(is_primary())
Given what you've shown here, it's unlikely that the network issue was caused by this; more likely the other way aro... Greg Farnum
09:28 PM Bug #36345: librados C API aio read empty buffer
Yeah, can you make a PR, Wido? Somebody will need to know or run through how the IoCtx works with these data members a... Greg Farnum
02:14 PM Bug #36345: librados C API aio read empty buffer
I was notified about this issue and a simple fix would be:... Wido den Hollander
09:24 PM Support #36351: mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
12.2.2 is pretty out-of-date for Luminous and you appear to be running a custom build, so I'm not sure my line number... Greg Farnum
01:07 AM Support #36351: mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
Maybe the same as this issue: http://tracker.ceph.com/issues/12941 huanwen ren
01:02 AM Support #36351: mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
By using the tool (ceph-monstore-tool) to start the abnormal mon directory, I can get the osdmap and monmap informat... huanwen ren
08:47 PM Bug #26890 (Resolved): scrub livelock
Nathan Cutler
08:47 PM Backport #26932 (Resolved): luminous: scrub livelock
Nathan Cutler
06:53 PM Backport #26932: luminous: scrub livelock
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24396
merged
Yuri Weinstein
08:22 PM Bug #35076 (Resolved): mon: mgr options not parse propertly
Nathan Cutler
08:21 PM Backport #35836 (Resolved): mimic: mon: mgr options not parse propertly
Nathan Cutler
02:06 AM Backport #35836: mimic: mon: mgr options not parse propertly
merged Neha Ojha
07:46 PM Bug #36388 (Resolved): osd: "out of order op"
... Patrick Donnelly
03:20 PM Bug #36378 (New): upmap to same osd twice possible, crashes calc_pg_upmaps
Somehow v12.2.8 let me define an upmap like this:
pg_upmap_items 75.1ef [643,1100,625,907,647,1100]
and this...
Dan van der Ster
02:16 PM Bug #36358 (Fix Under Review): Interactive mode CLI prints no output since Mimic
https://github.com/ceph/ceph/pull/24521 John Spray
08:48 AM Support #36341 (Resolved): build: compile with dpdk failed in master branch
Kefu Chai
04:02 AM Bug #36250: ceph-osd process crashing
Also...
In your original post you showed a message from the log showing an exception "buffer::malformed_input: ent...
Brad Hubbard
01:55 AM Bug #36250: ceph-osd process crashing
Hello Josh,
Sorry it took me a while to see this.
Could you attach the output of "ceph report" please?
Brad Hubbard

10/09/2018

11:07 PM Bug #36306 (Fix Under Review): monstore tool rebuild does not generate creating_pgs
https://github.com/ceph/ceph/pull/24506 Sage Weil
09:01 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
Saw this again in a luminous QA run:... Patrick Donnelly
01:34 PM Bug #36358 (Resolved): Interactive mode CLI prints no output since Mimic
The polling command stuff (for iostat) changed the path for printing output, and now you just don't get anything when... John Spray
09:02 AM Support #36341 (In Progress): build: compile with dpdk failed in master branch
https://github.com/ceph/ceph/pull/24487 should resolve it. Brad Hubbard
02:48 AM Support #36351 (New): mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
I have a Ceph cluster with 3 mons. Due to an abnormal power failure, one mon service starts abnormally. The ex... huanwen ren
01:03 AM Bug #36347 (Resolved): Upgrade test in jewel fails with "Unable to locate package python3-rados"
Neha Ojha

10/08/2018

11:54 PM Backport #36149 (In Progress): luminous: output format is invalid of the crush tree json dumper
https://github.com/ceph/ceph/pull/24482 Prashant D
11:51 PM Backport #36150 (In Progress): mimic: output format is invalid of the crush tree json dumper
https://github.com/ceph/ceph/pull/24481 Prashant D
10:56 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
Another set:... Patrick Donnelly
10:02 PM Bug #36347 (Fix Under Review): Upgrade test in jewel fails with "Unable to locate package python3...
https://github.com/ceph/ceph/pull/24479 Neha Ojha
05:43 PM Bug #36347 (Resolved): Upgrade test in jewel fails with "Unable to locate package python3-rados"
... Neha Ojha
04:11 PM Bug #36345: librados C API aio read empty buffer
The 'same' in C++ seems to work, so I guess it's limited to the C API Anonymous
02:56 PM Bug #36345 (Resolved): librados C API aio read empty buffer
When using the AIO functions, the read buffer remains empty. When using the normal rados_read, the buffer is filled wi... Anonymous
02:27 PM Bug #24835: osd daemon spontaneous segfault
Hi Brad,
thanks for investigating this issue and sorry for my late response, I was on holidays.
The file IDs as...
Soenke Schippmann
10:16 AM Bug #36239 (Resolved): osd/PrimaryLogPG: fix potential pg-log overtrimming
Nathan Cutler
10:16 AM Backport #36275 (Resolved): mimic: osd/PrimaryLogPG: fix potential pg-log overtrimming
Nathan Cutler
10:15 AM Bug #35924 (Resolved): choose_acting picked want > pool size
Nathan Cutler
10:15 AM Backport #35963 (Resolved): mimic: choose_acting picked want > pool size
Nathan Cutler
10:15 AM Bug #35546 (Resolved): RADOS: probably missing clone location for async_recovery_targets
Nathan Cutler
10:07 AM Backport #35964 (Resolved): mimic: RADOS: probably missing clone location for async_recovery_targets
Nathan Cutler
09:53 AM Backport #26840 (Resolved): luminous: librados application's symbol could conflict with the libce...
Nathan Cutler
07:58 AM Bug #23387: Building Ceph on armhf fails due to out-of-memory
I found a workaround (not directly a solution) for this problem: by using Clang/LLVM instead of the GCC toolchain, I m... Daniel Glaser
04:08 AM Support #36341 (Resolved): build: compile with dpdk failed in master branch
sh do_cmake.sh -DWITH_DPDK=ON -DWITH_TESTS=OFF
make -j 12
[ 51%] Building CXX object src/os/CMakeFiles/os.dir/f...
tangwenjun tang
 
