Activity
From 10/08/2018 to 11/06/2018
11/06/2018
- 01:22 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- /a/sage-2018-11-05_22:04:25-rados-wip-sage3-testing-2018-11-05-1406-distro-basic-smithi/3227352
- 11:54 AM Support #36326: Huge traffic spike and assert(is_primary())
- Thanks for the answer! It looks like the traffic spike was caused by another issue: ceph-mon's db grows up to 15GB and it...
- 10:07 AM Bug #36709 (Closed): OSD stuck while flushing rocksdb WAL
- Hi all,
We use:
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
Clients work on:
...
- 01:30 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- Quoting my reply to ceph-devel for reference:
"Nathan, I don't think we want to revert it for 13.2.2.
This is b...
11/05/2018
- 10:42 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
- 10:32 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- So, the luminous revert was merged. Neha, will there be a mimic revert as well? Since the pg hard limit patches are p...
- 10:13 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- https://github.com/ceph/ceph/pull/24903 merged
- 10:28 PM Bug #36508 (Resolved): gperftools-libs-2.6.1-1 or newer required for binaries linked against corr...
- 10:28 PM Backport #36552 (Resolved): luminous: gperftools-libs-2.6.1-1 or newer required for binaries link...
- 10:10 PM Backport #36552: luminous: gperftools-libs-2.6.1-1 or newer required for binaries linked against ...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24706
merged
- 10:25 PM Bug #34541 (Resolved): deep scrub cannot find the bitrot if the object is cached
- 10:25 PM Backport #35067 (Resolved): luminous: deep scrub cannot find the bitrot if the object is cached
- 10:08 PM Backport #35067: luminous: deep scrub cannot find the bitrot if the object is cached
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24802
merged
- 10:18 PM Backport #36678 (Resolved): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state...
- 05:20 PM Feature #24917: Gracefully deal with upgrades when bluestore skipping of data_digest becomes active
- Let's include this with any other feature bit addition.
- 01:30 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- > I suspect it shouldn't.
But it does exactly that.
> That's will only re-copy the data to the HEAD revision.
...
11/04/2018
- 06:55 PM Bug #36677 (Fix Under Review): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')'...
- A fix is already available. See Sage's PR: https://github.com/ceph/ceph/pull/24835.
11/03/2018
- 11:27 PM Bug #24923 (Resolved): doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
- 11:27 PM Backport #25055 (Resolved): mimic: doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
- 11:26 PM Backport #35071 (In Progress): mimic: FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor...
- 04:42 AM Backport #23670 (In Progress): luminous: auth: ceph auth add does not sanity-check caps
- 04:24 AM Backport #23670 (New): luminous: auth: ceph auth add does not sanity-check caps
- Kefu did the jewel backport, so assigning this to him in hopes he'll pick it up.
- 04:00 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- -Also, is this bug reproducible in master and mimic as well? If not, the Backport field should probably be modified.....
- 03:58 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- Neha, 12.2.9 has already been cut, so we'll need to expedite 12.2.10 to push the revert out to users.
- 03:52 AM Backport #36678 (In Progress): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad st...
11/02/2018
- 11:57 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- The immediate fix is to revert this for luminous before 12.2.9: https://github.com/ceph/ceph/pull/24903
- 11:51 PM Bug #36686 (Resolved): osd: pg log hard limit can cause crash during upgrade
- During an upgrade from an earlier version, a primary running the new code will send a trim_to value to a replica that...
- 05:14 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- Ceph has already moved to C++17. The main question is: have we also transitioned our public headers to C++17, or put ...
- 04:58 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- The no-message-taking-variant of *static_assert* has been introduced in C++17. The code is being compiled with *-std=...
- 04:55 PM Bug #36677 (In Progress): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- 05:14 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- Back-and-forth question answering like this is probably better for the mailing list (the ticket is currently closed F...
- 04:57 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- since you've identified that this is an RBD workload, assigning it to that project so that RBD team notices it. HTH.
- 02:37 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- Oops. That's more than 2 questions. But anyway :)
- 02:36 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- OK, I looked into OSD datastore using ceph-objectstore-tool and I see that for almost every object there are two copi...
- 01:39 PM Bug #24835: osd daemon spontaneous segfault
- We do use some configuration set by "ceph config set" or "ceph config-key set":...
11/01/2018
- 11:46 PM Backport #36678 (Resolved): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state...
- https://github.com/ceph/ceph/pull/24902
- 11:19 PM Bug #22902 (Pending Backport): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machin...
- Based on similar failures seen in luminous: http://pulpito.ceph.com/yuriw-2018-10-31_22:45:22-rados-wip-yuri4-testing...
- 09:10 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- ...
- 09:06 PM Bug #36677 (Resolved): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- ...
- 04:44 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
- Looking through the ceph/rocksdb repo I don't see how it's possible for rocksdb to be compiled without snappy support...
- 03:35 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
- This seems to be a problem where rocksdb on CentOS doesn't support snappy compression but the ceph-kvstore-tool is co...
- 06:14 AM Bug #36667 (New): OSD object_map sync returned error
- I deployed CephFS and used the vdbench tool to write data to the CephFS mount point; after a while an OSD went down.
...
10/31/2018
- 09:21 PM Bug #36411 (Closed): OSD crash starting recovery/backfill with EC pool
- It's my current belief that these objects were broken as a result of intentional metadata manipulation when some of t...
- 09:18 PM Bug #36572: ceph-in: --connect-timeout doesn't work while pinging mon
- New PR: https://github.com/ceph/ceph/pull/24733
- 09:17 PM Support #36584 (Closed): OSD Anomaly behaviour in ceph-reweight
- Are you running the command repeatedly? reweight-by-utilization does not provide a stable balance; it's really just a...
- 08:43 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
- https://github.com/ceph/ceph/pull/24868
- 05:35 PM Feature #36661: osd: add sanity check on startup to compare osd memory target to available memory...
- - in OSD::handle_conf_change, we should sanity check this against current memory available on the system and refuse t...
- 04:59 PM Feature #36661 (New): osd: add sanity check on startup to compare osd memory target to available ...
- This is needed so that we do not fail due to osd_memory_target being set too high compared to the amount of memory ...
- 11:42 AM Backport #36658 (Resolved): mimic: Cache-tier forward mode hang in luminous (again)
- https://github.com/ceph/ceph/pull/25075
- 11:42 AM Backport #36657 (Resolved): luminous: Cache-tier forward mode hang in luminous (again)
- https://github.com/ceph/ceph/pull/25074
10/30/2018
- 08:08 PM Bug #36345 (Resolved): librados C API aio read empty buffer
- 08:07 PM Bug #36406 (Pending Backport): Cache-tier forward mode hang in luminous (again)
- 05:16 PM Backport #36647 (Resolved): mimic: librados api aio tests race condition
- https://github.com/ceph/ceph/pull/25027
- 05:16 PM Backport #36646 (Resolved): luminous: librados api aio tests race condition
- https://github.com/ceph/ceph/pull/25028
- 05:14 PM Backport #36637 (Resolved): mimic: osd: race condition opening heartbeat connection
- https://github.com/ceph/ceph/pull/25026
- 05:14 PM Backport #36636 (Resolved): luminous: osd: race condition opening heartbeat connection
- https://github.com/ceph/ceph/pull/25035
- 04:06 PM Bug #36634 (New): LibRadosWatchNotify.WatchNotify2Timeout failure
- ...
- 03:33 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- Yes, I'm using EC with RBD and partial overwrites enabled. CephFS pools are only created recently for tests and do no...
- 01:05 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- How are you writing these objects? Most sites that used EC were using RGW, but I don't see all the pools that go wit...
- 10:31 AM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- In fact it doesn't seem that it will self-heal, and nobody seems to care about it in the mailing list by now...)
C...
- 02:33 PM Bug #36631 (In Progress): potential deadlock in PG::_scan_snaps when repairing snap mapper
- If during a pg scrub a snap mapper error is detected in PG::_scan_snaps, on repair `ObjectStore::apply_transactions` ...
- 02:28 PM Backport #36630 (Resolved): luminous: potential deadlock in PG::_scan_snaps when repairing snap m...
- If during a pg scrub a snap mapper error is detected in PG::_scan_snaps, on repair `ObjectStore::apply_transactions` ...
- 02:00 PM Bug #36629 (New): osd:the new file was stored in cache pool which mode was none
- ceph version:13.2.1
kernel client 4.17
I created the cache data pool following Ceph's instructions:
(1) ceph osd tier add...
- 01:41 AM Bug #36620: osd:the vim will be hanged when I saved the file
- the client: 4.17 kernel client
- 01:36 AM Bug #36620 (New): osd:the vim will be hanged when I saved the file
- ceph version: 13.2.1
situation: the data pool is tiered by a cache data pool, and the cache tier pool's mode was read...
10/29/2018
- 10:33 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- Thanks for the response, I wrote to the mailing list ceph-users (is it the correct place?) :)
- 08:37 PM Support #36614 (Closed): Cluster uses substantially more space after rebalance (erasure codes)
- The mailing list is a better place to resolve this. My guess is data hasn't been cleaned up from its old locations ye...
- 12:13 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- How to heal it? If I don't heal it I'll need to purge the whole cluster? O_o...
- 12:12 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- ceph df output:...
- 11:11 AM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- Proofs from our prometheus monitoring. Two graphs from yesterday: one with number of objects in cluster and other wit...
- 10:17 AM Support #36614 (Closed): Cluster uses substantially more space after rebalance (erasure codes)
- Hi
After I recreated one OSD + increased pg count of my erasure-coded (2+1) pool (which was way too low, only 100 ...
- 10:21 PM Bug #36525: osd-scrub-snaps.sh failure
Looking at the log, another scrub has made the number of "_scan_snaps start" entries in the log go from 2 to 4. It results in ...
- 01:06 AM Bug #36525: osd-scrub-snaps.sh failure
/a/sage-2018-10-28_14:12:19-rados-master-distro-basic-smithi/3196520
another instance on current master
- 09:48 PM Bug #23827 (Resolved): osd sends op_reply out of order
- 09:47 PM Backport #25010 (Resolved): mimic: osd sends op_reply out of order
- 08:47 PM Backport #25010: mimic: osd sends op_reply out of order
- https://github.com/ceph/ceph/pull/23136 has merged, can we resolve this issue?
- 09:43 PM Bug #25154 (Resolved): librados application's symbol could conflict with the libceph-common
- 09:42 PM Backport #26839 (Resolved): mimic: librados application's symbol could conflict with the libceph-...
- 08:21 PM Backport #26839: mimic: librados application's symbol could conflict with the libceph-common
- Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/24708
merged
- 09:40 PM Bug #35969 (Resolved): "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- 09:39 PM Backport #36553 (Resolved): mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked ...
- 08:16 PM Backport #36553: mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked against cor...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24260
merged
- 09:39 PM Backport #36132 (Resolved): mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on ...
- 08:16 PM Backport #36132: mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24260
merged
- 08:47 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
- The above changes are not entirely correct. This section needs to be omitted:...
- 08:13 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
- Hello!
I've used the instructions created by Daniel Glasser, and with some small code adjustments in a few files I w...
- 04:17 PM Bug #36610 (Fix Under Review): filestore merge collection replay problem
- https://github.com/ceph/ceph/pull/24806
- 03:51 PM Bug #36610: filestore merge collection replay problem
- the osd is stopped during the merge operation:...
- 03:46 PM Bug #36182 (Resolved): osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is ...
- 02:59 PM Bug #36473 (Resolved): hung osd_repop, bluestore committed but failed to trigger repop_commit
- this is presumably https://github.com/ceph/ceph/pull/24761
- 02:58 PM Bug #36548 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh
- 01:34 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- /a/sage-2018-10-29_01:11:58-rados-wip-sage-testing-2018-10-28-0943-distro-basic-smithi/3197984
- 01:10 AM Bug #36408 (Resolved): [cache tier] failed guarded write + promotion results in "success" op result
10/28/2018
- 02:40 PM Bug #36602 (Pending Backport): osd: race condition opening heartbeat connection
- 02:37 PM Bug #36610 (Resolved): filestore merge collection replay problem
- /a/sage-2018-10-27_02:10:33-rados-wip-sage-testing-2018-10-26-1411-distro-basic-smithi/3188976
osd.3 was partway t...
10/26/2018
- 07:24 PM Feature #24591: FileStore hasn't impl to get kv-db's statistics
- Jack Lv wrote:
> https://github.com/ceph/ceph/pull/22633
merged
- 06:30 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
- https://github.com/ceph/ceph/pull/24761
- 05:41 PM Bug #36602: osd: race condition opening heartbeat connection
- 03:39 PM Bug #36602 (Fix Under Review): osd: race condition opening heartbeat connection
- https://github.com/ceph/ceph/pull/24780
- 03:37 PM Bug #36602 (Resolved): osd: race condition opening heartbeat connection
- ...
- 05:10 PM Bug #20694: osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_log().get_log().obje...
- /a/yuriw-2018-10-25_15:31:28-rados-wip-yuri4-testing-2018-10-24-2310-mimic-distro-basic-smithi/3183476/
- 04:22 PM Bug #36345 (Fix Under Review): librados C API aio read empty buffer
- imirc tw, thank you for your analysis. i am approving https://github.com/ceph/ceph/pull/24534. so "unshared buffer" o...
- 09:52 AM Bug #36345: librados C API aio read empty buffer
- I figured it out. In Objecter.cc:3279...
- 09:02 AM Bug #36345: librados C API aio read empty buffer
- without osd_op_timeout, in Objecter::handle_osd_op_reply, Objecter.cc:3473
op->con px is an AsyncConnection on whic...
- 07:54 AM Bug #36345: librados C API aio read empty buffer
- Some more info from what I can see while debugging.
Without 'rados osd op timeout', the buffer in librados::IoCtx...
- 02:18 PM Bug #24587 (Pending Backport): librados api aio tests race condition
- 11:01 AM Bug #24180 (Resolved): mon: slow op on log message
- 11:01 AM Backport #24293 (Resolved): jewel: mon: slow op on log message
- 06:42 AM Bug #24835: osd daemon spontaneous segfault
- Our ceph.conf:...
- 03:46 AM Bug #24615 (Need More Info): error message for 'unable to find any IP address' not shown
- Francois,
Can you try reproducing your issue on the latest master?
I fixed a similar issue in master and also fro...
- 03:28 AM Bug #24615 (In Progress): error message for 'unable to find any IP address' not shown
- 02:34 AM Bug #25153 (Resolved): output format is invalid of the crush tree json dumper
- 02:33 AM Backport #36149 (Resolved): luminous: output format is invalid of the crush tree json dumper
- 02:33 AM Bug #35845 (Resolved): osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- 02:32 AM Backport #36393 (Resolved): luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- 02:30 AM Bug #36183 (Resolved): [objecter] client socket failure leads to hung connection
- 02:30 AM Backport #36295 (Resolved): luminous: [objecter] client socket failure leads to hung connection
- 02:29 AM Bug #21931 (Resolved): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range...
- 02:29 AM Backport #36440 (Resolved): luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + le...
- 02:28 AM Bug #22330 (Resolved): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 02:28 AM Backport #36438 (Resolved): luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 02:27 AM Bug #36417 (Resolved): osd: get loadavg per cpu for scrub load threshold check
- 02:27 AM Backport #36419 (Resolved): luminous: osd: get loadavg per cpu for scrub load threshold check
- 02:26 AM Bug #36174 (Resolved): ceph pg ls creating: EINVAL
- 02:26 AM Backport #36297 (Resolved): luminous: ceph pg ls creating: EINVAL
- 02:25 AM Bug #23614 (Resolved): local_reserver double-reservation of backfilled pg
- 02:25 AM Backport #24333 (Resolved): luminous: local_reserver double-reservation of backfilled pg
- 02:24 AM Backport #26932 (Resolved): luminous: scrub livelock
10/25/2018
- 10:22 PM Backport #36149: luminous: output format is invalid of the crush tree json dumper
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24482
merged
- 10:21 PM Backport #36393: luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/24532
merged
- 10:20 PM Backport #36295: luminous: [objecter] client socket failure leads to hung connection
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24574
merged
- 10:20 PM Backport #36440: luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (r...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24582
merged
- 10:20 PM Backport #36438: luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24582
merged
- 10:19 PM Backport #36419: luminous: osd: get loadavg per cpu for scrub load threshold check
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/24593
merged
- 10:19 PM Backport #36297: luminous: ceph pg ls creating: EINVAL
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24602
merged
Reviewed-by: Neha Ojha <nojha@redhat.com>
- 10:18 PM Bug #26890: scrub livelock
- merged https://github.com/ceph/ceph/pull/24659
- 08:01 PM Bug #36345: librados C API aio read empty buffer
- Kefu Chai, same happens on master:...
- 04:36 PM Bug #36345: librados C API aio read empty buffer
- 13.2.2 , i will give it a go on master asap.
- 03:57 PM Bug #36345: librados C API aio read empty buffer
- imirc tw, on which release did you reproduce this issue? is master affected?
- 01:18 PM Bug #36345: librados C API aio read empty buffer
- Hi Kefu,
- I'm not that deep into the Ceph code; I was making an assumption based on my observations and past ticket...
- 08:52 AM Bug #36345: librados C API aio read empty buffer
- imirc tw, i don't understand how "rados_osd_op_timeout" is related to this issue. i agree that current @librados::IoC...
- 07:02 AM Bug #36345: librados C API aio read empty buffer
- Hi Wido,
- The 2nd assumption isn't true; that was because the client.admin ceph.conf file used didn't have the osd_o...
- 06:51 AM Bug #36345: librados C API aio read empty buffer
- Updating this ticket as the issue seems to be related to two things:
- When using osd_op_timeout
- When using a u...
- 06:21 PM Bug #36598 (Can't reproduce): osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests ...
- ...
- 04:20 PM Backport #24333: luminous: local_reserver double-reservation of backfilled pg
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23493
merged
- 05:18 AM Feature #36474: Add support for osd_delete_sleep configuration value
- https://github.com/ceph/ceph/pull/24749
10/24/2018
- 09:43 PM Bug #25182: Upmaps forgotten after restarting OSDs
- One thing I've noticed after living with this for a while is that the upmap entries that are forgotten are always for...
- 09:31 PM Bug #36517: client crashes osd with empty object name
- Attached
- 09:17 PM Bug #36517: client crashes osd with empty object name
- Noah, the paste doesn't show now, could you paste the trace in the tracker.
- 09:21 PM Bug #24485 (Resolved): LibRadosTwoPoolsPP.ManifestUnset failure
- 09:11 PM Bug #36166 (Resolved): pg merge can collide with remapped, upmap pgs
- 02:29 PM Bug #36345: librados C API aio read empty buffer
- Kefu, I'm also experiencing this issue. It seems to be related to `rados osd op timeout`. Once this value is set in t...
- 11:01 AM Support #36584 (Closed): OSD Anomaly behaviour in ceph-reweight
- ceph version 10.2.5
We have this behaviour with 2 OSDs in the cluster causing a backfilling loop.
I'm executing thi...
- 10:35 AM Bug #19348 (Can't reproduce): "ceph ping mon.c" cli prints assertion failure on timeout
- not able to reproduce with master HEAD anymore.
- 10:34 AM Bug #19348: "ceph ping mon.c" cli prints assertion failure on timeout
- https://github.com/ceph/ceph/pull/24733
10/23/2018
- 09:36 PM Bug #36040: mon: Valgrind: mon (InvalidFree, InvalidWrite, InvalidRead)
- /ceph/teuthology-archive/pdonnell-2018-10-17_19:54:38-multimds-wip-pdonnell-testing-20181017.175152-distro-basic-smit...
- 09:30 PM Bug #36497: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::replace()
- /ceph/teuthology-archive/pdonnell-2018-10-17_19:54:38-multimds-wip-pdonnell-testing-20181017.175152-distro-basic-smit...
- 09:28 PM Bug #36411: OSD crash starting recovery/backfill with EC pool
- I have to add to the previous update, which did not explain the resolution of the problem.
The true solution was w...
- 08:27 PM Bug #24587 (Fix Under Review): librados api aio tests race condition
- https://github.com/ceph/ceph/pull/24724
- 07:53 PM Bug #36572: ceph-in: --connect-timeout doesn't work while pinging mon
- Submitted a "PR":https://github.com/ceph/ceph/pull/24723 for this.
- 07:44 PM Bug #36572 (Closed): ceph-in: --connect-timeout doesn't work while pinging mon
- Saw the following output while working on "PR 21432":https://github.com/ceph/ceph/pull/21432 -...
- 03:53 PM Bug #36548: qa/standalone/osd/osd-rep-recov-eio.sh
- The failed run did not include the changes in https://github.com/ceph/ceph/pull/24651 (master). This pull request mi...
- 01:43 AM Bug #36548 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh
- pg ended up in active+clean, not recovery_unfound
/a/sage-2018-10-22_21:29:13-rados-wip-sage-testing-2018-10-22-11...
- 06:04 AM Backport #36553 (In Progress): mimic: gperftools-libs-2.6.1-1 or newer required for binaries link...
- 05:44 AM Backport #36553 (Resolved): mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked ...
- https://github.com/ceph/ceph/pull/24260
- 05:52 AM Backport #36552 (In Progress): luminous: gperftools-libs-2.6.1-1 or newer required for binaries l...
- 05:43 AM Backport #36552 (Resolved): luminous: gperftools-libs-2.6.1-1 or newer required for binaries link...
- https://github.com/ceph/ceph/pull/24706
- 05:45 AM Backport #36557 (Resolved): mimic: RBD client IOPS pool stats are incorrect (2x higher; includes ...
- https://github.com/ceph/ceph/pull/25024
- 05:45 AM Backport #36556 (Resolved): luminous: RBD client IOPS pool stats are incorrect (2x higher; includ...
- https://github.com/ceph/ceph/pull/25025
- 05:43 AM Backport #35909 (Resolved): mimic: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- 05:31 AM Backport #36439 (Resolved): mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + lengt...
- 05:31 AM Backport #36437 (Resolved): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 05:30 AM Backport #36296 (Resolved): mimic: [objecter] client socket failure leads to hung connection
- 05:30 AM Backport #36298 (Resolved): mimic: ceph pg ls creating: EINVAL
- 04:41 AM Bug #24835: osd daemon spontaneous segfault
- I'd say the cause of most, if not all, of these crashes is memory corruption caused by code responsible for manipulat...
- 04:31 AM Bug #24835: osd daemon spontaneous segfault
- The 'safe_timer.5246' is again similar but this time tcmalloc is 'popping' a
single value rather than a range.
<p...
- 03:54 AM Bug #24835: osd daemon spontaneous segfault
- The 'msgr-worker-1.5278' is almost identical to 'tp_osd_tp' except this time 'i'
= 499 so doing that manually is bey...
- 01:58 AM Bug #24835: osd daemon spontaneous segfault
- For the rest of the coredumps adding the debuginfo for libtcmalloc really helps
to understand the problem as we end ...
- 12:53 AM Bug #24835: osd daemon spontaneous segfault
- Starting with the bluestore bufferlist destructor crash....
10/22/2018
- 11:39 PM Bug #36508 (Pending Backport): gperftools-libs-2.6.1-1 or newer required for binaries linked agai...
- 11:38 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
- Haven't been able to reproduce this on luminous and mimic, so clearing the Backport fields for now.
- 07:05 PM Bug #24909 (Pending Backport): RBD client IOPS pool stats are incorrect (2x higher; includes IO h...
- 03:40 PM Backport #35909: mimic: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/24017
merged
- 03:35 PM Backport #36439: mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (rang...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24581
merged
- 03:35 PM Backport #36437: mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24581
merged
- 03:34 PM Backport #36296: mimic: [objecter] client socket failure leads to hung connection
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24600
merged
- 03:32 PM Backport #36298: mimic: ceph pg ls creating: EINVAL
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24601
merged
- 02:31 PM Bug #24956 (Resolved): osd: parent process need to restart log service after fork, or ceph-osd wi...
- 02:25 PM Bug #36546 (Duplicate): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back...
- ...
- 02:21 PM Bug #36485 (Resolved): dump-stuck.yaml fails assert len(inactive) == num_inactive
10/21/2018
- 03:53 PM Bug #36485 (Fix Under Review): dump-stuck.yaml fails assert len(inactive) == num_inactive
- https://github.com/ceph/ceph/pull/24689
- 09:25 AM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
- https://github.com/ceph/ceph/pull/24687
10/20/2018
- 09:43 PM Bug #22144: *** Caught signal (Aborted) ** in thread thread_name:tp_peering
- we can confirm we are experiencing the same issue on version 12.2.7 and currently have some random osds that went off...
10/19/2018
- 09:12 PM Bug #16279 (Closed): assert(objiter->second->version > last_divergent_update) failed
- Closing this ticket since the linked PR was also closed.
- 09:09 PM Bug #17252: [Librados] Deadlock on RadosClient::watch_flush
- 08:18 PM Bug #24368: osd: should not restart on permanent failures
- Clearing backport field on the assumption that's what was intended by the previous edit.
- 08:01 PM Bug #24368 (Resolved): osd: should not restart on permanent failures
- Okay, after discussing with CERN I've merged the PR to master so this isn't an issue going forward.
But unfortunat...
10/18/2018
- 10:35 PM Bug #22561: PG stuck during recovery, requires OSD restart
- I've just encountered this again with about 20 OSDs being non-responsive like this. Restarting the OSDs in that state...
- 10:20 PM Bug #36485: dump-stuck.yaml fails assert len(inactive) == num_inactive
- /a/sage-2018-10-17_20:20:33-rados-nautilus-distro-basic-smithi/3154379
fails every time
- 08:27 PM Bug #36525 (Resolved): osd-scrub-snaps.sh failure
- ...
- 08:25 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
- /a/sage-2018-10-17_20:20:33-rados-nautilus-distro-basic-smithi/3154168
- 06:11 PM Bug #36408 (Fix Under Review): [cache tier] failed guarded write + promotion results in "success"...
- *PR*: https://github.com/ceph/ceph/pull/24666
- 06:08 PM Bug #36408 (In Progress): [cache tier] failed guarded write + promotion results in "success" op r...
- 04:02 PM Bug #24368: osd: should not restart on permanent failures
- From a user:
>There is some class of OSD out there (all filestore, IIRC) that are ultra slow to start at boot time i...
- 03:56 PM Bug #22233: prime_pg_temp breaks on uncreated pgs
- I don't understand why the bug happened (or what the proposed fix is trying to do). Given the description above, the ...
- 03:04 PM Bug #36517 (New): client crashes osd with empty object name
- I found a RADOS client causing OSDs to crash running bluestore (haven't tried filestore) producing the following erro...
- 02:51 PM Bug #36515 (Resolved): config options: 'services' field is empty for many config options
- The 'services' field is empty for many config options, e.g.:...
- 11:17 AM Feature #21902: Support bytearray in python binding
- ...
- 10:36 AM Backport #26932 (In Progress): luminous: scrub livelock
- https://github.com/ceph/ceph/pull/24659
I am resetting the target version and changing its status to "In Progress...
- 01:28 AM Bug #36508 (In Progress): gperftools-libs-2.6.1-1 or newer required for binaries linked against c...
- https://github.com/ceph/ceph/pull/24652
- 12:42 AM Bug #36508 (Resolved): gperftools-libs-2.6.1-1 or newer required for binaries linked against corr...
- Binaries compiled against the 2.6 version of libtcmalloc.so.4 (in this case ceph-osd) have the following undefined sy...
- 12:50 AM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
- David Zafman wrote:
> The original code is operating as intended when issuing this warning:
>
> WARNING: Split oc...
- 12:46 AM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
- Greg Farnum wrote:
> Did you try importing 2.1f with that original 2.f PG dump?
No, the test script doesn't impor...
- 12:44 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- See http://tracker.ceph.com/issues/36508
10/17/2018
- 11:24 PM Bug #36412 (Closed): ceph-objectstore-tool import after pg splits which will lost objects
The original code is operating as intended when issuing this warning:
WARNING: Split occurred, some objects may ...
- 11:19 PM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
As Greg pointed out, you would use the --pgid 2.1f option with --op import to get the objects that split into that p...
- 09:28 PM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
- Did you try importing 2.1f with that original 2.f PG dump?
- 09:31 PM Bug #36405: unittest_seastar_messenger failure on ARM
- Kefu, could you please take a look.
- 09:24 PM Bug #22727 (Resolved): "osd pool stats" shows recovery information bugly
- 09:24 PM Backport #22808 (Rejected): jewel: "osd pool stats" shows recovery information bugly
- Jewel is EOL
- 09:24 PM Bug #22539 (Resolved): bluestore: New OSD - Caught signal - bstore_kv_sync
- 09:23 PM Backport #22906 (Rejected): jewel: bluestore: New OSD - Caught signal - bstore_kv_sync (throttle ...
- Jewel is EOL
- 09:20 PM Backport #36506 (Resolved): luminous: mon osdmap cash too small during upgrade to mimic
- https://github.com/ceph/ceph/pull/25021
- 09:20 PM Backport #36505 (Resolved): mimic: mon osdmap cash too small during upgrade to mimic
- https://github.com/ceph/ceph/pull/25019
- 09:19 PM Bug #36163 (Pending Backport): mon osdmap cash too small during upgrade to mimic
- 09:07 PM Bug #36163 (Resolved): mon osdmap cash too small during upgrade to mimic
- 09:12 PM Bug #22329 (Closed): mon: Valgrind: mon (Leak_DefinitelyLost, Leak_IndirectlyLost)
- Please feel free to reopen it, if this appears again.
- 09:09 PM Bug #23879: test_mon_osdmap_prune.sh fails
- Joao, we've been seeing this one for a while, could you please take a look. Thanks!
- 08:33 PM Backport #24889 (Resolved): mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_h...
- 07:58 PM Backport #24889: mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23026
merged - 08:31 PM Bug #36170 (Resolved): pg dout log had backfill=[] and bft= which are the same thing
- 08:31 PM Backport #36292 (Resolved): mimic: pg dout log had backfill=[] and bft= which are the same thing
- 07:56 PM Backport #36292: mimic: pg dout log had backfill=[] and bft= which are the same thing
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24573
merged
- 07:13 PM Bug #36498 (New): failed to recover before timeout expired due to pg stuck in creating+peering
- ...
- 06:58 PM Bug #24866: FAILED assert(0 == "past_interval start interval mismatch") in check_past_interval_bo...
- Seeing this on master....
- 06:50 PM Bug #36497 (Resolved): FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::repla...
- ...
- 05:56 PM Bug #36494 (Fix Under Review): Change osd_objectstore default to bluestore
- -https://github.com/ceph/ceph/pull/24642-
- 05:05 PM Bug #36494 (Resolved): Change osd_objectstore default to bluestore
- Change osd_objectstore default to bluestore
https://bugzilla.redhat.com/show_bug.cgi?id=1640257
- 05:56 PM Bug #36485: dump-stuck.yaml fails assert len(inactive) == num_inactive
- /a/nojha-2018-10-16_22:54:08-rados-master-distro-basic-smithi/3149382/
- 01:44 PM Bug #36485 (Resolved): dump-stuck.yaml fails assert len(inactive) == num_inactive
- /a/sage-2018-10-17_01:58:53-rados-wip-sage-testing-2018-10-16-1758-distro-basic-smithi/3149554
- 03:07 PM Bug #35847 (Resolved): wrong cluster_network doesn't cause any errors and ends up using monitor n...
- 01:43 PM Bug #36418 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh fails to parse pg dump
- 12:43 AM Bug #36473: hung osd_repop, bluestore committed but failed to trigger repop_commit
- See https://github.com/ceph/ceph/pull/23317#issuecomment-423432234 (should be the same issue):
I also took a clos...
10/16/2018
- 10:53 PM Bug #36473: hung osd_repop, bluestore committed but failed to trigger repop_commit
- ...
- 10:45 PM Bug #36473 (Resolved): hung osd_repop, bluestore committed but failed to trigger repop_commit
- /a/sage-2018-10-16_18:31:27-rados-wip-sage-testing-2018-10-16-0724-distro-basic-smithi/3148851
Usually after the b...
- 10:50 PM Bug #36105 (Resolved): OSD hangs during shutdown
- 10:48 PM Feature #36474: Add support for osd_delete_sleep configuration value
- 10:48 PM Feature #36474 (Resolved): Add support for osd_delete_sleep configuration value
- [RFE] Introduce an option or flag to throttle the pg deletion process
https://bugzilla.redhat.com/show_bug.cgi?id=16...
- 10:14 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
- This can be reproduced with the fs:basic_workload suite, using --filter 'cfuse_workunit_suites_fsx.yaml'.
Particular...
- 12:19 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
- /a/sage-2018-10-15_22:20:16-rados-wip-sage4-testing-2018-10-15-1501-distro-basic-smithi/3145753
- 10:04 AM Backport #36150 (Resolved): mimic: output format is invalid of the crush tree json dumper
- 08:51 AM Bug #24768 (Resolved): rgw workload makes osd memory explode
- 08:48 AM Backport #24847 (Resolved): jewel: rgw workload makes osd memory explode
- 07:56 AM Backport #36437 (In Progress): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 07:54 AM Backport #36437 (New): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 07:54 AM Backport #36438 (In Progress): luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 06:59 AM Bug #36345 (Can't reproduce): librados C API aio read empty buffer
- Wido, i am closing this issue as "can't reproduce". if you managed to reproduce it, please feel free to reopen it. th...
- 05:29 AM Bug #24835: osd daemon spontaneous segfault
- ...
- 01:00 AM Bug #36418 (Fix Under Review): qa/standalone/osd/osd-rep-recov-eio.sh fails to parse pg dump
10/15/2018
- 11:58 PM Backport #36297 (In Progress): luminous: ceph pg ls creating: EINVAL
- https://github.com/ceph/ceph/pull/24602
- 11:57 PM Backport #36298 (In Progress): mimic: ceph pg ls creating: EINVAL
- https://github.com/ceph/ceph/pull/24601
- 11:54 PM Backport #36296 (In Progress): mimic: [objecter] client socket failure leads to hung connection
- https://github.com/ceph/ceph/pull/24600
- 10:08 PM Bug #36411: OSD crash starting recovery/backfill with EC pool
- This resolved itself, though in a way that doesn't exactly make any sense...
Eventually I noticed that one of the ...
- 08:49 PM Backport #36150: mimic: output format is invalid of the crush tree json dumper
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24481
merged - 07:57 PM Backport #36419 (In Progress): luminous: osd: get loadavg per cpu for scrub load threshold check
- 04:59 PM Bug #24615: error message for 'unable to find any IP address' not shown
- I haven't compiled Ceph: it was installed on CentOS via the RPM Ceph repository (https://download.ceph.com) version 1...
- 12:31 AM Bug #24615 (Need More Info): error message for 'unable to find any IP address' not shown
- Francois, did you compile your ceph with WITH_SEASTAR option?
- 02:13 PM Documentation #23777: doc: description of OSD_OUT_OF_ORDER_FULL problem
- Any progress? I'm facing the same issue.
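For readers hitting the same thing: OSD_OUT_OF_ORDER_FULL is raised when the full-utilization thresholds are not in ascending order (nearfull ≤ backfillfull ≤ full ≤ failsafe_full). A minimal sketch of that ordering rule, with hypothetical ratios (not taken from any cluster in this thread):

```python
def full_ratios_ordered(nearfull, backfillfull, full, failsafe_full):
    """Model of the OSD_OUT_OF_ORDER_FULL check: the thresholds must ascend."""
    return nearfull <= backfillfull <= full <= failsafe_full

# hypothetical values; backfillfull set above full is a classic misconfiguration
print(full_ratios_ordered(0.85, 0.90, 0.95, 0.97))  # True: healthy ordering
print(full_ratios_ordered(0.85, 0.96, 0.95, 0.97))  # False: would raise the warning
```

Fixing the warning means adjusting the offending ratio(s) so the chain ascends again.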
- 10:33 AM Backport #36440 (In Progress): luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset +...
- 10:28 AM Backport #36440 (Resolved): luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + le...
- https://github.com/ceph/ceph/pull/24582
- 10:31 AM Backport #36439 (In Progress): mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + le...
- 10:27 AM Backport #36439 (Resolved): mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + lengt...
- https://github.com/ceph/ceph/pull/24581
- 10:30 AM Backport #36437 (In Progress): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 10:27 AM Backport #36437 (Resolved): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- https://github.com/ceph/ceph/pull/24581
- 10:27 AM Backport #36438 (Resolved): luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- https://github.com/ceph/ceph/pull/24582
- 10:26 AM Backport #36436 (Resolved): luminous: rados rm --force-full is blocked when cluster is in full st...
- https://github.com/ceph/ceph/pull/25018
- 10:26 AM Backport #36435 (Resolved): mimic: rados rm --force-full is blocked when cluster is in full status
- https://github.com/ceph/ceph/pull/25017
- 10:25 AM Backport #36434 (Resolved): luminous: monstore tool rebuild does not generate creating_pgs
- https://github.com/ceph/ceph/pull/25825
- 10:25 AM Backport #36433 (Resolved): mimic: monstore tool rebuild does not generate creating_pgs
- https://github.com/ceph/ceph/pull/25016
- 10:25 AM Backport #36432 (Resolved): mimic: Interactive mode CLI prints no output since Mimic
- https://github.com/ceph/ceph/pull/24971
- 09:17 AM Bug #36418: qa/standalone/osd/osd-rep-recov-eio.sh fails to parse pg dump
- Sorry, I'm late.
fixup: https://github.com/ceph/ceph/pull/24579
- 12:50 AM Backport #36295 (In Progress): luminous: [objecter] client socket failure leads to hung connection
- https://github.com/ceph/ceph/pull/24574
- 12:48 AM Backport #36292 (In Progress): mimic: pg dout log had backfill=[] and bft= which are the same thing
- https://github.com/ceph/ceph/pull/24573
10/14/2018
- 01:05 PM Bug #36300 (Resolved): Clients receive "wrong fsid" error when CephX is disabled
- 01:04 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- /a/sage-2018-10-13_00:36:33-rados-wip-sage-testing-2018-10-12-1741-distro-basic-smithi/3133276
- 01:10 AM Bug #35847 (Fix Under Review): wrong cluster_network doesn't cause any errors and ends up using m...
10/13/2018
- 02:19 AM Bug #22330: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- Note that there is a common PR to be backported for this issue and https://tracker.ceph.com/issues/21931
- 02:17 AM Bug #22330 (Pending Backport): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 02:16 AM Bug #21931 (Pending Backport): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <...
10/12/2018
- 09:26 PM Bug #36186 (Resolved): failed to become clean before timeout expired - pg stuck in clean+premerge...
- this run predates fcb1679eab4240c046ba922060c20423fb35ce43, which fixed the problem!
- 09:14 PM Feature #24176 (Resolved): osd: add command to drop OSD cache
- 09:13 PM Bug #36358 (Pending Backport): Interactive mode CLI prints no output since Mimic
- 09:12 PM Backport #36419 (Resolved): luminous: osd: get loadavg per cpu for scrub load threshold check
- https://github.com/ceph/ceph/pull/24593
- 08:43 PM Bug #36418 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh fails to parse pg dump
- ...
- 08:39 PM Bug #26958: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objec...
- /a/sage-2018-10-12_13:16:07-rados-wip-sage-testing-2018-10-11-1437-distro-basic-smithi/3131789
- 08:25 PM Bug #22330 (Fix Under Review): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- https://github.com/ceph/ceph/pull/24564
- 08:25 PM Bug #21931 (Fix Under Review): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <...
- https://github.com/ceph/ceph/pull/24564
- 05:39 PM Bug #36417 (Pending Backport): osd: get loadavg per cpu for scrub load threshold check
- 04:53 PM Bug #36417 (Resolved): osd: get loadavg per cpu for scrub load threshold check
- https://github.com/ceph/ceph/pull/17718
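The gist of this fix, sketched in simplified form (the real OSD logic also considers the daily load trend; 0.5 is the default of osd_scrub_load_threshold): the 1-minute loadavg is divided by the CPU count before comparing against the threshold, so a busy many-core box is not wrongly barred from scrubbing.

```python
OSD_SCRUB_LOAD_THRESHOLD = 0.5  # default value of osd_scrub_load_threshold

def scrub_load_ok(loadavg_1min, ncpus, threshold=OSD_SCRUB_LOAD_THRESHOLD):
    """Simplified model of the fixed check: compare per-CPU load, not raw loadavg."""
    return (loadavg_1min / ncpus) < threshold

# a raw loadavg of 4.0 used to block scrubbing outright;
# on a 16-core host that is only 0.25 load per CPU
print(scrub_load_ok(4.0, 16))  # True: scrubbing allowed
print(scrub_load_ok(4.0, 1))   # False: single-core host really is loaded
```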
- 03:52 PM Bug #36406 (Fix Under Review): Cache-tier forward mode hang in luminous (again)
- 09:32 AM Bug #36406: Cache-tier forward mode hang in luminous (again)
- Patch https://github.com/ceph/ceph/pull/24548
- 11:16 AM Bug #36345: librados C API aio read empty buffer
- i tested both v13.2.2 and v12.2.8, with the provided source files. and still no luck: i am not able to reproduce this...
- 02:59 AM Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects
- @David Zafman Do you have time to take a look?
- 02:57 AM Bug #36412 (Closed): ceph-objectstore-tool import after pg splits which will lost objects
- Hi, I have a test cluster; doing the following steps, the pool is erasure k:m=3:1
step 1: export pg 2.f from osd.2, ori...
- 02:15 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- @Nathan, Understood, will open a new issue.
- 02:12 AM Bug #36250 (Need More Info): ceph-osd process crashing
- 02:11 AM Bug #24835: osd daemon spontaneous segfault
- Thanks Soenke,
These should help to isolate the problem.
10/11/2018
- 10:06 PM Bug #36411 (Closed): OSD crash starting recovery/backfill with EC pool
- We have one pg on a 4+2 EC pool in which the OSDs will crash with the following error, on reaching an active set of m...
- 07:25 PM Bug #36177 (Pending Backport): rados rm --force-full is blocked when cluster is in full status
- 07:19 PM Bug #23879: test_mon_osdmap_prune.sh fails
- /a/sage-2018-10-10_15:50:53-rados-wip-sage-testing-2018-10-10-0850-distro-basic-smithi/3125020
- 06:53 PM Bug #36306 (Pending Backport): monstore tool rebuild does not generate creating_pgs
- https://github.com/ceph/ceph/pull/24506
- 06:36 PM Bug #36408 (Resolved): [cache tier] failed guarded write + promotion results in "success" op result
- Simple reproducer: ...
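The invariant this bug violates can be shown with a toy model (not the actual RADOS/cache-tier code; names and error choice are illustrative): a guarded write whose guard fails must surface the error as the op result, and must not be masked by a later stage, such as a promotion, reporting success.

```python
import errno

class ToyStore:
    """Toy model of guarded writes keyed by an object version."""

    def __init__(self):
        self.objects = {}

    def guarded_write(self, name, expect_version, data):
        """Write only if the stored version matches; 0 on success, -errno on guard failure."""
        obj = self.objects.get(name, {"version": 0, "data": b""})
        if obj["version"] != expect_version:
            # the guard failed: this error must propagate to the client,
            # never be overwritten by a "success" from a later stage
            return -errno.ECANCELED
        self.objects[name] = {"version": obj["version"] + 1, "data": data}
        return 0

store = ToyStore()
print(store.guarded_write("foo", 0, b"a"))  # 0: guard matched, write applied
print(store.guarded_write("foo", 0, b"b"))  # negative errno: guard mismatch, no write
```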
- 06:25 PM Bug #35845 (Pending Backport): osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- 06:09 PM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
- 05:08 PM Bug #36406: Cache-tier forward mode hang in luminous (again)
- Iain Bucław wrote:
> Similar to https://tracker.ceph.com/issues/23296
>
Looking at the fix for the other issue....
- 04:46 PM Bug #36406: Cache-tier forward mode hang in luminous (again)
- Iain Bucław wrote:
> In the logs, it looks like the client/server enters an infinite loop.
>
> [...]
These are...
- 04:28 PM Bug #36406: Cache-tier forward mode hang in luminous (again)
- In the logs, it looks like the client/server enters an infinite loop....
- 04:19 PM Bug #36406 (Resolved): Cache-tier forward mode hang in luminous (again)
- Similar to https://tracker.ceph.com/issues/23296
Commands ran to reproduce (in vstart.sh)...
- 03:14 PM Bug #36405 (Resolved): unittest_seastar_messenger failure on ARM
- We often ignore these failures, but when I looked at the log I realised it's actually a recently added test that's fa...
- 01:48 PM Bug #24835: osd daemon spontaneous segfault
- Coredump: 258b1ec0-ebc6-43df-b35e-f16a780148b5...
- 01:44 PM Bug #24835: osd daemon spontaneous segfault
- Coredump: bf9b2d5c-96f5-4d30-b852-3888dda66a6b...
- 01:33 PM Bug #24835: osd daemon spontaneous segfault
- We do have some more core dumps with different stack traces.
Coredump: ebb8eff9-b0d6-4321-b85b-d31be87ed7c2
<pr...
- 02:53 AM Bug #24835 (New): osd daemon spontaneous segfault
- Looking into this. Will update when I have analysed these coredumps.
In the meantime, if you get any that have a d...
- 11:10 AM Bug #24956 (Fix Under Review): osd: parent process need to restart log service after fork, or cep...
- 09:17 AM Bug #35969 (Pending Backport): "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on cent...
- 09:16 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- @Brad: The backporting process for the original fix is already well-along. If a follow-up fix is required, could you ...
- 03:00 AM Bug #35969: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- Not resolved as per https://github.com/ceph/ceph/pull/24260#issuecomment-427144712. Looking into this further.
- 07:17 AM Bug #36345: librados C API aio read empty buffer
- I am personally not running into the issue, but the reporter is. The reporter contacted me to forward the fix which s...
- 06:40 AM Bug #36345 (Fix Under Review): librados C API aio read empty buffer
- PR posted by Wido: https://github.com/ceph/ceph/pull/24534
- 06:19 AM Bug #36345: librados C API aio read empty buffer
- Wido, i am not able to reproduce this issue on master:...
10/10/2018
- 11:51 PM Backport #36393 (Resolved): luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- https://github.com/ceph/ceph/pull/24532
- 11:25 PM Bug #36300: Clients receive "wrong fsid" error when CephX is disabled
- https://github.com/ceph/ceph/pull/24535
- 09:53 PM Backport #36321: luminous: Add support for osd_delete_sleep configuration value
- Thank you, David.
I hope you will do new patches to mimic and master as this is very specific to luminous.
- 08:53 PM Backport #36321: luminous: Add support for osd_delete_sleep configuration value
- https://github.com/ceph/ceph/pull/24501
- 09:33 PM Support #36326: Huge traffic spike and assert(is_primary())
- Given what you've showed here it's unlikely that the network issue was caused by this — more likely the other way aro...
- 09:28 PM Bug #36345: librados C API aio read empty buffer
- Yeah can you make a PR, Wido? Somebody will need to know or run through how the IoCtx works with these data members a...
- 02:14 PM Bug #36345: librados C API aio read empty buffer
- I was notified about this issue and a simple fix would be:...
- 09:24 PM Support #36351: mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
- 12.2.2 is pretty out-of-date for Luminous and you appear to be running a custom build, so I'm not sure my line number...
- 01:07 AM Support #36351: mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
- Maybe the same as this issues: http://tracker.ceph.com/issues/12941
- 01:02 AM Support #36351: mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
- *By using the tool (ceph-monstore-tool) to start the abnormal mon directory, I can get the osdmap and monmap informat...
- 08:47 PM Bug #26890 (Resolved): scrub livelock
- 08:47 PM Backport #26932 (Resolved): luminous: scrub livelock
- 06:53 PM Backport #26932: luminous: scrub livelock
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24396
merged
- 08:22 PM Bug #35076 (Resolved): mon: mgr options not parse propertly
- 08:21 PM Backport #35836 (Resolved): mimic: mon: mgr options not parse propertly
- 02:06 AM Backport #35836: mimic: mon: mgr options not parse propertly
- merged
- 07:46 PM Bug #36388 (Resolved): osd: "out of order op"
- ...
- 03:20 PM Bug #36378 (New): upmap to same osd twice possible, crashes calc_pg_upmaps
- Somehow v12.2.8 let me define an upmap like this:
pg_upmap_items 75.1ef [643,1100,625,907,647,1100]
and this...
- 02:16 PM Bug #36358 (Fix Under Review): Interactive mode CLI prints no output since Mimic
- https://github.com/ceph/ceph/pull/24521
- 08:48 AM Support #36341 (Resolved): build: compile with dpdk failed in master branch
- 04:02 AM Bug #36250: ceph-osd process crashing
- Also...
In your original post you showed a message from the log showing an exception "buffer::malformed_input: ent...
- 01:55 AM Bug #36250: ceph-osd process crashing
- Hello Josh,
Sorry it took me a while to see this.
Could you attach the output of "ceph report" please?
10/09/2018
- 11:07 PM Bug #36306 (Fix Under Review): monstore tool rebuild does not generate creating_pgs
- https://github.com/ceph/ceph/pull/24506
- 09:01 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
- Saw this again in a luminous QA run:...
- 01:34 PM Bug #36358 (Resolved): Interactive mode CLI prints no output since Mimic
- The polling command stuff (for iostat) changed the path for printing output, and now you just don't get anything when...
- 09:02 AM Support #36341 (In Progress): build: compile with dpdk failed in master branch
- https://github.com/ceph/ceph/pull/24487 should resolve it.
- 02:48 AM Support #36351 (New): mon: OSDMonitor.cc: 380: FAILED assert(err == 0)12.2.2
- I have a CEPH cluster which contains 3 mons, due to abnormal power failure, one mon service starts abnormally. The ex...
- 01:03 AM Bug #36347 (Resolved): Upgrade test in jewel fails with "Unable to locate package python3-rados"
10/08/2018
- 11:54 PM Backport #36149 (In Progress): luminous: output format is invalid of the crush tree json dumper
- https://github.com/ceph/ceph/pull/24482
- 11:51 PM Backport #36150 (In Progress): mimic: output format is invalid of the crush tree json dumper
- https://github.com/ceph/ceph/pull/24481
- 10:56 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
- Another set:...
- 10:02 PM Bug #36347 (Fix Under Review): Upgrade test in jewel fails with "Unable to locate package python3...
- https://github.com/ceph/ceph/pull/24479
- 05:43 PM Bug #36347 (Resolved): Upgrade test in jewel fails with "Unable to locate package python3-rados"
- ...
- 04:11 PM Bug #36345: librados C API aio read empty buffer
- the 'same' in c++ seems to work, so i guess it's limited to the c api
- 02:56 PM Bug #36345 (Resolved): librados C API aio read empty buffer
- When using the AIO functions, the readbuffer remains empty. when using the normal rados_read, the buffer is filled wi...
- 02:27 PM Bug #24835: osd daemon spontaneous segfault
- Hi Brad,
thanks for investigating this issue and sorry for my late response, I was on holidays.
The file IDs as...
- 10:16 AM Bug #36239 (Resolved): osd/PrimaryLogPG: fix potential pg-log overtrimming
- 10:16 AM Backport #36275 (Resolved): mimic: osd/PrimaryLogPG: fix potential pg-log overtrimming
- 10:15 AM Bug #35924 (Resolved): choose_acting picked want > pool size
- 10:15 AM Backport #35963 (Resolved): mimic: choose_acting picked want > pool size
- 10:15 AM Bug #35546 (Resolved): RADOS: probably missing clone location for async_recovery_targets
- 10:07 AM Backport #35964 (Resolved): mimic: RADOS: probably missing clone location for async_recovery_targets
- 09:53 AM Backport #26840 (Resolved): luminous: librados application's symbol could conflict with the libce...
- 07:58 AM Bug #23387: Building Ceph on armhf fails due to out-of-memory
- I found a way around this problem (it is not directly a solution): using Clang/LLVM instead of the GCC toolchain, I m...
- 04:08 AM Support #36341 (Resolved): build: compile with dpdk failed in master branch
- sh do_cmake.sh -DWITH_DPDK=ON -DWITH_TESTS=OFF
make -j 12
[ 51%] Building CXX object src/os/CMakeFiles/os.dir/f...