Activity
From 10/19/2018 to 11/17/2018
11/17/2018
- 03:45 AM Bug #37299 (New): ceph-disk: ceph osd start failed: Command '['/usr/bin/systemctl', 'disable', 'c...
- Please see the details at:
https://bugzilla.redhat.com/show_bug.cgi?id=1649208#c0
11/16/2018
- 12:47 PM Bug #37289 (New): Issue with overfilled OSD for cache-tier pools
- We have a bad issue in our Ceph cluster.
Centos 7.5 (3.10.0-862.3.2.el7.x86_64)
Luminous 12.2.5, bluestore OSDs, us...
- 11:35 AM Backport #37288 (Resolved): mimic: "sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-0'" fails in u...
- https://github.com/ceph/ceph/pull/25227
- 10:34 AM Bug #16500 (Resolved): ceph_erasure_code_benchmark parameter checking error for LRC plugin
- 06:22 AM Bug #22597 (Pending Backport): "sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-0'" fails in upgra...
- 04:53 AM Bug #36767 (Fix Under Review): OSD: unrecoverable heartbeat connections
- 02:53 AM Feature #23493: config: strip/escape single-quotes in values when setting them via conf file/assi...
- Joao,
Could you take a look at https://github.com/ceph/ceph/pull/20610 and see whether you consider it something t...
- 01:59 AM Bug #37264: scrub warning check incorrectly uses mon scrub interval
The scrub warning also doesn't consider the pool specific scrub interval if specified. The scrub code gets the p...
11/15/2018
- 01:16 PM Bug #25146 (Resolved): "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:paralle...
- 11:36 AM Backport #37273 (In Progress): mimic: debian: packaging need to reflect move of /etc/bash_complet...
- 10:47 AM Backport #37273: mimic: debian: packaging need to reflect move of /etc/bash_completion.d/radosgw-...
- PR with this backport is https://github.com/ceph/ceph/pull/25115
- 09:44 AM Backport #37273 (Resolved): mimic: debian: packaging need to reflect move of /etc/bash_completion...
- https://github.com/ceph/ceph/pull/25115
- 10:36 AM Backport #37274 (In Progress): luminous: debian: packaging need to reflect move of /etc/bash_comp...
- 09:45 AM Backport #37274 (Resolved): luminous: debian: packaging need to reflect move of /etc/bash_complet...
- https://github.com/ceph/ceph/pull/24997
- 09:38 AM Bug #36725: luminous: Apparent Memory Leak in OSD
- raising priority since this might be a regression in 12.2.9
- 06:31 AM Bug #36741 (Pending Backport): debian: packaging need to reflect move of /etc/bash_completion.d/r...
- https://github.com/ceph/ceph/pull/24996
- 06:20 AM Bug #37269 (Resolved): Prioritize user specified scrubs
When scrubs start backing up, when a user asks for a scrub it doesn't get priority compared to overdue scrubs. The...
- 06:14 AM Bug #37264 (Resolved): scrub warning check incorrectly uses mon scrub interval
When checking the mon_warn_not_scrubbed the mon_scrub_interval is used instead of osd_scrub_max_interval.
11/14/2018
- 08:01 PM Bug #36725: luminous: Apparent Memory Leak in OSD
- Note: Downgrading both OSD servers to v12.2.8 returned memory usage to normal.
- 11:43 AM Backport #36636: luminous: osd: race condition opening heartbeat connection
- std::lock_guard is a C++11 feature: https://en.cppreference.com/w/cpp/header/mutex
11/13/2018
- 02:23 PM Backport #36658 (In Progress): mimic: Cache-tier forward mode hang in luminous (again)
- 02:15 PM Backport #36657 (In Progress): luminous: Cache-tier forward mode hang in luminous (again)
- 11:57 AM Bug #36388: osd: "out of order op"
- This looks like the dup op entries were exceeded so the op was not detected as a dup. Perhaps we should increase the ...
- 04:55 AM Bug #25146: "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:parallel-master-di...
- https://github.com/ceph/ceph/pull/25070
11/12/2018
- 03:41 PM Bug #36767: OSD: unrecoverable heartbeat connections
- Pull request:
https://github.com/ceph/ceph/pull/25061
- 03:09 PM Bug #36767 (Fix Under Review): OSD: unrecoverable heartbeat connections
- There are several unrecoverable heartbeat connections according to logs.
They usually appear after problems/reprodu...
- 07:05 AM Bug #36758 (Duplicate): aborts in rocksdb::TableFileName() in mimic-x upgrade test suite
- 05:26 AM Bug #36758: aborts in rocksdb::TableFileName() in mimic-x upgrade test suite
- i think it's a dup of #25146
- 02:57 AM Bug #16500 (Fix Under Review): ceph_erasure_code_benchmark parameter checking error for LRC plugin
- https://github.com/ceph/ceph/pull/25046
11/10/2018
- 10:01 PM Bug #36758: aborts in rocksdb::TableFileName() in mimic-x upgrade test suite
- marking it "urgent", as it is consistently reproducible, and it renders the cluster unusable after upgrading from...
- 06:11 PM Bug #36758 (Duplicate): aborts in rocksdb::TableFileName() in mimic-x upgrade test suite
- ...
- 02:33 PM Backport #36636 (In Progress): luminous: osd: race condition opening heartbeat connection
- 11:46 AM Backport #36636 (Need More Info): luminous: osd: race condition opening heartbeat connection
- The master commit uses std::lock_guard, which is a C++17-ism, and this makes the backport non-trivial (?)
- 12:42 PM Subtask #36091 (Resolved): [rbd top] collect client perf stats when query is enabled
- *PR*: https://github.com/ceph/ceph/pull/24265
- 11:56 AM Backport #36646 (In Progress): luminous: librados api aio tests race condition
- 11:52 AM Backport #36647 (In Progress): mimic: librados api aio tests race condition
- 11:40 AM Backport #36637 (In Progress): mimic: osd: race condition opening heartbeat connection
- 11:38 AM Backport #36556 (In Progress): luminous: RBD client IOPS pool stats are incorrect (2x higher; inc...
- 11:37 AM Backport #36557 (In Progress): mimic: RBD client IOPS pool stats are incorrect (2x higher; includ...
- 10:19 AM Backport #36506 (In Progress): luminous: mon osdmap cash too small during upgrade to mimic
- 10:05 AM Backport #36505 (In Progress): mimic: mon osdmap cash too small during upgrade to mimic
- 09:59 AM Backport #36436 (In Progress): luminous: rados rm --force-full is blocked when cluster is in full...
- 09:54 AM Backport #36435 (In Progress): mimic: rados rm --force-full is blocked when cluster is in full st...
- 09:02 AM Backport #36433 (In Progress): mimic: monstore tool rebuild does not generate creating_pgs
11/09/2018
- 10:08 PM Bug #36667: OSD object_map sync returned error
- Check dmesg for hardware errors, this is leveldb/rocksdb returning an error writing to disk. You may want to ask the ...
- 10:05 PM Bug #36677 (Resolved): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- 10:05 PM Bug #36732 (Fix Under Review): tools/rados: fix segmentation fault
- https://github.com/ceph/ceph/pull/24990
- 08:55 PM Bug #36610 (Resolved): filestore merge collection replay problem
- 08:54 PM Bug #36748 (New): ms_deliver_verify_authorizer no AuthAuthorizeHandler found for protocol 0
- ...
- 05:18 PM Bug #36746 (New): Ignore osd_find_best_info_ignore_history_les for erasure-coded PGs
The only case that osd_find_best_info_ignore_history_les would work for erasure coded pools is if an interval didn'...
- 09:29 AM Bug #36741 (Resolved): debian: packaging need to reflect move of /etc/bash_completion.d/radosgw-a...
- Hi,
Between version 12.0.2 and 12.0.3, the file /etc/bash_completion.d/radosgw-admin moved from the radosgw packag...
11/08/2018
- 11:34 PM Bug #36739: ENOENT in collection_move_rename on EC backfill target
- we create a gen object normally, on a backfill target,...
- 10:25 PM Bug #36739: ENOENT in collection_move_rename on EC backfill target
- 10:24 PM Bug #36739 (Resolved): ENOENT in collection_move_rename on EC backfill target
- ...
- 09:13 PM Feature #36737: Allow multi instances of "make tests" on the same machine
- @Kefu pls take a look, IIRC you mentioned that this may not be a big effort.
- 09:12 PM Feature #36737 (Resolved): Allow multi instances of "make tests" on the same machine
- Currently it's only possible to run `...make; make tests -j8; ctest ...` on the same machine.
Please consider chan...
- 10:02 AM Bug #36732 (Resolved): tools/rados: fix segmentation fault
- When connected to a Ceph cluster, calling exit(1) directly will cause a segmentation fault in the finisher thread, as follo...
11/07/2018
- 11:37 PM Feature #24917: Gracefully deal with upgrades when bluestore skipping of data_digest becomes active
Josh, this code needs to be written. It needs a feature bit AND a mon flag that can only be set when all OSDs are ...
- 10:07 PM Backport #36729 (Resolved): mimic: Add support for osd_delete_sleep configuration value
- https://github.com/ceph/ceph/pull/25507
- 10:06 PM Feature #36474 (Pending Backport): Add support for osd_delete_sleep configuration value
- 04:40 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- Tests added:
https://github.com/ceph/ceph/pull/24954
https://github.com/ceph/ceph/pull/24938
- 04:27 PM Bug #36725 (Closed): luminous: Apparent Memory Leak in OSD
- Since last update (late October), been experiencing apparent memory leak in OSD process on two ceph servers in small ...
- 11:44 AM Backport #36432 (In Progress): mimic: Interactive mode CLI prints no output since Mimic
- 11:42 AM Backport #35843 (In Progress): mimic: objecter cannot resend split-dropped op when racing with co...
11/06/2018
- 01:22 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- /a/sage-2018-11-05_22:04:25-rados-wip-sage3-testing-2018-11-05-1406-distro-basic-smithi/3227352
- 11:54 AM Support #36326: Huge traffic spike and assert(is_primary())
- Thanks for the answer! It looks like traffic spike was caused by another issue: ceph-mon's db grows up to 15GB and it...
- 10:07 AM Bug #36709 (Closed): OSD stuck while flushing rocksdb WAL
- Hi all,
We use:
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
Clients work on:
...
- 01:30 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- Quoting my reply to ceph-devel for reference:
"Nathan, I don't think we want to revert it for 13.2.2.
This is b...
11/05/2018
- 10:42 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
- 10:32 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- So, the luminous revert was merged. Neha, will there be a mimic revert as well? Since the pg hard limit patches are p...
- 10:13 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- https://github.com/ceph/ceph/pull/24903 merged
- 10:28 PM Bug #36508 (Resolved): gperftools-libs-2.6.1-1 or newer required for binaries linked against corr...
- 10:28 PM Backport #36552 (Resolved): luminous: gperftools-libs-2.6.1-1 or newer required for binaries link...
- 10:10 PM Backport #36552: luminous: gperftools-libs-2.6.1-1 or newer required for binaries linked against ...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24706
merged
- 10:25 PM Bug #34541 (Resolved): deep scrub cannot find the bitrot if the object is cached
- 10:25 PM Backport #35067 (Resolved): luminous: deep scrub cannot find the bitrot if the object is cached
- 10:08 PM Backport #35067: luminous: deep scrub cannot find the bitrot if the object is cached
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24802
merged
- 10:18 PM Backport #36678 (Resolved): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state...
- 05:20 PM Feature #24917: Gracefully deal with upgrades when bluestore skipping of data_digest becomes active
- Let's include this with any other feature bit addition.
- 01:30 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- > I suspect it shouldn't.
But it does exactly that.
> That's will only re-copy the data to the HEAD revision.
...
11/04/2018
- 06:55 PM Bug #36677 (Fix Under Review): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')'...
- A fix is already available. See Sage's PR: https://github.com/ceph/ceph/pull/24835.
11/03/2018
- 11:27 PM Bug #24923 (Resolved): doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
- 11:27 PM Backport #25055 (Resolved): mimic: doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
- 11:26 PM Backport #35071 (In Progress): mimic: FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor...
- 04:42 AM Backport #23670 (In Progress): luminous: auth: ceph auth add does not sanity-check caps
- 04:24 AM Backport #23670 (New): luminous: auth: ceph auth add does not sanity-check caps
- Kefu did the jewel backport, so assigning this to him in hopes he'll pick it up.
- 04:00 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- -Also, is this bug reproducible in master and mimic as well? If not, the Backport field should probably be modified.....
- 03:58 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- Neha, 12.2.9 has already been cut, so we'll need to expedite 12.2.10 to push the revert out to users.
- 03:52 AM Backport #36678 (In Progress): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad st...
11/02/2018
- 11:57 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
- The immediate fix is to revert this for luminous before 12.2.9: https://github.com/ceph/ceph/pull/24903
- 11:51 PM Bug #36686 (Resolved): osd: pg log hard limit can cause crash during upgrade
- During an upgrade from an earlier version, a primary running the new code will send a trim_to value to a replica that...
- 05:14 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- Ceph has already moved to C++17. The main question is: have we transitioned to C++17 also our public headers xor put ...
- 04:58 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- The no-message-taking-variant of *static_assert* has been introduced in C++17. The code is being compiled with *-std=...
- 04:55 PM Bug #36677 (In Progress): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- 05:14 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- Back-and-forth question answering like this is probably better for the mailing list (the ticket is currently closed F...
- 04:57 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- since you've identified that this is an RBD workload, assigning it to that project so that RBD team notices it. HTH.
- 02:37 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- Oops. That's more than 2 questions. But anyway :)
- 02:36 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- OK, I looked into OSD datastore using ceph-objectstore-tool and I see that for almost every object there are two copi...
- 01:39 PM Bug #24835: osd daemon spontaneous segfault
- We do use some configuration set by "ceph config set" or "ceph config-key set":...
11/01/2018
- 11:46 PM Backport #36678 (Resolved): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state...
- https://github.com/ceph/ceph/pull/24902
- 11:19 PM Bug #22902 (Pending Backport): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machin...
- Based on similar failures seen in luminous: http://pulpito.ceph.com/yuriw-2018-10-31_22:45:22-rados-wip-yuri4-testing...
- 09:10 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- ...
- 09:06 PM Bug #36677 (Resolved): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
- ...
- 04:44 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
- Looking through the ceph/rocksdb repo I don't see how it's possible for rocksdb to be compiled without snappy support...
- 03:35 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
- This seems to be a problem where rocksdb on CentOS doesn't support snappy compression but the ceph-kvstore-tool is co...
- 06:14 AM Bug #36667 (New): OSD object_map sync returned error
- I deployed a CephFS and then used the vdbench tool to write data to the CephFS mount point; after a while, an OSD appeared down.
...
10/31/2018
- 09:21 PM Bug #36411 (Closed): OSD crash starting recovery/backfill with EC pool
- It's my current belief that these objects were broken as a result of intentional metadata manipulation when some of t...
- 09:18 PM Bug #36572: ceph-in: --connect-timeout doesn't work while pinging mon
- New PR: https://github.com/ceph/ceph/pull/24733
- 09:17 PM Support #36584 (Closed): OSD Anomaly behaviour in ceph-reweight
- Are you running the command repeatedly? reweight-by-utilization does not provide a stable balance; it's really just a...
- 08:43 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
- https://github.com/ceph/ceph/pull/24868
- 05:35 PM Feature #36661: osd: add sanity check on startup to compare osd memory target to available memory...
- - in OSD::handle_conf_change, we should sanity check this against current memory available on the system and refuse t...
- 04:59 PM Feature #36661 (New): osd: add sanity check on startup to compare osd memory target to available ...
- This is needed so that we do not fail due to osd_memory_target being set too high compared to the amount of memory ...
- 11:42 AM Backport #36658 (Resolved): mimic: Cache-tier forward mode hang in luminous (again)
- https://github.com/ceph/ceph/pull/25075
- 11:42 AM Backport #36657 (Resolved): luminous: Cache-tier forward mode hang in luminous (again)
- https://github.com/ceph/ceph/pull/25074
10/30/2018
- 08:08 PM Bug #36345 (Resolved): librados C API aio read empty buffer
- 08:07 PM Bug #36406 (Pending Backport): Cache-tier forward mode hang in luminous (again)
- 05:16 PM Backport #36647 (Resolved): mimic: librados api aio tests race condition
- https://github.com/ceph/ceph/pull/25027
- 05:16 PM Backport #36646 (Resolved): luminous: librados api aio tests race condition
- https://github.com/ceph/ceph/pull/25028
- 05:14 PM Backport #36637 (Resolved): mimic: osd: race condition opening heartbeat connection
- https://github.com/ceph/ceph/pull/25026
- 05:14 PM Backport #36636 (Resolved): luminous: osd: race condition opening heartbeat connection
- https://github.com/ceph/ceph/pull/25035
- 04:06 PM Bug #36634 (New): LibRadosWatchNotify.WatchNotify2Timeout failure
- ...
- 03:33 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- Yes, I'm using EC with RBD and partial overwrites enabled. CephFS pools are only created recently for tests and do no...
- 01:05 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- How are you writing these objects? Most sites that used EC were using RGW, but I don't see all the pools that go wit...
- 10:31 AM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- In fact it doesn't seem that it will self-heal, and nobody seems to care about it in the mailing list by now...)
C...
- 02:33 PM Bug #36631 (In Progress): potential deadlock in PG::_scan_snaps when repairing snap mapper
- If during a pg scrub a snap mapper error is detected in PG::_scan_snaps, on repair `ObjectStore::apply_transactions` ...
- 02:28 PM Backport #36630 (Resolved): luminous: potential deadlock in PG::_scan_snaps when repairing snap m...
- If during a pg scrub a snap mapper error is detected in PG::_scan_snaps, on repair `ObjectStore::apply_transactions` ...
- 02:00 PM Bug #36629 (New): osd:the new file was stored in cache pool which mode was none
- ceph version:13.2.1
kernel client 4.17
I created the cache data pool as ceph's instructions:
(1) ceph osd tier add...
- 01:41 AM Bug #36620: osd:the vim will be hanged when I saved the file
- the client: 4.17 kernel client
- 01:36 AM Bug #36620 (New): osd:the vim will be hanged when I saved the file
- ceph version: 13.2.1
situation: the data pool is tiered by a cache data pool and the cache tier pool's mode was read...
10/29/2018
- 10:33 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- Thanks for the response, I wrote to the mailing list ceph-users (is it the correct place?) :)
- 08:37 PM Support #36614 (Closed): Cluster uses substantially more space after rebalance (erasure codes)
- The mailing list is a better place to resolve this. My guess is data hasn't been cleaned up from its old locations ye...
- 12:13 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- How to heal it? If I don't heal it I'll need to purge the whole cluster? O_o...
- 12:12 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- ceph df output:...
- 11:11 AM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
- Proofs from our prometheus monitoring. Two graphs from yesterday: one with number of objects in cluster and other wit...
- 10:17 AM Support #36614 (Closed): Cluster uses substantially more space after rebalance (erasure codes)
- Hi
After I recreated one OSD + increased pg count of my erasure-coded (2+1) pool (which was way too low, only 100 ...
- 10:21 PM Bug #36525: osd-scrub-snaps.sh failure
Looking at the log another scrub has made the number of "_scan_snaps start" in the log from 2 to 4. It results in ...
- 01:06 AM Bug #36525: osd-scrub-snaps.sh failure
- /a/sage-2018-10-28_14:12:19-rados-master-distro-basic-smithi/3196520
another instance on current master
- 09:48 PM Bug #23827 (Resolved): osd sends op_reply out of order
- 09:47 PM Backport #25010 (Resolved): mimic: osd sends op_reply out of order
- 08:47 PM Backport #25010: mimic: osd sends op_reply out of order
- https://github.com/ceph/ceph/pull/23136 has merged, can we resolve this issue?
- 09:43 PM Bug #25154 (Resolved): librados application's symbol could conflict with the libceph-common
- 09:42 PM Backport #26839 (Resolved): mimic: librados application's symbol could conflict with the libceph-...
- 08:21 PM Backport #26839: mimic: librados application's symbol could conflict with the libceph-common
- Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/24708
merged
- 09:40 PM Bug #35969 (Resolved): "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- 09:39 PM Backport #36553 (Resolved): mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked ...
- 08:16 PM Backport #36553: mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked against cor...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24260
merged
- 09:39 PM Backport #36132 (Resolved): mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on ...
- 08:16 PM Backport #36132: mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24260
merged
- 08:47 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
- The above change is not entirely correct. This section needs to be omitted:...
- 08:13 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
- Hello!
I've used the instruction created by Daniel Glasser and with some small code adjustments in a few files I w...
- 04:17 PM Bug #36610 (Fix Under Review): filestore merge collection replay problem
- https://github.com/ceph/ceph/pull/24806
- 03:51 PM Bug #36610: filestore merge collection replay problem
- the osd is stopped during the merge operation:...
- 03:46 PM Bug #36182 (Resolved): osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is ...
- 02:59 PM Bug #36473 (Resolved): hung osd_repop, bluestore committed but failed to trigger repop_commit
- this is presumably https://github.com/ceph/ceph/pull/24761
- 02:58 PM Bug #36548 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh
- 01:34 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
- /a/sage-2018-10-29_01:11:58-rados-wip-sage-testing-2018-10-28-0943-distro-basic-smithi/3197984
- 01:10 AM Bug #36408 (Resolved): [cache tier] failed guarded write + promotion results in "success" op result
10/28/2018
- 02:40 PM Bug #36602 (Pending Backport): osd: race condition opening heartbeat connection
- 02:37 PM Bug #36610 (Resolved): filestore merge collection replay problem
- /a/sage-2018-10-27_02:10:33-rados-wip-sage-testing-2018-10-26-1411-distro-basic-smithi/3188976
osd.3 was partway t...
10/26/2018
- 07:24 PM Feature #24591: FileStore hasn't impl to get kv-db's statistics
- Jack Lv wrote:
> https://github.com/ceph/ceph/pull/22633
merged
- 06:30 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
- https://github.com/ceph/ceph/pull/24761
- 05:41 PM Bug #36602: osd: race condition opening heartbeat connection
- 03:39 PM Bug #36602 (Fix Under Review): osd: race condition opening heartbeat connection
- https://github.com/ceph/ceph/pull/24780
- 03:37 PM Bug #36602 (Resolved): osd: race condition opening heartbeat connection
- ...
- 05:10 PM Bug #20694: osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_log().get_log().obje...
- /a/yuriw-2018-10-25_15:31:28-rados-wip-yuri4-testing-2018-10-24-2310-mimic-distro-basic-smithi/3183476/
- 04:22 PM Bug #36345 (Fix Under Review): librados C API aio read empty buffer
- imirc tw, thank you for your analysis. i am approving https://github.com/ceph/ceph/pull/24534. so "unshared buffer" o...
- 09:52 AM Bug #36345: librados C API aio read empty buffer
- I figured it out. In Objecter.cc:3279...
- 09:02 AM Bug #36345: librados C API aio read empty buffer
- without osd_op_timeout, in Objecter::handle_osd_op_reply, Objecter.cc:3473
op->con px is an AsyncConnection on whic...
- 07:54 AM Bug #36345: librados C API aio read empty buffer
- Some more info from what I can see while debugging.
Without 'rados osd op timeout', the buffer in librados::IoCtx...
- 02:18 PM Bug #24587 (Pending Backport): librados api aio tests race condition
- 11:01 AM Bug #24180 (Resolved): mon: slow op on log message
- 11:01 AM Backport #24293 (Resolved): jewel: mon: slow op on log message
- 06:42 AM Bug #24835: osd daemon spontaneous segfault
- Our ceph.conf:...
- 03:46 AM Bug #24615 (Need More Info): error message for 'unable to find any IP address' not shown
- Francois,
Can you try reproducing your issue on the latest master?
I fixed a similar issue in master and also fro...
- 03:28 AM Bug #24615 (In Progress): error message for 'unable to find any IP address' not shown
- 02:34 AM Bug #25153 (Resolved): output format is invalid of the crush tree json dumper
- 02:33 AM Backport #36149 (Resolved): luminous: output format is invalid of the crush tree json dumper
- 02:33 AM Bug #35845 (Resolved): osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- 02:32 AM Backport #36393 (Resolved): luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- 02:30 AM Bug #36183 (Resolved): [objecter] client socket failure leads to hung connection
- 02:30 AM Backport #36295 (Resolved): luminous: [objecter] client socket failure leads to hung connection
- 02:29 AM Bug #21931 (Resolved): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range...
- 02:29 AM Backport #36440 (Resolved): luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + le...
- 02:28 AM Bug #22330 (Resolved): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 02:28 AM Backport #36438 (Resolved): luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 02:27 AM Bug #36417 (Resolved): osd: get loadavg per cpu for scrub load threshold check
- 02:27 AM Backport #36419 (Resolved): luminous: osd: get loadavg per cpu for scrub load threshold check
- 02:26 AM Bug #36174 (Resolved): ceph pg ls creating: EINVAL
- 02:26 AM Backport #36297 (Resolved): luminous: ceph pg ls creating: EINVAL
- 02:25 AM Bug #23614 (Resolved): local_reserver double-reservation of backfilled pg
- 02:25 AM Backport #24333 (Resolved): luminous: local_reserver double-reservation of backfilled pg
- 02:24 AM Backport #26932 (Resolved): luminous: scrub livelock
10/25/2018
- 10:22 PM Backport #36149: luminous: output format is invalid of the crush tree json dumper
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24482
merged
- 10:21 PM Backport #36393: luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/24532
merged
- 10:20 PM Backport #36295: luminous: [objecter] client socket failure leads to hung connection
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24574
merged
- 10:20 PM Backport #36440: luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (r...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24582
merged
- 10:20 PM Backport #36438: luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24582
merged
- 10:19 PM Backport #36419: luminous: osd: get loadavg per cpu for scrub load threshold check
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/24593
merged
- 10:19 PM Backport #36297: luminous: ceph pg ls creating: EINVAL
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24602
merged
Reviewed-by: Neha Ojha <nojha@redhat.com>
- 10:18 PM Bug #26890: scrub livelock
- merged https://github.com/ceph/ceph/pull/24659
- 08:01 PM Bug #36345: librados C API aio read empty buffer
- Kefu Chai, same happens on master:...
- 04:36 PM Bug #36345: librados C API aio read empty buffer
- 13.2.2 , i will give it a go on master asap.
- 03:57 PM Bug #36345: librados C API aio read empty buffer
- imirc tw, on which release did you reproduce this issue? is master affected?
- 01:18 PM Bug #36345: librados C API aio read empty buffer
- Hi Kefu,
I'm not that deep into the Ceph code, I was making an assumption based on my observations and past ticket...
- 08:52 AM Bug #36345: librados C API aio read empty buffer
- imirc tw, i don't understand how "rados_osd_op_timeout" is related to this issue. i agree that current @librados::IoC...
- 07:02 AM Bug #36345: librados C API aio read empty buffer
- Hi Wido,
The 2nd assumption isn't true; that was because the client.admin ceph.conf file used didn't have the osd_o...
- 06:51 AM Bug #36345: librados C API aio read empty buffer
- Updating this ticket as the issue seems to be related to two things:
- When using osd_op_timeout
- When using a u...
- 06:21 PM Bug #36598 (Can't reproduce): osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests ...
- ...
- 04:20 PM Backport #24333: luminous: local_reserver double-reservation of backfilled pg
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23493
merged
- 05:18 AM Feature #36474: Add support for osd_delete_sleep configuration value
- https://github.com/ceph/ceph/pull/24749
10/24/2018
- 09:43 PM Bug #25182: Upmaps forgotten after restarting OSDs
- One thing I've noticed after living with this for a while is that the upmap entries that are forgotten are always for...
- 09:31 PM Bug #36517: client crashes osd with empty object name
- Attached
- 09:17 PM Bug #36517: client crashes osd with empty object name
- Noah, the paste doesn't show now, could you paste the trace in the tracker.
- 09:21 PM Bug #24485 (Resolved): LibRadosTwoPoolsPP.ManifestUnset failure
- 09:11 PM Bug #36166 (Resolved): pg merge can collide with remapped, upmap pgs
- 02:29 PM Bug #36345: librados C API aio read empty buffer
- Kefu, I'm also experiencing this issue. It seems to be related to `rados osd op timeout`. Once this value is set in t...
- 11:01 AM Support #36584 (Closed): OSD Anomaly behaviour in ceph-reweight
- ceph version 10.2.5
We have this behaviour with 2 OSDs in the cluster getting into a backfilling loop.
I'm executing thi...
- 10:35 AM Bug #19348 (Can't reproduce): "ceph ping mon.c" cli prints assertion failure on timeout
- not able to reproduce with master HEAD anymore.
- 10:34 AM Bug #19348: "ceph ping mon.c" cli prints assertion failure on timeout
- https://github.com/ceph/ceph/pull/24733
10/23/2018
- 09:36 PM Bug #36040: mon: Valgrind: mon (InvalidFree, InvalidWrite, InvalidRead)
- /ceph/teuthology-archive/pdonnell-2018-10-17_19:54:38-multimds-wip-pdonnell-testing-20181017.175152-distro-basic-smit...
- 09:30 PM Bug #36497: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::replace()
- /ceph/teuthology-archive/pdonnell-2018-10-17_19:54:38-multimds-wip-pdonnell-testing-20181017.175152-distro-basic-smit...
- 09:28 PM Bug #36411: OSD crash starting recovery/backfill with EC pool
- I have to add to the previous update, which did not explain the resolution of the problem.
The true solution was w...
- 08:27 PM Bug #24587 (Fix Under Review): librados api aio tests race condition
- https://github.com/ceph/ceph/pull/24724
- 07:53 PM Bug #36572: ceph-in: --connect-timeout doesn't work while pinging mon
- Submitted a "PR":https://github.com/ceph/ceph/pull/24723 for this.
- 07:44 PM Bug #36572 (Closed): ceph-in: --connect-timeout doesn't work while pinging mon
- Saw the following output while working on "PR 21432":https://github.com/ceph/ceph/pull/21432 -...
- 03:53 PM Bug #36548: qa/standalone/osd/osd-rep-recov-eio.sh
- The failed run did not include the changes in https://github.com/ceph/ceph/pull/24651 (master). This pull request mi...
- 01:43 AM Bug #36548 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh
- pg ended up in active+clean, not recovery_unfound
/a/sage-2018-10-22_21:29:13-rados-wip-sage-testing-2018-10-22-11...
- 06:04 AM Backport #36553 (In Progress): mimic: gperftools-libs-2.6.1-1 or newer required for binaries link...
- 05:44 AM Backport #36553 (Resolved): mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked ...
- https://github.com/ceph/ceph/pull/24260
- 05:52 AM Backport #36552 (In Progress): luminous: gperftools-libs-2.6.1-1 or newer required for binaries l...
- 05:43 AM Backport #36552 (Resolved): luminous: gperftools-libs-2.6.1-1 or newer required for binaries link...
- https://github.com/ceph/ceph/pull/24706
- 05:45 AM Backport #36557 (Resolved): mimic: RBD client IOPS pool stats are incorrect (2x higher; includes ...
- https://github.com/ceph/ceph/pull/25024
- 05:45 AM Backport #36556 (Resolved): luminous: RBD client IOPS pool stats are incorrect (2x higher; includ...
- https://github.com/ceph/ceph/pull/25025
- 05:43 AM Backport #35909 (Resolved): mimic: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- 05:31 AM Backport #36439 (Resolved): mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + lengt...
- 05:31 AM Backport #36437 (Resolved): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- 05:30 AM Backport #36296 (Resolved): mimic: [objecter] client socket failure leads to hung connection
- 05:30 AM Backport #36298 (Resolved): mimic: ceph pg ls creating: EINVAL
- 04:41 AM Bug #24835: osd daemon spontaneous segfault
- I'd say the cause of most, if not all, of these crashes is memory corruption caused by code responsible for manipulat...
- 04:31 AM Bug #24835: osd daemon spontaneous segfault
- The 'safe_timer.5246' is again similar but this time tcmalloc is 'popping' a
single value rather than a range.
...
- 03:54 AM Bug #24835: osd daemon spontaneous segfault
- The 'msgr-worker-1.5278' is almost identical to 'tp_osd_tp' except this time 'i'
= 499 so doing that manually is bey...
- 01:58 AM Bug #24835: osd daemon spontaneous segfault
- For the rest of the coredumps adding the debuginfo for libtcmalloc really helps
to understand the problem as we end ...
- 12:53 AM Bug #24835: osd daemon spontaneous segfault
- Starting with the bluestore bufferlist destructor crash....
10/22/2018
- 11:39 PM Bug #36508 (Pending Backport): gperftools-libs-2.6.1-1 or newer required for binaries linked agai...
- 11:38 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
- Haven't been able to reproduce this on luminous and mimic, so clearing the Backport fields for now.
- 07:05 PM Bug #24909 (Pending Backport): RBD client IOPS pool stats are incorrect (2x higher; includes IO h...
- 03:40 PM Backport #35909: mimic: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/24017
merged
- 03:35 PM Backport #36439: mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (rang...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24581
merged
- 03:35 PM Backport #36437: mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24581
merged
- 03:34 PM Backport #36296: mimic: [objecter] client socket failure leads to hung connection
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24600
merged
- 03:32 PM Backport #36298: mimic: ceph pg ls creating: EINVAL
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24601
merged
- 02:31 PM Bug #24956 (Resolved): osd: parent process need to restart log service after fork, or ceph-osd wi...
- 02:25 PM Bug #36546 (Duplicate): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back...
- ...
- 02:21 PM Bug #36485 (Resolved): dump-stuck.yaml fails assert len(inactive) == num_inactive
10/21/2018
- 03:53 PM Bug #36485 (Fix Under Review): dump-stuck.yaml fails assert len(inactive) == num_inactive
- https://github.com/ceph/ceph/pull/24689
- 09:25 AM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
- https://github.com/ceph/ceph/pull/24687
10/20/2018
- 09:43 PM Bug #22144: *** Caught signal (Aborted) ** in thread thread_name:tp_peering
- We can confirm we are experiencing the same issue on version 12.2.7 and currently have some random OSDs that went off...
10/19/2018
- 09:12 PM Bug #16279 (Closed): assert(objiter->second->version > last_divergent_update) failed
- Closing this ticket since the linked PR was also closed.
- 09:09 PM Bug #17252: [Librados] Deadlock on RadosClient::watch_flush
- 08:18 PM Bug #24368: osd: should not restart on permanent failures
- Clearing backport field on the assumption that's what was intended by the previous edit.
- 08:01 PM Bug #24368 (Resolved): osd: should not restart on permanent failures
- Okay, after discussing with CERN I've merged the PR to master so this isn't an issue going forward.
But unfortunat...