Activity

From 10/21/2018 to 11/19/2018

11/19/2018

10:57 PM Bug #36667: OSD object_map sync returned error
This might also indicate something screwed up the file permissions or ownership in /var/lib/ceph/osd/ceph-10. maybe ... Sage Weil
10:56 PM Bug #36709 (Need More Info): OSD stuck while flushing rocksdb WAL
I'm not sure rocksdb is what's stuck. Can you dump 'ceph daemon osd.NNN ops' to see what state the operations a... Sage Weil
10:54 PM Bug #37264: scrub warning check incorrectly uses mon scrub interval
You should be able to get the pool info out of the monitor's OSDMap, if that was a question... :) Greg Farnum
10:51 PM Bug #37289: Issue with overfilled OSD for cache-tier pools
I think the first question to answer is if this can be reproduced without cache tiering. It's not immediately clear ... Sage Weil
10:48 PM Bug #37326 (Need More Info): Daily inconsistent objects
Is this happening on the same disk all the time, or the same node? If so, that suggests a piece of hardware (e.g. con... Josh Durgin
10:31 AM Bug #37326 (Need More Info): Daily inconsistent objects
We have many Ceph mimic 13.2.1 installations with a similar configuration on Ubuntu, but on one of them we get inconsiste... Greg Smith
10:48 PM Bug #36304 (Can't reproduce): FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_w...
I'm guessing this was fixed by 450f337d6fd048c8c95a0ec0dec0d97f5474922e Sage Weil
10:43 PM Bug #36598: osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests osd bug"
Sage thinks this might also be #36739. Greg Farnum
10:40 PM Bug #36686 (In Progress): osd: pg log hard limit can cause crash during upgrade
Sage Weil
10:40 PM Bug #36725 (Need More Info): luminous: Apparent Memory Leak in OSD
Can you dump the mempools (ceph daemon osd.NNN dump_mempools) several times over the growth of the process so we can ... Sage Weil
07:15 PM Bug #37269 (Pending Backport): Prioritize user specified scrubs
Sage Weil
04:47 PM Bug #37329 (Pending Backport): doc: Add bluestore memory autotuning docs
Neha Ojha
04:44 PM Bug #37329 (Resolved): doc: Add bluestore memory autotuning docs
https://github.com/ceph/ceph/pull/25069 Neha Ojha

11/17/2018

03:45 AM Bug #37299 (New): ceph-disk: ceph osd start failed: Command '['/usr/bin/systemctl', 'disable', 'c...
Please see the details at:
https://bugzilla.redhat.com/show_bug.cgi?id=1649208#c0
Han Han

11/16/2018

12:47 PM Bug #37289 (New): Issue with overfilled OSD for cache-tier pools
We have a bad issue in our Ceph cluster.
Centos 7.5 (3.10.0-862.3.2.el7.x86_64)
Luminous 12.2.5, bluestore OSDs, us...
Oleksandr Mykhalskyi
11:35 AM Backport #37288 (Resolved): mimic: "sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-0'" fails in u...
https://github.com/ceph/ceph/pull/25227 Nathan Cutler
10:34 AM Bug #16500 (Resolved): ceph_erasure_code_benchmark parameter checking error for LRC plugin
Kefu Chai
06:22 AM Bug #22597 (Pending Backport): "sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-0'" fails in upgra...
Sage Weil
04:53 AM Bug #36767 (Fix Under Review): OSD: unrecoverable heartbeat connections
Kefu Chai
02:53 AM Feature #23493: config: strip/escape single-quotes in values when setting them via conf file/assi...
Joao,
Could you take a look at https://github.com/ceph/ceph/pull/20610 and see whether you consider it something t...
Brad Hubbard
01:59 AM Bug #37264: scrub warning check incorrectly uses mon scrub interval

The scrub warning also doesn't consider the pool specific scrub interval if specified. The scrub code gets the p...
David Zafman

11/15/2018

01:16 PM Bug #25146 (Resolved): "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:paralle...
Kefu Chai
11:36 AM Backport #37273 (In Progress): mimic: debian: packaging need to reflect move of /etc/bash_complet...
Nathan Cutler
10:47 AM Backport #37273: mimic: debian: packaging need to reflect move of /etc/bash_completion.d/radosgw-...
PR with this backport is https://github.com/ceph/ceph/pull/25115 Matthew Vernon
09:44 AM Backport #37273 (Resolved): mimic: debian: packaging need to reflect move of /etc/bash_completion...
https://github.com/ceph/ceph/pull/25115 Nathan Cutler
10:36 AM Backport #37274 (In Progress): luminous: debian: packaging need to reflect move of /etc/bash_comp...
Nathan Cutler
09:45 AM Backport #37274 (Resolved): luminous: debian: packaging need to reflect move of /etc/bash_complet...
https://github.com/ceph/ceph/pull/24997 Nathan Cutler
09:38 AM Bug #36725: luminous: Apparent Memory Leak in OSD
raising priority since this might be a regression in 12.2.9 Nathan Cutler
06:31 AM Bug #36741 (Pending Backport): debian: packaging need to reflect move of /etc/bash_completion.d/r...
https://github.com/ceph/ceph/pull/24996 Kefu Chai
06:20 AM Bug #37269 (Resolved): Prioritize user specified scrubs

When scrubs start backing up, when a user asks for a scrub it doesn't get priority compared to overdue scrubs. The...
David Zafman
06:14 AM Bug #37264 (Resolved): scrub warning check incorrectly uses mon scrub interval

When checking the mon_warn_not_scrubbed the mon_scrub_interval is used instead of osd_scrub_max_interval.
David Zafman

11/14/2018

08:01 PM Bug #36725: luminous: Apparent Memory Leak in OSD
Note: Downgrading both OSD servers to v12.2.8 returned memory usage to normal. John Jaser
11:43 AM Backport #36636: luminous: osd: race condition opening heartbeat connection
std::lock_guard is a C++11 feature: https://en.cppreference.com/w/cpp/header/mutex Patrick Donnelly
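To illustrate the point in this comment: std::lock_guard itself has been available since C++11, while omitting its template argument needs C++17 class template argument deduction. Whether the master commit relied on that shorter spelling is an assumption here; the excerpt does not say. The sketch below uses a hypothetical CountingLock type (not from the Ceph tree) so the guard's behaviour can be observed without threads.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical BasicLockable used only to observe what the guard does;
// real code would normally guard a std::mutex instead.
struct CountingLock {
  int locks = 0;
  int unlocks = 0;
  void lock() { ++locks; }
  void unlock() { ++unlocks; }
};

// C++11 spelling: the lock type is written out as a template argument.
int guarded_increment_cxx11(CountingLock& m, int value) {
  std::lock_guard<CountingLock> guard(m);  // calls m.lock()
  return value + 1;
}  // guard's destructor calls m.unlock() here

#if __cplusplus >= 201703L
// C++17 spelling: class template argument deduction infers <CountingLock>.
int guarded_increment_cxx17(CountingLock& m, int value) {
  std::lock_guard guard(m);
  return value + 1;
}
#endif

// Returns true when every lock() was matched by exactly one unlock().
bool balanced_after_calls(int calls) {
  assert(calls >= 0);
  CountingLock m;
  int v = 0;
  for (int i = 0; i < calls; ++i) v = guarded_increment_cxx11(m, v);
  return v == calls && m.locks == calls && m.unlocks == calls;
}
```

If a backport target is limited to C++11 or C++14, writing the template argument explicitly, as in guarded_increment_cxx11, keeps the same locking behaviour.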

11/13/2018

02:23 PM Backport #36658 (In Progress): mimic: Cache-tier forward mode hang in luminous (again)
Jonathan Brielmaier
02:15 PM Backport #36657 (In Progress): luminous: Cache-tier forward mode hang in luminous (again)
Jonathan Brielmaier
11:57 AM Bug #36388: osd: "out of order op"
This looks like the dup op entries were exceeded so the op was not detected as a dup. Perhaps we should increase the ... Josh Durgin
04:55 AM Bug #25146: "rocksdb: Corruption: Can't access /000000.sst" in upgrade:mimic-x:parallel-master-di...
https://github.com/ceph/ceph/pull/25070 Kefu Chai

11/12/2018

03:41 PM Bug #36767: OSD: unrecoverable heartbeat connections
Pull request:
https://github.com/ceph/ceph/pull/25061
Yury Z
03:09 PM Bug #36767 (Fix Under Review): OSD: unrecoverable heartbeat connections
There are several unrecoverable heartbeat connections according to logs.
They usually appear after problems/reprodu...
Yury Z
07:05 AM Bug #36758 (Duplicate): aborts in rocksdb::TableFileName() in mimic-x upgrade test suite
Brad Hubbard
05:26 AM Bug #36758: aborts in rocksdb::TableFileName() in mimic-x upgrade test suite
I think it's a dup of #25146 Kefu Chai
02:57 AM Bug #16500 (Fix Under Review): ceph_erasure_code_benchmark parameter checking error for LRC plugin
https://github.com/ceph/ceph/pull/25046 Kefu Chai

11/10/2018

10:01 PM Bug #36758: aborts in rocksdb::TableFileName() in mimic-x upgrade test suite
Marking it "urgent", as it is consistently reproducible, and it renders the cluster unusable after upgrading from... Kefu Chai
06:11 PM Bug #36758 (Duplicate): aborts in rocksdb::TableFileName() in mimic-x upgrade test suite
... Kefu Chai
02:33 PM Backport #36636 (In Progress): luminous: osd: race condition opening heartbeat connection
Nathan Cutler
11:46 AM Backport #36636 (Need More Info): luminous: osd: race condition opening heartbeat connection
The master commit uses std::lock_guard, which is a C++17-ism, and this makes the backport non-trivial (?) Nathan Cutler
12:42 PM Subtask #36091 (Resolved): [rbd top] collect client perf stats when query is enabled
*PR*: https://github.com/ceph/ceph/pull/24265 Jason Dillaman
11:56 AM Backport #36646 (In Progress): luminous: librados api aio tests race condition
Nathan Cutler
11:52 AM Backport #36647 (In Progress): mimic: librados api aio tests race condition
Nathan Cutler
11:40 AM Backport #36637 (In Progress): mimic: osd: race condition opening heartbeat connection
Nathan Cutler
11:38 AM Backport #36556 (In Progress): luminous: RBD client IOPS pool stats are incorrect (2x higher; inc...
Nathan Cutler
11:37 AM Backport #36557 (In Progress): mimic: RBD client IOPS pool stats are incorrect (2x higher; includ...
Nathan Cutler
10:19 AM Backport #36506 (In Progress): luminous: mon osdmap cache too small during upgrade to mimic
Nathan Cutler
10:05 AM Backport #36505 (In Progress): mimic: mon osdmap cache too small during upgrade to mimic
Nathan Cutler
09:59 AM Backport #36436 (In Progress): luminous: rados rm --force-full is blocked when cluster is in full...
Nathan Cutler
09:54 AM Backport #36435 (In Progress): mimic: rados rm --force-full is blocked when cluster is in full st...
Nathan Cutler
09:02 AM Backport #36433 (In Progress): mimic: monstore tool rebuild does not generate creating_pgs
Nathan Cutler

11/09/2018

10:08 PM Bug #36667: OSD object_map sync returned error
Check dmesg for hardware errors; this is leveldb/rocksdb returning an error writing to disk. You may want to ask the ... Josh Durgin
10:05 PM Bug #36677 (Resolved): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
Josh Durgin
10:05 PM Bug #36732 (Fix Under Review): tools/rados: fix segmentation fault
https://github.com/ceph/ceph/pull/24990 Josh Durgin
08:55 PM Bug #36610 (Resolved): filestore merge collection replay problem
Sage Weil
08:54 PM Bug #36748 (New): ms_deliver_verify_authorizer no AuthAuthorizeHandler found for protocol 0
... Sage Weil
05:18 PM Bug #36746 (New): Ignore osd_find_best_info_ignore_history_les for erasure-coded PGs

The only case that osd_find_best_info_ignore_history_les would work for erasure coded pools is if an interval didn'...
David Zafman
09:29 AM Bug #36741 (Resolved): debian: packaging need to reflect move of /etc/bash_completion.d/radosgw-a...
Hi,
Between version 12.0.2 and 12.0.3, the file /etc/bash_completion.d/radosgw-admin moved from the radosgw packag...
Matthew Vernon

11/08/2018

11:34 PM Bug #36739: ENOENT in collection_move_rename on EC backfill target
we create a gen object normally, on a backfill target,... Sage Weil
10:25 PM Bug #36739: ENOENT in collection_move_rename on EC backfill target
Sage Weil
10:24 PM Bug #36739 (Resolved): ENOENT in collection_move_rename on EC backfill target
... Sage Weil
09:13 PM Feature #36737: Allow multi instances of "make tests" on the same machine
@Kefu pls take a look, IIRC you mentioned that this may not be a big effort. Yuri Weinstein
09:12 PM Feature #36737 (Resolved): Allow multi instances of "make tests" on the same machine
Currently it's only possible to run `...make; make tests -j8; ctest ...` on the same machine.
Please consider chan...
Yuri Weinstein
10:02 AM Bug #36732 (Resolved): tools/rados: fix segmentation fault
When connected to a Ceph cluster, calling exit(1) directly will
cause a segmentation fault in the finisher thread, as follo...
Li Wang

11/07/2018

11:37 PM Feature #24917: Gracefully deal with upgrades when bluestore skipping of data_digest becomes active

Josh, this code needs to be written. It needs a feature bit AND a mon flag that can only be set when all OSDs are ...
David Zafman
10:07 PM Backport #36729 (Resolved): mimic: Add support for osd_delete_sleep configuration value
https://github.com/ceph/ceph/pull/25507 David Zafman
10:06 PM Feature #36474 (Pending Backport): Add support for osd_delete_sleep configuration value
David Zafman
04:40 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
Tests added:
https://github.com/ceph/ceph/pull/24954
https://github.com/ceph/ceph/pull/24938
Yuri Weinstein
04:27 PM Bug #36725 (Closed): luminous: Apparent Memory Leak in OSD
Since the last update (late October), we have been experiencing an apparent memory leak in the OSD process on two Ceph servers in small ... John Jaser
11:44 AM Backport #36432 (In Progress): mimic: Interactive mode CLI prints no output since Mimic
Nathan Cutler
11:42 AM Backport #35843 (In Progress): mimic: objecter cannot resend split-dropped op when racing with co...
Nathan Cutler

11/06/2018

01:22 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
/a/sage-2018-11-05_22:04:25-rados-wip-sage3-testing-2018-11-05-1406-distro-basic-smithi/3227352 Sage Weil
11:54 AM Support #36326: Huge traffic spike and assert(is_primary())
Thanks for the answer! It looks like traffic spike was caused by another issue: ceph-mon's db grows up to 15GB and it... Aleksei Zakharov
10:07 AM Bug #36709 (Closed): OSD stuck while flushing rocksdb WAL
Hi all,
We use:
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
Clients work on:
...
Aleksei Zakharov
01:30 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
Quoting my reply to ceph-devel for reference:
"Nathan, I don't think we want to revert it for 13.2.2.
This is b...
Neha Ojha

11/05/2018

10:42 PM Bug #22902 (Resolved): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machine event")
Nathan Cutler
10:32 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
So, the luminous revert was merged. Neha, will there be a mimic revert as well? Since the pg hard limit patches are p... Nathan Cutler
10:13 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
https://github.com/ceph/ceph/pull/24903 merged Yuri Weinstein
10:28 PM Bug #36508 (Resolved): gperftools-libs-2.6.1-1 or newer required for binaries linked against corr...
Nathan Cutler
10:28 PM Backport #36552 (Resolved): luminous: gperftools-libs-2.6.1-1 or newer required for binaries link...
Nathan Cutler
10:10 PM Backport #36552: luminous: gperftools-libs-2.6.1-1 or newer required for binaries linked against ...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24706
merged
Yuri Weinstein
10:25 PM Bug #34541 (Resolved): deep scrub cannot find the bitrot if the object is cached
Nathan Cutler
10:25 PM Backport #35067 (Resolved): luminous: deep scrub cannot find the bitrot if the object is cached
Nathan Cutler
10:08 PM Backport #35067: luminous: deep scrub cannot find the bitrot if the object is cached
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24802
merged
Yuri Weinstein
10:18 PM Backport #36678 (Resolved): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state...
David Zafman
05:20 PM Feature #24917: Gracefully deal with upgrades when bluestore skipping of data_digest becomes active
Let's include this with any other feature bit addition. David Zafman
01:30 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
> I suspect it shouldn't.
But it does exactly that.
> That's will only re-copy the data to the HEAD revision.
...
Vitaliy Filippov

11/04/2018

06:55 PM Bug #36677 (Fix Under Review): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')'...
A fix is already available. See Sage's PR: https://github.com/ceph/ceph/pull/24835. Radoslaw Zarzynski

11/03/2018

11:27 PM Bug #24923 (Resolved): doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
Nathan Cutler
11:27 PM Backport #25055 (Resolved): mimic: doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
Nathan Cutler
11:26 PM Backport #35071 (In Progress): mimic: FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor...
Nathan Cutler
04:42 AM Backport #23670 (In Progress): luminous: auth: ceph auth add does not sanity-check caps
Kefu Chai
04:24 AM Backport #23670 (New): luminous: auth: ceph auth add does not sanity-check caps
Kefu did the jewel backport, so assigning this to him in hopes he'll pick it up. Nathan Cutler
04:00 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
-Also, is this bug reproducible in master and mimic as well? If not, the Backport field should probably be modified..... Nathan Cutler
03:58 AM Bug #36686: osd: pg log hard limit can cause crash during upgrade
Neha, 12.2.9 has already been cut, so we'll need to expedite 12.2.10 to push the revert out to users. Nathan Cutler
03:52 AM Backport #36678 (In Progress): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad st...
Nathan Cutler

11/02/2018

11:57 PM Bug #36686: osd: pg log hard limit can cause crash during upgrade
The immediate fix is to revert this for luminous before 12.2.9: https://github.com/ceph/ceph/pull/24903
Neha Ojha
11:51 PM Bug #36686 (Resolved): osd: pg log hard limit can cause crash during upgrade
During an upgrade from an earlier version, a primary running the new code will send a trim_to value to a replica that... Josh Durgin
05:14 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
Ceph has already moved to C++17. The main question is: have we also transitioned our public headers to C++17 xor put ... Radoslaw Zarzynski
04:58 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
The no-message-taking-variant of *static_assert* has been introduced in C++17. The code is being compiled with *-std=... Radoslaw Zarzynski
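As a small illustration of the comment above, the two forms of static_assert can be put side by side. This is a generic sketch, not the code at buffer.h:657; the helper names are hypothetical. Under a pre-C++17 -std= level a compiler may reject the single-argument form, which is why it sits behind a version check here.

```cpp
#include <cassert>
#include <cstdint>

// Valid since C++11: a condition plus a message string.
static_assert(sizeof(std::uint32_t) == 4, "uint32_t must be 4 bytes");

#if __cplusplus >= 201703L
// Valid only since C++17: the message may be omitted.
static_assert(sizeof(std::uint32_t) == 4);
#endif

// Hypothetical helper: how many arguments static_assert needs under the
// language level this translation unit is compiled with.
constexpr int min_static_assert_args() {
  return __cplusplus >= 201703L ? 1 : 2;
}

// Unlike static_assert, assert() is evaluated when the program runs.
void runtime_check() {
  assert(min_static_assert_args() >= 1);
}
```

A public header that must stay usable from C++11 or C++14 callers can keep the two-argument form, which every level from C++11 onward accepts.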
04:55 PM Bug #36677 (In Progress): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
Radoslaw Zarzynski
05:14 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
Back-and-forth question answering like this is probably better for the mailing list (the ticket is currently closed F... Jason Dillaman
04:57 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
since you've identified that this is an RBD workload, assigning it to that project so that RBD team notices it. HTH. Ben England
02:37 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
Oops. That's more than 2 questions. But anyway :) Vitaliy Filippov
02:36 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
OK, I looked into OSD datastore using ceph-objectstore-tool and I see that for almost every object there are two copi... Vitaliy Filippov
01:39 PM Bug #24835: osd daemon spontaneous segfault
We do use some configuration set by "ceph config set" or "ceph config-key set":... Soenke Schippmann

11/01/2018

11:46 PM Backport #36678 (Resolved): luminous: src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state...
https://github.com/ceph/ceph/pull/24902 David Zafman
11:19 PM Bug #22902 (Pending Backport): src/osd/PG.cc: 6455: FAILED assert(0 == "we got a bad state machin...
Based on similar failures seen in luminous: http://pulpito.ceph.com/yuriw-2018-10-31_22:45:22-rados-wip-yuri4-testing... Neha Ojha
09:10 PM Bug #36677: /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
... Neha Ojha
09:06 PM Bug #36677 (Resolved): /usr/include/rados/buffer.h:657:61: error: expected ',' before ')' token
... Neha Ojha
04:44 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
Looking through the ceph/rocksdb repo I don't see how it's possible for rocksdb to be compiled without snappy support... David Turner
03:35 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
This seems to be a problem where rocksdb on CentOS doesn't support snappy compression but the ceph-kvstore-tool is co... David Turner
06:14 AM Bug #36667 (New): OSD object_map sync returned error
I deployed a CephFS and then used the vdbench tool to write data to the CephFS mount point; after a while an OSD appears down.
...
yp dai

10/31/2018

09:21 PM Bug #36411 (Closed): OSD crash starting recovery/backfill with EC pool
It's my current belief that these objects were broken as a result of intentional metadata manipulation when some of t... Greg Farnum
09:18 PM Bug #36572: ceph-in: --connect-timeout doesn't work while pinging mon
New PR: https://github.com/ceph/ceph/pull/24733 Greg Farnum
09:17 PM Support #36584 (Closed): OSD Anomaly behaviour in ceph-reweight
Are you running the command repeatedly? reweight-by-utilization does not provide a stable balance; it's really just a... Greg Farnum
08:43 PM Bug #21496: doc: Manually editing a CRUSH map, Word 'type' missing.
https://github.com/ceph/ceph/pull/24868 Sage Weil
05:35 PM Feature #36661: osd: add sanity check on startup to compare osd memory target to available memory...
- in OSD::handle_conf_change, we should sanity check this against current memory available on the system and refuse t... Sage Weil
04:59 PM Feature #36661 (New): osd: add sanity check on startup to compare osd memory target to available ...
This is needed so that we do not fail due to osd_memory_target being set too high compared to the amount of memory ... Neha Ojha
11:42 AM Backport #36658 (Resolved): mimic: Cache-tier forward mode hang in luminous (again)
https://github.com/ceph/ceph/pull/25075 Nathan Cutler
11:42 AM Backport #36657 (Resolved): luminous: Cache-tier forward mode hang in luminous (again)
https://github.com/ceph/ceph/pull/25074 Nathan Cutler

10/30/2018

08:08 PM Bug #36345 (Resolved): librados C API aio read empty buffer
Sage Weil
08:07 PM Bug #36406 (Pending Backport): Cache-tier forward mode hang in luminous (again)
Sage Weil
05:16 PM Backport #36647 (Resolved): mimic: librados api aio tests race condition
https://github.com/ceph/ceph/pull/25027 Patrick Donnelly
05:16 PM Backport #36646 (Resolved): luminous: librados api aio tests race condition
https://github.com/ceph/ceph/pull/25028 Patrick Donnelly
05:14 PM Backport #36637 (Resolved): mimic: osd: race condition opening heartbeat connection
https://github.com/ceph/ceph/pull/25026 Patrick Donnelly
05:14 PM Backport #36636 (Resolved): luminous: osd: race condition opening heartbeat connection
https://github.com/ceph/ceph/pull/25035 Patrick Donnelly
04:06 PM Bug #36634 (New): LibRadosWatchNotify.WatchNotify2Timeout failure
... Sage Weil
03:33 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
Yes, I'm using EC with RBD and partial overwrites enabled. CephFS pools are only created recently for tests and do no... Vitaliy Filippov
01:05 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
How are you writing these objects? Most sites that used EC were using RGW, but I don't see all the pools that go wit... Ben England
10:31 AM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
In fact it doesn't seem that it will self-heal, and nobody seems to care about it on the mailing list so far...)
C...
Vitaliy Filippov
02:33 PM Bug #36631 (In Progress): potential deadlock in PG::_scan_snaps when repairing snap mapper
If during a pg scrub a snap mapper error is detected in PG::_scan_snaps, on repair `ObjectStore::apply_transactions` ... Mykola Golub
02:28 PM Backport #36630 (Resolved): luminous: potential deadlock in PG::_scan_snaps when repairing snap m...
If during a pg scrub a snap mapper error is detected in PG::_scan_snaps, on repair `ObjectStore::apply_transactions` ... Mykola Golub
02:00 PM Bug #36629 (New): osd:the new file was stored in cache pool which mode was none
ceph version:13.2.1
kernel client 4.17
I created the cache data pool following Ceph's instructions:
(1) ceph osd tier add...
qinglong li
01:41 AM Bug #36620: osd:the vim will be hanged when I saved the file
the client: 4.17 kernel client qinglong li
01:36 AM Bug #36620 (New): osd:the vim will be hanged when I saved the file
ceph version: 13.2.1
situation: the data pool is tiered by a cache data pool and the cache tier pool's mode was read...
qinglong li

10/29/2018

10:33 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
Thanks for the response, I wrote to the mailing list ceph-users (is it the correct place?) :) Vitaliy Filippov
08:37 PM Support #36614 (Closed): Cluster uses substantially more space after rebalance (erasure codes)
The mailing list is a better place to resolve this. My guess is data hasn't been cleaned up from its old locations ye... Greg Farnum
12:13 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
How to heal it? If I don't heal it I'll need to purge the whole cluster? O_o... Vitaliy Filippov
12:12 PM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
ceph df output:... Vitaliy Filippov
11:11 AM Support #36614: Cluster uses substantially more space after rebalance (erasure codes)
Proof from our Prometheus monitoring. Two graphs from yesterday: one with the number of objects in the cluster and the other wit... Vitaliy Filippov
10:17 AM Support #36614 (Closed): Cluster uses substantially more space after rebalance (erasure codes)
Hi
After I recreated one OSD + increased pg count of my erasure-coded (2+1) pool (which was way too low, only 100 ...
Vitaliy Filippov
10:21 PM Bug #36525: osd-scrub-snaps.sh failure

Looking at the log, another scrub has increased the number of "_scan_snaps start" entries in the log from 2 to 4. It results in ...
David Zafman
01:06 AM Bug #36525: osd-scrub-snaps.sh failure
/a/sage-2018-10-28_14:12:19-rados-master-distro-basic-smithi/3196520
another instance on current master
Sage Weil
09:48 PM Bug #23827 (Resolved): osd sends op_reply out of order
Nathan Cutler
09:47 PM Backport #25010 (Resolved): mimic: osd sends op_reply out of order
Nathan Cutler
08:47 PM Backport #25010: mimic: osd sends op_reply out of order
https://github.com/ceph/ceph/pull/23136 has merged, can we resolve this issue? Neha Ojha
09:43 PM Bug #25154 (Resolved): librados application's symbol could conflict with the libceph-common
Nathan Cutler
09:42 PM Backport #26839 (Resolved): mimic: librados application's symbol could conflict with the libceph-...
Nathan Cutler
08:21 PM Backport #26839: mimic: librados application's symbol could conflict with the libceph-common
Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/24708
merged
Yuri Weinstein
09:40 PM Bug #35969 (Resolved): "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
Nathan Cutler
09:39 PM Backport #36553 (Resolved): mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked ...
Nathan Cutler
08:16 PM Backport #36553: mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked against cor...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24260
merged
Yuri Weinstein
09:39 PM Backport #36132 (Resolved): mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on ...
Nathan Cutler
08:16 PM Backport #36132: mimic: "symbol lookup error: ceph-osd: undefined symbol: _ZdaPvm" on centos 7.4
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24260
merged
Yuri Weinstein
08:47 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
The above change is not entirely correct. This section needs to be omitted:... Louwrentius Louwrentius
08:13 PM Bug #23387: Building Ceph on armhf fails due to out-of-memory
Hello!
I've used the instructions created by Daniel Glasser, and with some small code adjustments in a few files I w...
Louwrentius Louwrentius
04:17 PM Bug #36610 (Fix Under Review): filestore merge collection replay problem
https://github.com/ceph/ceph/pull/24806 Sage Weil
03:51 PM Bug #36610: filestore merge collection replay problem
the osd is stopped during the merge operation:... Sage Weil
03:46 PM Bug #36182 (Resolved): osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is ...
Neha Ojha
02:59 PM Bug #36473 (Resolved): hung osd_repop, bluestore committed but failed to trigger repop_commit
this is presumably https://github.com/ceph/ceph/pull/24761 Sage Weil
02:58 PM Bug #36548 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh
Sage Weil
01:34 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
/a/sage-2018-10-29_01:11:58-rados-wip-sage-testing-2018-10-28-0943-distro-basic-smithi/3197984 Sage Weil
01:10 AM Bug #36408 (Resolved): [cache tier] failed guarded write + promotion results in "success" op result
Sage Weil

10/28/2018

02:40 PM Bug #36602 (Pending Backport): osd: race condition opening heartbeat connection
Sage Weil
02:37 PM Bug #36610 (Resolved): filestore merge collection replay problem
/a/sage-2018-10-27_02:10:33-rados-wip-sage-testing-2018-10-26-1411-distro-basic-smithi/3188976
osd.3 was partway t...
Sage Weil

10/26/2018

07:24 PM Feature #24591: FileStore hasn't impl to get kv-db's statistics
Jack Lv wrote:
> https://github.com/ceph/ceph/pull/22633
merged
Yuri Weinstein
06:30 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
https://github.com/ceph/ceph/pull/24761 Neha Ojha
05:41 PM Bug #36602: osd: race condition opening heartbeat connection
Greg Farnum
03:39 PM Bug #36602 (Fix Under Review): osd: race condition opening heartbeat connection
https://github.com/ceph/ceph/pull/24780 Sage Weil
03:37 PM Bug #36602 (Resolved): osd: race condition opening heartbeat connection
... Sage Weil
05:10 PM Bug #20694: osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_log().get_log().obje...
/a/yuriw-2018-10-25_15:31:28-rados-wip-yuri4-testing-2018-10-24-2310-mimic-distro-basic-smithi/3183476/ Neha Ojha
04:22 PM Bug #36345 (Fix Under Review): librados C API aio read empty buffer
imirc tw, thank you for your analysis. I am approving https://github.com/ceph/ceph/pull/24534. so "unshared buffer" o... Kefu Chai
09:52 AM Bug #36345: librados C API aio read empty buffer
I figured it out. In Objecter.cc:3279... imirc tw
09:02 AM Bug #36345: librados C API aio read empty buffer
without osd_op_timeout, in Objecter::handle_osd_op_reply, Objecter.cc:3473
op->con px is an AsyncConnection on whic...
imirc tw
07:54 AM Bug #36345: librados C API aio read empty buffer
Some more info from what I can see while debugging.
Without 'rados osd op timeout', the buffer in librados::IoCtx...
imirc tw
02:18 PM Bug #24587 (Pending Backport): librados api aio tests race condition
Sage Weil
11:01 AM Bug #24180 (Resolved): mon: slow op on log message
Nathan Cutler
11:01 AM Backport #24293 (Resolved): jewel: mon: slow op on log message
Nathan Cutler
06:42 AM Bug #24835: osd daemon spontaneous segfault
Our ceph.conf:... Christian Schlittchen
03:46 AM Bug #24615 (Need More Info): error message for 'unable to find any IP address' not shown
Francois,
Can you try reproducing your issue on the latest master?
I fixed a similar issue in master and also fro...
Victor Denisov
03:28 AM Bug #24615 (In Progress): error message for 'unable to find any IP address' not shown
Victor Denisov
02:34 AM Bug #25153 (Resolved): output format is invalid of the crush tree json dumper
Nathan Cutler
02:33 AM Backport #36149 (Resolved): luminous: output format is invalid of the crush tree json dumper
Nathan Cutler
02:33 AM Bug #35845 (Resolved): osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
Nathan Cutler
02:32 AM Backport #36393 (Resolved): luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
Nathan Cutler
02:30 AM Bug #36183 (Resolved): [objecter] client socket failure leads to hung connection
Nathan Cutler
02:30 AM Backport #36295 (Resolved): luminous: [objecter] client socket failure leads to hung connection
Nathan Cutler
02:29 AM Bug #21931 (Resolved): osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (range...
Nathan Cutler
02:29 AM Backport #36440 (Resolved): luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + le...
Nathan Cutler
02:28 AM Bug #22330 (Resolved): ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler
02:28 AM Backport #36438 (Resolved): luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler
02:27 AM Bug #36417 (Resolved): osd: get loadavg per cpu for scrub load threshold check
Nathan Cutler
02:27 AM Backport #36419 (Resolved): luminous: osd: get loadavg per cpu for scrub load threshold check
Nathan Cutler
02:26 AM Bug #36174 (Resolved): ceph pg ls creating: EINVAL
Nathan Cutler
02:26 AM Backport #36297 (Resolved): luminous: ceph pg ls creating: EINVAL
Nathan Cutler
02:25 AM Bug #23614 (Resolved): local_reserver double-reservation of backfilled pg
Nathan Cutler
02:25 AM Backport #24333 (Resolved): luminous: local_reserver double-reservation of backfilled pg
Nathan Cutler
02:24 AM Backport #26932 (Resolved): luminous: scrub livelock
Nathan Cutler

10/25/2018

10:22 PM Backport #36149: luminous: output format is invalid of the crush tree json dumper
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24482
merged
Yuri Weinstein
10:21 PM Backport #36393: luminous: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
David Zafman wrote:
> https://github.com/ceph/ceph/pull/24532
merged
Yuri Weinstein
10:20 PM Backport #36295: luminous: [objecter] client socket failure leads to hung connection
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24574
merged
Yuri Weinstein
10:20 PM Backport #36440: luminous: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (r...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24582
merged
Yuri Weinstein
10:20 PM Backport #36438: luminous: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24582
merged
Yuri Weinstein
10:19 PM Backport #36419: luminous: osd: get loadavg per cpu for scrub load threshold check
David Zafman wrote:
> https://github.com/ceph/ceph/pull/24593
merged
Yuri Weinstein
10:19 PM Backport #36297: luminous: ceph pg ls creating: EINVAL
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24602
merged
Reviewed-by: Neha Ojha <nojha@redhat.com>
Yuri Weinstein
10:18 PM Bug #26890: scrub livelock
merged https://github.com/ceph/ceph/pull/24659 Yuri Weinstein
08:01 PM Bug #36345: librados C API aio read empty buffer
Kefu Chai, same happens on master:... imirc tw
04:36 PM Bug #36345: librados C API aio read empty buffer
13.2.2. I will give it a go on master ASAP. imirc tw
03:57 PM Bug #36345: librados C API aio read empty buffer
imirc tw, on which release did you reproduce this issue? Is master affected? Kefu Chai
01:18 PM Bug #36345: librados C API aio read empty buffer
Hi Kefu,
I'm not that deep into the Ceph code, I was making an assumption based on my observations and past ticket...
imirc tw
08:52 AM Bug #36345: librados C API aio read empty buffer
imirc tw, i don't understand how "rados_osd_op_timeout" is related to this issue. i agree that current @librados::IoC... Kefu Chai
07:02 AM Bug #36345: librados C API aio read empty buffer
Hi Wido,
The 2nd assumption isn't true, that was because the client.admin ceph.conf file used didn't have the osd_o...
imirc tw
06:51 AM Bug #36345: librados C API aio read empty buffer
Updating this ticket as the issue seems to be related to two things:
- When using osd_op_timeout
- When using a u...
Wido den Hollander
06:21 PM Bug #36598 (Can't reproduce): osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests ...
... Patrick Donnelly
04:20 PM Backport #24333: luminous: local_reserver double-reservation of backfilled pg
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/23493
merged
Yuri Weinstein
05:18 AM Feature #36474: Add support for osd_delete_sleep configuration value
https://github.com/ceph/ceph/pull/24749 David Zafman

10/24/2018

09:43 PM Bug #25182: Upmaps forgotten after restarting OSDs
One thing I've noticed after living with this for a while is that the upmap entries that are forgotten are always for... Bryan Stillwell
09:31 PM Bug #36517: client crashes osd with empty object name
Attached Noah Watkins
09:17 PM Bug #36517: client crashes osd with empty object name
Noah, the paste doesn't show now. Could you paste the trace in the tracker? Neha Ojha
09:21 PM Bug #24485 (Resolved): LibRadosTwoPoolsPP.ManifestUnset failure
Greg Farnum
09:11 PM Bug #36166 (Resolved): pg merge can collide with remapped, upmap pgs
Neha Ojha
02:29 PM Bug #36345: librados C API aio read empty buffer
Kefu, I'm also experiencing this issue. It seems to be related to `rados osd op timeout`. Once this value is set in t... imirc tw
11:01 AM Support #36584 (Closed): OSD Anomaly behaviour in ceph-reweight
ceph version 10.2.5
We have this behaviour with 2 OSDs in the cluster getting stuck in a backfilling loop.
I'm executing thi...
JUan Galan
10:35 AM Bug #19348 (Can't reproduce): "ceph ping mon.c" cli prints assertion failure on timeout
not able to reproduce with master HEAD anymore. Kefu Chai
10:34 AM Bug #19348: "ceph ping mon.c" cli prints assertion failure on timeout
https://github.com/ceph/ceph/pull/24733 Kefu Chai

10/23/2018

09:36 PM Bug #36040: mon: Valgrind: mon (InvalidFree, InvalidWrite, InvalidRead)
/ceph/teuthology-archive/pdonnell-2018-10-17_19:54:38-multimds-wip-pdonnell-testing-20181017.175152-distro-basic-smit... Patrick Donnelly
09:30 PM Bug #36497: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::replace()
/ceph/teuthology-archive/pdonnell-2018-10-17_19:54:38-multimds-wip-pdonnell-testing-20181017.175152-distro-basic-smit... Patrick Donnelly
09:28 PM Bug #36411: OSD crash starting recovery/backfill with EC pool
I have to add to the previous update, which did not explain the resolution of the problem.
The true solution was w...
Graham Allan
08:27 PM Bug #24587 (Fix Under Review): librados api aio tests race condition
https://github.com/ceph/ceph/pull/24724 Josh Durgin
07:53 PM Bug #36572: ceph-in: --connect-timeout doesn't work while pinging mon
Submitted a "PR":https://github.com/ceph/ceph/pull/24723 for this. Rishabh Dave
07:44 PM Bug #36572 (Closed): ceph-in: --connect-timeout doesn't work while pinging mon
Saw the following output while working on "PR 21432":https://github.com/ceph/ceph/pull/21432 -... Rishabh Dave
03:53 PM Bug #36548: qa/standalone/osd/osd-rep-recov-eio.sh
The failed run did not include the changes in https://github.com/ceph/ceph/pull/24651 (master). This pull request mi... David Zafman
01:43 AM Bug #36548 (Resolved): qa/standalone/osd/osd-rep-recov-eio.sh
pg ended up in active+clean, not recovery_unfound
/a/sage-2018-10-22_21:29:13-rados-wip-sage-testing-2018-10-22-11...
Sage Weil
06:04 AM Backport #36553 (In Progress): mimic: gperftools-libs-2.6.1-1 or newer required for binaries link...
Nathan Cutler
05:44 AM Backport #36553 (Resolved): mimic: gperftools-libs-2.6.1-1 or newer required for binaries linked ...
https://github.com/ceph/ceph/pull/24260 Nathan Cutler
05:52 AM Backport #36552 (In Progress): luminous: gperftools-libs-2.6.1-1 or newer required for binaries l...
Nathan Cutler
05:43 AM Backport #36552 (Resolved): luminous: gperftools-libs-2.6.1-1 or newer required for binaries link...
https://github.com/ceph/ceph/pull/24706 Nathan Cutler
05:45 AM Backport #36557 (Resolved): mimic: RBD client IOPS pool stats are incorrect (2x higher; includes ...
https://github.com/ceph/ceph/pull/25024 Nathan Cutler
05:45 AM Backport #36556 (Resolved): luminous: RBD client IOPS pool stats are incorrect (2x higher; includ...
https://github.com/ceph/ceph/pull/25025 Nathan Cutler
05:43 AM Backport #35909 (Resolved): mimic: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
Nathan Cutler
05:31 AM Backport #36439 (Resolved): mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + lengt...
Nathan Cutler
05:31 AM Backport #36437 (Resolved): mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler
05:30 AM Backport #36296 (Resolved): mimic: [objecter] client socket failure leads to hung connection
Nathan Cutler
05:30 AM Backport #36298 (Resolved): mimic: ceph pg ls creating: EINVAL
Nathan Cutler
04:41 AM Bug #24835: osd daemon spontaneous segfault
I'd say the cause of most, if not all, of these crashes is memory corruption caused by code responsible for manipulat... Brad Hubbard
04:31 AM Bug #24835: osd daemon spontaneous segfault
The 'safe_timer.5246' is again similar but this time tcmalloc is 'popping' a
single value rather than a range.
...
Brad Hubbard
03:54 AM Bug #24835: osd daemon spontaneous segfault
The 'msgr-worker-1.5278' is almost identical to 'tp_osd_tp' except this time 'i'
= 499 so doing that manually is bey...
Brad Hubbard
01:58 AM Bug #24835: osd daemon spontaneous segfault
For the rest of the coredumps adding the debuginfo for libtcmalloc really helps
to understand the problem as we end ...
Brad Hubbard
12:53 AM Bug #24835: osd daemon spontaneous segfault
Starting with the bluestore bufferlist destructor crash.... Brad Hubbard

10/22/2018

11:39 PM Bug #36508 (Pending Backport): gperftools-libs-2.6.1-1 or newer required for binaries linked agai...
Brad Hubbard
11:38 PM Bug #36182: osd: hung op "osd.3 22 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds....
Haven't been able to reproduce this on luminous and mimic, so clearing the Backport fields for now. Neha Ojha
07:05 PM Bug #24909 (Pending Backport): RBD client IOPS pool stats are incorrect (2x higher; includes IO h...
Jason Dillaman
03:40 PM Backport #35909: mimic: osd-scrub-repair.sh:TEST_corrupt_scrub_replicated failed
David Zafman wrote:
> https://github.com/ceph/ceph/pull/24017
merged
Yuri Weinstein
03:35 PM Backport #36439: mimic: osd: src/osd/ECBackend.cc: 2164: FAILED assert((offset + length) <= (rang...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24581
merged
Yuri Weinstein
03:35 PM Backport #36437: mimic: ec: src/common/interval_map.h: 161: FAILED assert(len > 0)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24581
merged
Yuri Weinstein
03:34 PM Backport #36296: mimic: [objecter] client socket failure leads to hung connection
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24600
merged
Yuri Weinstein
03:32 PM Backport #36298: mimic: ceph pg ls creating: EINVAL
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/24601
merged
Yuri Weinstein
02:31 PM Bug #24956 (Resolved): osd: parent process need to restart log service after fork, or ceph-osd wi...
Kefu Chai
02:25 PM Bug #36546 (Duplicate): common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back...
... Sage Weil
02:21 PM Bug #36485 (Resolved): dump-stuck.yaml fails assert len(inactive) == num_inactive
Sage Weil

10/21/2018

03:53 PM Bug #36485 (Fix Under Review): dump-stuck.yaml fails assert len(inactive) == num_inactive
https://github.com/ceph/ceph/pull/24689 Sage Weil
09:25 AM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
https://github.com/ceph/ceph/pull/24687
Myoungwon Oh
 
