Activity

From 01/14/2019 to 02/12/2019

02/12/2019

11:38 PM Bug #38083: mimic: test_kvstore_tool.sh: mkfs failed: (22) Invalid argument
This seems to reliably fail in mimic only with rhel_latest.yaml.
http://pulpito.ceph.com/nojha-2019-02-12_21:14:17...
Neha Ojha
10:51 PM Bug #38283 (Resolved): max-pg-per-osd tests failing
... Sage Weil
10:49 PM Bug #38282 (Resolved): cephtool/test.sh failure in test_mon_osd_pool_set
... Sage Weil
05:47 PM Feature #36737 (Resolved): Allow multi instances of "make tests" on the same machine
Kefu Chai
08:06 AM Feature #36737 (Pending Backport): Allow multi instances of "make tests" on the same machine
Nathan Cutler
05:47 PM Backport #38266 (Resolved): mimic: test: Allow multi instances of "make tests" on the same machine
Kefu Chai
08:07 AM Backport #38266: mimic: test: Allow multi instances of "make tests" on the same machine
Original description
Currently it's only possible to run `...make; make tests -j8; ctest ...` on the same mach...
Nathan Cutler
08:04 AM Backport #38266 (In Progress): mimic: test: Allow multi instances of "make tests" on the same mac...
https://github.com/ceph/ceph/pull/26376 Kefu Chai
07:58 AM Backport #38266 (Resolved): mimic: test: Allow multi instances of "make tests" on the same machine
https://github.com/ceph/ceph/pull/26376 Kefu Chai
04:10 PM Backport #38277 (Resolved): mimic: osd_map_message_max default is too high?
https://github.com/ceph/ceph/pull/29242 Nathan Cutler
04:10 PM Backport #38276 (Resolved): luminous: osd_map_message_max default is too high?
https://github.com/ceph/ceph/pull/28640 Nathan Cutler
04:10 PM Backport #38275 (Resolved): mimic: Fix recovery and backfill priority handling
https://github.com/ceph/ceph/pull/27081 Nathan Cutler
04:09 PM Backport #38274 (Resolved): luminous: Fix recovery and backfill priority handling
https://github.com/ceph/ceph/pull/26793 Nathan Cutler
03:32 PM Bug #38040 (Pending Backport): osd_map_message_max default is too high?
Sage Weil
03:10 PM Backport #38096 (Resolved): mimic: doc/rados/configuration: refresh osdmap section
Nathan Cutler
03:05 AM Backport #38096 (In Progress): mimic: doc/rados/configuration: refresh osdmap section
https://github.com/ceph/ceph/pull/26373 Prashant D
03:07 PM Backport #37689 (Resolved): mimic: ceph-objectstore-tool: Add HashInfo to object dump output
Nathan Cutler
12:03 AM Backport #37689: mimic: ceph-objectstore-tool: Add HashInfo to object dump output
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25721
merged
Yuri Weinstein
03:07 PM Backport #37992 (Resolved): mimic: ec pool lost data due to snap clone
Nathan Cutler
12:02 AM Backport #37992: mimic: ec pool lost data due to snap clone
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26077
merged
Yuri Weinstein
03:01 PM Backport #38106 (Resolved): mimic: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(ho...
Nathan Cutler
12:00 AM Backport #38106: mimic: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid))
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26239
merged
Yuri Weinstein
02:49 PM Bug #27985 (Resolved): force-backfill sets forced_recovery instead of forced_backfill in 13.2.1
Nathan Cutler
02:49 PM Backport #38111 (Resolved): mimic: force-backfill sets forced_recovery instead of forced_backfill...
Nathan Cutler
12:06 PM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
huang jun wrote:
> @rafal what the status of this issue now? did you resolve the problem?
I had to redeploy this ...
Rafal Wadolowski
04:21 AM Bug #23937: FAILED assert(info.history.same_interval_since != 0)
@rafal what the status of this issue now? did you resolve the problem? huang jun
08:02 AM Backport #38073 (Resolved): luminous: build/ops: Allow multi instances of "make tests" on the sam...
Kefu Chai
03:56 AM Bug #38034: pg stuck in backfill_wait with plenty of disk space

I created a simple fix for this scenario in https://github.com/ceph/ceph/pull/20933. The result is probably this s...
David Zafman
02:58 AM Backport #38095 (In Progress): luminous: doc/rados/configuration: refresh osdmap section
https://github.com/ceph/ceph/pull/26372 Prashant D
12:05 AM Bug #36517: client crashes osd with empty object name
Ok, thank you Noah! Much appreciated. I'll have a look at this soon. Jesse Williamson

02/11/2019

09:58 PM Bug #36517: client crashes osd with empty object name
Jesse,
This is still a bug in the latest master. Also, it appears to be worse than before--it seems as though raw ...
Noah Watkins
09:07 PM Backport #38111: mimic: force-backfill sets forced_recovery instead of forced_backfill in 13.2.1
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26324
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
m...
Yuri Weinstein
05:13 PM Bug #38172: segv in rocksdb NewIterator
I'm guessing this is a dup of #38024 Sage Weil
05:11 PM Bug #38258 (Fix Under Review): filestore: fsync(2) return value not checked
https://github.com/ceph/ceph/pull/26366 Sage Weil
02:33 PM Bug #38258 (Resolved): filestore: fsync(2) return value not checked
WBThrottle is the main one, but there are also fsync(2) calls in the write guard code that should be checked. Sage Weil
02:48 PM Bug #24685: config options: possible inconsistency between flag 'can_update_at_runtime' and 'flag...
Yes, I can edit them now. Thanks Nathan :) Tatjana Dehler
02:36 PM Bug #24685: config options: possible inconsistency between flag 'can_update_at_runtime' and 'flag...
@Tatjana - I added you to the "ceph developers" group so you should be able to change Status etc. fields. Nathan Cutler
02:35 PM Bug #24685 (Resolved): config options: possible inconsistency between flag 'can_update_at_runtime...
Nathan Cutler
11:05 AM Bug #24685: config options: possible inconsistency between flag 'can_update_at_runtime' and 'flag...
I retested it and the issue is gone from my point of view. Unfortunately I can't tell which pull request fixed it.
...
Tatjana Dehler
12:35 PM Backport #38256 (Need More Info): luminous: OSD crashes when loading pgs with "FAILED assert(inte...
Nathan Cutler
12:35 PM Backport #38256 (New): luminous: OSD crashes when loading pgs with "FAILED assert(interval.last >...
Sage writes in https://github.com/ceph/ceph/pull/25800
We're having a hard time root causing http://tracker.ceph.c...
Nathan Cutler
12:31 PM Backport #38256 (In Progress): luminous: OSD crashes when loading pgs with "FAILED assert(interva...
Nathan Cutler
12:30 PM Backport #38256 (Duplicate): luminous: OSD crashes when loading pgs with "FAILED assert(interval....
Nathan Cutler
12:30 PM Bug #21142 (Pending Backport): OSD crashes when loading pgs with "FAILED assert(interval.last > l...
Nathan Cutler
03:08 AM Backport #38243 (In Progress): mimic: scrub warning check incorrectly uses mon scrub interval
https://github.com/ceph/ceph/pull/26356 Prashant D
01:15 AM Backport #38240 (In Progress): luminous: radosbench tests hit ENOSPC
https://github.com/ceph/ceph/pull/26355 Prashant D
01:12 AM Backport #38239 (In Progress): mimic: radosbench tests hit ENOSPC
https://github.com/ceph/ceph/pull/26354 Prashant D

02/09/2019

02:13 AM Bug #36739 (Fix Under Review): ENOENT in collection_move_rename on EC backfill target
Neha Ojha
01:49 AM Bug #38195: osd-backfill-space.sh exposes rocksdb hang

Also, seen in http://qa-proxy.ceph.com/teuthology/dzafman-2019-02-07_22:34:38-rados-wip-dzafman-testing3-distro-bas...
David Zafman
01:44 AM Bug #38248 (New): qa/standalone/osd/pg-split-merge.sh TEST_import_after_merge_and_gap() test fails

http://pulpito.ceph.com/dzafman-2019-02-07_22:34:38-rados-wip-dzafman-testing3-distro-basic-smithi/
3564438 356444...
David Zafman
12:42 AM Bug #38041 (Pending Backport): Fix recovery and backfill priority handling
David Zafman
12:39 AM Bug #37393 (Resolved): mimic: osd-backfill-stats.sh fails in rados/standalone/osd.yaml
David Zafman

02/08/2019

10:51 PM Bug #21142 (Resolved): OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Josh Durgin
09:25 PM Bug #24866: FAILED assert(0 == "past_interval start interval mismatch") in check_past_interval_bo...
Looks like this is still possible to hit:
http://pulpito.ceph.com/nojha-2019-02-08_05:52:40-rados-wip-36739-2019-0...
Josh Durgin
06:11 PM Bug #38212: Rare qa/standalone/osd/osd-markdown.sh mon start-up error

This is not actually causing a test failure. It shows that a test FOR failure is being blocked by this bug.
I t...
David Zafman
04:28 PM Bug #38040 (Fix Under Review): osd_map_message_max default is too high?
* https://github.com/ceph/ceph/pull/26340
* https://github.com/ceph/ceph/pull/26413
* https://github.com/ceph/ceph/...
Sage Weil
03:56 PM Bug #38195: osd-backfill-space.sh exposes rocksdb hang

The workaround was for a specific case but I'm seeing lots more failures in master:
http://pulpito.ceph.com/dzaf...
David Zafman
02:48 PM Backport #38244 (Resolved): luminous: scrub warning check incorrectly uses mon scrub interval
https://github.com/ceph/ceph/pull/26557 Nathan Cutler
02:48 PM Backport #38243 (Resolved): mimic: scrub warning check incorrectly uses mon scrub interval
https://github.com/ceph/ceph/pull/26493 Nathan Cutler
02:47 PM Backport #38240 (Resolved): luminous: radosbench tests hit ENOSPC
https://github.com/ceph/ceph/pull/26355 Nathan Cutler
02:47 PM Backport #38239 (Resolved): mimic: radosbench tests hit ENOSPC
https://github.com/ceph/ceph/pull/26354 Nathan Cutler
02:41 PM Bug #38238 (Duplicate): rados/test.sh: api_aio_pp doesn't seem to start
... Sage Weil
03:11 AM Documentation #23999: osd_recovery_priority is not documented (but osd_recovery_op_priority is)
http://docs.ceph.com/docs/mimic/rados/configuration/osd-config-ref/
Recovery op priority is documented here.
Марк Коренберг
02:42 AM Documentation #23999 (In Progress): osd_recovery_priority is not documented (but osd_recovery_op_...
David Zafman

02/07/2019

11:57 PM Bug #36748 (Can't reproduce): ms_deliver_verify_authorizer no AuthAuthorizeHandler found for prot...
Sage Weil
07:19 PM Bug #23031: FAILED assert(!parent->get_log().get_missing().is_missing(soid))
This should be assigned to http://tracker.ceph.com/users/3114 but doesn't show up on assignee list. David Zafman
07:08 PM Bug #37264 (Pending Backport): scrub warning check incorrectly uses mon scrub interval
David Zafman
04:46 PM Bug #36494 (Resolved): Change osd_objectstore default to bluestore
Nathan Cutler
03:36 PM Bug #36494: Change osd_objectstore default to bluestore
@Nathan Let's not backport this to luminous and mimic. Neha Ojha
04:46 PM Backport #37995 (Rejected): luminous: Change osd_objectstore default to bluestore
Nathan Cutler
04:46 PM Backport #37994 (Rejected): mimic: Change osd_objectstore default to bluestore
Nathan Cutler
04:36 PM Bug #37665 (Resolved): ceph-objectstore-tool export from luminous, import to master clears same_i...
Nathan Cutler
04:36 PM Backport #37821 (Resolved): mimic: ceph-objectstore-tool export from luminous, import to master c...
Nathan Cutler
04:31 PM Backport #38111 (In Progress): mimic: force-backfill sets forced_recovery instead of forced_backf...
Nathan Cutler
03:48 AM Bug #38219 (Resolved): rebuild-mondb hangs
http://pulpito.ceph.com/sage-2019-02-06_23:33:50-rados-master-distro-basic-smithi/... Sage Weil

02/06/2019

10:46 PM Backport #38207 (In Progress): luminous: A PG repairing doesn't mean PG is damaged
David Zafman
12:33 PM Backport #38207 (Resolved): luminous: A PG repairing doesn't mean PG is damaged
https://github.com/ceph/ceph/pull/26305 Nathan Cutler
10:44 PM Backport #38208 (In Progress): mimic: A PG repairing doesn't mean PG is damaged
David Zafman
12:33 PM Backport #38208 (Resolved): mimic: A PG repairing doesn't mean PG is damaged
https://github.com/ceph/ceph/pull/26304 Nathan Cutler
10:42 PM Bug #38124: OSD down on snaptrim.
I was theorizing in a bug scrub that maybe the PG was running behind on OSDMaps and so missing the nosnaptrim flag up... Greg Farnum
10:15 PM Bug #38024: segv, heap corruption in ec encode_and_write
related? submit_transaction and bufferlist::rebuild()...
/a/sage-2019-02-06_15:56:08-rados-wip-sage-testing-2019-...
Sage Weil
10:09 PM Bug #38198 (Duplicate): ceph-mon sometimes fails to start (only seen in odd-markdown.sh)
Neha Ojha
10:06 PM Bug #38195 (Resolved): osd-backfill-space.sh exposes rocksdb hang
Neha Ojha
04:44 PM Bug #38195: osd-backfill-space.sh exposes rocksdb hang
Workaround merged, so changed priority from Urgent to High. David Zafman
10:01 PM Bug #37804 (Closed): "monmaptool: too many arguments" in perf suite
Seems better now http://pulpito.ceph.com/teuthology-2019-01-25_03:57:03-perf-basic-master-distro-basic-smithi/ Yuri Weinstein
09:37 PM Backport #37821: mimic: ceph-objectstore-tool export from luminous, import to master clears same_...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25856
merged
Yuri Weinstein
09:29 PM Bug #37808: osd: osdmap cache weak_refs assert during shutdown
/a/sage-2019-02-06_15:56:42-rados-wip-msgr2-peer-addr-distro-basic-smithi/3557216
rados/singleton-flat/valgrind-le...
Sage Weil
09:12 PM Bug #37797 (Pending Backport): radosbench tests hit ENOSPC
looks like we are hitting this in mimic as well: /a/yuriw-2019-02-06_16:30:03-rados-wip-yuri4-testing-2019-02-05-1539... Neha Ojha
08:59 PM Feature #38215 (New): Add bulk operation (--op bulk) to ceph-objectstore-tool

Instead of adding an individual bulk operation for rm-omap like in https://github.com/ceph/ceph/pull/22379, I sugge...
David Zafman
07:58 PM Bug #37393 (In Progress): mimic: osd-backfill-stats.sh fails in rados/standalone/osd.yaml
David Zafman
06:15 PM Bug #38041 (Fix Under Review): Fix recovery and backfill priority handling
Neha Ojha
04:42 PM Bug #38027 (Resolved): osd/osd-backfill-space.sh fails
David Zafman
04:38 PM Bug #38212 (New): Rare qa/standalone/osd/osd-markdown.sh mon start-up error

http://pulpito.ceph.com/dzafman-2019-02-05_11:42:47-rados-wip-zafman-testing2-distro-basic-smithi/3553445
It hap...
David Zafman
12:45 PM Bug #37618 (Resolved): Command failed on smithi191 with status 1: '\n sudo yum -y install ceph-ra...
Nathan Cutler
12:45 PM Backport #37688 (Resolved): mimic: Command failed on smithi191 with status 1: '\n sudo yum -y ins...
Nathan Cutler
04:09 AM Backport #37688: mimic: Command failed on smithi191 with status 1: '\n sudo yum -y install ceph-r...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26201
merged
Yuri Weinstein
12:44 PM Bug #36686 (Resolved): osd: pg log hard limit can cause crash during upgrade
Nathan Cutler
12:44 PM Backport #37902 (Resolved): mimic: osd: pg log hard limit can cause crash during upgrade
Nathan Cutler
04:08 AM Backport #37902: mimic: osd: pg log hard limit can cause crash during upgrade
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26206
merged
Yuri Weinstein
12:33 PM Backport #38206 (Resolved): mimic: osds allows to partially start more than N+2
https://github.com/ceph/ceph/pull/29241 Nathan Cutler
12:33 PM Backport #38205 (Resolved): luminous: osds allows to partially start more than N+2
https://github.com/ceph/ceph/pull/31858 Nathan Cutler
11:02 AM Bug #38076 (Pending Backport): osds allows to partially start more than N+2
Kefu Chai
09:02 AM Bug #37404 (Resolved): OSD mkfs might assert when working against bluestore disk that already has ...
Igor Fedotov
08:57 AM Backport #37496 (Resolved): mimic: OSD mkfs might assert when working against bluestore disk that ...
Igor Fedotov
04:09 AM Backport #37496: mimic: OSD mkfs might assert when working against bluestore disk that already has...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25385
merged
Yuri Weinstein
01:04 AM Documentation #23999: osd_recovery_priority is not documented (but osd_recovery_op_priority is)
I don't see osd_recovery_op_priority documentation, so I propose adding these lines to doc/rados/configuration/pool-p... David Zafman

02/05/2019

11:27 PM Bug #38195: osd-backfill-space.sh exposes rocksdb hang

https://github.com/ceph/ceph/pull/26290 has a workaround for this issue.
David Zafman
07:03 PM Bug #38195: osd-backfill-space.sh exposes rocksdb hang

After adding code to send a SEGV on kill_daemon timeout I got the following stack traces....
David Zafman
07:01 PM Bug #38195 (New): osd-backfill-space.sh exposes rocksdb hang

After increasing the timeout for backfills finish for http://tracker.ceph.com/issues/38027 we see this kill_daemons...
David Zafman
11:22 PM Bug #38198 (Duplicate): ceph-mon sometimes fails to start (only seen in odd-markdown.sh)

http://pulpito.ceph.com/dzafman-2019-02-05_11:42:47-rados-wip-zafman-testing2-distro-basic-smithi/3553445
...
David Zafman
11:17 PM Bug #38070 (Pending Backport): A PG repairing doesn't mean PG is damaged
David Zafman
07:04 PM Bug #38011 (Closed): [Mimic version]extra null list in json output of command: ceph osd crush tre...
Noah Watkins
06:32 PM Bug #38011: [Mimic version]extra null list in json output of command: ceph osd crush tree --forma...
Looks like this was from running an old monitor, and should be fixed in newer release. Noah Watkins
04:56 PM Bug #37886 (Resolved): Adding back the IOPS line for client and recovery IO in cluster logs
Nathan Cutler
04:48 PM Bug #38184 (New): osd: recovery does not preserve copy-on-write allocations between object clones...
Hi. I've already reported it in issue 36614, but here is a more concrete case.
- Start with a bluestore Ceph clust...
Vitaliy Filippov
04:22 PM Backport #38140 (In Progress): luminous: Add hashinfo testing for dump command of ceph-objectstor...
Nathan Cutler
04:20 PM Backport #38141 (In Progress): mimic: Add hashinfo testing for dump command of ceph-objectstore-tool
Nathan Cutler

02/04/2019

10:47 PM Bug #38027: osd/osd-backfill-space.sh fails

dzafman-2019-02-04_11:24:54-rados-wip-zafman-testing2-distro-basic-smithi/3549933
Before mon shutdowns it appear...
David Zafman
10:23 PM Bug #38124: OSD down on snaptrim.
Hello,
I have collected additional information Sage asked. Attached log has debug_osd=20 set.
How this happ...
Darius Kasparavičius
08:49 PM Backport #38107 (Resolved): mimic: Adding back the IOPS line for client and recovery IO in cluste...
https://github.com/ceph/ceph/pull/26208 Neha Ojha
08:47 PM Bug #37886: Adding back the IOPS line for client and recovery IO in cluster logs
merged https://github.com/ceph/ceph/pull/26208 Yuri Weinstein
08:04 PM Bug #38172 (New): segv in rocksdb NewIterator
... Sage Weil
07:58 PM Bug #37393: mimic: osd-backfill-stats.sh fails in rados/standalone/osd.yaml
/a/yuriw-2019-02-02_14:56:39-rados-wip-yuri4-testing-2019-01-31-2315-mimic-distro-basic-smithi/3542409/ Neha Ojha
07:55 PM Bug #38083: mimic: test_kvstore_tool.sh: mkfs failed: (22) Invalid argument
/a/yuriw-2019-02-02_14:56:39-rados-wip-yuri4-testing-2019-01-31-2315-mimic-distro-basic-smithi/3542404/ Neha Ojha
06:09 PM Backport #38108 (Resolved): luminous: Adding back the IOPS line for client and recovery IO in clu...
https://github.com/ceph/ceph/pull/26207 Neha Ojha
11:22 AM Backport #38163 (Resolved): mimic: maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
Included in https://github.com/ceph/ceph/pull/27963 Nathan Cutler
11:22 AM Backport #38162 (Resolved): luminous: maybe_remove_pg_upmaps incorrectly cancels valid pending up...
https://github.com/ceph/ceph/pull/26127 Nathan Cutler
05:50 AM Bug #38159 (New): ec does not recover below min_size
... Sage Weil
05:26 AM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
/a/sage-2019-02-03_18:58:17-rados-wip-sage2-testing-2019-02-03-1047-distro-basic-smithi/3545716
Sage Weil

02/03/2019

04:59 PM Bug #38023: segv on FileJournal::prepare_entry in bufferlist
... Kefu Chai
04:43 PM Bug #24320: out of order reply and/or osd assert with set-chunks-read.yaml
/a/kchai-2019-02-03_02:07:02-rados-wip-kefu2-testing-2019-02-03-0001-distro-basic-smithi/3543791
rados/thrash/{0-s...
Kefu Chai
04:40 PM Bug #20798: LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
/a/kchai-2019-02-03_02:07:02-rados-wip-kefu2-testing-2019-02-03-0001-distro-basic-smithi/3543664/ Kefu Chai
03:24 AM Bug #38155 (Duplicate): PG stuck in undersized+degraded+remapped+backfill_toofull+peered

dzafman-2019-02-02_15:37:09-rados-wip-zafman-testing2-distro-basic-smithi/3542711
Something like this happened b...
David Zafman
03:12 AM Bug #38027: osd/osd-backfill-space.sh fails

After increasing the timeout, saw a different failure. As expected all 4 PG backfills completed and 4 PGs are in ...
David Zafman

02/02/2019

09:25 AM Bug #37968: maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
https://github.com/ceph/ceph/pull/26179 xie xingguo
09:25 AM Bug #37968 (Pending Backport): maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
xie xingguo

02/01/2019

11:11 PM Bug #38027: osd/osd-backfill-space.sh fails
http://pulpito.ceph.com/dzafman-2019-01-30_18:54:50-rados-wip-zafman-testing-distro-basic-smithi/3528763 David Zafman
06:08 PM Bug #38151 (New): cephx: service ticket validity doubled
... Sage Weil
09:18 AM Backport #38141 (Resolved): mimic: Add hashinfo testing for dump command of ceph-objectstore-tool
https://github.com/ceph/ceph/pull/26283 Nathan Cutler
09:18 AM Backport #38140 (Resolved): luminous: Add hashinfo testing for dump command of ceph-objectstore-tool
https://github.com/ceph/ceph/pull/26284 Nathan Cutler
06:49 AM Backport #38106 (In Progress): mimic: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing...
https://github.com/ceph/ceph/pull/26239 Prashant D
03:49 AM Backport #38105 (In Progress): luminous: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_miss...
https://github.com/ceph/ceph/pull/26236 Prashant D
12:29 AM Feature #38136 (Resolved): core: lazy omap stat collection
In Nautilus, this PR - https://github.com/ceph/ceph/pull/18096 will bring very good support for all sizes in `ceph osd ... Vikhyat Umrao
12:12 AM Bug #38135: Ceph is in HEALTH_ERR status with inconsistent PG after some rbd snapshot creating/re...
1, create_rbd.sh, this is for creating rbds
2, create_snapshot.sh, this is for creating snapshots
3, delete_random_...
Bengen Tan

01/31/2019

11:53 PM Bug #38135 (New): Ceph is in HEALTH_ERR status with inconsistent PG after some rbd snapshot creat...
We observe Ceph is in HEALTH_ERR status with inconsistent PG after some rbd snapshot creating/removing task. Here are... Bengen Tan
06:57 PM Backport #37688: mimic: Command failed on smithi191 with status 1: '\n sudo yum -y install ceph-r...
@Nathan That sounds right. Neha Ojha
12:05 PM Bug #38124 (Resolved): OSD down on snaptrim.
All of the Ceph cluster's OSDs crash when Ceph runs snaptrim.
The particular error osd is throwing before crashing ...
Darius Kasparavičius
10:43 AM Bug #24531: Mimic MONs have slow/long running ops
I have restarted mon.node3 and now everything is OK. Марк Коренберг
10:38 AM Bug #24531: Mimic MONs have slow/long running ops
Seems the same:... Марк Коренберг
10:29 AM Bug #37443 (Resolved): crushtool: add --reclassify operation to convert legacy crush maps to use ...
Nathan Cutler
10:28 AM Backport #37437 (Resolved): mimic: crushtool: add --reclassify operation to convert legacy crush ...
Nathan Cutler
10:27 AM Bug #37653 (Resolved): list-inconsistent-obj output truncated, causing osd-scrub-repair.sh failure
Nathan Cutler
10:27 AM Backport #37686 (Resolved): mimic: list-inconsistent-obj output truncated, causing osd-scrub-repa...
Nathan Cutler
10:22 AM Backport #37832 (Resolved): mimic: FAILED assert(is_up(osd)) in OSDMap::get_inst(int)
Nathan Cutler
10:19 AM Backport #38045 (Resolved): mimic: qa/overrides/short_pg_log.yaml: reduce osd_{min,max}_pg_log_en...
Nathan Cutler

01/30/2019

10:21 PM Bug #37975 (Resolved): assert failure in OSDService::shutdown()
Neha Ojha
10:20 PM Bug #38012: osd bad crc cause the whole cluster stop accepting new request.
It seems the first step would be reporting the crc mismatches via a perfcounter. Then the mgr could look at those to ... Josh Durgin
06:38 PM Backport #38107 (In Progress): mimic: Adding back the IOPS line for client and recovery IO in clu...
Vikhyat Umrao
12:57 PM Backport #38107 (Need More Info): mimic: Adding back the IOPS line for client and recovery IO in ...
@Vikhyat - assigning backport to you, since you volunteered to do it in https://tracker.ceph.com/issues/37886#note-10 Nathan Cutler
12:56 PM Backport #38107 (Resolved): mimic: Adding back the IOPS line for client and recovery IO in cluste...
https://github.com/ceph/ceph/pull/26208 Nathan Cutler
06:25 PM Bug #38057: "ceph -s" hangs indefinitely when a machine running a monitor has failed storage.
The node that had the failed SSD is "hoenir"
The node that I'm trying to use ceph commands from is "mimir".
I've ...
Michael Jones
03:37 PM Bug #38057: "ceph -s" hangs indefinitely when a machine running a monitor has failed storage.
Is the dead node the one that isn't in quorum?
What's the ceph.conf on the client that can't complete "ceph -s"?
...
Greg Farnum
06:16 PM Backport #38108 (In Progress): luminous: Adding back the IOPS line for client and recovery IO in ...
Vikhyat Umrao
12:58 PM Backport #38108 (Need More Info): luminous: Adding back the IOPS line for client and recovery IO ...
@Vikhyat - assigning backport to you, since you volunteered to do it in https://tracker.ceph.com/issues/37886#note-10 Nathan Cutler
12:56 PM Backport #38108 (Resolved): luminous: Adding back the IOPS line for client and recovery IO in clu...
https://github.com/ceph/ceph/pull/26207 Nathan Cutler
06:03 PM Backport #37902 (In Progress): mimic: osd: pg log hard limit can cause crash during upgrade
https://github.com/ceph/ceph/pull/26206 Neha Ojha
05:36 PM Bug #38053 (Pending Backport): Add hashinfo testing for dump command of ceph-objectstore-tool
David Zafman
05:30 PM Feature #37935 (Pending Backport): Add clear-data-digest command to objectstore tool
David Zafman
05:07 PM Backport #37437: mimic: crushtool: add --reclassify operation to convert legacy crush maps to use...
Mykola Golub wrote:
> https://github.com/ceph/ceph/pull/25306
merged
Yuri Weinstein
05:03 PM Backport #37686: mimic: list-inconsistent-obj output truncated, causing osd-scrub-repair.sh failure
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25603
merged
Yuri Weinstein
05:00 PM Backport #37832: mimic: FAILED assert(is_up(osd)) in OSDMap::get_inst(int)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25852
merged
Yuri Weinstein
04:56 PM Backport #38045: mimic: qa/overrides/short_pg_log.yaml: reduce osd_{min,max}_pg_log_entries
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26147
merged
Yuri Weinstein
01:15 PM Backport #37688: mimic: Command failed on smithi191 with status 1: '\n sudo yum -y install ceph-r...
@Kefu, @Neha - I rejected the luminous backport because there is no "rados/thrash-old-clients" suite in luminous. Ple... Nathan Cutler
01:08 PM Backport #37688 (In Progress): mimic: Command failed on smithi191 with status 1: '\n sudo yum -y ...
Nathan Cutler
03:11 AM Backport #37688 (New): mimic: Command failed on smithi191 with status 1: '\n sudo yum -y install ...
@Neha, sorry for the latency. And, yes, it's ready for the backport. Kefu Chai
01:06 PM Backport #37687 (Rejected): luminous: Command failed on smithi191 with status 1: '\n sudo yum -y ...
rados/thrash-old-clients does not exist in luminous Nathan Cutler
01:00 PM Bug #37507 (Resolved): osd_memory_target: failed assert when options mismatch
Nathan Cutler
12:58 PM Backport #38111 (Resolved): mimic: force-backfill sets forced_recovery instead of forced_backfill...
https://github.com/ceph/ceph/pull/26324 Nathan Cutler
12:56 PM Backport #38106 (Resolved): mimic: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(ho...
https://github.com/ceph/ceph/pull/26239 Nathan Cutler
12:56 PM Backport #38105 (Resolved): luminous: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing...
https://github.com/ceph/ceph/pull/26236 Nathan Cutler
12:55 PM Backport #38096 (Resolved): mimic: doc/rados/configuration: refresh osdmap section
https://github.com/ceph/ceph/pull/26373 Nathan Cutler
12:55 PM Backport #38095 (Resolved): luminous: doc/rados/configuration: refresh osdmap section
https://github.com/ceph/ceph/pull/26372 Nathan Cutler
03:10 AM Bug #37618: Command failed on smithi191 with status 1: '\n sudo yum -y install ceph-radosgw\n '
just a note: https://github.com/ceph/teuthology/pull/1246 should be able to address the issue completely. Kefu Chai
12:19 AM Bug #37919 (Pending Backport): osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid))
Neha Ojha

01/29/2019

09:31 PM Bug #38057: "ceph -s" hangs indefinitely when a machine running a monitor has failed storage.
I'll be performing maintenance on this machine soon.
This'll be the only chance anyone gets to get more debugging ...
Michael Jones
06:44 PM Bug #38083 (Resolved): mimic: test_kvstore_tool.sh: mkfs failed: (22) Invalid argument
... Neha Ojha
06:39 PM Bug #37393: mimic: osd-backfill-stats.sh fails in rados/standalone/osd.yaml
/a/nojha-2019-01-29_03:40:43-rados-wip-37902-mimic-2019-01-28-distro-basic-smithi/3522520/ Neha Ojha
06:38 PM Bug #38082: mimic: mon/caps.sh fails with "Expected return 0, got 110"
/a/teuthology-2018-12-29_02:30:02-rados-mimic-distro-basic-smithi/3403799/ Neha Ojha
06:36 PM Bug #38082 (New): mimic: mon/caps.sh fails with "Expected return 0, got 110"
... Neha Ojha
06:18 PM Bug #23879: test_mon_osdmap_prune.sh fails
Seen in mimic /a/nojha-2019-01-29_03:40:43-rados-wip-37902-mimic-2019-01-28-distro-basic-smithi/3522485/ Neha Ojha
05:01 PM Bug #38077 (New): Marking all OSDs as "out" does not trigger a HEALTH_ERR state
Just tested this on my local 5 OSD dev environment, but this likely applies to any given cluster: when setting the cl... Lenz Grimmer
04:37 PM Bug #38076: osds allows to partially start more than N+2
https://github.com/ceph/ceph/pull/26177 Sage Weil
04:37 PM Bug #38076 (Resolved): osds allows to partially start more than N+2
- jewel osds
- install mimic
- try to start osds. they fail because of compatset checks etc
- ... but mimic rocks...
Sage Weil
04:21 PM Bug #38034: pg stuck in backfill_wait with plenty of disk space

During preemption what ensures that backfill node processes the following messages from primary in order?
Primar...
David Zafman
06:04 AM Bug #38034: pg stuck in backfill_wait with plenty of disk space

I think this is where things went wrong. We've seen something like this in the past, I think. Here the osd.6 rese...
David Zafman
02:58 AM Bug #38034: pg stuck in backfill_wait with plenty of disk space

Here are the enter/exit lines on the primary where we entered backfilling and then went to backfill_wait for the la...
David Zafman
02:53 AM Bug #38034: pg stuck in backfill_wait with plenty of disk space

Analysis so far:
Maybe this is a backfill preemption issue. The pg is in backfill_wait state after getting Remo...
David Zafman
12:23 PM Backport #38073 (In Progress): luminous: build/ops: Allow multi instances of "make tests" on the ...
https://github.com/ceph/ceph/pull/26186 Kefu Chai
12:21 PM Backport #38073 (Resolved): luminous: build/ops: Allow multi instances of "make tests" on the sam...
https://github.com/ceph/ceph/pull/26186 Kefu Chai

01/28/2019

10:35 PM Bug #38070 (Fix Under Review): A PG repairing doesn't mean PG is damaged
David Zafman
10:35 PM Bug #38070: A PG repairing doesn't mean PG is damaged
https://github.com/ceph/ceph/pull/26178 David Zafman
10:27 PM Bug #38070 (Resolved): A PG repairing doesn't mean PG is damaged
David Zafman
09:00 PM Bug #37919 (Fix Under Review): osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid))
Neha Ojha
08:38 PM Bug #38069 (New): upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_roll...
Run: http://pulpito.ceph.com/yuriw-2019-01-24_16:20:56-upgrade:jewel-x-luminous-distro-basic-smithi/
Jobs: '3501809'...
Yuri Weinstein
03:17 PM Bug #38066 (New): "AdminSocketConfigObs::init: failed:" in upgrade:mimic-x-master
Run: http://pulpito.ceph.com/teuthology-2019-01-25_02:30:02-upgrade:mimic-x-master-distro-basic-smithi/
Jobs: all
L...
Yuri Weinstein
02:55 PM Bug #37269 (Resolved): Prioritize user specified scrubs
Nathan Cutler
02:55 PM Backport #37342 (Resolved): mimic: Prioritize user specified scrubs
Nathan Cutler
02:55 PM Bug #37507: osd_memory_target: failed assert when options mismatch
Nathan Cutler
02:55 PM Backport #37698 (Resolved): mimic: osd_memory_target: failed assert when options mismatch
Nathan Cutler
12:57 PM Bug #38064 (Duplicate): librados::OPERATION_FULL_TRY not completely implemented, test LibRadosAio...
Test LibRadosAio.PoolQuotaPP hung on
/a/sage-2019-01-28_03:48:46-rados-wip-sage2-testing-2019-01-27-1015-distro-ba...
Sage Weil
08:09 AM Bug #38062 (Resolved): proxy write misordering
1-pg-log-overrides/short_pg_log.yaml
the cache tier osd trimmed the event for the older op on the object, which di...
Sage Weil
07:34 AM Bug #38062 (Resolved): proxy write misordering
out of order replies...... Sage Weil
03:34 AM Bug #38057 (New): "ceph -s" hangs indefinitely when a machine running a monitor has failed storage.
TL;DR; -- the bug is that "ceph -s" hangs indefinitely. It should report failure eventually.
I have a 3 node clu...
Michael Jones

01/27/2019

08:44 PM Bug #37886 (Pending Backport): Adding back the IOPS line for client and recovery IO in cluster logs
Neha Ojha
03:38 AM Bug #38053 (Fix Under Review): Add hashinfo testing for dump command of ceph-objectstore-tool
David Zafman
03:37 AM Bug #38053: Add hashinfo testing for dump command of ceph-objectstore-tool
https://github.com/ceph/ceph/pull/26158 David Zafman

01/26/2019

04:44 PM Bug #24531: Mimic MONs have slow/long running ops
I've now encountered this on a total of 3 different clusters with 13.2.2 and 13.2.4 Paul Emmerich
04:32 PM Bug #37886: Adding back the IOPS line for client and recovery IO in cluster logs
Neha Ojha
12:18 AM Bug #37886 (Pending Backport): Adding back the IOPS line for client and recovery IO in cluster logs
Once it is merged in master I can backport it to mimic and luminous. Vikhyat Umrao
12:14 AM Bug #37886: Adding back the IOPS line for client and recovery IO in cluster logs
Hi Neha,
As discussed, I did some testing in the luminous branch after adding this patch, and the changes look great and wo...
Vikhyat Umrao
01:19 AM Bug #38053 (Resolved): Add hashinfo testing for dump command of ceph-objectstore-tool

Also, this test is broken in master, so fix that too.
David Zafman
12:50 AM Bug #27985 (Pending Backport): force-backfill sets forced_recovery instead of forced_backfill in ...
David Zafman

01/25/2019

07:14 PM Documentation #38051 (Resolved): doc/rados/configuration: refresh osdmap section
"osd map cache size" and "osd map message max" were reduced in commit
855955e ("osd: reduce size of osdmap cache, me...
Neha Ojha
04:09 PM Bug #38027: osd/osd-backfill-space.sh fails
http://pulpito.ceph.com/kchai-2019-01-25_08:53:00-rados-wip-kefu2-testing-2019-01-22-2130-distro-basic-smithi/3505875/ Kefu Chai
04:08 PM Backport #37342: mimic: Prioritize user specified scrubs
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25513
merged
Yuri Weinstein
04:08 PM Backport #37698: mimic: osd_memory_target: failed assert when options mismatch
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25605
merged
Yuri Weinstein
04:06 PM Backport #37814 (Resolved): mimic: workunits/rados/test_health_warnings.sh fails with <9 osds down
Nathan Cutler
04:04 PM Backport #37814: mimic: workunits/rados/test_health_warnings.sh fails with <9 osds down
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/25850
merged
Yuri Weinstein
02:42 PM Bug #24531: Mimic MONs have slow/long running ops
I see the same symptoms on a system running 13.2.2 - each monitor has a small number of slow ops, all initiated withi... Stig Telfer
01:05 PM Backport #38046 (In Progress): luminous: qa/overrides/short_pg_log.yaml: reduce osd_{min,max}_pg_...
Ashish Singh
10:39 AM Backport #38046 (Resolved): luminous: qa/overrides/short_pg_log.yaml: reduce osd_{min,max}_pg_log...
https://github.com/ceph/ceph/pull/26148 Nathan Cutler
12:59 PM Backport #38045 (In Progress): mimic: qa/overrides/short_pg_log.yaml: reduce osd_{min,max}_pg_log...
Ashish Singh
10:39 AM Backport #38045 (Resolved): mimic: qa/overrides/short_pg_log.yaml: reduce osd_{min,max}_pg_log_en...
https://github.com/ceph/ceph/pull/26147 Nathan Cutler
03:51 AM Cleanup #38042 (Resolved): qa/suites/rados/thrash: change crush_tunables to jewel in rados_api_tests
Neha Ojha
12:56 AM Cleanup #38042 (Fix Under Review): qa/suites/rados/thrash: change crush_tunables to jewel in rado...
Neha Ojha
12:50 AM Cleanup #38042 (Resolved): qa/suites/rados/thrash: change crush_tunables to jewel in rados_api_tests
Neha Ojha
01:20 AM Bug #38041: Fix recovery and backfill priority handling

A PG in backfill_wait can be set to force-backfill state, but the reservation request has already been queued at a ...
David Zafman
12:06 AM Bug #38041 (Resolved): Fix recovery and backfill priority handling

David Zafman
01:18 AM Cleanup #38025 (Pending Backport): qa/overrides/short_pg_log.yaml: reduce osd_{min,max}_pg_log_en...
Neha Ojha

01/24/2019

11:59 PM Bug #37393: mimic: osd-backfill-stats.sh fails in rados/standalone/osd.yaml
This is still failing in mimic.
/a/yuriw-2019-01-16_22:57:44-rados-wip-yuri3-testing-2019-01-16-2038-mimic-distro-...
Neha Ojha
11:56 PM Backport #37688: mimic: Command failed on smithi191 with status 1: '\n sudo yum -y install ceph-r...
Seeing more of these on mimic.
Kefu, is this ready to be backported?
Neha Ojha
09:05 PM Bug #38040: osd_map_message_max default is too high?
Assigning Sage, as the author of commit 855955e58e63 ("osd: reduce size of osdmap cache, messages"). Ilya Dryomov
09:04 PM Bug #38040 (Resolved): osd_map_message_max default is too high?
In a thread on ceph-users [1], three different users with fairly large clusters (~600 OSDs, ~3500 OSDs) reported runn... Ilya Dryomov
12:31 PM Bug #38034 (Resolved): pg stuck in backfill_wait with plenty of disk space
... Sage Weil
03:49 AM Bug #37886 (Fix Under Review): Adding back the IOPS line for client and recovery IO in cluster logs
Neha Ojha
01:38 AM Bug #38012: osd bad crc cause the whole cluster stop accepting new request.
Josh Durgin wrote:
> This is likely to be bad networking hardware - the CRC at the ceph level that is failing is des...
frank lin
01:15 AM Feature #38029 (Resolved): [RFE] If the nodeep-scrub/noscrub flags are set in pools instead of gl...
[RFE] If the nodeep-scrub/noscrub flags are set on pools instead of on the global cluster, list the pool names in the ceph s... Vikhyat Umrao
01:11 AM Bug #38027: osd/osd-backfill-space.sh fails

This doesn't look like a big deal. The test expected backfilling to finish within 2 minutes. According to the log...
David Zafman
12:23 AM Bug #38027 (Resolved): osd/osd-backfill-space.sh fails
... Sage Weil
12:24 AM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
/a/sage-2019-01-23_18:09:58-rados-wip-sage2-testing-2019-01-23-0826-distro-basic-smithi/3497934 Sage Weil

01/23/2019

10:17 PM Bug #38012: osd bad crc cause the whole cluster stop accepting new request.
This is likely to be bad networking hardware - the CRC at the ceph level that is failing is designed to detect exactl... Josh Durgin
09:18 AM Bug #38012 (New): osd bad crc cause the whole cluster stop accepting new request.
I have encountered this problem on both a jewel cluster and a luminous cluster.
The symptom is that some requests will be blocke...
frank lin
10:16 PM Bug #37975: assert failure in OSDService::shutdown()
Neha Ojha
10:15 PM Bug #37978 (Duplicate): osd killed by kernel for Segmentation fault
Josh Durgin
09:04 PM Bug #24531: Mimic MONs have slow/long running ops
I am also seeing this on the latest mimic (13.2.4). So far it seems like it's cosmetic and has no impact.... Tobias Rehn
08:49 PM Cleanup #38025 (Fix Under Review): qa/overrides/short_pg_log.yaml: reduce osd_{min,max}_pg_log_en...
Neha Ojha
08:45 PM Cleanup #38025 (Resolved): qa/overrides/short_pg_log.yaml: reduce osd_{min,max}_pg_log_entries
We have noticed that a very short pg log helps catch more bugs, hence make osd_min_pg_log_entries=1 and osd_max_pg_lo... Neha Ojha
08:01 PM Bug #36739: ENOENT in collection_move_rename on EC backfill target
/a/nojha-2019-01-23_02:37:14-rados:thrash-erasure-code-master-distro-basic-smithi/3494110 Sage Weil
07:58 PM Bug #38024 (Resolved): segv, heap corruption in ec encode_and_write
... Sage Weil
07:55 PM Bug #38023 (Closed): segv on FileJournal::prepare_entry in bufferlist
... Sage Weil
07:53 PM Bug #37509: require past_interval bounds mismatch due to osd oldest_map
/a/nojha-2019-01-23_02:37:14-rados:thrash-erasure-code-master-distro-basic-smithi/3494085/ Neha Ojha
05:23 PM Bug #38011: [Mimic version]extra null list in json output of command: ceph osd crush tree --forma...
Not a ceph-deploy issue but should fall under ceph Vasu Kulkarni
04:47 AM Bug #38011: [Mimic version]extra null list in json output of command: ceph osd crush tree --forma...
Changcheng Liu wrote:
> Changcheng Liu wrote:
> > The extra null list should be removed
> > [{"id":-1,"name":"d...
Changcheng Liu
04:47 AM Bug #38011: [Mimic version]extra null list in json output of command: ceph osd crush tree --forma...
Changcheng Liu wrote:
> The extra null list should be removed
> [{"id":-1,"name":"default","type":"root","type_i...
Changcheng Liu
04:46 AM Bug #38011: [Mimic version]extra null list in json output of command: ceph osd crush tree --forma...
The extra null list should be removed
[{"id":-1,"name":"default","type":"root","type_id":10,"children":[]}]*[]*
Changcheng Liu
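To illustrate why the trailing null list matters: a strict JSON parser rejects the whole output even though the crush tree itself is well formed. This is a minimal sketch using only Python's standard `json` module; the input string is the output quoted in the comment, assuming the `*` characters around the extra `[]` are tracker emphasis markup rather than part of the command output.

```python
import json

# Output quoted in the comment, with the tracker's *...* emphasis removed (assumption).
RAW = '[{"id":-1,"name":"default","type":"root","type_id":10,"children":[]}][]'


def strict_parse(text):
    """Return (value, None) if text is exactly one JSON document, else (None, error message)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc.msg


value, err = strict_parse(RAW)
print(err)  # Extra data

# raw_decode stops after the first complete value and reports where it ended,
# which shows the crush tree itself is valid and only the trailing "[]" is extra.
tree, end = json.JSONDecoder().raw_decode(RAW)
print(tree[0]["name"], repr(RAW[end:]))  # default '[]'
```

Any consumer that calls a strict `loads`-style parser on the full output fails in this way, which is why removing the extra list is preferable to working around it on the client side.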
04:40 AM Bug #38011 (Closed): [Mimic version]extra null list in json output of command: ceph osd crush tre...
When executing the command below to get the osd crush tree on the Mimic version, it outputs an extra null list [], which makes jso... Changcheng Liu
11:44 AM Feature #36737: Allow multi instances of "make tests" on the same machine
https://github.com/ceph/ceph/pull/26091 Kefu Chai
11:31 AM Bug #37966 (Resolved): cli: dump osd-fsid as part of osd find <id>
Nathan Cutler
04:36 AM Bug #36498: failed to recover before timeout expired due to pg stuck in creating+peering
Still seeing PG stuck in "creating+peering".
/a/nojha-2019-01-23_02:37:14-rados:thrash-erasure-code-master-distro-...
Neha Ojha
02:47 AM Bug #37980: luminous: osd memery use very high,and missmatch between res and heap stats
Thanks a lot.
> disabling THP or setting max_ptes_none to 0
I will try this later and see if that helps. Since it ...
zhou yang

01/22/2019

05:07 PM Bug #37264: scrub warning check incorrectly uses mon scrub interval
David Zafman
05:06 PM Bug #19753 (Resolved): Deny reservation if expected backfill size would put us over backfill_full...
David Zafman
05:06 PM Bug #24801 (Resolved): PG num_bytes becomes huge
David Zafman
04:02 PM Backport #37984 (Resolved): mimic: cli: dump osd-fsid as part of osd find <id>
Neha Ojha
03:35 PM Backport #37984: mimic: cli: dump osd-fsid as part of osd find <id>
https://github.com/ceph/ceph/pull/26035 Neha Ojha
04:02 PM Backport #37985 (Resolved): luminous: cli: dump osd-fsid as part of osd find <id>
Neha Ojha
03:35 PM Backport #37985: luminous: cli: dump osd-fsid as part of osd find <id>
https://github.com/ceph/ceph/pull/26036 Neha Ojha
03:55 PM Backport #37993 (In Progress): luminous: ec pool lost data due to snap clone
Ashish Singh
03:50 PM Backport #37992 (In Progress): mimic: ec pool lost data due to snap clone
Ashish Singh
03:47 PM Bug #37980: luminous: osd memery use very high,and missmatch between res and heap stats
Hi,
Oftentimes this kind of thing is related to transparent huge pages. There definitely seems to be different k...
Mark Nelson
02:16 PM Bug #37980: luminous: osd memery use very high,and missmatch between res and heap stats
> ceph 12.2.1
Are you really running that version, 12.2.1?
Nathan Cutler
03:38 AM Bug #37980: luminous: osd memery use very high,and missmatch between res and heap stats
I am using bluestore, and my client is rbd with an ec data pool.
The cluster is running on CentOS 7.0.1406, tcmalloc ver...
zhou yang
02:54 PM Backport #37995 (In Progress): luminous: Change osd_objectstore default to bluestore
Nathan Cutler
02:52 PM Backport #37994 (In Progress): mimic: Change osd_objectstore default to bluestore
Nathan Cutler
12:58 PM Backport #37903 (Resolved): luminous: osd: pg log hard limit can cause crash during upgrade
Nathan Cutler
12:40 PM Bug #36515 (Resolved): config options: 'services' field is empty for many config options
Nathan Cutler
09:38 AM Bug #38000 (Duplicate): The osd shutdown procedure accesses the memory that has been released
https://tracker.ceph.com/issues/37975 Igor Fedotov
03:54 AM Bug #38000: The osd shutdown procedure accesses the memory that has been released
int OSD::shutdown()
{
store->umount();
delete store; // The cache is destroyed
store = 0;
...
...
tao ning
03:52 AM Bug #38000 (Duplicate): The osd shutdown procedure accesses the memory that has been released
[Switching to thread 2 (Thread 0x7f7314cc8700 (LWP 32025))]
#0 0x00007f73395a842d in __lll_lock_wait () from /lib64...
tao ning
09:24 AM Bug #37871: Ceph cannot connect to any monitors if one of them has a DNS resolution problem
In practical terms, what's the difference between not being able to connect because the host name cannot be resolved,... Jairo Llopis
03:32 AM Bug #37871: Ceph cannot connect to any monitors if one of them has a DNS resolution problem
I think the unresolvable address(es) are more of a configuration issue, and we should not ignore this. It's quite diff... Kefu Chai

01/21/2019

10:19 PM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
Sounds like Dan's is behaving as expected, but if there's any more info about Bryan's let us know. Greg Farnum
11:58 AM Bug #37980: luminous: osd memery use very high,and missmatch between res and heap stats
And what OS are you using? Igor Fedotov
11:56 AM Bug #37980: luminous: osd memery use very high,and missmatch between res and heap stats
Are you using FileStore or BlueStore? Igor Fedotov
03:36 AM Bug #37980 (New): luminous: osd memery use very high,and missmatch between res and heap stats
ceph 12.2.1
3 nodes, 30 osds per node
ec pool: 4+2
After running for 2 months, we found that some osds' memory use is very h...
zhou yang
10:36 AM Backport #37904 (In Progress): mimic: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in Pr...
Nathan Cutler
10:05 AM Backport #37905 (In Progress): luminous: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in...
Nathan Cutler
09:16 AM Backport #37995 (Rejected): luminous: Change osd_objectstore default to bluestore
https://github.com/ceph/ceph/pull/26076 Nathan Cutler
09:16 AM Backport #37994 (Rejected): mimic: Change osd_objectstore default to bluestore
https://github.com/ceph/ceph/pull/26075 Nathan Cutler
09:15 AM Backport #37993 (Resolved): luminous: ec pool lost data due to snap clone
https://github.com/ceph/ceph/pull/26078 Nathan Cutler
09:15 AM Backport #37992 (Resolved): mimic: ec pool lost data due to snap clone
https://github.com/ceph/ceph/pull/26077 Nathan Cutler
09:14 AM Backport #37985 (Resolved): luminous: cli: dump osd-fsid as part of osd find <id>
https://github.com/ceph/ceph/pull/26036 Nathan Cutler
09:13 AM Backport #37984 (Resolved): mimic: cli: dump osd-fsid as part of osd find <id>
Nathan Cutler
02:22 AM Bug #37978 (Duplicate): osd killed by kernel for Segmentation fault
My env is:
[root@gz-ceph-52-204 ceph]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@gz-ceph-...
伟杰 谭

01/20/2019

06:31 AM Bug #37975 (Fix Under Review): assert failure in OSDService::shutdown()
Kefu Chai
05:27 AM Bug #37975: assert failure in OSDService::shutdown()
The return value was 22 (EINVAL), as the mutex being acquired had already been destroyed. Kefu Chai
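For readers decoding raw return values like this one: on Linux, 22 is the errno value EINVAL ("Invalid argument"). The sketch below maps a numeric code to its symbolic name using Python's standard `errno` and `os` modules; `describe_errno` is a hypothetical helper for illustration, not part of Ceph.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Hypothetical helper: render a raw errno value as 'code = NAME (message)'."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} = {name} ({os.strerror(code)})"


print(describe_errno(22))  # on Linux: 22 = EINVAL (Invalid argument)
```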
05:12 AM Bug #37975 (Resolved): assert failure in OSDService::shutdown()
... Kefu Chai
02:38 AM Bug #37593 (Pending Backport): ec pool lost data due to snap clone
Kefu Chai

01/19/2019

04:23 PM Backport #37972 (In Progress): luminous: FreeBSD/Linux integration - monitor map with wrong sa_fa...
Mykola Golub
04:22 PM Backport #37972: luminous: FreeBSD/Linux integration - monitor map with wrong sa_family
PR: https://github.com/ceph/ceph/pull/26042 Mykola Golub
04:06 PM Backport #37972: luminous: FreeBSD/Linux integration - monitor map with wrong sa_family
Need to backport https://github.com/ceph/ceph/pull/17615/commits/9099ca599de5238cde917f1e1f933247392de03e Mykola Golub
04:05 PM Backport #37972 (Resolved): luminous: FreeBSD/Linux integration - monitor map with wrong sa_family
https://github.com/ceph/ceph/pull/26042 Mykola Golub
09:23 AM Backport #37438 (Resolved): luminous: crushtool: add --reclassify operation to convert legacy cru...
Mykola Golub
02:07 AM Bug #37969 (Can't reproduce): ENOENT on setattrs
... Sage Weil

01/18/2019

11:04 PM Bug #23145: OSD crashes during recovery of EC pg
I've generated a log for this at https://www.dropbox.com/s/8zoos5hhvakcpc4/ceph-osd.3.log?dl=0
haven't been able t...
Peter Woodman
10:43 PM Bug #37968 (Resolved): maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
It appears that OSDMap::maybe_remove_pg_upmaps's sanity checks are overzealous. With some crush rules it is possible ... Ed Fisher
09:56 PM Backport #37438: luminous: crushtool: add --reclassify operation to convert legacy crush maps to ...
Mykola Golub wrote:
> https://github.com/ceph/ceph/pull/25307
merged
Yuri Weinstein
08:38 PM Backport #37903: luminous: osd: pg log hard limit can cause crash during upgrade
Neha Ojha wrote:
> https://github.com/ceph/ceph/pull/25949
merged
Yuri Weinstein
07:22 PM Backport #37903: luminous: osd: pg log hard limit can cause crash during upgrade
https://github.com/ceph/ceph/pull/25949 Neha Ojha
01:11 PM Backport #37903 (Need More Info): luminous: osd: pg log hard limit can cause crash during upgrade
Marking "Need More Info" just to make sure backporting team doesn't take it by accident. Nathan Cutler
07:44 PM Bug #37966 (Resolved): cli: dump osd-fsid as part of osd find <id>
https://github.com/ceph/ceph/pull/26015 Neha Ojha
05:32 PM Bug #37965 (Can't reproduce): rados/upgrade test fails
Recent regression. Looking at /a/sage-2019-01-18_06:11:36-rados-wip-sage-testing-2019-01-17-2111-distro-basic-smithi... Sage Weil
02:37 PM Bug #24676 (Pending Backport): FreeBSD/Linux integration - monitor map with wrong sa_family
Richard, I don't think 9099ca5 was ever backported to luminous. If you want to get it fixed sooner in luminous, proba... Kefu Chai
01:26 PM Bug #36515: config options: 'services' field is empty for many config options
I think with https://github.com/ceph/ceph/pull/25456 the issue can be resolved. I'm not allowed to do it myself. Tatjana Dehler
01:11 PM Backport #37902 (Need More Info): mimic: osd: pg log hard limit can cause crash during upgrade
Marking "Need More Info" just to make sure backporting team doesn't take it by accident. Nathan Cutler
03:29 AM Bug #19753: Deny reservation if expected backfill size would put us over backfill_full_ratio
Neha Ojha
12:51 AM Bug #36494 (Pending Backport): Change osd_objectstore default to bluestore
Sage Weil

01/17/2019

03:43 PM Bug #37910: segv during crc of incoming message front
Putting this on the shelf for the sake of msgr V2.
Runs on wip-bug-37910 with **client** failures:
* http://pulpito.ceph.co...
Radoslaw Zarzynski
11:41 AM Bug #36741 (Resolved): debian: packaging need to reflect move of /etc/bash_completion.d/radosgw-a...
Nathan Cutler
11:40 AM Backport #37274 (Resolved): luminous: debian: packaging need to reflect move of /etc/bash_complet...
Nathan Cutler

01/16/2019

02:40 PM Bug #37910: segv during crc of incoming message front
Hmm, interesting. The same thread 0x7f6ea2dad700 is handling two instances of AsyncConnection: 0x5615360ef000
and th...
Radoslaw Zarzynski
10:32 AM Backport #37806 (Resolved): luminous: OSD logs are not logging slow requests
Nathan Cutler
10:15 AM Feature #37935 (Resolved): Add clear-data-digest command to objectstore tool
There may be a situation where data digest in object info is
inconsistent with that computed from object data, then ...
Nathan Cutler
12:42 AM Bug #37930 (New): osd/PrimaryLogPG.cc: 11997: FAILED ceph_assert(object_contexts.empty())
@2019-01-15T11:29:05.078 INFO:tasks.ceph.osd.1.smithi055.stderr:2019-01-15 11:29:05.069 7f35017d1700 -1 osd.1 pg_epoc... xie xingguo

01/15/2019

09:37 PM Bug #37910: segv during crc of incoming message front
... Radoslaw Zarzynski
08:34 PM Bug #37910: segv during crc of incoming message front
... Radoslaw Zarzynski
05:18 PM Bug #37919: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid))
Looks like we are testing with leveldb here, not sure that matters for the purpose of this bug, but we could get rid ... Neha Ojha
01:14 PM Bug #37919 (Resolved): osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid))
... Sage Weil
03:52 PM Bug #36163 (Resolved): mon osdmap cash too small during upgrade to mimic
Nathan Cutler
03:51 PM Backport #36506 (Resolved): luminous: mon osdmap cash too small during upgrade to mimic
Nathan Cutler
03:35 PM Backport #37343 (Resolved): luminous: Prioritize user specified scrubs
Nathan Cutler
03:34 PM Backport #37697 (Resolved): luminous: osd_memory_target: failed assert when options mismatch
Nathan Cutler
02:40 PM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
Hrm.. actually, after enabling debug_paxos=10 on the mon leader, I see that there's a hysteresis between 500 and 750:... Dan van der Ster
02:12 PM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
I just updated a cluster from v12.2.8 to 12.2.10.
At the beginning we had:
"oldest_map": 281368,
"newes...
Dan van der Ster
01:56 PM Bug #22597 (Resolved): "sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-0'" fails in upgrade test
Nathan Cutler
01:56 PM Backport #37288 (Resolved): mimic: "sudo chown -R ceph:ceph /var/lib/ceph/osd/ceph-0'" fails in u...
Nathan Cutler
09:14 AM Bug #24531: Mimic MONs have slow/long running ops
I've seen this on a 13.2.2 cluster after restarting OSDs Paul Emmerich
06:55 AM Backport #37904: mimic: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::repl...
https://github.com/ceph/ceph/pull/25958 xie xingguo
06:06 AM Documentation #24924 (Resolved): doc: typo in crush-map docs
xie xingguo
04:07 AM Documentation #24924: doc: typo in crush-map docs
Don't care one way or another. Go ahead if you want.
Michael Jones
03:52 AM Documentation #24924: doc: typo in crush-map docs
Hi Michael,
Thank you for reporting this typo. I opened a PR to correct it. Is it ok if I credit you for this repo...
James McClune
05:54 AM Backport #37905: luminous: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::r...
https://github.com/ceph/ceph/pull/25956 xie xingguo

01/14/2019

11:49 PM Bug #37915 (Can't reproduce): osd: Segmentation fault in OpRequest::_unregistered
... Patrick Donnelly
03:06 PM Bug #37911 (Can't reproduce): osd dequeue misorder
... Sage Weil
01:25 PM Bug #37910 (Resolved): segv during crc of incoming message front
... Sage Weil
01:21 PM Feature #36474 (Resolved): Add support for osd_delete_sleep configuration value
Nathan Cutler
01:21 PM Backport #36729 (Resolved): mimic: Add support for osd_delete_sleep configuration value
Nathan Cutler
10:43 AM Backport #37905 (Resolved): luminous: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in Pr...
https://github.com/ceph/ceph/pull/25956 Nathan Cutler
10:43 AM Backport #37904 (Resolved): mimic: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in Proto...
https://github.com/ceph/ceph/pull/25958 Nathan Cutler
10:42 AM Backport #37903 (Resolved): luminous: osd: pg log hard limit can cause crash during upgrade
https://github.com/ceph/ceph/pull/25949 Nathan Cutler
10:42 AM Backport #37902 (Resolved): mimic: osd: pg log hard limit can cause crash during upgrade
https://github.com/ceph/ceph/pull/26206 Nathan Cutler
 
