Activity

From 08/21/2019 to 09/19/2019

09/19/2019

11:12 PM Bug #41817: qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
The fix for this particular issue is simply to disable the auto scaler, because it only causes a hang in the test but no cr... David Zafman
10:59 PM Bug #41923: 3 different ceph-osd asserts caused by enabling auto-scaler

I think this stack better reflects the thread that hit the suicide timeout. However, every time I've seen this thre...
David Zafman
09:41 PM Bug #41923: 3 different ceph-osd asserts caused by enabling auto-scaler

Look at the assert(op.hinfo); it is caused by the corruption injected by the test. I'll verify that the asserts are...
David Zafman
12:05 AM Bug #41923 (Can't reproduce): 3 different ceph-osd asserts caused by enabling auto-scaler

Change config osd_pool_default_pg_autoscale_mode to "on"
Saw these 4 core dumps on 3 different sub-tests.
../...
David Zafman
04:51 PM Bug #41936 (Fix Under Review): scrub errors after quick split/merge cycle
Sage Weil
04:51 PM Bug #41936 (Resolved): scrub errors after quick split/merge cycle
PGs split and then merge soon after. There is a pg stat scrub mismatch. Sage Weil
04:48 PM Bug #41834: qa: EC Pool configuration and slow op warnings for OSDs caused by recent master changes
This shows up in rgw's ec pool tests also. In osd logs, I see slow ops on MOSDECSubOpRead/Reply messages, and they al... Casey Bodley
09:32 AM Feature #41647: pg_autoscaler should show a warning if pg_num isn't a power of two
Note: contrary to what the bug description says, pg_autoscaler will (apparently) *not* be automatically turned on wit... Nathan Cutler
01:56 AM Bug #41924 (Resolved): asynchronous recovery can not function under certain circumstances
guoracle reports that:
> In the asynchronous recovery feature,
> the asynchronous recovery target OSD is selected ...
xie xingguo
01:39 AM Bug #41866: OSD cannot report slow operation warnings in time.
*report_callback* thread is also blocked on PG::lock with MGRClient::lock locked while getting the pg stats. This in ... Ilsoo Byun
12:54 AM Bug #41816: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert info.last_comp...

This can be reproduced by setting config osd_pool_default_pg_autoscale_mode="on" and executing this test:
../qa/...
David Zafman
12:29 AM Bug #41754: Use dump_stream() instead of dump_float() for floats where max precision isn't helpful

I was suspicious that the trailing 0999999994 in the elapsed time is noise. Could this be caused by a float being...
David Zafman

09/18/2019

06:33 PM Backport #41922 (Resolved): mimic: OSDMonitor: missing `pool_id` field in `osd pool ls` command
https://github.com/ceph/ceph/pull/30485
Nathan Cutler
06:33 PM Backport #41921 (Resolved): nautilus: OSDMonitor: missing `pool_id` field in `osd pool ls` command
https://github.com/ceph/ceph/pull/30486 Nathan Cutler
06:31 PM Backport #41920 (Resolved): nautilus: osd: scrub error on big objects; make bluestore refuse to s...
https://github.com/ceph/ceph/pull/30783 Nathan Cutler
06:31 PM Backport #41919 (Resolved): luminous: osd: scrub error on big objects; make bluestore refuse to s...
https://github.com/ceph/ceph/pull/30785 Nathan Cutler
06:31 PM Backport #41918 (Resolved): mimic: osd: scrub error on big objects; make bluestore refuse to star...
https://github.com/ceph/ceph/pull/30784 Nathan Cutler
06:31 PM Backport #41917 (Resolved): nautilus: osd: failure result of do_osd_ops not logged in prepare_tra...
https://github.com/ceph/ceph/pull/30546 Nathan Cutler
04:25 PM Bug #41900 (Resolved): auto-scaler breaks many standalone tests
David Zafman
03:38 PM Bug #41913 (Resolved): With auto scaler operating stopping an OSD can lead to COT crashing instea...
... David Zafman
03:03 PM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
Answering myself - seems that rbd_support cannot be disabled anyway
# ceph mgr module disable rbd_support
Error E...
Marcin Gibula
10:59 AM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
I don't believe this command was running at that time, however "rbd_support" mgr module was active. Could this be the... Marcin Gibula
10:53 AM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
Marcin, I believe I know the cause and I am now discussing the fix [1]. A workaround could be not to use "rbd perf im... Mykola Golub
10:13 AM Bug #41891 (Fix Under Review): global osd crash in DynamicPerfStats::add_to_reports
Mykola Golub
06:24 AM Bug #41891 (In Progress): global osd crash in DynamicPerfStats::add_to_reports
Mykola Golub
01:55 PM Bug #41908 (Fix Under Review): TMAPUP operation results in OSD assertion failure
Jason Dillaman
01:47 PM Bug #41908 (Resolved): TMAPUP operation results in OSD assertion failure
In 'do_tmapup', the object is READ into a 'newop' structure and then when it is re-written, the same 'newop' structur... Jason Dillaman
10:52 AM Bug #41677: Cephmon:fix mon crash
@shuguang what is the exact version of ceph-mon? i cannot match the backtrace with the source code of master HEAD. Kefu Chai
09:46 AM Feature #41905 (New): Add ability to change fsid of cluster
There is a case where you want to change the fsid of a cluster: when you have split a cluster into two different c... Wido den Hollander

09/17/2019

09:50 PM Bug #41900 (Resolved): auto-scaler breaks many standalone tests

Caused by https://github.com/ceph/ceph/pull/30112
In some cases I had to kill processes to get past hung tests. ...
David Zafman
08:46 PM Bug #41816: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert info.last_comp...
This crash didn't reproduce for me using run-standalone.sh with the auto scaler turned off. David Zafman
08:35 PM Bug #40287 (Pending Backport): OSDMonitor: missing `pool_id` field in `osd pool ls` command
Neha Ojha
08:30 PM Bug #41191 (Pending Backport): osd: scrub error on big objects; make bluestore refuse to start on...
Neha Ojha
08:29 PM Bug #41210 (Pending Backport): osd: failure result of do_osd_ops not logged in prepare_transactio...
@shuguang wang did you want this to be backported to a release older than nautilus? Neha Ojha
06:59 PM Bug #41336: All OSD Faild after Reboot.
Hi,
two questions:
- How to find out if a pool is affected?
"ceph osd erasure-code-profile get" does not list...
Oliver Freyermuth
05:04 PM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
Yes, I use "rbd perf image iotop/iostat" (one of the reasons for upgrade:-) ). Not exporting per image data with prom... Marcin Gibula
03:51 PM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
Marcin, are you using `rbd perf image iotop|iostat` commands? Or maybe the prometheus mgr module with rbd per image stat... Mykola Golub
01:49 PM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
As crash seems to be related to stats reporting - don't know if it is related, but it was soon after eliminating "Leg... Marcin Gibula
10:30 AM Bug #41891 (Resolved): global osd crash in DynamicPerfStats::add_to_reports
Hi,
during routine host maintenance, I've encountered massive osd crash across entire cluster. The sequence of event...
Marcin Gibula
01:19 PM Feature #40420 (Need More Info): Introduce an ceph.conf option to disable HEALTH_WARN when nodeep...
https://github.com/ceph/ceph/pull/29422 has been merged, but not yet backported Nathan Cutler
08:05 AM Bug #41754: Use dump_stream() instead of dump_float() for floats where max precision isn't helpful
Regarding elapsed time, it might be important (for `compact` it is not, but for benchmarking it is). Another important thi... Марк Коренберг
06:15 AM Backport #41238 (In Progress): nautilus: Implement mon_memory_target
Sridhar Seshasayee

09/16/2019

10:10 PM Cleanup #41876 (Fix Under Review): tools/rados: add --pgid in help
Vikhyat Umrao
10:09 PM Cleanup #41876 (Resolved): tools/rados: add --pgid in help
Vikhyat Umrao
09:39 PM Bug #41817 (In Progress): qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
This is likely caused by enabling the auto scaler. David Zafman
03:27 PM Bug #41817: qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
/a/kchai-2019-09-15_15:37:26-rados-wip-kefu-testing-2019-09-15-1533-distro-basic-mira/4311115/
/a/pdonnell-2019-09-1...
Kefu Chai
08:05 PM Bug #41875 (Fix Under Review): Segmentation fault in rados ls when using --pgid and --pool/-p tog...
Vikhyat Umrao
07:55 PM Bug #41875 (Resolved): Segmentation fault in rados ls when using --pgid and --pool/-p together as...
- Works fine with only --pgid... Vikhyat Umrao
07:57 PM Bug #41816: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert info.last_comp...
Reproduced with logs: /a/nojha-2019-09-13_21:45:51-rados:standalone-master-distro-basic-smithi/4304313/remote/smithi1... Neha Ojha
03:25 PM Bug #40522: on_local_recover doesn't touch?
/a/pdonnell-2019-09-14_22:40:03-rados-master-distro-basic-smithi/4307679/
/a/kchai-2019-09-15_15:37:26-rados-wip-kef...
Kefu Chai
03:23 PM Bug #41874 (Resolved): mon-osdmap-prune.sh fails
... Kefu Chai
03:19 PM Bug #41873 (Resolved): test-erasure-code.sh fails
... Kefu Chai
01:46 PM Backport #41238: nautilus: Implement mon_memory_target
The old PR is unlinked from the tracker as more commits need to be pulled in for this backport. I will update this tr... Sridhar Seshasayee
01:04 PM Backport #41238 (Need More Info): nautilus: Implement mon_memory_target
first attempted backport https://github.com/ceph/ceph/pull/29652 was closed - apparently, the backport is not trivial... Nathan Cutler
01:23 PM Backport #40993: mimic: Ceph status in some cases does not report slow ops
just for completeness - the mimic fix is (I think): https://github.com/ceph/ceph/pull/30391 Nathan Cutler
10:39 AM Bug #41866: OSD cannot report slow operation warnings in time.
assumed that bluestore is used. Ilsoo Byun
10:23 AM Bug #41866 (Fix Under Review): OSD cannot report slow operation warnings in time.
If an underlying device is blocked due to H/W issues, a thread that checks slow ops can’t report slow op warning in t... Ilsoo Byun
07:21 AM Backport #41864 (Resolved): luminous: Mimic MONs have slow/long running ops
https://github.com/ceph/ceph/pull/30519 Nathan Cutler
07:21 AM Backport #41863 (Resolved): mimic: Mimic MONs have slow/long running ops
https://github.com/ceph/ceph/pull/30481 Nathan Cutler
07:21 AM Backport #41862 (Resolved): nautilus: Mimic MONs have slow/long running ops
https://github.com/ceph/ceph/pull/30480 Nathan Cutler
07:14 AM Backport #41845 (Resolved): luminous: tools/rados: allow list objects in a specific pg in a pool
https://github.com/ceph/ceph/pull/30608 Nathan Cutler
07:14 AM Backport #41844 (Resolved): mimic: tools/rados: allow list objects in a specific pg in a pool
https://github.com/ceph/ceph/pull/30893 Nathan Cutler

09/15/2019

01:59 PM Bug #41716 (Resolved): LibRadosTwoPoolsPP.ManifestUnset fails
Myoungwon Oh
01:51 PM Bug #41716: LibRadosTwoPoolsPP.ManifestUnset fails
This issue is fixed by https://github.com/ceph/ceph/pull/29985
When the error occurs, the following ops are executed...
Myoungwon Oh
03:05 AM Bug #41834 (Resolved): qa: EC Pool configuration and slow op warnings for OSDs caused by recent m...
See: http://pulpito.ceph.com/pdonnell-2019-09-14_22:39:31-fs-master-distro-basic-smithi/
Recent run of fs suite on...
Patrick Donnelly

09/13/2019

10:29 PM Feature #41831 (Resolved): tools/rados: allow list objects in a specific pg in a pool
This one is already present in nautilus. Vikhyat Umrao
04:41 PM Bug #41817: qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
David, can you please take a look at this whenever you get a chance. Neha Ojha
01:31 PM Bug #41817 (Closed): qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
... Sage Weil
04:40 PM Bug #41816: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert info.last_comp...
I'll try to see if I can reproduce this. Neha Ojha
01:30 PM Bug #41816 (Resolved): Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert inf...
... Sage Weil
04:37 PM Bug #41735 (Resolved): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
See https://tracker.ceph.com/issues/41735#note-3 and https://github.com/rook/rook/pull/3847/commits/11d3831d742639148... Neha Ojha
04:29 PM Bug #24531 (Pending Backport): Mimic MONs have slow/long running ops
Neha Ojha
09:09 AM Backport #40993 (Rejected): mimic: Ceph status in some cases does not report slow ops
backports will be pursued in https://tracker.ceph.com/issues/41741 Nathan Cutler
07:54 AM Bug #41758 (Duplicate): Ceph status in some cases does not report slow ops
Kefu Chai
05:13 AM Feature #40420: Introduce an ceph.conf option to disable HEALTH_WARN when nodeep-scrub/scrub flag...
What are the backport targets for this? I don't see a health mute tracker referenced by any of the commits, but this... David Zafman
01:55 AM Backport #41712 (In Progress): nautilus: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::reg...
https://github.com/ceph/ceph/pull/30371 Prashant D

09/12/2019

10:38 PM Backport #40993: mimic: Ceph status in some cases does not report slow ops
Nathan Cutler wrote:
> backport ticket opened prematurely - setting "Need More Info" pending:
>
> 1. opening of P...
Neha Ojha
08:19 PM Backport #40993 (Need More Info): mimic: Ceph status in some cases does not report slow ops
backport ticket opened prematurely - setting "Need More Info" pending:
1. opening of PR fixing the issue in master...
Nathan Cutler
08:18 PM Backport #40993 (New): mimic: Ceph status in some cases does not report slow ops
Nathan Cutler
11:58 AM Backport #40993: mimic: Ceph status in some cases does not report slow ops
Converting this to track backport from master where the fix is under review. Sridhar Seshasayee
02:03 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
We had to scrap the idea of changing the backend and went for upgrading the OSDs to Bluestore. Our backfilling issue ... David Turner
01:58 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
David:
Did you run into a solution for this? We're seeing similar issues but the only possible alternative seems ...
Mohammed Naser
08:32 AM Backport #41785 (Resolved): nautilus: Make dumping of reservation info congruent between scrub an...
https://github.com/ceph/ceph/pull/31444 Nathan Cutler
05:41 AM Backport #41764 (In Progress): nautilus: TestClsRbd.sparsify fails when using filestore
https://github.com/ceph/ceph/pull/30354 Prashant D
02:24 AM Bug #23647 (In Progress): thrash-eio test can prevent recovery
http://pulpito.ceph.com/nojha-2019-09-06_14:33:54-rados:singleton-wip-41385-3-distro-basic-smithi/ - this is where I ... Neha Ojha
01:22 AM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...

Reproduced several times with debug_ms = 20
http://pulpito.ceph.com/dzafman-2019-09-11_15:28:37-rados-wip-zafman...
David Zafman
01:21 AM Bug #41735: pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
sorry I missed that... Vasu Kulkarni

09/11/2019

10:28 PM Bug #41735 (Fix Under Review): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
Rook should probably set this option explicitly, since it is working with nautilus and we won't backport this (or the... Sage Weil
09:29 PM Bug #41735 (Need More Info): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
can you attach the 'ceph health detail' output so i can see which warning it's throwing? Sage Weil
09:33 PM Bug #41669 (Pending Backport): Make dumping of reservation info congruent between scrub and recovery
David Zafman
09:11 PM Bug #41680 (Won't Fix): Removed OSDs with outstanding peer failure reports crash the monitor
OSD failure reports will die out on their own eventually and there's no general reason to expect a removed OSD was in... Greg Farnum
09:11 PM Bug #41639 (Rejected): mon/MgrMonitor: enable pg_autoscaler by default for nautilus
Neha Ojha
09:10 PM Bug #41693 (Need More Info): a accidental problems with osd detection algorithm in monitor
Can you explain in more detail exactly what happened here?
It sounds like you have three hosts with colocated OSDs...
Greg Farnum
09:08 PM Bug #41718 (Fix Under Review): ceph osd stat JSON output incomplete
Neha Ojha
03:28 PM Bug #41758 (Fix Under Review): Ceph status in some cases does not report slow ops
Neha Ojha
01:13 PM Bug #41758: Ceph status in some cases does not report slow ops
After applying the fix, health warnings pertaining to slow ops show up as shown below,... Sridhar Seshasayee
12:57 PM Bug #41758: Ceph status in some cases does not report slow ops
PR https://github.com/ceph/ceph/pull/30337 addresses this issue. Sridhar Seshasayee
09:29 AM Bug #41758 (Duplicate): Ceph status in some cases does not report slow ops
In cases when only osds report slow ops, it is observed that ceph summary status doesn't report the same. This issue ... Sridhar Seshasayee
01:28 PM Backport #41764 (Resolved): nautilus: TestClsRbd.sparsify fails when using filestore
https://github.com/ceph/ceph/pull/30354 Nathan Cutler
09:14 AM Backport #40993: mimic: Ceph status in some cases does not report slow ops
Further to my findings earlier, I confirmed that the "reported" flag is being reset in case ONLY an osd daemon report... Sridhar Seshasayee
04:08 AM Bug #41754 (New): Use dump_stream() instead of dump_float() for floats where max precision isn't ...

Some examples from osd dump are below. The full_ratio is .95, backfill_ratio .90 and nearfull_ratio .85.
<pre...
David Zafman
01:25 AM Bug #41661 (Resolved): radosbench_omap_write cleanup slow/stuck
Neha Ojha
12:25 AM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
David Zafman
12:24 AM Bug #41743 (In Progress): Long heartbeat ping times on front interface seen, longest is 2237.999 ...
David Zafman

09/10/2019

10:42 PM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
The only OSDs involved are osd.6 and osd.0.
Slow heartbeat ping on front interface from osd.6 to osd.0 2237.999 ms...
David Zafman
12:12 PM Bug #41743 (Resolved): Long heartbeat ping times on front interface seen, longest is 2237.999 mse...
"2019-09-09T22:25:11.794749+0000 mon.b (mon.0) 389 : cluster [WRN] Health check failed: Long heartbeat ping times on ... Sage Weil
08:21 PM Bug #41661 (Fix Under Review): radosbench_omap_write cleanup slow/stuck
Neha Ojha
07:54 PM Bug #41661: radosbench_omap_write cleanup slow/stuck
Clearly, filestore-xfs.yaml is the one failing consistently.
See http://pulpito.ceph.com/nojha-2019-09-09_23:22:30...
Neha Ojha
05:03 PM Backport #40082 (In Progress): luminous: osd: Better error message when OSD count is less than os...
Nathan Cutler
02:59 PM Bug #41748 (Can't reproduce): log [ERR] : 7.19 caller_ops.size 62 > log size 61
... Sage Weil
08:27 AM Bug #41721 (Pending Backport): TestClsRbd.sparsify fails when using filestore
Kefu Chai
06:45 AM Backport #41640 (In Progress): nautilus: FAILED ceph_assert(info.history.same_interval_since != 0...
Nathan Cutler
06:36 AM Backport #41530 (Resolved): mimic: doc: mon_health_to_clog_* values flipped
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30227
m...
Nathan Cutler
06:34 AM Backport #41532 (Resolved): luminous: Move bluefs alloc size initialization log message to log le...
Nathan Cutler
06:32 AM Backport #38551: luminous: core: lazy omap stat collection
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29190
m...
Nathan Cutler
05:42 AM Backport #41703 (In Progress): nautilus: oi(object_info_t).size does not match on disk size
https://github.com/ceph/ceph/pull/30278 Prashant D
03:55 AM Backport #41704 (In Progress): mimic: oi(object_info_t).size does not match on disk size
https://github.com/ceph/ceph/pull/30275 Prashant D
01:01 AM Bug #41735 (Resolved): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
Old pools have auto_scale on and ceph health still shows HEALTH_WARN (20 < 30)... Vasu Kulkarni

09/09/2019

11:38 PM Bug #41661: radosbench_omap_write cleanup slow/stuck
The current timeout (config.get('time', 360) * 30 + 300 = 300*30 + 300) of 9300 seconds is not enough to clean up the... Neha Ojha
10:25 PM Feature #38136 (Resolved): core: lazy omap stat collection
Brad Hubbard
10:25 PM Backport #38551 (Resolved): luminous: core: lazy omap stat collection
Brad Hubbard
09:45 PM Bug #41601: oi(object_info_t).size does not match on disk size
Greg Farnum wrote:
> Hmm I was going to move this into the RADOS project tracker but now I'm leaving it because I'm ...
Nathan Cutler
08:20 PM Bug #41601: oi(object_info_t).size does not match on disk size
Hmm I was going to move this into the RADOS project tracker but now I'm leaving it because I'm not sure if that will ... Greg Farnum
09:35 PM Backport #41731 (Need More Info): nautilus: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(pe...
note that the backport of https://github.com/ceph/ceph/pull/30059 should happen after https://github.com/ceph/ceph/pu... Nathan Cutler
07:39 PM Backport #41731 (Rejected): nautilus: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_mis...
Nathan Cutler
09:34 PM Backport #41732 (Need More Info): mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_...
Nathan Cutler
09:33 PM Backport #41732: mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fro...
note that the backport of https://github.com/ceph/ceph/pull/30059 should happen after https://github.com/ceph/ceph/pu... Nathan Cutler
07:39 PM Backport #41732 (Rejected): mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missin...
Nathan Cutler
09:33 PM Backport #41730 (Need More Info): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(pe...
note that the backport of https://github.com/ceph/ceph/pull/30059 should happen after https://github.com/ceph/ceph/pu... Nathan Cutler
07:39 PM Backport #41730 (Resolved): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_mis...
https://github.com/ceph/ceph/pull/31855 Nathan Cutler
09:03 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
Nathan Cutler wrote:
> @Neha - backport all three PRs?
Yes, note that the backport of https://github.com/ceph/cep...
Neha Ojha
07:41 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
@Neha - backport all three PRs? Nathan Cutler
04:53 PM Bug #41385 (Pending Backport): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.co...
Neha Ojha
08:51 PM Bug #41065 (Closed): new osd added to cluster upgraded from 13 to 14 will down after some days
It's not clear from these snippets what issue you're actually experiencing. The "bad authorizer" suggests either a cl... Greg Farnum
08:37 PM Bug #41406: common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient bootstrap
That's a weird one; perhaps the MonClient should behave differently instead.
(Note that this is a problem only on ...
Greg Farnum
04:20 PM Bug #41689 (Resolved): Network ping test fails in TEST_network_ping_test2
This is a follow-on fix for the feature https://tracker.ceph.com/issues/40640. The backport is included as part of t... David Zafman
10:50 AM Bug #41721 (Fix Under Review): TestClsRbd.sparsify fails when using filestore
Kefu Chai
10:24 AM Bug #41721 (Resolved): TestClsRbd.sparsify fails when using filestore
it's a regression introduced by https://github.com/ceph/ceph/pull/30061
see http://pulpito.ceph.com/kchai-2019-09-...
Kefu Chai

09/08/2019

06:16 PM Bug #41718 (Resolved): ceph osd stat JSON output incomplete
... Марк Коренберг
09:22 AM Bug #40583 (Resolved): Lower the default value of osd_deep_scrub_large_omap_object_key_threshold
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:20 AM Backport #40653 (Resolved): luminous: Lower the default value of osd_deep_scrub_large_omap_object...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29175
m...
Nathan Cutler

09/07/2019

08:07 PM Bug #41716 (Resolved): LibRadosTwoPoolsPP.ManifestUnset fails
... Sage Weil
09:29 AM Backport #41712 (Resolved): nautilus: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::regist...
https://github.com/ceph/ceph/pull/30371 Nathan Cutler
09:23 AM Backport #41705 (Resolved): nautilus: Incorrect logical operator in Monitor::handle_auth_request()
https://github.com/ceph/ceph/pull/31038 Nathan Cutler
09:23 AM Backport #41704 (Resolved): mimic: oi(object_info_t).size does not match on disk size
https://github.com/ceph/ceph/pull/30275 Nathan Cutler
09:23 AM Backport #41703 (Resolved): nautilus: oi(object_info_t).size does not match on disk size
https://github.com/ceph/ceph/pull/30278 Nathan Cutler
09:23 AM Backport #41702 (Rejected): luminous: oi(object_info_t).size does not match on disk size
Nathan Cutler
07:45 AM Backport #41697 (In Progress): luminous: Network ping monitoring
Nathan Cutler
07:31 AM Backport #41697 (Resolved): luminous: Network ping monitoring
https://github.com/ceph/ceph/pull/30230 Nathan Cutler
07:43 AM Backport #41696 (In Progress): mimic: Network ping monitoring
Nathan Cutler
07:31 AM Backport #41696 (Resolved): mimic: Network ping monitoring
https://github.com/ceph/ceph/pull/30225 Nathan Cutler
07:34 AM Backport #41695 (In Progress): nautilus: Network ping monitoring
Nathan Cutler
07:31 AM Backport #41695 (Resolved): nautilus: Network ping monitoring
https://github.com/ceph/ceph/pull/30195 Nathan Cutler
02:35 AM Bug #41693 (Need More Info): a accidental problems with osd detection algorithm in monitor
There is an accidental problem with the osd detection algorithm in the monitor. In a three-cluster environment, HostA/HostB/Ho... shuguang wang

09/06/2019

11:49 PM Backport #41531 (In Progress): nautilus: Move bluefs alloc size initialization log message to log...
Vikhyat Umrao
10:15 PM Backport #41531 (Need More Info): nautilus: Move bluefs alloc size initialization log message to ...
non-trivial backport - needs https://github.com/ceph/ceph/pull/29537 at least Nathan Cutler
11:38 PM Bug #41385 (Fix Under Review): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.co...
https://github.com/ceph/ceph/pull/30119 (merged September 4, 2019)
https://github.com/ceph/ceph/pull/30059 (merged S...
Neha Ojha
10:21 PM Backport #41533 (In Progress): mimic: Move bluefs alloc size initialization log message to log le...
Vikhyat Umrao
10:14 PM Backport #41533 (Need More Info): mimic: Move bluefs alloc size initialization log message to log...
non-trivial backport - needs https://github.com/ceph/ceph/pull/29537 at least Nathan Cutler
10:03 PM Backport #41530 (In Progress): mimic: doc: mon_health_to_clog_* values flipped
Nathan Cutler
08:01 PM Backport #41499 (Need More Info): mimic: backfill_toofull while OSDs are not full (Unneccessary H...
The backport needs 3b8f86c8b09b9143d3e25ab34b51057581b48114 to be cherry-picked, first, for it to make sense, but tha... Nathan Cutler
03:34 PM Backport #41499 (In Progress): mimic: backfill_toofull while OSDs are not full (Unneccessary HEAL...
Nathan Cutler
07:42 PM Backport #41502 (In Progress): mimic: Warning about past_interval bounds on deleting pg
Nathan Cutler
07:03 PM Bug #41689: Network ping test fails in TEST_network_ping_test2
... David Zafman
06:37 PM Bug #41689 (Fix Under Review): Network ping test fails in TEST_network_ping_test2
David Zafman
06:18 PM Bug #41689 (Resolved): Network ping test fails in TEST_network_ping_test2
http://pulpito.ceph.com/kchai-2019-09-06_15:05:18-rados-wip-kefu-testing-2019-09-06-1807-distro-basic-smithi/4283774/... David Zafman
05:27 PM Bug #41429 (Pending Backport): Incorrect logical operator in Monitor::handle_auth_request()
Kefu Chai
05:08 PM Bug #38513: luminous: "AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item) && !in_pro...
/a/nojha-2019-09-05_23:53:20-rados-wip-40769-luminous-distro-basic-smithi/4279855/ Neha Ojha
03:29 PM Backport #41490 (In Progress): mimic: OSDCap.PoolClassRNS test aborts
Nathan Cutler
03:28 PM Backport #41449 (In Progress): mimic: mon: C_AckMarkedDown has not handled the Callback Arguments
Nathan Cutler
01:29 PM Backport #40993 (In Progress): mimic: Ceph status in some cases does not report slow ops
The logs relating to this tracker didn't indicate anything obvious upon analysis. The issue was reproduced locally on... Sridhar Seshasayee
10:04 AM Bug #41680 (Resolved): Removed OSDs with outstanding peer failure reports crash the monitor
The OSDs have been removed, but they previously reported anomaly information for a partner OSD. However, reporters of failure... shuguang wang
09:50 AM Bug #41677: Cephmon:fix mon crash
shuguang wang wrote:
> After reducing the number of OSDs on the primary mon of a three-node cluster, the primary mon occasiona...
shuguang wang
08:45 AM Bug #41677 (Fix Under Review): Cephmon:fix mon crash
Kefu Chai
08:43 AM Bug #41677: Cephmon:fix mon crash
shuguang wang wrote:
> After reducing the number of OSDs on the primary mon of a three-node cluster, the primary mon occasiona...
shuguang wang
08:42 AM Bug #41677: Cephmon:fix mon crash
The OSDs have been removed, but they previously reported anomaly information for a partner OSD. However, failure_info of this... shuguang wang
05:34 AM Bug #41677 (Resolved): Cephmon:fix mon crash
After reducing the number of OSDs on the primary mon of a three-node cluster, the primary mon occasionally crashes. shuguang wang
05:53 AM Bug #41427 (Resolved): set-chunk raced with deep-scrub
xie xingguo
05:52 AM Bug #41514 (Resolved): in-flight manifest ops not properly cancelled on interval changing
xie xingguo
03:17 AM Bug #41601 (Pending Backport): oi(object_info_t).size does not match on disk size
xie xingguo

09/05/2019

10:30 PM Bug #41657 (Rejected): osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find_best_in...
this is caused by a bug in my test branch Sage Weil
09:19 PM Bug #41669: Make dumping of reservation info congruent between scrub and recovery
David Zafman
05:47 PM Bug #41669 (Resolved): Make dumping of reservation info congruent between scrub and recovery

Rename dump_reservations to dump_recovery_reservations
Add dump_scrub_reservations
David Zafman
06:59 PM Feature #40640 (Pending Backport): Network ping monitoring
David Zafman
01:50 PM Backport #41447 (In Progress): mimic: osd/PrimaryLogPG: Access destroyed references in finish_deg...
Nathan Cutler
01:01 PM Backport #41351 (In Progress): mimic: hidden corei7 requirement in binary packages
Nathan Cutler
12:49 PM Backport #41291 (In Progress): mimic: filestore pre-split may not split enough directories
Nathan Cutler
12:48 PM Backport #40732 (In Progress): mimic: mon: auth mon isn't loading full KeyServerData after restart
Nathan Cutler
12:36 PM Backport #40083 (In Progress): mimic: osd: Better error message when OSD count is less than osd_p...
Nathan Cutler
07:25 AM Feature #41666 (Resolved): Issue a HEALTH_WARN when a Pool is configured with [min_]size == 1
To prevent the user from experiencing data loss, Ceph should issue a health warning if any Pool is configured with a ... Lenz Grimmer
12:00 AM Bug #41661 (Resolved): radosbench_omap_write cleanup slow/stuck
... Neha Ojha

09/04/2019

09:34 PM Feature #38458: Ceph does not have command to show current osd primary-affinity
"ceph osd dump", perhaps with a detail or json formatting, includes that information.
I don't think we have any qu...
Greg Farnum
05:49 AM Feature #38458: Ceph does not have command to show current osd primary-affinity
Greg, what is the exact command ? Марк Коренберг
05:58 PM Bug #41657 (Fix Under Review): osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find...
Sage Weil
05:53 PM Bug #41657: osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find_best_info_ignore_h...
The find_best_info process excludes getting a master log from an osd with an old(er) last_epoch_started. However, th... Sage Weil
05:51 PM Bug #41657 (Rejected): osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find_best_in...
... Sage Weil
01:27 PM Feature #41650 (New): Convert between EC profiles online
Users have repeatedly voiced the need to convert/modify an EC profile while the cluster was running, in response to c... Lars Marowsky-Brée
09:06 AM Feature #41647 (Resolved): pg_autoscaler should show a warning if pg_num isn't a power of two
As the pg_autoscaler will be automatically turned on with the 14.2.4 release and future releases I would like to enha... Kai Wagner
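The check this feature request asks for can be sketched as follows. This is an illustration only, not the pg_autoscaler code: the function names and the warning text are hypothetical.

```python
from typing import Optional, Tuple


def is_power_of_two(pg_num: int) -> bool:
    """True when pg_num is a positive power of two (1, 2, 4, 8, ...)."""
    return pg_num > 0 and (pg_num & (pg_num - 1)) == 0


def neighbouring_powers_of_two(pg_num: int) -> Tuple[int, int]:
    """Return the closest powers of two at or below and at or above pg_num."""
    if pg_num < 1:
        raise ValueError("pg_num must be at least 1")
    lower = 1 << (pg_num.bit_length() - 1)
    upper = lower if lower == pg_num else lower << 1
    return lower, upper


def pg_num_warning(pool: str, pg_num: int) -> Optional[str]:
    """Build a hypothetical warning string, or None when pg_num is fine."""
    if is_power_of_two(pg_num):
        return None
    lower, upper = neighbouring_powers_of_two(pg_num)
    return (f"pool '{pool}' has pg_num {pg_num}, which is not a power of two "
            f"(nearest candidates: {lower}, {upper})")


print(pg_num_warning("rbd", 100))
```

The bit trick works because a power of two has exactly one bit set, so clearing the lowest set bit leaves zero.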
03:36 AM Bug #40646 (Resolved): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-8-libstd...
Kefu Chai
01:23 AM Bug #40646 (Fix Under Review): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-...
Kefu Chai
12:43 AM Bug #40646 (Resolved): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-8-libstd...
Kefu Chai
12:32 AM Bug #38483 (Pending Backport): FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_...
xie xingguo

09/03/2019

09:19 PM Bug #20283: qa: missing even trivial tests for many commands
Updated script run
'cache drop' has no apparent tests
'cache status' has no apparent tests
'config ls' has no ap...
David Zafman
09:04 PM Bug #41610 (Rejected): python Rados library does not support mon_host bracketed syntax
Your version of librados is too old - 0.69 is cuttlefish. For v1/v2 addresses like that, you need nautilus (v14.2.0+) Josh Durgin
07:26 AM Bug #41610 (Rejected): python Rados library does not support mon_host bracketed syntax
Ceph Nautilus deployed using ceph-ansible
By default it generates ceph.conf with bracketed mon_host syntax (see ht...
Michal Nasiadka
08:48 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
Addressing backport-create-issue script complaint:... Nathan Cutler
06:25 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
Luminous doesn't have the issue! David Zafman
08:40 PM Backport #41640 (Resolved): nautilus: FAILED ceph_assert(info.history.same_interval_since != 0) i...
https://github.com/ceph/ceph/pull/30280 Nathan Cutler
08:37 PM Bug #41639 (Rejected): mon/MgrMonitor: enable pg_autoscaler by default for nautilus
Only https://github.com/ceph/ceph/pull/30112/commits/23edfd202ec1d98cc8c3d52aaaae1d985417aacf needs to be backported ... Neha Ojha
08:08 PM Backport #41350: nautilus: hidden corei7 requirement in binary packages
@Harry - since this is a backport ticket (just for tracking the nautilus backport), I copied your comment to the pare... Nathan Cutler
08:07 PM Bug #41330: hidden corei7 requirement in binary packages
Hi Harry. I don't know about any distros other than openSUSE and SUSE Linux Enterprise. In those distros, there isn't... Nathan Cutler
08:01 PM Bug #41330: hidden corei7 requirement in binary packages
At https://tracker.ceph.com/issues/41350#note-3 (i.e. in the nautilus backport ticket), Harry Coin wrote:
"The 'si...
Nathan Cutler
07:16 PM Bug #39152 (Duplicate): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
yep, dup of #39693 Sage Weil
06:25 PM Backport #41582 (Rejected): luminous: backfill_toofull seen on cluster where the most full OSD is...
David Zafman
02:19 PM Bug #37654 (Pending Backport): FAILED ceph_assert(info.history.same_interval_since != 0) in PG::s...
Sage Weil
02:04 PM Bug #41601 (Fix Under Review): oi(object_info_t).size does not match on disk size
Kefu Chai
02:33 AM Bug #41601: oi(object_info_t).size does not match on disk size
https://github.com/ceph/ceph/pull/30085 xie xingguo
06:22 AM Bug #40646 (Fix Under Review): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-...
* https://github.com/ceph/ceph-build/pull/1387
* https://github.com/ceph/ceph/pull/30088
* https://github.com/ceph/...
Kefu Chai
01:19 AM Backport #41595 (In Progress): mimic: ceph-objectstore-tool can't remove head with bad snapset
https://github.com/ceph/ceph/pull/30081 Prashant D
01:16 AM Backport #41596 (In Progress): nautilus: ceph-objectstore-tool can't remove head with bad snapset
https://github.com/ceph/ceph/pull/30080 Prashant D

09/02/2019

02:04 PM Backport #41350: nautilus: hidden corei7 requirement in binary packages
Thanks! The 'silent' requirement that ceph run only on -march=corei7 capable servers killed two ubuntu eoan based sy... Harry Coin
01:29 PM Bug #41601 (Resolved): oi(object_info_t).size does not match on disk size
In our test environment (ceph version 14.2.1 (nautilus) + replicated pool), we found a scrub error like bug 23701. We use ... 侯 斌
10:09 AM Backport #41597 (Rejected): luminous: ceph-objectstore-tool can't remove head with bad snapset
Nathan Cutler
10:09 AM Backport #41596 (Resolved): nautilus: ceph-objectstore-tool can't remove head with bad snapset
https://github.com/ceph/ceph/pull/30080 Nathan Cutler
10:09 AM Backport #41595 (Resolved): mimic: ceph-objectstore-tool can't remove head with bad snapset
https://github.com/ceph/ceph/pull/30081 Nathan Cutler
10:07 AM Backport #41582 (Need More Info): luminous: backfill_toofull seen on cluster where the most full ...
Nathan Cutler

08/31/2019

03:03 AM Bug #38238 (Duplicate): rados/test.sh: api_aio_pp doesn't seem to start
Brad Hubbard
02:12 AM Bug #38238: rados/test.sh: api_aio_pp doesn't seem to start
... Brad Hubbard
12:19 AM Bug #41517 (Resolved): Missing head object at primary with snapshots crashes primary
David Zafman
12:14 AM Bug #41522 (Pending Backport): ceph-objectstore-tool can't remove head with bad snapset
David Zafman

08/30/2019

10:41 PM Bug #41156: dump_float() poor output

Looking at osd dump output in teuthology.log on a test run, I see this output, which is ugly:
"full_ratio": ...
David Zafman
08:34 PM Bug #40522: on_local_recover doesn't touch?
Failed multiple times: http://pulpito.ceph.com/dzafman-2019-08-28_09:11:55-rados-wip-zafman-testing-distro-basic-smit... David Zafman
06:36 PM Backport #41582: luminous: backfill_toofull seen on cluster where the most full OSD is at 1%
This bug doesn't exist on Luminous as far as I can tell; I've only ever seen it since Mimic. Paul Emmerich
08:00 AM Backport #41582 (Rejected): luminous: backfill_toofull seen on cluster where the most full OSD is...
Nathan Cutler
06:15 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
Have been able to reproduce it here: http://pulpito.ceph.com/nojha-2019-08-28_19:12:09-rados:singleton-master-distro-... Neha Ojha
04:43 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
We didn't see this problem on any of our clusters with the 12.2.12 release, so maybe this isn't the fix if a backport... Bryan Stillwell
08:01 AM Bug #41200 (Resolved): osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup memory limit
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:01 AM Backport #41584 (Resolved): mimic: backfill_toofull seen on cluster where the most full OSD is at 1%
https://github.com/ceph/ceph/pull/32361 Nathan Cutler
08:00 AM Backport #41583 (Resolved): nautilus: backfill_toofull seen on cluster where the most full OSD is...
https://github.com/ceph/ceph/pull/29999 Nathan Cutler

08/29/2019

11:16 PM Bug #23647: thrash-eio test can prevent recovery
Several proposals that might improve things:
* from Josh, just turn down the odds
* from Greg, is it plausible to...
Greg Farnum
09:24 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
Here's the chain of events that causes this:
Two objects go missing on the primary, and we want to recover them fr...
Neha Ojha
06:54 PM Bug #41577: Erasure-Coded storage in bluestore has larger disk usage than expected
The issue of small objects using more space seems related to https://tracker.ceph.com/issues/41417
Yan Zhao
06:53 PM Bug #41577 (New): Erasure-Coded storage in bluestore has larger disk usage than expected
The test is done in ceph 14.2.1
We've tested Erasure Coded storage with the same amount of data, which is 800 GiB....
Yan Zhao
05:55 PM Bug #41429 (Fix Under Review): Incorrect logical operator in Monitor::handle_auth_request()
Neha Ojha
03:36 PM Bug #41526 (Rejected): Choosing the next PG for a deep scrubs wrong.
David Zafman
02:43 PM Bug #37775 (Resolved): some pg_created messages not sent to mon
Nathan Cutler
02:40 PM Bug #41517: Missing head object at primary with snapshots crashes primary
Backporting note:
cherry pick https://github.com/ceph/ceph/pull/27575 first, and then https://github.com/ceph/ceph...
Nathan Cutler
02:39 PM Bug #39286: primary recovery local missing object did not update obc
Backports to luminous, mimic, and nautilus are being handled via #41517 Nathan Cutler
02:38 PM Bug #39286 (Resolved): primary recovery local missing object did not update obc
Since this introduced a regression in master, I propose to refrain from backporting it separately, but instead backpo... Nathan Cutler
12:35 PM Backport #41568 (In Progress): nautilus: doc: pg_num should always be a power of two
Nathan Cutler
08:14 AM Backport #41568 (Resolved): nautilus: doc: pg_num should always be a power of two
https://github.com/ceph/ceph/pull/30004 Nathan Cutler
12:29 PM Backport #41529 (In Progress): nautilus: doc: mon_health_to_clog_* values flipped
Nathan Cutler
12:26 PM Bug #39152 (New): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
This is problematic to backport because the "Pull request ID" field is not populated and none of the notes mention a ... Nathan Cutler
11:28 AM Backport #41503 (In Progress): nautilus: Warning about past_interval bounds on deleting pg
Nathan Cutler
11:21 AM Backport #41501 (In Progress): nautilus: backfill_toofull while OSDs are not full (Unneccessary H...
Nathan Cutler
11:17 AM Backport #41491 (In Progress): nautilus: OSDCap.PoolClassRNS test aborts
Nathan Cutler
11:15 AM Backport #41455: nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup memory...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29745
m...
Nathan Cutler
11:14 AM Backport #41455 (Resolved): nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cg...
Nathan Cutler
10:47 AM Backport #41453 (In Progress): nautilus: mon: C_AckMarkedDown has not handled the Callback Arguments
Nathan Cutler
10:24 AM Backport #41448 (In Progress): nautilus: osd/PrimaryLogPG: Access destroyed references in finish_...
Nathan Cutler
10:20 AM Backport #40889 (Need More Info): luminous: Pool settings aren't populated to OSD after restart.
non-trivial backport Nathan Cutler
10:20 AM Backport #40890 (Need More Info): mimic: Pool settings aren't populated to OSD after restart.
non-trivial backport Nathan Cutler
10:20 AM Backport #40891 (Need More Info): nautilus: Pool settings aren't populated to OSD after restart.
non-trivial backport Nathan Cutler
10:10 AM Bug #40112 (Resolved): mon: rados/multimon tests fail with clock skew
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
10:09 AM Backport #40228 (Resolved): nautilus: mon: rados/multimon tests fail with clock skew
backport PR https://github.com/ceph/ceph/pull/28576
merge commit 1bc3cc4aa2588bef0acadcf6ba2703df0312b9b4 (v14.2.2-2...
Nathan Cutler
10:03 AM Backport #40084 (In Progress): nautilus: osd: Better error message when OSD count is less than os...
Nathan Cutler
09:56 AM Backport #39700 (In Progress): nautilus: [RFE] If the nodeep-scrub/noscrub flags are set in pools...
Nathan Cutler
09:31 AM Bug #41255 (Pending Backport): backfill_toofull seen on cluster where the most full OSD is at 1%
Kefu Chai
08:59 AM Backport #39682 (In Progress): nautilus: filestore pre-split may not split enough directories
Nathan Cutler
08:50 AM Backport #39517 (In Progress): nautilus: Improvements to standalone tests.
Nathan Cutler
03:51 AM Bug #38155 (Duplicate): PG stuck in undersized+degraded+remapped+backfill_toofull+peered
I'm assuming that the fix for 24452 also fixed this issue. So marking duplicate. David Zafman
03:27 AM Bug #39115 (Duplicate): ceph pg repair doesn't fix itself if osd is bluestore

OSD crashes are the underlying issue here and we can't say anything about repair until there aren't any more crashes.
David Zafman
03:09 AM Documentation #41004 (Pending Backport): doc: pg_num should always be a power of two
Neha Ojha

08/28/2019

09:20 PM Bug #41313: PG distribution completely messed up since Nautilus
Can you reach out on the ceph-users mailing list to see if others have seen similar issues? We've not seen a specific... Neha Ojha
09:19 PM Bug #40522: on_local_recover doesn't touch?

I see this as a hang when running standalone tests, in particular qa/standalone/osd/divergent-priors.sh. The test han...
David Zafman
09:13 PM Bug #41336 (Resolved): All OSD Faild after Reboot.
Josh Durgin
09:13 PM Bug #41336: All OSD Faild after Reboot.
This is fixed in later versions - the monitor makes sure stripe_unit is a valid value when the pool is created. With ... Josh Durgin
09:12 PM Bug #41336: All OSD Faild after Reboot.
... Neha Ojha
09:03 PM Bug #41526: Choosing the next PG for a deep scrubs wrong.

You never know what scrubs can run with osd_max_scrubs (especially defaulting to 1). Without looking at which...
David Zafman
08:44 PM Bug #41385 (In Progress): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(f...
Neha Ojha
08:36 PM Feature #41564 (Resolved): Issue health status warning if num_shards_repaired exceeds some threshold

Now that num_shards_repaired has been added, we can assist in noticing disk, controller, software or other issues b...
David Zafman
08:27 PM Feature #41563: Add connection reset tracking to Network ping monitoring

Experimental code: https://github.com/dzafman/ceph/tree/wip-network-resets
David Zafman
08:24 PM Feature #41563 (New): Add connection reset tracking to Network ping monitoring

Record connection resets on front and back interfaces and report with ping times
David Zafman
08:25 PM Backport #41341 (In Progress): nautilus: "CMake Error" in test_envlibrados_for_rocksdb.sh
Nathan Cutler
08:20 PM Bug #41517: Missing head object at primary with snapshots crashes primary
This was caused by https://github.com/ceph/ceph/pull/27575 David Zafman
05:26 PM Bug #41517 (In Progress): Missing head object at primary with snapshots crashes primary
David Zafman
06:42 PM Bug #41522 (In Progress): ceph-objectstore-tool can't remove head with bad snapset
David Zafman
06:42 PM Backport #38450 (In Progress): mimic: src/osd/OSDMap.h: 1065: FAILED assert(__null != pool)
David Zafman
06:36 PM Bug #37775: some pg_created messages not sent to mon
This patch does not make sense for mimic and luminous.
@Nathan can we please resolve this issue and close the corre...
Neha Ojha
06:34 PM Bug #36498 (New): failed to recover before timeout expired due to pg stuck in creating+peering
I don't think this is a duplicate of https://tracker.ceph.com/issues/37752 or https://tracker.ceph.com/issues/37775 f... Neha Ojha
06:09 PM Bug #39286: primary recovery local missing object did not update obc
https://tracker.ceph.com/issues/41517 is a follow-on fix for this. Neha Ojha
11:46 AM Bug #41550 (Fix Under Review): os/bluestore: fadvise_flag leak in generate_transaction
Nathan Cutler
08:09 AM Bug #41550: os/bluestore: fadvise_flag leak in generate_transaction
https://github.com/ceph/ceph/pull/29944 Xuehan Xu
08:06 AM Bug #41550 (Resolved): os/bluestore: fadvise_flag leak in generate_transaction
In generate_transaction when creating ceph::os::Transaction, ObjectOperation::BufferUpdate::Write::fadvise_flag is no... Xuehan Xu
11:13 AM Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
... Nathan Cutler
11:11 AM Backport #38567: luminous: osd_recovery_priority is not documented (but osd_recovery_op_priority is)
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27471
m...
Nathan Cutler
10:11 AM Backport #40638: luminous: osd: report omap/data/metadata usage
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28851
m...
Nathan Cutler
07:43 AM Backport #41548 (Resolved): nautilus: monc: send_command to specific down mon breaks other mon msgs
https://github.com/ceph/ceph/pull/31037 Nathan Cutler
07:43 AM Backport #41547 (Rejected): luminous: monc: send_command to specific down mon breaks other mon msgs
Nathan Cutler
07:42 AM Backport #41546 (Rejected): mimic: monc: send_command to specific down mon breaks other mon msgs
Nathan Cutler

08/27/2019

10:23 PM Bug #38416: crc cache should be invalidated when posting preallocated rx buffers
This is causing lots of failures in luminous/mimic, marking it urgent to get the backports expedited. Neha Ojha
09:16 PM Backport #38880: luminous: ENOENT in collection_move_rename on EC backfill target
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28110
m...
Nathan Cutler
09:16 PM Backport #39373: luminous: ceph tell osd.xx bench help : gives wrong help
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28112
m...
Nathan Cutler
09:14 PM Bug #40765 (Duplicate): mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
Brad Hubbard
09:07 PM Backport #38902: luminous: Minor rados related documentation fixes
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27185
m...
Nathan Cutler
08:18 PM Backport #41532 (In Progress): luminous: Move bluefs alloc size initialization log message to log...
Nathan Cutler
08:46 AM Backport #41532 (Resolved): luminous: Move bluefs alloc size initialization log message to log le...
https://github.com/ceph/ceph/pull/29910 Nathan Cutler
06:19 PM Bug #41522 (Fix Under Review): ceph-objectstore-tool can't remove head with bad snapset
Neha Ojha
04:49 AM Bug #41522 (Resolved): ceph-objectstore-tool can't remove head with bad snapset

We should allow a --force remove of a head object with a bad snapset to remove the object instead of failing.
David Zafman
05:26 PM Bug #20924: osd: leaked Session on osd.7
https://github.com/ceph/ceph/pull/29859 Samuel Just
05:16 PM Bug #41539 (New): luminous: TEST_backfill_remapped fails in above_margin
... Neha Ojha
05:04 PM Bug #38513: luminous: "AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item) && !in_pro...
/a/nojha-2019-08-26_20:27:46-rados-wip-bluefs-shared-alloc-luminous-2019-08-26-distro-basic-smithi/4255358/ Neha Ojha
03:04 PM Feature #41537: MON DNS Lookup for messenger V2
Jason Dillaman wrote:
> I think v2 over DNS SRV is already handled here [1] and [2].
>
Great, in that case it's...
Ricardo Dias
03:00 PM Feature #41537: MON DNS Lookup for messenger V2
I think v2 over DNS SRV is already handled here [1] and [2].
[1] https://github.com/ceph/ceph/blob/master/src/mon/...
Jason Dillaman
02:43 PM Feature #41537 (New): MON DNS Lookup for messenger V2
Currently it is possible for a client to use DNS SRV records to find the MONs' addresses to connect to. But these address... Ricardo Dias
01:20 PM Backport #40650: luminous: os/bluestore: fix >2GB writes
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28965
m...
Nathan Cutler
01:19 PM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
Should add one more thing: the only clusters bitten by this issue would be those that, *at any time,* ran the @balanc... Florian Haas
01:18 PM Backport #38276: luminous: osd_map_message_max default is too high?
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28640
m...
Nathan Cutler
01:14 PM Backport #38750: luminous: should report EINVAL in ErasureCode::parse() if m<=0
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28111
m...
Nathan Cutler
10:52 AM Backport #38719: luminous: crush: choose_args array size mis-sized when weight-sets are enabled
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27085
m...
Nathan Cutler
10:52 AM Backport #39343: luminous: ceph-objectstore-tool rename dump-import to dump-export
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27636
m...
Nathan Cutler
10:51 AM Backport #38873: luminous: Rados.get_fsid() returning bytes in python3
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27674
m...
Nathan Cutler
10:51 AM Backport #39042: luminous: osd/PGLog: preserve original_crt to check rollbackability
backport PR https://github.com/ceph/ceph/pull/27715
merge commit f7c528dbafcf540ab046de2cd29010113055da5a (v12.2.12-...
Nathan Cutler
10:51 AM Backport #38905: luminous: osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27715
m...
Nathan Cutler
10:51 AM Backport #39431: luminous: Degraded PG does not discover remapped data on originating OSD
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27751
m...
Nathan Cutler
10:50 AM Backport #39204: luminous: osd: leaked pg refs on shutdown
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27810
m...
Nathan Cutler
10:50 AM Backport #39218: luminous: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(soid...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27878
m...
Nathan Cutler
10:50 AM Backport #39563: luminous: Error message displayed when mon_osd_max_split_count would be exceeded...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27908
m...
Nathan Cutler
10:50 AM Backport #39719: luminous: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when la...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28185
m...
Nathan Cutler
10:14 AM Backport #39239: luminous: "sudo yum -y install python34-cephfs" fails on mimic
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28493
m...
Nathan Cutler
10:12 AM Backport #39420: luminous: Don't mark removed osds in when running "ceph osd in any|all|*"
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27728
m...
Nathan Cutler
09:50 AM Backport #41534 (In Progress): nautilus: valgrind: UninitCondition in ceph::crypto::onwire::AES12...
Nathan Cutler
08:49 AM Backport #41534 (Resolved): nautilus: valgrind: UninitCondition in ceph::crypto::onwire::AES128GC...
https://github.com/ceph/ceph/pull/29928 Nathan Cutler
09:34 AM Bug #40792 (Pending Backport): monc: send_command to specific down mon breaks other mon msgs
Kefu Chai
09:18 AM Bug #41424 (Resolved): readable.sh test fails
Kefu Chai
08:52 AM Bug #22266 (Resolved): mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:46 AM Backport #41533 (Resolved): mimic: Move bluefs alloc size initialization log message to log level 1
https://github.com/ceph/ceph/pull/30219 Nathan Cutler
08:46 AM Backport #41531 (Resolved): nautilus: Move bluefs alloc size initialization log message to log le...
https://github.com/ceph/ceph/pull/30229 Nathan Cutler
08:46 AM Backport #41530 (Resolved): mimic: doc: mon_health_to_clog_* values flipped
https://github.com/ceph/ceph/pull/30227 Nathan Cutler
08:46 AM Backport #41529 (Resolved): nautilus: doc: mon_health_to_clog_* values flipped
https://github.com/ceph/ceph/pull/30003 Nathan Cutler
08:32 AM Bug #41526 (Rejected): Choosing the next PG for a deep scrubs wrong.
I have ceph cluster in this state:... Fyodor Ustinov
07:33 AM Backport #40943: mimic: mon/OSDMonitor.cc: better error message about min_size
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29618
m...
Nathan Cutler
07:33 AM Backport #41086: mimic: Change default for bluestore_fsck_on_mount_deep as false
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29699
m...
Nathan Cutler
07:25 AM Backport #39692 (Resolved): mimic: _txc_add_transaction error (39) Directory not empty not handle...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29217
m...
Nathan Cutler
07:18 AM Backport #40654: mimic: Lower the default value of osd_deep_scrub_large_omap_object_key_threshold
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29174
m...
Nathan Cutler
07:18 AM Backport #38552: mimic: core: lazy omap stat collection
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29189
m...
Nathan Cutler
03:25 AM Bug #41517 (Resolved): Missing head object at primary with snapshots crashes primary

This script crashes osd.1 when it wants to recover to osd.3 after osd.2 is marked out. When it sees the missing "o...
David Zafman
01:22 AM Bug #41406: common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient bootstrap
Patrick Donnelly wrote:
> Does any code actually do that sequence of events? I would think a SafeTimer should not be...
haitao chen
12:47 AM Bug #41514: in-flight manifest ops not properly cancelled on interval changing
http://pulpito.ceph.com/xxg-2019-08-25_02:12:25-rados:thrash-wip-inc-recovery-5-distro-basic-smithi/4250539/ xie xingguo
12:36 AM Bug #41514 (Resolved): in-flight manifest ops not properly cancelled on interval changing
which as a result makes PrimaryLogPG::on_flushed() unhappy:... xie xingguo

08/26/2019

10:52 PM Bug #40721 (Need More Info): backfill caught in loop from block
Samuel Just
10:51 PM Bug #40721: backfill caught in loop from block
I don't think I can make further progress without more logs, so I'm marking this Need More Info for the time being. As ... Samuel Just
09:29 PM Bug #40721: backfill caught in loop from block
Based on the snapcontext, make_writeable should have created a clone. Samuel Just
09:29 PM Bug #40721: backfill caught in loop from block
The copy_from on that object lasted until the end of the test. It did succeed, but presumably during shutdown once t... Samuel Just
08:36 PM Bug #40721: backfill caught in loop from block
Or, I guess the directory is probably correct in that the teuthology.log output is consistent with the above, but the... Samuel Just
08:01 PM Bug #40721: backfill caught in loop from block
Unfortunately, I think the job number is wrong -- I don't see that object in the log (smithi19817795-* objects are in... Samuel Just
09:48 PM Bug #41362 (Fix Under Review): Rados bench sequential and random read: not behaving as expected w...
Patrick Donnelly
09:18 PM Bug #24057 (Can't reproduce): cbt fails to copy results to the archive dir
Neha Ojha
09:10 PM Support #41402 (Rejected): OSD's memory are beyound controlled
Please seek help on the ceph-users mailing list. This is not the correct forum to seek support. Patrick Donnelly
09:10 PM Documentation #41403 (Pending Backport): doc: mon_health_to_clog_* values flipped
Patrick Donnelly
09:09 PM Documentation #41403 (Resolved): doc: mon_health_to_clog_* values flipped
Patrick Donnelly
09:08 PM Documentation #41403 (Fix Under Review): doc: mon_health_to_clog_* values flipped
Patrick Donnelly
09:06 PM Bug #41406 (Need More Info): common: SafeTimer reinit doesn't fix up "stopping" bool, used in Mon...
Does any code actually do that sequence of events? I would think a SafeTimer should not be re-inited after shutdown. Patrick Donnelly
08:41 PM Bug #37775: some pg_created messages not sent to mon
The original bug is about a pool level flag - "FLAG_CREATING", which was introduced in 0e526b467af2699e389e7f28a6d709... Neha Ojha
08:40 PM Backport #39475: mimic: segv in fgets() in collect_sys_info reading /proc/cpuinfo
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28206
m...
Nathan Cutler
08:38 PM Backport #40651: mimic: os/bluestore: fix >2GB writes
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28967
m...
Nathan Cutler
08:37 PM Bug #40720 (Resolved): mimic, nautilus: make bitmap allocator the default allocator for bluestore
Nathan Cutler
08:35 PM Backport #38751: mimic: should report EINVAL in ErasureCode::parse() if m<=0
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28995
m...
Nathan Cutler
08:35 PM Backport #39513: mimic: osd: segv in _preboot -> heartbeat
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28220
m...
Nathan Cutler
08:28 PM Backport #39311: mimic: crushtool crash on Fedora 28 and newer
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27986
m...
Nathan Cutler
08:28 PM Backport #39720: mimic: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28089
m...
Nathan Cutler
08:28 PM Backport #39374: mimic: ceph tell osd.xx bench help : gives wrong help
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28097
m...
Nathan Cutler
08:28 PM Backport #39422: mimic: Don't mark removed osds in when running "ceph osd in any|all|*"
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28142
m...
Nathan Cutler
08:27 PM Backport #38341: mimic: pg stuck in backfill_wait with plenty of disk space
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28201
m...
Nathan Cutler
08:17 PM Backport #40639: mimic: osd: report omap/data/metadata usage
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28852
m...
Nathan Cutler
07:58 PM Bug #41399 (Pending Backport): Move bluefs alloc size initialization log message to log level 1
Neha Ojha
07:33 PM Bug #38827 (Pending Backport): valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWir...
seeing this in the rgw suite for nautilus runs, so tagging for backport of https://github.com/ceph/ceph/pull/28305 Casey Bodley
02:59 PM Backport #41503 (Resolved): nautilus: Warning about past_interval bounds on deleting pg
https://github.com/ceph/ceph/pull/30000 Nathan Cutler
02:59 PM Backport #41502 (Resolved): mimic: Warning about past_interval bounds on deleting pg
https://github.com/ceph/ceph/pull/30222 Nathan Cutler
02:59 PM Backport #41501 (Resolved): nautilus: backfill_toofull while OSDs are not full (Unneccessary HEAL...
https://github.com/ceph/ceph/pull/29999 Nathan Cutler
02:58 PM Backport #41500 (Rejected): luminous: backfill_toofull while OSDs are not full (Unneccessary HEAL...
Nathan Cutler
02:58 PM Backport #41499 (Rejected): mimic: backfill_toofull while OSDs are not full (Unneccessary HEALTH_...
Nathan Cutler
02:51 PM Backport #41491 (Resolved): nautilus: OSDCap.PoolClassRNS test aborts
https://github.com/ceph/ceph/pull/29998 Nathan Cutler
02:50 PM Backport #41490 (Resolved): mimic: OSDCap.PoolClassRNS test aborts
https://github.com/ceph/ceph/pull/30214 Nathan Cutler
02:42 PM Backport #41455 (Resolved): nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cg...
https://github.com/ceph/ceph/pull/29745 Nathan Cutler
02:41 PM Backport #41453 (Resolved): nautilus: mon: C_AckMarkedDown has not handled the Callback Arguments
https://github.com/ceph/ceph/pull/29997 Nathan Cutler
02:25 PM Backport #41449 (Resolved): mimic: mon: C_AckMarkedDown has not handled the Callback Arguments
https://github.com/ceph/ceph/pull/30213 Nathan Cutler
02:25 PM Backport #41448 (Resolved): nautilus: osd/PrimaryLogPG: Access destroyed references in finish_deg...
https://github.com/ceph/ceph/pull/29994 Nathan Cutler
02:25 PM Backport #41447 (Resolved): mimic: osd/PrimaryLogPG: Access destroyed references in finish_degrad...
https://github.com/ceph/ceph/pull/30291 Nathan Cutler
11:23 AM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
With thanks to Paul Emmerich in https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QGY75UVQEAT2SUHHKZC2K... Florian Haas
10:59 AM Backport #39698: mimic: OSD down on snaptrim.
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28202
m...
Nathan Cutler
10:57 AM Backport #39518: mimic: snaps missing in mapper, should be: ca was r -2...repaired
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28232
m...
Nathan Cutler
10:56 AM Backport #39538: mimic: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28259
m...
Nathan Cutler
10:56 AM Backport #39737: mimic: Binary data in OSD log from "CRC header" message
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28503
m...
Nathan Cutler
10:56 AM Backport #39744: mimic: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28540
m...
Nathan Cutler
09:06 AM Bug #41429: Incorrect logical operator in Monitor::handle_auth_request()
“&&” in the following code snippet:... yupeng chen
08:48 AM Bug #41429 (Resolved): Incorrect logical operator in Monitor::handle_auth_request()
When checking auth_mode against AUTH_MODE_MON and AUTH_MODE_MON_MAX in Monitor::handle_auth_request(),
a logical AND...
yupeng chen
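Note on the range check above: the ticket text is truncated here, so the following is only a minimal sketch of the general pitfall, not the code from Monitor::handle_auth_request(). The constant values are assumptions chosen for illustration, and the helper names (in_mon_range, outside_mon_range, always_true, never_true, self_check) are hypothetical. It shows that an "inside the range" test needs a logical AND and an "outside the range" test needs a logical OR; swapping the operator makes the condition constant.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration only; the real constants live in the
// Ceph source tree and may differ.
constexpr std::uint32_t AUTH_MODE_MON = 10;
constexpr std::uint32_t AUTH_MODE_MON_MAX = 19;

// "Inside the range": both bounds must hold, so the operator is &&.
constexpr bool in_mon_range(std::uint32_t mode) {
  return mode >= AUTH_MODE_MON && mode <= AUTH_MODE_MON_MAX;
}

// "Outside the range": violating either bound is enough, so the operator is ||.
constexpr bool outside_mon_range(std::uint32_t mode) {
  return mode < AUTH_MODE_MON || mode > AUTH_MODE_MON_MAX;
}

// Mistaken variant of the inside test: with || every value satisfies
// at least one of the two comparisons, so this is always true.
constexpr bool always_true(std::uint32_t mode) {
  return mode >= AUTH_MODE_MON || mode <= AUTH_MODE_MON_MAX;
}

// Mistaken variant of the outside test: no value is both below the lower
// bound and above the upper bound, so this is never true.
constexpr bool never_true(std::uint32_t mode) {
  return mode < AUTH_MODE_MON && mode > AUTH_MODE_MON_MAX;
}

// Small runtime self-check of the two correct predicates at the boundaries.
inline void self_check() {
  assert(in_mon_range(AUTH_MODE_MON));
  assert(in_mon_range(AUTH_MODE_MON_MAX));
  assert(outside_mon_range(AUTH_MODE_MON - 1));
  assert(outside_mon_range(AUTH_MODE_MON_MAX + 1));
}
```

A check written with the wrong operator compiles cleanly and silently accepts or rejects every mode, which is why this kind of bug tends to be found by code reading rather than by tests.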
08:59 AM Backport #40948 (Resolved): nautilus: Better default value for osd_snap_trim_sleep
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29678
m...
Nathan Cutler
08:48 AM Backport #40885 (Resolved): nautilus: ceph mgr module ls -f plain crashes mon
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29566
m...
Nathan Cutler
08:33 AM Backport #40322: nautilus: nautilus with requrie_osd_release < nautilus cannot increase pg_num
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29671
m...
Nathan Cutler
07:21 AM Bug #41427 (Resolved): set-chunk raced with deep-scrub
which as a result causes object info inconsistency:
"2019-08-25T04:04:19.571852+0000 osd.1 (osd.1) 253 : cluster ...
xie xingguo

08/25/2019

01:44 PM Bug #41424 (Fix Under Review): readable.sh test fails
Kefu Chai
10:38 AM Bug #41424 (Resolved): readable.sh test fails
... Kefu Chai
03:18 AM Documentation #41403: doc: mon_health_to_clog_* values flipped
Verified on nautilus (ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)) that the defa... James McClune

08/24/2019

12:35 AM Bug #41156 (Won't Fix): dump_float() poor output
David Zafman

08/23/2019

11:45 PM Backport #24360: luminous: osd: leaked Session on osd.7
https://github.com/ceph/ceph/pull/29859 Samuel Just
08:31 AM Bug #41406 (New): common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient boot...
1. Create a SafeTimer object.
2. Call init.
3. Call add_event_after.
4. Call shutdown.
5. Call init again.
6. ...
haitao chen
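Note on the SafeTimer steps above: the report is truncated after step 5, so the sketch below assumes the next step is another add_event_after call. ToyTimer is a hypothetical, single-threaded stand-in and not Ceph's SafeTimer; the reset_on_init switch and the schedule_after_reinit helper are also invented for illustration. It only shows how a "stopping" flag that shutdown sets, and that init does not clear, can cause a re-initialised timer to refuse new events.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a timer with a "stopping" flag.
// NOT Ceph's SafeTimer: names and behaviour are assumptions.
class ToyTimer {
 public:
  explicit ToyTimer(bool reset_on_init) : reset_on_init_(reset_on_init) {}

  void init() {
    // With reset_on_init_ == false, a flag left behind by shutdown()
    // survives re-initialisation (the situation the ticket title describes).
    if (reset_on_init_) stopping_ = false;
  }

  // Returns false if the event is refused because the timer is stopping.
  bool add_event_after(double seconds, std::function<void()> cb) {
    if (stopping_) return false;
    events_.emplace_back(seconds, std::move(cb));
    return true;
  }

  void shutdown() {
    stopping_ = true;
    events_.clear();
  }

 private:
  bool reset_on_init_;
  bool stopping_ = false;
  std::vector<std::pair<double, std::function<void()>>> events_;
};

// Runs steps 1-5 from the report, then (assumed step 6) tries to schedule
// one more event and reports whether it was accepted.
inline bool schedule_after_reinit(bool reset_on_init) {
  ToyTimer t(reset_on_init);                          // step 1
  t.init();                                           // step 2
  bool first = t.add_event_after(1.0, [] {});         // step 3
  assert(first);                                      // a fresh timer accepts events
  t.shutdown();                                       // step 4
  t.init();                                           // step 5
  return t.add_event_after(1.0, [] {});               // assumed step 6
}
```

In this toy model, schedule_after_reinit(false) reports a refused event while schedule_after_reinit(true) reports an accepted one; whether the real class rejects, drops, or mishandles the event is not stated in the visible part of the ticket.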
05:30 AM Bug #39546 (Pending Backport): Warning about past_interval bounds on deleting pg
Kefu Chai
05:24 AM Bug #41217 (Pending Backport): mon: C_AckMarkedDown has not handled the Callback Arguments
Kefu Chai
05:17 AM Bug #40835 (Pending Backport): OSDCap.PoolClassRNS test aborts
Kefu Chai
02:39 AM Documentation #41403 (Resolved): doc: mon_health_to_clog_* values flipped
On my Luminous cluster (ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)), the defau... James McClune
01:34 AM Support #41402 (Rejected): OSD's memory are beyound controlled
My env :
[store@server01 ~]$ ceph -v
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
...
伟杰 谭

08/22/2019

07:31 PM Backport #40840 (In Progress): nautilus: Explicitly requested repair of an inconsistent PG cannot...
David Zafman
05:27 PM Bug #41353 (Resolved): scrub/osd-scrub-snaps.sh fails
Sage Weil
05:02 PM Bug #41399 (Fix Under Review): Move bluefs alloc size initialization log message to log level 1
Vikhyat Umrao
04:07 PM Bug #41399: Move bluefs alloc size initialization log message to log level 1
- At present, from a shared BlueStore OSD which has wal, db and block all in one it is being set as 64K we can see in... Vikhyat Umrao
04:05 PM Bug #41399 (Resolved): Move bluefs alloc size initialization log message to log level 1
- https://github.com/ceph/ceph/pull/29537... Vikhyat Umrao
04:57 PM Bug #41255 (In Progress): backfill_toofull seen on cluster where the most full OSD is at 1%
David Zafman
02:48 PM Bug #20050 (Resolved): osd: very old pg creates take a long time to build past_intervals
All of this code went away by mimic. Sage Weil
02:23 PM Backport #41238: nautilus: Implement mon_memory_target
Sridhar Seshasayee wrote:
> https://github.com/ceph/ceph/pull/29652
The above PR is dependent on the backport of ...
Sridhar Seshasayee
12:55 PM Bug #41236 (Fix Under Review): cosbench failures in rados/perf
https://github.com/ceph/cbt/pull/191 Kefu Chai
09:53 AM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
We just bumped into this on Luminous (12.2.12). It actually caused us a momentary loss of quorum.
Sequence of event...
Florian Haas
08:45 AM Documentation #41389 (Resolved): wrong datatype describing crush_rule
current documentation for luminous https://docs.ceph.com/docs/luminous/rados/operations/pools/ is wrong regarding cru... Torben Hørup
05:47 AM Bug #37654: FAILED ceph_assert(info.history.same_interval_since != 0) in PG::start_peering_interv...
http://pulpito.ceph.com/xxg-2019-08-21_09:03:35-rados:thrash-wip-scrub-omap-error-distro-basic-smithi/4236636/ xie xingguo
05:41 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
I had a chance to get back to this.
I fuse mounted the uploaded image and copied the osdmap data for epoch 80890 o...
Brad Hubbard

08/21/2019

10:25 PM Bug #40792 (Fix Under Review): monc: send_command to specific down mon breaks other mon msgs
Updated for a few issues and marked the PR for testing again. Greg Farnum
08:21 PM Bug #40792 (In Progress): monc: send_command to specific down mon breaks other mon msgs
Greg Farnum
09:47 PM Bug #24531: Mimic MONs have slow/long running ops
Greg Farnum
09:20 PM Bug #40073 (Resolved): PG scrub stamps reset to 0.000000
David Zafman
09:18 PM Bug #39570 (Resolved): nautilus with requrie_osd_release < nautilus cannot increase pg_num
Greg Farnum
09:18 PM Backport #40322 (Resolved): nautilus: nautilus with requrie_osd_release < nautilus cannot increas...
Greg Farnum
09:18 PM Bug #39972 (Resolved): librados 'buffer::create' and related functions are not exported in C++ API
Greg Farnum
09:17 PM Backport #24360 (In Progress): luminous: osd: leaked Session on osd.7
Samuel Just
08:45 PM Backport #24360 (New): luminous: osd: leaked Session on osd.7
Meh, actually probably is. Samuel Just
08:40 PM Backport #24360 (Rejected): luminous: osd: leaked Session on osd.7
Not worth backporting to luminous. Samuel Just
09:16 PM Backport #39506 (Rejected): mimic: Give recovery for inactive PGs a higher priority
David Zafman
09:16 PM Backport #39505 (Rejected): luminous: Give recovery for inactive PGs a higher priority
David Zafman
09:16 PM Bug #39484 (Resolved): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
Greg Farnum
09:16 PM Bug #39099 (Resolved): Give recovery for inactive PGs a higher priority
David Zafman
09:13 PM Backport #39518 (Resolved): mimic: snaps missing in mapper, should be: ca was r -2...repaired
David Zafman
09:12 PM Bug #39333 (Resolved): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
Greg Farnum
09:10 PM Bug #37439 (Resolved): Degraded PG does not discover remapped data on originating OSD
Greg Farnum
09:10 PM Backport #39431 (Resolved): luminous: Degraded PG does not discover remapped data on originating OSD
Greg Farnum
09:08 PM Bug #38359 (Resolved): osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Greg Farnum
09:08 PM Backport #38442 (Resolved): luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Greg Farnum
09:00 PM Documentation #23999 (Resolved): osd_recovery_priority is not documented (but osd_recovery_op_pri...
Greg Farnum
09:00 PM Backport #38567 (Resolved): luminous: osd_recovery_priority is not documented (but osd_recovery_o...
Greg Farnum
08:58 PM Bug #38432 (Resolved): ENOENT on setattrs (obj was recently deleted)
David Zafman
08:57 PM Backport #38507 (Resolved): mimic: ENOENT on setattrs (obj was recently deleted)
David Zafman
08:53 PM Bug #21142 (Won't Fix): OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
If this pops up and causes more trouble we may try again but given the efforts so far it seems like we aren't going t... Greg Farnum
08:52 PM Backport #38256 (Duplicate): luminous: OSD crashes when loading pgs with "FAILED assert(interval....
The original issue #21142 is a luminous-only bug report and there's no code fixing it yet. Greg Farnum
08:44 PM Bug #24174 (Resolved): PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
David Zafman
08:39 PM Backport #23926: luminous: disable bluestore cache caused a rocksdb error
We need to discuss if this is worth backporting any more; it may not be but Kefu can probably talk to the right people? Greg Farnum
08:37 PM Bug #18746 (Resolved): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+k...
Already backported to luminous. Samuel Just
08:33 PM Bug #21629 (Resolved): interval_map.h: 161: FAILED assert(len > 0)
Greg Farnum
08:32 PM Bug #21127 (Resolved): qa/standalone/scrub/osd-scrub-repair.sh timeout
Greg Farnum
08:18 PM Bug #41383 (Need More Info): scrub object count mismatch on device_health_metrics pool
Greg Farnum
08:18 PM Bug #41383: scrub object count mismatch on device_health_metrics pool
This may be the empty object names that the device health manager was inappropriately creating? See the thread "[ceph... Greg Farnum
07:04 PM Bug #41383 (Resolved): scrub object count mismatch on device_health_metrics pool
jenglisch on irc reports multiple scrub errors (error, repaired, reappeared a few days later) on metrics pool.
<pr...
Sage Weil
08:11 PM Bug #41200 (Pending Backport): osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup me...
Josh Durgin
07:56 PM Bug #39286 (Pending Backport): primary recovery local missing object did not update obc
Greg Farnum
07:52 PM Bug #38649 (Can't reproduce): [ERR] full status failsafe engaged, dropping updates, now -21474836...
Greg Farnum
07:51 PM Bug #38402: ceph-objectstore-tool on down osd w/ not enough in osds
We think it just needs test fixing. Those in the rados suite test review group can see https://docs.google.com/docume... Greg Farnum
07:49 PM Bug #41385 (Resolved): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(from...
... Sage Weil
07:45 PM Bug #38322 (Fix Under Review): luminous: mons do not trim maps until restarted
Neha Ojha
07:44 PM Bug #40367: "*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-nautilus
same thing upgrading from mimic:
/a/sage-2019-08-21_15:17:39-rados-wip-sage2-testing-2019-08-20-0935-distro-basic-...
Sage Weil
07:31 PM Bug #38023 (Closed): segv on FileJournal::prepare_entry in bufferlist
Seems to have been resolved alongside those related tickets? Greg Farnum
07:30 PM Bug #37808 (Can't reproduce): osd: osdmap cache weak_refs assert during shutdown
Greg Farnum
07:28 PM Bug #37798 (Can't reproduce): ceph-objectstore-tool crash from finisher
David Zafman
07:27 PM Bug #37786 (Can't reproduce): test fails in mon/crush_ops.sh
Greg Farnum
05:06 PM Backport #41084: nautilus: Change default for bluestore_fsck_on_mount_deep as false
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29697
m...
Nathan Cutler
05:04 PM Backport #40537: nautilus: osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29372
m...
Nathan Cutler
04:59 PM Backport #40942: nautilus: mon/OSDMonitor.cc: better error message about min_size
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29617
m...
Nathan Cutler
04:58 PM Backport #40940: nautilus: Update rocksdb to v6.1.2
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29440
m...
Nathan Cutler
04:57 PM Backport #41092: nautilus: rocksdb: enable rocksdb_rmrange=true by default and make delete range ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29439
m...
Nathan Cutler
02:53 PM Bug #41353: scrub/osd-scrub-snaps.sh fails
David Zafman
02:39 PM Backport #39516 (Resolved): nautilus: osd-backfill-space.sh test failed in TEST_backfill_multi_pa...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28187
m...
Nathan Cutler
02:38 PM Backport #40625: nautilus: OSDs get killed by OOM due to a broken switch
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29391
m...
Nathan Cutler
02:37 PM Bug #41052: nautilus: cbt cosbench workloads failing in rados/perf suite
https://github.com/ceph/ceph/pull/29453
merge commit 59177f780c5be0e6530df2fdba1abfa6e3187569 (v14.2.2-230-g59177f780c)
Nathan Cutler
02:36 PM Backport #40180 (Resolved): nautilus: qa/standalone/scrub/osd-scrub-snaps.sh sometimes fails
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29252
m...
Nathan Cutler
02:35 PM Backport #40465 (Resolved): nautilus: osd beacon sometimes has empty pg list
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29254
m...
Nathan Cutler
02:35 PM Backport #39743 (Resolved): nautilus: mon: "FAILED assert(pending_finishers.empty())" when paxos ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28528
m...
Nathan Cutler
02:34 PM Backport #40382: nautilus: RuntimeError: expected MON_CLOCK_SKEW but got none
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28576
m...
Nathan Cutler
02:32 PM Backport #40274 (Resolved): nautilus: librados 'buffer::create' and related functions are not exp...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29244
m...
Nathan Cutler
02:25 PM Backport #40667: nautilus: PG scrub stamps reset to 0.000000
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28869
m...
Nathan Cutler
02:24 PM Backport #40730 (Resolved): nautilus: mon: auth mon isn't loading full KeyServerData after restart
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28993
m...
Nathan Cutler
02:24 PM Backport #39693 (Resolved): nautilus: _txc_add_transaction error (39) Directory not empty not han...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29115
m...
Nathan Cutler
07:53 AM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
Not to my knowledge, but I haven't checked in a while. Ilya Dryomov
07:33 AM Bug #22233: prime_pg_temp breaks on uncreated pgs
> the mon should see the pg mapping change from [3,6] to [4,6] and send the create to osd.4
exactly. that's why i ...
Kefu Chai
01:16 AM Feature #41363 (New): Allow user to cancel scrub requests

If a user requests multiple scrubs or deep-scrubs, they should be able to cancel the requests. It may be that they...
David Zafman
01:02 AM Bug #41362 (Resolved): Rados bench sequential and random read: not behaving as expected when op s...
ObjBencher::seq_read_bench() is using "num_objects > data.started" to make sure
we don't issue more reads than what ...
Albert Chen
 
