Activity
From 08/28/2019 to 09/26/2019
09/26/2019
- 11:04 PM Backport #41960 (In Progress): nautilus: tools/rados: add --pgid in help
- 10:53 PM Backport #41963 (In Progress): nautilus: Segmentation fault in rados ls when using --pgid and --p...
- 10:48 AM Bug #42060: Slow ops seen when one ceph private interface is shut down
- Hi,
When I mention the private network I am referring to the cluster_network.
- 10:30 AM Bug #42060 (Need More Info): Slow ops seen when one ceph private interface is shut down
- Environment -
5 node Nautilus cluster
67 OSDs per node - 4TB HDD per OSD
We are trying a use case where we shut...
- 08:53 AM Bug #42058 (Duplicate): OSD reconnected across map epochs, inconsistent pg logs created
- Get the lossless cluster connection between osd.2 and osd.47 for example.
When osd.47 is restarted and at the same...
- 08:37 AM Bug #40035: smoke.sh failing in jenkins "make check" test randomly
- In addition to what Laura reported, it must be said that this failure is seen in jenkins job only
when running the j...
- 08:26 AM Bug #40035: smoke.sh failing in jenkins "make check" test randomly
- Kefu Chai wrote:
> [...]
>
> see https://jenkins.ceph.com/job/ceph-pull-requests/817/console
>
> i tried to re...
- 03:21 AM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
This shows the send on osd.0 and receive at osd.6. ...
- 02:52 AM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
- This shows the front and back interface. I don't know which is which, but it already sent the second interface maybe...
- 02:32 AM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
I confused the front and back interface with a retransmit. The ports are the 2 interfaces.
-At the ping receivi...
09/25/2019
- 11:41 PM Bug #41924 (Fix Under Review): asynchronous recovery can not function under certain circumstances
- 09:27 PM Bug #41924: asynchronous recovery can not function under certain circumstances
- 09:46 PM Bug #41874 (Resolved): mon-osdmap-prune.sh fails
- 09:45 PM Bug #41873 (Resolved): test-erasure-code.sh fails
- 09:28 PM Bug #41939 (Need More Info): Scaling with unfound options might leave PGs in state "unknown"
- 09:28 PM Bug #41939: Scaling with unfound options might leave PGs in state "unknown"
- How are we ending up in this state? What were the previous states of those PGs?
- 09:24 PM Bug #41943 (Need More Info): ceph-mgr fails to report OSD status correctly
- Do you have any other information from that OSD while this happened?
- 09:22 PM Bug #41943: ceph-mgr fails to report OSD status correctly
- Sounds like this OSD was somehow up enough that it responded to peer heartbeats, but was not processing any client re...
- 09:03 PM Bug #41908 (Resolved): TMAPUP operation results in OSD assertion failure
- 12:11 PM Bug #42052 (Resolved): mgr/balancer FAILED ceph_assert(osd_weight.count(i.first))
- > OSDMap.cc: 4603: FAILED ceph_assert(osd_weight.count(i.first))
>
> ceph version v15.0.0-5429-gac828d7 (ac828d732...
- 10:50 AM Bug #41866 (Fix Under Review): OSD cannot report slow operation warnings in time.
- 10:49 AM Bug #41866: OSD cannot report slow operation warnings in time.
- 08:26 AM Backport #41921 (In Progress): nautilus: OSDMonitor: missing `pool_id` field in `osd pool ls` com...
- https://github.com/ceph/ceph/pull/30568
09/24/2019
- 09:50 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- Bumping priority based on community feedback.
- 07:53 PM Backport #42037 (Resolved): luminous: Enable auto-scaler and get src/osd/PeeringState.cc:3671: fa...
- https://github.com/ceph/ceph/pull/30896
- 07:52 PM Backport #42036 (Resolved): mimic: Enable auto-scaler and get src/osd/PeeringState.cc:3671: faile...
- https://github.com/ceph/ceph/pull/30895
- 04:11 PM Bug #41946: cbt perf test fails due to leftover in /home/ubuntu/cephtest
- the log files were created by cosbench. see https://github.com/intel-cloud/cosbench/blob/ca68b333e85c51829ea68f203877...
- 12:19 PM Backport #41922 (In Progress): mimic: OSDMonitor: missing `pool_id` field in `osd pool ls` command
- https://github.com/ceph/ceph/pull/30547
- 12:15 PM Backport #41917 (In Progress): nautilus: osd: failure result of do_osd_ops not logged in prepare_...
- https://github.com/ceph/ceph/pull/30546
09/23/2019
- 09:33 PM Bug #42015 (In Progress): Remove unused full and nearful output from OSDMap summary
- 09:27 PM Bug #42015 (Resolved): Remove unused full and nearful output from OSDMap summary
in OSDMap::print_oneline_summary() and OSDMap::print_summary() (CEPH_OSDMAP_FULL and CEPH_OSDMAP_NEARFULL checks)
- 08:41 PM Backport #42014 (In Progress): nautilus: Enable auto-scaler and get src/osd/PeeringState.cc:3671:...
- 08:35 PM Backport #42014 (Resolved): nautilus: Enable auto-scaler and get src/osd/PeeringState.cc:3671: fa...
- https://github.com/ceph/ceph/pull/30528
- 07:42 PM Feature #41647 (Fix Under Review): pg_autoscaler should show a warning if pg_num isn't a power of...
- 07:20 PM Bug #42012: mon osd_snap keys grow unbounded
- This is (mostly) fixed in master by https://github.com/ceph/ceph/pull/30518. There is still one set of per-epoch key...
- 03:41 PM Bug #42012: mon osd_snap keys grow unbounded
- Link to the full "dump-keys | grep osd_snap"
https://wustl.box.com/s/3r7bgv32hs5hw4jmgmywbo9qvqrqsmwn
- 03:26 PM Bug #42012 (Resolved): mon osd_snap keys grow unbounded
- ...
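A hedged sketch of the kind of inspection mentioned above ("dump-keys | grep osd_snap"); the store path is illustrative and the command should be run against a stopped mon or a copy of its store:
  ceph-monstore-tool /var/lib/ceph/mon/ceph-a dump-keys | grep osd_snap | wc -l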
- 07:19 PM Bug #41680: Removed OSDs with outstanding peer failure reports crash the monitor
- 05:09 PM Bug #41944: inconsistent pool count in ceph -s output
- Is this after pools are deleted? In that case, it's #40011
- 04:27 PM Backport #41864 (In Progress): luminous: Mimic MONs have slow/long running ops
- 02:27 PM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
- Still ongoing here, with mimic too. On one 13.2.6 cluster we have this, for example:...
- 02:12 PM Bug #41816 (Pending Backport): Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed as...
- 09:02 AM Backport #41964 (Resolved): mimic: Segmentation fault in rados ls when using --pgid and --pool/-p...
- https://github.com/ceph/ceph/pull/30893
- 09:02 AM Backport #41963 (Resolved): nautilus: Segmentation fault in rados ls when using --pgid and --pool...
- https://github.com/ceph/ceph/pull/30605
- 09:02 AM Backport #41962 (Resolved): luminous: Segmentation fault in rados ls when using --pgid and --pool...
- 09:02 AM Backport #41961 (Resolved): mimic: tools/rados: add --pgid in help
- https://github.com/ceph/ceph/pull/30893
- 09:02 AM Backport #41960 (Resolved): nautilus: tools/rados: add --pgid in help
- https://github.com/ceph/ceph/pull/30607
- 09:02 AM Backport #41959 (Resolved): luminous: tools/rados: add --pgid in help
- https://github.com/ceph/ceph/pull/30608
- 09:02 AM Backport #41958 (Resolved): nautilus: scrub errors after quick split/merge cycle
- https://github.com/ceph/ceph/pull/30643
09/22/2019
- 10:12 PM Cleanup #41876 (Pending Backport): tools/rados: add --pgid in help
- 11:55 AM Bug #41950 (Can't reproduce): crimson compile
- Can I ask which version of the Seastar code crimson uses in the ceph-15 version?
When compiling, the output includes the following option:
<...
- 04:12 AM Bug #41936 (Pending Backport): scrub errors after quick split/merge cycle
- 03:45 AM Bug #41946: cbt perf test fails due to leftover in /home/ubuntu/cephtest
- ...
- 02:09 AM Bug #41946 (Duplicate): cbt perf test fails due to leftover in /home/ubuntu/cephtest
- ...
- 03:42 AM Bug #41875 (Pending Backport): Segmentation fault in rados ls when using --pgid and --pool/-p tog...
09/20/2019
- 09:01 PM Bug #41156 (Rejected): dump_float() poor output
- 08:47 PM Bug #41817 (Closed): qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
- 07:17 PM Bug #41913 (Fix Under Review): With auto scaler operating stopping an OSD can lead to COT crashin...
- The real bug here is that the pg split, so the pgid specified to COT is wrong. The attached PR adds a check in COT ...
- 06:22 PM Bug #41944 (Resolved): inconsistent pool count in ceph -s output
- ...
- 06:08 PM Bug #41816 (Fix Under Review): Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed as...
- 05:36 PM Bug #41816: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert info.last_comp...
- The complete_to pointer is already at log end before recover_got() is called. I think it's because during split() we ...
- 04:35 PM Bug #41943 (Closed): ceph-mgr fails to report OSD status correctly
- After an inexplicable cluster event that resulted in around 10% of our OSDs falsely reported down (and shortly after ...
- 12:47 PM Bug #41834: qa: EC Pool configuration and slow op warnings for OSDs caused by recent master changes
- Might as well add some RBD failures while piling on:
http://pulpito.ceph.com/trociny-2019-09-19_12:41:57-rbd-wip-m...
- 02:13 AM Bug #41939 (Need More Info): Scaling with unfound options might leave PGs in state "unknown"
With osd_pool_default_pg_autoscale_mode="on"
../qa/run-standalone.sh TEST_rep_recovery_unfound
The test failu...
- 01:59 AM Backport #41863 (In Progress): mimic: Mimic MONs have slow/long running ops
- https://github.com/ceph/ceph/pull/30481
- 01:57 AM Backport #41862 (In Progress): nautilus: Mimic MONs have slow/long running ops
- https://github.com/ceph/ceph/pull/30480
09/19/2019
- 11:12 PM Bug #41817: qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
- The fix for this particular issue is to just disable the auto scaler, because it only causes a hang in the test but no cr...
- 10:59 PM Bug #41923: 3 different ceph-osd asserts caused by enabling auto-scaler
I think this stack better reflects the thread that hit the suicide timeout. However, every time I've seen this thre...
- 09:41 PM Bug #41923: 3 different ceph-osd asserts caused by enabling auto-scaler
Looking at the assert(op.hinfo), it is caused by the corruption injected by the test. I'll verify that the asserts are...
- 12:05 AM Bug #41923 (Can't reproduce): 3 different ceph-osd asserts caused by enabling auto-scaler
Change config osd_pool_default_pg_autoscale_mode to "on"
Saw these 4 core dumps on 3 different sub-tests.
../...
- 04:51 PM Bug #41936 (Fix Under Review): scrub errors after quick split/merge cycle
- 04:51 PM Bug #41936 (Resolved): scrub errors after quick split/merge cycle
- PGs split and then merge soon after. There is a pg stat scrub mismatch.
- 04:48 PM Bug #41834: qa: EC Pool configuration and slow op warnings for OSDs caused by recent master changes
- This shows up in rgw's ec pool tests also. In osd logs, I see slow ops on MOSDECSubOpRead/Reply messages, and they al...
- 09:32 AM Feature #41647: pg_autoscaler should show a warning if pg_num isn't a power of two
- Note: contrary to what the bug description says, pg_autoscaler will (apparently) *not* be automatically turned on wit...
- 01:56 AM Bug #41924 (Resolved): asynchronous recovery can not function under certain circumstances
- guoracle reports that:
> In the asynchronous recovery feature,
> the asynchronous recovery target OSD is selected ...
- 01:39 AM Bug #41866: OSD cannot report slow operation warnings in time.
- *report_callback* thread is also blocked on PG::lock with MGRClient::lock locked while getting the pg stats. This in ...
- 12:54 AM Bug #41816: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert info.last_comp...
This can be reproduced by setting config osd_pool_default_pg_autoscale_mode="on" and executing this test:
../qa/...
- 12:29 AM Bug #41754: Use dump_stream() instead of dump_float() for floats where max precision isn't helpful
I was suspicious that the trailing 0999999994 in the elapsed time is noise. Could this be caused by a float being...
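A quick sketch of the suspected effect (values are made up, not taken from the run): a decimal such as 0.95 has no exact binary representation, so formatting it at maximum precision exposes rounding noise that default precision would hide.
  printf '%.17g\n' 0.95   # 0.94999999999999996 -- full-precision output, dump_float()-style
  printf '%.6g\n' 0.95    # 0.95                -- default precision, dump_stream()-style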
09/18/2019
- 06:33 PM Backport #41922 (Resolved): mimic: OSDMonitor: missing `pool_id` field in `osd pool ls` command
- https://github.com/ceph/ceph/pull/30485
- 06:33 PM Backport #41921 (Resolved): nautilus: OSDMonitor: missing `pool_id` field in `osd pool ls` command
- https://github.com/ceph/ceph/pull/30486
- 06:31 PM Backport #41920 (Resolved): nautilus: osd: scrub error on big objects; make bluestore refuse to s...
- https://github.com/ceph/ceph/pull/30783
- 06:31 PM Backport #41919 (Resolved): luminous: osd: scrub error on big objects; make bluestore refuse to s...
- https://github.com/ceph/ceph/pull/30785
- 06:31 PM Backport #41918 (Resolved): mimic: osd: scrub error on big objects; make bluestore refuse to star...
- https://github.com/ceph/ceph/pull/30784
- 06:31 PM Backport #41917 (Resolved): nautilus: osd: failure result of do_osd_ops not logged in prepare_tra...
- https://github.com/ceph/ceph/pull/30546
- 04:25 PM Bug #41900 (Resolved): auto-scaler breaks many standalone tests
- 03:38 PM Bug #41913 (Resolved): With auto scaler operating stopping an OSD can lead to COT crashing instea...
- ...
- 03:03 PM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
- Answering myself - seems that rbd_support cannot be disabled anyway
# ceph mgr module disable rbd_support
Error E...
- 10:59 AM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
- I don't believe this command was running at that time, however "rbd_support" mgr module was active. Could this be the...
- 10:53 AM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
- Marcin, I believe I know the cause and I am now discussing the fix [1]. A workaround could be not to use "rbd perf im...
- 10:13 AM Bug #41891 (Fix Under Review): global osd crash in DynamicPerfStats::add_to_reports
- 06:24 AM Bug #41891 (In Progress): global osd crash in DynamicPerfStats::add_to_reports
- 01:55 PM Bug #41908 (Fix Under Review): TMAPUP operation results in OSD assertion failure
- 01:47 PM Bug #41908 (Resolved): TMAPUP operation results in OSD assertion failure
- In 'do_tmapup', the object is READ into a 'newop' structure and then when it is re-written, the same 'newop' structur...
- 10:52 AM Bug #41677: Cephmon:fix mon crash
- @shuguang what is the exact version of ceph-mon? I cannot match the backtrace with the source code of master HEAD.
- 09:46 AM Feature #41905 (New): Add ability to change fsid of cluster
- There is a case where you want to change the fsid of a cluster: when you have split a cluster into two different c...
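For reference, a sketch of where the fsid currently surfaces (paths and ids are illustrative); there is no supported way to change it, which is what this feature requests:
  ceph fsid                                # cluster fsid as reported by the monitors
  grep fsid /etc/ceph/ceph.conf            # fsid pinned in the config file
  cat /var/lib/ceph/osd/ceph-0/ceph_fsid   # cluster fsid recorded in an OSD data directory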
09/17/2019
- 09:50 PM Bug #41900 (Resolved): auto-scaler breaks many standalone tests
Caused by https://github.com/ceph/ceph/pull/30112
In some cases I had to kill processes to get past hung tests. ...
- 08:46 PM Bug #41816: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert info.last_comp...
- This crash didn't reproduce for me using run-standalone.sh with the auto scaler turned off.
- 08:35 PM Bug #40287 (Pending Backport): OSDMonitor: missing `pool_id` field in `osd pool ls` command
- 08:30 PM Bug #41191 (Pending Backport): osd: scrub error on big objects; make bluestore refuse to start on...
- 08:29 PM Bug #41210 (Pending Backport): osd: failure result of do_osd_ops not logged in prepare_transactio...
- @shuguang wang did you want this to be backported to a release older than nautilus?
- 06:59 PM Bug #41336: All OSD Faild after Reboot.
- Hi,
two questions:
- How to find out if a pool is affected?
"ceph osd erasure-code-profile get" does not list... - 05:04 PM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
- Yes, I use "rbd perf image iotop/iostat" (one of the reasons for upgrade:-) ). Not exporting per image data with prom...
- 03:51 PM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
- Marcin, are you using `rbd perf image iotop|iostat` commands? Or may be prometheus mgr module with rbd per image stat...
- 01:49 PM Bug #41891: global osd crash in DynamicPerfStats::add_to_reports
- As crash seems to be related to stats reporting - don't know if it is related, but it was soon after eliminating "Leg...
- 10:30 AM Bug #41891 (Resolved): global osd crash in DynamicPerfStats::add_to_reports
- Hi,
during routine host maintenance, I've encountered massive osd crash across entire cluster. The sequence of event... - 01:19 PM Feature #40420 (Need More Info): Introduce an ceph.conf option to disable HEALTH_WARN when nodeep...
- https://github.com/ceph/ceph/pull/29422 has been merged, but not yet backported
- 08:05 AM Bug #41754: Use dump_stream() instead of dump_float() for floats where max precision isn't helpful
- Regarding elapsed time, it might be important (for `compact` it is not, but for benchmarking it is). Another important thi...
- 06:15 AM Backport #41238 (In Progress): nautilus: Implement mon_memory_target
09/16/2019
- 10:10 PM Cleanup #41876 (Fix Under Review): tools/rados: add --pgid in help
- 10:09 PM Cleanup #41876 (Resolved): tools/rados: add --pgid in help
- 09:39 PM Bug #41817 (In Progress): qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
- This is likely caused by enabling the auto scaler.
- 03:27 PM Bug #41817: qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
- /a/kchai-2019-09-15_15:37:26-rados-wip-kefu-testing-2019-09-15-1533-distro-basic-mira/4311115/
/a/pdonnell-2019-09-1... - 08:05 PM Bug #41875 (Fix Under Review): Segmentation fault in rados ls when using --pgid and --pool/-p tog...
- 07:55 PM Bug #41875 (Resolved): Segmentation fault in rados ls when using --pgid and --pool/-p together as...
- - Works fine with only --pgid...
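For context, a minimal sketch of the two invocations being compared (pool and PG ids are illustrative, not taken from the report):
  rados ls --pgid 1.0          # works: list objects in one PG, pool derived from the pgid
  rados ls --pgid 1.0 -p rbd   # reported to segfault when the pool is also given explicitly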
- 07:57 PM Bug #41816: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert info.last_comp...
- Reproduced with logs: /a/nojha-2019-09-13_21:45:51-rados:standalone-master-distro-basic-smithi/4304313/remote/smithi1...
- 03:25 PM Bug #40522: on_local_recover doesn't touch?
- /a/pdonnell-2019-09-14_22:40:03-rados-master-distro-basic-smithi/4307679/
/a/kchai-2019-09-15_15:37:26-rados-wip-kef... - 03:23 PM Bug #41874 (Resolved): mon-osdmap-prune.sh fails
- ...
- 03:19 PM Bug #41873 (Resolved): test-erasure-code.sh fails
- ...
- 01:46 PM Backport #41238: nautilus: Implement mon_memory_target
- The old PR is unlinked from the tracker as more commits need to be pulled in for this backport. I will update this tr...
- 01:04 PM Backport #41238 (Need More Info): nautilus: Implement mon_memory_target
- first attempted backport https://github.com/ceph/ceph/pull/29652 was closed - apparently, the backport is not trivial...
- 01:23 PM Backport #40993: mimic: Ceph status in some cases does not report slow ops
- just for completeness - the mimic fix is (I think): https://github.com/ceph/ceph/pull/30391
- 10:39 AM Bug #41866: OSD cannot report slow operation warnings in time.
- assumed that bluestore is used.
- 10:23 AM Bug #41866 (Fix Under Review): OSD cannot report slow operation warnings in time.
- If an underlying device is blocked due to H/W issues, a thread that checks slow ops can’t report slow op warning in t...
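Related per-daemon introspection, sketched with an illustrative OSD id; these admin socket queries read the same op-tracker state that feeds the delayed cluster-level warning described above:
  ceph daemon osd.0 dump_ops_in_flight       # ops currently blocked on this OSD
  ceph daemon osd.0 dump_historic_slow_ops   # recently completed slow ops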
- 07:21 AM Backport #41864 (Resolved): luminous: Mimic MONs have slow/long running ops
- https://github.com/ceph/ceph/pull/30519
- 07:21 AM Backport #41863 (Resolved): mimic: Mimic MONs have slow/long running ops
- https://github.com/ceph/ceph/pull/30481
- 07:21 AM Backport #41862 (Resolved): nautilus: Mimic MONs have slow/long running ops
- https://github.com/ceph/ceph/pull/30480
- 07:14 AM Backport #41845 (Resolved): luminous: tools/rados: allow list objects in a specific pg in a pool
- https://github.com/ceph/ceph/pull/30608
- 07:14 AM Backport #41844 (Resolved): mimic: tools/rados: allow list objects in a specific pg in a pool
- https://github.com/ceph/ceph/pull/30893
09/15/2019
- 01:59 PM Bug #41716 (Resolved): LibRadosTwoPoolsPP.ManifestUnset fails
- 01:51 PM Bug #41716: LibRadosTwoPoolsPP.ManifestUnset fails
- This issue is fixed by https://github.com/ceph/ceph/pull/29985
When the error occurs, the following ops are executed... - 03:05 AM Bug #41834 (Resolved): qa: EC Pool configuration and slow op warnings for OSDs caused by recent m...
- See: http://pulpito.ceph.com/pdonnell-2019-09-14_22:39:31-fs-master-distro-basic-smithi/
Recent run of fs suite on...
09/13/2019
- 10:29 PM Feature #41831 (Resolved): tools/rados: allow list objects in a specific pg in a pool
- This one is already present in nautilus.
- 04:41 PM Bug #41817: qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
- David, can you please take a look at this whenever you get a chance.
- 01:31 PM Bug #41817 (Closed): qa/standalone/scrub/osd-recovery-scrub.sh timed out waiting for scrub
- ...
- 04:40 PM Bug #41816: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert info.last_comp...
- I'll try to see if I can reproduce this.
- 01:30 PM Bug #41816 (Resolved): Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert inf...
- ...
- 04:37 PM Bug #41735 (Resolved): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
- See https://tracker.ceph.com/issues/41735#note-3 and https://github.com/rook/rook/pull/3847/commits/11d3831d742639148...
- 04:29 PM Bug #24531 (Pending Backport): Mimic MONs have slow/long running ops
- 09:09 AM Backport #40993 (Rejected): mimic: Ceph status in some cases does not report slow ops
- backports will be pursued in https://tracker.ceph.com/issues/41741
- 07:54 AM Bug #41758 (Duplicate): Ceph status in some cases does not report slow ops
- 05:13 AM Feature #40420: Introduce an ceph.conf option to disable HEALTH_WARN when nodeep-scrub/scrub flag...
- What are the backport targets for this? I don't see a health mute tracker referenced by any of the commits, but this...
- 01:55 AM Backport #41712 (In Progress): nautilus: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::reg...
- https://github.com/ceph/ceph/pull/30371
09/12/2019
- 10:38 PM Backport #40993: mimic: Ceph status in some cases does not report slow ops
- Nathan Cutler wrote:
> backport ticket opened prematurely - setting "Need More Info" pending:
>
> 1. opening of P...
- 08:19 PM Backport #40993 (Need More Info): mimic: Ceph status in some cases does not report slow ops
- backport ticket opened prematurely - setting "Need More Info" pending:
1. opening of PR fixing the issue in master...
- 08:18 PM Backport #40993 (New): mimic: Ceph status in some cases does not report slow ops
- 11:58 AM Backport #40993: mimic: Ceph status in some cases does not report slow ops
- Converting this to track backport from master where the fix is under review.
- 02:03 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
- We had to scrap the idea of changing the backend and went for upgrading the OSDs to Bluestore. Our backfilling issue ...
- 01:58 PM Bug #36289: Converting Filestore OSD from leveldb to rocksdb backend on CentOS
- David:
Did you run into a solution for this? We're seeing similar issues but the only possible alternative seems ... - 08:32 AM Backport #41785 (Resolved): nautilus: Make dumping of reservation info congruent between scrub an...
- https://github.com/ceph/ceph/pull/31444
- 05:41 AM Backport #41764 (In Progress): nautilus: TestClsRbd.sparsify fails when using filestore
- https://github.com/ceph/ceph/pull/30354
- 02:24 AM Bug #23647 (In Progress): thrash-eio test can prevent recovery
- http://pulpito.ceph.com/nojha-2019-09-06_14:33:54-rados:singleton-wip-41385-3-distro-basic-smithi/ - this is where I ...
- 01:22 AM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
Reproduced several times with debug_ms = 20
http://pulpito.ceph.com/dzafman-2019-09-11_15:28:37-rados-wip-zafman...
- 01:21 AM Bug #41735: pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
- sorry I missed that...
09/11/2019
- 10:28 PM Bug #41735 (Fix Under Review): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
- Rook should probably set this option explicitly, since it is working with nautilus and we won't backport this (or the...
- 09:29 PM Bug #41735 (Need More Info): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
- Can you attach the 'ceph health detail' output so I can see which warning it's throwing?
- 09:33 PM Bug #41669 (Pending Backport): Make dumping of reservation info congruent between scrub and recovery
- 09:11 PM Bug #41680 (Won't Fix): Removed OSDs with outstanding peer failure reports crash the monitor
- OSD failure reports will die out on their own eventually and there's no general reason to expect a removed OSD was in...
- 09:11 PM Bug #41639 (Rejected): mon/MgrMonitor: enable pg_autoscaler by default for nautilus
- 09:10 PM Bug #41693 (Need More Info): a accidental problems with osd detection algorithm in monitor
- Can you explain in more detail exactly what happened here?
It sounds like you have three hosts with colocated OSDs...
- 09:08 PM Bug #41718 (Fix Under Review): ceph osd stat JSON output incomplete
- 03:28 PM Bug #41758 (Fix Under Review): Ceph status in some cases does not report slow ops
- 01:13 PM Bug #41758: Ceph status in some cases does not report slow ops
- After applying the fix, health warning pertaining to slow ops show up as shown below,...
- 12:57 PM Bug #41758: Ceph status in some cases does not report slow ops
- PR https://github.com/ceph/ceph/pull/30337 addresses this issue.
- 09:29 AM Bug #41758 (Duplicate): Ceph status in some cases does not report slow ops
- In cases when only osds report slow ops, it is observed that ceph summary status doesn't report the same. This issue ...
- 01:28 PM Backport #41764 (Resolved): nautilus: TestClsRbd.sparsify fails when using filestore
- https://github.com/ceph/ceph/pull/30354
- 09:14 AM Backport #40993: mimic: Ceph status in some cases does not report slow ops
- Further to my findings earlier, I confirmed that the "reported" flag is being reset in case ONLY an osd daemon report...
- 04:08 AM Bug #41754 (New): Use dump_stream() instead of dump_float() for floats where max precision isn't ...
Some examples from osd dump are below. The full_ratio is .95, backfill_ratio .90 and nearfull_ratio .85.
<pre...
- 01:25 AM Bug #41661 (Resolved): radosbench_omap_write cleanup slow/stuck
- 12:25 AM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
- 12:24 AM Bug #41743 (In Progress): Long heartbeat ping times on front interface seen, longest is 2237.999 ...
09/10/2019
- 10:42 PM Bug #41743: Long heartbeat ping times on front interface seen, longest is 2237.999 msec (OSD_SLOW...
- The only OSDs involved are osd.6 and osd.0.
Slow heartbeat ping on front interface from osd.6 to osd.0 2237.999 ms...
- 12:12 PM Bug #41743 (Resolved): Long heartbeat ping times on front interface seen, longest is 2237.999 mse...
- "2019-09-09T22:25:11.794749+0000 mon.b (mon.0) 389 : cluster [WRN] Health check failed: Long heartbeat ping times on ...
- 08:21 PM Bug #41661 (Fix Under Review): radosbench_omap_write cleanup slow/stuck
- 07:54 PM Bug #41661: radosbench_omap_write cleanup slow/stuck
- Clearly, filestore-xfs.yaml is the one failing consistently.
See http://pulpito.ceph.com/nojha-2019-09-09_23:22:30...
- 05:03 PM Backport #40082 (In Progress): luminous: osd: Better error message when OSD count is less than os...
- 02:59 PM Bug #41748 (Can't reproduce): log [ERR] : 7.19 caller_ops.size 62 > log size 61
- ...
- 08:27 AM Bug #41721 (Pending Backport): TestClsRbd.sparsify fails when using filestore
- 06:45 AM Backport #41640 (In Progress): nautilus: FAILED ceph_assert(info.history.same_interval_since != 0...
- 06:36 AM Backport #41530 (Resolved): mimic: doc: mon_health_to_clog_* values flipped
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30227
m...
- 06:34 AM Backport #41532 (Resolved): luminous: Move bluefs alloc size initialization log message to log le...
- 06:32 AM Backport #38551: luminous: core: lazy omap stat collection
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29190
m...
- 05:42 AM Backport #41703 (In Progress): nautilus: oi(object_info_t).size does not match on disk size
- https://github.com/ceph/ceph/pull/30278
- 03:55 AM Backport #41704 (In Progress): mimic: oi(object_info_t).size does not match on disk size
- https://github.com/ceph/ceph/pull/30275
- 01:01 AM Bug #41735 (Resolved): pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools
- Old pools have auto_scale on and ceph health still shows HEALTH_WARN (20 < 30)...
09/09/2019
- 11:38 PM Bug #41661: radosbench_omap_write cleanup slow/stuck
- The current timeout (config.get('time', 360) * 30 + 300 = 300*30 + 300) of 9300 seconds is not enough to clean up the...
- 10:25 PM Feature #38136 (Resolved): core: lazy omap stat collection
- 10:25 PM Backport #38551 (Resolved): luminous: core: lazy omap stat collection
- 09:45 PM Bug #41601: oi(object_info_t).size does not match on disk size
- Greg Farnum wrote:
> Hmm I was going to move this into the RADOS project tracker but now I'm leaving it because I'm ...
- 08:20 PM Bug #41601: oi(object_info_t).size does not match on disk size
- Hmm I was going to move this into the RADOS project tracker but now I'm leaving it because I'm not sure if that will ...
- 09:35 PM Backport #41731 (Need More Info): nautilus: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(pe...
- note that the backport of https://github.com/ceph/ceph/pull/30059 should happen after https://github.com/ceph/ceph/pu...
- 07:39 PM Backport #41731 (Rejected): nautilus: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_mis...
- 09:34 PM Backport #41732 (Need More Info): mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_...
- 09:33 PM Backport #41732: mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fro...
- note that the backport of https://github.com/ceph/ceph/pull/30059 should happen after https://github.com/ceph/ceph/pu...
- 07:39 PM Backport #41732 (Rejected): mimic: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missin...
- 09:33 PM Backport #41730 (Need More Info): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(pe...
- note that the backport of https://github.com/ceph/ceph/pull/30059 should happen after https://github.com/ceph/ceph/pu...
- 07:39 PM Backport #41730 (Resolved): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_mis...
- https://github.com/ceph/ceph/pull/31855
- 09:03 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
- Nathan Cutler wrote:
> @Neha - backport all three PRs?
Yes, note that the backport of https://github.com/cep...
- 07:41 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
- @Neha - backport all three PRs?
- 04:53 PM Bug #41385 (Pending Backport): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.co...
- 08:51 PM Bug #41065 (Closed): new osd added to cluster upgraded from 13 to 14 will down after some days
- It's not clear from these snippets what issue you're actually experiencing. The "bad authorizer" suggests either a cl...
- 08:37 PM Bug #41406: common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient bootstrap
- That's a weird one; perhaps the MonClient should behave differently instead.
(Note that this is a problem only on ...
- 04:20 PM Bug #41689 (Resolved): Network ping test fails in TEST_network_ping_test2
- This is a follow on fix for the feature https://tracker.ceph.com/issues/40640. The backport is included as part of t...
- 10:50 AM Bug #41721 (Fix Under Review): TestClsRbd.sparsify fails when using filestore
- 10:24 AM Bug #41721 (Resolved): TestClsRbd.sparsify fails when using filestore
- it's a regression introduced by https://github.com/ceph/ceph/pull/30061
see http://pulpito.ceph.com/kchai-2019-09-...
09/08/2019
- 06:16 PM Bug #41718 (Resolved): ceph osd stat JSON output incomplete
- ...
- 09:22 AM Bug #40583 (Resolved): Lower the default value of osd_deep_scrub_large_omap_object_key_threshold
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 09:20 AM Backport #40653 (Resolved): luminous: Lower the default value of osd_deep_scrub_large_omap_object...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29175
m...
09/07/2019
- 08:07 PM Bug #41716 (Resolved): LibRadosTwoPoolsPP.ManifestUnset fails
- ...
- 09:29 AM Backport #41712 (Resolved): nautilus: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::regist...
- https://github.com/ceph/ceph/pull/30371
- 09:23 AM Backport #41705 (Resolved): nautilus: Incorrect logical operator in Monitor::handle_auth_request()
- https://github.com/ceph/ceph/pull/31038
- 09:23 AM Backport #41704 (Resolved): mimic: oi(object_info_t).size does not match on disk size
- https://github.com/ceph/ceph/pull/30275
- 09:23 AM Backport #41703 (Resolved): nautilus: oi(object_info_t).size does not match on disk size
- https://github.com/ceph/ceph/pull/30278
- 09:23 AM Backport #41702 (Rejected): luminous: oi(object_info_t).size does not match on disk size
- 07:45 AM Backport #41697 (In Progress): luminous: Network ping monitoring
- 07:31 AM Backport #41697 (Resolved): luminous: Network ping monitoring
- https://github.com/ceph/ceph/pull/30230
- 07:43 AM Backport #41696 (In Progress): mimic: Network ping monitoring
- 07:31 AM Backport #41696 (Resolved): mimic: Network ping monitoring
- https://github.com/ceph/ceph/pull/30225
- 07:34 AM Backport #41695 (In Progress): nautilus: Network ping monitoring
- 07:31 AM Backport #41695 (Resolved): nautilus: Network ping monitoring
- https://github.com/ceph/ceph/pull/30195
- 02:35 AM Bug #41693 (Need More Info): a accidental problems with osd detection algorithm in monitor
- There is an accidental problem with the osd detection algorithm in the monitor. In a three-node cluster environment, HostA/HostB/Ho...
09/06/2019
- 11:49 PM Backport #41531 (In Progress): nautilus: Move bluefs alloc size initialization log message to log...
- 10:15 PM Backport #41531 (Need More Info): nautilus: Move bluefs alloc size initialization log message to ...
- non-trivial backport - needs https://github.com/ceph/ceph/pull/29537 at least
- 11:38 PM Bug #41385 (Fix Under Review): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.co...
- https://github.com/ceph/ceph/pull/30119 (merged September 4, 2019)
https://github.com/ceph/ceph/pull/30059 (merged S...
- 10:21 PM Backport #41533 (In Progress): mimic: Move bluefs alloc size initialization log message to log...
- 10:14 PM Backport #41533 (Need More Info): mimic: Move bluefs alloc size initialization log message to log...
- non-trivial backport - needs https://github.com/ceph/ceph/pull/29537 at least
- 10:03 PM Backport #41530 (In Progress): mimic: doc: mon_health_to_clog_* values flipped
- 08:01 PM Backport #41499 (Need More Info): mimic: backfill_toofull while OSDs are not full (Unneccessary H...
- The backport needs 3b8f86c8b09b9143d3e25ab34b51057581b48114 to be cherry-picked, first, for it to make sense, but tha...
- 03:34 PM Backport #41499 (In Progress): mimic: backfill_toofull while OSDs are not full (Unneccessary HEAL...
- 07:42 PM Backport #41502 (In Progress): mimic: Warning about past_interval bounds on deleting pg
- 07:03 PM Bug #41689: Network ping test fails in TEST_network_ping_test2
- ...
- 06:37 PM Bug #41689 (Fix Under Review): Network ping test fails in TEST_network_ping_test2
- 06:18 PM Bug #41689 (Resolved): Network ping test fails in TEST_network_ping_test2
- http://pulpito.ceph.com/kchai-2019-09-06_15:05:18-rados-wip-kefu-testing-2019-09-06-1807-distro-basic-smithi/4283774/...
- 05:27 PM Bug #41429 (Pending Backport): Incorrect logical operator in Monitor::handle_auth_request()
- 05:08 PM Bug #38513: luminous: "AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item) && !in_pro...
- /a/nojha-2019-09-05_23:53:20-rados-wip-40769-luminous-distro-basic-smithi/4279855/
- 03:29 PM Backport #41490 (In Progress): mimic: OSDCap.PoolClassRNS test aborts
- 03:28 PM Backport #41449 (In Progress): mimic: mon: C_AckMarkedDown has not handled the Callback Arguments
- 01:29 PM Backport #40993 (In Progress): mimic: Ceph status in some cases does not report slow ops
- The logs relating to this tracker didn't indicate anything obvious upon analysis. The issue was reproduced locally on...
- 10:04 AM Bug #41680 (Resolved): Removed OSDs with outstanding peer failure reports crash the monitor
- The OSDs have been removed, but they had previously reported anomaly information for partner OSDs. However, reporters of failure...
- 09:50 AM Bug #41677: Cephmon:fix mon crash
- shuguang wang wrote:
> Reduction num of osd in primary mon of three node cluster, the primary mon crash of occasiona...
- 08:45 AM Bug #41677 (Fix Under Review): Cephmon:fix mon crash
- 08:43 AM Bug #41677: Cephmon:fix mon crash
- shuguang wang wrote:
> Reduction num of osd in primary mon of three node cluster, the primary mon crash of occasiona...
- 08:42 AM Bug #41677: Cephmon:fix mon crash
- The OSDs have been removed, but they had previously reported anomaly information for partner OSDs. However, failure_info of this...
- 05:34 AM Bug #41677 (Resolved): Cephmon:fix mon crash
- Reducing the number of OSDs in a three-node cluster occasionally crashes the primary mon.
- 05:53 AM Bug #41427 (Resolved): set-chunk raced with deep-scrub
- 05:52 AM Bug #41514 (Resolved): in-flight manifest ops not properly cancelled on interval changing
- 03:17 AM Bug #41601 (Pending Backport): oi(object_info_t).size does not match on disk size
09/05/2019
- 10:30 PM Bug #41657 (Rejected): osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find_best_in...
- this is caused by a bug in my test branch
- 09:19 PM Bug #41669: Make dumping of reservation info congruent between scrub and recovery
- 05:47 PM Bug #41669 (Resolved): Make dumping of reservation info congruent between scrub and recovery
Rename dump_reservations to dump_recovery_reservations
Add dump_scrub_reservations
- 06:59 PM Feature #40640 (Pending Backport): Network ping monitoring
- 01:50 PM Backport #41447 (In Progress): mimic: osd/PrimaryLogPG: Access destroyed references in finish_deg...
- 01:01 PM Backport #41351 (In Progress): mimic: hidden corei7 requirement in binary packages
- 12:49 PM Backport #41291 (In Progress): mimic: filestore pre-split may not split enough directories
- 12:48 PM Backport #40732 (In Progress): mimic: mon: auth mon isn't loading full KeyServerData after restart
- 12:36 PM Backport #40083 (In Progress): mimic: osd: Better error message when OSD count is less than osd_p...
- 07:25 AM Feature #41666 (Resolved): Issue a HEALTH_WARN when a Pool is configured with [min_]size == 1
- To prevent the user from experiencing data loss, Ceph should issue a health warning if any Pool is configured with a ...
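A minimal sketch of how the condition in question can be checked per pool today (pool name is a placeholder):
  ceph osd pool get mypool size       # replica count; 1 means no redundancy
  ceph osd pool get mypool min_size   # minimum replicas required to accept I/O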
- 12:00 AM Bug #41661 (Resolved): radosbench_omap_write cleanup slow/stuck
- ...
09/04/2019
- 09:34 PM Feature #38458: Ceph does not have command to show current osd primary-affinity
- "ceph osd dump", perhaps with a detail or json formatting, includes that information.
I don't think we have any qu...
- 05:49 AM Feature #38458: Ceph does not have command to show current osd primary-affinity
- Greg, what is the exact command ?
- 05:58 PM Bug #41657 (Fix Under Review): osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find...
- 05:53 PM Bug #41657: osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find_best_info_ignore_h...
- The find_best_info process excludes getting a master log from an osd with an old(er) last_epoch_started. However, th...
- 05:51 PM Bug #41657 (Rejected): osd/PeeringState.cc: 2540: FAILED ceph_assert(cct->_conf->osd_find_best_in...
- ...
- 01:27 PM Feature #41650 (New): Convert between EC profiles online
- Users have repeatedly voiced the need to convert/modify an EC profile while the cluster was running, in response to c...
- 09:06 AM Feature #41647 (Resolved): pg_autoscaler should show a warning if pg_num isn't a power of two
- As the pg_autoscaler will be automatically turned on with the 14.2.4 release and future releases I would like to enha...
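As a sketch of the situation such a warning would flag (pool name and numbers are illustrative):
  ceph osd pool get mypool pg_num       # e.g. 200 -- not a power of two
  ceph osd pool set mypool pg_num 256   # round up to the next power of two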
- 03:36 AM Bug #40646 (Resolved): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-8-libstd...
- 01:23 AM Bug #40646 (Fix Under Review): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-...
- 12:43 AM Bug #40646 (Resolved): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-8-libstd...
- 12:32 AM Bug #38483 (Pending Backport): FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_...
09/03/2019
- 09:19 PM Bug #20283: qa: missing even trivial tests for many commands
- Updated script run
'cache drop' has no apparent tests
'cache status' has no apparent tests
'config ls' has no ap...
- 09:04 PM Bug #41610 (Rejected): python Rados library does not support mon_host bracketed syntax
- Your version of librados is too old - 0.69 is cuttlefish. For v1/v2 addresses like that, you need nautilus (v14.2.0+)
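For reference, a sketch of the bracketed v1/v2 form in question (addresses are placeholders):
  grep mon_host /etc/ceph/ceph.conf
  # mon_host = [v2:10.0.0.1:3300,v1:10.0.0.1:6789]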
- 07:26 AM Bug #41610 (Rejected): python Rados library does not support mon_host bracketed syntax
- Ceph Nautilus deployed using ceph-ansible
By default it generates ceph.conf with bracketed mon_host syntax (see ht...
- 08:48 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
- Addressing backport-create-issue script complaint:...
- 06:25 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
- Luminous doesn't have the issue!
- 08:40 PM Backport #41640 (Resolved): nautilus: FAILED ceph_assert(info.history.same_interval_since != 0) i...
- https://github.com/ceph/ceph/pull/30280
- 08:37 PM Bug #41639 (Rejected): mon/MgrMonitor: enable pg_autoscaler by default for nautilus
- Only https://github.com/ceph/ceph/pull/30112/commits/23edfd202ec1d98cc8c3d52aaaae1d985417aacf needs to be backported ...
- 08:08 PM Backport #41350: nautilus: hidden corei7 requirement in binary packages
- @Harry - since this is a backport ticket (just for tracking the nautilus backport), I copied your comment to the pare...
- 08:07 PM Bug #41330: hidden corei7 requirement in binary packages
- Hi Harry. I don't know about any distros other than openSUSE and SUSE Linux Enterprise. In those distros, there isn't...
- 08:01 PM Bug #41330: hidden corei7 requirement in binary packages
- At https://tracker.ceph.com/issues/41350#note-3 (i.e. in the nautilus backport ticket), Harry Coin wrote:
"The 'si... - 07:16 PM Bug #39152 (Duplicate): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- yep, dup of #39693
- 06:25 PM Backport #41582 (Rejected): luminous: backfill_toofull seen on cluster where the most full OSD is...
- 02:19 PM Bug #37654 (Pending Backport): FAILED ceph_assert(info.history.same_interval_since != 0) in PG::s...
- 02:04 PM Bug #41601 (Fix Under Review): oi(object_info_t).size does not match on disk size
- 02:33 AM Bug #41601: oi(object_info_t).size does not match on disk size
- https://github.com/ceph/ceph/pull/30085
- 06:22 AM Bug #40646 (Fix Under Review): FTBFS with devtoolset-8-gcc-c++-8.3.1-3.el7.x86_64 and devtoolset-...
- * https://github.com/ceph/ceph-build/pull/1387
* https://github.com/ceph/ceph/pull/30088
* https://github.com/ceph/...
- 01:19 AM Backport #41595 (In Progress): mimic: ceph-objectstore-tool can't remove head with bad snapset
- https://github.com/ceph/ceph/pull/30081
- 01:16 AM Backport #41596 (In Progress): nautilus: ceph-objectstore-tool can't remove head with bad snapset
- https://github.com/ceph/ceph/pull/30080
09/02/2019
- 02:04 PM Backport #41350: nautilus: hidden corei7 requirement in binary packages
- Thanks! The 'silent' requirement that ceph run only on -march=corei7 capable servers killed two ubuntu eoan based sy...
- 01:29 PM Bug #41601 (Resolved): oi(object_info_t).size does not match on disk size
- In our test environment (ceph version 14.2.1 (nautilus) + replicated pool), we found a scrub error like bug #23701. We use ...
- 10:09 AM Backport #41597 (Rejected): luminous: ceph-objectstore-tool can't remove head with bad snapset
- 10:09 AM Backport #41596 (Resolved): nautilus: ceph-objectstore-tool can't remove head with bad snapset
- https://github.com/ceph/ceph/pull/30080
- 10:09 AM Backport #41595 (Resolved): mimic: ceph-objectstore-tool can't remove head with bad snapset
- https://github.com/ceph/ceph/pull/30081
- 10:07 AM Backport #41582 (Need More Info): luminous: backfill_toofull seen on cluster where the most full ...
08/31/2019
- 03:03 AM Bug #38238 (Duplicate): rados/test.sh: api_aio_pp doesn't seem to start
- 02:12 AM Bug #38238: rados/test.sh: api_aio_pp doesn't seem to start
- ...
- 12:19 AM Bug #41517 (Resolved): Missing head object at primary with snapshots crashes primary
- 12:14 AM Bug #41522 (Pending Backport): ceph-objectstore-tool can't remove head with bad snapset
08/30/2019
- 10:41 PM Bug #41156: dump_float() poor output
Looking at osd dump output in teuthology.log on a test run, I see output which is ugly:
"full_ratio": ...
- 08:34 PM Bug #40522: on_local_recover doesn't touch?
- Failed multiple times: http://pulpito.ceph.com/dzafman-2019-08-28_09:11:55-rados-wip-zafman-testing-distro-basic-smit...
- 06:36 PM Backport #41582: luminous: backfill_toofull seen on cluster where the most full OSD is at 1%
- This bug doesn't exist on Luminous as far as I can tell; I've only ever seen it since Mimic.
- 08:00 AM Backport #41582 (Rejected): luminous: backfill_toofull seen on cluster where the most full OSD is...
- 06:15 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
- Have been able to reproduce it here: http://pulpito.ceph.com/nojha-2019-08-28_19:12:09-rados:singleton-master-distro-...
- 04:43 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
- We didn't see this problem on any of our clusters with the 12.2.12 release, so maybe this isn't the fix if a backport...
- 08:01 AM Bug #41200 (Resolved): osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup memory limit
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 08:01 AM Backport #41584 (Resolved): mimic: backfill_toofull seen on cluster where the most full OSD is at 1%
- https://github.com/ceph/ceph/pull/32361
- 08:00 AM Backport #41583 (Resolved): nautilus: backfill_toofull seen on cluster where the most full OSD is...
- https://github.com/ceph/ceph/pull/29999
08/29/2019
- 11:16 PM Bug #23647: thrash-eio test can prevent recovery
- Several proposals that might improve things:
* from Josh, just turn down the odds
* from Greg, is it plausible to...
- 09:24 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
- Here's the chain of events that causes this:
Two objects go missing on the primary, and we want to recover them fr...
- 06:54 PM Bug #41577: Erasure-Coded storage in bluestore has larger disk usage than expected
- The issue of small objects using more space seems related to https://tracker.ceph.com/issues/41417
- 06:53 PM Bug #41577 (New): Erasure-Coded storage in bluestore has larger disk usage than expected
- The test is done in ceph 14.2.1
We've tested Erasure Coded storage with the same amount of data, which is 800 GiB....
- 05:55 PM Bug #41429 (Fix Under Review): Incorrect logical operator in Monitor::handle_auth_request()
- 03:36 PM Bug #41526 (Rejected): Choosing the next PG for a deep scrubs wrong.
- 02:43 PM Bug #37775 (Resolved): some pg_created messages not sent to mon
- 02:40 PM Bug #41517: Missing head object at primary with snapshots crashes primary
- Backporting note:
cherry pick https://github.com/ceph/ceph/pull/27575 first, and then https://github.com/ceph...
- 02:39 PM Bug #39286: primary recovery local missing object did not update obc
- Backports to luminous, mimic, and nautilus are being handled via #41517
- 02:38 PM Bug #39286 (Resolved): primary recovery local missing object did not update obc
- Since this introduced a regression in master, I propose to refrain from backporting it separately, but instead backpo...
- 12:35 PM Backport #41568 (In Progress): nautilus: doc: pg_num should always be a power of two
- 08:14 AM Backport #41568 (Resolved): nautilus: doc: pg_num should always be a power of two
- https://github.com/ceph/ceph/pull/30004
- 12:29 PM Backport #41529 (In Progress): nautilus: doc: mon_health_to_clog_* values flipped
- 12:26 PM Bug #39152 (New): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- This is problematic to backport because the "Pull request ID" field is not populated and none of the notes mention a ...
- 11:28 AM Backport #41503 (In Progress): nautilus: Warning about past_interval bounds on deleting pg
- 11:21 AM Backport #41501 (In Progress): nautilus: backfill_toofull while OSDs are not full (Unneccessary H...
- 11:17 AM Backport #41491 (In Progress): nautilus: OSDCap.PoolClassRNS test aborts
- 11:15 AM Backport #41455: nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup memory...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29745
m...
- 11:14 AM Backport #41455 (Resolved): nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cg...
- 10:47 AM Backport #41453 (In Progress): nautilus: mon: C_AckMarkedDown has not handled the Callback Arguments
- 10:24 AM Backport #41448 (In Progress): nautilus: osd/PrimaryLogPG: Access destroyed references in finish_...
- 10:20 AM Backport #40889 (Need More Info): luminous: Pool settings aren't populated to OSD after restart.
- non-trivial backport
- 10:20 AM Backport #40890 (Need More Info): mimic: Pool settings aren't populated to OSD after restart.
- non-trivial backport
- 10:20 AM Backport #40891 (Need More Info): nautilus: Pool settings aren't populated to OSD after restart.
- non-trivial backport
- 10:10 AM Bug #40112 (Resolved): mon: rados/multimon tests fail with clock skew
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 10:09 AM Backport #40228 (Resolved): nautilus: mon: rados/multimon tests fail with clock skew
- backport PR https://github.com/ceph/ceph/pull/28576
merge commit 1bc3cc4aa2588bef0acadcf6ba2703df0312b9b4 (v14.2.2-2...
- 10:03 AM Backport #40084 (In Progress): nautilus: osd: Better error message when OSD count is less than os...
- 09:56 AM Backport #39700 (In Progress): nautilus: [RFE] If the nodeep-scrub/noscrub flags are set in pools...
- 09:31 AM Bug #41255 (Pending Backport): backfill_toofull seen on cluster where the most full OSD is at 1%
- 08:59 AM Backport #39682 (In Progress): nautilus: filestore pre-split may not split enough directories
- 08:50 AM Backport #39517 (In Progress): nautilus: Improvements to standalone tests.
- 03:51 AM Bug #38155 (Duplicate): PG stuck in undersized+degraded+remapped+backfill_toofull+peered
- I'm assuming that the fix for 24452 also fixed this issue. So marking duplicate.
- 03:27 AM Bug #39115 (Duplicate): ceph pg repair doesn't fix itself if osd is bluestore
OSD crashes are the underlying issue here and we can't say anything about repair until there aren't any more crashes.
- 03:09 AM Documentation #41004 (Pending Backport): doc: pg_num should always be a power of two
08/28/2019
- 09:20 PM Bug #41313: PG distribution completely messed up since Nautilus
- Can you reach out on the ceph-users mailing list to see if others have seen similar issues? We've not seen a specific...
- 09:19 PM Bug #40522: on_local_recover doesn't touch?
I see this as a hang in running standalone tests, in particular qa/standalone/osd/divergent-priors.sh. The test han...
- 09:13 PM Bug #41336 (Resolved): All OSD Faild after Reboot.
- 09:13 PM Bug #41336: All OSD Faild after Reboot.
- This is fixed in later versions - the monitor makes sure stripe_unit is a valid value when the pool is created. With ...
- 09:12 PM Bug #41336: All OSD Faild after Reboot.
- ...
- 09:03 PM Bug #41526: Choosing the next PG for a deep scrubs wrong.
You never know what scrubs can run with osd_max_scrubs (especially defaulting to 1). Without looking at which...
- 08:44 PM Bug #41385 (In Progress): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(f...
- 08:36 PM Feature #41564 (Resolved): Issue health status warning if num_shards_repaired exceeds some threshold
Now that num_shards_repaired has been added, we can assist in noticing disk, controller, software or other issues b...
- 08:27 PM Feature #41563: Add connection reset tracking to Network ping monitoring
Experimental code: https://github.com/dzafman/ceph/tree/wip-network-resets
- 08:24 PM Feature #41563 (New): Add connection reset tracking to Network ping monitoring
Record connection resets on front and back interfaces and report with ping times
- 08:25 PM Backport #41341 (In Progress): nautilus: "CMake Error" in test_envlibrados_for_rocksdb.sh
- 08:20 PM Bug #41517: Missing head object at primary with snapshots crashes primary
- This was caused by https://github.com/ceph/ceph/pull/27575
- 05:26 PM Bug #41517 (In Progress): Missing head object at primary with snapshots crashes primary
- 06:42 PM Bug #41522 (In Progress): ceph-objectstore-tool can't remove head with bad snapset
- 06:42 PM Backport #38450 (In Progress): mimic: src/osd/OSDMap.h: 1065: FAILED assert(__null != pool)
- 06:36 PM Bug #37775: some pg_created messages not sent to mon
- This patch does not make sense for mimic and luminous.
@Nathan can we please resolve this issue and close the corre...
- 06:34 PM Bug #36498 (New): failed to recover before timeout expired due to pg stuck in creating+peering
- I don't think this is a duplicate of https://tracker.ceph.com/issues/37752 or https://tracker.ceph.com/issues/37775 f...
- 06:09 PM Bug #39286: primary recovery local missing object did not update obc
- https://tracker.ceph.com/issues/41517 is a follow on fix for this.
- 11:46 AM Bug #41550 (Fix Under Review): os/bluestore: fadvise_flag leak in generate_transaction
- 08:09 AM Bug #41550: os/bluestore: fadvise_flag leak in generate_transaction
- https://github.com/ceph/ceph/pull/29944
- 08:06 AM Bug #41550 (Resolved): os/bluestore: fadvise_flag leak in generate_transaction
- In generate_transaction when creating ceph::os::Transaction, ObjectOperation::BufferUpdate::Write::fadvise_flag is no...
- 11:13 AM Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- ...
- 11:11 AM Backport #38567: luminous: osd_recovery_priority is not documented (but osd_recovery_op_priority is)
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27471
m...
- 10:11 AM Backport #40638: luminous: osd: report omap/data/metadata usage
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28851
m...
- 07:43 AM Backport #41548 (Resolved): nautilus: monc: send_command to specific down mon breaks other mon msgs
- https://github.com/ceph/ceph/pull/31037
- 07:43 AM Backport #41547 (Rejected): luminous: monc: send_command to specific down mon breaks other mon msgs
- 07:42 AM Backport #41546 (Rejected): mimic: monc: send_command to specific down mon breaks other mon msgs