Activity
From 04/08/2019 to 05/07/2019
05/07/2019
- 05:50 PM Bug #38724 (Pending Backport): _txc_add_transaction error (39) Directory not empty not handled on...
- 05:38 PM Feature #38029 (Pending Backport): [RFE] If the nodeep-scrub/noscrub flags are set in pools inste...
- 10:16 AM Bug #38124: OSD down on snaptrim.
- Erikas Kučinskis wrote:
> Hi is there any ETA when the bug fix will be live?
- 10:15 AM Bug #38124: OSD down on snaptrim.
- Hi is there any ETA when the bug will be live?
- 07:30 AM Backport #39506: mimic: Give recovery for inactive PGs a higher priority
- Assigning to Neha based on http://tracker.ceph.com/issues/39099#note-11
- 07:29 AM Backport #39505: luminous: Give recovery for inactive PGs a higher priority
- Assigning to Neha based on http://tracker.ceph.com/issues/39099#note-11
- 06:13 AM Bug #16553: Removing Writeback Cache Tier Does not clean up Incomplete_Clones
- Still hit the same issue on 12.2.10
- 05:49 AM Backport #39311 (In Progress): mimic: crushtool crash on Fedora 28 and newer
- https://github.com/ceph/ceph/pull/27986
05/06/2019
- 09:44 PM Bug #25182 (Resolved): Upmaps forgotten after restarting OSDs
- Thanks for verifying the fixes Bryan. Looks like those are all backported to mimic + luminous.
- 09:52 AM Support #39594 (New): OSD marked as down, had timed out after 15, handle_connect_reply connect go...
- Hi,
Recently we saw random slow requests in our cluster. In the monitor ceph.log I could see that at the same time OSD...
- 09:18 AM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- This may be the case indeed, but I'd expect that unless pgs are evacuated, the state would be backfill_wait, not back...
05/05/2019
- 09:13 PM Bug #39152: nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- Sage Weil wrote:
> I'm guessing this is a dup of #38724
>
> Wen, can you tell us what the cluster workload was? ...
05/04/2019
- 10:43 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- ...
- 06:35 PM Bug #39582 (Fix Under Review): Binary data in OSD log from "CRC header" message
- 01:14 AM Backport #39420 (In Progress): luminous: Don't mark removed osds in when running "ceph osd in any...
- https://github.com/ceph/ceph/pull/27728
- 01:02 AM Bug #39304 (In Progress): short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when las...
05/03/2019
- 04:43 PM Bug #39582 (Resolved): Binary data in OSD log from "CRC header" message
This breaks grep'ing the osd logs.
Using cat -v we see the binary data:...
- 04:37 PM Bug #39581 (Duplicate): osd/PG.cc: 2523: FAILED ceph_assert(scrub_queued)
dzafman-2019-05-02_19:43:04-rados:thrash-wip-zafman-testing-distro-basic-smithi/3919741
This appears to be PG 2....
- 11:25 AM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- Greg Farnum wrote:
> The OSD can't count PGs being evacuated as if they were gone because something could go wrong. ...
- 09:33 AM Feature #38370 (Resolved): ceph CLI ability to change file ownership
- 09:32 AM Backport #38511 (Resolved): mimic: ceph CLI ability to change file ownership
- 09:31 AM Bug #38537 (Resolved): mgr deadlock
- 09:31 AM Backport #38561 (Resolved): mimic: mgr deadlock
- 09:31 AM Bug #38377 (Resolved): OpTracker destruct assert when OSD destruct
- 09:30 AM Backport #38646 (Resolved): mimic: OpTracker destruct assert when OSD destruct
- 09:27 AM Backport #38879 (In Progress): mimic: ENOENT in collection_move_rename on EC backfill target
- 05:02 AM Documentation #39011 (In Progress): Document how get_recovery_priority() and get_backfill_priorit...
- 04:22 AM Backport #39220 (In Progress): mimic: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
- https://github.com/ceph/ceph/pull/27940
- 01:16 AM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- In both sets of logs you have shared so far, it seems that the object appears in the OSD log during omap-set-vals(...
- 12:45 AM Backport #39206 (In Progress): mimic: osd: leaked pg refs on shutdown
- https://github.com/ceph/ceph/pull/27938
05/02/2019
- 10:54 PM Bug #39383 (Resolved): Too much log output generated from PrimaryLogPG::do_backfill()
- 10:54 PM Backport #39389 (Resolved): nautilus: Too much log output generated from PrimaryLogPG::do_backfi...
- 10:53 PM Bug #38325 (Resolved): Code to strip | from core pattern isn't right
- 10:51 PM Backport #38565 (Resolved): mimic: Code to strip | from core pattern isn't right
- 10:18 PM Backport #38565: mimic: Code to strip | from core pattern isn't right
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26811
merged
- 10:15 PM Backport #38507: mimic: ENOENT on setattrs (obj was recently deleted)
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26709
merged
- 10:14 PM Backport #38511: mimic: ceph CLI ability to change file ownership
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26760
merged
- 10:09 PM Backport #38561: mimic: mgr deadlock
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26833
merged
- 10:08 PM Backport #38646: mimic: OpTracker destruct assert when OSD destruct
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26862
merged
- 07:54 PM Bug #23879: test_mon_osdmap_prune.sh fails
- /a/yuriw-2019-05-01_19:40:05-rados-wip-yuri3-testing-2019-04-30-1543-mimic-distro-basic-smithi/3916650/
- 07:52 PM Backport #38879: mimic: ENOENT in collection_move_rename on EC backfill target
- This failure was seen in mimic: /a/yuriw-2019-04-30_20:31:27-rados-wip-yuri3-testing-2019-04-30-1543-mimic-distro-bas...
- 05:40 PM Bug #39152: nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- I'm guessing this is a dup of #38724
Wen, can you tell us what the cluster workload was? rgw? rbd? cephfs? Thanks!
- 05:10 PM Documentation #39011: Document how get_recovery_priority() and get_backfill_priority() impacts re...
Now we need to include the new recovery priority boost.
/// base recovery priority for MRecoveryReserve (inactive PG...
- 04:33 PM Bug #38724 (Fix Under Review): _txc_add_transaction error (39) Directory not empty not handled on...
- https://github.com/ceph/ceph/pull/27929
- 04:18 PM Bug #39570 (Fix Under Review): nautilus with requrie_osd_release < nautilus cannot increase pg_num
- https://github.com/ceph/ceph/pull/27928
- 03:59 PM Bug #39570 (Resolved): nautilus with requrie_osd_release < nautilus cannot increase pg_num
- On Mon, 29 Apr 2019, Alexander Y. Fomichev wrote:
> Hi,
>
> I just upgraded from mimic to nautilus(14.2.0) and...
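For context, the two operations involved look roughly like this (the pool name and pg_num value are only examples, not taken from the report):
   ceph osd require-osd-release nautilus   # raise the cluster's required OSD release after the upgrade
   ceph osd pool set rbd pg_num 256        # the pg_num increase that fails while require_osd_release is still pre-nautilus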
05/01/2019
- 10:55 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- The OSDs definitely had objects corresponding to the maps, but they failed the CRC check when trying to read them. Al...
- 09:25 PM Bug #39525 (Need More Info): lz4 compressor corrupts data when buffers are unaligned
- Was the problem 1) that different OSDs needed different maps, and they had mismatched CRCs when exported from a 13.2....
- 09:24 PM Bug #39490 (Resolved): osd: failed to encode map e26 with expected crc
- 09:24 PM Bug #39509 (Need More Info): segm fault when invoke MergeOperatorRouter::Name()
- We'll need more information on how and where this occurred.
- 09:11 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- The OSD can't count PGs being evacuated as if they were gone because something could go wrong. So it's stuck seeing i...
- 08:10 AM Bug #39555 (Resolved): backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
- This week I ran into an issue where ceph reports HEALTH_ERR because pgs are backfill_toofull.
None of the OSDs are o...
- 09:06 PM Bug #39449: Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxHandler::aut...
- We probably need to backport this?
- 08:50 PM Backport #39044 (Resolved): mimic: osd/PGLog: preserve original_crt to check rollbackability
- 03:48 PM Backport #39044: mimic: osd/PGLog: preserve original_crt to check rollbackability
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27629
merged
- 08:49 PM Backport #39342 (Resolved): mimic: ceph-objectstore-tool rename dump-import to dump-export
- 03:47 PM Backport #39342: mimic: ceph-objectstore-tool rename dump-import to dump-export
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27635
merged
- 08:48 PM Backport #39433 (Resolved): mimic: Degraded PG does not discover remapped data on originating OSD
- 03:45 PM Backport #39433: mimic: Degraded PG does not discover remapped data on originating OSD
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27745
merged
- 08:45 PM Backport #39506 (Need More Info): mimic: Give recovery for inactive PGs a higher priority
- 08:45 PM Backport #39505 (Need More Info): luminous: Give recovery for inactive PGs a higher priority
- 04:34 PM Backport #39563 (In Progress): luminous: Error message displayed when mon_osd_max_split_count wou...
- 04:33 PM Backport #39563 (Resolved): luminous: Error message displayed when mon_osd_max_split_count would ...
- https://github.com/ceph/ceph/pull/27908
- 04:32 PM Bug #39353 (Pending Backport): Error message displayed when mon_osd_max_split_count would be exce...
- 03:47 PM Bug #39353: Error message displayed when mon_osd_max_split_count would be exceeded is not as user...
- https://github.com/ceph/ceph/pull/27647 merged
- 02:24 PM Bug #39099: Give recovery for inactive PGs a higher priority
- David, we should discuss whether we want to backport this all the way to luminous or just to nautilus.
- 02:28 AM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...
I have a version that not only copies the existing dups, but if copy_up_to is excluding any log entries it adds tho...
- 01:38 AM Bug #23879: test_mon_osdmap_prune.sh fails
- /a/yuriw-2019-04-29_22:14:10-rados-wip-yuri2-testing-2019-04-29-1936-mimic-distro-basic-smithi/3910028
04/30/2019
- 11:06 PM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...
Before fix:...
- 01:26 AM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...
- 2119'372 (write 2775265) does not get identified as a dup when the log boundaries are (2119'372,2119'373], while 2119...
- 12:51 AM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...
Probably 1.1.short_pg_log.yaml produces this
osd_max_pg_log_entries: 2
osd_min_pg_log_entri...
- 12:08 AM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...
Before changing primaries from 3 to 0, these 4 operations came in with versions 2119'370 (write 726564), 2119'371 (...
- 07:47 PM Bug #39484: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- Updated the PR. Please put further code reviews there. :)
- 06:37 PM Bug #39484: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- Hmm probably!
- 02:50 AM Bug #39484: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- Greg Farnum wrote:
> pending_finishers get moved into committing_finishers once they have been submitted to disk, so...
- 12:03 AM Bug #39484 (Fix Under Review): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- https://github.com/ceph/ceph/pull/27877
- 07:04 PM Bug #39553 (New): PeeringState: cache PeeringListener indirections
- perf_counter refs, etc should be immutable, so PeeringState may as well cache them. Add a call to do so as to avoid ...
- 06:58 PM Bug #39552 (New): mons fail to process send_alive message causing pg stuck creating
- sjust-2019-04-28_20:59:54-rados-wip-sjust-peering-refactor-distro-basic-smithi/3905696/
PG 200.3 is stuck creating...
- 05:43 PM Bug #39546: Warning about past_interval bounds on deleting pg
- sjust-2019-04-26_14:00:33-rados-wip-sjust-peering-refactor-distro-basic-mira/3897200/
- 05:43 PM Bug #39546 (Resolved): Warning about past_interval bounds on deleting pg
- cluster [ERR] 4.7
required past_interval bounds are empty [228,226) but past_intervals is not: ([166,225]
a...
- 11:36 AM Backport #39539 (Resolved): nautilus: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()-...
- https://github.com/ceph/ceph/pull/28219
- 11:36 AM Backport #39538 (Resolved): mimic: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->ge...
- https://github.com/ceph/ceph/pull/28259
- 11:36 AM Backport #39537 (Resolved): luminous: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()-...
- https://github.com/ceph/ceph/pull/28989
- 01:32 AM Bug #38846 (In Progress): dump_pgstate_history doesn't really produce useful json output, needs a...
- 12:39 AM Backport #39218 (In Progress): luminous: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().i...
- https://github.com/ceph/ceph/pull/27878
04/29/2019
- 11:21 PM Backport #39219 (In Progress): nautilus: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().i...
- https://github.com/ceph/ceph/pull/27839
- 11:18 PM Bug #38784 (Pending Backport): osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(...
- 06:29 AM Bug #38784 (In Progress): osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(soid)...
- 10:40 PM Bug #39484 (In Progress): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- Hmm this doesn't make a lot of sense. finish_contexts() swaps out the input list with a local one before running fini...
- 09:22 PM Bug #39484: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- pending_finishers get moved into committing_finishers once they have been submitted to disk, so we probably want to f...
- 10:16 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Hi Greg,
That might actually explain how it happened originally; we auto-deploy hosts with salt, and noticed that ...
- 09:15 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- Hmm are the working and broken OSDs actually running the same binary version? It should work anyway but a bug around ...
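A quick way to compare binary versions across daemons, assuming the cluster can still answer (the OSD id is only an example):
   ceph versions             # counts of daemons per reported version
   ceph tell osd.12 version  # ask one specific OSD directly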
- 05:23 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
- I might have gotten slightly further.
1) On one of the broken OSDs, the current_epoch is 34626 (clean_thru is...
- 03:58 PM Bug #39525 (Resolved): lz4 compressor corrupts data when buffers are unaligned
- In conjunction with taking a new storage server online we observed that 5 out of the 6 SSDs we use to store metadata ...
- 09:23 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- Unfortunately I didn't turn up debug logging for every OSD in the cluster so I don't have those logs. I'll reproduce...
- 08:58 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- Bryan, could you also upload osd.516 and osd.563 logs from the same time period as you've provided for osd.503.
- 09:02 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
- A user saw this and uploaded with debug 20 on OSD and bluestore: 2d8d22f4-580b-4b57-a13a-f49dade34ba7
- 08:53 PM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...
After writing tids 1,2,3 the output shows finishing tids 1,2,3,4,5. We can see that the 3rd set of writes was inte...
- 10:30 AM Backport #39504 (In Progress): nautilus: Give recovery for inactive PGs a higher priority
- 10:23 AM Backport #39520 (Rejected): luminous: snaps missing in mapper, should be: ca was r -2...repaired
- 10:23 AM Backport #39519 (Resolved): nautilus: snaps missing in mapper, should be: ca was r -2...repaired
- https://github.com/ceph/ceph/pull/28205
- 10:23 AM Backport #39518 (Resolved): mimic: snaps missing in mapper, should be: ca was r -2...repaired
- https://github.com/ceph/ceph/pull/28232
- 10:23 AM Backport #39517 (Resolved): nautilus: Improvements to standalone tests.
- https://github.com/ceph/ceph/pull/30528
- 10:22 AM Backport #39516 (Resolved): nautilus: osd-backfill-space.sh test failed in TEST_backfill_multi_pa...
- https://github.com/ceph/ceph/pull/28187
- 10:22 AM Backport #39515 (Rejected): luminous: osd: segv in _preboot -> heartbeat
- 10:22 AM Backport #39514 (Resolved): nautilus: osd: segv in _preboot -> heartbeat
- https://github.com/ceph/ceph/pull/28164
- 10:22 AM Backport #39513 (Resolved): mimic: osd: segv in _preboot -> heartbeat
- https://github.com/ceph/ceph/pull/28220
- 10:22 AM Backport #39512 (Resolved): nautilus: osd acting cycle
- https://github.com/ceph/ceph/pull/28160
04/28/2019
- 02:09 PM Bug #39509 (Need More Info): segm fault when invoke MergeOperatorRouter::Name()
- (gdb) bt
#0 0x00007f60321804ab in raise () from /lib64/libpthread.so.0
#1 0x000055aafad0501a in handle_fatal_sign...
- 08:19 AM Bug #39449: Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxHandler::aut...
- /a/kchai-2019-04-27_02:20:42-rados-wip-kefu-testing-2019-04-26-2318-distro-basic-smithi/3898463/remote/smithi017/log/...
- 12:29 AM Bug #38124 (Fix Under Review): OSD down on snaptrim.
04/27/2019
- 11:26 PM Bug #38124: OSD down on snaptrim.
The following script sometimes hits the race and crashes an OSD. I've removed the assert and the script has been r...
- 04:39 AM Documentation #3466: rados manpage: bench still documents "read" rather than "seq/rand"
- Dan Mick wrote:
> rados bench read has been replaced with "seq" and "rand", the latter of which is
> still unimplem...
04/26/2019
- 11:45 PM Bug #39152: nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- A similar issue was reported on ceph-users: "Nautilus (14.2.0) OSDs crashing at startup after removing a pool contain...
- 11:19 PM Bug #38124 (In Progress): OSD down on snaptrim.
I am able to reproduce this, so I'll work on a fix.
- 11:01 PM Bug #39441 (Pending Backport): osd acting cycle
- 04:23 PM Bug #39441 (Fix Under Review): osd acting cycle
- 10:27 PM Bug #38840 (Pending Backport): snaps missing in mapper, should be: ca was r -2...repaired
- 12:04 AM Bug #38840: snaps missing in mapper, should be: ca was r -2...repaired
- 06:55 PM Bug #39333 (Pending Backport): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
- 05:12 PM Bug #39333: osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
- 06:23 PM Bug #39439 (Pending Backport): osd: segv in _preboot -> heartbeat
- 05:27 PM Feature #39162 (Pending Backport): Improvements to standalone tests.
- 03:51 PM Feature #38617 (Fix Under Review): osd: Better error message when OSD count is less than osd_pool...
- 03:46 PM Backport #39506 (Rejected): mimic: Give recovery for inactive PGs a higher priority
- 03:46 PM Backport #39505 (Rejected): luminous: Give recovery for inactive PGs a higher priority
- 03:46 PM Backport #39504 (Resolved): nautilus: Give recovery for inactive PGs a higher priority
- https://github.com/ceph/ceph/pull/27854
- 08:59 AM Backport #39204 (In Progress): luminous: osd: leaked pg refs on shutdown
- https://github.com/ceph/ceph/pull/27810
- 03:45 AM Backport #39205 (In Progress): nautilus: osd: leaked pg refs on shutdown
- https://github.com/ceph/ceph/pull/27803
- 01:40 AM Bug #39484: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- Upload the core dump log file.
And the ceph -s:
!ceph_status.png!
mon.b01 crashes again and again.
- 12:03 AM Bug #35808 (Need More Info): ceph osd ok-to-stop result dosen't match the real situation
- Can the reporter test this with the change in https://github.com/ceph/ceph/pull/27503 and report back?
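For anyone retesting, the check in question is driven from the CLI like this (the OSD id is only an example):
   ceph osd ok-to-stop 3     # should refuse if stopping osd.3 would leave PGs unable to serve I/O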
04/25/2019
- 11:58 PM Bug #38930 (Duplicate): ceph osd safe-to-destroy wrongly approves any out osd
We can backport pull request https://github.com/ceph/ceph/pull/27503 for http://tracker.ceph.com/issues/39099 which...
- 11:55 PM Bug #38930 (Pending Backport): ceph osd safe-to-destroy wrongly approves any out osd
- 11:54 PM Bug #39099 (Pending Backport): Give recovery for inactive PGs a higher priority
- 09:39 PM Bug #39490 (In Progress): osd: failed to encode map e26 with expected crc
- should be fixed by https://github.com/ceph/ceph/pull/27623
- 08:28 PM Bug #39490 (Resolved): osd: failed to encode map e26 with expected crc
upgrade:nautilus-x/parallel/{0-cluster/{openstack.yaml start.yaml} 1-ceph-install/nautilus.yaml 1.1-pg-log-override...
- 08:34 PM Bug #36748: ms_deliver_verify_authorizer no AuthAuthorizeHandler found for protocol 0
- ...
- 08:13 PM Bug #38483: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
- /a/nojha-2019-04-25_05:43:35-rados-wip-39441-distro-basic-smithi/3892156/
- 08:10 PM Bug #37797: radosbench tests hit ENOSPC
- This one appeared again.
/a/nojha-2019-04-25_05:43:35-rados-wip-39441-distro-basic-smithi/3892141/
- 12:27 PM Bug #39484 (Resolved): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
- We are running ceph 13.2.5 on Centos Linux 7.5.1804, and the ceph cluster consists of 5 ceph-mon. Every 30 seconds, w...
- 08:07 AM Bug #26958: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objec...
- ...
- 07:46 AM Backport #39476 (Resolved): nautilus: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- https://github.com/ceph/ceph/pull/28141
- 07:46 AM Backport #39475 (Resolved): mimic: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- https://github.com/ceph/ceph/pull/28206
- 07:46 AM Backport #39474 (Resolved): luminous: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- https://github.com/ceph/ceph/pull/32349
- 07:45 AM Backport #39419 (In Progress): nautilus: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).el...
- 06:13 AM Bug #39443: "ceph daemon" does not support ceph args
- Sure, 'debug_ms' was just an example to illustrate the problem. Yes, most (if not all) ceph args do not make sense...
- 03:53 AM Bug #39333 (Fix Under Review): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
04/24/2019
- 09:31 PM Bug #35808: ceph osd ok-to-stop result dosen't match the real situation
- This may be fixed by https://github.com/ceph/ceph/pull/27503
- 07:40 PM Bug #39441: osd acting cycle
- https://github.com/ceph/ceph/pull/24004 was not backported to mimic, which might explain why the octopus osd is calcu...
- 06:51 PM Bug #39443: "ceph daemon" does not support ceph args
- I'm not sure this is a problem — "ceph daemon" is just for talking to a local Unix socket; it doesn't engage in any o...
- 10:32 AM Bug #39443 (New): "ceph daemon" does not support ceph args
- This works:...
- 06:49 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
- We're also seeing Bus Errors instead of segfaults in the OpHistory cleanup at #24664 so these may be related...
- 06:47 PM Bug #39336 (Duplicate): "*** Caught signal (Bus error) **" in upgrade:luminous-x-mimic
- 06:41 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- Bryan Stillwell wrote:
> I could grab you the debug logs, but that could take a while. Which knobs do you want me t...
- 01:26 PM Bug #39449: Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxHandler::aut...
- PRs:
* https://github.com/ceph/teuthology/pull/1274
* https://github.com/ceph/ceph/pull/27265
Gist:
* https:... - 01:24 PM Bug #39449 (Resolved): Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxH...
- ...
- 01:18 PM Backport #39431 (In Progress): luminous: Degraded PG does not discover remapped data on originati...
- 12:35 PM Backport #39433 (In Progress): mimic: Degraded PG does not discover remapped data on originating OSD
- 12:33 PM Backport #39432 (In Progress): nautilus: Degraded PG does not discover remapped data on originati...
04/23/2019
- 10:09 PM Bug #39441 (Resolved): osd acting cycle
- osd.9 (mimic)...
- 06:07 PM Bug #26958 (Pending Backport): osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_lo...
- 01:50 AM Bug #26958 (Fix Under Review): osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_lo...
- 06:05 PM Bug #38296 (Pending Backport): segv in fgets() in collect_sys_info reading /proc/cpuinfo
- 06:04 PM Bug #39439 (Fix Under Review): osd: segv in _preboot -> heartbeat
- https://github.com/ceph/ceph/pull/27729
- 06:01 PM Bug #39439 (Resolved): osd: segv in _preboot -> heartbeat
- ...
- 05:47 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- /ceph/teuthology-archive/pdonnell-2019-04-17_06:12:56-kcephfs-wip-pdonnell-testing-20190417.032809-distro-basic-smith...
- 01:07 PM Backport #39433 (Resolved): mimic: Degraded PG does not discover remapped data on originating OSD
- https://github.com/ceph/ceph/pull/27745
- 01:07 PM Backport #39432 (Resolved): nautilus: Degraded PG does not discover remapped data on originating OSD
- https://github.com/ceph/ceph/pull/27744
- 01:07 PM Backport #39431 (Resolved): luminous: Degraded PG does not discover remapped data on originating OSD
- https://github.com/ceph/ceph/pull/27751
- 01:05 PM Backport #39422 (Resolved): mimic: Don't mark removed osds in when running "ceph osd in any|all|*"
- https://github.com/ceph/ceph/pull/28142
- 01:05 PM Backport #39421 (Resolved): nautilus: Don't mark removed osds in when running "ceph osd in any|al...
- https://github.com/ceph/ceph/pull/28072
- 01:05 PM Backport #39420 (Resolved): luminous: Don't mark removed osds in when running "ceph osd in any|al...
- https://github.com/ceph/ceph/pull/27728
- 01:04 PM Backport #39419 (Resolved): nautilus: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elect...
- https://github.com/ceph/ceph/pull/27771
- 11:03 AM Bug #24419: ceph-objectstore-tool unable to open mon store
- Were you able to figure out why?
- 10:52 AM Support #39319: Every 15 min - Monitor daemon marked osd.x down, but it is still running
- 2019-04-23 13:36:20.668791 osd.2 [WRN] Monitor daemon marked osd.2 down, but it is still running
2019-04-23 13:40:36... - 10:51 AM Support #39319: Every 15 min - Monitor daemon marked osd.x down, but it is still running
- I added a "debug ms = 1" line under [osd] and viewed the monitor log in /var/log/ceph
...
mon.greend02-n02ceph02@1(peon) e3 ms...
- 06:41 AM Backport #39042 (In Progress): luminous: osd/PGLog: preserve original_crt to check rollbackability
- https://github.com/ceph/ceph/pull/27715
04/22/2019
- 11:32 PM Bug #38296 (In Progress): segv in fgets() in collect_sys_info reading /proc/cpuinfo
- 05:52 PM Bug #38296: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- https://github.com/ceph/ceph/pull/27707
(looks like the buffer is only 100 chars, and /proc/cpuinfo frequently exc... - 05:52 PM Bug #38296 (Fix Under Review): segv in fgets() in collect_sys_info reading /proc/cpuinfo
- https://github.com/ceph/ceph/pull/27707
(looks like the buffer is only 100 chars, and /proc/cpuinfo frequently exc... - 05:48 PM Bug #38296: segv in fgets() in collect_sys_info reading /proc/cpuinfo
- saw this again: ...
- 09:17 PM Bug #39402 (New): Can't remove ghost PGs
- This is on the downstream long-running cluster. I can grant SSH access to whoever needs it.
This bug is similar ... - 06:26 PM Bug #39263 (Pending Backport): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) ...
- Only this commit needs to be backported to nautilus https://github.com/ceph/ceph/pull/27622/commits/ccb86682361cf20bd...
- 05:08 PM Bug #39398 (Fix Under Review): osd: fast_info need update when pglog rewind
- 05:08 PM Bug #39398 (Fix Under Review): osd: fast_info need update when pglog rewind
- 08:43 AM Bug #39398 (Duplicate): osd: fast_info need update when pglog rewind
- When the pglog needs to rewind, info.last_update will need to change to an
older value; the current impl of PG::_prepare_wr...
- 07:50 AM Bug #26958: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objec...
- http://qa-proxy.ceph.com/teuthology/xxg-2019-04-19_03:19:09-rados-wip-yanj-testing-fixpeerings-190418-distro-basic-sm...
- 02:28 AM Bug #37439 (Pending Backport): Degraded PG does not discover remapped data on originating OSD
04/20/2019
- 01:47 PM Bug #39154 (Pending Backport): Don't mark removed osds in when running "ceph osd in any|all|*"
04/19/2019
- 04:07 AM Bug #39390: filestore pre-split may not split enough directories
- https://github.com/ceph/ceph/pull/27689
- 03:50 AM Bug #39390 (Resolved): filestore pre-split may not split enough directories
- The current HashIndex::pre_split_folder() uses the following snippet to figure out the number of levels for the split....
04/18/2019
- 10:04 PM Backport #39389 (In Progress): nautilus: Too much log output generated from PrimaryLogPG::do_bac...
- 10:02 PM Backport #39389 (Resolved): nautilus: Too much log output generated from PrimaryLogPG::do_backfi...
- https://github.com/ceph/ceph/pull/27687
- 09:53 PM Bug #39383 (Pending Backport): Too much log output generated from PrimaryLogPG::do_backfill()
- 09:06 PM Bug #39383 (In Progress): Too much log output generated from PrimaryLogPG::do_backfill()
- 02:55 PM Bug #39383 (Resolved): Too much log output generated from PrimaryLogPG::do_backfill()
Caused by 834d3c19a77
- 12:30 PM Bug #39054: osd push failed because local copy is 4394'133607637
- Greg Farnum wrote:
> As Jewel is an outdated release and you ran the potentially-destructive repair tools, you'll ha... - 12:24 PM Bug #39054: osd push failed because local copy is 4394'133607637
- thank you
- 09:44 AM Backport #39381 (Rejected): luminous: src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
- 09:38 AM Feature #39066 (Pending Backport): src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
- 09:36 AM Backport #38873 (In Progress): luminous: Rados.get_fsid() returning bytes in python3
- 09:35 AM Backport #38872 (Resolved): mimic: Rados.get_fsid() returning bytes in python3
- 09:26 AM Bug #38992 (Resolved): unable to link rocksdb library if use system rocksdb
- 09:26 AM Backport #38993 (Resolved): nautilus: unable to link rocksdb library if use system rocksdb
- 09:24 AM Backport #39325 (Resolved): nautilus: ceph-objectstore-tool rename dump-import to dump-export
- 09:24 AM Backport #39310 (Resolved): nautilus: crushtool crash on Fedora 28 and newer
- 09:19 AM Backport #39375 (Resolved): nautilus: ceph tell osd.xx bench help : gives wrong help
- https://github.com/ceph/ceph/pull/28035
- 09:19 AM Backport #39374 (Resolved): mimic: ceph tell osd.xx bench help : gives wrong help
- https://github.com/ceph/ceph/pull/28097
- 09:19 AM Backport #39373 (Resolved): luminous: ceph tell osd.xx bench help : gives wrong help
- https://github.com/ceph/ceph/pull/28112
- 06:41 AM Bug #19753: Deny reservation if expected backfill size would put us over backfill_full_ratio
- would save a lot of diskspace if you could fix that :)
- 05:38 AM Bug #39282: EIO from process_copy_chunk_manifest
- https://github.com/ceph/ceph/pull/27667
- 04:46 AM Bug #38846 (Fix Under Review): dump_pgstate_history doesn't really produce useful json output, ne...
- 03:27 AM Bug #39154 (In Progress): Don't mark removed osds in when running "ceph osd in any|all|*"
- 02:08 AM Bug #39154 (Fix Under Review): Don't mark removed osds in when running "ceph osd in any|all|*"
- 02:46 AM Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd
The fix checks for a down OSD when all PGs aren't active+clean and doesn't trust num_pgs, which is 0 after marking a d...
- 12:47 AM Support #39319: Every 15 min - Monitor daemon marked osd.x down, but it is still running
- Turn up debug_ms to 5 maybe. It's very likely you need to look more closely at your network.
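A minimal sketch of bumping that on a running daemon (the OSD id is only an example):
   ceph tell osd.2 injectargs '--debug_ms 5'   # raise messenger debugging at runtime
   ceph daemon osd.2 config set debug_ms 5     # equivalent, via the admin socket on the OSD's host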
04/17/2019
- 11:33 PM Feature #39066: src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
- merged https://github.com/ceph/ceph/pull/27228
- 11:32 PM Backport #38872: mimic: Rados.get_fsid() returning bytes in python3
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27259
merged
- 10:25 PM Bug #39006 (Pending Backport): ceph tell osd.xx bench help : gives wrong help
- 10:24 PM Bug #39306 (Rejected): ceph config: impossible to set osd_scrub_chunk_max
- OK, can you open two new trackers then please. One for each specific problem?
- 12:24 PM Bug #39306: ceph config: impossible to set osd_scrub_chunk_max
- Yes! I have discovered TWO problems:
1. Problem with _min_: ceph config set osd osd_scrub_chunk_min = 1 WORKS (!) ...
- 10:06 PM Bug #19753: Deny reservation if expected backfill size would put us over backfill_full_ratio
- A couple of dout(0) should be dout(20) or some dout(10) for some less repetitive ones.
- 08:26 PM Bug #19753: Deny reservation if expected backfill size would put us over backfill_full_ratio
- during backfilling after a failed disk the log files get spammed with do_backfill messages. log files easily grow bey...
- 10:03 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- (not surprisingly, MON_DOWN is in the ceph.log too, and the run would have failed with that had it not failed for som...
- 09:53 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- mon.c is failing to connect to mon.a:...
- 10:00 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- I could grab you the debug logs, but that could take a while. Which knobs do you want me to turn up?
This is what...
- 09:47 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- The reason the OSDs are rebooting is that we're applying the latest OS updates for CentOS, so it should be a proper s...
- 09:28 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- Or maybe I misread that; is the claim that an OSD reboots, *then* a delete happens, and then later on you discover th...
- 09:27 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
- Is there any chance of getting good debug logs of the event *while* it happens (ie, not just after scrub detects the ...
- 09:42 PM Bug #39307: EC pools with m=1 are created with an unsafe min_size by default
- Yeah, changing the default ec profile also works
- 09:19 PM Bug #39307: EC pools with m=1 are created with an unsafe min_size by default
- see https://github.com/ceph/ceph/pull/27656 ?
- 09:15 PM Bug #39307 (Won't Fix): EC pools with m=1 are created with an unsafe min_size by default
- This was a deliberate choice. https://github.com/ceph/ceph/pull/26894 made the change, based on a discussion on anot...
- 09:14 PM Bug #39307: EC pools with m=1 are created with an unsafe min_size by default
- Hmm, is this a default EC mode or just something we let users set?
The change was deliberate in PR https://github....
- 09:22 PM Bug #39249 (Closed): Some PGs stuck in active+remapped state
- This looks like CRUSH's fault. Can you check with tunables you are running? (ceph osd crush show-tunables)
Using ...
- 09:08 PM Bug #39263 (Fix Under Review): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) ...
- https://github.com/ceph/ceph/pull/27622
- 09:01 PM Bug #39286 (Fix Under Review): primary recovery local missing object did not update obc
- https://github.com/ceph/ceph/pull/27575
- 08:04 PM Backport #38993: nautilus: unable to link rocksdb library if use system rocksdb
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/27601
merged
- 08:02 PM Backport #39325: nautilus: ceph-objectstore-tool rename dump-import to dump-export
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27610
merged
- 08:01 PM Backport #39310: nautilus: crushtool crash on Fedora 28 and newer
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27620
merged
- 07:55 PM Bug #39366 (Can't reproduce): ClsLock.TestRenew failure
- ...
- 07:49 PM Feature #39339: prioritize backfill of metadata pools, automatically
I forgot that it is possible that backfill/recovery could be moving data around for several reasons. In those case...
- 06:50 PM Feature #39339: prioritize backfill of metadata pools, automatically
Recovery is also about restoring objects to the right level of replication. Because the log is known to represent a...
- 03:54 PM Feature #39339: prioritize backfill of metadata pools, automatically
- Also, this ceph command requires the operator to do it, the point of the tracker is that this should be default behav...
- 03:38 PM Feature #39339: prioritize backfill of metadata pools, automatically
- is backfill any different than recovery priority? If not, should it be? By "backfill" I mean the emergency situatio...
- 02:06 PM Feature #39339: prioritize backfill of metadata pools, automatically
- ceph osd pool set <pool> recovery_priority <value>
I think a value of 1 or 2 makes sense (default if unset is 0).
- 01:59 AM Feature #39339 (In Progress): prioritize backfill of metadata pools, automatically
- Neha Ojha suggested filing this feature request.
One relatively easy way to minimize damage in a double-failure sc...
- 05:18 PM Backport #38880: luminous: ENOENT in collection_move_rename on EC backfill target
- This backport does not require the third commit https://github.com/ceph/ceph/pull/26996/commits/71996da6be171cd310f8c...
- 05:17 PM Backport #38881 (In Progress): nautilus: ENOENT in collection_move_rename on EC backfill target
- https://github.com/ceph/ceph/pull/27654
- 03:13 PM Feature #39362 (New): ignore osd_max_scrubs for forced repair
- On clusters with quite full PGs, it is common (i.e. ~100% sure) that a `ceph pg repair <pgid>` does not start immedia...
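A hedged sketch of the workaround this implies today (the PG id and values are examples, not from the report):
   ceph tell 'osd.*' injectargs '--osd_max_scrubs 2'   # temporarily allow an extra scrub/repair slot
   ceph pg repair 2.7                                  # the forced repair can then be scheduled sooner
   ceph tell 'osd.*' injectargs '--osd_max_scrubs 1'   # restore the default once the repair has run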
- 12:56 PM Bug #39353 (Fix Under Review): Error message displayed when mon_osd_max_split_count would be exce...
- 12:36 PM Bug #39353 (Resolved): Error message displayed when mon_osd_max_split_count would be exceeded is ...
- Under certain circumstances, an attempt to increase the PG count of a pool can fail like this:...
- 06:58 AM Bug #24531: Mimic MONs have slow/long running ops
- We had this happen twice this week on a v13.2.5 cluster. (The cluster was recently upgraded from v12.2.11, where this...
- 06:19 AM Backport #39343 (In Progress): luminous: ceph-objectstore-tool rename dump-import to dump-export
- 06:13 AM Backport #39343: luminous: ceph-objectstore-tool rename dump-import to dump-export
- Backporting note: cherry-pick 96861a8116242bdef487087348c24c97723dfafc only
- 06:07 AM Backport #39343 (Resolved): luminous: ceph-objectstore-tool rename dump-import to dump-export
- https://github.com/ceph/ceph/pull/27636
- 06:16 AM Backport #39342 (In Progress): mimic: ceph-objectstore-tool rename dump-import to dump-export
- 06:13 AM Backport #39342: mimic: ceph-objectstore-tool rename dump-import to dump-export
- Backporting note: cherry-pick 96861a8116242bdef487087348c24c97723dfafc only
- 06:07 AM Backport #39342 (Resolved): mimic: ceph-objectstore-tool rename dump-import to dump-export
- https://github.com/ceph/ceph/pull/27635
- 04:45 AM Backport #39043 (In Progress): nautilus: osd/PGLog: preserve original_crt to check rollbackability
- https://github.com/ceph/ceph/pull/27632
- 03:21 AM Backport #39044 (In Progress): mimic: osd/PGLog: preserve original_crt to check rollbackability
- https://github.com/ceph/ceph/pull/27629
04/16/2019
- 11:31 PM Backport #38566 (Resolved): mimic: osd_recovery_priority is not documented (but osd_recovery_op_p...
- 11:10 PM Bug #39281 (Resolved): object_stat_sum_t decode broken if given older version
- 11:08 PM Bug #39281 (Pending Backport): object_stat_sum_t decode broken if given older version
- 02:41 PM Bug #39281 (Resolved): object_stat_sum_t decode broken if given older version
- 02:41 PM Bug #39281 (Pending Backport): object_stat_sum_t decode broken if given older version
- 10:32 AM Bug #39281 (Fix Under Review): object_stat_sum_t decode broken if given older version
- 10:32 AM Bug #39281 (Pending Backport): object_stat_sum_t decode broken if given older version
- 10:58 PM Bug #39306 (Need More Info): ceph config: impossible to set osd_scrub_chunk_max
- ...
- 05:25 AM Bug #39306 (Rejected): ceph config: impossible to set osd_scrub_chunk_max
- ...
- 08:42 PM Bug #39336 (Duplicate): "*** Caught signal (Bus error) **" in upgrade:luminous-x-mimic
- Run: http://pulpito.ceph.com/teuthology-2019-04-16_02:25:02-upgrade:luminous-x-mimic-distro-basic-smithi/
Job: 38528... - 07:28 PM Backport #39310 (In Progress): nautilus: crushtool crash on Fedora 28 and newer
- https://github.com/ceph/ceph/pull/27620
- 08:00 AM Backport #39310 (Resolved): nautilus: crushtool crash on Fedora 28 and newer
- https://github.com/ceph/ceph/pull/27620
- 06:18 PM Bug #39333 (Resolved): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
sage-2019-04-16_13:58:36-rados-wip-sage-testing-2019-04-15-0844-distro-basic-smithi/3853774
The final PGs looked...
- 04:53 PM Bug #39330 (New): recovery transfer rate not correct
- When running all OSDs inside a QEMU VM (with real disks attached through virtio-scsi), the gathered recovery statisti...
- 03:29 PM Bug #39249: Some PGs stuck in active+remapped state
- I've not tried changing reweights to 1, though last week I ran "ceph osd reweight-by-utilization 110"
Cluster is ...
- 02:58 PM Backport #39325 (In Progress): nautilus: ceph-objectstore-tool rename dump-import to dump-export
- 02:40 PM Backport #39325 (Resolved): nautilus: ceph-objectstore-tool rename dump-import to dump-export
- https://github.com/ceph/ceph/pull/27610
- 02:40 PM Bug #39284 (Pending Backport): ceph-objectstore-tool rename dump-import to dump-export
- 10:34 AM Bug #39284: ceph-objectstore-tool rename dump-import to dump-export
- Backporting note: cherry-pick 96861a8116242bdef487087348c24c97723dfafc only (the PR#27564 includes another commit tha...
- 10:53 AM Bug #38786 (Resolved): autoscale down can lead to max_pg_per_osd limit
- 10:53 AM Backport #39271 (Resolved): nautilus: autoscale down can lead to max_pg_per_osd limit
- 10:52 AM Backport #39275 (Resolved): nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 10:40 AM Bug #39055: OSD's crash when specific PG is trying to backfill
- Hi Greg,
Thanks for getting back.
After a while, I resorted to creating a new pool and migrated all the data of...
- 10:33 AM Backport #39320 (Resolved): nautilus: object_stat_sum_t decode broken if given older version
- 10:32 AM Backport #39320 (Resolved): nautilus: object_stat_sum_t decode broken if given older version
- https://github.com/ceph/ceph/pull/27555
- 10:23 AM Support #39319 (New): Every 15 min - Monitor daemon marked osd.x down, but it is still running
- 1. Install Ceph (ceph version 13.2.5 mimic (stable)) on 4 nodes (CentOS 7, in a VMware ESXi 5.5 test environment)
f... - 08:00 AM Backport #39311 (Resolved): mimic: crushtool crash on Fedora 28 and newer
- https://github.com/ceph/ceph/pull/27986
- 08:00 AM Backport #39309 (Rejected): luminous: crushtool crash on Fedora 28 and newer
- 07:45 AM Bug #39307 (Won't Fix): EC pools with m=1 are created with an unsafe min_size by default
- Creating an EC pool with m=1 on 14.2.0 defaults to a min_size of k, e.g. min_size of 2 for a 2+1 pool.
Older version... - 02:10 AM Backport #38993 (In Progress): nautilus: unable to link rocksdb library if use system rocksdb
- https://github.com/ceph/ceph/pull/27601
- 01:52 AM Bug #39006 (Fix Under Review): ceph tell osd.xx bench help : gives wrong help
04/15/2019
- 09:19 PM Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd
The message below outputs too many PGs. It counts active + up from pg_count as if the actingset and upset are disj...
- 08:28 PM Bug #38930 (In Progress): ceph osd safe-to-destroy wrongly approves any out osd
- Okay, reproduced this with vstart. When I mark an OSD out, I get...
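Roughly, that vstart reproduction looks like this (a sketch under assumed defaults, not the exact commands used):
   MON=1 OSD=3 MGR=1 ../src/vstart.sh -n -d   # local dev cluster from a build directory
   ./bin/ceph osd out 0                       # mark one OSD out; its PGs are still on it while remapped
   ./bin/ceph osd safe-to-destroy 0           # incorrectly reports osd.0 as safe to destroy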
- 09:15 PM Bug #39055: OSD's crash when specific PG is trying to backfill
- You'll need to gather full debug logs of the crash and as much as possible about the object(s) which the PG is workin...
- 09:14 PM Bug #39056: localize-reads does not increment pg stats read count
- Yeah, localize_reads has some issues. This is the least of them and would be hard to fix in the current architecture ...
- 08:01 PM Backport #39271: nautilus: autoscale down can lead to max_pg_per_osd limit
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27547
merged
- 07:59 PM Backport #39275: nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27550
merged
- 07:40 PM Bug #39304 (Resolved): short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_a...
- Run: http://pulpito.ceph.com/yuriw-2019-04-13_15:18:33-upgrade:nautilus-p2p-wip-yuri6-testing-2019-04-12-1636-nautilu...
- 06:40 PM Feature #39302 (New): `ceph df` reports misleading information when no ceph-mgr running
- When there is no ceph-mgr running, the `ceph df` command reports incorrect (misleading) information. For example, in ...
- 03:17 PM Backport #39239 (New): luminous: "sudo yum -y install python34-cephfs" fails on mimic
- 03:16 PM Backport #39239 (In Progress): luminous: "sudo yum -y install python34-cephfs" fails on mimic
- 01:21 PM Bug #39249: Some PGs stuck in active+remapped state
- exactly the same. In order to heal that I have changed all my reweights to 1. This helped. But anyway, I don't unders...
- 01:18 PM Bug #39249: Some PGs stuck in active+remapped state
- We have a Mimic 13.2.5 cluster with a similar looking problem:
After replacing a failing OSD, the cluster mostly ...
04/14/2019
- 08:26 PM Bug #39174 (Pending Backport): crushtool crash on Fedora 28 and newer
- 08:24 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
- ...
04/13/2019
- 07:11 PM Backport #38904 (Resolved): mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to rol...
- 04:03 PM Backport #39237 (Resolved): mimic: "sudo yum -y install python34-cephfs" fails on mimic
- 12:40 PM Bug #39286 (Resolved): primary recovery local missing object did not update obc
- If not, the snapset in the local obc may be inconsistent, and then make_writeable()
will make mistakes.
04/12/2019
- 09:48 PM Bug #39263: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) Shutting down becau...
- /a/nojha-2019-04-11_19:53:24-rados-wip-parial-recovery-2019-04-11-distro-basic-smithi/3834700/
- 08:23 PM Backport #38904: mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27284
merged
- 08:11 PM Bug #39284 (In Progress): ceph-objectstore-tool rename dump-import to dump-export
- 07:01 PM Bug #39284 (Resolved): ceph-objectstore-tool rename dump-import to dump-export
dump-import is a stupid name for this command.
Treat dump-import as an undocumented synonym for dump-export.
- 06:40 PM Bug #39281 (In Progress): object_stat_sum_t decode broken if given older version
- 04:58 PM Bug #39281 (Resolved): object_stat_sum_t decode broken if given older version
When the encode/decode for object_stat_sum_t went from version 19 to 20 the fast path wasn't updated....
- 05:46 PM Bug #39282 (Resolved): EIO from process_copy_chunk_manifest
- ...
- 03:14 PM Backport #38901 (Resolved): mimic: Minor rados related documentation fixes
- 03:00 PM Backport #39237: mimic: "sudo yum -y install python34-cephfs" fails on mimic
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/27476
merged
- 01:10 PM Bug #39249: Some PGs stuck in active+remapped state
- @Mark: Which version of Mimic are you running?
- 01:04 PM Bug #39249: Some PGs stuck in active+remapped state
- ...
- 01:04 PM Bug #39249: Some PGs stuck in active+remapped state
- #3747 ?
- 12:26 PM Backport #38442 (In Progress): luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 12:21 PM Backport #39275 (In Progress): nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 12:04 PM Backport #39275 (Resolved): nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- https://github.com/ceph/ceph/pull/27550
- 12:09 PM Backport #39271 (In Progress): nautilus: autoscale down can lead to max_pg_per_osd limit
- 12:03 PM Backport #39271 (Resolved): nautilus: autoscale down can lead to max_pg_per_osd limit
- https://github.com/ceph/ceph/pull/27547
- 11:57 AM Bug #38786 (Pending Backport): autoscale down can lead to max_pg_per_osd limit
- 11:55 AM Bug #38359 (Pending Backport): osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 09:53 AM Bug #39159 (Fix Under Review): qa: Fix ambiguous store_thrash thrash_store in mon_thrash.py
- 04:32 AM Bug #39099: Give recovery for inactive PGs a higher priority
- Checking acting.size() < pool.info.min_size is wrong. During recovery acting == up. So if active.size() < pool.info...
04/11/2019
- 08:40 PM Bug #38840 (In Progress): snaps missing in mapper, should be: ca was r -2...repaired
- 07:14 PM Bug #39263 (Resolved): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) Shutting...
- ...
- 04:50 PM Bug #21388 (Duplicate): inconsistent pg but repair does nothing reporting head data_digest != dat...
- This was merged to master Jul 31, 2018 in https://github.com/ceph/ceph/pull/23217 for a different tracker.
- 04:32 PM Bug #39099 (In Progress): Give recovery for inactive PGs a higher priority
- 12:36 PM Bug #39249: Some PGs stuck in active+remapped state
- OSD.11 previously took part in this PG. I don't know now whether it was the primary or not. The bug happened after I made `ceph os...
- 12:35 PM Bug #39249: Some PGs stuck in active+remapped state
- ...
- 12:23 PM Bug #39249: Some PGs stuck in active+remapped state
- ...
- 12:22 PM Bug #39249: Some PGs stuck in active+remapped state
- ...
- 12:22 PM Bug #39249 (Closed): Some PGs stuck in active+remapped state
- Sometimes my PGs get stuck in this state. When I stop the primary OSD containing this PG, it becomes `active+undersized+degrad...
- 12:14 PM Feature #39248 (New): Add ability to limit number of simultaneously backfilling PGs
- I want to reduce the impact of `ceph osd out osd.xxx`. I already set
--osd-recovery-max-active 1
--osd-max-backfills ...
- 11:46 AM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
- Injecting into mgr has solved the issue, thanks!
- 11:07 AM Backport #39239: luminous: "sudo yum -y install python34-cephfs" fails on mimic
- note to myself or anyone who wants to backport this change to luminous, you need to blacklist the python36 package wh...
- 10:59 AM Backport #39239 (Resolved): luminous: "sudo yum -y install python34-cephfs" fails on mimic
- https://github.com/ceph/ceph/pull/28493
- 10:59 AM Bug #39164 (Pending Backport): "sudo yum -y install python34-cephfs" fails on mimic
- 10:54 AM Backport #39236 (In Progress): nautilus: "sudo yum -y install python34-cephfs" fails on mimic
- 02:46 AM Backport #39236: nautilus: "sudo yum -y install python34-cephfs" fails on mimic
- https://github.com/ceph/ceph/pull/27505
- 02:44 AM Backport #39236 (Resolved): nautilus: "sudo yum -y install python34-cephfs" fails on mimic
- https://github.com/ceph/ceph/pull/27505
- 07:56 AM Bug #39174 (In Progress): crushtool crash on Fedora 28 and newer
- 07:10 AM Bug #39174 (Fix Under Review): crushtool crash on Fedora 28 and newer
- 06:02 AM Bug #39174: crushtool crash on Fedora 28 and newer
- https://bugzilla.redhat.com/show_bug.cgi?id=1515858
- 04:36 AM Bug #39174: crushtool crash on Fedora 28 and newer
- Turning up verbosity gives clues to what might be the problem....
- 02:31 AM Bug #39174: crushtool crash on Fedora 28 and newer
- 02:30 AM Bug #39174: crushtool crash on Fedora 28 and newer
- Vasu Kulkarni wrote:
> very good reason to drop one distro in teuthology and replace it with fedora 28, I think Brad... - 02:47 AM Backport #39237 (In Progress): mimic: "sudo yum -y install python34-cephfs" fails on mimic
- 02:47 AM Backport #39237 (Resolved): mimic: "sudo yum -y install python34-cephfs" fails on mimic
- https://github.com/ceph/ceph/pull/27476
04/10/2019
- 11:51 PM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
- Ah, that's because a jewel osd does not know how to deal with this REJECT in the Started/ReplicaActive/RepNotRecoveri...
- 02:22 AM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
- Fails in 1 out of 20 runs http://pulpito.ceph.com/nojha-2019-04-09_17:54:07-rados:upgrade:jewel-x-singleton-luminous-...
- 11:46 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- mon.c timeline:
2019-04-06 08:58:28.846 hits a lease timeout and triggers the election process
2019-04-06 08:58:28.... - 10:03 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- Greg Farnum wrote:
> The monitor was out of quorum for 30 minutes; it probably has to do with holding on to client c... - 09:59 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- The monitor was out of quorum for 30 minutes; it probably has to do with holding on to client connections or else not...
- 10:19 PM Backport #38720 (Resolved): mimic: crush: choose_args array size mis-sized when weight-sets are e...
- 10:18 PM Bug #38826 (Resolved): upmap broken the crush rule
- 10:18 PM Backport #38858 (Resolved): mimic: upmap broken the crush rule
- 09:48 PM Bug #39085 (Resolved): monmap created timestamp may be blank
- 09:12 PM Bug #39085 (Pending Backport): monmap created timestamp may be blank
- 09:45 PM Bug #38359 (Fix Under Review): osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 09:45 PM Bug #38359: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- nope, that didn't fix it:
/a/sage-2019-04-10_15:25:57-rados-wip-sage4-testing-2019-04-10-0709-distro-basic-smithi/3... - 09:36 PM Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd
- Hmm, maybe the pg_map is purged of any OSD marked out? Although you can have up OSDs that are out so that shouldn't b...
- 09:30 PM Bug #39174: crushtool crash on Fedora 28 and newer
- very good reason to drop one distro in teuthology and replace it with fedora 28, I think Brad brought this up long ti...
- 08:30 PM Bug #39174 (Resolved): crushtool crash on Fedora 28 and newer
- On Fedora 29, Fedora 30, and RHEL 8, /usr/bin/crushtool crashes when trying to compile the map that Rook uses.
<pr...
- 09:28 PM Bug #39054 (Closed): osd push failed because local copy is 4394'133607637
- As Jewel is an outdated release and you ran the potentially-destructive repair tools, you'll have better luck taking ...
- 09:16 PM Backport #38904 (In Progress): mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to ...
- 09:16 PM Backport #38906 (Resolved): nautilus: osd/PGLog.h: print olog_can_rollback_to before deciding to ...
- 09:14 PM Bug #39039: mon connection reset, command not resent
- So it's not the command specifically but that the client doesn't reconnect to a working monitor, right?
- 09:10 PM Backport #38442 (Resolved): luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
- 09:07 PM Backport #39220 (Resolved): mimic: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_miss...
- https://github.com/ceph/ceph/pull/27940
- 09:07 PM Bug #36598 (Can't reproduce): osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests ...
- This has not shown up recently, so maybe this got resolved as a result of http://tracker.ceph.com/issues/36739 being ...
- 09:07 PM Backport #39219 (Resolved): nautilus: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
- https://github.com/ceph/ceph/pull/27839
- 09:07 PM Backport #39218 (Resolved): luminous: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
- https://github.com/ceph/ceph/pull/27878
- 09:05 PM Backport #39206 (Resolved): mimic: osd: leaked pg refs on shutdown
- https://github.com/ceph/ceph/pull/27938
- 09:05 PM Backport #39205 (Resolved): nautilus: osd: leaked pg refs on shutdown
- https://github.com/ceph/ceph/pull/27803
- 09:05 PM Backport #39204 (Resolved): luminous: osd: leaked pg refs on shutdown
- https://github.com/ceph/ceph/pull/27810
- 09:01 PM Bug #39175 (Resolved): RGW DELETE calls partially missed shortly after OSD startup
- We have two separate clusters (physically 2,000+ miles apart) that are seeing
PGs going inconsistent while doing reb...
- 04:06 PM Feature #39162 (In Progress): Improvements to standalone tests.
- 05:58 AM Bug #38892: /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation fault
- See https://github.com/ceph/ceph/pull/27479 for a viable workaround. Note that this is a bug in gcc7 [1] and the pref...
- 04:46 AM Backport #38567 (In Progress): luminous: osd_recovery_priority is not documented (but osd_recover...
- 04:16 AM Bug #39164: "sudo yum -y install python34-cephfs" fails on mimic
- note to myself or anyone who wants to backport this change to luminous, you need to blacklist the python36 package wh...
- 04:13 AM Bug #39164 (Fix Under Review): "sudo yum -y install python34-cephfs" fails on mimic
- 03:24 AM Bug #39164 (Resolved): "sudo yum -y install python34-cephfs" fails on mimic
- see http://pulpito.ceph.com/yuriw-2019-04-09_19:20:36-multimds-wip-yuri3-testing-2019-04-08-2038-mimic-testing-basic-...
- 03:56 AM Bug #38582: Pool storage MAX AVAIL reduction seems higher when single OSD reweight is done
- Correction in the description.
It looks like the pools MAX AVAIL value had dropped after there was a hard disk fail...
04/09/2019
- 10:22 PM Bug #38724 (Need More Info): _txc_add_transaction error (39) Directory not empty not handled on o...
- logging level isn't high enough to tell what data is in this pg. :(
- 10:17 PM Bug #38786 (Fix Under Review): autoscale down can lead to max_pg_per_osd limit
- https://github.com/ceph/ceph/pull/27473
- 09:21 PM Feature #39162 (Resolved): Improvements to standalone tests.
Now that OSDs default to bluestore, we need to fix the use of run_osd(). We should replace run_osd_bluestore() with r...
- 08:29 PM Backport #38567: luminous: osd_recovery_priority is not documented (but osd_recovery_op_priority is)
- https://github.com/ceph/ceph/pull/27471
- 02:54 PM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
- From the osd log including the thread before the crash....
- 02:36 PM Bug #38219 (Fix Under Review): rebuild-mondb hangs
- 12:25 PM Bug #39159 (Resolved): qa: Fix ambiguous store_thrash thrash_store in mon_thrash.py
- Both store_thrash and thrash_store names are used for the same thing in mon_thrash.py. 'thrash_store' is used here: h...
- 08:13 AM Bug #39154 (Resolved): Don't mark removed osds in when running "ceph osd in any|all|*"
- To reproduce....
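A plausible reproduction sketch (the reporter's exact steps are elided above; the OSD id is hypothetical):
   ceph osd out 5
   ceph osd purge 5 --yes-i-really-mean-it   # osd.5 is now removed from the osdmap
   ceph osd in all                           # per this bug, the removed id gets marked in again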
- 01:47 AM Bug #23030 (Fix Under Review): osd: crash during recovery with assert(p != recovery_info.ss.clone...
- https://github.com/ceph/ceph/pull/27273
- 01:04 AM Bug #39152 (Duplicate): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
- OSD continuously crashed
-1> 2019-04-08 17:47:06.615 7f3f3ef62700 -1 /build/ceph-14.2.0/src/os/bluestore/Bl...
04/08/2019
- 11:00 PM Bug #37264 (Resolved): scrub warning check incorrectly uses mon scrub interval
- 10:49 PM Bug #26971 (Duplicate): failed to become clean before timeout expired
- 10:18 PM Bug #26971: failed to become clean before timeout expired
- see http://tracker.ceph.com/issues/39149
- 10:15 PM Bug #26971: failed to become clean before timeout expired
- oh, it's because there's also 1/10th the probability of choosing the second host:...
- 07:59 PM Bug #26971: failed to become clean before timeout expired
- This is just CRUSH failing. I extracted the osdmap from the data/mon.a.tgz and verified with osdmaptool that it's ju...
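That kind of check can be repeated with osdmaptool against the extracted map (the file name and pool id are examples):
   osdmaptool osdmap.bin --test-map-pgs --pool 1   # print the PG-to-OSD mappings CRUSH produces for the pool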
- 10:37 PM Bug #39150 (Resolved): mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
- ...
- 08:42 PM Bug #39148 (New): luminous: powercycle: reached maximum tries (500) after waiting for 3000 seconds
- ...
- 07:02 PM Bug #39145 (New): luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine eve...
- ...
- 05:14 PM Bug #37775: some pg_created messages not sent to mon
- /a/yuriw-2019-04-04_00:00:53-rados-luminous-distro-basic-smithi/3806121/