Activity

From 04/09/2019 to 05/08/2019

05/08/2019

11:57 PM Support #39594: OSD marked as down, had timed out after 15, handle_connect_reply connect got RESE...
The ceph-users mailing list might be a good place to seek help on this kind of issue. Neha Ojha
09:33 PM Bug #38124 (Pending Backport): OSD down on snaptrim.
No ETA; it'll have to wend its way through the backports process. I don't think any releases are imminent so it shoul... Greg Farnum
09:24 PM Bug #39636: osd: PeeringState valgrind error UninitCondition
Rebuilding without inlining to narrow down the problem. Samuel Just
06:40 PM Bug #39636 (Resolved): osd: PeeringState valgrind error UninitCondition
... Patrick Donnelly
07:42 PM Backport #39220: mimic: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(soid) |...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27940
merged
Yuri Weinstein
07:19 PM Backport #38443: mimic: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27907
merged
Yuri Weinstein
07:18 PM Backport #38879: mimic: ENOENT in collection_move_rename on EC backfill target
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27943
merged
Yuri Weinstein
06:04 PM Bug #39581: osd/PG.cc: 2523: FAILED ceph_assert(scrub_queued)
/a/nojha-2019-05-07_17:20:56-rados-fix-pg-notify-distro-basic-smithi/3938003/ Neha Ojha
04:53 PM Bug #38195: osd-backfill-space.sh exposes rocksdb hang
another instance in mimic: /a/yuriw-2019-05-07_14:33:13-rados-wip-yuri-testing-2019-05-06-2158-mimic-distro-basic-smi... Neha Ojha
03:39 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Hi Neha,
I am on Bryan's team. Bryan is out this week but is returning soon.
I was able to inspect logs for abo...
Wes Dillingham
03:19 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)

The backfill_toofull state is like backfill_wait except that it indicates the reason that backfill cannot proceed ...
David Zafman

05/07/2019

05:50 PM Bug #38724 (Pending Backport): _txc_add_transaction error (39) Directory not empty not handled on...
Sage Weil
05:38 PM Feature #38029 (Pending Backport): [RFE] If the nodeep-scrub/noscrub flags are set in pools inste...
Vikhyat Umrao
10:16 AM Bug #38124: OSD down on snaptrim.
Erikas Kučinskis wrote:
> Hi, is there any ETA for when the bug fix will be live?
Erikas Kučinskis
10:15 AM Bug #38124: OSD down on snaptrim.
Hi, is there any ETA for when the bug fix will be live? Erikas Kučinskis
07:30 AM Backport #39506: mimic: Give recovery for inactive PGs a higher priority
Assigning to Neha based on http://tracker.ceph.com/issues/39099#note-11 Nathan Cutler
07:29 AM Backport #39505: luminous: Give recovery for inactive PGs a higher priority
Assigning to Neha based on http://tracker.ceph.com/issues/39099#note-11 Nathan Cutler
06:13 AM Bug #16553: Removing Writeback Cache Tier Does not clean up Incomplete_Clones
Still hit the same issue on 12.2.10 Jun Yang
05:49 AM Backport #39311 (In Progress): mimic: crushtool crash on Fedora 28 and newer
https://github.com/ceph/ceph/pull/27986 Prashant D

05/06/2019

09:44 PM Bug #25182 (Resolved): Upmaps forgotten after restarting OSDs
Thanks for verifying the fixes Bryan. Looks like those are all backported to mimic + luminous. Josh Durgin
09:52 AM Support #39594 (New): OSD marked as down, had timed out after 15, handle_connect_reply connect go...
Hi,
Recently we saw random slow requests in our cluster. In the monitor ceph.log I could see that at the same time OSD...
Alon Avrahami
09:18 AM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
This may indeed be the case, but I'd expect that unless pgs are evacuated, the state would be backfill_wait, not back... Rene Diepstraten

05/05/2019

09:13 PM Bug #39152: nautilus osd crash: Caught signal (Aborted) tp_osd_tp
Sage Weil wrote:
> I'm guessing this is a dup of #38724
>
> Wen, can you tell us what the cluster workload was? ...
K Jarrett

05/04/2019

10:43 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
... Kefu Chai
06:35 PM Bug #39582 (Fix Under Review): Binary data in OSD log from "CRC header" message
David Zafman
01:14 AM Backport #39420 (In Progress): luminous: Don't mark removed osds in when running "ceph osd in any...
https://github.com/ceph/ceph/pull/27728 Neha Ojha
01:02 AM Bug #39304 (In Progress): short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when las...
David Zafman

05/03/2019

04:43 PM Bug #39582 (Resolved): Binary data in OSD log from "CRC header" message

This breaks grep'ing the osd logs.
Using cat -v we see the binary data:...
David Zafman
04:37 PM Bug #39581 (Duplicate): osd/PG.cc: 2523: FAILED ceph_assert(scrub_queued)

dzafman-2019-05-02_19:43:04-rados:thrash-wip-zafman-testing-distro-basic-smithi/3919741
This appears to be PG 2....
David Zafman
11:25 AM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
Greg Farnum wrote:
> The OSD can't count PGs being evacuated as if they were gone because something could go wrong. ...
Wido den Hollander
09:33 AM Feature #38370 (Resolved): ceph CLI ability to change file ownership
Nathan Cutler
09:32 AM Backport #38511 (Resolved): mimic: ceph CLI ability to change file ownership
Nathan Cutler
09:31 AM Bug #38537 (Resolved): mgr deadlock
Nathan Cutler
09:31 AM Backport #38561 (Resolved): mimic: mgr deadlock
Nathan Cutler
09:31 AM Bug #38377 (Resolved): OpTracker destruct assert when OSD destruct
Nathan Cutler
09:30 AM Backport #38646 (Resolved): mimic: OpTracker destruct assert when OSD destruct
Nathan Cutler
09:27 AM Backport #38879 (In Progress): mimic: ENOENT in collection_move_rename on EC backfill target
Nathan Cutler
05:02 AM Documentation #39011 (In Progress): Document how get_recovery_priority() and get_backfill_priorit...
David Zafman
04:22 AM Backport #39220 (In Progress): mimic: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
https://github.com/ceph/ceph/pull/27940 Prashant D
01:16 AM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
In both the set of logs you have shared so far, it seems that the object appears in the OSD log during omap-set-vals(... Neha Ojha
12:45 AM Backport #39206 (In Progress): mimic: osd: leaked pg refs on shutdown
https://github.com/ceph/ceph/pull/27938 Prashant D

05/02/2019

10:54 PM Bug #39383 (Resolved): Too much log output generated from PrimaryLogPG::do_backfill()
David Zafman
10:54 PM Backport #39389 (Resolved): nautilus: Too much log output generated from PrimaryLogPG::do_backfi...
David Zafman
10:53 PM Bug #38325 (Resolved): Code to strip | from core pattern isn't right
David Zafman
10:51 PM Backport #38565 (Resolved): mimic: Code to strip | from core pattern isn't right
David Zafman
10:18 PM Backport #38565: mimic: Code to strip | from core pattern isn't right
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26811
merged
Yuri Weinstein
10:15 PM Backport #38507: mimic: ENOENT on setattrs (obj was recently deleted)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26709
merged
Yuri Weinstein
10:14 PM Backport #38511: mimic: ceph CLI ability to change file ownership
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26760
merged
Yuri Weinstein
10:09 PM Backport #38561: mimic: mgr deadlock
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26833
merged
Yuri Weinstein
10:08 PM Backport #38646: mimic: OpTracker destruct assert when OSD destruct
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/26862
merged
Yuri Weinstein
07:54 PM Bug #23879: test_mon_osdmap_prune.sh fails
/a/yuriw-2019-05-01_19:40:05-rados-wip-yuri3-testing-2019-04-30-1543-mimic-distro-basic-smithi/3916650/ Neha Ojha
07:52 PM Backport #38879: mimic: ENOENT in collection_move_rename on EC backfill target
This failure was seen in mimic: /a/yuriw-2019-04-30_20:31:27-rados-wip-yuri3-testing-2019-04-30-1543-mimic-distro-bas... Neha Ojha
05:40 PM Bug #39152: nautilus osd crash: Caught signal (Aborted) tp_osd_tp
I'm guessing this is a dup of #38724
Wen, can you tell us what the cluster workload was? rgw? rbd? cephfs? Thanks!
Sage Weil
05:10 PM Documentation #39011: Document how get_recovery_priority() and get_backfill_priority() impacts re...

Now we need to include the new recovery priority boost.
/// base recovery priority for MRecoveryReserve (inactive PG...
David Zafman
04:33 PM Bug #38724 (Fix Under Review): _txc_add_transaction error (39) Directory not empty not handled on...
https://github.com/ceph/ceph/pull/27929 Sage Weil
04:18 PM Bug #39570 (Fix Under Review): nautilus with require_osd_release < nautilus cannot increase pg_num
https://github.com/ceph/ceph/pull/27928 Sage Weil
03:59 PM Bug #39570 (Resolved): nautilus with require_osd_release < nautilus cannot increase pg_num
On Mon, 29 Apr 2019, Alexander Y. Fomichev wrote:
> Hi,
>
> I just upgraded from mimic to nautilus(14.2.0) and...
Sage Weil

05/01/2019

10:55 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
The OSDs definitely had objects corresponding to the maps, but they failed the CRC check when trying to read them. Al... Erik Lindahl
09:25 PM Bug #39525 (Need More Info): lz4 compressor corrupts data when buffers are unaligned
Was the problem 1) that different OSDs needed different maps, and they had mismatched CRCs when exported from a 13.2.... Greg Farnum
09:24 PM Bug #39490 (Resolved): osd: failed to encode map e26 with expected crc
Neha Ojha
09:24 PM Bug #39509 (Need More Info): segm fault when invoke MergeOperatorRouter::Name()
We'll need more information on how and where this occurred. Neha Ojha
09:11 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
The OSD can't count PGs being evacuated as if they were gone because something could go wrong. So it's stuck seeing i... Greg Farnum
08:10 AM Bug #39555 (Resolved): backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
This week I ran into an issue where ceph reports HEALTH_ERR because pgs are backfill_toofull.
None of the OSDs are o...
Rene Diepstraten
09:06 PM Bug #39449: Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxHandler::aut...
We probably need to backport this? Neha Ojha
08:50 PM Backport #39044 (Resolved): mimic: osd/PGLog: preserve original_crt to check rollbackability
Nathan Cutler
03:48 PM Backport #39044: mimic: osd/PGLog: preserve original_crt to check rollbackability
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27629
merged
Yuri Weinstein
08:49 PM Backport #39342 (Resolved): mimic: ceph-objectstore-tool rename dump-import to dump-export
Nathan Cutler
03:47 PM Backport #39342: mimic: ceph-objectstore-tool rename dump-import to dump-export
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27635
merged
Yuri Weinstein
08:48 PM Backport #39433 (Resolved): mimic: Degraded PG does not discover remapped data on originating OSD
Nathan Cutler
03:45 PM Backport #39433: mimic: Degraded PG does not discover remapped data on originating OSD
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27745
merged
Yuri Weinstein
08:45 PM Backport #39506 (Need More Info): mimic: Give recovery for inactive PGs a higher priority
Nathan Cutler
08:45 PM Backport #39505 (Need More Info): luminous: Give recovery for inactive PGs a higher priority
Nathan Cutler
04:34 PM Backport #39563 (In Progress): luminous: Error message displayed when mon_osd_max_split_count wou...
Nathan Cutler
04:33 PM Backport #39563 (Resolved): luminous: Error message displayed when mon_osd_max_split_count would ...
https://github.com/ceph/ceph/pull/27908 Nathan Cutler
04:32 PM Bug #39353 (Pending Backport): Error message displayed when mon_osd_max_split_count would be exce...
Nathan Cutler
03:47 PM Bug #39353: Error message displayed when mon_osd_max_split_count would be exceeded is not as user...
https://github.com/ceph/ceph/pull/27647 merged Yuri Weinstein
02:24 PM Bug #39099: Give recovery for inactive PGs a higher priority
David, we should discuss whether we want to backport this all the way to luminous or just to nautilus. Neha Ojha
02:28 AM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...

I have a version that not only copies the existing dups but, if copy_up_to is excluding any log entries, it adds tho...
David Zafman
01:38 AM Bug #23879: test_mon_osdmap_prune.sh fails
/a/yuriw-2019-04-29_22:14:10-rados-wip-yuri2-testing-2019-04-29-1936-mimic-distro-basic-smithi/3910028 Neha Ojha

04/30/2019

11:06 PM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...

Before fix:...
David Zafman
01:26 AM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...
2119'372 (write 2775265) does not get identified as a dup when the log boundaries are (2119'372,2119'373], while 2119... Neha Ojha
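A minimal sketch of the boundary behavior described here, using a hypothetical eversion_t stand-in rather than the real PGLog types: the in-memory log covers the half-open interval (tail, head], so a write whose version equals the tail has to be found via the dups instead.

    #include <cassert>
    #include <cstdint>

    // Hypothetical stand-in for Ceph's eversion_t (epoch'version).
    struct eversion_t {
      uint32_t epoch;
      uint64_t version;
      bool operator<(const eversion_t &o) const {
        return epoch < o.epoch || (epoch == o.epoch && version < o.version);
      }
      bool operator<=(const eversion_t &o) const { return !(o < *this); }
    };

    // The in-memory log covers the half-open interval (tail, head].
    bool in_log(eversion_t v, eversion_t tail, eversion_t head) {
      return tail < v && v <= head;
    }

    int main() {
      eversion_t tail{2119, 372}, head{2119, 373};
      assert(!in_log({2119, 372}, tail, head));  // 2119'372 falls outside (tail, head]
      assert(in_log({2119, 373}, tail, head));   // 2119'373 is covered
    }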
12:51 AM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...

Probably 1.1.short_pg_log.yaml produces this
osd_max_pg_log_entries: 2
osd_min_pg_log_entri...
David Zafman
12:08 AM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...

Before changing primaries from 3 to 0, these 4 operations came in with versions 2119'370 (write 726564), 2119'371 (...
David Zafman
07:47 PM Bug #39484: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
Updated the PR. Please put further code reviews there. :) Greg Farnum
06:37 PM Bug #39484: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
Hmm probably! Greg Farnum
02:50 AM Bug #39484: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
Greg Farnum wrote:
> pending_finishers get moved into committing_finishers once they have been submitted to disk, so...
haitao chen
12:03 AM Bug #39484 (Fix Under Review): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
https://github.com/ceph/ceph/pull/27877 Greg Farnum
07:04 PM Bug #39553 (New): PeeringState: cache PeeringListener indirections
perf_counter refs, etc. should be immutable, so PeeringState may as well cache them. Add a call to do so as to avoid ... Samuel Just
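A minimal sketch of the proposed caching, with hypothetical names throughout (get_perf_counters() is invented here for illustration): fetch the immutable ref from the listener once and reuse it, rather than going through the indirection on every use.

    // Hypothetical names; only the caching pattern is the point.
    struct PerfCounters { void inc(int /*idx*/) {} };

    struct PeeringListener {
      virtual PerfCounters *get_perf_counters() = 0;  // assumed accessor
      virtual ~PeeringListener() = default;
    };

    class PeeringState {
      PeeringListener *pl;
      PerfCounters *pct;  // cached once; the ref is immutable, so this is safe
    public:
      explicit PeeringState(PeeringListener *l)
        : pl(l), pct(l->get_perf_counters()) {}
      void on_event(int counter) { pct->inc(counter); }  // no per-call indirection
    };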
06:58 PM Bug #39552 (New): mons fail to process send_alive message causing pg stuck creating
sjust-2019-04-28_20:59:54-rados-wip-sjust-peering-refactor-distro-basic-smithi/3905696/
PG 200.3 is stuck creating...
Samuel Just
05:43 PM Bug #39546: Warning about past_interval bounds on deleting pg
sjust-2019-04-26_14:00:33-rados-wip-sjust-peering-refactor-distro-basic-mira/3897200/ Samuel Just
05:43 PM Bug #39546 (Resolved): Warning about past_interval bounds on deleting pg
cluster [ERR] 4.7
required past_interval bounds are empty [228,226) but past_intervals is not: ([166,225]
a...
Samuel Just
11:36 AM Backport #39539 (Resolved): nautilus: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()-...
https://github.com/ceph/ceph/pull/28219 Nathan Cutler
11:36 AM Backport #39538 (Resolved): mimic: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->ge...
https://github.com/ceph/ceph/pull/28259 Nathan Cutler
11:36 AM Backport #39537 (Resolved): luminous: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()-...
https://github.com/ceph/ceph/pull/28989 Nathan Cutler
01:32 AM Bug #38846 (In Progress): dump_pgstate_history doesn't really produce useful json output, needs a...
Brad Hubbard
12:39 AM Backport #39218 (In Progress): luminous: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().i...
https://github.com/ceph/ceph/pull/27878 Prashant D

04/29/2019

11:21 PM Backport #39219 (In Progress): nautilus: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().i...
https://github.com/ceph/ceph/pull/27839 Prashant D
11:18 PM Bug #38784 (Pending Backport): osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(...
Prashant D
06:29 AM Bug #38784 (In Progress): osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(soid)...
Prashant D
10:40 PM Bug #39484 (In Progress): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
Hmm this doesn't make a lot of sense. finish_contexts() swaps out the input list with a local one before running fini... Greg Farnum
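A minimal sketch of the swap-before-run pattern being described, using std::function rather than Ceph's Context type: the list is moved into a local before any callback runs, so callbacks that queue new contexts append to the now-empty member list instead of the one being drained.

    #include <functional>
    #include <list>

    std::list<std::function<void()>> pending_finishers;

    // Sketch of a finish_contexts()-style helper: drain a local copy so
    // re-entrant additions land on the (now empty) original list.
    void finish_contexts(std::list<std::function<void()>> &finishers) {
      std::list<std::function<void()>> ls;
      ls.swap(finishers);          // finishers is empty from here on
      for (auto &f : ls)
        f();                       // may push new work onto pending_finishers
    }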
09:22 PM Bug #39484: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
pending_finishers get moved into committing_finishers once they have been submitted to disk, so we probably want to f... Greg Farnum
10:16 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
Hi Greg,
That might actually explain how it happened originally; we auto-deploy hosts with salt, and noticed that ...
Erik Lindahl
09:15 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
Hmm are the working and broken OSDs actually running the same binary version? It should work anyway but a bug around ... Greg Farnum
05:23 PM Bug #39525: lz4 compressor corrupts data when buffers are unaligned
I might have gotten slightly further.
1) On one of the broken OSDs, the current_epoch is 34626 (clean_thru is...
Erik Lindahl
03:58 PM Bug #39525 (Resolved): lz4 compressor corrupts data when buffers are unaligned
In conjunction with taking a new storage server online we observed that 5 out of the 6 SSDs we use to store metadata ... Erik Lindahl
09:23 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Unfortunately I didn't turn up debug logging for every OSD in the cluster so I don't have those logs. I'll reproduce... Bryan Stillwell
08:58 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Bryan, could you also upload osd.516 and osd.563 logs from the same time period as you've provided for osd.503. Neha Ojha
09:02 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
A user saw this and uploaded with debug 20 on OSD and bluestore: 2d8d22f4-580b-4b57-a13a-f49dade34ba7 Greg Farnum
08:53 PM Bug #39304: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_acked_tid wa...

After writing tids 1,2,3 the output shows finishing tids 1,2,3,4,5. We can see that the 3rd set of writes was inte...
David Zafman
10:30 AM Backport #39504 (In Progress): nautilus: Give recovery for inactive PGs a higher priority
Nathan Cutler
10:23 AM Backport #39520 (Rejected): luminous: snaps missing in mapper, should be: ca was r -2...repaired
Nathan Cutler
10:23 AM Backport #39519 (Resolved): nautilus: snaps missing in mapper, should be: ca was r -2...repaired
https://github.com/ceph/ceph/pull/28205 Nathan Cutler
10:23 AM Backport #39518 (Resolved): mimic: snaps missing in mapper, should be: ca was r -2...repaired
https://github.com/ceph/ceph/pull/28232 Nathan Cutler
10:23 AM Backport #39517 (Resolved): nautilus: Improvements to standalone tests.
https://github.com/ceph/ceph/pull/30528 Nathan Cutler
10:22 AM Backport #39516 (Resolved): nautilus: osd-backfill-space.sh test failed in TEST_backfill_multi_pa...
https://github.com/ceph/ceph/pull/28187 Nathan Cutler
10:22 AM Backport #39515 (Rejected): luminous: osd: segv in _preboot -> heartbeat
Nathan Cutler
10:22 AM Backport #39514 (Resolved): nautilus: osd: segv in _preboot -> heartbeat
https://github.com/ceph/ceph/pull/28164 Nathan Cutler
10:22 AM Backport #39513 (Resolved): mimic: osd: segv in _preboot -> heartbeat
https://github.com/ceph/ceph/pull/28220 Nathan Cutler
10:22 AM Backport #39512 (Resolved): nautilus: osd acting cycle
https://github.com/ceph/ceph/pull/28160 Nathan Cutler

04/28/2019

02:09 PM Bug #39509 (Need More Info): segm fault when invoke MergeOperatorRouter::Name()
(gdb) bt
#0 0x00007f60321804ab in raise () from /lib64/libpthread.so.0
#1 0x000055aafad0501a in handle_fatal_sign...
Zengran Zhang
08:19 AM Bug #39449: Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxHandler::aut...
/a/kchai-2019-04-27_02:20:42-rados-wip-kefu-testing-2019-04-26-2318-distro-basic-smithi/3898463/remote/smithi017/log/... Kefu Chai
12:29 AM Bug #38124 (Fix Under Review): OSD down on snaptrim.
David Zafman

04/27/2019

11:26 PM Bug #38124: OSD down on snaptrim.

The following script sometimes hits the race and crashes an OSD. I've removed the assert and the script has been r...
David Zafman
04:39 AM Documentation #3466: rados manpage: bench still documents "read" rather than "seq/rand"
Dan Mick wrote:
> rados bench read has been replaced with "seq" and "rand", the latter of which is
> still unimplem...
James McClune

04/26/2019

11:45 PM Bug #39152: nautilus osd crash: Caught signal (Aborted) tp_osd_tp
A similar issue was reported on ceph-users: "Nautilus (14.2.0) OSDs crashing at startup after removing a pool contain... Neha Ojha
11:19 PM Bug #38124 (In Progress): OSD down on snaptrim.

I am able to reproduce this, so I'll work on a fix.
David Zafman
11:01 PM Bug #39441 (Pending Backport): osd acting cycle
Neha Ojha
04:23 PM Bug #39441 (Fix Under Review): osd acting cycle
Neha Ojha
10:27 PM Bug #38840 (Pending Backport): snaps missing in mapper, should be: ca was r -2...repaired
David Zafman
12:04 AM Bug #38840: snaps missing in mapper, should be: ca was r -2...repaired
David Zafman
06:55 PM Bug #39333 (Pending Backport): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
David Zafman
05:12 PM Bug #39333: osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
David Zafman
06:23 PM Bug #39439 (Pending Backport): osd: segv in _preboot -> heartbeat
Sage Weil
05:27 PM Feature #39162 (Pending Backport): Improvements to standalone tests.
David Zafman
03:51 PM Feature #38617 (Fix Under Review): osd: Better error message when OSD count is less than osd_pool...
Neha Ojha
03:46 PM Backport #39506 (Rejected): mimic: Give recovery for inactive PGs a higher priority
Nathan Cutler
03:46 PM Backport #39505 (Rejected): luminous: Give recovery for inactive PGs a higher priority
Nathan Cutler
03:46 PM Backport #39504 (Resolved): nautilus: Give recovery for inactive PGs a higher priority
https://github.com/ceph/ceph/pull/27854 Nathan Cutler
08:59 AM Backport #39204 (In Progress): luminous: osd: leaked pg refs on shutdown
https://github.com/ceph/ceph/pull/27810 Prashant D
03:45 AM Backport #39205 (In Progress): nautilus: osd: leaked pg refs on shutdown
https://github.com/ceph/ceph/pull/27803 Prashant D
01:40 AM Bug #39484: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
Uploaded the core dump log file.
And the ceph -s:
!ceph_status.png!
mon.b01 crashes again and again.
haitao chen
12:03 AM Bug #35808 (Need More Info): ceph osd ok-to-stop result dosen't match the real situation
Can the reporter test this with the change in https://github.com/ceph/ceph/pull/27503 and report back? David Zafman

04/25/2019

11:58 PM Bug #38930 (Duplicate): ceph osd safe-to-destroy wrongly approves any out osd

We can backport pull request https://github.com/ceph/ceph/pull/27503 for http://tracker.ceph.com/issues/39099 which...
David Zafman
11:55 PM Bug #38930 (Pending Backport): ceph osd safe-to-destroy wrongly approves any out osd
David Zafman
11:54 PM Bug #39099 (Pending Backport): Give recovery for inactive PGs a higher priority
David Zafman
09:39 PM Bug #39490 (In Progress): osd: failed to encode map e26 with expected crc
should be fixed by https://github.com/ceph/ceph/pull/27623 Neha Ojha
08:28 PM Bug #39490 (Resolved): osd: failed to encode map e26 with expected crc

upgrade:nautilus-x/parallel/{0-cluster/{openstack.yaml start.yaml} 1-ceph-install/nautilus.yaml 1.1-pg-log-override...
Neha Ojha
08:34 PM Bug #36748: ms_deliver_verify_authorizer no AuthAuthorizeHandler found for protocol 0
... Neha Ojha
08:13 PM Bug #38483: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
/a/nojha-2019-04-25_05:43:35-rados-wip-39441-distro-basic-smithi/3892156/ Neha Ojha
08:10 PM Bug #37797: radosbench tests hit ENOSPC
This one appeared again.
/a/nojha-2019-04-25_05:43:35-rados-wip-39441-distro-basic-smithi/3892141/
Neha Ojha
12:27 PM Bug #39484 (Resolved): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
We are running ceph 13.2.5 on CentOS Linux 7.5.1804, and the ceph cluster consists of 5 ceph-mons. Every 30 seconds, w... yu feng
08:07 AM Bug #26958: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objec...
... Nathan Cutler
07:46 AM Backport #39476 (Resolved): nautilus: segv in fgets() in collect_sys_info reading /proc/cpuinfo
https://github.com/ceph/ceph/pull/28141 Nathan Cutler
07:46 AM Backport #39475 (Resolved): mimic: segv in fgets() in collect_sys_info reading /proc/cpuinfo
https://github.com/ceph/ceph/pull/28206 Nathan Cutler
07:46 AM Backport #39474 (Resolved): luminous: segv in fgets() in collect_sys_info reading /proc/cpuinfo
https://github.com/ceph/ceph/pull/32349 Nathan Cutler
07:45 AM Backport #39419 (In Progress): nautilus: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).el...
Nathan Cutler
06:13 AM Bug #39443: "ceph daemon" does not support ceph args
Sure, 'debug_ms' was just an example to illustrate the problem. Yes, most (if not all) ceph args do not make sense... Mykola Golub
03:53 AM Bug #39333 (Fix Under Review): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
David Zafman

04/24/2019

09:31 PM Bug #35808: ceph osd ok-to-stop result dosen't match the real situation
This may be fixed by https://github.com/ceph/ceph/pull/27503 David Zafman
07:40 PM Bug #39441: osd acting cycle
https://github.com/ceph/ceph/pull/24004 was not backported to mimic, which might explain why the octopus osd is calcu... Neha Ojha
06:51 PM Bug #39443: "ceph daemon" does not support ceph args
I'm not sure this is a problem — "ceph daemon" is just for talking to a local Unix socket; it doesn't engage in any o... Greg Farnum
10:32 AM Bug #39443 (New): "ceph daemon" does not support ceph args
This works:... Mykola Golub
06:49 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
We're also seeing Bus Errors instead of segfaults in the OpHistory cleanup at #24664 so these may be related... Greg Farnum
06:47 PM Bug #39336 (Duplicate): "*** Caught signal (Bus error) **" in upgrade:luminous-x-mimic
Greg Farnum
06:41 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Bryan Stillwell wrote:
> I could grab you the debug logs, but that could take a while. Which knobs do you want me t...
Neha Ojha
01:26 PM Bug #39449: Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxHandler::aut...
PRs:
* https://github.com/ceph/teuthology/pull/1274
* https://github.com/ceph/ceph/pull/27265
Gist:
* https:...
Radoslaw Zarzynski
01:24 PM Bug #39449 (Resolved): Uninit in EVP_DecryptFinal_ex on ceph::crypto::onwire::AES128GCM_OnWireRxH...
... Sage Weil
01:18 PM Backport #39431 (In Progress): luminous: Degraded PG does not discover remapped data on originati...
Ashish Singh
12:35 PM Backport #39433 (In Progress): mimic: Degraded PG does not discover remapped data on originating OSD
Ashish Singh
12:33 PM Backport #39432 (In Progress): nautilus: Degraded PG does not discover remapped data on originati...
Ashish Singh

04/23/2019

10:09 PM Bug #39441 (Resolved): osd acting cycle
osd.9 (mimic)... Sage Weil
06:07 PM Bug #26958 (Pending Backport): osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_lo...
Sage Weil
01:50 AM Bug #26958 (Fix Under Review): osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_lo...
Neha Ojha
06:05 PM Bug #38296 (Pending Backport): segv in fgets() in collect_sys_info reading /proc/cpuinfo
Sage Weil
06:04 PM Bug #39439 (Fix Under Review): osd: segv in _preboot -> heartbeat
https://github.com/ceph/ceph/pull/27729 Sage Weil
06:01 PM Bug #39439 (Resolved): osd: segv in _preboot -> heartbeat
... Sage Weil
05:47 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
/ceph/teuthology-archive/pdonnell-2019-04-17_06:12:56-kcephfs-wip-pdonnell-testing-20190417.032809-distro-basic-smith... Patrick Donnelly
01:07 PM Backport #39433 (Resolved): mimic: Degraded PG does not discover remapped data on originating OSD
https://github.com/ceph/ceph/pull/27745 Nathan Cutler
01:07 PM Backport #39432 (Resolved): nautilus: Degraded PG does not discover remapped data on originating OSD
https://github.com/ceph/ceph/pull/27744 Nathan Cutler
01:07 PM Backport #39431 (Resolved): luminous: Degraded PG does not discover remapped data on originating OSD
https://github.com/ceph/ceph/pull/27751 Nathan Cutler
01:05 PM Backport #39422 (Resolved): mimic: Don't mark removed osds in when running "ceph osd in any|all|*"
https://github.com/ceph/ceph/pull/28142 Nathan Cutler
01:05 PM Backport #39421 (Resolved): nautilus: Don't mark removed osds in when running "ceph osd in any|al...
https://github.com/ceph/ceph/pull/28072 Nathan Cutler
01:05 PM Backport #39420 (Resolved): luminous: Don't mark removed osds in when running "ceph osd in any|al...
https://github.com/ceph/ceph/pull/27728 Nathan Cutler
01:04 PM Backport #39419 (Resolved): nautilus: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elect...
https://github.com/ceph/ceph/pull/27771 Nathan Cutler
11:03 AM Bug #24419: ceph-objectstore-tool unable to open mon store
Were you able to figure out why? Kevin Cao
10:52 AM Support #39319: Every 15 min - Monitor daemon marked osd.x down, but it is still running
2019-04-23 13:36:20.668791 osd.2 [WRN] Monitor daemon marked osd.2 down, but it is still running
2019-04-23 13:40:36...
Vladimir Savinov
10:51 AM Support #39319: Every 15 min - Monitor daemon marked osd.x down, but it is still running
I add "debug ms = 1" line in [osd] adm view log monitor in /var/log/ceph
...
mon.greend02-n02ceph02@1(peon) e3 ms...
Vladimir Savinov
06:41 AM Backport #39042 (In Progress): luminous: osd/PGLog: preserve original_crt to check rollbackability
https://github.com/ceph/ceph/pull/27715 Prashant D

04/22/2019

11:32 PM Bug #38296 (In Progress): segv in fgets() in collect_sys_info reading /proc/cpuinfo
Brad Hubbard
05:52 PM Bug #38296: segv in fgets() in collect_sys_info reading /proc/cpuinfo
https://github.com/ceph/ceph/pull/27707
(looks like the buffer is only 100 chars, and /proc/cpuinfo frequently exc...
Sage Weil
05:52 PM Bug #38296 (Fix Under Review): segv in fgets() in collect_sys_info reading /proc/cpuinfo
https://github.com/ceph/ceph/pull/27707
(looks like the buffer is only 100 chars, and /proc/cpuinfo frequently exc...
Sage Weil
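A hedged sketch of the safer pattern (not the actual collect_sys_info() code): read /proc/cpuinfo with std::getline, which sizes the buffer per line, instead of reading into a fixed 100-char array with fgets().

    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
      std::ifstream cpuinfo("/proc/cpuinfo");
      std::string line;
      while (std::getline(cpuinfo, line)) {  // buffer grows as needed
        // "flags" and similar lines routinely exceed 100 characters.
        if (line.compare(0, 5, "model") == 0)
          std::cout << line << '\n';
      }
    }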
05:48 PM Bug #38296: segv in fgets() in collect_sys_info reading /proc/cpuinfo
saw this again: ... Sage Weil
09:17 PM Bug #39402 (New): Can't remove ghost PGs
This is on the downstream long-running cluster. I can grant SSH access to whoever needs it.
This bug is similar ...
David Galloway
06:26 PM Bug #39263 (Pending Backport): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) ...
Only this commit needs to be backported to nautilus https://github.com/ceph/ceph/pull/27622/commits/ccb86682361cf20bd... Neha Ojha
05:08 PM Bug #39398 (Fix Under Review): osd: fast_info need update when pglog rewind
Neha Ojha
08:43 AM Bug #39398 (Duplicate): osd: fast_info need update when pglog rewind
When the pglog needs to rewind, info.last_update will need to change to
an older value; the current impl of PG::_prepare_wr...
Zengran Zhang
01:50 PM Bug #37679 (Fix Under Review): osd: pull object from the shard who missing it
Sage Weil
07:50 AM Bug #26958: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get_log().objec...
http://qa-proxy.ceph.com/teuthology/xxg-2019-04-19_03:19:09-rados-wip-yanj-testing-fixpeerings-190418-distro-basic-sm... xie xingguo
02:28 AM Bug #37439 (Pending Backport): Degraded PG does not discover remapped data on originating OSD
Sage Weil

04/20/2019

01:47 PM Bug #39154 (Pending Backport): Don't mark removed osds in when running "ceph osd in any|all|*"
Sage Weil

04/19/2019

04:07 AM Bug #39390: filestore pre-split may not split enough directories
https://github.com/ceph/ceph/pull/27689 Jeegn Chen
03:50 AM Bug #39390 (Resolved): filestore pre-split may not split enough directories
The current HashIndex::pre_split_folder() uses the following snippet to figure out the number of levels for the split.... Jeegn Chen
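The quoted snippet is elided above, but the kind of calculation it performs can be illustrated generically (this is not the actual HashIndex code): choose enough 4-bit hash levels that the expected objects per leaf directory stay under the per-directory limit.

    #include <cstdint>
    #include <iostream>

    // Illustrative only, not HashIndex::pre_split_folder(): each extra level
    // fans out into 16 subdirectories (one hex nibble of the object hash).
    int split_levels(uint64_t expected_objects, uint64_t max_per_dir) {
      int levels = 0;
      while (expected_objects > max_per_dir) {
        expected_objects = (expected_objects + 15) / 16;  // objects per subdir, rounded up
        ++levels;
      }
      return levels;
    }

    int main() {
      // e.g. 1M objects with a 320-object dir limit needs 3 levels (16^3 = 4096 leaves)
      std::cout << split_levels(1000000, 320) << '\n';  // prints 3
    }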

04/18/2019

10:04 PM Backport #39389 (In Progress): nautilus: Too much log output generated from PrimaryLogPG::do_bac...
David Zafman
10:02 PM Backport #39389 (Resolved): nautilus: Too much log output generated from PrimaryLogPG::do_backfi...
https://github.com/ceph/ceph/pull/27687 David Zafman
09:53 PM Bug #39383 (Pending Backport): Too much log output generated from PrimaryLogPG::do_backfill()
David Zafman
09:06 PM Bug #39383 (In Progress): Too much log output generated from PrimaryLogPG::do_backfill()
David Zafman
02:55 PM Bug #39383 (Resolved): Too much log output generated from PrimaryLogPG::do_backfill()

Caused by 834d3c19a77
David Zafman
12:30 PM Bug #39054: osd push failed because local copy is 4394'133607637
Greg Farnum wrote:
> As Jewel is an outdated release and you ran the potentially-destructive repair tools, you'll ha...
yite gu
12:24 PM Bug #39054: osd push failed because local copy is 4394'133607637
thank you yite gu
09:44 AM Backport #39381 (Rejected): luminous: src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
Nathan Cutler
09:38 AM Feature #39066 (Pending Backport): src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
Nathan Cutler
09:36 AM Backport #38873 (In Progress): luminous: Rados.get_fsid() returning bytes in python3
Nathan Cutler
09:35 AM Backport #38872 (Resolved): mimic: Rados.get_fsid() returning bytes in python3
Nathan Cutler
09:26 AM Bug #38992 (Resolved): unable to link rocksdb library if use system rocksdb
Nathan Cutler
09:26 AM Backport #38993 (Resolved): nautilus: unable to link rocksdb library if use system rocksdb
Nathan Cutler
09:24 AM Backport #39325 (Resolved): nautilus: ceph-objectstore-tool rename dump-import to dump-export
Nathan Cutler
09:24 AM Backport #39310 (Resolved): nautilus: crushtool crash on Fedora 28 and newer
Nathan Cutler
09:19 AM Backport #39375 (Resolved): nautilus: ceph tell osd.xx bench help : gives wrong help
https://github.com/ceph/ceph/pull/28035 Nathan Cutler
09:19 AM Backport #39374 (Resolved): mimic: ceph tell osd.xx bench help : gives wrong help
https://github.com/ceph/ceph/pull/28097 Nathan Cutler
09:19 AM Backport #39373 (Resolved): luminous: ceph tell osd.xx bench help : gives wrong help
https://github.com/ceph/ceph/pull/28112 Nathan Cutler
06:41 AM Bug #19753: Deny reservation if expected backfill size would put us over backfill_full_ratio
Would save a lot of disk space if you could fix that :) imirc tw
05:38 AM Bug #39282: EIO from process_copy_chunk_manifest
https://github.com/ceph/ceph/pull/27667 Myoungwon Oh
04:46 AM Bug #38846 (Fix Under Review): dump_pgstate_history doesn't really produce useful json output, ne...
Brad Hubbard
03:27 AM Bug #39154 (In Progress): Don't mark removed osds in when running "ceph osd in any|all|*"
Brad Hubbard
02:08 AM Bug #39154 (Fix Under Review): Don't mark removed osds in when running "ceph osd in any|all|*"
Brad Hubbard
02:46 AM Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd

The fix checks for a down OSD when all PGs aren't active+clean, and doesn't trust num_pgs, which is 0 after marking a d...
David Zafman
12:47 AM Support #39319: Every 15 min - Monitor daemon marked osd.x down, but it is still running
Turn up debug_ms to 5 maybe. It's very likely you need to look more closely at your network. Brad Hubbard

04/17/2019

11:33 PM Feature #39066: src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
merged https://github.com/ceph/ceph/pull/27228 Yuri Weinstein
11:32 PM Backport #38872: mimic: Rados.get_fsid() returning bytes in python3
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27259
merged
Yuri Weinstein
10:25 PM Bug #39006 (Pending Backport): ceph tell osd.xx bench help : gives wrong help
Neha Ojha
10:24 PM Bug #39306 (Rejected): ceph config: impossible to set osd_scrub_chunk_max
OK, can you open two new trackers then please. One for each specific problem? Brad Hubbard
12:24 PM Bug #39306: ceph config: impossible to set osd_scrub_chunk_max
Yes! I have discovered TWO problems:
1. Problem with _min_: ceph config set osd osd_scrub_chunk_min = 1 WORKS (!) ...
Марк Коренберг
10:06 PM Bug #19753: Deny reservation if expected backfill size would put us over backfill_full_ratio
A couple of dout(0) calls should be dout(20), or dout(10) for some of the less repetitive ones. David Zafman
08:26 PM Bug #19753: Deny reservation if expected backfill size would put us over backfill_full_ratio
During backfilling after a failed disk, the log files get spammed with do_backfill messages. Log files easily grow bey... imirc tw
10:03 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
(not surprisingly, MON_DOWN is in the ceph.log too, and the run would have failed with that had it not failed for som... Sage Weil
09:53 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
mon.c is failing to connect to mon.a:... Sage Weil
10:00 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
I could grab you the debug logs, but that could take a while. Which knobs do you want me to turn up?
This is what...
Bryan Stillwell
09:47 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
The reason the OSDs are rebooting is that we're applying the latest OS updates for CentOS, so it should be a proper s... Bryan Stillwell
09:28 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Or maybe I misread that; is the claim that an OSD reboots, *then* a delete happens, and then later on you discover th... Greg Farnum
09:27 PM Bug #39175: RGW DELETE calls partially missed shortly after OSD startup
Is there any chance of getting good debug logs of the event *while* it happens (ie, not just after scrub detects the ... Greg Farnum
09:42 PM Bug #39307: EC pools with m=1 are created with an unsafe min_size by default
Yeah, changing the default ec profile also works Paul Emmerich
09:19 PM Bug #39307: EC pools with m=1 are created with an unsafe min_size by default
see https://github.com/ceph/ceph/pull/27656 ? Sage Weil
09:15 PM Bug #39307 (Won't Fix): EC pools with m=1 are created with an unsafe min_size by default
This was a deliberate choice. https://github.com/ceph/ceph/pull/26894 made the change, based on a discussion on anot... Sage Weil
09:14 PM Bug #39307: EC pools with m=1 are created with an unsafe min_size by default
Hmm, is this a default EC mode or just something we let users set?
The change was deliberate in PR https://github....
Greg Farnum
09:22 PM Bug #39249 (Closed): Some PGs stuck in active+remapped state
This looks like CRUSH's fault. Can you check which tunables you are running? (ceph osd crush show-tunables)
Using ...
Sage Weil
09:08 PM Bug #39263 (Fix Under Review): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) ...
https://github.com/ceph/ceph/pull/27622 Sage Weil
09:01 PM Bug #39286 (Fix Under Review): primary recovery local missing object did not update obc
https://github.com/ceph/ceph/pull/27575 Neha Ojha
08:04 PM Backport #38993: nautilus: unable to link rocksdb library if use system rocksdb
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/27601
merged
Yuri Weinstein
08:02 PM Backport #39325: nautilus: ceph-objectstore-tool rename dump-import to dump-export
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27610
merged
Yuri Weinstein
08:01 PM Backport #39310: nautilus: crushtool crash on Fedora 28 and newer
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27620
merged
Yuri Weinstein
07:55 PM Bug #39366 (Can't reproduce): ClsLock.TestRenew failure
... Sage Weil
07:49 PM Feature #39339: prioritize backfill of metadata pools, automatically

I forgot that it is possible that backfill/recovery could be moving data around for several reasons. In those case...
David Zafman
06:50 PM Feature #39339: prioritize backfill of metadata pools, automatically

Recovery is also about restoring objects to the right level of replication. Because the log is known to represent a...
David Zafman
03:54 PM Feature #39339: prioritize backfill of metadata pools, automatically
Also, this ceph command requires the operator to do it; the point of the tracker is that this should be default behav... Ben England
03:38 PM Feature #39339: prioritize backfill of metadata pools, automatically
Is backfill any different from recovery priority? If not, should it be? By "backfill" I mean the emergency situatio... Ben England
02:06 PM Feature #39339: prioritize backfill of metadata pools, automatically
ceph osd pool set <pool> recovery_priority <value>
I think a value of 1 or 2 makes sense (default if unset is 0).
Sage Weil
01:59 AM Feature #39339 (In Progress): prioritize backfill of metadata pools, automatically
Neha Ojha suggested filing this feature request.
One relatively easy way to minimize damage in a double-failure sc...
Ben England
05:18 PM Backport #38880: luminous: ENOENT in collection_move_rename on EC backfill target
This backport does not require the third commit https://github.com/ceph/ceph/pull/26996/commits/71996da6be171cd310f8c... Neha Ojha
05:17 PM Backport #38881 (In Progress): nautilus: ENOENT in collection_move_rename on EC backfill target
https://github.com/ceph/ceph/pull/27654 Neha Ojha
03:13 PM Feature #39362 (New): ignore osd_max_scrubs for forced repair
On clusters with quite full PGs, it is common (i.e. ~100% sure) that a `ceph pg repair <pgid>` does not start immedia... Dan van der Ster
12:56 PM Bug #39353 (Fix Under Review): Error message displayed when mon_osd_max_split_count would be exce...
Nathan Cutler
12:36 PM Bug #39353 (Resolved): Error message displayed when mon_osd_max_split_count would be exceeded is ...
Under certain circumstances, an attempt to increase the PG count of a pool can fail like this:... Nathan Cutler
06:58 AM Bug #24531: Mimic MONs have slow/long running ops
We had this happen twice this week on a v13.2.5 cluster. (The cluster was recently upgraded from v12.2.11, where this... Dan van der Ster
06:19 AM Backport #39343 (In Progress): luminous: ceph-objectstore-tool rename dump-import to dump-export
Nathan Cutler
06:13 AM Backport #39343: luminous: ceph-objectstore-tool rename dump-import to dump-export
Backporting note: cherry-pick 96861a8116242bdef487087348c24c97723dfafc only Nathan Cutler
06:07 AM Backport #39343 (Resolved): luminous: ceph-objectstore-tool rename dump-import to dump-export
https://github.com/ceph/ceph/pull/27636 Nathan Cutler
06:16 AM Backport #39342 (In Progress): mimic: ceph-objectstore-tool rename dump-import to dump-export
Nathan Cutler
06:13 AM Backport #39342: mimic: ceph-objectstore-tool rename dump-import to dump-export
Backporting note: cherry-pick 96861a8116242bdef487087348c24c97723dfafc only Nathan Cutler
06:07 AM Backport #39342 (Resolved): mimic: ceph-objectstore-tool rename dump-import to dump-export
https://github.com/ceph/ceph/pull/27635 Nathan Cutler
04:45 AM Backport #39043 (In Progress): nautilus: osd/PGLog: preserve original_crt to check rollbackability
https://github.com/ceph/ceph/pull/27632 Prashant D
03:21 AM Backport #39044 (In Progress): mimic: osd/PGLog: preserve original_crt to check rollbackability
https://github.com/ceph/ceph/pull/27629 Prashant D

04/16/2019

11:31 PM Backport #38566 (Resolved): mimic: osd_recovery_priority is not documented (but osd_recovery_op_p...
David Zafman
11:10 PM Bug #39281 (Resolved): object_stat_sum_t decode broken if given older version
David Zafman
11:08 PM Bug #39281 (Pending Backport): object_stat_sum_t decode broken if given older version
David Zafman
02:41 PM Bug #39281 (Resolved): object_stat_sum_t decode broken if given older version
Nathan Cutler
02:41 PM Bug #39281 (Pending Backport): object_stat_sum_t decode broken if given older version
Nathan Cutler
10:32 AM Bug #39281 (Fix Under Review): object_stat_sum_t decode broken if given older version
Nathan Cutler
10:32 AM Bug #39281 (Pending Backport): object_stat_sum_t decode broken if given older version
Nathan Cutler
10:58 PM Bug #39306 (Need More Info): ceph config: impossible to set osd_scrub_chunk_max
... Brad Hubbard
05:25 AM Bug #39306 (Rejected): ceph config: impossible to set osd_scrub_chunk_max
... Марк Коренберг
08:42 PM Bug #39336 (Duplicate): "*** Caught signal (Bus error) **" in upgrade:luminous-x-mimic
Run: http://pulpito.ceph.com/teuthology-2019-04-16_02:25:02-upgrade:luminous-x-mimic-distro-basic-smithi/
Job: 38528...
Yuri Weinstein
07:28 PM Backport #39310 (In Progress): nautilus: crushtool crash on Fedora 28 and newer
https://github.com/ceph/ceph/pull/27620 Neha Ojha
08:00 AM Backport #39310 (Resolved): nautilus: crushtool crash on Fedora 28 and newer
https://github.com/ceph/ceph/pull/27620 Nathan Cutler
06:18 PM Bug #39333 (Resolved): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()

sage-2019-04-16_13:58:36-rados-wip-sage-testing-2019-04-15-0844-distro-basic-smithi/3853774
The final PGs looked...
David Zafman
04:53 PM Bug #39330 (New): recovery transfer rate not correct
When running all OSDs inside a QEMU VM (with real disks attached through virtio-scsi), the gathered recovery statisti... Jonas Jelten
03:29 PM Bug #39249: Some PGs stuck in active+remapped state
I've not tried changing reweights to 1, though last week I ran "ceph osd reweight-by-utilization 110"
Cluster is ...
Jake Grimmett
02:58 PM Backport #39325 (In Progress): nautilus: ceph-objectstore-tool rename dump-import to dump-export
Nathan Cutler
02:40 PM Backport #39325 (Resolved): nautilus: ceph-objectstore-tool rename dump-import to dump-export
https://github.com/ceph/ceph/pull/27610 Nathan Cutler
02:40 PM Bug #39284 (Pending Backport): ceph-objectstore-tool rename dump-import to dump-export
Nathan Cutler
10:34 AM Bug #39284: ceph-objectstore-tool rename dump-import to dump-export
Backporting note: cherry-pick 96861a8116242bdef487087348c24c97723dfafc only (the PR#27564 includes another commit tha... Nathan Cutler
10:53 AM Bug #38786 (Resolved): autoscale down can lead to max_pg_per_osd limit
Nathan Cutler
10:53 AM Backport #39271 (Resolved): nautilus: autoscale down can lead to max_pg_per_osd limit
Nathan Cutler
10:52 AM Backport #39275 (Resolved): nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nathan Cutler
10:40 AM Bug #39055: OSD's crash when specific PG is trying to backfill
Hi Greg,
Thanks for getting back.
After a while, I resorted to creating a new pool and migrated all the data of...
Alex Tijhuis
10:33 AM Backport #39320 (Resolved): nautilus: object_stat_sum_t decode broken if given older version
Nathan Cutler
10:32 AM Backport #39320 (Resolved): nautilus: object_stat_sum_t decode broken if given older version
https://github.com/ceph/ceph/pull/27555 Nathan Cutler
10:23 AM Support #39319 (New): Every 15 min - Monitor daemon marked osd.x down, but it is still running
1. Install Ceph (ceph version 13.2.5 mimic (stable)) on 4 nodes (CentOS 7, in a test environment, VMware ESXi 5.5)
f...
Vladimir Savinov
08:00 AM Backport #39311 (Resolved): mimic: crushtool crash on Fedora 28 and newer
https://github.com/ceph/ceph/pull/27986 Nathan Cutler
08:00 AM Backport #39309 (Rejected): luminous: crushtool crash on Fedora 28 and newer
Nathan Cutler
07:45 AM Bug #39307 (Won't Fix): EC pools with m=1 are created with an unsafe min_size by default
Creating an EC pool with m=1 on 14.2.0 defaults to a min_size of k, e.g. min_size of 2 for a 2+1 pool.
Older version...
Paul Emmerich
02:10 AM Backport #38993 (In Progress): nautilus: unable to link rocksdb library if use system rocksdb
https://github.com/ceph/ceph/pull/27601 Prashant D
01:52 AM Bug #39006 (Fix Under Review): ceph tell osd.xx bench help : gives wrong help
Neha Ojha

04/15/2019

09:19 PM Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd

The message below outputs too many PGs. It counts active + up from pg_count as if the actingset and upset are disj...
David Zafman
08:28 PM Bug #38930 (In Progress): ceph osd safe-to-destroy wrongly approves any out osd
Okay, reproduced this with vstart. When I mark an OSD out, I get... Sage Weil
09:15 PM Bug #39055: OSD's crash when specific PG is trying to backfill
You'll need to gather full debug logs of the crash and as much as possible about the object(s) which the PG is workin... Greg Farnum
09:14 PM Bug #39056: localize-reads does not increment pg stats read count
Yeah, localize_reads has some issues. This is the least of them and would be hard to fix in the current architecture ... Greg Farnum
08:01 PM Backport #39271: nautilus: autoscale down can lead to max_pg_per_osd limit
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27547
merged
Yuri Weinstein
07:59 PM Backport #39275: nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27550
merged
Yuri Weinstein
07:40 PM Bug #39304 (Resolved): short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_a...
Run: http://pulpito.ceph.com/yuriw-2019-04-13_15:18:33-upgrade:nautilus-p2p-wip-yuri6-testing-2019-04-12-1636-nautilu... Yuri Weinstein
06:40 PM Feature #39302 (New): `ceph df` reports misleading information when no ceph-mgr running
When there is no ceph-mgr running, the `ceph df` command reports incorrect (misleading) information. For example, in ... J. Eric Ivancich
03:17 PM Backport #39239 (New): luminous: "sudo yum -y install python34-cephfs" fails on mimic
Nathan Cutler
03:16 PM Backport #39239 (In Progress): luminous: "sudo yum -y install python34-cephfs" fails on mimic
Nathan Cutler
01:21 PM Bug #39249: Some PGs stuck in active+remapped state
Exactly the same. In order to heal that I have changed all my reweights to 1. This helped. But anyway, I don't unders... Марк Коренберг
01:18 PM Bug #39249: Some PGs stuck in active+remapped state
We have a Mimic 13.2.5 cluster with a similar-looking problem:
After replacing a failing OSD, the cluster mostly ...
Jake Grimmett

04/14/2019

08:26 PM Bug #39174 (Pending Backport): crushtool crash on Fedora 28 and newer
Sage Weil
08:24 PM Bug #21592: LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
... Sage Weil

04/13/2019

07:11 PM Backport #38904 (Resolved): mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to rol...
Nathan Cutler
04:03 PM Backport #39237 (Resolved): mimic: "sudo yum -y install python34-cephfs" fails on mimic
Nathan Cutler
12:40 PM Bug #39286 (Resolved): primary recovery local missing object did not update obc
If not, the snapset in the local obc may be inconsistent, and then make_writeable()
will make mistakes.
Zengran Zhang

04/12/2019

09:48 PM Bug #39263: rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) Shutting down becau...
/a/nojha-2019-04-11_19:53:24-rados-wip-parial-recovery-2019-04-11-distro-basic-smithi/3834700/ Neha Ojha
08:23 PM Backport #38904: mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/27284
merged
Yuri Weinstein
08:11 PM Bug #39284 (In Progress): ceph-objectstore-tool rename dump-import to dump-export
David Zafman
07:01 PM Bug #39284 (Resolved): ceph-objectstore-tool rename dump-import to dump-export

dump-import is a stupid name for this command.
Treat dump-import as an undocumented synonym for dump-export.
David Zafman
06:40 PM Bug #39281 (In Progress): object_stat_sum_t decode broken if given older version
David Zafman
04:58 PM Bug #39281 (Resolved): object_stat_sum_t decode broken if given older version

When the encode/decode for object_stat_sum_t went from version 19 to 20, the fast path wasn't updated....
David Zafman
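The shape of the problem, as an illustrative sketch with invented fields (not the real object_stat_sum_t): versioned decoders often keep a memcpy fast path gated on the current struct version, and bumping the encoder's version without updating that gate leaves the fast path keyed to the old layout.

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Illustrative only: a stat struct whose encoding grew from v19 to v20.
    struct stat_sum_t {
      int64_t num_objects = 0;
      int64_t num_bytes = 0;
      int64_t num_omap_keys = 0;  // new field added in v20

      void decode(uint8_t struct_v, const std::vector<char> &buf) {
        if (struct_v == 20 && buf.size() >= sizeof(*this)) {
          // Fast path: blit the whole current layout at once. The bug class:
          // leaving this gate matching the old version after the bump lets
          // old-format buffers be blitted against the new, larger layout.
          std::memcpy(this, buf.data(), sizeof(*this));
          return;
        }
        // Slow path: field-by-field decode for older versions (elided here).
      }
    };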
05:46 PM Bug #39282 (Resolved): EIO from process_copy_chunk_manifest
... Sage Weil
03:14 PM Backport #38901 (Resolved): mimic: Minor rados related documentation fixes
Nathan Cutler
03:00 PM Backport #39237: mimic: "sudo yum -y install python34-cephfs" fails on mimic
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/27476
merged
Yuri Weinstein
01:10 PM Bug #39249: Some PGs stuck in active+remapped state
@Mark: Which version of Mimic are you running? Nathan Cutler
01:04 PM Bug #39249: Some PGs stuck in active+remapped state
... Марк Коренберг
01:04 PM Bug #39249: Some PGs stuck in active+remapped state
#3747 ?
Марк Коренберг
12:26 PM Backport #38442 (In Progress): luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nathan Cutler
12:21 PM Backport #39275 (In Progress): nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nathan Cutler
12:04 PM Backport #39275 (Resolved): nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
https://github.com/ceph/ceph/pull/27550 Nathan Cutler
12:09 PM Backport #39271 (In Progress): nautilus: autoscale down can lead to max_pg_per_osd limit
Nathan Cutler
12:03 PM Backport #39271 (Resolved): nautilus: autoscale down can lead to max_pg_per_osd limit
https://github.com/ceph/ceph/pull/27547 Nathan Cutler
11:57 AM Bug #38786 (Pending Backport): autoscale down can lead to max_pg_per_osd limit
Sage Weil
11:55 AM Bug #38359 (Pending Backport): osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Sage Weil
09:53 AM Bug #39159 (Fix Under Review): qa: Fix ambiguous store_thrash thrash_store in mon_thrash.py
Jos Collin
04:32 AM Bug #39099: Give recovery for inactive PGs a higher priority
Checking acting.size() < pool.info.min_size is wrong. During recovery acting == up. So if acting.size() < pool.info... David Zafman

04/11/2019

08:40 PM Bug #38840 (In Progress): snaps missing in mapper, should be: ca was r -2...repaired
David Zafman
07:14 PM Bug #39263 (Resolved): rados/upgrade/nautilus-x-singleton: mon.c@1(electing).elector(11) Shutting...
... Neha Ojha
04:50 PM Bug #21388 (Duplicate): inconsistent pg but repair does nothing reporting head data_digest != dat...
This was merged to master Jul 31, 2018 in https://github.com/ceph/ceph/pull/23217 for a different tracker. David Zafman
04:32 PM Bug #39099 (In Progress): Give recovery for inactive PGs a higher priority
David Zafman
12:36 PM Bug #39249: Some PGs stuck in active+remapped state
OSD.11 previously took part in this PG. I don't know whether as primary or not. The bug happened after I made `ceph os... Марк Коренберг
12:35 PM Bug #39249: Some PGs stuck in active+remapped state
... Марк Коренберг
12:23 PM Bug #39249: Some PGs stuck in active+remapped state
... Марк Коренберг
12:22 PM Bug #39249: Some PGs stuck in active+remapped state
... Марк Коренберг
12:22 PM Bug #39249 (Closed): Some PGs stuck in active+remapped state
Sometimes my PGs stuck in this state. When I stop primary OSD containig this PG, it becomes `active+undersized+degrad... Марк Коренберг
12:14 PM Feature #39248 (New): Add ability to limit number of simultaneously backfilling PGs
I want to reduce the effect of `ceph osd out osd.xxx`. I already set
--osd-recovery-max-active 1
--osd-max-backfills ...
Марк Коренберг
11:46 AM Bug #38783: Changing mon_pg_warn_max_object_skew has no effect.
Injecting into mgr has solved the issue, thanks! Andrew Mitroshin
11:07 AM Backport #39239: luminous: "sudo yum -y install python34-cephfs" fails on mimic
Note to myself or anyone who wants to backport this change to luminous: you need to blacklist the python36 package wh... Nathan Cutler
10:59 AM Backport #39239 (Resolved): luminous: "sudo yum -y install python34-cephfs" fails on mimic
https://github.com/ceph/ceph/pull/28493 Nathan Cutler
10:59 AM Bug #39164 (Pending Backport): "sudo yum -y install python34-cephfs" fails on mimic
Nathan Cutler
10:54 AM Backport #39236 (In Progress): nautilus: "sudo yum -y install python34-cephfs" fails on mimic
Nathan Cutler
02:46 AM Backport #39236: nautilus: "sudo yum -y install python34-cephfs" fails on mimic
https://github.com/ceph/ceph/pull/27505 Kefu Chai
02:44 AM Backport #39236 (Resolved): nautilus: "sudo yum -y install python34-cephfs" fails on mimic
https://github.com/ceph/ceph/pull/27505 Kefu Chai
07:56 AM Bug #39174 (In Progress): crushtool crash on Fedora 28 and newer
Brad Hubbard
07:10 AM Bug #39174 (Fix Under Review): crushtool crash on Fedora 28 and newer
Brad Hubbard
06:02 AM Bug #39174: crushtool crash on Fedora 28 and newer
https://bugzilla.redhat.com/show_bug.cgi?id=1515858 Brad Hubbard
04:36 AM Bug #39174: crushtool crash on Fedora 28 and newer
Turning up verbosity gives clues to what might be the problem.... Brad Hubbard
02:31 AM Bug #39174: crushtool crash on Fedora 28 and newer
Brad Hubbard
02:30 AM Bug #39174: crushtool crash on Fedora 28 and newer
Vasu Kulkarni wrote:
> very good reason to drop one distro in teuthology and replace it with fedora 28, I think Brad...
Brad Hubbard
02:47 AM Backport #39237 (In Progress): mimic: "sudo yum -y install python34-cephfs" fails on mimic
Kefu Chai
02:47 AM Backport #39237 (Resolved): mimic: "sudo yum -y install python34-cephfs" fails on mimic
https://github.com/ceph/ceph/pull/27476 Kefu Chai

04/10/2019

11:51 PM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
Ah, that's because a jewel osd does not know how to deal with this REJECT in the Started/ReplicaActive/RepNotRecoveri... Neha Ojha
02:22 AM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
Fails in 1 out of 20 runs http://pulpito.ceph.com/nojha-2019-04-09_17:54:07-rados:upgrade:jewel-x-singleton-luminous-... Neha Ojha
11:46 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
mon.c timeline:
2019-04-06 08:58:28.846 hits a lease timeout and triggers the election process
2019-04-06 08:58:28....
Greg Farnum
10:03 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
Greg Farnum wrote:
> The monitor was out of quorum for 30 minutes; it probably has to do with holding on to client c...
Patrick Donnelly
09:59 PM Bug #39150: mon: "FAILED ceph_assert(session_map.sessions.empty())" when out of quorum
The monitor was out of quorum for 30 minutes; it probably has to do with holding on to client connections or else not... Greg Farnum
10:19 PM Backport #38720 (Resolved): mimic: crush: choose_args array size mis-sized when weight-sets are e...
Nathan Cutler
10:18 PM Bug #38826 (Resolved): upmap broken the crush rule
Nathan Cutler
10:18 PM Backport #38858 (Resolved): mimic: upmap broken the crush rule
Nathan Cutler
09:48 PM Bug #39085 (Resolved): monmap created timestamp may be blank
Sage Weil
09:12 PM Bug #39085 (Pending Backport): monmap created timestamp may be blank
Neha Ojha
09:45 PM Bug #38359 (Fix Under Review): osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Sage Weil
09:45 PM Bug #38359: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nope, that didn't fix it:
/a/sage-2019-04-10_15:25:57-rados-wip-sage4-testing-2019-04-10-0709-distro-basic-smithi/3...
Sage Weil
09:36 PM Bug #38930: ceph osd safe-to-destroy wrongly approves any out osd
Hmm, maybe the pg_map is purged of any OSD marked out? Although you can have up OSDs that are out so that shouldn't b... Greg Farnum
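A minimal sketch of the reported behavior, assuming a disposable OSD with id 3:

   ceph osd out 3
   ceph osd safe-to-destroy 3    # per this report, approves any out OSD regardless of PG state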
09:30 PM Bug #39174: crushtool crash on Fedora 28 and newer
Very good reason to drop one distro in teuthology and replace it with Fedora 28; I think Brad brought this up long ti... Vasu Kulkarni
08:30 PM Bug #39174 (Resolved): crushtool crash on Fedora 28 and newer
On Fedora 29, Fedora 30, and RHEL 8, /usr/bin/crushtool crashes when trying to compile the map that Rook uses.
<pr...
Ken Dreyer
09:28 PM Bug #39054 (Closed): osd push failed because local copy is 4394'133607637
As Jewel is an outdated release and you ran the potentially-destructive repair tools, you'll have better luck taking ... Greg Farnum
09:16 PM Backport #38904 (In Progress): mimic: osd/PGLog.h: print olog_can_rollback_to before deciding to ...
Nathan Cutler
09:16 PM Backport #38906 (Resolved): nautilus: osd/PGLog.h: print olog_can_rollback_to before deciding to ...
Nathan Cutler
09:14 PM Bug #39039: mon connection reset, command not resent
So it's not the command specifically but that the client doesn't reconnect to a working monitor, right? Greg Farnum
09:10 PM Backport #38442 (Resolved): luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Nathan Cutler
09:07 PM Backport #39220 (Resolved): mimic: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_miss...
https://github.com/ceph/ceph/pull/27940 Nathan Cutler
09:07 PM Bug #36598 (Can't reproduce): osd: "bluestore(/var/lib/ceph/osd/ceph-6) ENOENT on clone suggests ...
This has not shown up recently, so maybe this got resolved as a result of http://tracker.ceph.com/issues/36739 being ... Neha Ojha
09:07 PM Backport #39219 (Resolved): nautilus: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
https://github.com/ceph/ceph/pull/27839 Nathan Cutler
09:07 PM Backport #39218 (Resolved): luminous: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_m...
https://github.com/ceph/ceph/pull/27878 Nathan Cutler
09:05 PM Backport #39206 (Resolved): mimic: osd: leaked pg refs on shutdown
https://github.com/ceph/ceph/pull/27938 Nathan Cutler
09:05 PM Backport #39205 (Resolved): nautilus: osd: leaked pg refs on shutdown
https://github.com/ceph/ceph/pull/27803 Nathan Cutler
09:05 PM Backport #39204 (Resolved): luminous: osd: leaked pg refs on shutdown
https://github.com/ceph/ceph/pull/27810 Nathan Cutler
09:01 PM Bug #39175 (Resolved): RGW DELETE calls partially missed shortly after OSD startup
We have two separate clusters (physically 2,000+ miles apart) that are seeing
PGs going inconsistent while doing reb...
Bryan Stillwell
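For anyone triaging the same symptom, the inconsistent PGs and the objects behind them can be enumerated like this (the pgid is a placeholder):

   ceph health detail                                        # names the inconsistent PGs
   rados list-inconsistent-obj 1.2f --format=json-pretty     # per-object scrub errors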
04:06 PM Feature #39162 (In Progress): Improvements to standalone tests.
David Zafman
05:58 AM Bug #38892: /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation fault
See https://github.com/ceph/ceph/pull/27479 for a viable workaround. Note that this is a bug in gcc7 [1] and the pref... Brad Hubbard
04:46 AM Backport #38567 (In Progress): luminous: osd_recovery_priority is not documented (but osd_recover...
Nathan Cutler
04:16 AM Bug #39164: "sudo yum -y install python34-cephfs" fails on mimic
Note to myself or anyone who wants to backport this change to luminous: you need to blacklist the python36 package wh... Kefu Chai
04:13 AM Bug #39164 (Fix Under Review): "sudo yum -y install python34-cephfs" fails on mimic
Kefu Chai
03:24 AM Bug #39164 (Resolved): "sudo yum -y install python34-cephfs" fails on mimic
see http://pulpito.ceph.com/yuriw-2019-04-09_19:20:36-multimds-wip-yuri3-testing-2019-04-08-2038-mimic-testing-basic-... Kefu Chai
03:56 AM Bug #38582: Pool storage MAX AVAIL reduction seems higher when single OSD reweight is done
Correction in the description.
It looks like the pool's MAX AVAIL value had dropped after there was a hard disk fail...
Nokia ceph-users
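For reference, the figure in question is the per-pool MAX AVAIL column of:

   ceph df detail

It is derived from the fullest OSD that the pool's CRUSH rule can map to, which is why losing or reweighting a single disk can shift it sharply.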

04/09/2019

10:22 PM Bug #38724 (Need More Info): _txc_add_transaction error (39) Directory not empty not handled on o...
logging level isn't high enough to tell what data is in this pg. :( Sage Weil
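If the problem recurs, the relevant subsystems can be turned up at runtime before capturing logs; a sketch, with the OSD id as a placeholder:

   ceph tell osd.12 injectargs '--debug-osd 20 --debug-bluestore 20'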
10:17 PM Bug #38786 (Fix Under Review): autoscale down can lead to max_pg_per_osd limit
https://github.com/ceph/ceph/pull/27473 Sage Weil
09:21 PM Feature #39162 (Resolved): Improvements to standalone tests.

Now that OSDs default to bluestore, we need to fix the use of run_osd(). We should replace run_osd_bluestore() with r...
David Zafman
08:29 PM Backport #38567: luminous: osd_recovery_priority is not documented (but osd_recovery_op_priority is)
https://github.com/ceph/ceph/pull/27471 Neha Ojha
02:54 PM Bug #39145: luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine event")
From the osd log including the thread before the crash.... David Zafman
02:36 PM Bug #38219 (Fix Under Review): rebuild-mondb hangs
Kefu Chai
12:25 PM Bug #39159 (Resolved): qa: Fix ambiguous store_thrash thrash_store in mon_thrash.py
Both store_thrash and thrash_store names are used for the same thing in mon_thrash.py. 'thrash_store' is used here: h... Jos Collin
08:13 AM Bug #39154 (Resolved): Don't mark removed osds in when running "ceph osd in any|all|*"
To reproduce.... Brad Hubbard
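The reproduction steps are truncated above; going only by the title, a plausible sketch (OSD id 0 is a placeholder, and the last command is the reported misbehavior):

   ceph osd out 0
   ceph osd rm 0        # remove the OSD from the map
   ceph osd in all      # reportedly marks the removed id in again
   ceph osd tree        # inspect the result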
01:47 AM Bug #23030 (Fix Under Review): osd: crash during recovery with assert(p != recovery_info.ss.clone...
https://github.com/ceph/ceph/pull/27273 Neha Ojha
01:04 AM Bug #39152 (Duplicate): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
OSD continuously crashed
-1> 2019-04-08 17:47:06.615 7f3f3ef62700 -1 /build/ceph-14.2.0/src/os/bluestore/Bl...
Wen Wei
 
