Activity

From 07/31/2019 to 08/29/2019

08/29/2019

11:16 PM Bug #23647: thrash-eio test can prevent recovery
Several proposals that might improve things:
* from Josh, just turn down the odds
* from Greg, is it plausible to...
Greg Farnum
09:24 PM Bug #41385: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(fromshard))
Here's the chain of events that causes this:
Two objects go missing on the primary, and we want to recover them fr...
Neha Ojha
06:54 PM Bug #41577: Erasure-Coded storage in bluestore has larger disk usage than expected
The issue of small objects using more space seems related to https://tracker.ceph.com/issues/41417
Yan Zhao
06:53 PM Bug #41577 (New): Erasure-Coded storage in bluestore has larger disk usage than expected
The test was done on Ceph 14.2.1.
We've tested Erasure Coded storage with the same amount of data, which is 800 GiB....
Yan Zhao
05:55 PM Bug #41429 (Fix Under Review): Incorrect logical operator in Monitor::handle_auth_request()
Neha Ojha
03:36 PM Bug #41526 (Rejected): Choosing the next PG for a deep scrubs wrong.
David Zafman
02:43 PM Bug #37775 (Resolved): some pg_created messages not sent to mon
Nathan Cutler
02:40 PM Bug #41517: Missing head object at primary with snapshots crashes primary
Backporting note:
cherry pick https://github.com/ceph/ceph/pull/27575 first, and then https://github.com/ceph/ceph...
Nathan Cutler
02:39 PM Bug #39286: primary recovery local missing object did not update obc
Backports to luminous, mimic, and nautilus are being handled via #41517 Nathan Cutler
02:38 PM Bug #39286 (Resolved): primary recovery local missing object did not update obc
Since this introduced a regression in master, I propose to refrain from backporting it separately, but instead backpo... Nathan Cutler
12:35 PM Backport #41568 (In Progress): nautilus: doc: pg_num should always be a power of two
Nathan Cutler
08:14 AM Backport #41568 (Resolved): nautilus: doc: pg_num should always be a power of two
https://github.com/ceph/ceph/pull/30004 Nathan Cutler
12:29 PM Backport #41529 (In Progress): nautilus: doc: mon_health_to_clog_* values flipped
Nathan Cutler
12:26 PM Bug #39152 (New): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
This is problematic to backport because the "Pull request ID" field is not populated and none of the notes mention a ... Nathan Cutler
11:28 AM Backport #41503 (In Progress): nautilus: Warning about past_interval bounds on deleting pg
Nathan Cutler
11:21 AM Backport #41501 (In Progress): nautilus: backfill_toofull while OSDs are not full (Unneccessary H...
Nathan Cutler
11:17 AM Backport #41491 (In Progress): nautilus: OSDCap.PoolClassRNS test aborts
Nathan Cutler
11:15 AM Backport #41455: nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup memory...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29745
m...
Nathan Cutler
11:14 AM Backport #41455 (Resolved): nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cg...
Nathan Cutler
10:47 AM Backport #41453 (In Progress): nautilus: mon: C_AckMarkedDown has not handled the Callback Arguments
Nathan Cutler
10:24 AM Backport #41448 (In Progress): nautilus: osd/PrimaryLogPG: Access destroyed references in finish_...
Nathan Cutler
10:20 AM Backport #40889 (Need More Info): luminous: Pool settings aren't populated to OSD after restart.
non-trivial backport Nathan Cutler
10:20 AM Backport #40890 (Need More Info): mimic: Pool settings aren't populated to OSD after restart.
non-trivial backport Nathan Cutler
10:20 AM Backport #40891 (Need More Info): nautilus: Pool settings aren't populated to OSD after restart.
non-trivial backport Nathan Cutler
10:10 AM Bug #40112 (Resolved): mon: rados/multimon tests fail with clock skew
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
10:09 AM Backport #40228 (Resolved): nautilus: mon: rados/multimon tests fail with clock skew
backport PR https://github.com/ceph/ceph/pull/28576
merge commit 1bc3cc4aa2588bef0acadcf6ba2703df0312b9b4 (v14.2.2-2...
Nathan Cutler
10:03 AM Backport #40084 (In Progress): nautilus: osd: Better error message when OSD count is less than os...
Nathan Cutler
09:56 AM Backport #39700 (In Progress): nautilus: [RFE] If the nodeep-scrub/noscrub flags are set in pools...
Nathan Cutler
09:31 AM Bug #41255 (Pending Backport): backfill_toofull seen on cluster where the most full OSD is at 1%
Kefu Chai
08:59 AM Backport #39682 (In Progress): nautilus: filestore pre-split may not split enough directories
Nathan Cutler
08:50 AM Backport #39517 (In Progress): nautilus: Improvements to standalone tests.
Nathan Cutler
03:51 AM Bug #38155 (Duplicate): PG stuck in undersized+degraded+remapped+backfill_toofull+peered
I'm assuming that the fix for 24452 also fixed this issue. So marking duplicate. David Zafman
03:27 AM Bug #39115 (Duplicate): ceph pg repair doesn't fix itself if osd is bluestore

OSD crashes are the underlying issue here and we can't say anything about repair until there aren't any more crashes.
David Zafman
03:09 AM Documentation #41004 (Pending Backport): doc: pg_num should always be a power of two
Neha Ojha
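
As an illustration of the rule in this documentation issue's title (pg_num should always be a power of two), here is a minimal sketch of how a candidate value could be checked. The helper names `is_power_of_two` and `nearest_power_of_two` are hypothetical and are not part of Ceph. Rounding to the nearest power is an assumption made for the example, not a recommendation taken from the Ceph documentation.

```python
def is_power_of_two(n):
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to 0.
    return n > 0 and (n & (n - 1)) == 0


def nearest_power_of_two(n):
    """Hypothetical helper: round n to the closest power of two (ties go up)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    lower = 1 << (n.bit_length() - 1)  # largest power of two <= n
    upper = lower << 1                 # smallest power of two > n
    return lower if (n - lower) < (upper - n) else upper


candidate = 100
if not is_power_of_two(candidate):
    print(candidate, "is not a power of two; nearest is",
          nearest_power_of_two(candidate))
```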

08/28/2019

09:20 PM Bug #41313: PG distribution completely messed up since Nautilus
Can you reach out on the ceph-users mailing list to see if others have seen similar issues? We've not seen a specific... Neha Ojha
09:19 PM Bug #40522: on_local_recover doesn't touch?

I see this as a hang when running standalone tests, in particular qa/standalone/osd/divergent-priors.sh. The test han...
David Zafman
09:13 PM Bug #41336 (Resolved): All OSD Faild after Reboot.
Josh Durgin
09:13 PM Bug #41336: All OSD Faild after Reboot.
This is fixed in later versions - the monitor makes sure stripe_unit is a valid value when the pool is created. With ... Josh Durgin
09:12 PM Bug #41336: All OSD Faild after Reboot.
... Neha Ojha
09:03 PM Bug #41526: Choosing the next PG for a deep scrubs wrong.

You never know what scrubs can run with osd_max_scrubs (especially defaulting to 1). Without looking at which...
David Zafman
08:44 PM Bug #41385 (In Progress): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(f...
Neha Ojha
08:36 PM Feature #41564 (Resolved): Issue health status warning if num_shards_repaired exceeds some threshold

Now that num_shards_repaired has been added, we can assist in noticing disk, controller, software or other issues b...
David Zafman
08:27 PM Feature #41563: Add connection reset tracking to Network ping monitoring

Experimental code: https://github.com/dzafman/ceph/tree/wip-network-resets
David Zafman
08:24 PM Feature #41563 (New): Add connection reset tracking to Network ping monitoring

Record connection resets on front and back interfaces and report with ping times
David Zafman
08:25 PM Backport #41341 (In Progress): nautilus: "CMake Error" in test_envlibrados_for_rocksdb.sh
Nathan Cutler
08:20 PM Bug #41517: Missing head object at primary with snapshots crashes primary
This was caused by https://github.com/ceph/ceph/pull/27575 David Zafman
05:26 PM Bug #41517 (In Progress): Missing head object at primary with snapshots crashes primary
David Zafman
06:42 PM Bug #41522 (In Progress): ceph-objectstore-tool can't remove head with bad snapset
David Zafman
06:42 PM Backport #38450 (In Progress): mimic: src/osd/OSDMap.h: 1065: FAILED assert(__null != pool)
David Zafman
06:36 PM Bug #37775: some pg_created messages not sent to mon
This patch does not make sense for mimic and luminous.
@Nathan can we please resolve this issue and close the corre...
Neha Ojha
06:34 PM Bug #36498 (New): failed to recover before timeout expired due to pg stuck in creating+peering
I don't think this is a duplicate of https://tracker.ceph.com/issues/37752 or https://tracker.ceph.com/issues/37775 f... Neha Ojha
06:09 PM Bug #39286: primary recovery local missing object did not update obc
https://tracker.ceph.com/issues/41517 is a follow-on fix for this. Neha Ojha
11:46 AM Bug #41550 (Fix Under Review): os/bluestore: fadvise_flag leak in generate_transaction
Nathan Cutler
08:09 AM Bug #41550: os/bluestore: fadvise_flag leak in generate_transaction
https://github.com/ceph/ceph/pull/29944 Xuehan Xu
08:06 AM Bug #41550 (Resolved): os/bluestore: fadvise_flag leak in generate_transaction
In generate_transaction when creating ceph::os::Transaction, ObjectOperation::BufferUpdate::Write::fadvise_flag is no... Xuehan Xu
11:13 AM Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
... Nathan Cutler
11:11 AM Backport #38567: luminous: osd_recovery_priority is not documented (but osd_recovery_op_priority is)
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27471
m...
Nathan Cutler
10:11 AM Backport #40638: luminous: osd: report omap/data/metadata usage
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28851
m...
Nathan Cutler
07:43 AM Backport #41548 (Resolved): nautilus: monc: send_command to specific down mon breaks other mon msgs
https://github.com/ceph/ceph/pull/31037 Nathan Cutler
07:43 AM Backport #41547 (Rejected): luminous: monc: send_command to specific down mon breaks other mon msgs
Nathan Cutler
07:42 AM Backport #41546 (Rejected): mimic: monc: send_command to specific down mon breaks other mon msgs
Nathan Cutler

08/27/2019

10:23 PM Bug #38416: crc cache should be invalidated when posting preallocated rx buffers
This is causing lots of failures in luminous/mimic, marking it urgent to get the backports expedited. Neha Ojha
09:16 PM Backport #38880: luminous: ENOENT in collection_move_rename on EC backfill target
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28110
m...
Nathan Cutler
09:16 PM Backport #39373: luminous: ceph tell osd.xx bench help : gives wrong help
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28112
m...
Nathan Cutler
09:14 PM Bug #40765 (Duplicate): mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
Brad Hubbard
09:07 PM Backport #38902: luminous: Minor rados related documentation fixes
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27185
m...
Nathan Cutler
08:18 PM Backport #41532 (In Progress): luminous: Move bluefs alloc size initialization log message to log...
Nathan Cutler
08:46 AM Backport #41532 (Resolved): luminous: Move bluefs alloc size initialization log message to log le...
https://github.com/ceph/ceph/pull/29910 Nathan Cutler
06:19 PM Bug #41522 (Fix Under Review): ceph-objectstore-tool can't remove head with bad snapset
Neha Ojha
04:49 AM Bug #41522 (Resolved): ceph-objectstore-tool can't remove head with bad snapset

We should allow a --force remove of a head object with a bad snapset to remove the object instead of failing.
David Zafman
05:26 PM Bug #20924: osd: leaked Session on osd.7
https://github.com/ceph/ceph/pull/29859 Samuel Just
05:16 PM Bug #41539 (New): luminous: TEST_backfill_remapped fails in above_margin
... Neha Ojha
05:04 PM Bug #38513: luminous: "AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item) && !in_pro...
/a/nojha-2019-08-26_20:27:46-rados-wip-bluefs-shared-alloc-luminous-2019-08-26-distro-basic-smithi/4255358/ Neha Ojha
03:04 PM Feature #41537: MON DNS Lookup for messenger V2
Jason Dillaman wrote:
> I think v2 over DNS SRV is already handled here [1] and [2].
>
Great, in that case it's...
Ricardo Dias
03:00 PM Feature #41537: MON DNS Lookup for messenger V2
I think v2 over DNS SRV is already handled here [1] and [2].
[1] https://github.com/ceph/ceph/blob/master/src/mon/...
Jason Dillaman
02:43 PM Feature #41537 (New): MON DNS Lookup for messenger V2
Currently it is possible for a client to use DNS SRV records to find the MON addresses to connect to. But these address... Ricardo Dias
01:20 PM Backport #40650: luminous: os/bluestore: fix >2GB writes
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28965
m...
Nathan Cutler
01:19 PM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
Should add one more thing: the only clusters bitten by this issue would be those that, *at any time,* ran the @balanc... Florian Haas
01:18 PM Backport #38276: luminous: osd_map_message_max default is too high?
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28640
m...
Nathan Cutler
01:14 PM Backport #38750: luminous: should report EINVAL in ErasureCode::parse() if m<=0
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28111
m...
Nathan Cutler
10:52 AM Backport #38719: luminous: crush: choose_args array size mis-sized when weight-sets are enabled
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27085
m...
Nathan Cutler
10:52 AM Backport #39343: luminous: ceph-objectstore-tool rename dump-import to dump-export
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27636
m...
Nathan Cutler
10:51 AM Backport #38873: luminous: Rados.get_fsid() returning bytes in python3
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27674
m...
Nathan Cutler
10:51 AM Backport #39042: luminous: osd/PGLog: preserve original_crt to check rollbackability
backport PR https://github.com/ceph/ceph/pull/27715
merge commit f7c528dbafcf540ab046de2cd29010113055da5a (v12.2.12-...
Nathan Cutler
10:51 AM Backport #38905: luminous: osd/PGLog.h: print olog_can_rollback_to before deciding to rollback
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27715
m...
Nathan Cutler
10:51 AM Backport #39431: luminous: Degraded PG does not discover remapped data on originating OSD
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27751
m...
Nathan Cutler
10:50 AM Backport #39204: luminous: osd: leaked pg refs on shutdown
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27810
m...
Nathan Cutler
10:50 AM Backport #39218: luminous: osd: FAILED ceph_assert(attrs || !pg_log.get_missing().is_missing(soid...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27878
m...
Nathan Cutler
10:50 AM Backport #39563: luminous: Error message displayed when mon_osd_max_split_count would be exceeded...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27908
m...
Nathan Cutler
10:50 AM Backport #39719: luminous: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when la...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28185
m...
Nathan Cutler
10:14 AM Backport #39239: luminous: "sudo yum -y install python34-cephfs" fails on mimic
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28493
m...
Nathan Cutler
10:12 AM Backport #39420: luminous: Don't mark removed osds in when running "ceph osd in any|all|*"
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27728
m...
Nathan Cutler
09:50 AM Backport #41534 (In Progress): nautilus: valgrind: UninitCondition in ceph::crypto::onwire::AES12...
Nathan Cutler
08:49 AM Backport #41534 (Resolved): nautilus: valgrind: UninitCondition in ceph::crypto::onwire::AES128GC...
https://github.com/ceph/ceph/pull/29928 Nathan Cutler
09:34 AM Bug #40792 (Pending Backport): monc: send_command to specific down mon breaks other mon msgs
Kefu Chai
09:18 AM Bug #41424 (Resolved): readable.sh test fails
Kefu Chai
08:52 AM Bug #22266 (Resolved): mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
08:46 AM Backport #41533 (Resolved): mimic: Move bluefs alloc size initialization log message to log level 1
https://github.com/ceph/ceph/pull/30219 Nathan Cutler
08:46 AM Backport #41531 (Resolved): nautilus: Move bluefs alloc size initialization log message to log le...
https://github.com/ceph/ceph/pull/30229 Nathan Cutler
08:46 AM Backport #41530 (Resolved): mimic: doc: mon_health_to_clog_* values flipped
https://github.com/ceph/ceph/pull/30227 Nathan Cutler
08:46 AM Backport #41529 (Resolved): nautilus: doc: mon_health_to_clog_* values flipped
https://github.com/ceph/ceph/pull/30003 Nathan Cutler
08:32 AM Bug #41526 (Rejected): Choosing the next PG for a deep scrubs wrong.
I have ceph cluster in this state:... Fyodor Ustinov
07:33 AM Backport #40943: mimic: mon/OSDMonitor.cc: better error message about min_size
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29618
m...
Nathan Cutler
07:33 AM Backport #41086: mimic: Change default for bluestore_fsck_on_mount_deep as false
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29699
m...
Nathan Cutler
07:25 AM Backport #39692 (Resolved): mimic: _txc_add_transaction error (39) Directory not empty not handle...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29217
m...
Nathan Cutler
07:18 AM Backport #40654: mimic: Lower the default value of osd_deep_scrub_large_omap_object_key_threshold
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29174
m...
Nathan Cutler
07:18 AM Backport #38552: mimic: core: lazy omap stat collection
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29189
m...
Nathan Cutler
03:25 AM Bug #41517 (Resolved): Missing head object at primary with snapshots crashes primary

This script crashes osd.1 when it wants to recover to osd.3 after osd.2 is marked out. When it sees the missing "o...
David Zafman
01:22 AM Bug #41406: common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient bootstrap
Patrick Donnelly wrote:
> Does any code actually do that sequence of events. I would think a SafeTimer should not be...
haitao chen
12:47 AM Bug #41514: in-flight manifest ops not properly cancelled on interval changing
http://pulpito.ceph.com/xxg-2019-08-25_02:12:25-rados:thrash-wip-inc-recovery-5-distro-basic-smithi/4250539/ xie xingguo
12:36 AM Bug #41514 (Resolved): in-flight manifest ops not properly cancelled on interval changing
which as a result makes PrimaryLogPG::on_flushed() unhappy:... xie xingguo

08/26/2019

10:52 PM Bug #40721 (Need More Info): backfill caught in loop from block
Samuel Just
10:51 PM Bug #40721: backfill caught in loop from block
I don't think I can make further progress without more logs, I'm marking this need more info for the time being. As ... Samuel Just
09:29 PM Bug #40721: backfill caught in loop from block
Based on the snapcontext, make_writeable should have created a clone. Samuel Just
09:29 PM Bug #40721: backfill caught in loop from block
The copy_from on that object lasted until the end of the test. It did succeed, but presumably during shutdown once t... Samuel Just
08:36 PM Bug #40721: backfill caught in loop from block
Or, I guess the directory is probably correct in that the teuthology.log output is consistent with the above, but the... Samuel Just
08:01 PM Bug #40721: backfill caught in loop from block
Unfortunately, I think the job number is wrong -- I don't see that object in the log (smithi19817795-* objects are in... Samuel Just
09:48 PM Bug #41362 (Fix Under Review): Rados bench sequential and random read: not behaving as expected w...
Patrick Donnelly
09:18 PM Bug #24057 (Can't reproduce): cbt fails to copy results to the archive dir
Neha Ojha
09:10 PM Support #41402 (Rejected): OSD's memory are beyound controlled
Please seek help on ceph-users mailing list. This is not the correct forum to seek support. Patrick Donnelly
09:10 PM Documentation #41403 (Pending Backport): doc: mon_health_to_clog_* values flipped
Patrick Donnelly
09:09 PM Documentation #41403 (Resolved): doc: mon_health_to_clog_* values flipped
Patrick Donnelly
09:08 PM Documentation #41403 (Fix Under Review): doc: mon_health_to_clog_* values flipped
Patrick Donnelly
09:06 PM Bug #41406 (Need More Info): common: SafeTimer reinit doesn't fix up "stopping" bool, used in Mon...
Does any code actually do that sequence of events? I would think a SafeTimer should not be re-inited after shutdown. Patrick Donnelly
08:41 PM Bug #37775: some pg_created messages not sent to mon
The original bug is about a pool level flag - "FLAG_CREATING", which was introduced in 0e526b467af2699e389e7f28a6d709... Neha Ojha
08:40 PM Backport #39475: mimic: segv in fgets() in collect_sys_info reading /proc/cpuinfo
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28206
m...
Nathan Cutler
08:38 PM Backport #40651: mimic: os/bluestore: fix >2GB writes
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28967
m...
Nathan Cutler
08:37 PM Bug #40720 (Resolved): mimic, nautilus: make bitmap allocator the default allocator for bluestore
Nathan Cutler
08:35 PM Backport #38751: mimic: should report EINVAL in ErasureCode::parse() if m<=0
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28995
m...
Nathan Cutler
08:35 PM Backport #39513: mimic: osd: segv in _preboot -> heartbeat
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28220
m...
Nathan Cutler
08:28 PM Backport #39311: mimic: crushtool crash on Fedora 28 and newer
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/27986
m...
Nathan Cutler
08:28 PM Backport #39720: mimic: short pg log+nautilus-p2p-stress-split: "Error: finished tid 3 when last_...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28089
m...
Nathan Cutler
08:28 PM Backport #39374: mimic: ceph tell osd.xx bench help : gives wrong help
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28097
m...
Nathan Cutler
08:28 PM Backport #39422: mimic: Don't mark removed osds in when running "ceph osd in any|all|*"
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28142
m...
Nathan Cutler
08:27 PM Backport #38341: mimic: pg stuck in backfill_wait with plenty of disk space
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28201
m...
Nathan Cutler
08:17 PM Backport #40639: mimic: osd: report omap/data/metadata usage
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28852
m...
Nathan Cutler
07:58 PM Bug #41399 (Pending Backport): Move bluefs alloc size initialization log message to log level 1
Neha Ojha
07:33 PM Bug #38827 (Pending Backport): valgrind: UninitCondition in ceph::crypto::onwire::AES128GCM_OnWir...
seeing this in the rgw suite for nautilus runs, so tagging for backport of https://github.com/ceph/ceph/pull/28305 Casey Bodley
02:59 PM Backport #41503 (Resolved): nautilus: Warning about past_interval bounds on deleting pg
https://github.com/ceph/ceph/pull/30000 Nathan Cutler
02:59 PM Backport #41502 (Resolved): mimic: Warning about past_interval bounds on deleting pg
https://github.com/ceph/ceph/pull/30222 Nathan Cutler
02:59 PM Backport #41501 (Resolved): nautilus: backfill_toofull while OSDs are not full (Unneccessary HEAL...
https://github.com/ceph/ceph/pull/29999 Nathan Cutler
02:58 PM Backport #41500 (Rejected): luminous: backfill_toofull while OSDs are not full (Unneccessary HEAL...
Nathan Cutler
02:58 PM Backport #41499 (Rejected): mimic: backfill_toofull while OSDs are not full (Unneccessary HEALTH_...
Nathan Cutler
02:51 PM Backport #41491 (Resolved): nautilus: OSDCap.PoolClassRNS test aborts
https://github.com/ceph/ceph/pull/29998 Nathan Cutler
02:50 PM Backport #41490 (Resolved): mimic: OSDCap.PoolClassRNS test aborts
https://github.com/ceph/ceph/pull/30214 Nathan Cutler
02:42 PM Backport #41455 (Resolved): nautilus: osd: fix ceph_assert(mem_avail >= 0) caused by the unset cg...
https://github.com/ceph/ceph/pull/29745 Nathan Cutler
02:41 PM Backport #41453 (Resolved): nautilus: mon: C_AckMarkedDown has not handled the Callback Arguments
https://github.com/ceph/ceph/pull/29997 Nathan Cutler
02:25 PM Backport #41449 (Resolved): mimic: mon: C_AckMarkedDown has not handled the Callback Arguments
https://github.com/ceph/ceph/pull/30213 Nathan Cutler
02:25 PM Backport #41448 (Resolved): nautilus: osd/PrimaryLogPG: Access destroyed references in finish_deg...
https://github.com/ceph/ceph/pull/29994 Nathan Cutler
02:25 PM Backport #41447 (Resolved): mimic: osd/PrimaryLogPG: Access destroyed references in finish_degrad...
https://github.com/ceph/ceph/pull/30291 Nathan Cutler
11:23 AM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
With thanks to Paul Emmerich in https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QGY75UVQEAT2SUHHKZC2K... Florian Haas
10:59 AM Backport #39698: mimic: OSD down on snaptrim.
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28202
m...
Nathan Cutler
10:57 AM Backport #39518: mimic: snaps missing in mapper, should be: ca was r -2...repaired
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28232
m...
Nathan Cutler
10:56 AM Backport #39538: mimic: osd/ReplicatedBackend.cc: 1321: FAILED assert(get_parent()->get_log().get...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28259
m...
Nathan Cutler
10:56 AM Backport #39737: mimic: Binary data in OSD log from "CRC header" message
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28503
m...
Nathan Cutler
10:56 AM Backport #39744: mimic: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28540
m...
Nathan Cutler
09:06 AM Bug #41429: Incorrect logical operator in Monitor::handle_auth_request()
“&&” in the following code snippet:... yupeng chen
08:48 AM Bug #41429 (Resolved): Incorrect logical operator in Monitor::handle_auth_request()
When checking auth_mode against AUTH_MODE_MON and AUTH_MODE_MON_MAX in Monitor::handle_auth_request(),
a logical AND...
yupeng chen
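
The report above concerns a range check of auth_mode against two bounds. The sketch below only illustrates why the choice of logical operator matters in such a check. The bound values are hypothetical (the real AUTH_MODE_MON and AUTH_MODE_MON_MAX values are not given in this log), and the sketch does not claim to reproduce the Ceph code.

```python
# Hypothetical bounds for illustration only; not the real AUTH_MODE_* values.
MODE_MIN = 10
MODE_MAX = 19


def in_range_and(mode):
    """Both bounds must hold: true only for MODE_MIN <= mode <= MODE_MAX."""
    return mode >= MODE_MIN and mode <= MODE_MAX


def in_range_or(mode):
    """Either bound may hold: true for every integer when MODE_MIN <= MODE_MAX."""
    return mode >= MODE_MIN or mode <= MODE_MAX


samples = [0, 10, 15, 19, 25]
and_results = [in_range_and(m) for m in samples]
or_results = [in_range_or(m) for m in samples]

for mode, a, o in zip(samples, and_results, or_results):
    print(f"mode={mode:2d}  and={a}  or={o}")
```

With an OR, every value satisfies at least one of the two comparisons, so the check can never reject anything. A range test therefore needs AND.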
08:59 AM Backport #40948 (Resolved): nautilus: Better default value for osd_snap_trim_sleep
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29678
m...
Nathan Cutler
08:48 AM Backport #40885 (Resolved): nautilus: ceph mgr module ls -f plain crashes mon
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29566
m...
Nathan Cutler
08:33 AM Backport #40322: nautilus: nautilus with requrie_osd_release < nautilus cannot increase pg_num
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29671
m...
Nathan Cutler
07:21 AM Bug #41427 (Resolved): set-chunk raced with deep-scrub
which as a result causes object info inconsistency:
"2019-08-25T04:04:19.571852+0000 osd.1 (osd.1) 253 : cluster ...
xie xingguo

08/25/2019

01:44 PM Bug #41424 (Fix Under Review): readable.sh test fails
Kefu Chai
10:38 AM Bug #41424 (Resolved): readable.sh test fails
... Kefu Chai
03:18 AM Documentation #41403: doc: mon_health_to_clog_* values flipped
Verified on nautilus (ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)) that the defa... James McClune

08/24/2019

12:35 AM Bug #41156 (Won't Fix): dump_float() poor output
David Zafman

08/23/2019

11:45 PM Backport #24360: luminous: osd: leaked Session on osd.7
https://github.com/ceph/ceph/pull/29859 Samuel Just
08:31 AM Bug #41406 (New): common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient boot...
1. Create a new SafeTimer object.
2. Call init.
3. Call add_event_after.
4. Call shutdown.
5. Call init again.
6. ...
haitao chen
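
The numbered steps in the report above can be modelled with a toy class. This is a minimal sketch, not Ceph's SafeTimer: `ToyTimer`, its `reset_stopping` switch, and the final probe call are all assumptions made for illustration (step 6 of the report is elided in this log). It only shows how a "stopping" flag that init does not reset would cause later events to be refused.

```python
class ToyTimer:
    """Toy stand-in for a timer with a 'stopping' flag (not Ceph's SafeTimer)."""

    def __init__(self):
        self.stopping = False
        self.events = []

    def init(self, reset_stopping=False):
        # Hypothetical switch: when False, init leaves 'stopping' untouched,
        # which models the behaviour described in the bug title.
        if reset_stopping:
            self.stopping = False

    def add_event_after(self, seconds, name):
        # Refuse new events once shutdown has been requested.
        if self.stopping:
            return False
        self.events.append((seconds, name))
        return True

    def shutdown(self):
        self.stopping = True
        self.events.clear()


def run_sequence(reset_stopping):
    timer = ToyTimer()                        # step 1: create the timer
    timer.init(reset_stopping)                # step 2: init
    first = timer.add_event_after(1.0, "a")   # step 3: add_event_after
    timer.shutdown()                          # step 4: shutdown
    timer.init(reset_stopping)                # step 5: init again
    # Assumed probe (the report's step 6 is elided): try to add another event.
    second = timer.add_event_after(1.0, "b")
    return first, second


print("init leaves flag alone:", run_sequence(reset_stopping=False))
print("init resets flag:      ", run_sequence(reset_stopping=True))
```

In the toy model, the second event is accepted only when init clears the flag. Whether re-initialising after shutdown should be supported at all is the question raised in the follow-up comment on this issue.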
05:30 AM Bug #39546 (Pending Backport): Warning about past_interval bounds on deleting pg
Kefu Chai
05:24 AM Bug #41217 (Pending Backport): mon: C_AckMarkedDown has not handled the Callback Arguments
Kefu Chai
05:17 AM Bug #40835 (Pending Backport): OSDCap.PoolClassRNS test aborts
Kefu Chai
02:39 AM Documentation #41403 (Resolved): doc: mon_health_to_clog_* values flipped
On my Luminous cluster (ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)), the defau... James McClune
01:34 AM Support #41402 (Rejected): OSD's memory are beyound controlled
My env :
[store@server01 ~]$ ceph -v
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
...
伟杰 谭

08/22/2019

07:31 PM Backport #40840 (In Progress): nautilus: Explicitly requested repair of an inconsistent PG cannot...
David Zafman
05:27 PM Bug #41353 (Resolved): scrub/osd-scrub-snaps.sh fails
Sage Weil
05:02 PM Bug #41399 (Fix Under Review): Move bluefs alloc size initialization log message to log level 1
Vikhyat Umrao
04:07 PM Bug #41399: Move bluefs alloc size initialization log message to log level 1
- At present, from a shared BlueStore OSD which has wal, db and block all in one it is being set as 64K we can see in... Vikhyat Umrao
04:05 PM Bug #41399 (Resolved): Move bluefs alloc size initialization log message to log level 1
- https://github.com/ceph/ceph/pull/29537... Vikhyat Umrao
04:57 PM Bug #41255 (In Progress): backfill_toofull seen on cluster where the most full OSD is at 1%
David Zafman
02:48 PM Bug #20050 (Resolved): osd: very old pg creates take a long time to build past_intervals
All of this code went away by mimic. Sage Weil
02:23 PM Backport #41238: nautilus: Implement mon_memory_target
Sridhar Seshasayee wrote:
> https://github.com/ceph/ceph/pull/29652
The above PR is dependent on the backport of ...
Sridhar Seshasayee
12:55 PM Bug #41236 (Fix Under Review): cosbench failures in rados/perf
https://github.com/ceph/cbt/pull/191 Kefu Chai
09:53 AM Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+...
We just bumped into this, on Luminous (12.2.12). It actually caused us momentary loss of quorum.
Sequence of event...
Florian Haas
08:45 AM Documentation #41389 (Resolved): wrong datatype describing crush_rule
The current documentation for luminous https://docs.ceph.com/docs/luminous/rados/operations/pools/ is wrong regarding cru... Torben Hørup
05:47 AM Bug #37654: FAILED ceph_assert(info.history.same_interval_since != 0) in PG::start_peering_interv...
http://pulpito.ceph.com/xxg-2019-08-21_09:03:35-rados:thrash-wip-scrub-omap-error-distro-basic-smithi/4236636/ xie xingguo
05:41 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
I had a chance to get back to this.
I fuse mounted the uploaded image and copied the osdmap data for epoch 80890 o...
Brad Hubbard

08/21/2019

10:25 PM Bug #40792 (Fix Under Review): monc: send_command to specific down mon breaks other mon msgs
Updated for a few issues and marked the PR for testing again. Greg Farnum
08:21 PM Bug #40792 (In Progress): monc: send_command to specific down mon breaks other mon msgs
Greg Farnum
09:47 PM Bug #24531: Mimic MONs have slow/long running ops
Greg Farnum
09:20 PM Bug #40073 (Resolved): PG scrub stamps reset to 0.000000
David Zafman
09:18 PM Bug #39570 (Resolved): nautilus with requrie_osd_release < nautilus cannot increase pg_num
Greg Farnum
09:18 PM Backport #40322 (Resolved): nautilus: nautilus with requrie_osd_release < nautilus cannot increas...
Greg Farnum
09:18 PM Bug #39972 (Resolved): librados 'buffer::create' and related functions are not exported in C++ API
Greg Farnum
09:17 PM Backport #24360 (In Progress): luminous: osd: leaked Session on osd.7
Samuel Just
08:45 PM Backport #24360 (New): luminous: osd: leaked Session on osd.7
Meh, actually probably is. Samuel Just
08:40 PM Backport #24360 (Rejected): luminous: osd: leaked Session on osd.7
Not worth backporting to luminous. Samuel Just
09:16 PM Backport #39506 (Rejected): mimic: Give recovery for inactive PGs a higher priority
David Zafman
09:16 PM Backport #39505 (Rejected): luminous: Give recovery for inactive PGs a higher priority
David Zafman
09:16 PM Bug #39484 (Resolved): mon: "FAILED assert(pending_finishers.empty())" when paxos restart
Greg Farnum
09:16 PM Bug #39099 (Resolved): Give recovery for inactive PGs a higher priority
David Zafman
09:13 PM Backport #39518 (Resolved): mimic: snaps missing in mapper, should be: ca was r -2...repaired
David Zafman
09:12 PM Bug #39333 (Resolved): osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
Greg Farnum
09:10 PM Bug #37439 (Resolved): Degraded PG does not discover remapped data on originating OSD
Greg Farnum
09:10 PM Backport #39431 (Resolved): luminous: Degraded PG does not discover remapped data on originating OSD
Greg Farnum
09:08 PM Bug #38359 (Resolved): osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Greg Farnum
09:08 PM Backport #38442 (Resolved): luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Greg Farnum
09:00 PM Documentation #23999 (Resolved): osd_recovery_priority is not documented (but osd_recovery_op_pri...
Greg Farnum
09:00 PM Backport #38567 (Resolved): luminous: osd_recovery_priority is not documented (but osd_recovery_o...
Greg Farnum
08:58 PM Bug #38432 (Resolved): ENOENT on setattrs (obj was recently deleted)
David Zafman
08:57 PM Backport #38507 (Resolved): mimic: ENOENT on setattrs (obj was recently deleted)
David Zafman
08:53 PM Bug #21142 (Won't Fix): OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
If this pops up and causes more trouble we may try again but given the efforts so far it seems like we aren't going t... Greg Farnum
08:52 PM Backport #38256 (Duplicate): luminous: OSD crashes when loading pgs with "FAILED assert(interval....
The original issue #21142 is a luminous-only bug report and there's no code fixing it yet. Greg Farnum
08:44 PM Bug #24174 (Resolved): PrimaryLogPG::try_flush_mark_clean mixplaced ctx release
David Zafman
08:39 PM Backport #23926: luminous: disable bluestore cache caused a rocksdb error
We need to discuss if this is worth backporting any more; it may not be but Kefu can probably talk to the right people? Greg Farnum
08:37 PM Bug #18746 (Resolved): monitors crashing ./include/interval_set.h: 355: FAILED assert(0) (jewel+k...
Already backported to luminous. Samuel Just
08:33 PM Bug #21629 (Resolved): interval_map.h: 161: FAILED assert(len > 0)
Greg Farnum
08:32 PM Bug #21127 (Resolved): qa/standalone/scrub/osd-scrub-repair.sh timeout
Greg Farnum
08:18 PM Bug #41383 (Need More Info): scrub object count mismatch on device_health_metrics pool
Greg Farnum
08:18 PM Bug #41383: scrub object count mismatch on device_health_metrics pool
This may be the empty object names that the device health manager was inappropriately creating? See the thread "[ceph... Greg Farnum
07:04 PM Bug #41383 (Resolved): scrub object count mismatch on device_health_metrics pool
jenglisch on irc reports multiple scrub errors (error, repaired, reappeared a few days later) on metrics pool.
<pr...
Sage Weil
08:11 PM Bug #41200 (Pending Backport): osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup me...
Josh Durgin
07:56 PM Bug #39286 (Pending Backport): primary recovery local missing object did not update obc
Greg Farnum
07:52 PM Bug #38649 (Can't reproduce): [ERR] full status failsafe engaged, dropping updates, now -21474836...
Greg Farnum
07:51 PM Bug #38402: ceph-objectstore-tool on down osd w/ not enough in osds
We think it just needs test fixing. Those in the rados suite test review group can see https://docs.google.com/docume... Greg Farnum
07:49 PM Bug #41385 (Resolved): osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_missing.count(from...
... Sage Weil
07:45 PM Bug #38322 (Fix Under Review): luminous: mons do not trim maps until restarted
Neha Ojha
07:44 PM Bug #40367: "*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-nautilus
same thing upgrading from mimic:
/a/sage-2019-08-21_15:17:39-rados-wip-sage2-testing-2019-08-20-0935-distro-basic-...
Sage Weil
07:31 PM Bug #38023 (Closed): segv on FileJournal::prepare_entry in bufferlist
Seems to have been resolved alongside those related tickets? Greg Farnum
07:30 PM Bug #37808 (Can't reproduce): osd: osdmap cache weak_refs assert during shutdown
Greg Farnum
07:28 PM Bug #37798 (Can't reproduce): ceph-objectstore-tool crash from finisher
David Zafman
07:27 PM Bug #37786 (Can't reproduce): test fails in mon/crush_ops.sh
Greg Farnum
05:06 PM Backport #41084: nautilus: Change default for bluestore_fsck_on_mount_deep as false
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29697
m...
Nathan Cutler
05:04 PM Backport #40537: nautilus: osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29372
m...
Nathan Cutler
04:59 PM Backport #40942: nautilus: mon/OSDMonitor.cc: better error message about min_size
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29617
m...
Nathan Cutler
04:58 PM Backport #40940: nautilus: Update rocksdb to v6.1.2
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29440
m...
Nathan Cutler
04:57 PM Backport #41092: nautilus: rocksdb: enable rocksdb_rmrange=true by default and make delete range ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29439
m...
Nathan Cutler
02:53 PM Bug #41353: scrub/osd-scrub-snaps.sh fails
David Zafman
02:39 PM Backport #39516 (Resolved): nautilus: osd-backfill-space.sh test failed in TEST_backfill_multi_pa...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28187
m...
Nathan Cutler
02:38 PM Backport #40625: nautilus: OSDs get killed by OOM due to a broken switch
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29391
m...
Nathan Cutler
02:37 PM Bug #41052: nautilus: cbt cosbench workloads failing in rados/perf suite
https://github.com/ceph/ceph/pull/29453
merge commit 59177f780c5be0e6530df2fdba1abfa6e3187569 (v14.2.2-230-g59177f780c)
Nathan Cutler
02:36 PM Backport #40180 (Resolved): nautilus: qa/standalone/scrub/osd-scrub-snaps.sh sometimes fails
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29252
m...
Nathan Cutler
02:35 PM Backport #40465 (Resolved): nautilus: osd beacon sometimes has empty pg list
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29254
m...
Nathan Cutler
02:35 PM Backport #39743 (Resolved): nautilus: mon: "FAILED assert(pending_finishers.empty())" when paxos ...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28528
m...
Nathan Cutler
02:34 PM Backport #40382: nautilus: RuntimeError: expected MON_CLOCK_SKEW but got none
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28576
m...
Nathan Cutler
02:32 PM Backport #40274 (Resolved): nautilus: librados 'buffer::create' and related functions are not exp...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29244
m...
Nathan Cutler
02:25 PM Backport #40667: nautilus: PG scrub stamps reset to 0.000000
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28869
m...
Nathan Cutler
02:24 PM Backport #40730 (Resolved): nautilus: mon: auth mon isn't loading full KeyServerData after restart
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/28993
m...
Nathan Cutler
02:24 PM Backport #39693 (Resolved): nautilus: _txc_add_transaction error (39) Directory not empty not han...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29115
m...
Nathan Cutler
07:53 AM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
Not to my knowledge, but I haven't checked in a while. Ilya Dryomov
07:33 AM Bug #22233: prime_pg_temp breaks on uncreated pgs
> the mon should see the pg mapping change from [3,6] to [4,6] and send the create to osd.4
Exactly. That's why I ...
Kefu Chai
01:16 AM Feature #41363 (New): Allow user to cancel scrub requests

If a user requests multiple scrubs or deep-scrubs, they should be able to cancel the requests. It may be that they...
David Zafman
01:02 AM Bug #41362 (Resolved): Rados bench sequential and random read: not behaving as expected when op s...
ObjBencher::seq_read_bench() is using "num_objects > data.started" to make sure
we don't issue more reads than what ...
Albert Chen

08/20/2019

11:25 PM Bug #35974: Apparent export-diff/import-diff corruption
@Josh: AFAIK, the diff calculations do not set the LOCALIZED/BALANCED read flags. Those are only (optionally) set on ... Jason Dillaman
11:07 PM Bug #35974: Apparent export-diff/import-diff corruption
This sounds like it may be due to balance reads not behaving properly for the diff rados op, since it's not operating... Josh Durgin
11:17 PM Bug #37656 (Can't reproduce): FileStore::_do_transaction() crashed with error 17 (merge collectio...
Josh Durgin
11:15 PM Bug #37654 (Can't reproduce): FAILED ceph_assert(info.history.same_interval_since != 0) in PG::st...
Josh Durgin
11:13 PM Bug #36746: Ignore osd_find_best_info_ignore_history_les for erasure-coded PGs
Maybe change this from true/false to specify a PG, so only that PG is affected. David Zafman
11:11 PM Bug #36388 (Resolved): osd: "out of order op"
Josh Durgin
11:01 PM Bug #35810 (Can't reproduce): FAILED assert(entries.begin()->version > info.last_update)
Josh Durgin
11:01 PM Bug #35542 (Won't Fix): Backfill and recovery should validate all checksums

Bluestore makes this unnecessary and it is only possible on a pull of the complete object.
David Zafman
10:58 PM Bug #26947 (Resolved): ENOENT on collection_move_rename from divergent activate
Neha thinks this is the same as a merge divergent object bug that was fixed. Josh Durgin
10:57 PM Bug #25155 (Can't reproduce): mon crash from 'ceph osd erasure-code-profile set lrcprofile name=l...
Josh Durgin
10:56 PM Bug #24678 (Can't reproduce): ceph-mon segmentation fault after setting pool size to 1 on degrade...
Josh Durgin
10:53 PM Bug #24531 (Fix Under Review): Mimic MONs have slow/long running ops
Josh Durgin
10:52 PM Bug #24339: FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in scan_requ...
Eek what did you end up doing, Ilya? Anything happen here? Greg Farnum
10:49 PM Bug #24320 (Resolved): out of order reply and/or osd assert with set-chunks-read.yaml
Josh Durgin
10:49 PM Bug #24148 (Duplicate): Segmentation fault out of ObcLockManager::get_lock_type()
Greg Farnum
10:47 PM Bug #23892 (Can't reproduce): luminous->mimic: mon segv in ~MonOpRequest from OpHistoryServiceThread
Believe we've made some fixes to OpHistory since April last year... Greg Farnum
10:43 PM Bug #23830 (Can't reproduce): rados/standalone/erasure-code.yaml gets 160 byte pgmeta object
Greg Farnum
10:40 PM Bug #23760: mon: `config get <who>` does not allow `who` as 'mon'/'osd'
Is this still an issue Joao, Josh? Greg Farnum
10:13 PM Bug #23511 (Can't reproduce): forwarded osd_failure leak in mon
I don't think we've seen this again and may have made even more no_reply fixes? Greg Farnum
10:12 PM Bug #23402 (Duplicate): objecter: does not resend op on split interval
Happily fixed now. Greg Farnum
10:08 PM Bug #22624 (Duplicate): filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No suc...
Greg Farnum
10:04 PM Bug #22656 (Can't reproduce): scrub mismatch on bytes (cache pools)
Neha Ojha
10:01 PM Bug #22408 (Can't reproduce): objecter: sent out of order ops
Neha Ojha
10:00 PM Bug #22233 (In Progress): prime_pg_temp breaks on uncreated pgs
Greg Farnum
09:57 PM Bug #21965 (Can't reproduce): mon/MonClient.cc: 478: FAILED assert(authenticate_err == 0)
Josh Durgin
09:57 PM Bug #21823 (Can't reproduce): on_flushed: object ... obc still alive (ec + cache tiering)
Josh Durgin
09:56 PM Bug #21686 (Can't reproduce): osd/PrimaryLogPG.cc: 10195: FAILED assert(i->second == obc) in fini...
Greg Farnum
09:55 PM Bug #21557 (Can't reproduce): osd.6 found snap mapper error on pg 2.0 oid 2:0e781f33:::smithi1443...
Josh Durgin
09:55 PM Bug #20909: Error ETIMEDOUT: crush test failed with -110: timed out during smoke test (5 seconds)
We've made some improvements and fixed some bad inefficiencies in the CRUSH code and updates. Greg Farnum
09:54 PM Bug #20909 (Can't reproduce): Error ETIMEDOUT: crush test failed with -110: timed out during smok...
Josh Durgin
09:54 PM Bug #21130 (Can't reproduce): "FAILED assert(bh->last_write_tid > tid)" in powercycle-master-test...
Josh Durgin
09:54 PM Bug #20874 (Can't reproduce): osd/PGLog.h: 1386: FAILED assert(miter == missing.get_items().end()...
Josh Durgin
09:52 PM Bug #20798 (Can't reproduce): LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
Josh Durgin
09:52 PM Bug #20759: mon: valgrind detects a few leaks
Maybe this was some of the leaked MForwards we weren't marking as no_reply? Greg Farnum
09:51 PM Bug #20759 (Can't reproduce): mon: valgrind detects a few leaks
Josh Durgin
09:51 PM Bug #20694 (Can't reproduce): osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_lo...
Sam changed this with his PeeringStateMachine refactor. :D Greg Farnum
09:50 PM Bug #20283: qa: missing even trivial tests for many commands

ceph commands tests can go in qa/workunits/cephtool/test.sh
David Zafman
09:48 PM Bug #20303 (Can't reproduce): filejournal: Unable to read past sequence ... journal is corrupt
Josh Durgin
09:45 PM Bug #20133 (Can't reproduce): EnvLibradosMutipoolTest.DBBulkLoadKeysInRandomOrder hangs on rocksd...
Josh Durgin
09:44 PM Bug #20086 (Can't reproduce): LibRadosLockECPP.LockSharedDurPP gets EEXIST
Josh Durgin
09:42 PM Bug #20000 (Can't reproduce): osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
Re-open if this still occurs. Josh Durgin
09:39 PM Bug #41318 (Resolved): per-pool omap broken with temp recovery objects
Sage Weil
09:38 PM Bug #19512 (Won't Fix): Sparse file info in filestore not propagated to other OSDs
If this is still an issue in bluestore, let's fix it there. Josh Durgin
09:37 PM Bug #18643 (Closed): SnapTrimmer: inconsistencies may lead to snaptrimmer hang
This no longer seems to be the case. If trim_object() returns an error to its sole caller, PrimaryLogPG::AwaitAsyncWo... Greg Farnum
09:37 PM Feature #41360 (New): snaptrim_error condition should allow repair and resume snaptrim
David Zafman
09:37 PM Bug #18667 (Can't reproduce): [cache tiering] omap data time-traveled to stale version
Josh Durgin
09:32 PM Bug #18209 (Duplicate): src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
Looks the same as the Safetimer crash that Brad fixed last year. Josh Durgin
09:30 PM Bug #17252 (Can't reproduce): [Librados] Deadlock on RadosClient::watch_flush
This hasn't come up again and got fixed in the only user. Greg Farnum
09:24 PM Bug #16236 (Won't Fix): cache/proxied ops from different primaries (cache interval change) don't ...
Josh Durgin
09:21 PM Bug #15653 (Resolved): crush: low weight devices get too many objects for num_rep > 1
Closed since upmap fixes this. Josh Durgin
09:12 PM Bug #38483: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
Sage says the PR is buggy, and this case is very hard to hit, so moving to normal priority. Josh Durgin
09:08 PM Bug #40245 (Won't Fix): filestore::read() does not assert on EIO
David Zafman
09:06 PM Bug #40530 (Resolved): Scrub reserves from actingbackfill put waits for acting
The fix for this was included in https://github.com/ceph/ceph/pull/28334 for tracker #40073 David Zafman
09:02 PM Bug #40576 (Closed): src/osd/PrimaryLogPG.cc: 10513: FAILED assert(head_obc)
David Zafman
08:58 PM Backport #40667 (Resolved): nautilus: PG scrub stamps reset to 0.000000
David Zafman
08:57 PM Bug #39555 (Pending Backport): backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
David Zafman
08:47 PM Bug #40522: on_local_recover doesn't touch?
Ping Sage, it sounds like you know what the issue is?
jianping, does your comment have anything to do with this st...
Greg Farnum
08:40 PM Bug #40000: osds do not bound xattrs and/or aggregate xattr data in pg log
Josh Durgin
08:36 PM Bug #39978 (Duplicate): Adding OSD to Luminous Cluster will crash the active mon
Closing in favor of the other since we've lost all the pastebins. :( Greg Farnum
08:33 PM Bug #39152 (Pending Backport): nautilus osd crash: Caught signal (Aborted) tp_osd_tp
Greg Farnum
08:30 PM Bug #38555: scrub error on ec pg, got 6579891/0 or 7569408/6832128 bytes
This report has been obviated by the PeeringState refactor/extraction. Greg Farnum
08:30 PM Bug #38555 (Can't reproduce): scrub error on ec pg, got 6579891/0 or 7569408/6832128 bytes
Josh Durgin
08:24 PM Bug #38219 (In Progress): rebuild-mondb hangs
Demoting, since if you're running this you already need manual intervention anyway. Greg Farnum
08:22 PM Bug #37969 (Can't reproduce): ENOENT on setattrs
FileStore, only seen once. Greg Farnum
08:22 PM Bug #37915 (Can't reproduce): osd: Segmentation fault in OpRequest::_unregistered
There have been changes to TrackedOps since then. Greg Farnum
08:21 PM Bug #37911 (Can't reproduce): osd dequeue misorder
There have been pg merge fixes since then... Greg Farnum
08:15 PM Bug #23879: test_mon_osdmap_prune.sh fails
Are we really only seeing this about once a month? Is it just a probabilistic failure based on load of the monitor cl... Greg Farnum
08:14 PM Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
We can bump this priority up if it reappears. Greg Farnum
07:41 PM Feature #41359 (Resolved): Adding Placement Group id in Large omap log message
... Vikhyat Umrao
03:05 PM Bug #41353 (Fix Under Review): scrub/osd-scrub-snaps.sh fails
https://github.com/ceph/ceph/pull/29774 Sage Weil
03:00 PM Bug #41353 (Resolved): scrub/osd-scrub-snaps.sh fails
... Sage Weil
02:51 PM Backport #41350 (In Progress): nautilus: hidden corei7 requirement in binary packages
Nathan Cutler
02:32 PM Backport #41350 (Resolved): nautilus: hidden corei7 requirement in binary packages
https://github.com/ceph/ceph/pull/29772 Nathan Cutler
02:32 PM Backport #41351 (Resolved): mimic: hidden corei7 requirement in binary packages
https://github.com/ceph/ceph/pull/30183 Nathan Cutler
02:32 PM Bug #41330 (Pending Backport): hidden corei7 requirement in binary packages
Nathan Cutler
01:08 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
Brad Hubbard wrote:
> Hi Troy,
>
> Before we close this I'll take a look at the image you uploaded to see if I ca...
Troy Ablan
12:31 AM Bug #41240 (Triaged): All of the cluster SSDs aborted at around the same time and will not start.
Reducing severity since the cluster is currently healthy. Brad Hubbard
12:24 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
Hi Troy,
Before we close this I'll take a look at the image you uploaded to see if I can work out the nature of th...
Brad Hubbard
12:20 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
With guidance from badone on IRC, all of the osds are running and all of the pgs are active.
http://lists.ceph.com...
Troy Ablan

08/19/2019

11:04 PM Bug #39555: backfill_toofull while OSDs are not full (Unneccessary HEALTH_ERR)
Joao Eduardo Luis wrote:
> The pull request provided with the fix has been merged (https://github.com/ceph/ceph/pull...
Vikhyat Umrao
09:27 PM Bug #39546: Warning about past_interval bounds on deleting pg
/a/sage-2019-08-19_13:35:06-rados-wip-sage-testing-2019-08-17-1023-distro-basic-smithi/4230273 Sage Weil
09:13 PM Bug #41190 (Fix Under Review): osd: pg stuck in waitactingchange when new acting set doesn't change
Patrick Donnelly
04:13 PM Backport #40948: nautilus: Better default value for osd_snap_trim_sleep
Prashant D wrote:
> https://github.com/ceph/ceph/pull/29678
merged
Yuri Weinstein
03:03 PM Backport #41341 (Resolved): nautilus: "CMake Error" in test_envlibrados_for_rocksdb.sh
https://github.com/ceph/ceph/pull/29979 Nathan Cutler
02:38 PM Bug #40451 (Resolved): osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)
Nathan Cutler
02:38 PM Backport #40537 (Resolved): nautilus: osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)
Nathan Cutler
01:08 PM Bug #41336 (Resolved): All OSD Faild after Reboot.
We have faced an issue with a Writeback-Cache + EC-Pool.
* our ec-pool creation in the first place https://pastebin....
Ansgar Jazdzewski
06:40 AM Backport #40949 (In Progress): mimic: Better default value for osd_snap_trim_sleep
https://github.com/ceph/ceph/pull/29732 Prashant D
05:03 AM Bug #41330 (Fix Under Review): hidden corei7 requirement in binary packages
Kefu Chai
04:53 AM Bug #41330 (Resolved): hidden corei7 requirement in binary packages
quote from Alexandre Oliva's mail
> After upgrading some old Phenom servers from Fedora/Freed-ora 29 to
> 30's, t...
Kefu Chai
02:42 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
output of `ceph report` as requested by badone on #ceph IRC... Troy Ablan

08/18/2019

03:22 PM Backport #40885: nautilus: ceph mgr module ls -f plain crashes mon
Prashant D wrote:
> https://github.com/ceph/ceph/pull/29566
merged
Yuri Weinstein
03:22 PM Backport #40322: nautilus: nautilus with requrie_osd_release < nautilus cannot increase pg_num
Neha Ojha wrote:
> https://github.com/ceph/ceph/pull/29671
merged
Yuri Weinstein
07:00 AM Bug #41253 (Pending Backport): "CMake Error" in test_envlibrados_for_rocksdb.sh
Kefu Chai
04:04 AM Bug #41253 (Fix Under Review): "CMake Error" in test_envlibrados_for_rocksdb.sh
Kefu Chai

08/17/2019

01:18 AM Bug #41318 (Fix Under Review): per-pool omap broken with temp recovery objects
Josh Durgin

08/16/2019

09:08 PM Backport #40943 (Resolved): mimic: mon/OSDMonitor.cc: better error message about min_size
Neha Ojha
08:06 PM Bug #41253 (In Progress): "CMake Error" in test_envlibrados_for_rocksdb.sh
Neha Ojha
07:59 PM Bug #41253: "CMake Error" in test_envlibrados_for_rocksdb.sh
Neha Ojha
06:35 PM Bug #41321 (New): crush: add-bucket default to root=default

I would consider it a usability improvement if 'ceph osd crush add-bucket' commands used root=default unless anothe...
Kyle Bader
04:50 PM Bug #41318: per-pool omap broken with temp recovery objects
Sage Weil
04:48 PM Bug #41318 (Resolved): per-pool omap broken with temp recovery objects
- recovery creates a temp recovery object,... Sage Weil
04:32 PM Backport #41086 (Resolved): mimic: Change default for bluestore_fsck_on_mount_deep as false
Neha Ojha
01:13 AM Backport #41086 (In Progress): mimic: Change default for bluestore_fsck_on_mount_deep as false
https://github.com/ceph/ceph/pull/29699 Prashant D
04:31 PM Backport #41084 (Resolved): nautilus: Change default for bluestore_fsck_on_mount_deep as false
Neha Ojha
01:11 AM Backport #41084 (In Progress): nautilus: Change default for bluestore_fsck_on_mount_deep as false
https://github.com/ceph/ceph/pull/29697 Prashant D
04:11 PM Bug #41317 (Resolved): PeeringState::GoClean will call purge_strays unconditionally
http://pulpito.ceph.com/kchai-2019-08-15_06:56:12-rados-wip-kefu-testing-2019-08-15-1125-distro-basic-mira/4216150/
...
Samuel Just
03:36 PM Bug #41236: cosbench failures in rados/perf
The issue of cosbench not gathering test results is fixed by reverting https://github.com/ceph/cbt/pull/152.
Revert ...
Neha Ojha
12:39 PM Bug #41313 (New): PG distribution completely messed up since Nautilus
Since using Nautilus the data distribution is an absolute nightmare. We never had issues on Luminous.
Look at this p...
Anonymous
12:22 PM Bug #41250 (Pending Backport): osd/PrimaryLogPG: Access destroyed references in finish_degraded_o...
Kefu Chai
04:52 AM Bug #41190: osd: pg stuck in waitactingchange when new acting set doesn't change
The original log was too large, so I filtered it for the problem PG. Please check whether it is enough to analyze the problem. qiuzhang chen

08/15/2019

11:02 PM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
I have imaged one of the affected OSDs and uploaded a gzipped version of it in the hopes that direct inspection by a ... Troy Ablan
07:59 PM Backport #40537: nautilus: osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/29372
merged
Yuri Weinstein
09:11 AM Backport #41291 (Resolved): mimic: filestore pre-split may not split enough directories
https://github.com/ceph/ceph/pull/30182 Nathan Cutler
06:51 AM Bug #41236: cosbench failures in rados/perf
Neha, I reverted the cbt to ff779d212f5fb9bae6947952ac40e32308ceead5 and reran the cosbench tests. Tests were still g... Kefu Chai
06:25 AM Bug #41196: osd: there is no client app running but a watcher remains in OSD
Josh Durgin wrote:
> Hmm, could we be failing to update the oi or the cache of it when removing the watcher?
I re...
yu feng
05:50 AM Backport #40948 (In Progress): nautilus: Better default value for osd_snap_trim_sleep
https://github.com/ceph/ceph/pull/29678 Prashant D

08/14/2019

11:55 PM Bug #38893 (Resolved): RuntimeError: expected MON_CLOCK_SKEW but got none
Neha Ojha
11:54 PM Backport #40382 (Resolved): nautilus: RuntimeError: expected MON_CLOCK_SKEW but got none
Neha Ojha
11:17 PM Bug #40403: osd: rollforward may need to mark pglog dirty
Hi Nathan, https://github.com/ceph/ceph/pull/28621 is a follow-on fix to https://github.com/ceph/ceph/pull/27015, bot... Neha Ojha
09:30 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
Here's the output from ceph status:... Bryan Stillwell
09:21 PM Bug #41255 (Resolved): backfill_toofull seen on cluster where the most full OSD is at 1%
After upgrading both our test and staging clusters to Nautilus (14.2.2), we've seen both of them report some PGs as b... Bryan Stillwell
09:23 PM Bug #41196: osd: there is no client app running but a watcher remains in OSD
Hmm, could we be failing to update the oi or the cache of it when removing the watcher? Josh Durgin
09:14 PM Bug #41145 (Duplicate): osd: bad alloc exception
Josh Durgin
09:09 PM Bug #41254 (New): failed qa/workunits/cephtool/test.sh in rados
Run: http://pulpito.ceph.com/yuriw-2019-08-13_15:08:38-rados-wip-yuri2-testing-2019-08-12-1528-nautilus-distro-basic-... Yuri Weinstein
09:06 PM Bug #41253 (Resolved): "CMake Error" in test_envlibrados_for_rocksdb.sh
Run: http://pulpito.ceph.com/yuriw-2019-08-13_15:08:38-rados-wip-yuri2-testing-2019-08-12-1528-nautilus-distro-basic-... Yuri Weinstein
08:55 PM Backport #40655 (Resolved): nautilus: Lower the default value of osd_deep_scrub_large_omap_object...
Neha Ojha
08:55 PM Bug #40586 (Resolved): OSDs get killed by OOM due to a broken switch
Neha Ojha
08:54 PM Backport #40625 (Resolved): nautilus: OSDs get killed by OOM due to a broken switch
Neha Ojha
08:52 PM Bug #40636 (Resolved): os/bluestore: fix >2GB writes
Neha Ojha
08:51 PM Backport #40652 (Resolved): nautilus: os/bluestore: fix >2GB writes
Neha Ojha
08:50 PM Backport #40942 (Resolved): nautilus: mon/OSDMonitor.cc: better error message about min_size
Neha Ojha
08:50 PM Bug #40915 (Resolved): Update rocksdb to v6.1.2
Neha Ojha
08:49 PM Backport #40940 (Resolved): nautilus: Update rocksdb to v6.1.2
Neha Ojha
08:48 PM Bug #40969 (Resolved): rocksdb: enable rocksdb_rmrange=true by default and make delete range opti...
Neha Ojha
08:47 PM Backport #41092 (Resolved): nautilus: rocksdb: enable rocksdb_rmrange=true by default and make de...
https://github.com/ceph/ceph/pull/29439 Neha Ojha
07:55 PM Backport #40322 (In Progress): nautilus: nautilus with requrie_osd_release < nautilus cannot incr...
https://github.com/ceph/ceph/pull/29671 Neha Ojha
07:11 PM Bug #40117 (Fix Under Review): PG stuck in WaitActingChange
Neha Ojha
05:28 PM Bug #40117: PG stuck in WaitActingChange
PR: https://github.com/ceph/ceph/pull/29669
I've already submitted a PR to master.
qiuzhang chen
05:18 PM Bug #41190: osd: pg stuck in waitactingchange when new acting set doesn't change
new PR: https://github.com/ceph/ceph/pull/29668# qiuzhang chen
04:00 PM Bug #41250 (Fix Under Review): osd/PrimaryLogPG: Access destroyed references in finish_degraded_o...
Kefu Chai
03:51 PM Bug #41250 (Resolved): osd/PrimaryLogPG: Access destroyed references in finish_degraded_object
see https://github.com/ceph/ceph/pull/29663
trace:...
tao ning
06:09 AM Backport #41238 (In Progress): nautilus: Implement mon_memory_target
Sridhar Seshasayee
04:47 AM Backport #41238 (Resolved): nautilus: Implement mon_memory_target
https://github.com/ceph/ceph/pull/30419 Sridhar Seshasayee
05:35 AM Bug #41240: All of the cluster SSDs aborted at around the same time and will not start.
Additional info that may be useful: the rgw instance has over 1B objects spread across about 66k buckets. Troy Ablan
05:33 AM Bug #41240 (Can't reproduce): All of the cluster SSDs aborted at around the same time and will no...
There are 14 SSDs and a few hundred HDDs in this cluster.
The SSDs all crashed at around the same time, and they a...
Troy Ablan
04:00 AM Bug #41236: cosbench failures in rados/perf
rerunning the tests with more verbose logging.
http://pulpito.ceph.com/kchai-2019-08-14_03:56:19-perf-basic-maste...
Kefu Chai
02:59 AM Bug #41236 (Resolved): cosbench failures in rados/perf
Starts here:... Neha Ojha

08/13/2019

06:28 PM Documentation #40643: clearify begin hour + end hour
By looking at src/osd/OSD.cc I have concluded that crossing midnight is allowed. Torben Hørup
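The midnight-crossing behaviour noted above can be illustrated with a small sketch. This is not the code from src/osd/OSD.cc; the function name `hour_in_window` and the choice to treat equal bounds as "always allowed" are assumptions made for illustration only.

```python
def hour_in_window(hour: int, begin: int, end: int) -> bool:
    """Return True if `hour` lies in the half-open window [begin, end).

    Hypothetical helper, not Ceph's implementation. When begin > end the
    window is taken to wrap around midnight (e.g. 22 -> 6 covers 22, 23,
    0, ..., 5). Equal bounds are assumed to mean "no restriction".
    """
    if begin == end:
        # Assumption: an empty-looking window means the whole day is allowed.
        return True
    if begin < end:
        # Ordinary window inside a single day.
        return begin <= hour < end
    # Window wraps past midnight: late evening OR early morning.
    return hour >= begin or hour < end


if __name__ == "__main__":
    # A window from 22:00 to 06:00 crosses midnight.
    for h in (21, 22, 23, 0, 5, 6, 12):
        print(h, hour_in_window(h, 22, 6))
```

With a begin hour later than the end hour, the check accepts hours on both sides of midnight and rejects the daytime hours in between.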
04:50 AM Bug #41217 (Fix Under Review): mon: C_AckMarkedDown has not handled the Callback Arguments
Kefu Chai
03:00 AM Bug #41217 (Resolved): mon: C_AckMarkedDown has not handled the Callback Arguments
The mon receives an MOSDMarkMeDown msg and begins to handle it; the proposal is not yet in progress. The mon begins to elect for other re... shuguang wang
01:36 AM Backport #40943 (In Progress): mimic: mon/OSDMonitor.cc: better error message about min_size
https://github.com/ceph/ceph/pull/29618 Prashant D
01:33 AM Backport #40942 (In Progress): nautilus: mon/OSDMonitor.cc: better error message about min_size
https://github.com/ceph/ceph/pull/29617 Prashant D
01:25 AM Bug #41214 (Duplicate): Segmentation fault in mon/test_config_key_caps.sh
Brad Hubbard
12:59 AM Bug #38406 (Resolved): osd/TestPGLog.cc: Verify that dup_index is being trimmed
Neha Ojha
12:58 AM Backport #38424 (Resolved): mimic: osd/TestPGLog.cc: Verify that dup_index is being trimmed
Neha Ojha
12:57 AM Bug #39546: Warning about past_interval bounds on deleting pg
/kchai-2019-08-12_14:45:48-rados-wip-kefu-testing-2019-08-12-1306-distro-basic-mira/4211285/ Kefu Chai
12:55 AM Bug #24361 (Resolved): auto compaction on rocksdb should kick in more often
Neha Ojha

08/12/2019

11:40 PM Bug #37329 (Resolved): doc: Add bluestore memory autotuning docs
Neha Ojha
11:40 PM Backport #37340 (Resolved): mimic: doc: Add bluestore memory autotuning docs
https://github.com/ceph/ceph/pull/25283/commits/b76c828113f56b28b398abdee00a2499ea749ce4 merged in v13.2.3 Neha Ojha
10:30 PM Backport #40940: nautilus: Update rocksdb to v6.1.2
Neha Ojha wrote:
> https://github.com/ceph/ceph/pull/29440
merged
Yuri Weinstein
10:29 PM Bug #40969: rocksdb: enable rocksdb_rmrange=true by default and make delete range optional on num...
Neha Ojha wrote:
> nautilus backport: https://github.com/ceph/ceph/pull/29439
merged
Yuri Weinstein
09:57 PM Bug #41198 (Fix Under Review): Resolved a problem where too many requests on an object caused OSD...
Neha Ojha
09:32 AM Bug #41198 (Fix Under Review): Resolved a problem where too many requests on an object caused OSD...
When there is a large number of skip writes on an object, and each time only a few bytes are written, the OC will sen... 侯 斌
09:51 PM Bug #41210 (Fix Under Review): osd: failure result of do_osd_ops not logged in prepare_transactio...
Neha Ojha
02:26 PM Bug #41210 (Resolved): osd: failure result of do_osd_ops not logged in prepare_transaction function
shuguang wang
09:19 PM Bug #41214 (Duplicate): Segmentation fault in mon/test_config_key_caps.sh
... Neha Ojha
06:04 PM Bug #41211 (Need More Info): osd crash due to rocksdb
Reported in "[ceph-users] Possibly a bug on rocksdb" on 12.2.12... Neha Ojha
10:42 AM Bug #41200 (Resolved): osd: fix ceph_assert(mem_avail >= 0) caused by the unset cgroup memory limit
If cgroup memory.limit_in_bytes is unset, its default value is ... mingshuai wang
07:49 AM Bug #41196 (New): osd: there is no client app running but a watcher remains in OSD
We are running Ceph 13.2.5 on CentOS Linux 7.5.1804, and hit an issue with the watcher.
For the image fAEYXC in p...
yu feng
07:01 AM Backport #40884 (In Progress): mimic: ceph mgr module ls -f plain crashes mon
https://github.com/ceph/ceph/pull/29593 Prashant D
06:48 AM Feature #40870 (Pending Backport): Implement mon_memory_target
The PR 28227 is merged into master. Need to backport this to Nautilus. Sridhar Seshasayee

08/09/2019

05:13 PM Bug #41190: osd: pg stuck in waitactingchange when new acting set doesn't change
PR: https://tracker.ceph.com/issues/40117 may have the same problem qiuzhang chen
05:02 PM Bug #41190: osd: pg stuck in waitactingchange when new acting set doesn't change
Pull request: OSD/PG: Fix pg stuck in waitactingchange #29580 qiuzhang chen
03:41 PM Bug #41190 (Resolved): osd: pg stuck in waitactingchange when new acting set doesn't change
In the pg GetLog state, when processing choose_acting, if want does not equal acting, it will request pg_temp from the mon, and then po... qiuzhang chen
04:52 PM Bug #41191 (Resolved): osd: scrub error on big objects; make bluestore refuse to start on big obj...
Neha Ojha
02:36 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
Florian Haas wrote:
> So we're seeing a similar error after upgrading a Jewel cluster to Luminous (12.2.12), and the...
Sage Weil
12:52 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
So we're seeing a similar error after upgrading a Jewel cluster to Luminous (12.2.12), and then adding BlueStore OSDs... Florian Haas
12:57 PM Backport #39694: luminous: _txc_add_transaction error (39) Directory not empty not handled on ope...
Just adding a note here to say that we've encountered this problem (or a very similar one, see https://tracker.ceph.c... Florian Haas
10:26 AM Bug #41183 (Resolved): pg autoscale on EC pools
The pg_autoscaler plugin wants to seriously increase num_pg on my EC pool from 8192 to 65536, but it seems it doesn't... imirc tw
05:30 AM Backport #40885 (In Progress): nautilus: ceph mgr module ls -f plain crashes mon
https://github.com/ceph/ceph/pull/29566 Prashant D

08/08/2019

08:53 PM Bug #39159 (Resolved): qa: Fix ambiguous store_thrash thrash_store in mon_thrash.py
I don't think this needs a backport... ? Sage Weil
03:25 PM Bug #36304 (New): FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_ch...
... Kefu Chai
07:54 AM Bug #41159 (New): TableFormatter doesn't work for a single column

Adding the code below to src/test/common/test_tableformatter.cc dumps core upon flush(). ...
David Zafman
02:38 AM Bug #41150 (Resolved): osd-scrub-test.sh: TEST_interval_changes failure
David Zafman

08/07/2019

11:56 PM Bug #41156 (In Progress): dump_float() poor output
David Zafman
10:22 PM Bug #41156 (Rejected): dump_float() poor output

dump_float("15min", 810 / 1000.0) outputs "15min": 0.8100000000000001
This was introduced in https://github.com/...
David Zafman
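The output above is consistent with ordinary binary floating-point behaviour. The sketch below reproduces it in Python, under the assumption (not stated in the report) that the value was printed with 16 significant digits; it does not use Ceph's formatter.

```python
# 810 / 1000.0 is the double nearest to 0.81, which is slightly above 0.81.
value = 810 / 1000.0

# Shortest representation that still round-trips to the same double.
shortest = repr(value)

# Fixed numbers of significant digits (assumed here for illustration).
sixteen_digits = "%.16g" % value
fifteen_digits = "%.15g" % value

print(shortest)        # 0.81
print(sixteen_digits)  # 0.8100000000000001
print(fifteen_digits)  # 0.81
```

Printing 15 significant digits, or the shortest round-tripping form, hides the representation error, while 16 digits exposes it.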
04:12 PM Bug #41154 (New): osd: pg unknown state
Hello. Yesterday my cluster went crazy and a zeroized action was sent for one pg.
osd.119 pg_epoch: 79413 pg[15.7c1( v 794...
Alexander Kazansky
08:16 AM Bug #41150 (Resolved): osd-scrub-test.sh: TEST_interval_changes failure
From http://pulpito.ceph.com/nojha-2019-08-07_01:40:44-rados-wip-lower-bfs-alloc-size-distro-basic-smithi/4190093/
...
Josh Durgin
08:12 AM Bug #41149 (New): LibRadosTwoPoolsPP.ManifestUnset failed with -22
From http://pulpito.ceph.com/nojha-2019-08-07_00:01:13-rados-wip-bluestore-monitor-allocations-distro-basic-smithi/41... Josh Durgin
12:02 AM Bug #40765 (In Progress): mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/m...
Brad Hubbard

08/06/2019

08:59 PM Bug #41145 (Duplicate): osd: bad alloc exception
... Patrick Donnelly
04:04 PM Bug #24531: Mimic MONs have slow/long running ops
I am encountering similar issues on a cluster with all daemons running... Theo O
03:07 AM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
Correcting my statement in comment #12.... Brad Hubbard

08/05/2019

11:04 PM Backport #41092 (Resolved): nautilus: rocksdb: enable rocksdb_rmrange=true by default and make de...
https://github.com/ceph/ceph/pull/29439 Patrick Donnelly
11:03 PM Backport #41086 (Resolved): mimic: Change default for bluestore_fsck_on_mount_deep as false
https://github.com/ceph/ceph/pull/29699 Patrick Donnelly
11:03 PM Backport #41085 (Rejected): luminous: Change default for bluestore_fsck_on_mount_deep as false
Patrick Donnelly
11:03 PM Backport #41084 (Resolved): nautilus: Change default for bluestore_fsck_on_mount_deep as false
https://github.com/ceph/ceph/pull/29697 Patrick Donnelly
06:47 PM Bug #41077 (New): The expected_num_objects parameter when creating the pool. Is it still needed w...
# ceph osd pool create testpool 4096
Error ERANGE: For better initial performance on pools expected to store a large...
Vikhyat Umrao
08:16 AM Bug #41065: new osd added to cluster upgraded from 13 to 14 will down after some days
log in per osd... hoan nv
04:09 AM Bug #41065 (Closed): new osd added to cluster upgraded from 13 to 14 will down after some days
Hi all.
My Ceph cluster was upgraded from 13.2.5 to 14.2.2.
I did not enable mgr v2 and added 2 new mon....
hoan nv
06:44 AM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
I was discussing this with Josh Durgin and he noticed the reuse of the bufferlist for the read and mentioned we'd had... Brad Hubbard
05:36 AM Bug #41064: OSD: assert(objiter->second->version > last_divergent_update) fails when there is onl...
https://github.com/ceph/ceph/pull/29480 Xuehan Xu
04:07 AM Bug #41064 (New): OSD: assert(objiter->second->version > last_divergent_update) fails when there ...
Recently, some OSDs in one of our clusters failed to start after a power outage.
One OSD's log is as follows:
<pr...
Xuehan Xu

08/02/2019

08:18 PM Backport #39516: nautilus: osd-backfill-space.sh test failed in TEST_backfill_multi_partial()
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28187
merged
Yuri Weinstein
08:11 PM Backport #40625: nautilus: OSDs get killed by OOM due to a broken switch
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/29391
merged
Yuri Weinstein
06:51 PM Bug #41017 (Pending Backport): Change default for bluestore_fsck_on_mount_deep as false
Neha Ojha
06:14 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
That OSD failure seems to have caused a cascade. Several more OSDs have crashed. 12% of objects were degraded, and I ... Nathan Fish
04:34 PM Bug #38724: _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1...
Just lost another OSD on 14.2.2. The cluster is still mostly empty, but a large parallel cp is going on, so rebuild w... Nathan Fish
12:20 AM Bug #41052 (Resolved): nautilus: cbt cosbench workloads failing in rados/perf suite
Neha Ojha

08/01/2019

11:15 PM Bug #41052 (Fix Under Review): nautilus: cbt cosbench workloads failing in rados/perf suite
Neha Ojha
10:28 PM Bug #41052: nautilus: cbt cosbench workloads failing in rados/perf suite
I think we need to backport https://github.com/ceph/ceph/pull/28442 to nautilus since master seems to be fine http://... Neha Ojha
10:06 PM Bug #41052 (Resolved): nautilus: cbt cosbench workloads failing in rados/perf suite
Run: http://pulpito.ceph.com/yuriw-2019-07-31_23:02:48-rados-wip-yuri6-testing-2019-07-31-1929-nautilus-distro-basic-... Yuri Weinstein
10:08 PM Backport #40180: nautilus: qa/standalone/scrub/osd-scrub-snaps.sh sometimes fails
David Zafman wrote:
> https://github.com/ceph/ceph/pull/29252
merged
Yuri Weinstein
04:18 PM Bug #40835 (In Progress): OSDCap.PoolClassRNS test aborts
Kefu Chai
03:22 PM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
Suspect also in => http://pulpito.ceph.com/teuthology-2019-08-01_02:25:03-upgrade:luminous-x-mimic-distro-basic-smithi/ Yuri Weinstein
05:41 AM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
Simplified test case.... Brad Hubbard
03:42 AM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
Ok, slowly closing in on this I think.... Brad Hubbard
12:12 AM Backport #40940 (In Progress): nautilus: Update rocksdb to v6.1.2
https://github.com/ceph/ceph/pull/29440 Neha Ojha

07/31/2019

11:55 PM Bug #40969: rocksdb: enable rocksdb_rmrange=true by default and make delete range optional on num...
nautilus backport: https://github.com/ceph/ceph/pull/29439 Neha Ojha
11:50 PM Bug #40969: rocksdb: enable rocksdb_rmrange=true by default and make delete range optional on num...
let's backport https://github.com/ceph/ceph/pull/27317 and https://github.com/ceph/ceph/pull/29323 Neha Ojha
11:15 PM Backport #40465: nautilus: osd beacon sometimes has empty pg list
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/29254
merged
Yuri Weinstein
11:13 PM Backport #39743: nautilus: mon: "FAILED assert(pending_finishers.empty())" when paxos restart
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28528
merged
Yuri Weinstein
11:12 PM Backport #40382: nautilus: RuntimeError: expected MON_CLOCK_SKEW but got none
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/28576
merged
Yuri Weinstein
11:09 PM Backport #40274: nautilus: librados 'buffer::create' and related functions are not exported in C+...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/29244
merged
Yuri Weinstein
09:30 PM Bug #40820: standalone/scrub/osd-scrub-test.sh +3 day failed assert
Can the test be improved so it doesn't assume maps propagate in a certain short time period, but waits for the releva... Josh Durgin
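One way to avoid assuming a fixed propagation time is to poll for the condition up to a deadline. The helper below is a generic sketch of that idea, not code from the Ceph test suite: wait_for and its parameters are hypothetical names, and the clock and sleep functions are injectable only so the helper can be exercised without real delays.

```python
import time


def wait_for(condition, timeout=30.0, interval=0.5,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the deadline passed.
    Hypothetical helper for illustration only.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    attempts = {"n": 0}

    def map_has_propagated():
        # Stand-in for a real check; succeeds on the third poll.
        attempts["n"] += 1
        return attempts["n"] >= 3

    print(wait_for(map_has_propagated, timeout=5.0, interval=0.01))
```

A test written this way fails only when the condition never becomes true within the timeout, rather than whenever propagation takes longer than a hard-coded sleep.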
03:31 PM Bug #40720: mimic, nautilus: make bitmap allocator the default allocator for bluestore
Neha Ojha wrote:
> Luminous PR: https://github.com/ceph/ceph/pull/28972
merged
Yuri Weinstein
03:03 AM Feature #40955: Extend the scrub sleep time when the period is outside [osd_scrub_begin_hour, osd...
Updated logic:... Jeegn Chen
03:00 AM Feature #40955: Extend the scrub sleep time when the period is outside [osd_scrub_begin_hour, osd...
One more scenario from dzafman's comment:
@Jeegn-Chen Another way a scrub could happen even with scrub_time_permit() r...
Jeegn Chen
12:58 AM Bug #40765: mimic: "Command failed (workunit test rados/test.sh)" in smoke/master/mimic
Client:... Brad Hubbard
12:12 AM Bug #41017 (Fix Under Review): Change default for bluestore_fsck_on_mount_deep as false
Neha Ojha
12:09 AM Bug #41017: Change default for bluestore_fsck_on_mount_deep as false
Neha - as discussed I have created the tracker and assigned it to you. Vikhyat Umrao
12:08 AM Bug #41017 (Resolved): Change default for bluestore_fsck_on_mount_deep as false
RHBZ - https://bugzilla.redhat.com/show_bug.cgi?id=1734585 Vikhyat Umrao
 
