Activity

From 12/05/2019 to 01/03/2020

01/03/2020

11:54 PM Bug #43421 (Fix Under Review): mon spends too much time to build incremental osdmap
Neha Ojha
10:09 AM Bug #43421: mon spends too much time to build incremental osdmap
It takes 5 seconds to build 640 incremental osdmaps for one client. simon gao
08:15 AM Bug #43421: mon spends too much time to build incremental osdmap
Sorry, it took 5 seconds. simon gao
11:49 PM Bug #43185 (Need More Info): ceph -s not showing client activity
super xor wrote:
> Possible relation to https://tracker.ceph.com/issues/43364 and https://tracker.ceph.com/issues/43...
Neha Ojha
10:48 PM Bug #43311 (Pending Backport): asynchronous recovery + backfill might spin pg undersized for a lo...
Neha Ojha
09:01 PM Feature #40870: Implement mon_memory_target
Another follow-on fix: https://github.com/ceph/ceph/pull/32473 Neha Ojha
09:00 PM Bug #43454 (Fix Under Review): ceph monitor crashes after updating 'mon_memory_target' config set...
Neha Ojha
08:24 AM Bug #43454 (Resolved): ceph monitor crashes after updating 'mon_memory_target' config setting.
Refer bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1760257 for more details. Sridhar Seshasayee
08:06 PM Backport #42197: nautilus: osd/PrimaryLogPG.cc: 13068: FAILED ceph_assert(obc)
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31028
merged
Yuri Weinstein
04:39 PM Bug #43334: nautilus: rados/test_envlibrados_for_rocksdb.sh broken packages with ubuntu_16.04.yaml
/a/yuriw-2019-12-23_20:23:51-rados-wip-yuri-testing-2019-12-16-2241-nautilus-distro-basic-smithi/4628899/ Neha Ojha

01/02/2020

03:41 PM Bug #43403: unittest_lockdep unreliable
Happened in https://github.com/ceph/ceph/pull/27792 (among others) Nathan Cutler

01/01/2020

11:01 AM Documentation #42315: Improve rados command usage, man page and turorial
RADOS(8) Ceph RADOS(8)
NAME
rados - rados object s...
Zac Dover
10:52 AM Documentation #42315: Improve rados command usage, man page and turorial
[zdover@192-168-1-112 ~]$ rados -h
usage: rados [options] [commands]
POOL COMMANDS
lspools ...
Zac Dover

12/25/2019

03:24 PM Bug #43422 (Resolved): qa/standalone/mon/osd-pool-create.sh fails to grep utf8 pool name
... Sage Weil
12:33 PM Bug #43421: mon spends too much time to build incremental osdmap
In my cluster, it took five minutes to build 1300 versions of the incremental osdmap.
patch: https://github.com/ceph/ceph/...
simon gao
09:49 AM Bug #43421 (Fix Under Review): mon spends too much time to build incremental osdmap
If a client's osdmap version is too low, the mon spends too much time building incremental osdmaps.
Mon can't handle norma...
simon gao

12/24/2019

05:03 AM Bug #43308 (Pending Backport): negative num_objects can set PG_STATE_DEGRADED
Kefu Chai
05:02 AM Bug #42780 (Pending Backport): recursive lock of OpTracker::lock (70)
Kefu Chai
01:53 AM Bug #43413 (New): Virtual IP address of iface lo results in failing to start an OSD
We added a virtual IP on the loopback interface lo to complete the LVS configuration.... gb li

12/23/2019

11:54 PM Bug #43412 (Resolved): cephadm ceph_manager IndexError: list index out of range
... Sage Weil
08:26 PM Backport #43140: nautilus: ceph-mon --mkfs: public_address type (v1|v2) is not respected
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32028
merged
Reviewed-by: Ricardo Dias <rdias@suse.com>
Yuri Weinstein
02:18 PM Bug #43174: pgs inconsistent, union_shard_errors=missing
Hi David.
> Are you running your own Ceph build?
No, we use the official (community) build.
> Sortbitwise needed to...
Aleksandr Rudenko

12/21/2019

03:06 PM Bug #43404 (Resolved): mon crash in OSDMap::_pg_to_raw_osds from update_pending_pgs
... Sage Weil

12/20/2019

11:39 PM Bug #42328 (Resolved): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
I can't check the original reports (logs have been removed), but assuming it's the same root cause PR #32382 5bb932c3... Samuel Just
01:31 AM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
I observed something similar on a ceph_test_rados teuthology run: sjust-2019-12-19_20:05:13-rados-wip-sjust-read-from... Samuel Just
11:37 PM Bug #43394 (Resolved): crimson::dmclock segv in crimson::IndIntruHeap
Should be fixed with PR #32380 2c9542901532feafd569d92e9f67ccd2e1af3129 Samuel Just
08:53 PM Bug #43403 (Resolved): unittest_lockdep unreliable
... Sage Weil
08:22 AM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
Hi David:
Good to know the bug is indeed fixed ... too bad it didn't make it in 13.2.8. Anyways ... building patch...
Stefan Kooman
04:50 AM Bug #38345 (In Progress): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Brad Hubbard
01:50 AM Bug #43174: pgs inconsistent, union_shard_errors=missing

Scrub incorrectly thinks the object really isn't there, but we know it is.
The way that you can see missing obje...
David Zafman

12/19/2019

11:57 PM Bug #42780 (Fix Under Review): recursive lock of OpTracker::lock (70)
https://github.com/ceph/ceph/pull/32364 Radoslaw Zarzynski
12:09 PM Bug #42780 (In Progress): recursive lock of OpTracker::lock (70)
Radoslaw Zarzynski
10:30 PM Bug #43307 (Fix Under Review): Remove use of rules batching for upmap balancer
David Zafman
10:27 PM Bug #43397 (Resolved): FS_DEGRADED to cluster log despite --no-mon-health-to-clog
... Sage Weil
09:38 PM Bug #43394 (Resolved): crimson::dmclock segv in crimson::IndIntruHeap
... Sage Weil
07:06 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
A backport to Mimic of the fix can be found here:
https://github.com/ceph/ceph/pull/32361
Or if you can build fro...
David Zafman
02:34 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
We added a CRUSH policy (replicated_nvme) and set this policy on our cephfs metadata pool (with 1.2 billion objects) a... Stefan Kooman
07:02 PM Backport #41584 (In Progress): mimic: backfill_toofull seen on cluster where the most full OSD is...
David Zafman
02:29 PM Bug #43306: segv in collect_sys_info
Neha Ojha wrote:
> This looks similar to https://tracker.ceph.com/issues/38296, though the mon seems to have been up...
Nathan Cutler
02:22 PM Backport #39474 (In Progress): luminous: segv in fgets() in collect_sys_info reading /proc/cpuinfo
Nathan Cutler
02:18 PM Bug #41383 (Resolved): scrub object count mismatch on device_health_metrics pool
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:14 PM Backport #42739 (Resolved): nautilus: scrub object count mismatch on device_health_metrics pool
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31735
m...
Nathan Cutler
07:39 AM Bug #43382: medium io/system load causes quorum failure
Or due to limited bandwidth? 10G NICs dedicated. Anonymous
07:36 AM Bug #43382 (New): medium io/system load causes quorum failure
We just found out that if you put some io pressure on your system by e.g. big rsync, the mon process has issues proba... Anonymous
05:44 AM Bug #43126 (Fix Under Review): OSD_SLOW_PING_TIME_BACK nits
David Zafman
02:20 AM Bug #43318: monitor mark all services(osd mgr) down
mgr has no log when setting the debug_mgr to 40. simon gao

12/18/2019

10:31 PM Bug #43193 (Need More Info): "ceph ping mon.<id>" cannot work
Can you provide the sequence of commands that fail? Also, please attach the monitor names and monmap. Neha Ojha
10:25 PM Bug #43305 (Won't Fix): "psutil.NoSuchProcess process no longer exists" error in luminous-x-nauti...
This is an infra issue.... Neha Ojha
10:23 PM Bug #43306: segv in collect_sys_info
This looks similar to https://tracker.ceph.com/issues/38296, though the mon seems to have been upgraded to nautilus(w... Neha Ojha
10:17 PM Bug #43318 (Need More Info): monitor mark all services(osd mgr) down
Can you provide mgr logs from when this happened? Neha Ojha
10:12 PM Feature #43377 (Resolved): Make Zstandard compression level a configurable option
I've played with using the different compression algorithms on the RGWs and the default compression level for Zstanda... Bryan Stillwell
07:38 PM Backport #42739: nautilus: scrub object count mismatch on device_health_metrics pool
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31735
merged
Yuri Weinstein
03:53 PM Backport #43316 (Resolved): nautilus:wrong datatype describing crush_rule
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32254
m...
Nathan Cutler
12:11 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
So it's asserting inside of to_timespan, and the Paxos code triggering that assert is
> auto start = ceph::coarse_...
Greg Farnum
12:03 PM Bug #43365 (Resolved): Nautilus: Random mon crashes in failed assertion at ceph::time_detail::sig...
Thanks to the 14.2.5 automatic warning for recent crashes, we are observing frequent (roughly daily) random crashes of... Alex Walender
09:35 AM Bug #43185: ceph -s not showing client activity
Possible relation to https://tracker.ceph.com/issues/43364 and https://tracker.ceph.com/issues/43317 Anonymous

12/17/2019

05:39 PM Bug #43308 (Fix Under Review): negative num_objects can set PG_STATE_DEGRADED
Neha Ojha
09:19 AM Backport #43346 (Resolved): nautilus: short pg log + cache tier ceph_test_rados out of order reply
https://github.com/ceph/ceph/pull/32848 Nathan Cutler
06:47 AM Bug #41950 (Can't reproduce): crimson compile
Kefu Chai
06:46 AM Bug #41950: crimson compile
I assume that you were trying to compile crimson-osd, not crimson-old. Please check the submodule of seastar to unders... Kefu Chai

12/16/2019

10:36 PM Bug #43296 (Need More Info): Ceph assimilate-conf results in config entries which can not be removed
Can you attach the (relevant) output from "ceph config-key dump | grep config"? I think the keys are being installed... Sage Weil
10:22 PM Bug #43296: Ceph assimilate-conf results in config entries which can not be removed
Might be related to #42964? Patrick Donnelly
10:06 PM Bug #43334 (Resolved): nautilus: rados/test_envlibrados_for_rocksdb.sh broken packages with ubunt...
Run: http://pulpito.ceph.com/yuriw-2019-12-15_16:25:11-rados-wip-yuri-nautilus-baseline_12.13.19-distro-basic-smithi/... Yuri Weinstein
08:36 PM Bug #38358 (Pending Backport): short pg log + cache tier ceph_test_rados out of order reply
Seen in nautilus: /a/yuriw-2019-12-15_16:25:11-rados-wip-yuri-nautilus-baseline_12.13.19-distro-basic-smithi/4605500/ Neha Ojha
12:40 PM Bug #43174 (New): pgs inconsistent, union_shard_errors=missing
Hmm this may be something else then. David, does it look familiar? Greg Farnum
08:40 AM Feature #43324: Make zlib windowBits configurable for compression
Xiyuan Wang wrote:
> Now the zlib windowBits is hardcoding as -15[1]. But it should be set to different value for di...
Xiyuan Wang
03:38 AM Feature #43324 (Resolved): Make zlib windowBits configurable for compression
Currently the zlib windowBits is hardcoded as -15[1], but it should be set to different values for different cases.
Accor...
Xiyuan Wang
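
For context on what windowBits controls, here is a minimal sketch using Python's standard zlib module. It is not Ceph's ZlibCompressor code; the helper name `deflate` and the sample payload are made up for illustration. A negative value such as -15 produces a raw deflate stream, 15 adds a zlib header and trailer, and 31 (16 + 15) adds a gzip wrapper.

```python
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data with the given windowBits value (hypothetical helper)."""
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


payload = b"ceph " * 1000

raw = deflate(payload, -15)     # raw deflate stream: no header or trailer
wrapped = deflate(payload, 15)  # zlib header and Adler-32 trailer
gzipped = deflate(payload, 31)  # gzip header and CRC-32 trailer (16 + 15)

# The reader has to use a matching windowBits value to decode each stream.
restored_raw = zlib.decompress(raw, wbits=-15)
restored_wrapped = zlib.decompress(wrapped, wbits=15)
restored_gzipped = zlib.decompress(gzipped, wbits=31)
```

Because the three streams are not interchangeable, a configurable value would let the writer match whatever format the consumer expects.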
07:27 AM Backport #43325 (In Progress): luminous: wrong datatype describing crush_rule
Deepika Upadhyay
07:24 AM Backport #43325 (New): luminous: wrong datatype describing crush_rule
Deepika Upadhyay
07:24 AM Backport #43325 (Resolved): luminous: wrong datatype describing crush_rule
https://github.com/ceph/ceph/pull/32267 Deepika Upadhyay

12/15/2019

10:04 PM Documentation #41389 (Pending Backport): wrong datatype describing crush_rule
Nathan Cutler
03:55 PM Bug #38076 (Resolved): osds allows to partially start more than N+2
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:53 PM Feature #40528 (Resolved): Better default value for osd_snap_trim_sleep
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:53 PM Backport #43320 (Resolved): mimic: PeeringState::GoClean will call purge_strays unconditionally
https://github.com/ceph/ceph/pull/33329 Nathan Cutler
03:53 PM Backport #43319 (Resolved): nautilus: PeeringState::GoClean will call purge_strays unconditionally
https://github.com/ceph/ceph/pull/32847 Nathan Cutler
01:27 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
Looking at the historical test runs, it seems to have started after [1] but before [2].
[1] http://pulpito.ceph.co...
Jason Dillaman
01:30 AM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
http://qa-proxy.ceph.com/teuthology/teuthology-2019-12-02_02:01:02-rbd-master-distro-basic-smithi/4559106/teuthology.log Jason Dillaman
01:29 AM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
http://qa-proxy.ceph.com/teuthology/jdillaman-2019-12-14_17:15:11-rbd-wip-jd-testing-distro-basic-smithi/4603518/teut... Jason Dillaman
06:55 AM Bug #43318 (Need More Info): monitor mark all services(osd mgr) down
Suddenly, all mgrs and osds in my cluster were marked down by the monitor.
The monitor log looks like this:
```
...
simon gao

12/14/2019

08:28 AM Documentation #41389 (In Progress): wrong datatype describing crush_rule
Deepika Upadhyay
07:21 AM Documentation #41389 (Pending Backport): wrong datatype describing crush_rule
Deepika Upadhyay
02:42 AM Documentation #41389: wrong datatype describing crush_rule
Just needs a cherry-pick of 3ed3de6c964ba998d5b18ceb997d1a6dffe355db Neha Ojha
08:26 AM Backport #43315 (In Progress): mimic:wrong datatype describing crush_rule
Deepika Upadhyay
08:02 AM Backport #43315 (Resolved): mimic:wrong datatype describing crush_rule
https://github.com/ceph/ceph/pull/32255 Deepika Upadhyay
08:24 AM Backport #43316 (In Progress): nautilus:wrong datatype describing crush_rule
Deepika Upadhyay
08:03 AM Backport #43316 (Resolved): nautilus:wrong datatype describing crush_rule
https://github.com/ceph/ceph/pull/32254 Deepika Upadhyay
02:50 AM Bug #43307 (In Progress): Remove use of rules batching for upmap balancer
David Zafman
02:49 AM Bug #43312 (In Progress): Change default upmap_max_deviation to 5
David Zafman
02:06 AM Bug #43312 (Resolved): Change default upmap_max_deviation to 5
David Zafman
12:24 AM Bug #43311 (Resolved): asynchronous recovery + backfill might spin pg undersized for a long time
When an osd that is part of current up set gets chosen as an
async_recovery_target, it gets removed from the acting ...
xie xingguo
12:16 AM Bug #43308 (In Progress): negative num_objects can set PG_STATE_DEGRADED
Neha Ojha

12/13/2019

08:40 PM Bug #40963 (Resolved): mimic: MQuery during Deleting state
Sage Weil
08:40 PM Bug #41317 (Pending Backport): PeeringState::GoClean will call purge_strays unconditionally
Sage Weil
07:47 PM Bug #43308 (Resolved): negative num_objects can set PG_STATE_DEGRADED
... Neha Ojha
07:05 PM Bug #43296: Ceph assimilate-conf results in config entries which can not be removed
Alwin from Proxmox provided a workaround, but this still appears to be a bug:
https://forum.proxmox.com/threads/ceph...
David Herselman
04:51 PM Bug #43296: Ceph assimilate-conf results in config entries which can not be removed
Setting debug_rdb to 5/5 unfortunately doesn't reveal anything:
Commands:...
David Herselman
03:37 AM Bug #43296 (Resolved): Ceph assimilate-conf results in config entries which can not be removed
We assimilated our Ceph configuration file and subsequently have a minimal config file. We are subsequently not able ... David Herselman
04:31 PM Bug #43307 (Resolved): Remove use of rules batching for upmap balancer

Due to cost of calculations for very large PG/shard counts, we will settle for balancing each pool individually for...
David Zafman
03:43 PM Bug #25174 (Can't reproduce): osd: assert failure with FAILED assert(repop_queue.front() == repop...
Neha Ojha
02:43 PM Bug #43306 (Resolved): segv in collect_sys_info
Run: http://pulpito.ceph.com/teuthology-2019-12-13_02:25:03-upgrade:luminous-x-nautilus-distro-basic-smithi/
Job: '4...
Yuri Weinstein
02:40 PM Bug #43305 (Won't Fix): "psutil.NoSuchProcess process no longer exists" error in luminous-x-nauti...
Run: http://pulpito.ceph.com/teuthology-2019-12-13_02:25:03-upgrade:luminous-x-nautilus-distro-basic-smithi/
Jobs: '...
Yuri Weinstein
08:23 AM Backport #42259 (Resolved): nautilus: document new option mon_max_pg_per_osd
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31300
m...
Nathan Cutler
08:22 AM Backport #40947 (Resolved): luminous: Better default value for osd_snap_trim_sleep
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31857
m...
Nathan Cutler
08:22 AM Backport #38205 (Resolved): luminous: osds allows to partially start more than N+2
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31858
m...
Nathan Cutler
08:22 AM Backport #43093 (Resolved): luminous: Improve OSDMap::calc_pg_upmaps() efficiency
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31992
m...
Nathan Cutler
06:17 AM Bug #40712: ceph-mon crash with assert(err == 0) after rocksdb->get
We hit this problem recently.
We are inclined to think it is related more to rocksdb than to ceph.
huang jun

12/12/2019

04:41 PM Backport #40947: luminous: Better default value for osd_snap_trim_sleep
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31857
merged
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Yuri Weinstein
04:41 PM Backport #38205: luminous: osds allows to partially start more than N+2
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31858
merged
Yuri Weinstein
04:40 PM Backport #43093: luminous: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman wrote:
> https://github.com/ceph/ceph/pull/31992
merged
Yuri Weinstein
10:16 AM Bug #43174: pgs inconsistent, union_shard_errors=missing
Greg thanks for the reply.
Greg Farnum wrote:
> If you fetch an object in RGW and its backing RADOS objects are m...
Aleksandr Rudenko
09:41 AM Bug #38330 (Resolved): osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:23 AM Backport #43119 (Resolved): mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32000
m...
Nathan Cutler
08:44 AM Bug #43193: "ceph ping mon.<id>" cannot work
The command "ceph ping mon.a" or "ceph ping mon.b" or "ceph ping mon.c" works fine.
If the mon id is not specified, ...
Min Shi
05:31 AM Bug #41317 (Fix Under Review): PeeringState::GoClean will call purge_strays unconditionally
Neha Ojha
12:04 AM Bug #43267 (Rejected): unexpected error in BlueStore::_txc_add_transaction
Jeff Layton
12:02 AM Bug #43267: unexpected error in BlueStore::_txc_add_transaction
Nope, it was full. Well spotted:... Jeff Layton

12/11/2019

11:28 PM Bug #43267: unexpected error in BlueStore::_txc_add_transaction

This is caused by an out of space condition that won't usually happen. Check your BlueStore configuration.
Is ...
David Zafman
10:21 PM Bug #43267: unexpected error in BlueStore::_txc_add_transaction
This is simply an out-of-space condition, see:
-6> 2019-12-11T16:13:44.466-0500 7fcbe4ecd700 -1 bluestore(/build/ce...
Igor Fedotov
09:39 PM Bug #43267 (Rejected): unexpected error in BlueStore::_txc_add_transaction
I was testing kcephfs vs. a vstart cluster and the OSD crashed. fsstress was running at the time, so it was being kep... Jeff Layton
10:26 PM Bug #43268 (New): Restrict admin socket commands more from the Ceph tool
https://bugzilla.redhat.com/show_bug.cgi?id=1780458
It sounds like we've given admin socket access to any cephx us...
Greg Farnum
10:17 PM Bug #43106 (Resolved): mimic: crash in build_incremental_map_msg
Marking this resolved as all the backports are now in place. Neha Ojha
10:17 PM Bug #43174 (Closed): pgs inconsistent, union_shard_errors=missing
If you fetch an object in RGW and its backing RADOS objects are missing, it just fills in the space with zeros. It so... Greg Farnum
10:15 PM Bug #43173 (Duplicate): pgs inconsistent, union_shard_errors=missing
Neha Ojha
08:07 PM Bug #43266 (Fix Under Review): common: admin socket compiler warning
Patrick Donnelly
08:03 PM Bug #43266 (Resolved): common: admin socket compiler warning
... Patrick Donnelly
01:38 PM Backport #43257 (Resolved): mimic: monitor config store: Deleting logging config settings does no...
https://github.com/ceph/ceph/pull/33327 Nathan Cutler
01:38 PM Backport #43256 (Resolved): nautilus: monitor config store: Deleting logging config settings does...
https://github.com/ceph/ceph/pull/32846 Nathan Cutler
04:05 AM Bug #42964 (Pending Backport): monitor config store: Deleting logging config settings does not de...
Sage Weil

12/10/2019

08:44 PM Backport #40890 (In Progress): mimic: Pool settings aren't populated to OSD after restart.
Nathan Cutler
08:41 PM Backport #40891 (In Progress): nautilus: Pool settings aren't populated to OSD after restart.
Nathan Cutler
08:34 PM Backport #43246 (Resolved): nautilus: Nearfull warnings are incorrect
https://github.com/ceph/ceph/pull/32773 Nathan Cutler
08:29 PM Backport #43245 (Resolved): nautilus: osd: increase priority in certain OSD perf counters
https://github.com/ceph/ceph/pull/32845 Nathan Cutler
08:25 PM Backport #43239 (Resolved): nautilus: ok-to-stop incorrect for some ec pgs
https://github.com/ceph/ceph/pull/32844 Nathan Cutler
08:24 PM Backport #43232 (Rejected): nautilus: pgs stuck in laggy state
Nathan Cutler
04:10 PM Bug #42346 (Pending Backport): Nearfull warnings are incorrect
David Zafman
03:26 PM Bug #42961 (Pending Backport): osd: increase priority in certain OSD perf counters
Neha Ojha
02:51 PM Bug #43189 (Pending Backport): pgs stuck in laggy state
I'm not sure whether we should backport this to nautilus or not. We only noticed qa failures because the new octopus... Sage Weil
02:50 PM Bug #43189 (Resolved): pgs stuck in laggy state
Sage Weil
01:48 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
/a/yuriw-2019-12-06_21:30:44-upgrade:mimic-x-nautilus-distro-basic-smithi/4576681 Neha Ojha

12/09/2019

10:07 PM Bug #43067: Git Master: src/compressor/zlib/ZlibCompressor.cc / src/compressor/zlib/CMakeLists.txt
Thanks Lee!
We generally do patch contributions through Github; can you submit a PR there?
If not, we need a spec...
Greg Farnum
09:53 PM Bug #43176 (Duplicate): pgs inconsistent, union_shard_errors=missing
Nathan Cutler
09:53 PM Bug #43175 (Duplicate): pgs inconsistent, union_shard_errors=missing
Nathan Cutler
09:35 PM Bug #43151 (Pending Backport): ok-to-stop incorrect for some ec pgs
Sage Weil
04:58 PM Bug #43189 (Fix Under Review): pgs stuck in laggy state
Sage Weil
03:15 PM Bug #43189: pgs stuck in laggy state
The problem is the role. The proc_lease() method does this check... Sage Weil
02:33 PM Bug #43189 (In Progress): pgs stuck in laggy state
Sage Weil
04:50 PM Bug #43213 (New): OSDMap::pg_to_up_acting etc specify primary as osd, not pg_shard_t(osd+shard)
The OSD methods to map a PG return primary as an int, not pg_shard_t (osd + shard).
Objecter compensates for this ...
Sage Weil
04:06 PM Bug #40963: mimic: MQuery during Deleting state
/a/sage-2019-12-08_05:43:33-rados-nautilus-distro-basic-smithi/4580545 Neha Ojha
12:59 PM Backport #40890: mimic: Pool settings aren't populated to OSD after restart.
Here's my attempt at the backport: https://github.com/ceph/ceph/pull/32125 Dan van der Ster
12:53 PM Backport #40891: nautilus: Pool settings aren't populated to OSD after restart.
Here's my attempt at the backport: https://github.com/ceph/ceph/pull/32123 Dan van der Ster
08:55 AM Bug #43193 (Rejected): "ceph ping mon.<id>" cannot work
The command "ceph ping mon.<id>" returns an error output:... Min Shi
06:35 AM Bug #42706: LibRadosList.EnumerateObjectsSplit fails
rados_cluster handler will be freed if set_pg_num failed,... huang jun
03:35 AM Bug #42861: Libceph-common.so needs to use private link attribute when including dpdk static library
The dpdk library initializes the EAL using constructors and global
variables, and cannot be re-initialized. Both tes...
chunsong feng

12/08/2019

11:22 PM Bug #43190 (New): qa/standalone/osd/osd-recovery-prio.sh has a race

http://pulpito.ceph.com/dzafman-2019-12-08_11:51:45-rados-master-distro-basic-smithi/4582053/
The test expected ...
David Zafman
09:25 PM Bug #43189: pgs stuck in laggy state
more logs here:
/a/sage-2019-12-07_18:31:18-rados:thrash-erasure-code-wip-sage3-testing-2019-12-05-0959-distro-basic...
Sage Weil
09:23 PM Bug #43189 (Resolved): pgs stuck in laggy state
... Sage Weil

12/07/2019

06:28 PM Bug #43150 (Resolved): osd-scrub-snaps.sh fails
Sage Weil
02:47 PM Bug #41313: PG distribution completely messed up since Nautilus
ceph balancer status
{
"active": true,
"plans": [],
"mode": "upmap"
}
bad distribution:
<p...
Anonymous
02:45 PM Bug #43185: ceph -s not showing client activity
ceph -s only looks like this:
ceph -s
cluster:
id: c4068f25-d46d-438d-af63-5679a2d56efb
health: H...
Anonymous
02:44 PM Bug #43185 (Resolved): ceph -s not showing client activity
Since Nautilus upgrade ceph -s often (2 out of 3 times) does not show any client or recovery activity. Right now it's... Anonymous

12/06/2019

05:21 PM Bug #42964 (Fix Under Review): monitor config store: Deleting logging config settings does not de...
Sage Weil
04:07 PM Bug #42347: nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_flight_list.back(...
Seen in this scrub test run during osd-scrub-repair.sh.
http://pulpito.ceph.com/dzafman-2019-12-05_19:53:40-rados-...
David Zafman
02:01 PM Bug #43176 (Duplicate): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
02:01 PM Bug #43175 (Duplicate): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
02:01 PM Bug #43174 (Resolved): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
02:00 PM Bug #43173 (Duplicate): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
12:55 PM Backport #42997 (In Progress): nautilus: acting_recovery_backfill won't catch all up peers
Nathan Cutler
12:48 PM Backport #42878 (In Progress): nautilus: ceph_test_admin_socket_output fails in rados qa suite
Nathan Cutler
12:48 PM Backport #42853 (In Progress): nautilus: format error: ceph osd stat --format=json
Nathan Cutler
12:47 PM Backport #42847 (Need More Info): mimic: "failing miserably..." in Infiniband.cc
non-trivial Nathan Cutler
12:47 PM Backport #42848 (Need More Info): nautilus: "failing miserably..." in Infiniband.cc
non-trivial Nathan Cutler
04:23 AM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
Oops. I think the more significant issue is that short_pg_log.yaml isn't involved. David Zafman
02:09 AM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
David Zafman wrote:
> Seen in a non-upgrade test:
This is an upgrade test: "rados/upgrade/jewel-x-singleton/{0-c...
Neha Ojha
02:00 AM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
Seen in a -non-upgrade- test with description:
rados/upgrade/jewel-x-singleton/{0-cluster/{openstack.yaml start.ya...
David Zafman

12/05/2019

11:28 PM Bug #41240 (Can't reproduce): All of the cluster SSDs aborted at around the same time and will no...
Brad Hubbard
09:37 PM Bug #41240 (New): All of the cluster SSDs aborted at around the same time and will not start.
Patrick Donnelly
11:24 PM Bug #38892 (Closed): /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation...
Brad Hubbard
09:45 PM Bug #38892 (Fix Under Review): /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Se...
Patrick Donnelly
09:44 PM Bug #23590 (Fix Under Review): kstore: statfs: (95) Operation not supported
Patrick Donnelly
09:44 PM Bug #23297 (Fix Under Review): mon-seesaw 'failed to become clean before timeout' due to laggy pg...
Patrick Donnelly
09:43 PM Bug #13111 (Fix Under Review): replicatedPG:the assert occurs in the fuction ReplicatedPG::on_loc...
Patrick Donnelly
09:40 PM Feature #38653 (New): Enhance health message when pool quota fills up
Patrick Donnelly
09:40 PM Bug #38783 (New): Changing mon_pg_warn_max_object_skew has no effect.
Patrick Donnelly
09:40 PM Feature #3764 (New): osd: async replicas
Patrick Donnelly
09:37 PM Bug #43048 (New): nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
Patrick Donnelly
09:37 PM Bug #42918 (New): memory corruption and lockups with I-Object
Patrick Donnelly
09:37 PM Bug #42780 (New): recursive lock of OpTracker::lock (70)
Patrick Donnelly
09:37 PM Bug #42706 (New): LibRadosList.EnumerateObjectsSplit fails
Patrick Donnelly
09:37 PM Bug #42666 (New): mgropen from mgr comes from unknown.$id instead of mgr.$id
Patrick Donnelly
09:37 PM Bug #42186 (New): "2019-10-04T19:31:51.053283+0000 osd.7 (osd.7) 108 : cluster [ERR] 2.5s0 shard ...
Patrick Donnelly
09:37 PM Bug #41406 (New): common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient boot...
Patrick Donnelly
09:37 PM Bug #40963 (New): mimic: MQuery during Deleting state
Patrick Donnelly
06:31 PM Bug #40963: mimic: MQuery during Deleting state
yuriw-2019-12-04_22:44:10-rados-wip-yuri2-testing-2019-12-04-1938-mimic-distro-basic-smithi/4567200/
DeleteStart e...
David Zafman
09:37 PM Bug #40868 (New): src/common/config_proxy.h: 70: FAILED ceph_assert(p != obs_call_gate.end())
Patrick Donnelly
09:37 PM Bug #40820 (New): standalone/scrub/osd-scrub-test.sh +3 day failed assert
Patrick Donnelly
09:37 PM Bug #40666 (New): osd fails to get latest map
Patrick Donnelly
09:37 PM Fix #40564 (New): Objecter does not have perfcounters for op latency
Patrick Donnelly
09:37 PM Bug #40522 (New): on_local_recover doesn't touch?
Patrick Donnelly
09:37 PM Bug #40454 (New): snap_mapper error, scrub gets r -2..repaired
Patrick Donnelly
09:37 PM Bug #40521 (New): cli timeout (e.g., ceph pg dump)
Patrick Donnelly
09:37 PM Bug #40367 (New): "*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-nautilus
Patrick Donnelly
09:37 PM Bug #40410 (New): ceph pg query Segmentation fault in 12.2.10
Patrick Donnelly
09:36 PM Feature #39966 (New): mon: allow log messages to be throttled and/or force trimming
Patrick Donnelly
09:36 PM Bug #40000 (New): osds do not bound xattrs and/or aggregate xattr data in pg log
Patrick Donnelly
09:36 PM Bug #39366 (New): ClsLock.TestRenew failure
Patrick Donnelly
09:36 PM Bug #39145 (New): luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine eve...
Patrick Donnelly
09:36 PM Bug #39148 (New): luminous: powercycle: reached maximum tries (500) after waiting for 3000 seconds
Patrick Donnelly
09:36 PM Bug #39039 (New): mon connection reset, command not resent
Patrick Donnelly
09:36 PM Fix #39071 (New): monclient: initial probe is non-optimal with v2+v1
Patrick Donnelly
09:36 PM Bug #38656 (New): scrub reservation leak?
Patrick Donnelly
09:36 PM Bug #38718 (New): 'osd crush weight-set create-compat' (and other OSDMonitor commands) can leak u...
Patrick Donnelly
09:36 PM Bug #38624 (New): crush: get_rule_weight_osd_map does not handle multi-take rules
Patrick Donnelly
09:36 PM Bug #38513 (New): luminous: "AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item) && !...
Patrick Donnelly
09:36 PM Bug #38402 (New): ceph-objectstore-tool on down osd w/ not enough in osds
Patrick Donnelly
09:36 PM Bug #38417 (New): ceph tell mon.a help timeout
Patrick Donnelly
09:36 PM Bug #38357 (New): ClsLock.TestExclusiveEphemeralStealEphemeral failed
Patrick Donnelly
09:36 PM Bug #38358 (New): short pg log + cache tier ceph_test_rados out of order reply
Patrick Donnelly
09:36 PM Bug #38195 (New): osd-backfill-space.sh exposes rocksdb hang
Patrick Donnelly
09:36 PM Bug #38345 (New): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Patrick Donnelly
09:36 PM Bug #38184 (New): osd: recovery does not preserve copy-on-write allocations between object clones...
Patrick Donnelly
09:36 PM Bug #38159 (New): ec does not recover below min_size
Patrick Donnelly
09:36 PM Bug #38172 (New): segv in rocksdb NewIterator
Patrick Donnelly
09:36 PM Bug #38151 (New): cephx: service ticket validity doubled
Patrick Donnelly
09:36 PM Bug #38082 (New): mimic: mon/caps.sh fails with "Expected return 0, got 110"
Patrick Donnelly
09:36 PM Bug #38064 (New): librados::OPERATION_FULL_TRY not completely implemented, test LibRadosAio.PoolQ...
Patrick Donnelly
09:36 PM Bug #37582 (New): luminous: ceph -s client gets all mgrmaps
Patrick Donnelly
09:36 PM Bug #37532 (New): mon: expected_num_objects warning triggers on bluestore-only setups
Patrick Donnelly
09:36 PM Bug #37509 (New): require past_interval bounds mismatch due to osd oldest_map
Patrick Donnelly
09:36 PM Bug #36748 (New): ms_deliver_verify_authorizer no AuthAuthorizeHandler found for protocol 0
Patrick Donnelly
09:36 PM Bug #37289 (New): Issue with overfilled OSD for cache-tier pools
Patrick Donnelly
09:36 PM Bug #36634 (New): LibRadosWatchNotify.WatchNotify2Timeout failure
Patrick Donnelly
09:36 PM Bug #36337 (New): OSDs crash with failed assertion in PGLog::merge_log as logs do not overlap
Patrick Donnelly
09:36 PM Bug #36164 (New): cephtool/test fails 'ceph tell mon.a help' with EINTR
Patrick Donnelly
09:36 PM Bug #36113 (New): fusestore test umount failed?
Patrick Donnelly
09:36 PM Bug #35075 (New): copy-get stuck sending osd_op
Patrick Donnelly
09:36 PM Bug #36040 (New): mon: Valgrind: mon (InvalidFree, InvalidWrite, InvalidRead)
Patrick Donnelly
09:36 PM Bug #24874 (New): ec fast reads can trigger read errors in log
Patrick Donnelly
09:36 PM Bug #26891 (New): backfill reservation deadlock/stall
Patrick Donnelly
09:36 PM Bug #24242 (New): tcmalloc::ThreadCache::ReleaseToCentralCache on rhel (w/ centos packages)
Patrick Donnelly
09:36 PM Bug #24339 (New): FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in sca...
Patrick Donnelly
09:36 PM Bug #23965 (New): FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cach...
Patrick Donnelly
09:36 PM Bug #23857 (New): flush (manifest) vs async recovery causes out of order op
Patrick Donnelly
09:36 PM Bug #23879 (New): test_mon_osdmap_prune.sh fails
Patrick Donnelly
09:36 PM Bug #23828 (New): ec gen object leaks into different filestore collection just after split
Patrick Donnelly
09:36 PM Bug #23760 (New): mon: `config get <who>` does not allow `who` as 'mon'/'osd'
Patrick Donnelly
09:36 PM Bug #23767 (New): "ceph ping mon" doesn't work
Patrick Donnelly
09:36 PM Bug #23270 (New): failed mutex assert in PipeConnection::try_get_pipe() (via OSD::do_command())
Patrick Donnelly
09:36 PM Bug #23428 (New): Snapset inconsistency is hard to diagnose because authoritative copy used by li...
Patrick Donnelly
09:36 PM Bug #23029 (New): osd does not handle eio on meta objects (e.g., osdmap)
Patrick Donnelly
09:36 PM Bug #22656 (New): scrub mismatch on bytes (cache pools)
Patrick Donnelly
09:36 PM Bug #21592 (New): LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
Patrick Donnelly
09:36 PM Bug #21495 (New): src/osd/OSD.cc: 346: FAILED assert(piter != rev_pending_splits.end())
Patrick Donnelly
09:36 PM Bug #21129 (New): 'ceph -s' hang
Patrick Donnelly
09:36 PM Bug #21194 (New): mon clock skew test is fragile
Patrick Donnelly
09:36 PM Bug #20960 (New): ceph_test_rados: mismatched version (due to pg import/export)
Patrick Donnelly
09:35 PM Bug #20952 (New): Glitchy monitor quorum causes spurious test failure
Patrick Donnelly
09:35 PM Bug #20922 (New): misdirected op with localize_reads set
Patrick Donnelly
09:35 PM Bug #20846 (New): ceph_test_rados_list_parallel: options dtor racing with DispatchQueue lockdep -...
Patrick Donnelly
09:35 PM Bug #20770 (New): test_pidfile.sh test is failing 2 places
Patrick Donnelly
09:35 PM Bug #20730 (New): need new OSD_SKEWED_USAGE implementation
Patrick Donnelly
09:35 PM Bug #20370 (New): leaked MOSDOp via PrimaryLogPG::_copy_some and PrimaryLogPG::do_proxy_write
Patrick Donnelly
09:35 PM Bug #20646 (New): run_seed_to_range.sh: segv, tp_fstore_op timeout
Patrick Donnelly
09:35 PM Bug #20360 (New): rados/verify valgrind tests: osds fail to start (xenial valgrind)
Patrick Donnelly
09:35 PM Bug #20369 (New): segv in OSD::ShardedOpWQ::_process
Patrick Donnelly
09:35 PM Bug #20221 (New): kill osd + osd out leads to stale PGs
Patrick Donnelly
09:35 PM Bug #20169 (New): filestore+btrfs occasionally returns ENOSPC
Patrick Donnelly
09:35 PM Bug #20053 (New): crush compile / decompile loses precision on weight
Patrick Donnelly
09:35 PM Bug #19700 (New): OSD remained up despite cluster network being inactive?
Patrick Donnelly
09:35 PM Bug #19486 (New): Rebalancing can propagate corrupt copy of replicated object
Patrick Donnelly
09:35 PM Bug #19518 (New): log entry does not include per-op rvals?
Patrick Donnelly
09:35 PM Bug #19440 (New): osd: trims maps that pgs haven't consumed yet when there are gaps
Patrick Donnelly
09:35 PM Bug #17257 (New): ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
Patrick Donnelly
09:35 PM Bug #15015 (New): prepare_new_pool doesn't return failure string ss
Patrick Donnelly
09:35 PM Bug #14115 (New): crypto: race in nss init
Patrick Donnelly
09:35 PM Bug #13385 (New): cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final ro...
Patrick Donnelly
09:35 PM Bug #12687 (New): osd thrashing + pg import/export can cause maybe_went_rw intervals to be missed
Patrick Donnelly
09:35 PM Bug #12615 (New): Repair of Erasure Coded pool with an unrepairable object causes pg state to los...
Patrick Donnelly
09:35 PM Bug #11235 (New): test_rados.py test_aio_read is racy
Patrick Donnelly
09:35 PM Bug #9606 (New): mon: ambiguous error_status returned to user when type is wrong in a command
Patrick Donnelly
08:31 PM Bug #43151 (Fix Under Review): ok-to-stop incorrect for some ec pgs
Sage Weil
04:33 PM Bug #43151 (Resolved): ok-to-stop incorrect for some ec pgs
before,... Sage Weil
08:16 PM Backport #43119: mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32000
merged
Yuri Weinstein
08:01 PM Backport #41238: nautilus: Implement mon_memory_target
Follow-on fix: https://github.com/ceph/ceph/pull/32045 Neha Ojha
08:00 PM Feature #40870: Implement mon_memory_target
This has a follow-on fix: https://github.com/ceph/ceph/pull/32044 Neha Ojha
06:01 PM Bug #38040: osd_map_message_max default is too high?
Luminous backport analysis:
* https://github.com/ceph/ceph/pull/26340 - two of three commits backported to luminou...
Nathan Cutler
05:50 PM Bug #43150 (In Progress): osd-scrub-snaps.sh fails
David Zafman
05:21 PM Bug #43150: osd-scrub-snaps.sh fails
During testing I saw this even though it isn't what happened in the teuthology runs. I think in all cases we have sc... David Zafman
03:51 PM Bug #43150 (Resolved): osd-scrub-snaps.sh fails
/a/sage-2019-12-04_19:33:15-rados-wip-sage2-testing-2019-12-04-0856-distro-basic-smithi/4567061
/a/sage-2019-12-04_1...
Sage Weil
05:41 PM Bug #43106: mimic: crash in build_incremental_map_msg
The three PRs that need to be backported to mimic are:
* https://github.com/ceph/ceph/pull/26340 - backported to m...
Nathan Cutler
01:41 PM Backport #43140 (In Progress): nautilus: ceph-mon --mkfs: public_address type (v1|v2) is not resp...
Nathan Cutler
11:07 AM Backport #43140 (Resolved): nautilus: ceph-mon --mkfs: public_address type (v1|v2) is not respected
https://github.com/ceph/ceph/pull/32028 Nathan Cutler
01:34 PM Bug #42485: verify_upmaps can not cancel invalid upmap_items in some cases
NOTE: https://github.com/ceph/ceph/pull/31131 was merged to master and backported to nautilus and luminous, before it... Nathan Cutler
04:04 AM Bug #42485 (Resolved): verify_upmaps can not cancel invalid upmap_items in some cases
David Zafman
01:32 PM Backport #42547: nautilus: verify_upmaps can not cancel invalid upmap_items in some cases
NOTE: reverted by https://github.com/ceph/ceph/pull/32018 Nathan Cutler
01:30 PM Backport #42548: luminous: verify_upmaps can not cancel invalid upmap_items in some cases
Note: reverted by https://github.com/ceph/ceph/pull/32019 Nathan Cutler
07:52 AM Bug #42906 (Pending Backport): ceph-mon --mkfs: public_address type (v1|v2) is not respected
Kefu Chai
06:22 AM Bug #37968 (Resolved): maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
David Zafman
06:21 AM Backport #38163 (Resolved): mimic: maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
David Zafman
04:04 AM Backport #42546 (Rejected): mimic: verify_upmaps can not cancel invalid upmap_items in some cases
This change has been reverted so we won't backport. David Zafman
12:43 AM Bug #43124: Probably legal crush rules cause upmaps to be cleaned

We are reverting the original pull request which changed verify_upmaps(): https://github.com/ceph/ceph/pull/31131
...
David Zafman
 
