Activity

From 11/21/2019 to 12/20/2019

12/20/2019

11:39 PM Bug #42328 (Resolved): osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
I can't check the original reports (logs have been removed), but assuming it's the same root cause PR #32382 5bb932c3... Samuel Just
01:31 AM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
I observed something similar on a ceph_test_rados teuthology run: sjust-2019-12-19_20:05:13-rados-wip-sjust-read-from... Samuel Just
11:37 PM Bug #43394 (Resolved): crimson::dmclock segv in crimson::IndIntruHeap
Should be fixed with PR #32380 2c9542901532feafd569d92e9f67ccd2e1af3129 Samuel Just
08:53 PM Bug #43403 (Resolved): unittest_lockdep unreliable
... Sage Weil
08:22 AM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
Hi David:
Good to know the bug is indeed fixed ... too bad it didn't make it in 13.2.8. Anyways ... building patch...
Stefan Kooman
04:50 AM Bug #38345 (In Progress): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Brad Hubbard
01:50 AM Bug #43174: pgs inconsistent, union_shard_errors=missing

Scrub incorrectly thinks the object really isn't there, but we know it is.
The way that you can see missing obje...
David Zafman

12/19/2019

11:57 PM Bug #42780 (Fix Under Review): recursive lock of OpTracker::lock (70)
https://github.com/ceph/ceph/pull/32364 Radoslaw Zarzynski
12:09 PM Bug #42780 (In Progress): recursive lock of OpTracker::lock (70)
Radoslaw Zarzynski
10:30 PM Bug #43307 (Fix Under Review): Remove use of rules batching for upmap balancer
David Zafman
10:27 PM Bug #43397 (Resolved): FS_DEGRADED to cluster log despite --no-mon-health-to-clog
... Sage Weil
09:38 PM Bug #43394 (Resolved): crimson::dmclock segv in crimson::IndIntruHeap
... Sage Weil
07:06 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
A backport to Mimic of the fix can be found here:
https://github.com/ceph/ceph/pull/32361
Or if you can build fro...
David Zafman
02:34 PM Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1%
We added a CRUSH policy (replicated_nvme) and set this policy on our cephfs metadata pool (with 1.2 billion objects) a... Stefan Kooman
07:02 PM Backport #41584 (In Progress): mimic: backfill_toofull seen on cluster where the most full OSD is...
David Zafman
02:29 PM Bug #43306: segv in collect_sys_info
Neha Ojha wrote:
> This looks similar to https://tracker.ceph.com/issues/38296, though the mon seems to have been up...
Nathan Cutler
02:22 PM Backport #39474 (In Progress): luminous: segv in fgets() in collect_sys_info reading /proc/cpuinfo
Nathan Cutler
02:18 PM Bug #41383 (Resolved): scrub object count mismatch on device_health_metrics pool
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:14 PM Backport #42739 (Resolved): nautilus: scrub object count mismatch on device_health_metrics pool
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31735
m...
Nathan Cutler
07:39 AM Bug #43382: medium io/system load causes quorum failure
Or due to limited bandwidth? 10G NICs dedicated. Anonymous
07:36 AM Bug #43382 (New): medium io/system load causes quorum failure
We just found out that if you put some I/O pressure on your system, e.g. with a big rsync, the mon process has issues proba... Anonymous
05:44 AM Bug #43126 (Fix Under Review): OSD_SLOW_PING_TIME_BACK nits
David Zafman
02:20 AM Bug #43318: monitor mark all services(osd mgr) down
The mgr produces no log output when debug_mgr is set to 40. simon gao

12/18/2019

10:31 PM Bug #43193 (Need More Info): "ceph ping mon.<id>" cannot work
Can you provide the sequence of commands that fail? Also, please attach the monitor names and monmap. Neha Ojha
10:25 PM Bug #43305 (Won't Fix): "psutil.NoSuchProcess process no longer exists" error in luminous-x-nauti...
This is an infra issue.... Neha Ojha
10:23 PM Bug #43306: segv in collect_sys_info
This looks similar to https://tracker.ceph.com/issues/38296, though the mon seems to have been upgraded to nautilus(w... Neha Ojha
10:17 PM Bug #43318 (Need More Info): monitor mark all services(osd mgr) down
Can you provide mgr logs from when this happened? Neha Ojha
10:12 PM Feature #43377 (Resolved): Make Zstandard compression level a configurable option
I've played with using the different compression algorithms on the RGWs and the default compression level for Zstanda... Bryan Stillwell
07:38 PM Backport #42739: nautilus: scrub object count mismatch on device_health_metrics pool
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31735
merged
Yuri Weinstein
03:53 PM Backport #43316 (Resolved): nautilus:wrong datatype describing crush_rule
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32254
m...
Nathan Cutler
12:11 PM Bug #43365: Nautilus: Random mon crashes in failed assertion at ceph::time_detail::signedspan
So it's asserting inside of to_timespan, and the Paxos code triggering that assert is
> auto start = ceph::coarse_...
Greg Farnum
12:03 PM Bug #43365 (Resolved): Nautilus: Random mon crashes in failed assertion at ceph::time_detail::sig...
Thanks to the 14.2.5 automatic warning for recent crashes, we are observing frequent (roughly daily) random crashes of... Alex Walender
09:35 AM Bug #43185: ceph -s not showing client activity
Possible relation to https://tracker.ceph.com/issues/43364 and https://tracker.ceph.com/issues/43317 Anonymous

12/17/2019

05:39 PM Bug #43308 (Fix Under Review): negative num_objects can set PG_STATE_DEGRADED
Neha Ojha
09:19 AM Backport #43346 (Resolved): nautilus: short pg log + cache tier ceph_test_rados out of order reply
https://github.com/ceph/ceph/pull/32848 Nathan Cutler
06:47 AM Bug #41950 (Can't reproduce): crimson compile
Kefu Chai
06:46 AM Bug #41950: crimson compile
I assume that you were trying to compile crimson-osd, not crimson-old. Please check the submodule of seastar to unders... Kefu Chai

12/16/2019

10:36 PM Bug #43296 (Need More Info): Ceph assimilate-conf results in config entries which can not be removed
Can you attach the (relevant) output from "ceph config-key dump | grep config"? I think the keys are being installed... Sage Weil
10:22 PM Bug #43296: Ceph assimilate-conf results in config entries which can not be removed
Might be related to #42964? Patrick Donnelly
10:06 PM Bug #43334 (Resolved): nautilus: rados/test_envlibrados_for_rocksdb.sh broken packages with ubunt...
Run: http://pulpito.ceph.com/yuriw-2019-12-15_16:25:11-rados-wip-yuri-nautilus-baseline_12.13.19-distro-basic-smithi/... Yuri Weinstein
08:36 PM Bug #38358 (Pending Backport): short pg log + cache tier ceph_test_rados out of order reply
Seen in nautilus: /a/yuriw-2019-12-15_16:25:11-rados-wip-yuri-nautilus-baseline_12.13.19-distro-basic-smithi/4605500/ Neha Ojha
12:40 PM Bug #43174 (New): pgs inconsistent, union_shard_errors=missing
Hmm this may be something else then. David, does it look familiar? Greg Farnum
08:40 AM Feature #43324: Make zlib windowBits configurable for compression
Xiyuan Wang wrote:
> Now the zlib windowBits is hardcoding as -15[1]. But it should be set to different value for di...
Xiyuan Wang
03:38 AM Feature #43324 (Resolved): Make zlib windowBits configurable for compression
Currently the zlib windowBits is hardcoded as -15[1], but it should be set to different values for different cases.
Accor...
Xiyuan Wang
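For context on why the windowBits value matters, here is a minimal sketch using Python's standard `zlib` module (not Ceph's C++ compressor; the helper name `deflate` and the sample payload are illustrative assumptions). In zlib, a negative value such as -15 selects a raw deflate stream with no header or trailer, 15 selects the zlib wrapper, and 31 (16 + 15) selects the gzip wrapper.

```python
import zlib

# Illustrative payload; any bytes would do.
data = b"ceph " * 1000


def deflate(payload: bytes, wbits: int) -> bytes:
    """Compress payload with the given windowBits value (hypothetical helper)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return comp.compress(payload) + comp.flush()


raw_stream = deflate(data, -15)   # raw deflate: no header, no checksum
zlib_stream = deflate(data, 15)   # zlib wrapper: 2-byte header + Adler-32
gzip_stream = deflate(data, 31)   # gzip wrapper: gzip header + CRC-32

# Each stream must be decompressed with a matching windowBits value.
restored_raw = zlib.decompress(raw_stream, -15)
restored_zlib = zlib.decompress(zlib_stream, 15)
restored_gzip = zlib.decompress(gzip_stream, 31)

print(len(raw_stream), len(zlib_stream), len(gzip_stream))
```

The three outputs carry the same deflate data but differ in framing, which is why a single hardcoded value cannot suit every consumer of the compressed data.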
07:27 AM Backport #43325 (In Progress): luminous: wrong datatype describing crush_rule
Deepika Upadhyay
07:24 AM Backport #43325 (New): luminous: wrong datatype describing crush_rule
Deepika Upadhyay
07:24 AM Backport #43325 (Resolved): luminous: wrong datatype describing crush_rule
https://github.com/ceph/ceph/pull/32267 Deepika Upadhyay

12/15/2019

10:04 PM Documentation #41389 (Pending Backport): wrong datatype describing crush_rule
Nathan Cutler
03:55 PM Bug #38076 (Resolved): osds allows to partially start more than N+2
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:53 PM Feature #40528 (Resolved): Better default value for osd_snap_trim_sleep
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:53 PM Backport #43320 (Resolved): mimic: PeeringState::GoClean will call purge_strays unconditionally
https://github.com/ceph/ceph/pull/33329 Nathan Cutler
03:53 PM Backport #43319 (Resolved): nautilus: PeeringState::GoClean will call purge_strays unconditionally
https://github.com/ceph/ceph/pull/32847 Nathan Cutler
01:27 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
Looking at the historical test runs, it seems to have started after [1] but before [2].
[1] http://pulpito.ceph.co...
Jason Dillaman
01:30 AM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
http://qa-proxy.ceph.com/teuthology/teuthology-2019-12-02_02:01:02-rbd-master-distro-basic-smithi/4559106/teuthology.log Jason Dillaman
01:29 AM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
http://qa-proxy.ceph.com/teuthology/jdillaman-2019-12-14_17:15:11-rbd-wip-jd-testing-distro-basic-smithi/4603518/teut... Jason Dillaman
06:55 AM Bug #43318 (Need More Info): monitor mark all services(osd mgr) down
Suddenly, all mgrs and OSDs in my cluster began to be marked down by the monitor.
The monitor log looks like this:
```
...
```
simon gao

12/14/2019

08:28 AM Documentation #41389 (In Progress): wrong datatype describing crush_rule
Deepika Upadhyay
07:21 AM Documentation #41389 (Pending Backport): wrong datatype describing crush_rule
Deepika Upadhyay
02:42 AM Documentation #41389: wrong datatype describing crush_rule
Just needs a cherry-pick of 3ed3de6c964ba998d5b18ceb997d1a6dffe355db Neha Ojha
08:26 AM Backport #43315 (In Progress): mimic:wrong datatype describing crush_rule
Deepika Upadhyay
08:02 AM Backport #43315 (Resolved): mimic:wrong datatype describing crush_rule
https://github.com/ceph/ceph/pull/32255 Deepika Upadhyay
08:24 AM Backport #43316 (In Progress): nautilus:wrong datatype describing crush_rule
Deepika Upadhyay
08:03 AM Backport #43316 (Resolved): nautilus:wrong datatype describing crush_rule
https://github.com/ceph/ceph/pull/32254 Deepika Upadhyay
02:50 AM Bug #43307 (In Progress): Remove use of rules batching for upmap balancer
David Zafman
02:49 AM Bug #43312 (In Progress): Change default upmap_max_deviation to 5
David Zafman
02:06 AM Bug #43312 (Resolved): Change default upmap_max_deviation to 5
David Zafman
12:24 AM Bug #43311 (Resolved): asynchronous recovery + backfill might spin pg undersized for a long time
When an osd that is part of current up set gets chosen as an
async_recovery_target, it gets removed from the acting ...
xie xingguo
12:16 AM Bug #43308 (In Progress): negative num_objects can set PG_STATE_DEGRADED
Neha Ojha

12/13/2019

08:40 PM Bug #40963 (Resolved): mimic: MQuery during Deleting state
Sage Weil
08:40 PM Bug #41317 (Pending Backport): PeeringState::GoClean will call purge_strays unconditionally
Sage Weil
07:47 PM Bug #43308 (Resolved): negative num_objects can set PG_STATE_DEGRADED
... Neha Ojha
07:05 PM Bug #43296: Ceph assimilate-conf results in config entries which can not be removed
Alwin from Proxmox provided a workaround, but this still appears to be a bug:
https://forum.proxmox.com/threads/ceph...
David Herselman
04:51 PM Bug #43296: Ceph assimilate-conf results in config entries which can not be removed
Setting debug_rdb to 5/5 unfortunately doesn't reveal anything:
Commands:...
David Herselman
03:37 AM Bug #43296 (Resolved): Ceph assimilate-conf results in config entries which can not be removed
We assimilated our Ceph configuration file and subsequently have a minimal config file. We are subsequently not able ... David Herselman
04:31 PM Bug #43307 (Resolved): Remove use of rules batching for upmap balancer

Due to cost of calculations for very large PG/shard counts, we will settle for balancing each pool individually for...
David Zafman
03:43 PM Bug #25174 (Can't reproduce): osd: assert failure with FAILED assert(repop_queue.front() == repop...
Neha Ojha
02:43 PM Bug #43306 (Resolved): segv in collect_sys_info
Run: http://pulpito.ceph.com/teuthology-2019-12-13_02:25:03-upgrade:luminous-x-nautilus-distro-basic-smithi/
Job: '4...
Yuri Weinstein
02:40 PM Bug #43305 (Won't Fix): "psutil.NoSuchProcess process no longer exists" error in luminous-x-nauti...
Run: http://pulpito.ceph.com/teuthology-2019-12-13_02:25:03-upgrade:luminous-x-nautilus-distro-basic-smithi/
Jobs: '...
Yuri Weinstein
08:23 AM Backport #42259 (Resolved): nautilus: document new option mon_max_pg_per_osd
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31300
m...
Nathan Cutler
08:22 AM Backport #40947 (Resolved): luminous: Better default value for osd_snap_trim_sleep
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31857
m...
Nathan Cutler
08:22 AM Backport #38205 (Resolved): luminous: osds allows to partially start more than N+2
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31858
m...
Nathan Cutler
08:22 AM Backport #43093 (Resolved): luminous: Improve OSDMap::calc_pg_upmaps() efficiency
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31992
m...
Nathan Cutler
06:17 AM Bug #40712: ceph-mon crash with assert(err == 0) after rocksdb->get
We hit this problem recently.
We believe it is related more to rocksdb than to Ceph.
huang jun

12/12/2019

04:41 PM Backport #40947: luminous: Better default value for osd_snap_trim_sleep
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31857
merged
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Yuri Weinstein
04:41 PM Backport #38205: luminous: osds allows to partially start more than N+2
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31858
merged
Yuri Weinstein
04:40 PM Backport #43093: luminous: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman wrote:
> https://github.com/ceph/ceph/pull/31992
merged
Yuri Weinstein
10:16 AM Bug #43174: pgs inconsistent, union_shard_errors=missing
Greg thanks for the reply.
Greg Farnum wrote:
> If you fetch an object in RGW and its backing RADOS objects are m...
Aleksandr Rudenko
09:41 AM Bug #38330 (Resolved): osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:23 AM Backport #43119 (Resolved): mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/32000
m...
Nathan Cutler
08:44 AM Bug #43193: "ceph ping mon.<id>" cannot work
The command "ceph ping mon.a" or "ceph ping mon.b" or "ceph ping mon.c" works fine.
If the mon id is not specified, ...
Min Shi
05:31 AM Bug #41317 (Fix Under Review): PeeringState::GoClean will call purge_strays unconditionally
Neha Ojha
12:04 AM Bug #43267 (Rejected): unexpected error in BlueStore::_txc_add_transaction
Jeff Layton
12:02 AM Bug #43267: unexpected error in BlueStore::_txc_add_transaction
Nope, it was full. Well spotted:... Jeff Layton

12/11/2019

11:28 PM Bug #43267: unexpected error in BlueStore::_txc_add_transaction

This is caused by an out of space condition that won't usually happen. Check your BlueStore configuration.
Is ...
David Zafman
10:21 PM Bug #43267: unexpected error in BlueStore::_txc_add_transaction
This is simply an out-of-space condition; see:
-6> 2019-12-11T16:13:44.466-0500 7fcbe4ecd700 -1 bluestore(/build/ce...
Igor Fedotov
09:39 PM Bug #43267 (Rejected): unexpected error in BlueStore::_txc_add_transaction
I was testing kcephfs vs. a vstart cluster and the OSD crashed. fsstress was running at the time, so it was being kep... Jeff Layton
10:26 PM Bug #43268 (New): Restrict admin socket commands more from the Ceph tool
https://bugzilla.redhat.com/show_bug.cgi?id=1780458
It sounds like we've given admin socket access to any cephx us...
Greg Farnum
10:17 PM Bug #43106 (Resolved): mimic: crash in build_incremental_map_msg
Marking this resolved as all the backports are now in place. Neha Ojha
10:17 PM Bug #43174 (Closed): pgs inconsistent, union_shard_errors=missing
If you fetch an object in RGW and its backing RADOS objects are missing, it just fills in the space with zeros. It so... Greg Farnum
10:15 PM Bug #43173 (Duplicate): pgs inconsistent, union_shard_errors=missing
Neha Ojha
08:07 PM Bug #43266 (Fix Under Review): common: admin socket compiler warning
Patrick Donnelly
08:03 PM Bug #43266 (Resolved): common: admin socket compiler warning
... Patrick Donnelly
01:38 PM Backport #43257 (Resolved): mimic: monitor config store: Deleting logging config settings does no...
https://github.com/ceph/ceph/pull/33327 Nathan Cutler
01:38 PM Backport #43256 (Resolved): nautilus: monitor config store: Deleting logging config settings does...
https://github.com/ceph/ceph/pull/32846 Nathan Cutler
04:05 AM Bug #42964 (Pending Backport): monitor config store: Deleting logging config settings does not de...
Sage Weil

12/10/2019

08:44 PM Backport #40890 (In Progress): mimic: Pool settings aren't populated to OSD after restart.
Nathan Cutler
08:41 PM Backport #40891 (In Progress): nautilus: Pool settings aren't populated to OSD after restart.
Nathan Cutler
08:34 PM Backport #43246 (Resolved): nautilus: Nearfull warnings are incorrect
https://github.com/ceph/ceph/pull/32773 Nathan Cutler
08:29 PM Backport #43245 (Resolved): nautilus: osd: increase priority in certain OSD perf counters
https://github.com/ceph/ceph/pull/32845 Nathan Cutler
08:25 PM Backport #43239 (Resolved): nautilus: ok-to-stop incorrect for some ec pgs
https://github.com/ceph/ceph/pull/32844 Nathan Cutler
08:24 PM Backport #43232 (Rejected): nautilus: pgs stuck in laggy state
Nathan Cutler
04:10 PM Bug #42346 (Pending Backport): Nearfull warnings are incorrect
David Zafman
03:26 PM Bug #42961 (Pending Backport): osd: increase priority in certain OSD perf counters
Neha Ojha
02:51 PM Bug #43189 (Pending Backport): pgs stuck in laggy state
I'm not sure whether we should backport this to nautilus or not. We only noticed qa failures because the new octopus... Sage Weil
02:50 PM Bug #43189 (Resolved): pgs stuck in laggy state
Sage Weil
01:48 AM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
/a/yuriw-2019-12-06_21:30:44-upgrade:mimic-x-nautilus-distro-basic-smithi/4576681 Neha Ojha

12/09/2019

10:07 PM Bug #43067: Git Master: src/compressor/zlib/ZlibCompressor.cc / src/compressor/zlib/CMakeLists.txt
Thanks Lee!
We generally do patch contributions through Github; can you submit a PR there?
If not, we need a spec...
Greg Farnum
09:53 PM Bug #43176 (Duplicate): pgs inconsistent, union_shard_errors=missing
Nathan Cutler
09:53 PM Bug #43175 (Duplicate): pgs inconsistent, union_shard_errors=missing
Nathan Cutler
09:35 PM Bug #43151 (Pending Backport): ok-to-stop incorrect for some ec pgs
Sage Weil
04:58 PM Bug #43189 (Fix Under Review): pgs stuck in laggy state
Sage Weil
03:15 PM Bug #43189: pgs stuck in laggy state
The problem is the role. The proc_lease() method does this check... Sage Weil
02:33 PM Bug #43189 (In Progress): pgs stuck in laggy state
Sage Weil
04:50 PM Bug #43213 (New): OSDMap::pg_to_up_acting etc specify primary as osd, not pg_shard_t(osd+shard)
The OSD methods to map a PG return primary as an int, not pg_shard_t (osd + shard).
Objecter compensates for this ...
Sage Weil
04:06 PM Bug #40963: mimic: MQuery during Deleting state
/a/sage-2019-12-08_05:43:33-rados-nautilus-distro-basic-smithi/4580545 Neha Ojha
12:59 PM Backport #40890: mimic: Pool settings aren't populated to OSD after restart.
Here's my attempt at the backport: https://github.com/ceph/ceph/pull/32125 Dan van der Ster
12:53 PM Backport #40891: nautilus: Pool settings aren't populated to OSD after restart.
Here's my attempt at the backport: https://github.com/ceph/ceph/pull/32123 Dan van der Ster
08:55 AM Bug #43193 (Rejected): "ceph ping mon.<id>" cannot work
The command "ceph ping mon.<id>" returns an error output:... Min Shi
06:35 AM Bug #42706: LibRadosList.EnumerateObjectsSplit fails
rados_cluster handler will be freed if set_pg_num failed,... huang jun
03:35 AM Bug #42861: Libceph-common.so needs to use private link attribute when including dpdk static library
The dpdk library initializes the EAL using constructors and global
variables, and cannot be re-initialized. Both tes...
chunsong feng

12/08/2019

11:22 PM Bug #43190 (New): qa/standalone/osd/osd-recovery-prio.sh has a race

http://pulpito.ceph.com/dzafman-2019-12-08_11:51:45-rados-master-distro-basic-smithi/4582053/
The test expected ...
David Zafman
09:25 PM Bug #43189: pgs stuck in laggy state
more logs here:
/a/sage-2019-12-07_18:31:18-rados:thrash-erasure-code-wip-sage3-testing-2019-12-05-0959-distro-basic...
Sage Weil
09:23 PM Bug #43189 (Resolved): pgs stuck in laggy state
... Sage Weil

12/07/2019

06:28 PM Bug #43150 (Resolved): osd-scrub-snaps.sh fails
Sage Weil
02:47 PM Bug #41313: PG distribution completely messed up since Nautilus
ceph balancer status
{
"active": true,
"plans": [],
"mode": "upmap"
}
bad distribution:
<p...
Anonymous
02:45 PM Bug #43185: ceph -s not showing client activity
ceph -s only looks like this:
ceph -s
cluster:
id: c4068f25-d46d-438d-af63-5679a2d56efb
health: H...
Anonymous
02:44 PM Bug #43185 (Resolved): ceph -s not showing client activity
Since Nautilus upgrade ceph -s often (2 out of 3 times) does not show any client or recovery activity. Right now it's... Anonymous

12/06/2019

05:21 PM Bug #42964 (Fix Under Review): monitor config store: Deleting logging config settings does not de...
Sage Weil
04:07 PM Bug #42347: nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_flight_list.back(...
Seen in this scrub test run during osd-scrub-repair.sh.
http://pulpito.ceph.com/dzafman-2019-12-05_19:53:40-rados-...
David Zafman
02:01 PM Bug #43176 (Duplicate): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
02:01 PM Bug #43175 (Duplicate): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
02:01 PM Bug #43174 (Resolved): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
02:00 PM Bug #43173 (Duplicate): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
12:55 PM Backport #42997 (In Progress): nautilus: acting_recovery_backfill won't catch all up peers
Nathan Cutler
12:48 PM Backport #42878 (In Progress): nautilus: ceph_test_admin_socket_output fails in rados qa suite
Nathan Cutler
12:48 PM Backport #42853 (In Progress): nautilus: format error: ceph osd stat --format=json
Nathan Cutler
12:47 PM Backport #42847 (Need More Info): mimic: "failing miserably..." in Infiniband.cc
non-trivial Nathan Cutler
12:47 PM Backport #42848 (Need More Info): nautilus: "failing miserably..." in Infiniband.cc
non-trivial Nathan Cutler
04:23 AM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
Oops. I think the more significant issue is that short_pg_log.yaml isn't involved. David Zafman
02:09 AM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
David Zafman wrote:
> Seen in a non-upgrade test:
This is an upgrade test: "rados/upgrade/jewel-x-singleton/{0-c...
Neha Ojha
02:00 AM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
Seen in a -non-upgrade- test with description:
rados/upgrade/jewel-x-singleton/{0-cluster/{openstack.yaml start.ya...
David Zafman

12/05/2019

11:28 PM Bug #41240 (Can't reproduce): All of the cluster SSDs aborted at around the same time and will no...
Brad Hubbard
09:37 PM Bug #41240 (New): All of the cluster SSDs aborted at around the same time and will not start.
Patrick Donnelly
11:24 PM Bug #38892 (Closed): /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation...
Brad Hubbard
09:45 PM Bug #38892 (Fix Under Review): /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Se...
Patrick Donnelly
09:44 PM Bug #23590 (Fix Under Review): kstore: statfs: (95) Operation not supported
Patrick Donnelly
09:44 PM Bug #23297 (Fix Under Review): mon-seesaw 'failed to become clean before timeout' due to laggy pg...
Patrick Donnelly
09:43 PM Bug #13111 (Fix Under Review): replicatedPG:the assert occurs in the fuction ReplicatedPG::on_loc...
Patrick Donnelly
09:40 PM Feature #38653 (New): Enhance health message when pool quota fills up
Patrick Donnelly
09:40 PM Bug #38783 (New): Changing mon_pg_warn_max_object_skew has no effect.
Patrick Donnelly
09:40 PM Feature #3764 (New): osd: async replicas
Patrick Donnelly
09:37 PM Bug #43048 (New): nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
Patrick Donnelly
09:37 PM Bug #42918 (New): memory corruption and lockups with I-Object
Patrick Donnelly
09:37 PM Bug #42780 (New): recursive lock of OpTracker::lock (70)
Patrick Donnelly
09:37 PM Bug #42706 (New): LibRadosList.EnumerateObjectsSplit fails
Patrick Donnelly
09:37 PM Bug #42666 (New): mgropen from mgr comes from unknown.$id instead of mgr.$id
Patrick Donnelly
09:37 PM Bug #42186 (New): "2019-10-04T19:31:51.053283+0000 osd.7 (osd.7) 108 : cluster [ERR] 2.5s0 shard ...
Patrick Donnelly
09:37 PM Bug #41406 (New): common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient boot...
Patrick Donnelly
09:37 PM Bug #40963 (New): mimic: MQuery during Deleting state
Patrick Donnelly
06:31 PM Bug #40963: mimic: MQuery during Deleting state
yuriw-2019-12-04_22:44:10-rados-wip-yuri2-testing-2019-12-04-1938-mimic-distro-basic-smithi/4567200/
DeleteStart e...
David Zafman
09:37 PM Bug #40868 (New): src/common/config_proxy.h: 70: FAILED ceph_assert(p != obs_call_gate.end())
Patrick Donnelly
09:37 PM Bug #40820 (New): standalone/scrub/osd-scrub-test.sh +3 day failed assert
Patrick Donnelly
09:37 PM Bug #40666 (New): osd fails to get latest map
Patrick Donnelly
09:37 PM Fix #40564 (New): Objecter does not have perfcounters for op latency
Patrick Donnelly
09:37 PM Bug #40522 (New): on_local_recover doesn't touch?
Patrick Donnelly
09:37 PM Bug #40454 (New): snap_mapper error, scrub gets r -2..repaired
Patrick Donnelly
09:37 PM Bug #40521 (New): cli timeout (e.g., ceph pg dump)
Patrick Donnelly
09:37 PM Bug #40367 (New): "*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-nautilus
Patrick Donnelly
09:37 PM Bug #40410 (New): ceph pg query Segmentation fault in 12.2.10
Patrick Donnelly
09:36 PM Feature #39966 (New): mon: allow log messages to be throttled and/or force trimming
Patrick Donnelly
09:36 PM Bug #40000 (New): osds do not bound xattrs and/or aggregate xattr data in pg log
Patrick Donnelly
09:36 PM Bug #39366 (New): ClsLock.TestRenew failure
Patrick Donnelly
09:36 PM Bug #39145 (New): luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine eve...
Patrick Donnelly
09:36 PM Bug #39148 (New): luminous: powercycle: reached maximum tries (500) after waiting for 3000 seconds
Patrick Donnelly
09:36 PM Bug #39039 (New): mon connection reset, command not resent
Patrick Donnelly
09:36 PM Fix #39071 (New): monclient: initial probe is non-optimal with v2+v1
Patrick Donnelly
09:36 PM Bug #38656 (New): scrub reservation leak?
Patrick Donnelly
09:36 PM Bug #38718 (New): 'osd crush weight-set create-compat' (and other OSDMonitor commands) can leak u...
Patrick Donnelly
09:36 PM Bug #38624 (New): crush: get_rule_weight_osd_map does not handle multi-take rules
Patrick Donnelly
09:36 PM Bug #38513 (New): luminous: "AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item) && !...
Patrick Donnelly
09:36 PM Bug #38402 (New): ceph-objectstore-tool on down osd w/ not enough in osds
Patrick Donnelly
09:36 PM Bug #38417 (New): ceph tell mon.a help timeout
Patrick Donnelly
09:36 PM Bug #38357 (New): ClsLock.TestExclusiveEphemeralStealEphemeral failed
Patrick Donnelly
09:36 PM Bug #38358 (New): short pg log + cache tier ceph_test_rados out of order reply
Patrick Donnelly
09:36 PM Bug #38195 (New): osd-backfill-space.sh exposes rocksdb hang
Patrick Donnelly
09:36 PM Bug #38345 (New): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Patrick Donnelly
09:36 PM Bug #38184 (New): osd: recovery does not preserve copy-on-write allocations between object clones...
Patrick Donnelly
09:36 PM Bug #38159 (New): ec does not recover below min_size
Patrick Donnelly
09:36 PM Bug #38172 (New): segv in rocksdb NewIterator
Patrick Donnelly
09:36 PM Bug #38151 (New): cephx: service ticket validity dobuled
Patrick Donnelly
09:36 PM Bug #38082 (New): mimic: mon/caps.sh fails with "Expected return 0, got 110"
Patrick Donnelly
09:36 PM Bug #38064 (New): librados::OPERATION_FULL_TRY not completely implemented, test LibRadosAio.PoolQ...
Patrick Donnelly
09:36 PM Bug #37582 (New): luminous: ceph -s client gets all mgrmaps
Patrick Donnelly
09:36 PM Bug #37532 (New): mon: expected_num_objects warning triggers on bluestore-only setups
Patrick Donnelly
09:36 PM Bug #37509 (New): require past_interval bounds mismatch due to osd oldest_map
Patrick Donnelly
09:36 PM Bug #36748 (New): ms_deliver_verify_authorizer no AuthAuthorizeHandler found for protocol 0
Patrick Donnelly
09:36 PM Bug #37289 (New): Issue with overfilled OSD for cache-tier pools
Patrick Donnelly
09:36 PM Bug #36634 (New): LibRadosWatchNotify.WatchNotify2Timeout failure
Patrick Donnelly
09:36 PM Bug #36337 (New): OSDs crash with failed assertion in PGLog::merge_log as logs do not overlap
Patrick Donnelly
09:36 PM Bug #36164 (New): cephtool/test fails 'ceph tell mon.a help' with EINTR
Patrick Donnelly
09:36 PM Bug #36113 (New): fusestore test umount failed?
Patrick Donnelly
09:36 PM Bug #35075 (New): copy-get stuck sending osd_op
Patrick Donnelly
09:36 PM Bug #36040 (New): mon: Valgrind: mon (InvalidFree, InvalidWrite, InvalidRead)
Patrick Donnelly
09:36 PM Bug #24874 (New): ec fast reads can trigger read errors in log
Patrick Donnelly
09:36 PM Bug #26891 (New): backfill reservation deadlock/stall
Patrick Donnelly
09:36 PM Bug #24242 (New): tcmalloc::ThreadCache::ReleaseToCentralCache on rhel (w/ centos packages)
Patrick Donnelly
09:36 PM Bug #24339 (New): FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in sca...
Patrick Donnelly
09:36 PM Bug #23965 (New): FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cach...
Patrick Donnelly
09:36 PM Bug #23857 (New): flush (manifest) vs async recovery causes out of order op
Patrick Donnelly
09:36 PM Bug #23879 (New): test_mon_osdmap_prune.sh fails
Patrick Donnelly
09:36 PM Bug #23828 (New): ec gen object leaks into different filestore collection just after split
Patrick Donnelly
09:36 PM Bug #23760 (New): mon: `config get <who>` does not allow `who` as 'mon'/'osd'
Patrick Donnelly
09:36 PM Bug #23767 (New): "ceph ping mon" doesn't work
Patrick Donnelly
09:36 PM Bug #23270 (New): failed mutex assert in PipeConnection::try_get_pipe() (via OSD::do_command())
Patrick Donnelly
09:36 PM Bug #23428 (New): Snapset inconsistency is hard to diagnose because authoritative copy used by li...
Patrick Donnelly
09:36 PM Bug #23029 (New): osd does not handle eio on meta objects (e.g., osdmap)
Patrick Donnelly
09:36 PM Bug #22656 (New): scrub mismatch on bytes (cache pools)
Patrick Donnelly
09:36 PM Bug #21592 (New): LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
Patrick Donnelly
09:36 PM Bug #21495 (New): src/osd/OSD.cc: 346: FAILED assert(piter != rev_pending_splits.end())
Patrick Donnelly
09:36 PM Bug #21129 (New): 'ceph -s' hang
Patrick Donnelly
09:36 PM Bug #21194 (New): mon clock skew test is fragile
Patrick Donnelly
09:36 PM Bug #20960 (New): ceph_test_rados: mismatched version (due to pg import/export)
Patrick Donnelly
09:35 PM Bug #20952 (New): Glitchy monitor quorum causes spurious test failure
Patrick Donnelly
09:35 PM Bug #20922 (New): misdirected op with localize_reads set
Patrick Donnelly
09:35 PM Bug #20846 (New): ceph_test_rados_list_parallel: options dtor racing with DispatchQueue lockdep -...
Patrick Donnelly
09:35 PM Bug #20770 (New): test_pidfile.sh test is failing 2 places
Patrick Donnelly
09:35 PM Bug #20730 (New): need new OSD_SKEWED_USAGE implementation
Patrick Donnelly
09:35 PM Bug #20370 (New): leaked MOSDOp via PrimaryLogPG::_copy_some and PrimaryLogPG::do_proxy_write
Patrick Donnelly
09:35 PM Bug #20646 (New): run_seed_to_range.sh: segv, tp_fstore_op timeout
Patrick Donnelly
09:35 PM Bug #20360 (New): rados/verify valgrind tests: osds fail to start (xenial valgrind)
Patrick Donnelly
09:35 PM Bug #20369 (New): segv in OSD::ShardedOpWQ::_process
Patrick Donnelly
09:35 PM Bug #20221 (New): kill osd + osd out leads to stale PGs
Patrick Donnelly
09:35 PM Bug #20169 (New): filestore+btrfs occasionally returns ENOSPC
Patrick Donnelly
09:35 PM Bug #20053 (New): crush compile / decompile looses precision on weight
Patrick Donnelly
09:35 PM Bug #19700 (New): OSD remained up despite cluster network being inactive?
Patrick Donnelly
09:35 PM Bug #19486 (New): Rebalancing can propagate corrupt copy of replicated object
Patrick Donnelly
09:35 PM Bug #19518 (New): log entry does not include per-op rvals?
Patrick Donnelly
09:35 PM Bug #19440 (New): osd: trims maps taht pgs haven't consumed yet when there are gaps
Patrick Donnelly
09:35 PM Bug #17257 (New): ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
Patrick Donnelly
09:35 PM Bug #15015 (New): prepare_new_pool doesn't return failure string ss
Patrick Donnelly
09:35 PM Bug #14115 (New): crypto: race in nss init
Patrick Donnelly
09:35 PM Bug #13385 (New): cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final ro...
Patrick Donnelly
09:35 PM Bug #12687 (New): osd thrashing + pg import/export can cause maybe_went_rw intervals to be missed
Patrick Donnelly
09:35 PM Bug #12615 (New): Repair of Erasure Coded pool with an unrepairable object causes pg state to los...
Patrick Donnelly
09:35 PM Bug #11235 (New): test_rados.py test_aio_read is racy
Patrick Donnelly
09:35 PM Bug #9606 (New): mon: ambiguous error_status returned to user when type is wrong in a command
Patrick Donnelly
08:31 PM Bug #43151 (Fix Under Review): ok-to-stop incorrect for some ec pgs
Sage Weil
04:33 PM Bug #43151 (Resolved): ok-to-stop incorrect for some ec pgs
before,... Sage Weil
08:16 PM Backport #43119: mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32000
merged
Yuri Weinstein
08:01 PM Backport #41238: nautilus: Implement mon_memory_target
Follow-on fix: https://github.com/ceph/ceph/pull/32045 Neha Ojha
08:00 PM Feature #40870: Implement mon_memory_target
This has a follow-on fix: https://github.com/ceph/ceph/pull/32044 Neha Ojha
06:01 PM Bug #38040: osd_map_message_max default is too high?
Luminous backport analysis:
* https://github.com/ceph/ceph/pull/26340 - two of three commits backported to luminou...
Nathan Cutler
05:50 PM Bug #43150 (In Progress): osd-scrub-snaps.sh fails
David Zafman
05:21 PM Bug #43150: osd-scrub-snaps.sh fails
During testing I saw this even though it isn't what happened in the teuthology runs. I think in all cases we have sc... David Zafman
03:51 PM Bug #43150 (Resolved): osd-scrub-snaps.sh fails
/a/sage-2019-12-04_19:33:15-rados-wip-sage2-testing-2019-12-04-0856-distro-basic-smithi/4567061
/a/sage-2019-12-04_1...
Sage Weil
05:41 PM Bug #43106: mimic: crash in build_incremental_map_msg
The three PRs that need to be backported to mimic are:
* https://github.com/ceph/ceph/pull/26340 - backported to m...
Nathan Cutler
01:41 PM Backport #43140 (In Progress): nautilus: ceph-mon --mkfs: public_address type (v1|v2) is not resp...
Nathan Cutler
11:07 AM Backport #43140 (Resolved): nautilus: ceph-mon --mkfs: public_address type (v1|v2) is not respected
https://github.com/ceph/ceph/pull/32028 Nathan Cutler
01:34 PM Bug #42485: verify_upmaps can not cancel invalid upmap_items in some cases
NOTE: https://github.com/ceph/ceph/pull/31131 was merged to master and backported to nautilus and luminous, before it... Nathan Cutler
04:04 AM Bug #42485 (Resolved): verify_upmaps can not cancel invalid upmap_items in some cases
David Zafman
01:32 PM Backport #42547: nautilus: verify_upmaps can not cancel invalid upmap_items in some cases
NOTE: reverted by https://github.com/ceph/ceph/pull/32018 Nathan Cutler
01:30 PM Backport #42548: luminous: verify_upmaps can not cancel invalid upmap_items in some cases
Note: reverted by https://github.com/ceph/ceph/pull/32019 Nathan Cutler
07:52 AM Bug #42906 (Pending Backport): ceph-mon --mkfs: public_address type (v1|v2) is not respected
Kefu Chai
06:22 AM Bug #37968 (Resolved): maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
David Zafman
06:21 AM Backport #38163 (Resolved): mimic: maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
David Zafman
04:04 AM Backport #42546 (Rejected): mimic: verify_upmaps can not cancel invalid upmap_items in some cases
This change has been reverted so we won't backport. David Zafman
12:43 AM Bug #43124: Probably legal crush rules cause upmaps to be cleaned

We are reverting the original pull request which changed verify_upmaps(): https://github.com/ceph/ceph/pull/31131
...
David Zafman

12/04/2019

08:51 PM Bug #43126 (Resolved): OSD_SLOW_PING_TIME_BACK nits

From Sage e-mail:
Long heartbeat ping times on back interface seen, longest is 1315.510 msec (OSD_SLOW_PING_TIME...
David Zafman
08:46 PM Bug #43124 (Resolved): Probably legal crush rules cause upmaps to be cleaned
I've seen multiple user sites with crush rules for EC pools which will trigger the verify_upmap() to detect an error.... David Zafman
08:24 PM Backport #42546 (In Progress): mimic: verify_upmaps can not cancel invalid upmap_items in some cases
David Zafman
08:13 PM Backport #42546 (Resolved): mimic: verify_upmaps can not cancel invalid upmap_items in some cases
David Zafman
12:13 PM Bug #38330: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
@Dan, @Neha - mimic backport staged at https://github.com/ceph/ceph/pull/26448 Nathan Cutler
02:33 AM Bug #38330 (Pending Backport): osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
Based on https://tracker.ceph.com/issues/43106#note-1 and https://tracker.ceph.com/issues/38282#note-14 Neha Ojha
12:11 PM Backport #43119 (In Progress): mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map...
Nathan Cutler
12:08 PM Backport #43119 (Resolved): mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
https://github.com/ceph/ceph/pull/32000 Nathan Cutler
02:30 AM Bug #43106: mimic: crash in build_incremental_map_msg
I think you are right. We should have backported all three PRs according to https://tracker.ceph.com/issues/38040#not... Neha Ojha

12/03/2019

07:37 PM Bug #43110 (Duplicate): rados/test.sh failure: ceph_test_rados_api_watch_notify_pp
https://tracker.ceph.com/issues/42933 Greg Farnum
07:27 PM Bug #43110: rados/test.sh failure: ceph_test_rados_api_watch_notify_pp
Neha pointed out the core info is obviously helpful:
> 1575385919.6406.core: ELF 64-bit LSB core file x86-64, vers...
Greg Farnum
07:18 PM Bug #43110 (Duplicate): rados/test.sh failure: ceph_test_rados_api_watch_notify_pp
I noticed this in a branch of my own, but it appears to be showing up in the master smoke tests too.
rados/test.sh...
Greg Farnum
05:25 PM Backport #43093: luminous: Improve OSDMap::calc_pg_upmaps() efficiency
@David Does this need https://github.com/ceph/ceph/pull/31944 as well? Nathan Cutler
05:05 PM Backport #43093 (In Progress): luminous: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman
03:40 PM Bug #43106 (Resolved): mimic: crash in build_incremental_map_msg
Since upgrading from 13.2.6 to 13.2.7 we get this around once per 10 minutes on a cluster with 500 out of 1500 OSDs u... Dan van der Ster
03:27 PM Bug #38330: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
https://tracker.ceph.com/issues/38282 was backported to mimic in 13.2.7.
Does this need a backport also?
(we ha...
Dan van der Ster
09:53 AM Bug #42961: osd: increase priority in certain OSD perf counters
Neha Ojha wrote:
> Ernesto, while we are at it, are there any other specific stats that you've gotten requests for?
...
Ernesto Puerta
02:00 AM Bug #42961 (Fix Under Review): osd: increase priority in certain OSD perf counters
Ernesto, while we are at it, are there any other specific stats that you've gotten requests for? Neha Ojha
09:13 AM Backport #43099 (Resolved): nautilus: nautilus:osd: network numa affinity not supporting subnet port
https://github.com/ceph/ceph/pull/32843 Nathan Cutler
02:53 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
I think we can dispense with the session put when we call 'remove_session' since we call it when we replace the sessi... Brad Hubbard
01:17 AM Backport #43094 (In Progress): mimic: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman
01:15 AM Backport #43092 (In Progress): nautilus: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman

12/02/2019

11:30 PM Bug #42346 (In Progress): Nearfull warnings are incorrect

Spurious nearfull warnings caused by backfill reservation mechanism during rebalancing. The nearfull ratio was com...
David Zafman
11:23 PM Bug #42718: Improve OSDMap::calc_pg_upmaps() efficiency
https://github.com/ceph/ceph/pull/31944 is a follow-on fix for https://github.com/ceph/ceph/pull/31774 Neha Ojha
09:56 PM Bug #42718 (Pending Backport): Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman
09:59 PM Backport #43094 (Resolved): mimic: Improve OSDMap::calc_pg_upmaps() efficiency
https://github.com/ceph/ceph/pull/31957 David Zafman
09:58 PM Backport #43093 (Resolved): luminous: Improve OSDMap::calc_pg_upmaps() efficiency
https://github.com/ceph/ceph/pull/31992 David Zafman
09:58 PM Backport #43092 (Resolved): nautilus: Improve OSDMap::calc_pg_upmaps() efficiency
https://github.com/ceph/ceph/pull/31956 David Zafman
06:56 PM Bug #42411 (Pending Backport): nautilus:osd: network numa affinity not supporting subnet port
Sage Weil
05:54 AM Bug #41313: PG distribution completely messed up since Nautilus
This happens with active PG balancer if the cluster is in WARN state.
...
51 hdd 9.09470 1.00000 9.1 TiB 5.8 ...
Anonymous
02:11 AM Bug #42102: use-after-free in Objecter timer handing
... Sage Weil

12/01/2019

01:23 AM Bug #43067 (New): Git Master: src/compressor/zlib/ZlibCompressor.cc / src/compressor/zlib/CMakeLi...
When Ceph is built without support for CPU feature SSE4_1 (HAVE_INTEL_SSE4_1), the CMake build system does not link ... Lee Leahu

11/29/2019

06:01 PM Bug #42780: recursive lock of OpTracker::lock (70)
I will be working on this bug after returning from PTO (ETA: 16 Dec 2019). Radoslaw Zarzynski
05:41 PM Bug #42780: recursive lock of OpTracker::lock (70)
The problem comes from OSD::get_health_metrics(), where the visitor lambda is holding the lock and also drops a refer... Sage Weil

11/27/2019

08:40 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
https://www.spinics.net/lists/ceph-users/msg54910.html - could also be related. Neha Ojha
08:36 PM Bug #43048 (Won't Fix - EOL): nautilus: upgrade/mimic-x/stress-split: failed to recover before ti...
... Neha Ojha
04:52 PM Bug #24419: ceph-objectstore-tool unable to open mon store
To be clear, this isn't an issue in mimic or later releases. Josh Durgin
04:50 PM Bug #24419 (Won't Fix): ceph-objectstore-tool unable to open mon store
It looks like this is due to bluestore setting the rocksdb_db_paths config option in luminous. This causes the ceph-o... Josh Durgin

11/26/2019

10:18 PM Bug #42978 (Resolved): ops waiting for lock not requeued; client sees misordering
Sage Weil
10:16 PM Bug #42012 (Resolved): mon osd_snap keys grow unbounded
Sage Weil
09:13 AM Backport #42258 (In Progress): mimic: document new option mon_max_pg_per_osd
Nathan Cutler
05:18 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Theory:
In Monitor::_ms_dispatch() when we detect a feature change we end up with the following sequence.
Monit...
Brad Hubbard
03:45 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
... Brad Hubbard
03:20 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
I'm wondering if maybe this happens due to the feature change and the session being removed during the upgrade proces... Brad Hubbard

11/25/2019

07:32 PM Bug #42012 (Fix Under Review): mon osd_snap keys grow unbounded
Sage Weil
07:29 PM Bug #42012 (In Progress): mon osd_snap keys grow unbounded
Okay, in octopus, there are now 2 sets of keys
- purged_snap_*: map intervals of snaps that are purged. adjacent r...
Sage Weil
07:16 PM Bug #42978 (Fix Under Review): ops waiting for lock not requeued; client sees misordering
Sage Weil
03:35 PM Feature #39066 (Resolved): src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:58 PM Feature #39066: src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
Rejecting luminous backport - luminous is EOL. Nathan Cutler
03:33 PM Bug #40910 (Resolved): mon/OSDMonitor.cc: better error message about min_size
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:57 PM Bug #40910: mon/OSDMonitor.cc: better error message about min_size
Rejecting luminous backport - luminous is EOL. Nathan Cutler
03:33 PM Bug #41017 (Resolved): Change default for bluestore_fsck_on_mount_deep as false
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:56 PM Bug #41017: Change default for bluestore_fsck_on_mount_deep as false
Rejecting luminous backport - luminous is EOL. Nathan Cutler
03:29 PM Bug #42933 (Rejected): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify2/1
we reverted, see https://github.com/ceph/ceph/pull/31790 Sage Weil
03:23 PM Backport #38205 (In Progress): luminous: osds allows to partially start more than N+2
Nathan Cutler
03:19 PM Backport #40947 (In Progress): luminous: Better default value for osd_snap_trim_sleep
Nathan Cutler
03:13 PM Backport #41730 (Need More Info): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(pe...
Nathan Cutler
03:05 PM Backport #41730 (In Progress): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_...
Nathan Cutler
02:57 PM Backport #39381 (Rejected): luminous: src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
Nathan Cutler
02:57 PM Backport #40941 (Rejected): luminous: mon/OSDMonitor.cc: better error message about min_size
Nathan Cutler
02:56 PM Backport #41085 (Rejected): luminous: Change default for bluestore_fsck_on_mount_deep as false
Nathan Cutler
09:44 AM Backport #42998 (Resolved): mimic: acting_recovery_backfill won't catch all up peers
https://github.com/ceph/ceph/pull/33324 Nathan Cutler
09:43 AM Backport #42997 (Resolved): nautilus: acting_recovery_backfill won't catch all up peers
https://github.com/ceph/ceph/pull/32064 Nathan Cutler
09:43 AM Backport #42996 (Rejected): luminous: acting_recovery_backfill won't catch all up peers
https://github.com/ceph/ceph/pull/33326 Nathan Cutler
06:56 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
I'm interested to see more cores in case one sheds more light. I've started a few runs in the hope they will fail. Brad Hubbard
05:14 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
In both cases so far the Message type appears to be MSG_MON_PAXOS and priority is CEPH_MSG_PRIO_HIGH.... Brad Hubbard
03:40 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Neha sent me another instance of this issue available at http://pulpito.ceph.com/nojha-2019-11-22_18:41:03-rados:upgr... Brad Hubbard
12:58 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
... Brad Hubbard
05:27 AM Bug #42971: mgr hangs with upmap balancer
So I wrote my own upmap balancer this weekend and after running it for a bit I found the same problem. It appears th... Bryan Stillwell
04:28 AM Backport #42662 (In Progress): nautilus:Issue a HEALTH_WARN when a Pool is configured with [min_]...
Sridhar Seshasayee

11/24/2019

06:17 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
/a/sage-2019-11-24_06:32:18-rados-wip-sage-testing-2019-11-23-2031-distro-basic-smithi/4538572... Sage Weil
04:58 PM Bug #42577 (Pending Backport): acting_recovery_backfill won't catch all up peers
Kefu Chai
04:57 PM Bug #42782: nautilus: rados/test_librados_build.sh build failure
https://github.com/ceph/ceph/pull/31693 Kefu Chai

11/22/2019

10:22 PM Bug #42975 (Duplicate): out of order ops in rados/upgrade/nautilus-x-singleton
Neha Ojha
05:49 PM Bug #42975: out of order ops in rados/upgrade/nautilus-x-singleton
Another out of order bug https://tracker.ceph.com/issues/42328. Neha Ojha
05:47 PM Bug #42975 (Duplicate): out of order ops in rados/upgrade/nautilus-x-singleton
... Neha Ojha
09:36 PM Bug #42968 (Duplicate): TestClsRbd.mirror_image_status failure during luminous->nautilus upgrade
Duplicating to https://tracker.ceph.com/issues/42891 as it's the same issue. Jason Dillaman
03:06 PM Bug #42968 (Duplicate): TestClsRbd.mirror_image_status failure during luminous->nautilus upgrade
Run: http://pulpito.ceph.com/teuthology-2019-11-22_02:25:03-upgrade:luminous-x-nautilus-distro-basic-smithi/
Jobs:'4...
Yuri Weinstein
07:44 PM Bug #42978: ops waiting for lock not requeued; client sees misordering
reproduces with suite: rados:upgrade:nautilus-x-singleton
filter: '0-cluster/{openstack.yaml start.yaml} 1-install...
Neha Ojha
07:09 PM Bug #42978: ops waiting for lock not requeued; client sees misordering
ok, 99% sure the problem is this bit of code in release_object_locks()... Sage Weil
07:05 PM Bug #42978 (Resolved): ops waiting for lock not requeued; client sees misordering
a ceph_test_rados sequence of ops comes in, but replies go back out of order... Sage Weil
06:44 PM Bug #42977 (Resolved): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
... Neha Ojha
04:37 PM Bug #42971: mgr hangs with upmap balancer
We are using device classes. Bryan Stillwell
04:28 PM Bug #42971: mgr hangs with upmap balancer
Hey Bryan, David's been fixing a couple issues in the balancer that sound like what you're running into:
1) https:...
Josh Durgin
04:15 PM Bug #42971 (New): mgr hangs with upmap balancer
On multiple clusters we are seeing the mgr hang frequently when the balancer is enabled. It seems that the balancer ... Bryan Stillwell
01:24 PM Bug #42964 (Resolved): monitor config store: Deleting logging config settings does not decrease l...
How to reproduce:
1. increase log level of mds:
ceph config set mds debug_mds 10/10
2. try to revert this:
ce...
Марк Коренберг
11:22 AM Bug #42477: Rados should use the '-o outfile' convention
@Nathan, I think that's the right decision in this case mate. It should be less disruptive hopefully. Brad Hubbard
08:45 AM Bug #42477: Rados should use the '-o outfile' convention
@Brad - got it, thanks. So, the issue is fixed as of Octopus and the fix will not be backported for the reason you st... Nathan Cutler
08:44 AM Bug #42477 (Resolved): Rados should use the '-o outfile' convention
Nathan Cutler
11:16 AM Bug #42961 (Resolved): osd: increase priority in certain OSD perf counters
There are reports from users missing stats in dashboard/prometheus mgr modules about the following perf counters:
<p...
Ernesto Puerta

11/21/2019

03:56 PM Bug #42933 (Rejected): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify2/1
... Sage Weil
03:23 PM Bug #42918: memory corruption and lockups with I-Object
I managed to grab the stack traces from when it locks up instead of crashing -- also around watch/notify in the face ... Ilya Dryomov
03:18 PM Bug #42918: memory corruption and lockups with I-Object
Ilya Dryomov wrote:
> Haven't tried without failure injection yet, but it's probably related to ms_inject_socket_fai...
Ilya Dryomov
02:37 PM Bug #42918: memory corruption and lockups with I-Object
one segfault related to watch/notify is fixed in https://github.com/ceph/ceph/pull/31768, but testing in the rgw suit... Casey Bodley
01:31 PM Bug #42918: memory corruption and lockups with I-Object
Haven't tried without failure injection yet, but it's probably related to ms_inject_socket_failures (and resulting wa... Ilya Dryomov
01:28 PM Bug #42918: memory corruption and lockups with I-Object
@Ilya: does it reproduce when you have injected socket failures disabled? From your initial logs and from the backtra... Jason Dillaman
01:25 PM Bug #42918: memory corruption and lockups with I-Object
Excellent sleuthing -- thanks! I am going to bump this over to the RADOS project since I can't see how this is purely... Jason Dillaman
11:56 AM Bug #42918: memory corruption and lockups with I-Object
Got actionable stack traces on 669453138d89:... Ilya Dryomov
10:42 AM Bug #42918: memory corruption and lockups with I-Object
Looks real and seems to be introduced with I-Object: no issues with 36f5fcbb97eb ("Merge PR #31672 into master") and ... Ilya Dryomov
05:26 AM Bug #42718: Improve OSDMap::calc_pg_upmaps() efficiency

The rules based pool groups being passed to calc_pg_upmaps() is a better method, so we don't want to revert.
try...
David Zafman
01:58 AM Backport #41531 (Resolved): nautilus: Move bluefs alloc size initialization log message to log le...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30229
m...
Nathan Cutler
 
