Activity

From 11/10/2019 to 12/09/2019

12/09/2019

10:07 PM Bug #43067: Git Master: src/compressor/zlib/ZlibCompressor.cc / src/compressor/zlib/CMakeLists.txt
Thanks Lee!
We generally do patch contributions through Github; can you submit a PR there?
If not, we need a spec...
Greg Farnum
09:53 PM Bug #43176 (Duplicate): pgs inconsistent, union_shard_errors=missing
Nathan Cutler
09:53 PM Bug #43175 (Duplicate): pgs inconsistent, union_shard_errors=missing
Nathan Cutler
09:35 PM Bug #43151 (Pending Backport): ok-to-stop incorrect for some ec pgs
Sage Weil
04:58 PM Bug #43189 (Fix Under Review): pgs stuck in laggy state
Sage Weil
03:15 PM Bug #43189: pgs stuck in laggy state
The problem is the role. The proc_lease() method does this check... Sage Weil
02:33 PM Bug #43189 (In Progress): pgs stuck in laggy state
Sage Weil
04:50 PM Bug #43213 (New): OSDMap::pg_to_up_acting etc specify primary as osd, not pg_shard_t(osd+shard)
The OSD methods to map a PG return primary as an int, not pg_shard_t (osd + shard).
Objecter compensates for this ...
Sage Weil
04:06 PM Bug #40963: mimic: MQuery during Deleting state
/a/sage-2019-12-08_05:43:33-rados-nautilus-distro-basic-smithi/4580545 Neha Ojha
12:59 PM Backport #40890: mimic: Pool settings aren't populated to OSD after restart.
Here's my attempt at the backport: https://github.com/ceph/ceph/pull/32125 Dan van der Ster
12:53 PM Backport #40891: nautilus: Pool settings aren't populated to OSD after restart.
Here's my attempt at the backport: https://github.com/ceph/ceph/pull/32123 Dan van der Ster
08:55 AM Bug #43193 (Rejected): "ceph ping mon.<id>" cannot work
The command "ceph ping mon.<id>" returns an error output:... Min Shi
06:35 AM Bug #42706: LibRadosList.EnumerateObjectsSplit fails
rados_cluster handler will be freed if set_pg_num failed,... huang jun
03:35 AM Bug #42861: Libceph-common.so needs to use private link attribute when including dpdk static library
The dpdk library initializes the EAL using constructors and global
variables, and cannot be re-initialized. Both tes...
chunsong feng

12/08/2019

11:22 PM Bug #43190 (New): qa/standalone/osd/osd-recovery-prio.sh has a race
http://pulpito.ceph.com/dzafman-2019-12-08_11:51:45-rados-master-distro-basic-smithi/4582053/
The test expected ...
David Zafman
09:25 PM Bug #43189: pgs stuck in laggy state
more logs here:
/a/sage-2019-12-07_18:31:18-rados:thrash-erasure-code-wip-sage3-testing-2019-12-05-0959-distro-basic...
Sage Weil
09:23 PM Bug #43189 (Resolved): pgs stuck in laggy state
... Sage Weil

12/07/2019

06:28 PM Bug #43150 (Resolved): osd-scrub-snaps.sh fails
Sage Weil
02:47 PM Bug #41313: PG distribution completely messed up since Nautilus
ceph balancer status
{
"active": true,
"plans": [],
"mode": "upmap"
}
bad distribution:
...
Anonymous
02:45 PM Bug #43185: ceph -s not showing client activity
ceph -s only looks like this:
ceph -s
cluster:
id: c4068f25-d46d-438d-af63-5679a2d56efb
health: H...
Anonymous
02:44 PM Bug #43185 (Resolved): ceph -s not showing client activity
Since Nautilus upgrade ceph -s often (2 out of 3 times) does not show any client or recovery activity. Right now it's... Anonymous

12/06/2019

05:21 PM Bug #42964 (Fix Under Review): monitor config store: Deleting logging config settings does not de...
Sage Weil
04:07 PM Bug #42347: nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_flight_list.back(...
Seen in this scrub test run during osd-scrub-repair.sh.
http://pulpito.ceph.com/dzafman-2019-12-05_19:53:40-rados-...
David Zafman
02:01 PM Bug #43176 (Duplicate): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
02:01 PM Bug #43175 (Duplicate): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
02:01 PM Bug #43174 (Resolved): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
02:00 PM Bug #43173 (Duplicate): pgs inconsistent, union_shard_errors=missing
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
...
Aleksandr Rudenko
12:55 PM Backport #42997 (In Progress): nautilus: acting_recovery_backfill won't catch all up peers
Nathan Cutler
12:48 PM Backport #42878 (In Progress): nautilus: ceph_test_admin_socket_output fails in rados qa suite
Nathan Cutler
12:48 PM Backport #42853 (In Progress): nautilus: format error: ceph osd stat --format=json
Nathan Cutler
12:47 PM Backport #42847 (Need More Info): mimic: "failing miserably..." in Infiniband.cc
non-trivial Nathan Cutler
12:47 PM Backport #42848 (Need More Info): nautilus: "failing miserably..." in Infiniband.cc
non-trivial Nathan Cutler
04:23 AM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
Oops. I think the more significant issue is that short_pg_log.yaml isn't involved. David Zafman
02:09 AM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
David Zafman wrote:
> Seen in a non-upgrade test:
This is an upgrade test: "rados/upgrade/jewel-x-singleton/{0-c...
Neha Ojha
02:00 AM Bug #38069: upgrade:jewel-x-luminous with short_pg_log.yaml fails with assert(s <= can_rollback_to)
Seen in a -non-upgrade- test with description:
rados/upgrade/jewel-x-singleton/{0-cluster/{openstack.yaml start.ya...
David Zafman

12/05/2019

11:28 PM Bug #41240 (Can't reproduce): All of the cluster SSDs aborted at around the same time and will no...
Brad Hubbard
09:37 PM Bug #41240 (New): All of the cluster SSDs aborted at around the same time and will not start.
Patrick Donnelly
11:24 PM Bug #38892 (Closed): /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Segmentation...
Brad Hubbard
09:45 PM Bug #38892 (Fix Under Review): /ceph/src/tools/kvstore_tool.cc:266:1: internal compiler error: Se...
Patrick Donnelly
09:44 PM Bug #23590 (Fix Under Review): kstore: statfs: (95) Operation not supported
Patrick Donnelly
09:44 PM Bug #23297 (Fix Under Review): mon-seesaw 'failed to become clean before timeout' due to laggy pg...
Patrick Donnelly
09:43 PM Bug #13111 (Fix Under Review): replicatedPG:the assert occurs in the fuction ReplicatedPG::on_loc...
Patrick Donnelly
09:40 PM Feature #38653 (New): Enhance health message when pool quota fills up
Patrick Donnelly
09:40 PM Bug #38783 (New): Changing mon_pg_warn_max_object_skew has no effect.
Patrick Donnelly
09:40 PM Feature #3764 (New): osd: async replicas
Patrick Donnelly
09:37 PM Bug #43048 (New): nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
Patrick Donnelly
09:37 PM Bug #42918 (New): memory corruption and lockups with I-Object
Patrick Donnelly
09:37 PM Bug #42780 (New): recursive lock of OpTracker::lock (70)
Patrick Donnelly
09:37 PM Bug #42706 (New): LibRadosList.EnumerateObjectsSplit fails
Patrick Donnelly
09:37 PM Bug #42666 (New): mgropen from mgr comes from unknown.$id instead of mgr.$id
Patrick Donnelly
09:37 PM Bug #42186 (New): "2019-10-04T19:31:51.053283+0000 osd.7 (osd.7) 108 : cluster [ERR] 2.5s0 shard ...
Patrick Donnelly
09:37 PM Bug #41406 (New): common: SafeTimer reinit doesn't fix up "stopping" bool, used in MonClient boot...
Patrick Donnelly
09:37 PM Bug #40963 (New): mimic: MQuery during Deleting state
Patrick Donnelly
06:31 PM Bug #40963: mimic: MQuery during Deleting state
yuriw-2019-12-04_22:44:10-rados-wip-yuri2-testing-2019-12-04-1938-mimic-distro-basic-smithi/4567200/
DeleteStart e...
David Zafman
09:37 PM Bug #40868 (New): src/common/config_proxy.h: 70: FAILED ceph_assert(p != obs_call_gate.end())
Patrick Donnelly
09:37 PM Bug #40820 (New): standalone/scrub/osd-scrub-test.sh +3 day failed assert
Patrick Donnelly
09:37 PM Bug #40666 (New): osd fails to get latest map
Patrick Donnelly
09:37 PM Fix #40564 (New): Objecter does not have perfcounters for op latency
Patrick Donnelly
09:37 PM Bug #40522 (New): on_local_recover doesn't touch?
Patrick Donnelly
09:37 PM Bug #40454 (New): snap_mapper error, scrub gets r -2..repaired
Patrick Donnelly
09:37 PM Bug #40521 (New): cli timeout (e.g., ceph pg dump)
Patrick Donnelly
09:37 PM Bug #40367 (New): "*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-nautilus
Patrick Donnelly
09:37 PM Bug #40410 (New): ceph pg query Segmentation fault in 12.2.10
Patrick Donnelly
09:36 PM Feature #39966 (New): mon: allow log messages to be throttled and/or force trimming
Patrick Donnelly
09:36 PM Bug #40000 (New): osds do not bound xattrs and/or aggregate xattr data in pg log
Patrick Donnelly
09:36 PM Bug #39366 (New): ClsLock.TestRenew failure
Patrick Donnelly
09:36 PM Bug #39145 (New): luminous: jewel-x-singleton: FAILED assert(0 == "we got a bad state machine eve...
Patrick Donnelly
09:36 PM Bug #39148 (New): luminous: powercycle: reached maximum tries (500) after waiting for 3000 seconds
Patrick Donnelly
09:36 PM Bug #39039 (New): mon connection reset, command not resent
Patrick Donnelly
09:36 PM Fix #39071 (New): monclient: initial probe is non-optimal with v2+v1
Patrick Donnelly
09:36 PM Bug #38656 (New): scrub reservation leak?
Patrick Donnelly
09:36 PM Bug #38718 (New): 'osd crush weight-set create-compat' (and other OSDMonitor commands) can leak u...
Patrick Donnelly
09:36 PM Bug #38624 (New): crush: get_rule_weight_osd_map does not handle multi-take rules
Patrick Donnelly
09:36 PM Bug #38513 (New): luminous: "AsyncReserver.h: 190: FAILED assert(!queue_pointers.count(item) && !...
Patrick Donnelly
09:36 PM Bug #38402 (New): ceph-objectstore-tool on down osd w/ not enough in osds
Patrick Donnelly
09:36 PM Bug #38417 (New): ceph tell mon.a help timeout
Patrick Donnelly
09:36 PM Bug #38357 (New): ClsLock.TestExclusiveEphemeralStealEphemeral failed
Patrick Donnelly
09:36 PM Bug #38358 (New): short pg log + cache tier ceph_test_rados out of order reply
Patrick Donnelly
09:36 PM Bug #38195 (New): osd-backfill-space.sh exposes rocksdb hang
Patrick Donnelly
09:36 PM Bug #38345 (New): mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Patrick Donnelly
09:36 PM Bug #38184 (New): osd: recovery does not preserve copy-on-write allocations between object clones...
Patrick Donnelly
09:36 PM Bug #38159 (New): ec does not recover below min_size
Patrick Donnelly
09:36 PM Bug #38172 (New): segv in rocksdb NewIterator
Patrick Donnelly
09:36 PM Bug #38151 (New): cephx: service ticket validity dobuled
Patrick Donnelly
09:36 PM Bug #38082 (New): mimic: mon/caps.sh fails with "Expected return 0, got 110"
Patrick Donnelly
09:36 PM Bug #38064 (New): librados::OPERATION_FULL_TRY not completely implemented, test LibRadosAio.PoolQ...
Patrick Donnelly
09:36 PM Bug #37582 (New): luminous: ceph -s client gets all mgrmaps
Patrick Donnelly
09:36 PM Bug #37532 (New): mon: expected_num_objects warning triggers on bluestore-only setups
Patrick Donnelly
09:36 PM Bug #37509 (New): require past_interval bounds mismatch due to osd oldest_map
Patrick Donnelly
09:36 PM Bug #36748 (New): ms_deliver_verify_authorizer no AuthAuthorizeHandler found for protocol 0
Patrick Donnelly
09:36 PM Bug #37289 (New): Issue with overfilled OSD for cache-tier pools
Patrick Donnelly
09:36 PM Bug #36634 (New): LibRadosWatchNotify.WatchNotify2Timeout failure
Patrick Donnelly
09:36 PM Bug #36337 (New): OSDs crash with failed assertion in PGLog::merge_log as logs do not overlap
Patrick Donnelly
09:36 PM Bug #36164 (New): cephtool/test fails 'ceph tell mon.a help' with EINTR
Patrick Donnelly
09:36 PM Bug #36113 (New): fusestore test umount failed?
Patrick Donnelly
09:36 PM Bug #35075 (New): copy-get stuck sending osd_op
Patrick Donnelly
09:36 PM Bug #36040 (New): mon: Valgrind: mon (InvalidFree, InvalidWrite, InvalidRead)
Patrick Donnelly
09:36 PM Bug #24874 (New): ec fast reads can trigger read errors in log
Patrick Donnelly
09:36 PM Bug #26891 (New): backfill reservation deadlock/stall
Patrick Donnelly
09:36 PM Bug #24242 (New): tcmalloc::ThreadCache::ReleaseToCentralCache on rhel (w/ centos packages)
Patrick Donnelly
09:36 PM Bug #24339 (New): FULL_FORCE ops are dropped if fail-safe full check fails, but not resent in sca...
Patrick Donnelly
09:36 PM Bug #23965 (New): FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cach...
Patrick Donnelly
09:36 PM Bug #23857 (New): flush (manifest) vs async recovery causes out of order op
Patrick Donnelly
09:36 PM Bug #23879 (New): test_mon_osdmap_prune.sh fails
Patrick Donnelly
09:36 PM Bug #23828 (New): ec gen object leaks into different filestore collection just after split
Patrick Donnelly
09:36 PM Bug #23760 (New): mon: `config get <who>` does not allow `who` as 'mon'/'osd'
Patrick Donnelly
09:36 PM Bug #23767 (New): "ceph ping mon" doesn't work
Patrick Donnelly
09:36 PM Bug #23270 (New): failed mutex assert in PipeConnection::try_get_pipe() (via OSD::do_command())
Patrick Donnelly
09:36 PM Bug #23428 (New): Snapset inconsistency is hard to diagnose because authoritative copy used by li...
Patrick Donnelly
09:36 PM Bug #23029 (New): osd does not handle eio on meta objects (e.g., osdmap)
Patrick Donnelly
09:36 PM Bug #22656 (New): scrub mismatch on bytes (cache pools)
Patrick Donnelly
09:36 PM Bug #21592 (New): LibRadosCWriteOps.CmpExt got 0 instead of -4095-1
Patrick Donnelly
09:36 PM Bug #21495 (New): src/osd/OSD.cc: 346: FAILED assert(piter != rev_pending_splits.end())
Patrick Donnelly
09:36 PM Bug #21129 (New): 'ceph -s' hang
Patrick Donnelly
09:36 PM Bug #21194 (New): mon clock skew test is fragile
Patrick Donnelly
09:36 PM Bug #20960 (New): ceph_test_rados: mismatched version (due to pg import/export)
Patrick Donnelly
09:35 PM Bug #20952 (New): Glitchy monitor quorum causes spurious test failure
Patrick Donnelly
09:35 PM Bug #20922 (New): misdirected op with localize_reads set
Patrick Donnelly
09:35 PM Bug #20846 (New): ceph_test_rados_list_parallel: options dtor racing with DispatchQueue lockdep -...
Patrick Donnelly
09:35 PM Bug #20770 (New): test_pidfile.sh test is failing 2 places
Patrick Donnelly
09:35 PM Bug #20730 (New): need new OSD_SKEWED_USAGE implementation
Patrick Donnelly
09:35 PM Bug #20370 (New): leaked MOSDOp via PrimaryLogPG::_copy_some and PrimaryLogPG::do_proxy_write
Patrick Donnelly
09:35 PM Bug #20646 (New): run_seed_to_range.sh: segv, tp_fstore_op timeout
Patrick Donnelly
09:35 PM Bug #20360 (New): rados/verify valgrind tests: osds fail to start (xenial valgrind)
Patrick Donnelly
09:35 PM Bug #20369 (New): segv in OSD::ShardedOpWQ::_process
Patrick Donnelly
09:35 PM Bug #20221 (New): kill osd + osd out leads to stale PGs
Patrick Donnelly
09:35 PM Bug #20169 (New): filestore+btrfs occasionally returns ENOSPC
Patrick Donnelly
09:35 PM Bug #20053 (New): crush compile / decompile looses precision on weight
Patrick Donnelly
09:35 PM Bug #19700 (New): OSD remained up despite cluster network being inactive?
Patrick Donnelly
09:35 PM Bug #19486 (New): Rebalancing can propagate corrupt copy of replicated object
Patrick Donnelly
09:35 PM Bug #19518 (New): log entry does not include per-op rvals?
Patrick Donnelly
09:35 PM Bug #19440 (New): osd: trims maps taht pgs haven't consumed yet when there are gaps
Patrick Donnelly
09:35 PM Bug #17257 (New): ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
Patrick Donnelly
09:35 PM Bug #15015 (New): prepare_new_pool doesn't return failure string ss
Patrick Donnelly
09:35 PM Bug #14115 (New): crypto: race in nss init
Patrick Donnelly
09:35 PM Bug #13385 (New): cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final ro...
Patrick Donnelly
09:35 PM Bug #12687 (New): osd thrashing + pg import/export can cause maybe_went_rw intervals to be missed
Patrick Donnelly
09:35 PM Bug #12615 (New): Repair of Erasure Coded pool with an unrepairable object causes pg state to los...
Patrick Donnelly
09:35 PM Bug #11235 (New): test_rados.py test_aio_read is racy
Patrick Donnelly
09:35 PM Bug #9606 (New): mon: ambiguous error_status returned to user when type is wrong in a command
Patrick Donnelly
08:31 PM Bug #43151 (Fix Under Review): ok-to-stop incorrect for some ec pgs
Sage Weil
04:33 PM Bug #43151 (Resolved): ok-to-stop incorrect for some ec pgs
before,... Sage Weil
08:16 PM Backport #43119: mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/32000
merged
Yuri Weinstein
08:01 PM Backport #41238: nautilus: Implement mon_memory_target
Follow-on fix: https://github.com/ceph/ceph/pull/32045 Neha Ojha
08:00 PM Feature #40870: Implement mon_memory_target
This has a follow-on fix: https://github.com/ceph/ceph/pull/32044 Neha Ojha
06:01 PM Bug #38040: osd_map_message_max default is too high?
Luminous backport analysis:
* https://github.com/ceph/ceph/pull/26340 - two of three commits backported to luminou...
Nathan Cutler
05:50 PM Bug #43150 (In Progress): osd-scrub-snaps.sh fails
David Zafman
05:21 PM Bug #43150: osd-scrub-snaps.sh fails
During testing I saw this even though it isn't what happened in the teuthology runs. I think in all cases we have sc... David Zafman
03:51 PM Bug #43150 (Resolved): osd-scrub-snaps.sh fails
/a/sage-2019-12-04_19:33:15-rados-wip-sage2-testing-2019-12-04-0856-distro-basic-smithi/4567061
/a/sage-2019-12-04_1...
Sage Weil
05:41 PM Bug #43106: mimic: crash in build_incremental_map_msg
The three PRs that need to be backported to mimic are:
* https://github.com/ceph/ceph/pull/26340 - backported to m...
Nathan Cutler
01:41 PM Backport #43140 (In Progress): nautilus: ceph-mon --mkfs: public_address type (v1|v2) is not resp...
Nathan Cutler
11:07 AM Backport #43140 (Resolved): nautilus: ceph-mon --mkfs: public_address type (v1|v2) is not respected
https://github.com/ceph/ceph/pull/32028 Nathan Cutler
01:34 PM Bug #42485: verify_upmaps can not cancel invalid upmap_items in some cases
NOTE: https://github.com/ceph/ceph/pull/31131 was merged to master and backported to nautilus and luminous, before it... Nathan Cutler
04:04 AM Bug #42485 (Resolved): verify_upmaps can not cancel invalid upmap_items in some cases
David Zafman
01:32 PM Backport #42547: nautilus: verify_upmaps can not cancel invalid upmap_items in some cases
NOTE: reverted by https://github.com/ceph/ceph/pull/32018 Nathan Cutler
01:30 PM Backport #42548: luminous: verify_upmaps can not cancel invalid upmap_items in some cases
Note: reverted by https://github.com/ceph/ceph/pull/32019 Nathan Cutler
07:52 AM Bug #42906 (Pending Backport): ceph-mon --mkfs: public_address type (v1|v2) is not respected
Kefu Chai
06:22 AM Bug #37968 (Resolved): maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
David Zafman
06:21 AM Backport #38163 (Resolved): mimic: maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
David Zafman
04:04 AM Backport #42546 (Rejected): mimic: verify_upmaps can not cancel invalid upmap_items in some cases
This change has been reverted so we won't backport. David Zafman
12:43 AM Bug #43124: Probably legal crush rules cause upmaps to be cleaned
We are reverting the original pull request which changed verify_upmaps(): https://github.com/ceph/ceph/pull/31131
...
David Zafman

12/04/2019

08:51 PM Bug #43126 (Resolved): OSD_SLOW_PING_TIME_BACK nits
From Sage e-mail:
Long heartbeat ping times on back interface seen, longest is 1315.510 msec (OSD_SLOW_PING_TIME...
David Zafman
08:46 PM Bug #43124 (Resolved): Probably legal crush rules cause upmaps to be cleaned
I've seen multiple user sites with crush rules for EC pools which will trigger the verify_upmap() to detect an error.... David Zafman
08:24 PM Backport #42546 (In Progress): mimic: verify_upmaps can not cancel invalid upmap_items in some cases
David Zafman
08:13 PM Backport #42546 (Resolved): mimic: verify_upmaps can not cancel invalid upmap_items in some cases
David Zafman
12:13 PM Bug #38330: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
@Dan, @Neha - mimic backport staged at https://github.com/ceph/ceph/pull/26448 Nathan Cutler
02:33 AM Bug #38330 (Pending Backport): osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
Based on https://tracker.ceph.com/issues/43106#note-1 and https://tracker.ceph.com/issues/38282#note-14 Neha Ojha
12:11 PM Backport #43119 (In Progress): mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map...
Nathan Cutler
12:08 PM Backport #43119 (Resolved): mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
https://github.com/ceph/ceph/pull/32000 Nathan Cutler
02:30 AM Bug #43106: mimic: crash in build_incremental_map_msg
I think you are right. We should have backported all three PRs according to https://tracker.ceph.com/issues/38040#not... Neha Ojha

12/03/2019

07:37 PM Bug #43110 (Duplicate): rados/test.sh failure: ceph_test_rados_api_watch_notify_pp
https://tracker.ceph.com/issues/42933 Greg Farnum
07:27 PM Bug #43110: rados/test.sh failure: ceph_test_rados_api_watch_notify_pp
Neha pointed out the core info is obviously helpful:
> 1575385919.6406.core: ELF 64-bit LSB core file x86-64, vers...
Greg Farnum
07:18 PM Bug #43110 (Duplicate): rados/test.sh failure: ceph_test_rados_api_watch_notify_pp
I noticed this in a branch of my own, but it appears to be showing up in the master smoke tests too.
rados/test.sh...
Greg Farnum
05:25 PM Backport #43093: luminous: Improve OSDMap::calc_pg_upmaps() efficiency
@David Does this need https://github.com/ceph/ceph/pull/31944 as well? Nathan Cutler
05:05 PM Backport #43093 (In Progress): luminous: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman
03:40 PM Bug #43106 (Resolved): mimic: crash in build_incremental_map_msg
Since upgrading from 13.2.6 to 13.2.7 we get this around once per 10 minutes on a cluster with 500 out of 1500 OSDs u... Dan van der Ster
03:27 PM Bug #38330: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg
https://tracker.ceph.com/issues/38282 was backported to mimic in 13.2.7.
Does this need a backport also?
(we ha...
Dan van der Ster
09:53 AM Bug #42961: osd: increase priority in certain OSD perf counters
Neha Ojha wrote:
> Ernesto, while we are at it, are there any other specific stats that you've gotten requests for?
...
Ernesto Puerta
02:00 AM Bug #42961 (Fix Under Review): osd: increase priority in certain OSD perf counters
Ernesto, while we are at it, are there any other specific stats that you've gotten requests for? Neha Ojha
09:13 AM Backport #43099 (Resolved): nautilus: nautilus:osd: network numa affinity not supporting subnet port
https://github.com/ceph/ceph/pull/32843 Nathan Cutler
02:53 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
I think we can dispense with the session put when we call 'remove_session' since we call it when we replace the sessi... Brad Hubbard
01:17 AM Backport #43094 (In Progress): mimic: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman
01:15 AM Backport #43092 (In Progress): nautilus: Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman

12/02/2019

11:30 PM Bug #42346 (In Progress): Nearfull warnings are incorrect
Spurious nearfull warnings caused by backfill reservation mechanism during rebalancing. The nearfull ratio was com...
David Zafman
11:23 PM Bug #42718: Improve OSDMap::calc_pg_upmaps() efficiency
https://github.com/ceph/ceph/pull/31944 is a follow-on fix for https://github.com/ceph/ceph/pull/31774 Neha Ojha
09:56 PM Bug #42718 (Pending Backport): Improve OSDMap::calc_pg_upmaps() efficiency
David Zafman
09:59 PM Backport #43094 (Resolved): mimic: Improve OSDMap::calc_pg_upmaps() efficiency
https://github.com/ceph/ceph/pull/31957 David Zafman
09:58 PM Backport #43093 (Resolved): luminous: Improve OSDMap::calc_pg_upmaps() efficiency
https://github.com/ceph/ceph/pull/31992 David Zafman
09:58 PM Backport #43092 (Resolved): nautilus: Improve OSDMap::calc_pg_upmaps() efficiency
https://github.com/ceph/ceph/pull/31956 David Zafman
06:56 PM Bug #42411 (Pending Backport): nautilus:osd: network numa affinity not supporting subnet port
Sage Weil
05:54 AM Bug #41313: PG distribution completely messed up since Nautilus
This happens with active PG balancer if the cluster is in WARN state.
...
51 hdd 9.09470 1.00000 9.1 TiB 5.8 ...
Anonymous
02:11 AM Bug #42102: use-after-free in Objecter timer handing
... Sage Weil

12/01/2019

01:23 AM Bug #43067 (New): Git Master: src/compressor/zlib/ZlibCompressor.cc / src/compressor/zlib/CMakeLi...
When Ceph is built without support for CPU feature SSE4_1 (HAVE_INTEL_SSE4_1), the CMake build system does not link ... Lee Leahu

11/29/2019

06:01 PM Bug #42780: recursive lock of OpTracker::lock (70)
I will be working on this bug after returning from PTO (ETA: 16 Dec 2019). Radoslaw Zarzynski
05:41 PM Bug #42780: recursive lock of OpTracker::lock (70)
The problem comes from OSD::get_health_metrics(), where the visitor lambda is holding the lock and also drops a refer... Sage Weil
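
The note above is truncated, so the following is only a language-agnostic illustration of the general pattern it seems to describe: a visitor callback runs while a non-recursive lock is held, and something the callback does re-enters code that takes the same lock. This is not Ceph code. `Tracker`, `register`, `unregister`, `visit_unsafe`, `visit_safe` and `count` are hypothetical names, and snapshotting under the lock is one common workaround, not necessarily the fix applied for this bug.

```python
import threading


class Tracker:
    """Hypothetical stand-in for an op tracker guarded by one non-recursive lock."""

    def __init__(self):
        self._lock = threading.Lock()  # non-reentrant, like a plain mutex
        self._ops = []

    def register(self, op):
        with self._lock:
            self._ops.append(op)

    def unregister(self, op):
        # A blocking acquire here would deadlock if the calling thread already
        # holds the lock. In this single-threaded sketch, a failed non-blocking
        # acquire therefore means "recursive lock", so report it as an error.
        if not self._lock.acquire(blocking=False):
            raise RuntimeError("recursive lock of Tracker lock")
        try:
            self._ops.remove(op)
        finally:
            self._lock.release()

    def count(self):
        with self._lock:
            return len(self._ops)

    def visit_unsafe(self, visitor):
        # The visitor runs with the lock held; if it unregisters an op,
        # unregister() tries to take the same lock again.
        with self._lock:
            for op in list(self._ops):
                visitor(op)

    def visit_safe(self, visitor):
        # Take a snapshot under the lock, then run the visitor after releasing it.
        with self._lock:
            snapshot = list(self._ops)
        for op in snapshot:
            visitor(op)


tracker = Tracker()
tracker.register("op1")

try:
    tracker.visit_unsafe(tracker.unregister)
    unsafe_result = "ok"
except RuntimeError as err:
    unsafe_result = str(err)

tracker.visit_safe(tracker.unregister)
remaining = tracker.count()
```

In the unsafe variant the re-entry is detected and reported; in the safe variant the callback runs outside the lock, so removing the op succeeds.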

11/27/2019

08:40 PM Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired
https://www.spinics.net/lists/ceph-users/msg54910.html - could also be related. Neha Ojha
08:36 PM Bug #43048 (Won't Fix - EOL): nautilus: upgrade/mimic-x/stress-split: failed to recover before ti...
... Neha Ojha
04:52 PM Bug #24419: ceph-objectstore-tool unable to open mon store
To be clear, this isn't an issue in mimic or later releases. Josh Durgin
04:50 PM Bug #24419 (Won't Fix): ceph-objectstore-tool unable to open mon store
It looks like this is due to bluestore setting the rocksdb_db_paths config option in luminous. This causes the ceph-o... Josh Durgin

11/26/2019

10:18 PM Bug #42978 (Resolved): ops waiting for lock not requeued; client sees misordering
Sage Weil
10:16 PM Bug #42012 (Resolved): mon osd_snap keys grow unbounded
Sage Weil
09:13 AM Backport #42258 (In Progress): mimic: document new option mon_max_pg_per_osd
Nathan Cutler
05:18 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Theory:
In Monitor::_ms_dispatch() when we detect a feature change we end up with the following sequence.
Monit...
Brad Hubbard
03:45 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
... Brad Hubbard
03:20 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
I'm wondering if maybe this happens due to the feature change and the session being removed during the upgrade proces... Brad Hubbard

11/25/2019

07:32 PM Bug #42012 (Fix Under Review): mon osd_snap keys grow unbounded
Sage Weil
07:29 PM Bug #42012 (In Progress): mon osd_snap keys grow unbounded
Okay, in octopus, there are now 2 sets of keys
- purged_snap_*: map intervals of snaps that are purged. adjacent r...
Sage Weil
07:16 PM Bug #42978 (Fix Under Review): ops waiting for lock not requeued; client sees misordering
Sage Weil
03:35 PM Feature #39066 (Resolved): src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:58 PM Feature #39066: src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
Rejecting luminous backport - luminous is EOL. Nathan Cutler
03:33 PM Bug #40910 (Resolved): mon/OSDMonitor.cc: better error message about min_size
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:57 PM Bug #40910: mon/OSDMonitor.cc: better error message about min_size
Rejecting luminous backport - luminous is EOL. Nathan Cutler
03:33 PM Bug #41017 (Resolved): Change default for bluestore_fsck_on_mount_deep as false
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
02:56 PM Bug #41017: Change default for bluestore_fsck_on_mount_deep as false
Rejecting luminous backport - luminous is EOL. Nathan Cutler
03:29 PM Bug #42933 (Rejected): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify2/1
we reverted, see https://github.com/ceph/ceph/pull/31790 Sage Weil
03:23 PM Backport #38205 (In Progress): luminous: osds allows to partially start more than N+2
Nathan Cutler
03:19 PM Backport #40947 (In Progress): luminous: Better default value for osd_snap_trim_sleep
Nathan Cutler
03:13 PM Backport #41730 (Need More Info): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(pe...
Nathan Cutler
03:05 PM Backport #41730 (In Progress): luminous: osd/ReplicatedBackend.cc: 1349: FAILED ceph_assert(peer_...
Nathan Cutler
02:57 PM Backport #39381 (Rejected): luminous: src/ceph-disk/tests/ceph-disk.sh is using hardcoded port
Nathan Cutler
02:57 PM Backport #40941 (Rejected): luminous: mon/OSDMonitor.cc: better error message about min_size
Nathan Cutler
02:56 PM Backport #41085 (Rejected): luminous: Change default for bluestore_fsck_on_mount_deep as false
Nathan Cutler
09:44 AM Backport #42998 (Resolved): mimic: acting_recovery_backfill won't catch all up peers
https://github.com/ceph/ceph/pull/33324 Nathan Cutler
09:43 AM Backport #42997 (Resolved): nautilus: acting_recovery_backfill won't catch all up peers
https://github.com/ceph/ceph/pull/32064 Nathan Cutler
09:43 AM Backport #42996 (Rejected): luminous: acting_recovery_backfill won't catch all up peers
https://github.com/ceph/ceph/pull/33326 Nathan Cutler
06:56 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
I'm interested to see more cores in case one sheds more light. I've started a few runs in the hope they will fail. Brad Hubbard
05:14 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
In both cases so far the Message type appears to be MSG_MON_PAXOS and priority is CEPH_MSG_PRIO_HIGH.... Brad Hubbard
03:40 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
Neha sent me another instance of this issue available at http://pulpito.ceph.com/nojha-2019-11-22_18:41:03-rados:upgr... Brad Hubbard
12:58 AM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
... Brad Hubbard
05:27 AM Bug #42971: mgr hangs with upmap balancer
So I wrote my own upmap balancer this weekend and after running it for a bit I found the same problem. It appears th... Bryan Stillwell
04:28 AM Backport #42662 (In Progress): nautilus:Issue a HEALTH_WARN when a Pool is configured with [min_]...
Sridhar Seshasayee

11/24/2019

06:17 PM Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup
/a/sage-2019-11-24_06:32:18-rados-wip-sage-testing-2019-11-23-2031-distro-basic-smithi/4538572... Sage Weil
04:58 PM Bug #42577 (Pending Backport): acting_recovery_backfill won't catch all up peers
Kefu Chai
04:57 PM Bug #42782: nautilus: rados/test_librados_build.sh build failure
https://github.com/ceph/ceph/pull/31693 Kefu Chai

11/22/2019

10:22 PM Bug #42975 (Duplicate): out of order ops in rados/upgrade/nautilus-x-singleton
Neha Ojha
05:49 PM Bug #42975: out of order ops in rados/upgrade/nautilus-x-singleton
Another out of order bug https://tracker.ceph.com/issues/42328. Neha Ojha
05:47 PM Bug #42975 (Duplicate): out of order ops in rados/upgrade/nautilus-x-singleton
... Neha Ojha
09:36 PM Bug #42968 (Duplicate): TestClsRbd.mirror_image_status failure during luminous->nautilus upgrade
Duplicating to https://tracker.ceph.com/issues/42891 as it's the same issue. Jason Dillaman
03:06 PM Bug #42968 (Duplicate): TestClsRbd.mirror_image_status failure during luminous->nautilus upgrade
Run: http://pulpito.ceph.com/teuthology-2019-11-22_02:25:03-upgrade:luminous-x-nautilus-distro-basic-smithi/
Jobs:'4...
Yuri Weinstein
07:44 PM Bug #42978: ops waiting for lock not requeued; client sees misordering
reproduces with suite: rados:upgrade:nautilus-x-singleton
filter: '0-cluster/{openstack.yaml start.yaml} 1-install...
Neha Ojha
07:09 PM Bug #42978: ops waiting for lock not requeued; client sees misordering
OK, 99% sure the problem is this bit of code in release_object_locks()... Sage Weil
07:05 PM Bug #42978 (Resolved): ops waiting for lock not requeued; client sees misordering
A ceph_test_rados sequence of ops comes in, but replies go back out of order... Sage Weil
06:44 PM Bug #42977 (Resolved): mon/Elector.cc: FAILED ceph_assert(m->epoch == get_epoch())
... Neha Ojha
04:37 PM Bug #42971: mgr hangs with upmap balancer
We are using device classes. Bryan Stillwell
04:28 PM Bug #42971: mgr hangs with upmap balancer
Hey Bryan, David's been fixing a couple issues in the balancer that sound like what you're running into:
1) https:...
Josh Durgin
04:15 PM Bug #42971 (New): mgr hangs with upmap balancer
On multiple clusters we are seeing the mgr hang frequently when the balancer is enabled. It seems that the balancer ... Bryan Stillwell
01:24 PM Bug #42964 (Resolved): monitor config store: Deleting logging config settings does not decrease l...
How to reproduce:
1. increase log level of mds:
ceph config set mds debug_mds 10/10
2. try to revert this:
ce...
Марк Коренберг
11:22 AM Bug #42477: Rados should use the '-o outfile' convention
@Nathan, I think that's the right decision in this case mate. It should be less disruptive hopefully. Brad Hubbard
08:45 AM Bug #42477: Rados should use the '-o outfile' convention
@Brad - got it, thanks. So, the issue is fixed as of Octopus and the fix will not be backported for the reason you st... Nathan Cutler
08:44 AM Bug #42477 (Resolved): Rados should use the '-o outfile' convention
Nathan Cutler
11:16 AM Bug #42961 (Resolved): osd: increase priority in certain OSD perf counters
There are reports from users missing stats in dashboard/prometheus mgr modules about the following perf counters:
<p...
Ernesto Puerta

11/21/2019

03:56 PM Bug #42933 (Rejected): LibRadosWatchNotifyPPTests/LibRadosWatchNotifyPP.WatchNotify2/1
... Sage Weil
03:23 PM Bug #42918: memory corruption and lockups with I-Object
I managed to grab the stack traces from when it locks up instead of crashing -- also around watch/notify in the face ... Ilya Dryomov
03:18 PM Bug #42918: memory corruption and lockups with I-Object
Ilya Dryomov wrote:
> Haven't tried without failure injection yet, but it's probably related to ms_inject_socket_fai...
Ilya Dryomov
02:37 PM Bug #42918: memory corruption and lockups with I-Object
One segfault related to watch/notify is fixed in https://github.com/ceph/ceph/pull/31768, but testing in the rgw suit... Casey Bodley
01:31 PM Bug #42918: memory corruption and lockups with I-Object
Haven't tried without failure injection yet, but it's probably related to ms_inject_socket_failures (and resulting wa... Ilya Dryomov
01:28 PM Bug #42918: memory corruption and lockups with I-Object
@Ilya: does it reproduce when you have injected socket failures disabled? From your initial logs and from the backtra... Jason Dillaman
01:25 PM Bug #42918: memory corruption and lockups with I-Object
Excellent sleuthing -- thanks! I am going to bump this over to the RADOS project since I can't see how this is purely... Jason Dillaman
11:56 AM Bug #42918: memory corruption and lockups with I-Object
Got actionable stack traces on 669453138d89:... Ilya Dryomov
10:42 AM Bug #42918: memory corruption and lockups with I-Object
Looks real and seems to be introduced with I-Object: no issues with 36f5fcbb97eb ("Merge PR #31672 into master") and ... Ilya Dryomov
05:26 AM Bug #42718: Improve OSDMap::calc_pg_upmaps() efficiency

The rules based pool groups being passed to calc_pg_upmaps() is a better method, so we don't want to revert.
try...
David Zafman
01:58 AM Backport #41531 (Resolved): nautilus: Move bluefs alloc size initialization log message to log le...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30229
m...
Nathan Cutler

11/20/2019

11:02 PM Bug #42921 (Can't reproduce): osd: segmentation fault in PGLog::check
... Patrick Donnelly
10:21 PM Bug #42918: memory corruption and lockups with I-Object
The stack is corrupt in both:... Ilya Dryomov
10:11 PM Bug #42918 (Closed): memory corruption and lockups with I-Object
http://pulpito.ceph.com/dis-2019-11-20_20:35:04-krbd-master-wip-krbd-readonly-basic-mira/4526411... Ilya Dryomov
10:06 PM Bug #42783 (Resolved): test failure: due to client closed connection
Neha Ojha
08:07 PM Backport #41531: nautilus: Move bluefs alloc size initialization log message to log level 1
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30229
merged
Yuri Weinstein
05:47 PM Bug #42782 (Resolved): nautilus: rados/test_librados_build.sh build failure
Nathan Cutler
02:54 PM Bug #42906 (Resolved): ceph-mon --mkfs: public_address type (v1|v2) is not respected
When calling `ceph-mon --mkfs ... --public_address v1:<ip_address>:<random_port>`
the `v1:` type is ignored and the ...
Ricardo Dias
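A minimal sketch of the behaviour the report expects, assuming an address of the form `[v1:|v2:]<ip>:<port>` with an IPv4 host. The names `parse_typed_addr` and `TypedAddr`, and the `"any"` placeholder for an address without a prefix, are hypothetical and not taken from ceph-mon; the point is only that an explicit type prefix is kept rather than dropped.

```python
from typing import NamedTuple

KNOWN_TYPES = ("v1", "v2")  # the two type prefixes named in the report


class TypedAddr(NamedTuple):
    addr_type: str  # "v1", "v2", or "any" when no prefix was given
    host: str
    port: int


def parse_typed_addr(text: str) -> TypedAddr:
    """Hypothetical parser: keep an explicit v1:/v2: prefix instead of dropping it."""
    addr_type = "any"  # assumption: placeholder meaning "no explicit type"
    for known in KNOWN_TYPES:
        prefix = known + ":"
        if text.startswith(prefix):
            addr_type = known
            text = text[len(prefix):]
            break
    # Split on the last colon so the host part stays intact (IPv4 only).
    host, sep, port = text.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected [v1:|v2:]<ip>:<port>, got {text!r}")
    return TypedAddr(addr_type, host, int(port))
```

For example, `parse_typed_addr("v1:10.0.0.1:40000")` keeps `addr_type == "v1"`; the IP and port here are arbitrary illustration values.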
08:07 AM Backport #42796 (Resolved): luminous: unnecessary error message "calc_pg_upmaps failed to build o...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31598
m...
Nathan Cutler
02:35 AM Bug #42890 (New): Deadlock occurs when exiting with dpdkstack
exit() will call pthread_cond_destroy attempting to destroy dpdk::eal::cond
upon which other threads are currently b...
chunsong feng
02:14 AM Bug #40367: "*** Caught signal (Segmentation fault) **" in upgrade:luminous-x-nautilus
/a/sage-2019-11-19_05:29:27-rados-wip-sage-testing-2019-11-18-1656-distro-basic-smithi/4522662
Sage Weil

11/19/2019

09:45 PM Backport #42796: luminous: unnecessary error message "calc_pg_upmaps failed to build overfull/und...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31598
merged
Yuri Weinstein
03:24 PM Bug #42884: OSDMapTest.CleanPGUpmaps failure
Not reproducible locally. I am testing 9b61479da4f89014b6d1857287102bbc9db13e6e Kefu Chai
02:00 PM Bug #42884 (New): OSDMapTest.CleanPGUpmaps failure
During a make check build for a PR, I got this crash during OSD testing:
https://jenkins.ceph.com/job/ceph-pull-re...
Jeff Layton
12:55 PM Backport #42846 (In Progress): nautilus: src/msg/async/net_handler.cc: Fix compilation
Nathan Cutler
12:50 PM Backport #42739 (In Progress): nautilus: scrub object count mismatch on device_health_metrics pool
Nathan Cutler
09:01 AM Bug #41669 (Resolved): Make dumping of reservation info congruent between scrub and recovery
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:01 AM Bug #41924 (Resolved): asynchronous recovery can not function under certain circumstances
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:00 AM Bug #42015 (Resolved): Remove unused full and nearful output from OSDMap summary
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
09:00 AM Backport #42879 (Resolved): mimic: ceph_test_admin_socket_output fails in rados qa suite
https://github.com/ceph/ceph/pull/33323 Nathan Cutler
09:00 AM Backport #42878 (Resolved): nautilus: ceph_test_admin_socket_output fails in rados qa suite
https://github.com/ceph/ceph/pull/32063 Nathan Cutler
08:54 AM Backport #41785 (Resolved): nautilus: Make dumping of reservation info congruent between scrub an...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31444
m...
Nathan Cutler
08:39 AM Backport #42141 (Resolved): nautilus: asynchronous recovery can not function under certain circum...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31077
m...
Nathan Cutler
08:39 AM Backport #42136 (Resolved): nautilus: Remove unused full and nearful output from OSDMap summary
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30900
m...
Nathan Cutler
06:25 AM Bug #42387 (Pending Backport): ceph_test_admin_socket_output fails in rados qa suite
Kefu Chai

11/18/2019

09:45 PM Bug #42830: problem returning mon to cluster
I forgot, Ceph is at version 14.2.1 on our side. Jérôme Poulin
09:44 PM Bug #42830: problem returning mon to cluster
We encountered the same problem last week, after stopping a monitor service on a server on the cluster, trying to sta... Jérôme Poulin
09:13 PM Bug #42387 (In Progress): ceph_test_admin_socket_output fails in rados qa suite
Brad Hubbard
09:09 AM Bug #42387 (Fix Under Review): ceph_test_admin_socket_output fails in rados qa suite
Nathan Cutler
08:14 PM Bug #42782: nautilus: rados/test_librados_build.sh build failure
https://github.com/ceph/ceph/pull/31604 merged Yuri Weinstein
08:13 PM Backport #41785: nautilus: Make dumping of reservation info congruent between scrub and recovery
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31444
merged
Yuri Weinstein
08:08 PM Backport #42141: nautilus: asynchronous recovery can not function under certain circumstances
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31077
merged
Yuri Weinstein
08:05 PM Backport #42136: nautilus: Remove unused full and nearful output from OSDMap summary
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30900
merged
Yuri Weinstein
08:29 AM Bug #42592 (Duplicate): ceph-mon/mgr PGstat Segmentation Fault
Nathan Cutler
07:43 AM Bug #42577 (Fix Under Review): acting_recovery_backfill won't catch all up peers
xie xingguo
07:14 AM Bug #42861 (Fix Under Review): Libceph-common.so needs to use private link attribute when includi...
Libceph-common.so does not specify a link attribute containing the dpdk library,
dpdk global variables and function...
chunsong feng

11/17/2019

10:27 PM Bug #42477: Rados should use the '-o outfile' convention
Note that the reason I did not set this for backport is that it has the potential to break existing scripts and funct... Brad Hubbard
06:02 PM Bug #42082 (Resolved): pybind/rados: set_omap() crash on py3
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
06:00 PM Backport #42853 (Resolved): nautilus: format error: ceph osd stat --format=json
https://github.com/ceph/ceph/pull/32062 Nathan Cutler
06:00 PM Backport #42852 (Resolved): mimic: format error: ceph osd stat --format=json
https://github.com/ceph/ceph/pull/33322 Nathan Cutler
05:59 PM Backport #42848 (Rejected): nautilus: "failing miserably..." in Infiniband.cc
Nathan Cutler
05:59 PM Backport #42847 (Rejected): mimic: "failing miserably..." in Infiniband.cc
Nathan Cutler
05:59 PM Backport #42846 (Resolved): nautilus: src/msg/async/net_handler.cc: Fix compilation
https://github.com/ceph/ceph/pull/31736 Nathan Cutler
05:53 PM Bug #42821 (Pending Backport): src/msg/async/net_handler.cc: Fix compilation
Kefu Chai
04:08 PM Bug #42845 (New): CVE-2019-14818
https://nvd.nist.gov/vuln/detail/CVE-2019-14818 affects many versions of `dpdk`.
In ceph, you appear to bundle ver...
Robert Scott

11/16/2019

05:16 PM Bug #42742 (Pending Backport): "failing miserably..." in Infiniband.cc
Kefu Chai
05:12 PM Bug #42477 (Pending Backport): Rados should use the '-o outfile' convention
Kefu Chai
05:09 PM Bug #42501 (Pending Backport): format error: ceph osd stat --format=json
Kefu Chai
05:02 PM Bug #42501 (Resolved): format error: ceph osd stat --format=json
Kefu Chai
04:55 PM Bug #42689 (Duplicate): nautilus mon/mgr: ceph status:pool number display is not right
Kefu Chai
06:38 AM Bug #42332 (Resolved): CephContext::CephContextServiceThread might pause for 5 seconds at shutdown
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
06:38 AM Bug #42360 (Resolved): python3-cephfs should provide python36-cephfs
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
06:34 AM Backport #42363 (Resolved): nautilus: python3-cephfs should provide python36-cephfs
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30983
m...
Nathan Cutler
06:33 AM Backport #42395 (Resolved): nautilus: CephContext::CephContextServiceThread might pause for 5 sec...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/31097
m...
Nathan Cutler

11/15/2019

10:56 PM Documentation #12059: rados/troubleshooting/troubleshooting-mon: suprious `` in titles
Looks like this was fixed by https://github.com/ceph/ceph/pull/5004, not https://github.com/ceph/ceph/pull/4988. Nathan Cutler
10:53 PM Documentation #12059 (Resolved): rados/troubleshooting/troubleshooting-mon: suprious `` in titles
Nathan Cutler
08:37 AM Documentation #12059: rados/troubleshooting/troubleshooting-mon: suprious `` in titles
This issue has already been resolved. Updated the pull request ID for the same. Deepika Upadhyay
10:39 PM Backport #42363: nautilus: python3-cephfs should provide python36-cephfs
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/30983
merged
Yuri Weinstein
10:38 PM Backport #42395: nautilus: CephContext::CephContextServiceThread might pause for 5 seconds at shu...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/31097
merged
Yuri Weinstein
09:46 PM Bug #42824 (Resolved): mimic: rebuild_mondb.cc: FAILED assert(0) in update_osdmap()
Nathan Cutler
05:58 PM Bug #42824 (Fix Under Review): mimic: rebuild_mondb.cc: FAILED assert(0) in update_osdmap()
Neha Ojha
05:39 AM Bug #42824: mimic: rebuild_mondb.cc: FAILED assert(0) in update_osdmap()
... Brad Hubbard
07:31 AM Bug #42830 (New): problem returning mon to cluster
As discussed on the list here: https://www.spinics.net/lists/ceph-users/msg55977.html
After rebooting one of the n...
Nikola Ciprich
03:42 AM Bug #42821 (Fix Under Review): src/msg/async/net_handler.cc: Fix compilation
Kefu Chai

11/14/2019

09:58 PM Bug #42824 (Resolved): mimic: rebuild_mondb.cc: FAILED assert(0) in update_osdmap()
The assert was added recently: https://github.com/ceph/ceph/commit/43662bd4266a4fcc8db62f4bc9beb0de78eef355#diff-431a9... Neha Ojha
09:37 PM Bug #22656: scrub mismatch on bytes (cache pools)
mimic with cache pools: /a/yuriw-2019-11-09_18:55:35-rados-wip-yuri-mimic_13.2.7_RC2-distro-basic-smithi/4489809/
...
Neha Ojha
07:33 PM Bug #42821 (Resolved): src/msg/async/net_handler.cc: Fix compilation
On a Cray system I'm working on, it seems that SO_PRIORITY is defined, but IPTOS_CLASS_CS6 is not. Without this patch... Carlos Valiente
05:17 PM Backport #41092: nautilus: rocksdb: enable rocksdb_rmrange=true by default and make delete range ...
partially reverted by https://github.com/ceph/ceph/pull/31612 Nathan Cutler
04:44 PM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
I think this is when this started:... Dan van der Ster
03:46 PM Bug #37875: osdmaps aren't being cleaned up automatically on healthy cluster
I may have found more about what's causing this.
I have a cluster with ~1600 uncleaned maps. ...
Dan van der Ster
04:14 PM Bug #42783 (Fix Under Review): test failure: due to client closed connection
This is just the 'mon tell' implementation sucking up through nautilus. It's rewritten to not suck for octopus.
F...
Sage Weil
03:54 PM Bug #24531 (Resolved): Mimic MONs have slow/long running ops
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:54 PM Bug #37654 (Resolved): FAILED ceph_assert(info.history.same_interval_since != 0) in PG::start_pee...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:53 PM Bug #38483 (Resolved): FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_spl...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:53 PM Feature #39162 (Resolved): Improvements to standalone tests.
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:52 PM Bug #40287 (Resolved): OSDMonitor: missing `pool_id` field in `osd pool ls` command
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:51 PM Bug #40620 (Resolved): Explicitly requested repair of an inconsistent PG cannot be scheduled time...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:51 PM Feature #40870 (Resolved): Implement mon_memory_target
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:50 PM Bug #41210 (Resolved): osd: failure result of do_osd_ops not logged in prepare_transaction function
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:49 PM Bug #41330 (Resolved): hidden corei7 requirement in binary packages
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:49 PM Bug #41816 (Resolved): Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert inf...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:48 PM Bug #42052 (Resolved): mgr/balancer FAILED ceph_assert(osd_weight.count(i.first))
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:48 PM Bug #42111 (Resolved): max_size from crushmap ignored when increasing size on pool
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:37 PM Backport #42547 (Resolved): nautilus: verify_upmaps can not cancel invalid upmap_items in some cases
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30899
m...
Nathan Cutler
03:37 PM Backport #42126 (Resolved): nautilus: mgr/balancer FAILED ceph_assert(osd_weight.count(i.first))
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30899
m...
Nathan Cutler
03:36 PM Backport #41238 (Resolved): nautilus: Implement mon_memory_target
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30419
m...
Nathan Cutler
03:34 PM Backport #42326 (Resolved): nautilus: max_size from crushmap ignored when increasing size on pool
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30941
m...
Nathan Cutler
03:34 PM Backport #42242: nautilus: Adding Placement Group id in Large omap log message
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30923
m...
Nathan Cutler
03:33 PM Backport #40840 (Resolved): nautilus: Explicitly requested repair of an inconsistent PG cannot be...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29748
m...
Nathan Cutler
03:32 PM Backport #41917 (Resolved): nautilus: osd: failure result of do_osd_ops not logged in prepare_tra...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30546
m...
Nathan Cutler
03:32 PM Backport #39517 (Resolved): nautilus: Improvements to standalone tests.
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30528
m...
Nathan Cutler
03:31 PM Backport #42014 (Resolved): nautilus: Enable auto-scaler and get src/osd/PeeringState.cc:3671: fa...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30528
m...
Nathan Cutler
03:30 PM Backport #41921 (Resolved): nautilus: OSDMonitor: missing `pool_id` field in `osd pool ls` command
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30486
m...
Nathan Cutler
03:30 PM Backport #41862 (Resolved): nautilus: Mimic MONs have slow/long running ops
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30480
m...
Nathan Cutler
03:30 PM Backport #41712 (Resolved): nautilus: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::regist...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30371
m...
Nathan Cutler
03:30 PM Backport #41640 (Resolved): nautilus: FAILED ceph_assert(info.history.same_interval_since != 0) i...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30280
m...
Nathan Cutler
12:36 PM Backport #41350 (Resolved): nautilus: hidden corei7 requirement in binary packages
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/29772
m...
Nathan Cutler
10:05 AM Bug #42810 (Resolved): ceph config rm does not revert debug_mon to default
When we increase the debug_mon level to 10, then rm the setting, the effective log level is stuck at 10 in the ceph-m... Dan van der Ster
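A toy model of the expected behaviour, not Ceph's config code: removing an override should also push the default back to the value the daemon is running with. The class `ToyConfigStore` and the default level `1` are assumptions made for illustration; only the level 10 comes from the report.

```python
class ToyConfigStore:
    """Toy model of a config store with per-option defaults and overrides."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._overrides = {}
        # Value the "daemon" is currently running with.
        self.effective = dict(defaults)

    def set(self, name, value):
        self._overrides[name] = value
        self.effective[name] = value

    def rm(self, name):
        # Removing an override must also restore the default as the
        # running value; skipping this step leaves the old level in effect.
        self._overrides.pop(name, None)
        self.effective[name] = self._defaults[name]


store = ToyConfigStore({"debug_mon": 1})  # 1 is a placeholder default
store.set("debug_mon", 10)
store.rm("debug_mon")
print(store.effective["debug_mon"])
```

The bug described above corresponds to an `rm` that drops the override but never updates `effective`.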
10:03 AM Feature #41359 (Resolved): Adding Placement Group id in Large omap log message
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
04:54 AM Bug #42387: ceph_test_admin_socket_output fails in rados qa suite
As suspected, this is due to the length of time the 'bench' command takes to run and return a result to the socket.
...
Brad Hubbard

11/13/2019

10:44 PM Bug #42328: osd/PrimaryLogPG.cc: 3962: ceph_abort_msg("out of order op")
/a/trociny-2019-10-15_07:49:13-rbd-master-distro-basic-smithi/4414497/ Neha Ojha
10:32 PM Bug #42503 (Closed): There are a lot of OSD downturns on this node. After PG is redistributed, a ...
Yes, sometimes CRUSH selection fails when you have a very small number of choices compared to the number of required ... Greg Farnum
10:27 PM Bug #42529 (Closed): memory bloat + OSD process crash
Greg Farnum
10:26 PM Bug #42577 (Rejected): acting_recovery_backfill won't catch all up peers
Xie, feel free to reopen it with more explanation, if you still think this is a problem. Neha Ojha
10:07 PM Backport #42242 (Resolved): nautilus: Adding Placement Group id in Large omap log message
Brad Hubbard
08:31 PM Backport #42547: nautilus: verify_upmaps can not cancel invalid upmap_items in some cases
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30899
merged
Yuri Weinstein
08:30 PM Backport #42126: nautilus: mgr/balancer FAILED ceph_assert(osd_weight.count(i.first))
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30899
merged
Yuri Weinstein
08:25 PM Backport #41238: nautilus: Implement mon_memory_target
Sridhar Seshasayee wrote:
> https://github.com/ceph/ceph/pull/30419
merged
Yuri Weinstein
08:21 PM Backport #42326: nautilus: max_size from crushmap ignored when increasing size on pool
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30941
merged
Yuri Weinstein
08:20 PM Feature #41359: Adding Placement Group id in Large omap log message
https://github.com/ceph/ceph/pull/30923 merged Yuri Weinstein
08:13 PM Backport #40840: nautilus: Explicitly requested repair of an inconsistent PG cannot be scheduled ...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/29748
merged
Yuri Weinstein
08:09 PM Backport #41917: nautilus: osd: failure result of do_osd_ops not logged in prepare_transaction fu...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30546
merged
Yuri Weinstein
08:09 PM Backport #39517: nautilus: Improvements to standalone tests.
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30528
merged
Yuri Weinstein
08:09 PM Backport #42014: nautilus: Enable auto-scaler and get src/osd/PeeringState.cc:3671: failed assert...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30528
merged
Yuri Weinstein
08:07 PM Backport #41921: nautilus: OSDMonitor: missing `pool_id` field in `osd pool ls` command
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30486
merged
Yuri Weinstein
08:07 PM Backport #41862: nautilus: Mimic MONs have slow/long running ops
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30480
merged
Yuri Weinstein
08:06 PM Backport #41712: nautilus: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30371
merged
Yuri Weinstein
08:06 PM Backport #41640: nautilus: FAILED ceph_assert(info.history.same_interval_since != 0) in PG::start...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30280
merged
Yuri Weinstein
07:14 PM Bug #42783: test failure: due to client closed connection
Marking this high, since this is showing up a lot on nautilus. Neha Ojha
03:32 AM Bug #42783: test failure: due to client closed connection
The connections were closed by the client repeatedly. I wonder if it's expected: we have "msgr-failures/fastclose.yaml". an... Kefu Chai
03:29 AM Bug #42783 (Resolved): test failure: due to client closed connection
on client side:... Kefu Chai
04:45 PM Backport #41350: nautilus: hidden corei7 requirement in binary packages
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/29772
merged
Yuri Weinstein
01:46 PM Bug #42782: nautilus: rados/test_librados_build.sh build failure
Note: when octopus is split off from master, it will start having this same problem. Nathan Cutler
01:46 PM Bug #42782: nautilus: rados/test_librados_build.sh build failure
mimic and luminous already have this fix. It was backported from master before the nautilus release. Nathan Cutler
01:26 PM Bug #42782 (Fix Under Review): nautilus: rados/test_librados_build.sh build failure
Nathan Cutler
12:41 PM Backport #42798 (In Progress): mimic: unnecessary error message "calc_pg_upmaps failed to build o...
Nathan Cutler
12:25 PM Backport #42798 (Resolved): mimic: unnecessary error message "calc_pg_upmaps failed to build over...
https://github.com/ceph/ceph/pull/31957 Nathan Cutler
12:41 PM Backport #42797 (In Progress): nautilus: unnecessary error message "calc_pg_upmaps failed to buil...
Nathan Cutler
12:25 PM Backport #42797 (Resolved): nautilus: unnecessary error message "calc_pg_upmaps failed to build o...
https://github.com/ceph/ceph/pull/31956 Nathan Cutler
12:40 PM Backport #42796 (In Progress): luminous: unnecessary error message "calc_pg_upmaps failed to buil...
Nathan Cutler
12:24 PM Backport #42796 (Resolved): luminous: unnecessary error message "calc_pg_upmaps failed to build o...
https://github.com/ceph/ceph/pull/31598 Nathan Cutler
12:26 PM Bug #41680 (Resolved): Removed OSDs with outstanding peer failure reports crash the monitor
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
12:18 PM Backport #41695: nautilus: Network ping monitoring
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30195
m...
Nathan Cutler
03:33 AM Backport #41695 (Resolved): nautilus: Network ping monitoring
David Zafman
12:17 PM Backport #42152 (Resolved): nautilus: Removed OSDs with outstanding peer failure reports crash th...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30904
m...
Nathan Cutler
07:59 AM Bug #42387: ceph_test_admin_socket_output fails in rados qa suite
It's only the bench command that causes the issue.... Brad Hubbard
03:39 AM Feature #40640 (Resolved): Network ping monitoring
David Zafman
03:39 AM Backport #41697: luminous: Network ping monitoring
Backporting this requires https://github.com/ceph/ceph/pull/31277 David Zafman
03:37 AM Backport #41696: mimic: Network ping monitoring
Backporting this requires https://github.com/ceph/ceph/pull/31275 from https://tracker.ceph.com/issues/42570 David Zafman

11/12/2019

11:40 PM Backport #41695: nautilus: Network ping monitoring
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30195
merged
Yuri Weinstein
11:40 PM Backport #42152: nautilus: Removed OSDs with outstanding peer failure reports crash the monitor
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/30904
merged
Yuri Weinstein
11:32 PM Bug #42782 (Resolved): nautilus: rados/test_librados_build.sh build failure
... Neha Ojha
08:22 PM Bug #42780 (Resolved): recursive lock of OpTracker::lock (70)
I was testing 2 cephfs clients vs. a vstart cluster and the osd crashed.... Jeff Layton
06:24 PM Bug #42756 (Pending Backport): unnecessary error message "calc_pg_upmaps failed to build overfull...
Neha Ojha
04:31 PM Bug #41362 (Resolved): Rados bench sequential and random read: not behaving as expected when op s...
Neha Ojha
07:06 AM Feature #41666 (Pending Backport): Issue a HEALTH_WARN when a Pool is configured with [min_]size ...
PR #31416 - https://github.com/ceph/ceph/pull/31416 is now merged into master. Sridhar Seshasayee
03:37 AM Bug #42387 (New): ceph_test_admin_socket_output fails in rados qa suite
... Kefu Chai

11/11/2019

11:00 PM Feature #14865: Permit cache eviction of watched object
See [1] for an abandoned PR
[1] http://tracker.ceph.com/issues/14865
Jason Dillaman
10:08 PM Support #42584 (Closed): MGR error: auth: could not find secret_id=<number>
I'm closing this as I think it got addressed on the mailing list? Greg Farnum
02:39 PM Support #42584: MGR error: auth: could not find secret_id=<number>
This error message is *not* only written in active MGR log but in specific OSD logs, too. Thomas Schneider
09:45 PM Bug #42756 (Fix Under Review): unnecessary error message "calc_pg_upmaps failed to build overfull...
Neha Ojha
08:25 PM Bug #42756 (Resolved): unnecessary error message "calc_pg_upmaps failed to build overfull/underfull"
After enabling ceph-mgr module balancer in upmap mode, we can see in ceph-mgr logs messages like:
-1 calc_pg_upmaps ...
Neha Ojha
08:00 PM Bug #41191 (Resolved): osd: scrub error on big objects; make bluestore refuse to start on big obj...
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
07:58 PM Bug #41936 (Resolved): scrub errors after quick split/merge cycle
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
03:48 PM Bug #42501 (Fix Under Review): format error: ceph osd stat --format=json
Kefu Chai
02:24 PM Bug #42742 (Resolved): "failing miserably..." in Infiniband.cc
lockdep should be initialized before creating any mutex.
as RDMA is always enabled when building ceph. and global ...
Kefu Chai
12:52 PM Backport #41958 (Resolved): nautilus: scrub errors after quick split/merge cycle
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30643
m...
Nathan Cutler
12:52 PM Backport #42095: nautilus: global osd crash in DynamicPerfStats::add_to_reports
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30648
m...
Nathan Cutler
12:51 PM Backport #41920 (Resolved): nautilus: osd: scrub error on big objects; make bluestore refuse to s...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30783
m...
Nathan Cutler
12:36 PM Backport #42739 (Resolved): nautilus: scrub object count mismatch on device_health_metrics pool
https://github.com/ceph/ceph/pull/31735 Nathan Cutler
 
