Activity

From 06/19/2018 to 07/18/2018

07/18/2018

09:42 PM Backport #24989: mimic: Limit pg log length during recovery/backfill so that we don't run out of ...
We can hold off on this backport for now. Need to let this bake in master for a while. Neha Ojha
08:00 PM Backport #24989 (Resolved): mimic: Limit pg log length during recovery/backfill so that we don't ...
https://github.com/ceph/ceph/pull/23403 Nathan Cutler
09:42 PM Backport #24988: luminous: Limit pg log length during recovery/backfill so that we don't run out ...
We can hold off on this backport for now. Need to let this bake in master for a while.
Also, this backport is going ...
Neha Ojha
08:00 PM Backport #24988 (Resolved): luminous: Limit pg log length during recovery/backfill so that we don...
https://github.com/ceph/ceph/pull/23211 Nathan Cutler
09:38 PM Bug #24975 (Pending Backport): valgrind-leaks.yaml: expected valgrind issues and found none
This issue has been fixed in master by https://github.com/ceph/ceph/pull/22261
Needs to be backported to mimic.
Neha Ojha
09:14 PM Bug #24935 (Duplicate): SafeTimer? osd killed by kernel for Segmentation fault
This appears to be another instance of #23352. Josh Durgin
09:12 PM Bug #24938: luminous: rados listomapkeys & listomapvals don't return data.
Did you check that this bucket actually has any entries? These commands are tested in our suite. Greg Farnum
08:46 PM Bug #24990 (Resolved): api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
... Neha Ojha
06:10 PM Feature #23979 (Pending Backport): Limit pg log length during recovery/backfill so that we don't ...
Josh Durgin
04:15 PM Support #24980: Pg Inconsistent - failed to pick suitable auth object
Alon Avrahami wrote:
> Hi,
>
>
> We have ceph cluster installed with Luminous 12.2.2 using bluestore.
> All no...
Alon Avrahami
01:24 PM Support #24980 (Rejected): Pg Inconsistent - failed to pick suitable auth object
Hi,
We have a ceph cluster installed with Luminous 12.2.2 using bluestore.
All nodes are Intel servers with 1.6TB...
Alon Avrahami
03:42 PM Backport #24472 (Resolved): mimic: Ceph-osd crash when activate SPDK
Nathan Cutler
02:32 PM Backport #24472: mimic: Ceph-osd crash when activate SPDK
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22684
merged
Yuri Weinstein
03:36 PM Bug #24950 (Resolved): Running osd_skip_data_digest in a mixed cluster is not ideal
Nathan Cutler
03:35 PM Backport #24865 (Resolved): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/te...
Nathan Cutler
02:20 PM Backport #24865: mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-...
Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/23024
merged
Yuri Weinstein
03:14 PM Backport #24951 (Resolved): mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
David Zafman
02:24 PM Backport #24951: mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
David Zafman wrote:
> https://github.com/ceph/ceph/pull/23084
merged
Yuri Weinstein
02:22 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
https://github.com/ceph/ceph/pull/23096 merged Yuri Weinstein
11:20 AM Documentation #20894 (Resolved): rados manpage does not document "cleanup"
https://github.com/ceph/ceph/pull/16777 Nathan Cutler

07/17/2018

10:48 PM Bug #24975 (Resolved): valgrind-leaks.yaml: expected valgrind issues and found none
... Neha Ojha
10:43 PM Bug #24974 (New): Segmentation fault in tcmalloc::ThreadCache::ReleaseToCentralCache()
... Neha Ojha
08:32 PM Backport #24583 (Resolved): mimic: osdc: wrong offset in BufferHead
Nathan Cutler
08:10 PM Backport #24583: mimic: osdc: wrong offset in BufferHead
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22869
merged
Yuri Weinstein
06:21 PM Feature #23979 (Fix Under Review): Limit pg log length during recovery/backfill so that we don't ...
https://github.com/ceph/ceph/pull/23098 Neha Ojha
05:39 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
Neha Ojha
01:37 PM Bug #20645 (Closed): bluefs wal failed to allocate (assert(0 == "allocate failed... wtf"))
Igor Fedotov
09:58 AM Bug #24956 (Resolved): osd: parent process needs to restart log service after fork, or ceph-osd wi...
The ceph-osd parent process needs to restart the log service after fork, or ceph-osd will not work correctly when the option l... mingshuai wang
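The class of bug described above can be illustrated with a minimal sketch (illustrative only; `FileLogger` and `restart` are hypothetical stand-ins, not Ceph's actual log service): after `fork()`, the surviving process shares the original open log descriptor and its buffered state, so logging must be restarted (reopened) before it works correctly again.

```python
import os


class FileLogger:
    """Hypothetical stand-in for a daemon's log service (not Ceph's code)."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path, "a")

    def write(self, msg):
        self.fh.write(msg + "\n")
        self.fh.flush()

    def restart(self):
        # fork() duplicates the open descriptor: parent and child then share
        # one file offset and any buffered bytes.  Reopening after the fork
        # gives this process its own clean descriptor, avoiding interleaved
        # or lost log lines.
        self.fh.close()
        self.fh = open(self.path, "a")
```

The point mirrored from the report: whichever process keeps running after the fork must restart its log service before logging again.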

07/16/2018

09:18 PM Bug #24950: Running osd_skip_data_digest in a mixed cluster is not ideal
https://github.com/ceph/ceph/pull/23083 David Zafman
09:14 PM Bug #24950 (Resolved): Running osd_skip_data_digest in a mixed cluster is not ideal

Using osd_skip_data_digest in a mixed BlueStore/FileStore cluster is dangerous because we lose data_digest integrity ...
David Zafman
09:17 PM Backport #24951 (Resolved): mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
https://github.com/ceph/ceph/pull/23084 David Zafman
09:08 PM Feature #24949 (Resolved): luminous: Allow scrub to fix Luminous 12.2.6 corruption of data_digest

I'm thinking that while osd_distrust_data_digest=true we should automatically ignore data_digest errors and repair ...
David Zafman
07:36 PM Bug #23352: osd: segfaults under normal operation
We actually got one on July 15: Jul 14 23:54:42 roc04r-sc3a080 kernel: [6988357.283555] safe_timer[19917]: segfault a... Alex Gorbachev
03:54 AM Bug #23352: osd: segfaults under normal operation
The latest core uploaded by Dan in comment 66 is slightly different to the others we've seen so far.
Once again th...
Brad Hubbard
02:24 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
https://github.com/ceph/ceph/pull/23072 Douglas Fuller
02:24 PM Bug #24687 (Fix Under Review): Automatically set expected_num_objects for new pools with >=100 PG...
Because a value for expected_num_objects is too difficult to determine automatically, we instead print a suggestion t... Douglas Fuller
11:16 AM Bug #24938 (New): luminous: rados listomapkeys & listomapvals don't return data.
Hi,
rados listomapkeys & rados listomapvals don't return data when running Luminous, tested on 12.2.4 and 12.2.6:
...
Magnus Grönlund
08:52 AM Bug #24935 (Duplicate): SafeTimer? osd killed by kernel for Segmentation fault
My environment :
[root@gz-ceph-52-203 log]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@gz-...
伟杰 谭
12:57 AM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
Noting the same issue, per ceph-users list post:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/028...
David Young

07/15/2018

05:46 AM Documentation #24924 (Resolved): doc: typo in crush-map docs
Each time the OSD starts, it verifies it is in the correct location in the CRUSH map and, if it is not, it moved its... Michael Jones

07/14/2018

09:04 PM Bug #24923 (Resolved): doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
Undersized
The placement group fewer copies than the configured pool replication level.
Missing "has"
Michael Jones
07:57 PM Bug #23871: luminous->mimic: missing primary copy of xxx, will try copies on 3, then full-object r...
For the luminous regression, this will reproduce the issue:... Sage Weil

07/13/2018

11:02 PM Feature #24917 (New): Gracefully deal with upgrades when bluestore skipping of data_digest become...

Once the data_digest is no longer being used, but is still set from an earlier version, we can get EIO from read bu...
David Zafman
09:26 PM Backport #24083 (In Progress): luminous: rados: not all exceptions accept keyargs
PR: https://github.com/ceph/ceph/pull/22979 Victor Denisov
03:52 PM Bug #24597 (Resolved): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_m...
Nathan Cutler
05:09 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
Could cephfs trigger this issue? There have been two reports of cephfs_metadata pool crc errors on the users ML this ... Dan van der Ster
03:51 PM Backport #24891 (Resolved): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_...
Nathan Cutler
03:18 PM Backport #24891: mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22997
merged
Yuri Weinstein
03:00 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
FTR, this crc issue is probably due to an incomplete backport to 12.2.6 of the skip_digest changes for bluestore:
...
Dan van der Ster
01:55 PM Bug #24909 (Fix Under Review): RBD client IOPS pool stats are incorrect (2x higher; includes IO h...
https://github.com/ceph/ceph/pull/23029 Jason Dillaman
01:47 PM Bug #24909 (In Progress): RBD client IOPS pool stats are incorrect (2x higher; includes IO hints ...
Jason Dillaman
01:47 PM Bug #24909 (Resolved): RBD client IOPS pool stats are incorrect (2x higher; includes IO hints as ...
While running performance testing with Ceph metrics gathering statistics on the cluster, I noticed that while my RBD ... Jason Dillaman
12:58 PM Backport #24908 (In Progress): luminous: luminous->mimic: missing primary copy of xxx, will try co...
Nathan Cutler
12:57 PM Backport #24908 (Resolved): luminous: luminous->mimic: missing primary copy of xxx, will try copie...
https://github.com/ceph/ceph/pull/23028 Nathan Cutler
12:26 PM Backport #24890 (Resolved): luminous: FAILED assert(0 == "ERROR: source must exist") in FileStore...
Nathan Cutler
12:26 PM Bug #23871: luminous->mimic: missing primary copy of xxx, will try copies on 3, then full-object r...
original fix is fe5038c7f9577327f82913b4565712c53903ee48
luminous backport https://github.com/ceph/ceph/pull/23028
Sage Weil
12:06 PM Bug #23871 (Pending Backport): luminous->mimic: missing primary copy of xxx, will try copies on 3,...
Sage Weil
11:31 AM Backport #24888 (Need More Info): luminous: osd: crash in OpTracker::unregister_inflight_op via O...
non-trivial backport. There are two conflicts. The first conflict can be resolved by cherry-picking 17a192ba5cdbe2129... Nathan Cutler
11:23 AM Backport #24889 (In Progress): mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::ge...
Nathan Cutler
11:22 AM Backport #24864 (In Progress): luminous: Abort in OSDMap::decode() during qa/standalone/erasure-c...
Nathan Cutler
11:20 AM Backport #24865 (In Progress): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code...
Nathan Cutler

07/12/2018

11:56 PM Bug #24801 (In Progress): PG num_bytes becomes huge
David Zafman
07:38 PM Bug #24600 (Resolved): ValueError: too many values to unpack due to lack of subdir
Nathan Cutler
07:38 PM Backport #24617 (Resolved): mimic: ValueError: too many values to unpack due to lack of subdir
Nathan Cutler
04:36 PM Backport #24617: mimic: ValueError: too many values to unpack due to lack of subdir
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22888
merged
Yuri Weinstein
02:05 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
Is this the relevant fix? https://github.com/ceph/ceph/commit/4667280f8afe6cd68dfffea61d7530581f3dd0eb
Alessandro'...
Dan van der Ster
12:27 PM Backport #24890 (In Progress): luminous: FAILED assert(0 == "ERROR: source must exist") in FileSt...
Nathan Cutler
10:18 AM Backport #24890 (Resolved): luminous: FAILED assert(0 == "ERROR: source must exist") in FileStore...
https://github.com/ceph/ceph/pull/22976 Nathan Cutler
11:03 AM Backport #24891 (In Progress): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore...
Nathan Cutler
10:18 AM Backport #24891 (Resolved): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_...
https://github.com/ceph/ceph/pull/22997 Nathan Cutler
10:50 AM Bug #24150 (Resolved): LibRadosMiscPool.PoolCreationRace segv
Nathan Cutler
10:50 AM Backport #24204 (Resolved): mimic: LibRadosMiscPool.PoolCreationRace segv
Nathan Cutler
12:06 AM Backport #24204: mimic: LibRadosMiscPool.PoolCreationRace segv
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22291
merged
Yuri Weinstein
10:50 AM Bug #24321 (Resolved): assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max...
Nathan Cutler
10:49 AM Backport #24329 (Resolved): mimic: assert manager.get_num_active_clean() == pg_num on rados/singl...
Nathan Cutler
12:05 AM Backport #24329: mimic: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/ma...
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22492
merged
Yuri Weinstein
10:48 AM Backport #24747 (Resolved): mimic: change default filestore_merge_threshold to -10
Nathan Cutler
12:03 AM Backport #24747: mimic: change default filestore_merge_threshold to -10
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22813
merged
Yuri Weinstein
10:48 AM Bug #24365 (Resolved): cosbench stuck at booting cosbench driver
Nathan Cutler
10:47 AM Backport #24473 (Resolved): mimic: cosbench stuck at booting cosbench driver
Nathan Cutler
12:03 AM Backport #24473: mimic: cosbench stuck at booting cosbench driver
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22887
merged
Yuri Weinstein
10:46 AM Bug #24487 (Resolved): osd: choose_acting loop
Nathan Cutler
10:46 AM Backport #24618 (Resolved): mimic: osd: choose_acting loop
Nathan Cutler
12:02 AM Backport #24618: mimic: osd: choose_acting loop
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22889
merged
Yuri Weinstein
10:46 AM Bug #24349 (Resolved): osd: stray osds in async_recovery_targets cause out of order ops
Nathan Cutler
10:46 AM Backport #24383 (Resolved): mimic: osd: stray osds in async_recovery_targets cause out of order ops
Nathan Cutler
12:02 AM Backport #24383: mimic: osd: stray osds in async_recovery_targets cause out of order ops
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22889
merged
Yuri Weinstein
10:45 AM Backport #24805 (Resolved): mimic: rgw workload makes osd memory explode
Nathan Cutler
12:00 AM Backport #24805: mimic: rgw workload makes osd memory explode
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22960
merged
Yuri Weinstein
10:36 AM Backport #24771 (Resolved): mimic: osd: may get empty info at recovery
Nathan Cutler
10:18 AM Backport #24889 (Resolved): mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_h...
https://github.com/ceph/ceph/pull/23026 Nathan Cutler
10:18 AM Backport #24888 (Rejected): luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::ge...
Nathan Cutler
03:03 AM Bug #24664 (Pending Backport): osd: crash in OpTracker::unregister_inflight_op via OSD::get_healt...
Sage Weil
03:01 AM Bug #24597 (Pending Backport): FAILED assert(0 == "ERROR: source must exist") in FileStore::_coll...
Sage Weil

07/11/2018

11:48 PM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
... Josh Durgin
11:47 PM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
Happened again in 12.2.4:... Josh Durgin
11:33 PM Bug #24866: FAILED assert(0 == "past_interval start interval mismatch") in check_past_interval_bo...
/a/nojha-2018-07-06_23:31:26-rados-wip-23979-2018-07-06-distro-basic-smithi/2744661/ Neha Ojha
11:24 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
Cool, I will pick up and run your test; the load on the workers is high at the moment, so I should have the results by tomorrow EOD. Vasu Kulkarni
10:25 AM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
OK, it looks like we missed this in the previous tracker issue that mentioned it (it was actually a three part fix an... Boris Ranto
11:23 PM Bug #24676 (Resolved): FreeBSD/Linux integration - monitor map with wrong sa_family
Josh Durgin
11:21 PM Bug #24683: ceph-mon binary doesn't report to systemd why it dies
Does this show up in the monitor's log in /var/log/ceph/ ? Josh Durgin
11:15 PM Bug #24786 (Resolved): LibRadosList.ListObjectsNS fails
https://github.com/ceph/ceph/pull/22771 Josh Durgin
11:13 PM Bug #24787 (Duplicate): cls_rgw.index_suggest FAILED
Looks the same as #24640 Josh Durgin
11:11 PM Bug #24835 (Need More Info): osd daemon spontaneous segfault
Unfortunately there's not much to go on - if this happens again perhaps you can grab a core file or a crash dump will... Josh Durgin
10:09 PM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
mimic backport: https://github.com/ceph/ceph/pull/22997 Sage Weil
03:54 PM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
Factors leading to this:
- ec pool (e.g., rgw workload)
- rados ops that result in pg log 'error' entries (e.g., ...
Sage Weil
12:37 PM Bug #24597 (In Progress): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collectio...
https://github.com/ceph/ceph/pull/22974 Sage Weil
01:16 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
Aha, in that case wip-24192 should fix it. Running it through testing again... Josh Durgin
12:38 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
Sage Weil
12:38 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
I believe this is caused by b50186bfe6c8981700e33c8a62850e21779d67d5, which does... Sage Weil
09:38 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
Ah, the error was reported on luminous, which doesn't do the repair, and I guess I missed it on master. Sorry for the... Greg Farnum
09:01 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors

The do_sparse_read() path doesn't attempt to repair a checksum error. Could that be the real issue?
The do_read...
David Zafman
08:25 PM Bug #24875 (Resolved): OSD: still returning EIO instead of recovering objects on checksum errors
A report came in on the mailing list of an MDS journal which couldn't be read and was throwing errors:... Greg Farnum
08:31 PM Bug #24876 (New): snaptrim_error state cannot be cleared without a new snaptrim
A user on the list reported they had PGs in state "active+clean+snaptrim_error". Investigating, I found that the only... Greg Farnum
08:11 PM Backport #24771: mimic: osd: may get empty info at recovery
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22861
merged
Reviewed-by: Sage Weil <sage@redhat.com>
Yuri Weinstein
07:27 PM Bug #24874 (New): ec fast reads can trigger read errors in log
fast read finishes...... Sage Weil
04:11 PM Bug #23145 (Duplicate): OSD crashes during recovery of EC pg
This looks like #24597 for the 12.2.5 case, at least. I wonder if the original 12.2.3 is something else (time warp d... Sage Weil
03:51 PM Bug #24192 (Duplicate): cluster [ERR] Corruption detected: object 2:f59d1934:::smithi14913526-582...
Josh Durgin

07/10/2018

10:10 PM Bug #24866 (Resolved): FAILED assert(0 == "past_interval start interval mismatch") in check_past_...
... Neha Ojha
08:30 PM Backport #24865 (Resolved): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/te...
https://github.com/ceph/ceph/pull/23024 Patrick Donnelly
08:29 PM Backport #24864 (Resolved): luminous: Abort in OSDMap::decode() during qa/standalone/erasure-code...
https://github.com/ceph/ceph/pull/23025 Patrick Donnelly
04:51 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
This was a ceph-volume test with rbd workload, no upgrades, just fresh install, full logs at
http://pulpito.ceph.c...
Vasu Kulkarni
02:41 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
This points to a deeper issue. The target context seems to always be 'unlabeled_t'. That context means something like... Boris Ranto
12:23 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
Filing under RADOS because it appears to be OSD specific. John Spray
01:42 PM Bug #23492 (Pending Backport): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-e...
Sage Weil
12:46 PM Bug #24850 (New): IPv6 scoped address not parseable by entity_addr_t
An IPv6 link-local scoped address is not currently parseable since it contains a "%<interface name>" suffix in the ad... Jason Dillaman
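The parsing gap described above can be sketched generically (the `parse_scoped_ipv6` helper is hypothetical, not `entity_addr_t`'s actual code): the `%<interface name>` zone suffix must be split off before the remaining text validates as an IPv6 address.

```python
import socket


def parse_scoped_ipv6(addr: str):
    """Split an optional "%<zone>" suffix off a link-local IPv6 address.

    Hypothetical illustration of the report: "fe80::1%eth0" fails naive
    inet_pton() parsing until the zone identifier is separated out.
    """
    host, _, zone = addr.partition("%")
    socket.inet_pton(socket.AF_INET6, host)  # raises OSError if not valid IPv6
    return host, zone or None
```

With this split, `"fe80::1%eth0"` parses as host `"fe80::1"` with zone `"eth0"`, while an unscoped address like `"::1"` simply gets no zone.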
12:14 PM Bug #24835: osd daemon spontaneous segfault
The log (attached) does not contain any information on the crash. It shows only the automatic restart of the crashed ... Christian Schlittchen
09:54 AM Backport #24847 (In Progress): jewel: rgw workload makes osd memory explode
Nathan Cutler
09:54 AM Backport #24847 (Resolved): jewel: rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22959 Nathan Cutler
09:48 AM Backport #24806 (In Progress): luminous: rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22962 Prashant D
09:42 AM Backport #24805 (In Progress): mimic: rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22960 Prashant D
09:41 AM Bug #23352: osd: segfaults under normal operation
We see this periodically with osd_enable_op_tracker = false
Last time ...
Serg D
12:55 AM Bug #23352: osd: segfaults under normal operation
That is correct, Brad. No crashes for 7 days now. Alex Gorbachev
09:33 AM Bug #24768: rgw workload makes osd memory explode
jewel backport: https://github.com/ceph/ceph/pull/22959
I know that jewel is (almost) EOL. Just in case anyone is ...
Kefu Chai
04:01 AM Backport #24845 (Resolved): luminous: tools/ceph-objectstore-tool: split filestore directories of...
https://github.com/ceph/ceph/pull/23418 Nathan Cutler

07/09/2018

10:43 PM Bug #23352: osd: segfaults under normal operation
Alex, so that's a week without issues, when previously you were getting a crash every 3-4 days, right? Brad Hubbard
01:36 PM Bug #23352: osd: segfaults under normal operation
No issues so far since injecting osd_enable_op_tracker=false Alex Gorbachev
08:40 PM Feature #21366 (Pending Backport): tools/ceph-objectstore-tool: split filestore directories offli...
Josh Durgin
06:27 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
https://github.com/ceph/ceph/pull/22954 Sage Weil
06:02 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
The problem is that int global_init_shutdown_stderr(CephContext *cct) is not being run at a time in the process lifec... Sage Weil
05:02 PM Bug #24835: osd daemon spontaneous segfault
Can you provide the backtrace out of the OSD log? Or even the whole log? Greg Farnum
02:13 PM Bug #24835 (Can't reproduce): osd daemon spontaneous segfault
We experience spontaneous segmentation faults of osd daemons in our mimic production cluster:... Christian Schlittchen
04:36 PM Bug #24838 (Resolved): mon: auth checks not correct for pool ops
The mon was not enforcing caps for pool ops correctly (which are used for managing unmanaged snapshots or even pool d... Sage Weil
04:32 PM Bug #24837 (Resolved): auth: cephx signature check is weak/broken
The signature check code was validating only the first of two (32-byte) blocks, and thus did not cover all of the crc... Sage Weil
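The weakness described above can be sketched generically (HMAC here stands in for the actual cephx signature scheme; only the 32-byte framing is taken from the report): a signature computed over just the first fixed-size block leaves every later byte unauthenticated.

```python
import hashlib
import hmac

BLOCK = 32  # size of the first block, per the report


def sign_first_block_only(key: bytes, msg: bytes) -> bytes:
    # Flawed: only msg[:BLOCK] is covered, so bytes past the first
    # block can be altered without invalidating the signature.
    return hmac.new(key, msg[:BLOCK], hashlib.sha256).digest()


def sign_full_message(key: bytes, msg: bytes) -> bytes:
    # Fixed: every byte of the message is covered.
    return hmac.new(key, msg, hashlib.sha256).digest()
```

Tampering with anything past byte 32 goes undetected by the flawed version but is caught when the whole message is signed.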
04:30 PM Bug #24836 (Resolved): auth: cephx authorizer subject to replay
The cephx authorizer does not have any challenge or nonce, and thus (if sniffed) can be reused by another session.
...
Sage Weil
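Why a challenge stops replay can be shown with a generic sketch (HMAC over a server-chosen nonce; this is not the cephx wire format, and the helper names are hypothetical): because each session's challenge is fresh and random, an authorizer sniffed from one session fails verification in the next.

```python
import hashlib
import hmac
import os


def new_session_challenge() -> bytes:
    # A fresh random nonce per session: the property the report says
    # the cephx authorizer lacks.
    return os.urandom(16)


def build_authorizer(shared_secret: bytes, challenge: bytes) -> bytes:
    # Client proves knowledge of the secret, bound to this session's challenge.
    return hmac.new(shared_secret, challenge, hashlib.sha256).digest()


def verify_authorizer(shared_secret: bytes, challenge: bytes,
                      authorizer: bytes) -> bool:
    expected = hmac.new(shared_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, authorizer)
```

A captured authorizer is useless against a new session because the server's challenge has changed.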
04:09 PM Bug #24368: osd: should not restart on permanent failures
I don't think the issue has moved beyond the PR linked above to change the systemd settings. I sent this out to one o... Greg Farnum
08:42 AM Bug #24368: osd: should not restart on permanent failures
guotao Yao wrote:
> I've had a similar problem recently. One OSD crash and exit, and the OSD process starts up quick...
guotao Yao
08:12 AM Bug #24368: osd: should not restart on permanent failures
I've had a similar problem recently. One OSD crashed and exited, and the OSD process was started up again quickly by systemd. It cau... guotao Yao

07/06/2018

09:55 PM Bug #24322 (Resolved): slow mon ops from osd_failure
Nathan Cutler
09:55 PM Backport #24350 (Resolved): mimic: slow mon ops from osd_failure
Nathan Cutler
09:50 PM Backport #24350: mimic: slow mon ops from osd_failure
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22297
merged
Yuri Weinstein
09:54 PM Bug #24222 (Resolved): Manager daemon y is unresponsive during teuthology cluster teardown
Nathan Cutler
09:54 PM Backport #24246 (Resolved): mimic: Manager daemon y is unresponsive during teuthology cluster tea...
Nathan Cutler
09:49 PM Backport #24246: mimic: Manager daemon y is unresponsive during teuthology cluster teardown
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22333
merged
Yuri Weinstein
09:54 PM Backport #24375 (Resolved): mimic: mon: auto compaction on rocksdb should kick in more often
Nathan Cutler
09:49 PM Backport #24375: mimic: mon: auto compaction on rocksdb should kick in more often
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22361
merged
Yuri Weinstein
09:52 PM Backport #24407 (Resolved): mimic: read object attrs failed at EC recovery
Nathan Cutler
09:51 PM Bug #24408 (Resolved): tell ... config rm <foo> not idempotent
Nathan Cutler
09:51 PM Backport #24468 (Resolved): mimic: tell ... config rm <foo> not idempotent
Nathan Cutler
09:42 PM Backport #24468: mimic: tell ... config rm <foo> not idempotent
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22552
merged
Yuri Weinstein
09:50 PM Backport #24332 (Resolved): mimic: local_reserver double-reservation of backfilled pg
Nathan Cutler
09:42 PM Backport #24332: mimic: local_reserver double-reservation of backfilled pg
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22559
merged
Yuri Weinstein
09:49 PM Bug #24423 (Resolved): failed to load OSD map for epoch X, got 0 bytes
Nathan Cutler
09:49 PM Backport #24599 (Resolved): mimic: failed to load OSD map for epoch X, got 0 bytes
Nathan Cutler
09:40 PM Backport #24599: mimic: failed to load OSD map for epoch X, got 0 bytes
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22651
merged
Yuri Weinstein
09:48 PM Backport #24494 (Resolved): mimic: osd: segv in Session::have_backoff
Nathan Cutler
09:39 PM Backport #24494: mimic: osd: segv in Session::have_backoff
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22730
merged
Yuri Weinstein
09:47 PM Bug #24199 (Resolved): common: JSON output from rados bench write has typo in max_latency key
Nathan Cutler
09:47 PM Backport #24291 (Resolved): jewel: common: JSON output from rados bench write has typo in max_lat...
Nathan Cutler
09:45 PM Backport #24292 (Resolved): mimic: common: JSON output from rados bench write has typo in max_lat...
Nathan Cutler
09:44 PM Backport #24292: mimic: common: JSON output from rados bench write has typo in max_latency key
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22406
merged
Yuri Weinstein
09:06 PM Backport #24806 (Resolved): luminous: rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22962 Nathan Cutler
09:06 PM Backport #24805 (Resolved): mimic: rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22960 Nathan Cutler
06:44 PM Bug #24768 (Pending Backport): rgw workload makes osd memory explode
Sage Weil
06:12 PM Bug #24801: PG num_bytes becomes huge

The OSD logs and this bug point to a slight flaw in https://github.com/ceph/ceph/pull/22797. I add the adjustment ...
David Zafman
05:57 PM Bug #24801 (Resolved): PG num_bytes becomes huge

dzafman-2018-07-05_12:45:56-rados-wip-19753-distro-basic-smithi/2739140
description: rados/thrash/{0-size-min-si...
David Zafman
04:45 PM Backport #23772 (In Progress): luminous: ceph status shows wrong number of objects
Nathan Cutler
01:39 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
... David Zafman
01:28 AM Bug #24787 (Duplicate): cls_rgw.index_suggest FAILED

dzafman-2018-07-03_13:41:32-rados-wip-19753-distro-basic-smithi
2732821
2732693
2732523...
David Zafman
01:01 AM Bug #24786 (Resolved): LibRadosList.ListObjectsNS fails

http://pulpito.ceph.com/dzafman-2018-07-03_13:41:32-rados-wip-19753-distro-basic-smithi
Multiple jobs
2732818
...
David Zafman

07/05/2018

10:40 PM Bug #24785 (Resolved): mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
... Vasu Kulkarni
09:33 PM Bug #24664 (In Progress): osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_met...
https://github.com/ceph/ceph/pull/22877 Brad Hubbard
08:39 PM Backport #24383: mimic: osd: stray osds in async_recovery_targets cause out of order ops
Ganging up with another backport to prevent merge conflicts. Nathan Cutler
08:27 PM Backport #24618 (In Progress): mimic: osd: choose_acting loop
Nathan Cutler
08:22 PM Backport #24617 (In Progress): mimic: ValueError: too many values to unpack due to lack of subdir
Nathan Cutler
08:15 PM Backport #24473 (In Progress): mimic: cosbench stuck at booting cosbench driver
Nathan Cutler
12:44 PM Bug #24768 (Fix Under Review): rgw workload makes osd memory explode
https://github.com/ceph/ceph/pull/22858 Sage Weil
09:17 AM Backport #24583 (In Progress): mimic: osdc: wrong offset in BufferHead
https://github.com/ceph/ceph/pull/22869 Prashant D
07:25 AM Backport #24584 (In Progress): luminous: osdc: wrong offset in BufferHead
https://github.com/ceph/ceph/pull/22865 Prashant D

07/04/2018

11:11 PM Backport #24772 (In Progress): luminous: osd: may get empty info at recovery
Nathan Cutler
10:52 PM Backport #24772 (Resolved): luminous: osd: may get empty info at recovery
https://github.com/ceph/ceph/pull/22862 Nathan Cutler
11:03 PM Backport #24771 (In Progress): mimic: osd: may get empty info at recovery
Nathan Cutler
10:52 PM Backport #24771 (Resolved): mimic: osd: may get empty info at recovery
https://github.com/ceph/ceph/pull/22861 Nathan Cutler
07:24 PM Bug #24588: osd: may get empty info at recovery
https://github.com/ceph/ceph/pull/22704 is the fix Sage Weil
07:23 PM Bug #24588 (Pending Backport): osd: may get empty info at recovery
Sage Weil
07:17 PM Bug #24768 (Resolved): rgw workload makes osd memory explode
From ML,... Sage Weil
12:47 PM Bug #23352: osd: segfaults under normal operation
Brad Hubbard wrote:
> Having reviewed the code in question again I was afraid that may be the case. If you can provi...
Dan van der Ster
09:38 AM Bug #23352: osd: segfaults under normal operation
Having reviewed the code in question again I was afraid that may be the case. If you can provide the crash dump Dan, ... Brad Hubbard
07:36 AM Bug #23352: osd: segfaults under normal operation
I *injected* osd_enable_op_tracker=false yesterday ... Dan van der Ster
07:40 AM Bug #24123 (Resolved): "process (unknown)" in ceph logs
Nathan Cutler
07:39 AM Backport #24215 (Resolved): mimic: "process (unknown)" in ceph logs
Nathan Cutler
07:38 AM Bug #24243 (Resolved): osd: pg hard limit too easy to hit
Nathan Cutler
07:38 AM Backport #24500 (Resolved): mimic: osd: eternal stuck PG in 'unfound_recovery'
Nathan Cutler
07:37 AM Backport #24355 (Resolved): mimic: osd: pg hard limit too easy to hit
Nathan Cutler
07:31 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
Hi Martin,
Have you tried my workaround above?
Best regards,
Lazuardi Nasution
06:18 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
Hi everyone,
What’s the workaround for this issue? Not being able to add new osds is getting more and more urgent...
Martin Overgaard Hansen
01:10 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
Final bisect results:
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
2e476...
David Zafman

07/03/2018

10:51 AM Backport #24748 (In Progress): luminous: change default filestore_merge_threshold to -10
Nathan Cutler
07:55 AM Backport #24748 (Resolved): luminous: change default filestore_merge_threshold to -10
https://github.com/ceph/ceph/pull/22814 Nathan Cutler
10:47 AM Backport #24747 (In Progress): mimic: change default filestore_merge_threshold to -10
Nathan Cutler
07:55 AM Backport #24747 (Resolved): mimic: change default filestore_merge_threshold to -10
https://github.com/ceph/ceph/pull/22813 Nathan Cutler
10:47 AM Bug #24686: change default filestore_merge_threshold to -10
*master PR*: https://github.com/ceph/ceph/pull/22761 Nathan Cutler
12:36 AM Bug #24686 (Pending Backport): change default filestore_merge_threshold to -10
Josh Durgin
10:24 AM Feature #13507: scrub APIs to read replica
Update backport field? Nathan Cutler
04:16 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
Hi Sergey,
Have you tried that after "ceph osd require-osd-release mimic"?
My workaround is below.
1. Build pa...
Lazuardi Nasution
03:08 AM Bug #23352: osd: segfaults under normal operation
Thanks Alex! Brad Hubbard
01:50 AM Bug #23352: osd: segfaults under normal operation
I set it Brad, watching the status. We normally get one failure in 3-4 days. Alex Gorbachev
01:03 AM Bug #23352: osd: segfaults under normal operation
We are investigating the potential race between get_health_metrics and the op_tracker code.
In the meantime, for ...
Brad Hubbard

07/02/2018

10:47 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I found a workaround for adding a new osd to a "mimic" cluster:
1. Purge osd from cluster which displayed as "dow...
Sergey Ponomarev
04:52 PM Backport #24215: mimic: "process (unknown)" in ceph logs
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22311
merged
Yuri Weinstein
04:52 PM Backport #24500: mimic: osd: eternal stuck PG in 'unfound_recovery'
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22545
merged
Yuri Weinstein
04:51 PM Backport #24355: mimic: osd: pg hard limit too easy to hit
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22621
merged
Yuri Weinstein
06:12 AM Bug #23352: osd: segfaults under normal operation
Pretty sure this all revolves around the racy code highlighted in #24037 and, unfortunately, the PR does *not* fix al... Brad Hubbard

06/29/2018

11:27 PM Bug #23875 (In Progress): Removal of snapshot with corrupt replica crashes osd
Tentative pull request https://github.com/ceph/ceph/pull/22476 is an improvement but doesn't address comment 3 David Zafman
11:25 PM Bug #19753 (In Progress): Deny reservation if expected backfill size would put us over backfill_f...
David Zafman
05:59 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
Also include >1024 PGs overall Douglas Fuller
09:59 AM Bug #23145: OSD crashes during recovery of EC pg
@sage weil,
thanks, since the env no longer exists I couldn't get the logs with debug_osd=20.
from the previou...
Yong Wang

06/28/2018

05:53 PM Bug #24645: Upload to radosgw fails when there are degraded objects
When the cluster is in recovery it is expected that we're waiting for the OSDs to respond Abhishek Lekshmanan
05:16 PM Bug #24676: FreeBSD/Linux integration - monitor map with wrong sa_family
I discovered that commit 9099ca5 - "fix the dencoder of entity_addr_t" introduced this kind of interoperability which... Alexander Haemmerle
08:50 AM Bug #24676: FreeBSD/Linux integration - monitor map with wrong sa_family
I investigated further with gdb. Lines 478-501 from msg/msg_types.h seem to be the culprit. Here sa_family is decoded... Alexander Haemmerle
05:14 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
... Neha Ojha
05:08 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
/a/nojha-2018-06-27_22:32:36-rados-wip-23979-distro-basic-smithi/2715571/ Neha Ojha
02:50 PM Bug #24686 (In Progress): change default filestore_merge_threshold to -10
Douglas Fuller
02:18 PM Bug #24686 (Resolved): change default filestore_merge_threshold to -10
Performance evaluations of medium to large size Ceph clusters have demonstrated negligible performance impact from un... Douglas Fuller
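As a hedged illustration of what the new default amounts to in ceph.conf terms (a negative merge threshold disables subfolder merging; its absolute value still feeds into the split point together with filestore_split_multiple):

```ini
[osd]
# Negative value: never merge collection subdirectories back together.
# Splitting still occurs, governed (roughly) by
#   abs(filestore_merge_threshold) * filestore_split_multiple * 16 objects.
filestore merge threshold = -10
```

This is a sketch of the setting's meaning, not a recommendation beyond what the ticket states.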
02:49 PM Bug #24687 (Resolved): Automatically set expected_num_objects for new pools with >=100 PGs per OSD
Field experience has demonstrated significant performance impact from filestore split and merge activity. The expecte... Douglas Fuller
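The existing CLI already allows pre-declaring the object count at pool creation so that filestore pre-splits collection directories up front instead of splitting during ingest; a hedged sketch (pool name, PG counts, rule name, and object count are all made up):

```shell
# Sketch: final positional argument is expected_num_objects.
ceph osd pool create mypool 128 128 replicated replicated_rule 1000000
```

The feature proposed here would presumably set such a value automatically for pools crossing the stated PG thresholds.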
10:15 AM Bug #24685 (Resolved): config options: possible inconsistency between flag 'can_update_at_runtime...
I'm wondering if there is an inconsistency between the 'can_update_at_runtime' flag and the 'flags' list for the confi... Tatjana Dehler
08:47 AM Bug #24683: ceph-mon binary doesn't report to systemd why it dies
If I execute the same command that systemd uses, I get a great readable error message:... Erik Bernoth
08:45 AM Bug #24683 (New): ceph-mon binary doesn't report to systemd why it dies
Following the quick start guide I get at a point where the monitor is supposed to come up but it doesn't. It doesn't ... Erik Bernoth
05:53 AM Bug #24587: librados api aio tests race condition
Good news, this is just a bug in the tests. They're submitting a write and then a read without waiting for the write ... Josh Durgin
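The race described here is generic: a read issued right after an asynchronous write is *submitted* may not observe it. A minimal sketch with plain Python threads standing in for aio completions (names are illustrative, not the actual librados test code):

```python
import threading

class AioCompletion:
    """Toy stand-in for an asynchronous-I/O completion object."""
    def __init__(self):
        self._done = threading.Event()
        self.result = None

    def complete(self, result):
        self.result = result
        self._done.set()

    def wait_for_complete(self):
        self._done.wait()

store = {}

def aio_write(comp, key, value):
    # Simulate an asynchronous write that lands "later".
    def work():
        store[key] = value
        comp.complete(0)
    threading.Timer(0.05, work).start()

def aio_read(key):
    return store.get(key)

# Buggy pattern: read submitted immediately after the write is submitted;
# it may well observe nothing yet.
c = AioCompletion()
aio_write(c, "obj", b"data")
racy = aio_read("obj")

# Fixed pattern: wait for the write completion before reading.
c.wait_for_complete()
assert aio_read("obj") == b"data"
```

The fix in the tests is the same shape: wait on the write's completion before issuing the dependent read.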
03:04 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Thanks Sage. I'll try to get my hands on another environment and see if I can reproduce and get more details. Will up... Dexter John Genterone

06/27/2018

09:17 PM Bug #24615: error message for 'unable to find any IP address' not shown
Sounds like the log isn't being flushed before exiting Josh Durgin
09:13 PM Bug #24652 (Won't Fix): OSD crashes when repairing pg
This should be fixed in later versions - hammer is end of life.
The crash was:...
Josh Durgin
09:06 PM Bug #24667: osd: SIGSEGV in MMgrReport::encode_payload
Possibly related to a memory corruption we've been seeing related to mgr health reporting on the osd. Josh Durgin
07:54 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
https://github.com/ceph/ceph/pull/22744 disabled build_past_intervals_parallel in luminous (by default; can be turned... Sage Weil
05:51 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Well, I can work around the issue. The build_past_intervals_parallel() is removed entirely in mimic and I can do t... Sage Weil
07:12 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
> Any chance you can gdb one of the core files for a crashing OSD to identify which PG is it asserting on? and perhap... Dexter John Genterone
06:10 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Sage Weil wrote:
> Dexter, anyone: was there a PG split (pg_num increase) on the cluster before this happened? Or m...
Xiaoxi Chen
05:19 PM Bug #24678 (Can't reproduce): ceph-mon segmentation fault after setting pool size to 1 on degrade...
We have an issue with starting any from 3 monitors after changing pool size from 3 to 1. The cluster was in a degrade... Sergey Burdakov
03:09 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
I also have this issue on a newly installed mimic cluster.
Don't know if this is important, the problem has appeared aft...
Sergey Burdakov
12:00 PM Bug #24676 (Resolved): FreeBSD/Linux integration - monitor map with wrong sa_family
We are using a ceph cluster in a mixed FreeBSD/Linux environment. The ceph cluster is based on FreeBSD. Linux clients... Alexander Haemmerle
09:16 AM Bug #23352: osd: segfaults under normal operation
We're getting a few crashes like this per week here on 12.2.5.
Here's a fileStore OSD:...
Dan van der Ster
04:10 AM Backport #24494 (In Progress): mimic: osd: segv in Session::have_backoff
https://github.com/ceph/ceph/pull/22730 Prashant D
04:09 AM Backport #24495 (In Progress): luminous: osd: segv in Session::have_backoff
https://github.com/ceph/ceph/pull/22729 Prashant D
12:41 AM Bug #23395 (Can't reproduce): qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core...
David Zafman

06/26/2018

11:32 PM Bug #23492 (In Progress): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasur...
David Zafman
11:29 PM Feature #13507 (New): scrub APIs to read replica
David Zafman
11:28 PM Bug #24366 (Resolved): omap_digest handling still not correct
David Zafman
11:27 PM Backport #24381 (Resolved): luminous: omap_digest handling still not correct
David Zafman
11:27 PM Backport #24380 (Resolved): mimic: omap_digest handling still not correct
David Zafman
09:08 PM Bug #23352: osd: segfaults under normal operation
Matt,
Can you provide a coredump or full backtrace?
Brad Hubbard
01:54 PM Bug #23352: osd: segfaults under normal operation
Also confirmed on Ubuntu 18.04/Ceph 13.2.0:
ceph-mgr.log
> 2018-06-24 11:14:47.317 7ff17b0db700 -1 mgr.server s...
Matt Dunavant
02:54 AM Bug #23352: osd: segfaults under normal operation
confirmed
ceph-mgr.log
@2018-06-20 08:46:05.528656 7fb998ff2700 -1 mgr.server send_report send_report osd,215.0x5...
Beom-Seok Park
07:14 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Dexter, anyone: was there a PG split (pg_num increase) on the cluster before this happened? Or maybe a split combine... Sage Weil
07:10 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
... Sage Weil
07:06 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
... Sage Weil
06:42 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Dexter John Genterone wrote:
> Uploaded a few more logs (debug 20) here: https://storage.googleapis.com/ceph-logs/ce...
Sage Weil
07:11 PM Bug #24667 (Can't reproduce): osd: SIGSEGV in MMgrReport::encode_payload
... Patrick Donnelly
07:07 PM Bug #24666 (New): pybind: InvalidArgumentError is missing 'errno' argument
Instead of being derived from 'Error', the 'InvalidArgumentError' should be derived from 'OSError' which will handle ... Jason Dillaman
01:49 PM Bug #24664 (Resolved): osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics
... Patrick Donnelly
09:47 AM Bug #24660 (New): admin/build-doc fails during autodoc on rados module: "AttributeError: __next__"
I'm trying to send a doc patch and am running @admin/build-doc@ in my local environment explained in [[http://docs.ce... Florian Haas

06/25/2018

09:50 PM Bug #23352: osd: segfaults under normal operation
Same here
2018-06-24 19:42:41.348699 7f3e53a46700 -1 mgr.server send_report send_report osd,226.0x55678069c850 sen...
Alex Gorbachev
09:34 PM Bug #23352: osd: segfaults under normal operation
Brad Hubbard wrote:
> Can anyone confirm seeing the "unknown health metric" messages in the mgr logs prior to the se...
Kjetil Joergensen
02:50 PM Bug #24652 (Won't Fix): OSD crashes when repairing pg
After a deep-scrub on the primary OSD for the pg we get:... Ana Aviles
01:03 PM Bug #24650 (New): mark unfound lost revert: out of order trim
OSD crashes in a few seconds after command 'ceph pg X.XX mark_unfound_lost revert'.
-10> 2018-06-25 15:52:14.49...
Sergey Malinin
07:53 AM Bug #24645 (New): Upload to radosgw fails when there are degraded objects
Hi,
we use Ceph RadosGW for storing and serving milions of small images. Everything is working well until recovery...
Michal Cila
06:46 AM Backport #24471 (In Progress): luminous: Ceph-osd crash when activate SPDK
https://github.com/ceph/ceph/pull/22686 Prashant D
05:01 AM Backport #24472 (In Progress): mimic: Ceph-osd crash when activate SPDK
https://github.com/ceph/ceph/pull/22684 Prashant D

06/22/2018

11:25 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Uploaded a few more logs (debug 20) here: https://storage.googleapis.com/ceph-logs/ceph-osd-logs.tar.gz
After runn...
Dexter John Genterone
12:45 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
Hi Sage,
We've experienced this again on a new environment we setup. Took a snippet of the logs, hope it's enough:...
Dexter John Genterone
04:46 PM Bug #23622 (Resolved): qa/workunits/mon/test_mon_config_key.py fails on master
Nathan Cutler
04:45 PM Backport #23675 (Resolved): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
Nathan Cutler
04:25 PM Backport #23675: luminous: qa/workunits/mon/test_mon_config_key.py fails on master
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21368
merged
Yuri Weinstein
04:44 PM Bug #23921 (Resolved): pg-upmap cannot balance in some case
Nathan Cutler
04:43 PM Backport #24048 (Resolved): luminous: pg-upmap cannot balance in some case
Nathan Cutler
04:25 PM Backport #24048: luminous: pg-upmap cannot balance in some case
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22115
merged
Yuri Weinstein
04:43 PM Bug #24025 (Resolved): RocksDB compression is not supported at least on Debian.
Nathan Cutler
04:42 PM Backport #24279 (Resolved): luminous: RocksDB compression is not supported at least on Debian.
Nathan Cutler
04:24 PM Backport #24279: luminous: RocksDB compression is not supported at least on Debian.
Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22215
merged
Yuri Weinstein
04:40 PM Backport #24329: mimic: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/ma...
original mimic backport https://github.com/ceph/ceph/pull/22288 was merged, but deemed insufficient Nathan Cutler
04:38 PM Backport #24328 (Resolved): luminous: assert manager.get_num_active_clean() == pg_num on rados/si...
Nathan Cutler
04:23 PM Bug #24321: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max-pg-per-osd...
merged https://github.com/ceph/ceph/pull/22296 Yuri Weinstein
04:15 PM Bug #24635 (New): luminous: LibRadosTwoPoolsPP.SetRedirectRead failed
Probably a race with the redirect code.
From http://qa-proxy.ceph.com/teuthology/yuriw-2018-06-22_03:31:56-rados-w...
Josh Durgin
01:19 PM Bug #23352: osd: segfaults under normal operation
Yeah. We got this in the mgr log before the ceph-osd segfault:
> mgr.server send_report send_report osd,74.0x560276d34ed8 sent me...
Serg D
03:54 AM Bug #23352: osd: segfaults under normal operation
Brad Hubbard
03:54 AM Bug #23352: osd: segfaults under normal operation
In several of the crashes we are seeing lines like the following prior to the crash.... Brad Hubbard
12:42 PM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
I see a slightly different effect on v12.2.5, but it may be related:
I have similar logs:...
Peter Gervai
08:44 AM Backport #24351 (Resolved): luminous: slow mon ops from osd_failure
Nathan Cutler
12:23 AM Backport #24351: luminous: slow mon ops from osd_failure
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22568
merged
Yuri Weinstein
08:44 AM Bug #23386 (Resolved): crush device class: Monitor Crash when moving Bucket into Default root
Nathan Cutler
08:43 AM Backport #24258 (Resolved): luminous: crush device class: Monitor Crash when moving Bucket into D...
Nathan Cutler
12:21 AM Backport #24258: luminous: crush device class: Monitor Crash when moving Bucket into Default root
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22381
merged
Reviewed-by: Nathan Cutler <ncutler@suse.com>
Yuri Weinstein
08:43 AM Backport #24290 (Resolved): luminous: common: JSON output from rados bench write has typo in max_...
Nathan Cutler
12:20 AM Backport #24290: luminous: common: JSON output from rados bench write has typo in max_latency key
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22391
merged
Yuri Weinstein
08:41 AM Backport #24356 (Resolved): luminous: osd: pg hard limit too easy to hit
Nathan Cutler
12:18 AM Backport #24356: luminous: osd: pg hard limit too easy to hit
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22592
merged
Yuri Weinstein
08:41 AM Backport #24618 (Resolved): mimic: osd: choose_acting loop
https://github.com/ceph/ceph/pull/22889 Nathan Cutler
08:41 AM Backport #24617 (Resolved): mimic: ValueError: too many values to unpack due to lack of subdir
https://github.com/ceph/ceph/pull/22888 Nathan Cutler
12:32 AM Bug #24615 (Resolved): error message for 'unable to find any IP address' not shown
Hi,
In my ceph.conf I have the option:...
Francois Lafont

06/21/2018

11:40 PM Bug #24613 (New): luminous: rest/test.py fails with expected 200, got 400
... Neha Ojha
10:57 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
possibly related lumious run: http://pulpito.ceph.com/yuriw-2018-06-11_16:27:32-rados-wip-yuri3-testing-2018-06-11-14... Josh Durgin
10:15 PM Bug #23352: osd: segfaults under normal operation
Another instance: http://pulpito.ceph.com/yuriw-2018-06-19_21:29:48-rados-wip-yuri-testing-2018-06-19-1953-luminous-d... Josh Durgin
09:01 PM Bug #24487 (Pending Backport): osd: choose_acting loop
Neha Ojha
05:58 PM Bug #24487 (Fix Under Review): osd: choose_acting loop
https://github.com/ceph/ceph/pull/22664 Neha Ojha
06:48 PM Bug #24612 (Resolved): FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init()
... Neha Ojha
04:51 PM Bug #23879: test_mon_osdmap_prune.sh fails
/a/nojha-2018-06-21_00:18:52-rados-wip-24487-distro-basic-smithi/2686362 Neha Ojha
09:19 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
We also hit this today, and happen to have an osd log with --debug_osd = 20
FWIW, the cluster has an inconsistent PG and ...
Xiaoxi Chen
01:34 AM Bug #24601 (Resolved): FAILED assert(is_up(osd)) in OSDMap::get_inst(int)
... Neha Ojha
12:52 AM Bug #24600 (Resolved): ValueError: too many values to unpack due to lack of subdir
... Neha Ojha

06/20/2018

10:13 PM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
Sage Weil wrote:
> Can you generate an osd log with 'debug osd = 20' for the crashing osd that leads up to the crash...
Sage Weil
10:13 PM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
Can you generate an osd log with 'debug osd = 20' for the crashing osd that leads up to the crash? Sage Weil
09:50 PM Bug #24422 (Duplicate): Ceph OSDs crashing in BlueStore::queue_transactions() using EC
Josh Durgin
10:11 PM Bug #23145: OSD crashes during recovery of EC pg
Two basic theories:
1. There is a bug that prematurely advances can_rollback_to
2. One of Peter's OSDs warped bac...
Sage Weil
10:05 PM Bug #23145: OSD crashes during recovery of EC pg
Sage Weil wrote:
> Zengran Zhang wrote:
> > osd in last peering stage will call pg_log.roll_forward(at last of PG:...
Sage Weil
10:03 PM Bug #23145 (Need More Info): OSD crashes during recovery of EC pg
Yong Wang, can you provide a full osd log with debug osd = 20 for the primary osd for the PG leading up to the crash... Sage Weil
09:22 PM Bug #23145: OSD crashes during recovery of EC pg
Zengran Zhang wrote:
> osd in last peering stage will call pg_log.roll_forward(at last of PG::activate), is there p...
Sage Weil
01:46 AM Bug #23145: OSD crashes during recovery of EC pg
@Sage Weil @Zengran Zhang
could you share any updates on this bug recently?
Yong Wang
01:44 AM Bug #23145: OSD crashes during recovery of EC pg
hi all, are there any updates please? Yong Wang
10:02 PM Backport #24599 (In Progress): mimic: failed to load OSD map for epoch X, got 0 bytes
Nathan Cutler
10:01 PM Backport #24599 (Resolved): mimic: failed to load OSD map for epoch X, got 0 bytes
https://github.com/ceph/ceph/pull/22651 Nathan Cutler
09:47 PM Bug #24448 (Won't Fix): (Filestore) ABRT report for package ceph has reached 10 occurrences
This is likely due to filestore becoming overloaded (hence waiting on throttles) and hitting the filestore op thread ... Josh Durgin
09:38 PM Bug #24511 (Duplicate): osd crushed at thread_name:safe_timer
Josh Durgin
09:37 PM Bug #24515: "[WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, mon.c has slow...
Kefu, can you take a look at this? Josh Durgin
09:36 PM Bug #24531: Mimic MONs have slow/long running ops
Joao, could you take a look at this? Josh Durgin
09:34 PM Bug #24549 (Won't Fix): FileStore::read assert (ABRT report for package ceph has reached 1000 occ...
As John described, this is not a bug in ceph but due to failing hardware or the filesystem below. Josh Durgin
09:25 PM Bug #23753 (Can't reproduce): "Error ENXIO: problem getting command descriptions from osd.4" in u...
re-open if it recurs Josh Durgin
09:19 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
Josh Durgin
09:12 PM Bug #22085 (Can't reproduce): jewel->luminous: "[ FAILED ] LibRadosAioEC.IsSafe" in upgrade:jew...
assuming this is the mon crush testing timeout, logs are gone so can't be sure Josh Durgin
08:10 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
backport for mimic: https://github.com/ceph/ceph/pull/22651 Sage Weil
08:07 PM Bug #24423 (Pending Backport): failed to load OSD map for epoch X, got 0 bytes
Sage Weil
07:46 PM Bug #24597 (Resolved): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_m...
... Neha Ojha
06:32 PM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
... Neha Ojha
03:01 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh

Now that I've looked at the code there is nothing surprising about the map handling. There is code in dequeue_op()...
David Zafman
12:37 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh

I was able to reproduce by running a loop of a single test case in qa/standalone/erasure-code/test-erasure-eio.sh
...
David Zafman
01:00 PM Backport #23673 (Resolved): jewel: auth: ceph auth add does not sanity-check caps
Nathan Cutler
12:52 PM Bug #23872 (Resolved): Deleting a pool with active watch/notify linger ops can result in seg fault
Nathan Cutler
12:52 PM Backport #23905 (Resolved): jewel: Deleting a pool with active watch/notify linger ops can result...
Nathan Cutler
12:21 PM Backport #24383 (In Progress): mimic: osd: stray osds in async_recovery_targets cause out of orde...
https://github.com/ceph/ceph/pull/22642 Prashant D
08:42 AM Bug #24588 (Fix Under Review): osd: may get empty info at recovery
-https://github.com/ceph/ceph/pull/22362- John Spray
01:42 AM Bug #24588 (Resolved): osd: may get empty info at recovery
2018-06-15 20:34:16.421720 7f89d2c24700 -1 /home/zzr/ceph.sf/src/osd/PG.cc: In function 'void PG::start_peering_inter... tao ning
08:40 AM Bug #24593: s390x: Ceph Monitor crashed with Caught signal (Aborted)
I expect that only people in possession of s390x hardware will be able to debug this.
I see that there is another t...
John Spray
05:33 AM Bug #24593 (New): s390x: Ceph Monitor crashed with Caught signal (Aborted)
We are trying to set up a ceph cluster on the s390x platform.
ceph-mon service crashed with an error: *** Caught signal ...
Nayana Thorat
05:50 AM Feature #24591 (Fix Under Review): FileStore hasn't impl to get kv-db's statistics
Kefu Chai
03:22 AM Feature #24591: FileStore hasn't impl to get kv-db's statistics
https://github.com/ceph/ceph/pull/22633 Jack Lv
03:22 AM Feature #24591 (Fix Under Review): FileStore hasn't impl to get kv-db's statistics
In BlueStore, you can see kv-db's statistics by "ceph daemon osd.X dump_objectstore_kv_stats", but FileStore hasn't i... Jack Lv
03:22 AM Feature #22147: Set multiple flags in a single command line
I don’t think we should skip it entirely. Many of the places that implement a check like that are using a common flag... Greg Farnum

06/19/2018

11:44 PM Bug #24487 (In Progress): osd: choose_acting loop
This happens when an osd which is part of the acting set and not a part of the up set gets chosen as an async_recovery_t... Neha Ojha
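As a rough illustration of the scenario in set-arithmetic terms (a sketch only, not the actual C++ logic in choose_acting; the OSD ids and the "behind on log" set are made up): an OSD can sit in the acting set without being in the up set, and picking such an OSD as an async recovery target changes the acting set, which changes the inputs to the next choose_acting pass.

```python
up = {0, 1, 2}          # OSDs CRUSH currently maps the PG to
acting = {0, 1, 3}      # temporary set actually serving I/O
behind_on_log = {2, 3}  # OSDs that would need async recovery (hypothetical)

# osd.3 is in acting but not in up: selecting it as an async recovery
# target removes it from acting, altering the next choose_acting pass
# and potentially looping.
loop_prone = behind_on_log & (acting - up)
assert loop_prone == {3}

# A safe candidate pool (sketch) avoids current acting-set members.
safe_candidates = behind_on_log - acting
assert safe_candidates == {2}
```

The referenced fix presumably restricts which OSDs may be chosen as async recovery targets so this loop cannot occur.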
10:51 PM Backport #23673: jewel: auth: ceph auth add does not sanity-check caps
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21367
merged
Yuri Weinstein
10:50 PM Backport #23905: jewel: Deleting a pool with active watch/notify linger ops can result in seg fault
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21754
merged
Yuri Weinstein
10:49 PM Feature #22147: Set multiple flags in a single command line
It seems fair to assume that "unset" should support this also.
Question: should settings that require --yes-i-real...
Jesse Williamson
10:40 PM Bug #24587: librados api aio tests race condition
http://pulpito.ceph.com/yuriw-2018-06-13_14:55:30-rados-wip-yuri4-testing-2018-06-12-2037-jewel-distro-basic-smithi/2... Josh Durgin
10:38 PM Bug #24587 (Resolved): librados api aio tests race condition
Seen in a jewel integration branch with no OSD changes:
http://pulpito.ceph.com/yuriw-2018-06-12_22:32:43-rados-wi...
Josh Durgin
09:58 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
I did a run based on d9284902e1b2e292595696caf11cdead18acec96 which is a branch off of master.
http://pulpito.ceph...
David Zafman
07:24 PM Backport #24584 (Resolved): luminous: osdc: wrong offset in BufferHead
https://github.com/ceph/ceph/pull/22865 Nathan Cutler
07:24 PM Backport #24583 (Resolved): mimic: osdc: wrong offset in BufferHead
https://github.com/ceph/ceph/pull/22869 Nathan Cutler
06:02 PM Bug #19971 (Resolved): osd: deletes are performed inline during pg log processing
Nathan Cutler
06:01 PM Backport #22406 (Rejected): jewel: osd: deletes are performed inline during pg log processing
This change was deemed too invasive at such a late stage in Jewel's life cycle. Nathan Cutler
06:01 PM Backport #22405 (Rejected): jewel: store longer dup op information
This change was deemed too invasive at such a late stage in Jewel's life cycle. Nathan Cutler
06:00 PM Backport #22400 (Rejected): jewel: PR #16172 causing performance regression
This change was deemed too invasive at such a late stage in Jewel's life cycle. Nathan Cutler
04:10 PM Bug #24484 (Pending Backport): osdc: wrong offset in BufferHead
Jason Dillaman
11:54 AM Bug #24448: (Filestore) ABRT report for package ceph has reached 10 occurrences
OSD killed by signal, something like OOM incidents perhaps? John Spray
11:53 AM Bug #24450 (Duplicate): OSD Caught signal (Aborted)
http://tracker.ceph.com/issues/24423 Igor Fedotov
11:51 AM Bug #24559 (Fix Under Review): building error for QAT decompress
John Spray
02:10 AM Bug #24559 (Fix Under Review): building error for QAT decompress
The parameter of decompress changes from 'bufferlist::iterator' to 'bufferlist::const_iterator', but this change miss... Qiaowei Ren
11:34 AM Bug #24549: FileStore::read assert (ABRT report for package ceph has reached 1000 occurrences)
Presumably this is underlying FS failures tripping asserts rather than a bug (perhaps people using ZFS on centos, or ... John Spray
07:26 AM Backport #24355 (In Progress): mimic: osd: pg hard limit too easy to hit
https://github.com/ceph/ceph/pull/22621 Prashant D