Activity
From 06/19/2018 to 07/18/2018
07/18/2018
- 09:42 PM Backport #24989: mimic: Limit pg log length during recovery/backfill so that we don't run out of ...
- We can hold off on this backport for now. Need to let this bake in master for a while.
- 08:00 PM Backport #24989 (Resolved): mimic: Limit pg log length during recovery/backfill so that we don't ...
- https://github.com/ceph/ceph/pull/23403
- 09:42 PM Backport #24988: luminous: Limit pg log length during recovery/backfill so that we don't run out ...
- We can hold off on this backport for now. Need to let this bake in master for a while.
Also, this backport is going ...
- 08:00 PM Backport #24988 (Resolved): luminous: Limit pg log length during recovery/backfill so that we don...
- https://github.com/ceph/ceph/pull/23211
- 09:38 PM Bug #24975 (Pending Backport): valgrind-leaks.yaml: expected valgrind issues and found none
- This issue has been fixed in master by https://github.com/ceph/ceph/pull/22261
Needs to be backported to mimic.
- 09:14 PM Bug #24935 (Duplicate): SafeTimer? osd killed by kernel for Segmentation fault
- This appears to be another instance of #23352.
- 09:12 PM Bug #24938: luminous: rados listomapkeys & listomapvals don't return data.
- Did you check that this bucket actually has any entries? These commands are tested in our suite.
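As a sanity check, a minimal sketch of writing and reading omap entries through the Python rados binding follows; the conffile path, pool name, object name, and key/value below are placeholders for illustration, not values from this report.

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # path is an assumption
    cluster.connect()
    ioctx = cluster.open_ioctx('test-pool')  # placeholder pool name

    # Write one omap key/value pair on a scratch object.
    with rados.WriteOpCtx() as write_op:
        ioctx.set_omap(write_op, ('testkey',), (b'testval',))
        ioctx.operate_write_op(write_op, 'testobj')

    # List the keys back; an empty result means there is nothing for
    # listomapkeys/listomapvals to show for that object.
    with rados.ReadOpCtx() as read_op:
        omap_iter, ret = ioctx.get_omap_vals(read_op, '', '', 10)
        ioctx.operate_read_op(read_op, 'testobj')
        print(list(omap_iter))

    ioctx.close()
    cluster.shutdown()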
- 08:46 PM Bug #24990 (Resolved): api_watch_notify: LibRadosWatchNotify.Watch3Timeout failed
- ...
- 06:10 PM Feature #23979 (Pending Backport): Limit pg log length during recovery/backfill so that we don't ...
- 04:15 PM Support #24980: Pg Inconsistent - failed to pick suitable auth object
- Alon Avrahami wrote:
> Hi,
>
>
> We have ceph cluster installed with Luminous 12.2.2 using bluestore.
> All no...
- 01:24 PM Support #24980 (Rejected): Pg Inconsistent - failed to pick suitable auth object
- Hi,
We have ceph cluster installed with Luminous 12.2.2 using bluestore.
All nodes are Intel servers with 1.6TB...
- 03:42 PM Backport #24472 (Resolved): mimic: Ceph-osd crash when activate SPDK
- 02:32 PM Backport #24472: mimic: Ceph-osd crash when activate SPDK
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22684
merged
- 03:36 PM Bug #24950 (Resolved): Running osd_skip_data_digest in a mixed cluster is not ideal
- 03:35 PM Backport #24865 (Resolved): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/te...
- 02:20 PM Backport #24865: mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-...
- Patrick Donnelly wrote:
> https://github.com/ceph/ceph/pull/23024
merged
- 03:14 PM Backport #24951 (Resolved): mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
- 02:24 PM Backport #24951: mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
- David Zafman wrote:
> https://github.com/ceph/ceph/pull/23084
merged
- 02:22 PM Bug #23965: FAIL: s3tests.functional.test_s3.test_multipart_upload_resend_part with ec cache pools
- https://github.com/ceph/ceph/pull/23096 merged
- 11:20 AM Documentation #20894 (Resolved): rados manpage does not document "cleanup"
- https://github.com/ceph/ceph/pull/16777
07/17/2018
- 10:48 PM Bug #24975 (Resolved): valgrind-leaks.yaml: expected valgrind issues and found none
- ...
- 10:43 PM Bug #24974 (New): Segmentation fault in tcmalloc::ThreadCache::ReleaseToCentralCache()
- ...
- 08:32 PM Backport #24583 (Resolved): mimic: osdc: wrong offset in BufferHead
- 08:10 PM Backport #24583: mimic: osdc: wrong offset in BufferHead
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22869
merged
- 06:21 PM Feature #23979 (Fix Under Review): Limit pg log length during recovery/backfill so that we don't ...
- https://github.com/ceph/ceph/pull/23098
- 05:39 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
- 01:37 PM Bug #20645 (Closed): bluesfs wal failed to allocate (assert(0 == "allocate failed... wtf"))
- 09:58 AM Bug #24956 (Resolved): osd: parent process need to restart log service after fork, or ceph-osd wi...
- The ceph-osd parent process needs to restart the log service after fork, or ceph-osd will not work correctly when the option l...
07/16/2018
- 09:18 PM Bug #24950: Running osd_skip_data_digest in a mixed cluster is not ideal
- https://github.com/ceph/ceph/pull/23083
- 09:14 PM Bug #24950 (Resolved): Running osd_skip_data_digest in a mixed cluster is not ideal
Using osd_skip_data_digest in a mixed BlueStore/FileStore cluster is dangerous because we lose data_digest integrity ...
- 09:17 PM Backport #24951 (Resolved): mimic: Running osd_skip_data_digest in a mixed cluster is not ideal
- https://github.com/ceph/ceph/pull/23084
- 09:08 PM Feature #24949 (Resolved): luminous: Allow scrub to fix Luminous 12.2.6 corruption of data_digest
I'm thinking that while osd_distrust_data_digest=true we should automatically ignore data_digest errors and repair ...
- 07:36 PM Bug #23352: osd: segfaults under normal operation
- We actually got one on July 15: Jul 14 23:54:42 roc04r-sc3a080 kernel: [6988357.283555] safe_timer[19917]: segfault a...
- 03:54 AM Bug #23352: osd: segfaults under normal operation
- The latest core uploaded by Dan in comment 66 is slightly different to the others we've seen so far.
Once again th...
- 02:24 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
- https://github.com/ceph/ceph/pull/23072
- 02:24 PM Bug #24687 (Fix Under Review): Automatically set expected_num_objects for new pools with >=100 PG...
- Because a value for expected_num_objects is too difficult to determine automatically, instead we print a suggestion t...
- 11:16 AM Bug #24938 (New): luminous: rados listomapkeys & listomapvals don't return data.
- Hi,
rados listomapkeys & rados listomapvals don't return data when running Luminous, tested on 12.2.4 and 12.2.6:
...
- 08:52 AM Bug #24935 (Duplicate): SafeTimer? osd killed by kernel for Segmentation fault
- My environment :
[root@gz-ceph-52-203 log]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@gz-...
- 12:57 AM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
- Noting the same issue, per ceph-users list post:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/028...
07/15/2018
- 05:46 AM Documentation #24924 (Resolved): doc: typo in crush-map docs
- Each time the OSD starts, it verifies it is in the correct location in the CRUSH map and, if it is not, it moved its...
07/14/2018
- 09:04 PM Bug #24923 (Resolved): doc: http://docs.ceph.com/docs/mimic/rados/operations/pg-states/
- Undersized
The placement group fewer copies than the configured pool replication level.
Missing "has"
- 07:57 PM Bug #23871: luminous->mimic: missing primary copy of xxx, wil try copies on 3, then full-object r...
- For the luminous regression, this will reproduce the issue:...
07/13/2018
- 11:02 PM Feature #24917 (New): Gracefully deal with upgrades when bluestore skipping of data_digest become...
Once the data_digest is no longer being used, but is still set from an earlier version, we can get EIO from read bu...
- 09:26 PM Backport #24083 (In Progress): luminous: rados: not all exceptions accept keyargs
- PR: https://github.com/ceph/ceph/pull/22979
- 03:52 PM Bug #24597 (Resolved): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_m...
- 05:09 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- Could cephfs trigger this issue? There have been two reports of cephfs_metadata pool crc errors on the users ML this ...
- 03:51 PM Backport #24891 (Resolved): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_...
- 03:18 PM Backport #24891: mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22997
merged
- 03:00 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
- FTR, this crc issue is probably due to an incomplete backport to 12.2.6 of the skip_digest changes for bluestore:
...
- 01:55 PM Bug #24909 (Fix Under Review): RBD client IOPS pool stats are incorrect (2x higher; includes IO h...
- https://github.com/ceph/ceph/pull/23029
- 01:47 PM Bug #24909 (In Progress): RBD client IOPS pool stats are incorrect (2x higher; includes IO hints ...
- 01:47 PM Bug #24909 (Resolved): RBD client IOPS pool stats are incorrect (2x higher; includes IO hints as ...
- While running performance testing with Ceph metrics gathering statistics on the cluster, I noticed that while my RBD ...
- 12:58 PM Backport #24908 (In Progress): luminous: luminous->mimic: missing primary copy of xxx, wil try co...
- 12:57 PM Backport #24908 (Resolved): luminous: luminous->mimic: missing primary copy of xxx, wil try copie...
- https://github.com/ceph/ceph/pull/23028
- 12:26 PM Backport #24890 (Resolved): luminous: FAILED assert(0 == "ERROR: source must exist") in FileStore...
- 12:26 PM Bug #23871: luminous->mimic: missing primary copy of xxx, wil try copies on 3, then full-object r...
- original fix is fe5038c7f9577327f82913b4565712c53903ee48
luminous backport https://github.com/ceph/ceph/pull/23028
- 12:06 PM Bug #23871 (Pending Backport): luminous->mimic: missing primary copy of xxx, wil try copies on 3,...
- 11:31 AM Backport #24888 (Need More Info): luminous: osd: crash in OpTracker::unregister_inflight_op via O...
- non-trivial backport. There are two conflicts. The first conflict can be resolved by cherry-picking 17a192ba5cdbe2129...
- 11:23 AM Backport #24889 (In Progress): mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::ge...
- 11:22 AM Backport #24864 (In Progress): luminous: Abort in OSDMap::decode() during qa/standalone/erasure-c...
- 11:20 AM Backport #24865 (In Progress): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code...
07/12/2018
- 11:56 PM Bug #24801 (In Progress): PG num_bytes becomes huge
- 07:38 PM Bug #24600 (Resolved): ValueError: too many values to unpack due to lack of subdir
- 07:38 PM Backport #24617 (Resolved): mimic: ValueError: too many values to unpack due to lack of subdir
- 04:36 PM Backport #24617: mimic: ValueError: too many values to unpack due to lack of subdir
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22888
merged
- 02:05 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
- Is this the relevant fix? https://github.com/ceph/ceph/commit/4667280f8afe6cd68dfffea61d7530581f3dd0eb
Alessandro'...
- 12:27 PM Backport #24890 (In Progress): luminous: FAILED assert(0 == "ERROR: source must exist") in FileSt...
- 10:18 AM Backport #24890 (Resolved): luminous: FAILED assert(0 == "ERROR: source must exist") in FileStore...
- https://github.com/ceph/ceph/pull/22976
- 11:03 AM Backport #24891 (In Progress): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore...
- 10:18 AM Backport #24891 (Resolved): mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_...
- https://github.com/ceph/ceph/pull/22997
- 10:50 AM Bug #24150 (Resolved): LibRadosMiscPool.PoolCreationRace segv
- 10:50 AM Backport #24204 (Resolved): mimic: LibRadosMiscPool.PoolCreationRace segv
- 12:06 AM Backport #24204: mimic: LibRadosMiscPool.PoolCreationRace segv
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22291
merged
- 10:50 AM Bug #24321 (Resolved): assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max...
- 10:49 AM Backport #24329 (Resolved): mimic: assert manager.get_num_active_clean() == pg_num on rados/singl...
- 12:05 AM Backport #24329: mimic: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/ma...
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22492
merged
- 10:48 AM Backport #24747 (Resolved): mimic: change default filestore_merge_threshold to -10
- 12:03 AM Backport #24747: mimic: change default filestore_merge_threshold to -10
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22813
merged
- 10:48 AM Bug #24365 (Resolved): cosbench stuck at booting cosbench driver
- 10:47 AM Backport #24473 (Resolved): mimic: cosbench stuck at booting cosbench driver
- 12:03 AM Backport #24473: mimic: cosbench stuck at booting cosbench driver
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22887
merged
- 10:46 AM Bug #24487 (Resolved): osd: choose_acting loop
- 10:46 AM Backport #24618 (Resolved): mimic: osd: choose_acting loop
- 12:02 AM Backport #24618: mimic: osd: choose_acting loop
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22889
merged
- 10:46 AM Bug #24349 (Resolved): osd: stray osds in async_recovery_targets cause out of order ops
- 10:46 AM Backport #24383 (Resolved): mimic: osd: stray osds in async_recovery_targets cause out of order ops
- 12:02 AM Backport #24383: mimic: osd: stray osds in async_recovery_targets cause out of order ops
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22889
merged
- 10:45 AM Backport #24805 (Resolved): mimic: rgw workload makes osd memory explode
- 12:00 AM Backport #24805: mimic: rgw workload makes osd memory explode
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22960
merged
- 10:36 AM Backport #24771 (Resolved): mimic: osd: may get empty info at recovery
- 10:18 AM Backport #24889 (Resolved): mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_h...
- https://github.com/ceph/ceph/pull/23026
- 10:18 AM Backport #24888 (Rejected): luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::ge...
- 03:03 AM Bug #24664 (Pending Backport): osd: crash in OpTracker::unregister_inflight_op via OSD::get_healt...
- 03:01 AM Bug #24597 (Pending Backport): FAILED assert(0 == "ERROR: source must exist") in FileStore::_coll...
07/11/2018
- 11:48 PM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
- ...
- 11:47 PM Bug #18209: src/common/LogClient.cc: 310: FAILED assert(num_unsent <= log_queue.size())
- Happened again in 12.2.4:...
- 11:33 PM Bug #24866: FAILED assert(0 == "past_interval start interval mismatch") in check_past_interval_bo...
- /a/nojha-2018-07-06_23:31:26-rados-wip-23979-2018-07-06-distro-basic-smithi/2744661/
- 11:24 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- Cool, I will pickup and run your test, atm the load on workers is high, should have the results tomorrow eod.
- 10:25 AM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- OK, it looks like we missed this in the previous tracker issue that mentioned it (it was actually a three part fix an...
- 11:23 PM Bug #24676 (Resolved): FreeBSD/Linux integration - monitor map with wrong sa_family
- 11:21 PM Bug #24683: ceph-mon binary doesn't report to systemd why it dies
- Does this show up in the monitor's log in /var/log/ceph/ ?
- 11:15 PM Bug #24786 (Resolved): LibRadosList.ListObjectsNS fails
- https://github.com/ceph/ceph/pull/22771
- 11:13 PM Bug #24787 (Duplicate): cls_rgw.index_suggest FAILED
- Looks the same as #24640
- 11:11 PM Bug #24835 (Need More Info): osd daemon spontaneous segfault
- Unfortunately there's not much to go on - if this happens again perhaps you can grab a core file or a crash dump will...
- 10:09 PM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- mimic backport: https://github.com/ceph/ceph/pull/22997
- 03:54 PM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- Factors leading to this:
- ec pool (e.g., rgw workload)
- rados ops that result in pg log 'error' entries (e.g., ...
- 12:37 PM Bug #24597 (In Progress): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collectio...
- https://github.com/ceph/ceph/pull/22974
- 01:16 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- Aha, in that case wip-24192 should fix it. Running it through testing again...
- 12:38 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- 12:38 AM Bug #24597: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
- I believe this is caused by b50186bfe6c8981700e33c8a62850e21779d67d5, which does...
- 09:38 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
- Ah, the error was reported on luminous, which doesn't do the repair, and I guess I missed it on master. Sorry for the...
- 09:01 PM Bug #24875: OSD: still returning EIO instead of recovering objects on checksum errors
The do_sparse_read() path doesn't attempt to repair a checksum error. Could that be the real issue?
The do_read...
- 08:25 PM Bug #24875 (Resolved): OSD: still returning EIO instead of recovering objects on checksum errors
- A report came in on the mailing list of an MDS journal which couldn't be read and was throwing errors:...
- 08:31 PM Bug #24876 (New): snaptrim_error state cannot be cleared without a new snaptrim
- A user on the list reported they had PGs in state "active+clean+snaptrim_error". Investigating, I found that the only...
- 08:11 PM Backport #24771: mimic: osd: may get empty info at recovery
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22861
merged
Reviewed-by: Sage Weil <sage@redhat.com>
- 07:27 PM Bug #24874 (New): ec fast reads can trigger read errors in log
- fast read finishes......
- 04:11 PM Bug #23145 (Duplicate): OSD crashes during recovery of EC pg
- This looks like #24597 for the 12.2.5 case, at least. I wonder if the original 12.2.3 is something else (time warp d...
- 03:51 PM Bug #24192 (Duplicate): cluster [ERR] Corruption detected: object 2:f59d1934:::smithi14913526-582...
07/10/2018
- 10:10 PM Bug #24866 (Resolved): FAILED assert(0 == "past_interval start interval mismatch") in check_past_...
- ...
- 08:30 PM Backport #24865 (Resolved): mimic: Abort in OSDMap::decode() during qa/standalone/erasure-code/te...
- https://github.com/ceph/ceph/pull/23024
- 08:29 PM Backport #24864 (Resolved): luminous: Abort in OSDMap::decode() during qa/standalone/erasure-code...
- https://github.com/ceph/ceph/pull/23025
- 04:51 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- This was a ceph-volume test with rbd workload, no upgrades, just fresh install, full logs at
http://pulpito.ceph.c...
- 02:41 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- This points to a deeper issue. The target context seems to always be 'unlabeled_t'. That context means something like...
- 12:23 PM Bug #24785: mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- Filing under RADOS because it appears to be OSD specific.
- 01:42 PM Bug #23492 (Pending Backport): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-e...
- 12:46 PM Bug #24850 (New): IPv6 scoped address not parseable by entity_addr_t
- An IPv6 link-local scoped address is not currently parseable since it contains a "%<interface name>" suffix in the ad...
- 12:14 PM Bug #24835: osd daemon spontaneous segfault
- The log (attached) does not contain any information on the crash. It shows only the automatic restart of the crashed ...
- 09:54 AM Backport #24847 (In Progress): jewel: rgw workload makes osd memory explode
- 09:54 AM Backport #24847 (Resolved): jewel: rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22959
- 09:48 AM Backport #24806 (In Progress): luminous: rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22962
- 09:42 AM Backport #24805 (In Progress): mimic: rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22960
- 09:41 AM Bug #23352: osd: segfaults under normal operation
- We see this periodically with osd_enable_op_tracker = false
Last time ...
- 12:55 AM Bug #23352: osd: segfaults under normal operation
- That is correct, Brad. No crashes for 7 days now.
- 09:33 AM Bug #24768: rgw workload makes osd memory explode
- jewel backport: https://github.com/ceph/ceph/pull/22959
I know that jewel is (almost) EOL; just in case anyone is ...
- 04:01 AM Backport #24845 (Resolved): luminous: tools/ceph-objectstore-tool: split filestore directories of...
- https://github.com/ceph/ceph/pull/23418
07/09/2018
- 10:43 PM Bug #23352: osd: segfaults under normal operation
- Alex, So that's a week without issue when previously you were getting a crash every 3-4 days right?
- 01:36 PM Bug #23352: osd: segfaults under normal operation
- No issues so far since injecting osd_enable_op_tracker=false
- 08:40 PM Feature #21366 (Pending Backport): tools/ceph-objectstore-tool: split filestore directories offli...
- 06:27 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- https://github.com/ceph/ceph/pull/22954
- 06:02 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- The problem is that int global_init_shutdown_stderr(CephContext *cct) is not being run at a time in the process lifec...
- 05:02 PM Bug #24835: osd daemon spontaneous segfault
- Can you provide the backtrace out of the OSD log? Or even the whole log?
- 02:13 PM Bug #24835 (Can't reproduce): osd daemon spontaneous segfault
- We experience spontaneous segmentation faults of osd daemons in our mimic production cluster:...
- 04:36 PM Bug #24838 (Resolved): mon: auth checks not correct for pool ops
- The mon was not enforcing caps for pool ops correctly (which are used for managing unmanaged snapshots or even pool d...
- 04:32 PM Bug #24837 (Resolved): auth: cephx signature check is weak/broken
- The signature check code was validating only the first (32-byte) of two blocks, and thus did not cover all of the crc...
- 04:30 PM Bug #24836 (Resolved): auth: cephx authorizer subject to replay
- The cephx authorizer does not have any challenge or nonce, and thus (if sniffed) can be reused by another session.
...
- 04:09 PM Bug #24368: osd: should not restart on permanent failures
- I don't think the issue has moved beyond the PR linked above to change the systemd settings. I sent this out to one o...
- 08:42 AM Bug #24368: osd: should not restart on permanent failures
- guotao Yao wrote:
> I've had a similar problem recently. One OSD crash and exit, and the OSD process starts up quick...
- 08:12 AM Bug #24368: osd: should not restart on permanent failures
- I've had a similar problem recently. One OSD crash and exit, and the OSD process starts up quickly by systemd. It cau...
07/06/2018
- 09:55 PM Bug #24322 (Resolved): slow mon ops from osd_failure
- 09:55 PM Backport #24350 (Resolved): mimic: slow mon ops from osd_failure
- 09:50 PM Backport #24350: mimic: slow mon ops from osd_failure
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22297
merged
- 09:54 PM Bug #24222 (Resolved): Manager daemon y is unresponsive during teuthology cluster teardown
- 09:54 PM Backport #24246 (Resolved): mimic: Manager daemon y is unresponsive during teuthology cluster tea...
- 09:49 PM Backport #24246: mimic: Manager daemon y is unresponsive during teuthology cluster teardown
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22333
merged
- 09:54 PM Backport #24375 (Resolved): mimic: mon: auto compaction on rocksdb should kick in more often
- 09:49 PM Backport #24375: mimic: mon: auto compaction on rocksdb should kick in more often
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22361
merged
- 09:52 PM Backport #24407 (Resolved): mimic: read object attrs failed at EC recovery
- 09:51 PM Bug #24408 (Resolved): tell ... config rm <foo> not idempotent
- 09:51 PM Backport #24468 (Resolved): mimic: tell ... config rm <foo> not idempotent
- 09:42 PM Backport #24468: mimic: tell ... config rm <foo> not idempotent
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22552
merged
- 09:50 PM Backport #24332 (Resolved): mimic: local_reserver double-reservation of backfilled pg
- 09:42 PM Backport #24332: mimic: local_reserver double-reservation of backfilled pg
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22559
merged
- 09:49 PM Bug #24423 (Resolved): failed to load OSD map for epoch X, got 0 bytes
- 09:49 PM Backport #24599 (Resolved): mimic: failed to load OSD map for epoch X, got 0 bytes
- 09:40 PM Backport #24599: mimic: failed to load OSD map for epoch X, got 0 bytes
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22651
merged
- 09:48 PM Backport #24494 (Resolved): mimic: osd: segv in Session::have_backoff
- 09:39 PM Backport #24494: mimic: osd: segv in Session::have_backoff
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22730
merged
- 09:47 PM Bug #24199 (Resolved): common: JSON output from rados bench write has typo in max_latency key
- 09:47 PM Backport #24291 (Resolved): jewel: common: JSON output from rados bench write has typo in max_lat...
- 09:45 PM Backport #24292 (Resolved): mimic: common: JSON output from rados bench write has typo in max_lat...
- 09:44 PM Backport #24292: mimic: common: JSON output from rados bench write has typo in max_latency key
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22406
merged
- 09:06 PM Backport #24806 (Resolved): luminous: rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22962
- 09:06 PM Backport #24805 (Resolved): mimic: rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22960
- 06:44 PM Bug #24768 (Pending Backport): rgw workload makes osd memory explode
- 06:12 PM Bug #24801: PG num_bytes becomes huge
The OSD logs and this bug point to a slight flaw in https://github.com/ceph/ceph/pull/22797. I add the adjustment ...
- 05:57 PM Bug #24801 (Resolved): PG num_bytes becomes huge
dzafman-2018-07-05_12:45:56-rados-wip-19753-distro-basic-smithi/2739140
description: rados/thrash/{0-size-min-si...
- 04:45 PM Backport #23772 (In Progress): luminous: ceph status shows wrong number of objects
- 01:39 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- ...
- 01:28 AM Bug #24787 (Duplicate): cls_rgw.index_suggest FAILED
dzafman-2018-07-03_13:41:32-rados-wip-19753-distro-basic-smithi
2732821
2732693
2732523...
- 01:01 AM Bug #24786 (Resolved): LibRadosList.ListObjectsNS fails
http://pulpito.ceph.com/dzafman-2018-07-03_13:41:32-rados-wip-19753-distro-basic-smithi
Multiple jobs
2732818
...
07/05/2018
- 10:40 PM Bug #24785 (Resolved): mimic selinux denials comm="tp_fstore_op / comm="ceph-osd dev=dm-0 and dm-1
- ...
- 09:33 PM Bug #24664 (In Progress): osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_met...
- https://github.com/ceph/ceph/pull/22877
- 08:39 PM Backport #24383: mimic: osd: stray osds in async_recovery_targets cause out of order ops
- Ganging up with another backport to prevent merge conflicts.
- 08:27 PM Backport #24618 (In Progress): mimic: osd: choose_acting loop
- 08:22 PM Backport #24617 (In Progress): mimic: ValueError: too many values to unpack due to lack of subdir
- 08:15 PM Backport #24473 (In Progress): mimic: cosbench stuck at booting cosbench driver
- 12:44 PM Bug #24768 (Fix Under Review): rgw workload makes osd memory explode
- https://github.com/ceph/ceph/pull/22858
- 09:17 AM Backport #24583 (In Progress): mimic: osdc: wrong offset in BufferHead
- https://github.com/ceph/ceph/pull/22869
- 07:25 AM Backport #24584 (In Progress): luminous: osdc: wrong offset in BufferHead
- https://github.com/ceph/ceph/pull/22865
07/04/2018
- 11:11 PM Backport #24772 (In Progress): luminous: osd: may get empty info at recovery
- 10:52 PM Backport #24772 (Resolved): luminous: osd: may get empty info at recovery
- https://github.com/ceph/ceph/pull/22862
- 11:03 PM Backport #24771 (In Progress): mimic: osd: may get empty info at recovery
- 10:52 PM Backport #24771 (Resolved): mimic: osd: may get empty info at recovery
- https://github.com/ceph/ceph/pull/22861
- 07:24 PM Bug #24588: osd: may get empty info at recovery
- https://github.com/ceph/ceph/pull/22704 is the fix
- 07:23 PM Bug #24588 (Pending Backport): osd: may get empty info at recovery
- 07:17 PM Bug #24768 (Resolved): rgw workload makes osd memory explode
- From ML,...
- 12:47 PM Bug #23352: osd: segfaults under normal operation
- Brad Hubbard wrote:
> Having reviewed the code in question again I was afraid that may be the case. If you can provi...
- 09:38 AM Bug #23352: osd: segfaults under normal operation
- Having reviewed the code in question again I was afraid that may be the case. If you can provide the crash dump Dan, ...
- 07:36 AM Bug #23352: osd: segfaults under normal operation
- I *injected* osd_enable_op_tracker=false yesterday ...
- 07:40 AM Bug #24123 (Resolved): "process (unknown)" in ceph logs
- 07:39 AM Backport #24215 (Resolved): mimic: "process (unknown)" in ceph logs
- 07:38 AM Bug #24243 (Resolved): osd: pg hard limit too easy to hit
- 07:38 AM Backport #24500 (Resolved): mimic: osd: eternal stuck PG in 'unfound_recovery'
- 07:37 AM Backport #24355 (Resolved): mimic: osd: pg hard limit too easy to hit
- 07:31 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- Hi Martin,
Have you tried my workaround above?
Best regards,
- 06:18 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- Hi everyone,
What’s the workaround for this issue? Not being able to add new osds is getting more and more urgent...
- 01:10 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- Final bisect results:
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
2e476...
07/03/2018
- 10:51 AM Backport #24748 (In Progress): luminous: change default filestore_merge_threshold to -10
- 07:55 AM Backport #24748 (Resolved): luminous: change default filestore_merge_threshold to -10
- https://github.com/ceph/ceph/pull/22814
- 10:47 AM Backport #24747 (In Progress): mimic: change default filestore_merge_threshold to -10
- 07:55 AM Backport #24747 (Resolved): mimic: change default filestore_merge_threshold to -10
- https://github.com/ceph/ceph/pull/22813
- 10:47 AM Bug #24686: change default filestore_merge_threshold to -10
- *master PR*: https://github.com/ceph/ceph/pull/22761
- 12:36 AM Bug #24686 (Pending Backport): change default filestore_merge_threshold to -10
- 10:24 AM Feature #13507: scrub APIs to read replica
- Update backport field?
- 04:16 AM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- Hi Sergey,
- Have you tried that after "ceph osd require-osd-release mimic"?
My workaround is below.
1. Build pa...
- 03:08 AM Bug #23352: osd: segfaults under normal operation
- Thanks Alex!
- 01:50 AM Bug #23352: osd: segfaults under normal operation
- I set it, Brad; watching the status. We normally get one failure every 3-4 days.
- 01:03 AM Bug #23352: osd: segfaults under normal operation
- We are investigating the potential race between get_health_metrics and the op_tracker code.
In the mean time, for ...
07/02/2018
- 10:47 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I found a workaround for how to add a new osd to the "mimic" cluster:
1. Purge the osd from the cluster which is displayed as "dow...
- 04:52 PM Backport #24215: mimic: "process (unknown)" in ceph logs
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22311
merged
- 04:52 PM Backport #24500: mimic: osd: eternal stuck PG in 'unfound_recovery'
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22545
merged
- 04:51 PM Backport #24355: mimic: osd: pg hard limit too easy to hit
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22621
merged
- 06:12 AM Bug #23352: osd: segfaults under normal operation
- Pretty sure this all revolves around the racy code highlighted in #24037 and, unfortunately, the PR does *not* fix al...
06/29/2018
- 11:27 PM Bug #23875 (In Progress): Removal of snapshot with corrupt replica crashes osd
- Tentative pull request https://github.com/ceph/ceph/pull/22476 is an improvement but doesn't address comment 3
- 11:25 PM Bug #19753 (In Progress): Deny reservation if expected backfill size would put us over backfill_f...
- 05:59 PM Bug #24687: Automatically set expected_num_objects for new pools with >=100 PGs per OSD
- Also include >1024 PGs overall
- 09:59 AM Bug #23145: OSD crashes during recovery of EC pg
- @sage weil,
Thanks; since the environment no longer exists, I couldn't get the logs with debug_osd=20.
from the previou...
06/28/2018
- 05:53 PM Bug #24645: Upload to radosgw fails when there are degraded objects
- When the cluster is in recovery, it is expected that we're waiting for the OSDs to respond
- 05:16 PM Bug #24676: FreeBSD/Linux integration - monitor map with wrong sa_family
- I discovered that commit 9099ca5 - "fix the dencoder of entity_addr_t" introduced this kind of interoperability which...
- 08:50 AM Bug #24676: FreeBSD/Linux integration - monitor map with wrong sa_family
- I investigated further with gdb. Lines 478-501 from msg/msg_types.h seem to be the culprit. Here sa_family is decoded...
- 05:14 PM Bug #17257: ceph_test_rados_api_lock fails LibRadosLockPP.LockExclusiveDurPP
- ...
- 05:08 PM Bug #24485: LibRadosTwoPoolsPP.ManifestUnset failure
- /a/nojha-2018-06-27_22:32:36-rados-wip-23979-distro-basic-smithi/2715571/
- 02:50 PM Bug #24686 (In Progress): change default filestore_merge_threshold to -10
- 02:18 PM Bug #24686 (Resolved): change default filestore_merge_threshold to -10
- Performance evaluations of medium to large size Ceph clusters have demonstrated negligible performance impact from un...
- 02:49 PM Bug #24687 (Resolved): Automatically set expected_num_objects for new pools with >=100 PGs per OSD
- Field experience has demonstrated significant performance impact from filestore split and merge activity. The expecte...
- 10:15 AM Bug #24685 (Resolved): config options: possible inconsistency between flag 'can_update_at_runtime...
- I'm wondering if there is an inconsistency between the 'can_update_at_runtime' flag and the 'flags' list for the confi...
- 08:47 AM Bug #24683: ceph-mon binary doesn't report to systemd why it dies
- If I execute the same command that systemd uses, I get a great readable error message:...
- 08:45 AM Bug #24683 (New): ceph-mon binary doesn't report to systemd why it dies
- Following the quick start guide I get at a point where the monitor is supposed to come up but it doesn't. It doesn't ...
- 05:53 AM Bug #24587: librados api aio tests race condition
- Good news, this is just a bug in the tests. They're submitting a write and then a read without waiting for the write ...
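For illustration, a minimal sketch (using the Python binding; the conffile path, pool, and object names are made up) of the pattern the tests should follow: wait on the write's completion before issuing the read.

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # path is an assumption
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')  # placeholder pool name

    completion = ioctx.aio_write('greeting', b'hello')
    completion.wait_for_complete()  # without this wait, a racing read can miss the data
    print(ioctx.read('greeting'))

    ioctx.close()
    cluster.shutdown()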
- 03:04 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Thanks Sage. I'll try to get my hands on another environment and see if I can reproduce and get more details. Will up...
06/27/2018
- 09:17 PM Bug #24615: error message for 'unable to find any IP address' not shown
- Sounds like the log isn't being flushed before exiting
- 09:13 PM Bug #24652 (Won't Fix): OSD crashes when repairing pg
- This should be fixed in later versions - hammer is end of life.
The crash was:...
- 09:06 PM Bug #24667: osd: SIGSEGV in MMgrReport::encode_payload
- Possibly related to a memory corruption we've been seeing related to mgr health reporting on the osd.
- 07:54 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- https://github.com/ceph/ceph/pull/22744 disabled build_past_intervals_parallel in luminous (by default; can be turned...
- 05:51 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Well, I can work around the issue. The build_past_intervals_parallel() is removed entirely in mimic, and I can do t...
- 07:12 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- > Any chance you can gdb one of the core files for a crashing OSD to identify which PG is it asserting on? and perhap...
- 06:10 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Sage Weil wrote:
> Dexter, anyone: was there a PG split (pg_num increase) on the cluster before this happened? Or m... - 05:19 PM Bug #24678 (Can't reproduce): ceph-mon segmentation fault after setting pool size to 1 on degrade...
- We have an issue with starting any from 3 monitors after changing pool size from 3 to 1. The cluster was in a degrade...
- 03:09 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- I also have this issue on new installed mimic cluster.
Don't know if this is important, the problem has appeared aft...
- 12:00 PM Bug #24676 (Resolved): FreeBSD/Linux integration - monitor map with wrong sa_family
- We are using a ceph cluster in a mixed FreeBSD/Linux environment. The ceph cluster is based on FreeBSD. Linux clients...
- 09:16 AM Bug #23352: osd: segfaults under normal operation
- We're getting a few crashes like this per week here on 12.2.5.
Here's a fileStore OSD:...
- 04:10 AM Backport #24494 (In Progress): mimic: osd: segv in Session::have_backoff
- https://github.com/ceph/ceph/pull/22730
- 04:09 AM Backport #24495 (In Progress): luminous: osd: segv in Session::have_backoff
- https://github.com/ceph/ceph/pull/22729
- 12:41 AM Bug #23395 (Can't reproduce): qa/standalone/special/ceph_objectstore_tool.py causes ceph-mon core...
06/26/2018
- 11:32 PM Bug #23492 (In Progress): Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasur...
- 11:29 PM Feature #13507 (New): scrub APIs to read replica
- 11:28 PM Bug #24366 (Resolved): omap_digest handling still not correct
- 11:27 PM Backport #24381 (Resolved): luminous: omap_digest handling still not correct
- 11:27 PM Backport #24380 (Resolved): mimic: omap_digest handling still not correct
- 09:08 PM Bug #23352: osd: segfaults under normal operation
- Matt,
Can you provide a coredump or full backtrace?
- 01:54 PM Bug #23352: osd: segfaults under normal operation
- Also confirmed on Ubuntu 18.04/Ceph 13.2.0:
ceph-mgr.log
> 2018-06-24 11:14:47.317 7ff17b0db700 -1 mgr.server s...
- 02:54 AM Bug #23352: osd: segfaults under normal operation
- confirmed
ceph-mgr.log
@2018-06-20 08:46:05.528656 7fb998ff2700 -1 mgr.server send_report send_report osd,215.0x5...
- 07:14 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Dexter, anyone: was there a PG split (pg_num increase) on the cluster before this happened? Or maybe a split combine...
- 07:10 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- ...
- 07:06 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- ...
- 06:42 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Dexter John Genterone wrote:
> Uploaded a few more logs (debug 20) here: https://storage.googleapis.com/ceph-logs/ce...
- 07:11 PM Bug #24667 (Can't reproduce): osd: SIGSEGV in MMgrReport::encode_payload
- ...
- 07:07 PM Bug #24666 (New): pybind: InvalidArgumentError is missing 'errno' argument
- Instead of being derived from 'Error', the 'InvalidArgumentError' should be derived from 'OSError' which will handle ...
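A rough sketch of the proposed hierarchy (class names mirror rados.pyx, but the exact constructor signatures here are an assumption, not the committed change):

    import errno

    class Error(Exception):
        """Base class for rados errors (illustrative only)."""

    class OSError(Error):
        """Carries an errno so callers can dispatch on the error code."""
        def __init__(self, message, errno=None):
            super(OSError, self).__init__(message)
            self.errno = errno

    # Deriving from OSError rather than Error gives InvalidArgumentError an errno.
    class InvalidArgumentError(OSError):
        def __init__(self, message, errno=errno.EINVAL):
            super(InvalidArgumentError, self).__init__(message, errno)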
- 01:49 PM Bug #24664 (Resolved): osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics
- ...
- 09:47 AM Bug #24660 (New): admin/build-doc fails during autodoc on rados module: "AttributeError: __next__"
- I'm trying to send a doc patch and am running @admin/build-doc@ in my local environment explained in [[http://docs.ce...
06/25/2018
- 09:50 PM Bug #23352: osd: segfaults under normal operation
- Same here
2018-06-24 19:42:41.348699 7f3e53a46700 -1 mgr.server send_report send_report osd,226.0x55678069c850 sen...
- 09:34 PM Bug #23352: osd: segfaults under normal operation
- Brad Hubbard wrote:
> Can anyone confirm seeing the "unknown health metric" messages in the mgr logs prior to the se...
- 02:50 PM Bug #24652 (Won't Fix): OSD crashes when repairing pg
- After a deep-scrub on the primary OSD for the pg we get:...
- 01:03 PM Bug #24650 (New): mark unfound lost revert: out of order trim
- OSD crashes in a few seconds after command 'ceph pg X.XX mark_unfound_lost revert'.
-10> 2018-06-25 15:52:14.49...
- 07:53 AM Bug #24645 (New): Upload to radosgw fails when there are degraded objects
- Hi,
we use Ceph RadosGW for storing and serving millions of small images. Everything is working well until recovery...
- 06:46 AM Backport #24471 (In Progress): luminous: Ceph-osd crash when activate SPDK
- https://github.com/ceph/ceph/pull/22686
- 05:01 AM Backport #24472 (In Progress): mimic: Ceph-osd crash when activate SPDK
- https://github.com/ceph/ceph/pull/22684
06/22/2018
- 11:25 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Uploaded a few more logs (debug 20) here: https://storage.googleapis.com/ceph-logs/ceph-osd-logs.tar.gz
After runn...
- 12:45 PM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- Hi Sage,
We've experienced this again on a new environment we setup. Took a snippet of the logs, hope it's enough:...
- 04:46 PM Bug #23622 (Resolved): qa/workunits/mon/test_mon_config_key.py fails on master
- 04:45 PM Backport #23675 (Resolved): luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- 04:25 PM Backport #23675: luminous: qa/workunits/mon/test_mon_config_key.py fails on master
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21368
merged
- 04:44 PM Bug #23921 (Resolved): pg-upmap cannot balance in some case
- 04:43 PM Backport #24048 (Resolved): luminous: pg-upmap cannot balance in some case
- 04:25 PM Backport #24048: luminous: pg-upmap cannot balance in some case
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22115
merged
- 04:43 PM Bug #24025 (Resolved): RocksDB compression is not supported at least on Debian.
- 04:42 PM Backport #24279 (Resolved): luminous: RocksDB compression is not supported at least on Debian.
- 04:24 PM Backport #24279: luminous: RocksDB compression is not supported at least on Debian.
- Kefu Chai wrote:
> https://github.com/ceph/ceph/pull/22215
merged
- 04:40 PM Backport #24329: mimic: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/ma...
- original mimic backport https://github.com/ceph/ceph/pull/22288 was merged, but deemed insufficient
- 04:38 PM Backport #24328 (Resolved): luminous: assert manager.get_num_active_clean() == pg_num on rados/si...
- 04:23 PM Bug #24321: assert manager.get_num_active_clean() == pg_num on rados/singleton/all/max-pg-per-osd...
- merged https://github.com/ceph/ceph/pull/22296
- 04:15 PM Bug #24635 (New): luminous: LibRadosTwoPoolsPP.SetRedirectRead failed
- Probably a race with the redirect code.
From http://qa-proxy.ceph.com/teuthology/yuriw-2018-06-22_03:31:56-rados-w...
- 01:19 PM Bug #23352: osd: segfaults under normal operation
- Yeah. We see this in the mgr log before the ceph-osd segfault:
> mgr.server send_report send_report osd,74.0x560276d34ed8 sent me...
- 03:54 AM Bug #23352: osd: segfaults under normal operation
- 03:54 AM Bug #23352: osd: segfaults under normal operation
- In several of the crashes we are seeing lines like the following prior to the crash....
- 12:42 PM Bug #17170: mon/monclient: update "unable to obtain rotating service keys when osd init" to sugge...
- I see a slightly different effect on v12.2.5, but it may be related:
I have similar logs:...
- 08:44 AM Backport #24351 (Resolved): luminous: slow mon ops from osd_failure
- 12:23 AM Backport #24351: luminous: slow mon ops from osd_failure
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22568
merged
- 08:44 AM Bug #23386 (Resolved): crush device class: Monitor Crash when moving Bucket into Default root
- 08:43 AM Backport #24258 (Resolved): luminous: crush device class: Monitor Crash when moving Bucket into D...
- 12:21 AM Backport #24258: luminous: crush device class: Monitor Crash when moving Bucket into Default root
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22381
merged
Reviewed-by: Nathan Cutler <ncutler@suse.com>
- 08:43 AM Backport #24290 (Resolved): luminous: common: JSON output from rados bench write has typo in max_...
- 12:20 AM Backport #24290: luminous: common: JSON output from rados bench write has typo in max_latency key
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22391
merged
- 08:41 AM Backport #24356 (Resolved): luminous: osd: pg hard limit too easy to hit
- 12:18 AM Backport #24356: luminous: osd: pg hard limit too easy to hit
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/22592
merged
- 08:41 AM Backport #24618 (Resolved): mimic: osd: choose_acting loop
- https://github.com/ceph/ceph/pull/22889
- 08:41 AM Backport #24617 (Resolved): mimic: ValueError: too many values to unpack due to lack of subdir
- https://github.com/ceph/ceph/pull/22888
- 12:32 AM Bug #24615 (Resolved): error message for 'unable to find any IP address' not shown
- Hi,
In my ceph.conf I have the option:...
06/21/2018
- 11:40 PM Bug #24613 (New): luminous: rest/test.py fails with expected 200, got 400
- ...
- 10:57 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- possibly related lumious run: http://pulpito.ceph.com/yuriw-2018-06-11_16:27:32-rados-wip-yuri3-testing-2018-06-11-14...
- 10:15 PM Bug #23352: osd: segfaults under normal operation
- Another instance: http://pulpito.ceph.com/yuriw-2018-06-19_21:29:48-rados-wip-yuri-testing-2018-06-19-1953-luminous-d...
- 09:01 PM Bug #24487 (Pending Backport): osd: choose_acting loop
- 05:58 PM Bug #24487 (Fix Under Review): osd: choose_acting loop
- https://github.com/ceph/ceph/pull/22664
- 06:48 PM Bug #24612 (Resolved): FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init()
- ...
- 04:51 PM Bug #23879: test_mon_osdmap_prune.sh fails
- /a/nojha-2018-06-21_00:18:52-rados-wip-24487-distro-basic-smithi/2686362
- 09:19 AM Bug #21142: OSD crashes when loading pgs with "FAILED assert(interval.last > last)"
- We also hit this today, and happen to have an osd log with --debug_osd = 20.
FWIW, the cluster has an inconsistent PG and ...
- 01:34 AM Bug #24601 (Resolved): FAILED assert(is_up(osd)) in OSDMap::get_inst(int)
- ...
- 12:52 AM Bug #24600 (Resolved): ValueError: too many values to unpack due to lack of subdir
- ...
06/20/2018
- 10:13 PM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- Sage Weil wrote:
> Can you generate an osd log with 'debug osd = 20' for the crashing osd that leads up to the crash...
- 10:13 PM Bug #24422: Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- Can you generate an osd log with 'debug osd = 20' for the crashing osd that leads up to the crash?
- 09:50 PM Bug #24422 (Duplicate): Ceph OSDs crashing in BlueStore::queue_transactions() using EC
- 10:11 PM Bug #23145: OSD crashes during recovery of EC pg
- Two basic theories:
1. There is a bug that prematurely advances can_rollback_to
2. One of Peter's OSDs warped bac...
- 10:05 PM Bug #23145: OSD crashes during recovery of EC pg
- Sage Weil wrote:
> Zengran Zhang wrote:
> > osd in last peering stage will call pg_log.roll_forward(at last of PG:...
- 10:03 PM Bug #23145 (Need More Info): OSD crashes during recovery of EC pg
- Yong Wang, can you provide a full osd log with debug osd = 20 for the primary osd for the PG leading up to the crash...
- 09:22 PM Bug #23145: OSD crashes during recovery of EC pg
- Zengran Zhang wrote:
> osd in last peering stage will call pg_log.roll_forward(at last of PG::activate), is there p...
- 01:46 AM Bug #23145: OSD crashes during recovery of EC pg
- @Sage Weil @Zengran Zhang
could you share any recent updates about this bug?
- 01:44 AM Bug #23145: OSD crashes during recovery of EC pg
- Hi all, are there any updates please?
- 10:02 PM Backport #24599 (In Progress): mimic: failed to load OSD map for epoch X, got 0 bytes
- 10:01 PM Backport #24599 (Resolved): mimic: failed to load OSD map for epoch X, got 0 bytes
- https://github.com/ceph/ceph/pull/22651
- 09:47 PM Bug #24448 (Won't Fix): (Filestore) ABRT report for package ceph has reached 10 occurrences
- This is likely due to filestore becoming overloaded (hence waiting on throttles) and hitting the filestore op thread ...
- 09:38 PM Bug #24511 (Duplicate): osd crushed at thread_name:safe_timer
- 09:37 PM Bug #24515: "[WRN] Health check failed: 1 slow ops, oldest one blocked for 32 sec, mon.c has slow...
- Kefu, can you take a look at this?
- 09:36 PM Bug #24531: Mimic MONs have slow/long running ops
- Joao, could you take a look at this?
- 09:34 PM Bug #24549 (Won't Fix): FileStore::read assert (ABRT report for package ceph has reached 1000 occ...
- As John described, this is not a bug in ceph but due to failing hardware or the filesystem below.
- 09:25 PM Bug #23753 (Can't reproduce): "Error ENXIO: problem getting command descriptions from osd.4" in u...
- re-open if it recurs
- 09:19 PM Bug #22624: filestore: 3180: FAILED assert(0 == "unexpected error"): error (2) No such file or di...
- 09:12 PM Bug #22085 (Can't reproduce): jewel->luminous: "[ FAILED ] LibRadosAioEC.IsSafe" in upgrade:jew...
- assuming this is the mon crush testing timeout, logs are gone so can't be sure
- 08:10 PM Bug #24423: failed to load OSD map for epoch X, got 0 bytes
- backport for mimic: https://github.com/ceph/ceph/pull/22651
- 08:07 PM Bug #24423 (Pending Backport): failed to load OSD map for epoch X, got 0 bytes
- 07:46 PM Bug #24597 (Resolved): FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_m...
- ...
- 06:32 PM Bug #20086: LibRadosLockECPP.LockSharedDurPP gets EEXIST
- ...
- 03:01 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
Now that I've looked at the code there is nothing surprising about the map handling. There is code in dequeue_op()...
- 12:37 AM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
I was able to reproduce by running a loop of a single test case in qa/standalone/erasure-code/test-erasure-eio.sh
...
- 01:00 PM Backport #23673 (Resolved): jewel: auth: ceph auth add does not sanity-check caps
- 12:52 PM Bug #23872 (Resolved): Deleting a pool with active watch/notify linger ops can result in seg fault
- 12:52 PM Backport #23905 (Resolved): jewel: Deleting a pool with active watch/notify linger ops can result...
- 12:21 PM Backport #24383 (In Progress): mimic: osd: stray osds in async_recovery_targets cause out of orde...
- https://github.com/ceph/ceph/pull/22642
- 08:42 AM Bug #24588 (Fix Under Review): osd: may get empty info at recovery
- -https://github.com/ceph/ceph/pull/22362-
- 01:42 AM Bug #24588 (Resolved): osd: may get empty info at recovery
- 2018-06-15 20:34:16.421720 7f89d2c24700 -1 /home/zzr/ceph.sf/src/osd/PG.cc: In function 'void PG::start_peering_inter...
- 08:40 AM Bug #24593: s390x: Ceph Monitor crashed with Caught signal (Aborted)
- I expect that only people in possession of s390x hardware will be able to debug this
I see that there is another t...
- 05:33 AM Bug #24593 (New): s390x: Ceph Monitor crashed with Caught signal (Aborted)
- We are trying to setup ceph cluster on s390x platform.
ceph-mon service crashed with an error: *** Caught signal ... - 05:50 AM Feature #24591 (Fix Under Review): FileStore hasn't impl to get kv-db's statistics
- 03:22 AM Feature #24591: FileStore hasn't impl to get kv-db's statistics
- https://github.com/ceph/ceph/pull/22633
- 03:22 AM Feature #24591 (Fix Under Review): FileStore hasn't impl to get kv-db's statistics
- In BlueStore, you can see kv-db's statistics by "ceph daemon osd.X dump_objectstore_kv_stats", but FileStore hasn't i...
- 03:22 AM Feature #22147: Set multiple flags in a single command line
- I don’t think we should skip it entirely. Many of the places that implement a check like that are using a common flag...
06/19/2018
- 11:44 PM Bug #24487 (In Progress): osd: choose_acting loop
- This happens when an osd which is part of the acting set but not part of the up set gets chosen as an async_recovery_t...
- 10:51 PM Backport #23673: jewel: auth: ceph auth add does not sanity-check caps
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21367
merged
- 10:50 PM Backport #23905: jewel: Deleting a pool with active watch/notify linger ops can result in seg fault
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/21754
merged
- 10:49 PM Feature #22147: Set multiple flags in a single command line
- It seems fair to assume that "unset" should support this also.
Question: should settings that require --yes-i-real... - 10:40 PM Bug #24587: librados api aio tests race condition
- http://pulpito.ceph.com/yuriw-2018-06-13_14:55:30-rados-wip-yuri4-testing-2018-06-12-2037-jewel-distro-basic-smithi/2...
- 10:38 PM Bug #24587 (Resolved): librados api aio tests race condition
- Seen in a jewel integration branch with no OSD changes:
http://pulpito.ceph.com/yuriw-2018-06-12_22:32:43-rados-wi...
- 09:58 PM Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh
- I did a run based on d9284902e1b2e292595696caf11cdead18acec96 which is a branch off of master.
http://pulpito.ceph...
- 07:24 PM Backport #24584 (Resolved): luminous: osdc: wrong offset in BufferHead
- https://github.com/ceph/ceph/pull/22865
- 07:24 PM Backport #24583 (Resolved): mimic: osdc: wrong offset in BufferHead
- https://github.com/ceph/ceph/pull/22869
- 06:02 PM Bug #19971 (Resolved): osd: deletes are performed inline during pg log processing
- 06:01 PM Backport #22406 (Rejected): jewel: osd: deletes are performed inline during pg log processing
- This change was deemed too invasive at such a late stage in Jewel's life cycle.
- 06:01 PM Backport #22405 (Rejected): jewel: store longer dup op information
- This change was deemed too invasive at such a late stage in Jewel's life cycle.
- 06:00 PM Backport #22400 (Rejected): jewel: PR #16172 causing performance regression
- This change was deemed too invasive at such a late stage in Jewel's life cycle.
- 04:10 PM Bug #24484 (Pending Backport): osdc: wrong offset in BufferHead
- 11:54 AM Bug #24448: (Filestore) ABRT report for package ceph has reached 10 occurrences
- OSD killed by signal, something like OOM incidents perhaps?
- 11:53 AM Bug #24450 (Duplicate): OSD Caught signal (Aborted)
- http://tracker.ceph.com/issues/24423
- 11:51 AM Bug #24559 (Fix Under Review): building error for QAT decompress
- 02:10 AM Bug #24559 (Fix Under Review): building error for QAT decompress
- The parameter of decompress changes from 'bufferlist::iterator' to 'bufferlist::const_iterator', but this change miss...
- 11:34 AM Bug #24549: FileStore::read assert (ABRT report for package ceph has reached 1000 occurrences)
- Presumably this is underlying FS failures tripping asserts rather than a bug (perhaps people using ZFS on centos, or ...
- 07:26 AM Backport #24355 (In Progress): mimic: osd: pg hard limit too easy to hit
- https://github.com/ceph/ceph/pull/22621