Activity
From 05/22/2020 to 06/20/2020
06/20/2020
06/18/2020
06/17/2020
- 04:06 PM Bug #46055 (Resolved): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
- ...
- 03:55 PM Bug #46054 (Resolved): RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion...
- ...
- 03:46 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- We don't use RGW, we use self-written client which operates with small objects (~500-700-byte objects). Load is not t...
06/16/2020
- 09:25 PM Bug #46027: bufferlist c_str() sometimes clears assignment to mempool
- PR: https://github.com/ceph/ceph/pull/35584.
- 08:02 AM Bug #46027 (Resolved): bufferlist c_str() sometimes clears assignment to mempool
- Sometimes c_str() needs to rebuild underlying buffer::raw.
In that case the original assignment to mempool is lost.
- 09:23 AM Bug #45994 (Triaged): OSD crash - in thread tp_osd_tp
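The mechanism behind #46027 can be illustrated with a simplified analogue (hypothetical types, not the real ceph::bufferlist API): each segment of a buffer list carries a mempool tag, and making the content contiguous allocates a fresh segment; the naive rebuild never re-tags it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified analogue of the issue (illustrative only): a segment with a
// mempool tag, where -1 means "untagged".
struct Segment {
  std::string data;
  int mempool = -1;
};

struct BufferList {
  std::vector<Segment> segs;

  void assign_pool(int pool) {
    for (auto& s : segs) s.mempool = pool;
  }

  // c_str()-like operation: make the content contiguous. The naive rebuild
  // drops the mempool tag; pass preserve=true for the fixed behaviour.
  const std::string& c_str(bool preserve) {
    if (segs.size() > 1) {
      Segment merged;  // freshly allocated, untagged by default
      for (auto& s : segs) merged.data += s.data;
      if (preserve) merged.mempool = segs.front().mempool;  // the fix
      segs = {merged};
    }
    return segs.front().data;
  }
};
```

With preserve=false the rebuilt segment comes back untagged even though every original segment was assigned to a pool, which is the accounting loss the ticket describes.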
- 09:20 AM Bug #45994: OSD crash - in thread tp_osd_tp
- Increasing suicide timeout doesn't look like the proper way of dealing with this issue.
I presume you're suffering...
- 06:04 AM Bug #45994: OSD crash - in thread tp_osd_tp
- Hi,
We have seen this issue caused by a heartbeat timeout, resolved by increasing the timer. Hence can this tick...
06/15/2020
- 07:21 PM Backport #46010 (Rejected): mimic: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 f...
- 07:21 PM Backport #46009 (Resolved): octopus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2...
- https://github.com/ceph/ceph/pull/36049
- 07:21 PM Backport #46008 (Resolved): nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/...
- https://github.com/ceph/ceph/pull/37824
- 12:17 PM Backport #45354 (Rejected): octopus: ceph_test_objectstore: src/os/bluestore/bluestore_types.h: 7...
- Igor writes in parent issue:
"I doubt we'll backport deferring big writes to Octopus. Hence marking as resolved"
- 07:00 AM Bug #45994 (Duplicate): OSD crash - in thread tp_osd_tp
- We recently see some random OSD crashes in thread tp_osd_tp with the below backtrace on one of our Nautilus clusters....
06/12/2020
- 02:20 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- @Aleksei,
first of all manual compaction isn't supposed to be the regular means to operate with. Generally RocksDB p...
- 01:14 PM Bug #38745: spillover that doesn't make sense
- Seena Fallah wrote:
> Igor Fedotov wrote:
> > @Marcin - perfect overview, thanks!
> > Just want to mention that th...
06/11/2020
- 11:22 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Unfortunately, manual compaction of one OSD takes about 30 minutes. So, in this situation we're forced to do a lot of...
- 07:54 AM Bug #38745: spillover that doesn't make sense
- Hi Seena,
"How did you find out that db is now on level 4? I mean what's the size limit on level one?"
These ar...
- 01:39 AM Bug #45788 (Pending Backport): ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
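For context on the "level 4" question in the #38745 thread: under leveled compaction, each level's capacity is the base size times the multiplier raised to (level - 1). A minimal sketch, assuming the commonly cited RocksDB defaults of a 256 MiB base (max_bytes_for_level_base) and a 10x multiplier; BlueStore deployments may override these via bluestore_rocksdb_options:

```cpp
#include <cassert>
#include <cstdint>

// Capacity of RocksDB level L (L >= 1) in MiB under classic leveled
// compaction: base * multiplier^(L-1). Defaults here are assumptions,
// not values read from any particular cluster.
int64_t level_capacity_mib(int level, int64_t base_mib = 256, int mult = 10) {
  int64_t cap = base_mib;
  for (int i = 1; i < level; ++i) cap *= mult;
  return cap;
}
```

So with these assumed defaults L1 holds ~256 MiB, L2 ~2.5 GiB, L3 ~25 GiB, and data landing on L4 implies the DB has outgrown roughly 28 GiB of lower-level capacity, which is why level placement matters for spillover onto the slow device.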
06/10/2020
- 04:15 PM Bug #38745: spillover that doesn't make sense
- Igor Fedotov wrote:
> @Marcin - perfect overview, thanks!
> Just want to mention that this "granular" level space a...
- 04:07 PM Bug #38745: spillover that doesn't make sense
- @Marcin One more question I would be so thankful if you answer me, How did you find out that db is now on level 4? I ...
- 04:02 PM Bug #38745: spillover that doesn't make sense
- @Marcin Really thanks for your overview. Now I get what's going on.
- 01:18 PM Bug #38745: spillover that doesn't make sense
- @Marcin - perfect overview, thanks!
Just want to mention that this "granular" level space allocation has been fixed ...
- 11:07 AM Bug #38745: spillover that doesn't make sense
- Hi Seena,
Metadata is stored in RocksDB, which is a 'logging database'. It doesn't replace or remove entries, it just ...
- 09:49 AM Bug #38745: spillover that doesn't make sense
- I'm experiencing this in nautilus 14.2.9
Should the above PR solve this issue? I get what does the message really me...
06/09/2020
- 12:02 PM Bug #45788: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
- /a/kchai-2020-06-09_07:14:19-rados-wip-kefu-testing-2020-06-09-1352-distro-basic-smithi/5131622
- 08:55 AM Bug #45195 (Resolved): ceph_test_objectstore: src/os/bluestore/bluestore_types.h: 734: FAILED cep...
- I doubt we'll backport deferring big writes to Octopus. Hence marking as resolved
- 08:51 AM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- Original fix was https://github.com/ceph/ceph/pull/35201 but it was decided to kill WAL preextending instead, see htt...
06/08/2020
- 12:28 PM Bug #45519: OSD asserts during block allocation for BlueFS
- OK, here is a backtrace from the OSD.30 log
ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautil...
06/06/2020
- 08:37 AM Backport #45682 (In Progress): octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered...
06/05/2020
- 09:54 PM Bug #40741 (Triaged): Mass OSD failure, unable to restart
- 09:39 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Manual compaction is a workaround for slow KV access which tends to be caused by prior massive data removals. In regul...
- 05:33 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Thanks, it works now!
Do we need to run manual compaction when we see high latency? It takes about 30 minutes to co...
- 03:37 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Proper command line for my previous comment would be:
CEPH_ARGS="--bluefs-shared-alloc-size 4096" ceph-kvstore-too...
- 01:27 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Looks like you're using custom bluefs_shared_alloc_size setting set to 4K. And kvstore_tool is unaware of that and tr...
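Spelling out the corrected command line from the 03:37 PM comment above (the OSD data path and the compact subcommand are illustrative assumptions, not taken from the ticket):

```shell
# Pass the same non-default allocation size the OSD was deployed with,
# so ceph-kvstore-tool can open the BlueFS volume; then compact the DB.
# /var/lib/ceph/osd/ceph-30 is a placeholder path.
CEPH_ARGS="--bluefs-shared-alloc-size 4096" \
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-30 compact
```

The OSD must be stopped before pointing ceph-kvstore-tool at its store.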
- 09:37 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- It is failing during any ceph-kvstore-tool command. It doesn't matter compaction/stats or list. Looks like another is...
- 04:52 PM Bug #45613 (Fix Under Review): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 f...
- 04:03 PM Bug #45519: OSD asserts during block allocation for BlueFS
- Aleksei Zakharov wrote:
> Igor Fedotov wrote:
> > Aleksei Zakharov wrote:
> > >
> > > Turning on stupid allocato...
- 01:11 PM Bug #45903: BlueFS replay log grows without end
- Seen that too. Looks like we're lacking some backports in luminous.
See https://github.com/ceph/ceph/pull/34876
- 08:29 AM Bug #45903 (Resolved): BlueFS replay log grows without end
- If data is slowly pouring to RocksDB WAL and new files are not created, BlueFS replay log can grow to the size that i...
06/04/2020
- 02:50 PM Backport #45684 (In Progress): nautilus: Large (>=2 GB) writes are incomplete when bluefs_buffere...
- 11:10 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- So it's failing during compaction only, right? It's even more interesting. Could you please share a more verbose log?
- 09:26 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- It starts and works ok. The only issue with it is the latency issue described earlier.
06/03/2020
- 04:47 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- So this OSD doesn't start any more, does it?
Please set debug-bluestore to 20 and collect the log if so.
- 11:03 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Unfortunately manual compaction doesn't work. Error log is attached.
- 10:14 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- I will try manual compaction, thanks!
I'm not sure that turning direct IO on for bluefs can help somehow in this p...
- 01:24 PM Backport #41339: mimic: os/bluestore/BlueFS: use 64K alloc_size on the shared device
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30219
m...
- 10:22 AM Bug #45519: OSD asserts during block allocation for BlueFS
- Igor Fedotov wrote:
> Aleksei Zakharov wrote:
> >
> > Turning on stupid allocator with "bad" OSD turned into an i...
- 02:02 AM Bug #45788: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
- /a/yuriw-2020-05-30_02:18:17-rados-wip-yuri-master_5.29.20-distro-basic-smithi/5104557
06/01/2020
- 10:21 PM Bug #45519: OSD asserts during block allocation for BlueFS
- For the sake of record the following thread has some helpful info on assertion: h->file->fnode.ino != 1
I did som...
- 10:07 PM Bug #45519: OSD asserts during block allocation for BlueFS
- Aleksei Zakharov wrote:
>
> Turning on stupid allocator with "bad" OSD turned into an impossibility to start this ... - 10:47 AM Bug #45519: OSD asserts during block allocation for BlueFS
- We added some new OSD's. During backfill process we were affected by another issue(https://tracker.ceph.com/issues/45...
- 10:16 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Anyway suggesting to compact DB manually.
Besides, recently we switched back to direct IO for bluefs, see https:/...
- 10:18 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Nope, we use NVMe SSD's.
The issue starts with high page cache usage. Bluefs uses buffered IO, it reads a lot from... - 09:30 PM Bug #45613 (Resolved): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- 03:48 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- https://github.com/ceph/ceph/pull/35293 - Fix for octopus has been merged and released in 15.2.3.
05/30/2020
- 03:29 PM Bug #44359: Raw usage reported by 'ceph osd df' incorrect when using WAL/DB on another drive
- Same here. Fresh Cluster - completely empty. "Raw Use" corresponds to Size of DB+WAL/DB Partition located on separate...
05/29/2020
- 07:32 PM Bug #45788 (Fix Under Review): ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
- 07:32 PM Bug #45788 (Pending Backport): ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
- 07:14 PM Bug #45788 (Resolved): ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
- Originally appeared in https://tracker.ceph.com/issues/45519
but cloning to another ticket since looks unrelated.
...
- 07:17 PM Bug #45519: OSD asserts during block allocation for BlueFS
- @Neha, @Kefu - made another ticket for QA test case failure, IMO unrelated.
https://tracker.ceph.com/issues/45788
- 04:20 PM Bug #45519: OSD asserts during block allocation for BlueFS
- We have similar issues on our cluster. ~1 billion objects in EC (8+3) pools, 540 OSDs, Nautilus 14.2.8. No tuning o...
- 04:32 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Yeah, that's a known issue with RocksDB/BlueStore.
Manual compaction using "ceph-kvstore-tool bluestore-kv <path-t...
- 12:18 PM Bug #45765 (Resolved): BlueStore::_collection_list causes huge latency growth pg deletion
- Hi!
We have a ceph v14.2.7 cluster with about 2 billion objects. Each object is less than 4K in size. One PG has abo...
05/28/2020
- 08:30 PM Bug #45519: OSD asserts during block allocation for BlueFS
- ...
- 02:25 PM Bug #44213: Erasure coded pool might need much more disk space than expected
- We need to run some performance tests with 4k min_alloc_size.
- 12:50 AM Bug #43147 (New): segv in LruOnodeCacheShard::_pin
- Reopening this since I have seen it in /a/yuriw-2020-05-24_19:30:40-rados-wip-yuri-master_5.24.20-distro-basic-smithi...
05/27/2020
05/26/2020
- 04:53 AM Bug #45703 (New): LruOnodeCacheShard::_add: Assertion `!safemode_or_autounlink || node_algorithms...
- /a/yuriw-2020-05-22_19:55:53-rados-wip-yuri-master_5.22.20-distro-basic-smithi/5083216...
05/25/2020
- 02:26 PM Bug #45110 (Resolved): Extent leak after main device expand
- While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ...
- 11:05 AM Backport #45126 (Resolved): nautilus: Extent leak after main device expand
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/34711
m...
05/24/2020
- 09:05 PM Backport #45684 (Resolved): nautilus: Large (>=2 GB) writes are incomplete when bluefs_buffered_i...
- https://github.com/ceph/ceph/pull/35404
- 09:05 PM Backport #45683 (Rejected): mimic: Large (>=2 GB) writes are incomplete when bluefs_buffered_io =...
- 09:05 PM Backport #45682 (Resolved): octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io...
- https://github.com/ceph/ceph/pull/35446
05/22/2020
- 07:37 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- OK thanks. I'm also just confirming that nautilus would be immune, even though 14.2.10 will change bluefs_buffered_io...
- 05:30 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- At the same time IIUC it's OSD restart which reveals data corruption - while OSD is running it doesn't read from WAL ...
- 05:26 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- @Dan - right. Setting bluefs_preextend_wal_files seems better to me (one can even try to do that on the fly) but I'd ...
- 04:27 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- @Igor: So setting bluefs_preextend_wal_files=false and/or bluefs_buffered_io=true should workaround the issue until t...
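The workaround discussed in the comment above could be applied via the monitor config store (a sketch; whether a runtime change suffices or a restart is needed is not confirmed in the thread, so verify for your release):

```shell
# Stop pre-extending BlueFS WAL files (the workaround named in the comment).
ceph config set osd bluefs_preextend_wal_files false
# Alternative mitigation mentioned in the same comment:
ceph config set osd bluefs_buffered_io true
```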
- 04:02 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- So the bug is caused by submitting overlapping write requests for BlueFS WAL via libaio. BlueFS::_flush_range might e...
- 03:53 PM Bug #45613 (Fix Under Review): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 f...
- 03:45 PM Bug #45657: false positive check in KernelDevice::aio_log_start
- Also the checking for overlapped io is currently broken due to preextended WAL prefilling in BlueFS::_flush_range. Wr...
- 02:23 PM Bug #45657 (New): false positive check in KernelDevice::aio_log_start
- When enabling bdev_debug_inflight_ios KernelDevice might improperly detect conflicts for inflight read ops. This is c...
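The false positive described in #45657 can be sketched generically (an illustrative model, not Ceph's actual KernelDevice code): a debug tracker records inflight [offset, offset+len) extents and should only flag overlaps that involve a write, since two overlapping reads are benign.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Illustrative inflight-IO debug tracker. Hypothetical types; the real
// check lives in KernelDevice behind bdev_debug_inflight_ios.
struct Inflight {
  uint64_t off, len;
  bool write;
};

struct DebugTracker {
  std::list<Inflight> ios;

  // True if a new IO at [off, off+len) conflicts with an inflight one.
  // Read/read overlap is deliberately not a conflict; reporting it is
  // the kind of false positive the ticket describes.
  bool conflicts(uint64_t off, uint64_t len, bool write) const {
    for (const auto& io : ios) {
      bool overlap = off < io.off + io.len && io.off < off + len;
      if (overlap && (write || io.write))
        return true;
    }
    return false;
  }

  void start(uint64_t off, uint64_t len, bool write) {
    ios.push_back({off, len, write});
  }
};
```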
- 04:31 AM Bug #45337 (Pending Backport): Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true
- 04:30 AM Bug #45337 (Resolved): Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true
- 04:29 AM Bug #45335 (Resolved): cephadm upgrade: OSD.0 is not coming back after restart: rocksdb: verify_...