Activity
From 06/01/2020 to 06/30/2020
06/30/2020
- 09:45 PM Bug #44880: ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
- Backporting note: needs to be backported together with follow-on fix. See the octopus backport PR and #45426
- 09:44 PM Bug #46055 (Resolved): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
- backport tracked via https://tracker.ceph.com/issues/44880
- 02:21 AM Bug #46055: ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
- Being backported in https://github.com/ceph/ceph/pull/34943
- 02:20 AM Bug #46055 (Pending Backport): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
- 04:48 PM Bug #46270: mimic:osd can not start
- This just looks like bluefs is running out of space. Mimic is EOL; I'd recommend you upgrade and report back if yo...
- 06:40 AM Bug #46270 (Can't reproduce): mimic:osd can not start
- My env:
[root@mon1 test]# ceph -v
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
[r...
06/27/2020
- 02:52 PM Bug #46054 (Fix Under Review): RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): A...
- 02:51 PM Bug #46054: RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref'...
- Hi Adam, I am working on this issue, as I've run into it twice and feel obliged to fix it. As I failed to identify...
- 08:18 AM Bug #46054: RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref'...
- /a//kchai-2020-06-27_07:37:00-rados-wip-kefu-testing-2020-06-27-1407-distro-basic-smithi/5183643
06/26/2020
- 05:35 PM Bug #46054: RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref'...
- /a/sseshasa-2020-06-24_17:46:09-rados-wip-sseshasa-testing-2020-06-24-1858-distro-basic-smithi/5176446
06/24/2020
- 08:43 PM Backport #46195 (Resolved): luminous: BlueFS replay log grows without end
- https://github.com/ceph/ceph/pull/35776
- 08:43 PM Backport #46194 (Resolved): nautilus: BlueFS replay log grows without end
- https://github.com/ceph/ceph/pull/37948
- 08:43 PM Backport #46193 (Resolved): octopus: BlueFS replay log grows without end
- https://github.com/ceph/ceph/pull/36621
- 08:43 PM Backport #46192 (Rejected): mimic: BlueFS replay log grows without end
- 02:35 PM Bug #45903 (Pending Backport): BlueFS replay log grows without end
- It will be good to get the fix into luminous and mimic for affected users.
- 10:37 AM Backport #45682 (Resolved): octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/35446
m...
06/23/2020
- 08:12 PM Backport #45682: octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/35446
merged
06/21/2020
- 09:09 PM Bug #46124: Potential race condition regression around new OSD flock()s
- > I suspect that Ceph starts other threads (using clone() on Linux) while the lock is held
Sorry, this should be t...
- 05:27 PM Bug #46124: Potential race condition regression around new OSD flock()s
- From the strace above, we can see that there's always a `close()` after a matching `flock()` within the same PID, so ...
- 01:53 PM Bug #46124: Potential race condition regression around new OSD flock()s
- Another question:
Would it not be better to use OFD locks (Open File Description locks), that is via ...
- 03:18 AM Bug #46124: Potential race condition regression around new OSD flock()s
- In case it helps, here are `strace` invocations, each showing slightly different behaviour and error messages, that i...
- 03:08 AM Bug #46124: Potential race condition regression around new OSD flock()s
- I did not experience that in Mimic.
- 03:07 AM Bug #46124 (Resolved): Potential race condition regression around new OSD flock()s
- In #38150 and PR https://github.com/ceph/ceph/pull/26245, a new `flock()` approach was introduced.
When I use `ce...
- 03:08 AM Bug #38150: KernelDevice exclusive lock broken
- I suspect this may have introduced a regression: #46124
06/20/2020
06/18/2020
06/17/2020
- 04:06 PM Bug #46055 (Resolved): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
- ...
- 03:55 PM Bug #46054 (Resolved): RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion...
- ...
- 03:46 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- We don't use RGW, we use self-written client which operates with small objects (~500-700-byte objects). Load is not t...
06/16/2020
- 09:25 PM Bug #46027: bufferlist c_str() sometimes clears assignment to mempool
- PR: https://github.com/ceph/ceph/pull/35584.
- 08:02 AM Bug #46027 (Resolved): bufferlist c_str() sometimes clears assignment to mempool
- Sometimes c_str() needs to rebuild underlying buffer::raw.
In that case the original assignment to mempool is lost.
- 09:23 AM Bug #45994 (Triaged): OSD crash - in thread tp_osd_tp
- 09:20 AM Bug #45994: OSD crash - in thread tp_osd_tp
- Increasing suicide timeout doesn't look like the proper way of dealing with this issue.
I presume you're suffering...
- 06:04 AM Bug #45994: OSD crash - in thread tp_osd_tp
- Hi,
We have seen the issue to be caused by a heartbeat timeout, resolved by increasing the timer. Hence can this tick...
06/15/2020
- 07:21 PM Backport #46010 (Rejected): mimic: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 f...
- 07:21 PM Backport #46009 (Resolved): octopus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2...
- https://github.com/ceph/ceph/pull/36049
- 07:21 PM Backport #46008 (Resolved): nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/...
- https://github.com/ceph/ceph/pull/37824
- 12:17 PM Backport #45354 (Rejected): octopus: ceph_test_objectstore: src/os/bluestore/bluestore_types.h: 7...
- Igor writes in parent issue:
"I doubt we'll backport deferring big writes to Octopus. Hence marking as resolved"
- 07:00 AM Bug #45994 (Duplicate): OSD crash - in thread tp_osd_tp
- We recently see some random OSD crashes in thread tp_osd_tp with the below backtrace on one of our Nautilus clusters....
06/12/2020
- 02:20 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- @Aleksei,
first of all, manual compaction isn't supposed to be the regular means to operate with. Generally RocksDB p...
- 01:14 PM Bug #38745: spillover that doesn't make sense
- Seena Fallah wrote:
> Igor Fedotov wrote:
> > @Marcin - perfect overview, thanks!
> > Just want to mention that th...
06/11/2020
- 11:22 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Unfortunately, manual compaction of one OSD takes about 30 minutes. So, in this situation we're forced to do a lot of...
- 07:54 AM Bug #38745: spillover that doesn't make sense
- Hi Seena,
"How did you find out that db is now on level 4? I mean what's the size limit on level one?"
These ar...
- 01:39 AM Bug #45788 (Pending Backport): ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
06/10/2020
- 04:15 PM Bug #38745: spillover that doesn't make sense
- Igor Fedotov wrote:
> @Marcin - perfect overview, thanks!
> Just want to mention that this "granular" level space a...
- 04:07 PM Bug #38745: spillover that doesn't make sense
- @Marcin One more question I would be so thankful if you answer me, How did you find out that db is now on level 4? I ...
- 04:02 PM Bug #38745: spillover that doesn't make sense
- @Marcin Really thanks for your overview. Now I get what's going on.
- 01:18 PM Bug #38745: spillover that doesn't make sense
- @Marcin - perfect overview, thanks!
Just want to mention that this "granular" level space allocation has been fixed ...
- 11:07 AM Bug #38745: spillover that doesn't make sense
- Hi Seena,
Metadata is stored in RocksDB, which is a 'logging database'. It doesn't replace or remove entries, it just ...
- 09:49 AM Bug #38745: spillover that doesn't make sense
- I'm experiencing this in nautilus 14.2.9
Should the above PR solve this issue? I get what does the message really me...
06/09/2020
- 12:02 PM Bug #45788: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
- /a/kchai-2020-06-09_07:14:19-rados-wip-kefu-testing-2020-06-09-1352-distro-basic-smithi/5131622
- 08:55 AM Bug #45195 (Resolved): ceph_test_objectstore: src/os/bluestore/bluestore_types.h: 734: FAILED cep...
- I doubt we'll backport deferring big writes to Octopus. Hence marking as resolved
- 08:51 AM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- Original fix was https://github.com/ceph/ceph/pull/35201 but it was decided to kill WAL preextending instead, see htt...
06/08/2020
- 12:28 PM Bug #45519: OSD asserts during block allocation for BlueFS
- OK, here is a backtrace from the OSD.30 log
ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautil...
06/06/2020
- 08:37 AM Backport #45682 (In Progress): octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered...
06/05/2020
- 09:54 PM Bug #40741 (Triaged): Mass OSD failure, unable to restart
- 09:39 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Manual compaction is a workaround for slow KV access, which tends to be caused by prior massive data removals. In regul...
- 05:33 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Thanks, it works now!
Do we need to run manual compaction, when we see high latency? It takes about 30 minutes to co...
- 03:37 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Proper command line for my previous comment would be:
CEPH_ARGS="--bluefs-shared-alloc-size 4096" ceph-kvstore-too...
- 01:27 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Looks like you're using custom bluefs_shared_alloc_size setting set to 4K. And kvstore_tool is unaware of that and tr...
- 09:37 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- It is failing during any ceph-kvstore-tool command. It doesn't matter compaction/stats or list. Looks like another is...
- 04:52 PM Bug #45613 (Fix Under Review): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 f...
- 04:03 PM Bug #45519: OSD asserts during block allocation for BlueFS
- Aleksei Zakharov wrote:
> Igor Fedotov wrote:
> > Aleksei Zakharov wrote:
> > >
> > > Turning on stupid allocato...
- 01:11 PM Bug #45903: BlueFS replay log grows without end
- Seen that too. Looks like we're lacking some backports in luminous.
See https://github.com/ceph/ceph/pull/34876
- 08:29 AM Bug #45903 (Resolved): BlueFS replay log grows without end
- If data is slowly pouring to RocksDB WAL and new files are not created, BlueFS replay log can grow to the size that i...
06/04/2020
- 02:50 PM Backport #45684 (In Progress): nautilus: Large (>=2 GB) writes are incomplete when bluefs_buffere...
- 11:10 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- So it's failing during compaction only, right? It's even more interesting; could you please share a more verbose log?
- 09:26 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- It starts and works ok. The only issue with it is the latency issue described earlier.
06/03/2020
- 04:47 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- So this OSD doesn't start any more, does it?
Please set debug-bluestore to 20 and collect the log if so.
- 11:03 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Unfortunately manual compaction doesn't work. Error log is attached.
- 10:14 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- I will try manual compaction, thanks!
I'm not sure that turning direct IO on for bluefs can help somehow in this p...
- 01:24 PM Backport #41339: mimic: os/bluestore/BlueFS: use 64K alloc_size on the shared device
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30219
m...
- 10:22 AM Bug #45519: OSD asserts during block allocation for BlueFS
- Igor Fedotov wrote:
> Aleksei Zakharov wrote:
> >
> > > Turning on stupid allocator with "bad" OSD turned into an i...
- 02:02 AM Bug #45788: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
- /a/yuriw-2020-05-30_02:18:17-rados-wip-yuri-master_5.29.20-distro-basic-smithi/5104557
06/01/2020
- 10:21 PM Bug #45519: OSD asserts during block allocation for BlueFS
- For the sake of record the following thread has some helpful info on assertion: h->file->fnode.ino != 1
I did som...
- 10:07 PM Bug #45519: OSD asserts during block allocation for BlueFS
- Aleksei Zakharov wrote:
>
> > Turning on stupid allocator with "bad" OSD turned into an impossibility to start this ...
- 10:47 AM Bug #45519: OSD asserts during block allocation for BlueFS
- We added some new OSD's. During backfill process we were affected by another issue(https://tracker.ceph.com/issues/45...
- 10:16 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Anyway, suggesting to compact the DB manually.
Besides, recently we switched back to direct IO for bluefs, see https:/...
- Nope, we use NVMe SSD's.
The issue starts with high page cache usage. Bluefs uses buffered IO, it reads a lot from... - 09:30 PM Bug #45613 (Resolved): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- 03:48 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
- https://github.com/ceph/ceph/pull/35293 - Fix for octopus has been merged and released in 15.2.3.