Activity

From 06/01/2020 to 06/30/2020

06/30/2020

09:45 PM Bug #44880: ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
Backporting note: needs to be backported together with follow-on fix. See the octopus backport PR and #45426 Nathan Cutler
09:44 PM Bug #46055 (Resolved): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
backport tracked via https://tracker.ceph.com/issues/44880 Nathan Cutler
02:21 AM Bug #46055: ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
Being backported in https://github.com/ceph/ceph/pull/34943 Neha Ojha
02:20 AM Bug #46055 (Pending Backport): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
Neha Ojha
04:48 PM Bug #46270: mimic:osd can not start
This just looks like bluefs is running out of space. Mimic is EOL; I'd recommend that you upgrade and report back if yo... Neha Ojha
06:40 AM Bug #46270 (Can't reproduce): mimic:osd can not start
My env:
[root@mon1 test]# ceph -v
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
[r...
伟杰 谭

06/27/2020

02:52 PM Bug #46054 (Fix Under Review): RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): A...
Kefu Chai
02:51 PM Bug #46054: RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref'...
Hi Adam, I am working on this issue, as I've run into it twice and feel obliged to fix it. As I failed to identify... Kefu Chai
08:18 AM Bug #46054: RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref'...
/a//kchai-2020-06-27_07:37:00-rados-wip-kefu-testing-2020-06-27-1407-distro-basic-smithi/5183643 Kefu Chai

06/26/2020

05:35 PM Bug #46054: RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion `last_ref'...
/a/sseshasa-2020-06-24_17:46:09-rados-wip-sseshasa-testing-2020-06-24-1858-distro-basic-smithi/5176446 Neha Ojha

06/24/2020

08:43 PM Backport #46195 (Resolved): luminous: BlueFS replay log grows without end
https://github.com/ceph/ceph/pull/35776 Nathan Cutler
08:43 PM Backport #46194 (Resolved): nautilus: BlueFS replay log grows without end
https://github.com/ceph/ceph/pull/37948 Nathan Cutler
08:43 PM Backport #46193 (Resolved): octopus: BlueFS replay log grows without end
https://github.com/ceph/ceph/pull/36621 Nathan Cutler
08:43 PM Backport #46192 (Rejected): mimic: BlueFS replay log grows without end
Nathan Cutler
02:35 PM Bug #45903 (Pending Backport): BlueFS replay log grows without end
It will be good to get the fix into luminous and mimic for affected users. Neha Ojha
10:37 AM Backport #45682 (Resolved): octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/35446
m...
Nathan Cutler

06/23/2020

08:12 PM Backport #45682: octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/35446
merged
Yuri Weinstein
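The title of #45682 says that writes of 2 GB or more come out incomplete. One well-known Linux behaviour with exactly this symptom is that a single write(2) transfers at most 0x7ffff000 (2,147,479,552) bytes, even on 64-bit systems, and returns a short count for anything larger, so the caller has to loop. Whether this is the root cause fixed by the backport is an assumption here. The sketch is a generic "write everything" loop, not the code from PR 35446; `write_all` and `max_chunk` are hypothetical names.

```python
import os
import tempfile

# Linux caps a single write(2) at 0x7ffff000 bytes (2,147,479,552);
# a larger request returns a short count instead of failing.
LINUX_MAX_RW_COUNT = 0x7FFFF000


def write_all(fd: int, data: bytes, max_chunk: int = LINUX_MAX_RW_COUNT) -> int:
    """Write every byte of `data` to `fd`, retrying after short writes.

    `max_chunk` caps each request; lowering it exercises the
    short-write path with small buffers.
    """
    view = memoryview(data)
    written = 0
    while written < len(view):
        n = os.write(fd, view[written:written + max_chunk])
        if n == 0:
            raise OSError("write() returned 0 bytes")
        written += n  # advance past whatever the kernel accepted
    return written


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        payload = b"hello, bluefs"
        # Three-byte chunks mimic a series of short writes.
        write_all(f.fileno(), payload, max_chunk=3)
        f.seek(0)
        print(f.read())
```

The point of the loop is that a single call returning less than requested is not an error; only a caller that ignores the return value ends up with a truncated write.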

06/21/2020

09:09 PM Bug #46124: Potential race condition regression around new OSD flock()s
> I suspect that Ceph starts other threads (using clone() on Linux) while the lock is held
Sorry, this should be t...
Niklas Hambuechen
05:27 PM Bug #46124: Potential race condition regression around new OSD flock()s
From the strace above, we can see that there's always a `close()` after a matching `flock()` within the same PID, so ... Niklas Hambuechen
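For reading these strace traces, it helps to remember when a flock() lock is actually released. The lock belongs to the open file description, not to the descriptor number, so a close() of one descriptor does not drop it while a dup()'d descriptor (or one inherited across fork()) still refers to the same description. The sketch demonstrates this general Linux behaviour with a temporary file. It does not model Ceph's code, and `can_flock` and `flock_lifetime` are hypothetical helpers.

```python
import errno
import fcntl
import os
import tempfile


def can_flock(fd: int) -> bool:
    """Try a non-blocking exclusive flock() on fd; release it again if it succeeds."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return False
        raise
    fcntl.flock(fd, fcntl.LOCK_UN)
    return True


def flock_lifetime(path: str) -> tuple:
    """Return (locked_after_first_close, free_after_last_close)."""
    holder = os.open(path, os.O_RDWR)
    fcntl.flock(holder, fcntl.LOCK_EX)  # the lock lives on holder's open file description
    twin = os.dup(holder)               # dup() (like fork()) shares that description
    probe = os.open(path, os.O_RDWR)    # an independent description, like another opener
    try:
        os.close(holder)                # close() the descriptor that took the lock
        locked_after_first_close = not can_flock(probe)  # twin keeps the lock alive
        os.close(twin)                  # last reference to the description is gone
        twin = -1
        free_after_last_close = can_flock(probe)
        return locked_after_first_close, free_after_last_close
    finally:
        if twin >= 0:
            os.close(twin)
        os.close(probe)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print(flock_lifetime(tmp.name))
```

So a matching close() in the same PID is only sufficient if no other descriptor for that description survives elsewhere.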
01:53 PM Bug #46124: Potential race condition regression around new OSD flock()s
Another question:
Would it not be better to use OFD locks (Open File Description locks), that is via ...
Niklas Hambuechen
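Some background on the OFD question: traditional POSIX record locks (fcntl F_SETLK, which Python's `fcntl.lockf` wraps) are owned by the process, so a second descriptor in the same process never conflicts with them, and closing any descriptor for the file drops them. flock() locks are owned by the open file description and do conflict between two open() calls, even in one process. OFD locks combine fcntl byte-range semantics with per-description ownership. The sketch shows only the first two behaviours on a temporary file; it is illustrative, and `try_lock` and `same_process_conflict` are hypothetical names.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(lock_fn, fd: int) -> bool:
    """Attempt a non-blocking exclusive lock with lock_fn; False if refused."""
    try:
        lock_fn(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError as e:
        if e.errno in (errno.EAGAIN, errno.EACCES, errno.EWOULDBLOCK):
            return False
        raise


def same_process_conflict(lock_fn) -> bool:
    """Open one file twice in this process; report whether the second lock is refused."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd1 = os.open(tmp.name, os.O_RDWR)
        fd2 = os.open(tmp.name, os.O_RDWR)  # a second, independent open file description
        try:
            if not try_lock(lock_fn, fd1):
                raise RuntimeError("could not take the first lock")
            return not try_lock(lock_fn, fd2)
        finally:
            os.close(fd1)
            os.close(fd2)


if __name__ == "__main__":
    # flock(): owned by the open file description, so the second open() conflicts.
    print("flock conflicts within one process:", same_process_conflict(fcntl.flock))
    # lockf()/F_SETLK: owned by the process, so the second lock simply succeeds.
    print("lockf conflicts within one process:", same_process_conflict(fcntl.lockf))
```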
03:18 AM Bug #46124: Potential race condition regression around new OSD flock()s
In case it helps, here are `strace` invocations, each showing slightly different behaviour and error messages, that i... Niklas Hambuechen
03:08 AM Bug #46124: Potential race condition regression around new OSD flock()s
I did not experience that in Mimic. Niklas Hambuechen
03:07 AM Bug #46124 (Resolved): Potential race condition regression around new OSD flock()s
In #38150 and PR https://github.com/ceph/ceph/pull/26245, a new `flock()` approach was introduced.
When I use `ce...
Niklas Hambuechen
03:08 AM Bug #38150: KernelDevice exclusive lock broken
I suspect this may have introduced a regression: #46124 Niklas Hambuechen

06/20/2020

11:50 AM Bug #46027 (Resolved): bufferlist c_str() sometimes clears assignment to mempool
Kefu Chai

06/18/2020

05:05 PM Bug #46055 (Fix Under Review): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
Igor Fedotov

06/17/2020

04:06 PM Bug #46055 (Resolved): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
... Neha Ojha
03:55 PM Bug #46054 (Resolved): RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion...
... Neha Ojha
03:46 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
We don't use RGW; we use a self-written client which operates on small objects (~500-700 bytes each). Load is not t... Aleksei Zakharov

06/16/2020

09:25 PM Bug #46027: bufferlist c_str() sometimes clears assignment to mempool
PR: https://github.com/ceph/ceph/pull/35584. Radoslaw Zarzynski
08:02 AM Bug #46027 (Resolved): bufferlist c_str() sometimes clears assignment to mempool
Sometimes c_str() needs to rebuild the underlying buffer::raw.
In that case, the original assignment to the mempool is lost.
Adam Kupczyk
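The failure mode described here can be modelled in a few lines: c_str() must return one contiguous buffer, so when the data spans several segments a new raw buffer is allocated and filled, and unless that new buffer is explicitly accounted to the original mempool the assignment is lost. This toy Python model only illustrates the idea. `Raw`, `SegmentedBuffer`, `DEFAULT_POOL`, the pool names and the `keep_pool` switch are all hypothetical and do not mirror Ceph's C++ bufferlist or mempool classes; in particular, falling back to a default pool is an assumption of the model.

```python
from dataclasses import dataclass, field
from typing import List

DEFAULT_POOL = "default"  # hypothetical pool for buffers created without an explicit pool


@dataclass
class Raw:
    """A contiguous chunk of memory plus the pool it is accounted to."""
    data: bytes
    pool: str = DEFAULT_POOL


@dataclass
class SegmentedBuffer:
    """Toy stand-in for a bufferlist: a list of Raw segments owned by one pool."""
    pool: str
    segments: List[Raw] = field(default_factory=list)

    def append(self, data: bytes) -> None:
        self.segments.append(Raw(bytes(data), self.pool))

    def c_str(self, keep_pool: bool = True) -> bytes:
        """Return contiguous bytes, rebuilding into a single segment if needed."""
        if len(self.segments) > 1:
            merged = b"".join(s.data for s in self.segments)
            # Bug class: a rebuilt Raw created without the owner's pool lands in
            # DEFAULT_POOL. The fix is to carry self.pool over to the new buffer.
            rebuilt = Raw(merged, self.pool) if keep_pool else Raw(merged)
            self.segments = [rebuilt]
        return self.segments[0].data if self.segments else b""
```

With `keep_pool=False` the returned bytes are correct but the memory silently moves to another pool's accounting, which is why such a bug is easy to miss in functional tests.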
09:23 AM Bug #45994 (Triaged): OSD crash - in thread tp_osd_tp
Igor Fedotov
09:20 AM Bug #45994: OSD crash - in thread tp_osd_tp
Increasing the suicide timeout doesn't look like the proper way of dealing with this issue.
I presume you're suffering...
Igor Fedotov
06:04 AM Bug #45994: OSD crash - in thread tp_osd_tp
Hi,
We have seen the issue being caused by a heartbeat timeout, and it was resolved by increasing the timer. Hence can this tick...
Nokia ceph-users

06/15/2020

07:21 PM Backport #46010 (Rejected): mimic: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 f...
Nathan Cutler
07:21 PM Backport #46009 (Resolved): octopus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2...
https://github.com/ceph/ceph/pull/36049 Nathan Cutler
07:21 PM Backport #46008 (Resolved): nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/...
https://github.com/ceph/ceph/pull/37824 Nathan Cutler
12:17 PM Backport #45354 (Rejected): octopus: ceph_test_objectstore: src/os/bluestore/bluestore_types.h: 7...
Igor writes in parent issue:
"I doubt we'll backport deferring big writes to Octopus. Hence marking as resolved"
Nathan Cutler
07:00 AM Bug #45994 (Duplicate): OSD crash - in thread tp_osd_tp
We have recently seen some random OSD crashes in thread tp_osd_tp with the backtrace below on one of our Nautilus clusters.... Nokia ceph-users

06/12/2020

02:20 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
@Aleksei,
First of all, manual compaction isn't supposed to be a regular means of operation. Generally RocksDB p...
Igor Fedotov
01:14 PM Bug #38745: spillover that doesn't make sense
Seena Fallah wrote:
> Igor Fedotov wrote:
> > @Marcin - perfect overview, thanks!
> > Just want to mention that th...
Igor Fedotov

06/11/2020

11:22 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Unfortunately, manual compaction of one OSD takes about 30 minutes. So, in this situation we're forced to do a lot of... Aleksei Zakharov
07:54 AM Bug #38745: spillover that doesn't make sense
Hi Seena,
"How did you find out that db is now on level 4? I mean what's the size limit on level one?"
These ar...
Marcin W
01:39 AM Bug #45788 (Pending Backport): ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
Kefu Chai

06/10/2020

04:15 PM Bug #38745: spillover that doesn't make sense
Igor Fedotov wrote:
> @Marcin - perfect overview, thanks!
> Just want to mention that this "granular" level space a...
Seena Fallah
04:07 PM Bug #38745: spillover that doesn't make sense
@Marcin, one more question; I would be very thankful if you could answer it: how did you find out that the DB is now on level 4? I ... Seena Fallah
04:02 PM Bug #38745: spillover that doesn't make sense
@Marcin, many thanks for your overview. Now I get what's going on. Seena Fallah
01:18 PM Bug #38745: spillover that doesn't make sense
@Marcin - perfect overview, thanks!
Just want to mention that this "granular" level space allocation has been fixed ...
Igor Fedotov
11:07 AM Bug #38745: spillover that doesn't make sense
Hi Seena,
Metadata is stored in RocksDB, which is a 'logging database'. It doesn't replace or remove entries; it just ...
Marcin W
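The question of which level the DB has reached, and why a larger DB device does not always help, can be checked with a little arithmetic. The sketch assumes RocksDB's default max_bytes_for_level_base of 256 MiB and max_bytes_for_level_multiplier of 10 (a given OSD may override either through bluestore_rocksdb_options), plus a simplified placement policy, assumed here for illustration, in which a level stays on the DB device only if it fits there entirely. It ignores L0, the WAL and compaction headroom, so the numbers are rough; `level_targets` and `levels_on_db_device` are hypothetical helpers.

```python
MiB = 2**20
GiB = 2**30


def level_targets(base: int = 256 * MiB, multiplier: int = 10, levels: int = 4) -> list:
    """Target sizes of L1..Ln under static level sizing: base * multiplier**i."""
    return [base * multiplier**i for i in range(levels)]


def levels_on_db_device(db_bytes: int, targets: list) -> tuple:
    """Count leading levels that fit entirely on the DB device (simplified model).

    Returns (levels_that_fit, bytes_used_by_them).
    """
    used = 0
    fitted = 0
    for size in targets:
        if used + size > db_bytes:
            break  # this level, and every larger one, spills to the slow device
        used += size
        fitted += 1
    return fitted, used


if __name__ == "__main__":
    targets = level_targets()
    for db_gib in (3, 30, 100, 300):
        n, used = levels_on_db_device(db_gib * GiB, targets)
        print(f"{db_gib:>3} GiB DB: L1..L{n} fit, {used / GiB:.2f} GiB usable")
```

Under these assumptions the cumulative level sizes are about 0.25, 2.75, 27.75 and 277.75 GiB, so a 100 GiB DB device holds no more of the tree than a 30 GiB one; that is one way to read the "granular" level space allocation discussed in this issue.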
09:49 AM Bug #38745: spillover that doesn't make sense
I'm experiencing this in nautilus 14.2.9
Should the above PR solve this issue? I get what does the message really me...
Seena Fallah

06/09/2020

12:02 PM Bug #45788: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
/a/kchai-2020-06-09_07:14:19-rados-wip-kefu-testing-2020-06-09-1352-distro-basic-smithi/5131622 Kefu Chai
08:55 AM Bug #45195 (Resolved): ceph_test_objectstore: src/os/bluestore/bluestore_types.h: 734: FAILED cep...
I doubt we'll backport deferring big writes to Octopus. Hence marking as resolved Igor Fedotov
08:51 AM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
Original fix was https://github.com/ceph/ceph/pull/35201 but it was decided to kill WAL preextending instead, see htt... Igor Fedotov

06/08/2020

12:28 PM Bug #45519: OSD asserts during block allocation for BlueFS
OK, here is a backtrace from the OSD.30 log
ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautil...
Igor Fedotov

06/06/2020

08:37 AM Backport #45682 (In Progress): octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered...
Nathan Cutler

06/05/2020

09:54 PM Bug #40741 (Triaged): Mass OSD failure, unable to restart
Igor Fedotov
09:39 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Manual compaction is a workaround for slow KV access, which tends to be caused by prior massive data removals. In regul... Igor Fedotov
05:33 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Thanks, it works now!
Do we need to run manual compaction when we see high latency? It takes about 30 minutes to co...
Aleksei Zakharov
03:37 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Proper command line for my previous comment would be:
CEPH_ARGS="--bluefs-shared-alloc-size 4096" ceph-kvstore-too
Igor Fedotov
01:27 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Looks like you're using a custom bluefs_shared_alloc_size setting of 4K, and kvstore_tool is unaware of that and tr... Igor Fedotov
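One plausible way to picture the mismatch: extents handed out at a 4 KiB granularity generally do not line up with a larger unit, so a tool that assumes the bigger unit (64K is the shared-device alloc size named in Backport #41339) sees layouts it does not expect. The extent offsets below are made up, `misaligned_extents` is a hypothetical helper, and this is not a description of what ceph-kvstore-tool actually checks; it only shows why the tool needs the same bluefs_shared_alloc_size as the OSD.

```python
KiB = 1024


def misaligned_extents(offsets, alloc_unit):
    """Return the extent offsets that are not multiples of alloc_unit."""
    return [off for off in offsets if off % alloc_unit != 0]


# Hypothetical extent offsets from an allocator working in 4 KiB units.
offsets_4k = [0, 4 * KiB, 12 * KiB, 64 * KiB, 68 * KiB]

if __name__ == "__main__":
    print(misaligned_extents(offsets_4k, 4 * KiB))   # consistent with a 4 KiB unit
    print(misaligned_extents(offsets_4k, 64 * KiB))  # three offsets break a 64 KiB assumption
```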
09:37 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
It fails with any ceph-kvstore-tool command, whether compaction, stats or list. Looks like another is... Aleksei Zakharov
04:52 PM Bug #45613 (Fix Under Review): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 f...
Igor Fedotov
04:03 PM Bug #45519: OSD asserts during block allocation for BlueFS
Aleksei Zakharov wrote:
> Igor Fedotov wrote:
> > Aleksei Zakharov wrote:
> > >
> > > Turning on stupid allocato...
Aleksei Zakharov
01:11 PM Bug #45903: BlueFS replay log grows without end
Seen that too. Looks like we're lacking some backports in luminous.
See https://github.com/ceph/ceph/pull/34876
Igor Fedotov
08:29 AM Bug #45903 (Resolved): BlueFS replay log grows without end
If data is slowly trickling into the RocksDB WAL and no new files are created, the BlueFS replay log can grow to the size that i... Adam Kupczyk
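The growth pattern can be pictured with a toy append-only metadata journal: if every small extension of an existing file is recorded as one more update, and the journal is never rewritten, its length tracks the number of appends rather than the number of files, while the state it replays to stays small. Compaction means rewriting it as one record per live file. This is a generic model with hypothetical names (`ToyMetadataLog`, the file names, the record layout), not BlueFS's actual log format or compaction trigger.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (file name, new size): a hypothetical record layout


class ToyMetadataLog:
    """Append-only metadata journal; replay rebuilds file sizes from it."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def update_size(self, name: str, size: int) -> None:
        self.records.append((name, size))  # every extension adds a record

    def replay(self) -> Dict[str, int]:
        state: Dict[str, int] = {}
        for name, size in self.records:
            state[name] = size  # later records override earlier ones
        return state

    def compact(self) -> None:
        """Rewrite the journal as one record per live file."""
        self.records = sorted(self.replay().items())


if __name__ == "__main__":
    log = ToyMetadataLog()
    for i in range(1, 1001):
        log.update_size("wal/000123.log", i * 4096)  # a slow trickle of WAL appends
    print(len(log.records), "records before compaction")
    log.compact()
    print(len(log.records), "record after compaction")
```

Replay cost follows `len(records)`, so without a rewrite it keeps growing even though the replayed state is a single file size.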

06/04/2020

02:50 PM Backport #45684 (In Progress): nautilus: Large (>=2 GB) writes are incomplete when bluefs_buffere...
Nathan Cutler
11:10 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
So it's failing during compaction only, right? That's even more interesting; could you please share a more verbose log? Igor Fedotov
09:26 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
It starts and works ok. The only issue with it is the latency issue described earlier. Aleksei Zakharov

06/03/2020

04:47 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
So this OSD doesn't start any more, does it?
Please set debug-bluestore to 20 and collect the log if so.
Igor Fedotov
11:03 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Unfortunately manual compaction doesn't work. Error log is attached. Aleksei Zakharov
10:14 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
I will try manual compaction, thanks!
I'm not sure that turning direct IO on for bluefs can help somehow in this p...
Aleksei Zakharov
01:24 PM Backport #41339: mimic: os/bluestore/BlueFS: use 64K alloc_size on the shared device
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30219
m...
Nathan Cutler
10:22 AM Bug #45519: OSD asserts during block allocation for BlueFS
Igor Fedotov wrote:
> Aleksei Zakharov wrote:
> >
> > Turning on stupid allocator with "bad" OSD turned into an i...
Aleksei Zakharov
02:02 AM Bug #45788: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
/a/yuriw-2020-05-30_02:18:17-rados-wip-yuri-master_5.29.20-distro-basic-smithi/5104557 Brad Hubbard

06/01/2020

10:21 PM Bug #45519: OSD asserts during block allocation for BlueFS
For the record, the following thread has some helpful info on the assertion: h->file->fnode.ino != 1
I did som...
Igor Fedotov
10:07 PM Bug #45519: OSD asserts during block allocation for BlueFS
Aleksei Zakharov wrote:
>
> Turning on stupid allocator with "bad" OSD turned into an impossibility to start this ...
Igor Fedotov
10:47 AM Bug #45519: OSD asserts during block allocation for BlueFS
We added some new OSDs. During the backfill process we were affected by another issue (https://tracker.ceph.com/issues/45... Aleksei Zakharov
10:16 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Anyway, I suggest compacting the DB manually.
Besides, we recently switched back to direct IO for bluefs, see https:/...
Igor Fedotov
10:18 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Nope, we use NVMe SSDs.
The issue starts with high page cache usage. Bluefs uses buffered IO; it reads a lot from...
Aleksei Zakharov
09:30 PM Bug #45613 (Resolved): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
Igor Fedotov
03:48 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
https://github.com/ceph/ceph/pull/35293 - Fix for octopus has been merged and released in 15.2.3. Neha Ojha
 
