Activity

From 05/22/2020 to 06/20/2020

06/20/2020

11:50 AM Bug #46027 (Resolved): bufferlist c_str() sometimes clears assignment to mempool
Kefu Chai

06/18/2020

05:05 PM Bug #46055 (Fix Under Review): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
Igor Fedotov

06/17/2020

04:06 PM Bug #46055 (Resolved): ObjectStore/StoreTestSpecificAUSize.SpilloverTest/2 failed
... Neha Ojha
03:55 PM Bug #46054 (Resolved): RocksDBResharding: rocksdb::ColumnFamilySet::~ColumnFamilySet(): Assertion...
... Neha Ojha
03:46 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
We don't use RGW; we use a self-written client that works with small objects (~500-700 bytes each). Load is not t... Aleksei Zakharov

06/16/2020

09:25 PM Bug #46027: bufferlist c_str() sometimes clears assignment to mempool
PR: https://github.com/ceph/ceph/pull/35584. Radoslaw Zarzynski
08:02 AM Bug #46027 (Resolved): bufferlist c_str() sometimes clears assignment to mempool
Sometimes c_str() needs to rebuild the underlying buffer::raw.
In that case the original assignment to the mempool is lost.
Adam Kupczyk
09:23 AM Bug #45994 (Triaged): OSD crash - in thread tp_osd_tp
Igor Fedotov
09:20 AM Bug #45994: OSD crash - in thread tp_osd_tp
Increasing suicide timeout doesn't look like the proper way of dealing with this issue.
I presume you're suffering...
Igor Fedotov
06:04 AM Bug #45994: OSD crash - in thread tp_osd_tp
Hi,
We have seen that the issue is caused by a heartbeat timeout, which was resolved by increasing the timer. Hence can this tick...
Nokia ceph-users

06/15/2020

07:21 PM Backport #46010 (Rejected): mimic: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 f...
Nathan Cutler
07:21 PM Backport #46009 (Resolved): octopus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2...
https://github.com/ceph/ceph/pull/36049 Nathan Cutler
07:21 PM Backport #46008 (Resolved): nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/...
https://github.com/ceph/ceph/pull/37824 Nathan Cutler
12:17 PM Backport #45354 (Rejected): octopus: ceph_test_objectstore: src/os/bluestore/bluestore_types.h: 7...
Igor writes in parent issue:
"I doubt we'll backport deferring big writes to Octopus. Hence marking as resolved"
Nathan Cutler
07:00 AM Bug #45994 (Duplicate): OSD crash - in thread tp_osd_tp
We recently saw some random OSD crashes in thread tp_osd_tp with the backtrace below on one of our Nautilus clusters.... Nokia ceph-users

06/12/2020

02:20 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
@Aleksei,
first of all, manual compaction isn't supposed to be a regular means of operation. Generally RocksDB p...
Igor Fedotov
01:14 PM Bug #38745: spillover that doesn't make sense
Seena Fallah wrote:
> Igor Fedotov wrote:
> > @Marcin - perfect overview, thanks!
> > Just want to mention that th...
Igor Fedotov

06/11/2020

11:22 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Unfortunately, manual compaction of one OSD takes about 30 minutes. So, in this situation we're forced to do a lot of... Aleksei Zakharov
07:54 AM Bug #38745: spillover that doesn't make sense
Hi Seena,
"How did you find out that db is now on level 4? I mean what's the size limit on level one?"
These ar...
Marcin W
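Seena's question about per-level size limits can be approached with a rough calculation. The sketch below assumes RocksDB's leveled-compaction defaults of a 256 MiB target for level 1 and a 10x multiplier per level. Those values and the function name `level_targets` are assumptions for illustration, not something stated in this ticket, and a given cluster may override them.

```python
def level_targets(base_bytes=256 * 1024**2, multiplier=10, levels=4):
    """Return the assumed target size in bytes of levels 1..levels.

    base_bytes and multiplier are assumed defaults, not values
    taken from this ticket.
    """
    return [base_bytes * multiplier**n for n in range(levels)]


# Print the assumed per-level targets in GiB.
for number, size in enumerate(level_targets(), start=1):
    print(f"L{number}: {size / 1024**3:.2f} GiB")
```

Under these assumptions each level is ten times larger than the one before it, so the space needed to hold a whole level grows in large steps rather than smoothly.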
01:39 AM Bug #45788 (Pending Backport): ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
Kefu Chai

06/10/2020

04:15 PM Bug #38745: spillover that doesn't make sense
Igor Fedotov wrote:
> @Marcin - perfect overview, thanks!
> Just want to mention that this "granular" level space a...
Seena Fallah
04:07 PM Bug #38745: spillover that doesn't make sense
@Marcin One more question, and I would be very thankful if you could answer it: how did you find out that db is now on level 4? I ... Seena Fallah
04:02 PM Bug #38745: spillover that doesn't make sense
@Marcin Thanks a lot for your overview. Now I get what's going on. Seena Fallah
01:18 PM Bug #38745: spillover that doesn't make sense
@Marcin - perfect overview, thanks!
Just want to mention that this "granular" level space allocation has been fixed ...
Igor Fedotov
11:07 AM Bug #38745: spillover that doesn't make sense
Hi Seena,
Metadata is stored in RocksDB, which is a 'logging database'. It doesn't replace or remove entries, it just ...
Marcin W
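The append-only behaviour described in this comment can be illustrated with a toy log-structured store. This is a minimal sketch, not RocksDB's implementation: the class `ToyLogStore` and its methods are hypothetical names chosen for the illustration.

```python
class ToyLogStore:
    """Toy append-only key-value log (hypothetical, not RocksDB code)."""

    _TOMBSTONE = object()  # marker appended when a key is deleted

    def __init__(self):
        self.log = []  # list of (key, value) records, oldest first

    def put(self, key, value):
        # Updates never overwrite; they append a newer record.
        self.log.append((key, value))

    def delete(self, key):
        # Deleting appends a tombstone; older records stay in the log.
        self.log.append((key, self._TOMBSTONE))

    def get(self, key):
        # Scan newest first; the most recent record for the key wins.
        for k, v in reversed(self.log):
            if k == key:
                return None if v is self._TOMBSTONE else v
        return None

    def compact(self):
        # Keep only the newest live record per key and drop tombstones.
        latest = {}
        for k, v in self.log:
            latest[k] = v
        self.log = [(k, v) for k, v in latest.items()
                    if v is not self._TOMBSTONE]


store = ToyLogStore()
for i in range(1000):
    store.put(f"obj{i}", b"x")
for i in range(1000):
    store.delete(f"obj{i}")
store.put("keep", b"y")

records_before = len(store.log)  # puts, tombstones, and one live key
store.compact()
records_after = len(store.log)   # only the live key remains
```

In this toy model a mass deletion makes the log longer, not shorter, and lookups have to step over the dead records until compaction rewrites the log.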
09:49 AM Bug #38745: spillover that doesn't make sense
I'm experiencing this in nautilus 14.2.9
Should the above PR solve this issue? I get what does the message really me...
Seena Fallah

06/09/2020

12:02 PM Bug #45788: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
/a/kchai-2020-06-09_07:14:19-rados-wip-kefu-testing-2020-06-09-1352-distro-basic-smithi/5131622 Kefu Chai
08:55 AM Bug #45195 (Resolved): ceph_test_objectstore: src/os/bluestore/bluestore_types.h: 734: FAILED cep...
I doubt we'll backport deferring big writes to Octopus. Hence marking as resolved Igor Fedotov
08:51 AM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
Original fix was https://github.com/ceph/ceph/pull/35201 but it was decided to kill WAL preextending instead, see htt... Igor Fedotov

06/08/2020

12:28 PM Bug #45519: OSD asserts during block allocation for BlueFS
OK, here is a backtrace from the OSD.30 log
ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautil...
Igor Fedotov

06/06/2020

08:37 AM Backport #45682 (In Progress): octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered...
Nathan Cutler

06/05/2020

09:54 PM Bug #40741 (Triaged): Mass OSD failure, unable to restart
Igor Fedotov
09:39 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Manual compaction is a workaround for slow KV access, which tends to be caused by prior massive data removals. In regul... Igor Fedotov
05:33 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Thanks, it works now!
Do we need to run manual compaction when we see high latency? It takes about 30 minutes to co...
Aleksei Zakharov
03:37 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Proper command line for my previous comment would be:
CEPH_ARGS="--bluefs-shared-alloc-size 4096" ceph-kvstore-too
Igor Fedotov
01:27 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Looks like you're using a custom bluefs_shared_alloc_size setting of 4K. And kvstore_tool is unaware of that and tr... Igor Fedotov
09:37 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
It fails on any ceph-kvstore-tool command, whether compaction, stats, or list. Looks like another is... Aleksei Zakharov
04:52 PM Bug #45613 (Fix Under Review): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 f...
Igor Fedotov
04:03 PM Bug #45519: OSD asserts during block allocation for BlueFS
Aleksei Zakharov wrote:
> Igor Fedotov wrote:
> > Aleksei Zakharov wrote:
> > >
> > > Turning on stupid allocato...
Aleksei Zakharov
01:11 PM Bug #45903: BlueFS replay log grows without end
Seen that too. Looks like we're lacking some backports in luminous.
See https://github.com/ceph/ceph/pull/34876
Igor Fedotov
08:29 AM Bug #45903 (Resolved): BlueFS replay log grows without end
If data is slowly pouring to RocksDB WAL and new files are not created, BlueFS replay log can grow to the size that i... Adam Kupczyk

06/04/2020

02:50 PM Backport #45684 (In Progress): nautilus: Large (>=2 GB) writes are incomplete when bluefs_buffere...
Nathan Cutler
11:10 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
So it's failing during compaction only, right? That's even more interesting. Could you please share a more verbose log? Igor Fedotov
09:26 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
It starts and works ok. The only issue with it is the latency issue described earlier. Aleksei Zakharov

06/03/2020

04:47 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
So this OSD doesn't start any more, does it?
Please set debug-bluestore to 20 and collect the log if so.
Igor Fedotov
11:03 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Unfortunately manual compaction doesn't work. Error log is attached. Aleksei Zakharov
10:14 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
I will try manual compaction, thanks!
I'm not sure that turning direct IO on for bluefs can help somehow in this p...
Aleksei Zakharov
01:24 PM Backport #41339: mimic: os/bluestore/BlueFS: use 64K alloc_size on the shared device
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/30219
m...
Nathan Cutler
10:22 AM Bug #45519: OSD asserts during block allocation for BlueFS
Igor Fedotov wrote:
> Aleksei Zakharov wrote:
> >
> > Turning on stupid allocator with "bad" OSD turned into an i...
Aleksei Zakharov
02:02 AM Bug #45788: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
/a/yuriw-2020-05-30_02:18:17-rados-wip-yuri-master_5.29.20-distro-basic-smithi/5104557 Brad Hubbard

06/01/2020

10:21 PM Bug #45519: OSD asserts during block allocation for BlueFS
For the record, the following thread has some helpful info on the assertion: h->file->fnode.ino != 1
I did som...
Igor Fedotov
10:07 PM Bug #45519: OSD asserts during block allocation for BlueFS
Aleksei Zakharov wrote:
>
> Turning on stupid allocator with "bad" OSD turned into an impossibility to start this ...
Igor Fedotov
10:47 AM Bug #45519: OSD asserts during block allocation for BlueFS
We added some new OSDs. During the backfill process we were affected by another issue (https://tracker.ceph.com/issues/45... Aleksei Zakharov
10:16 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Anyway, I suggest compacting the DB manually.
Besides, we recently switched back to direct IO for bluefs, see https:/...
Igor Fedotov
10:18 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Nope, we use NVMe SSDs.
The issue starts with high page cache usage. Bluefs uses buffered IO, it reads a lot from...
Aleksei Zakharov
09:30 PM Bug #45613 (Resolved): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
Igor Fedotov
03:48 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
https://github.com/ceph/ceph/pull/35293 - Fix for octopus has been merged and released in 15.2.3. Neha Ojha

05/30/2020

03:29 PM Bug #44359: Raw usage reported by 'ceph osd df' incorrect when using WAL/DB on another drive
Same here. Fresh Cluster - completely empty. "Raw Use" corresponds to Size of DB+WAL/DB Partition located on separate... Tobias Fischer

05/29/2020

07:32 PM Bug #45788 (Fix Under Review): ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
Igor Fedotov
07:32 PM Bug #45788 (Pending Backport): ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
Igor Fedotov
07:14 PM Bug #45788 (Resolved): ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
Originally appeared in https://tracker.ceph.com/issues/45519
but cloning to another ticket since it looks unrelated.
...
Igor Fedotov
07:17 PM Bug #45519: OSD asserts during block allocation for BlueFS
@Neha, @Kefu - made another ticket for QA test case failure, IMO unrelated.
https://tracker.ceph.com/issues/45788
Igor Fedotov
04:20 PM Bug #45519: OSD asserts during block allocation for BlueFS
We have similar issues on our cluster. ~1 billion objects in EC (8+3) pools, 540 OSDs, Nautilus 14.2.8. No tuning o... Simon Leinen
04:32 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Yeah, that's a known issue with RocksDB/BlueStore.
Manual compaction using "ceph-kvstore-tool bluestore-kv <path-t...
Igor Fedotov
12:18 PM Bug #45765 (Resolved): BlueStore::_collection_list causes huge latency growth pg deletion
Hi!
We have a ceph v14.2.7 cluster with about 2 billion objects. Each object is less than 4K in size. One PG has abo...
Aleksei Zakharov

05/28/2020

08:30 PM Bug #45519: OSD asserts during block allocation for BlueFS
... Neha Ojha
02:25 PM Bug #44213: Erasure coded pool might need much more disk space than expected
We need to run some performance tests with 4k min_alloc_size. Neha Ojha
12:50 AM Bug #43147 (New): segv in LruOnodeCacheShard::_pin
Reopening this since I have seen it in /a/yuriw-2020-05-24_19:30:40-rados-wip-yuri-master_5.24.20-distro-basic-smithi... Brad Hubbard

05/27/2020

01:47 PM Bug #45519: OSD asserts during block allocation for BlueFS
... Kefu Chai

05/26/2020

04:53 AM Bug #45703 (New): LruOnodeCacheShard::_add: Assertion `!safemode_or_autounlink || node_algorithms...
/a/yuriw-2020-05-22_19:55:53-rados-wip-yuri-master_5.22.20-distro-basic-smithi/5083216... Brad Hubbard

05/25/2020

02:26 PM Bug #45110 (Resolved): Extent leak after main device expand
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are ... Nathan Cutler
11:05 AM Backport #45126 (Resolved): nautilus: Extent leak after main device expand
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/34711
m...
Nathan Cutler

05/24/2020

09:05 PM Backport #45684 (Resolved): nautilus: Large (>=2 GB) writes are incomplete when bluefs_buffered_i...
https://github.com/ceph/ceph/pull/35404 Nathan Cutler
09:05 PM Backport #45683 (Rejected): mimic: Large (>=2 GB) writes are incomplete when bluefs_buffered_io =...
Nathan Cutler
09:05 PM Backport #45682 (Resolved): octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io...
https://github.com/ceph/ceph/pull/35446 Nathan Cutler

05/22/2020

07:37 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
OK thanks. I'm also just confirming that nautilus would be immune, even though 14.2.10 will change bluefs_buffered_io... Dan van der Ster
05:30 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
At the same time IIUC it's OSD restart which reveals data corruption - while OSD is running it doesn't read from WAL ... Igor Fedotov
05:26 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
@Dan - right. Setting bluefs_preextend_wal_files seems better to me (one can even try to do that on the fly) but I'd ... Igor Fedotov
04:27 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
@Igor: So setting bluefs_preextend_wal_files=false and/or bluefs_buffered_io=true should work around the issue until t... Dan van der Ster
04:02 PM Bug #45613: ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 failed
So the bug is caused by submitting overlapping write requests for BlueFS WAL via libaio. BlueFS::_flush_range might e... Igor Fedotov
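To make "overlapping write requests" concrete, here is a minimal sketch of how overlap between in-flight byte ranges can be detected. It is not the check used by BlueFS or KernelDevice: the helper `find_overlaps` and the sample offsets are hypothetical and serve only to illustrate the condition.

```python
def find_overlaps(requests):
    """Return pairs of (offset, length) requests whose byte ranges overlap.

    Hypothetical helper for illustration; not taken from Ceph.
    """
    ordered = sorted(requests)
    overlaps = []
    for i, (off_a, len_a) in enumerate(ordered):
        end_a = off_a + len_a
        for off_b, len_b in ordered[i + 1:]:
            if off_b >= end_a:
                break  # sorted by offset, so no later request can overlap
            overlaps.append(((off_a, len_a), (off_b, len_b)))
    return overlaps


# Two adjacent 4 KiB writes, plus one that starts inside the second.
inflight = [(0, 4096), (4096, 4096), (6144, 4096)]
conflicts = find_overlaps(inflight)
```

Adjacent requests that only touch at a boundary are not reported. Only requests that share at least one byte are.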
03:53 PM Bug #45613 (Fix Under Review): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2 f...
Igor Fedotov
03:45 PM Bug #45657: false positive check in KernelDevice::aio_log_start
Also the checking for overlapped io is currently broken due to preextended WAL prefilling in BlueFS::_flush_range. Wr... Igor Fedotov
02:23 PM Bug #45657 (New): false positive check in KernelDevice::aio_log_start
When enabling bdev_debug_inflight_ios KernelDevice might improperly detect conflicts for inflight read ops. This is c... Igor Fedotov
04:31 AM Bug #45337 (Pending Backport): Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true
Kefu Chai
04:30 AM Bug #45337 (Resolved): Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true
Kefu Chai
04:29 AM Bug #45335 (Resolved): cephadm upgrade: OSD.0 is not coming back after restart: rocksdb: verify_...
Kefu Chai
 
