Activity

From 10/25/2020 to 11/23/2020

11/23/2020

06:32 PM Bug #48036: bluefs corrupted in a OSD
@Satoru,
could you please reproduce the issue once again, now with both debug_bluefs set to 20 and debug_bluestore s...
Igor Fedotov
06:06 PM Bug #48036: bluefs corrupted in a OSD
Satoru Takeuchi wrote:
> @Igor
>
> Do you have any progress?
Hi Satoru,
sorry for the late response.
At the se...
Igor Fedotov
04:13 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
v14.2.11 has the hybrid allocator enabled, but bluestore_volume_selection_policy was still at its original value there. Hence th... Igor Fedotov
03:00 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))

I got the same issue in nautilus 14.2.11
it happened four times on different nodes..
Bálint Szűcs
02:49 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Igor Fedotov wrote:
> Thanks, everybody, for the updates. Yeah, I understand all the complexities of debugging this so...
Dan van der Ster
02:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Thanks, everybody, for the updates. Yeah, I understand all the complexities of debugging this sort of issue in a produ... Igor Fedotov
01:55 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
I got the same issue in nautilus 14.2.14
it happened four times on different nodes..
Bálint Szűcs
01:01 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
I got the same issue in nautilus 14.2.14
Full trace: https://paste.ubuntu.com/p/4KHcCG9YQx/
Seena Fallah
12:49 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Hi Igor,
thanks for answering.
The thing is:
- Issue isn't reproducible
- Happens on Production Systems.
...
Bastian Mäuser
12:45 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
Meanwhile I see no way to troubleshoot this unless one is able to repro the issue with debug-bdev set to 20. Igor Fedotov
12:43 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
The following patch once merged [and backported] will provide more insight on the issue's root cause.
https://gith...
Igor Fedotov
01:46 PM Documentation #23443 (Resolved): doc: object -> file -> disk is wrong for bluestore
Zac Dover

11/22/2020

05:40 AM Fix #48272 (Resolved): osd: fix bluestore avl allocator
Kefu Chai

11/20/2020

02:40 AM Bug #48036: bluefs corrupted in a OSD
@Igor
Do you have any progress?
Satoru Takeuchi

11/19/2020

01:30 PM Bug #48070: Wrong bluefs db usage value (doubled) returned by `perf dump` when option `bluestore_...
`max_total_wal_size` is discussed here: https://github.com/ceph/ceph/pull/35277 Kinga Karczewska
01:29 PM Bug #48070: Wrong bluefs db usage value (doubled) returned by `perf dump` when option `bluestore_...
As it turned out, this was caused by the small size of the WAL (`bluestore block wal size`) and the fact that I did not set `max_t... Kinga Karczewska
07:07 AM Documentation #23443: doc: object -> file -> disk is wrong for bluestore
https://github.com/ceph/ceph/pull/38181 entered Anthony D'Atri
04:16 AM Documentation #24075: Bluestore and Bluefs Config Reference
That document currently lists a number of BlueStore configuration options. Emre, is there anything specific that you ... Anthony D'Atri
03:03 AM Fix #48288 (Need More Info): test/objectstore: allocate function may return -ENOSPC
test/objectstore: allocate function may return -ENOSPC yantao xue

11/18/2020

09:53 PM Backport #48282 (Resolved): nautilus: osd: fix bluestore bitmap allocator
https://github.com/ceph/ceph/pull/39708 Nathan Cutler
09:53 PM Backport #48281 (Resolved): octopus: osd: fix bluestore bitmap allocator
https://github.com/ceph/ceph/pull/38430 Nathan Cutler
01:45 PM Bug #48276 (Duplicate): OSD Crash with ceph_assert(is_valid_io(off, len))
Hello,
Last night one OSD in a 3-node Cluster crashed with the attached Crashreport. I can pretty much rule out Ha...
Bastian Mäuser
09:07 AM Fix #48272 (Fix Under Review): osd: fix bluestore avl allocator
Kefu Chai
08:21 AM Fix #48272 (Resolved): osd: fix bluestore avl allocator
In _block_picker, the first rs == t.begin() can also tell we've searched the whole tree yantao xue

11/17/2020

01:32 AM Bug #48256 (Can't reproduce): Many4KWritesNoCSumTest fails on nautilus [ FAILED ] ObjectStore/S...
/a/bhubbard-2020-11-16_08:11:06-rados-wip-nautilus-badone-testing-2-distro-basic-smithi/5630856... Brad Hubbard

11/16/2020

02:05 AM Bug #38554: ObjectStore/StoreTestSpecificAUSize.TooManyBlobsTest/2 fail, Expected: (res_stat.allo...
Once again on nautilus HEAD + 1 patch that is unrelated.
/ceph/teuthology-archive/bhubbard-2020-11-13_00:57:56-rad...
Brad Hubbard

11/15/2020

04:39 PM Bug #48214 (Pending Backport): osd: fix bluestore bitmap allocator
Kefu Chai

11/13/2020

08:32 PM Bug #42928: ceph-bluestore-tool bluefs-bdev-new-db does not update lv tags
Quick question: what if the bluestore device is not an LVM device? All my devices were created with luminous with c... Simon Pierre Desrosiers

11/12/2020

10:45 PM Bug #48218 (Can't reproduce): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompressionAlgor...
... Neha Ojha
05:40 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Eric Petit wrote:
> > @Eric How is that @bluefs_buffered_io = true@ working for you? We are considering to re-enabl...
Dan van der Ster
05:14 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
> @Eric How is that @bluefs_buffered_io = true@ working for you? We are considering to re-enable it to help workarou... Eric Petit
05:21 PM Bug #48216: Spanning blobs list might have zombie blobs that aren't of use any more
Related PR to detect leaked spanning blobs and fix with fsck: https://github.com/ceph/ceph/pull/38050 Igor Fedotov
05:15 PM Bug #48216 (New): Spanning blobs list might have zombie blobs that aren't of use any more
As reported at https://tracker.ceph.com/issues/40449#note-9 users are still facing "no blob id" assertion. Provided l... Igor Fedotov
05:17 PM Backport #40449: nautilus: "no available blob id" assertion might occur
Nathan Cutler wrote:
> @Alexander - it might make sense to open a new bug in the Bluestore project for that, since t...
Igor Fedotov
02:57 PM Bug #48214 (Resolved): osd: fix bluestore bitmap allocator
The bluestore bitmap allocator calculates the wrong last_pos when given a hint, because the hint is a bdev physical address. yantao xue

11/11/2020

02:21 PM Backport #48194 (Resolved): octopus: bufferlist c_str() sometimes clears assignment to mempool
https://github.com/ceph/ceph/pull/38429 Nathan Cutler
02:21 PM Backport #48193 (Resolved): nautilus: bufferlist c_str() sometimes clears assignment to mempool
https://github.com/ceph/ceph/pull/39651 Nathan Cutler
02:18 PM Bug #46027 (Pending Backport): bufferlist c_str() sometimes clears assignment to mempool
Nathan Cutler
09:36 AM Bug #46027: bufferlist c_str() sometimes clears assignment to mempool
It could be beneficial to backport this fix to octopus and nautilus. I suspect that it fixes a rebuild leak to the bu... Zac Medico
12:27 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
Eric Petit wrote:
> > Besides, recently we switched back to direct IO for bluefs, see https://github.com/ceph/ceph/...
Dan van der Ster

11/09/2020

06:42 AM Bug #48036: bluefs corrupted in a OSD
> given you're able to reproduce the issue locally would you be able to collect OSD log (with debug-bluefs = 20) BEFO... Satoru Takeuchi

11/06/2020

04:39 PM Backport #40449: nautilus: "no available blob id" assertion might occur
@Alexander - it might make sense to open a new bug in the Bluestore project for that, since this one is closed. Nathan Cutler
03:38 PM Backport #40449: nautilus: "no available blob id" assertion might occur
Alexander Patrakov wrote:
> Nathan Cutler wrote:
> > This update was made using the script "backport-resolve-issue"...
Alexander Patrakov
02:37 PM Backport #40449: nautilus: "no available blob id" assertion might occur
Nathan Cutler wrote:
> This update was made using the script "backport-resolve-issue".
> backport PR https://github...
Alexander Patrakov
12:36 PM Bug #48036: bluefs corrupted in a OSD
> Unfortunately, after capturing logs, this problem hasn't been reproduced.
More precisely, with setting `debug-bl...
Satoru Takeuchi
11:14 AM Bug #48036: bluefs corrupted in a OSD
> > Your initial analysis about ino 26 being removed and later reused is very helpful and indicative. Wondering if th... Satoru Takeuchi

11/05/2020

07:59 PM Backport #47894 (Resolved): nautilus: Compressed blobs lack checksums
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37843
m...
Nathan Cutler
05:19 PM Backport #47894: nautilus: Compressed blobs lack checksums
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37843
merged
Yuri Weinstein
07:59 PM Backport #47707 (Resolved): nautilus: Potential race condition regression around new OSD flock()s
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37842
m...
Nathan Cutler
05:19 PM Backport #47707: nautilus: Potential race condition regression around new OSD flock()s
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37842
merged
Yuri Weinstein
05:59 PM Backport #46008 (Resolved): nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37824
m...
Nathan Cutler
05:18 PM Backport #46008: nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37824
merged
Yuri Weinstein

11/04/2020

02:12 PM Backport #46194 (In Progress): nautilus: BlueFS replay log grows without end
Nathan Cutler
11:38 AM Backport #46194: nautilus: BlueFS replay log grows without end
Fixed by https://github.com/ceph/ceph/pull/37948 Adam Kupczyk

11/03/2020

11:26 AM Backport #48094 (Resolved): octopus: Hybrid allocator might segfault when fallback allocator is p...
https://github.com/ceph/ceph/pull/38428 Nathan Cutler
11:26 AM Backport #48093 (Resolved): nautilus: Hybrid allocator might segfault when fallback allocator is ...
https://github.com/ceph/ceph/pull/38637 Nathan Cutler
11:25 AM Backport #48092 (Rejected): mimic: Hybrid allocator might segfault when fallback allocator is pre...
Nathan Cutler
02:09 AM Bug #48025: osd start up failed when osd superblock crc fail
Igor Fedotov wrote:
> Bo Zhang wrote:
> > Another bug also appears on the same node.(https://tracker.ceph.com/issue...
Bo Zhang

11/02/2020

10:37 PM Backport #46194 (Need More Info): nautilus: BlueFS replay log grows without end
first attempted backport - https://github.com/ceph/ceph/pull/37833 - was closed Nathan Cutler
03:34 PM Bug #48036: bluefs corrupted in a OSD
@Satoru,
given you're able to reproduce the issue locally would you be able to collect OSD log (with debug-bluefs = ...
Igor Fedotov
12:50 PM Bug #48036: bluefs corrupted in a OSD
As you suspect, `bluefs-bdev-expand` seems to be the first sensor. After running the reproducer with my custom Rook, ... Satoru Takeuchi
12:25 PM Bug #48036: bluefs corrupted in a OSD
> Nevertheless I'm not completely sure whether bluefs-bdev-expand is a trigger for the issue or it's just the first "... Satoru Takeuchi
12:10 PM Bug #48036: bluefs corrupted in a OSD
Hi Satoru,
thanks for the update.
Nevertheless I'm not completely sure whether bluefs-bdev-expand is a trigger for ...
Igor Fedotov
11:27 AM Bug #48036: bluefs corrupted in a OSD
I succeeded in reproducing this problem in my Rook/Ceph cluster.
https://github.com/rook/rook/issues/6530
I gue...
Satoru Takeuchi
12:01 AM Bug #48036: bluefs corrupted in a OSD
> As far as I can see, you were attempting to expand the DB volume, weren't you? Any rationale for that?
> Wasn't that a vo...
Satoru Takeuchi
02:18 PM Bug #48070 (New): Wrong bluefs db usage value (doubled) returned by `perf dump` when option `blue...
During some tests we discovered that OSD db usage returned by `ceph daemon osd.num perf dump` tool is twice the real ... Kinga Karczewska
12:26 PM Bug #47751 (Pending Backport): Hybrid allocator might segfault when fallback allocator is present
Igor Fedotov
11:35 AM Bug #48025: osd start up failed when osd superblock crc fail
Bo Zhang wrote:
> Another bug also appears on the same node.(https://tracker.ceph.com/issues/48061)
This another ...
Igor Fedotov
02:38 AM Bug #48025: osd start up failed when osd superblock crc fail
Another bug also appears on the same node.(https://tracker.ceph.com/issues/48061)
Bo Zhang
02:09 AM Bug #48025: osd start up failed when osd superblock crc fail
Igor Fedotov wrote:
> Bo Jang, I haven't got your last comments on disabled WAL, please elaborate.
>
> From Rocks...
Bo Zhang
02:34 AM Bug #48061 (New): .sst block checksum mismatch
【version】
14.2.8
【trigger operation】
Under normal operation of the cluster, power down the equipment manually, and...
Bo Zhang

11/01/2020

05:12 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Starting rebuild of osd.0. Jamin Collins
06:56 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Will start the recreation of osd.0 tomorrow (roughly 10 hours from now). Will check this bug report before doing so. Jamin Collins

10/30/2020

10:56 AM Bug #48036: bluefs corrupted in a OSD
Igor Fedotov wrote:
>
> Please set debug-bluestore & debug-bluefs to 20 and collect OSD startup log.
Never mind...
Igor Fedotov
10:41 AM Bug #48036: bluefs corrupted in a OSD
As far as I can see, you were attempting to expand the DB volume, weren't you? Any rationale for that?
Wasn't that a volum...
Igor Fedotov
10:41 AM Bug #48036: bluefs corrupted in a OSD
Both
https://tracker.ceph.com/issues/46886
and https://github.com/ceph/ceph/pull/36745
were following up the http...
Igor Fedotov
10:28 AM Bug #48025: osd start up failed when osd superblock crc fail
Bo Jang, I haven't got your last comments on disabled WAL, please elaborate.
From RocksDB config line I don't see ...
Igor Fedotov
10:13 AM Bug #48047: osd: fix bluestore stupid allocator
IMO bdev_block_size should be marked with FLAG_STARTUP (or even FLAG_CREATE) and hence protected from the modificatio... Igor Fedotov
03:33 AM Bug #48047 (Rejected): osd: fix bluestore stupid allocator
StupidAllocator::_choose_bin uses cct->_conf->bdev_block_size, which can be changed while the allocator is running, but... yantao xue

10/29/2020

10:59 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
I'm planning to zap and rebuild the OSD (@osd.0@) this weekend. Please let me know if there's any information you'd ... Jamin Collins
02:35 PM Bug #47330 (Fix Under Review): ceph-osd can't start when CURRENT file does not end with newline o...
Neha Ojha
02:33 PM Bug #47453 (Can't reproduce): checksum failures lead to assert on OSD shutdown in lab tests
Neha Ojha
02:26 PM Bug #47874 (Need More Info): Allocation error even though the block has 50 GB free
Neha Ojha
02:24 PM Bug #47883 (Need More Info): bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r...
Still waiting for https://tracker.ceph.com/issues/47883#note-5 Neha Ojha
06:20 AM Bug #48036 (Closed): bluefs corrupted in a OSD
I hit a problem that is very similar to the following issue/PR in v15.2.4.
upgrade/nautilus-x-master: bluefs mount...
Satoru Takeuchi
01:37 AM Bug #48025: osd start up failed when osd superblock crc fail
Bo Zhang wrote:
> Igor Fedotov wrote:
> > Just in case - don't you have any custom settings for RocksDB, e.g. disab...
Bo Zhang
01:36 AM Bug #48025: osd start up failed when osd superblock crc fail
Igor Fedotov wrote:
> Just in case - don't you have any custom settings for RocksDB, e.g. disabled WAL?
NOT disab...
Bo Zhang
01:30 AM Bug #48025: osd start up failed when osd superblock crc fail
Igor Fedotov wrote:
> Just in case - don't you have any custom settings for RocksDB, e.g. disabled WAL?
Has been ...
Bo Zhang

10/28/2020

03:16 PM Bug #46490: osds crashing during deep-scrub
It seems that the ceph-bluestore-tool repair only temporarily resolves the issue for us.
We ran the repair tool on e...
Lawrence Smith
10:51 AM Bug #48025: osd start up failed when osd superblock crc fail
Just in case - don't you have any custom settings for RocksDB, e.g. disabled WAL? Igor Fedotov
09:52 AM Bug #48025 (New): osd start up failed when osd superblock crc fail
【version】
14.2.8
【trigger operation】
Under normal operation of the cluster, power down the equipment ...
Bo Zhang
10:47 AM Bug #47985: When WAL is closed, osd cannot be restarted
I doubt it will work this way as there would be no onode's metadata consistency guarantee any more... In your case su... Igor Fedotov
06:04 AM Bug #47985: When WAL is closed, osd cannot be restarted
Hi Igor:
1. We've found that disabling the WAL reduces latency (measured by P99.9 latency), as we've tested rgw put worklo...
Jiaying Ren

10/27/2020

02:42 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
> At the moment I don't see anything else one can retrieve from this daemon. But suggest to keep it for additional... Jamin Collins
09:48 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Jamin Collins wrote:
> Nothing suspicious about it either. It's the DB device for all the OSDs on that host and is ...
Igor Fedotov
09:39 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Jamin Collins wrote:
> Also, should I continue to keep the OSD in its failed state, is there any information that ca...
Igor Fedotov
09:37 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Jamin Collins wrote:
> What about the error coinciding precisely with the log volume filling? Any chance that's the...
Igor Fedotov
11:11 AM Bug #47985: When WAL is closed, osd cannot be restarted
In addition, I have also tried to deploy osd first, and then modify the bluestore_rocksdb_options in the configuratio... jiaxu li
11:03 AM Bug #47985: When WAL is closed, osd cannot be restarted
In some application scenarios, I want to close wal in order to get lower latency and higher IOPS. After closing wal, ... jiaxu li
09:22 AM Bug #47985: When WAL is closed, osd cannot be restarted
I haven't investigated this deeper but what's the rationale to disableWAL? Generally this introduces a breach to data... Igor Fedotov
01:40 AM Bug #47985: When WAL is closed, osd cannot be restarted
The detailed steps to deploy the cluster are as follows:
1. deploy a cluster without osd
```
MON=1 OSD=0 MDS=0 MGR...
jiaxu li
09:47 AM Backport #47892 (In Progress): octopus: Compressed blobs lack checksums
Nathan Cutler
09:46 AM Backport #47708 (In Progress): octopus: Potential race condition regression around new OSD flock()s
Nathan Cutler
08:36 AM Backport #47894 (In Progress): nautilus: Compressed blobs lack checksums
Nathan Cutler
08:35 AM Backport #47707 (In Progress): nautilus: Potential race condition regression around new OSD flock()s
Nathan Cutler
08:27 AM Backport #47669 (Need More Info): nautilus: Some structs aren't bound to mempools properly
Not immediately clear how to backport this. Nathan Cutler
08:07 AM Backport #46194 (In Progress): nautilus: BlueFS replay log grows without end
Nathan Cutler

10/26/2020

10:31 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Also, should I continue to keep the OSD in its failed state, is there any information that can be retrieved from it t... Jamin Collins
09:57 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Nothing suspicious about it either. It's the DB device for all the OSDs on that host and is the same as in the previ... Jamin Collins
09:25 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
So there is no spillover to main(HDD) device. Hence the issue is rather not related to this device.
Anything suspi...
Igor Fedotov
05:39 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
> Additionally for OSDs running on the same host (I presume you haven't restarted them for a while, have you?) please... Jamin Collins
05:16 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Could you please provide a report for ceph-bluestore-tool's bluefs-bdev-sizes command. Wondering if this OSD has any ... Igor Fedotov
04:42 PM Bug #48002 (New): Compaction error: Corruption: block checksum mismatch:
I appear to have run into https://tracker.ceph.com/issues/37282 again.
Same AMD based host...
Jamin Collins
09:55 PM Backport #46599 (Resolved): octopus: Rescue procedure for extremely large bluefs log
Nathan Cutler
09:51 PM Backport #46008 (In Progress): nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentati...
Nathan Cutler
12:04 PM Backport #47671 (In Progress): octopus: Hybrid allocator might cause duplicate admin socket comma...
https://github.com/ceph/ceph/pull/37794 Igor Fedotov
12:03 PM Backport #47672 (In Progress): nautilus: Hybrid allocator might cause duplicate admin socket comm...
https://github.com/ceph/ceph/pull/37793 Igor Fedotov
11:18 AM Bug #47985 (Need More Info): When WAL is closed, osd cannot be restarted
It's not clear what you meant by "close bluestore wal during deployment, and place bluestore wal/db and block o... Igor Fedotov
09:57 AM Bug #47985 (Need More Info): When WAL is closed, osd cannot be restarted
Compile the master branch source code, use vstart to deploy the cluster, close bluestore wal during deployment, and p... jiaxu li
10:52 AM Bug #38272: "no available blob id" assertion might occur
Nathan Cutler wrote:
> Jiang Yu wrote:
> > I encountered the same problem in ceph 12.2.2, but found that there is n...
Igor Fedotov
10:36 AM Bug #38272: "no available blob id" assertion might occur
Jiang Yu wrote:
> I encountered the same problem in ceph 12.2.2, but found that there is no patch available in ceph ...
Nathan Cutler
01:24 AM Bug #38272: "no available blob id" assertion might occur
Hello everyone,
I encountered the same problem in ceph 12.2.2, but found that there is no patch available in ceph 1...
Jiang Yu
 
