Activity

From 10/08/2020 to 11/06/2020

11/06/2020

04:39 PM Backport #40449: nautilus: "no available blob id" assertion might occur
@Alexander - it might make sense to open a new bug in the Bluestore project for that, since this one is closed. Nathan Cutler
03:38 PM Backport #40449: nautilus: "no available blob id" assertion might occur
Alexander Patrakov wrote:
> Nathan Cutler wrote:
> > This update was made using the script "backport-resolve-issue"...
Alexander Patrakov
02:37 PM Backport #40449: nautilus: "no available blob id" assertion might occur
Nathan Cutler wrote:
> This update was made using the script "backport-resolve-issue".
> backport PR https://github...
Alexander Patrakov
12:36 PM Bug #48036: bluefs corrupted in a OSD
> Unfortunately, after capturing logs, this problem hasn't been reproduced.
More precisely, with setting `debug-bl...
Satoru Takeuchi
11:14 AM Bug #48036: bluefs corrupted in a OSD
> > Your initial analysis about ino 26 being removed and later reused is very helpful and indicative. Wondering if th... Satoru Takeuchi

11/05/2020

07:59 PM Backport #47894 (Resolved): nautilus: Compressed blobs lack checksums
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37843
m...
Nathan Cutler
05:19 PM Backport #47894: nautilus: Compressed blobs lack checksums
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37843
merged
Yuri Weinstein
07:59 PM Backport #47707 (Resolved): nautilus: Potential race condition regression around new OSD flock()s
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37842
m...
Nathan Cutler
05:19 PM Backport #47707: nautilus: Potential race condition regression around new OSD flock()s
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37842
merged
Yuri Weinstein
05:59 PM Backport #46008 (Resolved): nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/...
This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37824
m...
Nathan Cutler
05:18 PM Backport #46008: nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37824
merged
Yuri Weinstein

11/04/2020

02:12 PM Backport #46194 (In Progress): nautilus: BlueFS replay log grows without end
Nathan Cutler
11:38 AM Backport #46194: nautilus: BlueFS replay log grows without end
Fixed by https://github.com/ceph/ceph/pull/37948 Adam Kupczyk

11/03/2020

11:26 AM Backport #48094 (Resolved): octopus: Hybrid allocator might segfault when fallback allocator is p...
https://github.com/ceph/ceph/pull/38428 Nathan Cutler
11:26 AM Backport #48093 (Resolved): nautilus: Hybrid allocator might segfault when fallback allocator is ...
https://github.com/ceph/ceph/pull/38637 Nathan Cutler
11:25 AM Backport #48092 (Rejected): mimic: Hybrid allocator might segfault when fallback allocator is pre...
Nathan Cutler
02:09 AM Bug #48025: osd start up failed when osd superblock crc fail
Igor Fedotov wrote:
> Bo Zhang wrote:
> > Another bug also appears on the same node. (https://tracker.ceph.com/issue...
Bo Zhang

11/02/2020

10:37 PM Backport #46194 (Need More Info): nautilus: BlueFS replay log grows without end
first attempted backport - https://github.com/ceph/ceph/pull/37833 - was closed Nathan Cutler
03:34 PM Bug #48036: bluefs corrupted in a OSD
@Satoru,
given you're able to reproduce the issue locally would you be able to collect OSD log (with debug-bluefs = ...
Igor Fedotov
12:50 PM Bug #48036: bluefs corrupted in a OSD
As you suspect, `bluefs-bdev-expand` seems to be the first sensor. After running the reproducer with my custom Rook, ... Satoru Takeuchi
12:25 PM Bug #48036: bluefs corrupted in a OSD
> Nevertheless I'm not completely sure whether bluefs-bdev-expand is a trigger for the issue or it's just the first "... Satoru Takeuchi
12:10 PM Bug #48036: bluefs corrupted in a OSD
Hi Satoru,
thanks for the update.
Nevertheless I'm not completely sure whether bluefs-bdev-expand is a trigger for ...
Igor Fedotov
11:27 AM Bug #48036: bluefs corrupted in a OSD
I succeeded in reproducing this problem in my Rook/Ceph cluster.
https://github.com/rook/rook/issues/6530
I gue...
Satoru Takeuchi
12:01 AM Bug #48036: bluefs corrupted in a OSD
> As far as I can see, you were attempting to expand the DB volume, weren't you? Any rationale for that?
> Wasn't that a vo...
Satoru Takeuchi
02:18 PM Bug #48070 (New): Wrong bluefs db usage value (doubled) returned by `perf dump` when option `blue...
During some tests we discovered that OSD db usage returned by `ceph daemon osd.num perf dump` tool is twice the real ... Kinga Karczewska
12:26 PM Bug #47751 (Pending Backport): Hybrid allocator might segfault when fallback allocator is present
Igor Fedotov
11:35 AM Bug #48025: osd start up failed when osd superblock crc fail
Bo Zhang wrote:
> Another bug also appears on the same node. (https://tracker.ceph.com/issues/48061)
This another ...
Igor Fedotov
02:38 AM Bug #48025: osd start up failed when osd superblock crc fail
Another bug also appears on the same node. (https://tracker.ceph.com/issues/48061)
Bo Zhang
02:09 AM Bug #48025: osd start up failed when osd superblock crc fail
Igor Fedotov wrote:
> Bo Jang, I haven't got your last comments on disabled WAL, please elaborate.
>
> From Rocks...
Bo Zhang
02:34 AM Bug #48061 (New): .sst block checksum mismatch
【version】
14.2.8
【trigger operation】
Under normal operation of the cluster, power down the equipment manually, and...
Bo Zhang

11/01/2020

05:12 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Starting rebuild of osd.0. Jamin Collins
06:56 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Will start the recreation of osd.0 tomorrow (roughly 10 hours from now). Will check this bug report before doing so. Jamin Collins

10/30/2020

10:56 AM Bug #48036: bluefs corrupted in a OSD
Igor Fedotov wrote:
>
> Please set debug-bluestore & debug-bluefs to 20 and collect OSD startup log.
Never mind...
Igor Fedotov
10:41 AM Bug #48036: bluefs corrupted in a OSD
As far as I can see, you were attempting to expand the DB volume, weren't you? Any rationale for that?
Wasn't that a volum...
Igor Fedotov
10:41 AM Bug #48036: bluefs corrupted in a OSD
Both
https://tracker.ceph.com/issues/46886
and https://github.com/ceph/ceph/pull/36745
were following up the http...
Igor Fedotov
10:28 AM Bug #48025: osd start up failed when osd superblock crc fail
Bo Jang, I haven't got your last comments on disabled WAL, please elaborate.
From RocksDB config line I don't see ...
Igor Fedotov
10:13 AM Bug #48047: osd: fix bluestore stupid allocator
IMO bdev_block_size should be marked with FLAG_STARTUP (or even FLAG_CREATE) and hence protected from the modificatio... Igor Fedotov
03:33 AM Bug #48047 (Rejected): osd: fix bluestore stupid allocator
In StupidAllocator::_choose_bin, it uses cct->_conf->bdev_block_size, which can be changed while the allocator is running, but... yantao xue

10/29/2020

10:59 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
I'm planning to zap and rebuild the OSD (`osd.0`) this weekend. Please let me know if there's any information you'd ... Jamin Collins
02:35 PM Bug #47330 (Fix Under Review): ceph-osd can't start when CURRENT file does not end with newline o...
Neha Ojha
02:33 PM Bug #47453 (Can't reproduce): checksum failures lead to assert on OSD shutdown in lab tests
Neha Ojha
02:26 PM Bug #47874 (Need More Info): Allocation error even though the block has 50 GB free
Neha Ojha
02:24 PM Bug #47883 (Need More Info): bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r...
Still waiting for https://tracker.ceph.com/issues/47883#note-5 Neha Ojha
06:20 AM Bug #48036 (Closed): bluefs corrupted in a OSD
I hit a problem that is very similar to the following issue/PR in v15.2.4.
upgrade/nautilus-x-master: bluefs mount...
Satoru Takeuchi
01:37 AM Bug #48025: osd start up failed when osd superblock crc fail
Bo Zhang wrote:
> Igor Fedotov wrote:
> > Just in case - don't you have any custom settings for RocksDB, e.g. disab...
Bo Zhang
01:36 AM Bug #48025: osd start up failed when osd superblock crc fail
Igor Fedotov wrote:
> Just in case - don't you have any custom settings for RocksDB, e.g. disabled WAL?
NOT disab...
Bo Zhang
01:30 AM Bug #48025: osd start up failed when osd superblock crc fail
Igor Fedotov wrote:
> Just in case - don't you have any custom settings for RocksDB, e.g. disabled WAL?
Has been ...
Bo Zhang

10/28/2020

03:16 PM Bug #46490: osds crashing during deep-scrub
It seems that the ceph-bluestore-tool repair only temporarily resolves the issue for us.
We ran the repair tool on e...
Lawrence Smith
10:51 AM Bug #48025: osd start up failed when osd superblock crc fail
Just in case - don't you have any custom settings for RocksDB, e.g. disabled WAL? Igor Fedotov
09:52 AM Bug #48025 (New): osd start up failed when osd superblock crc fail
【version】
14.2.8
【trigger operation】
Under normal operation of the cluster, power down the equipment ...
Bo Zhang
10:47 AM Bug #47985: When WAL is closed, osd cannot be restarted
I doubt it will work this way as there would be no onode's metadata consistency guarantee any more... In your case su... Igor Fedotov
06:04 AM Bug #47985: When WAL is closed, osd cannot be restarted
Hi Igor:
1. We've found that disabling WAL reduces latency (measured by P99.9 latency), as we've tested rgw put worklo...
Jiaying Ren

10/27/2020

02:42 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
> At the moment I don't see anything else one can retrieve from this daemon. But I suggest keeping it for additional... Jamin Collins
09:48 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Jamin Collins wrote:
> Nothing suspicious about it either. It's the DB device for all the OSDs on that host and is ...
Igor Fedotov
09:39 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Jamin Collins wrote:
> Also, should I continue to keep the OSD in its failed state, is there any information that ca...
Igor Fedotov
09:37 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Jamin Collins wrote:
> What about the error coinciding precisely with the log volume filling? Any chance that's the...
Igor Fedotov
11:11 AM Bug #47985: When WAL is closed, osd cannot be restarted
In addition, I have also tried to deploy osd first, and then modify the bluestore_rocksdb_options in the configuratio... jiaxu li
11:03 AM Bug #47985: When WAL is closed, osd cannot be restarted
In some application scenarios, I want to close wal in order to get lower latency and higher IOPS. After closing wal, ... jiaxu li
09:22 AM Bug #47985: When WAL is closed, osd cannot be restarted
I haven't investigated this deeper but what's the rationale to disableWAL? Generally this introduces a breach to data... Igor Fedotov
01:40 AM Bug #47985: When WAL is closed, osd cannot be restarted
The detailed steps to deploy the cluster are as follows:
1. deploy a cluster without osd
```
MON=1 OSD=0 MDS=0 MGR...
jiaxu li
09:47 AM Backport #47892 (In Progress): octopus: Compressed blobs lack checksums
Nathan Cutler
09:46 AM Backport #47708 (In Progress): octopus: Potential race condition regression around new OSD flock()s
Nathan Cutler
08:36 AM Backport #47894 (In Progress): nautilus: Compressed blobs lack checksums
Nathan Cutler
08:35 AM Backport #47707 (In Progress): nautilus: Potential race condition regression around new OSD flock()s
Nathan Cutler
08:27 AM Backport #47669 (Need More Info): nautilus: Some structs aren't bound to mempools properly
Not immediately clear how to backport this. Nathan Cutler
08:07 AM Backport #46194 (In Progress): nautilus: BlueFS replay log grows without end
Nathan Cutler

10/26/2020

10:31 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Also, should I continue to keep the OSD in its failed state, is there any information that can be retrieved from it t... Jamin Collins
09:57 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Nothing suspicious about it either. It's the DB device for all the OSDs on that host and is the same as in the previ... Jamin Collins
09:25 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
So there is no spillover to main(HDD) device. Hence the issue is rather not related to this device.
Anything suspi...
Igor Fedotov
05:39 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
> Additionally for OSDs running on the same host (I presume you haven't restarted them for a while, have you?) please... Jamin Collins
05:16 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
Could you please provide a report for ceph-bluestore-tool's bluefs-bdev-sizes command. Wondering if this OSD has any ... Igor Fedotov
04:42 PM Bug #48002 (New): Compaction error: Corruption: block checksum mismatch:
I appear to have run into https://tracker.ceph.com/issues/37282 again.
Same AMD based host...
Jamin Collins
09:55 PM Backport #46599 (Resolved): octopus: Rescue procedure for extremely large bluefs log
Nathan Cutler
09:51 PM Backport #46008 (In Progress): nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentati...
Nathan Cutler
12:04 PM Backport #47671 (In Progress): octopus: Hybrid allocator might cause duplicate admin socket comma...
https://github.com/ceph/ceph/pull/37794 Igor Fedotov
12:03 PM Backport #47672 (In Progress): nautilus: Hybrid allocator might cause duplicate admin socket comm...
https://github.com/ceph/ceph/pull/37793 Igor Fedotov
11:18 AM Bug #47985 (Need More Info): When WAL is closed, osd cannot be restarted
It's not clear what you meant by "close bluestore wal during deployment, and place bluestore wal/db and block o... Igor Fedotov
09:57 AM Bug #47985 (Need More Info): When WAL is closed, osd cannot be restarted
Compile the master branch source code, use vstart to deploy the cluster, close bluestore wal during deployment, and p... jiaxu li
10:52 AM Bug #38272: "no available blob id" assertion might occur
Nathan Cutler wrote:
> Jiang Yu wrote:
> > I encountered the same problem in ceph 12.2.2, but found that there is n...
Igor Fedotov
10:36 AM Bug #38272: "no available blob id" assertion might occur
Jiang Yu wrote:
> I encountered the same problem in ceph 12.2.2, but found that there is no patch available in ceph ...
Nathan Cutler
01:24 AM Bug #38272: "no available blob id" assertion might occur
Hello everyone,
I encountered the same problem in ceph 12.2.2, but found that there is no patch available in ceph 1...
Jiang Yu

10/20/2020

12:43 PM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
Igor Fedotov wrote:
> chunsong feng wrote:
> Hi Feng,
>
> you're using master Ceph branch, right?
right
> Am I...
chunsong feng
10:56 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
Also prior to redeploying the cluster/OSDs you might want to learn current fragmentation rating using:
ceph-bluestor...
Igor Fedotov
10:38 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
chunsong feng wrote:
Hi Feng,
you're using master Ceph branch, right?
Do I understand correctly that the OSD is unable to res...
Igor Fedotov
01:01 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
Tested with allocator=stupid and allocator=hybrid, respectively.
When creating an OSD, only data is used. Block-wal and block-db are n...
chunsong feng
11:40 AM Bug #47874: Allocation error even though the block has 50 GB free
Hi Fabian,
Bluestore is unable to allocate space for additional BlueFS data. This line:
2020-10-15 18:36:36.526 7f8...
Igor Fedotov
11:24 AM Bug #45519: OSD asserts during block allocation for BlueFS
You might want to use free-dump command to inspect actual free extents layout:
ceph-bluestore-tool --path <> free-...
Igor Fedotov

10/19/2020

08:33 AM Backport #47895 (Rejected): mimic: Compressed blobs lack checksums
Nathan Cutler
08:33 AM Backport #47894 (Resolved): nautilus: Compressed blobs lack checksums
https://github.com/ceph/ceph/pull/37843 Nathan Cutler
08:33 AM Backport #47893 (Rejected): luminous: Compressed blobs lack checksums
Nathan Cutler
08:33 AM Backport #47892 (Resolved): octopus: Compressed blobs lack checksums
https://github.com/ceph/ceph/pull/37861 Nathan Cutler
02:16 AM Bug #47883: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
The problem recurs within 10 minutes after a test is performed on more than 32 KB concurrent random writes. chunsong feng
01:38 AM Bug #47883 (Resolved): bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0)
2020-10-17T18:02:17.658+0800 ffff7eaac980 1 bluefs _allocate failed to allocate 0x4d0000 on bdev 1, free 0x19c210800... chunsong feng
01:31 AM Bug #47243: bluefs _allocate failed then assert
2020-10-17T18:02:17.658+0800 ffff7eaac980 1 bluefs _allocate failed to allocate 0x4d0000 on bdev 1, free 0x19c210800... chunsong feng

10/18/2020

03:14 PM Bug #45519: OSD asserts during block allocation for BlueFS
I just redeployed an OSD that was created at 2020-10-16 16:35:49.954951487 +0000, it is currently Sun Oct 18 15:13:07... Mohammed Naser

10/16/2020

04:19 PM Bug #45519: OSD asserts during block allocation for BlueFS
> Are you using v14.2.11 from the beginning or OSDs suffering from high fragmentation were deployed (and used) with t... Mohammed Naser
03:05 PM Bug #45519: OSD asserts during block allocation for BlueFS
Mohammed Naser wrote:
> Thanks for responding, this is actually running Nautilus with the default allocator (so `hyb...
Igor Fedotov
02:35 PM Bug #45519: OSD asserts during block allocation for BlueFS
Thanks for responding, this is actually running Nautilus with the default allocator (so `hybrid`). Perhaps we should... Mohammed Naser
04:11 AM Bug #47740: OSD crash when increase pg_num
Igor Fedotov wrote:
> Definitely it's not expected to crash but from my experience it's a bad practice to put DB dat...
玮文 胡

10/15/2020

06:46 PM Bug #47874 (Need More Info): Allocation error even though the block has 50 GB free
We started seeing our Ceph cluster going down when an OSD hit 20 GB usage. Currently all our OSDs have 70 GB disks attache... Fabian Faßbender
02:59 PM Bug #46658 (Rejected): Ceph-OSD nautilus/octopus memory leak ?
Igor Fedotov
02:59 PM Bug #46658: Ceph-OSD nautilus/octopus memory leak ?
No problem. Igor Fedotov
02:51 PM Bug #46658: Ceph-OSD nautilus/octopus memory leak ?
Thanks for your help :) Christophe Hauquiert
02:46 PM Bug #46658: Ceph-OSD nautilus/octopus memory leak ?
Igor Fedotov wrote:
> Christophe Hauquiert wrote:
> > I just realized that osd_memory_target was set to ~20GB (a...
Christophe Hauquiert
02:13 PM Bug #46658: Ceph-OSD nautilus/octopus memory leak ?
Christophe Hauquiert wrote:
> I just realized that osd_memory_target was set to ~20GB (as Adam Kupczyk already no...
Igor Fedotov
02:32 PM Bug #45519: OSD asserts during block allocation for BlueFS
Mohammed Naser wrote:
> I did a whole bunch of digging today and found out that for some reason, there is a non-triv...
Igor Fedotov
01:59 PM Bug #47740: OSD crash when increase pg_num
玮文 胡 wrote:
> Igor Fedotov wrote:
> > 1) Are osd_op_thread's timeouts observed in OSD log after its restart? If...
Igor Fedotov
06:28 AM Bug #47661: Cannot allocate memory appears when using io_uring osd
The kernel panic problem can be solved by upgrading to 5.4.0-49.
But ceph osd will crash abnormally after running fo...
Jiang Yu
06:22 AM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
> Besides, recently we switched back to direct IO for bluefs, see https://github.com/ceph/ceph/pull/34297
> Likely ...
Eric Petit

10/14/2020

02:26 PM Bug #47475 (Pending Backport): Compressed blobs lack checksums
Igor Fedotov
01:50 AM Bug #47661: Cannot allocate memory appears when using io_uring osd
[Tue Oct 13 10:26:03 2020] pps pps0: new PPS source ptp2
[Tue Oct 13 10:26:03 2020] ixgbe 0000:04:00.0: registere...
Jiang Yu

10/13/2020

04:26 PM Bug #47740: OSD crash when increase pg_num
Igor Fedotov wrote:
> 1) Are osd_op_thread's timeouts observed in OSD log after its restart? If so - could you p...
玮文 胡

10/10/2020

11:41 AM Bug #47740: OSD crash when increase pg_num
Neha Ojha wrote:
> Igor, does this look like one of the other _trim_to crashes we had seen in our teuthology runs an...
Igor Fedotov

10/09/2020

09:44 PM Bug #47740: OSD crash when increase pg_num
Igor, does this look like one of the other _trim_to crashes we had seen in our teuthology runs and are now fixed by y... Neha Ojha
 
