Activity
From 10/25/2020 to 11/23/2020
11/23/2020
- 06:32 PM Bug #48036: bluefs corrupted in a OSD
- @Satoru,
could you please reproduce the issue once again, now with both debug_bluefs set to 20 and debug_bluestore s...
- 06:06 PM Bug #48036: bluefs corrupted in a OSD
- Satoru Takeuchi wrote:
> @Igor
>
> Do you have any progress?
Hi Satoru,
sorry for a long response.
At the se...
- 04:13 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- v14.2.11 has got hybrid allocator enabled but bluestore_volume_selection_policy was still at original there. Hence th...
- 03:00 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
I got the same issue in nautilus 14.2.11
it happened four times on different nodes..
- 02:49 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Igor Fedotov wrote:
> Thanks everybody for updates. Yeah I understand all the complexities for the debugging this so...
- 02:06 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Thanks everybody for updates. Yeah I understand all the complexities for the debugging this sort of issues in a produ...
- 01:55 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- I got the same issue in nautilus 14.2.14
it happened four times on different nodes..
- 01:01 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- I got the same issue in nautilus 14.2.14
Full trace: https://paste.ubuntu.com/p/4KHcCG9YQx/
- 12:49 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Hi Igor,
thanks for answering.
The thing is:
- Issue isn't reproducible
- Happens on Production Systems.
...
- 12:45 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- Meanwhile I see no way to troubleshoot this unless one is able to repro the issue with debug-bdev set to 20.
- 12:43 PM Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len))
- The following patch once merged [and backported] will provide more insight on the issue's root cause.
https://gith...
- 01:46 PM Documentation #23443 (Resolved): doc: object -> file -> disk is wrong for bluestore
11/20/2020
- 02:40 AM Bug #48036: bluefs corrupted in a OSD
- @Igor
Do you have any progress?
11/19/2020
- 01:30 PM Bug #48070: Wrong bluefs db usage value (doubled) returned by `perf dump` when option `bluestore_...
- `max_total_wal_size` is discussed here: https://github.com/ceph/ceph/pull/35277
- 01:29 PM Bug #48070: Wrong bluefs db usage value (doubled) returned by `perf dump` when option `bluestore_...
- As it turned out, this was caused by small size of WAL (`bluestore block wal size`) and the fact I did not set `max_t...
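For context, the two settings this entry refers to live at different levels: one sizes the BlueFS WAL device itself, while `max_total_wal_size` is a RocksDB option that has to be passed through `bluestore_rocksdb_options`. A minimal ceph.conf sketch (values are illustrative only, not recommendations, and overriding `bluestore_rocksdb_options` replaces the default option string):

```ini
[osd]
# Size of the BlueFS WAL partition in bytes; the value is only an example.
bluestore_block_wal_size = 1073741824
# max_total_wal_size is a RocksDB option, so it is passed through
# bluestore_rocksdb_options rather than being a Ceph option of its own.
bluestore_rocksdb_options = max_total_wal_size=1073741824
```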
- 07:07 AM Documentation #23443: doc: object -> file -> disk is wrong for bluestore
- https://github.com/ceph/ceph/pull/38181 entered
- 04:16 AM Documentation #24075: Bluestore and Bluefs Config Reference
- That document currently lists a number of BlueStore configuration options. Emre, is there anything specific that you ...
- 03:03 AM Fix #48288 (Need More Info): test/objectstore: allocate function may return -ENOSPC
- test/objectstore: allocate function may return -ENOSPC
11/18/2020
- 09:53 PM Backport #48282 (Resolved): nautilus: osd: fix bluestore bitmap allocator
- https://github.com/ceph/ceph/pull/39708
- 09:53 PM Backport #48281 (Resolved): octopus: osd: fix bluestore bitmap allocator
- https://github.com/ceph/ceph/pull/38430
- 01:45 PM Bug #48276 (Duplicate): OSD Crash with ceph_assert(is_valid_io(off, len))
- Hello,
Last night one OSD in a 3-node cluster crashed with the attached crash report. I can pretty much rule out Ha...
- 09:07 AM Fix #48272 (Fix Under Review): osd: fix bluestore avl allocator
- 08:21 AM Fix #48272 (Resolved): osd: fix bluestore avl allocator
- In _block_picker, the first rs == t.begin() can also tell we've searched the whole tree
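The termination condition this entry describes can be sketched in simplified form (Python here rather than Ceph's actual C++; the names and data layout are illustrative only, not the real `_block_picker` implementation):

```python
# A simplified sketch of a cyclic first-fit search over sorted free extents,
# illustrating the termination condition discussed above: once the wrap-around
# brings the scan back to its starting point, the whole tree has been searched
# and the allocator must give up rather than loop forever.
import bisect

def block_picker(extents, cursor, size):
    """extents: sorted list of (offset, length) free extents.
    Returns (offset, new_cursor) on success, or (None, cursor) if no
    extent is large enough after a full pass over the tree."""
    offsets = [off for off, _ in extents]
    start = bisect.bisect_left(offsets, cursor)
    i, wrapped = start, False
    while not (wrapped and i == start):
        if i == len(extents):
            if wrapped:
                break              # search started past the end; full pass done
            wrapped, i = True, 0   # wrap around to the beginning once
            continue
        off, length = extents[i]
        if length >= size:
            return off, off + size
        i += 1
    return None, cursor            # whole tree searched, nothing fits
```

The point is the `wrapped and i == start` check: without it, a search that starts at the beginning of the tree, or one that finds nothing large enough, could scan indefinitely.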
11/17/2020
- 01:32 AM Bug #48256 (Can't reproduce): Many4KWritesNoCSumTest fails on nautilus [ FAILED ] ObjectStore/S...
- /a/bhubbard-2020-11-16_08:11:06-rados-wip-nautilus-badone-testing-2-distro-basic-smithi/5630856...
11/16/2020
- 02:05 AM Bug #38554: ObjectStore/StoreTestSpecificAUSize.TooManyBlobsTest/2 fail, Expected: (res_stat.allo...
- Once again on nautilus HEAD + 1 patch that is unrelated.
/ceph/teuthology-archive/bhubbard-2020-11-13_00:57:56-rad...
11/13/2020
- 08:32 PM Bug #42928: ceph-bluestore-tool bluefs-bdev-new-db does not update lv tags
- Quick question, what if the bluestore device is not an lvm device? All my devices were created with luminous with c...
11/12/2020
- 10:45 PM Bug #48218 (Can't reproduce): ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompressionAlgor...
- ...
- 05:40 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Eric Petit wrote:
> > @Eric How is that @bluefs_buffered_io = true@ working for you? We are considering to re-enabl...
- 05:14 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- > @Eric How is that @bluefs_buffered_io = true@ working for you? We are considering to re-enable it to help workarou...
- 05:21 PM Bug #48216: Spanning blobs list might have zombie blobs that aren't of use any more
- Related PR to detect leaked spanning blobs and fix with fsck: https://github.com/ceph/ceph/pull/38050
- 05:15 PM Bug #48216 (New): Spanning blobs list might have zombie blobs that aren't of use any more
- As reported at https://tracker.ceph.com/issues/40449#note-9 users are still facing "no blob id" assertion. Provided l...
- 05:17 PM Backport #40449: nautilus: "no available blob id" assertion might occur
- Nathan Cutler wrote:
> @Alexander - it might make sense to open a new bug in the Bluestore project for that, since t...
- 02:57 PM Bug #48214 (Resolved): osd: fix bluestore bitmap allocator
- bluestore bitmap allocator calculate wrong last_pos with hint, for hint is a bdev physical addr.
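The unit mismatch described here can be sketched as follows (a simplified illustration, not Ceph's code; the allocation-unit size and the function name are invented for this example):

```python
ALLOC_UNIT = 4096  # bytes per allocation unit; an assumed example value

def hint_to_last_pos(hint_bytes, alloc_unit=ALLOC_UNIT):
    """Convert a bdev physical byte address (the hint) into the allocator's
    unit-based position. Using the raw byte address as last_pos would point
    far beyond the intended unit, which is the mistake this entry describes."""
    return hint_bytes // alloc_unit
```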
11/11/2020
- 02:21 PM Backport #48194 (Resolved): octopus: bufferlist c_str() sometimes clears assignment to mempool
- https://github.com/ceph/ceph/pull/38429
- 02:21 PM Backport #48193 (Resolved): nautilus: bufferlist c_str() sometimes clears assignment to mempool
- https://github.com/ceph/ceph/pull/39651
- 02:18 PM Bug #46027 (Pending Backport): bufferlist c_str() sometimes clears assignment to mempool
- 09:36 AM Bug #46027: bufferlist c_str() sometimes clears assignment to mempool
- It could be beneficial to backport this fix to octopus and nautilus. I suspect that it fixes a rebuild leak to the bu...
- 12:27 PM Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion
- Eric Petit wrote:
> > Besides recently we switched backed to direct IO for bluefs, see https://github.com/ceph/ceph/...
11/09/2020
- 06:42 AM Bug #48036: bluefs corrupted in a OSD
- > given you're able to reproduce the issue locally would you be able to collect OSD log (with debug-bluefs = 20) BEFO...
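The debug level requested here is typically raised via a ceph.conf fragment like the following (a sketch; whether to set it clusterwide or only on the affected OSD depends on the situation):

```ini
[osd]
# Verbose BlueFS/BlueStore logging for the reproduction attempt.
debug_bluefs = 20
debug_bluestore = 20
```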
11/06/2020
- 04:39 PM Backport #40449: nautilus: "no available blob id" assertion might occur
- @Alexander - it might make sense to open a new bug in the Bluestore project for that, since this one is closed.
- 03:38 PM Backport #40449: nautilus: "no available blob id" assertion might occur
- Alexander Patrakov wrote:
> Nathan Cutler wrote:
> > This update was made using the script "backport-resolve-issue"... - 02:37 PM Backport #40449: nautilus: "no available blob id" assertion might occur
- Nathan Cutler wrote:
> This update was made using the script "backport-resolve-issue".
> backport PR https://github... - 12:36 PM Bug #48036: bluefs corrupted in a OSD
- > Unfortunately, after capturing logs, this problem hasn't been reproduced.
More precisely, with setting `debug-bl...
- 11:14 AM Bug #48036: bluefs corrupted in a OSD
- > > Your initial analysis about ino 26 being removed and later reused is very helpful and indicative. Wondering if th...
11/05/2020
- 07:59 PM Backport #47894 (Resolved): nautilus: Compressed blobs lack checksums
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37843
m...
- 05:19 PM Backport #47894: nautilus: Compressed blobs lack checksums
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37843
merged
- 07:59 PM Backport #47707 (Resolved): nautilus: Potential race condition regression around new OSD flock()s
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37842
m...
- 05:19 PM Backport #47707: nautilus: Potential race condition regression around new OSD flock()s
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37842
merged
- 05:59 PM Backport #46008 (Resolved): nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/...
- This update was made using the script "backport-resolve-issue".
backport PR https://github.com/ceph/ceph/pull/37824
m...
- 05:18 PM Backport #46008: nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentation/2 failed
- Nathan Cutler wrote:
> https://github.com/ceph/ceph/pull/37824
merged
11/04/2020
- 02:12 PM Backport #46194 (In Progress): nautilus: BlueFS replay log grows without end
- 11:38 AM Backport #46194: nautilus: BlueFS replay log grows without end
- Fixed by https://github.com/ceph/ceph/pull/37948
11/03/2020
- 11:26 AM Backport #48094 (Resolved): octopus: Hybrid allocator might segfault when fallback allocator is p...
- https://github.com/ceph/ceph/pull/38428
- 11:26 AM Backport #48093 (Resolved): nautilus: Hybrid allocator might segfault when fallback allocator is ...
- https://github.com/ceph/ceph/pull/38637
- 11:25 AM Backport #48092 (Rejected): mimic: Hybrid allocator might segfault when fallback allocator is pre...
- 02:09 AM Bug #48025: osd start up failed when osd superblock crc fail
- Igor Fedotov wrote:
> Bo Zhang wrote:
> > Another bug also appears on the same node.(https://tracker.ceph.com/issue...
11/02/2020
- 10:37 PM Backport #46194 (Need More Info): nautilus: BlueFS replay log grows without end
- first attempted backport - https://github.com/ceph/ceph/pull/37833 - was closed
- 03:34 PM Bug #48036: bluefs corrupted in a OSD
- @Satoru,
given you're able to reproduce the issue locally would you be able to collect OSD log (with debug-bluefs = ...
- 12:50 PM Bug #48036: bluefs corrupted in a OSD
- As you suspect, `bluefs-bdev-expand` seems to be the first sensor. After running the reproducer with my custom Rook, ...
- 12:25 PM Bug #48036: bluefs corrupted in a OSD
- 12:25 PM Bug #48036: bluefs corrupted in a OSD
- > Nevertheless I'm not completely sure whether bluefs-bdev-expand is a trigger for the issue or it's just the first "...
- 12:10 PM Bug #48036: bluefs corrupted in a OSD
- 12:10 PM Bug #48036: bluefs corrupted in a OSD
- Hi Satoru,
thanks for the update.
Nevertheless I'm not completely sure whether bluefs-bdev-expand is a trigger for ...
- 11:27 AM Bug #48036: bluefs corrupted in a OSD
- I succeeded in reproducing this problem in my Rook/Ceph cluster.
https://github.com/rook/rook/issues/6530
I gue...
- 12:01 AM Bug #48036: bluefs corrupted in a OSD
- > As far as I can see you're attempting to expand DB volume, weren't you? Any rationale for that?
> Wasn't that a vo...
- 02:18 PM Bug #48070 (New): Wrong bluefs db usage value (doubled) returned by `perf dump` when option `blue...
- During some tests we discovered that OSD db usage returned by `ceph daemon osd.num perf dump` tool is twice the real ...
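For reference, the counter in question can be read from the `perf dump` output along these lines (a minimal sketch; the JSON sample below is invented for illustration and the field names are assumptions based on this entry):

```python
import json

# Invented sample of the "bluefs" section of `ceph daemon osd.N perf dump`;
# field names are assumptions, values are made up for illustration.
sample = '{"bluefs": {"db_total_bytes": 64424509440, "db_used_bytes": 4294967296}}'

def db_usage(perf_dump_json: str) -> float:
    """Return BlueFS DB usage as a fraction of the DB volume size."""
    bluefs = json.loads(perf_dump_json)["bluefs"]
    return bluefs["db_used_bytes"] / bluefs["db_total_bytes"]

print(f"db usage: {db_usage(sample):.1%}")
```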
- 12:26 PM Bug #47751 (Pending Backport): Hybrid allocator might segfault when fallback allocator is present
- 11:35 AM Bug #48025: osd start up failed when osd superblock crc fail
- Bo Zhang wrote:
> Another bug also appears on the same node.(https://tracker.ceph.com/issues/48061)
This another ...
- 02:38 AM Bug #48025: osd start up failed when osd superblock crc fail
- Another bug also appears on the same node.(https://tracker.ceph.com/issues/48061)
- 02:09 AM Bug #48025: osd start up failed when osd superblock crc fail
- Igor Fedotov wrote:
> Bo Zhang, I haven't got your last comments on disabled WAL, please elaborate.
>
> From Rocks...
- 02:34 AM Bug #48061 (New): .sst block checksum mismatch
- [version]
14.2.8
[trigger operation]
Under normal operation of the cluster, power down the equipment manually, and...
11/01/2020
- 05:12 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- Starting rebuild of osd.0.
- 06:56 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- Will start the recreation of osd.0 tomorrow (roughly 10 hours from now). Will check this bug report before doing so.
10/30/2020
- 10:56 AM Bug #48036: bluefs corrupted in a OSD
- Igor Fedotov wrote:
>
> Please set debug-bluestore & debug-bluefs to 20 and collect OSD startup log.
Never mind...
- 10:41 AM Bug #48036: bluefs corrupted in a OSD
- As far as I can see you're attempting to expand DB volume, weren't you? Any rationale for that?
Wasn't that a volum...
- 10:41 AM Bug #48036: bluefs corrupted in a OSD
- Both
https://tracker.ceph.com/issues/46886
and https://github.com/ceph/ceph/pull/36745
were following up the http...
- 10:28 AM Bug #48025: osd start up failed when osd superblock crc fail
- Bo Zhang, I haven't got your last comments on disabled WAL, please elaborate.
From RocksDB config line I don't see ...
- 10:13 AM Bug #48047: osd: fix bluestore stupid allocator
- IMO bdev_block_size should be marked with FLAG_STARTUP (or even FLAG_CREATE) and hence protected from the modificatio...
- 03:33 AM Bug #48047 (Rejected): osd: fix bluestore stupid allocator
- In StupidAllocator::_choose_bin, it uses cct->_conf->bdev_block_size, which can be changed while the allocator is running, but...
10/29/2020
- 10:59 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- I'm planning to zap and rebuild the OSD (@osd.0@) this weekend. Please let me know if there's any information you'd ...
- 02:35 PM Bug #47330 (Fix Under Review): ceph-osd can't start when CURRENT file does not end with newline o...
- 02:33 PM Bug #47453 (Can't reproduce): checksum failures lead to assert on OSD shutdown in lab tests
- 02:26 PM Bug #47874 (Need More Info): Allocation error even though the block has 50 GB free
- 02:24 PM Bug #47883 (Need More Info): bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r...
- Still waiting for https://tracker.ceph.com/issues/47883#note-5
- 06:20 AM Bug #48036 (Closed): bluefs corrupted in a OSD
- I hit a problem that is very similar to the following issue/PR in v15.2.4.
upgrade/nautilus-x-master: bluefs mount...
- 01:37 AM Bug #48025: osd start up failed when osd superblock crc fail
- Bo Zhang wrote:
> Igor Fedotov wrote:
> > Just in case - don't you have any custom settings for RocksDB, e.g. disab...
- 01:36 AM Bug #48025: osd start up failed when osd superblock crc fail
- Igor Fedotov wrote:
> Just in case - don't you have any custom settings for RocksDB, e.g. disabled WAL?
NOT disab...
- 01:30 AM Bug #48025: osd start up failed when osd superblock crc fail
- Igor Fedotov wrote:
> Just in case - don't you have any custom settings for RocksDB, e.g. disabled WAL?
Has been ...
10/28/2020
- 03:16 PM Bug #46490: osds crashing during deep-scrub
- It seems that the ceph-bluestore-tool repair only temporarily resolves the issue for us.
We ran the repair tool on e...
- 10:51 AM Bug #48025: osd start up failed when osd superblock crc fail
- Just in case - don't you have any custom settings for RocksDB, e.g. disabled WAL?
- 09:52 AM Bug #48025 (New): osd start up failed when osd superblock crc fail
- [version]
14.2.8
[trigger operation]
Under normal operation of the cluster, power down the equipment ...
- 10:47 AM Bug #47985: When WAL is closed, osd cannot be restarted
- I doubt it will work this way, as there would no longer be any onode metadata consistency guarantee... In your case su...
- 06:04 AM Bug #47985: When WAL is closed, osd cannot be restarted
- Hi Igor:
1. we've found disabling WAL would reduce latency (measured by P99.9 latency), as we've tested rgw put worklo...
10/27/2020
- 02:42 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- > At the moment I don't see anything else to one can retrieve from this daemon. But suggest to keep it for additional...
- 09:48 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- Jamin Collins wrote:
> Nothing suspicious about it either. It's the DB device for all the OSDs on that host and is ...
- 09:39 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- Jamin Collins wrote:
> Also, should I continue to keep the OSD in its failed state, is there any information that ca...
- 09:37 AM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- Jamin Collins wrote:
> What about the error coinciding precisely with the log volume filling? Any chance that's the...
- 11:11 AM Bug #47985: When WAL is closed, osd cannot be restarted
- In addition, I have also tried to deploy osd first, and then modify the bluestore_rocksdb_options in the configuratio...
- 11:03 AM Bug #47985: When WAL is closed, osd cannot be restarted
- In some application scenarios, I want to close wal in order to get lower latency and higher IOPS. After closing wal, ...
- 09:22 AM Bug #47985: When WAL is closed, osd cannot be restarted
- I haven't investigated this deeper but what's the rationale to disable WAL? Generally this introduces a breach to data...
- 01:40 AM Bug #47985: When WAL is closed, osd cannot be restarted
- The detailed steps to deploy the cluster are as follows:
1. deploy a cluster without osd
```
MON=1 OSD=0 MDS=0 MGR...
```
- 09:47 AM Backport #47892 (In Progress): octopus: Compressed blobs lack checksums
- 09:46 AM Backport #47708 (In Progress): octopus: Potential race condition regression around new OSD flock()s
- 08:36 AM Backport #47894 (In Progress): nautilus: Compressed blobs lack checksums
- 08:35 AM Backport #47707 (In Progress): nautilus: Potential race condition regression around new OSD flock()s
- 08:27 AM Backport #47669 (Need More Info): nautilus: Some structs aren't bound to mempools properly
- Not immediately clear how to backport this.
- 08:07 AM Backport #46194 (In Progress): nautilus: BlueFS replay log grows without end
10/26/2020
- 10:31 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- Also, should I continue to keep the OSD in its failed state, is there any information that can be retrieved from it t...
- 09:57 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- Nothing suspicious about it either. It's the DB device for all the OSDs on that host and is the same as in the previ...
- 09:25 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- So there is no spillover to main(HDD) device. Hence the issue is rather not related to this device.
Anything suspi...
- 05:39 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- > Additionally for OSDs running on the same host (I presume you haven't restarted them for a while, have you?) please...
- 05:16 PM Bug #48002: Compaction error: Corruption: block checksum mismatch:
- Could you please provide a report for ceph-bluestore-tool's bluefs-bdev-sizes command. Wondering if this OSD has any ...
- 04:42 PM Bug #48002 (New): Compaction error: Corruption: block checksum mismatch:
- I appear to have run into https://tracker.ceph.com/issues/37282 again.
Same AMD based host...
- 09:55 PM Backport #46599 (Resolved): octopus: Rescue procedure for extremely large bluefs log
- 09:51 PM Backport #46008 (In Progress): nautilus: ObjectStore/StoreTestSpecificAUSize.ExcessiveFragmentati...
- 12:04 PM Backport #47671 (In Progress): octopus: Hybrid allocator might cause duplicate admin socket comma...
- https://github.com/ceph/ceph/pull/37794
- 12:03 PM Backport #47672 (In Progress): nautilus: Hybrid allocator might cause duplicate admin socket comm...
- https://github.com/ceph/ceph/pull/37793
- 11:18 AM Bug #47985 (Need More Info): When WAL is closed, osd cannot be restarted
- It's not clear what you meant by "close bluestore wal during deployment, and place bluestore wal/db and block o...
- 09:57 AM Bug #47985 (Need More Info): When WAL is closed, osd cannot be restarted
- Compile the master branch source code, use vstart to deploy the cluster, close bluestore wal during deployment, and p...
- 10:52 AM Bug #38272: "no available blob id" assertion might occur
- Nathan Cutler wrote:
> Jiang Yu wrote:
> > I encountered the same problem in ceph 12.2.2, but found that there is n...
- 10:36 AM Bug #38272: "no available blob id" assertion might occur
- Jiang Yu wrote:
> I encountered the same problem in ceph 12.2.2, but found that there is no patch available in ceph ...
- 01:24 AM Bug #38272: "no available blob id" assertion might occur
- Hello everyone,
I encountered the same problem in ceph 12.2.2, but found that there is no patch available in ceph 1...