Activity

From 09/05/2022 to 10/04/2022

10/04/2022

05:39 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Sure. https://tracker.ceph.com/issues/57762
Kevin Fox
03:51 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Kevin Fox wrote:
> For the record, ssd/ssd or hdd/hdd seems to work fine even though the documentation makes it soun...
Igor Fedotov
03:30 PM Bug #57672: SSD OSD won't start after high fragmentation score!
For the record, ssd/ssd or hdd/hdd seems to work fine even though the documentation makes it sound like it doesn't.
...
Kevin Fox
05:38 PM Documentation #57762 (New): documentation about same hardware class wrong
The documentation in at least one place:
https://docs.ceph.com/en/pacific/man/8/ceph-bluestore-tool/ bluefs-bdev-mig...
Kevin Fox

09/30/2022

02:34 AM Bug #57672: SSD OSD won't start after high fragmentation score!
So, it looks like moving the db to a db volume works with ceph-bluestore-tool bluefs-bdev-migrate. So most of the way... Kevin Fox
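
For context, a minimal sketch (not the reporter's exact commands) of the bluefs-bdev-migrate flow being described; the OSD id, data path and target device are placeholders, the OSD has to be stopped first, and under Rook the OSD pod must be brought down rather than using systemctl:
```
# Stop the OSD before touching its store (placeholder id).
systemctl stop ceph-osd@12

# Move BlueFS (RocksDB) data from the main device onto a new dedicated DB
# volume; all paths/devices below are placeholders.
ceph-bluestore-tool bluefs-bdev-migrate \
    --path /var/lib/ceph/osd/ceph-12 \
    --devs-source /var/lib/ceph/osd/ceph-12/block \
    --dev-target /dev/ceph-db/osd-12-db

systemctl start ceph-osd@12
```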

09/29/2022

09:01 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Igor Fedotov wrote:
> The issue with that fragmentation score is that there is no strong math behind it. Originally ...
Kevin Fox
08:39 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Igor Fedotov wrote:
> If this is still available - may I ask you to run ceph-bluestore-tool's free-dump command and ...
Kevin Fox
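
For reference, the free-dump request above usually takes the shape sketched below; the OSD path is a placeholder and the OSD must be offline while the tool runs:
```
# Dump the free regions tracked by the block allocator of an offline OSD
# (placeholder path); the --allocator selector follows the
# ceph-bluestore-tool man page.
ceph-bluestore-tool free-dump \
    --path /var/lib/ceph/osd/ceph-12 \
    --allocator block
```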
08:38 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Igor Fedotov wrote:
> This is totally irrelevant - these are warnings showing legacy formatted omaps for this OSD....
Kevin Fox
08:15 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Kevin Fox wrote:
> Random other thing... during repairs, I see:
> [root@pc20 ceph]# ceph-bluestore-tool --log-lev...
Igor Fedotov
08:10 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Kevin Fox wrote:
> Just saw this again, on a small scale. just one of the osds that I had moved the db off to its ow...
Igor Fedotov
08:08 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Kevin Fox wrote:
> One note I see in the rook documentation:
> "Notably, ceph-volume will not use a device of the s...
Igor Fedotov
08:06 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Kevin Fox wrote:
> Hi Igor,
>
>
> Does the fragmentation score alone show how fragmented things are? I still se...
Igor Fedotov
04:30 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Random other thing... during repairs, I see:
[root@pc20 ceph]# ceph-bluestore-tool --log-level 30 --path /var/lib/...
Kevin Fox
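
The command in the entry above is cut off; a hedged reconstruction of what such a repair run typically looks like (the OSD id and path are placeholders, not the ones from the report):
```
# Offline repair with verbose logging, matching the shape of the truncated
# invocation quoted above; stop the OSD before running this.
ceph-bluestore-tool repair \
    --log-level 30 \
    --path /var/lib/ceph/osd/ceph-12
```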
04:19 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Got some more info... during the outage, I had 12 drives that wouldn't recover by moving off the db. Looking back thr... Kevin Fox
03:51 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Just saw this again, on a small scale. Just one of the OSDs that I had moved the db off to its own volume, just enter... Kevin Fox

09/28/2022

08:00 PM Bug #57672: SSD OSD won't start after high fragmentation score!
One note I see in the rook documentation:
"Notably, ceph-volume will not use a device of the same device class (HDD,...
Kevin Fox
07:32 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Hi Igor,
Thanks for the details. That makes sense and helps me feel much more comfortable that the hack I put in p...
Kevin Fox
09:07 AM Bug #57672: SSD OSD won't start after high fragmentation score!
Kevin Fox wrote:
> I can find no evidence that the cluster got full. I've seen it occasionally go up a little past 8...
Igor Fedotov
09:21 AM Backport #57688 (In Progress): quincy: unable to read osd superblock on AArch64 with page size 64K
https://github.com/ceph/ceph/pull/48279 Igor Fedotov
09:19 AM Backport #57687 (In Progress): pacific: unable to read osd superblock on AArch64 with page size 64K
https://github.com/ceph/ceph/pull/48278 Igor Fedotov

09/27/2022

06:04 PM Bug #57672: SSD OSD won't start after high fragmentation score!
Thank you, Igor. I think Kevin has answered with as much background as he had on the issue. Vikhyat Umrao
04:34 PM Backport #57688 (Resolved): quincy: unable to read osd superblock on AArch64 with page size 64K
Backport Bot
04:34 PM Backport #57687 (Resolved): pacific: unable to read osd superblock on AArch64 with page size 64K
Backport Bot
04:18 PM Bug #57537 (Pending Backport): unable to read osd superblock on AArch64 with page size 64K
Igor Fedotov

09/26/2022

05:37 PM Bug #57672: SSD OSD won't start after high fragmentation score!
I can find no evidence that the cluster got full. I've seen it occasionally go up a little past 85 (usually if I'm re... Kevin Fox
05:34 PM Bug #57672: SSD OSD won't start after high fragmentation score!
I switched one of the pods to /bin/bash and tried various things to fsck the osds. Every time it hit the point where ... Kevin Fox
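
A minimal sketch of the offline fsck being attempted here, assuming the OSD pod is held idle (e.g. with its entrypoint switched to /bin/bash as described above) so the tool can reach the OSD data directory; the path is a placeholder:
```
# Offline consistency check of a single OSD's BlueStore.
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12
```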
05:06 PM Bug #57672: SSD OSD won't start after high fragmentation score!
The cluster involved was provisioned with ceph:v14.2.4-20190917 in Oct 2019. It's been running nautilus until last month.... Kevin Fox
09:12 AM Bug #57672 (Need More Info): SSD OSD won't start after high fragmentation score!
Igor Fedotov
09:12 AM Bug #57672: SSD OSD won't start after high fragmentation score!
@Vikhyat - what Ceph release are we talking about? Igor Fedotov
08:46 AM Bug #57292 (Fix Under Review): Failed to start OSD when upgrading from nautilus to pacific with b...
Igor Fedotov

09/23/2022

06:10 PM Bug #57672: SSD OSD won't start after high fragmentation score!
User question:... Vikhyat Umrao
06:09 PM Bug #57672: SSD OSD won't start after high fragmentation score!
The user was not able to capture any debug data because it hit the cluster so hard that it went down.
Vikhyat Umrao
06:07 PM Bug #57672 (Duplicate): SSD OSD won't start after high fragmentation score!
One of the rook upstream users reported this issue in the upstream rook channel!... Vikhyat Umrao

09/21/2022

12:06 AM Bug #57507: rocksdb crashed due to checksum mismatch
Thank you very much Igor! I'll try this workaround and will update to v16.2.11 or later. Satoru Takeuchi

09/20/2022

09:51 PM Bug #52464: FAILED ceph_assert(current_shard->second->valid())
Also reported via telemetry:
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?var-sig_v2=245d...
Yaarit Hatuka
05:21 PM Bug #52464 (New): FAILED ceph_assert(current_shard->second->valid())
It happened again: https://github.com/rook/rook/issues/10936. For logs see: https://github.com/rook/rook/issues/10936... Radoslaw Zarzynski

09/19/2022

04:43 PM Backport #57604 (Resolved): quincy: Log message is a little confusing
Backport Bot
04:43 PM Backport #57603 (Resolved): pacific: Log message is a little confusing
Backport Bot
04:34 PM Bug #57271 (Pending Backport): Log message is a little confusing
Vikhyat Umrao
03:34 PM Bug #57271: Log message is a little confusing
https://github.com/ceph/ceph/pull/47774 merged Yuri Weinstein
11:58 AM Backport #57028 (In Progress): quincy: Bluefs might put an orphan op_update record in the log
https://github.com/ceph/ceph/pull/48171 Igor Fedotov
11:54 AM Backport #55301 (In Progress): quincy: Hybrid allocator might return duplicate extents when perfo...
https://github.com/ceph/ceph/pull/48170 Igor Fedotov
11:49 AM Backport #57458 (In Progress): quincy: bluefs fsync doesn't respect file truncate
https://github.com/ceph/ceph/pull/48169 Igor Fedotov
11:03 AM Backport #57027 (In Progress): pacific: Bluefs might put an orphan op_update record in the log
https://github.com/ceph/ceph/pull/48168 Igor Fedotov
10:47 AM Backport #55302 (Rejected): octopus: Hybrid allocator might return duplicate extents when perform...
Octopus has reached EOL. Igor Fedotov
10:47 AM Backport #55300 (In Progress): pacific: Hybrid allocator might return duplicate extents when perf...
https://github.com/ceph/ceph/pull/48167 Igor Fedotov
09:11 AM Backport #55518 (Resolved): pacific: test_cls_rbd.sh: multiple TestClsRbd failures during upgrade...
Igor Fedotov
09:10 AM Bug #54465 (Resolved): BlueFS broken sync compaction mode
Igor Fedotov
09:10 AM Backport #55024 (Resolved): quincy: BlueFS broken sync compaction mode
Igor Fedotov
09:09 AM Bug #54248 (Resolved): BlueFS improperly tracks vselector sizes in _flush_special()
Igor Fedotov
09:09 AM Backport #54318 (Resolved): quincy: BlueFS improperly tracks vselector sizes in _flush_special()
Igor Fedotov
09:08 AM Bug #53907 (Resolved): BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)
Igor Fedotov
09:03 AM Backport #54209 (Resolved): quincy: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)
Igor Fedotov

09/16/2022

07:00 PM Bug #53184: failed to start new osd due to SIGSEGV in BlueStore::read()
Hey Igor, do we have any more insight on this issue? Neha Ojha

09/15/2022

10:00 AM Bug #57537: unable to read osd superblock on AArch64 with page size 64K
Well, I can see your PR, hence the root cause is apparently more or less clear. So not much sense in the above question... Igor Fedotov
09:43 AM Bug #57537 (Fix Under Review): unable to read osd superblock on AArch64 with page size 64K
Igor Fedotov
09:29 AM Bug #57537: unable to read osd superblock on AArch64 with page size 64K
Hi @luo - could you please answer the following questions:
- What Ceph release are you using?
- Is this a container...
Igor Fedotov
09:11 AM Bug #57507 (Triaged): rocksdb crashed due to checksum mismatch
Igor Fedotov
09:10 AM Bug #57507: rocksdb crashed due to checksum mismatch
I think you might be facing https://tracker.ceph.com/issues/54547
As far as I can see the preconditions for this bug...
Igor Fedotov

09/14/2022

11:40 AM Bug #57537 (Resolved): unable to read osd superblock on AArch64 with page size 64K
On aarch64 with a 64K page size, "OSD::init(): unable to read osd superblock" occasionally occurs when deploying osd... Rixin Luo
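
A quick way to confirm whether a node runs a 64K-page kernel before deploying OSDs on it (standard Linux commands, nothing Ceph-specific):
```
uname -m           # expect aarch64
getconf PAGE_SIZE  # 65536 on a 64K-page kernel, 4096 on a 4K-page kernel
```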

09/13/2022

01:20 AM Bug #57507 (Duplicate): rocksdb crashed due to checksum mismatch
OSD crashed with the following log.
```
2022-09-09 03:03:05 debug 2022-09-08T18:03:05.665+0000 7f236763e080 1 os...
Satoru Takeuchi

09/12/2022

06:52 AM Bug #53266 (Resolved): default osd_fast_shutdown=true would cause NCB to recover allocation map o...
Konstantin Shalygin
06:50 AM Backport #54523 (Resolved): quincy: default osd_fast_shutdown=true would cause NCB to recover all...
Konstantin Shalygin

09/08/2022

05:25 PM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Wow, thank you, this is amazing. And great write-up in that kernel commit! Paul Emmerich
09:06 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
Trent, thanks a lot for the update! Mind reposting this information on the ceph-users mailing list? Or I can do that myse... Igor Fedotov
06:54 AM Bug #22464: Bluestore: many checksum errors, always 0x6706be76 (which matches a zero block)
At Canonical we tracked down and solved the cause of this bug. Credit to my colleague Mauricio Faria de Oliveira for ... Trent Lloyd
08:21 AM Bug #56456: rook-ceph-v1.9.5: ceph-osd crash randomly
Igor,
We can't know how large the peak is since we removed the rook OSD limits.
Without a limit, the biggest OSD is takin...
Sébastien Bernard
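
A few ways to gauge actual OSD memory consumption while chasing this, sketched with placeholder names/ids (the Rook namespace/label and OSD id are assumptions, and kubectl top needs metrics-server):
```
kubectl -n rook-ceph top pod -l app=rook-ceph-osd   # per-pod memory under Rook
ceph config get osd osd_memory_target               # the target each OSD aims for
ceph daemon osd.3 dump_mempools                     # per-pool memory breakdown for one OSD
```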

09/07/2022

04:28 PM Bug #56456: rook-ceph-v1.9.5: ceph-osd crash randomly
Hi Sebastien,
Does the above mean that OSDs are using more than 15GB RAM at some point?
How large is the peak t...
Igor Fedotov
11:54 AM Backport #57458 (Resolved): quincy: bluefs fsync doesn't respect file truncate
Backport Bot
11:52 AM Bug #55307 (Pending Backport): bluefs fsync doesn't respect file truncate
Igor Fedotov
 
