Bug #54314
OSD slow ops during Windows VM upgrades
Description
When our Windows team triggers an upgrade of several tens of Windows VMs concurrently, it causes thousands of slow ops on our OSDs, which are normally performant and free of slow ops.
The cluster is running 15.2.15. Each OSD has its block device on an HDD and its block.db on an SSD. The HDDs have Media Cache enabled (a write-through cache).
I just captured one OSD in this state and uploaded the logs with ceph-post-file: 04dd95b1-6227-4423-b5e0-e3a769903d19
The upload contains `ops` (the in-flight ops, including the slow requests), the OSD log with debug_bluestore=20 covering about 10 s, and the ops dumped again about 30 s later (`ops.after`).
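One way to see which ops are genuinely stuck is to compare the two dumps and keep only the ops that appear in both. The sketch below is a minimal illustration, assuming the files are JSON ops dumps from the OSD admin socket (e.g. `ceph daemon osd.<id> dump_ops_in_flight`) with a top-level `ops` list whose entries carry `description` and `age`; the helper names `load_dump`, `ops_by_description` and `still_in_flight` are hypothetical.

```python
import json
from typing import Dict, List


def load_dump(path: str) -> dict:
    """Read one saved ops dump (e.g. the `ops` or `ops.after` file) from disk."""
    with open(path) as f:
        return json.load(f)


def ops_by_description(dump: dict) -> Dict[str, float]:
    """Map each in-flight op's description to its age in seconds.

    Assumes the layout {"ops": [{"description": "...", "age": 143.48, ...}, ...]}.
    """
    return {op["description"]: float(op["age"]) for op in dump.get("ops", [])}


def still_in_flight(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Descriptions present in both dumps, oldest first (by age in the later dump)."""
    common = before.keys() & after.keys()
    return sorted(common, key=lambda desc: after[desc], reverse=True)
```

Usage would be `still_in_flight(ops_by_description(load_dump("ops")), ops_by_description(load_dump("ops.after")))`. Keying on the description works here because it embeds the client id and tid, which identify a single op.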
For example, lots of ops like this:
"description": "osd_op(client.769991852.0:52672435 4.111d 4.b51a311d (undecoded) ondisk+write+known_if_redirected e3973877)",
"age": 143.48358313400001
We're already aware that Windows VMs send sub-4k writes, which trigger a read-modify-write cycle on the OSDs with their 4096-byte blocks. We are rolling out a change so that the Windows VMs use emulated 4k sectors. Would this explain the whole performance issue?
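To make the read-modify-write point concrete, here is a simplified sketch that lists which blocks a write only partly covers. Under this model each such block must be read before it can be patched and rewritten. It assumes a 4096-byte block size and ignores BlueStore details such as min_alloc_size and deferred writes; `partially_covered_blocks` is a hypothetical helper, not a Ceph function.

```python
from typing import List


def partially_covered_blocks(offset: int, length: int, block_size: int = 4096) -> List[int]:
    """Return indices of blocks only partly overwritten by `length` bytes at `offset`."""
    if length <= 0:
        return []
    end = offset + length  # exclusive end of the write
    first = offset // block_size
    last = (end - 1) // block_size
    partial = []
    # The first block is partial if the write starts mid-block, or if the
    # write starts and ends inside it without filling it completely.
    if offset % block_size != 0 or (first == last and end % block_size != 0):
        partial.append(first)
    # A separate last block is partial if the write stops short of its end.
    if last != first and end % block_size != 0:
        partial.append(last)
    return partial
```

For example, a 512-byte write at offset 0 leaves block 0 partly covered, while an aligned 4096-byte write leaves nothing to read back. With emulated 4k sectors the guest should mostly issue writes of the second kind.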
In the `ops` file I posted, there are only three write ops (plus lots of repops, for which I can't see the size). One of the writes is 3k:
"description": "osd_op(client.769991852.0:52672421 4.111d 4:b88c58ad:::rbd_data.e5584f61a3e0e9.0000000000001362:head [write 585728~3072 in=3072b] snapc 1d3ca=[1d3ca,1d382] ondisk+write+known_if_redirected e3973877)",
"description": "osd_op(client.678845326.0:134585512 4.1ced 4:b73ec2ef:::rbd_data.6ff73a60bf4abc.0000000000001823:head [write 630784~409088 in=409088b] snapc 0=[] ondisk+write+known_if_redirected e3973877)",
"description": "osd_op(client.252924738.0:665572961 4.1ced 4:b73b9b4c:::rbd_data.67eac959e2bbdf.0000000000000309:head [write 2408448~8192 in=8192b] snapc 0=[] ondisk+write+known_if_redirected e3973877)",
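The offsets and lengths can be pulled out of such descriptions mechanically. The sketch below is a minimal illustration with hypothetical names (`parse_writes`, `WriteOp`). The regex assumes the `write <offset>~<length>` form seen in the descriptions above, and the 4096-byte block size is taken from the report. It reports how far each write starts and ends away from a block boundary.

```python
import re
from typing import Iterable, List, NamedTuple

BLOCK_SIZE = 4096  # assumed OSD block size, taken from the report

# Matches each "write <offset>~<length>" sub-op; "ondisk+write+..." flags do not match.
WRITE_RE = re.compile(r"\bwrite (\d+)~(\d+)")


class WriteOp(NamedTuple):
    offset: int
    length: int

    @property
    def head(self) -> int:
        """Bytes by which the start misses a block boundary."""
        return self.offset % BLOCK_SIZE

    @property
    def tail(self) -> int:
        """Bytes by which the end runs past the last block boundary."""
        return (self.offset + self.length) % BLOCK_SIZE

    @property
    def aligned(self) -> bool:
        return self.head == 0 and self.tail == 0


def parse_writes(descriptions: Iterable[str]) -> List[WriteOp]:
    """Extract every write sub-op from osd_op description strings."""
    ops = []
    for desc in descriptions:
        for m in WRITE_RE.finditer(desc):
            ops.append(WriteOp(int(m.group(1)), int(m.group(2))))
    return ops
```

On the three descriptions above, all offsets are block-aligned, but the tails are 3072, 3584 and 0 bytes. So the 409088-byte write also ends partway into a 4096-byte block, while the 8192-byte write is fully aligned. Undecoded ops carry no offset and are skipped.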
The OSD debug_bluestore log does show plenty of sub-4k writes, though.
History
#1 Updated by Ruben Kerkhof about 1 year ago
Hi Dan,
Which block driver are you using: virtio-blk, virtio-scsi or SATA? Are you using the RBD cache? Which Windows version?
I haven't seen what you're describing, but I was planning to do some QEMU tracing of Windows VM behaviour, so perhaps that can help.