Bug #22005

result -108 xferred 2000, blk_update_request: I/O error

Added by Alexey Zakurin over 6 years ago. Updated over 6 years ago.

Status:
Need More Info
Priority:
Low
Assignee:
Category:
rbd
Target version:
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Hello, community.

The RBD Linux kernel client hangs while the cluster is rebalancing.
The cluster has 3 client nodes that map RBD images from 4 OSD nodes.
It can happen on any of the client nodes, seemingly at random.
While the cluster is healthy, the problem does not occur.
I took this from syslog:

Nov  1 16:48:19 node2 kernel: [4844818.817351] rbd: rbd43: failed to acquire lock: -108
Nov  1 16:48:19 node2 kernel: [4844818.817879] rbd: rbd43: failed to acquire lock: -108
Nov  1 16:48:19 node2 kernel: [4844818.818160] rbd: rbd43: failed to acquire lock: -108
Nov  1 16:48:20 node2 kernel: [4844818.851384] rbd: rbd31: encountered watch error: -107
Nov  1 16:48:20 node2 kernel: [4844818.851725] rbd: rbd31: failed to unwatch: -108
Nov  1 16:48:20 node2 kernel: [4844818.880953] rbd: rbd28: write 6000 at 108f32000 (332000)
Nov  1 16:48:20 node2 kernel: [4844818.880957] rbd: rbd28:   result -108 xferred 6000
Nov  1 16:48:20 node2 kernel: [4844818.880961] blk_update_request: I/O error, dev rbd28, sector 8681872
Nov  1 16:48:20 node2 kernel: [4844818.881700] rbd: rbd7: write 2000 at 1088af000 (af000)
Nov  1 16:48:20 node2 kernel: [4844818.881705] rbd: rbd7:   result -108 xferred 2000
Nov  1 16:48:20 node2 kernel: [4844818.881708] blk_update_request: I/O error, dev rbd7, sector 8668536
Nov  1 16:48:20 node2 kernel: [4844818.899006] rbd: rbd43: encountered watch error: -107
Nov  1 16:48:20 node2 kernel: [4844818.899347] rbd: rbd43: failed to unwatch: -108
Nov  1 16:48:20 node2 kernel: [4844818.899727] rbd: rbd43: failed to reregister watch: -108
Nov  1 16:48:20 node2 kernel: [4844818.899760] rbd: rbd43: write 2000 at 0 result -108
Nov  1 16:48:20 node2 kernel: [4844818.899764] blk_update_request: I/O error, dev rbd43, sector 0
Nov  1 16:48:20 node2 kernel: [4844818.899839] rbd: rbd43: write 2000 at 288000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.899841] blk_update_request: I/O error, dev rbd43, sector 5184
Nov  1 16:48:20 node2 kernel: [4844818.899898] rbd: rbd43: write 1000 at 4cf000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.899900] blk_update_request: I/O error, dev rbd43, sector 9848
Nov  1 16:48:20 node2 kernel: [4844818.899955] rbd: rbd43: write 1000 at 44e78000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.899957] blk_update_request: I/O error, dev rbd43, sector 2257856
Nov  1 16:48:20 node2 kernel: [4844818.900024] rbd: rbd43: write 1000 at 200001000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900026] blk_update_request: I/O error, dev rbd43, sector 16777224
Nov  1 16:48:20 node2 kernel: [4844818.900085] rbd: rbd43: write 1000 at 200010000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900087] blk_update_request: I/O error, dev rbd43, sector 16777344
Nov  1 16:48:20 node2 kernel: [4844818.900154] rbd: rbd43: write 2000 at 20007a000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900156] blk_update_request: I/O error, dev rbd43, sector 16778192
Nov  1 16:48:20 node2 kernel: [4844818.900223] rbd: rbd43: write 1000 at 20009b000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900226] blk_update_request: I/O error, dev rbd43, sector 16778456
Nov  1 16:48:20 node2 kernel: [4844818.900294] rbd: rbd43: write 1000 at 20009f000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900304] rbd: rbd43: write 1000 at 2000a2000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900313] rbd: rbd43: write 2000 at 2000b6000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900322] rbd: rbd43: write 1000 at 2000c0000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900337] rbd: rbd43: write 1000 at 2000fe000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900350] rbd: rbd43: write 1000 at 200118000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900363] rbd: rbd43: write 1000 at 202103000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900375] rbd: rbd43: write 1000 at 20800d000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.900386] rbd: rbd43: write 1000 at 208209000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.907424] rbd: rbd22: write 2000 at 1096cd000 (2cd000)
Nov  1 16:48:20 node2 kernel: [4844818.907427] rbd: rbd22:   result -108 xferred 2000
Nov  1 16:48:20 node2 kernel: [4844818.908073] rbd: rbd22: write 1000 at 108000000 (0)
Nov  1 16:48:20 node2 kernel: [4844818.908077] rbd: rbd22:   result -108 xferred 1000
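
To make the excerpt easier to read, here is a minimal sketch that pairs each failed rbd write with the sector reported by blk_update_request. It assumes the write offsets are hexadecimal byte offsets and that sectors are 512 bytes; that reading is an assumption, but it agrees with the logged values (0x108f32000 / 512 = 8681872). The names `summarize`, `WRITE_RE` and `BLK_RE` are hypothetical helpers, not part of any Ceph tool.

```python
import re

# A few lines copied verbatim from the syslog excerpt above.
SAMPLE = """\
Nov  1 16:48:20 node2 kernel: [4844818.880953] rbd: rbd28: write 6000 at 108f32000 (332000)
Nov  1 16:48:20 node2 kernel: [4844818.880957] rbd: rbd28:   result -108 xferred 6000
Nov  1 16:48:20 node2 kernel: [4844818.880961] blk_update_request: I/O error, dev rbd28, sector 8681872
Nov  1 16:48:20 node2 kernel: [4844818.899760] rbd: rbd43: write 2000 at 0 result -108
Nov  1 16:48:20 node2 kernel: [4844818.899764] blk_update_request: I/O error, dev rbd43, sector 0
Nov  1 16:48:20 node2 kernel: [4844818.899839] rbd: rbd43: write 2000 at 288000 result -108
Nov  1 16:48:20 node2 kernel: [4844818.899841] blk_update_request: I/O error, dev rbd43, sector 5184
"""

# Assumption: "write <len> at <offset>" prints both numbers in hex, in bytes.
WRITE_RE = re.compile(r"rbd: (rbd\d+): write ([0-9a-f]+) at ([0-9a-f]+)")
BLK_RE = re.compile(r"blk_update_request: I/O error, dev (rbd\d+), sector (\d+)")
SECTOR_SIZE = 512  # assumption: sectors in the block layer message are 512 bytes


def summarize(log_text):
    """Return (writes, errors): per-device sector lists from both message types."""
    writes = {}  # device -> sectors computed from the hex byte offset
    errors = {}  # device -> sectors reported by blk_update_request
    for line in log_text.splitlines():
        m = WRITE_RE.search(line)
        if m:
            sector = int(m.group(3), 16) // SECTOR_SIZE
            writes.setdefault(m.group(1), []).append(sector)
            continue
        m = BLK_RE.search(line)
        if m:
            errors.setdefault(m.group(1), []).append(int(m.group(2)))
    return writes, errors


if __name__ == "__main__":
    writes, errors = summarize(SAMPLE)
    for dev in sorted(writes):
        print(dev, "computed:", writes[dev], "reported:", errors.get(dev, []))
```

Run against a full syslog, the same function gives a quick per-device list of the failed writes, which may help when comparing the three client nodes.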

The logs on the monitor nodes don't show any warnings at that time.

I first saw this problem on Kraken. The cluster has since been upgraded to Luminous, but the problem occurred again yesterday.

Kernel version on all nodes: 4.9.0-3-amd64.

History

#1 Updated by Ilya Dryomov over 6 years ago

  • Assignee set to Ilya Dryomov

Hi Alexey,

This snippet suggests that the ceph client instance on node2 got blacklisted.
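
For reference, the negative results in the log are Linux errno values. The sketch below decodes the two that appear in the excerpt; the numbers are hard-coded from the Linux errno table so the lookup does not depend on the platform running the script, and `decode` is a hypothetical helper name. It only names the codes and does not by itself prove blacklisting.

```python
# Linux errno values seen in the syslog excerpt, with their standard messages.
LINUX_ERRNO = {
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}


def decode(result):
    """Map a negative kernel result such as -108 to (name, message)."""
    return LINUX_ERRNO.get(-result, ("UNKNOWN", "not in this small table"))


if __name__ == "__main__":
    for code in (-107, -108):
        name, message = decode(code)
        print(f"{code}: {name} ({message})")
```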

Can you provide the entire dmesg?

Can you describe your setup in more detail? How do you distribute images between those three nodes?

#2 Updated by Ilya Dryomov over 6 years ago

  • Status changed from New to Need More Info
  • Priority changed from Normal to Low
