Bug #53227

open

osdc: bh split will lost error number, maybe cause client crash

Added by wendong jia over 2 years ago. Updated over 1 year ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: ObjectCacher
Target version:
% Done: 100%
Source: Community (dev)
Tags: backport_processed
Backport: pacific,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

GDB:

#12 0x00007f72626352ab in ObjectCacher::C_RetryRead::finish (this=0x7f71dc001f80, r=<optimized out>) at /var/ws/ivan/nautilus-ceph-14.2.5/src/osdc/ObjectCacher.cc:82
82            r = oc->_readx(rd, oset, onfinish, false, &trace);
(gdb) p rd
$6 = (ObjectCacher::OSDRead *) 0x7f6a1a8ac7b0

(gdb) pmap hits loff_t BufferHead*
elem[0].left: $60 = 0
elem[0].right: $61 = (ObjectCacher::BufferHead *) 0x7f6a38fbaf60
elem[1].left: $62 = 131072
elem[1].right: $63 = (ObjectCacher::BufferHead *) 0x7f6a3d80fce0
elem[2].left: $64 = 262144
elem[2].right: $65 = (ObjectCacher::BufferHead *) 0x7f6a4e6e86c0
elem[3].left: $66 = 393216
elem[3].right: $67 = (ObjectCacher::BufferHead *) 0x7f6a4a43c370
elem[4].left: $68 = 524288
elem[4].right: $69 = (ObjectCacher::BufferHead *) 0x7f6a3d80f940
elem[5].left: $70 = 655360
elem[5].right: $71 = (ObjectCacher::BufferHead *) 0x7f6a571cfe70
elem[6].left: $72 = 786432
elem[6].right: $73 = (ObjectCacher::BufferHead *) 0x7f6a274e5570
elem[7].left: $74 = 917504
elem[7].right: $75 = (ObjectCacher::BufferHead *) 0x7f6a4ce46510
elem[8].left: $76 = 1048576
elem[8].right: $77 = (ObjectCacher::BufferHead *) 0x7f6a216f1a00
elem[9].left: $78 = 1179648
elem[9].right: $79 = (ObjectCacher::BufferHead *) 0x7f6a57fffcb0
elem[10].left: $80 = 1441792
elem[10].right: $81 = (ObjectCacher::BufferHead *) 0x7f6a33e64180
elem[11].left: $82 = 1572864
elem[11].right: $83 = (ObjectCacher::BufferHead *) 0x7f6a313543b0
elem[12].left: $84 = 1703936
elem[12].right: $85 = (ObjectCacher::BufferHead *) 0x7f6a19960540
elem[13].left: $86 = 1835008
elem[13].right: $87 = (ObjectCacher::BufferHead *) 0x7f6a2a7b05d0
elem[14].left: $88 = 1966080
elem[14].right: $89 = (ObjectCacher::BufferHead *) 0x7f6a47ac5ce0
elem[15].left: $90 = 2097152
elem[15].right: $91 = (ObjectCacher::BufferHead *) 0x7f6a37b8c840
elem[16].left: $92 = 2228224
elem[16].right: $93 = (ObjectCacher::BufferHead *) 0x7f6a5464e3d0
elem[17].left: $94 = 2490368
elem[17].right: $95 = (ObjectCacher::BufferHead *) 0x7f6f923edca0
elem[18].left: $96 = 2752512
elem[18].right: $97 = (ObjectCacher::BufferHead *) 0x7f6a24ad4300
elem[19].left: $98 = 2883584
elem[19].right: $99 = (ObjectCacher::BufferHead *) 0x7f6a351c37d0
Map size = 20

(gdb) p (*(BufferHead *)0x7f6a24ad4300).state
$310 = 5   //STATE_TX
(gdb) p (*(BufferHead *)0x7f6a24ad4300).error
$311 = 0
(gdb) p (*(BufferHead *)0x7f6a351c37d0).state  
$312 = 6  //STATE_ERROR
(gdb) p (*(BufferHead *)0x7f6a351c37d0).error
$313 = 0  //should be eio 5

In conclusion, bh 0x7f6a351c37d0 is in STATE_ERROR, but its error number is 0.
The previous bh, 0x7f6a24ad4300, is in STATE_TX: a write to that range triggered the bh split.

How the problem arises:
During readahead, something makes the OSD reply with EIO (error number 5).
The EIO is assigned to bh->error, and then read_cond.Signal is called to wake the waiting read.
Before the read runs again, however, a write triggers a split of that bh.
The split allocates a new BufferHead and initializes it with error(0), so the EIO is lost.
When the read is woken up, it takes the !error branch and the client crashes.
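
The sequence above can be sketched with a small stand-alone model. This is only an illustration, not the Ceph source: the BufferHead struct is a simplified stand-in, split_losing_error and reader_takes_no_error_branch are hypothetical names, and storing the error as a negative errno value (-EIO) is an assumption of the sketch.

```cpp
#include <cassert>
#include <cerrno>

// Simplified stand-in for ObjectCacher::BufferHead; NOT the real Ceph class.
struct BufferHead {
  // Numeric values match the states printed in the GDB session above.
  enum State { STATE_TX = 5, STATE_ERROR = 6 };

  long start = 0;
  long length = 0;
  int state = STATE_TX;
  int error = 0;  // 0 means "no error recorded"
};

// Hypothetical split that mirrors the reported behaviour: the new right-hand
// BufferHead inherits the state, but its error field keeps the default 0.
inline BufferHead split_losing_error(BufferHead& left, long off) {
  assert(off > left.start && off < left.start + left.length);
  BufferHead right;
  right.start = off;
  right.length = left.start + left.length - off;
  right.state = left.state;  // state is carried over
  // right.error is never assigned, so the EIO is dropped here
  left.length = off - left.start;
  return right;
}

// Hypothetical reader-side check: true when a bh is in STATE_ERROR but
// carries no error number, which is the inconsistent case from the report.
inline bool reader_takes_no_error_branch(const BufferHead& bh) {
  return bh.state == BufferHead::STATE_ERROR && !bh.error;
}
```

Splitting a bh that holds STATE_ERROR with -EIO produces a right-hand bh for which reader_takes_no_error_branch returns true, which is the inconsistent pair of values (state 6, error 0) seen in the GDB output.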

How to resolve:
Add a set_error function to BufferHead, and have split call set_error on the new bh in the same way it calls set_state.
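
A minimal sketch of the proposed fix, under stated assumptions: the class below is a simplified stand-in for BufferHead, split_keeping_error is a hypothetical name, and the get_error accessor and the exact set_error signature are guesses made for illustration (the report only names set_error). The actual change is the one in the linked pull request.

```cpp
#include <cassert>
#include <cerrno>

// Simplified stand-in for ObjectCacher::BufferHead; NOT the real Ceph class.
class BufferHead {
public:
  enum State { STATE_TX = 5, STATE_ERROR = 6 };

  BufferHead(long start, long length) : start_(start), length_(length) {}

  long start() const { return start_; }
  long length() const { return length_; }
  void set_length(long l) { length_ = l; }

  int get_state() const { return state_; }
  void set_state(int s) { state_ = s; }

  // set_error is the function proposed in the report; get_error and both
  // signatures are assumptions of this sketch.
  int get_error() const { return error_; }
  void set_error(int e) { error_ = e; }

private:
  long start_;
  long length_;
  int state_ = STATE_TX;
  int error_ = 0;  // 0 means "no error recorded"
};

// Hypothetical split that propagates the error alongside the state, so the
// right-hand BufferHead still reports the original failure.
inline BufferHead split_keeping_error(BufferHead& left, long off) {
  assert(off > left.start() && off < left.start() + left.length());
  BufferHead right(off, left.start() + left.length() - off);
  right.set_state(left.get_state());
  right.set_error(left.get_error());  // the step missing in the buggy split
  left.set_length(off - left.start());
  return right;
}
```

With the error copied during the split, a read that is woken up afterwards sees a non-zero error on the new bh and can return the failure instead of taking the !error branch.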


Related issues: 2 (0 open, 2 closed)

Copied to Ceph - Backport #53703: pacific: osdc: bh split will lost error number, maybe cause client crash (Resolved, Cory Snyder)
Copied to Ceph - Backport #53704: octopus: osdc: bh split will lost error number, maybe cause client crash (Resolved, Cory Snyder)
Actions #2

Updated by wendong jia over 2 years ago

My Ceph version is Nautilus 14.2.5.

Actions #3

Updated by Patrick Donnelly over 2 years ago

  • Subject changed from bh split will lost error number, maybe cause client crash to osdc: bh split will lost error number, maybe cause client crash
  • Status changed from New to Fix Under Review
  • Assignee set to wendong jia
  • Target version set to v17.0.0
  • Source set to Community (dev)
  • Backport set to pacific,octopus
  • Pull request ID set to 43881
Actions #4

Updated by Patrick Donnelly over 2 years ago

  • Description updated (diff)
Actions #5

Updated by Patrick Donnelly over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Backport Bot over 2 years ago

  • Copied to Backport #53703: pacific: osdc: bh split will lost error number, maybe cause client crash added
Actions #7

Updated by Backport Bot over 2 years ago

  • Copied to Backport #53704: octopus: osdc: bh split will lost error number, maybe cause client crash added
Actions #8

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed