Bug #41601: oi(object_info_t).size does not match on disk size - RADOS - Ceph

closed

Added by 侯斌 over 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: xie xingguo
Category:
Target version: Ceph - v15.0.0
% Done: 50%
Source: Community (dev)
Tags:
Backport: mimic,luminous,nautilus
Regression:
Severity: 2 - major
Reviewed:
Affected Versions: Ceph - v14.2.1
ceph-qa-suite: rest
Component(RADOS):
Pull request ID: 30085
Crash signature (v1):
Crash signature (v2):

Description

In our test environment (Ceph 14.2.1 (nautilus), replicated pool), we hit a scrub error similar to bug #23701. We used the rados_write interface from librados.h to issue a write with offset = 1024 and length = 0; after scrubbing the corresponding PG, a scrub error was reported. The OSD log shows:

2019-09-01 20:41:21.393 7f767dc21700 0 log_channel(cluster) log [DBG] : 1.63 scrub starts
2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 1 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch
2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 0 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch
2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 3 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch
2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 soid 1:c68704ec:::test_0001:head : failed to pick suitable object info
2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : scrub 1.63 1:c68704ec:::test_0001:head : on disk size (1024) does not match object info size (0) adjusted for ondisk to (0)
2019-09-01 20:41:21.434 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 scrub 4 errors

After analyzing the code, we found that CEPH_OSD_OP_WRITE truncates the object to the given offset when the length is 0 and the offset is larger than oi.size:

    if (op.extent.length == 0) {
      if (op.extent.offset > oi.size) {
        t->truncate(
          soid, op.extent.offset);
      } else {
        t->nop(soid);
      }
    } else {
      t->write(
        soid, op.extent.offset, op.extent.length, osd_op.indata, op.flags);
    }

However, oi.size is not updated afterwards, because the size update is skipped when the length is 0:

  if (write_full ||
      (offset + length > oi.size && length)) {
    uint64_t new_size = offset + length;
    delta_stats.num_bytes -= oi.size;
    delta_stats.num_bytes += new_size;
    oi.size = new_size;
  }
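
To make the interaction of the two excerpts easier to follow, here is a minimal standalone model of it. This is only a sketch: `ObjectModel`, `apply_write`, `update_size_as_reported`, `update_size_tracking_truncate` and `replay` are hypothetical names, not Ceph types or functions, and the "tracking" variant is just one conceivable correction for illustration, not the change made in the pull request.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the OSD state; not Ceph's real types.
struct ObjectModel {
  uint64_t on_disk_size = 0;  // size the object store would report
  uint64_t oi_size = 0;       // size recorded in object_info_t
};

// Models the first excerpt: a zero-length write past oi.size truncates
// the object up to the offset; any other zero-length write is a no-op.
void apply_write(ObjectModel& o, uint64_t offset, uint64_t length) {
  assert(offset + length >= offset);  // guard against wrap-around
  if (length == 0) {
    if (offset > o.oi_size) {
      o.on_disk_size = offset;        // t->truncate(soid, offset)
    }                                 // else: t->nop(soid)
  } else if (offset + length > o.on_disk_size) {
    o.on_disk_size = offset + length; // t->write(...) extends the object
  }
}

// Models the second excerpt: oi.size only changes when length != 0.
void update_size_as_reported(ObjectModel& o, uint64_t offset,
                             uint64_t length, bool write_full = false) {
  if (write_full || (offset + length > o.oi_size && length)) {
    o.oi_size = offset + length;
  }
}

// Illustration only (an assumption, not the actual fix): also account
// for the implicit truncate by dropping the "&& length" condition.
void update_size_tracking_truncate(ObjectModel& o, uint64_t offset,
                                   uint64_t length, bool write_full = false) {
  if (write_full || offset + length > o.oi_size) {
    o.oi_size = offset + length;
  }
}

// Replays a single write against a fresh, empty object.
ObjectModel replay(uint64_t offset, uint64_t length, bool track_truncate) {
  ObjectModel o;
  apply_write(o, offset, length);
  if (track_truncate) {
    update_size_tracking_truncate(o, offset, length);
  } else {
    update_size_as_reported(o, offset, length);
  }
  return o;
}
```

In this model, `replay(1024, 0, false)` leaves `on_disk_size` at 1024 and `oi_size` at 0, which is the same pair of values that the scrub log reports.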

Moreover, we think the same bug occurs in CephFS when an old write (e.g. offset = 1024, length = 4096) arrives after a trimtrunc (e.g. truncate_seq = 2, truncate_size = 0).
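
A rough model of how the CephFS case could reach the same code path is sketched below. It rests on an assumption that is not quoted in this report: that the OSD clips a stale write (one carrying an older truncate_seq than the object) so that it does not extend past oi.size, which leaves a length of 0 when the offset is already beyond oi.size. All names (`Extent`, `ObjectModel`, `clip_stale_write`, `apply_and_account`, `replay_stale_write`) are hypothetical, and the stale write's truncate_seq of 1 is chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical model types; not Ceph's real structures.
struct Extent {
  uint64_t offset;
  uint64_t length;
  uint32_t truncate_seq;  // truncate_seq the client sent with the write
};

struct ObjectModel {
  uint64_t on_disk_size = 0;  // size the object store would report
  uint64_t oi_size = 0;       // size recorded in object_info_t
  uint32_t truncate_seq = 0;  // truncate_seq recorded for the object
};

// ASSUMPTION: a write older than the last trimtrunc that would extend
// the object is clipped to oi.size; past EOF its length becomes 0.
void clip_stale_write(const ObjectModel& o, Extent& e) {
  if (o.truncate_seq > e.truncate_seq && e.offset + e.length > o.oi_size) {
    e.length = (e.offset > o.oi_size) ? 0 : o.oi_size - e.offset;
  }
}

// Combines the two excerpts from the description: apply the write,
// then update oi.size only when the length is non-zero.
void apply_and_account(ObjectModel& o, const Extent& e) {
  if (e.length == 0) {
    if (e.offset > o.oi_size) {
      o.on_disk_size = e.offset;            // implicit truncate up
    }
  } else if (e.offset + e.length > o.on_disk_size) {
    o.on_disk_size = e.offset + e.length;   // ordinary extending write
  }
  if (e.offset + e.length > o.oi_size && e.length) {
    o.oi_size = e.offset + e.length;
  }
}

// Object already trimmed to size 0 with truncate_seq = 2; then a stale
// write (truncate_seq = 1, hypothetical) for [1024, 1024 + 4096) arrives.
ObjectModel replay_stale_write() {
  ObjectModel o;
  o.truncate_seq = 2;
  Extent e{1024, 4096, 1};
  clip_stale_write(o, e);
  apply_and_account(o, e);
  return o;
}
```

Under that assumption, `replay_stale_write()` ends with `on_disk_size` equal to 1024 and `oi_size` equal to 0, the same kind of mismatch as in the zero-length rados_write case.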

Related issues 3 (0 open — 3 closed)

Updated by xie xingguo over 4 years ago

https://github.com/ceph/ceph/pull/30085

Updated by xie xingguo over 4 years ago

Assignee set to xie xingguo

Updated by Kefu Chai over 4 years ago

Description updated (diff)

Updated by Kefu Chai over 4 years ago

Status changed from New to Fix Under Review
Pull request ID set to 30085

Updated by Nathan Cutler over 4 years ago

Backport changed from Mimic,Luminous,Nautilus to mimic,luminous,nautilus

Updated by xie xingguo over 4 years ago

Status changed from Fix Under Review to Pending Backport

Updated by Nathan Cutler over 4 years ago

Copied to Backport #41702: luminous: oi(object_info_t).size does not match on disk size added

Updated by Nathan Cutler over 4 years ago

Copied to Backport #41703: nautilus: oi(object_info_t).size does not match on disk size added

Updated by Nathan Cutler over 4 years ago

Copied to Backport #41704: mimic: oi(object_info_t).size does not match on disk size added

#10

Updated by Greg Farnum over 4 years ago

Project changed from Ceph to RADOS
Category deleted (~~OSD~~)

Hmm I was going to move this into the RADOS project tracker but now I'm leaving it because I'm not sure if that will mess up the backport scripts.

#11

Updated by Nathan Cutler over 4 years ago

Greg Farnum wrote:

Hmm I was going to move this into the RADOS project tracker but now I'm leaving it because I'm not sure if that will mess up the backport scripts.

It's in RADOS now and I changed the backport tracker issue to match it, but in general the backport scripts don't care what the project is, as long as it's one of the projects that has a "Backport" tracker.

#12

Updated by Nathan Cutler about 3 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
