Bug #41601

closed

oi(object_info_t).size does not match on disk size

Added by 侯 斌 over 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 50%
Source: Community (dev)
Tags:
Backport: mimic,luminous,nautilus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: rest
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In our test environment (Ceph version 14.2.1 (nautilus) with a replicated pool), we found a scrub error like the one in bug #23701. We used the rados_write interface in librados.h to issue a write with offset = 1024 and length = 0. After scrubbing the corresponding PG, we found a scrub error. The OSD log is below:

2019-09-01 20:41:21.393 7f767dc21700 0 log_channel(cluster) log [DBG] : 1.63 scrub starts
2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 1 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch
2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 0 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch
2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 3 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch
2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 soid 1:c68704ec:::test_0001:head : failed to pick suitable object info
2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : scrub 1.63 1:c68704ec:::test_0001:head : on disk size (1024) does not match object info size (0) adjusted for ondisk to (0)
2019-09-01 20:41:21.434 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 scrub 4 errors

After analyzing the code, we found that the CEPH_OSD_OP_WRITE operation truncates the object to the given offset when the length is 0 and the offset is larger than oi.size:

    if (op.extent.length == 0) {
      if (op.extent.offset > oi.size) {
        t->truncate(
          soid, op.extent.offset);
      } else {
        t->nop(soid);
      }
    } else {
      t->write(
        soid, op.extent.offset, op.extent.length, osd_op.indata, op.flags);
    }

However, oi.size is not updated afterwards, because the size update is skipped when the length is 0:

  if (write_full ||
      (offset + length > oi.size && length)) {
    uint64_t new_size = offset + length;
    delta_stats.num_bytes -= oi.size;
    delta_stats.num_bytes += new_size;
    oi.size = new_size;
  }
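To make the interaction of the two excerpts concrete, here is a minimal standalone model. It is not Ceph code: `ModelObject`, `model_write`, and the `record_truncate` switch are hypothetical names introduced only for illustration, and `write_full` is left out. The switch shows one conceivable way to keep the two sizes in step; it is not claimed to be what the eventual fix does.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the OSD state; not the real Ceph types.
struct ModelObject {
  uint64_t on_disk_size = 0;  // size held by the object store
  uint64_t oi_size = 0;       // size recorded in object_info_t
};

// Mirrors the two excerpts above. `record_truncate` is a hypothetical
// switch that also records the new size when the zero-length branch
// truncates the object.
void model_write(ModelObject& o, uint64_t offset, uint64_t length,
                 bool record_truncate = false) {
  if (length == 0) {
    if (offset > o.oi_size) {
      o.on_disk_size = offset;  // corresponds to t->truncate(soid, offset)
      if (record_truncate) {
        o.oi_size = offset;     // assumption: bookkeeping added here
      }
    }                           // else: t->nop(soid)
  } else {
    // corresponds to t->write(...): the store grows if the write ends
    // past the current end of the object
    if (offset + length > o.on_disk_size) {
      o.on_disk_size = offset + length;
    }
  }
  // Size bookkeeping from the second excerpt; skipped when length == 0.
  if (length && offset + length > o.oi_size) {
    o.oi_size = offset + length;
  }
}

// Walks through the reported case: write(offset = 1024, length = 0)
// against an empty object.
void demo() {
  ModelObject reported;
  model_write(reported, 1024, 0);
  assert(reported.on_disk_size == 1024);  // "on disk size (1024)"
  assert(reported.oi_size == 0);          // "object info size (0)"

  ModelObject adjusted;
  model_write(adjusted, 1024, 0, /*record_truncate=*/true);
  assert(adjusted.on_disk_size == adjusted.oi_size);
}
```

In the model, the zero-length write leaves the two sizes at 1024 and 0, the same pair the scrub log reports.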

Moreover, we think the same bug occurs in CephFS when an old write (e.g. offset = 1024, length = 4096) arrives after a trimtrunc (e.g. truncate_seq = 2, truncate_size = 0).
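The CephFS case can be sketched in the same spirit. The sketch assumes that the OSD shortens an old write (one carrying a lower truncate_seq than the object) to whatever still fits inside oi.size. That step is not shown in this report, so treat it as an assumption. `clamp_old_write_length` and its parameters are hypothetical names, and the write's own truncate_seq of 1 is an example value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the length left for a write whose truncate_seq is
// older than the object's and which would extend past the recorded size.
uint64_t clamp_old_write_length(uint64_t obj_truncate_seq,
                                uint64_t op_truncate_seq,
                                uint64_t offset, uint64_t length,
                                uint64_t oi_size) {
  const bool old_write =
      obj_truncate_seq != 0 && obj_truncate_seq > op_truncate_seq;
  if (old_write && offset + length > oi_size) {
    // Keep only the part that lies inside the current object size.
    return offset > oi_size ? 0 : oi_size - offset;
  }
  return length;
}

void demo_old_write() {
  // The object was truncated to size 0 at truncate_seq = 2; the write
  // (offset = 1024, length = 4096) was issued at truncate_seq = 1.
  const uint64_t len = clamp_old_write_length(2, 1, 1024, 4096, 0);
  assert(len == 0);
  // With length 0 and offset 1024 > oi.size 0, the write now matches the
  // condition under which the zero-length branch truncates the object.
}
```

Under that assumption the old write degenerates into the zero-length case described above, which would explain why the same size mismatch can appear without any client explicitly writing zero bytes.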


Related issues 3 (0 open, 3 closed)

Copied to RADOS - Backport #41702: luminous: oi(object_info_t).size does not match on disk size (Rejected)
Copied to RADOS - Backport #41703: nautilus: oi(object_info_t).size does not match on disk size (Resolved, Prashant D)
Copied to RADOS - Backport #41704: mimic: oi(object_info_t).size does not match on disk size (Resolved, Prashant D)
Actions #2

Updated by xie xingguo over 4 years ago

  • Assignee set to xie xingguo
Actions #3

Updated by Kefu Chai over 4 years ago

  • Description updated (diff)
Actions #4

Updated by Kefu Chai over 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 30085
Actions #5

Updated by Nathan Cutler over 4 years ago

  • Backport changed from Mimic,Luminous,Nautilus to mimic,luminous,nautilus
Actions #6

Updated by xie xingguo over 4 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41702: luminous: oi(object_info_t).size does not match on disk size added
Actions #8

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41703: nautilus: oi(object_info_t).size does not match on disk size added
Actions #9

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41704: mimic: oi(object_info_t).size does not match on disk size added
Actions #10

Updated by Greg Farnum over 4 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)

Hmm I was going to move this into the RADOS project tracker but now I'm leaving it because I'm not sure if that will mess up the backport scripts.

Actions #11

Updated by Nathan Cutler over 4 years ago

Greg Farnum wrote:

Hmm I was going to move this into the RADOS project tracker but now I'm leaving it because I'm not sure if that will mess up the backport scripts.

It's in RADOS now and I changed the backport tracker issue to match it, but in general the backport scripts don't care what the project is, as long as it's one of the projects that has a "Backport" tracker.

Actions #12

Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
