Bug #16176 (closed)

objectmap does not show object existence correctly

Added by Xinxin Shu almost 8 years ago. Updated about 7 years ago.

Status: Resolved
Priority: High
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source: other
Backport: jewel,kraken
Regression: No
Severity: 3 - minor

Description

On the latest master, I created a 10 GB RBD image and used 'rbd bench-write' to fill objects. I terminated the bench-write process with Ctrl+C and then checked the objects; 'rados ls' shows there are 26 objects:

[root@ctrl src]# ./rados -c /mnt/data/devs/ceph.conf -p rbd ls | grep rbd_data.113a74b0dc51 | wc -l
2016-06-07 10:02:45.681910 7f438e50ba40 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-07 10:02:45.682067 7f438e50ba40 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-07 10:02:45.713863 7f438e50ba40 -1 WARNING: the following dangerous and experimental features are enabled: *
26
[root@ctrl src]#

But when I use 'rbd du' to check disk usage, it shows that only 6 objects exist (6 × 4096K = 24576K), even though the object map flag indicates it is valid:
NAME PROVISIONED USED
test 10240M 24576k

[root@ctrl src]# ./rbd -c /mnt/data/devs/ceph.conf -p rbd info test
2016-06-07 10:03:55.902227 7f6963a84d80 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-07 10:03:55.902758 7f6963a84d80 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-06-07 10:03:55.935078 7f6963a84d80 -1 WARNING: the following dangerous and experimental features are enabled: *
rbd image 'test':
size 10240 MB in 2560 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.113a74b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
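
For reference, the USED figure that 'rbd du' prints can be recomputed with a small librbd program that sums the extents librbd reports as existing. A minimal sketch, assuming the pool ("rbd") and image ("test") names from this report, the client.admin user, and the default ceph.conf search path (error handling omitted):

// Build: g++ -o du_check du_check.cc -lrados -lrbd
#include <rados/librados.hpp>
#include <rbd/librbd.hpp>
#include <cstdint>
#include <iostream>

// diff_iterate2 callback: accumulate the bytes of every extent that exists.
static int count_extent(uint64_t offset, size_t len, int exists, void *arg) {
  (void)offset;  // offset unused here; we only total the lengths
  if (exists) {
    *static_cast<uint64_t *>(arg) += len;
  }
  return 0;
}

int main() {
  librados::Rados rados;
  rados.init("admin");              // assumes client.admin
  rados.conf_read_file(nullptr);    // default ceph.conf search path
  rados.connect();

  librados::IoCtx io_ctx;
  rados.ioctx_create("rbd", io_ctx);  // pool name from this report

  librbd::RBD rbd;
  librbd::Image image;
  rbd.open_read_only(io_ctx, image, "test", nullptr);  // image "test"

  uint64_t size = 0, used = 0;
  image.size(&size);
  // Walk every allocated extent; with whole_object=true and fast-diff
  // enabled this consults the object map, as 'rbd du' does.
  image.diff_iterate2(nullptr, 0, size, true, true, count_extent, &used);

  std::cout << "USED: " << (used >> 10) << "k of "
            << (size >> 20) << "M PROVISIONED" << std::endl;
  return 0;
}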


Files

bug-16176-log.zip (749 KB), Xinxin Shu, 11/22/2016 08:26 AM

Related issues: 2 (0 open, 2 closed)

Copied to rbd - Backport #18289: kraken: objectmap does not show object existence correctly (Closed)
Copied to rbd - Backport #18290: jewel: objectmap does not show object existence correctly (Resolved, Jason Dillaman)
#1 Updated by Samuel Just over 7 years ago

  • Project changed from Ceph to rbd
  • Category deleted (librbd)
#2 Updated by Jason Dillaman over 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman
  • Priority changed from High to Normal
#3 Updated by Jason Dillaman over 7 years ago

  • Status changed from In Progress to Need More Info

@Xinxin: I've tried to repeat your findings without success. Can you repeat with "debug rbd = 20" in your ceph client configuration and then attach the associated bench-write log files?

Also, did your image have a snapshot? If it did, that could potentially explain the issue.

#4 Updated by Jason Dillaman over 7 years ago

  • Assignee deleted (Jason Dillaman)
#5 Updated by Jason Dillaman over 7 years ago

@Xinxin: ping -- any additional information?

#6 Updated by Xinxin Shu over 7 years ago

Hi Jason, sorry for the late response. I can reproduce this on the latest master. It seems this only happens with the async messenger: the 'rados ls' output is greater than the 'rbd du' output. With the simple messenger I cannot reproduce it; there the 'rados ls' output is lower than or equal to the 'rbd du' output, which I think is reasonable, since librbd always updates the object map before writing the data to the Ceph cluster.

However, in both cases an inconsistent state always occurs. My first thought is to update the object map but mark it invalid or in some other transient state, and only set the object map to valid once the write completes. Does this make sense to you?

#7 Updated by Jason Dillaman over 7 years ago

I am still not following the issue and will require more information. To answer your question, you cannot set the object map to valid after the block is updated because that would mean the object map doesn't reflect reality (i.e. if you were to crash after adding the object but before updating the object map, you would have an orphaned object).
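
To make that ordering argument concrete, here is a runnable illustration with hypothetical stand-in types (not librbd's actual code) of why the map update must precede the data write:

// Illustration only, with hypothetical stand-in types -- not librbd code.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

enum ObjectState { NONEXISTENT, EXISTS };

struct ObjectMap { std::map<uint64_t, ObjectState> states; };
struct FakeCluster { std::map<uint64_t, std::string> objects; };

void safe_write(ObjectMap &map, FakeCluster &cluster,
                uint64_t object_no, const std::string &data) {
  // 1. Flip the map entry first. A crash here leaves the map
  //    over-reporting (an EXISTS entry with no data behind it), which is
  //    recoverable: every real object is still tracked by the map.
  map.states[object_no] = EXISTS;
  // 2. Then create the backing object.
  cluster.objects[object_no] = data;
  // Reversing the two steps would risk an orphan: a crash in between
  // would leave a real RADOS object that the map claims does not exist,
  // so map-driven paths (rbd du, image removal) would silently skip it.
}

int main() {
  ObjectMap map;
  FakeCluster cluster;
  safe_write(map, cluster, 0x594, "payload");
  std::cout << "map says object exists: "
            << (map.states[0x594] == EXISTS) << std::endl;
  return 0;
}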

#8 Updated by Xinxin Shu over 7 years ago

My thought is as follows:

1. Update the object map, setting the object's state to an invalid or transient state.
2. Write the data to RADOS.
3. Clear the invalid state, or change the transient state to the valid state.

If rbd crashes after step 2, the object map has been updated but is not in a valid state; since we provide an object map rebuild API, the user should rebuild the object map if it is invalid. (See the sketch below.)
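
A minimal sketch of that proposed three-step flow, again with hypothetical stand-in types rather than librbd's actual classes; the PENDING state models the transient/invalid marker:

// Sketch of the proposal above -- hypothetical names, not librbd code.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

enum ObjectState { NONEXISTENT, PENDING, EXISTS };

struct ObjectMap { std::map<uint64_t, ObjectState> states; };
struct FakeCluster { std::map<uint64_t, std::string> objects; };

void proposed_write(ObjectMap &map, FakeCluster &cluster,
                    uint64_t object_no, const std::string &data) {
  map.states[object_no] = PENDING;    // step 1: transient state
  cluster.objects[object_no] = data;  // step 2: write data to RADOS
  map.states[object_no] = EXISTS;     // step 3: promote to valid
  // A crash after step 2 leaves the entry PENDING rather than EXISTS,
  // signalling that "rbd object-map rebuild" is needed.
}

int main() {
  ObjectMap map;
  FakeCluster cluster;
  proposed_write(map, cluster, 0x594, "payload");
  std::cout << "final state: " << map.states[0x594] << std::endl;
  return 0;
}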

#9 Updated by Jason Dillaman over 7 years ago

You really need to explain to me the problem you are attempting to solve because I am not understanding it.

#10 Updated by Xinxin Shu over 7 years ago

I want to fix the inconsistency between 'rbd du' and 'rados ls'. As this tracker states, 'rados ls' shows there are 26 objects in this RBD image, so 'rbd du' should return 106496K (26 × 4096K), but it actually shows 24576K, meaning only 6 objects have been written to this image. That is the inconsistency between the two commands. I investigated the write flow and found that librbd always updates the object map before writing data; if librbd crashes after updating the object map but before writing the data, the output of the two commands becomes inconsistent.

#11 Updated by Jason Dillaman over 7 years ago

OK, I think I am still not understanding the issue. If `rados ls` shows 26 objects (104MB of used space) but `rbd du` shows 24MB of used space (6 objects), how is what you are suggesting going to address this issue? Since the object map is updated before the object is written, if anything, `rbd du` should say 104MB of space is used and `rados ls` would only show 24MB.

#12 Updated by Xiaoxi Chen over 7 years ago

It looks to me like the only possibility is that random writes, rather than sequential writes, were used to fill the image. 'rbd du' reporting 24MB doesn't necessarily mean 6 objects; it just iterates over the diff and sums it up.

The easiest way to make this "issue" clear: @Xinxin, could you please add output at https://github.com/ceph/ceph/blob/master/src/tools/rbd/action/DiskUsage.cc#L27? Just log out <offset, len, exists>, which will be clear enough to show your disk structure to everybody. (A sketch of such a change follows below.)
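
For reference, the disk-usage callback there is roughly the following shape; the std::cerr line is the suggested diagnostic and is an assumption, not code that exists upstream:

// Roughly the shape of the diff_iterate2 callback in
// src/tools/rbd/action/DiskUsage.cc; the std::cerr line is the
// suggested (hypothetical) diagnostic addition.
#include <cstddef>
#include <cstdint>
#include <iostream>

static int disk_usage_callback(uint64_t offset, size_t len, int exists,
                               void *arg) {
  uint64_t *used_size = reinterpret_cast<uint64_t *>(arg);
  std::cerr << "extent: offset=" << offset << " len=" << len
            << " exists=" << exists << std::endl;  // proposed debug output
  if (exists) {
    (*used_size) += len;  // sum up the bytes of extents that exist
  }
  return 0;
}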

#13 Updated by Xinxin Shu over 7 years ago

Hi Jason, I reproduced this bug; detailed logs are attached, please check them. "./bin/rbd -c ceph.conf -p rbd bench-write test" was used to fill data.

[root@ceph01 build]# ./bin/rados -p rbd -c ceph.conf ls | grep rbd_data
2016-11-22 15:31:30.455290 7f6a0078ca00 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-11-22 15:31:30.455679 7f6a0078ca00 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-11-22 15:31:30.488535 7f6a0078ca00 -1 WARNING: the following dangerous and experimental features are enabled: *
rbd_data.100b74b0dc51.0000000000000594
rbd_data.100b74b0dc51.0000000000000683
rbd_data.100b74b0dc51.00000000000000b6
rbd_data.100b74b0dc51.000000000000052a
rbd_data.100b74b0dc51.0000000000000682
rbd_data.100b74b0dc51.0000000000000856
rbd_data.100b74b0dc51.00000000000002a7
rbd_data.100b74b0dc51.000000000000010a
rbd_data.100b74b0dc51.00000000000001a5
rbd_data.100b74b0dc51.000000000000068b
rbd_data.100b74b0dc51.00000000000001ed
rbd_data.100b74b0dc51.0000000000000279
rbd_data.100b74b0dc51.000000000000027a
rbd_data.100b74b0dc51.000000000000020d
rbd_data.100b74b0dc51.00000000000008b3
rbd_data.100b74b0dc51.000000000000083f

[root@ceph01 build]# ./bin/rbd -p rbd -c ceph.conf du test
2016-11-22 15:32:25.802697 7f2b35252ec0 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-11-22 15:32:25.802915 7f2b35252ec0 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-11-22 15:32:25.839371 7f2b35252ec0 -1 WARNING: the following dangerous and experimental features are enabled: *
NAME PROVISIONED USED
test 10240M 53248k

#14 Updated by Jason Dillaman over 7 years ago

@Xinxin: Thanks for the debug logs. I will try to take a look at this today or tomorrow.

#15 Updated by Jason Dillaman over 7 years ago

  • Backport set to kraken,jewel

OK, I see the issue now. Consider a crash during in-flight IO where more than one in-flight IO maps to the same backing object: while the first IO to the object is blocked waiting for the object map update, subsequent IOs to the same object are not blocked and can proceed to create the backing object, potentially before the object map update is committed. This bug was introduced in the Infernalis release, when librbd started updating the in-memory object map before the on-disk object map was committed.
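
A hypothetical simulation of that race, with stand-in globals rather than librbd's actual structures:

// Hypothetical simulation of the race described above -- not librbd code.
#include <cstdint>
#include <iostream>
#include <set>

enum ObjectState { NONEXISTENT, EXISTS };

ObjectState in_memory_map = NONEXISTENT;  // flipped immediately (the bug)
ObjectState on_disk_map = NONEXISTENT;    // committed asynchronously
std::set<uint64_t> rados_objects;         // backing objects that exist

// First IO: flips the in-memory map, then blocks awaiting the on-disk commit.
void first_write(uint64_t object_no) {
  (void)object_no;
  in_memory_map = EXISTS;
  // ...this IO is now parked until the on-disk map update commits...
}

// Racing IO to the same object: sees the uncommitted in-memory EXISTS,
// so it skips the wait and creates the backing object right away.
void racing_write(uint64_t object_no) {
  if (in_memory_map == EXISTS) {
    rados_objects.insert(object_no);
  }
}

int main() {
  first_write(0x594);
  racing_write(0x594);
  // Client crashes here, before the on-disk commit ever lands:
  std::cout << "rados ls sees the object: " << rados_objects.count(0x594)
            << ", on-disk map says exists: " << (on_disk_map == EXISTS)
            << std::endl;  // prints 1 and 0 -- so rbd du undercounts
  return 0;
}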

#16 Updated by Jason Dillaman over 7 years ago

  • Status changed from Need More Info to In Progress
  • Assignee set to Jason Dillaman
  • Priority changed from Normal to High
#17 Updated by Jason Dillaman over 7 years ago

  • Status changed from In Progress to Fix Under Review
#18 Updated by Mykola Golub over 7 years ago

  • Status changed from Fix Under Review to Pending Backport
#19 Updated by Nathan Cutler over 7 years ago

  • Copied to Backport #18289: kraken: objectmap does not show object existence correctly added
#20 Updated by Nathan Cutler over 7 years ago

  • Copied to Backport #18290: jewel: objectmap does not show object existence correctly added
#21 Updated by Jason Dillaman over 7 years ago

  • Backport changed from kraken,jewel to jewel
#22 Updated by Nathan Cutler over 7 years ago

  • Backport changed from jewel to jewel,kraken
#23 Updated by Nathan Cutler about 7 years ago

  • Status changed from Pending Backport to Resolved
