Bug #53349

open

stat_sum.num_bytes of pool is incorrect when randomly writing small IOs to the pool

Added by mingpo li over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Performance/Resource Usage
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In a test, I found that when random writes with an IO size of 512 B are performed on an RBD image, the pool's stat_sum.num_bytes grows rapidly, to thousands of times the amount of data actually written.
Following is the data obtained with 'ceph report':

"poolid": 32,
"num_pg": 512,
"stat_sum": {
"num_bytes": 21554570771,
"num_objects": 9750,
"num_object_clones": 0,
"num_object_copies": 19500,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 9750,
"num_whiteouts": 0,
"num_read": 59,
"num_read_kb": 159,
"num_write": 11374,
"num_write_kb": 11365,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 10,
"num_bytes_recovered": 2560,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0,
"num_large_omap_objects": 0,
"num_objects_manifest": 0,
"num_omap_bytes": 0,
"num_omap_keys": 0,
"num_objects_repaired": 0
},
"store_stats": {
"total": 0,
"available": 0,
"internally_reserved": 0,
"allocated": 1485045760,
"data_stored": 11619366,
"data_compressed": 0,
"data_compressed_allocated": 0,
"data_compressed_original": 0,
"omap_allocated": 0,
"internal_metadata": 0
},

store_stats.data_stored is only 11619366, yet stat_sum.num_bytes is 21554570771, roughly 1,855 times larger.
Then I read the code and found that in PrimaryLogPG::write_update_size_and_usage the object's logical size is updated as:
new_size = offset + length
So when a large number of small IOs are written randomly at relatively large offsets, new_size becomes much larger than the amount of data actually written. At the same time, the pool's full check is based on num_bytes, so the pool can be marked as full while only a small fraction of its space is actually used. I want to ask whether this behavior is reasonable.
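To illustrate the effect, here is a minimal, self-contained C++ sketch (not the actual Ceph code; FakeObjectInfo, FakeDeltaStats and the simplified write_update_size_and_usage below are stand-ins for object_info_t, object_stat_sum_t and the real PrimaryLogPG::write_update_size_and_usage). It shows how accounting num_bytes from offset + length inflates the counter for sparse random small writes:

#include <cstdint>
#include <cstdlib>
#include <iostream>

struct FakeObjectInfo {
  uint64_t size = 0;          // logical object size as tracked by the OSD
};

struct FakeDeltaStats {
  int64_t num_bytes = 0;      // pool-level logical byte count
};

// Mirrors the idea described above: extend the logical size to
// offset + length whenever a write goes past the current end of object.
void write_update_size_and_usage(FakeDeltaStats& delta, FakeObjectInfo& oi,
                                 uint64_t offset, uint64_t length) {
  uint64_t new_size = offset + length;
  if (new_size > oi.size) {
    delta.num_bytes -= oi.size;   // drop the old logical size
    delta.num_bytes += new_size;  // account the new, larger logical size
    oi.size = new_size;
  }
}

int main() {
  constexpr uint64_t object_size = 4ull << 20;  // one 4 MiB RBD object
  constexpr uint64_t io_size = 512;
  FakeObjectInfo oi;
  FakeDeltaStats delta;
  uint64_t bytes_written = 0;

  // 100 random 512 B writes into the one object.
  std::srand(42);
  for (int i = 0; i < 100; ++i) {
    uint64_t offset = (std::rand() % (object_size / io_size)) * io_size;
    write_update_size_and_usage(delta, oi, offset, io_size);
    bytes_written += io_size;
  }

  std::cout << "bytes actually written: " << bytes_written << "\n"
            << "num_bytes accounted:    " << delta.num_bytes << "\n";
  // With random offsets the accounted num_bytes approaches 4 MiB even
  // though only ~50 KiB were written, which matches the inflation above.
}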
