Bug #15936

OSDs on cache pool crash after upgrade from Hammer to Jewel

Added by elder one almost 8 years ago. Updated over 6 years ago.

Status: Can't reproduce
Priority: High
Assignee: Joao Eduardo Luis
Category: Tiering
Target version: -
% Done: 0%
Source: other
Regression: No
Severity: 2 - major
Component(RADOS): OSD

Description

After upgrading my cluster from 0.94.7 (Hammer) to 10.2.1 (Jewel), all OSDs (SSDs) backing the cache tier started to crash constantly as soon as the pool with the cache tier was accessed.
The time until a crash varies randomly, from about 1 minute up to an hour at most.

Cache pool info:
pool 8 'ssdcache' replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 11570 flags hashpspool,incomplete_clones tier_of 6 cache_mode writeback target_bytes 107374182400 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 decay_rate 0 search_last_n 1 min_read_recency_for_promote 1 stripe_width 0

Base pool info:
pool 6 'hdd10k' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 11561 lfor 11561 flags hashpspool tiers 8 read_tier 8 write_tier 8 min_write_recency_for_promote 1 stripe_width 0
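The two pool lines above encode the tiering relationship: pool 8 ('ssdcache') is a writeback tier_of pool 6 ('hdd10k'), and pool 6 lists pool 8 as its tiers, read_tier and write_tier. As a reading aid, here is a minimal Python sketch that pulls those fields out of the quoted lines and checks that they point at each other. The helper names (token_after, check_tiering) are hypothetical, and the parsing assumes the whitespace-separated "key value" layout shown here and a single tier; it is not Ceph tooling.

```python
import re

# The two pool lines as quoted in this issue.
CACHE = (
    "pool 8 'ssdcache' replicated size 3 min_size 1 crush_ruleset 1 "
    "object_hash rjenkins pg_num 256 pgp_num 256 last_change 11570 "
    "flags hashpspool,incomplete_clones tier_of 6 cache_mode writeback "
    "target_bytes 107374182400 hit_set bloom{false_positive_probability: 0.05, "
    "target_size: 0, seed: 0} 3600s x1 decay_rate 0 search_last_n 1 "
    "min_read_recency_for_promote 1 stripe_width 0"
)
BASE = (
    "pool 6 'hdd10k' replicated size 3 min_size 1 crush_ruleset 0 "
    "object_hash rjenkins pg_num 512 pgp_num 512 last_change 11561 "
    "lfor 11561 flags hashpspool tiers 8 read_tier 8 write_tier 8 "
    "min_write_recency_for_promote 1 stripe_width 0"
)


def token_after(line: str, key: str) -> str:
    """Return the whitespace-delimited token that follows `key` in a pool line."""
    m = re.search(rf"\b{re.escape(key)} (\S+)", line)
    if m is None:
        raise KeyError(key)
    return m.group(1)


def check_tiering(cache: str, base: str) -> dict:
    """Check that the two pools reference each other; summarize the cache settings."""
    cache_id = int(token_after(cache, "pool"))
    base_id = int(token_after(base, "pool"))
    if int(token_after(cache, "tier_of")) != base_id:
        raise ValueError("cache pool is not a tier of the base pool")
    for key in ("tiers", "read_tier", "write_tier"):  # assumes a single tier
        if int(token_after(base, key)) != cache_id:
            raise ValueError(f"base pool {key} does not point at the cache pool")
    # "hit_set bloom{...} 3600s x1" -> period in seconds, number of hit sets
    period, count = re.search(r"\} (\d+)s x(\d+)", cache).groups()
    return {
        "cache_mode": token_after(cache, "cache_mode"),
        "target_gib": int(token_after(cache, "target_bytes")) / 2**30,
        "hit_set_period_s": int(period),
        "hit_set_count": int(count),
    }


print(check_tiering(CACHE, BASE))
```

For this issue's pools the summary shows a writeback cache with a 100 GiB target and a single 3600-second HitSet.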

---
Also, after the upgrade, setting "ceph osd set sortbitwise" made the whole cluster unusable: no QEMU clients were able to read from or write to their disks. I reverted it quickly.
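For context, sortbitwise is an OSD map flag that came with Jewel and, as we understand it, switches the OSDs from the legacy "nibblewise" sort order for object hashes to a "bitwise" one. The following is a simplified illustration under that assumption, not Ceph code: it models the nibblewise key as the 32-bit hash with its nibbles reversed and the bitwise key as the hash with its bits reversed (the function names are hypothetical), and shows that the same hashes can sort differently. It is not meant as a diagnosis of the outage described above.

```python
def reverse_nibbles(h: int) -> int:
    """Reverse the order of the eight 4-bit nibbles in a 32-bit value."""
    out = 0
    for _ in range(8):
        out = (out << 4) | (h & 0xF)
        h >>= 4
    return out


def reverse_bits(h: int) -> int:
    """Reverse the order of the 32 bits in a 32-bit value."""
    out = 0
    for _ in range(32):
        out = (out << 1) | (h & 1)
        h >>= 1
    return out


# The same three object hashes, ordered under each assumed sort key.
hashes = [0x00000001, 0x00000002, 0x00000008]
nibblewise = sorted(hashes, key=reverse_nibbles)
bitwise = sorted(hashes, key=reverse_bits)
print([hex(h) for h in nibblewise], [hex(h) for h in bitwise])
```

Under these assumed keys, the two orders come out reversed for these particular hashes, which is why the flag is treated as a cluster-wide switch rather than something individual OSDs can disagree on.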

ceph.conf (2.1 KB) elder one, 05/19/2016 11:05 AM

ceph-osd.44.zip - crash log (539 KB) elder one, 05/19/2016 11:22 AM

History

#1 Updated by elder one almost 8 years ago

#2 Updated by elder one almost 8 years ago

Ubuntu 14.04, kernel 3.18.33

#3 Updated by Samuel Just over 7 years ago

  • Assignee set to Joao Eduardo Luis

#4 Updated by Greg Farnum almost 7 years ago

Ping Joao? This looks like a crash while persisting/trimming HitSets, which I know underwent a bunch of changes and fixes around time notation, so I suspect it's fixed now...
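As a rough picture of what "persisting/trimming HitSets" refers to: with "hit_set bloom{...} 3600s x1" on the cache pool, each PG tracks accessed objects in a current HitSet for a 3600-second period, then archives it, keeping at most hit_set_count (here 1) archived sets. The toy Python model below is a hypothetical sketch of that rotation under those assumptions; the class and method names are invented, and a plain set stands in for the bloom filter. It is not Ceph's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class HitSetArchive:
    """Toy model of HitSet rotation for one PG (hypothetical, not Ceph code)."""
    period_s: int  # assumed to mirror hit_set_period (3600s for this pool)
    count: int     # assumed to mirror hit_set_count ("x1" for this pool)
    start: int = 0
    current: set = field(default_factory=set)
    archived: list = field(default_factory=list)  # (start, end, objects) tuples

    def record(self, now: int, oid: str) -> None:
        """Record an access; rotate the current set if its period has elapsed."""
        if now - self.start >= self.period_s:
            # "Persist": freeze the current set into the archive.
            self.archived.append((self.start, now, frozenset(self.current)))
            self.current = set()
            self.start = now
            # "Trim": drop the oldest archives beyond `count`.
            while len(self.archived) > self.count:
                self.archived.pop(0)
        self.current.add(oid)


hs = HitSetArchive(period_s=3600, count=1)
hs.record(10, "obj1")
hs.record(3700, "obj2")  # first period elapsed: the set holding obj1 is archived
hs.record(7400, "obj3")  # second rotation: obj2's set archived, obj1's set trimmed
print(hs.archived)
```

In this model every rotation both writes an archive and trims an old one, so with a one-hour period and a count of 1 the persist and trim paths run together once per hour per PG.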

#5 Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category set to Tiering
  • Component(RADOS) OSD added

#6 Updated by Sage Weil over 6 years ago

  • Status changed from New to Can't reproduce

Not enough info here to go on.
