
Bug #23365

Ceph device class not honored for erasure coding.

Added by Brian Woods over 3 years ago. Updated about 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: -
Backport: -
Regression: No
Severity: 2 - major
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

To start, this cluster isn't happy. It is my destructive testing/learning cluster.

Recently I rebuilt the cluster adding SSDs (having used just HDDs before) and I have been having some issues. First performance dropped by a fair amount (down to just 2 MB/s per stream), then PGs suddenly went "not active" without any failures, and now it is all sorts of mad. But for this case, that is just context.

When I first added the SSDs I had their reweight set VERY low, to keep data from the old pools (which have since been removed) from migrating onto them. Thinking that could have been causing some of my issues, I returned the weight to normal. That triggered some rebalancing, but by the next day the SSDs had all filled (one died in the process; bad hardware). This puzzled me, as the metadata pool is only about 80 MB and the write cache hovers around 3.5 GB.

So I started digging, thinking that the data pool might have been migrated to the SSDs for some reason.

Let's get our data pool:

root@MediaServer:~# ceph df
...
    NAME                         ID     USED       %USED      MAX AVAIL     OBJECTS 
    MigrationPool                17      6308G     100.00             0     2019061 
    MigrationPool-Meta           18     70932k     100.00             0       88098 
    MigrationPool-WriteCache     19      3395M     100.00             0         875 
...

We are interested in ID 17; here is that pool's profile:

root@MediaServer:~# ceph osd pool get MigrationPool all
...
erasure_code_profile: Erasure-D5F1-HDD
...

That profile's device class:

root@MediaServer:~# ceph osd erasure-code-profile get Erasure-D5F1-HDD
crush-device-class=hdd
...
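If that class restriction were being applied, the pool's CRUSH rule (see `ceph osd pool get MigrationPool crush_rule` and `ceph osd crush rule dump`) should "take" the class-restricted shadow root, e.g. `default~hdd`. Here is a small illustrative check over a rule dump in that shape; the example rule JSON below is hypothetical, modeled on that command's output:

```python
def rule_device_class(rule):
    """Return the device class a CRUSH rule is pinned to, or None.

    Rules built from a profile with crush-device-class "take" a shadow
    item named like "default~hdd".
    """
    for step in rule.get("steps", []):
        if step.get("op") == "take":
            name = step.get("item_name", "")
            if "~" in name:
                return name.split("~", 1)[1]
    return None

# Hypothetical `ceph osd crush rule dump` entry for an EC rule created
# from a profile with crush-device-class=hdd.
hdd_rule = {
    "rule_name": "MigrationPool",
    "steps": [
        {"op": "set_chooseleaf_tries", "num": 5},
        {"op": "take", "item": -2, "item_name": "default~hdd"},
        {"op": "chooseleaf_indep", "num": 0, "type": "host"},
        {"op": "emit"},
    ],
}

print(rule_device_class(hdd_rule))  # hdd
```

If the real dumped rule's take step targets plain `default` instead of `default~hdd`, the class restriction never made it into the rule.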

Let's pick an SSD:

root@MediaServer:~# ceph osd df | sort -n -k1
ID CLASS WEIGHT  REWEIGHT SIZE   USE    AVAIL   %USE  VAR  PGS 
...
11   ssd 0.09999  1.00000 95392M 88634M   6757M 92.92 1.55  14 
...

And finally, let's look for pool ID 17 on that SSD:

root@MediaServer:~# ceph pg ls-by-osd 11 | grep '17\.'
17.16 31854 0 0 0 0 106917534705 1557 1557 active+clean+remapped 2018-03-14 04:11:17.392929 2996'82494 3490:274645 [2,1,2147483647,12,11,3] 2 [2,1,6,12, 11 ,3] 2 2996'82494 2018-03-14 04:11:17.392736 2996'82494 2018-03-10 23:07:48.358707
17.35 31370 0 0 0 0 104997447663 1623 1623 active+clean 2018-03-14 05:31:53.160644 2996'81594 3490:315943 [12,3,1,6,2,11] 12 [12,3,1,6,2, 11 ] 12 2996'81594 2018-03-14 05:31:53.160520 2993'81192 2018-03-07 15:01:11.463540
17.36 31787 0 0 0 0 106702587303 1500 1500 active+clean 2018-03-14 04:10:48.600589 2996'82305 3490:418755 [12,2,1,3,8,11] 12 [12,2,1,3,8, 11 ] 12 2996'82305 2018-03-14 04:10:48.600464 2996'82305 2018-03-12 05:00:40.453809
17.3e 31662 0 0 0 0 106070903943 1553 1553 active+clean 2018-03-14 03:49:39.645138 3013'81823 3490:289769 [2,1,3,6,11,12] 2 [2,1,3,6, 11 ,12] 2 3013'81823 2018-03-14 03:49:39.645014 3013'81823 2018-03-12 03:38:14.041181

That's not good... Ideas?
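Cross-checking those acting sets against my OSD classes (a throwaway sketch; both maps below are transcribed by hand from the outputs in this ticket) shows every one of those PGs carries a shard on osd.11, an SSD, which an hdd-restricted rule should never produce:

```python
# OSD id -> device class, transcribed from my `ceph osd tree`.
osd_class = {1: "hdd", 2: "hdd", 3: "hdd", 4: "hdd", 5: "hdd", 12: "hdd",
             6: "hdd", 8: "hdd", 9: "ssd", 10: "ssd", 11: "ssd", 7: "ssd"}

# Acting sets for the pool 17 PGs listed above.
acting = {
    "17.16": [2, 1, 6, 12, 11, 3],
    "17.35": [12, 3, 1, 6, 2, 11],
    "17.36": [12, 2, 1, 3, 8, 11],
    "17.3e": [2, 1, 3, 6, 11, 12],
}

def ssd_members(acting_set):
    """OSDs in an acting set whose class is ssd; this should be empty
    if crush-device-class=hdd were honored."""
    return [osd for osd in acting_set if osd_class.get(osd) == "ssd"]

for pg, osds in sorted(acting.items()):
    print(pg, ssd_members(osds))  # every PG reports [11]
```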

History

#1 Updated by Greg Farnum about 3 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (common)

What version are you running? How are your OSDs configured?

There was a bug with BlueStore SSDs being misreported as rotational for some purposes that may have caused this.
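One way to check for that misreport is to compare each OSD's reported rotational flag from `ceph osd metadata` against its CRUSH class. A rough sketch; the `rotational` field name and its "0"/"1" string values are assumptions about that command's JSON output:

```python
def class_rotational_mismatches(metadata, osd_class):
    """Return OSD ids whose reported rotational flag disagrees with
    their CRUSH class: an "ssd" OSD should report rotational "0",
    an "hdd" OSD "1"."""
    expected = {"ssd": "0", "hdd": "1"}
    bad = []
    for entry in metadata:
        cls = osd_class.get(entry["id"])
        if cls in expected and entry.get("rotational") != expected[cls]:
            bad.append(entry["id"])
    return bad

# Hypothetical excerpt of `ceph osd metadata` output: osd.11 is classed
# ssd but misreported as rotational -- the symptom described above.
metadata = [
    {"id": 1, "rotational": "1"},
    {"id": 11, "rotational": "1"},
]
print(class_rotational_mismatches(metadata, {1: "hdd", 11: "ssd"}))  # [11]
```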

#2 Updated by Brian Woods about 3 years ago

I said 12.2.2, but that is incorrect. It is ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable).

OSD Tree:

ID  CLASS WEIGHT  TYPE NAME            STATUS REWEIGHT PRI-AFF 
-10             0 datacenter home                              
 -1       9.53093 root default                                 
 -2       9.10095     host MediaServer                         
  1   hdd 4.20000         osd.1            up  1.00000 1.00000 
  2   hdd 1.54648         osd.2            up  1.00000 1.00000 
  3   hdd 1.45551         osd.3            up  1.00000 1.00000 
  4   hdd 0.17000         osd.4            up  1.00000 1.00000 
  5   hdd 0.00899         osd.5            up  1.00000 1.00000 
 12   hdd 1.50000         osd.12           up  1.00000 1.00000 
  9   ssd 0.06000         osd.9          down  1.00000 1.00000 
 10   ssd 0.06000         osd.10           up  1.00000 1.00000 
 11   ssd 0.09999         osd.11           up  1.00000 1.00000 
 -3       0.42998     host TheMonolith                         
  6   hdd 0.17000         osd.6            up  1.00000 1.00000 
  8   hdd 0.15999         osd.8            up  1.00000 1.00000 
  7   ssd 0.09999         osd.7            up  0.20000 1.00000 

Side note, the OSD that I thought had died is actually in a crash loop of some sort.

ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (()+0xa74234) [0x558d65225234]
 2: (()+0x11390) [0x7f66bdbd5390]
 3: (gsignal()+0x38) [0x7f66bcb70428]
 4: (abort()+0x16a) [0x7f66bcb7202a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x558d652689fe]
 6: (pi_compact_rep::add_interval(bool, PastIntervals::pg_interval_t const&)+0x435) [0x558d64eed115]
 7: (PastIntervals::check_new_interval(int, int, std::vector<int, std::allocator<int> > const&, 
std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&,
std::vector<int, std::allocator<int> > const&, unsigned int, unsigned int, std::shared_ptr<OSDMap const>, 
std::shared_ptr<OSDMap const>, pg_t, IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)+0x396) [0x558d64eca8a6]
 8: (OSD::build_past_intervals_parallel()+0xd9d) [0x558d64c7417d]
 9: (OSD::load_pgs()+0x14fb) [0x558d64c76c7b]
 10: (OSD::init()+0x2217) [0x558d64c94d07]
 11: (main()+0x2f07) [0x558d64ba3f17]
 12: (__libc_start_main()+0xf0) [0x7f66bcb5b830]
 13: (_start()+0x29) [0x558d64c2f6b9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.9.log
--- end dump of recent events ---

#3 Updated by Brian Woods about 3 years ago

A quote from Greg Farnum on the crash from another ticket:

Brian, that's a separate bug; the code address you've picked up on is just part of the generic failure handling code.

Reading it again, maybe it is just a hardware failure.
