Bug #13134

closed

old leveldb is slow (firefly->hammer upgrade leads to slow requests)

Added by Corin Langosch over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Yesterday I upgraded a firefly cluster (started with argonaut several years ago) to hammer 0.94.3. The cluster was completely healthy and seemed to work fine.

During the night, at 2015-09-17 04:51:14, osd.0 suddenly died; the log is attached. I only noticed it this morning, when recovery had already finished, but ceph still reported "284 requests are blocked > 32 sec":

ceph -w
    cluster 4ac0e21b-6ea2-4ac7-8114-122bd9ba55d6
     health HEALTH_WARN
            284 requests are blocked > 32 sec
     monmap e5: 3 mons at {a=10.0.0.5:6789/0,b=10.0.0.6:6789/0,c=10.0.0.7:6789/0}
            election epoch 798, quorum 0,1,2 a,b,c
     osdmap e21073: 19 osds: 9 up, 9 in
      pgmap v73120784: 4096 pgs, 1 pools, 3183 GB data, 797 kobjects
            6373 GB used, 2077 GB / 8451 GB avail
                4096 active+clean
  client io 29236 kB/s rd, 4152 kB/s wr, 1647 op/s

ceph health detail
HEALTH_WARN 284 requests are blocked > 32 sec; 1 osds have slow requests
284 ops are blocked > 4194.3 sec
284 ops are blocked > 4194.3 sec on osd.9
1 osds have slow requests
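
At that point the blocked ops can also be inspected directly on the affected osd via its admin socket (default admin socket path assumed):

  ceph daemon osd.9 dump_ops_in_flight    # ops currently blocked, with the state each one is waiting in
  ceph daemon osd.9 dump_historic_ops     # recently completed slow ops with per-step timings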

Looking at osd.9, everything looked normal: cpu, memory, iostat (0% - 5%). I waited a few more minutes, but nothing changed. A few qemu machines were unresponsive during that period. As the cluster was healthy except for that, I decided to simply restart osd.9 - and yes, the problem was solved and the cluster recovered to HEALTH_OK within a few seconds. Logs of osd.9 are attached too. Now, around 1h later, the cluster is still healthy.
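
For the record, the restart was done with the stock init script on osd.9's host; the exact command depends on the init system, so both variants below are assumptions about the setup:

  /etc/init.d/ceph restart osd.9    # sysvinit
  restart ceph-osd id=9             # upstart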

Looking at the logs, I assume there's probably a locking bug in the ceph filestore backend? Or maybe it's just a "tuning bug" in my ceph config?

Configuration:
  • hammer 0.94.3
  • full ssd, no spinning disks
  • 19 osds in total, but normally only 10 in/up (currently 9 because of the one that died during the night)
  • only used for qemu vms (around 200 and growing)
  • no errors in syslog/dmesg, iostat looks good
ID  WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY 
 -4 10.00000 root ssd                                               
 -2  2.00000     host r-ch102-ssd                                   
  0  1.00000         osd.0           down        0          1.00000 
 15  1.00000         osd.15            up  1.00000          1.00000 
 -5  2.00000     host r-ch103-ssd                                   
  3  1.00000         osd.3             up  1.00000          1.00000 
 16  1.00000         osd.16            up  1.00000          1.00000 
 -6  2.00000     host r-ch104-ssd                                   
  6  1.00000         osd.6             up  1.00000          1.00000 
 17  1.00000         osd.17            up  1.00000          1.00000 
 -7  2.00000     host r-ch105-ssd                                   
  9  1.00000         osd.9             up  1.00000          1.00000 
 18  1.00000         osd.18            up  1.00000          1.00000 
 -8  2.00000     host r-ch106-ssd                                   
 12  1.00000         osd.12            up  1.00000          1.00000 
 14  1.00000         osd.14            up  1.00000          1.00000 
 -1 26.09995 root hdd                                               
 -3  5.79999     host r-ch102-hdd                                   
  1  2.89999         osd.1           down        0          1.00000 
  2  2.89999         osd.2           down        0          1.00000 
 -9  5.79999     host r-ch103-hdd                                   
  4  2.89999         osd.4           down        0          1.00000 
  5  2.89999         osd.5           down        0          1.00000 
-10  5.79999     host r-ch104-hdd                                   
  7  2.89999         osd.7           down        0          1.00000 
  8  2.89999         osd.8           down        0          1.00000 
-11  5.79999     host r-ch105-hdd                                   
 10  2.89999         osd.10          down        0          1.00000 
 11  2.89999         osd.11          down        0          1.00000 
-12  2.89999     host r-ch106-hdd                                   
 13  2.89999         osd.13          down        0          1.00000 
[global]
  max open files = 65536
  auth cluster required = cephx
  auth service required = cephx
  auth client required = cephx
  cephx require signatures = true
  public network = 10.0.0.0/24
  cluster network = 10.0.0.0/24
  debug mon = 0
  debug paxos = 0
  mon osd down out interval = 3600

[client]
  rbd cache = true
  rbd cache size = 33554432
  rbd cache max dirty = 25165824
  rbd cache target dirty = 16777216
  rbd cache max dirty age = 3

[mon]
  osd pool default flag hashpspool = true
  mon data avail warn = 15
  mon data avail crit = 5

[osd]
  osd journal size = 1000
  osd journal dio = true
  osd journal aio = true
  osd op threads = 8
  osd min pg log entries = 250
  osd max pg log entries = 1000
  osd crush update on start = false
  filestore op threads = 16
  filestore max sync interval = 10
  filestore min sync interval = 3

[mon.a]
  host = r-ch103
  mon addr = 10.0.0.5:6789

[mon.b]
  host = r-ch104
  mon addr = 10.0.0.6:6789

[mon.c]
  host = r-ch105
  mon addr = 10.0.0.7:6789

[osd.0]
  host = r-ch102

[osd.1]
  host = r-ch102

[osd.2]
  host = r-ch102

[osd.3]
  host = r-ch103

[osd.4]
  host = r-ch103

[osd.5]
  host = r-ch103

[osd.6]
  host = r-ch104

[osd.7]
  host = r-ch104

[osd.8]
  host = r-ch104

[osd.9]
  host = r-ch105

[osd.10]
  host = r-ch105

[osd.11]
  host = r-ch105

[osd.12]
  host = r-ch106

[osd.13]
  host = r-ch106

[osd.14]
  host = r-ch106

[osd.15]
  host = r-ch102

[osd.16]
  host = r-ch103

[osd.17]
  host = r-ch104

[osd.18]
  host = r-ch105

Sidenote: upgrading pgs on old osds took very long (20 minutes per osd), while upgrading pgs on newer osds was very fast (15 s per osd). IO on the particular OSD's disk was idle during that time, but the osd process was constantly at 100% CPU. After everything was healthy again, I restarted the osd once more and then it booted fast. So I think it had to clean up a lot of old garbage when upgrading, and in the end everything was fine.
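
For reference, how much leveldb data an old filestore osd is dragging along (and how much such a cleanup frees) can be checked via the size of its omap directory; the path below assumes the default filestore layout:

  du -sh /var/lib/ceph/osd/ceph-0/current/omap    # leveldb omap data of a filestore osd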

Firefly was running really stable, so I'm now quite concerned that there's something wrong with my cluster (see the log greps sketched below):
  • why did osd.0 suddenly die?
  • why do blocked requests exist even though system resources are idle?
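
In case it helps, the interesting lines should be findable in the attached logs with greps like these (default log file names assumed):

  # crash/backtrace of osd.0
  grep -E -B 2 -A 20 'FAILED assert|Caught signal' ceph-osd.0.log

  # slow request warnings reported by osd.9
  grep 'slow request' ceph-osd.9.log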

Files

logs.tar.gz (791 KB) logs.tar.gz Corin Langosch, 09/17/2015 06:58 AM