Bug #21318
segv in rocksdb::BlockBasedTable::NewIndexIterator
Description
Hi, if you want more info there is a thread on ceph-users:
"OSD's flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1)"
There I was told to:
1. Go away and install 12.2 — so I waited until it was available for Proxmox from their own tree and upgraded today to the stable release.
2. Open a bug here.
The problem is that an OSD will crash on both a normal scrub and a deep scrub.
Some data on the main issue:
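For anyone trying to reproduce: scrubs do not have to be waited for — they can be triggered manually on a specific PG. A sketch (the PG id 1.d4 is taken from the log below; substitute any PG on the failing OSD):

```shell
# Trigger a normal scrub and a deep scrub on one PG, then watch the
# cluster log to see whether the primary OSD gets marked down.
ceph pg scrub 1.d4
ceph pg deep-scrub 1.d4
ceph -w
```

These commands need a running cluster with admin credentials, so they are only a reproduction sketch, not output from this cluster.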
FAIL ON DEEP SCRUB
2017-08-26 01:50:04.421944 osd.0 osd.0 192.168.1.240:6814/12991 7 : cluster [INF] 6.5 scrub ok
2017-08-26 01:50:09.331095 osd.0 osd.0 192.168.1.240:6814/12991 8 : cluster [INF] 1.1c scrub starts
2017-08-26 01:51:03.339509 osd.0 osd.0 192.168.1.240:6814/12991 9 : cluster [INF] 1.1c scrub ok
2017-08-26 02:21:00.706695 osd.10 osd.10 192.168.1.240:6806/21564 7 : cluster [INF] 1.d1 scrub starts
2017-08-26 02:21:34.066183 osd.10 osd.10 192.168.1.240:6806/21564 8 : cluster [INF] 1.d1 scrub ok
2017-08-26 02:21:56.943046 osd.8 osd.8 192.168.1.240:6810/22002 7 : cluster [INF] 1.17 scrub starts
2017-08-26 02:23:06.341409 osd.8 osd.8 192.168.1.240:6810/22002 8 : cluster [INF] 1.17 scrub ok
2017-08-26 02:35:51.099649 osd.8 osd.8 192.168.1.240:6810/22002 9 : cluster [INF] 1.89 scrub starts
2017-08-26 02:36:42.605600 osd.8 osd.8 192.168.1.240:6810/22002 10 : cluster [INF] 1.89 scrub ok
2017-08-26 02:38:27.132698 osd.8 osd.8 192.168.1.240:6810/22002 11 : cluster [INF] 1.ce scrub starts
2017-08-26 02:38:49.820489 osd.8 osd.8 192.168.1.240:6810/22002 12 : cluster [INF] 1.ce scrub ok
2017-08-26 03:23:27.619669 osd.8 osd.8 192.168.1.240:6810/22002 13 : cluster [INF] 1.8c scrub starts
2017-08-26 03:23:49.679403 osd.8 osd.8 192.168.1.240:6810/22002 14 : cluster [INF] 1.8c scrub ok
2017-08-26 03:32:19.475812 osd.0 osd.0 192.168.1.240:6814/12991 10 : cluster [INF] 1.d4 deep-scrub starts
2017-08-26 03:38:46.708163 mon.0 mon.0 192.168.1.240:6789/0 1201 : cluster [INF] osd.0 failed (root=default,host=proxmox1) (connection refused reported by osd.8)
2017-08-26 03:38:46.759470 mon.0 mon.0 192.168.1.240:6789/0 1207 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-08-26 03:38:49.820122 mon.0 mon.0 192.168.1.240:6789/0 1212 : cluster [WRN] Health check failed: Reduced data availability: 12 pgs inactive (PG_AVAILABILITY)
2017-08-26 03:38:49.820165 mon.0 mon.0 192.168.1.240:6789/0 1213 : cluster [WRN] Health check failed: Degraded data redundancy: 292260/3786364 objects degraded (7.719%), 38 pgs unclean, 38 pgs degraded (PG_DEGRADED)
2017-08-26 03:38:51.088934 mon.0 mon.0 192.168.1.240:6789/0 1214 : cluster [WRN] Health check update: Reduced data availability: 16 pgs inactive (PG_AVAILABILITY)
2017-08-26 03:38:51.088975 mon.0 mon.0 192.168.1.240:6789/0 1215 : cluster [WRN] Health check update: Degraded data redundancy: 392568/3786364 objects degraded (10.368%), 52 pgs unclean, 52 pgs degraded (PG_DEGRADED)
2017-08-26 03:38:53.090178 mon.0 mon.0 192.168.1.240:6789/0 1216 : cluster [WRN] Health check update: Reduced data availability: 29 pgs inactive (PG_AVAILABILITY)
2017-08-26 03:38:53.090216 mon.0 mon.0 192.168.1.240:6789/0 1217 : cluster [WRN] Health check update: Degraded data redundancy: 592033/3786364 objects degraded (15.636%), 82 pgs unclean, 82 pgs degraded (PG_DEGRADED)
2017-08-26 03:39:37.928816 mon.0 mon.0 192.168.1.240:6789/0 1220 : cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2017-08-26 03:39:37.941007 mon.0 mon.0 192.168.1.240:6789/0 1221 : cluster [INF] osd.0 192.168.1.240:6814/15727 boot
2017-08-26 03:39:39.949551 mon.0 mon.0 192.168.1.240:6789/0 1226 : cluster [WRN] Health check update: Degraded data redundancy: 436309/3786364 objects degraded (11.523%), 82 pgs unclean, 60 pgs degraded (PG_DEGRADED)
2017-08-26 03:39:41.974996 mon.0 mon.0 192.168.1.240:6789/0 1227 : cluster [WRN] Health check update: Degraded data redundancy: 379236/3786364 objects degraded (10.016%), 74 pgs unclean, 52 pgs degraded (PG_DEGRADED)
2017-08-26 03:39:43.120495 mon.0 mon.0 192.168.1.240:6789/0 1228 : cluster [WRN] Health check update: Degraded data redundancy: 22 pgs unclean (PG_DEGRADED)
2017-08-26 03:39:43.120534 mon.0 mon.0 192.168.1.240:6789/0 1229 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 29 pgs inactive)
2017-08-26 03:39:45.121340 mon.0 mon.0 192.168.1.240:6789/0 1230 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 22 pgs unclean)
2017-08-26 03:39:45.121386 mon.0 mon.0 192.168.1.240:6789/0 1231 : cluster [INF] Cluster is now healthy
2017-08-26 03:40:11.568499 osd.10 osd.10 192.168.1.240:6806/21564 9 : cluster [INF] 1.b5 scrub starts
2017-08-26 03:40:51.874519 osd.10 osd.10 192.168.1.240:6806/21564 10 : cluster [INF] 1.b5 scrub ok
2017-08-26 03:41:15.794026 osd.8 osd.8 192.168.1.240:6810/22002 15 : cluster [INF] 1.77 scrub starts
2017-08-26 03:42:19.561924 osd.8 osd.8 192.168.1.240:6810/22002 16 : cluster [INF] 1.77 scrub ok
2017-08-26 03:42:30.895351 osd.0 osd.0 192.168.1.240:6814/15727 1 : cluster [INF] 1.d4 deep-scrub starts
2017-08-26 03:42:30.842869 osd.8 osd.8 192.168.1.240:6810/22002 17 : cluster [INF] 1.12 scrub starts
2017-08-26 03:43:15.478366 osd.8 osd.8 192.168.1.240:6810/22002 18 : cluster [INF] 1.12 scrub ok
2017-08-26 03:47:17.962016 osd.0 osd.0 192.168.1.240:6814/15727 2 : cluster [INF] 1.d4 deep-scrub ok
2017-08-26 03:48:30.668792 osd.10 osd.10 192.168.1.240:6806/21564 11 : cluster [INF] 1.1 scrub starts
2017-08-26 03:49:05.879546 osd.10 osd.10 192.168.1.240:6806/21564 12 : cluster [INF] 1.1 scrub ok
2017-08-26 03:50:53.709500 osd.10 osd.10 192.168.1.240:6806/21564 13 : cluster [INF] 1.9d scrub starts
2017-08-26 03:52:13.278975 osd.10 osd.10 192.168.1.240:6806/21564 14 : cluster [INF] 1.9d scrub ok
2017-08-26 04:31:37.144944 osd.10 osd.10 192.168.1.240:6806/21564 15 : cluster [INF] 1.82 scrub starts
2017-08-26 04:32:35.917646 osd.10 osd.10 192.168.1.240:6806/21564 16 : cluster [INF] 1.82 scrub ok
2017-08-26 04:33:03.930435 osd.9 osd.9 192.168.1.240:6802/32411 36 : cluster [INF] 1.f4 scrub starts
2017-08-26 04:34:08.360134 osd.9 osd.9 192.168.1.240:6802/32411 37 : cluster [INF] 1.f4 scrub ok
FAIL ON NORMAL SCRUB
2017-08-25 23:28:55.310602 osd.8 osd.8 192.168.1.240:6806/2820 29 : cluster [INF] 6.3 deep-scrub starts
2017-08-25 23:28:55.415144 osd.8 osd.8 192.168.1.240:6806/2820 30 : cluster [INF] 6.3 deep-scrub ok
2017-08-25 23:29:01.273979 osd.8 osd.8 192.168.1.240:6806/2820 31 : cluster [INF] 1.d2 scrub starts
2017-08-25 23:30:47.518484 osd.8 osd.8 192.168.1.240:6806/2820 32 : cluster [INF] 1.d2 scrub ok
2017-08-25 23:31:40.311045 osd.8 osd.8 192.168.1.240:6806/2820 33 : cluster [INF] 1.6e scrub starts
2017-08-25 23:32:22.150274 osd.8 osd.8 192.168.1.240:6806/2820 34 : cluster [INF] 1.6e scrub ok
2017-08-25 23:32:58.297062 osd.9 osd.9 192.168.1.240:6802/7091 32 : cluster [INF] 1.d5 scrub starts
2017-08-25 23:35:19.285841 osd.9 osd.9 192.168.1.240:6802/7091 33 : cluster [INF] 1.d5 scrub ok
2017-08-25 23:36:38.375447 osd.8 osd.8 192.168.1.240:6806/2820 35 : cluster [INF] 1.3 scrub starts
2017-08-25 23:37:25.012116 osd.8 osd.8 192.168.1.240:6806/2820 36 : cluster [INF] 1.3 scrub ok
2017-08-25 23:38:29.406144 osd.8 osd.8 192.168.1.240:6806/2820 37 : cluster [INF] 1.45 scrub starts
2017-08-25 23:38:53.020365 mon.0 mon.0 192.168.1.240:6789/0 831 : cluster [INF] osd.9 failed (root=default,host=proxmox1) (connection refused reported by osd.8)
2017-08-25 23:38:53.166364 mon.0 mon.0 192.168.1.240:6789/0 832 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-08-25 23:38:56.200767 mon.0 mon.0 192.168.1.240:6789/0 837 : cluster [WRN] Health check failed: Degraded data redundancy: 100309/3786338 objects degraded (2.649%), 14 pgs unclean, 14 pgs degraded (PG_DEGRADED)
2017-08-25 23:38:58.155562 mon.0 mon.0 192.168.1.240:6789/0 838 : cluster [WRN] Health check failed: Reduced data availability: 1 pg inactive (PG_AVAILABILITY)
2017-08-25 23:38:58.155601 mon.0 mon.0 192.168.1.240:6789/0 839 : cluster [WRN] Health check update: Degraded data redundancy: 715775/3786338 objects degraded (18.904%), 101 pgs unclean, 102 pgs degraded (PG_DEGRADED)
2017-08-25 23:39:30.172451 mon.0 mon.0 192.168.1.240:6789/0 840 : cluster [WRN] Health check update: Degraded data redundancy: 715775/3786338 objects degraded (18.904%), 102 pgs unclean, 102 pgs degraded (PG_DEGRADED)
2017-08-25 23:39:47.851497 mon.0 mon.0 192.168.1.240:6789/0 843 : cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2017-08-25 23:39:47.864774 mon.0 mon.0 192.168.1.240:6789/0 844 : cluster [INF] osd.9 192.168.1.240:6802/32411 boot
2017-08-25 23:39:50.876761 mon.0 mon.0 192.168.1.240:6789/0 849 : cluster [WRN] Health check update: Degraded data redundancy: 672540/3786338 objects degraded (17.762%), 96 pgs unclean, 96 pgs degraded (PG_DEGRADED)
2017-08-25 23:39:52.184954 mon.0 mon.0 192.168.1.240:6789/0 850 : cluster [WRN] Health check update: Degraded data redundancy: 476349/3786338 objects degraded (12.581%), 69 pgs unclean, 69 pgs degraded (PG_DEGRADED)
2017-08-25 23:39:50.533429 osd.0 osd.0 192.168.1.240:6814/16223 13 : cluster [INF] 1.80 scrub starts
2017-08-25 23:39:55.056537 mon.0 mon.0 192.168.1.240:6789/0 851 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive)
2017-08-25 23:39:55.056574 mon.0 mon.0 192.168.1.240:6789/0 852 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 476349/3786338 objects degraded (12.581%), 69 pgs unclean, 69 pgs degraded)
2017-08-25 23:39:55.056591 mon.0 mon.0 192.168.1.240:6789/0 853 : cluster [INF] Cluster is now healthy
2017-08-25 23:40:17.806395 osd.0 osd.0 192.168.1.240:6814/16223 14 : cluster [INF] 1.80 scrub ok
2017-08-25 23:40:19.775012 osd.9 osd.9 192.168.1.240:6802/32411 1 : cluster [INF] 1.5a scrub starts
2017-08-25 23:40:46.458847 osd.9 osd.9 192.168.1.240:6802/32411 2 : cluster [INF] 1.5a scrub ok
2017-08-25 23:40:53.807218 osd.9 osd.9 192.168.1.240:6802/32411 3 : cluster [INF] 1.56 scrub starts
2017-08-25 23:41:16.197304 osd.9 osd.9 192.168.1.240:6802/32411 4 : cluster [INF] 1.56 scrub ok
2017-08-25 23:41:24.814502 osd.9 osd.9 192.168.1.240:6802/32411 5 : cluster [INF] 1.92 deep-scrub starts
2017-08-25 23:51:35.881952 osd.9 osd.9 192.168.1.240:6802/32411 6 : cluster [INF] 1.92 deep-scrub ok
2017-08-25 23:52:54.476268 osd.10 osd.10 192.168.1.240:6810/4355 39 : cluster [INF] 1.f2 scrub starts
2017-08-25 23:53:21.208291 osd.10 osd.10 192.168.1.240:6810/4355 40 : cluster [INF] 1.f2 scrub ok
2017-08-25 23:53:47.475879 osd.10 osd.10 192.168.1.240:6810/4355 41 : cluster [INF] 1.c8 deep-scrub starts
2017-08-26 00:01:08.611371 osd.10 osd.10 192.168.1.240:6810/4355 42 : cluster [INF] 1.c8 deep-scrub ok
root@proxmox1:/# ceph pg dump | egrep -v '^(0\.|1\.|2\.|3\.)' | egrep -v '(^pool\ (0|1|2|3))' | column -t
dumped all
version 9678
stamp 2017-08-27 01:27:53.321763
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
6.4 0 0 0 0 0 0 0 0 active+clean 2017-08-26 22:55:06.289033 0'0 4725:2848 [10,8] 10 [10,8] 10 0'0 2017-08-26 22:55:06.288961 0'0 2017-08-20 23:32:51.270895
6.5 0 0 0 0 0 0 0 0 active+clean 2017-08-26 23:03:07.062129 0'0 4726:2101 [0,10] 0 [0,10] 0 0'0 2017-08-26 01:50:04.421951 0'0 2017-08-22 14:26:19.915612
6.6 0 0 0 0 0 0 0 0 active+clean 2017-08-27 00:46:09.548107 0'0 4726:2344 [10,9] 10 [10,9] 10 0'0 2017-08-27 00:46:09.548029 0'0 2017-08-24 13:08:56.447183
6.7 0 0 0 0 0 0 0 0 active+clean 2017-08-26 22:52:44.635393 0'0 4725:1481 [10,8] 10 [10,8] 10 0'0 2017-08-25 22:02:26.297723 0'0 2017-08-23 15:55:58.299570
6.3 0 0 0 0 0 0 0 0 active+clean 2017-08-26 22:52:44.632667 0'0 4725:1971 [8,10] 8 [8,10] 8 0'0 2017-08-25 23:28:55.415148 0'0 2017-08-25 23:28:55.415148
5.0 18661 0 0 0 0 12583538 1563 1563 active+clean 2017-08-26 22:03:03.809158 4652'1197298 4725:1382436 [10,9] 10 [10,9] 10 4623'1197263 2017-08-26 19:49:19.819627 4270'1161119 2017-08-20 02:04:03.373813
6.2 0 0 0 0 0 0 0 0 active+clean 2017-08-26 22:52:45.677622 0'0 4725:1440 [9,8] 9 [9,8] 9 0'0 2017-08-26 20:58:34.722865 0'0 2017-08-26 20:58:34.722865
5.1 18878 0 0 0 0 12583048 1573 1573 active+clean 2017-08-26 23:03:07.062298 4640'959478 4726:1131301 [0,8] 0 [0,8] 0 4596'958844 2017-08-26 13:47:19.329350 4393'956123 2017-08-25 09:32:09.556396
6.1 0 0 0 0 0 0 0 0 active+clean 2017-08-26 22:52:44.736333 0'0 4725:1615 [8,9] 8 [8,9] 8 0'0 2017-08-26 01:28:24.476136 0'0 2017-08-22 16:20:13.243273
5.2 18472 0 0 0 0 32462655 1592 1592 active+clean 2017-08-26 22:52:44.634997 4652'952265 4725:1174014 [10,8] 10 [10,8] 10 4652'952265 2017-08-26 22:45:06.916647 4270'930889 2017-08-23 05:50:46.370503
6.0 0 0 0 0 0 0 0 0 active+clean 2017-08-26 23:03:07.061426 0'0 4726:2441 [10,0] 10 [10,0] 10 0'0 2017-08-26 21:59:03.746276 0'0 2017-08-23 02:26:18.206975
5.3 18512 0 0 0 0 10928869 1519 1519 active+clean 2017-08-26 23:03:07.062484 4639'984496 4726:1199339 [0,8] 0 [0,8] 0 4531'983789 2017-08-26 00:09:32.283691 4270'975964 2017-08-23 16:15:09.546043
5 74523 0 0 0 0 68558110 6247 6247
6 0 0 0 0 0 0 0 0
1 1821197 0 0 0 0 6962542387273 401319 401319
sum 1895720 0 0 0 0 6962610945383 407566 407566
OSD_STAT USED AVAIL TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
8 4288G 3163G 7451G [0,9,10] 177 93
10 4240G 3211G 7451G [0,8,9] 175 93
0 1984G 809G 2794G [8,9,10] 82 37
9 2492G 1233G 3725G [0,8,10] 102 45
sum 13005G 8418G 21424G
root@proxmox1:~# ceph versions
{
"mon": {
"ceph version 12.1.2 (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc)": 1
},
"mgr": {
"ceph version 12.1.2 (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc)": 1
},
"osd": {
"ceph version 12.1.2 (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc)": 4
},
"mds": {
"ceph version 12.1.2 (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc)": 1
},
"overall": {
"ceph version 12.1.2 (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc)": 7
}
}
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 device1
device 2 device2
device 3 device3
device 4 device4
device 5 device5
device 6 device6
device 7 device7
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host proxmox1 {
id -2 # do not change unnecessarily
id -3 class hdd # do not change unnecessarily # weight 20.922
alg straw
hash 0 # rjenkins1
item osd.10 weight 7.277
item osd.9 weight 3.639
item osd.0 weight 2.729
item osd.8 weight 7.277
}
root default {
id -1 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily # weight 20.922
alg straw
hash 0 # rjenkins1
item proxmox1 weight 20.922
}
# rules
rule replicated_ruleset {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type osd
step emit
}
# end crush map
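For reference, a decompiled map in the text form above can be regenerated on any node with the standard crushtool round-trip:

```shell
# Extract the compiled CRUSH map from the cluster and decompile it
# into the human-readable text shown above.
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
cat crush.txt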
Related issues
History
#1 Updated by Tomasz Kusmierz over 6 years ago
I was asked to add this in:
Sep 08 18:48:05 proxmox1 ceph-osd3954: *** Caught signal (Segmentation fault) **
Sep 08 18:48:05 proxmox1 ceph-osd3954: in thread 7f5c883f0700 thread_name:tp_osd_tp
Sep 08 18:48:05 proxmox1 ceph-osd3954: ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
Sep 08 18:48:05 proxmox1 ceph-osd3954: 1: (()+0xa07bb4) [0x55e157f22bb4]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 2: (()+0x110c0) [0x7f5ca94030c0]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 3: (()+0x1ff2f) [0x7f5caba05f2f]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 4: (rocksdb::BlockBasedTable::NewIndexIterator(rocksdb::ReadOptions const&, rocksdb::BlockIter*, rocksdb::BlockBasedTable::CachableEntry<rocksdb::BlockBasedTable::IndexReader>*)+0x4e6) [0x55e158306bb6]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 5: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, bool)+0x283) [0x55e158307963]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 6: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::HistogramImpl*, bool, int)+0x13a) [0x55e1583e718a]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 7: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, rocksdb::RangeDelAggregator*, bool*, bool*, unsigned long*)+0x3f8) [0x55e1582c8c28]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 8: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*)+0x552) [0x55e15838d682]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 9: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x13) [0x55e15838dab3]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 10: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0xc1) [0x55e157e6bb51]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 11: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::list*)+0x1bb) [0x55e157e6308b]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 12: (()+0x885d71) [0x55e157da0d71]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 13: (()+0x885675) [0x55e157da0675]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 14: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x5d7) [0x55e157de48c7]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 15: (BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x118) [0x55e157e05cd8]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 16: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc5) [0x55e157e06755]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 17: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x55e157e0807b]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 18: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1f55) [0x55e157e1ec15]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 19: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x536) [0x55e157e1f916]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 20: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x66) [0x55e157b437f6]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 21: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xbdc) [0x55e157c70d6c]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 22: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2b7) [0x55e157c73b47]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 23: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55e157b810d0]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 24: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x4e3) [0x55e157ae6a83]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 25: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x55e15796b19b]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 26: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x55e157c0354a]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 27: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x55e157991d9d]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 28: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x55e157f6f20f]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 29: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55e157f72510]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 30: (()+0x7494) [0x7f5ca93f9494]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 31: (clone()+0x3f) [0x7f5ca8480aff]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 2017-09-08 18:48:05.257014 7f5c883f0700 -1 *** Caught signal (Segmentation fault) **
Sep 08 18:48:05 proxmox1 ceph-osd3954: in thread 7f5c883f0700 thread_name:tp_osd_tp
Sep 08 18:48:05 proxmox1 ceph-osd3954: ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
Sep 08 18:48:05 proxmox1 ceph-osd3954: 1: (()+0xa07bb4) [0x55e157f22bb4]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 2: (()+0x110c0) [0x7f5ca94030c0]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 3: (()+0x1ff2f) [0x7f5caba05f2f]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 4: (rocksdb::BlockBasedTable::NewIndexIterator(rocksdb::ReadOptions const&, rocksdb::BlockIter*, rocksdb::BlockBasedTable::CachableEntry<rocksdb::BlockBasedTable::IndexReader>*)+0x4e6) [0x55e158306bb6]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 5: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, bool)+0x283) [0x55e158307963]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 6: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::HistogramImpl*, bool, int)+0x13a) [0x55e1583e718a]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 7: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, rocksdb::RangeDelAggregator*, bool*, bool*, unsigned long*)+0x3f8) [0x55e1582c8c28]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 8: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*)+0x552) [0x55e15838d682]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 9: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x13) [0x55e15838dab3]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 10: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0xc1) [0x55e157e6bb51]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 11: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::list*)+0x1bb) [0x55e157e6308b]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 12: (()+0x885d71) [0x55e157da0d71]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 13: (()+0x885675) [0x55e157da0675]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 14: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x5d7) [0x55e157de48c7]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 15: (BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x118) [0x55e157e05cd8]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 16: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc5) [0x55e157e06755]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 17: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x55e157e0807b]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 18: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1f55) [0x55e157e1ec15]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 19: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x536) [0x55e157e1f916]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 20: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x66) [0x55e157b437f6]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 21: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xbdc) [0x55e157c70d6c]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 22: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2b7) [0x55e157c73b47]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 23: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55e157b810d0]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 24: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x4e3) [0x55e157ae6a83]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 25: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x55e15796b19b]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 26: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x55e157c0354a]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 27: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x55e157991d9d]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 28: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x55e157f6f20f]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 29: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55e157f72510]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 30: (()+0x7494) [0x7f5ca93f9494]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 31: (clone()+0x3f) [0x7f5ca8480aff]
Sep 08 18:48:05 proxmox1 ceph-osd3954: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 08 18:48:05 proxmox1 ceph-osd3954: 0> 2017-09-08 18:48:05.257014 7f5c883f0700 -1 *** Caught signal (Segmentation fault) **
Sep 08 18:48:05 proxmox1 ceph-osd3954: in thread 7f5c883f0700 thread_name:tp_osd_tp
Sep 08 18:48:05 proxmox1 ceph-osd3954: ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
Sep 08 18:48:05 proxmox1 ceph-osd3954: 1: (()+0xa07bb4) [0x55e157f22bb4]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 2: (()+0x110c0) [0x7f5ca94030c0]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 3: (()+0x1ff2f) [0x7f5caba05f2f]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 4: (rocksdb::BlockBasedTable::NewIndexIterator(rocksdb::ReadOptions const&, rocksdb::BlockIter*, rocksdb::BlockBasedTable::CachableEntry<rocksdb::BlockBasedTable::IndexReader>*)+0x4e6) [0x55e158306bb6]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 5: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, bool)+0x283) [0x55e158307963]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 6: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::HistogramImpl*, bool, int)+0x13a) [0x55e1583e718a]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 7: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, rocksdb::RangeDelAggregator*, bool*, bool*, unsigned long*)+0x3f8) [0x55e1582c8c28]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 8: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*)+0x552) [0x55e15838d682]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 9: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x13) [0x55e15838dab3]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 10: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0xc1) [0x55e157e6bb51]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 11: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::list*)+0x1bb) [0x55e157e6308b]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 12: (()+0x885d71) [0x55e157da0d71]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 13: (()+0x885675) [0x55e157da0675]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 14: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned int)+0x5d7) [0x55e157de48c7]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 15: (BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x118) [0x55e157e05cd8]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 16: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc5) [0x55e157e06755]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 17: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7b) [0x55e157e0807b]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 18: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1f55) [0x55e157e1ec15]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 19: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x536) [0x55e157e1f916]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 20: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x66) [0x55e157b437f6]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 21: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xbdc) [0x55e157c70d6c]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 22: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2b7) [0x55e157c73b47]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 23: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55e157b810d0]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 24: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x4e3) [0x55e157ae6a83]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 25: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x55e15796b19b]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 26: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x55e157c0354a]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 27: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x55e157991d9d]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 28: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x55e157f6f20f]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 29: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55e157f72510]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 30: (()+0x7494) [0x7f5ca93f9494]
Sep 08 18:48:05 proxmox1 ceph-osd3954: 31: (clone()+0x3f) [0x7f5ca8480aff]
Sep 08 18:48:05 proxmox1 ceph-osd3954: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 08 18:48:05 proxmox1 systemd1: ceph-osd@0.service: Main process exited, code=killed, status=11/SEGV
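As the NOTE in the trace says, the raw frame offsets can be resolved against the installed binary. One possible way, assuming the Debian/Proxmox packaging with matching debug symbols installed (the `ceph-osd-dbg` package name and the 0xa07bb4 offset from frame 1 are just illustrative):

```shell
# Resolve a raw in-binary offset from the backtrace to a symbol/line.
# Requires debug symbols for the exact ceph version that crashed.
addr2line -Cfe /usr/bin/ceph-osd 0xa07bb4
# Or disassemble with source interleaved, as the NOTE suggests:
objdump -rdS /usr/bin/ceph-osd > ceph-osd.dis
```

This only produces meaningful output against the same build that generated the trace (12.2.0, 36f6c5ea).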
#2 Updated by Sage Weil over 6 years ago
on one of the failing osds, can you do
CEPH_ARGS="--no-log-to-stderr --log-file out --debug-bluestore 20" ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0
ceph-post-file log
and reply with the upload code? Thanks!
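For clarity, the request above is two separate commands: an offline fsck with debug output written to a file named out, then an upload of that file. A sketch (osd path taken from this thread; the existence check is added here so the snippet is harmless on a box without Ceph installed):

```shell
# Sketch of the two-step fsck + upload workflow requested above.
# CEPH_ARGS passes extra options to the tool: no stderr noise,
# log to ./out, bluestore debug level 20.
export CEPH_ARGS="--no-log-to-stderr --log-file out --debug-bluestore 20"
if command -v ceph-bluestore-tool >/dev/null 2>&1; then
    # fsck needs exclusive access to the OSD data dir, so stop the OSD first
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0
    # then share the resulting debug log with the devs
    ceph-post-file out
fi
```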
#3 Updated by Tomasz Kusmierz over 6 years ago
action fsck
error from fsck: (11) Resource temporarily unavailable
#4 Updated by Tomasz Kusmierz over 6 years ago
ceph-post-file: b2b3a8d3-35cb-45f0-8145-de5ff54753ba
#5 Updated by Sage Weil over 6 years ago
- Status changed from New to Need More Info
- Priority changed from Normal to High
Hmm, fsck doesn't report any errors. Can you check the other osds that have crashed?
Also, if this is reproducible (a restarted osd will crash again), can you add 'debug bluestore = 20' to the ceph.conf, restart the osd, get it to crash again, and then ceph-post-file the resulting log file?
Thanks!
#6 Updated by Tomasz Kusmierz over 6 years ago
CEPH_ARGS="--no-log-to-stderr --log-file out --debug-bluestore 20" ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-8; ceph-post-file out
action fsck
args: -- out
/usr/bin/ceph-post-file: upload tag 5fde94bc-9ccd-4484-b970-dda664130726
/usr/bin/ceph-post-file: user: root@proxmox1.vixake.com
/usr/bin/ceph-post-file: will upload file out
sftp> mkdir post/5fde94bc-9ccd-4484-b970-dda664130726_root@proxmox1.vixake.com_733493e7-7b96-4f9f-a54a-04269dfd9b8d
sftp> cd post/5fde94bc-9ccd-4484-b970-dda664130726_root@proxmox1.vixake.com_733493e7-7b96-4f9f-a54a-04269dfd9b8d
sftp> put /tmp/tmp.cgicaKJJ44 user
sftp> put out
/usr/bin/ceph-post-file: copy the upload id below to share with a dev:
ceph-post-file: 5fde94bc-9ccd-4484-b970-dda664130726
I will continue with the next two OSDs. This is here just so I don't lose it.
#7 Updated by Tomasz Kusmierz over 6 years ago
rm out; CEPH_ARGS="--no-log-to-stderr --log-file out --debug-bluestore 20" ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-9; ceph-post-file out
action fsck
args: -- out
/usr/bin/ceph-post-file: upload tag 21ffd5e9-f128-4ac7-b33d-5391cdfda307
/usr/bin/ceph-post-file: user: root@proxmox1.vixake.com
/usr/bin/ceph-post-file: will upload file out
sftp> mkdir post/21ffd5e9-f128-4ac7-b33d-5391cdfda307_root@proxmox1.vixake.com_5ddf730d-1eef-4cf0-bc9c-f5a45b83539f
sftp> cd post/21ffd5e9-f128-4ac7-b33d-5391cdfda307_root@proxmox1.vixake.com_5ddf730d-1eef-4cf0-bc9c-f5a45b83539f
sftp> put /tmp/tmp.6r3HDRVhwm user
sftp> put out
/usr/bin/ceph-post-file: copy the upload id below to share with a dev:
ceph-post-file: 21ffd5e9-f128-4ac7-b33d-5391cdfda307
#8 Updated by Tomasz Kusmierz over 6 years ago
rm out; CEPH_ARGS="--no-log-to-stderr --log-file out --debug-bluestore 20" ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-10; ceph-post-file out
action fsck
args: -- out
/usr/bin/ceph-post-file: upload tag fb9edfb3-977a-4631-96e4-17bfc8afc672
/usr/bin/ceph-post-file: user: root@proxmox1.vixake.com
/usr/bin/ceph-post-file: will upload file out
sftp> mkdir post/fb9edfb3-977a-4631-96e4-17bfc8afc672_root@proxmox1.vixake.com_d99383b0-f100-44b1-887c-f001e90de5f0
sftp> cd post/fb9edfb3-977a-4631-96e4-17bfc8afc672_root@proxmox1.vixake.com_d99383b0-f100-44b1-887c-f001e90de5f0
sftp> put /tmp/tmp.4loGHRVe70 user
sftp> put out
/usr/bin/ceph-post-file: copy the upload id below to share with a dev:
ceph-post-file: fb9edfb3-977a-4631-96e4-17bfc8afc672
So that's it, folks.
Any more logs requested?
#9 Updated by Tomasz Kusmierz over 6 years ago
So OK. I did reboot the machine and everything started smoothly. I waited an hour for the cluster to be absolutely stable, then tried to access some data on CephFS:
2017-09-14 23:13:59.449831 mon.0 mon.0 192.168.1.240:6789/0 180 : cluster [INF] Cluster is now healthy
2017-09-15 00:31:47.080133 mon.0 mon.0 192.168.1.240:6789/0 198 : cluster [INF] osd.10 failed (root=default,host=proxmox1) (connection refused reported by osd.9)
2017-09-15 00:31:47.140900 mon.0 mon.0 192.168.1.240:6789/0 204 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-09-15 00:31:50.224752 mon.0 mon.0 192.168.1.240:6789/0 209 : cluster [WRN] Health check failed: Reduced data availability: 19 pgs inactive (PG_AVAILABILITY)
2017-09-15 00:31:50.224792 mon.0 mon.0 192.168.1.240:6789/0 210 : cluster [WRN] Health check failed: Degraded data redundancy: 1219462/3775548 objects degraded (32.299%), 127 pgs unclean, 175 pgs degraded (PG_DEGRADED)
2017-09-15 00:31:55.620947 mon.0 mon.0 192.168.1.240:6789/0 211 : cluster [WRN] Health check update: Degraded data redundancy: 1219462/3775554 objects degraded (32.299%), 127 pgs unclean, 175 pgs degraded (PG_DEGRADED)
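As an aside, the percentage in the PG_DEGRADED lines is simply degraded objects over total object instances; the 32.299% above can be checked directly (a quick sanity check, not part of the original report):

```shell
# Reproduce the monitor's "objects degraded" figure from the log line above:
# 1219462 degraded out of 3775548 total object instances.
awk 'BEGIN { printf "%.3f%%\n", 100 * 1219462 / 3775548 }'
# prints 32.299%
```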
#10 Updated by Tomasz Kusmierz over 6 years ago
Sep 15 00:31:46 proxmox1 ceph-osd7191: * Caught signal (Segmentation fault) *
Sep 15 00:31:46 proxmox1 ceph-osd7191: in thread 7fe98c3f3700 thread_name:tp_osd_tp
Sep 15 00:31:46 proxmox1 ceph-osd7191: ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
Sep 15 00:31:46 proxmox1 ceph-osd7191: 1: (()+0xa07bb4) [0x562f557cfbb4]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 2: (()+0x110c0) [0x7fe9aa2710c0]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 3: (()+0x1ff2f) [0x7fe9ac873f2f]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 4: (rocksdb::BlockBasedTable::NewIndexIterator(rocksdb::ReadOptions const&, rocksdb::BlockIter*, rocksdb::BlockBasedTable::CachableEntry<rocksdb::BlockBasedTable::IndexReader>*)+0x4e6) [0x562f55bb3bb6]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 5: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, bool)+0x283) [0x562f55bb4963]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 6: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::HistogramImpl*, bool, int)+0x13a) [0x562f55c9418a]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 7: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, rocksdb::RangeDelAggregator*, bool*, bool*, unsigned long*)+0x3f8) [0x562f55b75c28]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 8: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*)+0x552) [0x562f55c3a682]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 9: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x13) [0x562f55c3aab3]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 10: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0xc1) [0x562f55718b51]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 11: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::list*)+0x158) [0x562f5570fa28]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 12: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae) [0x562f5569442e]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 13: (BlueStore::getattr(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, char const*, ceph::buffer::ptr&)+0xf6) [0x562f55695456]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 14: (PGBackend::objects_get_attr(hobject_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::list*)+0x106) [0x562f554262d6]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 15: (PrimaryLogPG::get_object_context(hobject_t const&, bool, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > > const*)+0xa3e) [0x562f55397cce]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 16: (PrimaryLogPG::find_object_context(hobject_t const&, std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x387) [0x562f5539b687]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 17: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x21ed) [0x562f553d958d]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 18: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xea6) [0x562f55394446]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 19: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x562f5521819b]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 20: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x562f554b054a]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 21: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x562f5523ed9d]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 22: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x562f5581c20f]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 23: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562f5581f510]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 24: (()+0x7494) [0x7fe9aa267494]
Sep 15 00:31:46 proxmox1 ceph-osd7191: 25: (clone()+0x3f) [0x7fe9a92eeaff]
Sep 15 00:31:46 proxmox1 ceph-osd7191: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 15 00:31:47 proxmox1 systemd1: ceph-osd@10.service: Main process exited, code=killed, status=11/SEGV
Sep 15 00:31:47 proxmox1 systemd1: ceph-osd@10.service: Unit entered failed state.
Sep 15 00:31:47 proxmox1 systemd1: ceph-osd@10.service: Failed with result 'signal'.
Sep 15 00:31:47 proxmox1 kernel: libceph: osd10 192.168.1.240:6814 socket closed (con state OPEN)
Sep 15 00:31:47 proxmox1 kernel: libceph: osd10 192.168.1.240:6814 socket closed (con state CONNECTING)
Sep 15 00:31:47 proxmox1 kernel: libceph: osd10 down
#11 Updated by Tomasz Kusmierz over 6 years ago
OK, so I did as advised with:
[global]
debug bluestore = 20
[osd]
debug bluestore = 20
Here osd.10 fails right on cue:
2017-09-16 19:51:03.942591 mon.0 mon.0 192.168.1.240:6789/0 246 : cluster [INF] Cluster is now healthy
2017-09-16 21:23:41.705597 mon.0 mon.0 192.168.1.240:6789/0 292 : cluster [INF] osd.10 failed (root=default,host=proxmox1) (connection refused reported by osd.0)
2017-09-16 21:23:41.756966 mon.0 mon.0 192.168.1.240:6789/0 293 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-09-16 21:23:44.827529 mon.0 mon.0 192.168.1.240:6789/0 298 : cluster [WRN] Health check failed: Reduced data availability: 19 pgs inactive (PG_AVAILABILITY)
2017-09-16 21:23:44.827574 mon.0 mon.0 192.168.1.240:6789/0 299 : cluster [WRN] Health check failed: Degraded data redundancy: 1020757/3776242 objects degraded (27.031%), 142 pgs unclean, 145 pgs degraded (PG_DEGRADED)
2017-09-16 21:23:46.531644 mon.0 mon.0 192.168.1.240:6789/0 300 : cluster [WRN] Health check update: Reduced data availability: 24 pgs inactive (PG_AVAILABILITY)
2017-09-16 21:23:46.531677 mon.0 mon.0 192.168.1.240:6789/0 301 : cluster [WRN] Health check update: Degraded data redundancy: 1219677/3776242 objects degraded (32.299%), 171 pgs unclean, 175 pgs degraded (PG_DEGRADED)
2017-09-16 21:23:58.539188 mon.0 mon.0 192.168.1.240:6789/0 302 : cluster [WRN] Health check failed: 1 slow requests are blocked > 32 sec (REQUEST_SLOW)
2017-09-16 21:24:00.540138 mon.0 mon.0 192.168.1.240:6789/0 303 : cluster [WRN] Health check update: Degraded data redundancy: 1219677/3776242 objects degraded (32.299%), 172 pgs unclean, 175 pgs degraded (PG_DEGRADED)
2017-09-16 21:24:12.735259 osd.8 osd.8 192.168.1.240:6810/11662 11 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.957971 secs
2017-09-16 21:24:12.735277 osd.8 osd.8 192.168.1.240:6810/11662 12 : cluster [WRN] slow request 30.957971 seconds old, received at 2017-09-16 21:23:41.777206: osd_op(client.1304422.1:6187 1.2c3fedf5 1:afb7fc34:::1000000038d.0000003f:head [read 0~3407872 [1@-1]] snapc 0=[] RETRY=2 ack+retry+read+known_if_redirected e7144) currently waiting for active
2017-09-16 21:24:18.856839 mon.0 mon.0 192.168.1.240:6789/0 304 : cluster [WRN] Health check update: Degraded data redundancy: 1219677/3776242 objects degraded (32.299%), 173 pgs unclean, 175 pgs degraded (PG_DEGRADED)
2017-09-16 21:24:18.856879 mon.0 mon.0 192.168.1.240:6789/0 305 : cluster [WRN] Health check update: 3 slow requests are blocked > 32 sec (REQUEST_SLOW)
2017-09-16 21:24:20.552185 mon.0 mon.0 192.168.1.240:6789/0 306 : cluster [WRN] Health check update: Degraded data redundancy: 1219677/3776242 objects degraded (32.299%), 174 pgs unclean, 175 pgs degraded (PG_DEGRADED)
2017-09-16 21:24:28.738204 osd.8 osd.8 192.168.1.240:6810/11662 13 : cluster [WRN] 2 slow requests, 1 included below; oldest blocked for > 46.960950 secs
2017-09-16 21:24:28.738209 osd.8 osd.8 192.168.1.240:6810/11662 14 : cluster [WRN] slow request 30.900214 seconds old, received at 2017-09-16 21:23:57.837942: osd_op(client.1304422.1:6191 1.2c3fedf5 1:afb7fc34:::1000000038d.0000003f:head [read 3407872~786432 [1@-1]] snapc 0=[] ack+read+known_if_redirected e7145) currently waiting for active
2017-09-16 21:24:30.738467 osd.8 osd.8 192.168.1.240:6810/11662 15 : cluster [WRN] 3 slow requests, 1 included below; oldest blocked for > 48.961217 secs
2017-09-16 21:24:30.738473 osd.8 osd.8 192.168.1.240:6810/11662 16 : cluster [WRN] slow request 30.966784 seconds old, received at 2017-09-16 21:23:59.771639: osd_op(client.1304422.1:6194 1.fff80ff7 1:eff01fff:::10000101cdc.00000000:head [write 0~4096 [2@0],write 16384~16384 [2@0]] snapc 1=[] ondisk+write+known_if_redirected e7145) currently waiting for active
2017-09-16 21:24:42.565202 mon.0 mon.0 192.168.1.240:6789/0 307 : cluster [WRN] Health check update: Reduced data availability: 175 pgs inactive (PG_AVAILABILITY)
2017-09-16 21:24:42.565240 mon.0 mon.0 192.168.1.240:6789/0 308 : cluster [WRN] Health check update: Degraded data redundancy: 1219677/3776242 objects degraded (32.299%), 175 pgs unclean, 175 pgs degraded (PG_DEGRADED)
2017-09-16 21:24:44.566833 mon.0 mon.0 192.168.1.240:6789/0 309 : cluster [WRN] Health check update: Degraded data redundancy: 1219677/3776242 objects degraded (32.299%), 175 pgs unclean, 175 pgs degraded, 175 pgs undersized (PG_DEGRADED)
2017-09-16 21:24:42.740998 osd.8 osd.8 192.168.1.240:6810/11662 17 : cluster [WRN] 3 slow requests, 1 included below; oldest blocked for > 60.963759 secs
2017-09-16 21:24:42.741005 osd.8 osd.8 192.168.1.240:6810/11662 18 : cluster [WRN] slow request 60.963759 seconds old, received at 2017-09-16 21:23:41.777206: osd_op(client.1304422.1:6187 1.2c3fedf5 1:afb7fc34:::1000000038d.0000003f:head [read 0~3407872 [1@-1]] snapc 0=[] RETRY=2 ack+retry+read+known_if_redirected e7144) currently waiting for active
2017-09-16 21:24:58.744604 osd.8 osd.8 192.168.1.240:6810/11662 19 : cluster [WRN] 3 slow requests, 1 included below; oldest blocked for > 76.967356 secs
2017-09-16 21:24:58.744612 osd.8 osd.8 192.168.1.240:6810/11662 20 : cluster [WRN] slow request 60.906620 seconds old, received at 2017-09-16 21:23:57.837942: osd_op(client.1304422.1:6191 1.2c3fedf5 1:afb7fc34:::1000000038d.0000003f:head [read 3407872~786432 [1@-1]] snapc 0=[] ack+read+known_if_redirected e7145) currently waiting for active
2017-09-16 21:25:00.745097 osd.8 osd.8 192.168.1.240:6810/11662 21 : cluster [WRN] 3 slow requests, 1 included below; oldest blocked for > 78.967855 secs
2017-09-16 21:25:00.745103 osd.8 osd.8 192.168.1.240:6810/11662 22 : cluster [WRN] slow request 60.973422 seconds old, received at 2017-09-16 21:23:59.771639: osd_op(client.1304422.1:6194 1.fff80ff7 1:eff01fff:::10000101cdc.00000000:head [write 0~4096 [2@0],write 16384~16384 [2@0]] snapc 1=[] ondisk+write+known_if_redirected e7145) currently waiting for active
2017-09-16 21:25:42.753323 osd.8 osd.8 192.168.1.240:6810/11662 23 : cluster [WRN] 3 slow requests, 1 included below; oldest blocked for > 120.976084 secs
2017-09-16 21:25:42.753329 osd.8 osd.8 192.168.1.240:6810/11662 24 : cluster [WRN] slow request 120.976084 seconds old, received at 2017-09-16 21:23:41.777206: osd_op(client.1304422.1:6187 1.2c3fedf5 1:afb7fc34:::1000000038d.0000003f:head [read 0~3407872 [1@-1]] snapc 0=[] RETRY=2 ack+retry+read+known_if_redirected e7144) currently waiting for active
2017-09-16 21:25:58.756301 osd.8 osd.8 192.168.1.240:6810/11662 25 : cluster [WRN] 3 slow requests, 1 included below; oldest blocked for > 136.979063 secs
2017-09-16 21:25:58.756307 osd.8 osd.8 192.168.1.240:6810/11662 26 : cluster [WRN] slow request 120.918327 seconds old, received at 2017-09-16 21:23:57.837942: osd_op(client.1304422.1:6191 1.2c3fedf5 1:afb7fc34:::1000000038d.0000003f:head [read 3407872~786432 [1@-1]] snapc 0=[] ack+read+known_if_redirected e7145) currently waiting for active
2017-09-16 21:26:00.756650 osd.8 osd.8 192.168.1.240:6810/11662 27 : cluster [WRN] 3 slow requests, 1 included below; oldest blocked for > 138.979409 secs
2017-09-16 21:26:00.756656 osd.8 osd.8 192.168.1.240:6810/11662 28 : cluster [WRN] slow request 120.984976 seconds old, received at 2017-09-16 21:23:59.771639: osd_op(client.1304422.1:6194 1.fff80ff7 1:eff01fff:::10000101cdc.00000000:head [write 0~4096 [2@0],write 16384~16384 [2@0]] snapc 1=[] ondisk+write+known_if_redirected e7145) currently waiting for active
And here is the requested log:
root@proxmox1:/var/log/ceph# ceph-post-file ceph-osd.10.log
args: -- ceph-osd.10.log
/usr/bin/ceph-post-file: upload tag 4c4948c7-df7c-4595-a412-741649e89e2a
/usr/bin/ceph-post-file: user: root@proxmox1.vixake.com
/usr/bin/ceph-post-file: will upload file ceph-osd.10.log
sftp> mkdir post/4c4948c7-df7c-4595-a412-741649e89e2a_root@proxmox1.vixake.com_235d8b2d-5315-4c3c-9ed9-9bc8daf9d80f
sftp> cd post/4c4948c7-df7c-4595-a412-741649e89e2a_root@proxmox1.vixake.com_235d8b2d-5315-4c3c-9ed9-9bc8daf9d80f
sftp> put /tmp/tmp.qwiBJsePAD user
sftp> put ceph-osd.10.log
/usr/bin/ceph-post-file: copy the upload id below to share with a dev:
ceph-post-file: 4c4948c7-df7c-4595-a412-741649e89e2a
#12 Updated by Tomasz Kusmierz over 6 years ago
I let Ceph run with those flags for some time and everything crashed today at 14:36 because the disk ran out of space.
All the logs are here; I'm deleting them afterwards to free up the drive.
root@proxmox1:/var/log/ceph# ceph-post-file ceph
Display all 102 possibilities? (y or n)
root@proxmox1:/var/log/ceph# ceph-post-file ceph-osd.0.log ceph-osd.8.log ceph-osd.9.log ceph-osd.10.log
args: -- ceph-osd.0.log ceph-osd.8.log ceph-osd.9.log ceph-osd.10.log
/usr/bin/ceph-post-file: upload tag 108a5841-d1de-416c-b02e-842debefa01a
/usr/bin/ceph-post-file: user: root@proxmox1.vixake.com
/usr/bin/ceph-post-file: will upload file ceph-osd.0.log
/usr/bin/ceph-post-file: will upload file ceph-osd.8.log
/usr/bin/ceph-post-file: will upload file ceph-osd.9.log
/usr/bin/ceph-post-file: will upload file ceph-osd.10.log
sftp> mkdir post/108a5841-d1de-416c-b02e-842debefa01a_root@proxmox1.vixake.com_fc4f47c7-dd75-4fbb-9542-a642e192c681
sftp> cd post/108a5841-d1de-416c-b02e-842debefa01a_root@proxmox1.vixake.com_fc4f47c7-dd75-4fbb-9542-a642e192c681
sftp> put /tmp/tmp.UD1RAeoNBg user
sftp> put ceph-osd.0.log
sftp> put ceph-osd.8.log
sftp> put ceph-osd.9.log
sftp> put ceph-osd.10.log
/usr/bin/ceph-post-file: copy the upload id below to share with a dev:
ceph-post-file: 108a5841-d1de-416c-b02e-842debefa01a
#13 Updated by Tomasz Kusmierz over 6 years ago
OK, so I've got myself two more 8TB disks and am trying to shift data there. The new disks are osd.2 and osd.3 ... and here is the most recent osd.3 crash:
Sep 24 11:10:29 proxmox1 ceph-osd20007: * Caught signal (Segmentation fault) *
Sep 24 11:10:29 proxmox1 ceph-osd20007: in thread 7fafb5c2ce40 thread_name:ceph-osd
Sep 24 11:10:29 proxmox1 ceph-osd20007: ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
Sep 24 11:10:29 proxmox1 ceph-osd20007: 1: (()+0xa07bb4) [0x564921f46bb4]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 2: (()+0x110c0) [0x7fafb320e0c0]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 3: (()+0x1ff2f) [0x7fafb5810f2f]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 4: (rocksdb::Arena::AllocateNewBlock(unsigned long)+0x6d) [0x56492235d2ad]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 5: (rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x4b) [0x56492235d3fb]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 6: (rocksdb::Arena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0x110) [0x56492235d5b0]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 7: (rocksdb::DoGenerateLevelFilesBrief(rocksdb::LevelFilesBrief*, std::vector<rocksdb::FileMetaData*, std::allocator<rocksdb::FileMetaData*> > const&, rocksdb::Arena*)+0x184) [0x5649222dfa74]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 8: (rocksdb::VersionStorageInfo::GenerateLevelFilesBrief()+0xda) [0x5649222e073a]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 9: (rocksdb::Version::PrepareApply(rocksdb::MutableCFOptions const&, bool)+0x76) [0x5649222eb656]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 10: (rocksdb::VersionSet::LogAndApply(rocksdb::ColumnFamilyData*, rocksdb::MutableCFOptions const&, rocksdb::autovector<rocksdb::VersionEdit*, 8ul> const&, rocksdb::InstrumentedMutex*, rocksdb::Directory*, bool, rocksdb::ColumnFamilyOptions const*)+0x698) [0x5649222ee6f8]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 11: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool)+0x1d3c) [0x5649222b966c]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 12: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x89e) [0x5649222ba60e]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 13: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xe40) [0x5649222bb900]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 14: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x698) [0x5649222bd168]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 15: (RocksDBStore::do_open(std::ostream&, bool)+0x908) [0x564921e8b808]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 16: (BlueStore::_open_db(bool)+0xf6b) [0x564921e1591b]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 17: (BlueStore::_mount(bool)+0x29c) [0x564921e474bc]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 18: (OSD::init()+0x3df) [0x5649219bc76f]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 19: (main()+0x2eb8) [0x5649218d0d38]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 20: (__libc_start_main()+0xf1) [0x7fafb21c32b1]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 21: (_start()+0x2a) [0x56492195aa0a]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 2017-09-24 11:10:29.819151 7fafb5c2ce40 -1 *** Caught signal (Segmentation fault) ***
Sep 24 11:10:29 proxmox1 ceph-osd20007: in thread 7fafb5c2ce40 thread_name:ceph-osd
Sep 24 11:10:29 proxmox1 ceph-osd20007: ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
Sep 24 11:10:29 proxmox1 ceph-osd20007: 1: (()+0xa07bb4) [0x564921f46bb4]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 2: (()+0x110c0) [0x7fafb320e0c0]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 3: (()+0x1ff2f) [0x7fafb5810f2f]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 4: (rocksdb::Arena::AllocateNewBlock(unsigned long)+0x6d) [0x56492235d2ad]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 5: (rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x4b) [0x56492235d3fb]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 6: (rocksdb::Arena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0x110) [0x56492235d5b0]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 7: (rocksdb::DoGenerateLevelFilesBrief(rocksdb::LevelFilesBrief*, std::vector<rocksdb::FileMetaData*, std::allocator<rocksdb::FileMetaData*> > const&, rocksdb::Arena*)+0x184) [0x5649222dfa74]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 8: (rocksdb::VersionStorageInfo::GenerateLevelFilesBrief()+0xda) [0x5649222e073a]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 9: (rocksdb::Version::PrepareApply(rocksdb::MutableCFOptions const&, bool)+0x76) [0x5649222eb656]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 10: (rocksdb::VersionSet::LogAndApply(rocksdb::ColumnFamilyData*, rocksdb::MutableCFOptions const&, rocksdb::autovector<rocksdb::VersionEdit*, 8ul> const&, rocksdb::InstrumentedMutex*, rocksdb::Directory*, bool, rocksdb::ColumnFamilyOptions const*)+0x698) [0x5649222ee6f8]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 11: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool)+0x1d3c) [0x5649222b966c]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 12: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x89e) [0x5649222ba60e]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 13: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xe40) [0x5649222bb900]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 14: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x698) [0x5649222bd168]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 15: (RocksDBStore::do_open(std::ostream&, bool)+0x908) [0x564921e8b808]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 16: (BlueStore::_open_db(bool)+0xf6b) [0x564921e1591b]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 17: (BlueStore::_mount(bool)+0x29c) [0x564921e474bc]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 18: (OSD::init()+0x3df) [0x5649219bc76f]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 19: (main()+0x2eb8) [0x5649218d0d38]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 20: (__libc_start_main()+0xf1) [0x7fafb21c32b1]
Sep 24 11:10:29 proxmox1 ceph-osd20007: 21: (_start()+0x2a) [0x56492195aa0a]
Sep 24 11:10:29 proxmox1 ceph-osd20007: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 24 11:10:29 proxmox1 systemd1: ceph-osd@3.service: Main process exited, code=killed, status=11/SEGV
Sep 24 11:10:29 proxmox1 systemd1: ceph-osd@3.service: Unit entered failed state.
Sep 24 11:10:29 proxmox1 systemd1: ceph-osd@3.service: Failed with result 'signal'.
Sep 24 11:10:49 proxmox1 systemd1: ceph-osd@3.service: Service hold-off time over, scheduling restart.
Sep 24 11:10:49 proxmox1 systemd1: Stopped Ceph object storage daemon osd.3.
Sep 24 11:10:49 proxmox1 systemd1: Starting Ceph object storage daemon osd.3...
Sep 24 11:10:50 proxmox1 systemd1: Started Ceph object storage daemon osd.3.
#14 Updated by Tomasz Kusmierz over 6 years ago
Does anybody even check the bugs on this issue tracker, or is this just a joke to give a false sense of professionalism?
#15 Updated by Sage Weil over 6 years ago
- Subject changed from Multiple OSD crashing on 12.2.0 (started on upgrade from 12.1.1 -> 12.1.2 and still continues on 12.2) to segv in rocksdb::BlockBasedTable::NewIndexIterator
- Priority changed from High to Immediate
The fscks are all clean; those iterate over the entire data store. No corruption.
The crash is deep in rocksdb, and seems to be triggered by a variety of things: a scrub, a get from get_onode(), a get from fault_range():
ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
1: (()+0xa07bb4) [0x55deb9eaebb4]
2: (()+0x110c0) [0x7f45f22d90c0]
3: (()+0x1ff2f) [0x7f45f48dbf2f]
4: (rocksdb::BlockBasedTable::NewIndexIterator(rocksdb::ReadOptions const&, rocksdb::BlockIter*, rocksdb::BlockBasedTable::CachableEntry<rocksdb::BlockBasedTable::IndexReader>*)+0x4e6) [0x55deba292bb6]
5: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, bool)+0x283) [0x55deba293963]
6: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::HistogramImpl*, bool, int)+0x13a) [0x55deba37318a]
7: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, rocksdb::RangeDelAggregator*, bool*, bool*, unsigned long*)+0x3f8) [0x55deba254c28]
8: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*)+0x552) [0x55deba319682]
9: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x13) [0x55deba319ab3]
10: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0xc1) [0x55deb9df7b51]
11: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::list*)+0x158) [0x55deb9deea28]
12: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae) [0x55deb9d7342e]
13: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0xa19) [0x55deb9da96d9]
14: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x536) [0x55deb9dab916]
15: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f) [0x55deb9955dff]
16: (OSD::handle_osd_map(MOSDMap*)+0xfc6) [0x55deb99102f6]
17: (OSD::_dispatch(Message*)+0xd5) [0x55deb992ca95]
18: (OSD::ms_dispatch(Message*)+0x87) [0x55deb992d087]
19: (DispatchQueue::entry()+0xeda) [0x55deba1c837a]
this one is a bit different:
ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
1: (()+0xa07bb4) [0x55a9a6ffabb4]
2: (()+0x110c0) [0x7ff6266df0c0]
3: (()+0x1ff2f) [0x7ff628ce1f2f]
4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0x1a5) [0x55a9a73dba35]
5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x36f) [0x55a9a73dd2ef]
6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x29c) [0x55a9a73dd61c]
7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x87) [0x55a9a73e5a47]
8: (()+0xe1d63e) [0x55a9a741063e]
9: (()+0xe1d97e) [0x55a9a741097e]
10: (()+0xe1da21) [0x55a9a7410a21]
11: (rocksdb::MergingIterator::Next()+0x51b) [0x55a9a73f3a9b]
12: (rocksdb::DBIter::Next()+0xb3) [0x55a9a748eef3]
13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x9a) [0x55a9a6f3919a]
14: (BlueStore::_collection_list(BlueStore::Collection*, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x153e) [0x55a9a6e93e5e]
15: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x25a) [0x55a9a6e94f1a]
16: (PGBackend::objects_list_range(hobject_t const&, hobject_t const&, snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*, std::vector<ghobject_t, std::allocator<ghobject_t> >*)+0x14b) [0x55a9a6c5392b]
17: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x200) [0x55a9a6afc2e0]
18: (PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x65d) [0x55a9a6b0270d]
19: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6bc) [0x55a9a6bbec5c]
20: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x55a9a6a4319b]
21: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x55a9a6cdb54a]
22: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x103d) [0x55a9a6a69d9d]
A couple questions:
1. have you adjusted the bluestore_cache_size at all? (or other bluestore/rocksdb config options?)
2. can you set 'bluestore cache kv ratio = 0' and see if that makes the crashes go away? (this will disable the rocksdb cache, so it will slow things down...)
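For reference, the test being asked for here amounts to one line in ceph.conf followed by an OSD restart. A minimal sketch (section placement assumed; [global] would also work):

```ini
# Sketch only: disables the rocksdb block cache portion of the
# BlueStore cache, trading performance for a diagnostic data point.
[osd]
bluestore cache kv ratio = 0
```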
#16 Updated by Tomasz Kusmierz over 6 years ago
auth client required = none
auth cluster required = none
auth service required = none
cluster network = 192.168.1.0/24
filestore xattr use omap = true
fsid = 33f22266-856b-4497-821d-7af1050765a0
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 192.168.1.0/24
mon_allow_pool_delete = true
mon osd full radio = 0.98
mon osd backfillfull ratio = 0.92
mon osd nearfull ratio = .90
osd mount options = rw, noatime, inode64
- enable experimental unrecoverable data corrupting features = bluestore rocksdb
- osd objectstore = bluestore
mon pg warn max object skew = 100000
mon pg warn min per osd = 0
mon pg warn max per osd = 32768
ms_type = async
[mon]
mon_pg_warn_max_per_osd=5000
mon_max_pool_pg_num=106496
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 1
osd recovery max active = 1
objecter_inflight_op_bytes = 1048576000
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
[mon.0]
host = proxmox1
mon addr = 192.168.1.240:6789
[mds]
mds data = /var/lib/ceph/mds/ceph-$id
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.0]
host = proxmox1
#17 Updated by Tomasz Kusmierz over 6 years ago
[global]
auth client required = none
auth cluster required = none
auth service required = none
cluster network = 192.168.1.0/24
filestore xattr use omap = true
fsid = 33f22266-856b-4497-821d-7af1050765a0
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 192.168.1.0/24
mon_allow_pool_delete = true
mon osd full radio = 0.98
mon osd backfillfull ratio = 0.92
mon osd nearfull ratio = .90
osd mount options = rw, noatime, inode64
- enable experimental unrecoverable data corrupting features = bluestore rocksdb
- osd objectstore = bluestore
mon pg warn max object skew = 100000
mon pg warn min per osd = 0
mon pg warn max per osd = 32768
ms_type = async
[mon]
mon_pg_warn_max_per_osd=5000
mon_max_pool_pg_num=106496
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 1
osd recovery max active = 1
objecter_inflight_op_bytes = 1048576000
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
[mon.0]
host = proxmox1
mon addr = 192.168.1.240:6789
[mds]
mds data = /var/lib/ceph/mds/ceph-$id
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.0]
host = proxmox1
#18 Updated by Tomasz Kusmierz over 6 years ago
Sorry, I was trying to paste the config so it looks proper. Anyway, those are the only changes I made. If the problem sits in the cache, my bet is on:
objecter_inflight_op_bytes = 1048576000
I've just disabled that one, along with:
filestore xattr use omap = true
osd journal size = 5120
osd mount options = rw, noatime, inode64
A reboot is on the way and we will see what happens.
Thanks for the reply, and sorry for the comment about professionalism; I was a bit upset by the 5-week wait time.
#19 Updated by Tomasz Kusmierz over 6 years ago
Still the same for OSD.3
#$%#($*%
#($*%&@#(*$%
Sep 25 09:48:21 proxmox1 ceph-osd14958: *** Caught signal (Segmentation fault) ***
Sep 25 09:48:21 proxmox1 ceph-osd14958: in thread 7f606be44e40 thread_name:ceph-osd
Sep 25 09:48:21 proxmox1 ceph-osd14958: ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
Sep 25 09:48:21 proxmox1 ceph-osd14958: 1: (()+0xa07bb4) [0x55f342f11bb4]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 2: (()+0x110c0) [0x7f60694260c0]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 3: (()+0x1ff2f) [0x7f606ba28f2f]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0x1a5) [0x55f3432f2a35]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x36f) [0x55f3432f42ef]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x29c) [0x55f3432f461c]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 7: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::BlockIter*, bool)+0x89) [0x55f3432f4a89]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 8: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, bool)+0x554) [0x55f3432f6c34]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 9: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::HistogramImpl*, bool, int)+0x13a) [0x55f3433d618a]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 10: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, rocksdb::RangeDelAggregator*, bool*, bool*, unsigned long*)+0x3f8) [0x55f3432b7c28]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 11: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*)+0x552) [0x55f34337c682]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 12: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x13) [0x55f34337cab3]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 13: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0xc1) [0x55f342e5ab51]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 14: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::list*)+0x1bb) [0x55f342e5208b]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 15: (BlueStore::_open_statfs()+0x7b) [0x55f342da24ab]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 16: (BlueStore::_open_super_meta()+0xdf0) [0x55f342e097e0]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 17: (BlueStore::_mount(bool)+0x2bb) [0x55f342e124db]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 18: (OSD::init()+0x3df) [0x55f34298776f]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 19: (main()+0x2eb8) [0x55f34289bd38]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 20: (__libc_start_main()+0xf1) [0x7f60683db2b1]
Sep 25 09:48:21 proxmox1 ceph-osd14958: 21: (_start()+0x2a) [0x55f342925a0a]
Sep 25 09:48:21 proxmox1 ceph-osd14958: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 25 09:48:22 proxmox1 systemd1: ceph-osd@3.service: Main process exited, code=killed, status=11/SEGV
Sep 25 09:48:22 proxmox1 systemd1: ceph-osd@3.service: Unit entered failed state.
Sep 25 09:48:22 proxmox1 systemd1: ceph-osd@3.service: Failed with result 'signal'.
Sep 25 09:48:42 proxmox1 systemd1: ceph-osd@3.service: Service hold-off time over, scheduling restart.
Sep 25 09:48:42 proxmox1 systemd1: Stopped Ceph object storage daemon osd.3.
Sep 25 09:48:42 proxmox1 systemd1: Starting Ceph object storage daemon osd.3...
Sep 25 09:48:42 proxmox1 systemd1: Started Ceph object storage daemon osd.3.
Sep 25 09:48:42 proxmox1 ceph-osd15491: starting osd.3 at - osd_data /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
#20 Updated by Sage Weil over 6 years ago
Can you confirm that you set 'bluestore cache kv ratio = 0'? And can you set 'debug bluestore = 10' and 'debug rocksdb = 10', start the OSD, then ceph-post-file the log? I want to make sure we see the
dout(10) << __func__ << " block_cache_size " << block_cache_size << ", setting no_block_cache " << dendl;
line indicating the rocksdb block cache is disabled entirely.
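Put together, the requested settings are a small ceph.conf change (section placement assumed) followed by an OSD restart; the log would then be uploaded with ceph-post-file, e.g. `ceph-post-file /var/log/ceph/ceph-osd.3.log` (the default log path, not taken from this report):

```ini
# Sketch of the settings Sage is asking to confirm:
[osd]
bluestore cache kv ratio = 0
debug bluestore = 10
debug rocksdb = 10
```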
The other thing to try: install the latest luminous branch from https://shaman.ceph.com/builds/ceph/luminous/, which includes a newer rocksdb version than 12.2.0.
Also: are you using any custom allocator changes? (e.g., did you modify /etc/default/ceph or /etc/sysconfig/ceph to enable jemalloc? that would explain this crash)
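A hedged way to check for a jemalloc preload, assuming the stock Debian/RPM environment-file paths and an OSD running under its usual process name:

```shell
# Is jemalloc configured as a preload in the env files the ceph
# systemd units source? (Debian uses /etc/default/ceph, RPM-based
# distros /etc/sysconfig/ceph.)
grep -Hn 'jemalloc' /etc/default/ceph /etc/sysconfig/ceph 2>/dev/null \
    || echo "no jemalloc preload configured"

# Is a running OSD actually using it? Check its mapped libraries:
pid=$(pgrep -x ceph-osd 2>/dev/null | head -n1)
if [ -n "$pid" ]; then
    grep -q jemalloc "/proc/$pid/maps" \
        && echo "jemalloc mapped" || echo "jemalloc not mapped"
else
    echo "no ceph-osd process found"
fi
```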
#21 Updated by Sage Weil over 6 years ago
Also, can you double check from dmesg output that OOM killer isn't kicking in?
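Checking for the OOM killer is a one-liner against the kernel log (reading dmesg may require root on some systems):

```shell
# Look for OOM-killer activity around the crash time:
dmesg 2>/dev/null | grep -iE 'out of memory|oom-killer|killed process' \
    || echo "no OOM events in the kernel ring buffer (or dmesg not readable)"
```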
#22 Updated by Tomasz Kusmierz over 6 years ago
Sorry gents, I was away yesterday; I'll get right on this in the evening.
Sage:
1. No, I've not set those values yet. I will when I get back to the rig.
2. I did not modify those files to enable jemalloc (as far as I remember); would you kindly let me know what to look for to check whether it's set? Another thing is that this rig is running Proxmox; those people base it on Ubuntu, but I cannot warrant that they don't make mods like that.
3. I doubt the OOM killer is on the horizon here; the system has 32GB and utilisation usually sits well below 10G (nothing else is running), but I'll look into that as well (better 100% certain than 99% stupid :D).
#23 Updated by Tomasz Kusmierz over 6 years ago
Sage:
I removed from:
/etc/default/ceph
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
and for the past 2 days I've had zero issues. I'll continue testing, re-enable other options to eliminate any possible combination, and get back to you.
Still, if that was set a long time ago and used to work OK, it's weird that on upgrading from 12.1.1 -> 12.1.2 everything started exploding...
Luckily Ceph lives up to expectations: with all those crashes, reboots, explosions etc. I've had zero data corruption/loss.
#24 Updated by Sage Weil over 6 years ago
- Status changed from Need More Info to Duplicate
- Priority changed from Immediate to Urgent
Yay! I think this is a dup then of http://tracker.ceph.com/issues/20557. We haven't looked into the jemalloc issue because it's not really recommended or supported at this point. Thanks for your patience and for following up!
#25 Updated by Sage Weil over 6 years ago
- Related to Bug #20557: segmentation fault with rocksdb|BlueStore and jemalloc added
#26 Updated by Tomasz Kusmierz over 6 years ago
I can confirm that it's purely a jemalloc issue.
PLEASE PLEASE PLEASE put it somewhere in the documentation, because I think you will soon get a lot more people upgrading from FileStore to BlueStore who will hit this issue.
#27 Updated by alexandre derumier over 6 years ago
Hi, I can reproduce it too by uncommenting
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
But, anyway, has this preload ever worked?
see:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-March/017214.html
A simple perf command shows that tcmalloc is still used.
AFAIK, the only way to get jemalloc working is to rebuild Ceph with --with-jemalloc and without any tcmalloc dev packages.
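One way to sanity-check the point above: an allocator the binary was built against shows up in its dynamic dependencies, while a merely preloaded one does not. A minimal sketch (the `linked_alloc` helper name and the ceph-osd path are assumptions, not from the thread):

```shell
# List any jemalloc/tcmalloc library in a binary's dynamic dependencies.
# If tcmalloc appears here, the binary was built against tcmalloc, which
# is consistent with the perf observation that tcmalloc is still in use
# despite the LD_PRELOAD of jemalloc.
linked_alloc() {
  ldd "$1" 2>/dev/null | grep -E 'jemalloc|tcmalloc' \
    || echo "no jemalloc/tcmalloc in dynamic deps of $1"
}

# e.g. (path assumed; adjust for your distro):
linked_alloc /usr/bin/ceph-osd
```

Combined with `perf top` on a running OSD, this makes it easy to see which allocator is really servicing allocations.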
#28 Updated by Tomasz Kusmierz over 6 years ago
Yes, I have to say that these parts of Ceph, "performance tuning" and "corner use cases", have next to zero documentation, and it feels like there is no incentive to produce a broadly available explanation of some of those problems. I've myself spent some time in the past investigating jemalloc but failed to find a definitive answer on how to get it going (maybe that's when I uncommented it, dunno).
Some basic functionality is not explained at all. For example, the documentation for pool size and min_size says next to nothing about the implications for cluster operation. Also, nobody ever mentions that a cluster in WRN state can essentially stop the MDS from allowing cephfs mounts. There is nothing anywhere about it, and it's kind of a major problem for some, especially on a large cluster with lots of OSDs, where there will definitely be failures going on all the time, and due to backfill you can end up unable to mount cephfs on your nodes for long stretches of time.
At some points the documentation feels like a doxygen-generated document: "size - this parameter changes the size of the pool" ... every newcomer will ask WTF IS A POOL SIZE ?!?!?
#29 Updated by Tim Petersen over 6 years ago
alexandre derumier wrote:
Hi, I can reproduce it too by uncommenting
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
But, anyway, has this preload ever worked?
Yes! Our cluster has worked like this since last year - until Luminous :-)