Bug #23120
OSDs continuously crash during recovery
Status: Closed
Description
I have several OSDs continuously crashing during recovery. This is Luminous 12.2.3.
ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable)
 1: (()+0xa3c591) [0x55b3e5a85591]
 2: (()+0xf5e0) [0x7f8c237ca5e0]
 3: (gsignal()+0x37) [0x7f8c227f31f7]
 4: (abort()+0x148) [0x7f8c227f48e8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x55b3e5ac4664]
 6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1487) [0x55b3e5997a27]
 7: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x55b3e5998a70]
 8: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x55b3e5708a85]
 9: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x55b3e5828191]
 10: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x55b3e5838b27]
 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55b3e573d680]
 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x59c) [0x55b3e56a900c]
 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x55b3e552ef29]
 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x55b3e57abad7]
 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x55b3e555d99e]
 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x55b3e5aca009]
 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55b3e5acbfa0]
 18: (()+0x7e25) [0x7f8c237c2e25]
 19: (clone()+0x6d) [0x7f8c228b634d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
This is using the officially released RPMs.
I've uploaded the logfile of one such OSD as:
ca0a29ae-0993-4faa-be4d-9ba2f7d6f905
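For reference, the ID above is a ceph-post-file post id; a minimal sketch of the upload step, where the OSD number and log path are placeholders rather than details from this ticket:

# Upload an OSD log to the Ceph developers' drop point; the command prints the
# post id (UUID) that is then quoted in the ticket. osd.191 is a placeholder here.
ceph-post-file /var/log/ceph/ceph-osd.191.log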
The cluster will likely be recreated soon, since the system is now borked anyway, so please let me know quickly if more info is needed.
Updated by Oliver Freyermuth about 6 years ago
It might be that this OSD was subject to OOM at some point in the last 24 hours.
It seems the OSDs use 2-3 times as much memory as configured via bluestore_cache_size_hdd while they are recovering and accepting small objects,
which exceeded RAM + swap on some of our machines.
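A minimal sketch of how the BlueStore HDD cache setting can be inspected and lowered; the OSD id, the 1 GiB value, and whether the runtime change takes effect without a restart are assumptions, not details from this ticket:

# Check the effective value on one OSD (run on the host carrying osd.4; the id is a placeholder):
ceph daemon osd.4 config get bluestore_cache_size_hdd

# Lower it for all OSDs at runtime (1 GiB here is an arbitrary example value);
# depending on the release this may only take effect after an OSD restart:
ceph tell osd.* injectargs '--bluestore_cache_size_hdd 1073741824'

# To persist it, put the equivalent into the [osd] section of ceph.conf:
#   bluestore cache size hdd = 1073741824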
Updated by Oliver Freyermuth about 6 years ago
Here's the log of another OSD:
7de1dddf-27d4-4b6b-9128-0138bfaf85cf
The backtrace looks similar.
Updated by Oliver Freyermuth about 6 years ago
After many restarts of all OSDs, and temporarily lowering min_size, they now stay up. I'll watch and see if the cluster recovers.
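A sketch of the min_size change mentioned above, using a placeholder pool name; as noted later in this ticket, the EC data pool is k=4 m=2, so min_size is normally 5 and was temporarily set to 4:

# Placeholder pool name; lower min_size so degraded PGs can go active during recovery...
ceph osd pool set cephfs_data min_size 4
# ...and restore the safer value once recovery has finished:
ceph osd pool set cephfs_data min_size 5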
Updated by Oliver Freyermuth about 6 years ago
Cluster has mostly recovered, looks good.
Still, hopefully the stacktrace and logs can help to track down the underlying issue that caused the crashes.
Updated by Oliver Freyermuth about 6 years ago
Here's the ceph osd tree, by popular request:
# ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME       STATUS REWEIGHT PRI-AFF
 -1         700.74890 root default
 -3           0.43658     host mon001
  0   ssd     0.21829         osd.0       up  1.00000 1.00000
  1   ssd     0.21829         osd.1       up  1.00000 1.00000
 -5           0.43637     host mon002
  2   ssd     0.21819         osd.2       up  1.00000 1.00000
  3   ssd     0.21819         osd.3       up  1.00000 1.00000
-10         116.64600     host osd001
  4   hdd     3.64519         osd.4       up  1.00000 1.00000
  5   hdd     3.64519         osd.5       up  1.00000 1.00000
  6   hdd     3.64519         osd.6       up  1.00000 1.00000
  7   hdd     3.64519         osd.7       up  1.00000 1.00000
  8   hdd     3.64519         osd.8       up  1.00000 1.00000
  9   hdd     3.64519         osd.9       up  1.00000 1.00000
 10   hdd     3.64519         osd.10      up  1.00000 1.00000
 11   hdd     3.64519         osd.11      up  1.00000 1.00000
 12   hdd     3.64519         osd.12      up  1.00000 1.00000
 13   hdd     3.64519         osd.13      up  1.00000 1.00000
 14   hdd     3.64519         osd.14      up  1.00000 1.00000
 15   hdd     3.64519         osd.15      up  1.00000 1.00000
 16   hdd     3.64519         osd.16      up  1.00000 1.00000
 17   hdd     3.64519         osd.17      up  1.00000 1.00000
 18   hdd     3.64519         osd.18      up  1.00000 1.00000
 19   hdd     3.64519         osd.19      up  1.00000 1.00000
 20   hdd     3.64519         osd.20      up  1.00000 1.00000
 21   hdd     3.64519         osd.21      up  1.00000 1.00000
 22   hdd     3.64519         osd.22      up  1.00000 1.00000
 23   hdd     3.64519         osd.23      up  1.00000 1.00000
 24   hdd     3.64519         osd.24      up  1.00000 1.00000
 25   hdd     3.64519         osd.25      up  1.00000 1.00000
 26   hdd     3.64519         osd.26      up  1.00000 1.00000
 27   hdd     3.64519         osd.27      up  1.00000 1.00000
 28   hdd     3.64519         osd.28      up  1.00000 1.00000
 29   hdd     3.64519         osd.29      up  1.00000 1.00000
 30   hdd     3.64519         osd.30      up  1.00000 1.00000
 31   hdd     3.64519         osd.31      up  1.00000 1.00000
 32   hdd     3.64519         osd.32      up  1.00000 1.00000
 33   hdd     3.64519         osd.33      up  1.00000 1.00000
 34   hdd     3.64519         osd.34      up  1.00000 1.00000
 35   hdd     3.64519         osd.35      up  1.00000 1.00000
-13         116.64600     host osd002
 36   hdd     3.64519         osd.36      up  1.00000 1.00000
 37   hdd     3.64519         osd.37      up  1.00000 1.00000
 38   hdd     3.64519         osd.38      up  1.00000 1.00000
 39   hdd     3.64519         osd.39      up  1.00000 1.00000
 40   hdd     3.64519         osd.40      up  1.00000 1.00000
 41   hdd     3.64519         osd.41      up  1.00000 1.00000
 42   hdd     3.64519         osd.42      up  1.00000 1.00000
 43   hdd     3.64519         osd.43      up  1.00000 1.00000
 44   hdd     3.64519         osd.44      up  1.00000 1.00000
 45   hdd     3.64519         osd.45      up  1.00000 1.00000
 46   hdd     3.64519         osd.46      up  1.00000 1.00000
 47   hdd     3.64519         osd.47      up  1.00000 1.00000
 48   hdd     3.64519         osd.48      up  1.00000 1.00000
 49   hdd     3.64519         osd.49      up  1.00000 1.00000
 50   hdd     3.64519         osd.50      up  1.00000 1.00000
 51   hdd     3.64519         osd.51      up  1.00000 1.00000
 52   hdd     3.64519         osd.52      up  1.00000 1.00000
 53   hdd     3.64519         osd.53      up  1.00000 1.00000
 54   hdd     3.64519         osd.54      up  1.00000 1.00000
 55   hdd     3.64519         osd.55      up  1.00000 1.00000
 56   hdd     3.64519         osd.56      up  1.00000 1.00000
 57   hdd     3.64519         osd.57      up  1.00000 1.00000
 58   hdd     3.64519         osd.58      up  1.00000 1.00000
 59   hdd     3.64519         osd.59      up  1.00000 1.00000
 60   hdd     3.64519         osd.60      up  1.00000 1.00000
 61   hdd     3.64519         osd.61      up  1.00000 1.00000
 62   hdd     3.64519         osd.62      up  1.00000 1.00000
 63   hdd     3.64519         osd.63      up  1.00000 1.00000
 64   hdd     3.64519         osd.64      up  1.00000 1.00000
 65   hdd     3.64519         osd.65      up  1.00000 1.00000
 66   hdd     3.64519         osd.66      up  1.00000 1.00000
 67   hdd     3.64519         osd.67      up  1.00000 1.00000
-16         116.64600     host osd003
 68   hdd     3.64519         osd.68      up  1.00000 1.00000
 69   hdd     3.64519         osd.69      up  1.00000 1.00000
 70   hdd     3.64519         osd.70      up  1.00000 1.00000
 71   hdd     3.64519         osd.71      up  1.00000 1.00000
 72   hdd     3.64519         osd.72      up  1.00000 1.00000
 73   hdd     3.64519         osd.73      up  1.00000 1.00000
 74   hdd     3.64519         osd.74      up  1.00000 1.00000
 75   hdd     3.64519         osd.75      up  1.00000 1.00000
 76   hdd     3.64519         osd.76      up  1.00000 1.00000
 77   hdd     3.64519         osd.77      up  1.00000 1.00000
 78   hdd     3.64519         osd.78      up  1.00000 1.00000
 79   hdd     3.64519         osd.79      up  1.00000 1.00000
 80   hdd     3.64519         osd.80      up  1.00000 1.00000
 81   hdd     3.64519         osd.81      up  1.00000 1.00000
 82   hdd     3.64519         osd.82      up  1.00000 1.00000
 83   hdd     3.64519         osd.83      up  1.00000 1.00000
 84   hdd     3.64519         osd.84      up  1.00000 1.00000
 85   hdd     3.64519         osd.85      up  1.00000 1.00000
 86   hdd     3.64519         osd.86      up  1.00000 1.00000
 87   hdd     3.64519         osd.87      up  1.00000 1.00000
 88   hdd     3.64519         osd.88      up  1.00000 1.00000
 89   hdd     3.64519         osd.89      up  1.00000 1.00000
 90   hdd     3.64519         osd.90      up  1.00000 1.00000
 91   hdd     3.64519         osd.91      up  1.00000 1.00000
 92   hdd     3.64519         osd.92      up  1.00000 1.00000
 93   hdd     3.64519         osd.93      up  1.00000 1.00000
 94   hdd     3.64519         osd.94      up  1.00000 1.00000
 95   hdd     3.64519         osd.95      up  1.00000 1.00000
 96   hdd     3.64519         osd.96      up  1.00000 1.00000
 97   hdd     3.64519         osd.97      up  1.00000 1.00000
 98   hdd     3.64519         osd.98      up  1.00000 1.00000
 99   hdd     3.64519         osd.99      up  1.00000 1.00000
-19         116.64600     host osd004
100   hdd     3.64519         osd.100     up  1.00000 1.00000
101   hdd     3.64519         osd.101     up  1.00000 1.00000
102   hdd     3.64519         osd.102     up  1.00000 1.00000
103   hdd     3.64519         osd.103     up  1.00000 1.00000
104   hdd     3.64519         osd.104     up  1.00000 1.00000
105   hdd     3.64519         osd.105     up  1.00000 1.00000
106   hdd     3.64519         osd.106     up  1.00000 1.00000
107   hdd     3.64519         osd.107     up  1.00000 1.00000
108   hdd     3.64519         osd.108     up  1.00000 1.00000
109   hdd     3.64519         osd.109     up  1.00000 1.00000
110   hdd     3.64519         osd.110     up  1.00000 1.00000
111   hdd     3.64519         osd.111     up  1.00000 1.00000
112   hdd     3.64519         osd.112     up  1.00000 1.00000
113   hdd     3.64519         osd.113     up  1.00000 1.00000
114   hdd     3.64519         osd.114     up  1.00000 1.00000
115   hdd     3.64519         osd.115     up  1.00000 1.00000
116   hdd     3.64519         osd.116     up  1.00000 1.00000
117   hdd     3.64519         osd.117     up  1.00000 1.00000
118   hdd     3.64519         osd.118     up  1.00000 1.00000
119   hdd     3.64519         osd.119     up  1.00000 1.00000
120   hdd     3.64519         osd.120     up  1.00000 1.00000
121   hdd     3.64519         osd.121     up  1.00000 1.00000
122   hdd     3.64519         osd.122     up  1.00000 1.00000
123   hdd     3.64519         osd.123     up  1.00000 1.00000
124   hdd     3.64519         osd.124     up  1.00000 1.00000
125   hdd     3.64519         osd.125     up  1.00000 1.00000
126   hdd     3.64519         osd.126     up  1.00000 1.00000
127   hdd     3.64519         osd.127     up  1.00000 1.00000
128   hdd     3.64519         osd.128     up  1.00000 1.00000
129   hdd     3.64519         osd.129     up  1.00000 1.00000
130   hdd     3.64519         osd.130     up  1.00000 1.00000
131   hdd     3.64519         osd.131     up  1.00000 1.00000
-22         116.64600     host osd005
132   hdd     3.64519         osd.132     up  1.00000 1.00000
133   hdd     3.64519         osd.133     up  1.00000 1.00000
134   hdd     3.64519         osd.134     up  1.00000 1.00000
135   hdd     3.64519         osd.135     up  1.00000 1.00000
136   hdd     3.64519         osd.136     up  1.00000 1.00000
137   hdd     3.64519         osd.137     up  1.00000 1.00000
138   hdd     3.64519         osd.138     up  1.00000 1.00000
139   hdd     3.64519         osd.139     up  1.00000 1.00000
140   hdd     3.64519         osd.140     up  1.00000 1.00000
141   hdd     3.64519         osd.141     up  1.00000 1.00000
142   hdd     3.64519         osd.142     up  1.00000 1.00000
143   hdd     3.64519         osd.143     up  1.00000 1.00000
144   hdd     3.64519         osd.144     up  1.00000 1.00000
145   hdd     3.64519         osd.145     up  1.00000 1.00000
146   hdd     3.64519         osd.146     up  1.00000 1.00000
147   hdd     3.64519         osd.147     up  1.00000 1.00000
148   hdd     3.64519         osd.148     up  1.00000 1.00000
149   hdd     3.64519         osd.149     up  1.00000 1.00000
150   hdd     3.64519         osd.150     up  1.00000 1.00000
151   hdd     3.64519         osd.151     up  1.00000 1.00000
152   hdd     3.64519         osd.152     up  1.00000 1.00000
153   hdd     3.64519         osd.153     up  1.00000 1.00000
154   hdd     3.64519         osd.154     up  1.00000 1.00000
155   hdd     3.64519         osd.155     up  1.00000 1.00000
156   hdd     3.64519         osd.156     up  1.00000 1.00000
157   hdd     3.64519         osd.157     up  1.00000 1.00000
158   hdd     3.64519         osd.158     up  1.00000 1.00000
159   hdd     3.64519         osd.159     up  1.00000 1.00000
160   hdd     3.64519         osd.160     up  1.00000 1.00000
161   hdd     3.64519         osd.161     up  1.00000 1.00000
162   hdd     3.64519         osd.162     up  1.00000 1.00000
163   hdd     3.64519         osd.163     up  1.00000 1.00000
-25         116.64600     host osd006
164   hdd     3.64519         osd.164     up  1.00000 1.00000
165   hdd     3.64519         osd.165     up  1.00000 1.00000
166   hdd     3.64519         osd.166     up  1.00000 1.00000
167   hdd     3.64519         osd.167     up  1.00000 1.00000
168   hdd     3.64519         osd.168     up  1.00000 1.00000
169   hdd     3.64519         osd.169     up  1.00000 1.00000
170   hdd     3.64519         osd.170     up  1.00000 1.00000
171   hdd     3.64519         osd.171     up  1.00000 1.00000
172   hdd     3.64519         osd.172     up  1.00000 1.00000
173   hdd     3.64519         osd.173     up  1.00000 1.00000
174   hdd     3.64519         osd.174     up  1.00000 1.00000
175   hdd     3.64519         osd.175     up  1.00000 1.00000
176   hdd     3.64519         osd.176     up  1.00000 1.00000
177   hdd     3.64519         osd.177     up  1.00000 1.00000
178   hdd     3.64519         osd.178     up  1.00000 1.00000
179   hdd     3.64519         osd.179     up  1.00000 1.00000
180   hdd     3.64519         osd.180     up  1.00000 1.00000
181   hdd     3.64519         osd.181     up  1.00000 1.00000
182   hdd     3.64519         osd.182     up  1.00000 1.00000
183   hdd     3.64519         osd.183     up  1.00000 1.00000
184   hdd     3.64519         osd.184     up  1.00000 1.00000
185   hdd     3.64519         osd.185     up  1.00000 1.00000
186   hdd     3.64519         osd.186     up  1.00000 1.00000
187   hdd     3.64519         osd.187     up  1.00000 1.00000
188   hdd     3.64519         osd.188     up  1.00000 1.00000
189   hdd     3.64519         osd.189     up  1.00000 1.00000
190   hdd     3.64519         osd.190     up  1.00000 1.00000
191   hdd     3.64519         osd.191     up  1.00000 1.00000
192   hdd     3.64519         osd.192     up  1.00000 1.00000
193   hdd     3.64519         osd.193     up  1.00000 1.00000
194   hdd     3.64519         osd.194     up  1.00000 1.00000
195   hdd     3.64519         osd.195     up  1.00000 1.00000
The metadata pool lives on the SSDs, the data pool on the HDDs (via device classes).
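For context, a sketch of how such a device-class split is typically expressed on Luminous; the pool, rule, and profile names below are placeholders, not taken from this cluster:

# Replicated CRUSH rule restricted to SSDs, used for the metadata pool:
ceph osd crush rule create-replicated meta-ssd default host ssd
ceph osd pool set cephfs_metadata crush_rule meta-ssd

# EC profile restricted to HDDs (the ticket later states k=4 m=2), used for the data pool:
ceph osd erasure-code-profile set ec-4-2-hdd k=4 m=2 crush-failure-domain=host crush-device-class=hdd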
Updated by Oliver Freyermuth about 6 years ago
All HDD-OSDs have 4 TB, while the SSDs used for the metadata pool have 240 GB.
Updated by Peter Woodman about 6 years ago
Hey, I might be seeing the same bug. Can you paste in the operation dump that shows up right before that crash, and maybe like ~10 lines of previous context?
Updated by Oliver Freyermuth about 6 years ago
@Peter Woodman: Since the system recovered after many OSD restarts (see my previous comment) and I did not think to take an ops dump, I sadly cannot reproduce that now :-(. I'll keep it in mind in case the issue reappears; right now I can only share the log files (which I uploaded in full via ceph-post-file, but I could also share parts of them publicly if that helps).
Updated by Sage Weil about 6 years ago
- Status changed from New to Need More Info
Can you reproduce the crash on one or more OSDs with 'debug osd = 20' and 'debug bluestore = 20'?
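A sketch of how those debug levels can be raised; the OSD id is a placeholder:

# Raise the levels on a running OSD at runtime:
ceph tell osd.191 injectargs '--debug_osd 20 --debug_bluestore 20'

# Or set them in the [osd] section of ceph.conf and restart the daemon:
#   debug osd = 20
#   debug bluestore = 20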
Also, can you check what the problematic PG is on the other crashing OSDs? For the attached log (osd.191) it is
-202> 2018-02-25 17:21:58.829663 7f503e854700  0 bluestore(/var/lib/ceph/osd/ceph-191) transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "setattrs",
            "collection": "2.21es1_head",
            "oid": "1#2:785f4b65:::1000645413e.00000000:head#",
            "attr_lens": {
                "_": 275,
                "_layout": 30,
                "_parent": 346,
                "snapset": 35
            }
        },
        {
            "op_num": 1,
            "op_name": "setattr",
            "collection": "2.21es1_head",
            "oid": "1#2:785f4b65:::1000645413e.00000000:head#",
            "name": "hinfo_key",
            "length": 42
        }
    ]
}
so the problematic PG there is 2.21es1.
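A sketch of how the affected PG can be pulled from the other crash logs; the log path and OSD id are placeholders, and it relies on the transaction dump being printed just before the assert, as above:

# Print the transaction dump(s) and pick out the "collection" fields, which carry the PG shard:
grep -A 20 'transaction dump' /var/log/ceph/ceph-osd.<id>.log | grep '"collection"'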
Updated by Oliver Freyermuth about 6 years ago
The bad news (for the ticket) is that the problem vanished after restarting all crashing OSDs often enough
and temporarily reducing the pool's min_size (it is k=4 m=2; min_size is usually 5, and I temporarily set it to 4).
So currently I can't reproduce :-(.
Maybe Peter Woodman can add more info if his bug is really the same.
Or maybe 7de1dddf-27d4-4b6b-9128-0138bfaf85cf helps (from my comment #2), which came from another crashing OSD?
Updated by Peter Woodman about 6 years ago
Yeah, I've got some of that. The problem is, I'm not seeing the debug log messages that should be there given the failure, if I'm reading the code correctly. Will post when I get home.
Updated by Peter Woodman about 6 years ago
Actually, looks like your crashing ops are different from mine. I'll just open a new bug.
Updated by Sage Weil over 5 years ago
- Status changed from Need More Info to Can't reproduce