Bug #21173
Status: Closed
OSD crash trying to decode erasure-coded data from corrupted shards
Description
OSDs are continuously crashing while trying to backfill/recover certain objects. After inspecting these objects, I found a size mismatch between the shard files. This is the crashing thread:
3941> 2017-08-29 22:09:07.666374 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.105:6804/27366 -- osd_map(1106979..1106980 src has 960094..1106980) v3 -- 0x7fd500f01680 con 0
3940> 2017-08-29 22:09:07.666439 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.105:6804/27366 -- MOSDECSubOpRead(143.371s3 1106980/1105690 ECSubRead(tid=23684, to_read={143:8efa89b4:::default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2:head=0,932160,0}, attrs_to_read=)) v3 -- 0x7fd50fdda080 con 0
3939> 2017-08-29 22:09:07.666493 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.114:6816/37284 -- MOSDECSubOpRead(143.371s0 1106980/1105690 ECSubRead(tid=23684, to_read={143:8efa89b4:::default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2:head=0,932160,0}, attrs_to_read=)) v3 -- 0x7fd4ad516a00 con 0
3938> 2017-08-29 22:09:07.666534 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.113:6802/12506 -- osd_map(1106979..1106980 src has 960094..1106980) v3 -- 0x7fd504198280 con 0
3937> 2017-08-29 22:09:07.666560 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.113:6802/12506 -- MOSDECSubOpRead(143.371s9 1106980/1105690 ECSubRead(tid=23684, to_read={143:8efa89b4:::default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2:head=0,932160,0}, attrs_to_read=)) v3 -- 0x7fd50fdda580 con 0
3935> 2017-08-29 22:09:07.666665 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.117:6802/12636 -- MOSDECSubOpRead(143.371s8 1106980/1105690 ECSubRead(tid=23684, to_read={143:8efa89b4:::default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2:head=0,932160,0}, attrs_to_read=)) v3 -- 0x7fd52e537980 con 0
3934> 2017-08-29 22:09:07.666735 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.109:6816/13028 -- osd_map(1106980..1106980 src has 960094..1106980) v3 -- 0x7fd50ca64c80 con 0
3933> 2017-08-29 22:09:07.666749 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.109:6816/13028 -- MOSDECSubOpRead(143.371s7 1106980/1105690 ECSubRead(tid=23684, to_read={143:8efa89b4:::default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2:head=0,932160,0}, attrs_to_read=)) v3 -- 0x7fd5170e8f00 con 0
3932> 2017-08-29 22:09:07.666819 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.110:6802/12139 -- osd_map(1106979..1106980 src has 960094..1106980) v3 -- 0x7fd4e85e4d00 con 0
3931> 2017-08-29 22:09:07.666837 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.110:6802/12139 -- MOSDECSubOpRead(143.371s5 1106980/1105690 ECSubRead(tid=23684, to_read={143:8efa89b4:::default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2:head=0,932160,0}, attrs_to_read=)) v3 -- 0x7fd503fe2300 con 0
3930> 2017-08-29 22:09:07.666863 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.123:6810/34596 -- MOSDECSubOpRead(143.371s1 1106980/1105690 ECSubRead(tid=23684, to_read={143:8efa89b4:::default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2:head=0,932160,0}, attrs_to_read=)) v3 -- 0x7fd50cff3980 con 0
3929> 2017-08-29 22:09:07.666900 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.135:6814/17687 -- MOSDECSubOpRead(143.371s6 1106980/1105690 ECSubRead(tid=23684, to_read={143:8efa89b4:::default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2:head=0,932160,0}, attrs_to_read=)) v3 -- 0x7fd4ae1bef00 con 0
3928> 2017-08-29 22:09:07.666926 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.136:6810/46475 -- MOSDECSubOpRead(143.371s4 1106980/1105690 ECSubRead(tid=23684, to_read={143:8efa89b4:::default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2:head=0,932160,0}, attrs_to_read=)) v3 -- 0x7fd50803e080 con 0
3927> 2017-08-29 22:09:07.666984 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.102:6807/31441 -- MOSDScrubReserve(143.1c8s0 GRANT e1106980) v1 -- 0x7fd500f5c800 con 0
3926> 2017-08-29 22:09:07.667437 7fd478bc2700 1 - 192.168.216.114:6816/37284 --> 192.168.216.114:6816/37284 -- MOSDECSubOpReadReply(143.371s0 1106980/1105690 ECSubReadReply(tid=23684, attrs_read=0)) v2 -- 0x7fd50ac558c0 con 0
0> 2017-08-29 22:09:08.098602 7fd478bc2700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/ECUtil.cc: In function 'int ECUtil::decode(const ECUtil::stripe_info_t&, ceph::ErasureCodeInterfaceRef&, std::map<int, ceph::buffer::list>&, std::map<int, ceph::buffer::list*>&)' thread 7fd478bc2700 time 2017-08-29 22:09:08.094250
2017-08-29 22:09:08.169428 7fd478bc2700 -1 * Caught signal (Aborted) *
in thread 7fd478bc2700 thread_name:tp_osd_tp
We are trying to find the corrupted files using some tricks (after the OSD dies), but there are many of them. This is a sample: most shards are 456K, but some are 60K. When we stop the OSD, remove these files, and start it again, the crash doesn't happen and recovery proceeds.
thread = 7fd478bc2700
PG=143.371
NAME_PART=3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1
143.371 63780 0 0 0 0 386436908769 3298 3298 activating+degraded+remapped 2017-08-29 22:22:00.092502 1105203'307439 1107026:3400572 [132,624,167,68,652,620,535,218,187,549,266,234] 132 [132,292,167,68,544,234,535,218,187,145,266,309] 132 960594'307163 2017-08-07 04:47:48.178394 958883'299934 2017-07-24 12:18:02.085306
456K /var/lib/ceph/osd/ceph-132/current/143.371s0_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_0
456K /var/lib/ceph/osd/ceph-68/current/143.371s3_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_3
456K /var/lib/ceph/osd/ceph-145/current/143.371s9_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_9
60K /var/lib/ceph/osd/ceph-167/current/143.371s2_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_2
60K /var/lib/ceph/osd/ceph-187/current/143.371s8_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_8
456K /var/lib/ceph/osd/ceph-218/current/143.371s7_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_7
456K /var/lib/ceph/osd/ceph-266/current/143.371s10_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_a
456K /var/lib/ceph/osd/ceph-292/current/143.371s1_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_1
456K /var/lib/ceph/osd/ceph-309/current/143.371s11_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_b
456K /var/lib/ceph/osd/ceph-535/current/143.371s6_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_6
456K /var/lib/ceph/osd/ceph-544/current/143.371s4_head/DIR_1/DIR_7/DIR_F/DIR_5/DIR_1/default.63296332.1\u\ushadow\u304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1\u2__head_2D915F71__8f_ffffffffffffffff_4
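The listing above was produced by hand; the same "find the outlier shards" step can be sketched in a few lines. This is an illustrative helper, not a Ceph tool, and the paths/sizes in the example are abbreviated stand-ins for the listing above:

```python
# Hypothetical sketch: flag erasure-coded shard files whose on-disk
# size disagrees with the majority of shards for the same object.
from collections import Counter

def find_bad_shards(shard_sizes):
    """shard_sizes maps shard path -> size in bytes.
    Returns the paths whose size differs from the most common size."""
    majority_size, _ = Counter(shard_sizes.values()).most_common(1)[0]
    return sorted(p for p, s in shard_sizes.items() if s != majority_size)

# Abbreviated stand-ins for the 456K vs 60K listing above:
sizes = {
    "ceph-132/143.371s0": 466944,
    "ceph-68/143.371s3": 466944,
    "ceph-167/143.371s2": 61440,
    "ceph-187/143.371s8": 61440,
    "ceph-218/143.371s7": 466944,
}
print(find_bad_shards(sizes))
# -> ['ceph-167/143.371s2', 'ceph-187/143.371s8']
```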
Updated by David Zafman over 6 years ago
At filestore debug level 10, the message "get_hash_info: Mismatch of total_chunk_size 466944" should have been generated during a recovery read attempt at osd.187 and/or osd.167.
The primary should have had a message like this for an attempt to read osd.187: "_failed_push: Read error 143:xxxxxxxx:::default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2:head r=0 errors={187(8)=-5}"
I truncated shard 1 on osd 1 as a test:
$ find dev -name 'bar*' -ls
409702 19420 -rw-r--r-- 1 dzafman dzafman 19881984 Aug 29 13:45 dev/osd3/current/2.0s0_head/bar__head_EFE6384B__2_ffffffffffffffff_0
409703 19420 -rw-r--r-- 1 dzafman dzafman 19881984 Aug 29 13:45 dev/osd2/current/2.0s2_head/bar__head_EFE6384B__2_ffffffffffffffff_2
409704 5468 -rw-r--r-- 1 dzafman dzafman 5595136 Aug 29 13:48 dev/osd1/current/2.0s1_head/bar__head_EFE6384B__2_ffffffffffffffff_1
$ bin/ceph-objectstore-tool --data-path dev/osd1 '["2.0s1",{"oid":"bar","key":"","snapid":-2,"hash":4024842315,"max":0,"pool":2,"namespace":"","shard_id":1,"max":0}]' get-attr hinfo_key | bin/ceph-dencoder import - type ECUtil::HashInfo decode dump_json
{
    "total_chunk_size": 19881984,
    "cumulative_shard_hashes": {
        "hash": {
            "shard": 0,
            "hash": 3279216067
        },
        "hash": {
            "shard": 1,
            "hash": 1616075201
        },
        "hash": {
            "shard": 2,
            "hash": 3966965908
        }
    }
}
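The consistency check described above can be modeled simply: on a recovery read, the OSD compares the total_chunk_size recorded in the shard's hinfo_key attribute against the shard's actual on-disk size, and a mismatching shard is reported as EIO (-5), as in "errors={187(8)=-5}". The sketch below is a simplified illustration of that behaviour; the function and its signature are hypothetical, not Ceph's actual ECBackend code:

```python
# Hypothetical model of the hinfo_key size-consistency check;
# not the actual Ceph ECBackend implementation.
EIO = 5  # POSIX I/O error, shown as -5 in "errors={187(8)=-5}"

def shard_read_errors(hinfo_total_chunk_size, shard_sizes):
    """shard_sizes maps shard id -> on-disk shard size in bytes.
    Returns {shard_id: -EIO} for every shard whose size disagrees
    with the size recorded in hinfo_key."""
    return {sid: -EIO
            for sid, size in shard_sizes.items()
            if size != hinfo_total_chunk_size}

# The 60K shards (ids 2 and 8 in this ticket) would fail the check:
print(shard_read_errors(466944, {0: 466944, 2: 61440, 8: 61440}))
# -> {2: -5, 8: -5}
```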
Updated by Mustafa Muhammad over 6 years ago
[root@ceph017 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-187/ '["143.371s8",{"oid":"default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2","key":"","snapid":-2,"hash":764501873,"max":0,"pool":143,"namespace":"","shard_id":8,"max":0}]' get-attr hinfo_key | ceph-dencoder import - type ECUtil::HashInfo decode dump_json
{
"total_chunk_size": 58080,
"cumulative_shard_hashes": {
"hash": {
"shard": 0,
"hash": 1528934879
},
"hash": {
"shard": 1,
"hash": 4161392522
},
"hash": {
"shard": 2,
"hash": 3354715535
},
"hash": {
"shard": 3,
"hash": 1867130716
},
"hash": {
"shard": 4,
"hash": 2427655492
},
"hash": {
"shard": 5,
"hash": 4243843471
},
"hash": {
"shard": 6,
"hash": 3460445679
},
"hash": {
"shard": 7,
"hash": 3397239655
},
"hash": {
"shard": 8,
"hash": 680106050
},
"hash": {
"shard": 9,
"hash": 1264594567
},
"hash": {
"shard": 10,
"hash": 2253242928
},
"hash": {
"shard": 11,
"hash": 1663251712
}
}
}
Updated by Mustafa Muhammad over 6 years ago
And this from a good chunk:
[root@ceph014 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-132/ '["143.371s0",{"oid":"default.63296332.1__shadow_304299676.2~3Mnd94mMNaUb0tK0fEB5bZn8YrB-A-F.1_2","key":"","snapid":-2,"hash":764501873,"max":0,"pool":143,"namespace":"","shard_id":0,"max":0}]' get-attr hinfo_key | ceph-dencoder import - type ECUtil::HashInfo decode dump_json
{
"total_chunk_size": 466080,
"cumulative_shard_hashes": {
"hash": {
"shard": 0,
"hash": 2716396157
},
"hash": {
"shard": 1,
"hash": 3823712376
},
"hash": {
"shard": 2,
"hash": 643392995
},
"hash": {
"shard": 3,
"hash": 3096269466
},
"hash": {
"shard": 4,
"hash": 3697605281
},
"hash": {
"shard": 5,
"hash": 2035529382
},
"hash": {
"shard": 6,
"hash": 102109390
},
"hash": {
"shard": 7,
"hash": 2587959089
},
"hash": {
"shard": 8,
"hash": 1273445979
},
"hash": {
"shard": 9,
"hash": 2924648671
},
"hash": {
"shard": 10,
"hash": 1959366179
},
"hash": {
"shard": 11,
"hash": 917397527
}
}
}
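Comparing the two dumps: the bad shard's hinfo records total_chunk_size 58080 while the good shard's records 466080, so the small shards still carry metadata (and data) from an older, shorter version of the object. That comparison can be sketched as below; the field name follows the ceph-dencoder dumps above, but the helper itself is hypothetical:

```python
# Hypothetical helper: flag shards whose hinfo records an older,
# smaller total_chunk_size than the largest value seen.
def stale_shards(hinfos):
    """hinfos maps shard id -> decoded hinfo dict (as dumped by
    ceph-dencoder above)."""
    newest = max(h["total_chunk_size"] for h in hinfos.values())
    return sorted(sid for sid, h in hinfos.items()
                  if h["total_chunk_size"] < newest)

# Values taken from the two dumps above:
print(stale_shards({
    0: {"total_chunk_size": 466080},  # good shard (osd.132)
    8: {"total_chunk_size": 58080},   # bad shard (osd.187)
}))
# -> [8]
```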
Updated by David Zafman over 6 years ago
One example had mod time Aug 5 for the smaller shards and Aug 7 for the larger.
mustafam: ...upgraded to Kraken 11.2.0 about that time, then 11.2.1, then 12.1.4 RC all while the cluster is almost dead (we hit another bug about osdmap)
Not sure what "about that time" was referring to.
Updated by Mustafa Muhammad over 6 years ago
David Zafman wrote:
One example had mod time Aug 5 for the smaller shards and Aug 7 for the larger.
mustafam: ...upgraded to Kraken 11.2.0 about that time, then 11.2.1, then 12.1.4 RC all while the cluster is almost dead (we hit another bug about osdmap)
Not sure what "about that time" was referring to.
OK, I meant around Aug 7, but I have now found the upgrade timing in our history: we upgraded on Aug 8; nothing happened on Aug 5 or Aug 7.
Updated by David Zafman over 6 years ago
- Status changed from New to Won't Fix
Likely caused by the OOM killer and attempts to recover the cluster after high memory usage, leveldb corruption, and software upgrades to a non-final release version. Closing.