Bug #12200


assert(hinfo.get_total_chunk_size() == (uint64_t)st.st_size)

Added by David Zafman almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Corruption of an EC pool shard crashes the OSD during a deep scrub. Specifically, the on-disk file size of one of the shards is larger than expected (51201 bytes instead of 51200):

$ find dev -name '*foo*' -ls
279010   56 -rw-r--r--   1 dzafman  dzafman     51201 Jul  1 16:22 dev/osd4/current/1.6s1_head/foo__head_7FC1F406__1_ffffffffffffffff_1
279011   56 -rw-r--r--   1 dzafman  dzafman     51200 Jul  1 16:12 dev/osd2/current/1.6s2_head/foo__head_7FC1F406__1_ffffffffffffffff_2
279009   56 -rw-r--r--   1 dzafman  dzafman     51200 Jul  1 16:12 dev/osd3/current/1.6s0_head/foo__head_7FC1F406__1_ffffffffffffffff_0
2015-07-01 16:22:40.244766 7f3fa17ea700 10 osd.4 26 dequeue_op 0x7f3fb402cfe0 prio 127 cost 0 latency 0.000140 replica scrub(pg: 1.6s1,from:0'0,to:20'1,epoch:26,start:0//0//-1,end:MAX,chunky:1,deep:1,seed:4294967295,version:6) v6 pg pg[1.6s1( v 20'1 (0'0,20'1] local-les=26 n=1 ec=19 les/c 26/26 25/25/25) [3,4,2] r=1 lpr=25 pi=19-24/4 luod=0'0 crt=0'0 lcod 0'0 active]
2015-07-01 16:22:40.244799 7f3fa17ea700 10 osd.4 pg_epoch: 26 pg[1.6s1( v 20'1 (0'0,20'1] local-les=26 n=1 ec=19 les/c 26/26 25/25/25) [3,4,2] r=1 lpr=25 pi=19-24/4 luod=0'0 crt=0'0 lcod 0'0 active] handle_message: replica scrub(pg: 1.6s1,from:0'0,to:20'1,epoch:26,start:0//0//-1,end:MAX,chunky:1,deep:1,seed:4294967295,version:6) v6
2015-07-01 16:22:40.244814 7f3fa17ea700  7 osd.4 pg_epoch: 26 pg[1.6s1( v 20'1 (0'0,20'1] local-les=26 n=1 ec=19 les/c 26/26 25/25/25) [3,4,2] r=1 lpr=25 pi=19-24/4 luod=0'0 crt=0'0 lcod 0'0 active] replica_scrub
2015-07-01 16:22:40.244824 7f3fa17ea700 10 osd.4 pg_epoch: 26 pg[1.6s1( v 20'1 (0'0,20'1] local-les=26 n=1 ec=19 les/c 26/26 25/25/25) [3,4,2] r=1 lpr=25 pi=19-24/4 luod=0'0 crt=0'0 lcod 0'0 active] build_scrub_map_chunk [0//0//-1,MAX)  seed 4294967295
2015-07-01 16:22:40.244835 7f3fa17ea700 10 filestore(/home/dzafman/ceph/src/dev/osd4) collection_list_partial: 1.6s1_head
2015-07-01 16:22:40.244843 7f3fa17ea700 20 _collection_list_partial 0//0//-1 32-64 ls.size 0
2015-07-01 16:22:40.244937 7f3fa17ea700 20  prefixes 60000000,604F1CF7
2015-07-01 16:22:40.244947 7f3fa17ea700 20 filestore(/home/dzafman/ceph/src/dev/osd4) objects: [6//head//1/ffffffffffffffff/1,7fc1f406/foo/head//1/ffffffffffffffff/1]
2015-07-01 16:22:40.244957 7f3fa17ea700 10 osd.4 pg_epoch: 26 pg[1.6s1( v 20'1 (0'0,20'1] local-les=26 n=1 ec=19 les/c 26/26 25/25/25) [3,4,2] r=1 lpr=25 pi=19-24/4 luod=0'0 crt=0'0 lcod 0'0 active] be_scan_list scanning 1 objects deeply
2015-07-01 16:22:40.245000 7f3fa17ea700 10 filestore(/home/dzafman/ceph/src/dev/osd4) stat 1.6s1_head/7fc1f406/foo/head//1/ffffffffffffffff/1 = 0 (size 51201)
2015-07-01 16:22:40.245014 7f3fa17ea700 15 filestore(/home/dzafman/ceph/src/dev/osd4) getattrs 1.6s1_head/7fc1f406/foo/head//1/ffffffffffffffff/1
2015-07-01 16:22:40.245075 7f3fa17ea700 20 filestore(/home/dzafman/ceph/src/dev/osd4) fgetattrs 36 getting '_'
2015-07-01 16:22:40.245085 7f3fa17ea700 20 filestore(/home/dzafman/ceph/src/dev/osd4) fgetattrs 36 getting 'hinfo_key'
2015-07-01 16:22:40.245205 7f3fa17ea700 10 filestore(/home/dzafman/ceph/src/dev/osd4) getattrs 1.6s1_head/7fc1f406/foo/head//1/ffffffffffffffff/1 = 0
2015-07-01 16:22:40.245214 7f3fa17ea700 15 filestore(/home/dzafman/ceph/src/dev/osd4) read 1.6s1_head/7fc1f406/foo/head//1/ffffffffffffffff/1 0~524288
2015-07-01 16:22:40.245280 7f3fa17ea700 10 filestore(/home/dzafman/ceph/src/dev/osd4) FileStore::read 1.6s1_head/7fc1f406/foo/head//1/ffffffffffffffff/1 0~51201/524288
2015-07-01 16:22:40.245301 7f3fa17ea700  0 osd.4 pg_epoch: 26 pg[1.6s1( v 20'1 (0'0,20'1] local-les=26 n=1 ec=19 les/c 26/26 25/25/25) [3,4,2] r=1 lpr=25 pi=19-24/4 luod=0'0 crt=0'0 lcod 0'0 active] _scan_list  7fc1f406/foo/head//1 got -5 on read, read_error
2015-07-01 16:22:40.245323 7f3fa17ea700 10 osd.4 pg_epoch: 26 pg[1.6s1( v 20'1 (0'0,20'1] local-les=26 n=1 ec=19 les/c 26/26 25/25/25) [3,4,2] r=1 lpr=25 pi=19-24/4 luod=0'0 crt=0'0 lcod 0'0 active] get_hash_info: Getting attr on 7fc1f406/foo/head//1
2015-07-01 16:22:40.245337 7f3fa17ea700 10 osd.4 pg_epoch: 26 pg[1.6s1( v 20'1 (0'0,20'1] local-les=26 n=1 ec=19 les/c 26/26 25/25/25) [3,4,2] r=1 lpr=25 pi=19-24/4 luod=0'0 crt=0'0 lcod 0'0 active] get_hash_info: not in cache 7fc1f406/foo/head//1
2015-07-01 16:22:40.245379 7f3fa17ea700 10 filestore(/home/dzafman/ceph/src/dev/osd4) stat 1.6s1_head/7fc1f406/foo/head//1/ffffffffffffffff/1 = 0 (size 51201)
2015-07-01 16:22:40.245386 7f3fa17ea700 10 osd.4 pg_epoch: 26 pg[1.6s1( v 20'1 (0'0,20'1] local-les=26 n=1 ec=19 les/c 26/26 25/25/25) [3,4,2] r=1 lpr=25 pi=19-24/4 luod=0'0 crt=0'0 lcod 0'0 active] get_hash_info: found on disk, size 51201
2015-07-01 16:22:40.245397 7f3fa17ea700 15 filestore(/home/dzafman/ceph/src/dev/osd4) getattr 1.6s1_head/7fc1f406/foo/head//1/ffffffffffffffff/1 'hinfo_key'
2015-07-01 16:22:40.245413 7f3fa17ea700 10 filestore(/home/dzafman/ceph/src/dev/osd4) getattr 1.6s1_head/7fc1f406/foo/head//1/ffffffffffffffff/1 'hinfo_key' = 30
2015-07-01 16:22:40.261902 7f3fa17ea700 -1 osd/ECBackend.cc: In function 'ECUtil::HashInfoRef ECBackend::get_hash_info(const hobject_t&)' thread 7f3fa17ea700 time 2015-07-01 16:22:40.245421
osd/ECBackend.cc: 1482: FAILED assert(hinfo.get_total_chunk_size() == (uint64_t)st.st_size)

 ceph version 9.0.1-1111-g075fb9f (075fb9f9e07f5a97bda4f8a4a23cba4df5bc826d)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x1a0342b]
 2: (ECBackend::get_hash_info(hobject_t const&)+0x65c) [0x180040a]
 3: (ECBackend::be_deep_scrub(hobject_t const&, unsigned int, ScrubMap::object&, ThreadPool::TPHandle&)+0x43c) [0x180255e]
 4: (PGBackend::be_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> > const&, bool, unsigned int, ThreadPool::TPHandle&)+0x444) [0x16e599a]
 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x3a9) [0x1590fb5]
 6: (PG::replica_scrub(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x63d) [0x1591e63]
 7: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xa32) [0x1624320]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x47f) [0x1391075]

Related issues (1 closed, 0 open)

Has duplicate: Ceph - Bug #8588: In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size is changed (Duplicate, added 06/11/2014)
