Bug #9537 (closed)

OSD crash after writing 10GB file onto EC Pool: FAILED assert(hinfo.get_total_chunk_size() == (unsigned)st.st_size)

Added by Mallikarjun Biradar over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

100%

Source:
other
Tags:
Backport:
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On a freshly created cluster, created an EC pool with the default EC profile.
Wrote a 5 MB object using the rados put command and ran a pg scrub on the PG where the object resides. The cluster was in a healthy state with all OSDs up and running.

Then wrote a 10 GB object using the rados put command: one of the OSDs crashed, the command hung, and that OSD keeps flapping in/out of the cluster.

2014-09-19 20:12:39.815809 7f60bf332700 0 -- 10.242.43.114:6821/37220 >> 10.242.43.116:6836/41394 pipe(0x56a6680 sd=115 :6821 s=0 pgs=0 cs=0 l=0 c=0x569e260).accept connect_seq 2 vs existing 1 state standby
2014-09-19 20:25:31.913815 7f60cf7bf700 -1 osd/ECBackend.cc: In function 'ECUtil::HashInfoRef ECBackend::get_hash_info(const hobject_t&)' thread 7f60cf7bf700 time 2014-09-19 20:25:31.911314
osd/ECBackend.cc: 1459: FAILED assert(hinfo.get_total_chunk_size() == (unsigned)st.st_size)

ceph version 0.84-sd-sprint4 (3215c520e1306f50d0094b5646636c02456c9df4)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xb776db]
2: (ECBackend::get_hash_info(hobject_t const&)+0xaa6) [0x9fce46]
3: (ECBackend::submit_transaction(hobject_t const&, eversion_t const&, PGBackend::PGTransaction*, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, std::tr1::shared_ptr<OpRequest>)+0x525) [0xa021d5]
4: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, utime_t)+0x725) [0x8395d5]
5: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xf2c) [0x88601c]
6: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x34af) [0x88a28f]
7: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x611) [0x826581]
8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3d5) [0x6914d5]
9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x346) [0x691a36]
10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x821) [0xb677e1]
11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb698f0]
12: (()+0x8182) [0x7f60ebc98182]
13: (clone()+0x6d) [0x7f60ea07e30d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2014-09-19 20:25:29.067561 7f60bee2d700 1 -- 10.242.43.114:6821/37220 <== osd.18 10.242.43.116:6831/39684 3719 ==== MOSDECSubOpWriteReply(2.8s0 128 ECSubWriteReply(tid=1919, last_complete=0'0, committed=0, applied=1)) v1 ==== 67+0+0 (4168246392 0 0) 0x83f6000 con 0x5de5fa0
-9999> 2014-09-19 20:25:29.067591 7f60bee2d700 5 -- op tracker -- , seq: 10471, time: 2014-09-19 20:25:29.067533, event: header_read, op: MOSDECSubOpWriteReply(2.8s0 128 ECSubWriteReply(tid=1919, last_complete=0'0, committed=0, applied=1))
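
For context, the assert that fires in ECBackend::get_hash_info compares the total chunk size recorded in the object's EC hash-info attribute against the stat() size of the object's shard on disk; the two are expected to match. A rough, hedged sketch of checking the on-disk side on the primary OSD node follows (the filestore path layout and the <N> placeholder are assumptions, not taken from this report; pg 2.8s0 is the shard named in the event dump above, and the object is presumably the 10 GB obj2):

# On the crashed primary OSD node; <N> is that OSD's id.
# Print the shard file name and its size, i.e. the st.st_size side of the assert.
sudo find /var/lib/ceph/osd/ceph-<N>/current/2.8s0_head -name '*obj2*' -exec stat -c '%n %s' {} \;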

Steps:

1> Created ECPool on freshly installed cluster
ems@rack6-client-5:~$ sudo ceph osd pool create ecpool 12 12 erasure
pool 'ecpool' created
ems@rack6-client-5:~$
ems@rack6-client-5:~$ sudo rados df
pool name category KB objects clones degraded unfound rd rd KB wr wr KB
ecpool - 0 0 0 0 0 0 0 0 0
rbd - 0 0 0 0 0 0 0 0 0
testpool - 0 0 0 0 0 0 0 0 0
total used 895056 0
total avail 9230427696
total space 9231322752
ems@rack6-client-5:~$
ems@rack6-client-5:~$ sudo ceph -s
cluster 2ca6fab0-6a27-447e-8ac1-658f5e9399b4
health HEALTH_OK
monmap e1: 2 mons at {rack6-client-5=10.242.43.106:6789/0,rack6-client-6=10.242.43.107:6789/0}, election epoch 10, quorum 0,1 rack6-client-5,rack6-client-6
osdmap e128: 24 osds: 24 up, 24 in
pgmap v300: 588 pgs, 3 pools, 0 bytes data, 0 objects
896 MB used, 8802 GB / 8803 GB avail
588 active+clean
ems@rack6-client-5:~$
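
As a sanity check of the profile the pool was created with (a sketch, not part of the original run; the exact commands and whether the pool-level query is available depend on the Ceph release):

# Show the default erasure-code profile (k, m, plugin, ruleset)
sudo ceph osd erasure-code-profile get default
# Confirm which profile ecpool actually uses, on releases where this pool key is supported
sudo ceph osd pool get ecpool erasure_code_profile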

2> Wrote 5MB object onto it
ems@rack6-client-5:~$ sudo rados put obj1 storm_build/sprint5/1.0.0.02/Ubuntu/x64/sndk-ifos-1.0.0.02/ceph-common_0.84-1_amd64.deb -p ecpool
ems@rack6-client-5:~$
ems@rack6-client-5:~$ sudo rados df
pool name category KB objects clones degraded unfound rd rd KB wr wr KB
ecpool - 4312 1 0 0 0 0 0 2 4312
rbd - 0 0 0 0 0 0 0 0 0
testpool - 0 0 0 0 0 0 0 0 0
total used 921856 1
total avail 9230400896
total space 9231322752
ems@rack6-client-5:~$

3> Ran pg scrub on the PG where the object resides
ems@rack6-client-5:~$ sudo ceph osd map ecpool obj1
osdmap e128 pool 'ecpool' (2) object 'obj1' -> pg 2.6cf8deff (2.7) -> up ([18,6,13], p18) acting ([18,6,13], p18)
ems@rack6-client-5:~$
ems@rack6-client-5:~$ sudo ceph pg scrub 2.7
instructing pg 2.7 on osd.18 to scrub
ems@rack6-client-5:~$
ems@rack6-client-5:~$ sudo ceph -s
cluster 2ca6fab0-6a27-447e-8ac1-658f5e9399b4
health HEALTH_OK
monmap e1: 2 mons at {rack6-client-5=10.242.43.106:6789/0,rack6-client-6=10.242.43.107:6789/0}, election epoch 10, quorum 0,1 rack6-client-5,rack6-client-6
osdmap e128: 24 osds: 24 up, 24 in
pgmap v308: 588 pgs, 3 pools, 4311 kB data, 1 objects
905 MB used, 8802 GB / 8803 GB avail
588 active+clean
ems@rack6-client-5:~$ sudo rados df
pool name category KB objects clones degraded unfound rd rd KB wr wr KB
ecpool - 4312 1 0 0 0 0 0 2 4312
rbd - 0 0 0 0 0 0 0 0 0
testpool - 0 0 0 0 0 0 0 0 0
total used 927040 1
total avail 9230395712
total space 9231322752
ems@rack6-client-5:~$
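
To double-check that the scrub of 2.7 actually completed before moving on, one option (a sketch, not from the original transcript) is to query the PG and look at its scrub timestamps:

# pg 2.7 has primary osd.18 per the osd map output above
sudo ceph pg 2.7 query | grep -i scrub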

4> Wrote 10G file onto same pool
The command hung and one of the OSDs crashed.
After some time, that OSD started flipping in/out of the cluster.
ems@rack6-client-5:~$ ls -lrt File.tar
-rw-rw-r-- 1 ems ems 9830400000 Sep 19 14:54 File.tar
ems@rack6-client-5:~$
ems@rack6-client-5:~$ sudo rados put obj2 File.tar -p ecpool
2014-09-19 14:56:30.465197 7fcc06295700 0 -- 10.242.43.106:0/1034381 >> 10.242.43.114:6820/37220 pipe(0x7fcbfc000e40 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7fcbfc001950).fault
2014-09-19 14:56:51.710496 7fcc05f92700 0 -- 10.242.43.106:0/1034381 >> 10.242.43.116:6830/39684 pipe(0x7fcbfc00cab0 sd=7 :0 s=1 pgs=0 cs=0 l=1 c=0x7fcbfc00c420).fault
2014-09-19 14:56:51.977428 7fcc05f92700 0 -- 10.242.43.106:0/1034381 >> 10.242.43.116:6830/39684 pipe(0x7fcbfc00cab0 sd=6 :44614 s=1 pgs=0 cs=0 l=1 c=0x7fcbfc00c420).connect claims to be 0.0.0.0:6830/43022 not 10.242.43.116:6830/39684 - wrong node!
2014-09-19 14:56:52.377776 7fcc05f92700 0 -- 10.242.43.106:0/1034381 >> 10.242.43.116:6830/39684 pipe(0x7fcbfc00cab0 sd=6 :44615 s=1 pgs=0 cs=0 l=1 c=0x7fcbfc00c420).connect claims to be 10.242.43.116:6830/43022 not 10.242.43.116:6830/39684 - wrong node!
2014-09-19 14:56:55.119593 7fcc06093700 0 -- 10.242.43.106:0/1034381 >> 10.242.43.114:6820/43598 pipe(0x7fcbfc00cab0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7fcbfc0023d0).fault
2014-09-19 14:57:09.729642 7fcc05f92700 0 -- 10.242.43.106:0/1034381 >> 10.242.43.114:6822/44194 pipe(0x7fcbfc00a740 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7fcbfc002810).fault
2014-09-19 14:57:24.716477 7fcc06295700 0 -- 10.242.43.106:0/1034381 >> 10.242.43.114:6840/44525 pipe(0x7fcbfc00f340 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7fcbfc00ef40).fault
2014-09-19 14:57:39.722376 7fcc06093700 0 -- 10.242.43.106:0/1034381 >> 10.242.43.114:6843/44859 pipe(0x7fcbfc010f50 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7fcbfc0109b0).fault
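
A hedged sketch of how the crashed/flapping OSD can be identified at this point (the log path assumes the default /var/log/ceph layout on the OSD nodes):

# Which OSDs are currently marked down, and is one of them flapping?
sudo ceph osd tree | grep -i down
sudo ceph -w    # watch for an OSD repeatedly going down and coming back up
# On each OSD node, find which local OSD log contains the assert
sudo grep -l "FAILED assert" /var/log/ceph/ceph-osd.*.log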


Files

osd.log (2.01 MB) Mallikarjun Biradar, 09/19/2014 11:51 AM
Actions #1

Updated by Mallikarjun Biradar over 9 years ago

Config:
OSD nodes: 3
Monitor nodes: 2
Number of OSDs: 24

This is observed on 0.84 and is consistently reproducible.

Actions #2

Updated by Loïc Dachary over 9 years ago

  • Category set to OSD
  • Status changed from New to Need More Info
  • Assignee changed from Sage Weil to Loïc Dachary

Could you please attach the last 20,000 (twenty thousand) lines of the logs of the crashed primary OSD?
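
For example (a sketch that assumes the default log location; <N> stands for the id of the crashed primary OSD):

# Run on the node hosting the crashed primary OSD
sudo tail -n 20000 /var/log/ceph/ceph-osd.<N>.log | gzip > osd.<N>.last20k.log.gz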

Actions #3

Updated by Loïc Dachary over 9 years ago

  • Status changed from Need More Info to Fix Under Review
  • % Done changed from 0 to 90

https://github.com/ceph/ceph/pull/2534 to be confirmed by the OSD logs

Actions #4

Updated by Mallikarjun Biradar over 9 years ago

OSD log of the primary OSD which crashed

Actions #5

Updated by Greg Farnum over 9 years ago

Please note that while this ought to work in the technical sense, you are unlikely to be happy with RADOS if you make it manage 10GB objects. Data balance won't work out properly (unless you have a lot of them), and no testing occurs with objects anywhere near that size. Expect other problems.
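
If the goal is only to get a large file into RADOS for testing, a hedged workaround sketch is to split it client-side and store each piece as its own object (the 4 MB piece size is an arbitrary choice in the range RADOS is normally exercised with, not a recommendation from this thread):

# Split File.tar into 4 MB pieces (4-character suffixes so we don't run out of names)
split -b 4M -a 4 File.tar File.tar.part.
# Upload each piece as a separate object, named after the piece file
for p in File.tar.part.*; do
    sudo rados put "$p" "$p" -p ecpool
done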

Actions #6

Updated by Loïc Dachary over 9 years ago

  • Status changed from Fix Under Review to Resolved
  • % Done changed from 90 to 100