Bug #8947
closedWriting rados objects with max objects set for cache pool crashed osd
0%
Description
Setting target_max_objects parameter and writing rados object onto cache pool crashed osd.
History of operations on the setup:
There are 2 other cache tiers which are used to test parameters
1> cache_min_flush_age & cache_target_full_ratio
2> cache_min_flush_age
Steps:
1> Created erasure coded profile "ecprofile"
2> Created erasure coded pool "ecpoolobj" using "ecprofile"
3> Created cachepoolobj (replicated pool)
4> Created cache tier using ecpoolobj & cachepoolobj
5> Set cachepoolobj in writeback mode
6> Set-overlay on ecpoolobj to redirect IO's to cachepoolobj
7> Set target_max_objects parameter to 10
8> Wrote one object onto cachepoolobj using rados utility
Start writing 10 rados objects on cachepoolobj
one of the osd got crashed initially.
After sometime, 3 of the osd's went down.
Attaching all osd's log.
ubuntu@ubuntuServer1:~$ sudo ceph osd pool create ecpoolobj 12 12 erasure ecprofile
pool 'ecpoolobj' created
ubuntu@ubuntuServer1:~$
ubuntu@ubuntuServer1:~$ sudo ceph osd pool create cachepoolobj 256 256
pool 'cachepoolobj' created
ubuntu@ubuntuServer1:~$ sudo ceph osd tier add ecpoolobj cachepoolobj
pool 'cachepoolobj' is now (or already was) a tier of 'ecpoolobj'
ubuntu@ubuntuServer1:~$ sudo ceph osd tier cache-mode cachepoolobj writeback
set cache-mode for pool 'cachepoolobj' to writeback
ubuntu@ubuntuServer1:~$ sudo ceph osd tier set-overlay ecpoolobj cachepoolobj
overlay for 'ecpoolobj' is now (or already was) 'cachepoolobj'
ubuntu@ubuntuServer1:~$ sudo ceph osd pool set cachepool target_max_objects 10
set pool 4 target_max_objects to 10
ubuntu@ubuntuServer1:~$
ubuntu@ubuntuServer1:~$ sudo rados df
pool name category KB objects clones degraded unfound rd rd KB wr wr KB
cachepool - 36 1 0 0 0 0 0 3 37
cachepoolage - 36 1 0 0 0 0 0 2 37
cachepoolobj - 0 0 0 0 0 0 0 0 0
data - 0 0 0 0 0 0 0 0 0
ecpool - 36 1 0 0 0 1 0 2 37
ecpoolage - 0 0 0 0 0 0 0 0 0
ecpoolobj - 0 0 0 0 0 0 0 0 0
metadata - 0 0 0 0 0 0 0 0 0
rbd - 0 0 0 0 0 0 0 0 0
total used 170460 3
total avail 757575652
total space 757746112
ubuntu@ubuntuServer1:~$
ubuntu@ubuntufunc1:~$ for i in {1..9}; do sudo rados put p cachepoolobj obj1 CEPH-Ubuntu14.04_x64-1.06.00/python-ceph_0.80-1_amd64.deb; done 10.66.26.5:0/1013789 >> 10.242.24.154:6802/28779 pipe(0x1f1bbe0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x1f1be50).fault
ubuntu@ubuntufunc1:~$ for i in {1..9}; do sudo rados put -p cachepoolobj obj$i CEPH-Ubuntu14.04_x64-1.06.00/python-ceph_0.80-1_amd64.deb; done
2014-07-25 18:26:03.041974 7f895d162700 0 -
2014-07-25 18:26:34.521877 7f895d061700 0 -- 10.66.26.5:0/1013789 >> 10.66.26.5:6800/12812 pipe(0x7f8954007190 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8954007400).fault
Crash logs:
28> 2014-07-25 18:28:15.643983 7f4f3c1d7700 5 - op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.549656, event: header_read, op: pg_info(1 pgs e65:1.a)
27> 2014-07-25 18:28:15.644024 7f4f3c1d7700 5 - op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.549657, event: throttled, op: pg_info(1 pgs e65:1.a)
26> 2014-07-25 18:28:15.644032 7f4f3c1d7700 5 - op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.549748, event: all_read, op: pg_info(1 pgs e65:1.a)
25> 2014-07-25 18:28:15.644039 7f4f3c1d7700 5 - op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.643981, event: dispatched, op: pg_info(1 pgs e65:1.a)
24> 2014-07-25 18:28:15.644046 7f4f3c1d7700 5 - op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.644045, event: waiting_for_osdmap, op: pg_info(1 pgs e65:1.a)
23> 2014-07-25 18:28:15.644055 7f4f3c1d7700 5 - op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.644053, event: started, op: pg_info(1 pgs e65:1.a)
22> 2014-07-25 18:28:15.644071 7f4f3c1d7700 5 - op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.644069, event: done, op: pg_info(1 pgs e65:1.a)
21> 2014-07-25 18:28:15.644082 7f4f3c1d7700 1 - 10.242.24.154:6803/28779 <== osd.1 10.242.24.111:6801/31290 596 ==== pg_info(1 pgs e65:6.10) v4 ==== 714+0+0 (522635800 0 0) 0x5230680 con 0x487c420
20> 2014-07-25 18:28:15.644093 7f4f3c1d7700 5 - op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.550890, event: header_read, op: pg_info(1 pgs e65:6.10)
19> 2014-07-25 18:28:15.644101 7f4f3c1d7700 5 - op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.550891, event: throttled, op: pg_info(1 pgs e65:6.10)
18> 2014-07-25 18:28:15.644108 7f4f3c1d7700 5 - op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.550907, event: all_read, op: pg_info(1 pgs e65:6.10)
17> 2014-07-25 18:28:15.644117 7f4f3c1d7700 5 - op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.644091, event: dispatched, op: pg_info(1 pgs e65:6.10)
16> 2014-07-25 18:28:15.644125 7f4f3c1d7700 5 - op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.644123, event: waiting_for_osdmap, op: pg_info(1 pgs e65:6.10)
15> 2014-07-25 18:28:15.644133 7f4f3c1d7700 5 - op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.644132, event: started, op: pg_info(1 pgs e65:6.10)
14> 2014-07-25 18:28:15.644150 7f4f3c1d7700 5 - op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.644148, event: done, op: pg_info(1 pgs e65:6.10)
13> 2014-07-25 18:28:15.644164 7f4f3c1d7700 1 - 10.242.24.154:6803/28779 <== osd.0 10.66.26.5:6801/12812 627 ==== pg_info(1 pgs e65:8.9) v4 ==== 714+0+0 (527915233 0 0) 0x4dcec40 con 0x487c160
12> 2014-07-25 18:28:15.644175 7f4f3c1d7700 5 - op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.549772, event: header_read, op: pg_info(1 pgs e65:8.9)
11> 2014-07-25 18:28:15.644184 7f4f3c1d7700 5 - op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.549772, event: throttled, op: pg_info(1 pgs e65:8.9)
10> 2014-07-25 18:28:15.644191 7f4f3c1d7700 5 - op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.549987, event: all_read, op: pg_info(1 pgs e65:8.9)
-9> 2014-07-25 18:28:15.644168 7f4f299b2700 -1 ** Caught signal (Aborted) *
in thread 7f4f299b2700
1: /usr/bin/ceph-osd() [0x974a1f]
2: (()+0x10340) [0x7f4f4c5ee340]
3: (gsignal()+0x39) [0x7f4f4a8c2f79]
4: (abort()+0x148) [0x7f4f4a8c6388]
5: (_gnu_cxx::_verbose_terminate_handler()+0x155) [0x7f4f4b1ce6b5]
6: (()+0x5e836) [0x7f4f4b1cc836]
7: (()+0x5e863) [0x7f4f4b1cc863]
8: (()+0x5eaa2) [0x7f4f4b1ccaa2]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0xa5a692]
10: (ReplicatedPG::agent_estimate_atime_temp(hobject_t const&, int*, int*)+0x12e) [0x7ad25e]
11: (ReplicatedPG::agent_maybe_evict(std::tr1::shared_ptr<ObjectContext>&)+0x7a) [0x7e1a2a]
12: (ReplicatedPG::agent_work(int)+0xf45) [0x7fb1d5]
13: (OSDService::agent_entry()+0x557) [0x6090c7]
14: (OSDService::AgentThread::entry()+0xd) [0x67201d]
15: (()+0x8182) [0x7f4f4c5e6182]
16: (clone()+0x6d) [0x7f4f4a98730d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
8> 2014-07-25 18:28:15.644199 7f4f3c1d7700 5 - op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.644172, event: dispatched, op: pg_info(1 pgs e65:8.9)
7> 2014-07-25 18:28:15.644207 7f4f3c1d7700 5 - op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.644205, event: waiting_for_osdmap, op: pg_info(1 pgs e65:8.9)
6> 2014-07-25 18:28:15.644233 7f4f3c1d7700 5 - op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.644230, event: started, op: pg_info(1 pgs e65:8.9)
5> 2014-07-25 18:28:15.644249 7f4f3c1d7700 5 - op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.644247, event: done, op: pg_info(1 pgs e65:8.9)
4> 2014-07-25 18:28:15.644258 7f4f3c1d7700 1 - 10.242.24.154:6803/28779 <== osd.1 10.242.24.111:6801/31290 597 ==== pg_info(1 pgs e65:8.26) v4 ==== 714+0+0 (367980315 0 0) 0x4dcf180 con 0x487c420
3> 2014-07-25 18:28:15.644267 7f4f3c1d7700 5 - op tracker -- , seq: 1264, time: 2014-07-25 18:28:15.550916, event: header_read, op: pg_info(1 pgs e65:8.26)
2> 2014-07-25 18:28:15.644273 7f4f3c1d7700 5 - op tracker -- , seq: 1264, time: 2014-07-25 18:28:15.550916, event: throttled, op: pg_info(1 pgs e65:8.26)
1> 2014-07-25 18:28:15.644282 7f4f3c1d7700 5 - op tracker -- , seq: 1264, time: 2014-07-25 18:28:15.551111, event: all_read, op: pg_info(1 pgs e65:8.26)
0> 2014-07-25 18:28:15.644291 7f4f3c1d7700 5 -- op tracker -- , seq: 1264, time: 2014-07-25 18:28:15.644265, event: dispatched, op: pg_info(1 pgs e65:8.26)
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.2.log
-- end dump of recent events ---
ubuntu@ubuntuServer1:~$
Files