Bug #8947


Writing rados objects with max objects set for cache pool crashed osd

Added by Mallikarjun Biradar almost 10 years ago. Updated almost 10 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Setting the target_max_objects parameter and writing rados objects onto the cache pool crashed an OSD.

History of operations on the setup:
There are 2 other cache tiers on this setup, used to test the parameters:
1> cache_min_flush_age & cache_target_full_ratio
2> cache_min_flush_age

Steps:
1> Created erasure-coded profile "ecprofile"
2> Created erasure-coded pool "ecpoolobj" using "ecprofile"
3> Created cachepoolobj (replicated pool)
4> Created a cache tier using ecpoolobj & cachepoolobj
5> Set cachepoolobj to writeback mode
6> Set overlay on ecpoolobj to redirect I/Os to cachepoolobj
7> Set the target_max_objects parameter to 10
8> Wrote one object onto cachepoolobj using the rados utility

Started writing 10 rados objects to cachepoolobj.

One of the OSDs crashed initially.

After some time, 3 of the OSDs went down.

Attaching all OSD logs.
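For reference, a sketch of the setup commands for steps 1 and 2, which are not captured in the transcript below. The erasure-code-profile settings (k=2, m=1) are assumptions for illustration; the actual profile parameters used on this cluster were not recorded in the report.

```shell
# Hypothetical profile settings -- the real "ecprofile" values are not shown in this report
sudo ceph osd erasure-code-profile set ecprofile k=2 m=1
# Matches the pool creation shown in the transcript (12 PGs, erasure-coded, using ecprofile)
sudo ceph osd pool create ecpoolobj 12 12 erasure ecprofile
```

These commands require a running cluster, so they are an operational fragment rather than a standalone script.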

ubuntu@ubuntuServer1:~$ sudo ceph osd pool create ecpoolobj 12 12 erasure ecprofile
pool 'ecpoolobj' created
ubuntu@ubuntuServer1:~$

ubuntu@ubuntuServer1:~$ sudo ceph osd pool create cachepoolobj 256 256
pool 'cachepoolobj' created

ubuntu@ubuntuServer1:~$ sudo ceph osd tier add ecpoolobj cachepoolobj
pool 'cachepoolobj' is now (or already was) a tier of 'ecpoolobj'

ubuntu@ubuntuServer1:~$ sudo ceph osd tier cache-mode cachepoolobj writeback
set cache-mode for pool 'cachepoolobj' to writeback

ubuntu@ubuntuServer1:~$ sudo ceph osd tier set-overlay ecpoolobj cachepoolobj
overlay for 'ecpoolobj' is now (or already was) 'cachepoolobj'

ubuntu@ubuntuServer1:~$ sudo ceph osd pool set cachepool target_max_objects 10
set pool 4 target_max_objects to 10
ubuntu@ubuntuServer1:~$

ubuntu@ubuntuServer1:~$ sudo rados df
pool name category KB objects clones degraded unfound rd rd KB wr wr KB
cachepool - 36 1 0 0 0 0 0 3 37
cachepoolage - 36 1 0 0 0 0 0 2 37
cachepoolobj - 0 0 0 0 0 0 0 0 0
data - 0 0 0 0 0 0 0 0 0
ecpool - 36 1 0 0 0 1 0 2 37
ecpoolage - 0 0 0 0 0 0 0 0 0
ecpoolobj - 0 0 0 0 0 0 0 0 0
metadata - 0 0 0 0 0 0 0 0 0
rbd - 0 0 0 0 0 0 0 0 0
total used 170460 3
total avail 757575652
total space 757746112
ubuntu@ubuntuServer1:~$

ubuntu@ubuntufunc1:~$ for i in {1..9}; do sudo rados put p cachepoolobj obj1 CEPH-Ubuntu14.04_x64-1.06.00/python-ceph_0.80-1_amd64.deb; done
ubuntu@ubuntufunc1:~$ for i in {1..9}; do sudo rados put -p cachepoolobj obj$i CEPH-Ubuntu14.04_x64-1.06.00/python-ceph_0.80-1_amd64.deb; done
2014-07-25 18:26:03.041974 7f895d162700 0 -- 10.66.26.5:0/1013789 >> 10.242.24.154:6802/28779 pipe(0x1f1bbe0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x1f1be50).fault
2014-07-25 18:26:34.521877 7f895d061700 0 -- 10.66.26.5:0/1013789 >> 10.66.26.5:6800/12812 pipe(0x7f8954007190 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8954007400).fault

Crash logs:
-28> 2014-07-25 18:28:15.643983 7f4f3c1d7700 5 -- op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.549656, event: header_read, op: pg_info(1 pgs e65:1.a)
-27> 2014-07-25 18:28:15.644024 7f4f3c1d7700 5 -- op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.549657, event: throttled, op: pg_info(1 pgs e65:1.a)
-26> 2014-07-25 18:28:15.644032 7f4f3c1d7700 5 -- op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.549748, event: all_read, op: pg_info(1 pgs e65:1.a)
-25> 2014-07-25 18:28:15.644039 7f4f3c1d7700 5 -- op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.643981, event: dispatched, op: pg_info(1 pgs e65:1.a)
-24> 2014-07-25 18:28:15.644046 7f4f3c1d7700 5 -- op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.644045, event: waiting_for_osdmap, op: pg_info(1 pgs e65:1.a)
-23> 2014-07-25 18:28:15.644055 7f4f3c1d7700 5 -- op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.644053, event: started, op: pg_info(1 pgs e65:1.a)
-22> 2014-07-25 18:28:15.644071 7f4f3c1d7700 5 -- op tracker -- , seq: 1261, time: 2014-07-25 18:28:15.644069, event: done, op: pg_info(1 pgs e65:1.a)
-21> 2014-07-25 18:28:15.644082 7f4f3c1d7700 1 -- 10.242.24.154:6803/28779 <== osd.1 10.242.24.111:6801/31290 596 ==== pg_info(1 pgs e65:6.10) v4 ==== 714+0+0 (522635800 0 0) 0x5230680 con 0x487c420
-20> 2014-07-25 18:28:15.644093 7f4f3c1d7700 5 -- op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.550890, event: header_read, op: pg_info(1 pgs e65:6.10)
-19> 2014-07-25 18:28:15.644101 7f4f3c1d7700 5 -- op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.550891, event: throttled, op: pg_info(1 pgs e65:6.10)
-18> 2014-07-25 18:28:15.644108 7f4f3c1d7700 5 -- op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.550907, event: all_read, op: pg_info(1 pgs e65:6.10)
-17> 2014-07-25 18:28:15.644117 7f4f3c1d7700 5 -- op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.644091, event: dispatched, op: pg_info(1 pgs e65:6.10)
-16> 2014-07-25 18:28:15.644125 7f4f3c1d7700 5 -- op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.644123, event: waiting_for_osdmap, op: pg_info(1 pgs e65:6.10)
-15> 2014-07-25 18:28:15.644133 7f4f3c1d7700 5 -- op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.644132, event: started, op: pg_info(1 pgs e65:6.10)
-14> 2014-07-25 18:28:15.644150 7f4f3c1d7700 5 -- op tracker -- , seq: 1262, time: 2014-07-25 18:28:15.644148, event: done, op: pg_info(1 pgs e65:6.10)
-13> 2014-07-25 18:28:15.644164 7f4f3c1d7700 1 -- 10.242.24.154:6803/28779 <== osd.0 10.66.26.5:6801/12812 627 ==== pg_info(1 pgs e65:8.9) v4 ==== 714+0+0 (527915233 0 0) 0x4dcec40 con 0x487c160
-12> 2014-07-25 18:28:15.644175 7f4f3c1d7700 5 -- op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.549772, event: header_read, op: pg_info(1 pgs e65:8.9)
-11> 2014-07-25 18:28:15.644184 7f4f3c1d7700 5 -- op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.549772, event: throttled, op: pg_info(1 pgs e65:8.9)
-10> 2014-07-25 18:28:15.644191 7f4f3c1d7700 5 -- op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.549987, event: all_read, op: pg_info(1 pgs e65:8.9)
-9> 2014-07-25 18:28:15.644168 7f4f299b2700 -1 *** Caught signal (Aborted) **
in thread 7f4f299b2700

1: /usr/bin/ceph-osd() [0x974a1f]
2: (()+0x10340) [0x7f4f4c5ee340]
3: (gsignal()+0x39) [0x7f4f4a8c2f79]
4: (abort()+0x148) [0x7f4f4a8c6388]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f4f4b1ce6b5]
6: (()+0x5e836) [0x7f4f4b1cc836]
7: (()+0x5e863) [0x7f4f4b1cc863]
8: (()+0x5eaa2) [0x7f4f4b1ccaa2]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0xa5a692]
10: (ReplicatedPG::agent_estimate_atime_temp(hobject_t const&, int*, int*)+0x12e) [0x7ad25e]
11: (ReplicatedPG::agent_maybe_evict(std::tr1::shared_ptr<ObjectContext>&)+0x7a) [0x7e1a2a]
12: (ReplicatedPG::agent_work(int)+0xf45) [0x7fb1d5]
13: (OSDService::agent_entry()+0x557) [0x6090c7]
14: (OSDService::AgentThread::entry()+0xd) [0x67201d]
15: (()+0x8182) [0x7f4f4c5e6182]
16: (clone()+0x6d) [0x7f4f4a98730d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
-8> 2014-07-25 18:28:15.644199 7f4f3c1d7700 5 -- op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.644172, event: dispatched, op: pg_info(1 pgs e65:8.9)
-7> 2014-07-25 18:28:15.644207 7f4f3c1d7700 5 -- op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.644205, event: waiting_for_osdmap, op: pg_info(1 pgs e65:8.9)
-6> 2014-07-25 18:28:15.644233 7f4f3c1d7700 5 -- op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.644230, event: started, op: pg_info(1 pgs e65:8.9)
-5> 2014-07-25 18:28:15.644249 7f4f3c1d7700 5 -- op tracker -- , seq: 1263, time: 2014-07-25 18:28:15.644247, event: done, op: pg_info(1 pgs e65:8.9)
-4> 2014-07-25 18:28:15.644258 7f4f3c1d7700 1 -- 10.242.24.154:6803/28779 <== osd.1 10.242.24.111:6801/31290 597 ==== pg_info(1 pgs e65:8.26) v4 ==== 714+0+0 (367980315 0 0) 0x4dcf180 con 0x487c420
-3> 2014-07-25 18:28:15.644267 7f4f3c1d7700 5 -- op tracker -- , seq: 1264, time: 2014-07-25 18:28:15.550916, event: header_read, op: pg_info(1 pgs e65:8.26)
-2> 2014-07-25 18:28:15.644273 7f4f3c1d7700 5 -- op tracker -- , seq: 1264, time: 2014-07-25 18:28:15.550916, event: throttled, op: pg_info(1 pgs e65:8.26)
-1> 2014-07-25 18:28:15.644282 7f4f3c1d7700 5 -- op tracker -- , seq: 1264, time: 2014-07-25 18:28:15.551111, event: all_read, op: pg_info(1 pgs e65:8.26)
0> 2014-07-25 18:28:15.644291 7f4f3c1d7700 5 -- op tracker -- , seq: 1264, time: 2014-07-25 18:28:15.644265, event: dispatched, op: pg_info(1 pgs e65:8.26)
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
ubuntu@ubuntuServer1:~$

Files

ceph-osd.0.log (10.4 KB) ceph-osd.0.log log for crashed osd Mallikarjun Biradar, 07/28/2014 08:15 AM
ceph-osd.2.log (9.14 KB) ceph-osd.2.log log for crashed osd Mallikarjun Biradar, 07/28/2014 08:15 AM
ceph-osd.3.log (78.9 KB) ceph-osd.3.log log for crashed osd Mallikarjun Biradar, 07/28/2014 08:15 AM
_usr_bin_ceph-osd.0.crash (11.8 MB) _usr_bin_ceph-osd.0.crash Mallikarjun Biradar, 07/28/2014 10:46 PM
_usr_bin_ceph-osd.0.zip (8.18 MB) _usr_bin_ceph-osd.0.zip Mallikarjun Biradar, 07/28/2014 10:46 PM

Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #8982: cache pool osds crashing when data is evicting to underlying storage pool (Resolved, Sage Weil, 07/31/2014)
