Bug #7630
Simple tiering test produces OSD assert
Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
An initial tiering test produced the assert included below just as rados bench started. Both pools were on the same physical disks. I am not sure whether the procedure used to set up tiering was correct, but either way the OSD should handle this more gracefully than asserting in the log and crashing.
Steps to reproduce:
ceph osd pool create cache 4096
ceph osd pool create base 4096
ceph osd pool set cache size 1
ceph osd pool set base size 3
ceph osd tier add base cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay base cache
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 8
ceph osd pool set cache hit_set_period 60
ceph osd pool set cache target_max_objects 5000
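The steps above can be wrapped in a small script so that the pool names and PG count are parameters. This is a minimal sketch, not part of the original report: `tier_setup_cmds` is a hypothetical helper name, and it only prints the commands rather than running them, so the sequence can be reviewed before it touches a cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the tiering setup commands from this report.
# Arguments: base pool, cache pool, PG count (defaults match the report).
tier_setup_cmds() {
    base=${1:-base}
    cache=${2:-cache}
    pgs=${3:-4096}
    cat <<EOF
ceph osd pool create $cache $pgs
ceph osd pool create $base $pgs
ceph osd pool set $cache size 1
ceph osd pool set $base size 3
ceph osd tier add $base $cache
ceph osd tier cache-mode $cache writeback
ceph osd tier set-overlay $base $cache
ceph osd pool set $cache hit_set_type bloom
ceph osd pool set $cache hit_set_count 8
ceph osd pool set $cache hit_set_period 60
ceph osd pool set $cache target_max_objects 5000
EOF
}

# Dry run: show the commands for the pools used in this report.
tier_setup_cmds base cache
```

To apply the commands on a disposable test cluster, the output can be piped to `sh`.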
Client output:
2014-03-06 09:14:54.960043 7f3dd59aa700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6835/20215 pipe(0x7f3de400a400 sd=11 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de400a660).fault
2014-03-06 09:14:54.966151 7f3dd58a9700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6940/4132 pipe(0x7f3de400ab60 sd=12 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de400a8d0).fault
2014-03-06 09:14:54.971522 7f3dd65b6700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6800/17141 pipe(0x7f3de400ae60 sd=31 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de400b0c0).fault
2014-03-06 09:14:54.974074 7f3dd5aab700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6875/25320 pipe(0x7f3de400b320 sd=31 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de400b580).fault
2014-03-06 09:14:54.985299 7f3dd50a1700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6910/30727 pipe(0x7f3de400b7b0 sd=16 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de400ba10).fault
2014-03-06 09:14:54.985383 7f3dd60b1700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6850/22081 pipe(0x7f3de400bc40 sd=31 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de400bea0).fault
2014-03-06 09:14:54.987171 7f3dd51a2700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6840/20706 pipe(0x7f3de400c430 sd=31 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de400c690).fault
2014-03-06 09:14:54.987204 7f3d6b5f7700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6890/27510 pipe(0x15c1920 sd=30 :0 s=1 pgs=0 cs=0 l=1 c=0x15b1600).fault
2014-03-06 09:14:54.991210 7f3dd56a7700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6810/17957 pipe(0x7f3de400d320 sd=9 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de400d580).fault
2014-03-06 09:14:54.995999 7f3dd68b9700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6865/23979 pipe(0x7f3de400e630 sd=31 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de400e890).fault
2014-03-06 09:14:54.996277 7f3dd4d9e700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6920/32527 pipe(0x7f3de400f540 sd=31 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de400f7a0).fault
2014-03-06 09:14:54.997669 7f3dd4a9b700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6825/19273 pipe(0x7f3de4010490 sd=21 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de40106f0).fault
2014-03-06 09:14:55.007518 7f3dd5cad700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6870/24641 pipe(0x7f3de4010920 sd=31 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de4010b80).fault
2014-03-06 09:14:55.028786 7f3d6b7f9700 0 -- 192.168.10.2:0/1008072 >> 192.168.10.1:6815/18383 pipe(0x7f3de40114b0 sd=31 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3de4011710).fault
Assert in OSD logs:
0> 2014-03-06 09:20:14.811587 7fbf89839700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::process_copy_chunk(hobject_t, tid_t, int)' thread 7fbf89839700 time 2014-03-06 09:20:14.793633
osd/ReplicatedPG.cc: 5444: FAILED assert(cop->rval >= 0)
ceph version 0.77-655-g195d53a (195d53a7fc695ed954c85022fef6d2a18f68fe20)
1: (ReplicatedPG::process_copy_chunk(hobject_t, unsigned long, int)+0xf9d) [0x885d8d]
2: (C_Copyfrom::finish(int)+0x99) [0x8d6f79]
3: (Context::complete(int)+0x9) [0x658fc9]
4: (Finisher::finisher_thread_entry()+0x1b8) [0x980da8]
5: (()+0x7f6e) [0x7fbfa4d4df6e]
6: (clone()+0x6d) [0x7fbfa32f29cd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 0 lockdep
0/ 0 context
0/ 0 crush
0/ 0 mds
0/ 0 mds_balancer
0/ 0 mds_locker
0/ 0 mds_log
0/ 0 mds_log_expire
0/ 0 mds_migrator
0/ 0 buffer
0/ 0 timer
0/ 0 filer
0/ 1 striper
0/ 0 objecter
0/ 0 rados
0/ 0 rbd
0/ 0 journaler
0/ 0 objectcacher
0/ 0 client
0/ 0 osd
0/ 0 optracker
0/ 0 objclass
0/ 0 filestore
1/ 3 keyvaluestore
0/ 0 journal
0/ 0 ms
0/ 0 mon
0/ 0 monc
0/ 0 paxos
0/ 0 tp
0/ 0 auth
1/ 5 crypto
0/ 0 finisher
0/ 0 heartbeatmap
0/ 0 perfcounter
0/ 0 rgw
1/ 5 javaclient
0/ 0 asok
0/ 0 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /tmp/cbt/ceph/log/osd.0.log
--- end dump of recent events ---
2014-03-06 09:20:14.830535 7fbf89839700 -1 *** Caught signal (Aborted) **
in thread 7fbf89839700
ceph version 0.77-655-g195d53a (195d53a7fc695ed954c85022fef6d2a18f68fe20)
1: ceph-osd() [0x95ebbf]
2: (()+0xfbb0) [0x7fbfa4d55bb0]
3: (gsignal()+0x37) [0x7fbfa322ef77]
4: (abort()+0x148) [0x7fbfa32325e8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fbfa3b3a6e5]
6: (()+0x5e856) [0x7fbfa3b38856]
7: (()+0x5e883) [0x7fbfa3b38883]
8: (()+0x5eaae) [0x7fbfa3b38aae]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0xa3e332]
10: (ReplicatedPG::process_copy_chunk(hobject_t, unsigned long, int)+0xf9d) [0x885d8d]
11: (C_Copyfrom::finish(int)+0x99) [0x8d6f79]
12: (Context::complete(int)+0x9) [0x658fc9]
13: (Finisher::finisher_thread_entry()+0x1b8) [0x980da8]
14: (()+0x7f6e) [0x7fbfa4d4df6e]
15: (clone()+0x6d) [0x7fbfa32f29cd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2014-03-06 09:20:14.830535 7fbf89839700 -1 *** Caught signal (Aborted) **
in thread 7fbf89839700
ceph version 0.77-655-g195d53a (195d53a7fc695ed954c85022fef6d2a18f68fe20)
1: ceph-osd() [0x95ebbf]
2: (()+0xfbb0) [0x7fbfa4d55bb0]
3: (gsignal()+0x37) [0x7fbfa322ef77]
4: (abort()+0x148) [0x7fbfa32325e8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fbfa3b3a6e5]
6: (()+0x5e856) [0x7fbfa3b38856]
7: (()+0x5e883) [0x7fbfa3b38883]
8: (()+0x5eaae) [0x7fbfa3b38aae]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0xa3e332]
10: (ReplicatedPG::process_copy_chunk(hobject_t, unsigned long, int)+0xf9d) [0x885d8d]
11: (C_Copyfrom::finish(int)+0x99) [0x8d6f79]
12: (Context::complete(int)+0x9) [0x658fc9]
13: (Finisher::finisher_thread_entry()+0x1b8) [0x980da8]
14: (()+0x7f6e) [0x7fbfa4d4df6e]
15: (clone()+0x6d) [0x7fbfa32f29cd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 0 lockdep
0/ 0 context
0/ 0 crush
0/ 0 mds
0/ 0 mds_balancer
0/ 0 mds_locker
0/ 0 mds_log
0/ 0 mds_log_expire
0/ 0 mds_migrator
0/ 0 buffer
0/ 0 timer
0/ 0 filer
0/ 1 striper
0/ 0 objecter
0/ 0 rados
0/ 0 rbd
0/ 0 journaler
0/ 0 objectcacher
0/ 0 client
0/ 0 osd
0/ 0 optracker
0/ 0 objclass
0/ 0 filestore
1/ 3 keyvaluestore
0/ 0 journal
0/ 0 ms
0/ 0 mon
0/ 0 monc
0/ 0 paxos
0/ 0 tp
0/ 0 auth
1/ 5 crypto
0/ 0 finisher
0/ 0 heartbeatmap
0/ 0 perfcounter
0/ 0 rgw
1/ 5 javaclient
0/ 0 asok
0/ 0 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /tmp/cbt/ceph/log/osd.0.log
--- end dump of recent events ---
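When triaging a dump like this one, the failed assertion and its source location are usually the first things worth pulling out of the log. A minimal sketch follows; `find_asserts` is a hypothetical helper name, and the log path is the one shown in the dump above. It assumes the log has one entry per line, as ceph-osd writes it.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct "FAILED assert" lines in an OSD log.
# The same assert is typically logged more than once (in the initial dump and
# again in the recent-events dump), so duplicates are collapsed.
find_asserts() {
    grep 'FAILED assert' "$1" | sort -u
}

# Log path taken from this report; skipped quietly if the file is absent.
log=/tmp/cbt/ceph/log/osd.0.log
if [ -f "$log" ]; then
    find_asserts "$log"
fi
```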
Updated by Mark Nelson about 10 years ago
rados bench command run:
/usr/bin/rados -c /tmp/cbt/ceph/ceph.conf -p base -b 4194304 bench 300 write --concurrent-ios 32 --no-cleanup
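For context on how this workload relates to the cache settings in the description: `-b 4194304` is a 4 MiB write size, and the cache pool was given `target_max_objects 5000`. The arithmetic below is a rough sketch that assumes each bench write creates one object of the block size; it is not a diagnosis of the assert.

```shell
#!/bin/sh
# Values taken from the report: rados bench block size and the cache pool's
# target_max_objects setting.
block_bytes=4194304
target_max_objects=5000

# Assumption: one bench write produces one object of block_bytes.
block_mib=$((block_bytes / 1024 / 1024))
fill_mib=$((block_mib * target_max_objects))

echo "block size: ${block_mib} MiB"
echo "cache object target reached after about ${fill_mib} MiB written"
```

Under that assumption, the cache pool reaches its object target after roughly 20000 MiB of writes, which a 300-second run with 32 concurrent operations may or may not reach depending on the hardware.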
Updated by Greg Farnum about 10 years ago
- Status changed from Resolved to Duplicate