Bug #13801

Deep-scrub will crash osd if osd backend is newstore

Added by Zhi Zhang over 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: OSD
Target version: -
Start date: 11/16/2015
Due date:
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Because of this change (https://github.com/ceph/ceph/pull/6076), the hobject_t now contains the pool id, so a ghobject_t that wraps this hobject_t is no longer equal to ghobject_t().

In newstore, this causes the assertion failure shown in the logs below.

2015-11-12 18:59:32.045862 7fa22797a700 10 osd.6 1006 dequeue_op 0x7fa2473adb00 prio 127 cost 0 latency 0.000174 replica scrub(pg: 3.1e5,from:0'0,to:0'0,epoch:1006,start:MIN,end:MAX,chunky:1,deep:1,seed:4294967295,version:6) v6 pg pg[3.1e5( empty local-les=1004 n=0 ec=836 les/c/f 1004/1004/0 1003/1003/996) [0,6] r=1 lpr=1003 pi=930-1002/11 crt=0'0 active]
2015-11-12 18:59:32.045925 7fa22797a700 10 osd.6 pg_epoch: 1006 pg[3.1e5( empty local-les=1004 n=0 ec=836 les/c/f 1004/1004/0 1003/1003/996) [0,6] r=1 lpr=1003 pi=930-1002/11 crt=0'0 active] handle_message: 0x7fa2473adb00
2015-11-12 18:59:32.046018 7fa22797a700 10 osd.6 pg_epoch: 1006 pg[3.1e5( empty local-les=1004 n=0 ec=836 les/c/f 1004/1004/0 1003/1003/996) [0,6] r=1 lpr=1003 pi=930-1002/11 crt=0'0 active] build_scrub_map_chunk [3/00000000//0,MAX)  seed 4294967295
2015-11-12 18:59:32.046046 7fa22797a700 15 newstore(/var/lib/ceph/osd/ceph-6) collection_list 3.1e5_head start 3/00000000//0 end MAX max 2147483647
2015-11-12 18:59:32.046071 7fa22797a700 20 newstore(/var/lib/ceph/osd/ceph-6) collection_list range --.7ffffffffffffffb.a7800000. to --.7ffffffffffffffb.a8000000. and --.8000000000000003.a7800000. to --.8000000000000003.a8000000. start 3/00000000//0
2015-11-12 18:59:32.049697 7fa22797a700 -1 os/newstore/NewStore.cc: In function 'virtual int NewStore::collection_list(coll_t, ghobject_t, ghobject_t, bool, int, std::vector<ghobject_t>*, ghobject_t*)' thread 7fa22797a700 time 2015-11-12 18:59:32.046131
os/newstore/NewStore.cc: 1591: FAILED assert(k >= start_key && k < end_key)

The fix is to stay compatible with the previous change by creating, in newstore, a ghobject_t object that carries the pool id and shard id.
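
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the NewStore code: ToyObject, ToyRange, is_collection_start, in_range and resolve_start are all hypothetical names, and the real fix in the associated revision may be structured differently. The sketch only assumes what the description states: a pool-qualified "minimum" object no longer compares equal to a default-constructed one, so a strict range assertion on the start position fires unless the start-of-collection case is recognised explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Toy stand-in for ghobject_t; field names and defaults are illustrative only.
struct ToyObject {
  int64_t pool = -1;   // a default-constructed object carries no pool id
  uint32_t hash = 0;
  std::string name;

  bool operator==(const ToyObject& o) const {
    return std::tie(pool, hash, name) == std::tie(o.pool, o.hash, o.name);
  }
};

// Half-open range of hash values covered by one collection (hypothetical).
struct ToyRange {
  int64_t pool;
  uint32_t begin_hash;
  uint32_t end_hash;
};

// True when `start` means "list from the beginning": either the default
// object or the pool-qualified minimum (same pool, zero hash, empty name).
bool is_collection_start(const ToyObject& start, const ToyRange& r) {
  if (start == ToyObject()) return true;
  return start.pool == r.pool && start.hash == 0 && start.name.empty();
}

// Strict check with the same shape as the failing assertion.
bool in_range(const ToyObject& o, const ToyRange& r) {
  return o.pool == r.pool && o.hash >= r.begin_hash && o.hash < r.end_hash;
}

// Returns the hash at which listing should begin, tolerating a sloppy start.
uint32_t resolve_start(const ToyObject& start, const ToyRange& r) {
  if (is_collection_start(start, r)) return r.begin_hash;
  assert(in_range(start, r));  // only asserted for a genuine mid-range start
  return start.hash;
}
```

With a range such as ToyRange{3, 0xa7800000, 0xa8000000} (values borrowed loosely from the log above), a start object that has pool 3 and nothing else fails in_range, yet resolve_start maps it to the beginning of the range instead of asserting.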


Related issues

Related to Ceph - Bug #13668: OSD always crashes while doing scrub with new store as the backend (Rejected, 11/02/2015)

Associated revisions

Revision 32e76839 (diff)
Added by Sage Weil about 3 years ago

os/newstore: make collection_list tolerate sloppy start position

Because of this change (#6076), the hobject_t will contain pool id, hence
the ghobject_t having this hobject_t will be not equal to ghobject_t().

In newstore, this will cause assertion failure:
FAILED assert(k >= start_key && k < end_key)

The fix is to make compatible with previous change to create a
ghobject_t object with pool id and shard id in newstore.

Fixes: #13801
Reported-by: Zhi Zhang <>
Signed-off-by: Sage Weil <>

History

#2 Updated by Kefu Chai over 3 years ago

  • Status changed from New to Need Review

#3 Updated by Sage Weil over 3 years ago

  • Assignee set to Sage Weil

#4 Updated by Nathan Cutler over 3 years ago

  • Related to Bug #13668: OSD always crashes while doing scrub with new store as the backend added

#5 Updated by Nathan Cutler over 3 years ago

#13668 looks related, or possibly a duplicate.

#13827 describes a scrub-related OSD crash in Hammer.

#6 Updated by Sage Weil over 2 years ago

  • Status changed from Need Review to Resolved
