Bug #2185
osd/ReplicatedPG.cc: 5938: FAILED assert(r >= 0) in ReplicatedPG::scan_range()
Description
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::scan_range(hobject_t, int, int, PG::BackfillInterval*)' thread 7f87ebdcc700 time 2012-03-16 23:03:44.898625
osd/ReplicatedPG.cc: 5938: FAILED assert(r >= 0)
ceph version 0.43 (commit:9fa8781c0147d66fcef7c2dd0e09cd3c69747d37)
1: (ReplicatedPG::scan_range(hobject_t, int, int, PG::BackfillInterval*)+0xd63) [0x521d43]
2: (ReplicatedPG::recover_backfill(int)+0x9bb) [0x53e73b]
3: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x232) [0x544392]
4: (OSD::do_recovery(PG*)+0x345) [0x58fe85]
5: (ThreadPool::worker()+0xa26) [0x6564a6]
6: (ThreadPool::WorkThread::entry()+0xd) [0x5d33ad]
7: (()+0x68ca) [0x7f87fcb4f8ca]
8: (clone()+0x6d) [0x7f87fb1d392d]
History
#1 Updated by Sage Weil over 11 years ago
- Description updated (diff)
- Private changed from Yes to No
#2 Updated by Oliver Francke over 11 years ago
Here is the output from osd.3 after the recent crash:
root@fcmsnode3:/data/osd3/current# find 0.0_head
0.0_head
0.0_head/1000000001a.00000098__head_EDFB2800
0.0_head/1000000001a.000001f7__head_2FEF8C00
0.0_head/1000000001b.0000002e__head_74679E00
0.0_head/1000000001b.00000087__head_D3B1BC00
0.0_head/1000000001d.0000038e__head_19EEBA00
0.0_head/10000000024.000000a4__head_F8564600
0.0_head/10000000024.00000113__head_346BDE00
0.0_head/10000000024.0000014d__head_BAA3F000
0.0_head/10000000024.000002ed__head_A3E2CE00
0.0_head/10000000024.00000340__head_8712A400
0.0_head/10000000027.000003a7__head_C778B600
0.0_head/10000000028.00000053__head_87049E00
0.0_head/1000000002d.0000000f__head_89C76C00
0.0_head/10000000031.0000026d__head_782E0200
0.0_head/10000000032.0000002f__head_D4C43A00
0.0_head/20000000006.00000046__head_2DB32000
0.0_head/20000000007.0000004c__head_D1804C00
0.0_head/20000000007.00000204__head_AB17B800
0.0_head/20000000007.00000244__head_D880B400
0.0_head/10000000075.00000005__head_8DF44800
0.0_head/20000000008.000000ad__head_C29DA600
0.0_head/20000000008.000001ab__head_D04D3400
0.0_head/20000000008.000001dd__head_7A0FBE00
0.0_head/10000000463.000001e0__head_3B34FE00
0.0_head/1000000007b.0000003c__head_41227E00
#3 Updated by Sage Weil over 11 years ago
- Category set to OSD
on osd.0,
root@fcmsnode0:/data/osd0/current/temp# getfattr -d . -ehex
# file: .
user.cephos.collection_version=0x02000000
user.cephos.phash.contents=0x0110000000000000000000000000000000
root@fcmsnode0:/data/osd0/current/temp# ls -al
total 35964
drwxr-xr-x 2 root root 4096 Mar 17 02:05 .
drwxr-xr-x 956 root root 32768 Mar 17 01:57 ..
-rw-r--r-- 1 root root 3145728 Mar 17 02:05 10000000039.00000004__head_A8A4AB1B
-rw-r--r-- 1 root root 3145728 Mar 17 02:05 10000000039.00000008__head_7377ECD0
-rw-r--r-- 1 root root 2097152 Mar 17 02:05 rb.0.0.00000000003c__head_E4EBCDD8
-rw-r--r-- 1 root root 2097152 Mar 17 02:05 rb.0.0.00000000003e__head_F1387790
-rw-r--r-- 1 root root 2097152 Mar 17 02:05 rb.0.0.000000000059__head_3DD789D8
-rw-r--r-- 1 root root 2097152 Mar 17 02:05 rb.0.0.00000000009d__head_03F35AAF
-rw-r--r-- 1 root root 2097152 Mar 17 02:05 rb.0.0.0000000002d7__head_7FFE27F9
-rw-r--r-- 1 root root 2097152 Mar 17 02:05 rb.0.0.0000000002d9__head_E236D1B1
-rw-r--r-- 1 root root 1048576 Mar 17 02:05 rb.0.0.0000000004b7__head_D44ACE19
-rw-r--r-- 1 root root 3145728 Mar 17 02:05 rb.0.0.0000000004cb__head_D03C490F
-rw-r--r-- 1 root root 1048576 Mar 17 02:05 rb.0.0.0000000004da__head_ECD1B618
-rw-r--r-- 1 root root 3145728 Mar 17 02:05 rb.0.0.000000000f77__head_C93E21C1
-rw-r--r-- 1 root root 3145728 Mar 17 02:05 rb.0.0.000000001856__head_676E61AF
-rw-r--r-- 1 root root 2097152 Mar 17 02:05 rb.0.0.000000001d57__head_E8F5F0A9
-rw-r--r-- 1 root root 2097152 Mar 17 02:05 rb.0.0.000000001d58__head_B551E8D1
-rw-r--r-- 1 root root 2097152 Mar 17 02:05 rb.0.2.000000000316__head_6BCB1B11
r = -61 coll = temp
2012-03-17 01:56:01.378358 7f91a87a6700 journal throttle: waited for ops
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::scan_range(hobject_t, int, int, PG::BackfillInterval*)' thread 7f91a87a6700 time 2012-03-17 01:58:01.501100
osd/ReplicatedPG.cc: 5956: FAILED assert(r >= 0)
ceph version 0.43-7-gcffe0ca (commit:cffe0caecdeba57c97e2bd3f74f679c16d4a4e0a)
1: (ReplicatedPG::scan_range(hobject_t, int, int, PG::BackfillInterval*)+0xce5) [0x532685]
2: (ReplicatedPG::do_scan(OpRequest*)+0x17e) [0x53286e]
3: (OSD::dequeue_op(PG*)+0x109) [0x592569]
4: (ThreadPool::worker()+0xa26) [0x602fb6]
5: (ThreadPool::WorkThread::entry()+0xd) [0x5d382d]
6: (()+0x68ca) [0x7f91b912c8ca]
7: (clone()+0x6d) [0x7f91b77b086d]
ceph version 0.43-7-gcffe0ca (commit:cffe0caecdeba57c97e2bd3f74f679c16d4a4e0a)
1: (ReplicatedPG::scan_range(hobject_t, int, int, PG::BackfillInterval*)+0xce5) [0x532685]
2: (ReplicatedPG::do_scan(OpRequest*)+0x17e) [0x53286e]
3: (OSD::dequeue_op(PG*)+0x109) [0x592569]
4: (ThreadPool::worker()+0xa26) [0x602fb6]
5: (ThreadPool::WorkThread::entry()+0xd) [0x5d382d]
6: (()+0x68ca) [0x7f91b912c8ca]
7: (clone()+0x6d) [0x7f91b77b086d]
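The `r = -61` printed before the assert looks like a negated errno value. On Linux, errno 61 is ENODATA ("No data available"), which is the error getxattr returns when the requested attribute does not exist. A minimal sketch for decoding such return codes follows; `describe_return_code` is a hypothetical helper written for illustration, not part of Ceph, and the numeric mapping depends on the platform the script runs on.

```python
import errno
import os


def describe_return_code(r):
    """Return (symbolic name, message) for a negative-errno style return value."""
    if r >= 0:
        return ("OK", "success")
    code = -r
    # errno.errorcode maps numeric codes to symbolic names for the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return (name, os.strerror(code))


if __name__ == "__main__":
    # The value seen in the OSD log above; on Linux this decodes to ENODATA.
    print(describe_return_code(-61))
```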
#4 Updated by Sage Weil over 11 years ago
strace indicated we had a missing xattr on this object:
2268 stat("/data/osd0/current/164.2_head/rb.0.0.000000000000__head_DA680EE2", {st_mode=S_IFREG|0644, st_size=4194304, ...}) = 0
2268 getxattr("/data/osd0/current/164.2_head/rb.0.0.000000000000__head_DA680EE2", "user.ceph._", 0x7fb8d15492f0, 100) = -1 ENODATA (No data available)
There are no other references to it in the log. All other objects in the collection look fine.
We can either move the object out of the way, or try to rebuild the xattr.
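To confirm that only one object is affected, the same check that strace captured can be repeated over a whole collection directory. The sketch below is an illustration only: `find_missing_xattr` is a hypothetical helper, the attribute name `user.ceph._` is taken from the strace output above, and it assumes a Linux host where `os.getxattr` is available. It only reads attributes and does not modify anything.

```python
import errno
import os


def find_missing_xattr(collection_dir, attr="user.ceph._", getxattr=None):
    """List regular files under collection_dir that lack the given xattr."""
    if getxattr is None:
        # os.getxattr is Linux-only; resolved lazily so a stand-in can be injected.
        getxattr = os.getxattr
    missing = []
    for root, _dirs, files in os.walk(collection_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                getxattr(path, attr)
            except OSError as e:
                # ENODATA means the file exists but the attribute is absent.
                if e.errno == errno.ENODATA:
                    missing.append(path)
                else:
                    raise
    return sorted(missing)


if __name__ == "__main__":
    # Path taken from the strace output; adjust to the collection being checked.
    for path in find_missing_xattr("/data/osd0/current/164.2_head"):
        print("missing user.ceph._:", path)
```

The `getxattr` parameter exists so the function can be exercised on filesystems without user xattr support by passing a stand-in callable.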
#5 Updated by Samuel Just over 11 years ago
It should be pretty easy to rebuild the xattr; removing the object would corrupt the rbd image.
#6 Updated by Yehuda Sadeh over 11 years ago
In the wip-rbd-bid branch that I pushed last week I added an option to the rbd tool to create images using existing data. E.g., for this case we could run:
rbd create --size=<size> --bid=0 <newname>
and this will create a new rbd image out of the old one (though without any snapshot info). In any case, I think this should only be merged with some --yes-i-really-mean-it flag, as it can cause issues if misused.
#7 Updated by Sage Weil over 11 years ago
- Status changed from Need More Info to Won't Fix