Fix #5388: osd: localized reads (from replicas) reordered wrt writes - Ceph - Ceph

Fix #5388

open

osd: localized reads (from replicas) reordered wrt writes

Added by Mike Bryant almost 11 years ago. Updated over 4 years ago.

Status: New
Priority: High
Assignee:
Category: OSD
Target version:
% Done:
Source: Community (user)
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm using hbase, with the hadoop-cephfs bindings, on top of a ceph 0.61 cluster.
I'm seeing instances where reading part of a file is returning all nulls, instead of the valid data.
This is showing up at the application level where it's trying to validate a magic number is there, failing, and crashing.

I've been debugging the hadoop bindings, and they appear to be doing the right thing and simply getting back nulls from the libcephfs layer.
The file it's reading incorrect data from is: ceph://null/hbase/tsdb/e50e991f7082c47e9172c5421af6e5f3/t/3847239684462394768

This is the interesting bit from the regionserver log:
2013-06-17 19:01:46,984 INFO org.apache.hadoop.fs.ceph.CephFileSystem: selectDataPool path=ceph://null/hbase/tsdb/e50e991f7082c47e9172c5421af6e5f3/.tmp/3566091379609591353 pool:repl=data:4 wanted=3
2013-06-17 19:01:47,172 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Bloom added to HFile (ceph://null/hbase/tsdb/e50e991f7082c47e9172c5421af6e5f3/.tmp/3566091379609591353): 4.5k, 3687/3841 (96%)
2013-06-17 19:01:47,233 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at ceph://null/hbase/tsdb/e50e991f7082c47e9172c5421af6e5f3/.tmp/3566091379609591353 to ceph://null/hbase/tsdb/e50e991f7082c47e9172c5421af6e5f3/t/3847239684462394768
2013-06-17 19:01:47,841 DEBUG org.apache.hadoop.fs.ceph.CephInputStream: CephInputStream constructor: initializing stream with fh 2651 and file length 2597375
2013-06-17 19:01:47,841 TRACE org.apache.hadoop.fs.ceph.CephInputStream: CephInputStream.seek: Seeking to position 2597315 on fd 2651
2013-06-17 19:01:47,841 TRACE org.apache.hadoop.fs.ceph.CephInputStream: CephInputStream.read: Reading 8 bytes from fd 2651
2013-06-17 19:01:47,841 TRACE org.apache.hadoop.fs.ceph.CephInputStream: CephInputStream.read: Reading 8 bytes from fd 2651: succeeded in reading 8 bytes
2013-06-17 19:01:47,841 TRACE org.apache.hadoop.fs.ceph.CephInputStream: CephInputStream.read: 8 byte output: 0000000000000000
(This is the bit where it's reading the trailer, and it should be some bytes corresponding to TRABLK)
2013-06-17 19:01:47,842 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=ec02sv14.ocado.com,60020,1371491732490, load=(requests=264597, regions=132, usedHeap=2709, maxHeap=4062): Replay of HLog required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,\x00\x00fQ\xB1\xE7`\x00\x00\x01\x00\x0E\xC2\x00\x00\x03\x00\x00\xC1,1371473213321.e50e991f7082c47e9172c5421af6e5f3.
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1070)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:967)
at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:915)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:394)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:202)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:222)
Caused by: java.io.IOException: Trailer 'header' is wrong; does the trailer size match content?
at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1527)
at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:885)
at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819)
at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.loadFileInfo(StoreFile.java:1003)
at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:382)
at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:438)
at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:557)
at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:496)
at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:83)
at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1576)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1046)
... 5 more

For comparison this is an instance where it was fine:
2013-06-17 19:01:45,872 TRACE org.apache.hadoop.fs.ceph.CephInputStream: CephInputStream.read: Reading 8 bytes from fd 2644: succeeded in reading 8 bytes
2013-06-17 19:01:45,872 TRACE org.apache.hadoop.fs.ceph.CephInputStream: CephInputStream.read: 8 byte output: 545241424c4b2224
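
To make the difference between the two reads easier to see, the hex strings from the "8 byte output" lines can be decoded to raw bytes. This is only an illustrative sketch in Python; the helper name `decode_logged_bytes` is made up for this example and is not part of the hadoop-cephfs bindings.

```python
def decode_logged_bytes(hex_dump: str) -> bytes:
    """Turn the hex string from an '8 byte output' log line into raw bytes."""
    return bytes.fromhex(hex_dump)


# Hex strings copied from the two log excerpts above.
good = decode_logged_bytes("545241424c4b2224")
bad = decode_logged_bytes("0000000000000000")

print(good)                        # b'TRABLK"$'
print(good.startswith(b"TRABLK"))  # True: the expected trailer magic
print(bad == bytes(8))             # True: the failing read is eight null bytes
```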

When I look at the same file through a fuse mountpoint, I can see the correct sequence of bytes.
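
A check like the one described above could be scripted roughly as follows. This is a sketch under assumptions: the 60-byte trailer offset is inferred only from the log (file length 2597375, seek position 2597315), the function names are invented for illustration, and the path in the usage note is a hypothetical mountpoint.

```python
import os

# Inferred from the log above: file length minus the seek position.
# Other HFile versions may use a different trailer size.
TRAILER_OFFSET_FROM_END = 2597375 - 2597315  # 60 bytes
MAGIC_PREFIX = b"TRABLK"


def read_trailer_magic(path: str, length: int = 8) -> bytes:
    """Read `length` bytes from the start of the trailer of the file at `path`."""
    size = os.path.getsize(path)
    if size < TRAILER_OFFSET_FROM_END:
        return b""  # file too short to contain a trailer
    with open(path, "rb") as f:
        f.seek(size - TRAILER_OFFSET_FROM_END)
        return f.read(length)


def trailer_looks_valid(path: str) -> bool:
    """True if the trailer starts with the expected magic rather than nulls."""
    return read_trailer_magic(path).startswith(MAGIC_PREFIX)
```

For example, `trailer_looks_valid("/mnt/ceph/hbase/tsdb/e50e991f7082c47e9172c5421af6e5f3/t/3847239684462394768")` would be the call for the affected file, assuming the filesystem were mounted at `/mnt/ceph` (an assumed location).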

Updated by Sage Weil almost 11 years ago

Updated by Mike Bryant almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Sage Weil almost 11 years ago

Updated by Mike Bryant almost 11 years ago

Updated by Sage Weil over 10 years ago

Updated by Sage Weil over 10 years ago

Updated by Samuel Just over 10 years ago

Updated by Patrick Donnelly over 4 years ago