Bug #3112

ceph 32-bit kernel client issue with files larger than 4 GB.

Added by Mohamed Pakkeer over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

We have been using both 32-bit and 64-bit Ceph clusters and kernel clients, and have mounted the Ceph cluster from 32-bit and 64-bit Ubuntu kernel clients. The 64-bit cluster with a 64-bit kernel client works fine. We are facing a serious issue with the 32-bit kernel client for files larger than 4 GB. The cluster and kernel clients are configured with Ubuntu 12.04 Natty.

We used some video files for this testing, with md5sum to check file integrity. Below are the test results for video files of different sizes. We found that the checksums match when the file size is less than or equal to 4 GB. If the file size is more than 4 GB, the checksums differ, which means the file is corrupted.

File Name : TSOP_R3_R4_R5_MAL_V7_MPEG-reel-3-mpeg2.mxf
File Size : 3.7GB

Checksum for file in local disk
/Downloads$ md5sum TSOP_R3_R4_R5_MAL_V7_MPEG-reel-3-mpeg2.mxf
41fe810ac7b93bc7022d92f4d0bf13a2 TSOP_R3_R4_R5_MAL_V7_MPEG-reel-3-mpeg2.mxf

Checksum for file in ceph cluster

/mnt/ceph/DCP$ sudo md5sum TSOP_R3_R4_R5_MAL_V7_MPEG-reel-3-mpeg2.mxf
41fe810ac7b93bc7022d92f4d0bf13a2 TSOP_R3_R4_R5_MAL_V7_MPEG-reel-3-mpeg2.mxf

File Name : TOP_R10_MPEG-reel-1-mpeg2.mxf
File Size : 4.0GB

Checksum for file in local disk

/Downloads$ md5sum TOP_R10_MPEG-reel-1-mpeg2.mxf
56c5ba8610a14f959131f01484ff1646 TOP_R10_MPEG-reel-1-mpeg2.mxf

Checksum for file in ceph cluster

/mnt/ceph/DCP$ sudo md5sum TOP_R10_MPEG-reel-1-mpeg2.mxf
56c5ba8610a14f959131f01484ff1646 TOP_R10_MPEG-reel-1-mpeg2.mxf

File Name: ROMEO_R1-2-3-4-5-6-7-8_MPEG_230312-reel-7-mpeg2.mxf
File Size: 4.4GB

Checksum for file in local disk

/Downloads$ md5sum ROMEO_R1-2-3-4-5-6-7-8_MPEG_230312-reel-7-mpeg2.mxf
1d5728cf33cfa8cb617ca0990bced8b8 ROMEO_R1-2-3-4-5-6-7-8_MPEG_230312-reel-7-mpeg2.mxf

Checksum for file in ceph cluster

/mnt/ceph/DCP$ sudo md5sum ROMEO_R1-2-3-4-5-6-7-8_MPEG_230312-reel-7-mpeg2.mxf
84e71c31b4474f52345a988abca47de0 ROMEO_R1-2-3-4-5-6-7-8_MPEG_230312-reel-7-mpeg2.mxf

File Name : ROMEO_R1-2-3-4-5-6-7-8_MPEG_230312-reel-6-mpeg2.mxf
File Size : 5.7 GB

Checksum for file in local disk

/Downloads$ md5sum ROMEO_R1-2-3-4-5-6-7-8_MPEG_230312-reel-6-mpeg2.mxf
75dffda35ba2d6505f179a7390d4f9a6 ROMEO_R1-2-3-4-5-6-7-8_MPEG_230312-reel-6-mpeg2.mxf

Checksum for file in ceph cluster

/mnt/ceph/DCP$ sudo md5sum ROMEO_R1-2-3-4-5-6-7-8_MPEG_230312-reel-6-mpeg2.mxf
266ffe1eb14a7b263ae5f2ede9b45a61 ROMEO_R1-2-3-4-5-6-7-8_MPEG_230312-reel-6-mpeg2.mxf

History

#1 Updated by Sage Weil over 11 years ago

Hi,

A really simple test is to see whether a write at a file offset > 4 GB inappropriately wraps to a low file offset. Can you try

echo foo > /mnt/ceph/foo
dd if=/dev/zero of=/mnt/ceph/foo bs=1M count=1 seek=4096
head -1 /mnt/ceph/foo

and see if it still says 'foo' or nothing?

thanks!

#2 Updated by Mohamed Pakkeer over 11 years ago

Hi Sage,
I tried with dd, and it shows foo. But I have tried nearly 100 more video files, and I am getting the same error on all video files larger than 4 GB. The same files work fine with the 64-bit kernel client. I don't understand the issue.

Thanks for your help.

#3 Updated by Mohamed Pakkeer over 11 years ago

I tried with the following command to create a file with random data

dd if=/dev/random of=cephtest bs=1M count=1 seek=4096

and calculated the checksum on the local machine and on the ceph cluster; the checksums are different. But when I use the above command with zero (if=/dev/zero) instead of random data, the checksums are the same. So there is some bug in the 32-bit kernel client for file sizes of more than 4 GB.

#4 Updated by Mohamed Pakkeer over 11 years ago

I tried your dd command with /dev/random

echo foo > /mnt/ceph/foo
dd if=/dev/random of=/mnt/ceph/foo bs=1M count=1 seek=4096
head -1 /mnt/ceph/foo

I didn't get any output.

#5 Updated by Mohamed Pakkeer over 11 years ago

We created two files (4 GB and 5 GB) using dd with if=/dev/urandom on our local machine to test the 32-bit kernel client issue.

File size 4 GB (checksums match)

cephclient01:~/Downloads$ sudo dd if=/dev/urandom of=cephtesturandom bs=64M count=64
64+0 records in
64+0 records out
4294967296 bytes (4.3 GB) copied, 617.822 s, 7.0 MB/s

cephclient01:~/Downloads$ sudo cp cephtesturandom /mnt/ceph/

cephclient01:~/Downloads$ sudo md5sum cephtesturandom
b314669d035eee4591050fe431b4cb1d cephtesturandom

cephclient01:/mnt/ceph$ sudo md5sum cephtesturandom
b314669d035eee4591050fe431b4cb1d cephtesturandom

File size 5 GB (checksums differ)

cephclient01:~/Downloads$ sudo dd if=/dev/urandom of=cephtesturandoms bs=64M count=80
80+0 records in
80+0 records out
5368709120 bytes (5.4 GB) copied, 774.86 s, 6.9 MB/s

cephclient01:~/Downloads$ sudo cp cephtesturandoms /mnt/ceph/

cephclient01:~/Downloads$ sudo md5sum cephtesturandoms
85bc459d16f527e5cf1bb815c55efced cephtesturandoms

cephclient01:/mnt/ceph$ sudo md5sum cephtesturandoms
c9bcb28a0baa31e9c0e28708665525b2 cephtesturandoms

I have verified through the following tests that there is no bug in the Linux cp command or md5sum. I copied the same cephtesturandoms file to a different location on the local machine and to the local drive of a remote machine, calculated the checksum of the file in each location, and got the same checksum.

cephclient01:~/Downloads$ sudo md5sum cephtesturandoms
85bc459d16f527e5cf1bb815c55efced cephtesturandoms

Copy the cephtesturandoms file into a different folder on the local machine

cephadmin@cephclient01:~/Downloads$ cp cephtesturandoms test/

cephadmin@cephclient01:~/Downloads/test$ md5sum cephtesturandoms
85bc459d16f527e5cf1bb815c55efced cephtesturandoms

Copy the cephtesturandoms file to the remote machine

cephclient01:~/Downloads/test$ scp cephtesturandoms :/home/cephadmin/
cephtesturandoms 100% 5120MB 24.7MB/s 03:27

cephclient02:~$ sudo md5sum cephtesturandoms
85bc459d16f527e5cf1bb815c55efced cephtesturandoms

#6 Updated by Alex Elder over 11 years ago

  • Status changed from New to 12

I have set up a VM running a 32-bit kernel. It reports
via arch(1) that it is an i686 architecture. It is not
running a stock Ubuntu kernel so there could be a difference
as a result.

In any case, I have mounted a ceph file system on /mnt in that
environment and created a file this way:

echo foo > /mnt/foo
dd if=/dev/zero of=/mnt/foo bs=1M count=2 seek=4095

Separately, in a user-mode-linux environment (which
reports itself as x86_64 architecture) I created another
file:

echo bar > /mnt/bar
dd if=/dev/zero of=/mnt/bar bs=1M count=2 seek=4095

Running "cksum" on the result in both environments
did end up showing some odd behavior. In the 64-bit
uml environment I got these checksums:

1266644242 4296015872 /mnt/bar
3014477699 4296015872 /mnt/foo

Doing the same command in the 32-bit environment produced
the same results. HOWEVER, running the same cksum command
once more came up with a different checksum for /mnt/foo:

1266644242 4296015872 /mnt/bar
1164664926 4296015872 /mnt/foo

Repeating this several more times in the 32-bit environment
gets the same result (1266644242).

Back in the 64-bit environment, the checksums are all the
same as the original run.

At this point I don't know that I've learned anything new
or revealing, but I have reproduced a problem.

#7 Updated by Alex Elder over 11 years ago

I created a program to write out patterned data to a file,
and, in a separate read mode, verify that the data in a file
contains the expected pattern. The program writes out in
buffers of 1 MB, with patterns written every 512 bytes
(on sector boundaries).
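
For reference, here is a rough user-space sketch of that kind of write/verify
tool. It is my reconstruction, not the actual program; the command-line
interface and the exact pattern (an 8-byte absolute file offset stamped at
the start of every 512-byte sector) are assumptions.

/* patterntool.c: write 1 MB buffers with a pattern every 512 bytes,
 * or verify that a previously written file still carries that pattern.
 * Hypothetical reconstruction of the tool described above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define BUF_SIZE (1024 * 1024)  /* 1 MB I/O buffer */
#define SECTOR   512            /* one pattern per sector */

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s write|verify <file> <size_mb>\n", argv[0]);
        return 1;
    }
    int writing = strcmp(argv[1], "write") == 0;
    long long size_mb = atoll(argv[3]);
    FILE *f = fopen(argv[2], writing ? "wb" : "rb");
    if (!f) { perror("fopen"); return 1; }

    static unsigned char buf[BUF_SIZE];
    long long errors = 0;

    for (long long mb = 0; mb < size_mb; mb++) {
        uint64_t base = (uint64_t)mb * BUF_SIZE;
        if (writing) {
            memset(buf, 0, sizeof(buf));
            for (size_t s = 0; s < BUF_SIZE; s += SECTOR) {
                uint64_t off = base + s;           /* absolute byte offset */
                memcpy(buf + s, &off, sizeof(off));
            }
            if (fwrite(buf, 1, BUF_SIZE, f) != BUF_SIZE) {
                perror("fwrite"); return 1;
            }
        } else {
            if (fread(buf, 1, BUF_SIZE, f) != BUF_SIZE) {
                perror("fread"); return 1;
            }
            for (size_t s = 0; s < BUF_SIZE; s += SECTOR) {
                uint64_t expect = base + s, got;
                memcpy(&got, buf + s, sizeof(got));
                if (got != expect) {
                    fprintf(stderr, "mismatch at offset %llu: got %llu\n",
                            (unsigned long long)expect,
                            (unsigned long long)got);
                    errors++;
                }
            }
        }
    }
    fclose(f);
    if (!writing)
        printf("%lld mismatched sectors\n", errors);
    return errors ? 1 : 0;
}

With a wrap-around bug like the one described below, verifying a file larger
than 4 GB from the 32-bit client would report mismatches starting at the
4 GB offset, where reads return data from offset 0 instead.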

I wrote out an (8GB + 1MB) file from the 64-bit environment
to a file in a ceph-mounted file system. I then read that
file back in, and it found all the data read matched what
was expected. (There were other issues--see below.)

I then read that 64-bit written file back from a VM
running a 32-bit kernel. Here there were errors reported
at the 4GB byte offset, and again at the 8GB byte offset.
What was read at both offsets was what was written at sector
offset 0. So there clearly is a wrapping problem when
reading data above the 4GB boundary (i.e., 32-bit byte
offset).

I have had trouble successfully writing large files from
the 32-bit environment without my virtual machine hanging.
It seems to write a varying amount (unrelated to and much
less than any 32-bit boundary) before it hangs. So I have
not been successful testing a write from the 32-bit environment.
Hooking gdb to the VM process (thanks Dan!) shows that a BUG
got called as a result of kunmap() being called on a high
memory page. This happened inside the workqueue code, and
it looks to me like the work_queue structure being operated
on is bogus.

As a separate matter...

While reading the file generated in the 64-bit environment
from the 64-bit environment I got repeated errors reported
from the kernel--that memory had been exhausted--like this:
[42377.390000] kworker/0:2: page allocation failure: order:0, mode:0x20

#8 Updated by Alex Elder over 11 years ago

(Note--the problem writing files from the 32-bit environment has
been resolved. More info here: http://tracker.newdream.net/issues/3187)

I have found the problem.

In Linux, each page has an index, and its type is pgoff_t, whose underlying
type is (unsigned long). On a 32-bit machine (at least i686), that is a
32-bit type.

The ceph address space code was, in a number of spots (but in particular, in
a call to ceph_osdc_readpages() made in readpage_nounlock()), using the
page index as a basis for computing a 64-bit offset. This was done by
shifting the index by PAGE_CACHE_SHIFT (which is 12 for x86 architecture).

Because the shift was applied to the 32-bit value, if the index was ever
anything 2^20 or above, the result would overflow and the upper bits would
be lost. That clipped result would then be promoted to a 64-bit value that
got passed to ceph_osdc_readpages().

The fix is to cast the page->index value to the desired 64-bit target type
before doing the shift. I found four other places in the same source
file that were subject to the same problem, so I've fixed those as well.
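
To make the overflow concrete, here is a minimal stand-alone sketch of the
arithmetic involved. It is illustrative only, not the actual kernel patch;
the variable names and the 4 KB page size (PAGE_SHIFT of 12) are assumptions
matching a typical i686 configuration.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KB pages, as on x86 */

int main(void)
{
    /* pgoff_t is 'unsigned long', which is 32 bits on i686; the page
     * index of the byte at offset 4 GB is 2^20. */
    uint32_t index = 1UL << 20;

    /* Buggy: the shift is done in 32-bit unsigned arithmetic, the high
     * bits are lost, and only then is the result widened to 64 bits. */
    uint64_t bad_off = (uint64_t)(index << PAGE_SHIFT);

    /* Fixed: widen the index to 64 bits before shifting. */
    uint64_t good_off = (uint64_t)index << PAGE_SHIFT;

    printf("buggy offset: %llu\n", (unsigned long long)bad_off);   /* 0 */
    printf("fixed offset: %llu\n", (unsigned long long)good_off);  /* 4294967296 */
    return 0;
}

Any offset at or above 4 GB therefore collapses to a low offset in the buggy
version, which matches the wrap-around behavior observed in the reads above.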

#9 Updated by Alex Elder over 11 years ago

  • Status changed from 12 to 7

Patch to fix this is out for review.

#10 Updated by Alex Elder over 11 years ago

Reviewed, committed. Waiting for a test run to complete
before marking this one resolved.

#11 Updated by Alex Elder over 11 years ago

  • Status changed from 7 to Resolved

Sage ran a test using the current testing branch, which
includes this fix. The tests he ran completed without
error.

I also tested the result in a 32-bit VM environment and
verified the symptoms went away.

So I'm marking this resolved.

6285bc23 ceph: avoid 32-bit page index overflow
