Bug #1774
closed
client: files become inaccessible in large directories (with snapshots?)
Description
Taking snapshots of certain directories within Ceph that hold backups of the root filesystems of my Openmoko phone causes some files to disappear. After some experimentation, I found that the issue doesn't only affect files in the snapshots; sometimes I also fail to access files in the original directories. From the observed behavior, I'm guessing it has to do with some boundary condition in the mds: the information is there, but it isn't retrieved when the file happens to fall at some specific offset within the directory, or something along those lines. The evidence: adding or removing files (letting the mds commit the changes from its log, then starting a fresh mds) changes which file is faulty, but once the directory holds exactly the contents of the original backup image, the files that fail are always the same. They do differ, however, among 3 backup images with different sets of installed packages.
In all 3 cases the faulty directory has been /var/lib/opkg/info, which holds multiple files per installed package, such as file lists, control scripts and more. File names are built from the package name plus a suffix indicating the file's function, so we end up with long names, and lots of them. When I take a snapshot, we apparently cross a threshold, and files that end up precisely at the border start to fail.
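To spot entries of this kind (listed by readdir, but failing stat), a check along the following lines could be run over a directory. This is a minimal bash sketch, not the author's gen-list script (whose source isn't shown here); the function name unstatable_entries is hypothetical, and the sketch assumes a stat(1) that does not follow symlinks by default (as GNU and BSD stat behave).

```shell
# Hypothetical helper (not the author's gen-list): print each entry that
# readdir returns for DIR but that stat(1) cannot look up.
unstatable_entries() (
    # The body runs in a subshell, so these shopt changes do not leak out.
    shopt -s nullglob dotglob   # include dotfiles; expand to nothing if DIR is empty
    for path in "$1"/*; do      # the glob's names come from reading the directory
        # stat without -L does not follow symlinks, so a dangling link is not
        # reported; a failure here means the name was listed but stat could
        # not look it up, which is the symptom described in this report.
        if ! stat -- "$path" >/dev/null 2>&1; then
            printf '%s\n' "$path"
        fi
    done
)
```

Usage: `unstatable_entries DIR`, e.g. with one of the opkg info directories from the transcript below as DIR. Globbing is used instead of find so that the listing step relies only on readdir, not on find's own stat calls.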
I attach level 20 debug dumps from the mds. It's surely not a coincidence that the 3 files find says it can't stat (i.e., they appear as directory entries, but stat/read/write fails) are exactly the ones that appear in "snapid ... offset" messages in the mds logs:
$ for d in .link/{Om2008.8-orig,shr-testing-2010-03+,shr-testing2011.1-2011-03-17}/usr/lib/opkg/info; do ../../gen-list $d > /dev/null; done
find: `.link/Om2008.8-orig/usr/lib/opkg/info/qtopia-phone-x11-composer-genericcomposer.list': No such file or directory
find: `.link/shr-testing-2010-03+/usr/lib/opkg/info/update-modules.postinst': No such file or directory
find: `.link/shr-testing2011.1-2011-03-17/usr/lib/opkg/info/task-shr-minimal-apps.control': No such file or directory
$ grep "snapid 22 offset '[^']" ~/mds-baddir.log
2011-12-01 00:37:16.520212 7f2cde2e2700 mds.0.server snapid 22 offset 'qtopia-phone-x11-composer-genericcomposer.list'
2011-12-01 00:37:26.925145 7f2cde2e2700 mds.0.server snapid 22 offset 'update-modules.postinst'
2011-12-01 00:37:36.710803 7f2cde2e2700 mds.0.server snapid 22 offset 'task-shr-minimal-apps.control'
Neat, eh? I attach the compressed mds log.
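The match can also be checked mechanically. The bash sketch below assumes find's stderr from the loop above was saved to a file (find-errors.txt is a hypothetical name) and that the error lines use the backtick-and-quote style shown in the transcript; other find versions quote paths differently, so the first sed pattern may need adjusting. It extracts the basenames of the failing paths and the names from the `snapid N offset 'NAME'` log lines, then prints the names common to both. The function names are hypothetical.

```shell
# failing_names FILE: basenames from find's "No such file or directory" errors.
failing_names() {
    sed -n "s/^find: \`\(.*\)': No such file or directory\$/\1/p" "$1" |
        sed 's|.*/||' | sort -u
}

# offset_names FILE: non-empty names from "snapid N offset 'NAME'" mds log lines.
offset_names() {
    sed -n "s/.* snapid [0-9]* offset '\([^'][^']*\)'\$/\1/p" "$1" | sort -u
}

# correlate FIND_ERRORS MDS_LOG: print names present in both lists.
# comm -12 needs both inputs sorted the same way, hence sort -u above.
correlate() {
    comm -12 <(failing_names "$1") <(offset_names "$2")
}
```

Usage: `correlate find-errors.txt ~/mds-baddir.log`. With the find errors and log lines shown above, it should print the same three names.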