Bug #7367

fail to run mds and mount rbd (v0.76)

Added by Loïc Dachary about 10 years ago. Updated about 10 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph-mon-lmb-B-1:~# ceph -s
    cluster 0b68be85-f5a1-4565-9ab1-6625b8a13597
     health HEALTH_WARN mds chab1 is laggy
     monmap e5: 3 mons at {chab1=172.20.106.84:6789/0,lmbb1=172.20.107.84:6789/0,loib1=172.20.108.84:6789/0}, election epoch 576, quorum 0,1,2 chab1,lmbb1,loib1
     mdsmap e24563: 1/1/1 up {0=chab1=up:active(laggy or crashed)}
     osdmap e54512: 21 osds: 21 up, 21 in
      pgmap v8039014: 6432 pgs, 6 pools, 3271 GB data, 1232 kobjects
            6470 GB used, 4239 GB / 10710 GB avail
                6432 active+clean

rbd pools are available:
root@machriemoor:~# rbd ls
bench
bench2

but mounting the images they contain fails:
root@machriemoor:~# mount /dev/rbd/rbd/bench /mnt/tempo
mount: wrong fs type, bad option, bad superblock on /dev/rbd1,
dmesg:
[ 1747.349670] rbd: rbd1: write 1000 at 4100000000 (0)
[ 1747.349670] 
[ 1747.349819] rbd: rbd1:   result -6 xferred 1000
[ 1747.349819] 
[ 1747.349963] blk_update_request: 127 callbacks suppressed
[ 1747.350082] end_request: I/O error, dev rbd1, sector 545259520
[ 1747.350203] quiet_error: 127 callbacks suppressed
[ 1747.350321] Buffer I/O error on device rbd1, logical block 68157440
[ 1747.350442] lost page write due to I/O error on rbd1
[ 1747.350744] rbd: rbd1: write 1000 at 4100020000 (20000)
[ 1747.350744] 
[ 1747.350889] rbd: rbd1:   result -6 xferred 1000
[ 1747.350889] 
[ 1747.351031] end_request: I/O error, dev rbd1, sector 545259776
[ 1747.351153] Buffer I/O error on device rbd1, logical block 68157472
[ 1747.351273] lost page write due to I/O error on rbd1
[ 1747.351669] rbd: rbd1: write 3000 at 4100040000 (40000)
[ 1747.351669] 
[ 1747.351870] rbd: rbd1:   result -6 xferred 3000
[ 1747.351870] 
[ 1747.352013] end_request: I/O error, dev rbd1, sector 545260032
[ 1747.352134] Buffer I/O error on device rbd1, logical block 68157504
[ 1747.352255] lost page write due to I/O error on rbd1
[ 1747.352374] Buffer I/O error on device rbd1, logical block 68157505

root@machriemoor:~# uname -a
Linux machriemoor 3.13.1-dsiun-130719 #12 SMP Fri Jan 31 12:08:15 CET 2014 x86_64 GNU/Linux

on the machine trying to mount

root@machriemoor:~# ceph --version
ceph version 0.70 (e3bb0656d92e74ead0342ae696039a51170fe941)

and it was rebooted recently
root@machriemoor:~# uptime
 23:09:09 up 34 min,  1 user,  load average: 0,21, 0,08, 0,08

on the machines running the osds and the mds
ceph-mds-loi-B-1:~# ceph --version
ceph version 0.76 (3b990136bfab74249f166dd742fd8e61637e63d9)

the mds refuses to start with the error
2014-02-07 23:12:37.038781 7f7538327780 -1 mds.-1.-1 *** one or more OSDs do not support TMAP2OMAP; upgrade OSDs before starting MDS (or downgrade MDS) ***

the cluster was upgraded to 0.75, downgraded to 0.72 after seeing some problems, and then upgraded again to 0.76 in an attempt to fix them

Related issues

Related to Ceph - Bug #7368: ceph osd repair * blocks after some minutes and prevent other ceph pg repair commands Can't reproduce 02/07/2014

History

#1 Updated by Loïc Dachary about 10 years ago

  • Description updated (diff)

#2 Updated by Loïc Dachary about 10 years ago

  • Description updated (diff)

#3 Updated by Greg Farnum about 10 years ago

Is this a CephFS or an RBD bug report?
(Perhaps it should be two different ones. :p)

#4 Updated by Loïc Dachary about 10 years ago

  • Subject changed from fail to mount cephfs (v0.76) to fail to run mds and mount rbd (v0.76)

#5 Updated by Yann Dupont about 10 years ago

Hello all, and thanks to Loïc for taking the time to file a bug report.

The problem occurred on one of my clusters, the "test" cluster where I try out new Ceph features from time to time and allow dev versions on it. There is data on it, but just test data. Breaking this PARTICULAR cluster is an accepted risk.

To summarize, this bug report is really about a failed upgrade. Don't put much time into it; the failure is most probably due to my over-optimistic (read: risky) behaviour. Anyway, the aim of this bug report is to help avoid this kind of situation on production clusters.

I will go into some more detail:

The cluster has been running for a long time and was installed with Argonaut, or maybe before that. I was running 0.73 on this cluster, and at that point all was fine. There are 21 OSDs on it, 6432 PGs and 6 pools.

2 pools contain some rbd images, but most of the data is on cephfs.

Last week I upgraded to the latest dev release at the time: 0.73 -> 0.75. 0.74 was never installed, probably a mistake.

0.75 caused some problems: soon after the upgrade I had lots of scrub errors. Someone else reported the very same problem on the mailing list. I shut the cluster down and waited for 0.76.

When 0.76 arrived I installed it, the cluster self-healed and I finished repairing. It ended up with ~3000 inconsistent PGs (flagged by 0.75).
I started a script to repair the PGs, but it was very slow (I noticed some days later that it was probably limited by osd-scrub-load-threshold and osd-max-scrubs).

I also encountered another bug concerning the command ceph osd repair *. I'll try to make another bug report about this.

After 1 day and still lots of inconsistent PGs, I finally decided to go back to 0.72 to see if it could help (that was probably a bad idea, but...).

Going back to 0.72 meant self-healing again because of ~10% degraded PGs; I suppose PG placement / crush rules had changed a bit. After self-healing it ended with the ~3000 scrub errors. PG repair was still slow, and I also encountered the ceph osd repair * problem. After figuring out the osd-scrub-load-threshold and osd-max-scrubs parameters and injecting them (roughly as sketched below), I finally accelerated the scrub process and ended with 0 remaining scrub errors.
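For reference, bumping those two scrub settings at runtime is normally done with ceph tell injectargs; this is only a minimal sketch, and the values shown are illustrative assumptions rather than the ones actually used:

# raise the scrub limits on every running OSD (values are made up for illustration)
# if the osd.* wildcard is not accepted by this version, loop over the individual osd ids instead
ceph tell osd.* injectargs '--osd-max-scrubs 5 --osd-scrub-load-threshold 5'

To make the change survive a daemon restart it would also have to go into the [osd] section of ceph.conf.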

But ~30 unfound PGs... and the mds refused to start.

I finally went back to 0.76 again (yeah, probably bad idea++), and after the mandatory data movement had ~50 inconsistent PGs. I repaired them and ended with a functional cluster.

Only to find what Loïc described: unmountable / corrupted rbd images.

On the cephfs side, my mds refuses to start; this time I have a reason:
2014-02-08 00:31:59.361732 7f6c2f63f780 -1 mds.-1.-1 *** one or more OSDs do not support TMAP2OMAP; upgrade OSDs before starting MDS (or downgrade MDS) ***

which is very strange, as I'm quite sure ALL my OSDs are now running 0.76 again... (a way to double-check this is sketched below)
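One way to verify that claim is to ask each OSD daemon directly which version it is running; a minimal sketch, assuming the 21 OSDs are simply numbered 0 through 20:

# print the version reported by every OSD daemon
for i in $(seq 0 20); do
    echo -n "osd.$i: "
    ceph tell osd.$i version
done

Any OSD that answers with something older than 0.76 (or does not answer at all) would explain the TMAP2OMAP refusal.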

Please don't put much time into it; I probably hit some bugs or undefined behaviour because I chose to follow a nasty path...
Anyway, maybe there are some lessons to learn:

If the corruption is due to the downgrade, it could be a good idea to check versions and refuse to start an osd/mon/mds on a data set written by a newer version.

There is something wrong with pg/osd repair when you have lots of PGs in an inconsistent state.

Just for reference, here is the script I used:

# collect the ids of all inconsistent PGs, then ask each one to repair itself
ceph health detail | grep inconsistent | awk '{ print $2 }' > liste_p_repair
while read p; do ceph pg repair "$p"; sleep 1; done < liste_p_repair

#6 Updated by Yann Dupont about 10 years ago

Quick follow-up:
With 0.76 OSDs, MONs and MDS: creating a new volume, formatting it and mounting it leads to normal behaviour (roughly the sequence sketched below).
Using "old", already-created volumes leads to a corrupt rbd image.

Going back to 0.72 OSDs (not changing MON & MDS) leads to perfectly OK rbd images, both for old and 0.76-formatted images.

Only problem: the MDS refuses to start...

0 mds.-1.0 handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding}, killing myself

#7 Updated by Yann Dupont about 10 years ago

I think this bug can be closed as we're probably not in a supported scenario.

I've been able to back up all my rbd data with 0.72 and am now back on 0.76 (and even 0.77).
Now the mds starts and I can begin to access cephfs, but then all my mds are killed by a bug.

Should I use this bug report or open another one?

#8 Updated by Zheng Yan about 10 years ago

yes, please

#9 Updated by Yann Dupont about 10 years ago

OK, so see #7503, http://tracker.ceph.com/issues/7503.

Also see #7368, http://tracker.ceph.com/issues/7368 which for me is also a valid problem.

And this one, #7367, we should close now.

#10 Updated by Ian Colle about 10 years ago

  • Status changed from New to Closed
