Project

General

Profile

Actions

Bug #17270

closed

[cephfs] fuse client crash when adding a new osd

Added by xiangyang yu over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
fs
Component(FS):
Client
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello everyone,

I have met a ceph-fuse crash when i add osd to osd pool.

I am writing data through ceph-fuse,then i add one osd to osd pool, after less than 30 s, the ceph-fuse process crash.

The ceph-fuse client is 10.2.2, and the ceph osd is 0.94.3, details beblow:

[root@localhost ~]# rpm -qa | grep ceph
libcephfs1-10.2.2-0.el7.centos.x86_64
python-cephfs-10.2.2-0.el7.centos.x86_64
ceph-common-0.94.3-0.el7.x86_64
ceph-fuse-10.2.2-0.el7.centos.x86_64
ceph-0.94.3-0.el7.x86_64
ceph-mds-10.2.2-0.el7.centos.x86_64
[root@localhost ~]# 
[root@localhost ~]# 
[root@localhost ~]# rpm -qa | grep rados
librados2-devel-0.94.3-0.el7.x86_64
librados2-0.94.3-0.el7.x86_64
libradosstriper1-0.94.3-0.el7.x86_64
python-rados-0.94.3-0.el7.x86_64

ceph stat:

[root@localhost ~]# ceph status
    cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
     health HEALTH_WARN
            clock skew detected on mon.2, mon.0
            19 pgs stale
            19 pgs stuck stale
            Monitor clock skew detected 
     monmap e3: 3 mons at {0=10.222.5.229:6789/0,1=10.222.5.156:6789/0,2=10.222.5.162:6789/0}
            election epoch 26, quorum 0,1,2 1,2,0
     mdsmap e58: 1/1/1 up {0=0=up:active}, 1 up:standby
     osdmap e324: 9 osds: 9 up, 9 in
      pgmap v3505: 320 pgs, 3 pools, 4638 MB data, 1302 objects
            23373 MB used, 68695 MB / 92069 MB avail
                 301 active+clean
                  19 stale+active+clean

ceph osd stat:

[root@localhost ~]# ceph osd dump
epoch 324
fsid a7f64266-0894-4f1e-a635-d0aeaca0e993
created 2016-09-13 11:08:34.629245
modified 2016-09-13 16:21:53.285729
flags 
pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 5 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 285 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 6 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 282 flags hashpspool stripe_width 0
max_osd 9
osd.0 up   in  weight 1 up_from 271 up_thru 321 down_at 242 last_clean_interval [169,175) 10.222.5.229:6800/3780 10.222.5.229:6801/3780 10.222.5.229:6802/3780 10.222.5.229:6803/3780 exists,up 1bf6cda4-bf1a-4f8a-836d-b6aec970d257
osd.1 up   in  weight 1 up_from 223 up_thru 320 down_at 186 last_clean_interval [20,183) 10.222.5.229:6804/2228 10.222.5.229:6805/2228 10.222.5.229:6806/2228 10.222.5.229:6807/2228 exists,up 3f3ad2fa-52b1-46fd-af6c-05178b814e25
osd.2 up   in  weight 1 up_from 224 up_thru 320 down_at 186 last_clean_interval [22,183) 10.222.5.229:6808/2259 10.222.5.229:6809/2259 10.222.5.229:6810/2259 10.222.5.229:6811/2259 exists,up 9199193e-9928-4c5d-8adc-2c32a4c8716b
osd.3 up   in  weight 1 up_from 312 up_thru 313 down_at 303 last_clean_interval [0,0) 10.222.5.156:6800/3592 10.222.5.156:6801/3592 10.222.5.156:6802/3592 10.222.5.156:6803/3592 exists,up 9b8f1cb0-51df-42aa-8be4-8f6347235cc2
osd.4 up   in  weight 1 up_from 25 up_thru 322 down_at 0 last_clean_interval [0,0) 10.222.5.156:6804/25567 10.222.5.156:6805/25567 10.222.5.156:6806/25567 10.222.5.156:6807/25567 exists,up 0c719e5e-f8fc-46e0-926d-426bf6881ee0
osd.5 up   in  weight 1 up_from 27 up_thru 310 down_at 0 last_clean_interval [0,0) 10.222.5.156:6808/25678 10.222.5.156:6809/25678 10.222.5.156:6810/25678 10.222.5.156:6811/25678 exists,up 729e0749-2ce3-426a-a7f1-a3cbfa88ba0b
osd.6 up   in  weight 1 up_from 40 up_thru 313 down_at 0 last_clean_interval [0,0) 10.222.5.162:6807/15887 10.222.5.162:6808/15887 10.222.5.162:6809/15887 10.222.5.162:6810/15887 exists,up dea24f0f-4666-40af-98af-5ab8d42c37c6
osd.7 up   in  weight 1 up_from 45 up_thru 313 down_at 0 last_clean_interval [0,0) 10.222.5.162:6811/16040 10.222.5.162:6812/16040 10.222.5.162:6813/16040 10.222.5.162:6814/16040 exists,up 0e238745-0091-4790-9b39-c9d36f4ebbee
osd.8 up   in  weight 1 up_from 49 up_thru 314 down_at 0 last_clean_interval [0,0) 10.222.5.162:6815/16206 10.222.5.162:6816/16206 10.222.5.162:6817/16206 10.222.5.162:6818/16206 exists,up 59637f86-f283-4397-a63b-474976ee8047
[root@localhost ~]# 
[root@localhost ~]# ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 9.00000 root default                                     
-5 3.00000     host yxy02                                   
 1 1.00000         osd.1       up  1.00000          1.00000 
 2 1.00000         osd.2       up  1.00000          1.00000 
 0 1.00000         osd.0       up  1.00000          1.00000 
-6 3.00000     host yxy03                                   
 4 1.00000         osd.4       up  1.00000          1.00000      -------------> OSD JUST ADD!
 5 1.00000         osd.5       up  1.00000          1.00000 
 3 1.00000         osd.3       up  1.00000          1.00000 
-7 3.00000     host zwr01                                   
 6 1.00000         osd.6       up  1.00000          1.00000 
 7 1.00000         osd.7       up  1.00000          1.00000 
 8 1.00000         osd.8       up  1.00000          1.00000 

Here is the client log:

   -72> 2016-09-13 16:15:21.586874 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3251 ==== osd_op_reply(17132 100000000b0.00000001 [write 0~2228224] v323'38 uv38 ondisk = 0) v6 ==== 187+0+0 (1642657507 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -71> 2016-09-13 16:15:21.586873 7f55782af700  1 -- 10.222.5.162:0/3252549074 <== osd.0 10.222.5.229:6800/3780 786 ==== osd_op_reply(17148 100000000af.00000001 [write 3670016~524288] v323'38 uv38 ondisk = 0) v6 ==== 187+0+0 (239835621 0 0) 0x7f54dc0013f0 con 0x7f555001d7a0
   -70> 2016-09-13 16:15:21.587052 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3252 ==== osd_op_reply(17138 100000000b0.00000001 [write 2228224~1966080] v323'39 uv39 ondisk = 0) v6 ==== 187+0+0 (1447166925 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -69> 2016-09-13 16:15:21.587058 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_BUFFER ref on 100000000af.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100755 size=10485760/16777216 mtime=2016-09-13 16:15:17.607516 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw flushing_caps=Fw objectset[100000000af ts 0/0 objects 3 dirty_or_tx 0] parents=0x7f550c00c980 0x7f556c0be140)
   -68> 2016-09-13 16:15:21.587098 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_CACHE ref on 100000000af.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100755 size=10485760/16777216 mtime=2016-09-13 16:15:17.607516 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw flushing_caps=Fw objectset[100000000af ts 0/0 objects 3 dirty_or_tx 0] parents=0x7f550c00c980 0x7f556c0be140)
   -67> 2016-09-13 16:15:21.587146 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_BUFFER ref on 100000000b0.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100755 size=10485760/16777216 mtime=2016-09-13 16:15:17.591570 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw flushing_caps=Fw objectset[100000000b0 ts 0/0 objects 3 dirty_or_tx 0] parents=0x7f54bc5f90e0 0x7f556c0be600)
   -66> 2016-09-13 16:15:21.587171 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_CACHE ref on 100000000b0.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100755 size=10485760/16777216 mtime=2016-09-13 16:15:17.591570 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw flushing_caps=Fw objectset[100000000b0 ts 0/0 objects 3 dirty_or_tx 0] parents=0x7f54bc5f90e0 0x7f556c0be600)
   -65> 2016-09-13 16:15:21.587663 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3253 ==== osd_op_reply(17141 100000000b5.00000001 [write 0~1966080] v323'40 uv40 ondisk = 0) v6 ==== 187+0+0 (4162314622 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -64> 2016-09-13 16:15:21.587822 7f55386f7700  1 -- 10.222.5.162:0/3252549074 <== osd.5 10.222.5.156:6808/25678 2865 ==== osd_op_reply(17161 100000000ba.00000000 [write 0~2228224] v323'48 uv48 ondisk = 0) v6 ==== 187+0+0 (3143255082 0 0) 0x7f54e0002f10 con 0x7f5550047900
   -63> 2016-09-13 16:15:21.587915 7f55386f7700  1 -- 10.222.5.162:0/3252549074 <== osd.5 10.222.5.156:6808/25678 2866 ==== osd_op_reply(17168 100000000ba.00000000 [write 2228224~1966080] v323'49 uv49 ondisk = 0) v6 ==== 187+0+0 (2364584196 0 0) 0x7f54e0002f10 con 0x7f5550047900
   -62> 2016-09-13 16:15:21.588742 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_BUFFER ref on 100000000ba.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100755 size=4194304/4194304 mtime=2016-09-13 16:15:17.676598 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[100000000ba ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f54c8a384d0 0x7f556c1119e0)
   -61> 2016-09-13 16:15:21.588782 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_CACHE ref on 100000000ba.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100755 size=4194304/4194304 mtime=2016-09-13 16:15:17.676598 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[100000000ba ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f54c8a384d0 0x7f556c1119e0)
   -60> 2016-09-13 16:15:21.645868 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3254 ==== osd_op_reply(17156 100000000b7.00000000 [write 0~1310720] v323'53 uv53 ondisk = 0) v6 ==== 187+0+0 (371286939 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -59> 2016-09-13 16:15:21.646023 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3255 ==== osd_op_reply(17165 100000000b7.00000000 [write 1310720~2883584] v323'54 uv54 ondisk = 0) v6 ==== 187+0+0 (719229667 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -58> 2016-09-13 16:15:21.646091 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_BUFFER ref on 100000000b7.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100755 size=4194304/4194304 mtime=2016-09-13 16:15:17.664874 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[100000000b7 ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f54d0318800 0x7f556c110ba0)
   -57> 2016-09-13 16:15:21.646151 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_CACHE ref on 100000000b7.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100755 size=4194304/4194304 mtime=2016-09-13 16:15:17.664874 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[100000000b7 ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f54d0318800 0x7f556c110ba0)
   -56> 2016-09-13 16:15:21.995626 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3256 ==== osd_op_reply(17151 100000000b3.00000002 [write 0~2097152] v323'48 uv48 ondisk = 0) v6 ==== 187+0+0 (2547660202 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -55> 2016-09-13 16:15:21.995774 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_BUFFER ref on 100000000b3.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100755 size=10485760/16777216 mtime=2016-09-13 16:15:17.608865 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw flushing_caps=Fw objectset[100000000b3 ts 0/0 objects 3 dirty_or_tx 0] parents=0x7f5534a0e720 0x7f556c0b77c0)
   -54> 2016-09-13 16:15:21.995839 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3257 ==== osd_op_reply(17154 100000000b5.00000001 [write 1966080~2228224] v323'41 uv41 ondisk = 0) v6 ==== 187+0+0 (2882567974 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -53> 2016-09-13 16:15:21.995829 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_CACHE ref on 100000000b3.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100755 size=10485760/16777216 mtime=2016-09-13 16:15:17.608865 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw flushing_caps=Fw objectset[100000000b3 ts 0/0 objects 3 dirty_or_tx 0] parents=0x7f5534a0e720 0x7f556c0b77c0)
   -52> 2016-09-13 16:15:21.995978 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_BUFFER ref on 100000000b5.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100755 size=10485760/16777216 mtime=2016-09-13 16:15:17.635885 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw flushing_caps=Fw objectset[100000000b5 ts 0/0 objects 3 dirty_or_tx 0] parents=0x7f54d0303c20 0x7f556c037ac0)
   -51> 2016-09-13 16:15:21.996022 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_CACHE ref on 100000000b5.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100755 size=10485760/16777216 mtime=2016-09-13 16:15:17.635885 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw flushing_caps=Fw objectset[100000000b5 ts 0/0 objects 3 dirty_or_tx 0] parents=0x7f54d0303c20 0x7f556c037ac0)
   -50> 2016-09-13 16:15:21.998647 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3258 ==== osd_op_reply(17158 100000000b8.00000000 [write 0~1966080] v323'49 uv49 ondisk = 0) v6 ==== 187+0+0 (1584949222 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -49> 2016-09-13 16:15:21.998788 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3259 ==== osd_op_reply(17166 100000000b8.00000000 [write 1966080~2228224] v323'50 uv50 ondisk = 0) v6 ==== 187+0+0 (3946494964 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -48> 2016-09-13 16:15:21.998913 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_BUFFER ref on 100000000b8.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100755 size=4194304/4194304 mtime=2016-09-13 16:15:17.669978 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[100000000b8 ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f5515c3c040 0x7f556c111060)
   -47> 2016-09-13 16:15:21.998986 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_CACHE ref on 100000000b8.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100755 size=4194304/4194304 mtime=2016-09-13 16:15:17.669978 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[100000000b8 ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f5515c3c040 0x7f556c111060)
   -46> 2016-09-13 16:15:22.720826 7f55797fa700 10 monclient: tick
   -45> 2016-09-13 16:15:22.720941 7f55797fa700 10 monclient: renew subs? (now: 2016-09-13 16:15:22.720929; renew after: 2016-09-13 16:10:33.093544) -- yes
   -44> 2016-09-13 16:15:22.720983 7f55797fa700 10 monclient: renew_subs - empty
   -43> 2016-09-13 16:15:22.721661 7f55805b3700  2 -- 10.222.5.162:0/3252549074 >> 10.222.5.229:6789/0 pipe(0x7f556c133990 sd=12 :55626 s=2 pgs=946 cs=1 l=1 c=0x7f556c09e3a0).reader got KEEPALIVE_ACK
   -42> 2016-09-13 16:15:22.872793 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b0 1394 seq 4 tid 1514 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 10485760/0 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.591570) v8 -- ?+0 0x7f555c001e20 con 0x7f556c00e790
   -41> 2016-09-13 16:15:22.872906 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b1 1395 seq 4 tid 1515 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 10485760/0 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.596742) v8 -- ?+0 0x7f555c002910 con 0x7f556c00e790
   -40> 2016-09-13 16:15:22.873074 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000af 1393 seq 4 tid 1516 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 10485760/0 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.607516) v8 -- ?+0 0x7f555c004330 con 0x7f556c00e790
   -39> 2016-09-13 16:15:22.873126 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b3 1397 seq 4 tid 1517 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 10485760/0 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.608865) v8 -- ?+0 0x7f555c004730 con 0x7f556c00e790
   -38> 2016-09-13 16:15:22.873206 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b4 1398 seq 4 tid 1518 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 10485760/0 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.604355) v8 -- ?+0 0x7f555c009120 con 0x7f556c00e790
   -37> 2016-09-13 16:15:22.873358 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b2 1396 seq 4 tid 1519 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=Fxcb follows 1 size 10485760/0 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.609681) v8 -- ?+0 0x7f555c0096e0 con 0x7f556c00e790
   -36> 2016-09-13 16:15:22.873440 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b5 1399 seq 4 tid 1520 caps=pAsLsXsFsc dirty=Fw wanted=- follows 1 size 10485760/0 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.635885) v8 -- ?+0 0x7f555c009ca0 con 0x7f556c00e790
   -35> 2016-09-13 16:15:22.873506 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b6 1400 seq 4 tid 1521 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=Fxcb follows 1 size 10485760/0 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.653195) v8 -- ?+0 0x7f555c00a260 con 0x7f556c00e790
   -34> 2016-09-13 16:15:22.873554 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b7 1401 seq 1 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 4194304/4325376 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.664874) v8 -- ?+0 0x7f555c00a820 con 0x7f556c00e790
   -33> 2016-09-13 16:15:22.873689 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b8 1402 seq 1 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 4194304/4325376 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.669978) v8 -- ?+0 0x7f555c00ade0 con 0x7f556c00e790
   -32> 2016-09-13 16:15:22.873731 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b9 1403 seq 1 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 4194304/4325376 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.671537) v8 -- ?+0 0x7f555c00b3b0 con 0x7f556c00e790
   -31> 2016-09-13 16:15:22.873788 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000bb 1405 seq 1 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 4194304/4325376 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.674291) v8 -- ?+0 0x7f555c00b980 con 0x7f556c00e790
   -30> 2016-09-13 16:15:22.873899 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000bc 1406 seq 1 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 4194304/4325376 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.672719) v8 -- ?+0 0x7f555c00bf50 con 0x7f556c00e790
   -29> 2016-09-13 16:15:22.874009 7f557b7fe700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000ba 1404 seq 1 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 4194304/4325376 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.676598) v8 -- ?+0 0x7f555c001e20 con 0x7f556c00e790
   -28> 2016-09-13 16:15:22.874677 7f557a7fc700  1 -- 10.222.5.162:0/3252549074 <== mds.0 10.222.5.162:6800/24314 8318 ==== client_caps(revoke ino 100000000b2 1396 seq 5 caps=pAsLsXsFsxcrwb dirty=- wanted=Fxcb follows 0 size 10485760/16777216 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.609681) v8 ==== 216+0+0 (2189111699 0 0) 0x7f556800d780 con 0x7f556c00e790
   -27> 2016-09-13 16:15:22.874760 7f557a7fc700  5 client.24191 handle_cap_grant on in 100000000b2 mds.0 seq 5 caps now pAsLsXsFsxcrwb was pAsxLsXsxFsxcrwb
   -26> 2016-09-13 16:15:22.874783 7f557a7fc700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b2 1396 seq 5 caps=pAsLsXsFsxcrwb dirty=- wanted=Fxcb follows 0 size 10485760/0 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.609681) v8 -- ?+0 0x7f556c0c3600 con 0x7f556c00e790
   -25> 2016-09-13 16:15:22.874829 7f557a7fc700  1 -- 10.222.5.162:0/3252549074 <== mds.0 10.222.5.162:6800/24314 8319 ==== client_caps(revoke ino 100000000b6 1400 seq 5 caps=pAsLsXsFsxcrwb dirty=- wanted=Fxcb follows 0 size 10485760/16777216 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.653195) v8 ==== 216+0+0 (2007964154 0 0) 0x7f556800e300 con 0x7f556c00e790
   -24> 2016-09-13 16:15:22.874867 7f557a7fc700  5 client.24191 handle_cap_grant on in 100000000b6 mds.0 seq 5 caps now pAsLsXsFsxcrwb was pAsxLsXsxFsxcrwb
   -23> 2016-09-13 16:15:22.874880 7f557a7fc700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.162:6800/24314 -- client_caps(update ino 100000000b6 1400 seq 5 caps=pAsLsXsFsxcrwb dirty=- wanted=Fxcb follows 0 size 10485760/0 ts 1/18446744073709551615 mtime 2016-09-13 16:15:17.653195) v8 -- ?+0 0x7f556c112820 con 0x7f556c00e790
   -22> 2016-09-13 16:15:23.688495 7f557affd700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.156:6804/25567 -- osd_op(client.24191.0:17171 5.91cd2dc2 100000000bd.00000000 [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e323) v7 -- ?+0 0x7f5550001410 con 0x7f55500219f0
   -21> 2016-09-13 16:15:23.688651 7f557affd700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.156:6804/25567 -- osd_op(client.24191.0:17172 5.d4d2b698 100000000be.00000000 [write 0~4194304] snapc 1=[] ondisk+write+known_if_redirected e323) v7 -- ?+0 0x7f5550024380 con 0x7f55500219f0
   -20> 2016-09-13 16:15:24.291465 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3260 ==== osd_op_reply(17171 100000000bd.00000000 [write 0~4194304] v323'49 uv49 ondisk = 0) v6 ==== 187+0+0 (3181485819 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -19> 2016-09-13 16:15:24.291638 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_BUFFER ref on 100000000bd.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100755 size=4194304/4194304 mtime=2016-09-13 16:15:17.689988 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[100000000bd ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f5514004e80 0x7f556c0c2c80)
   -18> 2016-09-13 16:15:24.291707 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_CACHE ref on 100000000bd.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100755 size=4194304/4194304 mtime=2016-09-13 16:15:17.689988 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[100000000bd ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f5514004e80 0x7f556c0c2c80)
   -17> 2016-09-13 16:15:24.292120 7f55783b0700  1 -- 10.222.5.162:0/3252549074 <== osd.4 10.222.5.156:6804/25567 3261 ==== osd_op_reply(17172 100000000be.00000000 [write 0~4194304] v323'46 uv46 ondisk = 0) v6 ==== 187+0+0 (31370795 0 0) 0x7f54e4590a70 con 0x7f55500219f0
   -16> 2016-09-13 16:15:24.292231 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_BUFFER ref on 100000000be.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100755 size=4194304/4194304 mtime=2016-09-13 16:15:17.690092 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[100000000be ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f551401e490 0x7f556c0c3140)
   -15> 2016-09-13 16:15:24.292271 7f55815b5700  5 client.24191 put_cap_ref dropped last FILE_CACHE ref on 100000000be.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100755 size=4194304/4194304 mtime=2016-09-13 16:15:17.690092 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[100000000be ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f551401e490 0x7f556c0c3140)
   -14> 2016-09-13 16:15:24.532379 7f55604f5700  1 -- 10.222.5.162:0/3252549074 <== osd.1 10.222.5.229:6804/2228 1469 ==== osd_op_reply(17159 100000000b6.00000002 [write 0~1703936] v323'50 uv50 ondisk = 0) v6 ==== 187+0+0 (1709573551 0 0) 0x7f54c4001300 con 0x7f555002c1a0
   -13> 2016-09-13 16:15:24.732652 7f55825b7700  1 -- 10.222.5.162:0/3252549074 --> 10.222.5.156:6800/3592 -- ping magic: 0 v1 -- ?+0 0x7f554c0078a0 con 0x7f5550037a10
   -12> 2016-09-13 16:15:26.827049 7f55604f5700  1 -- 10.222.5.162:0/3252549074 <== osd.1 10.222.5.229:6804/2228 1470 ==== osd_op_reply(17162 100000000b6.00000002 [write 1703936~393216] v323'51 uv51 ondisk = 0) v6 ==== 187+0+0 (3172726132 0 0) 0x7f54c4001300 con 0x7f555002c1a0
   -11> 2016-09-13 16:15:26.832007 7f55602f3700  1 -- 10.222.5.162:0/3252549074 <== osd.8 10.222.5.162:6815/16206 1451 ==== osd_op_reply(17147 100000000b6.00000001 [write 0~1966080] v323'50 uv50 ondisk = 0) v6 ==== 187+0+0 (4058597970 0 0) 0x7f54cc001150 con 0x7f55500461a0
   -10> 2016-09-13 16:15:26.832353 7f55602f3700  1 -- 10.222.5.162:0/3252549074 <== osd.8 10.222.5.162:6815/16206 1452 ==== osd_op_reply(17155 100000000b6.00000001 [write 1966080~2228224] v323'51 uv51 ondisk = 0) v6 ==== 187+0+0 (2720977930 0 0) 0x7f54cc001150 con 0x7f55500461a0
    -9> 2016-09-13 16:15:26.843866 7f557a7fc700  1 -- 10.222.5.162:0/3252549074 <== mds.0 10.222.5.162:6800/24314 8320 ==== client_caps(flush_ack ino 100000000b0 1394 seq 4 tid 1488 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 0 size 0/0 mtime 0.000000) v8 ==== 216+0+0 (744883454 0 0) 0x7f556800dc90 con 0x7f556c00e790
    -8> 2016-09-13 16:15:26.843945 7f557a7fc700  5 client.24191 handle_cap_flush_ack mds.0 cleaned - on 100000000b0.head(faked_ino=0 ref=3 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100755 size=10485760/16777216 mtime=2016-09-13 16:15:17.591570 caps=pAsLsXsFsc(0=pAsLsXsFsc) flushing_caps=Fw objectset[100000000b0 ts 0/0 objects 3 dirty_or_tx 0] parents=0x7f54bc5f90e0 0x7f556c0be600) with Fw
    -7> 2016-09-13 16:15:26.843990 7f557a7fc700  1 -- 10.222.5.162:0/3252549074 <== mds.0 10.222.5.162:6800/24314 8321 ==== client_caps(flush_ack ino 100000000b1 1395 seq 4 tid 1489 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 0 size 0/0 mtime 0.000000) v8 ==== 216+0+0 (3235489794 0 0) 0x7f5568000d70 con 0x7f556c00e790
    -6> 2016-09-13 16:15:26.844017 7f557a7fc700  5 client.24191 handle_cap_flush_ack mds.0 cleaned - on 100000000b1.head(faked_ino=0 ref=3 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100755 size=10485760/16777216 mtime=2016-09-13 16:15:17.596742 caps=pAsLsXsFsc(0=pAsLsXsFsc) flushing_caps=Fw objectset[100000000b1 ts 0/0 objects 3 dirty_or_tx 0] parents=0x7f5520029860 0x7f556c0bd380) with Fw
    -5> 2016-09-13 16:15:26.851251 7f55786b3700  1 -- 10.222.5.162:0/3252549074 <== osd.7 10.222.5.162:6811/16040 2517 ==== osd_op_reply(17134 100000000b2.00000001 [write 0~2752512] v323'46 uv46 ondisk = 0) v6 ==== 187+0+0 (1467277884 0 0) 0x7f5500003fe0 con 0x7f5550016c60
    -4> 2016-09-13 16:15:26.851253 7f55607f8700  1 -- 10.222.5.162:0/3252549074 <== osd.3 10.222.5.156:6800/3592 8 ==== osd_op_reply(17002 100000009c4.00000002 [write 0~1179648,write 1179648~917504] v323'29 uv29 ondisk = 0) v6 ==== 229+0+0 (2689207009 0 0) 0x7f54c0002350 con 0x7f5550037a10
    -3> 2016-09-13 16:15:26.851516 7f55607f8700  1 -- 10.222.5.162:0/3252549074 <== osd.3 10.222.5.156:6800/3592 9 ==== osd_op_reply(17130 100000000b6.00000000 [write 0~4194304] v323'52 uv52 ondisk = 0) v6 ==== 187+0+0 (2987763432 0 0) 0x7f54c0002350 con 0x7f5550037a10
    -2> 2016-09-13 16:15:26.852306 7f55786b3700  1 -- 10.222.5.162:0/3252549074 <== osd.7 10.222.5.162:6811/16040 2518 ==== osd_op_reply(17140 100000000b2.00000001 [write 2752512~1441792] v323'47 uv47 ondisk = 0) v6 ==== 187+0+0 (3703741868 0 0) 0x7f5500003fe0 con 0x7f5550016c60
    -1> 2016-09-13 16:15:26.852467 7f55607f8700  1 -- 10.222.5.162:0/3252549074 <== osd.3 10.222.5.156:6800/3592 10 ==== osd_op_reply(17149 100000000b2.00000002 [write 0~1572864] v323'37 uv37 ondisk = 0) v6 ==== 187+0+0 (3935522164 0 0) 0x7f54c0002350 con 0x7f5550037a10
     0> 2016-09-13 16:15:26.853667 7f55815b5700 -1 osdc/ObjectCacher.cc: In function 'void ObjectCacher::Object::try_merge_bh(ObjectCacher::BufferHead*)' thread 7f55815b5700 time 2016-09-13 16:15:26.851601
osdc/ObjectCacher.cc: 146: FAILED assert(p->second == bh)

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f5587528965]
 2: (ObjectCacher::Object::try_merge_bh(ObjectCacher::BufferHead*)+0x315) [0x7f55873d5475]
 3: (ObjectCacher::bh_write_commit(long, sobject_t, std::vector<std::pair<long, unsigned long>, std::allocator<std::pair<long, unsigned long> > >&, unsigned long, int)+0x905) [0x7f55873da135]
 4: (ObjectCacher::C_WriteCommit::finish(int)+0x58) [0x7f55873e8018]
 5: (Context::complete(int)+0x9) [0x7f558739b2f9]
 6: (C_Lock::finish(int)+0x29) [0x7f558739bc49]
 7: (Context::complete(int)+0x9) [0x7f558739b2f9]
 8: (Finisher::finisher_thread_entry()+0x216) [0x7f558744c6d6]
 9: (()+0x7dc5) [0x7f5586199dc5]
 10: (clone()+0x6d) [0x7f558507fced]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This problem was reproducible.

I remove one osd from the osd tree and after one minute, I add the same osd to osd pool and then fuse client crush.

Ceph fuse is writing data through smallfile too, and the script is

" python smallfile_cli.py --top /mnt/test --threads 8 --files 20 --file-size 10240 --record-size 512 --operation create"

and my remove osd steps are:

1. kill -9 $pid_num

2. ceph osd out $id

3. ceph osd down $id

4. ceph osd crush remove osd.$id

5. ceph auth del osd.$id

6. ceph osd rm osd.$id

and my add osd steps are:

1. mkfs.xfs and remount my the osd remove..

2. ceph osd create

3. ceph-osd -i $id --mkfs --osd-data=/data/osd/osd.$id --mkkey

4. ceph auth add osd.$id osd 'allow *' mon 'allow rwx' -i /data/osd/osd.$id/keyring

5. ceph osd crush create-or-move osd.$id 1.0 host=xxxx

6. ceph-osd -i $id


Files

ceph-client.admin.zip (152 KB) ceph-client.admin.zip xiangyang yu, 09/13/2016 10:01 AM
ceph-client.admin.zip (727 KB) ceph-client.admin.zip xiangyang yu, 09/13/2016 11:20 AM
add_osd_when_read.zip (337 KB) add_osd_when_read.zip xiangyang yu, 09/19/2016 08:14 AM

Related issues 1 (0 open1 closed)

Copied to CephFS - Backport #17617: jewel: [cephfs] fuse client crash when adding a new osdResolvedLoïc DacharyActions
Actions

Also available in: Atom PDF