Bug #59307

Updated by Prasanna Kumar Kalever about 1 year ago

*Description:* 

 Adding a 500ms delay to the east and west network interfaces, then enabling mirroring and taking a mirror snapshot, leaves the mirror snapshot create stuck with "snap create timed out notifying lock owner". 

 *How Consistent:* 
 reproduced 3/3 

 *Reproducer steps:* 

 ✨ cat netns.sh  
 <pre><code class="c"> 
 #!/bin/sh 

 #For East network namespace: 
 sudo ip netns add east 
 sudo ip netns exec east ip link set dev lo up 
 sudo ip netns exec east ip link list 
 sudo ip link add veth0 type veth peer name veth1 
 sudo ip link set veth1 netns east 
 sudo ip netns exec east ifconfig veth1 10.1.1.1/24 up 
 sudo ifconfig veth0 10.1.1.2/24 up 
 sudo ip netns exec east route 

 #For West network namespace: 
 sudo ip netns add west 
 sudo ip netns exec west ip link set dev lo up 
 sudo ip netns exec west ip link list 
 sudo ip link add veth2 type veth peer name veth3 
 sudo ip link set veth3 netns west 
 sudo ip netns exec west ifconfig veth3 10.2.2.1/24 up 
 sudo ifconfig veth2 10.2.2.2/24 up 
 sudo ip netns exec west route 

 #Establish connection between two network ns 
 sudo ip link add br0 type bridge 
 sudo ip link set br0 up 
 sudo ip link set veth0 master br0 
 sudo ip link set veth2 master br0 

 sudo ip netns exec east ip route add 10.2.2.1 dev veth1 
 sudo ip netns exec west ip route add 10.1.1.1 dev veth3 

 </code></pre> 

 ✨ cat mirrorenv_ns.sh 
 <pre><code class="c"> 
 #!/bin/bash 
 set -xe 
 environment(){ 
 if [ ! -f site-a.conf ] && [ ! -L site-a.conf ]; then 
         ln -s run/clustera/ceph.conf site-a.conf 
 fi 
 if [ ! -f site-b.conf ] && [ ! -L site-b.conf ]; then 
         ln -s run/clusterb/ceph.conf site-b.conf 
 fi 
 } 
                                                                                                                       
 stop(){ 
 sudo ip netns exec east bash -c "../src/mstop.sh clustera" 
 sudo ip netns exec west bash -c "../src/mstop.sh clusterb" 
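 # killall exits non-zero when no process matched, which would abort the script under set -e 
 sudo ip netns exec east killall rbd-mirror || true 
 sudo ip netns exec east killall rbd-mirror || true 
 sudo ip netns exec west killall rbd-mirror || true 
 sudo ip netns exec west killall rbd-mirror || true 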
 } 

 start(){ 

 sudo ip netns exec east bash -c "MON=1 OSD=1 MGR=1 MDS=0 RGW=0 ../src/mstart.sh clustera --short -n -d --without-dashboard" 
 sudo ip netns exec west bash -c "MON=1 OSD=1 MGR=1 MDS=0 RGW=0 ../src/mstart.sh clusterb --short -n -d --without-dashboard" 

 #setup pool 
 sudo ip netns exec east ./bin/ceph --cluster site-a osd pool create pool1 
 sudo ip netns exec east ./bin/rbd --cluster site-a pool init pool1 

 sudo ip netns exec west ./bin/ceph --cluster site-b osd pool create pool1 
 sudo ip netns exec west ./bin/rbd --cluster site-b pool init pool1 

 #set to image mirror 
 sudo ip netns exec east ./bin/rbd --cluster site-a mirror pool enable pool1 image 

 #create token 
 sudo ip netns exec east ./bin/rbd --cluster site-a mirror pool peer bootstrap create pool1 | tail -n 1 > token 

 #import token 
 sudo ip netns exec west ./bin/rbd --cluster site-b mirror pool peer bootstrap import --site-name site-b pool1 token 

 #start rbd-mirror 

 sudo ip netns exec east ./bin/rbd-mirror --cluster site-a --rbd-mirror-delete-retry-interval=5 --rbd-mirror-image-state-check-interval=5 --rbd-mirror-journal-poll-age=1 --rbd-mirror-pool-replayers-refresh-interval=5 --debug-rbd=30 --debug-journaler=30 --debug-rbd_mirror=30 --daemonize=true 
 sudo ip netns exec west ./bin/rbd-mirror --cluster site-b --rbd-mirror-delete-retry-interval=5 --rbd-mirror-image-state-check-interval=5 --rbd-mirror-journal-poll-age=1 --rbd-mirror-pool-replayers-refresh-interval=5 --debug-rbd=30 --debug-journaler=30 --debug-rbd_mirror=30 --daemonize=true 

 sudo ip netns exec east ./bin/ceph --cluster site-a config set global debug_rbd 30 
 sudo ip netns exec east ./bin/ceph --cluster site-a config set global debug_rbd_mirror 30 
 #sudo ip netns exec east ./bin/ceph --cluster site-a config set client.rbd-mirror-peer debug_ms 1 

 sudo ip netns exec west ./bin/ceph --cluster site-b config set global debug_rbd 30 
 sudo ip netns exec west ./bin/ceph --cluster site-b config set global debug_rbd_mirror 30 
 #sudo ip netns exec west ./bin/ceph --cluster site-b config set client.rbd-mirror-peer debug_ms 1 

 } 

 NUM_ARGS=`echo "$@" | awk '{print NF}'` 
 ACTION=$1 
 if [ "$ACTION" == "start" ]; then 
         echo setting up environment 
         environment 
         echo start 
         start 
 elif [ "$ACTION" == "stop" ]; then 
         echo stop 
         stop 
 else 
         echo "Option not recognized" 
 fi 

 </code></pre> 


 *To set up two Ceph clusters for mirroring, run the commands below* 

 ✨ ./netns.sh 
 ✨ ./mirrorenv_ns.sh start 

 *Add a 500ms delay to the network interfaces of the east and west clusters* 
 ✨ sudo ip netns exec east tc qdisc add dev veth1 root netem delay 500ms 
 ✨ sudo ip netns exec west tc qdisc add dev veth3 root netem delay 500ms 
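
Since netem shapes egress traffic, a ping between the two namespaces should show a round trip of roughly 1000ms (500ms on veth1 plus 500ms on veth3). Below is a minimal sketch for checking that the delay took effect. The helper names `avg_rtt_ms` and `rtt_at_least` are hypothetical, and the parser assumes the iputils `ping` summary line (`rtt min/avg/max/mdev = ...`); other ping implementations may print a different format.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original reproducer.

# Read ping output on stdin and print the average RTT in ms.
# Assumes the iputils summary line: "rtt min/avg/max/mdev = a/b/c/d ms".
avg_rtt_ms() {
    awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Succeed if the measured RTT ($1) is at least the expected minimum ($2), both in ms.
# An empty measurement (for example, ping failed) counts as 0 and fails the check.
rtt_at_least() {
    awk -v got="$1" -v min="$2" 'BEGIN { exit !(got + 0 >= min + 0) }'
}

# Example (run after the tc commands above):
#   rtt=$(sudo ip netns exec east ping -c 3 10.2.2.1 | avg_rtt_ms)
#   if rtt_at_least "$rtt" 1000; then echo "delay active"; else echo "delay not applied"; fi
```

`sudo ip netns exec east tc qdisc show dev veth1` can also be used to confirm that the netem qdisc is attached.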

 *Terminal 1: Open a bash shell in the east network namespace* 
 ✨ sudo ip netns exec east bash 
 *Create some images* 
 ✨ for i in {0..1}; do ./bin/rbd --cluster=site-a create --size 6G pool1/img$i; done 
 *Map & Mount the images* 
 ✨ for i in {0..1}; do ./bin/rbd-nbd --cluster=site-a map pool1/img$i; done 
 ✨ for i in {0..1}; do mkfs.xfs /dev/nbd$i; done 
 ✨ for i in {0..1}; do mkdir /mnt/nbd$i; done 
 ✨ for i in {0..1}; do mount /dev/nbd$i /mnt/nbd$i; done 

 *Perform IO* 
 ✨ mkdir fio_output 
 ✨ cat randrw.fio 
 <pre>  
 [global] 
 refill_buffers 
 time_based=1 
 size=5g 
 direct=1 
 group_reporting 
 ioengine=libaio 

 [workload] 
 rw=randrw 
 rate_iops=40,10 
 blocksize=4KB 
 #norandommap 
 iodepth=4 
 numjobs=1 
 #runtime=2d 
 runtime=60m 
 </pre> 

 ✨ for i in {0..1}; do fio randrw.fio --filename=/mnt/nbd${i}/file${i} --output=fio_output/fio${i}.txt & done  


 *Terminal 2: Open a new bash shell in the east namespace, since the terminal above is occupied by the fio workload* 
 ✨ sudo ip netns exec east bash 
 *Enable mirroring on all images* 
 ✨ for i in {0..1}; do ./bin/rbd --cluster=site-a mirror image enable pool1/img$i snapshot; done 
 *Schedule the snapshots with a custom loop (or use the snapshot schedule command instead)* 
 ✨ while true; do for i in {0..1}; do time ./bin/rbd --cluster site-a mirror image snapshot pool1/img$i; done; sleep 60; done 


 *The snapshot command above gets stuck in mirror snapshot create; grep the logs for "snap create timed out notifying lock owner"*
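
To make the hang easier to spot, the snapshot command can be run under a time limit and the logs checked afterwards. This is a rough sketch only: `run_with_deadline` and `count_snap_timeouts` are hypothetical helper names, the 120 second limit is arbitrary, and the log directory `run/clustera/out` is an assumption about where the mstart.sh cluster writes its logs.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original reproducer.

# Run a command with a time limit in seconds ($1); the rest is the command.
# coreutils timeout exits with status 124 when the limit is reached.
run_with_deadline() {
    deadline_secs=$1
    shift
    timeout "$deadline_secs" "$@"
    deadline_rc=$?
    if [ "$deadline_rc" -eq 124 ]; then
        echo "timed out after ${deadline_secs}s: $*" >&2
    fi
    return "$deadline_rc"
}

# Count log lines under directory $1 that contain the timeout message.
count_snap_timeouts() {
    { grep -rhF "snap create timed out notifying lock owner" "$1" 2>/dev/null || true; } \
        | wc -l | tr -d ' '
}

# Example (inside the east namespace shell; limit and log path are assumptions):
#   run_with_deadline 120 ./bin/rbd --cluster site-a mirror image snapshot pool1/img0
#   count_snap_timeouts run/clustera/out
```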
