Bug #23280

open

mds: restarted mds may show wrong num_strays stats

Added by 鹏 张 about 6 years ago. Updated about 5 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
fs
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

1. On a Ceph file system: mkdir test1 test2
2. touch ./test1/1 ./test1/2
3. ln ./test1/1 ./test2/1
   ln ./test1/2 ./test2/2
4. rm -rf ./test1/*, then rm -rf ./test2/*
5. Restart ceph-mds
6. ceph daemon mds.[node*] perf dump | grep strays

After the restart, num_strays is the same as strays_created, but I think num_strays should be zero. When the MDS restarts, it can scan all the stray directories and calculate num_strays. I think that purging, reintegrating, or migrating a stray does not remove the dentry from the stray directory map.
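
Steps 1 to 4 above can be scripted so that the same sequence of creates, hard links, and unlinks is repeated exactly on each run. The following is a minimal sketch, not part of the original report: the function name `create_and_unlink_hard_links` is hypothetical, and the demo at the bottom uses a temporary directory as a stand-in for a CephFS mount point. Steps 5 and 6 (restarting ceph-mds and reading the perf counters) need a live cluster and are not covered here.

```python
import os
import tempfile


def create_and_unlink_hard_links(root):
    """Run steps 1-4 under `root`.

    Returns {file name: [nlink after ln, nlink after the first rm]}.
    """
    test1 = os.path.join(root, "test1")
    test2 = os.path.join(root, "test2")
    os.mkdir(test1)                                    # step 1: mkdir test1 test2
    os.mkdir(test2)
    names = ("1", "2")
    link_counts = {}
    for name in names:
        open(os.path.join(test1, name), "w").close()   # step 2: touch
    for name in names:
        # step 3: ln ./test1/<name> ./test2/<name>
        os.link(os.path.join(test1, name), os.path.join(test2, name))
        link_counts[name] = [os.stat(os.path.join(test1, name)).st_nlink]
    for name in names:
        os.unlink(os.path.join(test1, name))           # step 4: rm ./test1/*
        link_counts[name].append(os.stat(os.path.join(test2, name)).st_nlink)
    for name in names:
        os.unlink(os.path.join(test2, name))           # step 4: rm ./test2/*
    return link_counts


if __name__ == "__main__":
    # Assumption: a scratch directory stands in for the CephFS mount point.
    with tempfile.TemporaryDirectory() as scratch:
        print(create_and_unlink_hard_links(scratch))
```

To run it against a real file system, pass a directory on the CephFS mount instead of the temporary directory.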


Files

ceph-mds.log.txt (17.2 KB) ceph-mds.log.txt 鹏 张, 03/13/2018 09:29 AM
ceph-mds.txt (10.4 KB) ceph-mds.txt 鹏 张, 03/14/2018 03:43 AM
Actions #1

Updated by Nathan Cutler about 6 years ago

  • Project changed from Ceph to CephFS
Actions #2

Updated by Patrick Donnelly about 6 years ago

  • Assignee set to Patrick Donnelly
Actions #3

Updated by Patrick Donnelly about 6 years ago

  • Subject changed from when unlink file. then restat ceph-mds. mdcache num_strays statics is Inaccurate to mds: restarted mds may show wrong num_strays stats
  • Category set to Correctness/Safety
  • Source set to Community (user)
  • Component(FS) MDS added
Actions #4

Updated by Patrick Donnelly about 6 years ago

  • Status changed from New to Need More Info

I'm not seeing an issue. Can you be more specific about what the problem is? Example commands and output?

Actions #5

Updated by 鹏 张 about 6 years ago

Dear Patrick,
I did some more testing. I think the stray dentries are stored in the omap of an object. When the ceph-mds service restarts, the MDS replays the journal, but the stray dentries from the journal are not consistent with the omap. My ceph-mds.log is attached; you can search it for the "add dirfrag" messages I added. During journal replay the MDS gets the stray dir from the journal, even though I had unlinked all the files before. I think the dentries in the stray dir should be null, but they are still the dentries that were added before.
Best regards

Actions #6

Updated by 鹏 张 about 6 years ago

You can also reproduce this by creating a few more files on the file system:
1. touch 1 2 3 4
2. rm -rf 1 2 3 4
3. [root@node33 infinityfs1]# ceph daemon mds.node33 perf dump | grep stray
   "num_strays": 0,
   "num_strays_purging": 0,
   "num_strays_delayed": 0,
   "strays_created": 4,
   "strays_purged": 4,
   "strays_reintegrated": 0,
   "strays_migrated": 0,
4. systemctl restart ceph-mds.target
5. [root@node33 infinityfs1]# ceph daemon mds.node33 perf dump | grep stray
   "num_strays": 4,
   "num_strays_purging": 0,
   "num_strays_delayed": 0,
   "strays_created": 4,
   "strays_purged": 0,
   "strays_reintegrated": 0,
   "strays_migrated": 0,

I think num_strays should be zero after the restart, unless the stray dentries were not released before.
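
The comparison above can be automated by saving the full perf dump before and after the restart and checking whether num_strays went up. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and the section name "mds_cache" in the sample data is an assumption about where the stray counters sit in the JSON. The code searches every section for num_strays, so it does not depend on that name. The sample values are copied from the output above.

```python
import json

STRAY_GAUGE = "num_strays"


def find_stray_counters(perf_dump):
    """Return the first section of a parsed perf dump that contains num_strays."""
    for section in perf_dump.values():
        if isinstance(section, dict) and STRAY_GAUGE in section:
            return section
    raise KeyError(STRAY_GAUGE)


def stray_gauge_increased(before_json, after_json):
    """True if num_strays is higher in the second dump than in the first."""
    before = find_stray_counters(json.loads(before_json))
    after = find_stray_counters(json.loads(after_json))
    return after[STRAY_GAUGE] > before[STRAY_GAUGE]


# Sample data: counter values from this report; the "mds_cache" wrapper is assumed.
BEFORE_RESTART = json.dumps({"mds_cache": {
    "num_strays": 0, "num_strays_purging": 0, "num_strays_delayed": 0,
    "strays_created": 4, "strays_purged": 4,
    "strays_reintegrated": 0, "strays_migrated": 0}})
AFTER_RESTART = json.dumps({"mds_cache": {
    "num_strays": 4, "num_strays_purging": 0, "num_strays_delayed": 0,
    "strays_created": 4, "strays_purged": 0,
    "strays_reintegrated": 0, "strays_migrated": 0}})

if __name__ == "__main__":
    print(stray_gauge_increased(BEFORE_RESTART, AFTER_RESTART))
```

On a live cluster, the two JSON strings would be the output of `ceph daemon mds.node33 perf dump` captured before and after `systemctl restart ceph-mds.target`.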

Actions #7

Updated by 鹏 张 about 6 years ago

The problem may also be in the journal replay. I added a debug message to void EMetaBlob::replay(MDSRank *mds, LogSegment *logseg, MDSlaveUpdate *slaveup):

    if (!unlinked.empty()) {
      for (set<CInode*>::iterator p = linked.begin(); p != linked.end(); ++p) {
        dout(0) << "dentry is: " << **p << dendl;  // debug message added for this report
        unlinked.erase(*p);
      }
      // ... rest of the block not included in this excerpt

This code prints the unlinked dentries, but it does not remove them from the dir. My new ceph-mds.log is attached.

Actions #8

Updated by Patrick Donnelly about 5 years ago

  • Assignee deleted (Patrick Donnelly)