From a83aaf4979e799705781ceb86a1f29d2b29736b1 Mon Sep 17 00:00:00 2001 From: Madhuparna Bhowmik Date: Wed, 4 Dec 2019 15:49:39 +0530 Subject: Documentation: filesystems: automount-support: Change reference to document autofs.txt to autofs.rst This patch fixes following documentation build warning: Warning: Documentation/filesystems/automount-support.txt references a file that doesn't exist: Documentation/filesystems/autofs.txt Signed-off-by: Madhuparna Bhowmik Link: https://lore.kernel.org/r/20191204101939.6939-1-madhuparnabhowmik04@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/filesystems/automount-support.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt index b0afd3d55eaf15..7d9f8260756271 100644 --- a/Documentation/filesystems/automount-support.txt +++ b/Documentation/filesystems/automount-support.txt @@ -9,7 +9,7 @@ also be requested by userspace. IN-KERNEL AUTOMOUNTING ====================== -See section "Mount Traps" of Documentation/filesystems/autofs.txt +See section "Mount Traps" of Documentation/filesystems/autofs.rst Then from userspace, you can just do something like: -- cgit 1.2.3-korg From a1986433a9fd7a0410c9267805e19bcbdcffa2fc Mon Sep 17 00:00:00 2001 From: "Daniel W. S. Almeida" Date: Sun, 22 Dec 2019 22:00:30 -0300 Subject: Documentation: filesystems: convert vfat.txt to RST Converts vfat.txt to the reStructuredText format, improving presentation without changing the underlying content. Signed-off-by: Daniel W. S. Almeida ----------------------------------------------------------- Changes in v3: Removed unnecessary markup. Removed section "BUG REPORTS" as recommended by the maintainer. 
Changes in v2: Refactored long lines as pointed out by Jonathan Copied the maintainer Updated the reference in the MAINTAINERS file for vfat I did not move this into admin-guide, waiting on what the maintainer has to say about this and also about old sections in the text, if any. Link: https://lore.kernel.org/r/20191223010030.434902-1-dwlsalmeida@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/filesystems/index.rst | 1 + Documentation/filesystems/vfat.rst | 387 ++++++++++++++++++++++++++++++++++++ Documentation/filesystems/vfat.txt | 347 -------------------------------- MAINTAINERS | 2 +- 4 files changed, 389 insertions(+), 348 deletions(-) create mode 100644 Documentation/filesystems/vfat.rst delete mode 100644 Documentation/filesystems/vfat.txt (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index ad6315a48d14ce..b03578063801e8 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -48,3 +48,4 @@ Documentation for filesystem implementations. autofs virtiofs + vfat diff --git a/Documentation/filesystems/vfat.rst b/Documentation/filesystems/vfat.rst new file mode 100644 index 00000000000000..e85d74e912956f --- /dev/null +++ b/Documentation/filesystems/vfat.rst @@ -0,0 +1,387 @@ +==== +VFAT +==== + +USING VFAT +========== + +To use the vfat filesystem, use the filesystem type 'vfat'. i.e.:: + + mount -t vfat /dev/fd0 /mnt + + +No special partition formatter is required, +'mkdosfs' will work fine if you want to format from within Linux. + +VFAT MOUNT OPTIONS +================== + +**uid=###** + Set the owner of all files on this filesystem. + The default is the uid of current process. + +**gid=###** + Set the group of all files on this filesystem. + The default is the gid of current process. + +**umask=###** + The permission mask (for files and directories, see *umask(1)*). + The default is the umask of current process. 
+ +**dmask=###** + The permission mask for the directory. + The default is the umask of current process. + +**fmask=###** + The permission mask for files. + The default is the umask of current process. + +**allow_utime=###** + This option controls the permission check of mtime/atime. + + **-20**: If current process is in group of file's group ID, + you can change timestamp. + + **-2**: Other users can change timestamp. + + The default is set from dmask option. If the directory is + writable, utime(2) is also allowed. i.e. ~dmask & 022. + + Normally utime(2) checks current process is owner of + the file, or it has CAP_FOWNER capability. But FAT + filesystem doesn't have uid/gid on disk, so normal + check is too unflexible. With this option you can + relax it. + +**codepage=###** + Sets the codepage number for converting to shortname + characters on FAT filesystem. + By default, FAT_DEFAULT_CODEPAGE setting is used. + +**iocharset=** + Character set to use for converting between the + encoding is used for user visible filename and 16 bit + Unicode characters. Long filenames are stored on disk + in Unicode format, but Unix for the most part doesn't + know how to deal with Unicode. + By default, FAT_DEFAULT_IOCHARSET setting is used. + + There is also an option of doing UTF-8 translations + with the utf8 option. + +.. note:: ``iocharset=utf8`` is not recommended. If unsure, you should consider + the utf8 option instead. + +**utf8=** + UTF-8 is the filesystem safe version of Unicode that + is used by the console. It can be enabled or disabled + for the filesystem with this option. + If 'uni_xlate' gets set, UTF-8 gets disabled. + By default, FAT_DEFAULT_UTF8 setting is used. + +**uni_xlate=** + Translate unhandled Unicode characters to special + escaped sequences. This would let you backup and + restore filenames that are created with any Unicode + characters. Until Linux supports Unicode for real, + this gives you an alternative. Without this option, + a '?' 
is used when no translation is possible. The + escape character is ':' because it is otherwise + illegal on the vfat filesystem. The escape sequence + that gets used is ':' and the four digits of hexadecimal + unicode. + +**nonumtail=** + When creating 8.3 aliases, normally the alias will + end in '~1' or tilde followed by some number. If this + option is set, then if the filename is + "longfilename.txt" and "longfile.txt" does not + currently exist in the directory, longfile.txt will + be the short alias instead of longfi~1.txt. + +**usefree** + Use the "free clusters" value stored on FSINFO. It will + be used to determine number of free clusters without + scanning disk. But it's not used by default, because + recent Windows don't update it correctly in some + case. If you are sure the "free clusters" on FSINFO is + correct, by this option you can avoid scanning disk. + +**quiet** + Stops printing certain warning messages. + +**check=s|r|n** + Case sensitivity checking setting. + + **s**: strict, case sensitive + + **r**: relaxed, case insensitive + + **n**: normal, default setting, currently case insensitive + +**nocase** + This was deprecated for vfat. Use ``shortname=win95`` instead. + +**shortname=lower|win95|winnt|mixed** + Shortname display/create setting. + + **lower**: convert to lowercase for display, + emulate the Windows 95 rule for create. + + **win95**: emulate the Windows 95 rule for display/create. + + **winnt**: emulate the Windows NT rule for display/create. + + **mixed**: emulate the Windows NT rule for display, + emulate the Windows 95 rule for create. + + Default setting is `mixed`. + +**tz=UTC** + Interpret timestamps as UTC rather than local time. + This option disables the conversion of timestamps + between local time (as used by Windows on FAT) and UTC + (which Linux uses internally). This is particularly + useful when mounting devices (like digital cameras) + that are set to UTC in order to avoid the pitfalls of + local time. 
+ +**time_offset=minutes** + Set offset for conversion of timestamps from local time + used by FAT to UTC. I.e. minutes will be subtracted + from each timestamp to convert it to UTC used internally by + Linux. This is useful when time zone set in ``sys_tz`` is + not the time zone used by the filesystem. Note that this + option still does not provide correct time stamps in all + cases in presence of DST - time stamps in a different DST + setting will be off by one hour. + +**showexec** + If set, the execute permission bits of the file will be + allowed only if the extension part of the name is .EXE, + .COM, or .BAT. Not set by default. + +**debug** + Can be set, but unused by the current implementation. + +**sys_immutable** + If set, ATTR_SYS attribute on FAT is handled as + IMMUTABLE flag on Linux. Not set by default. + +**flush** + If set, the filesystem will try to flush to disk more + early than normal. Not set by default. + +**rodir** + FAT has the ATTR_RO (read-only) attribute. On Windows, + the ATTR_RO of the directory will just be ignored, + and is used only by applications as a flag (e.g. it's set + for the customized folder). + + If you want to use ATTR_RO as read-only flag even for + the directory, set this option. + +**errors=panic|continue|remount-ro** + specify FAT behavior on critical errors: panic, continue + without doing anything or remount the partition in + read-only mode (default behavior). + +**discard** + If set, issues discard/TRIM commands to the block + device when blocks are freed. This is useful for SSD devices + and sparse/thinly-provisoned LUNs. + +**nfs=stale_rw|nostale_ro** + Enable this only if you want to export the FAT filesystem + over NFS. + + **stale_rw**: This option maintains an index (cache) of directory + *inodes* by *i_logstart* which is used by the nfs-related code to + improve look-ups. Full file operations (read/write) over NFS is + supported but with cache eviction at NFS server, this could + result in ESTALE issues. 
+ + **nostale_ro**: This option bases the *inode* number and filehandle + on the on-disk location of a file in the MS-DOS directory entry. + This ensures that ESTALE will not be returned after a file is + evicted from the inode cache. However, it means that operations + such as rename, create and unlink could cause filehandles that + previously pointed at one file to point at a different file, + potentially causing data corruption. For this reason, this + option also mounts the filesystem readonly. + + To maintain backward compatibility, ``'-o nfs'`` is also accepted, + defaulting to "stale_rw". + +**dos1xfloppy : 0,1,yes,no,true,false** + If set, use a fallback default BIOS Parameter Block + configuration, determined by backing device size. These static + parameters match defaults assumed by DOS 1.x for 160 kiB, + 180 kiB, 320 kiB, and 360 kiB floppies and floppy images. + + + +LIMITATION +========== + +The fallocated region of file is discarded at umount/evict time +when using fallocate with FALLOC_FL_KEEP_SIZE. +So, User should assume that fallocated region can be discarded at +last close if there is memory pressure resulting in eviction of +the inode from the memory. As a result, for any dependency on +the fallocated region, user should make sure to recheck fallocate +after reopening the file. + +TODO +==== +Need to get rid of the raw scanning stuff. Instead, always use +a get next directory entry approach. The only thing left that uses +raw scanning is the directory renaming code. + + +POSSIBLE PROBLEMS +================= + +- vfat_valid_longname does not properly checked reserved names. +- When a volume name is the same as a directory name in the root + directory of the filesystem, the directory name sometimes shows + up as an empty file. +- autoconv option does not work correctly. 
+ + +TEST SUITE +========== +If you plan to make any modifications to the vfat filesystem, please +get the test suite that comes with the vfat distribution at + +`<http://web.archive.org/web/*/http://bmrc.berkeley.edu/people/chaffee/vfat.html>`_ + +This tests quite a few parts of the vfat filesystem and additional +tests for new features or untested features would be appreciated. + +NOTES ON THE STRUCTURE OF THE VFAT FILESYSTEM +============================================= +This documentation was provided by Galen C. Hunt gchunt@cs.rochester.edu and +lightly annotated by Gordon Chaffee. + +This document presents a very rough, technical overview of my +knowledge of the extended FAT file system used in Windows NT 3.5 and +Windows 95. I don't guarantee that any of the following is correct, +but it appears to be so. + +The extended FAT file system is almost identical to the FAT +file system used in DOS versions up to and including *6.223410239847* +:-). The significant change has been the addition of long file names. +These names support up to 255 characters including spaces and lower +case characters as opposed to the traditional 8.3 short names. + +Here is the description of the traditional FAT entry in the current +Windows 95 filesystem:: + + struct directory { // Short 8.3 names + unsigned char name[8]; // file name + unsigned char ext[3]; // file extension + unsigned char attr; // attribute byte + unsigned char lcase; // Case for base and extension + unsigned char ctime_ms; // Creation time, milliseconds + unsigned char ctime[2]; // Creation time + unsigned char cdate[2]; // Creation date + unsigned char adate[2]; // Last access date + unsigned char reserved[2]; // reserved values (ignored) + unsigned char time[2]; // time stamp + unsigned char date[2]; // date stamp + unsigned char start[2]; // starting cluster number + unsigned char size[4]; // size of the file + }; + + +The lcase field specifies if the base and/or the extension of an 8.3 +name should be capitalized.
This field does not seem to be used by +Windows 95 but it is used by Windows NT. The case of filenames is not +completely compatible from Windows NT to Windows 95. It is not completely +compatible in the reverse direction, however. Filenames that fit in +the 8.3 namespace and are written on Windows NT to be lowercase will +show up as uppercase on Windows 95. + +.. note:: Note that the ``start`` and ``size`` values are actually little + endian integer values. The descriptions of the fields in this + structure are public knowledge and can be found elsewhere. + +With the extended FAT system, Microsoft has inserted extra +directory entries for any files with extended names. (Any name which +legally fits within the old 8.3 encoding scheme does not have extra +entries.) I call these extra entries slots. Basically, a slot is a +specially formatted directory entry which holds up to 13 characters of +a file's extended name. Think of slots as additional labeling for the +directory entry of the file to which they correspond. Microsoft +prefers to refer to the 8.3 entry for a file as its alias and the +extended slot directory entries as the file name. + +The C structure for a slot directory entry follows:: + + struct slot { // Up to 13 characters of a long name + unsigned char id; // sequence number for slot + unsigned char name0_4[10]; // first 5 characters in name + unsigned char attr; // attribute byte + unsigned char reserved; // always 0 + unsigned char alias_checksum; // checksum for 8.3 alias + unsigned char name5_10[12]; // 6 more characters in name + unsigned char start[2]; // starting cluster number + unsigned char name11_12[4]; // last 2 characters in name + }; + + +If the layout of the slots looks a little odd, it's only +because of Microsoft's efforts to maintain compatibility with old +software. The slots must be disguised to prevent old software from +panicking. 
To this end, a number of measures are taken: + + 1) The attribute byte for a slot directory entry is always set + to 0x0f. This corresponds to an old directory entry with + attributes of "hidden", "system", "read-only", and "volume + label". Most old software will ignore any directory + entries with the "volume label" bit set. Real volume label + entries don't have the other three bits set. + + 2) The starting cluster is always set to 0, an impossible + value for a DOS file. + +Because the extended FAT system is backward compatible, it is +possible for old software to modify directory entries. Measures must +be taken to ensure the validity of slots. An extended FAT system can +verify that a slot does in fact belong to an 8.3 directory entry by +the following: + + 1) Positioning. Slots for a file always immediately proceed + their corresponding 8.3 directory entry. In addition, each + slot has an id which marks its order in the extended file + name. Here is a very abbreviated view of an 8.3 directory + entry and its corresponding long name slots for the file + "My Big File.Extension which is long":: + + + <proceeding files...> + <slot #3, id = 0x43, characters = "h is long"> + <slot #2, id = 0x02, characters = "xtension whic"> + <slot #1, id = 0x01, characters = "My Big File.E"> + <directory entry, name = "MYBIGFIL.EXT"> + + + .. note:: Note that the slots are stored from last to first. Slots + are numbered from 1 to N. The Nth slot is ``or'ed`` with + 0x40 to mark it as the last one. + + 2) Checksum. Each slot has an alias_checksum value. The + checksum is calculated from the 8.3 name using the + following algorithm:: + + for (sum = i = 0; i < 11; i++) { + sum = (((sum&1)<<7)|((sum&0xfe)>>1)) + name[i] + } + + + 3) If there is free space in the final slot, a Unicode ``NULL (0x0000)`` + is stored after the final character. After that, all unused + characters in the final slot are set to Unicode 0xFFFF. + +Finally, note that the extended name is stored in Unicode. Each Unicode +character takes either two or four bytes, UTF-16LE encoded.
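Reader's note on the checksum step in the new vfat.rst: the loop quoted in the patch is a fragment (no declarations, no terminating semicolon), so a compilable sketch may help when checking the arithmetic by hand. The function name vfat_alias_checksum and its signature are assumptions made for illustration; they are not identifiers from the patch or from fs/fat. The sketch also assumes that sum is meant to be an 8-bit unsigned value, which the unsigned char alias_checksum field in struct slot suggests.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the kernel sources): compute the value
 * that each long-name slot stores in its alias_checksum field.
 *
 * name holds the 11 raw bytes of the 8.3 directory entry, that is the
 * 8-byte name field immediately followed by the 3-byte ext field.
 */
static uint8_t vfat_alias_checksum(const unsigned char name[11])
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < 11; i++) {
		/* Rotate the running sum right by one bit, then add the next byte. */
		sum = (uint8_t)((((sum & 1) << 7) | ((sum & 0xfe) >> 1)) + name[i]);
	}
	return sum;
}
```

Keeping sum in a uint8_t makes the wraparound of the addition explicit. With a wider type the rotate expression would still produce an 8-bit pattern, but the addition could carry into higher bits, and the result would no longer fit the one-byte field.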
diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt deleted file mode 100644 index 91031298beb13d..00000000000000 --- a/Documentation/filesystems/vfat.txt +++ /dev/null @@ -1,347 +0,0 @@ -USING VFAT ----------------------------------------------------------------------- -To use the vfat filesystem, use the filesystem type 'vfat'. i.e. - mount -t vfat /dev/fd0 /mnt - -No special partition formatter is required. mkdosfs will work fine -if you want to format from within Linux. - -VFAT MOUNT OPTIONS ----------------------------------------------------------------------- -uid=### -- Set the owner of all files on this filesystem. - The default is the uid of current process. - -gid=### -- Set the group of all files on this filesystem. - The default is the gid of current process. - -umask=### -- The permission mask (for files and directories, see umask(1)). - The default is the umask of current process. - -dmask=### -- The permission mask for the directory. - The default is the umask of current process. - -fmask=### -- The permission mask for files. - The default is the umask of current process. - -allow_utime=### -- This option controls the permission check of mtime/atime. - - 20 - If current process is in group of file's group ID, - you can change timestamp. - 2 - Other users can change timestamp. - - The default is set from `dmask' option. (If the directory is - writable, utime(2) is also allowed. I.e. ~dmask & 022) - - Normally utime(2) checks current process is owner of - the file, or it has CAP_FOWNER capability. But FAT - filesystem doesn't have uid/gid on disk, so normal - check is too unflexible. With this option you can - relax it. - -codepage=### -- Sets the codepage number for converting to shortname - characters on FAT filesystem. - By default, FAT_DEFAULT_CODEPAGE setting is used. - -iocharset= -- Character set to use for converting between the - encoding is used for user visible filename and 16 bit - Unicode characters. 
Long filenames are stored on disk - in Unicode format, but Unix for the most part doesn't - know how to deal with Unicode. - By default, FAT_DEFAULT_IOCHARSET setting is used. - - There is also an option of doing UTF-8 translations - with the utf8 option. - - NOTE: "iocharset=utf8" is not recommended. If unsure, - you should consider the following option instead. - -utf8= -- UTF-8 is the filesystem safe version of Unicode that - is used by the console. It can be enabled or disabled - for the filesystem with this option. - If 'uni_xlate' gets set, UTF-8 gets disabled. - By default, FAT_DEFAULT_UTF8 setting is used. - -uni_xlate= -- Translate unhandled Unicode characters to special - escaped sequences. This would let you backup and - restore filenames that are created with any Unicode - characters. Until Linux supports Unicode for real, - this gives you an alternative. Without this option, - a '?' is used when no translation is possible. The - escape character is ':' because it is otherwise - illegal on the vfat filesystem. The escape sequence - that gets used is ':' and the four digits of hexadecimal - unicode. - -nonumtail= -- When creating 8.3 aliases, normally the alias will - end in '~1' or tilde followed by some number. If this - option is set, then if the filename is - "longfilename.txt" and "longfile.txt" does not - currently exist in the directory, 'longfile.txt' will - be the short alias instead of 'longfi~1.txt'. - -usefree -- Use the "free clusters" value stored on FSINFO. It'll - be used to determine number of free clusters without - scanning disk. But it's not used by default, because - recent Windows don't update it correctly in some - case. If you are sure the "free clusters" on FSINFO is - correct, by this option you can avoid scanning disk. - -quiet -- Stops printing certain warning messages. - -check=s|r|n -- Case sensitivity checking setting. 
- s: strict, case sensitive - r: relaxed, case insensitive - n: normal, default setting, currently case insensitive - -nocase -- This was deprecated for vfat. Use shortname=win95 instead. - -shortname=lower|win95|winnt|mixed - -- Shortname display/create setting. - lower: convert to lowercase for display, - emulate the Windows 95 rule for create. - win95: emulate the Windows 95 rule for display/create. - winnt: emulate the Windows NT rule for display/create. - mixed: emulate the Windows NT rule for display, - emulate the Windows 95 rule for create. - Default setting is `mixed'. - -tz=UTC -- Interpret timestamps as UTC rather than local time. - This option disables the conversion of timestamps - between local time (as used by Windows on FAT) and UTC - (which Linux uses internally). This is particularly - useful when mounting devices (like digital cameras) - that are set to UTC in order to avoid the pitfalls of - local time. -time_offset=minutes - -- Set offset for conversion of timestamps from local time - used by FAT to UTC. I.e. minutes will be subtracted - from each timestamp to convert it to UTC used internally by - Linux. This is useful when time zone set in sys_tz is - not the time zone used by the filesystem. Note that this - option still does not provide correct time stamps in all - cases in presence of DST - time stamps in a different DST - setting will be off by one hour. - -showexec -- If set, the execute permission bits of the file will be - allowed only if the extension part of the name is .EXE, - .COM, or .BAT. Not set by default. - -debug -- Can be set, but unused by the current implementation. - -sys_immutable -- If set, ATTR_SYS attribute on FAT is handled as - IMMUTABLE flag on Linux. Not set by default. - -flush -- If set, the filesystem will try to flush to disk more - early than normal. Not set by default. - -rodir -- FAT has the ATTR_RO (read-only) attribute. 
On Windows, - the ATTR_RO of the directory will just be ignored, - and is used only by applications as a flag (e.g. it's set - for the customized folder). - - If you want to use ATTR_RO as read-only flag even for - the directory, set this option. - -errors=panic|continue|remount-ro - -- specify FAT behavior on critical errors: panic, continue - without doing anything or remount the partition in - read-only mode (default behavior). - -discard -- If set, issues discard/TRIM commands to the block - device when blocks are freed. This is useful for SSD devices - and sparse/thinly-provisoned LUNs. - -nfs=stale_rw|nostale_ro - Enable this only if you want to export the FAT filesystem - over NFS. - - stale_rw: This option maintains an index (cache) of directory - inodes by i_logstart which is used by the nfs-related code to - improve look-ups. Full file operations (read/write) over NFS is - supported but with cache eviction at NFS server, this could - result in ESTALE issues. - - nostale_ro: This option bases the inode number and filehandle - on the on-disk location of a file in the MS-DOS directory entry. - This ensures that ESTALE will not be returned after a file is - evicted from the inode cache. However, it means that operations - such as rename, create and unlink could cause filehandles that - previously pointed at one file to point at a different file, - potentially causing data corruption. For this reason, this - option also mounts the filesystem readonly. - - To maintain backward compatibility, '-o nfs' is also accepted, - defaulting to stale_rw - -dos1xfloppy -- If set, use a fallback default BIOS Parameter Block - configuration, determined by backing device size. These static - parameters match defaults assumed by DOS 1.x for 160 kiB, - 180 kiB, 320 kiB, and 360 kiB floppies and floppy images. 
- - -: 0,1,yes,no,true,false - -LIMITATION ---------------------------------------------------------------------- -* The fallocated region of file is discarded at umount/evict time - when using fallocate with FALLOC_FL_KEEP_SIZE. - So, User should assume that fallocated region can be discarded at - last close if there is memory pressure resulting in eviction of - the inode from the memory. As a result, for any dependency on - the fallocated region, user should make sure to recheck fallocate - after reopening the file. - -TODO ----------------------------------------------------------------------- -* Need to get rid of the raw scanning stuff. Instead, always use - a get next directory entry approach. The only thing left that uses - raw scanning is the directory renaming code. - - -POSSIBLE PROBLEMS ----------------------------------------------------------------------- -* vfat_valid_longname does not properly checked reserved names. -* When a volume name is the same as a directory name in the root - directory of the filesystem, the directory name sometimes shows - up as an empty file. -* autoconv option does not work correctly. - -BUG REPORTS ----------------------------------------------------------------------- -If you have trouble with the VFAT filesystem, mail bug reports to -chaffee@bmrc.cs.berkeley.edu. Please specify the filename -and the operation that gave you trouble. - -TEST SUITE ----------------------------------------------------------------------- -If you plan to make any modifications to the vfat filesystem, please -get the test suite that comes with the vfat distribution at - - http://web.archive.org/web/*/http://bmrc.berkeley.edu/ - people/chaffee/vfat.html - -This tests quite a few parts of the vfat filesystem and additional -tests for new features or untested features would be appreciated. 
- -NOTES ON THE STRUCTURE OF THE VFAT FILESYSTEM ----------------------------------------------------------------------- -(This documentation was provided by Galen C. Hunt - and lightly annotated by Gordon Chaffee). - -This document presents a very rough, technical overview of my -knowledge of the extended FAT file system used in Windows NT 3.5 and -Windows 95. I don't guarantee that any of the following is correct, -but it appears to be so. - -The extended FAT file system is almost identical to the FAT -file system used in DOS versions up to and including 6.223410239847 -:-). The significant change has been the addition of long file names. -These names support up to 255 characters including spaces and lower -case characters as opposed to the traditional 8.3 short names. - -Here is the description of the traditional FAT entry in the current -Windows 95 filesystem: - - struct directory { // Short 8.3 names - unsigned char name[8]; // file name - unsigned char ext[3]; // file extension - unsigned char attr; // attribute byte - unsigned char lcase; // Case for base and extension - unsigned char ctime_ms; // Creation time, milliseconds - unsigned char ctime[2]; // Creation time - unsigned char cdate[2]; // Creation date - unsigned char adate[2]; // Last access date - unsigned char reserved[2]; // reserved values (ignored) - unsigned char time[2]; // time stamp - unsigned char date[2]; // date stamp - unsigned char start[2]; // starting cluster number - unsigned char size[4]; // size of the file - }; - -The lcase field specifies if the base and/or the extension of an 8.3 -name should be capitalized. This field does not seem to be used by -Windows 95 but it is used by Windows NT. The case of filenames is not -completely compatible from Windows NT to Windows 95. It is not completely -compatible in the reverse direction, however. Filenames that fit in -the 8.3 namespace and are written on Windows NT to be lowercase will -show up as uppercase on Windows 95. 
- -Note that the "start" and "size" values are actually little -endian integer values. The descriptions of the fields in this -structure are public knowledge and can be found elsewhere. - -With the extended FAT system, Microsoft has inserted extra -directory entries for any files with extended names. (Any name which -legally fits within the old 8.3 encoding scheme does not have extra -entries.) I call these extra entries slots. Basically, a slot is a -specially formatted directory entry which holds up to 13 characters of -a file's extended name. Think of slots as additional labeling for the -directory entry of the file to which they correspond. Microsoft -prefers to refer to the 8.3 entry for a file as its alias and the -extended slot directory entries as the file name. - -The C structure for a slot directory entry follows: - - struct slot { // Up to 13 characters of a long name - unsigned char id; // sequence number for slot - unsigned char name0_4[10]; // first 5 characters in name - unsigned char attr; // attribute byte - unsigned char reserved; // always 0 - unsigned char alias_checksum; // checksum for 8.3 alias - unsigned char name5_10[12]; // 6 more characters in name - unsigned char start[2]; // starting cluster number - unsigned char name11_12[4]; // last 2 characters in name - }; - -If the layout of the slots looks a little odd, it's only -because of Microsoft's efforts to maintain compatibility with old -software. The slots must be disguised to prevent old software from -panicking. To this end, a number of measures are taken: - - 1) The attribute byte for a slot directory entry is always set - to 0x0f. This corresponds to an old directory entry with - attributes of "hidden", "system", "read-only", and "volume - label". Most old software will ignore any directory - entries with the "volume label" bit set. Real volume label - entries don't have the other three bits set. - - 2) The starting cluster is always set to 0, an impossible - value for a DOS file. 
- -Because the extended FAT system is backward compatible, it is -possible for old software to modify directory entries. Measures must -be taken to ensure the validity of slots. An extended FAT system can -verify that a slot does in fact belong to an 8.3 directory entry by -the following: - - 1) Positioning. Slots for a file always immediately proceed - their corresponding 8.3 directory entry. In addition, each - slot has an id which marks its order in the extended file - name. Here is a very abbreviated view of an 8.3 directory - entry and its corresponding long name slots for the file - "My Big File.Extension which is long": - - - - - - - - Note that the slots are stored from last to first. Slots - are numbered from 1 to N. The Nth slot is or'ed with 0x40 - to mark it as the last one. - - 2) Checksum. Each slot has an "alias_checksum" value. The - checksum is calculated from the 8.3 name using the - following algorithm: - - for (sum = i = 0; i < 11; i++) { - sum = (((sum&1)<<7)|((sum&0xfe)>>1)) + name[i] - } - - 3) If there is free space in the final slot, a Unicode NULL (0x0000) - is stored after the final character. After that, all unused - characters in the final slot are set to Unicode 0xFFFF. - -Finally, note that the extended name is stored in Unicode. Each Unicode -character takes either two or four bytes, UTF-16LE encoded. diff --git a/MAINTAINERS b/MAINTAINERS index cc0a4a8ae06a5f..1df6007d641435 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17356,7 +17356,7 @@ F: drivers/mtd/nand/raw/vf610_nfc.c VFAT/FAT/MSDOS FILESYSTEM M: OGAWA Hirofumi S: Maintained -F: Documentation/filesystems/vfat.txt +F: Documentation/filesystems/vfat.rst F: fs/fat/ VFIO DRIVER -- cgit 1.2.3-korg From 2f123b9a359650374712e812c0c466f75e77ba0e Mon Sep 17 00:00:00 2001 From: "Daniel W. S. Almeida" Date: Fri, 10 Jan 2020 20:24:23 -0300 Subject: Documentation: convert nfs.txt to ReST This patch converts nfs.txt to RST. It also moves it to admin-guide. 
The reason for moving it is because this document contains information useful for system administrators, as noted on the following paragraph: 'The purpose of this document is to provide information on some of the special features of the NFS client that can be configured by system administrators'. Signed-off-by: Daniel W. S. Almeida Link: https://lore.kernel.org/r/cb9f2da2f2f6dd432b4cf9e05f79f74f4d54b6ab.1578697871.git.dwlsalmeida@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/index.rst | 1 + Documentation/admin-guide/nfs/index.rst | 8 ++ Documentation/admin-guide/nfs/nfs-client.rst | 141 +++++++++++++++++++++++++++ Documentation/filesystems/nfs/nfs.txt | 136 -------------------------- 4 files changed, 150 insertions(+), 136 deletions(-) create mode 100644 Documentation/admin-guide/nfs/index.rst create mode 100644 Documentation/admin-guide/nfs/nfs-client.rst delete mode 100644 Documentation/filesystems/nfs/nfs.txt (limited to 'Documentation/filesystems') diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 4405b74853121c..4433f3929481fb 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -76,6 +76,7 @@ configure specific aspects of kernel behavior to your liking. device-mapper/index efi-stub ext4 + nfs/index gpio/index highuid hw_random diff --git a/Documentation/admin-guide/nfs/index.rst b/Documentation/admin-guide/nfs/index.rst new file mode 100644 index 00000000000000..2fe77091c25c7a --- /dev/null +++ b/Documentation/admin-guide/nfs/index.rst @@ -0,0 +1,8 @@ +============= +NFS +============= + +.. 
toctree:: + :maxdepth: 1 + + nfs-client diff --git a/Documentation/admin-guide/nfs/nfs-client.rst b/Documentation/admin-guide/nfs/nfs-client.rst new file mode 100644 index 00000000000000..c4b777c7584b49 --- /dev/null +++ b/Documentation/admin-guide/nfs/nfs-client.rst @@ -0,0 +1,141 @@ +========== +NFS Client +========== + +The NFS client +============== + +The NFS version 2 protocol was first documented in RFC1094 (March 1989). +Since then two more major releases of NFS have been published, with NFSv3 +being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April +2003). + +The Linux NFS client currently supports all the above published versions, +and work is in progress on adding support for minor version 1 of the NFSv4 +protocol. + +The purpose of this document is to provide information on some of the +special features of the NFS client that can be configured by system +administrators. + + +The nfs4_unique_id parameter +============================ + +NFSv4 requires clients to identify themselves to servers with a unique +string. File open and lock state shared between one client and one server +is associated with this identity. To support robust NFSv4 state recovery +and transparent state migration, this identity string must not change +across client reboots. + +Without any other intervention, the Linux client uses a string that contains +the local system's node name. System administrators, however, often do not +take care to ensure that node names are fully qualified and do not change +over the lifetime of a client system. Node names can have other +administrative requirements that require particular behavior that does not +work well as part of an nfs_client_id4 string. + +The nfs.nfs4_unique_id boot parameter specifies a unique string that can be +used instead of a system's node name when an NFS client identifies itself to +a server. 
Thus, if the system's node name is not unique, or it changes, its +nfs.nfs4_unique_id stays the same, preventing collision with other clients +or loss of state during NFS reboot recovery or transparent state migration. + +The nfs.nfs4_unique_id string is typically a UUID, though it can contain +anything that is believed to be unique across all NFS clients. An +nfs4_unique_id string should be chosen when a client system is installed, +just as a system's root file system gets a fresh UUID in its label at +install time. + +The string should remain fixed for the lifetime of the client. It can be +changed safely if care is taken that the client shuts down cleanly and all +outstanding NFSv4 state has expired, to prevent loss of NFSv4 state. + +This string can be stored in an NFS client's grub.conf, or it can be provided +via a net boot facility such as PXE. It may also be specified as an nfs.ko +module parameter. Specifying a uniquifier string is not support for NFS +clients running in containers. + + +The DNS resolver +================ + +NFSv4 allows for one server to refer the NFS client to data that has been +migrated onto another server by means of the special "fs_locations" +attribute. See `RFC3530 Section 6: Filesystem Migration and Replication`_ and +`Implementation Guide for Referrals in NFSv4`_. + +.. _RFC3530 Section 6\: Filesystem Migration and Replication: http://tools.ietf.org/html/rfc3530#section-6 +.. _Implementation Guide for Referrals in NFSv4: http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 + +The fs_locations information can take the form of either an ip address and +a path, or a DNS hostname and a path. The latter requires the NFS client to +do a DNS lookup in order to mount the new volume, and hence the need for an +upcall to allow userland to provide this service. 
+ +Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual +/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps: + + (1) The process checks the dns_resolve cache to see if it contains a + valid entry. If so, it returns that entry and exits. + + (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' + (may be changed using the 'nfs.cache_getent' kernel boot parameter) + is run, with two arguments: + - the cache name, "dns_resolve" + - the hostname to resolve + + (3) After looking up the corresponding ip address, the helper script + writes the result into the rpc_pipefs pseudo-file + '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel' + in the following (text) format: + + "<ip address> <hostname> <ttl>\n" + + Where <ip address> is in the usual IPv4 (123.456.78.90) or IPv6 + (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format. + <hostname> is identical to the second argument of the helper + script, and <ttl> is the 'time to live' of this cache entry (in + units of seconds). + + .. note:: + If <ip address> is invalid, say the string "0", then a negative + entry is created, which will cause the kernel to treat the hostname + as having no valid DNS translation. + + + + +A basic sample /sbin/nfs_cache_getent +===================================== +..
code-block:: sh + + #!/bin/bash + # + ttl=600 + # + cut=/usr/bin/cut + getent=/usr/bin/getent + rpc_pipefs=/var/lib/nfs/rpc_pipefs + # + die() + { + echo "Usage: $0 cache_name entry_name" + exit 1 + } + + [ $# -lt 2 ] && die + cachename="$1" + cache_path=${rpc_pipefs}/cache/${cachename}/channel + + case "${cachename}" in + dns_resolve) + name="$2" + result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )" + [ -z "${result}" ] && result="0" + ;; + *) + die + ;; + esac + echo "${result} ${name} ${ttl}" >${cache_path} diff --git a/Documentation/filesystems/nfs/nfs.txt b/Documentation/filesystems/nfs/nfs.txt deleted file mode 100644 index f2571c8bef74e5..00000000000000 --- a/Documentation/filesystems/nfs/nfs.txt +++ /dev/null @@ -1,136 +0,0 @@ - -The NFS client -============== - -The NFS version 2 protocol was first documented in RFC1094 (March 1989). -Since then two more major releases of NFS have been published, with NFSv3 -being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April -2003). - -The Linux NFS client currently supports all the above published versions, -and work is in progress on adding support for minor version 1 of the NFSv4 -protocol. - -The purpose of this document is to provide information on some of the -special features of the NFS client that can be configured by system -administrators. - - -The nfs4_unique_id parameter -============================ - -NFSv4 requires clients to identify themselves to servers with a unique -string. File open and lock state shared between one client and one server -is associated with this identity. To support robust NFSv4 state recovery -and transparent state migration, this identity string must not change -across client reboots. - -Without any other intervention, the Linux client uses a string that contains -the local system's node name. System administrators, however, often do not -take care to ensure that node names are fully qualified and do not change -over the lifetime of a client system. 
Node names can have other -administrative requirements that require particular behavior that does not -work well as part of an nfs_client_id4 string. - -The nfs.nfs4_unique_id boot parameter specifies a unique string that can be -used instead of a system's node name when an NFS client identifies itself to -a server. Thus, if the system's node name is not unique, or it changes, its -nfs.nfs4_unique_id stays the same, preventing collision with other clients -or loss of state during NFS reboot recovery or transparent state migration. - -The nfs.nfs4_unique_id string is typically a UUID, though it can contain -anything that is believed to be unique across all NFS clients. An -nfs4_unique_id string should be chosen when a client system is installed, -just as a system's root file system gets a fresh UUID in its label at -install time. - -The string should remain fixed for the lifetime of the client. It can be -changed safely if care is taken that the client shuts down cleanly and all -outstanding NFSv4 state has expired, to prevent loss of NFSv4 state. - -This string can be stored in an NFS client's grub.conf, or it can be provided -via a net boot facility such as PXE. It may also be specified as an nfs.ko -module parameter. Specifying a uniquifier string is not support for NFS -clients running in containers. - - -The DNS resolver -================ - -NFSv4 allows for one server to refer the NFS client to data that has been -migrated onto another server by means of the special "fs_locations" -attribute. See - http://tools.ietf.org/html/rfc3530#section-6 -and - http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 - -The fs_locations information can take the form of either an ip address and -a path, or a DNS hostname and a path. The latter requires the NFS client to -do a DNS lookup in order to mount the new volume, and hence the need for an -upcall to allow userland to provide this service. 
- -Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual -/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps: - - (1) The process checks the dns_resolve cache to see if it contains a - valid entry. If so, it returns that entry and exits. - - (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' - (may be changed using the 'nfs.cache_getent' kernel boot parameter) - is run, with two arguments: - - the cache name, "dns_resolve" - - the hostname to resolve - - (3) After looking up the corresponding ip address, the helper script - writes the result into the rpc_pipefs pseudo-file - '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel' - in the following (text) format: - - " \n" - - Where is in the usual IPv4 (123.456.78.90) or IPv6 - (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format. - is identical to the second argument of the helper - script, and is the 'time to live' of this cache entry (in - units of seconds). - - Note: If is invalid, say the string "0", then a negative - entry is created, which will cause the kernel to treat the hostname - as having no valid DNS translation. - - - - -A basic sample /sbin/nfs_cache_getent -===================================== - -#!/bin/bash -# -ttl=600 -# -cut=/usr/bin/cut -getent=/usr/bin/getent -rpc_pipefs=/var/lib/nfs/rpc_pipefs -# -die() -{ - echo "Usage: $0 cache_name entry_name" - exit 1 -} - -[ $# -lt 2 ] && die -cachename="$1" -cache_path=${rpc_pipefs}/cache/${cachename}/channel - -case "${cachename}" in - dns_resolve) - name="$2" - result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )" - [ -z "${result}" ] && result="0" - ;; - *) - die - ;; -esac -echo "${result} ${name} ${ttl}" >${cache_path} - -- cgit 1.2.3-korg From f9a9349846f92b2dabd26cef1f3873e346ba8c1b Mon Sep 17 00:00:00 2001 From: "Daniel W. S. 
Almeida" Date: Fri, 10 Jan 2020 20:24:24 -0300 Subject: Documentation: nfsroot.txt: convert to ReST Convert nfsroot.txt to RST and move it to admin-guide. Content remains mostly the same. Signed-off-by: Daniel W. S. Almeida Link: https://lore.kernel.org/r/442d35917351f5260dd8ed7362e9b5f1264ef8ad.1578697871.git.dwlsalmeida@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/nfs/index.rst | 1 + Documentation/admin-guide/nfs/nfsroot.rst | 363 ++++++++++++++++++++++++++++++ Documentation/filesystems/nfs/nfsroot.txt | 355 ----------------------------- 3 files changed, 364 insertions(+), 355 deletions(-) create mode 100644 Documentation/admin-guide/nfs/nfsroot.rst delete mode 100644 Documentation/filesystems/nfs/nfsroot.txt (limited to 'Documentation/filesystems') diff --git a/Documentation/admin-guide/nfs/index.rst b/Documentation/admin-guide/nfs/index.rst index 2fe77091c25c7a..ea780cda554998 100644 --- a/Documentation/admin-guide/nfs/index.rst +++ b/Documentation/admin-guide/nfs/index.rst @@ -6,3 +6,4 @@ NFS :maxdepth: 1 nfs-client + nfsroot diff --git a/Documentation/admin-guide/nfs/nfsroot.rst b/Documentation/admin-guide/nfs/nfsroot.rst new file mode 100644 index 00000000000000..9249be637833b5 --- /dev/null +++ b/Documentation/admin-guide/nfs/nfsroot.rst @@ -0,0 +1,363 @@ +=============================================== +Mounting the root filesystem via NFS (nfsroot) +=============================================== + +:Authors: + Written 1996 by Gero Kuhlmann + + Updated 1997 by Martin Mares + + Updated 2006 by Nico Schottelius + + Updated 2006 by Horms + + Updated 2018 by Chris Novakovic + + + +In order to use a diskless system, such as an X-terminal or printer server +for example, it is necessary for the root filesystem to be present on a +non-disk device. This may be an initramfs (see Documentation/filesystems/ramfs-rootfs-initramfs.txt), +a ramdisk (see Documentation/admin-guide/initrd.rst) or a +filesystem mounted via NFS. 
The following text describes how to use NFS +for the root filesystem. For the rest of this text 'client' means the +diskless system, and 'server' means the NFS server. + + + + +Enabling nfsroot capabilities +============================= + +In order to use nfsroot, NFS client support needs to be selected as +built-in during configuration. Once this has been selected, the nfsroot +option will become available, which should also be selected. + +In the networking options, kernel level autoconfiguration can be selected, +along with the types of autoconfiguration to support. Selecting all of +DHCP, BOOTP and RARP is safe. + + + + +Kernel command line +=================== + +When the kernel has been loaded by a boot loader (see below) it needs to be +told what root fs device to use and, in the case of nfsroot, where to find +both the server and the name of the directory on the server to mount as root. +This can be established using the following kernel command line parameters: + + +root=/dev/nfs + This is necessary to enable the pseudo-NFS-device. Note that it's not a + real device but just a synonym to tell the kernel to use NFS instead of + a real device. + + +nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] + If the `nfsroot' parameter is NOT given on the command line, + the default ``"/tftpboot/%s"`` will be used. + + <server-ip> Specifies the IP address of the NFS server. + The default address is determined by the ip parameter + (see below). This parameter allows the use of different + servers for IP autoconfiguration and NFS. + + <root-dir> Name of the directory on the server to mount as root. + If there is a "%s" token in the string, it will be + replaced by the ASCII-representation of the client's + IP address. + + <nfs-options> Standard NFS options. All options are separated by commas.
+ The following defaults are used:: + + port = as given by server portmap daemon + rsize = 4096 + wsize = 4096 + timeo = 7 + retrans = 3 + acregmin = 3 + acregmax = 60 + acdirmin = 30 + acdirmax = 60 + flags = hard, nointr, noposix, cto, ac + + +ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip> + This parameter tells the kernel how to configure IP addresses of devices + and also how to set up the IP routing table. It was originally called + nfsaddrs, but now the boot-time IP configuration works independently of + NFS, so it was renamed to ip and the old name remained as an alias for + compatibility reasons. + + If this parameter is missing from the kernel command line, all fields are + assumed to be empty, and the defaults mentioned below apply. In general + this means that the kernel tries to configure everything using + autoconfiguration. + + The <autoconf> parameter can appear alone as the value to the ip + parameter (without all the ':' characters before). If the value is + "ip=off" or "ip=none", no autoconfiguration will take place, otherwise + autoconfiguration will take place. The most common way to use this + is "ip=dhcp". + + <client-ip> IP address of the client. + Default: Determined using autoconfiguration. + + <server-ip> IP address of the NFS server. + If RARP is used to determine + the client address and this parameter is NOT empty only + replies from the specified server are accepted. + + Only required for NFS root. That is autoconfiguration + will not be triggered if it is missing and NFS root is not + in operation. + + Value is exported to /proc/net/pnp with the prefix "bootserver " + (see below). + + Default: Determined using autoconfiguration. + The address of the autoconfiguration server is used. + + <gw-ip> IP address of a gateway if the server is on a different subnet. + Default: Determined using autoconfiguration. + + <netmask> Netmask for local network interface. + If unspecified the netmask is derived from the client IP address + assuming classful addressing. + + Default: Determined using autoconfiguration.
+ + <hostname> Name of the client. + If a '.' character is present, anything + before the first '.' is used as the client's hostname, and anything + after it is used as its NIS domain name. May be supplied by + autoconfiguration, but its absence will not trigger autoconfiguration. + If specified and DHCP is used, the user-provided hostname (and NIS + domain name, if present) will be carried in the DHCP request; this + may cause a DNS record to be created or updated for the client. + + Default: Client IP address is used in ASCII notation. + + <device> Name of network device to use. + Default: If the host only has one device, it is used. + Otherwise the device is determined using + autoconfiguration. This is done by sending + autoconfiguration requests out of all devices, + and using the device that received the first reply. + + <autoconf> Method to use for autoconfiguration. + In the case of options + which specify multiple autoconfiguration protocols, + requests are sent using all protocols, and the first one + to reply is used. + + Only autoconfiguration protocols that have been compiled + into the kernel will be used, regardless of the value of + this option:: + + off or none: don't use autoconfiguration + (do static IP assignment instead) + on or any: use any protocol available in the kernel + (default) + dhcp: use DHCP + bootp: use BOOTP + rarp: use RARP + both: use both BOOTP and RARP but not DHCP + (old option kept for backwards compatibility) + + If dhcp is used, the client identifier can be specified using the + following format: "ip=dhcp,client-id-type,client-id-value" + + Default: any + + <dns0-ip> IP address of primary nameserver. + Value is exported to /proc/net/pnp with the prefix "nameserver " + (see below). + + Default: None if not using autoconfiguration; determined + automatically if using autoconfiguration. + + <dns1-ip> IP address of secondary nameserver. + See <dns0-ip>. + + <ntp0-ip> IP address of a Network Time Protocol (NTP) server.
+ Value is exported to /proc/net/ipconfig/ntp_servers, but is + otherwise unused (see below). + + Default: None if not using autoconfiguration; determined + automatically if using autoconfiguration. + + After configuration (whether manual or automatic) is complete, two files + are created in the following format; lines are omitted if their respective + value is empty following configuration: + + - /proc/net/pnp: + + #PROTO: <DHCP|BOOTP|RARP|MANUAL> (depending on configuration method) + domain <dns-domain> (if autoconfigured, the DNS domain) + nameserver <dns0-ip> (primary name server IP) + nameserver <dns1-ip> (secondary name server IP) + nameserver <dns2-ip> (tertiary name server IP) + bootserver <server-ip> (NFS server IP) + + - /proc/net/ipconfig/ntp_servers: + + <ntp0-ip> (NTP server IP) + <ntp1-ip> (NTP server IP) + <ntp2-ip> (NTP server IP) + + <dns-domain> and <dns2-ip> (in /proc/net/pnp) and <ntp1-ip> and <ntp2-ip> + (in /proc/net/ipconfig/ntp_servers) are requested during autoconfiguration; + they cannot be specified as part of the "ip=" kernel command line parameter. + + Because the "domain" and "nameserver" options are recognised by DNS + resolvers, /etc/resolv.conf is often linked to /proc/net/pnp on systems + that use an NFS root filesystem. + + Note that the kernel will not synchronise the system time with any NTP + servers it discovers; this is the responsibility of a user space process + (e.g. an initrd/initramfs script that passes the IP addresses listed in + /proc/net/ipconfig/ntp_servers to an NTP client before mounting the real + root filesystem if it is on NFS). + + +nfsrootdebug + This parameter enables debugging messages to appear in the kernel + log at boot time so that administrators can verify that the correct + NFS mount options, server address, and root path are passed to the + NFS client. + + +rdinit=<executable file> + To specify which file contains the program that starts system + initialization, administrators can use this command line parameter. + The default value of this parameter is "/init".
If the specified + file exists and the kernel can execute it, root filesystem related + kernel command line parameters, including 'nfsroot=', are ignored. + + A description of the process of mounting the root file system can be + found in Documentation/driver-api/early-userspace/early_userspace_support.rst + + +Boot Loader +=========== + +To get the kernel into memory different approaches can be used. +They depend on various facilities being available: + + +- Booting from a floppy using syslinux + + When building kernels, an easy way to create a boot floppy that uses + syslinux is to use the zdisk or bzdisk make targets which use zimage + and bzimage images respectively. Both targets accept the + FDARGS parameter which can be used to set the kernel command line. + + e.g:: + + make bzdisk FDARGS="root=/dev/nfs" + + Note that the user running this command will need to have + access to the floppy drive device, /dev/fd0 + + For more information on syslinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + + .. note:: + Previously it was possible to write a kernel directly to + a floppy using dd, configure the boot device using rdev, and + boot using the resulting floppy. Linux no longer supports this + method of booting. + +- Booting from a cdrom using isolinux + + When building kernels, an easy way to create a bootable cdrom that + uses isolinux is to use the isoimage target which uses a bzimage + image. Like zdisk and bzdisk, this target accepts the FDARGS + parameter which can be used to set the kernel command line. + + e.g:: + + make isoimage FDARGS="root=/dev/nfs" + + The resulting iso image will be arch/<ARCH>/boot/image.iso + This can be written to a cdrom using a variety of tools including + cdrecord.
+ + e.g:: + + cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso + + For more information on isolinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + +- Using LILO + + When using LILO all the necessary command line parameters may be + specified using the 'append=' directive in the LILO configuration + file. + + However, to use the 'root=' directive you also need to create + a dummy root device, which may be removed after LILO is run. + + e.g:: + + mknod /dev/boot255 c 0 255 + + For information on configuring LILO, please refer to its documentation. + +- Using GRUB + + When using GRUB, kernel parameters are simply appended after the kernel + specification: kernel <kernel> <parameters> + +- Using loadlin + + loadlin may be used to boot Linux from a DOS command prompt without + requiring a local hard disk to mount as root. This has not been + thoroughly tested by the authors of this document, but in general + it should be possible to configure the kernel command line similarly + to the configuration of LILO. + + Please refer to the loadlin documentation for further information. + +- Using a boot ROM + + This is probably the most elegant way of booting a diskless client. + With a boot ROM the kernel is loaded using the TFTP protocol. The + authors of this document are not aware of any commercial boot + ROMs that support booting Linux over the network. However, there + are two free implementations of a boot ROM, netboot-nfs and + etherboot, both of which are available on sunsite.unc.edu, and both + of which contain everything you need to boot a diskless Linux client. + +- Using pxelinux + + Pxelinux may be used to boot linux using the PXE boot loader + which is present on many modern network cards. + + When using pxelinux, the kernel image is specified using + "kernel <relative-path-below /tftpboot>". The nfsroot parameters + are passed to the kernel by adding them to the "append" line.
+ It is common to use serial console in conjunction with pxelinux, + see Documentation/admin-guide/serial-console.rst for more information. + + For more information on isolinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + + + + +Credits +======= + + The nfsroot code in the kernel and the RARP support have been written + by Gero Kuhlmann. + + The rest of the IP layer autoconfiguration code has been written + by Martin Mares. + + In order to write the initial version of nfsroot I would like to thank + Jens-Uwe Mager for his help. diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt deleted file mode 100644 index ae433246456053..00000000000000 --- a/Documentation/filesystems/nfs/nfsroot.txt +++ /dev/null @@ -1,355 +0,0 @@ -Mounting the root filesystem via NFS (nfsroot) -=============================================== - -Written 1996 by Gero Kuhlmann -Updated 1997 by Martin Mares -Updated 2006 by Nico Schottelius -Updated 2006 by Horms -Updated 2018 by Chris Novakovic - - - -In order to use a diskless system, such as an X-terminal or printer server -for example, it is necessary for the root filesystem to be present on a -non-disk device. This may be an initramfs (see Documentation/filesystems/ -ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/admin-guide/initrd.rst) or a -filesystem mounted via NFS. The following text describes on how to use NFS -for the root filesystem. For the rest of this text 'client' means the -diskless system, and 'server' means the NFS server. - - - - -1.) Enabling nfsroot capabilities - ----------------------------- - -In order to use nfsroot, NFS client support needs to be selected as -built-in during configuration. Once this has been selected, the nfsroot -option will become available, which should also be selected.
- -In the networking options, kernel level autoconfiguration can be selected, -along with the types of autoconfiguration to support. Selecting all of -DHCP, BOOTP and RARP is safe. - - - - -2.) Kernel command line - ------------------- - -When the kernel has been loaded by a boot loader (see below) it needs to be -told what root fs device to use. And in the case of nfsroot, where to find -both the server and the name of the directory on the server to mount as root. -This can be established using the following kernel command line parameters: - - -root=/dev/nfs - - This is necessary to enable the pseudo-NFS-device. Note that it's not a - real device but just a synonym to tell the kernel to use NFS instead of - a real device. - - -nfsroot=[:][,] - - If the `nfsroot' parameter is NOT given on the command line, - the default "/tftpboot/%s" will be used. - - Specifies the IP address of the NFS server. - The default address is determined by the `ip' parameter - (see below). This parameter allows the use of different - servers for IP autoconfiguration and NFS. - - Name of the directory on the server to mount as root. - If there is a "%s" token in the string, it will be - replaced by the ASCII-representation of the client's - IP address. - - Standard NFS options. All options are separated by commas. - The following defaults are used: - port = as given by server portmap daemon - rsize = 4096 - wsize = 4096 - timeo = 7 - retrans = 3 - acregmin = 3 - acregmax = 60 - acdirmin = 30 - acdirmax = 60 - flags = hard, nointr, noposix, cto, ac - - -ip=::::::: - :: - - This parameter tells the kernel how to configure IP addresses of devices - and also how to set up the IP routing table. It was originally called - `nfsaddrs', but now the boot-time IP configuration works independently of - NFS, so it was renamed to `ip' and the old name remained as an alias for - compatibility reasons. 
- - If this parameter is missing from the kernel command line, all fields are - assumed to be empty, and the defaults mentioned below apply. In general - this means that the kernel tries to configure everything using - autoconfiguration. - - The parameter can appear alone as the value to the `ip' - parameter (without all the ':' characters before). If the value is - "ip=off" or "ip=none", no autoconfiguration will take place, otherwise - autoconfiguration will take place. The most common way to use this - is "ip=dhcp". - - IP address of the client. - - Default: Determined using autoconfiguration. - - IP address of the NFS server. If RARP is used to determine - the client address and this parameter is NOT empty only - replies from the specified server are accepted. - - Only required for NFS root. That is autoconfiguration - will not be triggered if it is missing and NFS root is not - in operation. - - Value is exported to /proc/net/pnp with the prefix "bootserver " - (see below). - - Default: Determined using autoconfiguration. - The address of the autoconfiguration server is used. - - IP address of a gateway if the server is on a different subnet. - - Default: Determined using autoconfiguration. - - Netmask for local network interface. If unspecified - the netmask is derived from the client IP address assuming - classful addressing. - - Default: Determined using autoconfiguration. - - Name of the client. If a '.' character is present, anything - before the first '.' is used as the client's hostname, and anything - after it is used as its NIS domain name. May be supplied by - autoconfiguration, but its absence will not trigger autoconfiguration. - If specified and DHCP is used, the user-provided hostname (and NIS - domain name, if present) will be carried in the DHCP request; this - may cause a DNS record to be created or updated for the client. - - Default: Client IP address is used in ASCII notation. - - Name of network device to use. 
- - Default: If the host only has one device, it is used. - Otherwise the device is determined using - autoconfiguration. This is done by sending - autoconfiguration requests out of all devices, - and using the device that received the first reply. - - Method to use for autoconfiguration. In the case of options - which specify multiple autoconfiguration protocols, - requests are sent using all protocols, and the first one - to reply is used. - - Only autoconfiguration protocols that have been compiled - into the kernel will be used, regardless of the value of - this option. - - off or none: don't use autoconfiguration - (do static IP assignment instead) - on or any: use any protocol available in the kernel - (default) - dhcp: use DHCP - bootp: use BOOTP - rarp: use RARP - both: use both BOOTP and RARP but not DHCP - (old option kept for backwards compatibility) - - if dhcp is used, the client identifier can be used by following - format "ip=dhcp,client-id-type,client-id-value" - - Default: any - - IP address of primary nameserver. - Value is exported to /proc/net/pnp with the prefix "nameserver " - (see below). - - Default: None if not using autoconfiguration; determined - automatically if using autoconfiguration. - - IP address of secondary nameserver. - See . - - IP address of a Network Time Protocol (NTP) server. - Value is exported to /proc/net/ipconfig/ntp_servers, but is - otherwise unused (see below). - - Default: None if not using autoconfiguration; determined - automatically if using autoconfiguration. 
- - After configuration (whether manual or automatic) is complete, two files - are created in the following format; lines are omitted if their respective - value is empty following configuration: - - - /proc/net/pnp: - - #PROTO: (depending on configuration method) - domain (if autoconfigured, the DNS domain) - nameserver (primary name server IP) - nameserver (secondary name server IP) - nameserver (tertiary name server IP) - bootserver (NFS server IP) - - - /proc/net/ipconfig/ntp_servers: - - (NTP server IP) - (NTP server IP) - (NTP server IP) - - and (in /proc/net/pnp) and and - (in /proc/net/ipconfig/ntp_servers) are requested during autoconfiguration; - they cannot be specified as part of the "ip=" kernel command line parameter. - - Because the "domain" and "nameserver" options are recognised by DNS - resolvers, /etc/resolv.conf is often linked to /proc/net/pnp on systems - that use an NFS root filesystem. - - Note that the kernel will not synchronise the system time with any NTP - servers it discovers; this is the responsibility of a user space process - (e.g. an initrd/initramfs script that passes the IP addresses listed in - /proc/net/ipconfig/ntp_servers to an NTP client before mounting the real - root filesystem if it is on NFS). - - -nfsrootdebug - - This parameter enables debugging messages to appear in the kernel - log at boot time so that administrators can verify that the correct - NFS mount options, server address, and root path are passed to the - NFS client. - - -rdinit= - - To specify which file contains the program that starts system - initialization, administrators can use this command line parameter. - The default value of this parameter is "/init". If the specified - file exists and the kernel can execute it, root filesystem related - kernel command line parameters, including `nfsroot=', are ignored. 
- - A description of the process of mounting the root file system can be - found in: - - Documentation/driver-api/early-userspace/early_userspace_support.rst - - - - -3.) Boot Loader - ---------- - -To get the kernel into memory different approaches can be used. -They depend on various facilities being available: - - -3.1) Booting from a floppy using syslinux - - When building kernels, an easy way to create a boot floppy that uses - syslinux is to use the zdisk or bzdisk make targets which use zimage - and bzimage images respectively. Both targets accept the - FDARGS parameter which can be used to set the kernel command line. - - e.g. - make bzdisk FDARGS="root=/dev/nfs" - - Note that the user running this command will need to have - access to the floppy drive device, /dev/fd0 - - For more information on syslinux, including how to create bootdisks - for prebuilt kernels, see http://syslinux.zytor.com/ - - N.B: Previously it was possible to write a kernel directly to - a floppy using dd, configure the boot device using rdev, and - boot using the resulting floppy. Linux no longer supports this - method of booting. - -3.2) Booting from a cdrom using isolinux - - When building kernels, an easy way to create a bootable cdrom that - uses isolinux is to use the isoimage target which uses a bzimage - image. Like zdisk and bzdisk, this target accepts the FDARGS - parameter which can be used to set the kernel command line. - - e.g. - make isoimage FDARGS="root=/dev/nfs" - - The resulting iso image will be arch//boot/image.iso - This can be written to a cdrom using a variety of tools including - cdrecord. - - e.g. - cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso - - For more information on isolinux, including how to create bootdisks - for prebuilt kernels, see http://syslinux.zytor.com/ - -3.2) Using LILO - When using LILO all the necessary command line parameters may be - specified using the 'append=' directive in the LILO configuration - file. 
- - However, to use the 'root=' directive you also need to create - a dummy root device, which may be removed after LILO is run. - - mknod /dev/boot255 c 0 255 - - For information on configuring LILO, please refer to its documentation. - -3.3) Using GRUB - When using GRUB, kernel parameters are simply appended after the kernel - specification: kernel - -3.4) Using loadlin - loadlin may be used to boot Linux from a DOS command prompt without - requiring a local hard disk to mount as root. This has not been - thoroughly tested by the authors of this document, but in general - it should be possible to configure the kernel command line similarly - to the configuration of LILO. - - Please refer to the loadlin documentation for further information. - -3.5) Using a boot ROM - This is probably the most elegant way of booting a diskless client. - With a boot ROM the kernel is loaded using the TFTP protocol. The - authors of this document are not aware of any commercial boot - ROMs that support booting Linux over the network. However, there - are two free implementations of a boot ROM, netboot-nfs and - etherboot, both of which are available on sunsite.unc.edu, and both - of which contain everything you need to boot a diskless Linux client. - -3.6) Using pxelinux - Pxelinux may be used to boot Linux using the PXE boot loader - which is present on many modern network cards. - - When using pxelinux, the kernel image is specified using - "kernel ". The nfsroot parameters - are passed to the kernel by adding them to the "append" line. - It is common to use a serial console in conjunction with pxelinux, - see Documentation/admin-guide/serial-console.rst for more information. - - For more information on isolinux, including how to create bootdisks - for prebuilt kernels, see http://syslinux.zytor.com/ - - - - -4.) Credits - ------- - - The nfsroot code in the kernel and the RARP support have been written - by Gero Kuhlmann .
- - The rest of the IP layer autoconfiguration code has been written - by Martin Mares . - - In order to write the initial version of nfsroot I would like to thank - Jens-Uwe Mager for his help. -- cgit 1.2.3-korg From f8b8d030597a3b0a20e9cc2e958f82164690fbdb Mon Sep 17 00:00:00 2001 From: "Daniel W. S. Almeida" Date: Fri, 10 Jan 2020 20:24:26 -0300 Subject: Documentation: nfs-rdma: convert to ReST Convert nfs-rdma to ReST and move it to admin-guide. Content remains mostly untouched. Also, mark the doc as obsolete. Signed-off-by: Daniel W. S. Almeida Link: https://lore.kernel.org/r/9c88f184f9de2a3eb5181563e258559efc02f58a.1578697871.git.dwlsalmeida@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/nfs/index.rst | 1 + Documentation/admin-guide/nfs/nfs-rdma.rst | 292 +++++++++++++++++++++++++++++ Documentation/filesystems/nfs/nfs-rdma.txt | 274 --------------------------- 3 files changed, 293 insertions(+), 274 deletions(-) create mode 100644 Documentation/admin-guide/nfs/nfs-rdma.rst delete mode 100644 Documentation/filesystems/nfs/nfs-rdma.txt (limited to 'Documentation/filesystems') diff --git a/Documentation/admin-guide/nfs/index.rst b/Documentation/admin-guide/nfs/index.rst index ea780cda554998..875a96fe9d04f4 100644 --- a/Documentation/admin-guide/nfs/index.rst +++ b/Documentation/admin-guide/nfs/index.rst @@ -7,3 +7,4 @@ NFS nfs-client nfsroot + nfs-rdma diff --git a/Documentation/admin-guide/nfs/nfs-rdma.rst b/Documentation/admin-guide/nfs/nfs-rdma.rst new file mode 100644 index 00000000000000..ef0f3678b1fb8f --- /dev/null +++ b/Documentation/admin-guide/nfs/nfs-rdma.rst @@ -0,0 +1,292 @@ +=================== +Setting up NFS/RDMA +=================== + +:Author: + NetApp and Open Grid Computing (May 29, 2008) + +.. warning:: + This document is probably obsolete. + +Overview +======== + +This document describes how to install and set up the Linux NFS/RDMA client +and server software. + +The NFS/RDMA client was first included in Linux 2.6.24.
The NFS/RDMA server +was first included in the following release, Linux 2.6.25. + +In our testing, we have obtained excellent performance results (full 10Gbit +wire bandwidth at minimal client CPU) under many workloads. The code passes +the full Connectathon test suite and operates over both Infiniband and iWARP +RDMA adapters. + +Getting Help +============ + +If you get stuck, you can ask questions on the +nfs-rdma-devel@lists.sourceforge.net mailing list. + +Installation +============ + +These instructions are a step by step guide to building a machine for +use with NFS/RDMA. + +- Install an RDMA device + + Any device supported by the drivers in drivers/infiniband/hw is acceptable. + + Testing has been performed using several Mellanox-based IB cards, the + Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter. + +- Install a Linux distribution and tools + + The first kernel release to contain both the NFS/RDMA client and server was + Linux 2.6.25. Therefore, a distribution compatible with this and subsequent + Linux kernel releases should be installed. + + The procedures described in this document have been tested with + distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). + +- Install nfs-utils-1.1.2 or greater on the client + + An NFS/RDMA mount point can be obtained by using the mount.nfs command in + nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils + version with support for NFS/RDMA mounts, but for various reasons we + recommend using nfs-utils-1.1.2 or greater). To see which version of + mount.nfs you are using, type: + + .. code-block:: sh + + $ /sbin/mount.nfs -V + + If the version is less than 1.1.2 or the command does not exist, + you should install the latest version of nfs-utils. + + Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs + + Uncompress the package and follow the installation instructions.
+ + If you will not need the idmapper and gssd executables (you do not need + these to create an NFS/RDMA enabled mount command), the installation + process can be simplified by disabling these features when running + configure: + + .. code-block:: sh + + $ ./configure --disable-gss --disable-nfsv4 + + To build nfs-utils you will need the tcp_wrappers package installed. For + more information on this see the package's README and INSTALL files. + + After building the nfs-utils package, there will be a mount.nfs binary in + the utils/mount directory. This binary can be used to initiate NFS v2, v3, + or v4 mounts. To initiate a v4 mount, the binary must be called + mount.nfs4. The standard technique is to create a symlink called + mount.nfs4 to mount.nfs. + + This mount.nfs binary should be installed at /sbin/mount.nfs as follows: + + .. code-block:: sh + + $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs + + In this location, mount.nfs will be invoked automatically for NFS mounts + by the system mount command. + + .. note:: + mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed + on the NFS client machine. You do not need this specific version of + nfs-utils on the server. Furthermore, only the mount.nfs command from + nfs-utils-1.1.2 is needed on the client. + +- Install a Linux kernel with NFS/RDMA + + The NFS/RDMA client and server are both included in the mainline Linux + kernel version 2.6.25 and later. This and other versions of the Linux + kernel can be found at: https://www.kernel.org/pub/linux/kernel/ + + Download the sources and place them in an appropriate location. + +- Configure the RDMA stack + + Make sure your kernel configuration has RDMA support enabled. Under + Device Drivers -> InfiniBand support, update the kernel configuration + to enable InfiniBand support [NOTE: the option name is misleading. Enabling + InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)]. 
+ + Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or + iWARP adapter support (amso, cxgb3, etc.). + + If you are using InfiniBand, be sure to enable IP-over-InfiniBand support. + +- Configure the NFS client and server + + Your kernel configuration must also have NFS file system support and/or + NFS server support enabled. These and other NFS related configuration + options can be found under File Systems -> Network File Systems. + +- Build, install, reboot + + The NFS/RDMA code will be enabled automatically if NFS and RDMA + are turned on. The NFS/RDMA client and server are configured via the hidden + SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The + value of SUNRPC_XPRT_RDMA will be: + + #. N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client + and server will not be built + + #. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M, + in this case the NFS/RDMA client and server will be built as modules + + #. Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client + and server will be built into the kernel + + Therefore, if you have followed the steps above and turned on NFS and RDMA, + the NFS/RDMA client and server will be built. + + Build a new kernel, install it, boot it. + +Check RDMA and NFS Setup +======================== + +Before configuring the NFS/RDMA software, it is a good idea to test +your new kernel to ensure that the kernel is working correctly. +In particular, it is a good idea to verify that the RDMA stack +is functioning as expected and standard NFS over TCP/IP and/or UDP/IP +is working properly. + +- Check RDMA Setup + + If you built the RDMA components as modules, load them at + this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel + card: + + .. code-block:: sh + + $ modprobe ib_mthca + $ modprobe ib_ipoib + + If you are using InfiniBand, make sure there is a Subnet Manager (SM) + running on the network.
If your IB switch has an embedded SM, you can + use it. Otherwise, you will need to run an SM, such as OpenSM, on one + of your end nodes. + + If an SM is running on your network, you should see the following: + + .. code-block:: sh + + $ cat /sys/class/infiniband/driverX/ports/1/state + 4: ACTIVE + + where driverX is mthca0, ipath5, ehca3, etc. + + To further test the InfiniBand software stack, use IPoIB (this + assumes you have two IB hosts named host1 and host2): + + .. code-block:: sh + + host1$ ip link set dev ib0 up + host1$ ip address add dev ib0 a.b.c.x + host2$ ip link set dev ib0 up + host2$ ip address add dev ib0 a.b.c.y + host1$ ping a.b.c.y + host2$ ping a.b.c.x + + For other device types, follow the appropriate procedures. + +- Check NFS Setup + + For the NFS components enabled above (client and/or server), + test their functionality over standard Ethernet using TCP/IP or UDP/IP. + +NFS/RDMA Setup +============== + +We recommend that you use two machines, one to act as the client and +one to act as the server. + +One time configuration: +----------------------- + +- On the server system, configure the /etc/exports file and start the NFS/RDMA server. + + Exports entries with the following formats have been tested:: + + /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) + /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) + + The IP address(es) is(are) the client's IPoIB address for an InfiniBand + HCA or the client's iWARP address(es) for an RNIC. + + .. note:: + The "insecure" option must be used because the NFS/RDMA client does + not use a reserved port. + +Each time a machine boots: +-------------------------- + +- Load and configure the RDMA drivers + + For InfiniBand using a Mellanox adapter: + + .. code-block:: sh + + $ modprobe ib_mthca + $ modprobe ib_ipoib + $ ip li set dev ib0 up + $ ip addr add dev ib0 a.b.c.d + + .. note:: + Please use unique addresses for the client and server! 
+ +- Start the NFS server + + If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA transport module: + + .. code-block:: sh + + $ modprobe svcrdma + + Regardless of how the server was built (module or built-in), start the + server: + + .. code-block:: sh + + $ /etc/init.d/nfs start + + or + + .. code-block:: sh + + $ service nfs start + + Instruct the server to listen on the RDMA transport: + + .. code-block:: sh + + $ echo rdma 20049 > /proc/fs/nfsd/portlist + +- On the client system + + If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA client module: + + .. code-block:: sh + + $ modprobe xprtrdma.ko + + Regardless of how the client was built (module or built-in), use this + command to mount the NFS/RDMA server: + + .. code-block:: sh + + $ mount -o rdma,port=20049 :/ /mnt + + To verify that the mount is using RDMA, run "cat /proc/mounts" and check + the "proto" field for the given mount. + + Congratulations! You're using NFS/RDMA! diff --git a/Documentation/filesystems/nfs/nfs-rdma.txt b/Documentation/filesystems/nfs/nfs-rdma.txt deleted file mode 100644 index 22dc0dd6889cbf..00000000000000 --- a/Documentation/filesystems/nfs/nfs-rdma.txt +++ /dev/null @@ -1,274 +0,0 @@ -################################################################################ -# # -# NFS/RDMA README # -# # -################################################################################ - - Author: NetApp and Open Grid Computing - Date: May 29, 2008 - -Table of Contents -~~~~~~~~~~~~~~~~~ - - Overview - - Getting Help - - Installation - - Check RDMA and NFS Setup - - NFS/RDMA Setup - -Overview -~~~~~~~~ - - This document describes how to install and setup the Linux NFS/RDMA client - and server software. - - The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server - was first included in the following release, Linux 2.6.25. 
- - In our testing, we have obtained excellent performance results (full 10Gbit - wire bandwidth at minimal client CPU) under many workloads. The code passes - the full Connectathon test suite and operates over both Infiniband and iWARP - RDMA adapters. - -Getting Help -~~~~~~~~~~~~ - - If you get stuck, you can ask questions on the - - nfs-rdma-devel@lists.sourceforge.net - - mailing list. - -Installation -~~~~~~~~~~~~ - - These instructions are a step by step guide to building a machine for - use with NFS/RDMA. - - - Install an RDMA device - - Any device supported by the drivers in drivers/infiniband/hw is acceptable. - - Testing has been performed using several Mellanox-based IB cards, the - Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter. - - - Install a Linux distribution and tools - - The first kernel release to contain both the NFS/RDMA client and server was - Linux 2.6.25 Therefore, a distribution compatible with this and subsequent - Linux kernel release should be installed. - - The procedures described in this document have been tested with - distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). - - - Install nfs-utils-1.1.2 or greater on the client - - An NFS/RDMA mount point can be obtained by using the mount.nfs command in - nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils - version with support for NFS/RDMA mounts, but for various reasons we - recommend using nfs-utils-1.1.2 or greater). To see which version of - mount.nfs you are using, type: - - $ /sbin/mount.nfs -V - - If the version is less than 1.1.2 or the command does not exist, - you should install the latest version of nfs-utils. - - Download the latest package from: - - http://www.kernel.org/pub/linux/utils/nfs - - Uncompress the package and follow the installation instructions. 
- - If you will not need the idmapper and gssd executables (you do not need - these to create an NFS/RDMA enabled mount command), the installation - process can be simplified by disabling these features when running - configure: - - $ ./configure --disable-gss --disable-nfsv4 - - To build nfs-utils you will need the tcp_wrappers package installed. For - more information on this see the package's README and INSTALL files. - - After building the nfs-utils package, there will be a mount.nfs binary in - the utils/mount directory. This binary can be used to initiate NFS v2, v3, - or v4 mounts. To initiate a v4 mount, the binary must be called - mount.nfs4. The standard technique is to create a symlink called - mount.nfs4 to mount.nfs. - - This mount.nfs binary should be installed at /sbin/mount.nfs as follows: - - $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs - - In this location, mount.nfs will be invoked automatically for NFS mounts - by the system mount command. - - NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed - on the NFS client machine. You do not need this specific version of - nfs-utils on the server. Furthermore, only the mount.nfs command from - nfs-utils-1.1.2 is needed on the client. - - - Install a Linux kernel with NFS/RDMA - - The NFS/RDMA client and server are both included in the mainline Linux - kernel version 2.6.25 and later. This and other versions of the Linux - kernel can be found at: - - https://www.kernel.org/pub/linux/kernel/ - - Download the sources and place them in an appropriate location. - - - Configure the RDMA stack - - Make sure your kernel configuration has RDMA support enabled. Under - Device Drivers -> InfiniBand support, update the kernel configuration - to enable InfiniBand support [NOTE: the option name is misleading. Enabling - InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)]. - - Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) 
or - iWARP adapter support (amso, cxgb3, etc.). - - If you are using InfiniBand, be sure to enable IP-over-InfiniBand support. - - - Configure the NFS client and server - - Your kernel configuration must also have NFS file system support and/or - NFS server support enabled. These and other NFS related configuration - options can be found under File Systems -> Network File Systems. - - - Build, install, reboot - - The NFS/RDMA code will be enabled automatically if NFS and RDMA - are turned on. The NFS/RDMA client and server are configured via the hidden - SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The - value of SUNRPC_XPRT_RDMA will be: - - - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client - and server will not be built - - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M, - in this case the NFS/RDMA client and server will be built as modules - - Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client - and server will be built into the kernel - - Therefore, if you have followed the steps above and turned no NFS and RDMA, - the NFS/RDMA client and server will be built. - - Build a new kernel, install it, boot it. - -Check RDMA and NFS Setup -~~~~~~~~~~~~~~~~~~~~~~~~ - - Before configuring the NFS/RDMA software, it is a good idea to test - your new kernel to ensure that the kernel is working correctly. - In particular, it is a good idea to verify that the RDMA stack - is functioning as expected and standard NFS over TCP/IP and/or UDP/IP - is working properly. - - - Check RDMA Setup - - If you built the RDMA components as modules, load them at - this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel - card: - - $ modprobe ib_mthca - $ modprobe ib_ipoib - - If you are using InfiniBand, make sure there is a Subnet Manager (SM) - running on the network. If your IB switch has an embedded SM, you can - use it. 
Otherwise, you will need to run an SM, such as OpenSM, on one - of your end nodes. - - If an SM is running on your network, you should see the following: - - $ cat /sys/class/infiniband/driverX/ports/1/state - 4: ACTIVE - - where driverX is mthca0, ipath5, ehca3, etc. - - To further test the InfiniBand software stack, use IPoIB (this - assumes you have two IB hosts named host1 and host2): - - host1$ ip link set dev ib0 up - host1$ ip address add dev ib0 a.b.c.x - host2$ ip link set dev ib0 up - host2$ ip address add dev ib0 a.b.c.y - host1$ ping a.b.c.y - host2$ ping a.b.c.x - - For other device types, follow the appropriate procedures. - - - Check NFS Setup - - For the NFS components enabled above (client and/or server), - test their functionality over standard Ethernet using TCP/IP or UDP/IP. - -NFS/RDMA Setup -~~~~~~~~~~~~~~ - - We recommend that you use two machines, one to act as the client and - one to act as the server. - - One time configuration: - - - On the server system, configure the /etc/exports file and - start the NFS/RDMA server. - - Exports entries with the following formats have been tested: - - /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) - /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) - - The IP address(es) is(are) the client's IPoIB address for an InfiniBand - HCA or the client's iWARP address(es) for an RNIC. - - NOTE: The "insecure" option must be used because the NFS/RDMA client does - not use a reserved port. 
- - Each time a machine boots: - - - Load and configure the RDMA drivers - - For InfiniBand using a Mellanox adapter: - - $ modprobe ib_mthca - $ modprobe ib_ipoib - $ ip li set dev ib0 up - $ ip addr add dev ib0 a.b.c.d - - NOTE: use unique addresses for the client and server - - - Start the NFS server - - If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in - kernel config), load the RDMA transport module: - - $ modprobe svcrdma - - Regardless of how the server was built (module or built-in), start the - server: - - $ /etc/init.d/nfs start - - or - - $ service nfs start - - Instruct the server to listen on the RDMA transport: - - $ echo rdma 20049 > /proc/fs/nfsd/portlist - - - On the client system - - If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in - kernel config), load the RDMA client module: - - $ modprobe xprtrdma.ko - - Regardless of how the client was built (module or built-in), use this - command to mount the NFS/RDMA server: - - $ mount -o rdma,port=20049 :/ /mnt - - To verify that the mount is using RDMA, run "cat /proc/mounts" and check - the "proto" field for the given mount. - - Congratulations! You're using NFS/RDMA! -- cgit 1.2.3-korg From 0f3456ba9fb61584a891fb5264cf09e4d5fe0741 Mon Sep 17 00:00:00 2001 From: "Daniel W. S. Almeida" Date: Fri, 10 Jan 2020 20:24:27 -0300 Subject: Documentation: convert nfsd-admin-interfaces to ReST Convert nfsd-admin-interfaces to ReST and move it into admin-guide. Content remains mostly untouched. Signed-off-by: Daniel W. S. 
Almeida Link: https://lore.kernel.org/r/d471305e9c96dec38f18d2ff816fca2269a88e29.1578697871.git.dwlsalmeida@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/nfs/index.rst | 1 + .../admin-guide/nfs/nfsd-admin-interfaces.rst | 40 +++++++++++++++++++++ .../filesystems/nfs/nfsd-admin-interfaces.txt | 41 ---------------------- 3 files changed, 41 insertions(+), 41 deletions(-) create mode 100644 Documentation/admin-guide/nfs/nfsd-admin-interfaces.rst delete mode 100644 Documentation/filesystems/nfs/nfsd-admin-interfaces.txt (limited to 'Documentation/filesystems') diff --git a/Documentation/admin-guide/nfs/index.rst b/Documentation/admin-guide/nfs/index.rst index 875a96fe9d04f4..e0b2f4260ad708 100644 --- a/Documentation/admin-guide/nfs/index.rst +++ b/Documentation/admin-guide/nfs/index.rst @@ -8,3 +8,4 @@ NFS nfs-client nfsroot nfs-rdma + nfsd-admin-interfaces diff --git a/Documentation/admin-guide/nfs/nfsd-admin-interfaces.rst b/Documentation/admin-guide/nfs/nfsd-admin-interfaces.rst new file mode 100644 index 00000000000000..c05926f79054b1 --- /dev/null +++ b/Documentation/admin-guide/nfs/nfsd-admin-interfaces.rst @@ -0,0 +1,40 @@ +================================== +Administrative interfaces for nfsd +================================== + +Note that normally these interfaces are used only by the utilities in +nfs-utils. + +nfsd is controlled mainly by pseudofiles under the "nfsd" filesystem, +which is normally mounted at /proc/fs/nfsd/. + +The server is always started by the first write of a nonzero value to +nfsd/threads. + +Before doing that, NFSD can be told which sockets to listen on by +writing to nfsd/portlist; that write may be: + + - an ascii-encoded file descriptor, which should refer to a + bound (and listening, for tcp) socket, or + - "transportname port", where transportname is currently either + "udp", "tcp", or "rdma". 
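The two kinds of portlist write listed above can be sketched as follows. This is an illustrative helper only, assuming the nfsd filesystem is mounted at /proc/fs/nfsd; the function names `portlist_entry` and `add_listener` are made up for this example, and on a real system these writes are normally issued by the utilities in nfs-utils.

```python
import os

# Transport names accepted in the "transportname port" form, per the text above.
KNOWN_TRANSPORTS = ("udp", "tcp", "rdma")


def portlist_entry(transport=None, port=None, fd=None):
    """Return the string for one write to nfsd/portlist.

    Pass either fd (an already bound socket's descriptor number) or a
    transport name together with a port number.
    """
    if fd is not None:
        if transport is not None or port is not None:
            raise ValueError("give either fd or transport and port, not both")
        return f"{int(fd)}\n"
    if transport not in KNOWN_TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    if port is None or not 0 < int(port) < 65536:
        raise ValueError(f"invalid port: {port!r}")
    return f"{transport} {int(port)}\n"


def add_listener(entry, nfsd_dir="/proc/fs/nfsd"):
    """Write one entry to the portlist pseudofile (requires root on a real system)."""
    with open(os.path.join(nfsd_dir, "portlist"), "w") as f:
        f.write(entry)
```

Calling `add_listener(portlist_entry("rdma", 20049))` is intended to have the same effect as `echo rdma 20049 > /proc/fs/nfsd/portlist`. The `nfsd_dir` argument exists only so that the sketch can be tried against a scratch directory.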
+ +If nfsd is started without doing any of these, then it will create one +udp and one tcp listener at port 2049 (see nfsd_init_socks). + +On startup, nfsd and lockd grace periods start. nfsd is shut down by a write of +0 to nfsd/threads. All locks and state are thrown away at that point. + +Between startup and shutdown, the number of threads may be adjusted up +or down by additional writes to nfsd/threads or by writes to +nfsd/pool_threads. + +For more detail about files under nfsd/ and what they control, see +fs/nfsd/nfsctl.c; most of them have detailed comments. + +Implementation notes +==================== + +Note that the rpc server requires the caller to serialize addition and +removal of listening sockets, and startup and shutdown of the server. +For nfsd this is done using nfsd_mutex. diff --git a/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt b/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt deleted file mode 100644 index 56a96fb08a73dd..00000000000000 --- a/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt +++ /dev/null @@ -1,41 +0,0 @@ -Administrative interfaces for nfsd -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Note that normally these interfaces are used only by the utilities in -nfs-utils. - -nfsd is controlled mainly by pseudofiles under the "nfsd" filesystem, -which is normally mounted at /proc/fs/nfsd/. - -The server is always started by the first write of a nonzero value to -nfsd/threads. - -Before doing that, NFSD can be told which sockets to listen on by -writing to nfsd/portlist; that write may be: - - - an ascii-encoded file descriptor, which should refer to a - bound (and listening, for tcp) socket, or - - "transportname port", where transportname is currently either - "udp", "tcp", or "rdma". - -If nfsd is started without doing any of these, then it will create one -udp and one tcp listener at port 2049 (see nfsd_init_socks). - -On startup, nfsd and lockd grace periods start. 
- -nfsd is shut down by a write of 0 to nfsd/threads. All locks and state -are thrown away at that point. - -Between startup and shutdown, the number of threads may be adjusted up -or down by additional writes to nfsd/threads or by writes to -nfsd/pool_threads. - -For more detail about files under nfsd/ and what they control, see -fs/nfsd/nfsctl.c; most of them have detailed comments. - -Implementation notes -^^^^^^^^^^^^^^^^^^^^ - -Note that the rpc server requires the caller to serialize addition and -removal of listening sockets, and startup and shutdown of the server. -For nfsd this is done using nfsd_mutex. -- cgit 1.2.3-korg From fbdcd0b8e56492dd85bd8d08f15a14334bb59259 Mon Sep 17 00:00:00 2001 From: "Daniel W. S. Almeida" Date: Fri, 10 Jan 2020 20:24:28 -0300 Subject: Documentation: nfs: idmapper: convert to ReST Convert idmapper.txt to ReST and move it to admin-guide. Content remains mostly unchanged otherwise. Signed-off-by: Daniel W. S. Almeida Link: https://lore.kernel.org/r/069e40cd551ea778538f8fe9ad15ee26e45fc748.1578697871.git.dwlsalmeida@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/nfs/index.rst | 1 + Documentation/admin-guide/nfs/nfs-idmapper.rst | 78 ++++++++++++++++++++++++++ Documentation/filesystems/nfs/idmapper.txt | 75 ------------------------- 3 files changed, 79 insertions(+), 75 deletions(-) create mode 100644 Documentation/admin-guide/nfs/nfs-idmapper.rst delete mode 100644 Documentation/filesystems/nfs/idmapper.txt (limited to 'Documentation/filesystems') diff --git a/Documentation/admin-guide/nfs/index.rst b/Documentation/admin-guide/nfs/index.rst index e0b2f4260ad708..8376d5225fc2a4 100644 --- a/Documentation/admin-guide/nfs/index.rst +++ b/Documentation/admin-guide/nfs/index.rst @@ -9,3 +9,4 @@ NFS nfsroot nfs-rdma nfsd-admin-interfaces + nfs-idmapper diff --git a/Documentation/admin-guide/nfs/nfs-idmapper.rst b/Documentation/admin-guide/nfs/nfs-idmapper.rst new file mode 100644 index 
00000000000000..58b8e63412d527 --- /dev/null +++ b/Documentation/admin-guide/nfs/nfs-idmapper.rst @@ -0,0 +1,78 @@ +============= +NFS ID Mapper +============= + +Id mapper is used by NFS to translate user and group ids into names, and to +translate user and group names into ids. Part of this translation involves +performing an upcall to userspace to request the information. There are two +ways NFS could obtain this information: placing a call to /sbin/request-key +or by placing a call to the rpc.idmap daemon. + +NFS will attempt to call /sbin/request-key first. If this succeeds, the +result will be cached using the generic request-key cache. This call should +only fail if /etc/request-key.conf is not configured for the id_resolver key +type, see the "Configuring" section below if you wish to use the request-key +method. + +If the call to /sbin/request-key fails (if /etc/request-key.conf is not +configured with the id_resolver key type), then the idmapper will ask the +legacy rpc.idmap daemon for the id mapping. This result will be stored +in a custom NFS idmap cache. + + +Configuring +=========== + +The file /etc/request-key.conf will need to be modified so /sbin/request-key can +direct the upcall. The following line should be added: + +``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...`` +``#====== ======= =============== =============== ===============================`` +``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600`` + + +This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap. +The last parameter, 600, defines how many seconds into the future the key will +expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout +is not specified, nfs.idmap will default to 600 seconds. 
+ +The id mapper uses the following key descriptions:: + + uid: Find the UID for the given user + gid: Find the GID for the given group + user: Find the user name for the given UID + group: Find the group name for the given GID + +You can handle any of these individually, rather than using the generic upcall +program. If you would like to use your own program for a uid lookup then you +would edit your request-key.conf so it looks similar to this: + +``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...`` +``#====== ======= =============== =============== ===============================`` +``create id_resolver uid:* * /some/other/program %k %d 600`` +``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600`` + + +Notice that the new line was added above the line for the generic program. +request-key will find the first matching line and corresponding program. In +this case, /some/other/program will handle all uid lookups and +/usr/sbin/nfs.idmap will handle gid, user, and group lookups. + +See Documentation/security/keys/request-key.rst for more information +about the request-key function. + + +nfs.idmap +========= + +nfs.idmap is designed to be called by request-key, and should not be run "by +hand". This program takes two arguments, a serialized key and a key +description. The serialized key is first converted into a key_serial_t, and +then passed as an argument to keyctl_instantiate (both are part of keyutils.h). + +The actual lookups are performed by functions found in nfsidmap.h. nfs.idmap +determines the correct function to call by looking at the first part of the +description string. For example, a uid lookup description will appear as +"uid:user@domain". + +nfs.idmap will return 0 if the key was instantiated, and non-zero otherwise.
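The first-match rule for request-key.conf and the "kind:name" description format described in nfs-idmapper.rst above can be illustrated with a small sketch. This is not how request-key or nfs.idmap is implemented: the names `RULES`, `pick_program` and `split_description` are hypothetical, and Python's `fnmatchcase` is only assumed to approximate the wildcard matching that request-key performs.

```python
from fnmatch import fnmatchcase

# Rules in file order: (op, key type, description pattern, callout pattern, program).
# These mirror the two example lines from the document; the tuple layout is
# an assumption made for this sketch.
RULES = [
    ("create", "id_resolver", "uid:*", "*", "/some/other/program"),
    ("create", "id_resolver", "*", "*", "/usr/sbin/nfs.idmap"),
]


def pick_program(op, key_type, description, callout="", rules=RULES):
    """Return the program of the first rule that matches, or None."""
    for r_op, r_type, r_desc, r_callout, program in rules:
        if (r_op == op and r_type == key_type
                and fnmatchcase(description, r_desc)
                and fnmatchcase(callout, r_callout)):
            return program  # first match wins, so rule order matters
    return None


def split_description(description):
    """Split a description such as 'uid:user@domain' into (kind, name)."""
    kind, _, name = description.partition(":")
    return kind, name


if __name__ == "__main__":
    for desc in ("uid:user@domain", "gid:staff@domain"):
        print(desc, "->", pick_program("create", "id_resolver", desc))
    print(split_description("uid:user@domain"))
```

Swapping the two entries in `RULES` would send every lookup to /usr/sbin/nfs.idmap, which is why the document stresses that the specific line must come before the generic one.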
diff --git a/Documentation/filesystems/nfs/idmapper.txt b/Documentation/filesystems/nfs/idmapper.txt deleted file mode 100644 index b86831acd5834e..00000000000000 --- a/Documentation/filesystems/nfs/idmapper.txt +++ /dev/null @@ -1,75 +0,0 @@ - -========= -ID Mapper -========= -Id mapper is used by NFS to translate user and group ids into names, and to -translate user and group names into ids. Part of this translation involves -performing an upcall to userspace to request the information. There are two -ways NFS could obtain this information: placing a call to /sbin/request-key -or by placing a call to the rpc.idmap daemon. - -NFS will attempt to call /sbin/request-key first. If this succeeds, the -result will be cached using the generic request-key cache. This call should -only fail if /etc/request-key.conf is not configured for the id_resolver key -type, see the "Configuring" section below if you wish to use the request-key -method. - -If the call to /sbin/request-key fails (if /etc/request-key.conf is not -configured with the id_resolver key type), then the idmapper will ask the -legacy rpc.idmap daemon for the id mapping. This result will be stored -in a custom NFS idmap cache. - - -=========== -Configuring -=========== -The file /etc/request-key.conf will need to be modified so /sbin/request-key can -direct the upcall. The following line should be added: - -#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ... -#====== ======= =============== =============== =============================== -create id_resolver * * /usr/sbin/nfs.idmap %k %d 600 - -This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap. -The last parameter, 600, defines how many seconds into the future the key will -expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout -is not specified, nfs.idmap will default to 600 seconds. 
- -id mapper uses for key descriptions: - uid: Find the UID for the given user - gid: Find the GID for the given group - user: Find the user name for the given UID - group: Find the group name for the given GID - -You can handle any of these individually, rather than using the generic upcall -program. If you would like to use your own program for a uid lookup then you -would edit your request-key.conf so it look similar to this: - -#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ... -#====== ======= =============== =============== =============================== -create id_resolver uid:* * /some/other/program %k %d 600 -create id_resolver * * /usr/sbin/nfs.idmap %k %d 600 - -Notice that the new line was added above the line for the generic program. -request-key will find the first matching line and corresponding program. In -this case, /some/other/program will handle all uid lookups and -/usr/sbin/nfs.idmap will handle gid, user, and group lookups. - -See for more information -about the request-key function. - - -========= -nfs.idmap -========= -nfs.idmap is designed to be called by request-key, and should not be run "by -hand". This program takes two arguments, a serialized key and a key -description. The serialized key is first converted into a key_serial_t, and -then passed as an argument to keyctl_instantiate (both are part of keyutils.h). - -The actual lookups are performed by functions found in nfsidmap.h. nfs.idmap -determines the correct function to call by looking at the first part of the -description string. For example, a uid lookup description will appear as -"uid:user@domain". - -nfs.idmap will return 0 if the key was instantiated, and non-zero otherwise. -- cgit 1.2.3-korg From 26f6225fa53dc4ad26b9d9d712c0f55a92eb2c23 Mon Sep 17 00:00:00 2001 From: "Daniel W. S. Almeida" Date: Fri, 10 Jan 2020 20:24:29 -0300 Subject: Documentation: nfs: convert pnfs-block-server to ReST Convert pnfs-block-server.txt to ReST and move it to admin-guide. 
Content remains mostly unchanged. Signed-off-by: Daniel W. S. Almeida Link: https://lore.kernel.org/r/c06903760e690c16d9df92f5e75f80381d6326d8.1578697871.git.dwlsalmeida@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/nfs/index.rst | 1 + .../admin-guide/nfs/pnfs-block-server.rst | 42 ++++++++++++++++++++++ .../filesystems/nfs/pnfs-block-server.txt | 37 ------------------- 3 files changed, 43 insertions(+), 37 deletions(-) create mode 100644 Documentation/admin-guide/nfs/pnfs-block-server.rst delete mode 100644 Documentation/filesystems/nfs/pnfs-block-server.txt (limited to 'Documentation/filesystems') diff --git a/Documentation/admin-guide/nfs/index.rst b/Documentation/admin-guide/nfs/index.rst index 8376d5225fc2a4..365f42a611a435 100644 --- a/Documentation/admin-guide/nfs/index.rst +++ b/Documentation/admin-guide/nfs/index.rst @@ -10,3 +10,4 @@ NFS nfs-rdma nfsd-admin-interfaces nfs-idmapper + pnfs-block-server diff --git a/Documentation/admin-guide/nfs/pnfs-block-server.rst b/Documentation/admin-guide/nfs/pnfs-block-server.rst new file mode 100644 index 00000000000000..b00a2e705cc4d6 --- /dev/null +++ b/Documentation/admin-guide/nfs/pnfs-block-server.rst @@ -0,0 +1,42 @@ +=================================== +pNFS block layout server user guide +=================================== + +The Linux NFS server now supports the pNFS block layout extension. In this +case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition +to handling all the metadata access to the NFS export also hands out layouts +to the clients to directly access the underlying block devices that are +shared with the client. + +To use pNFS block layouts with the Linux NFS server the exported file +system needs to support the pNFS block layouts (currently just XFS), and the +file system must sit on shared storage (typically iSCSI) that is accessible +to the clients in addition to the MDS.
As of now the file system needs to +sit directly on the exported volume; striping or concatenation of +volumes on the MDS and clients is not supported yet. + +On the server, pNFS block volume support is automatically enabled if the file system +supports it. On the client make sure the kernel has the CONFIG_PNFS_BLOCK +option enabled, the blkmapd daemon from nfs-utils is running, and the +file system is mounted using the NFSv4.1 protocol version (mount -o vers=4.1). + +If the nfsd server needs to fence a non-responding client it calls +/sbin/nfsd-recall-failed with the first argument set to the IP address of +the client, and the second argument set to the device node without the /dev +prefix for the file system to be fenced. Below is an example file that shows +how to translate the device into a serial number from SCSI EVPD 0x80:: + + cat > /sbin/nfsd-recall-failed << 'EOF' + +.. code-block:: sh + + #!/bin/sh + + CLIENT="$1" + DEV="/dev/$2" + EVPD=`sg_inq --page=0x80 ${DEV} | \ + grep "Unit serial number:" | \ + awk -F ': ' '{print $2}'` + + echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log + EOF diff --git a/Documentation/filesystems/nfs/pnfs-block-server.txt b/Documentation/filesystems/nfs/pnfs-block-server.txt deleted file mode 100644 index 2143673cf1544b..00000000000000 --- a/Documentation/filesystems/nfs/pnfs-block-server.txt +++ /dev/null @@ -1,37 +0,0 @@ -pNFS block layout server user guide - -The Linux NFS server now supports the pNFS block layout extension. In this -case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition -to handling all the metadata access to the NFS export also hands out layouts -to the clients to directly access the underlying block devices that are -shared with the client.
- -To use pNFS block layouts with with the Linux NFS server the exported file -system needs to support the pNFS block layouts (currently just XFS), and the -file system must sit on shared storage (typically iSCSI) that is accessible -to the clients in addition to the MDS. As of now the file system needs to -sit directly on the exported volume, striping or concatenation of -volumes on the MDS and clients is not supported yet. - -On the server, pNFS block volume support is automatically if the file system -support it. On the client make sure the kernel has the CONFIG_PNFS_BLOCK -option enabled, the blkmapd daemon from nfs-utils is running, and the -file system is mounted using the NFSv4.1 protocol version (mount -o vers=4.1). - -If the nfsd server needs to fence a non-responding client it calls -/sbin/nfsd-recall-failed with the first argument set to the IP address of -the client, and the second argument set to the device node without the /dev -prefix for the file system to be fenced. Below is an example file that shows -how to translate the device into a serial number from SCSI EVPD 0x80: - -cat > /sbin/nfsd-recall-failed << EOF -#!/bin/sh - -CLIENT="$1" -DEV="/dev/$2" -EVPD=`sg_inq --page=0x80 ${DEV} | \ - grep "Unit serial number:" | \ - awk -F ': ' '{print $2}'` - -echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log -EOF -- cgit 1.2.3-korg From 98600b71f2bfc066d5dc8a25abf5fef84f8fc96c Mon Sep 17 00:00:00 2001 From: "Daniel W. S. Almeida" Date: Fri, 10 Jan 2020 20:24:30 -0300 Subject: Documentation: nfs: pnfs-scsi-server: convert to ReST Convert pnfs-scsi-server to ReST and move it to admin-guide. Content remains mostly unchanged. Signed-off-by: Daniel W. S. 
Almeida Link: https://lore.kernel.org/r/5c4b8af41ca0a427a3987535815bccf47a65d320.1578697871.git.dwlsalmeida@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/nfs/index.rst | 1 + Documentation/admin-guide/nfs/pnfs-scsi-server.rst | 24 ++++++++++++++++++++++ Documentation/filesystems/nfs/pnfs-scsi-server.txt | 23 --------------------- 3 files changed, 25 insertions(+), 23 deletions(-) create mode 100644 Documentation/admin-guide/nfs/pnfs-scsi-server.rst delete mode 100644 Documentation/filesystems/nfs/pnfs-scsi-server.txt (limited to 'Documentation/filesystems') diff --git a/Documentation/admin-guide/nfs/index.rst b/Documentation/admin-guide/nfs/index.rst index 365f42a611a435..3601a708f33302 100644 --- a/Documentation/admin-guide/nfs/index.rst +++ b/Documentation/admin-guide/nfs/index.rst @@ -11,3 +11,4 @@ NFS nfsd-admin-interfaces nfs-idmapper pnfs-block-server + pnfs-scsi-server diff --git a/Documentation/admin-guide/nfs/pnfs-scsi-server.rst b/Documentation/admin-guide/nfs/pnfs-scsi-server.rst new file mode 100644 index 00000000000000..d2f6ee55807117 --- /dev/null +++ b/Documentation/admin-guide/nfs/pnfs-scsi-server.rst @@ -0,0 +1,24 @@ + +================================== +pNFS SCSI layout server user guide +================================== + +This document describes support for pNFS SCSI layouts in the Linux NFS server. +With pNFS SCSI layouts, the NFS server acts as Metadata Server (MDS) for pNFS, +which in addition to handling all the metadata access to the NFS export, +also hands out layouts to the clients so that they can directly access the +underlying SCSI LUNs that are shared with the client. + +To use pNFS SCSI layouts with the Linux NFS server, the exported file +system needs to support the pNFS SCSI layouts (currently just XFS), and the +file system must sit on a SCSI LUN that is accessible to the clients in +addition to the MDS.
As of now the file system needs to sit directly on the +exported LUN; striping or concatenation of LUNs on the MDS and clients +is not supported yet. + +On a server built with CONFIG_NFSD_SCSI, the pNFS SCSI volume support is +automatically enabled if the file system is exported using the "pnfs" +option and the underlying SCSI device supports persistent reservations. +On the client make sure the kernel has the CONFIG_PNFS_BLOCK option +enabled, and the file system is mounted using the NFSv4.1 protocol +version (mount -o vers=4.1). diff --git a/Documentation/filesystems/nfs/pnfs-scsi-server.txt b/Documentation/filesystems/nfs/pnfs-scsi-server.txt deleted file mode 100644 index 5bef7268bd9fb6..00000000000000 --- a/Documentation/filesystems/nfs/pnfs-scsi-server.txt +++ /dev/null @@ -1,23 +0,0 @@ - -pNFS SCSI layout server user guide -================================== - -This document describes support for pNFS SCSI layouts in the Linux NFS server. -With pNFS SCSI layouts, the NFS server acts as Metadata Server (MDS) for pNFS, -which in addition to handling all the metadata access to the NFS export, -also hands out layouts to the clients so that they can directly access the -underlying SCSI LUNs that are shared with the client. - -To use pNFS SCSI layouts with with the Linux NFS server, the exported file -system needs to support the pNFS SCSI layouts (currently just XFS), and the -file system must sit on a SCSI LUN that is accessible to the clients in -addition to the MDS. As of now the file system needs to sit directly on the -exported LUN, striping or concatenation of LUNs on the MDS and clients -is not supported yet. - -On a server built with CONFIG_NFSD_SCSI, the pNFS SCSI volume support is -automatically enabled if the file system is exported using the "pnfs" -option and the underlying SCSI device support persistent reservations.
-On the client make sure the kernel has the CONFIG_PNFS_BLOCK option -enabled, and the file system is mounted using the NFSv4.1 protocol -version (mount -o vers=4.1). -- cgit 1.2.3-korg From 6996e8ca8ba9727aac967577277c25b91f11705a Mon Sep 17 00:00:00 2001 From: "Daniel W. S. Almeida" Date: Fri, 10 Jan 2020 20:24:31 -0300 Subject: Documentation: nfs: fault_injection: convert to ReST Convert fault_injection.txt to ReST and move it to admin-guide. Signed-off-by: Daniel W. S. Almeida Link: https://lore.kernel.org/r/f7b0cf8fb1159a668f75ce82a581e7590568c2b8.1578697871.git.dwlsalmeida@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/nfs/fault_injection.rst | 70 +++++++++++++++++++++++ Documentation/admin-guide/nfs/index.rst | 1 + Documentation/filesystems/nfs/fault_injection.txt | 69 ---------------------- 3 files changed, 71 insertions(+), 69 deletions(-) create mode 100644 Documentation/admin-guide/nfs/fault_injection.rst delete mode 100644 Documentation/filesystems/nfs/fault_injection.txt (limited to 'Documentation/filesystems') diff --git a/Documentation/admin-guide/nfs/fault_injection.rst b/Documentation/admin-guide/nfs/fault_injection.rst new file mode 100644 index 00000000000000..eb029c0c15ce50 --- /dev/null +++ b/Documentation/admin-guide/nfs/fault_injection.rst @@ -0,0 +1,70 @@ +=================== +NFS Fault Injection +=================== + +Fault injection is a method for forcing errors that may not normally occur, or +may be difficult to reproduce. Forcing these errors in a controlled environment +can help the developer find and fix bugs before their code is shipped in a +production system. Injecting an error on the Linux NFS server will allow us to +observe how the client reacts and if it manages to recover its state correctly. + +NFSD_FAULT_INJECTION must be selected when configuring the kernel to use this +feature. 
+ + +Using Fault Injection +===================== +On the client, mount the fault injection server through NFS v4.0+ and do some +work over NFS (open files, take locks, ...). + +On the server, mount the debugfs filesystem to and ls +/nfsd. This will show a list of files that will be used for +injecting faults on the NFS server. As root, write a number n to the file +corresponding to the action you want the server to take. The server will then +process the first n items it finds. So if you want to forget 5 locks, echo '5' +to /nfsd/forget_locks. A value of 0 will tell the server to forget +all corresponding items. A log message will be created containing the number +of items forgotten (check dmesg). + +Go back to work on the client and check if the client recovered from the error +correctly. + + +Available Faults +================ +forget_clients: + The NFS server keeps a list of clients that have placed a mount call. If + this list is cleared, the server will have no knowledge of who the client + is, forcing the client to reauthenticate with the server. + +forget_openowners: + The NFS server keeps a list of what files are currently opened and who + they were opened by. Clearing this list will force the client to reopen + its files. + +forget_locks: + The NFS server keeps a list of what files are currently locked in the VFS. + Clearing this list will force the client to reclaim its locks (files are + unlocked through the VFS as they are cleared from this list). + +forget_delegations: + A delegation is used to assure the client that a file, or part of a file, + has not changed since the delegation was awarded. Clearing this list will + force the client to reacquire its delegation before accessing the file + again. + +recall_delegations: + Delegations can be recalled by the server when another client attempts to + access a file. 
This test will notify the client that its delegation has + been revoked, forcing the client to reacquire the delegation before using + the file again. + + +tools/nfs/inject_faults.sh script +================================= +This script has been created to ease the fault injection process. This script +will detect the mounted debugfs directory and write to the files located there +based on the arguments passed by the user. For example, running +`inject_faults.sh forget_locks 1` as root will instruct the server to forget +one lock. Running `inject_faults.sh forget_locks` will instruct the server to +forget all locks. diff --git a/Documentation/admin-guide/nfs/index.rst b/Documentation/admin-guide/nfs/index.rst index 3601a708f33302..6b5a3c90fac562 100644 --- a/Documentation/admin-guide/nfs/index.rst +++ b/Documentation/admin-guide/nfs/index.rst @@ -12,3 +12,4 @@ NFS nfs-idmapper pnfs-block-server pnfs-scsi-server + fault_injection diff --git a/Documentation/filesystems/nfs/fault_injection.txt b/Documentation/filesystems/nfs/fault_injection.txt deleted file mode 100644 index f3a5b0a8ac0526..00000000000000 --- a/Documentation/filesystems/nfs/fault_injection.txt +++ /dev/null @@ -1,69 +0,0 @@ - -Fault Injection -=============== -Fault injection is a method for forcing errors that may not normally occur, or -may be difficult to reproduce. Forcing these errors in a controlled environment -can help the developer find and fix bugs before their code is shipped in a -production system. Injecting an error on the Linux NFS server will allow us to -observe how the client reacts and if it manages to recover its state correctly. - -NFSD_FAULT_INJECTION must be selected when configuring the kernel to use this -feature. - - -Using Fault Injection -===================== -On the client, mount the fault injection server through NFS v4.0+ and do some -work over NFS (open files, take locks, ...). - -On the server, mount the debugfs filesystem to and ls -/nfsd.
This will show a list of files that will be used for -injecting faults on the NFS server. As root, write a number n to the file -corresponding to the action you want the server to take. The server will then -process the first n items it finds. So if you want to forget 5 locks, echo '5' -to /nfsd/forget_locks. A value of 0 will tell the server to forget -all corresponding items. A log message will be created containing the number -of items forgotten (check dmesg). - -Go back to work on the client and check if the client recovered from the error -correctly. - - -Available Faults -================ -forget_clients: - The NFS server keeps a list of clients that have placed a mount call. If - this list is cleared, the server will have no knowledge of who the client - is, forcing the client to reauthenticate with the server. - -forget_openowners: - The NFS server keeps a list of what files are currently opened and who - they were opened by. Clearing this list will force the client to reopen - its files. - -forget_locks: - The NFS server keeps a list of what files are currently locked in the VFS. - Clearing this list will force the client to reclaim its locks (files are - unlocked through the VFS as they are cleared from this list). - -forget_delegations: - A delegation is used to assure the client that a file, or part of a file, - has not changed since the delegation was awarded. Clearing this list will - force the client to reacquire its delegation before accessing the file - again. - -recall_delegations: - Delegations can be recalled by the server when another client attempts to - access a file. This test will notify the client that its delegation has - been revoked, forcing the client to reacquire the delegation before using - the file again. - - -tools/nfs/inject_faults.sh script -================================= -This script has been created to ease the fault injection process. 
This script -will detect the mounted debugfs directory and write to the files located there -based on the arguments passed by the user. For example, running -`inject_faults.sh forget_locks 1` as root will instruct the server to forget -one lock. Running `inject_faults forget_locks` will instruct the server to -forgetall locks. -- cgit 1.2.3-korg From 77ce1a47ebca88bf1eb3018855fc1709c7a1ed86 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 28 Jan 2020 07:41:01 +0100 Subject: docs: filesystems: add overlayfs to index.rst While the document is there, it is currently missing at the index file. Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/3b8e7783b1fcc71e4f94af5ea8e5fa264392f8c4.1580193653.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet --- Documentation/filesystems/index.rst | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index b03578063801e8..824a3ecbb0ca5a 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -47,5 +47,6 @@ Documentation for filesystem implementations. :maxdepth: 2 autofs + overlayfs virtiofs vfat -- cgit 1.2.3-korg
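The fault injection procedure described in the fault_injection.rst patch above (write a count n to the file named after the fault, where 0 means "all items") can be sketched as follows. This is an illustration under stated assumptions, not the tools/nfs/inject_faults.sh script: the helper name `inject_fault` is hypothetical, and the nfsd directory under debugfs is passed in by the caller because the mount point is site-specific. On a real server the call needs root and a kernel built with NFSD_FAULT_INJECTION; the demo below only writes into a temporary directory.

```python
import os
import tempfile

# Fault files listed in the "Available Faults" section of the document.
KNOWN_FAULTS = (
    "forget_clients",
    "forget_openowners",
    "forget_locks",
    "forget_delegations",
    "recall_delegations",
)


def inject_fault(nfsd_debug_dir, fault, count=0):
    """Write `count` to the fault file; 0 asks the server to process all items.

    `nfsd_debug_dir` is the nfsd directory below the debugfs mount point
    (an assumption of this sketch: the caller supplies it).
    """
    if fault not in KNOWN_FAULTS:
        raise ValueError(f"unknown fault: {fault}")
    if count < 0:
        raise ValueError("count must be zero or positive")
    path = os.path.join(nfsd_debug_dir, fault)
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    with open(path, "w") as f:
        f.write(f"{count}\n")  # same payload as: echo 5 > .../forget_locks
    return path


if __name__ == "__main__":
    # Stand-in directory so the sketch runs without touching a real server.
    with tempfile.TemporaryDirectory() as fake_dir:
        open(os.path.join(fake_dir, "forget_locks"), "w").close()
        print("wrote to", inject_fault(fake_dir, "forget_locks", 5))
```

After a real injection, the document's advice still applies: check dmesg on the server for the number of items forgotten, then verify on the client that state was recovered.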