File: proc_sys_fs.5

package info (click to toggle)
manpages 6.9.1-1
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 19,808 kB
sloc: sh: 503; python: 222; perl: 165; makefile: 27; lisp: 22
file content (471 lines) | stat: -rw-r--r-- 14,961 bytes
.\" Copyright (C) 1994, 1995, Daniel Quinlan <quinlan@yggdrasil.com>
.\" Copyright (C) 2002-2008, 2017, Michael Kerrisk <mtk.manpages@gmail.com>
.\" Copyright (C) , Andries Brouwer <aeb@cwi.nl>
.\" Copyright (C) 2023, Alejandro Colomar <alx@kernel.org>
.\"
.\" SPDX-License-Identifier: GPL-3.0-or-later
.\"
.TH proc_sys_fs 5 2024-05-02 "Linux man-pages (unreleased)"
.SH NAME
/proc/sys/fs/ \- kernel variables related to filesystems
.SH DESCRIPTION
.TP
.I /proc/sys/fs/
This directory contains the files and subdirectories for kernel variables
related to filesystems.
.TP
.IR /proc/sys/fs/aio\-max\-nr " and " /proc/sys/fs/aio\-nr " (since Linux 2.6.4)"
.I aio\-nr
is the running total of the number of events specified by
.BR io_setup (2)
calls for all currently active AIO contexts.
If
.I aio\-nr
reaches
.IR aio\-max\-nr ,
then
.BR io_setup (2)
will fail with the error
.BR EAGAIN .
Raising
.I aio\-max\-nr
does not result in the preallocation or resizing
of any kernel data structures.
.TP
.I /proc/sys/fs/binfmt_misc
Documentation for files in this directory can be found
in the Linux kernel source in the file
.I Documentation/admin\-guide/binfmt\-misc.rst
(or in
.I Documentation/binfmt_misc.txt
on older kernels).
.TP
.IR /proc/sys/fs/dentry\-state " (since Linux 2.2)"
This file contains information about the status of the
directory cache (dcache).
The file contains six numbers,
.IR nr_dentry ,
.IR nr_unused ,
.I age_limit
(age in seconds),
.I want_pages
(pages requested by system) and two dummy values.
.RS
.IP \[bu] 3
.I nr_dentry
is the number of allocated dentries (dcache entries).
This field is unused in Linux 2.2.
.IP \[bu]
.I nr_unused
is the number of unused dentries.
.IP \[bu]
.I age_limit
.\" looks like this is unused in Linux 2.2 to Linux 2.6
is the age in seconds after which dcache entries
can be reclaimed when memory is short.
.IP \[bu]
.I want_pages
.\" looks like this is unused in Linux 2.2 to Linux 2.6
is nonzero when the kernel has called shrink_dcache_pages() and the
dcache isn't pruned yet.
.RE
.TP
.I /proc/sys/fs/dir\-notify\-enable
This file can be used to disable or enable the
.I dnotify
interface described in
.BR fcntl (2)
on a system-wide basis.
A value of 0 in this file disables the interface,
and a value of 1 enables it.
.TP
.I /proc/sys/fs/dquot\-max
This file shows the maximum number of cached disk quota entries.
On some (2.4) systems, it is not present.
If the number of free cached disk quota entries is very low and
you have some awesome number of simultaneous system users,
you might want to raise the limit.
.TP
.I /proc/sys/fs/dquot\-nr
This file shows the number of allocated disk quota
entries and the number of free disk quota entries.
.TP
.IR /proc/sys/fs/epoll/ " (since Linux 2.6.28)"
This directory contains the file
.IR max_user_watches ,
which can be used to limit the amount of kernel memory consumed by the
.I epoll
interface.
For further details, see
.BR epoll (7).
.TP
.I /proc/sys/fs/file\-max
This file defines
a system-wide limit on the number of open files for all processes.
System calls that fail when encountering this limit fail with the error
.BR ENFILE .
(See also
.BR setrlimit (2),
which can be used by a process to set the per-process limit,
.BR RLIMIT_NOFILE ,
on the number of files it may open.)
If you get lots
of error messages in the kernel log about running out of file handles
(open file descriptions)
(look for "VFS: file\-max limit <number> reached"),
try increasing this value:
.IP
.in +4n
.EX
echo 100000 > /proc/sys/fs/file\-max
.EE
.in
.IP
Privileged processes
.RB ( CAP_SYS_ADMIN )
can override the
.I file\-max
limit.
.TP
.I /proc/sys/fs/file\-nr
This (read-only) file contains three numbers:
the number of allocated file handles
(i.e., the number of open file descriptions; see
.BR open (2));
the number of free file handles;
and the maximum number of file handles (i.e., the same value as
.IR /proc/sys/fs/file\-max ).
If the number of allocated file handles is close to the
maximum, you should consider increasing the maximum.
Before Linux 2.6,
the kernel allocated file handles dynamically,
but it didn't free them again.
Instead the free file handles were kept in a list for reallocation;
the "free file handles" value indicates the size of that list.
A large number of free file handles indicates that there was
a past peak in the usage of open file handles.
Since Linux 2.6, the kernel does deallocate freed file handles,
and the "free file handles" value is always zero.
.TP
.IR /proc/sys/fs/inode\-max " (only present until Linux 2.2)"
This file contains the maximum number of in-memory inodes.
This value should be 3\[en]4 times larger
than the value in
.IR file\-max ,
since \fIstdin\fP, \fIstdout\fP
and network sockets also need an inode to handle them.
When you regularly run out of inodes, you need to increase this value.
.IP
Starting with Linux 2.4,
there is no longer a static limit on the number of inodes,
and this file is removed.
.TP
.I /proc/sys/fs/inode\-nr
This file contains the first two values from
.IR inode\-state .
.TP
.I /proc/sys/fs/inode\-state
This file
contains seven numbers:
.IR nr_inodes ,
.IR nr_free_inodes ,
.IR preshrink ,
and four dummy values (always zero).
.IP
.I nr_inodes
is the number of inodes the system has allocated.
.\" This can be slightly more than
.\" .I inode\-max
.\" because Linux allocates them one page full at a time.
.I nr_free_inodes
represents the number of free inodes.
.IP
.I preshrink
is nonzero when the
.I nr_inodes
>
.I inode\-max
and the system needs to prune the inode list instead of allocating more;
since Linux 2.4, this field is a dummy value (always zero).
.TP
.IR /proc/sys/fs/inotify/ " (since Linux 2.6.13)"
This directory contains files
.IR max_queued_events ", " max_user_instances ", and " max_user_watches ,
that can be used to limit the amount of kernel memory consumed by the
.I inotify
interface.
For further details, see
.BR inotify (7).
.TP
.I /proc/sys/fs/lease\-break\-time
This file specifies the grace period that the kernel grants to a process
holding a file lease
.RB ( fcntl (2))
after it has sent a signal to that process notifying it
that another process is waiting to open the file.
If the lease holder does not remove or downgrade the lease within
this grace period, the kernel forcibly breaks the lease.
.TP
.I /proc/sys/fs/leases\-enable
This file can be used to enable or disable file leases
.RB ( fcntl (2))
on a system-wide basis.
If this file contains the value 0, leases are disabled.
A nonzero value enables leases.
.TP
.IR /proc/sys/fs/mount\-max " (since Linux 4.9)"
.\" commit d29216842a85c7970c536108e093963f02714498
The value in this file specifies the maximum number of mounts that may exist
in a mount namespace.
The default value in this file is 100,000.
.TP
.IR /proc/sys/fs/mqueue/ " (since Linux 2.6.6)"
This directory contains files
.IR msg_max ", " msgsize_max ", and " queues_max ,
controlling the resources used by POSIX message queues.
See
.BR mq_overview (7)
for details.
.TP
.IR /proc/sys/fs/nr_open " (since Linux 2.6.25)"
.\" commit 9cfe015aa424b3c003baba3841a60dd9b5ad319b
This file imposes a ceiling on the value to which the
.B RLIMIT_NOFILE
resource limit can be raised (see
.BR getrlimit (2)).
This ceiling is enforced for both unprivileged and privileged process.
The default value in this file is 1048576.
(Before Linux 2.6.25, the ceiling for
.B RLIMIT_NOFILE
was hard-coded to the same value.)
.TP
.IR /proc/sys/fs/overflowgid " and " /proc/sys/fs/overflowuid
These files
allow you to change the value of the fixed UID and GID.
The default is 65534.
Some filesystems support only 16-bit UIDs and GIDs, although in Linux
UIDs and GIDs are 32 bits.
When one of these filesystems is mounted
with writes enabled, any UID or GID that would exceed 65535 is translated
to the overflow value before being written to disk.
.TP
.IR /proc/sys/fs/pipe\-max\-size " (since Linux 2.6.35)"
See
.BR pipe (7).
.TP
.IR /proc/sys/fs/pipe\-user\-pages\-hard " (since Linux 4.5)"
See
.BR pipe (7).
.TP
.IR /proc/sys/fs/pipe\-user\-pages\-soft " (since Linux 4.5)"
See
.BR pipe (7).
.TP
.IR /proc/sys/fs/protected_fifos " (since Linux 4.19)"
The value in this file is/can be set to one of the following:
.RS
.TP 4
0
Writing to FIFOs is unrestricted.
.TP
1
Don't allow
.B O_CREAT
.BR open (2)
on FIFOs that the caller doesn't own in world-writable sticky directories,
unless the FIFO is owned by the owner of the directory.
.TP
2
As for the value 1,
but the restriction also applies to group-writable sticky directories.
.RE
.IP
The intent of the above protections is to avoid unintentional writes to an
attacker-controlled FIFO when a program expected to create a regular file.
.TP
.IR /proc/sys/fs/protected_hardlinks " (since Linux 3.6)"
.\" commit 800179c9b8a1e796e441674776d11cd4c05d61d7
When the value in this file is 0,
no restrictions are placed on the creation of hard links
(i.e., this is the historical behavior before Linux 3.6).
When the value in this file is 1,
a hard link can be created to a target file
only if one of the following conditions is true:
.RS
.IP \[bu] 3
The calling process has the
.B CAP_FOWNER
capability in its user namespace
and the file UID has a mapping in the namespace.
.IP \[bu]
The filesystem UID of the process creating the link matches
the owner (UID) of the target file
(as described in
.BR credentials (7),
a process's filesystem UID is normally the same as its effective UID).
.IP \[bu]
All of the following conditions are true:
.RS 4
.IP \[bu] 3
the target is a regular file;
.IP \[bu]
the target file does not have its set-user-ID mode bit enabled;
.IP \[bu]
the target file does not have both its set-group-ID and
group-executable mode bits enabled; and
.IP \[bu]
the caller has permission to read and write the target file
(either via the file's permissions mask or because it has
suitable capabilities).
.RE
.RE
.IP
The default value in this file is 0.
Setting the value to 1
prevents a longstanding class of security issues caused by
hard-link-based time-of-check, time-of-use races,
most commonly seen in world-writable directories such as
.IR /tmp .
The common method of exploiting this flaw
is to cross privilege boundaries when following a given hard link
(i.e., a root process follows a hard link created by another user).
Additionally, on systems without separated partitions,
this stops unauthorized users from "pinning" vulnerable set-user-ID and
set-group-ID files against being upgraded by
the administrator, or linking to special files.
.TP
.IR /proc/sys/fs/protected_regular " (since Linux 4.19)"
The value in this file is/can be set to one of the following:
.RS
.TP 4
0
Writing to regular files is unrestricted.
.TP
1
Don't allow
.B O_CREAT
.BR open (2)
on regular files that the caller doesn't own in
world-writable sticky directories,
unless the regular file is owned by the owner of the directory.
.TP
2
As for the value 1,
but the restriction also applies to group-writable sticky directories.
.RE
.IP
The intent of the above protections is similar to
.IR protected_fifos ,
but allows an application to
avoid writes to an attacker-controlled regular file,
where the application expected to create one.
.TP
.IR /proc/sys/fs/protected_symlinks " (since Linux 3.6)"
.\" commit 800179c9b8a1e796e441674776d11cd4c05d61d7
When the value in this file is 0,
no restrictions are placed on following symbolic links
(i.e., this is the historical behavior before Linux 3.6).
When the value in this file is 1, symbolic links are followed only
in the following circumstances:
.RS
.IP \[bu] 3
the filesystem UID of the process following the link matches
the owner (UID) of the symbolic link
(as described in
.BR credentials (7),
a process's filesystem UID is normally the same as its effective UID);
.IP \[bu]
the link is not in a sticky world-writable directory; or
.IP \[bu]
the symbolic link and its parent directory have the same owner (UID)
.RE
.IP
A system call that fails to follow a symbolic link
because of the above restrictions returns the error
.B EACCES
in
.IR errno .
.IP
The default value in this file is 0.
Setting the value to 1 avoids a longstanding class of security issues
based on time-of-check, time-of-use races when accessing symbolic links.
.TP
.IR /proc/sys/fs/suid_dumpable " (since Linux 2.6.13)"
.\" The following is based on text from Documentation/sysctl/kernel.txt
The value in this file is assigned to a process's "dumpable" flag
in the circumstances described in
.BR prctl (2).
In effect,
the value in this file determines whether core dump files are
produced for set-user-ID or otherwise protected/tainted binaries.
The "dumpable" setting also affects the ownership of files in a process's
.IR /proc/ pid
directory, as described above.
.IP
Three different integer values can be specified:
.RS
.TP
\fI0\ (default)\fP
.\" In kernel source: SUID_DUMP_DISABLE
This provides the traditional (pre-Linux 2.6.13) behavior.
A core dump will not be produced for a process which has
changed credentials (by calling
.BR seteuid (2),
.BR setgid (2),
or similar, or by executing a set-user-ID or set-group-ID program)
or whose binary does not have read permission enabled.
.TP
\fI1\ ("debug")\fP
.\" In kernel source: SUID_DUMP_USER
All processes dump core when possible.
(Reasons why a process might nevertheless not dump core are described in
.BR core (5).)
The core dump is owned by the filesystem user ID of the dumping process
and no security is applied.
This is intended for system debugging situations only:
this mode is insecure because it allows unprivileged users to
examine the memory contents of privileged processes.
.TP
\fI2\ ("suidsafe")\fP
.\" In kernel source: SUID_DUMP_ROOT
Any binary which normally would not be dumped (see "0" above)
is dumped readable by root only.
This allows the user to remove the core dump file but not to read it.
For security reasons core dumps in this mode will not overwrite one
another or other files.
This mode is appropriate when administrators are
attempting to debug problems in a normal environment.
.IP
Additionally, since Linux 3.6,
.\" 9520628e8ceb69fa9a4aee6b57f22675d9e1b709
.I /proc/sys/kernel/core_pattern
must either be an absolute pathname
or a pipe command, as detailed in
.BR core (5).
Warnings will be written to the kernel log if
.I core_pattern
does not follow these rules, and no core dump will be produced.
.\" 54b501992dd2a839e94e76aa392c392b55080ce8
.RE
.IP
For details of the effect of a process's "dumpable" setting
on ptrace access mode checking, see
.BR ptrace (2).
.TP
.I /proc/sys/fs/super\-max
This file
controls the maximum number of superblocks, and
thus the maximum number of mounted filesystems the kernel
can have.
You need increase only
.I super\-max
if you need to mount more filesystems than the current value in
.I super\-max
allows you to.
.TP
.I /proc/sys/fs/super\-nr
This file
contains the number of filesystems currently mounted.
.SH SEE ALSO
.BR proc (5),
.BR proc_sys (5)