# SPDX-License-Identifier: GPL-2.0
# Page Isolation
# Note: Run-time configuration is unsupported; restart the service to apply changes.
# Note: this file should be installed at /etc/sysconfig/rasdaemon
# Specify the threshold for isolating buggy pages.
#
# Format:
# [0-9]+[unit]
# Notice: make sure values match this format; rasdaemon falls back to the default value for invalid input.
#
# Supported units:
# PAGE_CE_REFRESH_CYCLE: D|d (day), H|h (hour), M|m (min), default unit is hour
# PAGE_CE_THRESHOLD: K|k (x1000), M|m (x1000k), default is no unit
#
# These two settings have no effect when PAGE_CE_ACTION is "off".
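#
# For example (illustrative values only, not recommendations):
# PAGE_CE_REFRESH_CYCLE="2d"    (2 days)
# PAGE_CE_THRESHOLD="1k"        (1 x1000 = 1000)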
PAGE_CE_REFRESH_CYCLE="24h"
PAGE_CE_THRESHOLD="50"
# Specify the threshold for isolating buggy memory rows.
#
# Format:
# [0-9]+[unit]
# Notice: make sure values match this format; rasdaemon falls back to the default value for invalid input.
#
# Supported units:
# ROW_CE_REFRESH_CYCLE: D|d (day), H|h (hour), M|m (min), default unit is hour
# ROW_CE_THRESHOLD: K|k (x1000), M|m (x1000k), default is no unit
#
# These two settings have no effect when PAGE_CE_ACTION is "off".
ROW_CE_REFRESH_CYCLE="24h"
ROW_CE_THRESHOLD="50"
# Specify the action rasdaemon takes internally when a row error threshold is exceeded.
#
# off             no action
# account         only account errors
# soft            try to soft-offline row without killing any processes
#                 This requires an up-to-date kernel. Might not be successful.
# hard            try to hard-offline row by killing processes
#                 Requires an up-to-date kernel. Might not be successful.
# soft-then-hard  First try to soft offline, then try hard offlining.
# Note: default offline choice is "off".
ROW_CE_ACTION="off"
# Specify the action rasdaemon takes internally when a page error threshold is exceeded.
#
# off             no action
# account         only account errors
# soft            try to soft-offline page without killing any processes
#                 This requires an up-to-date kernel. Might not be successful.
# hard            try to hard-offline page by killing processes
#                 Requires an up-to-date kernel. Might not be successful.
# soft-then-hard  First try to soft offline, then try hard offlining.
# Note: default offline choice is "soft".
PAGE_CE_ACTION="soft"
# CPU Online Fault Isolation
# Whether to enable CPU online fault isolation (yes|no).
CPU_ISOLATION_ENABLE="no"
# Specify the CE count threshold.
#
# Format:
# [0-9]+[unit]
#
# Supported units:
# CPU_CE_THRESHOLD: no unit
# CPU_ISOLATION_CYCLE: D|d (day), H|h (hour), M|m (minute), S|s (second), default unit is second
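#
# For example (illustrative values only, not recommendations):
# CPU_CE_THRESHOLD="18"         (no unit)
# CPU_ISOLATION_CYCLE="12h"     (12 hours)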
CPU_CE_THRESHOLD="18"
CPU_ISOLATION_CYCLE="24h"
# Prevent excessive isolation from causing an avalanche effect
CPU_ISOLATION_LIMIT="10"
# Event Trigger
# Event trigger will be executed when the specified event occurs.
#
# Path of the directory containing the triggers to execute
# For example: TRIGGER_DIR=/etc/ras/triggers
TRIGGER_DIR=
# Execute these triggers when an mc_event occurs; a trigger is not
# executed if it is not specified.
# For example:
# MC_CE_TRIGGER=mc_event_trigger
# MC_UE_TRIGGER=mc_event_trigger
MC_CE_TRIGGER=
MC_UE_TRIGGER=
# CE Statistic Threshold
#
# Specify the threshold of CEs per second.
MC_CE_STAT_THRESHOLD=2000
# Poison page statistics
#
# Supported units:
# POISON_STAT_THRESHOLD: kB
POISON_STAT_THRESHOLD=102400
ERST_DELETE=1