File: config.sed

package info (click to toggle)
sedsed 2.0.0-3
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 4,008 kB
sloc: sed: 37,239; ansic: 2,395; python: 815; sh: 624; makefile: 95
file content (454 lines) | stat: -rw-r--r-- 12,872 bytes
parent folder | download | duplicates (2)
#n
#! /bin/sed -nf

####### DESCRIPTION ############################################################
#
# config.sed
# version 1.0
#
# This script processes a configuration file.
#
# The input to the script is a configuration file. Each line of the
# configuration file should be one of the following:
#
# 	1) a blank line (with or without whitespace);
# 	2) a comment;
# 	3) an assignment via equals ('=');
# 	4) an assignment via here document ('<<' or '<<-').
#
# The syntax of a comment is:
#
# 	[<ws>] '#' [<text>]
#
# The syntax of an assignment via equals is:
#
# 	[<ws>] <name> [<ws>] '=' [<ws>] [<value>] [<comment>]
#
# The name must begin with a letter or underscore and can contain any
# combination of digits, letters, dashes, periods, slashes, colons, and
# underscores.
#
# If the value is not present, the entire line will be treated as a comment. If
# the value is present, it must be either unquoted, single-quoted, or
# double-quoted. The value may begin on a subsequent line if each newline after
# the operator (and before the value) is immediately preceded by a backslash,
# and the backslash is the first non-whitespace character after the operator (or
# the first non-whitespace character of the line, in the case of intermittent
# lines). The value may continue over multiple lines if each newline within the
# value is immediately preceeded by a backslash. Such backslash-newline pairs
# will be removed and will not be included as part of the value. A newline
# cannot be part of a value assigned via equals.
#
# An unquoted value extends from the first non-whitespace character to the last
# non-whitespace character (or to the last non-whitespace character before a
# comment, if a comment is present). A pound sign may be included as part of the
# value if escaped by a backslash; the backslash will be removed and will not be
# included as part of the value. Each single quote and double quote (other than
# as the first character) will be treated as if escaped by a backslash (that is,
# no split tokenization will be performed. Each dollar sign, grave accent, and
# backslash (other than a backslash used to escape another character or a
# newline) will be treated as if escaped by a backslash (that is, no parameter
# substitution or command substitution will be performed); if such a backslash
# is present, it will be removed and will not be included as part of the value.
# Otherwise, each backslash (other than a backslash used to escape a newline)
# will be treated literally.
#
# A single-quoted value extends from the first character after the opening quote
# to the last character before the closing quote. A single quote cannot be
# included as part of the value. Each backslash (other than a backslash used to
# escape a newline) will be treated literally.
#
# A double-quoted value extends from the first character after the opening quote
# to the last character before the closing quote. A double quote may be included
# as part of the value if escaped by a backslash; the backslash will be removed
# and will not be included as part of the value. Each dollar sign, grave accent,
# and backslash (other than a backslash used to escape another character or a
# newline) will be treated as if escaped by a backslash (that is, no parameter
# substitution or command substitution will be performed); if such a backslash
# is present, it will be removed and will not be included as part of the value.
# Otherwise, each backslash (other than a backslash used to escape a newline)
# will be treated literally.
#
# The syntax of an assignment via here document is:
#
# 	[<ws>] <name> [<ws>] { '<<' | '<<-' } [<ws>] [<word> [<ws>]
# 	[<value-line>...]
# 	<word>]
#
# The name must adhere to the same syntax as specified for an assignment by
# equals.
#
# If the first occurrance of the word is not present, the entire line will be
# treated as a comment and processing will continue with the next line,
# interpreting it outside the context of a here document. If the first
# occurrance of the word is present, it extends from the first non-whitespace
# character after the operator to the last non-whitespace character, and will be
# treated literally (that is, no parameter substitution or command substitution
# will be performed, either in the word or in any of the value lines).
#
# If the operator is '<<' (meaning the here document is unindented), the value
# will be the concatenation of value lines, read literally, until the second
# occurrance of the word is encountered on a line by itself. The value will
# include each newline that separates a pair of value lines, but will not
# include the newline that separates the last value line from the second
# occurrance of the word.
#
# If the operator is '<<-' (meaning the here document is indented), the result
# will be the same as if the operator was '<<' except that leading tabs will be
# stripped from each line (including the second occurrance of the word).
#
# Since the here document is read literally, comments that appear in the here
# document will be included as part of the value. Also, note that this is not a
# true here document, since it is stored by sed and not in a temporary file.
#
# Because each value is stored by sed, the maximum length of a value is
# approximately eight kilobytes.
#
# The output of the script is a series of lines, one line per assignment in the
# configuration file. The syntax of each output line is:
#
# 	<name> '=' <encoded-value>
#
# Note that each name is guaranteed not to contain an equals sign, and each
# output line will contain no whitespace, unless whitespace is part of the
# value. Thus, the name of the assignment is the first character of the line to
# the last character before the first equals sign, and the encoded value of the
# assignment is the first character after the first equals sign to the last
# character of the line (not including the newline).
#
# In each encoded value, each double quote, dollar sign, grave accent, and
# backslash will be escaped by a backslash, and each newline (present only if
# assigned via here document) will be represented by the escape sequence '\n'.
# This is necessary for representing a newline and useful for assigning both an
# encoded value and a decoded value to shell parameters. Assuming the shell
# parameter 'LINE' contains an output line, consider the following:
#
# 	NAME=`echo $LINE | cut -f 1 -d "="`
# 	VALUE_ENCODED=`echo $LINE | cut -f 2- -d "="`
# 	eval "VALUE_DECODED=\"$VALUE_ENCODED\""
#
# The script will output a line as each assignment in the configuration file is
# processed, until the end of the configuration file is reached or until a
# syntax error is encountered in the configuration file. If the end of the
# configuration file is reached, the last output line will be a status code that
# indicates success. If a syntax error is encountered in the configuration file,
# the last output line will be a status code that indicates the error. The
# status code will be one of the following:
#
# 	 0  success;
# 	 1  invalid name;
# 	 2  beginning of value is folded beyond end-of-file;
# 	 3  unquoted value is folded beyond end-of-file;
# 	 4  single-quoted value is folded beyond end-of-file;
# 	 5  single-quoted value is missing closing quote;
# 	 6  single-quoted value contains extraneous quote;
# 	 7  double-quoted value is folded beyond end-of-file;
# 	 8  double-quoted value is missing closing quote;
# 	 9  double-quoted value contains extraneous quote;
# 	10  unindented here document continues beyond end-of-file;
# 	11  indented here document continues beyond end-of-file.
#
# If the status code is greater than 0, the second-to-last output line will be
# the line number at which the syntax error was encountered. If the status code
# is 1, the third-to-last output line will be the line of the configuration file
# with the invalid name. If the status code is greater than 1, the third-to-last
# output line will be the name that was to be assigned an invalid value.
#
#


####### HISTORY ################################################################
#
# 2001.11.28	Nathan D. Ryan <ryannd@cs.colorado.edu>
# 		Initial implementation. This probably would have been easier in
#		perl, but I got carried away...
#
#


# process a blank line
#
/^[	 ]*$/ b is_eof


# process a comment
#
/^[	 ]*#/ b is_eof


# process an assignment via equals ('=')
#
/^[	 ]*[A-Za-z_][0-9A-Za-z\.\/:_-]*[	 ]*=/ {

	# copy the name into the hold space, removing both extraneous whitespace
	# and the operator
	#
	h; s/^[	 ]*\([^	 =]*\).*$/\1/
	x; s/^[^=]*=[	 ]*\(.*\)$/\1/

	# unfold lines before the value
	#
	:unfold
	/^\\$/ {
		$ {
			s/^.*$/2/; b quit
		}
		N; s/\\\n[	 ]*//; b unfold
	}

	# process a null value
	#
	/^$/ b is_eof

	# process a null value with a comment
	#
	/^#/ b is_eof

	# process an unquoted value
	#
	/^[^"']/ {
		:unfold0

		# reset the replacement test
		#
		t reset0
		:reset0

		# replace a trailing comment with a space (the space prevents
		# unintended line folding)
		#
		s/\([^\\]\)\(\\*\)\2#.*$/\1\2\2 /; t trim0

		# unfold lines within the value
		#
		/[^\\]\(\\*\)\1\\$/ {
			$ {
				s/^.*$/3/; b quit
			}
			N; s/\\\n//; b unfold0
		}

		:trim0
		# remove trailing whitespace
		#
		s/[	 ]*$//

		# remove each backslash used to escape a pound sign or a single
		# quote
		#
		s/^\(\\*\)\1\\\([#']\)/\1\1\2/g
		s/\([^\\]\)\(\\*\)\1\\\([#']\)/\1\2\2\3/g

		b decode
	}

	# process a single-quoted value
	#
	/^'/ {
		:unfold1

		# reset the replacement test
		#
		t reset1
		:reset1

		# remove a trailing comment
		#
		s/\(.'[	 ]*\)#.*$/\1/; t trim1

		# handle an extraneous quote
		#
		/.'[	 ]*[^	 ]/ {
			s/^.*$/6/; b quit
		}

		# unfold lines within the value
		#
		/\\$/ {
			$ {
				s/^.*$/4/; b quit
			}
			N; s/\\\n//; b unfold1
		}

		:trim1
		# remove trailing whitespace
		#
		s/[	 ]*$//

		# handle a missing closing quote
		#
		/'$/ !{
			s/^.*$/5/; b quit
		}

		# remove the opening quote and the closing quote
		#
		s/^'\(.*\)'$/\1/

		b encode
	}

	# process a double-quoted value
	#
	/^"/ {
		:unfold2

		# reset the replacement test
		#
		t reset2
		:reset2

		# remove a trailing comment
		#
		s/\([^\\]\)\(\\*\)\2"[	 ]*#.*$/\1\2\2"/; t trim2

		# handle an extraneous quote
		#
		/[^\\]\(\\*\)\1"[	 ]*[^	 ]/ {
			s/^.*$/9/; b quit
		}

		# unfold lines within the value
		#
		/[^\\]\(\\*\)\1\\$/ {
			$ {
				s/^.*$/7/; b quit
			}
			N; s/\\\n//; b unfold2
		}

		:trim2
		# remove trailing whitespace
		#
		s/[	 ]*$//

		# handle a missing closing quote
		#
		/"$/ !{
			s/^.*$/8/; b quit
		}

		# remove the opening quote and the closing quote
		#
		s/^"\(.*\)"$/\1/

		b decode
	}
}


# process an assignment via here document ('<<' or '<<-')
#
/^[	 ]*[A-Za-z_][0-9A-Za-z\.\/:_-]*[	 ]*<</ {

	# copy the name into the hold space, removing extraneous whitespace
	# while retaining the operator
	#
	h; s/^[	 ]*\([^	 <]*\).*$/\1/
	x; s/^[^<]*\(<<.*\)$/\1/

	# process a null value
	#
	/^<<-\{0,1\}[	 ]*$/ b is_eof

	# process a value as an unindented here document
	#
	/^<<[^-]/ {

		# remove the operator
		#
		s/^<<[	 ]*\(.*\)$/\1/

		# add each line of the here document
		#
		:here0
		$ {
			s/^.*$/10/; b quit
		}
		N
		/^\(.*\)\n.*\n\1$/ !{
			b here0
		}

		# remove the opening word, the final newline, and the closing
		# word
		#
		s/^\(.*\)\n\(.*\)\n\1$/\2/

		b encode
	}

	# process a value as an indented here document
	#
	/^<<-.*$/ {

		# remove the operator
		#
		s/^<<-[	 ]*\(.*\)$/\1/

		# add each line of the here document, removing leading tabs
		#
		:here1
		$ {
			s/^.*$/11/; b quit
		}
		N; s/^\(.*\n\)	*\(.*\)$/\1\2/
		/^\(.*\)\n.*\n\1$/ !{
			b here1
		}

		# remove the opening word, the final newline, and the closing
		# word
		#
		s/^\(.*\)\n\(.*\)\n\1$/\2/

		b encode
	}
}


# process an invalid line
#
h; s/^.*$/1/


# output the results of an invalid assignment and quit; the hold space should
# contain either a line with an invalid name or a valid name that was to be
# assigned an invalid value, and the pattern space should contain a status code
# that indicates the error
#
:quit
x; p; =; x; p
q


# decode the value; remove each backslash used to escape a double quote, a
# dollar sign, a grave accent, or another backslash
#
:decode
s/^\(\\*\)\1\\\(["$`]\)/\1\1\2/g
s/\([^\\]\)\(\\*\)\1\\\(["$`]\)/\1\2\2\3/g
s/\\\\/\\/g

# encode the value; add a backslash to escape each double quote, dollar sign,
# grave accent, and backslash; replace each newlin with the escape sequence '\n'
#
:encode
s/\\/\\\\/g
s/\(["$`]\)/\\\1/g
s/\n/\\n/g

# output the results of a valid assignment; the hold space contains the name,
# name, and the pattern space contains the encoded value
#
x; s/^\(.*\)$/\1=/; G; s/\n//; p


# determine if the last line was just processed; if so, output a status code
# that indicates success
#
:is_eof
$ {
	s/^.*$/0/; p
}