Are there any rules for unix/linux shell variable naming?
For example, like the common rules for Java variable naming.
Be careful not to give a variable the same name as a UNIX command; it makes the code confusing and can produce unexpected results. Also keep in mind the reserved words (if, else, elif, do, done, ...), and that all-uppercase variable names are, by convention, reserved for system use (environment and shell variables such as PATH and HOME).
From Rules for Naming variable name:
Variable name must begin with an alphabetic character or an underscore character (_), followed by zero or more alphanumeric or underscore characters.
Valid shell variable examples
Or as seen in The Open Group Base Specifications Issue 7:
In the shell command language, a word consisting solely of
underscores, digits, and alphabetics from the portable character set.
The first character of a name is not a digit.
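Both definitions reduce to one rule that can be checked with a plain case pattern. A minimal sketch, assuming bash (the pattern itself is POSIX sh); is_valid_name is a hypothetical helper, and the ranges are meant as ASCII, so run it under LC_ALL=C if your shell and locale collate ranges differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if $1 is a valid shell variable name,
# i.e. letters, digits and underscores only, first character not a digit.
is_valid_name() {
  case $1 in
    '' | [0-9]* | *[!A-Za-z0-9_]*) return 1 ;;  # empty, leading digit, or a forbidden character
    *) return 0 ;;
  esac
}

for n in my_var _tmp var2 2var variable-with-dashes ''; do
  if is_valid_name "$n"; then echo "valid:   '$n'"; else echo "invalid: '$n'"; fi
done
```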
When I run this command on Linux:
systemctl --user show-environment
it lists the environment variables. Once a variable shows up in that list, how can I change it with a command, or find the configuration file that sets it?
Add user environment variables to $HOME/.profile.
export SOME_VALUE=the-value
For system wide changes look at /etc/profile and /etc/profile.d/.
From man systemctl:
Environment Commands
systemd supports an environment block that is passed to processes the manager spawns. The names of the variables can contain ASCII letters, digits, and the underscore character. Variable names cannot be empty or start with a digit. In variable values, most characters are allowed, but the whole sequence must be valid UTF-8. (Note that control characters like newline (NL), tab (TAB), or the escape character (ESC), are valid ASCII and thus valid UTF-8). The total length of the environment block is limited to _SC_ARG_MAX value defined by sysconf(3).
show-environment
Dump the systemd manager environment block. This is the environment block that is passed to all processes the manager spawns. The environment block will be dumped in straightforward form suitable for sourcing into most shells. If no special characters or whitespace is present in the variable values, no escaping is performed, and the assignments have the form "VARIABLE=value". If whitespace or characters which have special meaning to the shell are present, dollar-single-quote escaping is used, and assignments have the form "VARIABLE=$'value'". This syntax is known to be supported by bash(1), zsh(1), ksh(1), and busybox(1)'s ash(1), but not dash(1) or fish(1).
set-environment VARIABLE=VALUE...
Set one or more systemd manager environment variables, as specified on the command line. This command will fail if variable names and values do not conform to the rules listed above.
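The "VARIABLE=$'value'" form that show-environment produces can be tried without touching systemd. A minimal sketch, assuming bash: the heredoc stands in for real show-environment output, and the names PLAIN and GREETING are made up for illustration. Sourcing the dump sets ordinary (unexported) shell variables.

```shell
#!/usr/bin/env bash
# Stand-in for `systemctl --user show-environment` output (made-up values).
# The quoted 'EOF' delimiter keeps the text literal, exactly as dumped.
dump=$(mktemp)
cat > "$dump" <<'EOF'
PLAIN=value
GREETING=$'hello\tworld'
EOF

# bash understands $'...' quoting, so sourcing restores the real tab;
# dash would not, as the man page notes.
. "$dump"
rm -f "$dump"

printf '%s\n' "$PLAIN"     # value
printf '%q\n' "$GREETING"  # $'hello\tworld'
```

To change a value rather than read it, the set-environment command quoted above takes NAME=value pairs, e.g. systemctl --user set-environment SOME_VALUE=the-value; processes that the user manager has already started keep their old environment.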
In a Linux environment I want to create a variable whose name contains dashes. This seems possible, since I can set such a name in Jenkins, and there env gives the output (amongst other lines):
variable-with-dashes=test
But how can I do that directly in the shell? Doing
export variable-with-dashes=test
gives an error
-bash: export: `variable-with-dashes=test': not a valid identifier
In both cases the shell seems to be /bin/bash.
I've never met a Bourne-style shell that allowed - in a variable name. Only ASCII letters (of either case), _ and digits are supported, and the first character must not be a digit.
If you have a program that requires an environment variable that doesn't match the shell restrictions, launch it with the env program.
env 'strange-name=some value' myprogram
Note that some shells (e.g. modern dash, mksh, zsh) remove variables whose name they don't like from the environment. (Shellshock has caused people to be more cautious about environment variable names, so restrictions are likely to become tighter over time, not more permissive.) So if you need to pass a variable whose name contains special characters to a program, pass it directly, without a shell in between (env 'strange-name=some value' sh -c '…; myprogram' may or may not work).
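Both halves of this answer can be checked from bash. A minimal sketch, assuming GNU coreutils env and printenv (printenv is used here only to read the variable back; it is not part of the answer above), with strange-name as the placeholder name.

```shell
#!/usr/bin/env bash
# bash rejects the name itself; the subshell keeps the error from
# aborting the script, and stderr hides the "not a valid identifier" message.
if ! (export 'strange-name=some value') 2>/dev/null; then
  echo "export rejected the name"
fi

# env builds the child's environment directly, so no shell parses the name;
# printenv then reads the value back from its own environment.
env 'strange-name=some value' printenv strange-name   # prints: some value
```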
https://unix.stackexchange.com/questions/23659/can-shell-variable-name-include-a-hyphen-or-dash
A name in bash shell is defined as:
A word consisting solely of letters, numbers, and underscores, and beginning with a letter or underscore. Names are used as shell variable and function names.
It's not possible to use - in names.
But how can I do that directly in the shell?
I think you can write a bash builtin and in that builtin call setenv(3) to set your environment variable.
I want to restrict a UTF-8 string to only script characters in any language. By script characters I mean only those characters in the language's written script, i.e. no symbols or special characters. The same as the scripts listed here: http://www.unicode.org/charts/index.html
Would I have to go off and identify these character ranges for each and every language in UTF-8? Or is there something, e.g. a regex or a library, that I can make use of?
Depending on the language you're implementing this in, you might be able to use Unicode character categories in regular expressions.
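The following expression should match strings made only of letters and numbers, excluding punctuation, whitespace, symbols, etc. The ^ and $ anchors make it cover the whole string, which is what you need to restrict input:
^[\p{L}\p{N}]*$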
Here's a small demo on regex101.
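To use the anchored expression from a script, a minimal sketch assuming GNU grep built with PCRE support (-P) and an installed UTF-8 locale (C.UTF-8 here); only_letters_and_numbers is a hypothetical helper name. Plain grep -E and bash's =~ use POSIX regular expressions, which do not understand \p{...}, hence the switch to -P.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the single-line string $1 consists only of
# Unicode letters (\p{L}) and numbers (\p{N}).
# Assumes GNU grep with -P and a C.UTF-8 locale; without a UTF-8 locale,
# non-ASCII letters are not recognised as letters.
only_letters_and_numbers() {
  printf '%s\n' "$1" | LC_ALL=C.UTF-8 grep -qP '^[\p{L}\p{N}]*$'
}

for s in abc123 'abc 123' 'a,b'; do
  if only_letters_and_numbers "$s"; then echo "ok:       '$s'"; else echo "rejected: '$s'"; fi
done
```

With the UTF-8 locale available, a non-ASCII word such as 'Ωμέγα' is accepted as well, since its characters are in the letter category.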
I'm changing some notation in a few source code files.
In particular, variable names using the format
m_variable1
m_anothervariable
should be renamed and reformatted to
mVariable1
mAnotherVariable
That is, substitute m_ with m and make the next character uppercase.
I know how to do simple substitutions, like
%s/m_/m/gc
using vim, but I'm not sure how to make a character uppercase in a substitute command.
You can make the first character after m_ uppercase, but I don't think a built-in command can find the word boundaries inside a run of letters such as anothervariable; those you would have to fix by hand.
I hope the following command helps:
:%s/\v<m_(\w+)/m\u\1/g
Explanation
\v enables the 'very magic' mode
< (in very magic mode) matches the start of a word, so names such as item_count are left alone
\u makes the next character of the replacement (the first character of \1) uppercase
\1 references the first captured group
Result
mVariable1
mAnothervariable
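When there are many files, the same substitution can also be run outside vim. This swaps in a different tool and is not part of the answer above: a minimal sketch assuming GNU sed, where \<, \w and the \u case-conversion escape are GNU extensions (BSD/macOS sed lacks them). rename_members is a hypothetical name.

```shell
#!/usr/bin/env bash
# Same rename as the vim command, done with GNU sed: m_ at the start of a
# word becomes m, and the following character is uppercased via \u.
rename_members() {
  sed -E 's/\<m_(\w)/m\u\1/g'
}

printf '%s\n' 'm_variable1 m_anothervariable item_count' | rename_members
# prints: mVariable1 mAnothervariable item_count
```

Like the vim command, it cannot split anothervariable into words. To rewrite files in place, GNU sed's -i option applies the same expression, e.g. sed -E -i 's/\<m_(\w)/m\u\1/g' *.cpp (the glob is only an example).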
I am reading this tutorial, and it says that bash uses [...] as a wildcard. What exactly does [...] stand for in a bash script?
It's a regex-style character matching syntax; from the Bash Reference Manual, §3.5.8.1 (Pattern Matching):
[...]
Matches any one of the enclosed characters. A pair of characters separated by a hyphen denotes a range expression; any character that sorts between those two characters, inclusive, using the current locale's collating sequence and character set, is matched. If the first character following the ‘[’ is a ‘!’ or a ‘^’ then any character not enclosed is matched. A ‘-’ may be matched by including it as the first or last character in the set. A ‘]’ may be matched by including it as the first character in the set. The sorting order of characters in range expressions is determined by the current locale and the value of the LC_COLLATE shell variable, if set.
For example, in the default C locale, ‘[a-dx-z]’ is equivalent to ‘[abcdxyz]’. Many locales sort characters in dictionary order, and in these locales ‘[a-dx-z]’ is typically not equivalent to ‘[abcdxyz]’; it might be equivalent to ‘[aBbCcDdxXyYz]’, for example. To obtain the traditional interpretation of ranges in bracket expressions, you can force the use of the C locale by setting the LC_COLLATE or LC_ALL environment variable to the value ‘C’.
Within ‘[’ and ‘]’, character classes can be specified using the syntax [:class:], where class is one of the following classes defined in the posix standard:
alnum alpha ascii blank cntrl digit graph lower
print punct space upper word xdigit
A character class matches any character belonging to that class. The word character class matches letters, digits, and the character ‘_’.
Within ‘[’ and ‘]’, an equivalence class can be specified using the syntax [=c=], which matches all characters with the same collation weight (as defined by the current locale) as the character c.
Within ‘[’ and ‘]’, the syntax [.symbol.] matches the collating symbol symbol.
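A short sketch of the bracket expression in the two places bash uses it: filename expansion and case patterns. It assumes bash and mktemp; the file names and the classify function are made up for illustration.

```shell
#!/usr/bin/env bash
# Filename expansion: create a few files in a scratch directory.
dir=$(mktemp -d)
touch "$dir/file1" "$dir/file2" "$dir/filea"
(
  cd "$dir" || exit 1
  echo file[0-9]        # file1 file2  (any one character in the range 0-9)
  echo file[!0-9]       # filea        (a leading ! negates the set)
  echo file[[:alpha:]]  # filea        (POSIX character class inside [ ])
)
rm -r "$dir"

# The same syntax in a case pattern matches exactly one character.
classify() {
  case $1 in
    [abc])       echo "one of a, b, c" ;;
    [[:digit:]]) echo "digit" ;;
    *)           echo "something else" ;;
  esac
}
classify b   # one of a, b, c
classify 7   # digit
classify ab  # something else (two characters, so [abc] cannot match)
```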
The tutorial uses it when talking about regular expressions, in addition to the globbing wildcards '*' and '?'. For example, the [a-z] expression matches one lowercase character (in the C locale; see the note on collation order above).
Strictly speaking, the wildcard is the whole bracket expression, for example [abc], which matches exactly one of the three letters a, b or c.