Fix Mismatch Between Data And Locale In Awk Command - linux

I am receiving the following error:
awk: cmd. line:1: (FILENAME=- FNR=798) warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.
The command I'm running is the following:
cat file.txt | awk 'length($0)<10000' > output-file.txt
The weird part is that if I pipe to other commands like awk '{ sub("\r$", ""); print }', it works just fine without an error.
Anyone see why I would get this error? Or, should I just ignore it?

Set the locale to C, which uses the single-byte ASCII character set, by passing LC_ALL=C in awk's environment:
LC_ALL=C awk 'length($0)<10000' file.txt >output-file.txt
Also, you don't need cat, since awk takes filenames as arguments.
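To see what the C locale changes, here is a minimal sketch that measures one UTF-8 character as bytes. The sample input (an é written with octal escapes) is an assumption for illustration and is not taken from the question's data.

```shell
# 'é' encoded in UTF-8 is the two bytes 0xC3 0xA9 (octal 303 251).
# Under LC_ALL=C awk treats every byte as one character, so length()
# counts bytes and never has to decode multibyte sequences.
bytes=$(printf '\303\251\n' | LC_ALL=C awk '{ print length($0) }')
echo "length under LC_ALL=C: $bytes"
```

In a UTF-8 locale gawk would typically count the same input as one character, so with LC_ALL=C the 10000 threshold in the original command applies to bytes rather than characters.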

I've found three solutions on my machines:
1. Change the environment variable
This is covered by the accepted answer: add export LC_ALL=C to the environment.
2. Add a parameter (only possible with gawk)
Add the -b (binary) parameter, as in:
cat file.txt | awk -b 'length($0)<10000' > output-file.txt
3. Use mawk instead of gawk
You can check whether you are using the gawk or the mawk implementation on Linux (the first one is installed with a package of the same name on Ubuntu). On Ubuntu you can list and select the implementation with
sudo update-alternatives --config awk
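As a non-interactive check, this sketch resolves which binary the awk command actually points to. It assumes awk is on PATH and that readlink supports -f, as it does in GNU coreutils. The variable names are made up for the example.

```shell
# Resolve the awk on PATH through any symlinks. On Ubuntu the chain
# usually passes through /etc/alternatives/awk before reaching gawk or mawk.
awk_path=$(command -v awk)
awk_real=$(readlink -f "$awk_path")
echo "awk resolves to: $awk_real"
```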

Related

Trying to add a user inputted variable to end of file

Using Ubuntu.
Currently I'm trying to add a user inputted variable to the end of a file.
In short, this allows me to use a BASH script to automate adding VSFTPD users.
Currently I have used awk & sed.
I don't have the sed but here is my awk that I currently have together.
awk '{$centre_name}' /etc/vsftpd-users
GNU AWK solution
You can use -v to pass a shell variable into GNU AWK. I would do it the following way. Let the content of file.txt be
1
2
3
then
var1="four"
awk -v var="${var1}" '{print}END{print var}' file.txt
gives output
1
2
3
four
Explanation: I use -v to set awk's variable var to the value of the shell variable var1. Each line of the file is printed as-is, and after processing is done the value of var is printed.
(tested in gawk 4.2.1)
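The same approach can be sketched as a self-contained run. The temporary file and the value containing a space are assumptions chosen to show why the shell variable should be quoted when it is handed to -v.

```shell
# Build a throwaway input file so the example does not touch real data.
tmpfile=$(mktemp)
printf '1\n2\n3\n' > "$tmpfile"

# A value with a space: without the double quotes the shell would split it
# into two words and awk would treat "five" as part of the program arguments.
var1="four five"
result=$(awk -v var="$var1" '{ print } END { print var }' "$tmpfile")
printf '%s\n' "$result"

rm -f "$tmpfile"
```

Note that this only prints the combined text. To actually change the file, the output would still have to be redirected to a new file and moved into place.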
GNU sed solution
You can use $ to target the last line and the a command to append a line of text, as follows, for the file.txt shown earlier
var1="four"
sed "$ a ${var1}" file.txt
gives the same output as above
(tested in GNU sed 4.5)

Are these awk commands vulnerable to code injection?

I was unsure on how to correctly script a particular awk command which uses a shell variable, when I read the answers to How do I use shell variables in an awk script?.
The accepted answer demonstrates how interpolating a shell variable in an awk command is prone to malicious code injection, and while I was able to reproduce the demo, I could not find the same problem with either of the following two commands:
#HWLINK=enp10s0
ip -o route | awk '/'$HWLINK'/ && ! /default/ {print $1}'
ip -o route | awk "/$HWLINK/"' && ! /default/ {print $1}'
So, the main question is if any of these (or both) is vulnerable.
A secondary question would be which form is preferred. I tried ip -o route | awk -v hwlink="$HWLINK" '/hwlink/ && ! /default/ {print $1}' but that doesn't work.
p.s. this is a refactoring; the original command was ip -o route | grep $HWLINK | grep -v default | awk '{print $1}'.
Sure, both are vulnerable, the first a bit less so.
This breaks your second line:
HWLINK="/{}BEGIN{print \"Your mother was a hamster and your father smelt of elderberries\"}/"
The only reason it doesn't break your first line is that the unquoted $HWLINK is word-split by the shell, so a payload injected into the first line must not contain spaces:
HWLINK="/{}BEGIN{print\"Your_mother_was_a_hamster_and_your_father_smelt_of_elderberries\"}/"
I see you already got the correct syntax to use :)
You were right that letting the shell interpolate variables into the awk program allows malicious code injection. As already pointed out, use the -v syntax. Your attempt fails because a variable is not expanded inside the /../ form, so use a direct ~ match instead:
ip -o route | awk -v hwlink="$HWLINK" '$0 ~ hwlink && ! /default/ {print $1}'
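To try the -v form without depending on the local routing table, this sketch feeds awk a few made-up lines. The sample text only imitates the general shape of ip -o route output and is an assumption, as are the interface names other than enp10s0.

```shell
# Hypothetical sample input standing in for `ip -o route`.
sample_routes='default via 192.168.1.1 dev enp10s0
192.168.1.0/24 dev enp10s0 proto kernel scope link src 192.168.1.10
10.0.0.0/8 dev tun0 scope link'

HWLINK=enp10s0

# The value arrives as awk data (a dynamic regex), never as program text.
matches=$(printf '%s\n' "$sample_routes" |
  awk -v hwlink="$HWLINK" '$0 ~ hwlink && ! /default/ { print $1 }')
printf '%s\n' "$matches"
```

Because hwlink is used as a regular expression here, characters such as . in the value keep their regex meaning. If a literal substring match is wanted, index($0, hwlink) is an alternative.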
The recommended way to pass variables to awk safely is through the ARGV array or the ENVIRON variable. Values passed this way do not undergo the escape-sequence processing that awk applies to -v assignments.
value='foo\n\n'
awk 'BEGIN {var=ARGV[1]; delete ARGV[1]}' "$value"
If you printed the value of var inside awk, it would be the literal foo\n\n and not the multi-line string that results when the same value is passed with -v.
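The difference can be observed by comparing string lengths. This is a small sketch in which the names len_v and len_env are invented for the example.

```shell
value='foo\n\n'   # seven characters: f o o \ n \ n

# -v: awk interprets the escape sequences, so each \n becomes one newline.
len_v=$(awk -v var="$value" 'BEGIN { print length(var) }')

# ENVIRON: the value is taken verbatim from the environment.
len_env=$(value="$value" awk 'BEGIN { print length(ENVIRON["value"]) }')

echo "-v length: $len_v, ENVIRON length: $len_env"
```

The -v version ends up with five characters (foo plus two newlines), while the ENVIRON version keeps all seven.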

Assistance with bash, shell script syntax error

A former network engineer was using Xymon for node monitoring. This shell script was previously used to compare actively monitored nodes against the routing table of our core switch and output data accordingly to show any new networks.
Attempts to execute the shell script now return error:
awk: cmd. line:1: warning: escape sequence `\.' treated as plain `.'
Here are the commands in the shell script:
cat /var/lib/rancid/dnow/configs/ushouston-dnw1-cr01 | awk '/display ip routing-table/{flag=1;next}/display vlan all/{flag=0}f$
ROUTELIST=$(grep -E "\.0/2[2,3,4]" /tmp/routes.txt | grep -v 192.168.13.1 | awk '{print $2}' | awk -F. '{print $1"\."$2"\."$3}$
for ROUTE in $ROUTELIST ; do
CHECK=$(grep -w $ROUTE /etc/hosts)
if [ "$CHECK" = "" ] ; then
echo "$ROUTE is not monitored"
fi
done
Any assistance or guidance understanding why the error is received and what needs to be adjusted is greatly appreciated.
Edit: I failed to mention this is a Linux system, kernel version 3.16.0-44-generic.
The warning is coming from this:
{print $1"\."$2"\."$3}
There's no need to escape . in ordinary strings, only in regular expressions. That should be:
{print $1"."$2"."$3}
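A short sketch of the distinction, using made-up input values: in a string literal the dot is just a dot, while in a regular expression an unescaped dot matches any character.

```shell
# String context: "." needs no backslash when joining fields.
prefix=$(echo '10.20.30.0/24' | awk -F. '{ print $1 "." $2 "." $3 }')
echo "$prefix"

# Regex context: an unescaped dot matches any single character,
# so /10.20/ matches "10x20" while /10\.20/ does not.
loose=$(echo '10x20' | awk '/10.20/ { print "match" }')
strict=$(echo '10x20' | awk '/10\.20/ { print "match" }')
echo "loose=[$loose] strict=[$strict]"
```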

Difference between awk -FS and awk -f in shell scripting

I am new to shell scripting and I'm very confused between awk -FS and awk -f commands used. I've tried reading multiple pages on the difference between these two but was not able to understand clearly. Kindly help.
Here is an example:
Let's consider a text file, say data.txt, with the details below.
S.No Product Qty Price
1-Pen-2-10
2-Pencil-1-5
3-Eraser-1-2
Now, when I try to use the following command:
$ awk -f'-' '{print $1,$2} data.txt
I get the below output:
1 Pen
2 Pencil
3 Eraser
But when I use the command:
$ awk -FS'-' '{print $1,$2} data.txt
the output is:
1-Pen-2-10
2-Pencil-1-5
3-Eraser-1-2
I don't understand the difference it does using the -FS command. Could somebody help me out on what exactly happens between these two commands. Thanks!
You are more confused than you think. There is no -FS.
FS is a variable that contains the field separator.
-F is an option that sets FS to its argument.
-f is an option whose argument is the name of a file that contains the script to execute.
The scripts you posted would have produced syntax errors, not the output you say they produced, so idk what to tell you...
-FS is not an argument to awk. -F is, as is -f.
The -F argument tells awk what value to use for FS (the field separator).
The -f argument tells awk to use its argument as the script file to run.
This command (I fixed your quoting):
awk -f'-' '{print $1,$2}' data.txt
tells awk to read its script from standard input (that's what - means). This should hang when run in a terminal, and should produce an error after that, as awk then tries to use '{print $1,$2}' as a filename to read from.
This command:
awk -FS'-' '{print $1,$2}' data.txt
tells awk to use S- as the value of FS, which you can see by running this command:
awk -FS'-' 'BEGIN {print "["FS"]"}'
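For comparison, here is a sketch of what the question was probably after, using -F to set the separator and -f to load the program from a file. The temporary files and variable names are assumptions made for the example, and the header line of data.txt is left out.

```shell
data=$(mktemp)
script=$(mktemp)
printf '1-Pen-2-10\n2-Pencil-1-5\n3-Eraser-1-2\n' > "$data"

# The awk program, stored in a file so it can be loaded with -f.
printf '%s\n' '{ print $1, $2 }' > "$script"

# -F sets the field separator; the program is given inline.
with_F=$(awk -F'-' '{ print $1, $2 }' "$data")

# -F sets the field separator; -f names the file holding the program.
with_f=$(awk -F'-' -f "$script" "$data")

printf '%s\n' "$with_F"
rm -f "$data" "$script"
```

Both invocations run the same program with the same separator, so they produce the same lines.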

Error assigning the output of a command to a variable in bash (Linux)

I wonder why this command:
FILE=`file /usr/bin/java | tr -d \`\' | awk '{print $5}'`
Results in this error message:
bash: command substitution: line 1: unexpected EOF while looking for matching ``'
bash: command substitution: line 2: syntax error: unexpected end of file
If I run the previous command without assigning it to a variable, it works as expected:
$ file /usr/bin/java | tr -d \`\' | awk '{print $5}'
/etc/alternatives/java
Does anyone know why this happens and how can I successfully assign the output value to a variable?
Note: for the curious, I'm trying to find the pointed path to a binary file from a symbolic link, so I can find out if it is a 32 or 64 bits file (in a generic way, not using something like java -version)
Note 2: I've tried removing quotes with sed instead of tr, but it returns the same error
Thank you very much in advance, regards...
Nacho
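I think it's because you enclosed the commands inside backticks. Within backticks, \` is unescaped before the inner command is parsed, so the inner command contains a bare backtick that opens a nested, unterminated command substitution. Use $() instead of backticks.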
FILE=$(file /usr/bin/java | tr -d \`\' | awk '{print $5}')
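The $() form can be tried without a Java installation by feeding the pipeline a fixed string. The sample text below is an assumption modeled on the output shown in the question.

```shell
# Hypothetical stand-in for the output of `file /usr/bin/java`.
sample="/usr/bin/java: symbolic link to \`/etc/alternatives/java'"

# Inside $( ... ) the escaped backtick reaches tr unchanged, so tr deletes
# both the backtick and the single quote before awk picks the fifth field.
target=$(printf '%s\n' "$sample" | tr -d \`\' | awk '{ print $5 }')
echo "$target"
```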
