Linux - Read a record to the end with awk

Let's say I have this line from the log file:
Jun 10 11:09:07 mylinux daemon.notice openvpn[3710]: TCPv4_CLIENT link remote: 1.22.333.444:1111
But I don't need the part between "mylinux" and the next colon.
That's the part I'm trying to remove: daemon.notice openvpn[3710]
I "solved" it with awk, but that's not a good solution.
awk '{print $1,$2,$3,$4,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20;}' /var/log/messages
I just wrote out many $ fields to cover as many fields as possible, but of course this won't work if a line has more fields than that.
I know I can check how many fields there are with NF, but I don't know how to use this information.
This is what records in the log file look like:
Jun 10 11:47:29 FeketeLUA daemon.notice openvpn[3710]: LZO compression initialized
Jun 10 11:47:29 FeketeLUA daemon.notice openvpn[3710]: Attempting to establish TCP connection with 5.55.222.34:1122 [nonblock]
Jun 10 11:47:30 FeketeLUA daemon.notice openvpn[3710]: TCP connection established with 12.11.123.444:1111
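For reference, the usual way to use NF here is a loop that starts at field 7 and runs up to NF, so the number of fields no longer has to be guessed. A minimal sketch, assuming (as in all the sample lines) that the part to drop is always exactly fields 5 and 6; the function name strip_fields is hypothetical. Note that awk rebuilds the line with single spaces, so runs of spaces are collapsed.

```shell
# Hypothetical helper: print fields 1-4, skip fields 5 and 6
# (facility and program[pid]:), then print fields 7 through NF.
strip_fields() {
  awk '{
    out = $1 " " $2 " " $3 " " $4
    for (i = 7; i <= NF; i++)   # NF = number of fields on this line
      out = out " " $i
    print out
  }' "$@"
}

# Usage: strip_fields /var/log/messages
```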

I think regexes are the way to go here. This is possible with awk but easier with Perl:
perl -pe 's/mylinux\K.*?(?=TCPv4_CLIENT)/ /' /var/log/messages
Where
Everything before \K has to be there but is not considered part of the match (that is later replaced)
.*? matches any string non-greedily (i.e., the shortest possible match is taken rather than the longest)
(?=TCPv4_CLIENT) is a lookahead term that matches an empty string if (and only if) it is followed by TCPv4_CLIENT
So the regex will match the part between mylinux and the first TCPv4_CLIENT that comes after it and replace it with a space.
Update: It's actually easier for the changed question since the ending delimiter is part of the removed match and we don't need the lookahead term for it:
perl -pe 's/FeketeLUA\K.*?://' /var/log/messages
\K and .*? continue to work as described before.

I must be missing something because it sounds like all you need is:
$ sed -r 's/(mylinux)[^:]+:/\1/' file
Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111
$ awk '{x="mylinux"; sub(x"[^:]+:",x)} 1' file
Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111
If instead you wanted to remove the text between two points without mentioning "mylinux", for example, then that'd just be:
$ sed -r 's/(([^ ]+ +){4})[^:]+: /\1/' file
Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111
$ awk '{print gensub(/(([^ ]+ +){4})[^:]+: /,"\\1","")}' file
Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111
That second awk command uses gawk for gensub(); with other awks you'd use match() + substr().
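For awks without gensub(), here is a minimal sketch of the match() + substr() route. The function name strip_prefix is hypothetical, and the four leading fields are spelled out instead of written as {4}, since some older awks do not support interval expressions.

```shell
# Hypothetical helper: keep the first four space-separated fields, drop
# everything up to and including the next ": ", keep the rest.
strip_prefix() {
  awk '{
    if (match($0, /^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +/)) {
      head = substr($0, 1, RLENGTH)     # "Jun 10 11:09:07 mylinux "
      rest = substr($0, RLENGTH + 1)    # "daemon.notice openvpn[3710]: ..."
      sub(/^[^:]+: /, "", rest)         # remove up to the first ": "
      print head rest
    } else {
      print                             # fewer than four fields: unchanged
    }
  }' "$@"
}
```

Like the gensub() version, a line with no ": " after the fourth field is printed unchanged.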

GNU awk way
awk 'match($0,/(.*mylinux).*(TCPv4_CLIENT.*)/,a){print a[1],a[2]}' file
Jun 10 11:09:07 mylinux TCPv4_CLIENT link remote: 1.22.333.444:1111
Capture the bits you want in array a, then print them.

Related

wp wc product_cat get - Remove leading whitespace?

I am using the WP-CLI for updating WooCommerce product_cat terms. When using wp wc product_cat get to retrieve individual fields, a non-breaking space character (a0) seems to get inserted as the leading character. Example:
$ echo "»"$(wp wc product_cat --user=4 get 44277 --field="description")"«"
» All widgets for A.-C.«
Another example; note that the leading character comes before the opening double quote:
$ i1=$(wp wc product_cat --user=4 get 18869 --field="name" --format="json")
$ echo "format=json: »"$i1"«"
format=json: » "AEG"«
Additional information:
This happens for all fields
I verified that the added character is a0 by updating the field and checking in the database
Using --format didn't make a difference
Using --context didn't make a difference
I'm working on Linux Mint with Bash version 5.0.17(1).
Did I make a mistake somewhere in my syntax that inadvertently inserted this leading character? Or am I missing something in how WP-CLI or Bash works? Thanks in advance! Jeroen
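One possible workaround, sketched under the assumption that the extra character really is U+00A0 (check with od -c first): strip it in the shell after capturing the output. The function name strip_leading_nbsp is hypothetical, and this only removes the symptom rather than explaining where the character comes from.

```shell
# Hypothetical helper: remove one leading non-breaking space, whether it
# arrives UTF-8 encoded (bytes C2 A0) or as a bare Latin-1 0xA0 byte.
# The body runs in a subshell so LC_ALL=C (byte-wise matching) does not
# leak into the caller.
strip_leading_nbsp() (
  LC_ALL=C
  export LC_ALL
  nbsp_utf8=$(printf '\302\240')   # U+00A0 in UTF-8
  nbsp_latin1=$(printf '\240')     # a single 0xA0 byte
  s=$1
  s=${s#"$nbsp_utf8"}
  s=${s#"$nbsp_latin1"}
  printf '%s\n' "$s"
)

# Usage with the command from the question:
# i1=$(strip_leading_nbsp "$(wp wc product_cat --user=4 get 18869 --field="name" --format="json")")
```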

funky file name output from shell/bash?

So, I'm making a small script to do an entire task for me. The task is to get the output of inxi -Fn into a text file and then use part of the dmidecode output, in my example the Address (0xE0000), as the file name of the txt.
My script goes as follows and does work; I have tested it. The only little issue I have is that the file name of the txt appears as "? 0xE0000.txt".
My question is, why am I getting a question mark followed by a space in the name?
#!/bin/bash
directory=$(pwd)
name=$(dmidecode|grep -i Address|sed 's/Address://')
inxi -Fn > $directory/"$name".txt
The quotes in the "$name".txt is to avoid an "ambiguous redirect" error i got when running the script.
Update @Just Somebody
root@server:/home/user/Desktop# dmidecode | sed -n 's/Address://p'
0xE0000
root@server:/home/user/Desktop#
Solution
Piping through sed -n 's/^.*Address:.*0x/0x/p' got rid of the "? " in the file name 0xE0000.txt.
A big thanks to everyone!
You've got a nonprinting char in there. Try:
dmidecode |grep -i Address|sed 's/Address://'| od -c
to see exactly what you're getting.
UPDATE: comments indicate there's a tab char in there that needs to be cleaned out.
UPDATE 2: the leading tab is before the word Address. Try:
name=$(dmidecode |grep -i Address|sed 's/^.*Address:.*0x/0x/')
or as @just_somebody points out:
name=$(dmidecode|sed -n 's/^.*Address:.*0x/0x/p')
UPDATE 3
This changes the substitution regex to replace
^ (start of line) followed by .* (any characters (including tab!)) followed by Address: followed by .* (any characters (including space!)) followed by 0x (which are always at the beginning of the address since it's in hex)
with
0x (because you want that as part of the result)
If you want to learn more, read about sed regular expressions and substitutions.
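The effect can be reproduced without root access by feeding sed a simulated dmidecode line. The sample line is an assumption based on the comments (a tab before Address: and a space after the colon); od -c should show the \t and the space that end up as the "? " in the file name.

```shell
# Simulated dmidecode line (assumed layout): tab, "Address:", space, value.
sample=$(printf '\tAddress: 0xE0000')

# Original pipeline: the tab and the space after the colon survive.
printf '%s\n' "$sample" | sed 's/Address://' | od -c

# Fixed pipeline from the answer: everything up to the "0x" is discarded.
name=$(printf '%s\n' "$sample" | sed -n 's/^.*Address:.*0x/0x/p')
printf '[%s]\n' "$name"    # -> [0xE0000]
```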

Applying a patch to files with spaces in names

Here's the output of diff -u "temp temp/docs 1.txt" "temp temp/docs 2.txt":
--- temp temp/docs 1.txt Mon Apr 7 16:15:08 2014
+++ temp temp/docs 2.txt Mon Apr 7 16:18:45 2014
@@ -2,6 +2,6 @@
 22
 333
 4444
-555555
+55555
 666666
 7777777
However, feeding this diff to patch -u fails with the following message:
can't find file to patch at input line 3
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------
|--- temp temp/docs 1.txt Mon Apr 7 16:15:08 2014
|+++ temp temp/docs 2.txt Mon Apr 7 16:18:45 2014
--------------------------
Apparently, the spaces are the problem; is there a way to make patch work on files with spaces in their names?
No, GNU patch doesn't support this. Here's the official statement: http://www.gnu.org/software/diffutils/manual/html_node/Unusual-File-Names.html#Unusual%20File%20Names
GNU patch 2.6.1 (Linux) seems to handle at least one space (I haven't tried more) if the filename is separated from the date by a tab.
YMMV
I encountered the same problem when trying to establish conventions for doing manual version control with diff and patch.
I found out that GNU "diff" creates quoted path names in the patch headers if they contain spaces, while BusyBox "diff" doesn't.
Neither GNU nor BusyBox "patch" accepts quoted path names.
If the problem is just embedded spaces within filenames, it can therefore be avoided by using "busybox patch" rather than GNU "patch".
Another solution is to postprocess the output of GNU "diff" before feeding it into "patch":
sed 's,^\([-+]\{3\} \)"\([^"]*\)",\1\2,' "$PATCHFILE" | patch -p1
This works whether $PATCHFILE was created with GNU or busybox diff, but will only work with unified diff format.
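A small sketch of what that sed expression does to the header lines, wrapped in a hypothetical function unquote_patch_headers; the header lines below are made-up examples with a tab before the timestamp.

```shell
# Strip the double quotes around path names on "--- " and "+++ " lines only
# (same sed expression as above); all other lines pass through untouched.
unquote_patch_headers() {
  sed 's,^\([-+]\{3\} \)"\([^"]*\)",\1\2,' "$@"
}

# Hypothetical quoted headers, e.g. as GNU diff may write them:
printf '%s\t%s\n' \
  '--- "a/temp temp/docs 1.txt"' '2014-04-07 16:15:08' \
  '+++ "b/temp temp/docs 2.txt"' '2014-04-07 16:18:45' |
  unquote_patch_headers
```

One caveat: a removed or added content line whose text itself starts with `-- "` or `++ "` also looks like a header to this expression and would be changed too.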
Unfortunately, it turns out that leading or trailing spaces in filenames cannot be preserved with this method, as "patch" will skip them when parsing the path names from the patch instructions.
Nor will the approach work if the filename starts with a literal double quote, but then, who uses such file names?
Most of the time, however, the above approach works just fine.
Finally a note of other approaches I have also tried but which did not work:
First I tried to replace the quotation of the whole path names by individually quoted path name components. This failed because "patch" does not use double quotes as meta-characters at all. It considers them to be normal literal characters.
Then I tried to replace all spaces by "\040" like CVS does - but "patch" does not seem to accept octal-escapes either, and this failed too.

Extracting IP addresses from text file with batch

I have a text file with data like this:
Aug 21 [10.23.5.5] Teardown dynamic
Aug 18 [10.150.1.45] Aug 21 15:28:34 otoldc
Aug 24 [10.96.5.10] Aug 21 2012 18:58:26 HYD
Aug 24 [10.96.5.10] Aug 22 2012 18:58:26 HYD
Aug 21 [192.168.15.231] sendmail[18831]
I need to remove everything except the IP addresses surrounded by "[" and "]". The string length before "[" is fixed; the string length after "]" varies.
I tried to use one of the existing solutions here but couldn't get it to work. Is it possible to do this using batch?
Thanks:-)
Directly from the command line: for /f "tokens=2 delims=[]" %F in (file.txt) do echo %F. Redirect as you wish. (Inside a batch file, write %%F instead of %F.)
Not as flexible as sed/awk & regexes, but it does not require external tools.
If you plan to put together something more complex, though, I would really look at more powerful tools; apart from the already mentioned awk or Perl, the natural choice on Windows would be PowerShell.
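On Linux or under Cygwin, the same delimiter trick as the for /f command (take the second token when splitting on [ and ]) can be done with two cut calls. A sketch; the function name extract_bracketed is hypothetical, and -s makes the first cut skip lines that contain no [ at all.

```shell
# Hypothetical helper: field 2 when splitting on "[" is the text after the
# first "["; the second cut keeps only what comes before the next "]".
extract_bracketed() {
  cut -s -d'[' -f2 "$@" | cut -d']' -f1
}

# Usage: extract_bracketed file.txt > ips.txt
```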
Install a version of sed if it's not already on your system.
$ sed -r -e 's/^[^[]*\[([^]\]*)].*/\1/' file.txt
10.23.5.5
10.150.1.45
10.96.5.10
10.96.5.10
192.168.15.231
This sed one-liner 'script' outputs each input line after removing everything from the lines except the contents inside the first set of [] square brackets on the line - it does not check those contents to make sure it matches an IP address.
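If you do want the contents checked, here is a minimal sketch that prints the first [...] group only when it is a dotted quad with every octet in 0-255, assuming (as in the sed version) that only the first bracket pair on each line matters. It sticks to POSIX awk functions (index(), substr(), split()); the function name extract_ipv4 is hypothetical.

```shell
# Hypothetical helper: print the first [...] group if it is a valid IPv4 address.
extract_ipv4() {
  awk '{
    s = index($0, "[")                  # position of the first "["
    if (s == 0) next
    rest = substr($0, s + 1)
    e = index(rest, "]")                # the next "]"
    if (e == 0) next
    ip = substr(rest, 1, e - 1)
    if (split(ip, oct, /\./) != 4) next
    for (i = 1; i <= 4; i++)            # each octet: digits only, <= 255
      if (oct[i] !~ /^[0-9]+$/ || oct[i] + 0 > 255) next
    print ip
  }' "$@"
}
```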
You tagged this as batch, so I assume this is on Windows and not Linux. All the same, I'd highly recommend you head over to Cygwin's website and download a copy. This will give you access to the cat and grep commands, which make this much simpler. Once you have Cygwin installed, you can run the following command to parse out the IP addresses from your log file.
grep -oE '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' your.log > ips.txt
Cheers

filename last modification date shell in script

I'm using bash to build a script where I will get a filename in a variable and then use this variable to get the file's Unix last modification date.
I need to get this modification date value and I can't use stat command.
Do you know any way to get it with the common available *nix commands?
Why you shouldn't use ls:
Parsing ls is a bad idea. Not only is the behaviour of certain characters in filenames undefined and platform-dependent, for your purposes, it'll mess with dates when they're six months in the past. In short, yes, it'll probably work for you in your limited testing. It will not be platform-independent (so no portability) and the behaviour of your parsing is not guaranteed given the range of 'legal' filenames on various systems. (Ext4, for example, allows spaces and newlines in filenames.)
Having said all that, personally, I'd use ls because it's fast and easy ;)
Edit
As pointed out by Hugo in the comments, the OP doesn't want to use stat. In addition, I should point out that the below section is BSD-stat specific (the %Sm flag doesn't work when I test on Ubuntu; Linux has a stat command, if you're interested in it read the man page).
So, a non-stat solution: use date
date, at least on Linux, has a flag: -r, which according to the man page:
display the last modification time of FILE
So, the scripted solution would be similar to this:
date -r "$MY_FILE_VARIABLE"
which would return you something similar to this:
zsh% date -r MyFile.foo
Thu Feb 23 07:41:27 CST 2012
To address the OP's comment:
If possible with a configurable date format
date has a rather extensive set of time-format variables; read the man page for more information.
I'm not 100% sure how portable date is across all 'UNIX-like systems'. On BSD-based systems (such as OS X) this will not work; the -r flag for BSD date does something completely different. The question doesn't specify exactly how portable a solution is required to be. For a BSD-based solution, see the below section ;)
A better solution, BSD systems (tested on OS X, using BSD-stat; GNU stat is slightly different but could be made to work in the same way).
Use stat. You can format the output of stat with the -f flag, and you can select to display only the file modification data (which, for this question, is nice).
For example, stat -f "%m%t%Sm %N" ./*:
1340738054 Jun 26 21:14:14 2012 ./build
1340738921 Jun 26 21:28:41 2012 ./build.xml
1340738140 Jun 26 21:15:40 2012 ./lib
1340657124 Jun 25 22:45:24 2012 ./tests
Where the first bit is the UNIX epoch time, the date is the file modification time, and the rest is the filename.
Breakdown of the example command
stat -f "%m%t%Sm %N" ./*
stat -f: call stat, and specify the format (-f).
%m: The UNIX epoch time.
%t: A tab separator in the output.
%Sm: S says to display the output as a string, m says to use the file modification data.
%N: Display the name of the file in question.
A command in your script along the lines of the following:
stat -f "%Sm" "$FILE_VARIABLE"
will give you output such as:
Jun 26 21:28:41 2012
Read the man page for stat for further information; timestamp formatting is done by strftime.
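Since GNU stat differs, here is a minimal sketch of a wrapper that works with either, assuming you are allowed to call stat at all (the question rules it out). GNU coreutils stat takes -c %Y for the epoch time; BSD/macOS stat takes -f %m, as above. The function name mtime_epoch is hypothetical.

```shell
# Hypothetical helper: print a file's modification time in seconds since
# the epoch. Try the GNU form first: on GNU stat, -f means --file-system,
# so the BSD form would not do what you want there.
mtime_epoch() {
  stat -c %Y -- "$1" 2>/dev/null || stat -f %m -- "$1"
}
```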
Have Perl?
perl -MFile::stat -e "print scalar localtime stat('FileName.txt')->mtime"
How about:
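find "$DIR" -maxdepth 1 -name "$FILE" -printf '%Tc\n'
Here $DIR and $FILE hold the directory and the file name; don't call the directory variable PATH, since that would overwrite the shell's command search path. -printf is a GNU find extension; see the find manpage for other values you can use with %T.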
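%Tc uses the locale's default date format. For a fixed layout, %T can instead be followed by individual strftime-style letters. A sketch, assuming GNU find; mtime_of, dir and file are hypothetical names, and -mindepth 1 keeps find from also matching the starting directory itself.

```shell
# Hypothetical helper (GNU find only): print the modification time of
# file "$2" in directory "$1" as YYYY-MM-DD HH:MM.
mtime_of() {
  dir=$1 file=$2
  find "$dir" -mindepth 1 -maxdepth 1 -name "$file" \
    -printf '%TY-%Tm-%Td %TH:%TM\n'
}
```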
You can use the date command with the desired format option:
date +%Y-%m-%d -r /root/foo.txt
2013-05-27
date +%H:%M -r /root/foo.txt
23:02
You can use ls -l which lists the last modification time, and then use cut to cut out the modification date:
mod_date=$(ls -l "$file_name" | cut -c35-46)
This works on my system because the date appears between columns 35 to 46. You might have to play with it on your system.
The date is in two different formats:
Mmm dd hh:mm
Mmm dd yyyy
Files modified more than six months ago (or with a timestamp in the future) will have the latter format; more recent files will have the first format. You could search for a ":" to tell which format you have:
if echo "$mod_date" | grep -q ":"
then
echo "File was modified within the last six months"
else
echo "File was modified more than six months ago"
fi
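Picking columns by character position (cut -c35-46) breaks as soon as the owner name or file size changes width. A sketch that selects the date by field number with awk instead, assuming the usual ls -l layout in the C locale, where fields 6 to 8 are month, day, and time-or-year; it still inherits the other caveats about parsing ls. The function name ls_mod_date is hypothetical.

```shell
# Hypothetical helper: print month, day and time-or-year from ls -l
# by field number. LC_ALL=C keeps the POSIX date layout; -d lists a
# directory itself rather than its contents.
ls_mod_date() {
  LC_ALL=C ls -ld -- "$1" | awk '{ print $6, $7, $8 }'
}

# Usage: mod_date=$(ls_mod_date "$file_name")
```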

Resources