How to remove everything before a specific string - linux

I need to get clean output from wireshark but recently i noticed that i've been getting outputs where my cut command doesn't work anymore
Data from wireshark
C.'¢¢#)ù¨.fEo³#(#¹³³÷÷P*,§ýý {P¹'GET /2015/dec/alltracks/playlist.m3u8
I used to use cut -f2 -d" " but now i notice some entries come with multiple spaces so my command fails.
How would I get rid of everything before GET, including the word GET? The goal is to get only /2015/dec/alltracks/playlist.m3u8

Use GNU grep in PCRE-mode enabled by -P flag and -o flag to print only the matching word.
grep -Po ".*GET \K(.*)" input-file
/2015/dec/alltracks/playlist.m3u8
Using a perl regEx
perl -nle 'print "$1" if /.*GET (.*)/' input-file
/2015/dec/alltracks/playlist.m3u8

You can do this by using the following regular expression
.*GET
Where .* recognises any characters (the * means zero or more) and GET recognises what you want including the leading space, so the output would be what you desire
sed "s/.*GET //g"
We would replace what we find with nothing (//).
s is for substitute while g is for global (which, depending on the case may or may not be necessary, although I'd recommend you to use it if you are willing to modify more than a line.

Related

Awk pattern always matches last record?

I'm in the process of switching from zsh to bash, and I need to produce a bash script that can remove duplicate entries in $PATH without reordering the entries (thus no sort -d magic). zsh has some nice array handling shortcuts that made it easy to do this efficiently, but I'm not aware of such shortcuts in bash. I came across this answer which has gotten me 90% of the way there, but there is a small problem that I would like to understand better. It appears that when I run that awk command, the last record processed incorrectly matches the pattern.
$ awk 'BEGIN{RS=ORS=":"}!a[$0]++' <<<"aa:bb:cc:aa:bb:cc"
aa:bb:cc:cc
$ awk 'BEGIN{RS=ORS=":"}!a[$0]++' <<<"aa:bb:cc:aa:bb"
aa:bb:cc:bb
$ awk 'BEGIN{RS=ORS=":"}!a[$0]++' <<<"aa:bb:cc:aa:bb:cc:" # note trailing colon
aa:bb:cc:
I don't understand awk well enough to know why it behaves this way, but I have managed to work around the issue by using an intermediate array like so.
array=($(awk 'BEGIN{RS=":";ORS=" "}!a[$0]++' <<<"aa:bb:cc:aa:bb:cc:"))
# Use a subshell to avoid modifying $IFS in current context
echo $(export IFS=":"; echo "${array[*]}")
aa:bb:cc
This seems like a sub-optimal solution however, so my question is: did I do something wrong in the awk command that is causing false positive matches on the final record processed?
The last record in your original string is cc\n which is different from cc. When unsure what's happening in any program in any language, adding some print statements is step 1 to debugging/investigating:
$ awk 'BEGIN{RS=ORS=":"} {print "<"$0">"}' <<<"aa:bb:cc:aa:bb:cc"
<aa>:<bb>:<cc>:<aa>:<bb>:<cc
>:$
If you want the RS to be : or \n then just state that (with GNU awk at least):
$ awk 'BEGIN{RS="[:\n]"; ORS=":"} !a[$0]++' <<<"aa:bb:cc:aa:bb:cc"
aa:bb:cc:$
The $ in all of the above is my prompt.
Another possible workaround instead of your bash array solution
$ echo "aa:bb:cc:aa:bb:cc" | tr ':' '\n' | awk '!a[$0]++' | paste -sd:
aa:bb:cc

how to remove first two words of a strings output

I want to remove the first two words that come up in my output string. this string is also within another string.
What I have:
for servers in `ls /data/field`
do
string=`cat /data/field/$servers/time`
This sends this text:
00:00 down server
I would like to remove "00:00 down" so that it only displays "server".
I have tried using cut -d ' ' -f2- $string which ends up just removing directories that the command searches.
Any ideas?
Please, do the things properly :
for servers in /data/field/*; do
string=$(cut -d" " -f3- /data/field/$servers/time)
echo "$string"
done
backticks are deprecated in 2014 in favor of the form $( )
don't parse ls output, use glob instead like I do with data/field/*
Check http://mywiki.wooledge.org/BashFAQ for various subjects
Use -d option to set the delimtier to space
$ echo 00:00 down server | cut -d" " -f3-
server
Note Use the field number 3 as the count starts from 1 and not 0
From man page
-d, --delimiter=DELIM
use DELIM instead of TAB for field delimiter
N- from N'th byte, character or field, to end of line
More Tests
$ echo 00:00 down server hello world| cut -d" " -f3-
server hello world
The for loop is capable of iterating through the files using globbing. So I would write something like
for servers in /data/field*
do
string=`cut -d" " -f3- /data/field/$servers/time`
...
...
You can use sed as well:
sed 's/^.* * //'
For the examples given, I prefer cut. But for the general problem expressed by the question, the answers above have minor short-comings. For instance, when you don't know how many spaces are between the words (cut), or whether they start with a space or not (cut,sed), or cannot be easily used in a pipeline (shell for-loop). Here's a perl example that is fast, efficient, and not too hard to remember:
| perl -pe 's/^\s*(\S+\s+){2}//'
Perl's -p operates like sed's. That is, it gobbles input one line at a time, like -n, and after dong work, prints the line again. The -e starts the command-line-based script. The script is simply a one-line substitute s/// expression; substitute matching regular expressions on the left hand side with the string on the right-hand side. In this case, the right-hand side is empty, so we're just cutting out the expression found on the left-hand side.
The regular expression, particular to Perl (and all PLRE derivatives, like those in Python and Ruby and Javascript), uses \s to match whitespace, and \S to match non-whitespace. So the combination of \S+\s+ matches a word followed by its whitespace. We group that sub-expression together with (...) and then tell sed to match exactly 2 of those in a row with the {m,n} expression, where n is optional and m is 2. The leading \s* means trim leading whitespace.

Line numbering in Grep

I have command in Grep:
cat nastava.html | grep '<td>[A-Z a-z]*</td><td>[0-9/]*</td>' | sed 's/[ \t]*<td>\([A-Z a-z]*\)<\/td><td>\([0-9]\{1,3\}\)\/[0-9]\{2\}\([0-9]\{2\}\)<\/td>.*/\1 mi\3\2 /'
|sort|grep -n ".*" | sed -r 's/(.*):(.*)/\1. \2/' >studenti.txt
I don't understand second line, sort is ok, grep -n means to num that sorted list, but why do we use here ".*"? It won't work without it, and i don't understand why.
The grep is used purely for the side effect of the line numbering with the -n option here, so the main thing is really to use a regular expression which matches all the input lines. As such, .* is not very elegant -- ^ would work without scanning every line, and $ trivially matches every line as well. Since you know the input lines are not empty, thus contain at least one character, the simple regular expression . would work perfectly, too.
However, as the end goal is to perform line numbering, a better solution is to use a dedicated tool for this purpose.
... | sort | nl -ba -s '. '
The -ba option specifies to number all lines (the default is to only add a line number to non-empty lines; we know there are no empty lines, so it's not strictly necessary here, but it's good to know) and the -s option specifies the separator string to put after the number.
A possible minor complication is that the line number format is whitespace-padded, so in the end, this solution may not work for you if you specifically want unpadded numbers. (But a sed postprocessor to fix that up is a lot simpler than the postprocessor for grep you have now -- just sed 's/^ *//' will remove leading whitespace).
... As an aside, the ugly cat | grep | sed pipeline can be abbreviated to just
sed -n 's%[ \t]*<td>\([A-Z a-z]*\)</td><td>\([0-9]\{1,3\}\)/[0-9]\{2\}\([0-9]\{2\}\)</td>.*%\1 mi\3\2 %p' nastava.html
The cat was never necessary in the first place, and the sed script can easily be refactored to only print when a substitution was performed (your grep regular expression was not exactly equivalent to the one you have in the sed script but I assume that was the intent). Also, using a different separator avoids having to backslash the slashes.
... And of course, if nastava.html is your own web page, the whole process is umop apisdn. You should have the students results in a machine-readable form, and generate a web page from that, rather than the other way around.
grep needs a regular expression to match. You can't run grep with no expression at all. If you want to number all the lines, just specify an expression that matches anything. I'd probably use ^ instead of .*.

Convert string to hexadecimal on command line

I'm trying to convert "Hello" to 48 65 6c 6c 6f in hexadecimal as efficiently as possible using the command line.
I've tried looking at printf and google, but I can't get anywhere.
Any help greatly appreciated.
Many thanks in advance,
echo -n "Hello" | od -A n -t x1
Explanation:
The echo program will provide the string to the next command.
The -n flag tells echo to not generate a new line at the end of the "Hello".
The od program is the "octal dump" program. (We will be providing a flag to tell it to dump it in hexadecimal instead of octal.)
The -A n flag is short for --address-radix=n, with n being short for "none". Without this part, the command would output an ugly numerical address prefix on the left side. This is useful for large dumps, but for a short string it is unnecessary.
The -t x1 flag is short for --format=x1, with the x being short for "hexadecimal" and the 1 meaning 1 byte.
If you want to do this and remove the spaces you need:
echo -n "Hello" | od -A n -t x1 | sed 's/ *//g'
The first two commands in the pipeline are well explained by #TMS in his answer, as edited by #James. The last command differs from #TMS comment in that it is both correct and has been tested. The explanation is:
sed is a stream editor.
s is the substitute command.
/ opens a regular expression - any character may be used. / is
conventional, but inconvenient for processing, say, XML or path names.
/ or the alternate character you chose, closes the regular expression and
opens the substitution string.
In / */ the * matches any sequence of the previous character (in this
case, a space).
/ or the alternate character you chose, closes the substitution string.
In this case, the substitution string // is empty, i.e. the match is
deleted.
g is the option to do this substitution globally on each line instead
of just once for each line.
The quotes keep the command parser from getting confused - the whole
sequence is passed to sed as the first option, namely, a sed script.
#TMS brain child (sed 's/^ *//') only strips spaces from the beginning of each line (^ matches the beginning of the line - 'pattern space' in sed-speak).
If you additionally want to remove newlines, the easiest way is to append
| tr -d '\n'
to the command pipes. It functions as follows:
| feeds the previously processed stream to this command's standard input.
tr is the translate command.
-d specifies deleting the match characters.
Quotes list your match characters - in this case just newline (\n).
Translate only matches single characters, not sequences.
sed is uniquely retarded when dealing with newlines. This is because sed is one of the oldest unix commands - it was created before people really knew what they were doing. Pervasive legacy software keeps it from being fixed. I know this because I was born before unix was born.
The historical origin of the problem was the idea that a newline was a line separator, not part of the line. It was therefore stripped by line processing utilities and reinserted by output utilities. The trouble is, this makes assumptions about the structure of user data and imposes unnatural restrictions in many settings. sed's inability to easily remove newlines is one of the most common examples of that malformed ideology causing grief.
It is possible to remove newlines with sed - it is just that all solutions I know about make sed process the whole file at once, which chokes for very large files, defeating the purpose of a stream editor. Any solution that retains line processing, if it is possible, would be an unreadable rat's nest of multiple pipes.
If you insist on using sed try:
sed -z 's/\n//g'
-z tells sed to use nulls as line separators.
Internally, a string in C is terminated with a null. The -z option is also a result of legacy, provided as a convenience for C programmers who might like to use a temporary file filled with C-strings and uncluttered by newlines. They can then easily read and process one string at a time. Again, the early assumptions about use cases impose artificial restrictions on user data.
If you omit the g option, this command removes only the first newline. With the -z option sed interprets the entire file as one line (unless there are stray nulls embedded in the file), terminated by a null and so this also chokes on large files.
You might think
sed 's/^/\x00/' | sed -z 's/\n//' | sed 's/\x00//'
might work. The first command puts a null at the front of each line on a line by line basis, resulting in \n\x00 ending every line. The second command removes one newline from each line, now delimited by nulls - there will be only one newline by virtue of the first command. All that is left are the spurious nulls. So far so good. The broken idea here is that the pipe will feed the last command on a line by line basis, since that is how the stream was built. Actually, the last command, as written, will only remove one null since now the entire file has no newlines and is therefore one line.
Simple pipe implementation uses an intermediate temporary file and all input is processed and fed to the file. The next command may be running in another thread, concurrently reading that file, but it just sees the stream as a whole (albeit incomplete) and has no awareness of the chunk boundaries feeding the file. Even if the pipe is a memory buffer, the next command sees the stream as a whole. The defect is inextricably baked into sed.
To make this approach work, you need a g option on the last command, so again, it chokes on large files.
The bottom line is this: don't use sed to process newlines.
echo hello | hexdump -v -e '/1 "%02X "'
Playing around with this further,
A working solution is to remove the "*", it is unnecessary for both the original requirement to simply remove spaces as well if substituting an actual character is desired, as follows
echo -n "Hello" | od -A n -t x1 | sed 's/ /%/g'
%48%65%6c%6c%6f
So, I consider this as an improvement answering the original Q since the statement now does exactly what is required, not just apparently.
Combining the answers from TMS and i-always-rtfm-and-stfw, the following works under Windows using gnu-utils versions of the programs 'od', 'sed', and 'tr':
echo "Hello"| tr -d '\42' | tr -d '\n' | tr -d '\r' | od -v -A n -tx1 | sed "s/ //g"
or in a CMD file as:
#echo "%1"| tr -d '\42' | tr -d '\n' | tr -d '\r' | od -v -A n -tx1 | sed "s/ //g"
A limitation on my solution is it will remove all double quotes (").
"tr -d '\42'" removes quote marks that the Windows 'echo' will include.
"tr -d '\r'" removes the carriage return, which Windows includes as well as '\n'.
The pipe (|) character must follow immediately after the string or the Windows echo will add that space after the string.
There is no '-n' switch to the Windows echo command.

Replacing a line in a csv file?

I have a set of 10 CSV files, which normally have a an entry of this kind
a,b,c,d
d,e,f,g
Now due to some error entries in this file have become of this kind
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Any command that you recommend that can replaces the erroneous lines in all the files.
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on #wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about trying to keep only lines which are matching the desired format instead of handling one exception ?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it, the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
Most simply:
$ grep -v ,,,, oldfile > newfile
$ mv newfile oldfile
yes, awk or grep are very good option if you are working in linux platform. However you can use perl regex for other platform. using join & split options.

Resources