Find text in files and get the needed content - Linux

I have many access_log files. Here is a line from one of them:
access_log.20111215:111.222.333.13 - - [15/Dec/2011:05:25:00 +0900] "GET /index.php?uid=01O9m5s23O0p&p=nutty&a=check_promotion&guid=ON HTTP/1.1" 302 - "http://xxx.com/index.php?uid=xxx&p=mypage&a=index&sid=&fid=&rand=1681" "Something/2.0 qqq(xxx;yyy;zzz)" "-" "-" 0
How do I extract the uid "01O9m5s23O0p" from the lines that contain "p=nutty&a=check_promotion" and write it to a new file?
For example, the "output.txt" file should be:
01O9m5s23O0p
01O9m5s0999p
01O9m5s3249p
fFDSFewrew23
SOMETHINGzzz
...
I tried:
grep "p=nutty&a=check_promotion" access* > using_grep.out
and
fgrep -o "p=nutty&a=check_promotion" access* > using_fgrep.out
but they print the whole line. I just want the uid.
Summary:
1) Find the lines which have "p=nutty&a=check_promotion"
2) Extract uid from those lines.
3) Print them to a file.

You can do exactly that, in three stages:
(formatted to avoid the scroll)
grep 'p=nutty&a=check_promotion' access* \
| grep -o '[[:alnum:]]\{4\}m5s[[:alnum:]]\{4\}p' \
> output.txt
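To sanity-check the pipeline, here is a sketch run against the sample line from the question (shortened). Note that the uid pattern is tied to the 4-character/m5s/4-character/p shape, so uids like fFDSFewrew23 from the expected output would need the awk approach instead:

```shell
# Sample line from the question, shortened; single-quoted so the embedded
# double quotes survive.
line='access_log.20111215:111.222.333.13 - - [15/Dec/2011:05:25:00 +0900] "GET /index.php?uid=01O9m5s23O0p&p=nutty&a=check_promotion&guid=ON HTTP/1.1" 302'

# Stage 1: keep only matching lines; stage 2: cut out the uid-shaped token.
uid=$(printf '%s\n' "$line" \
  | grep 'p=nutty&a=check_promotion' \
  | grep -o '[[:alnum:]]\{4\}m5s[[:alnum:]]\{4\}p')
echo "$uid"    # 01O9m5s23O0p
```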

If the lines containing p=nutty&a=check_promotion are all similar in structure, then we can set the field delimiters and use awk to extract the uid and write it to a file.
awk -v FS="[?&=]" '
$0~/p=nutty&a=check_promotion/{ print $3 > "output_file"}' input_file
Test:
[jaypal:~/Temp] cat file
access_log.20111215:210.136.161.13 - - [15/Dec/2011:05:25:00 +0900] "GET /index.php?uid=01O9m5s23O0p&p=nutty&a=check_promotion&guid=ON HTTP/1.1" 302 - "http://xxx.com/index.php?uid=xxx&p=mypage&a=index&sid=&fid=&rand=1681" "Something/2.0 qqq(xxx;yyy;zzz)" "-" "-" 0
[jaypal:~/Temp] awk -v FS="[?&=]" '
$0~/p=nutty&a=check_promotion/{ print $3 > "output_file"}' input_file
[jaypal:~/Temp] cat output_file
01O9m5s23O0p

Related

How to export each part of a line of text file to its own file?

I have these output values of an Arduino sensor saved to a text file like this:
9 P2.5=195.60 P10=211.00
10 P2.5=195.70 P10=211.10
11 P2.5=195.70 P10=211.10
2295 P2.5=201.20 P10=218.20
2300 P2.5=201.40 P10=218.40
...
...
And I want to extract each column to its own text file.
Expected output: three text files, Number.txt, P25.txt and P10.txt, where
Number.txt contains
9
10
11
2295
2300
P25.txt contains
195.60
195.70
195.70
201.20
201.40
and P10.txt contains
211.00
211.10
211.10
218.20
218.40
PS: the file has more than just 5 lines, so the code should be applied to every line.
Here is how you could do it:
$ grep -Po '^[0-9.]+' data.txt > Number.txt
$ grep -Po '(?<=P2\.5=)[0-9.]+' data.txt > P25.txt
$ grep -Po '(?<=P10=)[0-9.]+' data.txt > P10.txt
^: Assert position at the start of the line.
[0-9.]+: Matches a digit or a dot, one or more times, as many as possible (greedy).
(?<=): Positive lookbehind.
P2\.5=: Matches P2.5=.
P10=: Matches P10=.
-o: Print only matching part.
-P: Perl style regex.
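A quick way to see the lookbehinds in action on one sample line (a sketch; it assumes your grep supports -P, i.e. GNU grep):

```shell
sample='9 P2.5=195.60 P10=211.00'

# Each grep prints only the matched part (-o) of its lookbehind pattern (-P).
printf '%s\n' "$sample" | grep -Po '^[0-9.]+'             # 9
printf '%s\n' "$sample" | grep -Po '(?<=P2\.5=)[0-9.]+'   # 195.60
printf '%s\n' "$sample" | grep -Po '(?<=P10=)[0-9.]+'     # 211.00
```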
Alternatively, use awk, which can open output files itself rather than relying on shell redirection:
awk '{sub(/P2\.5=/, "", $2);
sub(/P10=/, "", $3);
print $1 > "Number.txt";
print $2 > "P25.txt";
print $3 > "P10.txt"; }' data.txt
or
awk '{print $1 > "Number.txt";
print substr($2, 6) > "P25.txt";
print substr($3, 5) > "P10.txt"; }' data.txt
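To try the substr variant end to end, here is a sketch that writes into a temporary directory; the two sample lines are taken from the question:

```shell
cd "$(mktemp -d)"   # work in a scratch directory

printf '%s\n' '9 P2.5=195.60 P10=211.00' '10 P2.5=195.70 P10=211.10' > data.txt

awk '{print $1 > "Number.txt";
      print substr($2, 6) > "P25.txt";              # strip the 5-char prefix "P2.5="
      print substr($3, 5) > "P10.txt"}' data.txt    # strip the 4-char prefix "P10="

cat Number.txt   # 9, then 10
cat P25.txt      # 195.60, then 195.70
```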

Is there a way to remove a line fully?

I'm using a one-line command to compile and print all of the animal names listed in a log file.
The WILD names are all listed in capital letters under the /wild directory.
The output should appear in the format of one name per line, with no duplicates:
ANT
BAT
CAT
I tried
grep 'wild' animal.txt | awk '{print $7}' | sed 's/[a-z0-9./]//g' | sort -u
It mostly showed what I want, but I also want to remove every string that contains special characters such as -, #, ? and %.
Below is a sample of the file animal.txt
191.21.66.100 - - [21/Aug/1995:05:17:57 -0400] "GET /wild/elvpage.htm#ZOO HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:35 -0400] "GET /wild/S/s_26s.jpg HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:41 -0400] "GET /wild/struct.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:34 -0400] "GET /wild/elvpage.htm HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:36 -0400] "GET /wild/endball.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:37 -0400] "GET /wild/hot.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:38 -0400] "GET /wild/elvhead3.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:38 -0400] "GET /wild/PEGASUS/minpeg1.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/SWAN/SWAN.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/ATLAS/atlas.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:40 -0400] "GET /wild/LIZARD/lizard.gif HTTP/1.0"
Below is a sample of my output after running the command:
ATLAS
ATLAS-
CAT_
DOG
%FACT
-KWM
?TIL-
#ZOO
Why not allow only capital A-Z and remove everything else:
grep 'wild' animal.txt | awk '{print $7}' | sed 's/[^A-Z]//g'
From your example input, this will return:
PEGASUS
DOGDOG
SWANSWAN
ATLAS
LIZARD
If needed, you can further clean up empty lines by appending | sed '/^$/d' and then sort the result.
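For instance, running the keep-only-capitals pipeline on one line of the sample (a sketch):

```shell
line='191.21.66.100 - - [01/Aug/1995:02:27:38 -0400] "GET /wild/PEGASUS/minpeg1.gif HTTP/1.0"'

# $7 is the request path; sed then deletes every character that is not A-Z.
name=$(printf '%s\n' "$line" | grep 'wild' | awk '{print $7}' | sed 's/[^A-Z]//g')
echo "$name"   # PEGASUS
```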
You can use a single GNU sed command:
sed -n 's!.*/wild/\([A-Z][A-Z]\+\)/.*!\1!p' animal.txt
Means:
-n: Do not print every line.
s!X!Y!: Substitute X with Y (! is used as the delimiter instead of /).
.*/wild/\([A-Z][A-Z]\+\)/.*: Match anything up to /wild/, then a capital letter followed by at least one more capital letter, then a / and the rest of the line. Capture (remember) the capital letters.
\1: Replace the whole match with the captured capital-letter sequence.
p: If there was a match, print the line.
Gives:
PEGASUS
DOG
SWAN
ATLAS
LIZARD
This might work for you (GNU sed):
sed -E '/.*\/wild\/[^A-Z ]*([A-Z]+).*/!d # delete lines with no uppercase letters
s//\1/ # remove everything but the uppercase letters
H # append the word to the hold space
$!d # delete all lines but the last
x # swap to the hold space
:a # label for the dedup loop
s/((\n[^\n]+).*)\2/\1/ # remove duplicates
ta # repeat until no substitution succeeds
s/.//' file # remove the introduced newline
GNU awk to get the result (note that each pattern and its action must sit on the same line):
grep 'wild' animal.txt | awk '
($0 = $7) {gsub(/\//, " ", $0)};  # keep only $7, then replace "/" with spaces so $0 splits into $1, $2, $3
(NF == 3 && length($2) > 2) {print $2}  # if the line now has three fields and $2 is longer than 2 characters, print it
' | sort -u
Answer:
grep 'wild' animal.txt | awk '
($0 = $7) {gsub(/\//, " ", $0)};
(NF == 3 && length($2) > 2) {print $2}' | sort -u
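Run against a few lines of the sample (a sketch; animal_sample.txt is an illustrative name), the answer above yields the deduplicated list:

```shell
cd "$(mktemp -d)"   # work in a scratch directory

cat > animal_sample.txt <<'EOF'
191.21.66.100 - - [01/Aug/1995:02:27:38 -0400] "GET /wild/PEGASUS/minpeg1.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:27:39 -0400] "GET /wild/DOG/DOG.gif HTTP/1.0"
191.21.66.100 - - [01/Aug/1995:02:22:35 -0400] "GET /wild/S/s_26s.jpg HTTP/1.0"
EOF

# DOG appears twice and is printed once; the one-letter "S" dir is skipped.
result=$(grep 'wild' animal_sample.txt | awk '
($0 = $7) {gsub(/\//, " ", $0)};
(NF == 3 && length($2) > 2) {print $2}' | sort -u)
printf '%s\n' "$result"
# DOG
# PEGASUS
```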

Get a combined list of filenames and their md5sums

I have a list of files listed 1 per line on some filein.txt like so:
mikesfile.php
ericsfile.php
subdir1/johnsfile.php
subdir1/davidsfile.php
subdir1/subdir2/ashleysfile.php
subdir1/subdir2/zoesfile.php
I need my bash script to read it line by line, md5sum each corresponding file, and then write the filename together with its md5 to a new file named fileout.txt.
For example:
e14086108b4d5d191c22b0a085694e4a - mikesfile.php
ebadb70de710217a7d4d4c9d114b8145 - ericsfile.php
b40bb5dfb23bf89b3011ff82d9cb0b0b - subdir1/johnsfile.php
d03e9b7306cb1f6c019b574437f54db0 - subdir1/davidsfile.php
f840a8d2ea7342303c807b6cb6339fd1 - subdir1/subdir2/ashleysfile.php
3560e05d5ccdad6900a5dfed1a4a8154 - subdir1/subdir2/zoesfile.php
I've been messing around with this:
while read line; do echo -n "$line" | md5sum; done; < filein > fileout
But it just dumps the md5 hash and completely omits the corresponding filenames. I've searched all over trying to remedy this, to no avail.
I'd very much appreciate your help in combining the two and properly writing them to the output file as shown. Many thanks in advance.
bash + awk solution:
while read -r fn; do
echo "$fn" | md5sum | awk -v fn="$fn" '{ print $0,fn }'
done < filein > fileout
$ cat fileout
1db757c4f098cebf93639f00e55bc88d - mikesfile.php
f063a35599d453721ada5d0c8fcc0185 - ericsfile.php
a8c3a721d12b432c94c23d463fb5a93f - subdir1/johnsfile.php
a4aa114d977c75153aac382574229d3a - subdir1/davidsfile.php
abd77236c393266115acda48ddb4f9a0 - subdir1/subdir2/ashleysfile.php
18ba7e37f42a837d33a7fb3e56a618b5 - subdir1/subdir2/zoesfile.php
Just a straight application of md5sum and printf will do it for you, e.g.
$ while read -r name; do
printf "%s %s %s\n" $(md5sum <<<"$name") "$name";
done <filein.txt
Output
1db757c4f098cebf93639f00e55bc88d - mikesfile.php
f063a35599d453721ada5d0c8fcc0185 - ericsfile.php
a8c3a721d12b432c94c23d463fb5a93f - subdir1/johnsfile.php
a4aa114d977c75153aac382574229d3a - subdir1/davidsfile.php
abd77236c393266115acda48ddb4f9a0 - subdir1/subdir2/ashleysfile.php
18ba7e37f42a837d33a7fb3e56a618b5 - subdir1/subdir2/zoesfile.php
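Note that both snippets above hash the filename string, not the file contents. If what you actually need is the checksum of each listed file's contents, md5sum can take the names straight from the list, e.g. via xargs (a sketch; it assumes GNU xargs, that the listed files exist, and that the names contain no embedded newlines):

```shell
cd "$(mktemp -d)"   # work in a scratch directory

printf 'hello\n' > mikesfile.php          # create a sample file to hash
printf 'mikesfile.php\n' > filein.txt     # the file list

# -d '\n' treats each line as one argument, so paths with spaces also work.
xargs -d '\n' md5sum < filein.txt > fileout.txt
cat fileout.txt   # b1946ac92492d2347c6235b4d2611184  mikesfile.php
```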

Capturing the value of a pattern in UNIX and writing it to another file

I have a file
#InboxPulse.jmx
request.threads3=10
request.loop=10
duration=300
request.ramp=6
#LaunchPulse.jmx
request.threads1=20
request.loop1=5
duration1=300
request.ramp1=6
#BankRetail.jmx
request.threads2=30
request.loop2=7
duration2=300
request.ramp2=6
I would like to capture the values for
request.threads2
request.threads1
request.threads3
into another file like this:
10
20
30
I tried this
awk '/request.threads[0-9]{1,10}=/{print $NF}' build.properties >> sum.txt
It gives the output as:
request.threads3=10
request.threads1=20
request.threads2=30
How can I get the desired output?
Your command printed whole lines because those lines contain no whitespace, so $NF is the entire record. Instead, split on the = sign, match on field 1, and print field 2:
awk -F'=' '$1 ~ /request.threads[0-9]+$/ {print $2}' build.properties >> sum.txt
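Against a cut-down copy of the sample build.properties, this prints exactly the thread values (a sketch):

```shell
cd "$(mktemp -d)"   # work in a scratch directory

cat > build.properties <<'EOF'
#InboxPulse.jmx
request.threads3=10
request.loop=10
duration=300
#BankRetail.jmx
request.threads2=30
EOF

# Only lines whose key ends in "threads<digits>" survive the $1 match.
vals=$(awk -F'=' '$1 ~ /request.threads[0-9]+$/ {print $2}' build.properties)
printf '%s\n' "$vals"
# 10
# 30
```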
1) Extracting values
$ grep -oP 'request.threads\d+=\K\d+' build.properties
10
20
30
Append > sum.txt to the command to save the output to a file.
2) If sum of those values is needed
$ perl -lne '($v)=/request.threads\d+=\K(\d+)/; $s+=$v; END{print $s}' build.properties
60
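If you'd rather stay in awk for the sum as well, the same field split works (a sketch):

```shell
# Accumulate field 2 of every matching line, print the total at the end.
sum=$(printf '%s\n' 'request.threads3=10' 'request.threads1=20' 'request.threads2=30' \
  | awk -F'=' '$1 ~ /request.threads[0-9]+$/ {s += $2} END {print s}')
echo "$sum"   # 60
```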

parse httpd log in bash

My httpd log has the following format:
123.251.0.000 - - [05/Sep/2014:18:19:24 -0700] "GET /myapp/MyService?param1=value1&param2=value2&param3=value3 HTTP/1.1" 200 15138 "-" "-"
I need to extract the following fields and display on a line:
IP value1 httpResponseCode(eg.200), dataLength
What's the most efficient way to do this in bash?
As you're using Linux, chances are that you also have GNU awk installed. If so:
$ awk 'match ($7, /param1=([^& ]*)/, m) { print $1, m[1], $9",", $10 }' http.log
gives:
123.251.0.000 value1 200, 15138
This works as long as value1 hasn't got an ampersand or space in it, which they shouldn't if the request has been escaped correctly.
$ cat tmp.txt
123.251.0.000 - - [05/Sep/2014:18:19:24 -0700] "GET /myapp/MyService?param1=value1&param2=value2&param3=value3 HTTP/1.1" 200 15138 "-" "-"
$ awk '{ print "IP", $1, $9, $10 }' tmp.txt
IP 123.251.0.000 200 15138
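Since the question asks about bash specifically: plain bash can do the same extraction without awk, using read for the whitespace-separated fields and =~ for param1 (a sketch; the field positions assume the exact log format shown):

```shell
line='123.251.0.000 - - [05/Sep/2014:18:19:24 -0700] "GET /myapp/MyService?param1=value1&param2=value2&param3=value3 HTTP/1.1" 200 15138 "-" "-"'

# Split on whitespace: field 1 is the IP, 7 the URL, 9 the status, 10 the length.
read -r ip _ _ _ _ _ url _ code len _ <<< "$line"

re='param1=([^&]+)'          # capture the value up to the next "&"
[[ $url =~ $re ]] && value=${BASH_REMATCH[1]}

echo "$ip $value $code, $len"   # 123.251.0.000 value1 200, 15138
```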
