How to filter domain in content?


How can I filter the domains out of some content?
For example, I have some text content like this:
dropwox.com N/A $ 8.95 1 day ago
lute.info N/A $ 8.95 1 week ago
zolpidem4sleep.com N/A $ 8.95 1 week ago
youredmedsinfo.com N/A $ 8.95 1 week ago
youngsmhs.com N/A $ 8.95 1 week ago
jsntcj.com N/A $ 8.95 1 week ago
fioricetdirect2k.com 13,133,796 $ 8.95 1 week ago
dapoxetinebuynow.com N/A $ 8.95 1 week ago
86620000.com N/A $ 8.95 1 week ago
spidvid.com 1,884,910 $ 480.00 1 week ago
titsforall.com 20,318,475 $ 8.95 1 week ago
and I just need to extract the domains, so that the list looks like this:
dropwox.com
lute.info
zolpidem4sleep.com
youredmedsinfo.com
youngsmhs.com
Is there any tool or online converter that does this?
Please help.

If a shell solution is OK, you can do something like this:
cut -d' ' -f1 file | sort | uniq
cut gets the first word of each line (there are several other ways to do that)
sort them so that ...
uniq can filter out the duplicates
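As a minimal sketch of the pipeline above: the function name unique_first_fields and the sample file are illustrative and not part of the answer. sort -u sorts and removes duplicates in one step, which gives the same result as sort | uniq.

```shell
# Hypothetical helper: print each distinct first field of a space-separated file.
unique_first_fields() {
    # cut keeps everything before the first space; sort -u sorts and drops duplicates
    cut -d' ' -f1 "$1" | sort -u
}

# Sample rows from the question, with one row repeated on purpose
sample=$(mktemp)
cat > "$sample" <<'EOF'
dropwox.com N/A $ 8.95 1 day ago
lute.info N/A $ 8.95 1 week ago
dropwox.com N/A $ 8.95 1 day ago
EOF

unique_first_fields "$sample"
rm -f "$sample"
```

The sort step matters because uniq only collapses duplicate lines that are adjacent.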

That is an old question, but why not answer for coming generations?
If you use macOS or Linux, there are a bunch of tools:
$ cat full_data.txt
dropwox.com N/A $ 8.95 1 day ago
lute.info N/A $ 8.95 1 week ago
zolpidem4sleep.com N/A $ 8.95 1 week ago
...
You may use any of the following:
sed: remove everything from the first space onward:
$ sed 's/ .*//' full_data.txt > domains.txt
grep: with a regular expression, keep everything from the beginning of the line (^) up to the first whitespace character:
$ grep -o "^\S\+" full_data.txt > domains.txt
cut: pick the first field, with space as the delimiter:
$ cut -d' ' -f1 full_data.txt > domains.txt
awk: my beloved awk; split each line on whitespace and print the first field:
$ awk '{print $1}' full_data.txt > domains.txt
Also Perl, which does the same, taking the first field line by line:
$ perl -lane 'print $F[0]' full_data.txt > domains.txt
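If the content might also contain header rows or other lines that do not start with a domain, one possible refinement is to print the first field only when it looks like a host name. The sketch below assumes a deliberately loose pattern (dot-separated labels of letters, digits and hyphens); the function name first_field_domains is made up for illustration.

```shell
# Hypothetical helper: print the first field only when it looks like a host name
# (labels of letters, digits or hyphens, joined by dots). This pattern is an
# assumption for illustration, not a full domain-name validator.
first_field_domains() {
    awk '$1 ~ /^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/ { print $1 }' "$@"
}

printf '%s\n' \
    'Domain Rank Price Listed' \
    'lute.info N/A $ 8.95 1 week ago' \
    '86620000.com N/A $ 8.95 1 week ago' |
    first_field_domains
```

With no file argument the function reads standard input, so it can sit at the end of a pipe or be called as first_field_domains full_data.txt.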

Related

How to print lines between 2 values using tail & head and pipe?

For example: how can I print specific lines of a .txt file, say lines 5 through 8, using only tail and head?

infile.txt contains a numerical value on each line.
➜ X=3
➜ Y=10
➜ < infile.txt tail -n +"$X" | head -n "$((Y - X))"
3
4
5
6
7
8
9
➜
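Note that the snippet above prints Y - X lines, that is, lines X through Y - 1 (3 through 9 in the run shown). If both ends should be included, as in "lines 5 through 8", the count needs one more line. A minimal sketch, with print_range as a hypothetical helper name:

```shell
# Hypothetical helper: print lines FIRST through LAST (inclusive) of standard input.
print_range() {
    first=$1
    last=$2
    # tail -n +N starts output at line N; head then keeps last-first+1 lines
    tail -n +"$first" | head -n "$((last - first + 1))"
}

# Prints 5, 6, 7 and 8, one per line
seq 1 12 | print_range 5 8
```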

displaying 2 outputs as a 2 separate columns | bash

I have two outputs from 2 commands:
comm1=`ip a | grep ens | grep -v lo | cut -d' ' -f2`
output example:
>eth1
and command two
comm2=`ip a | grep inet| grep -v inet6 | grep -v 127 | cut -d' ' -f6`
output example:
>123.156.789
234.167.290
148.193.198
138.25.49
142.137.154
125.175.166
246.173.7
154.167.67
Desired output:
echo "$comm1 $comm2"
> eth1 123.156.789
234.167.290
148.193.198
138.25.49
142.137.154
125.175.166
246.173.7
154.167.67
If both were single-line outputs, column -t would work just fine:
echo "$comm1 $comm2" | column -t
but in this case, where one of the columns is multi-line, it does not work.
I'm looking for an efficient solution.

You can use the paste command with process substitution for this, where comm1 and comm2 stand for the two commands themselves, e.g.:
$ paste <(comm1) <(comm2)
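In the question, comm1 and comm2 are variables that already hold the output, so they have to be printed before paste can read them. In bash that could be written as paste <(printf '%s\n' "$comm1") <(printf '%s\n' "$comm2"). The sketch below is a variant that avoids process substitution by using a temporary file and standard input; the helper name two_columns and the sample values are assumptions for illustration.

```shell
# Hypothetical helper: print two multi-line strings side by side, tab-separated.
two_columns() {
    tmp=$(mktemp)
    printf '%s\n' "$1" > "$tmp"
    # "-" tells paste to read the second column from standard input
    printf '%s\n' "$2" | paste "$tmp" -
    rm -f "$tmp"
}

comm1='eth1'
comm2='123.156.789
234.167.290
148.193.198'

two_columns "$comm1" "$comm2"
```

Rows where the first column has run out start with a tab, so the second column stays aligned.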

You might want the paste command.
$ seq 1 3 > a.txt
$ seq 5 10 > b.txt
$ paste a.txt b.txt
1 5
2 6
3 7
8
9
10

How to grep only two words in a line in file between them specific number of random words present

Given a file with this content:
Feb 1 ohio a1 rambo
Feb 1 ny a1 sandy
Feb 1 dc a2 rambo
Feb 2 alpht a1 jazzy
I only want the count of those lines containing Feb 1 and rambo.

You can use awk to do this more efficiently:
$ awk '/Feb 1/ && /rambo/' file
Feb 1 ohio a1 rambo
Feb 1 dc a2 rambo
To count matches:
$ awk '/Feb 1/ && /rambo/ {sum++} END{print sum}' file
2
Explanation
awk '/Feb 1/ && /rambo/' is saying: match all lines in which both Feb 1 and rambo are matched. When this evaluates to True, awk performs its default behaviour: print the line.
awk '/Feb 1/ && /rambo/ {sum++} END{print sum}' does the same, except that instead of printing the line it increments the variable sum. Once the file has been fully scanned, awk runs the END block, which prints the value of sum.
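Two edge cases may be worth handling: /Feb 1/ also matches dates such as Feb 10, and an unset sum prints as an empty line when nothing matches. A minimal sketch that compares the date fields exactly and forces a numeric result, assuming the month and day are always the first two fields (count_feb1_rambo is a hypothetical name):

```shell
# Hypothetical helper: count lines dated exactly "Feb 1" that also contain "rambo".
count_feb1_rambo() {
    # n + 0 makes awk print 0 instead of an empty string when nothing matched
    awk '$1 == "Feb" && $2 == "1" && /rambo/ { n++ } END { print n + 0 }' "$@"
}

printf '%s\n' \
    'Feb 1 ohio a1 rambo' \
    'Feb 1 ny a1 sandy' \
    'Feb 1 dc a2 rambo' \
    'Feb 10 tx a1 rambo' |
    count_feb1_rambo
```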

Is Feb 1 always before rambo? If yes:
grep -c "Feb 1 .* rambo" file

Try this, as per @Marc's suggestion:
grep 'Feb 1.*rambo' file | wc -l
If the order of the two strings is not guaranteed to be as in the question, the following command is useful:
grep 'rambo' file | grep 'Feb 1' | wc -l
The output will be:
2

The awk solution is probably clearer, but this is a nice sed technique:
sed -n '/Feb 1/{/rambo/p; }' file | wc -l

Cannot get this simple sed command

This sed command is described as follows
Delete the cars that are $10,000 or more. Pipe the output of the sort into a sed to do this, by quitting as soon as we match a regular expression representing 5 (or more) digits at the end of a record (DO NOT use repetition for this):
So far the command is:
$ grep -iv chevy cars | sort -nk 5
I think I have to add another pipe at the end of that command, one which "quits as soon as we match a regular expression representing 5 or more digits at the end of a record".
I tried things like
$ grep -iv chevy cars | sort -nk 5 | sed "/[0-9][0-9][0-9][0-9][0-9]/ q"
and other variations within the // but nothing works! What is the command which matches a regular expression representing 5 or more digits and quits according to this question?

Nominally, you should add a $ before the second / to match 5 digits at the end of the record. If you omit the $, then any sequence of 5 digits will cause sed to quit, so if there is another number (a VIN, perhaps) before the price, it might match when you didn't intend it to.
grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/q'
On the whole, it's safer to use single quotes around the regex, unless you need to substitute a shell variable into it (or unless the regex contains single quotes itself). You can also specify the repetition:
grep -iv chevy cars | sort -nk 5 | sed '/[0-9]\{5,\}$/q'
The \{5,\} part matches 5 or more digits. If for any reason that doesn't work, you might find you're using GNU sed and you need to do something like sed --posix to get it working in the normal mode. Or you might be able to just remove the backslashes. There certainly are options to GNU sed to change the regex mechanism it uses (as there are with GNU grep too).
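One detail worth checking: with automatic printing on, sed's q command prints the current line before exiting, so the first record with a five-digit price would still appear in the output. A minimal sketch that suppresses automatic printing and prints explicitly, so that the matching record is dropped as well. The helper name and the sample records are made up, since the question does not show the cars file.

```shell
# Hypothetical helper: copy sorted records until one ends in five digits,
# then stop without printing that record.
under_five_digits() {
    # -n turns off automatic printing; q quits on the first match, p prints the rest
    sed -n -e '/[0-9][0-9][0-9][0-9][0-9]$/q' -e 'p'
}

# Made-up records, already sorted by the price in field 5
printf '%s\n' \
    'ford mustang 65 45 9999' \
    'plym fury 77 73 2500' \
    'ford thundbd 84 10 17000' \
    'honda accord 81 30 6000' |
    sort -nk 5 |
    under_five_digits
```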

Another way.
Since you didn't post a sample file, I made one up as a guess.
Here I'm looking for lines containing the word "chevy" where field 5 is less than 10000.
awk '/chevy/ {if ( $5 < 10000 ) print $0} ' cars
I forgot grep's -i flag, so the correct version (IGNORECASE is specific to GNU awk) is:
awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
$ cat > cars
Chevy 2 3 4 10000
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 10000
CHEVY 2 3 4 2000
Prevy 2 3 4 1000
Prevy 2 3 4 10000
$ awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 2000
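Since IGNORECASE only has an effect in GNU awk, a version that should behave the same under other awk implementations can lowercase the line before matching. This is a sketch using the same made-up data as above; cheap_chevys is a hypothetical name.

```shell
# Hypothetical helper: case-insensitive match on "chevy" with field 5 below 10000.
cheap_chevys() {
    # tolower() is standard awk; $5 + 0 forces a numeric comparison
    awk 'tolower($0) ~ /chevy/ && $5 + 0 < 10000' "$@"
}

printf '%s\n' \
    'Chevy 2 3 4 10000' \
    'Chevy 2 3 4 5000' \
    'chEvy 2 3 4 1000' \
    'CHEVY 2 3 4 2000' \
    'Prevy 2 3 4 1000' |
    cheap_chevys
```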

grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/d'

Parsing string with grep

I need some help with parsing a string in Linux.
I have a string:
[INFO] Total time: 2 minutes 8 seconds
and want to get only
2 minutes 8 seconds

Using grep:
$ echo '[INFO] Total time: 2 minutes 8 seconds' | grep -o '[[:digit:]].*$'
2 minutes 8 seconds
Or sed:
$ echo '[INFO] Total time: 2 minutes 8 seconds' | sed 's/.*: //'
2 minutes 8 seconds
Or awk:
$ echo '[INFO] Total time: 2 minutes 8 seconds' | awk -F': ' '{print $2}'
2 minutes 8 seconds
Or cut:
$ echo '[INFO] Total time: 2 minutes 8 seconds' | cut -d: -f2
2 minutes 8 seconds
And then read sed & awk, Second Edition.

The sed and perl options do work, but in this trivial case, I'd prefer
echo "[INFO] Total time: 2 minutes 8 seconds" | cut -d: -f2
If you have something against spaces, you can just use
echo "[INFO] Total time: 2 minutes 8 seconds" | cut -d: -f2 | xargs
or even...
echo "[INFO] Total time: 2 minutes 8 seconds" | cut -d: -f2 | cut -c2-
PS. Trivia: you could do this with grep only if grep implemented positive lookbehind like this egrep -o '(?<=: ).*'; Unfortunately neither POSIX extended regex nor GNU extended regex implement lookbehind (http://www.regular-expressions.info/refflavors.html)
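For what it's worth, GNU grep builds that include PCRE support accept -P, which does understand lookbehind. The sketch below probes for that option first and falls back to sed where it is missing; the helper name after_colon_space is hypothetical, and the claim that -P is available depends on how grep was built on your system.

```shell
# Hypothetical helper: print whatever follows the first ": " on each input line.
after_colon_space() {
    if printf 'k: v\n' | grep -oP '(?<=: ).*' >/dev/null 2>&1; then
        # PCRE lookbehind: match only the text preceded by ": "
        grep -oP '(?<=: ).*'
    else
        # Fallback for greps without -P support
        sed -n 's/^[^:]*: //p'
    fi
}

echo '[INFO] Total time: 2 minutes 8 seconds' | after_colon_space
```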

If the line prefix is always the same, simply use sed and replace the prefix with an empty string:
sed 's/\[INFO\] Total time: //'
Assuming that the time is always the last thing in a line after a colon, use the following regex (replace each line with everything after the colon):
sed 's/^.*: \(.*\)$/\1/'

If you prefer AWK then it is quite simple
echo "[INFO] Total time: 2 minutes 8 seconds" | awk -F": " '{ print $2 }'

Use sed or perl:
echo "[INFO] Total time: 2 minutes 8 seconds" | sed -e 's/^\[INFO\] Total time:\s*//'
echo "[INFO] Total time: 2 minutes 8 seconds" | perl -pe "s/^\[INFO\] Total time:\s*//;"

If you are getting the info from the terminal then you can grep out the info and use cut with the delimiter to remove everything before the info you want.
grep INFO | cut -f2 -d:
If you want the info out of a file then you can grep the file
grep INFO somefilename | cut -f2 -d:
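Two things may be worth adjusting in the pipeline above: cut -f2 leaves the space that follows the colon, and it drops anything after a second colon. A minimal sketch that keeps all remaining fields and trims the leading spaces; the helper name info_values and the sample lines are assumptions for illustration.

```shell
# Hypothetical helper: print the text after the first colon on INFO lines,
# without the leading spaces that cut leaves behind.
info_values() {
    # -f2- keeps later fields too, so values containing ':' are not truncated
    grep 'INFO' "$@" | cut -d: -f2- | sed 's/^ *//'
}

printf '%s\n' \
    '[INFO] Total time: 2 minutes 8 seconds' \
    '[DEBUG] ignored: yes' \
    '[INFO] Finished at: 12:30:05' |
    info_values
```

Called with a file name, as in info_values somefilename, it reads the file instead of standard input.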
