Pattern matching from file in perl [closed] - linux

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist.
Closed 9 years ago.
I want to use a grep in perl, but I am confused here.
What I want to do is this. There is one file like:
this is abctemp1
this is temp2
this is temp3x
this is abctemp1
Now I want to extract the unique words matching the pattern 'temp' from this file, i.e. 'abctemp1', 'temp2', 'temp3x', and store them in an array. How do I do this?

use strict; use warnings;
my (@array, %seen);
/(\w*temp\w*)/ && !$seen{$1}++ && push @array, $1 while <DATA>;
print "$_\n" for @array;
__DATA__
this is abctemp1
this is temp2
this is temp3x
this is abctemp1

The words of each line are in @F; each word is pushed into @r if it contains temp and has not been seen yet:
perl -anE 'push @r, grep { /temp/ && !$s{$_}++ } @F}{ say for @r' file
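For a quick check from the shell, the same extraction can be done with grep -o plus awk's classic dedup idiom. A minimal sketch, assuming GNU grep for -o and a hypothetical file name words.txt:

```shell
# Recreate the sample file from the question (words.txt is an assumed name).
printf '%s\n' 'this is abctemp1' 'this is temp2' \
              'this is temp3x' 'this is abctemp1' > words.txt

# grep -o prints each match on its own line; awk keeps first occurrences only.
grep -o '[[:alnum:]_]*temp[[:alnum:]_]*' words.txt | awk '!seen[$0]++'
# abctemp1
# temp2
# temp3x
```

The `!seen[$0]++` trick is the same dedup logic as `!$seen{$1}++` in the Perl answer above: the hash lookup is false the first time a word appears and true on every repeat.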


Is there a way to consolidate similar (but not the same) rows in a text file? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a text file on a Linux box that has two columns:
1. An IP address
2. A code for a location
Some IP addresses are listed more than once because more than one code is associated with them.
Example:
140.90.218.62 vaac
140.90.220.11 aawu
140.90.220.11 afc
140.90.220.11 arh
140.90.220.40 afc
I would like to consolidate such IP addresses so each is listed only once, with its several location codes, like this:
140.90.218.62 vaac
140.90.220.11 aawu:afc:arh
140.90.220.40 afc
I could always code a for loop to read in the file, consolidate the values into an array, and write the cleaned-up version back out.
Before I do that, I was wondering if a combination of *nix utilities might do the job with less code.
Using awk
awk '{a[$1]=($1 in a?a[$1]":"$2:$2)}END{for (i in a) print i, a[i]}' file
Output:
140.90.220.11 aawu:afc:arh
140.90.220.40 afc
140.90.218.62 vaac
Explanation:
a[$1]=($1 in a?a[$1]":"$2:$2) - creates an associative array keyed by the IP address. Each $2 with the same IP is concatenated to the current value, separated by a colon if there's already a value.
for (i in a) print i, a[i] - when stdin closes, print every entry in a: the index (the IP) first, then all the values.
Bash version 4, with associative arrays:
declare -A data
while read -r ip value; do
data[$ip]+=":$value"
done < file
for key in "${!data[@]}"; do
printf "%s %s\n" "$key" "${data[$key]#:}"
done
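Run against the sample data, the loop above produces one line per IP. Note that the iteration order of an associative array is unspecified, so pipe through sort if you need stable output. A minimal sketch, with ips.txt as an assumed file name:

```shell
# Recreate the sample input (ips.txt is an assumed name).
printf '%s\n' '140.90.218.62 vaac' '140.90.220.11 aawu' \
              '140.90.220.11 afc' '140.90.220.11 arh' \
              '140.90.220.40 afc' > ips.txt

# Append each code to its IP's entry, then strip the leading colon on print.
declare -A data
while read -r ip value; do
    data[$ip]+=":$value"
done < ips.txt

for key in "${!data[@]}"; do
    printf '%s %s\n' "$key" "${data[$key]#:}"
done | sort
# 140.90.218.62 vaac
# 140.90.220.11 aawu:afc:arh
# 140.90.220.40 afc
```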
With perl:
perl -lanE 'push @{$ips{$F[0]}}, $F[1]; END { $" = ":"; say "$_ @{$ips{$_}}" for sort keys %ips }' yourfile.txt
outputs
140.90.218.62 vaac
140.90.220.11 aawu:afc:arh
140.90.220.40 afc

Linux/ unix duplicate names [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
What I need to do is check for duplicate domain names and find out whether there are any.
So far I have tried many commands with grep, awk, sort, and uniq, but couldn't work it out. I feel it's very simple, but I can't get there.
P.S. If I use uniq -c I get a huge list of strings from this file, and I see how many duplicates each has and which numbered string it is.
Adding 20 rows from the file I am using:
1,google.com
2,facebook.com
3,youtube.com
4,yahoo.com
5,baidu.com
6,amazon.com
7,wikipedia.org
8,twitter.com
9,taobao.com
10,qq.com
11,google.co.in
12,live.com
13,sina.com.cn
14,weibo.com
15,linkedin.com
16,yahoo.co.jp
17,tmall.com
18,blogspot.com
19,ebay.com
20,hao123.com
The output I would like to see
> 2 google
> 2 yahoo
Thanks for the help!
You could use something like this to get the output you want:
$ awk -F'[.,]' '{++a[$2]}END{for(i in a)if(a[i]>1)print a[i],i}' file
2 google
2 yahoo
With the input field separator set to either . or ,, the first {block} is run for every row in the file. It builds up an array a using the second field: "google", "facebook", etc. $2 is the value of the second field, so ++a[$2] increments the value of the array a["google"], a["facebook"], etc. This means that the value in the array increases by one every time the same name is seen.
Once the whole file is processed, the for (i in a) loop runs through all of the keys in the array ("google", "facebook", etc.) and prints those whose value is greater than 1.
Given this file:
$ cat /tmp/test.txt
1,google.com
2,facebook.com
3,youtube.com
4,yahoo.com
5,baidu.com
6,amazon.com
7,wikipedia.org
8,twitter.com
9,taobao.com
10,qq.com
11,google.co.in
12,live.com
13,sina.com.cn
14,weibo.com
15,linkedin.com
16,yahoo.co.jp
17,tmall.com
18,blogspot.com
19,ebay.com
20,hao123.com
As a Perl one-liner:
$ perl -lane '$count{$1}++ if /^\d+,(\w+)/; END {while (($k, $v) = each %count) { print "$v $k" if $v>1}}' /tmp/test.txt
2 yahoo
2 google
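The same count can also be had from coreutils alone: strip the leading "N," column, keep only the second-level name, and let sort and uniq count the repeats. A sketch, with domains.txt as an assumed file name:

```shell
# cut -d, -f2  drops the "N," prefix     -> google.com, yahoo.co.jp, ...
# cut -d. -f1  keeps the name before "." -> google, yahoo, ...
# uniq -c counts runs; awk keeps only counts above 1.
cut -d, -f2 domains.txt | cut -d. -f1 | sort | uniq -c | awk '$1 > 1 { print $1, $2 }'
# 2 google
# 2 yahoo
```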

Escaping bash variable [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I'm a bit stuck with this. I'm declaring a variable at the top of my script, then creating a file as part of the script:
app="testing"
cat <<EOF >/etc/init.d/test
#!/bin/bash
args="--emperor $APPCONF/test/$APP.ini"
EOF
It doesn't seem to work, though; the problem seems to be the $app variable. Must I do something to this variable to get it to display its value, "testing", inside the file I create?
Use consistent case. Variable names are case sensitive.
Let's say you were doing this the Right Way. You'd want to store your data in an array:
args=( --emperor "${appconf}/test/${app}.ini" )
and then convert it to a string for embedding:
printf -v args_str '%q ' "${args[@]}"
...and use that string inside your heredoc:
#!/bin/bash
args=( $args_str )
EOF
...beyond which, anything inside the script being created would want to expand it as an array:
run_something "${args[@]}"
See BashFAQ #50 for rationale and details.
Besides using consistent case ($app is different from $APP), you may want to enclose your variable names in braces: it avoids ambiguity when a variable name is immediately followed by other characters, and it's considered good practice. For example:
args="--emperor ${APPCONF}/test/${APP}.ini"
That way, $APPCONF does not get confused with ${APP}CONF either. I hope this helps!
I'm not sure I understand your question. I suppose you would like to end up with a file
/etc/init.d/test
containing the text:
#!/bin/bash
args="--emperor $APPCONF/test/testing.ini"
If so, your script should be:
app="testing"
cat <<EOF >/etc/init.d/test
#!/bin/bash
args="--emperor \$APPCONF/test/$app.ini"
EOF
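The difference is easy to demonstrate: in an unquoted heredoc the shell expands $app at generation time, while the escaped \$APPCONF survives literally for the generated script to expand later. A minimal sketch:

```shell
app="testing"
# Unquoted EOF: $app expands now; \$APPCONF stays literal for the new script.
cat <<EOF
args="--emperor \$APPCONF/test/$app.ini"
EOF
# args="--emperor $APPCONF/test/testing.ini"
```

If instead you want nothing expanded, quote the delimiter (cat <<'EOF'), and every $ in the heredoc body is passed through verbatim.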

Replace in a CSV file value of a column [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have a CSV file with this structure:
31126000283424431;32285076389;t;text text;1;3;;1;1;0.9;0.81;0;0;1;1;1;2013-11-21;;NL
31126000279521531;32308233749;c;text text;1;2;;1;9;2.79;7.78;0;0;4;16;9;2013-11-21;;NL
31126000279406931;32291254349;c;text text;1;5;;1;3;0.98;0.96;0;0;3;9;0;2013-11-21;;NL
31126000272138431;32284912829;c;text text;1;3;;1;1;0;0;0;0;3;9;0;2013-11-21;;NL
31126000271468431;32304086789;t;text text;1;5;;1;1;0.2;0.04;0;0;2;4;1;2013-11-21;;NL
31126000269838731;29269530509;c;text text;1;1;;1;1;0.45;0.2;0;0;3;9;0;2013-11-21;;NL
and I need to replace the number after the sixth semicolon with 0.
So the output file would look like:
31126000283424431;32285076389;t;text text;1;0;;1;1;0.9;0.81;0;0;1;1;1;2013-11-21;;NL
31126000279521531;32308233749;c;text text;1;0;;1;9;2.79;7.78;0;0;4;16;9;2013-11-21;;NL
31126000279406931;32291254349;c;text text;1;0;;1;3;0.98;0.96;0;0;3;9;0;2013-11-21;;NL
31126000272138431;32284912829;c;text text;1;0;;1;1;0;0;0;0;3;9;0;2013-11-21;;NL
31126000271468431;32304086789;t;text text;1;0;;1;1;0.2;0.04;0;0;2;4;1;2013-11-21;;NL
31126000269838731;29269530509;c;text text;1;0;;1;1;0.45;0.2;0;0;3;9;0;2013-11-21;;NL
I have been trying awk, sed, and cut, but I can't get it to work.
Thank you.
Your example shows the 6th column, which comes after the 5th semicolon, not the 6th.
awk -F';' -v OFS=';' '$6=0;7' file
Try the line above.
sed "s/;[^;]\{1,\}/;0/5" YourFile.csv
This assumes there is always something in the column.
sed "s/;[^;]*/;0/5" YourFile.csv
This changes it in every case, even if there is nothing in the 6th column.
If you've got PHP on your machine you could use this very handy CSV Parser class. It will convert your CSV into a 2D array, which you can cycle through with a foreach loop to change the data.
// $csvFile holds the parsed rows as a 2D array.
// Iterate by reference (&) so assignments to $v stick.
foreach ($csvFile as &$row) {
    foreach ($row as $k => &$v) {
        if ($k == 5) { $v = 0; } // 6th field, zero-based index 5
    }
}
unset($row, $v); // break the references
This would automate it for a CSV file of any size in the same format.
perl -MText::CSV_XS -e'$csv=Text::CSV_XS->new({sep_char=>";", eol=>"\n"}); while($row=$csv->getline(ARGV)) {$row->[5]=0;$csv->print(STDOUT, $row)}'
Use it as a filter or pass the input file as a parameter. You should use a proper CSV parser for any serious work.
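A quick sanity check of the awk answer above, here written in the equivalent '{ $6 = 0 } 1' form and run on a shortened sample row (the trailing fields are omitted for brevity):

```shell
# Assigning to $6 makes awk rebuild the record joined with OFS, so -v OFS=';'
# is what keeps the semicolons (including the empty 7th field) intact.
echo '31126000283424431;32285076389;t;text text;1;3;;1;1' |
  awk -F';' -v OFS=';' '{ $6 = 0 } 1'
# 31126000283424431;32285076389;t;text text;1;0;;1;1
```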

Need to replace line from one file based on another file in Linux bash shell [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I have two files, one a template and the other an input file for the next command. I have to update my input file based on the values in the template.
The first file looks as shown below:
TKTSPEC.2.ASSETATTRID=REVISION&
TKTSPEC.2.REFOBJECTID=31&
TKTSPEC.2.TICKETSPECID=410&
TKTSPEC.2.SECTION=&
TKTSPEC.3.ASSETATTRID=NUM&
TKTSPEC.3.REFOBJECTID=31&
TKTSPEC.3.TICKETSPECID=411&
TKTSPEC.3.SECTION=&
TKTSPEC.4.ASSETATTRID=MPNUM&
TKTSPEC.4.REFOBJECTID=31&
TKTSPEC.4.TICKETSPECID=412&
TKTSPEC.4.SECTION=&
My template file looks like:
TKTSPEC.2.ASSETATTRID=REVISION&
TKTSPEC.2.TABLEVALUE=5&
TKTSPEC.3.ASSETATTRID=NUM&
TKTSPEC.3.TABLEVALUE=RDPVS&
TKTSPEC.4.ASSETATTRID=MPNUM&
TKTSPEC.4.TABLEVALUE=NEWPROJECT&
My desired output is as follows:
TKTSPEC.2.ASSETATTRID=REVISION&
TKTSPEC.2.TABLEVALUE=5&
TKTSPEC.2.REFOBJECTID=31&
TKTSPEC.2.TICKETSPECID=410&
TKTSPEC.2.SECTION=&
TKTSPEC.3.ASSETATTRID=NUM&
TKTSPEC.3.TABLEVALUE=RDPVS&
TKTSPEC.3.REFOBJECTID=31&
TKTSPEC.3.TICKETSPECID=411&
TKTSPEC.3.SECTION=&
TKTSPEC.4.ASSETATTRID=MPNUM&
TKTSPEC.4.TABLEVALUE=NEWPROJECT&
TKTSPEC.4.REFOBJECTID=31&
TKTSPEC.4.TICKETSPECID=412&
TKTSPEC.4.SECTION=&
I have to check the ASSETATTRID from my first file and then insert a new line with the corresponding value from the second file. The second file has a value for every ASSETATTRID.
Can this be achieved using awk or other Linux commands?
One way:
awk -F. 'NR==FNR{getline x;a[$2$3]=x;next}$2$3 in a{print;print a[$2$3];next}1' templatefile inpfile
This one-liner may work for you:
awk 'NR==FNR{k=$0;getline;a[k]=$0;next}$0 in a{$0=$0"\n"a[$0]}1' templ input
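To see the second one-liner in action, build miniature versions of the two files and run it. The awk program pairs each template line with the line after it (via getline) while reading the first file, then, while reading the second file, prints the stored TABLEVALUE line right after any matching ASSETATTRID line. A sketch with assumed file names templ and input:

```shell
# One ASSETATTRID/TABLEVALUE pair in the template, and the matching
# ASSETATTRID plus one other line in the input (file names assumed).
printf '%s\n' 'TKTSPEC.2.ASSETATTRID=REVISION&' 'TKTSPEC.2.TABLEVALUE=5&' > templ
printf '%s\n' 'TKTSPEC.2.ASSETATTRID=REVISION&' 'TKTSPEC.2.REFOBJECTID=31&' > input

awk 'NR==FNR{k=$0;getline;a[k]=$0;next}$0 in a{$0=$0"\n"a[$0]}1' templ input
# TKTSPEC.2.ASSETATTRID=REVISION&
# TKTSPEC.2.TABLEVALUE=5&
# TKTSPEC.2.REFOBJECTID=31&
```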
