GREP or Regex Search unique characters with a particular series [closed] - linux

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
I am a novice. I want to search a huge file that contains a list of unique IDs, using grep or a regex.
Example file:
/icon_edit.png\" \/><\/a> AP-28992 : ABCD-1103_01 [v1]","2","2012-10-27 18:40:47","2012-01-04 13:22:41"],
["shawn","extra\/fax","<!-- 0000000000 --><a href=\"javascript:openTCEditWindow(0000,000);\"><img title=\"
TSD\" src=\"gui\/themes\/default\/images\/icon_edit.png\" \/><\/a> AP-28993 : ABCD-1103_02
[v1]","2","2012-10-27 18:40:47","2012-01-04 13:22:41"],
["shawn","extra\/traax","<!-- 0000000000 --> ABCD_110_01
The output should list each ID prefix only once, like this:
ABCD-1103
ABCD-110

I assume ABCD-110 is your input pattern and a space is the delimiter.
So if your input file, say abc.txt, looks like this (I have modified the last line):
$ cat abc.txt
/icon_edit.png\" \/><\/a> AP-28992 : ABCD-1103_01 [v1]","2","2012-10-27
18:40:47","2012-01-04 13:22:41"],
["shawn","extra\/fax","<!-- 0000000000 --><a
href=\"javascript:openTCEditWindow(0000,000);\"><img title=\"
TSD\" src=\"gui\/themes\/default\/images\/icon_edit.png\" \/><\/a> AP-28993 :
ABCD-1103_02
[v1]","2","2012-10-27 18:40:47","2012-01-04 13:22:41"],
["shawn","extra\/traax","<!-- 0000000000 --> ABCD-110_01
Then the following works:
$ cat abc.txt | grep -ow "ABCD-110.*" | awk '{print $1}'
ABCD-1103_01
ABCD-1103_02
ABCD-110_01
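The command above prints the full IDs, including the _01 style suffix. If only the unique prefixes are wanted, as in the question, one possible sketch is to match just the prefix and de-duplicate with sort -u. The function name unique_ids and the assumption that every ID looks like ABCD- followed by digits are mine, not the asker's:

```shell
# Hypothetical helper: print each distinct "ABCD-<digits>" prefix once.
# Assumes IDs always have the form ABCD-<digits>, optionally followed by a suffix.
unique_ids() {
  # -o prints only the matched text, -E enables extended regular expressions.
  # With no file arguments, grep reads standard input.
  grep -oE 'ABCD-[0-9]+' "$@" | LC_ALL=C sort -u
}
```

Usage would be `unique_ids abc.txt`. The result is sorted, so the order may differ from the order of first appearance in the file.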

Related

Bash string operation [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
As a complete beginner, I am trying to translate a Markdown file into Confluence markup.
I need to turn [Title](https:// site.com) into [Title|https:// site.com]. If it were just one link, I could put it in a variable and printf it, but I am having trouble figuring out how to do it when I have, for example, 10 links.
Previously I used CONTENT=$(echo "${CONTENT//# /h1. }") to replace strings, but now that every string is different I am stuck. I found a solution written in JavaScript, http://chunpu.github.io/markdown2confluence/browser, but I fail to understand how to do the same in bash.
For this test file
$ cat file
[Title](https://site1.com)
[Title](https://site2.com)
[Title](https://site3.com)
[Title](https://site4.com)
[Title](https://site5.com)
[Title](https://site6.com)
[Title](https://site7.com)
[Title](https://site8.com)
[Title](https://site9.com)
[Title](https://site10.com)
Sed variant:
$ sed 's/\](/|/;s/)/\]/' file
[Title|https://site1.com]
[Title|https://site2.com]
[Title|https://site3.com]
[Title|https://site4.com]
[Title|https://site5.com]
[Title|https://site6.com]
[Title|https://site7.com]
[Title|https://site8.com]
[Title|https://site9.com]
[Title|https://site10.com]
Bash variant:
while IFS= read -r line; do
line=${line//](/|}
line=${line//)/]}
# Quote the expansion so that [...] is not treated as a glob pattern
printf '%s\n' "$line"
done < file
[Title|https://site1.com]
[Title|https://site2.com]
[Title|https://site3.com]
[Title|https://site4.com]
[Title|https://site5.com]
[Title|https://site6.com]
[Title|https://site7.com]
[Title|https://site8.com]
[Title|https://site9.com]
[Title|https://site10.com]
Awk variant:
$ awk '{ sub(/\]\(/, "|"); sub(/\)/, "]"); print }' file
[Title|https://site1.com]
[Title|https://site2.com]
[Title|https://site3.com]
[Title|https://site4.com]
[Title|https://site5.com]
[Title|https://site6.com]
[Title|https://site7.com]
[Title|https://site8.com]
[Title|https://site9.com]
[Title|https://site10.com]
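The variants above rewrite the first ]( and the first ) on each line, which is enough for the test file. For lines that hold several links, or other parentheses, a sketch with capture groups may be more robust. The function name md_links_to_confluence is hypothetical, and the pattern assumes that titles contain no ] and URLs contain no ):

```shell
# Hypothetical helper: convert every [Title](URL) on a line to [Title|URL].
# Assumes titles contain no "]" and URLs contain no ")".
md_links_to_confluence() {
  # -E selects extended regular expressions (accepted by GNU and BSD sed).
  # Group 1 captures the title, group 2 the URL; g repeats for every link.
  sed -E 's/\[([^]]*)\]\(([^)]*)\)/[\1|\2]/g' "$@"
}
```

Usage would be `md_links_to_confluence file`, or the function can be placed at the end of a pipeline.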

AWK command not working in linux but works in mac [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 4 years ago.
Can someone tell me what I am doing wrong here? The command works in my Mac shell but not on a Linux box. Is it a different version of awk? I want to make sure my code works with the Linux version.
echo -e "${group_values_with_counts}" | awk '$1>='${value2}' { print "{\"count\":\""$1"\",\"type\":\""$2"\"}" }'
21:19:41 awk: $1>= { print "{\"count\":\""$1"\",\"type\":\""$2"\"}" }
21:19:41 awk: ^ syntax error
You're trying to pass the value of a shell variable into awk the wrong way and using a non-portable echo. The right way (assuming value2 doesn't contain any backslashes) is:
printf '%s\n' "$group_values_with_counts" |
awk -v value2="$value2" '$1>=value2{ print "{\"count\":\""$1"\",\"type\":\""$2"\"}" }'
If value2 can contain backslashes and you want them treated literally (e.g. you do not want \t converted to a tab character), then you need to pass it in using ENVIRON or ARGV. See http://cfajohnson.com/shell/cus-faq-2.html#Q24.
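As a rough sketch of the ENVIRON route, the value can be placed in the environment of the awk process only. The function name filter_counts and the environment variable name min are made up for illustration, and the threshold is assumed to be numeric:

```shell
# Hypothetical helper: print JSON for input lines whose first field is >= $1.
# Input lines of the form "count type" are read from standard input.
filter_counts() {
  # min is exported only to this awk process; awk reads it through ENVIRON,
  # which does not interpret backslash escapes. Adding 0 forces a numeric comparison.
  min="$1" awk '$1 >= ENVIRON["min"] + 0 {
    print "{\"count\":\"" $1 "\",\"type\":\"" $2 "\"}"
  }'
}
```

It could then be called as `printf '%s\n' "$group_values_with_counts" | filter_counts "$value2"`.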

ubuntu linux sed affects file properties? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a text file of 75000 items, 2 lines for each item. line 1 has an identifier, line 2 a text string.
I need to remove 130 items, random identifiers that I have in a list or can put in a file.
I can carry out the removal for one item, but not for more than one.
I tried piping the identifiers and get an empty output file.
I tried repeated commands of the form sed -e 'expression' inputfile > outfile. This works, but it requires a new output file that then becomes the input file for the next iteration, and so on. This might be the last resort.
I tried sed -i in iteration; this crashes and the error is that there is no file by the name of the inputfile. Which is clearly not the case, as I can see it, ls it and grep the number of identifiers in it. Only sed can't seem to read it.
I even found a python/biopython script online for this exact problem, it is very simple and does not give error messages, but it also removes only the first item.
I think it has something to do with file properties/temporary files that don't really exist (?).
I am using Ubuntu 12.04 'Precise'
How can I get around this issue?
A quick and dirty approach (no check that the intermediate file was actually created, etc.):
sed
Assuming the identifiers in your list contain no regex metacharacters and no slashes:
sed 's#.*#/&/{N;d;}#' YourListToExclude > /tmp/exclude.sed
sed -f /tmp/exclude.sed YourDataFile > /tmp/YourDataFile.tmp
mv /tmp/YourDataFile.tmp YourDataFile
rm /tmp/exclude.sed
awk
awk 'FNR==NR{ex=(ex==""?"":ex"|")$0;next}$0!~ex{print;getline;print;next}{getline}' YourListToExclude YourDataFile > /tmp/YourDataFile.tmp
mv /tmp/YourDataFile.tmp YourDataFile
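Both commands above treat each identifier as a regular expression, so an identifier such as id1 would also remove id10. If exact matches are wanted, one possible sketch uses an awk array lookup instead. The function name remove_items is hypothetical, and the code assumes the data file strictly alternates identifier line and text line and that the exclusion list is not empty:

```shell
# Hypothetical helper: drop every two-line item whose identifier line
# appears verbatim in the exclusion list.
# $1 = list of identifiers (one per line), $2 = data file.
remove_items() {
  awk '
    NR == FNR    { drop[$0]; next }        # first file: remember each identifier
    FNR % 2 == 1 { skip = ($0 in drop) }   # odd lines of the data file are identifiers
    !skip                                  # print the pair unless it is excluded
  ' "$1" "$2"
}
```

As with the commands above, the output would go to a temporary file that is then moved over the original, for example `remove_items YourListToExclude YourDataFile > /tmp/YourDataFile.tmp && mv /tmp/YourDataFile.tmp YourDataFile`.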

Changing the date format in a file [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm working on Red Hat Linux.
I have a file which looks like this:
$vi filename
Jan,1,00:00:01,someone checked your file
Jan,3,09:38:02,applebee
Jan,16,10:20:03, ****************
Jan,18,03:04:03, ***************
I want the output to look like:
2015/01/01,00:00:01,someone checked your file
2015/01/03,3,09:38:02,applebee
2015/01/16,16,10:20:03, ****************
2015/01/18,03:04:03, ***************
Please help me to do this. Thanks
If you have GNU date, try:
$ awk -F, '{cmd="date -d \""$1" "$2"\" +%Y/%m/%d"; cmd|getline d; print d","$3","$4; close(cmd)}' file
2015/01/01,00:00:01,someone checked your file
2015/01/03,09:38:02,applebee
2015/01/16,10:20:03, ****************
2015/01/18,03:04:03, ***************
This exact command cannot be used with the BSD (OSX) version of date, which does not accept a free-form date string through -d. There, parsing an input date requires -j -f with an explicit input format instead.
How it works
awk implicitly loops over lines of input, breaking each line into fields.
-F,
This tells awk to use a comma as the field separator
cmd="date -d \""$1" "$2"\" +%Y/%m/%d"
This creates a string variable, cmd, that holds a date command. I am assuming that you have GNU date. Because the input has no year, date fills in the current year.
cmd|getline d
This runs the command and captures the output in variable d.
print d","$3","$4
This prints the output that you asked for.
close(cmd)
This closes the command.
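Where GNU date is not available, or the year should not depend on when the command runs, the same conversion can be sketched in awk alone with a month lookup table. The function name reformat_dates and the hard-coded year 2015 are assumptions taken from the sample output, not part of the original answer:

```shell
# Hypothetical helper: rewrite "Mon,D,HH:MM:SS,text" as "YYYY/MM/DD,HH:MM:SS,text".
# The year is an assumption because the input does not contain one.
reformat_dates() {
  awk -F, -v year=2015 '
    BEGIN {
      # Build a lookup table from month abbreviation to month number.
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= n; i++) month[names[i]] = i
    }
    # Like the command above, this prints only fields 3 and 4 after the date.
    { printf "%s/%02d/%02d,%s,%s\n", year, month[$1], $2, $3, $4 }
  ' "$@"
}
```

Usage would be `reformat_dates file`. An unrecognised month abbreviation comes out as month 00, which at least makes bad input visible.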

Parsing the headers of sequence file [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I have a multi-sequence file like this:
>abc|d017961
sequence1......
>cdf|rhtdm9
sequence2......
>ijm|smthr12
sequence3......
>abc|d011wejr
sequence4......
>stg|eethwe77
sequence5......
I want to edit the file so that the result looks like this:
>abc_ABC__d017961
sequence1......
>cdf_CDF__rhtdm9
sequence2......
>ijm_IJM__smthr12
sequence3......
>abc_ABC__d011wejr
sequence4......
>stg_STG__eethwe77
sequence5......
Thanks!
perl -pe 's/ (\w+) \| /$1_\U$1\E__/x' file
or
perl -lpe '$_ = ">$1_\U$1\E__$2" if /^> (\w+) \| (\w+)/x' file
With awk, you can set the input field separator (FS) to | and use the toupper() function. Only the header lines (those starting with >) are rewritten; the sequence lines are printed unchanged.
All together:
$ awk -F'|' '/^>/ { id = substr($1, 2); print ">" id "_" toupper(id) "__" $2; next } { print }' file
>abc_ABC__d017961
sequence1......
>cdf_CDF__rhtdm9
sequence2......
>ijm_IJM__smthr12
sequence3......
>abc_ABC__d011wejr
sequence4......
>stg_STG__eethwe77
sequence5......
