grep two strings as variables to use in a script - linux
Could you please help me grep the strings mentioned below from the 3rd line of file.txt and use them as variables in a script?
file.txt
line1: some words with 123#domain.com
line2: some words
line3: path = /aaa/bbb/domain.com/user#domain.com/ccc/123#test.com/
So I need to grep "user#domain.com" and "123#test" from line 3 and use them as variables in a script like:
#!/bin/bash
var1=$(some_code)   # should yield "user#domain.com"
var2=$(some_code)   # should yield "123#test"
run_a_command "$var1" "$var2"
Thanks in advance,
If the format of the file is the same as you have shown, then you could do:
arr=($(awk -F'/' '/path/{print $5,$7}' file))   # extract the two desired fields
arr[1]=${arr[1]%.com}                           # remove the ".com" suffix
run_a_command "${arr[0]}" "${arr[1]}"
Depending on the file content, you may also need to adjust the awk extraction. You can also check whether one or both array elements are empty, if that is a possibility. If it is always the third line, you can add an NR==3 check to the awk pattern: arr=($(awk -F'/' 'NR==3 && /path/{print $5,$7}' file)).
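Putting those pieces together, a minimal sketch of a full script (run_a_command is the placeholder from the question; the emptiness check is the one suggested above):

#!/bin/bash
# pin the match to the third line; fields 5 and 7 of the "/"-split line hold the two values
arr=($(awk -F'/' 'NR==3 && /path/{print $5,$7}' file.txt))
arr[1]=${arr[1]%.com}    # strip the ".com" suffix from the second value
# guard against a missing or malformed third line
if [ -z "${arr[0]}" ] || [ -z "${arr[1]}" ]; then
    echo "could not extract both fields from file.txt" >&2
    exit 1
fi
run_a_command "${arr[0]}" "${arr[1]}"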
If the input file has a more complex format (e.g. if multiple such lines can appear in the input), then you should update the question, as any solution depends on that.
What about:
grep -o -E '/[^/]+#[^/.]+' INFILE | sed 's|/||g'
Maybe the following is what you are looking for?
grep -o -E '/[^/]+#[^/]+(/|$)' INFILE | sed 's|/||g'
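To capture the two matches in variables, a minimal sketch (bash; assumes the input is file.txt from the question and always yields exactly the two matches shown; run_a_command is the question's placeholder):

m=($(grep -o -E '/[^/]+#[^/]+(/|$)' file.txt | tr -d '/'))
var1=${m[0]}        # user#domain.com
var2=${m[1]%.com}   # 123#test
run_a_command "$var1" "$var2"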
Related
replace part of the url from a few cells in a CSV file in each row
I have the below CSV file. I would like to be able to convert it so I have IDs without URLs.

tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
https://some-url.com/tree/90000607/90000610,scientific,https://some-url.com/tree/90000607/90000610,Bacteria,,accepted,f,Bacteria,https://some-url.com/name/bbni/90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,https://some-url.com/instance/bbni/90000609,TWD,https://some-url.com/reference/bbni/90000596,,,Bacteria,,,,,ABC,-,/tree/90000607/90000610

I would like to accomplish one of the following two outcomes. I have tried different things using sed, piping it through a few times, but I am unable to do it in one command using regex.

Option 1:

tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
tree/90000607/90000610,scientific,tree/90000607/90000610,Bacteria,,accepted,f,Bacteria,name/bbni/90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,instance/bbni/90000609,TWD,reference/bbni/90000596,,,Bacteria,,,,,ABC,-,/tree/90000607/90000610

Option 2:

tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
90000610,scientific,90000610,Bacteria,,accepted,f,Bacteria,90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,90000609,TWD,90000596,,,Bacteria,,,,,ABC,-,90000610

If someone can assist with what you have done before, it would help me out.

Things I tried:

#!/bin/bash
sed -e 's/[a-z]*:\/\/[a-z]*.[a-z]*.[a-z]*\/[a-z]*\/[a-z]*\/[a-z]*\/[a-z]*//g' BBC-taxon-2019-03-26-4546.csv > test.csv
sed -e 's/[0-9]\/[0-9]/[0-9]|[0-9]/g' test.csv

With the above approach I need to write a command for each type of replacement and create a new file each time, so I gave up.

#!/bin/bash
# Set input file here...
input="BBC-taxon-2019-03-26-4546.csv"
# Check if file exists
[ ! -f "$input" ] && { echo "No file with name: $input. File not found"; exit 123; }
# Set field separator and read fields into an array
while IFS=',' read -ra fields; do
    echo "Fields: ${fields[*]}"
    echo "Number of Elements: ${#fields[@]}"
    echo "Each Element has: ${#fields}"
    for i in "${fields[@]}"; do
        echo "$i"
    done
    # fields[0] = ${fields[0]}
done < "$input"

The above code creates an iterable array, but I don't know how to use sed on each cell value for a certain column. If anyone can help, that is great.
Input:

tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
https://some-url.com/tree/90000607/90000610,scientific,https://some-url.com/tree/90000607/90000610,Bacteria,,accepted,f,Bacteria,https://some-url.com/name/bbni/90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,https://some-url.com/instance/bbni/90000609,TWD,https://some-url.com/reference/bbni/90000596,,,Bacteria,,,,,ABC,-,/tree/90000607/90000610

For option 1 use:

sed -E 's#(https?://[^,/]+)?(/[^/]+/[^/]+/[0-9]+)#\2#g' input.csv

which gives:

tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
/tree/90000607/90000610,scientific,/tree/90000607/90000610,Bacteria,,accepted,f,Bacteria,/name/bbni/90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,/instance/bbni/90000609,TWD,/reference/bbni/90000596,,,Bacteria,,,,,ABC,-,/tree/90000607/90000610

For option 2 use:

sed -E 's#(https?://[^,]+|(/[^,/]+)+)/([0-9]+)#\3#g' input.csv

which gives:

tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
90000610,scientific,90000610,Bacteria,,accepted,f,Bacteria,90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,90000609,TWD,90000596,,,Bacteria,,,,,ABC,-,90000610

Add the option -i.bak to change the input file directly (in-place mode); a backup of the original will be kept with a .bak extension.
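For example, an in-place run of the option-1 command with a backup would look like this:

sed -E -i.bak 's#(https?://[^,/]+)?(/[^/]+/[^/]+/[0-9]+)#\2#g' input.csv
# the original content is preserved in input.csv.bak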
If you know that each thing you are trying to parse is a URL, and it won't conflict with other data fields, why not match the exact URL prefix? Like this:

sed -E 's|https?://[^,/]*\.com||g' test.csv
If your data is in file d, try GNU sed. The first command strips everything through /tree/<id>/, while the second keeps /tree/<id>/ because it has \1 in the replacement:

sed -Ez 's#\bhttps://[^/]+/tree/\w+/##g' d
sed -Ez 's#\bhttps://[^/]+(/tree/\w+/)#\1#g' d
echo without trimming the space in awk command
I have a file consisting of multiple rows like this:

10|EQU000000001|12345678|3456||EOMCO042|EOMCO042|31DEC2018|16:51:17|31DEC2018|SHOP NO.5,6,7 RUNWAL GRCHEMBUR MHIN|0000000010000.00|6761857316|508998|6011|GL

I have to split and replace the 11th column into 4 different columns based on character counts. This is the 11th column, which also contains extra spaces:

SHOP NO.5,6,7 RUNWAL GRCHEMBUR MHIN

This is what I have done:

ls *.txt *.TXT | while read line
do
    subName="$(cut -d'.' -f1 <<<"$line")"
    awk -F"|" '{
        "echo -n "$11" | cut -c1-23" | getline ton;
        "echo -n "$11" | cut -c24-36" | getline city;
        "echo -n "$11" | cut -c37-38" | getline state;
        "echo -n "$11" | cut -c39-40" | getline country;
        $11=ton"|"city"|"state"|"country;
        print $0
    }' OFS="|" $line > $subName$output
done

But when echoing the 11th column, the extra spaces get trimmed, which leads to a mismatch in the character counts. Is there any way to echo without trimming the spaces?

Actual output:

10|EQU000000001|12345678|3456||EOMCO042|EOMCO042|31DEC2018|16:51:17|31DEC2018|SHOP NO.5,6,7 RUNWAL GR|CHEMBUR MHIN|||0000000010000.00|6761857316|508998|6011|GL

Expected output:

10|EQU000000001|12345678|3456||EOMCO042|EOMCO042|31DEC2018|16:51:17|31DEC2018|SHOP NO.5,6,7 RUNWAL GR|CHEMBUR|MH|IN|0000000010000.00|6761857316|508998|6011|GL
The least annoying way to code this that I've found so far is:

perl -F'\|' -lane '$F[10] = join "|", unpack "a23 A13 a2 a2", $F[10]; print join "|", @F'

It's fairly straightforward: iterate over the lines of input, splitting each line on | and putting the fields in @F. For the 11th field ($F[10]), split it into fixed-width subfields using unpack (A instead of a trims trailing spaces from the second subfield). Reassemble the subfields by joining them with |, then reassemble the whole line by joining @F with | and printing it.

I haven't benchmarked it in any way, but it's likely much faster than the original code, which spawns multiple shell and cut processes per input line; here everything is done in one process.

A complete solution would wrap it in a shell loop:

for file in *.txt *.TXT; do
    outfile="${file%.*}$output"
    perl -F'\|' -lane '...' "$file" > "$outfile"
done

Or, if you don't need to trim the .txt part (and you don't have too many files to fit on the command line):

perl -i.out -F'\|' -lane '...' *.txt *.TXT

This simply places the output for each input file foo.txt in foo.txt.out.
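The unpack template can also be tried on its own. A small standalone demo (the spacing inside the sample string is a guess at the original fixed-width layout, since the question collapsed the repeated spaces):

perl -e 'print join("|", unpack("a23 A13 a2 a2", "SHOP NO.5,6,7 RUNWAL GRCHEMBUR      MHIN")), "\n"'
# prints: SHOP NO.5,6,7 RUNWAL GR|CHEMBUR|MH|IN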
A pure-bash implementation of all this logic:

#!/usr/bin/env bash
shopt -s nocaseglob extglob
for f in *.txt; do
    subName=${f%.*}
    while IFS='|' read -r -a fields; do
        location=${fields[10]}
        ton=${location:0:23};    ton=${ton%%+([[:space:]])}
        city=${location:23:13};  city=${city%%+([[:space:]])}   # 13 chars, matching cut -c24-36
        state=${location:36:2}
        country=${location:38:2}
        fields[10]="$ton|$city|$state|$country"
        printf -v out '%s|' "${fields[@]}"
        printf '%s\n' "${out:0:$(( ${#out} - 1 ))}"
    done <"$f" >"$subName.out"
done

It's slower (if I did this well, by about a factor of 10) than pure awk would be, but much faster than the awk/shell combination proposed in the question.

Going into the constructs used: all the ${varname%...} and related constructs are parameter expansion. The specific ${varname%pattern} construct removes the shortest possible match for pattern from the value of varname, or the longest match if % is replaced with %%. Enabling extglob turns on extended globbing syntax such as +([[:space:]]), which is equivalent to the regex [[:space:]]+.
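A quick interactive check of the trailing-space trim used above (the sample value is hypothetical):

$ shopt -s extglob
$ s='CHEMBUR      '
$ echo "[${s%%+([[:space:]])}]"
[CHEMBUR]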
search for a string and after getting result cut that word and store result in variable
I have a file named abc.lst, and I have stored its path in a variable. Each line contains a 3-word string; from it I want to grep the second word, and from that word I want to cut the part from expdp to .dmp and store the result in a variable.

Example:

REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM

Desired output:

expdp_TEST_P119_*_18112017.dmp

I have tried the command below:

FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp
REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"

Test results:

$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp

To save the result in a variable:

myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )
The following awk may help you here:

awk -F'/| ' '{print $6}' Input_file

OR

awk -F'/| ' '{print $6}' "$REFLIST_OP"

Explanation: simply make space and / the field separators (as per your shown Input_file) and then print the 6th field of the line, which is the part you need. To see each field number and its value, you could use the following command too:

awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"
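On the sample line from the question (assuming the single spaces shown there), that enumeration prints the following; field 2 is empty because the space and the first / are adjacent separators:

$ awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"
1 34
2
3 data
4 abc
5 GOon
6 expdp_TEST_P119_*_18112017.dmp
7 12-JAN-18
8 04.27.00
9 AM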
Using sed with one of these regexes:

sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst

captures the non-space characters after the last /, printing only the captured part.

sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst

is the same as above, but uses a different delimiter, which avoids escaping the /; -r allows using unescaped ( and ).

sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst

works in two steps: remove everything up to the last /, then remove everything from the first space to the end. (This may be the easiest to read and understand.)

myvar=$(<abc.lst); myvar=${myvar##*/}; myvar=${myvar%% *}; echo $myvar

uses parameter expansion, if you want to avoid an external command (sed).
Print all columns except first using AWK
I have a file which contains a file list. The file looks like this:

$ cat filelist
D src/layouts/PersonAccount-Person Account Layout.layout
D src/objects/Case Account-Record List.object

I want to cut the first two columns and print only the file names along with their directory paths. This list is dynamic, and the file names have spaces in them, so I can't use space as the delimiter. How can I do this with an awk command? The output should be like this:

src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object
Can you try this once:

bash-4.4$ cat filelist | awk '{$1="";print $0}'
 src/layouts/PersonAccount-Person Account Layout.layout
 src/objects/Case Account-Record List.object

Or else, if you want to remove two columns, it would be:

awk '{$1=$2="";print $0}'

This will produce the output below:

bash-4.4$ cat filelist | awk '{$1=$2="";print $0}'
  Account Layout.layout
  Account-Record List.object
Try this out:

awk -F" " '{$1=""; print $0}' filelist | sed 's/^ //'

Here sed is used to remove the leading space from each output line.
To print only the file names with their directory path, an awk approach:

awk '{ sub(/^[[:space:]]*[^[:space:]][[:space:]]+/,"",$0) }1' filelist

The output:

src/layouts/PersonAccount-Person Account Layout.layout
src/objects/Case Account-Record List.object

To extract only the basename of each file:

awk -F'/' '{print $NF}' filelist

The output:

PersonAccount-Person Account Layout.layout
Case Account-Record List.object
This will do exactly what you want for your example:

sed -E 's/(.*)([ ][a-zA-Z0-9]+\/[a-zA-Z0-9]+\/[a-zA-Z0-9. -]+)/\2/g' filelist

Explanation: it matches your path (including any spaces in it) and then replaces the whole line with that one match. Easy peasy lemon squeezy :) Regards!
A simple grep:

grep -o '[^[:blank:]]*/.*' filelist

That's zero or more non-blank characters, followed by a slash, followed by the rest of the string. This will not match any lines that don't contain a slash.
Here is a portable POSIX shell solution:

#!/bin/sh
cat "$@" | while read line; do
    echo "${line#* * }"
done

This loops over each line of the given input file(s) (or else standard input) and prints each line without its first two spaces or the text before them. It is not greedy. Unlike some of the other answers here, this preserves any spacing in the rest of the line.

If you want that as a one-liner:

while read L; do echo "${L#* * }"; done < filelist

This will fail if the uppermost directory's name starts with a space. To work around that, you need to peel away the leading ten characters (which I assume are static):

#!/bin/sh
cat "$@" | while read line; do
    echo "${line#??????????}"
done

As a one-liner, in bash, this can be simplified by using substrings:

while read L; do echo "${L:10}"; done < filelist
Linux: Append Word Count to Each Line of a File
Quite new to Linux at the moment. I've seen some straightforward answers for appending a constant/non-changing word to the end of each line of a file, e.g. shell script add suffix each line. However, I'd like to know how to append the word count of each line of a .csv file to the end of that line, so that:

word1, word2, word3
foo1, foo2
bar1, bar2, bar3, bar4

becomes:

word1, word2, word3, 3
foo1, foo2, 2
bar1, bar2, bar3, bar4, 4

I am working with comma-separated values, so if there is a quicker/simpler way to do it by making use of the commas rather than the items, that would work as well. Cheers!
Simple awk solution:

awk -F ',' '{print $0", "NF}' file.csv

The -F argument specifies the field separator, "," in your case. $0 contains the entire line, and NF is the built-in variable holding the number of fields in the line.
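Run against the sample input from the question, this produces exactly the requested output:

$ awk -F ',' '{print $0", "NF}' file.csv
word1, word2, word3, 3
foo1, foo2, 2
bar1, bar2, bar3, bar4, 4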
You can use this:

while read line; do N=$(echo "$line" | wc -w); echo "$line, $N"; done < inputfile.txt
A simple (yet most likely slow) bash script could do the trick:

#!/bin/bash
newfile=$1.tmp
cat "$1" | while read l; do
    echo -n "$l " >> "$newfile"
    echo "$l" | wc -w >> "$newfile"
done

Then move the files according to your liking (be safe by using the temp file ...).

For this input file:

one,
one, two,
one, two, three,

I get:

one, 1
one, two, 2
one, two, three, 3