I am learning about AWK. I need to capture the output of an awk command in a variable so I can parse it.
The file has 130,000 lines. I need AWK to put a column into an array so I can use the variable in another part of the script. Sorry for my English; ask me if you don't understand my objective.
Well, the code:
awk '/file:/{ name=$3 ; print name ;}' ejemplo.txt
I tried:
list=$(awk '/file:/{ name=$3 ; print name ;}' ejemplo.txt)
but when I try to show the content of the variable $list, it only shows me 1 line.
I tried declaring an array, but it only shows me 1 result.
Does anyone understand what is happening? How can I build an array with all of the output?
I tried this code to build an array with AWK. Maybe I am a little dumb, but I don't see how to solve my problem:
#!/bin/bash
#filename: script2.sh
conta=$(cat ejemplo_hdfs.txt | wc -l)
for i in `seq 0 $conta`;do
objeto=" "
owner=" "
group=" "
awk '{
if($1=="#" && $2=="file:") objeto=$3;
else if($1=="#" && $2=="owner:") owner=$3;
else if($1=="#" && $2=="group:") group=$3;
else
print $3
;}' ejemplo_hdfs.txt
echo $objeto+","+$owner+","+$group
done
To assign an array to a variable in bash, the whole expression that generates the elements needs to be in parentheses. Each word produced by the evaluated expression becomes an element of the resulting array.
Example:
#!/bin/bash
foo=($(awk -F, '{print $2}' x.txt))
# Size of array
echo "There are ${#foo[#]} elements"
# Iterate over each element
for f in "${foo[@]}"; do
echo "$f"
done
# Use a specific element.
echo "The second element is ${foo[1]}"
$ cat x.txt
1,a dog
2,b
3,c
$ ./array_example.sh
There are 4 elements
a
dog
b
c
The second element is dog
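Note that because the $(...) expansion is unquoted, the output is split on all whitespace, which is why a dog became the two elements a and dog above. If you want one array element per output line instead (usually what you want when capturing awk output), readarray avoids the word splitting. A minimal sketch, assuming Bash 4+ and the same x.txt:
readarray -t foo < <(awk -F, '{print $2}' x.txt)
echo "There are ${#foo[@]} elements"   # 3, not 4
echo "The first element is ${foo[0]}"  # prints "a dog" intact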
Editor's note: I've clarified the problem definition, because I think the problem is an interesting one, and this question deserves to be reopened.
I've got a text file containing key-value lines in the following format - note that the # lines below are only there to show repeating blocks and are NOT part of the input:
Country:United Kingdom
Language:English
Capital city:London
#
Country:France
Language:French
Capital city:Paris
#
Country:Germany
Language:German
Capital city:Berlin
#
Country:Italy
Language:Italian
Capital city:Rome
#
Country:Russia
Language:Russian
Capital city:Moscow
Using shell commands and utilities, how can I transform such a file to CSV format, so it will look like this?
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
In other words:
Make the key names the column names of the CSV header row.
Make the values from each block a data row each.
[OP's original] Edit: My idea would be to separate the entries e.g. Country:France would become Country France, and then grep/sed the heading. However I have no idea how to move the headings from a single column to several separate ones.
A simple solution with cut, paste, and head (assumes input file file, outputs to file out.csv):
#!/usr/bin/env bash
{ cut -d':' -f1 file | head -n 3 | paste -d, - - -;
cut -d':' -f2- file | paste -d, - - -; } >out.csv
cut -d':' -f1 file | head -n 3 creates the header line:
cut -d':' -f1 file extracts the first :-based field from each input line, and head -n 3 stops after 3 lines, given that the headers repeat every 3 lines.
paste -d, - - - takes 3 input lines from stdin (one for each -) and combines them into a single, comma-separated output line (-d,).
cut -d':' -f2- file | paste -d, - - - creates the data lines:
cut -d':' -f2- file extracts everything after the : from each input line.
As above, paste then combines 3 values into a single, comma-separated output line.
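For the sample input shown in the question, the two intermediate steps look like this:
$ cut -d':' -f1 file | head -n 3
Country
Language
Capital city
$ cut -d':' -f1 file | head -n 3 | paste -d, - - -
Country,Language,Capital city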
agc points out in a comment that the column count (3) and the paste operands (- - -) are hard-coded above.
The following solution parameterizes the column count (set it via n=...):
{ n=3; pasteOperands=$(printf '%.s- ' $(seq $n))
cut -d':' -f1 file | head -n $n | paste -d, $pasteOperands;
cut -d':' -f2- file | paste -d, $pasteOperands; } >out.csv
printf '%.s- ' $(seq $n) is a trick that produces a list of as many space-separated - chars. as there are columns ($n).
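For example, with n=3:
$ printf '%.s- ' $(seq 3)
- - -
The %.s conversion (a zero-precision string) consumes one argument while printing nothing of it, and printf reuses the format once per argument, so the literal "- " is emitted $n times.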
While the previous solution is now parameterized, it still assumes that the column count is known in advance; the following solution dynamically determines the column count (requires Bash 4+ due to use of readarray, but could be made to work with Bash 3.x):
# Determine the unique list of column headers and
# read them into a Bash array.
readarray -t columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' file)
# Output the header line.
(IFS=','; echo "${columnHeaders[*]}") >out.csv
# Append the data lines.
cut -d':' -f2- file | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[@]})) >>out.csv
awk -F: 'seen[$1]++ { exit } { print $1 }' outputs each input line's column name (the 1st :-separated field), remembers the column names in associative array seen, and stops at the first column name that is seen for the second time.
readarray -t columnHeaders reads awk's output line by line into array columnHeaders
(IFS=','; echo "${columnHeaders[*]}") >out.csv prints the array elements using a comma as the separator (the first character of $IFS); note the use of a subshell ((...)) so as to localize the effect of modifying $IFS, which would otherwise have global effects.
The cut ... pipeline uses the same approach as before, with the operands for paste being created based on the count of the elements of array columnHeaders (${#columnHeaders[@]}).
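For the sample input, that awk command prints exactly the three unique keys, one per line:
$ awk -F: 'seen[$1]++ { exit } { print $1 }' file
Country
Language
Capital city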
To wrap the above up in a function that outputs to stdout and also works with Bash 3.x:
toCsv() {
local file=$1 columnHeaders
# Determine the unique list of column headers and
# read them into a Bash array.
IFS=$'\n' read -d '' -ra columnHeaders < <(awk -F: 'seen[$1]++ { exit } { print $1 }' "$file")
# Output the header line.
(IFS=','; echo "${columnHeaders[*]}")
# Append the data lines.
cut -d':' -f2- "$file" | paste -d, $(printf '%.s- ' $(seq ${#columnHeaders[#]}))
}
# Sample invocation
toCsv file > out.csv
My bash script for this would be:
#!/bin/bash
count=0
echo "Country,Language,Capital city"
while read -r line
do
(( count++ ))
(( count < 3 )) && printf "%s," "${line##*:}"
(( count == 3 )) && printf "%s\n" "${line##*:}" && (( count = 0 ))
done < file
Output
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
Edit
Replaced [ stuff ] with (( stuff )), i.e., test with double parentheses, which is used for arithmetic evaluation (note that inside (( )), the comparison operators are < and ==, not -lt and -eq).
You can also write a slightly more generalized version of the bash script that takes the number of repeating rows holding the data as an argument, and produces its output on that basis, to avoid hardcoding the header values and to handle additional fields. (You could also scan the field names for the first repeat and set the repeat count that way.)
#!/bin/bash
declare -i rc=0 ## record count
declare -i hc=0 ## header count
record=""
header=""
fn="${1:-/dev/stdin}" ## filename as 1st arg (default: stdin)
repeat="${2:-3}" ## number of repeating rows (default: 3)
while read -r line; do
record="$record,${line##*:}"
((hc == 0)) && header="$header,${line%%:*}"
if ((rc < (repeat - 1))); then
((rc++))
else
((hc == 0)) && { printf "%s\n" "${header:1}"; hc=1; }
printf "%s\n" "${record:1}"
record=""
rc=0
fi
done <"$fn"
There are any number of ways to approach the problem; you will have to experiment to find the most efficient one for your data file size, etc. Whether you use a script or a combination of shell tools (cut, paste, etc.) is largely up to you.
Output
$ bash readcountry.sh country.txt
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
Output with 4 Fields
Example input file adding a Population field:
$ cat country2.txt
Country:United Kingdom
Language:English
Capital city:London
Population:20000000
<snip>
Output
$ bash readcountry.sh country2.txt 4
Country,Language,Capital city,Population
United Kingdom,English,London,20000000
France,French,Paris,10000000
Germany,German,Berlin,150000000
Italy,Italian,Rome,9830000
Russia,Russian,Moscow,622000000
Using datamash, tr, and join:
datamash -t ':' -s -g 1 collapse 2 < country.txt | tr ',' ':' |
datamash -t ':' transpose |
join -t ':' -a1 -o 1.2,1.3,1.1 - /dev/null | tr ':' ','
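To see what each stage contributes (my reading of the pipeline): the first datamash groups the lines by key (sorted, due to -s) and collapses each group's values into a comma-separated list, roughly:
Capital city:London,Paris,Berlin,Rome,Moscow
Country:United Kingdom,France,Germany,Italy,Russia
Language:English,French,German,Italian,Russian
tr then turns the commas into the : field separator, datamash transpose flips rows and columns, and join (via -o 1.2,1.3,1.1) reorders the columns back to Country, Language, Capital city before the final tr converts : to ,.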
Output:
Country,Language,Capital city
United Kingdom,English,London
France,French,Paris
Germany,German,Berlin
Italy,Italian,Rome
Russia,Russian,Moscow
I have a list file like this:
mike
jack
jack
mike
and sometimes it is like this (no mike):
jack
jack
I would like to test whether this file only contains one mike or multiple mikes, like the following:
if [list **only** contains one `mike` or multiple `mike`s]
then
do something
else
echo jack (or another name) is using it
fi
[ "$(sort inputfile | uniq)" = mike ]
This sorts the input, then removes duplicate lines. You need to sort the input for uniq because it only collapses consecutive identical lines.
Short form:
[ "$(sort --unique inputfile)" = mike ]
Solution in bash
You can use a while loop with read to count the matching lines, as follows:
count=0
while read -r line
do
[[ $line == "mike" ]] && ((count++))
done < inputFile
$count will then contain the number of mike lines in the file.
$ echo $count
2
Solution in awk
$ awk '/mike/{count++}; END{print count}' input
2
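Note that /mike/ matches mike anywhere in the line, so a line such as mikey would also be counted. If you need an exact match, compare the whole line instead; the +0 forces a 0 instead of an empty line when there are no matches:
$ awk '$0=="mike"{count++} END{print count+0}' input
2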
find_mike () {
mike_count=$(grep -c 'mike');
if (( mike_count == 1 )); then
printf 'I found only one mike.\n'
elif (( mike_count > 1 )); then
printf 'I found %d mikes.\n' "$mike_count"
else
printf '%s\n' "I have no idea where is mike";
fi
}
Usage example:
$ find_mike < input_file.txt
I found 2 mikes.
grep -xvq mike inputfile
-x: match the whole line
-v: invert the match
-q: do not print anything; exit at first match
This command exits with 0 as soon as it finds something that is not mike. If the file is empty or contains only (any number of) mike lines, it exits with 1.
grep is very, very fast, and it stops parsing the input file as soon as possible.
You might want to invert the exit value by prefixing the command with an exclamation mark.
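A minimal usage sketch building on that exit status:
if grep -xvq mike inputfile; then
  echo "someone other than mike is using it"
else
  echo "only mike (or the file is empty)"
fi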
I tried a lot of things, but now I am at my wit's end.
My problem is I need the index of a specific string from my dynamically generated output.
For example, I want the index of the string 'cookie' in this output:
1337 cat dog table cookie 42
So in this example I would need this result:
5
One problem is that I need that number for a later awk command. Another problem is that the generated output has a variable length, so you cannot sed for something fixed like . or -; there is no such pattern to anchor on.
Cheers
Just create an array mapping each string value to its index and then print the entry:
$ cat file
1337 cat dog table cookie 42
$ awk -v v="cookie" '{v2i[v]=0; for (i=1;i<=NF;i++) v2i[$i]=i; print v2i[v]}' file
5
The above will print 0 if the string doesn't exist as a field on the given line.
By the way, you say you need that number output from the above "for a later executed awk command". Wild idea - why not do both steps in one awk command?
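For instance, if the later step is to print that same column from the remaining lines, a sketch of a combined command might look like this (assuming, hypothetically, that the word list is on the first line of the file):
awk -v v="cookie" 'NR==1{for (i=1;i<=NF;i++) if ($i==v) c=i; next} c{print $c}' file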
Ugly, but possible:
echo '1337 cat dog table cookie 42' \
| tr ' ' '\n' \
| grep -Fn cookie \
| cut -f1 -d:
Here is a way to find the position of a word in a string using GNU awk (required because RS is set to a multi-character string) and store it in a variable. Setting RS to the pattern makes everything before its first occurrence one record, so NF of that record plus 1 is the word's position.
pat="cookie"
pos=$(echo "1337 cat dog table cookie 42" | awk '{print NF+1;exit}' RS="$pat")
echo "$pos"
5
If you do not have gnu awk
pat="cookie"
pos=$(echo "1337 cat dog table cookie 42" | awk '{for (i=1;i<=NF;i++) if ($i~p) print i}' p="$pat")
echo "$pos"
5
Here is a pure bash way of doing it with arrays, no sed or awk or GNUs required ;-)
# Load up array, you would use your own command in place of echo
array=($(echo 1337 cat dog table cookie 42))
# Show what we have
echo ${array[*]}
1337 cat dog table cookie 42
# Find which element contains our pattern
for ((i=0;i<${#array[@]};i++)); do [ "${array[$i]}" == "cookie" ] && echo $((i+1)); done
5
Of course, you could set a variable to use later instead of echoing $i+1. You may also want some error checking in case the pattern isn't found, but you get the idea!
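For example, a sketch with a fallback when the pattern is absent:
pos=0
for ((i=0;i<${#array[@]};i++)); do
  [[ ${array[i]} == "cookie" ]] && { pos=$((i+1)); break; }
done
(( pos == 0 )) && echo "cookie not found" >&2
echo "$pos"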
Here is another answer, not using arrays, sed, awk, or tr, just relying on bash's IFS word splitting to separate the values for you:
#!/bin/bash
output="cat dog mouse cookie 42" # Or output=$(yourProgram)
f=0 # f will be your answer
i=0 # i counts the fields
for x in $output; do \
((i++)); [[ "$x" = "cookie" ]] && f=$i; \
done
echo $f
Result:
4
Or you can put it all on one line, if you remove the backslashes, like this:
#!/bin/bash
output="cat dog mouse cookie 42" # Or output=$(yourProgram)
f=0;i=0;for x in $output; do ((i++)); [[ "$x" = "cookie" ]] && f=$i; done
echo $f
Explanation:
The "[[a=b]] && c" part is just shorthand for
if [a=b]; then
c
fi
It relies on short-circuit evaluation of logicals. Basically, we are asking the shell to determine whether the statements "a equals b" AND "c" are both true. If a is not equal to b, it already knows the conjunction cannot be true, so it doesn't evaluate c, and f doesn't get the value of i. If, on the other hand, a is equal to b, the shell must still evaluate statement "c" to see if it is also true; when it does so, f gets the value of i.
Pat="cookie"
YourInput | sed -n "/${Pat}/ {s/.*/ & /;s/ ${Pat} .*/ I/;s/[[:blank:]]\{1,\}[^[:blank:]]\{1,\}/I/g
s/I\{9\}/9/;s/I\{8\}/8/;s/I\{7\}/7/;s/IIIIII/6/;s/IIIII/5/;s/IIII/4/;s/III/3/;s/II/2/;s/I/1/
p;q;}
$ s/.*/0/p"
If there are more than 9 columns, a more complex sed could be written, or the intermediate result could be passed through wc -c instead.
I am trying to get the column names of a file and print them iteratively. I guess the problem is with the print $i but I don't know how to correct it. The code I tried is:
#! /bin/bash
for i in {2..5}
do
set snp = head -n 1 smaller.txt | awk '{print $i}'
echo $snp
done
Example input file:
ID Name Age Sex State Ext
1 A 12 M UT 811
2 B 12 F UT 818
Desired output:
Name
Age
Sex
State
Ext
But the output I get is blank screen.
You'd better just read the first line of your file and store the result as an array:
read -a header < smaller.txt
and then printf the relevant fields:
printf "%s\n" "${header[#]:1}"
Moreover, this uses bash only, and involves no unnecessary loops.
Edit. To also answer your comment, you'll be able to loop through the header fields thus:
read -a header < smaller.txt
for snp in "${header[@]:1}"; do
echo "$snp"
done
Edit 2. Your original method had many many mistakes. Here's a corrected version of it (although what I wrote before is a much preferable way of solving your problem):
for i in {2..5}; do
snp=$(head -n 1 smaller.txt | awk "{print \$$i}")
echo "$snp"
done
set probably doesn't do what you think it does.
Because of the single quotes in awk '{print $i}', the $i never gets expanded by bash.
This algorithm is not good since you're calling head and awk 4 times, whereas you don't need a single external process.
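If you do keep the loop, note that a cleaner alternative to escaping the dollar sign is to pass the shell variable into awk with -v, which lets you keep the awk program in single quotes:
for i in {2..5}; do
  snp=$(head -n 1 smaller.txt | awk -v col="$i" '{print $col}')
  echo "$snp"
done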
Hope this helps!
You can print it using awk itself:
awk 'NR==1{for (i=2; i<=5; i++) print $i}' smaller.txt
The main problem with your code is that your assignment syntax is wrong. Change this:
set snp = head -n 1 smaller.txt | awk '{print $i}'
to this:
snp=$(head -n 1 smaller.txt | awk '{print $i}')
That is:
Do not use set. set is for setting shell options, numbered parameters, and so on, not for assigning arbitrary variables.
Remove the spaces around =.
To run a command and capture its output as a string, use $(...) (or `...`, but $(...) is less error-prone).
That said, I agree with gniourf_gniourf's approach.
Here's another alternative; not necessarily better or worse than any of the others:
for n in $(head -n 1 smaller.txt)
do
echo ${n}
done
Something like:
for x1 in $(head -n1 smaller.txt); do
echo "$x1"
done