Adding a linebreak in csv inside concat() - linux

I have an xmlstarlet command that looks like this:
xml sel -T -t -m /xml/path -v "concat(name,',',value,',')" -n filename.xml > output.csv
It outputs like so
#output.csv
name,value,
name,value,
name,value,
I want it to look like
name,name,name,
value,value,value,
I have been focused on trying different combinations within concat:
"concat(name,'<p>'value,',')"
"concat(name,'<br />'value,',')"
"concat(name,'"<p>"'value,',')"
"concat(name,'\n'value,',')"
Am I looking at the completely wrong area?
The route I ended up taking was an Excel macro that transposed the 8 rows into columns.

Well, your concat statement explicitly concatenates names and values. It seems what you want to do is loop over the elements twice, selecting first the names and then the values.
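A minimal sketch of that two-pass idea using xmlstarlet's template syntax (untested, and assuming the same /xml/path as in your question): each -t template loops over the path once, -o ',' prints the separator, -b closes the match, and -n adds the newline after each pass:
xml sel -T -t -m /xml/path -v name -o ',' -b -n -t -m /xml/path -v value -o ',' -b -n filename.xml > output.csv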

If you can't format the output with xmlstarlet itself, the awk script below will do the work for you:
awk 'BEGIN{FS=","}
{ names = names $1 ","     # collect column 1 (the names) in input order
  values = values $2 "," } # collect column 2 (the values)
END{ print names; print values }' your_file_name > temp.txt && mv temp.txt your_file_name
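Run against the sample output.csv above, that prints the transposed layout you asked for:
name,name,name,
value,value,value,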

You can process your data after the xml command:
unset allnames allvalues
while IFS=, read -r name value; do
allnames+="${name},"
allvalues+="${value},"
done < <(echo "name1,value1
name2,value2
name3,value3" )
echo "${allnames}"
echo "${allvalues}"
You cannot pipe the output into the while loop (variables set inside the loop would be lost in the subshell), so feed it through process substitution with your command like this:
unset allnames allvalues
while IFS=, read -r name value; do
allnames+="${name},"
allvalues+="${value},"
done < <(xml sel -T -t -m /xml/path -v "concat(name,',',value,',')" -n filename.xml )
echo "${allnames}" > output.csv
echo "${allvalues}">> output.csv

Related

Bash: How to count the number of occurrences of a string within a file?

I have a file that looks something like this:
dog
cat
dog
dog
fish
cat
I'd like to write some kind of code in Bash to make the file formatted like:
dog:1
cat:1
dog:2
dog:3
fish:1
cat:2
Any idea on how to do this? The file is very large (> 30K lines), so the code should be somewhat fast.
I am thinking some kind of loop...
Like this:
while read -r line; do
echo "$line" >> temp.txt
val=$(grep -c "$line" temp.txt)
echo "$val" >> temp2.txt
done < file.txt
And then paste -d ':' file.txt temp2.txt
However, I am concerned that this would be really slow, as you're going line-by-line. What do other people think?
You may use this simple awk to do this job for you:
awk '{print $0 ":" ++freq[$0]}' file
dog:1
cat:1
dog:2
dog:3
fish:1
cat:2
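Here freq is an associative array keyed on the whole input line; ++freq[$0] increments that line's count before it is printed, so repeated lines come out as 1, 2, 3, ... in input order.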
Here's what I came up with:
declare -A arr; while read -r line; do ((arr[$line]++)); echo "$line:${arr[$line]}" >> output_file; done < input_file
First, declare the hash table arr. Then read every line in a while loop, incrementing the value in the array under the key of the line just read. Then echo the line, followed by the value from the hash table, appending to 'output_file'.
Awk and sed are very powerful, but they aren't bash; here is a bash variant:
raw=( $(cat file) ) # read file into an array (splits on whitespace)
declare -A index # init associative array for the running counts
for item in "${raw[@]}"; { ((index[$item]++)); echo "$item:${index[$item]}"; } # count and print in one pass

AWK output to array

I am learning about awk. I need to capture the output of an awk command in a variable so I can parse it.
The file has 130,000 lines. I need to put a column into an array with awk so I can use it in another part of the script. Sorry for my English; ask me if you don't understand my objective.
Well, the code:
awk '/file:/{ name=$3; print name }' ejemplo.txt
I tried:
list=$(awk '/file:/{ name=$3; print name }' ejemplo.txt)
but when I try to show the content of the variable $list, it only shows one line.
I tried declare -a but it only shows one result.
Does anyone understand what is happening? How can I build an array with all of the output?
I tried this code to build an array with awk. Maybe I am a little dumb, but I don't see how to solve my problem:
#!/bin/bash
#filename: script2.sh
conta=$(cat ejemplo_hdfs.txt | wc -l)
for i in `seq 0 $conta`;do
objeto=" "
owner=" "
group=" "
awk '{
if($1=="#" && $2=="file:") objeto=$3;
else if($1=="#" && $2=="owner:") owner=$3;
else if($1=="#" && $2=="group:") group=$3;
else
print $3
;}' ejemplo_hdfs.txt
echo $objeto+","+$owner+","+$group
done
To assign an array to a variable in bash, the whole expression that generates the elements needs to be in parentheses. Each word produced by the evaluated expression becomes an element of the resulting array.
Example:
#!/bin/bash
foo=($(awk -F, '{print $2}' x.txt))
# Size of array
echo "There are ${#foo[#]} elements"
# Iterate over each element
for f in "${foo[@]}"; do
echo "$f"
done
# Use a specific element.
echo "The second element is ${foo[1]}"
$ cat x.txt
1,a dog
2,b
3,c
$ ./array_example.sh
There are 4 elements
a
dog
b
c
The second element is dog
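Note that foo=($(...)) splits on every run of whitespace, which is why "a dog" above became the two elements a and dog. If you want one array element per output line instead, mapfile (Bash 4+) avoids the word splitting; a minimal sketch with the same x.txt:
mapfile -t foo < <(awk -F, '{print $2}' x.txt)
declare -p foo # declare -a foo=([0]="a dog" [1]="b" [2]="c")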

Iterative Bash Script Bug

Using a bash script, I'm trying to iterate through a text file that only has around 700 words, line-by-line, and run a case-insensitive grep search in the current directory using that word on particular files. To break it down, I'm trying to output the following to a file:
Append a newline to a file, then the searched word, then another newline
Append the results of the grep command using that search
Repeat steps 1 and 2 until all words in the list are exhausted
So for example, if I had this list.txt:
search1
search2
I'd want the results.txt to be:
search1:
grep result here
search2:
grep result here
I've found some answers throughout the stack exchanges on how to do this and have come up with the following implementation:
#!/usr/bin/bash
while IFS = read -r line;
do
"\n$line:\n" >> "results.txt";
grep -i "$line" *.in >> "results.txt";
done < "list.txt"
For some reason, however, this (and the numerous variants I've tried) isn't working. Seems trivial, but it's been frustrating me beyond belief. Any help is appreciated.
Your script would work if you changed it to:
while IFS= read -r line; do
printf '\n%s:\n' "$line"
grep -i "$line" *.in
done < list.txt > results.txt
but it'd be extremely slow. See https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for why you should think long and hard before writing a shell loop just to manipulate text. The standard UNIX tool for manipulating text is awk:
awk '
NR==FNR { words2matches[$0]; next }
{
for (word in words2matches) {
if ( index(tolower($0),tolower(word)) ) {
words2matches[word] = words2matches[word] $0 ORS
}
}
}
END {
for (word in words2matches) {
print word ":" ORS words2matches[word]
}
}
' list.txt *.in > results.txt
The above is untested of course since you didn't provide sample input/output we could test against.
Possible problems:
bash path - use /bin/bash path instead of /usr/bin/bash
blank spaces - remove ' ' after IFS
echo - use -e option for handling escape characters (here: '\n')
semicolons - not required at end of line
Try following script:
#!/bin/bash
while IFS= read -r line; do
echo -e "$line:\n" >> "results.txt"
grep -i "$line" *.in >> "results.txt"
done < "list.txt"
You do not even need to write a bash script for this purpose:
INPUT FILES:
$ more file?.in
::::::::::::::
file1.in
::::::::::::::
abc
search1
def
search3
::::::::::::::
file2.in
::::::::::::::
search2
search1
abc
def
::::::::::::::
file3.in
::::::::::::::
abc
search1
search2
def
search3
PATTERN FILE:
$ more patterns
search1
search2
search3
CMD:
$ grep -inf patterns file*.in | sort -t':' -k3 | awk -F':' 'BEGIN{OFS=FS}{if($3==buffer){print $1,$2}else{print $3; print $1,$2}buffer=$3}'
OUTPUT:
search1
file1.in:2
file2.in:2
file3.in:2
search2
file2.in:1
file3.in:3
search3
file1.in:4
file3.in:5
EXPLANATIONS:
grep -inf patterns file*.in greps all the file*.in files for every pattern in the patterns file (the -f option); -i forces case-insensitive matching and -n adds the line numbers
sort -t':' -k3 sorts the output on the 3rd column to group identical patterns together
awk -F':' 'BEGIN{OFS=FS}{if($3==buffer){print $1,$2}else{print $3; print $1,$2}buffer=$3}' then prints the layout that you want, using : as both field separator and output field separator; the buffer variable saves the pattern (3rd field) so that the pattern is printed only when it changes ($3!=buffer)

BASH convert a text file line by line into variables

I want to make a script for an automated setup for a multiseat system. First action is
lspci | grep -i 'vga\|graphic' | cut -b 1-7 > text.txt
Now i want to put the two lines of the file into variables. My dowdy solution was this:
VAR1=$(head -n 1 text.txt)
VAR2=$(tail -n 1 text.txt)
It also works, however, there's probably a better solution to convert a text file line by line into variables.
The following should achieve exactly what you're doing, without the use of a temporary file
#!/bin/bash
{ read -r var1 _ && read -r var2 _; } < <(lspci | grep -i 'vga\|graphics')
Now, if you have several lines from lspci | grep -i 'vga\|graphics' (or just one, or none), you might want something more general, i.e., put the results in an array:
#!/bin/bash
var=()
while read -r f _; do var+=( "$f" ); done < <(lspci | grep -i 'vga\|graphics')
# display the content of var
declare -p var
If you have a recent version of Bash, and you love mapfile and awk (but who doesn't?), you could also do something like this:
#!/bin/bash
mapfile -t var < <(lspci | awk 'tolower($0) ~ /vga|graphics/ { print $1 }')
# display the content of var
declare -p var
For a Pure Bash possibility (except for lspci, of course):
#!/bin/bash
shopt -s extglob
var=()
while read -r v rest; do
[[ ${rest,,} = *#(vga|graphics)* ]] && var+=( "$v" )
done < <(lspci)
# display var
declare -p var
This uses:
Lower case conversion of rest with ${rest,,}
Pattern matching and extended globs with *#(vga|graphics)* (to avoid regular expressions altogether).
If you can format your text file into name value pairs, you could use bash associative arrays to store and reference each item. Note in this code = is used as the delimiter to separate the name value pair.
#read in config (name value pair file)
declare -A MYMAP
while read item; do
NAME=$(cut -d "=" -f1 <<<"$item")
VALUE=$(cut -d "=" -f2 <<<"$item")
MYMAP["$NAME"]="$VALUE"
done <./config_file.txt
#size of map
MYMAP_N=${#MYMAP[@]}
#make a list of keys
KEYS=("${!MYMAP[@]}")
#dereference map
SELECTION="${MYMAP["my_first_key"]}"
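For example, with a config_file.txt like this (hypothetical names and values):
my_first_key=first value
my_second_key=second value
SELECTION would hold first value, MYMAP_N would be 2, and KEYS would contain my_first_key and my_second_key (in no particular order, since associative arrays are unordered).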
If the values do not contain spaces, you can use a bash array variable:
declare -a vars
eval "vars=(`echo line1; echo line2`)" # the `echo ...` simulates your command
echo number of values: ${#vars[@]}
for ((I = 0; I < ${#vars[@]}; ++I )); do
echo value \#$I is ${vars[$I]}
done
echo all values : ${vars[*]}
The trick is to generate the statement initializing the array with the values, and then eval it.
If the values have spaces/special characters, then escaping/quoting might be necessary.
read VAR1 VAR2 < <(sed -n '1p;$p' myfile | tr '\n' ' ')
This ought to do what you need. It uses process substitution to print the lines you want and then redirects them to the variables. If you want different lines, build the statement as needed with a loop: count the lines with wc, build VAR1 .. VARn and a matching sed -n '1p;2p;...;np', and then eval the built statement.
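A minimal sketch of that dynamic variant (the generated VAR1..VARn names are illustrative, and it assumes each line of myfile is a single whitespace-free token, since everything is flattened onto one line):
n=$(wc -l < myfile)
read $(for ((i = 1; i <= n; i++)); do printf 'VAR%d ' "$i"; done) < <(tr '\n' ' ' < myfile)
echo "$VAR1 $VAR2" # the first two lines of myfile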

Unix shell scripting: assign value from file to variable into while loop, and use this value outside the loop

I'm writing a shell script, where I have to extract the contents of a file which is of type:
type1|valueA
type2|valueB
type1|valueC
type2|valueD
type3|valueE
....
typen|valueZ.
For each type in column_1, I have a target variable, which concatenates the values of the same type, to get a result like this:
var1=valueA,valueC
var2=valueB,valueD
var3=valueE
.....
Script implements something like this:
var1="HELLO"
var2="WORLD"
...
cat $file | while read record; do
#extract column_1 and column_2 from $record
if [ $column_1 = "type1" ]; then
var1="$var1,$column_2" ## e.g. column_2 = valueA
elif ....
....
fi
done
But when I try to use the value of any of the variables to which I appended column_2:
echo "$var1 - $var2"
I get the original values:
HELLO - WORLD.
Searching the internet, I read that the problem is related to the fact that the pipeline creates a subshell where the actual values are copied.
Is there a way to solve this problem?
Above all, is there a way that works across all types of shells? This script potentially has to run on different shells.
I do not want to rely on writing partial results to a file.
You don't need to use cat. Piping something into while creates a subshell. When the subshell exits, the values of variables set in the loop are lost (as would directory changes made with cd, as another example). Instead, you should redirect your file into the done:
while condition
do
# do some stuff
done < inputfile
By the way, instead of:
while read record
you can do:
while IFS='|' read -r column1 column2
BASH FAQ entry #24: "I set variables in a loop. Why do they suddenly disappear after the loop terminates? Or, why can't I pipe data to read?"
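Putting those two pieces together for the sample data in the question, a minimal sketch (POSIX sh constructs only, since the script has to run on different shells):
var1="HELLO"
var2="WORLD"
while IFS='|' read -r column1 column2; do
case $column1 in
type1) var1="$var1,$column2" ;;
type2) var2="$var2,$column2" ;;
esac
done < "$file"
echo "$var1 - $var2" # HELLO,valueA,valueC - WORLD,valueB,valueD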
Oneliner:
for a in `awk "-F|" '{print $1;}' test | sort -u` ; do echo -n "$a =" ; grep -e "^$a" test | awk "-F|" '{ printf(" %s,", $2);}' ; echo "" ; done
Using awk
awk '{ a[$1] = (a[$1]=="" ? $2 : a[$1] OFS $2) }  # append $2 to the running list for $1
END { for (i in a) print i "=" a[i] }             # emit one type=values line per key
' FS=\| OFS=, file
type1=valueA,valueC
type2=valueB,valueD
type3=valueE
