Add Doublequotes for all columns in text file with bash - linux

Sorry for bad format... now it shows the point:
(columns - tab delimited)
Input:
1 2 3 4
5 6 7 8
Some columns may look like this:
1 some text 2 text with space 3 lots of `:"'etc. 4
5 6 7 8
How to make output:
"1" "2" "3" "4"
"5" "6" "7" "8"
Or even better:
"1","2","3","4"
"5","6","7","8"
Got it! It's a bit stupid... but works:
sed 's/\t/","/g' input.txt | sed 's/^/"/;s/$/"/'
first sed is changing tabs to "," and next one is adding " at the beginning and the end of line.
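One caveat with the pipeline above: a field that already contains a double quote comes out unbalanced. As a rough sketch only, the same job can be done in one awk call. Here quote_tsv is a made-up helper name, and doubling embedded quotes is an assumption about what the reader of the CSV expects:

```shell
# Hypothetical helper: wrap every tab-separated field in double quotes
# and join the fields with commas.
quote_tsv() {
  awk -F'\t' -v OFS=',' '{
    for (i = 1; i <= NF; i++) {
      gsub(/"/, "\"\"", $i)   # assumption: double any embedded quote
      $i = "\"" $i "\""       # wrap the field itself
    }
    print
  }' "$@"
}

printf '1\tsome text\t3\n' | quote_tsv
```

It reads stdin when called without arguments, or a file name as in quote_tsv input.txt.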

Replace each tab with ", ", and put an extra " at the beginning and at the end:
string=$'1\t2\t3\t4\t5\t6\t7\t8'
echo \"${string//$'\t'/\", \"}\"

This might work for you (GNU sed):
echo -e '1\t2\t3\t4\t5\t6\t7\t8' | sed 's/[^\t]\+/"&"/g;y/\t/,/'
"1","2","3","4","5","6","7","8"

Something like this in perl should work:
perl -F'\t' -lape '$_ = qw(") . join(qw(","), @F) . qw(")' infile
It splits each line of infile at tab chars and joins them with "," while also pre- and appending ".

The simplest solution I know of uses perl:
while read; do
IFS=$'\t' perl -e '$,=","; $\="\n"; print map(qq/"$_"/, @ARGV); ' $REPLY
done < input.txt
A pure bash solution that requires a temporary variable:
while read; do
IFS=$'\t' printf -v var '"%s",' $REPLY
echo "${var%,}"
done < input.txt

Pure Bash, for tab delimited output:
while read line ; do
echo -e "\"${line//$'\t'/\"\t\"}\""
done < "$infile"
for comma delimited output:
while read line ; do
echo -e "\"${line//$'\t'/\",\"}\""
done < "$infile"
Spaces inside strings are allowed.
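A possible hardening of the loops above, offered as a sketch rather than a definitive version: IFS= and read -r keep leading whitespace and backslashes intact, and printf avoids echo -e interpreting escape sequences that happen to be in the data. The name quote_fields is hypothetical.

```shell
# Hypothetical helper: pure-bash quoting of tab-separated fields.
quote_fields() {
  local line sep='","'
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Replace every tab with "," and wrap the whole line in quotes.
    printf '"%s"\n' "${line//$'\t'/$sep}"
  done
}

printf '1\tsome text\t3\n' | quote_fields
```

Use it as quote_fields < "$infile".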

Related

Get the last two letters of each line in a file using script shell

I have a .txt file with 20 lines, and would love to get the last two letters of each line. If they equal AA in every line, print Good; if not, print Bad.
line11111111111111111 AA
line22222222222222222 AA
line33333333333333333 AA
.....................
line20202020202020202 AA
This is GOOD.
===========================
line11111111111111111 AB
line22222222222222222 AC
line33333333333333333 WD
.....................
line20202020202020202 ZZ
This is BAD.
Did this but needs improvement : sed 's/^.*\(.\{2\}\)/\1/'
based on your file layout
$ awk '$NF!="AA"{f=1; exit} END{print (f?"BAD":"GOOD")}' file
note that you don't have to check the rest after the first non "AA" token.
You may use a single command awk:
awk 'substr($0, length()-1) != "AA"{exit 1}' file && echo "GOOD" || echo "BAD"
substr($0, length()-1) will extract the last 2 characters of every line. The awk command will exit with 1 if we don't find AA at the end of any line.
Use a grep invert-match to identify lines not ending with "AA":
if egrep -q -v AA$ input.txt; then echo "bad"; else echo "good";fi
This script should work with awk. My txt file is named .test; replace it with your own file name.
if [ "$(awk '{ print $2 }' .test | uniq)" = "AA" ]; then
echo "This is GOOD";
else echo "This is BAD";
fi
How it works:
First, awk '{ print $2 }' extracts the second column. Then uniq collapses adjacent duplicate lines, so if every line has AA the result is a single line. Finally we check whether that result is exactly "AA" (a one-line string with 2 As).
Solution with grep and wc:
if [ "`grep -v 'AA$' your-file-here | wc -l`" == "0" ] ; then echo 'GOOD' ; else echo 'BAD' ; fi
The grep command checks for all lines not ending with AA and wc counts how many lines.
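The grep-based answers can be wrapped in a small function, roughly like this. It is a sketch: check_suffix is a made-up name, and the suffix is treated as a regular expression, so special characters in it would need escaping.

```shell
# Hypothetical helper: print GOOD if every line of a file ends with the
# given suffix, BAD otherwise.
check_suffix() {
  local suffix=$1 file=$2
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
  # -v selects lines that do NOT match; -q stops at the first such line.
  if grep -qv -- "${suffix}\$" "$file"; then
    echo BAD
  else
    echo GOOD
  fi
}
```

Call it as check_suffix AA your-file-here. Note that an empty file counts as GOOD here, because no line fails the check.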

How to append and print Numbers from a text file?

I need to start from file.txt, which contains entries like this:
1
2
3
4
5
I need to print the following:
1,100
2,100
3,100
4,100
5,100
I have attempted this, but am receiving an invalid number error:
printf '%d,100\n' "$(< file.txt)"
You can use awk:
$ awk '{printf "%s,100\n", $0}' file
1,100
2,100
3,100
4,100
5,100
You could use
while read in; do echo "$in,100"; done < file.txt
Your error is caused by printf getting the whole file at once and not line by line.
This is one of the rare occurrences of "too many quotes". Observe:
$ cat file.txt
1
2
3
4
5
$ var=$(< file.txt)
$ echo "$var" # Quotes preserve original whitespace
1
2
3
4
5
$ echo $var # No quotes reduce all whitespace to single spaces
1 2 3 4 5
The quotes make echo "see" just a single argument, namely the formatted file contents. Without the quotes, every line becomes an argument to echo, and they're printed separated by just spaces.
So, you can solve your problem with
$ printf '%d,100\n' $(< file.txt)
1,100
2,100
3,100
4,100
5,100
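The unquoted expansion is also subject to filename globbing. If that is a concern, one alternative is to read the lines into an array first. This is only a sketch, assuming bash 4 or later for mapfile; append_100 is a hypothetical name.

```shell
# Hypothetical helper: append ",100" to every number in a file.
append_100() {
  local file=$1
  local -a nums
  mapfile -t nums < "$file"        # one array element per line
  # With no arguments printf would still print "0,100" once, so bail out.
  if (( ${#nums[@]} == 0 )); then
    return 0
  fi
  # printf reuses its format string once per argument.
  printf '%d,100\n' "${nums[@]}"
}
```

Usage: append_100 file.txt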
Another solution would be to use sed:
$ sed 's/$/,100/' file.txt
1,100
2,100
3,100
4,100
5,100
This substitutes the end of the line, $, with ,100, for each line.
How about:
( set -f; set -- $(< file.txt)
printf '%d,100\n' "$@" )
seq 6 | awk '$0=$0",100"'
1,100
2,100
3,100
4,100
5,100
6,100

How do I produce a tab separated file from a text file?

I've got a file with numbers on odd lines and text on even lines like this:
123123
string A
456456
string B
789789
string C
I want to get it to this format (tab separated):
123123 string A
456456 string B
789789 string C
I tried to remove all newlines with
tr -s -d " " < myFile
then use sed with
sed -i 's/[0-9]\s/' myFile
but without great success.
Can you help me get this done?
The simplest way is to use paste as follows:
paste - - < myFile
In this command, paste reads the file from stdin and combines every two lines (that's what the two "-" are for) into a single line separated by a tab.
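The same idea extends to any group size, since each "-" consumes one line of stdin in turn. A small sketch, with join_every being a made-up wrapper name:

```shell
# Hypothetical helper: join every N consecutive lines of stdin with tabs.
join_every() {
  local n=$1 i
  local -a dashes=()
  # One "-" per line to be merged into each output row.
  for (( i = 0; i < n; i++ )); do
    dashes+=(-)
  done
  paste "${dashes[@]}"
}

printf '123123\nstring A\n456456\nstring B\n' | join_every 2
```

For the file in the question this would be join_every 2 < myFile.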
You can try the following:
paste <(grep -E '^[[:digit:]]+' myFile) \
<(grep -E '^[[:alpha:]]+' myFile)
Try:
sed '$!N;s/\n/\t/' inputfile
This would join the lines separated by a TAB character.
Example:
$ seq 10 | sed '$!N;s/\n/\t/'
1 2
3 4
5 6
7 8
9 10
Using awk:
awk 'NR%2{printf "%s\t", $0;next}1' file
123123 string A
456456 string B
789789 string C
Using perl:
perl -pe 'chomp; $_ = ($. % 2) ? sprintf "%s\t", $_ : sprintf "%s\n", $_;' file
Using bash:
c=0
while read a; do
((c++))
if ((c%2)); then
printf "%s\t" "$a"
else
echo "$a"
fi
done < file
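The counter can be avoided by reading two lines per iteration. A sketch along those lines follows; pair_lines is a hypothetical name, and a trailing unpaired line is silently dropped, which is an assumption about the desired behaviour.

```shell
# Hypothetical helper: pair each odd line with the following even line.
pair_lines() {
  local num text
  # The loop body runs only when both reads succeed.
  while IFS= read -r num && IFS= read -r text; do
    printf '%s\t%s\n' "$num" "$text"
  done
}

printf '123123\nstring A\n456456\nstring B\n' | pair_lines
```

Use it as pair_lines < file.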

print a line which has a digit repeated n times in the third field

I have a file with contents:
20120619112139,3,22222288100597,01,503352786544597,,W,ROAMER,,,,0,mme2
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120611171517,3,22222288100620,,503352786544620,11917676228846,B,ROAMER,8,2505,,U,
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120611171003,3,22222288100618,02,503352786544618,,W,ROAMER,8,2505,,0,
20120611171046,3,00000000000000,02,503352786544618,11917676228846,W,ROAMER,8,2505,,0,
20120611171101,3,22222288100618,02,503352786544618,11917676228846,W,ROAMER,8,2505,,0,
20120611171101,3,22222222222222,02,503352786544618,11917676228846,W,ROAMER,8,2505,,0,
I need to check whether the third field of any line consists of one digit repeated 14 times, like 00000000000000, and print such lines to another file.
I tried this code:
awk '$3 ~ /[0-9]{14}/' myfile > output.txt
But this also prints lines with values such as "22222288100618".
Also i tried:
for i in `cat myfile`
do
if [ `echo $i | cut -d"," -f 3 | egrep "^[0-9]{14}$"` ];
then echo $i >> output.txt;
fi
done
This doesn't help either; it also prints all the lines.
But I only need these lines in the output file.
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120611171046,3,00000000000000,02,503352786544618,11917676228846,W,ROAMER,8,2505,,0,
20120611171101,3,22222222222222,02,503352786544618,11917676228846,W,ROAMER,8,2505,,0,
Thanks in advance for any immediate help
Don't know if this can be done with awk but this should work:
perl -aF, -nle '$F[2]=~/(\d)\1{13}/&& print'
You can use an expression like 0{14}|1{14}.... Try this:
$ for i in 0 1 2 3 4 5 6 7 8 9; do re=$re${re:+|}$i{14}; done
$ awk -F, --posix \$3~/$re/ myfile
(gawk requires --posix to recognize the interval expression {14}. This may not be necessary with all awk.)
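It can also be done in awk without interval expressions or backreferences: delete every copy of the field's first character and see whether anything is left. A rough sketch, where repeated_digit_lines is a made-up name:

```shell
# Hypothetical helper: print lines whose 3rd comma-separated field is
# a single digit repeated exactly 14 times.
repeated_digit_lines() {
  awk -F, '{
    f = $3                      # work on a copy so $0 stays untouched
    c = substr(f, 1, 1)         # first character of the field
    if (length(f) == 14 && c ~ /[0-9]/) {
      gsub(c, "", f)            # remove every occurrence of that digit
      if (f == "") print        # nothing left: all 14 were the same
    }
  }' "$@"
}
```

Usage: repeated_digit_lines myfile > output.txt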
Using grep:
grep -E "[0-9]+,[0-9]+,([0-9])\1{13}" myfile
sed -n -E '/^[^,]+,[^,]+,([0-9])\1{13}/p' input_file

split for words separated with semicolon

I have some string like
1;2;3;4;5
I want to be able to iterate over this string taking each word one by one. For the first iteration to take 1 the next to take 2 and the last 5.
I want to have something like this
for i in $(myVar)
do
echo $i
done
but I do not know how to fill the myvar
echo '1;2;3;4;5' | tr \; \\n | while read line ; do echo $line; done
There's no need to back up the IFS variable if you assign it only for a single command:
$ IFS=';' read -a words <<<"1;2;3;4;5"
$ for word in "${words[@]}"
do
echo "$word"
done
1
2
3
4
5
Other useful syntax:
$ echo "${words[0]}"
1
$ echo "${words[@]: -1}"
5
$ echo "${words[@]}"
1 2 3 4 5
Probably the easiest way to do this is change the IFS environment variable:
OLDIFS="$IFS"
IFS=';'
for num in $a; do echo $num; done
# prints:
1
2
3
4
5
IFS="$OLDIFS"
Remember to change it back afterwards or weird things will happen! :)
From the bash man page:
IFS The Internal Field Separator that is used for word splitting
after expansion and to split lines into words with the read
builtin command. The default value is ``<space><tab><new-
line>''.
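Inside a function, declaring IFS as local sidesteps the save-and-restore step, since the caller's value comes back when the function returns. A minimal sketch, with split_words as a hypothetical name; it uses read -a rather than an unquoted expansion so that glob characters in the data are not expanded.

```shell
# Hypothetical helper: print each ;-separated word on its own line.
split_words() {
  local IFS=';'                 # scoped to this function only
  local -a parts
  read -r -a parts <<< "$1"     # split the first argument on ';'
  if (( ${#parts[@]} > 0 )); then
    printf '%s\n' "${parts[@]}"
  fi
}

split_words '1;2;3;4;5'
```

Words that contain spaces stay intact, because only ";" is a separator here.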
This might work for you:
array=($(sed 'y/;/ /' <<<"1;2;3;4;5"))
for word in "${array[@]}"; do echo "$word"; done
for w in $(echo '1;2;3;4;5' | tr \; \\n); do echo $w; done
