How do I produce a tab-separated file from a text file? - linux

I've got a file with numbers on odd lines and text on even lines like this:
123123
string A
456456
string B
789789
string C
I want to get it to this format (tab separated):
123123 string A
456456 string B
789789 string C
I tried to remove all newlines with
tr -s -d " " < myFile
then use sed with
sed -i 's/[0-9]\s/' myFile
but without great success.
Can you help me get this done?

The simplest way is to use paste as follows:
paste - - < myFile
In this command, paste reads the file from stdin and combines every two lines (that's what the two "-" are for) into a single line separated by a tab.
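For instance, a minimal run on the sample data (the file name myFile is from the question):

```shell
# Recreate the sample input: numbers on odd lines, text on even lines.
printf '%s\n' 123123 'string A' 456456 'string B' 789789 'string C' > myFile
# paste - - reads two lines at a time from stdin and joins them with a tab.
result=$(paste - - < myFile)
printf '%s\n' "$result"
rm myFile
```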

You can try the following:
paste <(grep -E '^[[:digit:]]+' myFile) \
      <(grep -E '^[[:alpha:]]+' myFile)

Try:
sed '$!N;s/\n/\t/' inputfile
This joins each successive pair of lines with a TAB character.
Example:
$ seq 10 | sed '$!N;s/\n/\t/'
1 2
3 4
5 6
7 8
9 10

Using awk:
awk 'NR%2{printf "%s\t", $0;next}1' file
123123 string A
456456 string B
789789 string C
Using perl:
perl -pe 'chomp; $_ = ($. % 2) ? sprintf "%s\t", $_ : sprintf "%s\n", $_;' file
Using bash:
c=0
while read a; do
    ((c++))
    if ((c%2)); then
        printf "%s\t" "$a"
    else
        echo "$a"
    fi
done < file
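A condensed sketch of the same loop, run on a two-pair sample (pairs.txt is a hypothetical file name):

```shell
# Alternate tab/newline terminators based on a line counter.
printf '%s\n' 123123 'string A' 456456 'string B' > pairs.txt
out=$(
    c=0
    while IFS= read -r a; do
        ((++c % 2)) && printf '%s\t' "$a" || printf '%s\n' "$a"
    done < pairs.txt
)
rm pairs.txt
```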


Test if ALL The Contents of One File exist in a Second File

I have found a few examples on Stack Overflow of how to do this, but none of them work for me.
bash text search: find if the content of one file exists in another file
I want to test whether ALL the contents of one text file exist in the same format/block/style somewhere in a second file and, if not, append the contents of $SRC >> $TGT.
If I execute these commands manually in the console, they return the contents of $SRC:
SRC="mytextfile1.txt"
TGT="mytextfile2.txt"
grep -F -f $SRC $TGT
cat $TGT|grep -f $SRC
And this returns nothing:
grep $SRC -q -f $TGT
And this keeps appending each time it is executed:
function append {
f1=$(wc -c < "$SRC")
diff -y <(od -An -tx1 -w1 -v "$SRC") <(od -An -tx1 -w1 -v "$TGT") | \
rev | cut -f2 | uniq -c | grep -v '[>|]' | numgrep /${f1}../ | \
grep -q -m1 '.+*' || cat "$SRC" >> "$TGT";
}
So how can I do this so that it can be tested in an if statement?
EDIT
Here's an example of the file contents:
$SRC File
text 1
text 2
text d
text e
text f
text g
$TGT File Before Modified
text 1
text 2
text 3
text 4
text a
text b
text c
$TGT File After Modified
text 1
text 2
text 3
text 4
text a
text b
text c
text 1
text 2
text d
text e
text f
text g
I would use perl's index for this:
if ! perl -0 -we '
open my $f1, "<", "mytextfile1.txt";
open my $f2, "<", "mytextfile2.txt";
exit( index(<$f2>, <$f1>) == -1 )'
then
cat mytextfile1.txt >> mytextfile2.txt
fi
The key here is -0, which makes the <> operator read the entire file instead of just one line. Note that the logic is somewhat convoluted. If index returns -1, the content is not matched and perl returns non-zero, which the shell treats as failure. So the if condition is inverted. It seems more natural for perl to succeed when the content matches, but perhaps it would be cleaner to use != and remove the outer inversion.
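A sketch of that cleaner, non-inverted form, using hypothetical file names src.txt and tgt.txt (here the source block is already present in the target, so nothing gets appended):

```shell
printf 'text d\ntext e\n' > src.txt
printf 'text 1\ntext d\ntext e\ntext x\n' > tgt.txt
# With != the inversion moves into perl: exit status 0 now means "not found".
if perl -0 -we '
    open my $f1, "<", "src.txt" or die;
    open my $f2, "<", "tgt.txt" or die;
    exit( index(<$f2>, <$f1>) != -1 )'
then
    cat src.txt >> tgt.txt
fi
lines=$(wc -l < tgt.txt)
rm src.txt tgt.txt
```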
Could you please try the following, based on your logic (explained by the OP in the comments) that all contents of Input_file src should be present, in the same order, in Input_file tgt:
awk '
FNR==NR{
a[FNR,$0]
val1=(val1?val1 ORS:"")$0
next
}
((FNR,$0) in a){
count++
val2=(val2?val2 ORS:"")$0
}
END{
if(count==length(a)){
print val1 ORS val2
}
}
' file_src file_tgt
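A quick run on a reduced version of the sample data (file names as above; note that length() on an array is a GNU awk feature):

```shell
# src lines must all appear, keyed by line number and content, in tgt.
printf 'text 1\ntext 2\n' > file_src
printf 'text 1\ntext 2\ntext 3\n' > file_tgt
out=$(awk '
FNR==NR{ a[FNR,$0]; val1=(val1?val1 ORS:"")$0; next }
((FNR,$0) in a){ count++; val2=(val2?val2 ORS:"")$0 }
END{ if(count==length(a)) print val1 ORS val2 }
' file_src file_tgt)
rm file_src file_tgt
```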

adding double quotes, commas and removing newlines

I have a file that have a list of integers:
12542
58696
78845
87855
...
I want to change them into:
"12542", "58696", "78845", "87855", "..."
(no comma at the end)
I believe I need to use sed but couldn't figure out how. I'd appreciate your help.
You could do a sed multiline trick, but the easy way is to take advantage of shell expansion:
echo $(sed '$ ! s/.*/"&",/; $ s/.*/"&"/' foo.txt)
Run echo $(cat file) to see why this works. The trick, in a nutshell, is that the result of cat is parsed into tokens and interpreted as individual arguments to echo, which prints them separated by spaces.
The sed expression reads
$ ! s/.*/"&",/
$ s/.*/"&"/
...which means: For all but the last line ($ !) replace the line with "line",, and for the last line, with "line".
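For example, with foo.txt containing three of the integers:

```shell
printf '12542\n58696\n78845\n' > foo.txt
# Word splitting of the unquoted $(...) collapses the newlines to single spaces.
out=$(echo $(sed '$ ! s/.*/"&",/; $ s/.*/"&"/' foo.txt))
rm foo.txt
```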
EDIT: In the event that the file contains not just a line of integers like in OP's case (when the file can contain characters the shell expands), the following works:
EDIT2: Nicer code for the general case.
sed -n 's/.*/"&"/; $! s/$/,/; 1 h; 1 ! H; $ { x; s/\n/ /g; p; }' foo.txt
Explanation: Written in a more readable fashion, the sed script is
s/.*/"&"/
$! s/$/,/
1 h
1! H
$ {
x
s/\n/ /g
p
}
What this means is:
s/.*/"&"/
Wrap every line in double quotes.
$! s/$/,/
If it isn't the last line, append a comma
1 h
1! H
If it is the first line, overwrite the hold buffer with the result of the previous transformation(s), otherwise append it to the hold buffer.
$ {
x
s/\n/ /g
p
}
If it is the last line -- at this point the hold buffer contains the whole line wrapped in double quotes with commas where appropriate -- swap the hold buffer with the pattern space, replace newlines with spaces, and print the result.
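Putting it together on a two-line sample (hold.txt is a hypothetical file name):

```shell
printf '12542\n58696\n' > hold.txt
# Quote each line, add commas except on the last line, accumulate in the
# hold buffer, then flatten the newlines to spaces at the end.
out=$(sed -n 's/.*/"&"/; $! s/$/,/; 1 h; 1 ! H; $ { x; s/\n/ /g; p; }' hold.txt)
rm hold.txt
```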
Here is a solution:
sed 's/.*/ "&"/' input-file | tr '\n' ',' | rev | cut -c 2- | rev | sed 's/^.//'
First, wrap each input line in quotes:
sed 's/.*/ "&"/' input-file
Then convert the newlines to commas:
tr '\n' ',' < input-file
The remaining commands (rev, cut and sed) format the output as required:
rev reverses the string, so the trailing comma becomes the first character.
cut -c 2- drops that comma (the string is then reversed back).
The final sed removes the leading space left over from the first substitution.
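The whole pipeline on a two-line sample (rev is from util-linux; in.txt is a hypothetical file name):

```shell
printf '12542\n58696\n' > in.txt
# Quote, join with commas, then strip the trailing comma and leading space.
out=$(sed 's/.*/ "&"/' in.txt | tr '\n' ',' | rev | cut -c 2- | rev | sed 's/^.//')
rm in.txt
```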
Output:
With perl without any pipes/forks :
perl -0ne 'print join(", ", map { "\042$_\042" } split), "\n"' file
OUTPUT:
"12542", "58696", "78845", "87855"
Here's a pure Bash (Bash≥4) possibility that reads the whole file in memory, so it won't be good for huge files:
mapfile -t ary < file
((${#ary[@]})) && printf '"%s"' "${ary[0]}"
((${#ary[@]}>1)) && printf ', "%s"' "${ary[@]:1}"
printf '\n'
For huge files, this awk seems ok (and will be rather fast):
awk '{if(NR>1) printf ", ";printf("\"%s\"",$0)} END {print ""}' file
One way, using sed:
sed ':a; N; $!ba; s/\n/", "/g; s/.*/"&"/' file
Results:
"12542", "58696", "78845", "87855", "..."
You can write the column oriented values in a row with no comma following the last as follows:
cnt=0
while read -r line || test -n "$line" ; do
    if [ "$cnt" = "0" ] ; then
        printf "\"%s\"" "$line"
    else
        printf ", \"%s\"" "$line"
    fi
    cnt=$((cnt + 1))
done < "$1"
printf "\n"
output:
$ bash col2row.sh dat/ncol.txt
"12542", "58696", "78845", "87855"
A simplified awk solution:
awk '{ printf sep "\"%s\"", $0; sep=", " }' file
Takes advantage of uninitialized variables defaulting to an empty string in a string context (sep).
sep "\"%s\"" synthesizes the format string to use with printf by concatenating sep with \"%s\". The resulting format string is applied to $0, each input line.
Since sep is only initialized after the first input record, , is effectively only inserted between output elements.
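A sample invocation, with the input supplied on stdin instead of a file:

```shell
# First record: sep is empty, so only "12542" is printed; afterwards every
# record is prefixed with ", ".
out=$(printf '12542\n58696\n' | awk '{ printf sep "\"%s\"", $0; sep=", " }')
```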

Convert Row to Column in shell

I need to convert the below for multiple files. The text need not be the same, but it will be in the same format and length.
File 1:
XXXxx81511
XXX is Present
abcdefg
07/09/2014
YES
1
XXX
XXX-XXXX
File 2:
XXXxx81511
XXX is Present
abcdefg
07/09/2014
YES
1
XXX
XXX-XXXX
TO
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX
Basically converting row to column and appending to a new file while adding commas to separate them.
I am trying cat filename | tr '\n' ',' but the results all end up on one line, like this:
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX,XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX
Use:
paste -sd, file1 file2 .... fileN
#e.g.
paste -sd, *.txt file*
prints
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
and if you need the empty line after each one
paste -sd, file* | sed G
prints
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
Short perl variant:
perl -pe 'eof||s|$/|,|' files....
You need to insert an echo after tr. Use a script like this:
for f in file1 file2; do
tr '\n' ',' < "$f"; echo
done > files.output
Use a for loop:
for f in file*; do sed ':a;N;$!ba;s/\n/,/g' < "$f"; done
The sed code was taken from sed: How can I replace a newline (\n)?. tr '\n' ',' didn't work on my limited test setup.
perl -ne 'chomp; print $_ . (($. % 8) ? "," : "\n")' f*
where:
-n reads the file line by line but doesn't print each line
-e executes the code from the command line
8 number of lines in each file
f* glob for files (replace with something that will select all
your files). If you need a specific order, you will probably need
something more complicated here.

Add Doublequotes for all columns in text file with bash

Sorry for bad format... now it shows the point:
(columns - tab delimited)
Input:
1 2 3 4
5 6 7 8
some columns may look like this:
1 some text 2 text with space 3 lots of `:"'etc. 4
5 6 7 8
How to make output:
"1" "2" "3" "4"
"5" "6" "7" "8"
Or even better:
"1","2","3","4"
"5","6","7","8"
Got it! It's a bit stupid... but works:
sed 's/\t/","/g' input.txt | sed 's/^/"/;s/$/"/'
The first sed changes tabs to "," and the second adds " at the beginning and end of each line.
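A sample run (GNU sed understands \t in the pattern; with a strictly POSIX sed you would type a literal tab instead):

```shell
# Tab-separated fields in, quoted comma-separated fields out.
out=$(printf '1\t2\t3\n' | sed 's/\t/","/g' | sed 's/^/"/;s/$/"/')
```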
Replace each tab with ", ", and put an extra " at the beginning and at the end:
string=$'1\t2\t3\t4\t5\t6\t7\t8'
echo \"${string//$'\t'/\", \"}\"
This might work for you (GNU sed):
echo -e '1\t2\t3\t4\t5\t6\t7\t8' | sed 's/[^\t]\+/"&"/g;y/\t/,/'
"1","2","3","4","5","6","7","8"
Something like this in perl should work:
perl -F'\t' -lape '$_ = qw(") . join(qw(","), @F) . qw(")' infile
It splits each line of infile at tab chars and joins them with "," while also pre- and appending ".
The simplest solution I know of uses perl:
while read; do
IFS=$'\t' perl -e '$,=","; $\="\n"; print map(qq/"$_"/, @ARGV); ' $REPLY
done < input.txt
A pure bash solution that requires a temporary variable:
while read; do
IFS=$'\t' printf -v var '"%s",' $REPLY
echo "${var%,}"
done < input.txt
Pure Bash, for tab delimited output:
while read line ; do
echo -e "\"${line//$'\t'/\"\t\"}\""
done < "$infile"
for comma delimited output:
while read line ; do
echo -e "\"${line//$'\t'/\",\"}\""
done < "$infile"
Spaces inside strings are allowed.

print a line which has a digit repeated n times in the third field

I have a file with contents:
20120619112139,3,22222288100597,01,503352786544597,,W,ROAMER,,,,0,mme2
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120611171517,3,22222288100620,,503352786544620,11917676228846,B,ROAMER,8,2505,,U,
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120611171003,3,22222288100618,02,503352786544618,,W,ROAMER,8,2505,,0,
20120611171046,3,00000000000000,02,503352786544618,11917676228846,W,ROAMER,8,2505,,0,
20120611171101,3,22222288100618,02,503352786544618,11917676228846,W,ROAMER,8,2505,,0,
20120611171101,3,22222222222222,02,503352786544618,11917676228846,W,ROAMER,8,2505,,0,
I need to check whether the third field of any line consists of a single digit repeated 14 times, like 00000000000000, and print such lines to another file.
I tried this code:
awk '$3 ~ /[0-9]{14}/' myfile > output.txt
But this also prints lines with values such as "22222288100618".
Also i tried:
for i in `cat myfile`
do
if [ `echo $i | cut -d"," -f 3 | egrep "^[0-9]{14}$"` ];
then echo $i >> output.txt;
fi
done
This doesn't help either; it also prints all the lines.
But I only need these lines in the output file.
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120703112557,3,00000000000000,,503352786544021,,B,,8,2505,,U,
20120611171046,3,00000000000000,02,503352786544618,11917676228846,W,ROAMER,8,2505,,0,
20120611171101,3,22222222222222,02,503352786544618,11917676228846,W,ROAMER,8,2505,,0,
Thanks in advance for any immediate help.
Don't know if this can be done with awk but this should work:
perl -aF, -nle '$F[2]=~/(\d)\1{13}/&& print'
You can use an expression like 0{14}|1{14}.... Try this:
$ for i in 0 1 2 3 4 5 6 7 8 9; do re=$re${re:+|}$i{14}; done
$ awk -F, --posix \$3~/$re/ myfile
(gawk requires --posix to recognize the interval expression {14}. This may not be necessary with all awk.)
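The loop above builds the alternation one digit at a time; here is just the pattern-building step, so you can see what $re ends up containing:

```shell
re=
for i in 0 1 2 3 4 5 6 7 8 9; do
    re=$re${re:+|}$i{14}   # ${re:+|} inserts "|" only once re is non-empty
done
printf '%s\n' "$re"
```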
Using grep:
grep -E "[0-9]+,[0-9]+,([0-9])\1{13}" myfile
sed -nE '/^[^,]+,[^,]+,([0-9])\1{13}/p' input_file
(-E is needed for + and {13} to be treated as ERE operators; the \1 backreference in an ERE is a GNU sed extension.)
