Combining text files into one text file - Linux

I have a requirement like the following.
I am using Linux.
I have a set of text files: text1.txt, text2.txt, text3.txt.
Now I am combining them into one final text file.
text1.txt
1
NULL
NULL
4
text2.txt
1
2
NULL
4
text3.txt
a
b
c
d
I am using the following command:
paste -d ' ' text1.txt text2.txt text3.txt >> text4.txt
I am getting this:
text4.txt
1 1 a
2 b
c
4 4 d
But I want the output like the following:
text4.txt
1 1 a
NULL 2 b
NULL NULL c
4 4 d
NOTE: NULL means a space (an empty field).
I am passing this text4.txt to another loop as input, where I read the variables by position.
Thanks in advance.

I expect that you want TABs separating your records in text4.txt... what about this?
# count the lines in the first input file
NLINES=$(wc -l text1.txt | awk '{print $1}')
rm -f text4.txt
for i in $(seq 1 $NLINES); do
    # pull line $i out of each input file
    rec1=$(sed -n "$i p" text1.txt)
    rec2=$(sed -n "$i p" text2.txt)
    rec3=$(sed -n "$i p" text3.txt)
    # join the three records with TABs
    echo -e "$rec1\t$rec2\t$rec3" >> text4.txt
done
But actually paste without -d ' ' (its default delimiter is a TAB) gives exactly the same result!
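One caveat if the downstream loop reads text4.txt with a plain bash while read: bash treats a run of tabs as a single delimiter, so empty fields would still collapse. Reading the columns positionally with awk avoids that, since a literal tab FS preserves empty fields. A minimal sketch (the first=/second=/third= labels are mine, just for illustration):
# a single-character FS of tab keeps empty fields in place
awk -F'\t' '{print "first=" $1, "second=" $2, "third=" $3}' text4.txt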

You can achieve the same with an awk command:
awk '{a[FNR]=a[FNR]$0" "}END{for(i=1;i<=length(a);i++)print a[i]}' text1.txt text2.txt text3.txt >> text4.txt


How do I print a line in one file based on a corresponding value in a second file?

I have two files:
File1:
A
B
C
File2:
2
4
3
I would like to print each line in file1 the number of times found on the corresponding line of file2, and then append each line to a separate file.
Desired output:
A
A
B
B
B
B
C
C
C
Here is one of the approaches I have tried:
touch output.list
paste file1 file2 > test.dict
cat test.dict
A 2
B 4
C 3
while IFS="\t" read -r f1 f2
do
yes "$f1" | head -n "$f2" >> output.list
done < test.dict
For my output I get a bunch of lines that read:
head: : invalid number of lines
Any guidance would be greatly appreciated. Thanks!
Change IFS to an ANSI-C quoted string ($'\t') or remove the IFS assignment entirely (the default value already contains tab). With IFS="\t", IFS contains the two literal characters \ and t, so the line is never split on a real tab: $f1 receives the whole line, $f2 stays empty, and head -n "" fails with exactly that error.
You can also use a process substitution and avoid the temporary file.
while IFS=$'\t' read -r f1 f2; do
yes "$f1" | head -n "$f2"
done < <(paste file1 file2) > output.list
You could loop through the output of paste and use a C-style for loop in place of yes and head.
#!/usr/bin/env bash
while read -r first second; do
for ((i=1;i<=second;i++)); do
printf '%s\n' "$first"
done
done < <(paste file1.txt file2.txt)
If the output looks correct, append | tee file3.txt after the process substitution to send the output both to stdout and to the output file file3.txt:
done < <(paste file1.txt file2.txt) | tee file3.txt
You can do this with an awk one-liner:
$ awk 'NR==FNR{a[++i]=$0;next}{for(j=0;j<$0;j++)print a[FNR]}' File1 File2
A
A
B
B
B
B
C
C
C

Sort the tab-delimited numbers on each line of a file

I'm trying to sort the numbers on each line of a file individually. The numbers within one line are separated by tabs. (I used spaces but they're actually tabs.)
For example, for the following input
5 8 7 6
1 5 6 8
8 9 7 1
the desired output would be:
5 6 7 8
1 5 6 8
1 7 8 9
My attempt so far is:
let i=1
while read line
do
echo "$line" | tr " " "\n" | sort -g
cut -f $i fileName | paste -s >> tempFile$$
((++i))
done < fileName
This is the best I've got; I'm sure it can be done in 6 characters with awk/sed/perl:
while read line
do
echo $(printf "%d\n" $line | sort -n) | tr ' ' \\t >> another-file.txt
done < my-input-file.txt
Using a few features that are specific to GNU awk:
$ awk 'BEGIN{ PROCINFO["sorted_in"] = "@ind_num_asc" }
{ delete a; n = 0; for (i=1;i<=NF;++i) a[$i];
for (i in a) printf "%s%s", i, (++n<NF?FS:RS) }' file
5 6 7 8
1 5 6 8
1 7 8 9
Each field is set as a key in the array a. In GNU awk it is possible to specify the order in which the for (i in a) loop traverses the array - here, I've set it to do so in ascending numerical order.
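One caveat with this approach: each field is stored only as an array key, so duplicate values on a line would collapse into one and misalign the output. A sketch that keeps duplicates by counting occurrences instead (my variation on the same GNU awk feature):
awk 'BEGIN{ PROCINFO["sorted_in"] = "@ind_num_asc" }
{ delete a; for (i=1;i<=NF;++i) a[$i]++;          # count each value
  n = 0
  for (v in a)                                     # ascending numeric order
      for (j=1;j<=a[v];j++)                        # repeat duplicates
          printf "%s%s", v, (++n<NF?FS:RS) }' file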
Here is a bash script that can do it. It takes a filename argument or reads stdin; it was tested on CentOS and assumes the default IFS of $' \t\n'.
#!/bin/bash
if [ "$1" ] ; then exec < "$1" ; fi
cat - | while read line
do
set $line
echo $(for var in "$@"; do echo $var; done | sort -n) | tr " " "\t"
done
If you want to put the output in another file, run it as:
cat input_file | sorting_script > another_file
or
sorting_script input_file > another_file
Consider using perl for this:
perl -ape '@F=sort {$a<=>$b} @F; $_="@F\n"' input.txt
Here -a turns on automatic field splitting (like awk does) into the array @F, -p makes it execute the script for each line and print $_ each time, and -e specifies the script directly on the command line. The {$a<=>$b} comparator makes the sort numeric rather than lexical.
Not quite 6 characters, I'm afraid, Sean.
This should have been simple in awk, but it doesn't quite have the features needed. If there had been an array $@ corresponding to the fields $1, $2, etc., then the solution would have been awk '{asort $@}' input.txt, but sadly no such array exists. The loops required to move the fields into an array and out of it again make it longer than the bash version:
awk '{for(i=1;i<=NF;i++)a[i]=$i;asort(a);for(i=1;i<=NF;i++)printf("%s ",a[i]);printf("\n")}' input.txt
So awk isn't the right tool for the job here. It's also a bit odd that sort itself doesn't have a switch for sorting the fields within a line rather than the lines themselves.
Using awk
$ cat file
5 8 7 6
1 5 6 8
8 9 7 1
$ awk '{c=1;while(c!=""){c=""; for(i=1;i<NF;i++){n=i+1; if($i>$n){c=$i;$i=$n;$n=c}}}}1' file
5 6 7 8
1 5 6 8
1 7 8 9
More readable version:
awk '{
c=1
while(c!="")
{
c=""
for(i=1;i<NF;i++)
{
n=i+1
if($i>$n)
{
c=$i
$i=$n
$n=c
}
}
}
}1
' file
If you have ksh, you may try this:
#!/usr/bin/env ksh
while read line ; do
set -s +A cols $line
echo ${cols[*]}
done < "input_file"
Test
[akshay@localhost tmp]$ cat test.ksh
#!/usr/bin/env ksh
cat <<EOF | while read line ; do set -s +A cols $line; echo ${cols[*]};done
5 8 7 6
1 5 6 8
8 9 7 1
EOF
[akshay@localhost tmp]$ ksh test.ksh
5 6 7 8
1 5 6 8
1 7 8 9

Cat headers and renaming a column header using awk?

I've got an input file (input.txt) like this:
name value1 value2
A 3 1
B 7 4
C 2 9
E 5 2
And another file with a list of names (names.txt) like so:
B
C
Using grep -f, I can get all the lines with names "B" and "C":
grep -wFf names.txt input.txt
to get
B 7 4
C 2 9
However, I want to keep the header at the top of the output file and also rename the column header "name" to "ID". Still using grep to keep the rows with names B and C, the output should be:
ID value1 value2
B 7 4
C 2 9
I'm thinking awk should be able to accomplish this, but being new to awk I'm not sure how to approach this. Help appreciated!
While it is certainly possible to do this in awk, the fastest way to solve your actual problem is to simply prepend the desired header to the grep output:
echo "ID value1 value2" > Output.txt && grep -wFf names.txt input.txt >> Output.txt
Update: Since the OP has multiple files, we can modify the above line to pull the first line out of the input file instead:
head -n 1 input.txt | sed 's/name/ID/' > Output.txt && grep -wFf names.txt input.txt >> Output.txt
Here is how to do it with awk
awk 'FNR==NR {a[$1];next} FNR==1 {$1="ID";print} {for (i in a) if ($1==i) print}' names.txt input.txt
ID value1 value2
B 7 4
C 2 9
Store the names in an array a.
Then test whether field #1 of input.txt contains a value stored in array a.
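A small variation (my sketch, not part of the original answer) replaces the inner loop with awk's in membership test, and uses next so the renamed header line is not examined again:
# first file fills the lookup array; header gets renamed; other lines
# print only when field 1 is a key in a
awk 'FNR==NR {a[$1]; next} FNR==1 {$1="ID"; print; next} $1 in a' names.txt input.txt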

How to sort data while printing line by line in Bash

I'm writing a shell script that traverses a list of directories and counts the words in the files inside them. The code prints data each time a file is read, so the output is not sorted. How can I sort it?
The output right now is like this:
cat 5
door 1
bird 3
dog 1
and I want to sort it first by second column and then by first column:
dog 1
door 1
bird 3
cat 5
You can pipe your shell script's output to:
sort -n -k2 -k1
With -n you specify a numeric sort, with -k2 that you want to sort first by the second field, and with -k1 that you then want to sort by the first field.
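Strictly speaking, a -kN key in sort runs from field N to the end of the line, which happens to be harmless for this two-column data. To restrict each key to exactly one field, the more precise (and here equivalent) form would be:
sort -k2,2n -k1,1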
First of all, I tried to reproduce what OP is doing, so after creating the files, I tried this command:
% for i in *; do echo -n "$i "; wc -w < $i; done
bird 3
cat 5
dog 1
door 1
Then I have added the sort:
% (for i in *; do echo -n "$i "; wc -w < $i; done) | sort -n -k 2 -k 1
dog 1
door 1
bird 3
cat 5

Find value from one csv in another one (like vlookup) in bash (Linux)

I have already tried all the options I found online to solve my issue, but without good results.
Basically I have two csv files (pipe separated):
file1.csv:
123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
file2.csv:
MAYOBAN|BANGOR|2400
MAYOBEL|BELLAVARY|2400
CORKKIN|KINSALE|2200
CORKCOR|CORK|2200
DUBLD11|DUBLIN 11|2100
I need a Linux bash script to find the value at position 3 in file2 based on the content of position 7 in file1.
Example:
file1, line 1, position 7: MAYOBAN
find MAYOBAN in file2, return position 3 (2400)
The output should be something like this:
2400
2200
2200
etc...
Please help
Jacek
A simple approach, far from perfect:
DELIMITER="|"
for i in $(cut -f 7 -d "${DELIMITER}" file1.csv );
do
grep "${i}" file2.csv | cut -f 3 -d "${DELIMITER}";
done
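Since a bare grep "${i}" can match anywhere on a line, one refinement of this sketch (my tweak) anchors the pattern to the key field at the start of each file2.csv line:
# match only when the key is the first pipe-separated field
grep "^${i}|" file2.csv | cut -f 3 -d "${DELIMITER}"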
This will work, but since the input files must be sorted, the output order will be affected:
join -t '|' -1 7 -2 1 -o 2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)
The output would look like:
2200
2200
2400
which is useless. In order to have useful output, include the key value:
join -t '|' -1 7 -2 1 -o 0,2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)
The output then looks like this:
CORKCOR|2200
CORKKIN|2200
MAYOBAN|2400
Edit:
Here's an AWK version:
awk -F '|' 'FNR == NR {keys[$7]; next} {if ($1 in keys) print $3}' file1.csv file2.csv
This loops through file1.csv and creates array entries for each value of field 7. Simply referring to an array element creates it (with a null value). FNR is the record number in the current file and NR is the record number across all files; they are equal only while the first file is being processed. The next statement skips straight to the next record, so the second action never runs for file1.csv. When FNR == NR is no longer true, the subsequent file(s) are processed.
So file2.csv is now processed and if it has a field 1 that exists in the array, then its field 3 is printed.
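Note that this prints the matches in file2.csv's order, not file1.csv's. If the output must follow file1.csv line by line, a sketch that reverses the roles, loading file2.csv into a lookup map first (the array name price is mine):
# file2.csv builds the map; file1.csv is then streamed in its own order
awk -F '|' 'FNR == NR {price[$1] = $3; next} ($7 in price) {print price[$7]}' file2.csv file1.csv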
You can use Miller (https://github.com/johnkerl/miller).
Starting from input01.txt
123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
and input02.txt
MAYOBAN|BANGOR|2400
MAYOBEL|BELLAVARY|2400
CORKKIN|KINSALE|2200
CORKCOR|CORK|2200
DUBLD11|DUBLIN 11|2100
and running
mlr --csv -N --ifs "|" join -j 7 -l 7 -r 1 -f input01.txt then cut -f 3 input02.txt
you will have
2400
2200
2200
Some notes:
-N to set input and output without header;
--ifs "|" to set the input fields separator;
-l 7 -r 1 to set the join fields of the input files;
cut -f 3 to extract the field named 3 from the join output.
Or with a plain shell loop (note that the lookup has to search file2.csv, and the key should be quoted):
cut -d\| -f7 file1.csv | while read -r line
do
    grep "$line" file2.csv | cut -d\| -f3
done
