Bash: merge single-column files into one file, one row per file, for Excel

I have many data files in this format:
-1597.5421
-1909.6982
-1991.8743
-2033.5744
I would like to merge them all into one data file, with each original data file becoming one space-separated row, so I can import it into Excel.
-1597.5421 -1909.6982 -1991.8743 -2033.5744
-1789.3324 -1234.5678 -9876.5433 -9999.4321
And so on. Each file is named ALL.ene and every directory in my working directory contains one. Can someone give me a quick fix? Thanks!
Edit: Each file has 11 entries; those were just examples.

for i in */ALL.ene
do
    echo $(<"$i")
done > result.txt
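This works because $(<file) expands to the file's contents, and leaving that expansion unquoted lets word splitting collapse the newlines into single spaces. It assumes the values contain no glob characters (the numbers here are safe) and that IFS has its default value.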

Assumptions:
I assume all your data files are of this format:
<something1><newline>
<something2><newline>
<something3><newline>
So for example, if the last newline is missing, the following script will miss the field corresponding to <something3>.
Usage: ./merge.bash -o <output file> <input file list or glob>
The script appends to any existing output file from previous runs. It also makes no assumptions about how many lines of data each input file has: it blindly joins every line of an input file into one space-separated line of the output file.
#!/bin/bash
# set -o xtrace # uncomment to debug
declare output
[[ $1 == -o ]] && output="$2" && shift 2 || {
    echo "The first argument should always be -o <output>"
    exit 1
}
declare -a files=("$@") row
for file in "${files[@]}"; do
    while read -r data; do
        row+=("$data")
    done < "$file"
    echo "${row[@]}" >> "$output"
    row=()
done
Example:
$ cat data1
-1597.5421
-1909.6982
-1991.8743
-2033.5744
$ cat data2
-1789.3324
-1234.5678
-9876.5433
-9999.4321
$ ./merge.bash -o test data{1,2}
$ cat test
-1597.5421 -1909.6982 -1991.8743 -2033.5744
-1789.3324 -1234.5678 -9876.5433 -9999.4321

This is what coreutils paste is good at; try:
paste -s data_files*
Note that paste joins with tabs by default; pass -d ' ' to get the space-separated rows you asked for.
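Applied to the layout in the question, where every directory contains an ALL.ene, a one-liner might look like this (result.txt is just an assumed output name):
# -s serializes each input file onto its own output row;
# -d ' ' joins the fields with spaces instead of the default tabs
paste -s -d ' ' */ALL.ene > result.txt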

Related

How to eliminate duplicate records from a delimited file in Linux (myfile_I.out: application/octet-stream; charset=binary)

I'm trying to load data from a Linux file (which contains duplicates and was unloaded from a source table) into a table.
My file's properties:
$ file -bi myfile_I.out
application/octet-stream; charset=binary
Before loading the data into the table I need to delete the duplicates from the file.
My approach to deleting the duplicates:
1. Unload the data from the source table to a temp file (TempEX.out).
2. Run sort -u on TempEX.out to delete the duplicates and write the unique records to myfile_I.out.
3. Load the myfile_I.out data into the target_table.
I am facing an issue in step 2: I am unable to delete all the duplicates from the TempEX.out file.
#------------------------------------------------------------------#
#-- Delete the duplicates from TempEX.out and write the unique ----#
#-- data to myfile_I.out ------------------------------------------#
echo -e "Eliminate the duplicates from the ${FILE_PATH}/TempEX.out file" >> ${LOG}
sort -u ${FILE_PATH}/TempEX.out > ${DEST_PATH}/myfile_I.out
echo -e "Unique records successfully written into ${DEST_PATH}/myfile_I.out" >> ${LOG}
count=0
while read
do
((count=count+1))
done < ${DEST_PATH}/myfile_I.out
echo -e "Total no of unique records in ${DEST_PATH}/myfile_I.out: ${count}\n" >> ${LOG}
#------------------------------------------------------------------#
Actual Results:
Counts:
$ wc -l TempEX.out myfile_I.out
 196466 TempEX.out   --> # file contains duplicate records
 196460 myfile_I.out --> # unique records after my approach (sort -u)
 392926 total
I ran some sort/uniq checks to see which duplicates are present (here ^X denotes the literal Ctrl-X delimiter character).
Duplicate record count in the TempEX.out file:
$ cut -d'^X' -f1,6,10 TempEX.out|sort|uniq -d|wc -l
5
Duplicate record count in the myfile_I.out file:
$ cut -d'^X' -f1,6,10 myfile_I.out|sort|uniq -d|wc -l
1
These records (by primary_key) have duplicates in the TempEX.out file:
$ cut -d'^X' -f1,6,10 TempEX.out|sort|uniq -d|cat
701234567 412345678 19
701234568 412345677 18
709875641 412345859 17
701234569 425984031 21
701234570 409845216 20
These records (by primary_key) have duplicates in the myfile_I.out file:
$ cut -d'^X' -f1,6,10 myfile_I.out|sort|uniq -d|cat
709875641 412345859 17
Expected Results:
To eliminate the duplicates from the TempEX.out file and load the unique data into myfile_I.out.
sort -u TempEX.out > myfile_I.out /* doesn't resolve the issue */
Can we do something like this? (performed on the primary keys)
sort -u -f1,6,10 TempEX.out > myfile_I.out
Here is a little script that might help. It won't modify the original file with the new data, but will create a new file to load (I always prefer to keep the original in case of errors). It does its verification on the primary key, but in case of a duplicate primary key it will also ensure that the other columns are the same. The rationale is that, even if you don't mention it, there could be modifications of existing data or errors from the input system. Either way, the script sends those lines to a separate file for the user to review.
As noted in the comments, no field in any column should have a value containing blank spaces.
#!/bin/ksh
TIMESTAMP=$(date +"%Y%m%d%H%M")
#No sense to do anything if the files are not readable.
if [[ ! -r $1 || ! -r $2 ]]; then
print "ERROR - You must provide 2 parameters : 1 = path/filename of DB content 2 = path/filename of New Data"
exit
fi
#Declaring 2 associative arrays
typeset -A TableDB
typeset -A DataToAdd
#Opening the different files: 3 and 4 for reading, 5 and 6 for writing.
#File handlers :
# 3 for the data from the DB,
# 4 for the new data to add,
# 5 to write the new data to load (unique and new),
# 6 to write the data in problem (same primary key but with different values)
exec 3<$1
exec 4<$2
exec 5>Data2Load_${TIMESTAMP}.txt
exec 6>Data2Verify_${TIMESTAMP}.txt
#Loading the 2 arrays with their data.
#Here it is assumed that no field in any column contains blank spaces.
#Working with only 3 columns as in the example.
#The two files are read separately since they will usually differ in length.
while read -u3 a b c; do
TableDB[$a]=( $a $b $c )
done
while read -u4 d e f; do
DataToAdd[$d]=( $d $e $f )
done
#Checking for duplicates and writing only the new lines to load, diverting the lines with possible errors
for i in ${!DataToAdd[@]}; do
if [[ -z ${TableDB[$i]} ]]; then
print -u5 "${DataToAdd[$i][0]} ${DataToAdd[$i][1]} ${DataToAdd[$i][2]}"
elif [[ ${DataToAdd[$i][1]} != ${TableDB[$i][1]} || ${DataToAdd[$i][2]} != ${TableDB[$i][2]} ]]; then
print -u6 "${DataToAdd[$i][0]} ${DataToAdd[$i][1]} ${DataToAdd[$i][2]}"
fi
done
#closing the different files
exec 3>&-
exec 4>&-
exec 5>&-
exec 6>&-
Hope it helps!
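As an aside, and not part of the approach above: if uniqueness on fields 1, 6 and 10 is really all that's needed, an awk one-liner can do it. This is a sketch that assumes the delimiter is a literal Ctrl-X (byte 0x18):
# Keep only the first occurrence of each (field1, field6, field10) combination.
# $'\x18' is bash syntax for a literal Ctrl-X byte.
awk -F $'\x18' '!seen[$1 FS $6 FS $10]++' TempEX.out > myfile_I.out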

Set variable from content of files in bash(loop)

I'm trying to upload certificates (just created) to some storage.
I can read all the certificates in my folder, and I want to load the content of each of these files into a variable, one at a time, in a loop.
#!/bin/bash
dir="${0%/*}"
#for f in $(cat $dir"/"*.crt)
# do
# data='{"certificate_data":'"$f"'}'
#done
url="localhost:50183/api/v0.1/Certificates"
data='{"certificate_data":'$(cat $dir"/"*.crt)'}'
echo "$data"
So I get all the certificates at once, but I need $data to hold the content of one file per loop iteration, in the correct form, something like:
{"certificate_data":"<certificate_data_from_file>"}
{"certificate_data":"<certificate_data_from_file>"}
......
and so on.
I know that I should use another loop but don't know how.
I'd be grateful for any tips!
This should do the job:
#!/bin/bash
for f in ./dir/*.crt
do
data='{"certificate_data":"'"$(< "${f}")"'"}'
echo "${data}"
done
Test:
$ ls ./dir/*
./dir/cert1.crt ./dir/cert2.crt
$ cat ./dir/*
I am certificate1.
I am certificate2.
$ ./cert.sh
{"certificate_data":"I am certificate1."}
{"certificate_data":"I am certificate2."}

sed command issue with string replacement

I'm having a weird problem with the sed command.
I have a script that takes a C file, copies it X times, and then renames the functions inside each copy by numbering the names.
For example:
originalFile.c contains the functions check0, check1, check2
The script will generate these files:
originalFile1.c: check0 check1 check2
originalFile2.c: check3 check4 check5
originalFile3.c: check6 check7 check8
... and so on.
Now the problem: if I generate enough files that the numbers go up to 10, 20 or more, I notice something odd in the function names. The first function of the file is renamed incorrectly, but the others are correct. For example:
originalFileX.c: __check165__ check16 check17
...
originalFileZ.c: __check297__ __check298__ check29 -> in this file 2 names are incorrect.
Also, if I print the names with echo, everything is correct. Do you have any idea what could be wrong?
Here is my script (I run it under OSX):
#!/bin/bash
NUMCHECK=3
# $1: filename
# $2: number of function in the file
# $3: number of function I want to generate
# $4: function basename
function replace_name() {
FILE_NUM=$((($3+($2-1))/$2))
TMP=0
for (( i=1; i<$FILE_NUM+1; i++ ))
do
cp $1.mm test/$1$i.mm
for (( j=0; j<$2; j++ ))
do
OLDNAME="$4$j"
NEWNAME="$4$TMP"
echo $OLDNAME:$NEWNAME
sed -i "" "s/$OLDNAME/$NEWNAME/g" test/$1$i.mm
TMP=$(($TMP+1))
done
done
}
replace_name check $NUMCHECK 60 check
You're doing 3 runs of sed on each file. Just imagine the following:
sed -i s/check0/check150/g test/check51.mm
sed -i s/check1/check151/g test/check51.mm
sed -i s/check2/check152/g test/check51.mm
The first, s/check0/check150/g, changes check0 to check150 - OK.
But s/check1/check151/g then changes that check150 to check15150, because it finds the string check1 inside check150 too, left over from the previous step.
etc...
You need to define your regex more precisely. Since there is no example input here, I can't help more than that.
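One way to be more precise, sketched against the script above: anchor the pattern on word boundaries so that check1 can no longer match inside check150. BSD sed on OS X supports the [[:<:]] and [[:>:]] boundary brackets (GNU sed would use \b instead):
# Match OLDNAME only as a whole word, so "check1" no longer
# matches the "check1" prefix inside "check150"
sed -i "" "s/[[:<:]]${OLDNAME}[[:>:]]/${NEWNAME}/g" "test/$1$i.mm"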

bash: How to transfer/copy only the file names to separate similar files?

I've some files in a folder A which are named like that:
001_file.xyz
002_file.xyz
003_file.xyz
in a separate folder B I've files like this:
001_FILE_somerandomtext.zyx
002_FILE_somerandomtext.zyx
003_FILE_somerandomtext.zyx
Now I want to rename, ideally with just one command line in bash, all the files in folder B using the file names from folder A. The file extensions must stay different.
There is exactly the same number of files in folders A and B, and they are in the same order thanks to the numbering.
I'm a total noob, but I hope an easy answer to the problem will show up.
Thanks in advance!
ZVLKX
*Example edited for clarification
An implementation might look a bit like this:
renameFromDir() {
useNamesFromDir=$1
forFilesFromDir=$2
for f in "$forFilesFromDir"/*; do
# Put original extension in $f_ext
f_ext=${f##*.}
# Put number in $f_num
f_num=${f##*/}; f_num=${f_num%%_*}
# look for a file in directory B with same number
set -- "$useNamesFromDir"/"${f_num}"_*.*
[[ $1 && -e $1 ]] || {
echo "Could not find file number $f_num in $dirB" >&2
continue
}
(( $# > 1 )) && {
# there's more than one file with the same number; write an error
echo "Found more than one file with number $f_num in $dirB" >&2
printf ' - %q\n' "$#" >&2
continue
}
# extract the parts of our destination filename we want to keep
destName=${1##*/} # remove everything up to the last /
destName=${destName%.*} # and past the last .
# write the command we would run to stdout
printf '%q ' mv "$f" "$forFilesFromDir/$destName.$f_ext"; printf '\n'
## or uncomment this to actually run the command
# mv "$f" "$forFilesFromDir/$destName.$f_ext"
done
}
Now, how would we test this?
mkdir -p A B
touch A/00{1,2,3}_file.xyz B/00{1,2,3}_FILE_somerandomtext.zyx
renameFromDir A B
Given that, the output is:
mv B/001_FILE_somerandomtext.zyx B/001_file.zyx
mv B/002_FILE_somerandomtext.zyx B/002_file.zyx
mv B/003_FILE_somerandomtext.zyx B/003_file.zyx
Sorry if this isn't helpful, but I had fun writing it.
This renames items in folder B to the names in folder A, preserving the extension of B.
A_DIR="./A"
A_FILE_EXT=".xyz"
B_DIR="./B"
B_FILE_EXT=".zyx"
FILES_IN_A=`find $A_DIR -type f -name "*$A_FILE_EXT"`
FILES_IN_B=`find $B_DIR -type f -name "*$B_FILE_EXT"`
for A_FILE in $FILES_IN_A
do
A_BASE_FILE=`basename $A_FILE`
A_FILE_NUMBER=(${A_BASE_FILE//_/ })
A_FILE_WITHOUT_EXTENSION=(${A_BASE_FILE//./ })
for B_FILE in $FILES_IN_B
do
B_BASE_FILE=`basename $B_FILE`
B_FILE_NUMBER=(${B_BASE_FILE//_/ })
if [ ${A_FILE_NUMBER[0]} == ${B_FILE_NUMBER[0]} ]; then
mv $B_FILE $B_DIR/$A_FILE_WITHOUT_EXTENSION$B_FILE_EXT
break
fi
done
done
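A caveat on this second script: it relies on word splitting of the find output, so it assumes no file name contains spaces or glob characters. For names like those in the question it works; for anything less tame, the function-based answer above is the safer pattern.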

Twice Bash command substitution

I have directories a1..a5, b1..b5 and c1..c5. Inside each directory I have three files named a1, b1 and c1, created like this:
for d in 1 2 3 4 5; do mkdir /tmp/{a,b,c}$d; touch /tmp/{a,b,c}$d/{a,b,c}1; done
I want to get all the files starting with 'a' or 'b' inside the directories starting with an 'a'. I can do it with:
DIRS=`ls -1 -d /tmp/{a,b}*/a*`
echo ${DIRS}
and obtain:
/tmp/a1/a1 /tmp/a2/a1 /tmp/a3/a1 /tmp/a4/a1 /tmp/a5/a1
/tmp/b1/a1 /tmp/b2/a1 /tmp/b3/a1 /tmp/b4/a1 /tmp/b5/a1
Now, I will use a variable called DATA to store the directories and later get the files:
DATA="/tmp/{a,b}*"
echo ${DATA}
DIRS=`ls -1 -d ${DATA}/a*`
echo ${DIRS}
In the output, the content of DATA is OK (/tmp/{a,b}*), but I receive the following error:
ls: cannot access /tmp/{a,b}*/a*: No such file or directory
Any idea why this happens?
I solved the problem with eval, but I can't find any reference explaining why my previous attempt failed.
DATA="/tmp/{a,b}*"
echo ${DATA}
DIRS=`eval "ls -1 -d ${DATA}/a*"`
echo ${DIRS}
Output:
/tmp/a1/a1 /tmp/a2/a1 /tmp/a3/a1 /tmp/a4/a1 /tmp/a5/a1 /tmp/b1/a1
/tmp/b2/a1 /tmp/b3/a1 /tmp/b4/a1 /tmp/b5/a1
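The reason the first attempt fails: bash performs brace expansion before parameter expansion, so by the time ${DATA} is substituted into the ls command line, the {a,b} inside its value is just literal text; only pathname expansion runs after that, and no path contains a literal {a,b}. eval pushes the assembled line through the parser a second time, which is when the braces finally get expanded. As a sketch of an eval-free alternative, let the braces and globs expand at assignment time and keep the results in an array:
# Brace, parameter and pathname expansion all happen here, at
# assignment time, so FILES holds the already-expanded names
FILES=(/tmp/{a,b}*/a*)
printf '%s\n' "${FILES[@]}"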
