How to add a column at the end of multiple csv files using shell script

I have a couple of thousand CSV files. All of them have the same structure and header. I would like to add a column at the end of each file. I found several solutions that add a column and a value to that column, but none that also adds a header for the new column. For example, I have files like 1001.csv, 1002.csv, 1003.csv and so on.
Contents of 1001.csv
ID,URL
1,one.com
2,two.com
I want to modify it like this
ID,URL,FILE
1,one.com,1001
2,two.com,1001
Since I have tons of files like this, I don't want to mess up the data while adding a column. Also, I don't want to produce extra files if an in-place update is possible.

I tested this on a huge number of files and it worked really fast. The script first removes the header, then appends the new column and its value to every line, and finally puts the header back with the new column name.
#!/bin/bash
# How to run: $ ./this-script.sh inputdir/
# here inputdir contains all csv files
# input argument is the directory name
# go to target directory (cd "$1" also works for paths like data/inputdir/,
# which basename would have reduced to just inputdir)
cd "$1" || exit 1
# loop over all csv files with a glob instead of parsing ls output
for FILENAME in *.csv
do
    echo "$FILENAME"
    # filename without extension, e.g. 1001.csv -> 1001
    CODE="${FILENAME%.*}"
    echo "$CODE"
    ## remove header
    tail -n +2 "$FILENAME" > "$FILENAME.tmp" && mv "$FILENAME.tmp" "$FILENAME"
    ## add new field at the end of every line
    sed "s/$/,$CODE/" "$FILENAME" > "$FILENAME.tmp2"
    ## add header with new column name (the one-line "1i text" form is GNU sed syntax)
    # -i.bak keeps a backup copy until the steps below succeed
    sed -i.bak '1i ID,URL,FILE' "$FILENAME.tmp2"
    # if all good then remove temp files
    rm "$FILENAME"
    rm "$FILENAME.tmp2.bak"
    # rename output file to original name
    mv "$FILENAME.tmp2" "$FILENAME"
done
# go back to the directory we started from
cd - > /dev/null
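The same result can be had with one rewrite per file instead of four. The sketch below uses a hypothetical helper, `add_file_column` (not part of the answer above): awk appends ",FILE" to the header and ",<file name without .csv>" to every other line, writes to a temporary file, and moves it over the original only if awk succeeded. It assumes plain comma-separated files with a single header line and no quoted fields that contain newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a FILE column to every *.csv in a directory.
# Assumes one header line and plain comma-separated rows.
add_file_column() {
    local dir=$1 f code
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue                 # no .csv files: glob stays literal
        code=$(basename "$f" .csv)              # 1001.csv -> 1001
        # header gets ",FILE"; every data line gets ",<code>"
        if awk -v code="$code" 'NR == 1 { print $0 ",FILE"; next } { print $0 "," code }' \
            "$f" > "$f.tmp"; then
            mv -- "$f.tmp" "$f"                 # replace the original only on success
        else
            rm -f -- "$f.tmp"
            return 1
        fi
    done
}

# Run directly: ./add-file-column.sh inputdir/
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -gt 0 ]; then
    add_file_column "$1"
fi
```

Running it twice adds a second FILE column, so try it once on a copy of the directory first.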

Related

How to identify modified file names in a directory?

I have the following four files in a directory:
TRAILBLAZER_107-10016_FTP_SCR_CT_CTAC
TRAILBLAZER_107-10016_FTP_SCR_CT_Recon
TRAILBLAZER_107-10016_FTP_SCR_PET_NAC
TRAILBLAZER_107-10016_FTP_SCR_PET_AC_Frames
I've written a simple for loop that goes through each of those files and changes its name based on a certain keyword in the name. In practice it only renames the bottom two files:
for file in TRAILBLAZER*
do
mv "$file" "${file/PET_AC/PET_TESTAC}"
mv "$file" "${file/PET_NAC/TESTNAC}"
done
How would I be able to echo the number of files that have been altered by that for loop, and the number of files in the directory that remain unchanged?

Parameter expansion (with substring replacement) does not tell you whether a replacement happened, so you have to check that yourself. Save the filename before you modify it and check whether the file still exists afterwards. If it doesn't, a change was made, so increment a counter. Every file that is not counted as changed is unchanged.
You could do something like this:
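changed=0 ## counter for renamed files
total=0 ## counter for all files seen
for file in TRAILBLAZER*
do
    fname="$file" ## save original file name
    ((total++))
    ## mv prints an error when a pattern does not match (same source and destination)
    mv "$file" "${file/PET_AC/PET_TESTAC}"
    mv "$file" "${file/PET_NAC/TESTNAC}"
    [ -f "$fname" ] || ((changed++)) ## original still exists, or increment counter
done
printf "files changed: %s\nfiles unchanged: %s\n" "$changed" "$((total - changed))"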
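A variation that avoids the mv error messages: compute the new name first and only call mv when it actually differs, counting as you go. This is a sketch with a hypothetical function name, `rename_and_count`; the two substitutions are the ones from the question.

```shell
#!/usr/bin/env bash
# Hypothetical function: rename TRAILBLAZER* files and report how many changed.
# Computing the new name first means mv is never called with identical names.
rename_and_count() {
    local file new changed=0 unchanged=0
    for file in TRAILBLAZER*; do
        [ -e "$file" ] || continue          # no match: glob stays literal
        new=${file/PET_AC/PET_TESTAC}       # apply both substitutions to the name
        new=${new/PET_NAC/TESTNAC}
        if [ "$new" != "$file" ]; then
            mv -- "$file" "$new" && changed=$((changed + 1))
        else
            unchanged=$((unchanged + 1))
        fi
    done
    printf 'files changed: %d\nfiles unchanged: %d\n' "$changed" "$unchanged"
}
```

With the four files from the question in the current directory, calling `rename_and_count` should report two changed and two unchanged files.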

Bash Script to replicate files

I have 25 files in a directory. I need to amass 25000 files for testing purposes. I thought I could just replicate these files over and over until I get 25000 files. I could copy and paste them manually 1000 times, but that seemed tedious, so I thought I could write a script to do it for me. I tried
cp * .
as a trial, but I got an error saying that the source and destination files are the same. If I were to automate this, how would I make sure that each of the 1000 rounds creates files with unique names?

As discussed in the comments, you can do something like this:
for file in *
do
    filename="${file%.*}"   # get everything up to the last dot
    extension="${file##*.}" # get extension (text after the last dot)
    for i in {0001..1000}
    do
        cp "$file" "${filename}${i}.${extension}"
    done
done
The brace expansion {0001..1000} loops from 1 to 1000 with the numbers zero-padded to four digits (zero padding in brace expansion needs bash 4 or later). With 25 files, 1000 copies each gives the 25000 files you need.
${filename}${i}.${extension} is the same as $filename$i.$extension, but makes it clearer which part is a variable name and which is literal text. This matters as soon as you add characters that are valid in variable names: $filename_$i would be read as the variable filename_, whereas ${filename}_${i}.${extension} gives files like a_0023.txt.
If your current files match a specific pattern, you can also use for file in a* (if they all follow the a + something format).

If you want to keep the extension of the files, you can use this. Assuming you want to copy all .txt files:
#!/bin/bash
for f in *.txt
do
    for i in {1..1000}
    do
        cp "$f" "${f%.*}_${i}.${f##*.}"
    done
done

You could try this:
for file in *; do for i in {1..1000}; do cp "$file" "$file-$i"; done; done
It appends a number to the name of each existing file, so a.txt becomes a.txt-1 through a.txt-1000.

The following script
for file in *.*
do
    base="${file%.*}" # name up to the last dot
    ext="${file##*.}" # extension after the last dot
    for n in {1..1000}
    do
        echo cp "$file" "$base-$n.$ext"
    done
done
will:
take all files that have an extension (*.*)
split each name into the base name and the extension (parameter expansion, which is safe for names containing quotes or $, unlike eval)
copy the original file 1000 times to base-number.extension
it is a DRY RUN: remove the echo if you are satisfied with the output
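Combining the ideas from these answers, here is a sketch of a hypothetical function, `replicate_files`, that makes N numbered copies of every regular file in the current directory. It keeps extensions, handles names without a dot, and zero-pads with printf, which also works on bash 3 where zero-padded brace expansion is not available. The `name_0001.ext` format is an assumption; change the printf format to suit.

```shell
#!/usr/bin/env bash
# Hypothetical function: make N numbered copies of each regular file in the
# current directory, e.g. a.txt -> a_0001.txt ... ; README -> README_0001 ...
replicate_files() {
    local n=$1 file base ext i
    local files=()
    for file in *; do
        [ -f "$file" ] && files+=("$file")   # keep only regular files (skips directories)
    done
    for file in "${files[@]}"; do
        if [[ $file == *.* ]]; then
            base=${file%.*} ext=.${file##*.}   # split at the last dot
        else
            base=$file ext=                    # no extension to keep
        fi
        for ((i = 1; i <= n; i++)); do
            cp -- "$file" "$(printf '%s_%04d%s' "$base" "$i" "$ext")"
        done
    done
}
```

For the question's case, `replicate_files 1000` in the directory with the 25 originals should yield 25000 copies plus the originals.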

Copy text from multiple files, same names to different path in bash (linux)

I need help copying content from various files to others (same name and format, different path).
For example, $HOME/initial/baby.desktop has text which I need to write into $HOME/scripts/baby.desktop. This is very simple for a single file, but I have 2500 files in $HOME/initial/ and the same number in $HOME/scripts/ with corresponding names (same names and format). I want to append the content of each file in path A to the end of the file with the same name in path B, without erasing the existing content of the file in path B.
For example, the content of $HOME/initial/*.desktop should end up at the end of $HOME/scripts/*.desktop. I tried the following, but it doesn't work:
cd $HOME/initial/
for i in $( ls *.desktop ); do egrep "Icon" $i >> $HOME/scripts/$i; done

Firstly, I would back up $HOME/initial and $HOME/scripts, because there is a lot of scope for people misunderstanding your question. Like this:
cd $HOME
tar -cvf initial.tar initial
tar -cvf scripts.tar scripts
That will put all the files in $HOME/initial into a single tarfile called initial.tar and all the files in $HOME/scripts into a single tarfile called scripts.tar.
Now for your question... in general, if you want to put the contents of FileB onto the end of FileA, the command is
cat FileB >> FileA
Note the DOUBLE ">>" which means "append" rather than single ">" which means overwrite.
So, I think you want to do this:
cd "$HOME/initial"
cat SomeFile >> "$HOME/scripts/SomeFile"
where SomeFile is the name of any file you choose to test with (baby.desktop, for example; it is a file, not a directory). I would test that has worked and then, if you are happy with that, go ahead and run the same command inside a loop:
cd "$HOME/initial"
for SOURCE in *.desktop
do
DESTINATION="$HOME/scripts/$SOURCE"
echo Appending "$SOURCE" to "$DESTINATION"
#cat "$SOURCE" >> "$DESTINATION"
done
When the output looks correct, remove the "#" at the start of the penultimate line and run it again.

I solved it. If anyone wants to learn how, it is very simple:
using sed
I needed only the matching line "Icon=/usr/share/some_picture.png" from $HOME/initial/example.desktop, appended to the file with the same name and format, $HOME/scripts/example.desktop, but I had a lot of .desktop files (2500 files)
cd "$HOME/initial"
STRING_LINE=$(grep -l "Icon=" *.desktop)
for i in $STRING_LINE; do sed -n '/Icon=/p' "$i" >> "$HOME/scripts/$i"; done
_________
If you only need to copy the whole file to the file with the same name and format
using cat
cd "$HOME/initial"
for i in *.desktop; do cat "$i" >> "$HOME/scripts/$i"; done
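The sed approach can be wrapped in a reusable function. This is a sketch with a hypothetical name, `append_icon_lines`, and two assumptions beyond the original: it matches `Icon=` only at the start of a line (where desktop entry keys appear), and it skips a source file when no destination file with the same name exists instead of creating one.

```shell
#!/usr/bin/env bash
# Hypothetical function: for every *.desktop file in SRC, append its Icon= lines
# to the file with the same name in DEST. Missing destinations are skipped.
append_icon_lines() {
    local src=$1 dest=$2 f name
    for f in "$src"/*.desktop; do
        [ -e "$f" ] || continue                  # glob matched nothing
        name=${f##*/}                            # strip the directory part
        [ -f "$dest/$name" ] || { echo "skip: $name" >&2; continue; }
        # grep exits 1 when a file has no Icon= line; that is not an error here
        grep -e '^Icon=' -- "$f" >> "$dest/$name" || true
    done
}
```

Usage for the question's layout would be `append_icon_lines "$HOME/initial" "$HOME/scripts"`, ideally after taking the tar backups suggested above.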

Split files according to a field and save in subdirectory created using the root name

I am having trouble with several bits of code. Unfortunately I am no expert in Linux Bash programming, so I have spent all day trying, unsuccessfully, to find something that works for my task, and I was hoping you could guide me in the right direction.
I have many large files that I would like to split according to the third field within each of them, I would like to keep the header in each of the sub-files, and save the created sub-files in new directories created from the root names of the files.
The initial files stored in the original directory are:
Downloads/directory1/Levels_CHG_Lab_S_sample1.txt
Downloads/directory1/Levels_CHG_Lab_S_sample2.txt
Downloads/directory1/Levels_CHG_Lab_S_sample3.txt
and so on.
Each of these files has 200 columns, and column 3 contains values from 1 through 10.
I would like to split each of the files above based on the value of this column, and store the subfiles in subfolders, so for example sub-folder "Downloads/directory1/sample1" will contain 10 files (with the header line) derived by splitting the file Downloads/directory1/Levels_CHG_Lab_S_sample1.txt.
I have now tried many different approaches for these steps, with no success. I must be making this more complicated than it is, since the code I have tried looks awful.
Here is the code I am trying to work from:
FILES=Downloads/directory1/
for f in $FILES
do
# Create folder with root name by stripping file names
fname=${echo $f | sed 's/.txt//;s/Levels_CHG_Lab_S_//'}
echo "Creating sub-directory [$fname]"
mkdir "$fname"
# Save the header
awk 'NR==1{print $0}' $f > header
# Split each file by third column
echo "Splitting file $f"
awk 'NR>1 {print $0 > $3".txt" }' $f
# Move newly created files in sub directory
mv {1..10}.txt $fname # I have no idea how to do specify the files just created
# Loop through the sub-files to attach header row:
for subfile in $fname
do
cat header $subfile >> tmp_file
mv -f tmp_file $subfile
done
done
All these steps seem very complicated to me, I would very much appreciate if you could help me solve this in the right way. Thank you very much for your help.
-fra

You have a few problems with your code right now. First of all, at no point do you list the contents of your downloads directory. You are simply setting the FILES variable to a string that is the path to that directory. You would need something like:
FILES=$(ls Downloads/directory1/*.txt)
You also never cd to the Downloads/directory1 folder, so your mkdir would create directories in cwd; probably not what you want.
If you know that the numbers in column 3 always range from 1 to 10, I would just pre-populate those files with the header line before you split the file.
Try this code to do what you want (untested):
BASEDIR=Downloads/directory1
for f in "$BASEDIR"/*.txt; do
    # Create folder with root name by stripping directory, prefix and extension:
    # Downloads/directory1/Levels_CHG_Lab_S_sample1.txt -> Downloads/directory1/sample1
    name=$(basename "$f" .txt)
    name=${name#Levels_CHG_Lab_S_}
    outdir="${BASEDIR}/${name}"
    echo "Creating sub-directory [$outdir]"
    mkdir -p "$outdir"
    # Save the header to each file (quoted so tabs in the header are preserved)
    HEADER_LINE=$(head -n1 "$f")
    for i in {1..10}; do
        printf '%s\n' "$HEADER_LINE" > "${outdir}/${i}.txt"
    done
    # Split each file by third column, appending below the header
    echo "Splitting file $f"
    awk -v outdir="$outdir" 'NR>1 {filename=outdir "/" $3 ".txt"; print $0 >> filename }' "$f"
done
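If column 3 is not guaranteed to contain only 1 to 10, awk can write the header itself the first time it sees a new value. A sketch, using a hypothetical function `split_by_col3`; it relies on awk's `>` redirection truncating a file only when the program first opens it, so later prints to the same file append. It assumes whitespace- or tab-separated columns (awk's default field splitting).

```shell
#!/usr/bin/env bash
# Hypothetical function: split FILE by the value in column 3 into
# OUTDIR/<value>.txt, copying the header line into each output file.
split_by_col3() {
    local file=$1 outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v outdir="$outdir" '
        NR == 1 { header = $0; next }           # remember the header line
        {
            out = outdir "/" $3 ".txt"
            if (!(out in seen)) {               # first row with this value
                seen[out] = 1
                print header > out              # ">" truncates on first open only
            }
            print $0 > out
        }' "$file"
}
```

To process the whole directory, loop as in the answer above: for f in Downloads/directory1/Levels_CHG_Lab_S_*.txt; do name=$(basename "$f" .txt); split_by_col3 "$f" "Downloads/directory1/${name#Levels_CHG_Lab_S_}"; done. With a very large number of distinct values, some awk implementations hit the open-file limit; calling close(out) after each print and switching to ">>" avoids that at some speed cost.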

Select and zip files according to CSV input

I'm trying to read each column inside a csv file, and then search a directory using the data from the columns on the csv. Once all rows have been read, I'd like to create a new directory, copy the files to that directory, and finally zip the directory. Any ideas on how to do it?
Here's an example of the CSV file:
add an example here...

This is a draft of the script based on the information you gave:
destdir="/path/to/destination/directory"
filelist=()
while IFS=, read -r col1 col2 col3 col4 ...
do
    # check if the file should be included in filelist
    if [ ... ]; then
        filelist+=("$col1/$col2/$col3.$col4")
    fi
done < file.csv
# copy once, after all rows have been read
if [ ${#filelist[@]} -gt 0 ]; then
    mkdir -p "$destdir"
    for goodfile in "${filelist[@]}"; do
        cp "$goodfile" "$destdir"
    done
fi
To keep things simple, I assumed the columns in the CSV file are just the parts of the path and name of the files you're interested in.
Hope this helps. :)

I don't know if you want to process every column differently, but if that does not matter, here is a rough sketch of a script:
separator=";" # select the separator used in your csv file
csv_file="$(pwd)/file.csv" # path to your csv file
search_dir="$(pwd)/searchdir" # directory where we search for the files listed in the csv file
copy_to_dir="$(pwd)/newdir" # directory where the copied files are saved
mkdir -p "$copy_to_dir" # make the copy directory
# read all entries in the csv file, one per line after splitting on the separator
tr "$separator" '\n' < "$csv_file" | while IFS= read -r i; do
    [ -n "$i" ] || continue # skip empty entries
    # assuming that all entries in the csv are files to be found in a dir and copied
    find "$search_dir" -name "$i" -exec cp {} "$copy_to_dir" \;
done
zip -j -r copied_files.zip "$copy_to_dir" # zip the found files
rm -r "$copy_to_dir" # if you need just the zip file (rmdir fails on a non-empty directory)
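Since the question has no CSV example yet, here is one more sketch under explicit assumptions: a hypothetical function `collect_and_zip` that reads file names from the first column of a comma-separated file with one header line, copies every match found under a search directory into a destination directory, and then zips it. If `zip` is not installed, it leaves the copied files in place and says so.

```shell
#!/usr/bin/env bash
# Hypothetical function: copy files named in column 1 of CSV (header skipped)
# from anywhere under SEARCH_DIR into DEST_DIR, then zip DEST_DIR into ZIPFILE.
# Assumptions: comma separator, one header line, names contain no commas.
collect_and_zip() {
    local csv=$1 search_dir=$2 dest_dir=$3 zipfile=$4 name rest
    mkdir -p "$dest_dir" || return 1
    tail -n +2 "$csv" | while IFS=, read -r name rest; do
        name=${name%$'\r'}                       # tolerate CRLF line endings
        [ -n "$name" ] || continue
        find "$search_dir" -type f -name "$name" -exec cp -- {} "$dest_dir" \;
    done
    if command -v zip >/dev/null 2>&1; then
        zip -j -q -r "$zipfile" "$dest_dir"      # -j: store files without their paths
    else
        echo "zip not found; files left in $dest_dir" >&2
    fi
}
```

Because -name takes a pattern, an entry such as *.log in the CSV would copy every matching file; quote or validate the entries if that is not wanted.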
