Select and zip files according to CSV input - linux

I'm trying to read each column inside a CSV file, and then search a directory using the data from those columns. Once all rows have been read, I'd like to create a new directory, copy the matched files into it, and finally zip the directory. Any ideas on how to do this?
Here's an example of the CSV file:
add an example here...

This is a draft of the script based on the information you gave:
destdir="/path/to/destination/directory"
filelist=""
while IFS=, read -r col1 col2 col3 col4   # add further column names as needed
do
    # check if the file should be included in filelist
    if [ ... ]; then    # put your real inclusion test here
        filelist+="$col1/$col2/$col3.$col4 "
    fi
done < file.csv

# once every row has been read, copy all the selected files in one go
if [ -n "$filelist" ]; then
    mkdir -p "$destdir"
    for goodfile in $filelist; do
        cp "$goodfile" "$destdir"
    done
fi
To make things clearer, I assumed the columns in the CSV file are just the parts of the path and name of the files you're interested in.
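The draft stops at the copy step; to produce the zip archive you asked for, one final line along these lines should do (assuming the zip utility is installed; the archive name selected_files.zip is just an example):

zip -r selected_files.zip "$destdir"   # archive the whole destination directory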
Hope this helps. :)

I don't know whether you would like to process every column in a different way, but here is a sketch of a script in case it does not matter.
separator=";"                 # separator used in your CSV file
csv_file="$(pwd)/file.csv"    # path to your CSV file
search_dir="$(pwd)/searchdir" # directory in which we search for the files named in the CSV
copy_to_dir="$(pwd)/newdir"   # directory where the copied files are collected
mkdir -p "$copy_to_dir"       # make the collection directory
# read all the entries in the CSV file, assuming each entry is the name
# of a file to be found in the search directory and copied
for i in $(tr "$separator" '\n' < "$csv_file"); do
    find "$search_dir" -name "$i" | xargs -I {} cp {} "$copy_to_dir"
done
zip -j -r copied_files.zip "$copy_to_dir" # zip the found contents
rm -r "$copy_to_dir"                      # if you need just the zip file
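If the entries in the CSV can contain spaces, the unquoted for loop above will split them into pieces; a while-read variant of the same idea (a sketch under the same assumptions) avoids that:

tr "$separator" '\n' < "$csv_file" | while IFS= read -r name; do
    [ -n "$name" ] && find "$search_dir" -name "$name" -exec cp {} "$copy_to_dir" \;
done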

Related

How to add a column at the end of multiple csv files using shell script

I have a couple of thousand CSV files. All of them have the same structure and header. I would like to add a column at the end of each file. I found several solutions that add a column and a value for that column, but I didn't find anything that also adds the header for the new column. For example, I have files like 1001.csv, 1002.csv, 1003.csv, and so on.
Contents of 1001.csv
ID,URL
1,one.com
2,two.com
I want to modify it like this
ID,URL,FILE
1,one.com,1001
2,two.com,1001
Since I have tons of files like this, I don't want to mess up the data while adding a column. Also, I don't want to produce extra files if an in-place update is possible.
I tested this on a huge number of files and it worked really fast. This code removes the header first, then adds a column plus a value for that column, and finally brings the header back.
#!/bin/bash
# How to run: $ ./this-script.sh inputdir/
# where inputdir contains all the CSV files
# the input argument is the directory name
DIRNAME=$(basename "$1")
# go to the target directory
cd "$DIRNAME" || exit 1
# loop over all CSV files
for FILENAME in *.csv
do
    echo "$FILENAME"
    # file name without extension, used as the value of the new column
    CODE="${FILENAME%.*}"
    echo "$CODE"
    ## remove the header
    tail -n +2 "$FILENAME" > "$FILENAME.tmp" && mv "$FILENAME.tmp" "$FILENAME"
    ## add the new field at the end of every line
    sed "s/$/,$CODE/" "$FILENAME" > "$FILENAME.tmp2"
    ## add the header back with the new column name
    # keep FILENAME.tmp2.bak as a backup for safety
    sed -i.bak "1i ID,URL,FILE" "$FILENAME.tmp2"
    # if all went well, remove the temporary files
    rm "$FILENAME"
    rm "$FILENAME.tmp2.bak"
    # rename the output file to the original name
    mv "$FILENAME.tmp2" "$FILENAME"
done
# go back to the parent directory
cd ..
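For what it's worth, the same transformation can also be sketched as a single pass with GNU awk's in-place editing (this assumes gawk 4.1+ for -i inplace, and is untested against your data):

# append each file's base name as a new last column, extending the header too
gawk -i inplace -v OFS=',' '
    FNR == 1 { print $0, "FILE"; next }          # extend the header row
    { code = FILENAME; sub(/\.csv$/, "", code)   # file name without extension
      print $0, code }                           # append it to every data row
' *.csv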

Copy numbered files to corresponding numbered directory using Linux bash commands or script

This should be a relatively straightforward problem, but I haven't found any answers on Stack Overflow. In a given directory, I have ~1000 files that are numbered (e.g. chem-0320.inp). I would like to cp each numbered file to a correspondingly numbered directory; every copied file gets the same new name. I would like to do this for a specified range of file numbers (#'s 300-500, for example).
For example, I would like to copy chem-0320.inp to a directory named 320 and rename it mech.dat.
Another example: copy chem-0430.inp to a directory named 430 and rename it mech.dat.
Thanks in advance for your help!
The following script would do the work for you:
for file in *.inp
do
    # extract the directory number, dropping the leading zero (chem-0320.inp -> 320)
    dir=$(echo "$file" | sed -r 's/[^0-9]+0([0-9]+).*/\1/g')
    mkdir -p "$dir"
    cp "$file" "$dir"/mech.dat
done
"cd" first to right dir. Subdirs will be created there.
#!/bin/bash
lo_limit=300
hi_limit=500
for file in ./*.inp
do
    dir="${file//[^0-9]/}"   # keep only the digits of the file name
    dir_cut="${dir:1:3}"     # leading zero cut off
    if [ "$dir_cut" -ge "$lo_limit" ] && [ "$dir_cut" -le "$hi_limit" ]; then
        echo "$file $dir_cut"
        mkdir -p "$dir_cut"
        cp "$file" "$dir_cut"/mech.dat
    fi
done
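To see what the two parameter expansions do, here is the question's own example file name run through them:

file="./chem-0320.inp"
dir="${file//[^0-9]/}"    # "0320" - every non-digit character removed
dir_cut="${dir:1:3}"      # "320"  - leading zero cut off
echo "$dir_cut"           # prints: 320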

Copy text from multiple files, same names to different path in bash (linux)

I need help copying content from various files to others (same name and format, different path).
For example, $HOME/initial/baby.desktop has text which I need to write into $HOME/scripts/baby.desktop. This is very simple for a single file, but I have 2500 files in $HOME/initial/ and the same number in $HOME/scripts/ with corresponding names (same names and format). I want to append (copy) the content of each file in path A to the end of the file with the same name in path B, without erasing the existing content of the file in path B.
So, for example, from the content of $HOME/initial/*.desktop to the end of $HOME/scripts/*.desktop. I tried the following, but it doesn't work:
cd $HOME/initial/
for i in $( ls *.desktop ); do egrep "Icon" $i >> $HOME/scripts/$i; done
Firstly, I would back up $HOME/initial and $HOME/scripts, because there is lots of scope for people misunderstanding your question. Like this:
cd $HOME
tar -cvf initial.tar initial
tar -cvf scripts.tar scripts
That will put all the files in $HOME/initial into a single tarfile called initial.tar and all the files in $HOME/scripts into a single tarfile called scripts.tar.
Now for your question... in general, if you want to put the contents of FileB onto the end of FileA, the command is
cat FileB >> FileA
Note the DOUBLE ">>" which means "append" rather than single ">" which means overwrite.
So, I think you want to do this:
cd "$HOME/initial"
cat SomeFile >> "$HOME/scripts/SomeFile"
where SomeFile is the name of any file you choose to test with. I would test that has worked and then, if you are happy with that, go ahead and run the same command inside a loop:
cd "$HOME/initial"
for SOURCE in *.desktop
do
    DESTINATION="$HOME/scripts/$SOURCE"
    echo Appending "$SOURCE" to "$DESTINATION"
    #cat "$SOURCE" >> "$DESTINATION"
done
When the output looks correct, remove the "#" at the start of the penultimate line and run it again.
I solved it; for anyone who wants to learn how, it is very simple:
Using sed
I only needed the matching (pattern) line, e.g. "Icon=/usr/share/some_picture.png", from each $HOME/initial/example.desktop appended to the file with the same name and format, $HOME/scripts/example.desktop; and I had a lot of .desktop files (2500 of them).
cd "$HOME/initial"
STRING_LINE=$(grep -l -R "Icon=" *.desktop)
for i in $STRING_LINE; do sed -ne '/Icon=/p' "$i" >> "$HOME/scripts/$i"; done
_________
If you just need to copy each whole file to the other file with the same name and format:
Using cat
cd "$HOME/initial"
STRING_LINE=$(grep -l -R "Icon=" *.desktop)
for i in $STRING_LINE; do cat "$i" >> "$HOME/scripts/$i"; done
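For the whole-file case, here is a sketch of the same loop with quoting that survives unusual file names (assuming bash, and that every file in $HOME/initial has a counterpart in $HOME/scripts):

cd "$HOME/initial" || exit 1
for f in *.desktop; do
    cat "$f" >> "$HOME/scripts/$f"   # append each source file to its same-named target
done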

Split files according to a field and save in subdirectory created using the root name

I am having trouble with several bits of code. I am no expert in Bash programming, unfortunately, so I have tried unsuccessfully all day to find something that works for my task, and I was hoping you could help guide me in the right direction.
I have many large files that I would like to split according to the third field within each of them; I would like to keep the header in each of the sub-files, and save the created sub-files in new directories created from the root names of the files.
The initial files stored in the original directory are:
Downloads/directory1/Levels_CHG_Lab_S_sample1.txt
Downloads/directory1/Levels_CHG_Lab_S_sample2.txt
Downloads/directory1/Levels_CHG_Lab_S_sample3.txt
and so on..
Each of these files have 200 columns, and column 3 contains values from 1 through 10.
I would like to split each of the files above based on the value of this column, and store the subfiles in subfolders, so for example sub-folder "Downloads/directory1/sample1" will contain 10 files (with the header line) derived by splitting the file Downloads/directory1/Levels_CHG_Lab_S_sample1.txt.
I have tried many different things for these steps, with no success. I must be making this more complicated than it is, since the code I have tried looks awful…
Here is the code I am trying to work from:
FILES=Downloads/directory1/
for f in $FILES
do
# Create folder with root name by stripping file names
fname=${echo $f | sed 's/.txt//;s/Levels_CHG_Lab_S_//'}
echo "Creating sub-directory [$fname]"
mkdir "$fname"
# Save the header
awk 'NR==1{print $0}' $f > header
# Split each file by third column
echo "Splitting file $f"
awk 'NR>1 {print $0 > $3".txt" }' $f
# Move newly created files in sub directory
mv {1..10}.txt $fname # I have no idea how to specify the files just created
# Loop through the sub-files to attach header row:
for subfile in $fname
do
cat header $subfile >> tmp_file
mv -f tmp_file $subfile
done
done
All these steps seem very complicated to me; I would very much appreciate it if you could help me solve this in the right way. Thank you very much for your help.
-fra
You have a few problems with your code right now. First of all, at no point do you list the contents of your downloads directory. You are simply setting the FILES variable to a string that is the path to that directory. You would need something like:
FILES=$(ls Downloads/directory1/*.txt)
You also never cd to the Downloads/directory1 folder, so your mkdir would create directories in the current working directory; probably not what you want.
If you know that the numbers in column 3 always range from 1 to 10, I would just pre-populate those files with the header line before you split the file.
Try this code to do what you want (untested):
BASEDIR=Downloads/directory1
FILES=$(ls ${BASEDIR}/*.txt)
for f in $FILES; do
    # Create folder with root name by stripping the file name
    # ($f already carries the Downloads/directory1/ prefix, so no
    # further prefix is needed here)
    dirname=$(echo "$f" | sed 's/\.txt//;s/Levels_CHG_Lab_S_//')
    echo "Creating sub-directory [$dirname]"
    mkdir -p "$dirname"
    # Save the header to each file
    HEADER_LINE=$(head -n1 "$f")
    for i in {1..10}; do
        echo "${HEADER_LINE}" > "${dirname}/${i}.txt"
    done
    # Split each file by third column
    echo "Splitting file $f"
    awk -v dirname="${dirname}" 'NR>1 {filename = dirname "/" $3 ".txt"; print $0 >> filename}' "$f"
done
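As a variant, the header pre-population can be folded into awk itself, writing the header the first time each value of column 3 is seen (a sketch for a single input file; the sub-directory must already exist):

awk -v dir="Downloads/directory1/sample1" '
    NR == 1 { header = $0; next }                       # remember the header line
    { out = dir "/" $3 ".txt"
      if (!(out in seen)) { print header > out; seen[out] = 1 }
      print $0 > out }                                  # the file stays open, so this appends
' Downloads/directory1/Levels_CHG_Lab_S_sample1.txt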

copy all unique files in a directory based on hashes

file=$3
#Using $3 as I am using 1 & 2 in the rest of the script[that works]
file_hash=md5sum "$file" | cut -d ' ' -f l
#generates hashes for file
for a in /path/to/source/* #loop for all files in directory
do
if [ "$file_hash" == $(md5sum "$a" | cut -d ' ' -f l) ]:
#if the file hash is equal to the hash generated then file is copied to path/to/source
then cp "file" /path/to/source/*
else cp "$file" "file.JPG" mv "file.JPG" /path/to/source/$file #otherwise the file renamed as file.JPG so it is not overwritten
fi
done
Can anyone help me with this code?
I'm trying to write a Bash script which will generate hashes for all the files within a directory; if two files have duplicate hashes, then only one of them is copied to the destination directory. Can anyone see where I am going wrong here?
I have to use md5sum, so no sha1sum, fdupes, or anything like that, unfortunately.
Assuming it doesn't matter which of the unique files is copied, a simple way would be to use bash's support for associative arrays:
declare -A files
while read hash name
do
files[$hash]=$name
done < <(md5sum /path/to/source/*)
cp "${files[#]}" /path/to/dest
Any file with an identical hash will simply overwrite the record of the previous one, leaving you with only unique files in the array.
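The same idea also works copying as you go, which makes it easy to report what was skipped (a sketch, still using only md5sum as required; /path/to/dest is assumed to exist):

declare -A seen
for f in /path/to/source/*; do
    h=$(md5sum "$f" | cut -d ' ' -f 1)        # hash of this file
    if [[ -z "${seen[$h]}" ]]; then
        seen[$h]=$f
        cp "$f" /path/to/dest/                # first file with this hash: copy it
    else
        echo "skipping duplicate: $f (same as ${seen[$h]})"
    fi
done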
