Select and zip files according to CSV input - linux

I'm trying to read each column inside a CSV file, and then search a directory using the data from those columns. Once all rows have been read, I'd like to create a new directory, copy the matched files into it, and finally zip the directory. Any ideas on how to do this?
Here's an example of the CSV file:
add an example here...

This is a draft of the script based on the information you gave:
destdir="/path/to/destination/directory"
filelist=""
while IFS=, read -r col1 col2 col3 col4   # add further column names as needed
do
    # check if the file should be included in filelist
    if [ ... ]; then    # put your real inclusion test here
        filelist+="$col1/$col2/$col3.$col4 "
    fi
done < file.csv

# once every row has been read, copy all the selected files in one go
if [ -n "$filelist" ]; then
    mkdir -p "$destdir"
    for goodfile in $filelist; do
        cp "$goodfile" "$destdir"
    done
fi
To make things clearer, I assumed the columns in the CSV file are just the parts of the path and name of the files you're interested in.
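The draft stops at the copy step; to produce the zip archive you asked for, one final line along these lines should do (assuming the zip utility is installed; the archive name selected_files.zip is just an example):

zip -r selected_files.zip "$destdir"   # archive the whole destination directory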
Hope this helps. :)

I don't know whether you would like to process every column in a different way, but here is a sketch of a script in case it does not matter.
separator=";"                 # separator used in your CSV file
csv_file="$(pwd)/file.csv"    # path to your CSV file
search_dir="$(pwd)/searchdir" # directory in which we search for the files named in the CSV
copy_to_dir="$(pwd)/newdir"   # directory where the copied files are collected
mkdir -p "$copy_to_dir"       # make the collection directory
# read all the entries in the CSV file, assuming each entry is the name
# of a file to be found in the search directory and copied
for i in $(tr "$separator" '\n' < "$csv_file"); do
    find "$search_dir" -name "$i" | xargs -I {} cp {} "$copy_to_dir"
done
zip -j -r copied_files.zip "$copy_to_dir" # zip the found contents
rm -r "$copy_to_dir"                      # if you need just the zip file
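If the entries in the CSV can contain spaces, the unquoted for loop above will split them into pieces; a while-read variant of the same idea (a sketch under the same assumptions) avoids that:

tr "$separator" '\n' < "$csv_file" | while IFS= read -r name; do
    [ -n "$name" ] && find "$search_dir" -name "$name" -exec cp {} "$copy_to_dir" \;
done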

Related

How to add a column at the end of multiple csv files using shell script

I have a couple of thousand CSV files. All of them have the same structure and header. I would like to add a column at the end of each file. I found several solutions that add a column and a value for that column, but I didn't find anything that also adds the header for the new column. For example, I have files like 1001.csv, 1002.csv, 1003.csv, and so on.
Contents of 1001.csv
ID,URL
1,one.com
2,two.com
I want to modify it like this
ID,URL,FILE
1,one.com,1001
2,two.com,1001
Since I have tons of files like this, I don't want to mess up the data while adding a column. Also, I don't want to produce extra files if an in-place update is possible.
I tested this on a huge number of files and it worked really fast. This code removes the header first, then adds a column plus a value for that column, and finally brings the header back.
#!/bin/bash
# How to run: $ ./this-script.sh inputdir/
# where inputdir contains all the CSV files
# the input argument is the directory name
DIRNAME=$(basename "$1")
# go to the target directory
cd "$DIRNAME" || exit 1
# loop over all CSV files
for FILENAME in *.csv
do
    echo "$FILENAME"
    # file name without extension, used as the value of the new column
    CODE="${FILENAME%.*}"
    echo "$CODE"
    ## remove the header
    tail -n +2 "$FILENAME" > "$FILENAME.tmp" && mv "$FILENAME.tmp" "$FILENAME"
    ## add the new field at the end of every line
    sed "s/$/,$CODE/" "$FILENAME" > "$FILENAME.tmp2"
    ## add the header back with the new column name
    # keep FILENAME.tmp2.bak as a backup for safety
    sed -i.bak "1i ID,URL,FILE" "$FILENAME.tmp2"
    # if all went well, remove the temporary files
    rm "$FILENAME"
    rm "$FILENAME.tmp2.bak"
    # rename the output file to the original name
    mv "$FILENAME.tmp2" "$FILENAME"
done
# go back to the parent directory
cd ..
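For what it's worth, the same transformation can also be sketched as a single pass with GNU awk's in-place editing (this assumes gawk 4.1+ for -i inplace, and is untested against your data):

# append each file's base name as a new last column, extending the header too
gawk -i inplace -v OFS=',' '
    FNR == 1 { print $0, "FILE"; next }          # extend the header row
    { code = FILENAME; sub(/\.csv$/, "", code)   # file name without extension
      print $0, code }                           # append it to every data row
' *.csv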

Copy numbered files to corresponding numbered directory using Linux bash commands or script

This should be a relatively straightforward problem, but I haven't found any answers on Stack Overflow. In a given directory, I have ~1000 files that are numbered (e.g. chem-0320.inp). I would like to cp each numbered file to a correspondingly numbered directory; every copied file gets the same new name. I would like to do this for a specified range of file numbers (#'s 300-500, for example).
For example, I would like to copy chem-0320.inp to a directory named 320 and rename it mech.dat.
Another example: copy chem-0430.inp to a directory named 430 and rename it mech.dat.
Thanks in advance for your help!
The following script would do the work for you:
for file in *.inp
do
    # extract the directory number, dropping the leading zero (chem-0320.inp -> 320)
    dir=$(echo "$file" | sed -r 's/[^0-9]+0([0-9]+).*/\1/g')
    mkdir -p "$dir"
    cp "$file" "$dir"/mech.dat
done
"cd" first to right dir. Subdirs will be created there.
#!/bin/bash
lo_limit=300
hi_limit=500
for file in ./*.inp
do
    dir="${file//[^0-9]/}"   # keep only the digits of the file name
    dir_cut="${dir:1:3}"     # leading zero cut off
    if [ "$dir_cut" -ge "$lo_limit" ] && [ "$dir_cut" -le "$hi_limit" ]; then
        echo "$file $dir_cut"
        mkdir -p "$dir_cut"
        cp "$file" "$dir_cut"/mech.dat
    fi
done
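To see what the two parameter expansions do, here is the question's own example file name run through them:

file="./chem-0320.inp"
dir="${file//[^0-9]/}"    # "0320" - every non-digit character removed
dir_cut="${dir:1:3}"      # "320"  - leading zero cut off
echo "$dir_cut"           # prints: 320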

Copy text from multiple files, same names to different path in bash (linux)

I need help copying content from various files to others (same name and format, different path).
For example, $HOME/initial/baby.desktop has text which I need to write into $HOME/scripts/baby.desktop. This is very simple for a single file, but I have 2500 files in $HOME/initial/ and the same number in $HOME/scripts/ with corresponding names (same names and format). I want to append (copy) the content of each file in path A to the end of the file with the same name in path B, without erasing the existing content of the file in path B.
So, for example, from the content of $HOME/initial/*.desktop to the end of $HOME/scripts/*.desktop. I tried the following, but it doesn't work:
cd $HOME/initial/
for i in $( ls *.desktop ); do egrep "Icon" $i >> $HOME/scripts/$i; done
Firstly, I would back up $HOME/initial and $HOME/scripts, because there is lots of scope for people misunderstanding your question. Like this:
cd $HOME
tar -cvf initial.tar initial
tar -cvf scripts.tar scripts
That will put all the files in $HOME/initial into a single tarfile called initial.tar and all the files in $HOME/scripts into a single tarfile called scripts.tar.
Now for your question... in general, if you want to put the contents of FileB onto the end of FileA, the command is
cat FileB >> FileA
Note the DOUBLE ">>" which means "append" rather than single ">" which means overwrite.
So, I think you want to do this:
cd "$HOME/initial"
cat SomeFile >> "$HOME/scripts/SomeFile"
where SomeFile is the name of any file you choose to test with. I would test that has worked and then, if you are happy with that, go ahead and run the same command inside a loop:
cd "$HOME/initial"
for SOURCE in *.desktop
do
    DESTINATION="$HOME/scripts/$SOURCE"
    echo Appending "$SOURCE" to "$DESTINATION"
    #cat "$SOURCE" >> "$DESTINATION"
done
When the output looks correct, remove the "#" at the start of the penultimate line and run it again.
I solved it; for anyone who wants to learn how, it is very simple:
Using sed
I only needed the matching (pattern) line, e.g. "Icon=/usr/share/some_picture.png", from each $HOME/initial/example.desktop appended to the file with the same name and format, $HOME/scripts/example.desktop; and I had a lot of .desktop files (2500 of them).
cd "$HOME/initial"
STRING_LINE=$(grep -l -R "Icon=" *.desktop)
for i in $STRING_LINE; do sed -ne '/Icon=/p' "$i" >> "$HOME/scripts/$i"; done
_________
If you just need to copy each whole file to the other file with the same name and format:
Using cat
cd "$HOME/initial"
STRING_LINE=$(grep -l -R "Icon=" *.desktop)
for i in $STRING_LINE; do cat "$i" >> "$HOME/scripts/$i"; done
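For the whole-file case, here is a sketch of the same loop with quoting that survives unusual file names (assuming bash, and that every file in $HOME/initial has a counterpart in $HOME/scripts):

cd "$HOME/initial" || exit 1
for f in *.desktop; do
    cat "$f" >> "$HOME/scripts/$f"   # append each source file to its same-named target
done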

Split files according to a field and save in subdirectory created using the root name

I am having trouble with several bits of code. I am no expert in Bash programming, unfortunately, so I have tried unsuccessfully all day to find something that works for my task, and I was hoping you could help guide me in the right direction.
I have many large files that I would like to split according to the third field within each of them; I would like to keep the header in each of the sub-files, and save the created sub-files in new directories created from the root names of the files.
The initial files stored in the original directory are:
Downloads/directory1/Levels_CHG_Lab_S_sample1.txt
Downloads/directory1/Levels_CHG_Lab_S_sample2.txt
Downloads/directory1/Levels_CHG_Lab_S_sample3.txt
and so on..
Each of these files have 200 columns, and column 3 contains values from 1 through 10.
I would like to split each of the files above based on the value of this column, and store the subfiles in subfolders, so for example sub-folder "Downloads/directory1/sample1" will contain 10 files (with the header line) derived by splitting the file Downloads/directory1/Levels_CHG_Lab_S_sample1.txt.
I have tried many different things for these steps, with no success. I must be making this more complicated than it is, since the code I have tried looks awful…
Here is the code I am trying to work from:
FILES=Downloads/directory1/
for f in $FILES
do
# Create folder with root name by stripping file names
fname=${echo $f | sed 's/.txt//;s/Levels_CHG_Lab_S_//'}
echo "Creating sub-directory [$fname]"
mkdir "$fname"
# Save the header
awk 'NR==1{print $0}' $f > header
# Split each file by third column
echo "Splitting file $f"
awk 'NR>1 {print $0 > $3".txt" }' $f
# Move newly created files in sub directory
mv {1..10}.txt $fname # I have no idea how to specify the files just created
# Loop through the sub-files to attach header row:
for subfile in $fname
do
cat header $subfile >> tmp_file
mv -f tmp_file $subfile
done
done
All these steps seem very complicated to me; I would very much appreciate it if you could help me solve this in the right way. Thank you very much for your help.
-fra
You have a few problems with your code right now. First of all, at no point do you list the contents of your downloads directory. You are simply setting the FILES variable to a string that is the path to that directory. You would need something like:
FILES=$(ls Downloads/directory1/*.txt)
You also never cd to the Downloads/directory1 folder, so your mkdir would create directories in the current working directory; probably not what you want.
If you know that the numbers in column 3 always range from 1 to 10, I would just pre-populate those files with the header line before you split the file.
Try this code to do what you want (untested):
BASEDIR=Downloads/directory1
FILES=$(ls ${BASEDIR}/*.txt)
for f in $FILES; do
    # Create folder with root name by stripping the file name
    # ($f already carries the Downloads/directory1/ prefix, so no
    # further prefix is needed here)
    dirname=$(echo "$f" | sed 's/\.txt//;s/Levels_CHG_Lab_S_//')
    echo "Creating sub-directory [$dirname]"
    mkdir -p "$dirname"
    # Save the header to each file
    HEADER_LINE=$(head -n1 "$f")
    for i in {1..10}; do
        echo "${HEADER_LINE}" > "${dirname}/${i}.txt"
    done
    # Split each file by third column
    echo "Splitting file $f"
    awk -v dirname="${dirname}" 'NR>1 {filename = dirname "/" $3 ".txt"; print $0 >> filename}' "$f"
done
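As a variant, the header pre-population can be folded into awk itself, writing the header the first time each value of column 3 is seen (a sketch for a single input file; the sub-directory must already exist):

awk -v dir="Downloads/directory1/sample1" '
    NR == 1 { header = $0; next }                       # remember the header line
    { out = dir "/" $3 ".txt"
      if (!(out in seen)) { print header > out; seen[out] = 1 }
      print $0 > out }                                  # the file stays open, so this appends
' Downloads/directory1/Levels_CHG_Lab_S_sample1.txt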

copy all unique files in a directory based on hashes

file=$3
#Using $3 as I am using 1 & 2 in the rest of the script[that works]
file_hash=md5sum "$file" | cut -d ' ' -f l
#generates hashes for file
for a in /path/to/source/* #loop for all files in directory
do
if [ "$file_hash" == $(md5sum "$a" | cut -d ' ' -f l) ]:
#if the file hash is equal to the hash generated then file is copied to path/to/source
then cp "file" /path/to/source/*
else cp "$file" "file.JPG" mv "file.JPG" /path/to/source/$file #otherwise the file renamed as file.JPG so it is not overwritten
fi
done
Can anyone help me with this code?
I'm trying to write a Bash script which will generate hashes for all the files within a directory; if two files have duplicate hashes, then only one of them is copied to the destination directory. Can anyone see where I am going wrong here?
I have to use md5sum, so no sha1sum, fdupes, or anything like that, unfortunately.
Assuming it doesn't matter which of the unique files is copied, a simple way would be to use bash's support for associative arrays:
declare -A files
while read hash name
do
files[$hash]=$name
done < <(md5sum /path/to/source/*)
cp "${files[#]}" /path/to/dest
Any file with an identical hash will simply overwrite the record of the previous one, leaving you with only unique files in the array.
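The same idea also works copying as you go, which makes it easy to report what was skipped (a sketch, still using only md5sum as required; /path/to/dest is assumed to exist):

declare -A seen
for f in /path/to/source/*; do
    h=$(md5sum "$f" | cut -d ' ' -f 1)        # hash of this file
    if [[ -z "${seen[$h]}" ]]; then
        seen[$h]=$f
        cp "$f" /path/to/dest/                # first file with this hash: copy it
    else
        echo "skipping duplicate: $f (same as ${seen[$h]})"
    fi
done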
