Linux rename batch files according to a list - linux

I am looking to rename a bunch of files according to the names found in a separate list. Here is the situation:
Files:
file_0001.txt
file_0102.txt
file_ab42.txt
I want to change the names of these files according to a list of corresponding names that looks like :
0001 abc.01
0102 abc.02
ab42 def.01
I want to replace, for each file, the part of the name present in the first column of my list by the part in the second column:
file_0001.txt -> file_abc.01.txt
file_0102.txt -> file_abc.02.txt
file_ab42.txt -> file_def.01.txt
I looked into several mv, rename and such commands, but I only found ways to rename batch files according to a single pattern in the file name, not matching the changes with a list.
Does anyone have an example of a script that I could use to do that?

while read -r a b; do mv -- "file_$a.txt" "file_$b.txt"; done < listfile
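For readers who prefer a script that checks each file before touching it, the same idea can be sketched in Python. This is only an illustration under assumptions: the function name `rename_from_list` and its `prefix`/`suffix` parameters are hypothetical, and the list is assumed to hold exactly two whitespace-separated columns per line, as in the question.

```python
import os

def rename_from_list(directory, list_path, prefix="file_", suffix=".txt"):
    """Rename prefix+OLD+suffix to prefix+NEW+suffix for each 'OLD NEW' line.

    Hypothetical helper: returns the (old_name, new_name) pairs actually renamed.
    """
    renamed = []
    with open(list_path) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            old, new = parts
            src = os.path.join(directory, prefix + old + suffix)
            dst = os.path.join(directory, prefix + new + suffix)
            # Rename only when the source exists and the target is free,
            # so an existing file is never overwritten.
            if os.path.exists(src) and not os.path.exists(dst):
                os.rename(src, dst)
                renamed.append((prefix + old + suffix, prefix + new + suffix))
    return renamed
```

Calling `rename_from_list(".", "listfile")` in the directory that holds the files would mirror the one-liner above; lines whose source file is missing are skipped silently.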

Related

Copy a set of files using ADF

I have 10 files in a folder and want to move 4 of them to a different location.
I tried 2 approaches to achieve this -
using lookup to retrieve the filenames from a json file- then feeding it to a for each iterator
using metadata to get file names from source folder and then adding if condition inside a for each to copy the files.
But in both cases, all the files in the source folder get copied.
Any help would be appreciated.
Thanks!!
There are three ways you can select your files, depending on your requirements or blockers.
Check out the official MS doc: Copy activity properties
1. Dynamic content for FilePath property in Source Dataset.
2. You can use Wildcard character in the source folder and file path in the source Dataset.
Allowed wildcards are: * (matches zero or more characters) and ?
(matches zero or single character); use ^ to escape if your actual
folder name has wildcard or this escape char inside. See more
examples in Folder and file filter
examples.
3. List of Files
Point to a text file that includes a list of files you want to copy,
one file per line, which is the relative path to the path configured
in the dataset. When using this option, do not specify file name in
dataset. See more examples in File list
examples.
Example:
Parameterize the source dataset and set the source file name to the one that passes the expression evaluation in the If Condition activity.

How to read multiple CSV (leaving out specific ones) from a nested directory in PySpark?

Let's say I have a directory called 'all_data', and inside this, I have several other directories based on the date of the data that they contain. These directories are named date_2020_11_01 to date_2020_11_30 and each one of them contains csv files which I intend to read into a single dataframe.
But I don't want to read the data for date_2020_11_15 and date_2020_11_16. How do I do it?
I'm not sure how to exclude certain files directly, but you can specify ranges of names using braces and brackets in the path. The code below selects every date directory except date_2020_11_15 and date_2020_11_16:
spark.read.csv("all_data/date_2020_11_{1[0-47-9],[023][0-9]}/*.csv")
df = spark.read.format("csv").option("header", "true").load(paths)
where paths is a list of all the paths where data is present. This worked for me.
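A simple method is to read the whole data directory as it is and then apply a filter condition, assuming the DataFrame has a column (here dataColumn) that holds the directory name:
df.filter("dataColumn NOT IN ('date_2020_11_15', 'date_2020_11_16')")
Otherwise, you can use the os module to list the directory, drop the unwanted date directories from that list with a condition, and read only the remaining ones.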
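The os-based approach could look roughly like the sketch below. It assumes the data sits on a local filesystem visible to the driver (os.listdir does not see HDFS or object storage), and the helper name `csv_dirs_to_read` is hypothetical.

```python
import os

def csv_dirs_to_read(base_dir, excluded):
    """Return one '*.csv' glob per date_* subdirectory that is not excluded."""
    paths = []
    for name in sorted(os.listdir(base_dir)):
        full = os.path.join(base_dir, name)
        # Keep only date directories that are not in the exclusion set.
        if os.path.isdir(full) and name.startswith("date_") and name not in excluded:
            paths.append(os.path.join(full, "*.csv"))
    return paths

# Usage sketch, assuming an existing SparkSession named `spark`:
# paths = csv_dirs_to_read("all_data", {"date_2020_11_15", "date_2020_11_16"})
# df = spark.read.csv(paths, header=True)
```

Building the list in plain Python keeps the exclusion rule readable and easy to change, at the cost of only working where the driver can list the directory.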

Copying and pasting using Python for files with similar but not exact names

I have two folders each with several files.
Folder 1:
abc_1600_efg.xlsx
abc_1601_efg.xlsx
abc_1602_efg.xlsx
abc_1603_efg.xlsx
Folder 2:
ijk_1600_xyz.xlsx
ijk_1601_xyz.xlsx
ijk_1602_xyz.xlsx
ijk_1603_xyz.xlsx
lmn_1600_tuv.xlsx
lmn_1601_tuv.xlsx
lmn_1602_tuv.xlsx
lmn_1603_tuv.xlsx
Assuming the files in each folder are randomized, anyone have any ideas on how to use python 3.x to copy from file 'abc_1600_efg.xlsx' in folder 1 then have python search for the corresponding file in folder 2 ('ijk_1600_xyz.xlsx'). The number portion of the title is the key that needs to be matched. Then I want to paste the data into the file 'ijk_1600_xyz.xlsx' (folder two has two files with the same number 1600 but I need to find just the 'ijk_1600_xyz' file).
I want to loop this so that this would be done for every file in folder 1 starting at 1600 then 1601 then 1602 etc. I have the copy and paste portion finished I'm just stuck on the search and match portion.
Thank you in advance.
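I haven't checked it, but something like:
import os
import re

for file1 in os.listdir(folder1):
    # Extract the numeric key, e.g. '1600' from 'abc_1600_efg.xlsx'
    key = re.match(r'..._(\d+)_.*', file1).group(1)
    for file2 in os.listdir(folder2):
        # Only the 'ijk_' file with the same key is the target
        if file2.startswith('ijk_') and '_' + key + '_' in file2:
            ...  # copy ...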
Anyway, you should know how to adapt to these situations.
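To avoid rescanning folder 2 for every file in folder 1, the matching step can also be done with a dictionary keyed by the number. The following is a sketch, not a tested drop-in: `pair_by_key` and `KEY_RE` are hypothetical names, and it assumes every file name has the form prefix_number_rest.

```python
import re

# Captures the digits between the first and second underscore.
KEY_RE = re.compile(r"^[^_]+_(\d+)_")

def pair_by_key(source_names, target_names, target_prefix="ijk_"):
    """Pair each source file with the target file sharing its numeric key."""
    targets = {}
    for name in target_names:
        m = KEY_RE.match(name)
        # Index only the wanted family of target files (e.g. 'ijk_...').
        if m and name.startswith(target_prefix):
            targets[m.group(1)] = name
    pairs = []
    for name in sorted(source_names):
        m = KEY_RE.match(name)
        if m and m.group(1) in targets:
            pairs.append((name, targets[m.group(1)]))
    return pairs
```

With `os.listdir(folder1)` and `os.listdir(folder2)` as inputs, each returned pair gives the file to copy from and the file to paste into, in ascending name order.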

Get a top level from Path object of pathlib

I use pathlib to match all files recursively to filter the files based on their content. Then I would like to find what is the top level of the folder of this file. Assume the following. I have a file in the folder:
a/b/c/file.log
I do the search from the level a:
for f in path_data.glob("**/*"):
    if something inside file f:
        # I would like to get in what folder this file is, i.e. 'b'
I know that I can get all parent levels using:
f.parents would give me b/c
f.parent would give me c
f.name would give me file.log
But how could I get b?
To be precise: the number of levels at which the file is stored is not known.
UPD: I know I could do it with split, but I would like to know if there is a proper API to do that. I couldn't find it.
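The question was asked a while ago and did not get much attention, but here is an answer anyway. Take the first component of the path relative to the search root:
f.relative_to(path_data).parts[0]
Note that f.parts[0] alone returns 'b' only when the glob is run from inside a (for example with path_data = Path('.')); otherwise it returns the first component of path_data itself.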
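A minimal sketch of getting the first folder below the search root, whatever the nesting depth. It relies only on `Path.relative_to` and `Path.parts` from pathlib; the wrapper name `top_level_under` is hypothetical.

```python
from pathlib import Path

def top_level_under(root, path):
    """Return the first path component of `path` below `root`."""
    # relative_to() strips the root, so parts[0] is the top-level entry
    # under it; it raises ValueError if `path` is not located under `root`.
    return Path(path).relative_to(root).parts[0]

# Usage sketch inside the loop from the question:
# for f in path_data.glob("**/*"):
#     folder = top_level_under(path_data, f)
```

For a file that sits directly in the root, the function returns the file name itself, so callers that only want folders may need an extra check.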

Matching text files from a list of system numbers

I have ~60K bibliographic records, which can be identified by system number. These records also hold full text (individual text files named by system number).
I have lists of system numbers in bunches of 5K and I need to find a way to copy only the text files from each 5K list.
All text files are stored in a directory (/fulltext) and are named something along these lines:
014776324.txt.
The 5k lists are plain text stored in separate directories (e.g. /5k_list_1, /5k_list_2, ...), where each system number matches a .txt file.
For example: bibliographic record 014776324 matches to 014776324.txt.
I am struggling to find a way to copy into the 5k_list_* folders only the corresponding text files.
Any idea?
Thanks indeed,
Let's assume we invoke the following script this way:
./the-script.sh fulltext 5k_list_1 5k_list_2 [...]
Or more succinctly:
./the-script.sh fulltext 5k_list_*
Then try using this (totally untested) script:
#!/usr/bin/env bash
set -eu # enable error checking
src_dir=$1 # first argument is where to copy files from
shift 1
for list_dir; do # implicitly consumes remaining args
    # the first field of each line is the system number
    while read -r sys_num rest; do
        cp "$src_dir/$sys_num.txt" "$list_dir/"
    done < "$list_dir/list.txt"
done
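The same copy step can be sketched in Python when it is useful to know which system numbers had no matching text file. As in the shell script, the list file name list.txt inside each list directory is an assumption, and `copy_listed_files` is a hypothetical helper name.

```python
import os
import shutil

def copy_listed_files(src_dir, list_dir, list_name="list.txt"):
    """Copy <sys_num>.txt from src_dir into list_dir for each listed number.

    Returns (copied, missing) lists of system numbers.
    """
    copied, missing = [], []
    with open(os.path.join(list_dir, list_name)) as fh:
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            sys_num = fields[0]  # first field is the system number
            src = os.path.join(src_dir, sys_num + ".txt")
            if os.path.isfile(src):
                shutil.copy2(src, list_dir)
                copied.append(sys_num)
            else:
                missing.append(sys_num)
    return copied, missing
```

Unlike the shell version with set -e, this sketch does not stop at the first missing file; it records it in `missing` and carries on with the rest of the list.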

Resources