Rename file to folder path - bulk

I'm trying to rename files within thousands of folders so that the folder path becomes part of the file name.
So the file 123.jpg in C:\ABC\123.jpg needs to be renamed to ABC_123.jpg.
What's the simplest way of doing so?

Depends on your language... take the folder/file name in as a string and parse through it, for example (C#): string[] parts = folders.Split('\\'); Then use the element containing ABC and concatenate it onto the file's name.
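If Python is an option, here is a minimal sketch of that idea using pathlib (the C:\ root and the *.jpg pattern are just the question's example; adjust both to your layout):

from pathlib import Path

root = Path("C:/")  # hypothetical root holding the thousands of folders
# Snapshot the listing first so already-renamed files aren't re-visited.
for f in list(root.glob("*/*.jpg")):
    # e.g. C:/ABC/123.jpg -> C:/ABC/ABC_123.jpg
    f.rename(f.with_name(f"{f.parent.name}_{f.name}"))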

Related

Python 3.x: how to combine dictionary-format zipped log files that exist in two different directories into another directory

I have the below directory structure containing .log.gz files. Each log file holds a dictionary serialized as a string. Either of these directories may not exist.
xyz1/
2022-08-08T01:31Z.log.gz
2022-08-08T01:33Z.log.gz
xyz2/
2022-08-08T01:30Z.log.gz
2022-08-08T01:33Z.log.gz
I want to create another directory and combine the above files:
xyz/
2022-08-08T01:30Z.log.gz
2022-08-08T01:31Z.log.gz
2022-08-08T01:33Z.log.gz
Conditions:
Either one of xyz1 and xyz2 may exist, or both may exist.
If a file with the same name exists in both directories, combine the two into the third directory, "xyz".
While combining, the string-dictionary format should be retained.
The solution I opted for:
Check whether one directory exists and iterate over its files.
For each file, check whether a file with the same name exists in the other directory.
If yes, decompress both files, combine them, and gzip the result into the xyz directory.
If not, copy it into xyz as-is.
Is there any better way to perform the above operation? Below is how I combine two log files.
import gzip, json
combinefile = {}
with gzip.open("xyz1/file1.log.gz", "rt") as f1:
    combinefile.update(json.load(f1))
with gzip.open("xyz2/file1.log.gz", "rt") as f2:
    combinefile.update(json.load(f2))
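One way to tighten the whole operation is to first collect the set of file names across whichever source directories exist, then copy or merge per name. A minimal sketch, assuming the logs are gzipped JSON dictionaries (the directory names come from the question; everything else is illustrative):

import gzip
import json
import shutil
from pathlib import Path

src_dirs = [Path("xyz1"), Path("xyz2")]
out_dir = Path("xyz")
out_dir.mkdir(exist_ok=True)

# Every file name present in at least one source directory that exists.
names = {p.name for d in src_dirs if d.is_dir() for p in d.glob("*.log.gz")}

for name in names:
    sources = [d / name for d in src_dirs if (d / name).exists()]
    if len(sources) == 1:
        shutil.copy2(sources[0], out_dir / name)  # unique file: copy as-is
    else:
        merged = {}
        for src in sources:
            with gzip.open(src, "rt") as f:
                merged.update(json.load(f))  # later directories win on duplicate keys
        with gzip.open(out_dir / name, "wt") as f:
            json.dump(merged, f)  # keep the dictionary format, re-gzipped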

Copy a set of files using ADF

I have 10 files in a folder and want to move 4 of them to a different location.
I tried 2 approaches to achieve this:
using a Lookup to retrieve the filenames from a JSON file, then feeding them to a ForEach iterator
using Get Metadata to get the file names from the source folder and then adding an If Condition inside a ForEach to copy the files.
But in both cases, all the files in the source folder get copied.
Any help would be appreciated.
Thanks!!
There are 3 ways you can consider for selecting your files, depending on the requirement or blockers.
Check out the official MS doc: Copy activity properties
1. Dynamic content for the FilePath property in the Source dataset.
2. Wildcard characters in the source folder and file path in the Source dataset.
Allowed wildcards are: * (matches zero or more characters) and ? (matches zero or a single character); use ^ to escape if your actual folder name has a wildcard or this escape char inside. See more examples in Folder and file filter examples.
3. List of files.
Point to a text file that includes a list of files you want to copy, one file per line, as relative paths to the path configured in the dataset. When using this option, do not specify a file name in the dataset. See more examples in File list examples.
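For illustration, the text file for option 3 might look like this (hypothetical file names; each line is relative to the dataset's folder path):

file1.csv
file2.csv
subfolder/file3.csv
subfolder/file4.csv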
Example:
Parameterize the source dataset and set the source file name to the value that passes the expression evaluation in the If Condition activity.

How to read multiple CSV (leaving out specific ones) from a nested directory in PySpark?

Let's say I have a directory called 'all_data', and inside it several other directories based on the date of the data they contain. These directories are named date_2020_11_01 to date_2020_11_30, and each of them contains CSV files that I intend to read into a single dataframe.
But I don't want to read the data for date_2020_11_15 and date_2020_11_16. How do I do it?
I'm not sure how to exclude certain files, but you can specify a range of names using brackets. The code below selects every date directory except 11_15 and 11_16:
spark.read.csv("all_data/date_2020_11_{1[0-4,7-9],[0,2-3][0-9]}/*.csv")
df = spark.read.format("csv").option("header", "true").load(paths)
where paths is a list of all the paths where data is present; that worked for me.
A simple method is to read the whole data directory as-is and apply a filter condition on a date column:
df.filter("dataColumn != 'date_2020_11_15' AND dataColumn != 'date_2020_11_16'")
Otherwise you can use the os module to list the directory and iterate over it, dropping those two date directories with a condition, as sketched below.
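A minimal sketch of that os-module approach (the all_data layout and the two dates come from the question; the rest is illustrative):

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

base = "all_data"
excluded = {"date_2020_11_15", "date_2020_11_16"}
# Keep every date directory except the two excluded ones.
paths = [os.path.join(base, d) for d in sorted(os.listdir(base)) if d not in excluded]
df = spark.read.option("header", "true").csv(paths)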

Get the top-level folder from a pathlib Path object

I use pathlib to match all files recursively and then filter them based on their content. I would then like to find the top-level folder of each matched file. Assume the following: I have a file in the folder
a/b/c/file.log
and I do the search from level a:
for f in path_data.glob("**/*"):
    if something inside file f:
        # I would like to get the folder this file is in, i.e. 'b'
I know that I can get the other levels using:
f.parents, which would give me b/c
f.parent, which would give me c
f.name, which would give me file.log
But how could I get b?
Just to be precise: the number of levels at which the file is stored is not known.
UPD: I know I could do it with split, but I would like to know whether there is a proper API for it. I couldn't find one.
The question was asked a while ago, but didn't get much attention. Nevertheless, I'll still publish the answer:
f.relative_to(path_data).parts[0]
(If the search is run from inside a, i.e. path_data is ".", plain f.parts[0] gives the same result, since the globbed paths then start at b.)
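A quick check of that expression as pure path arithmetic (names taken from the question's example):

from pathlib import Path

path_data = Path("a")
f = path_data / "b/c/file.log"            # a path as yielded by the glob
print(f.relative_to(path_data).parts[0])  # -> 'b'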

How to edit multiple file names at once?

I have a directory full of .txt files (2000 files). They have very long names. I want to edit their names and keep only a certain part of each name as the file name,
like this:
UNCID_279113.TCGA-A6-2683-01A-01R-0821-07.100902_UNC7-RDR3001641_00025_FC_62EPOAAXX.1.trimmed.annotated.gene.quantification.txt
I want to eliminate these long names and just keep the part starting at TCGA and running through the next three '-'-separated fields; for example, my new file name would be: TCGA-A6-2683-01A
Does anybody know how I can do this for all the files in one directory?
Assuming the files are in the current directory:
library(gsubfn)
pat <- "TCGA-[^-]*-[^-]*-[^-]*"
file.names <- dir(pattern = pat)
new.names <- strapplyc(file.names, pat, simplify = TRUE)
file.rename(file.names, new.names)
Create a shell/batch script. Here is a variation: it produces a UNIX shell file or a Windows batch file, which you can then review and run:
# UNIX
writeLines(paste("mv", file.names, new.names), con = "tcga_rename.sh")
shell("tcga_rename.sh")
or on Windows:
# Windows
writeLines(paste("rename", file.names, new.names), con = "tcga_rename.bat")
shell("tcga_rename.bat")
REVISED: Factored out pat, simplified and added variations.
Assuming your files are in the current working directory, try
library(stringr)
files <- list.files(".", pattern = "\\.txt$")
file.rename(files, str_extract(files, "TCGA(-\\w+){3}"))
You can do something like this:
pattern <- ".*(TCGA-[^-]+-[^-]+-[^-]*).*"
file.rename(
  list.files("."),
  sub(pattern, "\\1", list.files("."))
)
But be super careful that the sub command does what you think it will do before you run the full thing (i.e. run just the sub piece first). It's hard to be sure this won't cause a problem without knowing what patterns you have in your file names.
Also, in this case replace list.files(".") with your directory. Note that you don't need to filter out non-matching files first, since sub only modifies the file names that do match the pattern (not super efficient if you have a lot of files that don't match; if that's a concern, you can use the pattern argument as Greg Snow does).
You can use list.files() to get a list of the filenames in the directory, then use sub() with regular expressions to edit the names, then file.rename() to actually do the renaming.
Something like (untested):
curfiles <- list.files(pattern = 'TCGA')  # only grab files with TCGA in them
newfiles <- sub("^.*(TCGA-[a-zA-Z0-9]+-[a-zA-Z0-9]+-[a-zA-Z0-9]+).*$", "\\1", curfiles)
file.rename(curfiles, newfiles)
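For comparison, the same renaming is easy to sketch in Python as well (a minimal sketch, assuming the .txt files sit in the current directory; whether to keep the extension is a choice, the R answers above drop it):

import re
from pathlib import Path

pat = re.compile(r"TCGA-[^-]+-[^-]+-[^-]+")
for f in list(Path(".").glob("*.txt")):
    m = pat.search(f.name)
    if m:
        # e.g. UNCID_279113.TCGA-A6-2683-01A-...txt -> TCGA-A6-2683-01A.txt
        f.rename(f.with_name(m.group(0) + ".txt"))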
