How to edit multiple file names at once? - linux

I have a directory full of .txt files (2000 files). They have very long names, and I want to rename them so that only a certain part of each name is kept.
For example, given this name:
UNCID_279113.TCGA-A6-2683-01A-01R-0821-07.100902_UNC7-RDR3001641_00025_FC_62EPOAAXX.1.trimmed.annotated.gene.quantification.txt
I want to eliminate these long names and keep only the part starting at TCGA up to and including the third dash-separated field after it; for example, the new file name would be: TCGA-A6-2683-01A
Does anybody know how I can do this for all the files in one directory?

Assuming the files are in the current directory:
library(gsubfn)
pat <- "TCGA-[^-]*-[^-]*-[^-]*"
file.names <- dir(pattern = pat)
new.names <- strapplyc(file.names, pat, simplify = TRUE)
file.rename(file.names, new.names)
Create a shell/batch script. Here is a variation: it produces a UNIX shell script or a Windows batch file, which you can review before running it:
# UNIX
writeLines(paste("mv", file.names, new.names), con = "tcga_rename.sh")
shell("tcga_rename.sh")
or on Windows:
# Windows
writeLines(paste("rename", file.names, new.names), con = "tcga_rename.bat")
shell("tcga_rename.bat")

Assuming your files are in the current working directory, try
library(stringr)
files <- list.files(".", pattern = "\\.txt$")  # anchor the extension so only .txt files match
file.rename(files, str_extract(files, "TCGA(-\\w+){3}"))

You can do something like this:
pattern <- ".*(TCGA-[^-]+-[^-]+-[^-]*).*"
file.rename(
list.files("."),
sub(pattern, "\\1", list.files("."))
)
But be very careful that the sub command does what you think it will before you run the full thing (i.e. just run the sub piece first). It is hard to be sure this won't cause a problem without knowing what patterns are in your file names.
Also, in this case replace list.files(".") with your directory. Note that you don't need to filter out the files that match the pattern beforehand, since sub will only modify the file names that do match it (not super efficient if you have a lot of files that don't match the pattern, but easier to write; if that is a concern, you can use the pattern argument as Greg Snow does).

You can use list.files() to get a list of the file names in the directory, then use sub() with regular expressions to edit the names, and then file.rename() to actually do the renaming.
Something like (untested):
curfiles <- list.files(pattern='TCGA') # only grab files with TCGA in them
newfiles <- sub("^.*(TCGA-[a-zA-Z0-9]+-[a-zA-Z0-9]+-[a-zA-Z0-9]+).*$", "\\1", curfiles)
file.rename(curfiles, newfiles)

Related

I need my Control-M File Watcher job to pick up a file having a specific type of file pattern name

I have files with two file name patterns:
HUB.SG.20220902.01.P
and
HUB.SG.20220902.001.P
In the second file name, the .001. part will keep incrementing, so the next file will be
HUB.SG.20220902.002.P
...
HUB.SG.20220902.100.P
and so on,
but the first file will always have the file name pattern HUB.SG.20220902.01.P.
I want my File Watcher job to pick up only files of the pattern HUB.SG.20220902.001.P and not files of the pattern HUB.SG.20220902.01.P.
If I add filename = "path/HUB.SG..**.P" in my job, it picks up files of both name patterns.
How do I resolve this so that my File Watcher job picks up only files of the name pattern "HUB.SG.20220902.___.P"?
Use one * to match one or more characters (including nothing).
Use ? to match a single character. In your example you would use HUB.SG.20220902.???.P.
However, you can use Control-M system variables for the date (should you be looking for today's dated file as standard). In that case, use %%$ODATE in place of 20220902, etc.
You can use a pattern like:
path/HUB.SG.*.???.P

How to search for regular expression match on s3 folder, and parse the files

Below is the S3 folder:
s3://bucket-name/20210802-123429/DM/US/2021/08/02/12/test.json
20210802-123429 is the archive job folder that the files are put into.
What I could achieve so far:
cred_obj = cred_conn.list_objects_v2(Bucket=cfg.Bucket_Details['extractjson'], Prefix="DM"+'/'+"US"+'/'+self.yr+'/'+self.mth+'/'+self.day+'/'+self.hr+'/')
Problem statement:
But in the above line, I'm not sure how to match the criteria for 20210802 and parse "test.json".
list_objects_v2 does not support RegEx match. The only way to search is using the prefix. Therefore, you must know the prefix or part of the prefix in order to search.
self.timestr_arc = todays_dt.strftime("%Y%m%d")  # e.g. "20210802", with todays_dt a datetime for today
cred_obj = cred_conn.list_objects_v2(Bucket=cfg.Bucket_Details['extractjson'],
                                     Prefix="DM" + '/' + "US" + '/' + self.timestr_arc)
This limits the listing to the prefix built for that specific date.
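If you also need to match the leading 20210802-123429 archive folder, one workaround is to list with only the date as the prefix and apply a regular expression to the returned keys client-side. A minimal sketch, assuming a boto3 S3 client and the key layout shown in the question (the bucket name is just for illustration):
import re
import boto3
from datetime import datetime

s3 = boto3.client("s3")
bucket = "bucket-name"  # illustration only

# Archive folders look like "20210802-123429/...", so today's date works as a prefix
date_prefix = datetime.today().strftime("%Y%m%d")

# Full key we care about, e.g. 20210802-123429/DM/US/2021/08/02/12/test.json
key_pattern = re.compile(rf"^{date_prefix}-\d+/DM/US/\d{{4}}/\d{{2}}/\d{{2}}/\d{{2}}/test\.json$")

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=date_prefix):
    for obj in page.get("Contents", []):
        if key_pattern.match(obj["Key"]):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            print(obj["Key"], len(body))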

Python3 - How to write a number to a file using a variable and sum it with the current number in the file

Suppose I have a file named test.txt and it currently has the number 6 inside it. I want to use a variable such as x = 4, add the two numbers together, and save the result back in the file.
var1 = 4.0
f=open(test.txt)
balancedata = f.read()
newbalance = float(balancedata) + float(var1)
f.write(newbalance)
print(newbalance)
f.close()
It's probably simpler than you're trying to make it:
variable = 4.0
with open('test.txt') as input_handle:
    balance = float(input_handle.read()) + variable
with open('test.txt', 'w') as output_handle:
    print(balance, file=output_handle)
Make sure 'test.txt' exists before you run this code and has a number in it, e.g. 0.0 -- you can also modify the code to deal with creating the file in the first place if it's not already there.
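For instance, one way to deal with a missing file is to fall back to a starting value when test.txt is not there yet (a small sketch extending the snippet above):
import os

variable = 4.0

# Start from 0.0 if test.txt does not exist yet
if os.path.exists('test.txt'):
    with open('test.txt') as input_handle:
        balance = float(input_handle.read()) + variable
else:
    balance = variable

with open('test.txt', 'w') as output_handle:
    print(balance, file=output_handle)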
Files only read and write strings (or bytes for files opened in binary mode). You need to convert your float to a string before you can write it to your file.
Probably str(newbalance) is what you want, though you could customize how it appears using format if you want. For instance, you could round the number to two decimal places using format(newbalance, '.2f').
Also note that you can't write to a file opened only for reading, so you probably need either to use mode 'r+' (which allows both reading and writing) combined with an f.seek(0) call (and maybe f.truncate() if the new numeric string might be shorter than the old one), or to close the file and reopen it in 'w' mode (which will truncate the file for you).
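For illustration, the 'r+' variant might look like this (a sketch that assumes test.txt already exists and contains a number):
var1 = 4.0

with open('test.txt', 'r+') as f:
    newbalance = float(f.read()) + var1
    f.seek(0)                 # go back to the beginning of the file
    f.write(str(newbalance))  # files only accept strings, so convert first
    f.truncate()              # drop leftover characters if the new text is shorter
    print(newbalance)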

Get a top level from Path object of pathlib

I use pathlib to match all files recursively and filter them based on their content. Then I would like to find the top-level folder of each matching file. Assume the following: I have a file in the folder
a/b/c/file.log
I do the search from the level a:
for f in path_data.glob("**/*"):
    if something inside file f:
        # I would like to get what folder this file is in, i.e. 'b'
I know that I can get all parent levels using:
f.parents would give me b/c
f.parent would give me c
f.name would give me file.log
But how could I get b?
Just to be precise: the number of levels at which the file is stored is not known.
UPD: I know I could do it with split, but I would like to know if there is a proper API to do that. I couldn't find it.
The question was asked a while ago but didn't get much attention. Nevertheless, I would still like to publish the answer:
f.parts[0]
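If the paths returned by glob still include the search root itself (e.g. a/b/c/file.log when searching from a), a variant using relative_to() avoids depending on how deep the root is; a small sketch, assuming path_data is the Path the glob starts from:
from pathlib import Path

path_data = Path("a")  # the root the glob starts from

for f in path_data.glob("**/*"):
    if f.is_file():
        # first component below the search root, i.e. 'b' for a/b/c/file.log
        top = f.relative_to(path_data).parts[0]
        print(f, "->", top)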

Searching in multiple files using findstr, only proceeding with the resulting files? (cmd)

I'm currently working on a project where I search hundreds of files using findstr in the command line. If I find the string which I searched for, I want to proceed with this exact file (and the other ones that include my string).
So in my case:
I searched for the string WRI2016 by using:
H:\KOBINI>findstr "WRI2016" *.ini > %temp%\xx.txt && %temp%\xx.txt
To see what the PC does, I save it in a .txt file as you can see.
So if a file includes WRI2016, I want to extract some facts out of it; in my case these are NR, Kunde, WebHDAktiv, DigIDAktiv.
But I just can't find a proper way to link both of these functions.
At first I simply printed all of the parameters:
H:\KOBINI>findstr "\<NR Kunde WRI2016 WebHDAktiv DigIDAktiv" *.ini > %temp%\xx.csv && %temp%\xx.csv
I also played around using the if command but that didn't really work out. I'm pretty new to this stuff as you'll see in my following tries to solve this problem:
H:\KOBINI>findstr "\<NR DigIDAktiv WebHDAktiv" set a =*.ini findstr "WRI2016" set b =*.ini if a EQU b > %temp%\xx.txt && %temp%\xx.txt
So all I wanted to achieve with that weird code was: if there is a WRI2016 in the file, give me the remaining parameters. But that didn't work out at all.
I also tried using new lines for every command, which didn't change a thing.
As I want this to be a .csv in the end, I want to add a semicolon between my parameters. Any idea how I could do that? I've seen versions using -s";", which didn't do anything for me.
Sorry, I'm quite new and thought I'd give it a shot.
an example of my .ini files Looks like this:
> Kunde=Markt
> Nr=101381
> [...]
> DigIDAktiv=Ja
> WebHDAktiv=Nein
> Version=WRI2016_U2_P1
some files have a different Version though.
So I only want to know "NR, DigIDAktiv ..." if it's the 2016 Version.
As a result it should be sorted in a CSV, in different columns.
So I search these files in order to find the 2016 version and then try to extract my information and put it into a .csv.
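Not a pure findstr solution, but as an alternative sketch: a short Python script can read each .ini, keep only the files whose Version line contains WRI2016, and write the wanted keys as a semicolon-separated CSV (the output name "result.csv" and the latin-1 encoding are assumptions):
import csv
import glob

wanted = ["Nr", "Kunde", "WebHDAktiv", "DigIDAktiv"]  # keys named in the question

with open("result.csv", "w", newline="") as out:       # "result.csv" is a made-up output name
    writer = csv.writer(out, delimiter=";")
    writer.writerow(["File"] + wanted)
    for path in glob.glob("*.ini"):
        with open(path, encoding="latin-1") as fh:     # encoding is a guess
            pairs = (line.split("=", 1) for line in fh.read().splitlines() if "=" in line)
            values = {k.strip(): v.strip() for k, v in pairs}
        if "WRI2016" in values.get("Version", ""):      # only the 2016 version
            writer.writerow([path] + [values.get(k, "") for k in wanted])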
