Split a string by '_'

I have a number of files in a directory with the following file format:
roll_#_oe_yyyy-mm-dd.csv
where # is an integer and yyyy-mm-dd is a date (for example roll_6_oe_2008-02-12).
I am trying to use the split function so I can return the number on its own. So for example:
roll_6_oe_2008-02-12 would yield 6
and
roll_14_oe_2008-02-12 would yield 14
I have tried:
filename.split("_")
but cannot write the number to a variable. What can I try next?

Supposing that: filename = 'roll_14_oe_2008-02-12'
print(filename.split('_')) evaluates to ['roll', '14', 'oe', '2008-02-12']
The number you want to retrieve is the second element of that list (index 1), and it is still a string:
my_number = filename.split('_')[1]
You could also extract the number using a regular expression:
import re
filename = 'roll_134_oe_2008-02-12'
number_match = re.match(r"roll_(\d+)", filename)
if number_match:
    print(number_match.group(1))
Working example for both methods: http://www.codeskulptor.org/#user41_jEFOv5N5GN_2.py
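Since the question asks for the number itself, here is a minimal sketch that wraps the split approach in a function and converts the result to an int. The function name roll_number is made up for illustration, and it assumes every file name follows the roll_#_oe_yyyy-mm-dd.csv pattern from the question.

```python
def roll_number(filename):
    """Return the integer between the first and second underscore."""
    # 'roll_14_oe_2008-02-12.csv' -> ['roll', '14', 'oe', '2008-02-12.csv']
    parts = filename.split("_")
    return int(parts[1])  # raises ValueError if the second field is not numeric

for name in ["roll_6_oe_2008-02-12.csv", "roll_14_oe_2008-02-12.csv"]:
    print(roll_number(name))
```

The result can then be stored in a variable as usual, e.g. my_number = roll_number(filename).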

Related

How to search for a specific string and replace a number in this string using python?

I have a text file containing Fortran code. I am trying to create copies (Fortran files) of this code, but in each copy I create I want to find and replace a few things. As an example:
This is the code, and I want to search for pMax and tShot and replace the numbers next to them.
I would be grateful if someone could give me a hint on how this can be done using Python 3.x. Thank you.
This is my attempt:
There is no error, but for some reason re.sub() doesn't replace the pMax value with the desired one in the destination file, although printing the return value of re.sub shows that pMax is modified to the desired value.
vd_load_name = []
for i in range(len(Pressure_values)):
    vd_load_name.append('{}_{}MPa'.format(n,int(Pressure_values[i])))
src_dir = os.getcwd() #get the current working dir
# create a dir where we want to copy and rename
dest_dir = os.mkdir('vd_loads')
os.listdir()
dest_dir = src_dir+"/vd_loads"
for i in range(len(vd_load_name)):
    src_file = os.path.join(src_dir, 'n_t05_pressure_MPa.txt')
    shutil.copy(src_file,dest_dir) #copy the file to destination dir
    dst_file = os.path.join(dest_dir,'n_t05_pressure_MPa.txt')
    new_dst_file_name = os.path.join(dest_dir, vd_load_name[i]+'.txt')
    os.rename(dst_file, new_dst_file_name)#rename
    os.chdir(dest_dir)
    dest_file_path = dest_dir+'/'+vd_load_name[i]+'.txt'
    with open(dest_file_path,'r+') as reader:
        #reading all the lines in the file one by one
        for line in reader:
            re.sub(r"pMax=\d+", "pMax=" + str(int(Pressure_values[i])), line)
            print(re.sub(r"pMax=\d+", "pMax=" + str(int(Pressure_values[i])), line))
The actual part of the Fortran code that I want to edit:
integer :: i !shot index in x
integer :: j !shot index in y
integer :: sigma !1D overlap
dimension curCoords(nblock,ndim), velocity(nblock,ndim),dirCos(nblock,ndim,ndim), value(nblock)
character*80 sname
pMax=3900d0 !pressure maximum [MPa] Needs to be updated!!
fact=1d0 !correction factor
JLTYP=0 !key that identifies the distributed load type, 0 for Surface-based load
t=stepTime !current time[s]
tShot=1.2d-7 !Time of a shot[s] Needs to be updated!!
sigma=0 !1D overlap in x&y [%]
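One likely reason nothing changes in the file: re.sub returns a new string and never touches the file, so its result has to be kept and written back. Below is a minimal sketch of the substitution step only. The helper set_parameter and the FORTRAN_DOUBLE pattern are illustrative names that do not appear in the question, and the pattern assumes the values are written as Fortran double-precision literals such as 3900d0 or 1.2d-7.

```python
import re

# Matches a Fortran double-precision literal such as 3900d0 or 1.2d-7 (assumed format).
FORTRAN_DOUBLE = r"[0-9]*\.?[0-9]+[dD][+-]?[0-9]+"

def set_parameter(source, name, new_literal):
    """Replace the literal assigned to `name`, leaving the rest of each line untouched."""
    pattern = re.compile(r"(\b" + re.escape(name) + r"\s*=\s*)" + FORTRAN_DOUBLE)
    # A function replacement avoids ambiguity between group references and digits.
    return pattern.sub(lambda m: m.group(1) + new_literal, source)

source = (
    "pMax=3900d0 !pressure maximum [MPa]\n"
    "fact=1d0 !correction factor\n"
    "tShot=1.2d-7 !Time of a shot[s]\n"
)
updated = set_parameter(source, "pMax", "4200d0")
updated = set_parameter(updated, "tShot", "2.5d-7")
print(updated)
```

To apply this to a copied file, read its whole text, pass it through set_parameter, and write the returned string back to the same path; the comments after ! are preserved because only the literal is replaced.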

How to iterate over multiple files by name within given range?

So I'm trying to iterate over multiple XML files from a library that contains more than 100k files; I need to list files by their last 3 digits.
Expected result is a list of files named from 'asset-PD471090' to 'asset-PD471110' or 'asset-GT888185' to 'asset-GT888209', and so on.
My Code -
import glob
strtid = input('From ID: ') # First file in range
seps = strtid[-3:]
endid = input('To ID: ') # Last file in range
eeps = endid[-3:]
FileId = strtid[:5] # always same File Id for whole range
for name in glob.iglob('asset-' + FileId + [seps-eeps] + '.xml', recursive=True):
    print(name) # iterate over every file in given range and print file names.
The error I'm getting is
TypeError: unsupported operand type(s) for -: 'str' and 'str'
How can I load a specific range of input files?

As the error tells you, you are trying to use - on strings:
strtid = input('From ID: ') # string
seps = strtid[-3:] # part of a string
endid = input('To ID: ') # string
eeps = endid[-3:] # part of a string
FileId = strtid[:5] # also part of a string
# [seps-eeps]: trying to subtract a string from a string:
for name in glob.iglob('asset-' + FileId + [seps-eeps] + '.xml', recursive=True):
You can convert the string to an integer using int("1234"), but that won't help you much, because you would then have only one (wrong) number for your iglob pattern.
If you wanted to pass the two values as a glob pattern, you would need to put them inside string delimiters, and glob does not handle number ranges that way:
"[123-678]" would match a single character out of 1,2,3,4,5,6,7,8, not the numbers 123 up to 678.
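To see this character-class behaviour without touching the file system, here is a small sketch using the standard fnmatch module, which implements the same shell-style wildcard syntax that glob uses. The file names are made up for the demonstration.

```python
import fnmatch

pattern = "asset-GT[123-678].xml"

# The bracket expression matches exactly one character: 1, 2, 3 to 6, 7 or 8.
print(fnmatch.fnmatchcase("asset-GT5.xml", pattern))    # True: one allowed character
print(fnmatch.fnmatchcase("asset-GT9.xml", pattern))    # False: 9 is not in the set
print(fnmatch.fnmatchcase("asset-GT123.xml", pattern))  # False: three characters, not one
```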
However, you can test your files yourself:
import os
def get_files(directory, prefix, postfix, numbers):
    lp = len(prefix) # your assets-GT
    li = len(postfix) + 4 # your id + ".xml"
    for root, dirs, files in os.walk(directory):
        for file in sorted(files): # sorted to get files in order, might not need it
            if int(file[lp:len(file)-li]) in numbers:
                yield os.path.join(root,file)
d = "test"
prefix = "asset-GT" # input("Basename: ")
postfix = "185" # input("Id: ")
# create demo files to search into
os.makedirs(d, exist_ok=True) # exist_ok so the demo can be run more than once
for i in range(50,100):
    with open (os.path.join(d,f"{prefix}{i:03}{postfix}.xml"),"w") as f:
        f.write("")
# search params
fromto = "75 92" # input("From To (space separated numbers): ")
fr, to = map(int,fromto.strip().split())
to += 1 # range upper limit is exclusive, so need to add 1 to include it
all_searched = list(get_files("./test", prefix, postfix, range(fr,to)))
print(*all_searched, sep="\n")
Output:
./test/asset-GT075185.xml
./test/asset-GT076185.xml
./test/asset-GT077185.xml
./test/asset-GT078185.xml
./test/asset-GT079185.xml
./test/asset-GT080185.xml
./test/asset-GT081185.xml
./test/asset-GT082185.xml
./test/asset-GT083185.xml
./test/asset-GT084185.xml
./test/asset-GT085185.xml
./test/asset-GT086185.xml
./test/asset-GT087185.xml
./test/asset-GT088185.xml
./test/asset-GT089185.xml
./test/asset-GT090185.xml
./test/asset-GT091185.xml
./test/asset-GT092185.xml
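For the naming scheme in the question ('asset-PD471090' to 'asset-PD471110'), a variation is to compare the whole numeric part of each name instead of slicing by position. This is only a sketch: names_in_range is a hypothetical helper, and it assumes the user enters IDs such as PD471090 (letters followed by digits) and that files are named asset-<ID>.xml.

```python
import re

def names_in_range(names, first_id, last_id):
    """Yield names with first_id's letter prefix and a number within [first, last]."""
    id_pattern = re.compile(r"([A-Za-z]+)(\d+)$")
    letters, low = id_pattern.match(first_id).groups()
    high = id_pattern.match(last_id).group(2)
    name_pattern = re.compile(r"asset-" + re.escape(letters) + r"(\d+)\.xml$")
    for name in names:
        m = name_pattern.match(name)
        # Compare as integers so the range check is numeric, not textual.
        if m and int(low) <= int(m.group(1)) <= int(high):
            yield name

names = ["asset-PD471089.xml", "asset-PD471090.xml", "asset-PD471100.xml",
         "asset-PD471110.xml", "asset-PD471111.xml", "asset-GT471100.xml"]
print(list(names_in_range(names, "PD471090", "PD471110")))
```

The list of names could come from os.listdir or os.walk; it is hard-coded here so the example runs on its own.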

Python: Trouble indexing a list from .split()

I'm currently working on a folder rename program that will crawl a directory, and rename specific words to their abbreviated version. These abbreviations are kept in a dictionary. When I try to replace mylist[mylist.index(w)] with the abbreviation, it replaces the entire list. The list shows 2 values, but it is treating them like a single index. Any help would be appreciated, as I am very new to Python.
My current test environment has the following:
c:\test\Accounting 2018
My expected result when this is completed, is c:\test\Acct 2018
import os
keyword_dict = {
    'accounting': 'Acct',
    'documents': 'Docs',
    'document': 'Doc',
    'invoice': 'Invc',
    'invoices': 'Invcs',
    'operations': 'Ops',
    'administration': 'Admin',
    'estimate': 'Est',
    'regulations': 'Regs',
    'work order': 'WO'
}
path = 'c:\\test'
def format_path():
    for kw in os.walk(path, topdown=False):
        #split the output to separate the '\'
        usable_path = kw[0].split('\\')
        #pull out the last folder name
        string1 = str(usable_path[-1])
        #Split this output based on ' '
        mylist = [string1.lower().split(" ")]
        #Iterate through the folders to find any values in dictionary
        for i in mylist:
            for w in i:
                if w in keyword_dict.keys():
                    mylist[i.index(w)] = keyword_dict.get(w)
        print(mylist)
format_path()
When I use print(mylist) prior to the index replacement, I get ['accounting', '2018'], and print(mylist[0]) returns the same result.
After the index replacement, print(mylist) returns ['Acct']; the '2018' is gone as well.
Why is it treating the list values as a single index?

I didn't test the following, but it should point in the right direction. First, I am not sure that splitting on spaces is the way to go: "Accounting 2018" could just as well show up as accounting2018 or accounting_2018, so a regular expression would be more robust. Anyway, here is a slightly modified version of your code:
import os
keyword_dict = {
    'accounting': 'Acct',
    'documents': 'Docs',
    'document': 'Doc',
    'invoice': 'Invc',
    'invoices': 'Invcs',
    'operations': 'Ops',
    'administration': 'Admin',
    'estimate': 'Est',
    'regulations': 'Regs',
    'work order': 'WO'
}
path = 'c:\\test'
def format_path():
    for kw in os.walk(path, topdown=False):
        #split the output to separate the '\'
        usable_path = kw[0].split('\\')
        #pull out the last folder name
        string1 = str(usable_path[-1])
        #Split this output based on ' '
        mylist = string1.lower().split(" ") #No surrounding [] here: that created a list within a list for no reason
        #Iterate through the folders to find any values in dictionary
        for i in range(0,len(mylist)):
            abbreviation=keyword_dict.get(mylist[i],'')
            if abbreviation!='': #abbreviation exists so overwrite it
                mylist[i]=abbreviation
        new_path=" ".join(mylist) #create new path (i.e. ['Acct', '2018'] ==> Acct 2018)
        usable_path[len(usable_path)-1]=new_path #replace the last item in the original path then rejoin the path
        print("\\".join(usable_path))

What you need is:
import re, os
regex = "|".join(keyword_dict.keys())
repl = lambda x : keyword_dict.get(x.group().lower())
path = 'c:\\test'
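# flags must be passed by keyword: the fourth positional argument of re.sub is count
[re.sub(regex, repl, i[0], flags=re.I) for i in os.walk(path)]
Make sure the above produces the names you expect (so far it is working as expected) before you actually rename anything.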
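Here is a self-contained sketch of the same regex idea that can be tried without walking a directory. It is not the answerer's exact code: the abbreviate helper, the word boundaries, and the longest-key-first ordering are additions for illustration, and the dictionary is shortened.

```python
import re

keyword_dict = {
    "accounting": "Acct",
    "documents": "Docs",
    "document": "Doc",
    "work order": "WO",
}

# Longest keys first so that "documents" is tried before "document".
keys = sorted(keyword_dict, key=len, reverse=True)
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b", flags=re.I)

def abbreviate(folder_name):
    """Replace every known keyword in folder_name with its abbreviation."""
    return pattern.sub(lambda m: keyword_dict[m.group().lower()], folder_name)

print(abbreviate("Accounting 2018"))       # Acct 2018
print(abbreviate("Work Order Documents"))  # WO Docs
```

Unlike splitting on spaces, this also handles the two-word key 'work order', which can never match a single element of a space-split list.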

How to get the file with the latest yymm in a Directory?

There are several files in a directory whose names contain digits as well as non-digits, for example abc1710.csv, xyz1709.txt, abc1708.txt, abc.txt, xyz.csv.
I want to extract only the latest YYMM from the fileName.
FileNames = (next(os.walk('C:\\Python34\\PyScript'))[2])
def check_file_name(f):
    try:
        digits = f[-4:]
        if len(digits) != 4:
            return False
        int(f[-4:])
    except:
        return False
    return True
# first filter out bad file names:
good_filenames = [x for x in FileNames if check_file_name(x)]
# now run the code on "good names" only:
fileName=(max(good_filenames))
value=(fileName[-4:])
result = re.sub(r'[a-z]+', '', fileName)
print(result)

The function check_file_name assumes the extension has already been trimmed, but you pass it the entire file name. This can be fixed as follows (assuming a three-character extension such as .csv or .txt):
good_filenames = [x for x in FileNames if check_file_name(x[:-4])]
In addition, if you want to get the largest number, you also have to make your comparison on the numbers, not on the filenames:
fileName=max(good_filenames, key=lambda x: int(x[-8:-4]))
Future suggestions:
Don't use CamelCase, except for class names.
There's no need to put parentheses around function calls (e.g. (max(good_filenames))).
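Putting both fixes together, a minimal sketch might look like the following. It departs from the answer in two labelled ways: it uses os.path.splitext instead of slicing off four characters, so the extension length does not matter, and it uses a regular expression to find the trailing four digits. The function name latest_yymm is made up.

```python
import os
import re

def latest_yymm(file_names):
    """Return the largest trailing four-digit YYMM found in the stems, or None."""
    candidates = []
    for name in file_names:
        stem, _ext = os.path.splitext(name)   # drops ".csv", ".txt", ...
        match = re.search(r"(\d{4})$", stem)  # four digits at the end of the stem
        if match:
            candidates.append(match.group(1))
    # Compare numerically; this ordering only holds within a single century.
    return max(candidates, key=int, default=None)

names = ["abc1710.csv", "xyz1709.txt", "abc1708.txt", "abc.txt", "xyz.csv"]
print(latest_yymm(names))  # 1710
```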

Python changing file name

My application lets the user export its results. It exports text files named Exp_Text_1, Exp_Text_2, etc. If a file with the same name already exists on the Desktop, I want the numbering to continue upwards from that number. For example, if a file named Exp_Text_3 is already on the Desktop, then I want the new file to be named Exp_Text_4.
This is my code:
if len(str(self.Output_Box.get("1.0", "end"))) == 1:
    self.User_Line_Text.set("Nothing to export!")
else:
    import os.path
    self.txt_file_num = self.txt_file_num + 1
    file_name = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt" + "_" + str(self.txt_file_num) + ".txt")
    file = open(file_name, "a")
    file.write(self.Output_Box.get("1.0", "end"))
    file.close()
    self.User_Line_Text.set("A text file has been exported to Desktop!")

You likely want os.path.exists:
>>> import os
>>> help(os.path.exists)
Help on function exists in module genericpath:
exists(path)
Test whether a path exists. Returns False for broken symbolic links
A very basic example would be to create a file name containing a formatting mark, so that the number can be inserted for each check:
import os
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
#the "{}" is a formatting mark so we can do name_to_format.format(num)
num = 1
while os.path.exists(name_to_format.format(num)):
    num+=1
new_file_name = name_to_format.format(num)
This checks each file name, starting with Exp_Txt_1.txt, then Exp_Txt_2.txt, etc., until it finds one that does not exist.
However the format mark may cause a problem if curly brackets {} are part of the rest of the path, so it may be preferable to do something like this:
import os
def get_file_name(num):
    return os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_" + str(num) + ".txt")
num = 1
while os.path.exists(get_file_name(num)):
    num+=1
new_file_name = get_file_name(num)
EDIT: why don't we need the get_file_name function in the first example?
First off, if you are unfamiliar with str.format, you may want to look at Python doc - common string operations and/or this simple example:
text = "Hello {}, my name is {}."
x = text.format("Kotropoulos","Tadhg")
print(x)
print(text)
The path string is figured out with this line:
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
But it has {} in place of the desired number (since we don't know what the number should be at this point), so if the path was, for example:
name_to_format = "/Users/Tadhg/Desktop/Exp_Txt_{}.txt"
then we can insert a number with:
print(name_to_format.format(1))
print(name_to_format.format(2))
This does not change name_to_format: str objects are immutable, so .format returns a new string without modifying name_to_format. However, we would run into a problem if our path was something like one of these:
name_to_format = "/Users/Bob{Cat}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Bobcat{}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Smiley{:/Desktop/Exp_Txt_{}.txt"
Since the formatting mark we want to use is no longer the only pair of curly brackets, we can get a variety of errors:
KeyError: 'Cat'
IndexError: tuple index out of range
ValueError: unmatched '{' in format spec
So you only want to rely on str.format when you know it is safe to use. Hope this helps, have fun coding!
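If you do want to keep the str.format approach with such paths, one option is to escape the braces in the directory part first: str.format treats a doubled brace as a literal brace. This is a sketch under that assumption, and make_template is a hypothetical helper name.

```python
import os

def make_template(directory, stem="Exp_Txt_", ext=".txt"):
    """Build a str.format template whose only replacement field is the counter."""
    # Doubling braces makes str.format treat them as literal characters.
    safe_dir = directory.replace("{", "{{").replace("}", "}}")
    return os.path.join(safe_dir, stem + "{}" + ext)

template = make_template("/Users/Bob{Cat}/Desktop")
print(template.format(3))  # the braces around Cat survive, only the counter is filled in
```

The returned template can be used exactly like name_to_format in the first example, including in the while loop with os.path.exists.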
