Lua pattern to match the path - string

I would like to take a string representing a file path, strip off the file name and save just the path.
For example, if I have:
"/folder1/folder2/file.name"
I would like to end up with "/folder1/folder2/" in my string.
I've been playing around with string.match() as documented here: http://lua-users.org/wiki/StringLibraryTutorial
I have the following code:
mystring = "/var/log/test.log"
print(string.match(mystring, "%/"))
When I run this script, I end up with just a '/' returned.
I was expecting that it would return the positions of the two '/' in the string.
I've also tried replacing the pattern "%/" with just "/" but that gives me the same results.
I'm sure I'm missing something very simple but I can't see what it is.

With the pattern %/ or /, you are telling string.match to look for the literal string /, and that is exactly what you got. Try this instead:
local mystring = "/var/log/test.log"
print(string.match(mystring, ".+/"))
Here the pattern .+/ means one or more of any character (.) followed by a /. The + quantifier is greedy, i.e. it matches as much as possible, so the match extends to the last / in the string.

Try any of these options:
local mystring = "/var/log/test.log"
-- gsub also returns the substitution count; the extra parentheses keep only the string
print((mystring:gsub('([%w+]+%.%w+)$','')))
output: /var/log/
local mystring = "/var/log/test.log"
print(mystring:match('^(/.+/)'))
output: /var/log/
Or use a simple function:
function anystring(path)
    -- capture everything from the leading / up to the last /
    local dir = path:match('^(/.+/)')
    if dir then
        return dir
    else
        return false
    end
end
local mystring = anystring('/var/log/test.log')
print(mystring)
output: /var/log/
Make the pattern as specific as you can to avoid unexpected matches.

Related

Regex With Lookahead For Fixed Length String

import re
strings = [
    r"C:\Photos\Selfies\1|",
    r"C:\HDPhotos\Landscapes\2|",
    r"C:\Filters\Pics\12345678|",
    r"C:\Filters\Pics2\00000000|",
    r"C:\Filters\Pics2\00000000|XAV7"
]
for string in strings:
    matchptrn = re.match(r"(?P<file_path>.*)(?!\d{8})", string)
    if matchptrn:
        print("FILE PATH = "+matchptrn.group('file_path'))
I am trying to get this regular expression with a lookahead to work the way I thought it would. Examples of lookaheads on most websites seem to be pretty basic string matches, e.g. not matching 'bar' if it is preceded by 'foo' as an example of a negative lookbehind.
My goal is to capture in the group file_path the actual file path only if the string does NOT have an 8-digit number just before the pipe symbol |, and to match anything after the pipe symbol in another group (something I haven't implemented here).
So in the above example it should match only the first two strings:
C:\Photos\Selfies\1
C:\HDPhotos\Landscapes\2
In case of the last string
C:\Filters\Pics2\00000000|XAV7
I'd like to match C:\Filters\Pics2\00000000 in <file_path> and match XAV7 in another named group.
(This is something I can figure out on my own if I get some help with the negative lookahead.)
Currently <file_path> matches everything, which makes sense since .* is greedy.
I want it to capture only if the last part of the string before the pipe symbol is NOT an 8-digit number.
OUTPUT OF CODE SNIPPET PASTED BELOW
FILE PATH = C:\Photos\Selfies\1|
FILE PATH = C:\HDPhotos\Landscapes\2|
FILE PATH = C:\Filters\Pics\12345678|
FILE PATH = C:\Filters\Pics2\00000000|
FILE PATH = C:\Filters\Pics2\00000000|XAV7
Adding \\ before the lookahead:
matchptrn = re.match(r"(?P<file_path>.*)\\(?!\d{8})", string)
if matchptrn:
    print("FILE PATH = "+matchptrn.group('file_path'))
makes things worse as the output is
FILE PATH = C:\Photos\Selfies
FILE PATH = C:\HDPhotos\Landscapes
FILE PATH = C:\Filters
FILE PATH = C:\Filters
FILE PATH = C:\Filters
Can someone please explain this as well?
You can use
^(?!.*\\\d{8}\|$)(?P<file_path>.*)\|(?P<suffix>.*)
See the regex demo.
Details
^ - start of a string
(?!.*\\\d{8}\|$) - fail the match if the string contains \ followed with eight digits and then | at the end of string
(?P<file_path>.*) - Group "file_path": any zero or more chars other than line break chars as many as possible
\| - a pipe
(?P<suffix>.*) - Group "suffix": the rest of the string, any zero or more chars other than line break chars, as many as possible.
See the Python demo:
import re
strings = [
r"C:\Photos\Selfies\1|",
r"C:\HDPhotos\Landscapes\2|",
r"C:\Filters\Pics\12345678|",
r"C:\Filters\Pics2\00000000|",
r"C:\Filters\Pics2\00000000|XAV7"
]
for string in strings:
    matchptrn = re.match(r"(?!.*\\\d{8}\|$)(?P<file_path>.*)\|(?P<suffix>.*)", string)
    if matchptrn:
        print("FILE PATH = {}, SUFFIX = {}".format(*matchptrn.groups()))
Output:
FILE PATH = C:\Photos\Selfies\1, SUFFIX =
FILE PATH = C:\HDPhotos\Landscapes\2, SUFFIX =
FILE PATH = C:\Filters\Pics2\00000000, SUFFIX = XAV7
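As for why the two attempts in the question behave as they do, the following sketch replays them on one of the sample strings. It uses only re.match from the standard library; the names sample, m1, and m2 are made up for illustration.

```python
import re

sample = r"C:\Filters\Pics\12345678|"

# Attempt 1: the greedy .* consumes the whole string. At the very end
# nothing follows, so the negative lookahead (?!\d{8}) trivially succeeds.
m1 = re.match(r"(?P<file_path>.*)(?!\d{8})", sample)
print(m1.group("file_path"))

# Attempt 2: .* backtracks until it finds a backslash that is NOT
# followed by eight digits, which here is the one before "Pics".
m2 = re.match(r"(?P<file_path>.*)\\(?!\d{8})", sample)
print(m2.group("file_path"))
```

In both cases the lookahead is tested only at the position where the rest of the pattern stops, which is why the answer above moves it to the start of the string and lets it scan ahead with .* instead.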

re.MULTILINE flag is interfering with the end of line $ operator

Sorry if this is a duplicate/basic question, I couldn't find any similar questions.
I have the following multiline string
my_txt = """
foo.exe\n
bar.exec\n
abab.exe\n
"""
(The newlines aren't actually written in my code, I put them there for clarity).
I want to match every file that ends with a .exe, (not .exec).
My regex was initially:
my_reg = re.compile(".+[.](?=exe$)")
my_matches = my_reg.finditer(my_txt)
I hoped that it would first find every character, go back until it found the ., and then check if the characters exe and a newline followed.
Only one match was found, and that was:
abab.exe.
I tried to mess around a bit, and changed the first line to:
my_reg = re.compile(".+[.](?=exe$)",flags=re.MULTILINE)
This time, it successfully ran, returning
foo.
abab.
I thought re.MULTILINE wasn't supposed to interfere with the $ operator, or am I wrong about the $ operator/misusing something?
Thanks in advance!
You do need the multiline flag; otherwise $ only matches at the very end of your input (or just before a trailing newline there). With re.MULTILINE, $ also matches at the end of every line. You just need to match exe instead of using a lookahead:
my_reg = re.compile(".+[.]exe$", re.MULTILINE)
Output:
['foo.exe', 'abab.exe']
If you are trying to match the filename without the extension, you can put the period inside the lookahead:
my_reg = re.compile(r".+(?=\.exe$)", re.MULTILINE)
Output:
['foo', 'abab']
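A minimal sketch of how the flag changes $, assuming the sample text is written as a single string literal with one file name per line (the names without_flag, with_flag, and stems are illustrative):

```python
import re

# Three file names, one per line, as in the question.
my_txt = "foo.exe\nbar.exec\nabab.exe\n"

pattern = r".+[.]exe$"

# Without re.MULTILINE, $ matches only at the end of the string
# (or just before a final trailing newline).
without_flag = re.findall(pattern, my_txt)

# With re.MULTILINE, $ also matches just before every newline.
with_flag = re.findall(pattern, my_txt, flags=re.MULTILINE)

# Lookahead variant: keep the file name, leave the extension unmatched.
stems = re.findall(r".+(?=\.exe$)", my_txt, flags=re.MULTILINE)

print(without_flag)  # ['abab.exe']
print(with_flag)     # ['foo.exe', 'abab.exe']
print(stems)         # ['foo', 'abab']
```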

os.path.exists() always returns false

I am trying to check whether a file exists in a specified directory. If it does, I want to move the file to another directory. Here is my code:
import os
import shutil
def move(pnin, pno):
    if os.path.exists(pnin):
        shutil.move(pnin, pno)
Here is an example of pnin and pno:
pnin='D:\\extracted\\extrimg_2016000055202500\\2016000055202500_65500000007006_11_6.png'
pno='D:\folder\discarded'
I have a bit more than 8000 input directories. I copied this pnin from the output of print(pnin). When I define pnin externally as in the example, the if statement works. But when I run the move function iteratively, the body of the if statement is never executed. What could be the problem, and how can I solve it?
Here is how I call the move function:
def clean_Data(inputDir, outDir):
    if (len(listf) > 1):
        for l in range(1,len(listf)):
            fname = hashmd5[m][l]
            pathnamein = os.path.join(inputDir, fname)
            pathnamein = "%r"%pathnamein
            pathnameout = outfile
            move(pathnamein, pathnameout)
When I try the code below, it does not give any output. The for loop is working: when I use print(pathnamein) in the for loop, it shows all the values of pathnamein.
def move(pnin, pno):
    os.path.exists(pnin)
You should use backslash to escape backslashes in your pno string:
pno='D:\\folder\\discarded'
or use a raw string instead:
pno=r'D:\folder\discarded'
Otherwise \f would be considered a formfeed character.
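A small sketch of the difference between the three spellings, using a shortened path for illustration (the names plain, escaped, raw, and quoted are made up here). The last lines are an additional observation rather than part of the answer above: formatting a path with "%r" yields its repr, which includes quote characters, so the result is no longer the same string as the path.

```python
plain = 'D:\folder'      # \f is read as one form feed character
escaped = 'D:\\folder'   # a doubled backslash gives one literal backslash
raw = r'D:\folder'       # a raw string keeps the backslash as written

print(len(plain), len(escaped), len(raw))  # 8 9 9
print('\x0c' in plain)                     # True: the form feed is in there
print(escaped == raw)                      # True

# Assumption: this mirrors the "%r" line in clean_Data from the question.
quoted = "%r" % raw
print(quoted)          # the repr, wrapped in quotes
print(quoted == raw)   # False
```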

Lua pattern to stop when end of line

I need help with a Lua pattern that should stop reading at a line break.
My code:
function getusers(file)
    local list, close = {}
    local user, value = string.match(file,"(UserName=)(.*)")
    print(value)
    f:close()
end
f = assert(io.open('file2.ini', "r"))
local t = f:read("*all")
getusers(t)
--file2.ini--
user=a
UserName=Tom
Password=xyz
UserName=Jane
Output of script using file2.ini:
Tom
Password=xyz
UserName=Jane
How to get the pattern to stop after it reaches the end of line?
You can use the pattern
"(UserName=)(.-)\n"
Note that besides the extra \n, the lazy modifier - is used instead of *.
As @lhf points out, make sure the file ends with a newline. I think you can append a \n to the string manually before matching.

matlab iterative filenames for saving

This question is about MATLAB:
I'm running a loop, and each iteration produces a new set of data that I want to save in a new file each time. I also overwrite old files by changing the name. It looks like this:
name_each_iter = strrep(some_source,'.string.mat','string_new.(j).mat')
What I'm struggling with here is the iteration, so that I obtain files:
...string_new.1.mat
...string_new.2.mat
etc.
I tried various combinations of () [] {} as well as 'string_new.'j'.mat' (which gave a syntax error).
How can it be done?
Strings are just vectors of characters. So if you want to iteratively create filenames here's an example of how you would do it:
for j = 1:10
    filename = ['string_new.' num2str(j) '.mat'];
    disp(filename)
end
The above code will create the following output:
string_new.1.mat
string_new.2.mat
string_new.3.mat
string_new.4.mat
string_new.5.mat
string_new.6.mat
string_new.7.mat
string_new.8.mat
string_new.9.mat
string_new.10.mat
You could also generate all file names in advance using NUM2STR:
>> filenames = cellstr(num2str((1:10)','string_new.%02d.mat'))
filenames =
'string_new.01.mat'
'string_new.02.mat'
'string_new.03.mat'
'string_new.04.mat'
'string_new.05.mat'
'string_new.06.mat'
'string_new.07.mat'
'string_new.08.mat'
'string_new.09.mat'
'string_new.10.mat'
Now access the cell array contents as filenames{i} in each iteration.
sprintf is very useful for this:
for ii=5:12
    filename = sprintf('data_%02d.mat',ii)
end
This assigns the following strings to filename:
data_05.mat
data_06.mat
data_07.mat
data_08.mat
data_09.mat
data_10.mat
data_11.mat
data_12.mat
Notice the zero padding. sprintf in general is useful if you want parameterized formatted strings.
To create a name based on an already existing file, you can use regexp to detect the '_new.(number).mat' part and change the string depending on what regexp finds:
original_filename = 'data.string.mat';
% start index of the match, empty if none; the dots are escaped so they match literally
im = regexp(original_filename,'_new\.\d+\.mat');
if isempty(im) % original file, no _new.(j) detected
    newname = [original_filename(1:end-4) '_new.1.mat'];
else
    % '_new.' is 5 characters long, so the number starts at im(end)+5
    num = str2double(original_filename(im(end)+5:end-4));
    newname = sprintf('%s_new.%d.mat',original_filename(1:im(end)-1),num+1);
end
This does exactly that, and produces:
data.string_new.1.mat
data.string_new.2.mat
data.string_new.3.mat
...
data.string_new.9.mat
data.string_new.10.mat
data.string_new.11.mat
when iterating the above code, starting with 'data.string.mat'.

Resources