Regex With Lookahead For Fixed Length String - python-3.x

import re

strings = [
    r"C:\Photos\Selfies\1|",
    r"C:\HDPhotos\Landscapes\2|",
    r"C:\Filters\Pics\12345678|",
    r"C:\Filters\Pics2\00000000|",
    r"C:\Filters\Pics2\00000000|XAV7"
]
for string in strings:
    matchptrn = re.match(r"(?P<file_path>.*)(?!\d{8})", string)
    if matchptrn:
        print("FILE PATH = "+matchptrn.group('file_path'))
I am trying to get this regular expression with a lookahead to work the way I thought it would. Examples of lookaheads on most websites seem to be pretty basic string matches, e.g. not matching 'bar' if it is preceded by 'foo' as an example of a negative lookbehind.
My goal is to capture the actual file path in the group file_path only if the string does NOT have an 8-digit number just before the pipe symbol |, and to match anything after the pipe symbol in another group (something I haven't implemented here).
So in the above example it should match only the first two strings:
C:\Photos\Selfies\1
C:\HDPhotos\Landscapes\2
In case of the last string
C:\Filters\Pics2\00000000|XAV7
I'd like to match C:\Filters\Pics2\00000000 in <file_path> and match XAV7 in another named group.
(This is something I can figure out on my own if I get some help with the negative lookahead.)
Currently <file_path> matches everything, which makes sense since (.*) is greedy.
I want it to capture only if the last part of the string before the pipe symbol is NOT an 8-digit number.
OUTPUT OF CODE SNIPPET PASTED BELOW
FILE PATH = C:\Photos\Selfies\1|
FILE PATH = C:\HDPhotos\Landscapes\2|
FILE PATH = C:\Filters\Pics\12345678|
FILE PATH = C:\Filters\Pics2\00000000|
FILE PATH = C:\Filters\Pics2\00000000|XAV7
Making this modification (adding \\ before the lookahead)
matchptrn = re.match(r"(?P<file_path>.*)\\(?!\d{8})", string)
if matchptrn:
    print("FILE PATH = "+matchptrn.group('file_path'))
makes things worse as the output is
FILE PATH = C:\Photos\Selfies
FILE PATH = C:\HDPhotos\Landscapes
FILE PATH = C:\Filters
FILE PATH = C:\Filters
FILE PATH = C:\Filters
Can someone please explain this as well?

You can use
^(?!.*\\\d{8}\|$)(?P<file_path>.*)\|(?P<suffix>.*)
See the regex demo.
Details
^ - start of a string
(?!.*\\\d{8}\|$) - fail the match if the string contains \ followed by eight digits and then | at the end of the string
(?P<file_path>.*) - Group "file_path": any zero or more chars other than line break chars, as many as possible
\| - a pipe
(?P<suffix>.*) - Group "suffix": the rest of the string, any zero or more chars other than line break chars, as many as possible.
See the Python demo:
import re
strings = [
    r"C:\Photos\Selfies\1|",
    r"C:\HDPhotos\Landscapes\2|",
    r"C:\Filters\Pics\12345678|",
    r"C:\Filters\Pics2\00000000|",
    r"C:\Filters\Pics2\00000000|XAV7"
]
for string in strings:
    matchptrn = re.match(r"(?!.*\\\d{8}\|$)(?P<file_path>.*)\|(?P<suffix>.*)", string)
    if matchptrn:
        print("FILE PATH = {}, SUFFIX = {}".format(*matchptrn.groups()))
Output:
FILE PATH = C:\Photos\Selfies\1, SUFFIX =
FILE PATH = C:\HDPhotos\Landscapes\2, SUFFIX =
FILE PATH = C:\Filters\Pics2\00000000, SUFFIX = XAV7
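As for why the two attempts in the question behave as they do, here is a small sketch (the variable names are illustrative only). It suggests that the greedy .* first consumes the whole string, so the lookahead is only tested at the very end, where no eight digits can follow and it trivially succeeds. Adding \\ forces the engine to backtrack to the last backslash that is not followed by eight digits.

```python
import re

sample = r"C:\Filters\Pics\12345678|"

# Greedy .* takes the whole string; the negative lookahead is then checked
# at the end of the string, where nothing follows, so it always succeeds.
whole = re.match(r"(?P<file_path>.*)(?!\d{8})", sample).group("file_path")

# With a literal backslash required, .* backtracks to the last backslash.
# The one before 12345678 is rejected by the lookahead, so the engine
# backtracks further to the backslash before "Pics".
trimmed = re.match(r"(?P<file_path>.*)\\(?!\d{8})", sample).group("file_path")

print(whole)    # C:\Filters\Pics\12345678|
print(trimmed)  # C:\Filters
```

This is why anchoring the lookahead at the start of the string, where it can inspect everything ahead of it, works better than placing it after .*.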

Related

How to find a line which contains a string without any suffix and prefix in a string?

I tried to find a solution on different platforms but couldn't, so I am asking here.
I am reading a file and looking for the line that contains a specific string (user input). The problem is that my code picks up every line containing that string. For example:
Here the user input is: "Mon_ErrEntryEspSqPlaus"
Output line:
/begin MEASUREMENT Icsp_Dem_Deb_LfEve_Mon_ErrEntryEspSqPlaus
Here the string in the output line has a prefix attached, which is not intended.
I only want to read the line below:
941 "Mon_ErrEntryEspSqPlaus"
In the above line, the user input string has no suffix or prefix.
Here is the code:
import re

def a2l_reader(parameter):
    count = 0
    count_1 = 0
    with open("TPT.a2l", errors='replace') as myfile:
        for num, line in enumerate(myfile, 1):
            if parameter in line:
                if re.match(r'sample', line):
                    count += 1
                else:
                    count_1 += 1
    print(count)
    print(count_1)
The question is how to find the specific line that contains the string without any suffix or prefix, since I need to use the number associated with that string.
Thanks in advance
Instead of
if parameter in line:
you can simply do
if parameter == line:
and it will only proceed if the whole line is exactly equal to your input (note that a line read from a file still carries its trailing newline, so strip it first). The first example (which is the one you have in your code) will match whenever your input appears anywhere in the line as a substring.
If the string is only one token on the line, you can instead split the line on spaces and check membership with in:
Split by spaces and then check in the list:
if parameter in re.split("( )",line):
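Because the target line in the question also carries a number and quotes, another option is a regex with lookarounds. The sketch below assumes that "prefix" and "suffix" mean adjacent word characters (letters, digits, underscore); find_exact_lines is a hypothetical helper name, and the sample lines are copied from the question.

```python
import re

def find_exact_lines(lines, parameter):
    """Return (line_number, line) pairs where parameter appears as a whole token."""
    # (?<!\w) and (?!\w) reject matches glued to other word characters;
    # re.escape keeps any special characters in the user input literal.
    pattern = re.compile(r"(?<!\w)" + re.escape(parameter) + r"(?!\w)")
    return [(num, line.strip())
            for num, line in enumerate(lines, 1)
            if pattern.search(line)]

sample = [
    "/begin MEASUREMENT Icsp_Dem_Deb_LfEve_Mon_ErrEntryEspSqPlaus\n",
    '941 "Mon_ErrEntryEspSqPlaus"\n',
]
hits = find_exact_lines(sample, "Mon_ErrEntryEspSqPlaus")
print(hits)  # [(2, '941 "Mon_ErrEntryEspSqPlaus"')]

# The number associated with the string is the first field of the matched line
number = hits[0][1].split()[0]
print(number)  # 941
```

An open file object can be passed as lines, since the helper only iterates over it.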

How to find a substring in a line from a text file and add that line or the characters after the searched string into a list using Python?

I have a MIB dataset which is around 10k lines. I want to find a certain string (e.g. "SNMPv2-MIB::sysORID") in the text file and add the whole line to a list. I am using Jupyter Notebooks to run the code.
I used the code below to search for the string, and it prints the searched string along with the next two strings.
import re
basic = open('mibdata.txt')
file = basic.read()
city_name = re.search(r"SNMPv2-MIB::sysORID(?:[^a-zA-Z'-]+[a-zA-Z'-]+){1,2}", file)
city_name = city_name.group()
print(city_name)
Sample lines in file:
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB
SNMPv2-MIB::sysORDescr.1 = STRING: The MIB for Message Processing and Dispatching.
The expected output is
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB
but I get only
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB
The problem with changing the number of strings matched after the searched string is that each line contains a different number of strings, so I cannot specify a constant. Instead I want to use '\n' as a delimiter, but I could not find a post that covers this.
P.S. Any other solution is also welcome.
EDIT
You can read the lines of the file one by one and test each against a regex that matches your case.
r"(SNMPv2-MIB::sysORID).*" finds the string in the parentheses and then matches everything after it up to the end of the line.
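import re
basic = open('file.txt')
# Keep the matched text for each line, or "" when the line does not match
entries = map(lambda x : re.search(r"(SNMPv2-MIB::sysORID).*",x).group() if re.search(r"(SNMPv2-MIB::sysORID).*",x) is not None else "", basic.readlines())
# Compare with != rather than "is not", which tests identity instead of equality
non_empty_entries = list(filter(lambda x : x != "", entries))
basic.close()
print(non_empty_entries)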
If you are not comfortable with lambdas, what the above script does is
take the text from the file, split it into lines and check each line individually for a regex match.
entries holds, for every line, either the matched text or an empty string.
EDIT vol2
Now, when the regex doesn't match, an empty string is added, and these are filtered out afterwards.
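Since the search string is a fixed substring, a regex is not strictly needed. As a minimal sketch of the same idea using the in operator (the helper names lines_containing and text_after are made up for illustration, and the sample lines are those from the question):

```python
def lines_containing(lines, needle):
    """Return every line that contains needle, without the trailing newline."""
    return [line.rstrip("\n") for line in lines if needle in line]

def text_after(lines, needle):
    """Return only the characters that follow needle on each matching line."""
    return [line.partition(needle)[2].rstrip("\n") for line in lines if needle in line]

sample = [
    "SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB\n",
    "SNMPv2-MIB::sysORDescr.1 = STRING: The MIB for Message Processing and Dispatching.\n",
]
whole_lines = lines_containing(sample, "SNMPv2-MIB::sysORID")
tails = text_after(sample, "SNMPv2-MIB::sysORID")
print(whole_lines)
print(tails)
```

With a real file, the open file object can be passed directly as lines, for example inside a with open('mibdata.txt') as f: block.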

how do i open a file in python with variable name?

I am trying to open a file that has a random date and time stamp as part of its name. Most of the filename is known, e.g. filename = 2017_01_23_624.txt is my test name. The date and numbers are the part I am trying to replace with something unknown, as it will change.
My question relates to opening an existing file, not creating a new filename. The filenames are created by a separate program and I must be able to open them. I also want to change them once they are open.
I have been trying to construct the filename string as follows, but I get invalid syntax.
filename = "testfile" + %s + ".txt"
print(filename)
fo = open(filename)
You can't open a file with a partial filename containing a wildcard for it to match. What you would have to do is look at all the files in your directory and pick the one that matches best.
Simple example:
import os
filename = "filename2017_" # known section
direc = r"directorypath"
matches = [ fname for fname in os.listdir(direc) if fname.startswith(filename) ]
print(matches)
You can however use the glob module (Thanks to bzimor) to pattern match your files. See glob docs for more info.
import glob, os
# ? - represents any single character in the filename
# * - represents any number of characters in the filename
direc = r"directorypath"
pattern = 'filename2017_??_??_*.txt'
matches = glob.glob(os.path.join(direc, pattern))
print(matches)
As with the first solution, you still get back a list of filenames to choose from and then open. But with the glob module you can match your files more precisely if you need to. It all depends on how tight you want the match to be.
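To go from the list of matches to an open file, one possibility is to pick a single match and pass it to open. The sketch below picks the most recently modified match, which is an assumption about what "best" means here; newest_match is a hypothetical helper name, and a temporary directory stands in for the real one.

```python
import glob
import os
import tempfile

def newest_match(directory, pattern):
    """Return the most recently modified path matching pattern, or None."""
    matches = glob.glob(os.path.join(directory, pattern))
    return max(matches, key=os.path.getmtime) if matches else None

# Demo in a throwaway directory so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "testfile2017_01_23_624.txt"), "w") as fh:
        fh.write("hello")

    path = newest_match(tmp, "testfile*.txt")
    found_name = os.path.basename(path)
    with open(path) as fo:
        content = fo.read()
    missing = newest_match(tmp, "nothing*.txt")

print(found_name, content)
```

Checking for None before calling open avoids an error when no file matches.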

matlab function replacing last part of strings between known characters

I have a text file TF including a set of the following kind of strings:
"linStru.twoZoneBuildingStructure.north.airLeakage.senTem.T",
"linStru.twoZoneBuildingStructure.north.vol.Xi[1]",
"linStru.twoZoneBuildingStructure.south.airLeakage.senTem.T",
"linStru.twoZoneBuildingStructure.south.vol.Xi[1]", "
"linStru.twoZoneBuildingStructure.north_ext.layMul.nMat[1].monoLayer1Nf.T[1]",
"linStru.twoZoneBuildingStructure.north_ext.layMul.nMat[1].monoLayer2Nf.T[2]",
Given a line L, starting from the end let the substring s denote the portion of the string between ," and the first .
To make it clearer, for L=1: s=T, for L=2: s=Xi[1], for L=5: s=T[1], etc.
Given a text file TF in the above format, I want to write a MATLAB function which takes TF and replaces the corresponding s on each line with der(s).
For example, the function should change the above strings as follows:
"linStru.twoZoneBuildingStructure.north.airLeakage.senTem.der(T)",
"linStru.twoZoneBuildingStructure.north.vol.der(Xi[1])",
"linStru.twoZoneBuildingStructure.south.airLeakage.senTem.der(T)",
"linStru.twoZoneBuildingStructure.south.vol.der(Xi[1])", "
"linStru.twoZoneBuildingStructure.north_ext.layMul.nMat[1].monoLayer1Nf.der(T[1])",
"linStru.twoZoneBuildingStructure.north_ext.layMul.nMat[1].monoLayer2Nf.der(T[2])",
How can such a function be written?
Something like
regexprep(TF, '\.([^.]+)",$', '.der($1)",', 'dotexceptnewline', 'lineanchors')
It finds the longest sequence of non-dot characters appearing between a dot before and quote-comma-endline after, and encloses that inside der( ).
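For readers who want to try the same pattern outside MATLAB, here is a rough Python equivalent applied to one line at a time. It is only an illustration of the regex, not a replacement for the MATLAB call; wrap_last_segment is a hypothetical helper name.

```python
import re

def wrap_last_segment(line):
    """Wrap the text between the last dot and the closing '",' in der( )."""
    # \.       the last dot (earlier dots fail because another dot follows)
    # ([^.]+)  the final run of non-dot characters, captured as group 1
    # ",$      the quote and comma that end the line
    return re.sub(r'\.([^.]+)",$', r'.der(\1)",', line)

lines = [
    '"linStru.twoZoneBuildingStructure.north.airLeakage.senTem.T",',
    '"linStru.twoZoneBuildingStructure.north_ext.layMul.nMat[1].monoLayer1Nf.T[1]",',
]
converted = [wrap_last_segment(line) for line in lines]
for line in converted:
    print(line)
```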
I see there is a small " typo on the fourth line of your text file. I'm going to remove this to make things simpler.
As such, the simplest way that I can see to do this is to iterate through all of your strings, remove the double quotes, then find the point in each string where the last . occurs. Extract the substring after it, then wrap it in der(). Assuming that those strings are in a text file called functions.txt, you would read in your text file using textread to read in the individual strings. As such:
names = textread('functions.txt', '%s');
names should now be a cell array where each element is one string, still enclosed in double quotes and followed by a comma. Use findstr to find where each . is located, then take the last of these locations. Extract the substring after it and wrap it in der(). In other words:
out_strings = cell(1, numel(names)); %// To store output strings
for idx = 1 : numel(names)
    %// Extract actual string without quotes and comma
    name_str = names{idx}(2:end-2);
    %// Find the last dot
    dot_locs = findstr(name_str, '.');
    %// Last dot location
    last_dot_loc = dot_locs(end);
    %// Extract substring after dot
    last_string = name_str(last_dot_loc+1:end);
    %// Create new string
    out_strings{idx} = ['"' name_str(1:last_dot_loc) 'der(' last_string ')",'];
end
This is the output I get:
celldisp(out_strings)
out_strings{1} =
"linStru.twoZoneBuildingStructure.north.airLeakage.senTem.der(T)",
out_strings{2} =
"linStru.twoZoneBuildingStructure.north.vol.der(Xi[1])",
out_strings{3} =
"linStru.twoZoneBuildingStructure.south.airLeakage.senTem.der(T)",
out_strings{4} =
"linStru.twoZoneBuildingStructure.south.vol.der(Xi[1])",
out_strings{5} =
"linStru.twoZoneBuildingStructure.north_ext.layMul.nMat[1].monoLayer1Nf.der(T[1])",
out_strings{6} =
"linStru.twoZoneBuildingStructure.north_ext.layMul.nMat[1].monoLayer2Nf.der(T[2])",
The last thing you want to do is write each line of text to your text file. You can use fopen to open a file for writing. fopen returns a file ID that is associated with the file you want to write to. You then use fprintf with this file ID to print each string followed by a newline. Finally, you close the file using fclose with the same file ID. As such, if we wanted to output a text file called functions_new.txt, we would do:
%// Open up the file and get ID
fid = fopen('functions_new.txt', 'w');
%// For each string we have...
for idx = 1 : numel(out_strings)
    %// Write the string to file and make a new line
    fprintf(fid, '%s\n', out_strings{idx});
end
%// Close the file
fclose(fid);
Another way to do it with regexprep:
str_out = regexprep(str_in, '\.([^\.]+)"$','\.der($1)"');
Example: for
str_in = {'"linStru.twoZoneBuildingStructure.north.airLeakage.senTem.T"'
'"linStru.twoZoneBuildingStructure.north.vol.Xi[1]"'};
this gives
str_out =
'"linStru.twoZoneBuildingStructure.north.airLeakage.senTem.der(T)"'
'"linStru.twoZoneBuildingStructure.north.vol.der(Xi[1])"'

Lua pattern to match the path

I would like to take a string representing a file path, strip off the file name and save just the path.
For example, if I have:
"/folder1/folder2/file.name"
I would like to end up with "/folder1/folder2/" in my string.
I've been playing around with string.match() as documented here: http://lua-users.org/wiki/StringLibraryTutorial
I have the following code:
mystring = "/var/log/test.log"
print(string.match(mystring, "%/"))
When I run this script, I end up with just a '/' returned.
I was expecting that it would return the positions of the two '/' in the string.
I've also tried replacing the pattern "%/" with just "/" but that gives me the same results.
I'm sure I'm missing something very simple but I can't see what it is.
With the pattern %/ or /, you are telling string.match to look for the string /, and that's what you got. Try this instead:
local mystring = "/var/log/test.log"
print(string.match(mystring, ".+/"))
Here the pattern .+/ means to look for one or more of any character (.) followed by a /. The + is greedy, i.e. it matches as much as possible, so the match runs up to the last / in the string.
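The greedy behaviour is not specific to Lua. As a quick illustration in Python (whose re syntax differs from Lua patterns, so this is only an analogy), a greedy quantifier runs to the last slash while a lazy one stops at the earliest slash it can:

```python
import re

mystring = "/var/log/test.log"

# Greedy .+ runs to the end of the string, then backtracks to the last "/"
directory = re.match(r".+/", mystring).group()

# Lazy .+? stops at the earliest "/" that lets the match succeed
shortest = re.match(r".+?/", mystring).group()

print(directory)  # /var/log/
print(shortest)   # /var/
```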
Try any of these options:
local mystring = "/var/log/test.log"
-- the extra parentheses drop gsub's second return value (the substitution count)
print((mystring:gsub('([%w+]+%.%w+)$','')))
output: /var/log/
local mystring = "/var/log/test.log"
print(mystring:match('^(/.+/)'))
output: /var/log/
Or use a simple function:
function anystring(path)
  -- capture everything from the leading / up to the last /
  local dir = path:match('^(/.+/)')
  if dir then
    return dir
  else
    return false
  end
end
local mystring = anystring('/var/log/test.log')
print(mystring)
output: /var/log/
Make the pattern as specific as you can to avoid unexpected matches.
