How to find a line which contains a string without any suffix and prefix in a string? - python-3.x

I tried to find a solution on different platforms, but I couldn't, so I am asking here.
I am reading lines from a file, looking for the line that contains a specific string (user input). The problem is that my code matches every line that contains the string. For example:
The user input is: "Mon_ErrEntryEspSqPlaus"
Matched line:
/begin MEASUREMENT Icsp_Dem_Deb_LfEve_Mon_ErrEntryEspSqPlaus
Here the string in the matched line has a prefix attached to it, which is not intended.
Instead, I only want to match this line:
941 "Mon_ErrEntryEspSqPlaus"
where the user input string has no suffix or prefix.
Here is the code:
import re

def a2l_reader(parameter):
    count = 0
    count_1 = 0
    with open("TPT.a2l", errors='replace') as myfile:
        for num, line in enumerate(myfile, 1):
            if parameter in line:
                if re.match(r'sample', line):
                    count += 1
                else:
                    count_1 += 1
    print(count)
    print(count_1)
The question is: how do I find the line that contains the exact string, with no suffix or prefix? I need this because I have to use the number associated with that string.
Thanks in advance.

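Instead of
if parameter in line:
you can simply do
if parameter == line.strip():
and it will only proceed if the whole line is exactly your input. The first version (the one in your code) matches whenever your input appears anywhere in the line, including as part of a longer word. Lines read from a file keep their trailing newline, so strip the line before comparing; also note that an exact comparison only works if the line contains nothing but the string.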

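If the line contains other words as well, you can split it on whitespace and then check whether your input is one of the resulting words:
if parameter in line.split():
Calling split() with no arguments splits on any run of whitespace and drops the trailing newline. If the string is quoted in the file, as in 941 "Mon_ErrEntryEspSqPlaus", the quotes stay attached to the word, so either compare against the quoted form ('"' + parameter + '"') or strip the quotes from each word first.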
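To avoid matching the string when it is glued to a prefix or suffix, a regular expression with lookarounds can require that no word character (letter, digit or underscore) sits directly before or after it. This is a minimal sketch, assuming the associated number is the first token on the matching line, as in 941 "Mon_ErrEntryEspSqPlaus"; find_exact_lines and leading_number are hypothetical helper names, not part of the original code.

```python
import re

def find_exact_lines(lines, parameter):
    """Yield (line_number, line) for each line where `parameter` appears as a
    whole token, not glued to other letters, digits or underscores."""
    # (?<!\w) and (?!\w): no word character directly before or after the match,
    # so "..._Mon_ErrEntryEspSqPlaus" is rejected but '"Mon_ErrEntryEspSqPlaus"' is accepted.
    # re.escape() makes regex metacharacters in the user input match literally.
    pattern = re.compile(r"(?<!\w)" + re.escape(parameter) + r"(?!\w)")
    for num, line in enumerate(lines, 1):
        if pattern.search(line):
            yield num, line.rstrip("\n")

def leading_number(line):
    """Return the first whitespace-separated token as an int, or None."""
    # Assumption: the associated number is the first token on the line.
    words = line.split()
    if words and words[0].isdigit():
        return int(words[0])
    return None

sample = [
    "/begin MEASUREMENT Icsp_Dem_Deb_LfEve_Mon_ErrEntryEspSqPlaus\n",
    '941 "Mon_ErrEntryEspSqPlaus"\n',
]
for num, line in find_exact_lines(sample, "Mon_ErrEntryEspSqPlaus"):
    print(num, line, leading_number(line))  # 2 941 "Mon_ErrEntryEspSqPlaus" 941
```

With the real file, pass the open file object instead of the sample list, e.g. `with open("TPT.a2l", errors="replace") as myfile:` followed by `for num, line in find_exact_lines(myfile, parameter):`.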

Related

Count the number of characters in a file

The question:
Write a function file_size(filename) that returns a count of the number of characters in the file whose name is given as a parameter. You may assume that when being tested in this CodeRunner question your function will never be called with a non-existent filename.
For example, if data.txt is a file containing just the following line: Hi there!
A call to file_size('data.txt') should return the value 10. This includes the newline character that will be added to the line when you're creating the file (be sure to hit the 'Enter' key at the end of each line).
What I have tried:
def file_size(data):
    """Count the number of characters in a file"""
    infile = open('data.txt')
    data = infile.read()
    infile.close()
    return len(data)

print(file_size('data.txt'))
# data.txt contains 'Hi there!' followed by a newline character.
I get the correct answer for this file; however, I fail a test that uses a longer file, which should have a character count of 81, but I still get 10. I am trying to get the code to count the correct number of characters in any file.
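The likely cause is that the function ignores its parameter and always opens 'data.txt', so every test measures the same 10-character file. A minimal corrected sketch, keeping the name file_size from the question; the temporary file in the usage part is only there to make the example self-contained.

```python
import os
import tempfile

def file_size(filename):
    """Return the number of characters in the file named `filename`."""
    # Open the file that was passed in, not a hard-coded 'data.txt';
    # the with-block closes the file automatically.
    with open(filename) as infile:
        return len(infile.read())

# Usage: create a throwaway file containing one line plus its newline.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hi there!\n")
print(file_size(tmp.name))  # 10
os.remove(tmp.name)
```

This counts characters as Python's text mode sees them, so a Windows `\r\n` line ending is read back as a single `\n` and counts as one character.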

How do I search for a substring in a string then find the character before the substring in python

I am making a small project in Python that lets you make notes and then read them back using specific arguments. I wrote an if statement that checks whether the string contains a comma; if it does, my code should find the comma, take the characters right before and right after it, and turn them into integers so it can print the notes in a user-defined range.
If that didn't make sense: basically, I want to find out which line of code causes this to return nothing even though notes.txt has content.
Here is what I have in my Python file:
if "," not in no_cs:  # no_cs is the string I am searching through
    user_out = int(no_cs[6:len(no_cs) - 1])
    notes = open("notes.txt", "r")  # notes.txt is the file that stores all the notes the user makes
    notes_lines = notes.read().split("\n")  # this is supposed to split all the notes into a list
    try:
        print(notes_lines[user_out])
    except IndexError:
        print("That line does not exist.")
    notes.close()
elif "," in no_cs:
    user_out_1 = int(no_cs.find(',') - 1)
    user_out_2 = int(no_cs.find(',') + 1)
    notes = open("notes.txt", "r")
    notes_lines = notes.read().split("\n")
    print(notes_lines[user_out_1:user_out_2])  # this is SUPPOSED to list all notes in a specific range but doesn't
    notes.close()
Now here is the notes.txt file:
note
note1
note2
note3
And lastly, here is what I get in the console when I run the program and type notes(0,2):
>>> notes(0,2)
jeffv : notes(0,2)
[]
A great way to do this is to use Python's str.partition() method. It splits the string at the first occurrence of the separator and returns a 3-tuple: index 0 is the part before the separator, index 1 is the separator itself, and index 2 is the part after it:
# The whole string we wish to search.. Let's use a
# Monty Python quote since we are using Python :)
whole_string = ("We interrupt this program to annoy you and make things "
                "generally more irritating.")
# Here is the first word we wish to split from the entire string
first_split = 'program'
# now we use partition to pick what comes after the first split word
substring_split = whole_string.partition(first_split)[2]
# now we use python to give us the first character after that first split word
first_character = substring_split[0]
# since the above is a space, let's also show the second character so
# that it is less confusing :)
second_character = substring_split[1]
# Output
print("Here is the whole string we wish to split: " + whole_string)
print("Here is the first split word we want to find: " + first_split)
print("Now here is the first word that occurred after our split word: " + substring_split)
print("The first character after the substring split is: " + first_character)
print("The second character after the substring split is: " + second_character)
output
Here is the whole string we wish to split: We interrupt this program to annoy you and make things generally more irritating.
Here is the first split word we want to find: program
Now here is the first word that occurred after our split word: to annoy you and make things generally more irritating.
The first character after the substring split is:
The second character after the substring split is: t
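Applied to the question's code: no_cs.find(',') returns the position of the comma inside "notes(0,2)" (7), so the original code slices notes_lines[6:8], which is past the end of a four-note list and therefore prints []. Taking the text on either side of the comma with partition() gives the actual numbers instead. This is a sketch assuming commands always have the form name(start,end); parse_range is a hypothetical helper name.

```python
def parse_range(command):
    """Turn a command such as 'notes(0,2)' into the integers (0, 2)."""
    # Hypothetical helper: assumes the form name(start,end)
    inside = command[command.index("(") + 1 : command.rindex(")")]
    # partition() splits at the first comma: (before, ',', after)
    before, _, after = inside.partition(",")
    return int(before), int(after)

notes_lines = ["note", "note1", "note2", "note3"]
start, end = parse_range("notes(0,2)")
# Half-open slice, like the original notes_lines[user_out_1:user_out_2]
print(notes_lines[start:end])  # ['note', 'note1']
```

If you literally need the single character right before a substring, `text.partition(sub)[0][-1]` gives it, provided the substring occurs in the text and is not at the very start.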

Search for numbers after a certain string in an output file?

I have an output file with a lot of information in it, and I want to read a number value that appears after a specific word.
In my file, I have a line such as
"Final energy, E = -82137.1098 eV"
What I would like to do is search my file for the string 'Final energy' and then read and store the number value.
So far I have managed to search the file for 'Final energy' and print the entire line containing that string but I can't seem to find a way to then read the number.
So far my code goes like this
energystring = 'Final energy'
with open(filename, 'r') as file:
    for line in file:
        if energystring in line:
            energyline = line
            print(energyline)
Thank you for any help you can give.
You just need to parse the number out of the string then. You can split the string on whitespace to get all the words, try to cast each word to a float, and get the one that works. Since there's only one number in the string, whatever successfully casts to float is your energy number.
def get_energy_level(line):
    # Return the first word that parses as a float (None if there is none)
    for word in line.split():
        try:
            return float(word)
        except ValueError:
            pass

with open(filename, 'r') as file:
    for line in file:
        if energystring in line:
            energy_level = get_energy_level(line)
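As an alternative to trying float() on every word, a regular expression can pull out the first number that follows 'Final energy'. This is a minimal sketch, assuming the value sits on the same line and is written as an optionally signed decimal with an optional exponent; ENERGY_RE and find_final_energy are hypothetical names.

```python
import re

# Assumption: the value is the first number after 'Final energy' on that line,
# e.g. -82137.1098 or 1.5e-3. \D*? lazily skips the non-digit text in between.
ENERGY_RE = re.compile(r"Final energy\D*?([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)")

def find_final_energy(lines):
    """Return the first 'Final energy' value found in `lines`, or None."""
    for line in lines:
        match = ENERGY_RE.search(line)
        if match:
            return float(match.group(1))
    return None

print(find_final_energy(["Step 1\n", "Final energy, E = -82137.1098 eV\n"]))  # -82137.1098
```

With the asker's file you can pass the open file object directly, e.g. `with open(filename) as file: energy = find_final_energy(file)`.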

How to find a substring in a line from a text file and add that line or the characters after the searched string into a list using Python?

I have a MIB dataset of around 10k lines. I want to find a certain string (e.g. "SNMPv2-MIB::sysORID") in the text file and add the whole line to a list. I am using Jupyter Notebook to run the code.
I used the code below to search for the string, and it prints the matched string along with the next two words.
import re
basic = open('mibdata.txt')
file = basic.read()
city_name = re.search(r"SNMPv2-MIB::sysORID(?:[^a-zA-Z'-]+[a-zA-Z'-]+){1,2}", file)
city_name = city_name.group()
print(city_name)
Sample lines in file:
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB
SNMPv2-MIB::sysORDescr.1 = STRING: The MIB for Message Processing and Dispatching.
The output expected is
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB
but I only get
SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB
The problem with changing the number of words matched after the search string is that the number of words differs from line to line, so I cannot use a constant. Instead, I want to use '\n' as a delimiter, but I could not find a post that does this.
P.S. Any other solution is also welcome
EDIT
You can read the file line by line and check each line against a regex.
r"(SNMPv2-MIB::sysORID).*" finds an occurrence of the string in the parentheses and then matches everything after it up to the end of the line (. does not match a newline).
import re

with open('file.txt') as basic:
    # map each line to its match, or "" when the line doesn't match
    entries = map(lambda x: re.search(r"(SNMPv2-MIB::sysORID).*", x).group() if re.search(r"(SNMPv2-MIB::sysORID).*", x) is not None else "", basic.readlines())
    non_empty_entries = list(filter(lambda x: x != "", entries))
print(non_empty_entries)
If you are not comfortable with lambdas, here is what the script above does:
it reads the text from the file, splits it into lines and checks each line individually for a regex match.
non_empty_entries is the list of matched lines, each running from the matched string to the end of its line.
EDIT vol2
Now, when the regex doesn't match a line, an empty string is added for it, and those empty strings are filtered out afterwards.
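For readers who prefer to avoid lambdas, the same filtering can be written as a plain loop. This sketch assumes you only want the SNMPv2-MIB::sysORID lines from the question; collect_matching_lines is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r"SNMPv2-MIB::sysORID.*")

def collect_matching_lines(lines):
    """Return the text from SNMPv2-MIB::sysORID to end of line, for each matching line."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:  # search() returns None when the line doesn't contain the string
            results.append(match.group())  # '.' stops before the newline
    return results

sample = [
    "SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB\n",
    "SNMPv2-MIB::sysORDescr.1 = STRING: The MIB for Message Processing and Dispatching.\n",
]
print(collect_matching_lines(sample))
# ['SNMPv2-MIB::sysORID.10 = OID: NOTIFICATION-LOG-MIB::notificationLogMIB']
```

To use it on the real data, pass the open file: `with open('mibdata.txt') as f: print(collect_matching_lines(f))`.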

regex - Making all letters in a text lowercase using re.sub in python but exclude specific string?

I am writing a script to convert all uppercase letters in a text to lowercase using regex, but excluding specific strings/characters such as "TEA", "CHI", "I", "#Begin", "#Language", "ENG", "#Participants", "#Media", "#Transcriber", "#Activities", "SBR", "#Comment" and so on.
My current script is shown below. However, it does not produce the desired output. For instance, when I input "#Activities: SBR", the output is "#Activities#activities: sbr#activities: sbrSBR". The intended output is "#Activities: SBR".
I am using Python 3.5.2
Can anyone help to provide some guidance? Thank you.
import os
from itertools import chain
import re

def lowercase_exclude_specific_string(line):
    line = line.strip()
    PATTERN = r'[^TEA|CHI|I|#Begin|#Language|ENG|#Participants|#Media|#Transcriber|#Activities|SBR|#Comment]'
    filtered_line = re.sub(PATTERN, line.lower(), line)
    return filtered_line
First, let's see why you're getting the wrong output.
For instance when I input "#Activities: SBR", the output given is "#Activities#activities: sbr#activities: sbrSBR".
This is because your code
PATTERN = r'[^TEA|CHI|I|#Begin|#Language|ENG|#Participants|#Media|#Transcriber|#Activities|SBR|#Comment]'
filtered_line = re.sub(PATTERN, line.lower(), line)
is doing negated character class matching. Inside [...], the | signs and the words are not alternatives; they are just a set of single characters, and [^...] matches any one character that is not in that set. Each such character is replaced with line.lower() (which is "#activities: sbr").
The code will match ":" and " " (whitespace) and replace both with "#activities: sbr", giving you the result "#Activities#activities: sbr#activities: sbrSBR".
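The explanation above can be checked directly by running the asker's pattern on the example input. This sketch only reproduces the behavior described; it is not a fix.

```python
import re

# The asker's pattern: everything inside [^...] is a set of single characters,
# so "|" and the words are not alternatives here.
PATTERN = r'[^TEA|CHI|I|#Begin|#Language|ENG|#Participants|#Media|#Transcriber|#Activities|SBR|#Comment]'
line = "#Activities: SBR"

# Only the characters that are NOT in the set are matched
print(re.findall(PATTERN, line))  # [':', ' ']

# Each of those two matches is replaced by the whole lowercased line
print(re.sub(PATTERN, line.lower(), line))  # #Activities#activities: sbr#activities: sbrSBR
```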
Now to fix that code. Unfortunately, there is no direct way to negate words in a line and apply substitution on the other words on that same line. Instead, you can split the line first into individual words, then apply re.sub on it using your PATTERN. Also, instead of a negated character class, you should use a negative lookahead:
(?!...)
Negative lookahead assertion. This is the opposite of the positive assertion; it succeeds if the contained expression doesn’t match at the current position in the string.
Here's the code I got:
import re

def lowercase_exclude_specific_string(line):
    line = line.strip()
    words = re.split(r"\s+", line)
    result = []
    for word in words:
        # Words starting with an excluded string fail the lookahead and stay unchanged
        PATTERN = r"^(?!TEA|CHI|I|#Begin|#Language|ENG|#Participants|#Media|#Transcriber|#Activities|SBR|#Comment).*$"
        lword = re.sub(PATTERN, word.lower(), word)
        result.append(lword)
    return " ".join(result)
re.sub only matches words that do not start with one of the excluded strings, and replaces each such word with its lowercase form. If a word starts with an excluded string, it is not matched and re.sub returns it unchanged.
Each word is then stored in a list, then joined later to form the line back.
Samples:
print(lowercase_exclude_specific_string("#Activities: SBR"))
print(lowercase_exclude_specific_string("#Activities: SOME OTHER TEXT SBR"))
print(lowercase_exclude_specific_string("Begin ABCDEF #Media #Comment XXXX"))
print(lowercase_exclude_specific_string("#Begin AT THE BEGINNING."))
print(lowercase_exclude_specific_string("PLACE #Begin AT THE MIDDLE."))
print(lowercase_exclude_specific_string("I HOPe thIS heLPS."))
#Activities: SBR
#Activities: some other text SBR
begin abcdef #Media #Comment xxxx
#Begin at the beginning.
place #Begin at the middle.
I hope this helps.
EDIT:
As mentioned in the comments, there is apparently a tab between : and the next character. Since the code splits the string on \s+, the tab is lost, but it can be restored by replacing : with :\t in the final result.
return " ".join(result).replace(":", ":\t")
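re.sub also accepts a function as the replacement, which gives a single-pass alternative that never splits the line, so tabs and other whitespace survive untouched. This is a sketch assuming the exclusions should match whole tokens, ignoring trailing punctuation such as ':' or '.'; EXCLUDED and lowercase_except are hypothetical names.

```python
import re

# Hypothetical set of exact tokens that must keep their case
EXCLUDED = {"TEA", "CHI", "I", "#Begin", "#Language", "ENG", "#Participants",
            "#Media", "#Transcriber", "#Activities", "SBR", "#Comment"}

def lowercase_except(line):
    """Lowercase every token in `line` except the excluded ones, keeping whitespace."""
    def replace(match):
        token = match.group()
        # Compare the token without trailing punctuation such as ':' or '.'
        if token.rstrip(":.,;!?") in EXCLUDED:
            return token
        return token.lower()
    # \S+ matches each run of non-whitespace; spaces and tabs are never touched
    return re.sub(r"\S+", replace, line)

print(lowercase_except("#Activities:\tSBR"))         # #Activities:	SBR (tab kept)
print(lowercase_except("PLACE #Begin AT THE MIDDLE."))  # place #Begin at the middle.
```

Unlike the lookahead version, which keeps any word that merely starts with an excluded string (so "INTO" stays uppercase because it starts with "I"), this compares whole tokens, so "INTO" becomes "into".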
