Read specific area or string in a text file - python-3.x

I have a text file I have written user data to. Username, E-Mail and Password.
This is what the user file looks like for now:
[<< LOGIN >>]
Username: admin
Password: 12345678
E-Mail: hue#hue.hue
[<< LOGIN END >>]
Now for the question.
How can I tell Python to read only the password? Right now we know what the password is and how long it is. But how am I supposed to read the password later, when I encrypt it and get 30+ characters of gibberish?

The line will contain the password, so just split once and take the second element:
In [20]: from simplecrypt import encrypt, decrypt
In [21]: ciph = encrypt('password', "12345678")
In [22]: line = "Password: " + ciph
In [23]: line
Out[23]: 'Password: sc\x00\x01\x0cP\xa1\xee\'$"\xc1\x85\xe0\x04\xd2wg5\x98\xbf\xb4\xd0\xacr\xd3\\\xbc\x9e\x00\xf1\x9d\xbe\xdb\xaa\xe6\x863Om\xcf\x0fc\xdeX\xfa\xa5\x18&\xd7\xcbh\x9db\xc9\xbeZ\xf6\xb7\xd3$\xcd\xa5\xeb\xc8\xa9\x9a\xfa\x85Z\xc5\xb3%~\xbc\xdf'
In [24]: line.split(None,1)[1]
Out[24]: 'sc\x00\x01\x0cP\xa1\xee\'$"\xc1\x85\xe0\x04\xd2wg5\x98\xbf\xb4\xd0\xacr\xd3\\\xbc\x9e\x00\xf1\x9d\xbe\xdb\xaa\xe6\x863Om\xcf\x0fc\xdeX\xfa\xa5\x18&\xd7\xcbh\x9db\xc9\xbeZ\xf6\xb7\xd3$\xcd\xa5\xeb\xc8\xa9\x9a\xfa\x85Z\xc5\xb3%~\xbc\xdf'
In [25]: decrypt("password",line.split(None,1)[1])
Out[25]: '12345678'
In [26]: "12345678" == decrypt("password",line.split(None,1)[1])
Out[26]: True
When you iterate over the file, simply use if line.startswith("Password"):
with open(your_file) as f:
    for line in f:
        if line.startswith("Password"):
            password = line.rstrip().split(None, 1)[1]
            # do your check
You could also use a dict and pickle, using password as a key, then just do a lookup.
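A minimal sketch of the dict-and-pickle idea, assuming the block format shown in the question. The helper name parse_users and the choice of the username as the dictionary key are illustrative assumptions, not something the original answer specifies.

```python
import pickle

# Sample data in the question's format (kept inline so the sketch is self-contained).
SAMPLE = """\
[<< LOGIN >>]
Username: admin
Password: 12345678
E-Mail: hue#hue.hue
[<< LOGIN END >>]
"""

def parse_users(text):
    """Hypothetical helper: return {username: {field: value}} for each login block."""
    users, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line == "[<< LOGIN >>]":
            current = {}                      # start a new record
        elif line == "[<< LOGIN END >>]":
            if current and "Username" in current:
                users[current.pop("Username")] = current
            current = None                    # record finished
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)   # split once, so values may contain ':'
            current[key.strip()] = value.strip()
    return users

users = parse_users(SAMPLE)
blob = pickle.dumps(users)      # pickle.dump(users, f) would write to a file opened in "wb" mode
restored = pickle.loads(blob)
print(restored["admin"]["Password"])
```

With the records in a dict, reading the password is a plain lookup and no longer depends on its length or contents.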

How can I tell python to specifically read the password only?
data.txt:
[<< LOGIN >>]
Username: admin
Password: 12345678
E-Mail: hue#hue.hue
[<< LOGIN END >>]
[<< LOGIN >>]
Username: admin
Password: XxyYo345320945!##!$##!##$%^%^^##$%!##$#!#41211
E-Mail: hue#hue.hue
[<< LOGIN END >>]
...
import re
pattern = r"""
    Password  # Match the word 'Password', followed by...
    \s*       # whitespace (\s), 0 or more times (*), followed by...
    :         # a colon
    \s*       # whitespace, 0 or more times...
    (.*)      # any character (.), 0 or more times (*). The parentheses 'capture' this part of the match.
"""
regex = re.compile(pattern, re.X)  # When you use a pattern over and over for matching, it's more efficient to 'compile' the pattern.
with open('data.txt') as f:
    for line in f:
        match_obj = regex.match(line)
        if match_obj:  # then the pattern matched the line
            password = match_obj.group(1)  # group(1) is what matched the first set of parentheses in the pattern
            print(password)
--output:--
12345678
XxyYo345320945!##!$##!##$%^%^^##$%!##$#!#41211
A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression (or if a given regular expression matches a particular string, which comes down to the same thing).
Regular expressions can be concatenated to form new regular expressions; if A and B are both regular expressions, then AB is also a regular expression. In general, if a string p matches A and another string q matches B, the string pq will match AB. This holds unless A or B contain low precedence operations; boundary conditions between A and B; or have numbered group references. Thus, complex expressions can easily be constructed from simpler primitive expressions like the ones described here. For details of the theory and implementation of regular expressions, consult the Friedl book referenced above, or almost any textbook about compiler construction.
A brief explanation of the format of regular expressions follows. For further information and a gentler presentation, consult the Regular Expression HOWTO.
https://docs.python.org/3/library/re.html#module-re
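The concatenation rule quoted above can be checked directly. This is a small illustration with two made-up patterns, A and B, chosen only for the example:

```python
import re

A = r"\d+"      # one or more digits
B = r"[a-z]+"   # one or more lowercase letters
AB = A + B      # concatenating the pattern strings concatenates the expressions

p, q = "123", "abc"
match_a = re.fullmatch(A, p)        # p matches A
match_b = re.fullmatch(B, q)        # q matches B
match_ab = re.fullmatch(AB, p + q)  # so pq matches AB
print(match_ab.group(0))
```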

Related

How can I find all the strings that contains "/1" and remove from a file using Python?

I have a file that contains strings like "1405079/1"; the only thing they have in common is the "/1" at the end. I want to find those strings and remove them. Below is my sample code, but it's not doing anything.
with open("jobstat.txt","r") as jobstat:
    with open("runjob_output.txt", "w") as runjob_output:
        for line in jobstat:
            string_to_replace = ' */1'
            line = line.replace(string_to_replace, " ")
with open("jobstat.txt","r") as jobstat:
    with open("runjob_output.txt", "w") as runjob_output:
        for line in jobstat:
            string_to_replace ='/1'
            line =line.rstrip(string_to_replace)
            print(line)
Anytime you have a "pattern" you want to match against, use a regular expression. The pattern here, given the information you've provided, is a string with an arbitrary number of digits followed by /1.
You can use re.sub to match against that pattern, and replace instances of it with another string.
import re
original_string= "some random text with 123456/1, and midd42142/1le of words"
pattern = r"\d*\/1"
replacement = ""
re.sub(pattern, replacement, original_string)
Output:
'some random text with , and middle of words'
Replacing instances of the pattern with something else:
>>> re.sub(pattern, "foo", original_string)
'some random text with foo, and middfoole of words'
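Applied to the file from the question, the same substitution can run line by line. The sketch below works on a list of lines so it stays self-contained; the sample lines and the helper name strip_job_ids are assumptions for illustration. It also requires at least one digit (\d+) and a word boundary after the 1, which is slightly stricter than the pattern above.

```python
import re

# Digits followed by "/1", not followed by further word characters (so "5/12" is left alone).
JOB_ID = re.compile(r"\b\d+/1\b")

def strip_job_ids(lines):
    """Hypothetical helper: remove every '<digits>/1' token from each line."""
    return [JOB_ID.sub("", line) for line in lines]

sample = ["job 1405079/1 queued\n", "no id here\n", "1405080/1\n", "ratio 5/12\n"]
cleaned = strip_job_ids(sample)
print(cleaned)
```

Because a file object iterates over its lines, strip_job_ids(jobstat) would work on the open input file, and the cleaned lines can then be written with runjob_output.writelines(cleaned). The original attempt never wrote anything to the output file, which is why it appeared to do nothing.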

How do I make re.finditer only return each line once

I am searching a text file that is a "phoneBook" for an assignment and am using regex finditer, but if a name has the letter a in it twice, it prints that line twice, which is what I am trying to avoid. Also, is there a way to have it ignore case?
def searchPhonebook(s): #This will search the phonebook(s) for the inputted data that is assigned to d
    print()
    d=input("Please enter the Name, Character, Phone Number, or a number: ") #Variable d which is the inputted data
    print()
    import re
    pattern = re.compile(d)
    for line in open("phone.txt"):
        for match in re.finditer(pattern,line):
            print(line)
So when I search 'a' it returns
Jack Hammer,277-4829
Jack Hammer,277-4829
Mike Rafone,345-3453
Earl Lee Riser,701-304-8293
So I would like it to return each one once, and also find capitalization of 'a', like Abby
Don't use finditer(). Just test whether the line matches the pattern:
for line in open("phone.txt"):
    if re.search(pattern, line):
        print(line)
Actually, I'm not sure why you're using re at all. Do your users really enter regular expression patterns? If they're just entering a plain string, use if d in line:
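The answer above does not cover the case-insensitivity part of the question. One possible sketch, assuming the user types plain text rather than a regex: escape the input with re.escape and compile it with re.IGNORECASE. The function name search_lines and the "Abby Road" and "Bob Sled" entries are made up for the example.

```python
import re

def search_lines(lines, query):
    """Return each line containing query (case-insensitive), at most once per line."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    # re.search stops at the first hit, so a line is reported once
    # no matter how many times the query occurs in it.
    return [line.rstrip("\n") for line in lines if pattern.search(line)]

phonebook = [
    "Jack Hammer,277-4829\n",
    "Mike Rafone,345-3453\n",
    "Earl Lee Riser,701-304-8293\n",
    "Abby Road,555-0100\n",   # hypothetical entry with a capital 'A'
    "Bob Sled,555-0199\n",    # hypothetical entry with no 'a' at all
]
results = search_lines(phonebook, "a")
for entry in results:
    print(entry)
```

For plain-string input, the same effect is available without re by comparing lowercased text: if d.lower() in line.lower().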

How can I split text using pyparsing with a specific token?

PLEASE NOTE:
The question "Splitting text into lines with pyparsing" is about parsing a file using a single token at the end of a line (\n), which is pretty easy. My question differs: I have a hard time separating the text that starts just before a : from the free text entered before the filters.
On our API I have user input like some free text port:45 title:welcome to our website, and what I need at the end of parsing is 2 parts -> [some free text, port:45 title:welcome]
from pyparsing import *
token = "some free text port:45 title:welcome to our website"
t = Word(alphas, " "+alphanums) + Word(" "+alphas,":"+alphanums)
This does give me an error:
pyparsing.ParseException: Expected W:( ABC..., :ABC...), found ':' (at char 21), (line:1, col:22)
Because it gets all strings up to some free text port and then :45 title:welcome to our website.
How can I get all data before port: in a separate group and port:.... in another group using pyparsing?
I know that the question is about pyparsing, but for this specific use I think a regex is more standard and simpler; pyparsing is probably better suited to more complicated parsing problems.
Here is one possible working regex:
^(.+port\:\d+) (title:.+)$
And here is the Python code:
import re
pattern = r"^(.+port\:\d+) (title:.+)$"
token = "some free text port:45 title:welcome to our website"
m = re.match(pattern, token)
if m:
    grp1, grp2 = m.group(1), m.group(2)
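The regex above groups port:45 together with the free text. If the goal is the split described in the question (free text first, then everything from the first key:value filter onward), a variant like the following might be closer. It is a sketch that assumes filter keys are made of word characters; the function name split_query is hypothetical.

```python
import re

# Lazy free-text group, optional whitespace, then everything from the first "word:" onward.
QUERY = re.compile(r"^(.*?)\s*(\w+:.*)$")

def split_query(token):
    """Return (free_text, filters); filters is '' when no 'key:' is present."""
    m = QUERY.match(token)
    if m is None:
        return token, ""
    return m.group(1), m.group(2)

free_text, filters = split_query("some free text port:45 title:welcome to our website")
print(free_text)
print(filters)
```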
Adding " " as one of the valid characters in a Word pretty much always has this problem, and so is in general a pyparsing anti-pattern. Word does its character repetition matching inside its parse() method, so there is no way to add any kind of lookahead.
To get spaces in your expressions, you will probably need a OneOrMore, wrapped in originalTextFor, like this:
import pyparsing as pp
word = pp.Word(pp.printables, excludeChars=":")
non_tag = word + ~pp.FollowedBy(":")
# tagged value is two words with a ":"
tag = pp.Group(word + ":" + word)
# one or more non-tag words - use originalTextFor to get back
# a single string, including intervening white space
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]
parser.runTests("""\
some free text port:45 title:welcome to our website
""")
Prints:
some free text port:45 title:welcome to our website
['some free text', ['port', ':', '45'], ['title', ':', 'welcome'], 'to our website']
[0]:
some free text
[1]:
['port', ':', '45']
[2]:
['title', ':', 'welcome']
[3]:
to our website

How to try if lines in a file match with lines in an other file in Python

How do I "brute" every line in a file Until I find what matches it, what I mean is I turned every line in save.data and brute.txt into two lists (For easy access), here is brute.txt:
username
username1
password
password1
And here is save.data (Since this is for a Batch-file game, there is no need to quote strings like "username1"):
username1 = PlayerName
password1 = PlayerPass
So my request is this: I want to check whether line 1 from brute.txt matches the part before the equals sign in line 1 of save.data (which is 'username1'). If it doesn't match, move on to the next line of save.data, and so on until the end of the file. Then check whether line 2 from brute.txt matches line 1 of save.data (which it does); if not, check it against the part before the equals sign in line 2 of save.data, and so on. Finally, when "username" matches "username", make a variable called username whose value is whatever comes after the equals sign in save.data. So when the "bruting" process is finished, I must have two variables, username = PlayerName and password = PlayerPass, for further use. I tried while, for and try loops, but I got stuck because to do so I need to know what is in save.data.
-If you didn't understand something, kindly comment it and I will clear it up.
There are probably more efficient ways to do this, but to answer the question you asked:
First open the save.data file and read the contents into a list of [key, value] pairs:
with open('save.data') as fp:
    save_data = [line.split(' = ') for line in fp.read().splitlines()]
Do the same for the brute.txt file:
with open('brute.txt') as fp:
    brute = fp.read().splitlines()
Then just iterate through the pairs:
for key, value in save_data:
    if key in brute:
        break
else:
    print("didn't find a matching key")
After the for-loop breaks, key and value hold the first entry of save.data whose key appears in brute.txt (here 'username1' and 'PlayerName').
(Please note that the else: is on the for-loop, not the if.)
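The loop above stops at the first match, so it yields only one key and value. To end up with both username and password, one option is to turn save.data into a dict and look up every candidate from brute.txt. The sketch below uses in-memory lists in place of the two files so it is self-contained; the names save_lines, brute_lines and found are illustrative.

```python
# Stand-ins for the contents of save.data and brute.txt (one list item per line).
save_lines = ["username1 = PlayerName", "password1 = PlayerPass"]
brute_lines = ["username", "username1", "password", "password1"]

# Map the part before ' = ' to the part after it.
save_data = dict(line.split(" = ", 1) for line in save_lines)

# Keep every candidate key that actually occurs in save.data.
found = {key: save_data[key] for key in brute_lines if key in save_data}

username = found.get("username1")   # None if the key was never found
password = found.get("password1")
print(username, password)
```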

question tuples in regex -regular expressions

So I have some code on which I will use a regex.
Specifically, I need to use re.findall() and a single regular expression to extract the three names and email addresses from the 'string'. To create list of 3 tuples like so: [('Mary Boe', 'md90#uw.com'), ('Cheri Moe Drake', 'cmd39#gmail.gbl'), ('R.L. Fitzgeri', 'fit.rl#hotmail.ing')]
here is the string....
string = """Name: Mary Boe, Email: md90#uw.com\n
Name: Cheri Moe Drake, Email: cmd39#gmail.gbl\n
Name: R.L. Fitzgeri, Email: fit.rl#hotmail.ing"""
So far I have used the following to get ['R.L. Fitzgeri']
with
re.findall('\S\S\w\S\s\w\S\w\w\w\w\S\w',string)
And I have been able to get fit.rl#hotmail.ing
with
re.findall('\w\\w\\w\\S\w\w\S\w\w\w\w\S\w\w\S\w\w\w',string)
I have been able to get Cheri Moe Drake with
re.findall('\w\w\w\w\w\s\w\w\w\s\w\w\w\w\w',string)
But I have struggled condensing this, and secondly, struggled to get it so that it all comes out, as I said, like:
[('Mary Boe', 'md90#uw.com'), ('Cheri Moe Drake', 'cmd39#gmail.gbl'), ('R.L. Fitzgeri', 'fit.rl#hotmail.ing')]
Here is a way to do the job:
import re
string = """Name: Jane Doe, Email: jd12#uw.com\n
Name: Sally Sue Draper, Email: ssd59#gmail.edu\n
Name: J.D. Salinger, Email: sal.jd#hotmail.org"""
pattern = r'Name: (.+?), Email: (.+)'
result = re.findall(pattern, string)
print(result)
Output:
[('Jane Doe', 'jd12#uw.com'), ('Sally Sue Draper', 'ssd59#gmail.edu'), ('J.D. Salinger', 'sal.jd#hotmail.org')]
Regex explanation:
Name: # literally
(.+?) # group 1, 1 or more any character but newline, not greedy
, Email: # literally
(.+) # group 2, 1 or more any character but newline
If you always have the same format, it may make more sense to avoid regular expressions in this scenario and approach the problem differently:
string = """Name: Jane Doe, Email: jd12#uw.com\n
Name: Sally Sue Draper, Email: ssd59#gmail.edu\n
Name: J.D. Salinger, Email: sal.jd#hotmail.org"""
people = [person for person in string.split('\n') if person]
people_list = []
for person in people:
    name = ''
    for char in person[6:]:
        if char == ',':
            break
        else:
            name += char
    email = ''
    for char in person[::-1]:
        if char == ' ':
            break
        else:
            email += char
    email = email[::-1]
    person_tuple = (name, email)
    people_list.append(person_tuple)
This will give you a list of tuples if you print people_list:
[('Jane Doe', 'jd12#uw.com'), ('Sally Sue Draper', 'ssd59#gmail.edu'), ('J.D. Salinger', 'sal.jd#hotmail.org')]
This assumes that all your lines start with Name:, which is why the loop builds a name by concatenating all characters after that up to the first comma it finds.
For the email, it does the same thing in reverse: it takes all characters starting from the end of the string until it finds a space, where the email is effectively ending. It then puts it back in order to get the correct email.
To build the list of contacts, the loop will format name and email into a tuple that will be appended to people_list until there are no more contacts to add.
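The same regex-free idea can be written more compactly with str.partition instead of character-by-character loops. This is an alternative sketch, not the answer's own method; it assumes every non-empty line has the form "Name: ..., Email: ...".

```python
text = """Name: Jane Doe, Email: jd12#uw.com

Name: Sally Sue Draper, Email: ssd59#gmail.edu

Name: J.D. Salinger, Email: sal.jd#hotmail.org"""

people_list = []
for line in text.splitlines():
    if not line.strip():
        continue                                   # skip the blank separator lines
    name_part, _, email_part = line.partition(", Email: ")
    name = name_part[len("Name: "):]               # drop the leading "Name: " label
    people_list.append((name, email_part.strip()))

print(people_list)
```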
If you insist on using regular expressions, then a good use for those could be if you want to validate email addresses and not add a contact to your list if the email does not correspond to the format of your choosing (or leave it blank instead). The regex pattern could look like this:
import re
email = 'example#email.com'
pattern = r'[a-z]+[a-z0-9]*[\w._-]*#[a-z]+\.[a-z]{1,3}$'
if re.match(pattern, email):
    pass  # do something with email here
Note that in this case, the regex uses quantifiers like + and * instead of repeating the same character class over and over, which is one of the keys to building a more robust regex.

Resources