write utf-8 content in files / python 3

It's about UTF-8 issues again, for the 1001st time. Please don't mark this question as a duplicate, because I cannot find answers elsewhere.
For some months I have been working successfully with the following small script (which could be improved, I know), which gives me simple database functionality. I wrote it for very simple data storage, like local config and auth data; let's say not for more sophisticated content as known from cookies. It worked for me until I tried to store non-Latin characters for the first time.
In the following script I have already added the import codecs stuff, including the altered lines f = codecs.open(file, 'w', 'utf-8'). I don't know if this is the right approach.
Can somebody show me the trick? Let's say "John Doe" is French, "John Doé"; how do I store it?
The class itself (to be imported)
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import os, errno
import json
import codecs

class Ddpos:
    def db(self, table, id, col=''):
        table = '/Users/michag/Documents/ddposdb/' + table
        try:
            os.makedirs(table)
            os.chmod(table, 0o755)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        file = table + '/' + id + '.txt'
        if not os.path.isfile(file):
            f = codecs.open(file, 'w', 'utf-8')
            f.write('{}')
            f.close()
        f = codecs.open(file, 'r', 'utf-8')
        r = json.loads(f.readline().strip())
        f.close()
        if isinstance(col, str) and len(col) > 0:
            if col in r:
                return json.dumps(r[col])
            else:
                return ''
        elif isinstance(col, list) and len(col) > 0:
            res = {}
            for el in range(0, len(col)):
                if col[el] in r:
                    res[col[el]] = r[col[el]]
            return json.dumps(res)
        elif isinstance(col, dict) and len(col) > 0:
            for el in col:
                r[el] = col[el]
            f = codecs.open(file, 'w', 'utf-8')
            f.write(json.dumps(r))
            f.close()
            return json.dumps(r)
        else:
            return json.dumps(r)

ddpos = Ddpos()
The call / usage
#!/usr/bin/python3
# -*- coding: utf-8 -*-
from ddpos import *
# set values and return all values as dict
print ('1.: '+ddpos.db('cfg','local',{'admin':'John Doé','email':'johndoe@email.com'}))
# return all values as dict
print ('2.: '+ddpos.db('cfg','local'))
# return one value as string
print ('3.: '+ddpos.db('cfg','local','email'))
# return two or more values as dict
print ('4.: '+ddpos.db('cfg','local',['admin','email']))
It prints and stores this in the case of "John Doe":
1.: {"admin": "John Doe", "email": "johndoe#email.com"}
2.: {"admin": "John Doe", "email": "johndoe#email.com"}
3.: "johndoe#email.com"
4.: {"admin": "John Doe", "email": "johndoe#email.com"}
and this in the case of the French guy "John Doé":
1.: {"email": "johndoe#email.com", "admin": "John Do\u00e9"}
2.: {"email": "johndoe#email.com", "admin": "John Do\u00e9"}
3.: "johndoe#email.com"
4.: {"email": "johndoe#email.com", "admin": "John Do\u00e9"}
For me it is more important to learn and to understand how it works and why (or why not) than to know that there are already classes which would do the job for me. Thanks for your support.

Following moderator deceze, I answer my own question, with credit to user mata and to Python 3 itself.
Here's the "new" script. The poor French guy is now renamed to "J€hn Doéß", and he's still alive.
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import os, errno
import json

class Ddpos:
    def db(self, table, id, col=''):
        table = '/Users/michag/Documents/ddposdb/' + table
        try:
            os.makedirs(table)
            os.chmod(table, 0o755)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        file = table + '/' + id + '.txt'
        if not os.path.isfile(file):
            f = open(file, 'w')
            f.write('{}')
            f.close()
        f = open(file, 'r')
        r = json.loads(f.readline().strip())
        f.close()
        if isinstance(col, str) and len(col) > 0:
            if col in r:
                return r[col]
            else:
                return ''
        elif isinstance(col, list) and len(col) > 0:
            res = {}
            for el in range(0, len(col)):
                if col[el] in r:
                    res[col[el]] = r[col[el]]
            return res
        elif isinstance(col, dict) and len(col) > 0:
            for el in col:
                r[el] = col[el]
            f = open(file, 'w')
            f.write(json.dumps(r))
            f.close()
            return r
        else:
            return r

ddpos = Ddpos()
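A caveat worth adding here (my note, not from the original exchange): in Python 3, open() without an encoding argument uses locale.getpreferredencoding(False), which is not guaranteed to be UTF-8, especially on Windows. The UPDATE below passes encoding='utf-8' explicitly, which is the safer, locale-independent form:
# locale-independent: always read/write the file as UTF-8
f = open(file, 'w', encoding='utf-8')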

UPDATE
I made some improvements. Now the stored dict is human-readable (for unbelievers like me) and sorted case-insensitively. The sorting surely costs a bit of performance, but hey, I will use the read path some hundred times more often than the write path.
Now the stored dict looks like this:
{
    "auth_descr": "dev unit, office2",
    "auth_email": "me@myemail.com",
    "auth_key": "550e3 **shortened sha256** d73b1",
    "auth_unit_id": "2.3.1",
    "vier": "44é",       # utf-8 example
    "Vjier": "vier44"    # uppercase example
}
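The case-insensitive ordering comes from the key=lambda y: y.lower() sort in the write step; a quick comparison (illustrative):
r = {'vier': '44é', 'Vjier': 'vier44', 'auth_email': 'me@myemail.com'}
print(sorted(r))                           # ['Vjier', 'auth_email', 'vier'] ('V' sorts before all lowercase)
print(sorted(r, key=lambda y: y.lower()))  # ['auth_email', 'vier', 'Vjier']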
I don't know where I made the mistake in earlier version(s), but take a look at the UTF-8 example: "é" is now stored as an "é" and not as "\u00e9".
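The likely explanation (my note, not part of the original post): json.dumps escapes non-ASCII characters by default (ensure_ascii=True), and the updated class sidesteps that by building the output string manually. Passing ensure_ascii=False has the same effect:
import json

print(json.dumps({'admin': 'John Doé'}))                      # {"admin": "John Do\u00e9"}
print(json.dumps({'admin': 'John Doé'}, ensure_ascii=False))  # {"admin": "John Doé"}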
The class now looks like this (with minor changes)
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import os, errno
import json

class Ddpos:
    def db(self, table, id, col=''):
        table = '/ddpos/db/' + table
        try:
            os.makedirs(table)
            os.chmod(table, 0o755)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        file = table + '/' + id + '.txt'
        if not os.path.isfile(file):
            f = open(file, 'w', encoding='utf-8')
            f.write('{}')
            f.close()
        f = open(file, 'r', encoding='utf-8')
        r = json.loads(f.read().strip())
        f.close()
        if isinstance(col, str) and len(col) > 0:
            if col in r:
                return r[col]
            else:
                return ''
        elif isinstance(col, list) and len(col) > 0:
            res = {}
            for el in range(0, len(col)):
                if col[el] in r:
                    res[col[el]] = r[col[el]]
            return res
        elif isinstance(col, dict) and len(col) > 0:
            for el in col:
                r[el] = col[el]
            w = '{\n'
            for key in sorted(r, key=lambda y: y.lower()):
                w += '\t"%s": "%s",\n' % (key, r[key])
            w = w[:-2] + '\n'
            w += '}'
            f = open(file, 'w', encoding='utf-8')
            f.write(w)
            f.close()
            return r
        else:
            return r

ddpos = Ddpos()
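For what it's worth, a sketch of the same write step using json.dumps instead of the manual loop (which would break on values containing quotes, or on non-string values), assuming the same r and file; note that sort_keys sorts case-sensitively, unlike the key.lower() sort above:
import json

with open(file, 'w', encoding='utf-8') as f:
    # ensure_ascii=False keeps "é" literal; indent='\t' matches the tab layout above
    f.write(json.dumps(r, ensure_ascii=False, indent='\t', sort_keys=True))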

Related

How do I change a text dictionary from file into usable dictionary

Right, so: I need to make a function that saves a player's username in a dictionary, which is then saved to a text file to be reused again.
The problem is that on reusing it, I can't manage to get the str that I read from the file back into a dictionary.
Here is my code:
from ast import literal_eval

def verification(j, d):
    if j in d.keys():
        return d
    else:
        d[j] = [0, 0]
        return d

savefile = open("save.txt", "r")
'''d = dict()
for line in savefile:
    (key, val) = line.split(".")
    d[key] = val
print(d)'''
d = savefile.read()
python_dict = literal_eval(d)
savefile.close()
j = input("name? ")
result = verification(j, python_dict)
savefile = open("save.txt", "w")
'''for i in result:
    text = i + "." + str(result[i]) + " \n"
    savefile.write(text)'''
savefile.write(str(result))
savefile.close()
As you can see, I tried with literal_eval from ast. I also tried a .split(), but that wouldn't work. So I'm stuck. Any ideas? It would be of great help.
Thanks
There is no need to do your own encoding/decoding from scratch when you have existing libraries to do it for you.
One good example is JSON, which is also not Python-exclusive, so the database you create can be used by other applications.
This can be done easily:
import json

def verification(j, d):
    if j not in d:
        d[j] = [0, 0]
    return d

with open("save.txt", "r") as savefile:
    python_dict = json.load(savefile)

j = input("name? ")
result = verification(j, python_dict)

with open("save.txt", "w") as savefile:
    json.dump(result, savefile)
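One hedged addition: json.load will fail if save.txt does not exist yet (or is empty), so a first run needs a fallback. A sketch, assuming an empty dict is an acceptable default:
import json
import os

# Fall back to an empty dict when save.txt doesn't exist yet (first run)
if os.path.exists("save.txt"):
    with open("save.txt", "r") as savefile:
        python_dict = json.load(savefile)
else:
    python_dict = {}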

need to read text file line by line using python and get users data into a pandas dataframe

I need to read a text file line by line using Python and get the user data into a pandas DataFrame.
I tried the below:
import pandas as pd

y = 0
Name = []
Age = []
with open('file.txt', 'r') as fp:
    for line in fp:
        if line == "<USERDATA":
            row = True
            break
        else:
            l = line.split("=")[0]
            i = line.split("=")[-1]
            row = False
        if row == False:
            if "\tName" in l:
                Name.append(i)
            elif "\Age" in l:
                Age.append(i)
            else:
                pass
        else:
            pass
while 0 <= y < (len(Name)) - 1:
    z = {"Name": Nmae[y], "Age": Age[y]}
    y += 1
df = pd.DataFrame(z, columns=["Name", "Age"], index=None)
The file contents are somewhat like below (the original sample did not survive the text capture; a hypothetical reconstruction follows the answer below).
You have some logical issues; I have fixed them. I would encourage you to compare your code with mine, try to see the differences, and if you have any doubts, comment below.
import pandas as pd
import numpy as np

y = 0
Name = []
Age = []
z = {}
with open('file.txt', 'r') as fp:
    for line in fp:
        line = line.strip()
        if line == r'<USERDATA':
            row = True
            continue
        if line == r'<USEREND':
            if len(Age) < len(Name):
                Age.append(np.nan)  # only adding once since done at end of each row
            elif len(Name) < len(Age):
                Name.append(np.nan)
            continue
        else:
            l = line.split("=")[0].strip()
            i = line.split("=")[-1].strip()
            row = False
        if row == False:
            if l == 'Name':
                Name.append(i)
            elif l == 'Age':
                Age.append(i)
z = {"Name": Name, "Age": Age}
df = pd.DataFrame(z, columns=["Name", "Age"], index=None)
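Since the original sample is missing, here is a hypothetical reconstruction of the input format the code above expects, inferred purely from the parsing logic (marker lines <USERDATA and <USEREND delimiting each record, with "key = value" lines in between):
<USERDATA
    Name = Alice
    Age = 30
<USEREND
<USERDATA
    Name = Bob
<USEREND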

Python file write issue with Pandas

I wrote this Python script to search for unseen mail in a mailbox, download the xlsx attachment, make some modifications to it and then post it to another service.
Everything works perfectly, with just one issue:
In the original xlsx file there is a column named "zona" containing the Italian two-letter code for the province.
If this value is "NA" (the code for the province of NAPLES), the saved xlsx file has a blank cell instead of NA.
Is NA a reserved word, and if so, is there a way to quote it?
import os, email, imaplib, socket, requests
import http.client  # needed for the debuglevel line below
import pandas as pd

mail_user = os.environ.get('MAIL_USER')
mail_password = os.environ.get('MAIL_PASS')
mail_server = os.environ.get('MAIL_SERVER')
detach_dir = '.'
url = <removed url>

if mail_user is None or mail_password is None or mail_server is None:
    print('VARIABILI DI AMBIENTE NON DEFINITE')
    exit(1)

try:
    with imaplib.IMAP4_SSL(mail_server) as m:
        try:
            m.login(mail_user, mail_password)
            m.select("INBOX")
            resp, items = m.search(None, "UNSEEN")
            items = items[0].split()
            for emailid in items:
                resp, data = m.fetch(emailid, "(RFC822)")
                email_body = data[0][1]  # getting the mail content
                mail = email.message_from_bytes(email_body)  # parsing the mail content to get a mail object
                if mail.get_content_maintype() != 'multipart':
                    continue
                for part in mail.walk():
                    if part.get_content_maintype() == 'multipart':
                        continue
                    if part.get('Content-Disposition') is None:
                        continue
                    filename = part.get_filename()
                    if filename.endswith('.xlsx'):
                        att_path = os.path.join(detach_dir, filename)
                        fp = open(att_path, 'wb')
                        fp.write(part.get_payload(decode=True))
                        fp.close()
                        xl = pd.ExcelFile(att_path)
                        df1 = xl.parse(sheet_name=0)
                        df1 = df1.replace({'\'': ''}, regex=True)
                        df1.loc[df1['Prodotto'] == 'SP_TABLETA_SAMSUNG', 'Cod. ID.'] = 'X'
                        df1.loc[df1['Prodotto'] == 'AP_TLC', 'Cod. ID.'] = 'X'
                        df1.loc[df1['Prodotto'] == 'APDCMB00003', 'Cod. ID.'] = 'X'
                        df1.loc[df1['Prodotto'] == 'APDCMB03252', 'Cod. ID.'] = 'X'
                        writer = pd.ExcelWriter(att_path, engine='xlsxwriter')
                        df1.to_excel(writer, sheet_name='Foglio1', index=False)
                        writer.save()
                        uf = {'files': open(att_path, 'rb')}
                        http.client.HTTPConnection.debuglevel = 0
                        r = requests.post(url, files=uf)
                        print(r.text)
        except imaplib.IMAP4_SSL.error as e:
            print(e)
            exit(1)
except imaplib.IMAP4.error:
    print("Errore di connessione al server")
    exit(1)
It seems that pandas is treating the NA value as a NaN and therefore, when you write to Excel, it writes this value as '' by default (see docs).
You can pass na_rep='NA' to the to_excel() function in order to write it out as a string:
df1.to_excel(writer, sheet_name='Foglio1', index=False, na_rep='NA')
But as a precaution, keep an eye out: any other NaN values present in your df will also be written to the Excel file as 'NA'.
Reading the docs link posted by @Matt B., I found this solution:
df1 = xl.parse(sheet_name=0, keep_default_na=False, na_values=['_'])
If I understand correctly, only '_' is now interpreted as "not available".
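A self-contained illustration of that read-side switch (my example, using read_csv, which takes the same keep_default_na parameter as the parse call above):
import io
import pandas as pd

csv = io.StringIO("zona\nNA\nMI\n")
print(pd.read_csv(csv))                         # "NA" is parsed as NaN by default
csv = io.StringIO("zona\nNA\nMI\n")
print(pd.read_csv(csv, keep_default_na=False))  # "NA" stays the literal string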

Archive/pack a directory with contents as plain-text representation?

Under Linux / bash, how can I obtain a plain-text representation of a directory and its contents? (Note that by "plain text" here I mean UTF-8.)
In other words, how could I "pack" or "archive" a directory (with contents, including binary files) as a plain text file, such that I could "unpack" it later and obtain the same directory with its contents?
I was interested in this for a while, and I think I finally managed to cook up a script that works in both Python 2.7 and 3.4; however, I'd still like to know if there is something else that does the same. Here it is as a Gist (with some more comments):
https://gist.github.com/anonymous/1a68bf2c9134fd5312219c8f68713632
Otherwise, I'm posting a slightly abridged version here (below) for reference.
The usage is: to archive/pack into a .json text file:
python archdir2text-json.py -a /tmp > myarchdir.json
... and to unpack from the .json text file into the current (calling) directory:
python archdir2text-json.py -u myarchdir.json
Binary files are handled as base64.
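The base64 handling is a straightforward lossless round trip; a minimal illustration (my example):
import base64

data = b'\x89PNG\r\n\x1a\n'                        # arbitrary binary payload
encoded = base64.b64encode(data).decode('utf-8')   # JSON-safe ASCII text
assert base64.b64decode(encoded) == data           # decodes back byte-for-byte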
Here is the script:
archdir2text-json.py
#!/usr/bin/env python
import pprint, inspect
import argparse
import os
import stat
import errno
import base64
import codecs

class SmartDescriptionFormatter(argparse.RawDescriptionHelpFormatter):
    def _fill_text(self, text, width, indent):
        if text.startswith('R|'):
            paragraphs = text[2:].splitlines()
            rebroken = [argparse._textwrap.wrap(tpar, width) for tpar in paragraphs]
            rebrokenstr = []
            for tlinearr in rebroken:
                if (len(tlinearr) == 0):
                    rebrokenstr.append("")
                else:
                    for tlinepiece in tlinearr:
                        rebrokenstr.append(tlinepiece)
            return '\n'.join(rebrokenstr)
        return argparse.RawDescriptionHelpFormatter._fill_text(self, text, width, indent)

textchars = bytearray({7,8,9,10,12,13,27} | set(range(0x20, 0x100)) - {0x7f})
is_binary_string = lambda bytes: bool(bytes.translate(None, textchars))

cwd = os.getcwd()

if os.name == 'nt':
    import win32api, win32con

def folder_is_hidden(p):
    if os.name == 'nt':
        attribute = win32api.GetFileAttributes(p)
        return attribute & (win32con.FILE_ATTRIBUTE_HIDDEN | win32con.FILE_ATTRIBUTE_SYSTEM)
    else:
        return os.path.basename(p).startswith('.')  # linux-osx

def path_hierarchy(path):
    hierarchy = {
        'type': 'folder',
        'name': os.path.basename(path),
        'path': path,
    }
    try:
        cleared_contents = [contents
            for contents in os.listdir(path)
            if not(
                os.path.isdir(os.path.join(path, contents))
                and
                folder_is_hidden(os.path.join(path, contents))
            )]
        hierarchy['children'] = [
            path_hierarchy(os.path.join(path, contents))
            for contents in cleared_contents
        ]
    except OSError as e:
        if e.errno == errno.ENOTDIR:
            hierarchy['type'] = 'file'
        else:
            hierarchy['type'] += " " + str(e)
    if hierarchy['type'] == 'file':
        isfifo = stat.S_ISFIFO(os.stat(hierarchy['path']).st_mode)
        if isfifo:
            ftype = "fifo"
        else:
            try:
                data = open(hierarchy['path'], 'rb').read()
                ftype = "bin" if is_binary_string(data) else "txt"
                if (ftype == "txt"):
                    hierarchy['content'] = data.decode("utf-8")
                else:
                    hierarchy['content'] = base64.b64encode(data).decode("utf-8")
            except Exception as e:
                ftype = str(e)
        hierarchy['ftype'] = ftype
    return hierarchy

def recurse_unpack(inobj, relpath=""):
    if (inobj['type'] == "folder"):
        rpname = relpath + inobj['name']
        sys.stderr.write("folder name: " + rpname + os.linesep)
        os.mkdir(rpname)
        for tchild in inobj['children']:
            recurse_unpack(tchild, relpath=relpath+inobj['name']+os.sep)
    elif (inobj['type'] == "file"):
        rfname = relpath + inobj['name']
        sys.stderr.write("file name: " + rfname + os.linesep)
        if inobj['ftype'] == "txt":
            with codecs.open(rfname, "w", "utf-8") as text_file:
                text_file.write(inobj['content'])
        elif inobj['ftype'] == "bin":
            with open(rfname, "wb") as bin_file:
                bin_file.write(base64.b64decode(inobj['content']))

if __name__ == '__main__':
    import json
    import sys
    parser = argparse.ArgumentParser(formatter_class=SmartDescriptionFormatter, description="""R|Command-line App that packs/archives (and vice-versa) a directory to a plain-text .json file; should work w/ both Python 2.7 and 3.4
see full help text in https://gist.github.com/anonymous/1a68bf2c9134fd5312219c8f68713632""")
    parser.add_argument('input_paths', type=str, nargs='*', default=['.'],
                        help='Paths to files/directories to include in the archive; or path to .json archive file')
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('-a', '--archive', action='store_true', help="Interpret input_paths as paths to files/directories, and archive them to a .json file (output to stdout)")
    group.add_argument('-u', '--unpack', action='store_true', help="Interpret input_paths as path to an archive .json file, and unpack it in the current directory")
    args = parser.parse_args()
    if (args.archive):
        valid_input_paths = []
        for p in args.input_paths:
            if os.path.isdir(p) or os.path.exists(p):
                valid_input_paths.append(p)
            else:
                sys.stderr.write("Ignoring invalid input path: " + p + os.linesep)
        sys.stderr.write("Encoding input path(s): " + str(valid_input_paths) + os.linesep)
        path_hier_arr = [path_hierarchy(vp) for vp in valid_input_paths]
        outjson = json.dumps(path_hier_arr, indent=2, sort_keys=True, separators=(',', ': '))
        print(outjson)
    elif (args.unpack):
        valid_input_paths = []
        for p in args.input_paths:
            if os.path.isdir(p) or os.path.exists(p):
                valid_input_paths.append(p)
            else:
                sys.stderr.write("Ignoring invalid input path: " + p + os.linesep)
        for vp in valid_input_paths:
            with open(vp) as data_file:
                data = json.load(data_file)
            for datachunk in data:
                recurse_unpack(datachunk)
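As an aside on the text-vs-binary detection used above (is_binary_string): a file is treated as binary if any byte survives after deleting the "typical text" bytes (tab, newlines, ESC, and 0x20-0xFF minus DEL). For instance:
textchars = bytearray({7,8,9,10,12,13,27} | set(range(0x20, 0x100)) - {0x7f})
is_binary_string = lambda data: bool(data.translate(None, textchars))

print(is_binary_string(b'hello\n'))   # False: every byte is a text byte
print(is_binary_string(b'\x00abc'))   # True: the NUL byte marks it binary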

How to merge two lists at a delimited token in python3

I am a CS major at the University of Alabama; we have a project in our Python class and I am stuck, probably for some stupid reason, but I can't seem to find the answer.
Here is the link to the project, as it would be a pain to try and explain on here:
http://beastie.cs.ua.edu/cs150/projects/project1.html
here is my code:
import sys
from scanner import scan

def clInput():
    # Gets command line input
    log1 = sys.argv[1]
    log2 = sys.argv[2]
    name = sys.argv[3]
    if len(sys.argv) != 4:
        print('Incorrect number of arguments, should be 3')
        sys.exit(1)
    return log1, log2, name

def openFiles(log1, log2):
    # Opens sys.argv[1]&[2] for reading
    f1 = open(log1, 'r')
    f2 = open(log2, 'r')
    return f1, f2

def merge(log1, log2):
    # Merges parsed logs into list without '---'
    log1Parse = [[]]
    log2Parse = [[]]
    log1Count = 0
    log2Count = 0
    for i in log1:
        if i != ['---']:
            log1Parse[log1Count].append(i)
        else:
            log1Count += 1
            log1Parse.append([])
    for i in log2:
        if i != ['---']:
            log2Parse[log2Count].append(i)
        else:
            log2Count += 1
            log2Parse.append([])
    return(log1Parse[0] + log2Parse[0] + log1Parse[1] + log2Parse[1])

def searchMerge(name, merged):
    # Searches Merged list for sys.argv[3]
    for i in range(len(merged)):
        if (merged[i][1] == name):
            print(merged[i][0], merged[i][1], " ".join(merged[i][2:]))

def main():
    log1, log2, name = clInput()
    f1, f2 = openFiles(log1, log2)
    # Sets the contents of the two scanned files to variables
    tokens1 = scan(f1)
    tokens2 = scan(f2)
    # Call to merge and search
    merged = merge(tokens1, tokens2)
    searchMerge(name, merged)

main()
OK, so here's the problem. We are to merge two lists together into a sorted master list, delimited at the ---'s.
My two log files match the ones posted on the website I linked to above. This code works; however, if there are more than two instances of the ---'s in each list, it will not jump to the next list to get the other tokens, and so forth. I have it working for two with the merge function. At the end of that function I return
return(log1Parse[0] + log2Parse[0] + log1Parse[1] + log2Parse[1])
but this only works for two instances of ---. Is there any way I can change my return to look at all of the indexes instead of having to manually put in [0], [1], [2], etc.? I need it to delimit and merge for an arbitrary amount. Please help!
p.s. Disregard the noobness... I'm a novice, we all gotta start somewhere.
p.p.s. The from scanner import scan is a scanner I wrote to take in all of the tokens in a given list.
so.py:
import sys

def main():
    # check and load command line arguments
    if len(sys.argv) != 4:
        print('Incorrect number of arguments, should be 3')
        sys.exit(1)
    log1, log2, name = sys.argv[1], sys.argv[2], sys.argv[3]
    # open files using file io
    f1 = open(log1, 'r')
    f2 = open(log2, 'r')
    # list comprehension to process and filter log files
    l1 = [x.strip().split(" ", 2) for x in f1.readlines() if x.strip() != "---"]
    l2 = [x.strip().split(" ", 2) for x in f2.readlines() if x.strip() != "---"]
    f1.close()
    f2.close()
    sorted_merged_lists = sorted(l1 + l2)
    results = [x for x in sorted_merged_lists if x[1] == name]
    for result in results:
        print(result)

main()
CLI:
$ python so.py log1.txt log2.txt Matt
['12:06:12', 'Matt', 'Logged In']
['13:30:07', 'Matt', 'Opened Terminal']
['15:02:00', 'Matt', 'Opened Evolution']
['15:31:16', 'Matt', 'Logged Out']
docs:
http://docs.python.org/release/3.0.1/tutorial/datastructures.html#list-comprehensions
http://docs.python.org/release/3.0.1/library/stdtypes.html?highlight=strip#str.strip
http://docs.python.org/release/3.0.1/library/stdtypes.html?highlight=split#str.split
http://docs.python.org/release/3.0.1/library/functions.html?highlight=sorted#sorted
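As a hedged addendum to the literal question (how to avoid hard-coding [0], [1], ... in the return): the chunk lists can be interleaved pairwise for an arbitrary number of '---'-delimited chunks, for example with itertools.zip_longest. A sketch against the question's log1Parse/log2Parse structure, not part of the posted answer:
from itertools import zip_longest

def merge_chunks(log1Parse, log2Parse):
    # Interleave the '---'-delimited chunks pairwise, however many there are;
    # zip_longest pads the shorter list with empty chunks.
    merged = []
    for chunk1, chunk2 in zip_longest(log1Parse, log2Parse, fillvalue=[]):
        merged.extend(chunk1)
        merged.extend(chunk2)
    return merged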
