When I run this script I can verify that it loops through all of the values, but not all of them end up in my dictionary:
import re
import PyPDF2
file = open('path', 'rb')
readFile = PyPDF2.PdfFileReader(file)
lineData = {}
totalPages = readFile.numPages
for i in range(totalPages):
    pageObj = readFile.getPage(i)
    # extractText is a method, so call it to get the page text
    pageText = pageObj.extractText()
    newTrans = re.compile(r'Jan \d{2,}')
    for line in pageText.split('\n'):
        if newTrans.match(line):
            newValue = re.split(r'Jan \d{2,}', line)
            newValueStr = ' '.join(newValue)
            newKey = newTrans.findall(line)
            newKeyStr = ' '.join(newKey)
            print(newKeyStr + newValueStr)
            lineData[newKeyStr] = newValueStr
print(len(lineData))
There are 80+ data pairs, but when I run this the dict only ends up with 37.
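Duplicate keys, maybe? A dict keeps only one value per key, so every line that starts with the same date overwrites the previous entry for that date. Try making lineData = [] and appending to it with lineData.append({newKeyStr: newValueStr}), then check how many records you get.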
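To make the duplicate-key explanation concrete, here is a minimal sketch. The three sample lines are invented for illustration (they are not from the asker's PDF), and the variable names are placeholders. It stores the same matches in a dict, in a list, and in a dict of lists, so the counts can be compared.

```python
import re
from collections import defaultdict

# Invented sample lines; two of them share the key "Jan 05"
lines = [
    "Jan 05 COFFEE SHOP 4.50",
    "Jan 05 GROCERY 32.10",
    "Jan 06 FUEL 40.00",
]
date_pattern = re.compile(r'Jan \d{2,}')

as_dict = {}                  # one value per key: later lines overwrite earlier ones
as_list = []                  # keeps every record
grouped = defaultdict(list)   # keeps every value, grouped by date

for line in lines:
    match = date_pattern.match(line)
    if match:
        key = match.group()
        value = line[match.end():].strip()
        as_dict[key] = value
        as_list.append((key, value))
        grouped[key].append(value)

print(len(as_dict))   # 2
print(len(as_list))   # 3
```

If the records need to stay addressable by date, the dict of lists is one option; if order and completeness are all that matter, the plain list is probably enough.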
I'm reading a file called MissingItems.txt, which contains a list of bar codes and looks like this:
[3000000.0, 5000000.0, 6000000.0, 7000000.0, 8000000.0, 1234567.0, 1234568.0, 9876543.0, 3000001.0, 5000001.0, 6000001.0, 7000001.0, 8000001.0, 1234561.0, 1234561.0, 9876541.0, 6000002.0, 7000002.0, 8000002.0, 1234562.0, 1234562.0, 9876542.0,9876543.0,9876544.0]
I have replaced the square brackets and then split the line as below
OpenFile = open(r"G:MissingItems.txt","r")
for line in OpenFile:
    remove = line.replace('[','')
    remove1 = remove.replace(']','')
    plates = remove1.split(",")
    Plate1 = plates[0]
    Plate2 = plates[1]
    Plate3 = plates[2]
    Plate4 = plates[3]
    Plate5 = plates[4]
    Plate6 = plates[5]
    Plate7 = plates[6]
    Plate8 = plates[7]
    Plate9 = plates[8]
    Plate10 = plates[9]
    Plate11 = plates[10]
    Plate12 = plates[11]
    Plate13 = plates[12]
    Plate14 = plates[13]
    Plate15 = plates[14]
    Plate16 = plates[15]
    Plate17 = plates[16]
    Plate18 = plates[17]
    Plate19 = plates[18]
    Plate20 = plates[19]
    Plate21 = plates[20]
    Plate22 = plates[21]
    Plate23 = plates[22]
    Plate24 = plates[23]
Is there a way to remove the .0 from the bar codes, preferably before splitting? That way I would get '3000000' rather than '3000000.0'. I've tried to use replace, but I'm not sure how to get it to recognize that the .0 is at the end of each bar code.
This is one approach using ast.literal_eval and int.
Ex:
import ast
with open(r"G:MissingItems.txt","r") as infile:
    for line in infile:
        plates = [int(i) for i in ast.literal_eval(line.strip())]
        print(plates)
        # --> [3000000, 5000000, 6000000, 7000000, 8000000, 1234567, 1234568, 9876543, 3000001, 5000001, 6000001, 7000001, 8000001, 1234561, 1234561, 9876541, 6000002, 7000002, 8000002, 1234562, 1234562, 9876542, 9876543, 9876544]
Your file seems to have JSON formatted lines, so you could use a JSON parser:
import json
with open(r"G:MissingItems.txt","r") as openfile:
    for line in openfile:
        plate = json.loads(line)
        print(plate)
This makes plate a list of numbers (not strings), so the difference between 3000.0 and 3000 disappears (as they are representations of the same number). It is only when you would need to output them in a decimal representation that you would worry about the number of decimals to output.
Secondly, it is bad practice to create separate variables such as plate1, plate2, ... In such a scenario you should work with a list and access the values with plate[0], plate[1], ...
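Putting both points together, a minimal sketch might look like the following. It parses a shortened copy of the line from the question (held in a string rather than read from the file, so the snippet runs on its own), converts the floats to ints, and only produces strings at the point where text output is needed. The names plates and barcodes are placeholders.

```python
import json

# Shortened copy of the line from MissingItems.txt, used here instead of the file
line = "[3000000.0, 5000000.0, 1234567.0,9876543.0]"

# json.loads gives a list of floats; int() drops the trailing .0
plates = [int(value) for value in json.loads(line)]

# Access items by index instead of creating Plate1, Plate2, ...
print(plates[0])      # 3000000

# Convert to text only when a string representation is actually required
barcodes = [str(plate) for plate in plates]
print(barcodes[0])    # 3000000
```

This assumes every bar code in the file is a whole number; int() would silently truncate a value such as 1234567.5.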
def lire_fichier(nom_fichier):  # hypothetical name: the original def line was not posted
    fd = open(nom_fichier, 'r')
    liste_chaine = fd.readlines()
    liste_chaine2 = []
    for item in liste_chaine:
        if item not in "'noir\n','blanc\n','Humain\n', 'Ordinateur\n', 'False\n', 'True\n":
            liste_chaine2.append(item)
    liste_chaine2 = [i.replace('\n', '') for i in liste_chaine2]
    return liste_chaine2
['3,3,blanc', '3,4,noir', '4,3,noir', '4,4,blanc']
I am reading a file and trying to return a string output exactly like:
3,3,blanc
4,3,noir
3,4,white
I cleaned the file with the code above, but I still need to turn this list into the required output.
You can split your string and put it together again to meet your requirements:
string = '33blanc 34noir 43noir 44blanche'
# split on whitespace, then rebuild each item as "digit,digit,rest"
result = '\n'.join(['{},{},{}'.format(v[0], v[1], v[2:]) for v in string.split()])
print(result)
3,3,blanc
3,4,noir
4,3,noir
4,4,blanche
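The answer above starts from a space-separated string. If the input is instead the list shown in the question, whose items already contain the commas, a plain join on newlines may be all that is needed. A minimal sketch under that assumption (rows is a placeholder name for the list the asker's function returns):

```python
# The cleaned list from the question; each item is already "row,column,colour"
rows = ['3,3,blanc', '3,4,noir', '4,3,noir', '4,4,blanc']

# One item per line, with no brackets or quotes
result = '\n'.join(rows)
print(result)
```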
I'm trying to create a PRAW scraper that pulls the comments from a list of sub_ids, but it only returns the comment data for the last sub_id.
I'm guessing I must be overwriting something. I've looked through other questions but because I'm using PRAW it has specific parameters and I can't figure out what could/should be replaced.
sub_ids = ["2ypash", "7ocvlb", "7okxkf"]
for sub_id in sub_ids:
    submission = reddit.submission(id=sub_id)
    submission.comments.replace_more(limit=None, threshold=0)
comments = submission.comments.list()
commentlist = []
for comment in comments:
    commentsdata = {}
    commentsdata["id"] = comment.id
    commentsdata["subreddit"] = str(submission.subreddit)
    commentsdata["thread"] = str(submission.title)
    commentsdata["author"] = str(comment.author)
    commentsdata["body"] = str(comment.body)
    commentsdata["score"] = comment.score
    commentsdata["created_utc"] = datetime.datetime.fromtimestamp(comment.created_utc)
    commentsdata["parent_id"] = comment.parent_id
    commentlist.append(commentsdata)
Indentation was your downfall. Your code failed because comments was only assigned after the sub_ids loop had finished. So when you iterate through comments, you only see the last sub_id's comments.
First, move commentlist = [] out before both for loops (so that it's right after line 1).
Second, everything from comments = submission.comments.list() (inclusive) onward needs to be indented so that it runs inside the sub_ids loop.
Here is what it should look like in the end:
sub_ids = ["2ypash", "7ocvlb", "7okxkf"]
commentlist = []
for sub_id in sub_ids:
    submission = reddit.submission(id=sub_id)
    submission.comments.replace_more(limit=None, threshold=0)
    comments = submission.comments.list()
    for comment in comments:
        commentsdata = {}
        commentsdata["id"] = comment.id
        commentsdata["subreddit"] = str(submission.subreddit)
        commentsdata["thread"] = str(submission.title)
        commentsdata["author"] = str(comment.author)
        commentsdata["body"] = str(comment.body)
        commentsdata["score"] = comment.score
        commentsdata["created_utc"] = datetime.datetime.fromtimestamp(comment.created_utc)
        commentsdata["parent_id"] = comment.parent_id
        commentlist.append(commentsdata)
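The effect of that indentation difference can be reproduced without PRAW or a Reddit account. The sketch below uses an invented dictionary, fake_threads, as a stand-in for the submissions and their comments; nothing in it is part of the PRAW API.

```python
# Invented stand-in data: thread id -> list of comment ids
fake_threads = {
    "a": ["a1", "a2"],
    "b": ["b1"],
    "c": ["c1", "c2", "c3"],
}

# Buggy shape: the inner work sits after the outer loop has finished,
# so `comments` only holds the value from the final iteration
for thread_id in fake_threads:
    comments = fake_threads[thread_id]
buggy = []
for comment in comments:
    buggy.append(comment)

# Fixed shape: the result list is created once, and the inner loop
# runs inside the outer loop for every thread
fixed = []
for thread_id in fake_threads:
    comments = fake_threads[thread_id]
    for comment in comments:
        fixed.append(comment)

print(buggy)        # ['c1', 'c2', 'c3']
print(len(fixed))   # 6
```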
I have a binary data file that I would like to add a header to using Python. Below is the code I have to create the header, but I am unsure how to add it to the test.dat file.
import struct
import os
from struct import *
date = 20151027
version = 1
datatype = str.encode('P')
indextype = str.encode('I')
recct = int(os.path.getsize("H:\\test\\test.dat")/16)
delim = str.encode(' ')
filler = str.encode(' ')
delta = 'F'
pdate = pack('l', date)
pversion = pack('h', version)
pdatatype = pack('>s', datatype)
pindextype = pack('>s', indextype)
precct = pack('l', recct)
pdelim = pack('s', delim)
pfiller = pack('<2s', filler)
# join with + throughout; commas here would build a tuple instead of bytes
header = pdate+pversion+pdatatype+pindextype+precct+pdelim+pfiller
Read the file in, then write the file out with the header. Be sure to use binary mode:
with open(r'H:\test\test.dat','rb') as f:
    data = f.read()
with open(r'H:\test\test.dat','wb') as f:
    f.write(header + data)
Also, you can pack in one statement:
header = struct.pack('lhssls2s',date,version,datatype,indextype,recct,delim,filler)
str.encode('P') is an odd way of saying 'P'.encode() or just b'P'.
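As a self-contained sketch of the whole round trip, the example below builds the header in one pack call and prepends it to a file. Two things here are assumptions rather than part of the original post: it works on a temporary file instead of H:\test\test.dat, and it uses the '<' prefix (little-endian, standard sizes, no padding) so that the header is 15 bytes on every platform. Without a prefix, struct uses native sizes and alignment, so the length of 'l' and the padding between fields can differ between systems. The helper names build_header and prepend_header are hypothetical.

```python
import os
import struct
import tempfile

# Assumed layout: little-endian, standard sizes, no alignment padding
HEADER_FORMAT = '<lhssls2s'

def build_header(date, version, recct):
    """Pack the header fields from the question into one bytes object."""
    return struct.pack(HEADER_FORMAT, date, version, b'P', b'I', recct, b' ', b'  ')

def prepend_header(path, header):
    """Read the whole file, then rewrite it with the header in front."""
    with open(path, 'rb') as f:
        data = f.read()
    with open(path, 'wb') as f:
        f.write(header + data)

# Demo on a temporary file holding two 16-byte records of zeros
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.dat')
    with open(path, 'wb') as f:
        f.write(bytes(32))
    recct = os.path.getsize(path) // 16
    header = build_header(20151027, 1, recct)
    prepend_header(path, header)
    with open(path, 'rb') as f:
        contents = f.read()

print(len(header))   # 15
```

Reading the whole file into memory is fine for small files; for very large ones, writing the header to a new file and copying the data across in chunks is probably the safer choice.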
I have a function to load sounds, but not one for loading images. This is how my image loading is laid out currently:
if os.path.exists("themes/voltorb"):
    vgui = pygame.image.load("themes/voltorb/gui.png")
    voptions = pygame.image.load("themes/voltorb/options.png")
    vachievements = pygame.image.load("themes/voltorb/achievements.png")
    voverlay = pygame.image.load("themes/voltorb/overlay.png")
    vconfirm = pygame.image.load("themes/voltorb/confirm.png")
    vboom = pygame.mixer.Sound("themes/voltorb/boom.mp3")
    vcoin = pygame.mixer.Sound("themes/voltorb/coin.mp3")
    vtheme = {"gui":vgui,"options":voptions,"achievements":vachievements,"overlay":voverlay,"confirm":vconfirm,"coin":vcoin,"boom":vboom,"music":vmusic}
    themedb.update({"v":vtheme})
if os.path.exists("themes/fluttershy"):
    fcoin = pygame.mixer.Sound("themes/fluttershy/coin.mp3")
    fgui = pygame.image.load("themes/fluttershy/gui.png")
    foptions = pygame.image.load("themes/fluttershy/options.png")
    fachievements = pygame.image.load("themes/fluttershy/achievements.png")
    foverlay = pygame.image.load("themes/fluttershy/overlay.png")
    ftheme = {"gui":fgui,"options":foptions,"achievements":fachievements,"overlay":foverlay,"confirm":fconfirm,"coin":vcoin,"boom":vboom,"music":vmusic}
    themedb.update({"f":ftheme})
if os.path.exists("themes/mario"):
    mgui = pygame.image.load("themes/mario/gui.png")
    moptions = pygame.image.load("themes/mario/options.png")
    machievements = pygame.image.load("themes/mario/achievements.png")
    moverlay = pygame.image.load("themes/mario/overlay.png")
    mtheme = {"gui":mgui,"options":moptions,"achievements":machievements,"overlay":moverlay,"confirm":mconfirm,"coin":vcoin,"boom":vboom,"music":vmusic}
    themedb.update({"m":mtheme})
if os.path.exists("%appdata%/KWScripts/Voltorb/themes/secret1"):
    s1gui = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret1/gui.png")
    s1options = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret1/options.png")
    s1achievements = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret1/achievements.png")
    s1overlay = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret1/overlay.png")
    s1theme = {"gui":s1gui,"options":s1options,"achievements":s1achievements,"overlay":s1overlay,"confirm":s1confirm,"coin":vcoin,"boom":vboom,"music":vmusic}
    themedb.update({"s1":s1theme})
if os.path.exists("%appdata%/KWScripts/Voltorb/themes/secret2"):
    s2gui = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret2/gui.png")
    s2options = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret2/options.png")
    s2achievements = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret2/achievements.png")
    s2overlay = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret2/overlay.png")
    s2theme = {"gui":s2gui,"options":s2options,"achievements":s2achievements,"overlay":s2overlay,"confirm":s2confirm,"coin":s2coin,"boom":s2boom,"music":s2music}
    themedb.update({"s2":s2theme})
if os.path.exists("%appdata%/KWScripts/Voltorb/themes/secret3"):
    s3gui = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret3/gui.png")
    s3options = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret3/options.png")
    s3achievements = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret3/achievements.png")
    s3overlay = pygame.image.load("%appdata%/KWScripts/Voltorb/themes/secret3/overlay.png")
    s3theme = {"gui":s3gui,"options":s3options,"achievements":s3achievements,"overlay":s3overlay,"confirm":s3confirm,"coin":s3coin,"boom":s3boom,"music":s3music}
    themedb.update({"s3":s3theme})
I'm not sure if there's any easy way to do this, but I have the most difficult way typed already. If anyone has an idea of how to shorten this, then thanks!
Take all your images and put them in a dict, where the key is the variable you were using, and the value is the path:
vimages = {'vgui': "themes/voltorb/gui.png", 'voptions': "themes/voltorb/options.png", 'vachievements': "themes/voltorb/achievements.png"} # and so on...
Then, iterate through vimages, checking for the existence of each individual file, then calling pygame.image.load() on it, and store the result in your already-existing dict (vtheme, in this case).
This way, you don't need to keep writing out pygame.image.load() over and over again.
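One way to sketch that idea is a small helper that loops over the image names, checks each file, and loads the ones that exist. The helper name load_theme_images and the IMAGE_NAMES tuple are hypothetical, and the loader is passed in as a parameter so the snippet runs without pygame installed; in the game it would presumably be called with pygame.image.load.

```python
import os

# Image names taken from the question; every theme uses the same set
IMAGE_NAMES = ("gui", "options", "achievements", "overlay", "confirm")

def load_theme_images(theme_dir, load, names=IMAGE_NAMES):
    """Return {name: loaded image} for every <name>.png found in theme_dir.

    `load` is any callable that takes a path, e.g. pygame.image.load.
    """
    images = {}
    for name in names:
        path = os.path.join(theme_dir, name + ".png")
        if os.path.exists(path):
            images[name] = load(path)
    return images

# Intended use in the game (assumes pygame is imported and themedb exists):
#     themedb["v"] = load_theme_images("themes/voltorb", pygame.image.load)
#     themedb["f"] = load_theme_images("themes/fluttershy", pygame.image.load)
```

Files that are missing are simply skipped, so a theme without confirm.png ends up without a "confirm" key. Separately, Python does not expand "%appdata%" inside a string by itself; on Windows, wrapping such a path in os.path.expandvars is one way to resolve it before calling the helper.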