KeyError: 'AK' census2010.allData['AK']['Anchorage'] - python-3.x

'''
Reads the data from the Excel spreadsheet.
Counts the number of census tracts in each county.
Counts the total population of each county.
Prints the results.
This means your code will need to do the following:
Open and read the cells of an Excel document with the openpyxl module.
Calculate all the tract and population data and store it in a data structure.
Write the data structure to a text file with the .py extension using the pprint module.
'''
import openpyxl, os, pprint

os.chdir(r'C:\Python34')  # raw string so the backslash is not read as an escape
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb.get_sheet_by_name('Population by Census Tract')
CountyData = {}
for row in range(2, sheet.max_row + 1):
    # Each row in the spreadsheet holds the data for one census tract.
    state = sheet['B' + str(row)].value
    county = sheet['C' + str(row)].value
    pop = sheet['D' + str(row)].value
    # Make sure the keys for this state and county exist before updating them.
    CountyData.setdefault(state, {})
    CountyData[state].setdefault(county, {'tracts': 0, 'pop': 0})
    CountyData[state][county]['tracts'] += 1
    CountyData[state][county]['pop'] += int(pop)

print('Writing the results...')
resultFile = open('census2010.py', 'w')
resultFile.write('allData = ' + pprint.pformat(CountyData))
resultFile.close()
print('Done')
I can't deal with some KeyErrors. I made this program by following the instructions of "Project: Reading Data from a Spreadsheet" on this website: https://automatetheboringstuff.com/chapter12/
(I downloaded the excel file from here: https://www.nostarch.com/automatestuff/)
When I typed census2010.allData['AK']['Anchorage'], I got KeyError: 'AK'. I tried typing the other state abbreviations, but it didn't work either. Please help me out with this.
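One way to narrow this down is to check which keys the generated module really contains. The sketch below is only an illustration: the three sample rows, the module name census2010_demo and the temporary directory are made up so that it runs without the spreadsheet. It builds the nested dictionary the same way as the script above, writes it out with pprint.pformat, loads the file back and looks at the keys.

```python
import importlib.util
import os
import pprint
import tempfile

# Hypothetical sample rows standing in for spreadsheet columns B, C and D.
rows = [
    ('AK', 'Anchorage', 3503),
    ('AK', 'Anchorage', 5987),
    ('AL', 'Autauga', 1912),
]

county_data = {}
for state, county, pop in rows:
    # All four statements must sit inside the loop body.
    county_data.setdefault(state, {})
    county_data[state].setdefault(county, {'tracts': 0, 'pop': 0})
    county_data[state][county]['tracts'] += 1
    county_data[state][county]['pop'] += int(pop)

# Write the structure as an importable module, as the original script does.
module_path = os.path.join(tempfile.mkdtemp(), 'census2010_demo.py')
with open(module_path, 'w') as result_file:
    result_file.write('allData = ' + pprint.pformat(county_data))

# Load the generated file back and inspect which keys it really contains.
spec = importlib.util.spec_from_file_location('census2010_demo', module_path)
census_demo = importlib.util.module_from_spec(spec)
spec.loader.exec_module(census_demo)

state_keys = sorted(census_demo.allData.keys())
anchorage = census_demo.allData['AK']['Anchorage']
print(state_keys)
print(anchorage)
```

If printing the keys of the real census2010.allData shows an empty dictionary, or keys that are not state abbreviations, the likely suspects are the indentation of the loop body or an older census2010.py being imported from another directory. Both are guesses that the output of this check can confirm or rule out.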

Related

extracting data from multiple pdfs and putting that data into an excel table

I am taking data extracted from multiple pdfs that were merged into one pdf.
The data is based on clinical measurements taken from a sample at different time points. Some time points have certain measurement values while others are missing.
So far, I've been able to merge the pdfs, extract the text and specific data from the text, but I want to put it all into a corresponding excel table.
Below is my current code:
import PyPDF2
from PyPDF2 import PdfFileMerger
from glob import glob

#merge all pdf files in current directory
def pdf_merge():
    merger = PdfFileMerger()
    allpdfs = [a for a in glob("*.pdf")]
    [merger.append(pdf) for pdf in allpdfs]
    with open("Merged_pdfs1.pdf", "wb") as new_file:
        merger.write(new_file)

if __name__ == "__main__":
    pdf_merge()

#scan pdf
text =""
with open ("Merged_pdfs1.pdf", "rb") as pdf_file, open("sample.txt", "w") as text_file:
    read_pdf = PyPDF2.PdfFileReader(pdf_file)
    number_of_pages = read_pdf.getNumPages()
    for page_number in range(0, number_of_pages):
        page = read_pdf.getPage(page_number)
        text += page.extractText()
    text_file.write(text)

#turn text script into list, separated by newlines
def Convert(text):
    li = list(text.split("\n"))
    return li

li = Convert(text)
filelines = []
for line in li:
    filelines.append(line)
print(filelines)

#extract data from text and put into dictionary
full_data = []
test_data = {"Sample":[], "Timepoint":[],"Phosphat (mmol/l)":[], "Bilirubin, total (µmol/l)":[],
             "Bilirubin, direkt (µmol/l)":[], "Protein (g/l)":[], "Albumin (g/l)":[],
             "AST (U/l)":[], "ALT (U/l)":[], "ALP (U/l)":[], "GGT (U/l)":[], "IL-6 (ng/l)":[]}
for line2 in filelines:
    # For each data item, extract it from the line and strip whitespace
    if line2.startswith("Phosphat"):
        test_data["Phosphat (mmol/l)"].append(line2.split(" ")[-2].strip())
    if line2.startswith("Bilirubin,total"):
        test_data["Bilirubin, total (µmol/l)"].append(line2.split(" ")[-2].strip())
    if line2.startswith("Bilirubin,direkt"):
        test_data["Bilirubin, direkt (µmol/l)"].append(line2.split(" ")[-4].strip())
    if line2.startswith("Protein "):
        test_data["Protein (g/l)"].append( line2.split(" ")[-2].strip())
    if line2.startswith("Albumin"):
        test_data["Albumin (g/l)"].append(line2.split(" ")[-2].strip())
    if line2.startswith("AST"):
        test_data["AST (U/l)"].append(line2.split(" ")[-2].strip())
    if line2.startswith("ALT"):
        test_data["ALT (U/l)"].append(line2.split(" ")[-4].strip())
    if line2.startswith("Alk."):
        test_data["ALP (U/l)"].append(line2.split(" ")[-2].strip())
    if line2.startswith("GGT"):
        test_data["GGT (U/l)"].append(line2.split(" ")[-4].strip())
    if line2.startswith("Interleukin-6"):
        test_data["IL-6 (ng/l)"].append(line2.split(" ")[-4].strip())
    for sampnum in range(100):
        num = str(sampnum)
        sampletype = "T" and "H"
        if line2.startswith(sampletype+num):
            sample = sampletype+num
            test_data["Sample"]=sample
    for time in range(0,360):
        timepoint = str(time) + "h"
        word_list = list(line2.split(" "))
        for word in word_list:
            if word == timepoint:
                test_data["Timepoint"].append(word)
full_data.append(test_data)

import pandas as pd
df = pd.DataFrame(full_data)
df.to_excel("IKC4.xlsx", sheet_name="IKC", index=False)
print(df)
The issue is that I don't know how to move the individual items in each list into their own cells in Excel at the proper timepoint, since the list positions don't necessarily correspond to the right timepoint. For example, timepoints 1 and 3 can have protein measurements while timepoint 2 is missing them; the timepoint 3 measurement then sits at position 2 in the list and will likely land in the wrong row of the Excel table.
I figured maybe I need to make a separate dictionary for the timepoints and attach the corresponding measurements to the proper timepoint. I'm starting to get confused about how to do all this, though, and am now asking for help!
Thanks in advance :)
I tried adding an "else" clause after every "if" to append a "-" when a measurement wasn't present for that timepoint, but I got far too many dashes, since the loop iterates through every line of the entire PDF.
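The idea of a separate dictionary per timepoint could look roughly like the sketch below. It assumes the parser can already tell which timepoint each value belongs to (for example by remembering the last timepoint seen while reading lines); the parsed tuples and their values are hypothetical, not taken from the real PDFs.

```python
# Hypothetical parsed results: one (timepoint, measurement, value) tuple per
# value found in the PDF text, in the order the lines were read.
parsed = [
    ("0h", "Protein (g/l)", "61"),
    ("0h", "Albumin (g/l)", "35"),
    ("24h", "Albumin (g/l)", "33"),
    ("48h", "Protein (g/l)", "58"),
]
columns = ["Protein (g/l)", "Albumin (g/l)"]

# Group values under the timepoint they belong to instead of appending
# them to one flat list per measurement.
by_timepoint = {}
for timepoint, name, value in parsed:
    by_timepoint.setdefault(timepoint, {})[name] = value

# Build one row per timepoint; a missing measurement becomes "-".
table = []
for timepoint, values in by_timepoint.items():
    row = {"Timepoint": timepoint}
    for name in columns:
        row[name] = values.get(name, "-")
    table.append(row)

for row in table:
    print(row)
```

Because the placeholder is filled in once per timepoint rather than once per line of text, each row gets exactly one "-" per missing measurement. A list of dictionaries like table can be passed to pd.DataFrame(table) and written out with to_excel as in the code above.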

How to store the result from loop into the variable or the list

I have the code below that runs on the active excel sheet to check specified cells.
If the specified cell shows "Fail", it will print out the failed person and the time.
import xlwings as xw
import xlrd

def check_result():
    sheet = xw.books.active.sheets.active
    for x in range(1, 5):
        if sheet['B' + str(x)].value =="Fail":
            print(sheet['A' + str(x)].value, xlrd.xldate_as_datetime(sheet['C' + str(x)].value, 0))

check_result()
Sample Data
How can I save this printed result into the variable or the list?
The excel file (.xlsm) is connecting to the third party software, and this file needs to be opened to generate the data.
From the print statement, it looks like there are three outputs. You can use a list (or a list of tuples) to save the printed data like this.
With List
List = []
List.append([sheet['A' + str(x)].value, xlrd.xldate_as_datetime(sheet['C' + str(x)].value, 0)])
NOTE: define List outside of your for loop, and call append inside the if block in place of (or next to) the print.
Let me know if there is any problem with the code.
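Put together, the pattern might look like the sketch below. To keep it runnable without Excel, the worksheet is replaced by a hypothetical list of rows, and the date conversion is a hand-written stand-in for xlrd.xldate_as_datetime that assumes the 1900 date system.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the worksheet: (name, result, Excel serial date).
rows = [
    ("Alice", "Pass", 44197.5),
    ("Bob", "Fail", 44198.25),
    ("Carol", "Fail", 44199.0),
]

def excel_serial_to_datetime(serial):
    # Assumes the 1900 date system and a date after February 1900.
    return datetime(1899, 12, 30) + timedelta(days=serial)

def check_result(rows):
    failed = []  # defined once, before the loop
    for name, result, serial in rows:
        if result == "Fail":
            failed.append((name, excel_serial_to_datetime(serial)))
    return failed  # hand the collected results back to the caller

failed = check_result(rows)
print(failed)
```

Returning the list rather than printing inside the function lets the caller decide what to do with the failed entries, such as printing them, writing them to another sheet, or counting them.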

Trying to compare two integers in Python

Okay, I have been digging through Stack Overflow and other sites trying to understand why this is not working. I created a function to open a CSV file. The function opens the file once to count the number of rows, then again to actually process the file. What I am attempting to do is this: once a file has been processed and the record counts match, I will load the data into a database. The problem is that the record counts are not matching. I checked both variables and they are both 'int', so I do not understand why '==' is not working for me. Here is the function I created:
import csv
import fnmatch

def mktdata_import(filedir):
    '''
    This function is used to import market data
    '''
    files = []
    files = filedir.glob('*.csv')
    for f in files:
        if fnmatch.fnmatch(f,'*NASDAQ*'):
            num_rows = 0
            nasObj = []
            # First pass: count every line in the file, header included
            with open(f,mode='r') as nasData:
                nasIn = csv.DictReader(nasData, delimiter=',')
                recNum = sum(1 for _ in nasData)
            # Second pass: read the records
            with open(f,mode='r') as nasData:
                nasIn = csv.DictReader(nasData, delimiter=',')
                for record in nasIn:
                    if (recNum - 1) != num_rows:
                        num_rows += 1
                        nasObj.append(record)
                    elif(recNum - 1) == num_rows:
                        print('Add records to database')
                    else:
                        print('All files have been processed')
            print('{} has this many records: {}'.format(f, num_rows))
            print(type(recNum))
            print(type(num_rows))
        else:
            print("Not a NASDAQ file!")
(moving comment to answer)
nasData includes all the rows in the file, including the header row. When the data is converted to dictionaries with DictReader, only the data rows are yielded, so the raw line count (recNum) will always be one more than the number of records read from nasIn.
As the OP mentioned, counting the iterated records did not work, so using the reader's line number was required to get the script working: (recNum) == nasIn.line_num
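A minimal sketch of that comparison is shown below. The file name and its contents are hypothetical, and the check is done after the loop has finished rather than inside it, which is one way to avoid the branch that never fires in the question's code.

```python
import csv
import os
import tempfile

# Hypothetical sample file: one header row plus three data rows.
path = os.path.join(tempfile.mkdtemp(), "NASDAQ_demo.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["symbol", "close"])
    writer.writerows([["AAPL", "1"], ["MSFT", "2"], ["GOOG", "3"]])

# First pass: count every physical line, header included.
with open(path, newline="") as f:
    rec_num = sum(1 for _ in f)

# Second pass: collect the records, then compare once the loop is done.
with open(path, newline="") as f:
    reader = csv.DictReader(f)
    records = [record for record in reader]
    lines_read = reader.line_num  # lines consumed by the reader, header included

counts_match = (rec_num == lines_read) and (len(records) == rec_num - 1)
print(rec_num, lines_read, len(records), counts_match)
```

Note that line_num counts physical lines, so this comparison only holds for files without quoted fields that span several lines.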

Read file and output specific fields to CSV file

I'm trying to search for data based on a key word and export that data to an Excel or text file.
When I "print" the variable/list it works no problem. When I try and output the data to a file it only outputs the last entry. I think something is wrong with the iteration, but I can't figure it out.
import xlsxwriter

#Paths
xls_output_path = 'C:\\Data\\'
config = 'C:\\Configs\\filename.txt'
excel_inc = 0 #used to increment the excel columns so not everything is written in "A1"
lines = open(config,"r").read().splitlines()
search_term = "ACL"
for i, line in enumerate(lines):
    if search_term in line:
        split_lines = line.split(' ') #Split lines via a space.
        linebefore = lines[i - 1] #Print the line before the search term
        linebefore_split = linebefore.split(' ') #Split the line before via space
        from_obj = linebefore_split[2] #[2] holds the data I need
        to_object = split_lines[4] #[4] holds the data I need
        print(len(split_lines)) #Prints each found line with no problem.
        excel_inc = excel_inc + 1 #Increments for column A so not all of the data is placed in A1
        excel_inc_str = str(excel_inc) #Change type to string so it can concatenate.
        workbook = xlsxwriter.Workbook(xls_output_path + 'Test.xlsx') #Creates the xls file
        worksheet = workbook.add_worksheet()
        worksheet.write('A' + excel_inc_str, split_lines[4]) #Write data from split_lines[4] to column A
workbook.close()
I created this script so it will go and find all lines in the "config" file with the keyword "ACL".
It then has the ability to print the line before and the actual line the data is found. This works great.
My next step is outputting the data to an excel spreadsheet. This is where I get stuck.
The script only writes the very last item, in column A, row 10.
I need help figuring out why it prints the data correctly but won't output all of it to an Excel spreadsheet or even a .txt file.
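Try this - I moved your workbook and worksheet definitions outside the loop, so they don't keep getting redefined. Creating the Workbook inside the loop starts a new, empty Test.xlsx for every match, so only the last write ends up in the saved file.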
import xlsxwriter

#Paths
xls_output_path = 'C:\\Data\\'
config = 'C:\\Configs\\filename.txt'
excel_inc = 0 #used to increment the excel columns so not everything is written in "A1"
lines = open(config,"r").read().splitlines()
search_term = "ACL"
workbook = xlsxwriter.Workbook(xls_output_path + 'Test.xlsx') #Creates the xls file
worksheet = workbook.add_worksheet()
for i, line in enumerate(lines):
    if search_term in line:
        split_lines = line.split(' ') #Split lines via a space.
        linebefore = lines[i - 1] #Print the line before the search term
        linebefore_split = linebefore.split(' ') #Split the line before via space
        from_obj = linebefore_split[2] #[2] holds the data I need
        to_object = split_lines[4] #[4] holds the data I need
        print(len(split_lines)) #Prints each found line with no problem.
        excel_inc = excel_inc + 1 #Increments for column A so not all of the data is placed in A1
        excel_inc_str = str(excel_inc) #Change type to string so it can concatenate.
        worksheet.write('A' + excel_inc_str, split_lines[4]) #Write data from split_lines[4] to column A
workbook.close()
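The same "open once, write many times" idea applies to the .txt output mentioned in the question. The sketch below is only an illustration: the config lines and the output file name are hypothetical, and the field positions [2] and [4] simply mirror the indexes used above.

```python
import os
import tempfile

# Hypothetical config lines; index [2] of the line before and index [4] of
# the matching line hold the values of interest, as in the question.
lines = [
    "name from SRC_A",
    "ACL permit ip to DEST_1",
    "name from SRC_B",
    "ACL permit ip to DEST_2",
]
search_term = "ACL"

# Collect the matches first.
matches = []
for i, line in enumerate(lines):
    if search_term in line and i > 0:  # skip a match on the first line: it has no line before
        from_obj = lines[i - 1].split(' ')[2]
        to_object = line.split(' ')[4]
        matches.append((from_obj, to_object))

# Open the output file once, outside the loop, and write every match to it.
out_path = os.path.join(tempfile.mkdtemp(), "acl_demo.txt")
with open(out_path, "w") as out_file:
    for from_obj, to_object in matches:
        out_file.write(from_obj + "," + to_object + "\n")

with open(out_path) as check:
    written = check.read().splitlines()
print(written)
```

Opening the file in "w" mode inside the loop would truncate it on every match, which produces the same "only the last entry" symptom as recreating the workbook.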

Creating a dictionary from one excel workbook, matching the keys with another workbook, paste values

I hope someone can provide a little help. I'm attempting to pull data from one Excel workbook, titled DownTime, and create a dictionary of coil (product) numbers matched with the "codes" each coil has experienced. I have been able to accomplish this part; it's pretty straightforward.
The part that is tripping me up, is how to match the coil numbers with a different excel workbook, and paste in the corresponding "codes".
So here is what I have so far:
import openpyxl
from collections import defaultdict

DT = openpyxl.load_workbook('DownTime.xlsm')
bl2 = DT.get_sheet_by_name('BL2')
CS = openpyxl.load_workbook('CoilSummary.xlsm')
line = CS.get_sheet_by_name('BL2')
#opening needed workbooks with specific worksheets
coil =[]
rc = []
code = defaultdict(set)
cnum = ''
next_row = 2
col = 32
for row in range(2, bl2.max_row + 1):
    coil = bl2['K' + str(row)].value
    rc = bl2['D' + str(row)].value
    code[coil].add(rc)
# Creating a dictionary that represents each coil with corresponding codes
for key,value in code.items():
    cnum = line['B' + str(row)].value
    if cnum == key:
        line.write(next_row, col, value)
        next_row+=1
# Attempting to match coil numbers with dictionary and column B
# if the key is present, paste the value in column AF
CS.close()
DT.close()
A sample output of the dictionary looks as follows:
defaultdict(<class 'set'>, {'M30434269': {106, 107, 173}, 'M30434270': {132, 424, 106, 173, 188}, 'M30434271': {194, 426, 202, 106, 173}})
Only there are about 22,000 entries.
So to reiterate what I want to accomplish:
I want to take this dictionary that I made from the workbook DownTime, match the keys with a column in CoilSummary, and if the keys match the cell entry, paste the value into a blank cell at the end of the table.
Example:
"CoilNum"   "Date"                "Shift"  "info1"  "info2"  "Code"
M30322386   03/03/2017 06:48:30   3        1052     1722     ' '
M30322390   03/03/2017 05:18:26   3        703      1662     ' '
I would like to match the "CoilNum" with the keys in the dictionary, and paste the values into "Code".
I hope I explained that well enough. Any help with the code, or point to a website for reference, would be very much appreciated. I just don't want to have to type all of these codes in by hand!
Thank you!
After much research and trial and error, accidentally corrupting excel files and getting generally frustrated with python and excel, I figured it out. Here is what I have:
# -*- coding: utf-8 -*-
# importing tools needed for the code to work
import pyexcel as pe
from collections import defaultdict
import openpyxl as op

coil =''
rc = {}
code = defaultdict(list)
next_row = 2
col = 33
cnum = []
temp = ''

def write_data(code,cnum):
    ''' Used to open a given sheet in a workbook. The code will then compare values
    collected from one column in a specific sheet referred to as "coils" and compares it to a dictionary where the key's are also "coils."
    If the coil number matches, the code will then paste the values in a new workbook. From here the values can be copied by hand and pasted into the excel file of choice.'''
    sheet = pe.get_sheet(file_name="CoilSummaryTesting.xlsx")
    next_row = 2
    lst = []
    while next_row <= len(cnum):
        for key in code.keys():
            for step in cnum:
                if str(step) == str(key):
                    for val in code.values():
                        temp = val
                        lst.append(temp)
                        next_row+=1
                if step!=key:
                    break
            break
    for item in lst:
        sublist = (" ").join(str(item))
        sheet.row+= [sublist]
    sheet.save_as("CoilSummaryTest.xlsx")
    print("\nCoils Compared: ",next_row)

def open_downtime():
    ''' Pull data from a second excel file to obtain the coil numbers with corresponding downtime codes'''
    DT = op.load_workbook('DownTime.xlsm')
    bl2 = DT.get_sheet_by_name('BL2')
    n = 1
    for row in bl2.iter_cols(min_col=11,max_col=11):
        for colD in row:
            # closing parenthesis of append() was missing in the original post
            code[colD.offset(row=1,column=0).value].append(colD.offset(row=1,column=-7).value)
            n+=1
    print('\nNumber of rows in DownTime file: ',n)
    return code

def open_coil():
    '''Opens the first workbook and sheet to know how many rows are needed for coil comparision.'''
    i = 1
    CSR = op.load_workbook('CoilSummaryTesting.xlsx')
    line_read = CSR.get_sheet_by_name('BL2')
    for rows in line_read.iter_cols(min_col=2, max_col=2):
        for col in rows:
            cnum.append(col.offset(row=1,column=0).value)
            i+=1
    print('\nNumber of rows in CoilSummary file: ',i)
    return write_data(open_downtime(),cnum)

def main():
    sheet = open_coil()

if __name__ == "__main__":
    main()
I understand this is probably not the shortest version of this code, and there are probably a lot of ways to get it to paste directly into the Excel file of my choice, but I couldn't figure that part out yet.
What I did differently was use pyexcel. This proved to be the easiest option when it came to just pasting values into rows or columns. Using join, I broke the generated list of lists up so that each sublist could be inserted in its own row. For now I have settled on saving the generated rows to a different Excel workbook, because I kept corrupting workbooks during this exploration; however, if anyone knows how to change this code to eliminate the last step of copying the rows and pasting them into the desired workbook, please let me know.
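For the matching step itself, one dictionary lookup per summary row is enough, with no nested loops. The sketch below uses plain lists in place of the two worksheets so that it runs on its own; the third summary row and all variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical DownTime rows: (coil number, code).
downtime_rows = [
    ("M30322386", 106),
    ("M30322386", 173),
    ("M30322390", 132),
]
# Hypothetical CoilSummary rows; the last column is the empty "Code" cell.
summary_rows = [
    ["M30322386", "03/03/2017 06:48:30", 3, 1052, 1722, ""],
    ["M30322390", "03/03/2017 05:18:26", 3, 703, 1662, ""],
    ["M30322399", "03/03/2017 04:02:11", 3, 880, 1610, ""],
]

# Build the coil -> codes dictionary, as in the question.
codes = defaultdict(set)
for coil, rc in downtime_rows:
    codes[coil].add(rc)

# Look each summary coil up in the dictionary and fill in its "Code" column.
for row in summary_rows:
    coil = row[0]
    if coil in codes:
        row[-1] = ", ".join(str(c) for c in sorted(codes[coil]))

for row in summary_rows:
    print(row)
```

With openpyxl, the same loop would read the coil number from column B of each row and assign the joined string to the target cell, for example through sheet.cell(row=..., column=...).value, followed by a save under a new file name so the original workbook stays untouched. Whether that round-trips the .xlsm files cleanly would need testing; load_workbook has a keep_vba option that may be relevant there.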
