Read out .csv and hand results to a dictionary - python-3.x

I am learning some coding, and I am stuck with an error I can't explain. Basically I want to read out a .csv file with birth statistics from the US to figure out the most popular name in the time recorded.
My code looks like this:
# 0:Id, 1: Name, 2: Year, 3: Gender, 4: State, 5: Count
names = {} # initialise dict names
maximum = 0 # store for maximum
l = []
with open("Filepath", "r") as file:
    for line in file:
        l = line.strip().split(",")
        try:
            name = l[1]
            if name in names:
                names[name] = int(names[name]) + int(l(5))
            else:
                names[name] = int(l(5))
        except:
            continue
print(names)
max(names)
def max(values):
    for i in values:
        if names[i] > maximum:
            names[i] = maximum
        else:
            continue
    return(maximum)
print(maximum)
It seems like the dictionary does not take any values at all, since the print command does not return anything. Where did I go wrong? (Incidentally, the filepath is correct; it takes a while to get the result since the .csv is quite big.) So my assumption is that I somehow made a mistake writing into the dictionary, but I have been staring at the code for a while now and I don't see it!

A few suggestions to improve your code:
names = {} # initialise dict names
maximum = 0 # store for maximum
with open("Filepath", "r") as file:
    for line in file:
        l = line.strip().split(",")
        try:
            name = l[1]
            names[name] = names.get(name, 0) + int(l[5])
        except (IndexError, ValueError):
            continue # skip the header row and malformed lines
maximum = [(v, k) for k, v in names.items()]
maximum.sort(reverse=True)
print(maximum[0])
You will want to look into Python dictionaries and learn about get. It lets you build your names dictionary in fewer lines of code (more Pythonic).
Also, you used def to define a function, but the max(names) call comes before that definition, so it runs the built-in max and discards the result. Your own function is never called.
I propose the shortened code above. Ask if you have questions!
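One more thing worth checking in the question's code, offered as a guess since no traceback was posted: `l(5)` calls the list instead of indexing it, which raises a TypeError, and the bare `except` then skips every line without a message. A minimal sketch of that effect follows; the `count_names` helper and the sample rows are made up for illustration and are not from the original post.

```python
# Hypothetical sample row in the layout Id, Name, Year, Gender, State, Count.
fields = "1,Mary,1910,F,AK,14".split(",")

# Parentheses call the list; they do not index it.
try:
    fields(5)
except TypeError as exc:
    call_error = str(exc)


def count_names(lines):
    """Sum the Count column per name, catching only the expected errors."""
    names = {}
    skipped = 0
    for line in lines:
        parts = line.strip().split(",")
        try:
            names[parts[1]] = names.get(parts[1], 0) + int(parts[5])
        except (IndexError, ValueError):
            # Header row or malformed line: count it instead of hiding it.
            skipped += 1
    return names, skipped


# Hypothetical data, including a header row that cannot be converted to int.
sample = [
    "Id,Name,Year,Gender,State,Count",
    "1,Mary,1910,F,AK,14",
    "2,Anna,1910,F,AK,12",
    "3,Mary,1911,F,AK,9",
]
totals, skipped = count_names(sample)
print(call_error)
print(totals, skipped)
```

Catching only IndexError and ValueError means a typo such as `l(5)` surfaces as a visible TypeError instead of producing an empty dictionary.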

Figured it out.
I think there were a few flow issues: I called a function before defining it... is that an issue or is python okay with that?
Also I think I used max as a name for a variable, but there is a built-in function with the same name, that might cause an issue I guess?! Same with value
This is my final code:
names = {} # initialise dict names
l = []
def maxval(val):
    maxname = max(val.items(), key=lambda x : x[1])
    return maxname
with open("filepath", "r") as file:
    for line in file:
        l = line.strip().split(",")
        name = l[1]
        try:
            names[name] = names.get(name, 0) + int(l[5])
        except:
            continue
        #print(str(l))
#print(names)
print(maxval(names))
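As an alternative sketch of the same idea, the standard library's `csv.reader` and `collections.Counter` can do the splitting and the tallying. The `most_popular` function name and the in-memory sample data are assumptions for illustration; the column layout follows the comment in the question (name in column 1, count in column 5).

```python
import csv
import io
from collections import Counter


def most_popular(csvfile):
    """Return (name, total) for the name with the highest summed count."""
    totals = Counter()
    reader = csv.reader(csvfile)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 6:
            continue  # ignore blank or truncated lines
        totals[row[1]] += int(row[5])
    return totals.most_common(1)[0] if totals else None


# Hypothetical sample standing in for the real file.
sample = io.StringIO(
    "Id,Name,Year,Gender,State,Count\n"
    "1,Mary,1910,F,AK,14\n"
    "2,Anna,1910,F,AK,12\n"
    "3,Mary,1911,F,AK,9\n"
)
print(most_popular(sample))
```

With a real file, the call would be wrapped in `with open(path, newline="") as f:` and `f` passed in place of the `StringIO` object.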

Related

CSV manipulation problem. A little complex and would like the solution to not be using pandas

CSV file:
Acct,phn_1,phn_2,phn_3,Name,Consent,zipcode
1234,45678,78906,,abc,NCN,10010
3456,48678,,78976,def,NNC,10010
Problem:
Based on consent value which is for each of the phones (in 1st row: 1st N is phn_1, C for phn_2 and so on) I need to retain only that phn column and move the remaining columns to the end of the file.
The below is what I have. My approach isn't that great is what I feel. I'm trying to get the id of the individual Ns and Cs, get the id and map it with the phone (but I'm unable to iterate through the phn headers and compare the id's of the Ns and Cs)
with open('file.csv', 'rU') as infile:
    reader = csv.DictReader(infile)
    data = {}
    for row in reader:
        for header, value in row.items():
            data.setdefault(header, list()).append(value)
    # print(data)
    Consent = data['Consent']
    for i in range(len(Consent)):
        # print(list(Consent[i]))
        for idx, val in enumerate(list(Consent[i])):
            # print(idx, val)
            if val == 'C':
                #print("C")
                print(idx)
            else:
                print("N")
Could someone provide me with the solution for this?
Please Note: Do not want the solution to be by using pandas.
You’ll find my answer in the comments of the code below.
import csv
def parse_csv(file_name):
    """Return the CSV content as a string, with the consented phone number in a "phn" column."""
    # Prepare the output. Note that all rows of a CSV file must have the same structure.
    # So it is actually not possible to put the phone numbers with no consent at the end
    # of the file, but what you can do is to put them at the end of the row.
    # To ensure that the structure is the same on all rows, you need to put all phone numbers
    # at the end of the row. That means the phone number with consent is duplicated, and that
    # is not very efficient.
    # I chose to put the result in a string, but you can use other types.
    output = "Acct,phn,Name,Consent,zipcode,phn_1,phn_2,phn_3\n"
    with open(file_name, "r") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            # Search the letter “C” in “Consent” and get the position of the first match.
            # Add one to the result because the “phn_×” keys are 1-based and not 0-based.
            first_c_pos = row["Consent"].find("C") + 1
            # If there is no “C”, then the “phn” key is empty.
            if first_c_pos == 0:
                row["phn"] = ""
            # If there is at least one “C”, create a key string that will take the values
            # phn_1, phn_2 or phn_3.
            else:
                key = f"phn_{first_c_pos}"
                row["phn"] = row[key]
            # Add the current row to the result string.
            output += ",".join([
                row["Acct"], row["phn"], row["Name"], row["Consent"],
                row["zipcode"], row["phn_1"], row["phn_2"], row["phn_3"]
            ])
            output += "\n"
    # Return the string.
    return output
if __name__ == "__main__":
    output = parse_csv("file.csv")
    print(output)
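A variation on the answer above, sketched under the same assumption that only the first "C" in Consent matters: writing through `csv.DictWriter` instead of joining strings by hand, so that fields containing commas or quotes would be escaped correctly. The `reorder_by_consent` name is hypothetical, and the sample rows are the two from the question.

```python
import csv
import io

# Output column order: consented phone first, original phone columns last.
FIELDS = ["Acct", "phn", "Name", "Consent", "zipcode", "phn_1", "phn_2", "phn_3"]


def reorder_by_consent(infile, outfile):
    """Copy rows from infile to outfile, adding a 'phn' column for the consented phone."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        pos = row["Consent"].find("C")  # 0-based position of the first "C", or -1
        row["phn"] = row[f"phn_{pos + 1}"] if pos != -1 else ""
        writer.writerow(row)


source = io.StringIO(
    "Acct,phn_1,phn_2,phn_3,Name,Consent,zipcode\n"
    "1234,45678,78906,,abc,NCN,10010\n"
    "3456,48678,,78976,def,NNC,10010\n"
)
result = io.StringIO()
reorder_by_consent(source, result)
print(result.getvalue())
```

For real files, both handles would be opened with `newline=""`, as the csv module documentation recommends.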

Trying to compare two integers in Python

Okay, I have been digging through Stack Overflow and other sites trying to understand why this is not working. I created a function to open a csv file. The function opens the file once to count the number of rows, then again to actually process the file. What I am attempting to do is this: once a file has been processed and the record counts match, I will load the data into a database. The problem is that the record counts are not matching. I checked both variables and they are both 'int', so I do not understand why '==' is not working for me. Here is the function I created:
def mktdata_import(filedir):
    '''
    This function is used to import market data
    '''
    files = []
    files = filedir.glob('*.csv')
    for f in files:
        if fnmatch.fnmatch(f,'*NASDAQ*'):
            num_rows = 0
            nasObj = []
            with open(f,mode='r') as nasData:
                nasIn = csv.DictReader(nasData, delimiter=',')
                recNum = sum(1 for _ in nasData)
            with open(f,mode='r') as nasData:
                nasIn = csv.DictReader(nasData, delimiter=',')
                for record in nasIn:
                    if (recNum - 1) != num_rows:
                        num_rows += 1
                        nasObj.append(record)
                    elif(recNum - 1) == num_rows:
                        print('Add records to database')
                    else:
                        print('All files have been processed')
                print('{} has this many records: {}'.format(f, num_rows))
                print(type(recNum))
                print(type(num_rows))
        else:
            print("Not a NASDAQ file!")
(moving comment to answer)
nasData includes all the rows in the file, including the header row. When converting the data to dictionaries with DictReader, only the data rows are processed, so the number of lines counted from nasData will always be one more than the number of records yielded by nasIn.
As the OP mentioned, counting the records while iterating did not work, so comparing against the reader's line number was required to get the script working: (recNum) == nasIn.line_num
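The off-by-one between raw lines and DictReader records can be seen in a small sketch. The CSV text here is invented sample data, and `io.StringIO` stands in for the two `open()` calls in the question.

```python
import csv
import io

# Hypothetical sample: one header line plus two data lines.
text = "symbol,price\nAAPL,1\nMSFT,2\n"

# Counting raw lines includes the header.
line_count = sum(1 for _ in io.StringIO(text))

# DictReader consumes the header as field names and yields only data rows.
reader = csv.DictReader(io.StringIO(text))
records = list(reader)

print(line_count)       # raw lines, header included
print(len(records))     # data rows only
print(reader.line_num)  # lines read from the source so far
```

After the reader is exhausted, `reader.line_num` equals the raw line count for a file like this one with no multi-line quoted fields, which is why the comparison against `recNum` works once the loop is done.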

Is there a way to pass variable as counter to list index in python?

Sorry if this is a very basic question, but I am new to Python and need help with the problem below.
I am trying to write a file parser that counts the number of occurrences (modified programs) mentioned in a file.
I then store each occurrence in an initially empty list and keep a counter of the occurrences.
Up to here everything works.
Now I am trying to create files named after the entries captured in the list, and to store the lines between two matches in a separate file per entry. But I get an "index out of range" error: when I pass el[count], it seems to take count as a string rather than using count's value.
Can someone help?
import sys
import re
count =1
j=0
k=0
el=[]
f = open("change_programs.txt", 'w+')
data = open("oct-released_diff.txt",encoding='utf-8',errors='ignore')
for i in data:
    if len(i.strip()) > 0 and i.strip().startswith("diff --git"):
        count = count + 1
        el.append(i)
        fl=[]
    else:
        filename = "%s.txt" % el[int (count)]  # this is the line that raises the index error
        h = open(filename, 'w+')
        fl.append(i)
        print(fl, file=h)
el = '\n'.join(el)
print(el, file=f)
print(filename)
data.close()
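A possible explanation, judging only from the posted code: count starts at 1 and is incremented on every header, so after k headers the list el holds indexes 0 to k-1 while count is k+1, and el[count] is always past the end. The value is an integer already; the index is simply too large. A minimal sketch of tracking the most recent header with el[-1] instead of a counter follows. The `group_by_header` function and the sample lines are hypothetical, and the sketch collects lines in memory rather than writing files.

```python
def group_by_header(lines):
    """Group non-header lines under the most recent 'diff --git' header."""
    headers = []
    groups = {}
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.startswith("diff --git"):
            headers.append(stripped)
            groups[stripped] = []
        elif headers:
            # headers[-1] is always the latest header, so no counter is needed.
            groups[headers[-1]].append(stripped)
    return headers, groups


# Hypothetical excerpt of a diff file.
sample = [
    "diff --git a/prog1.py b/prog1.py\n",
    "+added line\n",
    "-removed line\n",
    "diff --git a/prog2.py b/prog2.py\n",
    "+another line\n",
]
headers, groups = group_by_header(sample)
print(len(headers))
print(groups[headers[0]])
```

Writing each group to its own file would also need a file name derived from the header, since the raw header text contains spaces and slashes.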

How to use 'for' loop to split values and create list of dictionaries?

Disclaimer: I am an absolute beginner in Python programming so please bear with me. I am taking a class for this and extremely desperate to get help.
I am creating a program that can read data from ANY text file which contains information like so:
Produce, Is it a fruit (Y/N)
String: "Apple","Y""Banana","Y""Pumpkin","N""Orange","Y""Eggplant","N"...
I need to convert the string to a list that will look like this:
"Apple","Y"
"Banana","Y"
"Pumpkin","N"
...
After that, I have to split/separate the values so they can fit into a dictionary that will look like this:
{"produce": xxx,"fruit": Yes/No}
For this task, I was told that I need to use the for loop to split the lines and create a list of dictionaries. But, I don't know how and where to put it.
Note that the program must be able to read data from any file. The user must also be able to modify whether the listed fruit/veg is indeed a fruit or not.
Thank you so much in advance!
I hope this is what you want...
string="apple","Y","banana","Y","pumpkin","N"
dict={}
for i in range(0,len(string),2):
    dict[string[i]]=string[i+1]
for k,v in dict.items():
    print(k,v)
So here I am after a lot of comments,
here is the suggested solution and this will work
x = "Apple","Y""Banana","Y""Pumpkin","N""Orange","Y""Eggplant","N"
length = len(x)
mainList = []
def split_str(s):
    return [ch for ch in s]
for i in range(length):
    dict = {}
    if (i == 0):
        dict["produce"] = x[i]
        if(split_str(x[i+1])[0] == 'Y'):
            dict["fruit"] = 'Yes'
        else:
            dict["fruit"] = 'No'
        mainList.append(dict)
    else:
        if(i < 5):
            dict["produce"] = x[i][1:]
            if(split_str(x[i+1])[0] == 'Y'):
                dict["fruit"] = 'Yes'
            else:
                dict["fruit"] = 'No'
            mainList.append(dict)
print(mainList)
online fiddle link:
https://pyfiddle.io/fiddle/b3de895b-8542-419d-841a-ad7ddf008d9a/?i=true
Thank you so much to those who answered my question. I was able to run this properly using the following code:
# Read the contents from the file first.
def get_content(filename):
    with open(filename, "r") as f:
        content = f.read()
    content = content.replace('"','')
    return content
# Convert the contents to list of dictionaries (Y/N being a boolean).
def convert_to_list(content):
    string = sorted(content.split('\n'),key=str.lower)
    produce_list = []
    for x in string:
        if not x.strip():
            continue  # skip empty lines, e.g. after a trailing newline
        a = x.split(',')
        b: bool = bool('Y' in a[1])
        d = dict({'produce': a[0], 'fruit':b})
        produce_list.append(d)
    return produce_list
I was able to complete this with help outside the site. Thank you so much for everyone's input!
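For the run-together form shown in the question ("Apple","Y""Banana","Y"...), where pairs are not separated by newlines, one possible sketch is to pull out each quoted pair with a regular expression and build the dictionaries in a for loop. The `to_records` name and the `PAIR` pattern are illustrative assumptions; the pattern expects exactly a quoted name followed by a quoted Y or N.

```python
import re

# One quoted produce name followed by a quoted Y/N flag.
PAIR = re.compile(r'"([^"]*)","([YN])"')


def to_records(text):
    """Turn '"Apple","Y""Banana","Y"...' into a list of dictionaries."""
    records = []
    for produce, flag in PAIR.findall(text):
        records.append({"produce": produce, "fruit": "Yes" if flag == "Y" else "No"})
    return records


sample = '"Apple","Y""Banana","Y""Pumpkin","N"'
for record in to_records(sample):
    print(record)
```

Because each record is a plain dictionary, letting the user change an entry is an ordinary assignment such as `records[2]["fruit"] = "Yes"`.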

How can i get IfcOpenShell for python to write with the same unicode as the file it reads?

I'm using IfcOpenShell to read an .ifc file, make some changes, and then write it to a new .ifc file. But IfcOpenShell does not write Unicode characters the same way it reads them.
I'm creating a script that adds a pset with properties to each IfcElement. The values of these properties are copied from existing properties, so basically I'm creating a pset that gathers chosen information in a single place.
This worked great until the existing values contained non-ASCII (Unicode) characters.
Each value is read and decoded so that it shows correctly when printed, but it is not written with the same encoding it was read with.
I tried changing the encoding used in PyCharm, with no luck. I found similar posts elsewhere without finding a fix.
From what I've read elsewhere, it has something to do with the Unicode encoder/decoder IfcOpenShell uses, but I can't be sure.
def mk_pset():
    global param_name
    global param_type
    global max_row
    global param_map
    wb = load_workbook(b)
    sheet = wb.active
    max_row = sheet.max_row
    max_column = sheet.max_column
    param_name = []
    param_type = []
    param_map=[]
    global pset_name
    pset_name = sheet.cell(row=2, column=1).value
    for pm in range(2, max_row+1):
        param_name.append((sheet.cell(pm, 2)).value)
        param_type.append((sheet.cell(pm, 3)).value)
        param_map.append((sheet.cell(pm,4)).value)
    print(param_type,' - ',len(param_type))
    print(param_name,' - ',len(param_name))
    create_pset()
def create_pset():
    ifcfile = ifcopenshell.open(ifc_loc)
    create_guid = lambda: ifcopenshell.guid.compress(uuid.uuid1().hex)
    owner_history = ifcfile.by_type("IfcOwnerHistory")[0]
    element = ifcfile.by_type("IfcElement")
    sets = ifcfile.by_type("IfcPropertySet")
    list = []
    for sett in sets:
        list.append(sett.Name)
    myset = set(list)
    global antall_parametere
    global index
    index = 0
    antall_parametere = len(param_name)
    if pset_name not in myset:
        property_values = []
        tot_elem = (len(element))
        cur_elem = 1
        for e in element:
            start_time_e=time.time()
            if not e.is_a() == 'IfcOpeningElement':
                type_element.append(e.is_a())
                for rel_e in e.IsDefinedBy:
                    if rel_e.is_a('IfcRelDefinesByProperties'):
                        if not rel_e[5][4] == None:
                            index = 0
                            while index < antall_parametere:
                                try:
                                    ind1 = 0
                                    antall_ind1 = len(rel_e[5][4])
                                    while ind1 < antall_ind1:
                                        if rel_e[5][4][ind1][0] == param_map[index]:
                                            try:
                                                if not rel_e[5][4][ind1][2]==None:
                                                    p_type = rel_e[5][4][ind1][2].is_a()
                                                    p_verdi =rel_e[5][4][ind1][2][0]
                                                    p_t=param_type[index]
                                                    property_values.append(ifcfile.createIfcPropertySingleValue(param_name[index], param_name[index],ifcfile.create_entity(p_type,p_verdi),None),)
                                                    ind1 += 1
                                                else:
                                                    ind1 +=1
                                            except TypeError:
                                                pass
                                            break
                                        else:
                                            ind1 += 1
                                except AttributeError and IndexError:
                                    pass
                                index += 1
                index = 0
                property_set = ifcfile.createIfcPropertySet(create_guid(), owner_history, pset_name, pset_name,property_values)
                ifcfile.createIfcRelDefinesByProperties(create_guid(), owner_history, None, None, [e], property_set)
                ifc_loc_edit = str(ifc_loc.replace(".ifc", "_Edited.ifc"))
                property_values = []
                # Progress message in Norwegian: "<n> of <total> elements done. <m> elements remaining. It took <t> seconds"
                print(cur_elem, ' av ', tot_elem, ' elementer ferdig. ',int(tot_elem-cur_elem),'elementer gjenstår. Det tok ',format(time.time()-start_time_e),' sekunder')
                cur_elem += 1
        ifcfile.write(ifc_loc_edit)
    else:
        ###print("Pset finnes")  # "Pset exists"
        # Norwegian: "Pset has already been created in the model."
        sg.PopupError("Pset er allerede oprettet i modell.")
I expect p_verdi written to be equal to the p_verdi read.
Original read (D\X2\00F8\X0\r):
#2921= IFCBUILDINGELEMENTPROXYTYPE('3QPADpsq71CHeCe7e3GDm5',#32,'D\X2\00F8\X0\r',$,$,$,$,'DA64A373-DB41-C131-1A0C-A07A0340DC05',$,.NOTDEFINED.);
Written (D\X4\000000F8\X0\r):
#2921=IFCBUILDINGELEMENTPROXYTYPE('3QPADpsq71CHeCe7e3GDm5',#32,'D\X4\000000F8\X0\r',$,$,$,$,'DA64A373-DB41-C131-1A0C-A07A0340DC05',$,.NOTDEFINED.);
Decoded to "Dør"
This also happens with non-breaking spaces:
('2\X2\00A0\X0\090')
prints correctly as:('2 090')
gets written:
('2\X4\000000A0\X0\090')
The written form is unreadable by the software I use to open IFC files.
Not so much an answer as a workaround.
After more research I found out that most IFC-reading software does not seem to support the X4 encoding, so I made a workaround with regex: find every \X4\0000 and replace it with \X2\. This has worked with all the special characters I've encountered so far. But as stated, it is just a workaround that probably won't work for everyone.
def X4trans_2(target_file,temp_fil):
    from re import findall
    from os import remove,rename
    dec_file = target_file.replace('.ifc', '_dec.ifc')
    tempname = target_file
    dec_list = []
    with open(temp_fil, 'r+') as r,open(dec_file, 'w', encoding='cp1252') as f:
        for line in r:
            findX4 = findall(r'\\X4\\0000+[\w]+\\X0\\', str(line))
            if findX4:
                for fx in findX4:
                    X4 = str(fx)
                    # '\\X2\\' is the literal text \X2\
                    newX = str(fx).replace('\\X4\\0000', '\\X2\\')
                    line = line.replace(str(X4), newX) # print('Found X4')
            f.write(line)
    remove(temp_fil)
    try:
        remove(target_file)
    except FileNotFoundError:
        pass
    rename(dec_file,tempname)
It basically opens the ifc as text, finds and replaces X4 with X2, and writes it again.
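The replacement above handles one character per \X4\ run. If a run ever carried several characters back to back (8 hex digits each), only the first would be shortened. A hedged sketch of a per-group conversion follows; the `x4_to_x2` function and the `X4_RUN` pattern are hypothetical names, it works on text only, and runs containing code points above U+FFFF are left unchanged because they do not fit in four hex digits.

```python
import re

# One \X4\ ... \X0\ run whose payload is a whole number of 8-digit hex groups.
X4_RUN = re.compile(r"\\X4\\((?:[0-9A-Fa-f]{8})+)\\X0\\")


def x4_to_x2(text):
    """Rewrite \\X4\\ runs as \\X2\\ runs when every code point fits in 4 hex digits."""
    def convert(match):
        payload = match.group(1)
        groups = [payload[i:i + 8] for i in range(0, len(payload), 8)]
        if any(not group.startswith("0000") for group in groups):
            return match.group(0)  # leave runs that cannot be shortened
        return "\\X2\\" + "".join(group[4:] for group in groups) + "\\X0\\"
    return X4_RUN.sub(convert, text)


# The two strings quoted in the question.
print(x4_to_x2(r"'D\X4\000000F8\X0\r'"))
print(x4_to_x2(r"('2\X4\000000A0\X0\090')"))
```

This could replace the findall and replace steps inside the line loop of X4trans_2, with the file handling left as it is.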

Resources