I'm new to Python and trying to get my head around this code.
We have to import a text file named line-items.txt; an excerpt of the file, including its header line, is as follows:
product name quantity unit price
product a 1 10.00
product b 5 19.70
product a 3 10.00
product b 7 19.70
We need to write code that searches for each product name and sums its quantity and unit price. The sales revenue formula would then be "total unit price of the product" * "total quantity of the product". We have to create a new text file, and the output should look something like this:
product name sales volume sales revenue
product a 4 40.0
product b 12 236.39999999999998
My code below finds the quantities of product b (5 and 7) and its unit price (I used a print statement to check the output, but I commented out the unit price print below for simplicity), but it is not adding up the values it finds:
    def main():
        # opening file to read line-items.txt
        with open("line-items.txt", "r") as line_items:
            # to get the list of lines and reading the second line of the text
            prod_b = 0
            newtxt = line_items.readlines()[1:]
            for line in newtxt:
                text = line.strip().split()
                product_name = text[0:2]
                quantity = text[2]
                unit_price = text[3]
            if product_name == ['product', 'b']:
                prod_b += int(quantity)
                unit_price_b = float(unit_price)
                # print(unit_price_b)
                print(quantity)
        line_items.close()

    if __name__ == '__main__':
        main()
The output of the code above is as follows; it's not adding 5 and 7. What am I doing wrong?
5
7
Thanks,
Rogue
While the answer provided by @JonSG is certainly more elegant, the problem with your code is quite simple and is caused by an indentation error. You need to indent the if statement under the for loop as shown below. Note also that print(quantity) only echoes each matched quantity; to see the sum, print prod_b after the loop:

    def main():
        # opening file to read line-items.txt
        with open("line-items.txt", "r") as line_items:
            # skip the header line and read the remaining lines
            prod_b = 0
            newtxt = line_items.readlines()[1:]
            for line in newtxt:
                text = line.strip().split()
                product_name = text[0:2]
                quantity = text[2]
                unit_price = text[3]
                # this check must run once per line, so it belongs inside the loop
                if product_name == ['product', 'b']:
                    prod_b += int(quantity)
                    unit_price_b = float(unit_price)
                    # print(unit_price_b)
            # the with block closes the file, so no explicit close() is needed
            print(prod_b)
Using a nested collections.defaultdict makes this problem rather straightforward.
    import collections
    import json

    # product name -> {"quantity": ..., "sales volume": ...}; missing keys start at 0.0
    results = collections.defaultdict(lambda: collections.defaultdict(float))

    with open("line-items.txt", "r") as line_items:
        next(line_items)  # skip first line
        for row in line_items:
            cells = row.split()
            product_name = f"{cells[0]} {cells[1]}"
            quantity = int(cells[2])
            price = float(cells[3])
            results[product_name]["quantity"] += quantity
            # despite the key name, this accumulates revenue (quantity * price)
            results[product_name]["sales volume"] += quantity * price

    print(json.dumps(results, indent=4))
results in:
    {
        "product a": {
            "quantity": 4.0,
            "sales volume": 40.0
        },
        "product b": {
            "quantity": 12.0,
            "sales volume": 236.4
        }
    }
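Neither snippet above writes the result file the question asks for. Here is a minimal sketch of that last step, assuming the column layout from the excerpt (a two-word product name, then quantity, then unit price). The names summarize_line_items and write_summary, and the output file name sales-summary.txt, are made up for illustration and are not part of either answer:

```python
from collections import defaultdict


def summarize_line_items(lines):
    """Return report lines with per-product total quantity and revenue.

    Assumes each data line looks like "product a 1 10.00": a two-word
    name, an integer quantity, then a unit price.
    """
    volume = defaultdict(int)
    revenue = defaultdict(float)
    for line in lines[1:]:  # skip the header line
        cells = line.split()
        if len(cells) < 4:  # ignore blank or malformed lines
            continue
        name = " ".join(cells[:2])
        quantity = int(cells[2])
        volume[name] += quantity
        revenue[name] += quantity * float(cells[3])
    report = ["product name sales volume sales revenue"]
    for name in volume:
        report.append(f"{name} {volume[name]} {revenue[name]:.2f}")
    return report


def write_summary(in_path="line-items.txt", out_path="sales-summary.txt"):
    # out_path is a hypothetical name; the question does not specify one
    with open(in_path) as src:
        report = summarize_line_items(src.readlines())
    with open(out_path, "w") as dst:
        dst.write("\n".join(report) + "\n")
```

Formatting the revenue with two decimals avoids floating-point artifacts such as the 236.39999999999998 shown in the question.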
I am working on a project where I return the search results for a particular topic from a dataset of scraped news articles. However, for a few of the inputs I get 'IndexError: positional indexers are out-of-bounds', while for others the code works just fine. I even tried limiting the number of outputs and printing the indexes of the rows to be returned, just to be sure that '.iloc' would not raise that error, but it still happens.
Data:
Code:
'''
def search_index(c):
    global input_str
    c = c.lower()
    c = re.sub(r"((\S+)?(http(S)?)(\S+))|((\S+)?(www)(\S+))|((\S+)?(\#)(\S+)?)", " ", c)
    c = re.sub('[^a-zA-Z0-9\n]',' ',c)
    d = list(c.split())
    input_str = [word for word in d if word not in stop_words]
    print(input_str)
    e = OrderedDict()
    f = []
    temp=[]
    for index,content in data.iterrows():
        count = 0
        points = 0
        for i in input_str:
            if i in (content['word_count']).keys():
                #pdb.set_trace()
                count += 1 # considering how many words from the input match with the content
                points += content['word_count'][i] # considering the number of times those words occur in the content corpus
        if len(input_str)<=3:
            if count>=1:
                e[index] = {'count':count,'points':points}
        elif 3 < len(input_str) <=5:
            if count>=2:
                e[index] = {'count':count,'points':points}
        elif len(input_str) > 5:
            if count>=3:
                e[index] = {'count':count,'points':points}
        #print('\nIndex:',index,'\nContent:\n',content['Content'])
    # the lambda function first sorts the dictionary based on the 'count' and then on the basis of 'points'
    for key,val in sorted(e.items(), key=lambda kv: (kv[1]['count'],kv[1]['points']),reverse=True):
        f.append(key)
        #print(key,val)
    #print(f)
    #data.iloc[f,:]
    print('Total number of results: ',len(f))
    if len(f)>50 :
        temp=f[:20]
        print(temp)
        print('Top 20 results:\n')
        a = data.iloc[temp,[0,1,2,3]].copy()
    else:
        a = data.iloc[f,[0,1,2,3]].copy()
    print(a)
'''
'''
def user_ask():
    b = input('Enter the topic you''re interested in:')
    articles = search_index(b)
    print(articles)
'''
'''
user_ask()
'''
Output: For this input I am getting the required output
'''
Enter the topic youre interested in:Joe Biden
['joe', 'biden']
Total number of results: 2342
[2337, 3314, 4164, 3736, 3750, 3763, 4246, 3386, 3392, 13369, 3006, 4401,
4089, 3787, 4198, 3236, 4432, 4097, 4179, 4413]
Top 20 results:
Link \
2467 https://abcnews.go.com/Politics/rep-max-rose-c...
3471 https://abcnews.go.com/International/dalai-lam...
4343 https://abcnews.go.com/US/georgia-legislation-...
3910 https://abcnews.go.com/Politics/temperatures-c...
3924 https://abcnews.go.com/Business/cheap-fuel-pul...
3937 https://abcnews.go.com/US/puerto-ricans-demand...
4425 https://abcnews.go.com/Politics/trump-biden-is...
3543 https://abcnews.go.com/Business/record-number-...
3549 https://abcnews.go.com/US/michigan-state-stude...
17774 https://abcnews.go.com/Politics/bernie-sanders...
3152 https://abcnews.go.com/Politics/note-gop-aids-...
4583 https://abcnews.go.com/Politics/polls-show-tig...
4268 https://abcnews.go.com/International/students-...
3962 https://abcnews.go.com/Politics/heels-arizona-...
4377 https://abcnews.go.com/Politics/north-carolina...
3388 https://abcnews.go.com/Lifestyle/guy-fieri-lau...
4614 https://abcnews.go.com/Politics/persistence-he...
4276 https://abcnews.go.com/Politics/congressional-...
4358 https://abcnews.go.com/US/nursing-home-connect...
4595 https://abcnews.go.com/US/hurricane-sally-upda...
Title \
2467 Rep. Max Rose calls on Trump to up COVID-19 ai...
3471 The Dalai Lama's simple advice to navigating C...
4343 Georgia lawmakers pass bill that gives court t...
3910 Temperatures and carbon dioxide are up, regula...
3924 Has cheap fuel pulled the plug on electric veh...
3937 Puerto Ricans demand state of emergency amid r...
4425 Trump vs. Biden on the issues: Foreign policy
3543 Record number of women CEOs on this year's For...
3549 All Michigan State students asked to quarantin...
17774 Bernie Sanders, Danny Glover Attend Game 7 of ...
3152 The Note: GOP aids Trump in programming around...
4583 Trump adviser predicts Sunbelt sweep, misleads...
4268 2 students allegedly caught up in Belarus crac...
3962 On heels of Arizona Senate primary, Republican...
4377 North Carolina to be a crucial battleground st...
3388 Guy Fieri has helped raise over $22M for resta...
4614 Little girls will have to wait 4 more years, W...
4276 Congressional Black Caucus to propose policing...
4358 Nursing home in Connecticut transferring all r...
4595 Sally slams Gulf Coast with life-threatening f...
Content Category
2467 New York Rep. Max Rose joined “The View” Monda... Politics
3471 As millions of people around the world continu... International
4343 They've done their time behind bars and been o... US
3910 Every week we'll bring you some of the climate... Politics
3924 Electric vehicles have always been a tough sel... Business
3937 As Puerto Rico struggles to recover from multi... US
4425 American foreign policy for over half a centur... Politics
3543 A record high number of female CEOs are at the... Business
3549 All local Michigan State University students h... US
17774 — -- Bernie Sanders capped Memorial Day off by... Politics
3152 The TAKE with Rick Klein\nPresident Donald Tru... Politics
4583 Facing polls showing a competitive race in as ... Politics
4268 A U.S. student studying at New York’s Columbia... International
3962 What's sure to be one of the most expensive an... Politics
4377 North Carolina, home of the upcoming business ... Politics
3388 Guy Fieri should add donations to his triple d... Lifestyle
4614 Four years ago, a major political party nomina... Politics
4276 The Congressional Black Caucus is at work on a... Politics
4358 All residents at a Connecticut nursing home ar... US
4595 Sally made landfall near Gulf Shores, Alabama,... US
None
'''
For this input it is returning an error.
'''
Enter the topic youre interested in:Joe
['joe']
Total number of results: 2246
[4246, 4594, 3763, 3736, 4448, 2337, 3431, 3610, 3636, 4089, 13369, 15363,
7269, 21077, 3299, 4372, 4413, 7053, 15256, 1305]
Top 20 results:
--------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-31-ff543d46b951> in <module>
----> 1 user_ask()
<ipython-input-27-31af284a01b4> in user_ask()
4 if int(a) == 0:
5 b = input('Enter the topic you''re interested in:')
----> 6 articles = search_index(b)
7 print(articles)
8
<ipython-input-25-4a5261a1e717> in search_index(c)
50 print(temp)
51 print('Top 20 results:\n')
---> 52 a = data.iloc[temp,[0,1,2,3]].copy()
53 else:
54 a = data.iloc[f,[0,1,2,3]].copy()
c:\users\henis\appdata\local\programs\python\python37\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1759 except (KeyError, IndexError, AttributeError):
1760 pass
-> 1761 return self._getitem_tuple(key)
1762 else:
1763 # we by definition only have the 0th axis
c:\users\henis\appdata\local\programs\python\python37\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
2064 def _getitem_tuple(self, tup: Tuple):
2065
-> 2066 self._has_valid_tuple(tup)
2067 try:
2068 return self._getitem_lowerdim(tup)
c:\users\henis\appdata\local\programs\python\python37\lib\site-packages\pandas\core\indexing.py in _has_valid_tuple(self, key)
700 raise IndexingError("Too many indexers")
701 try:
--> 702 self._validate_key(k, i)
703 except ValueError:
704 raise ValueError(
c:\users\henis\appdata\local\programs\python\python37\lib\site-packages\pandas\core\indexing.py in _validate_key(self, key, axis)
2006 # check that the key does not exceed the maximum size of the index
2007 if len(arr) and (arr.max() >= len_axis or arr.min() < -len_axis):
-> 2008 raise IndexError("positional indexers are out-of-bounds")
2009 else:
2010 raise ValueError(f"Can only index by location with a
[{self._valid_types}]")
IndexError: positional indexers are out-of-bounds
'''
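A likely cause, judging from the output above: the values collected in f come from data.iterrows(), so they are index labels, while .iloc expects integer positions. In the run that worked, the first requested value is 2337 but the first printed row carries label 2467, which suggests the index has gaps and the call only succeeded because every value happened to be smaller than the number of rows. A small sketch of the difference, using a made-up three-row frame (the column names and labels are assumptions for illustration, not the real dataset):

```python
import pandas as pd

# Toy frame whose index labels are not 0..n-1, as happens after filtering
# or dropping rows without calling reset_index().
data = pd.DataFrame(
    {"Title": ["first", "second", "third"], "Category": ["US", "Politics", "US"]},
    index=[10, 20, 30],
)

# iterrows() yields index *labels*, not positions.
labels = [index for index, _ in data.iterrows()]

# Label-based selection works with those values.
by_label = data.loc[labels[:2], ["Title", "Category"]]

# Position-based selection with the same values fails: 10 and 20 are
# beyond the last valid position (2).
try:
    data.iloc[labels[:2], [0, 1]]
    iloc_failed = False
except IndexError:
    iloc_failed = True
```

If that assumption holds for your data, either select by label, for example data.loc[temp, data.columns[[0, 1, 2, 3]]], or call data.reset_index(drop=True) once before searching so that labels and positions agree. Note also that search_index prints a but never returns it, which is why None appears at the end of the successful run.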
I'm trying to make a bill:
    price = {'sugar' : 45,'rice': 60,'tealeaves':450,'wheat':40,'oil':100};
    ordered = {'sugar':2,'rice': 3,'tealeaves':0.5,'wheat':4,'oil':1}
    total = list()
    for k,v in price:
        value = price[k]*kgsordered[k]
        print (k,':',value)
        total.append(value)
    print('*'*4,'CG Grocery Store','*'*4)
    print('Your final bill is ₹',total.sum())
    print('Thank you for shopping with us!!')
This is the traceback I get:

    Traceback (most recent call last):
      File "C:\Users\user\Desktop\My Python Files\curiosity gym python HW.py", line 4, in <module>
        for k,v in price:
    ValueError: too many values to unpack (expected 2)
Firstly, you have to use .items() to iterate over a dictionary's key-value pairs. Iterating over the dictionary itself yields only the keys, and trying to unpack a key such as 'sugar' into k, v is what raises the ValueError.
Secondly, you were using kgsordered[k] instead of ordered[k], which raises a NameError, since kgsordered isn't defined.
Finally, to calculate the sum of all the elements in a list, use sum(total), where total is your list; lists have no .sum() method.
    price = {'sugar' : 45,'rice': 60,'tealeaves':450,'wheat':40,'oil':100}
    ordered = {'sugar':2,'rice': 3,'tealeaves':0.5,'wheat':4,'oil':1}
    total = list()
    for k,v in price.items():
        value = price[k]*ordered[k]
        print (k,':',value)
        total.append(value)
    print('*'*4,'CG Grocery Store','*'*4)
    print('Your final bill is ₹',sum(total))
    print('Thank you for shopping with us!!')

    # output
    sugar : 90
    rice : 180
    tealeaves : 225.0
    wheat : 160
    oil : 100
    **** CG Grocery Store ****
    Your final bill is ₹ 755.0
    Thank you for shopping with us!!
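For comparison, the same total can be computed without the intermediate list by passing a generator expression to sum(). This is only a sketch of an alternative; bill_total is a hypothetical helper name, and it assumes every item in price also appears in ordered:

```python
price = {'sugar': 45, 'rice': 60, 'tealeaves': 450, 'wheat': 40, 'oil': 100}
ordered = {'sugar': 2, 'rice': 3, 'tealeaves': 0.5, 'wheat': 4, 'oil': 1}


def bill_total(price, ordered):
    """Sum unit price times ordered quantity over all priced items."""
    # .items() yields (key, value) pairs, so unpacking into two names works
    return sum(unit_price * ordered[item] for item, unit_price in price.items())


total = bill_total(price, ordered)  # 755.0 for the data above
```

The per-item lines of the bill still need the explicit loop; this only replaces the list that was built just to be summed.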
Goal
Apply deid_notes function to df
Background
I have a df that resembles this sample df
    import pandas as pd
    df = pd.DataFrame({'Text' : ['there are many different types of crayons',
                                 'i like a lot of sports cares',
                                 'the middle east has many camels '],
                       'P_ID': [1,2,3],
                       'Word' : ['crayons', 'cars', 'camels'],
                       'P_Name' : ['John', 'Mary', 'Jacob'],
                       'N_ID' : ['A1', 'A2', 'A3']
                      })
    #rearrange columns
    df = df[['Text','N_ID', 'P_ID', 'P_Name', 'Word']]
    df
                        Text N_ID  P_ID P_Name     Word
    0  many types of crayons   A1     1   John  crayons
    1     i like sports cars   A2     2   Mary     cars
    2        has many camels   A3     3  Jacob   camels
I use the following function to deidentify certain words within the Text column using NeuroNER http://neuroner.com/
    def deid_notes(text):
        #use predict function from NeuroNER to tag words to be deidentified
        ner_list = n1.predict(text)
        #n1.predict won't work in this toy example because the NeuroNER package needs to be installed (and installation is difficult)
        #but the output resembles this: [{'start': 1, 'end': 11, 'id': 1, 'tagged word': 'crayon'}]
        #use start and end position of tagged words to deidentify and replace with **BLOCK**
        if len(ner_list) > 0:
            parts_to_take = [(0, ner_list[0]['start'])] + [(first["end"]+1, second["start"]) for first, second in zip(ner_list, ner_list[1:])] + [(ner_list[-1]['end'], len(text)-1)]
            parts = [text[start:end] for start, end in parts_to_take]
            deid = '**BLOCK**'.join(parts)
        #if n1.predict does not identify any words to be deidentified, place NaN
        else:
            deid='NaN'
        return pd.Series(deid, index='Deid')
Problem
I apply the deid_notes function to my df using the following code
fx = lambda x: deid_notes(x.Text,axis=1)
df.join(df.apply(fx))
But I get the following error
AttributeError: ("'Series' object has no attribute 'Text'", 'occurred at index Text')
Question
How do I get the deid_notes function to work on my df?
Assuming deid_notes takes text as its only input argument and returns a pandas Series, pass axis=1 to apply instead of to deid_notes. Without axis=1, apply hands each column to the lambda, which is why x.Text fails with "'Series' object has no attribute 'Text'". The index of the returned Series should also be a list (['Deid']) rather than a bare string. For example:

    # Dummy function
    def deid_notes(text):
        deid = 'prediction to: ' + text
        return pd.Series(deid, index = ['Deid'])

    fx = lambda x: deid_notes(x.Text)
    df.join(df.apply(fx, axis =1))
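Separately from the apply call, the slicing step inside deid_notes can be checked without installing NeuroNER. The sketch below is not the asker's function: block_spans is a hypothetical helper, and it assumes each tagged entity is a dict with 'start' and 'end' character offsets where 'end' is exclusive, as in Python slicing. NeuroNER's real output may differ, so check the offsets returned by n1.predict before relying on it:

```python
def block_spans(text, entities, placeholder="**BLOCK**"):
    """Replace each [start, end) character span in text with placeholder.

    entities: list of dicts with 'start' and 'end' offsets (assumed format,
    with 'end' exclusive); spans are assumed not to overlap.
    """
    if not entities:
        return None  # nothing was tagged
    pieces = []
    cursor = 0
    for entity in sorted(entities, key=lambda e: e["start"]):
        pieces.append(text[cursor:entity["start"]])  # untouched text before the span
        pieces.append(placeholder)
        cursor = entity["end"]
    pieces.append(text[cursor:])  # untouched text after the last span
    return "".join(pieces)


text = "i like a lot of sports cars"
masked = block_spans(text, [{"start": 23, "end": 27}])
```

Walking a cursor through the text avoids the +1 and -1 adjustments in the original tuple arithmetic, which are easy to get wrong when the offset convention is uncertain.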
I am trying to select specific fields from my Qdata.txt file and use fields[2] to calculate the average for every year separately. My code gives only the total average.
The data file looks like this (the first day of each year is 0101 and the last is 1231):
Date 3700300 6701500
20000101 21.00 223.00
20000102 20.00 218.00
. .
20001231 7.40 104.00
20010101 6.70 104.00
. .
20130101 8.37 111.63
. .
20131231 45.00 120.98
    import sys
    td=open("Qdata.txt","r") # open file Qdata
    total=0
    count=0
    row1=True
    for row in td :
        if (row1) :
            row1=False # row1 is for topic
        else:
            fields=row.split()
            try:
                total=total+float(fields[2])
                count=count+1
            # Errors.
            except IndexError:
                continue
            except ValueError:
                print("File is incorrect.")
                sys.exit()
    print("Average in 2000 was: ",total/count)
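You could use itertools.groupby with the first four characters of each line as the grouping key. Note that groupby only merges consecutive lines with the same key, which works here because the file is sorted by date.

    import itertools

    with open("data.txt") as f:
        next(f) # skip first line
        groups = itertools.groupby(f, key=lambda s: s[:4])
        for k, g in groups:
            print(k, [s.split() for s in g])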
This gives you the entries grouped by year, for further processing.
Output for your example data:
2000 [['20000101', '21.00', '223.00'], ['20000102', '20.00', '218.00'], ['20001231', '7.40', '104.00']]
2001 [['20010101', '6.70', '104.00']]
2013 [['20130101', '8.37', '111.63'], ['20131231', '45.00', '120.98']]
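To get from those groups to the yearly averages the question asks for, each group can be reduced as it is produced. A minimal sketch, assuming the lines are sorted by date and the third column is the one to average; yearly_averages is an illustrative name, and rows whose first field is not an all-digit date (the header, or the '. .' placeholders in the excerpt) are skipped:

```python
import itertools


def yearly_averages(lines):
    """Return {year: mean of the third column} for date-sorted data lines."""
    rows = []
    for line in lines:
        fields = line.split()
        # keep only rows shaped like "20000101 21.00 223.00"
        if len(fields) == 3 and fields[0].isdigit():
            rows.append(fields)
    averages = {}
    # groupby merges consecutive rows only, hence the sorted-input assumption
    for year, group in itertools.groupby(rows, key=lambda f: f[0][:4]):
        values = [float(f[2]) for f in group]
        averages[year] = sum(values) / len(values)
    return averages


sample = [
    "Date 3700300 6701500",
    "20000101 21.00 223.00",
    "20000102 20.00 218.00",
    ". .",
    "20001231 7.40 104.00",
    "20010101 6.70 104.00",
]
for year, average in yearly_averages(sample).items():
    print("Average in {} was: {}".format(year, average))
```

With a real file, the same function can be called as yearly_averages(open("Qdata.txt")), since it only needs an iterable of lines.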
You could create a dict (or even a defaultdict) for total and count instead:
    import sys
    from collections import defaultdict
    td=open("Qdata.txt","r") # open file Qdata
    total=defaultdict(float)
    count=defaultdict(int)
    row1=True
    for row in td :
        if (row1) :
            row1=False # row1 is for topic
        else:
            fields=row.split()
            try:
                year = int(fields[0][:4])
                total[year] += float(fields[2])
                count[year] += 1
            # Errors.
            except IndexError:
                continue
            except ValueError:
                print("File is incorrect.")
                sys.exit()
    print("Average in 2000 was: ",total[2000]/count[2000])
Every year separate? You have to divide your input into groups, something like this might be what you want:
    from collections import defaultdict

    row1 = True
    year_sums = defaultdict(list)
    with open("Qdata.txt", "r") as td:
        for row in td:
            if row1:
                row1 = False  # skip the header row
                continue
            fields = row.split()
            year = fields[0][:4]
            year_sums[year].append(float(fields[2]))

    for year in year_sums:
        # len() gives the number of values collected for that year
        average = sum(year_sums[year]) / len(year_sums[year])
        print("Average in {} was: {}".format(year, average))

That is just some example code; I don't know for sure whether it works, but it should give you an idea of what you can do. year_sums is a defaultdict containing lists of values grouped by year. You can then use it for other statistics if you want.