Python 3 script uses too much memory - python-3.x

As homework for my IT class, I need to write a script that reads numbers, modifies each one (integer division by 10), and prints the highest power of 4 that does not exceed the modified number. The script may use only 8MB of RAM. I used a logarithm for this, so my code looks like this:
from math import log, floor
n = int(input())
numbers = []
for i in range(0, n):
    numbers.append(floor(int(input()) / 10))
for i in numbers:
    print(4 ** floor(log(i, 4)))
But I checked this script on my PC and it uses more than 8MB!
Partition of a set of 74690 objects. Total size = 8423721 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 23305 31 2100404 25 2100404 25 str
1 19322 26 1450248 17 3550652 42 tuple
2 5017 7 724648 9 4275300 51 types.CodeType
3 9953 13 716915 9 4992215 59 bytes
4 742 1 632536 8 5624751 67 type
5 4618 6 628048 7 6252799 74 function
6 742 1 405720 5 6658519 79 dict of type
7 187 0 323112 4 6981631 83 dict of module
8 612 1 278720 3 7260351 86 dict (no owner)
9 63 0 107296 1 7367647 87 set
<197 more rows. Type e.g. '_.more' to view.>
On my phone, however, this script uses only 2.5MB:
Partition of a set of 35586 objects. Total size = 2435735 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 9831 28 649462 27 649462 27 str
1 9014 25 365572 15 1015034 42 tuple
2 4669 13 261232 11 1276266 52 bytes
3 2357 7 198684 8 1474950 61 types.CodeType
4 436 1 166276 7 1641226 67 type
5 2156 6 155232 6 1796458 74 function
6 436 1 130836 5 1927294 79 dict of type
7 93 0 87384 4 2014678 83 dict of module
8 237 1 62280 3 2076958 85 dict (no owner)
9 1091 3 48004 2 2124962 87 types.WrapperDescriptorType
<115 more rows. Type e.g. '_.more' to view.>
I tried changing list to tuple, but it didn't make any difference.
Is there any possibility to decrease/limit RAM usage?
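One way to keep the script's own footprint small is sketched below. It assumes each answer may be printed as soon as its input line is read, so no list is needed, and it derives the power from the integer's bit length rather than from a floating-point logarithm. The names `highest_power_of_4` and `answers` are hypothetical, and printing 0 for inputs below 10 (where the original `log(0, 4)` would raise an error) is a guess, since the question does not say what should happen there. Most of the totals in the reports above appear to come from objects the interpreter and its imported modules create (str, tuple, code objects), which a script like this cannot shrink.

```python
def highest_power_of_4(m):
    """Largest power of 4 that is <= m, for an integer m >= 1."""
    # m.bit_length() - 1 equals floor(log2(m)); halving it gives floor(log4(m)).
    return 4 ** ((m.bit_length() - 1) // 2)


def answers(lines):
    """Yield one answer per input number without storing the numbers."""
    lines = iter(lines)
    n = int(next(lines))
    for _ in range(n):
        m = int(next(lines)) // 10  # the "modified" number from the question
        # Assumption: report 0 when no power of 4 fits (m == 0).
        yield highest_power_of_4(m) if m >= 1 else 0


# Demo with an in-memory list; a real run would pass sys.stdin instead.
for value in answers(["3", "170", "45", "2560"]):
    print(value)
```

For the demo input this prints 16, 4 and 256, one per line.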

Related

Creating a list from series of pandas

I'm trying to create a list from 3 different series. Each element should have the form "({A} {B} {C})", where A is the 1st element of series 1, B is the 1st element of series 2, and C is the 1st element of series 3, and so on for every position, giving a list of 600 elements.
List 1 List 2 List 3
u_p0 1 v_p0 2 w_p0 7
u_p1 21 v_p1 11 w_p1 45
u_p2 32 v_p2 25 w_p2 32
u_p3 45 v_p3 76 w_p3 49
... .... ....
u_p599 56 v_p599 78 w_599 98
Now I want the output list as follows
(1 2 7)
(21 11 45)
(32 25 32)
(45 76 49)
.....
These are the 3 series I created from a dataframe
r1=turb_1.iloc[qw1] #List1
r2=turb_1.iloc[qw2] #List2
r3=turb_1.iloc[qw3] #List3
For the output, I think Python's string formatting will be useful, but I'm not quite sure how to proceed.
turb_3= ["({A} {B} {C})".format(A=i,B=j,C=k) for i in r1 for j in r2 for k in r3]
Any kind of help will be useful.
Use pandas.DataFrame.itertuples with str.format:
# Sample data
print(df)
col1 col2 col3
0 1 2 7
1 21 11 45
2 32 25 32
3 45 76 49
fmt = "({} {} {})"
[fmt.format(*tup) for tup in df[["col1", "col2", "col3"]].itertuples(index=False, name=None)]
Output:
['(1 2 7)', '(21 11 45)', '(32 25 32)', '(45 76 49)']
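The comprehension in the question nests three `for` clauses, so it pairs every element of `r1` with every element of `r2` and `r3` (600 × 600 × 600 combinations) instead of walking the series in step. A minimal sketch of the element-wise version using `zip`, with three short series standing in for `r1`, `r2` and `r3` (values copied from the question's sample table):

```python
import pandas as pd

# Stand-ins for r1, r2, r3 from the question.
r1 = pd.Series([1, 21, 32, 45])
r2 = pd.Series([2, 11, 25, 76])
r3 = pd.Series([7, 45, 32, 49])

# zip advances all three series together, one position at a time.
turb_3 = ["({} {} {})".format(a, b, c) for a, b, c in zip(r1, r2, r3)]
print(turb_3)
```

This relies on the three series having the same length and order; `zip` stops at the shortest input.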

Most frequently occurring numbers across multiple columns using pandas

I have a data frame with numbers in multiple columns, listed by date. I'm trying to find the most frequently occurring numbers across the whole data set, and also grouped by date.
import pandas as pd
import glob
def lotnorm(pdobject):
    # clean up special characters in the column names and make the date column the index as a date type.
    pdobject["Date"] = pd.to_datetime(pdobject["Date"])
    pdobject = pdobject.set_index('Date')
    for column in pdobject:
        if '#' in column:
            pdobject = pdobject.rename(columns={column: column.replace('#', '')})
    return pdobject

def lotimport():
    lotret = {}
    # list files in data directory with csv filename
    for lotpath in [f for f in glob.glob("data/*.csv")]:
        lotname = lotpath.split('\\')[1].split('.')[0]
        lotret[lotname] = lotnorm(pd.read_csv(lotpath))
    return lotret
print(lotimport()['ozlotto'])
------------- Output ---------------------
1 2 3 4 5 6 7 8 9
Date
2020-07-07 4 5 7 9 12 13 32 19 35
2020-06-30 1 17 26 28 38 39 44 14 41
2020-06-23 1 3 9 13 17 20 41 28 45
2020-06-16 1 2 13 21 22 27 38 24 33
2020-06-09 8 11 26 27 31 38 39 3 36
... .. .. .. .. .. .. .. .. ..
2005-11-15 7 10 13 17 30 32 41 20 14
2005-11-08 12 18 22 28 33 43 45 23 13
2005-11-01 1 3 11 17 24 34 43 39 4
2005-10-25 7 16 23 29 36 39 42 19 43
2005-10-18 5 9 12 30 33 39 45 7 19
The output I am aiming for is
Number frequency
45 201
32 195
24 187
14 160
48 154
--------------- Updated with append experiment -----------
I tried using append to create a single series from the dataframe, which worked for individual lines of code but got a really odd result when I ran it inside a for loop.
temp = lotimport()['ozlotto']['1']
print(temp)
temp = temp.append(lotimport()['ozlotto']['2'], ignore_index=True, verify_integrity=True)
print(temp)
temp = temp.append(lotimport()['ozlotto']['3'], ignore_index=True, verify_integrity=True)
print(temp)
lotcomb = pd.DataFrame()
for i in (lotimport()['ozlotto'].columns.tolist()):
    print(f"{i} - {type(i)}")
    lotcomb = lotcomb.append(lotimport()['ozlotto'][i], ignore_index=True, verify_integrity=True)
print(lotcomb)
This solution might be the one you are looking for.
import numpy as np
freqvalues = np.unique(df.to_numpy(), return_counts=True)
df2 = pd.DataFrame(index=freqvalues[0], data=freqvalues[1], columns=["Frequency"])
df2.index.name = "Numbers"
df2
Output:
Frequency
Numbers
1 6
2 5
3 5
5 8
6 4
7 7
8 2
9 7
10 3
11 4
12 2
13 8
14 1
15 4
16 4
17 6
18 4
19 5
20 9
21 3
22 4
23 2
24 4
25 5
26 4
27 6
28 1
29 6
30 3
31 3
... ...
70 6
71 6
72 5
73 5
74 2
75 8
76 5
77 3
78 3
79 2
80 3
81 4
82 6
83 9
84 5
85 4
86 1
87 3
88 4
89 3
90 4
91 4
92 3
93 5
94 1
95 4
96 6
97 6
98 1
99 6
97 rows × 1 columns
Use df.max(axis=0) for the maximum of each column and df.max(axis=1) for the maximum of each row.
Ok so the final answer I came up with was a mix of a few things including some of the great input from people in this thread. Essentially I do the following:
Pull in the CSV file and clean up the dates and the column names, then convert it to a pandas dataframe.
Then create a new pandas series and append each column to it ignoring dates to prevent conflicts.
Once I have the series, I use Vioxini's suggestion to use numpy to get counts of unique values and then turn the values into the index, after that sort the column by count in descending order and return the top 10 values.
Below is the resulting code, I hope it helps someone else.
import pandas as pd
import glob
import numpy as np
def lotnorm(pdobject):
    # clean up special characters in the column names and make the date column the index as a date type.
    pdobject["Date"] = pd.to_datetime(pdobject["Date"])
    pdobject = pdobject.set_index('Date')
    for column in pdobject:
        if '#' in column:
            pdobject = pdobject.rename(columns={column: column.replace('#', '')})
    return pdobject

def lotimport():
    lotret = {}
    # list files in data directory with csv filename
    for lotpath in [f for f in glob.glob("data/*.csv")]:
        lotname = lotpath.split('\\')[1].split('.')[0]
        lotret[lotname] = lotnorm(pd.read_csv(lotpath))
    return lotret

lotcomb = pd.Series([], dtype=object)
for i in (lotimport()['ozlotto'].columns.tolist()):
    lotcomb = lotcomb.append(lotimport()['ozlotto'][i], ignore_index=True, verify_integrity=True)
freqvalues = np.unique(lotcomb.to_numpy(), return_counts=True)
lotop = pd.DataFrame(index=freqvalues[0], data=freqvalues[1], columns=["Frequency"])
lotop.index.name = "Numbers"
lotop.sort_values(by=['Frequency'],ascending=False).head(10)
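`Series.append` was removed in pandas 2.0, so the loop above only runs on older versions. A possible alternative, sketched here on a small made-up frame rather than the real lottery data, is to flatten all columns into one Series and count with `Series.value_counts`, which already sorts by frequency in descending order:

```python
import pandas as pd

# Small made-up draw table; the real data would come from lotimport()['ozlotto'].
df = pd.DataFrame(
    {"1": [4, 1, 1], "2": [5, 17, 3], "3": [7, 26, 4]},
    index=pd.to_datetime(["2020-07-07", "2020-06-30", "2020-06-23"]),
)

# Flatten every column into one Series (dates are dropped), then tally.
counts = pd.Series(df.to_numpy().ravel()).value_counts()
lotop = counts.rename_axis("Numbers").to_frame("Frequency")
print(lotop.head(10))
```

Because the dates are discarded before counting, this gives the overall frequencies only; per-date grouping would need a separate step.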

When using min() - ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() [duplicate]

How can I reference the minimum value of two dataframes as part of a pandas dataframe equation? I tried using the python min() function which did not work. I'm sorry if this is well-documented somewhere but I have not been able to find a working solution for this problem. I am looking for something along the lines of this:
data['eff'] = pd.DataFrame([data['flow_h'], data['flow_c']]).min() * Cp * (data[' Thi'] - data[' Tci'])
I also tried to use pandas min() function, which is also not working.
min_flow = pd.DataFrame([data['flow_h'], data['flow_c']]).min()
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
I was confused by this error. The data columns are just numbers and a name, I wasn't sure where the index comes into play.
import pandas as pd
import numpy as np
np.random.seed(365)
rows = 10
flow = {'flow_c': [np.random.randint(100) for _ in range(rows)],
        'flow_d': [np.random.randint(100) for _ in range(rows)],
        'flow_h': [np.random.randint(100) for _ in range(rows)]}
data = pd.DataFrame(flow)
# display(data)
flow_c flow_d flow_h
0 82 36 43
1 52 48 12
2 33 28 77
3 91 99 11
4 44 95 27
5 5 94 64
6 98 3 88
7 73 39 92
8 26 39 62
9 56 74 50
If you are trying to get the row-wise minimum of two or more columns, use pandas.DataFrame.min. Note that the default is axis=0 (one minimum per column), so axis=1 must be specified.
data['min_c_h'] = data[['flow_h','flow_c']].min(axis=1)
# display(data)
flow_c flow_d flow_h min_c_h
0 82 36 43 43
1 52 48 12 12
2 33 28 77 33
3 91 99 11 11
4 44 95 27 27
5 5 94 64 5
6 98 3 88 88
7 73 39 92 73
8 26 39 62 26
9 56 74 50 50
If you would like to get a single minimum value across multiple columns:
data[['flow_h','flow_c']].min().min()
The first min() calculates the minimum per column and returns a pandas Series. The second min() returns the minimum of those per-column minimums.
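For the formula in the question, what is needed is an element-wise minimum of the two flow columns. Python's built-in `min(a, b)` compares the two Series with `<`, which produces a boolean Series whose truth value is ambiguous, hence the error in the title. Besides `DataFrame.min(axis=1)`, `numpy.minimum` does the same job for exactly two columns. The sketch below uses made-up values, a hypothetical constant `Cp`, and temperature columns named `Thi` and `Tci` without the leading space seen in the question:

```python
import numpy as np
import pandas as pd

# Made-up data; column names follow the question, values are illustrative only.
data = pd.DataFrame({
    "flow_h": [43, 12, 77],
    "flow_c": [82, 52, 33],
    "Thi": [90.0, 85.0, 80.0],
    "Tci": [20.0, 25.0, 30.0],
})
Cp = 4.18  # hypothetical constant, not taken from the question

# Row-by-row minimum of the two flows, then the rest of the formula.
min_flow = np.minimum(data["flow_h"], data["flow_c"])
data["eff"] = min_flow * Cp * (data["Thi"] - data["Tci"])
print(data)
```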

Efficient way to perform iterative subtraction and division operations on pandas columns

I have the following dataframe:
A B C Result
0 232 120 9 91
1 243 546 1 12
2 12 120 5 53
I want to compute the ratio (X-Y)/(X+Y) for each pair of columns, like this:
A B C Result A-B/A+B A-C/A+C B-C/B+C
0 232 120 9 91 0.318182 0.925311 0.860465
1 243 546 1 12 -0.384030 0.991803 0.996344
2 12 120 5 53 -0.818182 0.411765 0.920000
which I am doing using
df['A-B/A+B']=(df['A']-df['B'])/(df['A']+df['B'])
df['A-C/A+C']=(df['A']-df['C'])/(df['A']+df['C'])
df['B-C/B+C']=(df['B']-df['C'])/(df['B']+df['C'])
which I believe is a crude and repetitive way to do it.
Is there a cleaner way?
You can do the following:
# take all columns except the last one
colnames = df.columns.tolist()[:-1]
# compute the ratio for every pair of columns (i < k)
for i, c in enumerate(colnames):
    for k in range(i+1, len(colnames)):
        df[c + '_' + colnames[k]] = (df[c] - df[colnames[k]]) / (df[c] + df[colnames[k]])
# check result
print(df)
A B C Result A_B A_C B_C
0 232 120 9 91 0.318182 0.925311 0.860465
1 243 546 1 12 -0.384030 0.991803 0.996344
2 12 120 5 53 -0.818182 0.411765 0.920000
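The nested index loop can also be written with `itertools.combinations`, which yields each unordered pair of column names exactly once. A minimal sketch on the question's sample frame:

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({"A": [232, 243, 12], "B": [120, 546, 120],
                   "C": [9, 1, 5], "Result": [91, 12, 53]})

# Every pair among the input columns (the last column, Result, is excluded).
for left, right in combinations(df.columns[:-1], 2):
    df[f"{left}_{right}"] = (df[left] - df[right]) / (df[left] + df[right])
print(df)
```

The column slice is taken before any new columns are added, so the freshly created ratio columns do not feed back into the pairing.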
DataFrame.eval also fits here, as long as each expression is parenthesized. A bare string such as 'A-B/A+B' is evaluated with normal operator precedence, that is, as A - (B/A) + B:
exprs = {'A-B/A+B': '(A-B)/(A+B)', 'A-C/A+C': '(A-C)/(A+C)', 'B-C/B+C': '(B-C)/(B+C)'}
df.assign(**{name: df.eval(expr) for name, expr in exprs.items()})
A B C Result A-B/A+B A-C/A+C B-C/B+C
0 232 120 9 91 0.318182 0.925311 0.860465
1 243 546 1 12 -0.384030 0.991803 0.996344
2 12 120 5 53 -0.818182 0.411765 0.920000
The advantage of this method over the other solution is that each formula is spelled out as a string instead of being derived from the column order. As the documentation puts it, eval is used to:
Evaluate a string describing operations on DataFrame columns.

Find total no of links to and from node based on data in csv

I have a csv with the following info
Src Rx LinkId Weight
===================================
2 1 4000 10
2 1 4056 15
3 1 4100 10
3 1 4156 15
28 1 10650 8
113 2 15051 205
113 3 15058 205
1 4 3952 9
1 4 3951 5
1 4 3950 34
2 4 4052 9
47 4 18672 44
47 4 18670 38
69 4 4701 11
69 4 4700 21
70 4 4801 11
The linkId is unique. Each row represents the link between two devices. For example, source 2 and rx 1 means that a link goes from 2 to 1.
I intend to compute the total weight of all the links originating from each device and coming into each device like so:
Device Out weight In weight
=============================
2 25 205
1 48 58
and so on.
I would like to know if doing this is possible in excel. If yes, how.
Using a pivot table may be the best solution here and I think that if you select this table and click pivot-table it will give you your answer.
Alternatively, you can make a column for each in and out and use =sumif(Src, 1, weight ) and then use the totals at the bottom of each column.
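The SUMIF approach amounts to summing Weight grouped by Src for outgoing links and grouped by Rx for incoming links. For readers working outside Excel, a rough pandas equivalent is sketched below on the first ten rows of the table; the names `links`, `out_weight`, `in_weight` and `totals` are illustrative.

```python
import pandas as pd

# First ten rows of the table in the question.
links = pd.DataFrame({
    "Src":    [2, 2, 3, 3, 28, 113, 113, 1, 1, 1],
    "Rx":     [1, 1, 1, 1, 1, 2, 3, 4, 4, 4],
    "LinkId": [4000, 4056, 4100, 4156, 10650, 15051, 15058, 3952, 3951, 3950],
    "Weight": [10, 15, 10, 15, 8, 205, 205, 9, 5, 34],
})

# Outgoing totals are grouped by source, incoming totals by receiver.
out_weight = links.groupby("Src")["Weight"].sum().rename("Out weight")
in_weight = links.groupby("Rx")["Weight"].sum().rename("In weight")

# Align on device id; a device with no links in one direction gets 0.
totals = pd.concat([out_weight, in_weight], axis=1).fillna(0).astype(int)
totals.index.name = "Device"
print(totals)
```

On these rows, device 2 comes out as 25 out and 205 in, and device 1 as 48 out and 58 in, matching the expected table in the question.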
