IndexError: list index out of range (arithmetic mean) - python-3.x

A file is given with four columns (index, age, height in inches, weight in pounds) and more than 20,000 rows. The task is to sum the height column and the weight column and get the average of each (I also tried to convert inches and pounds to cm and kg). I summed each column and divided by the number of rows, but I get this error: mean += float(line[2]) IndexError: list index out of range
import csv
def data_scv():
    with open('data.csv') as csv_file:
        csv_reader = csv.reader(csv_file)
        next(csv_reader)
        mean = 0
        n = 0
        for line in csv_reader:
            mean += float(line[2])
            n += 1
        ave_height = str((mean / n) * 2.54)
        for line in csv_reader:
            mean += float(line[3])
            n += 1
        ave_weight = str((mean / n) / 2.205)
    return f'Average Height(Centimeters) is - {ave_height} <br> Average Height(Kilograms) is - {ave_weight}'

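If I understand this correctly (Python isn't exactly my expertise), the indices themselves are fine: Python's first position is position 0, so with four columns line[2] is the height and line[3] is the weight. An IndexError on line[2] means that some row has fewer than three fields, for example a blank line in the file, or every row if the file uses a delimiter other than a comma.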
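As a minimal sketch of one way to compute both averages, the function below reads the file in a single pass and skips rows that are too short. The name average_height_weight and the skip-short-rows policy are assumptions for illustration, not part of the original post. Two details of the question's code matter here: a csv.reader can only be iterated once, so the second for loop in the question never runs, and mean and n are not reset between the two columns.

```python
import csv

def average_height_weight(path):
    """Return (mean height in cm, mean weight in kg) from a 4-column CSV."""
    height_total = 0.0
    weight_total = 0.0
    rows = 0
    with open(path, newline='') as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row, if there is one
        for line in reader:
            if len(line) < 4:
                # Blank or short row: indexing line[2] here would raise IndexError.
                continue
            height_total += float(line[2])  # inches
            weight_total += float(line[3])  # pounds
            rows += 1
    if rows == 0:
        raise ValueError('no data rows found')
    # Same conversion factors as in the question.
    return (height_total / rows) * 2.54, (weight_total / rows) / 2.205
```

Both sums are accumulated in the same loop, so the file is read only once and each column keeps its own total.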


How to generate all possible binary strings from a clue effectively?

I get a clue in the form of a list (e.g. [1,3,1]) and the length of the string (e.g. 8), and I want to generate all possible strings that match the clue. That is:
01011101
10111010
10111001
10011101
The clue specifies the lengths of the groups of 1s, which must appear in the order given in the clue list and be separated by at least one 0. In the example that means three groups of 1s with lengths 1, 3 and 1.
My approach would be to use recursion, where each call tries to insert a specific group of 1s in the string (in the order of the clue list). It uses a for-loop to place it in all possible indices of the string and recursively calls itself for each of these placements with a clue = clue[1:] and size = size - clue[0].
How can I do that effectively in Python?
I would just use combinations_with_replacement to generate all possible combinations and build your answers that way.
from itertools import combinations_with_replacement
from collections import Counter
def generate_answers(clue, length):
    segs = len(clue) + 1 # segment indices that a zero can be placed
    excess_zeros = length - sum(clue) - (segs - 2) # number of zeros that can be moved around
    for comb in combinations_with_replacement(range(segs), excess_zeros):
        count = Counter(comb) # number of zeros to be added to each segment
        for i in range(1, len(clue)):
            count[i] += 1 # add the zeros required to separate the ones
        output = ''
        for i in range(segs): # build string
            output += '0' * count[i]
            if i < len(clue):
                output += '1' * clue[i]
        print(output)
clue = [1, 3, 1]
length = 8
generate_answers(clue, length)
Output:
01011101
10011101
10111001
10111010
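As a quick sanity check on the approach above, the number of strings can be computed without generating them. This is a sketch under the same model (the excess zeros are distributed over len(clue) + 1 slots); the helper name count_answers is hypothetical.

```python
from math import comb

def count_answers(clue, length):
    """Number of strings of the given length that match the clue."""
    separators = max(len(clue) - 1, 0)  # one mandatory zero between groups
    excess_zeros = length - sum(clue) - separators
    if excess_zeros < 0:
        return 0  # the clue does not fit into the string at all
    # Distribute the excess zeros over len(clue) + 1 slots (stars and bars).
    return comb(excess_zeros + len(clue), len(clue))

print(count_answers([1, 3, 1], 8))  # 4
```

For the example clue there is one excess zero and four slots, which gives the four strings listed above.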
This is another way to do it (recursively) without external libraries:
def generate_answers(clue, size):
    if len(clue) == 0:
        return [[0 for _ in range(size)]]
    groups = gen_groups(clue)
    min_len = sum(clue) + len(clue) - 1
    free_spaces = size - min_len
    result = recursive_combinations(len(groups) + 1, free_spaces)
    solution = []
    for res in result:
        in_progress = []
        for i in range(len(groups)):
            in_progress += [0 for _ in range(res[i])]
            in_progress += groups[i]
        in_progress += [0 for _ in range(res[-1])]
        solution.append(in_progress)
    return solution
def gen_groups(clue):
    result = []
    for elem in clue:
        in_progress = []
        for i in range(elem):
            in_progress.append(1)
        in_progress.append(0)
        result.append(in_progress)
    if len(result) > 0:
        result[-1].pop()
    return result
def recursive_combinations(fields, zeroes):
    if fields <= 0 or zeroes < 0:
        return []
    if fields == 1:
        return [[zeroes]]
    solution = []
    for i in range(zeroes + 1):
        result = recursive_combinations(fields - 1, zeroes - i)
        solution += [[i] + res for res in result]
    return solution
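The recursion described in the question (place the first group at every possible index, then recurse on clue[1:] with the remaining size) can also be sketched as a generator that yields strings directly. The name placements is hypothetical; this is one possible reading of that description, not code from either answer.

```python
def placements(clue, size):
    """Yield every binary string of length `size` that matches `clue`."""
    if not clue:
        yield '0' * size
        return
    head, rest = clue[0], clue[1:]
    # Room the remaining groups need, counting one separating zero per group.
    tail_min = sum(rest) + len(rest)
    for lead in range(size - head - tail_min + 1):
        prefix = '0' * lead + '1' * head
        if rest:
            # A separating zero must follow this group before the next one.
            for suffix in placements(rest, size - lead - head - 1):
                yield prefix + '0' + suffix
        else:
            # Last group: pad the rest of the string with zeros.
            yield prefix + '0' * (size - lead - head)

for answer in placements([1, 3, 1], 8):
    print(answer)
```

Because it is a generator, the strings are produced one at a time, so a caller can stop early without building the full list.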

Problem in the function of my program code python

I tried to make a program to do the below things but apparently, the function doesn't work. I want my function to take two or more arguments and give me the average and median and the maximum number of those arguments.
example input:
calc([2, 20])
example output : (11.0, 11.0, 20)
def calc():
    total = 0
    calc = sorted(calc)
    for x in range(len(calc)):
        total += int(calc[x])
    average = total / len(calc)
    sorted(calc)
    maximum = calc[len(calc) - 1]
    if len(calc) % 2 != 0:
        median = calc[(len(calc) // 2) + 1]
    else:
        median = (float(calc[(len(calc) // 2) - 1]) + float(calc[(len(calc) // 2)])) / 2
    return (average, median, maximum)
There are some things I'm going to fix as I go since I can't help myself.
First, your main problem is arguments.
If you hand a function arguments
calc([2, 20])
it needs to accept arguments.
def calc(some_argument):
This will fix your main problem, but another thing is that you shouldn't have identical names for your variables.
calc is your function name, so it should not also be the name of your list within your function.
# changed the arg name to lst
def calc(lst):
    lst = sorted(lst)
    # I'm going to just set these as variables since
    # you're doing the calculations more than once
    # it adds a lot of noise to your lines
    size = len(lst)
    mid = size // 2
    total = 0
    # in python we can just iterate over a list directly
    # without indexing into it
    # and python will unpack the variable into x
    for x in lst:
        total += int(x)
    average = total / size
    # we can get the last element in a list like so
    maximum = lst[-1]
    if size % 2 != 0:
        # this was a logical error
        # the actual element you want is mid
        # since indexes start at 0
        median = lst[mid]
    else:
        # here there is no reason to explicitly cast to float
        # since python division does that automatically
        median = (lst[mid - 1] + lst[mid]) / 2
    return (average, median, maximum)
print(calc([11.0, 11.0, 20]))
Output:
(14.0, 11.0, 20)
Because you are passing arguments into a function that doesn't accept any, you are getting an error. You could fix this just by making the first line of your function:
def calc(calc):
But it would be better to give the parameter its own name, something like "mylist". To do so you would just have to change the start of your function like so:
def calc(mylist):
    calc=sorted(mylist)
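For comparison, here is a short sketch that leans on the standard library instead of hand-written loops. It assumes Python 3.8 or later (for statistics.fmean) and a non-empty list of numbers. Unlike the loop above, it does not truncate values with int(), so the average can differ for non-integer input.

```python
from statistics import fmean, median

def calc(values):
    """Return (average, median, maximum) for a non-empty sequence of numbers."""
    if not values:
        raise ValueError('calc() needs at least one number')
    return fmean(values), median(values), max(values)

print(calc([2, 20]))           # (11.0, 11.0, 20)
print(calc([11.0, 11.0, 20]))  # (14.0, 11.0, 20)
```

statistics.median already handles the odd and even cases, which is where the indexing mistake in the question came from.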

Python: Fastest way to subtract elements of datasets of HDF5 file?

Here is one interesting problem.
Input: two arrays (Nx4, sorted on column 2) stored as dataset-1 and dataset-2 in an HDF5 file (input.h5). N is huge (the data originally came from a 10 GB file, hence the HDF5 storage).
Output: subtract each column-2 element of dataset-2 from each column-2 element of dataset-1, keep only the pairs whose difference (delta) is within +/-4000, and save the results in a dataset of a new HDF5 file. I need to refer to this new file back and forth, hence HDF5 and not a text file.
Concern: I initially used the .append method, but that crashed the execution for the 10 GB input. So I am now using the dset.resize method (and would prefer to stick to it). I am also using binary search, as I was told in one of my last posts. Although the script now seems to work for large (10 GB) datasets, it is quite slow! The subtraction (for/while) loop is possibly the culprit. Any suggestions on how I can make this fast? I aim to use the fastest approach (and possibly the simplest, since I am a beginner).
import numpy as np
import time
import h5py
import sys
import csv
f_r = h5py.File('input.h5', 'r+')
dset1 = f_r.get('dataset_1')
dset2 = f_r.get('dataset_2')
r1,c1 = dset1.shape
r2,c2 = dset2.shape
left, right, count = 0,0,0
W = 4000 # Window half-width
n = 1
# **********************************************
# HDF5 Out Creation
# **********************************************
f_w = h5py.File('data.h5', 'w')
d1 = np.zeros(shape=(0, 4))
dset = f_w.create_dataset('dataset_1', data=d1, maxshape=(None, None), chunks=True)
for j in range(r1):
    e1 = dset1[j,1]
    # move left pointer so that is within -delta of e
    while left < r2 and dset2[left,1] - e1 <= -W:
        left += 1
    # move right pointer so that is outside of +delta
    while right < r2 and dset2[right,1] - e1 <= W:
        right += 1
    for i in range(left, right):
        delta = e1 - dset2[i,1]
        dset.resize(dset.shape[0] + n, axis=0)
        dset[count, 0:4] = [count, dset1[j,1], dset2[i,1], delta]
        count += 1
print("\nFinal shape of dataset created: " + str(dset.shape))
f_w.close()
EDIT (Aug 8, chunking HDF5 file as suggested by @kcw78)
@kcw78: So, I tried chunking as well. The following works well for small files (<100 MB), but the computation time increases enormously when I work with GBs of data. Can something be improved in my code to make it fast?
My suspicion is that the for j loop is computationally expensive and may be the reason. Any suggestions?
filename = 'file.h5'
with h5py.File(filename, 'r') as fid:
    chunks1 = fid["dataset_1"][:, :]
with h5py.File(filename, 'r') as fid:
    chunks2 = fid["dataset_2"][:, :]
print(chunks1.shape, chunks2.shape) # shape is (13900,4) and (138676,4)
count = 0
W = 4000 # Window half-width
# **********************************************
# HDF5-Out Creation
# **********************************************
f_w = h5py.File('data.h5', 'w')
d1 = np.zeros(shape=(0, 4))
dset = f_w.create_dataset('dataset_1', data=d1, maxshape=(None, None), chunks=True)
# chunk size to read from first/second dataset
size1 = 34850
size2 = 34669
# save "n" no. of subtracted values in dset
n = 10**4
u = 0
fill_index = 0
for c in range(4): # read 4 chunks of dataset-1 one-by-one
    h = c * size1
    chunk1 = chunks1[h:(h + size1)]
    for d in range(4): # read chunks of dataset-2
        g = d * size2
        chunk2 = chunks2[g:(g + size2)]
        r2 = chunk2.shape[0]
        left, right = 0, 0
        for j in range(chunk1.shape[0]): # grab col.2 values from dataset-1
            e1 = chunk1[j, 1]
            while left < r2 and chunk2[left, 1] - e1 <= -W:
                left += 1
            # move right pointer so that is outside of +delta
            while right < r2 and chunk2[right, 1] - e1 <= W:
                right += 1
            for i in range(left, right):
                if chunk1[j, 0]<8193 and chunk2[i, 0] <8193:
                    e2 = chunk2[i, 1]
                    delta = e1 - e2 # subtract col.2 values
                    count += 1
                    if fill_index == (n):
                        dset.resize(dset.shape[0] + n, axis=0)
                        dset[u:(u + n), 0:4] = [count, e1, e1, delta]
                        u = u * n
                        fill_index = 0
                    fill_index += 1
        del chunk2
    del chunk1
f_w.close()
print(count) # these are (no. of) subtracted values such that the difference is between +/- 4000
EDIT (Jul 31)
I tried reading in chunks and even using memory mapping. It is efficient if I do not perform any subtraction and just go through the chunks. The for j in range(m): loop is the one that is inefficient, probably because I am grabbing each value of the chunk from file-1 one at a time. This is when I am just subtracting and not saving the difference. Is there any better logic/implementation you can think of that could replace "for j in range(m):"?
size1 = 100_000_0
size2 = 100_000_0
filename = ["file-1.txt", "file-2.txt"]
chunks1 = pd.read_csv(filename[0], chunksize=size1,
                      names=['c1', 'c2', 'lt', 'rt'])
fp1 = np.memmap('newfile1.dat', dtype='float64', mode='w+', shape=(size1,4))
fp2 = np.memmap('newfile2.dat', dtype='float64', mode='w+', shape=(size2,4))
for chunk1 in chunks1: # grab chunks from file-1
    m, _ = chunk1.shape
    fp1[0:m,:] = chunk1
    chunks2 = pd.read_csv(filename[1], chunksize=size2,
                          names=['ch', 'tmstp', 'lt', 'rt'])
    for chunk2 in chunks2: # grab chunks from file-2
        k, _ = chunk2.shape
        fp2[0:k, :] = chunk2
        for j in range(m): # Grabbing values from file-1's chunk
            e1 = fp1[j,1]
            delta_mat = e1 - fp2 # just a test, actually e1 should be subtracted from col-2 of fp2, not the whole fp2
            count += 1
        fp2.flush()
        a += k
    fp1.flush()
    del chunks2
    i += m
    prog_count += m
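One possible direction, offered only as a sketch and not taken from the thread: because column 2 is sorted, the per-row while loops can be replaced by two vectorized np.searchsorted calls, and all matching pairs for a block of dataset-1 rows can be built at once. The function name window_deltas is hypothetical, and the sketch assumes the two column-2 arrays (or a block of the first plus the relevant part of the second) fit in memory. It keeps the same window as the loops in the question, i.e. pairs with -w < b - a <= w.

```python
import numpy as np

def window_deltas(a, b, w):
    """Rows of (a_value, b_value, a_value - b_value) for -w < b - a <= w.

    `a` and `b` are 1-D arrays; `b` must be sorted in ascending order.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # First index in b with b > a - w, and first index with b > a + w.
    left = np.searchsorted(b, a - w, side='right')
    right = np.searchsorted(b, a + w, side='right')
    counts = right - left                      # matches per element of a
    total = int(counts.sum())
    rows = np.repeat(np.arange(a.size), counts)
    starts = np.cumsum(counts) - counts        # offset of each group in the output
    cols = np.arange(total) - np.repeat(starts, counts) + np.repeat(left, counts)
    return np.column_stack((a[rows], b[cols], a[rows] - b[cols]))

out = window_deltas([0, 10], [-5, 3, 9, 20], 5)
print(out)
```

The returned block could then be written to the output dataset with a single resize and one slice assignment per block, rather than one resize per pair, which is likely where much of the time goes in the loops above.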

Trouble understanding modulus on a negative in 'Conways Game of Life'

I'm going over the book 'Automate the Boring Stuff with Python' and cannot understand a simple expression with the % operator. The expression is leftCoord = (x - 1) % WIDTH, which on the first iteration of the loop evaluates to (0 - 1) % 60. In my mind the % operator should evaluate to the remainder of a division. Why does it evaluate to 59?
This is the part of the program that precedes the expression in question:
import random,time,copy
WIDTH = 60
HEIGHT = 20
# Create a list of list for the cells:
nextCells = []
for x in range(WIDTH):
    column = [] # Create a new column.
    for y in range(HEIGHT):
        if random.randint(0,1) == 0:
            column.append('#') # Add a living cell.
        else:
            column.append(' ') # Add a dead cell.
    nextCells.append(column) # nextCells is a list of column lists.
while True: # Main program loop.
    print('\n\n\n\n\n') # Separate each step with newlines.
    currentCells = copy.deepcopy(nextCells)
    # Print currentCells on the screen:
    for y in range(HEIGHT):
        for x in range(WIDTH):
            print(currentCells[x][y], end='') # Print the # or space.
        print() # Print a newline at the end of the row.
    # Calculate the next step's cells based on current step's cells:
    for x in range(WIDTH):
        for y in range(HEIGHT):
            # Get neighboring coordinates:
            # % WIDTH ensures leftCoord is always between 0 and WIDTH -1
            leftCoord = (x - 1) % WIDTH
            rightCoord = (x + 1) % WIDTH
            aboveCoord = (y - 1) % HEIGHT
            belowCoord = (y + 1) % HEIGHT
For the sake of example, let's assume that you're using a table of 10x10.
The % operator isn't so intuitive when the first number is smaller than the second. Try going into the interactive Python shell and running 4 % 10. Try 8 % 10. Notice how you always get the same number back? That's because the quotient of the division is 0, with your whole number left over as the remainder. For most positions in the table, the modulus doesn't change anything at all.
Now try -1 % 10 (simulating what this would do for the top row). It gives you 9, indicating the bottom row: in Python the result of % takes the sign of the divisor, so -1 wraps around to 9 instead of staying negative. If you run 10 % 10 (simulating the bottom row), it gives you 0, indicating the top row. Effectively, this makes the table "wrap": the cells in the top row affect the bottom and vice versa. It also wraps around the sides.
Hope this helps!
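The wrap-around can be checked directly with a small sketch. The helper name wrapped_neighbors is hypothetical; the arithmetic is the same as in the book's code, using WIDTH = 60 and HEIGHT = 20 from the question.

```python
def wrapped_neighbors(x, y, width, height):
    """Return (left, right, above, below) with wrap-around at the edges."""
    left = (x - 1) % width
    right = (x + 1) % width
    above = (y - 1) % height
    below = (y + 1) % height
    return left, right, above, below

# Python floors the quotient, so the remainder takes the divisor's sign:
# -1 == (-1) * 60 + 59, hence -1 // 60 == -1 and -1 % 60 == 59.
print(divmod(-1, 60))                     # (-1, 59)
print(wrapped_neighbors(0, 0, 60, 20))    # (59, 1, 19, 1)
print(wrapped_neighbors(59, 19, 60, 20))  # (58, 0, 18, 0)
```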

Getting Rid of Needless Arguments

import random
def makeTable(f, v):
    # build an f-by-v table, using the f and v that were passed in
    table = [[0 for x in range(v)] for y in range(f)]
    for i in range(f):
        for j in range(v):
            table[i][j] = random.randint(1, 100)
    return table
def printTable(table):
    # print the table (helpful for visualizing)
    for i in range(len(table)):
        print("\t".join(str(a) for a in table[i]))
def calculateSums(startRow, startColumn, initialSum):
    if startRow == f - 1:
        # if the last row is reached add each of the coming values
        # to the variable initialSum and append to the list totalSums.
        for k in range(startColumn, v):
            totalSums.append(initialSum + table[f - 1][k])
    else:
        while startColumn <= (v - f + startRow):
            # the constraint (v-f+startRow) determines the limit
            # how far each row can go to the right.
            # e.g. in a 3*5 table the limit for the first row is the third column,
            # the limit for the 2nd row is the fourth column, etc.
            initialSum += table[startRow][startColumn]
            calculateSums(startRow + 1, startColumn + 1, initialSum)
            initialSum -= table[startRow][startColumn]
            startColumn += 1
    return max(totalSums)
f = random.randint(1, 7)
v = random.randint(f, 10)
# generate the number of flowers, f, and number of vases, v.
table = makeTable(f, v)
print('Number of flowers: ', f)
print('Number of vases: ', v)
printTable(table)
totalSums = []
print('Maximum sum is: ', calculateSums(0, 0, 0))
(I have a feeling that this is a stupid question, and possibly asked already, but I do not know how to search for it.)
In the above code calculateSums function will always receive the same arguments, thus it seems silly to pass them in. Yet I could not figure out how to manage them inside the function. What am I supposed to do?
You could use default parameters:
def calculateSums(startRow=0, startColumn=0, initialSum=0):
You can then make the initial call without parameters:
print('Maximum sum is: ', calculateSums())
However, be careful when using expressions in default arguments. These are calculated when the function is defined, not when it’s called.
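To illustrate that caveat, here is a minimal sketch with two hypothetical helpers (append_shared and append_fresh are made up for this example). Constants such as 0 are safe defaults; the surprise comes with mutable objects, because the default is created once, when the def statement runs.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, at definition time,
    # and is shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # Using None as a sentinel creates a new list on each call instead.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling append_shared(1) and then append_shared(2) returns [1] and then [1, 2], because both calls see the same list, while append_fresh(1) and append_fresh(2) return [1] and [2].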
