I want to measure the processing time of part of my code, so I used the timeit function. However, it raises an IndentationError from inside timeit.
Here is my code:
for stem, result in zip(stem_dirs, result_dirs):
    code_to_measure = '''
    print(stem, '\n', result)
    subprocess.call(['python', './a.py', "--dir_in", stem, "--dir_out", result])
    '''
    proccess_time = timeit.timeit(code_to_measure)
    print(proccess_time)
Here is the error I get:
Traceback (most recent call last):
  File "code_test.py", line 115, in <module>
    proccess_time = timeit.timeit(code_to_measure)
  File "/usr/local/lib/python3.6/timeit.py", line 233, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
  File "/usr/local/lib/python3.6/timeit.py", line 123, in __init__
    compile(stmtprefix + stmt, dummy_src_name, "exec")
  File "<timeit-src>", line 3
    print(stem, '
    ^
IndentationError: unexpected indent
However, the timeit function in the code below still runs properly;
# importing the required module
import timeit
# code snippet to be executed only once
mysetup = "from math import sqrt"
# code snippet whose execution time is to be measured
mycode = '''
def example():
    mylist = []
    for x in range(100):
        mylist.append(sqrt(x))
'''
# timeit statement
print(timeit.timeit(setup = mysetup,
                    stmt = mycode,
                    number = 10000))
Here is the output of the code;
0.002189640999858966
I am not sure how to solve this. Please let me know if you have any suggestions or solutions.
Thank you so much in advance.
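Bit of a late reply, but I ran into the same issue.
timeit does accept multi-line statement strings, but the string must not be indented as a whole, and here every line of the triple-quoted string carries the indentation of the enclosing for loop. The '\n' inside the non-raw string also turns into a real line break in the middle of the print call, which is why the traceback cuts off at print(stem, '. The simplest workaround is to put the statements on one line, separated by a ;.
For your code it would look something like this: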
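for stem, result in zip(stem_dirs, result_dirs):
    code_to_measure = f"print({stem!r}, '\\n', {result!r}); subprocess.call(['python', './a.py', '--dir_in', {stem!r}, '--dir_out', {result!r}])"
    proccess_time = timeit.timeit(code_to_measure, setup="import subprocess", number=1)
    print(proccess_time)
(The variables are interpolated with !r so that their values appear as quoted string literals, since timeit runs the statement in its own namespace and cannot see your variables unless you pass globals. subprocess is imported through setup for the same reason, and number=1 keeps timeit from running the command its default 1,000,000 times.)
The reason the second timeit call in your question runs is that it never executes the statements inside the function. All it does is define the function, which also explains why it is so ridiculously fast.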
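As an alternative to building a one-line string, here is a minimal sketch of two other ways to time a single run. It assumes the goal is one timed execution per directory pair; run_job, the directory names and the `-c pass` command are hypothetical stand-ins for the asker's ./a.py call, not part of the original code.

```python
import subprocess
import sys
import textwrap
import timeit


def run_job(stem, result):
    # Hypothetical stand-in for: subprocess.call(['python', './a.py', ...])
    return subprocess.call([sys.executable, "-c", "pass", stem, result])


# Option 1: pass a callable; it sees its arguments through the closure,
# so no string formatting is needed.
elapsed_callable = timeit.timeit(lambda: run_job("in_dir", "out_dir"), number=1)

# Option 2: a multi-line statement string. dedent() removes the common
# leading indentation, and globals= makes the names visible to timeit.
stmt = textwrap.dedent('''
    code = run_job(stem, result)
    assert code == 0
''')
elapsed_string = timeit.timeit(
    stmt,
    globals={"run_job": run_job, "stem": "in_dir", "result": "out_dir"},
    number=1,
)

print(elapsed_callable, elapsed_string)
```

The callable form is usually the less error-prone of the two, because nothing has to be quoted or escaped.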
There are two ways to indent: with spaces (the norm is 4 spaces per indentation level) or with tabs. Make sure you are not mixing them; stick to one.
I might be able to pinpoint the problem if you can share your code as a .py file.
I'm trying to rewrite a program of mine so that it no longer uses any for loops.
The original code reads a file with thousands of lines that are structured like:
Ex. 2 lines of a file:
As you can see, the first line starts with LPPD;LEMD and the second line starts with DAAE;LFML. I'm only interested in the very first and second element of each line.
The original code I wrote is:
# Libraries
import sys
from collections import Counter
import collections
from itertools import chain
from collections import defaultdict
import time
# START
# #time=0
start = time.time()
# Defining default program argument
if len(sys.argv)==1:
    fileName = "file.txt"
else:
    fileName = sys.argv[1]
takeOffAirport = []
landingAirport = []
# Reading file
lines = 0 # Counter for file lines
try:
    with open(fileName) as file:
        for line in file:
            words = line.split(';')
            # Relevant data, item1 and item2 from each file line
            origin = words[0]
            destination = words[1]
            # Populating lists
            landingAirport.append(destination)
            takeOffAirport.append(origin)
            lines += 1
except IOError:
    print ("\n\033[0;31mIoError: could not open the file:\033[00m %s" %fileName)
airports_dict = defaultdict(list)
# Merge lists into a dictionary key:value
for key, value in chain(Counter(takeOffAirport).items(),
                        Counter(landingAirport).items()):
    # 'AIRPORT_NAME':[num_takeOffs, num_landings]
    airports_dict[key].append(value)
# Sum key values and add it as another value
for key, value in airports_dict.items():
    # 'AIRPORT_NAME':[num_totalMovements, [num_takeOffs, num_landings]]
    airports_dict[key] = [sum(value),value]
# Sort dictionary by the top 10 total movements
airports_dict = sorted(airports_dict.items(),
                       key=lambda kv:kv[1], reverse=True)[:10]
airports_dict = collections.OrderedDict(airports_dict)
# Print results
print("\nAIRPORT"+ "\t\t#TOTAL_MOVEMENTS"+ "\t#TAKEOFFS"+ "\t#LANDINGS")
for k in airports_dict:
    print(k,"\t\t", airports_dict[k][0],
          "\t\t\t", airports_dict[k][1][1],
          "\t\t", airports_dict[k][1][0])
# #time=1
end = time.time()- start
print("\nAlgorithm execution time: %0.5f" % end)
print("Total number of lines read in the file: %u\n" % lines)
airports_dict.clear()
takeOffAirport.clear()
landingAirport.clear()
My goal is to simplify the program using map, reduce and filter. So far I have sorted out the creation of the two independent lists, one holding the first element of each file line and the other holding the second element, by using:
# Creates two independent lists with the first and second element from each line
takeOff_Airport = list(map(lambda sub: (sub[0].split(';')[0]), lines))
landing_Airport = list(map(lambda sub: (sub[0].split(';')[1]), lines))
I was hoping to find a way to open the file through a map() function and achieve exactly the same result as the original code, so that I could pass each line to the maps defined above, takeOff_Airport and landing_Airport.
So if we have a file as such
line 1
line 2
line 3
line 4
and we do like this
open(file_name).read().split('\n')
we get this
['line 1', 'line 2', 'line 3', 'line 4', '']
Is this what you wanted?
Edit 1
I feel this is somewhat redundant, but since map applies a function to each element of an iterable, we have to put our file name in a list, and of course define our function:
def open_read(file_name):
    return open(file_name).read().split('\n')
print(list(map(open_read, ['test.txt'])))
This gets us
>>> [['line 1', 'line 2', 'line 3', 'line 4', '']]
So first off, calling split('\n') on each line is silly; the line is guaranteed to have at most one newline, at the end, and nothing after it, so you'd end up with a bunch of ['all of line', ''] lists. To avoid the empty string, just strip the newline. This won't leave each line wrapped in a list, but frankly, I can't imagine why you'd want a list of one-element lists containing a single string each.
So I'm just going to demonstrate using map+strip to get rid of the newlines, using operator.methodcaller to perform the strip on each line:
from operator import methodcaller
def readFile(fileName):
    try:
        with open(fileName) as file:
            return list(map(methodcaller('strip', '\n'), file))
    except IOError:
        print ("\n\033[0;31mIoError: could not open the file:\033[00m %s" %fileName)
Sadly, since your file is context managed (a good thing, just inconvenient here), you do have to listify the result; map is lazy, and if you didn't listify before the return, the with statement would close the file, and pulling data from the map object would die with an exception.
To get around that, you can implement it as a trivial generator function, so the generator context keeps the file open until the generator is exhausted (or explicitly closed, or garbage collected):
def readFile(fileName):
    try:
        with open(fileName) as file:
            yield from map(methodcaller('strip', '\n'), file)
    except IOError:
        print ("\n\033[0;31mIoError: could not open the file:\033[00m %s" %fileName)
yield from will introduce a tiny amount of overhead over directly iterating the map, but not much, and now you don't have to slurp the whole file if you don't want to; the caller can just iterate the result and get a stripped line on each iteration without pulling the whole file into memory. It does have the slight weakness that opening the file will be done lazily, so you won't see the exception (if there is any) until you begin iterating. This can be worked around, but it's not worth the trouble if you don't really need it.
I'd generally recommend the latter implementation as it gives the caller flexibility. If they want a list anyway, they just wrap the call in list and get the list result (with a tiny amount of overhead). If they don't, they can begin processing faster, and have much lower memory demands.
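To make the difference concrete, here is a small sketch that contrasts returning an unconsumed map from inside the with block with the generator version. The function names and the two-line sample file are made up for the demonstration; only the first two fields of each line are modeled.

```python
import os
import tempfile
from operator import methodcaller


def read_lines_lazy_broken(path):
    with open(path) as file:
        # The map object is returned unconsumed; the file closes on return.
        return map(methodcaller('strip', '\n'), file)


def read_lines(path):
    with open(path) as file:
        # The suspended generator keeps the with block, and so the file, open.
        yield from map(methodcaller('strip', '\n'), file)


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('LPPD;LEMD\nDAAE;LFML\n')

try:
    try:
        list(read_lines_lazy_broken(tmp.name))
        broken_error = None
    except ValueError as exc:  # reading from an already closed file
        broken_error = exc
    lines = list(read_lines(tmp.name))
finally:
    os.remove(tmp.name)

print(repr(broken_error))
print(lines)
```

Wrapping the generator call in list() gives the eager behavior back whenever a caller needs it.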
Mind you, this whole function is fairly odd; replacing IOErrors with prints and (implicitly) returning None is hostile to API consumers (they now have to check return values, and can't actually tell what went wrong). In real code, I'd probably just skip the function and insert:
with open(fileName) as file:
    for line in map(methodcaller('strip', '\n'), file):
        # do stuff with line (with newline pre-stripped)
inline in the caller; maybe define strip_newline = methodcaller('strip', '\n') globally to use a friendlier name. It's not that much code, and I can't imagine that this specific behavior is needed in that many independent parts of your file, and inlining it removes the concerns about when the file is opened and closed.
I am trying to create a function that adds two lists such that if list1 is [9,1,2] and list2 is [8,5,3], then the two lists added together produce the list [1,7,6,5], since 912+853=1765.
The following is the code I have written:
def list_addition(list1,list2):
    otlist1=''
    otlist2=''
    for items1 in list1:
        otlist1+= items1
    for items2 in otlist2:
        otlist2+= items2
    strinum = int(otlist1)+ int(otlist2)
    return strinum
print(list_addition(['3','6','7'], ['4','9','0']))
I keep getting this error:
Traceback (most recent call last):
  File "C:/Users/Chuck/PycharmProjects/arrayaddition/Arrays.py", line 13, in <module>
    list_addition(['3','6','7'], ['4','9','0'])
  File "C:/Users/Chuck/PycharmProjects/arrayaddition/Arrays.py", line 10, in list_addition
    strinum = int(otlist1)+ int(otlist2)
ValueError: invalid literal for int() with base 10: ''
I know that even if my code worked as written it wouldn't be complete, since I would still need to convert the integer 'strinum' back to a list, but I can't get there while the code fails to add the two converted lists. When I break the code down, write the two for loops separately, convert the results to integers and add them, everything works perfectly fine. So the code below was good:
list1=['7','9','6']
otlist1=''
for items1 in list1:
    otlist1+= items1
print(otlist1)
ist1=['5','7','0']
otlist2=''
for items1 in ist1:
    otlist2+= items1
print(otlist2)
print(int(otlist1) + int(otlist2))
But for some reason when I try to put the two for loops inside a single function I get the error. I am a complete newbie to programming and I want to know what I am not understanding about function syntax. Any help will be greatly appreciated.
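Judging from the code as posted, the likely culprit is that the second loop inside the function iterates over otlist2 (still the empty string) rather than list2, so otlist2 stays '' and int('') raises the ValueError. A minimal sketch of a corrected version, assuming the inputs are lists of single-digit strings as in the example call:

```python
def list_addition(list1, list2):
    # Join each list of digit characters into one numeric string.
    number1 = int(''.join(list1))
    number2 = int(''.join(list2))  # built from list2, not from an empty string
    # Convert the sum back into a list of digit characters.
    return list(str(number1 + number2))


print(list_addition(['3', '6', '7'], ['4', '9', '0']))
print(list_addition(['9', '1', '2'], ['8', '5', '3']))
```

If integer digits are wanted in the result, wrapping each character in int() in the final step would do it.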
What's the proper way to make use of Python 3's html.parser's getpos() method?
I used the following example to explore a subset of html.parser methods:
https://docs.python.org/3/library/html.parser.html#examples
My copy-and-pasted demo program works. But now I want to use the html.parser's getpos() method to acquire a tag's line number and offset.
After numerous experiments, including trying to add a separate def getpos() method to the class given in the example (nothing at all was output), the only way I've been able to make getpos() return its line number and offset tuple is by inserting one line of (what seems to me to be) clumsy and ugly code per line 4 of the following snippet:
from html.parser import HTMLParser
...
class FlareTopicParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # Following line inserted by me into class's examples.
        print(" Line, offset ==", HTMLParser.getpos(self))
        # This working code from examples per
        # https://docs.python.org/3/library/html.parser.html#examples
        print(" Start tag:", tag)
        for attr in attrs:
            print(" attr:", attr)
That works -- to give but one example, for the zero-indented start tag on line 5 of the HTML input file it prints:
Line, offset == (5, 0)
But the HTMLParser.getpos(self) construction in line 4 of the example code seems (to this only-occasional Python 3 coder) clumsy and wrong.
What's the correct, or if you will better, way to use getpos()?
No need to override getpos in your parser; I suggest rewriting line 4 as follows:
(line, column) = self.getpos()
print("line %d column %d" % (line, column))
With such a call to getpos() you can also use line or column independently.
Here's the way to use getpos():
row, col = parser.getpos()
html.splitlines()[row-1][col:col+100]
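Putting the pieces together, here is a self-contained sketch of calling getpos() as an ordinary method on self inside a handler. The class name PositionParser, the positions attribute and the sample markup are invented for illustration.

```python
from html.parser import HTMLParser


class PositionParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.positions = []  # (tag, line, offset) for every start tag seen

    def handle_starttag(self, tag, attrs):
        # getpos() is inherited, so it can be called directly on self.
        line, offset = self.getpos()
        self.positions.append((tag, line, offset))


# Hypothetical input; line numbers are 1-based, offsets are 0-based.
sample_html = "<html>\n<body>\n  <p class='x'>hi</p>\n</body>\n</html>\n"

parser = PositionParser()
parser.feed(sample_html)
parser.close()

for tag, line, offset in parser.positions:
    print(tag, line, offset)
```

Collecting the positions on the instance, rather than printing inside the handler, keeps the line and offset available for later lookups into the source text.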
I have a very large file (~40GB, 674,877,098 lines) that I want to read and extract specific columns from. I can get about 3GB of data transferred, then I get the following error.
Traceback (most recent call last):
  File "C:\Users\Codes\Read_cat_write.py", line 44, in <module>
    tid = int(columns[2])
IndexError: list index out of range
Sample of data that is being read in.
1,100000000,100000000,39,2.704006988169216e15,310057,0
2,100000001,100000000,38,2.650346740514816e15,303904,0.01
3,100000002,100000000,37,2.136985003098112e15,245039,0.03
4,100000003,100000000,36,2.29479163101184e15,263134,0.05
5,100000004,100000000,35,1.834645477916672e15,210371,0.06
6,100000005,100000000,34,1.814063860416512e15,208011,0.08
7,100000006,100000000,33,1.808883592986624e15,207417,0.1
8,100000007,100000000,32,1.806241248575488e15,207114,0.12
9,100000008,100000000,31,1.651783621410816e15,189403,0.14
10,100000009,100000000,30,1.634821184946176e15,187458,0.16
Code
from itertools import islice
F = r'C:\Users\Outfiles\comp_cat_raw.txt'
w = open(r'C:\Users\Outfiles\comp_cat_3col.txt','a')
def filesave(TID,M,R):
    X = str(TID)
    Y = str(M)
    Z = str(R)
    w.write(X)
    w.write('\t')
    w.write(Y)
    w.write('\t')
    w.write(Z)
    w.write('\n')
N = 680000000
f = open(F) #Opens file
f.readline() # Strips Header
nlines = islice(f, N) #slices file to only read N lines
for line in nlines:
    if line !='':
        line = line.strip()
        line = line.replace(',',' ') # Replace comma with space
        columns = line.split() # Splits into column
        tid = int(columns[2])
        m = float(columns[4])
        r = float(columns[6])
        filesave(tid,m,r)
w.close()
I have looked at the file being read in at the point where the error occurs, but I don't see anything wrong with the file so I am at a loss as to the cause of this error.
Chances are there is some line with only a single comma in it, or none, or an empty line. Put a try-except statement around the parsing code, catch the IndexError and print the offending line, and you should be able to see what is wrong. Besides that, there are some things in your code that might be worth improving.
Have a look at the csv module especially. It has some optimized C-code exactly for what you want to do, so it should be much faster. This answer shows mainly how to write the iteration with csv.
This whole slice construction seems to be superfluous. A simple for line in f: will do and is the most efficient way to handle this iteration.
Use line.split(',') directly, instead of replacing them first with spaces.
Use with open(F) as f: instead of calling close yourself. For this script it might make no difference, but this way you make sure that you don't leave file handles open in case of errors.
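The suggestions above can be sketched roughly as follows. extract_columns, the header text and the in-memory sample are hypothetical; a real run would pass file objects opened with `with open(...)` instead of StringIO buffers.

```python
import csv
import io


def extract_columns(src, dst, indices=(2, 4, 6)):
    """Copy the selected columns from comma-separated src to tab-separated dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter='\t', lineterminator='\n')
    next(reader, None)  # skip the header row
    bad_rows = []
    for row in reader:
        try:
            tid, m, r = [row[i] for i in indices]
            writer.writerow([int(tid), float(m), float(r)])
        except (IndexError, ValueError):
            # Record the line number and content instead of crashing.
            bad_rows.append((reader.line_num, row))
    return bad_rows


# Hypothetical input: a header, one good row from the question, one short row.
sample = io.StringIO(
    "a,b,c,d,e,f,g\n"
    "1,100000000,100000000,39,2.704006988169216e15,310057,0\n"
    "2,100000001\n"
)
output = io.StringIO()
bad = extract_columns(sample, output)
print(output.getvalue(), end='')
print("bad rows:", bad)
```

Returning the bad rows rather than printing them leaves it to the caller to decide whether a malformed line should stop the run or just be logged.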
I have tried to understand this by looking in previous threads but I still don't understand why I get this error for only one of two variables in the following piece of code (the code sucks I know):
alfaphet=('abcdefghijklmnopqrstuvxyz')
cryptalfaphet=('defghjiklmnopqrstuvxyzabc')
spaceNumber=[]
textCopy=[]
def crypt():
    textCopy=[]
    print('print the text that you want to encrypt:')
    text=input()
    for i in range(len(text)):
        for j in range(len(alfaphet)):
            if text[i]==alfaphet[j]:
                textCopy.append(cryptalfaphet[j])
        if text[i]==' ':
            spaceNumber.append(i)
    for i in range(len(spaceNumber)):
        for j in range(len(text)):
            if list(range(len(text)))[j]==int(spaceNumber[i]):
                textCopy.insert(j, ' ')
    textCopy=''.join(textCopy)
    print(textCopy)
crypt()
This code works fine, but if I remove the
textCopy=[]
assignment from the beginning of the def-block, I get an error like this:
Traceback (most recent call last):
  File "C:/Python33/dekrypt.py", line 26, in <module>
    crypt()
  File "C:/Python33/dekrypt.py", line 13, in crypt
    textCopy.append(cryptalfaphet[j])
UnboundLocalError: local variable 'textCopy' referenced before assignment
My question is why this doesn't happen with the spaceNumber variable. As far as I can see, spaceNumber is also referenced before assignment in the
spaceNumber.append(i)
call? It is defined before the def-block, but so was the textCopy variable, right? What is the difference? They're both empty lists from the beginning and I use the .append() method on both, but Python seems to treat them differently!?
You can avoid this error by adding the following line at the beginning of your function:
def crypt():
    global textCopy
    ...
However, this isn't a Python best practice. See this post for further details.
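As for why the two variables behave differently: a name that is assigned anywhere inside a function body is treated as local for the whole function, whereas a name that is only read or mutated (for example with .append()) is looked up in the enclosing scope. In the question, textCopy=''.join(textCopy) is such an assignment, while spaceNumber is never assigned inside crypt(). A small sketch with made-up names that reproduces both cases:

```python
counter_list = []  # only mutated inside works(), never assigned there
text_copy = []     # assigned inside broken(), so it is local in that function


def works():
    # No assignment to counter_list here, so the global list is used.
    counter_list.append(1)


def broken():
    text_copy.append('a')            # raises UnboundLocalError
    text_copy = ''.join(text_copy)   # this assignment makes text_copy local


def fixed():
    # Keep the list and the joined result under separate local names.
    chars = ['a', 'b']
    joined = ''.join(chars)
    return joined


works()
try:
    broken()
    error = None
except UnboundLocalError as exc:
    error = exc

print(counter_list, repr(error), fixed())
```

Using distinct local names, as in fixed(), avoids the need for global altogether.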