I'm learning Python 3 and I have a question.
I have a text file 'test.txt' and its contents are:
1, 2
3, 4
5, 7
#######
10, 11
19, 20
The number on the left side is an x coordinate and the one on the right side is a y coordinate.
I want to get the distance along the three points (six numbers) above '#######', and the distance between the two points (four numbers) below it.
The distance formula is
Distance = SQRT((y2-y1)^2 + (x2-x1)^2).
My problem is that I want to extract six numbers, and calculate the total distance from three pairs. For example,
total_distance = SQRT((4-2)^2 + (3-1)^2) + SQRT((7-4)^2 + (5-3)^2).
After that, I want to get (10, 11) and (19, 20). What confuses me is how to skip '#######' while extracting the numbers as x and y coordinates.
I began to write code like this:
with open("text.txt") as filestream:
    for line in filestream:
        currentline = line.split(",")
I'm trying to figure out how to solve this issue. Can you help me out or give some advice on what I should do?
Try using regex:
import re
with open("text.txt") as f:
    # keep only lines that look like "x, y"; the raw string avoids an invalid-escape warning
    arr = [[int(l.split(',')[0]), int(l.split(',')[1])] for l in f.readlines() if re.match(r'\d+, \d+', l)]
And if you need the parts separated:
import re
arr = []
with open("text.txt") as f:
    # split the file contents at each run of '#' characters
    parts = re.compile('#+').split(f.read())
    for part in parts:
        arr.append([[int(l.split(',')[0]), int(l.split(',')[1])] for l in part.split('\n') if re.match(r'\d+, \d+', l)])
All of that assumes, of course, that you are certain the file exists and has that particular format.
PS: I can't figure out what you are trying to do with those numbers.
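For the distance part of the question, a minimal sketch (not taken from the answer above) might look like the following. The helper names read_paths and path_length are made up for illustration, and the sketch assumes every data line has exactly two comma-separated numbers and that any line starting with '#' separates two groups of points.

```python
import math

def read_paths(lines, separator="#"):
    """Group "x, y" lines into paths; a separator line starts a new path."""
    paths = [[]]
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(separator):
            paths.append([])              # '#######' begins the next group
            continue
        x, y = (float(part) for part in line.split(","))
        paths[-1].append((x, y))
    return [p for p in paths if p]        # drop groups with no points

def path_length(points):
    """Sum of straight-line distances between consecutive points."""
    return sum(math.hypot(q[0] - p[0], q[1] - p[1])
               for p, q in zip(points, points[1:]))

sample = """1, 2
3, 4
5, 7
#######
10, 11
19, 20"""

paths = read_paths(sample.splitlines())
lengths = [path_length(p) for p in paths]
print(lengths)
```

To read from a file instead of a string, the open file object can be passed directly, since read_paths only iterates over lines.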
I am attempting to translate a MATLAB function to Python from Timothy Sauer,
Numerical Analysis Second Edition, page 546, Program 12.8. The original function
receives a square matrix and returns a matrix with the same eigenvalues but in
upper Hessenberg form. The original function creates Householder reflectors to produce zeros
below the subdiagonal of the matrix and performs similarity transformations on the original matrix to
get it to upper Hessenberg form.
My Python translation succeeds only in obtaining the eigenvalues for 3x3 matrices
but not for 4x4 matrices. Would anyone know the cause of the error? I pasted my code with success and failing cases below. Thank you.
import numpy as np
import math
norm = lambda v:math.sqrt(np.sum(v**2))
def upper_hessenberg(A):
    '''
    Translated from Timothy Sauer, Numerical Analysis Second Edition, page 546, Program 12.8
    Input: Square Matrix, A
    Output: B, a Similar Matrix with Same Eigenvalues as A except in Upper Hessenberg form
            V, a matrix containing the reflectors used to produce zeros in the off diagonals
    '''
    rows, columns = A.shape
    B = A[:,:].astype(float) #will store the similar matrix (np.float was removed in NumPy 1.24)
    V = np.zeros(shape=(rows,columns),dtype=float) #will store the reflectors
    for column in range(columns-2): #start from the 1st column end at the third to last column
        row = column
        x = B[row+1: ,column] #decapitate the column
        reflection_of_x = np.zeros(len(x)) #first entry is the norm, followed by 0s
        if abs(norm(x)) <= np.finfo(float).eps: #if there are already 0s in the offdiagonals skip this column
            continue
        reflection_of_x[0] = norm(x)
        v = reflection_of_x - x # v, (the difference vector) represents the line connecting the original column to the reflection of the column (see Timothy Sauer Num Analysis 2nd Edition Figure 4.11 Householder reflector)
        v = v/norm(v) #normalize to length of 1 (unit vector)
        V[:len(v), column] = v #save the reflector in an upper triangular matrix called V
        #verify with x-2*(x @ v * v) should equal a vector with all zeros except the leading entry
        column_projections = np.outer(v , v @ B[row+1:, column:]) #project each col onto difference vector
        B[row+1:, column:] = B[row+1:, column:] - (2 * column_projections)
        row_projections = np.outer(v, B[row:, column + 1:] @ v).T #project each row onto difference vector
        B[row:, column + 1:] = B[row:, column + 1:] - (2 * row_projections)
    return V, B
# Algorithm succeeds only with 3x3 matrices
eigvectors = np.array([
[1,3,2],
[4,5,6],
[7,8,9],
])
eigvalues = np.array([
[4,0,0],
[0,3,0],
[0,0,2]
])
M = eigvectors @ eigvalues @ np.linalg.inv(eigvectors)
print("The expected eigvals :", np.linalg.eigvals(M))
V,B = upper_hessenberg(M)
print("For 3x3 matrices, The function successfully produces these eigvals",np.linalg.eigvals(B))
#But with 4x4 matrices it fails
eigvectors = np.array([
[1,3,2,4],
[4,5,6,2],
[7,8,9,5],
[5,2,7,8]
])
eigvalues = np.array([
[4,0,0,0],
[0,3,0,0],
[0,0,2,0],
[0,0,0,1]
])
M = eigvectors @ eigvalues @ np.linalg.inv(eigvectors)
print("The expected eigvals :", np.linalg.eigvals(M))
V,B = upper_hessenberg(M)
print("For 4x4 matrices, The function fails to obtain correct eigvals",np.linalg.eigvals(B))
Your error is that you try to be too efficient. While the last rows are indeed increasingly reduced with leading zeros, this is not the case for the last columns. So in row_projections, and in the assignment that follows it, you need to drop the row: limit and use B[:, column + 1:], so that the right-hand multiplication acts on all rows.
You are using the unstable variant of the "improved" Householder reflector. The older version would use the larger of x_refl - x and x_refl + x by setting reflection_of_x[0] = -np.sign(x[0])*norm(x) (or remove all minus signs there).
The stable variant of the improved reflector would use the binomial trick in the normalization of x_refl - x if this difference becomes too small.
x_refl - x = [ norm(x) - x[0], -x[1:] ]
           = [ norm(x[1:])^2/(norm(x) + x[0]), -x[1:] ]

(x_refl - x)/norm(x_refl - x)
           = [ norm(x[1:]), -(norm(x) + x[0])*(x[1:]/norm(x[1:])) ] / sqrt(2*norm(x)*(norm(x) + x[0]))
While the parts may have wildly different scales, no catastrophic cancellation happens for x[0]>0.
See the discussion of the same algorithm from Golub/van Loan, 4th ed., for further details and opinions, and the code from that book.
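Putting the two points together, a corrected reduction might look like the sketch below. It is only an illustration, not Sauer's program or the Golub/van Loan code: it applies the right-hand multiplication to all rows and uses the sign choice described above as the "older version" (the reflector is built from x + sign(x[0])*norm(x)*e1, so no cancellation occurs). It returns only the Hessenberg matrix and does not store the reflectors.

```python
import numpy as np

def upper_hessenberg(A):
    """Reduce a square matrix to upper Hessenberg form with Householder
    similarity transformations (sketch; reflectors are not stored)."""
    B = np.array(A, dtype=float)          # work on a float copy
    n = B.shape[0]
    for k in range(n - 2):
        x = B[k + 1:, k]                  # part of column k below the diagonal
        norm_x = np.linalg.norm(x)
        if norm_x <= np.finfo(float).eps:
            continue                      # nothing to eliminate in this column
        sign = 1.0 if x[0] >= 0 else -1.0
        v = x.copy()
        v[0] += sign * norm_x             # same sign as x[0]: no cancellation
        v /= np.linalg.norm(v)
        # Left multiplication by H = I - 2 v v^T: rows k+1.., columns k..
        B[k + 1:, k:] -= 2.0 * np.outer(v, v @ B[k + 1:, k:])
        # Right multiplication by H: ALL rows, columns k+1..
        B[:, k + 1:] -= 2.0 * np.outer(B[:, k + 1:] @ v, v)
    return B

# The 4x4 test case from the question
eigvectors = np.array([[1, 3, 2, 4],
                       [4, 5, 6, 2],
                       [7, 8, 9, 5],
                       [5, 2, 7, 8]], dtype=float)
M = eigvectors @ np.diag([4.0, 3.0, 2.0, 1.0]) @ np.linalg.inv(eigvectors)
H = upper_hessenberg(M)
print(np.sort(np.linalg.eigvals(H).real))
```

The entries of H below the first subdiagonal come out as rounding-level values rather than exact zeros, because they are computed rather than assigned.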
The program I have here is simulating the velocity of a falling object.
The velocity is calculated by subtracting the y positions at time_1 and time_2 and dividing by the time step.
The problem that I have is that the dimensions of array v and array t don't match. Instead of shortening array t, I would like to add a 0 at the beginning of the v array, so that the graph will show v = 0 at t = 0. Yes, I know it is a small interval and that it does not really matter, but I want to know for educational purposes.
I'm wondering if I can write the line v = (y[1:] - y[:-1])/0.1 in a form where I keep the dimension.
Ideally, the array y[:-1] would be subtracted from the array y, aligned at the end of y, so that the result is an array of dimension 101 with 0 as its first value.
I would like to know your thoughts about this.
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0,10,101)
g = 9.80665
y = 0.5*g*t*t
v = (y[1:] - y[:-1])/0.1
plt.plot(t,v)
plt.show()
Is there a function with which I can add a certain value to the beginning of an array? np.append will add it to the end.
Maybe you could just pre-define the length of the result at the beginning and then fill up the values:
import numpy as np
dt = .1
g = 9.80665
t_end = 10
t = np.arange(0,t_end+dt,dt)
y = 0.5*g*t*t
v = np.zeros(t.shape[0])
v[1:] = (y[1:] - y[:-1])/dt
If you are simply looking for a function that inserts at a given index, it is this one:
np.insert([1,2,3,4,5,6], 2, 100)
>> array([ 1, 2, 100, 3, 4, 5, 6])
Another possible solution would be to use np.append but reverse the order of the arguments:
import numpy as np
v = np.random.rand(10)
value = 42 # value to append at the beginning of v
value_arr = np.array([value]) # dimensions should be adjusted for multidimensional arrays
v = np.append(arr = value_arr, values = v, axis=0)
Variants of the same idea are possible using np.concatenate or np.hstack.
Regarding your second question in the comments, one solution may be:
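Applied to the falling-object arrays from the question, the concatenate variant might look like the sketch below. It also shows np.diff with its prepend argument (available in NumPy 1.16 and later), which is not mentioned above but gives the same result in one call. The names v_diff and v_cat are made up for illustration.

```python
import numpy as np

dt = 0.1
g = 9.80665
t = np.linspace(0, 10, 101)
y = 0.5 * g * t * t

# Option 1: prepend y[0] before differencing, so the first difference is 0
v_diff = np.diff(y, prepend=y[0]) / dt

# Option 2: compute the 100 differences, then put a 0 in front
v_cat = np.concatenate(([0.0], (y[1:] - y[:-1]) / dt))

print(v_diff.shape, v_cat.shape)
```

Both arrays have the same length as t, so plt.plot(t, v_diff) no longer raises a dimension mismatch.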
t = np.arange(6)
condlist = [t <= 2, t >= 4]
choicelist = [1, 1]
t = np.select(condlist, choicelist, default=t)
I am running this program to find the distribution of characters in a specific text.
# this is a paragraph from python documentation :)
mytext = 'When a letter is first k encountered, it is missing from the mapping, so the default_factory function calls int() to supply a default count of zero. The increment operation then builds up the count for each letter.The function int() which always returns zero is just a special case of constant functions. A faster and more flexible way to create constant functions is to use a lambda function which can supply any constant value (not just zero):'
d = dict()
ignorelist = ('(',')',' ', ',', '.', ':', '_')
for n in mytext:
    if(n not in ignorelist):
        n = n.lower()
        if n in d.keys():
            d[n] = d[n] + 1
        else:
            d[n] = 1
xx = list(d.keys())
yy = list(d.values())
import matplotlib.pyplot as plt
plt.scatter(xx,yy, marker = '*')
plt.show()
Both lists have 25 elements. For some strange reason the plot comes out like this: the x axis ends at 'J'.
If I zoom, the right side becomes visible, but no points are plotted there.
Note that this will be fixed as of matplotlib version 2.2
It seems you have found a bug in the new categorical feature of matplotlib 2.1. For single letter categories it will apparently limit its functionality to 10 items. If categories consist of more letters, this does not happen.
In any case, a solution is to plot numerical values (just as one would have needed to do prior to matplotlib 2.1 anyways). Then set the ticklabels to the categories.
# this is a paragraph from python documentation :)
mytext = 'When a letter is first k encountered, it is missing from the mapping, so the default_factory function calls int() to supply a default count of zero. The increment operation then builds up the count for each letter.The function int() which always returns zero is just a special case of constant functions. A faster and more flexible way to create constant functions is to use a lambda function which can supply any constant value (not just zero):'
d = dict()
ignorelist = ('(',')',' ', ',', '.', ':', '_')
for n in mytext:
    if(n not in ignorelist):
        n = n.lower()
        if n in d.keys():
            d[n] = d[n] + 1
        else:
            d[n] = 1
xx,yy = zip(*d.items())
import numpy as np
import matplotlib.pyplot as plt
xx_sorted, order = np.unique(xx, return_inverse=True)
plt.scatter(order,yy, marker="o")
plt.xticks(range(len(xx)), xx_sorted)
plt.show()
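As a side note, the counting loop could also be written with collections.Counter from the standard library. This is only a sketch: the function name letter_counts is made up for illustration, and the ignore string mirrors the ignorelist above. The plotting itself is left out; positions and heights are the numerical values one would pass to plt.scatter, with letters passed to plt.xticks as tick labels.

```python
from collections import Counter

def letter_counts(text, ignore="() ,.:_"):
    """Count characters case-insensitively, skipping those in `ignore`."""
    return Counter(ch.lower() for ch in text if ch not in ignore)

counts = letter_counts("The function int() returns zero.")

letters = sorted(counts)                     # category labels in alphabetical order
positions = list(range(len(letters)))        # numerical x values for the plot
heights = [counts[ch] for ch in letters]     # y values in the same order
print(list(zip(letters, heights)))
```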
I am trying to create a program which creates a Fibonacci sequence up to the value of the sequence being 200. I have the basic set up down where I can compute the sequence but I wish to display it in a certain way and I have forgotten how to achieve this.
I wish to write the numbers to an array which I have defined as empty initially, compute the numbers and assign them to the array and print said array. In my code below the computation is ok but when printed to screen, the array shows the value 233 which is above 200 and not what I'm looking for. I wish to print all the values under 200 which I've stored in an array.
Is there a better way to initially define the array for what I want and what is the correct way to print the array at the end with all elements below 200?
Code follows:
#This program calculates the fibonacci sequence up to the value of 200
import numpy as np
x = np.empty(14, float) #Ideally creates an empty array to deposit the fibonacci numbers in
f = 0.0 #Dummy variable to be edited in the while loop
#Here the first two values of the sequence are defined alongside a counter starting at i = 1
x[0] = 0.0
x[1] = 1.0
i = 1
#While loop which computes the values and writes them to the array x
while f <= 200:
    f = x[i]+x[i-1] #calculates the sequence element
    i += 1 #Increases the iteration counter by 1 for each loop
    x[i] = f #set the array element equal to the calculated sequence number
print(x)
For reference here is a quick terminal output, Ideally I wish to remove the last element:
[ 0. 1. 1. 2. 3. 5. 8. 13. 21. 34. 55. 89.
144. 233.]
There are a number of stylistic points here. Firstly, you should probably use integers, rather than floats. Secondly, you should simply append each number to a list, rather than pre-define an array of a particular size.
Here's an interactive session:
>>> a=[0,1]
>>> while True:
        b=a[-1]+a[-2]
        if b<=200:
            a.append(b)
        else:
            break
>>> a
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
Here is a way without using indices:
a = 0
x = [a]
b = 1
while b <= 200:
    x.append(b)
    a, b = b, a+b
print(x)
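The 233 in the question's output appears because the while condition tests the previous value of f; the next value is then computed and stored before the condition is checked again. A sketch that tests each candidate before storing it, wrapped in a function (the name fib_up_to is made up for illustration), might look like this:

```python
def fib_up_to(limit):
    """Return the Fibonacci numbers (starting 0, 1) that do not exceed `limit`."""
    values = []
    a, b = 0, 1
    while a <= limit:          # test the candidate before storing it
        values.append(a)
        a, b = b, a + b
    return values

print(fib_up_to(200))
```

If a NumPy array is needed afterwards, the returned list can be passed to np.array.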
I have a file like
2.0 4 3
0.5 5 4
-0.5 6 1
-2.0 7 7
.......
The actual file is pretty big.
I want to read it and add a couple of columns: the first added column is column(4) = column(2) * column(3), and the second is column(5) = column(2)/column(1) + column(4), so the result should be
2.0 4 3 12 14
0.5 5 4 20 30
-0.5 6 1 6 -6
-2.0 7 7 49 45.5
.....
which I want to write in a different file.
with open('test3.txt', encoding ='latin1') as rf:
    with open('test4.txt', 'w') as wf:
        for line in rf:
            float_list= [float(i) for i in line.split()]
            print(float_list)
But so far I just have this. I am able to create the list, but I am not sure how to perform arithmetic on it and create the new columns. I think I am completely off here. I am just a beginner in Python. Any help will be greatly appreciated. Thanks!
I would reuse your formulae, but shifting indexes since they start at 0 in python.
I would extend the read column list of floats with the new computations, and write back the line, space separated (converting back to str in a list comprehension)
So, the inner part of the loop can be written as follows:
with open('test3.txt', encoding ='latin1') as rf:
    with open('test4.txt', 'w') as wf:
        for line in rf:
            column= [float(i) for i in line.split()] # your code
            column.append(column[1] * column[2]) # add column
            column.append(column[1]/column[0] + column[3]) # add another column
            wf.write(" ".join([str(x) for x in column])+"\n") # write joined strings, separated by spaces
Something like this - see comments in code
with open('test3.txt', encoding ='latin1') as rf:
    with open('test4.txt', 'w') as wf:
        for line in rf:
            float_list = [float(i) for i in line.split()]
            # calculate two new columns
            float_list.append(float_list[1] * float_list[2])
            float_list.append(float_list[1]/float_list[0] + float_list[3])
            # convert all values to text
            text_list = [str(i) for i in float_list]
            # concatenate all elements and write line
            wf.write(' '.join(text_list) + '\n')
Try the following:
map() is used to convert each element of the list to float (wrapped in list() because in Python 3 map() returns an iterator, which has no append method); at the end it is used again to convert each float to str so we can join them.
with open('out.txt', 'w') as out:
    with open('input.txt', 'r') as f:
        for line in f:
            my_list = list(map(float, line.split()))
            my_list.append(my_list[1]*my_list[2])
            my_list.append(my_list[1] / my_list[0] + my_list[3])
            my_list = map(str, my_list)
            out.write(' '.join(my_list) + '\n')
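The answers above write every value back as a float, so 4 becomes 4.0 and 12 becomes 12.0. If the output should look like the expected result in the question, one option is to keep the original text of each line and append only the two new values, formatted with the 'g' format specifier. This is a sketch under that assumption; the function name add_columns is made up for illustration, and 'g' rounds to six significant digits by default, which may or may not be acceptable for the real data.

```python
def add_columns(line):
    """Append col4 = col2 * col3 and col5 = col2 / col1 + col4
    to a whitespace-separated row, keeping the original text."""
    c1, c2, c3 = (float(field) for field in line.split()[:3])
    c4 = c2 * c3
    c5 = c2 / c1 + c4          # raises ZeroDivisionError if column 1 is 0
    return f"{line.strip()} {c4:g} {c5:g}"

rows = ["2.0 4 3", "0.5 5 4", "-0.5 6 1", "-2.0 7 7"]
for row in rows:
    print(add_columns(row))
```

Inside the file loop from the answers, the write call would then become wf.write(add_columns(line) + '\n').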