Parse txt-file, from string to int/float in Python3.x - python-3.x

Currently, I need to parse the string saved in a rot.txt file:
3 3
-0.0963063 0.994044 -0.0510079
-0.573321 -0.0135081 0.81922
0.813651 0.10814 0.571207
The first row denotes the dim of a rotation matrix, and the rest is a rotation matrix.
I tried the following lines:
import numpy as np
rotation = np.genfromtxt(f_name, delimiter=" ")
Obviously, the first row should be skipped. How can I fix this? THX in advance.
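One way to skip the dimension row is the `skiprows` argument of `np.loadtxt` (or, equivalently, `skip_header=1` with `np.genfromtxt`). The sketch below uses an in-memory stand-in for rot.txt so it runs on its own; with a real file, pass the file name instead of `sample`.

```python
import io
import numpy as np

# Stand-in for rot.txt so the sketch is self-contained.
sample = io.StringIO(
    "3 3\n"
    "-0.0963063 0.994044 -0.0510079\n"
    "-0.573321 -0.0135081 0.81922\n"
    "0.813651 0.10814 0.571207\n"
)

# skiprows=1 drops the "3 3" dimension line before the numbers are parsed.
rotation = np.loadtxt(sample, skiprows=1)
print(rotation.shape)  # (3, 3)
```

Whitespace is the default delimiter for `np.loadtxt`, so no `delimiter` argument is needed here.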

Related

Create a square matrix in python

I want to get a number from the user and create a square matrix of size number*number, but I can't get it to work. Could you please help me with it? It should look like this:
https://i.stack.imgur.com/ZxtCl.png
import numpy as np
input = 4
matrix = np.array(range(0, input**2 - 1)).reshape((input, input))
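The snippet above fails because `range(0, input**2 - 1)` yields only `input**2 - 1` values, one too few to fill the matrix. A minimal sketch of a fix follows, assuming the desired matrix (the linked image is not reproduced here) holds consecutive integers starting at 0; the function name `square_matrix` is made up for illustration.

```python
import numpy as np


def square_matrix(n):
    """Return an n-by-n array holding 0 .. n*n - 1, filled row by row."""
    # np.arange(n * n) produces exactly n*n values, so the reshape succeeds.
    return np.arange(n * n).reshape(n, n)


matrix = square_matrix(4)
print(matrix)
```

To take the size from the user, something like `n = int(input("Size: "))` would work, provided the built-in `input` has not been overwritten by a variable of the same name as in the snippet above.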

Bokeh ValueError: expected an element of either Seq(String)

I'm trying to build a simple bar chart via bokeh but struggling for it to recognize the x-axis and keep getting a ValueError... I think it needs to be in string format but for some reason whatever I try it just won't work. Please note, the column that contains the Years (as floats by the looks of it) is called RegionName, if it seems confusing. Please see my code below, any suggestions?
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool
import os
from bokeh.palettes import Spectral5
from bokeh.transform import factor_cmap
os.chdir("C:/Users/Vladimir.Tikhnenko/Python/Land Reg")
# Pivot data
def pivot2(infile="Land Registry.csv", outfile="SalesVolume.csv"):
    df=pd.read_csv(infile)
    table=pd.pivot_table(df,index=["RegionName"],columns="Year",values="SalesVolume",aggfunc=sum)
    table.to_csv(outfile)
    return table
pivot2()
# Transpose data
df=pd.read_csv("SalesVolume.csv")
df=df.drop(df.columns[1:28],1)
df=pd.read_csv("SalesVolume.csv", index_col=0, header=None).T
df.to_csv("C:\\Users\Vladimir.Tikhnenko\Python\Land Reg\SalesVolume.csv",index=None)
df=pd.read_csv("SalesVolume.csv")
source = ColumnDataSource(df)
years = source.data['RegionName'].tolist()
p = figure(x_range=['RegionName'])
color_map = factor_cmap(field_name='RegionName',palette=Spectral5,
factors=years)
p.vbar(x='RegionName', top='Southwark', source=source, width=1,
color=color_map)
p.title.text ='Transactions'
p.xaxis.axis_label = 'Years'
p.yaxis.axis_label = 'Number of Sales'
show(p)
The error message is:
ValueError: expected an element of either Seq(String), Seq(Tuple(String,
String)) or Seq(Tuple(String, String, String)), got [1968.0, 1969.0, 1970.0,
1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977.0, 1978.0, 1979.0,
1980.0, 1981.0, 1982.0, 1983.0, 1984.0, 1985.0, 1986.0, 1987.0, 1988.0,
1989.0, 1990.0, 1991.0, 1992.0, 1993.0, 1994.0, 1995.0, 1996.0, 1997.0,
1998.0, 1999.0, 2000.0, 2001.0, 2002.0, 2003.0, 2004.0, 2005.0, 2006.0,
2007.0, 2008.0, 2009.0, 2010.0, 2011.0, 2012.0, 2013.0, 2014.0, 2015.0,
2016.0, 2017.0, 2018.0]
Categorical factors must only be strings (or sequences of strings for nested factors), so factor_cmap only accepts lists of those things. You passed it a list of numbers, which causes the error shown. To use the years as categorical factors, you need to convert them to strings as suggested, and use those string values both to initialize x_range and as the coordinates passed to vbar.
Alternatively, if you want to use numerical values for the years, but just want to have fixed, controlled tick locations, do this:
p = figure() # don't pass x_range
p.xaxis.ticker = years
Then also use linear_cmap (instead of factor_cmap) to map the numerical values.
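The categorical option described above (string factors) could look roughly like the sketch below. The helper names `year_factors` and `build_bar_chart`, and the column names `year` and `sales`, are made up for illustration and are not taken from the question's data.

```python
def year_factors(years):
    """Turn float years such as 1968.0 into string factors such as '1968'."""
    return [str(int(y)) for y in years]


def build_bar_chart(years, sales):
    """Build a bar chart with one categorical bar per year (hypothetical helper)."""
    # Imported here so the conversion helper above works without Bokeh.
    from bokeh.models import ColumnDataSource
    from bokeh.palettes import Spectral5
    from bokeh.plotting import figure
    from bokeh.transform import factor_cmap

    factors = year_factors(years)
    # The same strings go into the data source and into x_range.
    source = ColumnDataSource(data={"year": factors, "sales": list(sales)})
    p = figure(x_range=factors, title="Transactions")
    p.vbar(x="year", top="sales", source=source, width=0.9,
           color=factor_cmap("year", palette=Spectral5, factors=factors))
    return p


factors = year_factors([1968.0, 1969.0, 1970.0])
print(factors)  # ['1968', '1969', '1970']
```

Passing the returned figure to `show` would display it. The key point is that the values used for `x_range`, for `factors`, and for the `x` column are all the same strings.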

Converting an array of strings containing range of integer values into an array of floats

Click here to see an image that contains a screenshot sample of the data.
I have a CSV file with a column for temperature range with values like "20-25" stored as string. I need to convert this to 22.5 as a float.
I need this done for the entire column of such values, not a single value. I want to know how this can be done in Python, as I am very new to it.
Notice in the sample data image that there are NaN values as well in the records
As suggested in the comments, split each string using "-" as the separator.
Then create a float array from the pieces. Finally, take the average using numpy.
import numpy as np
temp_input = ["20-25", "36-40", "10-11", "23-24"]
# split and convert to float
# [t.split("-") for t in temp_input] is a list comprehension
tmp = np.array([t.split("-") for t in temp_input], dtype=np.float32)
# average the tmp array
temp_output = np.average(tmp, axis=1)
And here's a one-liner:
temp_output = [np.average(np.array(t.split('-'), dtype=np.float32)) for t in temp_input]
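The snippets above assume every entry is a "low-high" string, whereas the question mentions NaN records as well. A minimal sketch that passes missing entries through as NaN is shown below; the helper name `range_midpoint` is made up for illustration, and the approach assumes non-negative temperatures, since a leading minus sign would also be split on "-".

```python
import math


def range_midpoint(value):
    """Return the mean of a "low-high" string, or NaN for missing entries."""
    if not isinstance(value, str):
        # Missing cells typically arrive as float NaN rather than as strings.
        return math.nan
    parts = [float(p) for p in value.split("-")]
    # Averaging the parts also covers a single value such as "20".
    return sum(parts) / len(parts)


temps = ["20-25", float("nan"), "10-11"]
midpoints = [range_midpoint(t) for t in temps]
print(midpoints)
```

With pandas, the same helper could be applied to a whole column via `df["temperature"].map(range_midpoint)`, where the column name `temperature` is an assumption.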

python, loading a string from file

I'm trying to load a .txt file into my python project using numpy:
import numpy as np
import sys
g = np.loadtxt(sys.argv[1])
This command worked for me when the .txt file was a 0/1 matrix, but it is not
working now that the file is a string matrix (a 4*7 table of words like "crew").
The error says "can't convert string to float". Any help?
Take a look at the dtype parameter.
dtype : data-type, optional
Data-type of the resulting array; default: float. If this is a structured data-type, the resulting array will be 1-dimensional, and each row will be interpreted as an element of the array. In this case, the number of columns used must match the number of fields in the data-type.
The default is float, which results in the error you are pointing out in your question.
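As a short sketch of that suggestion, passing `dtype=str` to `np.loadtxt` keeps each cell as text. The in-memory sample below stands in for the real file, which would normally be given as `sys.argv[1]`.

```python
import io
import numpy as np

# Stand-in for the words file so the sketch is self-contained.
sample = io.StringIO("crew ship mast\nboat deck sail\n")

# dtype=str makes loadtxt keep the cells as strings instead of parsing floats.
words = np.loadtxt(sample, dtype=str)
print(words.shape)  # (2, 3)
```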
One option is using pandas:
import numpy as np
import pandas as pd
arr = pd.read_table(filename, sep=" ", header=None).values
(This assumes the separator is a single space and there is no header row; adjust the arguments otherwise.)

Matplotlib: Import and plot multiple time series with legends direct from .csv

I have several spreadsheets containing data saved as comma delimited (.csv) files in the following format: The first row contains column labels as strings ('Time', 'Parameter_1'...). The first column of data is Time and each subsequent column contains the corresponding parameter data, as a float or integer.
I want to plot each parameter against Time on the same plot, with parameter legends which are derived directly from the first row of the .csv file.
My spreadsheets have different numbers of (columns of) parameters to be plotted against Time; so I'd like to find a generic solution which will also derive the number of columns directly from the .csv file.
The attached minimal working example shows what I'm trying to achieve using np.loadtxt (minus the legend); but I can't find a way to import the column labels from the .csv file to make the legends using this approach.
np.genfromtxt offers more functionality, but I'm not familiar with it and am struggling to find a way of using it to do the above.
Plotting data in this style from .csv files must be a common problem, but I've been unable to find a solution on the web. I'd be very grateful for your help & suggestions.
Many thanks
"""
Example data: Data.csv:
Time,Parameter_1,Parameter_2,Parameter_3
0,10,0,10
1,20,30,10
2,40,20,20
3,20,10,30
"""
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('Data.csv', skiprows=1, delimiter=',') # skip the column labels
cols = data.shape[1] # get the number of columns in the array
for n in range (1,cols):
    plt.plot(data[:,0],data[:,n]) # plot each parameter against time
plt.xlabel('Time',fontsize=14)
plt.ylabel('Parameter values',fontsize=14)
plt.show()
Here's my minimal working example for the above using genfromtxt rather than loadtxt, in case it is helpful for anyone else.
I'm sure there are more concise and elegant ways of doing this (I'm always happy to get constructive criticism on how to improve my coding), but it makes sense and works OK:
import numpy as np
import matplotlib.pyplot as plt
arr = np.genfromtxt('Data.csv', delimiter=',', dtype=None) # dtype=None automatically defines appropriate format (e.g. string, int, etc.) based on cell contents
names = (arr[0]) # select the first row of data = column names
for n in range (1,len(names)): # plot each column in turn against column 0 (= time)
    plt.plot (arr[1:,0],arr[1:,n],label=names[n]) # omitting the first row ( = column names)
plt.legend()
plt.show()
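Another possibility with `genfromtxt`, sketched here as an alternative rather than as the approach used above, is `names=True`, which reads the first row as field names and keeps the data numeric. The in-memory sample stands in for Data.csv.

```python
import io
import numpy as np

# Stand-in for Data.csv so the sketch is self-contained.
sample = io.StringIO(
    "Time,Parameter_1,Parameter_2,Parameter_3\n"
    "0,10,0,10\n"
    "1,20,30,10\n"
    "2,40,20,20\n"
    "3,20,10,30\n"
)

# names=True takes the column labels from the first row; the result is a
# structured array whose fields are addressed by label.
data = np.genfromtxt(sample, delimiter=",", names=True)
names = data.dtype.names

for name in names[1:]:
    # Each column is a float array that can be plotted against data["Time"].
    print(name, data[name])
```

Each `data[name]` can then be passed to `plt.plot(data["Time"], data[name], label=name)`, so the legend labels and the number of columns both come from the file itself.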
The function numpy.genfromtxt is meant more for broken tables with missing values than for what you're trying to do. What you can do is simply open the file before handing it to numpy.loadtxt and read the first line yourself. Then you don't even need to skip it. Here is an edited version of your code above that reads the labels and makes the legend:
"""
Example data: Data.csv:
Time,Parameter_1,Parameter_2,Parameter_3
0,10,0,10
1,20,30,10
2,40,20,20
3,20,10,30
"""
import numpy as np
import matplotlib.pyplot as plt
#open the file
with open('Data.csv') as f:
    #read the names of the columns first
    names = f.readline().strip().split(',')
    #np.loadtxt can also handle already open files
    data = np.loadtxt(f, delimiter=',') # no skip needed anymore
cols = data.shape[1]
for n in range (1,cols):
    #labels go in here
    plt.plot(data[:,0],data[:,n],label=names[n])
plt.xlabel('Time',fontsize=14)
plt.ylabel('Parameter values',fontsize=14)
#And finally the legend is made
plt.legend()
plt.show()
