Changing specific strings into floats in multidimensional array - python-3.x

I saved all the data into an array of strings, but I want to change the strings in that array into floats without changing the header (the first row) and the first column of the array. How should I change my code?
import numpy as np
import csv
with open('MI_5MINS_INDEX.csv', encoding="utf-8") as f:
    data = list(csv.reader(f))
for line in data:
    line.remove('')
ary = np.array(data)
ary.astype(float)

Use pandas read_csv() and it will work as you wish: the first row becomes the column header and the data columns are parsed as floats automatically.
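A minimal sketch, assuming the first column should become the row index and the remaining cells are numeric:

import pandas as pd

# The first row becomes the column header and the first column becomes
# the index, so only the data cells are parsed as numbers.
df = pd.read_csv('MI_5MINS_INDEX.csv', index_col=0, encoding='utf-8')

# If you still need a plain numpy array of floats (without the header
# and index labels):
ary = df.to_numpy(dtype=float)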

Related

How to manipulate this CSV in Python so that when I enter a value like "psg", the result is 9

This is the example CSV. I already tried another way, but it still prints useless things; I just want to get the values (Count).
You can just read it with pandas:
import pandas as pd
df = pd.read_csv(r'FILE.csv')
print(df['Count'])
If you need to print only the Counts:
print(*df['Count'].values.tolist(),sep="\n")
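If the goal is to look up the Count for a particular entry such as "psg", a minimal sketch (the key column name 'Team' here is hypothetical; use whatever header the file actually has):

import pandas as pd

df = pd.read_csv(r'FILE.csv')
# Filter the rows whose key column matches the input, then take its Count.
value = df.loc[df['Team'] == 'psg', 'Count'].iloc[0]
print(value)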

Need to sort a numpy array based on column value, but value is present in String format

I am a newbie in Python and need to extract info from a CSV file containing terrorism data.
I need to extract the top 5 cities in India with the maximum casualties, where Casualty = Killed (given in the CSV) + Wounded (given in the CSV).
The City column is also given in the CSV file.
The output format should be as below, in descending order of casualties:
city_1 casualty_1
city_2 casualty_2
city_3 casualty_3
city_4 casualty_4
city_5 casualty_5
Link to CSV- https://ninjasdatascienceprod.s3.amazonaws.com/3571/terrorismData.csv?AWSAccessKeyId=AKIAIGEP3IQJKTNSRVMQ&Expires=1554719430&Signature=7uYCQ6pAb1xxPJhI%2FAfYeedUcdA%3D&response-content-disposition=attachment%3B%20filename%3DterrorismData.csv
import numpy as np
import csv

file_obj = open("terrorismData.csv", encoding="utf8")
file_data = csv.DictReader(file_obj, skipinitialspace=True)
country = []
killed = []
wounded = []
city = []
final = []
# Making lists
for row in file_data:
    if row['Country'] == 'India':
        country.append(row['Country'])
        killed.append(row['Killed'])
        wounded.append(row['Wounded'])
        city.append(row['City'])
        final.append([row['City'], row['Killed'], row['Wounded']])
# Making numpy arrays out of the lists
np_month = np.array(country)
np_killed = np.array(killed)
np_wounded = np.array(wounded)
np_city = np.array(city)
np_final = np.array(final)
# Fixing blank values in the final array
for i in range(len(np_final)):
    for j in range(len(np_final[0])):
        if np_final[i][j] == '':
            np_final[i][j] = '0.0'
# Counting casualties (killed + wounded) and storing them in column 1 of the final array
for i in range(len(np_final)):
    np_final[i, 1] = float(np_final[i, 1]) + float(np_final[i, 2])
# Descending sort on the casualties column
np_final = np_final[np_final[:, 1].argsort()[::-1]]
I expect np_final to get sorted on the casualties column, but it isn't happening because the casualty values are still strings, so the sort is lexicographic rather than numeric.
Any help is appreciated.
I would suggest using Pandas; it makes the data easier to manipulate. Read everything into a DataFrame and it will parse the numbers into numeric types.
If you must use numpy, you could simply cast your values to int or float while reading the data, and everything should work, provided there are no other bugs.
Something like this:
for row in file_data:
    if row['Country'] == 'India':
        country.append(row['Country'])
        killed.append(int(row['Killed']))
        wounded.append(int(row['Wounded']))
        city.append(row['City'])
        final.append([row['City'], row['Killed'], row['Wounded']])
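If you do switch to pandas, a minimal sketch of the whole task might look like this (it assumes the column names Country, City, Killed and Wounded from the question, and sums casualties per city rather than per incident):

import pandas as pd

df = pd.read_csv('terrorismData.csv', skipinitialspace=True)
india = df[df['Country'] == 'India'].copy()

# Blank cells become NaN; coerce to numbers and treat them as 0.
india['Killed'] = pd.to_numeric(india['Killed'], errors='coerce').fillna(0)
india['Wounded'] = pd.to_numeric(india['Wounded'], errors='coerce').fillna(0)
india['Casualty'] = india['Killed'] + india['Wounded']

# Top 5 cities by total casualties, already in descending order.
print(india.groupby('City')['Casualty'].sum().nlargest(5))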

Converting an array of strings containing range of integer values into an array of floats

(A screenshot sample of the data was attached to the original question.)
I have a CSV file with a column for temperature ranges, with values like "20-25" stored as strings. I need to convert this to 22.5 as a float.
This needs to be done for the entire column of such values, not just a single value. I want to know how this can be done in Python, as I am very new to it.
Notice in the sample data that there are NaN values in the records as well.
As said in the comments, split each string on "-".
Then create a float array from the parts and take the row-wise average using numpy.
import numpy as np
temp_input = ["20-25", "36-40", "10-11", "23-24"]
# split and convert to float
# [t.split("-") for t in temp_input] is an inline iterator
tmp = np.array([t.split("-") for t in temp_input], dtype=np.float32)
# average the tmp array
temp_output = np.average(tmp, axis=1)
And here's a one-liner:
temp_output = [np.average(np.array(t.split('-'), dtype=np.float32)) for t in temp_input]
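Since the question's sample data also contains NaN values, here is a pandas-based sketch that leaves them as NaN (the file name and the column name 'TempRange' are hypothetical):

import pandas as pd

df = pd.read_csv('data.csv')

# Split "20-25" into two numeric columns; rows that are NaN stay NaN.
parts = df['TempRange'].str.split('-', expand=True).astype(float)

# Row-wise mean of the low and high ends; NaN rows remain NaN.
df['TempAvg'] = parts.mean(axis=1)
print(df['TempAvg'])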

python, loading a string from file

I'm trying to load a .txt file into my python project using numpy:
import numpy as np
import sys
g = np.loadtxt(sys.argv[1])
This command worked for me when the .txt file was a 0/1 matrix, but it isn't working now because the file is a string matrix (a 4x7 table of words like "crew").
The error says "can't convert string to float". Any help?
Take a look at the dtype parameter of np.loadtxt:
dtype : data-type, optional
Data-type of the resulting array; default: float. If this is a structured data-type, the resulting array will be 1-dimensional, and each row will be interpreted as an element of the array. In this case, the number of columns used must match the number of fields in the data-type.
The default is float, which results in the error you are pointing out in your question.
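So one fix is to keep np.loadtxt and request strings explicitly; a minimal sketch of that:

import sys
import numpy as np

# dtype=str returns the words themselves instead of trying
# (and failing) to parse them as floats.
g = np.loadtxt(sys.argv[1], dtype=str)
print(g.shape)  # e.g. (4, 7) for the table of words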
Another option is using pandas:
import numpy as np
import pandas as pd
arr = pd.read_table(filename, sep=" ", header=None).values
(This assumes the separator is a single space and there is no header row; adjust the arguments otherwise.)

Matplotlib: Import and plot multiple time series with legends direct from .csv

I have several spreadsheets containing data saved as comma delimited (.csv) files in the following format: The first row contains column labels as strings ('Time', 'Parameter_1'...). The first column of data is Time and each subsequent column contains the corresponding parameter data, as a float or integer.
I want to plot each parameter against Time on the same plot, with parameter legends which are derived directly from the first row of the .csv file.
My spreadsheets have different numbers of (columns of) parameters to be plotted against Time; so I'd like to find a generic solution which will also derive the number of columns directly from the .csv file.
The attached minimal working example shows what I'm trying to achieve using np.loadtxt (minus the legend); but I can't find a way to import the column labels from the .csv file to make the legends using this approach.
np.genfromtxt offers more functionality, but I'm not familiar with it and am struggling to find a way of using it to do the above.
Plotting data in this style from .csv files must be a common problem, but I've been unable to find a solution on the web. I'd be very grateful for your help & suggestions.
Many thanks
"""
Example data: Data.csv:
Time,Parameter_1,Parameter_2,Parameter_3
0,10,0,10
1,20,30,10
2,40,20,20
3,20,10,30
"""
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('Data.csv', skiprows=1, delimiter=',') # skip the column labels
cols = data.shape[1] # get the number of columns in the array
for n in range(1, cols):
    plt.plot(data[:, 0], data[:, n])  # plot each parameter against time
plt.xlabel('Time',fontsize=14)
plt.ylabel('Parameter values',fontsize=14)
plt.show()
Here's my minimal working example for the above using genfromtxt rather than loadtxt, in case it is helpful for anyone else.
I'm sure there are more concise and elegant ways of doing this (I'm always happy to get constructive criticism on how to improve my coding), but it makes sense and works OK:
import numpy as np
import matplotlib.pyplot as plt
arr = np.genfromtxt('Data.csv', delimiter=',', dtype=None)  # dtype=None automatically chooses an appropriate format (e.g. string, int) based on cell contents
names = arr[0]  # the first row of data = column names
for n in range(1, len(names)):  # plot each column in turn against column 0 (= time)
    plt.plot(arr[1:, 0], arr[1:, n], label=names[n])  # omitting the first row (= column names)
plt.legend()
plt.show()
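A related variant, in case it helps: genfromtxt can read the header row directly into field names with names=True, which keeps the data rows numeric (a sketch, not the answer's exact code):

import numpy as np
import matplotlib.pyplot as plt

# names=True uses the first row as field names; dtype=None infers numeric types
arr = np.genfromtxt('Data.csv', delimiter=',', names=True, dtype=None)
cols = arr.dtype.names  # ('Time', 'Parameter_1', ...)
for name in cols[1:]:
    plt.plot(arr[cols[0]], arr[name], label=name)
plt.legend()
plt.show()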
The function numpy.genfromtxt is intended more for broken tables with missing values than for what you're trying to do. What you can do instead is simply open the file yourself, read the first line, and then hand the open file to numpy.loadtxt; then you don't even need to skip the header row. Here is an edited version of your example that reads the labels and makes the legend:
"""
Example data: Data.csv:
Time,Parameter_1,Parameter_2,Parameter_3
0,10,0,10
1,20,30,10
2,40,20,20
3,20,10,30
"""
import numpy as np
import matplotlib.pyplot as plt
# open the file
with open('Data.csv') as f:
    # read the names of the columns first
    names = f.readline().strip().split(',')
    # np.loadtxt can also handle already-open files
    data = np.loadtxt(f, delimiter=',')  # no skiprows needed anymore
cols = data.shape[1]
for n in range(1, cols):
    # labels go in here
    plt.plot(data[:, 0], data[:, n], label=names[n])
plt.xlabel('Time', fontsize=14)
plt.ylabel('Parameter values', fontsize=14)
# And finally the legend is made
plt.legend()
plt.show()
