I am writing a dataframe into a csv as follows:
appended_Dimension.to_csv("outputDimension.csv")
The dataframe is as follows:
Cameroun Rwanda Niger Zambia Mali Angola Ethiopia
ECON 0.056983 0.064422 0.047602 0.070119 0.048395 0.059233 0.085559
FOOD 0.058250 0.046348 0.048849 0.043527 0.049064 0.013157 0.081436
ENV 0.013906 0.004013 0.010519 0.001973 0.005360 0.023010 0.008469
HEA 0.041496 0.078403 0.040154 0.054466 0.029954 0.053007 0.061761
PERS 0.056687 0.021978 0.062655 0.056477 0.087056 0.089886 0.043747
The output is as follows:
I'd like to write the data in a float format so I can process it in the CSV directly. How can I do that, please?
A CSV file stores everything as text, so the values are not kept as floats inside the file itself. Load the data from the CSV into a DataFrame, perform the relevant operations there, then save it back; you cannot manipulate the values while they sit in the CSV.
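A minimal sketch of that round trip with pandas (file and column names are illustrative, and float_format makes the written text predictable):

```python
import pandas as pd

# Build a small frame of floats (values taken from the question's table).
df = pd.DataFrame({"ECON": [0.056983, 0.064422],
                   "FOOD": [0.058250, 0.046348]},
                  index=["Cameroun", "Rwanda"])

# Write with an explicit float format so the text in the file is predictable.
df.to_csv("outputDimension.csv", float_format="%.6f")

# Read it back; pandas parses the numeric columns back into float64.
df2 = pd.read_csv("outputDimension.csv", index_col=0)
df2["ECON"] *= 2  # now the values can be processed numerically
```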
I want to save a bunch of values in a dataframe to csv but I keep running in the problem that something changes to values while saving. Let's have a look at the MWE:
import pandas as pd

df = pd.DataFrame({
    "value1": [110.589, 222.534, 390.123],
    "value2": [50.111, 40.086, 45.334]
})
df = df.round(1)  # round() returns a new frame, so assign the result
#checkpoint
df.to_csv(some_path)
If I debug it and look at the values of df at the step which I marked "checkpoint", thus after rounding, they are like
[110.6000, 222.5000, 390.1000],
[50.1000, 40.1000, 45.3000]
In reality, my data frame is much larger and when I open the csv after saving, some values (usually in a random block of a couple of dozen rows) have changed! They then look like
[110.600000000001, 222.499999999999, 390.099999999999],
[50.099999999999, 40.100000000001, 45.300000000001]
So it's always a 0.000000000001 offset from the "real"/rounded values. Does anybody know what's going on here/how I can avoid this?
This is a typical floating point problem. pandas gives you the option to define a float_format:
df.to_csv(some_path, float_format='%.4f')
This forces 4 decimals; %.4f rounds the value to 4 decimal places rather than truncating it. Note that the formatted values are written as strings now, so if you enable quoting on strings, these columns are also quoted.
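A small demonstration of that behaviour, writing to an in-memory buffer instead of a file (values taken from the question):

```python
import io
import pandas as pd

# Values with the tiny floating point offset described in the question.
df = pd.DataFrame({"value1": [110.600000000001, 222.499999999999]})

buf = io.StringIO()
df.to_csv(buf, float_format='%.4f', index=False)
out = buf.getvalue()

# Both values are rounded to 4 decimals: 110.6000 and 222.5000,
# so the noise past the 4th decimal never reaches the file.
print(out)
```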
I have an .xlsx file with 5 columns (X, Y, Z, Row_Cog, Col_Cog), which will be in the same order each time. I would like to have each column as a variable in Python. I am using the method below, but would like to know if there is a better way to do it.
Also, I am writing the range manually (in the for loop), while I would like a robust way to determine the length of each column in Excel (the number of rows) and use that instead.
# READ THE TEST DATA from the Excel file
import xlrd

workbook = xlrd.open_workbook(r"C:\Desktop\SawToothCalib\TestData.xlsx")
worksheet = workbook.sheet_by_index(0)

X_Test = []
Y_Test = []
Row_Test = []
Col_Test = []
for i in range(1, 29):
    x_val = worksheet.cell_value(i, 0)
    X_Test.append(x_val)
    y_val = worksheet.cell_value(i, 2)
    Y_Test.append(y_val)
    row_val = worksheet.cell_value(i, 3)
    Row_Test.append(row_val)
    col_val = worksheet.cell_value(i, 4)
    Col_Test.append(col_val)
Do you really need this package? You can easily do this kind of operation with pandas.
You can read your file as a DataFrame with:
import pandas as pd
df = pd.read_excel(path + 'file.xlsx', sheet_name=the_sheet_you_want)
and access the list of columns with df.columns. You can access each column with df['column name']. If there are empty entries, they are stored as NaN; you can count them with df['column_name'].isnull().sum().
If you are uncomfortable with DataFrames, you can then convert the columns to lists or arrays, like
df['my_col'].tolist()
or
df['my_col'].to_numpy()
I want to store my dataframe in h5 file. My dataframe is:
dfbhbh=pd.DataFrame([m1bhbh,m2bhbh,adcobhbh,edcobhbh,alisabhbh,elisabhbh,tevolbhbh,distbhbh,metalbhbh,compbhbh,weightbhbh]).T
dfbhbh.columns=['m_1','m_2','a_DCO','e_DCO','a_LISA','e_LISA','t_evol','dist','Z','component','weight']
I am trying to convert it using:
hf = h5py.File('anew', 'w')
for i in range(len(dfbhbh)):
    hf.create_dataset('simulations', list(dfbhbh.iloc[i]))
And I'm getting the error
TypeError: Can't convert element 9 (low_alpha_disc) to hsize_t
I removed the entire array of the component (even though it is extremely significant) but the code did not run.
I also tried to insert directly the data in the h5 file like this
hf.create_dataset('simulations', m1bhbh)
I got this error
Dimensionality is too large (dimensionality is too large)
The variable 'm1bhbh' is a float type with length 1499.
Try:
hf.create_dataset('simulations', data=m1bhbh)
instead of
hf.create_dataset('simulations', m1bhbh)
The second positional argument of create_dataset is the shape, not the data, so the array has to be passed via the data keyword.
(Don't forget to clear outputs before running it.)
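For the full frame, one sketch is to store each column as its own dataset inside a group, which sidesteps the mixed-type problem that caused the original TypeError (column names from the question; the string column is converted to bytes first, since HDF5 cannot store arbitrary Python objects):

```python
import h5py
import pandas as pd

# Two columns standing in for the much larger frame in the question.
df = pd.DataFrame({"m_1": [1.0, 2.0, 3.0],
                   "component": ["low_alpha_disc", "bulge", "halo"]})

with h5py.File("anew.h5", "w") as hf:
    grp = hf.create_group("simulations")
    for col in df.columns:
        values = df[col].to_numpy()
        if values.dtype == object:       # string column
            values = values.astype("S")  # fixed-length bytes for HDF5
        grp.create_dataset(col, data=values)

# Read back to check the round trip.
with h5py.File("anew.h5", "r") as hf:
    m_1 = hf["simulations/m_1"][:]
    component = hf["simulations/component"][:]
```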
I am trying to access data from a CSV using Python. I am able to access entire columns of data values; however, I also want to access rows, and use them like an indexed coordinate system, with (0, 1) being column 0, row 1. So far I have this:
#Lukas Robin
#25.07.2021
import csv

with open("sun_data.csv") as sun_data:
    sunData = csv.reader(sun_data, delimiter=',')
    global data
    for data in sunData:
        print(data)
I don't normally use data tables or CSV, so this is a new area for me.
As mentioned in the comment, you could make the jump to using pandas and spend a little time learning that. It would be a good investment of time if you plan to do much data analysis or work with data tables regularly.
If you just want to pull in a table of numbers and access it as you request, you are perfectly fine using csv package and doing that. Below is an example...
If your .csv file has a header in it, you can simply add in next(sun_data) before starting the inner loop to advance the iterator and let that data fall on the floor...
import csv

f_in = 'data_table.csv'

data = []  # a container to hold the results
with open(f_in, 'r') as source:
    sun_data = csv.reader(source, delimiter=',')
    for row in sun_data:
        # convert the read-in values to float data types (or ints or ...)
        row = [float(t) for t in row]
        # append it to the data table
        data.append(row)

print(data[1][0])
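If the file does have a header row, the next() tip looks like this (using an in-memory buffer in place of the file, with made-up values):

```python
import csv
import io

# Simulated file contents with a header row (values are made up).
text = "col_a,col_b\n1.5,2.5\n3.5,4.5\n"

data = []
with io.StringIO(text) as source:
    sun_data = csv.reader(source, delimiter=',')
    next(sun_data)  # discard the header row
    for row in sun_data:
        data.append([float(t) for t in row])

print(data[1][0])  # 3.5 -- column 0, row 1
```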
I have a Python function that asks for a cluster name and, in return, pulls the matching rows from a CSV file in the following form:
server10, xxxxxxxx1, yyyyyyyy1
server11, xxxxxxxx2, yyyyyyyy2
server12, xxxxxxxx3, yyyyyyyy3
server13, xxxxxxxx4, yyyyyyyy4
server14, xxxxxxxx5, yyyyyyyy5
server15, xxxxxxxx6, yyyyyyyy6
server16, xxxxxxxx7, yyyyyyyy7
server17, xxxxxxxx8, yyyyyyyy8
server18, xxxxxxxx9, yyyyyyyy9
server19, xxxxxxx10, yyyyyyy10
I'm using DictReader class from the csv module. How can I put each column into a list while retaining the information of the rows?
Many thanks to Patrick Haugh for the hint. I was able to look back at my DictReader function and use hostname.append(col['hostname']) to put the columns into separate lists.
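For reference, a small self-contained sketch of that pattern (the header names are hypothetical, since the question does not show the file's header row):

```python
import csv
import io

# Simulated CSV with a header, mirroring the question's layout (names hypothetical).
text = ("hostname,user,token\n"
        "server10,xxxxxxxx1,yyyyyyyy1\n"
        "server11,xxxxxxxx2,yyyyyyyy2\n")

hostname, user, token = [], [], []
with io.StringIO(text) as f:
    for col in csv.DictReader(f):
        # each row is a dict keyed by the header names,
        # so appending one field per list keeps the rows aligned
        hostname.append(col['hostname'])
        user.append(col['user'])
        token.append(col['token'])

print(hostname)  # ['server10', 'server11']
```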