Convert float64 column to int64 column

I have a CSV file with a column that contains a long "ID" value such as 9075841942209708806 (int64). When I read this CSV file into a pandas DataFrame, the number turns into -9.191700e+18 (float64).
How can the ID -9.191700e+18 (float64) be converted back to its original form, i.e. 9075841942209708806 (int64)?

To change the dtype of a column, use:
df['ID'] = df['ID'].astype('int64')
Documentation here:
LINK
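Note that once a value has been read as float64, its low-order digits are already gone (float64 carries only about 15 to 17 significant decimal digits), so astype('int64') cannot recover the original ID. A minimal sketch of reading the column as int64 from the start, assuming the column has no missing values; the inline CSV text and the value column are made up for illustration:

```python
import io

import pandas as pd

original_id = 9075841942209708806

# A float64 cannot hold every 19-digit integer exactly, so a round trip
# through float changes the value.
roundtrip = int(float(original_id))

# Hypothetical CSV content standing in for the real file.
csv_text = "ID,value\n9075841942209708806,1.5\n"

# Ask read_csv for int64 up front so the ID never passes through float64.
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": "int64"})
print(df["ID"].iloc[0], df["ID"].dtype)
```

If the column contains blanks, one option is to read it with dtype=str instead and convert the non-empty values afterwards.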

Related

How to convert Excel imported data in format %m%d%y H:M in a dataframe to datetime data?

I have a dataframe where the first rows look like this:
When I list df.iloc[1] it returns a column with those dates, but it says they are type "object". I tried to convert them to string using:
df.iloc[1] = df.iloc[1].astype(str)
It still lists the data type as object. But a string is an object, right? So I tried variations on this to convert to datetime:
df.iloc[1] = pd.to_datetime(df.iloc[1], format='%mm/%dd/%yyyy %H:%M')
error: time data '11/22/2022 5:15' does not match format '%mm/%dd/%yyyy %H:%M' (match)
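The format directives are single letters: %m, %d and %Y (four-digit year), not %mm, %dd or %yyyy. Also, df.iloc[1] selects the second row, while df.iloc[:, 1] selects the second column. A minimal sketch, assuming the dates are strings in the second column; the sample frame and its column names are hypothetical:

```python
import pandas as pd

# Hypothetical frame; the second column holds the date strings.
df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "when": ["11/22/2022 5:15", "11/23/2022 14:30"],
    }
)

# Select the second column by position and parse it with single-letter directives.
parsed = pd.to_datetime(df.iloc[:, 1], format="%m/%d/%Y %H:%M")

# Assign by column name so the column takes the datetime dtype.
df["when"] = parsed
print(df.dtypes)
```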

I have a column date of the form '20041230' . I want to convert this column to the form 2004-12-30

I have a column named "Date" which has values of the form '20041230'.
How can I convert this to 2004-12-30 in pandas?
I tried applying pd.to_datetime to the column, but I am getting garbage values attached to the date.

A safe method to have strings would be:
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d')
For datetime type, use normalize:
df['Date'] = pd.to_datetime(df['Date']).dt.normalize()
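If the column was read as integers rather than strings, pd.to_datetime without a format treats the numbers as nanoseconds since the epoch, which is one possible source of the "garbage" values. A sketch that passes an explicit format, assuming every value is an eight-digit YYYYMMDD date; the sample data is made up:

```python
import pandas as pd

# Hypothetical column stored as integers, as often happens after read_csv.
df = pd.DataFrame({"Date": [20041230, 20050101]})

# Convert to strings first, then parse with an explicit format.
dates = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

# Keep `dates` for a datetime column, or format it back to strings.
df["Date"] = dates.dt.strftime("%Y-%m-%d")
print(df)
```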

Write Array and Variable to Dataframe

I have an array in the format [27.214 27.566] - there can be several numbers. Additionally I have a Datetime variable.
now=datetime.now()
datetime=now.strftime('%Y-%m-%d %H:%M:%S')
time.sleep(0.5)
agilent.write("MEAS:TEMP? (#101:102)")
values = np.fromstring(agilent.read(), dtype=float, sep=',')
The output from the array is [27.214 27.566]
Now I would like to write this into a dataframe with the following structure:
Datetime, FirstValueArray, SecondValueArray, ....
How can I do this? A new array is added to the dataframe every minute.

I will assume you want to append a row to an existing dataframe df with the appropriate columns: value1, value2, ..., lastvalue, datetime
We can easily convert the array to a series:
s = pd.Series(values)
Next, append the datetime value to the series. Series.append was removed in pandas 2.0 (and it never modified the series in place), so use pd.concat and assign the result:
s = pd.concat([s, pd.Series([datetime])], ignore_index=True)
Now you have a series whose length matches df.columns. Convert that series to a dataframe so that you can use pd.concat:
df_to_append = s.to_frame().T
We need the transpose here because Series.to_frame() returns a dataframe with the series as a single column, and we want a single row with multiple columns.
Before you concatenate, however, make sure the column names of both dataframes match, or the concatenation will create additional columns:
df_to_append.columns = df.columns
Now we can concatenate the two dataframes. pd.concat returns a new dataframe, so assign the result:
df = pd.concat([df, df_to_append], ignore_index=True)
For further details, see the documentation for pandas.concat.
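The steps above can be condensed by building each one-row dataframe directly. This is a sketch, not the only way to do it: it puts the timestamp first, as the question asks, and the helper name reading_to_frame, the Value1/Value2 column names, and the sample readings are all hypothetical:

```python
import numpy as np
import pandas as pd


def reading_to_frame(timestamp, values):
    """Build a one-row DataFrame: the timestamp first, then one column per value."""
    columns = ["Datetime"] + [f"Value{i + 1}" for i in range(len(values))]
    return pd.DataFrame([[timestamp, *values]], columns=columns)


# Hypothetical readings standing in for the instrument output.
df = reading_to_frame("2024-01-01 12:00:00", np.array([27.214, 27.566]))

# One minute later: build another one-row frame and concatenate it.
new_row = reading_to_frame("2024-01-01 12:01:00", np.array([27.301, 27.612]))
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```

If many readings arrive over time, collecting the rows in a plain list and building the dataframe once at the end is usually cheaper than concatenating every minute.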

Why am I getting this TypeError when I try to slice my Pandas DataFrame?

I pulled some stock data from a financial API and created a DataFrame with it. Columns were 'date', 'data1', 'data2', 'data3'. Then, I converted that DataFrame into a CSV with 'date' column as index:
df.to_csv('data.csv', index_label='date')
In a second script, I read that CSV and attempted to slice the resulting DataFrame between two dates:
df = pd.read_csv('data.csv', parse_dates=['date'], index_col='date')
df = df['2020-03-28':'2020-04-28']
When I attempt to do this, I get the following TypeError:
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.numeric.Int64Index'> with these indexers [2020-03-28] of <class 'str'>
So clearly, the problem is that I'm trying to use a str to slice a datetime object. But here's the confusing part! If in the first step, I save the DataFrame to a csv and DO NOT set 'date' as index:
df.to_csv('data.csv')
In my second script, I no longer get the TypeError:
df = pd.read_csv('data.csv', parse_dates=['date'], index_col='date')
df = df['2020-03-28':'2020-04-28']
Now it works just fine. The only problem is I have the default Pandas index column to deal with.
Why do I get a TypeError when I set the 'date' column as index in my CSV...but I do NOT get a TypeError when I don't set any index in the CSV?

It seems that in your first instance of df, date was an ordinary column (not the index), and the DataFrame had a default index of consecutive integers (its name is not important).
In this situation, running df.to_csv('data.csv', index_label='date') produces an output file that contains:
date,date,data1,data2,data3
0,2020-03-27,10.5,12.3,13.2
1,2020-03-28,10.6,12.9,14.7
That is, the index column (integers) was given the name date, which you passed in the index_label parameter, and the next column, which was already named date in df, kept that name.
When you then read the file with df = pd.read_csv('data.csv', parse_dates=['date'], index_col='date'), the first date column (integers) is read as date and set as the index, while the second date column (dates) is read as date.1 and remains an ordinary column.
Now when you run df['2020-03-28':'2020-04-28'], you attempt to select rows whose index falls in the given range. But the index is of type Int64Index (check this in your installation), which is why the exception was thrown.
Things look different when you run df.to_csv('data.csv'). Now the file contains:
,date,data1,data2,data3
0,2020-03-27,10.5,12.3,13.2
1,2020-03-28,10.6,12.9,14.7
That is, the first column (the index in df) has no name and holds integer values, and the only column named date is the second one, which contains the dates.
When you read this file, date (converted to a DatetimeIndex) becomes the index, and the original index column gets the name Unnamed: 0, which is no surprise, since it had no name in the source file.
Now, when you run df['2020-03-28':'2020-04-28'], everything works.
The lesson for the future: running df.to_csv('data.csv', index_label='date') does not set the date column as the index. It only saves the current index under the given name, without checking whether another column already has that name. As a result, two columns can end up with the same name.
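A small sketch that reproduces the duplicate header and shows one way to avoid both the clash and the leftover Unnamed: 0 column, namely writing the file without the integer index. The sample data is made up, and an in-memory buffer stands in for data.csv:

```python
import io

import pandas as pd

# Hypothetical frame: 'date' is an ordinary column, the index is 0, 1, ...
df = pd.DataFrame(
    {"date": ["2020-03-27", "2020-03-28"], "data1": [10.5, 10.6]}
)

# index_label only names the integer index, so 'date' appears twice.
clash = io.StringIO()
df.to_csv(clash, index_label="date")
header = clash.getvalue().splitlines()[0]
print(header)

# Writing without the index leaves a single 'date' column in the file.
clean = io.StringIO()
df.to_csv(clean, index=False)
clean.seek(0)

df2 = pd.read_csv(clean, parse_dates=["date"], index_col="date")
sliced = df2.loc["2020-03-28":"2020-04-28"]
print(sliced)
```

Calling df.set_index('date') before to_csv would be another option with the same effect on the file.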

Handle missing data for a dataframe column of datatype object

I have a pandas dataframe in which one of the columns has dtype object. There is a blank element in this column, so I tried to check for other empty elements using df['colname'].isnull().sum(), but it returns 0. How can I replace the empty value with some arbitrary numeric value so that I can convert this column to float dtype for further computation?

pandas.to_numeric
df['colname'] = pd.to_numeric(df['colname'], errors='coerce')
This will produce np.nan for anything it can't convert to a number. After this, you can fill in any value you'd like with fillna:
df['colname'] = df['colname'].fillna(0)
All in one go
df['colname'] = pd.to_numeric(df['colname'], errors='coerce').fillna(0)
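A short sketch of why isnull() reports 0 in the first place: a blank cell is often an empty string, which is a valid (non-null) object value. The sample values below are made up, and 0 is an arbitrary fill value:

```python
import pandas as pd

# Hypothetical object column: an empty string and a non-numeric string
# are both ordinary values, not missing values.
df = pd.DataFrame({"colname": ["1.5", "", "abc", "3"]})

null_count_before = int(df["colname"].isnull().sum())

# Coerce unparseable entries to NaN, then fill them with 0.
df["colname"] = pd.to_numeric(df["colname"], errors="coerce").fillna(0)
print(null_count_before, df["colname"].tolist())
```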
