How to convert a datatype of pandas dataframe from str to float in Python3? - python-3.x

import pandas as pd
d=[('Shubham',24),
('Shrikant',58),
('na',34)]
df = pd.DataFrame(d,columns=['Name','Age'])
df.dtypes
Output:
Name object
Age int32
dtype: object
How do I convert the datatype of the 'Name' column to float?
df['Name'].astype(float)
I get the error below:
ValueError: could not convert string to float: 'na'

If you mean converting the names themselves into numbers, then no: as far as I know, astype cannot turn arbitrary strings into numbers directly. If you meant to encode the names as numbers, you can do it as follows:
import pandas as pd
d=[('Shubham',24),
('Shrikant',58),
('na',34)]
df = pd.DataFrame(d,columns=['Name','Age'])
df['Name'] = df['Name'].astype('category').cat.codes
print(df.head())
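
To make the difference concrete, here is a minimal sketch that reuses the data from the question. It contrasts category codes (an arbitrary integer label per distinct name, assigned in sorted order) with pd.to_numeric and errors='coerce', which only succeeds for strings that actually look like numbers. The variable names codes and numeric are illustrative and not part of the original answer.

```python
import pandas as pd

d = [('Shubham', 24), ('Shrikant', 58), ('na', 34)]
df = pd.DataFrame(d, columns=['Name', 'Age'])

# Encode each distinct name as an integer label, then cast the labels to float.
# The labels carry no numeric meaning; they only identify the category.
codes = df['Name'].astype('category').cat.codes.astype(float)

# Parse the strings as numbers; anything that is not numeric becomes NaN.
numeric = pd.to_numeric(df['Name'], errors='coerce')

print(codes.tolist())
print(numeric.isna().tolist())
```

Category codes are only a sensible choice when the column is a label; if the column is supposed to hold numbers stored as text, pd.to_numeric is usually the better fit.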

Related

How to convert working with date time in Pandas?

I have a datetime field like 2017-01-15T02:41:38.466Z and would like to convert it to %Y-%m-%d format. How can this be achieved in pandas or Python?
I tried this
frame['datetime_ordered'] = pd.datetime(frame['datetime_ordered'], format='%Y-%m-%d')
but I get the error
cannot convert the series to <class 'int'>
The following code worked:
# Parse timestamps such as 2017-01-15T02:41:38.466Z while reading each file
d_parser = lambda x: pd.datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ')
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, parse_dates=['datetime_ordered'], date_parser=d_parser)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)

import pandas as pd
date_str = "2017-01-15T02:41:38.466Z"
# Parse the ISO 8601 string into a pandas Timestamp
a_date = pd.to_datetime(date_str)
print("date time value", a_date)
# datetime to string with format
print(a_date.strftime('%Y-%m-%d'))
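
The same idea can be applied to a whole column rather than a single string. The sketch below assumes a DataFrame with a datetime_ordered column of ISO 8601 strings, as in the question; the sample values and the date_only column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the same format as the question
frame = pd.DataFrame({
    'datetime_ordered': ['2017-01-15T02:41:38.466Z', '2018-12-31T23:59:59.000Z']
})

# Parse the whole column at once, then format each value as a date string
parsed = pd.to_datetime(frame['datetime_ordered'])
frame['date_only'] = parsed.dt.strftime('%Y-%m-%d')

print(frame['date_only'].tolist())
```

Note that strftime returns strings; if the values should stay datetime-like, keep the parsed column and format it only when displaying.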

TypeError on Pandas DataFrame

I have an error trying to convert integer numbers to DateTime format on a CSV file using Pandas.
The code I'm using is:
import pandas as pd
from datetime import datetime,timedelta
data=pd.read_csv("Dataset.csv",low_memory=False)
data.Date = data.Date.apply(lambda x:datetime.strptime(x, '%Y-%m-%d'))
The DataFrame is:
The error is:
TypeError: strptime() argument 1 must be str, not int
Does anyone know what is wrong here?
Thank you!!
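
The error message says that strptime received an int, which suggests that read_csv parsed the Date column as integers rather than strings. A minimal sketch of one way around this, assuming the integers encode dates as YYYYMMDD (the original DataFrame is not shown, so this layout and the sample values are guesses):

```python
import pandas as pd

# Hypothetical stand-in for the Date column of Dataset.csv
data = pd.DataFrame({'Date': [20200115, 20200116]})

# Cast to str first, then parse with an explicit format.
# The '%Y%m%d' format is an assumption about how the integers are laid out.
data['Date'] = pd.to_datetime(data['Date'].astype(str), format='%Y%m%d')

print(data['Date'])
```

If the integers are something else, such as Unix timestamps, the format argument would not apply and a different conversion would be needed.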

Pandas convert csv data to floats

I am trying to read data from a CSV file and compute new values. Pandas interprets the data as strings, so I cannot do math on the values. My attempt to convert the values to floats fails as well. What is the correct way to convert a DataFrame of strings to floats?
import pandas as pd
df = pd.read_csv('data.csv', names=['Open','High','Low','Close'])
#TypeError: unsupported operand type(s) for -: 'str' and 'str'
df['HL_PCT'] = (df['High']-df['Low'])/df['Close']
#ValueError: could not convert string to float: 'Low'
df['HL_PCT'] = (df['High'].astype(float)-df['Low'].astype(float))/df['Close'].astype(float)
print(df.head())
# head.csv
# Open,High,Low,Close
# 100.1,110.1,90.1,101.1
# 100.2,110.2,90.2,101.2
# 100.3,110.3,90.3,101.3
# 100.4,110.4,90.4,101.4
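The problem was caused by the header row. Because names is passed explicitly, pandas reads the first line of the file as data, so every column starts with a string such as 'Low' and cannot be converted to float. The header should have been skipped like this: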
df = pd.read_csv('data.csv', names=['Open','High','Low','Close'], skiprows=[0])
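
The effect can be reproduced without a file on disk. The sketch below feeds the sample rows from the question through io.StringIO (an assumption made only to keep the example self-contained) and compares reading with and without skiprows.

```python
import io
import pandas as pd

csv_text = """Open,High,Low,Close
100.1,110.1,90.1,101.1
100.2,110.2,90.2,101.2
"""
cols = ['Open', 'High', 'Low', 'Close']

# With names= and no skiprows, the header line is read as the first data row
with_header_as_data = pd.read_csv(io.StringIO(csv_text), names=cols)

# Skipping row 0 leaves only numeric rows, so the columns are parsed as floats
fixed = pd.read_csv(io.StringIO(csv_text), names=cols, skiprows=[0])
fixed['HL_PCT'] = (fixed['High'] - fixed['Low']) / fixed['Close']

print(with_header_as_data.head())
print(fixed.head())
```

Since the file already has a header with the same names, dropping the names argument altogether should have the same effect.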

Turn numeric text string with powers of ten nomenclator (e+) into float in python pandas

I've got a dataframe with more than 30000 rows and almost 40 columns, loaded from a CSV file.
Most of it mixes str and int features:
- integers are int
- floats and powers of ten are str
It looks like this:
Id  A               B
1   2.5220019e+008  1742087
2   1.7766118e+008  2223964.5
3   3.3750285e+008  2705867.8
4   97782360        2.5220019e+008
I've tried the following code:
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, LineString, shape
df = pd.read_csv('mycsvfile.csv').astype(float)
Which yields this error message:
ValueError: could not convert string to float: '-1.#IND'
I guess it has to do with the exponential notation for powers of ten (e+), which the Python libraries aren't able to convert.
Is there a way to fix it?
From my conversation with QuangHoang I should apply the function:
pd.to_numeric(df['column'], errors='coerce')
Since almost all of the DataFrame consists of str objects, I ran the following line of code:
df2 = df.apply(lambda x : pd.to_numeric(x, errors='coerce'))
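
A small illustration of what that line does, using a made-up frame that mixes the values from the table above with the '-1.#IND' string from the error message. Strings in exponent notation parse as ordinary floats, and only the unparseable value becomes NaN.

```python
import pandas as pd

# Hypothetical sample: every value is stored as a string
df = pd.DataFrame({
    'A': ['2.5220019e+008', '1.7766118e+008', '-1.#IND', '97782360'],
    'B': ['1742087', '2223964.5', '2705867.8', '2.5220019e+008'],
})

# Convert each column; values that cannot be parsed are replaced by NaN
df2 = df.apply(lambda x: pd.to_numeric(x, errors='coerce'))

print(df2.dtypes)
print(df2)
```

Because errors='coerce' silently turns bad values into NaN, it may be worth counting the NaN values afterwards to see how many entries were affected.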

Convert panda column to a string

I am trying to run the script below to add two columns to the left of a file; however, it keeps giving me
ValueError: header must be integer or list of integers
Below is my code:
import pandas as pd
import numpy as np
read_file = pd.read_csv("/home/ex.csv",header='true')
df=pd.DataFrame(read_file)
def add_col(x):
    df.insert(loc=0, column='Creation_DT', value=pd.to_datetime('today'))
    df.insert(loc=1, column='Creation_By', value="Sean")
    df.to_parquet("/home/sample.parquet")
add_col(df)
Is there any way to make the Creation_DT column a string?
According to the pandas docs, header is the row number(s) to use as the column names and the start of the data, and it must be an int or a list of ints. So you have to pass header=0 to the read_csv method.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Also, read_csv already returns a DataFrame, so you don't need to wrap the result in pd.DataFrame. Just use
df = pd.read_csv("/home/ex.csv", header=0)
You can try:
import pandas as pd
# read_csv uses the first row as the header by default
df = pd.read_csv("/home/ex.csv")
def add_col(x):
    # str() stores the timestamp as text instead of a datetime value
    x.insert(loc=0, column='Creation_DT', value=str(pd.to_datetime('today')))
    x.insert(loc=1, column='Creation_By', value="Sean")
    x.to_parquet("/home/sample.parquet")
add_col(df)
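
If a specific text layout is wanted rather than the default str() output, the timestamp can be formatted with strftime before it is inserted. The sketch below is one possible variant, not the original answer's method: the helper name add_creation_columns, the sample frame, and the chosen format string are all assumptions, and the file I/O is left out so it runs on its own.

```python
import pandas as pd

def add_creation_columns(frame, created_by):
    """Return a copy of frame with two string columns inserted on the left."""
    out = frame.copy()
    # Format the current time as text so the column holds str, not datetime
    stamp = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
    out.insert(loc=0, column='Creation_DT', value=stamp)
    out.insert(loc=1, column='Creation_By', value=created_by)
    return out

# Hypothetical stand-in for the contents of ex.csv
sample = pd.DataFrame({'value': [1, 2, 3]})
result = add_creation_columns(sample, 'Sean')

print(result)
```

Working on a copy and returning it keeps the input frame unchanged, which avoids the reliance on a global df in the original function.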
