The script:
import csv
import pandas as pd
df2 = pd.read_csv("aapl.csv", header= 0 , encoding = 'latin-1')
print(df2)
In the printed result, wherever the CSV holds a formula I get the formula text instead of the computed value.
Any solutions?
You can specify the column type when reading the file:
import pandas as pd
df2 = pd.read_csv("aapl.csv", header=0, encoding='latin-1', dtype={"upper": int, "lower": int})
print(df2)
Hope this helps.
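A CSV file stores only text, so a cell saved as a formula arrives in pandas as a string, and forcing dtype=int on such a column would likely raise an error. The following is a minimal sketch, not a definitive fix: the column names and the sample rows are made up for illustration, and it only marks formula cells as missing rather than evaluating them.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the second row of "upper" holds formula text, not a number.
csv_text = "price,upper,lower\n10,12,8\n11,=A3+1,9\n"

df = pd.read_csv(StringIO(csv_text))

# Anything that cannot be parsed as a number (such as "=A3+1") becomes NaN.
df["upper"] = pd.to_numeric(df["upper"], errors="coerce")

print(df)
```

To get the computed values themselves, the file would have to be exported again from the spreadsheet program with values instead of formulas.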
I am trying to change all negative values to 0 in a CSV file exported from Excel.
However, it seems that pandas skips the first row.
How can I prevent the first row from being skipped? Thank you.
Here is the code:
# importing pandas module
import pandas as pd
import numpy as np
import csv
from pandas import DataFrame
df = pd.read_csv("FAPI-N2-rere_2D_modified.csv")
df[df < 0 ] = 0
df.to_csv('FAPI-N2-rere_2D_modified2.csv')
==========================================================
I tried adding header=None to the code above:
# importing pandas module
import pandas as pd
import numpy as np
import csv
from pandas import DataFrame
df = pd.read_csv("FAPI-N2-rere_2D_modified.csv", header=None)
df[df < 0 ] = 0
df.to_csv('FAPI-N2-rere_2D_modified2.csv')
However, I keep getting this TypeError:
TypeError: '<' not supported between instances of 'str' and 'int'
I would really appreciate it if anyone could help me.
Thank you so much!
Some columns of your dataframe contain strings. With header=None the header row itself is read as data, which turns every column that has a text label into a string column.
If their content is numeric, you can try to convert them to a numeric datatype. This question explains how to do that:
Change column type in pandas
If you do have columns containing strings, you need to replace the negative values only in the numeric columns.
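As a minimal sketch of replacing negatives only in the numeric columns (the column names and values here are invented for illustration, and the real file may need different handling):

```python
from io import StringIO

import pandas as pd

# Hypothetical data: one text column and two numeric columns.
csv_text = "name,x,y\na,-1,2\nb,3,-4\n"
df = pd.read_csv(StringIO(csv_text))

# Pick out the numeric columns and clip only those at zero.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].clip(lower=0)

print(df)
```

When writing the result back, passing index=False to to_csv keeps pandas from adding an extra index column to the output file.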
I don't know why na_values is not changing the values with "$-" to NaN. I have manually entered the $- in the file and there are no spaces.
import pandas as pd
df=pd.read_csv('discounted_products.csv',na_values = ['$-'])
df.head()
Please help here.
It could be because pandas treats patterns as regular expressions by default in some string methods (although the read_csv documentation does not mention this for na_values). Try a raw string:
na_values = [r'$-']
Update
It worked fine for me
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(
'''a,b
test,$-'''), na_values=[r'$-'])
print(df)
      a   b
0  test NaN
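Note that r'$-' and '$-' are the same Python string, so if the literal value works in a small test but not on the real file, the cells probably differ from "$-" in some invisible way. One possible cause, offered only as an assumption, is whitespace around the value. The sketch below uses made-up column names and strips the text before masking:

```python
from io import StringIO

import pandas as pd

# Hypothetical file where the placeholder is padded with spaces.
csv_text = "product,discount\nA, $- \nB,$5\n"
df = pd.read_csv(StringIO(csv_text))

# Strip surrounding whitespace, then blank out the exact placeholder "$-".
stripped = df["discount"].str.strip()
df["discount"] = stripped.mask(stripped == "$-")

print(df)
```

Printing repr() of a few raw cell values is a quick way to check what the file actually contains.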
I am trying to run the script below to add two columns to the left of a file; however, it keeps giving me
ValueError: header must be integer or list of integers
Below is my code:
import pandas as pd
import numpy as np
read_file = pd.read_csv("/home/ex.csv",header='true')
df=pd.DataFrame(read_file)
def add_col(x):
    df.insert(loc=0, column='Creation_DT', value=pd.to_datetime('today'))
    df.insert(loc=1, column='Creation_By', value="Sean")
    df.to_parquet("/home/sample.parquet")
add_col(df)
Also, is there any way to make the Creation_DT column a string?
According to the pandas docs, header is the row number (or row numbers) to use as the column names and as the start of the data, and it must be an int or a list of ints. The string 'true' is neither, so you have to pass header=0 to read_csv.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Also, read_csv already returns a DataFrame, so you don't need to wrap the result in pd.DataFrame. Just use
df = pd.read_csv("/home/ex.csv", header=0)
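To illustrate what the header argument does, here is a small sketch on an in-memory CSV (the data is invented for illustration). With header=0 the first line becomes the column names; with header=None the first line is kept as an ordinary data row and the columns are numbered.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-column file with a header line.
csv_text = "id,name\n1,Ann\n2,Bob\n"

with_header = pd.read_csv(StringIO(csv_text), header=0)
no_header = pd.read_csv(StringIO(csv_text), header=None)

print(with_header.columns.tolist(), len(with_header))
print(no_header.columns.tolist(), len(no_header))
```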
You can try:
import pandas as pd

# The header row is inferred by default, so no header argument is needed.
df = pd.read_csv("/home/ex.csv")

def add_col(frame):
    # str(...) stores the timestamp as text instead of a datetime value.
    frame.insert(loc=0, column='Creation_DT', value=str(pd.to_datetime('today')))
    frame.insert(loc=1, column='Creation_By', value="Sean")
    frame.to_parquet("/home/sample.parquet")

add_col(df)
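If the string needs a fixed layout, strftime gives more control than str(). This is only a sketch: the small frame and the format string are assumptions chosen for illustration, and writing to Parquet is left out so the snippet runs without extra dependencies.

```python
import pandas as pd

# Hypothetical frame standing in for the CSV contents.
df = pd.DataFrame({"value": [1, 2]})

# Format the current timestamp as text, for example "2024-01-31 09:15:00".
stamp = pd.Timestamp.today().strftime("%Y-%m-%d %H:%M:%S")

df.insert(loc=0, column="Creation_DT", value=stamp)
df.insert(loc=1, column="Creation_By", value="Sean")

print(df.columns.tolist())
```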
I am pretty new to Python (using Python 3) and am using pandas to import a dataset.
I need to import the dataset from this URL: https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt
and convert it to a CSV file, but I am getting special characters in the converted CSV: ��
I am downloading the txt file and then converting it to CSV. Is this the right approach?
Also, the converted CSV puts the entire text into one column.
from urllib.request import urlretrieve
import pandas as pd
from pandas import DataFrame
url = 'https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt'
urlretrieve(url, 'index.txt')
df = pd.read_csv('index.txt', sep='/t', engine='python', lineterminator='\r\n')
csv_file = df.to_csv('index.csv', sep='\t', index=False, header=True)
print(csv_file)
After a successful import, I also have to extract X as all columns except the first column and Y as the first column.
I would appreciate any help.
from urllib.request import urlretrieve
import pandas as pd
url = 'https://newonlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission/index.txt'
urlretrieve(url, 'index.txt')
df = pd.read_csv('index.txt', sep='\t',encoding='utf-16')
Y = df[['REMISS']]
X = df.drop(['REMISS'],axis=1)
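The two changes that matter are sep='\t' (a backslash, not '/t') and encoding='utf-16', which is what removes the stray characters. The sketch below reproduces the idea offline: it writes a tiny tab-separated UTF-16 file to a temporary folder and reads it back. The sample rows are made up, and only the REMISS column name is taken from the code above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the downloaded file: tab-separated, UTF-16 encoded.
text = "REMISS\tCELL\tSMEAR\n1\t0.8\t0.83\n0\t0.9\t0.36\n"
folder = tempfile.mkdtemp()
txt_path = os.path.join(folder, "index.txt")
with open(txt_path, "w", encoding="utf-16", newline="") as fh:
    fh.write(text)

# Read with the matching separator and encoding.
df = pd.read_csv(txt_path, sep="\t", encoding="utf-16")

Y = df["REMISS"]                 # first column
X = df.drop(columns=["REMISS"])  # every other column

# Save as a regular comma-separated file.
csv_path = os.path.join(folder, "index.csv")
df.to_csv(csv_path, index=False)
```

Note that to_csv returns None when given a path, so printing its return value shows nothing useful.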
I am trying to use the code below to get posts with specific keywords from my CSV file, but I keep getting KeyError: 'Tags1'.
import re
import string
import pandas as pd
import openpyxl
import glob
import csv
import os
import xlsxwriter
import numpy as np
keywords = {"agile","backlog"}
# all your keywords
# error_bad_lines is the older spelling; newer pandas uses on_bad_lines="skip"
df = pd.read_csv(r"C:\Users\ferr1982\Desktop\split1_out.csv",
                 error_bad_lines=False)  # , sep=",", encoding="utf-8"
output = pd.DataFrame(columns=df.columns)
for i in range(len(df.index)):
    # if (df.loc[df['Tags'].isin(keywords)]):
    if any(x in (df['Tags1'][i], df['Tags2'][i], df['Tags3'][i],
                 df['Tags4'][i], df['Tags5'][i]) for x in keywords):
        output.loc[len(output)] = [df[j][i] for j in df.columns]
output.to_csv("new_data5.csv", index=False)
Okay, it turned out that there is a leading space before each "Tags" column name in my CSV file!
It is working now after I added the space to the names in the code above.
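An alternative to typing the space into every column name is to strip whitespace from the header once after loading. The following is a sketch under assumptions: the sample data and the Id column are invented, and only two tag columns are shown. It also replaces the row loop with isin, which performs the same exact-match test.

```python
from io import StringIO

import pandas as pd

# Hypothetical file whose header has a space after each comma.
csv_text = "Id, Tags1, Tags2\n1,agile,python\n2,java,backlog\n3,c,go\n"
df = pd.read_csv(StringIO(csv_text))

# Remove leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()

keywords = {"agile", "backlog"}
tag_cols = ["Tags1", "Tags2"]

# Keep rows where any tag column exactly matches one of the keywords.
mask = df[tag_cols].isin(keywords).any(axis=1)
output = df[mask]

print(output)
```

Printing df.columns.tolist() right after loading is a quick way to spot stray spaces like this.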