pandas read_csv function fails - python-3.x

I am trying to convert the following CSV into a dataframe, using simply:
import pandas as pd
ticket = pd.read_csv("file.csv")
However, due to the missing quotation marks on the first column of the CSV:
A,"B","C","D"
0,"1","2","3"
it fails to assign each row to its rightful header.

import pandas as pd
df = pd.read_csv('your_csv.csv')
In [1]: df
Out[1]:
A "B" "C" "D"
0 0 "1" "2" "3"
Seems to work for me with the given data placed into a .csv, can you be more specific about the error?
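For anyone who wants to reproduce this without creating a file, here is a minimal sketch that builds the same CSV in memory (using io.StringIO in place of file.csv is an assumption for illustration) and checks how read_csv lines the values up with the headers:
import io
import pandas as pd
# Build the CSV from the question in-memory (io.StringIO stands in for "file.csv").
csv_text = 'A,"B","C","D"\n0,"1","2","3"\n'
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # inspect how the headers were parsed
print(df)                   # check that the single data row lines up with the headers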

Related

How to correctly convert Year-month to Year-quarter

I am trying to convert a pandas dataframe containing date in YYYYMM format to YYYYQ format as below
import pandas as pd
dat = pd.DataFrame({'date' : ['200612']})
pd.PeriodIndex(pd.to_datetime(dat.date), freq='Q')
However this generates output as 2012Q2, whereas correct output should be 2006Q4
What is the right way to get correct Quarter?
Explicitly specify the input format:
dat = pd.DataFrame({'date' : ['200612']})
pd.PeriodIndex(pd.to_datetime(dat.date, format='%Y%m'), freq='Q')
Output:
PeriodIndex(['2006Q4'], dtype='period[Q-DEC]', name='date')
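If the goal is a quarter column on the DataFrame itself rather than a PeriodIndex, a small sketch along the same lines (the second sample row is an assumption added for illustration):
import pandas as pd
dat = pd.DataFrame({'date': ['200612', '200703']})
# Parse YYYYMM explicitly, then convert each timestamp to its quarter period.
dat['quarter'] = pd.to_datetime(dat['date'], format='%Y%m').dt.to_period('Q')
print(dat)
#      date quarter
# 0  200612  2006Q4
# 1  200703  2007Q1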

Get second column of a data frame using pandas

I am new to Pandas in Python and I am having some difficulty returning the second column of a dataframe that has no column names, only numbers as indexes.
import pandas as pd
import os
directory = 'A://'
sample = 'test.txt'
# Test with Air Sample
fileAir = os.path.join(directory,sample)
dataAir = pd.read_csv(fileAir,skiprows=3)
print(dataAir.iloc[:,1])
The data I am working with would be similar to:
data = [[1,2,3],[1,2,3],[1,2,3]]
Then, using pandas, I want to get only
[[2,2,2]].
You can use
dataframe_name[column_index].values
like
df[1].values
or
dataframe_name['column_name'].values
like
df['col1'].values
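Since the question's file has no meaningful column names, here is a short self-contained sketch of positional selection (the inline data stands in for test.txt and is an assumption):
import pandas as pd
# Inline data standing in for the file read with read_csv (assumption).
df = pd.DataFrame([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
# Positional access: the second column regardless of its label.
print(df.iloc[:, 1].tolist())  # [2, 2, 2]
# Label access also works here because the default column labels are the integers 0, 1, 2.
print(df[1].values)            # [2 2 2]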

Pandas : how to consider content of certain columns as list

Let's say I have a simple pandas dataframe named df :
0 1
0 a [b, c, d]
I save this dataframe into a CSV file as follows:
df.to_csv("test.csv", index=False, sep="\t", encoding="utf-8")
Then later in my script I read this CSV:
df = pd.read_csv("test.csv", index_col=False, sep="\t", encoding="utf-8")
Now what I want to do is use explode() on column '1', but it does not work because the content of column '1' is no longer a list, since df was saved to a CSV file.
What I have tried so far is to change the type of column '1' to list with astype(), without any success.
Thank you in advance.
Try this. Since you are reading from a CSV file, the values in column A ('1' in your case) are essentially strings, which you need to parse back into lists.
import pandas as pd
import ast
df = pd.DataFrame({"A": ["['a','b']", "['c']"], "B": [1, 2]})
df["A"] = df["A"].apply(ast.literal_eval)
Now the following works:
df.explode("A")

Use Pandas to extract the values from a column based on some condition

I'm trying to pick a particular column from a CSV file using Python's Pandas module: I would like to fetch the Hostname when the Group column is SJ or DC.
Below is what I'm trying but it's not printing anything:
import csv
import pandas as pd
pd.set_option('display.height', 500)
pd.set_option('display.max_rows', 5000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 500)
low_memory=False
data = pd.read_csv('splnk.csv', usecols=['Hostname', 'Group'])
for line in data:
    if 'DC' and 'SJ' in line:
        print(line)
The data variable contains the values for Hostname & Group columns as follows:
11960 NaN DB-Server
11961 DC Sap-Server
11962 SJ comput-server
Note: when printing the data, the output is truncated and does not show the complete data.
PS: I have used the pandas.set_option to get the complete data on the terminal!
for line in data: doesn't iterate over row contents, it iterates over the column names. Pandas has several good ways to filter rows by the contents of a column.
For example, you can use Series.isin() to select the rows matching one of several values:
print(data[data['Group'].isin(['DC', 'SJ'])]['Hostname'])
If it's important that you iterate over rows, you can use df.iterrows():
for index, row in data.iterrows():
    if row['Group'] == 'DC' or row['Group'] == 'SJ':
        print(row['Hostname'])
If you're just getting started with Pandas, I'd recommend trying a tutorial to get familiar with the basic structure.
Try this:
import pandas as pd

data = pd.read_csv('splnk.csv', usecols=['Hostname', 'Group'], low_memory=False)
hostnames = data[(data['Group'] == 'DC') | (data['Group'] == 'SJ')]['Hostname']  # corrected `hostname` to `Hostname`
print(hostnames)
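A self-contained sketch of the isin() approach on inline data shaped like the snippet above (the sample rows are assumptions for illustration):
import pandas as pd
# Sample data mirroring the Group/Hostname snippet in the question (assumed).
data = pd.DataFrame({
    'Group': [None, 'DC', 'SJ'],
    'Hostname': ['DB-Server', 'Sap-Server', 'comput-server'],
})
# Keep the rows whose Group is DC or SJ, then take the Hostname column.
hostnames = data.loc[data['Group'].isin(['DC', 'SJ']), 'Hostname']
print(hostnames.tolist())  # ['Sap-Server', 'comput-server']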

Unable to Parse pandas Series to datetime

I'm importing a CSV file which contains a datetime column. After importing the CSV, my data frame contains the Date column, whose type is pandas.Series. I need to add another column that will contain the weekday:
import pandas as pd
from datetime import datetime
data = pd.read_csv("C:/Users/HP/Desktop/Fichiers/Proj/CONSOMMATION_1h.csv")
print(data.head())
all the data are okay, but when I do the following:
data['WDay'] = pd.to_datetime(data['Date'])
print(type(data['WDay']))
# the output is
<class 'pandas.core.series.Series'>
the data is not converted to datetime, so I can't get the weekday.
The problem is that you need dt.weekday, accessed through the .dt accessor:
data['WDay'] = data['WDay'].dt.weekday
Without .dt, weekday is used on a DatetimeIndex (not your case) - DatetimeIndex.weekday:
data['WDay'] = data.index.weekday
Use data.dtypes to check the types of the columns. Note that type(data['WDay']) will always report pandas.core.series.Series; what matters is the column's dtype, which should be datetime64[ns] after pd.to_datetime.
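A compact sketch of the full conversion (the sample dates are assumptions standing in for the CSV's Date column):
import pandas as pd
# Sample data standing in for the imported CSV (assumption for illustration).
data = pd.DataFrame({'Date': ['2021-03-01 00:00', '2021-03-02 00:00']})
data['WDay'] = pd.to_datetime(data['Date']).dt.weekday  # Monday=0 ... Sunday=6
print(data.dtypes)  # check the column dtypes, as suggested above
print(data)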
