Python Pandas Merge two CSV based on Time Stamp - python-3.x

Could someone give me a tip on how to merge two CSV files based on a timestamp? Concat works, but I also need to align the data on a single timestamp column, DateTime. In the shell output snippet below, both DateTime columns are visible. Thank you.
import pandas as pd
import numpy as np
import datetime
WUdata = pd.read_csv('C:\\Users\\bbartling\\Documents\\Python\\test_SP data\\3rd GoRound k Nearest\\WB data\\WU\\WUdata.csv')
print(WUdata.describe())
print(WUdata.shape)
print(WUdata.columns)
print(WUdata.info())
kWdata = pd.read_csv('C:\\Users\\bbartling\\Documents\\Python\\test_SP data\\3rd GoRound k Nearest\\WB data\\WU\\kWdata.csv')
print(kWdata.describe())
print(kWdata.shape)
print(kWdata.columns)
print(kWdata.info())
merged = pd.concat([WUdata, kWdata], axis=1)
print(merged)
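One way to line the two files up on a single timestamp is to merge on the shared column rather than concatenate side by side. The following is a minimal sketch, not the asker's actual data: only the DateTime column name comes from the question, while the TempF and kW columns and all values are made up for illustration.

```python
import io
import pandas as pd

# Stand-in CSV contents (hypothetical columns and values; only "DateTime" is from the question).
wu_csv = io.StringIO(
    "DateTime,TempF\n"
    "2020-01-01 00:00,30.1\n"
    "2020-01-01 01:00,29.8\n"
    "2020-01-01 02:00,29.5\n"
)
kw_csv = io.StringIO(
    "DateTime,kW\n"
    "2020-01-01 01:00,12.0\n"
    "2020-01-01 02:00,12.4\n"
    "2020-01-01 03:00,11.9\n"
)

# Parse the timestamp column as datetime so both frames share the same key dtype.
wu = pd.read_csv(wu_csv, parse_dates=["DateTime"])
kw = pd.read_csv(kw_csv, parse_dates=["DateTime"])

# Exact match: keep only timestamps present in both files, with one DateTime column.
merged = pd.merge(wu, kw, on="DateTime", how="inner").sort_values("DateTime")

# Approximate match: pair each weather row with the nearest power reading
# within 30 minutes (both frames must be sorted by the key).
nearest = pd.merge_asof(
    wu.sort_values("DateTime"),
    kw.sort_values("DateTime"),
    on="DateTime",
    direction="nearest",
    tolerance=pd.Timedelta("30min"),
)

print(merged)
print(nearest)
```

Use how="outer" instead of "inner" if rows that appear in only one file should be kept. The merge_asof variant may be the better fit when the two loggers do not record at exactly the same instants.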

Related

Get second column of a data frame using pandas

I am new to pandas in Python and I am having some difficulty returning the second column of a DataFrame that has no column names, only numbers as indexes.
import pandas as pd
import os
directory = 'A://'
sample = 'test.txt'
# Test with Air Sample
fileAir = os.path.join(directory,sample)
dataAir = pd.read_csv(fileAir,skiprows=3)
print(dataAir.iloc[:,1])
The data I am working with would be similar to:
data = [[1,2,3],[1,2,3],[1,2,3]]
Then, using pandas I wanted to have only
[[2,2,2]].
If the columns are labeled with integers (as they are when the frame has no header), you can use
dataframe_name[column_index].values
like
df[1].values
or
dataframe_name['column_name'].values
like
df['col1'].values
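As a small illustration using the sample data from the question, the sketch below builds a frame without column names and pulls out the second column both by position and by integer label. The variable names are chosen here for illustration only.

```python
import pandas as pd

data = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
df = pd.DataFrame(data)  # no names given, so columns are labeled 0, 1, 2

# By position: works regardless of how the columns are labeled.
second_by_position = df.iloc[:, 1].values

# By label: works here only because the column labels are the integers 0, 1, 2.
second_by_label = df[1].values

print(second_by_position)
print(second_by_label)
```

Positional access with iloc is the safer choice when the file may or may not carry a header row.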

How do I pull daily pytrends data for multiple keywords and save them to a .csv

I've locked myself out of pytrends trying to solve this. I found some help in an old post.
There are a few elements. First, I don't fully understand the documentation, e.g. what is the payload? When I run it, it doesn't seem to do anything. As a result, I'm working with a lot of copy-pasted code.
Second, I want to get keyword trend data for the year to date in a .csv file.
import pandas as pd
from pytrends.exceptions import ResponseError
from pytrends.request import TrendReq
import matplotlib.pyplot as plt
data = []
kw_list = ["maxi dresses", "black shorts"]
for kw in kw_list:
    kw_data = dailydata.get_daily_data(kw, 2020, 1, 2020, 4, geo = 'GB')
    data.append(kw_data)
data.to_csv(r"C:\Users\XXXX XXXXX\Documents\Python Files\PyTrends\trends_py.csv")
I also tried:
df =pytrends.get_historical_interest(kw_list, year_start=2020, month_start=1, day_start=1, year_end=2020, month_end=4, geo='GB', gprop='', sleep=0)
df = df.reset_index()
df.head(20)
Though for my purposes get_historical_interest is useless because it provides hourly data with lots of 0s. The hourly data also doesn't match trends.
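One likely obstacle in the first attempt is that data is a plain Python list, and a list has no to_csv method. A possible fix is to combine the per-keyword frames into one DataFrame first. The sketch below assumes each call returns a DataFrame indexed by date; to avoid hitting the Trends service it uses made-up stand-in frames with invented values in place of real results.

```python
import pandas as pd

kw_list = ["maxi dresses", "black shorts"]
dates = pd.date_range("2020-01-01", periods=3, freq="D")

# Stand-in for the per-keyword results (hypothetical values, one column per keyword).
data = [pd.DataFrame({kw: [10, 20, 30]}, index=dates) for kw in kw_list]

# Join the frames side by side on their shared date index.
combined = pd.concat(data, axis=1)
combined.index.name = "date"

# With no path, to_csv returns the CSV text; pass a file path to write to disk instead.
csv_text = combined.to_csv()
print(csv_text)
```

If the real frames carry extra columns beyond the keyword itself, selecting the wanted column from each frame before the concat keeps the output compact.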

Data Frame column to extract time with AM/PM format from datetime value

I am reading in some Excel data that contains datetime values stored as '8/13/2019 4:51:00 AM' and formatted as '4:51:00 AM' in Excel. I would like a data frame that converts the value to a timestamp formatted as '4:51 AM', roughly %I:%M %p in strftime notation.
I have tried using datetime strptime but I don't believe I have been using it correctly. None of my attempts have worked so I have left it out of the code below. The two columns I would like to convert are 'In Punch' and 'Out Punch'.
import pandas as pd
import pymssql
import numpy as np
import xlrd
import os
from datetime import datetime as dt
rpt = xlrd.open_workbook('OpenReport.xls', logfile=open(os.devnull,'w'))
rpt = pd.read_excel(rpt, skiprows=7)[['ID','Employee','Date/Time','In Punch','Out Punch',
'In Punch Comment','Out Punch Comment', 'Totaled Amount']]
rpt
Any suggestions will be greatly appreciated. Thanks
EDIT:
Working with the following modifications now.
rpt['In Punch'] = pd.to_datetime(rpt['In Punch']).dt.strftime('%I:%M %p')
rpt['Out Punch'] = pd.to_datetime(rpt['Out Punch']).dt.strftime('%I:%M %p')
Try working with datetime inside pandas. Convert Pandas Column to DateTime has some good suggestions that could help you out.
rpt['In Punch'] = pd.to_datetime(rpt['In Punch'])
Then you can do all sorts of lovely tweaks to a datetime. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
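For a concrete picture of what the conversion produces, here is a minimal sketch on made-up rows that follow the '8/13/2019 4:51:00 AM' pattern from the question. The explicit format string and the extra unpadded column are assumptions added for illustration. Note that strftime returns strings, not timestamps, and that %I zero-pads the hour.

```python
import pandas as pd

# Hypothetical sample rows in the same layout as the value quoted in the question.
rpt = pd.DataFrame({"In Punch": ["8/13/2019 4:51:00 AM", "8/13/2019 12:05:00 PM"]})

# Parse the text into real datetimes; stating the format avoids ambiguous guessing.
parsed = pd.to_datetime(rpt["In Punch"], format="%m/%d/%Y %I:%M:%S %p")

# Render as 12-hour time with AM/PM; the result is a column of strings.
rpt["In Punch"] = parsed.dt.strftime("%I:%M %p")

# Optional: drop the leading zero that %I adds to single-digit hours.
rpt["In Punch (no pad)"] = rpt["In Punch"].str.lstrip("0")

print(rpt)
```

Keeping the parsed datetime column around as well may be useful if the punches later need to be sorted or subtracted.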

Unable to Parse pandas Series to datetime

I'm importing a CSV file that contains a datetime column. After the import, my data frame contains a Date column whose type is pandas.Series, and I need another column that holds the weekday:
import pandas as pd
from datetime import datetime
data = pd.read_csv("C:/Users/HP/Desktop/Fichiers/Proj/CONSOMMATION_1h.csv")
print(data.head())
All the data are okay, but when I do the following:
data['WDay'] = pd.to_datetime(data['Date'])
print(type(data['WDay']))
# the output is
<class 'pandas.core.series.Series'>
the data is not converted to datetime, so I can't get the weekday.
The problem is that you need the .dt accessor to call weekday on a Series:
data['WDay'] = data['WDay'].dt.weekday
Without .dt, weekday applies to a DatetimeIndex (not your case), see DatetimeIndex.weekday:
data['WDay'] = data.index.weekday
Use data.dtypes to check the type of each column; type(data['WDay']) is always Series, whatever the column holds.
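A short sketch of the whole sequence, using made-up dates in place of the CSV from the question, shows that the conversion does succeed and that the weekday comes from the .dt accessor (Monday is 0, Sunday is 6).

```python
import pandas as pd

# Hypothetical stand-in for the Date column read from the CSV.
data = pd.DataFrame({"Date": ["2019-08-12 00:00", "2019-08-13 01:00", "2019-08-18 02:00"]})

# Convert the text to datetimes; the column is still a Series, but its dtype changes.
data["Date"] = pd.to_datetime(data["Date"])
print(data.dtypes)

# Extract the weekday number through the .dt accessor.
data["WDay"] = data["Date"].dt.weekday
print(data)
```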

Pandas: Generating a data frame from each spreadsheet in a large excel file

I have a large Excel file, made up of 92 sheets, which I have imported into pandas.
I want to use a loop or some tool to generate dataframes from the data in each spreadsheet (one dataframe per spreadsheet) and automatically name each dataframe.
I have only just started using pandas and Jupyter, so I am not very experienced at all.
This is the code I have so far:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime
%matplotlib inline
concdata = pd.ExcelFile('Documents/Research Project/Data-Ana/11July-27Dec.xlsx')
I also have a list of all the spreadsheet names:
#concdata.sheet_names
Thanks!
Instead of making each DataFrame its own variable you can assign each sheet a name in a Python dictionary like so:
dfs = {}
for sheet in concdata.sheet_names:
    dfs[sheet] = concdata.parse(sheet)
And then access each DataFrame with the sheet name:
dfs['sheet_name_here']
Doing it this way allows you to have amortised O(1) lookup of sheets.
