How do I use .loc in a DataFrame to separate out time values? Examples would be great

How do I use .loc in a DataFrame to separate out time values? Examples would be great!
This is what my datetime looks like
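Since the datetime sample didn't come through, here is a minimal sketch with a made-up DatetimeIndex showing the usual patterns: .loc accepts partial date strings and slices, and between_time handles time-of-day selection.

import pandas as pd

# Made-up data standing in for the missing sample
df = pd.DataFrame(
    {'value': [1, 2, 3, 4]},
    index=pd.to_datetime(['2021-01-05 09:30', '2021-01-05 16:00',
                          '2021-02-10 09:30', '2021-02-10 16:00']),
)

jan = df.loc['2021-01']                      # partial string: all of January 2021
window = df.loc['2021-01-05':'2021-02-09']   # inclusive date-range slice

# To separate by time of day, between_time is simpler than .loc
mornings = df.between_time('09:00', '12:00')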

Related

Reshape dataframe type. GroupBy or pivot table or for i in?

I have a dataframe that looks like the one below.
I'd like to convert it to the table type below. Pivot? for i in?
Any advice on getting this done?
I can't replicate your data because of the way you posted it (please don't do that again), but try something along these lines, using pd.pivot_table:
import pandas as pd

# Pivot long-format rows into one column per month, summing the amounts
new = pd.pivot_table(df, index=['city', 'class', 'sort'],
                     columns='month', values='amount', aggfunc='sum')
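As a self-contained check (toy numbers, since the original data couldn't be replicated), the result has one column per month; chain .reset_index() if you want city/class/sort back as ordinary columns:

# Toy frame with the same assumed column names
df = pd.DataFrame({
    'city':   ['A', 'A', 'B', 'B'],
    'class':  ['x', 'x', 'y', 'y'],
    'sort':   [1, 1, 2, 2],
    'month':  ['Jan', 'Feb', 'Jan', 'Feb'],
    'amount': [10, 20, 30, 40],
})
new = pd.pivot_table(df, index=['city', 'class', 'sort'],
                     columns='month', values='amount', aggfunc='sum')
print(new.reset_index())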

Is there a way to get the oldest date of a Pandas dataframe when not all columns are dates?

Just to give a little background, I'm trying to loop through the dataframe and query each Asset by its oldest date. Then I can trim the data with Pandas locally for each of the items that make up the asset.
The dataframe that I'm trying to loop through looks something like this:
I've tried using query_date = df.min(axis=1) but it picks up the values rather than the dates.
query_date will be the start date for each query that will be inside a for loop.
Thanks in advance for the help
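One common fix, sketched here with made-up column names: restrict the row-wise minimum to the datetime columns with select_dtypes, so the numeric values are ignored. This assumes the date columns are real datetime64 dtypes.

import pandas as pd

# Hypothetical frame; column names are assumptions
df = pd.DataFrame({
    'asset':      ['A', 'B'],
    'first_seen': pd.to_datetime(['2020-01-01', '2019-06-15']),
    'last_seen':  pd.to_datetime(['2021-03-01', '2020-12-31']),
    'value':      [10.5, 7.2],
})

# Row-wise oldest date, ignoring non-datetime columns
query_date = df.select_dtypes(include='datetime').min(axis=1)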

Groupby Strings

I'm trying to create a new dataframe from an existing dataframe. I tried groupby, but it didn't seem to sum up the strings as a whole number; instead it returned many strings (colleges in this case).
This is the original dataframe.
I tried groupby to get the number (a whole number) of colleges, but it returned many colleges (strings) instead.
How do I return the number of colleges as an integer in the new column 'totalPlayer'? Please help.
Hoping I understand your question correctly: you need to count the distinct values in the college column.
Assuming df is the name of your data frame, the code below will help.
# Distinct college count as an integer
df['college'].nunique()
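If the goal is instead a player count per college in a new 'totalPlayer' column, a sketch along these lines may help (the column names here are assumed, since the original frame wasn't shown):

# Count rows (players) per college; 'player' is an assumed column name
new_df = (
    df.groupby('college')['player']
      .count()
      .reset_index(name='totalPlayer')
)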
Helpful links:
Counting unique values in a column in pandas dataframe like in Qlik?
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.nunique.html

How to automatically index DataFrame created from groupby in Pandas

Using the Kiva Loan_Data from Kaggle, I aggregated the loan amounts by country. Pandas allows them to be easily turned into a DataFrame, but it indexes on the country data. reset_index can be used to create a numerical/sequential index, but I'm guessing I'm adding an unnecessary step. Is there a way to create an automatic default index when creating a DataFrame like this?
Use as_index=False in groupby (see the split-apply-combine docs):

# Aggregates per country while keeping a default integer index
df.groupby('country', as_index=False)['loan_amount'].sum()
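A quick self-contained demonstration (toy numbers; the column names come from the answer above):

import pandas as pd

df = pd.DataFrame({
    'country': ['Kenya', 'Kenya', 'Peru'],
    'loan_amount': [100, 250, 400],
})

# as_index=False keeps 'country' as a column and yields a default 0..n-1 index
print(df.groupby('country', as_index=False)['loan_amount'].sum())
#   country  loan_amount
# 0   Kenya          350
# 1    Peru          400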

A more efficient way of getting the nlargest values of a PySpark DataFrame

I am trying to get the top 5 values of a column of my dataframe.
A sample of the dataframe is given below. In fact the original dataframe has thousands of rows.
Row(item_id=u'2712821', similarity=5.0)
Row(item_id=u'1728166', similarity=6.0)
Row(item_id=u'1054467', similarity=9.0)
Row(item_id=u'2788825', similarity=5.0)
Row(item_id=u'1128169', similarity=1.0)
Row(item_id=u'1053461', similarity=3.0)
The solution I came up with is to sort the whole dataframe and then take the first 5 values (the code below does that):
items_of_common_users.sort(items_of_common_users.similarity.desc()).take(5)
I am wondering if there is a faster way of achieving this.
Thanks
You can use the RDD.top method with a key:
from operator import attrgetter
df.rdd.top(5, attrgetter("similarity"))
There is significant overhead in converting a DataFrame to an RDD, but it should be worth it.
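For comparison, a sketch that stays in the DataFrame API and avoids the conversion; recent Spark versions typically plan a sort followed by a limit as a single top-k step, so it may be competitive:

from pyspark.sql import functions as F

# Equivalent top-5 without leaving the DataFrame API
items_of_common_users.orderBy(F.desc('similarity')).limit(5).collect()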
