How can I sum columns in a DataFrame for each date in time series data - python-3.x

Here's the example:
'''
import pandas as pd

df = pd.DataFrame({'Country': ['United States', 'China', 'Italy', 'spain'],
                   '2020-01-01': [0, 2, 1, 0],
                   '2020-01-02': [1, 0, 1, 2],
                   '2020-01-03': [0, 3, 2, 0]})
df
'''
I want to sum the values of the columns by date so that each next column holds the running total, which means 2020-01-02 gets a new value of (2020-01-01 + 2020-01-02), and so on.

Convert the Country column to the index with DataFrame.set_index and apply DataFrame.cumsum across rows with axis=1:
df = df.set_index('Country').cumsum(axis=1)
print (df)
               2020-01-01  2020-01-02  2020-01-03
Country
United States           0           1           1
China                   2           2           5
Italy                   1           2           4
spain                   0           2           2
Or select all columns except the first with DataFrame.iloc before cumsum:
df.iloc[:, 1:] = df.iloc[:, 1:].cumsum(axis=1)
print (df)
         Country  2020-01-01  2020-01-02  2020-01-03
0  United States           0           1           1
1          China           2           2           5
2          Italy           1           2           4
3          spain           0           2           2
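For reference, a self-contained sketch, assuming only pandas, that reproduces both approaches end to end:
import pandas as pd

df = pd.DataFrame({'Country': ['United States', 'China', 'Italy', 'spain'],
                   '2020-01-01': [0, 2, 1, 0],
                   '2020-01-02': [1, 0, 1, 2],
                   '2020-01-03': [0, 3, 2, 0]})

# Option 1: move Country to the index, then take the cumulative sum across the date columns.
out1 = df.set_index('Country').cumsum(axis=1)

# Option 2: keep Country as a regular column and cumsum only the date columns.
out2 = df.copy()
out2.iloc[:, 1:] = out2.iloc[:, 1:].cumsum(axis=1)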

Related

Python pandas move cell value to another cell in same row

I have a DataFrame like this:
id  Description   Price      Unit
1   Test Only     1254       12
2   Data test     Fresher    4
3   Sample        3569       1
4   Sample Onces  Code test
5   Sample        245        2
I want to move the value from the Price column into the Description column on its left whenever it is not an integer, and have the Price become NaN. There is no specific word to match; the only rule is that if the Price column holds a non-integer value, that string value moves to the Description column.
I already tried pandas replace and concat, but neither worked.
Desired output is like this:
id  Description  Price  Unit
1   Test Only    1254   12
2   Fresher             4
3   Sample       3569   1
4   Code test
5   Sample       245    2
This should work:
import numpy as np
import pandas as pd

# data
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'Description': ['Test Only', 'Data test', 'Sample', 'Sample Onces', 'Sample'],
                   'Price': ['1254', 'Fresher', '3569', 'Code test', '245'],
                   'Unit': [12, 4, 1, np.nan, 2]})
# convert price column to numeric and coerce errors
price = pd.to_numeric(df.Price, errors='coerce')
# for rows where price is not numeric, replace description with these values
df.Description = df.Description.mask(price.isna(), df.Price)
# assign numeric price to price column
df.Price = price
df
Use:
# convert values to numeric
price = pd.to_numeric(df['Price'], errors='coerce')
# test for missing values
m = price.isna()
# shift only the matched rows
df.loc[m, ['Description','Price']] = df.loc[m, ['Description','Price']].shift(-1, axis=1)
print (df)
   id Description Price
0   1   Test Only  1254
1   2     Fresher   NaN
2   3      Sample  3569
3   4   Code test   NaN
4   5      Sample   245
If you need numeric values in the output Price column:
df = df.assign(Price=price)
print (df)
   id Description   Price
0   1   Test Only  1254.0
1   2     Fresher     NaN
2   3      Sample  3569.0
3   4   Code test     NaN
4   5      Sample   245.0

Count positive, negative, and zero values for multiple columns in Python

Given a dataset as follows:
[{'id': 1, 'ltp': 2, 'change': nan},
{'id': 2, 'ltp': 5, 'change': 1.5},
{'id': 3, 'ltp': 3, 'change': -0.4},
{'id': 4, 'ltp': 0, 'change': 2.0},
{'id': 5, 'ltp': 5, 'change': -0.444444},
{'id': 6, 'ltp': 16, 'change': 2.2}]
Or
   id  ltp    change
0   1    2       NaN
1   2    5  1.500000
2   3    3 -0.400000
3   4    0  2.000000
4   5    5 -0.444444
5   6   16  2.200000
I would like to count the number of positive, negative, and zero values for the columns ltp and change; the result may look like this:
  columns  positive  negative  zero
0     ltp         5         0     1
1  change         3         2     0
How could I do that with Pandas or Numpy? Thanks.
Updated: what if I need to group by type and count following the same logic as above?
   id  ltp    change type
0   1    2       NaN    a
1   2    5  1.500000    a
2   3    3 -0.400000    a
3   4    0  2.000000    b
4   5    5 -0.444444    b
5   6   16  2.200000    b
The expected output:
  type columns  positive  negative  zero
0    a     ltp         3         0     0
1    a  change         1         1     0
2    b     ltp         2         0     1
3    b  change         2         1     0
Use np.sign on the selected columns first, then count the values with value_counts, transpose, replace missing values, rename the column names via a dictionary, and finally convert the index to a columns column:
d = {-1:'negative', 1:'positive', 0:'zero'}
df = (np.sign(df[['ltp','change']])
        .apply(pd.value_counts)
        .T
        .fillna(0)
        .astype(int)
        .rename(columns=d)
        .rename_axis('columns')
        .reset_index())
print (df)
  columns  negative  zero  positive
0     ltp         0     1         5
1  change         2     0         3
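Note that in newer pandas (2.1+) the top-level pd.value_counts is deprecated; if that applies to your version, the same pipeline can call Series.value_counts per column instead (a sketch assuming the same df and mapping d as above):
df = (np.sign(df[['ltp','change']])
        .apply(lambda s: s.value_counts())   # same result, without the deprecated pd.value_counts
        .T
        .fillna(0)
        .astype(int)
        .rename(columns=d)
        .rename_axis('columns')
        .reset_index())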
EDIT: Another solution, for the case with the type column, uses DataFrame.melt, maps the values through np.sign, and counts them with crosstab:
d = {-1:'negative', 1:'positive', 0:'zero'}
df1 = df.melt(id_vars='type', value_vars=['ltp','change'], var_name='columns')
df1['value'] = np.sign(df1['value']).map(d)
df1 = (pd.crosstab([df1['type'], df1['columns']], df1['value'])
         .rename_axis(columns=None)
         .reset_index())
print (df1)
  type columns  negative  positive  zero
0    a  change         1         1     0
1    a     ltp         0         3     0
2    b  change         1         2     0
3    b     ltp         0         2     1

How to normalize an entity that has multiple values for one feature in featuretools?

Below is an example:
import pandas as pd

buy_log_df = pd.DataFrame(
    [
        ["2020-01-02", 0, 1, 2, 2],
        ["2020-01-02", 1, 1, 1, 3],
        ["2020-01-02", 2, 2, 1, 1],
        ["2020-01-02", 3, 3, 3, 1],
    ],
    columns=['date', 'sale_id', 'customer_id', 'item_id', 'quantity']
)
item_df = pd.DataFrame(
    [
        [1, 100],
        [2, 200],
        [3, 300],
    ],
    columns=['item_id', 'price']
)
item_df2 = pd.DataFrame(
    [
        [1, '1 3 10'],
        [2, '1 3'],
        [3, '2 5'],
    ],
    columns=['item_id', 'tags']
)
As you can see here, each item has multiple tag values stored as a single feature (the tags column of item_df2).
Here is what I've tried:
item_df2 = pd.concat([item_df2, item_df2['tags'].str.split(expand=True)], axis=1)
item_df2 = pd.melt(
    item_df2,
    id_vars=['item_id'],
    value_vars=[0, 1, 2],
    value_name="tags"
)
tag_log_df = item_df2[item_df2['tags'].notna()].drop("variable", axis=1).sort_values("item_id")
tag_log_df
>>>
   item_id tags
0        1    1
3        1    3
6        1   10
1        2    1
4        2    3
2        3    2
5        3    5
It looks like I can't normalize this item entity (from buy_log entity) because it has multiple duplicated item_ids in the table.
How can I handle this case when I design the entityset?
Thanks for the question. To handle multiple tag values, you can normalize the tags into a data frame before structuring the entity set.
buy_log_df
         date  sale_id  customer_id  item_id  quantity
   2020-01-02        0            1        2         2
   2020-01-02        1            1        1         3
   2020-01-02        2            2        1         1
   2020-01-02        3            3        3         1
item_df
   item_id  price
         1    100
         2    200
         3    300
tag_log_df
   item_id  tags
         1     1
         1     3
         1    10
         2     1
         2     3
         3     2
         3     5
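If it helps, tag_log_df can also be produced with str.split plus DataFrame.explode (pandas >= 0.25), avoiding the melt/notna round trip from the question; a minimal sketch:
import pandas as pd

item_df2 = pd.DataFrame(
    [
        [1, '1 3 10'],
        [2, '1 3'],
        [3, '2 5'],
    ],
    columns=['item_id', 'tags']
)

# Split the space-separated tags into lists, then give each tag its own row.
tag_log_df = (item_df2.assign(tags=item_df2['tags'].str.split())
                      .explode('tags')
                      .reset_index(drop=True))
Note that the exploded tags stay as strings; cast them if numeric tags are needed.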
With the normalized data, you can then structure the entity set.
import featuretools as ft

es = ft.EntitySet()
es.entity_from_dataframe(
    entity_id='buy_log',
    dataframe=buy_log_df,
    index='sale_id',
    time_index='date',
)
es.entity_from_dataframe(
    entity_id='item',
    dataframe=item_df,
    index='item_id',
)
es.entity_from_dataframe(
    entity_id='tag_log',
    dataframe=tag_log_df,
    index='tag_log_id',
    make_index=True,
)
parent = es['item']['item_id']
child = es['buy_log']['item_id']
es.add_relationship(ft.Relationship(parent, child))
child = es['tag_log']['item_id']
es.add_relationship(ft.Relationship(parent, child))
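From there, a quick sanity check, continuing from the entity set built above and assuming the featuretools 0.x API used in this answer, is to run deep feature synthesis against the item entity:
# Aggregate buy_log and tag_log features up to each item.
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity='item')
print(feature_matrix.head())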

Binning with pd.cut beyond the range (replacing NaN with "<min_val" or ">max_val")

df= pd.DataFrame({'days': [0,31,45,35,19,70,80 ]})
df['range'] = pd.cut(df.days, [0,30,60])
df
Here the code is reproduced, where pd.cut is used to convert a numerical column to a categorical one. pd.cut bins according to the list passed, [0, 30, 60]. Rows 0, 5, and 6 are categorized as NaN because their values fall outside [0, 30, 60]. What I want is for 0 to be categorized as "<0", and for 70 and 80 to be categorized as ">60". If possible, I would also like dynamic text labels A, B, C, D, E depending on the number of categories created.
For the first part, adding -np.inf and np.inf to the bins will ensure that everything gets a bin:
In [5]: df = pd.DataFrame({'days': [0, 31, 45, 35, 19, 70, 80]})
   ...: df['range'] = pd.cut(df.days, [-np.inf, 0, 30, 60, np.inf])
   ...: df
   ...:
Out[5]:
   days         range
0     0   (-inf, 0.0]
1    31  (30.0, 60.0]
2    45  (30.0, 60.0]
3    35  (30.0, 60.0]
4    19    (0.0, 30.0]
5    70    (60.0, inf]
6    80    (60.0, inf]
For the second, you can use .cat.codes to get the bin index and do some tweaking from there:
In [8]: df['range'].cat.codes.apply(lambda x: chr(x + ord('A')))
Out[8]:
0 A
1 C
2 C
3 C
4 B
5 D
6 D
dtype: object
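Alternatively, if the goal is just letter labels that scale with however many bins there are, pd.cut also accepts a labels argument directly; a minimal sketch assuming the same inf-padded bins:
import string

import numpy as np
import pandas as pd

bins = [-np.inf, 0, 30, 60, np.inf]
# One letter per bin, regardless of how many bins are passed.
labels = list(string.ascii_uppercase[:len(bins) - 1])   # ['A', 'B', 'C', 'D']

df = pd.DataFrame({'days': [0, 31, 45, 35, 19, 70, 80]})
df['range'] = pd.cut(df.days, bins, labels=labels)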

Drop a column in pandas if all values equal 1?

How do I drop columns in pandas where all values in that column are equal to a particular number? For instance, consider this dataframe:
df = pd.DataFrame({'A': [1, 1, 1, 1],
                   'B': [0, 1, 2, 3],
                   'C': [1, 1, 1, 1]})
print(df)
Output:
   A  B  C
0  1  0  1
1  1  1  1
2  1  2  1
3  1  3  1
How would I drop the all-1 columns so that the output is:
B
0 0
1 1
2 2
3 3
Use DataFrame.loc with a test for at least one non-1 value per column, via DataFrame.ne with DataFrame.any:
df1 = df.loc[:, df.ne(1).any()]
Or test for 1 with DataFrame.eq, check for all True per column with DataFrame.all, and invert the mask with ~:
df1 = df.loc[:, ~df.eq(1).all()]
print (df1)
B
0 0
1 1
2 2
3 3
EDIT:
One consideration is what you want to happen if a column contains only NaN and 1.
Then replace the NaNs with 0 using DataFrame.fillna and apply the same solutions as before:
df1 = df.loc[:, df.fillna(0).ne(1).any()]
df1 = df.loc[:, ~df.fillna(0).eq(1).all()]
You can use any:
df.loc[:, df.ne(1).any()]
One consideration is what you want to happen if a column contains only NaN and 1.
If you want to drop the column in that case as well, you will need to either fillna with 1 or add a new condition.
df = pd.DataFrame({'A': [1, 1, 1, 1],
                   'B': [0, 1, 2, 3],
                   'C': [1, 1, 1, np.nan]})
print(df)
   A  B    C
0  1  0  1.0
1  1  1  1.0
2  1  2  1.0
3  1  3  NaN
Both of these keep the column that contains only NaN and 1's:
df.loc[:, df.ne(1).any()]
df.loc[:, ~df.eq(1).all()]
So you can add this extra condition to drop that column as well:
df.loc[:, ~(df.eq(1) | df.isna()).all()]
Output:
B
0 0
1 1
2 2
3 3
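Putting that together as a self-contained check on the NaN example above (a sketch with the same column names):
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1],
                   'B': [0, 1, 2, 3],
                   'C': [1, 1, 1, np.nan]})

# Drop columns whose values are all 1 or NaN; only B survives.
print(df.loc[:, ~(df.eq(1) | df.isna()).all()])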
