Modify a column in Python such that the numbering is continuous - python-3.x

I have the following dataset:
#Load the required libraries
import pandas as pd
#Create dataset
data = {'team': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
'Married': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
'Self_Employed': ['No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
'LoanAmount': [123, 128, 66, 120, 141, 52,96,15,85,36,58,89],
}
#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)
The dataset looks like this:
Here, the numbering in the 'Run_time' column restarts at 1 several times.
I want the 'Run_time' column to start from 1 and count up continuously.
The dataset needs to look like this:
Can somebody please let me know how to modify this column in Python such that the numbering is continuous?

import pandas as pd
#Create dataset
data = {'team': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
'Married': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
'Self_Employed': ['No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
'LoanAmount': [123, 128, 66, 120, 141, 52,96,15,85,36,58,89],
}
#Convert to dataframe
df = pd.DataFrame(data)
# Overwrite Run_time with 1..n; this relies on df having the default RangeIndex (0..n-1)
df.Run_time = df.index + 1
print(df)
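The answer above works because the DataFrame still has its default 0..n-1 index. A variant that numbers rows purely by position, so it also holds after filtering or reindexing, is to assign a range of the right length. This is a minimal sketch; the letter index below is a hypothetical stand-in for a non-default index, and two of the question's columns are omitted for brevity.

```python
import pandas as pd

# The question's data (Married and Self_Employed omitted), given a hypothetical
# non-default index to show that the numbering does not depend on index labels
df = pd.DataFrame(
    {
        'team': ['A'] * 12,
        'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
        'LoanAmount': [123, 128, 66, 120, 141, 52, 96, 15, 85, 36, 58, 89],
    },
    index=list('abcdefghijkl'),
)

# Number the rows 1..n by position
df['Run_time'] = range(1, len(df) + 1)
print(df)
```

With this index, `df.index + 1` would fail because the labels are strings, while the range assignment still produces 1 through 12.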

Related

Create an additional column in a DataFrame based on a specific condition

I have the following dataset:
#Load the required libraries
import pandas as pd
#Create dataset
data = {'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
'Married': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
'Self_Employed': ['No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
'LoanAmount': [123, 128, 66, 120, 141, 52,96,15,85,36,58,89],
}
#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)
Here, I want to add a column 'Last_entry' that contains 0s and 1s.
For team A, the last run time is 5, so that row gets Last_entry = 1 and all other rows for team A get 0.
For team B, the last run time is 3, so that row gets Last_entry = 1 and all other rows for team B get 0.
For team C, the last run time is 4, so that row gets Last_entry = 1 and all other rows for team C get 0.
The result needs to look like this:
[Image: new DataFrame with the additional column]
Can somebody please let me know how to add this column to the existing dataset in Python?
You can use groupby and tail to get the last entry for each team. Then make a new column of zeroes, and set the resulting rows to one:
# Determine the indices of the last row for each team (in current row order)
last_entry_idx = df.groupby('team').tail(1).index
# Create the new column: 0 everywhere, then 1 on each team's last row
df['Last_entry'] = 0
df.loc[last_entry_idx, 'Last_entry'] = 1
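`tail(1)` picks each team's last row by position, which matches the question because rows are already ordered by run time. If that ordering is not guaranteed, one alternative (a sketch, not the answer above) is to flag the row holding each team's largest Run_time via `groupby(...).transform('max')`. It assumes each team's maximum run time is unique; tied rows would all get 1.

```python
import pandas as pd

# The question's team and Run_time columns (others omitted for brevity)
df = pd.DataFrame({
    'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
})

# Largest Run_time within each team, broadcast back to every row of that team
team_max = df.groupby('team')['Run_time'].transform('max')

# 1 where the row holds its team's largest Run_time, else 0
df['Last_entry'] = (df['Run_time'] == team_max).astype(int)
print(df)
```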

Make a list inside the dictionary in python

I have a DataFrame like the one below, and I want to build a dictionary whose values are lists. Can you please help me get the expected output?
You can use the handy groupby function in pandas:
import pandas as pd
df = pd.DataFrame({
'Department': ['y1', 'y1', 'y1', 'y2', 'y2', 'y2'],
'Section': ['A', 'B', 'C', 'A', 'B', 'C'],
'Cost': [10, 20, 30, 40, 50, 60]
})
output = {dept: group['Cost'].tolist() for dept, group in df.groupby('Department')}
gives
{'y1': [10, 20, 30], 'y2': [40, 50, 60]}
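The same dictionary can also be built without an explicit comprehension by aggregating each group's Cost values into a list and converting the resulting Series to a dict. This is a sketch of an equivalent pandas idiom, not the answer's own method.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['y1', 'y1', 'y1', 'y2', 'y2', 'y2'],
    'Section': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Cost': [10, 20, 30, 40, 50, 60],
})

# Collect each department's Cost values into a list; the result is a Series
# indexed by Department, which to_dict() turns into a plain dictionary
output = df.groupby('Department')['Cost'].agg(list).to_dict()
print(output)
```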

How to Create a New List of Tuples from One List

sampleList = ['CustomerA', 'Yes', 'No', 'No', 'CustomerB', 'No', 'No', 'No', 'CustomerC', 'Yes', 'Yes', 'No']
Preferred output: [('CustomerA', 'Yes', 'No', 'No'), ('CustomerB', 'No', 'No', 'No'), ('CustomerC', 'Yes', 'Yes', 'No')]
I want to create a list of tuples from just one list.
Try this out.
sampleList = [
'CustomerA', 'Yes', 'No', 'No',
'CustomerB', 'No', 'No', 'No',
'CustomerC', 'Yes', 'Yes', 'No'
]
preferredOutput = [
tuple(sampleList[n : n + 4])
for n in range(0, len(sampleList), 4)
]
print(preferredOutput)
# OUTPUT (IN PRETTY FORM)
#
# [
# ('CustomerA', 'Yes', 'No', 'No'),
# ('CustomerB', 'No', 'No', 'No'),
# ('CustomerC', 'Yes', 'Yes', 'No')
# ]
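You can use a list comprehension: output=[tuple(sampleList[4*i:4*i+4]) for i in range(3)]. Note that range(3) hard-codes three customers; range(len(sampleList) // 4) works for any number of complete groups of four.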
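Another common way to chunk a flat list, shown here as an alternative to slicing, is to pass the same iterator to `zip` several times, so each tuple pulls four consecutive items. A sketch assuming the list length is a multiple of four; any incomplete trailing group is silently dropped.

```python
sampleList = ['CustomerA', 'Yes', 'No', 'No',
              'CustomerB', 'No', 'No', 'No',
              'CustomerC', 'Yes', 'Yes', 'No']

# One shared iterator: each zip step takes the next 4 items from it
it = iter(sampleList)
output = list(zip(it, it, it, it))
print(output)
# [('CustomerA', 'Yes', 'No', 'No'), ('CustomerB', 'No', 'No', 'No'), ('CustomerC', 'Yes', 'Yes', 'No')]
```

On Python 3.12 and later, `itertools.batched(sampleList, 4)` yields the same tuples and keeps a shorter final group instead of dropping it.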

Why is the output different in this dictionary comprehension?

When I execute the following code, it works as expected:
hassam="CHECK"
list1={i:hassam[i] for i in range(5)}
list1
output:
{0: 'C', 1: 'H', 2: 'E', 3: 'C', 4: 'K'}
But when I execute this:
hassam="CHECK"
list1={hassam[i]:i for i in range(5)}
list1
output:
{'C': 3, 'H': 1, 'E': 2, 'K': 4}
Why isn't it this:
{'C': 1, 'H': 2, 'E': 3,'C' : 4 ,'K': 5}
In the dictionary
{0: 'C', 1: 'H', 2: 'E', 3: 'C', 4: 'K'}
the keys are the numbers 0 to 4, which are all distinct.
But in the dictionary
{'C': 1, 'H': 2, 'E': 3, 'C': 4, 'K': 5}
'C' would appear twice as a key, and Python doesn't allow duplicate keys. So when 'C' comes up the second time, the existing key is updated with the new value instead of a second 'C' being added.
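Writing the comprehension out as an explicit loop shows why 'C' ends up with 3 yet still appears first: reassigning an existing key replaces its value but keeps the key's original insertion position (dicts preserve insertion order as of Python 3.7). A small illustrative sketch using the question's variables.

```python
hassam = "CHECK"

# Same assignments, in the same order, as {hassam[i]: i for i in range(5)}
list1 = {}
for i in range(5):
    list1[hassam[i]] = i  # at i == 3, 'C' is reassigned: 0 is replaced by 3

print(list1)  # {'C': 3, 'H': 1, 'E': 2, 'K': 4}
```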
This happens because a dictionary cannot hold the same key twice with different values. So try making list1 a list instead, like:
list1=[{hassam[i]:i} for i in range(5)]
This would give:
[{'C': 0}, {'H': 1}, {'E': 2}, {'C': 3}, {'K': 4}]
Or use a list of tuples instead of individual dictionaries:
list1=[(hassam[i],i) for i in range(5)]
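If the goal is to keep a dictionary keyed by character while not losing repeated positions, one option (an alternative to the lists above, not part of the original answers) is to map each character to a list of all its indices with `collections.defaultdict`.

```python
from collections import defaultdict

hassam = "CHECK"

# Map each character to every index where it occurs, so duplicates are kept
positions = defaultdict(list)
for i, ch in enumerate(hassam):
    positions[ch].append(i)

print(dict(positions))  # {'C': [0, 3], 'H': [1], 'E': [2], 'K': [4]}
```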

How to properly recombine grouped observables?

I'm trying to create a tool for analysing stock prices.
I've got a stream of price data for different stocks, and I want to have an observable to emit events whenever it receives a new, distinct and complete set of prices.
My plan: grouping the stream into different sub-streams for different stocks, and recombining their latest values.
Let's say I've got a stream of events like this:
from rx import Observable
stock_events = [
{'stock': 'A', 'price': 15},
{'stock': 'A', 'price': 16},
{'stock': 'B', 'price': 24},
{'stock': 'C', 'price': 37},
{'stock': 'A', 'price': 18},
{'stock': 'D', 'price': 42},
{'stock': 'B', 'price': 27},
{'stock': 'B', 'price': 27},
{'stock': 'C', 'price': 31},
{'stock': 'D', 'price': 44}
]
price_source = Observable.from_list(stock_events)
Here is my first (naive) approach:
a_source = price_source.filter(lambda x: x['stock'] == 'A').distinct_until_changed()
b_source = price_source.filter(lambda x: x['stock'] == 'B').distinct_until_changed()
c_source = price_source.filter(lambda x: x['stock'] == 'C').distinct_until_changed()
d_source = price_source.filter(lambda x: x['stock'] == 'D').distinct_until_changed()
(Observable
.combine_latest(a_source, b_source, c_source, d_source, lambda *x: x)
.subscribe(print))
This correctly gives me:
({'stock': 'A', 'price': 18}, {'stock': 'B', 'price': 24}, {'stock': 'C', 'price': 37}, {'stock': 'D', 'price': 42})
({'stock': 'A', 'price': 18}, {'stock': 'B', 'price': 27}, {'stock': 'C', 'price': 37}, {'stock': 'D', 'price': 42})
({'stock': 'A', 'price': 18}, {'stock': 'B', 'price': 27}, {'stock': 'C', 'price': 31}, {'stock': 'D', 'price': 42})
({'stock': 'A', 'price': 18}, {'stock': 'B', 'price': 27}, {'stock': 'C', 'price': 31}, {'stock': 'D', 'price': 44})
Yet, I feel that this should be better handled by group_by, instead of several filterings, so here's a re-write:
(price_source
.group_by(lambda e: e['stock'])
.map(lambda obs: obs.distinct_until_changed())
.combine_latest(lambda *x: x)
.subscribe(print))
But this time, I get:
(<rx.core.anonymousobservable.AnonymousObservable object at 0x000000000105EA20>,)
(<rx.core.anonymousobservable.AnonymousObservable object at 0x000000000776AB00>,)
(<rx.core.anonymousobservable.AnonymousObservable object at 0x000000000776A438>,)
(<rx.core.anonymousobservable.AnonymousObservable object at 0x000000000775E7F0>,)
What have I missed here? How do I "unwrap" the nested observables?
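If you did want to use GroupBy, it would look something like the C# below. What your group_by version is missing is a flattening step: GroupBy emits one inner observable per key, and combine_latest was applied to that outer stream alone, so each emission is a one-element tuple holding an inner observable (hence the AnonymousObservable objects in your output). SelectMany (flat_map in RxPY) subscribes to each group and merges their elements into a single stream. Note that this doesn't meet your requirement of a "complete" set, though. As discussed in the comments, I suspect CombineLatest is the better fit here.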
price_source.GroupBy(x => x.Stock)
.Select(gp => gp.DistinctUntilChanged(x => x.Price))
.SelectMany(x => x)
.Subscribe(s => Console.WriteLine($"{s.Stock} : {s.Price}"));
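To pin down what the combine_latest pipeline is expected to emit, here is a plain-Python sketch of the same semantics, not RxPY code. The generator name `complete_price_sets` is hypothetical, and it assumes the set of stocks is known up front (as the filter-based version also does). It applies a per-stock "distinct until changed" check, then emits the latest event for every stock once all of them have been seen.

```python
stock_events = [
    {'stock': 'A', 'price': 15},
    {'stock': 'A', 'price': 16},
    {'stock': 'B', 'price': 24},
    {'stock': 'C', 'price': 37},
    {'stock': 'A', 'price': 18},
    {'stock': 'D', 'price': 42},
    {'stock': 'B', 'price': 27},
    {'stock': 'B', 'price': 27},
    {'stock': 'C', 'price': 31},
    {'stock': 'D', 'price': 44},
]


def complete_price_sets(events, stocks):
    """Hypothetical helper: yield a tuple of the latest event per stock each
    time some stock's value changes, once every stock has been seen."""
    latest = {}
    for event in events:
        stock = event['stock']
        # Like distinct_until_changed per stock: skip an unchanged repeat
        if latest.get(stock) == event:
            continue
        latest[stock] = event
        # Like combine_latest: emit only when every stock has a value
        if all(s in latest for s in stocks):
            yield tuple(latest[s] for s in stocks)


for combo in complete_price_sets(stock_events, ['A', 'B', 'C', 'D']):
    print(combo)
```

Run on the question's events, this prints the same four tuples as the filter-based combine_latest version, which makes it a handy reference when checking a group_by rewrite.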
