Need help moving data from one column to another - python-3.x

I'm new to Python and pandas and trying to figure this out.
I'm dealing with a data set that's pretty messy. There are 500 rows and 9 columns. In a few instances, data that should be in column 9 has ended up in column 8, along with the column 8 data.
...  Col 8            Col 9
0    2 weeks          No. 13
1    1 week           No. 2
2    12 weeks, No 1
3    15 weeks         No. 8
4    7 weeks, No. 1
How can I separate the data and move it to the proper column?
I applied split(), but I don't know how to move the result over.
I think I need to use apply(), but I'm not sure how.
Any suggestions?

You can split() with expand=True, then fillna() to fill the missing values:
df[['Col 8', 'Col 9']] = df['Col 8'].str.split(',', expand=True).fillna({1: df['Col 9']})
# Col 8 Col 9
# 0 2 weeks No. 13
# 1 1 week No. 2
# 2 12 weeks No 1
# 3 15 weeks No. 8
# 4 7 weeks No. 1
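A self-contained version of the same idea, as a minimal sketch. The sample frame is retyped from the question, and treating the empty Col 9 cells as missing values (None/NaN) is my assumption. It splits on the first comma only, strips the leftover space, and fills Col 9 from its original values wherever the split produced nothing:

```python
import pandas as pd

# Sample data retyped from the question; empty Col 9 cells are assumed to be missing (None).
df = pd.DataFrame({
    "Col 8": ["2 weeks", "1 week", "12 weeks, No 1", "15 weeks", "7 weeks, No. 1"],
    "Col 9": ["No. 13", "No. 2", None, "No. 8", None],
})

# Split on the first comma only; rows without a comma get a missing value in column 1.
parts = df["Col 8"].str.split(",", n=1, expand=True)

# Left part stays in Col 8; right part (where present) moves to Col 9.
df["Col 8"] = parts[0].str.strip()
df["Col 9"] = parts[1].str.strip().fillna(df["Col 9"])

print(df)
```

If the real data uses empty strings instead of NaN in Col 9, they would need to be converted to missing values first.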

Related

Iterating through a data frame and grouping values in a range

I have a pandas data frame of weekly data like this:
Week Val
1 11
2 11
3 11
4 11
5 9
6 9
7 9
8 9
I would like to create an output table like this:
Week 1 Week 2 Val
1 4 11
5 8 9
Apologies, I am quite new to Python and its iterative tools, and I am not sure how to solve this problem.
I tried to match each row against the next one, but I do not know how to go further:
df['Match'] = df['Val'].eq(df['Val'].shift(-1))
You want to groupby the consecutive blocks of Val. So you can use cumsum on the non-zero differences to get the block:
blocks = df['Val'].ne(df['Val'].shift(1)).cumsum()
(df.groupby(blocks, as_index=False)
   .agg(Week1=('Week', 'min'), Week2=('Week', 'max'), Val=('Val', 'first'))
)
Or you can chain:
(df.groupby(df['Val'].ne(df['Val'].shift(1)).cumsum(), as_index=False)
   .agg(Week1=('Week', 'min'), Week2=('Week', 'max'), Val=('Val', 'first'))
)
Output:
Week1 Week2 Val
0 1 4 11
1 5 8 9
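For anyone who wants to run this end to end, here is a minimal sketch with the sample data from the question. Renaming the block labels to "block" and dropping the group index afterwards are my own choices, made so the grouping key cannot clash with the Val column:

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({"Week": [1, 2, 3, 4, 5, 6, 7, 8],
                   "Val": [11, 11, 11, 11, 9, 9, 9, 9]})

# A new block starts wherever Val differs from the previous row;
# the cumulative sum of those change flags labels each consecutive run.
blocks = df["Val"].ne(df["Val"].shift()).cumsum().rename("block")

out = (
    df.groupby(blocks)
      .agg(Week1=("Week", "min"), Week2=("Week", "max"), Val=("Val", "first"))
      .reset_index(drop=True)  # drop the block labels, keep a plain 0..n-1 index
)
print(out)
```

Grouping on consecutive runs rather than on Val itself matters if the same value shows up again later in the data: each run then gets its own row.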

compare two data frames and update value in one data frame by comparing another data frame value

I have two data frames. Examples:
df1:
A B C
5 7 6
8 1 1
1 0 7
3 4 9
5 7 4
9 2 0
df2:
A B C
3 2 1
6 5 7
9 7 9
1 1 2
6 4 5
0 8 6
Both data frames have same index.
What I want is this: wherever df1's value is less than 5,
I want to set df2's value to 0; otherwise it should stay the same.
I tried the following code:
df2[df1<5]=0
but when I print df2, it shows the same values as the original df2.
I know I am missing something really simple.
Please help me.
Thank you.
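A minimal sketch of one way to do this, assuming both frames really do share the same index and column labels (the frames below are retyped from the question). DataFrame.mask returns a new frame rather than changing df2 in place, so the result has to be assigned to something:

```python
import pandas as pd

# Frames retyped from the question; both use the default 0..5 index.
df1 = pd.DataFrame({"A": [5, 8, 1, 3, 5, 9],
                    "B": [7, 1, 0, 4, 7, 2],
                    "C": [6, 1, 7, 9, 4, 0]})
df2 = pd.DataFrame({"A": [3, 6, 9, 1, 6, 0],
                    "B": [2, 5, 7, 1, 4, 8],
                    "C": [1, 7, 9, 2, 5, 6]})

# Replace df2's value with 0 wherever the matching df1 cell is below 5.
result = df2.mask(df1 < 5, 0)
print(result)
```

If the index or column labels of the two frames do not actually match, the comparison will not line up cell by cell, so that is worth checking first.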

Pandas DataFrame: how do we keep columns based on the index name?

I seem to have run into some Python or enumerate bugs that I am not sure how to fix (see here for more details).
Long story short, I want my data sets to have the column names 0,4,6,8,10,12,14.
0 4 6 8 10 12
1 2 5 4 2 1
5 3 0 1 5 10
....
But my current data looks like the following
0 4 2 6 8 10 12
1 2 5 4 2 1
5 3 0 1 5 10
....
Therefore, I would like to add code that keeps columns based on the column name (including only 0,4,6,8,10,12).
Is there a pandas function that can help with this?
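One possible approach, as a sketch only: build the list of wanted labels and select just the ones that exist in the frame. The column labels come from the question, but the cell values below are placeholders I made up (the question's rows are incomplete), and I am assuming the labels are integers rather than strings:

```python
import pandas as pd

# Column labels from the question; cell values are made-up placeholders.
df = pd.DataFrame(
    [[1, 2, 9, 5, 4, 2, 1],
     [5, 3, 9, 0, 1, 5, 10]],
    columns=[0, 4, 2, 6, 8, 10, 12],
)

keep = [0, 4, 6, 8, 10, 12, 14]  # labels to keep; 14 is not in this frame

# Keep only the wanted labels that are actually present, in the order of `keep`.
present = [c for c in keep if c in df.columns]
out = df[present]
print(out)
```

Filtering the list first avoids a KeyError when a data set is missing one of the wanted columns.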

Count number of rows with conditions met in multiple columns

I can use
=COUNTIFS(range1, criteria1, range2, criteria2…) + COUNTIFS(...) etc.
as described in "Count number of rows with conditions met in two columns".
With my data, that approach would require 150+ instances of COUNTIFS for a single cell, and I need to do this for 10K or so cells.
Is there another function or a more efficient way to do this? Example data:
1 2 10 6
2 3 9 4
3 4 8 5
7 6 7 2
4 5 1 6
2 1 6 7
5 6 2 8
2 5 3 10
6 7 2 10
I need to count how many times the numbers 2 and 6 occur on the same row.
EDIT: I also need to do this with any number combination, i.e. 2&6, 1&6, 5&10 etc.
Thanks
Add a column with a serial no from 1 to n.
Then use a nested if:
SUM(IF(SERIAL_NO = SERIAL NO
IF( IF COLUMN1 =2 AND COLUMN 3 = 6,10))) AS 26
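If doing this outside Excel is an option, the count itself is easy to sketch in Python. This is not a translation of the formula above, only an illustration of the logic, and it assumes that "on the same row" means the two numbers may appear in any columns of that row. The rows are retyped from the question:

```python
# Rows retyped from the question.
rows = [
    [1, 2, 10, 6],
    [2, 3, 9, 4],
    [3, 4, 8, 5],
    [7, 6, 7, 2],
    [4, 5, 1, 6],
    [2, 1, 6, 7],
    [5, 6, 2, 8],
    [2, 5, 3, 10],
    [6, 7, 2, 10],
]

def count_rows_with(rows, a, b):
    """Count the rows that contain both a and b, in any columns."""
    return sum(1 for row in rows if a in row and b in row)

for pair in [(2, 6), (1, 6), (5, 10)]:
    print(pair, count_rows_with(rows, *pair))
```

Because the pair is just two arguments, the same function covers every combination without writing a separate condition for each one.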

Dynamically determining range to apply formula/function in EXCEL

I need to determine the range to apply the Frequency function. Here's what the problem is. On the given sheet, I have subtotals for my data and there is a column which has "Stop" Values.
The data would look something like:
Route1
Order# Stop# Qty
001016 1 5
008912 1 5
062232 2 6
062232 3 2
069930 4 1
1000 4 3
1001 4 4
1001 5 8
1003 8 1
Route 1 Subtotal 6 35
Route2
Order# Stop# Qty
10065 1 5
10076 1 5
10077 2 6
10079 3 2
10087 4 1
10098 4 3
10109 4 4
10171 5 8
10175 8 1
Route 2 Subtotal 6 35
How do I write VBA code for calculating the distinct stop values? I need the distinct count of the Stop#. In the example above, the total is 6 stops, because one stop can have multiple orders and one route can have multiple orders per stop. I hope that makes sense. Please let me know how I would write the VBA code for this. Thanks for your help.
For the Stop Subtotal unique count, try this formula (adjust ranges as required):
=COUNT(1/FREQUENCY(B2:B10,B2:B10))
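As a cross-check on what the subtotal should come out to, here is a small Python sketch (not VBA, and not part of the formula above) that computes the distinct stop count and the quantity total for the Route1 rows in the question:

```python
# Stop# and Qty values for Route1, retyped from the question.
stops = [1, 1, 2, 3, 4, 4, 4, 5, 8]
qty = [5, 5, 6, 2, 1, 3, 4, 8, 1]

# A set keeps one copy of each stop number, so its size is the distinct count.
distinct_stops = len(set(stops))
total_qty = sum(qty)

print(distinct_stops, total_qty)
```

Both numbers agree with the "Route 1 Subtotal" row in the sample data, which is what the FREQUENCY formula should return for B2:B10 if the stops sit in that range.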
