Split a dataframe column into different columns - python-3.x

I am using the code below to split one dataframe column whose cells look like the following. In some cells, one of the sections (Description, Consequences, Probable causes, Recommended actions) can be missing.
Column cell 1
Description
abc.
Consequences
def.
Probable causes
ghi.
Recommended actions
jkl
Column cell 2
Description
mno.
Probable causes
pqr
Recommended actions
stu
and so on.
How to split it to get.
         Description  Consequences  Probable causes  Recommended actions
1st row  abc          def           ghi              jkl
2nd row  mno          -             pqr              stu
3rd row  -            vwx           yza              bcd
4th row  efg          hij           -                klm
5th row  nop          qrs           tuv              -
and so on.
I am using:
pattern = r"^Description\n*(?P<Description>.*?)\n+Consequences\n*(?P<Consequences>.*?)\n+Probable causes\n*(?P<Cause>.*?)\n+Recommended actions\n*(?P<Actions>.*)"
df = result[0].str.extract(pattern).fillna("")
But with this I only get values for rows that have all four fields. What modification do I need to make to get rows even when some fields are missing?
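One possible approach, sketched here under the assumption that every heading sits alone on its own line as in the sample cells, is to split each cell on the headings rather than require all four in a single pattern. The helper name `split_cell` and the constant names are hypothetical and not part of the original code.

```python
import re

# Headings as they appear in the question's sample cells.
HEADINGS = ["Description", "Consequences", "Probable causes", "Recommended actions"]

# Match any heading that sits alone on its own line.
HEADING_RE = re.compile(
    r"^(" + "|".join(re.escape(h) for h in HEADINGS) + r")[ \t]*$",
    flags=re.MULTILINE,
)

def split_cell(text):
    """Return {heading: body} for one cell; missing headings map to ''."""
    # re.split with a capturing group yields
    # [text_before, heading1, body1, heading2, body2, ...]
    parts = HEADING_RE.split(text)
    found = {h: body.strip() for h, body in zip(parts[1::2], parts[2::2])}
    return {h: found.get(h, "") for h in HEADINGS}

cell_1 = "Description\nabc.\nConsequences\ndef.\nProbable causes\nghi.\nRecommended actions\njkl"
cell_2 = "Description\nmno.\nProbable causes\npqr\nRecommended actions\nstu"
rows = [split_cell(cell_1), split_cell(cell_2)]
```

Assuming every cell of `result[0]` is a string, something like `pd.DataFrame(result[0].map(split_cell).tolist(), index=result.index)` should turn these dictionaries into the four columns, with empty strings where a section is missing.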

Related

EXCEL Sum up points based on placements (combine VLOOKUP and SUM)

e.g. I have a list of race results:
A B C D E F...
NAME P. RACE1 RACE2 RACE3
abc =? 1 3 3
bcd 3 2 4
cde 4 4 2
def 2 1 1
and another sheet with points for each result:
A B
PLACE POINT
1 10
2 5
3 2
4 1
Is it possible to get the total points in sheet1 column B based on the race results in columns C-E?
Is it a combination of VLOOKUP and SUM?
Yes, that's possible. You can use a SUMPRODUCT formula for that. You may use this one in column B:
=SUMPRODUCT((C2:E2=$A$13:$A$16)*$B$13:$B$16)
This is an array formula. The term C2:E2=$A$13:$A$16 checks, for each of races 1 to 3, whether the result was 1st, 2nd, 3rd or 4th place. This produces an intermediate array of TRUE and FALSE values, with one row per place and one column per race.
Those results are then multiplied with the points from B13:B16 and the sum is formed.
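The same arithmetic can be sketched outside Excel. This is only an illustration in Python of what the SUMPRODUCT does, using the sample data from the question; the names `points`, `results` and `total_points` are made up for the example.

```python
# Points table from the question: place -> points.
points = {1: 10, 2: 5, 3: 2, 4: 1}

# Race results per name (columns C to E in the question).
results = {
    "abc": [1, 3, 3],
    "bcd": [3, 2, 4],
    "cde": [4, 4, 2],
    "def": [2, 1, 1],
}

def total_points(places):
    # Compare every result with every place (the TRUE/FALSE grid),
    # multiply by that place's points, and add everything up.
    return sum((place == p) * pts for place in places for p, pts in points.items())

totals = {name: total_points(places) for name, places in results.items()}
print(totals)  # {'abc': 14, 'bcd': 8, 'cde': 7, 'def': 25}
```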
In Excel O365, one could use:
Formula in B2:
=SUM(VLOOKUP(C2:E2,H$2:I$5,2))

How to filter out a first value of a column and then second value and then third and so on till nth using python

I have an Excel file as below which contains several pieces of information:
Name list company
x [xyz,mno,pqr] xyz
y [abc,rst,hij] abc
x [xyz,mno,pqr] uvw
y [abc,rst,hij] def
x [xyz,mno,pqr] mno
y [abc,rst,hij] rst
From this Excel file I want to filter on the Name column, take the first value, run a check on the filtered rows, then move on to the second value, and so on. I explain the check below with an example for the first value:
Suppose I have filtered "x" from the Name column, which gives me 3 rows. From the list column (horizontal) I need to check whether all three of "xyz", "mno" and "pqr" are present in the company column (vertical). Here "xyz" and "mno" are present in the first and third filtered rows of the company column, but "pqr" is not present in any row. So in the output I want "pqr", as shown below:
Name list company Output
x [xyz,mno,pqr] xyz pqr
y [abc,rst,hij] abc hij
x [xyz,mno,pqr] uvw pqr
y [abc,rst,hij] def hij
x [xyz,mno,pqr] mno pqr
y [abc,rst,hij] rst hij
It looks very complex to me and I am unable to get to any code or solution. Your help will really be appreciated.
As per the suggestion I have used the code below:
import pandas as pd
import numpy as np
frame=pd.read_excel("Book2.xlsx")
frame_Liste=frame.Liste.values.tolist()
frame_company=frame.company.values.tolist()
frame_col3=[]
for items in frame_Liste:
    frame_col3.append(list(set(items)-set(frame_company)))
frame["output"]=frame_col3
frame.to_excel("df.xlsx", index = False)
I do get output, but it is wrong and looks odd. The output is shown below:
Link for Output I got from the above code
Next time please be more specific about your problem, i.e. that you have the data in a sheet. Also please take care with your notation, since "[]" means a list in Python, which made the question confusing. Now that you have written that your data is in an Excel sheet, the problem is clear.
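import pandas as pd
# Read the sheet; "path" is a placeholder for the workbook location
frame = pd.read_excel("path")
# Liste holds comma-separated strings, so split each cell into a list
frame_Liste_as_String = frame.Liste.tolist()
frame_Liste = [x.split(',') for x in frame_Liste_as_String]
frame_Company = frame.Company.tolist()
frame_col3 = []
for items in frame_Liste:
    # Keep the list entries that never appear in the Company column
    frame_col3.append(list(set(items) - set(frame_Company)))
frame["col3"] = frame_col3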
Name Liste Company col3
0 x xyz,mno,pqr xyz [pqr]
1 y abc,rst,hij abc [hij]
2 x xyz,mno,pqr uvw [pqr]
3 y abc,rst,hij def [hij]
4 x xyz,mno,pqr mno [pqr]
5 y abc,rst,hij rst [hij]
That should solve your problem.
In case the data in the Liste column really is in brackets, change the line to
frame_Liste=[x.strip('][').split(',') for x in frame_Liste_as_String]
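Note that the code above compares each list against every company in the sheet, whereas the question describes checking only the rows that share the same Name. The two agree on the sample data, but may differ in general. A minimal per-Name sketch in plain Python follows; the row dictionaries stand in for the sheet, and the column names are taken from the output table above.

```python
from collections import defaultdict

# Sample rows standing in for the Excel sheet.
rows = [
    {"Name": "x", "Liste": "xyz,mno,pqr", "Company": "xyz"},
    {"Name": "y", "Liste": "abc,rst,hij", "Company": "abc"},
    {"Name": "x", "Liste": "xyz,mno,pqr", "Company": "uvw"},
    {"Name": "y", "Liste": "abc,rst,hij", "Company": "def"},
    {"Name": "x", "Liste": "xyz,mno,pqr", "Company": "mno"},
    {"Name": "y", "Liste": "abc,rst,hij", "Company": "rst"},
]

# Collect the companies seen for each Name.
companies_by_name = defaultdict(set)
for row in rows:
    companies_by_name[row["Name"]].add(row["Company"])

# For each row, keep the list entries missing from that Name's companies.
for row in rows:
    wanted = [item.strip() for item in row["Liste"].strip("[]").split(",")]
    seen = companies_by_name[row["Name"]]
    row["Output"] = [item for item in wanted if item not in seen]
```

The `strip("[]")` call is there so the same code tolerates cells written with or without brackets.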

List of records to assign to X number of people

I have a list of records which I want to assign equally to, for example, three people.
So for example with 15 records, to split to three people named XYZ, PQR and ABC:
Case Name
123 XYZ
124 XYZ
135 XYZ
138 ABC
145 ABC
167 ABC
258 PQR
259 PQR
260 PQR
Considering your Comment, if Case is in A1 and three people's names are in F2:F4 (ie in Column4) please try in B2 and copied down to suit:
=OFFSET(B$2,INT(ROW()-2)/COUNTA(F:F)-COUNTA(F:F)*INT((ROW()-2)/COUNTA(F:F)^2),4)
You have a list of Cases to be assigned “equally” over a pool of People.
However, the Cases can only be assigned “equally” if their number is an exact multiple of the number of People. Otherwise the residual cases are assigned following the order of the list of people, i.e. if you have 31 cases to be distributed among 4 individuals, then three people will have 8 cases while the remaining one will receive 7.
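The share each person receives can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a replacement for the spreadsheet formulas; the function names and the placeholder names P1 to P4 are hypothetical.

```python
def case_counts(n_cases, people):
    # Everyone gets the whole-number share; the first `extra` people
    # in list order each receive one of the residual cases.
    base, extra = divmod(n_cases, len(people))
    return {p: base + (1 if i < extra else 0) for i, p in enumerate(people)}

def assign_in_blocks(cases, people):
    # Give each person a contiguous block of cases, as in the sample data.
    counts = case_counts(len(cases), people)
    names = [p for p in people for _ in range(counts[p])]
    return list(zip(cases, names))

counts = case_counts(31, ["P1", "P2", "P3", "P4"])
print(counts)  # {'P1': 8, 'P2': 8, 'P3': 8, 'P4': 7}

sample = assign_in_blocks(
    [123, 124, 135, 138, 145, 167, 258, 259, 260], ["XYZ", "ABC", "PQR"]
)
```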
Assume the list of Cases (including header) is located at B6:B40, the list of People (also with header) is at G6:G10, and blank records, if any, are at the end of each list. (These formulas will work with Cases coded as numeric or alpha.)
To assign the cases one by one to each person enter the following FormulaArray in the range C7:C40
(Array formulas are entered by pressing [Ctrl] + [Shift] + [Enter] together)
=IF($B$7:$B$40="","",INDEX($G$7:$G$10,
1+MOD(-1+COUNTA($G$7:$G$10)
+(1+ROW($B$7:$B$40)-ROW($B$7)),
COUNTA($G$7:$G$10))))
Fig. 1
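The one-by-one assignment amounts to cycling through the list of people. A minimal Python sketch of that idea, with a hypothetical function name and made-up sample values:

```python
from itertools import cycle

def assign_round_robin(cases, people):
    # Pair each case with the next person, wrapping around the list.
    return list(zip(cases, cycle(people)))

pairs = assign_round_robin([1, 2, 3, 4, 5], ["A", "B"])
print(pairs)  # [(1, 'A'), (2, 'B'), (3, 'A'), (4, 'B'), (5, 'A')]
```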
To assign the cases using the pattern shown in the sample data, we first need to add a field to the list of persons to calculate the distribution of the cases (this makes the formula that allocates people shorter and easier to read and maintain). In cell H6 enter the field name Count, then in range H7:H10 enter this FormulaArray:
=IF($G$7:$G$10="","",
INT(COUNTA($B$7:$B$40)/COUNTA($G$7:$G$10))
+((1+ROW($G$7:$G$10)-ROW($G$7))<=
MOD(COUNTA($B$7:$B$40),COUNTA($G$7:$G$10)))*1)
Fig. 2
Then enter this `FormulaArray` in `E7` and copy to the end of the range.
=IFERROR(IF($B7="","",
IF(COUNTIF($G$7:$G$10,$E6)=0,
INDEX($G$7:$G$10,MATCH(0,COUNTIF(E$6:E6,$G$7:$G$10),0)*1),
IF(COUNTIF($E$6:$E6,$E6)<INDEX($H$7:$H$10,MATCH($E6,$G$7:$G$10,0)),$E6,
INDEX($G$7:$G$10,MATCH(0,COUNTIF(E$6:E6,$G$7:$G$10),0)*1)))),"")
Fig. 3

CountIfs for unique values

In my spreadsheet I have the following in cells A5:C12:
ABC 4 B
ABC 5 B
ABC 5 B
ABC 5 C
CBS 4 B
CBS 5 B
CBS 3 C
NBC 4 B
I am trying to count the unique occurrences of “5 B”. Here, a 5 and a B appear together 3 times: twice in ABC and once in CBS. Thus I would like 2 returned, since there are 2 companies (ABC and CBS) that have 5 and B.
I tried =COUNTIFS(B5:B12,5,C5:C12,"B"), but this returned “3” and couldn’t distinguish between the two ABCs.
The formula =SUMPRODUCT(IFERROR((C5:C12="B")/COUNTIFS(A5:A12,A5:A12,C5:C12,"B"),0)) returns “3”, telling us there are 3 unique “B”s.
The formula =SUMPRODUCT(IFERROR((B5:B12=5)/COUNTIFS(A5:A12,A5:A12,B5:B12,5),0)) returns “2”, telling us there are 2 unique “5”s.
Is there a way to somehow combine the above two formulas (or any other way) to see the number of unique “5B”s?
UNTESTED
=SUM(IF(FREQUENCY(IF(B5:B12&C5:C12="5b",IF(A5:A12<>"",MATCH(A5:A12,A5:A12,0))),ROW(A5:A12)-ROW(A5)+1),1))
entered as an array formula (adapted from Barry Houdini).
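As a cross-check of the expected answer, the same count can be sketched in Python with a set, using the sample data from the question. The variable names are illustrative only.

```python
# Rows from A5:C12 in the question: (company, number, letter).
rows = [
    ("ABC", 4, "B"),
    ("ABC", 5, "B"),
    ("ABC", 5, "B"),
    ("ABC", 5, "C"),
    ("CBS", 4, "B"),
    ("CBS", 5, "B"),
    ("CBS", 3, "C"),
    ("NBC", 4, "B"),
]

# A set keeps each company once, however many matching rows it has.
companies = {name for name, number, letter in rows if number == 5 and letter == "B"}
unique_count = len(companies)
print(unique_count)  # 2
```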

Create an Excel formula that combines 2 tables

I have 2 tables. Table 1 has employees and Table 2 has codes and their values.
For each employee row in Table 1, I want to look up the last 5 characters, match them against a Table 2 column header, and insert new rows containing every Table 2 Code row with the corresponding "Plan" column value.
For Example, in Table1 1st row EE_Plan1, the last 5 characters "Plan1" should match the 2nd column in Table2, get the plan values (123,879) and insert new code rows as shown below in END RESULT.
Really appreciate any help with creating a formula. Thank you!!
TABLE1
Employee
--------
EE_Plan1
EE_Plan2
EE_Plan3
TABLE2
Code Plan1 Plan2 Plan3
---- ----- ----- -----
DND 123 456 jgh
ABC 879 978 ajs
END RESULT
Employee Code Plan Desc
-------- ---- ---------
EE_Plan1 DND 123
EE_Plan1 ABC 879
EE_Plan2 DND 456
EE_Plan2 ABC 978
EE_Plan3 DND jgh
EE_Plan3 ABC ajs
VLOOKUPs are your friend here. Use a VLOOKUP to locate the corresponding value in the other table and place it next to the matching Employee. Set Range Lookup to FALSE for an exact match; sorting the sheet before a lookup can also help tremendously.
=VLOOKUP(lookup_value,table_array,col_index_num,range_lookup)
For returning multiple values see this article from Microsoft.
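The desired END RESULT is every employee paired with every code. The following Python sketch shows only that logic and is not an Excel formula; the dictionary layout chosen for Table 2 is an assumption made for the example.

```python
# Table 1: employees whose last 5 characters name a plan column.
employees = ["EE_Plan1", "EE_Plan2", "EE_Plan3"]

# Table 2: code -> {plan column header: value}.
table2 = {
    "DND": {"Plan1": "123", "Plan2": "456", "Plan3": "jgh"},
    "ABC": {"Plan1": "879", "Plan2": "978", "Plan3": "ajs"},
}

# One output row per (employee, code) pair, picking the plan column
# that matches the last 5 characters of the employee name.
end_result = [
    (employee, code, plans[employee[-5:]])
    for employee in employees
    for code, plans in table2.items()
]

for row in end_result:
    print(*row)
```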
