faster alternative to convert pandas df to dictionary - python-3.x

I am trying to convert a pandas DataFrame with 2 columns into a dictionary such that the values of one column are the keys and the values of the other column are the dictionary's values. If a key repeats (which it does), I want the values for that key to be collected in a list.
So far I have done the following, but it takes a very long time when I convert 100K-plus records to a dictionary.
A B
1 ab kate
2 ab drew
3 ab mike
4 ab eric
5 cd bobby
6 cd kyle
7 ab alex
8 ab michelle
9 cd heather
fdict = dict()
for d, d2 in zip(t.A, t.B):
    fdict.setdefault(d, list()).append(d2)
Please help me understand how I can do this faster using python.
Thanks !

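I think this one-liner would serve your purpose and be faster: df.set_index('ID').T.to_dict('list'). Replace 'ID' with the name of your key column. Note that this pattern expects unique key values, so with repeating keys it will not collect the values into lists.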
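For repeating keys, one commonly used alternative is to group by the key column and aggregate each group into a list. The sketch below rebuilds the sample data from the question (the frame name t and the columns A and B come from the question; everything else is illustrative). Whether it beats the setdefault loop on 100K rows has not been measured here, so it is worth timing both on the real data.

```python
import pandas as pd

# Sample data copied from the question.
t = pd.DataFrame({
    "A": ["ab", "ab", "ab", "ab", "cd", "cd", "ab", "ab", "cd"],
    "B": ["kate", "drew", "mike", "eric", "bobby", "kyle",
          "alex", "michelle", "heather"],
})

# Group rows by key and collect each group's values into a list.
# sort=False keeps the keys in order of first appearance.
fdict = t.groupby("A", sort=False)["B"].agg(list).to_dict()

print(fdict)
```

Values within each list keep their original row order, so the result matches what the loop in the question builds.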

Related

Google sheets formula to get the Top 5 List With Duplicates

I'm trying to compile a best 5 and worst 5 list. I have two columns: column B with the number score and column C with the name. I only want the list to include the name.
In my previous attempts the formula would get the top/bottom 5 but as soon as a duplicate score appeared the first known name with that value would just repeat.
Here is my data
26 Cal
55 John
55 Mike
100 Steve
26 Thomas
100 Jaden
100 Jack
95 Josh
87 Cole
75 Brett
I've managed to get the bottom 5 list formula correct. This formula works perfectly and includes all names of duplicate scores.
Example of what I get:
Cal
Thomas
John
Mike
Brett
=INDEX($C$56:$E$70,SMALL(IF($B$56:$B$70=SMALL($B$56:$B$70,ROWS(E$2:E2)),ROW($B$56:$B$70)-ROW($B$56)+1),SUM(IF($B$56:$B$70=SMALL($B$56:$B$70,ROWS(E$2:E2)),1,0))-SUM(IF($B$56:$B$70<=SMALL($B$56:$B$70,ROWS(E$2:E2)),1,0))+ROWS(E$2:E2)))
Here is the formula I've tried to get the top 5 - however I keep getting an error.
=INDEX($C$56:$E$70,LARGE(IF($B$56:$B$70=LARGE($B$56:$B$70,ROWS(E$2:E2)),ROW($B$56:$B$70)-ROW($B$56)+1),SUM(IF($B$56:$B$70=LARGE($B$56:$B$70,ROWS(E$2:E2)),1,0))-SUM(IF($B$56:$B$70<=LARGE($B$56:$B$70,ROWS(E$2:E2)),1,0))+ROWS(E$2:E2)))
Example of what I'm looking for
Steve
Jaden
Jack
Josh
Cole
You can set two queries like this for both cases:
=QUERY(B56:C70,"Select C order by B desc limit 5")
=QUERY(B56:C70,"Select C order by B limit 5")
Use the SORTN() function, like this:
=SORTN(A1:B10,5,,1,1)
To keep only one column, wrap the SORTN() function in INDEX() and specify the column number. Try:
=INDEX(SORTN(A1:B10,5,,1,1),,2)
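The same idea (sort by score, keep the first five names, let tied scores each keep their own name) can be sketched outside the spreadsheet. This is only an illustration in Python using the data from the question; the function name top_names is hypothetical. It relies on Python's sort being stable, so names with equal scores stay in their original order.

```python
# (score, name) pairs copied from the question.
data = [
    (26, "Cal"), (55, "John"), (55, "Mike"), (100, "Steve"),
    (26, "Thomas"), (100, "Jaden"), (100, "Jack"), (95, "Josh"),
    (87, "Cole"), (75, "Brett"),
]


def top_names(pairs, n, highest=True):
    """Return the names of the n highest (or lowest) scores.

    sorted() is stable, also with reverse=True, so rows with equal
    scores keep their original relative order and no name repeats.
    """
    ordered = sorted(pairs, key=lambda pair: pair[0], reverse=highest)
    return [name for _, name in ordered[:n]]


best_five = top_names(data, 5, highest=True)
worst_five = top_names(data, 5, highest=False)
print(best_five)   # ['Steve', 'Jaden', 'Jack', 'Josh', 'Cole']
print(worst_five)  # ['Cal', 'Thomas', 'John', 'Mike', 'Brett']
```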

If cell is empty, find and capture neighboring cell values

I have the following Dataframe:
A B C
0 Success 1.5 AAA
1 Duplicate BBB
2 NaN 1.5 CCC
3 Rejected DDD
3 Rejected EEE
I am looking to capture each value in the C column when B is empty. The goal is to store this in a list.
The list would contain BBB,DDD,EEE
I've been searching on Stack for a bit and can't quite find this answer.
Any help would be greatly appreciated.
Thank you!
Based on the given description, you can try this to get the list of values of column C where column B is empty. The pandas documentation for tolist has more details.
required_list = df.loc[df['B'].isna(), 'C'].tolist()
Now you can iterate over the required list as needed.
Try this
df[df["B"].isnull()]["C"].tolist()
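This will do it for you:
import numpy as np
# Copy C into a new column D where B is missing; leave None elsewhere.
df['D'] = np.where(df['B'].isnull(), df['C'], None)
# Drop the None entries and convert the remainder to a plain list.
result = df['D'].dropna().tolist()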
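The answers above all assume that the empty cells in B are real missing values (NaN). If some of them are empty strings instead, which can happen depending on how the data was loaded, isna() alone will not catch them. The sketch below rebuilds the question's frame with one empty string mixed in (that mix is an assumption for illustration) and checks for both cases.

```python
import numpy as np
import pandas as pd

# Frame modelled on the question; row 3 uses "" instead of NaN on purpose.
df = pd.DataFrame({
    "A": ["Success", "Duplicate", np.nan, "Rejected", "Rejected"],
    "B": [1.5, np.nan, 1.5, "", np.nan],
    "C": ["AAA", "BBB", "CCC", "DDD", "EEE"],
})

# A cell counts as empty if it is missing (NaN/None) or an empty string.
is_empty = df["B"].isna() | (df["B"] == "")

required_list = df.loc[is_empty, "C"].tolist()
print(required_list)  # ['BBB', 'DDD', 'EEE']
```

Whitespace-only strings are not treated as empty here; they would need an extra strip step.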

Excel - return all unique permutations of 3 columns

I have 3 columns
a b c
jon ben 2
ben jon 2
roy jack 1
jack roy 1
I'm trying to retrieve all unique permutations e.g. ben and jon = jon and ben so they should only appear once. Expected output:
a b c
jon ben 2
roy jack 1
Any ideas of a function that could do this? The order in the output does not matter. I've tried concatenating and then removing duplicates, but obviously this only considers the string order.
I've created a fourth column by joining all three columns together with =a1&","&b1&","&c1 and used Excel's built-in Remove Duplicates function. This doesn't work because the order of the strings is different.
In your fourth column, use the formula
=if(A1<B1,A1&","&B1&","&C1,B1&","&A1&","&C1)
This joins A and B in alphabetical order, so you can then remove duplicates as you have done.
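The trick in that formula is to build a key in which the two names always appear in the same order, so that "jon, ben" and "ben, jon" produce the same key. A minimal sketch of the same technique in Python, using the rows from the question (the function name unique_unordered is hypothetical):

```python
# Rows copied from the question: (a, b, c).
rows = [
    ("jon", "ben", 2),
    ("ben", "jon", 2),
    ("roy", "jack", 1),
    ("jack", "roy", 1),
]


def unique_unordered(records):
    """Keep the first row of each (a, b, c) group, ignoring a/b order."""
    seen = set()
    kept = []
    for a, b, c in records:
        # Canonical key: the two names in alphabetical order, then c.
        key = (min(a, b), max(a, b), c)
        if key not in seen:
            seen.add(key)
            kept.append((a, b, c))
    return kept


result = unique_unordered(rows)
print(result)  # [('jon', 'ben', 2), ('roy', 'jack', 1)]
```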

Selecting Text from an R string to create a new object

I'm relatively new to R, and I'm currently stuck.
I have observations that are made up of legal articles, e.g.:
BIV:III,XXVIII.1(b);CIV:2.
So I split them, resulting in a list that holds, for each observation, the legal articles used. This looks like:
ArtAGr list of 400230
chr[1:2] "BIV:III,XXVIII.1(b)" "CIV:2"
chr[1:1] "ILA:2.3(b)"
chr[1:3] "BIV:IB.3(d)" "CIV:7,9" "ILA:VII.1"
The BIV and CIV would need to become my new variables. However, the observations vary, so some observations include both BIV and CIV, while others include other legal articles like ILA:II.3(b)
Now, I would like to create a dataframe from these guys, so I can group all the observations in a column for each major article.
Eventually, the perfect dataframe should look like:
Dispute BIV CIV ILA
1 III, XXVIII.1(b) 2 NA
2 NA NA II.3(b)
3 IV.3(d) 7,9 VII.1
4 II NA NA
So, I will need to create a new object grouping all observations who contain a text like BIV, and a O or N/A for those observations that do not use this legal article. Any thoughts would be greatly appreciated!
Thanks a lot!
Sven
Here's an approach:
# a vector of character strings (not the split ones)
vec <- c("BIV:III,XXVIII.1(b);CIV:2",
         "ILA:II.3(b)",
         "BIV:IB.3(d);CIV:7,9;ILA:VII.1")
# split strings
s <- strsplit(vec, "[;:]")
# target words
tar <- c("BIV", "CIV", "ILA")
# create data frame
setNames(as.data.frame(do.call(rbind, lapply(s, function(x)
  replace(rep(NA_character_, length(tar)),
          match(x[c(TRUE, FALSE)], tar), x[c(FALSE, TRUE)])))), tar)
The result:
BIV CIV ILA
1 III,XXVIII.1(b) 2 <NA>
2 <NA> <NA> II.3(b)
3 IB.3(d) 7,9 VII.1
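For readers who find the nested R call hard to follow, the logic is: split each observation on ";" into "NAME:value" parts, then place each value in the column for its name and leave the other columns missing. The sketch below shows the same steps in Python for comparison only; it is not a translation the answerer provided, and the function name parse_articles is hypothetical.

```python
# Input strings copied from the R answer above.
vec = [
    "BIV:III,XXVIII.1(b);CIV:2",
    "ILA:II.3(b)",
    "BIV:IB.3(d);CIV:7,9;ILA:VII.1",
]
targets = ["BIV", "CIV", "ILA"]


def parse_articles(observation, names):
    """Map each target name to its article text, or None if absent."""
    row = {name: None for name in names}
    for part in observation.split(";"):
        # Split only on the first colon: name before, articles after.
        name, sep, value = part.partition(":")
        if sep and name in row:
            row[name] = value
    return row


table = [parse_articles(obs, targets) for obs in vec]
for row in table:
    print(row)
```

Each dictionary corresponds to one row of the result shown above, with None in place of NA.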

Excel sort data in newspaper style columns

I have an excel sheet that has names and extension numbers. The sheet is designed to be printed as a reference and so it has the data split into 3 columns like how a newspaper is laid out.
EXT Name EXT Name EXT Name
1 bob 4 pete 8 sam
2 dave 5 sally 9 john
I need to have Excel sort this data by name, A-Z. I can only work out how to make it sort one column at a time, so I end up having to manually sort the data every time I add or remove information.
Can excel sort all 3 columns top to bottom and left to right?
Thanks!
Maybe this KB article helps.
Or you can try using the SMALL function, like this:
Column A
=SMALL($A$1:$A$9;0+ROWS(A$1:A1))
Column C
=SMALL($A$1:$A$9;30+ROWS(A$1:A1))
In this case you clone the sorted dataset, which is in two columns.
The value for the B columns can be found using VLOOKUP. The values 0 and 30 are offsets, so the length of the page is fixed.
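The layout logic behind this (sort everything once, then fill the page column by column using a fixed offset per column) can be sketched as follows. This is an illustration in Python rather than an Excel solution; the function name newspaper_columns is hypothetical, and the rows-per-column value is computed from the data instead of being fixed at 30.

```python
import math

# (extension, name) pairs copied from the question.
entries = [
    (1, "bob"), (2, "dave"), (4, "pete"),
    (5, "sally"), (8, "sam"), (9, "john"),
]


def newspaper_columns(records, n_columns):
    """Sort by name, then lay records out top to bottom, left to right."""
    ordered = sorted(records, key=lambda record: record[1].lower())
    if not ordered:
        return []
    rows_per_column = math.ceil(len(ordered) / n_columns)
    layout = []
    for r in range(rows_per_column):
        row = []
        for c in range(n_columns):
            # Column c starts at offset c * rows_per_column in the sorted list.
            i = c * rows_per_column + r
            row.append(ordered[i] if i < len(ordered) else None)
        layout.append(row)
    return layout


layout = newspaper_columns(entries, 3)
for row in layout:
    print(row)
```

Reading down the first column, then the second, then the third gives the names in A-Z order; short final columns are padded with None.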
