Specifically, I know ahead of time that I only need to swap positions 1 and 2 with positions 4 and 5.
Two examples:
HEART
New output:
RTAHE
12734
New output:
34712
There are probably more than a handful of ways to do this. If you're interested in a formula, here is one way to go about it:
=RIGHT(A3,2)&MID(A3,3,LEN(A3)-4)&LEFT(A3,2)
Seems to be working on some test data I threw together.
A bit more robust, as suggested by @Rafalon:
=MID(A3,4,2)&MID(A3,3,1)&LEFT(A3,2)&MID(A3,6,LEN(A3))
It produces the following results:
Input Output
1 1
12 12
123 312
1234 4312
12345 45312
123456 453126
1234567 4531267
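To sanity-check the MID/LEFT logic outside the spreadsheet, here is a minimal Python sketch that mirrors @Rafalon's formula piece by piece. It assumes Excel's 1-based positions map onto 0-based Python slices; like MID starting past the end of the text, an out-of-range slice simply yields an empty string, which is why short inputs behave as in the table above.

```python
def swap_pairs(s: str) -> str:
    """Swap characters 1-2 with 4-5, mirroring
    =MID(A3,4,2)&MID(A3,3,1)&LEFT(A3,2)&MID(A3,6,LEN(A3))."""
    return (
        s[3:5]    # MID(A3,4,2): characters 4 and 5
        + s[2:3]  # MID(A3,3,1): character 3
        + s[:2]   # LEFT(A3,2): characters 1 and 2
        + s[5:]   # MID(A3,6,LEN(A3)): everything from character 6 on
    )


for text in ["HEART", "12734", "1", "123", "1234567"]:
    print(text, "->", swap_pairs(text))
```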
I'm trying to compile a best 5 and worst 5 list. I have two columns: column B with the numeric score and column C with the name. I only want the list to include the names.
In my previous attempts, the formula would get the top/bottom 5, but as soon as a duplicate score appeared, the first name found with that value would just repeat.
Here is my data:
26 Cal
55 John
55 Mike
100 Steve
26 Thomas
100 Jaden
100 Jack
95 Josh
87 Cole
75 Brett
I've managed to get the bottom 5 list formula correct. This formula works perfectly and includes all names of duplicate scores.
Example of what I get:
Cal
Thomas
John
Mike
Brett
=INDEX($C$56:$E$70,SMALL(IF($B$56:$B$70=SMALL($B$56:$B$70,ROWS(E$2:E2)),ROW($B$56:$B$70)-ROW($B$56)+1),SUM(IF($B$56:$B$70=SMALL($B$56:$B$70,ROWS(E$2:E2)),1,0))-SUM(IF($B$56:$B$70<=SMALL($B$56:$B$70,ROWS(E$2:E2)),1,0))+ROWS(E$2:E2)))
Here is the formula I tried for the top 5; however, I keep getting an error:
=INDEX($C$56:$E$70,LARGE(IF($B$56:$B$70=LARGE($B$56:$B$70,ROWS(E$2:E2)),ROW($B$56:$B$70)-ROW($B$56)+1),SUM(IF($B$56:$B$70=LARGE($B$56:$B$70,ROWS(E$2:E2)),1,0))-SUM(IF($B$56:$B$70<=LARGE($B$56:$B$70,ROWS(E$2:E2)),1,0))+ROWS(E$2:E2)))
Example of what I'm looking for:
Steve
Jaden
Jack
Josh
Cole
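The behaviour being asked for is essentially a stable sort: order by score and, among equal scores, keep the original row order so that every tied name appears once instead of the first one repeating. A minimal Python sketch of that idea on the sample data above (the function name `extreme_names` is hypothetical, not part of any spreadsheet API):

```python
# Sample data from the question: (score in column B, name in column C).
SCORES = [
    (26, "Cal"), (55, "John"), (55, "Mike"), (100, "Steve"), (26, "Thomas"),
    (100, "Jaden"), (100, "Jack"), (95, "Josh"), (87, "Cole"), (75, "Brett"),
]


def extreme_names(rows, n=5, largest=False):
    """Return the names for the n lowest (or highest) scores.

    sorted() is stable, so rows with equal scores keep their original
    order and each tied name is listed exactly once.
    """
    ordered = sorted(rows, key=lambda row: -row[0] if largest else row[0])
    return [name for _, name in ordered[:n]]


print(extreme_names(SCORES))                # bottom 5
print(extreme_names(SCORES, largest=True))  # top 5
```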
You can set up two queries like this, one for each case:
=QUERY(B56:C70,"Select C order by B desc limit 5")
=QUERY(B56:C70,"Select C order by B limit 5")
Use the SORTN() function, like this:
=SORTN(A1:B10,5,,1,1)
To keep only one column, wrap the SORTN() function in INDEX() and specify the column number. Try:
=INDEX(SORTN(A1:B10,5,,1,1),,2)
I have a data file that looks like this:
[Table 1]
Terms Author Frequency
Hepatitis Christopher 2
Acid Subrata 1
Acid Kal 3
Kinase Pramod 31
Kinase Steve 5
Kinase Sharon 10
Acid Rob 5
Acid Christopher 2
Hepatitis Sharon 3
which I want to convert into a frequency matrix like this ([Table 2]):
Terms Christopher Subrata Kal Pramod Steve Sharon Rob
Hepatitis 2 0 0 0 0 3 0
Acid 2 1 3 0 0 0 5
Kinase 0 0 0 31 5 10 0
Now I have figured out how to do that, and I am using this code:
import pandas as pd
a = pd.read_csv("C:\\Users\\robert\\Desktop\\Python Project\\Publications Data\\New Merged Title Terms Corrected\\Python generated file\\Terms_Frequency_File.csv")
b = a.groupby(['Terms']).apply(lambda x: x.set_index(['Terms', 'Author']).unstack()['Frequency'])
This worked absolutely fine until yesterday, but today I generated the [Table 1] data again because I had to add one more author. Now, when I try to make the frequency matrix like in [Table 2] again, it gives me this silly error:
KeyError: 'Terms'
I am pretty sure this has something to do with the index column in the dataframe, or with some whitespace issue in the index column (in this case the 'Terms' column).
I read several answers on this, such as "KeyError: 'column_name'" and "Key error when selecting columns in pandas dataframe after read_csv", and tried those methods, but they aren't helping.
Any help on this will be much appreciated! Thanks much!
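For reference, the reshaping itself is a pivot: one row per term, one column per author, summing Frequency and filling missing pairs with 0. In pandas the method built for this is `DataFrame.pivot_table` (with its `index`, `columns`, `values`, `aggfunc` and `fill_value` parameters). The sketch below shows the same logic in plain Python on the [Table 1] rows, keeping terms and authors in first-appearance order so the columns line up with the desired matrix; it is an illustration of the pivot, not a drop-in replacement for the code above.

```python
from typing import Dict, List, Tuple

# Rows from [Table 1]: (Terms, Author, Frequency).
TABLE_1 = [
    ("Hepatitis", "Christopher", 2),
    ("Acid", "Subrata", 1),
    ("Acid", "Kal", 3),
    ("Kinase", "Pramod", 31),
    ("Kinase", "Steve", 5),
    ("Kinase", "Sharon", 10),
    ("Acid", "Rob", 5),
    ("Acid", "Christopher", 2),
    ("Hepatitis", "Sharon", 3),
]


def frequency_matrix(records) -> Tuple[List[str], Dict[str, List[int]]]:
    """Pivot (term, author, frequency) records into a term x author matrix."""
    terms: Dict[str, None] = {}    # dict keys keep first-appearance order
    authors: Dict[str, None] = {}
    totals: Dict[Tuple[str, str], int] = {}
    for term, author, freq in records:
        terms.setdefault(term, None)
        authors.setdefault(author, None)
        # Sum repeated (term, author) pairs, like aggfunc="sum".
        totals[(term, author)] = totals.get((term, author), 0) + freq
    header = list(authors)
    # Pairs that never occur become 0, like fill_value=0.
    matrix = {t: [totals.get((t, a), 0) for a in header] for t in terms}
    return header, matrix


header, matrix = frequency_matrix(TABLE_1)
print("Terms", *header)
for term, counts in matrix.items():
    print(term, *counts)
```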
I've got the same problem as you. I've observed that the error occurs if I edit the .csv data in OpenOffice. Instead, I downloaded the data from the Internet and edited it in a simple editor, Notepad++, and then it works normally. I know this solution may not help in your case, but maybe you should try a different text editor or program for editing .csv files.
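If, as suspected in the question, the re-saved file's header row is the problem, a first step is to print the column names exactly as they were read. The plain-Python sketch below assumes the culprit is a stray byte-order mark or padding spaces around a header such as `Terms ` (an assumption: a program can also re-save a file with a different delimiter, which this does not handle) and normalizes the names before looking up `Terms`. In pandas, the analogous steps would be `pd.read_csv(..., encoding="utf-8-sig")` and `df.columns = df.columns.str.strip()`.

```python
import csv
import io
from typing import Dict, List


def clean_name(name: str) -> str:
    # Drop a byte-order mark (U+FEFF), which str.strip() does not remove,
    # then trim surrounding whitespace.
    return name.replace("\ufeff", "").strip()


def read_rows(text: str) -> List[Dict[str, str]]:
    reader = csv.DictReader(io.StringIO(text))
    # The repr of the raw header makes hidden characters visible.
    print("raw header:", reader.fieldnames)
    # Replace the header with cleaned names before reading the data rows.
    reader.fieldnames = [clean_name(n) for n in reader.fieldnames]
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in reader
    ]


# Hypothetical re-saved file: a BOM before "Terms" and a trailing space after it.
raw = "\ufeffTerms ,Author,Frequency\nHepatitis,Christopher,2\nAcid,Subrata,1\n"
rows = read_rows(raw)
print(rows[0]["Terms"])
```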
I couldn't find anything similar.
I have a pretty big Excel table, but I can't get what I need from it.
I have a column of names, for example:
John
Johnny
Arny
Arny
John
Johanatan
Jeremie
Brook
Arny
Johanatan
I want it to return or show me results like this:
Johnny 1
Arny 3
John 2
Jeremie 1
Brook 1
Johanatan 2
I couldn't find an appropriate Excel function to give me that result.
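What is being asked for is a frequency count of each distinct name (in Excel, typically a pivot table or one COUNTIF per distinct name). As an illustration of the expected numbers, here is a minimal Python sketch using `collections.Counter` on the sample column; it lists names in first-appearance order.

```python
from collections import Counter

names = ["John", "Johnny", "Arny", "Arny", "John",
         "Johanatan", "Jeremie", "Brook", "Arny", "Johanatan"]

# Counter is a dict subclass, so (Python 3.7+) it keeps first-appearance order.
counts = Counter(names)
for name, count in counts.items():
    print(name, count)
```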
I have a list that I'm checking against the main data.
The main data looks like:
1234 1
1235 1
1234 1
1213 2
1231 2
1212 2
1231 3
1231 3
etc
The list I'm checking against the main data is:
1
2
3
etc
For each number in my list, I want to count how many values start with 123, so the output looks like:
ID 123
1 3
2 1
3 2
etc
I already have each ID in the list. To fill down for each number, I currently have countifs(a1:a8, a1,b1:b8, "123"), and it's obviously producing an error. I know I need to include LEFT somewhere, but I'm not sure where or how. Much thanks.
In the main data sheet, add a helper column with a formula such as C1: =IF(LEFT(A1,3)="123",1,0), and fill it down for every cell in column C. Then use column C in a SUMIFS on your list sheet, e.g. =SUMIFS(C:C,B:B,"="&E1), where column E holds your list.
Please refer to the screenshots below.
[Screenshot: identify values that begin with 123]
[Screenshot: SUMIFS to get the output]
Edit:
Another solution: =SUMPRODUCT(--(LEFT(Maindata!$A$1:$A$8,3)="123")*(Maindata!$B$1:$B$8=Maindata!D1)). This works fine for me.
[Screenshot: SUMPRODUCT with --LEFT]
You can use SUMPRODUCT to do this:
=SUMPRODUCT((Maindata!$B$1:$B$8=A1)*(LEFT(Maindata!$A$1:$A$8,3)="123"))
where A1 holds the ID you're counting "values that start with 123" for, and the main data is in worksheet Maindata, range A1:B8 (values in column A, IDs in column B).
Is your data in the form of text or number values? If the former, your criterion should instead be "123*" (using an asterisk as a wildcard). If the latter, and the values are all four digits long, you could use two criteria on the same column, ">=1230" and "<1240".
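All of these answers boil down to the same per-row test: does the ID match, and does the value start with 123? A minimal Python sketch of that logic on the sample data, with a numeric-range variant for the case where the values are four-digit numbers (the four-digit assumption is based only on the sample rows; the function names are hypothetical):

```python
# Main data from the question: (value in column A, ID in column B).
MAIN_DATA = [
    (1234, 1), (1235, 1), (1234, 1),
    (1213, 2), (1231, 2), (1212, 2),
    (1231, 3), (1231, 3),
]


def count_prefix(data, target_id, prefix="123"):
    """Count rows for target_id whose value starts with prefix,
    like SUMPRODUCT((B=ID)*(LEFT(A,3)="123"))."""
    return sum(1 for value, row_id in data
               if row_id == target_id and str(value).startswith(prefix))


def count_range(data, target_id, low=1230, high=1240):
    """Numeric alternative for four-digit values: low <= value < high."""
    return sum(1 for value, row_id in data
               if row_id == target_id and low <= value < high)


for target in (1, 2, 3):
    print(target, count_prefix(MAIN_DATA, target), count_range(MAIN_DATA, target))
```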
I have another question. Thanks for everyone's help and patience with an R newbie!
How can I count how many times a string occurs in a column? Example:
MYdata <- data.frame(fruits = c("apples", "pears", "unknown_f", "unknown_f", "unknown_f"),
veggies = c("beans", "carrots", "carrots", "unknown_v", "unknown_v"),
sales = rnorm(5, 10000, 2500))
The problem is that my real data set contains several thousand rows and several hundred unknown fruits and unknown veggies. I played around with "table()" and "levels" but without much success; I guess it's more complicated than that. Ideally, I'd like an output table listing the name of each unique fruit/veggie and how many times it occurs in its column. Any hint in the right direction would be much appreciated.
Thanks,
Marcus
If I understand your question, the function table() should work just fine. Here is how:
table(MYdata$fruits)
apples pears unknown_f
1 1 3
table(MYdata$veggies)
beans carrots unknown_v
1 2 2
Or use table inside lapply:
lapply(MYdata[1:2], table)
$fruits
apples pears unknown_f
1 1 3
$veggies
beans carrots unknown_v
1 2 2
The following gives you a data frame of counts which you might find easier to use or may suit your purposes better:
tabs=lapply(MYdata[-3], table)
out=data.frame(item=names(unlist(tabs)),count=unlist(tabs)[],
stringsAsFactors=FALSE)
rownames(out)=c()
print(out)
item count
1 fruits.apples 1
2 fruits.pears 1
3 fruits.unknown_f 3
4 veggies.beans 1
5 veggies.carrots 2
6 veggies.unknown_v 2
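For comparison, the same per-column tally with flattened `column.value` labels can be sketched in Python with `collections.Counter`. The numeric `sales` column is left out, as `MYdata[-3]` does, and the values are sorted so the order matches what `table()` prints for this data.

```python
from collections import Counter

# The two character columns of MYdata; the random sales column is omitted.
columns = {
    "fruits": ["apples", "pears", "unknown_f", "unknown_f", "unknown_f"],
    "veggies": ["beans", "carrots", "carrots", "unknown_v", "unknown_v"],
}

# One (item, count) pair per distinct value, labelled "column.value"
# like the names unlist() produces from a list of tables.
out = [
    (f"{col}.{value}", count)
    for col, values in columns.items()
    for value, count in sorted(Counter(values).items())
]
for item, count in out:
    print(item, count)
```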
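Maybe something like
summary(MYdata$fruits)
This gives counts only when fruits is a factor. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so for a character column use summary(factor(MYdata$fruits)) instead.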