I have the data file which looks like this -
[Table 1]
Terms Author Frequency
Hepatitis Christopher 2
Acid Subrata 1
Acid Kal 3
Kinase Pramod 31
Kinase Steve 5
Kinase Sharon 10
Acid Rob 5
Acid Christopher 2
Hepatitis Sharon 3
which I want to convert into a frequency matrix like this -
[Table 2]
Terms Christopher Subrata Kal Pramod Steve Sharon Rob
Hepatitis 2 0 0 0 0 3 0
Acid 2 0 3 0 0 0 5
Kinase 0 0 0 31 5 10 0
Now I have figured out how to do that and I am using this code for that -
import pandas as pd
a = pd.read_csv("C:\\Users\\robert\\Desktop\\Python Project\\Publications Data\\New Merged Title Terms Corrected\\Python generated file\\Terms_Frequency_File.csv")
b = a.groupby(['Terms']).apply(lambda x:x.set_index(['Terms','Author']).unstack()['Frequency'])
This worked absolutely fine until yesterday. Today I regenerated the [Table 1] data because I had to add one additional author, and when I try to build the frequency matrix again as in [Table 2], I get this error -
KeyError: 'Terms'
I am pretty sure this has something to do with the index column in the dataframe, or with white space in the index column (in this case the 'Terms' column).
I read several answers on this, such as "KeyError: 'column_name'" and "Key error when selecting columns in pandas dataframe after read_csv", and tried those methods, but they did not help.
Any help on this will be much appreciated! Thanks much!
I had the same problem. I noticed that the error occurs when I edit the .csv data in OpenOffice. When I instead downloaded the data from the Internet and edited it in Notepad++, it worked normally. This may not help in your case, but you could try a different text editor or another program that supports .csv files.
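One way to check the white-space suspicion from the question is to print the column names with repr() and strip them before reshaping. The sketch below is only an illustration: the CSV text is made up from [Table 1], the trailing space in the "Terms " header is an assumption about what the regenerated file might contain, and pivot_table is a swapped-in alternative to the groupby/unstack line in the question, not the original code.

```python
import io
import pandas as pd

# Hypothetical CSV content built from [Table 1]; the header "Terms " has a
# trailing space on purpose to reproduce a KeyError: 'Terms' situation.
raw = (
    "Terms ,Author,Frequency\n"
    "Hepatitis,Christopher,2\n"
    "Acid,Subrata,1\n"
    "Acid,Kal,3\n"
    "Kinase,Pramod,31\n"
    "Kinase,Steve,5\n"
    "Kinase,Sharon,10\n"
    "Acid,Rob,5\n"
    "Acid,Christopher,2\n"
    "Hepatitis,Sharon,3\n"
)

df = pd.read_csv(io.StringIO(raw))

# repr() makes stray spaces in the header names visible.
print([repr(c) for c in df.columns])

# Remove leading/trailing white space from every column name.
df.columns = df.columns.str.strip()

# Build the frequency matrix; missing Term/Author pairs become 0.
matrix = df.pivot_table(
    index="Terms",
    columns="Author",
    values="Frequency",
    aggfunc="sum",
    fill_value=0,
)
print(matrix)
```

If the printed names already look clean, the cause is something else (for example a different delimiter in the regenerated file), and printing df.head() is the next thing to look at.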
Is there any way to hide an axis label when its value is zero?
Suppose the table looks like this:
Vehicles Sold per Brand   jun-21   jul-21   ago-21   sept-21
Opel                      2        4        3        5
Renoult                   6        3        8        1
Ferrari                   0        0        0        0
Mercedes                  1        1        6        4
Seat                      2        0        4        2
Others                    12       11       15       16
If I don't want Ferrari to appear on the axis, what should I do?
I know I can hide that column so it is not charted, but I can't use that approach: the data is highly dynamic and I don't want to hide it by hand every time.
Could somebody help?
Many thanks in advance.
So, quick and dirty:
But I would then generate the table of numbers so that any row that should be excluded is removed, and build the chart from the remaining 5 rows so there is no gap. I will let you work on that.
So, I did that as well, but I will let you figure out how to control the legend:
The trick is to use LARGE(), but you may need to wrap it in IF() to handle zeros better...
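The screenshots with the actual formulas are not available here, so the following is not the answer's Excel solution. It is only a small pandas sketch of the same idea (drop the all-zero rows, then rank what is left), using the numbers from the question's table.

```python
import pandas as pd

# Data copied from the question's table.
sales = pd.DataFrame(
    {
        "jun-21": [2, 6, 0, 1, 2, 12],
        "jul-21": [4, 3, 0, 1, 0, 11],
        "ago-21": [3, 8, 0, 6, 4, 15],
        "sept-21": [5, 1, 0, 4, 2, 16],
    },
    index=["Opel", "Renoult", "Ferrari", "Mercedes", "Seat", "Others"],
)

# Keep only brands with at least one non-zero month.
nonzero = sales.loc[(sales != 0).any(axis=1)]

# Rank the remaining brands by total, largest first (roughly what a
# LARGE()-based helper table would do in the spreadsheet).
order = nonzero.sum(axis=1).sort_values(ascending=False).index
ranked = nonzero.loc[order]
print(ranked)
```

Charting the filtered table instead of the raw one means a brand disappears from the axis as soon as all of its values are zero, with no manual hiding.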
I have the following data in spreadsheet A.
name trait1 trait2 nice
0 Adam 29 81 0
1 Barry 17 75 1
2 Chris 62 0 1
I wish to create a spreadsheet B that will be a filtered copy of this data. Namely, let's assume for a moment that I want to filter nice = 1 and am interested only in column name. The copy in spreadsheet B would be as shown below. In spreadsheet B I wish to be adding some extra columns, e.g. education.
name nice education
1 Barry 1 primary
2 Chris 1 university
What I want to achieve is a spreadsheet B that will get updated if anything changes in spreadsheet A. So for example, if I were to change the name Barry to Ben. The spreadsheet B would become the following.
name nice education
1 Ben 1 primary
2 Chris 1 university
Similarly (and what I find to be the hardest), if a row is added in spreadsheet A, e.g.
name trait1 trait2 nice
0 Adam 29 81 0
1 Barry 17 75 1
2 Matt 69 11 1
3 Chris 62 0 1
The updated spreadsheet B would be as follows:
name nice education
1 Barry 1 primary
2 Matt 1
3 Chris 1 university
So I want the education column to remain the same.
My approach of using a combination of =IF() and =VLOOKUP() functions ultimately did not work. Guess I am really curious about how to connect rows of education to names. So that when a row is added in spreadsheet A, then spreadsheet B gets updated but the education field connected to the new row is empty and will be filled by hand later on.
Since you are looking for a finished product in Google Sheets, I'd advise using QUERY():
Formula in I1:
=QUERY(INDEX({A:D,VLOOKUP(A:A,F:G,2,0)}),"Select Col1,Col4,Col5 where Col4=1")
Note: I made the assumption you pull the education in through a VLOOKUP() (since you mentioned that in the body of the question).
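To make the logic of that formula easier to follow, here is a rough pandas equivalent: filter on nice = 1, keep name and nice, and pull education in from a separate lookup table keyed on name. The frame names (sheet_a, education) are hypothetical, and the lookup table is assumed to correspond to columns F:G in the formula.

```python
import pandas as pd

# Spreadsheet A after the row for Matt has been added (from the question).
sheet_a = pd.DataFrame(
    {
        "name": ["Adam", "Barry", "Matt", "Chris"],
        "trait1": [29, 17, 69, 62],
        "trait2": [81, 75, 11, 0],
        "nice": [0, 1, 1, 1],
    }
)

# Hand-maintained lookup table (assumed to play the role of F:G).
education = pd.DataFrame(
    {"name": ["Barry", "Chris"], "education": ["primary", "university"]}
)

# Filter, select the columns of interest, then look education up by name.
# A left join keeps rows that have no education entry yet (Matt).
sheet_b = (
    sheet_a.loc[sheet_a["nice"] == 1, ["name", "nice"]]
    .merge(education, on="name", how="left")
)
print(sheet_b)
```

Because education is attached by name rather than by row position, inserting a row in spreadsheet A does not shift the existing entries. The flip side is that renaming Barry to Ben in A requires the same rename in the lookup table, otherwise the match is lost.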
Specifically, I know ahead of time I only need to swap position 1 and 2 with 4 and 5.
2 Examples:
HEART
New output:
RTAHE
12734
New output:
34712
There is probably more than a handful of ways to do this. If you're interested in a formula, here is one way to go about it:
=RIGHT(A3,2)&MID(A3,3,LEN(A3)-4)&LEFT(A3,2)
Seems to be working on some test data I threw together.
A bit more robust, as suggested by @Rafalon:
=MID(A3,4,2)&MID(A3,3,1)&LEFT(A3,2)&MID(A3,6,LEN(A3))
Produces following results:
Input     Output
1         1
12        12
123       312
1234      4312
12345     45312
123456    453126
1234567   4531267
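For readers who want to check the second formula outside the spreadsheet, the same slicing can be written in Python. The function name swap_ends is made up for this sketch; the slices mirror MID(A3,4,2), MID(A3,3,1), LEFT(A3,2) and MID(A3,6,LEN(A3)) in that order.

```python
def swap_ends(text: str) -> str:
    """Swap characters 1-2 with characters 4-5, keeping the rest in place."""
    return (
        text[3:5]    # characters 4 and 5, like MID(A3,4,2)
        + text[2:3]  # character 3, like MID(A3,3,1)
        + text[:2]   # characters 1 and 2, like LEFT(A3,2)
        + text[5:]   # everything from character 6 on, like MID(A3,6,LEN(A3))
    )


for sample in ["HEART", "12734", "123", "1234567"]:
    print(sample, "->", swap_ends(sample))
```

As in the spreadsheet version, inputs shorter than five characters do not raise an error; the out-of-range slices simply come back empty.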
My input data comes from three Excel sheets, and a few columns holding integer data are stored as text. What is the best practice for handling this when reading the Excel sheets as dataframes? I run into a problem when I pd.concat all three dataframes: I get more rows than expected.
I tried converting to int with astype() as well as removing white space with df.columns = df.columns.str.strip()
Let me know the best practice to resolve this.
Sorry, I had to post here about the issue I am facing. I get two lines instead of one for each comparison. When I looked at the resulting Excel sheet, some Ids are stored as text and some are not, so they are treated as two separate lines when the 3 dataframes are compared using pd.concat. Thanks.
Id Year Item sales_Amount1 sales_Amount2 target_Amount
1234 1.2019 Badam 0 70 100
1234 1.2019 Badam 12 0 0
1234 1.2019 carrot 0 0 200
1234 1.2019 carrot 18 0 0
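A possible approach, sketched under assumptions: the three frames below are invented stand-ins for the three sheets, and the comparison is assumed to be a groupby on Id, Year and Item after pd.concat. Note that df.columns.str.strip() only cleans the header names; the key values themselves have to be normalized to one dtype before the frames are combined.

```python
import pandas as pd

# Hypothetical stand-ins for the three sheets. In df1 the Id was read as
# text (with a stray space); in the other two it is an integer.
df1 = pd.DataFrame(
    {"Id": ["1234 "], "Year": [1.2019], "Item": ["Badam"], "sales_Amount1": [12]}
)
df2 = pd.DataFrame(
    {"Id": [1234], "Year": [1.2019], "Item": ["Badam"], "sales_Amount2": [70]}
)
df3 = pd.DataFrame(
    {"Id": [1234], "Year": [1.2019], "Item": ["Badam"], "target_Amount": [100]}
)


def normalize_id(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose Id column is a clean int64, whatever it was before."""
    out = df.copy()
    cleaned = out["Id"].astype(str).str.strip()
    out["Id"] = pd.to_numeric(cleaned).astype("int64")
    return out


combined = pd.concat(
    [normalize_id(d) for d in (df1, df2, df3)], ignore_index=True
)

# With a single Id dtype, the three partial rows collapse into one line.
summary = combined.groupby(["Id", "Year", "Item"], as_index=False).sum()
print(summary)
```

pd.to_numeric raises on values that are not numbers, which is usually preferable here to silently keeping a text Id that would again split the groups.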
I have been trying to create a windrose that displays the occurrence of multiple wind speeds and their respective wind directions. Using other very helpful posts on here I've gotten pretty close to what I want. There is just one thing I can't seem to fix.
As you can see in the figure below, the graph starts at 0 degrees, while I want the "North" wind direction to start at -11.25 (or +348.75) degrees.
Currently the radial axis labels are added using a pie chart while the rest of the data is plotted in a filled radar chart. It is easy to rotate the pie chart but I can't seem to find a similar function for rotating the radar chart. Any help would be much appreciated. The excel file is attached beneath the figure.
EDIT: Locked excel file against editing
Excel file
I haven't fully digested the netiquette of this website and am not sure whether it is a good idea to answer 6+ months after you posted. I also hope that by this time you have found an answer.
If not, this link should be of help:
https://superuser.com/questions/687036/how-to-make-a-pie-radar-chart
In the example the creator made one field for each degree and started the first series, which would be equivalent to your north at 0°. However nothing prevents you from starting at 348.
I have not tested but I also think that nothing prevents you from adding even more "resolution", e.g. half-degree steps.. or even more to your discretion.
EDIT: following L.Guthardt's feedback.
In order to provide you with an answer I opted to simplify your table and chart, mostly for convenience, but also because I struggle to fully understand the original "architecture". Still, the solution should work at any level and is based on two key elements:
First, you will have to double the number of rows from 16 to 32 (so each direction is repeated twice, e.g. ... nne - nne - ne - ne ...).
Second, you have to start and finish with N, as showcased here:
Direction Cat6
N 6
NNE 4 4
NNE 6
NE 4 4
NE 6
ENE 4 4
ENE 6
E 4 4
E 6
ESE 4 4
ESE 6
SE 4 4
SE 6
SSE 4 4
SSE 6
S 4 4
S 6
SSW 4 4
SSW 6
SW 4 4
SW 6
WSW 4 4
WSW 6
W 4 4
W 6
WNW 4 4
WNW 6
NW 4 4
NW 6
NNW 4 4
NNW 6
N 4 4
which will generate
For the pie chart I used a separate range with a blank label on every other row:
Direction Dummy
N 1
1
NNE 1
1
NE 1
1
ENE 1
1
E 1
1
ESE 1
1
SE 1
1
SSE 1
1
S 1
1
SSW 1
1
SW 1
1
WSW 1
1
W 1
1
WNW 1
1
NW 1
1
NNW 1
1
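The 32-row category order described above can also be generated rather than typed by hand. This is a small sketch of that ordering only (it does not build the chart): every direction is repeated twice and the list is shifted by one row, so it starts and ends with N. With 32 rows each row covers 360 / 32 = 11.25 degrees, which is why a one-row shift moves the start of the North sector to -11.25 degrees.

```python
DIRECTIONS = [
    "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
    "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW",
]

# Repeat each direction twice: N, N, NNE, NNE, ...
doubled = [d for d in DIRECTIONS for _ in range(2)]

# Move the first N to the end: N, NNE, NNE, ..., NNW, NNW, N
shifted = doubled[1:] + doubled[:1]

degrees_per_row = 360 / len(shifted)

for row in shifted:
    print(row)
print("degrees per row:", degrees_per_row)
```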
Rotating radar charts in Excel can be achieved by building a separate table for plotting the chart. It would have three columns:
Column A: New categories
Column B: Original categories (calculated from A)
Column C: Original data using VLOOKUP() on B
The chart will be plotted using columns B and C. Column B category numbers are offset by the desired number of categories.
If the chart needs to be rotated by other than multiples of a category degree (e.g., 30 degrees for 12 categories), you would need to add rows in between (corresponding to the amount of rotation in relation to the category degree). For example, to rotate a 12-category radar chart by multiples of 15 degrees, one extra row is needed in-between each original category row (to create 24 new categories). In this case, you would need to calculate the intermediate values by linearly interpolating between actual data points.
The trick is that blank category values are not displayed on the chart and the values for these categories blend in smoothly with the real data (because they are interpolated).
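The interpolation and offset steps can be sketched in Python as follows. This is only an illustration of the arithmetic, not the spreadsheet itself: the function names and the three-category sample data are made up, and the midpoint is the simple average of the two neighbouring values, wrapping around from the last category to the first.

```python
def with_midpoints(labels, values):
    """Insert one interpolated row (with a blank label) after every category."""
    n = len(values)
    new_labels, new_values = [], []
    for i in range(n):
        nxt = values[(i + 1) % n]  # wrap around: the last row pairs with the first
        new_labels += [labels[i], ""]
        new_values += [values[i], (values[i] + nxt) / 2]
    return new_labels, new_values


def rotate(rows, steps):
    """Shift rows forward by `steps` positions, wrapping around the end."""
    steps %= len(rows)
    return rows[-steps:] + rows[:-steps] if steps else list(rows)


# Made-up sample: 3 categories become 6 rows, so one step is half a category.
labels, values = with_midpoints(["A", "B", "C"], [0, 2, 4])
rotated_values = rotate(values, 1)

for label, value in zip(labels, rotated_values):
    print(repr(label), value)
```

For the 12-category case from the text, the same call would produce 24 rows, and each step of rotate corresponds to 15 degrees.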
I will post an example if the above is not clear enough.
P.S. I cannot look at your new Excel file (in Answers) because it exceeds 5 MB (see screenshot 1).
So I did keep working on this problem and the best solution I've come up with (while using Microsoft Excel) looks as follows:
Currently, the number of sectors in the plot is fixed at 16. If I want to make this number variable, the table for the plot data needs a very large number of lookup functions, which makes the spreadsheet too slow to work with.
I've uploaded the new Excel file here to take a look at:
Excel file