I have three tables stored as named ranges.
The user picks which range to search through using a drop-down box. The named ranges are Table1, Table2 and Table3.
Table1
      0.7   0.8   0.9
50    1.08  1.06  1.04
70    1.08  1.06  1.05
95    1.08  1.07  1.05
120   1.09  1.07  1.05
Table2
      0.7   0.8   0.9
16    1.06  1.04  1.03
25    1.06  1.05  1.03
35    1.06  1.05  1.03
Table3
      0.7   0.8   0.9
50    1.21  1.16  1.11
70    1.22  1.16  1.12
95    1.22  1.16  1.12
120   1.22  1.16  1.12
Then they pick a value from the header row, and a value from the first column.
i.e. the user picks Table3, 0.8 and 95. My formula should return 1.16.
I am halfway there using INDIRECT (with Table1); however, I need to extract the header row and the first column so I can use something like
=INDEX(INDIRECT(pickedtable),MATCH(picked colref,INDIRECT(pickedtable:1)), MATCH(picked rowref,INDIRECT(1:pickedtable)))
Any idea how to achieve this?
INDIRECT(pickedtable) should work OK to get the table, but to get the first column or row from that table you can use INDEX with it, so, following your original approach, this formula should work:
=INDEX(INDIRECT(pickedtable),MATCH(pickedcolref,INDEX(INDIRECT(pickedtable),0,1),0),MATCH(pickedrowref,INDEX(INDIRECT(pickedtable),1,0),0))
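As a worked check against your own example (Table3, 0.8 from the header row and 95 from the first column, i.e. pickedcolref = 95 and pickedrowref = 0.8 the way this formula uses them), and assuming each named range includes the blank top-left corner cell so the labels are part of the range, it resolves like this:
MATCH(95,INDEX(Table3,0,1),0)   = 4     (95 is the 4th entry of the first column)
MATCH(0.8,INDEX(Table3,1,0),0)  = 3     (0.8 is the 3rd entry of the first row)
INDEX(Table3,4,3)               = 1.16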
Alternatively, you can shorten it with HLOOKUP or VLOOKUP, as per chris neilsen's approach, e.g. with VLOOKUP:
=VLOOKUP(pickedcolref,INDIRECT(pickedtable),MATCH(pickedrowref,INDEX(INDIRECT(pickedtable),1,0),0))
Try this
=HLOOKUP(pickedcolref,
    IF(pickedtable=1,Table1,IF(pickedtable=2,Table2,IF(pickedtable=3,Table3,""))),
    MATCH(pickedrowref,
        OFFSET(
            IF(pickedtable=1,Table1,IF(pickedtable=2,Table2,IF(pickedtable=3,Table3,""))),
            0,0,,1),
        0),
    FALSE)
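For the same example, reading the arguments the way this formula uses them (pickedtable = 3, pickedcolref = 0.8 from the header row, pickedrowref = 95 from the first column) and again assuming the named ranges include the blank corner cell, it resolves roughly as:
the nested IF(pickedtable=...) chain   = Table3
MATCH(95,OFFSET(Table3,0,0,,1),0)      = 4      (OFFSET(Table3,0,0,,1) is the first column of the range)
HLOOKUP(0.8,Table3,4,FALSE)            = 1.16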
I parsed a table from a website using Selenium (by xpath), then used pd.read_html on the table element, and now I'm left with what looks like a list that makes up the table. It looks like this:
[Empty DataFrame
Columns: [Symbol, Expiration, Strike, Last, Open, High, Low, Change, Volume]
Index: [], Symbol Expiration Strike Last Open High Low Change Volume
0 XPEV Dec20 12/18/2020 46.5 3.40 3.00 5.05 2.49 1.08 696.0
1 XPEV Dec20 12/18/2020 47.0 3.15 3.10 4.80 2.00 1.02 2359.0
2 XPEV Dec20 12/18/2020 47.5 2.80 2.67 4.50 1.89 0.91 2231.0
3 XPEV Dec20 12/18/2020 48.0 2.51 2.50 4.29 1.66 0.85 3887.0
4 XPEV Dec20 12/18/2020 48.5 2.22 2.34 3.80 1.51 0.72 2862.0
5 XPEV Dec20 12/18/2020 49.0 1.84 2.00 3.55 1.34 0.49 4382.0
6 XPEV Dec20 12/18/2020 50.0 1.36 1.76 3.10 1.02 0.30 14578.0
7 XPEV Dec20 12/18/2020 51.0 1.14 1.26 2.62 0.78 0.31 4429.0
8 XPEV Dec20 12/18/2020 52.0 0.85 0.95 2.20 0.62 0.19 2775.0
9 XPEV Dec20 12/18/2020 53.0 0.63 0.79 1.85 0.50 0.13 1542.0]
How do I turn this into an actual dataframe, with the "Symbol, Expiration, etc..." as the header, and the far left column as the index?
I've been trying several different things, but to no avail. Where I left off was trying:
# From reading the html of the table step
dfs = pd.read_html(table.get_attribute('outerHTML'))
dfs = pd.DataFrame(dfs)
... and when I print the new dfs, I get this:
0 Empty DataFrame
Columns: [Symbol, Expiration, ...
1 Symbol Expiration Strike Last Open ...
Per the pandas.read_html docs:
This function will always return a list of DataFrame or it will fail, e.g., it will not return an empty list.
According to your list output, the non-empty DataFrame is the second element in that list, so retrieve it by indexing (remember Python uses zero as the first index of iterables). Do note that you can use DataFrames while they are stored in lists or dicts:
dfs[1].head()
dfs[1].tail()
dfs[1].describe()
...
single_df = dfs[1].copy()
del dfs
Or index on the same call:
single_df = pd.read_html(...)[1]
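The header part is already handled for you: dfs[1] has Symbol, Expiration, etc. as its columns, and the far-left column in your printout is just the default RangeIndex. If you want a real column as the index instead, set_index does that. A minimal sketch, assuming table is the same Selenium element as in your question and the page still yields the two-element list shown above:
import pandas as pd

dfs = pd.read_html(table.get_attribute('outerHTML'))  # 'table' is the Selenium element from the question
df = dfs[1]                  # the second list element holds the populated table
# 'Symbol', 'Expiration', ... are already the column header at this point.
# The far-left integer column is the default RangeIndex; to use an actual
# column as the index instead, e.g. Strike:
df = df.set_index('Strike')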
I am trying to create a column that contains, for each row, the minimum of a few columns. For example:
A0 A1 A2 B0 B1 B2 C0 C1
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72
Here I am trying to create a column which contains the minimum for each row of columns B0, B1, B2.
The output would look like this:
A0 A1 A2 B0 B1 B2 C0 C1 Minimum
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75 0.42
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39 0.39
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73 0.00
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03 0.51
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61 0.51
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53 0.17
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72 0.01
Here is part of the code, but it is not doing what I want it to do:
for i in range(0, 2):
    df['Minimum'] = df.loc[0, 'B' + str(i)].min()
This is a one-liner; you just need to use the axis argument of min to tell it to work across the columns rather than down:
df['Minimum'] = df.loc[:, ['B0', 'B1', 'B2']].min(axis=1)
If you need to use this solution for different numbers of columns, you can use a for loop or list comprehension to construct the list of columns:
n_columns = 3  # B0, B1 and B2 for the example above
cols_to_use = ['B' + str(i) for i in range(n_columns)]
df['Minimum'] = df.loc[:, cols_to_use].min(axis=1)
For my tasks, a universal and flexible approach is the following example:
df['Minimum'] = df[['B0', 'B1', 'B2']].apply(lambda x: min(x[0],x[1],x[2]), axis=1)
The target column 'Minimum' is assigned the result of the lambda function applied to the selected DataFrame columns ['B0', 'B1', 'B2']. Inside the function you access the elements through the lambda's argument and its new index (if there is more than one element). Be sure to specify axis=1, which indicates row-by-row calculation.
This is very convenient when you need to make complex calculations.
However, I assume that such a solution may be inferior in speed.
As for the selection of columns, in addition to the 'for' method, I can suggest using a filter like this:
cols_to_use = list(filter(lambda f: 'B' in f, df.columns))
Literally, a filter is applied to the list of DataFrame columns through a lambda function that checks for the occurrence of the letter 'B'.
After that, the first example can be written as follows:
cols_to_use = list(filter(lambda f: 'B' in f, df.columns))
df['Minimum'] = df[cols_to_use].apply(lambda x: min(x), axis=1)
although, after pre-selecting the columns, this would be preferable:
df['Minimum'] = df[cols_to_use].min(axis=1)
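As another option for the column selection (not from the answers above, just a suggestion): pandas' own DataFrame.filter can pick the columns by name pattern. A minimal sketch on a hypothetical frame with the question's column layout:
import numpy as np
import pandas as pd

# hypothetical frame with the same columns as the question
df = pd.DataFrame(np.random.rand(7, 8),
                  columns=['A0', 'A1', 'A2', 'B0', 'B1', 'B2', 'C0', 'C1'])

# select every column whose name starts with 'B', then take the row-wise minimum
df['Minimum'] = df.filter(regex=r'^B').min(axis=1)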
I have the following data.
x y
0.00 0.00
0.03 1.74
0.05 2.60
0.08 3.04
0.11 3.47
0.13 3.90
0.16 4.33
0.19 4.59
0.21 4.76
0.20 3.90
0.18 3.12
0.18 2.60
0.16 2.17
0.15 1.73
0.13 1.47
0.12 1.21
0.14 2.60
0.17 3.47
0.21 3.90
0.23 4.33
0.26 4.76
0.28 5.19
0.31 5.45
0.33 5.62
0.37 5.79
0.38 5.97
0.42 6.14
0.44 6.22
0.47 6.31
0.49 6.39
0.51 6.48
I used =MAX()/2 to obtain half of the maximum y value, which in this case is 3.24.
The value 3.24 does not exist among the y values, but it falls between 3.04 and 3.47.
How can I find the address of these 2 cells?
Note: The half-maximum is also crossed on the other part of the graph, but I only require the first instance.
Assuming your data is in columns A and B, with a header in row 1 (first numbers in row 2), and assuming your =MAX()/2 formula is in D2:
Use AGGREGATE to determine the first row where the Y value exceeds the value in D2, then do it again and subtract 1 from the row.
=AGGREGATE(15,6,ROW($B$2:$B$32)/(B2:B32>D2),1)
That will return row number 6, the first occurrence where the Y value exceeds the value in D2.
=AGGREGATE(15,6,ROW($B$2:$B$32)/(B2:B32>D2),1)-1
That will give you row number 5.
Use the row numbers in conjunction with INDEX and you can pull the X values:
=INDEX(A:A,AGGREGATE(15,6,ROW($B$2:$B$32)/(B2:B32>D2),1)-1)
=INDEX(A:A,AGGREGATE(15,6,ROW($B$2:$B$32)/(B2:B32>D2),1))
That will give you the X values. If you want the corresponding Y values, simply change the INDEX lookup range from A:A to B:B:
=INDEX(B:B,AGGREGATE(15,6,ROW($B$2:$B$32)/(B2:B32>D2),1)-1)
=INDEX(B:B,AGGREGATE(15,6,ROW($B$2:$B$32)/(B2:B32>D2),1))
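A brief note on why this works, sketched against the sample data (assuming it sits in A2:B32 with the half-maximum 3.24 in D2): AGGREGATE(15,6,array,1) is SMALL, ignoring errors, so the division trick throws away the non-qualifying rows.
ROW($B$2:$B$32)/(B2:B32>D2)   →  {#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;6;7;...}   (rows with y <= 3.24 divide by FALSE and become errors)
AGGREGATE(15,6,...,1)         →  6    the smallest surviving row number (B6 = 3.47, the first y above 3.24)
AGGREGATE(15,6,...,1)-1       →  5    the row just before it (B5 = 3.04)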
My data looks like this:
2015-08-01 07:00 0.23 0.52 0.00 0.52 9 14.6 14.6 14.6 67 8.5 0.0 --- 0.00 0.0 --- 14.6 14.1 14.1 16.3 1016.2 0.00 0.0 156 0.22 156 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.05 23 1 100.0 1 1.8797836153192153 660.7143449269239
2015-08-01 07:01 0.25 0.53 0.00 0.53 0 14.6 14.6 14.6 67 8.5 0.0 --- 0.00 0.0 --- 14.6 14.1 14.1 16.3 1016.2 0.00 0.0 153 0.22 153 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.00 23 1 100.0 1 1.8894284951616422 657.3416264126714 105 73 121 163
2015-08-01 07:02 0.25 0.52 0.00 0.52 0 14.7 14.7 14.6 67 8.6 0.0 --- 0.00 0.0 --- 14.7 14.2 14.2 16.1 1016.2 0.00 0.0 139 0.20 139 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.00 24 1 100.0 1 1.8976360559992214 654.4985251906015
2015-08-01 07:03 0.26 0.53 0.00 0.53 0 14.7 14.7 14.7 67 8.6 0.0 --- 0.00 0.0 --- 14.7 14.2 14.2 16.1 1016.3 0.00 0.0 139 0.20 144 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.00 23 1 100.0 1 1.9047561611790007 652.0519661851259
2015-08-01 07:04 0.25 0.53 0.00 0.53 0 14.7 14.7 14.7 67 8.7 0.0 --- 0.00 0.0 --- 14.7 14.2 14.2 16.2 1016.3 0.00 0.0 141 0.20 141 0.0 0.00 0.0 0.003 0.000 23.9 39 9.1 23.4 0.00 24 1 100.0 1 1.903537153899393 652.4695341279602
2015-08-01 07:05 0.25 0.52 0.00 0.52 0 14.8 14.8 14.7 67 8.7 0.0 --- 0.00 0.0 --- 14.8 14.3 14.3 16.3 1016.3 0.00 0.0 148 0.21 148 0.0 0.00 0.0 0.002 0.000 23.9 39 9.1 23.4 0.00 23 1 100.0 1 1.897596925383499 654.5120216976508
........
........
I've got multiple files looking that way, so I have data from 2015-08-01, 2015-06-05 and so on.
I want to plot the 43rd column in relation to the 3rd and 25th columns :-) in some kind of heat-map style from all those files in ONE plot. These are the values (columns 3, 25 and 43 of each row) I want to pick out of each file:
0.23 156 660.7143449269239
0.25 153 660.7143449269239
0.25 139 654.4985251906015
0.26 139 652.0519661851259
I got the format right through dgrid3d, and this is my output so far:
Here's my code:
set dgrid3d
set grid
set palette model HSV defined ( 0 0 1 1, 1 1 1 1 )
set pm3d map
unset surf
set pm3d at b
splot "data_AIT_lvl1_20150604.csv" every ::121::600 using 3:25:43 lc palette title '{/Symbol l}average 20150604',\
"data1.csv" every ::121::361 using 3:25:43 lc palette title '{/Symbol l}average 20150605',\
"data2" every ::121::361 using 3:25:43 lc palette title '{/Symbol l}average 20150606',\
"data3.csv" every ::121::361 using 3:25:43 lc palette title '{/Symbol l}average 20150703',\
and so on for multiple files.
I like the output, but I'd like to know if there's a way to improve the overlapping areas in the plot to distinguish the values better. Is there a gnuplot way to write all the data I want to plot from each file into one big table and plot the data from that table as a heat map? I tried a few things but somehow lost track of all my trial-and-error steps, so I thought maybe one of you could help me out with a clean approach to this.
Thanks for the answers so far. I'm trying my best to specify my second question a bit more:
Right now I have the values of multiple days plotted in the graph. It looks good, but there are parts overlapping, so I can't see the values (hue) of all the days in the plot.
Since, in my experience, I tend to overcomplicate problems like this a bit, I decided to ask whether there's a way to solve that.
I thought that maybe by putting all the days into one big table, all the data would be plotted on one level, so I'd get a simple colored heat map.
I tried Joce's table solution, which works flawlessly, but Joce was right: it didn't actually solve my problem.
As you can see, there's now a huge block of data with different colors, but you can't distinguish between the different days. Also, the gap from the first picture (between the big purple block on the left and the centered orange block) is gone and has melted into one big block.
So I think what I'm trying to ask is whether there's another, better way, maybe with contour, to get what I want.
What you ask for is
set table
set output "one_big_table"
splot "file1" using c1:c2:c3:..., \
"file2" using C1:C2:C3:...., \
...
unset table
This will create as many blocks as you have files, so I am not sure your final goal will be so easy to achieve. That's a different issue though.
I have some data organized like this in a spreadsheet
c1 c2 c3 c4 c5 c6 c7 c8 c9
3137EACY3 FHLMC 0.75 14 11/25/14 Q414 -3.5 -3.5 2YR -13.6 0.26
3135G0HG1 FNMA 0.375 15 03/16/15 Q115 2.4 2.4 2YR -11.4 0.32
3135G0KM4 FNMA 0.5 15 05/27/15 Q215 3.5 3.5 2YR -13 0.33
31359MZC0 FNMA 4.375 15 10/15/15 Q315 13.1 13.1 2YR -9.9 0.43
31359MH89 FNMA 5 16 03/15/16 Q415 5.7 5.7 3YR -5.7 0.55
3137EADQ9 FHLMC 0.5 16 05/13/16 Q116 1 1 3YR -14.5 0.5
3135G0XP3 FNMA 0.375 16 07/05/16 Q216 10.7 10.7 3YR -8.6 0.6
31359M2D4 FNMA 4.875 16 12/15/16 Q316 21.4 21.4 3YR -9 0.71
3137EADC0 FHLMC 1 17 03/08/17 Q416 31.5 31.5 3YR -5.9 0.81
3137EADF3 FHLMC 1.25 17 05/12/17 Q117 -14.6 -14.6 5YR -5.5 0.86
3137EAAY5 FHLMC 5.5 17 08/23/17 Q217 -10.5 -10.5 5YR -7.3 0.9
3135G0RT2 FNMA 0.875 17 12/20/17 Q317 7 7 5YR -1.5 1.08
3137EADP1 FHLMC 0.875 18 03/07/18 Q417 13.1 13.1 5YR -1.3 1.14
3137EABP3 FHLMC 4.875 18 06/13/18 Q118 8.8 8.8 5YR -10 1.09
3137EACA5 FHLMC 3.75 19 03/27/19 Q218 39.4 39.4 5YR -0.7 1.4
And in another spreadsheet, I have some data organized like this:
i1 i2 i3 i4
EG8566960 EIB 4.75 10/15/14 10/15/14 Q414
500769AX2 KFW 4.125 10/15/14 10/15/14 Q414
045167BJ1 ASIA 4.25 10/20/14 10/20/14 Q414
298785FT8 EIB 0.875 12/15/14 12/15/14 Q414
500769ET7 KFW 1 01/12/15 01/12/15 Q115
EI1571062 CADES 2.875 03/02/15 03/02/15 Q115
XS0213706 EUROF 4.5 03/06/15 Q115
676167AQ2 OKB 4.5 03/09/15 Q115
XS0495091 NEDWBK 3 03/17/15 Q115
I'd like to write a VLOOKUP() that gets the value of c6 when i4 and c4 match up, but I'm having some trouble and can't figure out why I'm getting a #N/A error. Here's what I have written in my vlookup:
=VLOOKUP(D7,'Sheet2'!A:I, 7, FALSE)
Where D7 is where Q414 lies in my first spreadsheet. Does anyone have any suggestions as to why I'm getting this error? I feel like I've tried just about everything I can find online. When I look at the calculation steps, it goes from:
VLOOKUP("Q414",'Sheet2'!A:I,7,FALSE)
with the entire function underlined to:
#N/A
So I know that it is properly selecting "Q414" at least...
Any help is appreciated.
You have to start the table_array at the column containing the value you're looking up; in your case:
=VLOOKUP(D7,'Sheet2'!G:I, 3, FALSE)
G is the column where Q414 is found, hence the table array starts at G. Column I is then the third column counting G itself as the first, which is why the column index is 3.
Note that VLOOKUP only returns the first match, but I'm not sure exactly what you're doing, so :)
EDIT: There was a little misunderstanding: columns c4 and c6 were actually in columns D and F, respectively. The formula, as barry houdini rightly pointed out, is thus:
=VLOOKUP(D7,'Sheet2'!D:F,3,FALSE)
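As a quick sanity check against the sample data (assuming, per the edit, that the c-table lives on Sheet2 with c4 in column D and c6 in column F, and that D7 holds "Q414"):
=VLOOKUP("Q414",'Sheet2'!D:F,3,FALSE)   =  -3.5   (the c6 value of the first Q414 row, the 3137EACY3 bond)
As noted above, it stops at the first Q414 match it finds.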