I am coding in Python 3.8.
I have two variables, x and y, which, when varied, give a different value of z. I would like to make a contour plot; however, I am struggling to find a way to build the M x N data other than making it manually in a CSV.
Sample data:
x = [4, 4, 2, 2, 6, 12, 4, 2]
y = [1, 4, 2, 15, 1, 4, 4, 1]
z = [100, 24, 54, 21, 24, 50, 29, 19]
How do I create a sorted matrix with x rows and y columns for my contour plot?
I have also just tried:
plt.contourf(x, y, z)
However, this does not give me the output I want.
I believe I need to use np.meshgrid in some way, but I cannot figure out how.
The dataset is a lot larger than this, and I would like to understand the best way to tackle this.
Thanks!
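A minimal sketch of one way to build the matrix, assuming each (x, y) pair is meant to be one point on a rectangular grid. Repeated pairs, such as (4, 4) in the sample, are averaged here, which is an assumption. The helper name to_grid is hypothetical. It uses np.unique to get the sorted grid coordinates and np.meshgrid to build the coordinate arrays that plt.contourf expects. Note that contourf wants Z with one row per y value and one column per x value, i.e. the transpose of "x rows, y columns":

```python
import numpy as np


def to_grid(x, y, z):
    """Pivot scattered (x, y, z) samples into X, Y, Z arrays for plt.contourf.

    Z has one row per sorted unique y value and one column per sorted unique
    x value, matching the arrays returned by np.meshgrid(xs, ys).
    Repeated (x, y) pairs are averaged (an assumption); cells with no sample are NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)

    # Sorted grid coordinates plus, for each sample, its column/row index.
    xs, col = np.unique(x, return_inverse=True)
    ys, row = np.unique(y, return_inverse=True)

    # Accumulate sums and counts per cell (np.add.at handles repeated indices).
    total = np.zeros((ys.size, xs.size))
    count = np.zeros((ys.size, xs.size))
    np.add.at(total, (row, col), z)
    np.add.at(count, (row, col), 1)

    # Empty cells are 0 / 0, which gives NaN; silence the warning for that case.
    with np.errstate(invalid="ignore"):
        Z = total / count

    X, Y = np.meshgrid(xs, ys)
    return X, Y, Z
```

With the sample data, X, Y, Z = to_grid(x, y, z) followed by plt.contourf(X, Y, Z) should draw the contours, with the NaN cells left blank. Data this sparse leaves most of the grid empty; if the real points are scattered rather than gridded, plt.tricontourf(x, y, z) works on the raw points by triangulating them, so no matrix is needed at all.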
I have some time series data (in a Pandas dataframe), d(t):
time 1 2 3 4 ... 99 100
d(t) 5 3 17 6 ... 23 78
I would like to get a time-shifted version of the data, e.g. d(t-1):
time 1 2 3 4 ... 99 100
d(t) 5 3 17 6 ... 23 78
d(t-1) NaN 5 3 17 6 ... 23
But with a complication. Instead of simply time-shifting the data, I need to take the expected value based on a Poisson-distributed shift. So instead of d(t-i), I need E(d(t-j)), where j ~ Poisson(i).
Is there an efficient way to do this in Python?
Ideally, I would be able to dynamically generate the result with i as a parameter (that I can use in an optimization).
numpy's Poisson functions seem to be about generating draws from a Poisson rather than giving a PMF that could be used to calculate expected value. If I could generate a PMF, I could do something like:
for idx in range(len(d)):
    Ed[idx] = np.multiply(d[idx::-1], PMF(Poisson, i)[:idx + 1]).sum()
But I have no idea what actual functions to use for this, or if there is an easier way than iterating over indices. This approach also won't easily let me optimize over i.
You can use scipy.stats.poisson to get the PMF.
Here's a sample:
from scipy.stats import poisson
mu = 10
# Declare 'rv' to be a poisson random variable with λ=mu
rv = poisson(mu)
# poisson.pmf(k) = (e⁻ᵐᵘ * muᵏ) / k!
print(rv.pmf(4))
For more information about scipy.stats.poisson, see its documentation.
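Building on that, here is a minimal sketch of the expected shift itself, assuming d is the series as a 1-D NumPy array (for example df["d"].to_numpy(), where the column name is hypothetical) sampled at consecutive time steps. It computes E[d(t - j)] with j ~ Poisson(mu) for every t at once as a causal convolution instead of a Python loop. Lags that reach back before the first sample are dropped, and by default the remaining weights are rescaled to sum to 1; that edge treatment is an assumption, not the only valid choice. The name poisson_shift and the normalize flag are hypothetical:

```python
import numpy as np
from scipy.stats import poisson


def poisson_shift(d, mu, normalize=True):
    """E[d(t - j)] with j ~ Poisson(mu), for every t in the series.

    Computes sum_{j=0}^{t} P(j; mu) * d[t - j] via a causal convolution.
    With normalize=True each entry is divided by P(j <= t), i.e. the weights
    of lags that fall before the start of the series are redistributed.
    """
    d = np.asarray(d, dtype=float)
    n = d.size
    lags = np.arange(n)
    weights = poisson.pmf(lags, mu)        # P(j = 0), P(j = 1), ..., P(j = n - 1)
    out = np.convolve(d, weights)[:n]      # out[t] = sum_j weights[j] * d[t - j]
    if normalize:
        # If mu is large relative to t, P(j <= t) can underflow to 0 and give NaN.
        out = out / poisson.cdf(lags, mu)
    return out
```

Because mu only enters through poisson.pmf and poisson.cdf, which accept any non-negative float, the function can be called inside an objective passed to an optimizer such as scipy.optimize.minimize_scalar. For large mu, the earliest entries are effectively undefined, much like the leading NaN in d(t-1).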
I have two dataframes:
df_schematic
layer x y
0 18 -10850.0 -6550.0
1 18 -10850.0 -5750.0
2 18 -10950.0 -5850.0
3 18 -10950.0 -5450.0
4 31 -10850.0 -5350.0
5 14 -10850.0 -4950.0
6 17 2945.5 6550.0
2278 rows × 3 columns
df_report
layer x y
0 18 9161.19 -3106.42
1 18 9141.51 -3185.38
2 18 9023.40 -3185.38
3 18 9003.71 -3106.42
4 18 8800.20 -2840.65
5 17 2945.8 6549.6
2216 rows × 3 columns
I am trying to compare df_schematic with df_report and find any missing or irregular values in the report. The main problem is the tolerance allowed for a coordinate.
For example:
17 2945.5 6550.0
and
17 2945.8 6549.6
are clearly not equal, but they should pass as a correct entry because the tolerance is +/-0.5.
Is there a way to find the missing values while keeping the tolerance in mind?
Experiment with np.isclose.
I mean the following scenario:
Write a function, say isClose, that compares one pair of coordinates (x1, y1) with
another pair (x2, y2) from two source rows, something like
np.isclose(x1, x2, rtol=0, atol=0.5) & np.isclose(y1, y2, rtol=0, atol=0.5).
(Pass rtol=0: the default rtol=1e-05 adds a relative term, so for coordinates around 10000 the effective tolerance would be about 0.6 rather than 0.5.)
Taking a row from df_schematic as a "base point":
find all rows in df_report with the same layer value,
for each such row, check isClose on the x and y coordinates of both rows,
until you find one for which the function returns True.
Repeat this procedure for each row of df_schematic. Any row for which no match is found is missing from the report or out of tolerance.
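A sketch of that procedure in pandas, assuming both frames have the layer, x and y columns shown in the question. Instead of looping, it pairs every schematic row with every report row on the same layer using merge (fine for a few thousand rows, although the number of pairs grows with the square of the rows per layer) and then applies the np.isclose test to all pairs at once. The function name find_missing is hypothetical:

```python
import numpy as np
import pandas as pd


def find_missing(df_schematic, df_report, tol=0.5):
    """Rows of df_schematic with no df_report row on the same layer within tol in x and y."""
    # Remember which schematic row each candidate pair came from.
    left = df_schematic.assign(schematic_row=df_schematic.index)
    # Pair every schematic row with every report row on the same layer.
    pairs = left.merge(df_report, on="layer", suffixes=("_s", "_r"))
    # Absolute tolerance only: rtol=0 so large coordinates do not widen the window.
    close = np.isclose(pairs["x_s"], pairs["x_r"], rtol=0, atol=tol) & np.isclose(
        pairs["y_s"], pairs["y_r"], rtol=0, atol=tol
    )
    matched = pairs.loc[close, "schematic_row"].unique()
    return df_schematic[~df_schematic.index.isin(matched)]
```

Swapping the arguments, find_missing(df_report, df_schematic), should list the report rows that have no schematic counterpart, i.e. the irregular entries.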
I have a dataset in a table format that looks like this:
test frequency
1 test40 3
2 test33 5
3 test19 2
4 test4521 1
5 test34 1
6 test27 3
7 test42 3
8 test35 1
....
If I use this command:
library(ggplot2)
ggplot(t, aes("frequency")) +
geom_histogram()
("t" is the name of my table)
Then RStudio says: "StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"?"
I just want to see how many times a 3 or a 5 etc. occurs.
Thanks for your help.
It looks like your data is already aggregated. If so, ggplot2::geom_histogram() might not be the right function for you. Have you tried the geom_col() function? It takes the numbers in the input data frame as they are and displays a column plot of them.
Using the code below
# Declare data frame
t <- data.frame(test = c("test40", "test33", "test19", "test4521",
"test34", "test27", "test42", "test35"),
frequency = c(3, 5, 2, 1,
1, 3, 3, 1))
creates a data frame like this:
# View data
print(t)
test frequency
1 test40 3
2 test33 5
3 test19 2
4 test4521 1
5 test34 1
6 test27 3
7 test42 3
8 test35 1
and therefore you can plot it like this
# Load package
library(ggplot2)
# Generate column plot
ggplot(t, aes(test, frequency)) +
geom_col()
If you simply want a count of the times that the number 2 or the number 3 occurs in your data frame, then yes, geom_histogram() is the correct function to use. The geom_histogram() function counts how often each value occurs in the data frame and plots the result. It has an internal check that looks at the type of data you are plotting on the x-axis; if that data is discrete, you need to pass the parameter stat="count" to the function. If you don't include this parameter, ggplot will try to bin your data to create the histogram, which is illogical because all you want is a count. (geom_bar() does the same thing without the extra argument, since its default stat is "count".)
Check out this link for a description of the difference between continuous and discrete data: What is the difference between discrete data and continuous data?
With this in mind, you can plot the histogram like this
# Generate histogram plot
ggplot(t, aes(frequency)) +
geom_histogram(stat="count")
I hope that helps mate.
I have data like the following:
x y f
1 1 1.2
1 2 1.4
1 3 1.6
3 1 3.2
3 2 3.4
3 3 3.6
5 1 5.2
5 2 5.4
5 3 5.6
If you insert a pivot chart, you can plot f against x and y as a line chart. The chart has two stacked x-axes: the lower axis shows 1 3 5 (the values of x), and the upper axis shows 1 2 3 (the values of y) for each value on the lower axis, representing x = 1 and y = 1 2 3, then x = 3 and y = 1 2 3, and then x = 5 and y = 1 2 3. The plot shows a single continuous line from left to right. What I would like is for the line to break when x changes value, so that there are three short lines showing the influence of y for each constant value of x.
This link makes a chart similar to what I'm describing in the answer. In terms of that figure, what I want is for the line to break every time the year changes. But neither the answer there nor the discussion gets at what I'm looking for. The only approach I can think of is to modify the PivotTable data by hand and add a row wherever the data should break. I tried something like that at work, but before modifying the table, I copied it as values to a separate location. With the new data table, I was not able to create the plot with two x-axes. If I could create that plot, I could put a second value in when y = 3 and set f to NA() for it, which should create the break in the proper location.
To break the line at each change of x, select each of the second and subsequent y = 1 data points (individually) and choose Format Data Point..., Line, No line.
(BTW IMO better suited to Super User.)
I have data that looks like this:
1000 13 75.2
1000 21 79.21
1000 29 80.02
5000 29 87.9
5000 37 88.54
5000 45 88.56
10000 29 90.11
10000 37 90.79
10000 45 90.87
I want to use the first column as x axis labels, the second column as y axis labels and the third column as the z values. I want to display a surface in that manner. What is the best way to do this? I tried Excel but didn't really get anywhere. Does anyone have any suggestions for a tool to do this? Does anyone know how to do this in Excel?
Thanks
I ended up using matplotlib :)
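from mpl_toolkits.mplot3d import Axes3D  # only needed on Matplotlib < 3.2 to register the '3d' projection
from matplotlib import cm
import matplotlib.pyplot as plt
x = [1000,1000,1000,1000,1000,5000,5000,5000,5000,5000,10000,10000,10000,10000,10000]
y = [13,21,29,37,45,13,21,29,37,45,13,21,29,37,45]
z = [75.2,79.21,80.02,81.2,81.62,84.79,87.38,87.9,88.54,88.56,88.34,89.66,90.11,90.79,90.87]
fig = plt.figure()
# fig.gca(projection='3d') no longer works in Matplotlib >= 3.6; add_subplot is the supported way
ax = fig.add_subplot(projection='3d')
# plot_trisurf triangulates the (x, y) points, so the data does not need to be a regular grid
ax.plot_trisurf(x, y, z, cmap=cm.jet, linewidth=0.2)
plt.show()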
In Excel, you really can't display three columns of data as a 'surface'. Having only one column of 'Z' data gives you a line in three-dimensional space, not a surface (or, in the case of your data, three separate lines). For Excel to be able to work with this data, it needs to be formatted as shown below:
       13     21     29     37     45
1000   75.2
1000          79.21
1000                 80.02
5000                 87.9
5000                        88.54
5000                               88.56
10000                90.11
10000                       90.79
10000                              90.87
Then, to get an actual surface, you would need to fill in all the missing cells with the appropriate Z-values. If you don't have those, then you are better off showing this as 3 separate 2D lines, because there isn't enough data for a surface.
The best 3D representation that Excel will give you of the above data is pretty confusing.
Representing this limited dataset as 2D data (one line per x value) might be a better choice.
As a note for future reference, these types of questions usually do a little better on superuser.com.
You can use R packages for 3D plotting.
The steps are:
First, create a data frame with the data.frame() function.
Create a 3D plot with the scatterplot3d package.
Or you can rotate your chart interactively with the plot3d() function from the rgl package.
Alternatively, you can use the plot3d() command from the Rcmdr package.
In MATLAB, you can use the surf(), mesh() or surfl() functions, depending on your needs.
[http://in.mathworks.com/help/matlab/examples/creating-3-d-plots.html]
You can also use Gnuplot, which is also available from gretl. Put your x y z data in a text file (here test.txt) and enter the following command:
splot 'test.txt' using 1:2:3 with points palette pointsize 3 pointtype 7
Then you can set labels, etc. using the commands below, followed by replot to redraw the plot with the new settings:
set xlabel "xxx" rotate parallel
set ylabel "yyy" rotate parallel
set zlabel "zzz" rotate parallel
set grid
show grid
unset key
replot
Why not merge the rows that contain the same values?
       13     21     29     37     45
1000   75.2   79.21  80.02
5000                 87.9   88.54  88.56
10000                90.11  90.79  90.87
Excel can use that pretty well.
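In pandas, the merged-row layout suggested in the last answer (one row per x value, one column per y value) is what DataFrame.pivot produces. A minimal sketch using the nine rows from the question; the variable names are illustrative:

```python
import pandas as pd

# The nine (x, y, z) rows from the question.
rows = [
    (1000, 13, 75.2), (1000, 21, 79.21), (1000, 29, 80.02),
    (5000, 29, 87.9), (5000, 37, 88.54), (5000, 45, 88.56),
    (10000, 29, 90.11), (10000, 37, 90.79), (10000, 45, 90.87),
]
df = pd.DataFrame(rows, columns=["x", "y", "z"])

# One row per x value, one column per y value; (x, y) pairs with no
# measurement become NaN, i.e. the empty cells of the merged layout.
grid = df.pivot(index="x", columns="y", values="z")
print(grid)
```

grid.to_csv("grid.csv") would write this table in a form Excel can open, with the NaN cells written as empty fields, which are the missing cells the earlier Excel answer says you would need to fill in before a true surface is possible.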