I'm trying to create a DiGraph from a CSV file containing an adjacency matrix:
Gdf = pd.read_csv("outputtest.csv", index_col=0)  # load the csv into a pandas dataframe
G = nx.from_pandas_adjacency(Gdf, create_using=nx.DiGraph())  # turn the dataframe into a DiGraph
However, it comes out as a Graph, not a DiGraph:
[in]: print(nx.info(G))
[out]:Name:
Type: Graph
...
Please help me fix it.
This is my first time using Python, I have no experience with any kind of coding, and my English is quite poor.
I've already tried searching the internet and found a question similar to mine:
How to create a directed networkx graph from a pandas adjacency matrix dataframe?
but its answer uses numpy and requires resetting the labels, which may not suit my case, since I have far more than 4 nodes.
This was a bug, fixed in Oct 2017 (https://github.com/networkx/networkx/pull/2693).
Please upgrade to NetworkX 2.1 or the latest GitHub version to get the fixed code.
:)
from Dan Schult
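Once upgraded, a quick sanity check confirms the fix; the tiny DataFrame below is a stand-in for the real outputtest.csv, with made-up labels:

```python
import networkx as nx
import pandas as pd

# Small stand-in for the CSV adjacency matrix; node labels are arbitrary.
Gdf = pd.DataFrame([[0, 1], [0, 0]], index=["a", "b"], columns=["a", "b"])

# On NetworkX >= 2.1, create_using is honoured and a DiGraph comes back.
G = nx.from_pandas_adjacency(Gdf, create_using=nx.DiGraph())

print(type(G).__name__)  # DiGraph
print(G.is_directed())   # True
```

The asymmetric matrix (an edge a→b but not b→a) only survives in a directed graph, so this also checks the edge direction is kept.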
I'm using Styler in pandas to display a DataFrame containing a timestamp in a Jupyter notebook.
The displayed value, 1623838447949609984, turned out to be different from the input, 1623838447949609899.
pandas version: 1.4.2.
Can someone please explain the reason for the following code and output?
Thanks.
import pandas as pd
pd.DataFrame([[1623838447949609899]]).style
Pandas Styler, within its render script, contains the line return f"{x:,.0f}" when x is an integer.
In python if you execute
>>> "{:.0f}".format(1623838447949609899)
'1623838447949609984'
you obtain the result you cite. The cause is that formatting an integer with an 'f' spec first converts it to a 64-bit float, whose 53-bit mantissa cannot represent integers above 2**53 exactly, so the value is rounded. This is a property of float conversion, not of Styler.
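The lost digits come from the float conversion alone; plain Python reproduces it without pandas:

```python
# Formatting an int with an 'f' spec goes through a 64-bit float,
# whose 53-bit mantissa cannot hold this ~61-bit integer exactly.
x = 1623838447949609899

print(float(x))            # rounded to the nearest representable float
print("{:.0f}".format(x))  # 1623838447949609984
print(x > 2**53)           # True: beyond the exact integer range of float64
```

Any integer above 2**53 may shift like this when it passes through a float, which is exactly what the Styler's number formatting does.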
I have a network diagram that is sketched in Visio. I would like to use it as input for a networkx graph for the node2vec Python package. The documentation says that there is a function called to_networkx_graph() that takes, as its input, the following types of data:
"any NetworkX graph; dict-of-dicts; dict-of-lists; container (e.g. set, list, tuple) of edges; iterator (e.g. itertools.chain) that produces edges; generator of edges; Pandas DataFrame (row per edge); numpy matrix; numpy ndarray; scipy sparse matrix; pygraphviz agraph"
But, still, there is no mention of other formats like Visio, PDF, ODG, PowerPoint, etc.
So, how should I proceed?
I think you need to create data in one of the formats referred to in the documentation, not just a network diagram. A Visio diagram will not do the job, and I know of no way to do the conversion.
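One practical route is to transcribe the diagram's edges by hand (or export them from Visio as text) and give networkx a plain edge container; the node names below are invented for illustration:

```python
import networkx as nx

# Hypothetical edges read off the Visio diagram by hand.
edges = [
    ("router1", "switch1"),
    ("switch1", "server1"),
    ("switch1", "server2"),
]

# A container of edges is one of the accepted input types.
G = nx.Graph(edges)
print(G.number_of_nodes(), G.number_of_edges())  # 4 3
```

Once the edges exist as text, the resulting graph can be passed to node2vec like any other networkx graph.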
I want to plot a multipartite graph using networkx. However, when adding more nodes, the plot becomes very crowded. Is there a way to have more space between nodes and partitions?
Looking at the documentation of multipartite_layout, I couldn't find parameters for this.
Of course, one could create complicated formulas for the positions, but since the spacing of multipartite_layout already looks so good for small graphs, I was wondering how to scale this to bigger graphs.
Does anyone have an idea how to do this (efficiently)?
Sample code, generating a graph with three partitions:
import matplotlib.pyplot as plt
import networkx as nx

# build graph:
G = nx.Graph()
for i in range(0, 30):
    G.add_node(i, layer=0)
for i in range(30, 50):
    G.add_node(i, layer=1)
    for j in range(0, 30):
        G.add_edge(i, j)
G.add_node(100, layer=2)
G.add_edge(40, 100)

# plot graph
pos = nx.multipartite_layout(G, subset_key="layer")
plt.figure(figsize=(20, 8))
nx.draw(G, pos, with_labels=False)
plt.axis("equal")
plt.show()
The current, crowded plot:
nx.multipartite_layout returns a dictionary of the form {node: array([x, y])}.
I suggest you try pos = {p: array_op(pos[p]) for p in pos}, where array_op is a function acting on the position of each node, array([x, y]).
In your case, I think a simple scaling along the x-axis suffices, i.e.
array_op = lambda p: np.array([sx * p[0], p[1]]), with sx the stretch factor.
For visualization purposes, I guess this should be equivalent to @JPM's comment. However, this approach gives you the advantage of having the actual transformed position data.
In the end, if such a uniform transformation does not satisfy your needs, you can always manipulate the positions manually, with knowledge of the dict format (although it might be less efficient).
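Putting this together on a small two-layer graph (the stretch factor sx is arbitrary, pick whatever decrowds your plot):

```python
import networkx as nx
import numpy as np

# Minimal two-layer graph standing in for the larger one in the question.
G = nx.Graph()
G.add_nodes_from([0, 1], layer=0)
G.add_nodes_from([2, 3], layer=1)
G.add_edges_from([(0, 2), (1, 3)])

pos = nx.multipartite_layout(G, subset_key="layer")

sx = 3.0  # stretch factor along the x-axis
array_op = lambda p: np.array([sx * p[0], p[1]])
pos = {n: array_op(xy) for n, xy in pos.items()}
```

The transformed pos dict can then be passed straight to nx.draw as before; only the spacing between partitions changes, not the layout's structure.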
In Python 3.6 I have imported a netCDF4 file containing global precipitation values. I have also imported a shapefile containing the shape of the Colorado River basin. My goal is to read/extract precipitation data only within my shapefile. I have looked up multiple examples, but none have really helped.
Here is my code so far:
from netCDF4 import Dataset
import numpy as np
import geopandas as gpd
nc = Dataset('filename.nc')
long = nc.variables['lon'][:]
lati = nc.variables['lat'][:]
rainfall = nc.variables['precip'][:]
shapefile = gpd.read_file('filename.shp')
There are no error messages on the code above.
Oh look, a hydrologist in the house! ;)
Well, so far you haven't done much with your code; all it does is read the files into memory.
When I was trying to perform the same analysis (only with GRIB files), I found a great Python library for exactly this purpose, called RasterStats.
It supports working with ndarray raster objects as well as most of the GDAL-supported raster file types (which should include netCDF!), and it generates exactly what you want.
For more, see its very neat manual, and let me know if you get stuck somewhere!
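Under the hood, zonal statistics boils down to averaging the raster cells that fall inside the polygon. A toy numpy-only sketch of the idea, with invented values and a lon/lat bounding box standing in for the real basin shapefile:

```python
import numpy as np

# Toy 'precipitation' grid and coordinate axes (all values invented).
lon = np.linspace(-120, -100, 21)
lat = np.linspace(30, 45, 16)
rain = np.full((lat.size, lon.size), 5.0)  # uniform 5 mm for clarity

# Stand-in 'basin': a bounding box instead of a real shapefile geometry.
lon2d, lat2d = np.meshgrid(lon, lat)
mask = (lon2d >= -115) & (lon2d <= -105) & (lat2d >= 35) & (lat2d <= 40)

basin_mean = rain[mask].mean()
print(basin_mean)  # 5.0
```

RasterStats does the same thing with the actual shapefile geometry (and proper cell/polygon intersection), which is why it is the better tool for the real basin outline.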
I am using rolling mean on my data to smoothen it. My data can be found here.
An illustration of my original data is shown below.
Currently, I am using
import pandas as pd
import numpy as np
data = pd.read_excel('data.xlsx')
data = np.array(data, dtype=float)
window_length = 9
res = pd.rolling_mean(np.array(data[:, 2]), window_length, min_periods=1, center=True)
This is what I get after applying the rolling mean with a window_length of 9:
And when I increase the window_length to 20, I get a smoother image, but at the boundaries the data seems to be erroneous.
The problem is, as seen in the figures above, the rolling mean introduces some sort of errors at the boundaries of my data which do not exist in the original data.
Is there any way to correct this?
My guess is that at the boundary, since part of the window falls outside my data, the mean is distorted.
Is there a way to correct this error using the pandas rolling mean, or is there a better pythonic way of doing this? Thanks.
PS. I am aware that the pandas rolling-mean function I am using is deprecated in the new version.
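For reference, the deprecated pd.rolling_mean call maps onto the current rolling API like this (same window, min_periods and centering; the short series stands in for data[:, 2]):

```python
import pandas as pd

data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # stand-in for data[:, 2]
window_length = 3

res = data.rolling(window_length, min_periods=1, center=True).mean()

# min_periods=1 means the edges average over whatever part of the
# window overlaps the data, which is the boundary effect described above.
print(res.tolist())  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

The first and last values are means of only two points, which is exactly why the edges look off: the shrinking window changes what is being averaged there.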
You can try a native 2D convolution method such as scipy.ndimage.convolve, making the kernel an averaging (mean) function.
The weights would be:
n = 3  # size of kernel over which to calculate the mean
weights = np.ones((n, n)) / n**2
If the white areas of your data are represented by NaNs, this would reduce the footprint of the result by n, since any kernel stamp that includes a NaN will return NaN. If this is really an issue, have a look at astropy.convolution, which has better NaN handling.
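A minimal sketch of the mean-kernel convolution; mode='nearest' repeats the edge values, which is one way to tame the boundary effect (the 5x5 ramp is invented demo data):

```python
import numpy as np
from scipy.ndimage import convolve

n = 3
weights = np.ones((n, n)) / n**2  # averaging (mean) kernel

data = np.arange(25, dtype=float).reshape(5, 5)  # demo data
smoothed = convolve(data, weights, mode='nearest')

# Interior values equal the plain 3x3 neighbourhood mean.
print(smoothed[2, 2])  # 12.0
```

The mode parameter controls how the boundary is padded (nearest, reflect, constant, ...), which is precisely the knob the asker's pandas version lacked.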