Find the shortest paths between connected nodes (csv files) - python-3.x

I'm trying to find the longest shortest path(s) between two counties. I was given two .txt files: one with all of the nodes (county ID, population, latitude and longitude, and commuters inside the county) and one with the links (source county, destination county, distance, number of commuters).
01001 43671 32.523283 -86.577176 7871
01003 140415 30.592781 -87.748260 45208
01005 29038 31.856515 -85.331312 8370
01007 20826 33.040054 -87.123243 3199
01009 51024 33.978461 -86.554768 8966
01011 11714 32.098285 -85.704915 2237
01013 21399 31.735884 -86.662232 5708
01015 112249 33.741989 -85.817544 39856
01017 36583 32.891233 -85.288745 9281
01019 23988 34.184158 -85.621930 4645
01021 39593 32.852554 -86.689982 8115
01023 15922 32.027681 -88.257855 3472
01025 27867 31.688155 -87.834164 7705
...
01001 01001 0 7871
01001 01007 76.8615966430995 7
01001 01013 87.9182871130127 37
01001 01015 152.858742124667 5
01001 01021 38.1039665382023 350
01001 01031 140.051395101308 8
01001 01037 57.6726084645634 12
01001 01047 48.517875245493 585
01001 01051 38.9559472915165 741
01001 01053 169.524277177911 5
01001 01059 245.323879285783 7
01001 01065 102.775324022097 2
01001 01073 114.124721221283 142
...
01003 48439 932.019063970525 9
01003 53033 3478.13978129133 11
01003 54081 997.783781484149 10
01005 01005 0.000134258785931453 8370
01005 01011 44.3219329413987 72
01005 01021 168.973302699063 7
...
The first file with the nodes is called "THE_NODES.txt" and the second is "THE_LINKS.txt".
How would I use Python code to find the longest shortest path(s) between any two counties? I assume I start by building a graph of the network, and since the second file has the connections, I'd use 'THE_LINKS.txt' for the edges (I don't know whether the weights should be the distances?). Also, I think these files can only be read as CSV (correct me if I'm wrong), so I can't (or don't know how to) use networkx for this problem.

You can use pandas' read_table function to read the .txt files. The sample data above is whitespace-separated (not '|'-separated), so pass a whitespace regex as the separator:
nodes = pd.read_table('THE_NODES.txt', sep=r'\s+', header=None)
links = pd.read_table('THE_LINKS.txt', sep=r'\s+', header=None)
Then you need to find the rows for the counties in question (please refer to this link: How to select rows from a DataFrame based on column values?) and calculate the distance between them.
What have you tried so far? Include that too.
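To answer the actual question (the longest shortest path, i.e. the weighted diameter of the graph), networkx works fine once the files are loaded with pandas. A minimal sketch, assuming whitespace-separated files with the column layout shown above (the column names here are made up):

import pandas as pd
import networkx as nx

links = pd.read_table('THE_LINKS.txt', sep=r'\s+', header=None,
                      names=['src', 'dst', 'distance', 'commuters'],
                      dtype={'src': str, 'dst': str})  # keep leading zeros in county IDs

# Build the graph, using the distance column as the edge weight
G = nx.from_pandas_edgelist(links, source='src', target='dst', edge_attr='distance')

# All-pairs weighted shortest path lengths; the largest of them is the answer
lengths = dict(nx.all_pairs_dijkstra_path_length(G, weight='distance'))
longest = max((dist, u, v)
              for u, targets in lengths.items()
              for v, dist in targets.items())
print(longest)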

Related

type error in functions to run point in polygon query on RAPIDS

I want to create a point-in-polygon query for 14 million NYC taxi trips and find out in which of the 263 taxi zones each trip was located.
I want to run the code on RAPIDS cuspatial. I read a few forums and posts and came across a cuspatial limitation that queries can only be performed on 32 polygons in each run. So I did the following to split my polygons into batches.
This is my taxi zone polygon file:
cusptaxizone
(0 0
1 1
2 34
3 35
4 36
...
258 348
259 349
260 350
261 351
262 353
Name: f_pos, Length: 263, dtype: int32,
0 0
1 232
2 1113
3 1121
4 1137
...
349 97690
350 97962
351 98032
352 98114
353 98144
Name: r_pos, Length: 354, dtype: int32,
x y
0 933100.918353 192536.085697
1 932771.395560 191317.004138
2 932693.871591 191245.031174
3 932566.381345 191150.211914
4 932326.317026 190934.311748
... ... ...
98187 996215.756543 221620.885314
98188 996078.332519 221372.066989
98189 996698.728091 221027.461362
98190 997355.264443 220664.404123
98191 997493.322715 220912.386162
[98192 rows x 2 columns])
There are 263 polygons/taxi zones in total - I want to do the queries in batches of 24 polygons, 11 iterations in all.
def create_iterations(start, end, batches):
    iterations = list(np.arange(start, end, batches))
    iterations.append(end)
    return iterations

pip_iterations = create_iterations(0, 264, 24)

# loop to do point-in-polygon queries on a table
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
    cuda_df['borough'] = " "
    for i in range(len(iter_batch) - 1):
        start = pip_iterations[i]
        end = pip_iterations[i + 1]
        pip = cuspatial.point_in_polygon(
            cuda_df['pickup_longitude'], cuda_df['pickup_latitude'],
            cuspatial_data[0][start:end],  # poly_offsets
            cuspatial_data[1],             # poly_ring_offsets
            cuspatial_data[2]['x'],        # poly_points_x
            cuspatial_data[2]['y'])        # poly_points_y
        for i in pip.columns:
            cuda_df['borough'].loc[pip[i]] = polygon_name[i]
    return cuda_df
When I ran the function I received a TypeError. What might be causing the issue?
pip_pickup = perform_pip(cutaxi, cusptaxizone, pip_iterations)
TypeError: perform_pip() missing 1 required positional argument: 'iter_batch'
It seems you are passing cutaxi for cuda_df, cusptaxizone for cuspatial_data, and pip_iterations for the polygon_name parameter of the perform_pip function. No value is passed for the iter_batch parameter defined in perform_pip:
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
Hence you get the above error, which states that iter_batch is missing. As stated in the comment above as well, you are not passing the right number of arguments to perform_pip.
If you edit your code to pass the right number of arguments to perform_pip, the error
TypeError: perform_pip() missing 1 required positional argument: 'iter_batch'
will be resolved.
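For completeness, a corrected call passes all four arguments. A sketch, where zone_names is a hypothetical sequence mapping polygon column indices to zone labels (substitute whatever you intend to use for polygon_name):

# zone_names is hypothetical; it should map polygon indices to zone labels
pip_pickup = perform_pip(cutaxi, cusptaxizone, zone_names, pip_iterations)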

How to create a dataframe with different random numbers on each column?

I'm trying to generate different random numbers for each column, but every column keeps coming out the same. How can I fix it, using one line?
CODE:
yuju = pd.DataFrame()
column_price_x = [random.uniform(65.5, 140.5) for i in range(20)]
for i in range(1990, 2020):
    yuju[i] = column_price_x
yuju
RESULT: every column contains the same 20 values (screenshot omitted).
EXPECTED: different values in each column.
How can I deal with it?
It's much easier than you think.
In [12]: import numpy as np
In [13]: df = pd.DataFrame(np.random.rand(5,5))
In [14]: df
Out[14]:
0 1 2 3 4
0 0.463645 0.818606 0.520964 0.016413 0.286529
1 0.701693 0.556813 0.352911 0.738017 0.148805
2 0.899378 0.626350 0.821576 0.917648 0.404706
3 0.985617 0.336138 0.443910 0.690457 0.627859
4 0.121281 0.784853 0.799065 0.102332 0.156317
np.random.rand samples from the standard uniform distribution (over [0, 1]).
Edit
If you want a uniform distribution over a given range, use np.random.uniform:
In [16]: pd.DataFrame(np.random.uniform(low=65.5, high=140.5, size=(5,5)))
Out[16]:
0 1 2 3 4
0 124.356069 96.718934 100.587485 136.670313 124.134073
1 68.109675 105.677037 86.084935 109.284336 108.393333
2 120.445978 125.036895 92.557137 105.864824 95.297450
3 91.027931 140.040051 94.362951 80.870850 70.106912
4 107.404708 92.472469 84.748544 82.116756 129.313166
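Combining the two, the exact shape and labels from the question fit in one line; a sketch, assuming you want 20 rows and one column per year from 1990 through 2019:

import numpy as np
import pandas as pd

# each of the 30 year-columns is sampled independently
yuju = pd.DataFrame(np.random.uniform(65.5, 140.5, size=(20, 30)),
                    columns=range(1990, 2020))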
Here is the solution: on each iteration you should sample again to assign new values to each column.
yuju = pd.DataFrame()
for i in range(1990, 2020):
    yuju[i] = [random.uniform(65.5, 140.5) for _ in range(20)]
yuju
Output:
1990 1991 1992 1993 1994 1995 1996 1997 ...
0 73.117785 104.158470 76.704672 136.295814 106.008801 88.129275 96.843800 118.172649 ... 106.08
1 77.146977 131.584449 112.781430 113.071448 118.806880 140.301281 132.196554 136.222878 ... 74.85
2 67.976294 90.571586 137.313729 126.388545 134.941530 119.544528 119.692859 124.883332 ... 82.48
3 76.577618 102.765745 137.014399 84.696234 70.087628 86.180974 121.070030 87.991356 ... 71.67
4 104.675987 134.869611 120.221701 69.652423 105.650834 107.308007 122.372708 80.037225 ... 90.58
5 107.093326 124.649323 138.961846 84.312784 98.964176 87.691698 120.426266 79.888018 ... 97.46
6 97.375159 97.607740 119.027947 77.545403 81.365235 119.204719 75.426836 132.545121 ... 120.15
7 81.099338 94.315767 123.389789 85.734648 134.746295 99.196135 65.963834 72.895016 ... 135.63
8 129.577824 118.482358 137.838454 83.338883 68.603851 138.657750 85.155046 73.311065 ... 91.12
9 129.321333 134.598491 138.810883 119.487502 75.794849 125.314185 118.499014 126.969947 ... 74.86
10 122.704160 118.282868 114.196318 69.668442 112.237553 68.953530 115.395672 114.560736 ... 88.21
11 112.653109 109.635751 78.470715 81.973892 111.413094 76.918852 76.318205 129.423737 ... 103.06
12 80.984595 136.170595 83.258407 112.248942 96.730922 84.922575 104.984614 127.646325 ... 103.24
13 82.658896 97.066191 95.096705 107.757428 93.767250 93.958438 115.113325 98.931509 ... 105.32
14 85.173060 77.257117 72.668875 87.061919 130.088992 80.001858 104.526423 85.237558 ... 87.86
15 68.428850 79.948204 107.060400 92.962859 133.393354 93.806838 99.258857 138.314982 ... 86.80
16 115.105281 110.567551 119.868457 139.482290 103.235046 128.805920 140.131489 107.568099 ... 98.16
17 71.318147 119.965667 97.135972 90.174975 125.738171 115.655945 86.333461 114.574965 ... 134.80
18 134.000260 121.417473 104.832999 129.277671 139.932955 122.623911 92.369881 109.523118 ... 137.47
19 104.444951 111.712214 130.602922 119.446700 88.256841 110.316280 74.611164 88.364896 ... 115.32

Parsing error when reading a specific Pajek (NET) file with Networkx into Jupyter

I am trying to read this Pajek file in Google Colab's version of Jupyter, and I get an error when executing the following very simple code:
J = nx.MultiDiGraph()
J=nx.read_pajek("/content/data/graphdatasets/jazz.net")
print(nx.info(J))
The error is the following:
/usr/local/lib/python3.6/dist-packages/networkx/readwrite/pajek.py in parse_pajek(lines)
211 except AttributeError:
212 splitline = shlex.split(str(l))
--> 213 id, label = splitline[0:2]
214 labels.append(label)
215 G.add_node(label)
ValueError: not enough values to unpack (expected 2, got 1)
With pip show networkx, I see that I'm running networkx version 2.3. Am I doing something wrong in the code?
Update: Pasting below the file's first few lines:
*Vertices 198
*Arcs
*Edges
1 8 1
1 24 1
1 35 1
1 42 1
1 46 1
1 60 1
1 74 1
1 78 1
According to the Pajek definition, the first two lines of your file do not follow the standard. After *Vertices n, n lines with details about the vertices are expected. In addition, having both *Arcs and *Edges is a duplication: NetworkX uses these markers to pick the graph type for an edge list, building a MultiDiGraph for a list that starts with *arcs and a MultiGraph for *edges (see the current code). To resolve your problem, you only need to delete the first two lines of your .net file.
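If you would rather not edit the file by hand, a small sketch that drops those two lines before parsing (using the path from the question):

import networkx as nx

with open('/content/data/graphdatasets/jazz.net') as f:
    lines = f.readlines()

# drop '*Vertices 198' and '*Arcs'; parsing then starts at '*Edges'
J = nx.parse_pajek(lines[2:])
print(nx.info(J))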

How to handle such errors?

companies = pd.read_csv("http://www.richard-muir.com/data/public/csv/CompaniesRevenueEmployees.csv", index_col = 0)
companies.head()
I'm getting this error; please suggest what approaches should be tried.
'utf-8' codec can't decode byte 0xb7 in position 7
Try encoding as 'latin1' (this worked on macOS):
companies = pd.read_csv("http://www.richardmuir.com/data/public/csv/CompaniesRevenueEmployees.csv",
index_col=0,
encoding='latin1')
Downloading the file and opening it in Notepad++ shows it is ANSI-encoded. If you are on a Windows system, this should fix it:
import pandas as pd
url = "http://www.richard-muir.com/data/public/csv/CompaniesRevenueEmployees.csv"
companies = pd.read_csv(url, index_col = 0, encoding='ansi')
print(companies)
If you are not on Windows, you need to research how to convert ANSI-encoded text to something you can read.
See: https://docs.python.org/3/library/codecs.html#standard-encodings
Output:
Name Industry \
0 Walmart Retail
1 Sinopec Group Oil and gas
2 China National Petroleum Corporation Oil and gas
... ... ...
47 Hewlett Packard Enterprise Electronics
48 Tata Group Conglomerate
Revenue (USD billions) Employees
0 482 2200000
1 455 358571
2 428 1636532
... ... ...
47 111 302000
48 108 600000
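A note on portability: outside Windows, Python has no 'ansi' codec (it is a Windows-only alias for the active ANSI code page, which is normally cp1252 on Western-locale systems). So, assuming the file really is cp1252-encoded, this sketch should behave the same on any platform:

import pandas as pd

url = "http://www.richard-muir.com/data/public/csv/CompaniesRevenueEmployees.csv"
# cp1252 is what 'ansi' usually resolves to on Western-locale Windows
companies = pd.read_csv(url, index_col=0, encoding='cp1252')
print(companies)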

Export a matrix to Excel

I made a matrix and I want to export it to Excel. The matrix looks like this:
1 2 3 4 5 6 7
2 0.4069264
3 0.5142857 0.2948718
4 0.3939394 0.4098639 0.3772894
5 0.3476190 0.3717949 0.3194444 0.5824176
6 0.2809524 0.3974359 0.2222222 0.3388278 0.3974359
7 0.2809524 0.5987654 0.3933333 0.4188713 0.4711538 0.3429487
8 0.4675325 0.4855072 0.4523810 0.4917184 0.3409091 0.4318182 0.4128788
9 0.3896104 0.5189594 0.4404762 0.2667549 0.5471429 0.3604762 0.3081502
10 0.4242424 0.4068878 0.3484432 0.2708333 0.4766484 0.3740842 0.4528219
11 0.3476190 0.3942308 0.2881944 0.3228022 0.4711538 0.2147436 0.3653846
12 0.6060606 0.3949830 0.2971612 0.3541667 0.5022894 0.3484432 0.4466490
13 0.4675325 0.5972222 0.6060606 0.3670635 0.4393939 0.3939394 0.3695652
14 0.4978355 0.4951499 0.4480952 0.4713404 0.3814286 0.3147619 0.4629121
15 0.4632035 0.4033883 0.4508929 0.3081502 0.4728571 0.3528571 0.4828571
16 0.3766234 0.5173993 0.4771825 0.4734432 0.5114286 0.3514286 0.4214286
17 0.3939394 0.5289116 0.3260073 0.3333333 0.5663919 0.2330586 0.3015873
18 0.3939394 0.3708791 0.2837302 0.4102564 0.3392857 0.2559524 0.4123810
19 0.3160173 0.5727041 0.4885531 0.3056973 0.4725275 0.3827839 0.3346561
20 0.3333333 0.5793651 0.4257143 0.4876543 0.4390476 0.2390476 0.3131868
21 0.5281385 0.3762755 0.4052198 0.2997449 0.4180403 0.2898352 0.4951499
22 0.3593074 0.3784014 0.4075092 0.2423469 0.4908425 0.3113553 0.3430335
23 0.5281385 0.5875850 0.4404762 0.4634354 0.6071429 0.3763736 0.3747795
24 0.3549784 0.6252381 0.5957341 0.4328571 0.4429563 0.4429563 0.3422619
25 0.4242424 0.4931973 0.5054945 0.2142857 0.4670330 0.4285714 0.4312169
26 0.3852814 0.5671769 0.4954212 0.4073129 0.3736264 0.4890110 0.4523810
27 0.5238095 0.3269558 0.5187729 0.4051871 0.5412088 0.5155678 0.5859788
28 0.3160173 0.1904762 0.3205128 0.3384354 0.3429487 0.3173077 0.5123457
29 0.2380952 0.4468537 0.5196886 0.4536565 0.4491758 0.4491758 0.4634039
30 0.4545455 0.4295635 0.4080087 0.4791667 0.3474026 0.3019481 0.4627329
31 0.2857143 0.3988095 0.3397436 0.3443878 0.4294872 0.2756410 0.3456790
32 0.3636364 0.3027211 0.3772894 0.3452381 0.4413919 0.3388278 0.3818342
33 0.3333333 0.4482402 0.4080087 0.4275362 0.2888199 0.4047619 0.4301242
34 0.5411255 0.4825680 0.4043040 0.4417517 0.4748168 0.3850733 0.3708113
35 0.3160173 0.5476190 0.4230769 0.3979592 0.3653846 0.3397436 0.2283951
36 0.4603175 0.4653209 0.4778912 0.5170807 0.3928571 0.4508282 0.4254658
37 0.3939394 0.1955782 0.2490842 0.4047619 0.2490842 0.3516484 0.4559083
38 0.3463203 0.4660494 0.4300000 0.4157848 0.3833333 0.2233333 0.2788462
39 0.5844156 0.4668367 0.3809524 0.3843537 0.4803114 0.3008242 0.5026455
40 0.5454545 0.4902211 0.3740842 0.2946429 0.5279304 0.2971612 0.3293651
41 0.5800866 0.3758503 0.5073260 0.5136054 0.3598901 0.5393773 0.4823633
42 0.4458874 0.3937390 0.3785714 0.4686949 0.3768315 0.3127289 0.4954212
43 0.6536797 0.5740741 0.5533333 0.4453263 0.4866667 0.5400000 0.4358974
44 0.5887446 0.5548469 0.4308608 0.3949830 0.5462454 0.3411172 0.5136684
45 0.4069264 0.4357993 0.4308608 0.3830782 0.4308608 0.3795788 0.4025573
46 0.5974026 0.3826531 0.3672161 0.3954082 0.4441392 0.3159341 0.5141093
47 0.2554113 0.4196429 0.4262821 0.4961735 0.2788462 0.3301282 0.3055556
I tried the command:
WriteXLS("my matrix after i converted it to data.frame", "test.xls")
but I got this error:
The Perl script 'WriteXLS.pl' failed to run successfully.
I googled it but I couldn't find a solution.
Thanks in advance.
Any reason why you can't just use write.csv?
write.csv(mymatrix, "test.csv")
Import it in Excel and you're set!
PS: I assume you're not putting quotes around your variable name in the WriteXLS call, right?
One other option on Windows (which seems a reasonable assumption given that you are using Excel):
You can write a matrix (or data frame) to the clipboard using a command like:
write.table(mymat, 'clipboard', sep='\t')
Then just go into Excel, click in the cell that you want to be the top left cell, then do a paste and your matrix is there (the sep='\t' is important for Excel to interpret it correctly).
This is similar to other answers, but you don't need an intermediate file on disk.
You could also check the xlsx package if you do not mind the Excel 2007 format; xlsx does not depend on Perl (though it does depend on rJava).
After loading the package via library(xlsx), just try the following:
write.xlsx(USArrests, "/usarrests.xlsx")
It's hard to see what is going on here exactly; it might be several things.
I think the easiest way to write a matrix to Excel is by using write.table() and importing the data into Excel. It takes an extra step, but it also keeps your data in a nice format.
If foo is your matrix:
write.table(foo, "foo.txt")
If you get an error, maybe try coercing the object to a matrix:
write.table(as.matrix(foo), "foo.txt")
Does the matrix contain values in the upper triangle as well? Perhaps making a full matrix works:
foo <- foo + t(foo)
write.table(as.matrix(foo),"foo.txt")
But these are all just random shots in the dark since I don't have a matrix to work with.
EDIT: In response to the other answer, you can remove the column and row names with col.names=FALSE and row.names=FALSE in both write.table() and write.csv() (which are the same function with different default values).
I met the same problem after reinstalling Strawberry Perl. After debugging the WriteXLS function in R, I found that the Perl module Text::CSV_XS was missing from my fresh new install. I installed this module from the DOS command line:
perl -MCPAN -e shell
install Text::CSV_XS
After this, WriteXLS was working fine.
upper  # matrix name
write.xlsx2(upper, file = "File.xlsx", sheetName = "Sheetname",
            col.names = TRUE, row.names = TRUE, append = TRUE, showNA = TRUE)
