I am rearranging 2D coordinates into an aligned order; they were not aligned before (the coordinates were shuffled).
I have the following input coordinates:
X = [2, 2, 3, 4, 4, 4, 4, 5, 6, 6, 6, 6, 6, 5, 4, 3, 5, 5, 5]
Y = [2, 3, 3, 3, 4, 5, 6, 6, 6, 5, 4, 3, 2, 2, 2, 2, 3, 4, 5]
I have to make them aligned, so I first applied the sorted function to these coordinates and got the output below.
merged_list1 = sorted(zip(X, Y))
Output:
X1_coordinate_reformed = [2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6]
Y1_coordinate_reformed = [2, 3, 2, 3, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6]
It is still not aligned properly. I want consecutive nodes to be placed next to each other, so I am taking the approach of finding the coordinate nearest to the origin as the very first node, then finding the coordinate nearest to that node, and so on. For that, I applied the code below.
First I wrote a function which calculates the distance and returns the index of the nearest coordinate in the list.
def solve(pts, pt):
    x, y = pt
    idx = -1
    smallest = float("inf")
    for p in pts:
        if p[0] == x or p[1] == y:
            # Manhattan distance to the current point
            dist = abs(x - p[0]) + abs(y - p[1])
            if dist < smallest:
                idx = pts.index(p)
                smallest = dist
            elif dist == smallest:
                if pts.index(p) < idx:
                    idx = pts.index(p)
                    smallest = dist
    return idx
coor2 = list(zip(X1_coordinate_reformed, Y1_coordinate_reformed))  # make a list which contains tuples of X and Y coordinates
pts2 = coor2.copy()
origin1 = (0, 0)
new_coor1 = []
for i in range(len(pts2)):
    pt = origin1
    index_num1 = solve(pts2, pt)
    print('index is', index_num1)
    origin1 = pts2[index_num1]
    new_coor1.append(pts2[index_num1])
    del pts2[index_num1]
After running the code, I got the output below:
[(6, 6), (5, 6), (4, 6), (4, 5), (4, 4), (4, 3), (3, 3), (2, 3), (2, 2), (3, 2), (4, 2), (5, 2), (5, 3), (5, 4), (5, 5), (6, 5), (6, 4), (6, 3), (6, 2)]
This is not correct, because it can clearly be seen that, given
coor2 = [(2, 2), (2, 3), (3, 2), (3, 3), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]
origin = (0, 0)
if we compute the distance from the origin, which was (0, 0) at the very first step, to every coordinate in the coor2 list above, the nearest coordinate is (2, 2). So how come my code gives (6, 6) as the nearest coordinate?
The interesting thing is, if I apply the same procedure (sorting followed by finding the nearest coordinate) to the coordinates below,
X2_coordinate = [2, 4, 4, 2, 3, 2, 4, 3, 1, 3, 4, 3, 1, 2, 0, 3, 4, 2, 0]
Y2_coordinate = [3, 4, 2, 1, 3, 2, 1, 0, 0, 2, 3, 4, 1, 4, 0, 1, 0, 0, 1]
after applying the sorted function I get
X2_coordinate_reformed = [0, 0, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]
Y2_coordinate_reformed = [0, 1, 0, 1, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
and after applying the nearest-coordinate search described above, the result I get is
[(0, 0), (0, 1), (1, 1), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (3, 4), (3, 3), (3, 2), (3, 1), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
Kindly suggest where I am going wrong and what I should change.
It is better to use scipy's KDTree for finding the closest coordinate. The code given below works:
from scipy import spatial
import numpy as np

pts = merged_list1.copy()
origin = (0, 0)
origin = np.array(origin)
new_coordi = []
for i in range(len(pts)):
    x = origin
    distance, index = spatial.KDTree(pts).query(x)
    new_coordi.append(pts[index])
    origin = np.array(pts[index])
    del pts[index]
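If it helps, the same greedy walk can be wrapped in a small helper so it is reusable; this is just a sketch of the approach above (greedy_order is an illustrative name, not a scipy function), and it still rebuilds the KDTree on the remaining points at every step:

from scipy import spatial
import numpy as np

def greedy_order(points, start=(0, 0)):
    # Repeatedly pick the remaining point closest to the current position,
    # starting from `start`, and return the points in visiting order.
    remaining = list(points)
    current = np.array(start)
    ordered = []
    while remaining:
        _, index = spatial.KDTree(remaining).query(current)
        current = np.array(remaining[index])
        ordered.append(remaining.pop(index))
    return ordered

new_coordi = greedy_order(merged_list1)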
I need to pair the uid from the uid column with each of the uids in the list in the friends column, as shown in the following example:
Given a pandas.DataFrame object A:
uid friends
0 1 [10, 2, 1, 5]
1 2 [1, 2]
2 3 [5, 4]
3 4 [10, 5]
4 5 [1, 2, 5]
the desired output is:
uid friends in_edges
0 1 [10, 2, 1, 5] [(1, 10), (1, 2), (1, 1), (1, 5)]
1 2 [1, 2] [(2, 1), (2, 2)]
2 3 [5, 4] [(3, 5), (3, 4)]
3 4 [10, 5] [(4, 10), (4, 5)]
4 5 [1, 2, 5] [(5, 1), (5, 2), (5, 5)]
I use the following code to achieve this outcome:
import numpy as np
import pandas as pd
A = pd.DataFrame(dict(uid=[1, 2, 3, 4, 5], friends=[[10, 2, 1, 5], [1, 2], [5, 4], [10, 5], [1, 2, 5]]))
A.loc[:, 'in_edges'] = A.loc[:, 'uid'].apply(lambda uid: [(uid, f) for f in A.loc[A.loc[:, 'uid']==uid, 'friends'].values[0]])
but the A.loc[A.loc[:, 'uid']==uid, 'friends'] part looks kind of cumbersome to me, so I wondered if there is an easier way to accomplish this task?
Thanks in advance.
You can use .apply() with the axis=1 parameter:
df["in_edges"] = df[["uid", "friends"]].apply(
lambda x: [(x["uid"], f) for f in x["friends"]], axis=1
)
print(df)
Prints:
uid friends in_edges
0 1 [10, 2, 1, 5] [(1, 10), (1, 2), (1, 1), (1, 5)]
1 2 [1, 2] [(2, 1), (2, 2)]
2 3 [5, 4] [(3, 5), (3, 4)]
3 4 [10, 5] [(4, 10), (4, 5)]
4 5 [1, 2, 5] [(5, 1), (5, 2), (5, 5)]
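If you prefer to avoid apply altogether, a plain list comprehension over the two columns should produce the same column (a sketch of the same idea, not benchmarked against apply):

df["in_edges"] = [
    [(uid, f) for f in friends]
    for uid, friends in zip(df["uid"], df["friends"])
]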
Why not try itertools.product?
import itertools
A['in_edges'] = A.apply(lambda x: [*itertools.product([x['uid']], x['friends'])], axis=1)
A
Out[50]:
uid friends in_edges
0 1 [10, 2, 1, 5] [(1, 10), (1, 2), (1, 1), (1, 5)]
1 2 [1, 2] [(2, 1), (2, 2)]
2 3 [5, 4] [(3, 5), (3, 4)]
3 4 [10, 5] [(4, 10), (4, 5)]
4 5 [1, 2, 5] [(5, 1), (5, 2), (5, 5)]
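For reference, itertools.product simply takes the Cartesian product of the one-element list [x['uid']] with the friends list, so for the first row it behaves roughly like this:

import itertools

list(itertools.product([1], [10, 2, 1, 5]))
# [(1, 10), (1, 2), (1, 1), (1, 5)]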
I'm working on a ML project for which I'm using numpy arrays instead of pandas for faster computation.
When I intend to bootstrap, I wish to subset the columns from a numpy ndarray.
My numpy array looks like this:
np_arr =
[(187., 14.45 , 20.22, 94.49)
(284., 10.44 , 15.46, 66.62)
(415., 11.13 , 22.44, 71.49)]
And I want to index columns 1 and 3.
I have my columns stored in a list as ix = [1,3]
However, when I try to do np_arr[:, ix] I get an error saying too many indices for array.
I also realised that when I print np_arr.shape I only get (3,), whereas I probably want (3,4).
Could you please tell me how to fix this issue?
Thanks!
Edit:
I'm creating my numpy object from my pandas dataframe like this:
def _to_numpy(self, data):
    v = data.reset_index()
    np_res = np.rec.fromrecords(v, names=v.columns.tolist())
    return np_res
The reason for your issue is that your np_arr is a 1-D array. Share your code snippet as well so that the exact issue can be looked into. But in general, when dealing with 2-D numpy arrays, we do this:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
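With a genuinely 2-D array like that, the indexing from your question works as expected; a quick check using the same ix = [1, 3]:

import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
ix = [1, 3]
print(a.shape)   # (3, 4)
print(a[:, ix])  # columns 1 and 3
# [[ 2  4]
#  [ 6  8]
#  [10 12]]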
You have created a record array (also called a structured array). The result is a 1d array with named columns (fields).
To illustrate:
In [426]: df = pd.DataFrame(np.arange(12).reshape(4,3), columns=['A','B','C'])
In [427]: df
Out[427]:
A B C
0 0 1 2
1 3 4 5
2 6 7 8
3 9 10 11
In [428]: arr = df.to_records()
In [429]: arr
Out[429]:
rec.array([(0, 0, 1, 2), (1, 3, 4, 5), (2, 6, 7, 8), (3, 9, 10, 11)],
dtype=[('index', '<i8'), ('A', '<i8'), ('B', '<i8'), ('C', '<i8')])
In [430]: arr['A']
Out[430]: array([0, 3, 6, 9])
In [431]: arr.shape
Out[431]: (4,)
I believe to_records has a parameter to eliminate the index field.
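If I remember correctly that parameter is index, so something along these lines should drop it:

arr = df.to_records(index=False)  # no 'index' field in the resulting record array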
Or with your method:
In [432]: arr = np.rec.fromrecords(df, names=df.columns.tolist())
In [433]: arr
Out[433]:
rec.array([(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)],
dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<i8')])
In [434]: arr['A'] # arr.A also works
Out[434]: array([0, 3, 6, 9])
In [435]: arr.shape
Out[435]: (4,)
And multifield access:
In [436]: arr[['A','C']]
Out[436]:
rec.array([(0, 2), (3, 5), (6, 8), (9, 11)],
dtype={'names':['A','C'], 'formats':['<i8','<i8'], 'offsets':[0,16], 'itemsize':24})
Note that the str display of this array
In [437]: print(arr)
[(0, 1, 2) (3, 4, 5) (6, 7, 8) (9, 10, 11)]
shows a list of tuples, just as your np_arr. Each tuple is a 'record'. The repr display shows the dtype as well.
You can't have it both ways: either access columns by name, or make a regular numpy array and access columns by number. The named/record access makes most sense when the columns are a mix of dtypes - string, int, float. If they are all float, and you want to do calculations across columns, it's better to use the numeric dtype.
In [438]: arr = df.to_numpy()
In [439]: arr
Out[439]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
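From there, positional column indexing like the one in your question should work; a minimal continuation of the example above, with ix playing the role of your ix = [1, 3]:

ix = [0, 2]      # column positions in this 3-column example
print(arr[:, ix])
# [[ 0  2]
#  [ 3  5]
#  [ 6  8]
#  [ 9 11]]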
According to the more_itertools.windowed specification, you can do:
list(windowed(seq=[1, 2, 3, 4], n=2, step=1))
>>> [(1, 2), (2, 3), (3, 4)]
But what if I want it to run all the way to the end? Is it possible to get:
>>> [(1, 2), (2, 3), (3, 4), (4, None)]
A workaround, though not the best solution, is to append None to the sequence:
list(windowed(seq=[1, 2, 3, 4, None], n=2, step=1))
I believe you can do this programmatically based on the step= value, which I refer to as win_step in the following code. I also removed hardcoding where possible to make it easier to test various sequence_list, win_width, and win_step data sets:
from more_itertools import windowed

sequence_list = [1, 2, 3, 4]
win_width = 2
win_step = 1

none_list = []
for i in range(win_step):
    none_list.append(None)
sequence_list.extend(none_list)

tuple_list = list(windowed(seq=sequence_list, n=win_width, step=win_step))
print('tuple_list:', tuple_list)
Here are my results based on your original question's data set, and on the current data set:
For the original data set, where:
sequence_list = [1, 2, 3, 4, 5, 6]
win_width = 3
win_step = 2
The result is:
tuple_list: [(1, 2, 3), (3, 4, 5), (5, 6, None), (None, None, None)]
And for the present data set, where:
sequence_list = [1, 2, 3, 4]
win_width = 2
win_step = 1
The result is:
tuple_list: [(1, 2), (2, 3), (3, 4), (4, None)]
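For the specific n=2, step=1 case in the question, zipping the sequence against its one-step shift with itertools.zip_longest gives the same trailing (4, None) pair without mutating the original list; a small alternative sketch:

from itertools import zip_longest

seq = [1, 2, 3, 4]
print(list(zip_longest(seq, seq[1:])))
# [(1, 2), (2, 3), (3, 4), (4, None)]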
I have a list of tuples containing numbers:
list_numbers = [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]
How do I use a list comprehension to get a list of the sum of the items in each tuple?
expected_result = [7, 9, 11, 13, 15]
You can just loop through the list and call the sum() function on each tuple.
sums = [sum(t) for t in list_numbers]
> [7, 9, 11, 13, 15]
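Equivalently, since every tuple here has exactly two elements, you can unpack each pair inside the comprehension and skip the per-item sum() call:

sums = [a + b for a, b in list_numbers]
# [7, 9, 11, 13, 15]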