Plot individuals home-range with Adehabitat - adehabitathr

I am trying to put the name from the individuals of my research in a polygons home-range plot, but after many attempts I still can not achieve it.
Here and example of my data: X and Y are coordinates and id are individuals
X Y id
29 29 4
44 28 7
57 57 5
60 81 11
32 41 4
43 29 7
57 57 5
46 83 11
32 41 4
43 29 7
57 56 5
60 82 11
35 40 4
43 28 7
62 55 5
54 73 11
27 40 4
43 28 7
61 54 5
First, i calculated the home-range of my data with MPC
cp <- mcp((data)[,1],percent=95, unin = c("m"), unout = c("m2"))
And the plot it
plot(cp, axes=TRUE, border = rainbow(12))
But i don´t kown which polygons correspond to each individual, and if possible i need to include the id of my individuals inside each polygon
Any help would be appreciated!!
Thanks
Juan

Here is an example using the example data from the adehabitatHR package, since you not really providing a reproducible example.
library(adehabitatHR)
data("puechabonsp")
cp <- mcp(puechabonsp$relocs[, 1], percent=95, unin = c("m"), unout = c("m2"))
One way would be to use ggplot2 and sf:
library(sf)
library(tidyverse)
st_as_sf(cp) %>% ggplot(., aes(fill = id)) + geom_sf(alpha = 0.5) +
scale_fill_discrete(name = "Animal id")

Related

Analysis on dataframe with python

I want to be able to calculate the average 'goal','shot',and 'miss' per shooterName to use for further analysis and visualization
The code below gives me the count of the 3 attributes(shot,goal,miss) in the 'event' column sorted by 'shooterName'
Dataframe columns:
season period time teamCode event goal xCord yCord xCordAdjusted yCordAdjusted ... playerPositionThatDidEvent timeSinceFaceoff playerNumThatDidEvent shooterPlayerId shooterName shooterLeftRight shooterTimeOnIce shooterTimeOnIceSinceFaceoff shotDistance
Corresponding data
2020 1 16 PHI SHOT 0 -74 29 74 -29 ... C 16 11 8478439.0 Travis Konecny R 16 16 32.649655
2020 1 34 PIT SHOT 0 49 -25 49 -25 ... C 34 9 8478542.0 Evan Rodrigues R 34 34 47.169906
2020 1 65 PHI SHOT 0 -52 -31 52 31 ... L 65 86 8480797.0 Joel Farabee L 31 31 48.270074
2020 1 171 PIT SHOT 0 43 39 43 39 ... C 42 9 8478542.0 Evan Rodrigues R 42 42 60.307545
2020 1 209 PHI MISS 0 -46 33 46 -33 ... D 38 5 8479026.0 Philippe Myers R 38 38 54.203321
Current code:
dft['count'] = df.groupby(['shooterName', 'event'])['event'].agg(['count'])
dft
Current Output:
shooterName event count
A.J. Greer GOAL 1
MISS 6
SHOT 29
Aaron Downey GOAL 1
MISS 4
SHOT 35
Zenon Konopka GOAL 8
MISS 57
SHOT 176
Desired Output:
shooterName event count %totalshooterNameevents
A.J. Greer GOAL 1 .0277
MISS 6 .1666
SHOT 29 .805
Aaron Downey GOAL 1 .025
MISS 4 .1
SHOT 35 .875
Zenon Konopka GOAL 8 .0331
MISS 57 .236
SHOT 176 .7302
Something similar to this. My end goal is to be able to calculate each 'event' attribute as a percentage of the total 'event' by 'shooterName'. Below I added a column '%totalshooterNameevents' which is 'simply goal', 'shot', and 'miss' calculated by the sum of the 'goal, shot, and miss' per each 'shooterName'
Update
Try:
dft = df.groupby(['shooterName', 'event'])['event'].agg(['count']).reset_index()
dft['%total'] = dft.groupby('shooterName')['count'].apply(lambda x: x / sum(x))
print(dft)
# Output
shooterName event count %total
0 A.J. Greer GOAL 1 0.027778
1 A.J. Greer MISS 6 0.166667
2 A.J. Greer SHOT 29 0.805556
3 Aaron Downey GOAL 1 0.025000
4 Aaron Downey MISS 4 0.100000
5 Aaron Downey SHOT 35 0.875000
6 Zenon Konopka GOAL 8 0.033195
7 Zenon Konopka MISS 57 0.236515
8 Zenon Konopka SHOT 176 0.730290
Without sample, it's difficult to guess what you want. Try:
import pandas as pd
import numpy as np
# Setup a Minimal Reproducible Example
np.random.seed(2021)
df = pd.DataFrame({'shooterName': np.random.choice(list('AB'), 20),
'event': np.random.choice(['shot', 'goal', 'miss'], 20)})
# Create an empty dataframe?
dft = pd.DataFrame(index=df['shooterName'].unique())
# Do stuff
grp = df.groupby('shooterName')
dft['count'] = grp.count()
dft = dft.join(grp['event'].value_counts().unstack('event')
.div(dft['count'], axis=0))
Output:
>>> dft
count goal miss shot
A 12 0.416667 0.250 0.333333
B 8 0.500000 0.375 0.125000

Most frequently occurring numbers across multiple columns using pandas

I have a data frame with numbers in multiple columns listed by date, what I'm trying to do is find out the most frequently occurring numbers across the whole data set, also grouped by date.
import pandas as pd
import glob
def lotnorm(pdobject) :
# clean up special characters in the column names and make the date column the index as a date type.
pdobject["Date"] = pd.to_datetime(pdobject["Date"])
pdobject = pdobject.set_index('Date')
for column in pdobject:
if '#' in column:
pdobject = pdobject.rename(columns={column:column.replace('#','')})
return pdobject
def lotimport() :
lotret = {}
# list files in data directory with csv filename
for lotpath in [f for f in glob.glob("data/*.csv")]:
lotname = lotpath.split('\\')[1].split('.')[0]
lotret[lotname] = lotnorm(pd.read_csv(lotpath))
return lotret
print(lotimport()['ozlotto'])
------------- Output ---------------------
1 2 3 4 5 6 7 8 9
Date
2020-07-07 4 5 7 9 12 13 32 19 35
2020-06-30 1 17 26 28 38 39 44 14 41
2020-06-23 1 3 9 13 17 20 41 28 45
2020-06-16 1 2 13 21 22 27 38 24 33
2020-06-09 8 11 26 27 31 38 39 3 36
... .. .. .. .. .. .. .. .. ..
2005-11-15 7 10 13 17 30 32 41 20 14
2005-11-08 12 18 22 28 33 43 45 23 13
2005-11-01 1 3 11 17 24 34 43 39 4
2005-10-25 7 16 23 29 36 39 42 19 43
2005-10-18 5 9 12 30 33 39 45 7 19
The output I am aiming for is
Number frequency
45 201
32 195
24 187
14 160
48 154
--------------- Updated with append experiment -----------
I tried using append to create a single series from the dataframe, which worked for individual lines of code but got a really odd result when I ran it inside a for loop.
temp = lotimport()['ozlotto']['1']
print(temp)
temp = temp.append(lotimport()['ozlotto']['2'], ignore_index=True, verify_integrity=True)
print(temp)
temp = temp.append(lotimport()['ozlotto']['3'], ignore_index=True, verify_integrity=True)
print(temp)
lotcomb = pd.DataFrame()
for i in (lotimport()['ozlotto'].columns.tolist()):
print(f"{i} - {type(i)}")
lotcomb = lotcomb.append(lotimport()['ozlotto'][i], ignore_index=True, verify_integrity=True)
print(lotcomb)
This solution might be the one you are looking for.
freqvalues = np.unique(df.to_numpy(), return_counts=True)
df2 = pd.DataFrame(index=freqvalues[0], data=freqvalues[1], columns=["Frequency"])
df2.index.name = "Numbers"
df2
Output:
Frequency
Numbers
1 6
2 5
3 5
5 8
6 4
7 7
8 2
9 7
10 3
11 4
12 2
13 8
14 1
15 4
16 4
17 6
18 4
19 5
20 9
21 3
22 4
23 2
24 4
25 5
26 4
27 6
28 1
29 6
30 3
31 3
... ...
70 6
71 6
72 5
73 5
74 2
75 8
76 5
77 3
78 3
79 2
80 3
81 4
82 6
83 9
84 5
85 4
86 1
87 3
88 4
89 3
90 4
91 4
92 3
93 5
94 1
95 4
96 6
97 6
98 1
99 6
97 rows × 1 columns
df.max(axis=0)
for columns
df.max(axis=1)
for index
Ok so the final answer I came up with was a mix of a few things including some of the great input from people in this thread. Essentially I do the following:
Pull in the CSV file and clean up the dates and the column names, then convert it to a pandas dataframe.
Then create a new pandas series and append each column to it ignoring dates to prevent conflicts.
Once I have the series, I use Vioxini's suggestion to use numpy to get counts of unique values and then turn the values into the index, after that sort the column by count in descending order and return the top 10 values.
Below is the resulting code, I hope it helps someone else.
import pandas as pd
import glob
import numpy as np
def lotnorm(pdobject) :
# clean up special characters in the column names and make the date column the index as a date type.
pdobject["Date"] = pd.to_datetime(pdobject["Date"])
pdobject = pdobject.set_index('Date')
for column in pdobject:
if '#' in column:
pdobject = pdobject.rename(columns={column:column.replace('#','')})
return pdobject
def lotimport() :
lotret = {}
# list files in data directory with csv filename
for lotpath in [f for f in glob.glob("data/*.csv")]:
lotname = lotpath.split('\\')[1].split('.')[0]
lotret[lotname] = lotnorm(pd.read_csv(lotpath))
return lotret
lotcomb = pd.Series([],dtype=object)
for i in (lotimport()['ozlotto'].columns.tolist()):
lotcomb = lotcomb.append(lotimport()['ozlotto'][i], ignore_index=True, verify_integrity=True)
freqvalues = np.unique(lotcomb.to_numpy(), return_counts=True)
lotop = pd.DataFrame(index=freqvalues[0], data=freqvalues[1], columns=["Frequency"])
lotop.index.name = "Numbers"
lotop.sort_values(by=['Frequency'],ascending=False).head(10)

Diagonals in North East Direction - Time Limit Exceeded in Python 3.8.3

The program must accept an integer matrix of size RxC as the input. The program must print the integers in the diagonals in the North-East directions of the matrix in the seprate line as output.
Boundary:
2<=R,C<=100
Time Limit : 500ms
Example 1:
Input:
3 3
73 77 76
71 17 87
37 73 98
Output:
73
71 77
37 17 76
73 87
98
Example 2:
Input:
4 6
97 78 7 39 92 45
68 100 49 95 97 100
59 41 81 22 26 100
46 37 81 12 93 10
Output:
97
68 78
59 100 7
46 41 49 39
37 81 95 92
81 22 97 45
12 26 100
93 100
10
My Code:
row,col = map(int,input().split())
matrix = [list(map(int,input().split())) for i in range(row)]
# Redundancy of row and col
rep = []
for i in range(row):
for j in range(col):
b = []
for k in range(i,row):
if (j,k) not in rep:
b.append(matrix[k][j])
rep.append((j,k))
j-=1
if j<0:break
if len(b):print(*(b[::-1]))
My code works well but when the matrix is of size (100,100) it exceeds the given time limit, is there a way to reduce it. Thanks in advance
Note : No External Libraries should be used!
The trick here is to realize that because each number only appears in the solution once, so we really only need to evaluate each value once.
We can also see that each matrix will result in row + col - 1 number of North-East direction diagonals, which will help us.
# Original code
row,col = map(int,input().split())
# I won't turn them into ints, strings actually make it easier for my work
matrix = [input().split() for i in range(row)]
diagonals = [""] * (row + col - 1)
for i in range(row):
for j in range(col):
# determine which diagonal the number belongs to, and prepend it
diagonals[i + j] = "%s %s" % (matrix[i][j], diagonals[i + j])
# print out diagonals one at a time
for diagonal in diagonals: print(diagonal)
I never got the chance to run it, but this should give the general idea!
(new to SO, plz be nice :D)

When using min() - ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() [duplicate]

How can I reference the minimum value of two dataframes as part of a pandas dataframe equation? I tried using the python min() function which did not work. I'm sorry if this is well-documented somewhere but I have not been able to find a working solution for this problem. I am looking for something along the lines of this:
data['eff'] = pd.DataFrame([data['flow_h'], data['flow_c']]).min() *Cp* (data[' Thi'] - data[' Tci'])
I also tried to use pandas min() function, which is also not working.
min_flow = pd.DataFrame([data['flow_h'], data['flow_c']]).min()
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
I was confused by this error. The data columns are just numbers and a name, I wasn't sure where the index comes into play.
import pandas as pd
import numpy as np
np.random.seed(365)
rows = 10
flow = {'flow_c': [np.random.randint(100) for _ in range(rows)],
'flow_d': [np.random.randint(100) for _ in range(rows)],
'flow_h': [np.random.randint(100) for _ in range(rows)]}
data = pd.DataFrame(flow)
# display(data)
flow_c flow_d flow_h
0 82 36 43
1 52 48 12
2 33 28 77
3 91 99 11
4 44 95 27
5 5 94 64
6 98 3 88
7 73 39 92
8 26 39 62
9 56 74 50
If you are trying to get the row-wise mininum of two or more columns, use pandas.DataFrame.min. Note that by default axis=0; specifying axis=1 is necessary.
data['min_c_h'] = data[['flow_h','flow_c']].min(axis=1)
# display(data)
flow_c flow_d flow_h min_c_h
0 82 36 43 43
1 52 48 12 12
2 33 28 77 33
3 91 99 11 11
4 44 95 27 27
5 5 94 64 5
6 98 3 88 88
7 73 39 92 73
8 26 39 62 26
9 56 74 50 50
If you like to get a single minimum value of multiple columns:
data[['flow_h','flow_c']].min().min()
the first "min()" calculates the minimum per column and returns a pandas series. The second "min" returns the minimum of the minimums per column.

Python 3 script uses too much memory

As homework for IT lessons I need to write a script which will check for the highest power of 4 which is in modified input number, but I can use only 8MB of RAM. I used for this logarithmic function, so my code looks like this:
from math import log, floor
n = int(input())
numbers = []
for i in range (0, n):
numbers.append(floor(int(input()) / 10))
for i in numbers:
print(4 ** floor(log(i, 4)))
But I checked this script on my PC and it uses more than 8MB!
Partition of a set of 74690 objects. Total size = 8423721 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 23305 31 2100404 25 2100404 25 str
1 19322 26 1450248 17 3550652 42 tuple
2 5017 7 724648 9 4275300 51 types.CodeType
3 9953 13 716915 9 4992215 59 bytes
4 742 1 632536 8 5624751 67 type
5 4618 6 628048 7 6252799 74 function
6 742 1 405720 5 6658519 79 dict of type
7 187 0 323112 4 6981631 83 dict of module
8 612 1 278720 3 7260351 86 dict (no owner)
9 63 0 107296 1 7367647 87 set
<197 more rows. Type e.g. '_.more' to view.>
On my phone, however, this script uses only 2.5MB:
Partition of a set of 35586 objects. Total size = 2435735 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 9831 28 649462 27 649462 27 str
1 9014 25 365572 15 1015034 42 tuple
2 4669 13 261232 11 1276266 52 bytes
3 2357 7 198684 8 1474950 61 types.CodeType
4 436 1 166276 7 1641226 67 type
5 2156 6 155232 6 1796458 74 function
6 436 1 130836 5 1927294 79 dict of type
7 93 0 87384 4 2014678 83 dict of module
8 237 1 62280 3 2076958 85 dict (no owner) 9 1091 3 48004 2 2124962 87 types.WrapperDescriptorType
<115 more rows. Type e.g. '_.more' to view.>
I tried changing list to tuple, but it didn't make any difference.
Is there any possibility to decrease/limit RAM usage?

Resources