How to create a .kml file from a dataframe? - python-3.x

I need to create a .kml file from a dataframe with more than 800 districts.
This is what I have done so far:
1) Read a .csv file (FIG 1) using pandas
2) Create a new dataframe from the first 3 columns only (longitude, latitude, altitude)
3) Create a list of tuples from the dataframe
4) Create a .kml file and do some styling (colors)
This procedure works great ONLY when there is 1 district. Now I need to do the same with more than 800 districts. FIG 2 shows an example with 2 districts (ACTONO and AILSACRAIGO).
When converting the dataframe to a list of tuples, how can I make Python aware that there are many districts?
I believe these lines have to be improved:
a) Here I will need a list of tuples (one for each district):
#Converting the dataframe to a list of tuples
tuples = [tuple(x) for x in df_modify.values]
b) And here, outerboundaryis will have to change for each of the tuples:
pol = kml.newpolygon(name='ACTONO', description='Acton County',
                     outerboundaryis=tuples, extrude=extrude, altitudemode=altitudemode)
This is all the code:
CODE FOR .CSV WITH 1 DISTRICT
import pandas
import simplekml

kml = simplekml.Kml()

#Using pandas to read the .csv and choosing the first 3 columns
df = pandas.read_csv('C:\\Users\\disa_ONTshp.csv')
df_modify = df.iloc[:, [0, 1, 2]]

#Converting the dataframe to a list of tuples
tuples = [tuple(x) for x in df_modify.values]

#Creating a .kml file
extrude = 1
altitudemode = simplekml.AltitudeMode.relativetoground
pol = kml.newpolygon(name='ACTONO', description='Acton County',
                     outerboundaryis=tuples, extrude=extrude, altitudemode=altitudemode)

#Styling colors
pol.style.linestyle.color = simplekml.Color.green
pol.style.linestyle.width = 5
pol.style.polystyle.color = simplekml.Color.changealphaint(100, simplekml.Color.green)

#Saving
kml.save("Polygon Styling.kml")
FIG 1 (1 DISTRICT)
FIG 2 (2 DISTRICTS)

This is the answer. What I needed was a dictionary of dataframes, one per district.
import pandas as pd
import numpy as np
import simplekml

kml = simplekml.Kml()

###LOADING THE .csv FILE WITH ALL THE COORDINATES (EXPORTED WITH QGIS)
df = pd.read_csv('C:\\Users\\file.csv')

###ADDING A COLUMN "altitude" WITH A RANDOM VALUE FROM 200 TO 2000 PER DISTRICT
df['altitude'] = df.groupby('name').name.transform(lambda x: np.random.randint(200, 2000))

###KEEPING ONLY THE COLUMNS OF INTEREST
df = df[['longitude', 'latitude', 'altitude', 'name']]

###CREATING A DICTIONARY OF DATAFRAMES (ONE FOR EACH DISTRICT)
dict_dataframes = dict(tuple(df.groupby('name')))

###ITERATING OVER EACH DATAFRAME IN THE DICTIONARY
for name, df in dict_dataframes.items():
    ###CREATING A LIST OF (longitude, latitude, altitude) TUPLES
    ###(the 'name' column is excluded here, otherwise each tuple carries a 4th, non-numeric value)
    tuples = [tuple(x) for x in df[['longitude', 'latitude', 'altitude']].values]
    extrude = 1
    altitudemode = simplekml.AltitudeMode.relativetoground
    pol = kml.newpolygon(name=name, description="District of " + name,
                         outerboundaryis=tuples, extrude=extrude,
                         altitudemode=altitudemode)
    pol.style.linestyle.color = simplekml.Color.honeydew
    pol.style.linestyle.width = 3
    pol.style.polystyle.color = simplekml.Color.changealphaint(100, simplekml.Color.navy)

###SAVING THE FILE
kml.save('C:\\Users\\3d_file.kml')
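To see why the dictionary of dataframes solves the multi-district problem, here is a minimal sketch (with made-up coordinates for the two districts from FIG 2) showing that dict(tuple(df.groupby('name'))) splits one dataframe into one sub-dataframe per district, keyed by district name:
import pandas as pd

#Toy dataframe: two districts, two vertices each (values are made up)
df = pd.DataFrame({
    'longitude': [-80.03, -80.02, -81.55, -81.54],
    'latitude':  [43.63, 43.64, 43.03, 43.04],
    'altitude':  [500, 500, 900, 900],
    'name': ['ACTONO', 'ACTONO', 'AILSACRAIGO', 'AILSACRAIGO'],
})

#groupby('name') yields (name, sub-dataframe) pairs; dict() keys them by name
dict_dataframes = dict(tuple(df.groupby('name')))
print(list(dict_dataframes))      #['ACTONO', 'AILSACRAIGO']
print(dict_dataframes['ACTONO'])  #only the ACTONO rows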

Related

Can we copy one column from excel and convert it to a list in Python?

I use
df = pd.read_clipboard()
list_name = df['column_name'].to_list()
but this is a bit of a long method for me. I want to copy a column from Excel, convert it in Python, and then apply some function so that the copied text becomes a list.
This will read an Excel column as a list:
import xlrd

book = xlrd.open_workbook('Myfile.xlsx') #path to your file
sheet = book.sheet_by_name("Sheet1")     #sheet name

def Readlist(Element, Column):
    for _ in range(1, sheet.nrows):
        Element.append(str(sheet.row_values(_)[Column]))

column1 = []         #list to fill
Readlist(column1, 1) #column number is 1 here
print(column1)
Readlist reads the specified column into a list; initialize an empty [] variable before calling it. (Note that xlrd 2.0+ no longer reads .xlsx files, so this needs an older xlrd, or the pandas route below.)
Using pandas:
import pandas as pd

df = pd.read_excel("path.xlsx", index_col=None, na_values=['NA'], usecols="A")
mylist = df.iloc[:, 0].tolist() #df[0] would fail: with usecols="A" the column keeps its header label, so select by position
print(mylist)
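If the goal is simply "copy a column in Excel, get a Python list", the clipboard route can be collapsed into a single expression; a minimal sketch, assuming the copied cells carry no header row:
import pandas as pd

#Copy a column in Excel first, then run this.
#header=None keeps the first copied cell as data instead of a column name,
#and the lone column is then labeled 0.
copied = pd.read_clipboard(header=None)[0].tolist()
print(copied)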

finding latest trip information from a large data frame

I have one requirement:
I have a dataframe "df_input" with 20M rows of trip details. The columns are "vehicle-no", "geolocation", "start", "end".
For each vehicle number there are multiple rows, each with a different geolocation for a different trip.
Now I want to create a new dataframe df_final that keeps only the first record for each vehicle-no. How can I do that efficiently?
I used something like the following, which takes more than 5 hours to complete:
import pandas as pd
import dfply as dp
from dfply import X

output_df_columns = ["vehicle-no", "start", "end", "geolocations"]
df_final = pd.DataFrame(columns=output_df_columns) #create empty dataframe
unique_vehicle_no = list(df_input["vehicle-no"].unique())
df_input.sort_values(["start"], inplace=True)
for each_vehicle in unique_vehicle_no:
    df_temp = (df_input >> dp.mask(X['vehicle-no'] == each_vehicle)) #item access, since the hyphen breaks X.vehicle-no
    df_final = df_final.append(df_temp.head(1), ignore_index=True, sort=False)
I think this will work out
import pandas as pd
import numpy as np
df_input=pd.DataFrame(np.random.randint(10,size=(1000,3)),columns=['Geolocation','start','end'])
df_input['vehicle_number']=np.random.randint(100,size=(1000))
print(df_input.shape)
print(df_input['vehicle_number'].nunique())
df_final=df_input.groupby('vehicle_number').apply(lambda x : x.head(1)).reset_index(drop=True)
print(df_final['vehicle_number'].nunique())
print(df_final.shape)
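As a side note, groupby().apply(lambda x: x.head(1)) pushes every group through a Python-level function, which gets slow at 20M rows. A sketch of two usually faster, equivalent idioms, assuming the same df_input as above and that sorting by 'start' defines which record is "first":

#Option 1: groupby().head(1) avoids the Python-level lambda entirely
df_final = df_input.sort_values('start').groupby('vehicle_number').head(1)

#Option 2: drop_duplicates keeps the first row seen per vehicle
df_final = df_input.sort_values('start').drop_duplicates('vehicle_number')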

How to write from loop to dataframe

I'm trying to calculate 33 stock betas and write them to a dataframe.
Unfortunately, I get an error in my code:
cannot concatenate object of type '<class 'float'>'; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
import pandas as pd
import numpy as np

stock1 = pd.read_excel(r"C:\Users\Кир\Desktop\Uni\Master\Nasdaq\Financials 11.05\Nasdaq last\clean data\01.xlsx", '1') #read first sheet of excel file
stock2 = pd.read_excel(r"C:\Users\Кир\Desktop\Uni\Master\Nasdaq\Financials 11.05\Nasdaq last\clean data\01.xlsx", '2') #read second sheet of excel file
stock2['stockreturn'] = np.log(stock2.AdjCloseStock / stock2.AdjCloseStock.shift(1)) #stock ln return
stock2['SP500return'] = np.log(stock2.AdjCloseSP500 / stock2.AdjCloseSP500.shift(1)) #SP500 ln return
stock2 = stock2.iloc[1:] #delete first row in dataframe
betas = pd.DataFrame()
for i in range(0, (len(stock2.AdjCloseStock)//52)-1):
    betas = betas.append(stock2.stockreturn.iloc[i*52:(i+1)*52].cov(stock2.SP500return.iloc[i*52:(i+1)*52])
                         / stock2.SP500return.iloc[i*52:(i+1)*52].cov(stock2.SP500return.iloc[i*52:(i+1)*52]))
My data looks like weekly stock and S&P index return for 33 years. So the output should have 33 betas.
I tried simplifying your code and creating an example. I think the problem is that your calculation returns a float. You want to make it a pd.Series. DataFrame.append takes:
DataFrame or Series/dict-like object, or list of these
np.random.seed(20)
df = pd.DataFrame(np.random.randn(33*53, 2), columns=['a', 'b'])

betas = pd.DataFrame()
for year in range(len(df['a'])//52 - 1):
    # Take some data
    in_slice = pd.IndexSlice[year*52:(year+1)*52]
    numerator = df['a'].iloc[in_slice].cov(df['b'].iloc[in_slice])
    denominator = df['b'].iloc[in_slice].cov(df['b'].iloc[in_slice])
    # Do some calculations and create a pd.Series from the result
    data = pd.Series(numerator / denominator, name=year)
    # Append to the DataFrame
    betas = betas.append(data)

betas.index.name = 'years'
betas.columns = ['beta']
betas.head():
beta
years
0 0.107669
1 -0.009302
2 -0.063200
3 0.025681
4 -0.000813
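Note that DataFrame.append was deprecated and later removed (pandas 2.0), so on current pandas the same loop should collect the betas in a plain list and build the frame once at the end; a minimal sketch under the same random setup:
import numpy as np
import pandas as pd

np.random.seed(20)
df = pd.DataFrame(np.random.randn(33*53, 2), columns=['a', 'b'])

#Collect one beta per 52-week window, then build the DataFrame in one go
rows = []
for year in range(len(df['a'])//52 - 1):
    in_slice = pd.IndexSlice[year*52:(year+1)*52]
    beta = (df['a'].iloc[in_slice].cov(df['b'].iloc[in_slice])
            / df['b'].iloc[in_slice].cov(df['b'].iloc[in_slice]))
    rows.append(beta)

betas = pd.DataFrame({'beta': rows})
betas.index.name = 'years'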

how to classify a large csv file of signals without headers in python?

I have a large csv file (3000*20000) of data without headers. I added one column to represent the classes. How can I fit the data to a model when the features have no headers and they cannot be added manually due to the large number of columns?
Is there a way to automatically iterate over each column in a row?
When I had a small file of 4 columns I used the following code:
import pandas as pd

xls = pd.ExcelFile("bcs.xlsx") #renamed from pd so pandas itself is not shadowed
col = [0, 1, 2, 3]
data = xls.parse(xls.sheet_names[0], parse_cols=col)
pdc = list(data["pdc"])
pds = list(data["pds"])
pdsh = list(data["pdsh"])
pd_class = list(data["class"])
features = []
for i in range(len(pdc)):
    features.append([pdc[i], pds[i], pdsh[i]])
labels = pd_class
But with a 3000 by 20000 file I don't know how to identify the features and the labels/target.
Let's say you have a csv like this:
1,2,3,4,0
1,2,3,4,1
1,2,3,4,1
1,2,3,4,0
where the first 4 columns are features and the last one is the label or class you want. You can read the file with pandas.read_csv, then create one dataframe for your features and one for your labels, which you can then fit to your model.
import pandas as pd

#CSV local path
mypath = 'C:\\...'
#The names of the columns you want to have in your dataframe
colNames = ['Feature1', 'Feature2', 'Feature3', 'Feature4', 'class']
#Read the data as a dataframe
df = pd.read_csv(filepath_or_buffer=mypath,
                 names=colNames, sep=',', header=None)
#Get the first four columns as features (df.ix is deprecated, so use iloc)
features = df.iloc[:, :4]
#and the last column as label
labels = df['class']
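For the 3000×20000 case there is no need to invent 20000 column names: with header=None pandas assigns integer labels automatically, and positional indexing splits the features from the class column. A minimal sketch, assuming the class was appended as the last column and a hypothetical file name data.csv:
import pandas as pd

#No header row: pandas labels the columns 0, 1, 2, ... automatically
df = pd.read_csv('data.csv', header=None)

#Every column except the last is a feature; the last one is the label
features = df.iloc[:, :-1]
labels = df.iloc[:, -1]

#These can be passed straight to a scikit-learn style model, e.g.:
#model.fit(features, labels)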

Python3, with pandas.dataframe, how to select certain data by some rules to show

I have a pandas.dataframe, and I want to select certain data by some rules.
The following code generates the dataframe:
import datetime
import pandas as pd
import numpy as np

today = datetime.date.today()
dates = list()
for k in range(10):
    a_day = today - datetime.timedelta(days=k)
    dates.append(np.datetime64(a_day))

np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(10, 3)),
                  columns=('other1', 'actual', 'other2'),
                  index=['{}'.format(i) for i in range(10)])
df.insert(0, 'dates', dates)
df['err_m'] = np.random.rand(10)*0.1 #1-D arrays here; a (10, 1) array cannot be assigned to a single column
df['std'] = np.random.rand(10)*0.05
df['gain'] = np.random.rand(10)
Now, I want to select by the following rules:
1. compute the sum of 'err_m' and 'std', then sort the df so that the sum is descending
2. from the result of step 1, select the rows where 'actual' is > 50
Thanks
Create a new column and then sort by this one:
df['errsum'] = df['err_m'] + df['std']
# Return a sorted dataframe (df.sort was removed from pandas; use sort_values)
df_sorted = df.sort_values('errsum', ascending=False)
Then select the rows you want:
# Create a boolean Series with True where the condition is met
# (rule 2 filters on 'actual', not on the sum)
selector = df_sorted['actual'] > 50
# Return a view of the sorted dataframe with only the rows you want
df_sorted[selector]
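The same two rules can also be written as one method chain, which avoids adding a helper column to df in place; a minimal sketch using the df generated above:
result = (df.assign(errsum=df['err_m'] + df['std']) #rule 1: the sum column
            .sort_values('errsum', ascending=False) #rule 1: descending sort
            .query('actual > 50'))                  #rule 2: keep rows with actual > 50
print(result)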
