How can I visualize coordinates from a csv file using folium? - python-3.x

I have data in the CSV file:
ID, Name, Address, Latitude, Longitude.
FOO BO 34 Zako, Kost str.55 49.2955102 19.95274595
FOO B1 55 Vara, Dost str 44 49.4814 20.0303
ZOO B2 56 XXXX, YYYY str 99 49.5551 21.6766
I would like to visualize this data on the map using folium in python3.
The example code for a single pair of coordinates is:
import folium
logo_url = 'https://upload.wikimedia.org/wikipedia/en/c/c6/Logo_link.png'
start_lat = 52.2138
start_lng = 20.9795
csv_data_lat = 49.2955102
csv_data_lng = 19.95274595
map_1 = folium.Map(location=[start_lat, start_lng], zoom_start=12, control_scale=True)
icon = folium.features.CustomIcon(logo_url, icon_size=(50, 50))
folium.Marker([csv_data_lat, csv_data_lng], popup='PKOBP', icon=icon).add_to(map_1)
map_1.save('map1.html')
How can I do this for the data from the CSV file?

I guess you could do something like this instead of the folium.Marker line:
import pandas as pd

def add_marker(row, map_, icon):
    # Column names must match the CSV header exactly ("Latitude"/"Longitude" here)
    folium.Marker([row["Latitude"], row["Longitude"]], popup='PKOBP', icon=icon).add_to(map_)

df = pd.read_csv('your_csv.csv')
df.apply(add_marker, axis=1, args=(map_1, icon))
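For completeness, pulling the (Latitude, Longitude) pairs out of the file can also be sketched with the standard-library csv module (a minimal sketch, assuming the header row is `ID,Name,Address,Latitude,Longitude`; the file name is a placeholder):

```python
import csv

def read_coordinates(path):
    """Yield (latitude, longitude) float pairs from a CSV with Latitude/Longitude columns."""
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            yield float(row["Latitude"]), float(row["Longitude"])
```

Each yielded pair can then be passed to `folium.Marker([lat, lng], ...).add_to(map_1)`.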

Related

Problem with PyPDF2 in Python 3.11: extracted Russian text is unreadable

I'm trying to parse a PDF with Russian text, but can't make it work:
from PyPDF2 import PdfReader
reader = PdfReader("./mpp/1.pdf")
page = reader.pages[0]
m = page.extract_text()
print(m)
File pdf: https://disk.yandex.ru/i/qSJfFZJFDuLDIA
I've tried using .encode, but it doesn't help.
In the end, I need something like this:
route1 = {
    "bus": "Yutong г/н 499",
    "stations": "12"
}
stations = ['1; ул. Арсеньева, м-н Пчелка; 7:40','2; ул. Агеева, ДК "Юность"; 7:44', '3; ул. Горького, минирынок Исток; 7:55', '....etc up to 12']

Converting generic grid type to a lonlat gridtype using cdo

I am currently trying to use one NetCDF file (open-source data which can be downloaded from https://doi.pangaea.de/10.1594/PANGAEA.828650) to extract specific latitudes from this dataset (https://www.nodc.noaa.gov/archive/arc0105/0160558/3.3/data/0-data/spco2_1982-2015_MPI_SOM-FFN_v2016.nc, again open source).
The first dataset defines regions of the global oceans known as biomes; I have successfully extracted the area covering the biomes labelled 16 and 17 from it.
This dataset has the following gridtype:
gridtype = generic
gridsize = 64800
xsize = 180
ysize = 360
xname = lat
xunits = "degrees latitude"
yname = lon
yunits = "degrees longitude"
xfirst = -89.5
xinc = 1
yfirst = -179.5
yinc = 1
The second grid type is a global dataset of ocean carbon fluxes (parameter: fgco2_raw) and I wish to extract values from the region defined by biome 16 and 17 in the Time_Varying_Biomes.nc.
This dataset, spco2_1982-2015_MPI_SOM-FFN_v2016.nc, has the following gridtype:
gridtype = lonlat
gridsize = 64800
datatype = float
xsize = 360
ysize = 180
xname = lon
xlongname = "longitude"
xunits = "degrees_east"
yname = lat
ylongname = "latitude"
yunits = "degrees_north"
xfirst = -179.5
xinc = 1
When I attempted to regrid Time_Varying_Biomes.nc with the following:
cdo remapbil,mygridtype Time_Varying_Biomes.nc TVB_rg.nc
I have received this error:
cdo remapbil (Abort): Unsupported generic coordinates (Variable: MeanBiomes)!
Has anyone experienced this issue before, and do you have any idea on how I can fix it? Let me know if you need any further information to solve the problem.
Thanks in advance!
You can solve this using the setgrid operator.
First create a grid file. This is the same as the one you gave, but with "generic" replaced by "lonlat":
cdo griddes infile.nc > mygrid
sed -i "s/generic/lonlat/g" mygrid
Then use CDO to set the grid:
cdo setgrid,mygrid infile.nc infile_fixedgrid.nc
You should then be able to regrid the file.

Linkedin web scraping snippet

I'm doing web scraping for a university research project. I started from an existing GitHub project, but that project does not retrieve all the data.
The project works like this:
Search Google using keywords: example: (accountant 'email me at' Google)
Extract a snippet.
Retrieve data from this snippet.
The issue is:
The extracted snippets look like this: " ... marketing division in 2009. For more information on career opportunities with our company, email me: vicki@productivedentist.com. Neighborhood Smiles, LLC ..."
The snippet does not show everything; the "..." hides information such as role and location. How can I retrieve all the information with the script?
from googleapiclient.discovery import build #For using Google Custom Search Engine API
import datetime as dt #Importing system date for the naming of the output file.
import sys
from xlwt import Workbook #For working on xls file.
import re #For email search using regex.
if __name__ == '__main__':
    # Create an output file name in the format "srch_res_yyyyMMdd_hhmmss.xls" in the output folder.
    now_sfx = dt.datetime.now().strftime('%Y%m%d_%H%M%S')
    output_dir = './output/'
    output_fname = output_dir + 'srch_res_' + now_sfx + '.xls'
    search_term = sys.argv[1]
    num_requests = int(sys.argv[2])
    my_api_key = "replace_with_your_api_key" # Read readme.md to learn how to get your API key.
    my_cse_id = "011658049436509675749:gkuaxghjf5u" # Google CSE which searches possible LinkedIn profiles according to the query.
    service = build("customsearch", "v1", developerKey=my_api_key)
    wb = Workbook()
    sheet1 = wb.add_sheet(search_term[0:15])
    wb.save(output_fname)
    sheet1.write(0, 0, 'Name')
    sheet1.write(0, 1, 'Profile Link')
    sheet1.write(0, 2, 'Snippet')
    sheet1.write(0, 3, 'Present Organisation')
    sheet1.write(0, 4, 'Location')
    sheet1.write(0, 5, 'Role')
    sheet1.write(0, 6, 'Email')
    sheet1.col(0).width = 256 * 20
    sheet1.col(1).width = 256 * 50
    sheet1.col(2).width = 256 * 100
    sheet1.col(3).width = 256 * 20
    sheet1.col(4).width = 256 * 20
    sheet1.col(5).width = 256 * 50
    sheet1.col(6).width = 256 * 50
    wb.save(output_fname)
    row = 1 # To insert the data in the next row.

    # Function to perform the Google search.
    def google_search(search_term, cse_id, start_val, **kwargs):
        res = service.cse().list(q=search_term, cx=cse_id, start=start_val, **kwargs).execute()
        return res

    for i in range(0, num_requests):
        # This is the offset from the beginning to start getting the results from.
        start_val = 1 + (i * 10)
        # Make an HTTP request object.
        results = google_search(search_term,
                                my_cse_id,
                                start_val,
                                num=10 # num can be 1 to 10; it sets the number of results per request.
                                )
        for profile in range(0, 10):
            snippet = results['items'][profile]['snippet']
            myList = [item for item in snippet.split('\n')]
            newSnippet = ' '.join(myList)
            contain = re.search(r'[\w\.-]+@[\w\.-]+', newSnippet)
            if contain is not None:
                title = results['items'][profile]['title']
                link = results['items'][profile]['link']
                org = "-NA-"
                location = "-NA-"
                role = "-NA-"
                if 'person' in results['items'][profile]['pagemap']:
                    if 'org' in results['items'][profile]['pagemap']['person'][0]:
                        org = results['items'][profile]['pagemap']['person'][0]['org']
                    if 'location' in results['items'][profile]['pagemap']['person'][0]:
                        location = results['items'][profile]['pagemap']['person'][0]['location']
                    if 'role' in results['items'][profile]['pagemap']['person'][0]:
                        role = results['items'][profile]['pagemap']['person'][0]['role']
                print(title[:-23])
                sheet1.write(row, 0, title[:-23])
                sheet1.write(row, 1, link)
                sheet1.write(row, 2, newSnippet)
                sheet1.write(row, 3, org)
                sheet1.write(row, 4, location)
                sheet1.write(row, 5, role)
                sheet1.write(row, 6, contain[0])
                print('Wrote {} search result(s)...'.format(row))
                wb.save(output_fname)
                row = row + 1
    print('Output file "{}" written.'.format(output_fname))
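As an aside, the email-matching step of the script can be exercised in isolation (a small sketch using the example snippet from the question; note that the character class `[\w\.-]` greedily swallows a trailing period, so it is worth stripping before writing the address out):

```python
import re

snippet = (" ... marketing division in 2009. For more information on career "
           "opportunities with our company, email me: vicki@productivedentist.com. "
           "Neighborhood Smiles, LLC ...")

# Same pattern as in the script above
match = re.search(r'[\w\.-]+@[\w\.-]+', snippet)
raw = match[0]            # includes the trailing '.' because '.' is in the character class
email = raw.rstrip('.')   # cleaned address
```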

How can I collect all elements with the same class into a variable using Selenium in Python

content = driver.find_element_by_class_name('topics-sec-block')
container = content.find_elements_by_xpath('//div[@class="col-sm-7 topics-sec-item-cont"]')
The code is below:
for i in range(0, 40):
    title = []
    url = []
    heading = container[i].find_element_by_xpath('//div[@class="col-sm-7 topics-sec-item-cont"]/a/h2').text
    link = container[i].find_element_by_xpath('//div[@class="col-sm-7 topics-sec-item-cont"]/a')
    title.append(heading)
    url.append(link.get_attribute('href'))
    print(title)
    print(url)
It gives me 40 lines, but all of them have the same title and URL (some of them are shown below):
['Stuck in Mexico: Central American asylum seekers in limbo']
['https://www.aljazeera.com/news/2020/03/stuck-mexico-central-american-asylum-seekers-limbo-200305103910955.html']
['Stuck in Mexico: Central American asylum seekers in limbo']
['https://www.aljazeera.com/news/2020/03/stuck-mexico-central-american-asylum-seekers-limbo-200305103910955.html']
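The usual cause of this behaviour is that an XPath beginning with `//` searches from the document root even when evaluated against an element node, so every iteration returns the first match in the whole page; prefixing the path with `.` makes it relative to the element. A minimal sketch of the difference using lxml (a stand-in here, since it resolves XPath the same way Selenium's `find_element_by_xpath` does):

```python
from lxml import etree

# A toy document with two repeated "item" blocks, mimicking the repeated divs
html = etree.fromstring(
    "<root>"
    "<item><a href='first.html'><h2>First</h2></a></item>"
    "<item><a href='second.html'><h2>Second</h2></a></item>"
    "</root>"
)
items = html.findall(".//item")

# '//' is absolute: both lookups resolve from the root, so both return the first <h2>
absolute = [item.xpath("//item/a/h2")[0].text for item in items]

# './/' is relative: each lookup stays inside its own item
relative = [item.xpath(".//a/h2")[0].text for item in items]
```

Applied to the question's code, that suggests changing the inner lookups to paths such as `.//a/h2` so each `container[i]` searches only within itself.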

Analysis from unique values from one column naming files from another

I have the following code that loops through unique values in my data set, and it works great, but I would like to change the export name to a more appropriate unique value that I can use in a dashboard. In the code below, x in the path is taken from the unique team names; for this part only, I'd like the name to be assigned from a list outside the original dataframe.
team = df['RSA'].unique()

for x in team:
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % x
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
Couple of ways I've tried to solve this:
1) Create a list and assign x in the path to the list instead of x.
mylist = ['HR_DASH_0034','HR_DASH_0035','HR_DASH_0036','HR_DASH_0037','HR_DASH_0038','HR_DASH_0039','HR_DASH_0040',
'HR_DASH_0041','HR_DASH_0042','HR_DASH_0043','HR_DASH_0044','HR_DASH_0045','empty']
for x in team:
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % mylist
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
That doesn't work because it doesn't loop, and it doesn't align the new values with the original dataframe values. Cross that off the list.
2) I thought maybe a loop inside the loop would do the trick:
team = df['RSA'].unique()
mylist = ['HR_DASH_0034','HR_DASH_0035','HR_DASH_0036','HR_DASH_0037','HR_DASH_0038','HR_DASH_0039','HR_DASH_0040',
'HR_DASH_0041','HR_DASH_0042','HR_DASH_0043','HR_DASH_0044','HR_DASH_0045','empty']
for x in team:
    for name in mylist:
        path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % name
        r = HROs['RSA'] == x
        Completed = HROs['Current Team Simple'].isin(['Completed'])
        table = HROs[Completed & r]
        top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
        top20.to_csv(path2, index=True, header=True)
That didn't work either. It just gave me the last value in mylist, and it also doesn't align with the unique values in the team list.
3) Next I created a dataframe with the unique values from team and the new list.
team = df['RSA'].unique()
mylist = ['HR_DASH_0034','HR_DASH_0035','HR_DASH_0036','HR_DASH_0037','HR_DASH_0038','HR_DASH_0039','HR_DASH_0040',
'HR_DASH_0041','HR_DASH_0042','HR_DASH_0043','HR_DASH_0044','HR_DASH_0045','empty']
dict = {'RSA': team, 'DASH_ID': mylist}
newdf = pd.DataFrame(dict)
print (newdf)
RSA DASH_ID
0 Intermountain Region, R4 HR_DASH_0034
1 Pacific Southwest Region, R5 HR_DASH_0035
2 Alaska Region, R10 HR_DASH_0036
3 Pacific Northwest Region, R6 HR_DASH_0037
4 Northern Region, R1 HR_DASH_0038
5 Eastern Region, R9 HR_DASH_0039
6 Albuquerque Service Center(ASC) HR_DASH_0040
7 Rocky Mountain Region, R2 HR_DASH_0041
8 Research & Development(RES) HR_DASH_0042
9 Washington Office(WO) HR_DASH_0043
10 Southwestern Region, R3 HR_DASH_0044
11 Southern Region, R8 HR_DASH_0045
12 L2 Desc Not Available empty
However, I still don't know how to get the DASH_ID column element names to export in my path mentioned above.
So in the end, the name HR_DASH_0034 should align with Intermountain Region, R4 when the file is written out.
Any help appreciated!
Inside your first approach, just use:
mylist = [...] # your list definition
ml_iter = iter(mylist)
and inside the loop, replace mylist with:
path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % str(next(ml_iter))
More info: https://www.programiz.com/python-programming/methods/built-in/iter
Lemme know if this helps!
UPDATE: Second Solution
for x, m in zip(team, mylist):
    path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % m
    r = HROs['RSA'] == x
    Completed = HROs['Current Team Simple'].isin(['Completed'])
    table = HROs[Completed & r]
    top20 = table.groupby(['To Position Title']).RequestNumber.count().sort_values().nlargest(20)
    top20.to_csv(path2, index=True, header=True)
Let me know if this works!
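Since a mapping table was already built in the question, another option is to zip the two lists into a dict once and look the file name up per team, which keeps the alignment explicit even if the loop order changes. A short sketch (the team names here are stand-ins for df['RSA'].unique()):

```python
# Stand-ins for df['RSA'].unique() and the real DASH ID list
team = ['Intermountain Region, R4', 'Pacific Southwest Region, R5']
mylist = ['HR_DASH_0034', 'HR_DASH_0035']

# Pair each team with its DASH ID once, then look up by team inside the loop
dash_id = dict(zip(team, mylist))

paths = [r'C:\Users\davidlopez\Desktop\regions\%s.csv' % dash_id[x] for x in team]
```

Inside the original loop this would be `path2 = r'C:\Users\davidlopez\Desktop\regions\%s.csv' % dash_id[x]`.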
