How to Retrieve Geographical Data from the US Geological Survey Site?

The USGS has a page for users to make GET queries:
https://nationalmap.gov/epqs/
I attempted to fill the 4 fields provided:
X: -96.808971 // longitude
Y: 32.7792009 // Latitude
Units: Feet
Output: XML
After clicking on "Get Elevation" I get an error message:
"This XML file does not appear to have any style information associated with it. The document tree is shown below."
<USGS_Elevation_Point_Query_Service>
  <Elevation_Query x="-96.808971" y="32.7792009">
    <Data_Source>3DEP 1/3 arc-second</Data_Source>
    <Elevation>420.61</Elevation>
    <Units>Feet</Units>
  </Elevation_Query>
</USGS_Elevation_Point_Query_Service>
They provide a sample HTTP GET request:
"The following is a sample HTTP GET request and response. The placeholders shown need to be replaced with actual values."
GET pqs.php?x=string&y=string&units=string&output=string HTTP/1.1
Host: nationalmap.gov/epqs/pqs.php
That is where I am stuck.
I have curl and urlgrabber, but have only used them for simple requests.
TIA

$ curl 'https://nationalmap.gov/epqs/pqs.php?x=-96.808971&y=32.7792009&units=Feet&output=xml'
<?xml version="1.0" encoding="utf-8" ?><USGS_Elevation_Point_Query_Service><Elevation_Query x="-96.808971" y="32.7792009"><Data_Source>3DEP 1/3 arc-second</Data_Source><Elevation>420.61</Elevation><Units>Feet</Units></Elevation_Query></USGS_Elevation_Point_Query_Service>
If you want it prettified, you could use PHP to format it, e.g.:
$ curl 'https://nationalmap.gov/epqs/pqs.php?x=-96.808971&y=32.7792009&units=Feet&output=xml' -s | php -r '$domd=new DOMDocument();$domd->loadXML(stream_get_contents(STDIN));$domd->formatOutput=true;echo $domd->saveXML();'
<?xml version="1.0" encoding="utf-8"?>
<USGS_Elevation_Point_Query_Service>
  <Elevation_Query x="-96.808971" y="32.7792009">
    <Data_Source>3DEP 1/3 arc-second</Data_Source>
    <Elevation>420.61</Elevation>
    <Units>Feet</Units>
  </Elevation_Query>
</USGS_Elevation_Point_Query_Service>
Or if you just want the raw data, you can request JSON output and parse it with PHP, like:
curl 'https://nationalmap.gov/epqs/pqs.php?x=-96.808971&y=32.7792009&units=Feet&output=json' -s | php -r '$data=json_decode(stream_get_contents(STDIN));echo json_encode($data->USGS_Elevation_Point_Query_Service->Elevation_Query, JSON_PRETTY_PRINT|JSON_UNESCAPED_SLASHES);'
{
    "x": -96.808971,
    "y": 32.7792009,
    "Data_Source": "3DEP 1/3 arc-second",
    "Elevation": 420.61,
    "Units": "Feet"
}
Or if you just want the elevation:
$ curl 'https://nationalmap.gov/epqs/pqs.php?x=-96.808971&y=32.7792009&units=Feet&output=json' -s | php -r '$data=json_decode(stream_get_contents(STDIN));echo $data->USGS_Elevation_Point_Query_Service->Elevation_Query->Elevation;'
420.61
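Since you mention having urlgrabber, you may prefer to stay in Python entirely. Here is a minimal sketch using only the standard library, hitting the same endpoint with the same parameters as the curl examples above:
import json
import urllib.parse
import urllib.request

# Build the same query string as in the curl examples.
params = urllib.parse.urlencode({
    "x": -96.808971,   # longitude
    "y": 32.7792009,   # latitude
    "units": "Feet",
    "output": "json",
})
url = "https://nationalmap.gov/epqs/pqs.php?" + params

with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# Drill down to the elevation, mirroring the PHP one-liner above.
query = data["USGS_Elevation_Point_Query_Service"]["Elevation_Query"]
print(query["Elevation"])  # 420.61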

Related

MLflow webserver returns 400 status, "Incompatible input types for column X. Can not safely convert float64 to <U0."

I am implementing an anomaly detection web service using MLflow and sklearn.pipeline.Pipeline(). The aim of the model is to detect web crawlers using server logs, and the response_length column is one of my features. After serving the model, to test the web service I send the request below, which contains the first 20 columns of the train data.
$ curl --location --request POST '127.0.0.1:8000/invocations' \
--header 'Content-Type: text/csv' \
--data-binary '@datasets/test.csv'
But the response of the web server has status code 400 (BAD REQUEST) and this JSON body:
{
  "error_code": "BAD_REQUEST",
  "message": "Incompatible input types for column response_length. Can not safely convert float64 to <U0."
}
Here is the model compilation MLflow Tracking component log:
[Pipeline] ......... (step 1 of 3) Processing transform, total=11.8min
[Pipeline] ............... (step 2 of 3) Processing pca, total= 4.8s
[Pipeline] ........ (step 3 of 3) Processing rule_based, total= 0.0s
2021/07/16 04:55:12 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2021/07/16 04:55:12 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/matin/workspace/Rahnema College/venv/lib/python3.8/site-packages/mlflow/models/signature.py:129: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details."
Logged data and model in run: 8843336f5c31482c9e246669944b1370
---------- logged params ----------
{'memory': 'None',
'pca': 'PCAEstimator()',
'rule_based': 'RuleBasedEstimator()',
'steps': "[('transform', <log_transformer.LogTransformer object at "
"0x7f05a8b95760>), ('pca', PCAEstimator()), ('rule_based', "
'RuleBasedEstimator())]',
'transform': '<log_transformer.LogTransformer object at 0x7f05a8b95760>',
'verbose': 'True'}
---------- logged metrics ----------
{}
---------- logged tags ----------
{'estimator_class': 'sklearn.pipeline.Pipeline', 'estimator_name': 'Pipeline'}
---------- logged artifacts ----------
['model/MLmodel',
'model/conda.yaml',
'model/model.pkl',
'model/requirements.txt']
Could anyone tell me exactly how I can fix this model serving problem?
The problem is caused by the mlflow.utils.autologging_utils WARNING above.
When the model is created, the inferred data input signature is saved in the MLmodel file.
You should change the response_length signature input type from string to double by replacing
{"name": "response_length", "type": "string"}
with
{"name": "response_length", "type": "double"}
so the column doesn't need to be converted. After serving the model with the edited MLmodel file, the web server worked as expected.
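Alternatively, rather than hand-editing the MLmodel file after every run, you can log the model with an explicit signature so the correct type is recorded up front. A minimal sketch along those lines (pipeline and the column list are placeholders for your own objects; only response_length is shown):
import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import ColSpec, Schema

# Declare response_length as double explicitly instead of relying on
# schema inference from a training sample.
input_schema = Schema([
    ColSpec("double", "response_length"),
    # ... declare the rest of your feature columns here ...
])
signature = ModelSignature(inputs=input_schema)

with mlflow.start_run():
    mlflow.sklearn.log_model(pipeline, "model", signature=signature)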

How to fetch only parts of json file in python3 requests module

So, I am writing a program in Python to fetch data from the Google Classroom API using the requests module. I am getting the full JSON response from the classroom as follows:
{'announcements': [{'courseId': '#############', 'id': '###########', 'text': 'This is a test','state': 'PUBLISHED', 'alternateLink': 'https://classroom.google.com/c/##########/p/###########', 'creationTime': '2021-04-11T10:25:54.135Z', 'updateTime': '2021-04-11T10:25:53.029Z', 'creatorUserId': '###############'}, {'courseId': '############', 'id': '#############', 'text': 'Hello everyone', 'state': 'PUBLISHED', 'alternateLink': 'https://classroom.google.com/c/#############/p/##################', 'creationTime': '2021-04-11T10:24:30.952Z', 'updateTime': '2021-04-11T10:24:48.880Z', 'creatorUserId': '##############'}, {'courseId': '##################', 'id': '############', 'text': 'Hello everyone', 'state': 'PUBLISHED', 'alternateLink': 'https://classroom.google.com/c/##############/p/################', 'creationTime': '2021-04-11T10:23:42.977Z', 'updateTime': '2021-04-11T10:23:42.920Z', 'creatorUserId': '##############'}]}
I was actually unable to convert this into a pretty format, so I'm just pasting it as I got it from the HTTP request. What I actually wish to do is request only the first few announcements (say 1, 2, or 3, depending upon the requirement) from the service, while what I'm getting are all the announcements (as in the sample, 3 announcements) that have been made ever since the classroom was created. I believe that fetching all the announcements might make the program slower, so I would prefer to get only the required ones. Is there any way to do this by passing some arguments or anything? There are a few direct functions provided by Google Classroom, however I came across those a little later and have already written everything using the requests module, so switching would require changing a lot of things, which I would like to avoid. However, if unavoidable, I would go that route as well.
Answer:
Use the pageSize field to limit the number of responses you want in the announcements.list request, together with an orderBy parameter of updateTime asc.
More Information:
As per the documentation:
orderBy: string
Optional sort ordering for results. A comma-separated list of fields with an optional sort direction keyword. Supported field is updateTime. Supported direction keywords are asc and desc. If not specified, updateTime desc is the default behavior. Examples: updateTime asc, updateTime
and:
pageSize: integer
Maximum number of items to return. Zero or unspecified indicates that the server may assign a maximum.
So, let's say you want the first 3 announcements for a course, you would use a pageSize of 3, and an orderBy of updateTime asc:
# Copyright 2021 Google LLC.
# SPDX-License-Identifier: Apache-2.0
from googleapiclient.discovery import build

service = build('classroom', 'v1', credentials=creds)
order_by = "updateTime asc"
page_size = 3
# Call the Classroom API (course_id is the ID of the course to query)
results = service.courses().announcements().list(
    courseId=course_id, pageSize=page_size, orderBy=order_by).execute()
or an HTTP request example:
GET https://classroom.googleapis.com/v1/courses/[COURSE_ID]/announcements
?orderBy=updateTime%20asc
&pageSize=2
&key=[YOUR_API_KEY] HTTP/1.1
Authorization: Bearer [YOUR_ACCESS_TOKEN]
Accept: application/json
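The results dict then holds at most pageSize announcements under the announcements key; a small illustrative way to read them out of the Python example above:
announcements = results.get('announcements', [])
for announcement in announcements:
    # Each item mirrors the JSON in the question: text, state, timestamps, etc.
    print(announcement['text'], announcement['updateTime'])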
References:
Method: announcements.list | Classroom API | Google Developers

XML-Parsing error AttributeError: 'NoneType' object has no attribute 'text'

There is probably a simple solution to my problem, but I am very new to python3 so please go easy on me;)
I have a simple script running, which already successfully parses information from an xml-file using this code
import xml.etree.ElementTree as ET
root = ET.fromstring(my_xml_file)
u = root.find(".//name").text.rstrip()
print("Name: %s\n" % u)
The xml I am parsing looks like this
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3.2/style/exchange.xsl"?>
<example:world-data xmlns="http://www.example.org" xmlns:ops="http://example.oorg" xmlns:xlink="http://www.w3.oorg/1999/xlink">
<exchange-documents>
<exchange-document system="acb.org" family-id="543672" country="US" doc-number="95962" kind="B2">
<bibliographic-data>
<name>SomeName</name>
...and so on... and ends like this
</exchange-document>
</exchange-documents>
</example:world-data>
(Links are edited due to stackoverflow policy)
Output as expected
SomeName
However, if I try to parse another XML file from the same API using the same Python commands, I get this error:
AttributeError: 'NoneType' object has no attribute 'text'
The second xml-file looks like this
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3.2/style/pub-ftxt-claims.xsl"?>
<ops:world-data xmlns="http://www.example.org/exchange" xmlns:example="http://example.org" xmlns:xlink="http://www.example.org/1999/xlink">
<ftxt:fulltext-documents xmlns="http://www.examp.org/fulltext" xmlns:ftxt="ww.example/fulltext">
<ftxt:fulltext-document system="example.org" fulltext-format="text-only">
<bibliographic-data>
<publication-reference data-format="docdb">
<document-id>
<country>EP</country>
<doc-number>10000</doc-number>
<kind>A</kind>
</document-id>
</publication-reference>
</bibliographic-data>
<claims lang="EN">
<claim>
<claim-text>1. Some text.</claim-text>
<claim-text>2. Some text.</claim-text>
<claim-text>2. Some text.</claim-text>
</claim>
</claims>
</ftxt:fulltext-document>
</ftxt:fulltext-documents>
</ops:world-data>
I tried again
root = ET.fromstring(usr_str)
u = root.find(".//claim-text").text.rstrip()
print("Abstract: %s\n" % u)
Expected output
1. Some text.
But it only prints the above mentioned error message.
Why can I parse the first xml but not the second one using these commands?
Any help is highly appreciated.
edit: code by Jack Fleeting works in python console, but unfortunately not in my PyCharm
from lxml import etree
root = etree.XML(my_xml.encode('ascii'))
root2 = etree.XML(my_xml2.encode('ascii'))
root.xpath('//*[local-name()="name"]/text()')
root2.xpath('//*[local-name()="claim-text"]/text()')
Could this be a bug in my PyCharm? My first mentioned code snippet still prints a correct result for name...
edit: Turns out I had to force the output using
a = root3.xpath('//*[local-name()="claim-text"]/text()')
print(a, flush=True)
A couple of problems here before we get to a possible solution. One, the first XML snippet you provided is invalid (for instance, the <bibliographic-data> isn't closed). I realize it's just a snippet, but since this is what we have to work with, I modified the snippet below to fix that. Two, both snippets have xmlns declarations with unbound (unused) prefixes (example:world-data in the first, and ops:world-data in the second). I had to remove these prefixes, too, for the rest to work.
Given these modifications, using the lxml library should work for you.
First modified snippet:
my_xml = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3.2/style/exchange.xsl"?>
<world-data xmlns="http://www.example.org" xmlns:ops="http://example.oorg" xmlns:xlink="http://www.w3.oorg/1999/xlink">
<exchange-documents>
<exchange-document system="acb.org" family-id="543672" country="US" doc-number="95962" kind="B2">
<bibliographic-data>
<name>SomeName</name>
...and so on... and ends like this
</bibliographic-data>
</exchange-document>
</exchange-documents>
</world-data>"""
And:
my_xml2 = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3.2/style/pub-ftxt-claims.xsl"?>
<world-data xmlns="http://www.example.org/exchange" xmlns:example="http://example.org" xmlns:xlink="http://www.example.org/1999/xlink">
<ftxt:fulltext-documents xmlns="http://www.examp.org/fulltext" xmlns:ftxt="ww.example/fulltext">
<ftxt:fulltext-document system="example.org" fulltext-format="text-only">
<bibliographic-data>
<publication-reference data-format="docdb">
<document-id>
<country>EP</country>
<doc-number>10000</doc-number>
<kind>A</kind>
</document-id>
</publication-reference>
</bibliographic-data>
<claims lang="EN">
<claim>
<claim-text>1. Some text.</claim-text>
<claim-text>2. Some text.</claim-text>
<claim-text>3. Some text.</claim-text>
</claim>
</claims>
</ftxt:fulltext-document>
</ftxt:fulltext-documents>
</world-data>"""
And now to work:
from lxml import etree
root = etree.XML(my_xml.encode('ascii'))
root2 = etree.XML(my_xml2.encode('ascii'))
root.xpath('//*[local-name()="name"]/text()')
output:
['SomeName']
root2.xpath('//*[local-name()="claim-text"]/text()')
Output:
['1. Some text.', '2. Some text.', '3. Some text.']
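For completeness, the lookups also work with plain xml.etree.ElementTree once the default namespace is accounted for; an unqualified .//name never matches a namespaced element, so find() returns None and .text blows up. A short sketch against the first modified snippet:
import xml.etree.ElementTree as ET

root = ET.fromstring(my_xml)
# The document declares xmlns="http://www.example.org", so the tag name
# must be qualified with that namespace to match.
elem = root.find(".//{http://www.example.org}name")
if elem is not None:
    print(elem.text.rstrip())  # SomeName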

How do I declare and use a variable in the yaml file that is formatted for pyresttest?

So, a brief description of what I want, what my issue is, and what I have tried.
I want to declare and use a dictionary variable for my tests in pyresttest, specifically for the [url, body] section, so that I can conduct my POST tests targeting a specific endpoint and with a preformatted body.
Here is how my mytest.yml file is structured:
- data:
    - id: 63
    - rate: 25
    ... a sizable set of fields for reasons ...
    - type: lab_test_authorization
    - modified_at: ansible_date_time.datetime # Useful way to generate
- test:
    - url: "some-valid-url/{the_url_question}" # data['special_key']
    - method: 'POST'
    - headers: {etc..etc}
    - body: '{ "data": ${the_body_question} }' # data (the content)
Now, the problem starts with my lack of understanding of why (if true) pyresttest does not have support for dictionary mappings. I understand YAML supports this feature but am not sure if pyresttest can parse it. Knowing how to call and use a dictionary variable in my url and body tags would be significantly helpful.
As of right now, if I try to convert my data Sequence into a data Dictionary, I will get an error stating:
yaml.parser.ParserError: while parsing a block mapping
  in "<unicode string>", line 4, column 1:
    data:
    ^
expected <block end>, but found '-'
  in "<unicode string>", line 36, column 1:
    - config:
I'm pretty sure there are gaps in my knowledge regarding how yaml and pyresttest interact with each other, so any insight would be greatly appreciated.
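For what it's worth, the variable mechanism pyresttest itself documents is variable_binds in the config section plus template wrappers with $name substitution on url and body, rather than YAML dictionary lookups. A sketch along those lines; the names and values below are purely illustrative, not your actual endpoint or payload:
- config:
    - testset: "Lab test authorization"
    - variable_binds: {endpoint_id: '63', payload: '{"id": 63, "rate": 25}'}

- test:
    - name: "POST with templated url and body"
    - url: {template: "/some-valid-url/$endpoint_id"}
    - method: 'POST'
    - headers: {'Content-Type': 'application/json'}
    - body: {template: '{ "data": $payload }'}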

Can't call curl from python3

I am trying to call this curl from python3. This, from bash, is working fine.
curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.1103/PhysRevLett.117.126802
yielding the expected result:
#article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J. K. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H. W.}, year={2016}, month={Sep}}
In python3, I am doing:
import subprocess

doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
try:
    subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
except ExplicitException:
    print("DOI is not available")
    self.Messages.on_warn_clicked("DOI is not given",
                                  "Search google instead")
which gives this error:
<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>
What's going wrong here?
You have 3 problems here:
don't quote your arguments in subprocess; it already does that for you when necessary, since you pass the arguments as a list rather than an unsplit command line (good practice, keep it on, but drop the unnecessary quoting).
then, subprocess.call does not let you parse/store the output in Python, which is problematic for number 3:
and last: your site randomly answers with rubbish HTML (a Java stack trace). This explains why you're getting different output in Python, but you can get it in bash as well.
Problem #1
subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
should be
subprocess.call(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi])
Else, quotes are applied twice and your Accept: xxx argument has quotes around it, which is unexpected by curl
demo of the non-working quote part:
import subprocess,os
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
#### this is wrong because of the quoting ####
p = subprocess.Popen(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi],stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
[output,error] = p.communicate()
print(output)
result:
b' some stats then ... <html><body><h1>400 Bad request</h1>\nYour browser sent an invalid request.\n</body></html>\n\r\n'
Problems #2 and #3
I have implemented a retry mechanism which parses the output and retries until correct output is found:
import subprocess,os,sys
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
while True:
    p = subprocess.Popen(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi], stdout=subprocess.PIPE)
    [output, error] = p.communicate()
    output = output.decode("latin-1")
    if "java.util.concurrent.FutureTask.run" in output:
        # site crashed when responding: junk HTML output: retry
        sys.stderr.write("Wrong answer: retrying\n")
    else:
        print(output)
        break
result:
Wrong answer: retrying <==== here the site threw a big HTML exception output
#article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J.âK. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H.âW.}, year={2016}, month={Sep}}
So it works, it's just a site problem, but with my python wrapper you are able to re-submit the request until it yields the proper answer.
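As an aside, if shelling out to curl isn't a hard requirement, the requests library does the same job natively and sidesteps the quoting issue entirely; a minimal sketch, assuming requests is installed:
import requests

doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
# requests follows redirects by default, matching curl's -L flag.
response = requests.get(doi, headers={"Accept": "text/bibliography; style=bibtex"})
response.raise_for_status()
print(response.text)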
