Parse through HTML response from getUrl()

I am trying to create a Bixby capsule that can retrieve values from a website. The getUrl() response will be in HTML. Is there a way to parse the HTML response for values, or somehow convert it to JSON?

You can do that in Bixby using regular expressions, but you would need to run the response through your own regex processing in code when you get it back from getUrl(). There isn't a way to automatically convert an HTML response to JSON. Here are some references on parsing HTML responses with regex.
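A minimal sketch of the regex approach. It is written in Python only to illustrate the idea (a Bixby capsule's own code would be JavaScript), and the sample markup, the `name`/`price` class names, and the function `extract_values` are all hypothetical. Regex like this only holds up when the markup is small and predictable; it is not a general HTML parser.

```python
import json
import re

# Hypothetical HTML standing in for the string returned by getUrl().
SAMPLE_HTML = """
<ul>
  <li><span class="name">Coffee</span> <span class="price">3.50</span></li>
  <li><span class="name">Tea</span> <span class="price">2.75</span></li>
</ul>
"""

# Non-greedy capture of the text inside each matching <span>.
# Assumes the attributes are written exactly as class="name" / class="price".
NAME_RE = re.compile(r'<span class="name">(.*?)</span>')
PRICE_RE = re.compile(r'<span class="price">(.*?)</span>')


def extract_values(html: str) -> list:
    """Pair names and prices in document order and return JSON-ready dicts."""
    names = NAME_RE.findall(html)
    prices = PRICE_RE.findall(html)
    return [{"name": n, "price": float(p)} for n, p in zip(names, prices)]


if __name__ == "__main__":
    # The resulting list of dicts serializes straight to JSON.
    print(json.dumps(extract_values(SAMPLE_HTML)))
```

Once the values are in plain dicts, turning them into JSON (or a Bixby structure) is trivial; the fragile part is the patterns, which break as soon as the site changes its markup.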

Related

UTF-8 handling in the Distance Matrix API

I'm trying to write VBA code that interacts with Google's Distance Matrix API and returns the distance between two points. The problem is that when I use accented Latin letters (like ã), it returns an invalid destination error:
<DistanceMatrixResponse>
<status>INVALID_REQUEST</status>
<error_message>Invalid request. Invalid 'destinations' parameter.</error_message>
</DistanceMatrixResponse>
The URL I'm using is:
https://maps.googleapis.com/maps/api/distancematrix/xml?units=metric&origins=Fortaleza&destinations=São+Paulo&key=" & myAPIK
But when I use the UTF-8 percent-encoded form instead of the word itself, it returns the distance I wanted:
https://maps.googleapis.com/maps/api/distancematrix/xml?units=metric&origins=Fortaleza&destinations=S%C3%A3o+Paulo&key=" & myAPIK
I also tried adding the language (pt-BR) to the URL, but it didn't work.
Do you guys know any way to get around this?

You're going to want to look at how the actual web page works.
Open the web page in Chrome, then view the request traffic in DevTools (F12) on the Network tab.
You'll have to find the request in the sidebar; filtering by XHR will help.
When you find the request itself, it will show you how it's encoded. It's unlikely that the text is sent as Latin-1; more likely your input is converted to UTF-8 and normalized by JavaScript before the request is sent. But unless you examine the request in DevTools, you won't know for sure.
Also, if you're just passing parameters through the query string (the part of the URL after the ?) to interact with the API, URL encoding is probably happening. You can see how this works for São Paulo using this website. You probably need to sanitize your input with a URL-encoding function before sending the GET/POST request to the API.
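A small sketch of that URL-encoding step, in Python rather than VBA. The function name `build_distance_matrix_url` and the `MY_KEY` placeholder are hypothetical; the point is only that each parameter value is percent-encoded as UTF-8 before it goes into the URL, which produces exactly the `S%C3%A3o+Paulo` form that worked in the question.

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/xml"


def build_distance_matrix_url(origin: str, destination: str, api_key: str) -> str:
    """Build the request URL with every parameter value URL-encoded."""
    params = {
        "units": "metric",
        "origins": origin,
        "destinations": destination,
        "key": api_key,
    }
    # urlencode() runs each value through quote_plus(), which encodes
    # non-ASCII characters as UTF-8 percent escapes and spaces as "+".
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "ã" (U+00E3) is C3 A3 in UTF-8, so "São Paulo" becomes "S%C3%A3o+Paulo".
    print(build_distance_matrix_url("Fortaleza", "São Paulo", "MY_KEY"))
```

In VBA the same idea applies: encode each value as UTF-8 and percent-escape it before concatenating it into the URL string, rather than pasting the raw text in.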

How to escape Semicolon in django rest framework GET Request

I'm new to Django REST Framework. How can I escape a semicolon in a Django REST Framework GET request?
I'm using Postman to send the GET request.
Input request:
Response output:
If you check the second image, the input text after the semicolon (hello) has been removed, so I lose part of the original text I sent in the GET request. How can I fix this?
Requested code:
Thanks
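One common way to keep a semicolon as data in a query string is to percent-encode it as `%3B`. The sketch below assumes a hypothetical endpoint `http://localhost:8000/api/echo/` and a hypothetical parameter name `q`, and uses Python's `parse_qs` as a stand-in for the server-side decoding; in Postman, typing `%3B` in place of the literal `;` in the parameter value should have the same effect.

```python
from urllib.parse import parse_qs, quote, urlsplit


def build_query_url(base: str, text: str) -> str:
    """Append `text` as the hypothetical `q` parameter, percent-encoded."""
    # With safe="" only letters, digits and "_.-~" are left as-is,
    # so ";" becomes "%3B" (and "&", "=", "/" are escaped too).
    return f"{base}?q={quote(text, safe='')}"


def decoded_query(url: str) -> dict:
    """Decode the query string the way a server-side framework would."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    url = build_query_url("http://localhost:8000/api/echo/", "hi;hello")
    print(url)                 # the semicolon travels as %3B
    print(decoded_query(url))  # and is decoded back to ";" on the server side
```

Because the semicolon never appears literally in the query string, no parser can treat it as a separator, and the full value arrives intact.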

Nutch 2 exclude content-type image from crawling

The problem is that some images don't have the usual image extensions. For example, Nutch 2 crawled a page ending in .ashx that was actually an image.
Is there a way to exclude images using an HTTP header filter (Content-Type: image/*) or something equivalent that is not based on a URL pattern (regex-urlfilter.txt)?

You can achieve this by writing a plugin that implements the URLFilter interface.
In the String filter(String urlString) method, you can check whether the URL has an ambiguous extension and, if so, validate it further by fetching its HTTP headers from the server: if the content type is an image, return null; otherwise return the URL. But I doubt that would be a very efficient method, since it generates many extra HTTP calls just for this validation.
Another option is to just leave it: Nutch is not going to parse or index the image anyway.
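The decision logic of that filter can be sketched as follows. A real Nutch plugin would be Java implementing URLFilter; this Python version only illustrates the check, and the extension list `AMBIGUOUS_EXTENSIONS` and all function names are hypothetical. The content-type lookup is passed in as a function so the logic can be exercised without network access; the default makes one HEAD request per ambiguous URL, which is exactly the overhead the answer warns about.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Hypothetical list of extensions that do not reveal the content type.
AMBIGUOUS_EXTENSIONS = (".ashx", ".aspx", ".php")


def is_image_content_type(content_type: Optional[str]) -> bool:
    """True if the Content-Type header names an image/* media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=..." before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def head_content_type(url: str) -> Optional[str]:
    """Fetch only the headers (one extra HTTP round trip per URL)."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
        return resp.headers.get("Content-Type")


def filter_url(
    url: str,
    fetch_content_type: Callable[[str], Optional[str]] = head_content_type,
) -> Optional[str]:
    """Mirror URLFilter.filter(): return the URL to keep it, None to reject it."""
    path = urlsplit(url).path.lower()
    if not path.endswith(AMBIGUOUS_EXTENSIONS):
        return url  # no extra request for ordinary URLs
    if is_image_content_type(fetch_content_type(url)):
        return None
    return url
```

Limiting the HEAD request to ambiguous extensions keeps the extra traffic down, but every such URL still costs a round trip.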

How to pass a very long string/file into RESTWebservice JAX-RS Jersey

I wrote a RESTful web service using the JAX-RS API that returns an XML string.
I am trying to write another RESTful web service that takes in this XML string, parses it using DOM, and extracts the particular things I want. The XML string happens to be very long, so I do not want to pass it as a @QueryParam or @PathParam.
Say I write that XML string into a file: how do I go about writing a service that takes in this file, extracts whatever I want, and returns the results? I am actually trying to extract a number of strings, so my web service should ultimately return an array containing all of them.
Could somebody please shed some light on how I should go about doing this?
Thanks in advance

Sashikiran,
I'm not sure I understand this correctly, but you can implement streaming access to the HTTP input and output streams. You don't need to read or write the whole thing at once.
So, while you read the stream from service A, you can extract what you need and write it out to service B via a POST request.
Why are you DOM-parsing the XML? A SAX or StAX parser seems more suitable if the XML is indeed very long.
Jan
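To show the streaming idea without building a DOM, here is a sketch using Python's `xml.etree.ElementTree.iterparse`, which plays a role similar to StAX in Java. The `<item>` tag name and the function `extract_texts` are hypothetical; in a JAX-RS service the input would come from the request body as a stream (for example an InputStream entity parameter) instead of the in-memory bytes used here.

```python
import io
import xml.etree.ElementTree as ET


def extract_texts(stream, tag: str) -> list:
    """Stream through XML and collect the text of every <tag> element."""
    results = []
    # iterparse yields each element when its end tag is read, so the
    # document is processed incrementally rather than loaded as one tree.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            results.append(elem.text or "")
        elem.clear()  # drop text/children already handled to limit memory use
    return results


if __name__ == "__main__":
    data = b"<results><item>alpha</item><item>beta</item><other>x</other><item>gamma</item></results>"
    print(extract_texts(io.BytesIO(data), "item"))
```

The returned list corresponds to the "array with all those strings" the question asks for; serializing it as the response is then straightforward.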

How do I aggregate data off of a google search

I am trying to aggregate movie times from a Google Movies search into a usable format such as JSON or XML:
http://www.google.com/movies?q=movie+times&sc=1&mid=&hl=en&oi=showtimes&ct=change-location&near=new+york
The Google AJAX API does not seem to work for this, as you cannot do a movie search with it.
Does anyone know how this can be done?

Look up the technique called web scraping.
Basically, you have to fetch the results page using some server-side script, then extract the data from it and present it in a structured format (JSON, XML, etc.). Regular expressions or a DOM/XML parser could help.
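A sketch of the parser-based extraction step, using Python's standard `html.parser`. The markup and the `showtime` class name are hypothetical (the real page's layout is not known here), the parser assumes each showtime element contains plain text with no nested tags, and the fetch step (e.g. with `urllib.request.urlopen`) is left out so the parsing can be shown on a fixed string.

```python
import json
from html.parser import HTMLParser


class ShowtimeParser(HTMLParser):
    """Collect the text of every element whose class list contains 'showtime'."""

    def __init__(self):
        super().__init__()
        self.showtimes = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "showtime" in classes:
            self._capture = True

    def handle_endtag(self, tag):
        # Assumes showtime elements have no nested tags.
        self._capture = False

    def handle_data(self, data):
        if self._capture and data.strip():
            self.showtimes.append(data.strip())


def scrape_showtimes(html: str) -> str:
    """Parse the page text and return the showtimes as a JSON string."""
    parser = ShowtimeParser()
    parser.feed(html)
    parser.close()
    return json.dumps({"showtimes": parser.showtimes})


if __name__ == "__main__":
    page = '<div><span class="showtime">7:00pm</span> <span class="showtime">9:30pm</span></div>'
    print(scrape_showtimes(page))
```

A parser tolerates attribute order and whitespace changes better than a regex, but any scraper still breaks when the page structure changes.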

This guy has a PHP script that converts Google results to RSS.
