Read from URL in Groovy (with redirect)

This Groovy code outputs an empty string:
def url = 'http://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&titles=Mozambique&format=xml&props='.toURL()
print url.getText('utf-8')
With the same URL, curl also returns empty, but curl -L returns the XML I want.
Is there something for Groovy that is similar to that -L option?
-L: If the server reports that the requested page has moved to a different location (indicated with a Location: header and a 3XX response code), this option will make curl redo the request on the new place.

Groovy uses Java's HttpURLConnection under the covers, which won't automatically follow redirects in every case (notably not across protocols, such as HTTP to HTTPS). However, here is a small function that will handle it for you by checking the status code and Location header on the response and calling the redirected URL if necessary:
def findRealUrl(url) {
    HttpURLConnection conn = url.openConnection()
    conn.followRedirects = false
    conn.requestMethod = 'HEAD'
    if (conn.responseCode in [301, 302]) {
        if (conn.headerFields.'Location') {
            return findRealUrl(conn.headerFields.Location.first().toURL())
        } else {
            throw new RuntimeException('Failed to follow redirect')
        }
    }
    return url
}
The code can be downloaded on GitHub.

Related

Vegeta Load Testing: different body for each POST request in the attack

Is there a way to change the JSON body in Vegeta POST request load tests?
I want to send a request with a different parameter in the JSON body for each of the requests. For example, if I have
POST https://endpoint.com/createNew
@/targets/data.json
and data.json looks like
{
"id": 1234
}
What is the best way to make it so we have different request data for each of the requests in the attack?
I needed to do something similar and decided to use the Vegeta library rather than the CLI, which allows me to control the HTTP requests.
So you need to write your own function which returns a vegeta.Targeter:
func NewCustomTargeter() vegeta.Targeter {
    return func(tgt *vegeta.Target) error {
        if tgt == nil {
            return vegeta.ErrNilTarget
        }
        tgt.Method = "POST"
        tgt.URL = "https://endpoint.com/createNew"
        rand := generateFourDigitRandom()
        payload := `{ "id": "` + rand + `" }`
        tgt.Body = []byte(payload)
        return nil
    }
}
and use this function in the main function like this:
targeter := NewCustomTargeter()
attacker := vegeta.NewAttacker()
var metrics vegeta.Metrics
for res := range attacker.Attack(targeter, rate, duration, "Load Test") {
    metrics.Add(res)
}
metrics.Close()
fmt.Printf("%+v \n", metrics)
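The answer calls a generateFourDigitRandom helper it never shows. A minimal guess at it might look like the following; the behaviour (a zero-padded four-digit string) is an assumption, since the original is not given:

```go
package main

import (
	"fmt"
	"math/rand"
)

// generateFourDigitRandom returns a random four-digit string, e.g. "0042".
// The exact behaviour is an assumption; the original helper is not shown.
func generateFourDigitRandom() string {
	return fmt.Sprintf("%04d", rand.Intn(10000))
}

func main() {
	// Build the per-request payload the same way the targeter above does.
	payload := `{ "id": "` + generateFourDigitRandom() + `" }`
	fmt.Println(payload)
}
```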
On Jul 10, 2018, vegeta#PR300 introduced the -format=json option. Here is the vegeta README description:
The JSON format makes integration with programs that produce targets
dynamically easier. Each target is one JSON object in its own line.
The method and url fields are required. If present, the body field
must be base64 encoded. The generated JSON Schema defines the format
in detail.
And their provided example:
jq -ncM '{method: "GET", url: "http://goku", body: "Punch!" | @base64, header: {"Content-Type": ["text/plain"]}}' |
vegeta attack -format=json -rate=100 | vegeta encode
If you have multiple files under the targets folder and would like to use them all in your load test, I would suggest the following configuration:
Install the Directory Listing Config plugin using the JMeter Plugins Manager
Configure it to point to your targets folder:
In your HTTP Request sampler, use the __FileToString() function like:
${__FileToString($(unknown),,)}
When you run your test it will pick up the next file from the targets directory and use its contents as the request body

a media spider in node js

I'm working on a project named robot, hosted on GitHub. The job of my project is to fetch media from the URLs given in the XML config file, and the XML config file has the defined format you can see in the scripts dir.
My problem is as below. There are two args:
A list which indicates how deep the web link is; according to the selector (CSS selector) in the list item, I can find out the media URL or the sub-page URL where I may finally find the media.
An arr which contains the sub-page URLs.
The simplified example is as below:
node_list = {..., next = {..., next = null}};
url_arr = [urls];
I want to iterate over all the items in the url arr, so I do as below:
function fetch(url, node) {
    if (node == null)
        return;
    // here do something with http request
    var req = http.get('http://www.google.com', function(res) {
        var data = '';
        res.on('data', function(chunk) {
            data += chunk;
        });
        res.on('end', function() {
            // maybe here generate more new urls
            // get another url_list
            node = node.next;
            fetch(url_new, node);
        });
    });
}
// here need to be run in sync
for (url in url_arr) {
    fetch(url, node);
}
As you can see, if I use async HTTP requests, it eats all system resources and I cannot control the process.
So does anyone have a good idea to solve this problem?
Or is Node.js not the proper way to do such jobs?
If the problem is that you issue too many HTTP requests simultaneously, you could change the fetch function to operate on a stack of URLs.
Basically you would do this:
When fetch is called, insert the URL into the stack and check if a request is in progress:
If a request is not running, pick the first URL from the stack and process it; otherwise do nothing
When an HTTP request is finished, have it take a new URL from the stack and process that
This way you can have the for-loop add all the URLs like now, but only one URL is processed at a time, so there won't be too many resources being used.

CouchDB list "An error occured accessing the list: Invalid JSON" error

I'm trying to call a CouchDB list from JavaScript.
My jQuery call is:
var listItems = $.couch.db('foo').list('foo/barList', 'foo/barView');
If I test this in the FireBug console when I'm on a page that has jquery.couch.js loaded (like Futon) it returns exactly what I want it to (several <input> tags with the appropriate data populated).
However, when I call this from code, I get the error:
An error occured accessing the list: Invalid JSON: [the html]
... where [the html] is the html I want to manipulate in my script. I don't understand why I'm getting a JSON error - I thought the point of lists was to return HTML. Is there a way to force it to return my html to me?
Also, my list function includes the following, so I'm not sure why this doesn't work.
start({
    "headers": {
        "Content-Type": "text/html"
    }
});
According to https://issues.apache.org/jira/browse/COUCHDB-1059 this was a recognized bug and it had been patched. However after making the changes in jquery.couch.js recommended by Jan Lehnardt on the above page, I had to do one thing further.
The page above recommends making the following change in jquery.couch.js :
- var resp = httpData(req, "json");
+ var resp = httpData(req, dataType);
For some reason this didn't work for me but it did when I instead replaced it with the following. Theoretically one could add handlers for different types of content-types below.
var cType = req.getResponseHeader('Content-Type');
switch (cType) {
    case 'text/html':
        var resp = req.responseText;
        break;
    default:
        var resp = $.parseJSON(req.responseText);
        break;
}
If I'm missing something, I welcome recommendations on how to do this more effectively, but this works for me.

How to return only status code in POST request in servicestack Instead of HTML page

I have created a REST service using ServiceStack, and in a POST request I return an object in the following way:
return new HttpResult(request)
{
    StatusCode = HttpStatusCode.Created,
};
request: the object which I have posted into the database.
When I check it in Fiddler it renders the whole ServiceStack HTML page in the response body. Instead of that I would like to return the status code only, so please tell me how I can do this?
Thanks
There was a bug in versions before v3.05 that did not respect the HttpResult ContentType in some scenarios; it should be fixed now with the latest version of ServiceStack on NuGet or available from:
https://github.com/ServiceStack/ServiceStack/downloads
Prior to this you could still force the desired ContentType by changing the Accept: application/json request header on the HttpClient or by appending ?format=json to the querystring of your URL.
So now if you don't want to have any DTO serialized, you don't add it to the HttpResult:
return new HttpResult() { StatusCode = HttpStatusCode.Created };
Note you still might get an empty HTML response back if calling this service in the browser (or any REST client that sends Accept: text/html). You can force a ContentType that won't output any response if it has an empty payload (e.g. JSON/JSV) by specifying it in the result as well, e.g.:
return new HttpResult() {
    StatusCode = HttpStatusCode.Created,
    ContentType = ContentType.Json
};

GET and XMLHttpRequest

I have an XMLHttpRequest. The request passes a parameter to my PHP server code in /var/www, but I cannot seem to be able to extract the parameter back on the server side. Below I have pasted both pieces of code:
javascript:
function getUsers(u)
{
    alert(u); // here u is 'http://start.ubuntu.com/9.10'
    xmlhttp = new XMLHttpRequest();
    var url = "http://localhost/servercode.php" + "?q=" + u;
    xmlhttp.onreadystatechange = useHttpResponse;
    xmlhttp.open("GET", url, true);
    xmlhttp.send(null);
}
function useHttpResponse()
{
    if (xmlhttp.readyState == 4)
    {
        var response = eval('(' + xmlhttp.responseText + ')');
        for (i = 0; i < response.Users.length; i++)
            alert(response.Users[i].UserId);
    }
}
servercode.php:
<?php
$q = $_GET["q"];
//$q = "http://start.ubuntu.com/9.10";
$con = mysql_connect("localhost", "root", "blaze");
if (!$con) {
    die('could not connect to database' . mysql_error());
}
mysql_select_db("BLAZE", $con) or die("No such Db");
$result = mysql_query("SELECT * FROM USERURL WHERE URL='$q'");
if ($result == null)
    echo 'nobody online';
else {
    header('Content-type: text/html');
    echo "{\"Users\":[";
    while ($row = mysql_fetch_array($result)) {
        echo '{"UserId":"' . $row['UsrID'] . '"},';
    }
    echo "]}";
}
mysql_close($con);
?>
This is not giving the required result, although the commented statement, where the variable is assigned the value explicitly, works: it alerts the required output. Somehow the GET parameter is not reaching my PHP, or that's how I think it is. Please help.
If u is http://start.ubuntu.com/9.10 as you write, the URL gets garbled, because characters like : and / are reserved in a URL query string.
You need to escape the URL using encodeURIComponent() in Javascript, and urldecode() it back in PHP. Docs here and here.
The JavaScript part would look like so:
var url="http://localhost/servercode.php"+"?q="+encodeURIComponent(u);
and the PHP part:
$q=urldecode($_GET["q"]);
Your MySQL query is also vulnerable to SQL injection, which is highly dangerous. You should at least sanitize $q using mysql_real_escape_string(). See this question for an overview of the problem and possible solutions.
