How can I query all records with timestamp older than 30 days using cURL?

I want to fetch all records (from Solr) with a timestamp older than 30 days via cURL command.
What I have tried:
curl -g "http://localhost:8983/solr/input_records/select?q=timestamp:[* TO NOW/DAY-30DAYS]"
I do not understand why this does not work: it simply returns nothing. If I replace '[* TO NOW/DAY-30DAYS]' with an actual value, it retrieves that record.
Additional relevant information: this is how I delete all records older than 30 days (it works). Again, I do not want to delete the data, just fetch it.
curl -g "http://localhost:8983/solr/input_records/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>timestamp:[* TO NOW/DAY-30DAYS]</query></delete>"
Thanks in advance!

This happens because your request is not properly URL-encoded. Most likely the problem is the spaces: they need to be replaced with %20, and the same applies to other special symbols.
Try this:
curl -g "http://localhost:8983/solr/input_records/select?q=timestamp:[*%20TO%20NOW/DAY-30DAYS]

A further addition to Mysterion's answer: since you are doing this with curl, you are running into the URL-encoding issue.
If you just enter
http://localhost:8983/solr/input_records/select?q=timestamp:[* TO NOW/DAY-30DAYS]
in your browser (Chrome or others), the URL encoding is handled automatically by the browser and you get the response you expect.

Related

How do I substitute my values in a post request?

I have a list of .php links. How do I substitute my values into all of their parameters using a curl POST request?
Given that I do not know what the parameters in these PHP links are, curl should determine by itself which parameters the POST request takes and substitute my values.
If I know the parameter, then I can send it to the links like this:
while read -r p; do
  curl "$p" -X POST --connect-timeout 18 --cookie "" --user-agent "" -d "parametr=helloworld" -w "%{url}:%{time_total}s\n"
done < domain.txt > output.txt
And if I do not know the parameters, what should I do? How can I make curl substitute values into the parameters automatically? For example, the value "hello world", given that I do not know the name "parameter".
It's simply not possible. curl is a client program and has no way of knowing or finding out which request parameters are supported by a server or which are not.
Unless, of course, the API is properly documented and available as an OpenAPI/Swagger specification, for example; in that case the parameter names can be read from the spec (a sketch follows). If it isn't, you're out of luck.
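If such a spec happens to be published, a rough sketch of discovering what the API accepts could look like the following (the spec URL is a made-up placeholder and jq is assumed to be installed):
# Hypothetical: fetch an OpenAPI spec, if the API publishes one, and list its endpoints;
# the parameters of each operation are documented under the corresponding path entry.
curl -s https://api.example.com/openapi.json | jq -r '.paths | keys[]'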

Did I discover a bug in Telegram?

I use a Telegram bot to incorporate weather alerts from a local weather service into my home-automation system. Today I discovered a weird problem: the message containing the weather alert wasn't sent. If I try this in bash on Linux:
output="Nationaal Hitteplan";curl "https://api.telegram.org/botxxxxxxxxx:longsecuritycode/sendMessage?chat_id=xxxxxxxxx&text=$output"
(I removed my personal tokens in the above command of course...)
then I get a 400 Bad Request and the message is not sent.
If I change output="Nationaal Hitteplan" to output="Nationaal hitteplan" then the message is sent as it is supposed to be.
I don't see what's wrong here. The term Nationaal Hitteplan is basically a set of advisories about what to do in hot weather. It has no negative meaning associated with it, but apparently Telegram detects a problem.
Does someone have a solution for this, other than changing the term as described above?
URLs that contain special characters like a space should be URL-encoded.
Use the following curl command to let curl handle the encoding:
token='xxxxxxxxx:longsecuritycode';
output="Nationaal Hitteplan";
curl -G \
--data-urlencode 'chat_id=1234567' \
--data-urlencode "text=${output}" \
"https://api.telegram.org/bot${token}/sendMessage"
See also: How to urlencode data for curl command?
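For comparison, the same message sent with the space percent-encoded by hand (using the same placeholder token and chat id as above) would be:
curl "https://api.telegram.org/bot${token}/sendMessage?chat_id=1234567&text=Nationaal%20Hitteplan"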

Couchdb replication hangs and doesn't proceed

I'm trying to replicate some CouchDB databases. It works perfectly fine for all of them except one, which always seems to get stuck at some point during the replication; the terminal stays on hold forever and doesn't return.
I have no idea where the problem comes from. I imagine it could be one specific problematic document, but I don't know how to check and debug that.
Also, I noticed that if I use the replicator in Futon, it seems to work and replicate. But I need to be able to do it from the command line.
Any idea where the problem could be coming from or/and how to debug it?
This is the command I'm using:
curl -X POST http://127.0.0.1:5984/_replicate -d '{"source":"http://user:password@127.0.0.1:5984/origin-db-name", "target":"http://user:password@127.0.0.1:5984/target-db-name"}' -H "Content-Type: application/json"
and here is the output I get when running curl "http://user:password@127.0.0.1:5984/_active_tasks":
[
  {
    "pid": "<0.351.0>",
    "checkpoint_interval": 5000,
    "checkpointed_source_seq": 1007,
    "continuous": false,
    "doc_id": null,
    "doc_write_failures": 0,
    "docs_read": 0,
    "docs_written": 0,
    "missing_revisions_found": 0,
    "progress": 59,
    "replication_id": "4ec403b2de1d9c546182252369f1d96e",
    "revisions_checked": 187,
    "source": "http://user:*****@127.0.0.1:5984/origin-db-name/",
    "source_seq": 1693,
    "started_on": 1531010162,
    "target": "http://user:*****@127.0.0.1:5984/target-db-name/",
    "type": "replication",
    "updated_on": 1531010188
  }
]
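One thing that may be worth a try when digging into this (a sketch reusing the command from above): an ad-hoc replication started through _replicate can be cancelled by POSTing the exact same body with "cancel": true added, after which it can be restarted while watching the CouchDB log for errors:
curl -X POST http://127.0.0.1:5984/_replicate \
  -H "Content-Type: application/json" \
  -d '{"source":"http://user:password@127.0.0.1:5984/origin-db-name", "target":"http://user:password@127.0.0.1:5984/target-db-name", "cancel": true}'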

Force couchdb to reference attachments instead of duplicate in new revision

I have a problem with attachments in couchdb.
Let's say I have a document with a big attachment (100 MB). This means that each time you modify the document (not the attachment, just one field of the document), it duplicates the 100 MB attachment.
Is it possible to force CouchDB to create references to attachments when they are not modified (CouchDB can easily verify whether an attachment has been modified using its MD5)?
Edit:
According to this, it should be able to do it, but how? My (personal) install doesn't do it by default!
Normally, what you expect is the default behaviour of CouchDB. It could depend on how the API is used, however. For example, the following sample scenario works fine (on CouchDB 1.5).
All commands are given in bash syntax, so you can reproduce them easily (just make sure to use the correct document id and revision numbers).
Create a 10M sample file for upload
dd if=/dev/urandom of=attach.dat bs=1024 count=10240
Create test DB
curl -X PUT http://127.0.0.1:5984/attachtest
The expected database data_size is just a few bytes at this point. You can query it as follows and look for the data_size attribute.
curl -X GET http://127.0.0.1:5984/attachtest
which gives in my test:
{"db_name":"attachtest","doc_count":1,"doc_del_count":0,"update_seq":2,"purge_seq":0,"compact_running":false,"disk_size":8287,"data_size":407,"instance_start_time":"1413447977100793","disk_format_version":6,"committed_update_seq":2}
Create sample document
curl -X POST -d '{"hello": "world"}' -H "Content-Type: application/json" http://127.0.0.1:5984/attachtest
This command's output contains the document id and revision, which should be used hereafter.
Now, attach the sample file to the document; the command should use the id and revision logged in the output of the previous one:
curl -X PUT --data-binary @attach.dat -H "Content-Type: application/octet-stream" http://127.0.0.1:5984/attachtest/DOCUMENT-ID/attachment\?rev\=DOCUMENT-REVISION-1
The last command's output shows that revision 2 has been created, so the document was indeed updated. One can check the database size now, which should be around 10000000 (10M). Again, look for data_size in the following command's output:
curl -X GET http://127.0.0.1:5984/attachtest
Now, get the document back from the DB. It will then be used for the update. It is important that it contains:
the _rev of the document, to be able to update it
the attachment stub, to denote that the attachment should not be deleted but kept intact
curl -o document.json -X GET http://127.0.0.1:5984/attachtest/DOCUMENT-ID
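For orientation, the fetched document.json should look roughly like the following (the digest, length and revision values are illustrative, yours will differ; the attachment name "attachment" comes from the upload URL above). The _attachments entry is the stub that must stay in place:
{
  "_id": "DOCUMENT-ID",
  "_rev": "DOCUMENT-REVISION-2",
  "hello": "world",
  "_attachments": {
    "attachment": {
      "content_type": "application/octet-stream",
      "revpos": 2,
      "digest": "md5-...",
      "length": 10485760,
      "stub": true
    }
  }
}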
Update the document content without changing the attachment itself (keeping the stub there). Here this will simply change one attribute value.
sed -i 's/world/there/' document.json
and update document in the DB
curl -X PUT -d @document.json -H "Content-Type: application/json" http://127.0.0.1:5984/attachtest/DOCUMENT-ID
The last command's output shows that revision 3 has been created, so we know the document was indeed updated.
Finally, we can verify the database size! The expected data_size is still around 10000000 (10M), not 20M:
curl -X GET http://127.0.0.1:5984/attachtest
And this should work fine. For example, on my machine it gives:
{"db_name":"attachtest","doc_count":1,"doc_del_count":0,"update_seq":8,"purge_seq":0,"compact_running":false,"disk_size":10535013,"data_size":10493008,"instance_start_time":"1413447977100793","disk_format_version":6,"committed_update_seq":8}
So, still 10M.
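If jq happens to be installed, the data_size check at each step can be shortened to a one-liner:
curl -s -X GET http://127.0.0.1:5984/attachtest | jq .data_size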
It means that each time you're modifying the document (not the attachment, just one field of the document), it will duplicate the 100 MB attachment.
In my testing I found the opposite - the same attachment is linked through multiple revisions of the same document with no loss of space.
Please can you retest to be certain of this behaviour?

Is it possible to read only first N bytes from the HTTP server using Linux command?

Here is the question: given the URL http://www.example.com, can we read the first N bytes of the page?
Using wget, we can download the whole page.
Using curl, there is -r; "0-499" specifies the first 500 bytes. That seems to solve the problem.
You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.
Using urllib in Python: there is a similar question here, but according to Konstantin's comment, is that really true?
Last time I tried this technique it failed because it was actually impossible to read from the HTTP server only specified amount of data, i.e. you implicitly read all HTTP response and only then read first N bytes out of it. So at the end you ended up downloading the whole 1Gb malicious response.
So the problem is: how can we read the first N bytes from an HTTP server in practice?
Regards & Thanks
You can do it natively with the following curl command (no need to download the whole document). According to the curl man page:
RANGES
HTTP 1.1 introduced byte-ranges. Using this, a client can request to get only one or more subparts of a specified document. curl supports this with the -r flag.
Get the first 100 bytes of a document:
curl -r 0-99 http://www.get.this/
Get the last 500 bytes of a document:
curl -r -500 http://www.get.this/
curl also supports simple ranges for FTP files. There you can only specify the start and stop positions.
Get the first 100 bytes of a document using FTP:
curl -r 0-99 ftp://www.get.this/README
It works for me even with a Java web app deployed to GigaSpaces.
curl <url> | head -c 499
or
curl <url> | dd bs=1 count=499
should do
There are also simpler utilities with perhaps broader availability, like
netcat host 80 <<"HERE" | dd bs=1 count=499 of=output.fragment
GET /urlpath/query?string=more&bloddy=stuff
HERE
You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.
You will have to fetch the whole response anyway, so you can get it with curl and pipe it to head, for example.
head
-c, --bytes=[-]N
print the first N bytes of each file; with the leading '-', print all but the last N bytes of each file
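Whether a particular server honours ranges at all (the caveat quoted above) is easy to check: request a range and look at the status code, since 206 Partial Content means the range was served and 200 means the whole document came back. A sketch:
curl -s -o /dev/null -w '%{http_code}\n' -r 0-99 http://www.example.com/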
I came here looking for a way to measure the server's processing time, which I thought I could do by telling curl to stop downloading after 1 byte or something.
For me, the better solution turned out to be a HEAD request, since this usually lets the server process the request as normal but does not return any response body:
time curl --head <URL>
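Another option for timing (a sketch using curl's standard write-out variables) is to discard the body and print curl's own timers:
curl -s -o /dev/null -w 'time_starttransfer: %{time_starttransfer}s  time_total: %{time_total}s\n' <URL>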
Make a socket connection. Read the bytes you want. Close, and you're done.
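In bash that idea can be sketched with the built-in /dev/tcp pseudo-device (plain HTTP on port 80 is assumed; for HTTPS you would need a real TLS client):
# Open a TCP connection on file descriptor 3, send a minimal request,
# read only the first 500 bytes of the response (headers included), then close.
exec 3<>/dev/tcp/www.example.com/80
printf 'GET / HTTP/1.0\r\nHost: www.example.com\r\nConnection: close\r\n\r\n' >&3
head -c 500 <&3
exec 3<&-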
