Elasticsearch-py bulk helper equivalent of curl with file - python-3.x

I am looking to replicate the following command using the elasticsearch python client (and without using subprocess):
curl -s -XPOST "localhost:9200/index_name/_bulk" --data-binary @file
I have attempted to use the bulk helper without any luck:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()
with open("file") as fp:
    bulk(
        client=es,
        index="index_name",
        actions=fp
    )
This fails with "type is missing" errors.
The file, which is processed just fine when using curl, looks a bit like this:
{"index":{"_type":"someType","_id":"123"}}
{"field1":"data","field2":"data",...}
{"index":{"_type":"someType","_id":"456"}}
{"field1":"data","field2":"data",...}
...
Please note, I'd rather not change the contents of the file, since I have around 21,000 files in the same format.

The actions parameter must be given an iterable of actions rather than a raw file handle, so wrap the file in a generator that yields its lines and pass the generator in, like this:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()

def readbulk():
    for line in open("file"):
        yield line

bulk(
    client=es,
    index="index_name",
    actions=readbulk()
)
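If the raw lines still trip the "type is missing" check, a more explicit variant is to pair each metadata line with the source line that follows it and yield action dicts using the metadata keys the bulk helpers document (_type, _id, _source). This is only a sketch, assuming every action line is an index action exactly like the excerpt above:

import json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

def read_actions(path):
    # Pair each action line with the source line right after it and
    # yield one action dict per document pair.
    with open(path) as fp:
        for action_line in fp:
            meta = json.loads(action_line)["index"]
            source = json.loads(next(fp))
            yield {"_type": meta["_type"], "_id": meta["_id"], "_source": source}

es = Elasticsearch()
bulk(client=es, index="index_name", actions=read_actions("file"))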

Related

Problem running Bitbucket REST API command using Python

I am building a script to update files on Bitbucket using the REST API.
My problems are:
Running the command through the subprocess library and running it directly on the command line give two different results.
If I run the command on the command line and then inspect my commits in the Bitbucket app, I can see a commit message and an issue.
If I run the command through the subprocess library, I get neither: the commit message defaults to "edited by bitbucket" and the issue is null.
This is the command:
curl -X PUT -u user:pass -F content=@conanfile_3_f62hu.py -F 'message= test 4' -F branch=develop -F sourceCommitId={} bitbucket_URL".format(latest_commit)
The other problem is that I need to pass a file as the content in order to update it.
If I pass it like above, it works. The problem is that I am generating the file content as a raw string and creating a temporary file with that content.
And when I pass the file as a variable, it does not get the content of the file.
My code:
import os
import subprocess
import tempfile

content = b'some content'
current_dir = os.getcwd()
temp_file = tempfile.NamedTemporaryFile(suffix=".py", prefix="conanfile", dir=current_dir)
temp_file.name = temp_file.name.split("\\")
temp_file.name = [x for x in temp_file.name if x.startswith("conanfile")][0]
temp_file.name = "@" + temp_file.name
temp_file.write(content)
temp_file.seek(0)
update_file_url = "curl -X PUT -u user:pass -F content={} -F 'message=test 4' -F branch=develop -F sourceCommitId={} bitbucket_url".format(temp_file.name, latest_commit)
subprocess.run(update_file_url)
Basically I'm passing the file like before, just passing the name to content, but it does not work.
If I print the command, everything looks right, so I don't know why neither the commit message nor the file content gets set.
Update:
I was able to pass the file; my mistake was that I was not passing it as temp_file.name.
But I could not solve the problem with the message.
What I found is that the message only takes the first word; if there is a space followed by another word, everything after the space is ignored.
The space is causing the problem.
I found the solution: if anyone runs into this problem, put a \ before the quote after message=.
Example: '-F message=\" Updated with latest dependencies"'
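For what it's worth, these quoting problems go away entirely if the command is given to subprocess.run as a list of arguments instead of one formatted string, because no shell ever re-splits the message on spaces. A sketch, reusing the placeholders from the question (user:pass, bitbucket_URL, latest_commit):

import subprocess

# Each list element reaches curl as exactly one argument, so the space
# in the commit message needs no shell quoting at all.
subprocess.run([
    "curl", "-X", "PUT", "-u", "user:pass",
    "-F", "content=@conanfile_3_f62hu.py",   # @ tells curl to read the file
    "-F", "message=test 4",                  # the space survives intact
    "-F", "branch=develop",
    "-F", "sourceCommitId={}".format(latest_commit),  # from earlier in the script
    "bitbucket_URL",
])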

Execute a subprocess that takes an input file and writes the output to a file

I am using a third-party C++ program to generate intermediate results for the Python program that I am working on. The terminal command that I use looks as follows, and it works fine:
./ukb/src/ukb_wsd --ppr_w2w -K ukb/scripts/wn30g.bin -D ukb/scripts/wn30_dict.txt ../data/glass_ukb_input2.txt > ../data/glass_ukb_output2w2.txt
If I break it down into smaller pieces:
./ukb/src/ukb_wsd - executable program
--ppr_w2w - one of the options/switches
-K ukb/scripts/wn30g.bin - parameter K indicates that the next item is a file (network file)
-D ukb/scripts/wn30_dict.txt - parameter D indicates that the next item is a file (dictionary file)
../data/glass_ukb_input2.txt - input file
> - shell command to write the output to a file
../data/glass_ukb_output2w2.txt - output file
The above works fine for one instance. I am trying to do this for around 70,000 items (input files), so I found a way to do it with the subprocess module in Python. The body of the Python function that I created looks like this:
import subprocess

with open('../data/glass_ukb_input2.txt', 'r') as input, open('../data/glass_ukb_output2w2w_subproc.txt', 'w') as output:
    subprocess.run(['./ukb/src/ukb_wsd', '--ppr_w2w', '-K', 'ukb/scripts/wn30g.bin', '-D', 'ukb/scripts/wn30_dict.txt'],
                   stdin=input,
                   stdout=output)
When I execute the function, it gives an error as follows (this error is no longer there, see the EDIT below):
...
STDOUT = subprocess.STDOUT
AttributeError: module 'subprocess' has no attribute 'STDOUT'
Can anyone shed some light on solving this problem?
EDIT
The error was due to a file named subprocess.py in the source directory, which shadowed Python's subprocess module. Once it was removed, the error went away.
But the program could not identify the input file given via stdin. I am thinking it has to do with there being 3 input files. Is there a way to provide more than one input file?
EDIT 2
This problem is now solved with the following approach:
subprocess.run('./ukb/src/ukb_wsd --ppr_w2w -K ukb/scripts/wn30g.bin -D ukb/scripts/wn30_dict.txt ../data/glass_ukb_input2.txt > ../data/glass_ukb_output2w2w_subproc.txt',shell=True)
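An equivalent that avoids shell=True is to pass the input file as a positional argument, which also suggests why stdin was ignored: ukb_wsd appears to read the file named on its command line rather than standard input. A sketch under that assumption:

import subprocess

# The input file goes in the argument list; only stdout is redirected.
with open('../data/glass_ukb_output2w2w_subproc.txt', 'w') as output:
    subprocess.run(
        ['./ukb/src/ukb_wsd', '--ppr_w2w',
         '-K', 'ukb/scripts/wn30g.bin',
         '-D', 'ukb/scripts/wn30_dict.txt',
         '../data/glass_ukb_input2.txt'],
        stdout=output,
    )

Looping this over the ~70,000 input files is then just a matter of substituting each input/output path pair.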

How to pass string via STDIN into terminal command being executed within python script?

I need to generate a postgres schema from a dataframe. I found the csvkit library to come closest to matching the datatypes. I can run csvkit and generate a postgres schema from a csv on my desktop via the terminal through this command found in the docs:
csvsql -i postgresql myFile.csv
csvkit docs - https://csvkit.readthedocs.io/en/stable/scripts/csvsql.html
And I can run the terminal command in my script via this code:
import os
a = os.popen("csvsql -i postgresql Desktop/myFile.csv").read()
However, I have a dataframe that I have converted to a csv string, and I need to generate the schema from the string like so:
csvstr = df.to_csv()
In the docs it says that under positional arguments:
The CSV file(s) to operate on. If omitted, will accept
input on STDIN
How do I pass my variable csvstr into the command a = os.popen("csvsql -i postgresql csvstr").read()?
I tried the line of code below but got an error, OSError: [Errno 7] Argument list too long: '/bin/sh':
a=os.popen("csvsql -i postgresql {}".format(csvstr)).read()
Thank you in advance
You can't pass such a big string via the command line! You have to save the data to a file and pass its path to csvsql.
csvstr = df.to_csv()
with open('my_cool_df.csv', 'w', newline='') as csvfile:
    # df.to_csv() already returns fully formatted CSV text,
    # so it can be written out as-is.
    csvfile.write(csvstr)
And later:
a = os.popen("csvsql -i postgresql my_cool_df.csv").read()
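Alternatively, since the docs quoted above say csvsql accepts input on STDIN when the file argument is omitted, the string can be piped in directly and the temporary file skipped. A sketch using subprocess (Python 3.7+ for capture_output/text):

import subprocess

# Feed the CSV string to csvsql on stdin and capture the generated schema.
result = subprocess.run(
    ["csvsql", "-i", "postgresql"],
    input=csvstr,
    capture_output=True,
    text=True,
)
a = result.stdout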

Downloading a json file referenced by another json file with curl

I have a json file with the structure seen below:
{
url: "https://mysite.com/myjsonfile",
version_number: 69,
}
This json file is accessed from mysite.com/myrootjsonfile
I want to run a data-loading script that accesses mysite.com/myrootjsonfile, loads the json content from the url field using curl, and saves the resulting content to local storage.
This is my attempt so far.
curl -o assets/content.json 'https://mysite.com/myrootjsonfile' | grep -Po '(?<="url": ")[^"]*'
Unfortunately, instead of saving the content from mysite.com/myjsonfile, it's saving the content from the root file above, mysite.com/myrootjsonfile. Can anyone point out what I might be doing wrong? Bear in mind I'm completely new to curl. Thanks!
It is saving the content from myrootjsonfile because that is what you are telling curl to do: save that file to the location assets/content.json, then grep stdin, which is empty. You need two curl commands, one to download the root file (and process it to find the URL of the second), and a second to download the actual content you want. You can use command substitution for this:
my_url=$(curl https://mysite.com/myrootjsonfile | grep -Po '(?<=url: )[^,]*')
curl -o assets/content.json "$my_url"
I also changed the grep regex: this one matches a string of non-comma characters that follows "url: ".
Assuming you wished to save the file to assets/content.json, note that flags are case sensitive.
Use -o instead of -O to redirect the output to assets/content.json.
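If you would rather not parse JSON with grep at all, the same two-step fetch is straightforward in Python. A sketch, assuming the root document is valid JSON (i.e. the keys are actually quoted, unlike the excerpt above):

import json
import urllib.request

# Step 1: fetch the root file and read its url field.
with urllib.request.urlopen("https://mysite.com/myrootjsonfile") as resp:
    root = json.load(resp)

# Step 2: download the real content to local storage.
urllib.request.urlretrieve(root["url"], "assets/content.json")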

How can I send a file's contents as a POST parameter using cURL?

I'm trying to use cURL to POST the contents of a file, as if I'd pasted that content into an HTML textarea. That is to say, I don't want to upload the file; I just want a POST parameter called foo to be filled with the text from a file called bar.txt. bar.txt's contents may include newlines, quotes, and so on.
Is this possible?
Thanks.
Edit: I found out how to do it in the end:
curl --data-urlencode "foo@bar.txt" http://example.com/index.php
This will take the contents of the file bar.txt, URL-encode it, and place the resulting string in a parameter called foo in a POST request to http://example.com/index.php.
I can't speak to whether the solutions others have suggested will work or not, but the one above seems like the best way.
You can, by doing something like:
$ curl --data "foo=$(cat foo.txt)" http://localhost/yourfile.php
Note that you'll probably want to encode the file, as cacheguard said. To encode it in base64, just modify the previous command like this:
$ curl --data "foo=$(cat foo.txt | base64)" http://localhost/yourfile.php
You should encode/decode the content of your file (for instance by using the base64 command under Linux).
file foo.txt:
8<----------------------------
Hello World
I am a Secure Web Gateway
8<----------------------------
base64 foo.txt | base64 -d
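For anyone driving this from Python instead of the shell, the same effect (form-encoding the file's text into a foo parameter) can be had with the third-party requests library. A sketch, using the question's placeholder URL and filename:

import requests

# Send the file's text as an ordinary form field named "foo";
# requests URL-encodes newlines, quotes, etc. automatically.
with open("bar.txt") as f:
    resp = requests.post("http://example.com/index.php", data={"foo": f.read()})
print(resp.status_code)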
