Algolia search python client to upload indexes behind a firewall

Algolia search python client to upload indexes behind a firewall - python-3.x

Building a search with Algolia search product for our Moodle site.
To update the index at Algolia site we want to use some automated process (upload the index data).
I decided to start with the Python client (Python v 3.9.x). Since we are working inside a corporate network we are behind the firewalls and there is a problem accessing algolia's servers.
I'm testing with two methods:
Search within already existing indices: index.search
update all indexes: index.replace_all_objects
Error message: 'Unreachable hosts'
Here is my code snippets:
# Search:
searchAppID = constants.ALGOLIA_APP_ID
searchKeyHash = constants.ALGOLIA_SEARCH_KEY
config = SearchConfig(searchAppID, searchKeyHash)
config.connect_timeout = 2
config.read_timeout = 5
config.write_timeout = 30
client = SearchClient.create_with_config(config)
index = client.init_index('myIndex')
result = index.search('SearchString')
# function to update index using json file:
def replace_IdxObj(client,index,fileName):
try:
if (index.exists()):
with open(fileName, encoding="utf8") as f:
data = json.load(f)
result = index.replace_all_objects(data, { 'autoGenerateObjectIDIfNotExist' : True })
return result
else:
print('No Index found! Cancel operation... \n')
return None
except Exception as err:
print ("Error: " + str(err))
When I test my code outside of the company's network - it's all ok!
Also I have a SSL certificate that could be used to work with external resources, so I used it in the following test (running from company's network behind the firewall):
response = requests.get('https://google.com/', verify=("C:\\Program Files\\Common Files\\SSL\\certs\\cert.cer"))
print(response)
This test works just fine (I'm getting 200 response)!
Please advise are there any ways to make Python process (which is used by algolia's python client) to use the certificate that I have ??? Any alternatives?
Many thanks!

Related

400 Caller's project doesn't match parent project

I have this block of code that basically translates text from one language to another using the cloud translate API. The problem is that this code always throws the error: "Caller's project doesn't match parent project". What could be the problem?
translation_separator = "translated_text: "
language_separator = "detected_language_code: "
translate_client = translate.TranslationServiceClient()
# parent = translate_client.location_path(
# self.translate_project_id, self.translate_location
# )
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = (
os.getcwd()
+ "/translator_credentials.json"
)
# Text can also be a sequence of strings, in which case this method
# will return a sequence of results for each text.
try:
result = str(
translate_client.translate_text(
request={
"contents": [text],
"target_language_code": self.target_language_code,
"parent": f'projects/{self.translate_project_id}/'
f'locations/{self.translate_location}',
"model": self.translate_model
}
)
)
print(result)
except Exception as e:
print("error here>>>>>", e)

Your issue seems to be related to the authentication method that you are using on your application, please follow the guide for authention methods with the translate API. If you are trying to pass the credentials using code, you can explicitly point to your service account file in code with:
def explicit():
from google.cloud import storage
# Explicitly use service account credentials by specifying the private key
# file.
storage_client = storage.Client.from_service_account_json(
'service_account.json')
Also, there is a codelab for getting started with the translation API with Python, this is a great step by step getting started guide for running the translate API with Python.
If the issue persists, you can try creating a Public Issue Tracker for Google Support

Getting owner of file from smb share, by using python on linux

I need to find out for a script I'm writing who is the true owner of a file in an smb share (mounted using mount -t cifs of course on my server and using net use through windows machines).
Turns out it is a real challenge finding this information out using python on a linux server.
I tried using many many smb libraries (such as smbprotocol, smbclient and others), nothing worked.
I find few solutions for windows, they all use pywin32 or another windows specific package.
And I also managed to do it from bash using smbcalcs but couldn't do it cleanly but using subprocess.popen('smbcacls')..
Any idea on how to solve it?

This was unbelievably not a trivial task, and unfortunately the answer isn't simple as I hoped it would be..
I'm posting this answer if someone will be stuck with this same problem in the future, but hope maybe someone would post a better solution earlier
In order to find the owner I used this library with its examples:
from smb.SMBConnection import SMBConnection
conn = SMBConnection(username='<username>', password='<password>', domain=<domain>', my_name='<some pc name>', remote_name='<server name>')
conn.connect('<server name>')
sec_att = conn.getSecurity('<share name>', r'\some\file\path')
owner_sid = sec_att.owner
The problem is that pysmb package will only give you the owner's SID and not his name.
In order to get his name you need to make an ldap query like in this answer (reposting the code):
from ldap3 import Server, Connection, ALL
from ldap3.utils.conv import escape_bytes
s = Server('my_server', get_info=ALL)
c = Connection(s, 'my_user', 'my_password')
c.bind()
binary_sid = b'....' # your sid must be in binary format
c.search('my_base', '(objectsid=' + escape_bytes(binary_sid) + ')', attributes=['objectsid', 'samaccountname'])
print(c.entries)
But of course nothing will be easy, it took me hours to find a way to convert a string SID to binary SID in python, and in the end this solved it:
# posting the needed functions and omitting the class part
def byte(strsid):
'''
Convert a SID into bytes
strdsid - SID to convert into bytes
'''
sid = str.split(strsid, '-')
ret = bytearray()
sid.remove('S')
for i in range(len(sid)):
sid[i] = int(sid[i])
sid.insert(1, len(sid)-2)
ret += longToByte(sid[0], size=1)
ret += longToByte(sid[1], size=1)
ret += longToByte(sid[2], False, 6)
for i in range(3, len(sid)):
ret += cls.longToByte(sid[i])
return ret
def byteToLong(byte, little_endian=True):
'''
Convert bytes into a Python integer
byte - bytes to convert
little_endian - True (default) or False for little or big endian
'''
if len(byte) > 8:
raise Exception('Bytes too long. Needs to be <= 8 or 64bit')
else:
if little_endian:
a = byte.ljust(8, b'\x00')
return struct.unpack('<q', a)[0]
else:
a = byte.rjust(8, b'\x00')
return struct.unpack('>q', a)[0]
... AND finally you have the full solution! enjoy :(

I'm adding this answer to let you know of the option of using smbprotocol; as well as expand in case of misunderstood terminology.
SMBProtocol Owner Info
It is possible to get the SID using the smbprotocol library as well (just like with the pysmb library).
This was brought up in the github issues section of the smbprotocol repo, along with an example of how to do it. The example provided is fantastic and works perfectly. An extremely stripped down version
However, this also just retrieves a SID and will need a secondary library to perform a lookup.
Here's a function to get the owner SID (just wrapped what's in the gist in a function. Including here in case the gist is deleted or lost for any reason).
import smbclient
from ldap3 import Server, Connection, ALL,NTLM,SUBTREE
def getFileOwner(smb: smbclient, conn: Connection, filePath: str):
from smbprotocol.file_info import InfoType
from smbprotocol.open import FilePipePrinterAccessMask,SMB2QueryInfoRequest, SMB2QueryInfoResponse
from smbprotocol.security_descriptor import SMB2CreateSDBuffer
class SecurityInfo:
# 100% just pulled from gist example
Owner = 0x00000001
Group = 0x00000002
Dacl = 0x00000004
Sacl = 0x00000008
Label = 0x00000010
Attribute = 0x00000020
Scope = 0x00000040
Backup = 0x00010000
def guid2hex(text_sid):
"""convert the text string SID to a hex encoded string"""
s = ['\\{:02X}'.format(ord(x)) for x in text_sid]
return ''.join(s)
def get_sd(fd, info):
""" Get the Security Descriptor for the opened file. """
query_req = SMB2QueryInfoRequest()
query_req['info_type'] = InfoType.SMB2_0_INFO_SECURITY
query_req['output_buffer_length'] = 65535
query_req['additional_information'] = info
query_req['file_id'] = fd.file_id
req = fd.connection.send(query_req, sid=fd.tree_connect.session.session_id, tid=fd.tree_connect.tree_connect_id)
resp = fd.connection.receive(req)
query_resp = SMB2QueryInfoResponse()
query_resp.unpack(resp['data'].get_value())
security_descriptor = SMB2CreateSDBuffer()
security_descriptor.unpack(query_resp['buffer'].get_value())
return security_descriptor
with smbclient.open_file(filePath, mode='rb', buffering=0,
desired_access=FilePipePrinterAccessMask.READ_CONTROL) as fd:
sd = get_sd(fd.fd, SecurityInfo.Owner | SecurityInfo.Dacl)
# returns SID
_sid = sd.get_owner()
try:
# Don't forget to convert the SID string-like object to a string
# or you get an error related to "0" not existing
sid = guid2hex(str(_sid))
except:
print(f"Failed to convert SID {_sid} to HEX")
raise
conn.search('DC=dell,DC=com',f"(&(objectSid={sid}))",SUBTREE)
# Will return an empty array if no results are found
return [res['dn'].split(",")[0].replace("CN=","") for res in conn.response if 'dn' in res]
to use:
# Client config is required if on linux, not if running on windows
smbclient.ClientConfig(username=username, password=password)
# Setup LDAP session
server = Server('mydomain.com',get_info=ALL,use_ssl = True)
# you can turn off raise_exceptions, or leave it out of the ldap connection
# but I prefer to know when there are issues vs. silently failing
conn = Connection(server, user="domain\username", password=password, raise_exceptions=True,authentication=NTLM)
conn.start_tls()
conn.open()
conn.bind()
# Run the check
fileCheck = r"\\shareserver.server.com\someNetworkShare\someFile.txt"
owner = getFileOwner(smbclient, conn, fileCheck)
# Unbind ldap session
# I'm not clear if this is 100% required, I don't THINK so
# but better safe than sorry
conn.unbind()
# Print results
print(owner)
Now, this isn't super efficient. It takes 6 seconds for me to run this one a SINGLE file. So if you wanted to run some kind of ownership scan, then you probably want to just write the program in C++ or some other low-level language instead of trying to use python. But for something quick and dirty this does work. You could also setup a threading pool and run batches. The piece that takes longest is connecting to the file itself, not running the ldap query, so if you can find a more efficient way to do that you'll be golden.
Terminology Warning, Owner != Creator/Author
Last note on this. Owner != File Author. Many domain environments, and in particular SMB shares, automatically alter ownership from the creator to a group. In my case the results of the above is:
What I was actually looking for was the creator of the file. File creator and modifier aren't attributes which windows keeps track of by default. An administrator can enable policies to audit file changes in a share, or auditing can be enabled on a file-by-file basis using the Security->Advanced->Auditing functionality for an individual file (which does nothing to help you determine the creator).
That being said, some applications store that information for themselves. For example, if you're looking for Excel this answer provides a method for which to get the creator of any xls or xlsx files (doesn't work for xlsb due to the binary nature of the files). Unfortunately few files store this kind of information. In my case I was hoping to get that info for tblu, pbix, and other reporting type files. However, they don't contain this information (which is good from a privacy perspective).
So in case anyone finds this answer trying to solve the same kind of thing I did - Your best bet (to get actual authorship information) is to work with your domain IT administrators to get auditing setup.

How to do transactions using on google cloud datastore using google-cloud-ndb library in python3

Hi I am developing an application in python3 using the flask frame work that is going to run on appengine standard that uses cloud datastore for persistence
I want to perform transactions
so I tried the following i
#ndb.transactional()
def update_user(req_data):
print("running for req")
print(req_data)
query = TestUser.query(ndb.AND(TestUser.age=="1"))
with client.context():
result = query.get()
if result.name == "the one":
print("not writing")
return
else:
print(result.name+ " is not equal to 'the one'")
print(result.name)
result.name = req_data["name"]
result.put()
print("transaction ended")
#app.route('/test_req',methods=['POST'])
def test_req_handler():
req_data = request.get_json()
update_user(req_data)
print(req_data)
return "ok"
In the local development environment when I hit the handler /test_req I am getting the following error
\lib\site-packages\google\cloud\ndb\context.py", line 72, in get_context
raise exceptions.ContextError()
google.cloud.ndb.exceptions.ContextError: No current context. NDB calls must be made in context established by google.cloud.ndb.Client.context.
when I remove the #ndb.transactional() decorator entities gets updated and there is no error

In the original Cloud Datastore the only queries allowed in transactions are ancestor queries. The emulator is still enforcing this restriction. This change is noted in https://cloud.google.com/datastore/docs/upgrade-to-firestore .

i want to list all Resource in organisations using python function in google cloud

I went through the official docs of google cloud but I don't have an idea how to use these to list resources of specific organization by providing the organization id
organizations = CloudResourceManager.Organizations.Search()
projects = emptyList()
parentsToList = queueOf(organizations)
while (parent = parentsToList.pop()) {
// NOTE: Don't forget to iterate over paginated results.
// TODO: handle PERMISSION_DENIED appropriately.
projects.addAll(CloudResourceManager.Projects.List(
"parent.type:" + parent.type + " parent.id:" + parent.id))
parentsToList.addAll(CloudResourceManager.Folders.List(parent))
}
organizations = CloudResourceManager.Organizations.Search()
projects = emptyList()
parentsToList = queueOf(organizations)
while (parent = parentsToList.pop()) {
// NOTE: Don't forget to iterate over paginated results.
// TODO: handle PERMISSION_DENIED appropriately.
projects.addAll(CloudResourceManager.Projects.List(
"parent.type:" + parent.type + " parent.id:" + parent.id))
parentsToList.addAll(CloudResourceManager.Folders.List(parent))
}

You can use Cloud Asset Inventory for this. I wrote this code for performing a sink in BigQuery.
import os
from google.cloud import asset_v1
from google.cloud.asset_v1.proto import asset_service_pb2
def asset_to_bq(request):
client = asset_v1.AssetServiceClient()
parent = 'organizations/{}'.format(os.getEnv('ORGANIZATION_ID'))
output_config = asset_service_pb2.OutputConfig()
output_config.bigquery_destination.dataset = 'projects/{}}/datasets/{}'.format(os.getEnv('PROJECT_ID'),
os.getEnv('DATASET'))
output_config.bigquery_destination.table = 'asset_export'
output_config.bigquery_destination.force = True
response = client.export_assets(parent, output_config)
# For waiting the finish
# response.result()
# Do stuff after export
return "done", 200
if __name__ == "__main__":
asset_to_bq('')
Be careful is you use it, the sink must be done in an empty/not existing table or set the force to true.
In my case, some minutes after the Cloud Scheduler that trigger my function and extract the data to BigQuery, I have a Scheduled Query into BigQuery that copy the data to another table, for keeping the history.
Note: It's also possible to configure an extract in Cloud Storage if you prefer.
I hope that is a starting point for you and for achieving what do you want to do.

I am able to list the project but I also want to list the folder and resources under folder and folder.name and tags and i also want to specify the organization id to resources information from a specific organization
import os
from google.cloud import resource_manager
def export_resource (organizations):
client = resource_manager.Client()
for project in client.list_projects():
print("%s, %s" % (project.project_id, project.status))

Jenkins Python API returns HTML

I'm trying to write a Python script to talk to my instance of Jenkins. I am using the newest version of the jenkinsapi module and querying Jenkins 1.509.3.
I can get a job list like follows:
l=j.get_jobs_list()
where j is an instance of jenkinsapi.Jenkins (I used the requester from jenkinsapi.utils.requester to skip ssl verification)
However, when I try to get more information on an individual job with
j.get_job(l[0])
it fails with this error: Inappropriate content found at [some_address] and what is returned is a bunch of HTML (that looks like the starting page for my instance, the one you see when you log in) instead of anything that should look like the response. Pasting [some_address] into the browser gives me what I expect as a response.
While I can get some information on the Jenkins instance, what I am really interested in is info on individual jobs. Any ideas how to fix it and get the job info?

Using python 3.6, python-jenkins 1.0.1 and Jenkins 2.121.1, following works nicely:
import pprint
import jenkins
IP = 'localhost'
USERNAME = 'my_username'
PW = 'my_password'
def get_version(server):
user = server.get_whoami()
version = server.get_version()
print('Hello %s from Jenkins %s' % (user['fullName'], version))
def get_jobs(server):
jobs = server.get_jobs() # List[dict]
print("Here are top 5 jobs")
pprint(jobs[:5])
return jobs
def get_job(server, job_name):
job_config = server.get_job_config(job_name) # XML
job_info = server.get_job_info(job_name) # dict
print("\n --- JOB CONFIG --- ")
print(job_config)
print("\n --- JOB INFO --- ")
pprint(job_info)
if __name__ == "__main__":
_server = jenkins.Jenkins(IP, username=USERNAME, password=PW)
get_version(_server)
_jobs = get_jobs(_server)
get_job(_server, _jobs[0]['name'])
Jenkins API I was using is documented here: https://python-jenkins.readthedocs.io/en/latest/index.html

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Algolia search python client to upload indexes behind a firewall - python-3.x

Related

400 Caller's project doesn't match parent project

Getting owner of file from smb share, by using python on linux

How to do transactions using on google cloud datastore using google-cloud-ndb library in python3

i want to list all Resource in organisations using python function in google cloud

Jenkins Python API returns HTML

Categories

Resources