Elasticsearch via Python gives wrong count - python-3.x

I’m new to Python and I need to connect to “Kibana” via Python. We’re using Kibana 7.4.1. The requirement is to get just the count (hits).
Due to some restrictions, I need to use Python 3.6 only. I’ve added the “elasticsearch” and “elasticsearch-dsl” libraries.
I’m able to connect via the client, but I’m getting the wrong hits count.
Code:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import MultiSearch, Search
from elasticsearch_dsl.query import QueryString, Range, SimpleQueryString
es = Elasticsearch(['host2', 'host2'], http_auth=('usr', 'pass'), port=9200)
s = Search(using=es, index='c*')
s.filter(SimpleQueryString(query="tags:prod AND severity:INFO AND service: finder AND msg:* is processed"))
s.filter(Range(** {'#timestamp': {'gte': 'now-5m', 'lt': 'now'}}))
response = s.execute()
print("Got %d Hits:" % response['hits']['total']['value']) # Always coming as 1000 so this is wrong
Can I get some help with this, please?

First of all, a little clarification: you are connecting to Elasticsearch, not Kibana (Kibana is a client, like the program you are writing).
You always receive 10000 because your index has more than 10000 hits. This is a documented behaviour: since computing the exact count is expensive in the general case, it is only performed when explicitly requested. To obtain the right number of results you have two options:
set the query parameter track_total_hits to true
use the count API.
track_total_hits
You can add this extra parameter to the search object as reported here as follows:
s = Search(using=es, index='c*')
s = s.extra(track_total_hits=True)
<the-rest of your code>
Count API approach
Instead of invoking the execute() function, you can simply use the count() function:
s = Search(using=es, index='c*')
s = s.filter(SimpleQueryString(query="tags:prod AND severity:INFO AND service: finder AND msg:* is processed"))
s = s.filter(Range(**{'#timestamp': {'gte': 'now-5m', 'lt': 'now'}}))
response = s.count()
print("Got %d Hits:" % response)
Kind regards

Related

Need to set JMS_IBM_LAST_MSG_IN_GROUP property to true for IBM MQ testing using JMeter

I am testing IBM MQ using JMeter and am able to establish the connection with the queue to send requests over it. However, I need to set the "JMS_IBM_LAST_MSG_IN_GROUP" property to true for one of the messages but am unable to do so. I am using the piece of code below while sending the request and trying to set the property to true, but it remains at its default value (false) when I check in the backend. Any clue what I am missing here?
Note: the connection is established in another sampler and reused here. This code works fine for sending any request; it's just that the property is not getting set to true.
import java.time.Instant
import com.ibm.msg.client.jms.JmsConstants
def sess = System.getProperties().get("Session")
def destination = System.getProperties().get("Destination")
def producer = sess.createProducer(destination)
def rnd = new Random(System.currentTimeMillis())
def payload = String.format('${groupid}|${sequencenumber}|rest of the payload|')
def msg = sess.createTextMessage(payload)
println('Payload --> ' + payload)
msg.setBooleanProperty(JmsConstants.JMS_IBM_LAST_MSG_IN_GROUP,true)
def start = Instant.now()
producer.send(msg)
def stop = Instant.now()
producer.close()
SampleResult.setResponseData(msg.toString())
SampleResult.setDataType( org.apache.jmeter.samplers.SampleResult.TEXT)
SampleResult.setLatency( stop.toEpochMilli() - start.toEpochMilli())
Your code does not include anything to set the Group ID or sequence number. Assuming all the relevant code is shown, I think you are missing something along these lines:
msg.setStringProperty("JMSXGroupID", groupid);
msg.setIntProperty("JMSXGroupSeq", sequencenumber);
As per the JMS_IBM_LAST_MSG_IN_GROUP chapter of the IBM MQ documentation:
This property is ignored in the publish/subscribe domain and is not relevant when an application connects to a service integration bus.
In general it's not necessary to use this property; you can come up with your own custom one, e.g. for the producer:
msg.setBooleanProperty("X_CUSTOM_PROPERTY_LAST_MESSAGE",true)
and for the consumer:
msg.getBooleanProperty("X_CUSTOM_PROPERTY_LAST_MESSAGE")
More information: IBM MQ testing with JMeter - Learn How
Sharing in case it helps someone else: I was able to set the property to true by making the changes below; the rest of the code is the same as in the original question.
import com.ibm.msg.client.wmq.WMQConstants
def gid=String.format('${groupid}')
def msg = sess.createTextMessage()
println('Payload --> ' + payload)
msg.setStringProperty(WMQConstants.JMSX_GROUPID,gid)
msg.setBooleanProperty(WMQConstants.JMS_IBM_LAST_MSG_IN_GROUP,true)
msg.text=payload

Algolia search python client to upload indexes behind a firewall

Building a search with the Algolia search product for our Moodle site.
To update the index at the Algolia site we want to use some automated process (upload the index data).
I decided to start with the Python client (Python 3.9.x). Since we are working inside a corporate network, we are behind firewalls and there is a problem accessing Algolia's servers.
I'm testing two methods:
Search within already existing indices: index.search
Update all indices: index.replace_all_objects
Error message: 'Unreachable hosts'
Here is my code snippets:
# Search:
searchAppID = constants.ALGOLIA_APP_ID
searchKeyHash = constants.ALGOLIA_SEARCH_KEY
config = SearchConfig(searchAppID, searchKeyHash)
config.connect_timeout = 2
config.read_timeout = 5
config.write_timeout = 30
client = SearchClient.create_with_config(config)
index = client.init_index('myIndex')
result = index.search('SearchString')
# function to update index using json file:
def replace_IdxObj(client, index, fileName):
    try:
        if index.exists():
            with open(fileName, encoding="utf8") as f:
                data = json.load(f)
                result = index.replace_all_objects(data, {'autoGenerateObjectIDIfNotExist': True})
                return result
        else:
            print('No Index found! Cancel operation... \n')
            return None
    except Exception as err:
        print("Error: " + str(err))
When I test my code outside of the company's network, it all works fine.
I also have an SSL certificate that can be used to work with external resources, so I used it in the following test (running from the company's network, behind the firewall):
response = requests.get('https://google.com/', verify=("C:\\Program Files\\Common Files\\SSL\\certs\\cert.cer"))
print(response)
This test works just fine (I'm getting a 200 response)!
Please advise: are there any ways to make the Python process (which is used by Algolia's Python client) use the certificate that I have? Any alternatives?
Many thanks!
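One avenue worth exploring (an assumption on my part, not something confirmed here): the synchronous Algolia Python client makes its HTTP calls through the requests library, and requests honours the REQUESTS_CA_BUNDLE environment variable when verifying TLS. Pointing that variable at the corporate certificate before the client is created may therefore be enough; a minimal sketch:
import os
from algoliasearch.search_client import SearchClient

# Assumption: the client's transport is requests-based, so it will pick up
# the corporate CA bundle from this environment variable.
os.environ["REQUESTS_CA_BUNDLE"] = r"C:\Program Files\Common Files\SSL\certs\cert.cer"

client = SearchClient.create(searchAppID, searchKeyHash)
index = client.init_index('myIndex')
result = index.search('SearchString')
If the firewall also requires an outbound proxy, the standard HTTPS_PROXY environment variable (also honoured by requests) can be set the same way.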

Getting owner of file from smb share, by using python on linux

I need to find out, for a script I'm writing, who the true owner of a file on an SMB share is (mounted using mount -t cifs on my server, and via net use on Windows machines).
It turns out that finding this information using Python on a Linux server is a real challenge.
I tried many SMB libraries (such as smbprotocol, smbclient and others); nothing worked.
I found a few solutions for Windows, but they all use pywin32 or another Windows-specific package.
I also managed to do it from bash using smbcacls, but not cleanly - only via subprocess.popen('smbcacls').
Any idea how to solve this?
This was unbelievably non-trivial, and unfortunately the answer isn't as simple as I hoped it would be.
I'm posting this answer in case someone gets stuck on the same problem in the future, though I hope someone will post a better solution eventually.
To find the owner I used the pysmb library, following its examples:
from smb.SMBConnection import SMBConnection
conn = SMBConnection(username='<username>', password='<password>', domain='<domain>', my_name='<some pc name>', remote_name='<server name>')
conn.connect('<server name>')
sec_att = conn.getSecurity('<share name>', r'\some\file\path')
owner_sid = sec_att.owner
The problem is that the pysmb package will only give you the owner's SID, not their name.
To get the name you need to make an LDAP query, as in this answer (reposting the code):
from ldap3 import Server, Connection, ALL
from ldap3.utils.conv import escape_bytes
s = Server('my_server', get_info=ALL)
c = Connection(s, 'my_user', 'my_password')
c.bind()
binary_sid = b'....' # your sid must be in binary format
c.search('my_base', '(objectsid=' + escape_bytes(binary_sid) + ')', attributes=['objectsid', 'samaccountname'])
print(c.entries)
But of course nothing will be easy; it took me hours to find a way to convert a string SID to a binary SID in Python, and in the end this solved it:
# posting the needed functions and omitting the class part
import struct

def byte(strsid):
    '''
    Convert a SID into bytes
    strsid - SID to convert into bytes
    '''
    sid = str.split(strsid, '-')
    ret = bytearray()
    sid.remove('S')
    for i in range(len(sid)):
        sid[i] = int(sid[i])
    sid.insert(1, len(sid) - 2)
    ret += longToByte(sid[0], size=1)
    ret += longToByte(sid[1], size=1)
    ret += longToByte(sid[2], False, 6)
    for i in range(3, len(sid)):
        ret += longToByte(sid[i])
    return ret

def byteToLong(byte, little_endian=True):
    '''
    Convert bytes into a Python integer
    byte - bytes to convert
    little_endian - True (default) or False for little or big endian
    '''
    if len(byte) > 8:
        raise Exception('Bytes too long. Needs to be <= 8 or 64bit')
    else:
        if little_endian:
            a = byte.ljust(8, b'\x00')
            return struct.unpack('<q', a)[0]
        else:
            a = byte.rjust(8, b'\x00')
            return struct.unpack('>q', a)[0]
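The longToByte() helper that byte() calls isn't included above. A minimal sketch of what it presumably looks like, mirroring byteToLong() (the exact name and defaults are assumptions based on how it is called):
def longToByte(integer, little_endian=True, size=4):
    '''
    Convert a Python integer into bytes
    integer - integer to convert
    little_endian - True (default) or False for little or big endian
    size - number of bytes to keep from the packed 8-byte value
    '''
    if little_endian:
        # keep the first `size` bytes of the little-endian packing
        return struct.pack('<q', integer)[0:size]
    else:
        # keep the last `size` bytes of the big-endian packing
        return struct.pack('>q', integer)[8 - size:]
With these three helpers, byte('S-1-5-21-...') produces the binary SID expected by escape_bytes() in the LDAP snippet above.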
... AND finally you have the full solution! enjoy :(
I'm adding this answer to let you know of the option of using smbprotocol; as well as expand in case of misunderstood terminology.
SMBProtocol Owner Info
It is possible to get the SID using the smbprotocol library as well (just like with the pysmb library).
This was brought up in the GitHub issues section of the smbprotocol repo, along with an example of how to do it. The example provided is fantastic and works perfectly; an extremely stripped-down version is used below.
However, this also just retrieves a SID, so a secondary library is still needed to perform the lookup.
Here's a function to get the owner SID (it just wraps what's in the gist in a function; I'm including it here in case the gist is deleted or lost for any reason):
import smbclient
from ldap3 import Server, Connection, ALL, NTLM, SUBTREE

def getFileOwner(smb: smbclient, conn: Connection, filePath: str):
    from smbprotocol.file_info import InfoType
    from smbprotocol.open import FilePipePrinterAccessMask, SMB2QueryInfoRequest, SMB2QueryInfoResponse
    from smbprotocol.security_descriptor import SMB2CreateSDBuffer

    class SecurityInfo:
        # 100% just pulled from gist example
        Owner = 0x00000001
        Group = 0x00000002
        Dacl = 0x00000004
        Sacl = 0x00000008
        Label = 0x00000010
        Attribute = 0x00000020
        Scope = 0x00000040
        Backup = 0x00010000

    def guid2hex(text_sid):
        """convert the text string SID to a hex encoded string"""
        s = ['\\{:02X}'.format(ord(x)) for x in text_sid]
        return ''.join(s)

    def get_sd(fd, info):
        """ Get the Security Descriptor for the opened file. """
        query_req = SMB2QueryInfoRequest()
        query_req['info_type'] = InfoType.SMB2_0_INFO_SECURITY
        query_req['output_buffer_length'] = 65535
        query_req['additional_information'] = info
        query_req['file_id'] = fd.file_id

        req = fd.connection.send(query_req, sid=fd.tree_connect.session.session_id, tid=fd.tree_connect.tree_connect_id)
        resp = fd.connection.receive(req)
        query_resp = SMB2QueryInfoResponse()
        query_resp.unpack(resp['data'].get_value())

        security_descriptor = SMB2CreateSDBuffer()
        security_descriptor.unpack(query_resp['buffer'].get_value())
        return security_descriptor

    with smbclient.open_file(filePath, mode='rb', buffering=0,
                             desired_access=FilePipePrinterAccessMask.READ_CONTROL) as fd:
        sd = get_sd(fd.fd, SecurityInfo.Owner | SecurityInfo.Dacl)
        # returns SID
        _sid = sd.get_owner()

    try:
        # Don't forget to convert the SID string-like object to a string
        # or you get an error related to "0" not existing
        sid = guid2hex(str(_sid))
    except:
        print(f"Failed to convert SID {_sid} to HEX")
        raise

    conn.search('DC=dell,DC=com', f"(&(objectSid={sid}))", SUBTREE)
    # Will return an empty array if no results are found
    return [res['dn'].split(",")[0].replace("CN=", "") for res in conn.response if 'dn' in res]
to use:
# Client config is required if on linux, not if running on windows
smbclient.ClientConfig(username=username, password=password)
# Setup LDAP session
server = Server('mydomain.com',get_info=ALL,use_ssl = True)
# you can turn off raise_exceptions, or leave it out of the ldap connection
# but I prefer to know when there are issues vs. silently failing
conn = Connection(server, user=r"domain\username", password=password, raise_exceptions=True, authentication=NTLM)
conn.start_tls()
conn.open()
conn.bind()
# Run the check
fileCheck = r"\\shareserver.server.com\someNetworkShare\someFile.txt"
owner = getFileOwner(smbclient, conn, fileCheck)
# Unbind ldap session
# I'm not clear if this is 100% required, I don't THINK so
# but better safe than sorry
conn.unbind()
# Print results
print(owner)
Now, this isn't super efficient. It takes 6 seconds for me to run this on a SINGLE file. So if you wanted to run some kind of ownership scan, you probably want to write the program in C++ or some other low-level language instead of trying to use Python. But for something quick and dirty this does work. You could also set up a threading pool and run batches (see the sketch below). The piece that takes longest is connecting to the file itself, not running the LDAP query, so if you can find a more efficient way to do that you'll be golden.
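A rough sketch of such a batching approach with concurrent.futures (illustration only, not from the original answer; it reuses getFileOwner() from above and assumes sharing a single LDAP connection across threads is acceptable for your ldap3 strategy - otherwise give each worker its own connection):
from concurrent.futures import ThreadPoolExecutor

files = [
    r"\\shareserver.server.com\someNetworkShare\fileA.txt",
    r"\\shareserver.server.com\someNetworkShare\fileB.txt",
    # ... more paths
]

# Each task opens its own SMB handle; the slow part (connecting to the file)
# now happens in parallel.
with ThreadPoolExecutor(max_workers=8) as pool:
    owners = list(pool.map(lambda path: getFileOwner(smbclient, conn, path), files))

for path, owner in zip(files, owners):
    print(path, owner)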
Terminology Warning, Owner != Creator/Author
Last note on this. Owner != File Author. Many domain environments, and in particular SMB shares, automatically alter ownership from the creator to a group. In my case the result of the above was exactly that: a group, not the person who created the file.
What I was actually looking for was the creator of the file. File creator and modifier aren't attributes Windows keeps track of by default. An administrator can enable policies to audit file changes in a share, or auditing can be enabled on a file-by-file basis using the Security->Advanced->Auditing functionality for an individual file (which does nothing to help you determine the creator).
That being said, some applications store that information themselves. For example, if you're looking at Excel, this answer provides a method to get the creator of any xls or xlsx file (it doesn't work for xlsb due to the binary nature of those files). Unfortunately, few file types store this kind of information. In my case I was hoping to get that info for tblu, pbix, and other reporting-type files; however, they don't contain this information (which is good from a privacy perspective).
So, in case anyone finds this answer trying to solve the same kind of thing I did: your best bet (to get actual authorship information) is to work with your domain IT administrators to get auditing set up.

Dump series back into InfluxDB after querying with replaced field value

Scenario
I want to send data to an MQTT broker (cloud) by querying measurements from InfluxDB.
I have a field in the schema called status. It can be either 1 or 0. status=0 indicates that the series has not been sent to the cloud. If I get an acknowledgment from the MQTT broker, I want to rewrite the point back into the database with status=1.
As mentioned in the InfluxDB FAQ on duplicate data: if a point has the same timestamp as an existing one but a different field value, the new field value overwrites the old one.
In order to test this I created the following:
CREATE DATABASE dummy
USE dummy
INSERT meas_1,type=t1 status=0,value=123 1536157064275338300
query:
SELECT * FROM meas_1
provides
time                 status  type  value
1536157064275338300  0       t1    123
now if I want to overwrite the series I do the following:
INSERT meas_1,type=t1 status=1,value=123 1536157064275338300
which will overwrite the series
time                 status  type  value
1536157064275338300  1       t1    123
(Note: this is not possible via Tags currently in InfluxDB)
Usage
Query some information using the client with "status"=0.
Restructure JSON to be sent to the cloud
Send the information to cloud
If successful then write the output from Step 1. back into the DB but with status=1.
I am using the InfluxDBClient for Python 3 to build the application (MQTT + InfluxDB).
Within the write_points API there is a parameter called batch_size which requires an int as input.
I am not sure how I can use this in my application. Can someone guide me with this, or with the schema of the DB, so that I can upload actual and non-redundant information to the cloud?
The batch_size is actually the length of the list of measurements that needs to be passed to write_points.
Steps
Create a client and query the measurement (here, we query gps information):
client = InfluxDBClient(database='dummy')
op = client.query('SELECT * FROM gps WHERE "status"=0', epoch='ns')
Make the ResultSet into a list:
batch = list(op.get_points('gps'))
Create an empty list for the updated points:
updated_batch = []
Iterate through each measurement and change the status flag to 1 (note: numeric field values in InfluxDB default to float):
for each in batch:
    new_mes = {
        'measurement': 'gps',
        'tags': {
            'type': 'gps'
        },
        'time': each['time'],
        'fields': {
            'lat': float(each['lat']),
            'lon': float(each['lon']),
            'alt': float(each['alt']),
            'status': float(1)
        }
    }
    updated_batch.append(new_mes)
Finally dump the points back via the client with batch_size as the length of the updated_batch
client.write_points(updated_batch, batch_size=len(updated_batch))
This overwrites the existing points because they carry the same timestamps, now with the status field set to 1.
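Putting the steps together, a minimal sketch of the whole round trip; publish_to_cloud() is a hypothetical stand-in for whatever MQTT publish/acknowledgement logic you use, and the gps schema matches the example above:
from influxdb import InfluxDBClient

def publish_to_cloud(points):
    """Hypothetical placeholder: restructure the points, send them over MQTT
    and return True only once the broker acknowledges them."""
    raise NotImplementedError

client = InfluxDBClient(database='dummy')

# 1. Query everything that has not been sent yet
op = client.query('SELECT * FROM gps WHERE "status"=0', epoch='ns')
batch = list(op.get_points('gps'))

# 2. Only rewrite the points once the broker has acknowledged them
if batch and publish_to_cloud(batch):
    updated_batch = [{
        'measurement': 'gps',
        'tags': {'type': 'gps'},
        'time': each['time'],
        'fields': {
            'lat': float(each['lat']),
            'lon': float(each['lon']),
            'alt': float(each['alt']),
            'status': float(1),
        },
    } for each in batch]
    # 3. Same timestamps, status flipped to 1, so the old points are overwritten
    client.write_points(updated_batch, batch_size=len(updated_batch))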

Firebase... Add/Update Firebase Using node.js Script

I have arbitrary JSON that is sensibly laid out like this:
[
  {
    "id": 100,
    "name": "Buckeye, AZ",
    "status": "OPEN",
    "address": {
      "street": "416 S Watson RD",
      "city": "Buckeye"
      ...
    }
  }
]
I've written a node.js script like this as a proof of concept (the reason I'm using Node is that the JS API seems better supported than REST or Ruby for this; I could be wrong):
http = require('http')
Firebase = require('firebase')

all_sites_url = "http://supercharge.info/service/supercharge/allSites"
firebase_url = "https://tesla-supercharger.firebaseio.com/"

http.get(all_sites_url, (res) ->
  body = ""
  res.on "data", (chunk) ->
    body += chunk
    return
  res.on "end", ->
    response = JSON.parse(body)
    all_sites = response
    send_to_firebase(response)
    return
  return
).on "error", (e) ->
  console.log "Got error: ", e
  return

send_to_firebase = (response) ->
  firebase_ref = new Firebase(firebase_url)
  for charger in response
    console.log charger
    new_child = firebase_ref.push()
    new_child.set {id: charger.id, data: charger}, (error) ->
      if error
        console.log "Data could not be saved #{error}"
      else
        console.log "Data saved successfully"
The result is a unique id generated by Firebase, which has two children: data and id. The data child has the expected information like name, status, etc.
What I'd prefer is to generate a key-value pair keyed by the id. E.g., for an id of 100:
- 100
  - name
  - address
    - street
    - city
    - etc.
So my first question is how to accomplish this, or whether it is even sensible.
After the first time around, this data (call it the data from the external server) will be there, and a mobile app will have added some fields that are not present in the server data. The next time I fetch data from the external server, I want to update things that have changed that the server would know about, like status, without tampering with things that only the mobile devices would know about, like remote_observations.
I know I'm seeming a bit dense here, but I'm trying to put together a sensible data model that will be updatable from that server using a cron job and incrementally updatable from a bunch of mobile devices.
Any help is much appreciated.
UPDATE: I have found that this works for getting the structure I want:
send_to_firebase = (response) ->
  firebase_ref = new Firebase(firebase_url)
  for charger in response
    firebase_ref.child(charger.id).update charger, (error) ->
      if error
        console.log "Data could not be saved #{error}"
      else
        responses_pending += 1
        console.log "Data saved successfully : #{responses_pending} pending"
  firebase_ref.on 'value', ->
    console.log "value received rp=#{responses_pending}"
    process.exit() if (responses_pending -= 1) < 1
So the code I settled on is this:
http = require('http')
Firebase = require('firebase')

firebase_url = '/path/to/your/firebase'

# code to get JSON of the form:
# {
#   "id": 100,
#   "name": "Buckeye, AZ",
#   "status": "OPEN",
#   "address": {
#     "street": "416 S Watson RD",
#     "city": "Buckeye",
#     "state": "AZ",
#     "zip": "85326",
#     "country": "USA"
#   },
#   ... etc.
# }

# Asynchronous get of JSON hash from some server or other.
get_my_fine_JSON().on 'complete', (response) ->
  send_to_firebase(response)

send_to_firebase = (response) ->
  firebase_ref = new Firebase(firebase_url)
  length = response.length
  for charger in response
    firebase_ref.child(charger.id).update charger, (error) ->
      if error
        console.log "Data could not be saved #{error}"
      else
        console.log "Data saved successfully"
        process.exit() if length -= 1 is 0
Discussion:
The idea was to have a Firebase structure like this:
- 100
  - address
    - street: "123 Main Street"
    - etc.
That's reason 1 why id is pulled up to be the primary key. Reason 2 is so that I can uniquely identify an object pulled off the external server as the "same" one in my Firebase and apply any updates necessary.
Epiphany 1: Update is more like upsert. If the key is there, whatever hash you supply replaces matching values. If it's not there, then Firebase happily adds it. Which is way cool because it covers both the push and patch cases.
Epiphany 2: This process will hang waiting for events if nothing tells it to stop. That's why the countdown index, length, is decremented until the code has upserted (for lack of a better term) each item.
Observation 1: Doing this in node.js is super fast compared with REST using Python or Ruby. And this upsert stuff is wicked cool if I'm understanding it right.
Observation 2: There isn't a ton of wisdom out there as of this writing regarding writing node shell scripts to do this kind of stuff. Maybe it's a good idea, maybe a bad one. I don't know.
Observation 3: Because of the asynchronous nature of node and the Firebase Javascript API (both GOOD THINGs), terminating a process before the last bit is done can be tricky because your process has to hang on just long enough to complete its last request/response with Firebase. This is, as mentioned before, done in the completion handler of the update. Otherwise we wouldn't necessarily be complete when the process exited.
Caveat 1: Related to observation 2, this could be a bad idea, but I haven't been able to find resources that speak to the problem.
Caveat 2: This could be a horrid abuse or misunderstanding of the Firebase update API. I am reporting observed behavior in the limited case of my specific data. YMMV.
Caveat 3: I'm hoping the process lifetime is as I suggest it is in observation 3.
A note to the decaffeinated: The Javascript for this is so trivially different that it shouldn't be too tough to translate. Or go to js2coffee and paste the Coffeescript into the right pane to get real Javascript in the left pane that you can tune.
