kdb/q: Query multiple handles with hopen

kdb/q: Query multiple handles with hopen - handle

I would like to be able to query several handles at once, where the tables have the same format like:
handles: 8000,8001,8003
tables: foo
Want to do something like:
x:hopen `8000`8001`8003
x select from foo col1,col2
So i get rows from each foo table on each handle.
Is there a way to achieve this?
Thank you

Use 'each' to hopen each handle
q)h:hopen each 8000 8001 8002
q)h
476 480 484i
Use apply each-left to send the same query to each server
q)r:h#\:"select col1,col2 from foo"
q)r
+`col1`col2!(1 2;2 3)
+`col1`col2!(1 2;2 3)
+`col1`col2!(1 2;2 3)
Then you'll have to raze the result:
q)show res:raze r
col1 col2
---------
1 2
2 3
1 2
2 3
1 2
2 3

If you are not planning to reuse the handles, you can do
q)raze`::8000`::8001`::8003#\:"select from foo col1,col2"

Same as other answers, but more complicated using set (neg h) rather than get (h)
The cookbook/load-balancing helps in this example, too.
q)h:hopen each 8000 8001 8002
q)h
476 480 484i
q)r:(0#0i)!() /dictionary of handles and results
Set the callback for the response from the servers
q).z.ps:{#[`r;.z.w;:;x]}
Send a "set" query to each handle
q)(neg h)#\:({(neg .z.w)value"select col1,col2 from foo"};`)
Wait until all messages have a response
q)h .\:()
Finally, putting the result together
q)raze r h
The only advantage is concurrency.
As #AlexanderBelopolsky pointed out, don't forget
q)hclose each h

Related

running for loop until arbitrary index (python 3.x)

So I have these strings that I split by spaces (' ') and I just rolled them into a single list I called 'keyLabelRun'
so it looks like this:
keyLabelRun[0-12]:
0 OS=Dengue
1 virus
2 3
3 PE=4
4 SV=1
5 Split=0
6
7 OS=Bacillus
8 subtilis
9 XF-1
10 GN=opuBA
11 PE=4
12 SV=1
I only want the elements that include and are after "OS=", anything else, whether it be "SV=" or "PE=" etc. I want to skip over those elements until I get to the next "OS="
The number of elements to the next "OS=" is arbitrary so that's where I'm having the problem.
This is what I'm currently trying:
OSarr = []
for i in range(len(keyLabelrun)):
if keyLabelrun[i].count('OS='):
OSarr.append(keyLabelrun[i])
if keyLabelrun[i+1].count('=') != 1:
continue
But the elements where "OS=" is not included is what is tripping me up I think.
Also at the end I'm going to join them all back together in their own elements but I feel like I will be able to handle that after this.
In my attempt, I am trying to append all elements I'm looking for in order to an new list 'OSarr'
If anyone can lend a hand, it would be much appreciated.
Thank you.
These list of strings came from a dataset that is a text file in the form:
>tr|W0FSK4|W0FSK4_9FLAV Genome polyprotein (Fragment) OS=Dengue virus 3 PE=4 SV=1 Split=0
MNNQRKKTGKPSINMLKRVRNRVSTGSQLAKRFSKGLLNGQGPMKLVMAFIAFLRFLAIPPTAGVLARWGTFKKSGAIKVLKGFKKEISNMLSIINKRKKTSLCLMMILPAALAFHLTSRDGEPRMIVGKNERGKSLLFKTASGINMCTLIAMDLGEMCDDTVTYKCPHITEVEPEDIDCWCNLTSTWVTYGTCNQAGEHRRDKRSVALAPHVGMGLDTRTQTWMSAEGAWRQVEKVETWALRHPGFTILALFLAHYIGTSLTQKVVIFILLMLVTPSMTMRCVGVGNRDFVEGLSGATWVDVVLEHGGCVTTMAKNKPTLDIELQKTEATQLATLRKLCIEGKITNITTDSRCPTQGEATLPEEQDQNYVCKHTYVDRGWGNGCGLFGKGSLVTCAKFQCLEPIEGKVVQYENLKYTVIITVHTGDQHQVGNETQGVTAEITPQASTTEAILPEYGTLGLECSPRTGLDFNEMILLTMKNKAWMVHRQWFFDLPLPWTSGATTETPTWNRKELLVTFKNAHAKKQEVVVLGSQEGAMHTALTGATEIQNSGGTSIFAGHLKCRLKMDKLELKGMSYAMCTNTFVLKKEVSETQHGTILIKVEYKGEDVPCKIPFSTEDGQGKAHNGRLITANPVVTKKEEPVNIEAEPPFGESNIVIGIGDNALKINWYKKGSSIGKMFEATARGARRMAILGDTAWDFGSVGGVLNSLGKMVHQIFGSAYTALFSGVSWVMKIGIGVLLTWIGLNSKNTSMSFSCIAIGIITLYLGAVVQADMGCVINWKGKELKCGSGIFVTNEVHTWTEQYKFQADSPKRLATAIAGAWENGVCGIRSTTRMENLLWKQIANELNYILWENNIKLTVVVGDIIGVLEQGKRTLTPQPMELKYSWKTWGKAKIVTAETQNSSFIIDGPNTPECPSVSRAWNVWEVEDYGFGVFTTNIWLKLREVYTQLCDHRLMSAAVKDERAVHADMGYWIESQKNGSWKLEKASLIEVKTCTWPKSHTLWSNGVLESDMIIPKSLAGPISQHNHRPGYHTQTAGPWHLGKLELDFNYCEGTTVVITENCGTRGPSLRTTTVSGKLIHEWCCRSCTLPPLRYMGEDGCWYGMEIRPISEKEENMVKSLVSAGSGKVDNFTMGVLCLAILFEEVMRGKFGKKHMIAGVFFTFVLLLSGQITWRDMAHTLIMIGSNASDRMGMGVTYLALIATFKIQPFLALGFFLRKLTSRENLLLGVGLAMATTLQLPEDIEQMANGIALGLMALKLITQFETYQLWTALISLTCSNTIFTLTVAWRTATLILAGVSLLPVCQSSSMRKTDWLPMAVAAMGVPPLPLFIFGLKDTLKRRSWPLNEGVMAVGLVSILASSLLRNDVPMAGPLVAGGLLIACYVITGTSADLTVEKAADITWEEEAEQTGVSHNLMITVDDDGTMRIKDDETENILTVLLKTALLIVSGIFPYSIPATLLVWHTWQKQTQRSGVLWDVPSPPETQKAELEEGVYRIKQQGIFGKTQVGVGVQKEGVFHTMWHVTRGAVLTYNGKRLEPNWASVKKDLISYGGGWRLSAQWQKGEEVQVIAVEPGKNPKNFQTMPGTFQTTTGEIGAIALDFKPGTSGSPIINREGKVVGLYGNGVVTKNGGYVSGIAQTNAEPDGPTPELEEEMFKKRNLTIMDLHPGSGKTRKYLPAIVREAIKRRLRTLILAPTRVVAAEMEEALKGLPIRYQTTATKSEHTGREIVDLMCHATFTMRLLSPVRVPNYNLIIMDEAHFTDPASIAARGYISTRVGMGEAAAIFMTATPPGTADAFPQSNAPIQDEERDIPERSWNSGNEWITDFAGKTVWFVPSIKAGNDIANCLRKNGKKVIQLSRKTFDTEYQKTKLNDWDFVV
>tr|M4KW32|M4KW32_BACIU Choline ABC transporter (ATP-binding protein) OS=Bacillus subtilis XF-1 GN=opuBA PE=4 SV=1 Split=0
MLTLENVSKTYKGGKKAVNNVNLKIAKGEFICFIGPSGCGKTTTMKMINRLIEPSAGKIFIDGENIMDQDPVELRRKIGYVIQQIGLFPHMTIQQNISLVPKLLKWPEQQRKERARELLKLVDMGPEYVDRYPHELSGGQQQRIGVLRALAAEPPLILMDEPFGALDPITRDSLQEEFKKLQKTLHKTIVFVTHDMDEAIKLADRIVILKAGEIVQVGTPDDILRNPADEFVEEFIGKERLIQSSSPDVERVDQIMNTQPVTITADKTLSEAIQLMRQERVDSLLVVDDEHVLQGYVDVEIIDQCRKKANLIGEVLHEDIYTVLGGTLLRDTVRKILKRGVKYVPVVDEDRRLIGIVTRASLVDIVYDSLWGEEKQLAALS
>sp|Q8AWH3|SX17A_XENTR Transcription factor Sox-17-alpha OS=Xenopus tropicalis GN=sox17a PE=2 SV=1 Split=0
MSSPDGGYASDDQNQGKCSVPIMMTGLGQCQWAEPMNSLGEGKLKSDAGSANSRGKAEARIRRPMNAFMVWAKDERKRLAQQNPDLHNAELSKMLGKSWKALTLAEKRPFVEEAERLRVQHMQDHPNYKYRPRRRKQVKRMKRADTGFMHMAEPPESAVLGTDGRMCLESFSLGYHEQTYPHSQLPQGSHYREPQAMAPHYDGYSLPTPESSPLDLAEADPVFFTSPPQDECQMMPYSYNASYTHQQNSGASMLVRQMPQAEQMGQGSPVQGMMGCQSSPQMYYGQMYLPGSARHHQLPQAGQNSPPPEAQQMGRADHIQQVDMLAEVDRTEFEQYLSYVAKSDLGMHYHGQESVVPTADNGPISSVLSDASTAVYYCNYPSA

I got it! :D
OSarr = []
G = 0
for i in range(len(keyLabelrun)):
OSarr.append(keyLabelrun[G])
G += 1
if keyLabelrun[G].count('='):
while keyLabelrun[G].count('OS=') != 1:
G+=1
Maybe next time everyone, thank you!

Due to the syntax, you have to keep track of which part (OS, PE, etc) you're currently parsing. Here's a function to extract the species name from the FASTA header:
def extract_species(description):
species_parts = []
is_os = False
for word in description.split():
if word[:3] == 'OS=':
is_os = True
species_parts.append(word[3:])
elif '=' in word:
is_os = False
elif is_os:
species_parts.append(word)
return ' '.join(species_parts)
You can call it when processing your input file, e.g.:
from Bio import SeqIO
for record in SeqIO.parse('input.fa', 'fasta'):
species = extract_species(record.description)

pandas groupby trying to optimse several steps

I've been trying to optimise a bokeh server to calculate live stats by selected country on Covid19.
I found myself repeating a groupby function to calculate new columns and was wondering, having selected the groupby, if I could then apply it in a similar way to .agg() on multiple columns ?
For example:
dfall = pd.DataFrame(db("SELECT * FROM C19daily"))
dfall.set_index(['geoId', 'date'], drop=False, inplace=True)
dfall = dfall.sort_index(ascending=True)
dfall.head()
id date geoId cases deaths auid
geoId date
AD 2020-03-03 70119 2020-03-03 AD 1 0 AD03/03/2020
2020-03-14 70118 2020-03-14 AD 1 0 AD14/03/2020
2020-03-16 70117 2020-03-16 AD 3 0 AD16/03/2020
2020-03-17 70116 2020-03-17 AD 9 0 AD17/03/2020
2020-03-18 70115 2020-03-18 AD 0 0 AD18/03/2020
I need to create new columns based on 'cases' and 'deaths' and applying various functions like cumsum(). Currently I do this the long way
dfall['ccases'] = dfall.groupby(level=0)['cases'].cumsum()
dfall['dpc_cases'] = dfall.groupby(level=0)['cases'].pct_change(fill_method='pad', periods=7)
.....
dfall['cdeaths'] = dfall.groupby(level=0)['deaths'].cumsum()
dfall['dpc_deaths'] = dfall.groupby(level=0)['deaths'].pct_change(fill_method='pad', periods=7)
I tried to optimise the groupby call like this:-
with dfall.groupby(level=0) as gr:
gr = g['cases'].cumsum()...
But the error suggest the class doesn't support this
AttributeError: __enter__
I thought I could use .agg({}) and supply dictionary
g = dfall.groupby(level=0).agg({'cc' : 'cumsum', 'cd' : 'cumsum'})
but that produces another error
pandas.core.base.SpecificationError: nested renamer is not supported
I have plenty of other bits to optimise, I thought this python part would be the easiest and save a few ms!
Could anyone nudge me in the right direction?

To avoid repeating dfall.groupby(level=0) you can just save it in a variable:
gb = dfall.groupby(level=0)
gb_cases = gb['cases']
dfall['ccases'] = gb_cases.cumsum()
dfall['dpc_cases'] = gb_cases.pct_change(fill_method='pad', periods=7)
...
And to run multiple aggregations using a single expression, I think you can use named aggregation. But I have no clue whether it will be more performant or not. Either way, it's better to profile the code and improve the actual bottlenecks.

Foreach message in logstash

I need help, I want to compare 2 or more messages containce kv in logstash
examples :
first message : X < 10=5.4|9=14|36=V|3=9|49=360T_SEP|5=Good|220=p48
second messages : y1 > 8=pap4|10=495|37=d|34=7|49=SEP|220=p48
y2 > 8=pap4|10=495|34=d|34=7|49=SEP|220=p48
iteration 1 : I get two key : 5 and 220
iteration 2 : I check if y1 has not 5 and 220 from x equals 220 to y1 then set in y1 5.
Basically, I want retrieved in each message the key 220 which corresponds to 5
Any Suggestion please.

Unless things have really changed, logstash typically concerns itself with one event at a time. The elapsed filter is one of the only exception where it is considering prior events in the processing.
You could use ruby to create your own cache, or perhaps use the redis inputs and outputs to that effect, but I'd suggest changing the format of the original message to include the data you need.

How to join records in Easytrieve internal SORT?

I've a requirement, where I need to extract 2 types of records from a single input file & join them for EZT report processing.
Currently, I've written an ICETOOL step to perform the extraction followed by the join. The output of the ICETOOL step is fed to the Easytrieve report step.
Extraction card is as below -
SORT FIELDS=(14,07,PD,A)
OUTFILE FNAMES=FILE010,INCLUDE=(25,03,CH,EQ,C'010')
OUTFILE FNAMES=FILE011,INCLUDE=(25,04,CH,EQ,C'011')
OPTION DYNALLOC=(SYSDA,05)
Here is the join card -
SORT FIELDS=(14,07,PD,A)
JOINKEYS F1=FILE010,FIELDS=(14,07,A),SORTED,NOSEQCHK
JOINKEYS F2=FILE011,FIELDS=(14,07,A),SORTED,NOSEQCHK
REFORMAT FIELDS=(F1:14,07,
F2,25,10)
OUTREC BUILD=(1,17,80:X),VTOF
OPTION DYNALLOC=(SYSDA,05)
I'm wondering if it was possible to perform the above SORT/ICETOOL operations within EasyTrive. I've used the Easytrieve internal SORT but it was for the simple extractions. Can we perform the join operation within the Easytrieve?
Note - The idea is to have a single EZT step.

You can make use Synchronized File Processing facility (SFP) in Easytrieve to acheive the task. Read more about it here.
FILE FILE010
KEY1 14 7 N
*
FILE FILE011
KEY2 14 7 N
FIELD1 25 10 A
*
FILE OUTFILE FB(80 0)
OKEY 1 7 N
OFIELD 8 10 A
*
WS-COUNT W 5 N VALUE 0
*
JOB INPUT FILE010 KEY KEY1 FILE011 KEY KEY2 FINISH(DIS)
*
IF EOF FILE010
STOP
END-IF
*
IF MATCHED
OKEY = KEY1
OFIELD = FIELD1
WS-COUNT = WS-COUNT + 1
PUT OUTFILE
END-IF
*
DIS. PROC
DISPLAY 'RECORDS WRITTEN: ' WS-COUNT
END-PROC
Please note,
Above code isn't tested, it's just a draft showing the idea on
file matching using Easytrieve to achieve the task.
Data types to the data items are assumed. You may have to change them suitably.
You may have to define the variable input datasets in the FILE
statement.
You may add more statements within the IF MATCHED condition for the
creation of report.
Hope this helps!

How to determine a formula for execution time given quantitative data, Excel, trendlines, monte carlo simulation

Can I get your help on some Maths and possibly Excel?
I have benchmarked my app increasing the number of iterations and number of obligors recording the time taken in seconds with the following result:
200 400 600 800 1000 1200 1400 1600 1800 2000
20000 15.627681 30.0968663 44.7592684 60.9037558 75.8267358 90.3718977 105.8749983 121.0030672 135.9191249 150.3331682
40000 31.7202111 62.3603882 97.2085204 128.8111731 156.2443206 186.6374271 218.324317 249.2699288 279.6008184 310.9970803
60000 47.0708635 92.4599437 138.874287 186.0576007 231.2181381 280.541207 322.9836878 371.3076757 413.4058622 459.6208335
80000 60.7346238 120.3216303 180.471169 241.668982 300.4283548 376.9639188 417.5231669 482.6288981 554.9740194 598.0394434
100000 76.7535915 150.7479245 227.5125656 304.3908046 382.5900043 451.6034296 526.0730786 609.0358776 679.0268121 779.6887277
120000 90.4174626 179.5511355 269.4099593 360.2934453 448.4387573 537.1406039 626.7325734 727.6132992 807.4767327 898.307638
How can I now come up with a function for T (time taken in seconds) as an expression of number of obligors O and number of iterations I
Thanks

I'm not quite sure of the data involved due to the question construction/presentation.
Assuming you're looking for y = f(x). If you load the data into Excel, you can use the methods SLOPE and INTERCEPT on the data ranges to derive an expression of the form
y = mx+c
and thus a linear function.
If you want a quadratic or cubic, you can use LINEST with a column of time data squared/cubed etc. to give you quadratic/cubic parameters, and thus derive an appropriate higher order function.

Spoke to one of the quants here the function is of the from T = KNO, where T is time, K some constant, N iterations, O obligors.
Rearrange for K = T/(NO), plug this into my sample data, take the average of all sample points, use the Std dev for the error
I did this for my data and get:
T = 3.81524E-06 * N * O (with 1.9% error), this is a pretty good approximation.

Create a chart in Excel, add a trendline, and select to have the equation displayed on the chart.

To clarify: You have tabular data below which you want to fit to some function f(O,I)=t?
200 400 600 800 1000 1200 1400 1600 1800 2000
20000 15.627681 30.0968663 44.7592684 60.9037558 75.8267358 90.3718977 105.8749983 121.0030672 135.9191249 150.3331682
40000 31.7202111 62.3603882 97.2085204 128.8111731 156.2443206 186.6374271 218.324317 249.2699288 279.6008184 310.9970803
60000 47.0708635 92.4599437 138.874287 186.0576007 231.2181381 280.541207 322.9836878 371.3076757 413.4058622 459.6208335
80000 60.7346238 120.3216303 180.471169 241.668982 300.4283548 376.9639188 417.5231669 482.6288981 554.9740194 598.0394434
100000 76.7535915 150.7479245 227.5125656 304.3908046 382.5900043 451.6034296 526.0730786 609.0358776 679.0268121 779.6887277
120000 90.4174626 179.5511355 269.4099593 360.2934453 448.4387573 537.1406039 626.7325734 727.6132992 807.4767327 898.307638
A rough guess looks like both O & I are linear. So f is in the form t = aO + bI + c. Plug in a few (O,I,t) and see what a,b,c should be.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

kdb/q: Query multiple handles with hopen - handle

If you are not planning to reuse the handles, you can do q)raze`::8000`::8001`::8003#\:"select from foo col1,col2"

Related

running for loop until arbitrary index (python 3.x)

pandas groupby trying to optimse several steps

Foreach message in logstash

How to join records in Easytrieve internal SORT?

How to determine a formula for execution time given quantitative data, Excel, trendlines, monte carlo simulation

Categories

Resources