Bash select random string from list [duplicate]

This question already has answers here:
How to select a random item from an array in shell
I have a list of strings, and I want to select one of them at random each time I launch the script.
For example:
SDF_BCH_CB="file1.sdf"
SDF_BCH_CW="file2.sdf"
SDF_BCH_RCB="file3.sdf"
SDF_BCH_RCW="file4.sdf"
SDF_TT="file5.sdf"
Then I want to randomly select from the list above to assign the following two variables, for example:
SDFFILE_MIN=$SDF_BCH_CW
SDFFILE_MAX=$SDF_TT
How can I do this?
Thanks

Store the strings in an array, count the elements, and pick a random index:
#!/bin/bash
array[0]="file1.sdf"
array[1]="file2.sdf"
array[2]="file3.sdf"
array[3]="file4.sdf"
array[4]="file5.sdf"
size=${#array[@]}
index=$((RANDOM % size))
echo "${array[$index]}"

Use the built-in $RANDOM variable and arrays.
SDF_BCH_CB="file1.sdf"
SDF_BCH_CW="file2.sdf"
SDF_BCH_RCB="file3.sdf"
SDF_BCH_RCW="file4.sdf"
SDF_TT="file5.sdf"
ARRAY=("$SDF_BCH_CB" "$SDF_BCH_CW" "$SDF_BCH_RCB" "$SDF_BCH_RCW" "$SDF_TT")
N1=$((RANDOM % 5))
SDFFILE_MIN=${ARRAY[$N1]}
N2=$((RANDOM % 4))
if [ "$N2" = "$N1" ] ; then
    N2=$((N1 + 1))
fi
SDFFILE_MAX=${ARRAY[$N2]}
echo "$SDFFILE_MIN"
echo "$SDFFILE_MAX"
This gets two different strings for SDFFILE_MIN and SDFFILE_MAX. If they don't have to be different, remove the if statement in the middle.
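For comparison, here is a rough sketch of the same "pick two distinct items" idea in Python (this is an illustration, not part of the original answers): random.sample draws two different elements in one call, so no collision check is needed.
import random

# Hypothetical list standing in for the SDF_* variables above.
sdf_files = ["file1.sdf", "file2.sdf", "file3.sdf", "file4.sdf", "file5.sdf"]

# random.sample returns two distinct elements, so the "min" and "max"
# picks can never end up being the same file.
sdffile_min, sdffile_max = random.sample(sdf_files, 2)
print(sdffile_min)
print(sdffile_max)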

Related

kdb/q: How to apply a string manipulation function to a vector of strings to output a vector of strings?

Thanks in advance for the help. I am new to kdb/q, coming from a Python and C++ background.
Just a simple syntax question: I have a string with fields and their corresponding values
pp_str: "field_1:abc field_2:xyz field_3:kdb"
I wrote an atomic (scalar) function to extract the value of a given field.
get_field_value: {[field; pp_str]
    pp_fields: " " vs pp_str;
    pid_field: pp_fields[where like[pp_fields; field,":*"]];
    start_i: (pid_field[0] ss ":")[0] + 1;
    end_i: count pid_field[0];
    indices: start_i + til (end_i - start_i);
    pid_field[0][indices]}
show get_field_value["field_1"; pp_str]
"abc"
show get_field_value["field_3"; pp_str]
"kdb"
Now how do I generalize this so that if I input a vector of fields, I get a vector of values? I want to input ("field_1"; "field_2"; "field_3") and output ("abc"; "xyz"; "kdb"). I tried multiple approaches (below) but I just don't understand kdb/q's syntax well enough to vectorize my function:
/ Attempt 1 - Fail
get_field_value[enlist ("field_1"; "field_2"); pp_str]
/ Attempt 2 - Fail
get_field_value[; pp_str] /. enlist ("field_1"; "field_3")
/ Attempt 3 - Fail
fields: ("field_1"; "field_2")
get_field_value[fields; pp_str]
To run your function for each field, you can project out the pp_str argument and use each over the fields:
q)get_field_value[;pp_str]each("field_1";"field_3")
"abc"
"kdb"
Kdb actually has built-in functionality to handle this: https://code.kx.com/q/ref/file-text/#key-value-pairs
q){#[;x](!/)"S: "0:y}[`field_1;pp_str]
"abc"
q)
q){#[;x](!/)"S: "0:y}[`field_1`field_3;pp_str]
"abc"
"kdb"
I think this might be the syntax you're looking for.
q)get_field_value[; pp_str]each("field_1";"field_2")
"abc"
"xyz"

Eliminate one list according to another list in Python

I have a two-dimensional list like this:
x_irp_group = [['x1_1_4', 'x1_2_4', 'x1_3_4', 'x1_4_4', 'x1_5_4', 'x1_6_4', 'x1_7_4', 'x1_8_4', 'x1_9_4', 'x1_10_4', 'x1_1_5', 'x1_2_5', 'x1_3_5', 'x1_4_5', 'x1_5_5', 'x1_6_5', 'x1_7_5', 'x1_8_5', 'x1_9_5', 'x1_10_5', 'x1_1_6', 'x1_2_6', 'x1_3_6', 'x1_4_6', 'x1_5_6', 'x1_6_6', 'x1_7_6', 'x1_8_6', 'x1_9_6', 'x1_10_6', 'x1_1_7', 'x1_2_7', 'x1_3_7', 'x1_4_7', 'x1_5_7', 'x1_6_7', 'x1_7_7', 'x1_8_7', 'x1_9_7', 'x1_10_7', 'x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8'], ['x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8', 'x1_1_9', 'x1_2_9', 'x1_3_9', 'x1_4_9', 'x1_5_9', 'x1_6_9', 'x1_7_9', 'x1_8_9', 'x1_9_9', 'x1_10_9', 'x1_1_10', 'x1_2_10', 'x1_3_10', 'x1_4_10', 'x1_5_10', 'x1_6_10', 'x1_7_10', 'x1_8_10', 'x1_9_10', 'x1_10_10', 'x1_1_11', 'x1_2_11', 'x1_3_11', 'x1_4_11', 'x1_5_11', 'x1_6_11', 'x1_7_11', 'x1_8_11', 'x1_9_11', 'x1_10_11', 'x1_1_12', 'x1_2_12', 'x1_3_12', 'x1_4_12', 'x1_5_12', 'x1_6_12', 'x1_7_12', 'x1_8_12', 'x1_9_12', 'x1_10_12']]
I want to remove elements from this two-dimensional list if they appear in another, one-dimensional list like this:
x_irp_eliminated_list = ['x1_1_4', 'x1_1_8', 'x1_1_12', 'x1_1_16', 'x1_1_19', 'x1_1_22', 'x1_1_26', 'x1_1_30', 'x1_1_34', 'x1_1_37', 'x1_1_43', 'x1_1_49', 'x1_1_55', 'x1_1_61', 'x1_1_68', 'x1_1_75', 'x1_1_81', 'x1_1_87', 'x1_1_92', 'x1_1_96', 'x1_1_101', 'x1_1_107', 'x1_1_112', 'x1_1_116', 'x1_1_121', 'x1_1_126', 'x1_1_131', 'x1_1_134', 'x1_1_137', 'x1_1_141', 'x1_1_145', 'x1_1_149', 'x1_1_152', 'x1_1_155', 'x1_1_160', 'x1_1_164', 'x1_1_169', 'x1_1_173', 'x1_1_181', 'x1_1_189', 'x1_1_197', 'x1_1_205', 'x1_2_8', 'x1_2_10', 'x1_2_13', 'x1_2_17', 'x1_2_21', 'x1_2_25', 'x1_2_28', 'x1_2_30', 'x1_2_34', 'x1_2_40', 'x1_2_45', 'x1_2_51', 'x1_2_58', 'x1_2_66', 'x1_2_71', 'x1_2_77', 'x1_2_82', 'x1_2_86', 'x1_2_91', 'x1_2_97', 'x1_2_102', 'x1_2_106', 'x1_2_111', 'x1_2_117', 'x1_2_122', 'x1_2_125', 'x1_2_129', 'x1_2_132', 'x1_2_135', 'x1_2_139', 'x1_2_143', 'x1_2_147', 'x1_2_151', 'x1_2_154', 'x1_2_157', 'x1_2_161', 'x1_2_166', 'x1_2_172', 'x1_2_177', 'x1_2_181', 'x1_2_189', 'x1_2_197', 'x1_2_205', 'x1_2_214', 'x1_3_1', 'x1_3_4', 'x1_3_8', 'x1_3_11', 'x1_3_15', 'x1_3_18', 'x1_3_22', 'x1_3_25', 'x1_3_28', 'x1_3_32', 'x1_3_35', 'x1_3_39', 'x1_3_42', 'x1_3_46', 'x1_3_49', 'x1_3_52', 'x1_3_56', 'x1_3_59', 'x1_3_63', 'x1_3_66', 'x1_3_70', 'x1_3_73', 'x1_3_77', 'x1_3_81', 'x1_3_85', 'x1_3_88', 'x1_3_91', 'x1_3_94', 'x1_3_97', 'x1_3_101', 'x1_3_105', 'x1_3_109', 'x1_3_112', 'x1_3_115', 'x1_3_118', 'x1_3_122', 'x1_3_126', 'x1_3_130', 'x1_3_134', 'x1_3_137', 'x1_3_140', 'x1_3_143', 'x1_3_147', 'x1_3_151', 'x1_3_156', 'x1_3_159', 'x1_3_163']
I wrote code like this, but it did not work well:
x_final = [i for i, j in zip(x_irp_group, x_irp_eliminated_list) if i == j]
I have shortened the lists here; normally their sizes are much bigger than this.
The list comprehension you have isn't working because you are zipping the elements together, which isn't what the operation represents (they are not parallel arrays). What you want is something along the lines of:
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_list)]
Note that for a 2d list you may need to nest this like:
# writing normal loops you'd write:
# for row in x_irp_group:
#     for i in row:
#         if (...):
# so I typically try to indent the loops similarly, since nested list comprehensions
# get complicated; honestly I'd likely prefer using generator functions for this anyway
x_final = [[i for i in row
            if (i not in x_irp_eliminated_list)
            ] for row in x_irp_group
           ]
Note, though, that i not in x_irp_eliminated_list will be very slow for a list; changing it to a set will improve performance:
x_irp_eliminated_set = set(x_irp_eliminated_list)
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_set)]
Or, if the lists are trivially sorted, you could convert them both to sets, do a subtraction, then sort the result:
x_final = [ sorted(set(x_irp_group[0]) - set(x_irp_eliminated_list)) ]
although if you have super giant lists this would probably be less desirable.
x_irp_eliminated_list_set = set(x_irp_eliminated_list)
x_last = [i for row in x_irp_group
          for i in row
          if (i in x_irp_eliminated_list_set)]
print(x_last[:30])
I used this for a faster operation; the set approach made it faster. Thanks for that information, I learned something new. But it creates a one-dimensional list, and I would like to create a two-dimensional list like the original x_irp_group.
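To get both the set speed-up and the original two-dimensional shape, the two snippets above can be combined: build the set once and nest the comprehension per row. A small sketch using the same variable names:
x_irp_eliminated_set = set(x_irp_eliminated_list)

# One inner list per row of x_irp_group, so the 2D structure is preserved.
# Use `not in` instead if you want to drop the eliminated entries rather than keep them.
x_last_2d = [[i for i in row if i in x_irp_eliminated_set]
             for row in x_irp_group]
print([row[:5] for row in x_last_2d])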

Records of each version in list

I have a list of versions
1.0.0.1 - 10
1.1.0.1 - 10
1.2.0.1 - 10
That is 30 entries in my list, but I only want to show the 5 highest of each kind:
1.0.0.5 - 10
1.1.0.5 - 10
1.2.0.5 - 10
How can I do that? The last part can be any number, but the first three parts are only:
1.0.0
1.1.0
1.2.0
CODE:
import groovy.json.JsonSlurperClassic
def data = new URL("http://xxxx.se:8081/service/rest/beta/components?repository=Releases").getText()
/**
* 'jsonString' is the input json you have shown
* parse it and store it in collection
*/
Map convertedJSONMap = new JsonSlurperClassic().parseText(data)
def list = convertedJSONMap.items.version
list
Version numbers alone usually don't make for an easy sort, so I'd split them into numbers and work from there. E.g.
def versions = [
    "1.0.0.12", "1.1.0.42", "1.2.0.666",
    "1.0.0.6", "1.1.0.77", "1.2.0.8",
    "1.0.0.23", "1.1.0.5", "1.2.0.5",
]
println(
    versions.collect{
        it.split(/\./)*.toInteger() // turn into an array of integers
    }.groupBy{
        it.take(2)                  // group by the first two numbers
    }.collect{ _, vs ->
        vs.sort().last()            // sort the arrays and take the last
    }*.join(".")                    // piece the numbers back together
)
// => [1.0.0.23, 1.1.0.77, 1.2.0.666]
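The question actually asks for the five highest builds per prefix rather than just the single highest; the same group-and-sort idea extends to that (in the Groovy code above, something like takeRight(5) in place of last() should do it). A rough sketch of the idea in Python:
from collections import defaultdict

versions = ["1.0.0.12", "1.1.0.42", "1.2.0.666",
            "1.0.0.6", "1.1.0.77", "1.2.0.8",
            "1.0.0.23", "1.1.0.5", "1.2.0.5"]

# Group by the first three numbers, then keep the five highest builds per group.
groups = defaultdict(list)
for v in versions:
    parts = [int(p) for p in v.split(".")]
    groups[tuple(parts[:3])].append(parts)

top5 = {".".join(map(str, prefix)): [".".join(map(str, p)) for p in sorted(builds)[-5:]]
        for prefix, builds in groups.items()}
print(top5)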

Parsing dates from strings

I have a list of strings in Python like this:
['AM_B0_D0.0_2016-04-01T010000.flac.h5',
'AM_B0_D3.7_2016-04-13T215000.flac.h5',
'AM_B0_D10.3_2017-03-17T110000.flac.h5',
'AM_B0_D0.7_2016-10-21T104000.flac.h5',
'AM_B0_D4.4_2016-08-05T151000.flac.h5',
'AM_B0_D0.0_2016-04-01T010000.flac.h5',
'AM_B0_D3.7_2016-04-13T215000.flac.h5',
'AM_B0_D10.3_2017-03-17T110000.flac.h5',
'AM_B0_D0.7_2016-10-21T104000.flac.h5',
'AM_B0_D4.4_2016-08-05T151000.flac.h5']
I want to parse only the date and time (for example, 2016-08-05 15:10:00) from these strings.
So far I have used a for loop like the one below, but it's very time-consuming. Is there a better way to do this?
for files in glob.glob("AM_B0_*.flac.h5"):
    if files[11] == '_':
        year = files[12:16]
        month = files[17:19]
        day = files[20:22]
        hour = files[23:25]
        minute = files[25:27]
        second = files[27:29]
        tindex = pd.date_range(start='%d-%02d-%02d %02d:%02d:%02d' % (int(year), int(month), int(day), int(hour), int(minute), int(second)),
                               periods=60, freq='10S')
    else:
        year = files[11:15]
        month = files[16:18]
        day = files[19:21]
        hour = files[22:24]
        minute = files[24:26]
        second = files[26:28]
        tindex = pd.date_range(start='%d-%02d-%02d %02d:%02d:%02d' % (int(year), int(month), int(day), int(hour), int(minute), int(second)),
                               periods=60, freq='10S')
Try this (based on the second-to-last '-'; no need for the if-else cases):
filesall = ['AM_B0_D0.0_2016-04-01T010000.flac.h5',
'AM_B0_D3.7_2016-04-13T215000.flac.h5',
'AM_B0_D10.3_2017-03-17T110000.flac.h5',
'AM_B0_D0.7_2016-10-21T104000.flac.h5',
'AM_B0_D4.4_2016-08-05T151000.flac.h5',
'AM_B0_D0.0_2016-04-01T010000.flac.h5',
'AM_B0_D3.7_2016-04-13T215000.flac.h5',
'AM_B0_D10.3_2017-03-17T110000.flac.h5',
'AM_B0_D0.7_2016-10-21T104000.flac.h5',
'AM_B0_D4.4_2016-08-05T151000.flac.h5']
def find_second_last(text, pattern):
    return text.rfind(pattern, 0, text.rfind(pattern))

for files in filesall:
    start = find_second_last(files, '-') - 4  # from the yyyy- part
    timepart = (files[start:start+17]).replace("T", " ")
    # insert 2 ':'s
    timepart = timepart[:13] + ':' + timepart[13:15] + ':' + timepart[15:]
    # print(timepart)
    tindex = pd.date_range(start=timepart, periods=60, freq='10S')
Instead of hard-coding files[11], find the last or second-to-last index of '_' and then apply your code, so you don't have to write the same code twice. Or use a regex to parse the string.
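As the comment suggests, a regular expression avoids the hard-coded indices entirely. A minimal sketch, assuming the names always contain a YYYY-MM-DDTHHMMSS timestamp as in the samples:
import re
import pandas as pd

# Capture the date and the hour/minute/second digits of the timestamp.
stamp = re.compile(r"(\d{4}-\d{2}-\d{2})T(\d{2})(\d{2})(\d{2})")

for name in filesall:
    date, hh, mm, ss = stamp.search(name).groups()
    start = "%s %s:%s:%s" % (date, hh, mm, ss)   # e.g. "2016-08-05 15:10:00"
    tindex = pd.date_range(start=start, periods=60, freq='10S')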

Add a number to each line of a file in bash

I have some files on Linux with lines like:
2013/08/16,name1,,5000,8761,09:00,09:30
2013/08/16,name1,,5000,9763,10:00,10:30
2013/08/16,name1,,5000,8866,11:00,11:30
2013/08/16,name1,,5000,5768,12:00,12:30
2013/08/16,name1,,5000,11764,13:00,13:30
2013/08/16,name2,,5000,2765,14:00,14:30
2013/08/16,name2,,5000,4765,15:00,15:30
2013/08/16,name2,,5000,6765,16:00,16:30
2013/08/16,name2,,5000,12765,17:00,17:30
2013/08/16,name2,,5000,25665,18:00,18:30
2013/08/16,name2,,5000,45765,09:00,10:30
2013/08/17,name1,,5000,33765,10:00,11:30
2013/08/17,name1,,5000,1765,11:00,12:30
2013/08/17,name1,,5000,34765,12:00,13:30
2013/08/17,name1,,5000,12765,13:00,14:30
2013/08/17,name2,,5000,1765,14:00,15:30
2013/08/17,name2,,5000,3765,15:00,16:30
2013/08/17,name2,,5000,7765,16:00,17:30
My column separator is "," and in the third column (currently empty, hence the ,,), I need the entry number within the same day. For example, date 2013/08/16 has 11 lines and date 2013/08/17 has 7 lines, so I need to add the numbers like this:
2013/08/16,name1,1,5000,8761,09:00,09:30
2013/08/16,name1,2,5000,9763,10:00,10:30
2013/08/16,name1,3,5000,8866,11:00,11:30
2013/08/16,name1,4,5000,5768,12:00,12:30
2013/08/16,name1,5,5000,11764,13:00,13:30
2013/08/16,name2,6,5000,2765,14:00,14:30
2013/08/16,name2,7,5000,4765,15:00,15:30
2013/08/16,name2,8,5000,6765,16:00,16:30
2013/08/16,name2,9,5000,12765,17:00,17:30
2013/08/16,name2,10,5000,25665,18:00,18:30
2013/08/16,name2,11,5000,45765,09:00,10:30
2013/08/17,name1,1,5000,33765,10:00,11:30
2013/08/17,name1,2,5000,1765,11:00,12:30
2013/08/17,name1,3,5000,34765,12:00,13:30
2013/08/17,name1,4,5000,12765,13:00,14:30
2013/08/17,name2,5,5000,1765,14:00,15:30
2013/08/17,name2,6,5000,3765,15:00,16:30
2013/08/17,name2,7,5000,7765,16:00,17:30
I need to do it in bash. How can I do it?
This one's good too:
awk -F, 'sub(/,,/, ","++a[$1]",")1' file
Output:
2013/08/16,name1,1,5000,8761,09:00,09:30
2013/08/16,name1,2,5000,9763,10:00,10:30
2013/08/16,name1,3,5000,8866,11:00,11:30
2013/08/16,name1,4,5000,5768,12:00,12:30
2013/08/16,name1,5,5000,11764,13:00,13:30
2013/08/16,name2,6,5000,2765,14:00,14:30
2013/08/16,name2,7,5000,4765,15:00,15:30
2013/08/16,name2,8,5000,6765,16:00,16:30
2013/08/16,name2,9,5000,12765,17:00,17:30
2013/08/16,name2,10,5000,25665,18:00,18:30
2013/08/16,name2,11,5000,45765,09:00,10:30
2013/08/17,name1,1,5000,33765,10:00,11:30
2013/08/17,name1,2,5000,1765,11:00,12:30
2013/08/17,name1,3,5000,34765,12:00,13:30
2013/08/17,name1,4,5000,12765,13:00,14:30
2013/08/17,name2,5,5000,1765,14:00,15:30
2013/08/17,name2,6,5000,3765,15:00,16:30
2013/08/17,name2,7,5000,7765,16:00,17:30
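For readers less familiar with awk: the one-liner keeps a running counter per date (a[$1] is keyed on the first field) and substitutes it for the empty third column. The same logic as a rough Python sketch, reading the same input file as the awk command:
from collections import defaultdict

counts = defaultdict(int)
with open("file") as f:                 # same input file as in the awk command
    for line in f:
        fields = line.rstrip("\n").split(",")
        counts[fields[0]] += 1          # counter keyed on the date column
        fields[2] = str(counts[fields[0]])
        print(",".join(fields))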
