Regex to find text & value in large text - python-3.x

When I SSH into CM, run commands, and read the CLI output, I get the following back:
# * Much more output above, removed for brevity *
terminal_output = """
[24;1H [79b[1GCommand: disp sys cust<<[23;0H[0;7m [79b[1G[0m[24;0H [79b[1G[1;0H[0;7m [79b[1G[0m[2;0H [79b[1G[3;1H[0J7[1;1H[0;7mdisplay system-parameters customer-options [0m8[1;65H[0;7mPage 1 of 12[0m[2;33HOPTIONAL FEATURES[4;8HG3 Version: [4;20HV20 [4;50HSoftware Package: [4;68HEnterprise [5;10HLocation: [5;20H2[6;10HPlatform: [6;20H28 [5;51HSystem ID (SID): [5;68H9990093751 [6;51HModule ID (MID): [6;68H1 [8;60HUSED[9;29HPlatform Maximum Ports: [9;53H 81000[9;60H 436[10;35HMaximum Stations: [10;53H 135[10;60H 110[11;27HMaximum XMOBILE Stations: [11;53H 41000[11;60H 0[12;17HMaximum Off-PBX Telephones - EC500: [12;53H 135[12;60H 2[13;17HMaximum Off-PBX Telephones - OPS: [13;53H 135[13;60H 40[14;17HMaximum Off-PBX Telephones - PBFMC: [14;53H 135[14;60H 0[15;17HMaximum Off-PBX Telephones - PVFMC: [15;53H 135[15;60H 0[16;17HMaximum Off-PBX Telephones - SCCAN: [16;53H 0[16;60H 0[17;22HMaximum Survivable Processors: [17;53H 313[17;62H 1[22;9H(NOTE: You must logoff & login to effect the permission changes.)[2;50H[0m
"""
It contains a lot of ANSI escape codes (I think), which makes the output hard to read. Anyway, what I'm trying to extract from the text above is the following:
Maximum Stations: 135 110
As I understand it, a regex is required for this.
These are the regexes I tried, neither of which worked:
r'Maximum Stations:\s*(\d+)(\d+)'
r'Maximum Stations: \d+'
If anyone knows how to filter out these ANSI escape codes so they don't appear in the final output, that would be great too.
Thank you.

You can try the following pattern (in Python, pass it to re.search or re.findall):
r"(Maximum Stations:)\s\[\d*;\d*H\s*(\d*)\[\d*;\d*H\s*(\d*)"
It produces three groups: the first holds the "Maximum Stations:" text, and the other two each hold one of the numbers you want to capture. Combine the groups to build your final output.
I don't know whether this will be generic enough for your application, though.
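A minimal sketch covering both parts of the question: strip the escape sequences first, then match the plain text. It assumes the codes are CSI sequences (an ESC byte, \x1b, then "[", digits and semicolons, and a final letter). The ESC byte is treated as optional because the pasted output above has lost it; in a raw SSH stream it will normally be present, so a pattern that expects "[" directly after a space may fail on the live output. Other escape types, such as ESC 7 / ESC 8 (save/restore cursor), are not handled. The names strip_ansi and max_stations are illustrative.

```python
import re
from typing import Optional

# CSI escape sequences such as "\x1b[10;53H" (cursor move) or "\x1b[0;7m" (style).
# The ESC byte is optional because copy-pasted output often drops it.
ANSI_CSI = re.compile(r"\x1b?\[[0-9;]*[A-Za-z]")

# After stripping, the label is followed by two whitespace-separated numbers.
STATIONS = re.compile(r"Maximum Stations:\s*(\d+)\s+(\d+)")


def strip_ansi(text: str) -> str:
    """Remove CSI escape sequences, leaving the visible characters."""
    return ANSI_CSI.sub("", text)


def max_stations(text: str) -> Optional[str]:
    """Return e.g. 'Maximum Stations: 135 110', or None if the label is absent."""
    match = STATIONS.search(strip_ansi(text))
    if match is None:
        return None
    return "Maximum Stations: {} {}".format(match.group(1), match.group(2))


# A fragment of the output from the question (ESC bytes already lost).
sample = "[9;60H 436[10;35HMaximum Stations: [10;53H 135[10;60H 110[11;27HMaximum XMOBILE"
print(max_stations(sample))  # prints: Maximum Stations: 135 110
```

Because the ESC byte is optional here, the pattern can also eat literal text such as "[O" in "[Ok]"; when you work with the raw stream, making \x1b mandatory is the safer choice.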

Related

How to extract the substring preceding a marker?

I have strings like these:
[3016] - Device is ready...
[10ice is loading..13] - v3[3016] - Device is ready...
[1r 0.[3016] - Device is ready.
Everything except '[3016] - Device is ready...' is noise.
The key phrase is "Device is ready".
3016 is a timestamp in milliseconds. I need to extract '3016' from the string for further processing.
I tried the following:
import re

if "Device is ready" in reply:
    # set a pattern for extracting time from the result
    found = re.findall(r"\[.*\]", reply)
    # Cut timestamp from reply
    x = [tm[1:-1] for tm in found]
If the reply is clean ([3016] - Device is ready...), this works, but if there is noise in the reply, it doesn't. Can someone point me in the right direction or help with the code? Thanks in advance.
If there is a single key and it precedes the marker "Device is ready", you can capture the digits first:
\[(\d+)].*\bDevice is ready\b
The pattern matches:
\[(\d+)] Capture 1+ digits between square brackets in group 1
.* Match any character except a newline, 0 or more times
\bDevice is ready\b Then match Device is ready as whole words
import re

strings = [
    "[3016] - Device is ready...",
    "[10ice is loading..13] - v3[3017] - Device is ready...",
    "[1r 0.[3018] - Device is ready.",
    "[1r 0 - Device is ready. [3019]",
]
pattern = r"\[(\d+)].*\bDevice is ready\b"
for s in strings:
    match = re.search(pattern, s)
    if match:
        print(match.group(1))
Output
3016
3017
3018
Use a capturing group () to extract the number. found will then be a list of all the numbers enclosed in []:
import re

if "Device is ready" in reply:
    # set a pattern for extracting time from the result
    found = re.findall(r"\[(\d+)\]", reply)
    print(found[0])
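To see why the original \[.*\] attempt breaks on noisy input while a digits-only group works, here is a small comparison on one of the noisy replies from the question. The greedy .* runs to the last "]" in the string; making it lazy (.*?) stops at the first "]" but still accepts the noisy "[10ice is loading..13]"; only restricting the bracket contents to digits isolates the timestamp.

```python
import re

reply = "[10ice is loading..13] - v3[3016] - Device is ready..."

# Greedy: ".*" extends to the LAST "]" in the string, swallowing the noise.
greedy = re.findall(r"\[.*\]", reply)
# Lazy: ".*?" stops at the FIRST "]", but the noisy "[10ice...13]" still matches.
lazy = re.findall(r"\[.*?\]", reply)
# Digits only: the brackets may contain nothing but digits.
digits = re.findall(r"\[(\d+)\]", reply)

print(greedy)  # ['[10ice is loading..13] - v3[3016]']
print(lazy)    # ['[10ice is loading..13]', '[3016]']
print(digits)  # ['3016']
```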

How do I output MAW head values for all time steps in FloPy (MODFLOW 6)?

I am creating a MAW well that I want to use as an observation well, screened over multiple layers, so that I can compare it to field data later. However, my output file only contains the well head for the very last time step. Any ideas on how to get all time steps in the output?
The FloPy documentation says this has to be set in Output Control, but I can't figure out how to do that:
print_head (boolean) – print_head (boolean) keyword to indicate that the list of multi-aquifer well heads will be printed to the listing file for every stress period in which “HEAD PRINT” is specified in Output Control. If there is no Output Control option and PRINT_HEAD is specified, then heads are printed for the last time step of each stress period.
In the MODFLOW 6 manual I see that continuous output is possible:
[Image: excerpt from the MODFLOW 6 manual]
My MAW definition looks like this:
maw = flopy.mf6.ModflowGwfmaw(
    gwf,
    nmawwells=1,
    packagedata=[0, Rwell, minbot, wellhead, 'MEAN', OBS1welllayers],
    connectiondata=OBS1connectiondata,
    perioddata=[(0, 'STATUS', 'ACTIVE')],
    flowing_wells=False,
    save_flows=True,
    mover=True,
    flow_correction=True,
    budget_filerecord='OBS1wellbudget',
    print_flows=True,
    print_head=True,
    head_filerecord='OBS1wellhead',
)
My output control looks like this:
oc = flopy.mf6.ModflowGwfoc(
    gwf,
    budget_filerecord=budget_file,
    head_filerecord=head_file,
    saverecord=[('HEAD', 'ALL'), ('BUDGET', 'ALL')],
)
I hope this is clear and that someone can help me. Thanks!
You need to initialise the MAW observation file; this is not done in the OC package.
You can find the scripts for the three MAW examples in the MF6 documentation here:
https://github.com/MODFLOW-USGS/modflow6-examples/tree/master/notebooks
It looks something like this:
obs_file = "{}.maw.obs".format(name)
csv_file = obs_file + ".csv"
obs_dict = {
    csv_file: [
        ("head", "head", (0,)),
        ("Q1", "maw", (0,), (0,)),
        ("Q2", "maw", (0,), (1,)),
        ("Q3", "maw", (0,), (2,)),
    ]
}
maw.obs.initialize(filename=obs_file, digits=10, print_input=True, continuous=obs_dict)
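For a well screened over a different number of layers, the per-connection flow entries follow a regular pattern. A minimal sketch, assuming the same tuple layout as the example above (observation name, type, well id, and for flows a connection id, with zero-based ids as written there). build_maw_obs is a hypothetical helper, not part of FloPy, and n_connections should match the number of connections in your connectiondata.

```python
def build_maw_obs(name, n_connections, well_no=0):
    """Build (obs_file, continuous-obs dict) for one MAW well.

    Mirrors the example's layout: (obsname, obstype, id) for the well head
    and (obsname, obstype, id, id2) for each connection flow.
    """
    obs_file = "{}.maw.obs".format(name)
    csv_file = obs_file + ".csv"
    records = [("head", "head", (well_no,))]
    # One flow observation per well connection: Q1, Q2, ...
    for conn in range(n_connections):
        records.append(("Q{}".format(conn + 1), "maw", (well_no,), (conn,)))
    return obs_file, {csv_file: records}


obs_file, obs_dict = build_maw_obs("model", 3)
print(obs_file)  # model.maw.obs
```

The returned pair would then go into the same call as in the answer: maw.obs.initialize(filename=obs_file, digits=10, print_input=True, continuous=obs_dict).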

kdb/q: How to apply a string manipulation function to a vector of strings to output a vector of strings?

Thanks in advance for the help. I am new to kdb/q, coming from a Python and C++ background.
Just a simple syntax question: I have a string with fields and their corresponding values:
pp_str: "field_1:abc field_2:xyz field_3:kdb"
I wrote an atomic (scalar) function to extract the value of a given field.
get_field_value: {[field; pp_str] pp_fields: " " vs pp_str; pid_field: pp_fields[where like[pp_fields; field,":*"]]; start_i: (pid_field[0] ss ":")[0] + 1; end_i: count pid_field[0]; indices: start_i + til (end_i - start_i); pid_field[0][indices]}
show get_field_value["field_1"; pp_str]
"abc"
show get_field_value["field_3"; pp_str]
"kdb"
Now how do I generalize this so that if I input a vector of fields, I get a vector of values? I want to input ("field_1"; "field_2"; "field_3") and output ("abc"; "xyz"; "kdb"). I tried several approaches (below), but I don't understand kdb/q's syntax well enough to vectorize my function:
/ Attempt 1 - Fail
get_field_value[enlist ("field_1"; "field_2"); pp_str]
/ Attempt 2 - Fail
get_field_value[; pp_str] /. enlist ("field_1"; "field_3")
/ Attempt 3 - Fail
fields: ("field_1"; "field_2")
get_field_value[fields; pp_str]
To run your function for each field, you can project it over pp_str (fixing that argument) and use each to apply it to every field in the list:
q)get_field_value[;pp_str]each("field_1";"field_3")
"abc"
"kdb"
Kdb actually has built-in functionality to handle this: https://code.kx.com/q/ref/file-text/#key-value-pairs
q){#[;x](!/)"S: "0:y}[`field_1;pp_str]
"abc"
q)
q){#[;x](!/)"S: "0:y}[`field_1`field_3;pp_str]
"abc"
"kdb"
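Since the asker comes from Python, the key-value approach can be compared to the following Python sketch. It is only an analogy for what "S: "0: does here (split the string into pairs on spaces, split each pair on the colon, and build a dictionary), followed by a lookup of several keys at once; it is not kdb+ code, and parse_pairs is a hypothetical name.

```python
def parse_pairs(pp_str, pair_sep=" ", kv_sep=":"):
    """Split 'k1:v1 k2:v2' into {'k1': 'v1', 'k2': 'v2'}.

    Assumes every non-empty item contains kv_sep; splits on its first occurrence.
    """
    pairs = (item.split(kv_sep, 1) for item in pp_str.split(pair_sep) if item)
    return {key: value for key, value in pairs}


pp_str = "field_1:abc field_2:xyz field_3:kdb"
table = parse_pairs(pp_str)
# Looking up a list of keys yields a list of values, like indexing a q dictionary.
values = [table[f] for f in ("field_1", "field_3")]
print(values)  # ['abc', 'kdb']
```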
I think this might be the syntax you're looking for.
q)get_field_value[; pp_str]each("field_1";"field_2")
"abc"
"xyz"

Make all timestamps in a list have the same format

I have this list and would like all of the timestamps to have the same format (... = more elements):
timestampList = [...
"8:36 - Appointment1",
"9:21 - Appointment2",
"10:01 - Appointment3",
"11:52 - Appointment4",
"12:18 - Appointment5" ...]
Is there an easy way to make sure all timestamps in the list have the same format (HH:MM)? Is there perhaps a module that makes this possible? I have tried to solve the problem but couldn't find a way to do it. I want the list to look like this:
timestampList = [...
"08:36 - Appointment1",
"09:21 - Appointment2",
"10:01 - Appointment3",
"11:52 - Appointment4",
"12:18 - Appointment5" ...]
You can use re.sub with a lookahead anchored at the beginning of the string: if the timestamp starts with a single digit followed by a colon (\d:), prepend a "0":
>>> import re
>>> [re.sub(r"^(?=\d:)", "0", x) for x in timestampList]
['08:36 - Appointment1', '09:21 - Appointment2', '10:01 - Appointment3', '11:52 - Appointment4', '12:18 - Appointment5']
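On the "is there a module" part: the standard datetime module can do this as well, and it also rejects malformed times. A minimal sketch, assuming every entry has the form "H:MM - text" with " - " as the separator; normalize is a hypothetical helper name. datetime.strptime accepts a single-digit hour for %H, and strftime("%H:%M") always writes two digits.

```python
from datetime import datetime


def normalize(entry):
    """Rewrite the leading 'H:MM' of 'H:MM - text' as zero-padded 'HH:MM'."""
    time_part, sep, rest = entry.partition(" - ")
    # strptime accepts "8:36" for %H:%M and raises ValueError for e.g. "25:00".
    parsed = datetime.strptime(time_part, "%H:%M")
    return parsed.strftime("%H:%M") + sep + rest


timestampList = [
    "8:36 - Appointment1",
    "9:21 - Appointment2",
    "10:01 - Appointment3",
]
print([normalize(x) for x in timestampList])
# ['08:36 - Appointment1', '09:21 - Appointment2', '10:01 - Appointment3']
```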

Python3 Renaming Files By tkinter Listbox

I want to rename all files in a directory using names from a tkinter Listbox.
I got stuck at this point:
files_list = os.listdir(root.foldername)
print(files_list)
gives me
['1.mp4', '10.mp4', '2.mp4', '3.mp4', '4.mp4', '5.mp4', '6.mp4', '7.mp4', '8.mp4', '9.mp4']
values = [listbox.get(idx) for idx in listbox.curselection()]
And
inlist = ', '.join(values)
print(inlist)
gives me
Lost - 1x01 - Pilot(1), Lost - 1x02 - Pilot(2), Lost - 1x03 - Tabula Rasa, Lost - 1x04 - Walkabout, Lost - 1x05 - White Rabbit, Lost - 1x06 - House Of The Rising Sun, Lost - 1x07 - The Moth, Lost - 1x08 - Confidence Man, Lost - 1x09 - Solitary, Lost - 1x10 - Raised By Another
Now I'm looking for a way to use os.rename to rename the files 1.mp4 through 10.mp4.
Also, the file list is in lexicographic rather than natural order (10.mp4 comes right after 1.mp4), and Python has no built-in natural sort.
Thank you very much in advance.
For natural sorting, take a look at "Sorting alphanumeric strings in Python".
Then loop through all files and rename them, e.g.:
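import os

for old_file_name, title in zip(files_list, values):
    new_file_name = title + '.mp4'
    # os.listdir returns bare names, so join them with the folder path
    os.rename(os.path.join(root.foldername, old_file_name),
              os.path.join(root.foldername, new_file_name))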
For assistance in dealing with pathnames see os.path.
