Splitting a file.txt into two files with a condition - linux

How can I split the given file into two different files, one for result codes and one for warning codes? Below is a single text file that I want to split into two files. I have many more files in the same layout to split.
Result Codes:
0 - SYS_OK - "Ok"
1 - SYS_ERROR_E - "System Error"
1001 - MVE_SYS_E - "MTE System Error"
1002 - MVE_COMMAND_SYNTAX_ERROR_E - "Command Syntax is wrong"
Warning Codes:
0 - SYS_WARN_W - "System Warning"
100001 - MVE_SYS_W - "MVE System Warning"
200001 - SLEA_SYS_W - "SLEA System Warning"
200002 - SLEA_INCOMPLETE_SCRIPTED_OVERRIDE_COMMAND_W - "One or more of the entered scripted override commands has missing mandatory parameters"
300001 - L1_SYS_W - "L1 System Warning"

Well, at first glance, the distinction seems to be that "warnings" all contain the character sequence _W - and anything that doesn't is "results". Did you notice that?
awk '/_W -/{print >"warnings";next}{print >"results"}' file.txt

Here is a Python solution.
I am assuming you already have the list of warning codes in warning-codes.txt.
import re

# Collect the numeric codes listed in the warnings file.
warn_codes = set()
with open('warning-codes.txt') as warnings:
    for line in warnings:
        m = re.search(r'(\d+) .*', line)
        if m:
            warn_codes.add(m.group(1))

# Route each line of the log to the matching output file.
with open('log.txt') as log_file, \
        open('output-warnings.txt', 'w') as ow, \
        open('output-results.txt', 'w') as ors:
    for line in log_file:
        m = re.search(r'(\d+) .*', line)
        if m and m.group(1) in warn_codes:
            ow.write(line)  # line already ends with a newline
        elif m:
            ors.write(line)
        else:
            print("none")
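Another way to look at it, since the sample file already labels its two blocks: a minimal sketch that routes each line by the most recent header seen. The names HEADERS and split_by_header are made up for illustration, and it assumes every file uses exactly the header lines "Result Codes:" and "Warning Codes:".

```python
# Hypothetical mapping from header line to output name (an assumption).
HEADERS = {"Result Codes:": "results", "Warning Codes:": "warnings"}

def split_by_header(lines, headers=HEADERS):
    """Group each non-empty line under the most recent header line seen."""
    sections = {name: [] for name in headers.values()}
    current = None
    for line in lines:
        text = line.strip()
        if text in headers:
            current = headers[text]          # switch to the new section
        elif text and current is not None:
            sections[current].append(text)   # keep the line in its section
    return sections

sample = '''Result Codes:
0 - SYS_OK - "Ok"
1 - SYS_ERROR_E - "System Error"
Warning Codes:
0 - SYS_WARN_W - "System Warning"
100001 - MVE_SYS_W - "MVE System Warning"
'''

sections = split_by_header(sample.splitlines())
for name, lines in sections.items():
    print(name, lines)
```

Because the split depends on position rather than on the code number, code 0, which appears under both headers, ends up in the right group. An open file object can be passed instead of sample.splitlines(), and each list can then be written out with "\n".join(lines).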

Related

Regex to find text & value in large text

As I SSH into CM, run commands and start reading the CLI output, I get the following
back:
# * A lot more output above but been removed *
terminal_output = """
[24;1H [79b[1GCommand: disp sys cust<<[23;0H[0;7m [79b[1G[0m[24;0H [79b[1G[1;0H[0;7m [79b[1G[0m[2;0H [79b[1G[3;1H[0J7[1;1H[0;7mdisplay system-parameters customer-options [0m8[1;65H[0;7mPage 1 of 12[0m[2;33HOPTIONAL FEATURES[4;8HG3 Version: [4;20HV20 [4;50HSoftware Package: [4;68HEnterprise [5;10HLocation: [5;20H2[6;10HPlatform: [6;20H28 [5;51HSystem ID (SID): [5;68H9990093751 [6;51HModule ID (MID): [6;68H1 [8;60HUSED[9;29HPlatform Maximum Ports: [9;53H 81000[9;60H 436[10;35HMaximum Stations: [10;53H 135[10;60H 110[11;27HMaximum XMOBILE Stations: [11;53H 41000[11;60H 0[12;17HMaximum Off-PBX Telephones - EC500: [12;53H 135[12;60H 2[13;17HMaximum Off-PBX Telephones - OPS: [13;53H 135[13;60H 40[14;17HMaximum Off-PBX Telephones - PBFMC: [14;53H 135[14;60H 0[15;17HMaximum Off-PBX Telephones - PVFMC: [15;53H 135[15;60H 0[16;17HMaximum Off-PBX Telephones - SCCAN: [16;53H 0[16;60H 0[17;22HMaximum Survivable Processors: [17;53H 313[17;62H 1[22;9H(NOTE: You must logoff & login to effect the permission changes.)[2;50H[0m
"""
It's a lot of ANSI escape codes (I think?), which makes the output hard to read. Anyway, what I'm trying to get back from the text above is the following:
Maximum Stations: 135 110
As I understand it, a regex is required for this.
The regexes that I tried, which did not work:
r'Maximum Stations:\s*(\d+)(\d+)'
r'Maximum Stations: \d+'
If anyone knows how to filter out these ANSI character codes so they don't appear in the final output that'd be great too.
Thank you.

You can try the following:
"(Maximum Stations:)\s\[\d*;\d*H\s*(\d*)\[\d*;\d*H\s*(\d*)"gm
It produces three groups: the first holds the "Maximum Stations:" text, and the other two each hold one of the numbers you wanted to capture. You would have to combine the groups to get your final output.
I don't know if this will be generic enough for your application, though.
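For the second part of the question (filtering the escape codes out), here is a rough Python sketch. It assumes the raw output still contains the ESC byte (\x1b) in front of each "[", which the pasted text above does not show, and that only CSI-style sequences need removing. The names CSI and strip_ansi and the shortened sample string are illustrative.

```python
import re

# CSI sequences: ESC, '[', numeric parameters, one final letter
# (for example ESC[10;53H moves the cursor).
CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def strip_ansi(text):
    """Replace each CSI sequence with a space, then collapse whitespace runs."""
    return re.sub(r"\s+", " ", CSI.sub(" ", text)).strip()

# Shortened sample with the ESC bytes restored (an assumption about the raw data).
sample = ("\x1b[10;35HMaximum Stations: \x1b[10;53H 135\x1b[10;60H 110"
          "\x1b[11;27HMaximum XMOBILE Stations: ")

clean = strip_ansi(sample)
match = re.search(r"Maximum Stations:\s*(\d+)\s+(\d+)", clean)
if match:
    limit, used = match.groups()
    print(f"Maximum Stations: {limit} {used}")
```

Once the escape codes are gone, a much simpler pattern is enough, and the same cleaned text can be searched for the other fields.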

Python - Disect/Tokenize and Iterate Over Segments of Multilined Text with re

Assuming a VCD file with a structure like the one that follows as a minimum example:
#0 <--- section
b10000011#
0$
1%
0&
1'
0(
0)
#2211 <--- section
0'
#2296 <--- section
b0#
1$
#2302 <--- section
0$
I want to split the whole thing into timestamp sections and search in every one of them for certain values. That is, first isolate the section between the #0 and #2211 timestamps, then the section between #2211 and #2296, and so on.
I am trying to do this with Python in the following way.
import re

search_space = """
#0
b10000011#
0$
1%
0&
1'
0(
0)
#2211
0'
#2296
b0#
1$
#2302
0$"""
# the "delimiter"
timestamp_regex = r"\#[0-9]+(.*)\#[0-9]+"
for match in re.finditer(timestamp_regex, search_space, flags=re.DOTALL|re.MULTILINE):
    print(match.groups())
But it has no effect. What is the proper way to handle such a scenario with the re package?

You need to use a lazy quantifier ? here.
I made some little changes like this:
timestamp_regex = r"(\#[0-9]+)(.+?)(?=\#[0-9]+|\Z)"
for match in re.finditer(timestamp_regex, search_space, flags=re.DOTALL|re.MULTILINE):
    print(f"section: {match.group(1)}\nchunk:{match.group(2)}\n----")
output:
section: #0
chunk:
b10000011#
0$
1%
0&
1'
0(
0)
----
section: #2211
chunk:
0'
----
section: #2296
chunk:
b0#
1$
----
section: #2302
chunk:
0$
----
Check the pattern at Regex101
Details:
(\#[0-9]+) - 1st capturing group consisting of # and one or more digits
(.+?) - 2nd capturing group - match anything one or more times non-greedy (match as little as possible)
(?=\#[0-9]+|\Z) - Positive lookahead on \#[0-9]+ OR \Z which is the end of your input string (2nd capturing group is followed by either another section or the end of string). End of string is needed here because for the last section there is only the chunk and no following #[0-9]+, so the chunk is followed by end of string.
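To then search each section for values, it can help to collect the chunks into a dictionary first. The following is a sketch along the same lines, not a full VCD parser. It assumes a timestamp always sits alone on its own line, so the pattern is anchored to whole lines; the name sections is illustrative.

```python
import re

search_space = """#0
b10000011#
0$
1%
0&
1'
0(
0)
#2211
0'
#2296
b0#
1$
#2302
0$"""

# A timestamp is a whole line made of '#' plus digits. Anchoring with ^ and $
# means a '#' inside a value line is never treated as a delimiter.
timestamp_regex = re.compile(
    r"^#(\d+)$\n(.*?)(?=^#\d+$|\Z)", flags=re.DOTALL | re.MULTILINE
)

# Map each timestamp (as an int) to the list of value lines that follow it.
sections = {
    int(m.group(1)): m.group(2).split()
    for m in timestamp_regex.finditer(search_space)
}

for timestamp, values in sections.items():
    print(timestamp, values)
```

With the data in this shape, a lookup such as "0$" in sections[2302] is an ordinary list membership test.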

How to extract the substring preceding marker?

I have a string:
[3016] - Device is ready...
[10ice is loading..13] - v3[3016] - Device is ready...
[1r 0.[3016] - Device is ready.
Everything except '[3016] - Device is ready...' is 'noise'
The key word here is "Device is ready"
3016 - timestamp in msec. I need to extract '3016' from string for further operations
I tried the following:
if "Device is ready" in reply:
    # set a pattern for extracting time from the result
    found = re.findall(r"\[.*\]", reply)
    # Cut timestamp from reply
    x = [tm[1:-1] for tm in found]
If the reply is 'clean' ([3016] - Device is ready...) this works, but if there is 'noise' in the reply it doesn't. Can someone point me in the right direction or perhaps assist with the code? Thanks in advance.

If there is a single key, and it should precede the marker Device is ready, you can capture the digits first.
\[(\d+)].*\bDevice is ready\b
The pattern matches:
\[(\d+)] Capture 1+ digits between square brackets in group 1
.* Match 0+ times any char
\bDevice is ready\b and then Device is ready
Regex demo | Python demo
import re

strings = [
    "[3016] - Device is ready...",
    "[10ice is loading..13] - v3[3017] - Device is ready...",
    "[1r 0.[3018] - Device is ready.",
    "[1r 0 - Device is ready. [3019]",
]
pattern = r"\[(\d+)].*\bDevice is ready\b"
for s in strings:
    match = re.search(pattern, s)
    if match:
        print(match.group(1))
Output
3016
3017
3018

You should use a regex group () to extract the number. found will be a list of all the numbers found inside []:
if "Device is ready" in reply:
    # set a pattern for extracting time from the result
    found = re.findall(r"\[(\d+)\]", reply)
    print(found[0])
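If the noise can itself contain a complete bracketed number, found[0] may pick up the wrong one. A possible variation, sketched here with the made-up names READY and ready_timestamp, ties the captured number to the marker by not allowing another "[" between them. It assumes the text between the timestamp and the marker never contains "[".

```python
import re

# Bracketed digits, then text with no further '[' before the marker,
# so the captured number is the bracketed one nearest to the marker.
READY = re.compile(r"\[(\d+)\][^\[]*?Device is ready")

def ready_timestamp(reply):
    """Return the timestamp in msec as an int, or None if nothing matches."""
    match = READY.search(reply)
    return int(match.group(1)) if match else None

replies = [
    "[3016] - Device is ready...",
    "[10ice is loading..13] - v3[3017] - Device is ready...",
    "[12] booting [3018] - Device is ready.",
    "[1r 0 - Device is ready. [3019]",
]
for reply in replies:
    print(ready_timestamp(reply))
```

Returning None for a reply without a usable timestamp lets the caller decide whether to retry or report an error.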

Python Print Table for Term and Definition with Handled Overflow

I'm trying to make a program that prints out a two column table (Term and Definition) something like this: (table width should be 80 characters)
+--------------------------------------------------------------------------+
|                 Term                 |            Definition             |
+--------------------------------------+-----------------------------------+
| this is the first term.              |This is the definition for thefirst|
|                                      |term that wraps around because the |
|                                      |definition is longer than the width|
|                                      |of the column.                     |
+--------------------------------------+-----------------------------------+
|The term may also be longer than the  |This is the definition for the     |
|width of the column and should wrap   |second term.                       |
|around as well.                       |                                   |
+--------------------------------------+-----------------------------------+
I have existing code for this (I also tried the textwrap module), but it prints "this is the first term" on every line because I used a nested for loop. Here is the code that I have:
# read file
with open(setsList[selectedSet-1], "r", newline="") as setFile:
    cardList = list(csv.reader(setFile))
setFile.close()
for i in range(len(cardList)):
    wrapped_term = textwrap.wrap(cardList[i][0], 30)
    wrapped_definition = textwrap.wrap(cardList[i][1], 30)
    for line in wrapped_term:
        for line2 in wrapped_definition:
            print(line, " ",line2)
    print("- - - - - - - - - - - - - - - - - - - - - - - - - - -")
Can anyone suggest a solution? Thank you.

After a lot of trial and error and random YouTube videos, here is the solution, in case anyone has a similar problem:
import csv
import textwrap

with open("table.csv", "r", newline="") as setFile:
    cardList = list(csv.reader(setFile))
print("+------------------------------------------------------------------------------+")
print("|" + "Term".center(38) + "|" + "Definition".center(39) + "|")
print("+------------------------------------------------------------------------------+")
print()
# Text widths chosen so each row lines up with the 38- and 39-dash borders below.
column1 = 35
column2 = 36
for x in range(len(cardList)):
    wrapped_term = textwrap.wrap(cardList[x][0], 30)
    wrapped_definition = textwrap.wrap(cardList[x][1], 30)
    wrapped_list = []
    # Loop over the longer of the two columns so no wrapped line is dropped.
    for i in range(max(len(wrapped_term), len(wrapped_definition))):
        try:
            wrapped_list.append([wrapped_term[i], wrapped_definition[i]])
        except IndexError:
            if len(wrapped_term) > len(wrapped_definition):
                wrapped_list.append([wrapped_term[i], ""])
            elif len(wrapped_term) < len(wrapped_definition):
                wrapped_list.append(["", wrapped_definition[i]])
    print("+--------------------------------------+---------------------------------------+")
    for item in wrapped_list:
        print("|", item[0], " "*(column1 - len(item[0])),"|", item[1], " "*(column2-len(item[1])), "|")
print("+--------------------------------------+---------------------------------------+")
print("* *")
Basically, I created a wrapped version of each term and definition.
The try/except block then checks whether the term has more lines than the definition and, if so, adds blank lines for the definition, and vice versa.
The combined term and definition lines are stored in wrapped_list.
With help from this video (https://www.youtube.com/watch?v=B9BRuhqEb2Q), I formatted the table.
I hope this helps anyone struggling with a similar problem. It can be applied to any number of columns in a table and any length of CSV file.
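The pairing of wrapped lines can also be written with itertools.zip_longest, which pads the shorter column automatically. This is only a sketch: the column widths (38 and 35) are taken from the sample table in the question, and the names table_rows, TERM_WIDTH, DEF_WIDTH and BORDER are illustrative.

```python
import textwrap
from itertools import zip_longest

TERM_WIDTH, DEF_WIDTH = 38, 35  # assumed from the sample table in the question
BORDER = "+" + "-" * TERM_WIDTH + "+" + "-" * DEF_WIDTH + "+"

def table_rows(term, definition):
    """Wrap both cells and pair the wrapped lines, padding the shorter column."""
    term_lines = textwrap.wrap(term, TERM_WIDTH)
    def_lines = textwrap.wrap(definition, DEF_WIDTH)
    return [
        f"|{t:<{TERM_WIDTH}}|{d:<{DEF_WIDTH}}|"   # left-align and pad each cell
        for t, d in zip_longest(term_lines, def_lines, fillvalue="")
    ]

cards = [
    ("this is the first term.",
     "This is the definition for the first term that wraps around because "
     "the definition is longer than the width of the column."),
    ("The term may also be longer than the width of the column and should "
     "wrap around as well.",
     "This is the definition for the second term."),
]

print(BORDER)
for term, definition in cards:
    for row in table_rows(term, definition):
        print(row)
    print(BORDER)
```

Padding through the format specification keeps every row the same length as the border, so the width arithmetic does not have to be done by hand.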

Python3 Renaming Files By tkinter Listbox

I want to rename all files in a directory by tkinter listbox.
Got stuck at this point:
files_list = os.listdir(root.foldername)
print(files_list)
gives me
['1.mp4', '10.mp4', '2.mp4', '3.mp4', '4.mp4', '5.mp4', '6.mp4', '7.mp4', '8.mp4', '9.mp4']
values = [listbox.get(idx) for idx in listbox.curselection()]
And
inlist = (', '.join(values))
print(inlist)
gives me
Lost - 1x01 - Pilot(1), Lost - 1x02 - Pilot(2), Lost - 1x03 - Tabula Rasa, Lost - 1x04 - Walkabout, Lost - 1x05 - White Rabbit, Lost - 1x06 - House Of The Rising Sun, Lost - 1x07 - The Moth, Lost - 1x08 - Confidence Man, Lost - 1x09 - Solitary, Lost - 1x10 - Raised By Another
Now I'm looking for a way to use os.rename to rename the files 1.mp4 through 10.mp4.
Additionally, Python for whatever reason does not come with a built-in natural sort, so it sorts 10.mp4 directly after 1.mp4.
Thank you very much in advance.

For natural sorting take a look at Sorting alphanumeric strings in Python.
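Then loop through all files and rename them, e.g.
for i in range(len(files_list)):
    # os.listdir returns bare names, so join them with the folder path
    old_file_name = os.path.join(root.foldername, files_list[i])
    new_file_name = os.path.join(root.foldername, values[i] + '.mp4')
    os.rename(old_file_name, new_file_name)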
For assistance in dealing with pathnames see os.path.
