Simplified example
| summarize event_count=count() by State
| where State matches regex "K.*S"
| where event_count > 10
| project State, event_count
OUTPUT:
State | event_count
KANSAS | 3166
ARKANSAS | 1028
LAKE SUPERIOR | 34
In the above example, a search is performed and the output is restricted to rows where the regex matches. Instead, I would like to exclude any events where the regex matches; here, that means returning all events that don't match "K.*S". The documentation shows "contains" and "!contains" as well as "has" and "!has", but I am unable to find a "!matches regex" counterpart to the matches regex operator. How do I exclude events from a search where a regex matches? In other words, how do I return only the events where the regex did not match? This would help me filter out false-positive alerts from my rules.
I have tried the !contains and !has operators, but when the pattern is too complicated to express without a regular expression, I have not found a solid workaround. My goal is to filter out any events from my search that match a regular expression.
You can use the not() function:
| where not(State matches regex "K.*S")
You should also move the filter before the summarize statement, so that rows are excluded before they are counted.
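To make the order of operations concrete, here is a rough Python sketch of the same idea (this is not KQL): drop every row where the regex matches, then aggregate what is left. The sample rows and variable names are made up for illustration; re.search is used because the pattern may match anywhere in the value, as in the question's output.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for the State column of the event table.
events = ["KANSAS", "KANSAS", "TEXAS", "ARKANSAS", "OHIO", "TEXAS"]

pattern = re.compile(r"K.*S")

# Step 1: filter first, keeping only rows where the regex does NOT match.
kept = [state for state in events if not pattern.search(state)]

# Step 2: aggregate the remaining rows (the "summarize ... by State" step).
event_count = Counter(kept)

print(event_count)
```

Filtering before aggregating means the excluded rows never reach the counting step at all.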
I need to search cells in Excel for multiple terms and get back, for each cell, which of the terms matched.
I know there is a formula combination to search for multiple terms, but it does not give me the matched term back. The example below only returns "0" or "1".
=IF(ISNUMBER(SEARCH({"TermA","TermB","TermC"},A1)),"1","0")
| | A | B |
| 1 | This is TermA | TermA |
| 2 | Some TermB Text | TermB |
| 3 | And TermA Text | TermA |
| 4 | another TermC | TermC |
Background: I have to normalize some values, so I am looking for a formula that can identify the values and list the match. The search terms should later live on another sheet so the list can be extended easily.
Thanks for any hints that point me in the right direction.
To return all matching terms:
=INDEX(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[.='TermA' or .='TermB' or .='TermC']"),COLUMN(A1))
Wrap in an IFERROR() if no match is found at all.
If you have Excel O365 and refer to a range, things get a lot easier:
Formula in E1:
=TRANSPOSE(FILTER(C$1:C$3,ISNUMBER(FIND(C$1:C$3,A1))))
=INDEX(FILTER(C:C,C:C<>""),MATCH(1, COUNTIF(A1, "*"&FILTER(C:C,C:C<>"")&"*"), 0))
This is for the Office 365 version. In earlier versions, replace FILTER(C:C,C:C<>"") with C$1:C$4 for your example, or with whatever range holds your search values. A table reference is also possible.
The formula checks whether the text in A1 contains any value from your list, anywhere in the text, and returns the first value that matches.
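For comparison, the same lookup can be sketched outside Excel. The Python below is only an illustration of the logic (check each term against the cell text, return the ones that occur); the function name matched_terms is made up, and the comparison is case-insensitive on the assumption that SEARCH-like behaviour is wanted.

```python
def matched_terms(text, terms):
    """Return every term that occurs anywhere in text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


# The term list is kept separate from the data so it can be extended easily.
terms = ["TermA", "TermB", "TermC"]
cells = ["This is TermA", "Some TermB Text", "And TermA Text", "another TermC"]

for cell in cells:
    hits = matched_terms(cell, terms)
    # Print the first match, or an empty string when nothing matched.
    print(cell, "|", hits[0] if hits else "")
```

Returning the whole list rather than only the first hit covers both variants discussed above: all matching terms, or just the first one.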
This is my very first question linking to my first Python project.
To put it simply, I have 2 columns of data in Excel like this (first 6 rows):
destination_area | destination_code
SG37.D0 | SG37.D
SG30.C0 | SG30.C
SG4.A3.P | SG4.A
SG15.C16 | SG15.C
SG35.D02 | SG35.D
SG8.A5.BC | SG8.A
So in Excel, I'm using a function to get destination code by finding first "." in the cell & return all characters from the left of it, plus 1 character:
=IfError(left(E2,search(".",E2)+1),"")
Now I want to execute it using str.extract
df1['destination_code'] = df1['destination_area'].str.extract(r"(?=(.*[0-9][.][A-Z]))", expand = False)
print(df1['destination_area'].head(6),df1['destination_code'].head(6))
I almost got what I need, but the code still returns too much for values that contain more than one ".":
destination_area | destination_code
SG37.D0 | SG37.D
SG30.C0 | SG30.C
SG4.A3.P | SG4.A3.P
SG15.C16 | SG15.C
SG35.D02 | SG35.D
SG8.A5.BC | SG8.A5.BC
I realize that my regex matches the pattern {a number + "." + a letter} as late in the string as possible, which returns all characters for "SG4.A3.P" and "SG8.A5.BC".
So how should I modify my code? Or is there a better way to do what the Excel formula does? Thanks in advance.
No need for a lookahead. Use
df1['destination_code'] = df1['destination_area'].str.extract(r"^([^.]+\..)", expand=False)
Mind the capturing group: str.extract returns the text captured by it, which is exactly the value you need here.
Explanation:
--------------------------------------------------------------------------------
  ^          the beginning of the string
--------------------------------------------------------------------------------
  (          group and capture to \1:
--------------------------------------------------------------------------------
  [^.]+      any character except: '.' (1 or more times, matching the most amount possible)
--------------------------------------------------------------------------------
  \.         '.'
--------------------------------------------------------------------------------
  .          any character except \n
--------------------------------------------------------------------------------
  )          end of \1
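A quick way to check the pattern against the six sample values from the question, using only the standard re module (no DataFrame involved, so the list names below are just for illustration):

```python
import re

# Same pattern as in the answer: everything up to the first '.',
# the '.' itself, and exactly one more character.
pattern = re.compile(r"^([^.]+\..)")

areas = ["SG37.D0", "SG30.C0", "SG4.A3.P", "SG15.C16", "SG35.D02", "SG8.A5.BC"]

codes = []
for area in areas:
    match = pattern.search(area)
    # Use group 1, the capturing group; None when there is no '.' at all.
    codes.append(match.group(1) if match else None)

for area, code in zip(areas, codes):
    print(area, "|", code)
```

Because [^.]+ cannot cross a ".", the match always stops at the first dot, which is what the Excel formula with search(".",E2)+1 does.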
I'm trying to work with manually inserted strings in SAS and I need to remove specific special characters (maybe by inserting a list of them) without removing blank spaces between words.
I've found a possible solution with a combination of compbl and transtrn to remove special characters and substitute them with blanks, reduced to one by compbl but this requires multiple steps.
I'm wondering if there is a function that allows me to do this in a single step. I've tried with the compress function (with the 'k' modifier to keep only letters and digits) but it removes blanks between words.
I'd like to go from a string like this one:
O'()n?e /, ^P.iece
To:
One Piece
With a single blank between the two words.
If someone can help me it would be awesome!
Use the following modifiers for the compress function:
k -- keep the listed characters instead of removing them
a -- alphabetic characters
s -- space characters
d -- digits
Then apply the COMPBL function to the result to collapse multiple blanks into one.
Code:
data have;
value="O'()n?e /, ^P.iece";
run;
data want;
set have;
value_want=compbl(compress(value,,"kasd"));
run;
So:
+--------------------+------------+
| value | value_want |
+--------------------+------------+
| O'()n?e /, ^P.iece | One Piece |
+--------------------+------------+
You could use regex and prxchange.
data have;
value="O'()n?e /, ^P.iece";
run;
data want;
set have;
value_want=prxchange("s/\s\s+/ /",-1,prxchange("s/[^a-zA-Z0-9\s]*//",-1,value));
run;
Result:
+--------------------+------------+
| value | value_want |
+--------------------+------------+
| O'()n?e /, ^P.iece | One Piece |
+--------------------+------------+
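Both answers follow the same two steps: first remove everything that is not a letter, digit or whitespace, then collapse runs of whitespace into a single blank. As a language-neutral check of that logic, here is a small Python sketch (the function name clean is made up; this is not SAS code):

```python
import re


def clean(value):
    """Drop special characters, then squeeze repeated whitespace to one blank."""
    # Step 1: keep only letters, digits and whitespace (like compress with "kasd").
    kept = re.sub(r"[^A-Za-z0-9\s]", "", value)
    # Step 2: replace each run of two or more whitespace characters with one blank.
    return re.sub(r"\s{2,}", " ", kept)


print(clean("O'()n?e /, ^P.iece"))
```

The order matters: removing the special characters is what creates the double blank between the words, so the collapsing step has to come second.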
I've searched for a while, but all the examples I find are the opposite of what I need. There are many ways to check whether a string with wildcards matches any of the values in an array, but I need to go the other way: the array contains the wildcards, and I need to check whether the string in the target cell matches any of the patterns in the array.
To put it in context, I am parsing large log files, and there are many lines I wish to ignore (but not delete); so I have a helper column:
+---+-------+----------------------------------------+----------------------------+
| | A | B | C (filter for = FALSE) | Requirement
+---+-------+----------------------------------------+----------------------------+
| 1 | 11:00 | VPN Status | =COUNTIF(IgnoreList,B1)>0 + Keep
| 2 | 11:05 | Log at event index 118, time index 115 | =COUNTIF(IgnoreList,B2)>0 + Ignore
| 3 | 11:20 | Log at event index 147, time index 208 | =COUNTIF(IgnoreList,B3)>0 + Ignore
+---+-------+----------------------------------------+----------------------------+
I've tried to put wildcards in my IgnoreList range to catch any of the "Log at event" lines:
+--------------------------------------+
| IgnoreList +
+--------------------------------------+
| State Runtime 1 +
| State Runtime 2 +
| State Runtime 3 +
| State Runtime 4 +
| Log at event index *, time index * +
+--------------------------------------+
... but this isn't working.
Does anyone know how to check a cell against an array containing wildcards?
My IgnoreList has 60 entries so far, so testing each cell individually isn't really feasible. I could have 30,000 or more entries in the log, so individual testing will be a lot more formulas than I'd hoped to use. I also don't want to edit the formulae when I add an entry to the IgnoreList.
Thanks for your help!
Use SEARCH, which allows wildcard lookups, inside SUMPRODUCT:
=SUMPRODUCT(--ISNUMBER(SEARCH(IgnoreList,B1)))>0
To use COUNTIF one would need to reverse the criteria and wrap in SUMPRODUCT:
=SUMPRODUCT(COUNTIF(B1,IgnoreList))>0
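The direction of the test (one string checked against many wildcard patterns) can be sketched in Python with the standard fnmatch module. This is only an illustration of the logic, not an Excel feature: the function name is_ignored is made up, fnmatch matches the whole string (like the COUNTIF variant, not like SEARCH), and it also gives [ and ] a special meaning that Excel wildcards do not have.

```python
from fnmatch import fnmatchcase

# Patterns taken from the IgnoreList in the question; '*' matches any run of characters.
ignore_list = [
    "State Runtime 1",
    "State Runtime 2",
    "State Runtime 3",
    "State Runtime 4",
    "Log at event index *, time index *",
]


def is_ignored(text, patterns):
    """True if text matches at least one wildcard pattern (case-insensitive)."""
    lowered = text.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


for line in ["VPN Status", "Log at event index 118, time index 115"]:
    print(line, "|", is_ignored(line, ignore_list))
```

Adding an entry to ignore_list changes the result without touching the check itself, which mirrors the requirement of not editing formulas when the IgnoreList grows.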
Some (ascii) reports I produce contain ascii tables, like this one:
+------+------+------+
| col1 | col2 | col3 |
+======+======+======+
| bla | bla | bla |
| bla | bla | bla |
| bla | bla | bla |
+------+------+------+
I am trying to find a way to highlight such tables using a vim syntax file. A simple highlighting should suffice - no need to distinguish between the |, the =, the + and the -. However, I do not want to highlight the words inside the table (only the skeleton), and I do not want to highlight -, = signs (etc.) outside of the table.
The problem with vim syntax files is that they have no way of determining what's "up" or "down" relative to a given point. I would be OK with just highlighting per line, for example lines like this:
+------+------+------+
even if they do not form complete tables, but the problem is with lines like this:
| col1 | col2 | col3 |
which may be mixed with non-tabular code, like this Python code:
x = y\
| z | u | v # | is here for 'or'
Can you think of a more elegant way of doing this? I've seen some highlighters (other than vim) which highlight tables quite well...
You can solve this with containment, cf. :help :syn contains. First, define a region that spans the entire range of lines that make up the table. I'm using a simplistic pattern for the header / footer line here, and assert that there's no | immediately above / below (in the neighboring line); refine this as needed:
syn region tableRegion start="|\@<!\n+[-+]*+$" end="^+[-+]*+\n|\@!" contains=tableRow
Then, define the (again, simplistic here) pattern to match the table rows, and mark this contained, so it will only match inside other syntax regions that contains= it.
syn match tableRow "^|.*|$" contained
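The containment idea (rows only count when they sit between border lines) can also be tried out outside Vim. The Python below is a loose sketch of that logic, not a translation of the syntax rules: the names BORDER, ROW and table_line_indices are made up, and the border pattern also accepts = so that the header separator from the question is covered.

```python
import re

BORDER = re.compile(r"^\+[-+=]*\+$")  # e.g. +------+------+ or +======+======+
ROW = re.compile(r"^\|.*\|$")         # e.g. | col1 | col2 |


def table_line_indices(lines):
    """Return indices of lines inside a block that starts and ends with a border."""
    indices = []
    i = 0
    while i < len(lines):
        if not BORDER.match(lines[i]):
            i += 1
            continue
        # Extend the candidate block over consecutive border and row lines.
        j = i
        while j < len(lines) and (BORDER.match(lines[j]) or ROW.match(lines[j])):
            j += 1
        # Trim trailing rows so that the block ends on a border line.
        end = j
        while end > i and not BORDER.match(lines[end - 1]):
            end -= 1
        # A lone border line is not a table.
        if end - i >= 2:
            indices.extend(range(i, end))
        i = j
    return indices


sample = [
    "x = y\\",
    "    | z | u | v  # | is here for 'or'",
    "+------+------+",
    "| col1 | col2 |",
    "+======+======+",
    "| bla  | bla  |",
    "+------+------+",
    "| stray | row |",
]
print(table_line_indices(sample))
```

The row pattern is never applied on its own; it is only consulted once a border line has opened a block, which is the same role the contained keyword plays in the syntax rules above.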