How to replace repeated words with single word


I have a string variable response:
where where where is it
I'm going there
where where did you say
sometimes it is where you think
i think its where where you go
its everywhere where you are
i am planning on going where where where i want to
As you can see, the word "where" is repeated quite often. I want to replace strings "where where" and "where where where" (or even "where where where where") with "where".
However, I don't want to replace "everywhere where" with "where".
I know I can do this manually, but I was hoping to condense the code into as few lines as possible.
This is what I have been trying so far:
gen temp = regexr(response, " (where)+ where ", " where ")
replace temp = regexr(response, "^(where)+ where ", "where ")
These are my results after running the code above:
where where is it
I'm going there
where did you say
sometimes it is where you think
i think its where where you go
its everywhere where you are
i am planning on going where where where i want to
Instead, I want the final data to look like this:
where is it
I'm going there
where did you say
sometimes it is where you think
i think its where you go
its everywhere where you are
i am planning on going where i want to
I have been using "(where)+" to capture both "where where" and "where where where" but it doesn't seem to work. I also split the code into two commands, one begins with "^(where)" and the other with " (where)" in order to avoid capturing the 'where' in "everywhere" but it seems as if the code does not capture "where where" when it occurs in the middle of the sentence.

A quick fix using Stata's string functions is the following:
clear
input str50 string1
"where where where is it"
"I'm going there"
"where where did you say"
"sometimes it is where you think"
"i think its where where you go"
"its everywhere where you are"
"i am planning on going where where where i want to"
end
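* tag1 = 1 unless the line contains "everywhere where", which must be left alone
generate tag1 = !strmatch(string1, "*everywhere where*")
* tag2 = number of occurrences of "where" (characters removed, divided by 5, the length of "where")
generate tag2 = ( length(string1) - length(subinstr(string1, "where", "", .)) ) / 5
* remove the first tag2-1 occurrences of "where", then collapse the leftover blanks
generate string2 = cond(tag1 == 1, stritrim(subinstr(string1, "where", "", tag2-1)), string1)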
list string2, separator(0)
     +----------------------------------------+
     | string2                                |
     |----------------------------------------|
  1. | where is it                            |
  2. | I'm going there                        |
  3. | where did you say                      |
  4. | sometimes it is where you think        |
  5. | i think its where you go               |
  6. | its everywhere where you are           |
  7. | i am planning on going where i want to |
     +----------------------------------------+
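For comparison, the same collapse can be written as a single regular expression with word boundaries. The sketch below uses Python's re module rather than Stata, purely to illustrate the pattern; the function name collapse_repeats is made up for this example. The \b anchors are what stop the "where" inside "everywhere" from being treated as a separate word.

```python
import re

def collapse_repeats(text, word="where"):
    """Collapse runs of `word` separated by whitespace into one occurrence."""
    # \b(word)(?:\s+word)+\b : the word, followed by one or more repeats of itself
    pattern = r"\b({0})(?:\s+{0})+\b".format(re.escape(word))
    return re.sub(pattern, r"\1", text)

responses = [
    "where where where is it",
    "I'm going there",
    "where where did you say",
    "sometimes it is where you think",
    "i think its where where you go",
    "its everywhere where you are",
    "i am planning on going where where where i want to",
]

for response in responses:
    print(collapse_repeats(response))
```

Unlike the two-step attempt in the question, one pattern covers runs at the start, in the middle, and at the end of a line.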

Related

KQL query, how to extend information from rendereddescription

I want to extend the query result with specific values, but I do not know how to pull out only a fragment of the information. For example, from the "rendereddescription" column I only need the value of "server_principal_name", assigned to a new column such as "user". I know this should be done with | extend "variable name" = ..., but I do not know what the syntax is.

You can use the parse operator: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/parseoperator
For example:
print RenderDescription = @"... 0000000000 session_server_principal_name:ABB\HPAM-TCS-DB10 server_principal_sid:01050000000 ...."
| parse RenderDescription with * "session_server_principal_name:" session_server_principal_name " " *
RenderDescription | session_server_principal_name
... 0000000000 session_server_principal_name:ABB\HPAM-TCS-DB10 server_principal_sid:01050000000 .... | ABB\HPAM-TCS-DB10
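To make the behaviour of that parse pattern concrete, here is a rough Python equivalent (an illustration only, not KQL). It looks for the literal prefix and captures everything up to the next space. The helper name extract_principal is hypothetical.

```python
import re

def extract_principal(rendered_description):
    """Return the text after 'session_server_principal_name:' up to the next space, or '' if absent."""
    match = re.search(r"session_server_principal_name:(.*?) ", rendered_description)
    return match.group(1) if match else ""

# Raw string so the backslash in the account name is kept literally.
sample = r"... 0000000000 session_server_principal_name:ABB\HPAM-TCS-DB10 server_principal_sid:01050000000 ...."
print(extract_principal(sample))
```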

Kusto: remove non-matching rows when using the parse operator

I'm querying Azure Log Analytics using Kusto, extracting fields with the parse operator, and then keeping only the records which parsed correctly:
traces
| parse message with "Search found " people " people in " groupCount " groups"
| where people != "" and groupCount != ""
| order by n desc
Is there a more terse way of parsing and dropping non-matching rows? If I am parsing out a lot of columns from a set of logs, maybe containing partial matches, this connascence between the parse and where gets fiddly.
By comparison, in SumoLogic, the parse operator automatically drops all rows which don't match a parsed pattern, which makes for really tidy pipelines:
*
| parse "Search found * people in * groups" as people, groupCount
| order by n desc

In Kusto, the 'parse' operator does not automatically filter out rows that do not match the provided pattern; it works like 'extend', only adding columns.
If you would like to filter out specific rows, the recommendation is to use the 'where' operator before 'parse'. This also improves performance, as 'parse' will have fewer rows to scan.
traces
| where message startswith 'Search found'
| parse message with "Search found " people " people in " groupCount " groups"
...

There's now a built-in operator that will do this: parse-where
https://learn.microsoft.com/en-us/azure/kusto/query/parsewhereoperator
It has syntax just like parse, but will omit from its output any records which didn't match the parse pattern.
So the query:
traces
| parse message with "Search found " people " people in " groupCount " groups"
| where people != "" and groupCount != ""
| order by n desc
becomes:
traces
| parse-where message with "Search found " people " people in " groupCount " groups"
| order by n desc
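The difference between the two operators can be sketched outside Kusto. The Python below is only an approximation of the semantics described above, using a regular expression in place of the parse pattern; the function names parse and parse_where are chosen for this example, and the sample messages are invented.

```python
import re

# Stand-in for: "Search found " people " people in " groupCount " groups"
PATTERN = re.compile(r"Search found (.*?) people in (.*?) groups")

def parse(messages):
    """Like parse: keep every row, filling the new columns with '' when nothing matches."""
    rows = []
    for message in messages:
        match = PATTERN.match(message)
        people, group_count = match.groups() if match else ("", "")
        rows.append({"message": message, "people": people, "groupCount": group_count})
    return rows

def parse_where(messages):
    """Like parse-where: keep only the rows where the pattern matched."""
    rows = []
    for message in messages:
        match = PATTERN.match(message)
        if match:
            rows.append({"message": message, "people": match.group(1), "groupCount": match.group(2)})
    return rows

messages = ["Search found 12 people in 3 groups", "Unrelated log line"]
print(len(parse(messages)))
print(len(parse_where(messages)))
```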

How to get ordered, defined or all columns except or after or before a given column

In BASH
I run the following one-liners to get an individual column/field after splitting on a given character (one can use AWK instead to split on more than one character, i.e. on a word).
# This gives the first field, 'lori', after splitting the string on the '-' character
echo "lori-chuck-shenzi" | cut -d'-' -f1
# This will give me 'chuck'
echo "lori-chuck-shenzi" | cut -d'-' -f2
# This will give me 'shenzi'
echo "lori-chuck-shenzi" | cut -d'-' -f3
# This will give me 'chuck-shenzi' i.e. all columns after 2nd and onwards.
echo "lori-chuck-shenzi" | cut -d'-' -f2-
Notice the last command above. How can I do the same as that last cut command in Groovy?
For ex: if the contents are in a file and they look like:
1 - a
2 - b
3 - c
4 - d
5 - e
6 - lori-chuck shenzi
7 - columnValue1-columnValue2-columnValue3-ColumnValue4
I tried the following Groovy code, but it does not give me lori-chuck shenzi. For the 6th bullet, I want everything after the first occurrence of -, i.e. lori-chuck shenzi, but the script returns just lori (which I know is the correct result for index [1] in the code below).
def file = "/path/to/my/file.txt"
File textfile = new File(file)
// now read each line from the file (using the file handle we created above)
textfile.eachLine { line ->
    //list.add(line.split('-')[1])
    println "Bullet entry full value is: " + line.split('-')[1]
}
// return list
Also, for the last line in the file above, is there an easy way in Groovy to change the order of the columns after they are split, e.g. reverse them or slice them the way Python does with [1:], [:1], [:-1], etc.?

I don't like this solution, but I did this to get it working. After taking the elements at [1..-1] (i.e. from index 1 onwards, excluding index 0, which is the left-hand side of the first occurrence of the - character), I had to get rid of the list brackets [ and ] by using join(',') and then replacing every , with a - to get the result I was looking for.
list.add(line.split('-')[1..-1].join(',').replaceAll(',','-'))
I would still like to know a better solution, and how this can work for cherry-picking individual columns in a given order (instead of writing a separate Groovy statement to pick each element from the string/list).

If I'm understanding your question correctly, what you want is:
line.split('-')[1..-1]
This will give you from position 1 to the last. You can do -2 (next to last) and so on, but just be aware that you can get an ArrayIndexOutOfBoundsException moving backwards too, if you go past the beginning of your array!
-- Original answer is above this line --
Adding to my answer, since comments don't allow code formatting. If all you want is to pick specific columns, and you want a string in the end, you could do something like:
def resultList = line.split('-')
def resultString = "${resultList[1]}-${resultList[2]} ${resultList[3]}"
and pick whatever columns you want that way. I thought you were looking for a more generic solution, but if not, specific columns are easy!
If you want the first value, a dash, then the rest joined by spaces, just use:
"${resultList[1]}-${resultList[2..-1].join(" ")}"
I don't know how to give you specific answers for every combination you might want, but basically once you have your values in a list, you can manipulate that however you want, and turn the results back into a string with GStrings or with .join(...).
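As a language-neutral illustration of the same split, select, join idea (everything after the first delimiter, and cherry-picking columns in a chosen order), here is a small Python sketch. It is not Groovy, and the helper names after_first and pick are invented for this example.

```python
def after_first(line, sep="-"):
    """Everything after the first `sep`, roughly like `cut -d'-' -f2-`.

    Returns '' when the separator is absent (cut would print the whole line).
    """
    return line.split(sep, 1)[1] if sep in line else ""

def pick(line, indexes, sep="-"):
    """Return the chosen columns (0-based), in the order given, re-joined with `sep`."""
    columns = line.split(sep)
    return sep.join(columns[i] for i in indexes)

print(after_first("lori-chuck-shenzi"))
print(after_first("6 - lori-chuck shenzi").strip())
print(pick("lori-chuck-shenzi", [2, 0]))
```

Limiting the split to the first delimiter avoids having to re-join the remaining pieces afterwards.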

how to split a string or make chars in vb 2010

I searched but nothing explains how to do this,
for example
Dim sentence as String = "cat is an animal"
If I show it in a message box:
MsgBox(sentence)
it shows
cat is an animal
How can I make a message box that shows
cat
is
an
animal.

The easy way is to replace each space with a new line, as in
Dim words As String = MyString.Replace(" ", vbCrLf)
Split would break the string on spaces into an array, which you would then have to join back up with new lines; that is pointless unless you need the array for something else.
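The two approaches can be compared side by side. The sketch below is in Python rather than VB, only to show that replacing spaces and splitting then re-joining give the same text for single-spaced input.

```python
sentence = "cat is an animal"

# Approach 1: replace each space with a line break.
replaced = sentence.replace(" ", "\n")

# Approach 2: split on spaces into a list, then join the list with line breaks.
words = sentence.split(" ")
joined = "\n".join(words)

print(replaced)
print(words)  # the list is only worth building if it is needed elsewhere
```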

Lua string.match() problem

I want to match a few lines for a string and a few numbers.
The lines can look like
" Code : 75.570 "
or
" ..dll : 13.559 1"
or
" ..node : 4.435 1.833 5461"
or
" ..NavRegions : 0.000 "
I want something like
local name, numberLeft, numberCenter, numberRight = line:match("regex");
But I'm very new to the string matching.

This pattern will work for every case:
%s*([%w%.]+)%s*:%s*([%d%.]+)%s*([%d%.]*)%s*([%d%.]*)
Short explanation: [] makes a set of characters (for example the digits and the dot). The last two numbers use [set]* so an empty match is valid too. This way a number that isn't found is captured as an empty string, which tonumber() turns into nil.
Note the difference between using + - * in patterns. More about patterns in the Lua reference.
This will match any combination of dots and digits, so it might be useful to try and convert it to a number with tonumber() afterwards.
Some test code:
s={
" Code : 75.570 ",
" ..dll : 13.559 1",
" ..node : 4.435 1.833 5461",
" ..NavRegions : 0.000 "
}
for k,v in pairs(s) do
    print(v:match('%s*([%w%.]+)%s*:%s*([%d%.]+)%s*([%d%.]*)%s*([%d%.]*)'))
end
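For readers more used to regular expressions, the same idea (one required number, two optional ones that may be captured as empty strings) can be sketched in Python. This is a translation for illustration, not Lua; the name parse_line is made up, and [A-Za-z0-9.] stands in for Lua's [%w%.].

```python
import re

LINE = re.compile(r"\s*([A-Za-z0-9.]+)\s*:\s*([\d.]+)\s*([\d.]*)\s*([\d.]*)")

def parse_line(line):
    """Return (name, left, center, right); numbers that are absent become None."""
    match = LINE.match(line)
    if not match:
        return None
    name, *numbers = match.groups()
    # An empty capture means the number was not present on the line.
    # float() would raise on odd input such as "1.2.3", which the pattern also accepts.
    return (name, *[float(n) if n else None for n in numbers])

for sample in [" Code : 75.570 ", " ..dll : 13.559 1", " ..node : 4.435 1.833 5461"]:
    print(parse_line(sample))
```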

Here is a starting point:
s=" ..dll : 13.559 1"
for w in s:gmatch("%S+") do
print(w)
end
You may save these words in a table instead of printing, of course. And skip the second word.

@Ihf Thank you, I now have a working solution.
local moduleInfo, name = {};
for word in line:gmatch("%S+") do
    if (word~=":") then
        word = word:gsub(":", "");
        local number = tonumber(word);
        if (number) then
            moduleInfo[#moduleInfo+1] = number;
        else
            if (name) then
                name = name.." "..word:gsub("%$", "");
            else
                name = word:gsub("%$", "");
            end
        end
    end
end
@jpjacobs Really nice, thanks too. I'll rewrite my code for synthetic reasons ;-) I'll implement your regex of course.

I have no understanding of the Lua language, so I won't help you there.
But in Java this regex should match your input
"([a-z]*)\\s+:\\s+([\\.\\d]*)?\\s+([\\.\\d]*)?\\s+([\\.\\d]*)?"
You have to test each group to know whether there is data for left, center and right.
Looking at Lua, it could look like this. No guarantee: I did not see how to escape . (dot), which has a special meaning, nor whether ? is usable in Lua.
"([a-z]*)%s+:%s+([%.%d]*)?%s+([%.%d]*)?%s+([%.%d]*)?"
