parsing a string that ends - c#-4.0

I have a huge string and need to extract a substring from it. The substring starts with "TECHNICAL", "JUSTIFY", or "ALIGN" and ends where a number (any number from 1 to 10) followed by a period and a space appears. For example, I have
string x = "This is a test, again I am testing TECHNICAL: I need to extract this substring starting with testing. 8. This is test again and again and again and again.";
so I need this
TECHNICAL: I need to extract this substring starting with testing.
I was wondering if someone has an elegant solution for that.
I was trying to use a regular expression, but I could not figure out the right expression.
Any help will be appreciated.
Thanks in advance.

Try this: @"((?:TECHNICAL|JUSTIFY|ALIGN).*?)(?:[1-9]|10)\. "
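For readers who want to try the pattern in the answer above outside C#, here is a minimal Python sketch. The helper name `extract_section` is hypothetical, and stripping the trailing space from the capture is an assumption based on the expected output in the question.

```python
import re

# Same pattern as the answer: lazily capture from a keyword up to the
# first "<number 1-10>. " delimiter.
PATTERN = re.compile(r"((?:TECHNICAL|JUSTIFY|ALIGN).*?)(?:[1-9]|10)\. ")


def extract_section(text):
    """Return the captured section without trailing whitespace, or None."""
    match = PATTERN.search(text)
    return match.group(1).rstrip() if match else None


x = ("This is a test, again I am testing TECHNICAL: I need to extract this "
     "substring starting with testing. 8. This is test again and again and again and again.")
print(extract_section(x))
# prints: TECHNICAL: I need to extract this substring starting with testing.
```

If the huge string spans several lines, passing `re.DOTALL` to `re.compile` would let `.*?` cross newlines.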

Related

Groovy - characters loss with stream.getText

I have this Groovy script that I'm testing:
InputStream is = awsS3Stream.getObjectContent();
def lines = is.getText("UTF-8");
println "lines:"+lines;
Pattern pattern = ~/type\"\:\"[A-Z][a-z]*\"/;
Matcher matcher = pattern.matcher(lines);
...
I noticed that depending on the size of the awsS3Stream object, variable lines may not have all of the text - the end of it is missing. I was hoping that using StringBuffer instead of String would solve the issue, but it did not. I hope someone may know a Groovy based solution to it as I'm not terribly familiar with Groovy... much appreciate your time.
P.S. The issue I'm seeing is not related to the pattern - I don't need the pattern there to see that the variable lines doesn't always have all of the data.
Are you trying to match alphabetic strings with just one initial uppercase letter? If not, the problem is with your regexp. To match camel case strings with any number of capital letters, use this:
Pattern pattern = ~/type\"\:\"[A-Za-z]*\"/;
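To see the difference between the two character classes, here is a small Python sketch. The sample input is made up for illustration and is not taken from the question's S3 data.

```python
import re

# The question's pattern: one uppercase letter followed by lowercase letters only.
strict = re.compile(r'type":"[A-Z][a-z]*"')
# The answer's pattern: any mix of upper- and lowercase letters.
loose = re.compile(r'type":"[A-Za-z]*"')

sample = '{"type":"Invoice"} {"type":"purchaseOrder"}'  # hypothetical input

strict_hits = strict.findall(sample)  # only the capitalised, single-word value
loose_hits = loose.findall(sample)    # both values
print(strict_hits)
print(loose_hits)
```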
The issue was with the data going into s3, not how I retrieve it.

Finding a character inside a string in Excel

I want to remove all the characters from a string except whatever is between a certain pair of characters. For example, I have the input Grade:2/2014-2015 and I want the output to be just the grade, 2.
I think I need to use the FIND function to grab whatever is between the : and the /. This also needs to work with two-digit values such as 10, though I believe it would work as long as the delimiters given to FIND are correct.
Unfortunately I am totally lost when it comes to using FIND. If another function would work better, I could probably figure it out myself once I knew which one.
It's not particularly elegant but =MID(A1,FIND(":",A1)+1,FIND("/",A1) - FIND(":",A1) - 1) would work.
MID takes a start position and a length; FIND returns the index of a given character.
Edit:
As pointed out, "Grade:" is fixed length so the following would work just as well:
=MID(A1,7,FIND("/",A1) - 7)
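The same find-then-slice idea can be sketched outside Excel. This Python version is only an illustration of the logic, and the function name `extract_between` is hypothetical. Unlike the formula, it looks for the end delimiter only after the start delimiter.

```python
def extract_between(text, start=":", end="/"):
    """Return the text between the first `start` and the next `end` after it."""
    begin = text.index(start) + len(start)  # plays the role of FIND(":", A1) + 1
    stop = text.index(end, begin)           # plays the role of FIND("/", A1)
    return text[begin:stop]                 # plays the role of MID(...)


print(extract_between("Grade:2/2014-2015"))   # prints 2
print(extract_between("Grade:10/2014-2015"))  # prints 10
```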
You could use LEFT() to remove "Grade:"
And then use LEFTB() to remove the year.
Look at this link here. This is the way I would go about it.
=SUBSTITUTE(SUBSTITUTE(C4, "Grade:", ""), "/2014-2015", "")
where C4 is the name of your cell.

Read a string and create an acronym from the first initial letter of every word on the string

I just wrote a code with the criteria above, but it doesn't seem to work properly because I either miss a letter at the end or in the middle.
Could anyone please check my code and tell me what I'm doing wrong? By the way, I already checked other threads on this problem, but I'm not allowed to use regex or the print function.
phrase=('my room is cold')
allSpaces=findstr(' ',phrase);
k=length(allSpaces)
acr=phrase(1:allSpaces(1):allSpaces(k)-1)
Output:
acr= mrms
Change last line to
acr = phrase([1 allSpaces+1])
That way you get the first letter, and then the first after each space.
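The indexing trick in this answer carries over to other languages. Here is a rough Python sketch of the same idea, assuming words are separated by single spaces and the phrase has no leading or trailing space. The function name `acronym` is hypothetical.

```python
def acronym(phrase):
    """Take the first character plus every character that follows a space."""
    space_positions = [i for i, ch in enumerate(phrase) if ch == " "]
    # Index 0 is the first letter; each space position + 1 starts the next word.
    picks = [0] + [i + 1 for i in space_positions]
    return "".join(phrase[i] for i in picks)


result = acronym("my room is cold")  # "mric"
```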

LevenshteinSim() Approximate string matching

I am using levenshteinSim() to do the approximate string matching. I am facing a problem
here is what my data look like
string = "Mitchell"
stringvector = c("Ray Mitchell", "Mitchell Dough","Juila Mitch")
I want the algorithm to match only the second part of each element of stringvector, not the first. How do I do that? And how do I use a weighting scheme? I would really appreciate your help.
Thanks
Kothavari
I believe you will need to preprocess the data to just pull out the second part of the string and use the algo on that.
Other people seem to do some preprocessing first. See here
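The preprocessing idea could look roughly like the Python sketch below. It is not the R package's implementation: the edit distance is the textbook dynamic-programming version, the normalisation (1 minus distance divided by the longer length) is an assumption about how the similarity is scaled, and `second_word` is a hypothetical helper.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def second_word(name):
    """Hypothetical preprocessing: keep only the second whitespace-separated token."""
    parts = name.split()
    return parts[1] if len(parts) > 1 else ""


target = "Mitchell"
names = ["Ray Mitchell", "Mitchell Dough", "Juila Mitch"]
scores = {name: similarity(target, second_word(name)) for name in names}
print(scores)
```

With this preprocessing, "Ray Mitchell" scores 1.0 because only "Mitchell" is compared, while "Mitchell Dough" scores low because only "Dough" is compared.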

Extracting a substring from a large string in Erlang

I need to search for a substring in a string and return that if it is there in the string.
What is the best way to do that in Erlang? Note that I don't know where the substring occurs in the bigger string, so I need to search for it.
You can use a regular expression:
> re:run("foobarbaz", "bar", [{capture, first, list}]).
{match,["bar"]}
See the documentation for re:run/3 for more information. In particular you may find that a different capture option suits your need.
Or if you don't need all the features of regular expressions, string:str/2 might be enough:
> string:str(" Hello Hello World World ", "Hello World").
8
This small function may help you. It returns true if the small string can be found in the big string, otherwise it returns false.
string_contains(Big, Small) ->
    string:str(Big, Small) > 0.
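For comparison only, the same two checks can be sketched in Python. The function names are hypothetical, and `position_of` is written to mimic the 1-based result with 0 for "not found" shown in the string:str/2 example above.

```python
def find_substring(big, small):
    """Return `small` if it occurs in `big`, otherwise None."""
    return small if small in big else None


def position_of(big, small):
    """1-based position of the first occurrence, or 0 if absent."""
    return big.find(small) + 1  # str.find returns -1 when absent


print(find_substring("foobarbaz", "bar"))                       # prints bar
print(position_of(" Hello Hello World World ", "Hello World"))  # prints 8
```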
