Select substring between two characters in Scala - string

I'm getting a garbled JSON string from a HTTP request, so I'm looking for a temp solution to select the JSON string only.
The request.params() returns this:
[{"insured_initials":"Tt","insured_surname":"Test"}=, _=1329793147757,
callback=jQuery1707229194729661704_1329793018352
I would like everything from the start of the '{' to the end of the '}'.
I found lots of examples of doing similar things with other languages, but the purpose of this is not to only solve the problem, but also to learn Scala. Will someone please show me how to select that {....} part?

Regexps should do the trick:
"\\{.*\\}".r.findFirstIn("your json string here")

As Jens said, a regular expression usually suffices for this. However, the syntax is a bit different:
"""\{.*\}""".r
creates an object of scala.util.matching.Regex, which provides the typical query methods you may want to do on a regular expression.
In your case, you are simply interested in the first occurrence in a sequence, which is done via findFirstIn:
scala> """\{.*\}""".r.findFirstIn("""[{"insured_initials":"Tt","insured_surname":"Test"}=, _=1329793147757,callback=jQuery1707229194729661704_1329793018352""")
res1: Option[String] = Some({"insured_initials":"Tt","insured_surname":"Test"})
Note that it returns on Option type, which you can easily use in a match to find out if the regexp was found successfully or not.
Edit: A final point to watch out for is that the regular expressions normally do not match over linebreaks, so if your JSON is not fully contained in the first line, you may want to think about eliminating the linebreaks first.

Related

How to get a substring with Regex in Python

I am trying to formnulate a regex to get the ids from the below two strings examples:
/drugs/2/drug-19904-5106/magnesium-oxide-tablet/details
/drugs/2/drug-19906/magnesium-moxide-tablet/details
In the first case, I should get 19904-5106 and in the second case 19906.
So far I tried several, the closes I could get is [drugs/2/drug]-.*\d but would return g-19904-5106 and g-19907.
Please any help to get ride of the "g-"?
Thank you in advance.
When writing a regex expression, consider the patterns you see so that you can align it correctly. For example, if you know that your desired IDs always appear in something resembling ABCD-1234-5678 where 1234-5678 is the ID you want, then you can use that. If you also know that your IDs are always digits, then you can refine the search even more
For your example, using a regex string like
.+?-(\d+(?:-\d+)*)
should do the trick. In a python script that would look something like the following:
match = re.search(r'.+?-(\d+(?:-\d+)*)', my_string)
if match:
my_id = match.group(1)
The pattern may vary depending on the depth and complexity of your examples, but that works for both of the ones you provided
This is the closest I could find: \d+|.\d+-.\d+

How to use the split() method with some condition?

There is one condition where I have to split my string in the manner that all the alphabetic characters should stay as one unit and everything else should be separated like the example shown below.
Example:
Some_var='12/1/20 Balance Brought Forward 150,585.80'
output_var=['12/1/20','Balance Brought Forward','150,585.80']
Yes, you could use some regex to get over this.
Some_var = '12/1/20 Balance Brought Forward 150,585.80'
match = re.split(r"([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+)", Some_var)
print(match)
You will get some extra spaces but you can trim that and you are good to go.
split isn't gonna cut it. You might wanna look into Regular Expressions (abbreviated regex) to accomplish this.
Here's a link to the Python docs: re module
As for a pattern, you could try using something like this:
([0-9\s\\\/\.,-]+|[a-zA-Z\s\\\/\.,-]+)
then trim each part of the output.

Regex or IndexOf?

I have a long string "AB100123485;AB10064279293-IP-1-KNPO;AB473898487-MM41". I have to extract integer value after "IP-" i.e 1 (only) what is the most efficient way ? I am using c#
Thanks
The 'most-efficient' way depends on how consistent your string is in terms of length and appearance. You can surely do this with a regular expression as a quick solution if you just want to get the digit directly following IP-.
You can utilize the RegularExpressions API, passing in your regular expression and input string.
https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.match?view=netframework-4.8#System_Text_RegularExpressions_Regex_Match_System_String_System_String_
This pattern should get you started IP-[0-9]; refine it more to your use case as needed.
For example:
Match matched = System.Text.RegularExpressions.Match(
"AB100123485;AB10064279293-IP-1-KNPO;AB473898487-MM41",
"IP-[0-9]"
);

Sinon-chai expect String to contain substrings in specified order

Is there a way in sinon-chai to test whether a string contains sub-strings in the specified order? Something like:
expect("Hello World, it's a lovely day!").to.contain.in.order("World", "day")
Similar to what the sinon-chai-in-order does for spy calls.
I am currently using a regexp matching similar to this:
expect(vm.$el.querySelector('table').textContent.replace(/\r?\n|\r/g, ''))
.to.match(/.*A.*B.*D.*C.*/)
Originally, I was doing the following, but for I am not sure why it doesn't match across the newline:
expect(vm.$el.querySelector('table').textContent)
.to.match(/.*A.*B.*D.*C.*/gm)
However these matches can be sometimes hard to read and readability is the reason I am using sinon-chai.

Groovy - characters loss with stream.getText

I have this Groovy script that I'm testing:
InputStream is = awsS3Stream.getObjectContent();
def lines = is.getText("UTF-8");
println "lines:"+lines;
Pattern pattern = ~/type\"\:\"[A-Z][a-z]*\"/;
Matcher matcher = pattern.matcher(lines);
...
I noticed that depending on the size of the awsS3Stream object, variable lines may not have all of the text - the end of it is missing. I was hoping that using StringBuffer instead of String would solve the issue, but it did not. I hope someone may know a Groovy based solution to it as I'm not terribly familiar with Groovy... much appreciate your time.
P.S The issues I'm seeing is not related to the pattern - I don't need pattern there to see that the variable lines doesn't always have all of the data.
Are you trying to match alphabetic strings with just one initial uppercase letter? If not, the problem is with your regexp. To match camel case strings with any number of capital letters, use this:
Pattern pattern = ~/type\"\:\"[A-Za-z]*\"/;
The issue was with the data going into s3, not how I retrieve it.

Resources