How to split string by same multiple delimiters in Java

I am trying to split the string 'id#namespace' by #.
There is a special case where the id has the format name#gmail, so the string I am trying to split looks like name#gmail#namespace.
Is there any way to split only on the last #, which would give me name#gmail and namespace?

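If you always want to split at the last occurrence, use lastIndexOf to find the index of the last # and then use substring.
Something like this:
// assumes the string contains at least one #; lastIndexOf returns -1 otherwise
int idx = string.lastIndexOf("#");
// idx + 1 skips the # itself so it does not end up in the second part
String[] splitStrings = {string.substring(0, idx), string.substring(idx + 1)};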

You could match the last # using a negative lookahead (?!...) to assert that no more # characters follow:
#(?!.*#)
System.out.println("name#gmail#namespace".split("#(?!.*#)")[0]); //name#gmail
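For comparison, here is a minimal sketch of the same split-on-the-last-delimiter idea in Python rather than Java. The function names split_on_last and split_on_last_regex are made up for illustration; str.rpartition and re.split are standard library calls.

```python
import re

def split_on_last(text, sep="#"):
    """Split text at the last occurrence of sep; the separator is dropped."""
    head, found, tail = text.rpartition(sep)
    if not found:
        # No separator at all: return the input unchanged as a single part.
        return [text]
    return [head, tail]

def split_on_last_regex(text):
    """Same result using the lookahead pattern from the answer above."""
    return re.split(r"#(?!.*#)", text)

print(split_on_last("name#gmail#namespace"))
print(split_on_last_regex("id#namespace"))
```

Both helpers drop the # they split on, which is what the question asks for.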

Related

Replace matched substring using re sub

Is there a way to replace the matched pattern substring using a single re.sub() line?
What I would like to avoid is applying a string replace method to the current re.sub() output.
Input = "/J&L/LK/Tac1_1/shareloc.pdf"
Current output using re.sub("[^0-9_]", "", input): "1_1"
Desired output in a single re.sub use: "1.1"
According to the documentation, re.sub is defined as
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping occurrence of pattern.
That said, if you pass a lambda function, you can keep the code on one line. The text matched on each call is available as group 0 of the match object: x[0].
I removed _ from the regex to reach the desired output.
import re
txt = "/J&L/LK/Tac1_1/shareloc.pdf"
# compare with ==, not "is": "is" tests object identity, not string equality
x = re.sub("[^0-9]", lambda x: '.' if x[0] == '_' else '', txt)
print(x)
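The same callable-replacement idea can be written with a named function, which is easier to test than a lambda. This is only a sketch; the names dot_for_underscore and keep_digits_dotted are hypothetical.

```python
import re

def dot_for_underscore(match):
    """Replacement callback: an underscore becomes a dot, anything else is dropped."""
    return "." if match[0] == "_" else ""

def keep_digits_dotted(text):
    # Every non-digit character is passed to the callback one at a time.
    return re.sub(r"[^0-9]", dot_for_underscore, text)

print(keep_digits_dotted("/J&L/LK/Tac1_1/shareloc.pdf"))
```

Note that every underscore in the input becomes a dot, not just the one between the digits, so an input such as "a_b1_2" comes out as ".1.2".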
There is no way to use a string replacement pattern in Python re.sub to replace with two possible strings, as there is no conditional replacement construct support in Python re.sub. So either use a callable as the replacement argument or use another workaround.
It looks like you only expect one match of <DIGITS>_<DIGITS> in the input string. In this case, you can use
import re
text = "/J&L/LK/Tac1_1/shareloc.pdf"
print( re.sub(r'^.*?(\d+)_(\d+).*', r'\1.\2', text, flags=re.S) )
# => 1.1
Details:
^ - start of string
.*? - zero or more chars as few as possible
(\d+) - Group 1: one or more digits
_ - a _ char
(\d+) - Group 2: one or more digits
.* - zero or more chars as many as possible.
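One thing to keep in mind with the re.sub version: when the pattern does not match, re.sub returns the input string unchanged. If you would rather detect that case, a sketch using re.search on just the <DIGITS>_<DIGITS> part could look like this (first_pair_dotted and PAIR are hypothetical names):

```python
import re

# Only the part we care about: digits, an underscore, digits.
PAIR = re.compile(r"(\d+)_(\d+)")

def first_pair_dotted(text):
    """Return the first <DIGITS>_<DIGITS> occurrence as <DIGITS>.<DIGITS>, or None."""
    match = PAIR.search(text)
    if match is None:
        return None
    return f"{match[1]}.{match[2]}"

print(first_pair_dotted("/J&L/LK/Tac1_1/shareloc.pdf"))
```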

find regex expression based character match

I have a list of strings something like this:
a=['bukt/id=gdhf/year=989/month=98/day=12/hgjhg.csv','bukt/id=76fhfh/year=989/month=08/day=128/hkngjhg.csv']
The ids are unique. I want an output list like this:
output_list = ['bukt/id=gdhf/','bukt/id=76fhfh/']
So basically I need a regex that matches the id part and drops the rest of each string.
How can I do that efficiently, given that the input list has more than 100K entries?
import re
rgx = r'(bukt/id=[a-zA-Z0-9]+/).+'
# apply the pattern to every string in the list a from the question
output_list = [re.search(rgx, s).group(1) for s in a]
The result will be in group 1. This captures "bukt/id=", followed by any alphanumeric characters and then a slash, and throws away the rest.
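For a long list, one option is to compile the pattern once and skip entries that do not match instead of raising on them. This is a sketch under those assumptions; ID_PREFIX and id_prefixes are hypothetical names, and whether precompiling is measurably faster is something to time on your own data.

```python
import re

# Compiled once and reused for every string in the list.
ID_PREFIX = re.compile(r"bukt/id=[a-zA-Z0-9]+/")

def id_prefixes(paths):
    """Collect the 'bukt/id=<id>/' prefix of each path, skipping non-matching entries."""
    prefixes = []
    for path in paths:
        match = ID_PREFIX.match(path)  # match() only looks at the start of the string
        if match:
            prefixes.append(match.group(0))
    return prefixes

a = ['bukt/id=gdhf/year=989/month=98/day=12/hgjhg.csv',
     'bukt/id=76fhfh/year=989/month=08/day=128/hkngjhg.csv']
print(id_prefixes(a))
```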
There's no need for regex, you can just split your string on /, discard everything after the second / and then join again with /:
a=['bukt/id=gdhf/year=989/month=98/day=12/hgjhg.csv','bukt/id=76fhfh/year=989/month=08/day=128/hkngjhg.csv']
out = ['/'.join(u.split('/')[:2]) for u in a]
print(out)
Output:
['bukt/id=gdhf', 'bukt/id=76fhfh']
If you want the trailing /, just add an empty string to the end of the split array:
out = ['/'.join(u.split('/')[:2] + ['']) for u in a]
Output:
['bukt/id=gdhf/', 'bukt/id=76fhfh/']
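Since only the first two components are needed, str.split can be told to stop after two splits via its maxsplit argument, so the rest of each path is never broken up. A small sketch (id_prefix is a hypothetical helper name):

```python
def id_prefix(path):
    """Keep the first two '/'-separated components and add a trailing slash."""
    parts = path.split("/", 2)  # at most two splits; the remainder stays in one piece
    return "/".join(parts[:2]) + "/"

a = ['bukt/id=gdhf/year=989/month=98/day=12/hgjhg.csv',
     'bukt/id=76fhfh/year=989/month=08/day=128/hkngjhg.csv']
out = [id_prefix(u) for u in a]
print(out)
```

A path with fewer than two slashes is returned whole with a slash appended, so validate the input first if that can happen.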

insert a character after every nth character using groovy

I am using Groovy and ended up with some long strings. I need to insert a character after, say, every 50th character. How do I do that?
I could not find any option other than traversing the string by index and inserting the character manually.
Alternatively, you can split the string using a regular expression and then concatenate the pieces with the join method.
Example:
def input = 'abCDSasdDSdsds'
def splitted = input.split(/(?<=\G\w{5})/)
// or you can write . instead of \w
assert 'abCDS:asdDS:dsds' == splitted.join(':')
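To make the chunk-and-join logic explicit without the regex, here is the same idea sketched in Python rather than Groovy, using plain slicing. The function name insert_every is made up for illustration.

```python
def insert_every(text, n, sep):
    """Insert sep after every n characters of text (never after the final chunk)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Cut the string into consecutive chunks of length n and join them with sep.
    return sep.join(text[i:i + n] for i in range(0, len(text), n))

print(insert_every('abCDSasdDSdsds', 5, ':'))
```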

How to match a part of string before a character into one variable and all after it into another

I have a problem with splitting a string into two parts on a special character.
For example:
12345#data
or
1234567#data
The first part has 5-7 characters and is separated by "#" from the second part, which holds other data (characters, numbers, it doesn't matter what).
I need to store two parts on each side of # in two variables:
x = 12345
y = data
without "#" character.
I was looking for some Lua string function like splitOn("#") or substring until character, but I haven't found that.
Use string.match and captures.
Try this:
s = "12345#data"
a,b = s:match("(.+)#(.+)")
print(a,b)
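One detail worth knowing: .+ is greedy, so if the string contains more than one #, the first capture runs up to the last #. The sketch below illustrates that behaviour with Python's re module rather than Lua patterns (the two are different pattern languages, but greedy + behaves the same way here); split_last and split_first are hypothetical names.

```python
import re

def split_last(text):
    """Greedy first group: splits on the LAST '#', like "(.+)#(.+)"."""
    m = re.fullmatch(r"(.+)#(.+)", text, flags=re.S)
    return m.groups() if m else None

def split_first(text):
    """First group excludes '#': splits on the FIRST '#'."""
    m = re.fullmatch(r"([^#]+)#(.+)", text, flags=re.S)
    return m.groups() if m else None

print(split_last("12345#data"))
print(split_last("a#b#c"))
print(split_first("a#b#c"))
```

If the first part must never contain a #, the negated class [^#]+ for the first capture is the safer choice.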
See this documentation:
First of all, although Lua does not have a split function in its standard library, it does have string.gmatch, which can be used instead of a split function in many cases. Unlike a split function, string.gmatch takes a pattern to match the non-delimiter text, instead of the delimiters themselves.
It is easily achievable with the help of a negated character class with string.gmatch:
local example = "12345#data"
for i in string.gmatch(example, "[^#]+") do
print(i)
end
The [^#]+ pattern matches one or more characters other than #, so iterating over its matches effectively splits the string on a single-character delimiter.

Removing first character from a string in octave

I wanted to know how to remove first character of a string in octave. I am manipulating the string in a loop and after every loop, I want to remove the first character of the remaining string.
Thanks in advance.
If it's just a one-line string then:
short_string = long_string(2:end)
But if you have a cell array of strings, then either do it as above inside a loop you already have, or use cellfun to do it in one line:
short_strings = cellfun(@(x)(x(2:end)), long_strings, 'UniformOutput', false)
Or else if you have a matrix of strings (i.e. all the same length), then you can vectorize it as:
short_strings = long_strings(:, 2:end)
