How to replace unwanted characters

How to replace unwanted characters - groovy

I have some hotels that contains characters which are not valie for when i want to insert these hotel names as a file name as file naming doesn't allow /, * or ? and want to know what this error means.
text?text
text?text
text**text**
text*text (text)
text *text*
text?
I am trying to use an if else statement so that if a hotel name contains any of these characters, then replace them with -. However I am receiving and error stating a dangling ?. I just want to check if I am using the replace correctly for these characters.
def hotelNameTrim = hotelName.toString()
if (hotelNameTrim.contains("/"))
{
hotelNameTrim.replaceAll("/", "-")
}
else if (hotelNameTrim.contains("*"))
{
hotelNameTrim.replaceAll("*", "-")
}
else if (hotelNameTrim.contains("?"))
{
hotelNameTrim.replaceAll("?", "-")
}

replaceAll accepts a regex as a search pattern. * and ? are special characters in regex and need to be escaped with a back slash. Which itself needs to be escaped in a Java string :)
Try this:
hotelNameTrim.replaceAll("[\\*\\?/]","-")
That will replace all you characters with a dash.

Related

Regular expression for splitting a comma with the string

I had to split string data based on Comma.
This is the excel data:-
Please find the excel data
string strCurrentLine="\"Himalayan Salt Body Scrub with Lychee Essential Oil from Majestic Pure, All Natural Scrub to Exfoliate & Moisturize Skin, 12 oz\",SKU_27,\"Tombow Dual Brush Pen Art Markers, Portrait, 6-Pack\",SKU_27,My Shopify Store 1,Valid,NonInventory".
Regex CSVParser = new Regex(",(?=(?:[^\"]\"[^\"]\")(?![^\"]\"))");
string[] lstColumnValues = CSVParser.Split(strCurrentLine);
I have attached the image.The problem is I used the Regex to split the string with comma but i need the ouptut just like SKU_27 because string[0] and string2 contains the forward and backward slash.I need the output string1 and remove the forward and backward slash.

The file seems to be a CVA file. For CVA to be properly formatted, it will use quotes "" to wrap strings that contains comma, such as
id, name, date
1,"Some text, that includes comma", 2020/01/01
Simply split the string by comma, you will get the 2nd column with double quote.

I'm not sure whether you are asking how to remove the double-quotes from lstColumnValues[0] and lstColumnValues[2], or add them to lstColumnValues[1].
To remove the double-quotes, just use Replace:
string myString = lstColumnValues[0].Replace("\"", "");
If you need to add them:
string myString = $"\"{lstColumnValues[1]}\"";

how to modify textfile using U-SQL

I have a large file of around 130MB containing 10 A characters in each line and \t at the end of 10th "A" character, I want to extract this text file and then change all A's to B's. Can any one help with its code snippet?
this is what I have wrote till now
USE DATABASE imodelanalytics;
#searchlog =
EXTRACT characters string
FROM "/iModelAnalytics/Samples/Data/dummy.txt"
USING Extractors.Text(delimiter: '\t', skipFirstNRows: 1);
#modify =
SELECT characters AS line
FROM #searchlog;
OUTPUT #modify
TO "/iModelAnalytics/Samples/Data/B.txt"
USING Outputters.Text();
I'm new to this, so any suggestions will be helpful ! Thanks

Assuming all of the field would be AAAAAAAAAA then you could write:
#modify = SELECT "BBBBBBBBBB" AS characters FROM #searchlog;
If only some are all As, then you would do it in the SELECT clause:
#modify =
SELECT (characters == "AAAAAAAAAA" ? "BBBBBBBBBB" : characters) AS characters
FROM #searchlog;
If there are other characters around the AAAAAAAAAA then you would use more of the C# string functions to find them and replace them in a similar pattern.

Reading from a string using sscanf in Matlab

I'm trying to read a string in a specific format
RealSociedad
this is one example of string and what I want to extract is the name of the team.
I've tried something like this,
houseteam = sscanf(str, '%s');
but it does not work, why?

You can use regexprep like you did in your post above to do this for you. Even though your post says to use sscanf and from the comments in your post, you'd like to see this done using regexprep. You would have to do this using two nested regexprep calls, and you can retrieve the team name (i.e. RealSociedad) like so, given that str is in the format that you have provided:
str = 'RealSociedad';
houseteam = regexprep(regexprep(str, '^<a(.*)">', ''), '</a>$', '')
This looks very intimidating, but let's break this up. First, look at this statement:
regexprep(str, '^<a(.*)">', '')
How regexprep works is you specify the string you want to analyze, the pattern you are searching for, then what you want to replace this pattern with. The pattern we are looking for is:
^<a(.*)">
This says you are looking for patterns where the beginning of the string starts with a a<. After this, the (.*)"> is performing a greedy evaluation. This is saying that we want to find the longest sequence of characters until we reach the characters of ">. As such, what the regular expression will match is the following string:
<ahref="/teams/spain/real-sociedad-de-futbol/2028/">
We then replace this with a blank string. As such, the output of the first regexprep call will be this:
RealSociedad</a>
We want to get rid of the </a> string, and so we would make another regexprep call where we look for the </a> at the end of the string, then replace this with the blank string yet again. The pattern you are looking for is thus:
</a>$
The dollar sign ($) symbolizes that this pattern should appear at the end of the string. If we find such a pattern, we will replace it with the blank string. Therefore, what we get in the end is:
RealSociedad

Found a solution. So, %s stops when it finds a space.
str = regexprep(str, '<', ' <');
str = regexprep(str, '>', '> ');
houseteam = sscanf(str, '%*s %s %*s');
This will create a space between my desired string.

Lua: Search a specific string

Hi all tried all the string pattrens and library arguments but still stuck.
i want to get the name of the director from the following string i have tried the string.matcH but it matches the from the first character it finD from the string
the string is...
fixstrdirector = {id:39254,cast:[{id:15250,name:Hope Davis,character:Aunt Debra,order:5,cast_id:10,profile_path:/aIHF11Ss8P0A8JUfiWf8OHPVhOs.jpg},{id:53650,name:Anthony Mackie,character:Finn,order:3,cast_id:11,profile_path:/5VGGJ0Co8SC94iiedWb2o3C36T.jpg},{id:19034,name:Evangeline Lilly,character:Bailey Tallet,order:2,cast_id:12,profile_path:/oAOpJKgKEdW49jXrjvUcPcEQJb3.jpg},{id:6968,name:Hugh Jackman,character:Charlie Kenton,order:0,cast_id:13,profile_path:/wnl7esRbP3paALKn4bCr0k8qaFu.jpg},{id:79072,name:Kevin Durand,character:Ricky,order:4,cast_id:14,profile_path:/c95tTUjx5T0D0ROqTcINojpH6nB.jpg},{id:234479,name:Dakota Goyo,character:Max Kenton,order:1,cast_id:15,profile_path:/7PU6n4fhDuFwuwcYVyRNVEZE7ct.jpg},{id:8986,name:James Rebhorn,character:Marvin,order:6,cast_id:16,profile_path:/ezETMv0YM0Rg6YhKpu4vHuIY37D.jpg},{id:930729,name:Marco Ruggeri,character:Cliff,order:7,cast_id:17,profile_path:/1Ox63ukTd2yfOf1LVJOMXwmeQjO.jpg},{id:19860,name:Karl Yune,character:Tak Mashido,order:8,cast_id:18,profile_path:/qK315vPObCNdywdRN66971FtFez.jpg},{id:111206,name:Olga Fonda,character:Farra Lemkova,order:9,cast_id:19,profile_path:/j1qabOHf3Pf82f1lFpUmdF5XvSp.jpg},{id:53176,name:John Gatins,character:Kingpin,order:10,cast_id:41,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:1126350,name:Sophie Levy,character:Big Sister,order:11,cast_id:42,profile_path:null},{id:1126351,name:Tess Levy,character:Little Sister,order:12,cast_id:43,profile_path:null},{id:1126352,name:Charlie Levy,character:Littlest Sister,order:13,cast_id:44,profile_path:null},{id:187983,name:Gregory Sims,character:Bill Panner,order:14,cast_id:45,profile_path:null}],crew:[{id:58726,name:Leslie Bohem,department:Writing,job:Screenplay,profile_path:null},{id:53176,name:John Gatins,department:Writing,job:Screenplay,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:17825,name:Shawn Levy,department:Directing,job:Director,profile_path:/7f2f8EXdlWsPYN0HPGcIlG21xU.jpg},{id:12415,name:Richard Matheson,department:Writing,job:Story,profile_path:null},{id:57113,name:Dan Gilroy,department:Writing,job:Story,profile_path:null},{id:25210,name:Jeremy Leven,department:Writing,job:Story,profile_path:null},{id:17825,name:Shawn Levy,department:Production,job:Producer,profile_path:/7f2f8EXdlWsPYN0HPGcIlG21xU.jpg},{id:34970,name:Susan Montford,department:Production,job:Producer,profile_path:/1XJt51Y9ciPhkHrAYE0j6Jsmgji.jpg},{id:3183,name:Don Murphy,department:Production,job:Producer,profile_path:null},{id:34967,name:Rick Benattar,department:Production,job:Producer,profile_path:null},{id:1126348,name:Eric Hedayat,department:Production,job:Producer,profile_path:null},{id:186721,name:Ron Ames,department:Production,job:Producer,profile_path:null},{id:10956,name:Josh McLaglen,department:Production,job:Executive Producer,profile_path:null},{id:57634,name:Mary McLaglen,department:Production,job:Executive Producer,profile_path:null},{id:23779,name:Jack Rapke,department:Production,job:Executive Producer,profile_path:null},{id:488,name:Steven Spielberg,department:Production,job:Executive Producer,profile_path:/cuIYdFbEe89PHpoiOS9tmo84ED2.jpg},{id:30,name:Steve Starkey,department:Production,job:Executive Producer,profile_path:null},{id:24,name:Robert Zemeckis,department:Production,job:Executive Producer,profile_path:/isCuZ9PWIOyXzdf3ihodXzjIumL.jpg},{id:531,name:Danny Elfman,department:Sound,job:Original Music Composer,profile_path:/pWacZpYPos8io22nEiim7d3wp2j.jpg},{id:18265,name:Mauro Fiore,department:Crew,job:Cinematography,profile_path:null},{id:54271,name:Dean Zimmerman,department:Editing,job:Editor,profile_path:null},{id:25365,name:Richard Hicks,department:Production,job:Casting,profile_path:null},{id:5490,name:David Rubin,department:Production,job:Casting,profile_path:null},{id:52088,name:Tom Meyer,department:Art,job:Production Design,profile_path:null}]}
i have tried string.match(fixstrdirector,"name:(.+),department:Directing")
but it gives me the from the first occurace it find the name to the end of thr string
output:
Hope Davis,character:Aunt Debra,order:5,cast_id:10,profile_path:/aIHF11Ss8P0A8JUfiWf8OHPVhOs.jpg},{id:53650,name:Anthony Mackie,character:Finn,order:3,cast_id:11,profile_path:/5VGGJ0Co8SC94iiedWb2o3C36T.jpg},{id:19034,name:Evangeline Lilly,character:Bailey Tallet,order:2,cast_id:12,profile_path:/oAOpJKgKEdW49jXrjvUcPcEQJb3.jpg},{id:6968,name:Hugh Jackman,character:Charlie Kenton,order:0,cast_id:13,profile_path:/wnl7esRbP3paALKn4bCr0k8qaFu.jpg},{id:79072,name:Kevin Durand,character:Ricky,order:4,cast_id:14,profile_path:/c95tTUjx5T0D0ROqTcINojpH6nB.jpg},{id:234479,name:Dakota Goyo,character:Max Kenton,order:1,cast_id:15,profile_path:/7PU6n4fhDuFwuwcYVyRNVEZE7ct.jpg},{id:8986,name:James Rebhorn,character:Marvin,order:6,cast_id:16,profile_path:/ezETMv0YM0Rg6YhKpu4vHuIY37D.jpg},{id:930729,name:Marco Ruggeri,character:Cliff,order:7,cast_id:17,profile_path:/1Ox63ukTd2yfOf1LVJOMXwmeQjO.jpg},{id:19860,name:Karl Yune,character:Tak Mashido,order:8,cast_id:18,profile_path:/qK315vPObCNdywdRN66971FtFez.jpg},{id:111206,name:Olga Fonda,character:Farra Lemkova,order:9,cast_id:19,profile_path:/j1qabOHf3Pf82f1lFpUmdF5XvSp.jpg},{id:53176,name:John Gatins,character:Kingpin,order:10,cast_id:41,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:1126350,name:Sophie Levy,character:Big Sister,order:11,cast_id:42,profile_path:null},{id:1126351,name:Tess Levy,character:Little Sister,order:12,cast_id:43,profile_path:null},{id:1126352,name:Charlie Levy,character:Littlest Sister,order:13,cast_id:44,profile_path:null},{id:187983,name:Gregory Sims,character:Bill Panner,order:14,cast_id:45,profile_path:null}],crew:[{id:58726,name:Leslie Bohem,department:Writing,job:Screenplay,profile_path:null},{id:53176,name:John Gatins,department:Writing,job:Screenplay,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:17825,name:Shawn Levy

You're searching from the first occurrence of "name:" until the "department:Directing" with everything in between.
Instead, you need to restrict what can be between the two strings. Here for example I'm saying that the characters that make up the name can only be alphanumeric or a space:
string.match(fixstrdirector,"name:([%w ]+),department:Directing")
Alternatively, given that there's a comma separating the parameters, a better approach would be to search for "name:" followed by any characters other than a comma, followed by "department:Directing":
string.match(fixstrdirector,"name:([^,]+),department:Directing")
Of course that wouldn't work if the name had a comma it in!

Lua patterns provides - modifier for tasks as you have above. As stated on PiL - Section 20.2:
The + modifier matches one or more characters of the original class.
It will always get the longest sequence that matches the pattern.
Like *, the modifier - also matches zero or more occurrences of
characters of the original class. However, instead of matching the
longest sequence, it matches the shortest one.
Next, when you are using . to match, it'll find any and all characters satisfying the pattern. Therefore, you'll get the result from first occurence of name until the ,department:Directing is found. Since you know that it is a JSON data, you can try to match for [^,]; that is, non-comma characters.
So, for your case try:
local tAllNames = {}
for sName in fixstrdirector:gmatch( "name:([^,]-),department:Directing" ) do
tAllNames[ #tAllNames + 1 ] = sName
end
and all your required names will be stored in the table tAllNames. An example of the above can be seen at codepad.

How to search for unicode characters in records of DB2?

I have a table in DB2 say METAATTRIBUTE wherein a column say "content" might contain any special character including the unicode characters.
For any special character, Eg: "#" I can simply search by :
Select * from METAATTRIBUTE where content like '%#%';
but how to search for unicode characters like "u201B" or "u201E" ???
Thanks in advance.

Assuming you are talking about DB2 LUW, the Unicode string literals are designated by the symbols "u&", followed by a regular string literal in single quotes. Unicode code points are designated by an escape character, backslash by default. For example:
$ db2 "values u&'\201b'"
1
---
‛
1 record(s) selected.
So your query would look like:
Select * from METAATTRIBUTE where content like u&'%\201b%';

Recently, I have had the same problem. This worked for me
select *
from METAATTRIBUTE
where MEDEDELINGSZONE like '%' || UX'201B' || '%'

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to replace unwanted characters - groovy

replaceAll accepts a regex as a search pattern. * and ? are special characters in regex and need to be escaped with a back slash. Which itself needs to be escaped in a Java string :) Try this: hotelNameTrim.replaceAll("[\\*\\?/]","-") That will replace all you characters with a dash.

Related

Regular expression for splitting a comma with the string

how to modify textfile using U-SQL

Reading from a string using sscanf in Matlab

Lua: Search a specific string

How to search for unicode characters in records of DB2?

Categories

Resources