how to modify textfile using U-SQL - text

I have a large file of around 130MB containing 10 A characters in each line and \t at the end of 10th "A" character, I want to extract this text file and then change all A's to B's. Can any one help with its code snippet?
this is what I have wrote till now
USE DATABASE imodelanalytics;
#searchlog =
EXTRACT characters string
FROM "/iModelAnalytics/Samples/Data/dummy.txt"
USING Extractors.Text(delimiter: '\t', skipFirstNRows: 1);
#modify =
SELECT characters AS line
FROM #searchlog;
OUTPUT #modify
TO "/iModelAnalytics/Samples/Data/B.txt"
USING Outputters.Text();
I'm new to this, so any suggestions will be helpful ! Thanks

Assuming all of the field would be AAAAAAAAAA then you could write:
#modify = SELECT "BBBBBBBBBB" AS characters FROM #searchlog;
If only some are all As, then you would do it in the SELECT clause:
#modify =
SELECT (characters == "AAAAAAAAAA" ? "BBBBBBBBBB" : characters) AS characters
FROM #searchlog;
If there are other characters around the AAAAAAAAAA then you would use more of the C# string functions to find them and replace them in a similar pattern.

Related

Need guidance with Regular Expression in Python

I need help with one of my current tasks wherein i am trying to pick only the table names from the query via Python
So basically lets say a query looks like this
Create table a.dummy_table1
as
select a.dummycolumn1,a.dummycolumn2,a.dummycolumn3 from dual
Now i am passing this query into Python using STRINGIO and then reading only the strings where it starts with "a" and has "_" in it like below
table_list = set(re.findall(r'\ba\.\w+', str(data)))
Here data is the dataframe in which i have parsed the query using StringIO
now in table_list i am getting the below output
a.dummy_table1
a.dummycolumn1
a.dummycolumn2
whereas the Expected output should have been like
a.dummy_table1
<Let me know how we can get this done , have tried the above regular expression but that is not working properly>
Any help on same would be highly appreciated
Your current regex string r"\ba.\w+" simply matches any string which:
Begins with "a" (the "\ba" part)
Followed by a period (the "." part)
Followed by 1 or more alphanumeric characters (the "\w+" part).
If I've understood your problem correctly, you are looking to extract from str(data) any string fragments which match this pattern instead:
Begins with "a"
Followed by a period
Followed by 1 or more alphanumeric characters
Followed by an underscore
Followed by 1 or more alphanumeric characters
Thus, the regular expression should have "_\w+" added to the end to match criteria 4 and 5:
table_list = set(re.findall(r"\ba\.\w+_\w+", str(data)))

Replace the string in all lines of itab?

I am trying to create an algorithm in SAP ABAP to eliminate the word IBAN from certain fields. For example, in the below photo we have that for KNBK-Bankschlüssel=7415000, the KNBK-Bankkontonummer= <IBAN 000000000008. I am trying to eliminate IBAN from the field so that only 000000000008 will be shown in the table.
Is there any string operation that would let me check whether a field has the keyword IBAN and to eliminate it?
Thank you all in advance!
You can do it with the REPLACE statement:
IF word CS 'IBAN'. "to check if the string contains IBAN (as substring)
REPLACE 'IBAN' WITH '' INTO word. "This will remove the substring IBAB, but it will be replaced with a space
CONDENSE word NO-GAPS. "This will remove the space (and other spaces as well, if there is any in the string)
ENDIF.
Looking at the screenshot, the field contains '< IBAN>' (instead of just 'IBAN'), so you have to modify the code accordingly.
Another possibility is REPLACE IN TABLE:
REPLACE ALL OCCURRENCES OF REGEX '^.*IBAN>'
IN TABLE itab WITH ''
RESPECTING CASE.
This snippet will delete all the <IBAN>s with all the preceding characters from all lines.

Regular expression for splitting a comma with the string

I had to split string data based on Comma.
This is the excel data:-
Please find the excel data
string strCurrentLine="\"Himalayan Salt Body Scrub with Lychee Essential Oil from Majestic Pure, All Natural Scrub to Exfoliate & Moisturize Skin, 12 oz\",SKU_27,\"Tombow Dual Brush Pen Art Markers, Portrait, 6-Pack\",SKU_27,My Shopify Store 1,Valid,NonInventory".
Regex CSVParser = new Regex(",(?=(?:[^\"]\"[^\"]\")(?![^\"]\"))");
string[] lstColumnValues = CSVParser.Split(strCurrentLine);
I have attached the image.The problem is I used the Regex to split the string with comma but i need the ouptut just like SKU_27 because string[0] and string2 contains the forward and backward slash.I need the output string1 and remove the forward and backward slash.
The file seems to be a CVA file. For CVA to be properly formatted, it will use quotes "" to wrap strings that contains comma, such as
id, name, date
1,"Some text, that includes comma", 2020/01/01
Simply split the string by comma, you will get the 2nd column with double quote.
I'm not sure whether you are asking how to remove the double-quotes from lstColumnValues[0] and lstColumnValues[2], or add them to lstColumnValues[1].
To remove the double-quotes, just use Replace:
string myString = lstColumnValues[0].Replace("\"", "");
If you need to add them:
string myString = $"\"{lstColumnValues[1]}\"";

python3 replace ' in a string

I am trying to clean text strings containing any ' or &#39 (which includes an ; but if i add it here you will see just ' again. Because the the ANSI is also encoded by stackoverflow. The string content contains ' and when it does there is an error.
when i insert the string to my database i get this error:
psycopg2.ProgrammingError: syntax error at or near "s"
LINE 1: ...tment and has commenced a search for mr. whitnell's
the original string looks like this:
...a search for mr. whitnell&#39s...
To remove the ' and &#39 ; I use:
stripped_content = stringcontent.replace("'","")
stripped_content = stringcontent.replace("&#39 ;","")
any advice is welcome, best regards
When you try to replace("&#39 ;","") it literally searching for "&#39 ;" occurrences in string. You need to convert "&#39 ;" to its character equivalent. Try this:
s = "That's how we 'roll"
r = s.replace(chr(int('&#39'[2:])), "")
and with this chr(int('&#39'[2:])) you'll get ' character.
Output:
Thats how we roll
Note
If you try to run this s.replace(chr(int('&#39'[2:])), "") without saving your result in variable then your original string would not be affected.

How to search for unicode characters in records of DB2?

I have a table in DB2 say METAATTRIBUTE wherein a column say "content" might contain any special character including the unicode characters.
For any special character, Eg: "#" I can simply search by :
Select * from METAATTRIBUTE where content like '%#%';
but how to search for unicode characters like "u201B" or "u201E" ???
Thanks in advance.
Assuming you are talking about DB2 LUW, the Unicode string literals are designated by the symbols "u&", followed by a regular string literal in single quotes. Unicode code points are designated by an escape character, backslash by default. For example:
$ db2 "values u&'\201b'"
1
---
‛
1 record(s) selected.
So your query would look like:
Select * from METAATTRIBUTE where content like u&'%\201b%';
Recently, I have had the same problem. This worked for me
select *
from METAATTRIBUTE
where MEDEDELINGSZONE like '%' || UX'201B' || '%'

Resources