I have a column from which i have to extract String and then format it back to US currency format with 2 decimal places.
For example :
Column value : {tag}0000020000890|
From this, I have to match the tag and extract 20000890, and format it to 200,008.90
I have extracted the part with below code:
LTRIM(REGEXP_SUBSTR('match pattern', 1,1,'i',,1), '0')
Where match pattern is '\{tag\}(.*?)\|'
With this, I am able to extract 20000890
And then I tried the below to_char and to_number function on top of it to format as comma separated currency with 2 decimal points.
to_char(ltrim(Regexp_substr('match pattern',1,1,'i',1),'0'), '99G999G999D99')
But this throws below error:
Sql error -20447, sqlstate 22007 sqlerrmc 99G999G999D99
Sysibm.Varchar-format
Then I tried,
to_char(to_number(ltrim(Regexp_substr('match pattern',1,1,'i',1),'0')), '99G999G999D99')
But this also throws error:
Sql error -20476, sqlstate 22018 sqlermc DECFLOAT_FORMAT; 99G999G999D99
I'm not sure what causes this error.
The format that you try to use is supported starting from V11.5 only.
TO_CHAR V11.5
TO_CHAR V11.1
Compare the Table 2. Format elements for decimal floating-point to varchar table from both links.
Moreover, you must cast a string to a numeric value in the 1-st parameter of TO_CHAR:
SELECT TO_CHAR(DECFLOAT(REGEXP_SUBSTR(V, '\{tag\}(.*?)\|', 1, 1, 'i', 1)), '99,999,999.99')
FROM (VALUES '{tag}0000020000890|') T(V);
Take a look at VARCHAR_FORMAT. It is the function TO_CHAR is mapped to. The group separator is not G, but "," or ".". Basically, you have to replace your formatting string 99G999G999D99 with something like 99,999,999.99.
The Db2 documentation has more examples on that.
Related
I want to extract data from a column before the '-' symbol. I could do this easily with T-SQL but I am getting errors when I do the same in Azure Databricks.
I also want to be able to check the column if there is such a symbol and where it does not exist I don't want to extract the data.
In T-SQL I could write:
SELECT EmailAddress
,SUBSTRING(emailaddress, 0, charindex('#', emailaddress, 0))
FROM [dbo].[DimEmployee]
How do I get the same result in Databricks, please?
To find records with ‘-’ symbol, you can use pyspark.sql.Column.contains
Column.contains(other)
Contains the other element. Returns a boolean Column based on a string match.
regexp_extract_all function
Extracts the all strings in str that matches the regexp expression and corresponds to the regex group index.
regexp_extract_all(str, regexp [, idx] )
E.g.
SELECT regexp_extract_all('100-200, 300-400', '(\\d+)-(\\d+)', 1);
[100, 300]
Refer - https://docs.databricks.com/sql/language-manual/functions/regexp_extract_all.html#regexp_extract_all-function-databricks-sql
QuotedPremium column is a string feature so I need to convert it to numeric value in order to use algorithm.
So, for that I am using Edit Metadata module, where I specify data type to be converted is Floating Point.
After I run it - I got an error:
Could not convert type System.String to type System.Double, inner exception message: Input string was not in a correct format.
What am I missing here?
As mentioned in comments, you must change column where numbers are handled as text to numeric type data and it shouldn't have any null values. Now answering the question of how to substitute NULL's in data using ML studio and converting to numeric type.
Substitute NULL's in data
Use Execute R Script module for that, and add this code in it.
dataset1 <- maml.mapInputPort(1); # class: data.frame
dataset1[dataset1 == "NULL"] = 0; # Wherever cell's value is "NULL", replace it with 0
maml.mapOutputPort("dataset1"); # return the modified data.frame
Image for same:
Convert to numeric data
As you have added in your answer, this can be done using the Edit Metadata module.
I have created a UDT made up of fields from three or four columns of data. One of the field contains a letter inside parens, for example (c) or (d). When importing the csv file using cqlsh's COPY FROM, I get an error message:
Syntax error in CQL query …..mismatched import ‘(‘ expecting ‘)’ (….column 3, column 4) VALUES (10.2[(]c…).
I have tried importing csv file with fields where the letter does not have brackets and get:
Syntax error in CQL query …..mismatched import ‘c‘ expecting ‘)’ (….column 3, column 4) VALUES (10.2[c]…)
I have tried importing csv file without a letter in the field and get:
Syntax error in CQL query …..mismatched import ‘,‘ expecting ‘)’ (….column 4) VALUES (10.2,…)
The UDT is made up of integers and text. It appears that importing the csv file containing the UDT which includes a letter inside a bracket (e.g. (c)) generates a data violation as does a letter with no bracket as does and as does field with no value in it.
Have you tried character escaping using double dollar ($$) or double quotes ('') ? http://docs.datastax.com/en/cql/3.3/cql/cql_reference/escape_char_r.html
I am importing text values into a transformation using a Fixed Width input step. Everything is coming in as a string. I want to convert some of the string values to integers with a decimal point at a specified spot. Here are some examples of the before (left hand side) and expected results (right hand side):
00289 --> 0028.9
01109 --> 0110.9
003201 --> 0032.01
I've tried numerous combinations of the Format mask in a Select Values step (meta data tab) but I can't get the values I'm looking for.
Can you anyone tell me what combination I can try for* Type/Length/Precision/Format/Encoding/Decimal/Group* attributes for these fields to get the desired output?
Have you tried another step the reach your goal? You can try to use e.g. User Defined Java Expression setting it in this way:
Java expression: new java.math.BigDecimal(text.substring(0,4) + "." + text.substring(4,text.length()))
Value type: BigNumber
But this will convert your input to:
00289 --> 28.9
01109 --> 110.9
003201 --> 32.01
Because its output is BigNumber format. BigNumber or Number format can be used for decimal numbers. You cannot use Integer for decimals because it has no decimal part.
If you want a String output leave out the new java.math.BigDecimal() part from the expression above and set Value type to String. It will produce these results:
00289 --> 0028.9
01109 --> 0110.9
003201 --> 0032.01
This is the one suggestion. Of course there are another ways of how to reach your goal.
I have to insert the string "johnmelling" value into a table which has the column as
[USERPASS] varbinary NOT NULL.
Please could any one suggest me, what would be the best conversion to insert "johnmelling"?
I tried to to insert as below,
Insert into table(column1)
Values(CONVERT(varbinary(1), 'johnmelling'))
Then I got the error
Line 1: String or binary data would be truncated.
Thank You,
You are converting to varbinary(1) so your target datatype is varbinary but the integer you have specified in parentheses is 1 which means your datatype will only have a length of 1; you are receiving that error because the length you have allocated to that datatype is too small. The literal, 'johnmelling' is 11 characters but you are trying to store it in a datatype that has a length of 1.
Simply change the integer in parentheses to 11, 50, 255, max or whatever you think is an appropriate length and you won't get that error.