Is there a simple way to parse comma-separated Key:Value pairs in Excel, Power Query or VBA if the values contain unescaped commas? - excel

I'm working with a CSV export of identity data containing ~22,000 records. One of the fields, titled 'ExtendedAttributes', contains in each cell a quote-bound string of comma-separated Key:Value pairs. Each record in the file has an arbitrary number of extended attributes (up to around 50). My ultimate objective is to expand these extended attributes into their own columns in Excel (2016). I already have solutions for expanding other data into columns using formulae, simple VBA and, most recently, Power Query based approaches.
However, my previous solutions have all been based on the Key:Value pairs being simple to delimit. In this export, the ExtendedAttributes field has:
Value data that may contain unescaped/unquoted commas. e.g.
"Key1: Value1, name: surname, forename, Key2: Value2, ... "
Keys that may contain multiple comma separated values, which are also unquoted/unescaped. e.g.
"Key1: Value1, emailAlias: alias1#domain, alias2#domain, alias3#domain, Key2: Value2, ... "
My usual approach, where the Key:Value pairs don't have these problems, would be to delimit on commas to break the string into Key:Value pairs, transpose the data into rows, then delimit on the colon to populate my new columns and their values, as described in the PowerBI community pages.
This doesn't work here because delimiting using a comma breaks the values.
Is there a straightforward way to parse this into the constituent Key:Value pairs using (ideally) Power Query? Happy to also go with VBA or formula based solutions.
My instinctive approach would be to try to identify substrings containing a colon and prepend them with a unique character, which can then be used as a delimiter. (It's not impossible that the data also includes unescaped colons, but I'm happy to assume it doesn't.) But I recognise that this may be a needlessly complex approach, and I'm unsure how best to do it.
I'm happy to keep values with multiple comma-separated items as a single unit (a problem for me to deal with later).
For the example data:
"Key1: Value1, name: surname, forename, emailAlias: alias1#domain, alias2#domain, alias3#domain, Key2: Value2, ... "
I'd like to end up with something that lets me treat the data like this, using maybe a ! as an example unique character that I could then use as a delimiter:
"Key1: Value1!name: surname, forename!emailAlias: alias1#domain, alias2#domain, alias3#domain!Key2: Value2!..."
I don't have access to the original data (vendor controlled system) and have limited data processing tools on my corporate desktop (Excel 2016, VBA, PQ).
Appreciate any help.

In Power Query, you can define a function Partition as follows:
let
    Output = (str as text, sep as text) as text =>
        Text.RemoveRange(
            Text.Replace(
                Text.Combine(
                    List.Transform(
                        Text.Split(str, " "),
                        each if Text.Contains(_, ":") then sep & _ else _
                    ),
                    " "
                ),
                ", " & sep,
                sep
            ),
            0,
            Text.Length(sep)
        )
in
    Output
Example text transformation using separator !
Starting text:
Key1: Value1, name: surname, forename, emailAlias: alias1#domain
Split the string based on spaces into a list
Key1:
Value1,
name:
surname,
forename,
emailAlias:
alias1#domain
Prepend any list items containing : with separator !
!Key1:
Value1,
!name:
surname,
forename,
!emailAlias:
alias1#domain
Combine the list back into a string
!Key1: Value1, !name: surname, forename, !emailAlias: alias1#domain
Replace , ! with !
!Key1: Value1!name: surname, forename!emailAlias: alias1#domain
Remove the first separator
Key1: Value1!name: surname, forename!emailAlias: alias1#domain
Once you have this function defined, you can call it in a column transformation that would look something like
= Table.TransformColumns(#"Prev Step", {{"ColName", each Partition(_,"!") , type text}})
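Not part of the original answer, but to sketch the rest of the pipeline: a minimal follow-on query, assuming the function above is saved as a query named Partition and assuming a source table named Identities with a column named ExtendedAttributes (both hypothetical), that marks the boundaries, expands each pair to its own row and splits only at the first ": ":

let
    Source = Excel.CurrentWorkbook(){[Name = "Identities"]}[Content],
    // mark the pair boundaries with ! so it can be used as a delimiter
    Partitioned = Table.TransformColumns(Source, {{"ExtendedAttributes", each Partition(_, "!"), type text}}),
    // one row per Key:Value pair
    AsRows = Table.ExpandListColumn(
        Table.TransformColumns(Partitioned, {{"ExtendedAttributes", each Text.Split(_, "!")}}),
        "ExtendedAttributes"),
    // split each pair at the first ": " only, in case a value contains a later colon
    KeyValue = Table.SplitColumn(AsRows, "ExtendedAttributes",
        Splitter.SplitTextByEachDelimiter({": "}, QuoteStyle.None, false), {"Key", "Value"})
in
    KeyValue

From there the usual transpose/pivot steps described in the question should apply.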

You do not say anything about how the parts of each record outside the 'ExtendedAttributes' field should be handled... I prepared a function able to separate the area you put in discussion. Please use the following code:
Function separateKeys(x As String, sep As String) As String
    Dim arr1, arr2, i As Long
    arr1 = Split(x, ": ")                       ' pieces between "Key: " boundaries
    For i = 0 To UBound(arr1) - 1
        ' stop before touching the final piece (the last value, with no following key)
        If arr1(i + 1) = arr1(UBound(arr1)) Then Exit For
        arr2 = Split(arr1(i + 1), " ")
        ' the last word of each piece is the next key; replace the comma on the
        ' word just before it with sep, leaving earlier (multi-value) commas alone
        arr2(UBound(arr2) - 1) = Replace(arr2(UBound(arr2) - 1), ",", sep)
        arr1(i + 1) = Join(arr2, " ")
    Next
    separateKeys = Replace(Join(arr1, ":"), sep & " ", sep)
End Function
The above function can probably be adapted to skip the calculations for the rest of the file, or to also transform each comma into the sep character (simply using Replace).
In order to test the above function, please use the following test Sub:
Sub testSepKeys()
    Dim x As String, sep As String
    sep = "|" 'you can try something else, but improbable to appear in the processed text
    x = "Key1: Value1, name: surname, forename, emailAlias: alias1#domain, alias2#domain, alias3#domain, Key2: Value2, Value3, key3: Val1, Val2"
    Debug.Print separateKeys(x, sep)
End Sub
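If I trace the function correctly, the Immediate window should then show (note that the rejoin on ":" also drops the space after each colon):

Key1:Value1|name:surname, forename|emailAlias:alias1#domain, alias2#domain, alias3#domain|Key2:Value2, Value3|key3:Val1, Val2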
As a global way of working, I would suggest splitting the file on the line separator, processing all the array elements (lines) using the above (adapted) function, and finally joining them back on the line separator.
The newly created file should then be opened using Workbooks.OpenText with DataType:=xlDelimited and OtherChar:=sep.
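A minimal sketch of that open step (my illustration rather than the answerer's code; the file path is hypothetical, and OtherChar only takes effect when Other:=True):

Sub openSeparated()
    Const sep As String = "|"
    ' open the rewritten file, splitting columns on the custom separator only
    Workbooks.OpenText Filename:="C:\temp\identities_out.txt", _
        DataType:=xlDelimited, Other:=True, OtherChar:=sep, _
        Comma:=False, Tab:=False, Semicolon:=False, Space:=False
End Sub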
Please test the above function and send some feedback.

Related

Deleting a string between two carriage returns tsql

Very new to SQL so I appreciate your patience in advance.
I have a column in a table that stores a particular set of instructions; each instruction is encapsulated by a carriage return.
e.g.: char(13) + @instruction1 + char(13)...
@instruction1 is a string of variable length, but I do know a certain part of the string, e.g. @instruction1 = some string + @knownstring + some string.
So we have char(13) + (some string + @knownstring + some string) + char(13).
I want to replace this entire line with '', identifying it just by using the @knownstring.
Is this possible?
Thanking you all again, I really appreciate your assistance
select replace(replace(column, @knownstring, ''), char(13), '')
from table
where key = 1235
This replaces only the @knownstring, but I also need to replace the surrounding text between the two char(13)s.
You might try something along these lines:
DECLARE @KnownString VARCHAR(50) = 'Keep This';
DECLARE @YourString VARCHAR(MAX) = 'blah' + CHAR(13) + 'dummy keep this dummy more' + CHAR(13) + 'Something without the known part' + CHAR(13) + 'Again with Keep THIS';

SELECT STUFF(
(
    SELECT CHAR(13) + CASE WHEN CHARINDEX(@KnownString, LineText) > 0 THEN @KnownString ELSE LineText END
    FROM (SELECT CAST('<x>' + REPLACE(@YourString, CHAR(13), '</x><x>') + '</x>' AS XML)) A(Casted)
    CROSS APPLY Casted.nodes('/x') B(fragment)
    OUTER APPLY (SELECT fragment.value('text()[1]', 'nvarchar(max)')) C(LineText)
    FOR XML PATH(''), TYPE
).value('.', 'nvarchar(max)'), 1, 1, '');
The result
blah
Keep This
Something without the known part
Keep This
The idea
The string is transformed to XML by replacing the line breaks with XML tags. Now we can query all text lines separately, check them for the known string, do the needed manipulation, and finally reconcatenate all fragments using the XML-trick (together with STUFF to get rid of the leading CHAR(13)).
Remarks
On v2016 I'd use the split-string approach with OPENJSON, and starting with v2017 there is STRING_AGG() to make the reconcatenation easier (a sketch follows).
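Not from the original answer, but a minimal sketch of that newer approach (requires v2017+ for STRING_AGG; STRING_ESCAPE turns each CHAR(13) into the two-character sequence \r, which then serves as the split point):

DECLARE @KnownString VARCHAR(50) = 'Keep This';
DECLARE @YourString VARCHAR(MAX) = 'blah' + CHAR(13) + 'dummy keep this dummy more' + CHAR(13) + 'Again with Keep THIS';

SELECT STRING_AGG(CASE WHEN CHARINDEX(@KnownString, j.[value]) > 0
                       THEN @KnownString ELSE j.[value] END, CHAR(13))
       WITHIN GROUP (ORDER BY CAST(j.[key] AS int))
FROM OPENJSON('["' + REPLACE(STRING_ESCAPE(@YourString, 'json'), '\r', '","') + '"]') j;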

PostgreSQL COPY empty string as NULL does not work

I have a CSV file with an integer column whose empty values are saved as "" (an empty string).
I want to COPY them to a table as NULL value.
With Java code, I have tried these:
String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv, DELIMITER ',', HEADER true)";
String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv, DELIMITER ',', NULL '', HEADER true)";
I get: PSQLException: ERROR: invalid input syntax for type numeric: ""
String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv, DELIMITER ',', NULL '\"\"', HEADER true)";
I get: PSQLException: ERROR: CSV quote character must not appear in the NULL specification
Has anyone done this before?
I assume you are aware that numeric data types have no concept of an "empty string" (''). It's either a number or NULL (or 'NaN' for numeric - but not for integer et al.).
Looks like you exported from a string data type like text and had some actual empty string in there - which are now represented as "" - " being the default QUOTE character in CSV format.
NULL would be represented by nothing, not even quotes. The manual:
NULL
Specifies the string that represents a null value. The default is \N
(backslash-N) in text format, and an unquoted empty string in CSV format.
You cannot define "" to generally represent NULL since that already represents an empty string. Would be ambiguous.
To fix, I see two options:
Edit the CSV file / stream before feeding it to COPY and replace "" with nothing (see the Java sketch below). Might be tricky if you have actual empty strings in there as well - or "" escaping a literal " inside strings.
(What I would do.) Import to an auxiliary temporary table with identical structure except for the integer column converted to text. Then INSERT (or UPSERT?) to the target table from there, converting the integer value properly on the fly:
-- empty temp table with identical structure
CREATE TEMP TABLE tbl_tmp AS TABLE tbl LIMIT 0;
-- ... except for the int / text column
ALTER TABLE tbl_tmp ALTER col_int TYPE text;
COPY tbl_tmp ...;
INSERT INTO tbl -- identical number and names of columns guaranteed
SELECT col1, col2, NULLIF(col_int, '')::int -- list all columns in order here
FROM tbl_tmp;
Temporary tables are dropped at the end of the session automatically. If you run this multiple times in the same session, either just truncate the existing temp table or drop it after each transaction.
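The pre-editing route from option 1 could look roughly like this in Java (my sketch, not from the original answer: table and file names are hypothetical, and the naive replace ignores quoted commas inside fields and a "" in the very first field; it uses the pgjdbc CopyManager API):

import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyNullFix {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass")) {
            StringBuilder fixed = new StringBuilder();
            for (String line : Files.readAllLines(Paths.get("data.csv"))) {
                // strip "" fields so COPY reads them as unquoted empty, i.e. NULL;
                // loop so consecutive "" fields are caught too
                while (line.contains(",\"\",")) line = line.replace(",\"\",", ",,");
                if (line.endsWith(",\"\"")) line = line.substring(0, line.length() - 2);
                fixed.append(line).append('\n');
            }
            CopyManager cm = conn.unwrap(PGConnection.class).getCopyAPI();
            cm.copyIn("COPY tbl FROM STDIN (FORMAT csv, HEADER true)",
                      new StringReader(fixed.toString()));
        }
    }
}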
Related:
How to update selected rows with values from a CSV file in Postgres?
Rails Migrations: tried to change the type of column from string to integer
postgresql thread safety for temporary tables
Since Postgres 9.4 you have the ability to use FORCE_NULL. This causes a quoted empty string to be converted into NULL. Very handy, especially with CSV files (in fact this option is only allowed when using CSV format).
The syntax is as follow:
COPY table FROM '/path/to/file.csv'
WITH (FORMAT CSV, DELIMITER ';', FORCE_NULL (columnname));
Further details are explained in the documentation: https://www.postgresql.org/docs/current/sql-copy.html
If you want to replace all blank and empty values with NULL, you just have to add emptyasnull blanksasnull to the copy command (note these options belong to Amazon Redshift's COPY, as the iam_role and manifest options below suggest, not to core Postgres).
Syntax:
copy Table_name (columns_list)
from 's3://{bucket}/{s3_bucket_directory_name + manifest_filename}'
iam_role '{REDSHIFT_COPY_COMMAND_ROLE}' emptyasnull blanksasnull
manifest DELIMITER ',' IGNOREHEADER 1 compupdate off csv gzip;
Note: this applies to all records that contain empty/blank values.

New line symbol when exporting to excel

I need to fill a cell with data separated by a 'new line' symbol.
I've tried:
data: l_con_sepa TYPE c VALUE cl_abap_char_utilities=>newline.
...
CONCATENATE <gf_aufk>-tplnr    " 6000000159 Korchagin AS 02.02.2017
            <gf_aufk>-pltxt
            l_con_sepa
            <gf_aufk>-aufnr
       INTO lv_str
  SEPARATED BY space.
I tried to use CL_ABAP_CHAR_UTILITIES=>CR_LF. I tried the "&" and "#" symbols. I tried to wrap lv_str in quotes. Nothing worked: I either got the symbols as-is, or just a blank space instead of the Alt+Enter equivalent.
A simple experiment with Excel, namely creating a cell with Alt+Enter characters and saving it as a CSV file, shows that such a new line symbol is LF and not CR_LF. Moreover it is put there in double quotes.
So just use double quotes and CL_ABAP_CHAR_UTILITIES=>NEWLINE.
It must work with CSV. You did not specify what API you use to export your data to XLS format, so I cannot test it. If you do not mind putting those details in the question, please do so.
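For the CSV route, a minimal sketch (my illustration, not from the original answer; assumes a 7.40+ system for string templates):

DATA(lv_cell) = |"line1{ cl_abap_char_utilities=>newline }line2"|.
" written to a .csv file, the quoted LF shows up in Excel like Alt+Enter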
Assuming you use FM SAP_CONVERT_TO_XLS_FORMAT, there is even no need for double quotes.
REPORT YYY.

TYPES: BEGIN OF gty_my_type,
         col1 TYPE char255,
         col2 TYPE char255,
       END OF gty_my_type,
       gtty_my_type TYPE STANDARD TABLE OF gty_my_type WITH EMPTY KEY.

START-OF-SELECTION.
  DATA(gt_string_table) = VALUE gtty_my_type(
    ( col1 = 'aaa'
        && cl_abap_char_utilities=>newline
        && 'bbb'
        && cl_abap_char_utilities=>newline
        && 'ccc'
      col2 = 'ddd' )
  ).

  CALL FUNCTION 'SAP_CONVERT_TO_XLS_FORMAT'
    EXPORTING
      i_filename        = 'D:\temp\abap.xlsx'
    TABLES
      i_tab_sap_data    = gt_string_table
    EXCEPTIONS
      conversion_failed = 1
      OTHERS            = 2.
  ASSERT sy-subrc = 0.
The resulting cell contains aaa, bbb and ccc on separate lines.
I thought that it might be caused by CONCATENATE .. INTO .. SEPARATED BY space but it is not. Please execute the following program in order to check it out.
REPORT YYY.

TYPES: BEGIN OF gty_my_type,
         col1 TYPE char255,
         col2 TYPE char255,
       END OF gty_my_type,
       gtty_my_type TYPE STANDARD TABLE OF gty_my_type WITH EMPTY KEY.

DATA: gs_string       TYPE gty_my_type.
DATA: gt_string_table TYPE gtty_my_type.

START-OF-SELECTION.
  CONCATENATE 'aaa' cl_abap_char_utilities=>newline 'bbb' cl_abap_char_utilities=>newline 'ccc'
         INTO gs_string-col1 SEPARATED BY space.
  gs_string-col2 = 'ddd'.
  APPEND gs_string TO gt_string_table.

  CALL FUNCTION 'SAP_CONVERT_TO_XLS_FORMAT'
    EXPORTING
      i_filename        = 'D:\temp\abap.xlsx'
    TABLES
      i_tab_sap_data    = gt_string_table
    EXCEPTIONS
      conversion_failed = 1
      OTHERS            = 2.
  ASSERT sy-subrc = 0.
So the problem must be somewhere else. You are not showing us your whole code. Maybe you use some kind of a third party package to process your Excel files?
I don't remember whether it's necessary to add an "end of line" symbol.
Just append each line to a table and download the full table using FM SAP_CONVERT_TO_XLS_FORMAT.

Populating a wide table with SSRS text parameter with a delimiter

I am trying to populate a table variable in SSRS and subsequently call a stored procedure to process the data in it:
DECLARE @Tbl1 TABLE
(
D01 float,
D02 float,
D03 float,
D04 float,
D05 float,
...
D96 float
)
To populate it I use a text parameter @LS. The input is a comma-delimited string with 96 elements:
0.635316969,0.756943899,0.890520142,1.028008362,1.166350106,1.30511861,1.444527254,1.580948571,1.578743639,1.575542931,1.573195746,1.571346448,1.571275321,1.56992391,1.568003484,1.567221089,1.556836567,1.543820351,1.53037, ...., ,0.514543561
In a dataset I tried to populate the table first (after table variable declaration):
insert into @Tbl1
VALUES (@LS)
But got this error at run-time: "Column name or number of supplied values does not match table definition."
I tried JOIN(SPLIT()) with comma without luck. Any ideas?
Thanks!
The problem is that the @LS parameter is a single-value text parameter, so you can't use it like that - you'd need to use a multi-value parameter.
So let's try something different. You don't need to create a temporary table because you could build your column values to give a dataset that you want using Sql like this:
SELECT 0.635316969 AS D01, 0.756943899 AS D02, ... , 0.514543561 AS D96
Fortunately almost everything in SSRS is an expression, so we just need to build this Sql statement dynamically from the @LS parameter using an expression. Go to the Report menu, then Report Properties... and click the Code tab. Enter the following code:
Function MakeSql(LS As String) As String
    Dim Sql As String
    Dim Values() As String
    Dim i As Integer
    Sql = "SELECT "
    Values = Split(LS, ",")
    For i = 0 To Values.Length - 1
        Sql = Sql + Values(i) + " AS D" + Right("0" + CStr(i + 1), 2) + ", "
    Next i
    Sql = Left(Sql, Len(Sql) - 2) ' Remove trailing comma
    Return Sql
End Function
So what we are doing is splitting the string into an array of values which we loop through to create a Sql statement that aliases these values to the field names we want.
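For instance, with the first three values of the example parameter above, the generated Sql would be:

SELECT 0.635316969 AS D01, 0.756943899 AS D02, 0.890520142 AS D03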
Right-click your dataset, choose Dataset Properties and press the fx button beside the query textbox. This allows us to enter a text expression for our Sql statement rather than an actual Sql statement. Here we need to call the custom code function we created above which will insert our custom built Sql expression:
=Code.MakeSql(Parameters!LS.Value)
Make sure your dataset has the fields D01 to D96 (you'll have to set these up manually because SSRS can't analyse the Sql expression to determine the field values) and you're done!

Groovy String Concatenation

Current code:
row.column.each { column ->
    println column.attributes()['name']
    println column.value()
}
Column is a Node that has a single attribute and a single value. I am parsing an XML file to create insert statements for Access. Is there a Groovy way to create the following structured statement:
Insert INTO tablename (col1, col2, col3) VALUES (1,2,3)
I am currently storing the attribute and value to separate arrays then popping them into the correct order.
I think it can be a lot easier in Groovy than the currently accepted answer. The collect and join methods are built for this kind of thing; join takes care of the concatenation automatically and does not put a trailing comma on the string.
def names = row.column.collect { it.attributes()['name'] }.join(",")
def values = row.column.collect { it.value() }.join(",")
def result = "INSERT INTO tablename($names) VALUES($values)"
You could just use two StringBuilders. Something like this, which is rough and untested:
def columns = new StringBuilder("Insert INTO tablename(")
def values = new StringBuilder("VALUES (")
row.column.each { column ->
    columns.append(column.attributes()['name'])
    columns.append(", ")
    values.append(column.value())
    values.append(", ")
}
// chop off the trailing commas (setLength keeps them as StringBuilders), add the closing parens
columns.setLength(columns.length() - 2)
columns.append(") ")
values.setLength(values.length() - 2)
values.append(")")
columns.append(values)
def result = columns.toString()
You can find all sorts of Groovy string manipulation operators here.
