How do you choose the last string in a String value?

There are two tables:
Student(sid integer, sname varchar(20), programme varchar(4), level integer, age integer)
Tutor(tid integer, tname varchar(20))
The task is to find the names of all IT students who are enrolled in a class taught by a tutor whose surname is Hoffman.
This is what I have so far:
SELECT s.sname
FROM student s, tutor t
WHERE s.programme = 'IT' AND t.tname LIKE '%Hoffman';
However, the '%Hoffman' part will only search for a string that contains 'Hoffman', so technically, if there were anyone with Hoffman as a first name, the query would return his data too.
The tname column is of type String and consists of a first name, a space, and a last name, with possible middle names in between.
How do you select the last string (word) of a value in t.tname in this case?

You could use something like this:
WHERE s.programme = 'IT' AND RIGHT(t.tname, 7) = 'Hoffman'
since 'Hoffman' has 7 characters.
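Note that RIGHT() with a fixed length still matches a longer surname that merely ends in those letters (e.g. 'Kauffhoffman'). A minimal sketch of a more robust pattern match, assuming an enrolment table links students to tutors (the schema in the question shows none, so that table and its columns are hypothetical):

SELECT s.sname
FROM student s
JOIN enrolment e ON e.sid = s.sid   -- hypothetical linking table, not in the schema above
JOIN tutor t ON t.tid = e.tid
WHERE s.programme = 'IT'
  AND t.tname LIKE '% Hoffman';     -- leading space: 'Hoffman' must be the whole last word

The space in the pattern forces 'Hoffman' to be a complete final word, so first names and longer surnames no longer match.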


How to cast String into int in PostgreSQL

I want to cast a string into an integer.
I have a table like this.
Have:
ID Salary
1 "$1,000"
2 "$2,000"
Want:
ID Salary
1 1000
2 2000
My query:
Select Id, cast(substring(Salary,2, length(salary)) as int)
from have
I am getting an error:
ERROR: invalid input syntax for type integer: "1,000"
SQL state: 22P02
Can anyone please provide some guidance on this.
Remove all non-digit characters, then cast the result to an integer:
regexp_replace(salary, '[^0-9]+', '', 'g')::int
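Applied to the table from the question, a minimal sketch:

select id,
       regexp_replace(salary, '[^0-9]+', '', 'g')::int as salary
from have;

which returns 1000 and 2000 as proper integers.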
But instead of trying to convert the value every time you select it, fix your database design and convert the column to a proper integer. Never store numbers in text columns.
alter table bad_design
alter salary type int using regexp_replace(salary, '[^0-9]+', '', 'g')::int;
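If salary can be NULL or hold no digits at all, casting the resulting empty string would fail; a hedged refinement is to guard the cast with nullif():

alter table bad_design
    alter salary type int
    using nullif(regexp_replace(salary, '[^0-9]+', '', 'g'), '')::int;  -- '' becomes NULL instead of failing the cast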

Need initial N characters of column in Postgres where N is unknown

I have one column in my table in Postgres, let's say employeeId. We do some modification based on the employee type and store it in the DB. Basically, we append strings from these 4 strings ('ACR', 'AC', 'DCR', 'DC') after the employeeId. We can have any combination of these 4 strings appended, for example EMPIDACRDC, EMPIDDCDCRAC etc. These are valid combinations. I need to retrieve the EMPID from this. The EMPID length is not fixed, and the column is of character varying type. How can this be done in Postgres?
I am not entirely sure I understand the question, but regexp_replace() seems to do the trick:
with sample (employeeid) as (
    values
        ('1ACR'),
        ('2ACRDCR'),
        ('100DCRAC')
)
select employeeid,
       regexp_replace(employeeid, '(ACR|AC|DCR|DC)+$', '', 'i') as clean_id
from sample
returns:
employeeid | clean_id
-----------+---------
 1ACR      | 1
 2ACRDCR   | 2
 100DCRAC  | 100
The regular expression matches one or more of those codes anchored at the end of the string, and that match is then replaced with nothing. This however won't work if the actual empid itself ends with one of those codes.
It would be much cleaner to store this information in two columns: one for the empid and one for those "codes".
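For example, a hedged sketch of that split using the same sample data (the empid and codes column names are assumptions):

with sample (employeeid) as (
    values ('1ACR'), ('2ACRDCR'), ('100DCRAC')
)
select regexp_replace(employeeid, '(ACR|AC|DCR|DC)+$', '', 'i') as empid,  -- everything before the codes
       substring(employeeid from '(?:ACR|AC|DCR|DC)+$')         as codes   -- the appended codes themselves
from sample;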

Using U-SQL to eliminate duplicate and null values in one specific column while keeping a 2nd column properly aligned

I am trying to use U-SQL to remove duplicate, null, '', and NaN cells in a specific column called "Function" of a CSV file. I also want to keep the Product column correctly aligned with the Function column after the blank rows are removed, so I would want to remove the same rows in the Product column as I do in the Function column to keep them properly aligned. I want to keep only one occurrence of a duplicate Function row, in this case the very first occurrence. The Product column has no empty cells and has all unique values.

I know this can be done in a much easier way, but I want to use code to automate the process as the data in the Data Lake changes over time. I think I am somewhat close in the code I currently have. The actual data set is a very large file, and I am fairly certain that there are at least 4 duplicate values in the Function column that aren't simply empty cells. I need to eliminate both duplicate values and empty cells in the Function column, because empty cells are being recognized as duplicates as well. I want to be able to use the Function values as a primary key in the next step of my school project, which won't include the Product column. Any help is greatly appreciated.
DECLARE #inputfile string = "/input/Function.csv";
//DECLARE #OutputUserFile string = "/output/Test_Function/UniqueFunction.csv";

#RawData =
    EXTRACT Function string,
            Product string
    FROM #inputfile
    USING Extractors.Csv(encoding: Encoding.[ASCII]);

// Query from Function data
// Set ROW_NUMBER() of each row within the window partitioned by Function field
#RawDataDuplicates =
    SELECT ROW_NUMBER() OVER (PARTITION BY Function) AS RowNum,
           Function AS function
    FROM #RawData;

// ORDER BY Function to see duplicate rows next to one another
#RawDataDuplicates2 =
    SELECT *
    FROM #RawDataDuplicates
    ORDER BY function
    OFFSET 0 ROWS;

// Write to file
//OUTPUT #RawDataDuplicates2
//TO "/output/Test_Function/FunctionOver-Dups.csv"
//USING Outputters.Csv();

// GROUP BY and count # of duplicates per Function
#groupBy =
    SELECT Function,
           COUNT(Function) AS FunctionCount
    FROM #RawData
    GROUP BY Function
    ORDER BY Function
    OFFSET 0 ROWS;

// Write to file
//OUTPUT #groupBy
//TO "/output/Test_Function/FunctionGroupBy-Dups.csv"
//USING Outputters.Csv();

#RawDataDuplicates3 =
    SELECT *
    FROM #RawDataDuplicates2
    WHERE RowNum == 1;

OUTPUT #RawDataDuplicates3
TO "/output/Test_Function/FunctionUniqueEmail.csv"
USING Outputters.Csv(outputHeader: true);

//OUTPUT #RawData
//TO #OutputUserFile
//USING Outputters.Csv(outputHeader: true);
I have also commented out some code that I don't necessarily need. When I run the code as it is, I currently get this error: E_CSC_USER_REDUNDANTSTATEMENTINSCRIPT, Error Message: This statement is dead code.
It does not give a line number, but it is likely the "Function AS function" line?
Here is a sample file that is a small slice of the full spreadsheet and only includes data in the 2 relevant columns. The full spreadsheet has data in all columns.
https://www.dropbox.com/s/auu2aco4b037xn7/Function.csv?dl=0
Here is a screenshot of the output I get when I follow wBob's advice.
You can apply a series of transformations to your data, using string functions like .Length and ranking functions like ROW_NUMBER to remove the records you want, for example:
#input =
    EXTRACT CompanyID string,
            division string,
            store_location string,
            International_Id string,
            Function string,
            office_location string,
            address string,
            Product string,
            Revenue string,
            sales_goal string,
            Manager string,
            Country string
    FROM "/input/input142.csv"
    USING Extractors.Csv(skipFirstNRows : 1);

// Remove rows where Function is empty
#working =
    SELECT *
    FROM #input
    WHERE Function.Length > 0;

// Rank the rows by Function and keep only the first one per Function value
#working =
    SELECT CompanyID,
           division,
           store_location,
           International_Id,
           Function,
           office_location,
           address,
           Product,
           Revenue,
           sales_goal,
           Manager,
           Country
    FROM
    (
        SELECT *,
               ROW_NUMBER() OVER(PARTITION BY Function ORDER BY Product) AS rn
        FROM #working
    ) AS x
    WHERE rn == 1;

#output = SELECT * FROM #working;

OUTPUT #output
TO "/output/output.csv"
USING Outputters.Csv(quoting : false);
My results:

Table column split by value to other columns and its values

I have a table that looks like this, and it represents specifications for products: the 1st column is the SKU and serves as an ID, and the 2nd column is the specifications: a title, a value, and 0 or 1 as an optional parameter (1 is the default if it is missing), separated by "~", with each option separated by "^".
I want to split it into a table with the SKU and each specification title as a column header, with the value as its value.
I managed to write this code to split it into records with the specifications divided, but I am stuck on separating the title from the value for each specification and record, and am now looking for help with this:
let
    Source = Excel.CurrentWorkbook(){[Name="Таблица1"]}[Content],
    Type = Table.TransformColumnTypes(Source, {{"Part Number", type text}, {"Specifications", type text}}),
    #"Replaced Value" = Table.ReplaceValue(Type, "Specification##", "", Replacer.ReplaceText, {"Specifications"}),
    SplitByDelimiter = (table, column, delimiter) =>
        let
            Count = List.Count(List.Select(Text.ToList(Table.Column(table, column){0}), each _ = delimiter)) + 1,
            Names = List.Transform(List.Numbers(1, Count), each column & "." & Text.From(_)),
            Types = List.Transform(Names, each {_, type text}),
            Split = Table.SplitColumn(table, column, Splitter.SplitTextByDelimiter(delimiter), Names),
            Typed = Table.TransformColumnTypes(Split, Types)
        in
            Typed,
    Split = SplitByDelimiter(#"Replaced Value", "Specifications", "^"),
    Record = Table.ToRecords(Split)
in
    Record
Ok, I hope you still need this, as it took the whole evening. :))
Quite interesting task I must say!
I assume that "~1" is always combined with "^", so "~1^" always ends a field's value. I also assume that there are no ":" characters in the values, as all colons are removed.
IMO, you don't need to use Table.SplitColumn function at all.
let
    //replace it with your Excel.CurrentWorkbook(){[Name="Таблица1"]}[Content],
    Source = #table(type table [Part Number = Int64.Type, Specifications = text],
        {{104, "Model:~104~1^Type:~Watch~1^Metal Type~Steel~1"},
         {105, "Model:~105~1^Type:~Watch~1^Metal Type~Titanium~1^Gem Type~Ruby~1"}}),
    //I don't know why you replace these values, do you really need this?
    ReplacedValue = Table.ReplaceValue(Source, "Specification##", "", Replacer.ReplaceText, {"Specifications"}),
    TransformToLists = Table.TransformColumns(Source,
        {"Specifications", each List.Transform(
            List.Select(Text.Split(_ & "^", "~1^"), each _ <> ""),
            each Text.Split(Text.Replace(_, ":", ""), "~"))}),
    ConvertToTable = Table.TransformColumns(TransformToLists,
        {"Specifications", each Table.PromoteHeaders(Table.FromColumns(_))}),
    ExpandedSpecs = Table.TransformRows(ConvertToTable,
        (x) => Table.ExpandTableColumn(Table.FromRecords({x}), "Specifications", Table.ColumnNames(x[Specifications]))),
    UnionTables = Table.Combine(ExpandedSpecs),
    Output = UnionTables
in
    Output
UPDATE:
How it works (skipping obvious steps):
TransformToLists: TransformColumns takes a table and a list of column names with functions applied to those columns' values. Here it applies several nested functions to the value of the "Specifications" field of each row. These functions do the following: List.Select returns the list of non-empty values, which in turn was obtained by applying the Text.Split function to the value of the "Specifications" field with its ":"s removed:
Text.Split(
    Text.Replace(_, ":", ""),
    "~")
The "each" keyword means that the following function is applied to every processed value (it can be a field, column, row/record, list item, text, function, etc.), which is indicated by the underscore sign. This underscore can be replaced with a function:
each _ equals (x) => some_function_that_returns_corresponding_value(x)
So,
each Text.Replace(_, ":", "")
equals
(x) => Text.Replace(x, ":", "") //where x is any variable name.
//You can go further and make it typed, although generally it is not needed:
(x as text) => Text.Replace(x, ":", "")
//You can also write a custom function and use it:
(x as text) => CustomFunctionName(x)
That said, the TransformToLists step returns a table with two columns: "Part Number" and "Specifications", the latter containing a list of lists. Each of these lists has two values: a column name and its value. This happens because the initial value in the "Specifications" field has to be split twice: first it is split into pairs by "~1^", and then each pair is split by "~". So now we have a column name and its value in each nested list, and we have to convert them all into a single table.
ConvertToTable: We apply TransformColumns again, using a function for each row's "Specifications" field (remember, a list of lists). We use Table.FromColumns, as it takes a list of lists as an argument and returns a table where the 1st row holds the column headers and the 2nd row their values. Then we promote the 1st row to headers. Now we have a table whose "Specifications" field contains a nested table with a variable number of columns, and we have to put them all together.
ExpandedSpecs: Table.TransformRows applies a transformation function to every row (as a record) in a table (in the code the row is named x). You can write your own custom function, as I did:
= Table.ExpandTableColumn( //this function expands nested table. It needs a table, but "x" that we have is a record. So we do the conversion:
Table.FromRecords({x}) //function takes a list of records, so {x} means a list with single value of x
, "Specifications" //Column to expand
, Table.ColumnNames(x[Specifications]) //3rd argument is a list of resulting columns. It takes table as an argument, and table is contained within "Specifications" field.
)
It returns a list of tables (each having a single row), which we combine using Table.Combine at the UnionTables step. This results in a table having all the columns from the combined tables, with nulls where a table lacks one of the columns.
Hope it helps. :)
A TextToColumns VBA solution would be much simpler, if I understand what you are asking. See MSDN for Range.TextToColumns.

Reading columns with different valuetypes

A SliceQuery<Long, String, String> says the key type is long, the column name is string, and the column value is string. When I execute the slice query using QueryResult<ColumnSlice<String, String>> I can get all the columns, but since the key is long, there must be a column whose value is of long type. It's a bit confusing for me to see how type safety works here (since the query result will get a column type).
Also, if there is a column with a value type other than string, then a problem must arise.
How can I have a generic SliceQuery that can be used to query columns of different value types?
P.S : I am new to cassandra/hector.
Thanks
Almost. The first type parameter is the type of the row key, as you point out, but the row key is not stored as a column. The row key is stored off in some other special place. This is one of those gotchas that folks coming from the relational DB world (like me) trip over.
As for how to manage column values with different types, there's a two-pronged approach. First, you store the value as a byte array and serialize it yourself. Second, you key off the column name to tell you which column - and therefore which value type - you're dealing with. Once you know the correct type, you can use the appropriate Serializer to deserialize the byte value into a variable of the correct type. For your own complex objects and special types, you can write your own serializers.
