MySQL: Using GROUP BY with string literals

I have the following table with these columns:
shortName, fullName, ChangelistCount
Is there a way to group them by a string literal within their fullName? The fullname represents file directories, so I would like to display results for certain parent folders instead of the individual files.
I tried something along the lines of:
GROUP BY fullName like "%/testFolder/%" AND fullName like "%/testFolder2/%"
However it only really groups by the first match....
Thanks!

Perhaps you want something like:
GROUP BY IF(fullName LIKE '%/testfolder/%', 1, IF(fullName LIKE '%/testfolder2/%', 2, 3))
The key idea is that an expression like fullName LIKE foo AND fullName LIKE bar necessarily evaluates, as a whole, to either TRUE or FALSE, so you can only get two groups out of it.
Using an IF expression to return one of several different values will let you get more groups.
Keep in mind that this will not be particularly fast. If you have a very large dataset, you should explore other ways of storing the data that will not require LIKE comparisons to do the grouping.
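To make the idea concrete, here is a minimal sketch that runs the same kind of grouping against an in-memory SQLite database from Python. It is only a stand-in for MySQL: the table name files and the sample rows are made up, and it uses CASE WHEN (which both MySQL and SQLite accept) in place of the nested IF() calls.

```python
import sqlite3

# Hypothetical sample rows; only the column names come from the question.
sample_rows = [
    ("a.c", "/src/testFolder/a.c", 3),
    ("b.c", "/src/testFolder/b.c", 2),
    ("c.c", "/src/testFolder2/c.c", 5),
    ("d.c", "/src/other/d.c", 1),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE files (shortName TEXT, fullName TEXT, ChangelistCount INTEGER)"
)
con.executemany("INSERT INTO files VALUES (?, ?, ?)", sample_rows)

# Each row is mapped to exactly one label, and the label is what we group on.
grouped = con.execute("""
    SELECT CASE
             WHEN fullName LIKE '%/testFolder/%'  THEN 'testFolder'
             WHEN fullName LIKE '%/testFolder2/%' THEN 'testFolder2'
             ELSE 'other'
           END AS folder,
           SUM(ChangelistCount) AS total
    FROM files
    GROUP BY folder
    ORDER BY folder
""").fetchall()
con.close()

print(grouped)
```

Because the expression returns one of three labels rather than TRUE/FALSE, the query yields up to three groups.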

You'd have to use a subquery to derive the column values you'd like to ultimately group on:
FROM (SELECT SUBSTR(fullname, ?) AS derived_column
      FROM YOUR_TABLE) x
GROUP BY x.derived_column

Either use WHEN/THEN conditions, or have another temporary table containing all the matches you wish to find and group on. Here is a sample from my database.
In this case I wanted to group all users based on their cities, which were inside the address field.
SELECT ut.* , c.city, ua.*
FROM `user_tracking` AS ut
LEFT JOIN cities AS c ON ut.place_name LIKE CONCAT( "%", c.city, "%" )
LEFT JOIN users_auth AS ua ON ua.id = ut.user_id
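Applied to the folder question, the lookup-table idea might look roughly like the sketch below. It again uses SQLite from Python as a stand-in; the folders table, its name column, and the sample rows are assumptions for illustration, and it uses an inner join plus GROUP BY rather than the LEFT JOIN shown above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE files (fullName TEXT, ChangelistCount INTEGER);
    CREATE TABLE folders (name TEXT);  -- hypothetical lookup table of folders
    INSERT INTO files VALUES ('/src/testFolder/a.c', 3);
    INSERT INTO files VALUES ('/src/testFolder/b.c', 2);
    INSERT INTO files VALUES ('/src/testFolder2/c.c', 5);
    INSERT INTO folders VALUES ('testFolder');
    INSERT INTO folders VALUES ('testFolder2');
""")

# Join each file to the folder whose name appears as a path segment,
# then aggregate per folder.
per_folder = con.execute("""
    SELECT f.name, SUM(t.ChangelistCount)
    FROM files AS t
    JOIN folders AS f ON t.fullName LIKE '%/' || f.name || '/%'
    GROUP BY f.name
    ORDER BY f.name
""").fetchall()
con.close()

print(per_folder)
```

Adding a new folder to report on then only requires a new row in the lookup table, not a change to the query.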

how do I get rid of leading/trailing spaces in SAS search terms?

I have had to look up hundreds (if not thousands) of free-text answers on google, making notes in Excel along the way and inserting SAS-code around the answers as a last step.
The output looks like this:
This output contains an unnecessary number of blank spaces, which seems to confuse SAS's search to the point where the observations can't be properly located.
It works if I manually erase the superfluous spaces, but that will probably take hours. Is there an automated fix for this, either in SAS or in Excel?
I tried using the STRIP-function, to no avail:
else if R_res_ort_txt=strip(" arild ") and R_kom_lan=strip(" skåne ") then R_kommun=strip(" Höganäs " );
If you want to generate a string like:
if R_res_ort_txt="arild" and R_kom_lan="skåne" then R_kommun="Höganäs";
from three variables, let's call them A B C, then just use code like:
string=catx(' ','if R_res_ort_txt=',quote(trim(A))
,'and R_kom_lan=',quote(trim(B))
,'then R_kommun=',quote(trim(C)),';') ;
Or if you are just writing that string to a file just use this PUT statement syntax.
put 'if R_res_ort_txt=' A :$quote. 'and R_kom_lan=' B :$quote.
'then R_kommun=' C :$quote. ';' ;
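If the statements are being generated outside SAS, the same "trim, then quote" logic can be sketched in a few lines of Python. This is only an illustration of what the catx/quote/trim call above produces; the function name sas_rule is made up, and doubling embedded double quotes is meant to mimic what SAS's quote() does.

```python
def sas_rule(ort: str, lan: str, kommun: str) -> str:
    """Build one SAS rule from three free-text cells (hypothetical helper)."""
    # Strip surrounding blanks and double any embedded double quotes.
    a, b, c = (s.strip().replace('"', '""') for s in (ort, lan, kommun))
    return f'if R_res_ort_txt="{a}" and R_kom_lan="{b}" then R_kommun="{c}";'


rule = sas_rule(" arild ", " skåne ", " Höganäs ")
print(rule)
```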
A saner solution would be to continue using the free-text answers as data and perform your matching criteria for transformations with a left join.
proc import out=answers datafile='my-free-text-answers.xlsx';
data have;
attrib R_res_ort_txt R_kom_lan length=$100;
input R_res_ort_txt ...;
datalines4;
... whatever all those transforms will be performed on...
;;;;
proc sql;
create table want as
select
have.* ,
answers.R_kommun_answer as R_kommun
from
have
left join
answers
on
have.R_res_ort_txt = answers.res_ort_answer
& have.R_kom_lan = answers.kom_lan_answer
;
I solved this by adding quotes in Excel using the Flash Fill function:
https://www.youtube.com/watch?v=nE65QeDoepc

ArangoDB AQL: Find Gaps In Sequential Data

I've been given data to build an application that has sequential data in the form of part numbers of products: "000000", "000001", "000002", "000010", "000011" .... The previous application was an old MS Access database that didn't have any gap filling features in the part number generator, hence the gap between "000002" and "000010" (Yes, they are also strings, but I can work with that...).
We could continue to increment based on the last value and ignore the gaps, however, in an attempt to use all numbers available to us with our naming scheme, we'd like to be able to fill the gaps. Our naming scheme describes the "product family" with the first two digits such that: [00]0000 would be a different family from [02]0000.
I can find the starting and ending values using something like:
let query = `
  LET first = (
    MIN(
      FOR part in part_search
        SEARCH STARTS_WITH(part.PartNumber, @family)
        RETURN part.PartNumber
    )
  )
  LET last = (
    MAX(
      FOR part in part_search
        SEARCH STARTS_WITH(part.PartNumber, @family)
        RETURN part.PartNumber
    )
  )
  RETURN { first, last }
`
The above example returns: {first: "000000", last: "000915"}
Using ArangoDB and AQL, how could I go about finding these gaps? I've found some SQL examples but I feel the features of AQL are a bit more limiting.
Thanks in advance!
To start with, I think your best bet for getting min/max values is using aggregates:
FOR part in part_search
  SEARCH STARTS_WITH(part.PartNumber, @family)
  COLLECT x = 1
  AGGREGATE first = MIN(part.PartNumber), last = MAX(part.PartNumber)
  RETURN {
    first: first,
    last: last
  }
But that won't really help when trying to find gaps. And you're right - SQL has several logical constructs that could help (like using variables and cursor iteration), but even that would be a pattern I would discourage.
The better path might be to do a "brute force" approach - compare a table containing your existing numbers with a table of all numbers, using a native method like JOIN to find the difference. Here's how you might do that in AQL:
LET allNumbers = 0..9999
LET existingParts = (
  FOR part in part_search
    SEARCH STARTS_WITH(part.PartNumber, @family)
    LET childId = RIGHT(part.PartNumber, 4)
    RETURN TO_NUMBER(childId)
)
RETURN MINUS(allNumbers, existingParts)
The x..y construct creates a sequence (an array of numbers), which we use as the full set of possible numbers. Then, we want to return only the "non-family" part of the ID (I'm calling it "child"), which needs to be numeric to compare with the previous set. Then, we use MINUS to remove elements of existingParts from the allNumbers list.
One thing to note: that query returns only the "child" portion of the part number, so you would have to join it back to the family number later. Alternatively, you could skip the string-splitting and get "fancy" with your list creation:
LET allNumbers = TO_NUMBER(CONCAT(@family, '0000'))..TO_NUMBER(CONCAT(@family, '9999'))
LET existingParts = (
  FOR part in part_search
    SEARCH STARTS_WITH(part.PartNumber, @family)
    RETURN TO_NUMBER(part.PartNumber)
)
RETURN MINUS(allNumbers, existingParts)
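For anyone who wants to sanity-check the set-difference logic outside the database, here is a small Python sketch of the same idea. The function name find_gaps and the sample part numbers are made up for illustration; it assumes a two-digit family prefix followed by a four-digit child number, as described in the question, and it re-attaches the family prefix with zero padding.

```python
def find_gaps(part_numbers, family, width=4):
    """Return unused part numbers in `family` as zero-padded strings."""
    # Numeric "child" portion of every existing part in this family.
    used = {
        int(p[len(family):])
        for p in part_numbers
        if p.startswith(family)
    }
    # Everything in 0 .. 10**width - 1 that is not already used.
    return [
        f"{family}{n:0{width}d}"
        for n in range(10 ** width)
        if n not in used
    ]


existing = ["000000", "000001", "000002", "000010", "000011", "020000"]
gaps = find_gaps(existing, "00")
print(gaps[:8])
```

The first entry of the result is the lowest free number in the family, which is what a gap-filling generator would hand out next.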

How to search influxdb by tag = list of values?

I have an InfluxDB test data set that looks like this:
tag: user. Values can be user1, user2, ... user1000;
field: value. Values are random numbers between 0 and 100.
Let's say I have 1,000 users, and 500 belong to group 1 and the rest belong to group 2. If I want to do something like this: select count(value) from where user in list of (user1, user2, ... user 500), how can I efficiently do this?
I think you're looking for regular expressions, aka "regex".
In your case, this is an awkward way to do it, but should get you close. A more elegant and maintainable version is left as an exercise. :)
We have to be careful that we correctly match e.g. user61 but not user610, so we need separate search terms for single-digit user ids, two-digit ids, etc. $ means "end of line", in this case the end of the tag value.
SELECT count(value) from my_measurement WHERE
(user =~ /user[0-9]$/ OR -- this matches user0 to user9
user =~ /user[0-9][0-9]$/ OR -- user10 to user99
user =~ /user[1-4][0-9][0-9]$/ OR -- user100 to user499
user = 'user500')
GROUP BY user
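Before running this against real data, it can help to check the patterns offline. The sketch below uses Python's re module (not InfluxDB itself) to confirm that the four alternatives together select user1 through user500 and nothing above that; the combined pattern and the added ^ anchor are my own condensation of the query above, not something InfluxDB requires.

```python
import re

# The four alternatives from the query, folded into one anchored pattern.
group1 = re.compile(r"^user([0-9]|[0-9][0-9]|[1-4][0-9][0-9]|500)$")

all_users = [f"user{i}" for i in range(1, 1001)]
matched = [u for u in all_users if group1.search(u)]

print(len(matched), matched[0], matched[-1])
```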

How to remove all characters before a specific character in Cognos Report Studio 10.2

I have columns with different company names. In front of each company name there is a Company_ID. After the Company_ID, a specific character (_) separates the ID from the name. For example, I have:
111_Mercedes
11B4324_Apple
38A_Google
A1ZH8_Airline
I would like to remove all characters including the specific character.
Result should be
Mercedes
Apple
Google
Airline
Thanks in advance
If this is all in one data item and you need a pattern removed, try this:
As an example, 111_Mercedes 11B4324_Apple 38A_Google
Each name starts after an _ and ends with a space.
Because of this, we can use the replace function to set up the process in two steps:
1) Wrap the undesired portion in brackets
The SQL would look like this:
select
concat('<',replace(
replace('111_Mercedes 11B4324_Apple 38A_Google',' ','<')
,'_','>'))
FROM sysibm.sysdummy1
The result would look like
<111>Mercedes<11B4324>Apple<38A>Google
2) Then remove the content in the brackets
Sql would look like this:
Select trim(REGEXP_REPLACE(
'<111>Mercedes<11B4324>Apple<38A>Google'
, '<(.*?)>',' ',1,0,'c'))
FROM sysibm.sysdummy1
The result would look like this
Mercedes Apple Google
For Cognos try to use the functions in the data item definitions
BracketCompany = concat('<',replace(replace([Company ID],' ','<'),'_','>'))
Then another data item over this, to remove the content within the brackets
FinalCompany = trim(REGEXP_REPLACE([BracketCompany], '<(.*?)>',' ',1,0,'c'))
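To see what the two steps do to the sample string, here is the same transformation sketched in Python rather than Cognos or DB2. It is only meant to illustrate the logic; the variable names are made up, and the last helper (strip_id) is an assumed simpler alternative for the case where each value sits in its own row.

```python
import re

raw = "111_Mercedes 11B4324_Apple 38A_Google"

# Step 1: wrap each ID in angle brackets.
bracketed = "<" + raw.replace(" ", "<").replace("_", ">")

# Step 2: replace every bracketed chunk with a space, then trim.
names = re.sub(r"<.*?>", " ", bracketed).strip()


def strip_id(value: str) -> str:
    """Drop everything up to and including the first underscore."""
    return value.split("_", 1)[-1]


print(bracketed)
print(names)
print([strip_id(v) for v in ["111_Mercedes", "A1ZH8_Airline"]])
```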

Multiple LIKE Conditions in the same column

I am trying to write a query that returns all the courses associated with a course code; for example, 'CSC' should give me a list of tuples like [('CSCA08H3F',), ('CSCA20H3F',), ('CSCA67H3F',)], etc. I know I have to use the LIKE clause, but I seem to be doing it wrong, and I feel like there is a simpler way of doing this.
import sqlite3
def create_course_table(db, course_file):
    '''Courses Table should be ID,Course,Section,Name'''
    con = sqlite3.connect(db)
    cur = con.cursor()
    cur.execute('''DROP TABLE IF EXISTS Courses''')
    # create the table
    cur.execute('''CREATE TABLE Courses( ID TEXT , Course TEXT ,
    Sections TEXT , Name TEXT)''')
    # Read CSV File
    csv_reader = open(course_file, 'r')
    csv_reader.readline()
    # Insert the rows
    for line in csv_reader:
        course = line.strip().split(',')
        ID = course[0]
        Course = course[1]
        Section = course[2]
        Name = course[3:]
        for names in Name:
            cur.execute('''INSERT INTO Courses VALUES (?, ?, ?, ?)''',
                        (ID, Course, Section, names))
    # commit and close the cursor and connection
    con.commit()
    cur.close()
    con.close()
db = 'exams.db'
def find_dept_courses(db, dept):
'''Return the courses from the given department. Use the "LIKE"
clause in your SQL query for the course name.'''
return run_query(db, ''' SELECT Course FROM Courses WHERE LIKE Course 'ACT%'
OR LIKE Course 'AFS%' OR LIKE Course 'ANT%' OR LIKE Course 'AST%'
OR LIKE Course 'BIO%'
OR LIKE Course 'CHM%' OR LIKE Course CIT%' OR LIKE Course 'CLA%'
OR LIKE Course 'CRT%' OR LIKE Course 'CSC%' OR LIKE Course 'CTL%'
OR LIKE Course 'ECT%' OR LIKE Course 'EES%' OR LIKE Course 'ENG%'
OR LIKE Course 'EST%' OR LIKE Course 'FRE%' OR LIKE Course 'FST%'
OR LIKE Course 'GAS%'
OR LIKE Course 'GGR%' OR LIKE Course 'HIS%' OR LIKE Course 'HLT%'
OR LIKE Course 'IDS%' OR LIKE Course 'JOU%' OR LIKE Course 'LGG%'
OR LIKE Course 'LIN%' OR LIKE Course 'MAT%' OR LIKE Course 'MDS%'
OR LIKE Course 'MGA%' OR LIKE Course'MGE%' OR LIKE Course 'MGF%'
OR LIKE Course 'MGH%' OR LIKE Course 'MGI%' OR LIKE Course 'MGM%'
OR LIKE Course 'MGO%' OR LIKE Course 'MGS%' OR LIKE Course 'MGT%'
OR LIKE Course 'NRO%' OR LIKE Course 'PHL%' OR LIKE Course 'PHY%'
OR LIKE Course 'PLI%' OR LIKE Course 'POL%' OR LIKE Course 'PPG%'
OR LIKE Course 'PSC%' OR LIKE Course 'PSY%' OR LIKE Course 'RLG%'
OR LIKE Course 'SOC%' OR LIKE Course 'STA%' OR LIKE Course 'VPA%'
OR LIKE Course 'VPD%' OR LIKE Course 'VPM%' OR LIKE Course 'WST%' AND WHERE Course = ? ''', [dept])
Any help or comments would be appreciated.
Familiarize yourself with the proper syntax of the LIKE clause.
You've put the column name after "LIKE". Column name comes before "LIKE".
The following query will return the Course column from all rows in Courses where the Course column starts with the given string.
SELECT Course FROM Courses WHERE Course LIKE ? || '%';
Bind the desired prefix (for example 'CSC') to the ? placeholder.
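Put together in Python, that could look something like the sketch below. Note the assumptions: this version of find_dept_courses takes an open connection rather than a database path, it does not use the run_query helper from the question (which was not shown), and the sample rows are made up. The ORDER BY is only there to make the output deterministic.

```python
import sqlite3


def find_dept_courses(con, dept):
    """Return the courses whose code starts with the given department prefix."""
    cur = con.execute(
        "SELECT Course FROM Courses WHERE Course LIKE ? || '%' ORDER BY Course",
        (dept,),
    )
    return cur.fetchall()


# Small in-memory table with the same columns as in the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Courses (ID TEXT, Course TEXT, Sections TEXT, Name TEXT)")
con.executemany(
    "INSERT INTO Courses (Course) VALUES (?)",
    [("CSCA08H3F",), ("CSCA20H3F",), ("MATA31H3F",), ("CSCA67H3F",)],
)

csc = find_dept_courses(con, "CSC")
print(csc)
```

One caveat: if the prefix itself could contain % or _, those characters would act as wildcards inside LIKE.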
Since all items in the Course column are uppercase, you could use IN instead of LIKE.
A simple test at the sqlite prompt shows that printf applied to Course can trim it down to its first 3 characters, which you can then compare against the 3-character prefixes you are searching for.
sqlite> .schema
CREATE TABLE test (one text);
sqlite> select * from test;
abcd
efgh
ijkl
abcd
efgh
ijkl
abcd
efgh
ijkl
sqlite> select one from test where printf('%.3s',one) in ('abc');
abcd
abcd
abcd
sqlite>
Thus find_dept_courses() could use:
def find_dept_courses(db, dept):
    '''Return the courses from the given department. Use the "IN"
    clause in your SQL query for the course name.'''
    return run_query(db, ''' SELECT Course FROM Courses WHERE printf('%.3s', Course)
    IN ('ACT','AFS','ANT','AST','BIO','CHM','CIT','CLA','CRT','CSC','CTL',
    'ECT','EES','ENG','EST','FRE','FST','GAS','GGR','HIS','HLT','IDS','JOU',
    'LGG','LIN','MAT','MDS','MGA','MGE','MGF','MGH','MGI','MGM','MGO','MGS',
    'MGT','NRO','PHL','PHY','PLI','POL','PPG','PSC','PSY','RLG','SOC','STA',
    'VPA','VPD','VPM','WST') AND printf('%.3s', Course) = ? ''', [dept])
Edit: If needed, SQLite has upper() and lower() functions to force the case.
