This is my first post/question here. I have searched and tried many options, but nothing seems to fit exactly what I need.
I am building an Access DB to manage scheduling work and assigning employees to that work. The scheduled work comes from an Excel sheet that I've imported into Access. I don't have control over how the data comes to me or its format.
So I have a table 'tblTempP6' with the scheduled work for the year. Several columns together determine a unique entry.
I have another table, 'shtP6DataEast', which has an index that is the primary key. The two tables are the same except that the entries in shtP6DataEast have an index, so I can access specific entries, assign employees, etc.
Each time I get a new sheet to import, the existing scheduled work is still in the sheet, and I don't want duplicate entries with different IDs.
I have tried LEFT JOINs but have had issues because I need to use 4 columns as the unique identifier...
Any thoughts?
Thanks
forgive my NOOB Question
SELECT tblTempP6.[Project #], tblTempP6.[Work Order #], tblTempP6.[Project Name], tblTempP6.[Activity Name], tblTempP6.[TOA #], tblTempP6.Start, tblTempP6.Finish, tblTempP6.[Budgeted Labor Units], tblTempP6.[S/S - SUBSTATION], tblTempP6.iCircuit, tblTempP6.isStreet, tblTempP6.[Test Group Work Type], tblTempP6.Comments, tblTempP6.[Resource IDs]
FROM tblTempP6
LEFT JOIN shtP6DataEast ON tblTempP6.[Project Name] = shtP6DataEast.[Project Name]
WHERE (((shtP6DataEast.[Project Name]) Is Null));
I will try to answer this generally, using tmp as the name of the table with the freshly imported and possibly duplicated data, and dat as the name of the table with the previously imported, permanent data.
There are a bunch of ways to do it. Here's how to do it with a LEFT JOIN on all four key columns:
SELECT *
FROM tmp
LEFT JOIN dat ON tmp.KeyCol1 = dat.KeyCol1
AND tmp.KeyCol2 = dat.KeyCol2
AND tmp.KeyCol3 = dat.KeyCol3
AND tmp.KeyCol4 = dat.KeyCol4
WHERE dat.IDcol IS NULL
You do the left join against the existing data, and then exclude all rows that actually matched to that table.
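To see the pattern end to end, here is a minimal sketch that runs the same anti-join in an in-memory SQLite database from Python. It is not Access: the schema, the sample rows and the KeyCol1..KeyCol4 names are placeholders, and the append step at the end is an assumption about how you would load the new rows into the permanent table.

```python
import sqlite3

# Hypothetical schema: tmp is the fresh import, dat is the permanent table
# with an auto-assigned IDcol. KeyCol1..KeyCol4 are placeholder key columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dat (IDcol INTEGER PRIMARY KEY,
                      KeyCol1 TEXT, KeyCol2 TEXT, KeyCol3 TEXT, KeyCol4 TEXT);
    CREATE TABLE tmp (KeyCol1 TEXT, KeyCol2 TEXT, KeyCol3 TEXT, KeyCol4 TEXT);
    INSERT INTO dat (KeyCol1, KeyCol2, KeyCol3, KeyCol4)
        VALUES ('P1', 'WO1', 'Alpha', 'Inspect');
    INSERT INTO tmp VALUES
        ('P1', 'WO1', 'Alpha', 'Inspect'),
        ('P1', 'WO1', 'Alpha', 'Repair');
""")

# Rows of tmp that have no partner in dat on all four key columns.
ANTI_JOIN = """
    SELECT tmp.KeyCol1, tmp.KeyCol2, tmp.KeyCol3, tmp.KeyCol4
    FROM tmp
    LEFT JOIN dat ON tmp.KeyCol1 = dat.KeyCol1
                 AND tmp.KeyCol2 = dat.KeyCol2
                 AND tmp.KeyCol3 = dat.KeyCol3
                 AND tmp.KeyCol4 = dat.KeyCol4
    WHERE dat.IDcol IS NULL
"""

new_rows = conn.execute(ANTI_JOIN).fetchall()
print(new_rows)  # only the 'Repair' row is new

# Append just the unmatched rows; dat assigns each one a fresh IDcol.
conn.execute("INSERT INTO dat (KeyCol1, KeyCol2, KeyCol3, KeyCol4)" + ANTI_JOIN)

# Running the check again now finds nothing left to import.
remaining = conn.execute(ANTI_JOIN).fetchall()
print(remaining)
```

One thing to watch: a key column that is NULL never compares equal to anything, so a row with a blank key value would look "new" on every import.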
I have a Databricks Delta table of financial transactions that is essentially a running log of all changes that ever took place on each record. Each record is uniquely identified by 3 keys, so a given record can have multiple instances in this table, each representing a historical entry of a change (across one or more columns of that record). Now, if I wanted to find cases where a specific column value changed, I can easily achieve that by doing something like this:
SELECT t1.Key1, t1.Key2, t1.Key3, t1.Col12 AS `Before`, t2.Col12 AS `After`
FROM table1 t1 INNER JOIN table1 t2 ON t1.Key1 = t2.Key1 AND t1.Key2 = t2.Key2
AND t1.Key3 = t2.Key3 WHERE t1.Col12 != t2.Col12
However, these tables have a large number of columns. What I'm trying to achieve is a way to identify any columns that changed in a self-join like this: essentially, a list of all columns that changed. I don't care about the actual values that changed, just the names of the columns that changed across all records. It doesn't even have to be per row. The 3 keys will always be excluded, since they uniquely define a record.
Essentially, I'm trying to find any columns that are susceptible to change so that I can focus on them for another purpose.
Any suggestions would be really appreciated.
Databricks has change data feed (CDF / CDC) functionality that can simplify this type of use case. https://docs.databricks.com/delta/delta-change-data-feed.html
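Separately from CDF, the comparison the question describes can be sketched in plain Python. This is only an illustration on a small in-memory list of rows, not Databricks code: the function name `changed_columns`, the sample data and the column names are all made up, and on a real Delta table the same grouping logic would have to be expressed in Spark.

```python
from collections import defaultdict

KEYS = ("Key1", "Key2", "Key3")  # the three columns that identify a record


def changed_columns(rows, keys=KEYS):
    """Return the sorted names of non-key columns that take more than one
    distinct value within at least one key group."""
    # key tuple -> column name -> set of values seen for that record
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        group = seen[tuple(row[k] for k in keys)]
        for col, value in row.items():
            if col not in keys:
                group[col].add(value)
    return sorted({col
                   for group in seen.values()
                   for col, values in group.items()
                   if len(values) > 1})


# Hypothetical history: record (1, 'a', 'x') changed Col12 but not Col11.
history = [
    {"Key1": 1, "Key2": "a", "Key3": "x", "Col11": "open", "Col12": 100},
    {"Key1": 1, "Key2": "a", "Key3": "x", "Col11": "open", "Col12": 250},
    {"Key1": 2, "Key2": "b", "Key3": "y", "Col11": "open", "Col12": 70},
]
print(changed_columns(history))  # ['Col12']
```

The idea is that a column is "susceptible to change" if any single record shows more than one distinct value for it, which avoids writing one self-join per column.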
I have an Excel spreadsheet which I use as a relational database for my milk round. I query this database using MS Query in Excel (Mac 2011 Version) to generate my delivery routes. One of the columns is the customer address and I'd like to have this shown once per order i.e. have a distinct query for just this column while displaying multiple other rows. It's purely for cosmetic purposes to make the spreadsheet less cluttered.
The main spreadsheet I use as my database has column headings which I have screenshotted, complete with some sample data:
From this main spreadsheet I use MS Query to generate my delivery route which looks like this:
As you can see, there is a lot of repeated data in the route generated from the query. What I'd like to do is have just one instance of the address per customer's order. It would help with the legibility of the route when opened on an iPad. I hide other columns that aren't really necessary to help in that regard.
*EDIT
From isolated's comments below, here's a screenshot of ideally how the data returned from the query should look:
I've manually deleted the repeated info in the name & address column to achieve the desired result. I've also hidden some columns that aren't really necessary and I use some conditional formatting rules to help distinguish each customer's order.
EDIT*
I have tried using a group by clause and the following window function but can't get it to work:
SELECT *
FROM (
    SELECT "All Orders"."Route ID",
           "All Orders".Name,
           "All Orders".Address,
           ROW_NUMBER() OVER (PARTITION BY "All Orders".Address
                              ORDER BY "All Orders".Address DESC) AS row_number
    FROM "All Orders"
) AS rows
WHERE row_number = 1;
Whenever I try to run the query I get an error message regarding syntax. Hopefully someone can tell me where I'm going wrong!
I don't know MS SQL at all, but you could do something with a formula in Excel. If you don't like this solution, simply put a comment below saying that you would still like a SQL route, and I can get you a query to try to adapt to MS SQL.
Create another column and call it address2 (or several more columns if your address field spans multiple columns).
Then use these formulas and adjust as needed:
Column F (address2): =IF(A2=A1,"",C2)
Column G (town2): =IF(A2=A1,"",D2)
You can then hide columns C and D.
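For anyone who wants to see the logic of those formulas outside Excel, here is a minimal Python sketch. The function name `blank_repeats`, the sample rows and the column order (account, product, address, town, mirroring columns A to D) are assumptions for illustration only. Like the formulas, it relies on the rows already being sorted so that each customer's lines sit together.

```python
def blank_repeats(rows, key_index, blank_indexes):
    """Copy rows, blanking the given columns whenever the key column
    repeats the previous row's key (the =IF(A2=A1,"",C2) idea)."""
    result = []
    previous_key = object()  # sentinel that never equals a real key
    for row in rows:
        new_row = list(row)
        if row[key_index] == previous_key:
            for i in blank_indexes:
                new_row[i] = ""
        previous_key = row[key_index]
        result.append(new_row)
    return result


# Hypothetical sample: account, product, address, town
route = [
    ["A001", "Milk", "1 High St", "Leeds"],
    ["A001", "Eggs", "1 High St", "Leeds"],
    ["A002", "Milk", "7 Mill Ln", "York"],
]
cleaned = blank_repeats(route, key_index=0, blank_indexes=[2, 3])
for row in cleaned:
    print(row)
```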
=============
U P D A T E
Here's a method that works in many DBMSs such as Postgres, but I don't know how to adapt [rank() over (partition by...] to Excel SQL.
select account,
       cust_name,
       item,
       case
           when prod_rank = 1 then address
           else ''
       end as address
from (
    select account,
           cust_name,
           item,
           address,
           rank() over (partition by account order by item) as prod_rank
    from table1
) z
order by account, item
I tried a few variations in Excel SQL and finally got this one to work.
select a.Account,
       a.Name,
       a.Product,
       Iif(a.Product = b.min_item, a.Address, '') as [address]
FROM table1 as a,
     (
         select z.Account,
                min(z.Product) as min_item
         FROM table1 as z
         group by z.Account
     ) as b
where b.Account = a.Account
order by a.Account, a.Product
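If you want to try the idea before wiring it into MS Query, the following sketch runs an equivalent query in an in-memory SQLite database from Python. It is an adaptation, not the Excel SQL itself: SQLite is a different engine, the sample rows are invented, and `CASE WHEN` plus an explicit `JOIN` stand in for `Iif` and the comma join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Account TEXT, Name TEXT, Product TEXT, Address TEXT);
    INSERT INTO table1 VALUES
        ('A001', 'Smith', 'Milk', '1 High St'),
        ('A001', 'Smith', 'Eggs', '1 High St'),
        ('A002', 'Jones', 'Milk', '7 Mill Ln');
""")

# For each account, find its alphabetically first product, then show the
# address only on the row that carries that product.
rows = conn.execute("""
    SELECT a.Account, a.Name, a.Product,
           CASE WHEN a.Product = b.min_item THEN a.Address ELSE '' END AS Address
    FROM table1 AS a
    JOIN (SELECT Account, MIN(Product) AS min_item
          FROM table1
          GROUP BY Account) AS b
      ON b.Account = a.Account
    ORDER BY a.Account, a.Product
""").fetchall()

for row in rows:
    print(row)
```

Because the output is ordered by account and then product, the row that keeps the address is always the first row of each customer's block.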
I have 2 big tables (the first has 690K rows, the second has 890K rows).
They have the same format and columns:
Username - Points - Bonuses - COLUMN D... COLUMN - K.
Let's say the first table has the "Original" usernames and the second table has "New" usernames plus some of the "Original" usernames (so people who are still playing plus people who are new to the game).
What I'm trying to do is merge them so that I have their values summed up in a single table.
I've already made my tables proper System Tables.
I created their connection in the workbook.
I've tried to merge them, but I keep getting fewer rows than I expect, so some records are being left out or not being summed.
I've tried Left Outer, Right Outer, and Full Outer with no success.
This is where I'm standing:
As @Jenn said, I had to append the tables instead of merging them. I also used a filter inside Power Query to remove all blanks/zeros before loading it into Excel. I was left with 500K unique rows instead of 1.6 million. Thanks for the comment!
I would append the tables, as indicated above. First load each table separately into Power Query, and then append one table to the other. The column names look a little long, so it may make sense to simplify them so that the system doesn't read them as different columns due to an inadvertent typo.
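To illustrate why appending fits this problem better than a merge, here is a small Python sketch of "append, then total per username". It is not Power Query: the function name `append_and_sum` and the sample rows are hypothetical, and the per-username summing step is an assumption based on the stated goal of summing up each player's values.

```python
from collections import defaultdict


def append_and_sum(*tables):
    """Stack all tables, then total every numeric column per Username."""
    totals = defaultdict(lambda: defaultdict(int))
    for table in tables:          # the "append" step: walk every row of every table
        for row in table:
            user = row["Username"]
            for col, value in row.items():
                if col != "Username":
                    totals[user][col] += value
    return {user: dict(cols) for user, cols in totals.items()}


# Hypothetical sample: 'bob' appears in both tables, 'cat' is new.
original = [{"Username": "ann", "Points": 10, "Bonuses": 1},
            {"Username": "bob", "Points": 5, "Bonuses": 0}]
new = [{"Username": "bob", "Points": 7, "Bonuses": 2},
       {"Username": "cat", "Points": 3, "Bonuses": 1}]

combined = append_and_sum(original, new)
print(combined)
```

Every username from either table ends up in the result exactly once, which is the behaviour the joins were not giving.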
I have two tables that hold the information needed to display time clock interactions in an Excel sheet. The data will need to update with every time clock interaction. I joined the two tables, and it was pointed out to me that data duplication is a big no-no. I am looking for a simpler solution than doing a join every day so that I can have recent interactions. Once I can get the SQL end set up, I can handle the Excel side.
Table info:
From the dbo.employees table I need the ID, Last_Name, First_Name
From the dbo.employeetimecardactions I need ID, ActionTime, ActionDate, ShiftStart, Action Type.
ID is the common column between the two tables of course.
If my JOIN statement is needed I will supply, but seeing as the data duplication is a problem I would like to start fresh with NO prior code brought into it.
Also any additional information needed can be supplied if I know exactly what is needed
END RESULT: an Excel file that I can share with the powers that be, containing all recent time clock interactions. It would also be nice to be able to search by date or employee, but I would think that should be an Excel function, and it is not absolutely necessary.
Please check the names of the two tables and correct them as appropriate. This is based on the first part of this thread and later comments:
SELECT E.EmployeeID, E.First_Name, E.Last_Name, A.ActionTime, A.ActionDate, A.ShiftStart, A.ActionType
FROM Employees E LEFT OUTER JOIN
EmployeeTimeCardActions A ON E.EmployeeID=A.EmployeeID
Here's a WHERE clause to include date. Please check your DB for date format to use:
="WHERE ActionDate BETWEEN '" & TEXT(A2,"mm/dd/yyyy") & "' AND '"&TEXT(B2,"mm/dd/yyyy")&"'"
The formula is in cell C2
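As a rough end-to-end check of the join plus the date filter, here is a sketch against an in-memory SQLite database from Python. It is not your SQL Server setup: the table contents are invented, dates are stored as ISO text (yyyy-mm-dd) purely so that BETWEEN sorts correctly in SQLite, and the helper name `actions_between` is hypothetical. It also passes the dates as parameters rather than concatenating them into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY,
                            First_Name TEXT, Last_Name TEXT);
    CREATE TABLE EmployeeTimeCardActions (EmployeeID INTEGER, ActionDate TEXT,
                                          ActionTime TEXT, ActionType TEXT);
    INSERT INTO Employees VALUES (1, 'Ada', 'Lane'), (2, 'Ben', 'Moss');
    INSERT INTO EmployeeTimeCardActions VALUES
        (1, '2024-03-01', '08:00', 'IN'),
        (1, '2024-03-01', '16:30', 'OUT'),
        (2, '2024-04-15', '09:00', 'IN');
""")


def actions_between(start, end):
    """Return time clock actions whose ActionDate falls in [start, end]."""
    return conn.execute("""
        SELECT E.EmployeeID, E.First_Name, E.Last_Name,
               A.ActionDate, A.ActionTime, A.ActionType
        FROM Employees AS E
        LEFT OUTER JOIN EmployeeTimeCardActions AS A
               ON E.EmployeeID = A.EmployeeID
        WHERE A.ActionDate BETWEEN ? AND ?
        ORDER BY A.ActionDate, A.ActionTime
    """, (start, end)).fetchall()


march = actions_between("2024-03-01", "2024-03-31")
april = actions_between("2024-04-01", "2024-04-30")
print(march)
print(april)
```

Note that once the WHERE clause filters on a column from the actions table, employees with no actions in the range drop out, so the LEFT OUTER JOIN behaves like an inner join for that query.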
I am working with Excel 2010, Power Query, and PowerPivot.
I have a query named Database that consists of 60+ merged tables containing a total of 2m+ rows. I also have a separate query that consists of two columns PrimaryKey3 and Members (a count of members per month). The entries in PrimaryKey3 are unique, consisting of ID-MMM-YY.
Both queries have PrimaryKey3 in common, however in Database there can be multiple rows with the same PrimaryKey3.
In order to match a member amount to each row in Database, I tried a Left Outer join. There were no errors, but when I try to upload to PowerPivot it says there are only 169K rows. I then tried a Full Outer join and an Inner join, and received the error "could not convert value to number," coming from a column already formatted as text in Database. This column contains numbers and numbers preceded by a letter: 1234, A234. Every non-blank row has a PrimaryKey3. Why is it trying to reformat my columns, and how do I get around that?
Should I be using a different type of join, or is there another way besides merging to do this?
Hope this makes sense, thank you for any help in advance!
I uploaded both queries to PowerPivot, and created a relationship through PrimaryKey3. I then created a new column in Database with =Related(Enrollment[Members]).
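For readers less familiar with PowerPivot, the effect of that relationship plus the new column can be pictured as a simple lookup from the many-row table into the unique-key table. The Python sketch below is only an analogy, not DAX: the function name `add_members`, the `Code` column and the sample values are hypothetical.

```python
def add_members(database_rows, enrollment_rows):
    """Attach Members to every Database row by looking up its PrimaryKey3
    in the enrollment table, where each key appears only once."""
    members_by_key = {row["PrimaryKey3"]: row["Members"]
                      for row in enrollment_rows}
    # Rows whose key has no match keep their place and get None.
    return [{**row, "Members": members_by_key.get(row["PrimaryKey3"])}
            for row in database_rows]


# Hypothetical sample: keys follow the ID-MMM-YY pattern.
enrollment = [{"PrimaryKey3": "1001-Jan-15", "Members": 40},
              {"PrimaryKey3": "1001-Feb-15", "Members": 42}]
database = [{"PrimaryKey3": "1001-Jan-15", "Code": "1234"},
            {"PrimaryKey3": "1001-Jan-15", "Code": "A234"},
            {"PrimaryKey3": "1002-Jan-15", "Code": "5678"}]

result = add_members(database, enrollment)
for row in result:
    print(row)
```

The row count of Database is unchanged and the text column is never converted, which is why this route sidesteps both problems seen with the merge.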