How to count the number of observations in a SAS table? - statistics

I am very new to SAS. Now, I have a SAS data table as following:
ID score
-------------------
01 1
02 3
03 4
04 2
Is there any way to save the number of observations in this table using only PROC SORT and DATA step? I want to hold the value in the log window, which is like "hold N=4" in the SAS log script.
Sorry for my unprofessional description. Thanks in advance.

As a new SAS user, the NOBS option may be all you need. However, as your coding skills increase, you may find yourself in situations where it is not appropriate. The NOBS= option on the SET statement does not work in all cases: the value returned is the number of physical observations in the data set, including any observations that have been deleted in place. It also may not work with certain views (especially views connected to external databases).
The "safest" way to find the number of undeleted observations in a data set or view is to use PROC SQL and actually count them, putting the result into a macro variable. For example, suppose you have a data object named HAVE:
proc sql noprint;
select count(*) into : nobs
from WORK.HAVE;
quit;
%put 'Obs in data set:' &nobs;
Note this works if HAVE is a data set or a view.
Alternatively, if your object is just a data set, you can use the SAS TABLES dictionary view to return the NLOBS attribute, which holds the number of "logical" observations (i.e. it accounts for any deleted rows). Note that the LIBNAME and MEMNAME values are stored in uppercase:
proc sql noprint;
select nlobs into : nobs
from dictionary.tables
where libname='WORK'
and memname='HAVE';
quit;
%put 'Obs in data set:' &nobs;
This will certainly be more efficient if your SAS data set is very large. I've often wondered why SAS does not make this NLOBS value available as an option on the SET statement, but I'm sure there are reasons.
PROC SQL, views, macro variables, and in-place deleted observations may be all new to you right now, but as you advance with your SAS learning you are bound to start using them.

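Use the NOBS= option on the SET statement.
data _null_;
/* NOBS= is assigned at compile time, so no observation has to be read; */
/* IF 0 keeps this working even when the data set is empty */
if 0 then set xyz nobs=nobs;
put "HOLD N=" nobs;
stop;
run;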

data _null_;
set sashelp.vtable;
where libname="WORK" and memname="DS1";
/* SYMPUTX converts the numeric NLOBS to text and trims blanks */
call symputx("count_obs", nlobs);
run;
%put obs in ds1 is: &count_obs;

Related

Displaying SQL data from multiple tables

I have two tables that hold the information needed to display time clock interactions in an Excel sheet. The data will need to update with every time clock interaction. I joined the two tables, and it was pointed out to me that data duplication is a big no-no. I am looking for a simpler solution than doing a join every day so that I can have recent interactions. Once I can get the SQL end set up, I can handle the Excel side.
Table info:
From the dbo.employees table I need the ID, Last_Name, First_Name
From the dbo.employeetimecardactions I need ID, ActionTime, ActionDate, ShiftStart, Action Type.
ID is the common column between the two tables of course.
If my JOIN statement is needed I will supply, but seeing as the data duplication is a problem I would like to start fresh with NO prior code brought into it.
Also any additional information needed can be supplied if I know exactly what is needed
END RESULT- Excel File that I can share with the powers that be. Contains all recent time clock interactions. Also it would be nice to be able to search by date or employee but that should be an Excel function I would think, and not absolutely necessary
Please check the names of the two tables and correct appropriately, this is based on the first part of this thread and later comments:
SELECT E.EmployeeID, E.First_Name, E.Last_Name, A.ActionTime, A.ActionDate, A.ShiftStart, A.ActionType
FROM Employees E LEFT OUTER JOIN
EmployeeTimeCardActions A ON E.EmployeeID=A.EmployeeID
Here's a WHERE clause to include date. Please check your DB for date format to use:
="WHERE ActionDate BETWEEN '" & TEXT(A2,"mm/dd/yyyy") & "' AND '"&TEXT(B2,"mm/dd/yyyy")&"'"
The formula is in cell C2
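As a rough illustration of why the join does not duplicate stored data, the sketch below runs the same kind of query against an in-memory SQLite database from Python. SQLite, the sample rows, the ISO-style text dates and the helper name actions_between are all assumptions made for this example; the real server, date format and column names may differ.

```python
import sqlite3

# In-memory stand-in for the real database. Table and column names follow the
# query above; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY, First_Name TEXT, Last_Name TEXT);
    CREATE TABLE EmployeeTimeCardActions (
        EmployeeID INTEGER, ActionTime TEXT, ActionDate TEXT,
        ShiftStart TEXT, ActionType TEXT);
    INSERT INTO Employees VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO EmployeeTimeCardActions VALUES
        (1, '08:01', '2018-06-04', '08:00', 'IN'),
        (1, '16:02', '2018-06-04', '08:00', 'OUT'),
        (2, '08:05', '2018-06-11', '08:00', 'IN');
""")

def actions_between(start_date, end_date):
    """Return joined rows whose ActionDate lies in [start_date, end_date].

    The join is evaluated each time the query runs; nothing is copied into a
    third table, so the two source tables remain the only stored data.
    Filtering on A.ActionDate drops employees without actions, so a plain
    JOIN is used here instead of LEFT OUTER JOIN.
    """
    sql = """
        SELECT E.EmployeeID, E.First_Name, E.Last_Name,
               A.ActionTime, A.ActionDate, A.ShiftStart, A.ActionType
        FROM Employees E
        JOIN EmployeeTimeCardActions A ON E.EmployeeID = A.EmployeeID
        WHERE A.ActionDate BETWEEN ? AND ?
        ORDER BY A.ActionDate, A.ActionTime
    """
    return conn.execute(sql, (start_date, end_date)).fetchall()

rows = actions_between("2018-06-01", "2018-06-07")
```

Passing the dates as query parameters, rather than pasting them into the SQL text, avoids quoting and date-format problems on the SQL side.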

What is the best way to move out-of-order Access records into the proper order by using a locked ID field?

I have roughly 1500 records in an Access database. I have a field ID that acts as the primary key, and as such cannot be manually changed. After looking through the original Excel sheet these records were kept in, I noticed that a few records in Excel were missing from the Access database. After going through all of them, I added the three missing records into Access.
This database stores records in date order, grouped by manufacturer. For example, records from Manufacturer1 collected during week 1 of June '16 are all located together, and records from Manufacturer2 collected during week 2 of June '16 are stored directly afterwards. This is important for us because the data in this database often needs to be looked at visually, so keeping things in date order is essential. There is also a macro that exports the data to an Excel sheet and formats it to be easier to read; it exports the records in the order in which they are stored (by the ID field). This is a problem because the three missing records are from years past, and now they sit in the middle of records from 2018. The IDs they were assigned upon entry keep them in that location.
Is there a way to reliably insert these records into the database in the location at which they should be? Such as shifting the values of other records ID fields down by 3 to allow room for the missing records? I know I can probably manually have those three records move to the desired location in the macro that exports to Excel, but I'd rather have a less hacky solution that could work if a similar problem happens again.
The order of data in a database is of no interest to the database - it's the relation between data that matters.
To always view your data in the order you want use the ORDER BY clause in an SQL statement. Generally you can add data to the underlying table directly through the query - unless you've got many-to-one type queries where your update would need to affect more than one record.
SELECT FieldName1, FieldName2, . . . .
FROM MyDataTable
ORDER BY Manufacturer, [Date]
Edit: Even here you'll be adding new records to the bottom of the dataset, but refreshing the query will move the records to the correct order.
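A small sketch of the same idea, run against an in-memory SQLite database from Python rather than Access. The column name CollectedOn, the sample rows and the use of SQLite are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MyDataTable (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Manufacturer TEXT,
        CollectedOn TEXT)
""")

# Two records entered in order, then one "missing" older record added later.
conn.executemany(
    "INSERT INTO MyDataTable (Manufacturer, CollectedOn) VALUES (?, ?)",
    [
        ("Manufacturer1", "2018-06-04"),
        ("Manufacturer2", "2018-06-11"),
        ("Manufacturer1", "2016-06-06"),  # late entry, receives the highest ID
    ],
)

# The stored IDs are left untouched; the query alone decides the display order.
ordered = conn.execute(
    "SELECT ID, Manufacturer, CollectedOn FROM MyDataTable "
    "ORDER BY Manufacturer, CollectedOn"
).fetchall()
```

The late record keeps ID 3 but is returned first within Manufacturer1, so an export that reads from the ordered query would no longer depend on the ID sequence.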

Is there a workaround for the maximum length of an ODBCConnection.CommandText string in VBA?

I have a VBA script that generates a query string for a SAP HANA ODBC Connection in Excel. The query is determined by user inputs and can vary greatly in length. The query itself uses many versions of a similar query appended to one another using UNION ALL syntax.
The script sometimes throws a runtime error when trying to refresh. From my research, it has become clear that the reason for this is that the CommandText string exceeds a maximum allowed length of 32,767 (https://ask.sqlservercentral.com/questions/50819/too-long-sql-in-excel-vba.html).
I wondered whether there is a workaround for this, other than using a stored procedure (I am not against this if there is a way to create a stored procedure at runtime then execute it, but I cannot use a predefined stored procedure as my query is always different hence the need for VBA to create it)
Some more info about the dynamic query in VBA:
Column names, as well as parameters, are created dynamically and can be different every time
The query uses groups of lists of product numbers to generate an IN statement for each product group, then sums the sales for those products under the name of the group. These are then all UNION'd together to create one table with grouped records
Example of user input:
Example of resulting query:
WITH SOME_CTE (SOME_FIELDS) AS
(SELECT SOME_STUFF
FROM SOME_TABLE
WHERE SOME_STUFF_IS_GOING_ON)
SELECT GEND "Gender", 'Attribute 1' "Attribute", SUM(UNITS) "Units", SUM(VAL) "Value", SUM(MARGIN) "Margin"
FROM SOME_CTE
WHERE PRODUCT IN ('12345', '23456', '34567', '45678')
GROUP BY GEND
UNION ALL
SELECT GEND, 'Attribute 2' ATTR_NAME, SUM(UNITS), SUM(VAL), SUM(MARGIN)
FROM SOME_CTE
WHERE PRODUCT IN ('01234', '02345', '03456', '03567')
GROUP BY GEND
ORDER BY "Gender", "Attribute"
...and so on.
As you can see, with 2 attribute groups containing 4 products each there is no problem, but when we get to about 30 with several hundred each, it could be too long.
Note: I have tried things like shortening field references in the repeated parts of the query string to 1 character etc. which helps but does not solve the problem.
Any help would be greatly appreciated.
One workaround is to send multiple queries. Since you are using UNION ALL, you can execute each SELECT statement on its own and collect the results in a table.
Create a regular table in (for example) the master database. Don't use a temporary table, as it will be dropped after every query. Before creating it, delete the old one if it exists, so that you always start with a fresh table. Then change every single SELECT statement into an INSERT statement that inserts its records into this so-called temporary table.
This way you avoid lengthy queries and only send short INSERT INTO ... SELECT statements.
At the end, a simple SELECT query returns all the results. After getting this data, drop the table, as it is no longer needed.
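The approach could look roughly like the sketch below, written in Python against an in-memory SQLite database instead of VBA and SAP HANA. The table names sales and staging_results, the sample data and the function grouped_totals are hypothetical, and only a units column is summed to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (gend TEXT, product TEXT, units INTEGER);
    INSERT INTO sales VALUES
        ('F', '12345', 2), ('M', '12345', 1), ('F', '23456', 4),
        ('F', '01234', 5), ('M', '02345', 3);
""")

# Hypothetical user input: attribute name -> product numbers in that group.
groups = {
    "Attribute 1": ["12345", "23456"],
    "Attribute 2": ["01234", "02345"],
}

def grouped_totals(conn, groups):
    """Send one short INSERT ... SELECT per group instead of one long UNION ALL."""
    # Start from a fresh staging table on every run.
    conn.execute("DROP TABLE IF EXISTS staging_results")
    conn.execute(
        "CREATE TABLE staging_results (gend TEXT, attribute TEXT, units INTEGER)"
    )
    for name, products in groups.items():
        placeholders = ", ".join("?" for _ in products)
        conn.execute(
            "INSERT INTO staging_results "
            "SELECT gend, ?, SUM(units) FROM sales "
            f"WHERE product IN ({placeholders}) GROUP BY gend",
            [name, *products],
        )
    # One simple query collects everything, then the table is removed again.
    rows = conn.execute(
        "SELECT gend, attribute, units FROM staging_results "
        "ORDER BY gend, attribute"
    ).fetchall()
    conn.execute("DROP TABLE staging_results")
    return rows

totals = grouped_totals(conn, groups)
```

Each statement only carries one group's product list, so its length no longer grows with the number of groups.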

Excel Power Query - Incremental Load from Query and adding date

I am trying to do something quite simple which I am failing to understand.
Take the output from a query, add a date-time stamp, and write it into an Excel table.
Iterate the logic again and you get the same output but the generated date time has progressed in time.
Query 1 -- From SQL which yields 2 columns category, count.
I am taking this and adding a generated date to it using DateTime.LocalNow().
Query 2 -- Target table
How can I construct a query that adds to an existing table and doesn't require me to load the result into a new table?
I have seen this approach on blog.oraylis.de, but I can't make it work: the DateTime.LocalNow() call runs for both source and target, so I end up with the same datetime throughout the query.
I think i am missing something obvious.
EDIT:-
= Table.Combine({SOURCE_DATA, TARGET_DATA})
This loads into a third, new table and doesn't take that third table into account when loading, so you just end up with a new version of the first two tables with a new timestamp.
These steps should work
Create a query Q1 based on the SQL statement, add your timestamp using DateTime.LocalNow(), and load this into an Excel table (execute the query).
Create a new query Q2 based on this new Excel table (just like that, no transforms).
Modify the first query Q1 by adding the Table.Combine with Q2 as the last step.
So, in other words, Q2 loads the existing data from the Excel table into which Q1 writes. The Excel table is always written completely but since the existing data is preserved you will get the result of new data being loaded to the table. Hope this helps.
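The logic of this self-referencing load can be sketched in plain Python. This is not Power Query code; the function append_snapshot, the list standing in for the Excel table and the sample values are all invented for illustration.

```python
from datetime import datetime

def append_snapshot(existing_rows, source_rows, now):
    """Stamp the fresh source rows with `now` and append them to the stored rows.

    existing_rows plays the role of Q2 (what the Excel table already holds);
    rows in it keep the timestamp they were loaded with.
    """
    stamped = [(category, count, now) for category, count in source_rows]
    return existing_rows + stamped

table = []  # the Excel table, empty before the first refresh
table = append_snapshot(table, [("A", 3), ("B", 5)], datetime(2018, 1, 1, 9, 0))
table = append_snapshot(table, [("A", 4), ("B", 5)], datetime(2018, 1, 1, 10, 0))
```

After the second refresh the table holds four rows: the first two still carry the 09:00 stamp and only the new rows carry 10:00, which is the behaviour the steps above aim for.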
Good luck, Hilmar

how to do a query with cassandradb counter table

i have a table in Cassandradb as mentioned below:
CREATE TABLE remaining (owner varchar,buddy varchar,remain counter,primary key(owner,buddy));
generally i do some inc/dec operations on REMAIN field ,using cql like below:
update remaining set remain=remain + 1 where owner='userA' and buddy='userB';
update remaining set remain=remain + 1 where owner='userA' and buddy='userC';
....
Now I need to find all buddies for userA whose REMAIN field is greater than 1. When I use:
select buddy,remain from remaining where owner='userA' and remain > 0;
it gives me an error:
No indexed columns present in by-columns clause with Equal operator
how to do this in a cassandradb way?
The short answer to this is that you cannot do queries with conditionals on counter columns in Cassandra.
The reason behind this is that all Cassandra queries need to be modeled around the primary key of that particular table. Counter columns are not allowed as parts of the primary key of a table (their changing values would cause constant reorganization of the data on disk). Counter columns are mostly used for tracking the state of a known piece of data, for example the number of times a photo has been up-voted. This can be recalled quickly as long as we know which photo we are interested in. To actually sort photos by number of votes, you would need to perform an analytics-style query using Spark or Hadoop.
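For a single owner with a modest number of buddies, one possible workaround (not part of the answer above, so treat it as an assumption) is to query by the partition key only and apply the counter condition in application code. The sketch below uses a plain Python list in place of a real driver result; the function name buddies_above and the sample values are invented.

```python
def buddies_above(rows, threshold):
    """Filter (buddy, remain) rows on the counter value in application code."""
    return [buddy for buddy, remain in rows if remain > threshold]

# Stand-in for the rows returned by a query restricted to the partition key:
#   SELECT buddy, remain FROM remaining WHERE owner='userA';
# The values below are invented for illustration.
rows_for_user_a = [("userB", 2), ("userC", 0), ("userD", 1)]

matching = buddies_above(rows_for_user_a, 0)
```

This reads every row in the owner's partition, so it only makes sense while that partition stays small; for large data sets the analytics-style approach mentioned above is the better fit.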
