Azure Synapse: Can't drop column because it has dependent statistics

We use Azure Synapse Analytics. When trying to drop a column:
ALTER TABLE my_table DROP COLUMN my_column
it fails with:
The statistics 'Stat_616f789ac8c54c449f7910cb3bcb3810' is dependent on column 'my_column'.
But I had no luck finding those statistics so I could drop them:
select * from sys.stats where name like '%Stat%'
How can I find and remove this annoying blocker?

You can use a simple modification of this query to show the auto-created statistics for a table. Make the following changes:
where st.user_created = 0
and sm.name = 'MY_SCHEMA_NAME'
and tb.name = 'MY_TABLE_NAME'
Then drop the offending stats using the DROP STATISTICS command.
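A minimal sketch of such a query, assuming the standard catalog views in a Synapse dedicated SQL pool (the schema and table names are placeholders):
-- list auto-created statistics and the columns they depend on
select sm.name as schema_name,
       tb.name as table_name,
       co.name as column_name,
       st.name as stats_name
from sys.stats st
join sys.stats_columns sc on st.object_id = sc.object_id and st.stats_id = sc.stats_id
join sys.columns co on sc.object_id = co.object_id and sc.column_id = co.column_id
join sys.tables tb on st.object_id = tb.object_id
join sys.schemas sm on tb.schema_id = sm.schema_id
where st.user_created = 0
and sm.name = 'MY_SCHEMA_NAME'
and tb.name = 'MY_TABLE_NAME';
-- then drop the offending statistics object, for example:
DROP STATISTICS MY_SCHEMA_NAME.MY_TABLE_NAME.Stat_616f789ac8c54c449f7910cb3bcb3810;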
Alternatively, turn off auto-statistics to avoid the creation of statistics you don't recognise.
Documentation here.
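If you choose to turn auto-statistics off, a one-statement sketch (the database name is a placeholder):
-- disable automatic creation of statistics for the whole database
ALTER DATABASE my_database SET AUTO_CREATE_STATISTICS OFF;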

Related

Power Query how to make a Table with multiple values a parameter that uses OR

I have a question regarding Power Query and tables as parameters in Excel.
Right now I can create a table and use it as a parameter for Power Query via drill down.
But I'm unsure how I would proceed with a table that has multiple values. How can a table with multiple "values" be recognized as a parameter?
For example, I have the following raw data and parameter tables:
Raw data + parameter tables
Now, if I wanted to filter on Value2 with parameter tables, I would do a drill down of the parameter tables and load them into Excel.
After that I have two tables with which I can filter Value2 by 1 and 2 using an OR function.
Is it possible to somehow combine this into one table so that it still uses an OR function to search Value2?
I'm asking because I want it to be possible to just add more and more parameters into the table without creating a new table every time; basically just copy-paste some parameters into the parameter table and be done with it.
Thanks for any help in advance.
Assuming you use the Parameters table only for filtering: there are other ways, but this one looks the best from a performance point of view.
You may create a Parameters table, so you have tables like these:
Note that it's handy to have the same name (Value2) for the key column in both tables; otherwise Table.Join will create additional column(s) after merging the tables.
Add a similar step to filter the RawData table:
join = Table.Join(RawData, "Value2", Parameters, "Value2")
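For context, a minimal M sketch of the whole query (the query names RawData and Parameters, and the Excel table names, are assumptions based on the example above):
let
    // RawData and Parameters are the two Excel tables loaded into Power Query
    RawData = Excel.CurrentWorkbook(){[Name = "RawData"]}[Content],
    Parameters = Excel.CurrentWorkbook(){[Name = "Parameters"]}[Content],
    // the inner join keeps only rows whose Value2 appears in the Parameters table,
    // which behaves like an OR across all parameter rows
    join = Table.Join(RawData, "Value2", Parameters, "Value2")
in
    join
Adding more rows to the Parameters table then widens the filter without touching the query.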

Pivoting based on Row Number in Azure Data Factory - Mapping Data Flow

I am new to Azure and am trying to see if the result below is achievable with Data Factory / Mapping Data Flow, without Databricks.
I have my CSV file with this sample data:
I have the following data in my table:
My expected data/result:
Which transformations would be helpful to achieve this?
Thanks.
Now that you have the RowNumber column, you can use the Pivot transformation to do row-to-column pivoting.
I used your sample data to make a test as follows:
My Projection tab is like this:
My Data Preview is like this:
In the Pivot1 transformation, we select the Table_Name and Row_Number columns to group by. If you don't want the Table_Name column, you can delete it here.
On the Pivot key tab, we select the Col_Name column.
On the Pivoted columns tab, we must select an aggregate function to aggregate the Value column; here I use max().
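For reference, that pivot configuration corresponds roughly to this data flow script fragment (the incoming stream name PivotSource is an assumption, and the exact syntax should be treated as a sketch rather than a definitive script):
PivotSource pivot(groupBy(Table_Name, Row_Number),
    pivotBy(Col_Name),
    Value = max(Value)) ~> Pivot1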
The result shows:
Please correct me if I have misunderstood you in this answer.
Update:
The data source looks like this:
The result shows that, as you said, ADF sorts the columns alphabetically. There seems to be no way to customize the sorting:
But once the sink is configured, it will auto-map into your SQL result table.

Create a list of mode values based on table in Excel

I am trying to create a table that will show me the mode of a data set. The data is contained in 3 columns.
Sample data (the actual data set is thousands of rows):
I am trying to identify what the most frequent rate paid is for each weight and zone.
I can get an average via a pivot table. I can also have a pivot table show me how many times each rate shows up in each weight and zone, but that is just a count. I would like it to show me the mode rate.
Any ideas on how to work this would be very appreciated!
Update: This is what I need the end result to look like:
Result:
I found the answer to what I needed to do here: https://www.get-digital-help.com/2010/02/11/match-two-criteria-and-return-multiple-rows-in-excel/
I was able to use this formula to create a list of values. From that list I used a mode and min formula to return the mode or min value.
From that I was able to populate a table with the values as needed.
Screen shot of the results.
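As a related sketch, a single array formula can also return the mode for one weight/zone pair directly, assuming the raw data sits in columns A (Weight), B (Zone) and C (Rate), with the target weight in E2 and the target zone in F1 (all ranges and cell addresses here are assumptions):
=MODE(IF(($A$2:$A$10000=$E2)*($B$2:$B$10000=F$1), $C$2:$C$10000))
Enter it with Ctrl+Shift+Enter so it is evaluated as an array formula, then fill it across the result table.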

How to get value from nested relations in Power Pivot?

I'm using the Power Pivot add-in to create a data warehouse for generating dynamic tables and graphs (strictly, the data source is Excel), but I have a problem with a calculation across the relationships. My data model is the following:
My snowflake data warehouse model
So for the fact table "fSales" I need to multiply dCostDetail[Value] by dWorkCost[Value] to generate the fSales[Expenses] amount.
I tried to use RELATED, but I get an error: it doesn't allow nesting across the relationships, e.g. fSales[Expenses] = related(dCostDetail[Value]) * related(dWorkCost[Value])
I also tried the following formula:
fSales[Expenses] = related(dWorkCost[Value]) * Calculate(Calculate(Calculate(Value(dCostDetail[Value]), Userelationship(fSales[IdProduct],dProduct[Sku]),Userelationships(dProduct[IdCateg],dCategory[IdCategory]), Userelationships(dCategory[IdCategory],dCostDetail[IdCateg]))))
And I need this "type" of normalized model to keep the details when I analyze the information (e.g. filtering), but if you know another way to generate the calculation, that would be OK.
RELATED doesn't work in measures, because it evaluates on a record-by-record level. So you're on the right track, but what you need to do is create a column in Power Pivot in the fSales table called "Cost Detail" or whatever, and use a RELATED formula there to pull in that value from the dCostDetail table. Create another column and do the same thing to pull the dWorkCost value into the fSales table.
Then you can do a measure for the expenses like this:
Expenses:=SUM([whatever you called CostDetailColumn])*SUM([whatever you called WorkCostColumn])
You should be able to drop that measure into a pivot and it should do what you're looking for.
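A minimal DAX sketch of that approach; the helper column names CostDetailValue and WorkCostValue are assumptions:
// calculated columns added to the fSales table
CostDetailValue = RELATED ( dCostDetail[Value] )
WorkCostValue = RELATED ( dWorkCost[Value] )
// measure built from those columns, as described above
Expenses := SUM ( fSales[CostDetailValue] ) * SUM ( fSales[WorkCostValue] )
If the expense should instead be computed row by row before summing, SUMX ( fSales, fSales[CostDetailValue] * fSales[WorkCostValue] ) is the row-level alternative.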

Hive UDF for selecting all except some columns

The common query building pattern in HiveQL (and SQL in general) is to either select all columns (SELECT *) or an explicitly-specified set of columns (SELECT A, B, C). SQL has no built-in mechanism for selecting all but a specified set of columns.
There are various mechanisms for excluding some columns as outlined in this SO question but none apply naturally to HiveQL. (For example, the idea to create a temporary table with SELECT * then ALTER TABLE DROP some of its columns would wreak havoc in a big data environment.)
Ignoring the ideological discussion about whether it is a good idea to select all but some columns, this question is about the possible ways to extend Hive with this capability.
Prior to Hive 0.13.0 SELECT could take regular-expression-based columns, e.g., property_.* inside a backtick-quoted string. #invoketheshell's answer below refers to this capability but it comes at a cost, which is that, when this capability is on, Hive cannot accept columns with non-standard characters in them, e.g., $foo or x/y. That's why the Hive developers turned this behavior off by default in 0.13.0. I am looking for a generic solution that works for any column name.
A generic table-generating UDF (UDTF) could certainly do this because it can manipulate the schema. Since we are not going to generate new rows, is there a way to solve this problem using a simple row-based UDF?
This seems like a common problem with many posts around the Web showing how to solve it for various databases yet I haven't been able to find a solution for Hive. Is there code somewhere that does this?
You can choose every column except those listed in a regex-based specification. This is querying columns by exclusion. See below:
A SELECT statement can take a regex-based column specification in Hive releases prior to 0.13.0, or in 0.13.0 and later releases if the configuration property hive.support.quoted.identifiers is set to none.
That being said, you could create a new table or view using the following, and all the columns except those specified will be returned:
set hive.support.quoted.identifiers=none;
drop table if exists database.table_name;
create table if not exists database.table_name as
select `(column_to_remove_1|...|column_to_remove_N)?+.+`
from database.some_table
where
--...
;
This will create a table that has all the columns from some_table except the columns named column_to_remove_1 through column_to_remove_N. You can also choose to create a view instead.
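For example, a quick hypothetical usage (the table web_logs and its columns are made up for illustration):
set hive.support.quoted.identifiers=none;
-- returns every column of web_logs except ip_address and user_agent
select `(ip_address|user_agent)?+.+`
from database.web_logs;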