So I have a lot of rows in Excel: about 10,000 rows of data with roughly 10,000 different IDs. Is there a way to query an Oracle database just once by capturing the entire ID column as a group and including that group in the WHERE clause, instead of looping over the 10,000 IDs and querying the database 10,000 times?
Sorry for not providing code. I really have not attempted this because I don't know if a solution exists.
Something like what you are asking can be accomplished in a two-step process: first, create SELECT-FROM-DUAL queries for the relevant IDs; second, paste those queries into your main query and join against them to limit the results to only the rows you need.
For the first step, use Excel to create SELECT-FROM-DUAL subqueries.
If your ID column starts in cell A2, copy the following formula into an empty cell on the same row and drag it down the column until every row with an ID also has the formula. Alter the references to cells A2 and A3 if your IDs don't start in cell A2.

="SELECT "&A2&" AS id FROM DUAL"&IF(NOT(ISBLANK(A3)), " UNION ALL", "")
Ultimately, what we want is a block of SELECT-FROM-DUAL statements that look like the below. Note that the last statement will not end in "UNION ALL", but all other statements should.
| IDs | Formula |
|----- |------------------------------------ |
| 1 | SELECT 1 AS id FROM DUAL UNION ALL |
| 2 | SELECT 2 AS id FROM DUAL UNION ALL |
| 3 | SELECT 3 AS id FROM DUAL UNION ALL |
| 4 | SELECT 4 AS id FROM DUAL UNION ALL |
| 5 | SELECT 5 AS id FROM DUAL UNION ALL |
| 6 | SELECT 6 AS id FROM DUAL |
For the second step, add all the SELECT-FROM-DUAL statements to your main query and then add an appropriate JOIN condition.

SELECT
    *
FROM table_you_need tyn
INNER JOIN (
    SELECT 1 AS id FROM DUAL UNION ALL
    SELECT 2 AS id FROM DUAL UNION ALL
    SELECT 3 AS id FROM DUAL UNION ALL
    SELECT 4 AS id FROM DUAL UNION ALL
    SELECT 5 AS id FROM DUAL UNION ALL
    SELECT 6 AS id FROM DUAL
) yi
    ON tyn.id = yi.id
;
If you had a shorter list of IDs you could use a similar strategy to build an ID list for a WHERE id IN (<list_of_numbers>) clause, but an Oracle IN list is limited to 1,000 items, and consequently would not work for your current question.
You can import data from Excel using Toad or SQL Developer. You need to create a table first in the database.
You can read the data directly with external tables if you save the Excel file as a CSV file to a folder on the database server that the database can access.
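For the external-table route, a minimal sketch might look like the following. The directory object name, file name, and column layout are assumptions; adjust them to match your export.

-- Hypothetical external table over the exported CSV of IDs.
-- data_dir must be a database directory object pointing at the folder holding the file.
CREATE TABLE ids_ext (
    id NUMBER
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
    )
    LOCATION ('ids.csv')
);

-- Then join against it just like the DUAL subquery approach:
SELECT tyn.*
FROM table_you_need tyn
INNER JOIN ids_ext yi
    ON tyn.id = yi.id;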
You can read files as Excel (xls or xlsx format) using a PL/SQL library.
There are probably a few other ways I haven't thought of as well. This is a very common question.
I have a dataset.table partitioned by date (100 partitions) like this:
table_name_(100), which means: table_name_20200101, table_name_20200102, table_name_20200103, ...
Example of table_name_20200101:
| id  | col_1 | col_2 | col_3 |
|-----|-------|-------|-------|
| xxx | 2     | 6     | 10    |
| yyy | 1     | 60    | 29    |
| zzz | 12    | 61    | 78    |
| aaa | 18    | 56    | 80    |
I would like to delete the rows where id = 'yyy' in all the (partitioned) tables:
DELETE FROM `project_id.dataset_id.table_name_*`
WHERE id = 'yyy'
I got this error :
Illegal operation (write) on meta-table
project_id:dataset_id.table_name_*
Is there a way to delete the rows where id = 'yyy' in all the (partitioned) tables?
Thank you
Okay, a few things to call out here to ensure we're using consistent terminology.
You're talking about sharded tables, not partitioned. In a partitioned table, the data within the table is organized based on the partitioning specification. Here, you just have a series of tables named using a common prefix and a suffix based on date.
The use of the table_prefix* syntax is called a wildcard table, and DML is explicitly not allowed via wildcard tables: https://cloud.google.com/bigquery/docs/querying-wildcard-tables
The table_name_(100) is an aspect of how the BigQuery UI collapses series of like-named tables to save space in the navigation panes. It's not how the service itself references tables at all.
The way you can accomplish this is to leverage other aspects of BigQuery: The INFORMATION_SCHEMA tables and scripting functionality.
Information about what tables are in a dataset is available via the TABLES view: https://cloud.google.com/bigquery/docs/information-schema-tables
Information about scripting can be found here: https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting
Now, here's an example that combines these concepts:
DECLARE myTables ARRAY<STRING>;
DECLARE X INT64 DEFAULT 0;
DECLARE queryStr STRING;

# First, we query INFORMATION_SCHEMA to generate an array of the tables we want to process.
# This INFORMATION_SCHEMA query currently has a LIMIT clause so that if you get it wrong,
# you won't bork all the tables in the dataset in one go.
SET myTables = (
  SELECT
    ARRAY_AGG(t)
  FROM (
    SELECT
      TABLE_NAME as t
    FROM `my-project-id`.my_dataset.INFORMATION_SCHEMA.TABLES
    WHERE
      TABLE_TYPE = 'BASE TABLE' AND
      STARTS_WITH(TABLE_NAME, 'table_name_')
    ORDER BY TABLE_NAME
    LIMIT 2
  )
);

# Now, we process that array of tables using scripting's loop construct,
# one at a time.
LOOP
  IF X >= ARRAY_LENGTH(myTables)
    THEN LEAVE;
  END IF;

  # DANGER WILL ROBINSON: This mutates tables!!!
  #
  # The next line constructs the SQL statement we want to run for each table.
  #
  # In this example, we're constructing the same DML DELETE
  # statement to run on each table. For safety's sake, you may want to start with
  # something like a SELECT query to validate your assumptions and project the
  # myTables values to see what you're getting.
  SET queryStr = "DELETE FROM `my-project-id`.my_dataset." || myTables[SAFE_OFFSET(X)] || " WHERE id = 'yyy'";

  # Now, run the generated SQL via EXECUTE IMMEDIATE.
  EXECUTE IMMEDIATE queryStr;
  SET X = X + 1;
END LOOP;
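As the comments suggest, you can dry-run the loop by generating a SELECT instead of the DELETE. A sketch, meant to be swapped in for the SET queryStr line above (the output alias names are just illustrative):

  SET queryStr = "SELECT '" || myTables[SAFE_OFFSET(X)] || "' AS table_name, COUNT(*) AS matching_rows"
    || " FROM `my-project-id`.my_dataset." || myTables[SAFE_OFFSET(X)]
    || " WHERE id = 'yyy'";

Each EXECUTE IMMEDIATE iteration then just reports how many rows would be affected in that table.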
I am using Application Insights to record custom measurements about our application. I have a customEvent that has data stored in the customMeasurements object. The object contains 4 key-value pairs. I have many of these customEvents and I am trying to average the key-value pairs from all the events and display the results in a two-column table.
I want to have one table that has two columns. The first column is the key name, and the second column is that key's value averaged across all the events.
For example, event1 has key1's value set to 2. event2 has key1's value set to 6. If those are the only two events I received in the last 7 days, I want my table to show the number 4 in the row containing data for key1.
I can only average one key per query, since I cannot put multiple summarizes inside of one query... Here is what I have for averaging the first key in the customMeasurements object:
customEvents
| where name == "PerformanceMeasurements"
| where timestamp > ago(7d)
| summarize key1average=avg(toint(customMeasurements.key1))
| project key1average
But I need to average all the keys inside of this object and build 1 table as described above.
For reference, I have attached a screenshot of the layout of a customEvent customMeasurements object:
If the number of keys is limited and known beforehand, then I'd recommend using multiple aggregations within the | summarize operator by separating them with commas:
| summarize key1average=avg(toint(customMeasurements.key1)), key2average=avg(toint(customMeasurements.key2)), key3average=avg(toint(customMeasurements.key3))
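Plugged into the original query, that would look something like the following (key2 through key4 are assumed names, based on the four key-value pairs described in the question):

customEvents
| where name == "PerformanceMeasurements"
| where timestamp > ago(7d)
| summarize key1average=avg(toint(customMeasurements.key1)),
            key2average=avg(toint(customMeasurements.key2)),
            key3average=avg(toint(customMeasurements.key3)),
            key4average=avg(toint(customMeasurements.key4))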
If the keys may vary, then you'd need to flatten out the custom dimensions first with the | mvexpand operator:
customEvents
| where timestamp > ago(1h)
| where name == "EventName"
| project customDimensions
| mvexpand bagexpansion=array customDimensions
| extend Key = customDimensions[0], Value = customDimensions[1]
| summarize avg(toint(Value)) by tostring(Key)
In this case, each Key-Value pair from customDimensions will become its own row and you will be able to operate on those with the standard query language constructs.
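Adapted to the event and time window from the question, the flattened version would look roughly like this (a sketch, assuming customMeasurements is the property bag you want to expand); it yields exactly the two-column table of key name and average:

customEvents
| where timestamp > ago(7d)
| where name == "PerformanceMeasurements"
| project customMeasurements
| mvexpand bagexpansion=array customMeasurements
| extend Key = customMeasurements[0], Value = customMeasurements[1]
| summarize avg(toint(Value)) by tostring(Key)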
I have two tables.
An Issue table
+----+-------+
| ID | Name |
+----+-------+
| 1 | task1 |
| 2 | task2 |
| 3 | task3 |
+----+-------+
And a table that extends Issue with custom fields
+----+---------+------------+------------+
| ID | issueId | customName | val |
+----+---------+------------+------------+
| 1 | 1 | age | 22 |
| 2 | 1 | speed | 56kmph |
| 3 | 1 | startDate | 03.03.2015 |
+----+---------+------------+------------+
The problem in PowerPivot is that when I select Issue as a Row, customName as Columns, and val as a Value, Excel automatically aggregates using "Count of Value", which shows the count of fields; for speed, startDate, etc., Excel shows "1", not the proper value.
Is it possible to force PowerPivot to show the value by its column name?
If you don't mind using Power Query, you can get to this fairly easily:
Here's how:
1. Add your tables as sources in Power Query. In Excel 2016, you can do that by clicking on a table, then on Data -> From Table. This will open Power Query with your selected table loaded. The table will be listed under Queries, on the left side of the screen.
Once you've loaded your first table as a source, probably the simplest way to add the next one (by way of explanation anyhow) is to click File -> Close and Load, and then do what you did previously, this time for the second source.
(When you Close and Load, a new tab will be created in your workbook, with the results of the new Query...which right now would just look like a duplicate of your original source table.)
2. Merge (join) your two queries.
a. Click on your Issues query, in the queries list on the left side of your screen. That will open the Issues query.
b. Click Home -> Merge Queries (drop-down) -> Merge Queries as New.
c. Fill in the dialog window like below and click OK. Make sure to select the columns you want to match on--highlighted in green here. This will create a new query, most likely named Merge. (Of course, you would use the names of your tables, instead of Issues and Extended.)
Your new query will look something like this:
d. Click the button to expand the tables in the column of tables, make selections like these from the drop-down window, and click OK.
You'll get a table something like this:
3. Pivot your customName column.
a. You can't pivot a column with nulls, so select the customName column, then Transform -> Replace Values, and enter these settings in the dialog window that pops up, then click OK (the Replace With box is left empty):
b. Select the customName column then Transform -> Pivot Column. Fill in the dialog window that pops up like this, below, and click OK.
4. Clean up. Select all the columns you want to keep, then click Home -> Remove Columns (drop-down) -> Remove Other Columns:
You'll end up with something like this:
When you Close and Load, you'll get a new tab with the final table in it.
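If you'd rather see the whole transformation as code, the steps above can also be written in Power Query's Advanced Editor as a single query. This is only a sketch, assuming your two source queries are named Issue and Extended and use the column names from the tables above:

let
    // Left-join the custom-field table onto the Issue table on ID = issueId.
    Merged = Table.NestedJoin(Issue, {"ID"}, Extended, {"issueId"}, "Extended", JoinKind.LeftOuter),
    // Keep only the columns we need from the joined table.
    Expanded = Table.ExpandTableColumn(Merged, "Extended", {"customName", "val"}),
    // You can't pivot a column with nulls, so replace them with an empty string first (mirrors step 3a).
    NoNulls = Table.ReplaceValue(Expanded, null, "", Replacer.ReplaceValue, {"customName"}),
    // Turn each distinct customName into its own column, filled with val (mirrors step 3b).
    Pivoted = Table.Pivot(NoNulls, List.Distinct(NoNulls[customName]), "customName", "val")
in
    Pivoted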
I have the following code:
SELECT ta.application as koekkoek, ta.ipc, ipc_count/ipc_tot as ipc_share, t3.sfields FROM (
select t1.appln_id as application, t1.ipc_subclass_symbol as ipc, count(t2.appln_id) as ipc_count, sum(ipc_count) over (PARTITION BY application) as ipc_tot
FROM temp.tls209_small t1
CROSS JOIN
(SELECT appln_id, FROM temp.tls209_small group by appln_id ) t2
where t1.appln_id = t2.appln_id
GROUP BY application, ipc
) as ta
CROSS JOIN thesis.ifris_ipc_concordance t3
WHERE ta.ipc LIKE t3.ipc+'%'
AND ta.ipc NOT LIKE t3.not_ipc+'%'
AND t3.not_appln_id NOT IN
(SELECT ipc_subclass_symbol from temp.tls209_small t5 where t5.appln_id = ta.application)
This gives the following error:
Field 'ta.application' not found.
I have tried numerous notations for the field, but BigQuery doesn't seem to recognize any reference to other tables in the subquery.
The purpose of the code is to assign new technology classifications to records based on a concordance table:
I have got two tables:
One large table with application IDs, classifications, and some other stuff, tls209_small:
And a concordance table with some exception rules ifris_ipc_concordance:
In the end I need to assign the sfields label for each row in tls209 (300 million rows). The rules are that ipc_class_symbol from the first table should be LIKE ipc+'%' from the second table, but not LIKE not_ipc+'%'.
In addition, the not_appln_id value, if present, should not be associated with the same appln_id in the first table.
So a small example, say this is the input of the query:
| appln_id | ipc_class_symbol |
|----------|------------------|
| 1        | A1               |
| 1        | A2               |
| 1        | A3               |
| 1        | C3               |

| sfields | ipc | not_ipc | not_appln_id |
|---------|-----|---------|--------------|
| X       | A   | A2      | null         |
| Y       | A   | null    | A3           |
appln_id 1 should get sfields X twice, because ipc=A matches A1 and A3 while not_ipc=A2 excludes A2.
Y should not be assigned at all, as its not_appln_id value A3 occurs under appln_id 1.
In the results, I also need the share of the ipc_class_symbol for a single application (1 for 328100001, 0.5 for 32100009 etc.)
Without the last condition (AND t3.not_appln_id NOT IN (SELECT ipc_subclass_symbol from temp.tls209_small t5 where t5.appln_id = ta.application) ) the query works fine:
Any suggestions on how to get the subquery to recognize the application id (ta.application), or other ways to introduce the last condition to the query?
I realize my explanation of the problem may not be very straightforward, so if anything is not clear please indicate so, I'll try to clarify the issues.
The query you're performing is doing an anti-join. You can re-write this as an explicit join, but it is a little verbose:
SELECT *
FROM [x.z] as z
LEFT OUTER JOIN EACH [x.y] as y ON y.appln_id = z.application
WHERE y.not_appln_id is NULL
A working solution for the problem was achieved by first generating a table by matching only the ipc_class_symbol from the first table to the ipc column of the second, but also including the not_ipc and not_appln_id columns from the second. In addition, a list of all ipc class labels assigned to each appln_id was added using the GROUP_CONCAT method.
Finally, with help from Pentium10, the resulting table was filtered based on the exception rules as also discussed in this question.
In the final query, the GROUP BY and JOIN arguments needed EACH modifiers to allow the large tables to be processed:
SELECT application as appln_id, ipc as ipc_class, ipc_share, sfields as ifris_class FROM (
  SELECT * FROM (
    SELECT
      ta.application as application, ta.ipc as ipc, ipc_count/ipc_tot as ipc_share,
      t3.sfields as sfields, t3.ipc as yes_ipc, t3.not_ipc as not_ipc,
      t3.not_appln_id as exclude, t4.classes as other_classes
    FROM (
      SELECT
        t1.appln_id as application, t1.ipc_class_symbol as ipc,
        count(t2.appln_id) as ipc_count,
        sum(ipc_count) over (PARTITION BY application) as ipc_tot
      FROM thesis.tls209_appln_ipc t1
      FULL OUTER JOIN EACH
        (SELECT appln_id, FROM thesis.tls209_appln_ipc GROUP EACH BY appln_id) t2
        ON t1.appln_id = t2.appln_id
      GROUP EACH BY application, ipc
    ) AS ta
    LEFT JOIN EACH (
      SELECT appln_id, GROUP_CONCAT(ipc_class_symbol) as classes
      FROM [thesis.tls209_appln_ipc]
      GROUP EACH BY appln_id
    ) t4
      ON ta.application = t4.appln_id
    CROSS JOIN thesis.ifris_ipc_concordance t3
    WHERE ta.ipc CONTAINS t3.ipc
  ) as tx
  WHERE (not ipc contains not_ipc or not_ipc is null)
    AND (not other_classes contains exclude or exclude is null or other_classes is null)
)
I understand that this is not possible using an UPDATE.
What I would like to do instead is migrate all rows with, say, PK=0 to new rows where PK=1. Are there any simple ways of achieving this?
For a relatively simple way, you could always do a quick COPY TO/FROM in cqlsh.
Let's say that I have a column family (table) called "emp" for employees.
CREATE TABLE stackoverflow.emp (
    id int PRIMARY KEY,
    fname text,
    lname text,
    role text
)
And for the purposes of this example, I have one row in it.
aploetz#cqlsh:stackoverflow> SELECT * FROM emp;
id | fname | lname | role
----+-------+-------+-------------
1 | Angel | Pay | IT Engineer
If I want to re-create Angel with a new id, I can COPY the table's contents TO a .csv file:
aploetz#cqlsh:stackoverflow> COPY stackoverflow.emp TO '/home/aploetz/emp.csv';
1 rows exported in 0.036 seconds.
Now, I'll use my favorite editor to change the id of Angel to 2 in emp.csv. Note that if you have multiple rows in your file (that don't need to be updated), this is your opportunity to remove them:
2,Angel,Pay,IT Engineer
I'll save the file, and then COPY the updated row back into Cassandra FROM the file:
aploetz#cqlsh:stackoverflow> COPY stackoverflow.emp FROM '/home/aploetz/emp.csv';
1 rows imported in 0.038 seconds.
Now Angel has two rows in the "emp" table.
aploetz#cqlsh:stackoverflow> SELECT * FROM emp;
id | fname | lname | role
----+-------+-------+-------------
1 | Angel | Pay | IT Engineer
2 | Angel | Pay | IT Engineer
(2 rows)
For more information, check the DataStax doc on COPY.
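If by "migrate" you also need the original rows gone afterward, you can delete them by their old key once the import looks right. A hedged example for this emp table (adjust the key column and value to your own schema):

aploetz#cqlsh:stackoverflow> DELETE FROM emp WHERE id = 1;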