I'm trying to update a table from one database to another in Visual Studio 2012 using Schema Compare. The table exists in both databases, but the one in the first database has more columns than the one in the second. Schema Compare shows that the table is different and displays the new columns on the left side of the window.
When I generate the script to update the table in the database on the right side, it creates a temp table but does not list all of the columns, so it fails with an error on a column that is NOT NULL.
Any idea why this happens? Shouldn't such a simple change be straightforward?
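One plausible cause (an assumption; the generated script is not shown in the question): when a table has to be rebuilt, the script creates a temporary table with the new definition, copies over only the columns the old table already has, then drops the old table and renames the new one. A new NOT NULL column with no default then receives NULL for every existing row, and the copy fails. The sqlite3 sketch below reproduces that pattern with hypothetical names (`items`, `tmp_items`, `status`); it is not the script Visual Studio generates.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory target table with one existing row (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO items (id, name) VALUES (1, 'widget')")
    conn.commit()
    return conn


def rebuild_with_new_column(conn: sqlite3.Connection, new_col_ddl: str) -> None:
    """Rebuild `items` through a temp table that has one extra column.

    Only the columns the old table already has are copied, so the extra
    column falls back to its DEFAULT, or to NULL if it has none.
    """
    conn.execute(
        f"CREATE TABLE tmp_items (id INTEGER PRIMARY KEY, name TEXT, {new_col_ddl})"
    )
    conn.execute("INSERT INTO tmp_items (id, name) SELECT id, name FROM items")
    conn.execute("DROP TABLE items")
    conn.execute("ALTER TABLE tmp_items RENAME TO items")
    conn.commit()


if __name__ == "__main__":
    conn = make_db()
    try:
        # NOT NULL without a default: the copy puts NULL into status and fails.
        rebuild_with_new_column(conn, "status TEXT NOT NULL")
    except sqlite3.IntegrityError as exc:
        print("copy failed:", exc)

    conn = make_db()
    # Same rebuild, but the new column has a default, so existing rows get 'new'.
    rebuild_with_new_column(conn, "status TEXT NOT NULL DEFAULT 'new'")
    print(conn.execute("SELECT id, name, status FROM items").fetchall())
```

Giving the new column a default (or adding it as nullable, backfilling, and only then tightening it) lets the copy succeed, which is what the second call shows.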
I am building an ETL pipeline using PySpark in Databricks.
I have a source SQL table with roughly 10 million rows of data which I want to load into a SQL staging table.
I have two basic requirements:
When a row is added to the source table, it must be inserted into the staging table.
When a row is updated in the source table, it must be updated in the staging table.
Source data
Thankfully the source table has two timestamp columns for created and updated time. I can query the new and updated data using these two columns and put it into a dataframe called source_df.
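A minimal sketch of the incremental query behind source_df, written with sqlite3 so it runs without a database server or Spark. The table name `source`, the columns `created_at`/`updated_at`, the ISO-8601 text timestamps and the `since` watermark (for example, the time of the previous load) are all assumptions.

```python
import sqlite3
from typing import List, Tuple


def fetch_changed_rows(conn: sqlite3.Connection, since: str) -> List[Tuple]:
    """Return rows created or updated after the `since` watermark.

    Timestamps are assumed to be ISO-8601 strings, which sort correctly as text.
    """
    return conn.execute(
        "SELECT id, value FROM source "
        "WHERE created_at > ? OR updated_at > ? "
        "ORDER BY id",
        (since, since),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE source (id INTEGER PRIMARY KEY, value TEXT, "
        "created_at TEXT, updated_at TEXT)"
    )
    # Hypothetical sample rows: one untouched, one updated, one new.
    conn.executemany(
        "INSERT INTO source VALUES (?, ?, ?, ?)",
        [
            (1, "unchanged", "2024-01-01T00:00:00", "2024-01-01T00:00:00"),
            (2, "updated", "2024-01-01T00:00:00", "2024-01-03T00:00:00"),
            (3, "new", "2024-01-04T00:00:00", None),
        ],
    )
    print(fetch_changed_rows(conn, "2024-01-02T00:00:00"))
```

Testing both columns with OR also picks up brand-new rows whose updated_at is still NULL: `NULL > ?` is not true, but the created_at test is.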
Target data
I load all the keys (IDs) from the staging table into a dataframe called target_df.
Working out changes
I join the two dataframes on the key to work out which rows already exist (these form the updates) and which rows don't exist (these form the inserts). This gives me two new dataframes, inserts_df and updates_df.
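In PySpark this split is commonly written as two joins on the key: `source_df.join(target_df, on="id", how="left_anti")` for the inserts and `how="left_semi"` for the updates. The plain-Python sketch below shows the same set logic without a Spark session; the key name `id` and the row dictionaries are hypothetical stand-ins for source_df and target_df.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def split_changes(
    source_rows: Iterable[Row], target_ids: Iterable[object], key: str = "id"
) -> Tuple[List[Row], List[Row]]:
    """Partition changed source rows into (inserts, updates) by key membership.

    Rows whose key is absent from the target behave like a left-anti join
    (inserts); rows whose key is present behave like a left-semi join (updates).
    """
    existing = set(target_ids)
    inserts: List[Row] = []
    updates: List[Row] = []
    for row in source_rows:
        if row[key] in existing:
            updates.append(row)
        else:
            inserts.append(row)
    return inserts, updates


if __name__ == "__main__":
    source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
    inserts, updates = split_changes(source, target_ids=[1, 3])
    print("inserts:", inserts)
    print("updates:", updates)
```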
Inserting new rows
This is easy because I can just use inserts_df.write to directly write into the staging table. Job done!
Updating existing rows
This is what I can't figure out, as there are few existing examples. I have been led to believe that you can't do updates using PySpark. I could use the "overwrite" mode to replace the SQL table, but it doesn't make much sense to replace 10 million rows when I only want to update half a dozen.
How can I effectively get the rows from updates_df into SQL without overwriting the whole table?
Why is Cassandra called unstructured, even though a table/column family has to be defined with columns and their data types?
For a table defined with some fixed columns, we can choose to fill some columns in one row and leave them empty in another row. But can't the same thing be done in an RDBMS, where we can leave some columns out of the INSERT statement as long as those columns allow NULL?
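The RDBMS half of this comparison can be checked with a small sqlite3 sketch (table and column names are hypothetical): a column omitted from the INSERT is stored as NULL when it is nullable, and the INSERT is rejected when the column is declared NOT NULL without a default.

```python
import sqlite3
from typing import List, Tuple


def sparse_rows() -> List[Tuple]:
    """Insert rows that each omit a different nullable column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, phone TEXT)")
    conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
    conn.execute("INSERT INTO users (id, phone) VALUES (2, '555-0100')")
    return conn.execute("SELECT id, email, phone FROM users ORDER BY id").fetchall()


def omitted_not_null_is_rejected() -> bool:
    """Return True if leaving out a NOT NULL column (no default) fails the INSERT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    try:
        conn.execute("INSERT INTO users (id) VALUES (1)")
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    print(sparse_rows())  # omitted columns come back as None (SQL NULL)
    print(omitted_not_null_is_rejected())
```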
MongoDB stores data in JSON documents, so every new document can have different keys and we don't need to define anything up front. With Cassandra, however, we need to alter the table to accommodate new columns.
Even though I have read some articles on this, it is still not clear to me. Can someone pinpoint the reason?
Basically, it is not about "how it works" but about how the files are stored. This is why Cassandra has no fixed structure for the files: you can have the same records in different folders.
I am using Power Query in Excel 2016 to combine data from 12 different workbooks within the same folder system into one table, and need to add an additional column in the master table that tracks the status of each row. However, when I refresh the data, the Status column does not follow the rows to which it is initially applied.
I have already looked at [ Inserting text manually in a custom column and should be visible on refresh of the report ] but this solution only works with a unique ID column. Because each of the 12 workbooks is edited separately and because there is no single column that can be guaranteed to have unique values between all of the different spreadsheets, I don't have a key to join the data to the additional column.
I believe there is always a way of finding a Unique ID. Once you get your head around this, your problem is not that difficult to solve.
See my example below: I used three sample workbooks saved in a Test folder. The details depend on how you add them to the Query Editor; in my example I used From Folder, followed the prompts without making any changes, and combined the tables automatically. Once they are combined, a Source.Name column is added automatically. I suggest leaving this column in your output table, as it can form part of the Unique ID if your data is nearly identical across the workbooks.
An optional step (not in my screenshot) is to add an Index column and concatenate the index number with a product/task name so it can make that specific line of data entry even more unique.
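In Power Query this would be an added Index column plus a custom column that concatenates values. The Python sketch below only illustrates the key-building logic; `Source.Name` mirrors the column added by From Folder, while `Task`, `Index` and `RowKey` are hypothetical column names.

```python
from typing import Dict, List


def add_row_keys(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Add an Index column and a composite RowKey to combined workbook rows."""
    keyed = []
    for index, row in enumerate(rows):  # 0-based, like an Index column starting at 0
        row_key = f"{row['Source.Name']}|{index}|{row['Task']}"
        keyed.append({**row, "Index": index, "RowKey": row_key})
    return keyed


if __name__ == "__main__":
    combined = [
        {"Source.Name": "Book1.xlsx", "Task": "Paint"},
        {"Source.Name": "Book2.xlsx", "Task": "Paint"},
    ]
    for row in add_row_keys(combined):
        print(row["RowKey"])
```

Because the index reflects row position, a key built this way is only stable as long as a refresh does not reorder the rows.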
Once you have added the Status column and entered its data manually in the master table, load the master table back into the Query Editor.
Then go back to the original query (Test (Input) in my example) and merge it with the reloaded output query. See my screenshot for how to 'uniquely' merge the two tables.
The rest is self-explanatory. I think the key is finding the elements of the Unique ID and incorporating them into the merge.
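The merge amounts to a left outer join of the refreshed data against the previously saved output on the key, carrying only the manually entered Status across; in Power Query that is a Merge Queries step with the Left Outer join kind. A Python sketch of the same join, with hypothetical `RowKey` and `Status` field names and sample keys:

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def reattach_status(
    refreshed: List[Row], saved_output: List[Row], key: str = "RowKey"
) -> List[Row]:
    """Left outer merge: keep every refreshed row and copy Status across by key.

    Rows with no match in the saved output get Status None (an empty cell).
    """
    status_by_key: Dict[object, Optional[object]] = {
        row[key]: row.get("Status") for row in saved_output
    }
    return [{**row, "Status": status_by_key.get(row[key])} for row in refreshed]


if __name__ == "__main__":
    refreshed = [{"RowKey": "Book1.xlsx|0|Paint"}, {"RowKey": "Book2.xlsx|1|Paint"}]
    saved = [{"RowKey": "Book2.xlsx|1|Paint", "Status": "Done"}]
    for row in reattach_status(refreshed, saved):
        print(row)
```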
Let me know if you have any questions. Cheers :)
I have an Excel 2010 workbook with two SQL queries each returning data to a separate worksheet as a named table. They return the same db fields, but one is constrained on the values of one of the fields. I have additional columns using formulas to transform these field data, and these are also identical between worksheets.
Upon refresh, Excel autofills the formulae per the conventions of a named table. One of the sheets/tables, call it Table 1, autofills with native references (e.g., for a field/column named variable, the corresponding formula uses [#[variable]] as its reference). However, the other table, call it Table 2, autofills with references to Table 1, i.e., 'Table 1'[#[variable]].
I have searched and replaced these several times, and rewritten the formulae, but each time I refresh the data query these references pop up again. I also tried replacing Table 1 with Table 2, since it occurred to me that this might be a namespace collision in which Excel just takes the first-created table as canon. That didn't fix the issue, and neither did renaming the columns to create a non-colliding namespace.
The only other thing I can think of is that I'd copied the formulas from Table 1, and even though I removed the table name, perhaps Excel has held onto the reference. Is there a table cache or something similar that Excel keeps pulling these references from? Should I create a new query and new table and rewrite the formulae manually, or would that run into the same issue?
[Entering this as an answer so it's not shown as an outstanding question.]
Creating the relevant tables from scratch results in no such namespace collision nor any wonkiness thus far, as we'd expect. I realized that I'd left something out of my initial question: I had copied, in whole or part (likely whole), the tab containing Table 1 to create Table 2. Even editing the resulting new SQL query and the formulae on Table 2, it seems Excel--in its effort to help--recalls several components of the table and does not update this cached information.
I have an Excel file that I want to import into an existing database table using Entity Framework. Right now I first convert the Excel sheet to a DataTable and then loop through each of its rows. Each row has an id field: if the id exists in the database table I need to update that record, otherwise I need to insert the row. I want Entity Framework to wrap the whole loop in one transaction so it can be rolled back in case of error. However, I ran into a case where two rows have the same id but different values. The first row is checked and added to my entity collection, but the second row might then mistakenly update the first added row, because the first one has not actually been added yet: context.SaveChanges() is only called after the loop. How can I update the previously added row in the entity collection without repeatedly calling context.SaveChanges() inside my loop? Thanks.
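One way to avoid the problem, sketched here in Python rather than C# and with hypothetical names throughout, is to remember every entity touched in the current batch in a dictionary keyed by id and check it before the database, so a duplicate id updates the pending entity instead of being treated as unseen. In Entity Framework, `DbSet<T>.Local` exposes the entities the context is already tracking, including ones added but not yet saved, and can play the role of that dictionary. Here `db_rows` stands in for the database table and the final merge for the single SaveChanges() call.

```python
from typing import Any, Dict, Iterable, List, Tuple

Entity = Dict[str, Any]


def upsert_rows(
    db_rows: Dict[Any, Entity], sheet_rows: Iterable[Entity]
) -> Tuple[Dict[Any, Entity], List[Any]]:
    """Apply sheet rows as inserts/updates in memory, then 'save' once.

    db_rows is a stand-in for the existing table (id -> entity).
    pending holds every entity touched in this batch, so a second row with
    the same id updates the entity created for the first one.
    """
    pending: Dict[Any, Entity] = {}
    inserted: List[Any] = []
    for row in sheet_rows:
        row_id = row["id"]
        entity = pending.get(row_id)
        if entity is None:
            if row_id in db_rows:
                entity = dict(db_rows[row_id])  # existing record: modify a copy
            else:
                entity = {"id": row_id}  # new record, not saved yet
                inserted.append(row_id)
            pending[row_id] = entity
        entity.update(row)  # a later duplicate overwrites earlier values
    # Single "SaveChanges": apply every pending change at once.
    committed = {**db_rows, **pending}
    return committed, inserted


if __name__ == "__main__":
    db = {1: {"id": 1, "name": "old"}}
    sheet = [{"id": 1, "name": "new"}, {"id": 2, "name": "first"}, {"id": 2, "name": "second"}]
    committed, inserted = upsert_rows(db, sheet)
    print(committed)
    print("inserted ids:", inserted)
```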
I don't think I have done this in the past decade or so, but I have used Microsoft Word's Mail Merge to create the SQL statements I needed (SELECT, INSERT and UPDATE) for each line in an Excel sheet. Once I had the long SQL script as text, I simply pasted it into the console, the statements were executed, and the job was done. I am confident there are better ways of doing this, but it worked at the time, with limited knowledge and a real need. This answer is probably in the category "don't try this at work, but it is fine to do it at home if it does the job".
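The same per-row statement generation can be scripted instead of mail-merged. The Python sketch below builds one UPDATE per row; the table name `items`, the key column `id` and the row values are hypothetical, and single quotes are escaped by doubling them, which is the standard SQL string-literal rule. For anything beyond a one-off at home, parameterized queries are the safer choice.

```python
from typing import Dict, Iterable, List


def sql_literal(value: object) -> str:
    """Render a Python value as a SQL literal (NULL, number, or quoted string)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    # Double any single quotes so values like O'Brien stay inside the literal.
    return "'" + str(value).replace("'", "''") + "'"


def build_update_statements(
    table: str, key: str, rows: Iterable[Dict[str, object]]
) -> List[str]:
    """Produce one UPDATE statement per spreadsheet row, as a mail merge would."""
    statements = []
    for row in rows:
        assignments = ", ".join(
            f"{col} = {sql_literal(val)}" for col, val in row.items() if col != key
        )
        statements.append(
            f"UPDATE {table} SET {assignments} WHERE {key} = {sql_literal(row[key])};"
        )
    return statements


if __name__ == "__main__":
    sheet = [{"id": 1, "name": "O'Brien", "qty": 5}, {"id": 2, "name": "Lee", "qty": None}]
    print("\n".join(build_update_statements("items", "id", sheet)))
```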