Compare the data between two variables and generate a report into a text file - Linux

My requirement is to transfer data from a source database to a target database.
Job 1:
Source database: Oracle. Target:
- table1: target1.lst
- table2: table2.lst
- table3: table3.lst
This part I have done successfully.
Job 2:
Count the number of records in the source database and in the target database.
This part is also done successfully.
Job 3: ...........(this is the part I am lacking)
I kept the record counts for source and target in variables as well as in text files (these values were found using select count(*) from table on the source and wc -l $filename on the target).
Now, how do I compare the two values so that I can tell whether the loading process completed successfully? I also want to maintain a log file.
Please advise how to compare the values in a text file or a variable so that I can maintain a log file and generate a report in a text file.

It's not very clear where these text files came from and why you need to compare against them. Why don't you store the counts in the database in the first place (instead of, or in addition to, writing them to a file)?
"text file or a variable"
What variable? In Oracle PL/SQL you can compare variables using =, !=, IS NULL, IS NOT NULL, etc. Any other programming language has comparison operators too.
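For the shell side of the comparison, here is a minimal sketch; the file names, variable names, and log file name are placeholders, not from the original post:
#!/bin/bash
# Sketch: compare a source row count with a target file line count
# and append the result to a log file.
LOGFILE=load_report.log
TABLE=table1
TARGET_FILE=table1.lst
# Source count, e.g. captured earlier from: select count(*) from table1;
SRC_COUNT=$(cat src_count_table1.txt)
# Target count from the extracted file
TGT_COUNT=$(wc -l < "$TARGET_FILE")
if [ "$SRC_COUNT" -eq "$TGT_COUNT" ]; then
    echo "$(date '+%Y-%m-%d %H:%M:%S') $TABLE OK: source=$SRC_COUNT target=$TGT_COUNT" >> "$LOGFILE"
else
    echo "$(date '+%Y-%m-%d %H:%M:%S') $TABLE MISMATCH: source=$SRC_COUNT target=$TGT_COUNT" >> "$LOGFILE"
fi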

Related

How to Link an Excel Table with Access and prevent NULL Values due to wrong Data Type Conversion?

In the current project I need to link an Excel file, which gets values from a machine, to the Access database so I can work with them and import them into the data model.
The problem is that some of the values give invalid results due to the way they are saved. For example, the timestamp is saved as
030420 instead of 03:04:20, and Access can't handle that and gives me a #NUMBER error.
I cannot simply change the data type in Excel because the whole Excel file gets refreshed every hour by a source that I can't influence.
Any help appreciated.
If Erik's proposal does not work, you can
- create a backup copy of your Excel source
- tweak the file: enter text in the first row of the problematic columns
- link the tweaked file into Access
- put the real file back in place.
Now the problematic columns should be read as Text, and you can build a query that handles any remaining issues such as conversion, null handling...
Link, don't import, the Excel file, and you have a linked table.
Now, use this linked table as the source in a simple select query where you modify the data and alias the fields as needed. For example:
Select
F1 As SomeName,
F2 As OtherName,
TimeSerial(Mid([F5],1,2),Mid([F5],3,2),Mid([F5],5,2)) As TrueTime
From
LinkedTable
Where
F7 Is Not Null
Then use this query for your import.
Consider querying the Excel file instead of using a linked table.
The query can directly query an Excel range:
SELECT * FROM
[Excel 12.0 XML;DATABASE=PathToMyExcel;HDR=Yes;IMEX=1].[MyRange] t
Then, you can use functions like TimeSerial to cast numbers to time values.
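For instance, a sketch that combines the direct Excel query with the TimeSerial conversion shown above; the workbook path, range name, and field names (MachineId, RawTime) are placeholders:
SELECT t.MachineId AS SomeName,
       TimeSerial(Mid(t.RawTime, 1, 2), Mid(t.RawTime, 3, 2), Mid(t.RawTime, 5, 2)) AS TrueTime
FROM [Excel 12.0 XML;DATABASE=C:\Data\MachineValues.xlsx;HDR=Yes;IMEX=1].[MyRange] t
WHERE t.RawTime Is Not Null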

How to add columns from multiple files in U-SQL in ADLA?

I have a lot of csv files in an Azure Data Lake, consisting of data of various types (e.g., pressure, temperature, true/false). They are all time-stamped, and I need to collect them in a single file according to timestamp for machine learning purposes. This is easy enough to do in Java - open a file stream, loop over the folder opening each file, compare timestamps to write the relevant values to the output file, and start a new column (going to the end of the first line) for each file.
While I've worked around the timestamp problem in U-SQL, I'm having trouble coming up with syntax that will let me run this on the whole folder. The wildcard syntax {*} treats all files as the same fileset, while I need to run some sort of loop to join a column from each file individually.
Is there any way to do this, perhaps using virtual columns?
First you have to think about your problem functionally/declaratively and not in terms of procedural paradigms such as loops.
Let me try to rephrase your question to see if I can help. You have many csv files with data that is timestamped. Different files can have rows with the same timestamp, and you want to have all rows for the same timestamp (or range of timestamps) output to a specific file? So you basically want to repartition the data?
What is the format of each of the files? Do they all have the same schema or different schemas? In the latter case, how can you differentiate them? Based on the filename?
Let me know in the comments if that is a correct declarative restatement and the answers to my questions and I will augment my answer with the next step.
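As background to the virtual-column idea raised in the question: a U-SQL fileset can expose part of the file path as a virtual column, so rows extracted from different files stay distinguishable. A minimal sketch, in which the path pattern and column names are assumptions, not part of the original question:
// Sketch only: {signal} in the path pattern becomes a virtual string column on every row.
@data =
    EXTRACT ts DateTime,
            reading double,
            signal string   // virtual column taken from the file name
    FROM "/input/{signal}.csv"
    USING Extractors.Csv(skipFirstNRows : 1);
// Rows from all files now carry their source name and can be grouped or joined on ts.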

How can I insert data from Excel file to Oracle using INSERT INTO SQL statement?

I have created types using Oracle objects and created a table
CREATE OR REPLACE TYPE OttawaAddress_Ty AS OBJECT
(StrtNum NUMBER(9),
Street VARCHAR2(20),
City VARCHAR2(15),
Province CHAR(2),
PostalCode CHAR(7));
/
CREATE OR REPLACE TYPE OttawaOfficesInfo_Ty AS OBJECT
(Name VARCHAR(35),
OfficeID VARCHAR2(2),
Phone VARCHAR2(15),
Fax CHAR(15),
Email CHAR(30));
/
CREATE TABLE OttawaOffices
(OfficeAddress OttawaAddress_Ty,
OfficeInfo OttawaOfficesInfo_Ty,
Longitude_DMS NUMBER (10,7),
Latitude_DMS NUMBER (10,7),
SDO_GEOMETRY MDSYS.SDO_GEOMETRY);
I have an Excel file which holds the data, and I need to import it into this Oracle table using INSERT INTO SQL statements. How can I do this? As you can see, I have a column called SDO_GEOMETRY which will hold the decimal degrees of the records. These decimal degrees are saved in two separate columns in my Excel file.
I am not sure whether I can programmatically insert the values from Excel or whether I need to go through every record and create
INSERT INTO ... VALUES ... statements. And if so, how do I add values when I have created types?
Oracle has a really neat feature called external tables. These look like regular tables from inside the database, so we can execute SELECT statements against them. The trick is that the table's data comes from OS files (hence "external"). We just define the table to have the same structure as the spreadsheet's columns. It doesn't work with the Excel binary format, but it does work for CSV files (so Save As ...).
The advantage of external tables is that manipulating data is easy in SQL - it's what it does best - and we don't need to load anything into a staging table. Something like:
insert into OttawaOffices
select ottawaaddress_ty(ext.strtnum,ext.street,ext.city,ext.province,ext.postalcode)
, ottawaofficesinfo_ty (ext.name,ext.officeid,ext.phone,ext.fax,ext.email)
, ext.longitude
, ext.latitude
, mdsys.sdo_geometry(2001, null, mdsys.sdo_point_type(ext.col1, ext.col2, null), null, null) -- point built from the two decimal-degree columns
from your_external_table ext
/
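For completeness, a sketch of what the external table itself might look like over a CSV export of the spreadsheet; the directory path, file name, and field sizes are assumptions:
-- Directory object pointing at the folder on the database server that holds the CSV.
create directory ext_data_dir as '/path/on/db/server';
-- External table whose columns mirror the spreadsheet layout.
create table your_external_table (
  strtnum    number(9),
  street     varchar2(20),
  city       varchar2(15),
  province   char(2),
  postalcode char(7),
  name       varchar2(35),
  officeid   varchar2(2),
  phone      varchar2(15),
  fax        char(15),
  email      char(30),
  longitude  number(10,7),
  latitude   number(10,7),
  col1       number,
  col2       number
)
organization external (
  type oracle_loader
  default directory ext_data_dir
  access parameters (
    records delimited by newline
    skip 1
    fields terminated by ','
  )
  location ('ottawaoffices.csv')
);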
The limitation of external tables is the need to get the source file onto the database server and create a database directory object. Some places are funny about this.
Anyway, find out more.
I'm not going to pass judgement on declaring table columns as user-defined types. It's usually considered bad practice, but maybe it works in your use case.

Query database columns using Excel/csv data

I have a case where I need to read an Excel/csv/text file containing two columns (say colA and colB) of values (around 1000 rows). I need to query the database using the values in colA. The query returns an XMLType into which the respective colB value needs to be inserted. I have the XML query and the insert working, but I am stuck on what approach I should take to read the data, then query and update it on the fly.
I have tried using external tables but realized that I don't have access to the server root to host the data file. I have also considered creating a temporary table, loading the data into it with SQL*Loader or something similar, and running the query/update against the tables. But that would need some formal overhead to go through. I would appreciate suggestions on the approach. Examples would be greatly helpful.
e.g.
text or Excel file:
ColA,ColB
abc,123
def,456
ghi,789
XMLTypeVal e.g.
<node1><node2><node3><colA></colA><colB></colB></node3></node2></node1>
UPDATE TableA
SET XMLTypeVal =
    INSERTCHILDXML(XMLTypeVal,
                   '/node1/node2/node3', 'colBval',
                   XMLType('<colBval>123</colBval>'))
WHERE EXTRACTVALUE(TableA.XMLTypeVal, '/node1/node2/node3/colA') = 'colAval';
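A rough sketch of the staging-table route mentioned above, assuming the rows from the file can be loaded into a simple staging table (via SQL*Loader or an IDE import wizard); the table and column names here are placeholders:
-- Staging table holding the colA/colB pairs from the file.
create table stage_lookup (
  colA varchar2(100),
  colB varchar2(100)
);
-- After loading the file into stage_lookup, drive the update from it:
update TableA a
set a.XMLTypeVal = insertchildxml(
        a.XMLTypeVal,
        '/node1/node2/node3', 'colBval',
        XMLType('<colBval>' ||
                (select s.colB from stage_lookup s
                 where s.colA = extractvalue(a.XMLTypeVal, '/node1/node2/node3/colA')) ||
                '</colBval>'))
where exists (
    select 1 from stage_lookup s
    where s.colA = extractvalue(a.XMLTypeVal, '/node1/node2/node3/colA'));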

SSIS Excel Data Source - Is it possible to override column data types?

When an Excel data source is used in SSIS, the data types of each individual column are derived from the data in the columns. Is it possible to override this behaviour?
Ideally we would like every column delivered from the Excel source to be of the string data type, so that data validation can be performed on the data received from the source in a later step in the data flow.
Currently, the Error Output tab can be used to ignore conversion failures - the data in question is then null, and the package will continue to execute. However, we want to know what the original data was so that an appropriate error message can be generated for that row.
According to this blog post, the problem is that the SSIS Excel driver determines the data type for each column based on reading the values of the first 8 rows:
- If the top 8 records contain an equal number of numeric and character types, then the priority is numeric
- If the majority of the top 8 records are numeric, then it assigns the data type as numeric and all character values are read as NULLs
- If the majority of the top 8 records are of character type, then it assigns the data type as string and all numeric values are read as NULLs
The post outlines two things you can do to fix this:
First, add IMEX=1 to the end of your Excel driver connection string (see the example connection string after this list). This will allow Excel to read the values as Unicode. However, this is not sufficient if the data in the first 8 rows are numeric.
Second, in the registry, change the value of HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows to 0. This will ensure that the driver looks at all the rows to determine the data type for the column.
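For reference, IMEX=1 lives in the Extended Properties portion of the connection string. A typical ACE OLE DB connection string might look like the following; the file path is a placeholder, and the provider and Excel version may differ in your package:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Source.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";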
Yes, you can. Just go into the output column list on the Excel source and set the type for each of the columns.
To get to the output columns list, right-click on the Excel source, select 'Show Advanced Editor', and click the tab labeled 'Input and Output Properties'.
A potentially better solution is to use the Derived Column component, where you can actually build "new" columns for each column in Excel. This has the following benefits:
- You have more control over what you convert to.
- You can put in rules that control the change (i.e. if null, give me an empty string, but if there is data then give me the data as a string).
- Your data source is not tied directly to the rest of the process (i.e. you can change the source and the only place you will need to do work is in the derived column).
If your Excel file contains a number in the column in question in the first row of data, it seems that the SSIS engine will reset the type to a numeric type. It kept resetting mine. I went into my Excel file and changed the numbers to "Numbers stored as text" by placing a single quote in front of them. They are now read as text.
I also noticed that SSIS uses the first row to IGNORE what the programmer has indicated is the actual type of the data (I even told Excel to format the entire column as TEXT, but SSIS still used the data, which was a bunch of digits) and reset it. Once I fixed that by putting a single quote in front of the number in the first row of data in my Excel file, I thought it would get it right, but no, there is additional work.
In fact, even though the SSIS external data source column now has the type DT_WSTR, it will still read 43567192 as 4.35671E+007. So you have to go back into your Excel file and put single quotes in front of all the numbers.
Pretty LAME, Microsoft! But there's your solution. I have no idea what to do if the Excel file is not under your control.
I was looking for a solution to a similar issue but didn't find anything on the internet. Although most of the solutions I found work at design time, they don't work when you want to automate your SSIS package.
I resolved the issue and made it work by changing the properties of the Excel Source. By default the AccessMode property is set to OpenRowset. If you change it to SQL Command, you can write your own SQL to convert any column as you wish.
For me, SSIS was treating the NDCCode column as a float, but I needed it as a string, so I used the following SQL:
Select [Site], Cstr([NDCCode]) as NDCCode From [Sheet1$]
The Excel source in SSIS behaves strangely. SSIS determines the type of data in a particular column by reading the first 10 rows, hence the issue. If you have a text column with null values in the first 10 rows, SSIS takes the data type as Int. With a bit of a struggle, here is a workaround:
- Insert a dummy row (preferably as the first row) in the worksheet. I prefer doing this through a Script Task; you may consider using some service to preprocess the file before SSIS connects to it.
- With the dummy row, you are sure that the data types will be set as you need.
- Read the data using the Excel source and filter out the dummy row before you take it for further processing.
I know it is a bit shabby, but it works :)
I could fix this issue. While creating the SSIS package, I manually changed the specific column to text (open the Excel file, select the column, right-click on the column, select Format Cells, in the Number tab select Text, and save the Excel file).
Now create the SSIS package and test it. It works. Now try to use the Excel file where this column was not set as text.
It worked for me and I could execute the package successfully.
This can be resolved simply: just untick the box "First row as column names" and all data will be collected as the text data type. The only downside of this choice is that you have to manage the column names yourself (the auto-generated names Column 1, Column 2, etc.) and handle the first row, which contains the column names.
I had trouble implementing the solution here - I could follow the instructions, but it only gave new errors.
I solved my conversion issues by using a Data Conversion component. This can be found in the SSIS Toolbox under Data Flow Transformations. I placed the Data Conversion between my Excel Source and my OLE DB Destination, linked the Excel Source to the Data Conversion and the Data Conversion to the OLE DB Destination, then double-clicked the Data Conversion to bring up the list of data columns. I gave the problem column a new alias and changed the Data Type column.
Lastly, in the Mappings of the OLE DB Destination, use the alias column name rather than the original Excel column name. Job done.
You can use a Data Conversion component to convert to the desired data types.
