BCP Unexpected String Data, Right Truncation - text

Trying to import data into Azure. Created a text file and have tried both comma and tab delimited text files.
Here is the table the text file is to be inserted into:
CREATE TABLE [dbo].[Test] (
[Id] [uniqueidentifier] NOT NULL,
[first_name] [varchar] (50),
[last_name] [varchar] (50),
[dob] [varchar] (10),
[gender] [char] (1),
[phone] [char] (10))
BCP dbo.Test in C:\Test.txt -S "DBServerName" -d "DBName" -U "UserName" -P "Password" -c -r/r
Have tried saving the text file in different formats and with different encodings, but believe that it’s correct to have it as UTF-16 with UNIX LF. Any thoughts? Also, if there are nulls in the data (excluding the Id field), does that need to be specified somehow in the BCP statement? Thanks!

I think you can reference this document: Import data into Azure SQL Database with the BCP Utility.
This tutorial covers loading data from CSV flat files into Azure SQL Database.
From the document, we can see that:
1. Your data file needs to use the ASCII or UTF-16 encoding.
2. BCP does not support UTF-8.
Besides, you can also reference Format Files for Importing or Exporting Data.
I have a text data file in which one column has nulls.
Then I imported this file into my Azure SQL database dbleon1, specifying nothing else, and it succeeded.
My bcp code:
bcp tb1 in C:\Users\leony\Desktop\tb1.txt -S *****.database.windows.net -d dbleon1 -U ServerAdmin -P ***** -q -c -t
I checked and the data is imported into tb1 in database dbleon1.
Hope this helps.
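Regarding the nulls: with -c or -w, an empty field in the data file is loaded as NULL when the column has no default (you can add -k to keep empty values as NULL explicitly). Since your file is UTF-16, -w is the matching switch rather than -c; by default -w expects tab-separated fields and newline-terminated rows. Here is a minimal Python sketch for producing such a file; the path and sample rows are placeholders, not your data:
import csv
import uuid

# Hypothetical sample rows; None marks a value that should load as NULL.
rows = [
    (str(uuid.uuid4()), "John", "Smith", "1980-01-01", "M", "5551234567"),
    (str(uuid.uuid4()), "Jane", None, None, "F", None),
]

# bcp -w expects UTF-16LE text, tab-separated fields and newline-terminated rows.
with open(r"C:\Test.txt", "w", encoding="utf-16-le", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    for row in rows:
        # Write NULLs as empty fields; with no column defaults they load as NULL
        # (add -k to the bcp command to be explicit about it).
        writer.writerow(["" if v is None else v for v in row])
The matching load command would then use -w in place of -c.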

Related

KUSTO split txt when ingesting

I created a table in my Azure Data Explorer with the following command:
.create table MyLogs ( Level:string, Timestamp:datetime, UserId:string, TraceId:string, Message:string, ProcessId:int32 )
I then created my Storage Account --> Container and uploaded a simple txt file with the following content:
Level Timestamp UserId TraceId Message ProcessId
I then generated a SAS for the container holding that txt file and used it in the query section of my Azure Data Explorer like the following:
.ingest into table MyLogs (
h'...sas for my txt file ...')
Now, when I read the table I see something like this:
Level | Timestamp | UserId | TraceId | Message | ProcessId
Level Timestamp UserId TraceId Message ProcessId | | | | |
So it basically put all the content into the first column.
I was expecting some automatic splitting. I tried with tabs, spaces, commas and many other separators.
I tried to configure an ingestion mapping with the csv format but had no luck.
From what I understood, each new line in the txt is a new row in the table. But how do I split the same line on a specific separator?
I read many pages of documentation but had no luck.
You can specify the format you want to try with the format argument; see the list of supported formats and an ingestion command syntax example that specifies the format here.
In addition, you can use the "one click ingestion" from the web interface.
This should work (I have done it before with the Python SDK):
.create table MyLogs ingestion csv mapping 'MyLogs_CSV_Mapping' ```
[
{"Name":"Level","datatype":"datetime","Ordinal":0},
{"Name":"Timestamp","datatype":"datetime","Ordinal":1},
{"Name":"UserId","datatype":"string","Ordinal":2},
{"Name":"TraceId","datatype":"string","Ordinal":3},
{"Name":"Message","datatype":"string","Ordinal":4},
{"Name":"ProcessId","datatype":"long","Ordinal":5}
]```
https://learn.microsoft.com/de-de/azure/data-explorer/kusto/management/data-ingestion/ingest-from-storage
.ingest into table MyLogs SourceDataLocator with (
format="csv",
ingestionMappingReference = "MyLogs_CSV_Mapping"
)
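Since the Python SDK was mentioned above, here is a hedged sketch of the same ingestion using azure-kusto-ingest; the cluster URI, database name and blob SAS URL are placeholders, and exact import paths and parameter names can vary slightly between SDK versions:
from azure.kusto.data import KustoConnectionStringBuilder
from azure.kusto.data.data_format import DataFormat
from azure.kusto.ingest import QueuedIngestClient, IngestionProperties, BlobDescriptor

# Placeholder ingestion endpoint of the cluster.
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://ingest-<cluster>.<region>.kusto.windows.net")
client = QueuedIngestClient(kcsb)

props = IngestionProperties(
    database="<database>",  # placeholder database name
    table="MyLogs",
    data_format=DataFormat.CSV,
    ingestion_mapping_reference="MyLogs_CSV_Mapping",
)

# Same blob + SAS that the h'...' locator points at in the .ingest command.
blob = BlobDescriptor(
    "https://<storageaccount>.blob.core.windows.net/<container>/mylogs.csv?<sas>")
client.ingest_from_blob(blob, ingestion_properties=props)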
Hopefully this will help a bit :)

Convert Access database into delimited format on Unix/Linux

I have an Access database file and I need to convert it into delimited file format.
The Access DB file has multiple tables and I need to create separate delimited files for each table.
So far I am not able to parse Access DB files with any Unix commands. Is there some way that I can do this on Unix?
You can use UCanAccess to dump Access tables to CSV files using the console utility:
gord@xubuntu64-nbk1:~/Downloads/UCanAccess$ ./console.sh
/home/gord/Downloads/UCanAccess
Please, enter the full path to the access file (.mdb or .accdb): /home/gord/ClientData.accdb
Loaded Tables:
Clients
Loaded Queries:
Loaded Procedures:
Loaded Indexes:
Primary Key on Clients Columns: (ID)
UCanAccess>
Copyright (c) 2019 Marco Amadei
UCanAccess version 4.0.4
You are connected!!
Type quit to exit
Commands end with ;
Use:
export [--help] [--bom] [-d <delimiter>] [-t <table>] [--big_query_schema <pathToSchemaFile>] [--newlines] <pathToCsv>;
for exporting the result set from the last executed query or a specific table into a .csv file
UCanAccess>export -d , -t Clients clientdata.csv;
UCanAccess>Created CSV file: /home/gord/Downloads/UCanAccess/clientdata.csv
UCanAccess>quit
Cheers! Thank you for using the UCanAccess JDBC Driver.
gord@xubuntu64-nbk1:~/Downloads/UCanAccess$
gord@xubuntu64-nbk1:~/Downloads/UCanAccess$ cat clientdata.csv
ID,LastName,FirstName,DOB
1,Thompson,Gord,2017-04-01 07:06:27
2,Loblaw,Bob,1966-09-12 16:03:00
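If you need one CSV per table, here is a hedged Python sketch that scripts the console the same way, assuming console.sh reads its commands from standard input and that you already know the table names (the paths are the ones from the session above):
import subprocess

ACCDB = "/home/gord/ClientData.accdb"
TABLES = ["Clients"]  # add the other table names here

# First line answers the "full path to the access file" prompt,
# then one export command per table, then quit.
commands = [ACCDB]
for t in TABLES:
    commands.append(f"export -d , -t {t} {t.lower()}.csv;")
commands.append("quit")

subprocess.run(
    ["./console.sh"],
    input="\n".join(commands) + "\n",
    text=True,
    cwd="/home/gord/Downloads/UCanAccess",
    check=True,
)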

Loading Special Character via Polybase

I am trying to load a file whose strings are delimited by single quotes, and I can load the data except for certain records where the string contains the patterns below. How can I load these values using PolyBase in SQL Data Warehouse? Any input is highly appreciated.
Eg:
'Don''t Include'
'1'''
'Can''t'
'VM''s'
External File Format:
CREATE EXTERNAL FILE FORMAT SAMPLE_HEADER
with (format_type=delimitedtext,
format_options(
FIELD_TERMINATOR=',',
STRING_DELIMITER='''',
DATE_FORMAT='yyyy-MM-dd HH:mm:ss',
USE_TYPE_DEFAULT=False)
)
In this case your string delimiter needs to be something other than a single quote.
I assume you're using a comma-delimited file. You have a couple of options:
Make your column delimiter something other than comma.
Make your string delimiter a character that does not exist in your data
Use an output format other than CSV, such as Parquet or Orc
If you're going to use a custom delimiter, I suggest ASCII Decimal(31) or Hex(0x1F), which is specifically reserved for this purpose.
If you're going to use a string delimiter you might use double-quote (but I'm guessing this is in your data) or choose some other character.
That said, my next guess is that you're going to come across data with embedded carriage returns, and this is going to cause yet another layer of problems. For that reason, I suggest you move your extracts to something other than CSV, and look to Parquet or ORC.
Currently, PolyBase in SQL DW does not support handling of the escape character in the delimited text format, so you cannot load your file directly into SQL DW.
In order to load your file, you may pre-process it: during pre-processing, generate another data file either in a binary format (PARQUET or ORC, which PolyBase can read directly) or as another delimited file with a special field separator (any character that is not expected in your data, e.g. | or ~). With such a special character, there is no need to escape or delimit the values.
Hope this helps.
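If you go the pre-processing route, here is a minimal Python sketch, assuming a comma-separated input whose strings are wrapped in single quotes and use '' as the embedded-quote escape (as in the examples above); file names are placeholders. It rewrites the data with the 0x1F unit separator suggested earlier as the field terminator and no string delimiter:
import csv

UNIT_SEP = "\x1f"  # ASCII decimal 31 / hex 0x1F

with open("input.csv", newline="", encoding="utf-8") as src, \
     open("output.dat", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src, delimiter=",", quotechar="'", doublequote=True)
    for row in reader:
        # csv.reader has already collapsed the doubled '' back to a single quote,
        # so the values are written out unescaped.
        dst.write(UNIT_SEP.join(row) + "\n")
The matching external file format would then point its FIELD_TERMINATOR at that 0x1F character and omit STRING_DELIMITER.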
From Azure docs:
<format_options> ::=
{
FIELD_TERMINATOR = field_terminator
| STRING_DELIMITER = string_delimiter
| First_Row = integer -- ONLY AVAILABLE SQL DW
| DATE_FORMAT = datetime_format
| USE_TYPE_DEFAULT = { TRUE | FALSE }
| Encoding = {'UTF8' | 'UTF16'}
}

Export data from SqlQuery to Excel sheet [duplicate]

I have a table with more than 3,000,000 rows. I have tried to export the data from it manually and with the SQL Server Management Studio Export Data functionality to Excel, but I have met several problems:
when creating a .txt file manually by copying and pasting the data (in several passes, because copying all rows at once from SQL Server Management Studio throws an out-of-memory error), I am not able to open it with any text editor to copy the rows;
the Export Data to Excel option does not work, because Excel does not support that many rows.
Finally, with the Export Data functionality I created a .sql file, but it is 1.5 GB and I am not able to open it in SQL Server Management Studio again.
Is there a way to import it with the Import Data functionality, or some cleverer way to back up the data in my table and then import it again when I need it?
Thanks in advance.
I am not quite sure if I understand your requirements (I don't know whether you need to export your data to Excel or want to make some kind of backup).
To export data from single tables, you could use the bulk copy tool (bcp), which lets you export the data from a table to a file and import it back later. You can also use a custom query to export the data.
Note that this does not generate an Excel file, but another format. You can use it to move data from one database to another (both must be MS SQL).
Examples:
Create a format file:
bcp [TABLE_TO_EXPORT] format nul -n -f "[FORMAT_FILE]" -S [SERVER] -E -T -a 65535
Export all Data from a table:
bcp [TABLE_TO_EXPORT] out "[EXPORT_FILE]" -f "[FORMAT_FILE]" -S [SERVER] -E -T -a 65535
Import the previously exported data:
bcp [TABLE_TO_EXPORT] in "[EXPORT_FILE]" -f "[FORMAT_FILE]" -S [SERVER] -E -T -a 65535
I redirect the output from the export/import operations to a log file (by appending "> mylogfile.log" at the end of the commands) - this helps if you are exporting a lot of data.
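If you want to script those three steps, here is a rough Python sketch along the same lines (server, table and file names are placeholders); it appends each command's console output to a log file, just like the redirect mentioned above:
import subprocess

def run_bcp(args, logfile="mylogfile.log"):
    # Append bcp's console output to the log file (same idea as "> mylogfile.log").
    with open(logfile, "a", encoding="utf-8") as log:
        subprocess.run(["bcp", *args], stdout=log, stderr=log, check=True)

table, server = "dbo.MyTable", "MyServer"
run_bcp([table, "format", "nul", "-n", "-f", "export.fmt", "-S", server, "-T"])
run_bcp([table, "out", "export.dat", "-f", "export.fmt", "-S", server, "-T"])
run_bcp([table, "in", "export.dat", "-f", "export.fmt", "-S", server, "-T"])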
Here is a way of doing it without bcp:
EXPORT THE SCHEMA AND DATA IN A FILE
Use the ssms wizard
Database >> Tasks >> Generate Scripts… >> choose the table >> choose to script both schema and data
Save the SQL file (can be huge)
Transfer the SQL file on the other server
SPLIT THE DATA IN SEVERAL FILES
Use a program like textfilesplitter to split the file into smaller files of 10,000 lines each, so no single file is too big (or use a small script like the one sketched after these steps)
Put all the files in the same folder, with nothing else
IMPORT THE DATA IN THE SECOND SERVER
Create a .bat file in the same folder, name execFiles.bat
You may need to check the table schema and disable the identity in the first file; you can add it back after the import is finished.
This will execute all the files in the folder against the server and database; the -f 65001 switch tells sqlcmd to read the files as UTF-8, which keeps accented characters intact:
for %%G in (*.sql) do sqlcmd /S ServerName /d DatabaseName -E -i"%%G" -f 65001
pause
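If you don't have a splitter tool at hand, the chunking can also be done with a short script; here is a minimal Python sketch, assuming the exported .sql file is UTF-8 and no single statement spans a chunk boundary (file names are placeholders):
CHUNK = 10_000  # lines per output file

with open("export.sql", encoding="utf-8") as src:
    part, lines = 1, []
    for line in src:
        lines.append(line)
        if len(lines) == CHUNK:
            with open(f"part_{part:04d}.sql", "w", encoding="utf-8") as out:
                out.writelines(lines)
            part, lines = part + 1, []
    if lines:  # write the final, shorter chunk
        with open(f"part_{part:04d}.sql", "w", encoding="utf-8") as out:
            out.writelines(lines)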

Processing CSV data

I have recently been asked to take a .csv file of login/logout records (one row per session, with agent, username, project and duration columns) and turn it into a summary of the total time per agent.
Keep in mind that there will be hundreds, if not thousands, of rows, because a new row is created every time a user logs in/out, and there will be more than just two users. My first thought was to load the .csv file into MySQL and then run a query on it. However, I really don't want to install MySQL on the machine that will be used for this.
I could do it manually for each agent in Excel/OpenOffice, but since there is little room for error and so many lines to work through, I want to automate the process. What's the best way to go about that?
This one-liner relies only on awk, and on date for converting between times and timestamps:
awk 'BEGIN{FS=OFS=","}NR>1{au=$1 "," $2;t=$4; \
"date -u -d \""t"\" +%s"|getline ts; sum[au]+=ts;}END \
{for (a in sum){"date -u -d \"#"sum[a]"\" +%T"|getline h; print a,h}}' test.csv
having test.csv like this:
Agent,Username,Project,Duration
AAA,aaa,NBM,02:09:06
AAA,aaa,NBM,00:15:01
BBB,bbb,NBM,04:14:24
AAA,aaa,NBM,00:00:16
BBB,bbb,NBM,00:45:19
CCC,ccc,NDB,00:00:01
results in:
CCC,ccc,00:00:01
BBB,bbb,04:59:43
AAA,aaa,02:24:23
You can use this, with small adjustments, to extract the date from extra columns.
Let me give you an example in case you decide to use SQLite. You didn't specify a language but I will use Python because it can be read as pseudocode. This part creates your sqlite file:
import csv
import sqlite3
con = sqlite3.connect('my_sqlite_file.sqlite')
con.text_factory = str
cur = con.cursor()
cur.execute('CREATE TABLE "mytable" ("field1" varchar, \
"field2" varchar, "field3" varchar);')
and you use the command:
cur.executemany('INSERT INTO mytable VALUES (?, ?, ?)', list_of_values)
to insert rows into your database once you have read them from the csv file. Notice that we only created three fields in the table, so we are inserting only 3 values per row from your list_of_values. That's why we are using (?, ?, ?).
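To tie it together, here is a hedged end-to-end sketch along the same lines, assuming the test.csv layout shown in the awk answer above (Agent,Username,Project,Duration with HH:MM:SS durations); the table and file names are just examples:
import csv
import sqlite3

def to_seconds(hms):
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def to_hms(total):
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"

con = sqlite3.connect("my_sqlite_file.sqlite")
cur = con.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS logins '
            '(agent varchar, username varchar, seconds integer)')

with open("test.csv", newline="") as f:
    reader = csv.DictReader(f)
    rows = [(r["Agent"], r["Username"], to_seconds(r["Duration"])) for r in reader]

cur.executemany("INSERT INTO logins VALUES (?, ?, ?)", rows)
con.commit()

# Print the per-agent totals, formatted back as HH:MM:SS.
for agent, user, total in cur.execute(
        "SELECT agent, username, SUM(seconds) FROM logins GROUP BY agent, username"):
    print(f"{agent},{user},{to_hms(total)}")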
