Amazon Athena partition with colon(:) is not working - presto

When creating a partition in Athena, I tried to use a date in the format yyyy-MM-ddTHH:mm:ssZ as the partition value, but then I am not able to query the data.
Step 1: Create table
CREATE EXTERNAL TABLE my_info (
  id STRING,
  name STRING
) PARTITIONED BY (
  part STRING
) STORED AS ORC
LOCATION 's3://bucket1/data'
TBLPROPERTIES ("orc.compress"="SNAPPY");
Step 2: Create a folder like the one below and add the files.
s3://bucket1/data/part=2019-11-12T14:15:16Z
Step 3: Refresh partition
MSCK REPAIR TABLE my_info
Step 4: Query the data
SELECT *
FROM my_info
With this setup I am not able to query any data.
If I change the folder in Step 2 to the format yyyy-MM-ddTHH, without ':':
s3://bucket1/data/part=2019-11-12T14
then I am able to get results.
Any idea why this is not working?

This is because when you create a partitioned table, the partitioning is implemented as part of the S3 path: for s3://bucket1/data/part=2019-11-12T14:15:16Z, the part=2019-11-12T14:15:16Z section is the part of the S3 key that Athena interprets as a partition when querying the data.
S3 path names have some restrictions on the characters that can be used:
The following characters in a key name might require additional code
handling and likely need to be URL encoded or referenced as HEX. Some
of these are non-printable characters and your browser might not
handle them, which also requires special handling:
Ampersand ("&")
Dollar ("$")
ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)
'At' symbol ("@")
Equals ("=")
Semicolon (";")
Colon (":")
Plus ("+")
Space – Significant sequences of spaces may be lost in some uses (especially multiple spaces)
Comma (",")
Question mark ("?")
In this case it's probably the colons in the path that are not being handled by Presto/Athena. To work around this you can use an alternative dividing character in the timestamp, e.g. part=2019-11-12--14-15-16, or omit it altogether.
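As an illustration of that first workaround, here is a minimal sketch, assuming Python, of how a colon-free partition value could be generated before writing objects to S3; the bucket and prefix are the ones from the question, the timestamp variable is hypothetical:

from datetime import datetime, timezone

# Hypothetical event timestamp that should become the partition value.
event_time = datetime(2019, 11, 12, 14, 15, 16, tzinfo=timezone.utc)

# Format the timestamp without colons so the resulting S3 key needs no
# special handling, e.g. part=2019-11-12--14-15-16
safe_part = event_time.strftime("%Y-%m-%d--%H-%M-%S")

print(f"s3://bucket1/data/part={safe_part}/")
# s3://bucket1/data/part=2019-11-12--14-15-16/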

It seems you can use a URL-encoded colon (%3A).
Further, if you wish to use timestamp as the partition type instead of string, make sure to use a "java.sql.Timestamp compatible format" as documented for the CREATE TABLE statement.
So the final URL would be s3://bucket1/data/part=2019-11-12 14%3A15%3A16/.
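A minimal sketch of how that encoded key could be produced, assuming Python; the bucket and timestamp are taken from the question:

from urllib.parse import quote

# Partition value in a java.sql.Timestamp compatible format (yyyy-MM-dd HH:mm:ss).
part_value = "2019-11-12 14:15:16"

# Percent-encode the colons; quote() never encodes letters, digits or '_.-~'
# and safe=" " keeps the space literal, as in the URL above.
encoded = quote(part_value, safe=" ")

print(f"s3://bucket1/data/part={encoded}/")
# s3://bucket1/data/part=2019-11-12 14%3A15%3A16/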

Related

Azure Data Factory removing spaces from column names of csv file

I'm a bit new to Azure Data Factory, so apologies if I'm missing anything obvious. I've done several searches and I can't find anything that quite fits.
So the situation is that we have an existing pipeline that will take the path to a csv file and pass this in as a delimited data set. As a sink it is using a parquet data set. This is a generic process that we can pass any delimited file into and it will output it as parquet.
This has been working well, but now we have started receiving files with spaces and special characters in the header, which causes the output to Parquet to fail. Unfortunately we don't have control over the format of the files we receive, so I can't handle this at source.
What I would like to do is, on ingestion of the file, replace any spaces and other special characters in the header with an underscore. If I were doing this on premises I could quickly create a PowerShell script to do it. I had thought about creating a custom task in ADF to call a PowerShell script to do this in the blob storage, but that seems more complicated than it should be. Is there something else I can do to get this process working while keeping it generic?
As @Joel Cochran mentioned, you can use the expression below in a Select transformation to replace spaces and special characters in the header.
regexReplace($$,'[^a-zA-Z]','_')
In the Select transformation, remove the auto mappings and add a new rule-based mapping that uses this expression.
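If pre-processing the file before it reaches the pipeline is also an option (the question mentions a PowerShell script), a minimal standalone sketch of the same header cleanup, assuming Python and hypothetical file names, could look like this:

import csv
import re

# Hypothetical input/output paths.
src = "input.csv"
dst = "cleaned.csv"

with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    header = next(reader)
    # Same rule as the regexReplace above: anything that is not a letter
    # becomes an underscore (note this also replaces digits).
    writer.writerow(re.sub(r"[^a-zA-Z]", "_", col) for col in header)
    writer.writerows(reader)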
You can change the output filename, though not directly in the Copy activity (assuming you are using that activity).
The workaround is to use a parameter for the output filename, which you can clean up.
You can use the Get Metadata activity to get all filenames from the source csv files.
Then loop over these files with a foreach activity.
Within the foreach activity you can set the output filename parameter to the cleaned-up value.
The function could look like this:
@replace(item().name, ' ', '_')
More information on the replace function

File transfer: Attachmate EXTRA! appends username to host file name

Hi, when I try to download a file from the mainframe using Attachmate EXTRA!, it appends the username to the host file name. I don't know where to turn it off.
For example, if the file name is yyyy.file.name, then when I try to transfer the file it transfers username.yyyy.file.name.
In 3.4 the option to append the user name is turned off. Still it's happening.
Enclose the entire dataset name (including the high-level qualifier) in single quotes. This is a TSO (not JCL) convention: if you refer to a dataset without single quotes, TSO prepends your user ID as the high-level qualifier; however, if you place single quotes around the dataset name it will take it 'as is' (well, it will uppercase it, since all z/OS dataset names are uppercase, but otherwise it will be 'as is').
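For example, entering 'YYYY.FILE.NAME' (with the quotes) as the host file name transfers exactly that dataset, whereas entering yyyy.file.name without quotes transfers USERNAME.YYYY.FILE.NAME, as described in the question.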

ERROR for load files in HBase at Azure with ImportTsv

Trying to load a TSV file into HBase running in HDInsight in the Microsoft Azure cloud, using the recommended approach of connecting through Remote Desktop and running on the command line. I am trying to load the file t1.tsv (with two tab-separated columns) from HDFS into the HBase table t1:
C:\apps\dist\hbase-0.98.0.2.1.5.0-2057-hadoop2\bin>hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,num t1 t1.tsv
and get:
ERROR: One or more columns in addition to the row key and timestamp(optional) are required
Usage: importtsv -Dimporttsv.columns=a,b,c
Replacing the order of the specified columns with num,HBASE_ROW_KEY:
C:\apps\dist\hbase-0.98.0.2.1.5.0-2057-hadoop2\bin>hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=num,HBASE_ROW_KEY t1 t1.tsv
I get:
ERROR: Must specify exactly one column as HBASE_ROW_KEY
Usage: importtsv -Dimporttsv.columns=a,b,c
This tells me that either the comma separator in the column list is not recognized or the column name is incorrect. I also tried to use the column with a qualifier, as num:v, and as 'num' - nothing helps.
Any ideas what could be wrong here? Thanks.
>hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,d:c1,d:c2" testtable /example/inputfile.txt
This works for me. I think there are some differences between the Linux and Windows terminals, so on Windows you need to add quotation marks to make clear that the column list is a single value string; otherwise it might not be recognized.

Can I import SAP tables that were exported by SE16?

I have exported the contents of a table with transaction SE16, by selecting all the entries and choosing Download, unconverted.
I'd like to import these entries into another system (where the same table exists and is active).
Furthermore, when I import, there's a possibility that the specific key already exists for a number of entries (old entries).
Other entries won't have a field with the same key present in the table where they're to be imported (new entries).
Is there a way to easily update my table in the second system with the file provided from the first system? If needed, I can export the data in the 3 other format types (Spreadsheet, Rich text format and HTML format). It seems to me, though, that the spreadsheet and rich text formats sometimes corrupt the data, and the HTML is far too verbose.
[EDIT]
As per popular demand, the table I'm trying to export/import is a Z table whose fields are all numeric, character, date or time fields (flat data types).
I'm trying to do it like this because the clients don't have any Basis resources to help them with transports, and they would like to more or less automate the process of updating one of the tables in one system.
At the moment it's a business request to do it like this, but I'm open to suggestions (and the clients are open too).
Edit
OK, I doubt that what you describe in your comment exists out of the box, but you can easily write something like this:
Create a method (or function module if that floats your boat) that accepts the following:
iv_table_name TYPE string and
iv_filename TYPE string
This would be the method:
method upload_table.
  data: lt_table type ref to data,
        lx_root  type ref to cx_root.
  field-symbols: <table> type standard table.

  try.
      " Create an internal table whose line type matches the requested DB table.
      create data lt_table type table of (iv_table_name).
      assign lt_table->* to <table>.

      " Upload the tab-delimited file from the frontend into the internal table.
      call method cl_gui_frontend_services=>gui_upload
        exporting
          filename            = iv_filename
          has_field_separator = abap_true
        changing
          data_tab            = <table>
        exceptions
          others              = 4.
      if sy-subrc <> 0.
        " Some appropriate error handling, e.g.:
        " message id sy-msgid type 'I'
        "   number sy-msgno
        "   with sy-msgv1 sy-msgv2
        "        sy-msgv3 sy-msgv4.
        return.
      endif.

      " Insert new rows and overwrite existing ones in the target table.
      modify (iv_table_name) from table <table>.
      " write: / sy-tabix, ' entries updated'.
    catch cx_root into lx_root.
      " lv_text = lx_root->get_text( ).
      " some appropriate error handling
      return.
  endtry.
endmethod.
This would still require that you make sure that the exported file matches the table that you want to import. However cl_gui_frontend_services=>gui_upload should return sy-subrc > 0 in that case, so you can bail out before you corrupt any data.
Original Answer:
I'll assume that you want to update a z-table and not a SAP standard table.
You will probably have to format your datafile a little bit to make it tab or comma delimited.
You can then upload the data file using cl_gui_frontend_services=>gui_upload
Then if you want to overwrite the existing data in the table you can use
modify zmydbtab from table it_importeddata.
If you do not want to overwrite existing entries you can use:
insert zmydbtab from table it_importeddata.
You will get a return code of sy-subrc = 4 if any of the keys already exists, but any new entries will be inserted.
Note
There are many reasons why you would NOT do this for an SAP standard table. Most prominent is that there is almost always more to the data model than what we are aware of. Also, when creating transactional data there are often follow-on events or workflows that kick off, which will not happen if you're updating the database directly. As a rule of thumb, it is usually a bad idea to update SAP standard tables directly.
In that case try to find a BADI, or if that's not available, record a BDC and do the updates that way.
If the system landscape were set up correctly, your client would not need any kind of Basis operations support whatsoever to perform the transports. So instead of re-inventing the wheel, I'd strongly suggest catching up on what the CTS and TMS can do once they're set up with sensible settings.

How to use Relational Stores with a position based data file?

I have different data files that are mapped onto relational stores. I have a formatter which contains the separators used by the different data files (most of them CSV). Here is an example of what one looks like:
DQKI 435741198746445 45879645422727JHUFHGLOBAL COLLATERAL SERVICES AGGREGATOR V9
The rule to read this file is as follows: from index 0 to 3 it's the code name, from index 8 to 11 it's the PID, from index 11 to 20 it's the account number, and so on...
How do you specify such a rule in ActivePivot Relational Stores?
The Relational Store of ActivePivot ships with a high-performance, multithreaded CSV source to parse files and load them into data stores. I suppose that's what you hope to use for your fixed-length-field file.
But this is not supported in the current version of the Relational Store (1.5.x).
You could pre-process your file with a small script to add a separator character at the end of each of the fields. Then the entire CSV Source can be reused immediately.
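A minimal sketch of such a pre-processing script, assuming Python; the offsets follow the ranges described in the question (adjust the boundaries and add the remaining fields to match the real layout), and the file names are hypothetical:

import csv

# (start, end) slice offsets per field, based on the ranges in the question:
# code name, PID, account number. Extend/adjust for the real record layout.
FIELDS = [(0, 4), (8, 11), (11, 20)]

with open("positions.txt") as fin, open("positions.csv", "w", newline="") as fout:
    writer = csv.writer(fout)
    for line in fin:
        writer.writerow(line[start:end].strip() for start, end in FIELDS)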
You could write your own data source that defines fields as offsets in the text line. If you do that you can reuse all of the fast field parsers available in the CSV Source project (they work on any char sequences):
com.quartetfs.fwk.format.impl.DoubleParser
com.quartetfs.fwk.format.impl.FloatParser
com.quartetfs.fwk.format.impl.DoubleVectorParser
com.quartetfs.fwk.format.impl.FloatVectorParser
com.quartetfs.fwk.format.impl.IntegerParser
com.quartetfs.fwk.format.impl.IntegerVectorParser
com.quartetfs.fwk.format.impl.LongParser
com.quartetfs.fwk.format.impl.ShortParser
com.quartetfs.fwk.format.impl.StringParser
com.quartetfs.fwk.format.impl.DateParser
