postgres copy command, binary file - linux

I am using COPY to copy a field from a table to a file. This field holds a gzipped text file, so I use a binary copy.
The file is created; the only problem is that COPY adds a header and a trailer (?) to the file, which I don't need. Can this be changed? Is there a parameter that can make COPY write the field exactly as it is stored in the database?
If I manually delete the unwanted header, I can extract the file with zcat or gunzip.
I am doing something like this:
psql -d some_database -c \
"copy (select some_column from a_table where id=900) to stdout with BINARY;" > /tmp/tmp.gz
And then I want to do
gunzip /tmp/tmp.gz
Any ideas?

One possibility, which works although you may not like it:
psql -At -c "select encode(content, 'base64') from t where ..." | base64 -d
i.e. print the content as base64 and decode it. I think the reality is that psql is intended to produce readable output, and persuading it to disgorge raw binary data is intentionally difficult.
I suppose if you want it enough, you can write a small tool (a Perl/Python script) that connects to the database and prints the raw output directly.
The "WITH BINARY" option to COPY doesn't just produce simple binary output; it performs some encoding of its own, which is probably dubious to rely on.

Are you sure it's the best approach to store zipped text in the database as binary? According to the documentation, long text is compressed implicitly/automatically:
Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB.

I don't know a straightforward way... COPY has a binary format with a variable-length header, not very easy to "trim". Beyond that, PG is rather text-centric; I don't think there is a way to force raw (binary) output from a SELECT for a BYTEA field.
You could get a textual hexadecimal output and write yourself a little program (C, Perl or whatever) to convert it from, say, \x000102414243 to binary. Not difficult, but not straightforward (and the hex format appeared in PostgreSQL 9.0).
psql -t -q -c "select binaryfield from.. where ..." mydb | myhextobin > tmp.gz
BTW, Grzegorz's answer is very pertinent.
Added: not very clean nor foolproof, just in case someone finds it useful...
/* Expects a PostgreSQL hexadecimal string in "\x....." format and converts it to binary. */
/* Warning: no checks! It just ignores chars outside [0-9a-f]. */
#include <stdio.h>

int main() {
    int x, pos, v;
    char hex[3] = {0, 0, 0};
    pos = 0;
    while ((x = getchar()) != EOF) {
        if ((x >= '0' && x <= '9') || (x >= 'a' && x <= 'f')) {
            hex[pos++] = (char)x;
            if (pos == 2) {          /* got two hex digits: emit one byte */
                sscanf(hex, "%x", &v);
                putchar((char)v);
                pos = 0;
            }
        }
    }
    return pos == 0 ? 0 : 1;         /* nonzero if an odd number of hex digits was seen */
}

The copy command does the job; you only need to pass --no-align and --tuples-only.
For compression, use gzip between psql and the file:
psql --tuples-only --no-align -d some_database -c \
"copy (select some_column from a_table where id=900) to stdout with BINARY;" | gzip > /tmp/tmp.gz

Attempting to decode the PostgreSQL binary COPY format yourself is not recommended. Just because the test file you are using works doesn't mean everything will work; for instance, certain byte sequences (not appearing in your test file) might get escaped.

You may find it easier to do this using a language that has client drivers and the ability to read the bytea type: PHP, Python, Ruby, Perl, JavaScript, Java, etc. Just perform your query there, use the gzip libraries that probably already exist in that language, and write out the file.
Alternatively, you could use a procedural language inside the database and create a stored procedure. You would pass the requested filename to the stored procedure.
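For instance, a rough Python sketch of the client-driver route (again assuming psycopg2 and the names from the question; the output path is just an example, and this is illustrative rather than tested against your schema):
# sketch: fetch the gzipped bytea, decompress it in Python, write the plain text out
import gzip
import psycopg2

conn = psycopg2.connect("dbname=some_database")
cur = conn.cursor()
cur.execute("SELECT some_column FROM a_table WHERE id = %s", (900,))
compressed = bytes(cur.fetchone()[0])   # the bytea value, already gzip data
with open('/tmp/tmp.txt', 'wb') as out:
    out.write(gzip.decompress(compressed))
conn.close()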

Related

Name mapping does not work when exporting a shapefile from Postgres using the pgsql2shp command

I am exporting a shapefile from a PostgreSQL database using the pgsql2shp tool from the PostGIS package on Linux (using WSL on Windows). I am specifying a mappings file with the -m parameter. The problem is that the mapping does not seem to work. For example, this is the command I am executing:
#dump_data.sh
pgsql2shp -kf shapefile.shp -m mappings.txt -h host_ip \
-u shema_name -P password db_name \
"SELECT * FROM table_name limit 10"
and this is the mapping.txt file I am using:
#mapping.txt
play PlayDesign \n
play_section AcrIdent \n
In the resulting shapefile, the first mapping succeeded but not the second one: the column is named "play_secti" when it should be AcrIdent.
I suspect this has something to do with the EOL (end-of-line) scheme. I have set Visual Studio Code to CRLF for the mapping.txt file.
Note: ideally I would like to specify a name longer than 10 characters for the resulting column, but that seems to be a limitation of shapefiles.
Oddly enough, it seems that only the first 10 characters of the PostgreSQL field names are compared against the first column. "play" works because it is less than 10 characters; "play_section" is not recognized because it is compared against "play_secti".
Even more oddly, the same mapping does work for shp2pgsql.
I think it might be considered a bug. The workaround, anyway, is to write only the first 10 characters of the field's name in the first column (if that doesn't collide with another field's name).
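For example, a mapping.txt written with that workaround (the PostgreSQL name truncated to its first 10 characters in the first column) would look something like:
play PlayDesign
play_secti AcrIdent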
I think this is the point:
diff --git a/loader/pgsql2shp-core.c b/loader/pgsql2shp-core.c
index 1596dc206..06226e951 100644
--- a/loader/pgsql2shp-core.c
+++ b/loader/pgsql2shp-core.c
@@ -1536,7 +1536,7 @@ ShpDumperOpenTable(SHPDUMPERSTATE *state)
* use this to create the dbf field name from
* the PostgreSQL column name */
{
- const char *mapped = colmap_dbf_by_pg(&state->column_map, dbffieldname);
+ const char *mapped = colmap_dbf_by_pg(&state->column_map, pgfieldname);
if (mapped)
{
strncpy(dbffieldname, mapped, 10);
The prototype of colmap_dbf_by_pg is:
const char *colmap_dbf_by_pg(colmap *map, const char *pgname);
I understand it should be called with the pg field's name, but it is called instead with the dbf field's name.
After applying this change, it behaves as expected.

Storing oracle query results into bash variable

declare -a result=`$ORACLE_HOME/bin/sqlplus -silent $DBUSER/$DBPASSWORD#$DB << EOF $SQLPLUSOPTIONS $roam_query exit; EOF`
I am trying to pull data from an Oracle database and populate a bash variable. The select query works; however, it returns multiple rows, and those rows come back as one long continuous string. I want to capture each row from the database in its own array index, for example:
index[0] = row 1 information
index[1] = row 2 information
Please help, all suggestions are appreciated. I checked all the documentation without any luck. Thank you. I am using Solaris Unix.
If you have bash version 4, you can use the readarray -t command to do this. Any vaguely recent linux should have bash v4, but I don't know about Solaris.
BTW, I'd also recommend putting double-quotes around variable references (e.g. "$DBUSER/$DBPASSWORD#$DB" instead of just $DBUSER/$DBPASSWORD#$DB) (except in here-documents), using $( ) instead of backticks, and using lower- or mixed-case variable names (there are a bunch of all-caps names with special meanings, and if you use one of those by accident, weird things can happen).
I'm not sure I have the here-document (the SQL commands) right, but here's roughly how I'd do it:
readarray -t result < <("$oracle_home/bin/sqlplus" -silent "$dbuser/$dbpassword#$db" << EOF
$sqlplusoptions $roam_query
exit;
EOF
)

Python3 - How to write a number to a file using a variable and sum it with the current number in the file

Suppose I have a file named test.txt that currently has the number 6 in it. I want to use a variable such as x=4, add the two numbers together, and save the result back in the file.
var1 = 4.0
f=open(test.txt)
balancedata = f.read()
newbalance = float(balancedata) + float(var1)
f.write(newbalance)
print(newbalance)
f.close()
It's probably simpler than you're trying to make it:
variable = 4.0
with open('test.txt') as input_handle:
    balance = float(input_handle.read()) + variable
with open('test.txt', 'w') as output_handle:
    print(balance, file=output_handle)
Make sure 'test.txt' exists before you run this code and has a number in it, e.g. 0.0 -- you can also modify the code to deal with creating the file in the first place if it's not already there.
Files only read and write strings (or bytes for files opened in binary mode). You need to convert your float to a string before you can write it to your file.
Probably str(newbalance) is what you want, though you could customize how it appears using format if you want. For instance, you could round the number to two decimal places using format(newbalance, '.2f').
Also note that you can't write to a file opened only for reading, so you probably need to either use mode 'r+' (which allows both reading and writing) combined with a f.seek(0) call (and maybe f.truncate() if the length of the new numeric string might be shorter than the old length), or close the file and reopen it in 'w' mode (which will truncate the file for you).
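A quick sketch of the 'r+' variant, just to illustrate (not meant as the definitive solution):
var1 = 4.0
with open('test.txt', 'r+') as f:
    newbalance = float(f.read()) + var1
    f.seek(0)                 # rewind to overwrite from the start
    f.write(str(newbalance))  # files take strings, not floats
    f.truncate()              # drop leftover characters if the new value is shorter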

Bash script key/value pair regardless of bash version

I am writing a curl bash script to test web services. I will have file_1, which will contain the URL paths:
/path/to/url/1/{dynamic_path}.xml
/path/to/url/2/list.xml?{query_param}
Since the values between {} are dynamic, I am creating a separate file which will hold the values for these params. The input will be key-value pairs, i.e.,
dynamic_path=123
query_param=shipment
By combining the two files, the result should become
/path/to/url/1/123.xml
/path/to/url/2/list.xml?shipment
This is the background of my problem. Now, my questions:
I am doing it in a bash script, and the approach I am using is to first read the file with the parameters, parse it on '=', and store it as key/value pairs, so the replacement is easy; i.e., for each URL I find the substring between {} and use that text as the key to fetch the value from the array.
My approach sounds okay (at least to me), BUT I just realized that
declare -A input_map is only supported in bash 4.0 and higher. Now, I am not 100% sure what the target environment for my script will be, since it could run in multiple departments.
Is there anything better you could suggest? Any other approach? Any other design?
P.S.:
This is the first time I am working on bash scripts.
Here's a risky way to do it, assuming the values are in a file named "values":
. values
eval "$( sed 's/^/echo "/; s/{/${/; s/$/"/' file_1 )"
Basically, stick a dollar sign in front of the braces and transform each line into an echo statement.
More effort, with awk:
awk '
NR==FNR {split($0, a, /=/); v[a[1]]=a[2]; next}
(i=index($0, "{")) && (j=index($0,"}")) {
key=substr($0,i+1, j-i-1)
print substr($0, 1, i-1) v[key] substr($0, j+1)
}
' values file_1
There are many ways to do this. You seem to be thinking of putting all the inputs in a hashmap and then iterating over it. In shell scripting it's more common and practical to process things as a stream, using pipelines.
For example, your inputs could be in a csv file:
123,shipment
345,order
Then you could process this file like this:
while IFS=, read path param; do
    sed -e "s/{dynamic_path}/$path/" -e "s/{query_param}/$param/" file_1
done < input.csv
The output will be:
/path/to/url/1/123.xml
/path/to/url/2/list.xml?shipment
/path/to/url/1/345.xml
/path/to/url/2/list.xml?order
But this is just an example, there can be so many other ways.
You should definitely start by writing a proof of concept and test it on your deployment server. This example should work in old versions of bash too.
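And one more sketch along those lines (untested, assuming the key=value file is named values and the URL file is file_1 as in the question): a small lookup function that greps the file avoids declare -A entirely, so it should also work on bash 3.
# print the value for a given key from the key=value file
lookup() {
    grep "^$1=" values | head -n1 | cut -d= -f2-
}
# assumes each URL line contains exactly one {placeholder}
while IFS= read -r url; do
    key=${url#*\{}; key=${key%%\}*}            # text between { and }
    echo "${url/\{$key\}/$(lookup "$key")}"    # substitute the value back in
done < file_1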

Drupal 6 db_query files table

As part of moving from a Windows server to a Linux server, I have to clean up a large number of filenames.
My problem is that when I execute:
db_query("UPDATE {files} SET filename = '%s' AND filepath = '%s' WHERE fid = %d", $file->filename, $file->filepath, $file->fid);
and afterwards select the content for $file->fid, the filename field has the value "0".
If I dump the query as text both before and after it is executed, the filename field contains the filename I have specified, whereas the filepath is being stored correctly.
DAMN! Putting an AND into an UPDATE query will not produce the expected result: MySQL parses SET filename = '%s' AND filepath = '%s' as SET filename = ('%s' AND filepath = '%s'), a single boolean expression assigned to filename, which is why it ends up as 0. MySQL allows this, but it's not the way to go :)
Use a comma instead of AND.
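i.e. the query from the question becomes:
db_query("UPDATE {files} SET filename = '%s', filepath = '%s' WHERE fid = %d", $file->filename, $file->filepath, $file->fid);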
You might also want to look into using drupal_write_record() instead of db_query(). drupal_write_record() will automatically update a pre-existing row if you add the third parameter with a key to check; in your case, you could use the file id.
