I am trying to load a Neo4j database from a CSV file using Python. I can connect and load the data without using UNWIND, but I want to use UNWIND for a faster load. I get the following error when I try it:
{code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '': expected whitespace, '.', node labels, '[', '^', '*', '/', '%', '+', '-', "=~", IN, STARTS, ENDS, CONTAINS, IS, '=', '~', "<>", "!=", '<', '>', "<=", ">=", AND, XOR, OR or AS (line 1, column 16 (offset: 15))
" UNWIND {data1} as node"
^}
I have the following code where I am using UNWIND. This is just a small sample I am testing with.
cq1=""" UNWIND {data1} as node
MERGE (n:Person {id : node.ID})
SET
n.gender = node.GENDER"""
list_dicts_data1 = [{"ID": '1',"GENDER": 'Male'}]
graph.run(cq1, data1 = list_dicts_data1)
Any suggestions what is causing this error and how do I fix it?
I can't read the error clearly due to bad formatting, but which version of Neo4j are you running? The parameter syntax changed a while ago from {var} to $var. That might be the cause.
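On a recent Neo4j version the same query would look roughly like this with $ parameters (an untested sketch, otherwise identical to your code):
cq1 = """UNWIND $data1 AS node
MERGE (n:Person {id: node.ID})
SET n.gender = node.GENDER"""

list_dicts_data1 = [{"ID": '1', "GENDER": 'Male'}]
graph.run(cq1, data1=list_dicts_data1)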
Also, have you considered using py2neo's bulk load API, which wraps this kind of statement for you?
https://py2neo.org/2021.0/bulk/index.html
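A rough, untested sketch of what that could look like (check the linked docs for the exact signatures; the connection details below are placeholders):
from py2neo import Graph
from py2neo.bulk import merge_nodes

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholder connection

# merge on (Person, id), similar to the MERGE in your query
data = [{"id": "1", "gender": "Male"}]
merge_nodes(graph.auto(), data, ("Person", "id"))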
I have created the following Cloud Logging query, which returns all logs of an Airflow DAG run. I want to save all of these logs into Cloud SQL using Python.
resource.type="composername"
severity=(ERROR OR WARNING OR INFO)
logName=("projects/projectname/logs/airflow-worker" OR "projects/projectname/logs/airflow-
scheduler")
labels.workflow:"DAG_NAME"
labels.task-id:"task1"
textPayload:"exception"
FILTER = "timestamp>=2022-05-16T14:26:50.943463+00:00 AND timestamp<2022-05-
16T14:27:21.368493+00:00 AND labels.workflow:ibis_secret"
entries = client.list_entries(filter_=FILTER, order_by=None)
for entry in entries:
    # print(type(entry))
    # print("##", entry)
    timestamp = entry.timestamp.isoformat()
    print('* {} -> {}'.format(timestamp, entry.payload))
    data = entry.payload
The error is:
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 Unparseable filter:
syntax error at line 1, column 26, token ':';
syntax error at line 1, column 64, token ':';
syntax error at line 1, column 65, token '00'
When I pass the timestamp with the time and timezone it gives this error, and when I remove the timestamp it works fine. I also need to know how to pass the Cloud Logging query in the filter so that I can get the logs for the respective DAG name in the Python code above. Please let me know if anyone has an idea or suggestion for this.
Enclose your timestamps in double quotes. Your FILTER should be:
FILTER = 'timestamp >= "2022-05-16T14:26:50.943463+00:00" AND timestamp < "2022-05-16T14:27:21.368493+00:00" AND labels.workflow:ibis_secret'
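To also pull only the logs for a specific DAG, you can append the rest of your log query to the same filter string. A rough, untested sketch (the resource, project, and label values are taken from your query and are placeholders):
from google.cloud import logging

client = logging.Client()

FILTER = (
    'resource.type="composername" '
    'AND logName=("projects/projectname/logs/airflow-worker" OR "projects/projectname/logs/airflow-scheduler") '
    'AND labels.workflow:"DAG_NAME" '
    'AND timestamp>="2022-05-16T14:26:50.943463+00:00" '
    'AND timestamp<"2022-05-16T14:27:21.368493+00:00"'
)

for entry in client.list_entries(filter_=FILTER, order_by=None):
    print('* {} -> {}'.format(entry.timestamp.isoformat(), entry.payload))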
I am trying to code a Binance buy function. However, the following code
from binance.enums import *
order = client.create_order(symbol='DOGEUSDT', side = 'BUY', type = 'MARKET', quantity = 475, timeInForce='GTC')
outputs APIError(code=-1121): Invalid symbol.
Also, for the same symbol,
print(client.get_symbol_info(symbol="DOGEUSDT"))
outputs None.
The symbol DOGEUSDT exists in the order book: https://api.binance.com/api/v3/ticker/bookTicker
I don't know why the same symbol I get from Binance is reported as invalid.
Are you using the testnet? I had the same error; it was resolved when I removed testnet=True from the client initialization.
Two things can happen:
Your client is pointing at the testnet API:
client.API_URL = 'https://testnet.binance.vision/api'
In that case change it to:
client.API_URL = 'https://api.binance.com/api'
Or you created the client with testnet enabled:
Client(api_key, secret_key, testnet=True)
In that case change it to testnet=False or delete the argument.
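For reference, a minimal sketch of a mainnet setup (untested; the API key and secret are placeholders, and timeInForce is normally only needed for limit orders, so it is omitted here):
from binance.client import Client
from binance.enums import SIDE_BUY, ORDER_TYPE_MARKET

client = Client("api_key", "secret_key")  # no testnet=True, so the live API is used

print(client.get_symbol_info(symbol="DOGEUSDT"))
order = client.create_order(symbol='DOGEUSDT', side=SIDE_BUY, type=ORDER_TYPE_MARKET, quantity=475)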
I am trying to create a virtual table in HANA based on a remote system table view.
If I run it at the command line using hdbsql
hdbsql H00=> create virtual table HanaIndexTable at "SYSRDL#CG_SOURCE"."<NULL>"."dbo"."sysiqvindex"
0 rows affected (overall time 305.661 msec; server time 215.870 msec)
I am able to select from HanaIndexTable and get results and see my index.
When I code it in python, I use the following command:
cursor.execute("""create virtual table HanaIndexTable1 at SYSRDL#CG_source.\<NULL\>.dbo.sysiqvindex""")
I think there is a problem with the NULL, but I also see in the output that the escape character is doubled.
self = <hdbcli.dbapi.Cursor object at 0x7f02d61f43d0>
operation = 'create virtual table HanaIndexTable1 at SYSRDL#CG_source.\\<NULL\\>.dbo.sysiqvindex'
parameters = None
def __execute(self, operation, parameters = None):
# parameters is already checked as None or Tuple type.
> ret = self.__cursor.execute(operation, parameters=parameters, scrollable=self._scrollable)
E hdbcli.dbapi.ProgrammingError: (257, 'sql syntax error: incorrect syntax near "\\": line 1 col 58 (at pos 58)')
/usr/local/lib/python3.7/site-packages/hdbcli/dbapi.py:69: ProgrammingError
I have tried to run the command without the <> but get the following error.
hdbcli.dbapi.ProgrammingError: (257, 'sql syntax error: incorrect syntax near "NULL": line 1 col 58 (at pos 58)')
I have tried upper case, lower case and escaping. Is what I am trying to do impossible?
There was an issue with capitalization between HANA and my remote source. I also needed more escaping rather than less.
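In case it helps someone else, the Python statement that mirrors the working hdbsql command would look roughly like this with hdbcli (the connection details are placeholders; note the quoted, upper-case remote source name):
from hdbcli import dbapi

conn = dbapi.connect(address="hana-host", port=30015, user="USER", password="PASSWORD")
cursor = conn.cursor()

# quote each identifier part, as in the interactive hdbsql session above
cursor.execute(
    'CREATE VIRTUAL TABLE HanaIndexTable1 '
    'AT "SYSRDL#CG_SOURCE"."<NULL>"."dbo"."sysiqvindex"'
)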
I am trying to insert rows into my database. Establishing a connection to the database is successful, but when I try to insert the rows I get a SQL error. The error appears to come from my variable network_number. I am running nested for loops to iterate through the network number range 1.1.1 - 254.254.254 and adding each unique IP to the database. The network number is built as a string, so should the column for the network number be VARCHAR or TEXT to allow for the full stops/periods? The desired output is to populate my database table with each network number. The SQL query is assigned to the variable sql_query.
def populate_ip_table(ip_ranges):
    network_numbers = ["", "", ""]
    information = "Populating the IP table..."
    total_ips = (len(ip_ranges) * 254**2)
    complete = 0
    for octet_one in ip_ranges:
        network_numbers[0] = str(octet_one)
        percentage_complete = round(100 / total_ips * complete, 2)
        information = f"{percentage_complete}% complete"
        output_information(information)
        for octet_two in range(1, 254 + 1):
            network_numbers[1] = str(octet_two)
            for octet_three in range(1, 254 + 1):
                network_numbers[2] = str(octet_three)
                network_number = ".".join(network_numbers)
                complete += 1
                sql_query = f"INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) VALUES ({network_number}, false, 0)"
                execute_sql_statement(sql_query)
    information = "100% complete"
    output_information(information)
Output
[ * ] Connecting to the PostgreSQL database...
[ * ] Connection successful
[ * ] Executing SQL statement
[ ! ] syntax error at or near ".50"
LINE 1: ...rd (ip, scanned_status, times_scanned) VALUES (1.1.50, false...
^
As stated by the Docs:
There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used
PostgreSQL docs
I think you need to use VARCHAR, due to the small, varying length of your IP string. text is effectively a varchar with no limit, but it may have problems related to indexing if you try to insert a record with a compressed size greater than 2712.
Actually, your problem is that you need to put extra single quotes around network_number so that the value is inserted as a string in PostgreSQL.
To prove this, try inserting {network_number} like this:
network_number = "'" + ".".join(network_numbers) + "'"
sql_query = f"INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) VALUES ({network_number}, false, 0)"
OR:
sql_query = f"INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) VALUES ('{network_number}', false, 0)"
You could also use the inet data type, which would save you this hassle.
As stated by the docs:
PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses. It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking and specialized operators and functions.
PostgreSQL: Network Address Types
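For example, with an inet (or varchar/text) column and a parameterised insert, assuming psycopg2 is your driver (an untested sketch; the connection string is a placeholder):
import psycopg2

conn = psycopg2.connect("dbname=scanner user=postgres")  # placeholder connection string
cur = conn.cursor()

network_number = "1.1.50"
# the driver quotes/adapts the value, so no manual single quotes are needed
cur.execute(
    "INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) VALUES (%s, false, 0)",
    (network_number,),
)
conn.commit()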
I have a log file and a Perl script that parses the log file. I want to send all the lines (about eight thousand lines) in one commit.
My script:
# Connect to the database.
my $dbh = DBI->connect(
"DBI:mysql:database=DB;host=>IP",
"hostname", 'password',
{'RaiseError' => 1,'AutoCommit'=> 0}
);
open (FILE, 'file.log');
while (<FILE>) {
($word1, $word2, $word3, $word4, $word5, $word6, $word7, $word8, $word9, $word10, $word11, $word12, $word13, $word14) = split(" ");
$word13 =~ s/[^\d.]//g;
if ($word2 eq "Feb") {
$word2 = "02"
}
print "'$word5-$word2-$word3 $word4', $word11, $word13 \n";
eval {
# we could use INSERT, but there would be duplicates, and here we are dealing with a unique table
my $sth = $dbh->prepare("INSERT INTO `test_query` (time, cstep, time_in_seconde) VALUES('$word5-$word2-$word3 $word4', $word11, $word13);");
#print $sth->rows . " rows found.\n";
#$sth->finish;
# do inserts, updates, deletes, queries here
#$sth->execute() or die "execution failed: $dbh->errstr()";
$sth->execute() or die "execution failed: $dbh->errstr()";
$dbh->commit();
};
### If something went wrong...
}
$dbh->disconnect();
thanks
For better performance, you want to simplify your code and move as much code as possible out of the loop:
prepare the statement outside the loop, using bind parameters: the statement is always the same, only the bind values change
commit outside the loop: this improves performance and also has the advantage of making your process atomic. Since all changes occur within the same database transaction, either all lines are processed (and committed), or, if a failure occurs on any line, no line at all is committed. While implementing this optimization, watch the resource usage on your database (this will typically require more space in the UNDO tablespace); if resources are not sufficient, either increase them or commit every Nth record (with N as high as possible)
avoid printing inside the loop unless you really need it (I commented that line out)
you are building the connection with the RaiseError attribute enabled, but then you ignore errors that can occur at execute. If this is really what you want, just disable the RaiseError attribute on the statement handle and remove the eval around execute
Other considerations in terms of coding practices:
always use strict and use warnings
use an array to store the parsed data instead of a list of scalars: it could make your code faster and will make it more readable
Code:
use strict;
use warnings;
use DBI;

# Connect to the database.
my $dbh = DBI->connect(
    "DBI:mysql:database=DB;host=>IP",
    "hostname", 'password',
    {'RaiseError' => 1, 'AutoCommit' => 0}
);

# prepare the insert statement
my $sth = $dbh->prepare("INSERT INTO `test_query` (time, cstep, time_in_seconde) VALUES(?, ?, ?)");
$sth->{RaiseError} = 0;

open (my $file, '<', 'file.log') or die "could not open : $!";

while (<$file>) {
    my @words = split / /;
    $words[12] =~ s/[^\d.]//g;
    if ($words[1] eq "Feb") {
        $words[1] = "02";
    }
    # print "'$words[4]-$words[1]-$words[2] $words[3]', $words[10], $words[12] \n";
    $sth->execute( "$words[4]-$words[1]-$words[2] $words[3]", $words[10], $words[12] );
}
$dbh->commit;
$dbh->disconnect;
The last solution, which would probably perform even faster than this one, is to use the DBI method execute_array to perform bulk database inserts. The ArrayTupleFetch attribute can be used to provide a code reference that DBI will invoke every time it is ready to perform the next INSERT: this code reference should read the next file line and return an array reference of values suitable for the INSERT. When the file is exhausted, the sub should return undef, which tells DBI that the bulk process is complete.
Code:
#!/usr/local/bin/perl
use strict;
use warnings;
use DBI;

# open the file
open (my $file, '<', 'file.log') or die "could not open : $!";

# connect to the database
my $dbh = DBI->connect("DBI:mysql:database=DB;host=ip", "hostname", 'password', {'RaiseError' => 1, 'AutoCommit' => 0});

# prepare the INSERT statement
my $sth = $dbh->prepare("INSERT INTO `test_query` (time, cstep, time_in_seconde) VALUES(?, ?, ?)");

# run bulk INSERTs
my $tuples = $sth->execute_array({
    ArrayTupleStatus => \my @tuple_status,
    ArrayTupleFetch  => sub {
        my $line = <$file>;
        return unless $line;
        my @words = split / /, $line;
        # ... do anything you like with the array, then ...
        return [ "$words[4]-$words[1]-$words[2] $words[3]", $words[10], $words[12] ];
    },
});

if ($tuples) {
    print "Successfully inserted $tuples records\n";
} else {
    # do something useful with @tuple_status, which contains the detailed results
}
$dbh->commit;
$dbh->disconnect;