I'm using Postgres 9.2, Python 2.7.3, psycopg2 2.5.1.
I have a table with one of the fields declared as 'some_field int[] NOT NULL', and I need to insert some data, so I'm doing something like this:
cursor.execute('INSERT INTO some_table (some_field) VALUES (%s)', ([1, 2, 3], ))
but I unexpectedly get the error 'DataError: missing "]" in array dimensions', because the resulting query becomes
INSERT INTO some_table (some_field) VALUES ('[1, 2, 3]')
instead of
INSERT INTO some_table (some_field) VALUES (ARRAY[1, 2, 3])
or
INSERT INTO some_table (some_field) VALUES ('{1, 2, 3}')
Am I missing something, or is it a psycopg2 error?
The first snippet of code is the correct one. To check the SQL generated by psycopg2, you can always use the mogrify() method:
>>> curs.mogrify('INSERT INTO some_table (some_field) VALUES (%s)', ([1, 2, 3], ))
'INSERT INTO some_table (some_field) VALUES (ARRAY[1, 2, 3])'
Then you can try the SQL using psql and look for errors. If you find that psycopg2 generates a query that can't be executed in psql, please report a bug.
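For completeness, here is a minimal end-to-end sketch (assuming the table from the question and a local database named test; adjust the connection string to your setup):
import psycopg2
# A sketch, assuming: CREATE TABLE some_table (some_field int[] NOT NULL);
conn = psycopg2.connect('dbname=test')
cur = conn.cursor()
# Passing a Python list as the parameter makes psycopg2 adapt it to a Postgres ARRAY.
cur.execute('INSERT INTO some_table (some_field) VALUES (%s)', ([1, 2, 3],))
# mogrify() shows the exact SQL that is sent to the server.
print(cur.mogrify('INSERT INTO some_table (some_field) VALUES (%s)', ([1, 2, 3],)))
conn.commit()
cur.close()
conn.close()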
I have a dataframe like the one below:
df.show(2,False)
col1
----------
[[1,2],[3,4]]
I want to add some static value to each inner array, like this:
col2
----------
[[1,2,"Value"],[3,4,"value]]
Please suggest a way to achieve this.
Explode the array, then use the concat function to add the value to each inner array, and finally use collect_list to recreate the nested array.
from pyspark.sql.functions import *
df.withColumn("spark_parti_id", spark_partition_id()).\
    withColumn("col2", explode(col("col1"))).\
    withColumn("col2", concat(col("col2"), array(lit(2)))).\
    groupBy("spark_parti_id").\
    agg(collect_list(col("col2")).alias("col2")).\
    drop("spark_parti_id").\
    show(10, False)
#+----------------------+
#|col2 |
#+----------------------+
#|[[1, 2, 2], [3, 4, 2]]|
#+----------------------+
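On Spark 2.4+, the transform higher-order function gives the same result without the explode/groupBy round trip; a minimal sketch, appending the same literal 2 as above:
from pyspark.sql.functions import expr
# Sketch for Spark 2.4+: append the literal to every inner array in place.
df.withColumn("col2", expr("transform(col1, x -> concat(x, array(2)))")).show(10, False)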
I have an RDD and I want to find distinct values for multiple columns.
Example:
Row(col1=a, col2=b, col3=1), Row(col1=b, col2=2, col3=10), Row(col1=a1, col2=4, col3=10)
I would like to have a map:
col1=[a,b,a1]
col2=[b,2,4]
col3=[1,10]
Can a dataframe help compute this faster or more simply?
Update:
My solution with RDD was:
def to_uniq_vals(row):
    return [(k, v) for k, v in row.items()]
rdd.flatMap(to_uniq_vals).distinct().collect()
Thanks
I hope I understand your question correctly.
You can try the following:
import org.apache.spark.sql.{functions => F}
val df = Seq(("a", 1, 1), ("b", 2, 10), ("a1", 4, 10)).toDF()
df.select(F.collect_set("_1"), F.collect_set("_2"), F.collect_set("_3")).show
Results:
+---------------+---------------+---------------+
|collect_set(_1)|collect_set(_2)|collect_set(_3)|
+---------------+---------------+---------------+
| [a1, b, a]| [1, 2, 4]| [1, 10]|
+---------------+---------------+---------------+
The code above should be more efficient than the proposed select-distinct column-by-column approach for several reasons:
Fewer worker-to-driver round trips.
De-duplication is done locally on each worker before the inter-worker de-duplication.
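Since the RDD snippet in the question looks like PySpark, here is a rough sketch of the same collect_set idea in Python (assuming a DataFrame df with columns col1, col2 and col3):
from pyspark.sql.functions import collect_set
# Sketch: one pass over the data, collecting the distinct values of every column.
row = df.select(
    collect_set("col1").alias("col1"),
    collect_set("col2").alias("col2"),
    collect_set("col3").alias("col3"),
).first()
print(row.asDict())  # {'col1': [...], 'col2': [...], 'col3': [...]}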
Hope it helps!
You can use dropDuplicates and then select the same columns. It might not be the most efficient way, but it is still a decent one:
df.dropDuplicates("col1","col2", .... "colN").select("col1","col2", .... "colN").toJSON
Note: this works well using Scala.
I'm trying to read data from a sqlite3 database using Python 3, and it looks as if it tries to be smart and converts columns that look like integers to the integer type. I don't want that (if I got it right, sqlite3 stores data as text no matter what anyway).
I've created the database as:
sqlite> create table t (id integer primary key, foo text, bar datetime);
sqlite> insert into t values (NULL, 1, 2);
sqlite> insert into t values (NULL, 1, 'fubar');
sqlite> select * from t;
1|1|2
2|1|fubar
and tried to read it using:
db = sqlite3.connect(dbfile)
cur = db.cursor()
cur.execute("SELECT * FROM t")
for l in cur:
    print(l)
db.close()
And getting output like:
(1, '1', 2)
(2, '1', 'fubar')
but I expected/wanted something like
('1', '1', '2')
('2', '1', 'fubar')
(definitely for the last column)
Try
for l in cur:
    print(tuple(str(x) for x in l))
SQLite stores values according to the type affinity of the column.
If you do not want numbers back, declare the column as TEXT instead of DATETIME (DATETIME has NUMERIC affinity, so values that look like numbers are stored as numbers).
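A small sketch of that point (assuming you can re-create the table): with bar declared as TEXT, the values come back as Python strings; only id stays an int because it is an INTEGER PRIMARY KEY.
import sqlite3
# Sketch: TEXT affinity makes sqlite3 return the stored values as strings.
db = sqlite3.connect(":memory:")
db.execute("create table t (id integer primary key, foo text, bar text)")
db.execute("insert into t values (NULL, 1, 2)")
db.execute("insert into t values (NULL, 1, 'fubar')")
for row in db.execute("select * from t"):
    print(row)
# (1, '1', '2')
# (2, '1', 'fubar')
db.close()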
How can I get a list of records by PK in a single query using bookshelf.js?
The end query should be equivalent to:
SELECT * FROM `user` WHERE `id` IN (1,3,5,6,7);
This solution works:
AdModel
.where('search_id', 'IN', [1, 2, 3])
.fetchAll();
You can also use the Knex QueryBuilder to get the same result:
AdModel
.query(qb => {
qb.whereIn('search_id', [1, 2, 3]);
})
.fetchAll();
A single list query that also fetches relations (withRelated):
BookshelfModel
.where('field_id', 'IN', [1, 2, 3])
.fetchAll({withRelated: ["tableA", "tableB"]});
I'm new to both technologies and I'm trying to do the following:
select * from mytable where column = "col1" or column="col2"
So far, the documentation says I should use the get method by using:
family.get('rowid')
But I do not have the row ID. How would I run the above query?
Thanks
In general I think you're mixing two ideas. The query you've written is in CQL, and Pycassa doesn't support CQL (at least to my knowledge).
However, regardless of the query interface you use, if you don't know the row key, you will have to create secondary indexes on the queried columns.
You can do just that in Pycassa; consider the following code fragment:
from pycassa.columnfamily import ColumnFamily
from pycassa.pool import ConnectionPool
from pycassa.index import *
from pycassa.system_manager import *
sys = SystemManager('192.168.56.110:9160')
try:
    sys.drop_keyspace('TestKeySpace')
except:
    pass
sys.create_keyspace('TestKeySpace', SIMPLE_STRATEGY, {'replication_factor': '1'})
sys.create_column_family('TestKeySpace', 'mycolumnfamily')
sys.alter_column('TestKeySpace', 'mycolumnfamily', 'column1', LONG_TYPE)
sys.alter_column('TestKeySpace', 'mycolumnfamily', 'column2', LONG_TYPE)
sys.create_index('TestKeySpace', 'mycolumnfamily', 'column1', value_type=LONG_TYPE, index_name='column1_index')
sys.create_index('TestKeySpace', 'mycolumnfamily', 'column2', value_type=LONG_TYPE, index_name='column2_index')
pool = ConnectionPool('TestKeySpace')
col_fam = ColumnFamily(pool, 'mycolumnfamily')
col_fam.insert('row_key0', {'column1': 10, 'column2': 20})
col_fam.insert('row_key1', {'column1': 20, 'column2': 20})
col_fam.insert('row_key2', {'column1': 30, 'column2': 20})
col_fam.insert('row_key3', {'column1': 10, 'column2': 20})
# OrderedDict([('column1', 10), ('column2', 20)])
print col_fam.get('row_key0')
## Find using index: http://pycassa.github.io/pycassa/api/pycassa/
column1_expr = create_index_expression('column1', 10)
column2_expr = create_index_expression('column2', 20)
clause = create_index_clause([column1_expr, column2_expr], count=20)
for key, columns in col_fam.get_indexed_slices(clause):
    print "Key => %s, column1 = %d, column2 = %d" % (key, columns['column1'], columns['column2'])
sys.close()
However, it may be worth considering whether you can design your data model so that you can query by row key directly.
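For example, if the row keys are known up front, pycassa's multiget() fetches several rows in one call (a sketch reusing col_fam from above):
# Sketch: fetch several known rows at once instead of querying by secondary index.
rows = col_fam.multiget(['row_key0', 'row_key1', 'row_key3'])
for key, columns in rows.items():
    print "Key => %s, columns = %s" % (key, columns)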