I have this model:
class TestopiaEvent(Model):
    event_id = AutoField(primary_key=True)
    name = CharField(max_length=255)
    start_date = DateField()
    end_date = DateField()
    testers_required = IntegerField()

    class Meta:
        constraints = [
            CheckConstraint(
                check=Q(start_date__lte=F('end_date'), start_date__gte=datetime.now().date()),
                name='correct_datetime'
            )
        ]
And this test:
class TestopiaEventTestCase(TestCase):
    def setUp(self):
        self.default_values = {
            'name': 'Testopia 1',
            'start_date': datetime.now().date(),
            'end_date': datetime.now().date() + timedelta(days=1),
            'testers_required': 1
        }
        self.testopia_event = TestopiaEvent(**self.default_values)

    def test_save_with_valid_model_check_database(self):
        self.assertIsNone(self.testopia_event.save())
And it fails with this error:
django.db.utils.IntegrityError: new row for relation "webserver_testopiaevent" violates check constraint "correct_datetime"
DETAIL: Failing row contains (1, Testopia 1, 2020-07-24 00:00:00+00, 2020-07-25 00:00:00+00, 1).
I don't understand why it is failing: it should only fail if the start date is earlier than today's date and/or the start date is later than the end date, neither of which is the case here.
What have I done wrong? Thanks
Edit: Here are the postgresdb constraints:
testopia=# \d+ webserver_testopiaevent
Table "public.webserver_testopiaevent"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------------+------------------------+-----------+----------+-----------------------------------------------------------+----------+--------------+-------------
event_id | integer | | not null | nextval('webserver_testopiaevent_event_id_seq'::regclass) | plain | |
name | character varying(255) | | not null | | extended | |
start_date | date | | not null | | plain | |
end_date | date | | not null | | plain | |
testers_required | integer | | not null | | plain | |
Indexes:
"webserver_testopiaevent_pkey" PRIMARY KEY, btree (event_id)
Check constraints:
"correct_datetime" CHECK (start_date >= statement_timestamp() AND start_date <= end_date)
Access method: heap
now() returns a timestamp rather than a plain date, so with the time-of-day component included it will always be greater than my DateField when the start date is set to today's date, which is why the check fails.
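One way around it, as a minimal sketch rather than a definitive fix (the constraint name and the clean() method below are my own additions, not from the original code): keep only the column-to-column comparison in the database constraint and enforce the "not in the past" rule in model validation, where today's date is evaluated in Python at validation time.
from datetime import date

from django.core.exceptions import ValidationError
from django.db.models import (AutoField, CharField, CheckConstraint, DateField,
                              F, IntegerField, Model, Q)


class TestopiaEvent(Model):
    event_id = AutoField(primary_key=True)
    name = CharField(max_length=255)
    start_date = DateField()
    end_date = DateField()
    testers_required = IntegerField()

    class Meta:
        constraints = [
            # Compare only columns here; a check involving "now" is evaluated
            # by the database at insert time, not when the model is defined.
            CheckConstraint(
                check=Q(start_date__lte=F('end_date')),
                name='start_before_end',
            ),
        ]

    def clean(self):
        # Application-level rule: the event may not start in the past.
        if self.start_date < date.today():
            raise ValidationError({'start_date': 'Start date cannot be in the past.'})
Note that save() does not call full_clean() automatically, so the test would need to call full_clean() explicitly (or go through a ModelForm) for the date rule to be checked.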
I am converting a Map column to multiple columns dynamically, based on the keys in the map. I am using the following code (taken mostly from here), and it works perfectly fine.
However, I would like to rename the column names that are programmatically generated.
Input df:
| map_col |
|:-------------------------------------------------------------------------------|
| {"customer_id":"c5","email":"abc#yahoo.com","mobile_number":"1234567890"} |
| null |
| {"customer_id":"c3","mobile_number":"2345678901","email":"xyz#gmail.com"} |
| {"email":"pqr#hotmail.com","customer_id":"c8","mobile_number":"3456789012"} |
| {"email":"mnk#GMAIL.COM"} |
Code to convert Map to Columns
keys_df = df.select(F.explode(F.map_keys(F.col("map_col")))).distinct()
keys = list(map(lambda row: row[0], keys_df.collect()))
key_cols = list(map(lambda f: F.col("map_col").getItem(f).alias(str(f)), keys))
final_cols = [F.col("*")] + key_cols
df = df.select(final_cols)
Output df:
| customer_id | mobile_number | email |
|:----------- |:--------------| :---------------|
| c5          | 1234567890    | abc@yahoo.com   |
| null        | null          | null            |
| c3          | 2345678901    | xyz@gmail.com   |
| c8          | 3456789012    | pqr@hotmail.com |
| null        | null          | mnk@GMAIL.COM   |
I already have the fields customer_id, mobile_number and email in the main dataframe, of which map_col is one of the columns. I get an error when I try to generate the output because the same column names already exist in the dataframe. Therefore, I need to rename the generated columns to customer_id_2, mobile_number_2, and email_2 before they are added to the dataframe. map_col may have more keys and values than shown.
Desired output:
| customer_id_2 | mobile_number_2 | email_2 |
|:------------- |:-----------------| :---------------|
| c5            | 1234567890      | abc@yahoo.com   |
| null          | null            | null            |
| c3            | 2345678901      | xyz@gmail.com   |
| c8            | 3456789012      | pqr@hotmail.com |
| null          | null            | mnk@GMAIL.COM   |
Add the following line just before the code which converts map to columns:
df = df.withColumn('map_col', F.expr("transform_keys(map_col, (k, v) -> concat(k, '_2'))"))
This uses transform_keys, which renames the map keys by appending _2 to the original name, as you needed.
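For context, a minimal sketch of how that line slots in before the existing map-to-columns code; df is the same DataFrame as in the question, and this assumes a Spark version whose SQL engine has the transform_keys higher-order function (2.4+, if I recall correctly):
from pyspark.sql import functions as F

# Rename every key in the map so the generated columns don't collide
# with the existing customer_id / mobile_number / email columns.
df = df.withColumn(
    'map_col',
    F.expr("transform_keys(map_col, (k, v) -> concat(k, '_2'))")
)

# Same map-to-columns logic as before, now producing *_2 column names.
keys_df = df.select(F.explode(F.map_keys(F.col("map_col")))).distinct()
keys = [row[0] for row in keys_df.collect()]
key_cols = [F.col("map_col").getItem(k).alias(str(k)) for k in keys]
df = df.select([F.col("*")] + key_cols)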
I'm trying to make an Entity using TypeORM in my NestJS app, and it's not working as I expected.
I have the following entity
@Entity('TableOne')
export class TableOneModel {
  @PrimaryGeneratedColumn()
  id: number

  @PrimaryColumn()
  tableTwoID: number

  @PrimaryColumn()
  tableThreeID: number

  @CreateDateColumn()
  createdAt?: Date
}
This code generates a migration that creates a table like the example below
+--------------+-------------+------+-----+----------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+----------------------+-------+
| id | int(11) | NO | | NULL | |
| tableTwoID | int(11) | NO | | NULL | |
| tableThreeID | int(11) | NO | | NULL | |
| createdAt | datetime(6) | NO | | CURRENT_TIMESTAMP(6) | |
+--------------+-------------+------+-----+----------------------+-------+
That's OK; the problem is that I want the table to allow only one row per combination of tableTwoID and tableThreeID. What should I use in the Entity to generate the table as I expect it to be?
Expected to not allow rows like the example below
+----+------------+--------------+----------------------------+
| id | tableTwoID | tableThreeID | createdAt |
+----+------------+--------------+----------------------------+
| 1 | 1 | 1 | 2019-10-30 19:27:43.054844 |
| 2 | 1 | 1 | 2019-10-30 19:27:43.819174 | <- should not allow the insert of this row
+----+------------+--------------+----------------------------+
Try marking the columns as unique with the @Unique() decorator, applied at the class level with the column names:
@Unique(['tableTwoID', 'tableThreeID'])
This is currently expected behavior from TypeORM. According to the documentation, if you have multiple @PrimaryColumn() decorators you create a composite key. Only the combination of the composite key columns must be unique (in your example above, '1' + '1' + '1' = '111' vs '2' + '1' + '1' = '211'). If you are looking to make each column unique on its own, in addition to being part of the composite primary key, you should be able to do something like @PrimaryColumn({ unique: true }).
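If the goal is simply one row per (tableTwoID, tableThreeID) pair, another option, sketched below under the assumption that id can stay the single generated primary key, is a class-level @Unique constraint on the two columns instead of making them primary columns:
import {
  Column,
  CreateDateColumn,
  Entity,
  PrimaryGeneratedColumn,
  Unique,
} from 'typeorm';

@Entity('TableOne')
@Unique(['tableTwoID', 'tableThreeID']) // only one row per (tableTwoID, tableThreeID) pair
export class TableOneModel {
  @PrimaryGeneratedColumn()
  id: number;

  @Column()
  tableTwoID: number;

  @Column()
  tableThreeID: number;

  @CreateDateColumn()
  createdAt?: Date;
}
The trade-off versus the composite primary key is that duplicate pairs now fail on a unique index rather than on the primary key, and id remains a simple surrogate key.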
I have a column Values in a dataframe where I am receiving a string input like below, where startIndex is the index at which each run of a character begins, endIndex is the index at which that run ends, and flag is the character itself.
+---+------------------+
| id| Values |
+---+------------------+
|01 | AABBBAA |
|02 | SSSAAAA |
+---+------------------+
Now I want to convert the string into a list of dictionaries for each row, as depicted below:
+---+--------------------+
| id| Values |
+---+--------------------+
|01 | [{"startIndex":0, |
| | "endIndex" : 1, |
| | "flag" : A }, |
| | {"startIndex":2, |
| | "endIndex" : 4, |
| | "flag" : B }, |
| | {"startIndex":5, |
| | "endIndex" : 6, |
| | "flag" : A }] |
|02 | [{"startIndex":0, |
| | "endIndex" : 2, |
| | "flag" : S }, |
| | {"startIndex":3, |
| | "endIndex" : 6, |
| | "flag" : A }] |
+---+--------------------+
I have the pseudocode to build the dictionaries, but I am not sure how to apply it to all the rows in one go without using loops. The problem with such an approach is that only the last dictionary built ends up in all the rows, overwriting the earlier ones.
import re

x = "aaabbbbccaa"
xs = re.findall(r"((.)\2*)", x)   # e.g. [('aaa', 'a'), ('bbbb', 'b'), ('cc', 'c'), ('aa', 'a')]
print(xs)

start = 0
output = ''
for item in xs:
    end = start + (len(item[0]) - 1)
    startIndex = start
    endIndex = end
    qualityFlag = item[1]
    print(startIndex, endIndex, qualityFlag)
    start = end + 1
Using udf() to wrap up the code logic and to_json() to convert the array of structs into a string:
from pyspark.sql.functions import udf, to_json
import re

df = spark.createDataFrame([
      ('01', 'AABBBAA')
    , ('02', 'SSSAAAA')
], ['id', 'Values'])

# argument `x` is a StringType() over the udf function
# return `row` as a list of dicts
@udf('array<struct<startIndex:long,endIndex:long,flag:string>>')
def set_fields(x):
    row = []
    for m in re.finditer(r'(.)\1*', x):
        row.append({
            'startIndex': m.start(),
            'endIndex': m.end() - 1,
            'flag': m.group(1)
        })
    return row

df.select('id', to_json(set_fields('Values')).alias('Values')).show(truncate=False)
+---+----------------------------------------------------------------------------------------------------------------------------+
|id |Values |
+---+----------------------------------------------------------------------------------------------------------------------------+
|01 |[{"startIndex":0,"endIndex":1,"flag":"A"},{"startIndex":2,"endIndex":4,"flag":"B"},{"startIndex":5,"endIndex":6,"flag":"A"}]|
|02 |[{"startIndex":0,"endIndex":2,"flag":"S"},{"startIndex":3,"endIndex":6,"flag":"A"}] |
+---+----------------------------------------------------------------------------------------------------------------------------+
I have a large table in Excel, which is the output of a data-gathering tool, that looks more or less like this:
DateA | ValueA | DateB | ValueB | ... | DateZ | ValueZ
---------------------------------------------------------------------------
2019-01-01 | 3 | 2019-01-01 | 6 | ... | 2019-01-04 | 7
2019-01-02 | 1 | 2019-01-04 | 2 | ... | 2019-01-05 | 3
And I'd like to process it so it would look like this:
Date | Value | Type
-----------------------------
2019-01-01 | 3 | A
2019-01-02 | 1 | A
2019-01-01 | 6 | B
2019-01-04 | 2 | B
...
2019-01-04 | 7 | Z
2019-01-05 | 3 | Z
Because this is the format that is used in our SQL database.
How can I do this in the least tedious way, preferably using Power Query? I'd like to avoid brute-force copying and pasting with a VBA loop.
The number of columns is fixed, but it would be nice to have the option to add another one later on. The number of rows, however, varies around some value (like 20, 21, 20, 22, 19, 20) from day to day.
Columns are harder to work with, so I'd first transform each column into a new row as a list.
ColumnsToRows =
    Table.FromColumns(
        {
            Table.ToColumns(Source),
            Table.ColumnNames(Source)
        },
        {"ColumnValues", "ColumnName"}
    )
This should give you a table as follows, where each list consists of the values in the corresponding column. For example, the top list is {1/1/2019, 1/2/2019}. (The Table.FromColumns part is there to add the ColumnName column alongside the lists.)
| ColumnValues | ColumnName |
|--------------|------------|
| [List] | DateA |
| [List] | ValueA |
| [List] | DateB |
| [List] | ValueB |
| [List] | DateZ |
| [List] | ValueZ |
We can then filter this based on the data type in each list. To get the date rows you can write:
DateRows =
    Table.SelectRows(
        ColumnsToRows,
        each Value.Type(List.First([ColumnValues])) = type date
    )
Which gets you the following filtered table:
| ColumnValues | ColumnName |
|--------------|------------|
| [List] | DateA |
| [List] | DateB |
| [List] | DateZ |
If you expand the first column with Table.ExpandListColumn(DateRows, "ColumnValues"), then you get
| ColumnValues | ColumnName |
|--------------|------------|
| 1/1/2019 | DateA |
| 1/2/2019 | DateA |
| 1/1/2019 | DateB |
| 1/4/2019 | DateB |
| 1/4/2019 | DateZ |
| 1/5/2019 | DateZ |
The logic for filtering and expanding the value rows is analogous.
ValueRows =
    Table.ExpandListColumn(
        Table.SelectRows(
            ColumnsToRows,
            each Value.Type(List.First([ColumnValues])) = type number
        ),
        "ColumnValues"
    )
Which gets you a similar looking table:
| ColumnValues | ColumnName |
|--------------|------------|
| 3 | ValueA |
| 1 | ValueA |
| 6 | ValueB |
| 2 | ValueB |
| 7 | ValueZ |
| 3 | ValueZ |
Now we just need to combine together the columns we want into a single table:
CombineColumns =
    Table.FromColumns(
        {
            DateRows[ColumnValues],
            ValueRows[ColumnValues],
            ValueRows[ColumnName]
        },
        {"Date", "Value", "Type"}
    )
and then extract the text following Value in the column names.
ExtractType =
    Table.TransformColumns(
        CombineColumns,
        {{"Type", each Text.AfterDelimiter(_, "Value"), type text}}
    )
The final table should be just as specified:
| Date | Value | Type |
|----------|-------|------|
| 1/1/2019 | 3 | A |
| 1/2/2019 | 1 | A |
| 1/1/2019 | 6 | B |
| 1/4/2019 | 2 | B |
| 1/4/2019 | 7 | Z |
| 1/5/2019 | 3 | Z |
All in a single query, the M code looks like this:
let
Source = <Source Goes Here>,
ColumnsToRows = Table.FromColumns({Table.ToColumns(Source), Table.ColumnNames(Source)}, {"ColumnValues","ColumnName"}),
DateRows = Table.ExpandListColumn(Table.SelectRows(ColumnsToRows, each Value.Type(List.First([ColumnValues])) = type date), "ColumnValues"),
ValueRows = Table.ExpandListColumn(Table.SelectRows(ColumnsToRows, each Value.Type(List.First([ColumnValues])) = type number), "ColumnValues"),
CombineColumns = Table.FromColumns({DateRows[ColumnValues], ValueRows[ColumnValues], ValueRows[ColumnName]}, {"Date", "Value", "Type"}),
ExtractType = Table.TransformColumns(CombineColumns, {{"Type", each Text.AfterDelimiter(_, "Value"), type text}})
in
ExtractType
I have the following table in the DB
+----------------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| VERSION | bigint(20) | NO | | NULL | |
| user_id | bigint(20) | NO | MUL | NULL | |
| measurement_id | bigint(20) | NO | MUL | NULL | |
| day | timestamp | NO | | NULL | |
| hour | tinyint(4) | NO | | NULL | |
| hour_timestamp | timestamp | NO | | NULL | |
| value | bigint(20) | NO | | NULL | |
+----------------+------------+------+-----+---------+----------------+
I'm trying to save a Spark dataframe that holds multiple rows with the following case class structure:
case class Record(val id : Int,
val VERSION : Int,
val user_id : Int,
val measurement_id : Int,
val day : Timestamp,
val hour : Int,
val hour_timestamp : Timestamp,
val value : Long )
When I try to save the dataframe to MySQL through the JDBC driver using:
dataFrame.insertIntoJDBC(...)
I get a primary key violation error:
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry '1' for key 'PRIMARY'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
I tried to set id=0 as the default value for all the rows and also tried to remove the id field from the case class; neither worked.
Can anyone help?
Thanks,
Tomer
Found it.
I had an SQL <-> Java column type issue.
According to https://www.cis.upenn.edu/~bcpierce/courses/629/jdkdocs/guide/jdbc/getstart/mapping.doc.html, bigint SQL columns should be represented as Long in Java.
After I changed my case class to:
case class Record(val id: Long,
val VERSION : Long,
val user_id : Long,
val measurement_id : Long,
val day : Timestamp,
val hour : Int,
val hour_timestamp : Timestamp,
val value : Long )
And after setting id=0 for all the records in the dataframe, it worked.
Thanks
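As a closing side note, and only a sketch: insertIntoJDBC comes from the old Spark 1.x API and was later removed; on newer Spark versions the equivalent write goes through DataFrameWriter.jdbc. The URL, credentials, table name, and recordsDF below are placeholders, not values from the original post. Dropping the auto-increment id column is an alternative to writing id = 0 and lets MySQL assign the key itself.
import java.util.Properties

import org.apache.spark.sql.SaveMode

// Placeholder connection details -- substitute your own.
val jdbcUrl = "jdbc:mysql://localhost:3306/mydb"
val props = new Properties()
props.setProperty("user", "dbuser")
props.setProperty("password", "dbpassword")

// recordsDF is the DataFrame built from the Record case class.
// Dropping `id` lets MySQL's auto_increment assign the primary key.
recordsDF
  .drop("id")
  .write
  .mode(SaveMode.Append)
  .jdbc(jdbcUrl, "measurements", props)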