Please suggest an optimised Django update Query on Bulk Data - python-3.x

So, I have a list of dictionaries like this:
dnc_info = [{'website': 'www.mdn.com', 'name': 'shubham', 'company_name': 'mdn'}, {'website': 'google.com', 'name': 'ketan', 'company_name': 'google'}, {'website': 'http://microsoft.com', 'name': 'somename', 'company_name': 'microsoft'}, {'website': None, 'name': 'somename2', 'company_name': None}, ...]  # up to 10,000 dicts
Now, I have a database (PostgreSQL) table which contains the following fields:
+--------------+-------------+--------------------+-------------+------------+
| company_name | website     | email              | campaign_id | color_code |
+--------------+-------------+--------------------+-------------+------------+
| google       | google.com  | shubham@google.com | 50          | #FFFFFF    |
| mdn          | www.mdn.com | some@mdn.com       | 50          | #FFFFFF    |
+--------------+-------------+--------------------+-------------+------------+
(up to 20,000 rows)
Now what I want is to update the color_code field of the above table from dnc_info on the basis of the following conditions:
Condition 1: The table's company name should match the dnc_info company name, ignoring case.
Condition 2: Only the domain of the table's website should match the domain of the dnc_info website, ignoring case.
Condition 3: The table's email domain should match the dnc_info website's domain, also ignoring case.
Condition 4: The table's email should match the dnc_info email, also ignoring case.
I'm able to create separate lists for every key in dnc_info, like this:
website = ['mdn.com', 'google.com', 'microsoft.com']
email = ['shubham@mdn.com', 'someone@google.com']
Please suggest an optimised model query based on the above conditions that will update the column color_code in the table.

Instead of using the ORM, I used raw_query() and it worked for me.
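Whichever way the update itself is issued, the matching conditions in the question depend on comparing normalised, lower-cased domains. Below is a minimal Python sketch of that normalisation step only. The helper names (website_domain, email_domain, build_lookups, row_matches) are hypothetical, and the sketch assumes that a row qualifies when any one of the company-name or domain conditions holds, that a leading "www." should be ignored, and that emails are ordinary addresses containing "@". None of this is confirmed by the question.

```python
from urllib.parse import urlparse


def website_domain(website):
    """Return the lower-cased domain of a website string, or None if empty."""
    if not website:
        return None
    value = website.strip().lower()
    # urlparse only recognises the host when a scheme or leading "//" is present.
    if "://" not in value:
        value = "//" + value
    host = urlparse(value).hostname
    # Assumption: a leading "www." is noise, so www.mdn.com matches mdn.com.
    if host and host.startswith("www."):
        host = host[4:]
    return host or None


def email_domain(email):
    """Return the lower-cased part after the last '@', or None."""
    if not email or "@" not in email:
        return None
    return email.rsplit("@", 1)[1].strip().lower() or None


def build_lookups(dnc_info):
    """Collect the normalised company names and website domains to match on."""
    companies = {d["company_name"].strip().lower()
                 for d in dnc_info if d.get("company_name")}
    domains = {website_domain(d.get("website")) for d in dnc_info}
    domains.discard(None)
    return companies, domains


def row_matches(row, companies, domains):
    """True if a table row satisfies any company-name or domain condition."""
    company = (row.get("company_name") or "").strip().lower()
    return (
        company in companies
        or website_domain(row.get("website")) in domains
        or email_domain(row.get("email")) in domains
    )
```

With the two sets built once, each of the 20,000 rows is checked with constant-time set lookups instead of being compared against all 10,000 dictionaries. The primary keys of the matching rows could then be collected and updated in one statement, for example with Django's filter(pk__in=...).update(color_code=...).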


How can I combine duplicates of 1 column, then have multiple results in the same row of another column?

I am very new to KQL, and I am stuck on this query. I want the query to display which users have had sign-ins from different states. I created this query, but I do not know how to count the results in the column "names".
SigninLogs
| project tostring(LocationDetails.state), UserDisplayName
| extend p =pack( 'Locations', LocationDetails_state)
| summarize names = make_set(p) by UserDisplayName
This generates a column "names" with a row like so:
[{"Locations":"Arkansas"},{"Locations":"Iowa"},{"Locations":""}]
Here is a simple query that grabs all sign-ins from users and another column with the locations.
SigninLogs
| where ResultType == "0"
| summarize by UserDisplayName, tostring(LocationDetails.state)
Is there a way to combine the duplicates in the users column, and then display each location in the second column? If so, could I count the locations in order to keep only the users with more than one location?
I am looking to have query display which users have had sign-ins from different states
Assuming I understood your question correctly, this could work (using array_length()):
SigninLogs
| project State = tostring(LocationDetails.state), UserDisplayName
| summarize States = make_set(State) by UserDisplayName
| where array_length(States) > 1 // filter users who had sign-ins from *more than 1 state*
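
For readers less familiar with KQL, the logic of that query (collect the distinct states per user, then keep users with more than one) can be sketched in plain Python. The function name and the sample rows are hypothetical. In this sketch an empty state string counts as a distinct value, so filter it out first if that is not wanted.

```python
from collections import defaultdict


def users_with_multiple_states(sign_ins):
    """Group distinct states per user and keep users seen in more than one."""
    states_by_user = defaultdict(set)
    for user, state in sign_ins:
        # A set drops duplicate states for the same user.
        states_by_user[user].add(state)
    # Keep only users whose set holds more than one distinct state.
    return {user: states
            for user, states in states_by_user.items()
            if len(states) > 1}


# Hypothetical sample rows: (UserDisplayName, State)
rows = [("ann", "Iowa"), ("ann", "Arkansas"), ("ann", "Iowa"),
        ("bob", "Texas"), ("bob", "Texas")]
result = users_with_multiple_states(rows)
```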

Dual criteria data validation in Excel

Unlike the other questions posted with this topic, my criteria are not simple comparators. I want a dropdown list that includes all values in one named table, excluding those values that meet another criterion. For instance, a table includes employee names in one column and vacation dates in another column. I want the data validation to allow a list of employees who are not on vacation for a variable date drawn from another cell. The general method seems to be to create additional tables where the secondary criterion (in this case the date) is the column header, populated by items from the first list that satisfy it. It seems impractical to create 365 tables named for each date and populated by rows of employees from the first table that have not requested that date off. Is there another way to accomplish this?
Sample Data:
| Employee| Vacation Dates | | work on 1/26/20 |
_____________________________ ___________________
| Bob | 1/26/20, 1/27/20| | <allow only |
| Mike | 2/20/20, 2/21/20| | Mike or Cindy> |
| Cindy | 2/20/20, 1/28/20|
Had to transpose my thinking. Rather than a table for each date, I can have a vacation table for each employee. The validation formula has to be a custom validation rather than a list, so no drop down selection list is available, but it will work. Error message also cannot discriminate which criteria is being violated -- name not on employee list versus name from employee list who is on vacation. Would be great if validation worked like conditional formatting with different rules applied in sequence.
| Employee| Bob | Mike | Cindy | | 1/26/20 |
____________________________________| ___________
| Bob |1/26/20| 2/20/20 |2/20/20| | |
| Mike |1/27/20| 2/21/20 |1/28/20| | |
| Cindy |
The validation formula for the "1/26/20" column (F in the scheme above) would be
=AND(COUNTIF($A$2:$A$4,F2)>0,COUNTIF(INDIRECT(ADDRESS(2,MATCH(F2,$B$1:$D$1,0)+1)):INDIRECT(ADDRESS(3,MATCH(G2,$B$1:$D$1,0)+1)),F1)<1)

dynamoDB: How to query N elements starting at X index

I'm looking for code examples using python3, not links to the documentation. I haven't found examples in the documentation.
I'm looking to query 2 elements with the category "red" starting at the ID 1.
This is my table:
| ID | category | description |
| 0 | red | .... |
| 1 | red | .... |
| 2 | blue | .... |
| 3 | red | .... |
| 4 | red | .... |
The query should return the elements with the id 1 and 3.
Looking forward to reading your examples. Thanks in advance.
In DynamoDB you query over your partition key, LSIs or GSIs.
In your case I would create a GSI with its partition key (gsiID) set to your category and its sort key (gsiSK) set to your ID.
In that case you can do a query like this: query all elements with gsiID = red and gsiSK = *
This will give you all the reds sorted by their ID in ascending order (you can also specify descending order).
DynamoDB queries also have an option to limit the result. Since you need two items, you can set limit = 2.
I hope this helps!
You need to define a global secondary index in which the partition key is category and the sort key is ID.
Once you have that index defined, you can query it as follows (I am using the JS notation, sorry):
{
  TableName: 'your_table_name',
  IndexName: 'your_index_name',
  KeyConditionExpression: 'category = :x and ID >= :y',
  ExpressionAttributeValues: {
    ':x': 'red',
    ':y': 1
  }
}
Note that this is a query. In DynamoDB, queries work on "chunks" of items (aka: "pages"). Specifically, when executing a query, DDB takes a chunk, finds all matching items in that chunk and returns them. If there are other matching items in other chunks they will not be returned. However, the response will provide you with details of the next chunk so that you can issue a subsequent query on the next chunk. These "details" are encapsulated in the LastEvaluatedKey field of the response and they should be copied into the ExclusiveStartKey of the subsequent request.
You can check this guide to see an example of using LastEvaluatedKey. Look for the following line:
while 'LastEvaluatedKey' in response:
Important!
Although you want to get just two items, you do not want to set the Limit field to 2. Setting it to 2 means that DynamoDB will use very small chunks when looking for items that match your query (in fact, it will use chunks of just two items): this means you will need to do numerous repeated queries (by using LastEvaluatedKey/ExclusiveStartKey as explained above) until you actually find two matching items. This will considerably slow down the entire process. For most practical scenarios, the best thing to do is not to set the Limit field at all, and just use its default value.
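Since the question asked for Python, here is a minimal sketch of the LastEvaluatedKey/ExclusiveStartKey loop described above. It is written against any callable that returns a dict shaped like a DynamoDB query response, so it runs without AWS access. The names query_first_n, fake_query and PAGES are hypothetical, and the fake pages are made-up sample data. With boto3, the query method of a Table resource could presumably be passed as query_page, with the index name and key condition supplied in base_params.

```python
def query_first_n(query_page, base_params, n):
    """Collect up to n items by following LastEvaluatedKey across pages.

    query_page is any callable that accepts the request parameters as keyword
    arguments and returns a dict with "Items" and, when more pages remain,
    "LastEvaluatedKey".
    """
    items = []
    params = dict(base_params)
    while len(items) < n:
        response = query_page(**params)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break  # no further pages to read
        # Resume the next request where this page stopped.
        params["ExclusiveStartKey"] = last_key
    return items[:n]


# Made-up pages standing in for two successive query responses.
PAGES = [
    {"Items": [{"ID": 1, "category": "red"}], "LastEvaluatedKey": {"ID": 1}},
    {"Items": [{"ID": 3, "category": "red"}, {"ID": 4, "category": "red"}]},
]


def fake_query(**params):
    """Serve the second page only when given the first page's last key."""
    if params.get("ExclusiveStartKey") == {"ID": 1}:
        return PAGES[1]
    return PAGES[0]


first_two = query_first_n(fake_query, {"IndexName": "your_index_name"}, 2)
```

The loop stops as soon as enough items have been collected or the response carries no LastEvaluatedKey, and the final slice trims any surplus items from the last page.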

Recreating a non-straightforward Excel 'vlookup'

I'm looking for some thoughts on how you might recreate a 'vlookup' that I currently do in excel.
I have two tables: Data contains a list of datetime values; DateConverter contains a list of calendar dates and their associated "network dates." Imagine a business where not every day is a workday: if I want to calculate differences in dates, I'm most interested in the number of work days that elapsed between my two dates.
Here is what the data might look like:
Data Table            DateConverter Table
=================     ================================
| Datetime      |     | Calendar date | Network date |
| ------------- |     | ------------- | ------------ |
| 6-1-15 8:00a  |     | 6-1-15        | 1000         |
| 6-2-15 1:00p  |     | 6-2-15        | 1001         |
| 6-3-15 7:00a  |     | 6-3-15        | 1002         |
| 6-10-15 3:00p |     | 6-4-15        | 1003         |
| 6-15-15 1:00p |     | 6-5-15        | 1004         |
| 6-12-15 2:00a |     | 6-8-15        | 1005         |  // Skips the weekend
| ...           |     | ...           | ...          |
In Excel, I can easily map in the network date for each date in the Datetime field with a variant of vlookup:
// Assume that Datetime values are in Column A, Calendar date values in
// Column C, Network date values in Column D - this formula fills Column B
// Headers are in row 1 - first values are in row 2
B2=OFFSET($D$1,COUNTIFS($C:$C,"<"&A2),)
The formula counts the dates that are less than the lookup value (using COUNTIFS because the values in the search array are dates, and the search value is a datetime) and returns the associated network date.
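To make the lookup logic concrete outside Excel, here is a small Python sketch of the same "count the earlier dates, then index" idea. The function name network_date is hypothetical, and the sample values assume the dates in the table above are in June 2015.

```python
from bisect import bisect_left
from datetime import datetime


def network_date(moment, calendar_dates, network_dates):
    """Return the network date of the last calendar date strictly before moment.

    calendar_dates must be sorted ascending (as midnight datetimes) and
    aligned index-for-index with network_dates.
    """
    # bisect_left returns how many entries are strictly less than moment,
    # the same count that COUNTIFS($C:$C, "<"&A2) produces.
    count = bisect_left(calendar_dates, moment)
    if count == 0:
        raise ValueError("moment is not later than any calendar date")
    return network_dates[count - 1]


# Sample data mirroring the DateConverter table (assumed to be June 2015).
calendar = [datetime(2015, 6, day) for day in (1, 2, 3, 4, 5, 8)]
network = [1000, 1001, 1002, 1003, 1004, 1005]

tuesday_afternoon = network_date(datetime(2015, 6, 2, 13, 0), calendar, network)
saturday_morning = network_date(datetime(2015, 6, 6, 10, 0), calendar, network)
```

Like the formula, the comparison is strict, so a value that falls exactly at midnight maps to the previous calendar date's network date.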
Is there a way to do this in Tableau? Will it require a calculated field or can I do this with some kind of join?
Thanks in advance for the help! Let me know if there is anything I can clarify. Thanks!
If the tables are on the same data server, you have the option to use joins, which is usually the most efficient way to combine information from different tables. If the tables are on different servers or platforms, then you can't use a single query to join them.
In either case, you can use Tableau data blending, which is sort of like a client-side join of aggregated results from multiple queries. It's a pretty useful technique, but a little more complex and restricted, and also usually less efficient than a server-side join.
So if you have the option to have both tables on the same server, start with that. It will be simpler and likely faster.
Note that if you are going to use a date as a join key, you probably want to define it as a date and not a datetime.
@alex-blakemore's response would normally be adequate, but if you can change the schema, you could simply add the network date to the Data table. The hourly granularity should not cause excessive growth, and you don't need to navigate the join.
Then, instead of counting rows and requiring a sorted table, simply subtract one network date from the other and add 1.

Can Behat tables be used to check multiple text labels

Using Behat with the Mink and Drupal extensions.
I essentially have a page with multiple labels and I want to confirm the text of all of them. I want to do this without having to enter a separate step for each one, such as:
Then I should see "Filter"
Is there a way to check all the text I'm expecting using PyStrings or tables, in a similar way to how they can be used to populate text fields?
And I fill in "Options" with:
I just thought it may be easier to check it all at once rather than having to provide multiple steps.
=====
Update:
After being provided some direction from dblack I used the following inside its own feature to test all labels that fall on the same page:
Note: I use the mink and UIBusinessSelector extensions
Also the 'login' is a custom function
Background: All scenarios require an admin login, then create a filter then confirm page labels
  Given I login as an admin
  When I go to the page "Product Filter"
  And I click the "Add Filter Button"

Scenario Outline: Verifying page text
  Then I should see "<ThisText>"

  Examples:
    | ThisText |
    | Filter by SKUs |
    | Filter by Package Name |
    | Filter by Campaign Medium |
    | Filter by Product Category |
    | Filter by Product Selection |
    | Filter by Product Holiday Experience |
    | Filter by Product Star Rating |
    | Filter by Product Destination |
    | Filter by Product Duration |
    | Filter by Product Supplier |
    | Filter by Air Ex Point |
    | Filter by Land Ex Point |
    | Filter by Product departure |
    | Filter by Ship name |
    | Filter by Cruise Line |
    | Remove $0 products |
    | Human readable name |

If you want to use a table to check each of your labels, you could use scenario outlines.
Scenario Outline: Check labels
  Given I am logged on as "someuser"
  When I go to the homepage
  Then I should see "<mylabel>"

  Examples:
    | mylabel |
    | Filter |
    | Some Other Label |
    | Another Label |

The drawback is that scenario outlines are templates: the scenario outline is run once for each of the examples provided. In your case you just want to know that all the labels are on the page, so you don't really want to log in and make a request for each label.
If I wanted to ensure a page contained all the labels it was supposed to, I would just do this (the scenario would be run just once):
Scenario: Check labels
  Given I am logged on as "someuser"
  When I go to the homepage
  Then I should see "Filter"
  And I should see "Some Other Label"
  And I should see "Another Label"
