T-SQL Search In String for specific words - string

Short and to the point:
I am using SQL Server Manager 2008 R2. I have a table with the columns "product name" and "product size". The size of each product is recorded in its name, like this:
Ellas Kitchen Apple & Ginger Baby Cookies 120g
Ellas Kitchen Apple, Raisin 'n' Cinnamon Bakey Bakies 4 x 12g
Elastoplast Spray Plaster 32.5ml
Ellas Kitchen Stage 1 Butternut Squash Each
The sizes of these products should be:
120g
4 x 12g
32.5ml
N/A
(Some products have no size in their name; their size should be set to "N/A".)
I want to write a T-SQL statement that updates the product size by extracting it from the product name.
I have done this in JavaScript, but to do things properly I have to write it as a SQL statement, and that's my problem. I have found it very difficult to work with "regular expressions" in T-SQL.
I have seen an example of how to get only the number from a string, but I have no idea how to do what I need using SQL.
Declare @ProductName varchar(100)
Select @ProductName = 'dsadadsad15234Nudsadadmbers'
-- Drop everything before the first digit
Select @ProductName = SubString(@ProductName, PATINDEX('%[0-9]%', @ProductName), Len(@ProductName))
-- Keep everything before the first non-digit
Select @ProductName = SubString(@ProductName, 0, PATINDEX('%[^0-9]%', @ProductName))
Select @ProductName
I will appreciate any example or idea.
Thanks, in advance.
EDIT:
Thanks for your reply, xQbert.
I have not included all possible formats, because if I have a working example for a few of them I think I will be able to handle the rest. Anyway, to give more details, here are the possible situations:
( Inumber + "x" + Dnumber + word)* + (_)* + (Dnumber + word)*
- * means 0 or more
where word can be - g, kg, ml, cl, pack
where Inumber is integer
where Dnumber is double
where _ is space
For example:
12 x 100g
100ml
2 x kg
And the size (if there is one) is always at the end of the name:
Product name + product size
For example:
Organix Organic Chick Pea & Red Pepper Lasagne 190g
Organix Organic Vegetable & Pork Risotto 250g
Organix Rice Cakes apple 50g
Organix Rusks 7m+ 6 Pack
Organix Savoury Biscuits Cheese & Onion Each
Organix Savoury Biscuits Tomato & Basil Each
Organix Stage 1 Squash & Chicken 2 x 120g

PATINDEX is not regex, and you have limited logic processing in T-SQL compared to .NET. Have you considered CLR integration?
http://msdn.microsoft.com/en-us/library/ms131089(SQL.100).aspx
This one is from 2005, but it is an example of regex in SQL via CLR integration:
http://msdn.microsoft.com/en-us/library/ms345136(v=SQL.90).aspx
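As a rough illustration of the kind of pattern a regex-capable function would need, here is a minimal sketch in Python (a CLR function would use .NET's Regex class instead). It assumes the size grammar from the question: an optional integer multiplier followed by "x", then a number and one of the listed units, all at the end of the name. The names `SIZE_RE` and `extract_size` are hypothetical, and a bare form such as "2 x kg" is not covered.

```python
import re

# Assumed grammar: optional "<int> x " multiplier, then "<number><unit>" at the end of the name.
SIZE_RE = re.compile(
    r"(?:\d+\s*x\s*)?"           # optional pack multiplier, e.g. "4 x "
    r"\d+(?:\.\d+)?\s*"          # integer or decimal amount, e.g. "32.5"
    r"(?:kg|g|ml|cl|pack)\s*$",  # unit from the question's list, at the end of the name
    re.IGNORECASE,
)

def extract_size(product_name):
    """Return the trailing size of a product name, or "N/A" if none is found."""
    match = SIZE_RE.search(product_name)
    return match.group().strip() if match else "N/A"

for name in [
    "Ellas Kitchen Apple & Ginger Baby Cookies 120g",
    "Ellas Kitchen Apple, Raisin 'n' Cinnamon Bakey Bakies 4 x 12g",
    "Elastoplast Spray Plaster 32.5ml",
    "Ellas Kitchen Stage 1 Butternut Squash Each",
]:
    print(extract_size(name))  # 120g, 4 x 12g, 32.5ml, N/A
```

Anchoring the pattern at the end of the name keeps numbers such as "Stage 1" or "7m+" in the middle of a name from being mistaken for a size.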

Related

Summarizing Excel with multiple columns

I have Excel data in a table with a single row, and multiple values in two categories, and I want to summarize the two categories.
Input data:
Recipe | Meal | | Ingredients | |
Plum pie | Coffee | Dessert | Plums | Sugar | Eggs
Plum jam | Breakfast | Coffee | Plums | Sugar |
Fried eggs | Breakfast | Lunch | Eggs | |
Pancakes | Breakfast | Dessert | Eggs | Flour | Milk
Desired output:
Eggs
Flour
Milk
Plums
Sugar
Breakfast
2
1
1
1
1
Coffee
1
2
2
Lunch
1
Dessert
1
1
1
Of course restructuring the input data and summarizing via a Pivot table or Countif is a solution, but not a practical possibility due to the source of the data.
Can anybody help with an intelligent solution (and apologies for the table pictures - can anybody help pasting tables other than as pictures - I solved the tables problem partially via https://tableconvert.com/excel-to-markdown but alas - no colors)
Thanks,
Anders
Quite verbose, with a high percentage of LAMBDA, but dynamic enough that you only need to enter two variables at the start:
Formula in I2
=LET(meals,B2:C5,ingredients,D2:F5,uq_ing,SORT(UNIQUE(TOROW(ingredients,1),1),,,1),REDUCE(HSTACK("",uq_ing),SORT(UNIQUE(TOCOL(meals,1))),LAMBDA(x,y,VSTACK(x,HSTACK(y,MAP(uq_ing,LAMBDA(z,SUM(BYROW(meals,LAMBDA(v,SUM(N(v=y))))*BYROW(ingredients,LAMBDA(w,SUM(N(w=z))))))))))))
=LET(
    meal, B2:C5,
    ing, D2:F5,
    uMeal, UNIQUE(TOCOL(meal)),
    uIng, UNIQUE(TOCOL(ing, 1)),
    arr, MAKEARRAY(
        ROWS(uMeal),
        ROWS(uIng),
        LAMBDA(row, col,
            SUM(
                BYROW(meal, LAMBDA(r, SUM(--(r = INDEX(uMeal, row))))) *
                BYROW(ing, LAMBDA(r, SUM(--(r = INDEX(uIng, col)))))
            )
        )
    ),
    VSTACK(HSTACK("", TRANSPOSE(uIng)), HSTACK(uMeal, arr))
)
Probably very similar to @JvdV's answer, but at least you can check the results against each other.
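For an independent cross-check outside Excel, the same counting logic can be sketched in Python. This is only an illustration: the rows below are transcribed from the question's input data as best it can be recovered (an assumption), and `cross_count` is a hypothetical helper name. For each recipe, every (meal, ingredient) pair is counted once, which is what both formulas compute row by row.

```python
from collections import Counter

# Rows as (recipe, meals, ingredients); transcribed from the question (assumed layout).
recipes = [
    ("Plum pie", ["Coffee", "Dessert"], ["Plums", "Sugar", "Eggs"]),
    ("Plum jam", ["Breakfast", "Coffee"], ["Plums", "Sugar"]),
    ("Fried eggs", ["Breakfast", "Lunch"], ["Eggs"]),
    ("Pancakes", ["Breakfast", "Dessert"], ["Eggs", "Flour", "Milk"]),
]

def cross_count(rows):
    """Count how many recipes pair each meal with each ingredient."""
    counts = Counter()
    for _recipe, meals, ingredients in rows:
        for meal in meals:
            for ingredient in ingredients:
                counts[(meal, ingredient)] += 1
    return counts

counts = cross_count(recipes)
all_ingredients = sorted({i for _, _, ings in recipes for i in ings})
all_meals = sorted({m for _, ms, _ in recipes for m in ms})

# Print a tab-separated grid: meals as rows, ingredients as columns, blanks for zero.
print("\t".join([""] + all_ingredients))
for meal in all_meals:
    print("\t".join([meal] + [str(counts[(meal, i)] or "") for i in all_ingredients]))
```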

Extract numerical data from product description

I am working with python3.
I am trying to pull out numerical data from a product description. Sometimes however the same product has a differently worded description which results in conflicting results.
My code is:
import pandas as pd
import re
data = {'desc':['1 oz Silver Eagles Monster Box (500 pc)', 'Silver Eagle Monster Box (500 pcs 1 oz coins)', '2021 10 oz Silver Royal Canadian Mint Bar'], 'inventory':['in stock', 'in stock', 'out of stock']}
df=pd.DataFrame(data)
df['ounces']=df['desc'].str.extract(r'(\d+ pc|\d+ oz)')
print(df)
What I get is:
                                            desc     inventory  ounces
0        1 oz Silver Eagles Monster Box (500 pc)      in stock    1 oz
1  Silver Eagle Monster Box (500 pcs 1 oz coins)      in stock  500 pc
2      2021 10 oz Silver Royal Canadian Mint Bar  out of stock   10 oz
Clearly the first 2 items are the same. I expected regex to look for 'pc' first in the entire description and then if nothing found look for 'oz' but that is not what it does. What I need to get is:
                                            desc     inventory  ounces
0        1 oz Silver Eagles Monster Box (500 pc)      in stock  500 pc
1  Silver Eagle Monster Box (500 pcs 1 oz coins)      in stock  500 pc
2      2021 10 oz Silver Royal Canadian Mint Bar  out of stock   10 oz
My original dataframe does not have an ounces column; I am trying to add that column and extract the correct data at the same time. Should I be going about this differently?
You can use
>>> df['ounces'] = df['desc'].str.findall(r'(?:.*\D)?(\d+ pc)|(\d+ oz)').str[0].str.join('')
>>> df
                                            desc     inventory  ounces
0        1 oz Silver Eagles Monster Box (500 pc)      in stock  500 pc
1  Silver Eagle Monster Box (500 pcs 1 oz coins)      in stock  500 pc
2      2021 10 oz Silver Royal Canadian Mint Bar  out of stock   10 oz
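The (?:.*\D)?(\d+ pc)|(\d+ oz) pattern gives priority to the first alternative, which skips ahead greedily and captures the last pc quantity in the string into the first group. The oz alternative is only tried at positions where the pc alternative fails, so an oz quantity is matched only if no pc quantity follows it.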
See the regex demo.
Since Series.str.findall returns all pattern matches, .str[0] is required to obtain the first result only, and .str.join('') converts the tuple into a string (there are two groups in the pattern, so findall returns a list of tuples, and one of the two group values will always be empty).
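If the single-regex approach feels too dense, the priority the question describes ("look for pc first, then fall back to oz") can also be written out explicitly. The following is a minimal sketch using only the standard library; `extract_quantity` is a hypothetical helper name, and with pandas it could be applied via `df['desc'].apply(extract_quantity)`.

```python
import re

# Patterns in priority order: piece counts first, ounces as the fallback.
PRIORITY_PATTERNS = [re.compile(r"\d+ pc"), re.compile(r"\d+ oz")]

def extract_quantity(desc):
    """Return the first match of the highest-priority pattern, or None."""
    for pattern in PRIORITY_PATTERNS:
        match = pattern.search(desc)
        if match:
            return match.group()
    return None

descriptions = [
    "1 oz Silver Eagles Monster Box (500 pc)",
    "Silver Eagle Monster Box (500 pcs 1 oz coins)",
    "2021 10 oz Silver Royal Canadian Mint Bar",
]
print([extract_quantity(d) for d in descriptions])  # ['500 pc', '500 pc', '10 oz']
```

Each pattern scans the whole description before the next one is tried, so the order of the list is the priority order.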

How to filter by distinct AND Max in Excel

Hmmm. Thought this was like a normal forum. OK, it's not copying my highlighting. I have complex and changing data sets; this is a simple version. I want to somehow filter the following:
state number seq OP
KY 831222 1 Apple
KY 831222 2 Apple
KY 831222 3 Apple
KY 845678 2 orange
KY 845678 3 orange
KY 845678 4 orange
KY 845678 2 Banana
KY 845678 3 Banana
KY 845678 4 Banana
PA 4567890 4 Apple
PA 4567890 5 Apple
So that I only see the following:
KY 831222 3 Apple
KY 845678 4 orange
KY 845678 4 Banana
PA 4567890 5 Apple
That is, I want to filter/group by the MAX seq for EACH set. I think if this were SQL I'd do it something like this:
Select Distinct State, number, max(seq), OP From (table created from range) group by State, number, OP
But how do I do it in Excel? I know there is a SQL wizard thing, but I can never get it to work right. Embedding a SQL query in a macro may be doable.
I am a little past beginner with macros. I generally record then figure stuff out later. So I may not understand terms.
OK This looks fine up here in the copy/paste write area but the "draft area" at the bottom looks like it's squishing things together. I don't have time to figure out how to format this so it looks like the table I'm seeing. I just hope this displays correctly.
Thanks for any help
- Sharon
Tried various types of filtering. Tried a pivot table - it just makes a mess - doesn't even display the data neatly. Tried the sql wizard but it's clunky.
Few steps:
Select your range > Data Tab > Sort
Add 4 levels of sorting: State - Number - OP - Seq
Only edit to make: Sort Seq from Large to Small
Apply
Keep range selected > Data Tab > Remove Duplicates
All boxes need to be checked but Seq
Apply
We first need to sort the range on the Seq column from largest to smallest, since Remove Duplicates keeps the first (topmost) row of each duplicate group and removes the ones below it.
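The logic behind these steps (keep, for each State/Number/OP combination, the row with the largest Seq) can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not an Excel macro; `max_seq_per_group` is a hypothetical name and the rows are the sample data from the question.

```python
rows = [
    ("KY", 831222, 1, "Apple"), ("KY", 831222, 2, "Apple"), ("KY", 831222, 3, "Apple"),
    ("KY", 845678, 2, "orange"), ("KY", 845678, 3, "orange"), ("KY", 845678, 4, "orange"),
    ("KY", 845678, 2, "Banana"), ("KY", 845678, 3, "Banana"), ("KY", 845678, 4, "Banana"),
    ("PA", 4567890, 4, "Apple"), ("PA", 4567890, 5, "Apple"),
]

def max_seq_per_group(rows):
    """Keep one row per (state, number, op) group, carrying the largest seq."""
    best = {}
    for state, number, seq, op in rows:
        key = (state, number, op)
        if key not in best or seq > best[key]:
            best[key] = seq
    # Dicts preserve insertion order, so groups come out in first-seen order.
    return [(state, number, seq, op) for (state, number, op), seq in best.items()]

for row in max_seq_per_group(rows):
    print(*row)
```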

Python Web Scraping : Split quantities from Unstructured Data

I am relatively new to the field of Web Scraping as well as python. I am trying to scrape data from a supermarket/Online Grocery stores.
I am facing an issue in cleaning the scraped data-
Data Sample Scraped
Tata Salt Lite, Low Sodium, 1kg
Fortune Kachi Ghani Pure Mustard Oil, 1L (Pet Bottle)
Bourbon Bliss, 150g (Buy 3 Get 1 Free) Amazon Brand
Vedaka Popular Toor/Arhar Dal, 1 kg
Eno Bottle 100 g (Regular) Pro
Nature 100% Organic Masoor Black Whole, 500g
Surf Excel Liquid Detergent 1.05 L
Considering the above data sample I would like to separate the quantities from the product names.
Required Format
Name -Tata Salt Lite, Low Sodium,
Quantity -1kg
Name - Fortune Kachi Ghani Pure Mustard Oil
Quantity - 1L and so on...
I have tried to separate the same with a regex
re.split("[,/._-]+", i)
but with partial success.
Could anyone please help me on how to handle the dataset. Thanks in advance.
You can try applying the solution below to each string:
import re

text_content = "Tata Salt Lite, Low Sodium, 1kg"
# A number (with an optional decimal part) followed by a unit
quantity = re.search(r"\d+(?:\.\d+)?\s?(?:kg|g|L)\b", text_content).group()
name = text_content.rsplit(quantity)[0].strip().rstrip(',')
description = "Name - {}, Quantity - {}".format(name, quantity)
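To run the same idea over the whole sample, a minimal sketch might wrap it in a function that also copes with strings that have no quantity. The names `QTY_RE` and `split_name_quantity` are hypothetical, and the unit list (kg, g, ml, L) is an assumption based on the sample data; extend it as needed for real listings.

```python
import re

# Number with optional decimal part, optional space, then a unit (assumed unit list).
QTY_RE = re.compile(r"\d+(?:\.\d+)?\s?(?:kg|g|ml|L)\b")

def split_name_quantity(text):
    """Split a product string into (name, quantity); quantity is None if absent."""
    match = QTY_RE.search(text)
    if match is None:
        return text.strip(), None
    # Everything before the quantity is the name; drop a trailing comma and spaces.
    name = text[:match.start()].strip().rstrip(",").strip()
    return name, match.group()

samples = [
    "Tata Salt Lite, Low Sodium, 1kg",
    "Fortune Kachi Ghani Pure Mustard Oil, 1L (Pet Bottle)",
    "Nature 100% Organic Masoor Black Whole, 500g",
    "Surf Excel Liquid Detergent 1.05 L",
]
for sample in samples:
    name, quantity = split_name_quantity(sample)
    print("Name - {}, Quantity - {}".format(name, quantity))
```

Slicing at `match.start()` rather than splitting on the quantity text avoids surprises when the same digits appear earlier in the name.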

Load item cost from an inventory table

I have an Inventory Sheet that contains a bunch of data about products I have for sale. I have a sheet for each month where I load in my individual sales. In order to calculate my cost of sales, I enter my product cost for each sale manually. I would like a formula to load the cost automatically, using the product name as a search term.
Inventory Item | Cost
Product 1 | 2.99
Product 2 | 4.99
Product 3 | 6.99

Sold Item | Sale Price | Cost
Product 3 | 16.99 | X
Product 3 | 14.57 | X
Product 1 | 7.99 | X
So basically I am looking to "solve for X".
In addition to this, the product name on the two tables are actually different lengths. For example, one item on my Inventory Table may be "This is a very, very long product name that goes on and on for up to 120 characters", and on my products sold table it will be truncated at the first 40 characters of the product name. So in the above formula, it should only search for the first 40 characters of the product name.
Due to the complicated nature of this, I haven't been able to search for a sufficient solution, since I don't really know exactly where to start to quickly explain it.
UPDATE:
The product names of my Inventory List, and the product names of my items sold aren't matching. I thought I could just search for the left-most 40 characters, but this is not the case.
Here is a sample of products I have in my Inventory List:
Ford Focus 2000 thru 2007 (Haynes Repair Manual) by Haynes, Max
Franklin Brass D2408PC Futura, Bath Hardware Accessory, Tissue Paper Holder, ...
Fuji HQ T-120 Recordable VHS Cassette Tapes ( 12 pack ) (Discontinued by Manu...
Fundamentals of Three Dimensional Descriptive Geometry [Paperback] by Slaby, ...
GE Lighting 40184 Energy Smart 55-Watt 2-D Compact Fluorescent Bulb, 250-Watt...
Get Set for School: Readiness & Writing Pre-K Teacher's Guide (Handwriting Wi...
Get the Edge: A 7-Day Program To Transform Your Life [Audiobook] [Box set] by...
Gift Basket Wrap Bag - 2 Pack - 22" x 30" - Clear [Kitchen]
GOLDEN GATE EDITION 100 PIECE PUZZLE [Toy]
Granite Ware 0722 Stainless Steel Deluxe Food Mill, 2-Quart [Kitchen]
Guess Who's Coming, Jesse Bear [Paperback] by Carlstrom, Nancy White; Degen, ...
Guide to Culturally Competent Health Care (Purnell, Guide to Culturally Compe...
Guinness World Records 2002 [Illustrated] by Antonia Cunningham; Jackie Fresh...
Hawaii [Audio CD] High Llamas
And then here is a sample of the product names in my Sold list:
My Interactive Point-and-Play with Disne...
GE Lighting 40184 Energy Smart 55-Watt 2...
Crayola® Sidewalk Chalk Caddy with Chalk...
Crayola® Sidewalk Chalk Caddy with Chalk...
First Look and Find: Merry Christmas!
Sesame Street Point-and-Play and 10-Book...
Disney Mickey Mouse Board Game - Duck Du...
Nordic Ware Microwave Vegetable and Seaf...
SmartGames BACK 2 BACK
I have played around with searching for the left-most characters, minus 3. This did not work correctly. I have also switched the [range lookup] between TRUE and FALSE, but this has also not worked in a predictable way.
Use the VLOOKUP function. Augment the lookup_value parameter with the LEFT function.
        
In the above example, LEFT(E2, 9) is used to truncate the Sold Item lookup into Inventory Item.
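The underlying idea (treat the truncated sold name as a prefix of the full inventory name) can be sketched outside Excel as follows. This is an illustration only: `lookup_cost` is a hypothetical helper, the costs are made-up placeholder values, and it assumes that truncated sold names end in "..." as in the sample list.

```python
def lookup_cost(sold_name, inventory):
    """Return the cost of the first inventory item whose name starts with the sold name."""
    # Strip the trailing ellipsis that marks a truncated name (assumed convention).
    prefix = sold_name[:-3] if sold_name.endswith("...") else sold_name
    for item_name, cost in inventory.items():
        if item_name.startswith(prefix):
            return cost
    return None  # no inventory item matches this prefix

# Inventory names from the question; the costs are hypothetical.
inventory = {
    "GE Lighting 40184 Energy Smart 55-Watt 2-D Compact Fluorescent Bulb, 250-Watt...": 4.99,
    "GOLDEN GATE EDITION 100 PIECE PUZZLE [Toy]": 2.99,
}

print(lookup_cost("GE Lighting 40184 Energy Smart 55-Watt 2...", inventory))  # 4.99
print(lookup_cost("SmartGames BACK 2 BACK", inventory))  # None
```

If two inventory names share the same first 40 characters, a prefix match cannot tell them apart, so such items would still need a unique key (for example a SKU) in both sheets.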

Resources