I am working with Python 3.
I am trying to pull out numerical data from a product description. Sometimes, however, the same product has a differently worded description, which produces conflicting results.
My code is:
import pandas as pd
import re
data = {'desc':['1 oz Silver Eagles Monster Box (500 pc)', 'Silver Eagle Monster Box (500 pcs 1 oz coins)', '2021 10 oz Silver Royal Canadian Mint Bar'], 'inventory':['in stock', 'in stock', 'out of stock']}
df=pd.DataFrame(data)
df['ounces']=df['desc'].str.extract(r'(\d+ pc|\d+ oz)')
print(df)
What I get is:
   desc                                           inventory     ounces
0  1 oz Silver Eagles Monster Box (500 pc)        in stock      1 oz
1  Silver Eagle Monster Box (500 pcs 1 oz coins)  in stock      500 pc
2  2021 10 oz Silver Royal Canadian Mint Bar      out of stock  10 oz
Clearly, the first two items are the same product. I expected the regex to search the entire description for 'pc' first and, only if nothing was found, to look for 'oz', but that is not what it does. What I need to get is:
   desc                                           inventory     ounces
0  1 oz Silver Eagles Monster Box (500 pc)        in stock      500 pc
1  Silver Eagle Monster Box (500 pcs 1 oz coins)  in stock      500 pc
2  2021 10 oz Silver Royal Canadian Mint Bar      out of stock  10 oz
My original DataFrame does not have an ounces column; I am trying to add that column and extract the correct data at the same time. Should I be going about this differently?
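The result in the question follows from how regex alternation works: the engine scans the string from left to right and, at each starting position, tries the alternatives in order, so the earliest position where either branch matches wins. The order of the alternatives only matters when both could match at the same position. A minimal sketch with Python's standard re module (which is what pandas' str.extract uses under the hood):

```python
import re

desc = "1 oz Silver Eagles Monster Box (500 pc)"

# Alternation does not mean "search the whole string for 'pc' first":
# the scan starts at index 0 and stops at the first position where
# either branch matches.
m = re.search(r"(\d+ pc|\d+ oz)", desc)
print(m.group(1))  # 1 oz  (found at index 0, before "500 pc")

# Searching for each unit separately shows that both values are present.
pc = re.search(r"\d+ pc", desc).group()
oz = re.search(r"\d+ oz", desc).group()
print(pc, "|", oz)  # 500 pc | 1 oz
```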
You can use
>>> df['ounces'] = df['desc'].str.findall(r'(?:.*\D)?(\d+ pc)|(\d+ oz)').str[0].str.join('')
>>> df
   desc                                           inventory     ounces
0  1 oz Silver Eagles Monster Box (500 pc)        in stock      500 pc
1  Silver Eagle Monster Box (500 pcs 1 oz coins)  in stock      500 pc
2  2021 10 oz Silver Royal Canadian Mint Bar      out of stock  10 oz
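The (?:.*\D)?(\d+ pc)|(\d+ oz) pattern gives priority to the first capturing group. Because the optional (?:.*\D)? prefix lets the first alternative reach a \d+ pc anywhere in the rest of the description, the very first match is the pc one whenever the description contains one; the oz alternative supplies the first match only when there is no \d+ pc at all (an oz value occurring after the pc one can still show up in later matches).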
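To see what the pattern returns before pandas post-processes it, here is a sketch calling re.findall directly on the three descriptions from the question. Because the pattern has two capturing groups, each match comes back as a (pc, oz) tuple in which exactly one element is non-empty:

```python
import re

pattern = r"(?:.*\D)?(\d+ pc)|(\d+ oz)"
descs = [
    "1 oz Silver Eagles Monster Box (500 pc)",
    "Silver Eagle Monster Box (500 pcs 1 oz coins)",
    "2021 10 oz Silver Royal Canadian Mint Bar",
]

# One list of (group1, group2) tuples per description
results = [re.findall(pattern, d) for d in descs]
for r in results:
    print(r)
# [('500 pc', '')]
# [('500 pc', ''), ('', '1 oz')]
# [('', '10 oz')]
```

The first tuple of each list is what .str[0] picks, and joining its two elements yields the single non-empty value.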
Since Series.str.findall returns all pattern matches, .str[0] is required to keep only the first one. Because the pattern has two capturing groups, findall returns a list of tuples, so .str.join('') converts the selected tuple into a string (this works because one of the two group values is always empty).
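If a single regex feels too clever, another option (an alternative to the answer above, not part of it) is to run one extraction per unit and let the 'pc' result take precedence, falling back to the 'oz' result with fillna. A minimal sketch using the question's data:

```python
import pandas as pd

data = {'desc': ['1 oz Silver Eagles Monster Box (500 pc)',
                 'Silver Eagle Monster Box (500 pcs 1 oz coins)',
                 '2021 10 oz Silver Royal Canadian Mint Bar'],
        'inventory': ['in stock', 'in stock', 'out of stock']}
df = pd.DataFrame(data)

# expand=False returns a Series (single capture group) instead of a DataFrame;
# rows without a match get NaN
pieces = df['desc'].str.extract(r'(\d+ pc)', expand=False)
ounces = df['desc'].str.extract(r'(\d+ oz)', expand=False)

# Prefer the piece count; use the ounce value only where no 'pc' was found
df['ounces'] = pieces.fillna(ounces)
print(df)
```

This keeps the priority rule explicit and easy to extend (chain another fillna for a third unit), at the cost of scanning each description once per unit.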
When I use FIRST(CG1) in 'Cell Values', the grand total does not sum up; instead, it shows one of the values returned by FIRST(CG1).
Please advise whether we always have to use sum(XXX) to get the grand total summed up.
Short answer: yes, if you want the grand total to be a sum of your data. A grand total over a different aggregation gives different results:
AVG will average the averages of your category axis
MAX will take the maximum of the maximums for each category
Cumulative Sum will take the "last" value in your category, since it doesn't have any additional values to sum up
Product will take the product of the products
First and Last you are already aware of.
Long Answer:
You actually CAN sum the first value of a column from a column grouping.
For example, consider the following data set.
[Grouping]  [Food]  [Color]  [Weight]
Fruit       Apple   Yellow   4
Fruit       Apple   Green    2
Fruit       Apple   Red      4
Fruit       Banana  Yellow   5
Fruit       Banana  Brown    2
Fruit       Orange  Orange   3
Vegetable   Carrot  Orange   4
If, in your custom expression, you put
Sum(if(RankReal([Grouping], "ties.method=first", [Food]) = 1, [Weight], 0))
it will find the first instance of each food in your data set, so however you group the rows on the left, your results, subtotals, and grand totals will sum only the first instance of each food.
So you will be able to see the following:
Fruit      Apple   4
           Banana  5
           Orange  3
           Subtotal: 12
Vegetable  Carrot  4
           Subtotal: 4
Grand Total: 16
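Outside Spotfire, the same "first row per food, then sum" logic can be sketched in pandas. Here drop_duplicates(keep='first') plays the role of the RankReal(...) = 1 filter; this is an analogy to illustrate the arithmetic, not how Spotfire evaluates the expression:

```python
import pandas as pd

df = pd.DataFrame({
    'Grouping': ['Fruit'] * 6 + ['Vegetable'],
    'Food':     ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Orange', 'Carrot'],
    'Color':    ['Yellow', 'Green', 'Red', 'Yellow', 'Brown', 'Orange', 'Orange'],
    'Weight':   [4, 2, 4, 5, 2, 3, 4],
})

# Keep only the first row for each food (in data order), like rank == 1
first_rows = df.drop_duplicates(subset='Food', keep='first')

# Subtotals per group and the grand total, computed on the filtered rows only
subtotals = first_rows.groupby('Grouping')['Weight'].sum()
grand_total = first_rows['Weight'].sum()

print(first_rows[['Grouping', 'Food', 'Weight']])
print(subtotals)
print(grand_total)
```

With the data set above, this gives subtotals of 12 (Fruit) and 4 (Vegetable) and a grand total of 16, matching the cross-table result shown.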
I have an Inventory Sheet that contains a bunch of data about products I have for sale. I have a sheet for each month where I load in my individual sales. In order to calculate my cost of sales, I enter my product cost for each sale manually. I would like a formula to load the cost automatically, using the product name as a search term.
Inventory Item | Cost        Sold Item | Sale Price | Cost
Product 1      | 2.99        Product 3 | 16.99      | X
Product 2      | 4.99        Product 3 | 14.57      | X
Product 3      | 6.99        Product 1 | 7.99       | X
So basically I am looking to "solve for X".
In addition, the product names in the two tables are actually different lengths. For example, an item on my Inventory Table may be "This is a very, very long product name that goes on and on for up to 120 characters", while on my products sold table it is truncated to the first 40 characters of the product name. So the formula should only search on the first 40 characters of the product name.
Because this is complicated, I haven't been able to find a suitable solution; I don't really know where to start, or how to describe the problem concisely enough to search for it.
UPDATE:
The product names in my Inventory List and the product names of my sold items don't match. I thought I could just search on the left-most 40 characters, but that is not the case.
Here is a sample of products I have in my Inventory List:
Ford Focus 2000 thru 2007 (Haynes Repair Manual) by Haynes, Max
Franklin Brass D2408PC Futura, Bath Hardware Accessory, Tissue Paper Holder, ...
Fuji HQ T-120 Recordable VHS Cassette Tapes ( 12 pack ) (Discontinued by Manu...
Fundamentals of Three Dimensional Descriptive Geometry [Paperback] by Slaby, ...
GE Lighting 40184 Energy Smart 55-Watt 2-D Compact Fluorescent Bulb, 250-Watt...
Get Set for School: Readiness & Writing Pre-K Teacher's Guide (Handwriting Wi...
Get the Edge: A 7-Day Program To Transform Your Life [Audiobook] [Box set] by...
Gift Basket Wrap Bag - 2 Pack - 22" x 30" - Clear [Kitchen]
GOLDEN GATE EDITION 100 PIECE PUZZLE [Toy]
Granite Ware 0722 Stainless Steel Deluxe Food Mill, 2-Quart [Kitchen]
Guess Who's Coming, Jesse Bear [Paperback] by Carlstrom, Nancy White; Degen, ...
Guide to Culturally Competent Health Care (Purnell, Guide to Culturally Compe...
Guinness World Records 2002 [Illustrated] by Antonia Cunningham; Jackie Fresh...
Hawaii [Audio CD] High Llamas
And then here is a sample of the product names in my Sold list:
My Interactive Point-and-Play with Disne...
GE Lighting 40184 Energy Smart 55-Watt 2...
Crayola® Sidewalk Chalk Caddy with Chalk...
Crayola® Sidewalk Chalk Caddy with Chalk...
First Look and Find: Merry Christmas!
Sesame Street Point-and-Play and 10-Book...
Disney Mickey Mouse Board Game - Duck Du...
Nordic Ware Microwave Vegetable and Seaf...
SmartGames BACK 2 BACK
I have played around with searching for the left-most characters, minus 3. This did not work correctly. I have also switched the [range lookup] between TRUE and FALSE, but this has also not worked in a predictable way.
Use the VLOOKUP function. Augment the lookup_value parameter with the LEFT function.
For example, LEFT(E2, 9) truncates the Sold Item in E2 to its first 9 characters so that it can be looked up in the Inventory Item column.
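The underlying idea is prefix matching: a sold name that ends in "..." was truncated, so it should match any inventory name that starts with the remaining text. In Excel itself, VLOOKUP accepts the * wildcard in a text lookup_value when range_lookup is FALSE, so a formula along the lines of VLOOKUP(LEFT(E2,40)&"*", <inventory range>, 2, FALSE) expresses the same idea (the range is a placeholder). As a language-neutral illustration, here is a minimal Python sketch; find_inventory_name is a hypothetical helper, and it assumes truncated sold names always end in "...":

```python
def find_inventory_name(sold_name, inventory_names):
    """Return the inventory name that matches a sold-item name, or None.

    Hypothetical helper: names ending in "..." are treated as truncated and
    matched by prefix; all other names must match exactly.
    """
    sold_name = sold_name.strip()
    if sold_name.endswith("..."):
        prefix = sold_name[:-3]
        for name in inventory_names:
            if name.startswith(prefix):
                return name  # first match wins if several items share the prefix
        return None
    return sold_name if sold_name in inventory_names else None


# Costs from the small example table in the question
costs = {"Product 1": 2.99, "Product 2": 4.99, "Product 3": 6.99}
sold = ["Product 3", "Product 3", "Product 1"]
sold_costs = [costs[find_inventory_name(s, costs)] for s in sold]
print(sold_costs)  # [6.99, 6.99, 2.99]

# A truncated name from the Sold list against two names from the Inventory List
inventory_names = [
    "GE Lighting 40184 Energy Smart 55-Watt 2-D Compact Fluorescent Bulb, 250-Watt...",
    "Granite Ware 0722 Stainless Steel Deluxe Food Mill, 2-Quart [Kitchen]",
]
print(find_inventory_name("GE Lighting 40184 Energy Smart 55-Watt 2...", inventory_names))
```

Requiring an exact match for untruncated names avoids "Product 1" accidentally matching "Product 10"; items that share their first 40 characters (such as the two Crayola entries) remain ambiguous with any prefix approach.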
I have been racking my brain over this problem since this morning and haven't found a solution. Please give me any pointers you can, so that I can try to work out the solution.
I basically have two sets of data: an old list and a new list. I want to compare the new list with the old list (matching Name and Country together), since the new list has a few additional entries. Later, I would like to create a new list with the entries common to both lists, followed by all the new entries (if possible; otherwise I will add those manually, but I would like Excel to tell me which entries are new). Sorry if this is not well explained; the following illustration may help.
Old List
Item No.  Name          Country
1         Apples        Italy
3         Banana        Spain
4         Grapes        Slovakia
5         Pineapple     Greece
8         Banana        Czech Republic
14        Apples        India
23        Pineapple     Hungary
19        Peach         USA
2         Strawberries  France
New List
Item No.  Name          Country
4         Grapes        Slovakia
          Mango         Pakistan
14        Apples        India
          Oranges       Mexico
19        Peach         USA
2         Strawberries  France
1         Apples        Italy
3         Banana        Spain
23        Pineapple     Hungary
          Avocado       Netherlands
Expected Output:
List with the common item numbers, based on matching entries in both lists, followed by the new entries
Item No.  Name          Country
4         Grapes        Slovakia
14        Apples        India
19        Peach         USA
2         Strawberries  France
1         Apples        Italy
3         Banana        Spain
23        Pineapple     Hungary
          Mango         Pakistan
          Oranges       Mexico
          Avocado       Netherlands
As can be seen above, I have an old list with Item No., Name and Country. Let's assume that the item numbers have been assigned based on some code words. The second list again has Item No., Name and Country, but some item numbers haven't been filled in (since those entries are new and have not yet been classified). Now I want Excel to compare the names AND countries in both lists and output the common Item No. if there is a match. If there is no match, I would like Excel to tell me that this is a new entry. I looked on various forums and realized that the VLOOKUP function only lets me search on Name OR Country, which would give me the common names or countries respectively, but not the item numbers. Is there any formula that could help me solve this problem?
Just paste the lists together, sort the result, and then remove the duplicates. Removing duplicates is built into Excel starting with version 2007; you will find it on the Data ribbon (see http://office.microsoft.com/en-001/excel-help/filter-for-unique-values-or-remove-duplicate-values-HP010073943.aspx).
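The paste/sort/remove-duplicates recipe can be sketched in pandas as concat followed by drop_duplicates on Name and Country. Putting the old list first means its Item No. survives for entries that appear in both lists. Note that, unlike the expected output in the question, the result also keeps old-only rows such as Pineapple/Greece. Only a few rows from the question are used here:

```python
import pandas as pd

old = pd.DataFrame({
    'Item No.': [1, 5, 14],
    'Name': ['Apples', 'Pineapple', 'Apples'],
    'Country': ['Italy', 'Greece', 'India'],
})
new = pd.DataFrame({
    'Item No.': [14, None, 1],  # None: the new entry has no number yet
    'Name': ['Apples', 'Mango', 'Apples'],
    'Country': ['India', 'Pakistan', 'Italy'],
})

# Old list first, so for a duplicated Name+Country the old row (with its number) is kept
combined = pd.concat([old, new], ignore_index=True)
merged = (combined
          .drop_duplicates(subset=['Name', 'Country'], keep='first')
          .sort_values(['Name', 'Country'])
          .reset_index(drop=True))
print(merged)
```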
To use VLOOKUP, just concatenate Name and Country, for example B2 & "-" & C2. You can then do a lookup on the concatenated values in your source table:
VLOOKUP(B2 & "-" & C2,NewList!$D$2:$E$100,2,FALSE)
This assumes that the concatenated column is in D in your new table, and that you've copied the numbers to column E (VLOOKUP can't look to the left, so the returned column must be to the right of the key). I put in the dash for readability and to reduce the chance that two different Name/Country pairs produce the same key, unlikely as that might be.
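The concatenated-key idea can be sketched in Python with a dictionary standing in for the lookup table. This sketch assumes each New List row is looked up in the Old List (which is what yields the item numbers in the expected output); make_key is a hypothetical helper, and a missing key plays the role of VLOOKUP's #N/A:

```python
def make_key(name, country):
    # Same idea as B2 & "-" & C2: one lookup key built from both columns
    return f"{name}-{country}"


old_list = [  # (Item No., Name, Country) from the question's Old List
    (1, "Apples", "Italy"), (3, "Banana", "Spain"), (4, "Grapes", "Slovakia"),
    (5, "Pineapple", "Greece"), (8, "Banana", "Czech Republic"),
    (14, "Apples", "India"), (23, "Pineapple", "Hungary"), (19, "Peach", "USA"),
    (2, "Strawberries", "France"),
]
new_list = [  # (Name, Country) from the New List; its Item No. column is not needed
    ("Grapes", "Slovakia"), ("Mango", "Pakistan"), ("Apples", "India"),
    ("Oranges", "Mexico"), ("Peach", "USA"), ("Strawberries", "France"),
    ("Apples", "Italy"), ("Banana", "Spain"), ("Pineapple", "Hungary"),
    ("Avocado", "Netherlands"),
]

# Lookup table keyed on the concatenated value
item_by_key = {make_key(name, country): no for no, name, country in old_list}

common, new_entries = [], []
for name, country in new_list:
    item_no = item_by_key.get(make_key(name, country))  # None ~ #N/A
    if item_no is None:
        new_entries.append(("New entry", name, country))
    else:
        common.append((item_no, name, country))

# Common entries first (in New List order), then the new ones
for row in common + new_entries:
    print(*row)
```

With the question's lists, the common entries come out as item numbers 4, 14, 19, 2, 1, 3, 23 followed by Mango, Oranges and Avocado flagged as new, in the same order as the expected output.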
In SharePoint, I have a table with a lookup column that accepts multiple values. I am having trouble finding a formula for a calculated column that counts how many values the user chose. For example:
Favorite Ice Cream
1 Vanilla; Chocolate; Strawberry
2 Cookies and Cream; Mint
3 Coconut; Chocolate; Butter Pecan; Strawberry
An adjacent column would then show:
Total Number of Ice Creams
1 3
2 2
3 4
Thank you in advance!
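No answer to this question is included here. As an illustration of the counting logic only, here is a Python sketch that counts the entries of a ';'-separated string like the ones shown. It assumes the chosen values are available as such a string; whether a SharePoint calculated column can reference a multi-value lookup column at all is not covered here. In spreadsheet-style formulas the same count is often written as separators plus one, e.g. LEN(x) - LEN(SUBSTITUTE(x, ";", "")) + 1.

```python
def count_choices(value):
    """Count the entries in a ';'-separated multi-value string (0 if empty)."""
    if value is None:
        return 0
    # Split on the separator and ignore blank fragments, e.g. a trailing ';'
    return len([part for part in value.split(";") if part.strip()])


for favorites in ["Vanilla; Chocolate; Strawberry",
                  "Cookies and Cream; Mint",
                  "Coconut; Chocolate; Butter Pecan; Strawberry"]:
    print(count_choices(favorites))  # 3, then 2, then 4
```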
Straight to the point:
I am using SQL Server Manager 2008 R2. I have a table with the columns "product name" and "product size". The size of the product is recorded in its name, like this:
Ellas Kitchen Apple & Ginger Baby Cookies 120g
Ellas Kitchen Apple, Raisin 'n' Cinnamon Bakey Bakies 4 x 12g
Elastoplast Spray Plaster 32.5ml
Ellas Kitchen Stage 1 Butternut Squash Each
The sizes of these products should be:
120g
4 x 12g
32.5ml
N/A
(Some products have no size in their name; their size should be set to "N/A".)
I want to write a T-SQL statement that updates the product size by extracting it from the product name.
I have done this in JavaScript, but to do things properly I need a SQL statement, and that's my problem: I have found it very difficult to work with regular expressions in T-SQL.
I have seen an example of how to get only the number from a string, but I have no idea how to adapt it to my case in SQL:
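-- Extract the first run of digits from a string
Declare @ProductName varchar(100)
Select @ProductName = 'dsadadsad15234Nudsadadmbers'
-- Drop everything before the first digit
Select @ProductName = SubString(@ProductName, PATINDEX('%[0-9]%', @ProductName), Len(@ProductName))
-- Keep the characters before the first non-digit (assumes one exists)
Select @ProductName = SubString(@ProductName, 0, PATINDEX('%[^0-9]%', @ProductName))
Select @ProductName  -- 15234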
I would appreciate any example or idea.
Thanks in advance.
EDIT:
Thanks for your reply, xQbert.
I have not included all possible formats, because I think that with a working example for a few of them I will be able to handle the rest. Anyway, to give more detail, here are the possible formats:
( Inumber + "x" + Dnumber + word)* + (_)* + (Dnumber + word)*
- * means 0 or more
where word can be - g, kg, ml, cl, pack
where Inumber is integer
where Dnumber is double
where _ is space
For example:
12 x 100g
100ml
2 x kg
And the size (if there is one) is always at the end of the name:
Product name + product size
For example:
Organix Organic Chick Pea & Red Pepper Lasagne 190g
Organix Organic Vegetable & Pork Risotto 250g
Organix Rice Cakes apple 50g
Organix Rusks 7m+ 6 Pack
Organix Savoury Biscuits Cheese & Onion Each
Organix Savoury Biscuits Tomato & Basil Each
Organix Stage 1 Squash & Chicken 2 x 120g
PATINDEX is not a regular expression engine, and T-SQL has limited string-processing logic compared to .NET. Have you considered CLR integration?
http://msdn.microsoft.com/en-us/library/ms131089(SQL.100).aspx
This one is from 2005, but it is an example of using regular expressions in SQL Server via CLR integration:
http://msdn.microsoft.com/en-us/library/ms345136(v=SQL.90).aspx
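To show what such a regular expression could look like, here is a Python sketch of the grammar from the question's EDIT: units g, kg, ml, cl or pack; an optional "N x" multiplier; the size always at the end of the name; "N/A" when there is none. It is not T-SQL: on SQL Server the pattern would have to run through a CLR function as the links above describe. extract_size is a hypothetical name, and the pattern uses only constructs that .NET's Regex also supports.

```python
import re

# Units listed in the question's EDIT
UNITS = r"(?:kg|g|ml|cl|pack)"

# Size at the very end of the name, either
#   [N x ] number unit   e.g. 120g, 32.5ml, 4 x 12g, 6 Pack
#   N x unit             e.g. 2 x kg
SIZE_RE = re.compile(
    rf"\b((?:\d+\s*x\s*)?\d+(?:\.\d+)?\s*{UNITS}|\d+\s*x\s*{UNITS})$",
    re.IGNORECASE,
)


def extract_size(product_name):
    """Return the trailing size of a product name, or "N/A" if there is none."""
    match = SIZE_RE.search(product_name.strip())
    return match.group(1) if match else "N/A"


names = [
    "Ellas Kitchen Apple & Ginger Baby Cookies 120g",
    "Ellas Kitchen Apple, Raisin 'n' Cinnamon Bakey Bakies 4 x 12g",
    "Elastoplast Spray Plaster 32.5ml",
    "Ellas Kitchen Stage 1 Butternut Squash Each",
    "Organix Rusks 7m+ 6 Pack",
    "Organix Stage 1 Squash & Chicken 2 x 120g",
]
for name in names:
    print(f"{name!r} -> {extract_size(name)}")
```

Requiring a digit before the unit keeps names that merely end in a letter such as "g" (e.g. "...Pudding") from being read as a size.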