Combining rows with nearly same fields

I have two data frames and want to combine them into a single data frame. I merged the two frames on a common key. The result is a data frame in which some rows have nearly identical fields and differ only in a few columns. I want to combine these nearly identical rows into a single row, adding columns as needed to hold the differing values.
Here are the data frames:
stores:
Banner - Region - Store ID
Walmart - NC - 66999
TJ - NY - 4698
prices:
Price - Store ID - UPC
3.6 - 66999 - 234565
4.5 - 4698 - 334526
I already merged the two frames and experimented a little to get closer to the desired frame.
store_cross = pd.crosstab(stores['Store ID'],stores['Region'],margins=True)
merged_df2 = pd.merge(store_cross,prices,left_on='Store ID', right_on='Store ID')
merged_df2 = pd.merge(merged_df2,stores,left_on='Store ID', right_on='Store ID')
This is the result so far:
NY - NC - Price - UPC - Banner
1 - 0 - 3.6 - 234565 - Walmart
0 - 1 - 4.5 - 334526 - TJ
It is possible to have a UPC at different stores. It means that there are other rows in the frame that have the same UPC and Banner but at different locations.
What I am looking to have is something like this:
Banner - UPC - NC - NY
Walmart - 234565 - 3.9 - 3.6
TJ - 334526 - 4.5 - 4.3

I believe you need to merge first and then use DataFrame.pivot_table:
df = pd.merge(stores, prices, on='Store ID')
store_cross = df.pivot_table(index=['Banner', 'Store ID', 'UPC'],
                             columns='Region',
                             values='Price',
                             aggfunc='sum').reset_index()
print(store_cross)
Region   Banner  Store ID     UPC   NC   NY
0            TJ      4698  334526  NaN  4.5
1       Walmart     66999  234565  3.6  NaN
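If the goal is one row per Banner and UPC with a price column per region, as in the desired output of the question, a variation that may help is to leave 'Store ID' out of the pivot index. This is only a sketch: the second store for each banner (IDs 67000 and 4700) and its price are hypothetical rows added so that there is something to combine, and aggfunc='mean' is an assumption about how duplicate prices within a region should be resolved.

```python
import pandas as pd

# The first two rows of each frame come from the question; the stores with
# IDs 67000 and 4700 and their prices are hypothetical additions.
stores = pd.DataFrame({
    "Banner": ["Walmart", "TJ", "Walmart", "TJ"],
    "Region": ["NC", "NY", "NY", "NC"],
    "Store ID": [66999, 4698, 67000, 4700],
})
prices = pd.DataFrame({
    "Price": [3.6, 4.5, 3.9, 4.3],
    "Store ID": [66999, 4698, 67000, 4700],
    "UPC": [234565, 334526, 234565, 334526],
})

df = pd.merge(stores, prices, on="Store ID")

# Without 'Store ID' in the index, rows sharing a Banner and UPC collapse
# into one row, with one price column per region.
wide = df.pivot_table(index=["Banner", "UPC"],
                      columns="Region",
                      values="Price",
                      aggfunc="mean").reset_index()
wide.columns.name = None  # drop the leftover 'Region' axis label
print(wide)
```

With 'Store ID' kept in the index, every store stays on its own row, which is why the output above still shows NaN in one of the region columns.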

Related

How can I understand the semantic meaning of different values?

I want to get Apple's financial data. Download https://www.sec.gov/files/dera/data/financial-statement-and-notes-data-sets/2022_01_notes.zip from https://www.sec.gov/dera/data/financial-statement-and-notes-data-set.html, extract it, and put the contents in /tmp/2022_01_notes. The definitions of the sub and num tables and their fields are in https://www.sec.gov/files/aqfsn_1.pdf.
I computed the zip file's MD5 message digest:
md5sum 2022_01_notes.zip
b1cdf638200991e1bbe260489093bf67 2022_01_notes.zip
You can download it from the official web page or from my Dropbox:
https://www.dropbox.com/s/5ntwasipze8vr29/2022_01_notes.zip?dl=0
Wherever you download it from, please check the md5sum value; the SEC may have uploaded the wrong file and may update the zip file in the future.
import pandas as pd
df_sub = pd.read_csv('/tmp/2022_01_notes/sub.tsv',sep='\t')
df_sub = df_sub[df_sub['cik'] == 320193]  # Apple's CIK is 320193
df_sub
adsh cik name sic countryba stprba cityba ... instance nciks aciks pubfloatusd floatdate floataxis floatmems
4329 0000320193-22-000006 320193 APPLE INC 3571.0 US CA CUPERTINO ... aapl-20220127_htm.xml 1 NaN NaN NaN NaN NaN
4731 0000320193-22-000007 320193 APPLE INC 3571.0 US CA CUPERTINO ... aapl-20211225_htm.xml 1 NaN NaN NaN NaN NaN
0000320193-22-000007 is the accession number for its fiscal 2022 Q1 data (quarter ended December 25, 2021).
df_num = pd.read_csv('/tmp/2022_01_notes/num.tsv',sep='\t')
# get all of Apple's financial data as XBRL concepts
df_apple = df_num[df_num['adsh'] == '0000320193-22-000007' ]
# extract a single concept: RevenueFromContractWithCustomerExcludingAssessedTax,
# the XBRL taxonomy concept that corresponds to revenue in financial accounting
df_apple_revenue = df_apple[df_apple['tag'] == 'RevenueFromContractWithCustomerExcludingAssessedTax']
df_apple_revenue_2021 = df_apple_revenue[df_apple_revenue['ddate'] == 20201231]
df_apple_revenue_2021
The dataframe is too wide to display in my terminal, so I write it to a CSV file
df_apple_revenue_2021.to_csv('/tmp/apple_revenue_2021.csv')
and open it in Excel; the content is pasted here.
For the first two lines, what do 8285000000 and 15761000000 mean? Please give a reasoned explanation of 8285000000 and 15761000000.
0000320193-22-000007 RevenueFromContractWithCustomerExcludingAssessedTax us-gaap/2021 20201231 1 USD 0xf159835fd3644f228d15724ad9d1837c 0 8285000000 0 1 0.013698995 5 -6
0000320193-22-000007 RevenueFromContractWithCustomerExcludingAssessedTax us-gaap/2021 20201231 1 USD 0x58c22680ab8dbbfb662ff4e14055c1bd 1 15761000000 0 1 0.013698995 5 -6

To explain these figures, you have to tie them back to the filing from which they were extracted. In this case, the filing with accession number 0000320193-22-000007 is the Form 10-Q for the fiscal quarter ended December 25, 2021. If you check that filing, you'll find, for example, seven of the value numbers in your dataframe in the table Net sales by reportable segment, specifically under Three Months Ended December 26, 2020.
So, for example, 8285000000 refers to the Japan segment for that period, while 15761000000 is in the Net sales by category table, for the Services category in the same reporting period. That table contains six more of the values in the dataframe.
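A possible way to make that tie-back less manual is to look up the dimension hash of each row. The sketch below runs on toy frames, not the real files. It assumes the data set ships a dim.tsv whose dimhash column matches the hash column of num.tsv (called dimh here) and whose segments column describes the breakdown; those column names should be checked against aqfsn_1.pdf. The segment labels in the toy dim frame are hypothetical placeholders that merely mirror the Japan and Services findings above.

```python
import pandas as pd

# Toy stand-in for the two num.tsv rows (hashes and values from the question).
# The column name 'dimh' is an assumption about the num.tsv header.
num = pd.DataFrame({
    "dimh": ["0xf159835fd3644f228d15724ad9d1837c",
             "0x58c22680ab8dbbfb662ff4e14055c1bd"],
    "value": [8285000000, 15761000000],
})

# Toy stand-in for dim.tsv. Column names are assumed and the labels are
# hypothetical; the real file may encode segments differently.
dim = pd.DataFrame({
    "dimhash": ["0xf159835fd3644f228d15724ad9d1837c",
                "0x58c22680ab8dbbfb662ff4e14055c1bd"],
    "segments": ["Segments=Japan;", "ProductOrService=Service;"],
})

# Attach a readable segment description to every value.
labelled = num.merge(dim, left_on="dimh", right_on="dimhash", how="left")
print(labelled[["value", "segments"]])
```

With the real files, the two toy frames would be replaced by pd.read_csv calls on num.tsv and dim.tsv, as in the question.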

Is there a way to count unique values in column X where Y and Z match set criteria?

I have a list of data in columns A, B, C and D with lots of duplicate names. I want to count each name only once, and only if B and C match the set criteria.
Example:
Peter - Clerk - Working - Male
Peter - Clerk - Working - Male
Steve - Clerk - Working - Male
John - Manager - Working - Male
John - Manager - Working - Male
Dave - Clerk - Working - Male
I have tried various SUM and COUNTIFS formulas and managed to count the number of unique names, but when I try to add filters for "clerk" and "working" I fail to build a coherent formula.
{=SUM(1/(COUNTIF(Avdelingsmøte_FK!Deltaker,Avdelingsmøte_FK!Deltaker)))}
The criteria are Clerk and Working, so I expect the count to return 3. With my current formula I get 4.

You need to use the criteria as a filter in the numerator, then change the COUNTIF to a COUNTIFS and supplement it with the conditions, and finally add the inverse of the numerator to the denominator so that non-matching rows never divide by zero.
=SUMPRODUCT(((B2:B7="clerk")*(C2:C7="working"))/(COUNTIFS(A2:A7,A2:A7,B2:B7,"clerk",C2:C7,"working")+((B2:B7<>"clerk")+(C2:C7<>"working"))))
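The arithmetic behind that formula can be checked outside Excel. The following Python sketch is an illustration, not part of the original answer: it replays the same idea on the example rows, where every matching row contributes 1 divided by the number of matching rows that share its name, so each name sums to exactly 1.

```python
from collections import Counter

rows = [
    ("Peter", "Clerk", "Working", "Male"),
    ("Peter", "Clerk", "Working", "Male"),
    ("Steve", "Clerk", "Working", "Male"),
    ("John", "Manager", "Working", "Male"),
    ("John", "Manager", "Working", "Male"),
    ("Dave", "Clerk", "Working", "Male"),
]

def matches(row):
    # Excel's = comparison ignores case, so compare case-insensitively here.
    return row[1].lower() == "clerk" and row[2].lower() == "working"

# COUNTIFS analogue: number of matching rows per name.
per_name = Counter(row[0] for row in rows if matches(row))

# SUMPRODUCT analogue: matching rows add 1/count, non-matching rows add 0.
unique_count = sum(1 / per_name[row[0]] for row in rows if matches(row))
print(round(unique_count))
```

Peter appears twice and contributes 1/2 + 1/2, Steve and Dave contribute 1 each, and John is filtered out, which gives the expected count of 3.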

Using SUMIFS to sum all rows matching one criteria within a column matched by another criteria

I think I'm very close to what I want but I'm still getting an #N/A error -
I have some wage sheets that cross reference a labour table 'Table1' which stores the information of my employees (Pay code, Site, Contracted Hours etc). In Table1 I have columns titled 1-10 which values.
On the wage sheet I have a cell 'AM3' that will be a number between 1-10. Depending on that cell, the cell below should sum up all values in that column for all staff at that particular site.
For example - I have a wage sheet for site 'EXAMPLE SITE' which is stored in cell C2 and cell AM3 = 9.
I am trying to use the following formula to make this work:
=SUMIFS(INDEX(Table1,,MATCH(AM3,Table1[#Headers]),0),Table1[[Site]:[Site]],$C$2)
That is, I'm checking Table1, and finding the column headed with the value contained in cell AM3 (with an exact match). criteria_range1 is the Site column and criteria1 is 'EXAMPLE SITE' stored in C2.
I would expect this to sum every cell in column header 9, matching Site 'EXAMPLE SITE'. But I just get the #N/A error.
Table1:
Name - Site - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10
Tom - EXAMPLE SITE - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 10 - 20
Geoff- EXAMPLE SITE - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 9 - 18
Sarah- RANDOM SITE - 0 - 0 - 0 - 0 - 0 - 0 - 5 - 15 - 25 - 40
With that example I want the formula to return '19' as a numerical value. I feel like I'm just being dumb but no amount of googling is helping me.

MATCH seems to have a hard time matching table headers against numeric values, most likely because table headers are always stored as text. Try this:
=SUMIFS(INDEX(Table1,,MATCH(AM3,INDEX(Table1[#Headers]*1,),0)),Table1[[Site]:[Site]],$C$2)

Same problem with MATCH as in the earlier answer, but an alternate resolution: convert AM3 to text instead.
=SUMIFS(INDEX(Table1[[1]:[10]],0,MATCH(TEXT(AM3, "0"),Table1[[#Headers],[1]:[10]],0)),Table1[Site],C2)
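The same text-versus-number mismatch can be reproduced in Python, which may make the cause easier to see. This is a sketch for illustration only: it rebuilds the example Table1 as a pandas frame with string headers (mirroring how Excel stores table headers) and uses the hypothetical variables am3 and site for cells AM3 and C2.

```python
import pandas as pd

# Headers are strings, as they would be in an Excel table.
columns = ["Name", "Site"] + [str(n) for n in range(1, 11)]
table1 = pd.DataFrame(
    [
        ["Tom", "EXAMPLE SITE", 0, 0, 0, 0, 0, 0, 0, 0, 10, 20],
        ["Geoff", "EXAMPLE SITE", 0, 0, 0, 0, 0, 0, 0, 0, 9, 18],
        ["Sarah", "RANDOM SITE", 0, 0, 0, 0, 0, 0, 5, 15, 25, 40],
    ],
    columns=columns,
)

am3 = 9                # stands in for cell AM3 (a number)
site = "EXAMPLE SITE"  # stands in for cell C2

print(am3 in table1.columns)       # the number matches no header
print(str(am3) in table1.columns)  # its text form does

# Equivalent of the SUMIFS: sum column "9" for rows at the chosen site.
total = table1.loc[table1["Site"] == site, str(am3)].sum()
print(total)
```

Converting the lookup value to text corresponds to the TEXT(AM3, "0") approach; converting the headers to numbers corresponds to the Table1[#Headers]*1 approach.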

Excel with rows of details related to other row

Is there a way to have a group of rows related to another row in the same sheet, as more detailed information? Obviously, they must always stay next to the main row if you filter or sort.
Desired example based on vehicles and travels:
       A            B               C          D
   1   [ID]         [VEHICLE TYPE]  [BRAND]    [COLOUR]
+  2   A-171        PICKUP          HONDA      BLACK
-  3   [TRAVEL]     [KM]            [STATION]
-  4   12/08/2016   13.000          BARCELONA
-  5   13/08/2016   13.750          DONOSTI
+  6   B-501        VAN             RENAULT    WHITE
-  7   [TRAVEL]     [KM]            [STATION]
-  8   12/08/2016   117.800         PARIS
-  9   13/08/2016   120.000         AMSTERDAM
- 10   14/08/2016   124.320         MUNICH
So when you sort the spreadsheet, the travel rows should always stay next to their vehicle row.
Is that possible? If not, what can I do to get this or something similar? (I don't mind using another sheet tab, but that wouldn't be ideal.)

You can use the Group function (Alt-A-G-G), and they won't be sorted as usual if you use sort on the whole column

Find and count matches between two arrays in excel

I'm trying to make a table that tracks pupil progress. Upon entry, pupils are given a predicted grade in all their subjects. Later in the year, their teachers give them an exam (which has a grade) and then give a set of new predicted grades.
This gives me two tables (each on a different sheet)...
-------------------------------------
-Entry - Subject -
-------------------------------------
-Student - Art - Maths - French -
-------------------------------------
- Jane - U - U - n -
-------------------------------------
- Alice - E - A+ - n -
-------------------------------------
- Tom - D - A - c -
-------------------------------------
and
----------------------------------------------------------------------
-Later - Subject -
----------------------------------------------------------------------
-Student - Art Exam - Art New Grade - Maths Exam - Maths New Grade -
----------------------------------------------------------------------
- Jane - U - U - E - E -
----------------------------------------------------------------------
- Alice - D - D - A+ - A+ -
----------------------------------------------------------------------
- Tom - C - B - A - A+ -
----------------------------------------------------------------------
I have created a dashboard that has two drop downs where users can select the subject and then the comparison (exam grade, new grade, best possible grade, ...). Using SUMPRODUCT I can take that input and easily count the grades in the matching columns.
What I want to do is create a table similar to the one below that can show how the pupils have changed between the two tracking periods...
--------------------------------------------
- Subject - New Grade -
- Art - n - U - E - D - C - B - A - A+ -
--------------------------------------------
- E n - - - - - - - - -
-- -----------------------------------------
- n U - - 1 - - - - - - -
-- -----------------------------------------
- t E - - - - 1 - - - - -
-- -----------------------------------------
- r D - - - - - - 1 - - -
-- -----------------------------------------
- y C - - - - - - - - -
--------------------------------------------
- B - - - - - - - - -
--------------------------------------------
- A - - - - - - - - -
--------------------------------------------
- A+ - - - - - - - - -
--------------------------------------------
Each cell counts the number of times it finds matching values between the two arrays where subject = chosen value (in this case Art) and comparison = chosen value (in this case new grade). I don't mind null values being zero or blank. I need to have the count of matches so I can then look at how many pupils are making progress (getting a letter earlier in the alphabet than the test upon entry suggested).
Ideally, I'll also end up making both rows and columns selectable so teachers can compare exam result to new predicted grade.
In my two arrays, the students appear in the same order (one less criterion to worry about) but the subjects and comparators don't. (NB - I've been using concatenation to merge subject and comparator)
All the help I've seen so far expects the data to be in just two rows, but I've 20 subjects and then each subject can have 5 or so comparisons to be made!
Thanks for any advice!

Try this:
=IF(SUMPRODUCT(($B3=INDEX(Sheet1!$B$3:$D$5,0,MATCH($A$2,Sheet1!$B$2:$D$2,0)))*(C$2=INDEX(Sheet2!$B$3:$E$5,0,MATCH($A$2 & " New Grade",Sheet2!$B$2:$E$2,0))))=0,"",SUMPRODUCT(($B3=INDEX(Sheet1!$B$3:$D$5,0,MATCH($A$2,Sheet1!$B$2:$D$2,0)))*(C$2=INDEX(Sheet2!$B$3:$E$5,0,MATCH($A$2 & " New Grade",Sheet2!$B$2:$E$2,0)))))
If you don't mind the 0s, then just the SUMPRODUCT part of the above IF statement will work:
=SUMPRODUCT(($B3=INDEX(Sheet1!$B$3:$D$5,0,MATCH($A$2,Sheet1!$B$2:$D$2,0)))*(C$2=INDEX(Sheet2!$B$3:$E$5,0,MATCH($A$2 & " New Grade",Sheet2!$B$2:$E$2,0))))
For Reference:
Entry sheet (Sheet1):
New Grade Sheet (Sheet2)
One caveat: the students must be in the same order on both sheets.
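For comparison, the same transition matrix can be sketched in Python with pandas. This is an illustration only, not part of the original answer: the frames entry and later are rebuilt from the example tables in the question, subject and comparison stand in for the two dashboard drop-downs, and the grade order is taken from the desired output table.

```python
import pandas as pd

entry = pd.DataFrame({
    "Student": ["Jane", "Alice", "Tom"],
    "Art": ["U", "E", "D"],
    "Maths": ["U", "A+", "A"],
    "French": ["n", "n", "c"],
})
later = pd.DataFrame({
    "Student": ["Jane", "Alice", "Tom"],
    "Art Exam": ["U", "D", "C"],
    "Art New Grade": ["U", "D", "B"],
    "Maths Exam": ["E", "A+", "A"],
    "Maths New Grade": ["E", "A+", "A+"],
})

grades = ["n", "U", "E", "D", "C", "B", "A", "A+"]
subject = "Art"           # first drop-down
comparison = "New Grade"  # second drop-down

# Rows are entry grades, columns are the compared grades. The two frames are
# paired by row position, so students must be in the same order in both.
matrix = (
    pd.crosstab(entry[subject], later[f"{subject} {comparison}"])
    .reindex(index=grades, columns=grades, fill_value=0)
)
print(matrix)
```

Each cell is the count that the SUMPRODUCT formula computes for one pair of row and column headers; reindex fills the grade pairs that never occur with 0.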
