transform text file to data table using python - python-3.x

I have text data available in the following format:
title:- sell product A,
description:- good quality product, fast service,
title:- sell product B,
description:- long lasting and good quality,
title:- buy product C,
description:- latest, good price
My goal is to get the text values into a dataframe with title and description values in corresponding columns. For example:
Title | Description
sell product A | good quality product, fast service
sell product B | long lasting and good quality
buy product C | latest, good price

Edit Ver 1: Read file and split into two columns
I am assuming that the pattern of your txt file is:
- `title:-` followed by data
- `description:-` followed by data
- `title:-` immediately after a description line
- `description:-` immediately after a title line

If the pattern is not `title:-` and `description:-` in strict alternation, the below code will NOT work.
I also assume there is no blank line separating the title and description lines. If there are blank lines, they need to be excluded before we do the splitting into title and description.
Read the full file into a list, then separate the data into two lists, then strip the leading text and load into pandas. I have added comments to each line so you can see what's being done.
import pandas as pd

with open('abc.txt','r') as f:
    # read all rows into a list while also stripping \n
    data = [line.rstrip() for line in f.readlines()]

# [::2] selects alternate rows, starting from 0. [8:] excludes "title:- "
# adding rstrip(',') removes the trailing "," from the string
title = [d[8:].rstrip(',') for d in data[::2]]

# [1::2] selects alternate rows, starting from 1. [14:] excludes "description:- "
# adding rstrip(',') removes the trailing "," from the string
desc = [d[14:].rstrip(',') for d in data[1::2]]

df = pd.DataFrame({'Title': title, 'Description': desc})
print(df)
The output of this will be:
Title Description
0 sell product A good quality product, fast service
1 sell product B long lasting and good quality
2 buy product C latest, good price
For the above example, the source data in abc.txt file is as follows:
title:- sell product A,
description:- good quality product, fast service,
title:- sell product B,
description:- long lasting and good quality,
title:- buy product C,
description:- latest, good price
Edit Ver 2: If title & description rows do not match
If you are unsure of the datafile and want to play it safe, you can create the dataframe as follows:
df = pd.DataFrame({'Title':pd.Series(title),'Description':pd.Series(desc)})
Here's an example:
I modified the input file to be as follows:
title:- sell product A,
description:- good quality product, fast service,
title:- sell product B,
description:- long lasting and good quality,
title:- buy product C,
description:- latest, good price
title:- buy product D,
The output of this will be:
Title Description
0 sell product A good quality product, fast service
1 sell product B long lasting and good quality
2 buy product C latest, good price
3 buy product D NaN
In the above example, the pattern breaks for the last row. We don't have a matching description value for the title. This results in a NaN in the Description column.
Edit Ver 3: Input file has blank line after each row
If your input file has a blank line in between Title and Description rows, then you can add this code immediately after reading the file.
with open('abc.txt','r') as f:
    # read all rows into a list while also stripping \n
    data = [line.rstrip() for line in f.readlines()]

# removing the blank lines from data
data = data[::2]
Input sample file:
title:- sell product A,
description:- good quality product, fast service,
title:- sell product B,
description:- long lasting and good quality,
title:- buy product C,
description:- latest, good price
title:- buy product D,
Output DataFrame:
Title Description
0 sell product A good quality product, fast service
1 sell product B long lasting and good quality
2 buy product C latest, good price
3 buy product D NaN
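As a pattern-agnostic alternative (my own sketch, not part of the answer above), you can key each line on its prefix rather than on its position; this tolerates blank lines, broken alternation, and a trailing title with no description:

```python
import pandas as pd

lines = [
    "title:- sell product A,",
    "",
    "description:- good quality product, fast service,",
    "title:- buy product D,",
]

rows, current = [], {}
for line in lines:
    line = line.strip().rstrip(',')
    if line.startswith("title:-"):
        # a new title starts a new record
        current = {"Title": line[len("title:-"):].strip()}
        rows.append(current)
    elif line.startswith("description:-"):
        current["Description"] = line[len("description:-"):].strip()

df = pd.DataFrame(rows)
print(df)
```

A title with no matching description simply comes out as NaN, the same behavior as Ver 2.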

Related

How to join two dataframes with multiple overlap in pyspark

Hi, I have a dataset of multiple households where all people within households have been matched between two datasources. The dataframe therefore consists of a 'household' col and two person cols (one for each datasource). However some people (like Jonathan or Peter below) were not able to be matched and so have a blank second person column.
Household | Person_source_A | Person_source_B
1 | Oliver | Oliver
1 | Jonathan |
1 | Amy | Amy
2 | David | Dave
2 | Mary | Mary
3 | Lizzie | Elizabeth
3 | Peter |
As the dataframe is gigantic, my aim is to take a sample of the unmatched individuals, and then output a df that has all people within households where sampled unmatched people exist. I.e., say my random sample includes Oliver but not Peter; then I would only keep household 1 in the output.
My issue is I've filtered to take the sample and now am stuck making progress. Some combination of join and agg/groupBy will work, but I'm struggling. I add a flag to the sampled unmatched names to identify them, which I think is helpful...
My code:
# filter to unmatched people
df_unmatched = df.filter(col('per_A').isNotNull() & col('per_B').isNull())
# take random sample of 10%
df_unmatched_sample = df_unmatched.sample(0.1)
# add flag of sampled unmatched persons
df_unmatched_sample = df_unmatched_sample.withColumn('sample_flag', lit('1'))
As it pertains to your intent:
I just want to reduce my dataframe to only show the full households of
households where an unmatched person exists that has been selected by
a random sample out of all unmatched people
Using your existing approach, you could use a join on the Household of the sample records:
# filter to unmatched people
df_unmatched = df.filter(col('per_A').isNotNull() & col('per_B').isNull())
# take random sample of 10%
df_unmatched_sample = df_unmatched.sample(0.1).select("Household").distinct()
# keep every row belonging to a sampled household
desired_df = df.join(df_unmatched_sample, ["Household"], "inner")
Edit 1
In response to OP's comment:
Is there a slightly different way that keeps a flag to identify the
sampled unmatched person (as there are some households with more than
one unmatched person)?
A left join on your existing dataset after adding the flag column to your sample may help you to achieve this, e.g.:
# filter to unmatched people
df_unmatched = df.filter(col('per_A').isNotNull() & col('per_B').isNull())
# take random sample of 10%
df_unmatched_sample = df_unmatched.sample(0.1).withColumn('sample_flag', lit('1'))
desired_df = (
    df.alias("dfo").join(
        df_unmatched_sample.alias("dfu"),
        [
            col("dfo.Household") == col("dfu.Household"),
            col("dfo.per_A") == col("dfu.per_A"),
            col("dfo.per_B").isNull()
        ],
        "left"
    )
)
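The join-on-Household idea is easy to test outside Spark. Here is the same logic sketched in pandas (my own illustration using the question's toy rows, with a hand-picked "sample" in place of the random 10% so the output is deterministic):

```python
import pandas as pd

# Toy version of the household-matching table from the question
df = pd.DataFrame({
    "Household": [1, 1, 1, 2, 2, 3, 3],
    "per_A": ["Oliver", "Jonathan", "Amy", "David", "Mary", "Lizzie", "Peter"],
    "per_B": ["Oliver", None, "Amy", "Dave", "Mary", "Elizabeth", None],
})

# Pretend the sample picked Jonathan (household 1) but not Peter (household 3)
sampled = df[(df["per_A"] == "Jonathan") & (df["per_B"].isna())]

# Inner join on Household keeps every member of a sampled household
households = sampled[["Household"]].drop_duplicates()
result = df.merge(households, on="Household", how="inner")
print(result)
```

Only household 1 survives, but all three of its members are kept, which is exactly the "full households" behavior asked for.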

How to group similar news text / article in pandas dataframe

I have a pandas dataframe of news articles. Suppose:

id | news title | keywords | publcation date | content
1 | Congress Wants to Beef Up Army Effort to Develop Counter-Drone Weapons | USA,Congress,Drone,Army | 2020-12-10 | SOME NEWS CONTENT
2 | Israel conflict: The range and scale of Hamas' weapons ... | Israel,Hamas,Conflict | 2020-12-10 | NEWS CONTENT
3 | US Air Force progresses testing of anti-drone laser weapons | USA,Air Force,Weapon,Dron | 2020-10-10 | NEWS CONTENT
4 | Hamas fighters display weapons in Gaza after truce with Israel | Hamas,Gaza,Israel,Weapon,Truce | 2020-11-10 | NEWS CONTENT
Now, how do I group similar data based on news content and sort by publication date?
Note: the content may be a summary of the news.
So that it displays as:
Group1

id | news title | keywords | publcation date | content
3 | US Air Force progresses testing of anti-drone laser weapons | USA,Air Force,Weapon,Dron | 2020-10-10 | NEWS CONTENT
1 | Congress Wants to Beef Up Army Effort to Develop Counter-Drone Weapons | USA,Congress,Drone,Army | 2020-12-10 | SOME NEWS CONTENT

Group2

id | news title | keywords | publcation date | content
4 | Hamas fighters display weapons in Gaza after truce with Israel | Hamas,Gaza,Israel,Weapon,Truce | 2020-11-10 | NEWS CONTENT
2 | Israel conflict: The range and scale of Hamas' weapons ... | Israel,Hamas,Conflict | 2020-12-10 | NEWS CONTENT
It's a little bit complicated. I chose the easy way for the similarity, but you can change the function as you wish.
You can also use https://pypi.org/project/pyjarowinkler/ for the is_similar function instead of the "set" intersection I used; the function can be much more sophisticated than the one I wrote.
I used two applies: the first one fits the "grps" dict. It will work without the first one, but it will be more accurate the second time.
You can also change range(3,-1,-1) to a higher number for more accuracy.
def is_similar(txt1, txt2, level=0):
    return len(set(txt1) & set(txt2)) > level

grps = {}

def get_grp_id(row):
    row_words = row['keywords'].split(',')
    if len(grps.keys()) == 0:
        grps[1] = set(row_words)
        return 1
    else:
        for level in range(3, -1, -1):
            for grp in grps:
                if is_similar(grps[grp], row_words, level):
                    grps[grp] = grps[grp] | set(row_words)
                    return grp
        # no existing group was similar enough: open a new one
        grp += 1
        grps[grp] = set(row_words)
        return grp

# first pass fits the groups, second pass assigns the final ids
df.apply(get_grp_id, axis=1)
df['grp'] = df.apply(get_grp_id, axis=1)
df = df.sort_values(['grp', 'publcation date'])
The sorted dataframe then matches the grouping shown in the question. If you want to split it into separate dataframes, let me know.
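As a quick sanity check, here is the same approach run end-to-end on the question's four sample rows (a self-contained sketch; the double apply is the fit-then-assign step the answer describes):

```python
import pandas as pd

def is_similar(txt1, txt2, level=0):
    return len(set(txt1) & set(txt2)) > level

grps = {}

def get_grp_id(row):
    row_words = row['keywords'].split(',')
    if len(grps) == 0:
        grps[1] = set(row_words)
        return 1
    for level in range(3, -1, -1):
        for grp in grps:
            if is_similar(grps[grp], row_words, level):
                grps[grp] |= set(row_words)
                return grp
    # no existing group was similar enough: open a new one
    grp += 1
    grps[grp] = set(row_words)
    return grp

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'keywords': ['USA,Congress,Drone,Army',
                 'Israel,Hamas,Conflict',
                 'USA,Air Force,Weapon,Dron',
                 'Hamas,Gaza,Israel,Weapon,Truce'],
    'publcation date': ['2020-12-10', '2020-12-10', '2020-10-10', '2020-11-10'],
})

df.apply(get_grp_id, axis=1)              # first pass: fit the groups
df['grp'] = df.apply(get_grp_id, axis=1)  # second pass: assign stable ids
df = df.sort_values(['grp', 'publcation date'])
print(df[['grp', 'id']])
```

The two drone articles (ids 3 and 1) end up in group 1 and the two Hamas/Israel articles (ids 4 and 2) in group 2, each sorted by date, matching the desired output.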

Reading in CSVs and how to write the name of the CSV file into every row of the CSV

I have about 2,000 CSVs I was hoping to read into a df, but first I was wondering how someone would (before joining all the CSVs) write the name of each CSV into every row of that CSV. For example, in CSV1 there would be a column that says "CSV1" in every row, and the same for CSV2, 3, etc.
Was wondering if there was a way to accomplish this?
import os
import glob
import pandas as pd
os.chdir(r"C:\Users\User\Downloads\Complete Corporate Financial History")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
The csv files all look like this:
https://docs.google.com/spreadsheets/d/1hOb_nNjB3K8ldyyBUemQlcsTWcjyD8iLh8XMa5XB8Qk/edit?usp=sharing
They don't have the Ticker (file name) in each row though.
Edit: Here are the column headers: Quarter end Shares Shares split adjusted Split factor Assets Current Assets Liabilities Current Liabilities Shareholders equity Non-controlling interest Preferred equity Goodwill & intangibles Long-term debt Revenue Earnings Earnings available for common stockholders EPS basic EPS diluted Dividend per share Cash from operating activities Cash from investing activities Cash from financing activities Cash change during period Cash at end of period Capital expenditures Price Price high Price low ROE ROA Book value of equity per share P/B ratio P/E ratio Cumulative dividends per share Dividend payout ratio Long-term debt to equity ratio Equity to assets ratio Net margin Asset turnover Free cash flow per share Current ratio
and the rows descend by quarter.
Sample Data
,Quarter end,Shares,Shares split adjusted,Split factor,Assets,Current Assets,Liabilities,Current Liabilities,Shareholders equity,Non-controlling interest,Preferred equity,Goodwill & intangibles,Long-term debt,Revenue,Earnings,Earnings available for common stockholders,EPS basic,EPS diluted,Dividend per share,Cash from operating activities,Cash from investing activities,Cash from financing activities,Cash change during period,Cash at end of period,Capital expenditures,Price,Price high,Price low,ROE,ROA,Book value of equity per share,P/B ratio,P/E ratio,Cumulative dividends per share,Dividend payout ratio,Long-term debt to equity ratio,Equity to assets ratio,Net margin,Asset turnover,Free cash flow per share,Current ratio
0,6/30/2019,440000000.0,440000000.0,1.0,17900000000.0,6020000000.0,13000000000.0,3620000000.0,4850000000.0,12000000.0,55000000,5190000000.0,5900000000.0,3.69E+09,-1.20E+08,-1.20E+08,-0.27,-0.27,0.08,1.06E+08,1.29E+08,-2.00E+08,34000000,1360000000.0,128000000.0,22.55,25.83,19.27,0.0855,0.0243,10.9,1.98,16.11,33.46,0.2916,1.2296,0.2679,0.0311,0.78,-0.05,1.662
1,3/31/2019,449000000.0,449000000.0,1.0,18400000000.0,6050000000.0,13200000000.0,3660000000.0,5170000000.0,12000000.0,55000000,5420000000.0,5900000000.0,3.54E+09,1.87E+08,1.86E+08,0.4,0.39,0.08,-2.60E+08,42000000,-7.40E+08,-9.60E+08,1330000000.0,164000000.0,18.37,20.61,16.12,0.1298,0.0373,11.39,1.61,14.13,33.38,0.1798,1.1542,0.2784,0.0485,0.77,-0.94,1.6543
2,12/31/2018,485000000.0,485000000.0,1.0,18700000000.0,6580000000.0,13100000000.0,3520000000.0,5570000000.0,12000000.0,55000000,7250000000.0,5900000000.0,3.47E+09,2.18E+08,2.18E+08,0.45,0.45,0.06,4.26E+08,3.54E+08,-4.00E+07,7.40E+08,2280000000.0,-31000000.0,19.62,23.6,15.63,0.1208,0.035,11.38,1.79,None,33.3,0.1813,1.0685,0.2952,0.0457,0.76,0.94,1.8696
3,9/30/2018,483000000.0,483000000.0,1.0,18300000000.0,6130000000.0,13000000000.0,3010000000.0,5360000000.0,14000000.0,55000000,5470000000.0,6320000000.0,3.52E+09,1.61E+08,1.60E+08,0.33,0.32,0.06,51000000,65000000,-3.20E+07,82000000,1540000000.0,207000000.0,19.88,23.13,16.64,-0.0594,-0.0165,10.98,1.86,None,33.24,None,1.1902,0.2895,None,0.75,-0.32,2.0345
4,6/30/2018,483000000.0,483000000.0,1.0,18200000000.0,6080000000.0,13000000000.0,2980000000.0,5200000000.0,14000000.0,55000000,5480000000.0,6310000000.0,3.57E+09,1.20E+08,1.20E+08,0.25,0.24,0.06,1.76E+08,1.17E+08,-3.50E+07,2.52E+08,1460000000.0,166000000.0,20.27,24.07,16.47,-0.069,-0.0186,10.66,1.88,None,33.18,None,1.2259,0.2826,None,0.73,0.02,2.0406
5,3/31/2018,483000000.0,483000000.0,1.0,18200000000.0,5900000000.0,12900000000.0,2800000000.0,5270000000.0,14000000.0,55000000,5560000000.0,6310000000.0,3.45E+09,1.43E+08,1.42E+08,0.3,0.29,0.06,-4.40E+08,29000000,-5.40E+08,-9.50E+08,1210000000.0,117000000.0,26.87,31.17,22.57,-0.0536,-0.0134,10.8,2.67,None,33.12,None,1.2102,0.2861,None,0.7,-1.15,2.1039
6,12/31/2017,483000000.0,483000000.0,1.0,18700000000.0,6380000000.0,13800000000.0,2820000000.0,4910000000.0,14000000.0,55000000,7410000000.0,6810000000.0,3.27E+09,-7.30E+08,-7.30E+08,-1.51,-1.51,0.06,6.12E+08,-2.40E+08,-4.50E+07,3.35E+08,2150000000.0,236000000.0,25.3,27.85,22.74,-0.0232,-0.0038,10.06,2.07,None,33.06,None,1.4019,0.2594,None,0.67,0.78,2.2585
7,9/30/2017,481000000.0,481000000.0,1.0,19200000000.0,6150000000.0,13300000000.0,2680000000.0,5950000000.0,13000000.0,55000000,5250000000.0,6800000000.0,3.24E+09,1.19E+08,1.01E+08,0.23,0.22,0.06,1.72E+08,-1.30E+08,-1.50E+07,30000000,1820000000.0,131000000.0,24.76,26.84,22.67,-0.1222,-0.0308,12.24,1.92,None,33.0,None,1.1543,0.3063,None,0.65,0.09,2.2966
8,6/30/2017,441000000.0,441000000.0,1.0,19100000000.0,6030000000.0,13400000000.0,2660000000.0,5740000000.0,13000000.0,55000000,5220000000.0,6800000000.0,3.26E+09,2.12E+08,1.94E+08,0.44,0.43,0.06,2.17E+08,-1.30E+08,-8.60E+08,-7.70E+08,1790000000.0,125000000.0,25.2,28.65,21.75,-0.0899,-0.0231,12.89,2.05,None,32.94,None,1.1954,0.2976,None,0.61,0.21,2.2698
9,3/31/2017,441000000.0,441000000.0,1.0,20200000000.0,6710000000.0,14700000000.0,2590000000.0,5480000000.0,13000000.0,55000000,5170000000.0,8050000000.0,3.19E+09,3.22E+08,3.05E+08,0.69,0.65,0.06,-3.00E+08,1.03E+09,-4.30E+07,6.90E+08,2550000000.0,113000000.0,24.66,30.69,18.64,-0.0815,-0.0223,12.31,2.15,None,32.88,None,1.4826,0.2692,None,0.59,-0.94,2.5937
10,12/31/2016,441000000.0,441000000.0,1.0,20000000000.0,5890000000.0,14900000000.0,2750000000.0,5120000000.0,26000000.0,55000000,6940000000.0,8040000000.0,3.06E+09,-1.30E+09,-1.30E+09,-2.92,-2.92,7.76,6.62E+08,-2.40E+08,-4.00E+08,0,1860000000.0,302000000.0,24.43,32.1,16.75,-0.098,-0.029,11.49,0.91,None,32.82,None,1.5897,0.2525,None,0.57,0.82,2.1433
11,9/30/2016,438000000.0,438000000.0,1.0,37400000000.0,9370000000.0,23500000000.0,5500000000.0,11800000000.0,2170000000.0,55000000,5380000000.0,9500000000.0,5.21E+09,1.66E+08,1.48E+08,0.34,0.33,0.09,3.06E+08,-2.30E+08,-1.40E+08,-6.60E+07,1860000000.0,152000000.0,30,32.91,27.09,-0.0377,-0.0105,26.73,1.07,None,25.06,None,0.8107,0.313,None,0.57,0.35,1.7033
12,6/30/2016,1320000000.0,438000000.0,0.333333,36100000000.0,8090000000.0,21600000000.0,5490000000.0,12300000000.0,2190000000.0,55000000,5400000000.0,8280000000.0,5.30E+09,1.35E+08,1.18E+08,0.09,0.09,0.03,3.32E+08,3.11E+08,-1.00E+08,5.45E+08,1930000000.0,-50000000.0,30.42,34.5,26.34,-0.047,-0.0139,28.01,1.1,None,24.97,None,0.6741,0.3398,None,0.58,0.87,1.4747
13,3/31/2016,1320000000.0,438000000.0,0.333333,36100000000.0,7670000000.0,21800000000.0,5560000000.0,12200000000.0,2140000000.0,55000000,5400000000.0,8260000000.0,4.95E+09,16000000,-2000000,0,0,0.03,-4.30E+08,-1000000,-1.10E+08,-5.40E+08,1380000000.0,29000000.0,24.54,30.66,18.42,-0.0467,-0.0137,27.76,0.9,None,24.88,None,0.6784,0.3368,None,0.59,-1.05,1.3798
14,12/31/2015,1310000000.0,438000000.0,0.333333,36500000000.0,7950000000.0,22400000000.0,5210000000.0,12000000000.0,2090000000.0,55000000,7540000000.0,9040000000.0,5.25E+09,-7.00E+08,-7.20E+08,-0.55,-0.55,0.03,8.65E+08,-4.60E+08,-2.30E+08,1.80E+08,1920000000.0,398000000.0,28.48,33.54,23.43,-0.0324,-0.0089,27.36,0.99,25.66,24.79,None,0.7542,0.3283,None,0.62,1.07,1.5262
You could try something like this, then:
df_list = []
for filename in all_filenames:
    df = pd.read_csv(filename)
    # Adds a column Ticker to the dataframe with the filename in the column.
    # The split function will work if no filename has more than one period.
    # Otherwise, you can use Python's built-in os.path.splitext to trim off the extension.
    df['Ticker'] = filename.split('.')[0]
    df_list.append(df)
all_dfs = pd.concat(df_list, axis=0)
Can't think of an inbuilt way of doing this, but an alternative is:
- expand your for loop and load each data frame into a variable
- create a column, df['fileName'] = filename.split('.')[0], to get just the file name without the .csv
- append this df to a list; the list grows on every loop, and after the loop completes, a single pd.concat(list_csv, axis=0) makes one df

Replying from my phone so couldn't type in working code, but it's easy if you think about it.
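A compact sketch of that loop (using pathlib.Path.stem to drop the extension; the two throwaway CSVs are made up so the example runs standalone):

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Create two throwaway CSVs to stand in for the 2,000 real ones
tmp = tempfile.mkdtemp()
for name, rows in [("AAPL.csv", [1, 2]), ("MSFT.csv", [3])]:
    pd.DataFrame({"Revenue": rows}).to_csv(os.path.join(tmp, name), index=False)

# Read each file, tagging every row with the file's stem (name minus extension)
frames = [
    pd.read_csv(p).assign(Ticker=p.stem)
    for p in sorted(Path(tmp).glob("*.csv"))
]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Path.stem avoids the multiple-period caveat of filename.split('.')[0] entirely, since it only strips the final extension.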

Load item cost from an inventory table

I have an Inventory Sheet that contains a bunch of data about products I have for sale. I have a sheet for each month where I load in my individual sales. In order to calculate my cost of sales, I enter my product cost for each sale manually. I would like a formula to load the cost automatically, using the product name as a search term.
Inventory Item | Cost        Sold Item | Sale Price | Cost
Product 1      | 2.99        Product 3 | 16.99      | X
Product 2      | 4.99        Product 3 | 14.57      | X
Product 3      | 6.99        Product 1 | 7.99       | X
So basically I am looking to "solve for X".
In addition to this, the product names on the two tables are actually different lengths. For example, one item on my Inventory Table may be "This is a very, very long product name that goes on and on for up to 120 characters", and on my products sold table it will be truncated at the first 40 characters of the product name. So the formula should only search using the first 40 characters of the product name.
Due to the complicated nature of this, I haven't been able to search for a sufficient solution, since I don't really know exactly where to start to quickly explain it.
UPDATE:
The product names of my Inventory List, and the product names of my items sold aren't matching. I thought I could just search for the left-most 40 characters, but this is not the case.
Here is a sample of products I have in my Inventory List:
Ford Focus 2000 thru 2007 (Haynes Repair Manual) by Haynes, Max
Franklin Brass D2408PC Futura, Bath Hardware Accessory, Tissue Paper Holder, ...
Fuji HQ T-120 Recordable VHS Cassette Tapes ( 12 pack ) (Discontinued by Manu...
Fundamentals of Three Dimensional Descriptive Geometry [Paperback] by Slaby, ...
GE Lighting 40184 Energy Smart 55-Watt 2-D Compact Fluorescent Bulb, 250-Watt...
Get Set for School: Readiness & Writing Pre-K Teacher's Guide (Handwriting Wi...
Get the Edge: A 7-Day Program To Transform Your Life [Audiobook] [Box set] by...
Gift Basket Wrap Bag - 2 Pack - 22" x 30" - Clear [Kitchen]
GOLDEN GATE EDITION 100 PIECE PUZZLE [Toy]
Granite Ware 0722 Stainless Steel Deluxe Food Mill, 2-Quart [Kitchen]
Guess Who's Coming, Jesse Bear [Paperback] by Carlstrom, Nancy White; Degen, ...
Guide to Culturally Competent Health Care (Purnell, Guide to Culturally Compe...
Guinness World Records 2002 [Illustrated] by Antonia Cunningham; Jackie Fresh...
Hawaii [Audio CD] High Llamas
And then here is a sample of the product names in my Sold list:
My Interactive Point-and-Play with Disne...
GE Lighting 40184 Energy Smart 55-Watt 2...
Crayola® Sidewalk Chalk Caddy with Chalk...
Crayola® Sidewalk Chalk Caddy with Chalk...
First Look and Find: Merry Christmas!
Sesame Street Point-and-Play and 10-Book...
Disney Mickey Mouse Board Game - Duck Du...
Nordic Ware Microwave Vegetable and Seaf...
SmartGames BACK 2 BACK
I have played around with searching for the left-most characters, minus 3. This did not work correctly. I have also switched the [range lookup] between TRUE and FALSE, but this has also not worked in a predictable way.
Use the VLOOKUP function. Augment the lookup_value parameter with the LEFT function; for example, LEFT(E2, 9) truncates the Sold Item value before it is looked up against Inventory Item.
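The same truncated-key lookup can be sketched outside Excel. Here is a pandas analogue (my own illustration; the product names and the prefix length of 28 are made up):

```python
import pandas as pd

# Hypothetical product names; the real ones run up to 120 characters
inventory = pd.DataFrame({
    "Inventory Item": ["Product 1 with a very long name",
                       "Product 2 with a very long name"],
    "Cost": [2.99, 4.99],
})
sold = pd.DataFrame({"Sold Item": ["Product 2 with a very long n"]})  # truncated listing

# Truncate both sides to a common prefix length before merging,
# mirroring VLOOKUP(LEFT(E2, n), ...)
n = 28
inventory["key"] = inventory["Inventory Item"].str[:n]
sold["key"] = sold["Sold Item"].str[:n]
looked_up = sold.merge(inventory[["key", "Cost"]], on="key", how="left")
print(looked_up[["Sold Item", "Cost"]])
```

Note the same caveat as with LEFT in Excel: two inventory items sharing the first n characters would both match, so pick n long enough to keep the prefixes unique.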

Excel lookup value for multiple criteria and multiple columns

I am helping a friend with some data analysis in Excel.
Here's how our data looks like:
Car producer | Classification | Prices from 9 different vendors in 9 columns
AUDI | C | 100 200 300 400 500 600 700 800 900
AUDI | C | 100 900 800 200 700 300 600 400 500
AUDI | B | .. ..
Now, for each classification and each producer, we produced a list that shows which of the 9 vendors offers the lowest price most often (in terms of count; for example, there are 2 cars from AUDI in the C class, and vendor A would offer the lowest price for both).
What we need: a way to calculate the average price for this vendor. So, if we see that vendor A has the lowest price for AUDI cars in the C class, then we want to know the average price for vendor A for these cars.
I'm quite stumped since I can't use the "standard" index-match-small approach since the prices are stored in 9 different columns.
I've suggested to use a long if-chain like this: =if(vendor=A,averageif(enter the criteria and select the column of vendor A for average values),if(vendor=B,average(enter the criteria and select the column of vendor B for average values),... etc.).
But this method is obviously limited and does not scale well to higher dimensions.
We also would like to avoid using any addons.
You're going to need to create a separate table that has all unique classifications in the rows and all vendors in the columns (same as yours, but with duplicate rows removed). Then, in each cell, take the average price for that classification-vendor combination. This can be done using a combination of SUMIF/COUNTIF. For example, if your second table had a column for classifications in cells M2:M[end], and the first table holds classifications in column B with vendor 1's prices in column C, calculating the average price for the AUDI C class offered by vendor 1 could be:
=SUMIF($B$2:$B$[end],"="&$M2,C$2:C$[end])/COUNTIF($B$2:$B$[end],"="&$M2)
Then you could simply find the cheapest vendor by matching the min price. For example, with the vendor names in N1:V1 and a class's average prices in N2:V2, the cheapest vendor for the AUDI C class would be:
=INDEX($N$1:$V$1,MATCH(MIN($N2:$V2),$N2:$V2,0))
A lot of this could be done using PivotTables. If it is a one-off thing, I would go that route; if it needs to be automated, then try a multi-conditional VLOOKUP (needs to be entered as an array formula: CTRL+SHIFT+ENTER). This is simply an example, not based on your data:
{=VLOOKUP(A11&B11,CHOOSE({1\2},A2:A7&B2:B7,C2:C7),2,0)}
A better explanation is given at Chandoo's site: http://chandoo.org/wp/2014/10/28/multi-condition-vlookup/
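For anyone doing this outside Excel, the per-classification vendor-average table the answer builds by hand reduces to one groupby in pandas (my own analogue; column names and prices are assumed):

```python
import pandas as pd

# Toy version of the producer/classification/vendor-price table
df = pd.DataFrame({
    "Producer": ["AUDI", "AUDI", "AUDI"],
    "Classification": ["C", "C", "B"],
    "Vendor A": [100, 100, 150],
    "Vendor B": [200, 900, 160],
})

# Average price per producer/classification for every vendor column,
# the equivalent of the SUMIF/COUNTIF grid
avg = df.groupby(["Producer", "Classification"]).mean()

# Cheapest vendor per group = the column holding the row minimum,
# the equivalent of INDEX/MATCH over MIN
cheapest = avg.idxmin(axis=1)
print(avg)
print(cheapest)
```

This also scales to more vendors without touching any formulas, since every vendor is just another column in the groupby.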
