Removing entries from another file - excel

I have two very large files, which we'll call Old and New. New contains many entries that Old also contains, and I need to remove from New every entry that appears in Old. Old has 9,459 entries with 55 columns; New has 11,983 entries with 76 columns. The comparison must be based on 5 columns: 'name_last', 'name_first', 'name_middle', 'street', and 'type'.
I'm using Excel 2010, I'm very new to it, and I haven't got a clue where to start.

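Make a concatenated column in each file that "glues" together 'name_last', 'name_first', 'name_middle', 'street', and 'type'. Something like this:
=LOWER(A2&B2&C2&D2&E2)
(LOWER normalizes the case of the key. MATCH with match type 0 is already case-insensitive, so this is mainly a safeguard. Adding a separator, e.g. =LOWER(A2&"|"&B2&"|"&C2&"|"&D2&"|"&E2), also avoids false matches such as "ab"&"c" versus "a"&"bc".)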
Then add a formula like this in New (change the sheet names and columns to suit):
=ISNA(MATCH(F2,[old.xlsx]Sheet2!$F:$F,0))
It looks up each concatenated value in column F of new.xlsx against the entire list of concatenated values in old.xlsx, returning TRUE when the row is not found in Old and FALSE when it is.
AutoFilter the FALSE results to show the rows that also appear in Old, then delete those rows.
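If you later need the same filtering outside Excel, here is a minimal pandas sketch of the idea: an anti-join on the five key columns, which is a swapped-in technique rather than the Excel steps themselves. The small DataFrames and the helper names are hypothetical stand-ins; with the real files you would load them with pd.read_excel. Keys are lower-cased to mirror LOWER, and blank cells compare equal to each other.

```python
import pandas as pd

KEYS = ["name_last", "name_first", "name_middle", "street", "type"]


def normalized_keys(df: pd.DataFrame) -> pd.DataFrame:
    """Return just the key columns, as lower-case strings (like LOWER in Excel)."""
    return df[KEYS].astype(str).apply(lambda col: col.str.lower())


def drop_rows_in_old(new: pd.DataFrame, old: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `new` whose five key values do not appear in `old`."""
    old_keys = normalized_keys(old).drop_duplicates()  # unique keys keep the merge 1:1
    merged = normalized_keys(new).merge(old_keys, on=KEYS, how="left", indicator=True)
    keep = (merged["_merge"] == "left_only").to_numpy()  # True = not found in Old
    return new[keep]


# Hypothetical sample data; the other columns (55 vs 76 in the real files) don't matter
old = pd.DataFrame({"name_last": ["Smith"], "name_first": ["Ann"], "name_middle": ["B"],
                    "street": ["Main St"], "type": ["R"], "old_only": [1]})
new = pd.DataFrame({"name_last": ["SMITH", "Jones"], "name_first": ["ann", "Bob"],
                    "name_middle": ["B", "C"], "street": ["Main St", "Oak Ave"],
                    "type": ["R", "R"], "new_only": [10, 20]})

result = drop_rows_in_old(new, old)
print(result)  # only the Jones row remains
```

Deduplicating Old's keys first matters: with duplicate keys on the right, a left merge would repeat rows of New and the mask would no longer line up with it.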

Related

How to rename one column (based on a given position) if there are duplicate column names

I have a dataframe with 2 duplicate columns. I need to rename one of them based on the position given in a configuration file. Using rename() doesn't work because it renames both columns, even though I pass the position of the column to be renamed. Could you please suggest how to achieve this?
I want the logic to be generic, as I have multiple files, and each file has duplicates but with different column names, which are listed in the configuration file.
Columns: state, city, country, state
Config file:
position  HEADER
0         statename
Df.rename(columns = {Df.columns[position]:row['HEADER']}, inplace = True)
This does not work: both columns are renamed even though the position is passed.
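rename matches columns by label, and Df.columns[position] is just the label ('state'), which both columns share. If I understand correctly, you can instead convert the column names to a NumPy array with to_numpy and change the name at the given position there. This changes only the column name in the specified position, without reassigning the column names. Unlike a list from to_list, the array also lets you set several positions at once: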
Df.columns.to_numpy()[position] = row['HEADER']
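Because pandas treats Index objects as immutable, writing into the array returned by to_numpy is not officially supported and may not work in every pandas version. A safer variant, sketched below, copies the names into a list, changes the one at the given position, and assigns the list back. The helper rename_by_position and the config pairs are hypothetical; they stand in for however you read the configuration file.

```python
import pandas as pd


def rename_by_position(df: pd.DataFrame, position: int, new_name: str) -> pd.DataFrame:
    """Rename only the column at `position`, even when its name is duplicated."""
    names = list(df.columns)    # copy the labels into a plain list
    names[position] = new_name  # change just the one slot
    df.columns = names          # assign a new Index instead of mutating the old one
    return df


# Hypothetical (position, HEADER) pairs, standing in for the rows of the config file
config = [(0, "statename")]

Df = pd.DataFrame([["NY", "Albany", "US", "NY"]],
                  columns=["state", "city", "country", "state"])
for position, header in config:
    Df = rename_by_position(Df, position, header)

print(list(Df.columns))  # ['statename', 'city', 'country', 'state']
```

Because the lookup is purely positional, the same loop works for every file regardless of which names are duplicated.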

Sorting txt data files while importing in Excel Data Query

I am trying to import approximately 190 text data files into Excel using the New Query tool (Data->New Query->From File->From Folder). In Windows Explorer the files are properly ordered: the first is 0summary, the second 30summary, etc.
However, when I import them through the query tool, the files are sorted as shown in the picture (see line 9, for example: that file is not in the right position):
The files are sorted as text, character by character, instead of by the number they represent. Is there a solution to this issue? I have tried putting a space between the number and "summary", but that didn't work either. I read online that Excel doesn't recognize text within "" or after /, but Windows doesn't allow those characters in file names. Even removing the word "summary" didn't fix the problem. Any suggestions?
If all your file names include the word "summary":
Add a column with Add Column > Extract > Text Before Delimiter and enter "summary" as the delimiter (with the same capitalization as in the file names), change the new column's type to Whole Number, and sort on that column.
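The reason this works is that text sorting compares character by character, so "100summary" lands before "30summary", while a numeric key sorts by value. A small Python illustration of the same idea (not Power Query itself); number_before and the file names are hypothetical, and, like the delimiter step, str.partition is case-sensitive:

```python
def number_before(name: str, delimiter: str = "summary") -> int:
    """Text before the delimiter, converted to a whole number: '30summary.txt' -> 30."""
    head, found, _ = name.partition(delimiter)  # case-sensitive split at first match
    if not found:
        raise ValueError(f"{delimiter!r} not found in {name!r}")
    return int(head)


files = ["0summary.txt", "100summary.txt", "30summary.txt", "60summary.txt"]
print(sorted(files))                     # text order: 0, 100, 30, 60
print(sorted(files, key=number_before))  # numeric order: 0, 30, 60, 100
```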
If the only numbers in the names are the ones you wish to sort on, you can:
1) Add a custom column containing just the numbers.
2) Change its data type to Whole Number.
3) Sort on that column.
The formula for the custom column:
Text.Select([Name],{"0".."9"})
If the alphabetic portion varies and you also need to sort on it, add a second column for the alphabetic portion in the same way and sort on both columns.
If the names may contain other digits after the leading ones you want to sort on, use the following formula for the added column instead; it extracts only the digits at the beginning of the file name:
=Text.Middle([Name],0,Text.PositionOfAny([Name],{"A".."z"}))
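To see why the two formulas can give different sort keys, here is a Python analogue of each (a sketch, not Power Query; the function names and file names are hypothetical). Keeping every digit turns "30summary2" into 302, while keeping only the leading digits gives 30:

```python
import re


def all_digits(name: str) -> str:
    """Like Text.Select([Name], {"0".."9"}): keep every ASCII digit in the name."""
    return "".join(ch for ch in name if ch in "0123456789")


def leading_digits(name: str) -> str:
    """Keep only the run of digits at the start of the name (may be empty)."""
    return re.match(r"[0-9]*", name).group()


for name in ["30summary.txt", "30summary2.txt"]:
    print(name, all_digits(name), leading_digits(name))
# 30summary.txt 30 30
# 30summary2.txt 302 30

names = ["100summary.txt", "30summary2.txt", "0summary.txt"]
print(sorted(names, key=lambda n: int(leading_digits(n))))
```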

Relabelling large amounts of data in Excel

In a CSV file, I want to relabel 433,000+ rows of IDs that look like "e904ab64a642efcd25f4a43cb729701646d4bf7a4ed0bacbae9d85127978606a" with simpler ID codes. Each unique ID has 4-5 rows of data. I really don't want to find and replace each one, because there are more than 2,000 unique IDs. Is there an Excel function that can help me do this? Otherwise, what programs would you recommend?
If the rows for each ID are always consecutive, you can:
1) Store the ID before replacing it.
2) Replace it with your simpler ID (and store that too).
3) Go to the next line.
4) Check whether the ID is the same as the one stored from the previous line.
5) If yes, use the same replacement ID as on the previous line.
6) If no, repeat from step 1) with the next simpler ID.
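Those steps amount to issuing a new code whenever the ID differs from the one on the previous row. A minimal Python sketch, assuming the IDs are read in file order; relabel_consecutive and the shortened IDs are hypothetical:

```python
def relabel_consecutive(ids):
    """Assign a new short code each time the ID changes from the previous row.

    Assumes rows for the same ID are consecutive; an ID that reappears later,
    after a different ID, would get a second code.
    """
    out = []
    prev_id = None
    current_code = 0
    for row_id in ids:
        if row_id != prev_id:  # ID changed: store it and issue the next code
            current_code += 1
            prev_id = row_id
        out.append(current_code)  # same ID as previous row: reuse its code
    return out


ids = ["e904ab", "e904ab", "e904ab", "7c1f02", "7c1f02", "a55d9e"]
print(relabel_consecutive(ids))  # [1, 1, 1, 2, 2, 3]
```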
If you are happy to do this manually (your question's tags do not currently include VBA), here is a simple approach:
1) Create a unique list of IDs, for example with a one-column PivotTable.
2) Next to each unique ID, put your simplified ID (however you are creating it: is there an algorithm, or could it just be =ROW()?).
3) Insert a column in the original sheet, next to the ID column.
4) Use a VLOOKUP to find the matching simplified ID (e.g. =VLOOKUP(A1,'New IDs'!$A:$B,2,FALSE)).
5) When it has finished calculating, copy the simplified IDs and use Paste Special > Values.
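If you end up using a program instead, the unique-list-plus-lookup idea maps onto pandas' factorize, which numbers each distinct value in order of first appearance and does not require the rows to be consecutive. This is a swapped-in technique, sketched on hypothetical data; with the real file you would start from pd.read_csv:

```python
import pandas as pd

# Hypothetical stand-in for the CSV (IDs shortened for readability)
df = pd.DataFrame({"id": ["e904ab", "7c1f02", "e904ab", "a55d9e", "7c1f02"],
                   "value": [1, 2, 3, 4, 5]})

# factorize numbers each distinct ID 0, 1, 2, ... in order of first appearance
codes, uniques = pd.factorize(df["id"])
df["short_id"] = codes + 1  # start the simplified IDs at 1

# The equivalent of the unique-ID list with its simplified IDs next to it
lookup = pd.DataFrame({"id": uniques, "short_id": list(range(1, len(uniques) + 1))})
print(df["short_id"].tolist())  # [1, 2, 1, 3, 2]
```

Keeping the lookup table lets you map the short codes back to the original IDs later.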

Append new columns into Excel with MATLAB

How can I use MATLAB to append new columns to an existing Excel file without altering the original data in it? In my case I don't know the original number of columns and rows in the file, and opening the files one by one to check is impractical. Another difficulty is that the new columns may have a different number of rows from the existing data, so I cannot use the trick of reading in the data, forming a new matrix, and replacing the data with the new matrix.
I have seen many posts explaining how to add new rows, but adding a new column seems quite different, since columns are named by letters instead of numbers.
Thank you.
You could try reading in the data, using size on the array to determine the number of columns, and then using xlswrite with the range that you want. Have a look here for a function that turns a column number into Excel's column-letter format: http://au.mathworks.com/matlabcentral/answers/54153-dynamic-ranges-using-xlswrite
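The column-letter conversion is bijective base 26: A to Z stand for 1 to 26 and there is no zero digit, which is why plain base-26 arithmetic gets cases like 'Z' and 'AA' wrong. A Python sketch of the conversion, for illustration only; column_letter is a hypothetical name and this is not the code from the linked thread:

```python
def column_letter(n: int) -> str:
    """Convert a 1-based column number to Excel letters (1 -> 'A', 27 -> 'AA')."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, 26)  # shift to 0-based so rem 0..25 maps to A..Z
        letters.append(chr(ord("A") + rem))
    return "".join(reversed(letters))  # digits were produced least significant first


print([column_letter(n) for n in (1, 26, 27, 52, 16384)])
```

The last value, 16384, is the final column of an .xlsx sheet (XFD).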
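I finally solved it with the following code:
%%% Append array to the right of the existing data in sheetname
% step comes from the surrounding loop (1 on the first pass).
% xlcolumnletter is not built into MATLAB; it is assumed to be a helper
% like the one in the linked thread (column number -> column letters).
if (step==1)
    xlswrite(filename,array,sheetname,'A1');      % create the file
else
    [~,~,Data]=xlsread(filename,sheetname);      % read in all the old data (raw cells)
    OriCol=size(Data,2);                         % number of columns in the old data
    NewCol=OriCol+1;                             % new array goes right next to the original data
    ColLetter=xlcolumnletter(NewCol);            % e.g. 28 -> 'AB'
    StartCell=[ColLetter,'1'];                   % top cell of the new column, e.g. 'AB1'
    xlswrite(filename,array,sheetname,StartCell);
end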

How to modify Choice field values (add & remove) without altering the original entries

I am working with a SharePoint list that has a "Category" column, which is a choice field. For simplicity, say the categories are currently A, B, C, and D. At the beginning of the fiscal year (1.5 months away) these will change: some additions and deletions will be needed, but we need to make sure the values already entered in this field do not change.
If I modify the choice field values directly to remove an item, say A, will it remove all instances of A throughout the list? In other words, am I safe to edit this field directly, or should I create a "Historical Category" column to store the old values?
I just set up a test list with a choice column containing the values 1, 2, 3, and 4, created a few list entries, and then changed the choice values to 11, 12, 13, and 14. The values in the original list entries did not change (they were preserved). My conclusion is that modifying the choice column values does not appear to alter existing list entries, although you must be careful not to overwrite them with one of the newly modified values.
