I have huge_database.csv like this:
name,phone,email,check_result,favourite_fruit
sam,64654664,sam#example.com,,
sam2,64654664,sam2#example.com,,
sam3,64654664,sam3#example.com,,
[...]
===============================================
then I have 3 email lists:
good_emails.txt
bad_emails.txt
likes_banana.txt
the contents of which are:
good_emails.txt:
sam#example.com
sam3#example.com
bad_emails.txt:
sam2#example.com
likes_banana.txt:
sam#example.com
sam2#example.com
===============================================
I want to do some grep, so that at the end the output will be like this:
sam,64654664,sam#example.com,y,banana
sam2,64654664,sam2#example.com,n,banana
sam3,64654664,sam3#example.com,y,
I don't mind doing it in multiple steps manually and, perhaps, in some complex algorithm such as copy pasting to multple files. What matters to me is the reliability, and most importantly the ability to process very LARGE csv files with more than 1M lines.
What must also be noted is the lists that I will "grep" to add data to some of the columns will most of the times affect at most 20% of the total csv file rows, meaning the remaining 80% must be intact and if possible not even displace from their current order.
I would also like to note that I will be using a software called EmEditor rather than spreadsheet softwares like Excel due to the speed of it and the fact that Excel simply cannot process large csv files.
How can this be done?
Will appreciate any help.
Thanks.
Googling, trial and error, grabbing my head from frustration.
Filter all good emails with Filter. Open Advanced Filter. Next to the Add button is Add Linked File. Add the good_emails.txt file, set to the email column, and click Filter. Now only records with good emails are shown.
Select column 4 and type y. Now do the same for bad emails and change the column to n. Follow the same steps and change the last column values to the correct string.
I have a large excel file and want to select specific rows (not continuous block) and columns to read in. With columns this is easy, is there a way to do this for rows? or do I need to read in everything and then delete all the rows I don't want?
Consider an excel file with structure
,CW18r4_aer_7,,,,,,,
,Vegetation ,,,,,,,
Date,A1,A2,B1,B2,C1,C2,C3,C4
1/7/86,3.80,8.02,7.94,9.81,9.82,4.19,3.88,0.87
2/7/86,0.50,2.02,5.26,3.70,8.59,8.61,9.86,3.27
3/7/86,4.75,3.88,0.46,5.95,9.45,9.62,4.33,1.63
4/7/86,7.64,6.93,2.71,9.96,1.25,0.35,1.84,1.02
5/7/86,3.33,8.24,7.36,7.86,0.43,2.32,2.18,1.91
6/7/86,1.96,1.78,7.45,2.28,5.27,9.94,0.22,2.94
7/7/86,4.67,8.41,1.49,5.48,5.46,1.39,1.85,7.71
8/7/86,8.07,5.60,4.23,3.93,3.92,9.09,9.90,2.15
9/7/86,7.00,5.16,6.10,8.86,7.18,9.42,8.78,5.42
10/7/86,7.53,9.81,3.33,1.50,9.45,6.96,5.41,5.25
11/7/86,0.95,3.84,3.52,5.94,8.77,1.94,5.69,8.62
12/7/86,2.94,3.07,5.13,8.10,6.52,9.93,5.85,3.91
13/7/86,9.33,7.03,5.80,2.45,2.86,7.32,5.00,0.17
14/7/86,7.39,4.85,9.15,2.23,1.70,9.42,2.72,9.32
15/7/86,3.38,4.67,6.63,2.12,5.09,7.71,0.99,9.72
16/7/86,9.85,6.68,3.09,5.05,0.34,5.44,5.99,6.19
I want to take the headers from row 3 and then read in some of the rows and columns.
import pandas as pd
df = pd.read_excel("filename.xlsx", skiprows = 2, usecols = "A:C,F:I", userows = "4:6,13,17:19")
Importantly, this is not a block that can be described by say [A3:C10] or the like.
The userows option does not exist. I know I can skip rows at the top, and at the bottom - so presumably can make lots of data frames and knit them together. But is there a simple way to just read in what you need once? My workaround is to just create lots of excel spreadsheets that just have what I need for different data frames, but this leaves things very open to me making a mistake I can't find.
I have about 100 columns with some empty values that I would like to replace with zeroes. I know how to do this with a single column using Calculate and Replace, but I wanted to see if there was a way to do this with multiple columns at once.
Thanks!
You could script it but it'd probably take you as long to write the script as it would to do it manually with a transformation. A better idea would be to fix it in the data source itself before you import it so SPOTFIRE doesn't have to do the transformation every time, which if you are dealing with a large amount of data, could hinder your performance.
I would like to ask how to use MATLAB to append new columns into existing excel file without altering the original data in the file? In my case I don't know the original number of columns and rows in the file and it is inefficient to open the files one by one and check in practice. Another difficulty is that the new columns may have different number of rows to the existing data so that I cannot use the trick of reading in the data, forming a new matrix and replace the data with the new matrix.
I have seen many posts teaching people how to add new rows but adding new column seems quite a different thing since the columns are named by letters instead of numbers.
Thank you.
You could try reading in the data, use size on the array to determine the number of columns, and then use xlswrite with the range that you want. Have a look here for a function to turn the column number into the excel format: http://au.mathworks.com/matlabcentral/answers/54153-dynamic-ranges-using-xlswrite
Finally I solve it with the following code:
%%%
if (step==1)
xlswrite(filename,array,sheetname,'A1'); %Create the file
else
[~,~,Data]=xlsread(filename,sheetname); %read in all the old data
OriCol=size(Data,2); %get the column number of the old data
NewCol=OriCol+1; %the new array is placed right next to the original data
ColLetter=xlcolumnletter(NewCol);
StartCell=[ColLetter,'1'];
xlswrite(filename,array,sheetname,StartCell);
end