I have a large number of .txt files supplied by a third party, each containing two columns of string data. The format is completely consistent, but there are no column headers.
I'm trying to combine them all into one file. The usual quick way is to open the Windows command prompt in the folder and run something like copy *.txt MyMergedFile.txt. In this case, though, it copies the contents of only the last file in the list into the new file and ignores the others. I assumed this is because of the missing headers? Is there a way to quickly and easily insert headers into all my files so I can use the usual method, or a simple way of combining them without headers? Happy to use PowerShell, SQL Server 2008, R, VB, whatever has the lowest hassle factor. I'm working in Windows 10. The application is building a large lookup table in a GIS geodatabase.
I fixed this in R using the following:
#get tidyverse
library(tidyverse)
# make a list of target files, i.e. all the .txt files in the designated folder
files <- list.files(path = "C:/temp/myfiles", pattern = "\\.txt$", full.names = TRUE)
# quick check to make sure they are all there
print(files)
# read each file (no headers, so read.table's default header = FALSE is right) and stack the contents
masterfile <- sapply(files, read.table, simplify = FALSE) %>% bind_rows()
From here I can read my data into a suitable format for the next stage in my workflow.
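For anyone without R to hand, the same merge can be sketched in plain Python using only the standard library. The folder path and output name below are assumptions mirroring the question; the output file is excluded from the inputs so a re-run doesn't fold the merged file into itself:

```python
import glob
import os

def merge_txt_files(folder, pattern="*.txt", out_name="MyMergedFile.txt"):
    """Concatenate every file matching `pattern` in `folder` into one output file."""
    out_path = os.path.join(folder, out_name)
    # sort for a stable order, and skip the output file if it already exists
    paths = [p for p in sorted(glob.glob(os.path.join(folder, pattern)))
             if os.path.abspath(p) != os.path.abspath(out_path)]
    with open(out_path, "w", encoding="utf-8") as out:
        for p in paths:
            with open(p, "r", encoding="utf-8") as f:
                out.write(f.read())
    return out_path
```

Since the files have no headers, plain concatenation is safe; with headers you would skip the first line of every file after the first.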
I want to create a bot that updates an existing CSV file with the latest available data from a new CSV file that is downloaded at regular intervals.
I am unable to figure out the logic; need your help.
Step 1: I am accessing the following website,
https://www.marketwatch.com/tools/stockresearch/globalmarkets/intIndices.asp
Step 2: I am downloading the Tables from the above website and saving a CSV file.
Step 3: I am comparing the OLD CSV file with the NEW CSV FILE and updating the values in the OLD CSV.
Step 4: If changes were made, there is a status column, and in the corresponding row I need to write "Value Updated" or "Latest Value Exists".
When you extract data from CSVs/Excel workbooks, you can set a session name other than Default. This lets you build nested for-each loops that step through both files, comparing rows and flagging differences where needed.
Make sure the indexes counted in the for loops are the correct ones, as mix-ups can happen.
Alternatively, read the files using database commands and then compare the result sets.
There are multiple ways to do this, both externally and internally.
Here is one algorithm using only the commands available in Automation Anywhere:
Step 1: Open both CSVs in different sessions and save the columns to be compared into individual lists.
For example, to compare column 1 of X.csv with column 2 of Y.csv:
a) Capture column 1 of X.csv (the FileData Column) into a list: lstColumn1
b) Capture column 2 of Y.csv (the FileData Column) into a list: lstColumn2
Step 2: Compare the two lists.
For example:
a) Compare:
If (lstColumn1 = lstColumn2)
Go To ("please mention cell number")
Update with "specified value"
This approach gives reasonable run time without resorting to external code; you can achieve a faster one with a MetaBot implementation.
For a downloadable bot covering a similar use case, see https://botstore.automationanywhere.com/bot/excel_comparison/
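Outside Automation Anywhere, the same compare-and-flag pass can be sketched in Python on rows loaded with csv.DictReader. The `Index`, `Value`, and `Status` column names are assumptions, not the actual MarketWatch headers:

```python
def update_old_csv(old_rows, new_rows, key="Index", value="Value", status="Status"):
    """Compare rows keyed on `key`: copy changed values from the new rows
    into the old rows and flag each old row with its update status."""
    # index the freshly downloaded data by key for O(1) lookups
    latest = {r[key]: r[value] for r in new_rows}
    for row in old_rows:
        new_val = latest.get(row[key])
        if new_val is not None and new_val != row[value]:
            row[value] = new_val
            row[status] = "Value Updated"
        else:
            row[status] = "Latest Value Exists"
    return old_rows
```

The updated rows can then be written back with csv.DictWriter, overwriting the old file.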
I want to read in a bunch of different .csv files from a folder, average the 9th column of each one (starting at the 2nd row, to exclude the headers), and then output the averages into a new .csv file as a single-column list. I have tried some code, but MATLAB just says it is busy and never reads or outputs anything. Any advice on where to go next? Thanks!
function csv_write_2()
folder = 'C:\Users\Brent\Desktop\MCGOUGH\2017-07-12_bov_da_medmen-l_01\vic-2d_data';
csvFiles = dir(fullfile(folder, '*.csv'));
numfiles = length(csvFiles);
% preallocate one average per file (column vector, so the output is a single column)
average = zeros(numfiles, 1);
for k = 1:numfiles
    % row offset 1 skips the header row; column offset 0 keeps all columns
    M = csvread(fullfile(folder, csvFiles(k).name), 1, 0);
    average(k) = mean(M(:,9));
end
csvwrite(fullfile(folder, 'output.csv'), average);
end
Two suggestions:
1) Use the "Import Data" GUI button to generate the script for importing from the CSV. This is in the "Home" tab. You'll click it, then select the data you want to import, the format you want it imported as, etc. Then click the down arrow under "Import Selection" and click "Generate Function". You can then modify this function as needed, and call it in a loop to loop over your various CSV files. This way you'll know you've got the importing part written correctly.
2) Use xlsread instead. This does take longer to run, but is MUCH more intuitive and easy to use. There's also xlswrite, which I haven't used, but I assume is similarly easy.
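If MATLAB keeps stalling, the same per-file average is easy to cross-check in Python with just the standard library. The folder path and the 9-column layout are assumptions taken from the question:

```python
import csv
import glob
import os

def average_ninth_column(folder):
    """For each .csv in `folder`, skip the header row, average column 9
    (index 8), and return the averages in filename order."""
    results = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            vals = [float(row[8]) for row in reader if len(row) > 8]
            results.append(sum(vals) / len(vals))
    return results
```

Writing the result out as a single column is then one loop over `results`, one value per line.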
I have a lot of CSV files in an Azure Data Lake, consisting of data of various types (e.g., pressure, temperature, true/false). They are all time-stamped and I need to collect them in a single file according to timestamp for machine learning purposes. This is easy enough to do in Java: start a file stream, run a loop over the folder that opens each file, compare timestamps to write relevant values to the output file, starting a new column (going to the end of the first line) for each file.
While I've worked around the timestamp problem in U-SQL I'm having trouble coming up with syntax that will help me run this on the whole folder. The wildcard syntax {*} treats all files as the same fileset while I need to run some sort of loop to join a column from each file individually.
Is there any way to do this, perhaps using virtual columns?
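The row-per-timestamp output the question describes can be sketched outside U-SQL. A minimal Python version, assuming each file has already been parsed into a {timestamp: value} dict:

```python
def merge_by_timestamp(series):
    """Merge several {timestamp: value} dicts into rows of
    [timestamp, v1, v2, ...], one value column per input series.
    Missing readings become empty strings."""
    # the union of all timestamps defines the output rows
    timestamps = sorted(set().union(*(s.keys() for s in series)))
    return [[t] + [s.get(t, "") for s in series] for t in timestamps]
```

Each input file contributes one column, matching the "new column per file" behaviour described for the Java version.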
First you have to think about your problem functionally/declaratively, not in terms of procedural paradigms such as loops.
Let me try to rephrase your question to see if I can help. You have many CSV files with timestamped data. Different files can have rows with the same timestamp, and you want all rows for the same timestamp (or range of timestamps) output to a specific file? So you basically want to repartition the data?
What is the format of each of the files? Do they all have the same schema or different schemas? In the latter case, how can you differentiate them? Based on filename?
Let me know in the comments if that is a correct declarative restatement and the answers to my questions and I will augment my answer with the next step.
I am trying to set up a query that will simply combine data from CSVs into a table as new files get added to a specific folder, where each row contains the data from a separate file. While doing tests with CSVs that I created in excel, this was very simple. After expanding the content column, I would see an individual row of data for each file.
In practice however, where I am trying to use CSVs that are put out from a proprietary android app, expanding the content column leads to 1 single row, with data from all files placed end to end.
Does this have something to do with there not being an "end of line" character in the CSVs the app is producing? If so, is there an easy way to remedy this without changing the app? If not, is there something simple and direct I can ask the developer to change that would prevent this behavior?
Thanks for any insight!
I do not know anything about programming.
I have thousands of text files (.txt), each containing person names. I also have these names in a column of a separate spreadsheet. I want to replace all these names with an "X". So instead of "Brad Pitt" or "Angelina Jolie" or "George Clooney", all will be replaced by "X". Is it possible to do this in a few steps instead of opening every file and replacing the names by hand?
I also have a 7-character string of digits in each of these files, say 1234567 or 1234568. Again, is it possible to replace all these numbers in all these files with just an "X"?
Please guide me on which Windows program I can use to do this.
Please accept my apology for the lay computer language.
Download notepad++, http://notepad-plus-plus.org/download/v6.3.html
Click Search > Find at the top (or Ctrl+F)
Click the "Find in Files" tab
Select the directory and fill out "Find what" and "Replace with"
Click "Replace in Files"
Edit: "I did, but the problem is that in "find what", I can put one name, but I have a list of 1000 different first and last names to be replaced with X (I mean de-identifying)"
In that case I would write a C# program to read in the files one by one and loop through your list of names. Use the String.Replace method to replace the names with "X", then save the file. Look at the StreamReader and StreamWriter classes.
http://msdn.microsoft.com/en-us/library/system.io.streamreader.aspx
http://msdn.microsoft.com/en-us/library/system.io.streamwriter.aspx
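The answer above suggests C#; for reference, the core replace step can also be sketched in Python. The names list and the 7-digit rule come from the question; treating the names as case-insensitive is an assumption (the question writes "Brad pitt"):

```python
import re

def redact(text, names):
    """Replace each listed name (case-insensitive) and any standalone
    7-digit number with 'X'."""
    for name in names:
        # re.escape keeps any punctuation in a name from acting as regex syntax
        text = re.sub(re.escape(name), "X", text, flags=re.IGNORECASE)
    # \b...\b limits the match to standalone 7-digit runs
    return re.sub(r"\b\d{7}\b", "X", text)
```

Looping this over thousands of files is then a matter of reading each file, calling redact with the name list loaded from the spreadsheet, and writing the result back.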