Replacing names with X in thousands of text files - text

I do not know anything about programming.
I have thousands of text files (.txt), each of which contains person names. I also have these names in a column of a separate spreadsheet. I want to replace all of these names with "X", so that "Brad pitt", "Angelina Jolie" and "George Clooney" each become "X". Is it possible to do this in a few steps instead of opening every file and replacing the names by hand?
Each of these files also contains a 7-character string of digits, say 1234567 or 1234568. Is it also possible to replace all of these numbers in all of the files with "X"?
Please advise which Windows program I can use to do this.
Apologies for my non-technical language.

Download Notepad++: http://notepad-plus-plus.org/download/v6.3.html
In the top menu, click Search > Find (or press Ctrl+F).
Click the "Find in Files" tab.
Select the directory, then fill in "Find what" and "Replace with".
Click "Replace in Files".
Edit: "I did, but the problem is that in "find what", I can put one name, but I have a list of 1000 different first and last names to be replaced with X (I mean de-identifying)"
In that case I would write a C# program that reads the files one by one and, for each file, loops through your list of names. Use the String.Replace method to replace each name with X, then save the file. Look at the StreamReader and StreamWriter classes.
http://msdn.microsoft.com/en-us/library/system.io.streamreader.aspx
http://msdn.microsoft.com/en-us/library/system.io.streamwriter.aspx
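The same read, replace and save loop can also be sketched in Python, in case that is easier to run than C#. This is only a rough illustration, not a tested de-identification tool: it assumes the names from the spreadsheet have already been loaded into a list, that the files are UTF-8, and that a "7 character string of numbers" means exactly seven digits in a row. The function names (build_name_pattern, deidentify, deidentify_folder) are made up for this example.

```python
import re
from pathlib import Path


def build_name_pattern(names):
    """Compile one case-insensitive regex that matches any name in the list."""
    # Longest names first so "Brad Pitt" wins over a shorter entry such as "Brad".
    ordered = sorted({n.strip() for n in names if n.strip()}, key=len, reverse=True)
    if not ordered:
        return re.compile(r"(?!)")  # empty list: a pattern that never matches
    alternatives = "|".join(re.escape(n) for n in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


# Exactly seven digits that are not part of a longer run of digits (assumption).
SEVEN_DIGITS = re.compile(r"(?<!\d)\d{7}(?!\d)")


def deidentify(text, name_pattern):
    """Replace every listed name and every 7-digit number with X."""
    return SEVEN_DIGITS.sub("X", name_pattern.sub("X", text))


def deidentify_folder(folder, names):
    """Rewrite every .txt file in `folder` in place."""
    pattern = build_name_pattern(names)
    for path in Path(folder).glob("*.txt"):
        original = path.read_text(encoding="utf-8")
        path.write_text(deidentify(original, pattern), encoding="utf-8")
```

Because deidentify_folder overwrites the files, it is safer to run it on a copy of the folder first. Only full names that appear in the list are replaced, so a first name on its own stays unless it is listed separately.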

Related

combining txt files which have no column headers

I have a large number of txt files supplied to me by a third party, each containing two columns of string data. The data format is completely consistent but there are no column headers.
I'm trying to combine them all into one file. The usual simple way to do this is to open the Windows command prompt in the file location and run, say, copy *.txt MyMergedFile.txt. In this case it copies the contents of the last file in the list to my new file and ignores the others. I assumed this was because of the lack of headers. Is there a way to either quickly and easily insert headers into all my files, so I can use the usual method, or a simple way of combining them without headers? Happy to use PowerShell, SQL2008, R, VB, whatever has the lowest hassle factor. I'm working in Windows 10. The application is building a large lookup table in a GIS geodatabase.
I fixed this in R using the following:
# load the tidyverse (provides %>% and bind_rows)
library(tidyverse)
# make a list of target files, i.e. all the .txt files in my designated folder
# (pattern is a regular expression, not a wildcard)
files <- list.files(path = "C:/temp/myfiles", pattern = "\\.txt$", full.names = TRUE)
# quick check to make sure they are all there
print(files)
# read each file and stack the rows into one data frame
masterfile <- lapply(files, read.table) %>% bind_rows()
From here I can read my data into a suitable format for the next stage in my workflow.
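For anyone without R, a plain concatenation can be sketched in Python as well. This is a minimal sketch under a few assumptions: the files are UTF-8, every non-blank line is one data row, and the name merge_text_files and the default output name merged.txt are invented for the example. It writes a merged text file rather than building a data frame as the R code does.

```python
from pathlib import Path


def merge_text_files(folder, output_name="merged.txt"):
    """Concatenate every .txt file in `folder` into one file, one row per line."""
    folder = Path(folder)
    output_path = folder / output_name
    # Collect the inputs before the output file is created, and never read
    # a merged file left over from an earlier run.
    sources = sorted(p for p in folder.glob("*.txt") if p != output_path)
    with output_path.open("w", encoding="utf-8", newline="\n") as out:
        for source in sources:
            for line in source.read_text(encoding="utf-8").splitlines():
                if line.strip():  # skip blank lines
                    out.write(line + "\n")
    return output_path
```

Writing line by line means a source file that lacks a final newline does not run into the first row of the next file.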

Copy incorrect words in excel

I need to find and copy one or more words in a string, on the condition that the word is misspelled. Essentially, I want to copy every word that would get a wiggly red underline in a browser, MS Word, etc.
I am doing this to extract brand names from hundreds of thousands of free-text cells. Since brand names are usually not dictionary words (for searchability and identifiability), this approach would help find the majority of them.
It doesn't have to be Excel functionality; I am open to any tool that works.
Moving them directly into Excel is tedious, as shown by the link in the previous answer. If you would like a generated list of the misspelled words, follow the instructions on this site:
http://www.techrepublic.com/blog/microsoft-office/a-word-macro-that-highlights-and-lists-misspelled-words/
The code copies the misspelled words into a new document for you, so they will be isolated from your original document. Then you can apply any formatting or data analysis you need.

Spreadsheet to multiple txt files with specified names

I'm looking for a way to export text from cells in a spreadsheet into multiple .txt files. The trick is that each .txt file would contain the text from one cell in a specified column, and would be named using the text from one cell in another specified column.
Example Spreadsheet:
Names text extra-info
John 15684 Spring
Sally 54645 Autumn
Mark 45545 Winter
From this example three .txt files should be created.
Named:
John.txt
Sally.txt
Mark.txt
and containing the relevant numbers, e.g. 15684 inside John.txt.
The spreadsheet is a Google spreadsheet at the moment, but we have access to OpenOffice and Excel if one of them can do the job.
This should be pretty straightforward. Do you know how to begin writing an Apps Script? I don't mean to sound condescending, but since you didn't post code, I don't know exactly where you need help.
I'd use the Spreadsheet service to gather the data from your spreadsheet:
https://developers.google.com/apps-script/reference/spreadsheet/range#getValues()
Then you'd iterate through each row and create a new file using the createFile() method in DriveApp for the contents:
https://developers.google.com/apps-script/reference/drive/drive-app#createFile(String,String,String)
All of the resulting files will be in the root folder of Drive.
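If working outside Apps Script is acceptable, the same row-by-row idea can be sketched in Python instead. Note this is a different route from the getValues()/createFile() approach above: it assumes the sheet has first been downloaded as a CSV file with a header row matching the example (Names, text, extra-info). The function name export_rows_to_txt is made up for illustration.

```python
import csv
from pathlib import Path


def export_rows_to_txt(csv_path, out_dir, name_column="Names", text_column="text"):
    """Write one .txt file per row, named after name_column, holding text_column."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            name = (row.get(name_column) or "").strip()
            if not name:
                continue  # skip rows without a usable file name
            target = out_dir / f"{name}.txt"
            target.write_text(row.get(text_column) or "", encoding="utf-8")
            written.append(target)
    return written
```

With the example data this would create John.txt, Sally.txt and Mark.txt in the chosen folder. Names containing characters that are not allowed in file names (such as / or :) would need cleaning first, and duplicate names overwrite each other.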

Looking up multiple values in a list

I'm trying to select multiple values based on a search key. In its most basic form there is no problem with this. I followed this example and everything went well:
http://office.microsoft.com/en-us/excel-help/how-to-look-up-a-value-in-a-list-and-return-multiple-corresponding-values-HA001226038.aspx
=IF(ISERROR(INDEX($A$1:$B$7,SMALL(IF($A$1:$A$7=$A$10,ROW($A$1:$A$7)),ROW(1:1)),2)),"",INDEX($A$1:$B$7,SMALL(IF($A$1:$A$7=$A$10,ROW($A$1:$A$7)),ROW(1:1)),2))
The problem, however, is that in my case I have multiple external CSV files in which some values in column A look like this:
=- sometext // results into #NAME? error
Excel interprets these as formulas when they are only supposed to be strings. Sure, I could change them to text and save the file again, but I would like to avoid any manipulation of these CSV files.
I tried to extend the second IF statement (if you read it from left to right) with:
IF(AND($A$1:$A$7 <> "#NAME?", $A$1:$A$7=$A$10,ROW($A$1:$A$7)))
and
IF(AND(NOT(ISERROR($A$1:$A$7)), $A$1:$A$7=$A$10,ROW($A$1:$A$7)))
Neither worked. (Sorry if I messed up some syntax and formula names; I'm using a different language version.)
Here is a small image of what's happening right now and how it should look:
On the right side you can see a list of values next to Test1 which are missing on the left side due to the #NAME? error.
I would suggest opening the CSV files as text files: select Comma as your delimiter, then select Text as your column data format. This way, Excel will treat all your data as text and will not try to read =- sometext as a formula.
To do so, you would need to change the .csv file extension to .txt or anything else (even no extension).
Instead of "Opening" the CSV file, you can "Import" it. This will open the Text Import Wizard which will allow you to specify particular columns as Text. This is located in different areas in different versions of Excel. In Excel 2007, it is on the Data Tab / Get External Data / From Text. The example below demonstrates bringing in long numbers, but it should work just as well with your formula "lookalikes"
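If the lookup does not have to happen inside Excel, another way to leave the CSV files untouched is to read them with a script, where every field is plain text and nothing is evaluated as a formula. A minimal Python sketch, assuming the key is in the first column and the value in the second (as in the $A$1:$B$7 example); the function name load_lookup is invented here:

```python
import csv
from collections import defaultdict


def load_lookup(csv_path):
    """Map each key in column 1 to the list of values found in column 2."""
    lookup = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) >= 2:
                # Every field arrives as a plain string, so "=- sometext"
                # is kept as text and never interpreted as a formula.
                lookup[row[0]].append(row[1])
    return lookup
```

Looking up a key such as Test1 then returns all of its corresponding values in file order, including rows that Excel would have shown as #NAME?.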

CSV Exporting: Preserving leading zeros

I'm working on a .NET application which exports CSV files to open in Excel and I'm having a problem with preserving leading zeros when the file is opened in Excel. I've used the method mentioned at http://creativyst.com/Doc/Articles/CSV/CSV01.htm#CSVAndExcel
This works great until the user decides to save the CSV file within Excel. If the file is opened again in Excel then the leading zeros are lost.
Is there anything I can do when generating the CSV file to prevent this from happening?
This is not a CSV issue.
This is Excel loving to play with CSV files.
Change the extension to something else.
As @GSerg mentions, this is not a CSV issue.
If your users must edit and save in Excel, then after opening the CSV file they need to select the entire worksheet, right-click, choose "Format Cells" and select "Text" from the Category list. This will preserve the leading zeros, since the numbers will be treated as simple text.
Alternatively, you could use Open XML SDK 2.0, or some other Excel library, to create an xlsx file from your CSV data and programmatically set the cell type to Text, in order to take the end users out of the equation...
I found a nice way around this: if you add a space anywhere in the phone number, the cell is no longer treated as a number and is treated as a text cell in both Excel and Apple's iWork Numbers.
It's the only solution I've found so far that plays nicely with Numbers.
Yes, I realise the number then has a space, but this is easy to process out of large chunks of data: you just have to select a column and remove all spaces.
Also, if this is web related, most web things are OK with users entering a space in the number field. E.g. you can tap-to-call on mobiles.
The challenge is to get the space in there in the first place.
In use:
01202123456 = 1202123456
but
01202 123456 = 01202 123456
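One way to get the space in there when generating the file could look like the Python sketch below. It is only an illustration of the trick above: the split after the fifth digit simply mirrors the 01202 123456 example, and the helper names add_space and rows_to_csv are hypothetical.

```python
import csv
import io


def add_space(number, position=5):
    """Insert one space after `position` characters, unless one is already there."""
    digits = number.strip()
    if " " in digits or len(digits) <= position:
        return digits
    return digits[:position] + " " + digits[position:]


def rows_to_csv(rows, phone_index):
    """Return CSV text in which the phone column has a space inserted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        row = list(row)  # copy so the caller's data is left unchanged
        row[phone_index] = add_space(row[phone_index])
        writer.writerow(row)
    return buffer.getvalue()
```

For example, add_space("01202123456") gives "01202 123456", which is the form that kept its leading zero above.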
Ok, new discovery.
Using Quick Preview on Mac to view a CSV file the telephone column will display perfectly, but opening the file fully with Numbers or Excel will ruin that column.
On some level Mac OS X is capable of handling that column correctly with no user meddling.
I am now working on the best/easiest way to make a website output a universally accepted CSV with telephone numbers preserved.
But maybe with that info someone else has an idea on how to make Numbers handle the file in the same way that Quick Preview does?

Resources