Change multiple consecutive variable names - rename

I am using Stata 14 and have a dataset which contains a large group of variables:
court_date1 court_date2 court_date3
I would like to change part of each variable name while keeping the number at the end:
court_event1 court_event2 court_event3
Is there a way to do so as a group using the wildcard (*)? They are numbered consecutively, but are not listed consecutively in the dataset.

rename (*date*) (*event*)
works with just the names you give. If that catches too much, then
rename (court_date*) (court_event*)
See help rename groups, including the dryrun option.

Related

Excel: Create sequential value based on input from multiple drop downs

I am looking for some assistance. I want to generate a name, following a standardized naming convention, from the selections made in several drop-down lists, and I cannot work out how to do it. The data shown here was made up for the example. The generated value should depend on the drop-down selections, and if the name already exists, the next sequential number should be appended. Can anyone advise where I should start? I have spent hours searching with no luck.
Currently I have the drop-down selections, which are matched via INDEX to the corresponding code abbreviations in columns F:J and then joined with =F5&G5&H5&I5&J5.
I cannot work out how to add the sequence number, starting at 1 and moving to 2 if 1 already exists. For example, if AAAARE_BSKS1 exists and the same selections are chosen again, the new name for that selection would be AAAARE_BSKS2.
I was able to get it figured out. I took my generated name and added a new column beside it containing =COUNTIF($J$2:$J26,J26), which counts how many times the name has occurred up to and including this row. Then I added another column containing =CONCATENATE(J26,K26) to append that count to the name, and it worked perfectly. Just wanted to post this in case anyone runs into the same issue. Thank you.
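For anyone who would rather script this outside Excel, here is a minimal Python sketch of the same running-count idea. It assumes the generated names are already available as a list in row order; the function name and the second sample name (AAAARE_BSKT) are made up for illustration.

```python
from collections import Counter

def add_sequence_numbers(names):
    """Append to each name the number of times it has occurred so far."""
    seen = Counter()
    numbered = []
    for name in names:
        seen[name] += 1  # occurrences up to and including this row
        numbered.append(f"{name}{seen[name]}")
    return numbered

# Hypothetical generated names, in the order they were entered
generated = ["AAAARE_BSKS", "AAAARE_BSKT", "AAAARE_BSKS"]
print(add_sequence_numbers(generated))
# ['AAAARE_BSKS1', 'AAAARE_BSKT1', 'AAAARE_BSKS2']
```

The counter plays the role of the COUNTIF over an expanding range: the first occurrence gets 1, the next identical name gets 2, and so on.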

2 Unsorted List Files, need the missing links

I have 2 list files, one of which is a copy that is missing some lines.
However, the two files are NOT sorted the same way, so each line sits at a different position in each file.
How do I find and list specifically the lines that are missing from the copy?
Thank you.
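One possible approach, sketched in Python and not tied to any particular tool: treat the copy as a multiset of lines and walk through the original, so the sort order of either file does not matter. The function names are made up for illustration, and the sketch assumes lines must match exactly (same spelling, same whitespace).

```python
from collections import Counter
from pathlib import Path

def read_lines(path):
    """Read a text file into a list of lines without line endings."""
    return Path(path).read_text(encoding="utf-8").splitlines()

def missing_lines(original_lines, copy_lines):
    """Return the lines of the original that have no counterpart in the copy."""
    remaining = Counter(copy_lines)  # how often each line is still available in the copy
    missing = []
    for line in original_lines:
        if remaining[line] > 0:
            remaining[line] -= 1  # matched one occurrence in the copy
        else:
            missing.append(line)  # nothing left to match: this line is missing
    return missing

# In-memory example; with real files use read_lines("original.txt") etc.
original = ["pear", "apple", "plum", "apple"]
copy = ["apple", "pear"]
print(missing_lines(original, copy))
# ['plum', 'apple']
```

Using a Counter instead of a plain set means duplicated lines are handled too: if a line occurs twice in the original but once in the copy, one occurrence is reported as missing.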

Simultaneously renaming multiple files with different names

I have around 150 files with the same .txt extension, but the filenames are just alphanumeric strings (e.g. 7J9E45600.txt, FF5632088.txt, etc.). I have a list where the alphanumeric strings are matched to more meaningful names. I want to replace these alphanumeric strings with the meaningful names, but would like to do it programmatically. Most of the existing solutions rename multiple files with incrementally increasing numbers, e.g. via a loop command, but in my case all the filenames will be different. An example of what I want to do is as follows: rename 7J9E45600.txt to adipose.txt, rename FF5632088.txt to brain.txt, etc. A solution utilizing Linux, R, Perl or Python is most welcome.
Yes, this is easy to do with a for loop in R.
Make or read in your data, with a column of old names, and a matching column containing the new names. I copied an example with four files.
oldnames<-c("/Users/foo/Documents/pictures/Test/2020-04-21 19.59.jpg",
"/Users/foo/Documents/pictures/Test/2020-04-21 19.59.35.jpg",
"/Users/foo/Documents/pictures/Test/2020-04-21 19.58.37.jpg",
"/Users/foo/Documents/pictures/Test/2020-04-21 17.21.06.jpg")
newnames<-c("/Users/foo/Documents/pictures/Test/2021-04-21 19.59.59.jpg",
"/Users/foo/Documents/pictures/Test/2021-04-21 19.59.35.jpg",
"/Users/foo/Documents/pictures/Test/2021-04-21 19.58.37.jpg",
"/Users/foo/Documents/pictures/Test/2021-04-21 17.21.06.jpg")
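# Keep the paths as character strings (before R 4.0, data.frame() turned them into factors, which file.rename() rejects)
testnames_df<-data.frame(oldnames,newnames,stringsAsFactors=FALSE)
# Rename each file in turn, looping over every row instead of a hard-coded 1:4
for (i in seq_len(nrow(testnames_df))) {file.rename(from=testnames_df$oldnames[i], to = testnames_df$newnames[i])}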
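Since the question also asks about Python, here is a rough equivalent as a sketch. It assumes the old-to-new mapping is available as a dictionary of bare filenames (in practice it could be read from the list with the csv module); the function name rename_from_mapping is made up, and the demo runs in a temporary folder so nothing real is touched.

```python
import tempfile
from pathlib import Path

def rename_from_mapping(folder, mapping):
    """Rename files in `folder` according to an {old_name: new_name} mapping."""
    folder = Path(folder)
    renamed = []
    for old_name, new_name in mapping.items():
        source = folder / old_name
        target = folder / new_name
        if not source.exists():
            continue  # skip mapping entries that have no file in the folder
        if target.exists():
            raise FileExistsError(f"{target} already exists")  # never overwrite
        source.rename(target)
        renamed.append((old_name, new_name))
    return renamed

# Demo in a temporary folder, using the two filenames from the question
with tempfile.TemporaryDirectory() as tmp:
    for old in ("7J9E45600.txt", "FF5632088.txt"):
        (Path(tmp) / old).write_text("demo")
    mapping = {"7J9E45600.txt": "adipose.txt", "FF5632088.txt": "brain.txt"}
    print(rename_from_mapping(tmp, mapping))
    print(sorted(p.name for p in Path(tmp).iterdir()))
    # ['adipose.txt', 'brain.txt']
```

Refusing to overwrite an existing target is a design choice in this sketch, made so that a typo in the list cannot silently destroy a file.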

SPSS converting a string into a numeric variable issue

I have a string variable with lots of parentheses and other punctuation e.g. _LSC Debt licensed work. How can I easily convert it to a numeric variable when I already have a specified code list for it? i.e. I don't want it to automatically recode everything because it uses the wrong values against the labels.
Create a dataset with two variables: a string holding the current messy name and a numeric variable holding the new code. Then, with both the original dataset and the lookup one sorted by the string, do MATCH FILES specifying a table match (or use Data > Merge Files > Add Variables).
You can prepare a separate file which includes two variables:
- one contains each of the possible values in the original string variable to be recoded (make sure the name and width are the same as your original variable)
- the second contains the new values you want to recode to.
Once this is set up, match the files like this:
get file="filepath\Your_Value_Table.sav".
sort cases by YourOriginalVarName.
dataset name ValTab.
get file="filepath\Your_Original_File.sav".
sort cases by YourOriginalVarName.
match files /file=* /table=ValTab /by YourOriginalVarName.
exe.
At this point your original file will contain a new variable that has the codes you wanted.
In general I agree with the solutions provided by others. However, I would like to suggest an extra step that could make your lookup file (see the answers of eli-k and JKP) a bit better.
The point is that your string variable with lots of parentheses and other punctuation probably also has different ways to write the same thing.
For example:
_LSC Debt licensed work
LSC Debt licensed work
_LSC Debt Licensed Work
etc.
You could create a lookup-table with three variables: the unique values of the original string variable, a cleaned-up version of that variable, and finally the numeric value you want to attach.
The advantage of the cleaned-up version is that you can more easily recognize the same value even when it is written differently.
You could clean up using several functions:
string CleanedUpVersion (A40).
compute CleanedUpVersion = REPLACE(RTRIM(LTRIM(UPCASE(YourOriginalVarName))),'_','').
execute.
In this basic example we convert to capital letters, delete leading and trailing blanks, and remove the underscore by replacing it with nothing.
Overall, this helps you avoid assigning different numbers to values that are written differently in your original variable but mean the same thing and should therefore share one number.
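To illustrate the idea outside SPSS, here is a small Python sketch of the same clean-up followed by a lookup against a code list. The code value 1 and the function names are hypothetical; only the three spelling variants come from the example above.

```python
def clean(value):
    """Mirror the clean-up above: upper-case, trim blanks, drop underscores."""
    return value.upper().strip().replace("_", "")

def recode(values, code_list):
    """Look up each cleaned value; unknown values map to None."""
    return [code_list.get(clean(v)) for v in values]

# Hypothetical code list, keyed by the cleaned-up text
code_list = {"LSC DEBT LICENSED WORK": 1}

variants = [
    "_LSC Debt licensed work",
    "LSC Debt licensed work ",
    "_LSC Debt Licensed Work",
]
print(recode(variants, code_list))
# [1, 1, 1]
```

Values that come back as None are the ones still missing from the code list, which makes them easy to spot and add.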

Join a path variable to each row in the import csv file

I have many Import files, which look like this
So there are sales values per Team Member, but NO period inside.
The period is coded in the Path like:
AllData\201501\Revenues.txt
AllData\201502\Revenues.txt
AllData\201503\Revenues.txt
I want the period from the path on each data row, so my final output table should look like this:
So I need to bring the period from the path into the rows of the file somehow.
The question of how to access the path is solved, with a perfect example, here:
How can I save a path criteria when I import from folders?
But there the period is still attached to the "whole" text, not to each row.
In the linked question you can change the custom column formula from:
Text.FromBinary([Content])
to
Text.Split(Text.FromBinary([Content]), "#(000a)")
(depending on how line breaks are represented, you may need to use "#(000d)#(000a)" instead, i.e. a carriage return followed by a line feed, as in Windows text files).
This will split the text at each new line, and you'll get a list of the name;value pairs. Click on the box with the two arrows next to the column name to expand the column. Each row should now have the period associated with the name;value pair. Finally, split the column by delimiter on the semicolon to separate the name from the value.
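The same steps (take the period from the folder name, split the file into lines, split each line on the semicolon) can be sketched in Python for comparison. This is only an illustration under assumptions: the files are taken to hold name;value lines without a header row, and the team member names and amounts in the demo are invented.

```python
import tempfile
from pathlib import Path

def rows_with_period(root, filename="Revenues.txt", delimiter=";"):
    """Return (period, name, value) rows, with the period taken from the folder name."""
    rows = []
    for path in sorted(Path(root).glob(f"*/{filename}")):
        period = path.parent.name  # e.g. "201501" for AllData/201501/Revenues.txt
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue  # ignore empty lines
            name, value = line.split(delimiter, 1)
            rows.append((period, name.strip(), value.strip()))
    return rows

# Demo with invented data in a temporary AllData folder
with tempfile.TemporaryDirectory() as tmp:
    samples = {"201501": "Anna;100\nBen;250\n", "201502": "Anna;120\n"}
    for period, content in samples.items():
        folder = Path(tmp) / "AllData" / period
        folder.mkdir(parents=True)
        (folder / "Revenues.txt").write_text(content, encoding="utf-8")
    for row in rows_with_period(Path(tmp) / "AllData"):
        print(row)
```

splitlines() copes with both LF and CRLF line endings, so the line-break question from the formula above does not come up here.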
There are 2 options; both involve horrible-looking formulas.
First option: we assume the period is always in the same position in the path string.
For the example, we want the number between the 1st and 2nd backslashes.
=TRIM(LEFT(SUBSTITUTE(MID(A1,FIND("|",SUBSTITUTE(A1,"\","|",1))+1,LEN(A1)),"\",REPT(" ",LEN(A1))),LEN(A1)))
If it's between a different set of slashes, alter the ,1 to tell the formula which slash to start from. If the number of slashes can be different, then we will have to try for the second option.
Second option, we assume that those are the only numbers in the path.
This formula will extract those numbers:
=SUMPRODUCT(MID(0&A1,LARGE(INDEX(ISNUMBER(--MID(A1,ROW($1:$25),1))* ROW($1:$25),0),ROW($1:$25))+1,1)*10^ROW($1:$25)/10)
Note that this will extract all the digits from the string. If the path contains other numbers, these end up in the result as well, e.g. C:\2014Data\201401\Revenues.txt would return 2014201401
If this doesn't take care of it, then it may be easier to add the period column to the table yourself.
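For comparison, both options are short in a scripting language. The Python sketch below is only an illustration: the function names are made up, and the second function assumes a period is a path segment of exactly six digits (YYYYMM), which goes beyond what the formulas above assume.

```python
import re
from pathlib import PureWindowsPath

def period_by_position(path, index=1):
    """Option 1: take the path segment at a fixed position (0-based)."""
    return PureWindowsPath(path).parts[index]

def period_by_pattern(path):
    """Option 2 (stricter): first segment made of exactly six digits, else None."""
    for part in PureWindowsPath(path).parts:
        if re.fullmatch(r"\d{6}", part):
            return part
    return None

print(period_by_position(r"AllData\201501\Revenues.txt"))
# 201501
print(period_by_pattern(r"C:\2014Data\201401\Revenues.txt"))
# 201401
```

Matching a whole segment rather than collecting every digit avoids the 2014201401 problem described above, at the price of the six-digit assumption.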
