How to identify line endings on a large number of files

Given a medium-size tree of files (a few hundred), is there some utility that can scan the whole tree (recursively) and display the name of each file and whether the file currently contains CRLF, LF, or mixed line terminators?
A GUI that can both display the current status and also selectively change specific files is preferred, but not essential.
Also prefer a solution for Windows, but I have access to both Bash for Windows and a Linux box that has access to the same file tree, so I can use something Linux-y if necessary.

Related Question: https://unix.stackexchange.com/questions/118959/how-to-find-files-that-contain-newline-in-filename
You can use find on Linux to search recursively for filenames that contain newline or carriage-return characters:
find . -name $'*[\n\r]*'
From there you can proceed to do what you need to do.
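The find command above matches file names, not file contents. For the question as asked (which terminator each file uses), a minimal Python sketch could look like the following. The function names classify_line_endings and scan_tree are illustrative, and the sketch assumes every file in the tree is small enough to read into memory and should be treated as text.

```python
import os


def classify_line_endings(data: bytes) -> str:
    """Return 'CRLF', 'LF', 'CR', 'mixed', or 'none' for a file's raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # bare LF, not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # bare CR (old Mac style)
    kinds = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def scan_tree(root: str):
    """Yield (path, classification) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:   # binary mode: no newline translation
                yield path, classify_line_endings(fh.read())
```

Calling scan_tree(".") and printing each pair gives one line per file. Because it reads bytes directly, the same script should behave identically from Windows, Bash for Windows, or the Linux box.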

Related

How to move lots of dotfiles staying at /home without breaking programs?

With more and more programs installed on my computer, I am tired of seeing lots of dotfiles while I have to access them often. For some reason I won't hide dotfiles when browsing files. Is there a way to move them to a better place I want them to stay (e.g. ~/.config/$PROGCONF) without affecting programs while running?
Symlinks still leave file symbols, which is far from my expectation. I expect that operations like listdirs() won't show the files while opening them uses a redirection.

"For some reason it won't hide dotfiles when browsing files.":
That depends on the file manager you use. Nautilus hides them by default, and most file managers have an option to "show/hide hidden files". The ls command omits hidden files (files starting with a dot) by default and lists all files when given the -a option.
"Is there a way to move them to a better place":
Programs that support the XDG Base Directory Specification can store their config files in ~/.config/$PROGRAM_NAME/. If a program doesn't support that and expects its config file in the home directory, there is little you can do (maybe you can give us a list of the programs whose config files you want to move). The process differs for each program.
Let me give an example with vim. Its config file is ~/.vimrc. Let's say you move the file to ~/.config/vim/.vimrc. You can make vim read it by launching vim with the following command.
vim -u ~/.config/vim/.vimrc
You can modify the .desktop entry or create a new shell script to launch vim using the above command and put it inside /usr/local/bin/ or create shell functions / aliases. You can read more about changing vim's config file location in this SO question.
This Arch Wiki article has application-specific information.
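To illustrate how an XDG-aware program typically decides where its config lives, here is a rough Python sketch. The helper name xdg_config_dir is made up for this example. It assumes the usual convention that XDG_CONFIG_HOME falls back to ~/.config when unset or empty; real programs may add their own rules on top.

```python
import os


def xdg_config_dir(program, environ=None):
    """Return the per-program config directory under the XDG convention."""
    env = os.environ if environ is None else environ
    # An unset or empty XDG_CONFIG_HOME falls back to $HOME/.config
    base = env.get("XDG_CONFIG_HOME") or os.path.join(
        env.get("HOME", os.path.expanduser("~")), ".config"
    )
    return os.path.join(base, program)
```

A program written this way never looks in the home directory itself, which is why its dotfiles can live under ~/.config without any tricks.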
"without affecting programs while running":
It depends on a few factors namely the file system used, the program we are dealing with and so on.
Generally, deleting / moving files only unlinks the file name from an inode and programs read / write files using inodes. Read more here. And most programs read the config file at the start, load the values into memory. They rarely read the config files again. So, if you move your config file while the program is running (assuming the program supports config in both places), you won't see a difference until the program is restarted.
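The inode behaviour described above can be observed with a small experiment. This is only a sketch, assuming a POSIX file system (on Windows the rename of an open file would normally fail). The file names app.conf and moved.conf are arbitrary.

```python
import os


def read_after_move(directory):
    """Open a file, rename it while it is open, then read through the old handle."""
    old = os.path.join(directory, "app.conf")
    new = os.path.join(directory, "moved.conf")
    with open(old, "w") as fh:
        fh.write("color=blue\n")

    handle = open(old)           # the "program" opens its config file
    try:
        os.rename(old, new)      # the user moves the file in the meantime
        return handle.read()     # the handle still refers to the same inode
    finally:
        handle.close()
```

The read succeeds even though the original name no longer exists, which matches the point that a running program is unaffected until it tries to open the old path again.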
"I expect that operations like listdirs() won't show the files"
I am assuming you are talking about os.listdir() in Python. If the files are present, os.listdir() will list them; there is little you can change about that. But you can write a custom function that omits hidden files from the listing.
This SO question can help with that.
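A custom function of that kind might be as small as this sketch (list_visible is a name chosen for the example, not an existing API):

```python
import os


def list_visible(path="."):
    """Like os.listdir(), but skip entries whose names start with a dot."""
    return sorted(name for name in os.listdir(path) if not name.startswith("."))
```

This only changes what your own code sees. Other programs that call os.listdir() directly will still get the dotfiles.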

Batch replacing unidentified Characters in Unix that were created by macOS

On a Linux volume as part of a NAS with many TB of data some files were created from macOS and some of those files uploaded from macOS seem to include characters in filenames that cannot be reproduced via FTP or SMB file protocol. These files will appear as e.g. "picture_name001.jpg". Where the "" probably stands for a colon or slash.
I can search for "" and found out it applies to 2171 files in distributed locations on the volume. Way too much to manually find and correct each file name.
I thought I can connect to the NAS via SSH and simply loop through each directory doing an automated replace of the "" into "_", but this doesn't work because:
for file in **; do mv -- "$file" "${file///_}"; done
this attempt will throw back an error on the first item matching  with:
mv: can't rename '120422_LAXJFK': No such file or directory
So obviously this substitute character displayed as "" is not the way to address the file or directory as it refers to a name that doesn't actually exists in the volume index.
(A) How do I find out if "120422_LAX:JFK" or "120422_LAX/JFK" is meant here, and (B) how do I escape these invalid characters to eventually be able to automatically rename all those names to for example "120422_LAX_JFK"?
Is there for example a way to get a numerical file ID from the name and then instruct to rename the file by number in case its name contains ""?

I think the problem is that this "" can stand for different character codes. When the system can't represent a character (for example, because the encoding in use is not supported), it automatically shows a default substitute instead (in your case ""). The file name still contains the real character code, though. So when you run for file in **; do mv -- "$file" "${file///_}"; done the system can't tell which code the "" you typed is supposed to stand for.
I think this problem can be solved by making the character encodings on both devices (Mac and NAS) compatible, ideally identical.
Hope this helps.
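If changing the encoding is not an option, one way to see what is really in the names, and to rename by the actual code points rather than by the displayed substitute, is sketched below. This assumes Python 3 is available on the NAS or on a machine that mounts the volume. It also assumes the hidden character is a private-use, control, unassigned, or undecodable code point, which is a guess about this particular setup. The names describe, sanitize_name, and rename_tree are illustrative.

```python
import os
import unicodedata

# Unicode categories treated as "not a real filename character" (an assumption):
# Co = private use, Cs = surrogate (undecodable byte), Cc = control, Cn = unassigned
SUSPECT = {"Co", "Cs", "Cc", "Cn"}


def describe(name):
    """List the code points of all non-printable-ASCII characters in a name."""
    return [f"U+{ord(ch):04X}" for ch in name if ord(ch) < 32 or ord(ch) > 126]


def sanitize_name(name, replacement="_"):
    """Replace every suspect character with the replacement string."""
    return "".join(
        replacement if unicodedata.category(ch) in SUSPECT else ch for ch in name
    )


def rename_tree(root, dry_run=True):
    """Return (old, new) pairs; rename only when dry_run is False."""
    planned = []
    # Bottom-up, so a directory is renamed only after its contents
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = sanitize_name(name)
            if fixed != name:
                src = os.path.join(dirpath, name)
                dst = os.path.join(dirpath, fixed)
                planned.append((src, dst))
                if not dry_run and not os.path.exists(dst):
                    os.rename(src, dst)
    return planned
```

Running describe() on one affected name should answer part (A) by showing which code point is actually stored. Running rename_tree() with dry_run=True first lets you review the planned renames before touching 2171 files.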

Linux terminal script to create boilerplate files in current working directory with one varying word?

I have to create two boilerplate files, both of which always have the same content, with the EXCEPTION of a single word. I'm thinking of creating a command or something that I can run in the Linux terminal (Ubuntu), along with an argument that represents the one word which can vary in the files created. Perhaps a batch file will accomplish this, but I don't know what it will look like.
I will be able to run this command every time I create these boilerplate files, instead of pasting the boilerplate and changing the one word in the file that has to be changed.
These file paths relative to my current working directory are:
registration.php
etc/module.xml

A simple Python script that reads in the file as string and replaces the occurrence would probably be the quickest. Something like:
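with open('somefile.txt', 'r+') as inputFile:
    # Read the whole file and swap the placeholder word
    txt = inputFile.read().replace('someword', 'replacementword')
    inputFile.seek(0)
    inputFile.write(txt)
    inputFile.truncate()  # drop leftover bytes if the new text is shorter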
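Building on the same replace idea, a script could also create both files from scratch in the current directory. The sketch below is only an outline. The template contents are hypothetical stand-ins (the real boilerplate is not shown in the question), and __WORD__ is an arbitrary marker chosen so it does not clash with PHP's $ variables.

```python
import os

PLACEHOLDER = "__WORD__"

# Hypothetical templates: paste the real boilerplate here and put
# __WORD__ wherever the varying word belongs.
TEMPLATES = {
    "registration.php": "<?php\n// boilerplate for __WORD__\n",
    os.path.join("etc", "module.xml"): "<!-- boilerplate for __WORD__ -->\n",
}


def create_boilerplate(word, base_dir="."):
    """Write each template under base_dir with the placeholder replaced."""
    created = []
    for rel_path, template in TEMPLATES.items():
        path = os.path.join(base_dir, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)  # creates etc/ if needed
        with open(path, "w") as fh:
            fh.write(template.replace(PLACEHOLDER, word))
        created.append(path)
    return created
```

Saved as a script that calls create_boilerplate(sys.argv[1]), it could be run from any working directory with the varying word as its single argument.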

Linux directory starting with dot

Is there anything special about directories which start with a dot . in Linux (Ubuntu), such as ~/.vim?
Thanks.

Files and directories whose names begin with a dot (.) are not displayed by default in directory listings by the standard command ls. Therefore, they are traditionally used to store settings, preferences, etc. The directory ~/.vim in particular contains personal preferences and settings for the text editor vim.
There are also two special directory names in this class: the directory named simply . is an alias for the directory in which it appears (a self-reference), and the directory named .. refers to that directory's parent.
Many graphical file browsers ignore the convention of hiding file names beginning with a ., so it is not necessarily correct any longer to call these files "hidden". Nevertheless, that terminology persists.

In UNIX-like environments, a filename that begins with a dot indicates a hidden file. It's mainly a mechanism to decrease clutter in directory listings. You can get a listing that includes hidden files by passing the -a option to ls.

Those are hidden. You'd need to apply extra effort to see them.

Append text file with custom footer

Good day,
I am a CNC programmer, not a computer programmer. I am using CAM software to make cutting programs for our CNC router. The router is a bit old and can only take files of 200-300 KB. We are doing carvings that require text files of 1-2 MB. I am using a program called GSplit ( http://www.gdgsoft.com/gsplit/ ) to divvy up the text file. It generates 10-25+ files with a custom header that our machine can read. All the files are great and it works, but I have to manually add the closing lines/footer to each file. The files that are created and used are normal .txt files but with a specific extension, .ANC.
Is there any way to automate this process of opening each individual file, scrolling to the end and copy/pasting the same 1-2 lines of code? The files are NAME[number].ANC in a contained folder. Would it be possible to just direct to a folder and say "add this 'text' to every file in this folder"?
Thanks for your time.

What OS are you using? On Unix you can do this with a simple loop on the command line. If you are in the directory that contains the files, run:
for file in *; do echo "APPEND THIS" >> "$file"; done
If you are running Windows you should be able to do the same using Cygwin (you could probably also use PowerShell, but I don't know anything about that).
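For a version that runs the same way on Windows and Unix, the loop could be written in Python along these lines. This is a sketch. The function name append_footer and the default pattern *.ANC are assumptions based on the file naming in the question, and the footer text is whatever closing lines the machine needs.

```python
import glob
import os


def append_footer(folder, footer, pattern="*.ANC"):
    """Append footer to every file in folder that matches pattern."""
    changed = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, "rb") as fh:
            data = fh.read()
        # Start the footer on its own line if the file lacks a final newline
        prefix = "" if not data or data.endswith((b"\n", b"\r")) else "\n"
        with open(path, "a", newline="") as fh:   # newline="": write text as given
            fh.write(prefix + footer)
        changed.append(path)
    return changed
```

Restricting the pattern to *.ANC keeps the script from touching any other files that happen to sit in the same folder. Running it twice appends the footer twice, so it should be run once per batch.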

I found a program, Notepad++ (apparently I'm the last person to find it...). I used the find/replace files option with a regular expression (not sure exactly what these are, but I'm sure you guys do), "\s+\z", as the search pattern. It finds the trailing whitespace at the end of each file and then adds the code I need. Easy, free, and I don't need to write any computer code. Thanks for the attempt to help me, Dirkk! :)