I've got some files that are too big to open directly in Sublime Text. Is there any way to open only the first n lines? Something like head in bash? Thanks
If you're on Linux or Mac, or have Cygwin, Git Bash, or similar installed on a Windows machine, check out the split utility, which is part of the coreutils package. It does exactly what it says: it splits input into separate files. It is configurable via command-line options, like every Unix utility. For example, if you wanted to split your input file into separate 10,000-line files starting with notsobigfile and using numeric suffixes ending with .txt, you would run
split -d -l 10000 --additional-suffix=".txt" reallybigfile.txt notsobigfile
and it would output files named notsobigfile00.txt, notsobigfile01.txt, etc. If this would generate more than 100 files (00 through 99), just add -a x where x is the number of digits (the default is 2).
For all the possible options, just read the man page:
man split
If you only want to output the first part of the file, check out the options for the -n/--number flag.
To figure out how many lines your input file has, run the word counting utility using the lines option:
wc -l reallybigfile.txt
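Since the question asks for something like head, a minimal sketch of that route may also be useful. The output file names and line counts below are placeholders, and the sample input is generated only so the snippet runs on its own:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the real large file: 50,000 numbered lines.
seq 1 50000 > reallybigfile.txt

# Copy only the first 10,000 lines into a smaller file for the editor.
head -n 10000 reallybigfile.txt > firstpart.txt

# A window from the middle (lines 20,001 to 30,000) can be cut the same way.
tail -n +20001 reallybigfile.txt | head -n 10000 > middlepart.txt

wc -l < firstpart.txt
```

Both commands stream the input, so the whole file never has to fit in memory.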
Does anybody know the command to remove the header from a ppm file in Linux? I've tried this already
head -n 4 Example.ppm > header.txt
tail -n 5+ Example.ppm > body.bin
It tells me that "Tail" could not be found.
Most ppm files use newlines in the header so your first command is fine. However, the rest of the file is binary, so:
head -n 4 Example.ppm > header.txt
filesize=$(wc -c < header.txt)
dd if=Example.ppm of=body.bin bs=1 skip=$filesize
You should have /bin/tail if you have /bin/head; both are in the coreutils RPM package.
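If tail is available, a byte offset can replace the dd call. The following is only a sketch under the question's assumption that the header is exactly four lines long; the tiny Example.ppm built here is a made-up stand-in so the snippet is runnable:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny stand-in PPM: a four-line header followed by 12 bytes of pixel data.
printf 'P6\n# sample\n2 2\n255\n' > Example.ppm
printf 'ABCDEFGHIJKL' >> Example.ppm

head -n 4 Example.ppm > header.txt

# Redirecting into wc keeps the file name out of its output.
header_bytes=$(wc -c < header.txt)

# tail -c +N starts at byte N (counting from 1), so begin one past the header.
tail -c +"$((header_bytes + 1))" Example.ppm > body.bin
```

Unlike dd with bs=1, this does not copy the body one byte at a time.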
The format of a ppm(5) file (http://netpbm.sourceforge.net/doc/ppm.html) is awkward to use with the line-based head/tail/sed family. The documentation describes fields separated by whitespace that is not necessarily a line break.
You will need to: 1) Ignore comments from '#' to end of line; and 2) process the remainder one field (not column, not line) at a time. Using awk(1) could be an option here.
Check the documentation (http://netpbm.sourceforge.net/doc/directory.html) for a list of conversion programs. You may find one that converts the PPM file into a form better suited to whatever usage is your ultimate goal.
I'm trying to display the output of an AWS lambda that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
For cases where it is possible to control the name of the temporary text file, and the file is not used by other code, it is possible to pass "/dev/stdout" as the name of the output file.
Regarding portability: see stack exchange how portable ... /dev/stdout
POSIX 7 says they are extensions.
Base Definitions,
Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example, /dev/stdin, /dev/stdout, and /dev/stderr)
Using /dev/tty, whose support is mandatory, will force the output onto the “current” terminal, making it impossible to pipe the output of the whole command into a different program (or a log file), or to use the program when there is no connected terminal (a cron job or other automation tools).
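To make the /dev/stdout idea concrete, here is a small sketch. write_result is a hypothetical stand-in for any tool that insists on writing to a named file; it is not part of the original command:

```shell
# Hypothetical stand-in for a tool that writes its result to a named file.
write_result() {
    printf '{"status": "ok"}\n' > "$1"
}

# With a temporary file: two extra commands to show it and remove it.
write_result output.json && cat output.json && rm output.json

# With /dev/stdout as the "file": nothing is left on disk to clean up,
# and the output can still be piped into another program.
write_result /dev/stdout | wc -c
```

This relies on /dev/stdout being provided by the system, which, as quoted above, is an extension rather than a POSIX requirement.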
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient, as it would require removing characters from the beginning of the file each time you read a line. Current filesystems are pretty good at truncating data at the end of a file, but not at the beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
head -1 output.json
sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d it will copy the whole content of the file but the first line into a temporary file, resulting in approximately 0.5*n² lines written in total (where n is the number of lines in your file).
In theory you could avoid this by doing something like this:
while [ -s output.json ]
do
line=$(head -1 output.json)
printf -- '%s\n' "$line"
fallocate -c -o 0 -l $((${#line}+1)) output.json
done
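But this does not account for variable newline characters (namely DOS-formatted newlines), fallocate does not always work on xfs, and collapsing a range typically requires the offset and length to be multiples of the filesystem block size, among other issues.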
Since you are trying to consume a file alongside its creation without leaving a trace of its existence on disk, you are essentially asking for the functionality of a pipe. In my opinion you should look into how your output.json file is produced; hopefully you can pipe it to a script of your own.
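As a rough illustration of the pipe approach, the sketch below uses two hypothetical functions, produce_json and consume, in place of whatever currently writes and reads output.json:

```shell
# Hypothetical producer standing in for whatever writes output.json today.
produce_json() {
    printf '{"id": 1}\n{"id": 2}\n'
}

# Hypothetical consumer: handles the stream line by line as it arrives.
consume() {
    while IFS= read -r line; do
        printf 'got: %s\n' "$line"
    done
}

# No intermediate file is created, so there is nothing to delete afterwards.
produce_json | consume
```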
I have a 17 GB txt file and I cannot seem to load it in vim. I have researched the solutions provided here, but I do not understand them very well, and I am not good with Linux or Perl.
I understand I would have to use grep or something similar.
grep -oP "/^2" file
I have got as far as this command, but I cannot find a way to output the number of occurrences without printing all the lines to the screen.
I would like to find the number of lines that start with the digit 2 in the file and output that number to the shell.
If you want to continue using PCRE:
grep -cP ^2 file
Using grep's "basic regular expressions":
grep -c ^2 file
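A quick way to convince yourself that -c counts matching lines rather than printing them is to try it on a small sample first. The file contents here are invented for the illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Small stand-in for the 17 GB file.
printf '2 apples\n12 pears\n25 plums\nno digits\n2\n' > file

# Quote the pattern so the shell never interprets the caret.
count=$(grep -c '^2' file)
printf '%s\n' "$count"
```

Only the count is written to the terminal, and grep reads the file as a stream, so the file size is not a problem for memory.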
I have a file with contents like below.
7f22cebc9330
600e98
7fff1814ff50
7f22cebc95c0
7f22cebc95b8
4002a8
7f22cebc95bc
You can see that some have 12 characters (e.g. 7f22cebc9330), and some have six (e.g. 600e98).
How can I edit this file such that only lines with 12 characters are kept in the file, removing all the lines that are NOT of 12 characters length ?
So that my new file would look like this:
7f22cebc9330
7fff1814ff50
7f22cebc95c0
7f22cebc95b8
7f22cebc95bc
I mean by using shell command in linux.
Thanks.
awk 'length() == 12' input.file > output.file
There are several tools that will allow you to edit the file directly (GNU sed, perl, etc.), but doing so is a mistake. Write the output to a new file, and use the shell to rename it if necessary.
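The "write to a new file, then rename" advice could look like the sketch below. The temporary name input.file.tmp is an arbitrary choice, and the sample data is a shortened copy of the question's:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '7f22cebc9330\n600e98\n7fff1814ff50\n4002a8\n7f22cebc95bc\n' > input.file

# Write the filtered lines to a new file first, and replace the original
# only if awk succeeded.
awk 'length() == 12' input.file > input.file.tmp && mv input.file.tmp input.file

cat input.file
```

If awk fails partway, the original file is left untouched because mv never runs.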
I am using the split command on Linux to split a file into n files by record count, so that each output file gets one million records.
The command that I am using is
split -a 3 --numeric-suffixes -l 1000000 - 20141113_File.txt.
This command creates n files named 20141113_File.txt.000, 20141113_File.txt.001,...20141113_File.txt.010
What I am looking for is for the first file to start with the suffix 001, not 000, like
20141113_File.txt.001, 20141113_File.txt.002,......20141113_File.txt.011
I am able to achieve the same with GNU Coreutils version 8.2.1.
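One possible direction, assuming a GNU split recent enough to accept a starting value in the form --numeric-suffixes=FROM (older releases and non-GNU implementations do not have it). The input is a generated 25-line stand-in and the chunk size is reduced to 10 lines so the effect is easy to see:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 25 input lines, 10 per output file, so three files are expected.
# --numeric-suffixes=1 asks GNU split to start numbering at 1 instead of 0.
seq 1 25 | split -a 3 --numeric-suffixes=1 -l 10 - 20141113_File.txt.

ls
```

If the installed split rejects the =1 form, renaming the files after the split is a fallback.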