How to split a messed-up dump back into separate files? - linux

I have a ZIP file that I have not managed to unzip for some reason; it fails with "invalid or incomplete multibyte or wide character". So I dumped it with unzip -p myfile.zip > Messed.data, and now I want to split the dump back into the individual files with a script. My plan:
unzip -l to get each file's size.
dd ibs=1 skip=$((sum of the preceding files' sizes)) count=$((this file's size))
I tried this and found it unbearably slow.
So I'm asking for any help with this. Thank you.
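A possible way to speed this up (a sketch, not a tested answer): GNU dd understands iflag=skip_bytes,count_bytes, so you can address byte offsets while keeping a large block size instead of ibs=1. The offset and size values below are placeholders to be filled in from unzip -l:
# carve one file out of the dump by byte offset (GNU dd only)
offset=123456   # sum of the uncompressed sizes of all preceding files
size=78901      # uncompressed size of the file to extract
dd if=Messed.data of=extracted.bin bs=4M \
   iflag=skip_bytes,count_bytes skip="$offset" count="$size"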

Related

Splitting large tar file into multiple tar files

I have a tar file which is 3.1 TB (terabytes).
File name: Testfile.tar
I would like to split this tar file into two parts: Testfile1.tar and Testfile2.tar.
I tried the following so far:
split -b 1T Testfile.tar "Testfile.tar"
What I get is Testfile.taraa (what is "aa"?).
And I just stopped my command. I also noticed that the output Testfile.taraa doesn't seem to be a tar file when I do ls in the directory. It seems like it is a text file. Maybe once the full split is completed it will look like a tar file?
The behavior from split is correct; from the online man page (http://man7.org/linux/man-pages/man1/split.1.html):
Output pieces of FILE to PREFIXaa, PREFIXab, ...
Don't stop the command; let it run, and then you can use cat to concatenate (join) the pieces back together again.
Examples can be seen here: https://unix.stackexchange.com/questions/24630/whats-the-best-way-to-join-files-again-after-splitting-them
split -b 100m myImage.iso
# later
cat x* > myImage.iso
UPDATE
Just as clarification, since I believe you have not understood the approach: you split a big file like this to transport it, for example; the pieces are not usable on their own. To use the file again you need to concatenate (join) the pieces back together. If you want usable parts, then you need to decompress the file, split its contents into parts, and compress each of them separately. With split you basically cut the binary file, so I don't think you can use those parts by themselves.
You are doing the compression first and the partitioning later.
If you want each part to be a tar file, you should use 'split' first on the original file, and then 'tar' each part.
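A minimal sketch of that last suggestion, with made-up names (Testfile.dat, part_ and the 1T piece size are just placeholders, and "the original file" is taken to mean the data before it was tarred):
# cut the original data into ~1 TiB pieces
split -b 1T Testfile.dat part_
# wrap each piece in its own tar archive so every part is a valid tar file
for p in part_*; do
    tar -cf "$p.tar" "$p"
done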

Remove split files after they have been extracted

I have a request and a problem.
I archived my files like this:
tar cvpf - /to_arch | gzip -c | split -b 10000m - /arch/to_arch.gz_
This is the command I used. The archive holds my system and I need to move it to another server.
On the new server I don't have enough space to keep the archive pieces and extract them at the same time, so I had an idea.
Can someone help me write a bash script that removes the pieces as they are extracted?
The pieces are named to_arch.gz_aa, to_arch.gz_ab, to_arch.gz_ac, to_arch.gz_ad, etc.
Once the aa piece has been extracted, the script should delete it.
cat *.gz* | tar zxvf - -i
Normally I would extract it like that, but I don't have enough space on the disk.
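One sketch of how the cleanup could work (an assumption, not a tested recipe): stream the pieces into tar in a loop and delete each one as soon as cat has finished reading it; at that point its bytes are already in the pipe or consumed by tar, so the space can be freed piece by piece:
#!/bin/bash
# stream all pieces into tar in name order, deleting each piece once it
# has been fully read, so the disk is freed as extraction proceeds
for p in /arch/to_arch.gz_*; do
    cat "$p"
    rm -f "$p"    # this piece has already been streamed
done | tar zxvf - -i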

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time, although I am in the process of getting a computer to do this. Is it possible to unzip only parts of the file to begin testing my scripts?
I am trying to extract a specific SNP at a position for a subset of the samples. I have tried using bcftools to no avail (if anyone can identify what went wrong with that, I would also really appreciate it). I created an empty file for the output (722g.990.SNP.INDEL.chrAll.vcf.bgz), but it returns the following error:
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.bgz
The output type "722g.990.SNP.INDEL.chrAll.vcf.bgz" not recognised
I am planning on trying awk, but need to unzip the file first. Is it possible to partially unzip it so I can try this?
Double check your command line for bcftools view.
The error message 'The output type "something" not recognised' is printed by bcftools when you specify an invalid value for the -O (upper-case O) command line option, like this: -O something. Based on the error message you are getting, it seems that you might have put the file name there.
Check that you don't have your input and output file names the wrong way around in your command. Note that the -o (lower-case o) command line option specifies the output file name, and the file name at the end of the command line is the input file name.
Also, you write that you created an empty file for the output. You don't need to do that, bcftools will create the output file.
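Put together, the corrected call might look something like this (a sketch: the output name dogs_subset.vcf.gz is made up, and -r only works if the input .vcf.gz is bgzip-compressed and indexed with tabix or bcftools index):
bcftools view -f PASS --threads 8 \
    -r chr9:55252802-55252810 \
    -O z -o dogs_subset.vcf.gz \
    722g.990.SNP.INDEL.chrAll.vcf.gz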
I don't have that much experience with bcftools, but generically, if you want to use awk to manipulate a gzipped file, you can pipe into awk so that the file is only decompressed as needed; you can also pipe the result directly through gzip so it too ends up compressed, e.g.
gzip -cd largeFile.vcf.gz | awk '{ <some awk> }' | gzip -c > newfile.txt.gz
Also, zcat is an alias for gzip -cd: -c reads from / writes to the standard streams, and -d decompresses.
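For instance, a sketch of that same pipe applied to the region from the question (assuming standard VCF columns, i.e. $1 is the chromosome and $2 the position, and that the chromosome really is named chr9 in this file):
gzip -cd 722g.990.SNP.INDEL.chrAll.vcf.gz \
  | awk '/^#/ || ($1 == "chr9" && $2 >= 55252802 && $2 <= 55252810)' \
  | gzip -c > chr9_region.vcf.gz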
As a side note, if you are trying to perform operations on just part of a large file, you may also find the excellent tool less useful. It can be used to view your large file while loading only the needed parts; the -S option is particularly useful for wide formats with many columns, as it stops line wrapping, and -N shows line numbers.
less -S largefile.vcf.gz
Quit the view with q; g takes you to the top of the file.

Linux Bash auto command, source text file

Thank you for your concern.
I'm a noob trying to do a bulk IP lookup with the [geoiplookup -f GeoLiteCity.dat] command.
I have more than 700 IPs to look up, saved as c.txt (in the same folder).
How can I make a bash shell script for this? I've already made one, and all I got was:
sudo: unable to execute ./ok.sh: No such file or directory
Here is my script.
It would also be OK to use another language.
To make it clearer, I want to run:
[geoiplookup -f GeoLiteCity.dat IP1]
[geoiplookup -f GeoLiteCity.dat IP2]
...
[geoiplookup -f GeoLiteCity.dat IP700]
and save the results as one text file (which would be 700 rows).
I'm Korean, so sorry for my poor English, but I couldn't find anything in my language about how to do this. I'll really appreciate any help; otherwise I'll have to look them up one by one until Friday (and the internet speed is extremely slow at my company).
Please help me. I will pray for you every Sunday morning. Thank you.
I found a very simple answer with a DuckDuckGo search for 'iterate through each line of file bash':
stackoverflow.com/questions/1521462/looping-through-the-content-of-a-file-in-bash
#!/usr/bin/bash
printf "\n\n"
# read the list of IPs one per line and look each one up
while read -r ip; do
    echo "LOOKING UP IP $ip"
    geoiplookup -f GeoLiteCity.dat "$ip"
    printf "\n\n"
done < ipaddresses.txt
Save it as iplookup.sh and run it, without 'sudo':
bash iplookup.sh
Tested and working. Be sure to rename your file 'c.txt' to 'ipaddresses.txt'! The 'ipaddresses.txt' file must also be in the same directory.
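Since the question asks for all 700 results in a single text file, you can simply redirect the script's output (results.txt is just an example name):
bash iplookup.sh > results.txt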

Unzipping file without first directory

I want to extract some files, e.g. test.zip, to /path/to/folder. Using Archive::Extract and specifying a "to" in extract, I can extract it to /path/to/folder, but it ends up in /path/to/folder/test. The same goes for using the system unzip/gunzip.
I don't want to unzip -j, I want to keep the subdirectories.
Is there a way to do this that does not involve changing into /path/to/folder/test and running cp -rf * ../? Either by a system command or in Perl...
Thanks for reading. :)
You might prefer Archive::Zip:
Archive::Zip->new( 'test.zip' )->extractTree( '', '/path/to/folder' );
