Merging Two Text Files into One CSV File - text

I'm working with a Windows batch command to create a list of file paths and file names (without the extension) for processing and archival. I need to make a CSV file that contains the path to each file and its file name.
I was able to use the 'DIR /A-D-S /D /S' command to output the list of file paths, which is filelistA.txt. Then I use a VBScript (makelistB.vbs) to strip the path and extension and save the result as filelistB.txt. I need to merge the two files row for row, with a comma separator in between, and that's where I need some sort of VBScript.
filelistA.txt looks like:
C:\Data\Clients\COLD\AC3060P.txt
C:\Data\Clients\COLD\AC3090P.txt
C:\Data\Clients\COLD\AC3100P.txt
C:\Data\Clients\COLD\AC3150P.txt
C:\Data\Clients\COLD\AC3200P.txt
C:\Data\Clients\COLD\AC3600P.txt
C:\Data\Clients\COLD\AC3652P.txt
C:\Data\Clients\COLD\AC5715P.txt
C:\Data\Clients\COLD\AC5720P.txt
C:\Data\Clients\COLD\AC5725P.txt
filelistB.txt looks like:
AC3060P
AC3090P
AC3100P
AC3150P
AC3200P
AC3600P
AC3652P
AC5715P
AC5720P
AC5725P
I want to make FileListCSV.txt, that looks like this:
C:\Data\Clients\FWBT\COLD\AC3060P.txt,AC3060P
C:\Data\Clients\FWBT\COLD\AC3090P.txt,AC3090P
C:\Data\Clients\FWBT\COLD\AC3100P.txt,AC3100P
C:\Data\Clients\FWBT\COLD\AC3150P.txt,AC3150P
C:\Data\Clients\FWBT\COLD\AC3200P.txt,AC3200P
C:\Data\Clients\FWBT\COLD\AC3600P.txt,AC3600P
C:\Data\Clients\FWBT\COLD\AC3652P.txt,AC3652P
C:\Data\Clients\FWBT\COLD\AC5715P.txt,AC5715P
C:\Data\Clients\FWBT\COLD\AC5720P.txt,AC5720P
C:\Data\Clients\FWBT\COLD\AC5725P.txt,AC5725P
I'm also open to using sed for Windows if that can do all of this in one shot. However, I would imagine this is something that can be whipped up in VBScript in a few minutes.

This Windows batch file will do what you want without the need for the intermediate files.
@ECHO OFF
FOR %%i IN (*.txt) DO ECHO %%~fi,%%~ni
You can get the output of this batch into a text file by redirecting the output like this:
MyBatch.cmd>>Output.txt
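For readers not tied to batch, the same idea (derive the name from the path, so no second file is needed) can be sketched in Python. This is only an illustration and not part of the original answer: the function name path_rows is made up, and PureWindowsPath is used so that Windows-style paths are parsed the same way on any platform. No CSV quoting is applied, so it assumes the paths contain no commas.

```python
from pathlib import PureWindowsPath


def path_rows(paths):
    """Return 'full_path,stem' rows for an iterable of Windows-style paths.

    path_rows is a hypothetical helper name used for illustration only.
    """
    rows = []
    for raw in paths:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        # .stem drops the directory part and the final extension
        rows.append(f"{raw},{PureWindowsPath(raw).stem}")
    return rows


if __name__ == "__main__":
    sample = [
        r"C:\Data\Clients\COLD\AC3060P.txt",
        r"C:\Data\Clients\COLD\AC3090P.txt",
    ]
    print("\n".join(path_rows(sample)))
```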

This could be a job for the SQLite shell.
C:\Temp> sqlite3.exe
create table paths(path text);
create table filenames(fname text);
.import fileListA.txt paths
.import fileListB.txt filenames
.separator ,
.output FileListCSV.txt
select * from paths p join filenames f on f.rowid = p.rowid;
.q
The SQLite shell is a single executable that either works on a persistent SQLite database stored in a file or, when started without a file argument on the command line (as here), creates a database in memory.
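The row-by-row join can also be sketched without a database. The following is a minimal Python sketch, assuming both input files are UTF-8 text with one entry per line and the same number of lines; the function name merge_rows is hypothetical. It raises an error instead of silently truncating when the line counts differ.

```python
import csv
from itertools import zip_longest


def merge_rows(path_a, path_b, out_path):
    """Write line i of path_a and line i of path_b as one CSV row.

    Returns the number of rows written. The UTF-8 encoding is an assumption;
    adjust it to whatever produced the input files.
    """
    with open(path_a, encoding="utf-8") as fa, \
         open(path_b, encoding="utf-8") as fb, \
         open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        count = 0
        for a, b in zip_longest(fa, fb):
            if a is None or b is None:
                raise ValueError("input files have different line counts")
            # Strip only the line ending, keep everything else as-is
            writer.writerow([a.rstrip("\r\n"), b.rstrip("\r\n")])
            count += 1
    return count
```

A call such as merge_rows("filelistA.txt", "filelistB.txt", "FileListCSV.txt") would pair the two lists from the question. The csv module adds quotes only when a field itself contains a comma or a quote character.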

This can be done pretty quickly in EasyMorph using the Append transformation in "Append columns" mode.

Related

how to store multiple files in one file in python?

How can I store multiple files in one file using Python?
I mean my own file format, not a zip or a rar.
For example, I want to create an archive from a folder, but with my own file format (like 'Files.HR').
Or just store the files in one file without any dictionary or file format ('Files', with no file format).
You may want to use "tar" files. In Python, you can use the tarfile module to write files into the archive and then later extract them back into real files.
You do not have to name the file *.tar. You can name it something else related to your specific application, such as naming it Files.HR.
Please see this nice tutorial or read the official docs to see how to use tarfile.
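As a minimal sketch of that suggestion, the snippet below packs a folder into an uncompressed tar archive with an arbitrary name such as Files.HR and reads a member back. The helper names pack, list_members and read_member are made up for illustration; only tarfile and pathlib from the standard library are used. It assumes the archive is written outside the folder being packed.

```python
import tarfile
from pathlib import Path


def pack(folder, archive_path):
    """Store every file under `folder` in one uncompressed tar archive.

    Assumes archive_path lies outside `folder`; otherwise the archive
    would be picked up as one of its own members.
    """
    folder = Path(folder)
    with tarfile.open(archive_path, "w") as tar:
        for item in sorted(folder.rglob("*")):
            if item.is_file():
                # arcname keeps member paths relative to the folder
                tar.add(item, arcname=item.relative_to(folder).as_posix())


def list_members(archive_path):
    """Return the member names stored in the archive."""
    with tarfile.open(archive_path, "r") as tar:
        return tar.getnames()


def read_member(archive_path, name):
    """Return the bytes of one stored file without extracting to disk."""
    with tarfile.open(archive_path, "r") as tar:
        return tar.extractfile(name).read()
```

The file extension plays no role here: tarfile.open looks at the content, so pack("my_folder", "Files.HR") produces a regular tar archive under that name.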

DoCmd.TransferText where delimiter is semicolon and decimal is comma

I'm trying to import a csv file with:
Dim appAccess As Access.Application
Set appAccess = CreateObject("Access.Application")
appAccess.OpenCurrentDatabase (databasePath)
appAccess.DoCmd.TransferText transferType:=acImportDelim, tableName:=dbTableName, Filename:=strPath, hasFieldNames:=True
I'm using a German machine, where the standard delimiter is ; and the standard decimal separator is ,.
If I use those separators, I get an error (the data is not separated correctly).
If I change the separator in the csv file to , and the decimal separator to ., the data is loaded into the database, but the . is ignored and numeric values therefore aren't imported correctly.
I don't have the option to create an import specification in Access manually. Is there a way to do this with VBA?
I created a Schema.ini file, which looks like this:
[tempfile.csv]
Format=Delimited(;)
ColNameHeader=True
DecimalSymbol=","
I saved it in the same folder where the csv file is located.
I still get a runtime error saying field1;field2;... is not a header in the target table. So I'm guessing the method didn't use ; as a delimiter.
If you have a look at the documentation of the DoCmd.TransferText method there exists a parameter SpecificationName which says:
A string expression that's the name of an import or export specification you've created and saved in the current database. For a fixed-width text file, you must either specify an argument or use a schema.ini file, which must be stored in the same folder as the imported, linked, or exported text file.
To create a schema file, you can use the text import/export wizard to create the file. For delimited text files and Microsoft Word mail merge data files, you can leave this argument blank to select the default import/export specifications.
So if you are not able to generate that schema.ini file using the wizard, you can create it yourself in the same folder as the files to import. For documentation on how to build that file, see Schema.ini File (Text File Driver).
I think it should look something like the following:
[YourImportFileName.csv]
Format=Delimited(;)
DecimalSymbol=","
Note that you have to generate one ini file for each CSV file you want to import because the first line is always the name of the import file. So generate the schema.ini, import, delete the ini and start over generating the next ini for the next file.
If you want to generate that ini file with VBA on the fly, have a look at How to create and write to a txt file using VBA.
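The answer points to VBA for this step. Purely to illustrate the "one ini per CSV file" cycle, here is a sketch in Python. The function names schema_text and write_schema are hypothetical, the keys are only the ones quoted in this thread, and the quoted form of DecimalSymbol simply mirrors the snippets above rather than any verified syntax.

```python
from pathlib import Path


def schema_text(csv_name, delimiter=";", decimal_symbol=","):
    """Build the Schema.ini section for one CSV file.

    Keys and quoting are copied from the snippets in this thread;
    they are assumptions, not a checked reference.
    """
    return (
        f"[{csv_name}]\n"
        f"Format=Delimited({delimiter})\n"
        "ColNameHeader=True\n"
        f'DecimalSymbol="{decimal_symbol}"\n'
    )


def write_schema(csv_path):
    """Write Schema.ini next to csv_path and return the ini path."""
    csv_path = Path(csv_path)
    ini_path = csv_path.with_name("Schema.ini")
    # The section header must name the CSV file this ini applies to
    ini_path.write_text(schema_text(csv_path.name))
    return ini_path
```

For several CSV files, the cycle would be: call write_schema for one file, run the import, delete the returned ini path, and repeat for the next file.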

How to load different files into different tables, based on file pattern?

I'm running a simple PySpark script, like this.
base_path = '/mnt/rawdata/'
file_names = ['2018/01/01/ABC1_20180101.gz',
              '2018/01/02/ABC2_20180102.gz',
              '2018/01/03/ABC3_20180103.gz',
              '2018/01/01/XYZ1_20180101.gz',
              '2018/01/02/XYZ1_20180102.gz']
for f in file_names:
    print(f)
So, just testing this, I can find the files and print the strings just fine. Now, I'm trying to figure out how to load the contents of each file into a specific table in SQL Server. The thing is, I want to do a wildcard search for files that match a pattern, and load specific files into specific tables. So, I would like to do the following:
load all files with 'ABC' in the name, into my 'ABC_Table' and all files with 'XYZ' in the name, into my 'XYZ_Table' (all data starts on row 2, not row 1)
load the file name into a field named 'file_name' in each respective table (I'm totally fine with the entire string from 'file_names' or the part of the string after the last '/' character; doesn't matter)
I tried to use Azure Data Factory for this, and it can recursively loop through all files just fine, but it doesn't get the file names loaded, and I really need the file names in the table to distinguish which records are coming from which files & dates. Is it possible to do this using Azure Databricks? I feel like this is an achievable ETL process, but I don't know enough about ADB to make this work.
Update based on Daniel's recommendation
dfCW = sc.sequenceFile('/mnt/rawdata/2018/01/01/ABC%.gz/').toDF()
dfCW.withColumn('input', input_file_name())
print(dfCW)
Gives me:
com.databricks.backend.daemon.data.common.InvalidMountException:
What can I try next?
You can use input_file_name from pyspark.sql.functions
e.g.
withFiles = df.withColumn("file", input_file_name())
Afterwards you can create multiple dataframes by filtering on the new column
abc = withFiles.filter(col("file").like("%ABC%"))
xyz = withFiles.filter(col("file").like("%XYZ%"))
and then use the regular writer for both of them.
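The filtering step amounts to routing each file to a table by a substring of its name. Stripped of Spark, that routing can be sketched in plain Python as follows. The mapping TABLE_FOR_PATTERN and the function route_files are hypothetical names; the table names are taken from the question. Unlike like("%ABC%") on the full path, this sketch matches only against the part after the last '/'.

```python
from collections import defaultdict

# Hypothetical mapping: substring in the file name -> target table
TABLE_FOR_PATTERN = {"ABC": "ABC_Table", "XYZ": "XYZ_Table"}


def route_files(file_names, table_for_pattern=TABLE_FOR_PATTERN):
    """Group file names by the table whose pattern occurs in the base name."""
    routed = defaultdict(list)
    for name in file_names:
        base = name.rsplit("/", 1)[-1]  # part after the last '/'
        for pattern, table in table_for_pattern.items():
            if pattern in base:
                routed[table].append(name)
                break  # first matching pattern wins
    return dict(routed)


if __name__ == "__main__":
    names = [
        "2018/01/01/ABC1_20180101.gz",
        "2018/01/01/XYZ1_20180101.gz",
    ]
    for table, files in route_files(names).items():
        print(table, files)
```

Files that match no pattern are left out of the result, which makes unexpected names easy to spot by comparing counts.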

Qlikview - append data to Excel

I have a qvw file with a SQL query:
Data:
LOAD source, color, date;
select source, color, date
as Mytable;
STORE Data into [..\QV_Data\Data.qvd] (qvd);
Then I export the data to Excel and save it.
I need something that does this automatically instead of me doing it by hand.
I need to run the query every day and automatically send the data to Excel, keeping the old data in Excel and appending the new values.
Can QlikView do that?
For that you would need to create a fairly elaborate macro that runs after a reload, for example from an on-open trigger. If you schedule a Windows task that executes a .bat file calling qlikview.exe with the file path as a parameter and the -r flag for reload(?), you can probably accomplish this... there is a lot of code from similar projects to be found on Google.
I suggest adding this to the loadscript instead.
STORE Table into [..\QV_Data\Data.csv] (txt);
and then open that file in Excel.
If you need to append data, you could concatenate the new data onto the previous data, something like:
Data:
load * from Data.csv;
//add latest data
concatenate(Data)
LOAD source, color, date from ...
STORE Data into [..\QV_Data\Data.csv] (txt);
I assume you have the desktop version, so you don't have access to the QlikView Management Console (if you do, that is obviously the best way).
So, without the Console, you should create a text file containing this command: "C:\Program Files\QlikView\Qv.exe" /r "\\thePathToYourFile\test.qvw". Save this file with the .cmd file extension. After that you can schedule this command file with the Windows Task Scheduler.

Matlab: filesystem, string manipulation and figures saving

In the workspace I have many m-files containing data I'd like to plot.
I have to read them all and save their plot without showing the results (I'll see them after all is done).
Can the last part be done this way?
f = figure('Visible', 'off');
plot(x,y);
saveas(f,'figure.fig');
but I don't want to manually load each m-file where x and y are stored.
So I need a way to explore the filesystem, run these statements for each file, manipulate the file name, and save a jpg with the same name as its m-file.
The dir function will return a structure containing info on the Folders and Files in the current directory
>> FileInfo = dir
Then you need to write code to use that info to automatically navigate the directory structure (using cd for instance), and select the files you want to read.
The function what can also be useful if you're wanting to only look for certain file types, e.g. .mat files.
Not surprisingly, similar questions to this have been asked before, for instance see here
