jLibSvm: how to set up parameters for well-formed SVM training

I'm trying to use jLibSvm to train on biometric data, all of the same type.
There are 5 classes of data, tagged with strings.
I've tried both a MultiClassModel and a BinaryModel, but in both cases any prediction always returns the same result, even when changing the gamma value.
Below is my code for the MultiClassModel:
public static <L, SparseVector> void train(Map<SparseVector, L> examples) {
    MultiClassificationSVM svm = new MultiClassificationSVM(new C_SVC());
    ImmutableSvmParameterPoint.Builder builder = new ImmutableSvmParameterPoint.Builder();
    builder.C = 1.0;
    builder.kernel = new GaussianRBFKernel(0.001);
    builder.eps = 0.1f;
    ImmutableSvmParameter params = builder.build();
    MutableMultiClassProblemImpl problem =
            new MutableMultiClassProblemImpl(String.class, null,
                    examples.size(), new NoopScalingModel());
    problem.examples = examples;
    for (SparseVector v : examples.keySet()) {
        problem.exampleIds.put(v, problem.exampleIds.size());
    }
    model = svm.train(problem, params);
}
And for the BinaryModel:
public static <L> void train(List<SparseVector> exTrue, List<SparseVector> exFalse, String label) {
    C_SVC svm = new C_SVC();
    ImmutableSvmParameterPoint.Builder builder = new ImmutableSvmParameterPoint.Builder();
    builder.C = C;
    builder.kernel = new GaussianRBFKernel(gamma);
    builder.eps = 0.001f;
    ImmutableSvmParameter params = builder.build();
    MutableBinaryClassificationProblemImpl problem =
            new MutableBinaryClassificationProblemImpl(String.class, exTrue.size() + exFalse.size());
    for (SparseVector v : exTrue) {
        problem.addExample(v, label);
    }
    String inverse = new StringLabelInverter().invert(label);
    for (SparseVector v : exFalse) {
        problem.addExample(v, inverse);
    }
    BinaryModel bm = svm.train(problem, params);
    models.put(label, bm);
}
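A common cause of an SVM predicting a single class regardless of gamma is unscaled input features: with an RBF kernel, features on very different scales can make every kernel evaluation nearly identical. Whether that applies here depends on the data, but scaling is worth checking before tuning parameters. A library-independent sketch in plain Java (no jLibSvm types; names are illustrative):

```java
import java.util.Arrays;

public class FeatureScaler {
    // Scales each column of the data matrix to [0, 1] (min-max scaling).
    // Rows are examples, columns are feature dimensions.
    public static double[][] minMaxScale(double[][] data) {
        int rows = data.length, cols = data[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        double[][] scaled = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double range = max[j] - min[j];
                // Constant features map to 0 to avoid division by zero.
                scaled[i][j] = range == 0 ? 0.0 : (data[i][j] - min[j]) / range;
            }
        }
        return scaled;
    }
}
```

The same min and max values computed on the training set must also be applied to any vector passed to predict, otherwise train and test live in different ranges.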

Related

Apache Beam Metrics Counter giving incorrect count using SparkRunner

I have source and target CSV files with 10 million records and 250 columns.
I am running an Apache Beam pipeline which joins all columns from the source and target files.
When I run this on a Spark cluster the pipeline executes correctly with no exceptions, but the join Beam metrics counter returns double the count when the following Spark property is used:
--executor-memory 2g
When I increase the executor memory to 11g it returns the correct count.
I have tried the following example:
Pipeline pipeline = Pipeline.create(options);
final TupleTag<String> eventInfoTag = new TupleTag<>();
final TupleTag<String> countryInfoTag = new TupleTag<>();
PCollection<KV<String, String>> eventInfo =
    eventsTable.apply(ParDo.of(new ExtractEventDataFn()));
PCollection<KV<String, String>> countryInfo =
    countryCodes.apply(ParDo.of(new ExtractCountryInfoFn()));
PCollection<KV<String, CoGbkResult>> kvpCollection =
    KeyedPCollectionTuple.of(eventInfoTag, eventInfo)
        .and(countryInfoTag, countryInfo)
        .apply(CoGroupByKey.create());
PCollection<KV<String, String>> finalResultCollection =
    kvpCollection.apply(
        "Process",
        ParDo.of(
            new DoFn<KV<String, CoGbkResult>, KV<String, String>>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                KV<String, CoGbkResult> e = c.element();
                String countryCode = e.getKey();
                String countryName = "none";
                countryName = e.getValue().getOnly(countryInfoTag);
                for (String eventInfo : c.element().getValue().getAll(eventInfoTag)) {
                  Metrics.counter("count", "errorcount").inc();
                  c.output(
                      KV.of(
                          countryCode,
                          "Country name: " + countryName + ", Event info: " + eventInfo));
                }
              }
            }));
final PipelineResult result = pipeline.run();
MetricQueryResults metrics =
    result
        .metrics()
        .queryMetrics(
            MetricsFilter.builder()
                .addNameFilter(MetricNameFilter.inNamespace("count"))
                .build());
Iterable<MetricResult<Long>> counters = metrics.getCounters();
for (MetricResult<Long> counter : counters) {
    System.out.println("Hi >> " + counter.getName().getName() + " : "
        + counter.getAttempted() + " " + counter.getCommittedOrNull());
}
I need help with this.
Thank you
public static void main(String[] args) {
    Configuration hadoopConf = new Configuration();
    hadoopConf.set("fs.defaultFS", args[13]);
    hadoopConf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
    hadoopConf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
    final TupleTag<Row> sourceDataInfoTag = new TupleTag<Row>(){};
    final TupleTag<Row> targetDataInfoTag = new TupleTag<Row>(){};
    HadoopFileSystemOptions options = PipelineOptionsFactory.as(HadoopFileSystemOptions.class);
    options.setRunner(SparkRunner.class);
    options.setHdfsConfiguration(Collections.singletonList(hadoopConf));
    Pipeline pipeline = Pipeline.create(options);
    PCollection<String> sourceData = pipeline.apply(TextIO.read().from(args[14]).withDelimiter("\n".getBytes()));
    PCollection<KV<Row, Row>> sourceDataRows = sourceData.apply(ParDo.of(new ExtractFunction()));
    PCollection<String> targetData = pipeline.apply(TextIO.read().from(args[23]).withDelimiter("\n".getBytes()));
    PCollection<KV<Row, Row>> targetDataRows = targetData.apply(ParDo.of(new ExtractFunction()));
    PCollection<KV<Row, CoGbkResult>> kvpCollection = KeyedPCollectionTuple
        .of(sourceDataInfoTag, sourceDataRows.setCoder(KvCoder.of(RowCoder.of(SOURCE_JOIN_RECORD_TYPE), RowCoder.of(SOURCE_RECORD_TYPE))))
        .and(targetDataInfoTag, targetDataRows.setCoder(KvCoder.of(RowCoder.of(TARGET_JOIN_RECORD_TYPE), RowCoder.of(TARGET_RECORD_TYPE))))
        .apply(CoGroupByKey.<Row>create());
    PCollection<GenericRecord> finalResultCollections = kvpCollection.apply("process", ParDo.of(new DoFn<KV<Row, CoGbkResult>, GenericRecord>() {
        @ProcessElement
        public void processElement(ProcessContext context) {
            KV<Row, CoGbkResult> element = context.element();
            Iterator<Row> srcIter = element.getValue().getAll(sourceDataInfoTag).iterator();
            Iterator<Row> trgIter = element.getValue().getAll(targetDataInfoTag).iterator();
            Metrics.counter("count", "count").inc();
            GenericRecordBuilder builder = new GenericRecordBuilder(SCHEMA);
            boolean done = false;
            boolean captureError = false;
            while (!done) {
                // Some iterator data here.
                .
                .
                builder.set(colName, data);
                if (captureError) {
                    GenericRecord record = builder.build();
                    context.output(record);
                }
            }
        }
    })).setCoder(AvroCoder.of(GenericRecord.class, SCHEMA));
    finalResultCollections.apply("writeText", FileIO.<GenericRecord>write()
        .via(ParquetIO.sink(SCHEMA))
        .withSuffix(".parquet")
        .withPrefix("part")
        .to("hdfs://temp/"));
    final PipelineResult result = pipeline.run();
    State state = result.waitUntilFinish();
    MetricQueryResults metrics =
        result
            .metrics()
            .queryMetrics(
                MetricsFilter.builder()
                    .addNameFilter(MetricNameFilter.inNamespace("count"))
                    .build());
    Iterable<MetricResult<Long>> counters = metrics.getCounters();
    for (MetricResult<Long> counter : counters) {
        System.out.println("Count >> " + counter.getName().getName() + " : "
            + counter.getAttempted() + " " + counter.getCommittedOrNull());
    }
}
In your code, Metrics.counter("count", "errorcount") defines the counter inside a loop, which itself sits inside another sort of loop (processElement). You should define the counter as a field of the DoFn; don't worry, the DoFn instance is reused for processing the bundle. For example: private final Counter counter = Metrics.counter(MyClass.class, COUNTER_NAME);
Also, you showed only part of the code, but I don't see the done boolean ever set to true. That is just out of curiosity.
And last but not least, you should try the Spark runner on the master branch of Beam, because a fix about metrics was merged yesterday (metrics were not reset when running multiple pipelines inside the same JVM). I don't know if it matches your use case, but it's worth trying.
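The create-once-as-a-field advice can be sketched without any Beam dependency (AtomicLong here is a stand-in for Beam's Counter, and the class name is hypothetical; the point is purely where the counter is created):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the advice above: create the counter once per DoFn instance,
// not once per processed element. AtomicLong stands in for Beam's Counter.
public class ErrorCountingFn {
    // Defined once as a field; the same instance is reused for the whole bundle.
    private final AtomicLong errorCount = new AtomicLong();

    // Called once per element, possibly millions of times.
    public void processElement(String element) {
        if (element.isEmpty()) {
            errorCount.incrementAndGet(); // increment the existing counter, never re-create it
        }
    }

    public long errors() {
        return errorCount.get();
    }
}
```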

MVC 5 implicit conversion exists

I have a repository that I am trying to query, passing the result to a controller.
public ProjectViewModel SearchContractors(string zip)
{
    var query = (from h in repository.tblHandymen
                 join hc in repository.tblHandyManCoverages on h.handymanID equals hc.handymanID
                 join s in repository.tblServiceRequests on hc.zip equals s.zip
                 where hc.zip == zip
                 where h.handymanID == hc.handymanID
                 where h.status == "Active"
                 select h);
    ProjectViewModel model = new ProjectViewModel
    {
        ContractorSearch = query.AsEnumerable()
    };
    return model;
}
Where I am stuck is here:
ProjectViewModel model = new ProjectViewModel
{
    ContractorSearch = query.AsEnumerable()
};
The error says an implicit conversion exists. I have tried several things; nothing is working.
Your query is not returning a collection of ProjectContractorSearchViewModel objects (it's returning a collection of tblHandymen). You need to project the results to type ProjectContractorSearchViewModel:
var query = (from h in repository.tblHandymen
             .....
            ).Select(x => new ProjectContractorSearchViewModel
            {
                someProperty = x.someProperty,
                anotherProperty = x.anotherProperty,
                ....
            });
ProjectViewModel model = new ProjectViewModel
{
    ContractorSearch = query.AsEnumerable()
};
return model;
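The same projection idea, mapping entities to the view-model type the caller expects instead of returning raw entities, can be sketched in Java streams (all class and field names here are hypothetical, standing in for tblHandymen and the view model):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical entity and view-model types, mirroring the LINQ Select projection above.
public class Projection {
    public record Handyman(int id, String name, String status) {}
    public record ContractorSearchViewModel(String name) {}

    // Equivalent of query.Select(x => new ProjectContractorSearchViewModel { ... }):
    // map each entity into the view-model type rather than returning entities.
    public static List<ContractorSearchViewModel> project(List<Handyman> handymen) {
        return handymen.stream()
                .filter(h -> "Active".equals(h.status()))
                .map(h -> new ContractorSearchViewModel(h.name()))
                .collect(Collectors.toList());
    }
}
```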

Converting Hand Written DI to Windsor Provided DI

For the past six or seven months I have been doing DI in some of my components; as a result they have grown somewhat complicated. In the past I created object graphs with hand-written factories. Since this is becoming unmanageable, I am trying to move that code to a framework-based DI container (configured in code, not in XML files). I am posting my code as well as the issues I am stuck with.
Here is my composition layer (it is big, so bear with me :) ):
IAgentFactory GetAgentFactory()
{
    string errorMessage;
    IDictionary<AgentType, ServiceParameters> agentFactoryPrerequisite = new Dictionary<AgentType, ServiceParameters>();
    string restResponseHeaderStatus = MyConfigurationProject.GetConfigValue("RestResponseHeaderStatus", out errorMessage);
    var service1Parameters = new ServiceParameters();
    service1Parameters.BindingName = MyConfigurationProject.GetConfigValue("Service1WebHttpBindingConfiguration", out errorMessage).ToString();
    service1Parameters.HeaderPassword = MyConfigurationProject.GetConfigValue("Service1HeaderPassword", out errorMessage).ToString();
    service1Parameters.HeaderUserName = MyConfigurationProject.GetConfigValue("Service1HeaderUserName", out errorMessage).ToString();
    service1Parameters.ResponseHeaderStatus = restResponseHeaderStatus;
    service1Parameters.ServicePassword = MyConfigurationProject.GetConfigValue("Service1ServicePassword", out errorMessage).ToString();
    service1Parameters.ServiceUrl = MyConfigurationProject.GetConfigValue("Service1URL", out errorMessage).ToString();
    service1Parameters.ServiceUserName = MyConfigurationProject.GetConfigValue("Service1ServiceUserName", out errorMessage).ToString();
    agentFactoryPrerequisite.Add(new KeyValuePair<AgentType, ServiceParameters>(AgentType.Service1, service1Parameters));
    var agentFactory = new AgentFactory(agentFactoryPrerequisite);
    return agentFactory;
}
protected DatalayerSettings GetDataLayerSettings()
{
    var datalayerSettings = new DatalayerSettings();
    datalayerSettings.ConnectionString = ConfigurationManager.ConnectionStrings["MyConnectionString"].ConnectionString;
    datalayerSettings.MySchemaName = ConfigurationManager.AppSettings["MyDatabaseSchema"];
    datalayerSettings.UpdatingUser = "Admin";
    return datalayerSettings;
}
PostgersDAFactory GetPostGresDaFactory()
{
    var datalayerSettings = GetDataLayerSettings();
    return new PostgersDAFactory(datalayerSettings, "MyAssembly.PostgresDA", "MyDifferentAssembly.CommonDatalayer", "MyServiceLogPath");
}
public class PostgersDAFactory
{
    readonly DatalayerSettings _datalayerSettings;
    readonly string _assemblyName;
    readonly string _logPath;
    readonly string _mySecondAssemblyName;
    public PostgersDAFactory(DatalayerSettings datalayerSettings, string assemblyName, string mySecondAssemblyName, string logPath)
    {
        _datalayerSettings = datalayerSettings;
        _assemblyName = assemblyName;
        _logPath = logPath;
        _mySecondAssemblyName = mySecondAssemblyName;
    }
    public IDA1 GetDA1Instance()
    {
        var type1 = Type.GetType("MyAssembly.PostgresDA.ClassRealisingImplementation_For_DA1," + _assemblyName);
        return (IDA1)Activator.CreateInstance(type1, _datalayerSettings, _logPath);
    }
    public IDA2 GetDA2Instance()
    {
        var type1 = Type.GetType("MyAssembly.PostgresDA.ClassRealisingImplementation_For_DA2," + _assemblyName);
        return (IDA2)Activator.CreateInstance(type1, _datalayerSettings);
    }
    public IDA3 GetDA3Instance()
    {
        var type1 = Type.GetType("MyAssembly2.ClassRealisingImplementation_For_DA3," + _mySecondAssemblyName);
        return (IDA3)Activator.CreateInstance(type1, _datalayerSettings);
    }
}
public BaseFileHandler GetFileHandler(FileProvider fileprovider, MockedServiceCalculator mockedServicecalculator = null)
{
    string errorMessage;
    var postgresFactory = GetPostGresDaFactory();
    var Da1Instance = postgresFactory.GetDA1Instance();
    var fileSyncBusiness = new FileSyncBusiness(Da1Instance);
    var interfaceConfiguratonParameters = fileSyncBusiness.GetInterfaceConfigurationParameters();
    var servicePointDetailsSettings = new ServicePointDetailsSettings();
    var nullDate = new DateTime(2099, 1, 1);
    CommonValidations commonValidations;
    if (mockedServicecalculator == null)
    {
        commonValidations = GetStubbedCommonValidations(nullDate);
    }
    else
    {
        commonValidations = GetCommonValidations_WithMockedServiceCalculator(nullDate, mockedServicecalculator);
    }
    switch (fileprovider)
    {
        case FileProvider.Type1:
            var type1Adapter = new Type1Adaptor(false, nullDate);
            servicePointDetailsSettings = GetUtiltaParameters(interfaceConfiguratonParameters);
            return new Type1FileHandler(servicePointDetailsSettings, fileSyncBusiness, commonValidations, type1Adapter);
        case FileProvider.Type2:
            var type2Adapter = new Type2Adaptor(true, nullDate);
            servicePointDetailsSettings.ApplicableParameters = MyApplicationCommonMethods.ConvertConfigurationTableToDictonary(interfaceConfiguratonParameters, "applicableintype2");
            servicePointDetailsSettings.BadFileLocation = MyConfigurationProject.GetConfigValue("Type2BadFileLocation", out errorMessage);
            servicePointDetailsSettings.DateFormat = MyConfigurationProject.GetConfigValue("Type2DateFormat", out errorMessage);
            servicePointDetailsSettings.FailureFileLocation = MyConfigurationProject.GetConfigValue("Type2FailureFile", out errorMessage);
            servicePointDetailsSettings.LogFileName = "Type2LogFile";
            servicePointDetailsSettings.LogPath = MyConfigurationProject.GetConfigValue("Type2ErrorLog", out errorMessage);
            servicePointDetailsSettings.MandatoryParameters = MyApplicationCommonMethods.GetDictonaryForMandatoryParameters(interfaceConfiguratonParameters, "applicableintype2", "mandatoryintype2");
            servicePointDetailsSettings.SourceFileLocation = MyConfigurationProject.GetConfigValue("type2FileLocation", out errorMessage);
            servicePointDetailsSettings.SuccessFileLocation = MyConfigurationProject.GetConfigValue("type2SuccessFile", out errorMessage);
            servicePointDetailsSettings.TargetFileExtension = MyConfigurationProject.GetConfigValue("type2SupportedFileType", out errorMessage);
            servicePointDetailsSettings.Type2RecordTag = MyConfigurationProject.GetConfigValue("MyApplicationtype2RecordTag", out errorMessage);
            return new Type2FileHandler(servicePointDetailsSettings, fileSyncBusiness, commonValidations, type2Adapter);
        default:
            throw new NotImplementedException("FileProvider type: " + Convert.ToInt32(fileprovider) + " is not implemented");
    }
}
}
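The reflection pattern inside PostgersDAFactory, Type.GetType plus Activator.CreateInstance, has a close Java analogue; a minimal sketch (class and method names hypothetical, error handling simplified) looks like:

```java
import java.lang.reflect.Constructor;

// Sketch of a reflection-based factory, analogous to the C#
// Type.GetType(...) + Activator.CreateInstance(...) calls above.
public class ReflectiveFactory {
    // Loads the named class and invokes its first constructor whose arity
    // matches the supplied arguments (assumes argument types are compatible).
    public static Object create(String className, Object... args) {
        try {
            Class<?> type = Class.forName(className);
            for (Constructor<?> ctor : type.getConstructors()) {
                if (ctor.getParameterCount() == args.length) {
                    return ctor.newInstance(args);
                }
            }
            throw new IllegalArgumentException("No matching constructor on " + className);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A DI container effectively replaces hand-written factories like this: the container resolves the concrete type and its constructor parameters from registrations instead of from string type names.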
While moving towards Windsor I am facing several issues; I have never used this product and it seems very complicated.
Issues:
1. How do I pass parameters to objects when they have parameterised constructors?
2. I know there is a better way to write this PostgersDAFactory class, but I simply don't know it.
3. Some factory methods, such as GetAgentFactory(), depend on a static method of another project which gives me configuration values (I had to store them in the database); another method, GetDataLayerSettings, depends on the app config as well as some static strings.
4. I am likely to change parameter names in my classes to promote readability, so how do I turn on logging for Windsor?
5. Finally, another complicated method, GetFileHandler, contains some logic (a switch case).
I have tried going to their website but I found the information difficult to digest; the API is huge and the learning curve seems mammoth.
Note: I had to change the variable names for security reasons.

Issue with SqlScalar<T> and SqlList<T> when calling stored procedure with parameters

The new API for ServiceStack.OrmLite dictates that when calling e.g. a stored procedure you should use either SqlScalar or SqlList, like this:
List<Poco> results = db.SqlList<Poco>("EXEC GetAnalyticsForWeek 1");
List<Poco> results = db.SqlList<Poco>("EXEC GetAnalyticsForWeek @weekNo", new { weekNo = 1 });
List<int> results = db.SqlList<int>("EXEC GetTotalsForWeek 1");
List<int> results = db.SqlList<int>("EXEC GetTotalsForWeek @weekNo", new { weekNo = 1 });
However, the named parameters don't work; you HAVE to respect the order of the parameters in the SP. I think this is because the SP is executed with CommandType = CommandType.Text instead of CommandType.StoredProcedure, and the parameters are added via dbCmd.Parameters.Add(). Because the CommandType is Text, it seems to expect the parameters inline in the SQL query string rather than as Parameters.Add(), so it ignores the naming.
An example:
CREATE PROCEDURE [dbo].[sproc_WS_SelectScanFeedScanRecords]
    @JobNo int = 0
    ,@SyncStatus int = -1
AS
BEGIN
    SET NOCOUNT ON;
    SELECT
        FSR.ScanId
        , FSR.JobNo
        , FSR.BatchNo
        , FSR.BagNo
        , FSR.ScanType
        , FSR.ScanDate
        , FSR.ScanTime
        , FSR.ScanStatus
        , FSR.SyncStatus
        , FSR.JobId
    FROM dbo.SCAN_FeedScanRecords FSR
    WHERE ((FSR.JobNo = @JobNo) OR (@JobNo = 0) OR (ISNULL(@JobNo,1) = 1))
    AND ((FSR.SyncStatus = @SyncStatus) OR (@SyncStatus = -1) OR (ISNULL(@SyncStatus,-1) = -1))
END
When calling this SP like this:
db.SqlList<ScanRecord>("EXEC sproc_WS_SelectScanFeedScanRecords @SyncStatus", new {SyncStatus = 1});
it returns all records with JobNo = 1 instead of SyncStatus = 1, because it ignores the named parameter and adds parameters in the order in which they are defined in the SP.
I have to call it like this:
db.SqlList<ScanRecord>("EXEC sproc_WS_SelectScanFeedScanRecords @SyncStatus=1");
Is this expected behavior? I think it defeats the purpose of anonymous-type parameters if I can't trust the naming.
TIA
Bo
My solution was to roll my own methods for stored procedures. If people find them handy, I could add them to the project:
public static void StoredProcedure(this IDbConnection dbConn, string storedprocedure, object anonType = null)
{
    dbConn.Exec(dbCmd =>
    {
        dbCmd.CommandType = CommandType.StoredProcedure;
        dbCmd.CommandText = storedprocedure;
        dbCmd.SetParameters(anonType, true);
        dbCmd.ExecuteNonQuery();
    });
}
public static T StoredProcedureScalar<T>(this IDbConnection dbConn, string storedprocedure, object anonType = null)
{
    return dbConn.Exec(dbCmd =>
    {
        dbCmd.CommandType = CommandType.StoredProcedure;
        dbCmd.CommandText = storedprocedure;
        dbCmd.SetParameters(anonType, true);
        using (IDataReader reader = dbCmd.ExecuteReader())
            return GetScalar<T>(reader);
    });
}
public static List<T> StoredProcedureList<T>(this IDbConnection dbConn, string storedprocedure, object anonType = null)
{
    return dbConn.Exec(dbCmd =>
    {
        dbCmd.CommandType = CommandType.StoredProcedure;
        dbCmd.CommandText = storedprocedure;
        dbCmd.SetParameters(anonType, true);
        using (var dbReader = dbCmd.ExecuteReader())
            return IsScalar<T>()
                ? dbReader.GetFirstColumn<T>()
                : dbReader.ConvertToList<T>();
    });
}
They are just modified versions of SqlScalar and SqlList, plus ExecuteNonQuery.
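The underlying pitfall, binding by position while the caller thinks in names, can be sketched in plain Java (method and parameter names are hypothetical): given the SP's declared parameter order, named arguments must be reordered before positional binding, otherwise a value meant for the second parameter silently lands in the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: reorder name/value pairs into the stored procedure's declared
// parameter order before binding them positionally. Skipping this step is
// how a value intended for @SyncStatus ends up bound to @JobNo.
public class ParamBinder {
    public static List<Object> toPositional(List<String> declaredOrder, Map<String, ?> named) {
        List<Object> positional = new ArrayList<>();
        for (String name : declaredOrder) {
            // null stands in for "use the SP's default value"
            positional.add(named.containsKey(name) ? named.get(name) : null);
        }
        return positional;
    }
}
```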

Using binary type of a UserProperty

For some reason I need to save some big strings into user profiles. Because a property of type string is limited to 400 characters, I decided to try the binary type (PropertyDataType.Binary), which allows a length of 7500. My idea is to convert my string into binary and save it to the property.
I create the property using this code:
context = ServerContext.GetContext(elevatedSite);
profileManager = new UserProfileManager(context);
profile = profileManager.GetUserProfile(userLoginName);
Property newProperty = profileManager.Properties.Create(false);
newProperty.Name = "aaa";
newProperty.DisplayName = "aaa";
newProperty.Type = PropertyDataType.Binary;
newProperty.Length = 7500;
newProperty.PrivacyPolicy = PrivacyPolicy.OptIn;
newProperty.DefaultPrivacy = Privacy.Organization;
profileManager.Properties.Add(newProperty);
myProperty = profile["aaa"];
profile.Commit();
The problem is that when I try to assign a value of type byte[] to the property, I receive the error "Unable to cast object of type 'System.Byte' to type 'System.String'.". If I instead assign a string value, I receive "Invalid Binary Value: Input must match binary byte[] data type."
My question, then, is how to use this binary type?
The code that I have :
SPUser user = elevatedWeb.CurrentUser;
ServerContext context = ServerContext.GetContext(HttpContext.Current);
UserProfileManager profileManager = new UserProfileManager(context);
UserProfile profile = GetUserProfile(elevatedSite, currentUserLoginName);
UserProfileValueCollection myProperty= profile[PropertyName];
myProperty.Value = StringToBinary(GenerateBigString());
and the functions for testing:
private static string GenerateBigString()
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 750; i++) sb.Append("0123456789");
    return sb.ToString();
}
private static byte[] StringToBinary(string theSource)
{
    // The original new byte[7500] allocation was immediately overwritten, so it is dropped here.
    return System.Text.Encoding.ASCII.GetBytes(theSource);
}
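As a quick sanity check of the size math (sketched in plain Java rather than C#, purely for illustration): a 7,500-character ASCII string encodes to exactly 7,500 bytes, right at the stated property limit, so any extra characters would overflow it.

```java
import java.nio.charset.StandardCharsets;

public class BigStringCheck {
    // Builds the same 750 x "0123456789" test string as the C# GenerateBigString above.
    public static String generateBigString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 750; i++) sb.append("0123456789");
        return sb.toString();
    }

    // ASCII encodes one byte per character, so length in chars equals length in bytes.
    public static byte[] toAsciiBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```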
Have you tried smaller strings? Going for the maximum on the first test might hide other behaviors. When you inspect the generated string in the debugger, does it fit the requirements (a 7500-byte byte[])?
For those who are looking for an answer: you must use the Add method instead:
var context = ServerContext.GetContext(elevatedSite);
var profileManager = new UserProfileManager(context);
var profile = profileManager.GetUserProfile(userLoginName);
profile["MyPropertyName"].Add(StringToBinary("your cool string"));
profile.Commit();
