Unable to connect to s3 buckets from pyspark - apache-spark

I am trying to connect to my S3 buckets from Spark as follows:
rdd = sc.textFile("s3n://bucketname/objectname")
rdd = sc.textFile("s3a://bucketname/objectname")
I changed my core-site.xml for s3a or s3n accordingly, and tried various other changes in my Hadoop core-site.xml, but I keep getting errors such as "unable to load AWS credentials from any provider in chain" (the ~/.aws/credentials file is there with the right credentials) and:
ResponseStatus: Bad Request, XML Error Message: AuthorizationHeaderMalformed: The authorization header is malformed; a non-empty Access Key (AKID) must be provided in the credential.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://********.compute-1.amazonaws.com:9000</value>
</property>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
<name>fs.s3a.access.key</name>
<value>ACCESSKEYID</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>SECRETKEY</value>
</property>
</configuration>
I added aws-sdk-s3 to my Spark jars directory. Please point me in the right direction.
Complete error message:
Bad Request, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AuthorizationHeaderMalformed</Code><Message>The authorization header is malformed; a non-empty Access Key (AKID) must be provided in the credential.</Message><RequestId>E64EEB94923F0FF7</RequestId><HostId>cmAiSUGZo7w7IgK3gJ+ubuWdlXwffEhpnpdnkoJQ2hLP8EHBXZDau0mFCKCC3eWBtfL9V1Le4Mw=</HostId></Error>
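Until the configuration file is picked up correctly, one workaround is to set the keys on the Hadoop configuration at runtime. The sketch below assumes an existing SparkContext `sc`; the `load_aws_credentials` helper is hypothetical, and it fails fast on the empty-key condition that the AuthorizationHeaderMalformed error complains about:

```python
# Hypothetical sketch: read the keys from ~/.aws/credentials and hand them to
# Spark's Hadoop configuration, raising early when a key is empty (the
# "non-empty Access Key (AKID)" failure mode from the error above).
import configparser
import os

def load_aws_credentials(path="~/.aws/credentials", profile="default"):
    cp = configparser.ConfigParser()
    cp.read(os.path.expanduser(path))
    key = cp.get(profile, "aws_access_key_id", fallback="")
    secret = cp.get(profile, "aws_secret_access_key", fallback="")
    if not key or not secret:
        raise ValueError("empty AWS credentials -> AuthorizationHeaderMalformed")
    return key, secret

# Usage (assumes a live SparkContext `sc`):
# key, secret = load_aws_credentials()
# sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", key)
# sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", secret)
# rdd = sc.textFile("s3a://bucketname/objectname")
```

Note that s3a also needs the hadoop-aws jar (matching your Hadoop version) on the classpath, not just the AWS SDK jar.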

Related

Can't enforce password reset on WSO2 IS?

When I send the SOAP request to update the forcedpasswordreset value, I get a 202 code in SOAP UI and the user doesn't get notified to update the password. wso2carbon.log says the following:
INFO {org.wso2.carbon.core.services.util.CarbonAuthenticationUtil} - 'wso2admin@carbon.super [-1234]' logged in at [2019-01-07 11:39:23,318+0200]
I'm trying to use AdminForcedPasswordReset in WSO2 Identity Server, following the RecoveryEmail steps in https://docs.wso2.com/display/IS530/Forced+Password+Reset.
Here's my SOAP Request:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:mgt="http://mgt.profile.user.identity.carbon.wso2.org" xmlns:xsd="http://mgt.profile.user.identity.carbon.wso2.org/xsd">
<soapenv:Header/>
<soapenv:Body>
<mgt:setUserProfile>
<mgt:username>omar.alaeldain</mgt:username>
<mgt:profile>
<xsd:fieldValues>
<xsd:claimUri>http://wso2.org/claims/identity/adminForcedPasswordReset</xsd:claimUri>
<xsd:fieldValue>true</xsd:fieldValue>
</xsd:fieldValues>
<xsd:profileName>default</xsd:profileName>
</mgt:profile>
</mgt:setUserProfile>
</soapenv:Body>
</soapenv:Envelope>
I expect the user omar.alaeldain to be notified by email at next login to update his password.
Please verify that you have made the following changes.
You need to enable "Enable Password Reset via Recovery Email" from the below configuration.
https://docs.wso2.com/display/IS530/Forced+Password+Reset?preview=/60494003/60494255/forced-password-reset-residentidp.png
You need to configure the "from" email address.
Open the output-event-adapters.xml file found in the <IS_HOME>/repository/conf directory.
Configure the relevant property values for the email server you need to use for this service under the <adapterConfig type="email"> tag.
<adapterConfig type="email">
<!-- Comment mail.smtp.user and mail.smtp.password properties to support connecting SMTP servers which use trust
based authentication rather username/password authentication -->
<property key="mail.smtp.from">abcd@gmail.com</property>
<property key="mail.smtp.user">abcd</property>
<property key="mail.smtp.password">xxxx</property>
<property key="mail.smtp.host">smtp.gmail.com</property>
<property key="mail.smtp.port">587</property>
<property key="mail.smtp.starttls.enable">true</property>
<property key="mail.smtp.auth">true</property>
<!-- Thread Pool Related Properties -->
<property key="minThread">8</property>
<property key="maxThread">100</property>
<property key="keepAliveTimeInMillis">20000</property>
<property key="jobQueueSize">10000</property>
</adapterConfig>
Configure an email address for the user omar.alaeldain.
Once you have finished all three steps, invoke the SOAP service.
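As a sketch of that last step, the SOAP request from the question could be built and posted with Python's standard library. The helper names are hypothetical, and the endpoint URL and admin credentials are assumptions, not from the original post:

```python
# Hypothetical sketch: build the setUserProfile SOAP envelope that flags
# adminForcedPasswordReset for a user, then POST it with basic auth.
import base64
import urllib.request

ENVELOPE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:mgt="http://mgt.profile.user.identity.carbon.wso2.org"
    xmlns:xsd="http://mgt.profile.user.identity.carbon.wso2.org/xsd">
  <soapenv:Header/>
  <soapenv:Body>
    <mgt:setUserProfile>
      <mgt:username>{username}</mgt:username>
      <mgt:profile>
        <xsd:fieldValues>
          <xsd:claimUri>http://wso2.org/claims/identity/adminForcedPasswordReset</xsd:claimUri>
          <xsd:fieldValue>true</xsd:fieldValue>
        </xsd:fieldValues>
        <xsd:profileName>default</xsd:profileName>
      </mgt:profile>
    </mgt:setUserProfile>
  </soapenv:Body>
</soapenv:Envelope>"""

def build_reset_envelope(username):
    """Return the SOAP body that forces a password reset for `username`."""
    return ENVELOPE.format(username=username)

def post_envelope(url, username, admin_user, admin_pass):
    """POST the envelope with basic auth (URL and credentials are assumed)."""
    creds = base64.b64encode(f"{admin_user}:{admin_pass}".encode()).decode()
    req = urllib.request.Request(
        url,
        data=build_reset_envelope(username).encode(),
        headers={"Content-Type": "text/xml", "Authorization": "Basic " + creds})
    return urllib.request.urlopen(req)
```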

Nifi 1.5 Untrusted Proxy on cluster

I've done my best to follow: https://pierrevillard.com/2016/11/29/apache-nifi-1-1-0-secured-cluster-setup/
I'm running nifi-1.5.0 and when I go to each of the pages I see an error like: Untrusted proxy CN=nifi-{1-3}.east.companyname.com, OU=NIFI.
I'm using ldap authentication, and just accepting the "invalid" certificate.
I've used an unrelated key-server to generate the keystore/truststore/certs as per the link above.
I also have the
nifi.security.needClientAuth=true
and
nifi.cluster.protocol.is.secure=true
set in the nifi.properties files on all of my nodes.
My authorizers file includes entries for all of the nodes, like:
<property name="Node Identity 1">CN=nifi-1.east.companyname.com, OU=NIFI</property>
<property name="Node Identity 2">CN=nifi-2.east.companyname.com, OU=NIFI</property>
<property name="Node Identity 3">CN=nifi-3.east.companyname.com, OU=NIFI</property>
Thanks in advance!
I would recommend configuring your authorizer in authorizers.xml to use a CompositeConfigurableUserGroupProvider that has two user group providers:
file-user-group-provider: this will be used to store the identities (certificate DNs) of your cluster nodes
ldap-user-group-provider: for your end users, that will be proxied when the cluster is replicating requests
Configure both of these UserGroupProviders, then configure the CompositeConfigurableUserGroupProvider to use the file-user-group-provider as the "Configurable Provider" and the ldap-user-group-provider as "User Group Provider 1". Here is an example:
<authorizers>
<userGroupProvider>
<identifier>file-user-group-provider</identifier>
<class>org.apache.nifi.authorization.FileUserGroupProvider</class>
<property name="Users File">./conf/users.xml</property>
<property name="Legacy Authorized Users File"></property>
<property name="Initial User Identity 1">CN=nifi-1.east.companyname.com, OU=NIFI</property>
<property name="Initial User Identity 2">CN=nifi-2.east.companyname.com, OU=NIFI</property>
<property name="Initial User Identity 3">CN=nifi-3.east.companyname.com, OU=NIFI</property>
</userGroupProvider>
<userGroupProvider>
<identifier>ldap-user-group-provider</identifier>
<class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
<!-- ... configure this to match the settings in login-identity-providers.xml ... -->
</userGroupProvider>
<userGroupProvider>
<identifier>composite-configurable-user-group-provider</identifier>
<class>org.apache.nifi.authorization.CompositeConfigurableUserGroupProvider</class>
<property name="Configurable User Group Provider">file-user-group-provider</property>
<property name="User Group Provider 1">ldap-user-group-provider</property>
</userGroupProvider>
<accessPolicyProvider>
<identifier>file-access-policy-provider</identifier>
<class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
<property name="User Group Provider">composite-configurable-user-group-provider</property>
<property name="Authorizations File">./conf/authorizations.xml</property>
<property name="Initial Admin Identity"></property>
<property name="Legacy Authorized Users File"></property>
<property name="Node Identity 1">CN=nifi-1.east.companyname.com, OU=NIFI</property>
<property name="Node Identity 2">CN=nifi-2.east.companyname.com, OU=NIFI</property>
<property name="Node Identity 3">CN=nifi-3.east.companyname.com, OU=NIFI</property>
</accessPolicyProvider>
<authorizer>
<identifier>managed-authorizer</identifier>
<class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
<property name="Access Policy Provider">file-access-policy-provider</property>
</authorizer>
</authorizers>
Configure this on each node, then remove users.xml and authorizations.xml and restart NiFi on each node. (This is necessary to create the users.xml and authorizations.xml with your node identities setup to act as proxies, which will not happen if users.xml and authorizations.xml exist with data.) If done correctly, each node should allow the clustered nodes to authenticate using the client certificate (from their keystore.jks) and each node will be authorized to act as proxies, meaning that when an end-user is talking to one cluster, that interaction will be replicated to all nodes in the cluster, which is what you want.
You should be able to set nifi.security.needClientAuth=false. Certificate-based authentication will still work, it just won't be required (i.e., for the initial communication from an end-user to a node, LDAP credentials will be enough).
Hope this helps!
Reference: NiFi Admin Guide

How to configure LDAP on Spark-Thrift server on AWS EMR?

Note that we are not talking about HiveServer2 or the Hive Thrift server here.
If anyone has experience with this, I want to configure LDAP auth on spark-thrift server. I am using AWS EMR as my cluster.
I am able to start the server and query using it, but without any username or password. Not even sure where to specify authentication related properties. There's just very little documentation on this stuff.
Looking forward to hearing from anyone who has experience doing this.
Copy the hive-site.xml from your ~/hive/conf directory to your ~/spark/conf/ directory.
You also need to configure egress rules to allow your EMR cluster to connect to your LDAP server/IP/port.
As per the official HiveServer2 documentation:
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2
Set following for LDAP mode:
hive.server2.authentication.ldap.url – LDAP URL (for example, ldap://hostname.com:389).
hive.server2.authentication.ldap.baseDN – LDAP base DN. (Optional for AD.)
hive.server2.authentication.ldap.Domain – LDAP domain. (Hive 0.12.0 and later.)
See User and Group Filter Support with LDAP Atn Provider in HiveServer2 for other LDAP configuration parameters in Hive 1.3.0 and later.
Changes to hive-site.xml:
<property>
<name>hive.server2.authentication</name>
<value>LDAP</value>
<description>
Expects one of [nosasl, none, ldap, kerberos, pam, custom].
Client authentication types.
NONE: no authentication check
LDAP: LDAP/AD based authentication
KERBEROS: Kerberos/GSSAPI authentication
CUSTOM: Custom authentication provider
(Use with property hive.server2.custom.authentication.class)
PAM: Pluggable authentication module
NOSASL: Raw transport
</description>
</property>
<property>
<name>hive.server2.authentication.ldap.url</name>
<value>ldaps://changemetoyour.ldap.url:5000</value>
<description>
LDAP connection URL(s),
this value could contain URLs to multiple LDAP server instances for HA,
each LDAP URL is separated by a SPACE character. URLs are used in the
order specified until a connection is successful.
</description>
</property>
<property>
<name>hive.server2.authentication.ldap.baseDN</name>
<value>changeme.mydomain.com</value>
<description>LDAP base DN</description>
</property>
<property>
<name>hive.server2.authentication.ldap.Domain</name>
<value/>
<description/>
</property>
<property>
<name>hive.server2.authentication.ldap.groupDNPattern</name>
<value/>
<description>
COLON-separated list of patterns to use to find DNs for group entities in this directory.
Use %s where the actual group name is to be substituted for.
For example: CN=%s,CN=Groups,DC=subdomain,DC=domain,DC=com.
</description>
</property>
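Since a stray typo in hive-site.xml can silently leave authentication unconfigured, a quick sanity check helps. The sketch below is hypothetical; the required keys are taken from the snippet above, and the file path is whatever you copied into ~/spark/conf/:

```python
# Sketch: parse hive-site.xml and report any required LDAP authentication
# properties that are absent or empty.
import xml.etree.ElementTree as ET

REQUIRED = (
    "hive.server2.authentication",
    "hive.server2.authentication.ldap.url",
    "hive.server2.authentication.ldap.baseDN",
)

def missing_ldap_props(path):
    """Return the names of required properties missing or empty in the file."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return [k for k in REQUIRED if not props.get(k)]
```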

HDFS group Permissions issue, Cluster integrated with Kerberos + AD

CDH cluster is integrated with Kerberos + AD.
user_A is added to groups groupX and AD_GROUP_X
user_B is added to groups groupX and AD_GROUP_X
There are two files in HDFS with different group permissions:
/user/file_a
Owner: user_A, Group: groupA
Permissions: u=rwx, g=rwx, o=---
/user/file_b
Owner: user_B, Group: AD_GROUP_X
Permissions: u=rwx, g=rwx, o=---
Scenario #1:
user_A wants to access file /user/file_b ==> Success
Scenario #2:
user_B wants to access file /user/file_a ==> failed (expected: success)
Once AD is integrated with the cluster, does HDFS read only AD groups, or can it read both AD groups and Unix groups?
It is possible to configure and combine multiple existing mapping providers without requiring all users to be in a single place: AD users can use the LdapGroupsMapping provider, while Unix users can use the default ShellBasedUnixGroupsMapping provider for Unix group mapping.
It can be configured as shown below.
<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.CompositeGroupsMapping</value>
</property>
<property>
<name>hadoop.security.group.mapping.providers</name>
<value>unix,ad01,ad02</value>
</property>
<property>
<name>hadoop.security.group.mapping.providers.combined</name>
<value>true</value>
<description>true or false to indicate whether groups from the providers are combined or not. If true, all the providers are tried and the final result is all the groups where the user exists. If false, the first group in which the user was found is returned. Default value is true.
</description>
</property>
<property>
<name>hadoop.security.group.mapping.provider.unix</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
<property>
<name>hadoop.security.group.mapping.provider.ad01</name>
<value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
<name>hadoop.security.group.mapping.provider.ad02</name>
<value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
<name>hadoop.security.group.mapping.provider.ad01.ldap.url</name>
<value>ldap://</value>
</property>
<property>
<name>hadoop.security.group.mapping.provider.ad02.ldap.url</name>
<value>ldap://</value>
</property>
<property>
<name>hadoop.security.group.mapping.provider.ad01.ldap.bind.user</name>
<value></value>
</property>
<property>
<name>hadoop.security.group.mapping.provider.ad02.ldap.bind.user</name>
<value></value>
</property>
<property>
<name>hadoop.security.group.mapping.provider.ad01.ldap.base</name>
<value></value>
</property>
<property>
<name>hadoop.security.group.mapping.provider.ad02.ldap.base</name>
<value></value>
</property>
Support multiple group providers - JIRA
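The semantics of the `combined` flag above (union of all providers versus first match wins) can be illustrated with a small simulation. This is a sketch, not Hadoop code; the provider contents are made up for illustration:

```python
# Sketch: simulate hadoop.security.group.mapping.providers.combined.
# Each "provider" maps a user to the groups it knows about.
unix_provider = {"user_A": ["groupX"], "user_B": ["groupX"]}
ad01_provider = {"user_A": ["AD_GROUP_X"], "user_B": ["AD_GROUP_X"]}

def resolve_groups(user, providers, combined=True):
    """combined=True: union of all providers' results, in provider order.
    combined=False: the first provider that knows the user wins."""
    groups = []
    for provider in providers:
        found = provider.get(user, [])
        if found:
            if not combined:
                return found
            groups.extend(g for g in found if g not in groups)
    return groups
```

With `combined=True`, user_B resolves to both groupX and AD_GROUP_X; with `combined=False`, only the first provider's groups are returned, which is why a file owned by a group known only to a later provider can unexpectedly deny access.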

How secured is SSO based on token based authentication?

I am planning to integrate JasperReports Server with my web application as single sign-on. I went through the Jasper Authentication cookbook, and Jasper suggests token-based authentication as one of the solutions (as authentication is already done by my web application).
What Jasper suggests is this: you pass the token in a specific format (as defined below under tokenFormatMapping) to the Jasper server, and Jasper will authenticate the request.
So valid tokens can be
u=user|r=role1|o=org1|pa1=PA11|pa2=PA21|exp=2001404150601
Invalid token can be
u1=user|r=role1|o=org1|pa1=PA11|pa2=PA21|exp=2001404150601
r=role1|u=user|o=org1|pa1=PA11|pa2=PA21|exp=2001404150601
My question: is this really a secure process? As soon as a hacker knows the pattern, he can simply log in to the Jasper server. To me it looks like security can be compromised here. Am I missing something?
<bean class="com.jaspersoft.jasperserver.api.security.externalAuth.wrappers.spring.preauth.JSPreAuthenticatedAuthenticationProvider">
....................
<property name="tokenPairSeparator" value="|" />
<property name="tokenFormatMapping">
<map>
<entry key="username" value="u" />
<entry key="roles" value="r" />
<entry key="orgId" value="o" />
<entry key="expireTime" value="exp" />
<entry key="profile.attribs">
<map>
<entry key="profileAttrib1" value="pa1" />
<entry key="profileAttrib2" value="pa2" />
</map>
</entry>
</map>
</property>
<property name="tokenExpireTimestampFormat" value="yyyyMMddHHmmssZ" />
</bean>
</property>
</bean>
According to the Jasper Reports Authentication cookbook, with token-based authentication the user is not directly logged in, meaning that only certain operations can be performed using this method.
Furthermore, it specifies the following:
JasperReports Server will accept any properly formatted token;
therefore, you need to protect the integrity of the token using
measures such as the following:
Connect to JasperReports Server using SSL to protect against token interception.
Encrypt the token to protect against tampering.
Configure the token to use a timestamp to protect against replay attacks. Without a timestamp, when you include the token in a web page or REST web service URL, the URL can be copied and used by unauthorized people or systems. Setting the expire time for the token will stop tokens/URLs from being used to authenticate beyond the indicated time. You can set the expiry time depending on your use case. For a user who is logged into the application/portal and is requesting access to JasperReports Server, expiry time of a minute or less from the request time is appropriate.
All communications need to be made through an SSL tunnel. Otherwise, anyone could establish a connection to your JasperReports server, send tokens, and get information from it.
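The cookbook's timestamp recommendation can be sketched as follows. The helpers are hypothetical; the token layout follows the tokenFormatMapping above (with the time zone suffix omitted for brevity), and in practice the expiry check is done server-side by JasperReports Server:

```python
# Sketch: build a token in the mapped u=...|r=...|o=...|exp=... format and
# reject it once the exp field (yyyyMMddHHmmss here) has passed.
from datetime import datetime, timedelta

def build_token(user, role, org, lifetime_minutes=1):
    """Short-lived token: a minute or less for logged-in portal users."""
    exp = (datetime.now() + timedelta(minutes=lifetime_minutes)).strftime("%Y%m%d%H%M%S")
    return f"u={user}|r={role}|o={org}|exp={exp}"

def is_expired(token):
    """Parse the pipe-separated pairs and compare exp to the current time."""
    fields = dict(pair.split("=", 1) for pair in token.split("|"))
    return datetime.strptime(fields["exp"], "%Y%m%d%H%M%S") < datetime.now()
```

This only mitigates replay; without SSL and token encryption, a well-formed token is still accepted, which is exactly the concern raised in the question.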
I was also looking to implement token-based SSO with Jasper Server and got stuck on exactly the same question. This approach doesn't seem secure to me, as authentication is never denied as long as the request is properly formatted, which is a simple thing to do.
The other alternative (If you are not using CAS or LDAP providers) would be to authenticate based on request as mentioned in section 7.4 "Authentication Based on Request" in the authentication cook-book. Create your own custom authentication provider and configure it in the applicationContext-externalAuth.xml :
<bean id="customAuthenticationManager" class="org.springframework.security.providers.ProviderManager">
<property name="providers">
<list>
<ref bean="${bean.myCustomProvider}"/>
<ref bean="${bean.daoAuthenticationProvider}"/>
</list>
</property>
</bean>
