This article is a detailed guide to the Hyper Anna on-premise deployment requirements, written for technical teams.

On-Premise and Client Cloud Deployment

Deploying to the client’s ecosystem requires the following.

Site access

Most on-premise deployments are done remotely by Hyper Anna via VPN and SSH access to the Anna VM.

In case of an offline deployment (where remote access to the VM isn't possible), Hyper Anna staff will need to access the client's site to conduct the on-premise deployment.

Workstation requirements

All access to Anna, by both end users and Hyper Anna staff, is generally from trusted machines within your secure network, usually via a remote desktop connection or a company laptop on a VPN. Each workstation needs:

  1. SSH client for accessing the VM
  2. Browser for accessing Anna (Google Chrome preferred)
     • Chrome compatibility: Version 60 and above
     • IE compatibility: IE11
  3. [Optional] MySQL client

Virtual Machine requirements

To simplify deployment in the initial environment build, all components are deployed to a single virtual machine with the following specs:

Operating System

  • Ubuntu 18.04, or Red Hat Enterprise Linux 7

Access

  • User level SSH access with sudo rights

Minimal Hardware Requirements

  • 16 CPU cores, 128 GB RAM, and 100 GB disk space at /var/lib/docker (symlinked or mounted); 500 GB disk space recommended
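
As a pre-flight check, a candidate VM can be compared against the minimums above with a short script. The sketch below is illustrative only and is not part of the Hyper Anna installer: the function names are hypothetical, it assumes a Linux host that exposes /proc/meminfo, and it assumes "disk space" means the total size of the filesystem holding /var/lib/docker.

```python
import os
import shutil

# Minimums taken from the hardware requirements above.
MIN_CPU_CORES = 16
MIN_RAM_GB = 128
MIN_DISK_GB = 100


def find_shortfalls(cpu_cores, ram_gb, disk_gb):
    """Return one message per unmet minimum (empty list if all are met)."""
    checks = [
        ("CPU cores", cpu_cores, MIN_CPU_CORES),
        ("RAM (GB)", ram_gb, MIN_RAM_GB),
        ("Disk (GB)", disk_gb, MIN_DISK_GB),
    ]
    return [
        f"{name}: {actual} < {minimum}"
        for name, actual, minimum in checks
        if actual < minimum
    ]


def measure_vm(docker_path="/var/lib/docker", meminfo_path="/proc/meminfo"):
    """Collect (cpu_cores, ram_gb, disk_gb) on a Linux host.

    Sizes are reported in binary gigabytes. Run this on the VM itself.
    """
    with open(meminfo_path) as f:
        # The MemTotal line looks like: "MemTotal:  131900000 kB"
        mem_kb = next(
            int(line.split()[1]) for line in f if line.startswith("MemTotal:")
        )
    disk_bytes = shutil.disk_usage(docker_path).total
    return os.cpu_count() or 0, mem_kb / 1024**2, disk_bytes / 1024**3
```

On the VM, `find_shortfalls(*measure_vm())` returns an empty list when every minimum is met. Treat the RAM result as approximate: the kernel reports slightly less memory than is installed, so a nominal 128 GB machine may be flagged.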

Cloud providers

  • In AWS environments, we recommend an r4.4xlarge instance.
  • In Microsoft Azure environments, we recommend a Standard H16 instance.

Software requirements

The software below must be installed as part of the Anna installation. The Hyper Anna team can install it during the deployment phase if the virtual machine provided has access to the Hyper Anna Proxy Server.

  1. docker-ce (18.09.5-ce or higher)
  2. docker-compose (1.22 or higher)
  3. python (2.7.x minimum, 3.x preferred)
  4. ansible (2.7+)
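
The installed versions can be compared against these minimums programmatically. The following is a minimal sketch, not a Hyper Anna tool: the helper names are hypothetical, and it assumes the version string is obtained separately (for example, from the output of `docker --version`).

```python
import re

# Minimum versions from the list above.
MINIMUMS = {
    "docker-ce": (18, 9, 5),
    "docker-compose": (1, 22),
    "python": (2, 7),
    "ansible": (2, 7),
}


def parse_version(text):
    """Extract the first dotted numeric version, e.g. '18.09.5-ce' -> (18, 9, 5)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(tool, installed):
    """True if the installed version string is at or above the listed minimum."""
    return parse_version(installed) >= MINIMUMS[tool]
```

For example, `meets_minimum("docker-compose", "1.21.2")` is False, while `meets_minimum("python", "Python 3.6.8")` is True. Versions are compared as tuples of integers so that 18.09 and 18.9 are treated as equal.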

Network requirements - domain name

  • A human-readable domain name within a domain familiar to your end users (e.g. hyperanna.yourdomainname.com).
  • SSL Certificate for the nominated domain.

The domain name should point to the Anna VM.

Network requirements - whitelists

Inbound - All access to the Anna application and Data Initialisation tools on the VM is via the following:

  1. HTTP/HTTPS ports (80/443) from within your private network
  2. SSH port (22) from within your private network

Outbound - For maximum effectiveness, Anna connects to additional servers on the public internet at runtime.

Front end runtime - from end-user workstations:

Back end runtime - from the Anna VM

Access to the following is required for us to monitor the system:

Access to the following may be required for email functionality:

  • Gmail (smtp.gmail.com and imap.gmail.com), or
  • Internal mail servers (smtp + imap)

Network requirements - deployment

Hyper Anna requires internet access to install Anna and all dependencies. A proxy-based deployment approach requires whitelisting the IP address below on port 3128:

  • Proxy IP Address: 13.237.23.43, Port: 3128
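
Once the whitelist is in place, outbound reachability from the Anna VM can be confirmed with a plain TCP connection test. This is a minimal sketch with a hypothetical helper name. It only shows that the port is reachable, not that the proxy will accept requests.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Run from the Anna VM, `can_connect("13.237.23.43", 3128)` should return True when the whitelist is in effect. The same helper can be pointed at any other host and port listed in this article.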

Extra Whitelist for Deployment

  • *.newrelic.com

On-premise data source options

With an on-premise deployment there are five data source options:

  1. On-premise database
  2. Cloud database
  3. Big Data platform
  4. On-premise flat file
  5. Cloud flat file

Each option is discussed in more detail below. For all data source options, a Data Dictionary explaining the meaning of each of the columns in the data should be uploaded to Anna.

Connecting via a direct database connection has several benefits over using flat files:

  • Reuse database views containing business logic and data transformation logic
  • Speed up exploration and consumption of additional data for new use cases
  • Automate data refresh and eliminate effortful and delayed manual extraction
  • Connecting to the data source of truth reduces risk of data integrity issues

Please note: Unstructured data is currently not supported.

As a guide, one standard VM should be able to handle 25GB of data.

1. On-premise database:

Anna can connect to databases from a range of vendors in the client ecosystem. The Anna VM needs to be whitelisted for the database. The following database information will need to be provided to Hyper Anna:

  • Database type
  • Database server name / URL / IP
  • Port
  • Database schema name
  • Username & password with read access (additional write access is required if data transformation needs to be done by Hyper Anna)
  • Names of the table(s) or view(s) to be used by Anna
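
Teams that gather these details ahead of the deployment may find it useful to record them in a structured form and check for gaps. The sketch below is purely illustrative: the class, field names, and sample values are hypothetical and do not reflect any format that Hyper Anna requires.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatabaseDetails:
    """One entry per item in the list above (field names are illustrative)."""
    database_type: str = ""
    host: str = ""            # server name, URL, or IP
    port: int = 0
    schema: str = ""
    username: str = ""
    password: str = ""
    tables_or_views: list = field(default_factory=list)
    needs_write_access: bool = False  # only if Hyper Anna transforms the data


def missing_items(details):
    """Return the names of required fields that are still empty."""
    optional = {"needs_write_access"}
    return [
        f.name
        for f in fields(details)
        if f.name not in optional and not getattr(details, f.name)
    ]


# Placeholder values only; substitute the real details for your environment.
example = DatabaseDetails(
    database_type="example-db",
    host="db.internal.example",
    port=5432,
    schema="sales",
    username="anna_readonly",
    password="change-me",
    tables_or_views=["v_orders"],
)
```

Here `missing_items(example)` returns an empty list. In practice, credentials should be shared through a secure channel rather than stored in source files.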

Supported databases:

For a full list of supported databases, please refer to this list.

2. Cloud database:

Anna can connect to a Cloud database, provided the Anna VM has internet access.

The Anna VM needs to be whitelisted for the database.

The following database information will need to be provided to Hyper Anna:

  • Database type
  • Database server name / URL / IP
  • Port
  • Database schema name
  • Username & password with read access (additional write access is required if data transformation needs to be done by Hyper Anna)
  • Names of the table(s) or view(s) to be used by Anna

Supported databases:

For a full list of supported databases, please refer to this list.

3. Big Data Platform:

Anna uses Spark for data processing and querying, and can also connect to an external Spark and HDFS cluster.

Recommended versions:

  • Apache Spark: 2.2.+
  • Apache Hadoop: 2.7.+

4. On-premise flat file:

We require the flat files to be uploaded to the Anna VM. To enable automatic data refresh, a process to automatically upload data to the Anna VM is required.

Anna currently supports these data formats:

  • CSV
  • CSV in Zip

Note: The delimiter used in the flat file must be unambiguous and must not clash with data values and column names. We recommend putting double quotes around all values.
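
One way to produce a file that follows this recommendation is to quote every field at export time. The sketch below uses Python's standard csv module; the function name and sample data are hypothetical.

```python
import csv
import io


def write_quoted_csv(header, rows):
    """Serialise rows to CSV text with double quotes around every value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# A value containing the delimiter stays unambiguous once quoted.
text = write_quoted_csv(["city", "sales"], [["Sydney, NSW", 12]])
```

With every value quoted, a comma inside "Sydney, NSW" is read as data rather than as a column separator. Double quotes inside a value are escaped by doubling them, which is the csv module's default behaviour.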

5. Cloud flat file:

Anna can also connect to a flat file uploaded to Hyper Anna’s Azure Secure Blob Storage by the client. The Anna VM will need internet access.

Hyper Anna will provide a login for the client to use to conduct the upload.

Anna currently supports these data formats:

  • CSV
  • CSV in Zip

The delimiter used in the flat file must be unambiguous and must not clash with data values and column names. We recommend putting double quotes around all values.

Email service requirements

In order to support Email Anna, Share Insights, and any other future email based features of Anna, the following is required:

OPTION 1 - anna@hyperanna.com

Questions from users can be sent to anna@hyperanna.com and Anna’s replies will be sent to the sender’s email address. This is the preferred option.

The hyperanna.com domain should be whitelisted on the email server so that Anna’s replies are not filtered incorrectly or placed in spam.

From the Anna VM:

  1. Enable outbound access to smtp.gmail.com on port 587
  2. Enable outbound access to imap.gmail.com on port 993

OPTION 2 - Client email address managed by Anna (e.g. anna@client.com)

Questions from users can be sent to anna@client.com and Anna’s replies will be sent to the sender’s email address.

Anna will require credentials for the nominated email account.

The Anna VM will require network access to the client email server.

Email Security for Cloud and On-Premises Deployment

Email encryption: Anna supports email transfer through an encrypted channel. Anna does not currently support email content encryption.

Client data

Client data (your organisation’s data) is segregated into separate files, meaning one client’s data never exists in the same file as another client’s. It is protected by:

  • User access control in Anna
  • O/S user security using SSH public key (and not password)
  • VM security, firewall security, and network security

Client data files are not encrypted.

Access control data

Access control data, such as user logins and passwords, is logically segregated in the Anna database. Passwords are one-way hashed, meaning that no one within or outside Hyper Anna can see them.
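
To illustrate what one-way hashing means in practice, the sketch below stores only a salt and a derived digest, then verifies a login attempt by repeating the derivation. This article does not state which algorithm Anna uses; PBKDF2 with SHA-256, the iteration count, and the function names are assumptions chosen for the example.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=200_000):
    """Derive a digest from the password; the password itself is never stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=200_000):
    """Re-derive the digest from the attempt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)
```

Only the salt and digest are kept. Because the derivation cannot be reversed, someone who reads the stored values cannot recover the original password from them.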

The database does not contain any client data nor personally identifiable data except the username and organisation name. The database contains metadata for the client datasets to be used in Anna.

Access control data other than passwords is not encrypted.

Access control

User access rights are controlled through Anna’s Admin Portal feature. The controls can be set at the user, group, and organisation level, and can be applied to data down to the row level.

If you have any questions, please contact us at support@hyperanna.com.
