Reading Time: 10 Min

Using Jupyter Notebooks with SAP HANA Cloud

Learning Article
  • Many data-driven applications can be developed by combining an SAP HANA Cloud instance with another development environment, such as Jupyter Notebook.

    In this example, we’ll focus on creating a Python 3 notebook in Jupyter Notebook and connecting it to our SAP HANA Cloud instance. Once the connection is created, we can add or access data stored on the SAP HANA Cloud instance.

    By connecting Python files to your SAP HANA Cloud instance, it is also possible to use its machine learning capabilities.

    Prerequisites

    Before you get started, make sure you have Anaconda installed on your system. You can install Anaconda from https://docs.anaconda.com/anaconda/install/. Another prerequisite is to have an SAP HANA Cloud instance set up and running.

    You can also use Jupyter notebooks from Visual Studio Code. If you prefer to use Visual Studio Code with Anaconda, you can install Visual Studio Code from here. To work with Jupyter notebooks, you must activate an Anaconda environment in VS Code. To select an environment, use the Python: Select Interpreter command from the Command Palette (Ctrl+Shift+P). Once the appropriate environment is activated, you can create and open a Jupyter notebook.

    In this tutorial, you will see an example using the Jupyter Notebook that ships with Anaconda.

    Install packages and create a connection to SAP HANA Cloud

    1. To get started, install two packages: hdbcli and hana_ml. The hdbcli package is the SAP Python driver, which allows a connection to SAP HANA Cloud from a Python application. The hana_ml package allows you to create an SAP HANA dataframe, as well as a connection to your SAP HANA Cloud instance; it can also be used to explore various machine learning capabilities. It is recommended to install version 2.5 of the hdbcli driver and the latest version of the hana_ml package. You can install both packages from https://pypi.org/ (see the installation and connectivity sketch after this list).
    2. Once the packages are installed, launch Jupyter Notebook.
    3. Create a new Python 3 notebook.
    4. Import the packages using the following lines of code:
      import hdbcli
      import hana_ml
    5. To check that the packages have been imported correctly, you can print their versions. Just type the following:
      print(hdbcli.__version__)
      print(hana_ml.__version__)
    6. Next, import the following packages required for this tutorial:
      from hana_ml import dataframe
      from hana_ml.dataframe import ConnectionContext
      import pandas as pd
    7. From the hana_ml package, the dataframe module is imported to create an SAP HANA dataframe, and ConnectionContext is imported to create a connection to the SAP HANA Cloud instance. pandas is imported in order to load data from the csv file into a pandas dataframe.
    8. To create a connection to the SAP HANA Cloud instance, replace XXXX in the code below with your host, port, user, and password information, then run it.
      conn2 = dataframe.ConnectionContext(address='XXXX', port='XXXX', user='XXXX', password='XXXX', encrypt='true', sslValidateCertificate='false')
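
    The packages from step 1 can be installed from a terminal, and the connection from step 8 can be verified with a quick query. The following is a minimal sketch, assuming an Anaconda prompt for the pip commands; the query against the standard SAP HANA DUMMY table is used only as a simple connectivity test.

      # Install the drivers from PyPI (run in an Anaconda prompt / terminal, not in the notebook):
      #   pip install "hdbcli==2.5.*"
      #   pip install hana_ml

      # Quick connectivity check after creating conn2 in step 8.
      # DUMMY is a standard one-row SAP HANA system table, so this should return a single row.
      print(conn2.sql('SELECT * FROM DUMMY').collect())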

       

  • Load data from a csv into a pandas dataframe

    Please click here to download a zip file with the csv. From the zip file, extract the travelroute.csv file to your local machine. To load data from the csv file, replace XXXX in the following code with the path to the csv file on your local machine and run it.

    flight_pd = pd.read_csv('XXXX', sep=';', header=0, names=["CITYFROM","CITYTO", "NUMBOOKINGS"])
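
    As an optional sanity check, you can preview what was loaded into the pandas dataframe before pushing it to SAP HANA Cloud. This is a minimal sketch using standard pandas calls:

    # Preview the first rows and the row count of the loaded csv data
    print(flight_pd.head())
    print(len(flight_pd))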

    Create an SAP HANA Cloud table from the pandas dataframe

    To create a table in the SAP HANA Cloud database, use the dataframe method below from the hana_ml package. Make sure to replace XXXX with your schema name. The method takes the pandas dataframe created above and generates a hana dataframe. The hana dataframe is simply a reference to a table hosted in SAP HANA Cloud, against which you can run a select statement from the Jupyter notebook.

    flight_hdf = dataframe.create_dataframe_from_pandas(conn2, flight_pd, 'TRAVEL_ROUTE2', 'XXXX', force=True, replace=True)
    flight_hdf = conn2.table("TRAVEL_ROUTE2", schema="XXXX")
    print(flight_hdf.select_statement)
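
    To confirm that the upload worked, you can fetch a few rows of the new table back into pandas. This is a minimal sketch; head() limits the rows on the SAP HANA side and collect() returns the result to the client.

    # Fetch the first five rows of TRAVEL_ROUTE2 back from SAP HANA Cloud
    print(flight_hdf.head(5).collect())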

     

  • Access the SAP HANA Cloud table from Jupyter Notebook

    1. Now that you have a table in SAP HANA Cloud, execute the following line of code to access it. Replace XXXX with your schema name and TRAVEL_ROUTE2 with your table name if it differs.
      flight_hdf = conn2.sql('SELECT * FROM XXXX.TRAVEL_ROUTE2')
    2. To check the data types of the table, you can execute the following command:
      flight_hdf.dtypes()
    3. To explore the SAP HANA dataframe, you can execute the following command:
      flight_hdf.describe().collect()
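
    Beyond dtypes() and describe(), a couple of other SAP HANA dataframe methods are useful for quick exploration. The sketch below assumes the flight_hdf dataframe from step 1; count() and select() are evaluated in SAP HANA Cloud, and only collect() transfers data to the client.

      # Number of rows in the table (computed in SAP HANA Cloud)
      print(flight_hdf.count())

      # Pull back only two columns for a quick look
      print(flight_hdf.select('CITYFROM', 'CITYTO').head(5).collect())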

       

  • Exploratory Data Visualizations

    1. To add visualizations to your data, import EDAVisualizer using the command:
      from hana_ml.visualizers.eda import EDAVisualizer
    2. Next, import the following packages required for this tutorial.
      from hana_ml import dataframe
      import pandas as pd
      import matplotlib.pyplot as plt
      import time
      import numpy as np
      from hana_ml.algorithms.pal import stats

       

    3. Now we will create a pie chart using the SAP HANA Cloud table to find the most frequently travelled destination. To create the pie chart, run the following:
      f = plt.figure(figsize=(10,10))
      ax1 = f.add_subplot(111)
      start=time.time()
      eda=EDAVisualizer(ax1)
      ax, pie_data = eda.pie_plot(data=flight_hdf, column='CITYTO', title='Frequently travelled Destinations', legend=True)
      plt.show()
      end=time.time()
      print("Time: {}s".format(round(end - start, 2)))
    4. As you can see, the most frequently travelled destination is Frankfurt.
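
    If you prefer to work client-side, the same question can be answered by collecting the SAP HANA dataframe into pandas and plotting with matplotlib. This is an optional sketch, assuming the flight_hdf dataframe and the imports from the previous steps; for large tables, the EDAVisualizer approach above is preferable because the data stays in SAP HANA Cloud.

      # Collect the table into pandas and plot how often each destination appears
      flight_local = flight_hdf.collect()
      flight_local['CITYTO'].value_counts().plot(kind='bar', figsize=(10, 5), title='Frequently travelled destinations')
      plt.show()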
