Bamboolib — Data Analysis with Python — without programming
Bamboolib is a Python library that provides a user interface component for no-code data analysis and transformations within Jupyter notebooks, including those in Azure Databricks. It allows users to work with their data more easily and quickly, without needing to write any code. Bamboolib generates Python code in the background as users complete tasks, which can be shared with others for quick reproduction of tasks or extended by experienced coders for more sophisticated results. Bamboolib is designed to streamline common data wrangling, exploration, and visualization tasks, and can be used by both novice and experienced data analysts.
Installation in Anaconda is quite simple:
Open the Anaconda terminal and run the commands below.
Test Bamboolib in a Jupyter notebook
- Launch Jupyter Notebook
- Import bamboolib in a Python notebook using the following command:
import bamboolib as bam
Clicking the “Show bamboo UI” button displays a comprehensive user interface for interacting with the pandas DataFrame. Use the scroll bars to move sideways through the columns or up and down through the rows.
As we can see, the GUI displays three options:
1. Explore DataFrame
2. Search Actions
3. Create Plot
Data Exploration:
The “Explore DataFrame” option in the Bamboolib user interface makes it easy to do exploratory data analysis (EDA). The Explore DataFrame tool consists of the following tabs:
Glimpse: This provides high-level details about the dataset, such as column names, data types, unique values, missing values, and the shape of the data frame, which here is 891 rows by 12 columns.
Predictor patterns: This displays a heatmap of how strongly each column on the x-axis predicts each column on the y-axis. Click on any cell to learn more about the relationship between the two columns.
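The figures shown in the Glimpse tab can also be computed directly in pandas. The following is a minimal sketch on a hypothetical five-row sample; the sample values and the `glimpse` variable name are illustrative assumptions, not Bamboolib output.

```python
import pandas as pd

# Hypothetical five-row sample standing in for the Titanic data.
df = pd.DataFrame({
    "Name": ["Allen", "Braund", "Cumings", "Heikkinen", "Moran"],
    "Sex": ["male", "male", "female", "female", "male"],
    "Age": [35.0, 22.0, 38.0, 26.0, None],
    "Survived": [0, 0, 1, 1, 0],
})

# One row per column: data type, unique values, missing values.
glimpse = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "unique": df.nunique(),        # NaN is not counted as a value
    "missing": df.isna().sum(),
})

print(df.shape)  # (rows, columns)
print(glimpse)
```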
For instance, the “sex” column is chosen on the x-axis, and “survived” is on the y-axis.
Correlation Matrix: This displays the pairwise correlation between the columns.
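For comparison, a pairwise correlation matrix can be computed in plain pandas. This is only a sketch on hypothetical values; it assumes the correlation is taken over numeric columns, and the method Bamboolib uses internally may differ.

```python
import pandas as pd

# Hypothetical sample; the column names follow the Titanic dataset.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 2, 3, 1],
    "Fare": [7.25, 71.28, 13.0, 8.05, 53.1],
    "Sex": ["male", "female", "female", "male", "female"],
})

# Keep numeric columns only, then compute pairwise Pearson correlation.
corr = df.select_dtypes(include="number").corr()
print(corr)
```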
Search Actions:
The “Search Actions” option lets us choose from a wide range of transformations, including filters, sorting, grouping, and more, and apply them to manipulate our dataset.
Let’s examine a few of the modifications we can make.
Select columns: By clicking on “Search Actions” and choosing “select or drop columns,” you can filter the dataset to see only specific columns. A flyout appears on the right side, where you can choose “select” or “drop” and pick the columns from the dropdown.
After choosing “Select” and picking the columns, click on Execute.
We can see that the dataset now consists of 891 rows and 10 columns.
If at any time you wish to return to the original dataset, click on the undo button.
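In pandas terms, selecting columns amounts to indexing with a list of column names. The sketch below uses a hypothetical three-row sample and assigns the result to a new variable, so the original frame stays available, loosely mirroring the undo button. The code Bamboolib exports for this step may look different.

```python
import pandas as pd

# Hypothetical sample; the column names follow the Titanic dataset.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["Braund", "Cumings", "Heikkinen"],
    "Sex": ["male", "female", "female"],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282"],
})

# Keep only the listed columns; df itself is left unchanged.
selected = df[["Name", "Sex"]]

print(selected.shape)
print(df.shape)
```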
Drop columns: Frequently, a DataFrame will have columns that are not useful to your analysis. We can see that the cabin column in our dataframe has the most missing values, hence it is preferable to remove it using the drop column method.
We can see that the dataframe now has 891 rows and 11 columns after dropping the cabin column.
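The pandas counterpart of this step is `DataFrame.drop` with the `columns` argument. This is a minimal sketch on hypothetical data in which the `Cabin` column is mostly empty.

```python
import pandas as pd

# Hypothetical sample with a mostly empty Cabin column.
df = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen"],
    "Age": [22.0, 38.0, 26.0],
    "Cabin": [None, "C85", None],
})

# Count missing values per column to see which one is the sparsest.
print(df.isna().sum())

# Drop the Cabin column; a new DataFrame is returned.
dropped = df.drop(columns=["Cabin"])
print(dropped.shape)
```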
Filter: Using this option we can analyse the dataset by adding certain conditions. For instance, the analysis of the number of passengers whose age is less than or equal to 15 can be carried out as shown below.
It is clearly visible that there were 83 children on the Titanic.
We may further analyse this data to determine the gender distribution of the children, as shown below.
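The same filter, followed by the gender breakdown, can be sketched in pandas with a boolean mask and `value_counts`. The sample below is hypothetical, so its counts are not those of the real dataset. Rows with a missing age fail the comparison and are excluded.

```python
import pandas as pd

# Hypothetical sample; one passenger has no recorded age.
df = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "female"],
    "Age": [4.0, 14.0, 30.0, None, 15.0],
})

# Keep passengers aged 15 or younger (NaN <= 15 evaluates to False).
children = df.loc[df["Age"] <= 15]

# Gender distribution among the filtered rows.
by_sex = children["Sex"].value_counts()

print(len(children))
print(by_sex)
```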
Sort: This option lets you sort the dataset by one or more columns. For example, you can display the rows with the names in alphabetical order from A to Z.
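An A-to-Z sort corresponds to `sort_values` in ascending order. This is a minimal sketch on hypothetical names:

```python
import pandas as pd

# Hypothetical sample of passenger names.
df = pd.DataFrame({
    "Name": ["Moran, Mr. James", "Allen, Mr. William", "Cumings, Mrs. John"],
    "Age": [None, 35.0, 38.0],
})

# Sort alphabetically by Name; add more columns to `by` to break ties.
sorted_df = df.sort_values(by=["Name"], ascending=[True])
print(sorted_df)
```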
Group By: This option allows you to split your data into separate groups to perform computations for better analysis on one or more columns.
For instance, you can use the group by function on column “Sex” to determine the count for each gender.
We can also group by “Sex” and calculate the mean of Survived, Pclass, Age, SibSp, and Parch.
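Both group-by examples map onto `DataFrame.groupby`. The sketch below uses a hypothetical four-row sample and only two of the numeric columns to stay short.

```python
import pandas as pd

# Hypothetical sample; the column names follow the Titanic dataset.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# Number of rows per gender.
counts = df.groupby("Sex").size()

# Mean of selected numeric columns per gender.
means = df.groupby("Sex")[["Survived", "Age"]].mean()

print(counts)
print(means)
```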
Drop missing values: This option removes the rows that have missing values in the selected column.
In our dataset, the “Age” column has 177 missing values. Let us drop all the rows in which it is missing.
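In pandas, this is `dropna` restricted to one column through the `subset` argument. In the hypothetical sample below, a missing `Cabin` value does not cause a row to be dropped, because only `Age` is checked.

```python
import pandas as pd

# Hypothetical sample with gaps in both Age and Cabin.
df = pd.DataFrame({
    "Name": ["Braund", "Moran", "Cumings", "Williams"],
    "Age": [22.0, None, 38.0, None],
    "Cabin": [None, None, "C85", None],
})

# Remove only the rows where Age is missing.
cleaned = df.dropna(subset=["Age"])

print(len(df), len(cleaned))
```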
Create Plot: You can create pie charts, scatter plots, bar plots, histograms, box plots, and more using the create plot option.