Skip to content

Packages

Data Basis' packages allow access to the public datalake directly from your computer or development environment. Currently available in:

  • Python
  • R
  • Stata
  • CLI (terminal)

Ready to start? On this page you'll find:

Getting started

Before starting: Create your Google Cloud project

To create a Google Cloud project, you just need an email registered with Google. You need to have your own project, even if empty, to make queries in our public datalake.

  1. Access Google Cloud. If it's your first time, accept the Terms of Service.
  2. Click on Create Project. Choose a nice name for the project.
  3. Click on Create
Why do I need to create a Google Cloud project?

Google provides 1 TB free per month of BigQuery usage for each project you own. A project is necessary to activate Google Cloud services, including BigQuery usage permission. Think of the project as the "account" in which Google will track how much processing you have already used. You don't need to add any card or payment method - BigQuery automatically starts in Sandbox mode, which allows you to use its resources without adding a payment method. Read more here.

Installing the package

To install the package in Python and command line, you can use pip directly from your terminal. In R, you can install directly in RStudio or editor of your preference.

pip install basedosdados
install.packages("basedosdados")

Requerimentos:

  1. Ensure your Stata is version 16+
  2. Ensure Python is installed on your computer.

Once the requirements are met, run the following commands:

net install basedosdados, from("https://raw.githubusercontent.com/basedosdados/mais/master/stata-package")

Configuring the package

Once you have your project, you need to configure the package to use the ID of that project in queries to the datalake. To do this, you must use the project_id that Google provides for you when the project is created.

Example of Project ID in BigQuery

You don't need to configure the project beforehand. As soon as you run your first query, the package will indicate the steps to configure.

Once you have the project_id, you must pass this information to the package using the set_billing_id function.

set_billing_id("<YOUR_PROJECT_ID>")

You need to specify the project_id every time you use the package.

Make your first query

A simple example to start exploring the datalake is to pull information cadastral of municipalities directly from our base of Brazilian Directories (table municipio). To do this, we will use the function download, downloading the data directly to our machine.

import basedosdados as bd
bd.download(savepath="<PATH>",
dataset_id="br-bd-diretorios-brasil", table_id="municipio")

To understand more about the download function, read the reference manual.

library("basedosdados")
query <- "SELECT * FROM `basedosdados.br_bd_diretorios_brasil.municipio`"
dir <- tempdir()
data <- download(query, "<PATH>")

To understand more about the download function, read the reference manual.

bd_read_sql, ///
    path("<PATH>") ///
    query("SELECT * FROM `basedosdados.br_bd_diretorios_brasil.municipio`") ///
    billing_project_id("<PROJECT_ID>")

basedosdados download "where/to/save/file" \
--billing_project_id <YOUR_PROJECT_ID> \
--query 'SELECT * FROM
`basedosdados.br_bd_diretorios_brasil.municipio`'
To understand more about the download function, read the reference manual.

Tutorials

How to use the packages

We prepared tutorials presenting the main functions of each package for you to start using them.

Reference manuals (API)