Packages
Data Basis' packages allow access to the public datalake directly from your computer or development environment. Currently available in:
- Python
- R
- Stata
- CLI (terminal)
Ready to start? On this page you'll find:
Getting started
Before starting: Create your Google Cloud project
To create a Google Cloud project, you just need an email registered with Google. You need to have your own project, even if empty, to make queries in our public datalake.
- Access Google Cloud. If it's your first time, accept the Terms of Service.
- Click on
Create Project
. Choose a nice name for the project. - Click on
Create
Why do I need to create a Google Cloud project?
Google provides 1 TB free per month of BigQuery usage for each project you own. A project is necessary to activate Google Cloud services, including BigQuery usage permission. Think of the project as the "account" in which Google will track how much processing you have already used. You don't need to add any card or payment method - BigQuery automatically starts in Sandbox mode, which allows you to use its resources without adding a payment method. Read more here.
Installing the package
To install the package in Python and command line, you can use
pip
directly from your terminal. In R, you can install directly in
RStudio or
editor of your preference.
pip install basedosdados
install.packages("basedosdados")
Requerimentos:
- Ensure your Stata is version 16+
- Ensure Python is installed on your computer.
Once the requirements are met, run the following commands:
net install basedosdados, from("https://raw.githubusercontent.com/basedosdados/mais/master/stata-package")
Configuring the package
Once you have your project, you need to configure the package to use the ID
of that project in queries to the datalake. To do this, you must use the
project_id
that Google provides for you when the
project is created.
You don't need to configure the project beforehand. As soon as you run your first query, the package will indicate the steps to configure.
Once you have the project_id
, you must pass this information to the package using the set_billing_id
function.
set_billing_id("<YOUR_PROJECT_ID>")
You need to specify the project_id
every time you use the package.
Make your first query
A simple example to start exploring the datalake is to pull information
cadastral of municipalities directly from our base of Brazilian Directories (table municipio
). To do this, we will use the
function download
, downloading the data directly to our machine.
import basedosdados as bd
bd.download(savepath="<PATH>",
dataset_id="br-bd-diretorios-brasil", table_id="municipio")
To understand more about the download
function, read the reference manual.
library("basedosdados")
query <- "SELECT * FROM `basedosdados.br_bd_diretorios_brasil.municipio`"
dir <- tempdir()
data <- download(query, "<PATH>")
To understand more about the download
function, read the reference manual.
bd_read_sql, ///
path("<PATH>") ///
query("SELECT * FROM `basedosdados.br_bd_diretorios_brasil.municipio`") ///
billing_project_id("<PROJECT_ID>")
basedosdados download "where/to/save/file" \
--billing_project_id <YOUR_PROJECT_ID> \
--query 'SELECT * FROM
`basedosdados.br_bd_diretorios_brasil.municipio`'
download
function, read the reference manual.
Tutorials
How to use the packages
We prepared tutorials presenting the main functions of each package for you to start using them.
Blog:
Vídeos:
Blog:
Vídeos:
Documentation: