Convert Databricks DBC notebook format to ipynb

DBC files are difficult to work with. Here’s the fast way to convert them to ipynb files.

By default, Databricks natively stores its notebooks as DBC files, a closed, binary format. A .dbc file has the nice benefit of being self-contained: one DBC file can hold an entire folder of notebooks and supporting files. But other than that, DBC files are frankly obnoxious:

- I can’t view them outside of the Databricks notebook/workspace UI.
- The native nbviewer in GitHub doesn’t recognize them (nbviewer is what allows GitHub to display ipynb, i.e. Jupyter notebook, files).
- They are binary, which makes git diffs impossible.
- ipynb files, by contrast, are JSON. I’d argue they aren’t really human-readable either, but at least I can do a modicum of diffing and merging if I need to.
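Because ipynb files are plain JSON, even a crude flattening gets you usable diffs. The sketch below is my own illustration, not a Databricks or Jupyter tool: a hypothetical ipynb_cells shell function that assumes python3 is installed, prints each cell’s source in order, and ignores outputs and metadata.

```shell
#!/usr/bin/env bash
# Sketch: flatten an .ipynb (JSON) into plain text, one cell after another,
# so two versions of a notebook can be compared with ordinary diff.
# Assumes python3 is on the PATH; cell outputs and metadata are ignored.
set -euo pipefail

# ipynb_cells NOTEBOOK.ipynb  (hypothetical helper name)
ipynb_cells() {
  python3 - "$1" <<'PY'
import json
import sys

with open(sys.argv[1]) as f:
    nb = json.load(f)
for i, cell in enumerate(nb["cells"]):
    print("# --- cell %d (%s) ---" % (i, cell["cell_type"]))
    # "source" may be a list of lines or a single string; join handles both.
    print("".join(cell["source"]))
PY
}
```

To compare two versions of a notebook: diff <(ipynb_cells old.ipynb) <(ipynb_cells new.ipynb). Dropping outputs is deliberate, since execution counts and rendered results are a big part of what makes raw ipynb diffs noisy.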

Sidenote:

The git integration in the Databricks UI is passable, but lacking. For example, each notebook must be committed separately, even though a given feature or bug fix may span multiple notebooks.

So, what’s the solution?

I’ve never seen this published before, but while poking around the Databricks CLI I noticed it can actually do all of this for you. Here are a couple of sample scripts that demonstrate ways to manage the notebook lifecycle with the CLI:

pip3 install --upgrade databricks-cli
databricks --version

# authenticate with a personal access token
# generate one in the workspace UI under User Settings > Access Tokens
# when prompted for the host, enter your workspace URL, e.g. https://eastus2.azuredatabricks.net

databricks configure --token

databricks workspace -h
databricks workspace ls /Users/davew@microsoft.com
databricks workspace ls /Users/davew@microsoft.com/OpenHack
# export's default format is SOURCE; export_dir appears to support only SOURCE
databricks workspace export_dir /Users/davew@microsoft.com/OpenHack .
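# give the target an explicit .ipynb name; with a bare directory as the target,
# the CLI may derive the file extension from the notebook language instead
databricks workspace export --format JUPYTER /Users/davew@microsoft.com/OpenHack/01_useful_functions ./01_useful_functions.ipynb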
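The databricks configure --token step above prompts interactively for the host and token, which is awkward on a build agent. A minimal sketch, assuming the CLI reads its settings from ~/.databrickscfg (the file that configure writes), is to create that file yourself. The write_databricks_cfg name is hypothetical, and the token value is whatever you generated in User Settings.

```shell
#!/usr/bin/env bash
# Sketch: write the Databricks CLI config file without the interactive
# prompts of "databricks configure --token", e.g. on a CI agent.
# Note: this overwrites the whole file, including any other profiles in it.
set -euo pipefail

# write_databricks_cfg HOST TOKEN [CFG_PATH]  (hypothetical helper name)
write_databricks_cfg() {
  local host="$1" token="$2" cfg="${3:-$HOME/.databrickscfg}"
  # umask 077 in a subshell: a newly created file is readable only by you,
  # which matters because it holds the access token in plain text.
  ( umask 077 && printf '[DEFAULT]\nhost = %s\ntoken = %s\n' "$host" "$token" > "$cfg" )
}
```

For example, write_databricks_cfg https://eastus2.azuredatabricks.net "$MY_TOKEN" would let the rest of the commands above run unattended, where MY_TOKEN is an illustrative variable holding your personal access token.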

All we really need to do is download our notebooks with a small shell script and then run git commit, etc., as needed.
Quite cool!
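Here’s a minimal sketch of that little script. It assumes the CLI is already configured and that you list the notebooks to export by hand; the function names and the DATABRICKS override variable are my own, not part of the CLI. Because everything lands in one commit, a change that spans several notebooks stays together, unlike the per-notebook commits in the Databricks UI.

```shell
#!/usr/bin/env bash
# Sketch: export a hand-picked set of workspace notebooks as .ipynb files
# and commit them together, so one feature/bug fix becomes one git commit.
# Assumes the databricks CLI is installed and already configured.
set -euo pipefail

# Lets you point at a different binary (or a test stub); defaults to the real CLI.
DATABRICKS="${DATABRICKS:-databricks}"

# export_notebooks OUT_DIR WORKSPACE_PATH...
# Writes each notebook to OUT_DIR/<notebook name>.ipynb in Jupyter format.
export_notebooks() {
  local out_dir="$1"
  shift
  mkdir -p "$out_dir"
  local nb target
  for nb in "$@"; do
    target="$out_dir/$(basename "$nb").ipynb"
    # Clear any previous export so a rerun doesn't trip over an existing file.
    rm -f "$target"
    "$DATABRICKS" workspace export --format JUPYTER "$nb" "$target"
  done
}

# commit_notebooks REPO_DIR MESSAGE
# Stages every .ipynb under REPO_DIR and records them in a single commit.
commit_notebooks() {
  local repo_dir="$1" message="$2"
  git -C "$repo_dir" add -- '*.ipynb'
  git -C "$repo_dir" commit -m "$message"
}
```

A typical run would be export_notebooks ./notebooks /Users/davew@microsoft.com/OpenHack/01_useful_functions, followed by commit_notebooks . "Update OpenHack notebooks" from the root of your local repo.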

Feel free to contact me for more information.

Are you convinced your data or cloud project will be a success?

Most companies aren’t. I have lots of experience with these projects. I speak at conferences, host hackathon events, and am a prolific open source contributor. I love helping companies with data problems. If that sounds like someone you can trust, contact me.

Thanks for reading. If you found this interesting please subscribe to my blog.

Dave Wentzel 2019-12-03 CONTENT
azure data lake data science
