• jared@discsanddata.com
  • Siem Reap, Cambodia
Making Pandasql Happen in Jupyter

(Standard warning in all my posts: I’m a beginner. Nothing below should be taken as the best way of doing things, a good way of doing things, or even a right way of doing things. If you think you know better than me, you probably do. Please comment with better practices.)

So, I was working on my first week’s assignment for the capstone project of the Learn SQL Basics for Data Science specialization from UC Davis (via Coursera) when I hit a small problem. I was using a Jupyter notebook to work with a dataset on Olympic athletes. The notebook runs a Python kernel, so we were going to use pandas dataframes. To query the data with SQL, we just need to import sqldf from pandasql. That’s what the instructor did. Easy. In the notebook went…

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

The Jupyter troll responds…

ModuleNotFoundError: No module named 'pandasql'

Um, I just said import. It’s there. Anyway, it’s not. Luckily, I’m really smart and know how to install things using the command line.

pip install pandasql

Hurrah! Pandasql installed! Back to the notebook…

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
ModuleNotFoundError: No module named 'pandasql'

Nope. Installing it in the terminal didn’t work. Help me, internet, help me!
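A common cause of this, though not confirmed for this setup, is that the terminal’s pip belongs to a different Python than the one running the notebook kernel. Here is a minimal sketch for checking that from a notebook cell. The helper names module_visible and pip_install_command are made up for illustration; only sys.executable and importlib.util.find_spec come from the standard library. Nothing gets installed here; the code just reports what the kernel sees.

```python
import importlib.util
import sys


def module_visible(name):
    """Return True if the interpreter running this code can find a top-level module."""
    return importlib.util.find_spec(name) is not None


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this code (hypothetical helper)."""
    return [sys.executable, "-m", "pip", "install", package]


# The Python behind this notebook kernel; it may differ from the terminal's Python.
print("kernel python:", sys.executable)

# False here would match the ModuleNotFoundError above.
print("pandasql visible:", module_visible("pandasql"))

# The command that would install into this exact interpreter.
print("install with:", " ".join(pip_install_command("pandasql")))
```

If the printed path is not the Python the terminal uses, that would explain the error. In a notebook cell, the printed command could be run as !{sys.executable} -m pip install pandasql, and recent IPython versions also offer a %pip install pandasql magic meant to target the running kernel.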

Let’s see what Stack Overflow says. Generally, the answers there are beyond my level, but I find it helpful sometimes. Discussions like the short one at https://stackoverflow.com/questions/58944642/modulenotfounderror-no-module-name-pandasql showed me that I can install modules from inside a Jupyter notebook. Cool, let’s try that.

!pip install pandasql
import pandas as pd
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
ModuleNotFoundError: No module named 'pandasql'

Ok. That didn’t work. But sometimes, just sometimes, there might be some documentation that might help. https://anaconda.org/anaconda/pandasql says I should try…

conda install -c anaconda pandasql

Cool. Let’s see how that works when it goes right into the notebook…

!conda install -c anaconda pandasql
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

It’s working! But then the output ends with…

Proceed ([y]/n)? 

Um, what do I do now? This question came at the end of the output. I can’t click it. Typing ‘y’ or ‘yes’ or ‘please yes for dear god YES!’ in the next cell didn’t work. Is this where it ends? No more data engineering for me? Come on internet… one last time…

Thank you https://stackoverflow.com/questions/39826250/conda-stuck-on-proceed-y-n-when-updating-packages-in-ipython-console. So it seems I can answer questions like this before they’re asked. Cool. Instead of just

!conda install -c anaconda pandasql

I go with

!conda install -c anaconda pandasql -y

That ‘-y’ is my answer for when the pandasql fairy pops the question…

!conda install -c anaconda pandasql -y
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

Then it loaded up! I can play with my data! I might pass my SQL for idiots specialization! It’s such a small achievement, but a win’s a win.
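For what it’s worth, the same pre-answered install can be pointed explicitly at the environment the notebook kernel is running in. The sketch below only builds and prints the command, it doesn’t run it. conda_install_command is a hypothetical helper; --yes (the long form of -y) and --prefix are standard conda options, and sys.prefix is the directory of the environment behind the kernel.

```python
import sys


def conda_install_command(package, channel="anaconda"):
    """Build a non-interactive conda install command for the kernel's environment.

    Hypothetical helper for illustration; it only assembles the argument list.
    """
    return [
        "conda", "install",
        "--yes",                  # answer "Proceed ([y]/n)?" ahead of time
        "--prefix", sys.prefix,   # install into the environment running this kernel
        "-c", channel,            # channel to pull the package from
        package,
    ]


print(" ".join(conda_install_command("pandasql")))
```

In a notebook cell, the equivalent one-liner would be !conda install --yes --prefix {sys.prefix} -c anaconda pandasql.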

If you want to see what I’m doing with pandasql, check out my project.
