dataframe_sql
is a Python package that translates SQL syntax into operations on
pandas DataFrames, a functionality which is not available in the central pandas
package.
pip install dataframe_sql
In this simple example, a DataFrame is read in from a csv and then using the query function you can produce a new DataFrame from the sql query.
from pandas import read_csv
from dataframe_sql import register_temp_table, query
my_table = read_csv("some_file.csv")
register_temp_table(my_table, "my_table")
query("""select * from my_table""")
The package currently only supports pandas but there are plans to support dask, rapids, and modin in the future.
The SQL syntax for dataframe_sql
is exactly the same as the syntax in
sql_to_ibis
, its underlying package.
You can find the full SQL syntax here
While there are other packages that accomplish the goal of using SQL with pandas
DataFrames, all other packages such as pandasql actually use a database on the
backend which defeats the purpose of using pandas to begin with. In the case of
pandasql which uses SQLite, this can result in major performance bottlenecks.
dataframe_sql
actually performs native pandas operations in memory on DataFrames,
which avoids conflicts that may arise from using external databases.