```ini
[database]
type = postgres
host = ipeseserv3.epfl.ch
port = 12345
database = QBuildings-Suisse
username = students
password = students_project
```
3 Tutorial
Now that we have a good picture of what QBuildings is, this part should help you understand how to use it.
3.1 Visualisation of the data
Most users do not need to modify the database, only to read the data it contains. For that purpose, a reader user account should be used.
There are several ways to visualise the data:
- QGIS: useful to have a geographical representation of the data
Open the Browser panel and connect to PostGIS by opening a new connection:
Enter the credentials:
- PyCharm: useful for quick access while you code
Open the Database panel, add a new connection, and select PostgreSQL in the Data Source list:
Enter your credentials:
- For Visual Studio Code users, follow the instructions from this extension
3.2 Accessing the data in code
For some tools, it is useful to get the data in a GeoDataFrame from geopandas. Two things are needed to do so:
- A .ini file that contains the credentials to access the database.
- The sqlalchemy package to connect to the database.
```python
import configparser

from sqlalchemy import create_engine, MetaData, select


def establish_connection(db, schema):
    """
    :param db: Name of the database to which we want to connect - Suisse, Geneva, ...
    :param schema: Name of the schema(s) you want to connect to - Aggregated, Processed, Smoothed
    """
    # Database connection: read the credentials from the .ini file
    file_ini = your_own_path + "/" + db + ".ini"  # your_own_path: folder containing the .ini file
    project = configparser.ConfigParser()
    project.read(file_ini)

    # Database
    db_schema = schema
    try:
        db_engine_str = 'postgresql+psycopg2://{}:{}@{}:{}/{}'.format(
            project['database']['username'],
            project['database']['password'],
            project['database']['host'],
            project['database']['port'],
            project['database']['database'])
        db_engine = create_engine(db_engine_str)
        connection = db_engine.connect()
        print('Connected to database')
    except Exception:
        print('Cannot connect to database engine')
        if 'database' in project:
            print('\thost: {}\n\tport: {}\n\tdatabase: {}\n\tusername: {}'.format(
                project['database']['host'],
                project['database']['port'],
                project['database']['database'],
                project['database']['username']))

    # Reflect the tables of the requested schema(s)
    metadata = MetaData(bind=db_engine)
    if isinstance(db_schema, list):
        for schema in db_schema:
            metadata.reflect(schema=schema)
    else:
        metadata.reflect(schema=schema)

    return db_engine, connection, metadata
```
Once the tables are reflected using establish_connection(db, schema), queries can be run on the reflection. However, with the students user, only select queries can be done.
The simplest select query is the following:
```python
import numpy as np
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
import geopandas as gpd

# db_engine, connection and metadata come from establish_connection
tables = metadata.tables  # dictionary of reflected tables, keyed by 'Schema.table'

db_schema = "Processed"    # The schema you want to access
table_name = "buildings"   # The table you want to access
selectQuery = sa.select(tables[db_schema + '.' + table_name])

data = gpd.read_postgis(selectQuery.compile(dialect=postgresql.dialect()),
                        con=db_engine, geom_col='geometry').fillna(np.nan)
```
This returns a GeoDataFrame with all the rows of the table. However, read_postgis is slow, so for a large dataset you may prefer to fetch the rows in batches rather than all at once.
```python
import pandas as pd
from geoalchemy2.shape import to_shape

selectQuery = sa.select(tables[db_schema + '.' + table_name])
order = connection.execute(selectQuery.compile(dialect=postgresql.dialect()))
flag = True
results = []
while flag:
    partial_results = order.fetchmany(1000)
    if not partial_results:
        flag = False
    else:
        results += partial_results
data = pd.DataFrame(results)
geom = gpd.GeoSeries(to_shape(row) for row in data['geometry'])
data = gpd.GeoDataFrame(data, geometry=geom)
```
Finally, if needed, a condition can be added on the rows to be selected.
```python
columns_with_criteria = 'id'  # The column holding the selection criterion
selection = tables[db_schema + '.' + table_name].columns[columns_with_criteria]
criteria = 10  # The value to select on
selectQuery = sa.select(tables[db_schema + '.' + table_name]) \
    .where(selection == criteria)
```
If the selection criteria span more than one table, a join query should be used. It is highly recommended to perform join queries on keys (primary or foreign) to speed up the result.
Additional options on select are ORDER BY, GROUP BY and HAVING. These are also much faster done in SQL than with pandas.
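As an illustration of those options, here is a minimal sketch using a throwaway in-memory SQLite table (the table, columns and values are invented for the example; on QBuildings you would apply the same methods to the reflected tables):

```python
import sqlalchemy as sa

# Toy table standing in for a reflected QBuildings table
engine = sa.create_engine('sqlite://')
meta = sa.MetaData()
buildings = sa.Table(
    'buildings', meta,
    sa.Column('id_building', sa.Integer, primary_key=True),
    sa.Column('transformer', sa.Integer),
    sa.Column('area', sa.Float))
meta.create_all(engine)

with engine.connect() as conn:
    conn.execute(buildings.insert(), [
        {'id_building': 1, 'transformer': 10, 'area': 50.0},
        {'id_building': 2, 'transformer': 10, 'area': 70.0},
        {'id_building': 3, 'transformer': 20, 'area': 30.0},
    ])
    # GROUP BY with an aggregate, filtered with HAVING, sorted with ORDER BY
    query = (sa.select(buildings.c.transformer,
                       sa.func.sum(buildings.c.area).label('total_area'))
             .group_by(buildings.c.transformer)
             .having(sa.func.sum(buildings.c.area) > 40)
             .order_by(sa.desc('total_area')))
    rows = conn.execute(query).fetchall()
```

Here only transformer 10 survives the HAVING clause, since its total area (120.0) exceeds the threshold while transformer 20's (30.0) does not.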
3.3 Modifying the data in code
Most users are strongly discouraged from modifying the QBuildings-Suisse db, which is the one used by everybody. The easiest approach is usually to create a database for the area you are going to work on, by selecting the data from QBuildings-Suisse that intersect the area and inserting them into a new db.
The first step is to create your own database. This must be done by a user with global access to the database server. In practice, for the moment, this means contacting Joseph Loustau on Mattermost. The following information should be provided:
- username
- password if you have a preferred one (otherwise, one would be generated for you)
- Name of the database
- Any additional fields or tables
You will get a personal user with full grants on your own db, and a db with the same tables as QBuildings-Suisse but empty. To fill it, either the relevant data are extracted from Suisse, or the QBuildings repository is used.
3.4 Extract data from QBuildings-Suisse
The copy_by_mask function copies the data from Suisse into your db, using a geographical mask as a filter.
```python
import re
from timeit import default_timer as timer

import geopandas as gpd
import pandas as pd
import psycopg2
from geoalchemy2 import Geometry
from geoalchemy2.shape import to_shape
from sqlalchemy import select, func
from sqlalchemy.dialects import postgresql


def copy_by_mask(db_from, db_to, schema_to_copy, mask):
    """
    :param db_from: dictionary with the information of the db from which the data are imported
    :param db_to: dictionary with the information of the db to which the data are exported
    :param schema_to_copy: list of the schemas to copy
    :param mask: a geopackage that delimits the area of interest
    :return:
    """
    def insert_query(db_to, t, selection_query, nrows):
        connect_to = db_to['engine'].connect()
        order = connection.execute(selection_query.compile(dialect=postgresql.dialect()))
        nrows = connection.execute(nrows.compile(dialect=postgresql.dialect())).fetchone()[0]
        flag = 0
        divider, remain = divmod(nrows, 100000)
        # Copy the rows in chunks of 100'000 to limit memory use
        while flag < divider:
            partial_results = order.fetchmany(100000)
            if not partial_results:
                flag = divider + 1  # no rows left, stop early
            else:
                flag += 1
                df_inter = pd.DataFrame(partial_results)
                geom = gpd.GeoSeries(to_shape(row) for row in df_inter['geometry'])
                df_inter = gpd.GeoDataFrame(df_inter, geometry=geom, crs='EPSG:2056')  # TODO: get the CRS from SQL
                df_inter.to_postgis(name=re.split(r"\.", t)[1], if_exists='append',
                                    schema=re.split(r"\.", t)[0], con=db_to['engine'])
        # Copy the remaining rows
        partial_results = order.fetchmany(remain)
        if partial_results:
            df_inter = pd.DataFrame(partial_results)
            geom = gpd.GeoSeries(to_shape(row) for row in df_inter['geometry'])
            df_inter = gpd.GeoDataFrame(df_inter, geometry=geom, crs='EPSG:2056')
            df_inter.to_postgis(name=re.split(r"\.", t)[1], if_exists='append',
                                schema=re.split(r"\.", t)[0], con=db_to['engine'])
        connect_to.close()
        return

    mask = gpd.read_file(filename=mask)
    tables = db_from['metadata'].tables

    for schema in schema_to_copy:
        connection = db_from['engine'].connect()

        # Select transformers
        print('\nSelect transformers from ' + schema + ' by intersection\n\n')
        geom = mask['geometry'].buffer(-150).to_wkb().tolist()[0]
        df_transformer = tables[schema + '.' + 'transformers']
        transfoQuery = select(df_transformer).where(df_transformer.c.geometry.ST_Intersects(geom))
        nrows = select(func.count(df_transformer.c.id)).where(df_transformer.c.geometry.ST_Intersects(geom))
        ctequery = transfoQuery.cte()
        insert_query(db_to, schema + '.' + 'transformers', transfoQuery, nrows)

        # Select buildings connected to the selected transformers
        start = timer()
        print('Select buildings corresponding to the selected transformers')
        j = tables[schema + '.' + 'buildings']. \
            join(ctequery, tables[schema + '.' + 'buildings'].c.transformer == ctequery.c.id)
        df_building = tables[schema + '.' + 'buildings']
        buildingQuery = select(df_building).select_from(j)
        nrows = select(func.count(df_building.c.id_building)).select_from(j)
        ctequery_build = select(df_building.c.id_building).select_from(j).cte()
        insert_query(db_to, schema + '.' + 'buildings', buildingQuery, nrows)
        end = timer()
        print('Operation took', end - start)

        # Select the remaining tables of the schema
        if schema == 'Aggregated':
            tables_to_copy = ['buildings_RegBL', 'roofs', 'facades']
        else:
            tables_to_copy = ['roofs', 'facades']
        for t in tables_to_copy:
            print('Select', t, 'according to selected buildings')
            start = timer()
            j_build = tables[schema + '.' + t].join(
                ctequery_build,
                tables[schema + '.' + t].c.id_building == ctequery_build.c.id_building)
            sqlQuery = select(tables[schema + '.' + t]).select_from(j_build)
            nrows = select(func.count(tables[schema + '.' + t].c.id_building)).select_from(j_build)
            insert_query(db_to, schema + '.' + t, sqlQuery, nrows)
            end = timer()
            print('Operation took', end - start, 'for table', t)
        connection.close()
    return print('Data copied')


if __name__ == '__main__':
    schemas = ['Aggregated', 'Processed', 'Smoothed']
    db_engine_suisse, connection_suisse, metadata_suisse = establish_connection("Suisse", schemas)
    db_engine_db, connection_db, metadata_db = establish_connection("your_db", schemas)

    db_from = {'engine': db_engine_suisse,
               'metadata': metadata_suisse}
    db_to = {'engine': db_engine_db,
             'metadata': metadata_db}
    mask = 'your_perimeter.gpkg'
    copy_by_mask(db_from, db_to, schemas, mask)
```
However, Suisse still has some issues and some data are incorrect (pay particular attention to roof_solar_area_m2 and capita_cap). Those errors are caused by wrong intersections at large scale that may not occur at smaller scale. To generate fresh data for your area of interest, use the QBuildings repository.
3.4.1 QBuildings repository
The Python repository (5 in Figure 1.1) is the one used to generate the data in the db. A description of its contents can be found in Section (gbuildings-repository?).
To run it, the user should follow these steps:
Go to the share drive IPESESERV5
- If you are on Windows: \\ipeseserv5\Desktop Share\IPESE-projects\QBuildings in the Windows explorer
- If you are on Mac: Finder > Go > Connect to server > smb://ipeseserv5 > Connect > Desktop Share > IPESE-projects > QBuildings
- Create a folder with the same structure as the one from Suisse,
- Copy the database.ini file, which can be retrieved from the Suisse model,
- Change the names of the input layers to the corresponding ones from your region. If they are not available, you can pass the national ones together with the correct mask of the region of interest; however, this will take longer to compute.
Go to the Python repository, open QBuildings.py and set the options:
- db: Name of the db you want to modify
- export_format: Can be among [‘db’, ‘csv’, ‘gpkg’, ‘shp’, ‘geojson’, ‘xls’]
- replace: Erase the existing db
- with_mask: if your area of interest contains more than 10'000 buildings, set it to True
- export_warning: if True, export the buildings for which an error occurred
- layer: Layer to run among [‘Aggregated’, ‘Processed’, ‘Smoothed’]
- tables: Tables to modify among [‘buildings’, ‘roofs’, ‘facades’, ‘transformers’], or [‘all’] to take them all at once
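The option values can be sanity-checked before launching a run. A minimal sketch based only on the lists above (the validate_options helper is illustrative and not part of the repository):

```python
# Allowed values, taken from the option list above
VALID_EXPORT_FORMATS = ['db', 'csv', 'gpkg', 'shp', 'geojson', 'xls']
VALID_LAYERS = ['Aggregated', 'Processed', 'Smoothed']
VALID_TABLES = ['buildings', 'roofs', 'facades', 'transformers', 'all']

def validate_options(options):
    """Raise ValueError if an option is outside the documented values."""
    if options['export_format'] not in VALID_EXPORT_FORMATS:
        raise ValueError('Unknown export_format: ' + options['export_format'])
    for layer in options['layer']:
        if layer not in VALID_LAYERS:
            raise ValueError('Unknown layer: ' + layer)
    for table in options['tables']:
        if table not in VALID_TABLES:
            raise ValueError('Unknown table: ' + table)
    return options

# Usage sketch
options = validate_options({
    'db': 'your_db',
    'export_format': 'db',
    'replace': False,
    'with_mask': True,       # area of interest > 10'000 buildings
    'export_warning': True,
    'layer': ['Processed'],
    'tables': ['all'],
})
```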