Merge data from multiple data sources

When you work with multiple data sources, you may need to combine their data into a single data structure. This guide walks you through a basic approach to merging that data in a JavaScript or Python step.

Sample data

In this example, three steps each pull data from a different source in the same backend API. Each step outputs a list of objects, and every object contains an id key that is used to match records across steps. The ids do not line up exactly: id 3 is missing from Step 3, and id 4 appears only in Step 3. Use the following snippets as the output of the corresponding language steps to see how language steps can merge data.

Step 1

return [
{'id': 1, 'name': 'John', 'age': 30},
{'id': 2, 'name': 'Alice', 'age': 25},
{'id': 3, 'name': 'Bob', 'age': 35}
]

Step 2

return [
{'id': 1, 'occupation': 'Engineer'},
{'id': 2, 'occupation': 'Doctor'},
{'id': 3, 'occupation': 'Teacher'}
]

Step 3

return [
{'id': 1, 'city': 'New York', 'country': 'USA'},
{'id': 2, 'city': 'London', 'country': 'UK'},
{'id': 4, 'city': 'Paris', 'country': 'France'}
]

Merge data using code

The following code snippets can be used as a starting point to merge data from your data sources.

# Import libraries
import pandas as pd
import json

# Convert each step's output (a list of dicts) into a DataFrame
df1 = pd.DataFrame(Step1.output)
df2 = pd.DataFrame(Step2.output)
df3 = pd.DataFrame(Step3.output)

# Merge data (ex: df1.merge(df_to_be_merged, on="<COMMON_FIELD>", how="<TYPE_OF_MERGE>"))
# Pandas docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
# An inner merge keeps only ids present in every step (1 and 2 for the sample data);
# use how="outer" to keep all ids, with missing fields returned as null
merged_data_df = df1.merge(df2, on="id", how="inner")
merged_data_df = merged_data_df.merge(df3, on="id", how="inner")

# Return data to the frontend as a list of JSON-serializable records
return json.loads(merged_data_df.to_json(orient="records"))
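To see what the merge type does to the sample data, here is a minimal sketch in plain Python with no pandas dependency. The helper name merge_by_id and the variables step1, step2, and step3 are hypothetical stand-ins for the step outputs, and the sketch assumes each id appears at most once per step. Unlike a pandas outer merge, the "outer" mode here leaves missing fields out of the record rather than filling them with null.

```python
# Sample outputs copied from Step 1, Step 2, and Step 3 above
step1 = [
    {"id": 1, "name": "John", "age": 30},
    {"id": 2, "name": "Alice", "age": 25},
    {"id": 3, "name": "Bob", "age": 35},
]
step2 = [
    {"id": 1, "occupation": "Engineer"},
    {"id": 2, "occupation": "Doctor"},
    {"id": 3, "occupation": "Teacher"},
]
step3 = [
    {"id": 1, "city": "New York", "country": "USA"},
    {"id": 2, "city": "London", "country": "UK"},
    {"id": 4, "city": "Paris", "country": "France"},
]


def merge_by_id(outputs, how="inner"):
    """Merge lists of dicts on "id" (hypothetical helper).

    how="inner" keeps ids found in every list;
    how="outer" keeps ids found in any list.
    """
    if how not in ("inner", "outer"):
        raise ValueError("how must be 'inner' or 'outer'")
    merged = {}  # id -> combined record
    counts = {}  # id -> number of lists that contain the id
    for rows in outputs:
        for row in rows:
            merged.setdefault(row["id"], {}).update(row)
            counts[row["id"]] = counts.get(row["id"], 0) + 1
    if how == "inner":
        return [rec for key, rec in merged.items() if counts[key] == len(outputs)]
    return list(merged.values())


inner_rows = merge_by_id([step1, step2, step3], how="inner")
outer_rows = merge_by_id([step1, step2, step3], how="outer")

print([row["id"] for row in inner_rows])  # [1, 2]
print([row["id"] for row in outer_rows])  # [1, 2, 3, 4]
```

With the inner merge, Bob (id 3) and the Paris record (id 4) are dropped because neither id appears in all three steps. If partial records are acceptable in your application, an outer merge keeps them.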