Merge data from multiple data sources
When you work with data from multiple sources, you may need to combine it into a single data structure. This guide walks through some basic approaches for doing so in JavaScript or Python.
Sample data
In this example, the sample data represents three steps that pull data from three different sources in the same backend API. The output of each step is a list of objects, and every object contains an id key that the outputs can be joined on. The ids do not line up perfectly: id 3 is missing from Step 3, and id 4 appears only in Step 3. Use the following snippets to return the sample data from steps written in the corresponding language and see how those steps can merge it.
Python

Step 1
    return [
        {'id': 1, 'name': 'John', 'age': 30},
        {'id': 2, 'name': 'Alice', 'age': 25},
        {'id': 3, 'name': 'Bob', 'age': 35}
    ]
Step 2
    return [
        {'id': 1, 'occupation': 'Engineer'},
        {'id': 2, 'occupation': 'Doctor'},
        {'id': 3, 'occupation': 'Teacher'}
    ]
Step 3
    return [
        {'id': 1, 'city': 'New York', 'country': 'USA'},
        {'id': 2, 'city': 'London', 'country': 'UK'},
        {'id': 4, 'city': 'Paris', 'country': 'France'}
    ]
JavaScript

Step 1
    return [
      { id: 1, name: "John", age: 30 },
      { id: 2, name: "Alice", age: 25 },
      { id: 3, name: "Bob", age: 35 },
    ];
Step 2
    return [
      { id: 1, occupation: "Engineer" },
      { id: 2, occupation: "Doctor" },
      { id: 3, occupation: "Teacher" },
    ];
Step 3
    return [
      { id: 1, city: "New York", country: "USA" },
      { id: 2, city: "London", country: "UK" },
      { id: 4, city: "Paris", country: "France" },
    ];
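Because an inner merge keeps only records whose id appears in every source, it can help to check the overlap before merging. The following is a minimal Python sketch using only the standard library. The variables `step1`, `step2`, and `step3` are hypothetical stand-ins for the three step outputs, and `ids_of` is a helper introduced here for illustration.

```python
# Hypothetical stand-ins for the three step outputs in the sample data
step1 = [
    {"id": 1, "name": "John", "age": 30},
    {"id": 2, "name": "Alice", "age": 25},
    {"id": 3, "name": "Bob", "age": 35},
]
step2 = [
    {"id": 1, "occupation": "Engineer"},
    {"id": 2, "occupation": "Doctor"},
    {"id": 3, "occupation": "Teacher"},
]
step3 = [
    {"id": 1, "city": "New York", "country": "USA"},
    {"id": 2, "city": "London", "country": "UK"},
    {"id": 4, "city": "Paris", "country": "France"},
]


def ids_of(records):
    """Collect the set of id values found in a list of records."""
    return {record["id"] for record in records}


# ids present in every source: only these survive an inner merge
common_ids = ids_of(step1) & ids_of(step2) & ids_of(step3)

# ids present in at least one source but not in all of them
all_ids = ids_of(step1) | ids_of(step2) | ids_of(step3)
partial_ids = all_ids - common_ids

print(sorted(common_ids))   # [1, 2]
print(sorted(partial_ids))  # [3, 4]
```

With the sample data, records for ids 3 and 4 would be dropped by a merge that requires a match in all three sources.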
Merge data using code
The following code snippets can be used as a starting point to merge data from your data sources.
Python

    # Import libraries
    import json
    import pandas as pd

    # Convert each step's output (a list of dicts) into a DataFrame
    df1 = pd.DataFrame.from_dict(Step1.output)
    df2 = pd.DataFrame.from_dict(Step2.output)
    df3 = pd.DataFrame.from_dict(Step3.output)

    # Merge data, e.g. df1.merge(df_to_be_merged, on="<COMMON_FIELD>", how="<TYPE_OF_MERGE>")
    # Pandas docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
    # how="inner" keeps only ids found in every source (ids 1 and 2 in the sample data)
    merged_data_df = df1.merge(df2, on="id", how="inner")
    merged_data_df = merged_data_df.merge(df3, on="id", how="inner")

    # Return the data to the frontend as a list of JSON-compatible records
    return json.loads(merged_data_df.to_json(orient="records"))
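If pandas is not available, the same inner merge can be sketched with plain dictionaries. This is a minimal illustration, not the platform's own method: `merge_inner` is a hypothetical helper, the three lists stand in for the step outputs, and the sketch assumes each id occurs at most once per source (a repeated id would keep only its last record).

```python
def merge_inner(key, *sources):
    """Inner-join lists of dicts on `key`, keeping records found in every source."""
    if not sources:
        return []
    # Index every source after the first by key for constant-time lookups
    indexes = [{record[key]: record for record in source} for source in sources[1:]]
    merged = []
    for record in sources[0]:
        matches = [index.get(record[key]) for index in indexes]
        # Keep the record only if every other source has a match
        if all(match is not None for match in matches):
            combined = dict(record)
            for match in matches:
                combined.update(match)
            merged.append(combined)
    return merged


# Hypothetical stand-ins for Step1.output, Step2.output, and Step3.output
step1 = [
    {"id": 1, "name": "John", "age": 30},
    {"id": 2, "name": "Alice", "age": 25},
    {"id": 3, "name": "Bob", "age": 35},
]
step2 = [
    {"id": 1, "occupation": "Engineer"},
    {"id": 2, "occupation": "Doctor"},
    {"id": 3, "occupation": "Teacher"},
]
step3 = [
    {"id": 1, "city": "New York", "country": "USA"},
    {"id": 2, "city": "London", "country": "UK"},
    {"id": 4, "city": "Paris", "country": "France"},
]

merged_data = merge_inner("id", step1, step2, step3)
print([row["id"] for row in merged_data])  # [1, 2]
```

Indexing the second and third sources by id avoids rescanning each list for every record, which matters once the sources grow beyond a handful of rows.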
JavaScript

    // Sample data sources
    const data1 = Step1.output;
    const data2 = Step2.output;
    const data3 = Step3.output;

    // Initialize empty array
    const mergedData = [];

    // Iterate over the first data source
    data1.forEach((item1) => {
      // Find matching items in other data sources based on the ID field
      const matchingItem2 = data2.find((item2) => item2.id === item1.id);
      const matchingItem3 = data3.find((item3) => item3.id === item1.id);

      // Merge data from all sources if IDs match
      if (matchingItem2 && matchingItem3) {
        const mergedItem = { ...item1, ...matchingItem2, ...matchingItem3 };
        mergedData.push(mergedItem);
      }
    });

    // Return merged data to frontend
    return mergedData;
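Both snippets above drop any record that is missing from one of the sources. If you would rather keep every id and leave the missing fields empty, an outer-style merge is one option; in pandas this roughly corresponds to passing how="outer" to merge. The following Python sketch shows the idea with the standard library only. `merge_outer` is a hypothetical helper, the lists stand in for the step outputs, and the same approach can be carried over to JavaScript.

```python
def merge_outer(key, *sources):
    """Combine lists of dicts on `key`, keeping records from every source."""
    merged = {}
    for source in sources:
        for record in source:
            # Create the row on first sight of this key, then add the new fields
            merged.setdefault(record[key], {}).update(record)

    # Collect all field names in first-seen order
    fields = []
    for row in merged.values():
        for field in row:
            if field not in fields:
                fields.append(field)

    # Fill fields a row never received with None so all rows share the same keys
    return [{field: row.get(field) for field in fields} for row in merged.values()]


# Hypothetical stand-ins for Step1.output, Step2.output, and Step3.output
step1 = [
    {"id": 1, "name": "John", "age": 30},
    {"id": 2, "name": "Alice", "age": 25},
    {"id": 3, "name": "Bob", "age": 35},
]
step2 = [
    {"id": 1, "occupation": "Engineer"},
    {"id": 2, "occupation": "Doctor"},
    {"id": 3, "occupation": "Teacher"},
]
step3 = [
    {"id": 1, "city": "New York", "country": "USA"},
    {"id": 2, "city": "London", "country": "UK"},
    {"id": 4, "city": "Paris", "country": "France"},
]

all_records = merge_outer("id", step1, step2, step3)
print([row["id"] for row in all_records])  # [1, 2, 3, 4]
print(all_records[2]["city"])              # None
```

Here id 3 keeps its name, age, and occupation with no city or country, and id 4 keeps only its city and country.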