Best way to transform data values in Azure Data Factory

What is the best way to replace specific values using Azure Data Factory?
For example, I need to normalize every spelling of the brand to the single value "ssang yong", and the model to "ceed" for the brand "kia".
Data source:
{
  id: 1,
  brand: "ssang yong",
  model: "rexton"
},
{
  id: 2,
  brand: "ssang_yong",
  model: "rexton"
},
{
  id: 3,
  brand: "ssangyong",
  model: "rexton"
},
{
  id: 4,
  brand: "kia",
  model: "ceed"
},
{
  id: 5,
  brand: "kia",
  model: "c'eed"
}
Pattern:
{
  target: "brand",
  common_value: "ssang yong",
  condition: {
    brand: ["ssang-yong", "ssangyong"]
  }
},
{
  target: "model",
  common_value: "ceed",
  condition: {
    brand: ["kia"],
    model: ["c'eed"]
  }
}

ADF is mostly used to move data from one place to another and to manage ELT processes.
So my approach in this scenario would be:
1) Copy the raw data from the sources to ADLS with ADF.
2) Perform the transformations with Azure Data Lake Analytics and save the output to a new file.
3) Import the file into Power BI (if you do not have Analysis Services to create a tabular model).
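
Whatever engine runs step 2, the replacement logic itself is small. The following is only a sketch of that logic in plain JavaScript, not ADF or Data Lake Analytics code. The rules are copied from the question's pattern; matches and normalize are hypothetical helper names, and the sketch assumes a rule applies when every field in its condition holds one of the listed values.

```javascript
// Rules copied from the question's "Pattern" section.
const rules = [
  {
    target: 'brand',
    common_value: 'ssang yong',
    condition: { brand: ['ssang-yong', 'ssangyong'] },
  },
  {
    target: 'model',
    common_value: 'ceed',
    condition: { brand: ['kia'], model: ["c'eed"] },
  },
];

// Hypothetical helper: a rule matches when every field named in its
// condition holds one of the listed values (an assumption about the
// intended semantics of "condition").
function matches(row, condition) {
  return Object.entries(condition).every(([field, allowed]) =>
    allowed.includes(row[field])
  );
}

// Hypothetical helper: apply the rules in order and return a new row.
function normalize(row, ruleList) {
  return ruleList.reduce(
    (current, rule) =>
      matches(current, rule.condition)
        ? { ...current, [rule.target]: rule.common_value }
        : current,
    row
  );
}

// Three rows taken from the question's data source.
const rows = [
  { id: 2, brand: 'ssang_yong', model: 'rexton' },
  { id: 3, brand: 'ssangyong', model: 'rexton' },
  { id: 5, brand: 'kia', model: "c'eed" },
];

const cleaned = rows.map((row) => normalize(row, rules));
console.log(cleaned);
```

Note that the question's rule lists "ssang-yong" while the data contains "ssang_yong", so the row with id 2 stays unchanged unless that spelling is added to the rule.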

Related

Meta data keeps showing as "## Build Setup" for every page in Vuejs/Nuxt?

I've added individual meta to every page following the Nuxt documentation, but whenever I share my links on social media, the meta just shows '## Build Setup'. Another issue is that the same metadata shows for every page. I read that you need to set "hid" on each entry to get individual page meta, but nothing seems to work.
Index Meta:
<script>
export default {
  head: {
    title: 'Animal Crossing Portal | The Best Tier Lists for Animal Crossing',
    meta: [
      { property: 'og:description', hid: 'og:description', name: 'og:description', content: 'Vote monthly in Animal Crossing Tier Lists for New Horizons & Pocket Camp! Including Villager Tier Lists, Sanrio, Gyroids & more at Animal Crossing Portal!' },
      { name: 'twitter:title', hid: 'twitter:title', content: 'Animal Crossing Portal | The Best Tier Lists for Animal Crossing' },
      { name: 'twitter:description', hid: 'twitter:description', content: 'Vote monthly in Animal Crossing Tier Lists for New Horizons & Pocket Camp! Including Villager Tier Lists, Sanrio, Gyroids & more at Animal Crossing Portal!' },
      { name: 'twitter:card', hid: 'twitter:card', content: 'summary_large_image' },
      { name: 'twitter:image:src', hid: 'twitter:image:src', content: 'https://www.animalcrossingportal.com/images/meta.jpg' },
      { property: 'og:title', hid: 'og:title', name: 'og:title', content: 'Animal Crossing Portal | The Best Tier Lists for Animal Crossing' },
      { property: 'og:type', hid: 'og:type', content: 'website' },
      { property: 'og:site_name', hid: 'og:site_name', content: 'Animal Crossing Portal' },
      { property: 'og:url', hid: 'og:url', content: 'https://www.animalcrossingportal.com/' },
      { property: 'og:image', hid: 'og:image', content: 'https://www.animalcrossingportal.com/images/meta.jpg' }
    ],
    link: [
      {
        rel: 'canonical',
        href: 'https://www.animalcrossingportal.com/'
      }
    ]
  }
}
</script>
My nuxt.config.js file has:
head: {
  meta: [
    { name: 'viewport', content: 'width=device-width, initial-scale=1' }
  ],
  link: [
    { rel: 'icon', type: 'image/x-icon', href: '/favicon.ico' }
  ]
}

The meta text was actually sitting in a README.md file; removing it from there fixed the OP's issue!

I thought it was the README file, but the issue resurfaced, and after a few hours of digging it turned out to be an empty build tag in the Nuxt config file, which was making Nuxt try SSR. Removing the empty tag fixed it completely.

I ran into the same issue recently. After looking at the source of the rendered page (Ctrl + U in the browser), a quick search showed that the og:description meta field was not populated correctly. Correcting that was straightforward.
So a look at the source of your rendered page might reveal the missing or wrongly populated field (e.g. "description").
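
The same check can be scripted instead of searching the page source by hand. This is a minimal sketch, assuming the rendered HTML is already available as a string. parseAttributes and findMetaContent are hypothetical helpers, the sample HTML is made up for illustration, and the regular expressions are a rough heuristic rather than a full HTML parser (for instance, a ">" inside an attribute value would break them).

```javascript
// Hypothetical helper: read the quoted attributes of a single tag
// into a plain object (attribute names are lowercased).
function parseAttributes(tag) {
  const attrs = {};
  const attrPattern = /([\w:-]+)\s*=\s*(["'])(.*?)\2/g;
  let match;
  while ((match = attrPattern.exec(tag)) !== null) {
    attrs[match[1].toLowerCase()] = match[3];
  }
  return attrs;
}

// Hypothetical helper: return the content of every <meta> tag whose
// name or property attribute equals `key`.
function findMetaContent(html, key) {
  const tags = html.match(/<meta\b[^>]*>/gi) || [];
  return tags
    .map(parseAttributes)
    .filter((attrs) => attrs.name === key || attrs.property === key)
    .map((attrs) => attrs.content);
}

// Made-up sample of a rendered <head>, for illustration only.
const renderedHtml = `
<head>
  <meta data-n-head="ssr" name="viewport" content="width=device-width, initial-scale=1">
  <meta data-n-head="ssr" data-hid="og:description" property="og:description" content="## Build Setup">
</head>`;

const ogDescriptions = findMetaContent(renderedHtml, 'og:description');
console.log(ogDescriptions);
```

An empty result would mean the field is missing entirely, and more than one entry would suggest that duplicate tags are being rendered.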

Merging of API response data

I am currently working on a React.js full-stack application with an Express back end. I have a question about a design decision for the API calls. I have three endpoints at the moment:
GET /airports/
{
"total_count":269,
"items":[
{
"airport_code":"ABJ",
"city":"ABJ",
"country":"CI",
"name":"Port Bouet Airport",
"city_name":"Abidjan",
"country_name":"Cote d'Ivoire",
"lat":5.261390209,
"lon":-3.926290035,
"alt":21,
"utc_offset":0.0
},
{
"airport_code":"ABV",
"city":"ABV",
"country":"NG",
"name":"Nnamdi Azikiwe International Airport",
"city_name":"Abuja",
"country_name":"Nigeria",
"lat":9.006790161,
"lon":7.263169765,
"alt":1123,
"utc_offset":1.0
},
........
]
}
GET /airports/{airport_code}
GET /flights/
{
"total_count": 898,
"items": [
{
"flight_number": "ZG6304",
"aircraft_registration": "ZGAJG",
"departure_airport": "BAH",
"arrival_airport": "LHR",
"scheduled_departure_time": "2020-01-01T20:50:00",
"scheduled_takeoff_time": "2020-01-01T21:00:00",
"scheduled_landing_time": "2020-01-02T03:00:00",
"scheduled_arrival_time": "2020-01-02T03:10:00"
},
{
"flight_number": "ZG6311",
"aircraft_registration": "ZGAJH",
"departure_airport": "CDG",
"arrival_airport": "FRA",
"scheduled_departure_time": "2020-01-01T06:45:00",
"scheduled_takeoff_time": "2020-01-01T06:55:00",
"scheduled_landing_time": "2020-01-01T07:50:00",
"scheduled_arrival_time": "2020-01-01T08:00:00"
},
........
]
}
I am building an airport arrivals and departures web application using the above data. My idea is to combine the data from the /flights/ and /airports/ API calls, matching on departure_airport and arrival_airport, so that a single array carries more information (city_name, lat, lon, etc.) for visualizing the data. What is a good approach to this, keeping in mind the computational overhead of filtering and merging large sets of data? I looked into RxJS, but I have not worked with it before, so I am not sure whether it would provide a good solution.

I recommend converting the airports array to an object keyed by airport_code. You can then look up airports by key.
const airports = {
total_count: 269,
items: [
{
airport_code: 'ABJ',
city: 'ABJ',
country: 'CI',
name: 'Port Bouet Airport',
city_name: 'Abidjan',
country_name: "Cote d'Ivoire",
lat: 5.261390209,
lon: -3.926290035,
alt: 21,
utc_offset: 0.0,
},
{
airport_code: 'ABV',
city: 'ABV',
country: 'NG',
name: 'Nnamdi Azikiwe International Airport',
city_name: 'Abuja',
country_name: 'Nigeria',
lat: 9.006790161,
lon: 7.263169765,
alt: 1123,
utc_offset: 1.0,
},
],
};
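// Build the lookup in one pass; writing into the accumulator avoids
// copying the whole object on every iteration.
const mappedAirports = airports.items.reduce((result, airport) => {
  result[airport.airport_code] = airport;
  return result;
}, {});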
console.log(mappedAirports);
Output:
{"ABJ":{"airport_code":"ABJ","city":"ABJ","country":"CI","name":"Port Bouet Airport","city_name":"Abidjan","country_name":"Cote d'Ivoire","lat":5.261390209,"lon":-3.926290035,"alt":21,"utc_offset":0},"ABV":{"airport_code":"ABV","city":"ABV","country":"NG","name":"Nnamdi Azikiwe International Airport","city_name":"Abuja","country_name":"Nigeria","lat":9.006790161,"lon":7.263169765,"alt":1123,"utc_offset":1}}
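
With such a lookup in place, the merge the question asks about is a single pass over the flights. The following is a minimal sketch under stated assumptions: the field names follow the question's API responses, the airport records are trimmed-down stand-ins rather than real API output, the departure and arrival property names are hypothetical, and a Map is used in place of the plain object above (either works).

```javascript
// Illustrative sample data: field names follow the question's API
// responses, but the airport records are trimmed-down stand-ins.
const airportItems = [
  { airport_code: 'CDG', city_name: 'Paris' },
  { airport_code: 'FRA', city_name: 'Frankfurt' },
];
const flightItems = [
  { flight_number: 'ZG6311', departure_airport: 'CDG', arrival_airport: 'FRA' },
  { flight_number: 'ZG6304', departure_airport: 'BAH', arrival_airport: 'LHR' },
];

// Build the lookup once: one pass over the airports.
const airportByCode = new Map(
  airportItems.map((airport) => [airport.airport_code, airport])
);

// Enrich each flight with one constant-time lookup per endpoint.
// Codes missing from the lookup become null instead of throwing.
const enrichedFlights = flightItems.map((flight) => ({
  ...flight,
  departure: airportByCode.get(flight.departure_airport) ?? null,
  arrival: airportByCode.get(flight.arrival_airport) ?? null,
}));

console.log(enrichedFlights);
```

The total work is proportional to the number of airports plus the number of flights, instead of scanning the airports array for every flight.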

Azure cosmos db error in data factory - data flow sink -> job failed due to reason: Conversion from StructType

I'm building a simple data flow in Azure Data Factory to get some specific data from a content hub location. The data is in JSON format.
Transformations:
Source: REST API GET method to retrieve the data from the URL.
Transformation 1: flatten, to unroll an item list that contains all the articles into rows.
Transformation 2: select, to choose specific attributes from each item in the list.
Transformation 3: alter row, to upsert data if the condition true() holds.
Sink: a Cosmos DB dataset, to load the selected data into a collection.
The problem is with the last attribute, elements, since it is a StructType {}:
elements: {
headline: {
title: "Title",
dataType: "string",
name: "headline",
variations: { },
multiValue: false,
:type: "string",
},
alternativeHeadline: {
title: "Subtitle",
dataType: "string",
name: "alternativeHeadline",
variations: { },
multiValue: false,
:type: "string",
},
author: {
title: "Author",
dataType: "string",
name: "author",
variations: { },
multiValue: false,
:type: "string",
},
...
}
When I run the job I'm getting this error:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: Conversion from StructType(StructField(headline,StructType(StructField(:type,StringType,true), StructField(dataType,StringType,true), StructField(multiValue,BooleanType,true), StructField(name,StringType,true), StructField(title,StringType,true), StructField(value,StringType,true), StructField(variations,StructType(StructField(mobile,StructType(StructField(:type,StringType,true), StructField(dataType,StringType,true), StructField(multiValue,BooleanType,true), StructField(name,StringType,true), StructField(title,StringType,true), StructField(value,StringType,true)),true), StructField(spanish,StructType(StructField(:type,StringType,true), StructField(dataType,StringType,true), StructField(multiValue,BooleanType,true), StructField(name,StringType,true), StructField(title,StringType,true), StructField(value,StringType,true)),true)),true)),true), StructField(icon,StructType(StructField(:type,StringType,true), StructField(dataType,StringType,true), StructField(multiValue,BooleanType,true),","Details":""}
It seems one or more struct types differ between the source and the target, but I would like this to be dynamic: since it is metadata, an attribute may or may not be present.
I tried a copy activity directly in a pipeline using the same dataset, and it worked fine. The problem is that I'll need to transform the data later, and the copy activity is limited in this respect. Any thoughts?

I found the root of the problem: I had imported the schema in the dataset. Once I cleared it, the data flow stopped verifying the struct data types, and the load ran successfully.

What is the best way to store constant strings in MongoDB?

In my Node.js app I have a collection, football_players. Documents in this collection include the player's role. For example:
[
{
id: 'some_id_1',
name: 'Player1',
role: 'FORWARD',
},
{
id: 'some_id_2',
name: 'Player2',
role: 'DEFENDER',
},
{
id: 'some_id_3',
name: 'Player3',
role: 'FORWARD',
},
]
All role types are constants. What is the best way to store them in MongoDB:
1) as strings (as in my example), or
2) as numbers, with a declaration such as
const roles = { FORWARD: 1, DEFENDER: 2 };
that I then use in my Node.js app, like: player.role === roles.FORWARD ? 'Great!' : 'He can\'t score a goal';
Is there any performance reason to use the second way? Are there any other reasons to use the first way?
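
For comparison, here is a minimal sketch of what the second option looks like in application code. roleNames is a hypothetical reverse lookup that is not part of the question. It is included because numeric values cannot be read without the mapping, whereas the string form is self-describing in the database.

```javascript
// Mapping from the question; Object.freeze guards against accidental edits.
const roles = Object.freeze({ FORWARD: 1, DEFENDER: 2 });

// Hypothetical reverse lookup for turning stored numbers back into names.
const roleNames = Object.freeze(
  Object.fromEntries(
    Object.entries(roles).map(([name, code]) => [code, name])
  )
);

// A document as it would be stored under the numeric option.
const player = { id: 'some_id_1', name: 'Player1', role: roles.FORWARD };

const verdict =
  player.role === roles.FORWARD ? 'Great!' : "He can't score a goal";
console.log(verdict);

// Reading the stored value requires the mapping.
console.log(roleNames[player.role]);
```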

Elasticsearch 6.2 - Completion Suggester for long texts

I want to be able to search and suggest through long texts.
Below is my input string:
Clinical Support Specialist Medical Staff
If I search for clin, supp, spe, med, or st, it should return the above string.
Searches could also look like clinical sup or specialist medi.
Below is the mapping I created for the field:
description: {
  type: 'completion',
  analyzer: 'simple',
  preserve_separators: true,
  preserve_position_increments: true,
  contexts: {
    name: 'company',
    type: 'category',
    path: 'company',
  }
}
And below is the search body:
descSuggestor: {
  prefix: searchTerm,
  completion: {
    field: 'description'
  }
}

Your question does not specify the Elasticsearch version or the environment in which you are writing your search query. However, you can do this with a regular expression query. For example, in Kibana's Dev Tools, you could write something like:
GET utilization_aggregation_2018/_search
{
  "query": {
    "regexp": { "name": "supp.*" }
  }
}
Hope this helps!
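
If the query is built in application code rather than typed into Dev Tools, the user's input should be escaped before ".*" is appended. This is a minimal sketch that only builds the request body and does not call Elasticsearch. escapeRegexpTerm and buildRegexpQuery are hypothetical helpers, and lowercasing the term assumes the field is analyzed with an analyzer that lowercases tokens, since a regexp query is matched against the indexed terms.

```javascript
// Hypothetical helper: backslash-escape the characters that the
// Elasticsearch regexp syntax treats as operators.
function escapeRegexpTerm(term) {
  return term.replace(/[.?+*|{}[\]()"\\#@&<>~]/g, '\\$&');
}

// Hypothetical helper: build a prefix-style regexp query body like the
// one shown above. Lowercasing is an assumption about the field's analyzer.
function buildRegexpQuery(field, searchTerm) {
  const pattern = `${escapeRegexpTerm(searchTerm.toLowerCase())}.*`;
  return { query: { regexp: { [field]: pattern } } };
}

const body = buildRegexpQuery('name', 'Supp');
console.log(JSON.stringify(body));
```

Because the pattern is matched per indexed term, a multi-word input such as clinical sup would not match a single term on an analyzed field and would need to be split into one clause per word.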
