Cosine Similarity Spark - 2019-06-18 08:57:30

Cosine similarity between a static vector and each vector in a Spark data frame

Ever want to calculate the cosine similarity between a static vector and each vector in a Spark data frame? Probably not, as this is an absurdly niche problem to solve, but if you ever have, here’s how to do it using spark.sql and a UDF.

# imports we'll need
import numpy as np
from pyspark.
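
Since the excerpt cuts off before the code, here is a minimal sketch of the idea rather than the post's actual implementation. It assumes the vectors live in an array<double> column; the helper names (`cosine_similarity`, `with_cosine_similarity`) and the column names `features` and `cos_sim` are hypothetical. The similarity is computed in plain Python instead of numpy so that the UDF returns a native float, which is what a DoubleType return type expects.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric sequences.

    Returns 0.0 when either vector has zero norm (a convention chosen
    for this sketch, not something the original post specifies).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def with_cosine_similarity(df, static_vector, vector_col="features", out_col="cos_sim"):
    """Add a column holding the similarity between static_vector and each row's vector.

    Assumes vector_col is an array<double> column (hypothetical name).
    pyspark is imported here so cosine_similarity stays usable without Spark.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    static = [float(x) for x in static_vector]
    sim_udf = F.udf(
        lambda v: cosine_similarity(static, v) if v is not None else None,
        DoubleType(),
    )
    return df.withColumn(out_col, sim_udf(F.col(vector_col)))
```

With a data frame `df` that has such a column, `with_cosine_similarity(df, [1.0, 0.0, 2.0])` would return `df` plus the similarity column. To call the same function from a spark.sql query instead, it could be registered with `spark.udf.register`.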

Spark + s3a:// = ❤️ - 2019-06-18 09:31:24

Typically our data science AWS workflows follow this sequence:

1. Turn on EC2.
2. Copy data from S3 via awscli to the local machine's file system.
3. Code references local data via /path/to/data/.
4. ???
5. Profit.

However, if the data you need to reference is relatively small, or you’re only passing over the data once, you can use s3a:// and stream the data directly from S3 into your code. Say we have this script as visits_by_day.
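
The excerpt ends before the script itself, so the following is only an illustrative sketch of reading from S3 with s3a:// instead of copying files locally first; it is not the post's script. The bucket, key, and the `date` column are hypothetical, as are the helper names. It also assumes the hadoop-aws connector is available to Spark and that AWS credentials are already configured (for example through an instance role).

```python
def s3a_uri(bucket, key):
    """Build an s3a:// URI from a bucket name and an object key."""
    return f"s3a://{bucket}/{key.lstrip('/')}"


def count_by_day(spark, uri, date_col="date"):
    """Read a CSV straight from S3 and count rows per day.

    `spark` is an existing SparkSession; `date_col` is a hypothetical
    column name chosen for this sketch.
    """
    df = spark.read.csv(uri, header=True, inferSchema=True)
    return df.groupBy(date_col).count().orderBy(date_col)


# Example usage (requires a running Spark installation with S3 access):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("s3a-example").getOrCreate()
# count_by_day(spark, s3a_uri("example-bucket", "path/to/visits.csv")).show()
```

The only change from a local workflow is the path passed to the reader: `/path/to/data/` becomes an `s3a://` URI, and Spark fetches the objects as it scans them.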