Web Data Extraction Specialist (a.k.a. Data Scraping Specialist)
Summary
Competitive Analytics is searching for talented data scientists who are passionate about data acquisition and have strong skills in web scraping, web services, file transfers, and all things data. In this new role, a dedicated data scientist will help design and develop the tools, processes, and infrastructure needed to extract large volumes of structured and unstructured data from a variety of sources, with a primary focus on web data extraction (also known as data scraping). At Competitive Analytics, we believe data scraping is both an art and a science, requiring innovative methodologies, deft programming skills, scientific ingenuity, and keen mathematical expertise. Our goal of providing clients with value-added data scraping also requires what we call creative analytics: the expertise and intuition needed to transform raw data, decipher complex relationships, develop innovative algorithms, and design meaningful visualizations so that decision makers can make faster, better decisions that drive and sustain competitive advantage.
Required Materials for Consideration
Cover letter
Resume
Click here to complete our quick, 5-minute interview questionnaire
Complete our Data Extraction Test (see the end of this posting)
Primary Responsibilities
Gather and process both structured and unstructured data from external (scraping, APIs) and internal sources, and prepare it for analysis
Design and develop a variety of tools and infrastructure to automate the extraction of publicly available and private information (writing web scrapers, calling third party APIs, creating SQL queries, etc.)
Create tools and processes to download data, parse it for relevant content, and store it in existing data management systems
Design and develop scalable, efficient, and robust internal data management systems
Gather and process raw data at scale (writing scripts, scraping websites, calling APIs, writing SQL queries, building applications, etc.)
Process unstructured data into a form suitable for analysis, using custom applications and modern ETL/ELT tools
Support business decisions with ad hoc analysis as needed
Work closely with economists, data scientists, and machine learning experts to support both client-facing and internal projects
Set up Linux servers
Set up proxies
Set up data transfer to the Competitive Analytics (CA) server
Recruit offshore scraping talent
Manage offshore scraping talent
Education and Experience
B.S. in computer science, statistics, economics, or another quantitative field
Skills and Knowledge
Experience with SQL development, creating and administering databases, integrating multiple data sources, and performing ETL processes using tools such as MySQL, PostgreSQL, or Oracle
Knowledge of data mining, machine learning, natural language processing, or information retrieval
Understanding of distributed computing principles
Big data experience with Hadoop (Hive, Pig, Impala, Spark) or Greenplum (PostgreSQL, MADlib) is a plus but not required
Database and data warehousing experience in both RDBMS and NoSQL environments
Knowledge of MS SQL Server, PostgreSQL, Redshift, or Couchbase is a plus but not required
Experience with Alteryx and/or Tableau is a plus but not required
Familiarity or Expertise in the Following Is Preferred
Delphi
HTML
PHP
SQL
SQLite Programming
Crawler
Spyder
Perl
MySQL Administration
C#
C++
XML
AWS
Unix
Python
Java
Ajax
jQuery
R
SPSS
SAS
Alteryx
Tableau
Compensation
Competitive Analytics offers highly competitive compensation for full-time, part-time, or contract positions, based on experience, talent, skills, expertise, knowledge, and both proven and potential capabilities.
Internship Opportunities
If you do not yet have the experience or expertise this position requires but are highly motivated and passionate about the work, Competitive Analytics offers internship opportunities on a case-by-case basis. If you are interested, please ask about our paid and unpaid internship programs.
About Competitive Analytics
From Fortune 100 companies to SMBs spanning myriad industries, Competitive Analytics helps the world's most successful companies analyze their data and the competitive forces affecting them, using a customized business intelligence approach that addresses the challenges unique to each organization. Since its founding in January 2000, Competitive Analytics has literally delivered "competitive analytics." Competitive Analytics is an innovative, high-tech, and dynamic working environment where analysts work on a variety of advanced analytics, challenging client projects, and business intelligence initiatives across all industries and the entire business intelligence process. For more information, please visit www.competitiveanalytics.com.
Data Extraction Test
We developed three exercises to gauge your level of web scraping expertise. Please click here to download a PDF that outlines them. Thank you in advance for your time; we look forward to hearing from you and reviewing your exercises.