More definitions
MrMimic committed Feb 13, 2024
1 parent 30bb5f5 commit 65a9ef4
Showing 2 changed files with 34 additions and 2 deletions.
34 changes: 33 additions & 1 deletion 10_Toolbox/README.md
@@ -2,36 +2,68 @@

## 1_ MS Excel with Analysis ToolPak

Microsoft Excel is a spreadsheet program included in the Microsoft Office suite of applications. The Analysis ToolPak is an Excel add-in program that provides data analysis tools for financial, statistical and engineering data analysis.

## 2_ Java, Python

Java and Python are high-level, general-purpose programming languages. Java is class-based, object-oriented, and designed to have as few implementation dependencies as possible. Python is interpreted; its design philosophy emphasizes code readability, and its syntax often lets programmers express concepts in fewer lines of code than languages such as C++ or Java.
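
As a small illustration of the readability claim, here is a minimal sketch that counts the most frequent words in a string using only the Python standard library. The function name `top_words` and the sample sentence are made up for this example.

```python
from collections import Counter


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent lowercase words in text."""
    words = text.lower().split()
    return Counter(words).most_common(n)


# Hypothetical sample input, chosen so that the counts have no ties.
sample = "the cat and the hat and the bat"
print(top_words(sample, 2))
```

The whole task fits in two lines of logic because `Counter` handles both the counting and the ranking.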

## 3_ R, RStudio, Rattle

R is a programming language and free software environment for statistical computing and graphics. RStudio is an integrated development environment (IDE) for R. Rattle is a graphical data mining application built on R.

## 4_ Weka, KNIME, RapidMiner

Weka, KNIME, and RapidMiner are data mining tools. Weka is a collection of machine learning algorithms for data mining tasks. KNIME is a free and open-source data analytics, reporting, and integration platform. RapidMiner is a data science software platform developed by the company of the same name.

## 5_ Hadoop dist of choice

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.

## 6_ Spark, Storm

Spark and Storm are big data processing tools. Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. Apache Storm is a free and open-source distributed real-time computation system.

## 7_ Flume, Scribe, Chukwa

Flume, Scribe, and Chukwa are tools for managing large amounts of log data. Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Scribe is a server for aggregating log data streamed in real time from many servers. Apache Chukwa is an open source data collection system for monitoring large distributed systems.
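
To make the idea of aggregating logs from many servers concrete, here is a minimal in-memory sketch. It is not how Flume, Scribe, or Chukwa are implemented; it only assumes that each server produces records already sorted by timestamp, and the `LogRecord` type, server names, and messages are hypothetical.

```python
import heapq
from typing import Iterable, NamedTuple


class LogRecord(NamedTuple):
    timestamp: float  # seconds since a shared reference point (assumption)
    server: str
    message: str


def aggregate_logs(*streams: Iterable[LogRecord]) -> list[LogRecord]:
    """Merge per-server streams, each already sorted by timestamp."""
    return list(heapq.merge(*streams, key=lambda record: record.timestamp))


# Hypothetical records from two servers.
web = [LogRecord(1.0, "web-1", "GET /"), LogRecord(4.0, "web-1", "GET /about")]
db = [LogRecord(2.5, "db-1", "slow query"), LogRecord(3.0, "db-1", "checkpoint")]

for record in aggregate_logs(web, db):
    print(record.timestamp, record.server, record.message)
```

`heapq.merge` reads the streams lazily, so the same pattern works when the inputs are generators rather than lists.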

## 8_ Nutch, Talend, ScraperWiki

Nutch, Talend, and ScraperWiki are tools for collecting and integrating data. Apache Nutch is a highly extensible and scalable open-source web crawler. Talend is a data integration platform from the vendor of the same name. ScraperWiki is a web-based platform for collaboratively building programs to process and analyze data.

## 9_ Webscraper, Flume, Sqoop

Web Scraper, Flume, and Sqoop are tools for data ingestion. Web Scraper is a tool for extracting information from websites. Apache Flume, described in section 7, collects, aggregates, and moves large amounts of log data. Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
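
Extracting information from a web page can be sketched with the Python standard library alone. The example below is not the Web Scraper tool itself; it is a minimal illustration that pulls link targets out of an HTML string, and the `LinkExtractor` class and sample page are made up for this purpose.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag seen while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Return all link targets found in the given HTML text."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content; a real scraper would download it first.
page = '<p>See <a href="https://example.org/a">A</a> and <a href="/b">B</a>.</p>'
print(extract_links(page))
```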

## 10_ tm, RWeka, NLTK

tm, RWeka, and NLTK are text mining tools. tm is a text mining framework for R. RWeka provides an R interface to Weka. NLTK is a leading platform for building Python programs to work with human language data.
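
A common first step in text mining is turning raw documents into a document-term matrix of word counts. The sketch below shows that step in plain Python; it does not use the tm, RWeka, or NLTK APIs, and the function names and sample documents are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one row of term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


# Hypothetical two-document corpus.
docs = ["Data science is fun", "Science needs data"]
vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row of the matrix corresponds to one document and each column to one vocabulary term, which is the shape most downstream models expect.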

## 11_ RHIPE

RHIPE is an R library that provides a way to use Hadoop's MapReduce functionality from R.
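
The MapReduce pattern itself can be sketched in a few lines. The example below is a single-process Python illustration of the map, shuffle, and reduce phases on a word count; it is not RHIPE's or Hadoop's API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def map_reduce(lines: list[str]) -> dict[str, int]:
    """Run the three phases in sequence on a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(map_reduce(["big data", "big cluster"]))
```

On a real cluster the map and reduce calls run in parallel on different machines, and the shuffle step moves data between them over the network.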

## 12_ D3.js, ggplot2, Shiny

D3.js, ggplot2, and Shiny are visualization tools. D3.js is a JavaScript library for producing dynamic, interactive data visualizations in web browsers. ggplot2 is a data visualization package for the statistical programming language R. Shiny is an R package that makes it easy to build interactive web apps straight from R.

## 13_ IBM LanguageWare

IBM LanguageWare is a technology that helps you to understand, analyze and interpret the content of your text.

## 14_ Cassandra, MongoDB

Cassandra and MongoDB are NoSQL databases. Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system. MongoDB is a source-available cross-platform document-oriented database program.

## 15_ Microsoft Azure, AWS, Google Cloud

Microsoft Azure, AWS, and Google Cloud are cloud computing services. They provide a range of cloud services, including those for computing, analytics, storage and networking. Users can pick and choose from these services to develop and scale new applications, or run existing applications, in the public cloud.

## 16_ Microsoft Cognitive API

Microsoft Cognitive Services (formerly Project Oxford) are a set of APIs, SDKs and services available to developers to make their applications more intelligent, engaging and discoverable.

## 17_ TensorFlow

<https://www.tensorflow.org/>
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@ I just found this data science skills roadmap, drawn by [Swami Chandrasekaran](ht

Jobs linked to __data science__ are becoming __more and more popular__. A __bunch of tutorials__ could easily complete this roadmap, helping whoever wants to __start learning stuff about data science__.

For the moment, much of the content is __taken from Wikipedia or generated by LLMs__ (except for the code, which is always handmade). Any help is therefore welcome!

## Rules
