Executive Summary
Key Updates
Main Focus
The main focus areas this month have been:
- Building a dashboard for Status Application Analytics
- Building a dashboard for Keycard
- Building a RAG pipeline to improve the IFT LLM
Visualization
- Status Application Analytics: Build a dashboard to measure user retention.
- GitHub Analytics: Get an overview of contributions.
- Keycard Analytics
- Discord Visualization
Data Extraction
- IFT LLM, documentation extraction: IFT blogs, IFT Notion, and the contributor website.
- Status App - App Store data extraction
- Extraction of Umami data.
IFT LLM
- Ingesting the extracted data
- Chunking the extracted data
- Embedding the data
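
The report does not say which chunking strategy the IFT LLM pipeline uses. As a minimal sketch only, assuming fixed-size character windows with overlap, the chunking step could look like the following. The function name `chunk_text` and the default sizes are hypothetical and are not taken from the project.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Hypothetical helper: the sizes are illustrative assumptions,
    not values used by the IFT LLM project.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap keeps some shared context between neighbouring chunks, which can help retrieval when a relevant sentence straddles a chunk boundary. Each chunk would then be passed to the embedding step and stored in the vector database.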
Infrastructure
- Data Lake: Finish deploying the environment
- DataHub: Start testing and configuring the Data Catalog.
- Airbyte: Upgrade to a more recent version.
- Web Analytics: Deploying Umami
- Qdrant: Deploying it as the vector database for the IFT LLM RAG project.
Note: Superset won't be decommissioned, since some teams have requested it.
Future Plan
Visualization
- Umami Dashboard
- Keycard - Add older data
- Discord
Data Extraction
- Social media:
  - Extract Reddit data
  - Extract Twitter video data
  - Improve Tweets extraction
  - Improve Discord extraction: extract messages in public channels
- Extract Luna Data
- Status App - Extract Paraswap data
- Status Network - Improve the data extraction: use the RPC endpoint and run it at high frequency.
- Finance:
IFT LLM
- Build first version of the entire pipeline and make it available
- Prepare a sentiment analysis pipeline for social media data.
Infrastructure
- IFT LLM:
- Dagster: Research and test it as an alternative to Airflow
- Data Lake: Optimize the prod environment
- DataHub: Finish deploying and configuring DataHub
Sources and Useful Links
- Repository to create new issues: If your team needs a visualization or access to some data, please create an issue in this repo.
- BI documentation