author    ben 2025-01-12 14:37:13 +0100
committer ben 2025-01-12 14:37:13 +0100
commit    778188ed95ccf50d2e21938bf5b542d76e066f63 (patch)
tree      e5138e638da98036e03cb11b2b0cf48fe4c590b2
Initial commit, first public version.
-rw-r--r--  .gitignore                              |   1
-rw-r--r--  README.md                               |  98
-rw-r--r--  docker-compose.yml                      |  98
-rw-r--r--  logo.webp                               |  bin 0 -> 39410 bytes
-rwxr-xr-x  setup_desktop.sh                        |  13
-rw-r--r--  src/aichat/Dockerfile                   |   7
-rw-r--r--  src/aichat/config.yaml                  |   8
-rw-r--r--  src/llm_provision/Dockerfile            |  12
-rw-r--r--  src/llm_provision/entrypoint.sh         |   4
-rwxr-xr-x  src/llm_provision/init_models.sh        |  17
-rw-r--r--  src/nginx/nginx.conf                    |  61
-rw-r--r--  src/tts/Dockerfile                      |  47
-rw-r--r--  src/tts/download_voices_tts-1.sh        |   8
-rw-r--r--  src/tts/voice_to_speaker.default.yaml   |  36
-rw-r--r--  src/whisper/Dockerfile                  |  13
-rwxr-xr-x  tools/aichat                            |  bin 0 -> 12073152 bytes
-rwxr-xr-x  tools/stt.sh                            | 121
-rwxr-xr-x  tools/tts.sh                            | 115
18 files changed, 659 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..4c49bd7
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+.env
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c089330
--- /dev/null
+++ b/README.md
@@ -0,0 +1,98 @@
+# Privacy-First Command-Line AI for Linux
+
+![AI_ENV](logo.webp)
+
+Unlock the power of AI—right from your Linux terminal.
+
+This project delivers a fully local AI environment, running open source language models directly on your machine.
+
+No cloud. No GAFAM. Just full privacy, control, and the freedom to manipulate commands in your shell.
+
+## How it works
+
+* [Ollama](https://ollama.com/) runs language models on the local machine.
+* [openedai-speech](https://github.com/matatonic/openedai-speech) provides text-to-speech capability.
+* [speaches-ai](https://github.com/speaches-ai/speaches) provides transcription, translation, and speech generation.
+* [nginx](https://nginx.org/en/) adds authentication to the API.
+* [aichat](https://github.com/sigoden/aichat) is the LLM CLI tool, featuring a Shell Assistant, Chat-REPL, RAG, and AI Tools & Agents.
+
+Everything is free, open-source and automated using Docker Compose and shell scripts.
+
+## Requirements
+
+To run this project efficiently, a modern computer with a recent NVIDIA GPU is required.
+As an example, I achieve good performance with an Intel(R) Core(TM) i7-14700HX, a GeForce RTX 4050, and 32GB of RAM.
+
+You must use Linux and the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit).
+
+Note that it is probably possible to run the project on other GPUs or on modern MacBooks, but that is outside the scope of this project.
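+
+To verify that Docker can see the GPU, you can run the sample workload from the NVIDIA Container Toolkit documentation:
+```bash
+docker run --rm --gpus all ubuntu nvidia-smi
+```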
+
+## How to launch the server
+
+Choose the models you wish to use in the `docker-compose.yml` file and set the API token in a `.env` file at the root of the project, as follows:
+```
+LLM_API_KEY=1234567890
+```
+
+Next, start the servers and their configuration with Docker Compose:
+```bash
+docker compose up --build -d
+```
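+
+Once the stack is up, you can sanity-check the nginx gateway (assuming the default ports and the token from your `.env`):
+```bash
+# without the token, nginx returns 403
+curl -s -o /dev/null -w "%{http_code}\n" http://localhost:11434/api/tags
+# with the token, the Ollama API answers
+curl -s -H "Authorization: Bearer ${LLM_API_KEY}" http://localhost:11434/api/tags | jq '.models[].name'
+```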
+
+## How to use
+
+The `setup_desktop.sh` script copies the statically compiled [aichat](https://github.com/sigoden/aichat) binary from its build container to `tools/` on your host and writes the tool's configuration to `~/.config/aichat/config.yaml`.
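+
+Run it once after the stack has been built; if everything is wired up, `aichat --list-models` will list the configured `ollama` client:
+```bash
+./setup_desktop.sh
+./tools/aichat --list-models
+```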
+
+### Aichat essentials
+
+To launch a chat session that keeps the conversation context:
+```bash
+aichat -m ollama:qwen2.5 -s
+```
+
+With a prompt:
+```bash
+aichat -m ollama:qwen2.5 --prompt "I want you to act as an English translator, spelling corrector and improver. I will speak to you in any language and you will detect the language, translate it and answer in the corrected and improved version of my text, in English. I want you to only reply the correction, the improvements and nothing else, do not write explanations."
+```
+
+Pipe a command's output into the LLM and transform it:
+```bash
+ls | aichat -m ollama:qwen2.5 --prompt "transform to json"
+```
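+
+aichat can also act as a Shell Assistant: per its documentation, `-e` turns a natural-language description into a shell command and asks for confirmation before running it:
+```bash
+aichat -m ollama:qwen2.5 -e "list the 5 largest files in the current directory"
+```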
+
+See the [AIChat](https://github.com/sigoden/aichat) documentation for other use cases.
+
+### Text To Speech
+
+For text-to-speech, use the `tools/tts.sh` script.
+
+Example:
+```bash
+./tools/tts.sh -l french -v pierre --play "Aujourd'hui, nous sommes le $(date +%A\ %d\ %B\ %Y)."
+```
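+
+Under the hood, the script posts to the OpenAI-compatible `/v1/audio/speech` endpoint; a roughly equivalent raw call (the voice names come from `src/tts/voice_to_speaker.default.yaml`):
+```bash
+curl -s "http://localhost:8000/v1/audio/speech" \
+  -H "Authorization: Bearer ${LLM_API_KEY}" \
+  -H "Content-Type: application/json" \
+  -d '{"model": "tts-1", "input": "Bonjour !", "voice": "pierre", "response_format": "wav", "speed": 1.0}' \
+  -o speech.wav
+```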
+
+### Speech To Text
+
+For speech-to-text, use `tools/stt.sh`.
+The `record` action uses PulseAudio to capture the computer's audio output (for example, a video playing in the browser).
+The `transcription` action converts the recorded audio file into text.
+
+Example:
+```bash
+./tools/stt.sh record -s alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__Speaker__sink.monitor
+./tools/stt.sh transcription -f record_20250112_125726.wav -l fr
+```
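+
+The transcription action is a thin wrapper around the `/v1/audio/transcriptions` endpoint; the request it sends boils down to:
+```bash
+curl "http://localhost:8001/v1/audio/transcriptions" \
+  -H "Authorization: Bearer ${LLM_API_KEY}" \
+  -F "file=@record_20250112_125726.wav" \
+  -F "stream=true" \
+  -F "language=fr"
+```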
+
+## How to Use Remotely
+
+The API authentication enforced by nginx makes it possible to expose the API on the internet and use it remotely.
+By adding a reverse proxy such as Caddy in front of it, you can also add TLS encryption.
+This way, you can securely use this environment remotely.
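+
+A minimal Caddyfile could look as follows (a sketch: the hostnames are placeholders for DNS records pointing at this machine; Caddy then obtains TLS certificates automatically):
+```
+tts.your-remote-domain {
+    reverse_proxy localhost:8000
+}
+stt.your-remote-domain {
+    reverse_proxy localhost:8001
+}
+```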
+
+To use the `tools/` scripts against a remote host, set the `TTS_API_HOST` and `STT_API_HOST` environment variables.
+
+Example:
+```bash
+TTS_API_HOST="https://your-remote-domain" ./tools/tts.sh -l french -v pierre --play "Aujourd'hui, nous sommes le $(date +%A\ %d\ %B\ %Y)."
+STT_API_HOST="https://your-remote-domain" ./tools/stt.sh transcription -f speech_20250112_124805.wav -l fr
+```
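+
+For aichat, point `api_base` at the remote host in `~/.config/aichat/config.yaml` (a sketch, assuming your reverse proxy forwards to the Ollama port):
+```
+clients:
+- type: openai-compatible
+  name: ollama
+  api_base: https://your-remote-domain/v1
+  api_key: <your LLM_API_KEY>
+```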
diff --git a/docker-compose.yml b/docker-compose.yml
new file mode 100644
index 0000000..25d4ef3
--- /dev/null
+++ b/docker-compose.yml
@@ -0,0 +1,98 @@
+services:
+  ollama:
+    image: ollama/ollama
+    volumes:
+      - ollama:/root/.ollama
+    restart: always
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    healthcheck:
+      test: ollama --version && ollama ps || exit 1
+      interval: 60s
+      retries: 5
+      start_period: 20s
+      timeout: 10s
+  openedai-speech:
+    build:
+      dockerfile: src/tts/Dockerfile
+    environment:
+      - TTS_HOME=voices
+    volumes:
+      - voices:/app/voices
+      - speech-config:/app/config
+    restart: unless-stopped
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    healthcheck:
+      test: curl --fail http://localhost:8000 || exit 1
+      interval: 60s
+      retries: 5
+      start_period: 10s
+      timeout: 10s
+  llm_provision:
+    build:
+      dockerfile: src/llm_provision/Dockerfile
+    environment:
+      - MODELS=qwen2.5:latest,qwen2.5-coder:32b,nomic-embed-text:latest
+    restart: "no" # quoted, since a bare no is parsed as a YAML boolean
+    depends_on:
+      ollama:
+        condition: service_healthy
+        restart: true
+    links:
+      - ollama
+  aichat-build:
+    build:
+      dockerfile: src/aichat/Dockerfile
+  faster-whisper-server:
+    image: fedirz/faster-whisper-server:latest-cuda
+    environment:
+      - WHISPER__MODEL=Systran/faster-whisper-large-v3
+    volumes:
+      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    healthcheck:
+      test: timeout 10s bash -c ':> /dev/tcp/127.0.0.1/8000' || exit 1
+      interval: 30s
+      timeout: 15s
+      retries: 3
+  nginx:
+    image: nginx
+    volumes:
+      - ./src/nginx/nginx.conf:/etc/nginx/templates/nginx.conf.template
+    environment:
+      - NGINX_ENVSUBST_OUTPUT_DIR=/etc/nginx # the nginx image runs envsubst on templates at startup
+      - API_KEY=${LLM_API_KEY}
+    depends_on:
+      - openedai-speech
+      - faster-whisper-server
+      - ollama
+    links:
+      - ollama
+      - faster-whisper-server
+      - openedai-speech
+    ports:
+      - "11434:11434"
+      - "8000:8000"
+      - "8001:8001"
+volumes:
+  ollama:
+  voices:
+  speech-config:
+  hf-hub-cache:
diff --git a/logo.webp b/logo.webp
new file mode 100644
index 0000000..9b1f516
--- /dev/null
+++ b/logo.webp
Binary files differ
diff --git a/setup_desktop.sh b/setup_desktop.sh
new file mode 100755
index 0000000..94bf0bd
--- /dev/null
+++ b/setup_desktop.sh
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+
+SCRIPT=$(readlink -f "$0")
+SCRIPTPATH=$(dirname "$SCRIPT")
+cd "$SCRIPTPATH" || exit
+
+container_id=$(docker create "aichat-build")
+docker cp "${container_id}:/usr/local/cargo/bin/aichat" "./tools/"
+docker rm "${container_id}"
+
+source .env
+mkdir -p ~/.config/aichat/
+sed "s/__LLM_API_KEY__/${LLM_API_KEY}/" src/aichat/config.yaml > ~/.config/aichat/config.yaml
diff --git a/src/aichat/Dockerfile b/src/aichat/Dockerfile
new file mode 100644
index 0000000..df13f63
--- /dev/null
+++ b/src/aichat/Dockerfile
@@ -0,0 +1,7 @@
+FROM rust:latest
+
+RUN rustup target add x86_64-unknown-linux-musl
+RUN apt update && apt install -y musl-tools musl-dev
+RUN update-ca-certificates
+
+RUN cargo install --target x86_64-unknown-linux-musl aichat
diff --git a/src/aichat/config.yaml b/src/aichat/config.yaml
new file mode 100644
index 0000000..a74af2c
--- /dev/null
+++ b/src/aichat/config.yaml
@@ -0,0 +1,8 @@
+# see https://github.com/sigoden/aichat/blob/main/config.example.yaml
+
+model: ollama
+clients:
+- type: openai-compatible
+ name: ollama
+ api_base: http://localhost:11434/v1
+ api_key: __LLM_API_KEY__
diff --git a/src/llm_provision/Dockerfile b/src/llm_provision/Dockerfile
new file mode 100644
index 0000000..77701fe
--- /dev/null
+++ b/src/llm_provision/Dockerfile
@@ -0,0 +1,12 @@
+FROM debian:bookworm-slim
+
+ENV DEBIAN_FRONTEND=noninteractive
+RUN apt-get update
+RUN apt-get --yes -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confnew" install bash curl jq
+
+ADD ./src/llm_provision/init_models.sh /init_models.sh
+ADD ./src/llm_provision/entrypoint.sh /entrypoint.sh
+RUN chmod 755 /entrypoint.sh
+
+ENTRYPOINT ["/entrypoint.sh"]
+#ENTRYPOINT ["tail", "-f", "/dev/null"] # to debug
diff --git a/src/llm_provision/entrypoint.sh b/src/llm_provision/entrypoint.sh
new file mode 100644
index 0000000..d0b6e85
--- /dev/null
+++ b/src/llm_provision/entrypoint.sh
@@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+
+echo "pull models into ollama volumes"
+bash /init_models.sh
diff --git a/src/llm_provision/init_models.sh b/src/llm_provision/init_models.sh
new file mode 100755
index 0000000..0afbbd0
--- /dev/null
+++ b/src/llm_provision/init_models.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+
+OLLAMA_HOST="http://ollama:11434"
+
+IFS=',' read -r -a models_arr <<< "${MODELS}"
+
+# loop over the requested models and pull any that are missing
+for m in "${models_arr[@]}"
+do
+    # /api/tags lists the installed models; pull ${m} only if it is missing
+    if ! curl -s "${OLLAMA_HOST}/api/tags" | jq -r '.models[].name' | grep -q "^${m}"
+    then
+        curl -s "${OLLAMA_HOST}/api/pull" -d "{\"model\": \"${m}\"}"
+    else
+        echo "${m} already installed"
+    fi
+done
diff --git a/src/nginx/nginx.conf b/src/nginx/nginx.conf
new file mode 100644
index 0000000..2dc6d52
--- /dev/null
+++ b/src/nginx/nginx.conf
@@ -0,0 +1,61 @@
+events {}
+http {
+    server_tokens off;
+    client_max_body_size 200m;
+
+    server {
+        listen 11434;
+        set $deny 1;
+        if ($http_authorization = "Bearer $API_KEY") {
+            set $deny 0;
+        }
+        if ($deny) {
+            return 403;
+        }
+        location / {
+            proxy_pass http://ollama:11434;
+            proxy_set_header Host $host;
+            proxy_set_header X-Real-IP $remote_addr;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+        }
+    }
+    server {
+        listen 8000;
+        set $deny 1;
+        if ($http_authorization = "Bearer $API_KEY") {
+            set $deny 0;
+        }
+        if ($deny) {
+            return 403;
+        }
+        location / {
+            proxy_pass http://openedai-speech:8000;
+            proxy_set_header Host $host;
+            proxy_set_header X-Real-IP $remote_addr;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+        }
+    }
+    server {
+        listen 8001;
+        set $deny 1;
+        if ($http_authorization = "Bearer $API_KEY") {
+            set $deny 0;
+        }
+        if ($deny) {
+            return 403;
+        }
+        location / {
+            proxy_pass http://faster-whisper-server:8000;
+            proxy_set_header Host $host;
+            proxy_set_header X-Real-IP $remote_addr;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_read_timeout 180;
+            proxy_http_version 1.1;
+            proxy_set_header Upgrade $http_upgrade;
+            proxy_set_header Connection "upgrade";
+        }
+    }
+}
diff --git a/src/tts/Dockerfile b/src/tts/Dockerfile
new file mode 100644
index 0000000..1636bd2
--- /dev/null
+++ b/src/tts/Dockerfile
@@ -0,0 +1,47 @@
+FROM python:3.11-slim
+
+RUN --mount=type=cache,target=/root/.cache/pip pip install -U pip
+
+ARG TARGETPLATFORM
+RUN <<EOF
+apt-get update
+apt-get install --no-install-recommends -y curl ffmpeg git
+if [ "$TARGETPLATFORM" != "linux/amd64" ]; then
+    apt-get install --no-install-recommends -y build-essential
+    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+fi
+
+# for deepspeed support - image +7.5GB, over the 10GB ghcr.io limit, and no noticeable gain in speed or VRAM usage?
+#curl -O https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.1-1_all.deb
+#dpkg -i cuda-keyring_1.1-1_all.deb
+#rm cuda-keyring_1.1-1_all.deb
+#apt-get install --no-install-recommends -y libaio-dev build-essential cuda-toolkit
+
+apt-get clean
+rm -rf /var/lib/apt/lists/*
+EOF
+#ENV CUDA_HOME=/usr/local/cuda
+ENV PATH="/root/.cargo/bin:${PATH}"
+
+WORKDIR /app
+RUN mkdir -p voices config
+
+ARG USE_ROCM
+ENV USE_ROCM=${USE_ROCM}
+
+RUN git clone https://github.com/matatonic/openedai-speech.git /tmp/app
+RUN mv /tmp/app/* /app/
+ADD src/tts/download_voices_tts-1.sh /app/download_voices_tts-1.sh
+ADD src/tts/voice_to_speaker.default.yaml /app/voice_to_speaker.default.yaml
+RUN if [ "${USE_ROCM}" = "1" ]; then mv /app/requirements-rocm.txt /app/requirements.txt; fi
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
+
+
+ARG PRELOAD_MODEL
+ENV PRELOAD_MODEL=${PRELOAD_MODEL}
+ENV TTS_HOME=voices
+ENV HF_HOME=voices
+ENV COQUI_TOS_AGREED=1
+
+CMD bash startup.sh
+
diff --git a/src/tts/download_voices_tts-1.sh b/src/tts/download_voices_tts-1.sh
new file mode 100644
index 0000000..f880650
--- /dev/null
+++ b/src/tts/download_voices_tts-1.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+# cat voice_to_speaker.default.yaml | yq '.tts-1 ' | grep mode | cut -d'/' -f2 | cut -d'.' -f1 | sort -u | xargs
+models=${*:-"en_GB-alba-medium en_GB-northern_english_male-medium en_US-bryce-medium en_US-john-medium en_US-libritts_r-medium en_US-ryan-high fr_FR-siwis-medium fr_FR-tom-medium fr_FR-upmc-medium"}
+piper --update-voices --data-dir voices --download-dir voices --model x 2> /dev/null
+for i in $models ; do
+    [ ! -e "voices/$i.onnx" ] && piper --data-dir voices --download-dir voices --model "$i" < /dev/null > /dev/null
+done
+
diff --git a/src/tts/voice_to_speaker.default.yaml b/src/tts/voice_to_speaker.default.yaml
new file mode 100644
index 0000000..53acda6
--- /dev/null
+++ b/src/tts/voice_to_speaker.default.yaml
@@ -0,0 +1,36 @@
+# Use https://rhasspy.github.io/piper-samples/ to configure
+tts-1:
+  alloy:
+    model: voices/en_US-libritts_r-medium.onnx
+    speaker: 79
+  siwis:
+    model: voices/fr_FR-siwis-medium.onnx
+    speaker: 0
+  tom:
+    model: voices/fr_FR-tom-medium.onnx
+    speaker: 0
+  pierre:
+    model: voices/fr_FR-upmc-medium.onnx
+    speaker: 1
+  jessica:
+    model: voices/fr_FR-upmc-medium.onnx
+    speaker: 0
+  alba:
+    model: voices/en_GB-alba-medium.onnx
+    speaker: 0
+  jack:
+    model: voices/en_GB-northern_english_male-medium.onnx
+    speaker: 0
+  john:
+    model: voices/en_US-john-medium.onnx
+    speaker: 0
+  bryce:
+    model: voices/en_US-bryce-medium.onnx
+    speaker: 0
+  ryan:
+    model: voices/en_US-ryan-high.onnx
+    speaker: 0
+  echo:
+    model: voices/en_US-libritts_r-medium.onnx
+    speaker: 134
+
diff --git a/src/whisper/Dockerfile b/src/whisper/Dockerfile
new file mode 100644
index 0000000..2909803
--- /dev/null
+++ b/src/whisper/Dockerfile
@@ -0,0 +1,13 @@
+FROM debian:bookworm-slim
+
+RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
+    sudo \
+    python3 \
+    python3-distutils \
+    python3-pip \
+    ffmpeg
+
+RUN pip install -U openai-whisper --break-system-packages
+WORKDIR /app
+
+CMD ["whisper"]
diff --git a/tools/aichat b/tools/aichat
new file mode 100755
index 0000000..ff31ede
--- /dev/null
+++ b/tools/aichat
Binary files differ
diff --git a/tools/stt.sh b/tools/stt.sh
new file mode 100755
index 0000000..13a1b5a
--- /dev/null
+++ b/tools/stt.sh
@@ -0,0 +1,121 @@
+#!/bin/bash
+
+# Function to print usage information
+usage() {
+    echo "Usage: $0 [record|transcription] <options>"
+    echo ""
+    echo "Actions:"
+    echo "  record         Record audio from a selected source"
+    echo "  transcription  Transcribe audio from a .wav file"
+    echo ""
+    echo "Options for 'record':"
+    echo "  -s, --source   Specify the audio source (required)"
+    echo ""
+    echo "Options for 'transcription':"
+    echo "  -f, --file     Specify the audio file to transcribe (required)"
+    echo "  -l, --lang     Specify the audio file language (default: en)"
+    exit 1
+}
+
+if [[ $# -eq 0 ]]; then
+    usage
+fi
+
+# Check for required environment variable
+if [[ -z "${LLM_API_KEY}" ]]; then
+    echo "The environment variable LLM_API_KEY is not set."
+    echo 'You can use the following command: export $(xargs < ../.env)'
+    exit 1
+fi
+
+ACTION=$1
+shift
+
+host=${STT_API_HOST:-"http://localhost:8001"}
+lang="en" # Default language (lowercase to avoid clobbering the LANG locale variable)
+
+if [ "$ACTION" == "record" ]; then
+    if [ "$#" -eq 0 ]; then
+        echo "Error: Source is required for record action."
+        echo "Available sources:"
+        pactl list short sources | awk '{print $2}'
+        exit 1
+    fi
+
+    SOURCE=""
+    while [[ "$#" -gt 0 ]]; do
+        case $1 in
+            -s | --source)
+                SOURCE="$2"
+                shift
+                ;;
+            *)
+                echo "Unknown parameter passed: $1"
+                usage
+                ;;
+        esac
+        shift
+    done
+
+    # Validate the provided source
+    if ! pactl list short sources | awk '{print $2}' | grep -q "^$SOURCE$"; then
+        echo "Error: Invalid audio source. Available sources:"
+        pactl list short sources | awk '{print $2}'
+        exit 1
+    fi
+
+    timestamp=$(date +"%Y%m%d_%H%M%S")
+    filename="record_${timestamp}.wav"
+    echo "Start recording to ${filename} ; use CTRL+C to terminate."
+    parec -d "${SOURCE}" --file-format=wav "${filename}"
+elif [ "$ACTION" == "transcription" ]; then
+    if [ "$#" -eq 0 ]; then
+        echo "Error: File is required for transcription action."
+        usage
+    fi
+
+    FILE=""
+    while [[ "$#" -gt 0 ]]; do
+        case $1 in
+            -f | --file)
+                FILE="$2"
+                shift
+                ;;
+            -l | --lang)
+                lang="$2"
+                shift
+                ;;
+            *)
+                echo "Unknown parameter passed: $1"
+                usage
+                ;;
+        esac
+        shift
+    done
+
+    if [ -z "$FILE" ]; then
+        echo "Error: File is required for transcription action."
+        usage
+    fi
+
+    # Check if the file exists
+    if [ ! -f "$FILE" ]; then
+        echo "Error: File '$FILE' does not exist."
+        exit 1
+    fi
+
+    # Ensure that curl is available
+    if ! command -v curl &>/dev/null; then
+        echo "curl is required for transcription but could not be found on your system. Please install it."
+        exit 1
+    fi
+
+    # Transcribe the specified file
+    echo "Transcribing file $FILE, be patient"
+    curl "${host}/v1/audio/transcriptions" -H "Authorization: Bearer ${LLM_API_KEY}" \
+        -F "file=@${FILE}" \
+        -F "stream=true" \
+        -F "language=${lang}"
+else
+    usage
+fi
diff --git a/tools/tts.sh b/tools/tts.sh
new file mode 100755
index 0000000..2065a3d
--- /dev/null
+++ b/tools/tts.sh
@@ -0,0 +1,115 @@
+#!/bin/bash
+
+# Function to display usage information
+usage() {
+    echo "Usage: $0 -l <lang> -v <voice> [-s <speed>] [--play] \"<text>\""
+    echo "  -l|--lang  : Specify the language (french|english)"
+    echo "  -v|--voice : Specify the voice"
+    echo "  -s|--speed : Specify the speed (0.0 to 3.0, default: 1.0)"
+    echo "  --play     : Play the generated audio file using ffplay"
+    echo "  <text>     : The text to synthesize"
+    exit 1
+}
+
+# Function to check if a value is a valid float between 0 and 3.0
+is_valid_float() {
+    local value=$1
+    # Check if the value is a valid number
+    if [[ $value =~ ^-?[0-9]+(\.[0-9]+)?$ ]]; then
+        # Check if the value is between 0 and 3.0
+        if (($(echo "$value >= 0" | bc -l))) && (($(echo "$value <= 3.0" | bc -l))); then
+            return 0
+        fi
+    fi
+    return 1
+}
+
+# Check for required environment variable
+if [[ -z "${LLM_API_KEY}" ]]; then
+    echo "The environment variable LLM_API_KEY is not set."
+    echo 'You can use the following command: export $(xargs < ../.env)'
+    exit 1
+fi
+
+# Default values
+speed=1.0
+host=${TTS_API_HOST:-"http://localhost:8000"}
+play_audio=false
+
+# Parse command line arguments
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        -l | --lang)
+            lang="$2"
+            shift 2
+            ;;
+        -v | --voice)
+            voice="$2"
+            shift 2
+            ;;
+        -s | --speed)
+            speed="$2"
+            shift 2
+            ;;
+        --play)
+            play_audio=true
+            shift 1
+            ;;
+        -h | --help)
+            usage
+            ;;
+        -*)
+            echo "Unknown option $1"
+            usage
+            ;;
+        *)
+            break
+            ;;
+    esac
+done
+
+# Grab the text after the options (required)
+if [[ $# -gt 0 ]]; then
+    text="$*"
+else
+    echo "Error: Text to synthesize is required."
+    usage
+fi
+
+# Generate a timestamp
+timestamp=$(date +"%Y%m%d_%H%M%S")
+
+# Construct the filename with the current date and time
+filename="speech_${timestamp}.wav"
+
+# Validate language and voice options
+if [[ -z "$lang" || -z "$voice" ]]; then
+    echo "Error: Language (-l) and voice (-v) options are required."
+    usage
+fi
+
+# Check if the speed is valid
+if ! is_valid_float "$speed"; then
+    echo "Error: Speed must be a float between 0.0 and 3.0."
+    exit 1
+fi
+
+# Fetch the audio file from the API
+http_status_code=$(curl -s "${host}/v1/audio/speech" -o "${filename}" -w "%{http_code}" -H "Authorization: Bearer ${LLM_API_KEY}" -H "Content-Type: application/json" -d "{\"model\": \"tts-1\",\"input\": \"${text}\",\"voice\": \"${voice}\",\"response_format\": \"wav\",\"speed\": ${speed}}")
+
+# Check the response code for successful HTTP request
+if [[ "$http_status_code" -ne 200 ]]; then
+    echo "Error: Failed to fetch audio file. Received HTTP status code: $http_status_code"
+    exit 1
+fi
+
+# Optionally play the generated WAV file with ffplay
+if [ "$play_audio" = true ]; then
+    if ! command -v ffplay &>/dev/null; then
+        echo "Error: ffplay is not installed. Please install ffmpeg to play audio files."
+        exit 1
+    fi
+    ffplay "${filename}" -nodisp -nostats -hide_banner -autoexit -v quiet
+fi
+
+echo "Audio file '$filename' generated successfully."