author    ben 2025-01-12 14:37:13 +0100
committer ben 2025-01-12 14:37:13 +0100
commit    778188ed95ccf50d2e21938bf5b542d76e066f63 (patch)
tree      e5138e638da98036e03cb11b2b0cf48fe4c590b2
Initial commit, first public version.
-rw-r--r--  .gitignore                              |   1
-rw-r--r--  README.md                               |  98
-rw-r--r--  docker-compose.yml                      |  98
-rw-r--r--  logo.webp                               |  bin 0 -> 39410 bytes
-rwxr-xr-x  setup_desktop.sh                        |  13
-rw-r--r--  src/aichat/Dockerfile                   |   7
-rw-r--r--  src/aichat/config.yaml                  |   8
-rw-r--r--  src/llm_provision/Dockerfile            |  12
-rw-r--r--  src/llm_provision/entrypoint.sh         |   4
-rwxr-xr-x  src/llm_provision/init_models.sh        |  17
-rw-r--r--  src/nginx/nginx.conf                    |  61
-rw-r--r--  src/tts/Dockerfile                      |  47
-rw-r--r--  src/tts/download_voices_tts-1.sh        |   8
-rw-r--r--  src/tts/voice_to_speaker.default.yaml   |  36
-rw-r--r--  src/whisper/Dockerfile                  |  13
-rwxr-xr-x  tools/aichat                            |  bin 0 -> 12073152 bytes
-rwxr-xr-x  tools/stt.sh                            | 121
-rwxr-xr-x  tools/tts.sh                            | 115
18 files changed, 659 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..4c49bd7
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+.env
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c089330
--- /dev/null
+++ b/README.md
@@ -0,0 +1,98 @@
+# Privacy-First Command-Line AI for Linux
+
+![AI_ENV](logo.webp)
+
+Unlock the power of AI—right from your Linux terminal.
+
+This project delivers a fully local AI environment, running open source language models directly on your machine.
+
+No cloud. No GAFAM. Just full privacy, control, and the freedom to manipulate commands in your shell.
+
+## How it works
+
+* [Ollama](https://ollama.com/) runs language models on the local machine.
+* [openedai-speech](https://github.com/matatonic/openedai-speech) provides text-to-speech capability.
+* [speaches-ai](https://github.com/speaches-ai/speaches) provides transcription, translation, and speech generation.
+* [nginx](https://nginx.org/en/) adds authentication to the API.
+* [aichat](https://github.com/sigoden/aichat) is the LLM CLI tool, featuring a Shell Assistant, Chat-REPL, RAG, and AI Tools & Agents.
+
+Everything is free, open-source and automated using Docker Compose and shell scripts.
+
+## Requirements
+
+To run this project efficiently, a modern computer with a recent NVIDIA GPU is required.
+As an example, I achieve good performance with an Intel(R) Core(TM) i7-14700HX, a GeForce RTX 4050, and 32GB of RAM.
+
+You must use Linux and the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit).
+
+Note that it is probably possible to run the project on other GPUs or on modern MacBooks, but that is outside the scope of this project.
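+
+To verify that Docker can see the GPU, you can run the sample workload from the NVIDIA Container Toolkit documentation:
+```bash
+docker run --rm --gpus all ubuntu nvidia-smi
+```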
+
+## How to launch the server
+
+Choose the models you wish to use in the `docker-compose.yml` file and set the API token in a `.env` file at the root of the project, as follows:
+```
+LLM_API_KEY=1234567890
+```
+
+Next, start the servers and their configuration with Docker Compose:
+```bash
+docker compose up --build -d
+```
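+
+Once the stack is up, you can sanity-check the nginx gateway (assuming the default ports and the token from your `.env`):
+```bash
+# without the token, nginx returns 403
+curl -s -o /dev/null -w "%{http_code}\n" http://localhost:11434/api/tags
+# with the token, the Ollama API answers
+curl -s -H "Authorization: Bearer ${LLM_API_KEY}" http://localhost:11434/api/tags | jq '.models[].name'
+```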
+
+## How to use
+
+The `setup_desktop.sh` script copies the statically compiled [aichat](https://github.com/sigoden/aichat) binary from its build container to `tools/` on your host and writes the tool's configuration to `~/.config/aichat/config.yaml`.
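+
+Run it once after the stack has been built; if everything is wired up, `aichat --list-models` will list the configured `ollama` client:
+```bash
+./setup_desktop.sh
+./tools/aichat --list-models
+```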
+
+### Aichat essentials
+
+To launch a chat session that keeps the conversation context:
+```bash
+aichat -m ollama:qwen2.5 -s
+```
+
+With a prompt:
+```bash
+aichat -m ollama:qwen2.5 --prompt "I want you to act as an English translator, spelling corrector and improver. I will speak to you in any language and you will detect the language, translate it and answer in the corrected and improved version of my text, in English. I want you to only reply the correction, the improvements and nothing else, do not write explanations."
+```
+
+Pipe a command's output into the LLM and transform it:
+```bash
+ls | aichat -m ollama:qwen2.5 --prompt "transform to json"
+```
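+
+aichat can also act as a Shell Assistant: per its documentation, `-e` turns a natural-language description into a shell command and asks for confirmation before running it:
+```bash
+aichat -m ollama:qwen2.5 -e "list the 5 largest files in the current directory"
+```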
+
+See the [AIChat](https://github.com/sigoden/aichat) documentation for other use cases.
+
+### Text To Speech
+
+For text-to-speech, use the `tools/tts.sh` script.
+
+Example:
+```bash
+./tools/tts.sh -l french -v pierre --play "Aujourd'hui, nous sommes le $(date +%A\ %d\ %B\ %Y)."
+```
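+
+Under the hood, the script posts to the OpenAI-compatible `/v1/audio/speech` endpoint; a roughly equivalent raw call (the voice names come from `src/tts/voice_to_speaker.default.yaml`):
+```bash
+curl -s "http://localhost:8000/v1/audio/speech" \
+  -H "Authorization: Bearer ${LLM_API_KEY}" \
+  -H "Content-Type: application/json" \
+  -d '{"model": "tts-1", "input": "Bonjour !", "voice": "pierre", "response_format": "wav", "speed": 1.0}' \
+  -o speech.wav
+```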
+
+### Speech To Text
+
+For speech-to-text, use `tools/stt.sh`.
+The `record` action uses PulseAudio to capture the computer's audio output (for example, a video playing in the browser).
+The `transcription` action converts the recorded audio file into text.
+
+Example:
+```bash
+./tools/stt.sh record -s alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__Speaker__sink.monitor
+./tools/stt.sh transcription -f record_20250112_125726.wav -l fr
+```
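+
+The transcription action is a thin wrapper around the `/v1/audio/transcriptions` endpoint; the request it sends boils down to:
+```bash
+curl "http://localhost:8001/v1/audio/transcriptions" \
+  -H "Authorization: Bearer ${LLM_API_KEY}" \
+  -F "file=@record_20250112_125726.wav" \
+  -F "stream=true" \
+  -F "language=fr"
+```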
+
+## How to Use Remotely
+
+The API authentication enforced by nginx makes it possible to expose the API on the internet and use it remotely.
+By adding a reverse proxy such as Caddy in front of it, you can also add TLS encryption.
+This way, you can securely use this environment remotely.
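+
+A minimal Caddyfile could look as follows (a sketch: the hostnames are placeholders for DNS records pointing at this machine; Caddy then obtains TLS certificates automatically):
+```
+tts.your-remote-domain {
+    reverse_proxy localhost:8000
+}
+stt.your-remote-domain {
+    reverse_proxy localhost:8001
+}
+```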
+
+To use the `tools/` scripts against a remote host, set the `TTS_API_HOST` and `STT_API_HOST` environment variables.
+
+Example:
+```bash
+TTS_API_HOST="https://your-remote-domain" ./tools/tts.sh -l french -v pierre --play "Aujourd'hui, nous sommes le $(date +%A\ %d\ %B\ %Y)."
+STT_API_HOST="https://your-remote-domain" ./tools/stt.sh transcription -f speech_20250112_124805.wav -l fr
+```
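+
+For aichat, point `api_base` at the remote host in `~/.config/aichat/config.yaml` (a sketch, assuming your reverse proxy forwards to the Ollama port):
+```
+clients:
+- type: openai-compatible
+  name: ollama
+  api_base: https://your-remote-domain/v1
+  api_key: <your LLM_API_KEY>
+```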
diff --git a/docker-compose.yml b/docker-compose.yml
new file mode 100644
index 0000000..25d4ef3
--- /dev/null
+++ b/docker-compose.yml
@@ -0,0 +1,98 @@
+services:
+  ollama:
+    image: ollama/ollama
+    volumes:
+      - ollama:/root/.ollama
+    restart: always
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    healthcheck:
+      test: ollama --version && ollama ps || exit 1
+      interval: 60s
+      retries: 5
+      start_period: 20s
+      timeout: 10s
+  openedai-speech:
+    build:
+      dockerfile: src/tts/Dockerfile
+    environment:
+      - TTS_HOME=voices
+    volumes:
+      - voices:/app/voices
+      - speech-config:/app/config
+    restart: unless-stopped
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    healthcheck:
+      test: curl --fail http://localhost:8000 || exit 1
+      interval: 60s
+      retries: 5
+      start_period: 10s
+      timeout: 10s
+  llm_provision:
+    build:
+      dockerfile: src/llm_provision/Dockerfile
+    environment:
+      - MODELS=qwen2.5:latest,qwen2.5-coder:32b,nomic-embed-text:latest
+    restart: "no" # quoted, since a bare no is parsed as a YAML boolean
+    depends_on:
+      ollama:
+        condition: service_healthy
+        restart: true
+    links:
+      - ollama
+  aichat-build:
+    build:
+      dockerfile: src/aichat/Dockerfile
+  faster-whisper-server:
+    image: fedirz/faster-whisper-server:latest-cuda
+    environment:
+      - WHISPER__MODEL=Systran/faster-whisper-large-v3
+    volumes:
+      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    healthcheck:
+      test: timeout 10s bash -c ':> /dev/tcp/127.0.0.1/8000' || exit 1
+      interval: 30s
+      timeout: 15s
+      retries: 3
+  nginx:
+    image: nginx
+    volumes:
+      - ./src/nginx/nginx.conf:/etc/nginx/templates/nginx.conf.template
+    environment:
+      - NGINX_ENVSUBST_OUTPUT_DIR=/etc/nginx # the nginx image runs envsubst on templates at startup
+      - API_KEY=${LLM_API_KEY}
+    depends_on:
+      - openedai-speech
+      - faster-whisper-server
+      - ollama
+    links:
+      - ollama
+      - faster-whisper-server
+      - openedai-speech
+    ports:
+      - "11434:11434"
+      - "8000:8000"
+      - "8001:8001"
+volumes:
+  ollama:
+  voices:
+  speech-config:
+  hf-hub-cache:
diff --git a/logo.webp b/logo.webp
new file mode 100644
index 0000000..9b1f516
--- /dev/null
+++ b/logo.webp
Binary files differ
diff --git a/setup_desktop.sh b/setup_desktop.sh
new file mode 100755
index 0000000..94bf0bd
--- /dev/null
+++ b/setup_desktop.sh
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+
+SCRIPT=$(readlink -f "$0")
+SCRIPTPATH=$(dirname "$SCRIPT")
+cd "$SCRIPTPATH" || exit
+
+container_id=$(docker create "aichat-build")
+docker cp "${container_id}:/usr/local/cargo/bin/aichat" "./tools/"
+docker rm "${container_id}"
+
+source .env
+mkdir -p ~/.config/aichat/
+sed "s/__LLM_API_KEY__/${LLM_API_KEY}/" src/aichat/config.yaml > ~/.config/aichat/config.yaml
diff --git a/src/aichat/Dockerfile b/src/aichat/Dockerfile
new file mode 100644
index 0000000..df13f63
--- /dev/null
+++ b/src/aichat/Dockerfile
@@ -0,0 +1,7 @@
+FROM rust:latest
+
+RUN rustup target add x86_64-unknown-linux-musl
+RUN apt update && apt install -y musl-tools musl-dev
+RUN update-ca-certificates
+
+RUN cargo install --target x86_64-unknown-linux-musl aichat
diff --git a/src/aichat/config.yaml b/src/aichat/config.yaml
new file mode 100644
index 0000000..a74af2c
--- /dev/null
+++ b/src/aichat/config.yaml
@@ -0,0 +1,8 @@
+# see https://github.com/sigoden/aichat/blob/main/config.example.yaml
+
+model: ollama
+clients:
+- type: openai-compatible
+ name: ollama
+ api_base: http://localhost:11434/v1
+ api_key: __LLM_API_KEY__
diff --git a/src/llm_provision/Dockerfile b/src/llm_provision/Dockerfile
new file mode 100644
index 0000000..77701fe
--- /dev/null
+++ b/src/llm_provision/Dockerfile
@@ -0,0 +1,12 @@
+FROM debian:bookworm-slim
+
+ENV DEBIAN_FRONTEND=noninteractive
+RUN apt-get update
+RUN apt-get --yes -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confnew" install bash curl jq
+
+ADD ./src/llm_provision/init_models.sh /init_models.sh
+ADD ./src/llm_provision/entrypoint.sh /entrypoint.sh
+RUN chmod 755 /entrypoint.sh
+
+ENTRYPOINT ["/entrypoint.sh"]
+#ENTRYPOINT ["tail", "-f", "/dev/null"] # to debug
diff --git a/src/llm_provision/entrypoint.sh b/src/llm_provision/entrypoint.sh
new file mode 100644
index 0000000..d0b6e85
--- /dev/null
+++ b/src/llm_provision/entrypoint.sh
@@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+
+echo "pull models into ollama volumes"
+bash /init_models.sh
diff --git a/src/llm_provision/init_models.sh b/src/llm_provision/init_models.sh
new file mode 100755
index 0000000..0afbbd0
--- /dev/null
+++ b/src/llm_provision/init_models.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+
+OLLAMA_HOST="http://ollama:11434"
+
+IFS=',' read -r -a models_arr <<< "${MODELS}"
+
+# loop over the requested models and pull any that are missing
+for m in "${models_arr[@]}"
+do
+    # /api/tags lists the installed models; pull ${m} only if it is missing
+    if ! curl -s "${OLLAMA_HOST}/api/tags" | jq -r '.models[].name' | grep -q "^${m}"
+    then
+        curl -s "${OLLAMA_HOST}/api/pull" -d "{\"model\": \"${m}\"}"
+    else
+        echo "${m} already installed"
+    fi
+done
diff --git a/src/nginx/nginx.conf b/src/nginx/nginx.conf
new file mode 100644
index 0000000..2dc6d52
--- /dev/null
+++ b/src/nginx/nginx.conf
@@ -0,0 +1,61 @@
+events {}
+http {
+    server_tokens off;
+    client_max_body_size 200m;
+
+    server {
+        listen 11434;
+        set $deny 1;
+        if ($http_authorization = "Bearer $API_KEY") {
+            set $deny 0;
+        }
+        if ($deny) {
+            return 403;
+        }
+        location / {
+            proxy_pass http://ollama:11434;
+            proxy_set_header Host $host;
+            proxy_set_header X-Real-IP $remote_addr;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+        }
+    }
+    server {
+        listen 8000;
+        set $deny 1;
+        if ($http_authorization = "Bearer $API_KEY") {
+            set $deny 0;
+        }
+        if ($deny) {
+            return 403;
+        }
+        location / {
+            proxy_pass http://openedai-speech:8000;
+            proxy_set_header Host $host;
+            proxy_set_header X-Real-IP $remote_addr;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+        }
+    }
+    server {
+        listen 8001;
+        set $deny 1;
+        if ($http_authorization = "Bearer $API_KEY") {
+            set $deny 0;
+        }
+        if ($deny) {
+            return 403;
+        }
+        location / {
+            proxy_pass http://faster-whisper-server:8000;
+            proxy_set_header Host $host;
+            proxy_set_header X-Real-IP $remote_addr;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_read_timeout 180;
+            proxy_http_version 1.1;
+            proxy_set_header Upgrade $http_upgrade;
+            proxy_set_header Connection "upgrade";
+        }
+    }
+}
diff --git a/src/tts/Dockerfile b/src/tts/Dockerfile
new file mode 100644
index 0000000..1636bd2
--- /dev/null
+++ b/src/tts/Dockerfile
@@ -0,0 +1,47 @@
+FROM python:3.11-slim
+
+RUN --mount=type=cache,target=/root/.cache/pip pip install -U pip
+
+ARG TARGETPLATFORM
+RUN <<EOF
+apt-get update
+apt-get install --no-install-recommends -y curl ffmpeg git
+if [ "$TARGETPLATFORM" != "linux/amd64" ]; then
+    apt-get install --no-install-recommends -y build-essential
+    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+fi
+
+# for deepspeed support - image +7.5GB, over the 10GB ghcr.io limit, and no noticeable gain in speed or VRAM usage?
+#curl -O https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.1-1_all.deb
+#dpkg -i cuda-keyring_1.1-1_all.deb
+#rm cuda-keyring_1.1-1_all.deb
+#apt-get install --no-install-recommends -y libaio-dev build-essential cuda-toolkit
+
+apt-get clean
+rm -rf /var/lib/apt/lists/*
+EOF
+#ENV CUDA_HOME=/usr/local/cuda
+ENV PATH="/root/.cargo/bin:${PATH}"
+
+WORKDIR /app
+RUN mkdir -p voices config
+
+ARG USE_ROCM
+ENV USE_ROCM=${USE_ROCM}
+
+RUN git clone https://github.com/matatonic/openedai-speech.git /tmp/app
+RUN mv /tmp/app/* /app/
+ADD src/tts/download_voices_tts-1.sh /app/download_voices_tts-1.sh
+ADD src/tts/voice_to_speaker.default.yaml /app/voice_to_speaker.default.yaml
+RUN if [ "${USE_ROCM}" = "1" ]; then mv /app/requirements-rocm.txt /app/requirements.txt; fi
+RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
+
+
+ARG PRELOAD_MODEL
+ENV PRELOAD_MODEL=${PRELOAD_MODEL}
+ENV TTS_HOME=voices
+ENV HF_HOME=voices
+ENV COQUI_TOS_AGREED=1
+
+CMD bash startup.sh
+
diff --git a/src/tts/download_voices_tts-1.sh b/src/tts/download_voices_tts-1.sh
new file mode 100644
index 0000000..f880650
--- /dev/null
+++ b/src/tts/download_voices_tts-1.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+# cat voice_to_speaker.default.yaml | yq '.tts-1 ' | grep mode | cut -d'/' -f2 | cut -d'.' -f1 | sort -u | xargs
+models=${*:-"en_GB-alba-medium en_GB-northern_english_male-medium en_US-bryce-medium en_US-john-medium en_US-libritts_r-medium en_US-ryan-high fr_FR-siwis-medium fr_FR-tom-medium fr_FR-upmc-medium"}
+piper --update-voices --data-dir voices --download-dir voices --model x 2> /dev/null
+for i in $models ; do
+    [ ! -e "voices/$i.onnx" ] && piper --data-dir voices --download-dir voices --model "$i" < /dev/null > /dev/null
+done
+
diff --git a/src/tts/voice_to_speaker.default.yaml b/src/tts/voice_to_speaker.default.yaml
new file mode 100644
index 0000000..53acda6
--- /dev/null
+++ b/src/tts/voice_to_speaker.default.yaml
@@ -0,0 +1,36 @@
+# Use https://rhasspy.github.io/piper-samples/ to configure
+tts-1:
+  alloy:
+    model: voices/en_US-libritts_r-medium.onnx
+    speaker: 79
+  siwis:
+    model: voices/fr_FR-siwis-medium.onnx
+    speaker: 0
+  tom:
+    model: voices/fr_FR-tom-medium.onnx
+    speaker: 0
+  pierre:
+    model: voices/fr_FR-upmc-medium.onnx
+    speaker: 1
+  jessica:
+    model: voices/fr_FR-upmc-medium.onnx
+    speaker: 0
+  alba:
+    model: voices/en_GB-alba-medium.onnx
+    speaker: 0
+  jack:
+    model: voices/en_GB-northern_english_male-medium.onnx
+    speaker: 0
+  john:
+    model: voices/en_US-john-medium.onnx
+    speaker: 0
+  bryce:
+    model: voices/en_US-bryce-medium.onnx
+    speaker: 0
+  ryan:
+    model: voices/en_US-ryan-high.onnx
+    speaker: 0
+  echo:
+    model: voices/en_US-libritts_r-medium.onnx
+    speaker: 134
+
diff --git a/src/whisper/Dockerfile b/src/whisper/Dockerfile
new file mode 100644
index 0000000..2909803
--- /dev/null
+++ b/src/whisper/Dockerfile
@@ -0,0 +1,13 @@
+FROM debian:bookworm-slim
+
+RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
+    sudo \
+    python3 \
+    python3-distutils \
+    python3-pip \
+    ffmpeg
+
+RUN pip install -U openai-whisper --break-system-packages
+WORKDIR /app
+
+CMD ["whisper"]
diff --git a/tools/aichat b/tools/aichat
new file mode 100755
index 0000000..ff31ede
--- /dev/null
+++ b/tools/aichat
Binary files differ
diff --git a/tools/stt.sh b/tools/stt.sh
new file mode 100755
index 0000000..13a1b5a
--- /dev/null
+++ b/tools/stt.sh
@@ -0,0 +1,121 @@
+#!/bin/bash
+
+# Function to print usage information
+usage() {
+    echo "Usage: $0 [record|transcription] <options>"
+    echo ""
+    echo "Actions:"
+    echo "  record         Record audio from a selected source"
+    echo "  transcription  Transcribe audio from a .wav file"
+    echo ""
+    echo "Options for 'record':"
+    echo "  -s, --source   Specify the audio source (required)"
+    echo ""
+    echo "Options for 'transcription':"
+    echo "  -f, --file     Specify the audio file to transcribe (required)"
+    echo "  -l, --lang     Specify the audio file language (default: en)"
+    exit 1
+}
+
+if [[ $# -eq 0 ]]; then
+    usage
+fi
+
+# Check for required environment variable
+if [[ -z "${LLM_API_KEY}" ]]; then
+    echo "The environment variable LLM_API_KEY is not set."
+    echo 'You can use the following command: export $(xargs < ../.env)'
+    exit 1
+fi
+
+ACTION=$1
+shift
+
+host=${STT_API_HOST:-"http://localhost:8001"}
+lang="en" # Default language (lowercase to avoid clobbering the LANG locale variable)
+
+if [ "$ACTION" == "record" ]; then
+    if [ "$#" -eq 0 ]; then
+        echo "Error: Source is required for record action."
+        echo "Available sources:"
+        pactl list short sources | awk '{print $2}'
+        exit 1
+    fi
+
+    SOURCE=""
+    while [[ "$#" -gt 0 ]]; do
+        case $1 in
+            -s | --source)
+                SOURCE="$2"
+                shift
+                ;;
+            *)
+                echo "Unknown parameter passed: $1"
+                usage
+                ;;
+        esac
+        shift
+    done
+
+    # Validate the provided source
+    if ! pactl list short sources | awk '{print $2}' | grep -q "^$SOURCE$"; then
+        echo "Error: Invalid audio source. Available sources:"
+        pactl list short sources | awk '{print $2}'
+        exit 1
+    fi
+
+    timestamp=$(date +"%Y%m%d_%H%M%S")
+    filename="record_${timestamp}.wav"
+    echo "Start recording to ${filename} ; use CTRL+C to terminate."
+    parec -d "${SOURCE}" --file-format=wav "${filename}"
+elif [ "$ACTION" == "transcription" ]; then
+    if [ "$#" -eq 0 ]; then
+        echo "Error: File is required for transcription action."
+        usage
+    fi
+
+    FILE=""
+    while [[ "$#" -gt 0 ]]; do
+        case $1 in
+            -f | --file)
+                FILE="$2"
+                shift
+                ;;
+            -l | --lang)
+                lang="$2"
+                shift
+                ;;
+            *)
+                echo "Unknown parameter passed: $1"
+                usage
+                ;;
+        esac
+        shift
+    done
+
+    if [ -z "$FILE" ]; then
+        echo "Error: File is required for transcription action."
+        usage
+    fi
+
+    # Check if the file exists
+    if [ ! -f "$FILE" ]; then
+        echo "Error: File '$FILE' does not exist."
+        exit 1
+    fi
+
+    # Ensure that curl is available
+    if ! command -v curl &>/dev/null; then
+        echo "curl is required for transcription but could not be found on your system. Please install it."
+        exit 1
+    fi
+
+    # Transcribe the specified file
+    echo "Transcribing file $FILE, be patient"
+    curl "${host}/v1/audio/transcriptions" -H "Authorization: Bearer ${LLM_API_KEY}" \
+        -F "file=@${FILE}" \
+        -F "stream=true" \
+        -F "language=${lang}"
+else
+    usage
+fi
diff --git a/tools/tts.sh b/tools/tts.sh
new file mode 100755
index 0000000..2065a3d
--- /dev/null
+++ b/tools/tts.sh
@@ -0,0 +1,115 @@
+#!/bin/bash
+
+# Function to display usage information
+usage() {
+    echo "Usage: $0 -l <lang> -v <voice> [-s <speed>] [--play] \"<text>\""
+    echo "  -l|--lang  : Specify the language (french|english)"
+    echo "  -v|--voice : Specify the voice"
+    echo "  -s|--speed : Specify the speed (0.0 to 3.0, default: 1.0)"
+    echo "  --play     : Play the generated audio file using ffplay"
+    echo "  <text>     : The text to synthesize"
+    exit 1
+}
+
+# Function to check if a value is a valid float between 0 and 3.0
+is_valid_float() {
+    local value=$1
+    # Check if the value is a valid number
+    if [[ $value =~ ^-?[0-9]+(\.[0-9]+)?$ ]]; then
+        # Check if the value is between 0 and 3.0
+        if (($(echo "$value >= 0" | bc -l))) && (($(echo "$value <= 3.0" | bc -l))); then
+            return 0
+        fi
+    fi
+    return 1
+}
+
+# Check for required environment variable
+if [[ -z "${LLM_API_KEY}" ]]; then
+    echo "The environment variable LLM_API_KEY is not set."
+    echo 'You can use the following command: export $(xargs < ../.env)'
+    exit 1
+fi
+
+# Default values
+speed=1.0
+host=${TTS_API_HOST:-"http://localhost:8000"}
+play_audio=false
+
+# Parse command line arguments
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        -l | --lang)
+            lang="$2"
+            shift 2
+            ;;
+        -v | --voice)
+            voice="$2"
+            shift 2
+            ;;
+        -s | --speed)
+            speed="$2"
+            shift 2
+            ;;
+        --play)
+            play_audio=true
+            shift 1
+            ;;
+        -h | --help)
+            usage
+            ;;
+        -*)
+            echo "Unknown option $1"
+            usage
+            ;;
+        *)
+            break
+            ;;
+    esac
+done
+
+# Grab the text after the options (required)
+if [[ $# -gt 0 ]]; then
+    text="$*"
+else
+    echo "Error: Text to synthesize is required."
+    usage
+fi
+
+# Generate a timestamp
+timestamp=$(date +"%Y%m%d_%H%M%S")
+
+# Construct the filename with the current date and time
+filename="speech_${timestamp}.wav"
+
+# Validate language and voice options
+if [[ -z "$lang" || -z "$voice" ]]; then
+    echo "Error: Language (-l) and voice (-v) options are required."
+    usage
+fi
+
+# Check if the speed is valid
+if ! is_valid_float "$speed"; then
+    echo "Error: Speed must be a float between 0.0 and 3.0."
+    exit 1
+fi
+
+# Fetch the audio file from the API
+http_status_code=$(curl -s "${host}/v1/audio/speech" -o "${filename}" -w "%{http_code}" -H "Authorization: Bearer ${LLM_API_KEY}" -H "Content-Type: application/json" -d "{\"model\": \"tts-1\",\"input\": \"${text}\",\"voice\": \"${voice}\",\"response_format\": \"wav\",\"speed\": ${speed}}")
+
+# Check the response code for successful HTTP request
+if [[ "$http_status_code" -ne 200 ]]; then
+    echo "Error: Failed to fetch audio file. Received HTTP status code: $http_status_code"
+    exit 1
+fi
+
+# Optionally play the generated WAV file with ffplay
+if [ "$play_audio" = true ]; then
+    if ! command -v ffplay &>/dev/null; then
+        echo "Error: ffplay is not installed. Please install ffmpeg to play audio files."
+        exit 1
+    fi
+    ffplay "${filename}" -nodisp -nostats -hide_banner -autoexit -v quiet
+fi
+
+echo "Audio file '$filename' generated successfully."