Reply Pilot Backend

This page describes the reply-pilot-be/ module.

Module role

  • owns the JSON API for the inbox and the communication workflow
  • owns the local mailbox snapshot and the inbox cache
  • calls reply-pilot-gmail for Gmail sync and drafts
  • maintains the communication history in reply-pilot-db whenever the inbox or a thread detail is displayed
  • owns the OpenAI debug/generation flow for reply-pilot-app
  • owns the server-side lead import workflow for JSONL batch files
  • is the primary backend owner of the e-mail layer

Runtime

  • runs as a standalone Flask/Gunicorn container
  • writes to reply-pilot-be/data and reply-pilot-be/logs
  • conf/ remains a read-only mount
  • is reachable on the internal network as reply-pilot-be
  • reply-pilot-app and reply-pilot-worker call it at http://reply-pilot-be:5000
  • calls the Gmail service via GMAIL_API_BASE_URL, by default http://reply-pilot-gmail:5000
  • reaches PostgreSQL via reply-pilot-db on the internal network

Email Artifact Flow

Email import runs in three layers:

  1. Gmail sync creates a thread snapshot in reply-pilot-be/data/emails/threads/ and, when attachments are available, downloads the binaries to reply-pilot-be/data/emails/attachments/.
  2. Activity import writes to PostgreSQL:
       • activity + activity_email
       • activity_email_attachment as DB metadata over the local file cache
       • activity_email_link as explicit URLs found in body_text
  3. A later attachment phase can fetch missing binaries; the backend then resynchronizes activity_email_attachment so that the DB metadata matches the current local cache.

Important rules:

  • PostgreSQL is the source of truth for attachment metadata and for discovered URLs
  • the binary content of attachments stays in the backend file cache
  • the link from the DB to the file cache goes through activity_email_attachment.relative_path
  • after a successful Gmail send, sent e-mails are materialized through the same ActivityImportStore path as imported messages; for the send-and-import action, the lead import wizard sends the e-mail in the backend so that the company and contact links are created before activity_email is written
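
The DB-to-cache link above can be sketched as a small path resolver. This is only an illustration: the helper name resolve_attachment_path is hypothetical, and it assumes that relative_path is stored relative to the attachments directory, which the text does not state explicitly.

```python
from pathlib import Path

# Assumed cache root; the text only names reply-pilot-be/data/emails/attachments/.
ATTACHMENT_CACHE_ROOT = Path("reply-pilot-be/data/emails/attachments")


def resolve_attachment_path(relative_path: str, root: Path = ATTACHMENT_CACHE_ROOT) -> Path:
    """Map activity_email_attachment.relative_path to a file inside the cache root."""
    root_resolved = root.resolve()
    candidate = (root_resolved / relative_path).resolve()
    # Refuse values that escape the cache root, for example via ".." or an absolute path.
    if not candidate.is_relative_to(root_resolved):
        raise ValueError(f"relative_path escapes the attachment cache: {relative_path!r}")
    return candidate
```

Rejecting paths outside the root keeps a malformed DB value from pointing the backend at arbitrary files.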

Lead Import Runtime

The lead import flow reads server-side JSONL files from backend storage, typically under reply-pilot-be/data/lead-imports/:

  • inbox/ holds files waiting for an operator
  • processing/ holds a file bound to an active review batch
  • done/ holds completed batch files
  • failed/ holds files that could not be loaded or created in the DB

The backend:

  • lists the available files for the web app
  • creates lead_import_batch and lead_import_item records from the selected file
  • can generate an AI draft of the first outreach message when an item detail is opened
  • on an operator action, can create or update a company, send the first outreach message, store the sent e-mail in the contact history, store an activity_note, write through party_outreach_policy, and create a Jira task

Manual Company Create

The backend exposes POST /api/companies for manually creating a company from the web app. The endpoint creates only the organization on top of the existing party model, in a single transaction. Contacts, people, notes, outreach policy, and Jira tasks are not part of this V1 form.

The V1 request contains:

  • company: display_name, legal_name, company_registration_number, tax_identifier, website, roles, show_by_default

Before the insert, the backend checks for duplicates by exact ICO (company registration number), DIC (tax identifier), and DOMAIN/WEBSITE identifiers. On a match it returns 409 with the candidates and stores no record.
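
The duplicate check can be sketched over in-memory rows as follows. This is an illustration under assumptions, not the backend's code: the function names, the reason value duplicate_company, the 201 success status, and the shape of the existing rows are all hypothetical; only the 409 status, the candidates field, and the request field names come from this page.

```python
from typing import Iterable

# Request fields compared for an exact match (ICO, DIC, website/domain).
DUPLICATE_KEYS = ("company_registration_number", "tax_identifier", "website")


def find_duplicate_candidates(company: dict, existing: Iterable[dict]) -> list[dict]:
    """Return existing companies that share any non-empty identifier exactly."""
    wanted = {key: company[key] for key in DUPLICATE_KEYS if company.get(key)}
    return [
        row for row in existing
        if any(row.get(key) == value for key, value in wanted.items())
    ]


def create_company_response(company: dict, existing: Iterable[dict]) -> tuple[dict, int]:
    """Build the (JSON body, HTTP status) pair for a create request."""
    candidates = find_duplicate_candidates(company, existing)
    if candidates:
        # Nothing is stored when a duplicate is found.
        return {"status": "error", "reason": "duplicate_company", "candidates": candidates}, 409
    # Hypothetical success shape; the real insert would happen here.
    return {"status": "ok"}, 201
```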

Email Thread Reply Tasks

The backend owns the creation of Jira tasks of type Email Thread Reply.

  • The UI calls POST /api/email-threads/<email_id>/reply-tasks with party_id, assignee_user_id, summary, and description.
  • The backend loads the e-mail thread from its own e-mail service, checks for a duplicate via task_jira_email_thread_reply.external_thread_id, creates a Jira issue of type JIRA_DEFAULT_ISSUE_TYPE_REPLY_EMAIL, and writes to the existing local tables task, task_jira, and task_jira_email_thread_reply.
  • It adds a link to the company detail to the Jira description and, after the local save, also a link to the task detail based on APP_PUBLIC_BASE_URL.
  • For a duplicate thread the endpoint returns 409 and the metadata of the existing task, so that the app can redirect to the task that already exists.

When a new incoming message arrives from an external supplier, activity import first moves the existing reply task to the Drafting Reply state. If the thread has no reply task yet and the company has a Supplier Onboarding task, the backend creates a new Email Thread Reply, takes the assignee from the onboarding task, and also moves the new Jira ticket to Drafting Reply. V1 adds no direct DB link from reply task to onboarding task; the relationship is derived through the same company (task_jira.party_id) and the e-mail thread.
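
The branching described above can be written as a small decision function. This is a sketch with hypothetical names (ReplyTaskDecision, decide_reply_task, the action strings); the "none" branch for a company with neither task is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplyTaskDecision:
    action: str  # "transition_existing", "create_from_onboarding", or "none"
    assignee_user_id: Optional[int] = None


def decide_reply_task(
    existing_reply_task: Optional[dict],
    onboarding_task: Optional[dict],
) -> ReplyTaskDecision:
    """Decide what activity import does for a new incoming supplier message."""
    if existing_reply_task is not None:
        # An existing reply task is moved to Drafting Reply.
        return ReplyTaskDecision("transition_existing")
    if onboarding_task is not None:
        # A new Email Thread Reply inherits the onboarding task's assignee.
        return ReplyTaskDecision("create_from_onboarding", onboarding_task.get("assignee_user_id"))
    # Assumption: with neither task present, nothing is created.
    return ReplyTaskDecision("none")
```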

AI Evidence Output Contract

For future AI classification of requirement evidence, the following JSON schema files are the source of truth:

  • reply-pilot-be/default-conf/ai-schemas/enlistment-table.schema.json
  • reply-pilot-be/default-conf/ai-schemas/feed.schema.json
  • reply-pilot-be/default-conf/ai-schemas/supplier-identifier.schema.json

Common rules:

  • all outputs must pass validation against the schema before being written to the DB
  • additionalProperties is disallowed everywhere so that the backend does not have to deal with silent schema drift
  • evidence_key and fact_key must be unique within a single AI response
  • confidence is always in the range 0..1
  • model_name, model_version, and prompt_version are not taken from the AI payload; the backend fills them in on write
  • extract_json should store the validated fragment that led to the specific DB row, not just a free-text summary
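
Two of the common rules, key uniqueness and the confidence range, plus the backend-owned provenance fields, can be sketched in plain Python. The function names are hypothetical, and the assumption that fact_key lives on the items of feed_facts and identifier_facts is not confirmed by this page.

```python
from typing import Any

# Assumed list/key pairs; the placement of fact_key inside the fact arrays is an assumption.
KEYED_LISTS = (
    ("evidence_items", "evidence_key"),
    ("feed_facts", "fact_key"),
    ("identifier_facts", "fact_key"),
)


def check_common_rules(payload: dict[str, Any]) -> list[str]:
    """Return violations of the key-uniqueness and confidence-range rules."""
    errors: list[str] = []
    for list_name, key_name in KEYED_LISTS:
        seen: set[Any] = set()
        for item in payload.get(list_name, []):
            key = item.get(key_name)
            if key in seen:
                errors.append(f"{list_name}: duplicate {key_name} {key!r}")
            seen.add(key)
            confidence = item.get("confidence")
            is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
            if not is_number or not 0 <= confidence <= 1:
                errors.append(f"{list_name}: confidence out of range for {key!r}")
    return errors


def with_backend_provenance(row: dict, *, model_name: str, model_version: str, prompt_version: str) -> dict:
    """Overwrite provenance fields so that values from the AI payload are never used."""
    return {
        **row,
        "model_name": model_name,
        "model_version": model_version,
        "prompt_version": prompt_version,
    }
```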

AI Classification Runtime

reply-pilot-be exposes the worker endpoint POST /api/requirements/ai-classify/run. This job:

  • selects only emails that have already passed the deterministic prefilter
  • takes the ENLISTMENT_TABLE scope from party_requirement_evidence with created_by = deterministic-enlistment-detector
  • takes the FEED scope from party_feed_fact with created_by = deterministic-email-parser
  • takes the SUPPLIER_IDENTIFIER scope from party_supplier_identifier_fact with created_by = deterministic-email-parser
  • does not re-run candidates that already have materialized AI rows with created_by = ai-requirement-classifier
  • if a single AI classification fails, logs the error, skips that email, and continues with the next candidate

Structured output validation:

  • the backend sends the committed JSON schemas from default-conf/ai-schemas/ to OpenAI
  • the OpenAI schema keeps only the subset compatible with the Responses API; conditional rules such as EMAIL_ATTACHMENT => attachment_* required or DIRECT_URL => url_* required are enforced by the subsequent backend validation
  • after the response, the backend runs its own strict validation of the fixed payload shape rules, with no runtime dependency on an external jsonschema library
  • it writes DB evidence/facts only after validation succeeds
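
The conditional rules that stay outside the OpenAI schema could be checked with plain Python, without a jsonschema dependency. This is a minimal sketch: the function name is hypothetical, and the exact set of required attachment_* fields is inferred from the ENLISTMENT_TABLE mapping on this page (attachment_sha256 is treated as optional).

```python
REQUIRED_ATTACHMENT_FIELDS = ("attachment_filename", "attachment_relative_path")
REQUIRED_URL_FIELDS = ("url_raw", "url_normalized")
URL_DELIVERY_MODES = {"DIRECT_URL", "LOGIN_REQUIRED"}


def check_conditional_rules(payload: dict) -> list[str]:
    """Return violations of the source-type and delivery-mode dependent rules."""
    errors: list[str] = []
    for item in payload.get("evidence_items", []):
        if item.get("source_type_code") == "EMAIL_ATTACHMENT":
            for field in REQUIRED_ATTACHMENT_FIELDS:
                if not item.get(field):
                    errors.append(f"evidence_item {item.get('evidence_key')!r}: {field} is required")
    for fact in payload.get("feed_facts", []):
        if fact.get("delivery_mode_code") in URL_DELIVERY_MODES:
            for field in REQUIRED_URL_FIELDS:
                if not fact.get(field):
                    errors.append(f"feed_fact {fact.get('evidence_key')!r}: {field} is required")
    return errors
```

An empty result means the payload may proceed to the DB write; any entry means the payload is rejected as a whole.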

Requirement Aggregation Runtime

reply-pilot-be also exposes the worker endpoint POST /api/requirements/evaluate/run. This job:

  • reads party_requirement_eval_queue
  • claims pending items via FOR UPDATE SKIP LOCKED
  • for ENLISTMENT_TABLE, aggregates historical party_requirement_evidence
  • for FEED, aggregates historical party_requirement_evidence and party_feed_fact
  • for SUPPLIER_IDENTIFIER, aggregates historical party_supplier_identifier_fact
  • writes the result to party_requirement_state
  • for FEED, also propagates the final resolved_delivery_mode_code and feed_url
  • for SUPPLIER_IDENTIFIER, propagates resolved_value_type_code and the value
  • does not overwrite a row in party_requirement_state that has a manual override
  • if a single queue item fails, only defers it for later; processing of the other companies continues
  • when REQUIREMENT_AGGREGATION_RULESET_VERSION changes, the backend re-enqueues all companies into party_requirement_eval_queue once
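
The claim step could look roughly like the following. This is a sketch under assumptions: the columns id, status, and party_id and the status values are hypothetical (the page names only the table and the locking clause), and conn stands for any DB-API connection to PostgreSQL whose cursors work as context managers, as psycopg2 cursors do.

```python
# Hypothetical column names and status values; only the table name and
# FOR UPDATE SKIP LOCKED come from the text above.
CLAIM_SQL = """
UPDATE party_requirement_eval_queue AS q
SET status = 'processing'
WHERE q.id IN (
    SELECT id
    FROM party_requirement_eval_queue
    WHERE status = 'pending'
    ORDER BY id
    LIMIT %s
    FOR UPDATE SKIP LOCKED
)
RETURNING q.id, q.party_id
"""


def claim_pending(conn, limit: int = 10) -> list:
    """Claim up to `limit` pending queue rows and return them."""
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL, (limit,))
        rows = cur.fetchall()
    conn.commit()
    return rows
```

SKIP LOCKED lets concurrent workers pass over rows that another transaction has already locked instead of waiting on them, so two workers do not claim the same item.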

Historical Requirement Backfill

For a one-off or paginated backfill of historical threads, the backend exposes the endpoint POST /api/requirements/backfill/run.

This job:

  • can process a single company via party_id or a paginated batch via after_party_id + limit
  • first resynchronizes deterministic email artifacts and extracted facts over the company's historical threads
  • prefers the thread snapshot from the backend cache; when it is missing, falls back to a DB-only path over activity_email and activity_email_attachment
  • runs AI classification and requirement aggregation only over the selected set of companies, not globally over the whole queue
  • is idempotent, because the deterministic layers replace their own rows and aggregation runs through the existing queue/upsert model
  • returns an operational report with the counts of companies, threads, synchronized messages, AI candidates, and recomputed attributes
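
The after_party_id + limit parameters suggest keyset pagination. A minimal in-memory sketch, assuming that party_id values are orderable (for example integers) and that a page holds the next `limit` ids greater than after_party_id; both function names are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def select_backfill_page(
    party_ids: Iterable[int],
    after_party_id: Optional[int] = None,
    limit: int = 50,
) -> list[int]:
    """Return the next `limit` party ids strictly greater than after_party_id."""
    ordered = sorted(party_ids)
    if after_party_id is not None:
        ordered = [party_id for party_id in ordered if party_id > after_party_id]
    return ordered[:limit]


def iterate_backfill_pages(party_ids: Iterable[int], limit: int) -> Iterator[list[int]]:
    """Walk all pages by feeding the last id of each page back as the cursor."""
    all_ids = list(party_ids)
    cursor: Optional[int] = None
    while True:
        page = select_backfill_page(all_ids, cursor, limit)
        if not page:
            return
        yield page
        cursor = page[-1]
```

Because each page is idempotent on the backend side, a caller can safely repeat a page after a failure by reusing the same cursor.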

AI prompt refresh:

  • the AI candidate scan takes the current REQUIREMENT_AI_PROMPT_VERSION into account
  • if an email has only older AI rows with a different prompt version, the backend treats them as stale and queues the email for AI classification again
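
The staleness rule reduces to one predicate over the AI rows of an email. The function name and the row shape (a dict with a prompt_version key) are assumptions made for illustration.

```python
def needs_ai_classification(ai_rows: list[dict], current_prompt_version: str) -> bool:
    """True when no AI row for the email was produced with the current prompt version."""
    return not any(row.get("prompt_version") == current_prompt_version for row in ai_rows)
```

An email with no AI rows at all also returns True, which matches a candidate that has never been classified.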

ENLISTMENT_TABLE

Schema:

  • a top-level summary
  • an evidence_items array
  • each evidence_item represents one candidate piece of evidence

Mapping to the DB:

  • create one row in party_requirement_evidence for each evidence_item
  • requirement_code = 'ENLISTMENT_TABLE'
  • source_type_code, verdict_code, confidence, and reason are taken directly from the item
  • activity_email_id and external_thread_id are filled in by the backend from the email currently being imported
  • if source_type_code = 'EMAIL_ATTACHMENT', the backend carries over attachment_filename, attachment_relative_path, and optionally attachment_sha256
  • if source_type_code = 'EMAIL_MESSAGE', the attachment columns stay empty
  • extract_json on the evidence row should contain the exact evidence_item and optionally the top-level summary
  • the top-level summary is not written directly to party_requirement_state; it serves as input for the later aggregation worker

FEED

Schema:

  • a top-level summary
  • an evidence_items array
  • a feed_facts array
  • each feed_fact references one evidence_item via evidence_key

Mapping to the DB:

  • evidence_items are materialized into party_requirement_evidence
  • requirement_code = 'FEED'
  • the backend builds a map evidence_key -> party_requirement_evidence.id
  • for each feed_fact it creates one row in party_feed_fact
  • evidence_id is resolved from evidence_key; when the reference is missing or points to a nonexistent key, the whole AI payload must be treated as invalid
  • delivery_mode_code, url_raw, url_normalized, confidence, and reason are taken directly from the fact
  • for delivery_mode_code DIRECT_URL or LOGIN_REQUIRED, both url_raw and url_normalized must be filled in
  • for IDENTIFIER_ONLY, MANUAL_EXPORT, NO_FEED, or OTHER, the URLs may be empty
  • attachment metadata is copied into party_feed_fact from the linked evidence_item; when the source is EMAIL_MESSAGE, the attachment columns stay empty
  • extract_json on the fact row should contain the specific feed_fact; extract_json on the evidence row should contain the specific evidence_item
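
The FEED mapping above can be sketched in memory as follows. This is an illustration, not the backend's code: materialize_feed and InvalidPayload are hypothetical names, and an itertools counter stands in for the ids that PostgreSQL would assign on insert.

```python
from itertools import count
from typing import Iterator, Optional

ATTACHMENT_FIELDS = ("attachment_filename", "attachment_relative_path", "attachment_sha256")


class InvalidPayload(ValueError):
    """Raised when a feed_fact references a missing evidence_key."""


def materialize_feed(payload: dict, id_source: Optional[Iterator[int]] = None):
    """Turn a validated FEED payload into (evidence_rows, fact_rows)."""
    id_source = id_source or count(1)  # stand-in for DB-generated ids
    items = {item["evidence_key"]: item for item in payload["evidence_items"]}
    # Check every reference before building any row, so a bad payload writes nothing.
    for fact in payload["feed_facts"]:
        if fact.get("evidence_key") not in items:
            raise InvalidPayload(f"unknown evidence_key: {fact.get('evidence_key')!r}")

    key_to_id: dict = {}
    evidence_rows = []
    for key, item in items.items():
        key_to_id[key] = next(id_source)
        evidence_rows.append({
            "id": key_to_id[key],
            "requirement_code": "FEED",
            "source_type_code": item["source_type_code"],
            "extract_json": item,
        })

    fact_rows = []
    for fact in payload["feed_facts"]:
        source = items[fact["evidence_key"]]
        from_attachment = source["source_type_code"] == "EMAIL_ATTACHMENT"
        row = {
            "evidence_id": key_to_id[fact["evidence_key"]],
            "delivery_mode_code": fact["delivery_mode_code"],
            "url_raw": fact.get("url_raw"),
            "url_normalized": fact.get("url_normalized"),
            "extract_json": fact,
        }
        # Attachment columns are copied only when the evidence came from an attachment.
        for field in ATTACHMENT_FIELDS:
            row[field] = source.get(field) if from_attachment else None
        fact_rows.append(row)
    return evidence_rows, fact_rows
```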

SUPPLIER_IDENTIFIER

Schema:

  • a top-level summary
  • an evidence_items array
  • an identifier_facts array
  • each identifier_fact references one evidence_item via evidence_key

Mapping to the DB:

  • in the current V1 schema, SUPPLIER_IDENTIFIER is not materialized into party_requirement_evidence, because requirement_type does not yet have a corresponding code
  • the backend therefore validates evidence_items but writes only party_supplier_identifier_fact to the DB
  • evidence_key serves as an internal bridge for copying source metadata from evidence_items to the fact rows
  • identifier_type_code, value_raw, value_normalized, confidence, and reason are taken directly from the identifier_fact
  • evidence_id stays NULL
  • attachment_filename, attachment_relative_path, and attachment_sha256 are copied from the linked evidence_item when the source is an attachment
  • extract_json on the fact row should contain the specific identifier_fact and optionally a reference to the evidence_item used
  • if a requirement_type for supplier identifiers is added later, the schema does not need to change; the backend simply starts materializing evidence_items into party_requirement_evidence as well

Endpoints

  • GET /healthz
  • GET /api/meta
  • GET /api/inbox
  • GET /api/emails
  • GET /api/emails/<email_id>
  • GET /api/emails/<email_id>/attachments
  • POST /api/email-threads/<email_id>/reply-tasks
  • POST /api/companies
  • POST /api/people
  • PATCH /api/people/<party_id>
  • POST /api/parties/<party_id>/contacts
  • PATCH /api/parties/<party_id>/contacts/<contact_mech_id>
  • DELETE /api/parties/<party_id>/contacts/<contact_mech_id>
  • POST /api/companies/<party_id>/identifiers
  • DELETE /api/companies/<party_id>/identifiers/<identifier_id>
  • DELETE /api/companies/<company_id>/people/<person_id>
  • POST /api/mailbox/import
  • GET /api/mailbox/import/status
  • POST /api/mailbox/import/step
  • POST /api/cme/company-check/run
  • GET /api/lead-imports/files
  • POST /api/lead-imports/batches
  • GET /api/lead-imports/batches/<batch_id>
  • GET /api/lead-imports/batches/<batch_id>/items/<item_id>
  • POST /api/lead-imports/batches/<batch_id>/items/<item_id>/actions
  • POST /api/requirements/ai-classify/run
  • POST /api/requirements/evaluate/run
  • POST /api/requirements/backfill/run
  • POST /api/emails/<email_id>/drafts/reply
  • POST /api/emails/<email_id>/send/reply
  • POST /api/emails/<email_id>/debug/generate

The reply draft/send endpoints accept the original JSON payload when there are no attachments. If the reply contains attachments, the client sends multipart/form-data with the same text fields recipient, subject, body_text, and reply_to_message_id, plus a repeated attachments file field. The limit is 10 files, 10 MB per file, and 20 MB in total.
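
The attachment limits can be expressed as one check over the uploaded file sizes. This is a sketch: the function name and the reason strings are hypothetical, and 1 MB is assumed to mean 1024 * 1024 bytes, which the text does not specify.

```python
from typing import Optional

MAX_FILES = 10
MAX_FILE_BYTES = 10 * 1024 * 1024   # assumption: 1 MB = 1024 * 1024 bytes
MAX_TOTAL_BYTES = 20 * 1024 * 1024


def check_attachment_limits(sizes: list[int]) -> Optional[str]:
    """Return a reason string for the first violated limit, or None when all limits hold."""
    if len(sizes) > MAX_FILES:
        return "too_many_files"
    if any(size > MAX_FILE_BYTES for size in sizes):
        return "file_too_large"
    if sum(sizes) > MAX_TOTAL_BYTES:
        return "total_too_large"
    return None
```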

Party Mutation API

The backend exposes manual mutations for people, contacts, identifiers, and relationships on top of the existing party/contact model. Every mutation runs in a single DB transaction and verifies that the target party exists, has not been merged, and has the expected type.

  • POST /api/people creates a person and optionally links it to a company via CONTACT_FOR.
  • PATCH /api/people/<party_id> updates the person's basic identity.
  • POST/PATCH/DELETE /api/parties/<party_id>/contacts... manages the party's active link to an EMAIL or PHONE contact mechanism.
  • DELETE on a contact sets party_contact_mech.thru_date; neither contact_mech nor the activity history is deleted.
  • POST /api/companies/<party_id>/identifiers supports WEBSITE, DOMAIN, GLN, VENDOR_CODE, and EXTERNAL_ID; ICO/DIC remain part of company editing.
  • DELETE /api/companies/<party_id>/identifiers/<identifier_id> deletes the party_identifier row, because the table has no lifecycle column.
  • DELETE /api/companies/<company_id>/people/<person_id> ends the active CONTACT_FOR relationship via thru_date; the person is preserved.

Errors are returned as JSON with status: error and a reason. Validation errors use 400, duplicate conflicts 409 with optional candidates, a missing record 404, and an unconfigured DB 503.