top of page

COSTOM logic automation

To address your objective of building a Python API that converts the logic from an Excel mapping specification into PHP code, I'll outline possible solutions, recommend the best approach, highlight free/open-source options, and provide a corrected PHP code example for the specific case of `CAR_STORAGE_GARAGE_PARKING_COUNT`.


### Possible Solutions to Automate Logic Conversion


1. **Rule-Based Parsing with Python (Using Libraries like `pandas` and `openpyxl`)**:

- **Description**: Use Python to read the Excel file (`pandas` or `openpyxl`), parse the `SystemName`, `FunctionName`, and comments, and generate PHP code based on predefined templates. For `CUSTOM` functions, interpret the comments using natural language processing (NLP) or regex-based parsing to extract logic (e.g., conditionals, string manipulations).

- **Pros**:

- Highly customizable and flexible.

- Can handle complex logic with proper parsing rules.

- Integrates well with Excel file formats.

- **Cons**:

- Requires robust parsing logic for varied comment formats.

- Maintenance overhead for new logic patterns.

- **Tools**:

- `pandas`/`openpyxl` for Excel processing (free, open-source).

- `re` (Python regex) for comment parsing (built-in).

- Optional: `nltk` or `spacy` for advanced NLP (free, open-source).


2. **Template-Based Code Generation (Using `Jinja2`)**:

- **Description**: Define PHP code templates for common logic patterns (e.g., conditionals, string append, null checks). Use `Jinja2` to populate templates with parsed values from the Excel file. Parse comments to identify the appropriate template and variables.

- **Pros**:

- Simplifies code generation with reusable templates.

- Reduces errors by standardizing output.

- **Cons**:

- Limited by predefined templates unless extended.

- Still requires comment parsing logic.

- **Tools**:

- `Jinja2` for templating (free, open-source).

- `pandas`/`openpyxl` for Excel reading.


3. **LLM-Assisted Code Generation (Using Open-Source Models)**:

- **Description**: Leverage a local or cloud-based open-source LLM (e.g., LLaMA, Mistral) to interpret comments and generate PHP code directly. Combine with `pandas` to read Excel and structure inputs for the LLM.

- **Pros**:

- Handles complex, varied comment formats with minimal manual parsing.

- Can generate idiomatic PHP code.

- **Cons**:

- Requires significant computational resources for local LLMs.

- Potential for inconsistent outputs unless fine-tuned.

- Dependency on model quality and prompt engineering.

- **Tools**:

- `transformers` by Hugging Face for local LLMs (free, open-source).

- `pandas` for Excel processing.


4. **Hybrid Approach (Template + LLM)**:

- **Description**: Use templates for common logic patterns and fall back to an LLM for complex or unrecognized comment logic. Parse Excel with `pandas`, classify comments with regex or simple NLP, and route to templates or LLM accordingly.

- **Pros**:

- Balances reliability (templates) with flexibility (LLM).

- Scales to new logic types without extensive rule updates.

- **Cons**:

- More complex to implement and maintain.

- LLM dependency for edge cases.

- **Tools**:

- `Jinja2` for templates.

- `transformers` for LLMs.

- `pandas`/`openpyxl` for Excel.


5. **Custom DSL (Domain-Specific Language)**:

- **Description**: Define a simple DSL for comments to standardize logic descriptions (e.g., `IF(GarageSpaces>0, GarageSpaces, ...)`). Parse the DSL with a Python library like `pyparsing` and generate PHP code.

- **Pros**:

- Highly structured and reliable.

- Eliminates ambiguity in comments.

- **Cons**:

- Requires modifying the Excel file format or preprocessing comments.

- Steep learning curve for users to adopt DSL.

- **Tools**:

- `pyparsing` for DSL parsing (free, open-source).

- `pandas` for Excel.


### Best Option and Free/Open-Source Recommendation


**Best Option**: **Hybrid Approach (Template + LLM)**

- **Why**: This approach combines the reliability of template-based generation for common patterns (e.g., conditionals, null checks) with the flexibility of an LLM for complex or custom logic. It minimizes manual parsing while handling the diverse comment formats in your examples (e.g., date splitting, string filtering, conditional logic). It also scales well as new logic types are added.

- **Implementation Outline**:

1. Use `pandas` to read the Excel file and extract `JSON Attribute Name`, `SystemName`, `FunctionName`, and comments.

2. Use regex or simple keyword matching to classify comments into common patterns (e.g., "pass thru", "append", "if-else").

3. Map classified patterns to `Jinja2` templates for PHP code generation.

4. For unrecognized or complex comments, pass to an open-source LLM (e.g., Mistral via `transformers`) to generate PHP code.

5. Validate generated PHP code syntax using a PHP parser (e.g., `nikic/php-parser`).

6. Expose the generation logic via a Python API (e.g., using `FastAPI`).


**Best Free/Open-Source Tools**:

- **Excel Processing**: `pandas` (Apache 2.0 license) or `openpyxl` (MIT license).

- **Templating**: `Jinja2` (BSD license).

- **LLM**: `transformers` by Hugging Face (Apache 2.0 license) with Mistral or LLaMA models (check model-specific licenses, e.g., Mistral’s Apache 2.0).

- **API Framework**: `FastAPI` (MIT license) for building the Python API.

- **PHP Parsing**: `nikic/php-parser` (BSD license) for validating generated PHP code.

- **Regex/Comment Parsing**: Python’s built-in `re` module (PSF license).


**Why These Tools?**:

- All are free, open-source, and widely adopted, ensuring community support and reliability.

- `pandas` and `Jinja2` are lightweight and sufficient for structured data and templating.

- `transformers` provides access to powerful LLMs for complex logic without proprietary dependencies.

- `FastAPI` is modern, fast, and ideal for exposing the generation logic as an API.


### PHP Code Example for `CAR_STORAGE_GARAGE_PARKING_COUNT`


The provided PHP code is mostly correct but has a few issues:

- The `Common::customIgnoreZeroes` function is undefined in the context provided, and its purpose is unclear.

- The empty string `''` for the `else` case may not align with the comment’s instruction to return `NULL`.

- It assumes `$record` and `$jsonRecord` are properly defined arrays.


Here’s a corrected and idiomatic PHP code example based on the comment logic:


```php

// CAR_STORAGE_GARAGE_PARKING_COUNT

if (isset($record['GarageSpaces']) && $record['GarageSpaces'] > 0) {

$jsonRecord['CAR_STORAGE_GARAGE_PARKING_COUNT'] = $record['GarageSpaces'];

} elseif (isset($record['CarportSpaces']) && $record['CarportSpaces'] > 0) {

$jsonRecord['CAR_STORAGE_GARAGE_PARKING_COUNT'] = $record['CarportSpaces'];

} elseif (isset($record['ParkingTotal']) && $record['ParkingTotal'] > 0) {

$jsonRecord['CAR_STORAGE_GARAGE_PARKING_COUNT'] = $record['ParkingTotal'];

} else {

$jsonRecord['CAR_STORAGE_GARAGE_PARKING_COUNT'] = null;

}

```


**Changes Made**:

- Added `isset()` checks to prevent undefined index errors if the keys are missing in `$record`.

- Replaced `Common::customIgnoreZeroes` with direct assignment, as the comment doesn’t specify additional transformations.

- Changed the `else` case to assign `null` (PHP’s `null` keyword) to match the comment’s instruction.

- Maintained the conditional priority (GarageSpaces → CarportSpaces → ParkingTotal → NULL).


**Assumptions**:

- `$record` is an associative array containing source data (e.g., `GarageSpaces`).

- `$jsonRecord` is an associative array for output JSON data.

- The values for `GarageSpaces`, `CarportSpaces`, and `ParkingTotal` are numeric (integers or floats).


### Additional Notes


- **API Design**: Your Python API could accept an Excel file upload, process each row, and return a ZIP file containing generated PHP files (one per `JSON Attribute Name`). Use `FastAPI` with endpoints like `/generate-php` and validate inputs with Pydantic.

- **Comment Parsing**: For the hybrid approach, start with regex to detect keywords like “if”, “else”, “pass thru”, “append”, and “split”. For example, the `CAR_STORAGE_GARAGE_PARKING_COUNT` comment can be parsed with a regex like `If (\w+) > 0 then pass in value` to extract variables and conditions.

- **LLM Integration**: If using an LLM, structure prompts like: “Convert this logic to PHP: If GarageSpaces > 0 then pass in value, Else if CarportSpaces > 0 then pass in value, Else if ParkingTotal > 0 then pass in value, Else NULL. Output should assign to $jsonRecord['CAR_STORAGE_GARAGE_PARKING_COUNT'].”

- **Testing**: Validate generated PHP code by running it through `nikic/php-parser` or executing it in a sandboxed PHP environment to catch syntax errors.



Recent Posts

See All
Mysql/Python for data extraction

Use case :— processing up to 3 million records (30 lakh) in MySQL with name/address preprocessing and splitting logic , here is a...

 
 
 

Comments


I Sometimes Send Newsletters

Thanks for submitting!

© 2023 by Mohammed Jassim

bottom of page